Masters of the External Environment - Seven Habits of Highly Effective Data Enabled, AI Powered Business
Traditional Data Management has focused on curating and governing the Data within your business. If you weren’t already using lots of external data before GenAI came along, it should now be clear that there is more useful data outside your business, to help you develop products and services and serve your customers, than you can ever hope to harvest from your own systems.
Most of the Data that can help your business is not held within your business, and that is why being a Master of the External Environment is critical for any Intelligent Enterprise.
Big Data
I first quoted an IBM statement that ‘90% of the World’s Data had been created in the last 2 years’ in 2003, and it seems that rule or law has stood true for at least the last 20 years!
In the mid-00s, my team at Capgemini and others elsewhere started working on what we called ‘mash-ups’, connecting structured and unstructured data, for example adding reporting insights to real-time maps. The term ‘Big Data’ emerged subsequently.
The advent of Generative AI and Large Language Models has helped to bring the importance of this to life. Models such as OpenAI’s ChatGPT and Google Gemini are typically trained on a ‘large’ sub-set of the available ‘big data’ on the internet and can help you and your business with a broad variety of problems, provided you have experts in asking the right questions (today known as Prompt Engineering). Of course, if every company has access to the same LLMs, like ChatGPT 4o and Claude Sonnet, is this just upping the ante in the ‘Big Data’ arms race?
Strong Signals and Weak Signals
In the world of statistics, we discuss Correlation and Causation.
We like to think that we can build Causal models that give us certainty that if we do X and Y, we are guaranteed to get our target outcome Z. This can be true in simple cases: if I do my 40 hours per week and deliver what my boss wants, I will get paid my salary each month. The reality is that most of what we deal in is Correlations: if we do X and Y, there is a 60-70% likelihood that we will get our desired outcome Z.
In my language for driving insights, your organisation needs to listen to and learn from both ‘strong’ signals and ‘weak’ signals.
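To make that 60-70% figure concrete, here is a minimal sketch in Python of how such a likelihood could be estimated from past observations. The function name and the records are invented purely for illustration; they are assumptions, not real data or a recommended method.

```python
from typing import Iterable, Optional, Tuple

def likelihood_of_z(records: Iterable[Tuple[bool, bool, bool]]) -> Optional[float]:
    """Share of cases where Z happened, among cases where both X and Y were done."""
    outcomes = [z for x, y, z in records if x and y]
    if not outcomes:
        return None  # no cases with both X and Y, so no evidence either way
    return sum(outcomes) / len(outcomes)

# Hypothetical history of (did_x, did_y, got_z); made-up values for illustration.
history = [
    (True, True, True),
    (True, True, True),
    (True, True, False),
    (True, False, False),
    (False, True, True),
    (True, True, True),
    (True, True, False),
]

print(likelihood_of_z(history))  # 3 of the 5 cases with both X and Y led to Z
```

A figure like this describes how often Z followed X and Y in the past. It says nothing about whether X and Y caused Z.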
The most obvious ones of late would be the full implementation of Brexit, the Russia/Ukraine conflict, and the COVID-19 Pandemic. These would, if correctly interpreted, have told us in March 2020 and March 2022 that our highly curated Machine Learning Models would rapidly start to deliver horribly inaccurate results.
A lesson to self – I bought a new house in January 2022. Even before the Ukraine conflict, my analysis showed that, with Brexit and other factors, the economic environment would be far worse than the Banks predicted. I spent weeks telling the Bank that Interest Rates were going up, and my Bank manager spent weeks telling me that they would never go above 2%. Unfortunately, the Bank wouldn’t take my ‘Weak Signals’ over their ‘Strong Signals’ and circumstances meant I was not able to change banks, so neither of us ended up with the best result!
Companies that live in highly competitive markets have already had to understand this. They use data and analytics to understand what competitors and new entrants are doing, and aim to be at least one step ahead. As markets and environments become more and more volatile, all businesses need to be more agile, and data is central to sensing, identifying, and responding to potential external challenges as they happen, if not before! Those Weak Signals can become quite strong, quite quickly!!
Being a Master of the External Environment means trawling, crawling, and harnessing what you can find about your business and customers from the public internet and elsewhere. Like Internet Search, this is not a perfect science, but the better you are at it, the more you can identify those ‘weak signals’ and use them to either augment or challenge the ‘strong signals’ you’re getting from your carefully curated internal data sources.
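One simple way to notice that a model has started to deliver inaccurate results is to compare its recent forecast errors with the errors it made in calmer times. The sketch below is a minimal illustration in Python. The function name, the sample errors, and the tolerance of 2.0 are all assumptions chosen for the example, not a standard or a recommendation.

```python
from statistics import mean

def error_has_drifted(baseline_errors, recent_errors, tolerance=2.0):
    """Return True when recent mean absolute error exceeds tolerance x the baseline."""
    baseline = mean(abs(e) for e in baseline_errors)
    recent = mean(abs(e) for e in recent_errors)
    return recent > tolerance * baseline

# Hypothetical forecast errors (forecast minus actual); made-up values for illustration.
calm_period = [1.0, -2.0, 1.5, -0.5]   # mean absolute error 1.25
after_shock = [6.0, -8.0, 7.0]         # mean absolute error 7.0
still_calm = [1.0, -1.5, 2.0]          # mean absolute error 1.5

print(error_has_drifted(calm_period, after_shock))  # True: the model needs a fresh look
print(error_has_drifted(calm_period, still_calm))   # False: within tolerance
```

A check like this only tells you after the fact that something has changed. Scanning the external environment is what gives you a chance of seeing the change coming.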
Building on this for your Data and AI strategy
This brings us back to Data Governance/Data Quality – which we will come back to in future posts. The structured data folks will continue (today) to recite by rote that we cannot deliver good insights from data without high quality, curated data.
You need your Data and AI leader and their teams spending as much time scanning and analysing the external environment for Weak Signals as working on those Strong Signals.
Conclusion
Crawling the external environment puts you in the same world as economists and econometricians. The best thing said about those people is that ‘whatever they forecast it will be wrong…’ However, Governments, Banks and others rely on their tools and techniques, on the basis that any forecast is better than no forecast.
CDO | Author | Conference Speaker | CDO Matters Podcast Host | Thought Leader
Could not agree more. In the inherently probabilistic world of AI, the idea that perfect data is needed to drive value from AI is a myth.
Composable Enterprises :Data Product Pyramid, AI, Agents & Data Object Graphs | Data Product Workshop podcast co-host
Great post Eddie. For me, seeing data as a feature of driving out valuable business, i.e. reframing the mental model, is totally in line with your post. As you know, we have done numerous AI solutions on data that wouldn’t pass muster in traditional data gov/mgt practices. Being able to quickly and easily use the data you’ve got, along with (again quickly and easily) investigating new data (e.g. external) to test and quickly drive out new valuable opportunities, is key. Is the data good enough? If not, what can we do? Get more data and/or pair with more heuristic methods, etc. AI is one tool in the box and can be paired with other approaches, but the business focus is the key.
I'd say that about 99% of the world's misinformation has probably been created in the past few years.
CEO at Consider Solutions
Great point, which is why RAG on internal trusted data is THE key component of GenAI. Relying on the generalised LLM training data only is dangerous (to say the least!). Dirty external data is always subjective, but if we focus on ensuring veracity and relevance of trusted data, we have a chance of business decisions being based on some fact/evidence . . . 😀
Eddie Short!!! I agree!!! "History repeats itself." Who among us hasn't said this... or thought this. Your external environment provides signs that will help you make choices to decide your history. Data & Analytics and AI Leaders need to be aware of how the external environment... that is also producing external data... that you can almost count on impacting the Intelligent Enterprises. "You're only as strong as your weakest link." Who among us hasn't said this or thought this? Don't just follow the external environment. Understand the environment... challenge it... socialize it with others in your circle of connections and colleagues and even clients. Be ever vigilant and sensitive to external environments... AND the data it is producing.