Masters of the External Environment - Seven Habits of Highly Effective Data Enabled, AI Powered Business

Traditional Data Management has focused on curating and governing the Data within your business. If you weren’t already using lots of external data before GenAI came along, it should now be clear that there is more useful data outside your business to help you develop products and services and serve your customers than you can ever hope to harvest from your own systems.

The vast amount of Data that can help your business is not held within your business, and that is why being a Master of the External Environment is critical for any Intelligent Enterprise.

 

Big Data

In 2003 I first quoted an IBM statement that ‘90% of the World’s Data had been created in the last 2 years’, and that rule, or law, seems to have held true for at least the last 20 years!

  • The real advent of ‘Big Data’ was the capability, and in fact the need, to collect and leverage data from outside the organisation, including Video, Audio, social media etc. That data can augment our understanding of Customers and is central to understanding our external environment, competitors, and suppliers.
  • Back in the early 00s we had two ‘Data’ communities. There were the structured folks, who created Databases and Data Warehouses to help companies ‘mine’ insights on Customers, Supply Chains and Financials, and the Unstructured folks, who typically called themselves Information Management professionals and often came from a Library heritage of aiming to store paper-based documents (Forms, Books, Photographs etc.) electronically. Internet Search was largely born out of the Information community, as search was the primary way to query complex Electronic Document and Records Management systems, as they were often known.

In the mid-00s, my team at Capgemini and others elsewhere started working on what we called ‘mash-ups’, connecting structured and unstructured data (for example, adding reporting insights to real-time Maps), and subsequently the term ‘Big Data’ emerged.

  • From this we generated the many Vs of Big Data. Doug Laney of Gartner had first explored this in the late 90s and at one point had 11 Vs, whilst at the dawn of mainstream Big Data in the early 2010s the focus, according to McKinsey, was on three: Volume (there is a huge amount), Variety (many formats: Video, Audio, social media) and Velocity (you need very fast response times).
  • IMHO this is where Big Data went badly wrong, as everyone treated it as a purely technical discipline. We saw the emergence of the first Data Scientists, who were uber Techies with PhDs in Physics and Maths: phenomenally intelligent, with some devastating skills, but sometimes as practical as a ‘chocolate teapot’…

 

The advent of Generative AI and Large Language Models has helped to bring the importance of this to life. These models, such as OpenAI’s ChatGPT and Google Gemini, are typically trained on a ‘large’ subset of the ‘big data’ available on the internet, and can help you and your business with a broad variety of problems, provided you have experts in asking the right questions (today known as Prompt Engineering). Of course, if every company has access to the same LLMs, like GPT-4o and Claude Sonnet, is this just upping the ante in the ‘Big Data’ arms race?


Strong Signals and Weak Signals

 

In the world of statistics, we discuss Correlation and Causation. 

We like to think that we can build Causal models that give us certainty that if we do X and Y, we are guaranteed to get our target outcome Z. Whilst this can be true (if I do my 40 hours per week and deliver what my boss wants, I will get paid my salary each month), the reality is that most of what we deal in is Correlation: if we do X and Y, there is a 60-70% likelihood that we will get our desired outcome Z.

  • Even in my simplistic example above, there is probably at least a 5% chance that, whatever we do, our company may stumble and have to make me redundant, so I don’t get paid…
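The ‘60-70% likelihood’ is, in practice, just the share of past cases in which Z followed when X and Y were done. A minimal Python sketch, assuming a hypothetical history of past outcomes (the `Outcome` type and `conditional_rate` function are illustrative names, not part of any tool), estimates that conditional probability rather than assuming a guaranteed cause and effect:

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    did_x: bool  # hypothetical: did we take action X?
    did_y: bool  # hypothetical: did we take action Y?
    got_z: bool  # did the desired outcome Z follow?


def conditional_rate(history: list[Outcome]) -> float:
    """Estimate P(Z | X and Y) as the share of X-and-Y cases where Z followed."""
    relevant = [h for h in history if h.did_x and h.did_y]
    if not relevant:
        raise ValueError("no past cases where both X and Y were done")
    return sum(h.got_z for h in relevant) / len(relevant)


# Hypothetical history: 10 cases where we did X and Y, and Z followed in 7.
history = [Outcome(True, True, i < 7) for i in range(10)]
# Cases where X was not done are ignored by the estimate.
history += [Outcome(False, True, False)] * 5

print(f"P(Z | X and Y) is about {conditional_rate(history):.0%}")
```

Even a 70% estimate like this still leaves three cases in ten where Z does not arrive, which is exactly the gap the rest of this habit is about.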

 

In my language for driving insights, your organisation needs to listen to, and learn from, ‘strong’ signals and ‘weak’ signals.

 

  • Strong Signals come from insights and models built from your structured Data Warehouse or Data Lake (Lakehouse) and the sophisticated analytical techniques they can be exploited with. These sources give us the 50-95% probability that we are looking for, and provide large amounts of data to train Machine Learning models, allowing them to develop sophisticated correlations (and sometimes actual causation) that will help you with the immediate models we discussed in ‘Customer First’: Churn, Propensity and Decisioning! We use these repeatable models all the time, and most of the time they serve our business well!
  • Weak Signals come from the data in the external environment. If our best models give us 80-90% confidence, analysing what is going on outside is what helps to explain the 10-20% of cases when the models fail. These have previously been known as ‘Black Swan’ events because you can’t predict them. However, whilst you can’t be highly confident, you can identify the potential black swans and help your business create scenarios for what to do if one happens.

 

The most obvious ones of late would be the full implementation of Brexit, the Russia/Ukraine conflict, and the COVID-19 Pandemic. These signals, if correctly interpreted, would have told us in March 2020 and March 2022 that our highly curated Machine Learning Models would rapidly start to deliver horribly inaccurate results.

  • I often tell a story from my Telecoms career: as lockdowns were implemented, and as I recovered from my own first bout of COVID, I watched as ‘our ML models broke down…’ The team worked wonders to keep the business running, but we had to rely on manual forecasts and guesstimates, as models trained on history no longer had a valid history to work with and started to produce spurious results. The saving factor here was the ‘human beings’ who recognised that our Propensity and Churn models were producing incongruous results, so we could focus on updating them whilst keeping the business running.
  • Likewise, signals in the external environment indicated that the implementation of Brexit was going to badly damage Supply Chains, and yet most Retailers and Automotive manufacturers did not act on them. The UK had its most successful year of Car Manufacturing in 2016, and by 2023 it was running at the worst rate of production since the 1950s…
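One practical way to spot that ‘models trained on history no longer have a valid history’ is to compare the distribution of a model’s inputs today with the distribution it was trained on. The sketch below uses the Population Stability Index (PSI), a common drift measure chosen here for illustration rather than a method described in this post; the time-of-day bins, the shares, and the 0.1/0.25 thresholds are hypothetical, rule-of-thumb values, not a standard.

```python
import math


def psi(expected: list[float], actual: list[float], eps: float = 1e-6) -> float:
    """Population Stability Index between two binned distributions.

    Each list holds the share of observations per bin and should sum to ~1.0.
    """
    if len(expected) != len(actual):
        raise ValueError("both distributions must use the same bins")
    total = 0.0
    for e, a in zip(expected, actual):
        # Clamp empty bins to a tiny value so the log stays defined.
        e, a = max(e, eps), max(a, eps)
        total += (a - e) * math.log(a / e)
    return total


def drift_label(score: float) -> str:
    # Rule-of-thumb thresholds often quoted for PSI; tune them for your own models.
    if score < 0.1:
        return "stable"
    if score < 0.25:
        return "moderate shift"
    return "significant shift: review or retrain the model"


# Hypothetical example: share of customer contacts in four time-of-day bands,
# as seen in the training data versus during a lockdown.
before = [0.25, 0.25, 0.25, 0.25]
during = [0.05, 0.15, 0.30, 0.50]

score = psi(before, during)
print(f"PSI = {score:.2f} -> {drift_label(score)}")
```

A check like this does not fix a broken Churn or Propensity model, but run routinely it can give the ‘human beings’ an early, automated prompt that the world the model learned from has changed.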

A lesson to self: I bought a new house in January 2022. Even before the war in Ukraine, my analysis showed that, with Brexit and other factors, the economic environment would be far worse than the Banks predicted. I spent weeks telling the Bank that Interest Rates were going up, and my Bank manager spent weeks telling me that they would never go above 2%. Unfortunately, the Bank wouldn’t take my ‘Weak Signals’ over their ‘Strong Signals’, and circumstances meant I was not able to change banks, so neither of us ended up with the best result!

Companies in highly competitive markets have already had to understand this: they use data and analytics to understand what competitors and new entrants are doing and aim to be at least one step ahead. As markets and environments become more and more volatile, all businesses need to be more agile, and data is central to sensing, identifying, and responding to potential external challenges as and when they happen, if not before! Those Weak Signals can become quite strong, quite quickly!!

Being a Master of the External Environment means trawling, crawling, and harnessing what you can find about your business and customers from the public internet and elsewhere. Like Internet Search this is not a perfect science, but the better you are at it, the more you can identify those ‘weak signals’ and use them to either augment or challenge the ‘strong signals’ you’re getting from your carefully curated internal data sources.

  • There are a variety of tools that can be leveraged, including specialist tools that focus specifically on social media and can help you identify patterns in that Social Data which suggest that someone on X/Instagram/TikTok is your customer. Of course, that’s not a perfect science either!

 

Building on this for your Data and AI strategy

This brings us back to Data Governance/Data Quality, which we will come back to in future posts. The structured data folks will (today) continue to recite the rote line that we cannot deliver good insights from data without high-quality, curated data.

  • And yet Google Search, and latterly ChatGPT, can crawl the entire internet and give you some pretty good answers to complex questions.
  • They are not perfect, and they have flaws, but if you understand the downside risk you can get to answers far faster than by waiting for your perfectly manicured data to be delivered!

 

You need your Data and AI leader and their teams spending as much time scanning and analysing the external environment for Weak Signals as working on those Strong Signals.

  • Performance Marketing (again discussed in the Customer First Habit) is increasingly focused on leveraging these capabilities; after all, the biggest value lies in how effectively you manage the spend and results you get with Google and Facebook, whose businesses are built on search and social media advertising. Looking at social media to find out what Customers and the Market are saying about your company and its products is an increasingly important capability.
  • Trawling and crawling market and financial data to understand movements in the markets and potential impacts on your competitors, suppliers, and customers.

 

Conclusion

 

Crawling the external environment puts you in the same world as economists and econometricians. The best thing said about those people is that ‘whatever they forecast, it will be wrong…’ However, Governments, Banks and others rely on their tools and techniques, on the basis that any forecast is better than no forecast.

  • A critical skill of any Data and AI leader is therefore to arbitrate these strong and weak signals and detect the ‘signal in the noise’. 
  • As ever, be a Servant Leader and work with colleagues to inform and challenge them with the insights you develop. Be ready with a scenario that can support the Board if the signals start to look ugly, but equally, don’t be a doom-monger!

 

Malcolm Hawker

CDO | Author | Conference Speaker | CDO Matters Podcast Host | Thought Leader


Could not agree more. In the inherently probabilistic world of AI, the idea that perfect data is needed to drive value from AI is a myth.

Jon Cooke

Composable Enterprises :Data Product Pyramid, AI, Agents & Data Object Graphs | Data Product Workshop podcast co-host


great post Eddie - for me seeing data as a feature of driving out valuable business ie reframe the mental model, totally in with your post. as you know we have done numerous AI solutions on data that wouldn’t pass the mustard in traditional data gov/mgt practices. being able quickly and easily use the data you’ve got along with (again quickly and easily) investigate new data (eg external) to test and quickly drive out new valuable opportunities is key. Is the data good enough? if not what can we do? Get more data and/or pair with more heuristic methods etc. Ai is one tool in the box and can be paired with others approaches but the bus focus is the key

I'd say that about 99% of the world's misinformation has probably been created in the past few years.

Dan French

CEO at Consider Solutions


Great point, which is why RAG on internal trusted data is THE key component of GenAI. Relying on the genralised LLM training data only is dangerous (to say the least!). Dirty external data is always subjective, but if we focus on ensuring veracity and relevance of trusted data, we have a chance business deicisons being based on some fact/evidence . . . 😀

Eddie Short!!! I agree!!! "History repeats itself." Who among us hasn't said this... or thought this. Your external environment provides signs that will help you make choices to decide your history. Data & Analytics and AI Leaders need to be aware of how the exterrnal environment... that is also producing external data... that you can almost count on impacting the Intelligent Enterprises. "You're only as strong as your weakest link." Who among us hasn't said this or thought this? Don't just follow the external environment. Understand the environment... challenge it... socialize it with others in your circle of connections and colleagues and even clients. Be ever vigilant and sensitive to external environments... AND the data it is producing.
