Emergent behaviour: applying the AI paradigm shift to the built environment
"The next revolution will be the coalescence of programmable machines into systems beyond programmable control." - George Dyson, Analogia
Glossary
First, a quick, basic, and probably slightly incorrect glossary of terms:
Primer: AI, LLMs, and software 2.0
Like Moore's Law before it, the growth in AI capability is non-linear, driven both by changes in methodology (such as the use of neural network transformer models) and by far greater availability of the raw materials of AI: training data and processing power. Recent developments in artificial intelligence have been nothing short of profound, and to the casual observer they have come out of nowhere. Even in the time between originally writing this paragraph and coming back a few months later to edit it, the frontier of what AI is capable of has shifted markedly.
AI models are self-reinforcing (up to a point at least; there will be limitations in terms of both processing power and training data). It is as if on Monday your child learned to draw stick figures, on Tuesday they learned how to shade 3D bodies in various positions, and by Wednesday they were painting in the style of a renaissance master. Even if one understands this intellectually, the way that rapid improvement in AI manifests itself can still take you by surprise. We humans are not programmed to account for exponential progress; we have grown accustomed to our own fitful, erratic means of learning, not to complex systems where all feedback is immediately channelled into improvement. This makes it difficult for us to anticipate the changes that AI will bring and how we can use them in our work on the built environment. We know that increasingly powerful generations of AI will appear over the coming few years, but it is hard to imagine how those capabilities will manifest themselves when we have not even fully appreciated what we can accomplish with today’s tools such as GPT-4. Our blindness makes it challenging to identify how AI can add value to the tangible problems that construction and built environment projects face, and which intractable problems we face now will be trivially easy to solve tomorrow.
Whilst the capabilities of artificial intelligence evolve constantly, the underlying techniques are, broadly and over the short term, constant. In the same way that today’s computers use the same logic gates and mathematical principles as 1950s machines (and indeed mathematical theories that predate the invention of the computer), just far faster, so many of the advances in AI that we have seen over the last few years have come from the scale of processing that is now possible, rather than a change in the underlying techniques (neural networks were, for example, first articulated in the 1940s). It is a huge over-simplification, but for non-specialists the history of AI comes down to two massive changes:
The major change in AI logic occurred in the late 1980s as researchers shifted from rules-based approaches to statistical/Bayesian neural networks. However, it took until the availability of huge quantities of both computing power and training data (via the cloud and the internet respectively) for neural networks to realise the level of progress that we see today. At the forefront of neural network development (and hype) today are Large Language Models or LLMs which, as the name suggests, generate text in response to text inputs. The capability of these models is, broadly, a function of:
Over the past two years generative AI and LLMs have moved from being a niche experimental toy that one had to apply for access to, to something that is readily available for a small monthly charge. Increasingly LLMs are available as plug-ins to other applications, and they will soon be integrated into much of the cloud architecture that built environment organisations use for their enterprise IT.
It was a simple but ingenious method, the transformer, first realised by Google in 2017, that enabled large language models (LLMs) capable of processing large quantities of input text in parallel rather than sequentially. This meant that computation could be spread across many processors, allowing AI firms to train models on enormous corpora faster than was previously possible. These models included improved attention algorithms, which meant that they could understand the relative contextual importance of input text, as well as clever means of refining outputs to make them more amenable to human consumption. With models such as OpenAI’s GPT models, Google’s Bard, and Meta’s LLaMA now available to consumers and organisations alike, it is now possible to generate enormous quantities of quality text using simple natural language prompts. Whilst the user interface of these models (the ‘assistant’ component) makes it feel like they are responding to your questions, they are in actual fact merely predicting the next word (token) in a series of tokens; in this case a conversation composed of user ‘prompts’ and model responses.
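To make the "predicting the next token" idea concrete, here is a deliberately tiny sketch, an assumption-laden toy rather than anything resembling a real LLM: a table of bigram counts built from a few invented sentences stands in for the billions of learned weights in a transformer, but the generation loop is the same in spirit, repeatedly choosing a plausible next token given the tokens so far.

```python
# Toy illustration of next-token prediction (NOT a real LLM).
# A bigram frequency table stands in for the billions of learned weights
# in a transformer; the loop below simply extends the text one token at a time.
from collections import Counter, defaultdict
import random

corpus = (
    "the bridge deck was inspected and the bridge deck was repaired "
    "the tunnel lining was inspected and the tunnel lining was sealed"
).split()

# "Training": count which token tends to follow which.
following = defaultdict(Counter)
for current_token, next_token in zip(corpus, corpus[1:]):
    following[current_token][next_token] += 1

def predict_next(token: str) -> str:
    """Sample a plausible continuation, weighted by observed frequency."""
    candidates = following[token]
    return random.choices(list(candidates), weights=candidates.values())[0]

# "Inference": extend a prompt one token at a time, much as a chat model
# extends the conversation so far.
tokens = ["the"]
for _ in range(8):
    if not following[tokens[-1]]:
        break  # no observed continuation; a real LLM's vocabulary never runs dry
    tokens.append(predict_next(tokens[-1]))
print(" ".join(tokens))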
Having generated your text, you can also illustrate it with tools such as DALL-E, Midjourney, or Stable Diffusion, which combine LLMs (to parse human prompts) with a different kind of generative neural network, such as diffusion models and Generative Adversarial Networks (GANs). Diffusion models essentially start with random static and refine it, using their neural network to iteratively sculpt images that align with the model’s understanding of what the concepts in the prompt look like. Combining LLMs, diffusion models, and GANs, it is now possible to ‘write’ and ‘illustrate’ entire novels with just a few prompts. Soon the same will be true of video, as diffusion models move on from static images to moving ones.
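To illustrate the "start from static and iteratively refine" idea, here is a heavily simplified sketch of a diffusion-style sampling loop. The "denoiser" below is a trivial pull-towards-a-target stand-in for the trained, prompt-conditioned neural network that a real model such as Stable Diffusion uses; it demonstrates only the shape of the loop, not the substance of the technique.

```python
# Conceptual sketch of a diffusion-style sampling loop (illustration only).
import numpy as np

rng = np.random.default_rng(0)

# A 32x32 "target pattern" stands in for what the model "knows" the prompt
# should look like; in a real model that knowledge lives in trained weights
# conditioned on a text embedding of the prompt.
target = np.outer(np.sin(np.linspace(0, np.pi, 32)),
                  np.sin(np.linspace(0, np.pi, 32)))

image = rng.normal(size=(32, 32))   # start from pure random static
for t in range(50, 0, -1):
    estimated_noise = image - target           # stand-in for the network's noise estimate
    image = image - estimated_noise / t        # strip away a little of the estimated noise
    image += rng.normal(scale=0.01, size=image.shape)  # keep a touch of randomness each step

print(f"mean distance from target after refinement: {np.abs(image - target).mean():.3f}")
```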
But that’s just the tip of the AI-ceberg: there is something more foundational happening here. For LLMs to generate convincing text they require training on massive quantities of human-generated text. This training essentially involves the model challenging itself to guess the next word in billions of text samples, iteratively updating the weightings of its neurons to improve its accuracy. Training data includes high quality datasets such as Wikipedia, which is roughly 80GB in size, as well as broader web-scrapes of internet pages, which can be upwards of a terabyte in size. Whilst these volumes may seem relatively small, it is worth noting that this is only text data, not images or video, which tend to have comparatively larger file sizes. For reference, the complete works of Shakespeare is about 5MB, or about one 20,000th of the size of Wikipedia, or one 600,000th of the size of an internet crawl dataset used by an LLM.

As well as using these libraries of written text, firms also train the same models on existing human-generated code. OpenAI, Microsoft, and Amazon have trained their transformer models on open-source software repositories such as GitHub (as well as code help message boards such as Stack Overflow) to offer services like GitHub Copilot X and Amazon CodeWhisperer. One can prompt these tools to generate working code, as well as unit tests and code explanations. They can also suggest changes and highlight potential errors to human programmers, and proactively identify security issues.

This is a profound change. Since its inception in the 1940s with the work of Turing and von Neumann, computer programming has changed our world. For that entire time the creation of code has been a one-way street: humans write the instructions, computers execute them. At least this is true at a superficial level; in fact, computer programmes such as viruses and compilers have been writing code for decades, but always according to rules originally set out by humans. However, Turing’s initial insight was that code and information (data) are essentially interchangeable, and now we stand on the cusp of the ultimate illustration of that argument: artificial intelligence (code, in other words) that can not only generate code, but can do so without explicit instruction from humans, using multi-billion parameter neural networks that are not explicable to human beings. This will result in code that works, but which humans may not necessarily be able to explain. This is the transition that Andrej Karpathy, a founding member of OpenAI, described as the move from “Software 1.0… explicit instructions to the computer written by a programmer” to “Software 2.0… written in much more abstract, human unfriendly language, such as the weights of a neural network.”
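For the curious, here is a minimal sketch of the training signal described above: for one position in one text sample, the model produces a score for every token in its vocabulary, and the loss is simply how little probability it gave to the token that actually came next. The vocabulary, scores, and text below are invented for illustration.

```python
# Minimal sketch of the self-supervised training signal for an LLM:
# predict a probability for every token in the vocabulary, then score the
# prediction by how much probability was given to the token that actually
# came next. Training adjusts the weights to reduce this loss.
import numpy as np

vocab = ["the", "bridge", "deck", "was", "inspected"]
true_next = "inspected"                        # the word that actually follows in the corpus

logits = np.array([0.2, 0.1, 0.3, 0.1, 2.0])   # stand-in for the network's raw scores
probs = np.exp(logits) / np.exp(logits).sum()  # softmax: scores -> probabilities
loss = -np.log(probs[vocab.index(true_next)])  # cross-entropy for this one example

print(dict(zip(vocab, probs.round(3))), "loss:", round(float(loss), 3))
```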
To AI or to automate?
The shift from human to AI code might help to remedy some of the built environment’s specific challenges:
The possibilities are exciting, but the built environment is rarely a priority for tech giants. So how should built environment organisations begin to realise these benefits? And how do we play a constructive role as subject matter experts? The forefront of artificial intelligence is largely limited to a few giant corporations, elite academic institutions, and well-funded Bay Area start-ups. We are all 'non-specialists' in this context, even if you have the word 'data' in your job title. Working in resource-constrained built environment organisations, our approach to AI should be to benefit from the products produced by the tech giants, rather than to expect to fully understand or rival them. Thankfully, it is increasingly possible to use open-source libraries published by big tech firms to run powerful models without needing to look under the hood. Consequently, it is more important that we understand the emerging trends in AI and data science than that we feel the need to tinker with their source code.
In his brilliant presentation “How to recognize AI snake oil,” Arvind Narayanan distinguishes between three types of problem that practitioners have sought to solve with AI, and the differing levels of success that AI has had in cracking them:
Problem type #1: Perception - “Genuine rapid progress”
Progress: “AI is already at or beyond human accuracy” because “there is no uncertainty or ambiguity”. For example, “given two images of faces, there’s ground truth about whether or not they represent the same person.”
Built environment examples: Processing unstructured data, such as archive documents and drawings, into machine-readable data. Adding edge computing logic to asset sensors, for example detecting patterns in the movement of asset components (girders, joints, etc.) that indicate an asset is performing outside of tolerance (a minimal sketch of this kind of detection follows the three problem types below). Using pattern detection to scan sites for safety issues, errors/snags, improper site set-up, or usage patterns such as passenger flow.
Problem type #2: Automating judgement - “Imperfect but improving”
Progress: “Far from perfect, but improving”, tasks where “humans have some heuristic in our minds” and “AI will never be perfect at these tasks because they involve judgement and reasonable people can disagree about the correct decision.”
Built environment example: Identifying non-compliance in project data, which may indicate poor cost / schedule / risk management. Predicting failure or reduced performance of an asset based upon suitably accurate monitoring data.
Problem type #3: Predicting social outcomes - “Fundamentally dubious”
Progress: “Fundamentally dubious” where AI is applied to problems that “are hard because we can’t predict the future.”
Built environment example: Identifying whether a project is likely to be delivered on time or to budget, particularly at an early stage. Anything involving true complexity, where there is the introduction of noise through human actors or political uncertainty.
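As promised under problem type #1, here is a minimal sketch of the kind of perception task on asset monitoring data where automation already performs well: flagging when a monitored component's movement drifts outside its recent envelope. The sensor values, window size, and threshold are all illustrative assumptions, and a real deployment might well use a trained model rather than a simple rolling statistic.

```python
# Minimal sketch of "perception" on asset monitoring data: flag readings
# where a component's movement drifts outside its usual envelope.
import numpy as np

rng = np.random.default_rng(1)

# Simulated girder displacement readings (mm); the last 50 drift out of tolerance.
displacement_mm = rng.normal(loc=2.0, scale=0.2, size=200)
displacement_mm[150:] += 1.5

window, threshold = 50, 3.0   # assumed: compare each reading to the previous 50
for i in range(window, len(displacement_mm)):
    history = displacement_mm[i - window:i]
    z = (displacement_mm[i] - history.mean()) / history.std()
    if abs(z) > threshold:
        print(f"Reading {i}: {displacement_mm[i]:.2f} mm is {z:.1f} sigma "
              "outside recent behaviour - flag for inspection")
        break
```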
Arvind’s distinctions show us that we are most likely to see a return on investment where we automate perception rather than prediction. Stephen Wolfram makes a similar distinction in his GPT explainer, and it is worth quoting at length:
“There’s an ultimate trade-off between capability and trainability: the more you want a system to make “true use” of its computational capabilities, the more it’s going to show computational irreducibility, and the less it’s going to be trainable. And the more it’s fundamentally trainable, the less it’s going to be able to do sophisticated computation.”
What Wolfram is saying here goes to the heart of what LLMs are and are not. Fundamentally, LLMs work by identifying patterns, not by executing mathematical logic. An LLM might give you the correct answer to a problem, not because it did the correct computation, but because it identified a pattern in its corpus that includes the solutions to the same or similar problems. In this sense LLMs are statistical rather than computational products. Wolfram’s distinction reveals a bias in how we humans conceive of the relative difficulty of problems, as he writes, “what we should conclude is that tasks—like writing essays—that we humans could do, but we didn’t think computers could do, are actually in some sense computationally easier than we thought.”
We need to take these distinctions on-board if we want to identify what kind of problems neural networks can help the built environment to address. It is worth noting at this stage that neural networks are only one kind of AI, and there are many problems that the built environment faces that can be addressed with other techniques including:
For use cases where there is a clear process that we just need to execute at speed, automation rather than AI is likely to be the answer. This might include many of the bulk tasks that are common in built environment organisations, such as processing invoices, progress reporting, reconciling schedules, and the like. Conversely, for use cases that stray into ‘predicting social outcomes’ there is a lack of evidence that AI can address these types of problems. In an uncertain world, there are obvious risks associated with predicting the future based upon patterns recognised in a historical data set. Anyone selling these types of solutions risks fundamentally misleading the organisations that they work for, or at least consuming resources that would realise greater value if applied to more prosaic use cases.
Categorising our problems
Arvind’s categories grant us a useful rule-of-thumb in prioritising our application of AI. In the data hype-cycle, new and ground-breaking ideas often garner more attention than established and foundational techniques. However, working in the built environment you are unlikely to be employed by an organisation that has sufficient maturity or capability to make use of cutting-edge techniques, and it is unlikely to have fully exploited more rudimentary or established data science practice. In recent roles your authors have found that it is easier to build credibility by meeting more of an organisation’s immediate needs than by going for moon-shot technologies. This might mean making data available for self-service analytics, providing means to easily catalogue and navigate large repositories of documents and images, or automating existing spreadsheet-based workflows. Whilst hardly the stuff of data science dreams, these kinds of use cases will show early value, and help familiarise business users with data science techniques that might otherwise seem esoteric and ineffable.
One of the big advantages of LLMs in this context is that they are easier to use in a way that complements existing ways of working than other AI technologies. Rather than moving users to a whole new set of tools for managing data, they can continue to use their spreadsheets and dashboards, whilst receiving guidance from an LLM (in the same way that many of us sense-check our assumptions using a search engine).
It is worth exploring the following series of questions to help triage our problem statements and identify the right kind of technique each demands.
Many of our challenges in the built environment don’t really need an intelligent agent to solve them at all. These might be areas where we benefit from automation rather than intelligence. Data quality is one example. Built environment organisations manage sizable business-critical databases and datasets: asset registers, document repositories, BIM CDEs, ERP systems, and so on. Often these datasets are sufficiently large, and of sufficiently poor quality, that it is prohibitive to manually identify and fix errors. But it is possible to define data quality rules that we can in turn code as algorithms to automatically search through our datasets and identify errors, as the sketch below illustrates. This is a 100% rule-based application that can realise massive benefits for an organisation without having to go anywhere near AI.
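A minimal sketch of what such rule-based checking might look like, using pandas over a toy asset register; the field names and rules are assumptions for illustration rather than any real schema.

```python
# Minimal sketch of rule-based data quality checking over an asset register.
# Field names and rules are illustrative assumptions, not a real schema.
import pandas as pd

assets = pd.DataFrame({
    "asset_id":     ["BR-001", "BR-002", None, "TN-004"],
    "install_date": ["2012-05-01", "2035-01-01", "2008-11-20", "not recorded"],
    "condition":    ["B", "F", "A", "C"],   # valid grades assumed to be A-E
})

dates = pd.to_datetime(assets["install_date"], errors="coerce")
errors = pd.DataFrame({
    "missing_id":        assets["asset_id"].isna(),
    "unparseable_date":  dates.isna(),
    "date_in_future":    dates > pd.Timestamp.today(),
    "invalid_condition": ~assets["condition"].isin(list("ABCDE")),
})

# Report every record that breaks at least one rule, and which rules it breaks.
print(errors[errors.any(axis=1)])
```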
Where the construction industry has most often struggled is on the second question. We know the types of problems that we want to solve, and we know that they are too complex to hard-code, but we don’t always have sufficient high-quality training data to allow us to leverage AI to solve them. This is something of a chicken-and-egg situation: we struggle to justify the cost of the instrumentation and data collection required to fuel AI use cases, which in turn means that we don’t have any proof that the data collection is worthwhile. Similarly, we may struggle to justify the resources required to curate datasets to the point where they are sufficiently accurate to act as training fodder. Again, this is another area where pre-trained core LLMs may offer advantages, as they reduce the volume of high-quality data that organisations must collate themselves to start generating useful insight. Examples of where AI could be applied include:
Each of these use cases will require access to a substantial amount of high-quality instrumentation and/or records and would require substantial calibration to align with the judgement of human experts.
Specialist built environment models
It is no coincidence that amongst the first use cases of neural networks to gain publicity was identifying cats and dogs. The amount of training data for cat and dog recognition algorithms is unlimited because people are enamoured with their pets and willingly upload images of them to the internet without prompting. They are also happy to curate such images (for example to remove any pictures of other animals that might sneak in, or misidentified objects). The same is not true of sewage pipes, masonry walls, or plaster soffits. Can we as data leaders make the case for, firstly, substantial improvements in the granularity and accuracy of data collection, particularly the instrumentation of assets, and, secondly, cost-effective means of calibrating the outputs of AI tools against the industry’s extensive existing engineering judgement? By meeting these two criteria we can ensure that our AI applications operate at sufficient scale, and with sufficient accuracy, to effectively augment our limited human resources.
When contrasting widely available information (cats, dogs) with less freely available information (engineering data, asset-specific data) we are stumbling into the distinction between general knowledge and specialist expertise. This is where the training dataset for LLMs becomes pertinent. In his talk State of GPT, Karpathy reminds us that in its hunt for the best token to continue any string of tokens, an LLM like GPT-4 isn’t looking for the correct answer but rather the likely answer. This is because the training dataset (e.g., all of the internet) that the model is trained on will include both correct and incorrect answers to problems. Karpathy describes ways of correcting for the models’ indifference to truth, which are not unlike how we use technical experts within our own professions. These techniques include “prompt engineering” the model to use expertise (e.g., “imagine you are a leading expert in the field”), asking it to show its working, and pointing it towards additional contextual information. This last point is going to be crucial in how built environment firms derive value from LLMs.
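As a sketch of what these prompt-engineering techniques might look like in practice: an expert persona, an instruction to show reasoning, and relevant context supplied alongside the question. The `call_llm` function is a placeholder for whichever hosted or local LLM API an organisation actually uses, and the clause text is invented.

```python
# Sketch of the prompt-engineering techniques described above: an expert
# persona, an instruction to show reasoning, and extra context supplied
# alongside the question. `call_llm` is a placeholder, not a real API.
def call_llm(messages: list[dict]) -> str:
    """Placeholder: swap in your chosen LLM provider's chat API here."""
    raise NotImplementedError

# Invented extract from an engineering standard, supplied as context.
standard_extract = ("Clause 6.2: movement joints shall be inspected at "
                    "intervals not exceeding 24 months.")

messages = [
    {"role": "system",
     "content": ("You are a chartered structural engineer specialising in bridge "
                 "asset management. Explain your reasoning step by step and cite "
                 "the supplied context where relevant.")},
    {"role": "user",
     "content": (f"Context:\n{standard_extract}\n\n"
                 "Question: our movement joints were last inspected 30 months ago; "
                 "what should we do?")},
]

# answer = call_llm(messages)  # uncomment once call_llm is wired to a real model
```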
Whilst training these models on the internet will provide a fair amount of superficial information on engineering concepts, there is a lot of specialist engineering knowledge that is not freely available online. As one pseudonymous writer noted, “The entire available quantity of data in highly specialized domains… is woefully tiny, compared to the gains that would be possible if much more such data were available”. By providing specialist engineering context to existing LLMs, in the form of access to engineering standards, textbooks, and trade publications, we can build SME LLMs (what Karpathy calls ‘Retrieval-Augmented LLMs’). Like real SMEs, these SME LLMs will combine the advantages of a broad understanding of language and general facts with specific training and targeted prompts concerning the behaviour of physical assets in the real world. Such engineering-focused branches of existing LLMs are likely to prove the most fruitful in providing reliable and accurate decision support to built environment professionals.
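Below is a minimal sketch of the retrieval step behind such a retrieval-augmented approach: chunks of engineering documentation are embedded as vectors, the chunk most similar to the question is found, and only that chunk is passed to the LLM as context. The `embed` function here is a toy stand-in for a real embedding model, and the document snippets are invented.

```python
# Minimal sketch of the retrieval step in a retrieval-augmented ("SME") LLM:
# embed document chunks, find the one most similar to the question, and pass
# only that chunk to the LLM as context.
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    """Toy bag-of-words hashing embedder; a stand-in for a real embedding model."""
    vec = np.zeros(dim)
    for word in text.lower().split():
        vec[hash(word) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

# Invented snippets of engineering documentation (the "retrieval corpus").
chunks = [
    "Movement joints shall be inspected at intervals not exceeding 24 months.",
    "Scour assessments are required after any flood exceeding a 1-in-20-year return period.",
    "Paint systems on weathering steel require no routine reapplication.",
]
question = "How often do movement joints need to be inspected?"

# Retrieve the chunk most similar to the question (cosine similarity of unit vectors).
similarities = [float(embed(chunk) @ embed(question)) for chunk in chunks]
best_chunk = chunks[int(np.argmax(similarities))]

# Only the retrieved chunk is passed to the LLM alongside the question.
prompt = f"Context:\n{best_chunk}\n\nQuestion: {question}"
print(prompt)
```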
In his description of how GPT models work, Wolfram also frames the advantages and disadvantages of neural networks in general, and large language models specifically. In Wolfram’s words, “The tasks we’re typically trying to get neural nets to do are “human-like” ones—and neural nets can capture quite general “human-like processes”. One implication of this insight is that, by virtue of an LLM’s human-friendly natural language interface, they may be well placed to interact with human beings in a way that accelerates Nonaka’s knowledge creation cycle. The diagrams below (before and after) describe how we might use LLMs not only to better process our explicit knowledge, but also to ease the friction between human-embodied tacit knowledge and computer-embodied explicit knowledge through natural language internalisation and externalisation.
AI will succeed here not by replacing human engineers, but by allowing those engineers to process far greater quantities of information and sense-check their decisions. Many of us have spent our careers trying to collate and improve vast troves of quantitative data on the presumption that this is the first step towards generating insight. The human-like capabilities of LLMs demonstrate that there are also considerable gains to be made in harnessing qualitative data, and that we favour quantitative over qualitative at our peril.
As built environment organisations have belatedly realised the value of data to their operations, they have invested copious time and resources in centralising their data management. The putative advantages are clear:
It is possible, but far from assured, that LLMs provide us with an alternative to endless fights over who owns the platform. If we can use LLMs to consolidate and share insight, whilst leaving the data wherever it happens to reside, it might be possible to create knowledge across the organisational siloes that characterise the industry. LLMs also bring the bonus of obfuscating the source or sources of any particular insight, removing some of the concerns organisations may have about exposing their own data or their own failings. Perhaps the next step towards the long-awaited data commons for the built environment isn’t a massive database, or even decentralised data sharing, but a neural network.