Data vs. BIM

Problem statement

How do we make Data and BIM play nice? How do we deliver data-led BIM, or if you prefer, BIM-led data?

This is not an academic question; it is a real challenge for organisations because the two disciplines have drifted apart. But BIM and data were meant to be together:

  • Not all data is BIM, not all BIM is data.
  • But you can't BIM without data.
  • You probably shouldn’t build things without BIM, or data.

Despite their co-dependency, the organisational structure of infrastructure owners and construction projects usually artificially splits these two disciplines. The variations are many, but broadly speaking:

  • Data usually gets lumped in with IT and BI functions, though in practice it’s everywhere.
  • BIM gets split between the engineering part of client organisations and supply chain, with varying degrees of integration.
  • Various other parties, who may wish to make use of BIM and data, risk being left in the dark.

Perhaps in the future we will have Digital Construction departments, multi-disciplinary teams formed of a mix of specialists: engineers, BIM specialists, data ninjas, experts in quality, assurance, HSEQ, project controls. For the time being however, such unions are built through consensus rather than organisational hierarchy. It's time that we bridged the divide, because wherever BIM goes next, it's probably gonna need a shedload of data.

It’s like we're at that point of the romcom where the two protagonists are slowly drifting towards a dramatic 3rd act reunion. How do they get there? By learning to change, to compromise, and remembering what they loved about each other in the first place!

The value of BIM

As a result of the artificial divide between data and BIM, the two professions may only be passingly aware of each other. This can lead to conflict. No IT department likes being presented with a fait accompli, and no BIM team likes being forced to justify their need for tooling over and over again. And as soon as you start to dig into the concept of a Common Data Environment, the dependencies and costs can mount pretty quickly. This means that, early doors, construction projects and infrastructure owners will need to have compelling answers to questions such as:

  • Where are we likely to derive the greatest value from BIM and related digital construction technologies?
  • Where do we start? What is the ‘minimum viable product’?

There is always room here for honesty and truth-telling. It is only sensible to approach the hype and marketing material associated with Digital Construction with appropriate scepticism. There is enormous value to be derived from investment in this area, but only if we make those investments in a circumspect and value-focused manner. Too often we chase shiny tooling, and forget that shiny tooling only works on a solid foundation of people, process, data, and IT infrastructure. 

Many organisations will already have a BIM strategy or similar. The data function should seek to complement rather than replace or contradict the BIM Strategy. The BIM Strategy will likely bring a wealth of technical detail and considered ways of working. We will probably sacrifice a lot of the detail for the sake of establishing a clear, honest, and pragmatic narrative that can readily be translated into slideware to inform Board-level conversations. We will also focus on the models, architecture, and infrastructure that Data & Digital need to put in place, rather than how these tools will be used by our colleagues elsewhere in the business. Ultimately, we want the benefits of Digital Construction to be defined and visible across our organisation, and to avoid self-licking ice cream cones.

As described later in this paper, the relevant BIM standard (ISO 19650) sets out value propositions and describes the roles, artefacts, and processes required to realise those value propositions. It also advocates for the creation of a Common Data Environment and Federated Information Model, which alongside information management and data standards will be the primary means whereby the data function can contribute to a project’s use of BIM. What 19650 doesn't do is provide much tangible guidance in terms of how a CDE or BIM environment should be built or used (the non-functional requirements if you will). In many ways 19650 is a generic information management approach applied to physical assets; all of the specifics are left to the client organisation, supply chain, and the 'Construction Tech' industry. As such, we should look to 19650 for high-level guidance, but we still need to reckon with how to apply BIM in a manner that meets the needs of the organisations for which we work.

Data and the built environment

In order to speak with meaning about the use of data concerning the built environment, we first must reckon with what is and is not unique about our domain.

Data is data: the ones and zeroes that comprise built environment data are no different to those used in any other domain. Nor are the volumes particularly profound. IoT-enabled infrastructure can quietly rack up billions of rows of data. BIM, point clouds, and high-definition imagery all get pretty chunky. But rarely do we come close to the types of volumes seen in social media, streaming, e-commerce etc.

The problem that bedevils our domain is ensuring that the moderate amounts of data that we do hold actually reflect the real world, as well as the need to manage alphanumeric, document, and model data (see later diagram) in some sort of reasonably coordinated manner. Our data problems are often as much a question of definition as accuracy. I became aware of the importance of pedantry early in my career as I tried to help a large infrastructure owner figure out how many tunnels it owned/maintained. The answer was always somewhere between roughly five hundred and seven hundred, a trivial amount of information. The specific answer was another matter entirely, and depended on meaningfully answering questions such as:

  • At what bore diameter does a culvert become a tunnel?
  • If I build a really wide bridge, cover it with soil, and run trains under it, have I actually built a tunnel?
  • Does a tunnel that has been sealed up and is no longer used still count as a tunnel?
  • How do we account for tunnels that we share with other infrastructure owners?

At times it was possible to enjoy the absurdity of our task: was it even possible to meaningfully answer these questions? But then parts of tunnels and culverts started failing, in ways that could have hurt lots of people. And it became clear that the categorisation of these assets would determine how they were maintained, who was responsible for keeping them safe, and what level of investment they would attract. The definitions were somewhat arbitrary; the act of definition was crucial.

However consequential the data that we hold may be, our means of modelling and storing it are built on tooling and standards that would be familiar in other sectors. Even the most domain-specific schemas, such as Uniclass or COBie, sensibly leave the heavy lifting of data management to established formats such as XML, CSV, RDF, and XLS, which any data professional joining from another domain would recognise. Similarly, many asset-centric products such as Asset Management Systems are user interfaces built on top of standard components like Oracle databases. Much of the use of data is sector-specific in topic rather than tooling. We still have the same CSV files, SQL databases, and Python scripts running on the same public clouds as every other sector; it’s only the definitions and attributes recorded by the data that are specific to the assets we manage.

The obvious exception is BIM technology, where a built environment-specific proprietary ecosystem has evolved. And to understand BIM we first have to talk about ISO 19650.

There’s no shade in the shadow of 19650

19650 provides a useful intellectual framework for understanding the requirements and division of responsibilities associated with managing information before, during, and after a construction project. It ties nicely back to ISO 9001, reading almost as a built-environment-specific elaboration of 9001's industry-agnostic principles. The standard centres around the artefacts (OIR, PIR, AIR, AIM, PIM, EIR), with the presumption being that if a project completes and maintains these artefacts then it will have sufficient information to deliver the works in a collaborative, safe, and efficient manner. The artefacts will also provide sufficient information to answer the questions that the organisation may have of the project.

Data appears as a concept in 19650, but usually in a subsidiary role to information (e.g., the project information will be recorded using various structured and unstructured data types). It acknowledges the role of a Common Data Environment to store the data but does not go into detail around how the CDE should be built or operate. It does, however, advocate for a 'container'-based approach to data, whereby information is segregated according to its originating organisation, type, sensitivity, and/or subject (e.g., asset hierarchy). This approach allows the CDE to manage many atomic parts of information through their lifecycle (changes of status), with clear obligations in terms of meta-data and ownership, and has the added advantage of splitting the volume of information associated with models into manageable chunks.
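To make the container idea a little more concrete, here is a minimal sketch of a container that carries its own meta-data and moves through status changes. The four status names reflect the states commonly associated with a 19650-style CDE; the permitted transitions, field names, and example values are simplifying assumptions of mine rather than anything the standard prescribes.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    WORK_IN_PROGRESS = "work in progress"
    SHARED = "shared"
    PUBLISHED = "published"
    ARCHIVED = "archived"


# Permitted status changes: a deliberate simplification for illustration.
ALLOWED = {
    Status.WORK_IN_PROGRESS: {Status.SHARED},
    Status.SHARED: {Status.WORK_IN_PROGRESS, Status.PUBLISHED},
    Status.PUBLISHED: {Status.ARCHIVED},
    Status.ARCHIVED: set(),
}


@dataclass
class Container:
    """One atomic piece of information plus the meta-data the CDE needs."""
    container_id: str
    originator: str      # originating organisation
    info_type: str       # e.g. model, document, schedule
    sensitivity: str
    status: Status = Status.WORK_IN_PROGRESS
    history: list = field(default_factory=list)

    def move_to(self, new_status: Status) -> None:
        """Change status, refusing transitions that are not permitted."""
        if new_status not in ALLOWED[self.status]:
            raise ValueError(
                f"{self.status.value} -> {new_status.value} is not allowed"
            )
        self.history.append(self.status)
        self.status = new_status


# Hypothetical container, for illustration only.
model = Container("C-001", "Designer Ltd", "3D model", "internal")
model.move_to(Status.SHARED)
model.move_to(Status.PUBLISHED)
print(model.status.value, [s.value for s in model.history])
```

Because each container records its own originator, type, sensitivity, and status history, the CDE can apply rules to thousands of small pieces of information without needing to understand their contents.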

19650 is a technology-agnostic standard; it makes no attempt to specify the infrastructure that should be used to realise the CDE or artefacts. Indeed, as a non-specialist reading the standard it is often quite hard to visualise what form many of the artefacts should take: are these spreadsheets, documents, or databases? What this means in practice is that the IT implementation of an ISO 19650-compliant organisation is left open to interpretation, and some careful translation will be needed between BIM requirements and IT system requirements.

To my mind, coming from the proudly ‘agile’ data fraternity, 19650 struggles in its tendency towards bureaucratic and contractual control. At times it labours under the presumption that it is indeed possible to articulate most-if-not-all of an organisation's information requirements at any given point in time, and then to persist that unity of requirements across organisational silos and boundaries.

Anyone who has worked with big data on megaprojects knows that this is an almost impossible goal, either in terms of defining the data needs or maintaining consensus. It is important that we bring an agile perspective to the admirable goals of 19650. We should aim to create the relevant artefacts, but in a 'wiki'-inspired manner that allows us to quickly and collaboratively update our requirements without requiring a substantial bureaucratic function to administer those changes. This may mean that we fall short of 19650 'compliance' in some areas, by virtue of making more pragmatic decisions in terms of how we allocate our limited resources.

Leaping gleefully into BIM 'dimensions'

I find it useful to distinguish between Building Information Modelling (BIM) and 3D visualisation. The former is a rigorous means of capturing information during the design and construction of built assets; the latter is a type of projection. The peers of BIM are other disciplines such as Asset Management, Cost Management, Project Controls, etc.; the peers of 3D visualisation are other means of displaying information such as 2D projections, charts, sketches, GIS and the like. It just so happens that 3D visualisation is uniquely well suited to presenting the data collected by BIM processes.

It is taken as given that any project adopting BIM will at a minimum capture a 3D representation of the assets in question, along with some basic level of attribution of those assets (preferably using a common standard such as Uniclass or COBie). However, as both BIM and 3D visualisation have become more common in the built environment, opportunities have emerged to present additional layers of information using the same 3D visualisation interface. So, one may have a BIM model that captures and presents both the spatial geometry of the assets in question, as well as data points related to scheduling logic, cost, carbon, safety, and other dimensions. As these types of functionalities have emerged it has become more common to refer to multi-dimensional BIM, e.g., “4D, 5D, 6D, 7D, 8D BIM”.

The pedants amongst us will point out that this is inherently a mixed metaphor: the word 'dimension' means very different things when it refers to the three dimensions of space, versus additional layers of contextual information. Additional terminology, such as Digital Rehearsals and Digital Twins, only adds to the confusion. At this point are we using the word 'dimension' to describe every possible source of information, and if so who's patented INFINITE-D BIM? However, the terminology has stuck despite this obvious flaw.

As the illustrious and wise Henry Fenby-Taylor (currently Head of Information Management at the Centre for Digital Built Britain) put it: 

The ‘dimensions’ just define domains of functionality… between 2011 and 2015 everything was BIM, it began to suck up every piece of digital innovation in the sector… it all got rebranded, so the academics are left chasing definitions developed by consultants who aren’t computer scientists.

What all this lexical noise obscures is the fundamentally viable idea of combining information from the Project Controls (cost, time, risk), Quality, and Engineering/Design professions with 3D visualisation to better explain the progress and challenges of complex projects. The question that each project/client should seek to answer is not “how many dimensions of BIM do I need?” but rather “how do I use BIM and 3D visualisation to help me better predict, control, and measure the performance of my works?”

In general, I do not expect that our project will derive value from BIM and Digital Construction simply by accumulating ever more dimensions and data layers on our model. On the contrary, my intuition at this time is that much of the value will be derived from 'Digital Rehearsal' techniques built upon trusted sets of core data, principally geometry, time/schedule, and cost. As my colleague Andy Bishop writes:

"Digital Rehearsal allows us to not only prioritise and re-prioritise the logic within our schedules but also allows us to test and then complete a supporting VfM statement."

Building with data concepts

The beauty of a data-first approach to Digital Construction is that it allows us to step away from the specifics of software, and instead to focus on the logic of the questions being asked by its users. Whichever ‘dimensions’ we choose to represent in a 3D environment, we will need both:

  1. a fit-for-purpose data set to populate each dimension, and 
  2. fit-for-purpose logic (model) for how each dimension relates to the other dimensions. 

If we wish to add the concept of ‘time’ or ‘schedule’ to our BIM model to arrive at a ‘4D BIM model’ or a ‘Digital Rehearsal’, we need:

  1. a source of information on time (e.g. our project plan in Primavera), and 
  2. logic for how our time dimension affects our spatial model (e.g. tasks in the project plan are associated with zero, one, or more assets in the 3D model, and can add, remove, move, or change those assets over time in accordance with some defined physical rules). 
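As a rough sketch of that second ingredient, the snippet below links schedule tasks to zero, one, or more assets and works out which assets exist in the model on a given date. The task IDs, asset names, and dates are invented for illustration, and the only 'physical rule' assumed is that a task's additions and removals take effect on its finish date.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Task:
    task_id: str
    finish: date
    adds: tuple = ()      # asset IDs that exist once the task finishes
    removes: tuple = ()   # asset IDs that no longer exist once it finishes


def assets_at(tasks, when, initial=frozenset()):
    """Return the set of asset IDs present in the model on a given date."""
    present = set(initial)
    for task in sorted(tasks, key=lambda t: t.finish):
        if task.finish <= when:
            present -= set(task.removes)
            present |= set(task.adds)
    return present


# Hypothetical extract from a project plan.
tasks = [
    Task("T1", date(2024, 3, 1), adds=("temporary_works",)),
    Task("T2", date(2024, 6, 1), adds=("pier_1", "pier_2")),
    Task("T3", date(2024, 9, 1), adds=("deck",), removes=("temporary_works",)),
    Task("T4", date(2024, 9, 15)),  # linked to zero assets, e.g. an inspection
]

print(sorted(assets_at(tasks, date(2024, 7, 1))))
# → ['pier_1', 'pier_2', 'temporary_works']
print(sorted(assets_at(tasks, date(2024, 10, 1))))
# → ['deck', 'pier_1', 'pier_2']
```

Stepping the date forward and redrawing the surviving assets is, in essence, what a 4D viewer does; the hard part is agreeing the task-to-asset links in the first place.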

To think logically about data in the built environment- and to derive value from combining data with our spatial model- we must define the entities (or concepts) about which we can or should collect data. Our entities must be simple, logical, and mutually exclusive. We can then use our entities and the relationships between those entities to define the data that we wish to collect, and the analyses that we wish to conduct. We will then arrive back at the type of functionality proposed by multi-dimensional BIM, but with a clearer idea of: 

  • the value statement, 
  • the questions that we are trying to answer, and
  • the data architecture, engineering, and analysis required to get us there. 

Any sort of classification of concepts is going to be somewhat subjective and arbitrary. We can use existing top-level ontologies for inspiration, structure, and discipline. But in practice these rarely express concepts in a manner that is meaningful to business users. What we are trying to accomplish here is to arrive at a common language (conceptual model) that we can use when describing our data and which is sufficiently self-evident and meaningful to allow us to have conversations with the people who use that data daily (and their bosses). That said, where we are dealing with concepts that exist outside the context of our project or sector then we should seek to adhere to commonly used definitions.

When defining entities, simplicity and flexibility are key. So, we will seek to start with sufficiently universal concepts that we can use our common language to describe most data uses that our organisation will need. These concepts will be our top-level ‘parent entities’, and we will then break those parent entities down into whatever child entities we are likely to need when describing the specifics of individual use cases and linking our entities to specific IT systems.

We might start with parent entities such as:

  1. Time: a temporal position on a timeline.
  2. Resource: an object with intrinsic value.
  3. Concept: an abstract shared idea.

Like primary colours, we can then use these three parent entities to create the first generation of child entities. The more colours and shades we produce, the more verisimilitude we can create in our work. 

Some useful child entities could include:

  • Asset: a physical resource with value, child of Resource.
  • Person: an instance of Homo sapiens, child of Resource.
  • Duration: an interval between two points in time.
  • Activity: an action performed by people over a particular duration.
  • Cost: financial assets committed when people perform activities over time.
  • Role: a position held by one or more people that entitles them to perform an activity.
  • Construction: an activity, with a cost, performed by people with roles, which in turn creates new assets.
  • And so on, for as long as is necessary to describe our use cases without undue complexity.
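One lightweight way to record this hierarchy is as a small set of objects that each point at their parent. The sketch below uses the definitions given above; the class and method names are my own, and the parentage of Duration is an assumption, since the list above does not state it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entity:
    name: str
    definition: str
    parent: Optional["Entity"] = None

    def lineage(self):
        """Names from this entity up to its top-level parent."""
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names


# Parent entities.
TIME = Entity("Time", "a temporal position on a timeline")
RESOURCE = Entity("Resource", "an object with intrinsic value")
CONCEPT = Entity("Concept", "an abstract shared idea")

# Child entities.
ASSET = Entity("Asset", "a physical resource with value", parent=RESOURCE)
PERSON = Entity("Person", "an instance of Homo sapiens", parent=RESOURCE)
# Assumed parentage: the text does not say which parent Duration belongs to.
DURATION = Entity("Duration", "an interval between two points in time", parent=TIME)

print(ASSET.lineage())  # → ['Asset', 'Resource']
```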

As we define our child entities, it is important to also define the relationships between our entities. These combinations of two entities with a vector between them are known as triples and will be the building blocks of our data logic. 

Some examples might include:

  1. [Activity] [is completed by] [Person]
  2. [Cost] [is accrued by] [Organisation]
  3. [Cost] [is calculated using] [Time]
  4. [Construction] [is delivered over a] [Duration]

The use of triples allows us to define the logic of our data models which will in turn inform the data sets that we expect to use in our tooling. If we cannot define our logic or outcomes in terms of triples, then we have no assurance that we can implement that logic in a product or derive value from that logic. For example, if we wish to combine spatial data with schedule data (e.g., integrating Primavera with our BIM CDE), we should be able to define the logic of that integration using triples before we implement it in code. Triples have the added benefit of being both human-readable, and machine-readable (provided they are written carefully).
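To illustrate the 'machine-readable' half of that claim, the example triples above can be held as plain tuples and checked against the list of defined entities. This is only a sketch: the function names are hypothetical, and a real implementation would more likely use an RDF library or graph database.

```python
# Entities we have defined so far (taken from the examples above).
ENTITIES = {
    "Activity", "Person", "Cost", "Organisation",
    "Time", "Construction", "Duration",
}

# Each triple is (subject, predicate, object).
TRIPLES = [
    ("Activity", "is completed by", "Person"),
    ("Cost", "is accrued by", "Organisation"),
    ("Cost", "is calculated using", "Time"),
    ("Construction", "is delivered over a", "Duration"),
]


def undefined_entities(triples, entities):
    """Return subjects or objects used in triples but never defined."""
    used = {s for s, _, _ in triples} | {o for _, _, o in triples}
    return used - set(entities)


def related_to(entity, triples):
    """Return (predicate, object) pairs where the entity is the subject."""
    return [(p, o) for s, p, o in triples if s == entity]


print(undefined_entities(TRIPLES, ENTITIES))  # → set()
print(related_to("Cost", TRIPLES))
# → [('is accrued by', 'Organisation'), ('is calculated using', 'Time')]
```

A check like `undefined_entities` is a cheap way to catch logic that refers to a concept nobody has yet defined, before any integration code is written.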

Once we have defined our entities and their relationships, we can start to catalogue the instances of those entities and relationships. For example, if we have defined entities for dataset, software, and role, we can in turn catalogue the instances of dataset (Finance, CRM, 3D Model), software (Oracle, SharePoint, Aconex), and role (BIM Manager, Project Manager, Information Controller). We can then in turn define the sensitivity of our software using the instances of relationships between dataset and software, and the rules of our role-based access control (RBAC) as instances of relationships between dataset, software, and role.
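A minimal sketch of that catalogue follows, using the instance names from the paragraph above. The sensitivity ratings, the dataset-to-software links, and the role permissions are all invented for illustration, as is the rule that a system inherits the rating of its most sensitive dataset.

```python
# Hypothetical sensitivity ratings (higher = more sensitive).
SENSITIVITY = {"Finance": 3, "CRM": 2, "3D Model": 1}

# Instances of the relationship [dataset] [is held in] [software].
HELD_IN = [
    ("Finance", "Oracle"),
    ("CRM", "SharePoint"),
    ("3D Model", "Aconex"),
    ("3D Model", "SharePoint"),
]

# Instances of the relationship [role] [may read] [dataset].
MAY_READ = [
    ("BIM Manager", "3D Model"),
    ("Project Manager", "3D Model"),
    ("Project Manager", "Finance"),
    ("Information Controller", "CRM"),
    ("Information Controller", "3D Model"),
]


def software_sensitivity(software):
    """Assume a system is as sensitive as the most sensitive dataset it holds."""
    return max(SENSITIVITY[d] for d, s in HELD_IN if s == software)


def can_read(role, dataset, software):
    """Access needs both relationships: role-dataset and dataset-software."""
    return (role, dataset) in MAY_READ and (dataset, software) in HELD_IN


print(software_sensitivity("SharePoint"))             # → 2
print(can_read("BIM Manager", "3D Model", "Aconex"))  # → True
print(can_read("BIM Manager", "Finance", "Oracle"))   # → False
```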

Modelling data logic in this way isn’t a purely academic exercise; it is work that allows us to begin to define the data that should be held in systems (for example a cost management system) and, more importantly, the data that must be common across systems (e.g., asset breakdown structure, roles, calendar, etc.).

The data-first approach is in keeping with ISO 19650 Part 2, which states that project organisations should establish project information:

  • Requirements (what);
  • Delivery milestones (when);
  • Standards (what will be shared);
  • Production methods and procedures (how);
  • Reference information (validation rules); 
  • Common Data Environment (storage); and,
  • Protocol (governance).

19650 Part 2 provides a wealth of guidance that will form part of the functional and non-functional requirements for any CDE development, but, crucially, it does not offer guidance on what specific data sources should form the CDE or how these should be structured. For guidance in this area we will need to refer to existing top-level ontologies (such as BFO, ISO 15926) and industry-standard data models (such as COBie, Uniclass, or IFC). Whilst we should seek to ensure that we align to the principles of ISO 19650, use a top-level ontology, and borrow eagerly from industry-standard data models, we cannot expect to remove the need to originate some organisation-specific logic to:

  • Glue all of these borrowed components together, and 
  • Reflect the specific goals and idiosyncrasies of each organisation.

A meta-model for Digital Construction

At the start of this paper, I sought to answer the following questions:

  • Where are we likely to derive the greatest value from BIM and related digital construction technologies?
  • Where do we start? What is the ‘minimum viable product’?

To best answer these questions, we should put the previous sections into practice and define a model of the entities that drive decision-making during design and construction, and which we expect to store, manage, and analyse using our Digital Construction tooling. We can then prioritise investment in systems that utilise the entities associated with valuable use cases, which is a complicated way of saying “if a lot of users care about X, then prioritise bringing X data into your model.” Needless to say, you can’t do this without extensive and empathetic stakeholder engagement.
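That rule of thumb can be sketched as a simple tally. The use cases, user counts, and entity lists below are made up, and counting users is only one possible proxy for value; in practice the inputs would come from stakeholder engagement.

```python
from collections import Counter

# Hypothetical use cases: (name, number of users, entities the use case needs).
USE_CASES = [
    ("Digital rehearsal", 40, {"Asset", "Activity", "Duration"}),
    ("Cost forecasting", 25, {"Cost", "Activity"}),
    ("Access planning", 10, {"Person", "Role", "Asset"}),
]


def entity_demand(use_cases):
    """Sum the users of every use case that depends on each entity."""
    demand = Counter()
    for _, users, entities in use_cases:
        for entity in entities:
            demand[entity] += users
    return demand


# Entities with the most dependent users come first.
for entity, users in entity_demand(USE_CASES).most_common():
    print(entity, users)
```

In this invented example Activity comes out on top because both the rehearsal and the cost use cases depend on it, which suggests that schedule data should be brought into the model early.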

The diagram below provides an example of linking an organisation’s logical entities to data standards, systems, and ultimately use cases (or vice versa). It’s nice that the specifics of the system architecture are a function of the abstractions of data and user needs, not the other way around.

[Diagram: an organisation’s logical entities linked to data standards, systems, and use cases. Image not available.]

The diagram above is a simplification of the detailed logic required for clarity on the data required to meet each use case. Creating data models allows us to ensure that:

  1. We surface data to users in a manner that reflects the language and logic that they use in their work, rather than the sometimes arbitrary or generic terms used by our software.
  2. We understand the data requirements of each use case, and the data aggregation that is required to meet those use cases.
  3. We are able to specify the movement of information between systems in a manner that reflects a consistent business logic (e.g. avoiding conflicts between our different data standards and data sets).
  4. We are able to specify the movement of data between our organisation and our supplier organisations in a manner that ensures that the data we receive is fit-for-purpose.
  5. We understand the sensitivity and source of the data that we use so that we can apply appropriate access control and security measures.

Note that requirements 3 and 4 above are closely related to the approach to managing document information, which is a core part of delivering successful Digital Construction but also has wider application across the business for reporting, audit, quality management and assurance.

It is important that we create these models with the business, not just for the business. It is imperative that we test our models against the extensive heuristics / tacit knowledge / experience that our colleagues possess. And so, it is incumbent upon us to de-mystify data modelling and make it accessible to subject matter experts.

As discussed earlier, one quick and flexible way to create a data model using triples is an ontology. Ontologies are great because they take a similar form to the ‘mind maps’ that most people are familiar with. Defining our data logic in terms of an ontology, knowledge graph, or conceptual data model allows us to create a common set of definitions. Ontologies tend to become very complex very quickly, but there are useful tools that we can use to simplify how we visualise them, allowing us to review only relevant parts with our stakeholders. We should ensure that these definitions are arrived at through consultation with our stakeholders across the business and our supply chain and formalised in a set of human-readable and machine-readable standards.
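As a sketch of what 'reviewing only relevant parts' might look like in code, the function below pulls out just the triples within a given number of relationships of one entity. The function name is hypothetical, and the [Person] [holds] [Role] triple is my own addition based on the earlier definition of Role; dedicated ontology tools offer far richer filtering than this.

```python
def neighbourhood(triples, focus, hops=1):
    """Return the triples within `hops` relationships of the focus entity."""
    seen, frontier, kept = {focus}, {focus}, []
    for _ in range(hops):
        next_frontier = set()
        for triple in triples:
            subject, _, obj = triple
            touches = subject in frontier or obj in frontier
            if touches and triple not in kept:
                kept.append(triple)
                next_frontier |= {subject, obj} - seen
        seen |= next_frontier
        frontier = next_frontier
    return kept


ONTOLOGY = [
    ("Activity", "is completed by", "Person"),
    ("Person", "holds", "Role"),  # assumed triple, for illustration
    ("Cost", "is calculated using", "Time"),
    ("Construction", "is delivered over a", "Duration"),
]

# Only the part of the ontology that a conversation about people needs.
print(neighbourhood(ONTOLOGY, "Person"))
# → [('Activity', 'is completed by', 'Person'), ('Person', 'holds', 'Role')]
```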

Start with Data

To deliver a data-driven approach to Digital Construction within the organisations that we work for, we should seek to build common and accessible definitions that we share (e.g., a Common Data Model). We can define and build out our Common Data Models through the following steps:

  1. Create a high-level map of the data sets that we manage and how they are related to each other (e.g., our Ontology or Conceptual Data Model). This should be reviewed extensively with stakeholders across the business to ensure the model reflects their understanding of 'how things work'.
  2. Connect our Ontology to our Information Asset Register and Data Governance Framework to ensure that we are clear on who is responsible for each data set and our expectations of their responsibilities.
  3. Use our Ontology to inform the prioritised creation of Logical Data Models, which describe the information contained within relevant data sets / source systems (e.g., schedule data in P6, cost data in a cost management system, geometric data in BIM), and of a Reference Data Library and Interface Specifications, which describe how data from different data sets can be integrated into a single source of truth within the CDE.
  4. Implement the aggregation (ELT) of our data into the CDE using the transformation logic defined in our Interface Specifications. This will in turn serve as the data source for Digital Construction (e.g., Digital Rehearsal, multi-dimensional BIM/planning).
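The last two steps can be sketched in a few lines. Here an Interface Specification is reduced to a field-name mapping per source system, and integration is a join on the keys that are common across systems. Every field name, key, and record below is hypothetical, and the sketch covers only the transformation step, not the extract and load.

```python
# Hypothetical Interface Specification: source field -> common model field.
INTERFACE_SPEC = {
    "schedule": {"task_code": "activity_id", "wbs": "asset_id", "finish": "finish_date"},
    "cost": {"activity": "activity_id", "asset_ref": "asset_id", "amount_gbp": "cost"},
}


def to_common(source, record):
    """Rename a source record's fields to the common data model's names."""
    mapping = INTERFACE_SPEC[source]
    return {common: record[src] for src, common in mapping.items()}


def integrate(schedule_rows, cost_rows):
    """Join schedule and cost on the keys that are common across systems."""
    merged = {}
    for source, rows in (("schedule", schedule_rows), ("cost", cost_rows)):
        for row in rows:
            common = to_common(source, row)
            key = (common["activity_id"], common["asset_id"])
            merged.setdefault(key, {}).update(common)
    return list(merged.values())


# Invented rows standing in for extracts from the source systems.
schedule_rows = [{"task_code": "A100", "wbs": "pier_1", "finish": "2024-06-01"}]
cost_rows = [{"activity": "A100", "asset_ref": "pier_1", "amount_gbp": 125000}]

print(integrate(schedule_rows, cost_rows))
# → [{'activity_id': 'A100', 'asset_id': 'pier_1', 'finish_date': '2024-06-01', 'cost': 125000}]
```

The join only works because both systems agree on activity and asset identifiers, which is exactly the 'data that must be common across systems' that the earlier modelling steps are meant to pin down.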

By underpinning our investment in BIM with these foundational investments in data management and data modelling, we can ensure that our data-led BIM realises the best of both professions. Only through this commitment to creating common definitions and representations will we end the unnecessary divide between data and BIM.

Liam McGee

Making data fit for human use. Keeping it real (ontologically). Building tools for the future of infrastructure.

2y

Daniel Lenagan IEng MCIHT this is the article I was talking about in our walkthrough of the MVP ontology the other day... please pass along to Ken and Patrick!

Ilsa Kuiper

Consultant at MBB Group

2y

Suggest BIM is a subset of data (subject to how one defines BIM, and recognising data, like BIM, is also relative). Agree objective/value positions are likely to focus on what can be done with BIM and/or data.....but there is also scope to consider what service, function or outcome remains unrealised or unachievable due to the lack of data or implementation of BIM, whether in contrast to business as usual and/or intended future states. Like you note Ian, there is always more to the statistical or descriptive characterisation of infrastructure assets (i.e. the number/types of "tunnels", BIM dimensions) and data review administration (i.e. ISO19650)... if it is prefaced in terms of performance, accountability and risk (particularly for the entity or agency held responsible for such, the degree of self regulation, collective expectation/practice etc). Good luck with your deliberations... I found I end up contemplating whether there is more to the story about interpretative capabilities and data transactions to drive informational systems dynamics and outcomes (part of a PhD on BIM, data, infrastructure projects etc). How this could then be framed for commercial or institutional purposes, however, was left for future research.

James Edwards

Digital Consultant at Costain

2y

Couldn't resist; Data Wars: The BIM Menace

Ryan Johnston

Spatial Digital Twins|XR Technology | Net Zero | Geographer | GIS Specialist | 🌍🥽🏙

2y

Great read ! But I always wonder why GIS gets considered as just a visualisation. In your two camps of DATA or BIM if they have a connection to the real world do you consider them non geographic ?

David Bailey

Head of Digital Estates at Manchester University NHS Foundation Trust

2y

Really interesting. We are beginning the journey to understand how our data needs to be structured, to meet the demands of all of our teams. The myriad of different data sets we have needs this organised approach for us to prioritise its importance; eg accuracy of space data is vital for many metrics but I'm not too worried right now about how many taxis we book for patients etc. Nick Campbell-Voegt something to consider
