Why is deep learning for con-tech so hard?
Construction technology, or con-tech, is an industry ripe for disruption through artificial intelligence (AI) and machine learning (ML). However, applying deep learning algorithms to con-tech problems presents significant challenges. In this article, we explore some of the reasons why deep learning for con-tech is so hard.
Numerous classes
One of the main challenges in applying deep learning to con-tech is the enormous number of classes that need to be recognized. To report progress on construction work, we must recognize classes characteristic of each stage of construction, across different types of sites in different environments: we work on everything from solar farms and road construction to railway and pipeline projects.
Classes are hard to distinguish
Another challenge arises from the broad variation in how classes appear across construction regions, owing to significant differences in the materials and technologies used. The same material can also appear in different colours in different parts of the world. What's more, the classes we distinguish are often very similar to one another: it takes an excellent eye to tell gravel apart from sand, cement or asphalt without mistakes. Gravel alone can range from white through red to black, and its grain size is often finer than the resolution of the photos we receive. On top of that, the orthophotomap may be overexposed and the photos blurred. Our goal is to guard against all of these eventualities and to teach our models to recognize classes that even humans struggle with.
No clear boundaries between classes
A problem we face when labelling data is setting a clear boundary between overlapping classes, such as the soft border between sand and cement. In practice, each of the two classes ends up with a small percentage of data that looks identical to the other; thanks to this overlap, however, the model learns how to behave in ambiguous places. A similar problem occurs when marking tree crowns or tufts of grass on sand. All of these cases force us to reflect on what we actually expect from the model: how accurate it should be, how to present the data without blurring the picture by mixing classes, and how to keep the labelling coarse enough that we do not waste too much time on marking.
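One common way to handle such soft borders in segmentation is to give pixels near the boundary a mixed (soft) label instead of a hard one-hot label. The sketch below is purely illustrative, not our production pipeline; the function names, the 50/50 split, and the border width are assumptions for the example.

```python
import numpy as np

def dilate(mask, width):
    """Grow a boolean mask by `width` pixels (4-neighbourhood, no wrap-around)."""
    out = mask.copy()
    for _ in range(width):
        grown = out.copy()
        grown[1:, :] |= out[:-1, :]
        grown[:-1, :] |= out[1:, :]
        grown[:, 1:] |= out[:, :-1]
        grown[:, :-1] |= out[:, 1:]
        out = grown
    return out

def soften_boundary(hard_mask, class_a, class_b, n_classes, width=1):
    """Turn a hard (H, W) class map into per-pixel soft labels.

    Pixels of class_a within `width` pixels of class_b (and vice versa)
    get a 50/50 target instead of a one-hot one, so the loss no longer
    punishes the model for hedging exactly where humans also hesitate.
    """
    soft = np.eye(n_classes)[hard_mask]          # (H, W, C) one-hot labels
    a = hard_mask == class_a
    b = hard_mask == class_b
    border = (a & dilate(b, width)) | (b & dilate(a, width))
    soft[border] = 0.0
    soft[border, class_a] = 0.5
    soft[border, class_b] = 0.5
    return soft

# Toy example: left half "sand" (0), right half "cement" (1).
mask = np.zeros((4, 8), dtype=int)
mask[:, 4:] = 1
soft = soften_boundary(mask, 0, 1, n_classes=2)
print(soft[0, 0])   # far from the border: still one-hot
print(soft[0, 3])   # on the border: 50/50 soft label
```

The soft targets can then be fed to a cross-entropy loss that accepts probability targets, which keeps training stable in exactly the ambiguous regions described above.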
Many different objects within a single class
To optimize the time spent on data labelling, our dataset includes generic classes: groups of objects that we do not yet mark in detail, but whose locations we mark so that the model can point out analogous places. One such class is 'storage area'. It covers a wide variety of objects and is marked very roughly, which is undoubtedly a difficulty for the model. After seeing a wide range of examples of this class, however, the model is able to find occurrences of it with high accuracy.
“Ghost” class
Another interesting example we struggle with is ghost objects: objects that appear semi-transparent as a result of the orthophotomap creation process. The problem affects objects that are in motion during the flight. Orthophotomap creation can be based on averaging the values of overlapping areas of photos; if an object moved between the moments those photos were taken, it appears on the orthophotomap as a semi-transparent shape, often in several copies. While for labelling we can solve this by, for example, adding a "ghost" mask, it becomes a real problem when we want to count the objects in an area. With another orthophotomap creation technique, the same object in motion instead manifests as a cut object, for example a car of which only half is visible.
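The averaging mechanism behind ghosting can be shown with a toy sketch. This is a deliberately simplified model of mosaicking (real orthophoto pipelines are far more involved); the intensity values and the moving "truck" are made up for illustration.

```python
import numpy as np

# Two overlapping aerial frames of the same ground patch, taken moments
# apart. A truck (bright block) moved between exposures, while the
# static background is identical in both frames. Toy intensity values.
frame1 = np.full((6, 6), 50.0)
frame1[2:4, 0:2] = 200.0      # truck at its first position
frame2 = np.full((6, 6), 50.0)
frame2[2:4, 4:6] = 200.0      # truck after moving right

# A simple way to mosaic overlapping frames is to average them.
ortho = (frame1 + frame2) / 2

print(ortho[0, 0])   # 50.0  -> static ground keeps its true value
print(ortho[2, 0])   # 125.0 -> semi-transparent copy at position 1
print(ortho[2, 4])   # 125.0 -> semi-transparent copy at position 2
```

The truck appears twice at half contrast, which is exactly the "several semi-transparent copies" effect described above, and why counting such objects from the mosaic alone is unreliable.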
Our approach
All the situations described above are characteristic of the con-tech industry, and therefore of us; each has a very large representation in our dataset. As a company, we set ourselves the task of training our models to handle all of the above-mentioned problems. Achieving this requires providing our models with as much data as possible, so a team of labellers works tirelessly to annotate the massive datasets at our disposal. At AIC, we have developed a comprehensive manual to help keep data labelling uniform. So far, we have annotated over 100 km² of orthophotomaps, and we are not stopping there. Recently, we have been using a tool based on the Segment Anything Model (SAM) released by Meta AI, which has significantly accelerated our work.
With this approach, we have seen a significant rise over time in the number of "triple-nine" classes (those the model recognizes with 99.9% accuracy), and the number keeps growing daily as the data lake grows.
Written by Adam Wisniewski & Jakub Łukaszewicz
AI Tech Lead