Clean Data – Introduction
Cleaning or cleansing in data processing can mean many things. The goal is always the same: to make the data more reliable and usable.
There are many techniques for doing this.
Simply removing duplicate objects makes quantity take-off and all related tasks more reliable: no double quantities, no double costs in the estimates. With Simplebim’s dataflows you can easily remove duplicate objects from the model.
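Here is a minimal Python sketch of what duplicate detection involves. It assumes a hypothetical in-memory model in which each object is a dict with `ifc_class`, `geometry_id`, `location` and `volume`. It treats two objects as duplicates when their class, geometry and location (within a tolerance) all match. These criteria are an assumption for illustration only. They are not the rules Simplebim's dataflows apply.

```python
def remove_duplicates(objects, tolerance=0.001):
    """Keep the first of any objects that share class, geometry and location.

    Hypothetical model: each object is a dict with "ifc_class", "geometry_id",
    "location" (x, y, z in metres) and "volume" (m3).
    """
    seen = set()
    kept = []
    for obj in objects:
        # Snap coordinates to a grid of size `tolerance` so tiny numeric noise
        # does not hide a duplicate (points straddling a grid edge may still differ).
        snapped = tuple(round(c / tolerance) for c in obj["location"])
        key = (obj["ifc_class"], obj["geometry_id"], snapped)
        if key in seen:
            continue  # duplicate: skip it so its quantities are not counted twice
        seen.add(key)
        kept.append(obj)
    return kept


if __name__ == "__main__":
    slabs = [
        {"guid": "a", "ifc_class": "IfcSlab", "geometry_id": "g1", "location": (0.0, 0.0, 0.0), "volume": 12.0},
        {"guid": "b", "ifc_class": "IfcSlab", "geometry_id": "g1", "location": (0.0, 0.0, 0.0004), "volume": 12.0},
        {"guid": "c", "ifc_class": "IfcSlab", "geometry_id": "g1", "location": (0.0, 0.0, 3.0), "volume": 12.0},
    ]
    before = sum(s["volume"] for s in slabs)
    after = sum(s["volume"] for s in remove_duplicates(slabs))
    print(before, after)  # 36.0 24.0
```

The near-coincident copy "b" is dropped, so the total volume is no longer counted twice for that slab.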
Models very often contain redundant property information. For example, the type identifier of an object might be found in three different properties. Which one should you use? Removing the extra data with dataflows makes the data more reliable to use downstream.
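The following Python sketch shows one way to collapse redundant properties into a single one. It assumes a hypothetical model in which an object's properties are a plain dict. The property names are made up, and the candidate list is ordered by preference. The function also reports when the candidates disagree, so that conflicting objects can be reviewed rather than silently trusted.

```python
def consolidate_property(props, candidates, target):
    """Collapse redundant properties into one.

    props: property name -> value for one object (hypothetical model)
    candidates: names assumed to carry the same information, in order of preference
    target: the single property name to keep
    Returns (cleaned_props, conflict); conflict is True if the candidates disagreed.
    """
    values = [props[name] for name in candidates if props.get(name) not in (None, "")]
    conflict = len(set(values)) > 1
    # Drop every redundant copy, then write back one value under the target name.
    cleaned = {k: v for k, v in props.items() if k not in candidates}
    if values:
        cleaned[target] = values[0]  # first non-empty value in preference order
    return cleaned, conflict


if __name__ == "__main__":
    props = {"TypeMark": "W-01", "Type Name": "W-01", "Reference": "", "FireRating": "EI60"}
    print(consolidate_property(props, ["TypeMark", "Type Name", "Reference"], "TypeId"))
```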
One of the key tasks in any data processing is to clean up the data. In Simplebim you can trim the objects, remove unnecessary data and data structures, and even compress and optimize the models in various innovative ways.
The main topics of this article are: Reduce File Sizes, Trim and Remove Data, Anonymizing Data and Fixing GUIDs.

Reduce File Sizes
BIM models can be very large, and Simplebim can help you reduce file sizes in several ways.
Reducing file sizes is always a good idea. The data industry talks a lot about endlessly scalable cloud systems, but when data is used on a large scale, the costs quickly add up: moving and storing data becomes expensive and cumbersome. There are powerful viewers and data analysis tools out there, but they have their limits, too. If the models are too big, even importing the IFC file or rendering the models in downstream applications can become a problem, and so can moving files and data over networks.

File sizes for models depend on two factors: how much actual information the model contains and how this information is structured in the file.
Use the ifcZIP Format
Simplebim lets you export IFC models in the ifcZIP format, which is simply a normal ZIP file containing one file: the actual IFC file. Using ifcZIP compresses the IFC file without changing it. Depending on the structure of the IFC file, the compression ratio can be significant. However, when the IFC file is opened in another application, the memory footprint in that application is the same as when opening the original IFC file.
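This section describes ifcZIP as a ZIP archive holding a single IFC file. Based on that description, here is a minimal Python sketch that packs and unpacks such an archive with the standard `zipfile` module. The function names are hypothetical. The sample content in the demo is repetitive STEP text, not a complete IFC model. The round-trip check illustrates that the IFC content itself is unchanged.

```python
import tempfile
import zipfile
from pathlib import Path


def write_ifczip(ifc_path, ifczip_path=None):
    """Wrap one IFC file in a ZIP archive; the IFC content itself is not changed."""
    ifc_path = Path(ifc_path)
    ifczip_path = Path(ifczip_path) if ifczip_path else ifc_path.with_suffix(".ifczip")
    with zipfile.ZipFile(ifczip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(ifc_path, arcname=ifc_path.name)
    return ifczip_path


def read_ifczip(ifczip_path):
    """Return the bytes of the single .ifc file inside an ifcZIP archive."""
    with zipfile.ZipFile(ifczip_path) as zf:
        names = [n for n in zf.namelist() if n.lower().endswith(".ifc")]
        if len(names) != 1:
            raise ValueError(f"expected exactly one .ifc file, found {len(names)}")
        return zf.read(names[0])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ifc = Path(tmp) / "model.ifc"
        # Sample STEP-style text only; real IFC files are similarly repetitive.
        ifc.write_text("".join(f"#{i}=IFCCARTESIANPOINT((0.,0.,0.));\n" for i in range(1, 10001)))
        packed = write_ifczip(ifc)
        print(ifc.stat().st_size, packed.stat().st_size)
        assert read_ifczip(packed) == ifc.read_bytes()  # lossless round trip
```

Once the receiving application unpacks the archive, it works on exactly the same bytes. This is why compression saves transfer and storage space but not memory.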
Simplify Geometry
Extra details in the model geometry are useful for architectural visualization, but they are not always needed downstream. The extra details can mean unnecessarily long upload and download times, and the models can become too heavy to use in your chosen BIM viewer or tool. Usually, you cannot go back to the model authoring tools and simplify the geometry there. Only you, as the model user, know what should be simplified. Simplebim allows you to simplify geometries in various ways.
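One of the coarsest possible simplifications is to replace a detailed mesh with its axis-aligned bounding box. The Python sketch below illustrates that single technique. It is not a description of the options Simplebim offers. The vertex-list input and the index-triple output are assumed formats.

```python
def bounding_box_mesh(vertices):
    """Replace a detailed triangle mesh with its axis-aligned bounding box.

    vertices: iterable of (x, y, z) tuples from the detailed mesh (assumed format).
    Returns (corners, triangles): 8 corner points and 12 triangles as index
    triples, wound counter-clockwise when seen from outside the box.
    """
    xs, ys, zs = zip(*vertices)
    lo = (min(xs), min(ys), min(zs))
    hi = (max(xs), max(ys), max(zs))
    # Corner index = 4*ix + 2*iy + iz, where each i* is 0 (min) or 1 (max).
    corners = [
        (hi[0] if ix else lo[0], hi[1] if iy else lo[1], hi[2] if iz else lo[2])
        for ix in (0, 1) for iy in (0, 1) for iz in (0, 1)
    ]
    quads = [
        (0, 1, 3, 2),  # x = min
        (4, 6, 7, 5),  # x = max
        (0, 4, 5, 1),  # y = min
        (2, 3, 7, 6),  # y = max
        (0, 2, 6, 4),  # z = min
        (1, 5, 7, 3),  # z = max
    ]
    # Split each quad into two triangles that keep the quad's winding.
    triangles = [t for a, b, c, d in quads for t in ((a, b, c), (a, c, d))]
    return corners, triangles
```

However many vertices the original chair or fixture has, the result is always 8 points and 12 triangles. The trade-off is that all shape detail is lost, which is only acceptable when the object's extent is all that matters downstream.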

Optimize IFC Data Structures
When you optimize IFC data structures, you don’t lose any information; you just structure the IFC model more efficiently. You can optimize both the geometry and the properties.
Let’s say you have a model with 100 identical chairs. In the IFC model, each of these chairs can have its own copy of the geometry, which would result in a big file. However, all 100 chairs can also use the same geometry and just define their own location and rotation, which of course makes the file much smaller.
The same applies to properties. In IFC most properties are independent objects that are referenced by the ‘actual objects’, for example by our chairs. All chairs could have their own copy of the properties, or they could reference the same properties whenever possible. Either way the information content is identical, but sharing is more efficient.
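The sharing idea described above can be sketched in Python. Each distinct geometry or property set is stored once, and objects keep only a reference plus their own data, such as location. The dict-based model and the `_ref` naming are hypothetical. In IFC itself, shared geometry is typically expressed with `IfcRepresentationMap` and `IfcMappedItem`, and one `IfcPropertySet` can be related to many objects through a single `IfcRelDefinesByProperties`. The code below does not produce IFC. It only illustrates the deduplication.

```python
import json


def share_definitions(objects, field):
    """Store each distinct value of `field` once and let objects reference it.

    objects: list of dicts (hypothetical model). Returns (new_objects, table):
    each new object has field + "_ref" instead of an inline copy, and
    table maps reference id -> the single shared value.
    """
    ids = {}    # canonical JSON text of a value -> reference id
    table = {}  # reference id -> shared value
    result = []
    for obj in objects:
        value = obj[field]
        key = json.dumps(value, sort_keys=True)  # equal content gives an equal key
        if key not in ids:
            ids[key] = len(table)
            table[ids[key]] = value
        new_obj = {k: v for k, v in obj.items() if k != field}
        new_obj[field + "_ref"] = ids[key]  # the object keeps only a reference
        result.append(new_obj)
    return result, table


if __name__ == "__main__":
    mesh = {"vertices": [[0, 0, 0], [0.5, 0, 0], [0.5, 0.5, 0], [0, 0.5, 0.9]], "faces": [[0, 1, 2], [0, 2, 3]]}
    pset = {"Manufacturer": "ChairCo", "Model": "C-100"}  # made-up values
    chairs = [{"location": [i * 0.6, 0.0, 0.0], "geometry": mesh, "properties": dict(pset)} for i in range(100)]
    before = len(json.dumps(chairs))
    chairs, geometries = share_definitions(chairs, "geometry")
    chairs, psets = share_definitions(chairs, "properties")
    after = len(json.dumps({"objects": chairs, "geometries": geometries, "psets": psets}))
    print(len(geometries), len(psets), before, after)
```

All 100 chairs end up pointing at one geometry and one property set. The information content is unchanged, but the serialized size shrinks.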
In Simplebim, the optimization of data structures happens automatically: geometry, appearances and properties are all compressed when you export a new IFC file from Simplebim.
Trim and Remove Data
Model trimming is one of the unique core features of Simplebim. When you trim the model, you lose information on purpose, and there can be many good reasons for doing this.
- When you are the model author you will want to trim away all data that you don’t want to share with others. BIM authoring applications let you do this to some extent, but with Simplebim you have much better control and visual feedback – and thus confidence in the models you share.
- When you receive models, you know what data is important to you and you can remove everything else. Unnecessary data just causes unnecessary problems.
- Removing extra and redundant information makes data more reliable to use.
- Removing unnecessary data makes the whole BIM process smoother: moving data over the internet, storing the data, importing and using the data in other applications.
Basic trimming in Simplebim is about deciding which objects and properties are included and which are excluded. Only the included objects and properties are exported to IFC. If you are a structural engineer, for example, you can trim all of our 100 chairs from the model, which reduces their ‘weight’ in the model to zero.
You can also remove unnecessary classifications, groups and type objects from the model. For example, when you determine that a classification in the model is incomplete or otherwise untrustworthy, you should simply remove it so that nobody in your organization can use it by mistake.
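The include/exclude logic above can be sketched as a simple filter. This Python example assumes a hypothetical model where each object is a dict with `ifc_class`, `properties` and `classifications`. The classification system names are made up. Excluded objects and properties are simply left out of the result, which mirrors the idea that only included data is exported. It also drops a classification system judged untrustworthy from every object.

```python
def trim(objects, include_classes, include_properties, drop_classifications=()):
    """Return only included objects, each with only included properties.

    Hypothetical model: each object is a dict with "ifc_class", "properties"
    (name -> value) and optionally "classifications" (system name -> code).
    Classification systems listed in drop_classifications are removed everywhere.
    """
    trimmed = []
    for obj in objects:
        if obj["ifc_class"] not in include_classes:
            continue  # excluded object: contributes nothing to the export
        trimmed.append({
            **obj,
            "properties": {k: v for k, v in obj["properties"].items() if k in include_properties},
            "classifications": {
                system: code
                for system, code in obj.get("classifications", {}).items()
                if system not in drop_classifications
            },
        })
    return trimmed


if __name__ == "__main__":
    model = [
        {"ifc_class": "IfcBeam", "properties": {"LoadBearing": True, "Finish": "Paint"},
         "classifications": {"SystemA": "A-123", "OldSystem": "X1"}},
        {"ifc_class": "IfcFurniture", "properties": {"Manufacturer": "ChairCo"}},
    ]
    # A structural engineer keeps load-bearing classes and drops the furniture.
    print(trim(model, {"IfcBeam", "IfcColumn"}, {"LoadBearing"}, drop_classifications={"OldSystem"}))
```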
How to trim models in Simplebim is described in detail here.
Anonymizing Data
Sometimes you need to make the models anonymous, so that the receiver cannot tell who created them. For example, in an architectural competition, the requirement might be that the data sent to the jury is completely anonymized. Simplebim has a tool for this, too.
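To show where identifying information can live in an IFC file, here is a minimal Python sketch that blanks the author, organization and authorization fields of the STEP header's `FILE_NAME` record. It is only one part of anonymization. It does not touch the `IfcPerson` and `IfcOrganization` entities that `IfcOwnerHistory` references in the data section, nor any names stored in properties. It is not how Simplebim's tool works. The function names are hypothetical.

```python
def _split_call_args(text, start):
    """Split the top-level, comma-separated arguments of a STEP call.

    `start` is the index just after the call's opening parenthesis.
    Returns (args, end) where `end` is the index of the matching ')'.
    Handles quoted strings (with '' as an escaped quote) and nested lists.
    """
    args, current, depth, in_string = [], [], 0, False
    i = start
    while i < len(text):
        ch = text[i]
        if in_string:
            current.append(ch)
            if ch == "'":
                if text[i + 1:i + 2] == "'":  # '' is an escaped quote inside a string
                    current.append("'")
                    i += 1
                else:
                    in_string = False
        elif ch == "'":
            in_string = True
            current.append(ch)
        elif ch == "(":
            depth += 1
            current.append(ch)
        elif ch == ")":
            if depth == 0:  # closing parenthesis of the call itself
                args.append("".join(current).strip())
                return args, i
            depth -= 1
            current.append(ch)
        elif ch == "," and depth == 0:
            args.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
        i += 1
    raise ValueError("unterminated STEP call")


def anonymize_file_name(step_text):
    """Blank the author, organization and authorization fields of FILE_NAME."""
    marker = "FILE_NAME("
    pos = step_text.find(marker)
    if pos < 0:
        return step_text
    start = pos + len(marker)
    args, end = _split_call_args(step_text, start)
    if len(args) != 7:
        raise ValueError(f"FILE_NAME has {len(args)} arguments, expected 7")
    args[2] = "('')"  # author (a list of strings)
    args[3] = "('')"  # organization (a list of strings)
    args[6] = "''"    # authorization
    return step_text[:start] + ",".join(args) + step_text[end:]
```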
Fixing GUIDs
Data cleaning is not just about removing unnecessary data from the models. It is also about making sure that the mandatory data is correct. Most of the time you don’t have to worry about GUIDs (Globally Unique Identifiers), but in special cases they are incorrect or not unique. Simplebim has a way to fix the GUIDs.
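IFC stores an object's GUID in its `GlobalId` attribute as a 22-character string. This string is a 128-bit value written in a 64-character alphabet, so the first character carries only 2 bits and must be 0 to 3. The Python sketch below generates such ids from random UUIDs, checks the basic format, and re-identifies objects whose id is invalid or already taken. The dict-based model and function names are hypothetical. As far as we know, this encoding matches the one that tools like IfcOpenShell's `ifcopenshell.guid` module implement, but treat that as an assumption.

```python
import uuid

# The 64-character alphabet used by IFC's 22-character compressed GlobalId.
IFC_BASE64 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_$"


def new_ifc_guid():
    """Encode a random 128-bit UUID as a 22-character IFC GlobalId."""
    n = uuid.uuid4().int
    chars = []
    for _ in range(22):  # 22 base-64 digits; the first holds only the top 2 bits
        n, digit = divmod(n, 64)
        chars.append(IFC_BASE64[digit])
    return "".join(reversed(chars))


def is_valid_ifc_guid(guid):
    """Check length, alphabet and the 2-bit range of the first character."""
    return (
        isinstance(guid, str)
        and len(guid) == 22
        and guid[0] in "0123"
        and all(c in IFC_BASE64 for c in guid)
    )


def fix_guids(objects):
    """Give a fresh GlobalId to every object whose id is invalid or already used.

    objects: list of dicts with a "GlobalId" key (hypothetical model). The first
    object with a given id keeps it; later duplicates are re-identified.
    Returns the list of (old, new) replacements for review.
    """
    seen = set()
    replacements = []
    for obj in objects:
        old = obj.get("GlobalId")
        if not is_valid_ifc_guid(old) or old in seen:
            new = new_ifc_guid()
            while new in seen:  # astronomically unlikely, but cheap to guard
                new = new_ifc_guid()
            obj["GlobalId"] = new
            replacements.append((old, new))
        seen.add(obj["GlobalId"])
    return replacements
```

Returning the replacements matters because anything downstream that referenced an object by its old GUID, such as issue trackers or earlier model versions, will no longer match it.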
Next Steps
If you followed the documentation so far, you now understand the meaning of exploring, structuring and cleaning the data. Next we get to the fun part. Learn more about enriching data here.