This project is of great interest to the architecture, engineering and construction sector. If you want to join the community, feel free to drop a PR with your suggestions and ideas.
- What should you know?
- Architecture of the code.
- How can you contribute?
- Adding or modifying code
- How does the parser work?
The concept of the app is simple: it parses IFC files and converts them to 3d geometry that can be displayed in browsers. Thus, the code is based on three fundamental topics: parsing, 3d and IFC. Knowing a bit about at least one of these topics is essential to understand the code and be able to contribute.
- The parsing is done with Chevrotain, which is a fast Parser Building Toolkit for JavaScript. You can learn about it in the Chevrotain docs.
- The 3d is done with Three.js, a well known lightweight 3d library for the web based on WebGL. Some of the most popular resources are Threejs docs and ThreejsFundamentals.
- IFC is the most widely used open format for storing building and infrastructure BIM models. The references I use are the BuildingSMART 2x3 implementation guide and the official IFC documentation.
The code is composed of 3 decoupled parts, each one responsible for a single task:
The IFC PARSER is the part of the code that reads IFC files, converts them into token sequences, structures them according to a syntax and loads them into memory according to some semantic rules. The output is a JavaScript object with all entities and their attributes in the form of their respective data type.
The IFC PROJECT BUILDER is responsible for receiving the parser output and structuring the IFC project data. For example, it builds the spatial structure (associating the entities IfcProject - IfcSite - IfcBuilding - IfcBuildingStorey - IfcSpace) and converts the indirect IfcRel relations into references to the entities loaded in memory.
The IFC TO THREEJS part generates the Three.js geometry that is displayed in the browser. In IFC there are different types of geometrical representation (extrusion, limit definition, etc) that correspond to geometry defined in Three.js. This part of the code maps both geometric definitions using the output of the project builder as input. For example, it creates an ExtrudeGeometry (Three.js) for each IfcSweptAreaSolid of the given IFC file.
                +------------+       +------------------------+       +-----------------+
                |            |       |                        |       |                 |
IFC FILE ------>| IFC PARSER |------>|  IFC PROJECT BUILDER   |------>| IFC TO THREE.JS |------> THREE.JS GEOMETRY
                |            |       |                        |       |                 |
                +------------+       +------------------------+       +-----------------+
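To make the project builder stage more concrete, here is a minimal sketch of how indirect relations could be turned into direct references. The entity shape (`relatingObject`, `relatedObjects`, `children`) and the function name `buildSpatialStructure` are assumptions made for illustration; they are not taken from the repository.

```javascript
// Hypothetical parser output: entities keyed by express ID.
// Relation attributes hold express IDs instead of objects.
const parsed = {
  1: { id: 1, type: "IfcProject", name: "Project" },
  2: { id: 2, type: "IfcSite", name: "Site" },
  3: { id: 3, type: "IfcBuilding", name: "Building" },
  10: { id: 10, type: "IfcRelAggregates", relatingObject: 1, relatedObjects: [2] },
  11: { id: 11, type: "IfcRelAggregates", relatingObject: 2, relatedObjects: [3] },
};

// Replace every IfcRelAggregates relation with a direct `children` link.
function buildSpatialStructure(items) {
  const all = Object.values(items);
  for (const rel of all.filter((item) => item.type === "IfcRelAggregates")) {
    const parent = items[rel.relatingObject];
    parent.children = parent.children || [];
    for (const childId of rel.relatedObjects) {
      parent.children.push(items[childId]);
    }
  }
  // The root of the spatial structure is the IfcProject.
  return all.find((item) => item.type === "IfcProject");
}

const project = buildSpatialStructure(parsed);
console.log(project.children[0].children[0].name); // "Building"
```

After this step, later stages can walk the tree from the project down to each building without looking up relation entities again.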
This architecture is reflected in the folder structure of the code. Each part is divided into several sub-tasks that perform atomic actions inside the src folder. Additionally, there are other folders that serve other purposes:
- libs contains resources outside of npm, like the logic for the smooth navigation.
- resources contains screenshots and icons.
- dev contains functions for development (for example, a function that reports which ifc classes found in the loaded IFC file have not been implemented yet).
- styles contains basic CSS code for the GUI.
- utils contains common code, like global variables or the logic for converting UNICODE text.
IFC.js
├───libs
├───resources
└───src
    ├───dev
    ├───ifc-parser
    │   ├───ifc-models
    │   ├───ifc-services
    │   ├───lexer
    │   ├───parser
    │   └───semantic
    ├───ifc-project-builder
    ├───ifc-to-three.js
    │   ├───geometry-generator
    │   ├───geometry-operator
    │   ├───geometry-transformer
    │   └───scene
    ├───styles
    └───utils
As a general rule, if you are not sure about contributing, I encourage you to fork this repository, try it in your browser and get your feet wet with a PR. You may not be familiar with the three themes on which the application is based. In that case, it is possible to contribute only to some of the parts of the application without worrying about the rest. In general, there are a few scenarios in which you can find yourself right now:
- You are interested in IFC, but don't know how to code: As you may know, the IFC schema has a lot of models; you can help me develop the IFC schemas, as well as develop future documentation.
- You are interested in 3d and/or Three.js: This application is based on the creation of 3d geometry from given data. I suggest you take a look at the geometry creation module to see if you can come up with new ideas or refactorings.
- You are interested in parsing: The parser module of this app is very concise and self-contained. You can also check this link to understand how it works.
- You only know JavaScript and have never heard of IFC before: As you can see, 99% of this application is JavaScript. The libraries used are really easy to use, and spending some time with the documentation mentioned above should be more than enough to get you started on the topic that interests you most.
- Other: Is there something else you can contribute? Feel free to open a PR directly or contact me to let me know of your ideas.
This is a standard project using node and npm. As in any such project, you can get started by running npm install followed by npm run dev.
If you find any issue while trying to load an IFC, take a look at the console to see more information about the issue.
If you have difficulty understanding the following information, you should take a look here and here. Also, if you are not familiar with IFC syntax, you should take a look here, especially at the examples. Also bear in mind that the parser is ultimately based on regular expressions.
Before digging into the implementation of the parser, it is necessary to understand what ifc entities look like. At a basic level, an IFC is nothing more than a list of objects with attributes. Each attribute can be a primitive value (a number, a text, a boolean) or a reference to another object. For example, a point in space in IFC is expressed as follows:
#6= IFCCARTESIANPOINT((0.,0.,0.));
All the objects can be broken down into three parts: express ID, ifc class and properties. So, the general schema is something like:
#ID= IFCCLASS(PROPERTIES);
Note that the properties are always between parentheses. In this case:
- #6 is the express ID: the unique number that identifies this entity within this file.
- IFCCARTESIANPOINT is the ifc class, that is, the type of the data. Points in IFC are expressed as IfcCartesianPoint instances.
- ((0.,0.,0.)) are the properties of this instance. This has two parts:
  - () the outer parentheses always enclose the properties of any instance.
  - (0.,0.,0.) is a property of type number set, whose pattern is a set of numbers separated by commas and surrounded by parentheses. The dots after the zeros are there because in IFC every number that is not an integer needs a decimal point, even if there are no decimal digits.
- Finally, ; marks the end of the instance declaration.
So, essentially, what #6= IFCCARTESIANPOINT((0.,0.,0.)); means is: this is an instance of IfcCartesianPoint with ID 6 and only one property, of type number set, that contains 3 numbers (representing X, Y and Z) whose value is 0. Easy, right? However, there are properties that can be references to other objects. For example:
#6= IFCCARTESIANPOINT((0.,0.,0.));
#7= IFCDIRECTION((0.,0.,1.));
#8= IFCDIRECTION((1.,0.,0.));
#9= IFCAXIS2PLACEMENT3D(#6,#7,#8);
The ifc class IfcAxis2Placement3D is used to define a coordinate system in space. It has three properties: location, axis (direction of Z axis) and refDirection (direction of X axis). As you can see, these three properties are not expressed directly in the instance of IfcAxis2Placement3D, but are references to other objects. This is really handy because it favors the Single Responsibility Principle and makes files more compact. As you may have guessed, IfcDirection is an entity similar to IfcCartesianPoint, but specific for defining vectors in space.
We have not mentioned all the types of primitive data that an IFC might contain, but the general idea is always the same. There are two more things to take into account, though:
- $ means undefined. For example, something like #9= IFCAXIS2PLACEMENT3D(#6,$,$); represents a coordinate system in space where the axes are not defined, so they take the default values (0,0,1) for the Z direction and (1,0,0) for the X direction.
- * marks a property that is inherited from the parent class and whose value is derived in the subclass, so it is not written explicitly. Yes, IFC is object oriented, and its entities are organized in a hierarchical inheritance structure. Nonetheless, this type of value is rare and the parser doesn't implement it for now, so you don't have to worry about this.
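The following sketch illustrates how references and undefined values could be resolved once the entities are in memory. It assumes a hypothetical representation in which references are kept as "#ID" strings and undefined values as "$"; the names `entities`, `resolve` and `resolvePlacement` are made up for this example and do not come from the repository.

```javascript
// Hypothetical in-memory form of some entities, keyed by express ID.
// References are kept as "#ID" strings and undefined values as "$".
const entities = {
  6: { type: "IFCCARTESIANPOINT", coordinates: [0, 0, 0] },
  7: { type: "IFCDIRECTION", directionRatios: [0, 0, 1] },
  9: { type: "IFCAXIS2PLACEMENT3D", location: "#6", axis: "#7", refDirection: "$" },
};

// Follow a "#ID" reference, or return the fallback when the value is "$".
function resolve(value, fallback) {
  if (value === "$") return fallback;
  return entities[Number(value.slice(1))];
}

// Build a plain coordinate system, applying the IFC defaults for missing axes.
function resolvePlacement(placement) {
  return {
    origin: resolve(placement.location).coordinates,
    zAxis: resolve(placement.axis, { directionRatios: [0, 0, 1] }).directionRatios,
    xAxis: resolve(placement.refDirection, { directionRatios: [1, 0, 0] }).directionRatios,
  };
}

console.log(resolvePlacement(entities[9]));
```

Here the X axis falls back to (1,0,0) because refDirection is "$", while the origin and the Z axis come from the referenced entities #6 and #7.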
The process has 6 steps:
- The items reader will extract the individual entities of the IFC.
- The lexer defines the vocabulary.
- The parser defines the primitive syntax (structures of tokens) for every data type in IFC.
- The parser defines the high-level syntax for every IFC class.
- The parser reads every item of the IFC using the specific syntax for that ifc class.
- The semantics query the result to retrieve the parsed information and load it in memory.
This might sound confusing at first, but it is actually really simple.
Essentially, the mission of the items reader is to extract the individual ifc items from the raw IFC.
As you already know, an IFC file is a plain text file containing a list of items which look like #ID= IFCCLASS(PROPERTIES);. Trying to parse an IFC in a "monolithic" way, i.e. extracting all its information at once from thousands of lines of text, is a very difficult task. Therefore, this first step extracts each statement, so the task of the parser becomes much easier. That is, instead of parsing the following text with one single (and complex) algorithm:
#6= IFCCARTESIANPOINT((0.,0.,0.));
#7= IFCDIRECTION((0.,0.,1.));
#8= IFCDIRECTION((1.,0.,0.));
#9= IFCAXIS2PLACEMENT3D(#6,#7,#8);
The items reader constructs an array with the following structure:
[{"id": 6, "type": "IFCCARTESIANPOINT", "properties":"(0.,0.,0.)"},
{"id": 7, "type": "IFCDIRECTION", "properties":"(0.,0.,1.)"},
{"id": 8, "type": "IFCDIRECTION", "properties":"(1.,0.,0.)"},
{"id": 9, "type": "IFCAXIS2PLACEMENT3D", "properties":"#6,#7,#8"}]
This way, the parser can iterate all the items and extract the information one by one. Extracting the express ID and the type is a trivial task that you can see implemented in the items reader module. Now, extracting the information from the properties is where the difficulty lies, and the rest of the steps of the parser will concentrate on this task. Thus, note that from now on each parsing step is referring exclusively to parsing a single properties field. The code will iterate through this object and apply the following logic to the properties of each item.
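As a rough illustration of this step, the sketch below extracts the express ID, the ifc class and the raw properties with a single regular expression. It assumes that every statement fits on one line, and the names `itemPattern` and `readItems` are hypothetical; the actual items reader module may work differently.

```javascript
// Matches statements such as "#6= IFCCARTESIANPOINT((0.,0.,0.));"
// Group 1: express ID, group 2: ifc class, group 3: raw properties
// (the text between the outer parentheses).
const itemPattern = /#(\d+)\s*=\s*(IFC[A-Z0-9]+)\s*\((.*)\);/g;

// Assumption: one statement per line, since "." does not match line breaks.
function readItems(ifcText) {
  const items = [];
  for (const match of ifcText.matchAll(itemPattern)) {
    items.push({ id: Number(match[1]), type: match[2], properties: match[3] });
  }
  return items;
}

const sample = [
  "#6= IFCCARTESIANPOINT((0.,0.,0.));",
  "#9= IFCAXIS2PLACEMENT3D(#6,#7,#8);",
].join("\n");

console.log(readItems(sample));
```

Because the properties group is greedy, it stops at the last ");" of the line, so nested parentheses inside the properties are kept intact.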
In short, the mission of the lexer is to define the words that can be found in an IFC. It receives an array of characters (the properties of an ifc item as text) and outputs an array of tokens.
As we have seen, an IFC is plain text, that is, a long sequence of characters. The lexer defines several tokens or words that make up the vocabulary. Actually, tokens are just small regular expressions that recognize distinguishable units of text to be parsed. For example, booleans in IFC are expressed as .T. (true) and .F. (false). The token for recognizing booleans in IFC would be as follows (following chevrotain's syntax):
// newToken is the project's token factory (in chevrotain itself this is createToken)
const booleanToken = newToken({
  name: "BooleanToken",
  pattern: /\.T\.|\.F\./,
});
So, every time the lexer sees .T. or .F. in a text, it recognizes it as a token. For example, an input text like .F..T..T. would be converted into the following sequence: BooleanToken BooleanToken BooleanToken. Note that the name of the token is not important.
There might be text that needs to be ignored; for example, space characters. This is defined using the chevrotain.Lexer.SKIPPED flag, which will create tokens that will be ignored by the parser:
const spaceToken = newToken({
  name: "SpaceToken",
  pattern: /\s+/,
  group: chevrotain.Lexer.SKIPPED,
});
This is all the lexer does; it defines one token for each recognizable unit of text that can be found within an IFC. But instead of defining the tokens one by one (which would be very repetitive), there is an iteration through an object that defines all the names and regular expressions for each token. Actually, there are two objects: one for the read tokens and another for the ignored tokens.
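The table-driven idea can be sketched without any library. The example below is a plain regular expression stand-in, not chevrotain and not the project's real vocabulary: the token tables and the names `readTokens`, `ignoredTokens`, `buildDefinitions` and `tokenize` are all assumptions made for illustration.

```javascript
// Hypothetical token tables: name -> pattern.
const readTokens = {
  Boolean: /\.T\.|\.F\./,
  Number: /-?\d+(\.\d*)?/,
  OpenPar: /\(/,
  ClosePar: /\)/,
  Comma: /,/,
  Default: /\$/,
};
const ignoredTokens = { Space: /\s+/ };

// Build one definition per table entry. The sticky flag ("y") makes each
// pattern match only at the position stored in lastIndex.
function buildDefinitions(table, skipped) {
  return Object.entries(table).map(([name, pattern]) => ({
    name,
    skipped,
    pattern: new RegExp(pattern.source, "y"),
  }));
}

const definitions = [
  ...buildDefinitions(readTokens, false),
  ...buildDefinitions(ignoredTokens, true),
];

function tokenize(text) {
  const tokens = [];
  let position = 0;
  while (position < text.length) {
    const found = definitions.find((definition) => {
      definition.pattern.lastIndex = position;
      return definition.pattern.test(text);
    });
    if (!found) throw new Error(`Unexpected character at position ${position}`);
    // After a successful sticky match, lastIndex points just past the token.
    const end = found.pattern.lastIndex;
    if (!found.skipped) {
      tokens.push({ name: found.name, image: text.slice(position, end) });
    }
    position = end;
  }
  return tokens;
}

console.log(tokenize(".F. .T..T.").map((token) => token.name));
```

Adding a new token then only requires a new entry in one of the two tables; ignored tokens are matched like any other but never reach the output array.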
As you may know, the patterns (tokens) to be found within the properties of an ifc item depend on the data type of each property. For example, a property of type number set always looks like this: (2.,1.), whereas a property of type boolean always looks like this: .T.. Therefore, most of the tokens defined in the lexer correspond to a data type. For this reason, the name of the created tokens is linked to a JS object listing all the data types. This object is simply used as an enum to ensure nomenclature consistency whenever data types are referred to. Beware: here, data type only refers to the pattern of the text that represents the property, that is, what a property is supposed to look like.
Now that the vocabulary has been defined, it is necessary to create syntactic rules. The term syntactic rule simply means a conditional structure of words / tokens. To be able to parse any text, we have to tell the software what the structure of the text is. For example, imagine that we want to parse a sum like the following:
1+2
In this case, the structure is really simple: NumberToken PlusSignToken NumberToken. Previously the lexer had converted the text to an array of tokens. In this step we are telling the parser the sequence of tokens we are expecting, so it can group tokens together into recognizable structures (in this case, a sum expression). Obviously, this example is really easy, but let's go back to the example of a point in space:
#6= IFCCARTESIANPOINT((0.,0.,0.));
As mentioned before, the parser is only parsing the properties part, which in this case is (0.,0.,0.). In the lexer we have defined tokens like number, comma, closingParenthesis, openingParenthesis, etc. To build the syntax, we have to construct a structure that is able to match the pattern of a property of type number set. At a high level, it should be something like:
number set = 1 OpenParenthesisToken + 1 or more (NumberToken + (optional) CommaToken) + 1 CloseParenthesisToken
The CommaToken is optional because the last number of the set is not followed by a comma. How can we express this pattern in chevrotain? The syntax is defined here and looks like this:
// Check chevrotain's documentation for further details about the syntax
function NumberSet_Parser($) {
  return () => {
    $.OR([
      {
        // First alternative: an explicit set such as (3.1,4.23)
        ALT: () => {
          $.CONSUME(v.OpenPar);
          $.MANY(() => {
            $.CONSUME(v[d.number]);
            $.OPTION(() => {
              $.CONSUME(v.Comma);
            });
          });
          $.CONSUME(v.ClosePar);
        },
      },
      {
        // Second alternative: $ (the number set is not defined)
        ALT: () => {
          $.CONSUME(v[d.default]);
        },
      },
    ]);
    // Optional comma that follows the property
    $.OPTION2(() => {
      $.CONSUME2(v.Comma);
    });
  };
}
Even though at first glance this might look intimidating, it is simply a conditional tree of tokens, which is a way of creating complex regular expressions using the tokens as building blocks. For example, the OR structure defines a syntax that can take different forms, with each ALT being one alternative. Note that here, for example, there is an OR statement first that states that a property of type number set can either be something like (3.1,4.23) or something like $ (when the number set is not defined).
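For readers who are new to chevrotain, the same rule can be written as a small hand-written function. This is only an illustrative equivalent and not how the project does it: the token shape ({ name, image }) and the names `parseNumberSet`, `peek` and `consume` are assumptions.

```javascript
// Hand-written equivalent of the number set rule, for illustration only.
// `tokens` is an array of { name, image } objects, as a lexer would produce.
function parseNumberSet(tokens) {
  let position = 0;
  const peek = () => tokens[position];
  const consume = (name) => {
    const token = peek();
    if (!token || token.name !== name) {
      throw new Error(`Expected ${name} at token ${position}`);
    }
    position++;
    return token;
  };

  let value;
  if (peek() && peek().name === "Default") {
    // Second alternative: "$" means the number set is not defined.
    consume("Default");
    value = undefined;
  } else {
    // First alternative: "(", then zero or more numbers, each one
    // optionally followed by a comma, then ")".
    value = [];
    consume("OpenPar");
    while (peek() && peek().name === "Number") {
      value.push(parseFloat(consume("Number").image));
      if (peek() && peek().name === "Comma") consume("Comma");
    }
    consume("ClosePar");
  }
  // An optional comma may separate this property from the next one.
  if (peek() && peek().name === "Comma") consume("Comma");
  return { value, consumed: position };
}

const tokens = [
  { name: "OpenPar", image: "(" },
  { name: "Number", image: "3.1" },
  { name: "Comma", image: "," },
  { name: "Number", image: "4.23" },
  { name: "ClosePar", image: ")" },
];
console.log(parseNumberSet(tokens).value); // [ 3.1, 4.23 ]
```

The two branches of the if statement play the role of the two ALT alternatives, the while loop corresponds to MANY, and each conditional consume of a comma corresponds to an OPTION.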
It is important to note that in this structure all the possibilities have to be covered: if the parser finds something that is not expected in this structure, it will throw an error and will be incapable of parsing the text correctly. Also note that this big and verbose chunk of code is only for parsing number sets. Of course, this is great if we are parsing something like an IfcCartesianPoint or an IfcDirection, which only have one parameter of type number set. But imagine that we want to parse an instance of type IfcProject, which looks like this:
#119= IFCPROJECT('1Xsibz5yH5MxqU$tFrpDk0',#41,'0001',$,$,'Name of project','State of project',(#111),#106);
Notice how big (and unmaintainable) this structure would become; perhaps hundreds of lines just for one ifc class (IfcProject in this case). And yes, we have to define a syntactic structure to be able to parse every ifc class, so this is a problem. This is the reason why in the parser primitives module I define one syntactic structure per data type that can be found in an IFC. The idea is that instead of writing a structure for each ifc class by hand, I can use these basic syntactic structures as building blocks to create the syntactic structure for each ifc class dynamically at runtime. This is explained in the next point.
Basically, now we have a bunch of syntactic structures; specifically, one per data type. How can we build a structure for each ifc class using this? To me, the answer is the strategy pattern. As mentioned above, the items in the IFC can be seen as a sequence of properties, and each property has a predefined type. For example, IfcProject has the following properties:
#119= IFCPROJECT(GlobalId, OwnerHistory, Name, Description, ObjectType, LongName, Phase, RepresentationContexts, UnitsInContext);
These properties are of type:
#119= IFCPROJECT(guid, id, text, text, text, text, text, id set, id);
So, by defining a factory function that creates syntactic structures from an array of data types, I was able to define every ifc class simply as an array of data types and let the factory create the syntactic structures for them automatically. This is something you can see in any of the files in the models folder. Notice how easy it is to add new ifc classes. The newObject function is simply constructing the parser map, which maps each ifc class to its corresponding syntactic structure. And this is the key of this implementation, explained in the following point.
Notice that in the models folder some of the properties have names imported from the namedProps object. This is simply used as enum for ensuring the naming consistency.
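The factory idea described above can be sketched in a few lines. This is a simplified stand-in that works on already separated property strings instead of chevrotain rules; the names `primitives`, `createClassParser` and `parserMap` as used here, and the idSet data type, are assumptions for illustration and do not mirror the code in the models folder.

```javascript
// One tiny parsing strategy per data type. Each one receives the raw text
// of a single property and returns a JavaScript value.
const primitives = {
  guid: (raw) => raw.slice(1, -1),
  text: (raw) => (raw === "$" ? undefined : raw.slice(1, -1)),
  id: (raw) => Number(raw.slice(1)),
  idSet: (raw) => raw.slice(1, -1).split(",").map((ref) => Number(ref.slice(1))),
};

// Factory: turn an ordered list of [name, dataType] pairs into a parser.
function createClassParser(schema) {
  return (rawProperties) => {
    const result = {};
    schema.forEach(([name, dataType], index) => {
      result[name] = primitives[dataType](rawProperties[index]);
    });
    return result;
  };
}

// The parser map associates each ifc class with its generated parser.
const parserMap = {
  IFCPROJECT: createClassParser([
    ["GlobalId", "guid"],
    ["OwnerHistory", "id"],
    ["Name", "text"],
    ["Description", "text"],
    ["ObjectType", "text"],
    ["LongName", "text"],
    ["Phase", "text"],
    ["RepresentationContexts", "idSet"],
    ["UnitsInContext", "id"],
  ]),
};

const ifcProject = parserMap.IFCPROJECT([
  "'1Xsibz5yH5MxqU$tFrpDk0'", "#41", "'0001'", "$", "$",
  "'Name of project'", "'State of project'", "(#111)", "#106",
]);
console.log(ifcProject);
```

Supporting another ifc class then amounts to adding one more entry to the map with its list of data types, which is the main benefit of choosing a strategy per data type.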
All the previous points come together in the parser process module. This is the module that orchestrates everything and that reflects the steps explained above:
- The arguments are the ifc properties and the ifc type (both strings) of the item to parse.
- The lexer converts the ifc properties to an array of tokens.
- The parser applies the syntactic structure corresponding to the given ifc type.
The output is a chevrotain structure that contains the parsed information. Finally, this information is retrieved by the visitor, which is an object defined following the chevrotain architecture explained in the following point.
(WIP)