CIMD

Architecture

Overview

The core of cimd is the parser. Every feature of cimd is built on top of the parser.

Parser

The parser indexes the input XML and creates an “address book” of all the CIM objects. The core idea of CIMD is that parsing does not store content, only references to the content. These are the steps:

Find all start and end positions of the XML tags in the text. This produces a list of TagBoundary; see src/tag_index.zig for the code. This step leverages single instruction, multiple data (SIMD) operations for speed.

pub const TagBoundary = struct {
    /// Position of '<' character
    start: u32,
    /// Position of '>' character
    end: u32,
};
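
To make this step concrete, here is a minimal sketch of a tag scan. It is not the code in src/tag_index.zig: the function name `indexTags`, the 16-byte chunk width, and the caller-provided output buffer are assumptions made for illustration. The vector compare is only used to skip chunks that contain no `<` or `>`; chunks that do contain one are walked byte by byte. Comments, CDATA sections, and `>` inside attribute values are ignored here. The builtin syntax assumes Zig 0.11 or later.

```zig
const std = @import("std");

pub const TagBoundary = struct {
    /// Position of '<' character
    start: u32,
    /// Position of '>' character
    end: u32,
};

// Assumed chunk width; a real implementation may pick a different one.
const lanes = 16;
const Chunk = @Vector(lanes, u8);

/// Hypothetical helper: writes tag boundaries into `out` and returns how
/// many were found. Stops early if `out` is full.
pub fn indexTags(xml: []const u8, out: []TagBoundary) usize {
    const lt: Chunk = @splat('<');
    const gt: Chunk = @splat('>');
    var count: usize = 0;
    var open: ?u32 = null; // position of the last unmatched '<'
    var i: usize = 0;
    while (i < xml.len) {
        const step: usize = @min(lanes, xml.len - i);
        if (step == lanes) {
            // Compare 16 bytes at once and skip the chunk if it has no bracket.
            const chunk: Chunk = xml[i..][0..lanes].*;
            const has_bracket = @reduce(.Or, chunk == lt) or @reduce(.Or, chunk == gt);
            if (!has_bracket) {
                i += lanes;
                continue;
            }
        }
        // Scalar pass over a chunk that contains a bracket, or over the tail.
        for (xml[i .. i + step], i..) |byte, pos| {
            switch (byte) {
                '<' => open = @as(u32, @intCast(pos)),
                '>' => if (open) |s| {
                    if (count == out.len) return count;
                    out[count] = TagBoundary{ .start = s, .end = @as(u32, @intCast(pos)) };
                    count += 1;
                    open = null;
                },
                else => {},
            }
        }
        i += step;
    }
    return count;
}
```

Most bytes of a CGMES file are tag names, attribute values, and text rather than brackets, which is why skipping bracket-free chunks in one comparison tends to pay off.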

For each TagBoundary, check whether the tag is a CIM object (i.e. not a property or reference). For each CIM object, find the position of its closing (or self-closing) TagBoundary, and store its element type (e.g. ACLineSegment) and its RDF ID. This produces a list of CimObject. Note that each CimObject stores a zero-copy reference to the original XML and to the list of TagBoundary.

pub const CimObject = struct {
    xml: []const u8,
    boundaries: []const TagBoundary,

    object_tag_idx: u32,
    closing_tag_idx: u32,

    id: []const u8,
    type_name: []const u8,
};
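
As an illustration of the classification step, the sketch below pulls the element type and the RDF ID out of a single opening tag. The helper names `typeName` and `rdfId` are hypothetical and not taken from the cimd source. The sketch assumes that object tags carry an `rdf:ID` or `rdf:about` attribute with a double-quoted value, and treats a tag without either as a property or reference. Both functions return slices of the input, so nothing is copied.

```zig
const std = @import("std");

/// Hypothetical helper: returns the element name without its namespace
/// prefix, e.g. "ACLineSegment" for `<cim:ACLineSegment rdf:ID="_a1">`.
/// Assumes `tag` starts with '<'.
fn typeName(tag: []const u8) []const u8 {
    const body = tag[1..]; // skip '<'
    const name_end = std.mem.indexOfAny(u8, body, " \t\r\n/>") orelse body.len;
    const qualified = body[0..name_end];
    const colon = std.mem.indexOfScalar(u8, qualified, ':') orelse return qualified;
    return qualified[colon + 1 ..];
}

/// Hypothetical helper: returns the value of rdf:ID or rdf:about, or null
/// if the tag has neither (treated here as "not a CIM object").
fn rdfId(tag: []const u8) ?[]const u8 {
    const keys = [_][]const u8{ "rdf:ID=\"", "rdf:about=\"" };
    for (keys) |key| {
        const at = std.mem.indexOf(u8, tag, key) orelse continue;
        const value_start = at + key.len;
        const value_end = std.mem.indexOfScalarPos(u8, tag, value_start, '"') orelse return null;
        return tag[value_start..value_end];
    }
    return null;
}
```

Because the returned slices point into the original buffer, they stay valid only as long as the XML text itself is kept alive.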

Note that at this point, we have reduced the data from potentially millions of bytes to an index of thousands of CimObjects.

Create the CimModel, containing the following two hash maps:

RDF ID -> index in TagBoundary list;
Type name -> TypeRange (start and length of a contiguous range of indices);

pub const CimModel = struct {
    objects: []CimObject,
    id_to_index: std.StringHashMap(u32),
    type_index: std.StringHashMap(TypeRange),

    xml: []const u8,
    boundaries: []TagBoundary,

    const TypeRange = struct { start: u32, len: u32 };
};
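
The two maps could be built along the following lines. This is a sketch under stated assumptions, not the actual CimModel construction: `Entry`, `buildIdIndex`, and `buildTypeIndex` are hypothetical names, the ID map here points at positions in the `entries` slice, and the type map assumes that entries of the same type sit next to each other so that one (start, len) pair can describe them.

```zig
const std = @import("std");

const TypeRange = struct { start: u32, len: u32 };

/// Hypothetical stand-in for the CimObject fields the maps need.
const Entry = struct { id: []const u8, type_name: []const u8 };

/// Maps each ID to the position of its entry.
/// Keys are borrowed slices; the map does not copy them.
fn buildIdIndex(allocator: std.mem.Allocator, entries: []const Entry) !std.StringHashMap(u32) {
    var map = std.StringHashMap(u32).init(allocator);
    errdefer map.deinit();
    for (entries, 0..) |entry, i| {
        try map.put(entry.id, @as(u32, @intCast(i)));
    }
    return map;
}

/// Maps each type name to one contiguous range of entries.
/// Assumes entries of the same type are adjacent.
fn buildTypeIndex(allocator: std.mem.Allocator, entries: []const Entry) !std.StringHashMap(TypeRange) {
    var map = std.StringHashMap(TypeRange).init(allocator);
    errdefer map.deinit();
    var start: usize = 0;
    while (start < entries.len) {
        const name = entries[start].type_name;
        // Advance `end` to the first entry with a different type name.
        var end = start + 1;
        while (end < entries.len and std.mem.eql(u8, entries[end].type_name, name)) {
            end += 1;
        }
        try map.put(name, TypeRange{
            .start = @as(u32, @intCast(start)),
            .len = @as(u32, @intCast(end - start)),
        });
        start = end;
    }
    return map;
}
```

With this layout, "all objects of type X" is a single hash lookup followed by a slice over a contiguous range, with no per-object allocation.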

Note that since CimModel mostly stores integers (i.e. positions in the text or in lists), the resulting index has a tiny memory footprint.

The parsing step reduces massive CGMES profiles to lists of indices, so the next steps can deal with lists of hundreds of objects instead of millions of raw characters. All features of cimd use these reduced lists and hashmaps.

The remaining features use this index lazily: actual content is looked up only when it is needed.
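
A lazy lookup might look like the sketch below. The `getProperty` method is a hypothetical accessor added for illustration and may not match the real API. It walks only the tag boundaries between an object's opening and closing tag and returns the text of the first matching property as a slice of the original XML. It handles text-valued properties only; references such as `rdf:resource` attributes would need separate handling.

```zig
const std = @import("std");

pub const TagBoundary = struct { start: u32, end: u32 };

pub const CimObject = struct {
    xml: []const u8,
    boundaries: []const TagBoundary,

    object_tag_idx: u32,
    closing_tag_idx: u32,

    id: []const u8,
    type_name: []const u8,

    /// Hypothetical accessor: returns the text of the property element
    /// called `name` (e.g. "cim:IdentifiedObject.name"), or null if absent.
    pub fn getProperty(self: CimObject, name: []const u8) ?[]const u8 {
        var i = self.object_tag_idx + 1;
        while (i < self.closing_tag_idx) : (i += 1) {
            const b = self.boundaries[i];
            const tag = self.xml[b.start + 1 .. b.end]; // tag without '<' and '>'
            if (!std.mem.startsWith(u8, tag, name)) continue;
            // Reject longer names that merely share the prefix.
            if (tag.len > name.len and tag[name.len] != ' ' and tag[name.len] != '/') continue;
            // The value runs from just after '>' to the next tag's '<'.
            const next = self.boundaries[i + 1];
            return self.xml[b.end + 1 .. next.start];
        }
        return null;
    }
};
```

Nothing is decoded or copied until a caller asks for a specific property, and even then the result is just a view into the original text.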

Using the CimModel, all sorts of hash maps can be built that model the relationships in the grid model. This approach yields tight, hot loops that use the CPU efficiently and keep the memory footprint small.