CIMD
Architecture
Overview
The core of cimd is the parser: every other feature of cimd is built on top of it.
Parser
The parser indexes the input XML and builds an “address book” of all the CIM objects. The core idea of cimd is that parsing does not store any content; it only stores references to the content. The steps are:
- Find the start and end positions of all XML tags in the text. This creates a list of TagBoundary. See src/tag_index.zig for the code. This step leverages single instruction, multiple data (SIMD) instructions for a speedup.
pub const TagBoundary = struct {
    /// Position of '<' character
    start: u32,
    /// Position of '>' character
    end: u32,
};
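The tag-scanning step can be sketched as follows. This is not the code in src/tag_index.zig but a minimal illustration under stated assumptions: findTagBoundaries and Pairer are hypothetical names, the caller supplies an output buffer that is large enough, and '>' is assumed never to occur inside attribute values, comments or CDATA. The SIMD part compares 16 bytes at a time against '<' and '>' and skips every chunk that contains neither, so only chunks with tag characters are inspected byte by byte.

```zig
const std = @import("std");

pub const TagBoundary = struct {
    /// Position of '<' character
    start: u32,
    /// Position of '>' character
    end: u32,
};

/// Hypothetical state machine: pairs each '<' with the next '>'.
const Pairer = struct {
    out: []TagBoundary,
    count: usize = 0,
    /// Position of the pending '<', if any.
    open: ?u32 = null,

    fn visit(self: *Pairer, c: u8, pos: usize) void {
        if (c == '<') {
            self.open = @as(u32, @intCast(pos));
        } else if (c == '>') {
            // A '>' without a pending '<' is character data and is ignored.
            if (self.open) |start| {
                self.out[self.count] = .{ .start = start, .end = @intCast(pos) };
                self.count += 1;
                self.open = null;
            }
        }
        // Any other byte is ignored.
    }
};

/// Hypothetical helper: writes one TagBoundary per tag into `out` and returns how many were found.
/// Assumes `out` is large enough and that '>' never occurs inside attribute values or comments.
pub fn findTagBoundaries(xml: []const u8, out: []TagBoundary) usize {
    const lanes = 16;
    const Chunk = @Vector(lanes, u8);
    const lt: Chunk = @splat(@as(u8, '<'));
    const gt: Chunk = @splat(@as(u8, '>'));

    var p = Pairer{ .out = out };
    var i: usize = 0;
    // SIMD fast path: compare 16 bytes at once and skip chunks without any tag character.
    while (i + lanes <= xml.len) : (i += lanes) {
        const chunk: Chunk = xml[i..][0..lanes].*;
        if (!@reduce(.Or, chunk == lt) and !@reduce(.Or, chunk == gt)) continue;
        for (i..i + lanes) |pos| p.visit(xml[pos], pos);
    }
    // Scalar tail for the last bytes that do not fill a whole chunk.
    while (i < xml.len) : (i += 1) p.visit(xml[i], i);
    return p.count;
}
```

Because TagBoundary only holds two u32 positions, the output of this step is a flat array of integers, which is what the later steps index into.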
- For each TagBoundary, check whether the tag is a CIM object (i.e. not a property or a reference). For each CIM object, find the position of its closing (or self-closing) tag in the TagBoundary list, and store its element type (e.g. ACLineSegment) and its rdf ID. This creates a list of CimObject. Note that each CimObject stores a zero-copy reference to the original XML and to the list of TagBoundary.
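pub const CimObject = struct {
    /// The original XML input (not copied)
    xml: []const u8,
    /// The list of all tag boundaries
    boundaries: []const TagBoundary,
    /// Index of the object's start tag in `boundaries`
    object_tag_idx: u32,
    /// Index of the object's closing (or self-closing) tag in `boundaries`
    closing_tag_idx: u32,
    /// rdf ID of the object
    id: []const u8,
    /// Element type, e.g. ACLineSegment
    type_name: []const u8,
};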
Note that at this point, we have reduced the data from potentially millions of bytes to an index of thousands of CimObjects.
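The object-detection step might look roughly like the sketch below. It is an illustration, not cimd's implementation, and it rests on assumptions: an object tag is an opening tag whose name contains no '.' and that carries an rdf:ID attribute (properties such as cim:IdentifiedObject.name contain a '.'; objects identified only by rdf:about are not handled), objects are not nested, so the first matching end tag closes the object, and the caller supplies the output buffer. findObjects, tagName, rdfId and findClosing are hypothetical helpers. The id and type_name fields are slices into xml, so nothing is copied.

```zig
const std = @import("std");

pub const TagBoundary = struct { start: u32, end: u32 };

pub const CimObject = struct {
    xml: []const u8,
    boundaries: []const TagBoundary,
    object_tag_idx: u32,
    closing_tag_idx: u32,
    id: []const u8,
    type_name: []const u8,
};

/// Element name of a tag, e.g. "cim:ACLineSegment" for `<cim:ACLineSegment rdf:ID="_1">`.
fn tagName(tag: []const u8) []const u8 {
    var end: usize = 1;
    while (end < tag.len and !std.ascii.isWhitespace(tag[end]) and tag[end] != '/' and tag[end] != '>') : (end += 1) {}
    return tag[1..end];
}

/// Value of the rdf:ID attribute as a slice into the tag (no copy).
fn rdfId(tag: []const u8) ?[]const u8 {
    const needle = "rdf:ID=\"";
    const at = std.mem.indexOf(u8, tag, needle) orelse return null;
    const start = at + needle.len;
    const end = std.mem.indexOfScalarPos(u8, tag, start, '"') orelse return null;
    return tag[start..end];
}

/// Index of the tag that closes the object opened at `open_idx` (itself if self-closing).
fn findClosing(xml: []const u8, boundaries: []const TagBoundary, open_idx: usize, name: []const u8) ?usize {
    if (xml[boundaries[open_idx].end - 1] == '/') return open_idx;
    var k = open_idx + 1;
    while (k < boundaries.len) : (k += 1) {
        const tag = xml[boundaries[k].start .. boundaries[k].end + 1];
        if (tag[1] == '/' and std.mem.eql(u8, tagName(tag[1..]), name)) return k;
    }
    return null;
}

/// Hypothetical object-detection pass: writes into `out` and returns the number of objects.
pub fn findObjects(xml: []const u8, boundaries: []const TagBoundary, out: []CimObject) usize {
    var count: usize = 0;
    for (boundaries, 0..) |b, idx| {
        const tag = xml[b.start .. b.end + 1];
        // Skip end tags, <?xml ...?> declarations and <!-- comments -->.
        if (tag[1] == '/' or tag[1] == '?' or tag[1] == '!') continue;
        const name = tagName(tag);
        // Properties and references are named Class.property, e.g. cim:IdentifiedObject.name.
        if (std.mem.indexOfScalar(u8, name, '.') != null) continue;
        const id = rdfId(tag) orelse continue;
        const closing = findClosing(xml, boundaries, idx, name) orelse continue;
        const colon = std.mem.indexOfScalar(u8, name, ':');
        out[count] = .{
            .xml = xml,
            .boundaries = boundaries,
            .object_tag_idx = @intCast(idx),
            .closing_tag_idx = @intCast(closing),
            .id = id,
            // Strip the namespace prefix: "cim:ACLineSegment" -> "ACLineSegment".
            .type_name = if (colon) |c| name[c + 1 ..] else name,
        };
        count += 1;
    }
    return count;
}
```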
- Create the CimModel, which contains the following two hash maps:
  - Rdf ID -> index in the TagBoundary list;
  - Type name -> TypeRange, a contiguous range of indices given by its start and length.
pub const CimModel = struct {
    objects: []CimObject,
    id_to_index: std.StringHashMap(u32),
    type_index: std.StringHashMap(TypeRange),
    xml: []const u8,
    boundaries: []TagBoundary,

    const TypeRange = struct { start: u32, len: u32 };
};
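Filling the two maps can be sketched as below. It assumes (this is not stated above) that the objects are first sorted by type name so that every type occupies a contiguous run, which is what a TypeRange of start and length suggests. Following the text, the ID map stores the index of the object's start tag in the TagBoundary list. CimObject is trimmed to the fields the sketch needs, and buildIndex and byType are hypothetical names.

```zig
const std = @import("std");

/// Trimmed-down CimObject: only the fields this sketch reads.
const CimObject = struct {
    object_tag_idx: u32,
    id: []const u8,
    type_name: []const u8,
};

const TypeRange = struct { start: u32, len: u32 };

fn byType(_: void, a: CimObject, b: CimObject) bool {
    return std.mem.lessThan(u8, a.type_name, b.type_name);
}

/// Hypothetical helper: sorts `objects` by type in place and fills both lookup maps.
/// Keys are the existing slices (no copies); values are plain integers.
fn buildIndex(
    objects: []CimObject,
    id_to_index: *std.StringHashMap(u32),
    type_index: *std.StringHashMap(TypeRange),
) !void {
    // Grouping by type turns every type into a contiguous run of objects.
    std.mem.sort(CimObject, objects, {}, byType);
    var run_start: usize = 0;
    for (objects, 0..) |obj, i| {
        // rdf ID -> position of the object's start tag in the TagBoundary list.
        try id_to_index.put(obj.id, obj.object_tag_idx);
        const last_of_run = i + 1 == objects.len or
            !std.mem.eql(u8, objects[i + 1].type_name, obj.type_name);
        if (last_of_run) {
            try type_index.put(obj.type_name, .{
                .start = @intCast(run_start),
                .len = @intCast(i + 1 - run_start),
            });
            run_start = i + 1;
        }
    }
}
```

std.mem.sort is a stable sort, so objects of the same type keep their document order within their range.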
Since CimModel mostly stores integers (i.e. positions in the text and in the lists), the resulting index has a tiny memory footprint.
The parsing step reduces massive CGMES profiles to lists of indices, so the subsequent steps deal with lists of thousands of objects instead of millions of raw characters. All other features of cimd use these reduced lists and hash maps lazily: they only look up the actual content when it is needed.
Using the CimModel, all sorts of hash maps can be built that model the relationships in the grid model. This approach yields tight, hot loops over compact integer arrays, which use the CPU efficiently and keep the memory footprint small.
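As an example of such a relationship map, the sketch below links each Terminal to its ConductingEquipment by reading the reference property lazily: only the tags between the terminal's start and closing tag are inspected, and the stored values are slices of the original XML. It assumes same-file references of the form rdf:resource="#_id"; reference and terminalToEquipment are hypothetical helpers, and CimObject is trimmed to the fields used here.

```zig
const std = @import("std");

const TagBoundary = struct { start: u32, end: u32 };

/// Trimmed-down CimObject with only the fields this sketch reads.
const CimObject = struct {
    object_tag_idx: u32,
    closing_tag_idx: u32,
    id: []const u8,
};

/// Hypothetical lazy accessor: scans only the tags inside `obj` and returns the
/// rdf:resource target (without '#') of the first tag named exactly `prop`.
fn reference(xml: []const u8, boundaries: []const TagBoundary, obj: CimObject, prop: []const u8) ?[]const u8 {
    var k = obj.object_tag_idx + 1;
    while (k < obj.closing_tag_idx) : (k += 1) {
        const tag = xml[boundaries[k].start .. boundaries[k].end + 1];
        if (tag.len <= prop.len + 1 or !std.mem.eql(u8, tag[1 .. 1 + prop.len], prop)) continue;
        const next = tag[1 + prop.len];
        if (next != ' ' and next != '/' and next != '>') continue; // exact name match only
        const needle = "rdf:resource=\"#";
        const at = std.mem.indexOf(u8, tag, needle) orelse continue;
        const start = at + needle.len;
        const end = std.mem.indexOfScalarPos(u8, tag, start, '"') orelse continue;
        return tag[start..end]; // slice into the original XML, nothing copied
    }
    return null;
}

/// Hypothetical relationship map: terminal rdf ID -> conducting equipment rdf ID.
fn terminalToEquipment(
    allocator: std.mem.Allocator,
    xml: []const u8,
    boundaries: []const TagBoundary,
    terminals: []const CimObject,
) !std.StringHashMap([]const u8) {
    var map = std.StringHashMap([]const u8).init(allocator);
    errdefer map.deinit();
    for (terminals) |t| {
        if (reference(xml, boundaries, t, "cim:Terminal.ConductingEquipment")) |eq| try map.put(t.id, eq);
    }
    return map;
}
```

The caller owns the returned map and frees it with deinit; since keys and values are slices, the XML buffer must outlive the map.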