
Data Flow

The goal of the MorphMapper application is for a user to construct a template once, guided by a GUI, which then automatically parses future data of similar structure. Template creation involves a data transformation flow across four distinct data structures. How the application starts depends on whether the user is developing a new template or continuing work on an existing one: it begins from either a Schema or a Save File. This initial structure is transformed (built from a Schema, or hydrated from a Save File) into a tree structure, which the application then uses to perform mutations on the underlying data. Once the user has completed their definitions, the tree is transformed (rendered) to produce the final template.

Application Flow

Schema

A schema bootstraps the initial structure of the resulting template. It defines which variables exist for a template, which are required and which are optional, how variables relate to each other, and which variables have predefined values. Different templates are available depending on the type of user; these are loaded based on environment settings (not yet implemented). A schema consists of two types:

Schema Map

A schema map outlines the framework of a template, indicating how variables should be grouped and nested. The keys of its values field map directly to the variable names in the resulting template.

type SchemaMap = {
  isArray?: boolean;
  values: {
    [key: string]: SchemaEntry | SchemaMap;
  };
};

Schema Entry

A schema entry defines how a variable’s value should be specified in a template. The validation field ensures that the template logic stored at the current variable results in the correct type during document parsing. The type field specifies the default user interface behaviour for defining a variable. The prefill function supplies a default value based on the environment; a value defined by the user overrides the prefill.

type SchemaEntry = {
  validation: z.ZodType<any, any>;
  type: EntryItemType;
  prefill?: (name: string) => string;
};

Entry Item Type

The EntryItemType is used by the UI to give users different ways to define a variable.

enum EntryItemType {
  Simple = "simple",
  Graph = "graph",
  Dict = "dict",
  Internal = "internal",
}

A simple type gives the user a field for defining a constant value. A graph type gives the user a graph view in which they can define complex logic to extract data from a document. A dict type gives the user a table in which rows can be added and removed; each row represents a variable and can be of the simple or graph type. Internal types are predefined and used by the application; they are not displayed in the UI.
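To make the schema types above concrete, here is a minimal sketch of a schema and a helper that lists the variable paths it would produce. The invoice schema, the `stringValidator`, and the `variablePaths` helper are all hypothetical and invented for illustration. The `Validator` type is a dependency-free stand-in for `z.ZodType`, assumed here only so the sketch runs on its own.

```typescript
// Stand-in for z.ZodType so the sketch runs without dependencies (assumption).
type Validator = { parse: (value: unknown) => unknown };

enum EntryItemType {
  Simple = "simple",
  Graph = "graph",
  Dict = "dict",
  Internal = "internal",
}

type SchemaEntry = {
  validation: Validator;
  type: EntryItemType;
  prefill?: (name: string) => string;
};

type SchemaMap = {
  isArray?: boolean;
  values: { [key: string]: SchemaEntry | SchemaMap };
};

// Hypothetical validator: accepts strings only.
const stringValidator: Validator = {
  parse: (value) => {
    if (typeof value !== "string") throw new Error("expected a string");
    return value;
  },
};

// Hypothetical invoice schema; every name below is invented for illustration.
const invoiceSchema: SchemaMap = {
  values: {
    invoiceNumber: { validation: stringValidator, type: EntryItemType.Graph },
    currency: {
      validation: stringValidator,
      type: EntryItemType.Simple,
      prefill: () => "EUR",
    },
    lines: {
      isArray: true,
      values: {
        description: { validation: stringValidator, type: EntryItemType.Graph },
      },
    },
  },
};

// A SchemaMap has a `values` field; a SchemaEntry does not.
function isSchemaMap(node: SchemaEntry | SchemaMap): node is SchemaMap {
  return "values" in node;
}

// List the dotted variable paths that the schema describes.
function variablePaths(map: SchemaMap, prefix = ""): string[] {
  const paths: string[] = [];
  for (const key of Object.keys(map.values)) {
    const node = map.values[key];
    if (isSchemaMap(node)) {
      const suffix = node.isArray ? "[]." : ".";
      paths.push(...variablePaths(node, prefix + key + suffix));
    } else {
      paths.push(prefix + key);
    }
  }
  return paths;
}
```

Calling `variablePaths(invoiceSchema)` walks the nested maps and returns one path per schema entry, which shows how the keys of each `values` object become variable names.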

Save File

A save file is the serialized application state. Because the application stores validation functions in its state, these are serialized by a custom function when saving and hydrated by a custom function when loading, which prevents undefined function errors.
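The custom functions themselves are not shown here, so the following is only a sketch of one way such a round trip could work. It assumes the save file is JSON (functions are dropped by `JSON.stringify`) and that each validator can be identified by a stable key in a registry. The `registry`, `serializeNode`, `hydrateNode`, and the node shapes are hypothetical names, and `Validator` is a stand-in for `z.ZodType`.

```typescript
// Stand-in for z.ZodType (assumption); only `parse` is needed here.
type Validator = { parse: (value: unknown) => unknown };

// Hypothetical validator: accepts strings only.
const stringValidator: Validator = {
  parse: (value) => {
    if (typeof value !== "string") throw new Error("expected a string");
    return value;
  },
};

// Hypothetical registry: a stable key for every validator the app knows.
const registry: { [key: string]: Validator } = { string: stringValidator };

// Simplified node shapes, invented for this sketch.
type LiveNode = { id: string; name: string; validation: Validator };
type SavedNode = { id: string; name: string; validationKey: string };

// Replace the validator with its registry key so the node survives JSON.
function serializeNode(node: LiveNode): SavedNode {
  for (const key of Object.keys(registry)) {
    if (registry[key] === node.validation) {
      return { id: node.id, name: node.name, validationKey: key };
    }
  }
  throw new Error(`no registry key for node ${node.id}`);
}

// Restore the validator from the registry when loading a save file.
function hydrateNode(saved: SavedNode): LiveNode {
  const validation = registry[saved.validationKey];
  if (validation === undefined) {
    throw new Error(`unknown validator ${saved.validationKey}`);
  }
  return { id: saved.id, name: saved.name, validation };
}

const live: LiveNode = { id: "n1", name: "currency", validation: stringValidator };
const saveFile = JSON.stringify(serializeNode(live));
const restored = hydrateNode(JSON.parse(saveFile));
```

After the round trip, `restored.validation` is a callable validator again rather than a missing field, which is the failure the custom serialization is meant to avoid.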

Tree

The tree makes up the largest portion of the global application state. Although the underlying data structure is conceptually a tree, it is stored in the application state as a flat object of key-value pairs, in which each key is the ID of the node stored under it. The tree can be navigated by traversing the parentId and childNodeIds fields.
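The navigation described above can be sketched as follows. The `FlatNode` shape is a simplification that keeps only the fields needed for traversal, and the example tree, `pathTo`, and `descendants` are hypothetical names, not the application's API.

```typescript
// Simplified node shape: only the fields needed for navigation (assumption).
type FlatNode = {
  id: string;
  name: string;
  parentId?: string;
  childNodeIds?: string[];
};
type FlatTree = { [id: string]: FlatNode };

// Hypothetical tree; each key is the ID of the node stored under it.
const exampleTree: FlatTree = {
  root: { id: "root", name: "invoice", childNodeIds: ["a", "b"] },
  a: { id: "a", name: "currency", parentId: "root" },
  b: { id: "b", name: "lines", parentId: "root", childNodeIds: ["c"] },
  c: { id: "c", name: "description", parentId: "b" },
};

// Walk upward via parentId and return the names from the root down to the node.
function pathTo(tree: FlatTree, id: string): string[] {
  const names: string[] = [];
  let current: FlatNode | undefined = tree[id];
  while (current !== undefined) {
    names.unshift(current.name);
    current = current.parentId === undefined ? undefined : tree[current.parentId];
  }
  return names;
}

// Walk downward via childNodeIds and return all descendant IDs, depth-first.
function descendants(tree: FlatTree, id: string): string[] {
  const result: string[] = [];
  for (const childId of tree[id]?.childNodeIds ?? []) {
    result.push(childId, ...descendants(tree, childId));
  }
  return result;
}
```

Because every node is reachable by ID in one lookup, both directions of traversal reduce to repeated lookups in the same object.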

Tree nodes come in two types: TreeItem nodes and TreeMap nodes. TreeItem nodes hold the definitions for how a single key-value pair should be implemented, so they act as local definitions. TreeMap nodes define the framework of the resulting template and are responsible for environment variable and iterator definitions, so they have global scope over their children.

type TreeNode = TreeItem | TreeMap;
type TreeItem =
  | TreeGraphItem
  | TreeDictItem
  | TreeSimpleItem
  | TreeInternalItem;

Tree Item Nodes

type TreeGraphItem = {
  id: string;
  name: string;
  validation: z.ZodType<any, any>;
  graphId: string | undefined;
  computed: any;
  type: EntryItemType.Graph;
  parentId: string;
};

type TreeDictItem = {
  id: string;
  name: string;
  validation: z.ZodType<any, any>;
  type: EntryItemType.Dict;
  variables: {
    [key: string]: DictVariable;
  };
  parentId: string;
};

type TreeSimpleItem = {
  id: string;
  name: string;
  validation: z.ZodType<any, any>;
  type: EntryItemType.Simple;
  value: any;
  parentId: string;
};

type TreeInternalItem = {
  id: string;
  name: string;
  validation: z.ZodType<any, any>;
  type: EntryItemType.Internal;
  parentId: string;
};

Tree Map Nodes

type TreeMap = {
  id: string;
  name: string;
  validation: z.ZodType<any, any> | undefined;
  context: MapContext;
  computed: any;
  options: {
    isArray: boolean;
  };
  parentId: string | undefined;
  childNodeIds: string[];
};
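As an illustration of how a schema could be built into this flat tree, the sketch below walks a schema depth-first and emits one node per key. It is not the application's build step: the types are simplified (validation, prefill, context, and computed are omitted), the `node-N` ID scheme is an assumption chosen to keep the output deterministic, and `buildTree` is a hypothetical name.

```typescript
// Simplified schema types: validation and prefill are omitted (assumption).
type SchemaEntry = { type: "simple" | "graph" | "dict" | "internal" };
type SchemaMap = {
  isArray?: boolean;
  values: { [key: string]: SchemaEntry | SchemaMap };
};

// Simplified tree nodes: context, computed and validation are omitted.
type ItemNode = {
  id: string;
  name: string;
  type: SchemaEntry["type"];
  parentId: string;
};
type MapNode = {
  id: string;
  name: string;
  options: { isArray: boolean };
  parentId: string | undefined;
  childNodeIds: string[];
};
type FlatTree = { [id: string]: ItemNode | MapNode };

// Hypothetical build step: one node per schema key, stored under its ID.
function buildTree(schema: SchemaMap, rootName = "root"): FlatTree {
  const tree: FlatTree = {};
  let nextId = 0; // sequential IDs keep the sketch deterministic

  function visit(
    node: SchemaEntry | SchemaMap,
    name: string,
    parentId: string | undefined
  ): string {
    const id = `node-${nextId++}`;
    if ("values" in node) {
      // Schema maps become map nodes that track their children by ID.
      const childNodeIds: string[] = [];
      tree[id] = {
        id,
        name,
        options: { isArray: node.isArray === true },
        parentId,
        childNodeIds,
      };
      for (const key of Object.keys(node.values)) {
        childNodeIds.push(visit(node.values[key], key, id));
      }
    } else {
      // Schema entries become item nodes and always need a parent map.
      if (parentId === undefined) throw new Error("an entry needs a parent map");
      tree[id] = { id, name, type: node.type, parentId };
    }
    return id;
  }

  visit(schema, rootName, undefined);
  return tree;
}

const built = buildTree({
  values: {
    currency: { type: "simple" },
    lines: { isArray: true, values: { description: { type: "graph" } } },
  },
});
```

In this sketch only the root map has an undefined `parentId`, matching the `string | undefined` type on TreeMap, while every item node carries a required `parentId`.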