Two level modeling and Data Oriented Programming

Data-Oriented Programming (DOP) is a programming paradigm that simplifies the design of software in which information is the focus. Instead of creating classes/entities whose code works on the instantiated objects, DOP encourages separating code from data.

openEHR is a clinically oriented data platform which keeps its data definitions in archetypes. The concept can be seen in a wider perspective as Two Level Modeling.

Data Oriented (DO) Programming

The principles of Data Oriented (DO) Programming are:

Separate code from data

Code resides in functions which do not depend on data. This is very similar to Functional Programming, but DOP is language agnostic; it can be applied in many languages.

Example: we describe a car with brand, type, and price. We have a Jaguar E-Type, and it costs $52,000.

We want some code which returns brand and type, and some code which tells whether the car is expensive. In OOP we would probably break the DOP guideline with a JavaScript class like this:

class Car {
  constructor(brand, carType, price) {
    this.brand = brand;
    this.carType = carType;
    this.price = price;
  }

  brandAndType() {
    return this.brand + " " + this.carType;
  }

  isExpensive() {
    return this.price > 50000;
  }
}

In a functional style, we would break it like this, in JavaScript:

function createCar(brand, carType, price) {
  return {
    brandAndType: function () {
      return brand + " " + carType;
    },
    isExpensive: function () {
      return price > 50000;
    },
  };
}

When we want to comply with the DOP guideline, we separate data and code. For OOP, the JavaScript code could look like this:

class CarData {
  constructor(brand, carType, price) {
    this.brand = brand;
    this.carType = carType;
    this.price = price;
  }
}

class brandTypeCalculation {
  static brandAndType(data) {
    return data.brand + " " + data.carType;
  }
}

class CarRating {
  static isExpensive(data) {
    return data.price > 50000;
  }
}

Typical here are the static functions in separate classes. Code and data are separated. The benefits are that code can be reused, tests are possible in isolation, and systems tend to become less complicated.
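To make the separation concrete, here is a small usage sketch (the class definitions are repeated so the snippet runs on its own). Note how the static functions accept any object with the right fields, so they can be tested in isolation with a plain literal:

```javascript
// Data class: holds fields only, no behaviour.
class CarData {
  constructor(brand, carType, price) {
    this.brand = brand;
    this.carType = carType;
    this.price = price;
  }
}

// Code classes: stateless functions that work on any suitable object.
class brandTypeCalculation {
  static brandAndType(data) {
    return data.brand + " " + data.carType;
  }
}

class CarRating {
  static isExpensive(data) {
    return data.price > 50000;
  }
}

const car = new CarData("Jaguar", "E-Type", 52000);

console.log(brandTypeCalculation.brandAndType(car)); // "Jaguar E-Type"
console.log(CarRating.isExpensive(car));             // true

// Testing in isolation: a plain literal works just as well,
// no constructor or class needed.
console.log(CarRating.isExpensive({ price: 10000 })); // false
```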

Represent data entities with generic data structures and how Two Level Modeling comes in

The data can be represented in generic data structures like maps, lists, collections, etc. The disadvantage of generic data structures is that the layout of the datasets is unknown. One way out is to write descriptions of the datasets, but that is quite cumbersome without supporting infrastructure.
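A short sketch of what this looks like in JavaScript: the car becomes a plain object (a map), which any generic code can traverse, but nothing enforces its layout:

```javascript
// The same car as a generic data structure (a plain object / map)
// instead of a dedicated class.
const car = {
  brand: "Jaguar",
  carType: "E-Type",
  price: 52000,
};

// Generic access works on any such map...
console.log(Object.keys(car)); // ["brand", "carType", "price"]

// ...but nothing guarantees the layout: a typo or missing field
// is only discovered at runtime.
const broken = { brand: "Jaguar", prize: 52000 }; // "prize" instead of "price"
console.log(broken.price); // undefined — no error, just wrong data
```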

This is where Two Level Modeling comes in, with archetypes. Archetypes are formal dataset descriptions, and each piece of data carries a pointer to its archetype.

Archetypes serve to describe data, but also to validate data against the constraints defined in them. An archetype has three parts. The first is the metadata: who wrote it, its purpose, its language. The second is the constraints part, for example: one data item in a set must be lower than another. The third is the ontology, which explains the dataset item by item; it can connect terminologies and give directions for other methodologies, for example message-related ones.
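As an illustration only — this is not openEHR's ADL syntax, and all names and values are hypothetical — the three parts could be sketched in JavaScript like this, together with a validation function that checks data against the constraints:

```javascript
// Hypothetical sketch of an archetype with the three parts
// described above: metadata, constraints, and ontology.
const bloodPressureArchetype = {
  metadata: {
    author: "jane.doe",                       // hypothetical values
    purpose: "Record arterial blood pressure",
    language: "en",
  },
  constraints: {
    systolic:  { min: 0, max: 300 },
    diastolic: { min: 0, max: 200 },
    // "one data item in a set must be lower than another":
    rule: (d) => d.diastolic < d.systolic,
  },
  ontology: {
    // per-item explanation, possibly connected to a terminology
    systolic:  { description: "Peak arterial pressure" },
    diastolic: { description: "Minimum arterial pressure" },
  },
};

// Data points to its archetype and can be validated against it.
function validate(data, archetype) {
  const c = archetype.constraints;
  const rangesOk = Object.keys(c)
    .filter((k) => k !== "rule")
    .every((k) => data[k] >= c[k].min && data[k] <= c[k].max);
  return rangesOk && c.rule(data);
}

const reading = { archetype: "bloodPressureArchetype", systolic: 120, diastolic: 80 };
console.log(validate(reading, bloodPressureArchetype)); // true
```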

And last but not least, archetypes add semantics to the data.

Archetypes are subsets of an overall Reference Model. You can read more about Two Level Modeling and openEHR at https://www.openehr.org

Because the purpose of this thinking is broader than clinical data, the openEHR concepts are only an example. The architectural concepts of openEHR can easily be broadened to any Reference Model and any derived archetypes.

Why is it called Two Level Modeling?

This is because data models are modeled twice: once in the Reference Model, which is a professional data modeler's job, and a second time in the archetypes, which is typically a job for domain experts.

For a way to design the Reference Model, I would advise taking a look at OCL implementations.

To explain the relation between the Reference Model and an archetype, an example:

Think of the Reference Model as a small number of main classes; in a clinical setting like openEHR these could be Observation, Action, and Evaluation. An archetype would describe a specific kind of Observation, like the boring example Blood Pressure. The Observation from the Reference Model would contain a generic data recording item; the Blood Pressure archetype would specialize this to blood pressure, with systolic and diastolic values.
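A minimal sketch of that relation, with hypothetical names (`rmType`, `archetypeId`): the Reference Model level knows only a generic Observation carrying an unspecified data item, while the archetype level narrows that item to blood pressure:

```javascript
// Reference Model level: a generic Observation. It does not know
// what kind of data it records — only that it records something.
function makeObservation(archetypeId, data) {
  return { rmType: "OBSERVATION", archetypeId, data };
}

// Archetype level: the "blood_pressure" archetype specializes the
// generic data item to systolic and diastolic values.
const bp = makeObservation("blood_pressure", {
  systolic: 120,  // mmHg
  diastolic: 80,  // mmHg
});

console.log(bp.rmType);        // "OBSERVATION"
console.log(bp.archetypeId);   // "blood_pressure"
console.log(bp.data.systolic); // 120
```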

Data are immutable

The DOP concept requires data to be immutable, and it is very strict about this. Here too the openEHR concept helps: in openEHR, data immutability is completely worked out in a versioning system. Immutability has strong benefits: functions never change data, code becomes predictable, and concurrency is safe.
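In JavaScript, this combination of immutability and versioning can be sketched with `Object.freeze` and a helper that produces a new version instead of mutating the old one (loosely mirroring the versioning idea, not openEHR's actual mechanism):

```javascript
// A "change" produces a new frozen version; the old version stays intact.
function newVersion(previous, changes) {
  return Object.freeze({ ...previous, ...changes, version: previous.version + 1 });
}

const v1 = Object.freeze({ brand: "Jaguar", price: 52000, version: 1 });

try {
  v1.price = 1; // throws in strict mode, silently ignored otherwise
} catch (e) {
  // frozen objects reject writes
}
console.log(v1.price); // 52000 — unchanged either way

const v2 = newVersion(v1, { price: 48000 });
console.log(v2.price, v2.version); // 48000 2
console.log(v1.price, v1.version); // 52000 1 — old version intact
```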

Conclusion

Two Level Modeling and DOP are a nice pair for writing high-quality software, flexible and safe. This can be the start of an architecture for a semantic, multi-purpose data storage application.


Marc Woolderink