Developing Simulators for Testing and Early Integration in the Model Driven Way

The summer is gone, the autumn is almost gone, and so many things have happened. Unfortunately, I have little time to write nowadays. I still have lots of topics that I would like to ponder together with you, so I hope that my writing frequency will improve a bit in the future.

About three weeks ago, Bits&Chips, a Dutch magazine for high-tech industries, organized a two-day conference on software product line engineering: High-Tech Product Lines 2011 (known as Practical Product Lines in previous editions).

The event had great keynotes (among others, by Jan Bosch and Linda Northrop) and there were also many good presentations (by Markus Voelter, Juha-Pekka Tolvanen and others) in the sessions during these two days. You can find the whole conference program, including most of the presentations, online here.

On the second day, Loek Cleophas and I gave a talk on applying domain-specific modeling and software product line engineering (SPLE) techniques to the development of software-in-the-loop simulators (SILS) for early testing and hardware-software integration. In our talk, we presented an overview of a domain-specific modeling language for modeling hardware configurations combined with material flow (in particular, wafer flow) in complex manufacturing systems. As an automated production mechanism, we used a code generator that generates the final source code of the executable simulators from the models.
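To make the model-to-simulator idea a bit more tangible, here is a minimal sketch in Python. It is not the DSL or the generator from our talk: the names Station, FlowModel and generate_simulator, the station names and the process times are all made up for illustration, and the "simulator" only adds up process times for wafers that pass through the route one after another.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Station:
    """A hardware element that processes one wafer at a time (hypothetical)."""
    name: str
    process_time: float  # seconds a wafer spends in this station


@dataclass
class FlowModel:
    """A hardware configuration plus the wafer route through it (hypothetical)."""
    name: str
    stations: List[Station] = field(default_factory=list)
    route: List[str] = field(default_factory=list)  # station names, in order


def generate_simulator(model: FlowModel) -> str:
    """Emit Python source for a trivial, strictly sequential simulator."""
    times = {s.name: s.process_time for s in model.stations}
    if not model.route:
        raise ValueError("the wafer route must contain at least one station")
    unknown = [step for step in model.route if step not in times]
    if unknown:
        raise ValueError(f"route refers to unknown stations: {unknown}")

    lines = [
        f"# Generated from model '{model.name}'; do not edit by hand.",
        "def simulate(wafers):",
        "    clock = 0.0",
        "    for _ in range(wafers):",
    ]
    for step in model.route:
        lines.append(f"        clock += {times[step]!r}  # {step}")
    lines.append("    return clock")
    return "\n".join(lines) + "\n"


# An illustrative model instance (all values are assumptions).
model = FlowModel(
    name="demo_track",
    stations=[Station("loader", 2.0), Station("aligner", 1.5), Station("stage", 4.0)],
    route=["loader", "aligner", "stage"],
)

source = generate_simulator(model)
namespace = {}
exec(source, namespace)            # load the generated simulator
total = namespace["simulate"](3)   # three wafers, processed one after another
print(source)
print("total time:", total)
```

The point of the sketch is the split of responsibilities: the model only states which stations exist and how wafers move through them, while everything about how the simulation executes lives in the generator.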

In the presentation below and in the abstract here, you can find the rest of the story about what this DSL looks like and how we use it. If you know of other projects that apply SPLE or MDE techniques to virtualization and/or testing, please let me know; I am very interested to hear about them.


Four Ways of Creating Domain Specific Languages

Language is a process of free creation; its laws and principles are fixed, but the manner in which the principles of generation are used is free and infinitely varied. Even the interpretation and use of words involves a process of free creation.~Noam Chomsky

The creation of a new domain specific modeling (DSM) language has never been an easy task. A primary reason is that there is no single “fit-for-all” recipe that can tell and guide us how to define DSM languages in a generic way. For the same reason, the definition of the language can also be the most interesting and exciting part of the language “creation” process. One of the most difficult and critical aspects of language design is to capture and define the various constituting elements of a DSM language. The goal here is the identification of:
  • the language concepts (otherwise known as language constructs),
  • the relationships and constraints among the concepts, and
  • the dynamics (otherwise known as execution behavior) of the language.
There are various possible ways to define the concepts of a DSM language and usually, there are also various means that can help (and/or drive) the definition of a language. Considering the means, you can use, for example:
  • knowledge and expertise of domain experts and designers,
  • existing libraries and APIs capturing the domain already,
  • documentation,
  • technology roadmap,
  • and, if it exists, the product family engineering process of an organization.
Naturally, having more input sources in the language definition process does not necessarily simplify the definition of the language - it can actually make it harder to establish the concepts of the domain due to inconsistencies, for example, between the various existing artifacts (e.g. source code vs. documentation).

Considering the possible ways, Juha-Pekka Tolvanen and Steven Kelly in [1] have identified four general types of approaches (based on more than 20 cases of DSM definition):

  1. Domain Expert’s Concept
  2. Generation Output
  3. Look and Feel of System Built
  4. Variability Space
We applied some of these approaches in the various projects that I have been involved with. In the following sections I will revisit these approaches and share some experience on their application.

1. Domain Expert’s or Developer’s Concepts (Top-Down Approach)

This style of language definition is based on identifying directly the concepts applied by domain experts and the developers who are supposed to create the models in the designed DSM language. The definition of the language is typically an interactive and iterative process together with the domain experts. In this process, the use of existing notations such as UML or BPMN can help in establishing the language; however, not all domain experts might be familiar with these notations and discussions can easily lose focus. This problem can often be overcome by the use of simple graph notations (nodes, arrows, labels, colors, etc.). Alternatively, sketching the languages in a simple textual format often works out relatively well too. Using this style of language definition, the resulting languages tend to be vertical (more narrow by nature and pertain to a certain type of industry, such as IP telephony, home automation, insurance products, etc.) rather than horizontal DSLs (technical and broad in nature).

Advantages:
The most important aspect of this approach is that it takes the concepts directly from the domain experts instead of, for example, implementation artifacts such as source code that implements a domain model. In this way, the DSM language is established first and it is mapped to one or more target execution platforms (using general-purpose programming languages) later. Hence, there is a better chance to achieve a higher level of abstraction than with bottom-up approaches. Since it is a top-down approach, there is also a better chance to achieve 100% code generation.

When one can apply this definition technique, it usually means that the domain has already been discovered and is relatively well established, especially if there are several domain experts who use a common domain vocabulary. In other words, a language designer who can apply this style of language definition has a head start over those who still need to establish the vocabulary of the domain first.

Difficulties:
Depending on the project context, it can also happen that the domain experts do not agree entirely on the definition of each domain concept. Alternatively, they may not agree on the necessity of certain concepts proposed by other experts. One way to clarify these types of ambiguities is to figure out what modeling problems can actually be solved by the proposed concept. It might also be good to check how much modeling effort the various concept alternatives will require. Both of these techniques should basically help in determining the value of the modeling concepts in question.

2. Generation Output (Bottom-Up Approach)

Another broadly applied style of language definition is based on identifying the concepts of the language indirectly by “extracting” them from existing source code (mostly written in a general-purpose programming language such as Java, C or C++). This style is typically applied when there is a large amount of legacy code that consists of recurring idioms and patterns expressing certain domain concepts (or concepts from a higher level of abstraction), and the automation of these idioms and patterns is required in future products. Another typical application of this approach is when the DSL is quickly “prototyped” by coding in a general-purpose programming language first.

Advantages:
The advantage of this language definition style is that the applied patterns and idioms in the source code show already how the domain concepts are actually used through concrete examples. In other words, the source code already contains model instances that “only” need to be extracted by the language engineers. If the idioms and patterns are well modularized in the given GPL, it is relatively easy to extract the domain concepts.

Difficulties:
It can also happen, especially in large-scale legacy systems, that the idioms and patterns are not so consistently applied, or the patterns expressing a certain domain concept are intertwined with other technological concerns (e.g. inter-process communication, profiling, error-handling, etc.). In other words, less structured and/or more complex code can also give more difficulty in isolating the domain concepts of the language. For the same reason, 100% code generation is not always possible when this style of language definition is applied, which will also impose difficulties in the verification of the models later.

The bottom-up nature of the process may pose a serious risk to reaching an abstraction level for the language that can provide a sufficient return on investment. Furthermore, dedicated domain libraries and APIs can help to further isolate the domain concepts; however, the question arises: is it worth creating an explicit (external) DSL over the existing API? Alternatively, realizing the domain as an internal DSL may be sufficient, especially if the target language can support the development of internal DSLs (e.g. Python); however, the choice of implementation technique fortunately comes only in the language realization phase.

All in all, I believe that this style of language definition can lead to DSLs that offer a relatively low to intermediate level of abstraction in the design process and help in the separation and automated construction of technological concerns during the implementation process. However, 100% code generation is not easy to achieve (and often not worth achieving) and the final languages tend to be horizontal DSLs rather than vertical ones.
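The internal DSL option mentioned above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the Rule class stands in for an existing domain API, and the names when, given and then as well as the home automation example are invented for this sketch. The technique shown is a fluent interface, where every method returns the object itself so that calls can be chained into something that reads like a small language.

```python
class Rule:
    """Stand-in for an existing domain API object: one automation rule."""

    def __init__(self, trigger):
        self.trigger = trigger
        self.conditions = []
        self.actions = []

    # Each method returns self so that calls can be chained fluently.
    def given(self, condition):
        self.conditions.append(condition)
        return self

    def then(self, action):
        self.actions.append(action)
        return self

    def describe(self):
        cond = " and ".join(self.conditions) or "always"
        return f"when {self.trigger} if {cond}: " + ", ".join(self.actions)


def when(trigger):
    """Entry point of the internal DSL (hypothetical name)."""
    return Rule(trigger)


# A "model" written in the internal DSL; it is plain Python underneath.
rule = (
    when("motion in hallway")
    .given("after sunset")
    .then("switch on hallway light")
    .then("notify phone")
)
text = rule.describe()
print(text)
```

Nothing here needs a parser, an editor or a generator, which is exactly why an internal DSL can be sufficient when the abstraction gain over the API is modest.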

3. Look and Feel of the System Built

The third style of language definition is applicable to “products whose design can be understood by seeing, touching or by hearing”. In this category of language definition, the end-user product concepts act as the modeling constructs / abstractions of the DSM language under construction. The authors of [1] give an example of a language that can be used to develop UI applications for Series 60 and Symbian-based smartphones (based on the types of widgets available on these platforms).

To tell the truth, this is a type of language definition that I have not encountered in my work, since I am involved with a completely different type of system. So if you are looking for more experience with this style of language definition, I recommend looking into [1].

4. Variability Space

The last style of language definition is based on expressing variability (and commonality) of a product line. By using this approach, the concepts of the resulting DSL can capture the complete variability space of the product line - that is, the product assets and their possible composition. Since the DSM language focuses on expressing variations, the resulting models strongly resemble configurations of products. DSLs defined by this style are strongly declarative (DSLs are declarative in general): models typically describe what problem they solve instead of showing how they solve it. The language concepts are typically identified by using a systematic commonality/variability analysis - for example, see SCV analysis presented in [2]. Finally, the uniqueness of this language definition is that the DSL is typically shaped not only by an existing product line (i.e. existing artifacts) but also by the vision and anticipation of product experts and developers predicting future, potential product variations.
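A minimal sketch of what "models that resemble configurations" can mean is shown below. The variation points, variants and the single constraint are invented for illustration and are not taken from [1] or [2]; the sketch only shows the declarative flavor, where a product is described by what it chooses and a separate check decides whether that choice lies inside the variability space.

```python
# Hypothetical variability space: each variation point and its allowed variants.
VARIATION_POINTS = {
    "display": {"none", "mono", "color"},
    "connectivity": {"wired", "wifi"},
    "battery": {"yes", "no"},
}

# Constraints across variation points, as (description, predicate) pairs.
CONSTRAINTS = [
    ("wifi products need a battery",
     lambda p: p["connectivity"] != "wifi" or p["battery"] == "yes"),
]


def validate(product):
    """Return a list of problems; an empty list means the product is valid."""
    problems = []
    for point, variants in VARIATION_POINTS.items():
        if point not in product:
            problems.append(f"missing choice for '{point}'")
        elif product[point] not in variants:
            problems.append(f"'{product[point]}' is not a variant of '{point}'")
    # Cross-cutting constraints are only meaningful once every choice is valid.
    if not problems:
        problems += [text for text, holds in CONSTRAINTS if not holds(product)]
    return problems


# Two product models: pure configurations, no behavior.
basic = {"display": "mono", "connectivity": "wired", "battery": "no"}
broken = {"display": "color", "connectivity": "wifi", "battery": "no"}

print(validate(basic))
print(validate(broken))
```

Note that the product models say nothing about how a product runs; that gap is the "dynamics" difficulty discussed for this style.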

Advantages:
This style of definition typically gives a very high return on investment (ROI) for two reasons: 1) the resulting DSL is defined at a very high abstraction level - the concepts of the language directly express product assets - and 2) the anticipation of future variations can lead to a combination of pre-aligned meta-models and code generators which are easy to maintain and extend. In addition, the declarative nature and the close coupling with product assets in the DSL make the models easy to communicate and share with different stakeholders.

Difficulties:
High ROI may sound very attractive; however, this style of language definition is the most difficult one to carry out. Realization of the language can be especially challenging if the platform architecture is not expressive enough to support product variations, or not flexible enough to accommodate future product variants. For this reason, an incremental development strategy is often applied with a reactive product line engineering approach: the most common concepts of the product family are implemented first to maximize the ROI already in the first versions of the language.

Another challenge with this approach is to come up with the dynamics (i.e. runtime behavior) of the language. This is due to the fact that the language definition first focuses on modeling the product variations and the execution of the product is often addressed only in a later / separate stage (consider, for example, the execution architecture, code generation, etc). One way to handle this problem is to simply start “coding” how the model is supposed to run (without considering the identified product variations) and do code / architecture refactoring later based on the previously identified product variation points.

Naturally, I can imagine that there are other approaches out there besides the four described here. If you know of another language definition style, please let me know; I am really interested in hearing about it.


[1] Tolvanen, J.-P.; Kelly, S.: Defining Domain-Specific Modeling Languages to Automate Product Derivation: Collected Experiences. Proceedings of the 9th International Software Product Line Conference, H. Obbink and K. Pohl (Eds.) Springer-Verlag, LNCS 3714, pp. 198 – 209, 2005.

[2] Coplien, J.; Hoffman, D.; Weiss, D.: Commonality and Variability in Software Engineering. IEEE Software 15, 6 (November/December 1998), pp. 37 – 45.

Charting Trends on (Model-Driven) Software Engineering by Google’s Ngram Viewer

I have to admit that I am caught up in the epidemic of playing around with Google’s Ngram Viewer. It is an amazing tool that can visualize the rise and fall of concepts across 5 million books over 500 years.

My first encounter with the viewer happened while I was reading Jean Bezivin’s blog. Jean used the viewer to observe trends in technological buzzwords: he drew comparative curves for the terms “object-oriented” and “model-driven” - I recommend visiting his blog to read his interpretation of the curves.

 Curves of the terms Object oriented vs. Model Driven

After spending a couple of minutes on tweaking some comparisons, I realized that I started to ‘cook up’ comparative curves for questions that I was often wondering about. So here are some of those curves:

Model-driven vs. model-based vs. code generation?

The first chart (see the figure below) is a result of adding the two terms model based and code generation to Jean's original two terms (and also making the terms lower case).

Curves of the terms model driven vs. model based vs. code generation (vs. object oriented)

I know that this comparison may not be entirely fair, since model-based design (see Wikipedia for a reference here) has a longer history than model-driven engineering. Still, I was curious to see whether the curves match the expectation and indeed, they did.

Many of the technological terms (not shown here) have the classic bell-shaped curve of a normal distribution. It is interesting to see that the process of adoption of new technologies and innovation over time is also illustrated by a bell-shaped curve in the well-known technology adoption lifecycle model of Everett Rogers (see figure below). The curve of the term ‘code generation’ may seem to break this pattern since it has been stable from the nineties until today. This may indicate that code generation is not a specific technology but rather a means used in various specific technologies.

Rogers' Bell Curve
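For readers who like numbers next to the picture, the sketch below computes how a bell curve is usually cut into the adopter categories of the adoption lifecycle model. It assumes the common textbook partition, with boundaries at whole standard deviations from the mean adoption time; the category names and boundaries are my assumption about the figure, not something derived from the Ngram data.

```python
from statistics import NormalDist

# Category boundaries in standard deviations from the mean adoption time
# (assumed textbook partition of the adoption lifecycle curve).
BOUNDARIES = [
    ("innovators", float("-inf"), -2.0),
    ("early adopters", -2.0, -1.0),
    ("early majority", -1.0, 0.0),
    ("late majority", 0.0, 1.0),
    ("laggards", 1.0, float("inf")),
]

standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Share of adopters in each category: the area under the curve between
# the two boundaries, computed from the cumulative distribution function.
shares = {
    name: standard_normal.cdf(high) - standard_normal.cdf(low)
    for name, low, high in BOUNDARIES
}

for name, share in shares.items():
    print(f"{name:15s} {share:6.1%}")
```

The computed areas are close to the rounded percentages that are usually printed on the adoption lifecycle figure.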


Java, C++, C, Ada and Perl - and the winner is...?

The popularity of programming languages has always been a hot topic in software engineering. So I was curious to see the trends for some of the languages in the literature going back to the seventies.

Curves of General-Purpose Programming Languages 

Like it or not, Java and C were and still are dominating the ‘literature market’, just like they occupy the first two positions in the often cited TIOBE programming community index. Considering again Rogers’ bell curve, the ngram curves of these two dominating languages seem to indicate that they have already passed their peaks and new candidates are rising on the horizon. However, the question still remains: what is going to be the next language that can reach the fame of Java or C?

...and which model of computation?

Another recurring question in software engineering and computer science is about the models of computation that can be used to reason about the properties of models. Such properties are, for example, freedom from deadlocks and livelocks, or worst-case and best-case execution times of concurrent systems.

Curves of the terms timed automata, Petri net, process algebra, finite state machine and actor model

This chart presents a couple of comparative curves for well-known (types of) models of computation. Petri nets are known to be one of the most popular formalisms for modeling concurrent systems, so the curve of the term Petri net came as no surprise. On the other hand, it is interesting to see that the term timed automata has shown a steady gain since the nineties. I am very curious to see how these trends will continue in the future.

The comparisons that I show here are some random thoughts that came to my mind while I was using the ngram viewer rather than carefully designed experiments. Very likely, you will also find other languages, formal methods - that I did not or forgot to mention - interesting enough to chart for further comparison.

So if you feel like giving it a try, just start punching in your favourite terms - it may give you some very interesting results. I can tell you that it is definitely fun to use the Ngram Viewer. However, I also need to warn you in advance: rumor has it that it is very addictive.