Examples of domain knowledge driven program analyses

Research statement: Using domain knowledge explicitly when analyzing programs strengthens the classical program analyses and opens opportunities for entirely new ones.

Below are several categories of conceptual program analyses, each with concrete examples of conceptual defects in the Java standard API. Conceptual defects originate from an inadequate representation of business domain knowledge in programs. They are not bugs per se, but they can lead to bugs, to difficulties in maintaining the API clients, to redundancy in the clients, or to ad hoc workarounds.
The examples were identified automatically by mapping the Java API to different ontologies. We chose parts of the Java standard API in order to demonstrate the need for domain knowledge driven analyses, to show the pervasiveness of logical defects, and to make our examples easier to follow. We are convinced that the Java API has a much higher quality than common programs, and that logical defects are therefore even more pervasive in practice.
Naming problems
Good naming of program entities is a central prerequisite for understanding code easily. In the case of APIs, bad naming can cause confusion or make the API difficult to use. Good naming implies that the concepts implemented by program elements are reflected accurately and consistently throughout the program.
                                                   Polysemy in java.util.Date
                                                   The Ellipse-Oval synonymy in the Java API
                                                   Naming ambiguities create confusion in class BorderLayout
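The linked examples are not reproduced on this page. As a minimal sketch of what polysemy in java.util.Date can look like, the snippet below reads a single Date object through four of its accessors, whose names use "time", "date" and "day" in different senses. The class name DatePolysemy and the method senses are our own, chosen only for illustration; the Date methods are part of the standard API (several of them deprecated).

```java
import java.util.Date;

class DatePolysemy {

    /**
     * Reads one Date object through four accessors whose names use
     * "time", "date" and "day" in different senses.
     */
    @SuppressWarnings("deprecation") // the deprecated accessors are the point here
    static long[] senses(Date d) {
        return new long[] {
            d.getTime(),   // an instant: milliseconds since 1970-01-01T00:00:00 GMT
            d.getDate(),   // "date" as day of the month (1..31)
            d.getDay(),    // "day" as day of the week (0 = Sunday)
            d.getMonth()   // zero-based month index (0 = January)
        };
    }

    @SuppressWarnings("deprecation")
    public static void main(String[] args) {
        // Deprecated constructor: the year is offset by 1900, the month is zero-based.
        Date d = new Date(109, 9, 8); // 8 October 2009, local time
        long[] s = senses(d);
        System.out.println("instant (ms): " + s[0]);
        System.out.println("day of month: " + s[1]);
        System.out.println("day of week:  " + s[2]);
        System.out.println("month index:  " + s[3]);
    }
}
```

A client that reads getDate() as "the whole date" or getDay() as "the day of the month" compiles without complaint, which is why this kind of defect is hard to spot without a model of the domain concepts.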
Inheritance defects
The inheritance hierarchy should mirror the is-a relation between a sub-concept and a super-concept. When it does, the API feels natural and easy to use, because it matches the domain knowledge that programmers already have.
                                                   Example of inverted inheritance
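To make the idea of inverted inheritance concrete, here is a minimal sketch of such a check under strong assumptions. The ontology is a toy map from a concept to its direct super-concept (assumed to be acyclic), classes are mapped to concepts simply by their simple name, and the classes Square and Rectangle are hypothetical; none of this is taken from the Java API or from our actual tooling.

```java
import java.util.Map;

class InvertedInheritanceCheck {

    // Hypothetical classes: in the domain a square is-a rectangle,
    // but here the class hierarchy states the opposite.
    static class Square {}
    static class Rectangle extends Square {}

    /** True if, following the ontology's is-a links upwards from sub, we reach sup. */
    static boolean isA(Map<String, String> ontology, String sub, String sup) {
        for (String c = ontology.get(sub); c != null; c = ontology.get(c)) {
            if (c.equals(sup)) {
                return true;
            }
        }
        return false;
    }

    /**
     * True if cls extends a class whose concept is, according to the ontology,
     * a sub-concept of the concept of cls (the inheritance is inverted).
     */
    static boolean isInverted(Class<?> cls, Map<String, String> ontology) {
        Class<?> parent = cls.getSuperclass();
        if (parent == null) {
            return false;
        }
        return isA(ontology, parent.getSimpleName(), cls.getSimpleName());
    }

    public static void main(String[] args) {
        // Toy ontology: concept -> direct super-concept.
        Map<String, String> ontology = Map.of("Square", "Rectangle", "Rectangle", "Shape");
        System.out.println("Rectangle inverted: " + isInverted(Rectangle.class, ontology));
        System.out.println("Square inverted:    " + isInverted(Square.class, ontology));
    }
}
```

Mapping classes to concepts by simple name is the weakest part of this sketch; a realistic analysis needs a proper mapping between program elements and ontology concepts.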
Logical modularity
In order to be easy to understand and maintain, programs should be built in a modular manner. The layered nature of the Java standard API is clearly suggested in the figure below (taken from here): we can see, for example, that 'java.lang' belongs to the base libraries and 'java.awt' to the UI toolkits. Furthermore, even though the figure does not show it, it is well known that the Swing API is built upon the AWT API. Is this really so? Well, no :-(

                                               Java layered architecture
                                                         java.lang depends on java.awt
                                                  java.awt depends on javax.swing

Please note that these violations of the architecture cannot be discovered by structural analyses (there is no structural reference from 'java.lang' to 'java.awt' or from 'java.awt' to 'javax.swing'). This is a typical example of an architecture that looks good in the documentation but is implemented differently in the code. Traditionally, such logical violations of the architecture can be identified only through manual code reviews.
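One simple way to picture a conceptual (as opposed to structural) dependency is to look at the vocabulary of identifiers. The sketch below splits identifiers of a package into terms and reports every other package that "owns" a concept named by one of those terms. It is an illustration under our own assumptions, not the analysis we actually run: the package names, the identifiers, and the term-to-owner map are all made up.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class ConceptualDependencies {

    /**
     * Returns the packages whose concepts are mentioned in the identifiers of pkg.
     * conceptOwner maps a lower-case domain term to the package that owns the concept.
     */
    static Set<String> conceptualDependencies(String pkg,
                                              List<String> identifiers,
                                              Map<String, String> conceptOwner) {
        Set<String> deps = new TreeSet<>();
        for (String id : identifiers) {
            // Split camelCase identifiers at lower-to-upper case boundaries.
            for (String term : id.split("(?<=[a-z])(?=[A-Z])")) {
                String owner = conceptOwner.get(term.toLowerCase(Locale.ROOT));
                if (owner != null && !owner.equals(pkg)) {
                    deps.add(owner);
                }
            }
        }
        return deps;
    }

    public static void main(String[] args) {
        // Hypothetical data: a base package whose identifiers mention a UI concept.
        Map<String, String> conceptOwner = Map.of("window", "ui.toolkit", "string", "core.lang");
        List<String> identifiers = List.of("checkWindowAccess", "toString");
        System.out.println(conceptualDependencies("core.lang", identifiers, conceptOwner));
    }
}
```

No import or call needs to connect the two packages for such a dependency to be reported, which is why a purely structural analysis misses it.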
Logical redundancy
Ideally, in order for an API to be concise, a domain concept should be implemented only once. In practice, due to various constraints or bad API design, domain concepts are implemented several times. Even worse, the implementations are often not consistent with each other. Redundancy in the API leads to redundancy and heterogeneity in its clients.
                                                   Multiple definitions of the concept 'point' in the Java API
                                                   Redundant and inconsistent representation of months in the Java API
                                                   Redundant definition of NORTH, SOUTH, etc. in Java Swing
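As a small illustration of redundant and inconsistent definitions, the snippet below collects three constants named NORTH from the standard library: BorderLayout.NORTH, SwingConstants.NORTH and GridBagConstraints.NORTH. We assume these are among the definitions the linked example refers to; the class name NorthRedundancy and its method are ours.

```java
import java.awt.BorderLayout;
import java.awt.GridBagConstraints;
import javax.swing.SwingConstants;

class NorthRedundancy {

    /** Returns three library representations of the concept "north", in a fixed order. */
    static Object[] northRepresentations() {
        return new Object[] {
            BorderLayout.NORTH,       // declared as a String constant
            SwingConstants.NORTH,     // declared as an int constant
            GridBagConstraints.NORTH  // declared as an int constant as well
        };
    }

    public static void main(String[] args) {
        String[] owners = { "BorderLayout", "SwingConstants", "GridBagConstraints" };
        Object[] values = northRepresentations();
        for (int i = 0; i < values.length; i++) {
            System.out.println(owners[i] + ".NORTH = " + values[i]
                    + " (" + values[i].getClass().getSimpleName() + ")");
        }
    }
}
```

The same domain concept appears with different types and, for the two int constants, with different values, so a client cannot pass one where another is expected.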
Conceptual coverage
Before we use a domain specific API, we should be informed about its conceptual coverage. Ideally, the API should provide direct access to all the domain concepts that we need, and to all the relations between them. When the API does not implement one of the concepts we need, we have to extend the API ourselves (or migrate the application to another API, which is usually very difficult). We measure the domain coverage of an API by mapping it to a domain ontology that covers the same domain (for example, to an ontology from our knowledge repository).
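The text above does not fix a formula for coverage. One plausible reading, used here purely as an assumption, is the fraction of ontology concepts to which at least one API element is mapped. The sketch below computes that ratio and lists the uncovered concepts; the concept names and the API-to-concept mapping are illustrative only.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class ConceptualCoverage {

    /** Fraction of ontology concepts that at least one API element is mapped to. */
    static double coverage(Set<String> ontologyConcepts, Map<String, String> apiToConcept) {
        if (ontologyConcepts.isEmpty()) {
            throw new IllegalArgumentException("ontology has no concepts");
        }
        Set<String> covered = new HashSet<>(apiToConcept.values());
        covered.retainAll(ontologyConcepts); // ignore mappings to concepts outside the ontology
        return (double) covered.size() / ontologyConcepts.size();
    }

    /** Ontology concepts for which the API offers no element, in sorted order. */
    static Set<String> missing(Set<String> ontologyConcepts, Map<String, String> apiToConcept) {
        Set<String> result = new TreeSet<>(ontologyConcepts);
        result.removeAll(apiToConcept.values());
        return result;
    }

    public static void main(String[] args) {
        // Illustrative data: a tiny geometry ontology and a mapping of API elements to it.
        Set<String> concepts = Set.of("point", "line", "circle", "polygon");
        Map<String, String> apiToConcept =
                Map.of("Point", "point", "Point2D", "point", "Line2D", "line");
        System.out.println("coverage: " + coverage(concepts, apiToConcept));
        System.out.println("missing:  " + missing(concepts, apiToConcept));
    }
}
```

Note that two API elements mapped to the same concept count once for coverage; the same mapping is also what reveals the redundancy discussed in the previous section.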
Last modified: Thu Oct 8 16:49:25 CEST 2009