Examples of domain knowledge driven program analyses
Research statement: The explicit use
of domain knowledge for analyzing programs leverages the classical
program analyses and opens opportunities for completely new program
analyses.
Below are several categories of conceptual program analyses together
with a set of concrete examples of conceptual defects in the Java
standard API. Conceptual defects originate from the inadequate
representation of business domain knowledge in programs. They are not
bugs per se but can lead to bugs, difficulties in maintaining the API
clients, redundancy in the clients, or different hacks.
The examples were identified in an automatic manner by
mapping the Java API to different ontologies. We have chosen parts of
the Java standard API in order to demonstrate the need for domain
knowledge driven analyses, the pervasiveness of logical defects, and to
make our examples easier to follow. We are convinced that the Java API
has a much higher quality than common programs and therefore the
logical defects are even more pervasive in practice.
Good naming of program entities is a central prerequisite for an easy
understanding of the code. In the case of APIs, bad naming can cause
confusion or difficulties in the use of the API. Good naming implies
that the concepts implemented by program elements are reflected
accurately and consistently throughout the program.
- polysemy example: in the name of the class
'java.util.Date' and in some constructors, the word 'date' denotes a
specific instant in time, whereas in the parameters of other
constructors of the same class the word 'date' denotes the day of the
month (between 1 and 31).
This polysemy can cause confusion in the use of the class Date and
thereby makes the Java API easier to misuse.
- synonymy example: the concept 'ellipse' is
referenced under two names: 'ellipse' in the name of the class
'java.awt.geom.Ellipse2D', and 'oval' in the name of the method
'java.awt.Graphics.drawOval()'. This inconsistency lowers the
homogeneity of the API and makes it more difficult to use.
- ambiguous names:
in the class 'java.awt.BorderLayout' both the positioning constants and
the components situated at the corresponding positions have the same
name. Note that, due to this ambiguity, the Java developers themselves
wrote poor comments. Luckily, the Component fields have the visibility
level 'package' and thereby this issue does not disturb the Java API
users.
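
The polysemy of 'date' can be made concrete with a small sketch. It
assumes only the constructors 'Date(long date)' and
'Date(int year, int month, int date)' of 'java.util.Date'; the class name
'DatePolysemy' and its helper methods are hypothetical and serve the
illustration only.

```java
import java.util.Calendar;
import java.util.Date;

// Hypothetical illustration class, not part of the Java API.
class DatePolysemy {

    // Here the parameter named 'date' is an instant in time
    // (milliseconds since the epoch), as in Date(long date).
    static long asInstant(long date) {
        return new Date(date).getTime();
    }

    // Here the parameter named 'date' is the day of the month (1 to 31),
    // as in the deprecated Date(int year, int month, int date).
    @SuppressWarnings("deprecation")
    static int asDayOfMonth(int year, int month, int date) {
        // The deprecated constructor expects the year minus 1900
        // and a month numbered from 0.
        return new Date(year - 1900, month, date).getDate();
    }

    public static void main(String[] args) {
        System.out.println(asInstant(0L));
        System.out.println(asDayOfMonth(2009, Calendar.OCTOBER, 8));
    }
}
```

Both helpers take an argument called 'date', yet one is a timestamp and
the other a number between 1 and 31; nothing in the names tells a client
which meaning applies.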
The inheritance hierarchy should mirror the is-a relation between a
sub-concept and a super-concept. By doing this, we make the APIs
natural and easy to use in analogy to the domain knowledge of
programmers.
- inverted inheritance: in the collections framework of
the Java API, the class 'java.util.LinkedList' implements the interface
'java.util.Queue' and thereby, whenever objects of Queue are requested,
we can use objects of LinkedList. This can lead to unexpected errors
when the same object is used both as a Queue (and thereby an ordering
of elements is expected) and as a LinkedList (which allows random
access to its elements), as shown on the right-hand side of the figure
below. This is an example of a violation of the Liskov Substitution
Principle.
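
The situation described above can be reproduced in a few lines. This is
a minimal sketch; the class 'QueueSubstitution' and the method 'nextJob'
are hypothetical names introduced for the illustration.

```java
import java.util.LinkedList;
import java.util.List;
import java.util.Queue;

// Hypothetical illustration class, not part of the Java API.
class QueueSubstitution {

    // A client that relies on first-in, first-out ordering.
    static String nextJob(Queue<String> jobs) {
        return jobs.peek();
    }

    static String demo() {
        LinkedList<String> list = new LinkedList<>();
        Queue<String> jobs = list;        // the same object, seen as a Queue
        jobs.offer("first");
        jobs.offer("second");

        List<String> asList = list;       // the same object, seen as a List
        asList.add(0, "intruder");        // positional insert at the head

        return nextJob(jobs);             // no longer "first"
    }

    public static void main(String[] args) {
        System.out.println(demo());       // prints "intruder"
    }
}
```

The queue client never sees an illegal call, yet the element it receives
is not the one that was enqueued first.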

In order to be easy to understand and maintain, programs should be built
in a modular manner. The layered nature of the Java standard API is
clearly suggested in the figure below (taken from here): we can
notice, for example, that 'java.lang' belongs to the base libraries and
'java.awt' to the UI toolkits. Furthermore, even if not shown in this
figure, it is well known that the Swing API is built upon the AWT API.
Is this really so? Well, no :-(

- 'java.lang' knows about 'java.awt': in the class
'java.lang.SecurityManager' we have the method
'checkAwtEventQueueAccess' and thereby there is a logical dependency between
'java.lang' and 'java.awt'. This dependency is a violation (at the
logical level) of the Java platform architecture shown in the above
figure.

- 'java.awt' knows about 'javax.swing': in the class
'java.awt.Component', in the method 'doSwingSerialization' there is a
dependency on the 'javax.swing' framework. The method
'doSwingSerialization' instantiates and invokes Swing classes through
the Java reflection mechanism and thereby the structural dependency is
avoided (even the AWT developers wrote in a comment that their solution
'is a hack').
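
How reflection hides a dependency from structural analyses can be
sketched as follows. This is not the code of 'doSwingSerialization'; the
class 'ReflectiveDependency' is hypothetical and 'java.util.ArrayList'
merely stands in for a class of another layer.

```java
import java.util.List;

// Hypothetical illustration class, not part of the Java API.
class ReflectiveDependency {

    // Instantiates a class that is known here only by its name. The source
    // contains no import of and no type reference to that class, so a
    // structural analysis finds no dependency on it.
    static List<?> createByName(String className) {
        try {
            Object instance =
                Class.forName(className).getDeclaredConstructor().newInstance();
            return (List<?>) instance;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        List<?> list = createByName("java.util.ArrayList");
        System.out.println(list.getClass().getName());
    }
}
```

The dependency exists only inside a string literal, which is why it has
to be recovered at the conceptual level rather than from the call graph
or the import list.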

Please note that these violations of the architecture cannot be
discovered by structural analyses (there is no structural reference
from 'java.lang' to 'java.awt' or from 'java.awt' to 'javax.swing').
This is a typical example of an architecture that looks good in the
documentation but that is implemented differently in the code.
Traditionally, the logical violations of the architecture can
be identified only through manual code reviews.
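
A very rough approximation of such a logical check works on identifier
names alone. The sketch below is an assumption-laden simplification, not
the ontology mapping used to find the examples above: the class
'LexicalLayerCheck' and its two-entry vocabulary are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration class, not part of the Java API.
class LexicalLayerCheck {

    // Hypothetical vocabulary: lower-cased name token -> package whose
    // concepts the token denotes.
    static final Map<String, String> VOCABULARY = new LinkedHashMap<>();
    static {
        VOCABULARY.put("awt", "java.awt");
        VOCABULARY.put("swing", "javax.swing");
    }

    // Returns the packages that an identifier refers to by name only.
    static List<String> referencedPackages(String identifier) {
        List<String> hits = new ArrayList<>();
        // Split camel case, e.g. "doSwingSerialization" -> do, Swing, Serialization.
        for (String token : identifier.split("(?<=[a-z0-9])(?=[A-Z])")) {
            String pkg = VOCABULARY.get(token.toLowerCase(Locale.ROOT));
            if (pkg != null && !hits.contains(pkg)) {
                hits.add(pkg);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(referencedPackages("checkAwtEventQueueAccess"));
        System.out.println(referencedPackages("doSwingSerialization"));
    }
}
```

A method of 'java.lang' whose name yields 'java.awt' is then a candidate
for a logical dependency on an upper layer, to be confirmed by a closer
look at the code.
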
Ideally, in order for an API to be concise, a domain concept should be
implemented only once. In practice, due to different constraints or bad
API design, domain concepts are implemented several times. Even worse,
the implementations are often not consistent with each other.
Redundancy in the API leads to redundancy and heterogeneities in
its clients.
- redundant definition of the 'point' concept in 'java.awt':
in 'java.awt' there are three classes that define the 'point' concept:
'java.awt.Point', 'java.awt.geom.Point2D.Double', and
'java.awt.geom.Point2D.Float'. We consider these implementations to be
redundant. In this case, the redundancy is due to performance
constraints.
- redundant representation of the 'months of the year' concepts
in the Java API: the months are represented as static constants in
two classes: 'java.util.Calendar' and
'sun.util.calendar.BaseCalendar'. Besides the redundancy per se,
different constants were chosen to implement the same month. By using,
for example, 'sun.util.calendar.BaseCalendar.MARCH' within the
'java.util' part we can produce unexpected results (e.g. set the 31st
day of February).

- redundant representation of the 'NORTH' concept in
'javax.swing': in the class 'javax.swing.SwingConstants', 'NORTH'
is a constant with type 'int', while in the class
'javax.swing.SpringLayout', 'NORTH' is a constant with type 'String'.
Thereby there is a redundancy in the representation that requires
developers to write additional code that only converts between the
interpretation of integer values as positions and the interpretation of
string values.
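
Two of the redundancies above can be sketched in code. The class
'RedundantConstants' is hypothetical. The constant 'ONE_BASED_MARCH' is an
assumption: it stands in for 'sun.util.calendar.BaseCalendar.MARCH', which
is internal and may not be accessible to client code, and which we assume
here to number the months from 1.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import javax.swing.SpringLayout;
import javax.swing.SwingConstants;

// Hypothetical illustration class, not part of the Java API.
class RedundantConstants {

    // Assumption: stand-in for a month constant that counts from 1.
    static final int ONE_BASED_MARCH = 3;

    // java.util.Calendar counts months from 0, so a 1-based "March"
    // silently selects a different month.
    static int monthActuallySet() {
        Calendar cal = new GregorianCalendar(2009, ONE_BASED_MARCH, 1);
        return cal.get(Calendar.MONTH);
    }

    // Glue code that only translates between the int-based and the
    // String-based representation of the same compass concepts.
    static String toSpringEdge(int swingConstant) {
        switch (swingConstant) {
            case SwingConstants.NORTH: return SpringLayout.NORTH;
            case SwingConstants.SOUTH: return SpringLayout.SOUTH;
            case SwingConstants.EAST:  return SpringLayout.EAST;
            case SwingConstants.WEST:  return SpringLayout.WEST;
            default:
                throw new IllegalArgumentException("no edge for " + swingConstant);
        }
    }

    public static void main(String[] args) {
        System.out.println(monthActuallySet() == Calendar.APRIL); // prints "true"
        System.out.println(toSpringEdge(SwingConstants.NORTH));
    }
}
```

Neither method adds behaviour of its own: the first shows how mixing the
two month numberings goes wrong without any compiler warning, the second
is conversion code that exists only because one concept has two
representations.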

Before we use a domain specific API, we should be informed about its
conceptual coverage. Ideally, the API should provide direct access to
all domain concepts that we need, and to all relations between them. If
the API does not offer an implementation for one of the concepts that
we need, we have to extend the API ourselves (or to migrate the
application to another API, which is normally very difficult). We
measure the domain coverage of an API by mapping it to a domain
ontology that covers the same domain (for example, to an ontology from
our knowledge repository).
- collections framework: before Java 1.5, the collections
framework did not contain an implementation of 'queues'.
- AWT: the AWT part of the Java library does not cover
typical graphical concepts like: 'tables', 'tool tips', 'tool bars',
'trees', or more advanced dialogs like 'print dialog', 'font dialog',
etc.
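
Once an API has been mapped to an ontology, the coverage itself is a
simple ratio. The sketch below assumes that the mapping has already
produced two sets of concept names; the class 'DomainCoverage', its
methods, and the tiny concept sets are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical illustration class, not part of the Java API.
class DomainCoverage {

    // Ontology concepts for which the API offers no element.
    static Set<String> missing(Set<String> ontology, Set<String> implemented) {
        Set<String> result = new LinkedHashSet<>(ontology);
        result.removeAll(implemented);
        return result;
    }

    // Fraction of the ontology concepts that the API implements.
    static double coverage(Set<String> ontology, Set<String> implemented) {
        if (ontology.isEmpty()) {
            return 1.0; // nothing to cover
        }
        int covered = ontology.size() - missing(ontology, implemented).size();
        return (double) covered / ontology.size();
    }

    public static void main(String[] args) {
        // Hypothetical excerpt of a GUI ontology and of the concepts found in AWT.
        Set<String> ontology =
            new LinkedHashSet<>(Arrays.asList("button", "label", "table", "tree"));
        Set<String> awt = new LinkedHashSet<>(Arrays.asList("button", "label"));
        System.out.println(coverage(ontology, awt));
        System.out.println(missing(ontology, awt));
    }
}
```

The list of missing concepts is usually more useful than the ratio,
since it names exactly what a client would have to implement itself.
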
Last modified: Thu Oct 8 16:49:25 CEST 2009