LJV: What We Can Learn From Java Data Structures Visualization

When I started to prepare a course of lectures on the Java language for the Moscow Institute of Physics and Technology, I became to search for material that can be used as illustrations. ‘One picture is worth a thousand words’ as they say, and sometimes it’s worth even more because it seems impossible to explain e. g. how a hash table works without drawing something. My task was to explain to students many Java concepts, from string pool to advanced data structures. I was looking for a solution that is able to make the visualization of Java data structures in the easiest and precise way, ideally compatible with ‘presentation as code’ technology.

There exists Aleksey Shipilev’s JOL (Java Object Layout) tool known among Java performance engineers. JOL analyzes the memory layout for any given object, including all the auxiliary data and fields, and the graph of objects reachable from the given instance. This tool gives accurate estimations of the object size and also shows the addresses in memory where objects are located. However, it’s not yet able to visualize the object graph and gives too many low-level details that are irrelevant for the students who just started to learn Java.

While surfing the web looking for something that better fits my needs, I came across a page on the University of Auckland’s website dedicated to the GPL-licensed program called LJV (Lightweight Java Visualizer). This tool was developed by Dr. John Hamer in Java version 1.4 in 2004, which is the year when Maven 1.0 was released, and four years before GitHub appeared. Although it hadn’t been developed since then, it still worked perfectly and was able to run at least on Java 8 and do its job! I used LJV extensively while preparing the slides for my lectures, and in the fall of 2020 two of my students, Ilya Selivanov and Nuras Nogaev updated the code to Java 11, added tests, CI, documentation, a couple of new features, and published it on GitHub and Maven Central as a part of their semester project. Here I would like to briefly explain the LJV working principle and show some examples of its usage.

The idea behind LJV is very simple. It takes an arbitrary object and then uses Java Reflection API to traverse the graph of objects reachable from a given one. It forms a representation of the graph in DOT language that can be later converted into png or svg image using Graphviz software. Actually we even need not to have graphviz installed: GraphvizOnline can render an image in a browser.

Thus, this one-liner

System.out.println(new LJV().drawGraph("Hello"));

gives us the following in the console:

digraph Java {
    rankdir="TB";
    node[shape=plaintext]
    . . .
}

In order to convert this to an image one might use

dot -Tpng myfile.dot > myfile.png

or, even simpler, just paste the resulting script to GraphViz Online and get a picture:

By calling configuration methods on the LJV object we can customize the way it visualizes data. We can choose fonts and colors for classes and fields, whether to hide null-valued fields, which objects should be represented only by their toString method invocation result, diagram drawing direction (top-to-bottom or left-to-right), etc., etc.

String graph = new LJV()
    .addFieldAttribute("left", "color=red,fontcolor=red")
    .addFieldAttribute("right", "color=blue,fontcolor=blue")
    .addClassAttribute(Node.class, "color=pink,style=filled")
    .addIgnoreField("ok")
    .setTreatAsPrimitive(String.class)
    .setShowFieldNamesInLabels(false)
    .drawGraph(n1);

The things that we see as a result of the visualization are, in my opinion, interesting and instructive not only for Java beginners but also for all the practicing Java engineers, even those who already know about the effects shown.

Let’s see some examples.

String

The most widely used type of data in Java is, of course, String. Starting from Java 9, the internal representation of String has changed: char[] was replaced by byte[], and coder flag was introduced in order to switch between 8-bit and 16-bit character representation. This allowed significant memory optimization for strings that contain only LATIN-1 charset characters:

/*Java 8-: one 16-bit char per character*/
new LJV().drawGraph("abcαβγ");

/*Java 9+: coder set to 0 and one byte per LATIN-1 character*/
new LJV().drawGraph("abc");

/*Java 9+: coder set to 1
and 2 bytes per character
if there are symbols outside
LATIN-1 set*/
new LJV().drawGraph("abcαβγ");

String creation

One rarely needs to create String using new operator, however it’s worth noticing that String(String original) constructor reuses the internal byte array of its argument. Concatenation (even with an empty string!) always produces a full new copy:

String x = "Hello";
new LJV().drawGraph(new String[]{
    x, new String(x),
    new String(x.toCharArray()),
    x + ""});

String interning

Calling intern() deduplicates all the String objects and reduce them to a single value kept in the String pool (compare with the previous example):

String x = "Hello";
new LJV().drawGraph(new String[]{
  x, new String(x).intern(),
  new String(x.toCharArray()).intern(),
  (x + "").intern()}));

Boxed primitives caching

Usually we create boxed primitives via autoboxing. In rare cases when we do need to create e. g. Integer object explicitly, the correct way to do this is with Integer.valueOf method. This method deduplicates values in the range from -128 to 127 or -XX:AutoBoxCacheMax value.

Values outside this range will not be deduplicated even when autoboxing is used.

Integer created with constructor will never be deduplicated, and this constructor is deprecated since Java 9.

new LJV().drawGraph(new Integer[]{
    42, Integer.valueOf(42),
    new Integer(42),
    -4242, -4242
});

LinkedList

Linked list is a data structure with theoretical O(1) efficiency for adding/removing its random node that can acts both as List and Deque. In Java practice, however, LinkedList is superceded by ArrayList and ArrayDeque in all the cases, and it’s questionable whether this class is needed in standard library at all.

List<Integer> list = new LinkedList<>();
list.add(1); list.add(42); list.add(21);

new LJV()
  .setTreatAsPrimitive(Integer.class)
  .setDirection(Direction.LR)
  .drawGraph(list);

ArrayDeque

If not LinkedList, then what? Java has a number of high-performant array-based data structures. ArrayList is well-known, but there are also ArrayDeque based on looped array and PriorityQueue based on balanced binary heap, which is actually also an array.

Let’s see, for example, how looped buffer of ArrayDeque works.

This structure implements queue capabilities. If maximum number of elements in the queue does not grow over time, this data structure works very fast and memory efficient, with constant time for every operation.

LJV ljv = new LJV().setTreatAsPrimitive(Integer.class);

//note that this sets initial capacity to 5
Deque<Integer> arrayDeque = new ArrayDeque<>(4);
arrayDeque.add(1); arrayDeque.add(2); arrayDeque.add(3);

ljv.drawGraph(arrayDeque)

arrayDeque.poll(); //returns 1
arrayDeque.poll(); //returns 2

ljv.drawGraph(arrayDeque);

Here we reach the end of the buffer and start writing from the beginning:

arrayDeque.add(4); arrayDeque.add(5); arrayDeque.add(6);

ljv.drawGraph(arrayDeque);

HashMap

HashMap is a widely used data structure in Java. For many people, implementation details of HashMap is also a favorite topic of discussion in a Java programmer job interview.

There are a number of ways to implement hash collisions resolution in a hash map, developers of Java platform chose linked lists:

Map<String, Integer> map = new HashMap<>();
map.put("one", 1);   map.put("two", 2);
map.put("three", 3); map.put("four", 4);

new LJV()
    .setTreatAsPrimitive(Integer.class)
    .setTreatAsPrimitive(String.class)
    .drawGraph(map);

Collision

While the number of collisions on a single HashMap bucket is small, the linked list keeps growing:

List<String> collisionString = new HashCodeCollision().genCollisionString(3);
Map<String, Integer> map = new HashMap<>();

for (int i = 0; i < collisionString.size(); i++) {
    map.put(collisionString.get(i), i);
}

new LJV()
    .setDirection(Direction.LR)
    .setTreatAsPrimitive(Integer.class)
    .setTreatAsPrimitive(String.class)
    .setIgnoreNullValuedFields(true)
    .drawGraph(map);

'Treeified' collision

However, if a single bucket becomes overloaded with collisions, and keys implement Comparable interface, the linked list turns to a tree.

This reduces the search time in a bucket from O(N) to O(log(N)) and mitigates a certain kind of DDoS attacks:

List<String> collisionString = new HashCodeCollision().genCollisionString(6);
Map<String, Integer> map = new HashMap<>();

for (int i = 0; i < collisionString.size(); i++) {
    map.put(collisionString.get(i), i);
}

String graph = new LJV()
    .setTreatAsPrimitive(String.class)
    .setTreatAsPrimitive(Integer.class)
    .setIgnoreNullValuedFields(true)
    .drawGraph(map);

LinkedHashMap

One of the features of HashMap is that this data structure completely 'forgets' the order of insertion of its elements. Also, iteration over HashMap is not very effective from performance point of view. When insertion order matters, we can use LinkedHashMap, which is actually a HashMap combined with linked list. One of the possible use cases for LinkedHashMap is LRU cache implementation.

Map<String, Integer> map = new HashMap<>();
map.put("one", 1);   map.put("two", 2);
map.put("three", 3); map.put("four", 4);

new LJV().setDirection(LR)
    .setTreatAsPrimitive(Integer.class)
    .setTreatAsPrimitive(String.class)
    .drawGraph(map);

TreeMap

TreeMap in Java is a Red-Black tree that implements NavigableMap. This implementation provides guaranteed O(log(N)) time cost for get/put/remove operations, which in practice is inferior to HashMap.

We use TreeMap when we need lowerKey(..), higherKey(..) and other NavigableMap capabilities not provided by a simple Map.

Map<String, Integer> map = new TreeMap<>();
map.put("one", 1);         map.put("two", 2);
map.put("three", 3);       map.put("four", 4);
new LJV().setDirection(LR)
    .setTreatAsPrimitive(Integer.class)
    .setTreatAsPrimitive(String.class)
    .setIgnoreNullValuedFields(true)
    .drawGraph(map);

ConcurrentSkipListMap

ConcurrentSkipListMap is a thread-safe NavigableMap implementation, that uses quite a complex non-blocking algorithm involving random numbers generator. That’s why for a given input its internal representation never looks the same from one run to another:

ConcurrentSkipListMap<String, Integer> map = new ConcurrentSkipListMap<>();

map.put("one", 1);
map.put("two", 2);
map.put("three", 3);
map.put("four", 4);

String actualGraph = new LJV()
        .setTreatAsPrimitive(Integer.class)
        .setTreatAsPrimitive(String.class)
        .drawGraph(map);

First run

Second run

Conclusion

I hope you liked these examples. Maybe you can think of other examples of data layout visualization that are worth adding here. In this case, experiment with Lightweight Java Visualizer and share your ideas!