LJV: What We Can Learn From Java Data Structures Visualization
When I started to prepare a course of lectures on the Java language for the Moscow Institute of Physics and Technology, I became to search for material that can be used as illustrations. ‘One picture is worth a thousand words’ as they say, and sometimes it’s worth even more because it seems impossible to explain e. g. how a hash table works without drawing something. My task was to explain to students many Java concepts, from string pool to advanced data structures. I was looking for a solution that is able to make the visualization of Java data structures in the easiest and precise way, ideally compatible with ‘presentation as code’ technology.
There exists Aleksey Shipilev’s JOL (Java Object Layout) tool known among Java performance engineers. JOL analyzes the memory layout for any given object, including all the auxiliary data and fields, and the graph of objects reachable from the given instance. This tool gives accurate estimations of the object size and also shows the addresses in memory where objects are located. However, it’s not yet able to visualize the object graph and gives too many low-level details that are irrelevant for the students who just started to learn Java.
While surfing the web looking for something that better fits my needs, I came across a page on the University of Auckland’s website dedicated to the GPL-licensed program called LJV (Lightweight Java Visualizer). This tool was developed by Dr. John Hamer in Java version 1.4 in 2004, which is the year when Maven 1.0 was released, and four years before GitHub appeared. Although it hadn’t been developed since then, it still worked perfectly and was able to run at least on Java 8 and do its job! I used LJV extensively while preparing the slides for my lectures, and in the fall of 2020 two of my students, Ilya Selivanov and Nuras Nogaev updated the code to Java 11, added tests, CI, documentation, a couple of new features, and published it on GitHub and Maven Central as a part of their semester project. Here I would like to briefly explain the LJV working principle and show some examples of its usage.
The idea behind LJV is very simple. It takes an arbitrary object and then uses Java Reflection API to traverse the graph of objects reachable from a given one. It forms a representation of the graph in DOT language that can be later converted into png
or svg
image using Graphviz software. Actually we even need not to have graphviz installed: GraphvizOnline can render an image in a browser.
Thus, this one-liner
System.out.println(new LJV().drawGraph("Hello"));
gives us the following in the console:
digraph Java {
rankdir="TB";
node[shape=plaintext]
. . .
}
In order to convert this to an image one might use
dot -Tpng myfile.dot > myfile.png
or, even simpler, just paste the resulting script to GraphViz Online and get a picture:
By calling configuration methods on the LJV
object we can customize the way it visualizes data. We can choose fonts and colors for classes and fields, whether to hide null-valued fields, which objects should be represented only by their toString
method invocation result, diagram drawing direction (top-to-bottom or left-to-right), etc., etc.
String graph = new LJV()
.addFieldAttribute("left", "color=red,fontcolor=red")
.addFieldAttribute("right", "color=blue,fontcolor=blue")
.addClassAttribute(Node.class, "color=pink,style=filled")
.addIgnoreField("ok")
.setTreatAsPrimitive(String.class)
.setShowFieldNamesInLabels(false)
.drawGraph(n1);
The things that we see as a result of the visualization are, in my opinion, interesting and instructive not only for Java beginners but also for all the practicing Java engineers, even those who already know about the effects shown.
Let’s see some examples.
String
The most widely used type of data in Java is, of course, String
. Starting from Java 9, the internal representation of String
has changed: char[]
was replaced by byte[]
, and coder flag was introduced in order to switch between 8-bit and 16-bit character representation. This allowed significant memory optimization for strings that contain only LATIN-1 charset characters:
|
|
|
String creation
One rarely needs to create String
using new
operator, however it’s worth noticing that String(String original)
constructor reuses the internal byte array of its argument.
Concatenation (even with an empty string!) always produces a full new copy:
String x = "Hello";
new LJV().drawGraph(new String[]{
x, new String(x),
new String(x.toCharArray()),
x + ""});
String interning
Calling intern()
deduplicates all the String
objects and reduce them to a single value kept in the String
pool (compare with the previous example):
String x = "Hello";
new LJV().drawGraph(new String[]{
x, new String(x).intern(),
new String(x.toCharArray()).intern(),
(x + "").intern()}));
Boxed primitives caching
Usually we create boxed primitives via autoboxing. In rare cases when we do need to create e. g. Integer
object explicitly, the correct way to do this is with Integer.valueOf
method. This method deduplicates values in the range from -128 to 127 or -XX:AutoBoxCacheMax
value.
Values outside this range will not be deduplicated even when autoboxing is used.
Integer
created with constructor will never be deduplicated, and this constructor is deprecated since Java 9.
new LJV().drawGraph(new Integer[]{
42, Integer.valueOf(42),
new Integer(42),
-4242, -4242
});
LinkedList
Linked list is a data structure with theoretical O(1) efficiency for adding/removing its random node that can acts both as List
and Deque
. In Java practice, however, LinkedList
is superceded by ArrayList
and ArrayDeque
in all the cases, and it’s questionable whether this class is needed in standard library at all.
List<Integer> list = new LinkedList<>();
list.add(1); list.add(42); list.add(21);
new LJV()
.setTreatAsPrimitive(Integer.class)
.setDirection(Direction.LR)
.drawGraph(list);
ArrayDeque
If not LinkedList
, then what? Java has a number of high-performant array-based data structures. ArrayList
is well-known, but there are also ArrayDeque
based on looped array and PriorityQueue
based on balanced binary heap, which is actually also an array.
Let’s see, for example, how looped buffer of ArrayDeque
works.
This structure implements queue capabilities. If maximum number of elements in the queue does not grow over time, this data structure works very fast and memory efficient, with constant time for every operation.
LJV ljv = new LJV().setTreatAsPrimitive(Integer.class);
//note that this sets initial capacity to 5
Deque<Integer> arrayDeque = new ArrayDeque<>(4);
arrayDeque.add(1); arrayDeque.add(2); arrayDeque.add(3);
ljv.drawGraph(arrayDeque)
arrayDeque.poll(); //returns 1
arrayDeque.poll(); //returns 2
ljv.drawGraph(arrayDeque);
Here we reach the end of the buffer and start writing from the beginning:
arrayDeque.add(4); arrayDeque.add(5); arrayDeque.add(6);
ljv.drawGraph(arrayDeque);
HashMap
HashMap
is a widely used data structure in Java. For many people, implementation details of HashMap
is also a favorite topic of discussion in a Java programmer job interview.
There are a number of ways to implement hash collisions resolution in a hash map, developers of Java platform chose linked lists:
Map<String, Integer> map = new HashMap<>();
map.put("one", 1); map.put("two", 2);
map.put("three", 3); map.put("four", 4);
new LJV()
.setTreatAsPrimitive(Integer.class)
.setTreatAsPrimitive(String.class)
.drawGraph(map);
Collision
While the number of collisions on a single HashMap
bucket is small, the linked list keeps growing:
List<String> collisionString = new HashCodeCollision().genCollisionString(3);
Map<String, Integer> map = new HashMap<>();
for (int i = 0; i < collisionString.size(); i++) {
map.put(collisionString.get(i), i);
}
new LJV()
.setDirection(Direction.LR)
.setTreatAsPrimitive(Integer.class)
.setTreatAsPrimitive(String.class)
.setIgnoreNullValuedFields(true)
.drawGraph(map);
'Treeified' collision
However, if a single bucket becomes overloaded with collisions, and keys implement Comparable
interface, the linked list turns to a tree.
This reduces the search time in a bucket from O(N) to O(log(N)) and mitigates a certain kind of DDoS attacks:
List<String> collisionString = new HashCodeCollision().genCollisionString(6);
Map<String, Integer> map = new HashMap<>();
for (int i = 0; i < collisionString.size(); i++) {
map.put(collisionString.get(i), i);
}
String graph = new LJV()
.setTreatAsPrimitive(String.class)
.setTreatAsPrimitive(Integer.class)
.setIgnoreNullValuedFields(true)
.drawGraph(map);
LinkedHashMap
One of the features of HashMap
is that this data structure completely 'forgets' the order of insertion of its elements.
Also, iteration over HashMap
is not very effective from performance point of view.
When insertion order matters, we can use LinkedHashMap
, which is actually a HashMap
combined with linked list.
One of the possible use cases for LinkedHashMap
is LRU cache implementation.
Map<String, Integer> map = new HashMap<>();
map.put("one", 1); map.put("two", 2);
map.put("three", 3); map.put("four", 4);
new LJV().setDirection(LR)
.setTreatAsPrimitive(Integer.class)
.setTreatAsPrimitive(String.class)
.drawGraph(map);
TreeMap
TreeMap
in Java is a Red-Black tree that implements NavigableMap
. This implementation provides guaranteed O(log(N)) time cost for get/put/remove operations, which in practice is inferior to HashMap
.
We use TreeMap
when we need lowerKey(..)
, higherKey(..)
and other NavigableMap
capabilities not provided by a simple Map
.
Map<String, Integer> map = new TreeMap<>();
map.put("one", 1); map.put("two", 2);
map.put("three", 3); map.put("four", 4);
new LJV().setDirection(LR)
.setTreatAsPrimitive(Integer.class)
.setTreatAsPrimitive(String.class)
.setIgnoreNullValuedFields(true)
.drawGraph(map);
ConcurrentSkipListMap
ConcurrentSkipListMap
is a thread-safe NavigableMap
implementation, that uses quite a complex non-blocking algorithm involving random numbers generator. That’s why for a given input its internal representation never looks the same from one run to another:
ConcurrentSkipListMap<String, Integer> map = new ConcurrentSkipListMap<>();
map.put("one", 1);
map.put("two", 2);
map.put("three", 3);
map.put("four", 4);
String actualGraph = new LJV()
.setTreatAsPrimitive(Integer.class)
.setTreatAsPrimitive(String.class)
.drawGraph(map);
Conclusion
I hope you liked these examples. Maybe you can think of other examples of data layout visualization that are worth adding here. In this case, experiment with Lightweight Java Visualizer and share your ideas!