Stalwarts of Tech – An Interview with John Davies – EIP and XML Performance
About this series
This is the next interview in a regular series of interviews with stalwarts of the technology industry. We wanted to highlight many of the unsung heroes of the technology industry, the people and projects that have made huge impacts in our lives as developers and technologists!
John Davies – CTO of C24 and XML Performance Guru
We’re really pleased to have John Davies, CTO at C24, Enterprise Integration Patterns (EIP) Guru, XML performance guru, and popular international speaker! John is well known for several reasons, most recently his Simple Data Object (SDO) library for high performance XML packaging and serialisation in Java. So without further ado, let’s find out about John’s contributions, his history with enterprise integration patterns, tuning, motivations behind SDO and his thoughts on Java and developers today!
1. Would you like to introduce yourself John?
Hi, I’m currently CTO of C24 and I was one of the co-founders too. I’ve been deep into technology since the late 70s. I’m sure many of you may wonder what we had technology-wise in the 70s, or even if we had technology then, but believe me, much of what we did then is still extremely relevant today; it’s just faster and smaller but the concepts are the same. I’m very much hands-on, mostly server-side technology and extreme-scale architectures.
2. You’re most famous in the Java community for your tech & methodology talks on messaging systems. Can you tell us how you got into the world of EIP?
When I started programming (after hardware and the boring Fortran stuff at Uni), I was writing software (in C) to distribute market data over networks. In fact I even had to write the TSR network drivers for DOS. ARCnet I think it was; in those days you could debug it with an oscilloscope as it was only 2Mbit/s. But back in the late 80s, then with Objective C and later C++, we were discovering our own EIPs. What brought it all into the mainstream was probably the famous, and still essential reading, Gang of Four Design Patterns book and Gregor Hohpe’s Enterprise Integration Patterns, from which I believe we get the term EIP today.
3. You’ve spoken at almost every industry conference in the world. What is your best conference memory? Do you have a worst moment to share?
You’re right, I think I’ve spoken in more cities and countries than most people visit in a lifetime. What a privilege: China, Japan, India, Singapore, US, Canada, most of Europe, the Middle East and Africa. Still missing are Australia, NZ and South America (please let me know!!!). The most awe-inspiring was probably the JavaOnes in the late 90s. For me the high point of every conference is meeting and socialising with the other speakers. Sometimes we just chat about our businesses, sometimes we just swap new tools, gadgets and shortcuts we’ve found; it’s almost always over a lot of drinks though. I’ve never had a bad moment but I do have a rather funny story about an opening keynote I did back in 2006 that followed a very late night out with Cameron Purdy, Gregor Hohpe, Rod Johnson, James Strachan and Ross Mason. I’ll spare you the details here but feel free to ask me if you bump into me.
4. Most recently, your company C24 has created a new compaction technology for Java, called SDO. What motivated you to create this?
SDOs (Simple Data Objects) were an idea we had many years ago when we started to notice the amount of memory Java was using to store data. You only need a few Strings, Integers, BigDecimals and Dates and you’re quickly into hundreds of bytes. Going back to my early days, when we only had a few bytes to store everything, this all just seemed wrong to me. So we looked at a way of creating a Java-binding engine to bind everyday Java types to basic binary types, what I like to think of as a binary codec. This is not compression, and it’s not an alternative syntax like ASN.1 or a serialisation mechanism like Google’s Protocol Buffers. It’s a Java binding technology that creates highly compact binary streams.
5. SDO is rumoured to have some proper mechanical sympathy in its design, how close to the wire did you go?
Mechanical Sympathy, by Martin Thompson’s definition, is hardware and software working together in harmony. SDOs are purely software working, for now, in the JVM. One could claim sympathy in the JVM and OS but, sticking to the definition, where we really see mechanical sympathy is in the CPU cache and the network MTU. Let me explain…
With SDOs we not only bind to binary but we compact everything in the message into a single byte array; this byte array is the entire message, including repeating elements. As an example, a complex derivative in the form of FpML can be compacted down to under 400 bytes, which means that the entire message, or even several messages, can be loaded into the CPU cache (L1 & L2). The content of the message is not distributed all over the JVM memory space, as is the case with standard Java. As every programmer should know, the L1 and L2 caches are respectively about 200 and 15 times faster than memory. On the network we get the same message into a single MTU (Maximum Transmission Unit) or network packet, meaning it’s not split into multiple packets. The real advantage at the network, though, is the fact that the entire message is serialised as one byte array rather than multiple objects. All these mechanically sympathetic advantages lead to some pretty impressive performance improvements.
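As a rough illustration of the single-byte-array idea, here is a minimal sketch in plain Java. It is not C24's SDO implementation or wire format: the class name CompactTrade, the field layout, the fixed two-decimal scale and the method names are all assumptions made up for this example. It shows a message, including a repeating element, held in one byte array with getters that decode fields on demand.

```java
import java.math.BigDecimal;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.time.LocalDate;
import java.util.List;

// Hypothetical illustration only; not C24's SDO API or binary layout.
final class CompactTrade {
    private static final int ID_OFFSET = 0;        // 8-byte long
    private static final int DATE_OFFSET = 8;      // 4-byte int, days since 1970-01-01
    private static final int CCY_OFFSET = 12;      // 3 ASCII bytes, e.g. "USD"
    private static final int COUNT_OFFSET = 15;    // 2-byte count of repeating payments
    private static final int PAYMENTS_OFFSET = 17; // count x 8-byte longs in minor units
    private static final int SCALE = 2;            // assumed fixed number of decimal places

    private final byte[] data; // the whole message, repeating elements included

    CompactTrade(long id, LocalDate tradeDate, String currency, List<BigDecimal> payments) {
        byte[] ccy = currency.getBytes(StandardCharsets.US_ASCII);
        if (ccy.length != 3) {
            throw new IllegalArgumentException("currency must be 3 ASCII characters");
        }
        if (payments.size() > Short.MAX_VALUE) {
            throw new IllegalArgumentException("too many payments for a 2-byte count");
        }
        ByteBuffer buf = ByteBuffer.allocate(PAYMENTS_OFFSET + 8 * payments.size());
        buf.putLong(id);
        buf.putInt((int) tradeDate.toEpochDay());
        buf.put(ccy);
        buf.putShort((short) payments.size());
        for (BigDecimal p : payments) {
            // Store the amount as a whole number of minor units (e.g. cents).
            buf.putLong(p.movePointRight(SCALE).longValueExact());
        }
        this.data = buf.array();
    }

    // Getters read straight from the byte array; no object tree is kept.
    long id() { return ByteBuffer.wrap(data).getLong(ID_OFFSET); }

    LocalDate tradeDate() {
        return LocalDate.ofEpochDay(ByteBuffer.wrap(data).getInt(DATE_OFFSET));
    }

    String currency() { return new String(data, CCY_OFFSET, 3, StandardCharsets.US_ASCII); }

    int paymentCount() { return ByteBuffer.wrap(data).getShort(COUNT_OFFSET); }

    BigDecimal payment(int index) {
        if (index < 0 || index >= paymentCount()) {
            throw new IndexOutOfBoundsException("payment " + index);
        }
        long minorUnits = ByteBuffer.wrap(data).getLong(PAYMENTS_OFFSET + 8 * index);
        return BigDecimal.valueOf(minorUnits, SCALE);
    }

    int sizeInBytes() { return data.length; }

    // True if the message fits within the given payload limit (caller supplies the limit).
    boolean fitsInOnePacket(int payloadLimitBytes) { return data.length <= payloadLimitBytes; }
}
```

With two payments this sketch occupies 33 bytes (17 bytes of fixed fields plus 2 x 8), so a call such as fitsInOnePacket(1500), using the typical Ethernet MTU as an assumed limit, returns true. A real binding would be generated from the message model rather than hand-written like this.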
6. We’re always interested in performance here at jClarity. Are there any peer reviewed numbers you can share about SDO’s performance?
It’s only fair to give you guys a plug here as it was thanks to jClarity consulting and superb tooling that we were able to achieve the most impressive figures. I think it’s very misleading to simply quote a "we did x in y seconds" because everyone’s setup is different. The understanding of end-to-end or a transaction varies wildly, so let me give you some facts and a few numbers and hopefully from there most people can extrapolate.
We can take any message and create a binary binding: FpML, FIX, SWIFT, CSVs, ISO-20022 in finance, 4G telco messages, any XML or even a simple Java Object. Going from CSV to binary doesn’t offer huge gains but going from a large XML schema model to binary does. An FpML derivative, for example, averages 7.4k in raw XML and around 25k when bound to Java using standard Java Binding. Using SDOs we get that down to around 400 bytes. That 400 bytes is not compressed, it is compacted; every single element of the original XML message is available without any decompression. We can hold a good 20 million in under 10GB of RAM, and those same 20 million can be sent down 10G Ethernet in a few seconds. Compare that to the 25k objects or the 7.4k raw XML: the objects result in some 500GB in size and the XML, even at 100k messages per second (which is fast), will take over 3 minutes to parse 20 million. Using FpML as a good example, and writing/searching in in-memory “databases” or caches such as Coherence, GemFire, GigaSpaces, EHCache and HazelCast etc., it’s anything from 20-50 times faster or more memory efficient. Remember this doesn’t replace the caches, it complements them by storing binary objects in memory rather than classic Java Object trees.
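The arithmetic behind those figures can be checked with a few lines of Java. This is only a back-of-envelope sketch: the class and method names are made up for illustration, "25k" is read as 25,000 bytes, "10G Ethernet" as 10,000,000,000 bits per second, and protocol overhead is ignored.

```java
// Back-of-envelope check of the figures quoted above (illustrative names only).
final class SdoBackOfEnvelope {
    static final long MESSAGES = 20_000_000L;
    static final long SDO_BYTES = 400L;                        // compacted message
    static final long JAVA_OBJECT_BYTES = 25_000L;             // "25k" read as 25,000 bytes
    static final long PARSE_RATE_PER_SECOND = 100_000L;        // XML messages parsed per second
    static final long LINK_BITS_PER_SECOND = 10_000_000_000L;  // 10G Ethernet, overhead ignored

    static long totalBytes(long bytesPerMessage) {
        return MESSAGES * bytesPerMessage;
    }

    static double gigabytes(long bytes) {
        return bytes / 1e9; // decimal gigabytes
    }

    static long parseSeconds() {
        return MESSAGES / PARSE_RATE_PER_SECOND;
    }

    static double transferSeconds(long bytes) {
        return bytes * 8.0 / LINK_BITS_PER_SECOND;
    }

    public static void main(String[] args) {
        long sdoTotal = totalBytes(SDO_BYTES);
        long objectTotal = totalBytes(JAVA_OBJECT_BYTES);
        System.out.println("SDO total GB:       " + gigabytes(sdoTotal));
        System.out.println("Java objects GB:    " + gigabytes(objectTotal));
        System.out.println("XML parse seconds:  " + parseSeconds());
        System.out.println("SDO transfer secs:  " + transferSeconds(sdoTotal));
    }
}
```

Under those assumptions, 20 million messages at 400 bytes come to 8GB (under the 10GB quoted), the same messages as 25,000-byte object graphs come to 500GB, parsing at 100k messages per second takes 200 seconds (3 minutes 20 seconds), and pushing 8GB down a 10Gbit/s link takes about 6.4 seconds.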
7. What features or changes would you like to see happen in the JVM with regards to reducing object overhead?
Putting my business hat on I’d rather nothing changed but I do believe there are some interesting changes planned for Java 10+. Even so these changes are only likely to affect the instance data, with our SDOs we are able to custom code the compaction based on the model rather than the instance data. Perhaps annotations could take it a step further, I’d certainly be interested in working on it.
8. Do you think the latest crop of Java developers will need to understand mechanical sympathy again, like with the C/C++ developers of today?
In the days of the C/C++ developers there were perhaps only a few thousand globally who knew about these things, it was a large percentage of the programming population though, they got paid a lot of money. Today there are millions of Java programmers but I doubt more than a few thousand understand much about mechanical sympathy or even care about it. Do they need to understand? In a perfect world yes but Java was designed to abstract much of the hardware from the programmer, memory, disk, network etc. so the vast majority of programmers really don’t need to worry themselves about the details.
9. As the CTO of a tech firm, what disruptive trend do you see hitting the Java ecosystem in the next 5 years?
I think the biggest change we’ll see over the next few years will be the use of concurrency through functional programming, i.e. lambdas in Java. The vast majority of code I see (looking at what our customers send us and what we work on for customers) is still very single-threaded. Even Spring Integration (in the EIP world) is limited in how it uses threading but as we start to embrace Java 8 this will change. The biggest change I’ve seen so far in customer code is the use of Akka; combine this with lambdas and we’re going to see some extremely interesting leaps in performance and efficiency.
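To give a flavour of what concurrency through lambdas can look like in Java 8, here is a small sketch that contrasts a classic single-threaded loop with a parallel stream. The class name and the process method are hypothetical stand-ins for per-message work; this is not code from C24 or from any customer.

```java
import java.util.stream.LongStream;

// Illustrative only: the same work done with a loop and with a Java 8 parallel stream.
final class LambdaConcurrencySketch {

    // Hypothetical stand-in for some per-message processing.
    static long process(long messageId) {
        return messageId * messageId % 1_000;
    }

    // The classic single-threaded style.
    static long sequentialTotal(long count) {
        long total = 0;
        for (long id = 0; id < count; id++) {
            total += process(id);
        }
        return total;
    }

    // The same computation expressed with a method reference and run in parallel.
    static long parallelTotal(long count) {
        return LongStream.range(0, count)
                .parallel()
                .map(LambdaConcurrencySketch::process)
                .sum();
    }

    public static void main(String[] args) {
        long count = 1_000_000L;
        System.out.println("sequential: " + sequentialTotal(count));
        System.out.println("parallel:   " + parallelTotal(count));
    }
}
```

Both methods return the same total. The parallel version splits the work across the common fork-join pool, so whether it is actually faster depends on how expensive the per-message work is and how many cores are available.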
10. Do you have a blog or twitter account?
I don’t have a blog, I used to use TheServerSide as my blog but after Floyd, Nitin and Jo left it sort of died. I do use Twitter (@jtdavies) though and pump out the odd white paper from time to time on InfoQ. I’ll be speaking at most of the Java conferences and quite often talk at the NYJavaSIG or anywhere else I’m invited to speak, remember you just need to invite me, the more unusual the location the better 🙂
Once more we’d like to thank John for his detailed and thoughtful answers and encourage all of you to experiment with Simple Data Objects (SDO) to increase your application’s performance around XML and object processing!
Martijn (CEO) and the jClarity Team!