The Blunders backend is written in Kotlin. Kotlin runs on the JVM, and of course Blunders is used to profile itself. Dogfooding allows (and forces) me to improve the product, both by learning pain points and by making it faster.
Since Blunders continuously runs in production, I can see which code paths use the most resources for real requests. For example, I just saw that a large amount of CPU is spent converting class names for flame charts. And at times, around 30% of the allocations were in serialising JSON to send to the front-end. What's your code doing in production?
I find it quite empowering to know what the production app is doing. I don't always act on it; my time is often better spent elsewhere. But when I do want to change something, I have a clearer picture in my mind of what the effects will be.
When I want to spend my time improving performance, I have a very unexciting way of doing so:
The most performance-sensitive part of Blunders is transforming Java Flight Recorder (JFR) data into visualisation data. For example, to render a chart of the allocation rate per second, Blunders parses JFR files, finds all allocation events, and sums their sizes per second. Being performant here enables visualisation of more data and makes the visualisations snappier.
At the start of Blunders, I simply used the JFR parser built into OpenJDK. This worked, but was not quite fast enough for my use case. I used Blunders to figure out what the issues with that implementation were, and then worked out how to get around them.
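As a rough sketch of that approach: the JDK ships a JFR reader in `jdk.jfr.consumer`, and the per-second allocation sum can be built on top of it. The event and field names below (`jdk.ObjectAllocationSample` and its `weight` field, available since JDK 16) are the JDK's own, but treat the code as an illustration rather than Blunders' actual implementation:

```kotlin
import java.nio.file.Path
import java.time.Instant
import jdk.jfr.consumer.RecordingFile

// Sum allocated bytes per epoch second from (timestamp, size) pairs.
// Kept separate from file reading so it is easy to test in isolation.
fun sumPerSecond(samples: List<Pair<Instant, Long>>): Map<Long, Long> {
    val perSecond = sortedMapOf<Long, Long>()
    for ((time, bytes) in samples) {
        perSecond.merge(time.epochSecond, bytes, Long::plus)
    }
    return perSecond
}

// Read allocation samples out of a JFR file with the JDK's built-in parser.
fun readAllocationSamples(jfrFile: Path): List<Pair<Instant, Long>> =
    RecordingFile(jfrFile).use { recording ->
        val samples = mutableListOf<Pair<Instant, Long>>()
        while (recording.hasMoreEvents()) {
            val event = recording.readEvent()
            if (event.eventType.name == "jdk.ObjectAllocationSample") {
                samples += event.startTime to event.getLong("weight")
            }
        }
        samples
    }
```

Note that the built-in parser materialises a `RecordedEvent` object for every event in the file, which is part of why this route has a performance ceiling.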
I ended up building my own parser for JFR files. It has taken a few generations, and every generation has taught me something. The first few generations relied on the built-in parser but added in-memory caching. I then moved to a fully custom parser. After quite a few iterations, it is now substantially faster; a big reason is that it allocates nothing on the heap for most events. The custom parser is about 5 times faster without any caching, and about 60 times faster with pre-populated caches.
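The zero-allocation idea can be illustrated with a primitive-only cursor over a `ByteBuffer`. This is not Blunders' actual parser, just a sketch of the technique: instead of creating an object per event, the hot loop decodes primitives into reusable fields on the cursor. (JFR stores most integers in a variable-length encoding; the LEB128-style decoder below is an assumption for illustration, not the exact wire format.)

```kotlin
import java.nio.ByteBuffer

// Sketch of an allocation-free event cursor: the caller advances the
// cursor and reads primitive fields out of it. Nothing is allocated
// per event in the hot loop, so the GC stays out of the way.
class EventCursor(private val buf: ByteBuffer) {
    var timestamp: Long = 0; private set
    var size: Long = 0; private set

    // Decode one LEB128-style varint (7 bits per byte, high bit = more).
    private fun readVarLong(): Long {
        var result = 0L
        var shift = 0
        while (true) {
            val b = buf.get().toInt()
            result = result or ((b and 0x7F).toLong() shl shift)
            if (b and 0x80 == 0) return result
            shift += 7
        }
    }

    // Advance to the next event; returns false at end of buffer.
    // (Hypothetical event layout: two varints, timestamp then size.)
    fun next(): Boolean {
        if (!buf.hasRemaining()) return false
        timestamp = readVarLong()
        size = readVarLong()
        return true
    }
}
```

The design choice is the interesting part: the per-event object graph of the built-in parser is replaced by a flyweight-style cursor, so parsing millions of events produces essentially no garbage.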
Building and using Blunders helps me a great deal in building faster software, and has taught me a lot about how to work with performance in practice:
Speeding up Blunders means I can offer users faster visualisations of more profiling data. This makes it easier for them to make informed decisions about performance in their own applications.