
Monday, June 14, 2010

Performance of Fudge Persistence in MongoDB

Here at OpenGamma we make considerable use of MongoDB, in particular as a persistent store for data which we either don't want to spend the time on normalizing to an RDBMS Schema, or where we positively value the schema-free design approach taken by MongoDB.

We also make extensive use of the Fudge Messaging project for virtually all of our distributed object management needs (whether using Fudge Proto files, or manually doing the Object/Document encoding ourselves). Luckily, the two work extremely well together.

Because of the way we defined the Fudge encoding specification, and designed the major Fudge Reference Implementation classes and interfaces, it's extremely easy to map between the worlds of Fudge, JSON, and XML (in fact, Fudge already supports streaming translations to/from XML and JSON). We've had support for converting between Fudge objects and the BasicDBObject that the MongoDB Java driver uses since Fudge 0.1, and we use it extensively in OpenGamma: anywhere you have a Fudge object, you can seamlessly persist it into a MongoDB database as a document, and load it back directly into Fudge format later on.

So with that in mind, I decided to try some performance tests on some different approaches that you can take to go from a Fudge object to a MongoDB persisted document.

Benchmark Setup

The first dimension of testing is the type of document being persisted. I had two target documents:

Small Document
This document, intended to represent something like a log file entry, consists of 3 primitive field entries, as well as a single list of 5 integers.
Large Document
This document, intended to represent a larger concept more appropriate to the OpenGamma system, consists of several hundred fields in a number of sub-documents (sub-DBObject in MongoDB, sub-FudgeFieldContainer in Fudge), across a number of different types, as well as some large byte array fields.

I considered three different approaches to doing the conversion between the two types of objects:

MongoDB Native
In this case I just created BasicDBObject instances directly and avoided Fudge entirely as a baseline.
Fudge Converted
I created a Fudge message, and then converted it to a BasicDBObject using the built-in Fudge translation system.
Fudge Wrapped
This one wasn't built into Fudge yet (and won't be until I can clean it up and test it properly). I kept a Fudge data structure, and just wrapped it in an implementation of the DBObject interface, which delegated all calls to the appropriate call on FudgeFieldContainer.
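
A minimal sketch of this wrapped approach, using simplified stand-ins for the driver's DBObject interface and Fudge's FudgeFieldContainer (all class and method names below are illustrative, not the real OpenGamma or MongoDB types):

```java
import java.util.*;

// Simplified stand-in for the MongoDB driver's DBObject interface.
interface SimpleDBObject {
    Object get(String key);
    Set<String> keySet();
}

// Simplified stand-in for FudgeFieldContainer: an ordered multi-map of fields.
class SimpleFudgeMsg {
    private final List<Map.Entry<String, Object>> fields = new ArrayList<>();
    void add(String name, Object value) {
        fields.add(new AbstractMap.SimpleEntry<>(name, value));
    }
    List<Map.Entry<String, Object>> getAllFields() {
        return fields;
    }
}

// Wraps a Fudge message as a DBObject by indexing it once into a
// single-value-per-field map, so the driver's get-per-key access stays O(1).
// Repeated names (Fudge's encoding of lists) collapse into a List value.
class FudgeDBObjectWrapper implements SimpleDBObject {
    private final Map<String, Object> view = new LinkedHashMap<>();

    @SuppressWarnings("unchecked")
    FudgeDBObjectWrapper(SimpleFudgeMsg msg) {
        for (Map.Entry<String, Object> field : msg.getAllFields()) {
            view.merge(field.getKey(), field.getValue(), (existing, incoming) -> {
                List<Object> list = existing instanceof List
                        ? (List<Object>) existing
                        : new ArrayList<>(Collections.singletonList(existing));
                list.add(incoming);
                return list;
            });
        }
    }

    public Object get(String key) { return view.get(key); }
    public Set<String> keySet() { return view.keySet(); }
}
```

Collapsing repeated field names into a List mirrors the way Fudge encodes JSON-style arrays as repeated fields with the same name.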

Additional parameters of interest:

  • Remote MongoDB server: running on Fedora 11 (installed from Yum, mongo-stable-server-20100512-mongodb_1.fc11.x86_64 RPM) on a VM with reasonably fast underlying disk
  • Local MongoDB server: 1.4.3 x86_64 running on Fedora 13 on a Core i7 with 8GB of RAM and all storage on an Intel SSD
  • MongoDB Java Driver: 1.4 (pulled from Github)
  • JVM: Sun JDK 1.6.0_20 on Fedora 13 x86_64

Benchmark Results

Test Description                                           MongoDB Native  Fudge Converted  Fudge Wrapped
Creation of 1,000,000 Small MongoDB DBObjects                     539ms          1,603ms          839ms
Persistence of 1,000,000 Small MongoDB DBObjects               41,188ms         46,201ms       92,866ms
Creation of 100,000 Large MongoDB DBObjects                    15,351ms         23,956ms       15,785ms
Persistence of 100,000 Large MongoDB DBObjects (remote DB)     57,207ms         60,511ms       56,236ms
Persistence of 100,000 Large MongoDB DBObjects (local DB)      66,557ms         74,763ms       58,816ms

Results Explanation

The first thing to point out is that for the small DBObject case, the particular way in which MongoDB encodes data for transmission on the wire matters a lot. In particular, there's one decision that the driver has made that changes everything: it does a whole lot of random lookups.

A BasicDBObject extends from a LinkedHashMap, and so doing object.get(fieldName) is a very fast operation. However, because Fudge is a multi-map, we don't actually do that in Fudge, and by default we store fields as a list of fields (JSON stores lists as a, well, list; Fudge stores them as repeated values with the same field name). Because this makes point lookups slow, we intentionally do whole-message operations as often as we can, and just iterate over all the fields in the message.

The MongoDB driver code does the same thing, but instead of doing a for(Entry entry : entrySet()) style of operation, it iterates over the keys and does a separate get operation for each key. In Fudge, this is potentially a linear search through the whole message.

To work around this, in my wrapper object I built up a map with only a single value per field. This works well, but in the small document case one in six fields is a list, which makes this test thrash the CPU doing the document conversion (and explains why the small document persistence test is more than twice as slow with the wrapper as just rebuilding the objects). Yes, I could take this optimization further, but it would be difficult to improve on the combined setup (document construction) and runtime (persistence) performance of just building up a BasicDBObject, which is what the Fudge conversion does anyway.
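
The difference between the two iteration styles can be sketched with a toy list-backed message; the comparison counter is purely illustrative instrumentation, not anything in the real drivers:

```java
import java.util.*;

// Toy list-backed message, like a Fudge multi-map: point lookups are linear scans.
class ListBackedMsg {
    final List<String[]> fields = new ArrayList<>(); // each entry is {name, value}
    long comparisons = 0; // instrumented, to expose the cost of get-per-key access

    void add(String name, String value) {
        fields.add(new String[] { name, value });
    }

    String get(String name) {
        for (String[] f : fields) {
            comparisons++;
            if (f[0].equals(name)) return f[1];
        }
        return null;
    }
}

public class IterationCost {
    // The driver's style: iterate the keys, then get() each one.
    // Against a list-backed message this is O(n^2) comparisons.
    static long driverStyleComparisons(ListBackedMsg msg) {
        msg.comparisons = 0;
        for (String[] f : msg.fields) {
            msg.get(f[0]);
        }
        return msg.comparisons;
    }

    // The whole-message style Fudge prefers: one pass over every field, O(n).
    static long wholeMessagePassFields(ListBackedMsg msg) {
        long visited = 0;
        for (String[] f : msg.fields) visited++;
        return visited;
    }
}
```

With 100 distinct fields, the get-per-key style does 5,050 comparisons where the single pass touches each field once.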

The wrapped Fudge object wins in every case for the large document test, no matter how many times I run them (and I've done it quite a few times for both local and remote, with all outliers eliminated). Moreover, I actually get faster performance running on a remote DB than on a local DB (which surprised me quite a bit).

The only things that I can conclude from this are:

  • FudgeMsg limits the data size on insertion into the message (when you do a msg.add() operation, not on serialization) for small integral values (if you put in a long but it's actually the number 6, Fudge will convert that to a byte). However, the ByteEncoder which converts values in MongoDB to the wire representation will never do this optimization, and will actually upscale small values to at least a 32-bit boundary. This means that if you put data into a FudgeMsg first and then put it into the MongoDB wire encoding, you shrink the size of the message. Given the number of pseudo-random short, int and long values in this message, it's a clean win.
  • The object churn for the non-wrapped form (where we construct instances of BasicDBObject from a FudgeFieldContainer) causes CPU effects that the wrapped form doesn't suffer from.
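
The narrowing rule in the first point can be sketched as follows (an illustration of the behavior described above, not the actual FudgeMsg code):

```java
// Sketch of Fudge's insertion-time narrowing: a long whose value fits in a
// smaller integral type is stored, and later wire-encoded, as that smaller type.
// The MongoDB ByteEncoder, by contrast, never narrows below a 32-bit boundary.
public class Narrowing {
    static Object narrow(long value) {
        if (value >= Byte.MIN_VALUE && value <= Byte.MAX_VALUE) return (byte) value;
        if (value >= Short.MIN_VALUE && value <= Short.MAX_VALUE) return (short) value;
        if (value >= Integer.MIN_VALUE && value <= Integer.MAX_VALUE) return (int) value;
        return value;
    }
}
```

So adding the long value 6 to a message costs one payload byte once it reaches the wire, instead of at least four.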

Conclusion

One of the things that was really pleasant in running this test is just how nice it is to take a document model designed for efficient binary encoding (Fudge) and persist it extremely quickly into a database designed for web-style data (MongoDB). The sum total of the actual persistence code is all of about 10 lines; far more code goes into building the messages/documents themselves.

The wrapped object form definitely wins in a number of cases. My current code isn't production-quality by any means, but I think it's a useful thing to add to the Fudge arsenal. That being said, I think the real win is to rethink the way in which we get data into MongoDB in the first place.

Given the way the MongoDB Java driver iterates over fields, it seems to me that a far better solution is to cut out the DBObject system entirely and write a Fudge persister that speaks the native MongoDB wire protocol directly, taking advantage of the direct streaming capabilities of the Fudge protocol. Once we've done that, we should be going just about as fast and efficiently as we can, and Fudge will have a seamless combination of rich code-level tools, an efficient wire-level protocol for binary serialization, great codecs for working with text encodings like JSON and XML, and a fantastic document/message database facility using MongoDB.

Tuesday, March 02, 2010

Whence Fudge? Why Not Just Use/Extend Avro/GPB/Thrift?

Or, In Which I Make The Case For Fudge Existing At All

tl;dr: Fudge is binary for speed, and self-describing so that you can do interesting things when you don't have schema support at the time of processing.

After I announced the Fudge Messaging 0.2 release, I realized that we had a back-link from @evan on Twitter with quite a pithy characterization of the situation. I've followed up with Evan quite a bit over Twitter, and I think this warrants a more detailed description of why Fudge is different from other encoding systems.

What Do You Transmit On The Wire?

Let's start with a very simple question: if you're trying to transmit data in between machines, whether temporally concurrent or not, how do you encode said data? Broadly speaking, you have a number of options:
Hand-Rolled Binary Protocol
Yep, you could directly emit the individual bytes in the appropriate byte order and hand-roll the encoders and decoders, and it'll probably be pretty fast. It'd be a complete and utter waste of development resources, but fast it would undoubtedly be.
Compact Schema-Based Representation
The next option is that you use a system based on a schema definition, and use that schema definition and the underlying encoding system to automatically do your encoding/decoding. Examples of this type of system are Google Protocol Buffers, Thrift, and Avro. This is going to be fast, and an efficient use of development resources, but the messages are useless without access to the schema.
Object Serialization
Every major Object-Oriented language has some built-in serialization system for object graphs, and it's usually dead easy to use. However, they all suck in a number of ways, and don't work at all if you don't have matching code objects on both sides of the communications channel.
Text-Based Representation
You could just say "I don't care about performance at all" and just use a text-based encoding like XML, JSON, or YAML. The messages are at least marginally human readable, and you can do stuff with them without any schema (which may or may not be used at all), but it's going to be slow, slow, slow.
Compact Schema-Free Representation
The final option is to say "I like having my data in a compact binary representation, but I don't want to have to have access to any type of schema in order to work with messages." The most well known example of this is TibrvMsg from Tibco Rendezvous, but it's not a very good implementation, and it's proprietary. This is what we created Fudge to solve.

Hey, Isn't Avro Schema-Free?

No, no it's not. The Avro encoding system requires access to a schema in order to be able to decode messages at all, because all metadata is lost during encoding. However, it has two characteristics which are similar to schema-free encoding systems:
  • It decodes into a schema-free representation. What I mean by this is that if you did have the schema for a message, you can decode it into an arbitrary data structure which isn't tightly bound to the schema (think generic structure vs. schema-generated code).
  • The communications protocols involve out-of-band schema transmission. When you start communicating using RPC Avro semantics, you communicate the schema of the messages that you're going to be transmitting using a JSON-encoded schema representation. But once you've done that handshake, the actual messages are schema-bound.

Thus you can view "Avro the communications protocol" as schema-free, as you don't have to have a compile-time-bound schema to be able to communicate effectively, but "Avro the encoding system" as schema-bound.

Why Is Schema-Free Encoding Useful?

If you're looking at temporally concurrent point-to-point communications (think: RPC over a socket), a schema-bound system is pretty useful. You can squeeze every single unnecessary byte out of the communications traffic, and that's pretty darn useful from an efficiency perspective. But not all communications are temporally concurrent and point-to-point.

Let's consider the other extreme: temporally separate publish/subscribe communications (think: market data ticks, log messages, entity update messages). In that case, you have to make sure that every single point where you want to process the message has access to the schema used at the point of message serialization; if you don't have it, all you have is an opaque byte array. That's a logistical nightmare to keep everything in sync over time, particularly when you start doing logging and replay of messages for debugging and auditing.

Moreover, there's a whole host of things that you can do in between producer and consumer if you have a schema-free encoding:

  • You can visually inspect the contents of messages. Not in a text editor unless you're using a text-based encoding system, but in some type of general purpose tool. This is an unbelievably useful development, debugging, and system support tool.
  • You can do content-based routing on them. It's quite easy using XML, for example, to create XPath rules that route messages through the system ("send messages where /Company/Ticker = AAPL to Node 5; send other messages to Node 7"). You don't have to compile each little rule into native code, you can do it based purely on configuration. Heck, you could even build a custom routing system for ActiveMQ or RabbitMQ and put the functionality right in your message oriented middleware brokers.
  • You can automatically convert them to other representations. Have data in XML but need it in JSON? No problem; you can auto-convert it. Have data in Fudge but need it in XML? No problem; you can auto-convert it.
  • You can store and query them efficiently. You can take any arbitrary document and put it into a semi-structured data store (think: XML Database, MongoDB), and then query it in an efficient way ("find me all documents where /System/Hostname is node-5 and /Log/Type is AUDIT or SECURITY").
  • You can transform the content using configuration. You can take an arbitrary message, and just using configuration (so no more custom code) change the contents to take into consideration renames, schema changes, add stuff, remove stuff, do whatever you want.
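
As a sketch of what the content-based routing point looks like in practice, rules can be plain data matched against generic key/value messages (flat keys here rather than real XPath expressions; the Router class is made up for illustration):

```java
import java.util.*;

// Content-based routing over schema-free messages: each rule is data
// (key, expected value, destination), so routes live in configuration,
// not compiled code. Paths are simplified to flat keys for brevity.
public class Router {
    private static class Rule {
        final String key; final Object expected; final String destination;
        Rule(String key, Object expected, String destination) {
            this.key = key; this.expected = expected; this.destination = destination;
        }
    }

    private final List<Rule> rules = new ArrayList<>();
    private final String defaultDestination;

    public Router(String defaultDestination) {
        this.defaultDestination = defaultDestination;
    }

    public Router route(String key, Object expected, String destination) {
        rules.add(new Rule(key, expected, destination));
        return this;
    }

    // First matching rule wins; otherwise fall through to the default node.
    public String dispatch(Map<String, Object> message) {
        for (Rule r : rules) {
            if (Objects.equals(message.get(r.key), r.expected)) return r.destination;
        }
        return defaultDestination;
    }
}
```

Because the rules are just data, they can be loaded from configuration and changed without recompiling anything, which is the whole point.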

The SOA guys have known about these advantages for years, which is why they use XML for everything. The one flaw there is that XML is, quite frankly, rubbish in every other way.

That's Why We Created Fudge

Most of the network traffic in the OpenGamma code isn't actually point-to-point RPC; it's distributed one-to-(one or more), and we use message oriented middleware extensively for communications. We needed a compact representation where given an arbitrary block of data, we could "do stuff" with it, no matter when it was produced.

None of the compact schema-free systems support this type of operation. If someone can point me to an Open Source system which covers these types of use cases, and it's better than ours, we'll gladly merge projects. But Avro ain't it. Avro's good for what it does, but what it does isn't what we need.

But Your Messages Must Be Huge

The Fudge Encoding Specification was very carefully designed to scale both down and up in terms of functionality. At the most verbose, for each field in a message/stream (of course Fudge supports streaming operations; if it didn't, do you think we could efficiently do all those auto-transformations?), the Fudge overhead consists of:
1-byte Field Prefix
This has the processing directives to say what else is in the stream for that field.
1-byte Type ID
We have to know at the very minimum the type of data being transmitted for that field.
The Field Name
The UTF-8 encoded name ("Bid", "Ask", "System", "UserName", etc.)
A 2-byte Field Ordinal
A numeric code for the field (nope, not the index into the message, just a numeric ID like the name is a textual ID)

Here's the thing though: only the field prefix and type ID are required. Although right now Fudge-Proto doesn't do so, you could easily use Fudge in a form where you rely on the ordering of fields in the message to determine the contents (which is exactly what Avro does). In that case, you pay a 2-byte penalty per field. Yes, if you're transmitting a lot of very small data fields that's a relatively high penalty, but if you're transmitting 8-byte numbers and text, it's really not a whole heck of a lot.
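
That per-field overhead can be computed directly. One assumption to flag: the code below takes the UTF-8 name to carry a one-byte length prefix, which should be checked against the Fudge Encoding Specification:

```java
import java.nio.charset.StandardCharsets;

// Per-field framing overhead in the Fudge encoding, as enumerated above:
// a 1-byte field prefix and 1-byte type ID are mandatory; the name and the
// 2-byte ordinal are both optional.
public class FieldOverhead {
    static int overheadBytes(String name, boolean hasOrdinal) {
        int bytes = 1 + 1; // field prefix + type ID: the only required parts
        if (name != null) {
            // Assumed: the name is length-prefixed with a single byte.
            bytes += 1 + name.getBytes(StandardCharsets.UTF_8).length;
        }
        if (hasOrdinal) {
            bytes += 2;
        }
        return bytes;
    }
}
```

The minimal case, field prefix plus type ID, is the 2-byte penalty per field described above.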

You want names so that you can have something like Map<String, Object> in your code, but you don't want to transmit those on the wire? That's what we invented Fudge Taxonomies for. 2-byte ordinal gives you full name data at runtime, without each message having to include the name.
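
A taxonomy is, in miniature, a shared ordinal-to-name mapping agreed out of band; the class below is an illustrative sketch, not the reference implementation's taxonomy API:

```java
import java.util.*;

// A Fudge taxonomy in miniature: the wire carries only 2-byte ordinals,
// and this shared mapping restores full field names at runtime.
public class MiniTaxonomy {
    private final Map<Integer, String> namesByOrdinal;

    public MiniTaxonomy(Map<Integer, String> namesByOrdinal) {
        this.namesByOrdinal = namesByOrdinal;
    }

    public String resolve(int ordinal) {
        // Unknown ordinals stay visible rather than failing the decode.
        return namesByOrdinal.getOrDefault(ordinal, "<ordinal " + ordinal + ">");
    }
}
```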

So in one encoding specification, we can support very terse metadata as well as very verbose metadata.

Conclusion

So ultimately, we created Fudge to be binary (for speed and efficiency), self-describing (for schema-free operation even when a schema exists), and flexible (so you can only use the bits you need).

Yes, we could have started with an Avro or a GPB or a Thrift, but there's no way that you could have done what we've done without breaking existing implementations. And I think if you started with an Avro and worked down all the use cases we've run into in the past, you'd end up with Fudge in the end anyway; we just skipped a few steps.

The binary schema-bound systems are great and they define things like RPC semantics which Fudge doesn't. But hopefully it's clear that us creating Fudge wasn't just a case of NIH, it's that we had very specific use cases that they can't support naturally.

Thursday, February 25, 2010

Fudge Messaging 0.2 Release

After I initially announced the first release of the Fudge Messaging project, we've had a lot of feedback on the code and on the specification, and we've been using it in anger quite a bit here at OpenGamma.

We've also been making a lot of enhancements, and are pleased to announce the 0.2 Release of the combined Fudge Messaging project.

While there have been a number of enhancements throughout the code, the primary focus of this release has been on making it easier for you to start using Fudge in your applications. With that focus has come a number of significant new features that you should be aware of:

  • The Fudge Proto sub-project has been released for the first time. Taking our cue from Google Protocol Buffers, we have a .proto file format which can generate Java and C# language bindings for automatic conversion of objects to and from the Fudge encoding specification.
  • We've made it substantially easier to build up an Object Encoding/Decoding Dictionary so that if you want to hand-write your mappings between Fudge messages and Objects, you can still access the auto-conversion functionality in the Fudge Proto project.
  • We've built up a number of Automatic Object Conversion systems, so that even if you have nothing other than Java Beans/POJOs/PONOs, you can still automatically convert those to and from Fudge messages.
  • We've developed Streaming Access APIs so that you can consume and write Fudge fields without having to work with an entire message as a bulk operation (we'll be using this for our automatic conversions to and from XML and JSON).
  • A C Reference Implementation has been written. Already tested on a number of different Unix-like platforms and on Windows, this will be the foundation of our eventual C++ version, and it can be used to support Fudge messaging from any language that allows easy C bindings.

Our next release (0.3) will really be a release candidate for 1.0. Our primary focus there is to work through any features submitted by the community, and to finalize the one piece of the binary encoding that hasn't been completed (moving from Modified UTF-8 to true UTF-8, which won't affect you unless you want to encode Unicode null or certain high-bit multi-byte Unicode characters).

If you've thought having a self-describing, compact, binary encoding system ideally suited to distributed applications was interesting, but didn't know if we were ready for you to use in anger, don't wait. It's ready.

Thursday, December 24, 2009

Confluence, iSCSI, NetApp, Flexiscale, Fail

We've been hosting the FudgeMsg website using Confluence (and mad props for the free Open Source license by the way!) on a VM hosted by Flexiscale. We chose Flexiscale for the following reasons:
  • Confluence is ridiculously tricky to cluster, meaning that you can't benefit from the scale-out capabilities of the Amazon EC2 model. (Edit 2009-12-24 - We're not attempting to cluster it; this statement is to indicate that because it's so tough to cluster, we're not trying to. Confluence is thinking we might be, and crashing as a result).
  • Getting your AMI+EBS+ElasticIP for such a single-point vendor app is very difficult and time consuming, again meaning that the Amazon EC2 model isn't ideal.
  • We didn't want to go for dedicated hardware for a low-volume application stack.
  • We don't have a stable enough internet connection to host the application stack ourselves.
We thought we were pretty darn clever to be honest, but then we noticed that things started going wrong. In particular, we started getting this error a lot:

[screenshot: Confluence's fatal cluster error page]
We were running a relatively complicated Java setup (jsvc to run as chrooted user, on Tomcat, with multiple web applications running), so we thought we had done something wrong. After all, the first rule in software engineering is always, always, assume you have caused the break.

We were wrong.

What we've since found out is that the entire Flexiscale model is that your VMs have effectively no local storage whatsoever, and everything is iSCSI hosted off of a NetApp cluster. That means that your storage is fully persistent, and survives migrations of your VMs across physical hardware. Which is a great concept when it works.

Except that the Flexiscale NetApp cluster is borked. Essentially, sporadically the iSCSI services will completely shut down. Sometimes for a second, sometimes for 4 hours. Your processes will still be running, but if they attempt to hit disk for any reason, they'll hang and ultimately timeout. Your processes are still running, but they can't talk to disk.

In the case of Confluence, which was actually trying to talk to PostgreSQL, which itself was trying to talk to disk, Confluence detects the complete timeout hang of PostgreSQL as a cluster violation, and puts itself into crash mode.

You want to know the best part of this? When you're in this state, there's nothing you can do. You can't SSH into the VM, because it can't read the password file to let you in. The Flexiscale tools won't even let you hard bounce the machine. And they won't tell you when it's back up directly, so you just have to keep trying until you can finally get into the VM, to restart your servlet container.

This has caused no fewer than 20 instances where Confluence has died for us, sometimes lasting hours until we can actually recover the VM, and twice now in 2 days. It makes us look like morons who can't even run Confluence, much less guide development of a message encoding system.

So until we manage to get off of Flexiscale (haha, can't even get in to back up the data at the moment), if you see that error when going to the Fudge Messaging website, now you'll know why.

There are two morals to this story:

Friday, November 06, 2009

Announcing the Release of the Fudge Messaging Project

Along with the rest of the OpenGamma team, I'm pleased to announce the first open release of a project we're calling Fudge Messaging [1].

What Is Fudge?

Fudge came out of a desire for a system to encode data which is:
  • Hierarchical, so that you can represent common document idioms (XML or JSON) with it
  • Typesafe, storing the type of the data along with the payload
  • Binary, for compactness on the wire or on disk
  • Self-Describing in its contents, so that you can introspect a message without knowing anything about any schema used for its production
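
Those four properties can be sketched in a toy model: every field carries its own name and type tag, values can be nested messages, and introspection needs no schema. This is illustrative, not the Fudge reference implementation:

```java
import java.util.*;

// Toy self-describing message: every field carries its own name and type tag,
// and a value may itself be a message, giving XML/JSON-style hierarchy.
public class SelfDescribing {
    static class Field {
        final String name; final String type; final Object value;
        Field(String name, String type, Object value) {
            this.name = name; this.type = type; this.value = value;
        }
    }

    static class Msg {
        final List<Field> fields = new ArrayList<>();
        Msg add(String name, String type, Object value) {
            fields.add(new Field(name, type, value));
            return this;
        }
    }

    // Introspection without a schema: walk the message and report every
    // field path with its declared type.
    static List<String> describe(Msg msg, String prefix) {
        List<String> out = new ArrayList<>();
        for (Field f : msg.fields) {
            out.add(prefix + "/" + f.name + ":" + f.type);
            if (f.value instanceof Msg) {
                out.addAll(describe((Msg) f.value, prefix + "/" + f.name));
            }
        }
        return out;
    }
}
```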

In the past, we've looked at a number of different encoding schemes, and rejected them for various reasons:

  • Google Protocol Buffers and Thrift are great if you have access to the schema, and know the type of the messages on a particular communications channel. But because they're not inherently self-describing, you can't build interesting middleware around their flow.
  • XML and JSON are big (if you don't gzip them) and slow (if you do). Plus by being text-based, you have to convert everything to and from text.
  • ASN.1 is conceptually nice in a lot of respects, but it's become such a niche that the tooling is utterly rubbish, and I don't see it getting any better.

Ultimately, for those of you familiar with Tibco Rendezvous, we really liked the TibrvMsg system, but couldn't use it because it's proprietary, and the implementations are surprisingly slow for something designed for low-latency messaging.

What Is The Fudge Project?

The Fudge project consists of the following major components:
  • An Encoding Specification, conceptually similar to ASN.1 BER, DER, or PER, which specifies how to encode data in a linear sequence of bytes.
  • A set of Reference Implementations, which adhere strictly to the encoding specification and provide access to Fudge-encoded data in a number of programming languages.
  • Tools to make it easy to work with data encoded in the Fudge format.

OpenGamma is sponsoring the Fudge project, but in the classical Open Source way: we found something that was missing out there, built it to suit our needs, and have the capacity to help it grow while we use it.

Project Status

I'm not going to kid you: this is a very early stage project. We have Java and C# reference implementations (and the C# one, for example, is more advanced than the Java one in a lot of respects, mostly because it's leveraging relatively new features in the C# language). We have a wiki and Jira installation. We have code up in Github. But that's about it.

Don't let me mislead you: the Java reference implementation, for example, is rock-solid in stability and very good in performance; we use it daily here at OpenGamma. But we don't have polished releases and distributions and everything else as of yet. That will come, but we wanted to get the code out there as soon as we reasonably could.

License

We're releasing the code under the APLv2 because we believe that it's the best type of license for this type of work: it's developer friendly, as well as employer friendly.

The specification is free to use for everyone. If you're going to use the ideas but not the full specification we'd ask you to not call your project Fudge Compliant [2], but anyone can create their own Fudge implementation using whatever license they want.

Next Steps

If you're interested, take a look at the wiki, grab the code, and take a look. I hope you find it useful.

From our perspective, we're going to continue to develop the system and grow it.

Footnotes

[1]: Fast Unstructured Data Generic Encoding
[2]: Fair Disclosure: we are considering getting a trademark on the use of the word Fudge for messaging to allow us to ensure that only compliant implementations can use the name.