Company Background And Overview
This is an Ottawa-based company which, interestingly, draws its people from ex-telco engineers on the hardware side and a bunch of ex-MOM people on the software side (particularly ex-Tibco, which gave them a big OEM win I'll talk about below). That's interesting to me because the two worlds haven't really interacted much as far as I've seen, yet they have a lot in common, and I think each side could learn a lot from the other. I didn't ask about revenue or staff or anything else; I only had time to cover the technology itself.
The company's real focus is low-latency messaging, which means the data path bypasses any general-purpose OS entirely. As you'll see below, messages are delivered without ever hitting a general-purpose OS or CPU, so there's no opportunity to introduce random amounts of latency. The same hardware approach also lets them cover the high-volume and persistent messaging cases.
Base Technology Platform
The base platform is one of two chassis models, each built around a customized motherboard with a number of PCIe slots (5 slots in the 2U 3230 and 10 slots in the 4U 3260). The motherboard has an embedded Linux system (with 8 cores, in fact), but at this point it's only used for management tasks and doesn't get involved in the message delivery path (this wasn't true in the past, which is why there's such a beefy box handling things like SNMP and web-based administration). By itself, the chassis is useless.
Each of the "blades" (which I'm going to call cards, as they're really just PCIe cards) provides a particular function to the box, though they're rather strangely named on the website. Rather than running general-purpose chips, the cards are all FPGA based. Here's what they do.
Network Acceleration Blade
This is the strangest name ever for what this thing does, because it's actually the primary message delivery platform for the box, as well as the only card with NICs that can be used for messaging (4 or 8 1GbE links; the NIC on the chassis is just for management). It consists of:
- The network ports themselves
- A TCP offload engine (TOE) on board
- Hardware to do SSL and GZIP (I'm not clear whether this is ASIC or FPGA based; everything else is FPGA, but you can get off-the-shelf ASICs for these, so I'm not sure what they did here)
- A multi-gigabyte RAM cache to store messages before they're delivered onward
- An FPGA to do the protocol management
Topic Routing Blade
This is another FPGA-based card, which manages clients' subscription lists and does FPGA-accelerated, regex-based subscription matching. The way this works is that the Network Acceleration Blade extracts the topic name from a message and sends just that over the internal bus to the Topic Routing Blade, which responds to the NAB with the list of subscription channels (TCP sockets) to which the message should be forwarded (the message contents never travel over the internal bus in this case). It handles both RV-style wildcarding (.>) and JMS-style wildcarding (.#). This isn't the only option for routing, which I'll get to separately, but assume it's the only useful one for the time being.
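To make the subscription-matching idea concrete, here's a minimal software sketch of the kind of wildcard matching the card presumably does in silicon. Everything here is my own illustration: the exact semantics ('*' for one topic element, '>' or '#' for the rest of the topic) are assumptions based on RV and JMS conventions, not Solace's documented behaviour.

```java
import java.util.regex.Pattern;

// Rough software analogue of wildcard subscription matching. The real card does
// this in an FPGA; the wildcard semantics here are assumed, not Solace's spec.
public final class TopicMatcher {

    // Compile a dotted subscription pattern like "orders.*.us.>" into a regex.
    static Pattern compile(String subscription) {
        StringBuilder regex = new StringBuilder("^");
        String[] elements = subscription.split("\\.");
        for (int i = 0; i < elements.length; i++) {
            String element = elements[i];
            if (i > 0) {
                regex.append("\\.");
            }
            if (element.equals(">") || element.equals("#")) {
                regex.append("[^.]+(?:\\.[^.]+)*");   // one or more trailing elements
                break;
            } else if (element.equals("*")) {
                regex.append("[^.]+");                // exactly one element
            } else {
                regex.append(Pattern.quote(element)); // literal element
            }
        }
        return Pattern.compile(regex.append("$").toString());
    }

    public static void main(String[] args) {
        Pattern sub = compile("orders.*.us.>");
        System.out.println(sub.matcher("orders.equity.us.nyse.ibm").matches()); // true
        System.out.println(sub.matcher("orders.equity.uk.lse.vod").matches());  // false
    }
}
```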
These two cards are the only things that you need in a chassis to support basic transient store-and-forward MOM functionality. For persistence, you need the next one.
Assured Delivery Blade
This is a card which ensures guaranteed delivery of messages, both in the persistent-sending case (don't acknowledge to the publisher until the message can survive broker failure) and the disconnected-client case (if I'm down, store messages until I come back up). It's quite an interesting beast, actually. It has its own RAM buffer and two FC HBAs, and you'd deploy it in an HA pair of chassis with a crossover between the two assured delivery cards. It's probably easiest to describe how it functions, and you'll get the picture of what's in there:
- Message contents get delivered from the NAB to the Assured Delivery Blade over the chassis PCIe bus (yes, in this case the whole message body has to be transferred).
- The ADB sticks the contents into its internal RAM buffer
- The ADB sends the contents over the interconnect to the passive ADB in the other chassis
- The ADB acknowledges the message back to the NAB, which then is able to acknowledge back to the client that the message is persisted (more on why this works in a second)
- The ADB then batches writes (if necessary) to the underlying disk subsystem to drain the RAM buffer in a more optimized form (they've only tested with EMC; he wasn't sure how far up the EMC product line you need to go).
You might notice that the message gets acknowledged while it exists only in RAM on the two chassis (essentially DeliveryMode.NON_PERSISTENT_REPLICATED), but here's where having a pure hardware platform helps. Each card has a big capacitor and a Compact Flash slot, and on failure the capacitor has sufficient charge to flush the entire contents of the RAM buffer to the CF card, guaranteeing persistence. It's a pretty clever model, and pretty indicative of what you can do with a dedicated hardware platform.
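To pin down that ordering, here's a minimal sketch of the acknowledgement sequence as I understand it, in plain Java. The class and interface names are invented for illustration; the real thing is an FPGA card moving data over PCIe and a proprietary interconnect, and I'm ignoring failure handling and the capacitor/CF path entirely.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative only: the ADB's ack ordering as described above
// (RAM buffer -> replicate to passive peer -> ack -> lazy batch to disk).
class AssuredDeliverySketch {

    interface Peer { void replicate(byte[] message); }   // crossover link to the passive ADB
    interface DiskArray { void write(byte[] message); }  // FC-attached storage

    private final BlockingQueue<byte[]> ramBuffer = new LinkedBlockingQueue<>();
    private final Peer passivePeer;
    private final DiskArray disk;

    AssuredDeliverySketch(Peer passivePeer, DiskArray disk) {
        this.passivePeer = passivePeer;
        this.disk = disk;
        Thread flusher = new Thread(this::flushLoop, "adb-flusher");
        flusher.setDaemon(true);
        flusher.start();
    }

    // Called when the NAB hands a persistent message across the PCIe bus.
    boolean persist(byte[] message) throws InterruptedException {
        ramBuffer.put(message);          // 1. stick it in the RAM buffer
        passivePeer.replicate(message);  // 2. copy it to the passive ADB in the other chassis
        return true;                     // 3. ack the NAB now -- before any disk I/O
    }

    // 4. Drain the RAM buffer to disk lazily, which is where batching would happen.
    private void flushLoop() {
        try {
            while (true) {
                disk.write(ramBuffer.take());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```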
Marketing-Driven Blades
This is where things got a bit more sketchy. Note that in the above stack there's no way to do header-based message delivery (JMS Message Selectors). That's a pretty important thing if they're going to try to hit existing JMS customers who will have message selector-based systems. So they worked out a way to do that.
The problem here is that they added it with their Content Routing Blade, which is entirely XML based (the message selector language is XPath-based rather than SQL-92-based). This is where he lost me, and I told him that. While I'm sure this is great from a marketing perspective because it lets them sell into the Service Oriented Architecture Solution space, I think that space is rubbish, and the types of people who buy into it are not the types of people who are going to evaluate bespoke hardware from a minor vendor to increase message throughput. It also doesn't help with porting existing applications, which, as I hit on in my initial AMQP analysis, is one of the most important things you can do to drive early adoption of a new technology.
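To make the porting pain concrete: an existing JMS application filters with a SQL-92 selector over typed header properties, and there's no direct equivalent of that if the box only routes on XPath over XML bodies. A sketch of the contrast (the XPath rule is my guess at the general shape of a content-routing expression, not Solace's actual syntax):

```java
import javax.jms.JMSException;
import javax.jms.MessageConsumer;
import javax.jms.Session;
import javax.jms.Topic;

// Contrast between the selector style existing JMS apps rely on and the
// XPath-over-XML style that a content-routing blade implies.
public class SelectorPortingExample {

    // What existing code does: a SQL-92 selector over typed header properties,
    // which the base blades (NAB + Topic Routing) can't evaluate.
    static MessageConsumer subscribe(Session session) throws JMSException {
        Topic orders = session.createTopic("orders.us");
        return session.createConsumer(orders, "symbol = 'IBM' AND qty > 1000");
    }

    // To get the same filtering out of the Content Routing Blade, the producer
    // would have to serialize those fields into an XML body so that a rule
    // like this (hypothetical syntax, not Solace's) can be applied to it:
    static final String CONTENT_RULE = "/order[symbol='IBM' and qty>1000]";
}
```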
They also have an XSLT-in-Hardware card (the Content Transformation Blade), but I'm so uninterested in hardware XSLT that I didn't bother talking to him about it.
Protocol Considerations
Given my writing about this, I did hit him pretty hard on the fact that right now they only have a proprietary protocol with binary-only drivers (C, C#, Java). The fact that it's a C rather than a C++ client library makes porting much easier, because you avoid the name-mangling issues that are the worst part of intermingling natively compiled code, but there's nothing else you can use to talk to the box. They have a JMS driver you can use, but given that they don't support all of even the topic half of JMS (message selectors, for example), I'm not sure how much utility that has at the moment.
That being said, the fact that they've gone with FPGAs and not ASICs means they can flash the NAB to support more protocols (AMQP was specifically mentioned here). In fact, they've already done this by providing Tibco with bespoke versions of the NAB and Topic Routing Blade that speak the Rendezvous protocol natively, sold under the Tibco Messaging Appliance brand name. In that case (if you're familiar with Tibrv), rather than using multicast or broadcast RV, you make a remote daemon (remote RVD) connection over TCP to the Solace box, which speaks remote RVD natively. Pretty cool, and Hans is pretty sure they're going to support AMQP once it standardizes.
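For the RV case the porting story should be pleasantly small: if you're already on the Rendezvous Java API, pointing at the appliance ought to just mean handing TibrvRvdTransport a remote daemon parameter instead of letting it find a local rvd. A sketch under that assumption (the host and ports are made up, and I haven't run this against an actual Tibco Messaging Appliance):

```java
import com.tibco.tibrv.Tibrv;
import com.tibco.tibrv.TibrvException;
import com.tibco.tibrv.TibrvMsg;
import com.tibco.tibrv.TibrvRvdTransport;
import com.tibco.tibrv.TibrvTransport;

// Sketch of pointing an existing Rendezvous app at the appliance: a remote-daemon
// TCP connection instead of a local rvd and multicast/broadcast on the LAN.
public class RemoteRvdExample {
    public static void main(String[] args) throws TibrvException {
        Tibrv.open(Tibrv.IMPL_NATIVE);

        // The third (daemon) parameter is normally null, meaning "use the local rvd";
        // here it names the appliance instead (host and port are invented).
        TibrvTransport transport =
                new TibrvRvdTransport("7500", null, "tcp:appliance-host:7500");

        TibrvMsg msg = new TibrvMsg();
        msg.setSendSubject("orders.us.nyse");  // publish exactly as you would today
        transport.send(msg);

        Tibrv.close();
    }
}
```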
Analysis
The things I particularly like here:
- Clever use of hardware. The use of FPGAs rather than ASICs is pretty good going, as is the custom motherboard with lots of fast PCIe interconnects. I also like that they've taken a really close look at the persistent messaging case, and leveraged the fact that they're in control of their hardware to ensure that they can optimize far more than a pure software solution could.
- Pure hardware data path. I like the use of external controller cards (the two routing cards) to minimize message flow over the bus, and that there's no general purpose CPU or OS touching the messages as they flow through the system.
- Speed. If his numbers are credible (and I have no reason to think they aren't), they're hitting over 5MM 100-byte messages/second/NAB using topic routing, and 10MM subscription lookups/second/card on the topic routing cards themselves. Persistence is over 100k persistent messages/second. That's pretty good, and the types of numbers I'd expect from a hardware solution.
The things I'm not so keen on:
- The Content Routing/Transformation stuff. Pure marketing claptrap, and utterly silly. If you know you need a low-latency hardware MOM system, are you actually going to take your nice binary messages with a custom protocol and add the Slow of XML just for routing? I can see throwing compressed XML around as your body content, but routing based on it? Taking binary fields and converting them to text just so that you can have XPath-based message selectors? That doesn't seem right to me. I'd be much happier if they gave me a key-(binary)value map-based system, which would map much more naturally onto the AMQP, JMS, and Tibrv models we're currently coding against. As it stands, it's hard to port existing systems, which makes it hard to get adoption.
- Proprietary Protocol. Yes, I hate these with a passion, as they keep biting me in the ass every time I have to use one. You're a hardware company. If you actually think that giving me source-code access to your C client library would expose so much of your mojo that someone could hire up a bunch of FPGA guys and replicate your hardware, think again: someone will just reverse engineer the protocol anyway. Lame decision.
Am I going to rush to buy some? No. I actually don't live in the ultra-low-latency space in my day job, and the technical decisions they've made would make porting my existing applications sufficiently painful that I'm not willing to go down that path for an eval when my current solution works.
Should you? I'm not 100% sold. The core direction of the product looks pretty darn good to be honest. Without seeing their proprietary client library, I can't tell you how easy it would be to port an existing JMS/Tibrv/AMQP application to their programming model for an evaluation. If I knew I was going to be working in super-low-latency space and I didn't mind an extended POC, I'd definitely add it into my evaluation stack.
But if I had an existing Tibrv application that was killing me in production (which most eventually will), I'd definitely take a look at the Tibco Messaging Appliance as fast as I possibly could. Anything that will kill off multicast and broadcast-based RV is ace in my books.