Saturday, November 29, 2008

Sun: Split Up, Yo

Since the whole pick-on-sun meme has really taken off, I figured I'd post some more on the current situation, because El Reg's take led me to believe that the only really positive outcome is complete dismemberment.

Look back at my previous posting. Logically, you've got several distinct companies floating about in that mess of Fail:
  • Legacy SPARC, including Niagara. Sell it to Fujitsu and give up. Although the Niagara stuff may be cool, if you can't get it price/performance competitive with x86 just give up and use it for having a forward migration path for your existing customers.
  • Storage. This would include the StorageTek stuff, Thumper, FishWorks, and the Open Storage stuff. Separate company, which might even be able to stand on its own. Otherwise, find someone like Dell who's sick of OEMing EMC stuff and sell the business to them.
  • Software. Take all the open source stuff you have (including and especially Solaris) and give it to a new company which doesn't have anybody working for it who has ever worked for the hardware business in any form (in fact, it should probably be run by non-Sun people). Allow them to create an alternative to Novell and Red Hat as a multi-product open source company, completely ignoring your Sparc lines of business. I guarantee that this would result in the software changes that the community wants (ZFS with a Linux-friendly license) because it's in the best interest of such a company, but not in the best interest of the Sun hardware business. All those retarded decisions you've made in software seem to be to try to push your legacy hardware; free the company from that, and it'll do some great stuff.
  • Other Stuff. This is the x86 stuff and the HPC stuff (Constellation, blades) and everything else. Leave this as a rump hardware company, allowed and encouraged to compete with the Fujitsu-owned Sparc business. Alternatively, sell it to one of the Asian manufacturers that wants to bootstrap a server business (Asus?). I think it'd do quite well actually.
The simple fact is that if you look at it this way, Sun makes logical sense as 4 different companies, and really they're not getting any synergies out of having them all as one company, so why keep pretending? Split it up completely and maybe the world will be a better place.

If nothing else, we could all get on with our lives and stop blabbing on about it.

Wednesday, November 26, 2008

Programming Problems and Technical Interviews

I've been involved in quite a few technical interviews over the years, and one thing that invariably (and rightfully) comes up is the programming problem. I view this as an opportunity to see if someone can actually code, rather than just talk about it. The real issue here is that the interview room is not the same environment a developer will be working in: you don't have a keyboard in front of you and you don't have your tool suite available, so at best it's a first-order approximation.

That being said, I think any technical interview which doesn't provide the interviewer with a chance to evaluate the candidate's actual coding skill is just wasted time; while we need to think, write, and discuss, software engineers are paid to provide working code. Your job as an interviewer is to determine whether the candidate can do that subject to your standards and requirements. Your job as a candidate is to convince the interviewer of that fact.

So what are the basic ways in which I've seen this hurdle laid out?

Read Me The Code
This is the most obscure one, but it was part of my phone screen for a job at Google back in 2002 (disclosure: I was offered the job, and like an idiot, turned it down). Essentially, I was given a programming problem over the phone, given about 10 minutes while I was on the phone to write down a solution, and then had to read it out to the developer on the other end.

I didn't really get this as a concept, and I still don't. Is he mentally picturing the code as I write it? Writing it down himself to make sure that I'm getting the semicolons and braces correct? Why even bother? I'm not a fan of this one because I don't think it actually tells you anything about the candidate other than whether he can get his hand around describing syntactic elements of programming concisely over the phone.

I've chalked it up to an obscure Google-ism, or an interviewer experimenting with a new interviewing style (we've all done it).

Pencil-and-Paper Problem
This is the most common form, and in it the candidate and the interviewer get together in a room and the candidate writes some code longhand, whether on paper or a whiteboard or anything else. I think that this is a relatively valuable exercise for simple programming problems, because if you can't write out something that works elegantly for a simple problem, then you probably don't have an excellent grasp of your underlying language.

But the key thing here is that you have to engage in a dialog with the candidate about their solution, particularly if you think that there are tools or techniques or language features that they didn't employ. I've done this several times when someone had a strange solution only to find out that they didn't remember the exact syntax for using a language/library feature, and so they wanted to make sure that they did something that they knew to be correct. As a candidate, because you don't know what the interviewer is looking for (and how much they'll mark off for missing a brace or semicolon), you are at a loss to know what precisely to optimize for.

Furthermore, you have to be clear about what you're after here. Some problems are more conceptually difficult, and ideally the interviewer should be looking for your thought processes, and whether you can come up with a coded solution to the problem (damn the curly braces). Other problems are far more simple conceptually, and the interviewer should be seeing if you can write up a simple routine that is production quality longhand. Where things go awry is when interviewers want you to simultaneously come up with a solution to a really complex problem, and have every single syntactical element and edge case perfect in the first go. Gotcha-problems fit into this area as well (As a candidate, I get it; you've solved this problem in the past; you're ever so clever, and I should be thrilled to work with someone so brilliant as you; did this really teach you whether I can work with you effectively?).

The main problem with both of these approaches, though, is that they're not realistic. You don't write code by hand anymore. You definitely don't read it over the phone. These exams test (to some extent) your raw programming ability and familiarity with the syntax and libraries of your programming languages, but they don't show how well you can actually engineer software. For that, you need one of the following methodologies.

In Advance Programming Problem
I employed this as a low-pass filter while interviewing candidates at a previous employer, and it worked out pretty well. Rather than doing a phone screen at all, we had a relatively simple programming problem (in our case, XML walking) that we wanted a compilable, runnable solution to. Shouldn't have taken anybody competent more than an hour to do, and we wanted to see the style and nature of their code. What tools did they use? What language features? How did they document? How did they break up the problem? How did they package up the result?

This served two purposes: firstly, it weeded out the chaff that didn't really want to meet with us at all (if you think you're doing us a favor by even talking to us, you're not going to want to program in advance); secondly, it showed us a real-world example of what you can do with your chosen tool suite. Because it was so small, there were no excuses for providing something that didn't compile, didn't run, didn't produce correct output, or was ugly (all of which we got). Real software engineering involves spending time on niceties that aren't about the basic code execution path. Did you care enough to provide us with something that shows that you can do those?

Pair Programming Problem
Another financial services firm used this with me, and it was the first time I'd seen it. Essentially, I came over to the interviewer's computer, was asked which IDE I wanted to use (this was for a Java job, so I had my choice of Eclipse or IDEA), and he opened it up empty (new workspace for you Eclipse people). He then gave me the programming problem, and watched and talked to me as I worked through the entire end-to-end process in front of him.

I really liked this at the time. It was far more realistic than pen-and-paper could have possibly been, and also showed him how optimized I had gotten my workflow with my tool suite. It also allowed us to explore far more serious software engineering concepts like unit testing (I wrote the tests as I went along) and test coverage (making sure that my tests covered all inputs and results). I think this is actually an extremely strong way to check how well a programmer can work in a real-world scenario, and is far better than pen-and-paper development for languages like C#, Python, and Java, which have far simpler project setup times (though I think you could do it for a simple C/C++ test as well). That's the key thing here: you're watching a developer work with a real-world tool suite.

Programming Death Race
This was the most extreme form of any of these that I've ever had to do, and was extremely intense. I was given a Java file which contained four empty method bodies with documentation about what they were supposed to do, and told which ones I had to fill in. I then had 60 minutes to do two programming problems, returning the resulting Java source file, which they had an automated system to compile and run through a test suite.

These were not simple, reverse-a-linked-list problems. Each of them could easily have taken far more than an hour to do. And they were conceptually difficult as well (I'm not going to disclose the precise problems because I don't want to ruin their methodology), meaning that you have to actually think. A lot. There's not a lot of time to properly think and code a really complex solution when you know you're under a deadline!

While I completed the test successfully and got past that stage of interviews, I couldn't help thinking that it was all a bit much. I ended up providing a solution to one of the problems that was fully functional, but didn't actually look as elegant as I would have liked, because of the serious time pressure. Perhaps part of their goal was to see how fast you can get your head around a complex mathematical problem, but I felt like what it was rewarding was how fast you can churn out a solution, regardless of the underlying efficiency or elegance.

The firm was quite open that they would far rather have a hundred false negatives than a single false positive, which is why they intentionally made it so difficult, but something to me still said that proving someone can function at that level of intensity isn't necessarily the same as proving that they can function at a consistent marathon pace. Then again, in financial services, if you can't function that quickly when things are going wrong and you're losing money every minute, it's probably a bonus for the employer to know that.

My Recommendation
I think I like the In Advance Programming Problem as a general concept (low-pass filter, and weeding out people who aren't really interested in the job), but I'd probably mix it with some of the stuff from the Programming Death Race: fixed time limits and automated testing. I just wouldn't make it as conceptually tough. That's better spent in person working through someone's thought processes.
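To make that mix concrete, here's a rough sketch of what it might look like: a skeleton of documented, empty functions that goes out to the candidate, plus a grader that runs whatever comes back through a fixed test suite. The two problems, the function names and the test cases are all made up by me for illustration (deliberately simple ones, per the above), and the time limit would be enforced by the submission deadline rather than by the code.

```python
# --- skeleton handed to the candidate (hypothetical problems) ---
def reverse_words(sentence: str) -> str:
    """Return the words of `sentence` in reverse order, single-spaced."""
    raise NotImplementedError


def running_total(values: list) -> list:
    """Return the cumulative sums of `values`."""
    raise NotImplementedError


# --- automated grader run against the returned file ---
# Each case is (positional arguments, expected result).
TEST_SUITE = {
    "reverse_words": [(("a b c",), "c b a"), (("",), "")],
    "running_total": [(([1, 2, 3],), [1, 3, 6]), (([],), [])],
}


def grade(submission: dict) -> dict:
    """Run every test case; return {function name: (passed, total)}."""
    results = {}
    for name, cases in TEST_SUITE.items():
        func = submission[name]
        passed = 0
        for args, expected in cases:
            try:
                if func(*args) == expected:
                    passed += 1
            except Exception:
                pass  # a crash (or an unfilled stub) counts as a failed case
        results[name] = (passed, len(cases))
    return results
```

A pass/fail count like this only tells you whether the code works; the style, documentation and packaging questions still need a human to read the submission.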

But for the in-person work, I would definitely choose the Pair Programming Problem over the Pencil-and-Paper one any day. It serves the same purpose, but it also ensures that you're seeing how someone actually works. We don't code on paper anymore, so we shouldn't assume that coding on paper tells us whether someone's a competent programmer.

If you want to test someone's thought processes, run through a conceptual problem. That's an extremely useful thing. But if you're going to ask for code, make sure that you either specify that you're looking for pseudocode, or that you don't care about the details and just want to see the rough order code. Otherwise you're at risk of taking a very valuable test (can the candidate think about problems in a reasonable way) and turning it into a trivia test.

And if you're not doing a real-world programming test, you should. Too many candidates have gotten good at the depth-of-a-tree or nth-Fibonacci-number level of problem without having any real concept of proper software construction as a discipline. Does your interviewing methodology go beyond that and try to determine whether the candidate is someone you'd want to share code with? If not, figure out how you would. Pencil-and-paper coding doesn't.

TBray Response: Sun Should Stop Sucking

(Talking about Tim Bray's opinion on what Sun should do).

As someone who's used a lot of Sun's products, here's my response as to what you can do, but more importantly, what you probably actually will do.

Price Your Hardware Less
Let's look at Niagara vs. x86:
  • T5140 base (dual 4-core T2+ processors, 8GB of RAM, some disks) is $15k.
  • X4140 (dual 4-core Opteron processors, 8GB of RAM, lots more disks) is $5.6k.
  • X4150 (dual 4-core Xeon processors, 8GB of RAM, same disks as the X4140) is $7.3k.
Lemme get this straight, Tim: you think that the web application deployment crowd are willing to spend about 2-3 times the price for your magical CMT platform? Really? Have you met your typical hosting company? Or have you been spending so long at Sun you don't know what people actually care about?

Here's the thing: you actually make good hardware. The X4140? Great server. Your Niagara processors? Probably pretty good (never had a chance to play with one yet). Constellation? Great IB switch. The multithreaded 10GbE NICs? Pretty good hardware if you have an app that is multi-socket based. But people aren't going to run web applications on something that's more than twice the price; web applications are all about horizontal scaleout. Unless your hardware is 2x the performance for 2x the price, you're going to fail.
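For the record, here's the arithmetic behind that "more than twice the price" claim, using only the three list prices quoted above. The break-even reading (the ratio is also the throughput multiple the T5140 would need just to match the x86 box on price/performance) is my framing, and it ignores power, rack space and licensing.

```python
# List prices quoted above, in thousands of US dollars.
prices_k = {
    "T5140 (Niagara T2+)": 15.0,
    "X4140 (Opteron)": 5.6,
    "X4150 (Xeon)": 7.3,
}


def price_ratio(candidate: str, baseline: str) -> float:
    """How many times more expensive `candidate` is than `baseline`."""
    return prices_k[candidate] / prices_k[baseline]


if __name__ == "__main__":
    for baseline in ("X4140 (Opteron)", "X4150 (Xeon)"):
        ratio = price_ratio("T5140 (Niagara T2+)", baseline)
        # To break even on price/performance, the T5140 would have to
        # deliver `ratio` times the throughput of the baseline server.
        print(f"T5140 vs {baseline}: {ratio:.2f}x the price")
```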

Quit Confusing Your Branding
We get it. You invented Java. Good on you. I like Java. That's not an excuse for:
  • Changing your stock ticker to JAVA.
  • Calling every single thing you can "Java", even when it doesn't involve Java at all (Sun Java System Messaging Server, a product which contains precisely 0% Java)
  • Grouping completely unrelated products under completely confusing banners (Sun Java Communications Suite, Sun Java System Messaging Server, Sun Java System Application Platform; these are all on your web site as of right now).
When you come up with your next branding exercise, please stop rebranding every single thing you make into one big bag of branding Fail.

Solaris Will Never Beat Linux
Like it or not, Solaris will never beat Linux at this point. You had lots of opportunities to make this not be the case, and you failed. Mostly because of your own stupid decisions, but the simple fact is that at this point, Solaris will never beat Linux for anything other than specialized systems. These are:
  • Applications that require it because they were written once 10 years ago and can never change. Milk these guys for as much as you possibly can; it's the Computer Associates business plan and they seem to do okay out of it.
  • Storage appliances (CIFS integration and ZFS are good and better than the equivalents in Linux-land).
No matter what Tim says, Solaris will never defeat Linux in the general web application deployment space, and there is absolutely nothing you can ever do to change this. Give up. Give up now. You're way too far behind, you don't get those developers, and you'll never be able to catch up with the state of the world.

The thing is that one of the points that Tim raises, Solaris having such a stable ABI, actually causes them problems in general worldview and software engineering, because it means that they can never actually change anything to make it better. But more than that, it indicates that their core focus is really about all the legacy applications which are tied to their platform, and not about driving new customers to the platform.

What Sun Could Do
Divide yourself logically into the following divisions:
  • Legacy Systems. Sparc IV, Solaris, all the old software packages nobody uses, existing StorageTek hardware. Your job is to keep these customers from spending the effort to migrate to something cheaper; no more, no less.
  • Modern Hardware. Your x86 hardware, IB hardware, networking chipsets, Niagara. Your job is to do advanced development and be technologically advanced, but at least marginally cost competitive.
  • Open Storage. OpenSolaris, the new Open Storage hardware. Your job here is to provide a new path off all the storage dead ends that you've gone down, and try to eviscerate the big storage vendors who are insanely overpriced at this point.
  • Goodwill Software. All the stuff you're never really going to make proper money off of, and probably shouldn't have gotten involved in in the first place. MySQL, Java, Glassfish, NetBeans, StarOffice. Your job here is to try to stem the losses that all of these systems are costing you, and keep their marketing teams from ruining the rest of your branding on profitable products.
Note that there are two growth markets in there (Modern Hardware and Open Storage), and the rest is all irrelevant tangents and legacy. The growth markets are where your future lies, and keeping the others around gives you the chance to migrate existing customers to the new platform, keeping your vision of a one-stop-shop IBM killer intact. But you have to be completely honest with yourselves: the existing stuff is legacy and will never go anywhere, and you need to pile resources into the growth areas without confusing your branding or customers.

What Sun Will Do
Here are my predictions:
  • Sun will continue to price all their proprietary hardware so far above the cost of generic hardware that only people under serious lock-in to their platform even think about buying it, never allowing them to achieve any kind of economy of scale.
  • Sun will continue to give software products stupid, confusing names. I predict the Sun Java System Enterprise Database Suite being the new name for MySQL.
  • Sun will continue to try to drive Solaris to everything through a neverending sequence of initiatives, confusing anybody even considering deploying it, so that you only ever hit the legacy market and Solaris die-hards.
  • Sun will continue to invest in stuff that will never ever drive any meaningful revenue to them, but sap massive amounts of engineering resources. To try to justify this to their shareholders, they will come up with confusing branding and marketing initiatives to try to tie everything together.
In short, Sun, I have no fear that you will find some way to drag failure from the claws of oh-so-close. Just like you have for years.

Tuesday, November 25, 2008

Utility Computing in Finance: Regulatory Arbitrage

Financial Services companies (particularly those doing derivatives work) are quite big consumers of CPU power for all the computations that are necessary to support the business (and as some derivatives contracts amount to running millions of Monte Carlo simulations to capture low probability events, they can be pretty expensive to compute). For that reason, financial firms spend a lot of time working out how to minimize their computational costs and eke out every cycle that they can.
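To give a feel for why low probability events eat so many cycles, here's a toy sketch that has nothing to do with any real pricing model: it estimates the chance of a standard normal draw landing below -3 (roughly a 1-in-740 event) by brute-force sampling. The function names, the threshold and the seed are all my own choices. The thing to watch is the standard error, which only shrinks with the square root of the number of paths.

```python
import math
import random


def estimate_tail_probability(threshold: float, n_paths: int, seed: int = 42):
    """Monte Carlo estimate of P(Z < threshold) for a standard normal Z.

    Returns (estimate, standard_error).
    """
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_paths) if rng.gauss(0.0, 1.0) < threshold)
    p = hits / n_paths
    # Binomial standard error: shrinks like 1 / sqrt(n_paths).
    std_err = math.sqrt(p * (1.0 - p) / n_paths)
    return p, std_err


def exact_tail_probability(threshold: float) -> float:
    """Closed-form normal CDF, used only to sanity-check the simulation."""
    return 0.5 * (1.0 + math.erf(threshold / math.sqrt(2.0)))


if __name__ == "__main__":
    print(f"exact: {exact_tail_probability(-3.0):.6f}")
    for n in (1_000, 10_000, 100_000):
        p, err = estimate_tail_probability(-3.0, n)
        print(f"{n:>7} paths: estimate {p:.6f} +/- {err:.6f}")
```

With a thousand paths you expect to see the event only once or twice, so the estimate is mostly noise; a real derivatives book repeats this kind of thing per trade, per scenario, every night.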

I had a conversation with my friend Nolan who knows quite a bit about the utility computing space, and thought it would be useful to the general public. I'm defining utility computing as renting out CPU time on other people's hardware (such as EC2). I'm not talking about in-house grid computing here, as you'll see when you look at some of the points below.

Broadly speaking, financial services computation boils down into the following areas:
  • Latency Sensitive. These are computations that you need to do as quickly as possible, and are normally internally parallelized where possible. But within this, you have two more distinctions:
    • Market Latency Sensitive. These systems are usually automatically trading based on algorithms, and need to be able to execute trades as fast as possible with no human interaction. These guys are the ones buying ultra-low-latency MOM systems and networking switches.
    • Human Latency Sensitive. These systems are presenting data to a human user who responds in some way, and you broadly have to keep up with the market and a human's reaction time (hint: updating a screen any faster than once per 100 milliseconds ignores basic human reaction time).
  • Latency Insensitive. These are the computations that you have to run periodically, usually once a day as part of your P&L and risk cycles. They run overnight, and your basic job is to do all your computations within your computation window using as little expensive hardware as you can.
Of these, Market Latency Sensitive systems aren't a candidate for the current generation of utility computing: they operate within tolerances that current utility computing platforms can't possibly cope with. These are the guys who rent space at the same data center that their dominant exchange is located in to try to save every last millisecond (and are the reason why Lehman sold to Barclays for less than the value of its Manhattan data center). They are definitely candidates for a new generation of latency-sensitive utility computing (where the cloud guarantees that data will be delivered to your software within N microseconds of it arriving from the exchange), but until general purpose utility computing is prepared for financial services it isn't a starter because they would still face every issue below.

Human Latency Sensitive may be candidates for current approaches (once you're considering humans, you can start to think about acceptable delays for computations, and farm some things out to a utility computing cloud). Latency Insensitive definitely are (you just need the cycles once a night; as long as you finish within your window, you really don't care about anything else). Yet, financial services firms seldom use utility computing facilities of any kind.

That's actually a lie. They use them for everything that they can. The problem is that the regulators don't like them, so those uses are extremely limited.

As near as I can tell, here are the things that the regulators don't like about them:
  • The machine isn't under your control. This means that, in theory, the utility company could be doing all types of nasty things with your execution flow and you wouldn't necessarily know anything about it.
  • The data path isn't under your control. This means that you're sending (potentially) sensitive data outside your network, which someone might intercept and do something with (modify it, sell it to your competitors).
  • You have no availability guarantees. If you own your own machines, you know if you have the CPU cycles to deal with all your available trades. Utility computing companies may lie to you and you wouldn't be able to deal with critical time periods.
These are not insurmountable! Rather, it requires a discipline and auditing routine similar to what financial services firms work with all the time. While it may be good enough for your Web 2.0 startup to just assume that Amazon isn't up to anything nefarious, it isn't good enough for the regulators.

The solution here I think is two-fold:
  1. Regulators provide standards and guidelines. This would include the rules that they would require utility computing providers to adhere to, and facilities for independent audits of those procedures, precisely as financial services firms already have to for internal operation. I can think of a number of potential rules, but I'm not a regulator. They should come up with what they would be happy with, and they should be as equivalent as they can be between the four major financial services markets: London, USA (specifically New York and Chicago), Hong Kong and Tokyo.
  2. Companies build utility computing platforms to those standards. I don't expect Amazon to start providing auditable logs of everything they do to EC2. I don't expect Google to adhere to external audits of their processes and procedures and security standards every 6 months. But those are value-add services that I'm sure somebody would be willing to do, because they could charge much more than Amazon or Google do. And those would be the firms that financial services firms could use for their utility computing needs.
Financial services firms are used to working within constraints. We're used to paying for products and services that can satisfy these constraints. But in this case, I think the regulators need to step up and provide guidance about what they would expect in this space so that the market can progress.

Thursday, November 20, 2008

Meeting With Solace Systems: Hardware MOM

After my post on wanting an AMQP appliance, Hans Jespersen, Principal Systems Engineer with Solace Systems, got in touch with me. We met up yesterday when we were both in San Francisco, and I had a chance to talk tech with him about what precisely they do (as opposed to the marketing speak on their website). Hans is a pretty smart guy (ex-Tibco on the Rendezvous side rather than the EMS side), and seemed really technical and up on some of the other fringe technical topics we talked about.

Company Background And Overview
This is an Ottawa-based company, which is interestingly drawing from a group of ex-telco engineers for the hardware side and a bunch of ex-MOM people (particularly Tibco, which gave them a big OEM win I'll talk about below). That's interesting to me because the two worlds haven't really interacted that much that I've seen, but there's a lot in common and a lot that I think both sides could probably learn from each other. I didn't ask about revenue or staff or anything else; I just had time to cover the technology itself.

The real goal of the company is to focus on low-latency messaging, which means that they do everything with a no-general-purpose-OS passthrough. As you'll see below, messages are delivered without ever hitting a general purpose OS or CPU, meaning that there's no chance to introduce random amounts of latency. But the same hardware approach allows them to also hit the high-volume and persistent messaging cases.

Base Technology Platform
The base platform is a pair of chassis, each running a customized motherboard, which has a number of PCIe slots in it (5 slots in the 2u 3230 and 10 slots in the 4u 3260). The motherboard itself has an embedded Linux system (with 8 cores actually), but at this point is only being used for management tasks and doesn't get involved in the message delivery path (though this wasn't true in the past, which is why there's such a beefy box to handle things like SNMP and web-based administration). By itself, the chassis is useless.

Each of the "blades" (which I'm going to call cards, as they're really just PCIe cards) provides a particular function to the box, but they're very strangely named on their website. These guys, rather than running general purpose chips, are all FPGA based. Here's what they do.

Network Acceleration Blade
This is the strangest name ever for what this thing does, because it's actually the primary message delivery platform for the box, as well as the only card that has any NICs (4 or 8 1GbE links) that can be used for messaging (the NIC on the chassis is just for management). This consists of:
  • The network ports themselves
  • A TCP offload engine (TOE) on board
  • Hardware (I'm not clear whether this is ASIC or FPGA based; everything else is FPGA, but these you can get off-the-shelf ASICs for, so I'm not sure what they did here) to do SSL and GZIP
  • A multi-gigabyte RAM cache to store messages before they're delivered on
  • An FPGA to do the protocol management
Interestingly, this is useless in and of itself, because the network card specializes in receiving messages, unpacking portions of the message to send over the bus to the other cards, and delivering them out. To actually route messages, you'll need the next one.

Topic Routing Blade
This is another FPGA-based card, which manages subscription lists of clients and has FPGA-accelerated regex-based subscription management. The way this works is that the Network Acceleration Blade extracts the topic name from the message and sends that over the internal bus to the Topic Routing Blade, which then responds to the NAB with the list of subscription channels (TCP sockets) to which the message should be forwarded (the message contents never travel over the internal bus in this case). It handles both RV-style wildcarding (.>) and JMS-style wildcarding (.#).
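In software terms, the lookup the routing card answers is roughly the following. This is a purely illustrative sketch and not how the FPGA does it: the class and method names are mine, the linear scan over patterns is the naive approach, and I've only modelled RV-style wildcards as I understand them (`*` matches exactly one dot-separated element, a trailing `>` matches one or more remaining elements).

```python
from collections import defaultdict


def subject_matches(pattern: str, subject: str) -> bool:
    """RV-style match: '*' is one element, trailing '>' is one or more."""
    p_parts = pattern.split(".")
    s_parts = subject.split(".")
    for i, part in enumerate(p_parts):
        if part == ">":
            # '>' must be the last pattern element and must consume
            # at least one remaining subject element.
            return i == len(p_parts) - 1 and len(s_parts) > i
        if i >= len(s_parts):
            return False
        if part != "*" and part != s_parts[i]:
            return False
    return len(p_parts) == len(s_parts)


class TopicRouter:
    """Maps subscription patterns to subscriber channels (e.g. sockets)."""

    def __init__(self):
        self._subscriptions = defaultdict(set)  # pattern -> channel ids

    def subscribe(self, pattern: str, channel: str) -> None:
        self._subscriptions[pattern].add(channel)

    def route(self, topic: str) -> list:
        """Return the channels to forward to; only the topic is needed."""
        channels = set()
        for pattern, subscribers in self._subscriptions.items():
            if subject_matches(pattern, topic):
                channels |= subscribers
        return sorted(channels)
```

Note that `route` never sees the message body, which mirrors the point above: only the topic name has to cross the bus.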

This isn't the only option for routing, which I'll get to separately, but assume that it's the only useful one for the time being.

These two cards are the only things that you need in a chassis to support basic transient store-and-forward MOM functionality. For persistence, you need the next one.

Assured Delivery Blade
This is a card which acts to ensure guaranteed delivery of messages, both in the persistent-sending case (don't acknowledge to publisher until the message can survive broker failure) and the disconnected-client case (if I'm down, store messages until I get back up). It's quite an interesting beast actually. This one has its own buffer of RAM, and two FC HBAs, and you would deploy this in a HA pair of chassis with a crossover between the two assured delivery cards. It's probably easiest to describe how it functions and you'll get the picture of what's in there:
  • Message contents get delivered from the NAB to the Assured Delivery Blade over the chassis PCIe bus (yes, in this case the whole message body has to be transferred).
  • The ADB sticks the contents into its internal RAM buffer
  • The ADB sends the contents over the interconnect to the passive ADB in the other chassis
  • The ADB acknowledges the message back to the NAB, which then is able to acknowledge back to the client that the message is persisted (more on why this works in a second)
  • The ADB then batches writes (if necessary) to the underlying disk subsystem (they've only tested with EMC, he wasn't sure how high-end you had to go up the EMC product line) to clear the RAM buffer in a more optimized form.
I was a little sceptical about acknowledging the message just after the passive node acknowledges it (this in SonicMQ terms is DeliveryMode.NON_PERSISTENT_REPLICATED), but this is where having a pure hardware platform helps. Each card has a big capacitor and a Compact Flash slot, and on failure, the capacitor has sufficient charge to flush the entire contents of the RAM buffer to the CF card, guaranteeing persistence.
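The ordering is the whole trick, so here's a toy model of it as I understood it from the conversation. Everything in it is a simplification of mine: the class, the method names and the plain lists standing in for RAM, disk and Compact Flash are hypothetical, and trimming the standby's copy after a disk flush is left out.

```python
class AssuredDeliveryCard:
    """Toy model of one Assured Delivery Blade (ADB)."""

    def __init__(self, name: str):
        self.name = name
        self.ram_buffer = []      # messages not yet written to disk
        self.compact_flash = []   # emergency dump target on power loss
        self.peer = None          # the ADB in the other chassis

    def replicate(self, message) -> None:
        """Called on the passive card: keep a copy in RAM."""
        self.ram_buffer.append(message)

    def accept(self, message) -> bool:
        """Take a message from the NAB; the return value is the ack."""
        self.ram_buffer.append(message)       # into local RAM
        if self.peer is not None:
            self.peer.replicate(message)      # copy to the passive card
        return True                           # ack BEFORE any disk write

    def flush_to_disk(self, disk: list) -> None:
        """Later, batch everything buffered out to the disk array."""
        disk.extend(self.ram_buffer)
        self.ram_buffer.clear()

    def power_failure(self) -> None:
        """Capacitor-backed dump of the RAM buffer to Compact Flash."""
        self.compact_flash.extend(self.ram_buffer)
        self.ram_buffer.clear()


if __name__ == "__main__":
    active = AssuredDeliveryCard("chassis-a")
    standby = AssuredDeliveryCard("chassis-b")
    active.peer = standby
    disk = []
    print([active.accept(m) for m in ("m1", "m2")], disk)  # acked, disk untouched
    active.power_failure()
    print(active.compact_flash, standby.ram_buffer)
```

At the moment of the ack the message lives in two RAM buffers and nowhere on disk, which is exactly why the capacitor and CF card matter.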

It's a pretty clever model, and pretty indicative of what you can do with a dedicated hardware platform.

Marketing-Driven Blades
This is where things got a bit more sketchy. Note that in the above stack there's no way to do header-based message delivery (JMS Message Selectors). That's a pretty important thing if they're going to try to hit existing JMS customers who will have message selector-based systems. So they worked out a way to do that.

The problem here is that they added it with their Content Routing Blade, which is entirely XML based (the message selector language is XPath-based, rather than SQL-92-based). This is where he lost me and I told him that. While I'm sure this is great from a marketing perspective because it lets them sell into the Service Oriented Architecture Solution space, I think that space is rubbish, and the types of people who buy into it are not the types of people who are going to evaluate bespoke hardware from a minor vendor to increase message throughput. It also doesn't help with porting existing applications, which, as I hit on in my initial AMQP analysis, is one of the most important things you can do to try to drive early adoption of new technology.
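To show why this hurts porting, here's a small sketch contrasting the two styles of filter. The header names, the XML layout and the XPath expression are all invented for the example, I don't know what XPath subset the Content Routing Blade actually supports, and the first function is just a stand-in for a JMS selector such as symbol = 'IBM' AND side = 'BUY' rather than a real SQL-92 parser.

```python
import xml.etree.ElementTree as ET


def header_selector(headers: dict) -> bool:
    """Stand-in for the JMS selector:  symbol = 'IBM' AND side = 'BUY'."""
    return headers.get("symbol") == "IBM" and headers.get("side") == "BUY"


# The same intent expressed against an XML payload instead of headers.
XPATH_SELECTOR = "./order[@symbol='IBM'][@side='BUY']"


def content_selector(xml_body: str) -> bool:
    """True if the XML message body contains a matching <order> element."""
    root = ET.fromstring(xml_body)
    return root.find(XPATH_SELECTOR) is not None


if __name__ == "__main__":
    # Existing JMS application: routing data lives in header properties,
    # and the body can be any format at all.
    print(header_selector({"symbol": "IBM", "side": "BUY"}))

    # Content-routed application: the body itself must be XML carrying
    # the same data, so every publisher has to change its payload.
    body = '<message><order symbol="IBM" side="BUY" quantity="500"/></message>'
    print(content_selector(body))
```

The filters express the same intent, but an application built on the first can't move to the second without rewriting what its publishers send.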

They also have an XSLT-in-Hardware card (the Content Transformation Blade), but I'm so uninterested in hardware XSLT that I didn't bother talking to him about it.

Protocol Considerations
Given my writing about this, I did hit him pretty hard on the fact that right now they only have a proprietary protocol with binary-only drivers (C, C#, Java). The fact that it's a C and not C++ client library makes it much easier to port because you get to avoid the name mangling issues that are the worst in intermingling native compiled code, but there's nothing else you can use to talk to the box. They have a JMS driver you can use, but given that they don't support all of even the topic-half of JMS (like message selectors), I'm not sure of how much utility that is at the moment.

That being said, the fact that they've rolled with FPGAs and not ASICs means that they can flash the NAB to support more protocols (AMQP was specifically mentioned here). In fact, they've already done this by providing bespoke versions of the NAB and Topic Routing Blade to support the Rendezvous protocol natively under the Tibco Messaging Appliance brand name to Tibco. In that case, (if you're familiar with Tibrv), rather than using multicast or broadcast RV, you make a remote daemon (remote RVD) connection over TCP to the Solace box, which speaks remote RVD natively. Pretty cool, and Hans is pretty sure they're going to support AMQP once it standardizes.

The things I particularly like here:
  • Clever use of hardware. The use of FPGAs rather than ASICs is pretty good going, as is the custom motherboard with lots of fast PCIe interconnects. I also like that they've taken a really close look at the persistent messaging case, and leveraged the fact that they're in control of their hardware to ensure that they can optimize far more than a pure software solution could.
  • Pure hardware data path. I like the use of external controller cards (the two routing cards) to minimize message flow over the bus, and that there's no general purpose CPU or OS touching the messages as they flow through the system.
  • Speed. If his numbers are credible (and I have no reason to think they wouldn't be), they're hitting over 5MM 100b messages/second/NAB using topic routing, and 10MM subscription processes/second/card on the topic routing cards themselves. Persistence is over 100k persistent messages/second. That's pretty good and the types of numbers I'd expect from a hardware solution.
Things I don't like:
  • The Content Routing/Transformation stuff. Pure marketing claptrap, and utterly silly. If you know you need a low-latency hardware MOM system, are you actually going to take your nice binary messages with a custom protocol and add the Slow of XML for routing? I can see throwing compressed XML around as your body content, but routing based on it? Taking binary fields and converting them to text just so that you can have XPath-based message selectors? That doesn't seem right to me. I'd be much happier if they gave a key-(binary) value map-based system, which would map much more naturally onto AMQP, JMS, and Tibrv models that we're currently coding against. That makes it hard to port existing systems, which makes it hard to get adoption.
  • Proprietary Protocol. Yes, I hate these with a passion as they keep biting me in the ass every time I have to use one. You're a hardware company. If you actually think that giving me source-code access to your C client library is going to expose so much of your mojo that someone can hire up a bunch of FPGA guys and replicate your hardware that quickly, then someone will just reverse engineer it anyway. Lame decision.
Am I going to rush to buy some? No. I actually don't live in the ultra-low-latency space in my day job, and the technical decisions they've made would make porting my existing applications sufficiently painful that I'm not willing to go down that path for an eval when my current solution works.

Should you? I'm not 100% sold. The core direction of the product looks pretty darn good to be honest. Without seeing their proprietary client library, I can't tell you how easy it would be to port an existing JMS/Tibrv/AMQP application to their programming model for an evaluation. If I knew I was going to be working in super-low-latency space and I didn't mind an extended POC, I'd definitely add it into my evaluation stack.

But if I had an existing Tibrv application that was killing me in production (which most eventually will), I'd definitely take a look at the Tibco Messaging Appliance as fast as I possibly could. Anything that will kill off multicast and broadcast-based RV is ace in my books.

Wednesday, November 19, 2008

Some Interactions Are Only Indirectly Profitable

I met with Ari Zilka from Terracotta Technologies yesterday in their offices in San Francisco for a follow-up meeting from a series of meetings I had had with Ari and our sales representative over the course of the past year. My interaction with Terracotta has largely been one of "This technology really is a game-changer; I just have to find my personal game that it changes," and I've been working with Ari ever since on the first entrance path for Terracotta into my company's infrastructure.

Ari's a busy guy. He's a founder and CTO of a software startup, and having been there, I know how difficult it is for him to devote time to anyone. And I've now had three meetings with him over the course of about 9 months. And my employer has not given them a single dollar as of yet, and there's no guarantee that even if we end up rolling into production on top of Terracotta, that my employer will stump up cash to them for an open source technology (FTR, this is the company that inspired the original Open Source Cookies post). This is probably a frustrating situation for the sales guy, because sales guys have numbers and targets and need to make money, and the sales guy needs to dole out his fraction of Ari's time in the way that's going to maximize his commission. I'm clearly not that as of yet. And yet there I was for the third time.


What's the rationale for his wasting yet more time talking to me?

Having discussed some similar issues with Laura Khalil from Atlassian yesterday when I met with her earlier in the day (more on that meeting anon), I think things started to gel in a more concrete way, because they face these problems as well from a non-Open Source perspective.

The rationale here is that some interactions are only indirectly profitable, but the indirect benefits potentially vastly outweigh the direct ones. So you pursue them anyway if you're an open company; a closed company won't.

Traditional Sales Model
Consider the traditional Enterprise Software (deal supporting direct sales force, or > $100k licensing) sales model:
  1. Company sends out feelers to vendors
  2. Vendors send representatives
  3. Company goes ahead with due diligence/POC with one or more vendors
  4. Company starts negotiations
  5. Deal is signed
  6. All vendors move on
In this model, you have a discrete lifecycle of a particular sale, and most players are only around for the sale; after the cash changes hands, you're into support land, which is at least partially an insurance business. The company projects to the vendors its rough budget, the vendors determine how to maximize their portion of the budget and whether the deal is even worth pursuing. Knowledge is largely gained by the customers interacting directly with the vendors, or maybe checking with some research firms.

Note here that the vendors themselves pull out of the conversation the moment they realize that they don't want the particular deal: it's too small, they're not the right solution, whatever. Then they go back to their closed external appearance, and go back into information embargo. If they can't make a sale, why waste anybody's time on the interaction?

I think any company that thinks this way and is trying to sell to technologists is going to fail and fail hard. And I think split open/closed source companies are best suited to be able to leverage this.

I'm A Bad Customer
For any commercial software company, from a sales perspective, my employer is not their ideal customer. We're not that big for a financial services company (our parent company is, but we have our own technology stack and purchasing departments). We don't buy way more than we need. We don't like shelfware. We like best-of-breed, and don't buy whole software stacks (we will never buy a Service Oriented Architecture Solution). Our technologists are massively involved in sales decisions, even when it's really a business-facing application. We constantly evaluate software in build-vs-buy mentality (and we as a culture like writing software). We're small fry, and we're an expensive (from the vendor's perspective) sale.

But, that being said, we probably are a good candidate for a sale that influences others. We are passionate about technology. We have lots of technical contacts (friends, ex-coworkers) with people at much larger companies. We have people who are involved with lots of online technical communities. We have people who do open source work in their spare time. We have people who blog, both positively and negatively. We go to user groups. We engage vendors constantly on product improvements. We take betas and alphas and developer cuts all the time.

That means that ultimately getting us on your side (even when we don't give you a single dollar in revenue) ends up influencing a lot more people than even a single larger sale would.

So when you look at the overall picture, it starts to make sense for Ari to spend time talking with me, even without a Big Ticket Sale right in front of him. And I think that would be true of any open source technology.

Turn Indirect Profit To Direct Revenue
If you're selling anything commercial having to do with an open source product, I think you'll find that a significant proportion of your most technically savvy users are ones who will never pay you anything. They're working at home; they're in academia; they're smart but working for a poor company; they come from a less developed country. They're a massive source of improvements and knowledge and they get passionate about what they're doing, but they're not going to pay you. But they're a pretty good reason why you'll eventually get revenue from other people: they may go work for a bigger/richer company, or one which prefers Buy in Build-vs-Buy; they talk with the world constantly about what they're doing; they speak at conferences and write books and make your platform far more compelling than it otherwise would be. And so if you want to be successful, you view supporting them and interacting with them as indirectly profitable: no revenue comes from it, but it increases the profit potential of your ecosystem dramatically.

So where's the relevance with Atlassian here? They're not open source. They sell software. How have they leveraged these principles to end up with Laura wasting her time meeting with me?
  • They give away licenses all the time. You're open source? Free license (this is how I originally found out about Jira way back in the day). You're working for a non-profit? Free license. You just want to use it for your own personal stuff? Free license. This builds a passionate ecosystem and doesn't stop you extracting revenue from companies that will pay you.
  • They work with open source programmers. They have a plug-in ecosystem that they actively nurture, and many of those people are doing it open source. Those same passionate people making their ecosystem more attractive and more conducive to extracting revenue from others.
  • They engage their customers. Laura knew I'd blog about at least part of what we talked about (that's how she found me in the first place). That increases the sum knowledge that the world has about Atlassian products, and makes for free marketing.
  • They differentiate their customers. Some customers are a lot of dumb money, and some customers are a small amount of smart money. I'd like to think my firm is more of the latter, given the amount of time we devote to trying in any way to help make the products we use better.
If you're familiar with the classical Tactical/Strategic Sale quadrant, there are a whole host of people who are so tactically worthless sales wise that they're going to give you nothing. But strategically they're useful. Pursue them.

If you're a technology company and you're trying to play the old closed-information game, the every-interaction-must-be-profitable game, the "no you can't have the manuals unless you're a customer" game, the "no you can't download our whitepapers from a email address" game, you're losing out a lot. Engage people who are only indirectly profitable and you'll find more that are directly profitable.

Monday, November 17, 2008

Perforce, FishEye, CC.Net, Labels, Oh My!

My employer is a big Perforce shop. Although we've got some CVS repositories still lying around for legacy reasons, over the past few years almost all of our source code has managed to make its way into four Perforce instances (separated for geographical and organizational reasons). We use it for pretty much everything, and we've got our entire software development methodology based around its use.

That includes a number of tools that we use that are integrated with it:
  • Code review tools (all built in-house)
  • Software release and distribution tools (also all built in-house)
  • Bamboo, as our current continuous integration system
  • FishEye, as our SCM web-based visualization system
  • Jira, as our issue tracking system
In addition, in the past, before we moved to Bamboo, we had three other CI systems hitting it, CruiseControl, CruiseControl.NET, and Hudson (all but one retired, and that one is being retired shortly).

All of them are hitting it on a regular basis pretty hard to pull metadata (what happened when) and data (to pull the actual SCM data). This is a tale of where it started to go wrong and how we fixed it.

Problem Diagnosis
We noticed that FishEye was behaving pretty badly when it was rescanning a repository. We do this on a regular basis, because FishEye won't automatically rescan changelist descriptions (at least for Perforce), and sometimes we go back and edit changelist descriptions to hook them up to Jira issues or end-user requests, or just to provide more clarity on what someone did in the past. Since FishEye won't pick up those changes (if it's already processed changelist 123456, why would it process it again?), we have to periodically completely rescan our master Perforce server to pull out all changes.

Our master Perforce installation is relatively big for a non-gaming house (I can't give numbers for this in a public forum), and rescanning it was taking several days. In fact, it was taking so long, that we couldn't even plan on doing it over a weekend: if we kicked off the process on Friday night after New York went home, it wouldn't be done by the time Hong Kong and Tokyo came in on Monday. This was a problem.

Also, when FishEye was doing this, the whole repository was noticeably slower. Since the rescans started to take place over normal business hours when people were trying to work, this made the problem doubly bad: not only was FishEye not available, it was making Perforce slow for users during the rescan.

So I started diagnosing what was going on, and the process that was taking the longest was processing labels. This alone was taking over a day, and because of the way FishEye does this, and because forking to query Perforce on Solaris 10 is a painful experience, we needed to get this number down. We had way too many labels covering way too many revisions.

Perforce Is Not CVS
The metadata table that holds this data in Perforce (db.label) was absolutely massive: roughly 7GB, or about 60% of our entire Perforce metadata storage. This wasn't good, and it's far from ordinary. When I started investigating, we had over 12000 labels. That's a lot for the number of projects we're hosting and the number of releases we've done, but it turns out that 10000 of them were created by CruiseControl.NET builds.

This was largely done from a misconception of what labels are good for, and is basically a remnant of CVS-style thinking. In CVS, because revisions of files can be interleaved together, if you want to reference the state of a particular subsection of the repository as of a particular point in time, you have to add a Tag to every revision of every file involved. This actually goes in and adds metadata for that revision to say it's part of BUILD_5 or some such thing.

A Perforce Label is different. Perforce has atomic, monotonically increasing changelist numbers, where each number uniquely identifies the state of every single revision in every single file in the entire server. And I can use them in all types of contexts. In particular, I can pull down the state of a particular project as of a particular changelist number: "Give me the Fibble project as of changelist 12345." This is how Perforce-integrating CI systems work: they query Perforce to say "tell me all the submitted changelists that I haven't seen", and then sync up particular areas as of those changelist numbers. Therefore, a changelist number is the equivalent of a tag applied to every revision of every file in the whole server.

A Label, on the other hand, is there for cases where you need to refer to revisions of files across multiple changelists. The key use case here is patching. Let's say that you've released version 1.2.0 of your software, and then you start adding changes to support 1.2.1. But in the meantime, you discover a complete showstopper bug that requires you to put out a special release with only that bug fix in, and not any of the other changes you've got prepared for 1.2.1. Since the 1.2.1 features have already started going in, if you try to pull all the source code as of the point where the critical bug fix went in, you'll get the 1.2.1 changes as well. In this case, you create a Label, and you put in the label all the 1.2.0 revisions, as well as the revisions just for the showstopper fix, but none of the rest of the 1.2.1 changes. This gives you a way to refer to a collection of revisions of files across different changelists.

What our CC.Net server was doing (and as I didn't install it, I don't know if this was default or intentional on our part), was for every single build of every single project, creating a new label which contained the state of all the files for that build. But you don't need to do that: since it was pulling everything down as of a particular changelist number, all that accomplished was saying "The revisions for build 30 of the Fibble project are the same as the files as of changelist 12345," which Perforce changelist numbers already give you. So it was unnecessary metadata that was clogging up the server intolerably.
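The distinction is easier to see with a toy model. The Python below is a sketch, not Perforce's data model or API: the changelist numbers, file names and the `state_at` helper are all invented for illustration. It shows that a per-build label is just a copy of what a changelist number already identifies, while a patch label picks revisions that no single changelist number can describe.

```python
# Each submitted changelist records the new revision of every file it touched.
# All numbers and file names here are made up for the example.
CHANGELISTS = {
    100: {"core.c": 1, "ui.c": 1},  # the 1.2.0 release
    101: {"ui.c": 2},               # first 1.2.1 feature work
    102: {"core.c": 2},             # the showstopper fix
}


def state_at(changelist):
    """Revisions visible as of a changelist: the latest revision at or before it."""
    state = {}
    for number in sorted(CHANGELISTS):
        if number > changelist:
            break
        state.update(CHANGELISTS[number])
    return state


# A per-build label that just repeats a changelist adds nothing new.
build_label = dict(state_at(101))

# A patch label picks revisions across changelists: 1.2.0 plus only the fix.
patch_label = dict(state_at(100))
patch_label.update(CHANGELISTS[102])
```

Here `build_label` equals `state_at(101)`, so storing it is pure overhead, whereas `patch_label` (the fixed `core.c` with the 1.2.0 `ui.c`) matches no changelist at all, which is the case labels exist for.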

We deleted all of those 10000 labels (the fact that we had already moved all those projects to Bamboo made this a no-brainer, as we were no longer using the CC.Net provided builds at all). But the size of the db.label table didn't shrink. In fact, it got slightly bigger during that time.

This is because Perforce as an optimization assumes that you're constantly increasing the amount of metadata that you're putting in as time goes forward, and so doesn't prune the size of the tables. So they were still too big, and sparse, so although we didn't have to do as many queries across them, it was hurting the OS caching of the files.

The solution there is to restore from a checkpoint (a Perforce checkpoint is a gzipped text file containing every binary record in your metadata in plain text; it acts as your primary backup mechanism for the metadata records that it keeps along with your RCS files). Before we did this we went through a pruning process for old workspaces that people hadn't deleted (removing several hundred) to get the db.have file down in size as well.

After this was done, the size of our db.* tables went from 11GB to 3.0GB (our gzipped checkpoints went from 492MB to 221MB). And the FishEye scans went from 3 days to 5 hours. Job done.

So after all that, what we can draw from this is:
  • Don't label unnecessarily. They're relatively expensive in terms of metadata, and you probably don't need to do it.
  • Shrink your metadata. Remove anything that you no longer need, including old workspaces.
  • When you prune significantly, restore from checkpoint. This is the only way to get Perforce to really shrink down your metadata tables on disk.

Monday, November 10, 2008

I'm Comin Back To San Francisco

San Francisco, Here I Come. At least for a visit.

Next week I'm going to be in the San Francisco Bay Area visiting friends and running errands that I've sorely neglected over my unexpected 2-year absence.

I'm already going to be meeting quite a few people down in the Valley, but if you've been lurking around here (or actively participating) and want to meet up, just lemme know (you can email me right through the blog I think, otherwise post a comment and I'll follow up) and I'll try to swing by and see you, particularly if you've got some cool tech going on!

Or cookies.

Sell Me An AMQP Appliance

I've been doing some thinking recently on the bevy of AMQP broker software which is being created, and I think it's a really positive thing that we're seeing a new swathe of innovation in the broker-based middleware space. In particular, the space seemed fossilized to many just a year or two ago, but now, with AMQP and XMPP PubSub, it is getting a lot of new development momentum. But ultimately, I'm starting to think that perhaps much of the innovation will be wasted on a large category of users.

I believe that we're going to see the broker-based MOM space divided into three categories:
  • Edge Distribution. Imagine that you have 10 processes on the same machine that are all receiving the same data. The most optimal way to distribute the data is for that machine to be running its own broker and connecting with the hub broker to distribute each message once for redelivery to the 10 processes on the edge system.
  • Embedded. In this case, an application is being written against the MOM protocols in question, but everything's happening inside the application's configuration, so that there are no external dependencies. Atlassian's Bamboo is a good example of this: they embed ActiveMQ so that their customers don't have to have an existing JMS infrastructure in place, and you don't even necessarily know you're running JMS at all.
  • Central Distribution. This is every other case, and generally means a node that is configured just to act as a broker, and isn't really doing other stuff.

The thing I've come to realize is that for the third case, Central Distribution or Dedicated Broker, I don't want to buy or run "software" at all. I want to buy an appliance.

Why An Appliance?
I want to buy an appliance because I don't want to have to deal with the complexity of optimizing my installation (and by that I mean the entire hardware/OS/software stack) for the purposes of running a MOM broker. Let's start anecdotizing.

When my company was planning on rolling out SonicMQ across our entire infrastructure, we spent a lot of time working on our proposed infrastructure. This included, but is not limited to:
  • Base Hardware Configuration. What processor? How fast? How many?
  • Failover Network Connections. For our High Availability Pairs, what network connection(s) did we want to use for the replication traffic? How precisely were they to be configured?
  • The OS and Patch Level. This got particularly hairy with some of the hardware options for the failover network connections, and we had to do a lot of patch wrangling to test out various hardware options.
  • Filesystem Layouts. For the various types of filesystem storage (WAL for persistent message storage, message database for persistent topic and queue storage), how did various options of disk partitioning, disk speed, filesystem layout and options affect performance?
  • RAM. How much? What GC settings for the JVM? Swap configuration?
  • SonicMQ Configuration. How did any of the above affect SonicMQ? Were there any things we had to tweak to get it to perform better?
How much of that do I think was really part of our jobs? None. None at all. I think it was completely wasted time. And that time costs money, in that I was working with our systems staff on testing and configuration and we don't work for cookies (cookies are merely a bonus that endear us to vendors).

If Progress had sold a 1U SonicMQ Appliance, which had 4 external NICs and a pair of HA connectors, we would have bought it. Even at a premium over the software+hardware, because that premium couldn't have cost more than our time did.

Things Are Getting Worse
Now, things are starting to move really quickly on a lot of fronts that would affect an MOM appliance:
  • Storage is changing rapidly. An ideal platform here would have some flash in it somewhere. But where? Fusion-IO-style PCIe card? Sun-style ZFS log acceleration? I have no idea, and I don't want to have to know precisely how to make your software go faster with Flash added. That's your job, Mr. Vendor.
  • Networks are also changing rapidly. What to use for an interconnect? 10 GbE? Infiniband? Something else? You figure it out for me.
  • Chips are opening up. Do you work best with Xeon? Opteron? Niagara? Does it matter? How fast do they have to be to saturate all network connections? I don't know. That's your job.
  • Can you go faster with TCPoE? Or does it clog you down? What about custom FPGAs for routing tables? Any way to leverage GPGPUs? Again, you're the vendor, figure out the most cost effective way for me to go as fast as possible.
  • Can you move into the kernel? If so, what parts? Remember, on a general purpose system my systems team will never allow a vendor to muck with the kernel. Once you're selling an "appliance", you can do what you want.
Just to pick on one of these that caused me no end of woe, even today, the choice of 10GbE HBAs matters: different cards have different bandwidth potentials per-socket vs. per-NIC, so this type of stuff really does start to matter more than it did before, and I don't want to have to test it all out for you. Which one do I use for an HA interconnect? What about for my connections to the outside world? I don't want to have to make that decision. I want you to do it for me based on what's optimal for your broker software.

Why Hasn't This Happened Yet?
I think the primary reason why this hasn't happened for things like MOM providers as of yet is that we, customers, just plain don't trust vendors. At all.

We don't trust them because:
  • They don't cycle hardware quickly. Our SonicMQ boxes are running on dual dual-core Opterons. It's monstrously overkill, and they're mostly idle. But generically most customers don't trust vendors, particularly of second-tier appliances, to keep up with industry trends, and that costs you in terms of overall performance. They have to buy in bulk to get the discounts from the chip vendors, and that means that they're going to push their stock as long as they have to.
  • They monstrously overcharge. Anyone who buys storage (and a persistent MOM system is a storage beast as much as a network beast) knows this. Buy a disk drive for $X. Put it in a slide which costs $Y. Sell it for $Z where Z ~= 5(X+Y).
    Look, we get it, you need to charge a premium. But don't sell me ordinary equipment and charge a monstrous premium over what I can get myself from my reseller, or by going directly to the OEM. Clue Phone's Ringing: We're Not Idiots.
    You can charge a premium where it's warranted. But being an "appliance" vendor is not an excuse to jack prices up to a ridiculous level just because you can. You do that, and we'll start asking for generic software and configure it ourselves again.
  • They're Another Hardware Vendor. Each hardware vendor we have means another stupid type of disk slide and another "certified" RAM module which is the same as your standard PC RAM. It's all extra stock we have to hold, it's all extra work our purchasing department has to do, it's all extra validation we have to perform, it's all extra overhead.
  • They Don't Manage Well. Each appliance vendor seems to think that Their Approach is the Right One. So instead of saying "we're going to support getting machine statistics for Linux boxes" and just deploying that, we customers end up having a myriad of different SNMP MIBs and other little tweaks that means that for each new appliance, we have to change a whole lot of our management infrastructure. Blech.
    (Anecdote Time! NetApp provide SNMP MIBs on their network adapters. But for total bits, they don't provide a 64-bit counter. Meaning that if you have a 1Gbps network connection, it'll roll over in like 7 minutes. So if you're not running your stat gathering MRTG/Whatever every 5 minutes, you'll have to factor in rolls of the 32-bit counter. I will assume that you, as an "appliance" vendor, will have similar levels of Stupid).
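For anyone stuck polling a 32-bit counter like that, here is a minimal Python sketch of the arithmetic involved. The function names are mine, and it assumes the counter wraps at most once between two polls; if it wraps more than once you simply can't tell from the counter alone. How long a wrap takes depends on what the counter counts and how busy the link actually is: at a sustained 1 Gbps (125,000,000 octets per second) an octet counter would wrap in about 34 seconds.

```python
COUNTER_MODULUS = 2 ** 32  # a 32-bit counter wraps back to zero after this many units


def seconds_to_wrap(units_per_second):
    """Time for a 32-bit counter to wrap at a sustained rate."""
    return COUNTER_MODULUS / units_per_second


def counter_delta(previous, current):
    """Units counted between two polls, allowing for at most one wrap."""
    return (current - previous) % COUNTER_MODULUS
```

The modulo does the "factor in rolls" step: when `current` is smaller than `previous`, the result is the distance travelled through the wrap rather than a negative number.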
Solution Study: LeftHand
LeftHand Networks had an extremely interesting approach to this, which is that they had software (SAN/iQ) that really benefitted from close hardware integration, but they partnered with Tier-1 system vendors (Dell, HP) to certify the LeftHand software on the system hardware. Brilliant option. They're now part of HP, but the idea was really sound. But take it a step further, and do what a lot of companies are now doing.

OEM the platform of a Tier-1 system vendor (Dell, HP, IBM, Sun). Sell us that platform as is. Don't bloody pretend it's something magical (for example, don't even bloody think of not allowing me to swap out one of your Special Magic Drives for a stock Dell/HP/IBM/Sun drive). Just sell it as an appliance with your software.

But even more than that, be quite open that what you're selling is:
  • Tier-1 Hardware, running
  • A customized Linux/OpenSolaris/*BSD operating system, with
  • Your MOM software, with
  • Perfect configuration, and
  • Special Sauce (optional).
Don't stop me from going in and poking around in a read-only way at what you're doing, make sure that I can replace your hardware with stuff from the Tier-1 vendor itself, and make sure standard Linux software (like the standard SNMP MIB and SSH and top) are all available to me in some minimal way.

This covers all the hardware vendor objections (even if you just choose one Tier-One vendor, they all have programs for just this, and we all have experience with all of them), and still allows you to optimize for the OS and hardware like mad.

The Open Source Problem
But what if you want to run things all open-source on your own hardware? Then have the project, rather than distributing an RPM, distribute a full bootable image that you load. Or have your favorite Linux/OpenSolaris distribution come up with an installer that rather than building a general purpose system, builds a system dedicated to just running that particular application (e.g. Anaconda, but custom RPMs and boot images).

I think this is probably going to start being a more common paradigm going forward, and the move to cloud computing is probably going to take things farther as people use something like CohesiveFT or some other mechanism to grab VM instances or instances for their favorite cloud computing provider for infrastructure software.

Why I don't think you're going to want a Cloud Computing solution to this in particular (e.g. an EC2 cluster) is that something like MOM done right in Central Distribution mode is pretty network topology sensitive: it doesn't make sense to virtualize it or run it as a VMWare image. It wants to be close to the hardware. Run it in another type of workload, but don't try to push tens of thousands of persistent messages through it per second; it's never going to work. But the same general approach has a lot of merit for other types of projects.

I think there's a big opportunity for someone to provide an AMQP appliance for the centralized distribution case and push it heavily. There's no reason why I should be running what essentially amounts to a persistent network appliance on generic off-the-shelf hardware and operating systems, and there's no reason at all why I should have to do your performance optimization myself.

But again, AMQP is the secret ingredient: I'll trust an appliance running custom software to provide a standard network service. I wouldn't like an appliance running custom software for a custom protocol.

IEEE 754 Floating Point Binary Representations

Just to gather up a whole bunch of stuff I had to slog through and make this more googleable, allow me to summarize some various trivia having to do with bitwise representation of IEEE 754 floating point values across platforms. This is primarily useful if you need to read and write floating point values from byte arrays or binary network streams across platforms, particularly if you have to interact with Steve Ballmer's Insanity.

First, there is no official standard for endian-ness when transmitting IEEE floating point data over the wire. That means that Java ends up defaulting to big-endian format in DataInputStream and DataOutputStream (to match the fact that everything else in Java is big-endian), while C# defaults to host-endian format for BinaryReader and BinaryWriter (always little-endian in practice, as the Mono guys have learned). First point of fun.
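To make the mismatch concrete, here is a minimal Java sketch that decodes the same eight bytes under both conventions. The class and method names (`DoubleEndianDemo`, `readLittleEndian`, `readBigEndian`) are made up for illustration; only `ByteBuffer` and `ByteOrder` are real JDK classes.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class DoubleEndianDemo {
    // Decode 8 bytes laid out little-endian (the layout the post attributes to BinaryWriter).
    static double readLittleEndian(byte[] bytes) {
        return ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getDouble();
    }

    // Decode 8 bytes laid out big-endian (the layout DataOutputStream.writeDouble uses).
    static double readBigEndian(byte[] bytes) {
        return ByteBuffer.wrap(bytes).order(ByteOrder.BIG_ENDIAN).getDouble();
    }

    public static void main(String[] args) {
        // 1.0 has the bit pattern 0x3FF0000000000000.
        byte[] littleEndianOne = {0, 0, 0, 0, 0, 0, (byte) 0xF0, 0x3F};
        byte[] bigEndianOne = {0x3F, (byte) 0xF0, 0, 0, 0, 0, 0, 0};

        System.out.println(readLittleEndian(littleEndianOne)); // prints 1.0
        System.out.println(readBigEndian(bigEndianOne));       // prints 1.0
        // Reading with the wrong byte order silently yields a different number.
        System.out.println(readBigEndian(littleEndianOne) == 1.0); // prints false
    }
}
```

Nothing fails loudly when the byte order is wrong; you simply get a different, perfectly valid double, which is why this tends to show up late.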

Secondly, IEEE 754 floating point representation defines an entire range of values to represent NaN, not a single value. Java makes things byte compatible in the wire format by always emitting a single constant value for all NaN values (where all the meaningless bits are set to 0), while C# allows whatever cruft happens to be in the value on the CPU to flow through to your binary representation. And don't assume in C# that double.NaN has all those bits set to 0. It doesn't. In practice, double.NaN in C# is full of cruft.

This is fine if you read in the value and call IsNaN on it, but not so great if you want to check that your serialized/deserialized byte arrays are fine. For that, you need to mask out to ensure that you're always writing a canonical representation of your NaN values.

A useful C# block if you find yourself having to deal with this stuff is the following (using this will ensure that your binary representations are always bit-equivalent with the Java formats):
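The same canonicalization can be sketched in Java rather than C#, which at least pins down the target format. This is an illustrative sketch, not the original listing: the class name `CanonicalDouble` and its methods are hypothetical, and it assumes the canonical NaN is the one `Double.doubleToLongBits` produces (`0x7ff8000000000000`).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class CanonicalDouble {
    // The single NaN bit pattern that Double.doubleToLongBits returns for every NaN.
    static final long CANONICAL_NAN = 0x7ff8000000000000L;
    static final long EXPONENT_MASK = 0x7ff0000000000000L;
    static final long MANTISSA_MASK = 0x000fffffffffffffL;

    // Collapse any NaN bit pattern (any sign bit, any payload) to the canonical one.
    // Non-NaN values, including the infinities, pass through unchanged.
    static long canonicalize(long rawBits) {
        boolean isNaN = (rawBits & EXPONENT_MASK) == EXPONENT_MASK
                && (rawBits & MANTISSA_MASK) != 0;
        return isNaN ? CANONICAL_NAN : rawBits;
    }

    // Serialize as 8 big-endian bytes, the layout DataOutputStream.writeDouble uses.
    static byte[] toWireBytes(long rawBits) {
        return ByteBuffer.allocate(8)
                .order(ByteOrder.BIG_ENDIAN)
                .putLong(canonicalize(rawBits))
                .array();
    }

    public static void main(String[] args) {
        long cruftyNaN = 0xfff8000000000001L; // sign bit and one payload bit set
        System.out.println(Long.toHexString(canonicalize(cruftyNaN))); // prints 7ff8000000000000
    }
}
```

A port to another language would apply the same two masks to the raw 64-bit pattern before writing the bytes out in big-endian order.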

Friday, November 07, 2008

Linux Fork Performance Redux: Large Pages

After a comment from Kostas on my last Linux Fork Test post, I worked with my esteemed colleagues to try things out with large page support. Wow, what a difference that made on Linux.

The theory here is that forking involves playing around with TLB entries quite a bit. Since a monstrous full heap will have a lot of 4KB page TLB entries to contend with, if we shrink the number of TLB entries by a factor of 512 (by working with 2048KB pages rather than the 4KB ones x86 uses by default), we'll limit the amount of time that the kernel is spending mucking with them.

First of all, make sure you read this: The large memory support page from Sun. Now that we've gotten the formalities out of the way, here's some fun we had.

First of all, a stock RHEL 5.2 installation has a HugePages_Total set to 0 (cat /proc/meminfo). No huge pages whatsoever. So you need to bump that up. For my test (maximum 2GB fully populated heap on a 4GB physical RAM system), we decided to set that to 3 GB, which is 1536 2MB pages.
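The page-count arithmetic is easy to get wrong by a factor of two, so here is a small Java sketch of it. The class `HugePageMath` and its method are hypothetical helpers; the 2MB huge page size and the 3 GB target are the figures from the paragraph above.

```java
final class HugePageMath {
    static final long KB = 1024L;
    static final long MB = 1024L * KB;
    static final long GB = 1024L * MB;

    // Number of pages of the given size needed to cover `bytes`, rounding up.
    static long pagesNeeded(long bytes, long pageSize) {
        return (bytes + pageSize - 1) / pageSize;
    }

    public static void main(String[] args) {
        // 3 GB reserved as 2 MB huge pages: the value to write to nr_hugepages.
        System.out.println(pagesNeeded(3 * GB, 2 * MB)); // prints 1536
        // A fully populated 1 GB heap alone needs this many 2 MB pages.
        System.out.println(pagesNeeded(1 * GB, 2 * MB)); // prints 512
    }
}
```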

echo 1536 > /proc/sys/vm/nr_hugepages isn't guaranteed to actually do that, and the first time we ran it, we ended up with a whopping 2 HugePages_Total. Second time bumped us up to 4. So we went on a process hunt to eliminate any processes that were stopping us working, and got things down pretty small. Now we were able to get up to 870, which was good enough for my 1GB tests (which indicated the major performance degradation anyway), though not for the 2GB test. (Yes, I know that you're supposed to do this on startup, but I didn't have that option so we did what we could).

And so I kicked things off with the -XX:+UseLargePages flag. Fail.

Every single time I got a Java HotSpot(TM) 64-Bit Server VM warning: Failed to reserve shared memory (errno = 12). And nothing would run. Well, damn!

Turns out those little tiny bits that they say in the support page about not working for non-privileged users are completely accurate. These all went away when I had someone with sudo rights run the process as root, and all my numbers are from running as root. So just assume that even 1.6.0_10 ain't going to allow you to allocate any large pages if you're not root.

So he ran things as root (and I re-ran things as non-root without the UseLargePages flag since I changed the test slightly). Here's some fun comparison:

Heap Size   Large Pages Time   Normal Pages Time   Speedup
128MB       3.589sec           13.217sec           3.68 times faster
256MB       3.62sec            15.314sec           4.23 times faster
512MB       4.638sec           36.692sec           7.91 times faster
1024MB      3.885sec           67.062sec           17.26 times faster

Oh, and that speedup between 512MB and 1024MB? Completely reproducible. I'm not sure what precisely was going on there, so I'm going to assume my test case is flawed somehow.

It's happening so quickly at this point that I'm quite suspicious that all I'm measuring is the /bin/false process startup and teardown performance, as well as the concurrency inside Java. I don't actually think I'm testing anything of any meaningful precision anymore. Maybe at a few million forks or with higher concurrency, but I've achieved essentially a constant amount of time spent forking, so I've gotten out of the heap issue really.

So it turns out that you really can make Java fork like crazy on Linux, as long as you're willing to run as root. And I don't know why and my naive googling didn't help. If someone can let me know, I'd really greatly appreciate it.

Did any of this help Solaris x86? Not one whit. Adding the -XX:+UseLargePages flag (even though Solaris 10 doesn't require any type of configuration to make it work) didn't improve performance at all, and Solaris was still twice as slow as Linux without the flag.

Monday, November 03, 2008

C# BinaryWriter is Little Endian Because Microsoft Hates The Internet

Let's assume that you're developing the primary runtime class library for a programming language, and you need to write primitive types to a network connection. You've essentially got two options:
  • Allow the application to specify the endianness of the data
  • Require the application to use a particular endianness.
The former allows developers more flexibility, but it means they have to think, and thinking is hard.

Let's assume that you've decided not to let the user easily specify the endianness of the data going over the wire. What endianness might you then choose for your developers? Might you choose the one that's officially called "Network Byte Order?"

Well, no. You're working on .NET and you work for Microsoft, so you'll use the opposite of that.

Now let's assume that you have a developer who's trying to do the right thing (where the right thing is not to start crying "Waaah, C# sucks, so you have to change the internet to support whatever Ballmer's got cooking"). You might try to make it easy for them to output data in network byte order. How might you do that?

By hiding the .NET equivalent of htonl and ntohl as static methods in System.Net.IPAddress, duh. Because that's obviously where you'd look when thinking of where to find binary data endian conversion routines. What in the world could be more logical or easy to find without StackOverflow?

Note: To be fair, it's entirely possible that Microsoft has not chosen little-endian by conscious decision, but is merely using host byte order. But for compatibility, Mono has to replicate this as little-endian-always. Which is going to be a lot of fun if and when Microsoft actually tries to port this to a new architecture.
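For readers who haven't had to do this by hand, the conversion in question is just a byte swap. The sketch below shows it in Java purely for illustration; `NetworkOrder`, `toNetworkBytes`, and `swapEndianness` are hypothetical names, while `Integer.reverseBytes` is a real JDK method.

```java
final class NetworkOrder {
    // Encode an int as 4 bytes in network byte order (big-endian),
    // independent of the host CPU's own byte order.
    static byte[] toNetworkBytes(int value) {
        return new byte[] {
            (byte) (value >>> 24),
            (byte) (value >>> 16),
            (byte) (value >>> 8),
            (byte) value
        };
    }

    // Convert a value that was read with the wrong endianness.
    static int swapEndianness(int value) {
        return Integer.reverseBytes(value);
    }

    public static void main(String[] args) {
        byte[] wire = toNetworkBytes(0x0A0B0C0D);
        System.out.printf("%02x %02x %02x %02x%n", wire[0], wire[1], wire[2], wire[3]); // prints 0a 0b 0c 0d
        System.out.println(Integer.toHexString(swapEndianness(0x0A0B0C0D))); // prints d0c0b0a
    }
}
```

Writing the bytes out explicitly by shifting, rather than swapping based on what the host happens to be, is what keeps the wire format stable if the code is ever ported to a different architecture.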

SSL For Self-Signed Bad For The Internet?

In an article (originally posted as a blog entry, then archived as a web page), Nat Tuck complains that Mozilla's behavior on self-signed SSL certificates is bad, because it stops people from using encryption where they don't care about the possibility of a MITM attack. I think he's mostly wrong.

The big reason why I think he's wrong is that I think the default behavior is correct: the vast majority of people don't understand the complexities of a Man-In-The-Middle attack and how that can affect any self-signed certificate. However, I totally understand his desire for more encryption.

I'm pretty annoyed at the fact that at the moment, I'm pretty sure that all non-encrypted traffic on the internet is being logged and scanned at some level by multiple governments. I don't like that at all. Encryption is the only way around it.

But we've had technologies for implementing encryption on-the-fly where we don't care about the possibility of an MITM attack for yonks. Just use DH key exchange as a handshake and then use the result as the key for a stream cipher. No need for certificates at all. You get strong encryption between endpoints, and acknowledge that you're potentially subject to a MITM attack. You can even combine it with some advanced DNS checking to minimize the chance that your company/government's proxy is MITM-inspecting every connection. The only problem here is that, as far as I know, there isn't a clean URL handler for this defined in any RFC.
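A rough sketch of that handshake in Java, using only standard JCA classes, might look like the following. Several things here are assumptions for illustration and not part of any proposal: the class name `AnonymousDhDemo`, the 2048-bit group, hashing the shared secret with SHA-256 to get a key, and using AES in CTR mode as the stream cipher. There is deliberately no authentication, so it has exactly the MITM exposure described above.

```java
import java.nio.charset.StandardCharsets;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.MessageDigest;
import java.security.PublicKey;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyAgreement;
import javax.crypto.interfaces.DHPublicKey;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

final class AnonymousDhDemo {
    // Initiator: generate an ephemeral DH key pair in a 2048-bit group.
    static KeyPair newKeyPair() throws Exception {
        KeyPairGenerator generator = KeyPairGenerator.getInstance("DH");
        generator.initialize(2048);
        return generator.generateKeyPair();
    }

    // Responder: generate a key pair in the same group as the peer's public key.
    static KeyPair newKeyPairFor(PublicKey peer) throws Exception {
        KeyPairGenerator generator = KeyPairGenerator.getInstance("DH");
        generator.initialize(((DHPublicKey) peer).getParams());
        return generator.generateKeyPair();
    }

    // Run the agreement and hash the shared secret down to a 128-bit AES key.
    static SecretKeySpec deriveKey(KeyPair mine, PublicKey theirs) throws Exception {
        KeyAgreement agreement = KeyAgreement.getInstance("DH");
        agreement.init(mine.getPrivate());
        agreement.doPhase(theirs, true);
        byte[] digest = MessageDigest.getInstance("SHA-256").digest(agreement.generateSecret());
        return new SecretKeySpec(Arrays.copyOf(digest, 16), "AES");
    }

    // AES in CTR mode acts as a stream cipher: no padding, byte-for-byte output.
    static byte[] crypt(int mode, SecretKeySpec key, byte[] iv, byte[] data) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
        cipher.init(mode, key, new IvParameterSpec(iv));
        return cipher.doFinal(data);
    }

    // Simulate both endpoints in one process and check the message survives.
    static boolean roundTrip(String message) {
        try {
            KeyPair initiator = newKeyPair();
            KeyPair responder = newKeyPairFor(initiator.getPublic());
            SecretKeySpec initiatorKey = deriveKey(initiator, responder.getPublic());
            SecretKeySpec responderKey = deriveKey(responder, initiator.getPublic());

            byte[] iv = new byte[16]; // sent in the clear alongside the ciphertext
            new SecureRandom().nextBytes(iv);
            byte[] wire = crypt(Cipher.ENCRYPT_MODE, initiatorKey, iv,
                    message.getBytes(StandardCharsets.UTF_8));
            byte[] plain = crypt(Cipher.DECRYPT_MODE, responderKey, iv, wire);
            return message.equals(new String(plain, StandardCharsets.UTF_8));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("GET / HTTP/1.1"));
    }
}
```

Only the two public keys and the IV cross the wire, and no certificate is involved anywhere. A real protocol would also need a fresh IV or counter per message and some integrity check; both are left out here to keep the sketch short.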

Why not just start one? httpe (HTTP Encoded, but not Secure)?