Thursday, December 24, 2009

Confluence, iSCSI, NetApp, Flexiscale, Fail

We've been hosting the FudgeMsg website using Confluence (and mad props for the free Open Source license by the way!) on a VM hosted by Flexiscale. We chose Flexiscale for the following reasons:
  • Confluence is ridiculously tricky to cluster, meaning that you can't benefit from the scale-out capabilities of the Amazon EC2 model. (Edit 2009-12-24 - We're not attempting to cluster it; this statement is to indicate that because it's so tough to cluster, we're not trying to. Confluence thinks we might be, and crashes as a result.)
  • Getting your AMI+EBS+ElasticIP for such a single-point vendor app is very difficult and time consuming, again meaning that the Amazon EC2 model isn't ideal.
  • We didn't want to go for dedicated hardware for a low-volume application stack.
  • We don't have a stable enough internet connection to host the application stack ourselves.
We thought we were pretty darn clever to be honest, but then we noticed that things started going wrong. In particular, we started getting this error a lot:


We were running a relatively complicated Java setup (jsvc to run as chrooted user, on Tomcat, with multiple web applications running), so we thought we had done something wrong. After all, the first rule in software engineering is always, always, assume you have caused the break.

We were wrong.

What we've since found out is that the entire Flexiscale model is that your VMs have effectively no local storage whatsoever, and everything is iSCSI hosted off of a NetApp cluster. That means that your storage is fully persistent, and survives migrations of your VMs across physical hardware. Which is a great concept when it works.

Except that the Flexiscale NetApp cluster is borked. Essentially, the iSCSI services will sporadically shut down completely, sometimes for a second, sometimes for 4 hours. Your processes will still be running, but if they attempt to hit disk for any reason, they'll hang and ultimately time out.

In the case of Confluence, which was actually trying to talk to PostgreSQL, which itself was trying to talk to disk, Confluence detects the complete timeout hang of PostgreSQL as a cluster violation, and puts itself into crash mode.
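A minimal sketch of how a watchdog inside the VM could notice this state: do a small write that must reach disk, in a background thread, and treat "no answer within a timeout" as "storage is hung". This is not something from our actual setup; the function names, the probe file prefix, and the 5 second default are all assumptions made up for illustration, and the probe can only run if the interpreter itself is already loaded in memory.

```python
import os
import tempfile
import threading


def _write_and_sync(directory, done):
    """Write a small file and fsync it so the write must actually reach disk."""
    fd, path = tempfile.mkstemp(dir=directory, prefix="disk-probe-")
    try:
        os.write(fd, b"probe")
        os.fsync(fd)  # blocks here if the backing storage has gone away
    finally:
        os.close(fd)
        os.unlink(path)
    done.set()  # only reached if every disk call above returned


def disk_responds(directory, timeout_seconds=5.0, probe=_write_and_sync):
    """Return True if the probe finishes within timeout_seconds, else False.

    The probe runs in a daemon thread because a thread stuck in blocked I/O
    cannot be cancelled; we simply stop waiting for it. A probe that raises
    never sets the event, so it is also reported as False after the timeout.
    """
    done = threading.Event()
    worker = threading.Thread(target=probe, args=(directory, done), daemon=True)
    worker.start()
    return done.wait(timeout_seconds)
```

Something like `disk_responds("/var/lib/postgresql")` (a hypothetical path) called from a periodic job would at least let an application report "storage unavailable" instead of concluding that something else has gone wrong.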

You want to know the best part of this? When you're in this state, there's nothing you can do. You can't SSH into the VM, because it can't read the password file to let you in. The Flexiscale tools won't even let you hard bounce the machine. And they won't tell you when it's back up directly, so you just have to keep trying until you can finally get into the VM, to restart your servlet container.

This has caused no fewer than 20 instances where Confluence has died for us, sometimes lasting hours until we can actually recover the VM, and twice now in 2 days. It makes us look like morans who can't even run Confluence, much less guide development of a message encoding system.

So until we manage to get off of Flexiscale (haha, can't even get in to back up the data at the moment), if you see that error when going to the Fudge Messaging website, now you'll know why.

There are two morals of the story:

Wednesday, December 23, 2009

2009 Predictions Revisited

As promised, I'm coming back to revisit my predictions for 2009 to see how I've done. As predicted by virtue of my fuzziness, I was mostly right on most subjects. Let's go into particulars!

Messaging Breaking Out
I think I did pretty well there. We've still not got a final AMQP 1.0, but just following Twitter and blogs I'm seeing a lot more people, particularly from the non-financial world, starting to use messaging in their applications.

Cloud Becoming Less Buzzy
Complete strike-out here. The same people are still arguing amongst themselves over what "cloud" is. That being said, the use of utility computing (as I prefer to call it) is on the rise, and Amazon has come up with so many innovations in the space that it's hard to keep track.

Java Stagnating
Mixed bag on this one I have to say. Java has definitely stagnated, and we still don't have a Java 7. That being said, it looks like the delayed Java 7 may actually give us the chance to see JSR-310 and closures coming into the language, which would be a very positive development.

Java stagnating leads nicely into my next subject (yes, I'm going out of order now):

Non-Traditional Languages Breaking Out
I think I hit this one right on the head. The stagnation of Java, along with prominent proponents of languages like Scala and Groovy, has made people more willing than ever to consider these languages the "next Java". Scala in particular has gone from an interesting programming language to one which is seeing mass adoption in enterprises and in web shops.

C# Over-Expanding
To be honest, I have no idea how I did here. I've found myself completely and utterly outside of the C# ecosystem, so I'll have to leave it to one of my faithful readers to fill me in on how I did here.

Social Networking Losing Money
Yep, I failed here. Twitter's probably profitable, and Facebook is almost certainly gearing up for an IPO. I was completely wrong here.

But it's not just the social networks themselves: social gaming has gone from an interesting idea to one that makes lots of money, indicating that the space of social networking has started to turn profitable not just for network providers, but also for network ecosystem partners.

Sun Radically Restructuring
I think I can say I was right here, in that they're radically restructuring themselves into Snoracle.

Next year's predictions on the way!

Monday, December 14, 2009

My Open Letter to the European Competition Commissioner

Monty urged me to help save MySQL. I couldn't possibly refuse such an offer.

Dear Competition Commissioner:

I am the Chief Executive and Technology Officer for OpenGamma, a financial technology startup located in the United Kingdom. I am writing to urge you to immediately and unconditionally approve the merger of Oracle and Sun.

I have a long-standing history with both the Open Source and Database communities, having worked for a number of database startups in the United States of America, as well as working at early-stage companies making use of and refining a number of Open Source technologies. I also have experience working at Oracle during a summer internship while I was attending the University of California at Berkeley.

Furthermore, in my more recent career as a consumer (rather than producer) of database technologies, I have set up and managed numerous large MySQL installations, including one with more than 10 nodes and 150GB of active data (and several terabytes of archive data) in the financial services industry. I have also been a customer of Oracle's database technology for even larger systems.

In my mind, there is no logical reason to reject this merger based on considerations for the MySQL technology.

First of all, as a consumer of database technologies, I can tell you that Oracle and MySQL simply do not compete in the marketplace. While customers have replaced Oracle with MySQL, the applications based on Oracle that were ported to MySQL were never good candidates for Oracle and would have been ported to another database engine in due course as Oracle moves to the highest end of the market. Customers have numerous options for porting their applications off of Oracle onto a lower-cost database engine; MySQL simply has the most brand recognition in this space. Furthermore, MySQL has been used as a pricing lever by Oracle customers rather than an active option for migration.

That implies that Oracle's ownership of MySQL might see a reduction in competition for Oracle's core product, but there is a flourishing Open Source database market these days (which there wasn't when MySQL was originally created): Ingres, Firebird, LucidDB and PostgreSQL are all far more applicable to the Oracle customer base than MySQL is. Even if MySQL development were to come to an immediate halt, this wouldn't harm consumers in such an extremely competitive environment.

However, the continued uncertainty over the Sun acquisition is potentially far more anticompetitive for consumers of technology as a whole: Sun has a number of competitive products with far greater applicability than MySQL (including their storage, networking, and computer chip technology). Allowing those products to die because Sun ran out of cash during this phase of its life would be extremely and permanently damaging to the overall computer industry inside Europe, and reduce competition significantly. Delaying this merger over the matter of MySQL would result in far greater anticompetitive results to European consumers of computing technology than even the worst case arguments of biased, self-interested advocates in this matter.

Thank you for your time, and once again, I urge you to approve this merger unconditionally and without further delay.

Sincerely Yours,

Kirk Wylie
Chief Executive Officer
OpenGamma

Thursday, December 10, 2009

Document Stores: Please Give Me A Standard API

Although I'm a long-standing RDBMS guy (having worked on Broadbase, Kidar, and Eigenbase/LucidDB [indirectly through Kidar and Broadbase]), I'm quite excited by the emerging document-oriented database movement. While I'm not a pure "SQL is bad and old-economy and you should throw it away" guy, I do think that document stores, like their Hierarchical Database precedents, have their uses in modern architectures. In particular, practical (as opposed to theoretical) aspects of the use of a hierarchical/document model allow for advanced scalability and performance optimizations to be made for modern scale-out architectures.

That being said, even though we're at early stages, I think the major proponents of the technique need to learn from the RDBMS guys in one important aspect: unified APIs are key to wide-scale adoption.

If I'm writing an application that's going to be backed by a relational database, and I'm in a sensible programming language, I've got a standard API that I can code against: JDBC, ODBC, ADO.NET, et al. It doesn't shield me entirely from differences in the underlying database implementation (or else there would be no opportunities for product differentiation), but it makes those differences minimal and relatively easy for a software developer to abstract.

Ditto for message oriented middleware: I can use JMS at the code layer, or AMQP at the network layer (and thus at the code layer as well). While different MOM implementations have underlying differences, which are particularly obvious if I want to push the technology to its limits, the product-specific differences are noticeable in the breach rather than in the general.

This isn't the case right now for document stores. I know that MongoDB and Riak and CouchDB and SDB and others (which I'm sure commenters will point out below) are pretty darn similar in their functionality. I know that the conceptual models are relatively similar. I know this logically. But I still have to do custom code for each one for my own application.

With multiple implementations out there, and with users (e.g. me) looking at the different systems and seeing them as logically similar, it's probably time for the teams to start working together and come up with a code-level API that I can code against, much like JDBC or JMS. While this might seem like early stages for such an effort, trust me, it'll lead to greatly increased adoption because the perceived costs of evaluating different implementations will be greatly reduced.
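To make the idea concrete, here is a rough sketch of what coding against such an API could look like. To be clear, no such standard exists: `DocumentStore`, its four methods, and the in-memory implementation are all hypothetical names invented for this illustration, and a real effort would need to cover far more (queries, indexes, consistency options).

```python
from abc import ABC, abstractmethod
import copy


class DocumentStore(ABC):
    """Hypothetical vendor-neutral interface; not an existing standard."""

    @abstractmethod
    def put(self, collection, doc_id, document):
        """Store `document` (a dict) under `doc_id`, replacing any old copy."""

    @abstractmethod
    def get(self, collection, doc_id):
        """Return the stored document, or None if it does not exist."""

    @abstractmethod
    def delete(self, collection, doc_id):
        """Remove the document; return True if something was removed."""

    @abstractmethod
    def find(self, collection, **equals):
        """Return documents whose top-level fields equal the given values."""


class InMemoryDocumentStore(DocumentStore):
    """Toy implementation standing in for a real product's driver."""

    def __init__(self):
        self._collections = {}

    def put(self, collection, doc_id, document):
        # Copy on the way in so later caller-side mutation cannot leak in.
        self._collections.setdefault(collection, {})[doc_id] = copy.deepcopy(document)

    def get(self, collection, doc_id):
        return copy.deepcopy(self._collections.get(collection, {}).get(doc_id))

    def delete(self, collection, doc_id):
        return self._collections.get(collection, {}).pop(doc_id, None) is not None

    def find(self, collection, **equals):
        docs = self._collections.get(collection, {}).values()
        return [copy.deepcopy(d) for d in docs
                if all(d.get(k) == v for k, v in equals.items())]


def register_user(store, user_id, name, city):
    """Application code: depends only on the interface, not on a product."""
    store.put("users", user_id, {"name": name, "city": city})
```

The point is `register_user`: it never names a product, so swapping the in-memory store for a wrapper around any particular document store would not touch application code, which is exactly what JDBC and JMS buy you today.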

It doesn't mean that you can't differentiate; it doesn't mean that you can't be superior or inferior to other implementations. But it does mean that I, as a consumer of these systems, can more easily support multiple implementations. And if I can do that, I'm more likely to move to one in the first place.