Monday, December 20, 2010

Can too good development infrastructure spoil you?

At QCon San Francisco, Ashish Kumar, head of development infrastructure at Google, gave a talk titled "Development at the Speed and Scale of Google". While I wasn't at QCon San Francisco, InfoQ has posted the video and slides, which I've just watched. First of all: if you're interested in development/build-and-test/SCM infrastructure, stop what you're doing and watch the presentation. Stop now. This post will wait for you.

I originally wanted to write an article about how the talk shows just how much great developer infrastructure technologies can improve performance, and why every firm and team should be devoting significant resources to improving the infrastructure provided to developers. Heck, that's why OpenGamma's systems administrator spends most of his time working on our development infrastructure.

However, the more I watched the video, the more I saw a potential downside to this, particularly at Google's scale: virtually everything you're using is bespoke. That has a number of potential issues:

  • Developers who may be quite familiar with "industry standard" development technologies will take time to come up to speed. If you're focusing primarily on relatively long-duration hires (on the order of years) this doesn't matter much, but many industries (for example, finance, with its strong reliance on short-term contractors) have to factor staff churn in.
  • Unlike with most other development tools (which are dominated by Open Source technologies in the particular types of development Google does), once you start aggressively building a bespoke infrastructure you have to keep going down the bespoke route: you've effectively locked yourself into a path where you have to throw more and more resources at building your bespoke infrastructure to keep up with the state of the art.

In general, I think Google broadly made the right choice to go bespoke in their tool chain. They want to leverage their own (highly proprietary) production infrastructure where possible, in most cases they're probably operating at a scale that no other tool chain would be capable of handling, and (Facebook poaching notwithstanding) they focus primarily on longer-term hiring.

But what about the developers themselves? My only worry about this type of thing from a longer term career development perspective is that, quite frankly, you get lazy in managing your own development infrastructure.

Let's say that you join Google, and spend a few years working very productively using their fantastic development infrastructure. And then you want to join a much smaller firm (which may or may not be your own startup). What then?

  • Could you even set up a makefile/Ant script/Maven build yourself anymore? How long would it take you to relearn it?
  • Are you really going to have a good knowledge of what the current state of the art is in generic development infrastructure and build tools? After all, at New Gig you won't have the Google team backing you up.
  • Are you going to be perpetually frustrated in a role where you don't have a team working on your infrastructure for you?
  • Are you going to start designing systems where the build and test burden would be more than manageable with the infrastructure at Google, but where the lack of that infrastructure means that you might have been better off designing the system in a radically different way?

Note that I'm not actually saying that this is a bad thing. But if you find yourself as a developer in an environment where you have a large, well-skilled team taking care of things for you (whether that team is systems administration, operations, development infrastructure, whatever), you owe it to yourself to keep up to date with the state of the art in everything that sits between writing your code and your production environment: all the things other people currently do for you.

That way, when you finally do get to a role where you no longer have those teams working for you (or you have to figure out whether they actually know what they're doing), you won't be hamstrung by your background.

Tuesday, November 02, 2010

Another OpenGamma OpenHouse on 17 November 2010

Yep, it's that time again. The days are shorter, the nights are longer, there's a chill in the air, and it's time for OpenGamma to throw open its doors.

Beer; Food; Demos; Whiteboard sessions; Tech talk; Victorian warehouse conversion; Startup vibe. Honestly, if you're going to be in London on 17 November, why haven't you clicked the link and signed up?

Monday, October 18, 2010

I'm Hosting The Financial Track at QCon London 2011

As astute and long-term readers will no doubt be aware, I presented on RESTful Approaches to Financial Systems Integration at QCon London 2009 in the Financial Technology track. Then QCon London 2010 came, I attended, and I was a little bit frustrated that in the city most known in the world for innovation in Financial Technology, there was no Financial Technology track.

Well, the masses have spoken. QCon London 2011 will have a Financial Technology track.

Even more humbling for me, I've been recommended as the host of the track, and I've accepted. I will be the Finance War Stories track host for QCon London 2011.

I plan on making sure that this track follows on from the 2009 track, which was one of the highest rated and most oversubscribed tracks in the whole conference. Moreover, I plan on making sure that the track presentations are approachable and interesting both for developers from finance and for people outside it trying to learn from the financial technology industry.

I've long been of the opinion that the rest of the world, particularly web-scale industries, has a lot to teach the relatively insular community of financial technologists. I am also of the opinion that many of the problems people outside finance face are being solved from scratch by people who aren't familiar with the technologies, techniques, and architectures that have been second nature to financial industry professionals for years. I hope that we can continue breaking down the boundaries between financial technology professionals and the industry at large.

If you're interested in giving a talk, please feel free to contact me: kirk at opengamma dot com or a comment on this blog post is the best way to get to the head of the list. In addition, if there's someone you really want to see presenting in this track, or something you want to see a talk about, please don't hesitate to contact me as well.

Tuesday, September 21, 2010

Back in New York City Next Week

Yes, it's only been a month since I was last in the city so nice they named it twice, but I'm coming back again.

I'm sure at some point this will be a frequent enough occurrence that I won't need to blog about it, but after the last post, I ended up having some very interesting meetups and conversations with people, so I'm trying the experiment again.

So that being said, on Monday the 27th of September I arrive in the sordid little burg on the other side of the pond. Hit me up if you want to talk about:

  • OpenGamma, in particular if you're thinking of using it
  • Fudge Messaging, in particular all the bits that we've done that aren't well documented yet
  • Startups, funding, or what a giant jerk I am on the internet (and in real life)

As usual, bonus points if you can give me a desk and get me out of the Ace Hotel when I'm not at customer/client/partner meetings!

kirk at kirkwylie dot com or kirk at opengamma dot com. You know what to do.

Monday, September 13, 2010

Cartoon Characters Discuss Web-Scale Asynchronous Communications

I couldn't resist joining in on the MongoDB/MySQL/DevNull Xtranormal meme. Many of these things I've blogged about before. But now with cartoon characters!

Thursday, September 09, 2010

Slow Clap for Item Moves in Basecamp

I follow SvN. I'm not entirely sure why at this point, as I very rarely actually see anything of merit on it anymore, but I've not gotten disappointed enough to drop it from my Google Reader.

Yesterday the Basecamp team decided to pat themselves on the back for allowing movement of items in between projects.

I look at the list of super-duper extra hard stuff that's going on, and I see one of two cases that could possibly be true:

  • They've sharded their databases to the point that even trivial operations require superhuman effort. Well done in that case.
  • They've designed their database schema in the most completely insane way ever.

My suspicions? A little of column A, a little of column B.

Love this gem:

Is it a to-do list? It might contain to-do items that have associated time tracking entries. Move those time entries to the destination project too.

That seriously doesn't flow for free from the relational semantics? You don't just have an FK relationship that means you're only updating the list?

Here are the lessons I think everybody should learn from this:

  • Sharding MySQL or another low-end relational database isn't a panacea, no matter how much it contributes to the web-scale secret sauce.
  • If you are sharding/partitioning data, you should probably think long and hard in advance about how that's going to impact movements across shards. Because they ALWAYS end up happening.
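To make that second lesson concrete, here's a minimal Java sketch of a hypothetical project-keyed sharding scheme (all the names here are illustrative; this is not Basecamp's actual design). Once everything belonging to a project lives on one shard, moving an item between projects stops being a one-row UPDATE:

import javax.sql.DataSource;

// Hypothetical shard router: all of a project's rows live on one shard.
final class ShardRouter {
  private final DataSource[] shards;

  ShardRouter(DataSource[] shards) {
    this.shards = shards;
  }

  // Simple modulo partitioning by project id.
  DataSource shardFor(long projectId) {
    return shards[(int) (projectId % shards.length)];
  }

  // Moving an item between projects on different shards can't be a
  // one-row UPDATE: the item, and every row hanging off it (comments,
  // time tracking entries, attachments), must be copied to the
  // destination shard and then deleted from the source.
  boolean moveCrossesShards(long fromProjectId, long toProjectId) {
    return shardFor(fromProjectId) != shardFor(toProjectId);
  }
}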

Wednesday, August 25, 2010

Java Initialization Barrier Pattern With AtomicBoolean

I've found myself starting to use this little mini-pattern. You might find it useful.

import java.util.concurrent.atomic.AtomicBoolean;

private final AtomicBoolean _hasBeenInitialized = new AtomicBoolean(false);

public void expensiveInitialization() {
  // getAndSet returns the previous value atomically, so exactly one
  // caller sees false and goes on to do the work.
  if (_hasBeenInitialized.getAndSet(true)) {
    // Someone else has already done the initialization,
    // or is currently doing it.
    return;
  }
  // Do the initialization that I want to be done only once.
}

Much simpler than any of the other mutual exclusion patterns that I've found myself using.

The caveat here is that in the case of multiple threads, it's possible that one thread (that got to the party late) may return to the caller before the initialization is done (if another thread is currently doing the initialization). Therefore, this pattern isn't suitable where the caller has to guarantee that the initialization is done before continuing in a multi-threaded environment.
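If callers do need that guarantee, one way to get it (a hedged sketch of my own, not part of the pattern above; the class and field names are illustrative) is to pair the AtomicBoolean with a CountDownLatch, so that latecomers block until the first thread finishes:

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

public class BlockingInitializer {
  private final AtomicBoolean _initializationClaimed = new AtomicBoolean(false);
  private final CountDownLatch _initializationDone = new CountDownLatch(1);

  public void expensiveInitialization() throws InterruptedException {
    if (_initializationClaimed.getAndSet(true)) {
      // Someone else claimed the work; block until they've finished.
      _initializationDone.await();
      return;
    }
    try {
      // Do the initialization that I want to be done only once.
    } finally {
      // Release any waiters, even if the initialization threw.
      _initializationDone.countDown();
    }
  }
}

The trade-off is that late arrivals now block, which is exactly what the original pattern deliberately avoids.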

I primarily use it where there's an initialization method that many parts of the code are going to call as a defensive measure before continuing. Think of it as a simple solution to the initializeIfYouHaventBeenButDoNothingOtherwise problem.

Of course, it well could be that everybody else in the world is already doing single-initialization this way, and I'm too addled from being outside the day-to-day coding world to have caught up.

UPDATE 2010-08-25: I changed the name of the control AtomicBoolean to make it clearer that this is a multiple-execution-over-time pattern, and not a general-purpose synchronization barrier.

Wednesday, August 18, 2010

Code in an expert programming language

Provided for your pleasure, anonymously, from an Expert Programming Language. Scala fans, you're about 2 steps removed from this.

redacted: {[str]
   map: "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
   pad: #str _ss "="
   var1: 2 _vs/: map?/:str@&64>map?/:str
   var1[-1+#var1]: 6#(*|var1),6#0
   var1: ,/(-6#/:(6#0),/:var1),(pad*6)#0
   : _ci 2 _sv/: -1 8#((#var1) - pad * 8)#var1
}

Bonus prizes if:

  • You can recognize the hideous abomination of a language
  • You can figure out the common algorithm it's doing. Hint: What should redacted be named?

Tuesday, August 17, 2010

I Want A New Programming Language

Preface: This has nothing to do with the Oracle/Google Java lawsuit. Read this first.

Dear Lazyweb and Programming Language Inventors:

I want a new programming language. Although I seldom code these days for OpenGamma, I've wanted a new programming language for quite some time. I don't want an extreme language (in syntax or constraints); I don't want a purely experimental language; I don't want a faddish language. What I want is what Stephen Colebourne has dubbed a "journeyman language."

What Is A Journeyman Language

Quite simply, a journeyman language is a programming language designed for journeyman programmers. And those guys are the hundreds of thousands of men and women working on business applications and systems programming every day.

Although there is wide variation in the quality of journeyman programmers, in general very few of them are at the infamous Top 10% Of All Programmers level. But that's actually okay, and self-selecting: the rockstar programmers simply wouldn't do the jobs that the journeyman programmer has to do every day; the work is too boring and lacks enough challenge and intellectual satisfaction/achievement. I know of firms in the financial services space that explicitly don't even interview high achievers, because the employer knows the employee would hate the job and leave.

So what does a Journeyman Programming Language need? It needs to have a few general characteristics:

  • It has to be simple enough in syntax and conceptual framework that people not in the top 10% of the profession can feel comfortable working with it.
  • It has to be flexible enough to cover the amount of back-end, rich GUI, web app, and systems programming that goes on in the industry.
  • It has to make simple, common programming errors (memory allocation, array bounds, etc.) difficult or impossible.
  • It has to make it easy for Journeyman Programmers to change projects or jobs on a regular basis.

But it's a mistake to think that a completely dumbed-down language can appeal and make a lasting impact in this space. Journeyman Programmers aren't idiots or morons: they're often just as good as rockstars, just not as passionate. That means they're not investigating programming language features and Open Source libraries in their spare time; they're not going to meetups and blogging and tweeting and everything else. Raw talent often, passion seldom.

The other reason it's a mistake to dumb down the Journeyman Programming Language is that in any sufficiently large firm or project, there is often at least one rockstar programmer. And he needs to be comfortable with the tools that he uses to set the architecture and framework for other developers. Give him BASIC and he'll recoil in horror.

So here's my test: if you could write a reasonably high-performance RDBMS in the language, it has enough features. If you couldn't, it's not good enough. I like this particular test because I've done it several times, and also because there are a bunch of fiddly things to do with getting primitives in and out of large byte blocks in memory that languages like Java are particularly terrible at, for no good reason that I can see (which is why LucidDB does query compilation in Java and execution in C++). An RDBMS involves everything you need: functions in the scalar expression system; objects in the query validator, compiler, and optimizer; efficient memory work in the executor. Do all of those in a simple language and system, and you've got my vote.
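As a taste of the fiddly byte-block work that test exercises, here's a minimal Java sketch (the offsets and page size are hypothetical; a real row format would derive them from a schema):

import java.nio.ByteBuffer;

public final class PagePacking {
  private static final int PAGE_SIZE = 8192;

  public static void main(String[] args) {
    ByteBuffer page = ByteBuffer.allocate(PAGE_SIZE);
    // Pack typed column values into the page at fixed offsets.
    page.putLong(0, 42L);      // id column at byte offset 0
    page.putInt(8, 7);         // count column at offset 8
    page.putDouble(12, 3.14);  // value column at offset 12

    // ...and pull them back out again.
    long id = page.getLong(0);
    int count = page.getInt(8);
    double value = page.getDouble(12);
    System.out.println(id + " " + count + " " + value);
  }
}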

Positive Features

These are all characteristics that New Programming Language should have.

C-Style Syntax
Like many developers, I was born and raised on C lineage programming languages: C, C++, Java, C#. I've dabbled in many other programming languages (Pascal, Perl, Python, Scheme, Assembler), but nothing to me has the simplicity of syntactical expression that the C lineage of languages has. Let's keep that.
Garbage Collection
Taking memory allocation duties away from the developer has probably been the single greatest boon to getting more people to write better quality code of anything that has come out of programming language and environment development in the last 40 years.
Unambiguous Syntax
This is both a feature and an anti-feature. I want the language to be unambiguous, so that I can look at a chunk of code and, with experience with the language, know what it does. That means that any type of DSL services need to be confined to creating DSLs for other files, not intermingling DSL syntax with New Programming Language syntax.
Objects and Functions
I want an object oriented language, but one which recognizes that I will often have things that are better expressed with functional blocks. The Java approach of hanging static methods on a final class with a private constructor and import static is ridiculous and everybody knows it.
Closures
I want closures with my functions. With a nice syntax. That doesn't look like line noise. I won't want to use them for everything, but I want them to be present.
Useful Concurrency System
I want a number of low-level concurrency systems, as well as convenience operations for actor-style concurrency.
Object Immutability Services
I want some facility to mark that an object instance, or a particular call on it, will not and can not mutate the state. In other words, I want a const style system that actually works. Doesn't have to replicate the exact nature of the const keyword in C++, but just give me something that will lock down an object instance and thus allow it to be shared in a thread-safe environment without limiting me to constructor-only value injection.
Properties
I want C#-style properties. And I want a serialization system that allows me to cleanly map transport-representation objects to hierarchical data representations like Avro and FudgeMsg and JSON and XML (and will bork if I create a DAG or cyclic graph rather than a tree). Not all objects, of course, but just the ones that I'm going to use for transport. And I want the ability to do fast metadata-style operations on properties (think: for (Property prop : anyObject) { doStuff(prop.getName(), prop.getValue()); }; see the sketch after this list). The way JSON flows for free out of JavaScript is an excellent example. Encapsulation? Pshaw. Journeyman apps are about data as much as objects, and everybody bloody well knows it at this point.
Real Generics
I want Real Generics. With runtime type information. That allow partial specialization.
VM Operation
It had better run cleanly on any major OS and hardware combination. And having a VM gives me a lot of runtime management facilities for back-end processes built in (think jconsole or jvisualvm for the JVM). In particular, I'd probably be happy with the JVM if you can shoehorn much of this into it. If not, please look into making the bytecodes register-based like LLVM (even if the binary bytecode format ends up having both stack and register implementations).
Stack Allocation/Memory Packing
If I really know the lifecycle of an object, and I want to bunch several similar objects together to exploit cache line efficiency, don't get in my way. Don't make it the default, but support it.
Fast Compilation and Runtime Linking
In my mind, these are one and the same thing, because each requires and mutually enables the other. The programming language needs to be able to support compile-on-save functionality (a la Eclipse) and complete runtime linkage (i.e. classloading/modules/whatever). These are massive productivity wins for a journeyman programming language.
Convenient Native Code Integration
While the core of the language should be based on bytecodes, sometimes for performance I really want/need to be able to go down to low-level coding in C. Please make that easy (JNI, for example, is horrific). Even better, if I could mark something as "this is in New Programming Language, but if there are native versions of this method/function about, use those instead" that would solve much of the JNI-style problems.
Modules
I like OSGi. I would actually use something like OSGi if it wasn't, well, OSGi. It would have to be baked into the language and compiler stack to be useful.
Partial Classes
I personally love this for combining code generation with hand-written code; C# nailed it again. And Journeyman projects tend to have a lot of places that code generation can/does assist with, particularly given the amount of XML and RDBMS and rich GUI work that goes on in large organizations.
Traits/Shared Code Blocks
Traits are a great feature: a block of code that I can mix in without all the standard multiple inheritance issues coming in. At compile time, of course.
Random Stuff
While I'm at it, a bunch of random stuff I'd like.
  • Duck Typing
  • switch on anything. Strings, integers, enums, whatever.
  • Fixed precision decimals as a first-class type.
  • Static typing, with var lvalue determination.
  • Matching between module/type/class/whatever name and file name, to aid in automatic refactorings.
  • A good standard library, like the Python one, that gives me both least-common-denominator and platform-specific access to the OS's services.
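Returning to the Properties item above, here's a rough approximation, in today's Java, of the metadata-style iteration I'm after (a hedged sketch using JavaBeans introspection; the wished-for language would make this first-class and fast):

import java.beans.Introspector;
import java.beans.PropertyDescriptor;

public final class PropertyDump {
  public static void dump(Object anyObject) throws Exception {
    // Walk the object's readable properties and print name/value pairs.
    for (PropertyDescriptor prop
        : Introspector.getBeanInfo(anyObject.getClass()).getPropertyDescriptors()) {
      // Skip the synthetic "class" property that introspection exposes.
      if (prop.getReadMethod() != null && !"class".equals(prop.getName())) {
        System.out.println(prop.getName() + " = " + prop.getReadMethod().invoke(anyObject));
      }
    }
  }
}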

Negative/Indifferent Features

These are all features that I don't care about, or actively want not in the language.

Reuse of existing libraries
If you're targeting an existing VM like the JVM, I really don't care if I can use all the same libraries. It would be nice, but not really necessary. The only ones I'd care about, to be honest, are integration systems and libraries like JDBC drivers.
Operator Overloading/Invention
Remember what I said about understandable code? Don't make it easy/possible to create a horrific mess of a language that looks like APL. The only time you ever want operator overloading for the right reasons is for the [] characters. Gosling was right on this one.
Monads
Gack. Yes, I'm very impressed that you found a way to have side effects in your pure functional programming language. No, it's not actually that useful in practice. Imperative programming languages can easily incorporate functional language features, and should do, without incorporating all the baggage that made your Functional Languages Professor at school giggle with delight.
Obsessive Terseness
Verbosity, when applied correctly, makes unfamiliar code easy to understand by someone who wasn't the author. An obsession with achieving the tersest possible language, or smallest possible number of syntactic features, makes code harder to understand. Remember, I'm talking about journeymen programmers here.
Pointers
No, the language can't expose a bloody pointer. There are only a few valid uses for them these days, and I'd hope that they're handled through other facilities. If you really really REALLY need one, just drop down to C and thrash with the system to your heart's content.
Abuse of your compiler
No lambda calculus in the generics system, please. No template metaprogramming, no hiding everything behind private typedefs, let's keep it simple.
Faddish Language Features
XML in my programming language? Yeah, in 20 years that'll totally seem relevant. Keep it in the libraries, thank you very much. Same thing with HTML, CSS, whatever.
Checked Exceptions
File this one under Seemed Like A Good Idea At The Time.
Throw Your Mom
Very clever that C++ allows you to throw anything. Null? A string? Your mom? No, thank you. I'll limit it to proper exceptions, thank you very much.

Why I Don't Have It

So why don't I have this language yet? Well, partially because programming language craftsmanship is hard. I'm pretty sure I'm not good enough to do it, which is usually my default criterion for saying something is Really Hard.

But I also think the k3wl languages coming out are driven by the language requirements of the Top 10% crowd. They're the ones good enough to actually write the languages, and they're going to write a language that makes them happy. But then you end up with Scala, and then you end up with this monstrosity, and then you make me cry. A language in which that thing is even possible will never be a candidate as a Journeyman Programming Language.

You know who's going to do it? Someone like Gosling, who set about addressing the needs of the journeyman programmer with Java. But the state of the art has moved on, and Java just isn't suitable anymore.

Who I would really like to do it is Anders Hejlsberg. I am a very big fan of C#-the-Language. It's just that .Net-the-Ecosystem is so Microsoft-specific and horrific it'll never catch on in the wider world, no matter what Miguel de Icaza thinks.

So how's about this:

  1. IBM, please hire Anders Hejlsberg away from Microsoft. You know the Oracle/Google suit is scaring the crap out of you right now given how much you've invested in Java. It's not the suit itself, but the sign that Oracle, a major competitor to you, is going to leverage whatever muscle it can around Java against you eventually.
  2. IBM, please let Anders build this, which I'll call C-Prime, with smart people from the Java and LLVM communities, who all have a lot to add here.
  3. Open Source friendly licensing abounds, and the runtime works on a whole lot of interesting platforms. And if you want to pull a Larry and not support the Solaris port yourself, we'd all totally understand. You support Linux, Windows, and Mac, and 99% of developers are happy.
  4. Developers rejoice as they have the New Programming Language.
  5. Kirk is happy.

Oh, yeah, and Microsoft? If you had broken the near-pathological obsession with platform lock-in that surrounds all your interesting technology, you would have had a really good shot with the CLR. C# could have been a contender. But at this point, your organization is so broken internally, and your reputation with the types of journeymen who work at large organizations is so tainted, that nothing you produce will get traction. Which is why I want Anders to leave you.

A little revolution every now and then isn't a bad thing. And at this point, I think it's time. Java-the-language will never advance in a standard way going forward; the collapse of the JSR process has seen to that. We as a community that has worked on Java need to move forward, on to the next language designed for the types of people who currently code in Java.

The Oracle/Google Java/Android suit and a forthcoming blog post

For those of you who aren't aware, I don't just spit out blog posts stream-of-consciousness. I mean, it might seem that way based on my terrible writing style, but I actually work at this stuff.

Many of you will know that I'm a long-time Java developer. I've professionally done a lot of other stuff, but the vast majority of my experience has been in Java. I like the ecosystem, the toolchains, the JVM; I find it a productive environment, and OpenGamma's software is predominantly written in Java.

There's a blog post that I've been working on mentally for months, and in text form for about 2 weeks. I'm planning on publishing it tomorrow. It has nothing to do with the ORCL/GOOG suit. Nothing at all. I've felt the things in the post for ages, well before Sun imploded, well before Oracle bought them.

For various reasons, I'm not going to say what I think about the suit itself. If you want to know, read Charles Nutter's analysis of the suit. Also read Stephen Colebourne's analysis. Both of these will make you smarter. Me? I have nothing to add.

For when my post comes out, though, I wanted a vehicle to point all this out, and a simple link in case people thought I was talking about the ORCL/GOOG situation. It's messy and complicated and wrapped in ego and profit and law and policy, and Charles and Stephen put the points across better than I could.

Friday, July 30, 2010

Perpetual Motion Machine Due Diligence Documentation

I'm not a VC, nor do I play one on TV. Nor do I even pretend on the internet.

However, I've been backed by enough of them (or just had meaningful conversations which didn't lead to funding decisions) to be part of the network of people brought in to do initial due diligence on technology-heavy startups. And I'm getting increasingly annoyed by what I call the Perpetual Motion Machine Due Diligence Document.

How Unseasoned Entrepreneurs View Funders

Let's start with a simple premise: you're seeking funding for a technology-heavy startup. This doesn't apply to a company which has already launched (and thus should have technical credibility already established), and it doesn't apply to consumer-focused startups (where the technology better not be part of the pitch anyway).

You're doing something Really Hard. Often, you're doing something that's against conventional wisdom, to the point where many people in the market might not even attempt it. That's great! That's the type of technology-heavy startup that a lot of VCs might like to back.

But first you have to get past investors. These investors may have come from a technology background, but these days they work in senior management (if they're angels), or are retired, or are VCs who don't play with much outside of Outlook and their iPads. You've given them your slide pitch, and now they've asked for additional documentation on your Really Hard Technology. You know they won't understand it (it's Really Hard! It's against conventional wisdom! Paradigm Shift!), so what do you do?

Easy! You just dumb it down to the most outrageous claims that you possibly can. "Faster-than-light travel impossible? Those guys don't know what we know!" "Infinite compression impossible? We found a way to eliminate data entropy!" "Free energy! Fucking magnets, we know how they work!" The Perpetual Motion Machine Due Diligence Document thus emerges.

What Actually Happens

The naive entrepreneur assumes that his document is going to be read by the investors, and has to be strong enough in its claims to justify an investment, but simple enough to be understood by a general audience. So every one of these documents seems to follow a general pattern:

  • This commonly accepted wisdom/theorem is actually wrong.
  • We're the only ones who have figured that out.
  • Our technology thus solves all problems to all people.

Here's the problem: The Technical Due Diligence Document Is Never Read By The Investor.

Savvy investors know they're out of touch with the state of the art (if they were ever in it; most skilled VCs were operators more than raw techies). They know they don't have the background to determine whether your Really Hard Technology is actually good or doable.

So what they do is get a technical due diligence document from the founder, find someone in their network who is skilled with the state-of-the-art in that area, and send them the document for review.

And that's how the document that was only ever going to be read by an investor ends up in my email inbox.

Over-Inflated Claims Destroy Credibility

So now I'm reviewing a document which appears to be written to target a child, and is so laughable in its claims that it amounts to promising perpetual motion. What do I do?

If you think "Even if that's the case, surely the technical reviewer is going to be so blowed away that he'll seek out clarification directly from me, the entrepreneur" you fail at a technical startup. Why would I expose myself to the entrepreneur? What do I get out of that? Nothing good can possibly come of that.

  • VC/Angel passes on investing. Entrepreneur blames it on me and smears me in the industry. Whether the entrepreneur ultimately succeeds or fails makes no difference: no upside to me.
  • Entrepreneur turns into a complete time waster. I have a limited amount of time that I can devote to doing due diligence, and VCs are aware of that. If they want me to do direct face-to-face due diligence, they'll ask me specifically to do that.
  • Link between investor and me gets exposed. Many times both sides don't necessarily want that exposed to the world, particularly if it's a speculative investment possibility.

So what I do is send an email or have a 15-minute phone call explaining that the claims are completely overblown, the document has no technical detail, and there's a very low chance that their claims are actually going to stand up in production.

My word isn't the kiss of death, far from it. A good investor will collect initial opinions from several people he trusts, and determine whether to go farther. If they do, they'll ask for a smaller set of people to do in-person deep-dive due diligence, and use that to further the decision-making process.

But here's the thing: if I'm asked to be one of those people, if I had to say "the initial document is completely overblown in its claims and I doubt it's possible" in the initial review, I almost always decline to be part of the deep-dive due diligence process. Because I no longer trust anything the entrepreneur is going to say.

It's even worse if you're doing the rounds, and I get asked my opinion (by a potential customer, partner, or another investor) having seen that Perpetual Motion Machine Due Diligence Document. I instantly respond that I think it's probably snake-oil and the potential customer/partner/investor should stay away. When they ask me to do another review, again, I decline. I don't have time to do a gratis technology review for someone where I've already lost all trust in the entrepreneur and his claims.

Don't Be That Guy

It's very simple to avoid this.

  • Write your technical due diligence documents assuming several experts in your field are going to review them, and the investor is never even going to open the PDF file.
  • The more outrageous your claims, the less believable you're going to appear.
  • Don't assume that a clever engineering workaround, providing a practical solution to a theoretical problem that's good enough for most use cases, is a bad thing. It's not. It's been the foundation of numerous successful technology ventures.
  • Make me, as a reviewer, want to find out more, either out of personal interest, or as a potential customer or partner of yours.

But don't ever make me feel like I've just been handed Yet Another Perpetual Motion Due Diligence Document.

Thursday, July 29, 2010

Want to meet me in New York?

While I used to travel to New York City on a regular basis when I was working directly in the financial services industry, since OpenGamma got going I've been doing as little travel as I could get away with. What that's meant in practice is no work trips in a year and a half, and only a few holiday trips as well. Furthermore, when I used to go on business trips, almost all of my time was spent in our local offices, and I didn't really get a chance to meet and greet.

However, that's changing now. I'm going to be in New York City from the 12th through the 18th of August. Yes, I know how hot it is there. Yes, I know it's going to be miserable. Yes, like the locals, I too am hoping that the heat wave will pass. But I've got a few meetings I have to take, so going I am.

If you'd like to meet up for any of the following:

  • To learn more about OpenGamma (note: I'll even be packing some demo-ware and the source code if you're that keen to see what we're doing)
  • To geek out about databases (of the relational and non-relational type), MOM, Java, data encodings, Atlassian products, or anything else I blog about when I'm not shamelessly promoting myself or my company
  • To chat about startups, London, the tech scene here, or general expat stuff
  • To publicly berate me about one of the many controversial and offensive statements I've made on my blog or Twitter feed
  • To drink (coffee or alcohol) with someone you wouldn't usually drink with
  • To offer me a desk to work at when I'm not otherwise engaged (hint hint)

Just hit me up. Comments down below here, kirk at kirkwylie dot com if it's not about OpenGamma, kirk at opengamma dot com if it is!

Monday, July 26, 2010

Open Core, Natural Feature Divisions, and OpenGamma

I've written in the past on Open Core strategies for Open Source technology-based businesses. I've been following this debate for quite some time, and have at least a little bit more to say about it.

Open Core has gotten a bit of a bad name recently, largely due to two major recent events:

  • SugarCRM's new version and licensing going so far beyond what any other established Open Source technology-based company does that I hesitate to even group them into that category;
  • The rumor that Eucalyptus refused to merge NASA-provided patches to their Open Source licensed core, driven by a community perception that the patches would make the Open Source core more competitive with the proprietary version; this led to the creation of the OpenStack initiative.

People who want a lot more backstory on the arguments in the blogosphere should turn to the 451 CAOS Theory writeups (from Matt Aslett - Post The First, and Post The Second).

Personally, I believe that Open Core business strategies can work quite effectively where at least two conditions hold true:

  • The core version is useful to a large subset of the target audience, without any requirement to purchase any features or services to achieve its utility; and
  • There is a natural split between features/components/modules that are licensed under the Open Source core, and the proprietary extensions.

The key thing to me is that the split has to be natural. An artificial split happens when someone looks at a distinction between an Open Source part of the overall offering and a proprietary part, and can't figure out what rationale might have been used to determine which was which, except for revenue defensibility. If your user base can't look at a feature and instinctively tell you whether it belongs in the Open Source version or the proprietary version, the split is artificial.

OpenGamma, the company I founded which is building an Open Source Platform for financial analytics and risk management, has from its inception planned on an Open Core strategy for part of its business model. Based on all the controversy, I've clarified our position, and how we naturally divide up features, in a new blog post on our web site. That's the official company statement.

This post is to explain my personal beliefs, and how we divided up the world. Just to make it abundantly clear, let me spell it out: I will not reject a community-generated patch just to maintain the defensibility of our revenue model.

Comments and questions more than welcome, either on the OpenGamma specific story (on the OpenGamma post) or my personal take (on this one).

Update 2010-07-26

Just to be clear, people should be aware of the changes made above about Eucalyptus: they never actually rejected contributions, but the rumor alone created massive perception problems that flowed through into debates about Open Core, regardless of the truth of the allegations. Thanks to some very wise birds who clarified this to me back-channel, leading me to make sure that it's very, very clear here.

Friday, July 16, 2010

On the subject of Bamboo upgrade woes

Kirk,

Custom shirts make up for Bamboo upgrade woes, right?

Thanks,
Atlassian

Dear newly fellow Accel portfolio company Atlassian,

Yes.

Sincerely Yours,
Kirk

[Photo: me in my fancy new @atlassian #BestButNotYetPerfect shirt]

Fanboi for life, yo.

Friday, July 09, 2010

OpenGamma has come out of stealth mode

Ever since I announced here on my blog that OpenGamma was funded, my posting frequency has dropped off precipitously. This was in no small part because I've been spending so much time helping to build OpenGamma, but also because the constraints of being in stealth mode meant that I didn't want to accidentally disclose too much about what we were building.

Today I'm pleased to report that I can finally open up to the world, because OpenGamma has come out of stealth mode.

I'm going to let the OpenGamma blog post and web site speak for themselves, except to clarify the distinction between my various internet personas and publishing vehicles:

  • My personal blog will remain just that: I'll keep posting on Kirk's Rants about technologies I'm interested in, and startup advice and culture. Nothing will change there, except that my current relatively low rate of updates will likely continue.
  • The OpenGamma team (myself included) will be blogging about OpenGamma-related matters, and subjects that are likely to be of particular relevance to the financial technology community on the OpenGamma Blog. Some of this will be technical, some of it will be related to OpenGamma, and some of it will be specific to financial services.
  • I'll try to keep the cross-posting from the OpenGamma Blog to my blog to a minimum, and only do so if I think it's a technology-related matter that would be of interest to my readers.
  • My personal blog represents nothing other than my personal beliefs, and should not be construed to be the opinion of the OpenGamma group of companies, its board of directors, or executives.

If you've been wondering what I've been doing for the past year, and what OpenGamma is going to keep doing, I encourage you to visit our new web site.

Bamboo+Clover satisfy requests for test quality

Several times recently I've been asked how much and how well we test our product. Our board wants to make sure that we're building a quality product that won't result in first-customer black-eyes. Professional indemnity insurers want to make sure that action is unlikely to be taken as a result of a bug. Potential partners and customers want to make sure that a first-generation product isn't going to be completely ridden with bugs.

Sure, I can talk about how test-infected we are, and how seriously we take quality development. That goes some distance.

But OpenGamma doesn't have a dedicated QA department. And yet I know that the quality of our source code is exceptionally high. How do I convey this as quickly as possible?

I've found a technique that works pretty darn well.

Bamboo+Clover To The Rescue

All I really have to do is open a web browser and go to our Bamboo server, where a single screen shows the sheer number of build, test, and Clover-coverage build plans we have, and shows that they're running all the time. That gets over the "are you guys actually testing" barrier.

For those of you trying to keep track, that's only one laptop-screen-full of many. Don't try to extrapolate the number of projects/plans we have from that. :-)

But how good are those tests? That's where Clover comes in.

Pictures like this go a really long way to putting their minds at ease:


Yes, I know that just because you're executing lines of code, it doesn't mean the tests are actually checking the results and all that jazz. But as a first-cut sign of "we take testing seriously", graphs like that pretty much immediately end the conversation.

Monday, June 14, 2010

Performance of Fudge Persistence in MongoDB

Here at OpenGamma we make considerable use of MongoDB, in particular as a persistent store for data which we either don't want to spend the time normalizing to an RDBMS schema, or where we positively value the schema-free design approach taken by MongoDB.

We also make extensive use of the Fudge Messaging project for virtually all of our distributed object management needs (whether using Fudge Proto files, or manually doing the Object/Document encoding ourselves). Luckily, the two work extremely well together.

Because of the way that we defined the Fudge encoding specification, and designed the major Fudge Reference Implementation classes and interfaces, it's extremely easy to map between the worlds of Fudge, JSON, and XML (in fact, Fudge already supports streaming translations to/from XML and JSON). We've had support for converting between Fudge objects and the BasicDBObject that the MongoDB Java driver uses since Fudge 0.1, and we use it extensively in OpenGamma: anywhere you have a Fudge object, you can seamlessly persist it into a MongoDB database as a document, and load it back directly into Fudge format later on.
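As a sketch of what that looks like in practice (hedged: the method names are from my recollection of the Fudge-Java and MongoDB driver APIs of the time, and this toy converter deliberately ignores Fudge's repeated-field multi-map semantics, which the real translation code handles as lists):

import org.fudgemsg.FudgeField;
import org.fudgemsg.FudgeFieldContainer;

import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;

public final class FudgeToMongo {
  // Recursively copy a Fudge message into a MongoDB document.
  public static BasicDBObject toDBObject(FudgeFieldContainer msg) {
    BasicDBObject doc = new BasicDBObject();
    for (FudgeField field : msg.getAllFields()) {
      Object value = field.getValue();
      // Sub-messages become sub-documents.
      if (value instanceof FudgeFieldContainer) {
        value = toDBObject((FudgeFieldContainer) value);
      }
      // NOTE: a repeated field name overwrites here; the real converter
      // preserves the multi-map semantics by building a list instead.
      doc.put(field.getName(), value);
    }
    return doc;
  }

  public static void persist(DBCollection collection, FudgeFieldContainer msg) {
    collection.insert(toDBObject(msg));
  }
}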

So with that in mind, I decided to try some performance tests on some different approaches that you can take to go from a Fudge object to a MongoDB persisted document.

Benchmark Setup

The first dimension of testing is the type of document being persisted. I had two target documents:

Small Document
This document, intended to represent something like a log file entry, consists of 3 primitive field entries, as well as a single list of 5 integers.
Large Document
This document, intended to represent a larger concept more appropriate to the OpenGamma system, consists of several hundred fields in a number of sub-documents (sub-DBObject in MongoDB, sub-FudgeFieldContainer in Fudge), across a number of different types, as well as some large byte array fields.

I considered three different approaches to doing the conversion between the two types of objects:

MongoDB Native
In this case I just created BasicDBObject instances directly and avoided Fudge entirely as a baseline.
Fudge Converted
Created a Fudge message, and then converted to BasicDBObject using the built-in Fudge translation system
Fudge Wrapped
This one wasn't built in to Fudge yet (and won't be until I can clean it up and test it properly). I kept a Fudge data structure, and just wrapped it in an implementation of the DBObject interface, which delegated all calls to the appropriate call on FudgeFieldContainer.

Additional parameters of interest:

  • Remote MongoDB server: installed from Yum (mongo-stable-server-20100512-mongodb_1.fc11.x86_64 RPM) on Fedora 11, running on a VM with reasonably fast underlying disk.
  • Local MongoDB server was 1.4.3 x86_64 running on Fedora 13 on a Core i7 with 8GB of RAM and all storage on an Intel SSD
  • MongoDB Java Driver 1.4 (pulled from Github)
  • JVM was Sun JDK 1.6.0_20 on Fedora 13 x86_64

Benchmark Results

Test Description                                            MongoDB Native  Fudge Converted  Fudge Wrapped
Creation of 1,000,000 Small MongoDB DBObjects                        539ms          1,603ms          839ms
Persistence of 1,000,000 Small MongoDB DBObjects                  41,188ms         46,201ms       92,866ms
Creation of 100,000 Large MongoDB DBObjects                       15,351ms         23,956ms       15,785ms
Persistence of 100,000 Large MongoDB DBObjects (remote DB)        57,207ms         60,511ms       56,236ms
Persistence of 100,000 Large MongoDB DBObjects (local DB)         66,557ms         74,763ms       58,816ms

Results Explanation

The first thing to point out is that for the small DBObject case, the particular way in which MongoDB encodes data for transmission on the wire matters a lot. In particular, there's one decision that the driver has made that changes everything: it does a whole lot of random lookups.

A BasicDBObject extends from a LinkedHashMap, and so doing object.get(fieldName) is a very fast operation. However, because Fudge is a multi-map, we don't actually do that in Fudge, and by default we store fields as a list of fields (JSON stores lists as a, well, list; Fudge stores them as repeated values with the same field name). Because this makes point lookups slow, we intentionally do whole-message operations as often as we can, and just iterate over all the fields in the message.

The MongoDB driver code does the same thing, but instead of doing a for(Entry entry : entrySet()) style of operation, it iterates over the keys and does a separate get operation for each key. In Fudge, this is potentially a linear search through the whole message.
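The difference between the two iteration styles is easiest to see side by side (a schematic sketch; encode() is a hypothetical stand-in for the driver's wire encoder):

import java.util.Map;

public final class IterationStyles {
  // Hypothetical stand-in for the driver's wire encoder.
  private static void encode(String name, Object value) { /* ... */ }

  // One pass over entries: fine even when point lookups are slow.
  static void entryIteration(Map<String, Object> doc) {
    for (Map.Entry<String, Object> entry : doc.entrySet()) {
      encode(entry.getKey(), entry.getValue());
    }
  }

  // Iterate keys, then get() each one. If get() is a linear scan, as it
  // is for a Fudge message held as a list of fields, this goes O(n^2).
  static void keyThenGet(Map<String, Object> doc) {
    for (String key : doc.keySet()) {
      encode(key, doc.get(key));
    }
  }
}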

To work around this, in my wrapper object I built up a map where there was only a single value per field. This works well, but in the small document case one in six of the fields is a list, which makes the test thrash in CPU doing the document conversion (and explains why the small document persistence test is more than twice as slow with the wrapper as with just rebuilding the objects). Yes, I could take this optimization further, but it would be difficult to improve on the combined setup (document construction) and runtime (persistence) performance of just building up a BasicDBObject, which is what the Fudge conversion does anyway.

The wrapped Fudge object wins in every case for the large document test, no matter how many times I run them (and I've done it quite a few times for both local and remote, with all outliers eliminated). Moreover, I actually get faster performance running on a remote DB than on a local DB (which surprised me quite a bit).

The only things that I can conclude from this are:

  • FudgeMsg limits the data size on insertion into the message (when you do a msg.add() operation, not on serialization) for small integral values (if you put in a long but it's actually the number 6, Fudge will convert that to a byte). However, the ByteEncoder which converts values in MongoDB to the wire representation will never do this optimization, and will actually upscale small values to at least a 32-bit boundary. This means that if you put data into a FudgeMsg first and then put it into the MongoDB wire encoding, you shrink the size of the message. Given the number of pseudo-random short, int and long values in this message, it's a clean win.
  • The object churn for the non-wrapped form (where we construct instances of BasicDBObject from a FudgeFieldContainer) causes CPU effects that the wrapped form doesn't suffer from.

Conclusion

One of the things that was really pleasant for me in running this test is just how nice it is to take a document model that's designed for efficient binary encoding (Fudge), and persist it extremely quickly into a database that's designed for web-style data (MongoDB). The sum total of the actual persistence code is all of about 10 lines; I spent far more lines of code building the messages/documents themselves.

The wrapped object form definitely wins in a number of cases. My current code isn't production-quality by any means, but I think it's a useful thing to add to the Fudge arsenal. That being said, I think the real win is to rethink the way in which we get data into MongoDB in the first place.

Given the way the MongoDB Java driver iterates over fields, it seems to me that a far better solution is to cut out the DBObject system entirely, and write a Fudge persister that speaks the native MongoDB wire protocol directly, taking advantage of the direct streaming capabilities of the Fudge protocol. When we've done that, we should be going just about as fast and efficiently as we can, and Fudge will have a seamless combination of rich code-level tools, an efficient wire-level protocol for binary serialization, great codecs for working with text encodings like JSON and XML, and a fantastic document/message database facility using MongoDB.

Sunday, May 23, 2010

Don't Host Crowd and Jira in the same Servlet Container

This took up quite a bit of my Saturday figuring out, so I figured I'd add some pointers for other people to find.

Atlassian doesn't recommend that you host their applications in the same Tomcat instance, instead encouraging you to deploy them in separate Tomcat instances and JVMs through the "Standalone" distributions. However, recommendations never stopped me before, so at OpenGamma we run the following Atlassian infrastructure:

  • One set of Confluence, Crowd and Jira for the FudgeMsg project running in one Tomcat container on one VM
  • One set of Crowd and Jira for our corporate use running in one Tomcat container on one VM
  • Bamboo and FishEye for OpenGamma corporate use running behind our firewall in their own standalone implementations in their own VMs.

Yesterday I tried to upgrade Crowd and Jira for the OpenGamma corporate installation. It wasn't pretty.

First of all, I ran into this KnowledgeBase issue, where Confluence, Crowd, and Jira all ship with different versions of the Felix jar for plugin management. This stopped the plugin system from starting for Jira (since Tomcat launches the Crowd application first), so Jira was pretty borked.

Then I ran into something far more pernicious, which other people should be aware of.

Reindexing Requires Crowd

If you're running your applications with Crowd, the application delegates user-related information to Crowd and uses a RESTful approach to loading the data. So far, so good.

However, when Jira attempts an in-situ upgrade of an installation (which, again, they don't recommend), it does its database wrangling, and then reindexes the system with the new functionality (in our case, going from 4.0 to 4.1.1 to allow searches based on votes and watches on issues). All of this upgrading happens in the application startup logic, which runs in the Tomcat main thread.

When it gets to reindexing, it then attempts about 10 different RESTful calls to your Crowd instance, as it doesn't have any caches populated with user data. However, while Tomcat has opened up port 80 (or 8080 or whatever) for Crowd, it hasn't enabled application dispatch to the Crowd servlets yet.

This means that all 10 remote calls (which are actually local to the single Tomcat instance since both apps are co-hosted) hang, and the entire server startup process fails.

The only workaround is to launch Crowd in its own servlet container, change your Jira's crowd.properties to refer to the new one, startup, and then undo what you've done.

The Moral Of The Story

When your software vendor recommends that you don't do something, don't do it unless you have an exceptional reason to do so.

If you're a software vendor and you support something but don't recommend it, still test it as there will be customers who ignore your recommendations and do it anyway.

Monday, May 10, 2010

The Difference Engine: Give Up And Move To London

So I came across an article about The Difference Engine, which is attempting to be the YCombinator of Europe. I'm going to have to completely and utterly disagree with Mike Butcher here when I give some pretty pointed advice: they should give up right now, this minute, this very round of startups: escape Middlesbrough and move to London.

Editorial Note: If you're offended by some soft Southerner capping on The North, or, worst of all, an American bashing everywhere in the UK that Isn't London, just skip ahead to the comments and start bashing. You won't like the rest of this article, and you'll just start flaming the comments anyway, so save yourself some time.

Mike, you're 100% wrong that Europe is somehow Exceptional when it comes to siting of startups.

First of all, let's deal with the whole "YCombinator Of Europe" thing. YCombinator started out in Cambridge, Massachusetts, which already had a pretty large startup tech cluster. Then they started doing the program half-and-half with Silicon Valley. And then they finally gave up and moved the whole thing to Silicon Valley. They already had experience with several rounds of startups, and several rounds of exits. They saw that the #2 tech cluster in the US, and possibly the world, simply didn't have a big enough ecosystem. Yet somehow Middlesbrough does?

Second, let's deal with the whole "Don't Have To Be In London" thing. You're right, there are a number of European startups that are coming from NotLondon. Mike references Dopplr (Helsinki, but then also has a big London-based office) and Spotify (Stockholm, though how much of a scrappy startup they can be with the most opaque ownership and funding sources in the world is debatable). I'd add a number of Baltic-based startups to that list (like Erply). None of them started out in London, that's very true. But they all did start out in their country's gravity well for top talent.

One thing that strikes me as an American about Europe is that the vast majority of countries have one super-dominant central city that acts as a major pull for talent throughout the country, and which is usually the capital city: London (the UK); Dublin (Ireland); Paris (France); Copenhagen (Denmark); Prague (the Czech Republic); Stockholm (Sweden); Helsinki (Finland); Amsterdam (the Netherlands, though this almost deserves an asterisk, since the whole Utrecht-Rotterdam-Amsterdam cluster is one massive metropolitan area). Probably the place with the least concentration is Germany, with its multiple nearly co-equal metropolitan areas.

All of these places are the dominant suck of talent from their respective nations, and produce the types of network effects that you need for a wide variety of knowledge-based industrial sectors. To me, it's no wonder that you're getting non-London-based startups out of these cities, rather than the also-ran cities in each of those countries (how many startups are coming from Marseilles, Brno, Aarhus, Den Haag, Valencia? Probably not more than a few). You need to be where the best pool of talent is.

So let's focus on what you need for a successful startup:

  • A Pool of Skilled Talent: While The Difference Engine may be able to get these people on the ground in Middlesbrough, what happens when the team needs to go from 2 to 3? What happens when it needs to go from 3 to 10? Where are they going to get the talent? They'll have to leave Middlesbrough, that's what.
  • Sources of Funding: This is a longer-term thing for an incubator, but you need to have the sources involved relatively early and often to make it a success; YCombinator flew the Cambridge guys to Silicon Valley just to give them experience with the funding scene, and Cambridge already has quite a few big name VCs. Middlesbrough? How many European VCs have ever been there?
  • Entrepreneurial Culture and Mentors: You've got the guys directly involved with The Difference Engine, but what about other people who have run the whole cycle a few times and can act as passive or active mentors?

I contend that none of this is present in Middlesbrough, and quite simply never will be. All of them are present in London today. The single best thing that The Difference Engine could do in order to help its startup founders is to move to London next week.

And here's the best part: it's not going to stay in Middlesbrough. That's right, it's moving to Sunderland. Which means it'll end up with the types of back-and-forth that mean neither place ever sets up any type of cluster effects. Smacks to me of the type of thinking that's led to the European Parliament splitting its time between Brussels and Strasbourg.

Here's what this is, plain and simple: Yet Another Regional Development Agency Thinking One Industrial Park Will Rebuild Its Economy. How many cities and regions say "well, if we just set up an industrial park, we'll all of a sudden be the Silicon Valley Of X"? How many cities, regions, and countries say "well, if we just set up a small venture capital fund, we'll build a new tech cluster"? I've seen it all before. And it turns out that the top three sponsors of The Difference Engine are all councils and a regional development agency.

Tech clusters happen organically. They can't be willed into existence by any government or development agency.

Look, I give Middlesbrough and Sunderland councils props for trying to convert their cities from the grim Northern spots they're known as. But they're being, I think, disingenuous to their founding participants.

If you care about the actual founders, rather than just about wasting development funds, get your founders out of the North and into London immediately.

If you're a founder considering The Difference Engine, my personal recommendation is to ignore it. Move to London, get yourself a desk in a shared office in the Shoreditch or Bankside area, and go to town. If you're the type of person ready to do a startup, you'll probably learn more soaking up the vibe there than you would sitting in the Grim North for a few months.

Sunday, April 18, 2010

Twitter For Messaging: Encoding Binary In Unicode

This week at Chirp, Twitter announced Annotations, which is the Twitter-specific way of saying "you can assign arbitrary metadata to individual twitter updates." After a very quick Twitter (how self-referential) conversation with Alexis from Rabbit Technologies, we agreed that with this, Twitter is essentially trying to build a much more general-purpose pub-sub messaging technology. I wanted to talk about that.

History Repeats: Twitter Is MOM (Message-Oriented Middleware)

The use of annotations is extremely familiar to anybody with a background in traditional messaging technologies. In general, publishing a message in a traditional environment requires:
  • A destination. In traditional pub-sub, this is a topic name; in Twitter parlance it's the publisher's twitter handle.
  • Message metadata. In traditional pub-sub, this is a set of properties (typesafe name/value pairs attached to a message); in Twitter parlance it's your annotations (full definition still forthcoming).
  • Message content. In traditional pub-sub, this is in general a byte array (though specs like JMS allow for code-level specifications that ultimately resolve to a byte array); in Twitter parlance this is your 140-character tweet.
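
If it helps to see that equivalence as a data structure, here's a minimal sketch in Java (the class and field names are mine, not Twitter's or any MOM vendor's API):

    import java.util.Map;

    // A sketch of the mapping above: topic -> handle,
    // properties -> annotations, byte-array body -> tweet text.
    public record TweetMessage(
            String destination,              // publisher's twitter handle (the "topic")
            Map<String, String> annotations, // message metadata (the "properties")
            String body                      // message content (the 140-character tweet)
    ) {}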

Now it looks like we've got some pretty good equivalencies; every major headline element is covered by Twitter. So how do you adapt to an All Twitter world?

I would say the starting point is the message content. I live in a world of machine-to-machine communication (people are messy). Byte arrays don't match up with character data.

Or do they?

An Historical Diversion: BinHex64

Let's consider a problem that impacted technology professionals back before most Ruby programmers were alive: how do you transmit binary data over the internet?

You had two options:

  • Use a binary protocol, written from scratch or using something like RPC. This worked, but required endpoints that understood the protocol.
  • Transmit data over a text protocol, like email or usenet. This allowed for the greatest amount of interim-stage compatibility, but had serious interoperability issues.

The primary interoperability problem with transmitting binary data over text protocols was a pretty simple one: most Internet protocols from the Dawn of Time were written by ignorant Americans and thus only supported 7-bit ASCII. Binary data inherently is 8-bit: you're trying to transmit a byte array, and each byte has 8 bits. How do you fit a square peg (8-bit binary) into a round hole (7-bit ASCII)?

The solution is an encoding like BinHex (or its modern descendant, Base64). The basic idea is that you represent a high-fidelity source dataset (8-bit binary) in a low-fidelity target encoding (7-bit ASCII) by inflating the size of the encoded message to fit the target encoding.

This seems stupid and archaic, but it's still the way binary attachments go into email: MIME uses Base64 for exactly this.
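
To make the size tradeoff concrete, here's a minimal Java sketch using Base64 (the class name is mine):

    import java.util.Arrays;
    import java.util.Base64;

    // Base64 squeezes 8-bit bytes through a 7-bit-safe alphabet by
    // inflating the payload: every 3 input bytes become 4 ASCII characters.
    public class Base64Demo {
        public static void main(String[] args) {
            byte[] raw = { (byte) 0xDE, (byte) 0xAD, (byte) 0xBE, (byte) 0xEF };

            String encoded = Base64.getEncoder().encodeToString(raw);
            System.out.println(encoded); // "3q2+7w==" -- 4 bytes inflate to 8 characters

            byte[] roundTrip = Base64.getDecoder().decode(encoded);
            System.out.println(Arrays.equals(raw, roundTrip)); // true
        }
    }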

Enter The Reverse BinHex

At first glance it might not seem that way, but Twitter represents the world in a reverse form to BinHex encoding. With BinHex encoding you're trying to fit 8-bit bytes into a 7-bit world; with Twitter tweets you're trying to fit 8-bit bytes into a "character" world. The only thing that's germane is "what is a character?"

Twitter is quite clear: A Twitter character is a Unicode code point. If it weren't so, Twitter wouldn't be able to handle localized tweets as well as it does.

Right, now we're cooking with gas.

A Unicode code point is, in the broadest brushstrokes possible, drawn from one of two distinct sets of planes:

  • The Basic Multilingual Plane, consisting of code points in the numerical region from 0x0000-0xFFFF.
  • The Astral Planes (the supplementary planes), currently allowing code points in the numerical region from 0x10000-0x10FFFF.

According to their character count page, Twitter uses UTF-8 in its internal representation. Let's consider two distinct possibilities:

  • Twitter only handles the Basic Multilingual Plane. If that's the case, in general, one Twitter character carries 16 bits, and so can handle 2 8-bit bytes.
  • Twitter handles the full theoretic range of Unicode including the Astral Planes. If that's the case, one Twitter character carries 20 bits, and so can handle 2 and a half 8-bit bytes (ignoring the Supplementary Private Use Area-B plane, which conveniently rounds the range down to exactly 2^20 code points).

If we ignore all complicating factors, what can we thus store in a Twitter 140-character tweet if we're trying to encode machine-readable byte arrays?

  • 280 bytes if Twitter only supports the Basic Multilingual Plane; or
  • 350 bytes if Twitter supports the vast majority of the Astral Planes.

These don't seem like a lot, but if you allocate a few bytes (Twitter Encoded of course) for message sequence number and chaining, and you use a compact binary representation like Avro or FudgeMsg, you can get a lot of data into 280/350 bytes.

So if you use this theoretical reverse-BinHex encoding system to expand byte arrays into Twitter messages (after full Annotation support is released), you can get arbitrary metadata for routing decisions, plus a 280/350-byte binary payload. Clearly enough for a lot of uses.
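
To make the reverse-BinHex idea concrete, here's a toy Java codec (entirely my own sketch, not anything Twitter supports). It packs two payload bytes into each Unicode code point, dodging the UTF-16 surrogate range by remapping colliding values up into an astral plane; a real implementation would also have to avoid noncharacters like 0xFFFE/0xFFFF and survive whatever normalization Twitter applies to tweet text:

    // Toy "reverse BinHex": two bytes per Unicode code point.
    public final class TweetCodec {

        // 0xD800-0xDFFF are UTF-16 surrogates and can't stand alone as
        // code points, so we shift colliding values up into plane 1.
        private static final int SHIFT = 0x10000;

        public static String encode(byte[] data) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < data.length; i += 2) {
                int hi = data[i] & 0xFF;
                int lo = (i + 1 < data.length) ? (data[i + 1] & 0xFF) : 0; // pad odd lengths
                int cp = (hi << 8) | lo;
                if (cp >= 0xD800 && cp <= 0xDFFF) {
                    cp += SHIFT; // dodge the surrogate range
                }
                sb.appendCodePoint(cp);
            }
            return sb.toString();
        }

        public static byte[] decode(String s, int byteLength) {
            byte[] out = new byte[byteLength];
            int i = 0;
            for (int offset = 0; offset < s.length() && i < byteLength; ) {
                int cp = s.codePointAt(offset);
                offset += Character.charCount(cp);
                if (cp >= SHIFT) {
                    cp -= SHIFT; // undo the surrogate dodge
                }
                out[i++] = (byte) (cp >> 8);
                if (i < byteLength) {
                    out[i++] = (byte) (cp & 0xFF);
                }
            }
            return out;
        }
    }

Under the post's assumption that one Twitter character is one code point, a 280-byte payload round-trips through exactly 140 characters.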

Twitter As The New Machine-to-Machine Cloud Service

Don't be daft. This is entirely a thought experiment about how you could encode Real Data into a Tweet. If you attempt to hook multiple machine processes up through Twitter as a data distribution mechanism you are a moron.

If you're interested in that type of functionality, you should talk to Rabbit or another Cloud Messaging Provider (I'm sure there will be competition forthcoming). Cloud Messaging makes sense; using Twitter as your Cloud Messaging Provider is completely stupid.

Seriously. There are certainly use cases where you can see machines, people, and other machines communicating over Twitter. But if you're going to the point of converting binary data into arbitrary Unicode codepoints for transmission over Twitter, you completely fail at asynchronous communication, and should be required to spend at least 6 months doing nothing but implementing sections from EIP as penance.

Tuesday, March 23, 2010

Looking For a WordPress Theme Developer

In case you hadn't noticed, the OpenGamma web site doesn't really have a lot of content on it. We intend to change that.

We've got content broadly ready to go.

We've got a designer working on HTML templates for the content.

We're looking for a freelance developer to convert everything to a WordPress theme for a (mostly static) site that will definitely include a blog.

Nope, you don't need to be in London. You could be virtually anywhere in the world. Send an email to jobs or info or even kirk, all at opengamma.com.

Sunday, March 21, 2010

My MP's View On The Digital Economy Bill

Based on the tools provided by 38 Degrees, I contacted my MP to urge proper debate on the Digital Economy Bill before the current parliament here in the UK.

Apparently I wasn't the only one, as his office had a form response prepared and ready to go. Here's what he sent.

Full Text:

Thank you for contacting me about the Digital Economy Bill.

For nearly twelve years, the Government has neglected this crucial area of our economy. We believe a huge amount needs to be done to give the UK a modern regulatory environment for the digital and creative industries. Whilst we welcome aspects of the Bill, there are other areas of great concern to us.

We want to make sure that Britain has the most favourable intellectual framework in the world for innovators, digital content creators and high tech businesses. We recognise the need to tackle digital piracy and make it possible for people to buy and sell digital intellectual property online. However, it is vital that any anti-piracy measures promote new business models rather than holding innovation back. This must not be about propping up existing business models but creating an environment that allows new ones to develop. That is why we were opposed to the original Clause 17 and are still opposed to Clause 29, which props up ITV regional news with License-Fee-payer's money.

The Government's failure to introduce the bill until the eleventh hour of this Parliament has given rise to considerable concern that we no longer have the time to scrutinise many controversial measures it contains. We believe they should be debated in the House of Commons, and only if we are confident that they have been given the scrutiny that they deserve will we support them. My colleagues in the Shadow Culture, Media and Sport and Shadow Business, Innovation and Skills teams will do everything in their power to work towards legislation that strengthens our digital sector and provides the security that our businesses and consumers need.

Once again, thank you for taking the time to write to me.

Thursday, March 18, 2010

OpenGamma is Looking for a Browser-Based Software Engineer

We've already put this up on our jobs page, but I wanted to highlight the job posting to my readers.

OpenGamma is now looking for someone to build up our browser-based software engineering efforts. We've got a super-strong set of server-side software engineers who are well versed in building the back-ends of applications (and delivering data to front-ends in browser-friendly ways). We've got people who are very familiar with extending well-defined applications to support new functionality. What we don't have is someone who lives and breathes the browser.

That's where you'd come in.

We want someone who can come in, and make sure that we present the platform that we're building in the best way possible to end users, delivered through browser-based mechanisms. You'd get a clean slate to work with, and the chance to work against what we know will be the limits of what browsers can do. And at no point ever will we ask you to support IE 6.

While we're building financial technology, we don't think you need to know a single thing about the financial services industry to take on this role; in fact, we think our ideal candidate isn't coming from finance at all (judging by the quality of web applications we've all seen in finance). Anything you need to know you'll pick up on pretty darn quickly.

We'd prefer someone local to London (our new offices are in the Bankside area, with views of the Tate Modern), but if you're based outside the M25 we can accommodate telecommuting: we don't need you in the office every single day.

We're a well-funded startup, we're building technology that has the potential to disrupt an entire industry, we have exceptional people to work with, we have a no-bullshit stance on bureaucracy, and we all have an equity stake in the firm.

Take a look at the more comprehensive spec on our web site, and if you think you or someone you know would be perfect, contact us (jobs at opengamma dot com).

Recruitment Agencies: We are not open to unsolicited profiles or CVs for this role without an existing MSA signed by OpenGamma.

Monday, March 15, 2010

QCon London 2010

The second half of last week the entire OpenGamma team (with the exception of our Head Quant) attended QCon London 2010. Last year I was only able to attend one day (the one in which I was presenting), and it was great to see the whole conference rather than just one part of it.

QCon to me is a great experience, because it's a technology conference designed for technologists who already get it rather than as an excuse to spend their company's training budget on a jolly to get out of the office for a few days. As such, the needs and desires of the attendees factor way more into the makeup of the conference than the needs of the corporate shills who provide much of the content for what passes for a "conference" these days. You get to see trends going on in near-real-time by attending (both through the presentations and the comments and questions), making it nearly invaluable to any type of software professional.

General Comments on QCon 2010

Here are some generic comments on the conference this year.

What Happened To The Finance Track?

I've been quite vocal about this, and I'll be even more so: where in the hell did the Financial IT track go? Looking around the QCon London crowd, at least 25% of the attendees worked in finance. Every single presentation with even a hint of financial content was Standing Room Only. And yet the organizers nixed the Finance track this year. What were they smoking?

Seriously, bring it back. It reflects the target crowd for the target city in a way that no other track does, and is one of the things that used to make QCon London special and location specific. And, apparently, in 2009 it was the only track not to receive a single red card for any presentation. So it's not like the audience didn't appreciate it.

Speakers: Know Your Audience

There were three types of presentations that I didn't feel went over particularly well:
  • Preaching to the Converted: The first keynote was the definition of trying to preach to the converted, and there were other presentations that had the same problem. You want to talk about how great agile development methodologies are? You want to talk about how great it is to continuously test? You want to talk about how we should be concerned with performance? It's over. QCon audiences already know that and already believe. And unless you're in a particular track, rah-rah talks don't go over well. We already get that stuff, so don't try to "sell us" stuff we've already bought.
  • Over-Specific Talks: A QCon audience is very, very diverse. In the same crowd you might have people from games, online betting, finance, and consumer internet. We might develop in Java, C#, C++, Flex, JavaScript, Ruby, and Python. Pitching a general abstract and then not showing any details outside of one domain won't work. If you're going to give a specialist presentation, just say so; billing yourself as general and then talking exclusively about one technology satisfies nobody.
  • Targeting General Developers: There was at least one presentation I went to that was quite obviously a canned talk written for generic in-house developers who don't work with advanced technologies as a regular part of their job. While talking to a crowd of Visual Basic programmers might require extensive background in modern software techniques, the QCon crowd doesn't. You just lose the crowd in background material that the audience doesn't need. We want meat on the bones.

Big Trends

All that being said, I think that there were several trends that really bubbled up to the foreground this year that we'll all be hearing about over the next year.

CAP Theorem

Eric Brewer's CAP Theorem states that for any distributed system, you face a constant tradeoff between Consistency, Availability, and Partition tolerance: you can have at most two of the three.

It's a familiar scenario for developers who know the mantra of Features, Quality, Deadline, Resources (pick two). Software and systems engineering is a game of tradeoffs between desires and resources, and some inputs to the system have to stay variable. The CAP theorem simply applies that general notion to the particular problems of distributed systems (which these days means all systems, as the days of 2-tier or 1-tier systems are long gone).

So why did I pick this as a Big Trend of QCon 2010? Because it came up over and over again:

  • All the NoSQL databases are designed around the requirements of the CAP Theorem and how it plays out in web systems;
  • Any distributed system designed for scale and availability has to be aware of the particulars of the CAP Theorem;
  • Performance and availability testing has to be aware of how the particular system has been optimized for the constraints of the CAP Theorem.

In essence, this one theorem (originally put forward in 2000) ended up appearing in at least 6 presentations I went to. That's really big.

Imagine that for the first eight years after Binary Search was invented nobody really used it or talked about it. Then the next year all of a sudden you started to see a whole bunch of innovation around searching ordered collections of stuff. Then the next year all anybody could talk about was how binary searching of stuff was going to change the industry. It's a tipping point thing, and we've hit it.

So my #1 trend from QCon London 2010 is the widespread knowledge and comprehension of Brewer's CAP Theorem driving systems architecture and development. Whether it's the adoption of NoSQL technologies, or just better development of distributed systems, people now get it and there's no turning back.

Performance Testing

The need for any modern system to consider performance as a first-level requirement is pretty well understood by a QCon-style audience. But the presentations I attended had quite a few anecdotes that show how seriously this need is being addressed by technology-forward teams:
  • An online games company that spent a full man-year up-front on developing a flexible performance testing framework before they were anywhere near feature complete.
  • A trading system that was optimizing cache hits in their software written in Java before they launched.
  • A web site spending a third of their research budget making sure their underlying technology stack could support the new functionality they needed to deliver.

What became incredibly clear from a number of presentations is that the era of "write it and performance test if performance isn't good enough" is over.

Development teams are considering the performance of the system from day one, and are planning for it. Whether they're spending engineering effort on a custom scripting system that lets them model user behavior and refine their scripts based on actual user interactions, or building elaborate tools to capture actual system performance at runtime regardless of the user interaction, teams are treating the performance of their system as a critical requirement that has to have tool support from the start.

So my #2 trend from QCon London 2010 is that performance management of distributed systems has to be a pre-launch requirement. Personally, I love the capturing and processing of actual system behavior as the system naturally runs. But no matter which approach you choose, you have to bake it into the system before it goes live.

Note that this statement doesn't contradict my point above about how we all already care about performance. The point here isn't that we haven't cared about performance before, but that we weren't always structuring performance management and optimization into the cores of our systems. The trend to me is to start thinking about performance just as seriously and up-front as we do about storage and templating and everything else that goes into our distributed systems.

Conclusion

Will I go back in 2011? Probably not unless I'm presenting, and then probably just for the day. This isn't to say that I didn't find the conference useful, because I did. It's far more to say that my personal situation as a startup CEO doesn't allow me to take three days out of the running of the company.

Will I send the rest of the OpenGamma team to QCon 2011? Hells yes, particularly if they reinstate the Financial IT track. The crowds, the conversations in the networking breaks, the presentations; it's a brilliant conference. It's pricey, but it's worth every penny.

Footnote

Did I mention that Eric Brewer is a professor at the University of California at Berkeley? No? Oh. Go Bears!

Sunday, March 14, 2010

London Startup Office Financials

Astute readers will know that I'm the CEO of OpenGamma, a (semi-)stealth mode financial technology firm funded by Accel Partners. Longer-term readers will know that I'm a huge backer of London as a startup hub for all of Europe. A big part of navigating the London working environment is handling our rather peculiar property market; I think our recent experience will be useful for someone else out there.

OpenGamma: Pre-Closing

We got our term sheet in the last half of July. Now while some people will argue that VCs take all of August off, our team didn't, and we knew that we had to work aggressively to make for a fast, clean closing.

The OpenGamma founders are a motley crew:

  • Elaine, our chief quant, had been intentionally unemployed while sitting out a quite enforceable non-compete.
  • Jim, our head of software engineering, arranged his end-of-contract to be the end of July, with lots of solicitor advice to ensure that there wasn't a binding and enforceable non-compete.
  • I was just ending a contract at a major international Investment Bank (astute readers will have heard of this as Big Bank B).

I argued that we needed a space where we could all be together, just to make sure that all the various legal moves in the incorporation/Series A dance were pulled off okay. Although Jim and Elaine weren't sure, as CEO, I won.

Because up until then we were still operating out-of-pocket, we went with the cheapest serviced office that we could get away with. That basically meant one of the myriad serviced offices along Borough High Street, and came down to 400 pounds per desk per month. That, to me, is the lowest you can possibly go for a private space in Zone 1 London.

In Which We Run Out Of Space

In the same serviced office block, we were able to expand to an adjoining room, giving us room for 8 people (with one space allocated for servers, scanners, and my Administrative Paperwork Overflow). That seemed fine, until we got to 6 people and had an offer accepted by Stephen Colebourne. Now we had a problem.

Up until then we had been paying for 7 potential workers (building management threw in the 8th for free), at 400/desk/month. But the problem was that the serviced office building that we moved into was now full. When we took the initial space it was pretty much empty, but now we had no ability to expand at all. Worst of all, once it filled up, the space that had seemed quite fine became completely unworkable.

It was clearly time to move.

Offices, Glorious Offices

We started to look at Proper Offices. We talked with two different tenants' agents, Devono and Carter Jonas, and chose Carter Jonas as our agents. And we started to look properly.

Things looked very promising early on. We found an amazing property right on Bermondsey Street that we liked, but couldn't really afford. We found other properties which were downright miserable compared to the best one, but were in our price range.

And then on a lark we saw a property that had just come on the market earlier that day. Perfect in every way: high ceilings; Victorian warehouse conversion; geek-friendly museum on the ground floor; great location; cheap rent. This was the place. 2,055 square feet (subject to survey, of course).

We set our agents on negotiating the best rent possible, and once we agreed on all that, we set our lawyers on the property, started talking to build-out teams, and mentally prepared ourselves for the move.

For those of you familiar with the UK property market, we negotiated a 5-year lease with a 2-year break clause. The standard in London is a 5-year lease with a 3-year break clause, but if OpenGamma is doing well, we'll run out of space after 2 years, so the optionality is worth the slightly worse terms to be able to break early if we're a break-out success. For those of you counting, this cost us about 2 months of rent-free period at the outset of the lease. We also negotiated a 6-month deposit based on bank guarantee, where the industry norm is 12-month cash to the landlord.

The structure of the deal for the location that it's in is:

  • A discount on the asked rent over the 5-year duration of the lease;
  • 3 months rent-free at the onset of the lease;
  • 4 additional months rent-free if we choose not to exercise our break option.
If you're looking at Mayfair or the City today, you'd be looking at 5-7 months rent-free at the outset of a 5/3 lease, and 5-7 months after the break option. The South Bank area is a slightly different market, so we had to accept smaller rent-free periods.

A Tangent on Suitability of Space

Whenever I see Valley or New York firms that have successfully moved into decent space relatively quickly, I'm quite jealous, because their property markets don't work the same way as London's does.

In the States, a general rule of thumb is:

  • You move into the office as-is
  • If the office isn't in move-in condition, you negotiate what are known as Tenant Improvements to bring it to move-in condition, the precise nature of which is a negotiation between tenant and landlord
  • When you move out, you move out of the space in as-is condition (meaning that you just leave all the "structural" stuff in place, but things like cubicles and desks you remove).

What this means in general is that there are a lot of spaces that are pretty much move-in condition if you're not particularly fussy about particulars: they have cabling running to a comms room in the major areas; they have a few offices or meeting rooms; they have connections to the outside world ready to light up.

London is a completely different market. Landlords force a completely vacant turnover; anything you've done since you took possession you have to undo. That means that if you're looking at a 2,000 square foot space, and want to build out two conference rooms and a comms room, you have to remove all those walls when you move out. You have to remove all cabling, all lighting, all electricals, all walls and doors, everything. Whether the landlord would find it easier to rent the space with all that intact is irrelevant: you have to remove the lot of it.

This means that when you're looking for office space you can't actually find, at all, any "move-in-condition" space unless the previous tenant went bankrupt in the middle of tenancy. And given the different treatment of bankruptcy here, that's actually quite uncommon.

It sucks.

In Which We Get Quotations

We used Carter Jonas' expertise and asked two teams to do a full build-out proposal.

The first company, which seemed to really "get" what we were going for, came back first with a quotation for building out our 2,055 square foot converted Victorian warehouse space.

110,000 pounds. Just for the build-out; furniture was another 35,000 pounds.

Our hearts sank. This idea, which initially seemed reasonable, had turned into a complete flight of fantasy. There's no way I could justify, as a CEO, putting 110,000 into a build-out, 35,000 into furniture, and another 30,000 into a deposit on the property. No way at all.

We worked with the same firm on a multi-stage build-out, where we'd get the most necessary things right off the bat (comms room and one meeting room) and everything else 6 months later, after we had a couple of sales. That initial build-out was still coming in at 70,000 pounds. Plus deposit and furniture. Just to move in.

We had to look at other options.

So we started talking to the guys from Causata, another Accel-backed startup. They've been going for a year longer than we have, and they've spent their entire time in a better class of serviced office than we've been in.

Serviced Offices FTW?

So we started talking to higher quality serviced office providers than the one we were using in the Borough High Street area. This would involve moving to second-tier space in the City (i.e. not a convenient commute to Bank or Liverpool Street) or second-tier space in the posh parts of the West End (i.e. not convenient to Berkeley Square or St. James's).

These buildings were much more suitable to growth companies that don't require bootstrap-levels of expenditure:

  • Chairs weren't crippling
  • The buildings had space in server rooms you could throw a few machines into
  • Phones were reasonable in quality

The thing was that these spaces were coming in at roughly 750 pounds/desk/month (9,000/year). For 12 employees, that ends up being 108,000 pounds per year.

And the best/worst part is that these numbers are based on potential employees in the space, not actual headcount. The space in any serviced office is charged at a price per room, with a certain number of desks in it. If the space can support 20 employees and you only have 10 in it, you pay the 750/month for 20 employees, not 10. So your entire cost savings hinge on finding the right space for the right number of employees, and being able to move quickly to minimize overspend.

Capital Preservation Equals Startup Victory?

If you ask any startup, whether well-funded with a major VC backer or not, whether they're happy putting 110,000 into a capital cost on an office, they'll say no. It's a ridiculous predicament to be in.

If you run down the numbers, a Proper Office (even with 110k in capital expenditure) has an advantage (for our space) at 12 desks over the full run of the 2-year initial period, but it's all front-loaded: you have to spend capital to save at the end of the lease. If you stay in the space, in years 3-5 you're laughing at the savings.

But for a startup like ours, 110k works out at 2 months of burn. Now you're in a difficult quandary:

  • If you really believe the Next Round is going to happen at a good valuation, you JFDI; otherwise, you're pissing money down the drain.
  • If you want to preserve your options to drive for a harder bargain on the Next Round, you want to conserve capital at all costs, even if it means you're spending more per month.

It's a tough decision. Clearly (having seen the plans) the right choice for us was the full build-out, but were we willing to sacrifice an additional two months of burn for the optimal working environment over years 2-5?
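
As a back-of-the-envelope check on that tradeoff, here's the arithmetic as a tiny Java sketch. The serviced-office rate and fit-out numbers come from this post; the annual rent figure is a placeholder, since the real number isn't disclosed here:

    // Rough crossover: at how many desks does a proper office beat a
    // serviced one over the initial 2-year term?
    public class OfficeCrossover {
        public static void main(String[] args) {
            double servicedPerDeskYear = 9_000;  // 750 pounds/desk/month
            double fitOutAndFurniture = 76_000;  // 55k build-out + 21k furniture
            double annualRent = 50_000;          // PLACEHOLDER -- not our real rent
            int years = 2;                       // out to the break clause

            double crossover = (fitOutAndFurniture + annualRent * years)
                    / (servicedPerDeskYear * years);
            System.out.printf("Proper office wins above ~%.1f desks%n", crossover);
        }
    }

With those inputs the crossover lands just under 10 desks, which squares with the "crossover point is about 10 employees" sound-bite in the conclusion below.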

What We Did

What we ended up doing was a bit anti-climactic.

First of all, our second build-out firm finally came through with a quotation: 55,000 pounds for everything (from a requirements perspective) that the first firm had quoted. Furniture came in at 21,000, including pretty darn good chairs. That changed everything.

So we ran the numbers. It turns out that if we assumed a second round of funding was coming, paying all the fixed costs, no matter how galling to me as a founding CEO, paid for itself about 18 months into a two-year lease. And once the board told me "do what's best for the company over the next two years," that was a done deal.

It was tough, though, because we knew all along that, from a qualitative perspective, the space in Bankside was the right choice. What we needed was to be able to justify to ourselves and our board that we were sinking the right amount of capital into the right space to put the company on a trajectory where we wouldn't have to mess about with offices again. That point is quite important, because if you're growing rapidly you don't have the time to deal with this stuff.

Conclusion

If you want to boil all this down to a series of sound-bites, here they are:
  • The UK property market is dysfunctional in requiring tenants to restore the space to an uninhabitable shell on move-out;
  • This makes finding suitable startup office space in London a PITA;
  • ALWAYS get competitive quotes on EVERYTHING because you never know just how far off different vendors can be;
  • Serviced offices in London work out well if you feel the need to be in the City or Mayfair, but less well if you like places like the South Bank [Silicon Bridge] or Old Street [Silicon Roundabout]. Crossover point is about 10 employees;
  • Ask Your Board about the relative value apportioned to flexibility over the short term vs. cost savings over the long term. This is what your board is there for.

Anticlimax

We move in in the next 5 weeks come hell or high water (because we've already given notice on our current space, and they've already let it). Wish us luck.