Wednesday, June 10, 2009

What Type of Software Engineer Do You Aspire To Be?

This is the first in a series of blog posts I'm writing to help my nephew, who's in secondary school (high school, to the North Americans reading) and wants to go into software as a vocation. I've decided to experiment with a semi-didactic conversation.

If you're planning on getting started in a career in software engineering, the first thing you have to ask yourself is what type of software you're most interested in:

Pure Software Development
I classify this as working with relatively low-level systems and algorithms, to such a degree that satisfying business requirements (while extremely important) isn't the primary consideration in the majority of the actual work you do. Rather, technical constraints factor in often enough that you are forced to innovate from a technological perspective on a regular basis.
Vocational Software Development
I classify this as working on systems where the primary consideration is the efficient encoding of business rules, where off-the-shelf frameworks/products/systems are able to satisfy virtually all the technical requirements.

While some astute readers may expect me to follow with some pithy comment about the relative quality/skills/intelligence/worthiness of people who work in one of the two areas, I won't. I actually don't fundamentally view the two as mutually exclusive, and I think assigning normative judgement isn't meaningful or helpful.

If you're going to pursue a Pure Software Development career track, you should expect:

  • You will need to go to a top-track computer science university [1], particularly one which (historically) taught SICP. [2]
  • You will need to work at said university harder than you expect, go beyond your expertise, and generally be humbled at every stage, because you will run into people who are smarter/harder working/better programmers than you.
  • You will have to decide to live in one of a few job markets in the US [3], because there are really only jobs that require that type of development in a few places [4]
  • You will find the vast majority of software engineering jobs completely boring, and more importantly, if you take a purely vocational job, you will constantly look for technical problems where there are none, to your detriment as a craftsman.
  • You will get to advance the state of the art, work with cool technologies, and generally maximize your geek nature on the job.
Perhaps most importantly, however, you will view your career through the eyes of technology.

However, if you're going to pursue a Vocational Software Development career track, you should expect:

  • You won't have to go to quite as rarefied a university, and you might actually properly enjoy your undergraduate years.
  • You will be able to live and work pretty much anywhere, and make choices based on your own and your family's best interests rather than having to live in an extremely expensive housing market because career opportunities outside those markets are pretty limited.
  • You will never find intellectual stimulation in your job, except where you add it to your detriment as a craftsman.
  • If you are good, you will never find it particularly difficult to find a job, because people who can efficiently apply technology to solve business problems are always in demand. [5]
  • You will never get to advance the state of the art, work with particularly cool technologies, and will require external stimulation to maximize your geek nature.
Perhaps most importantly, however, you will view your career through the eyes of solving business problems.

Note that saying "I want to work in Industry X" doesn't determine which of these tracks you have to go down. You can pursue either track and work in many industries:

  • Oil/Gas/Materials Exploration require loads of people who can work with 3d visualization and computational farms and GPGPUs and loads of other super-cool stuff I'm making up because I've never worked in that industry. They also require loads of people who can build systems to handle the business of pumping oil/petrol/gas/coal/gold/coltan out of the ground and into productive use.
  • Finance requires loads of people who can work with low-latency market data systems and computational farms and GPGPUs and huge risk data warehouses and loads of other super-cool stuff I can't discuss due to NDAs. It also requires loads of people who can build systems to handle moving data in between mainframes and databases and workflow engines.
  • Startups require loads of people who can generically push the state of the art in technology to assist end-users and other technologists. They also require loads of people good at building web applications that don't push the state of the art technologically at all.

Can you combine the two? Of course you can. 37Signals is a great example. They build applications that really don't push any boundaries technologically in any way: they're about more efficient ways of doing things, web-enabled, with excellent design. But they also built Ruby on Rails as a way of achieving those aims better. Financial systems are, at their heart, a way to enable a group of traders to maximize revenue subject to the needs of the business. But most of them require a massive application of technology to be able to satisfy those needs, and so require significant applications of pure technology in order to be successful.

In fact, in my mind, the best software engineers are ones who can handle both the pure technology and business vocational requirements simultaneously. They're rare, though, and most of the ones that I know start from one side and realize that they're still interested in the other, and grow their mentality to be able to simultaneously achieve both technology and business zen. You have to get to grips with what really moves you first, and then work on everything else from that perspective.

So which is it? Pure Technology Track, or Vocational Technology Track?

Footnotes

[1]: Go Bears.
[2]: If you didn't go to one of those schools, and you're getting hot and bothered at this stage, seriously, why are you reading my blog in the first place? Regardless, keep reading, as you'll find I'm not capping on your alma mater in any way.
[3]: In North America: San Francisco Bay Area, Chicago, North Carolina Research Triangle, Salt Lake City, New York, Seattle, Vancouver, Boston, Austin. Yes, there are outlier organizations everywhere, but in general, you'd better plan on spending the first part of your career in one of those places.
[4]: Network effects in play here. You need a certain number of these people before the development team can work, and that means you have to have quite a few to draw on, and as such they all flock together.
[5]: Particularly if you and your family are willing to change locations. Hiring for this type of engineer tends to follow cyclical hiring paths, and North America tends to have regional cyclical trends.

Sunday, May 31, 2009

Want to know what a really fast MOM system has to handle?

Take a look at what you have to be able to consume if you want to consume live market data from major exchange feeds: April 2009 Capacity Statistics.

These statistics are the aggregate peak message flow on various industry feeds of live market data. In other words, if you want to keep up with the flow, you have to be able to consume at least that many messages per second during peak periods.

The most interesting ones (to me at least) are the ones that are multi-exchange aggregated feeds for options (Siac OPRA and NYSE ArcaBook Options). OPRA hit 869,109 mps peak in April, and NYSE ArcaBook Options hit 565,522 mps peak. If you want to consume all the major feeds listed in one box, you'd need to be able to handle more than 2MM mps.[1] That's a lot of individual messages for any software-only product to handle.
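To make those rates a bit more concrete, here is a minimal sketch that turns the peak figures quoted above into a per-message time budget for a single consumer. The class and method names are made up for illustration, and the 2,000,000 figure is just the "more than 2MM" number taken as a round lower bound (with the caveat in footnote [1] that the peaks don't actually coincide).

```java
final class FeedBudget {
    // Peak rates quoted in the post, in messages per second (April 2009).
    static final long OPRA_PEAK_MPS = 869_109L;
    static final long ARCABOOK_OPTIONS_PEAK_MPS = 565_522L;
    // The "more than 2MM mps" figure, used here as a round lower bound.
    static final long ALL_FEEDS_MPS = 2_000_000L;

    /** Nanoseconds a single consumer can spend per message at a sustained rate. */
    static double nanosPerMessage(long messagesPerSecond) {
        if (messagesPerSecond <= 0) {
            throw new IllegalArgumentException("rate must be positive");
        }
        return 1_000_000_000.0 / messagesPerSecond;
    }

    public static void main(String[] args) {
        long bothOptionsFeeds = OPRA_PEAK_MPS + ARCABOOK_OPTIONS_PEAK_MPS;
        System.out.printf("OPRA alone:         %.0f ns per message%n",
                nanosPerMessage(OPRA_PEAK_MPS));
        System.out.printf("Both options feeds: %.0f ns per message%n",
                nanosPerMessage(bothOptionsFeeds));
        System.out.printf("All feeds (2MM):    %.0f ns per message%n",
                nanosPerMessage(ALL_FEEDS_MPS));
    }
}
```

At 2MM mps the budget is 500 ns per message, and roughly 1.15 µs for OPRA on its own, which has to cover receiving, parsing and routing each message.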

Yet another reason why I believe systems like Tervela and Solace are going to be key to the next generation of market feed aggregation: at levels like that you need something hardware accelerated in order to handle the first round of aggregation. Furthermore, using AMQP (which both currently have, or at least have plans for), you can then integrate your favorite second-tier tick distribution system (RabbitMQ, Qpid, 29West [2]). Hardware handles the ingress and feeds software handling the egress. [3] Who says you have to choose?

Footnotes

[1]: Yes, I know you wouldn't, because they peak at different times of the day, and there is overlap, particularly between the ArcaBook and Siac feeds.
[2]: I know they're a member of the working group, but I can't see anything explicit about AMQP support on their web site as of yet, so I presume it's forthcoming.
[3]: Assuming you don't just want to go hardware-only end-to-end, which is the approach that BarCap is taking for example.

Wednesday, May 27, 2009

Monty Bites The Hand That Fed Him: Part 2

(Take a look at Part One of my Monty-Watch for some background as to what I think about the situation).

So Monty Widenius and Peter Zaitsev did an interview with Matthew Aslett on the creation of the Open Database Alliance. It's well worth a read.

For the record, I don't want anybody to conflate my opinions on what Monty's done with anybody else associated with the Open Database Alliance. I actually think that having such an organization ex-Oracle to make sure that there's a unified voice for everybody working (and attempting to make money) from the MySQL ecosystem, outside the current corporate owner of the MySQL brand, is a Good Thing (and something I think other projects with a single major corporate sponsor may lead to in the future). As such, having a place for all the various consultancies and technology providers to work together to ensure their interests are looked after is a pretty useful and innovative thing.

However, let's play "Look At The Balls On That Guy"!

Actual Quote Time:

I have, however, offered Oracle a partnership with Monty Program Ab, under which Oracle could get access to some of the critical developer resources Monty Program Ab has available. Monty Program Ab could also help Oracle with their open source strategy and serve as a ‘trust creating’ entity between Oracle and the open source developer community. Oracle has however not yet responded to this.

Kirk's Translation:

I have hired all of the people that ever worked for me when I stormed off from Sun in a huff. Now you may have the MySQL brand name and core IP, but I have all the engineers. Furthermore, I've been sowing as much FUD as I possibly can when, quite frankly, you haven't done anything directly to harm the interests of the MySQL community. If you want me to publicly step down from the FUD-slinging, I have a bank account to which you can send some [more] money.

Anyone who doubts the actual motivations of Monty's recent efforts should read the real quotation (as well as my translation). I think it speaks volumes about what he's attempting to achieve here, which is quite simply to spread enough FUD about Oracle's relationship to MySQL that Oracle feels like it has to engage in some type of action to bring Monty back into the fold in some form, which would involve some type of cash money payment. There really is no other conclusion possible for someone who says, in no uncertain terms: I have all the core developers; I've damaged your relationship with the core community; You could pay me and make the problems go away.

Friday, May 15, 2009

How Many Times Can Monty Sell MySQL?

UPDATE 2009-05-27: Monty's spoken to Matt Aslett, and I've responded.

COMMENTARY UPDATE 2009-05-27: If you're just reading this for the first time, after posting this it became clear (through back-channel to me) that Sun did have Monty under a Non-Compete, and chose to allow Monty to get out of it. I've commented about that in the comments [which you should really read] and in the Proggit thread. I still think Monty's actions are pretty bad looking even given that, but you should understand that there was a Non-Compete, and Monty was let out of it by Sun, before you read the original article below.

I've been thinking this since it was announced, but Monty's current attempt to monetize MySQL by hamstringing the eventual owner of his original attempt is really quite, ahem, ballsy. For a much less ranty analysis with quotes from M-Dawg himself, see Stephen O'Grady of RedMonk's writeup.

Let me give the Kirk "worked for 3 database companies and founded one expressly to compete with MySQL" Wylie synopsis:

  • Monty writes MySQL way back in the day, largely so that he has a database system which doesn't have any of the complex features of an RDBMS that make it work well (you know, referential integrity, transactions, views, proper metadata support).
  • People start using it, largely because it's Free-as-in-Beer (this was back in the days of minimum $100K Oracle buys just to run a simple web site), but also because it's easy to set up and administer (which Oracle/Sybase/SQLServer/DB2 were not).
  • Monty wants to get rich.
  • In an effort to get rich, he takes a boat load of VC funding to push MySQL from being a small open source collective to a Real Company.
  • VC funding requires a business model that has real revenue behind it.
  • Company adopts a split licensing model (which pissed off a lot of people at the time), and starts being effective in attracting revenue and very, very smart people as executives.
  • Monty's dreams of success are realized when Sun pays a king's ransom for MySQL.
  • Monty wants to have his cake [1] and eat it too, and gets all pissy and storms off in a strop and founds an attempt to get rich a second time on the same project.
  • Oracle buying Sun means people take this attempt even more seriously and he attracts people who never liked the post-VC-funding MySQL business model in the first place to the cause.

So here's the question that everybody should have on their minds: How many times will Monty attempt to get rich off the same project? [2]

Now I wasn't privy to any of the contractual arrangements around MySQL's incorporation, or his common stock stake, or the Sun buyout, or any of his employment agreements [3]. I will, however, postulate that if Monty doesn't have Fuck You money at this point given a $1Bn buyout of the firm he founded, he did something Seriously Wrong, and you probably shouldn't trust his business instincts.

So one of two things is going on here:

  • f(Cake + Eating) == Cake
  • He fundamentally doesn't agree with a split licensing model and thinks it's doomed to failure. I really hope this isn't the case, because if it is, he was acting disingenuously at best when working for the Original Monty MySQL-Based Get Rich Scheme, by supporting a model that he didn't believe in.

If it's the latter, why did he start down the path of taking VC money in the first place? Seriously, did he honestly believe that he could take a boat-load of risk capital, and not have to provide returns to the limited partners at the core of any risk capital facility? Did he lose some type of boardroom squabble over the direction of MySQL and has been nursing a grudge ever since? [4]

Here's something any founders of Open Source projects need to realize: VC money comes with strings attached; do not take that money if you don't want to take the strings. The strings are entirely financial: VC/risk capital requires a very hefty payout in a relatively short (5-10 years max) timeframe to the limited partners who provided the VC firm with its capital to invest. In order to ramp up revenues in a reasonable timeframe, you will need to have some facility to generate reproducible, cheap-to-deliver revenue in that timeframe. Open Core is one approach, Split Licensing is another, all manner of Services are a third, there are a whole host. But you have to come up with one. Otherwise there's no point in raising risk capital, which must have a hefty payout.

If you just want to have a lifestyle business (and many lifestyle businesses can, over time, still provide you with Fuck You money if you structure them properly), while constantly maintaining an environment where you can do what you want technically in a purely-Libre environment, don't take risk capital. Grow your business organically, and enjoy the life that you've created for yourself.

But the moment you accept that term sheet, you've crossed the barrier between a pure hacker coding for fun and a company executive who must deliver returns to his investors (and that may require doing things that the hacker side of you finds distasteful). If you're not willing to sign up to the transition from pure Open Source techie to Business Executive, don't accept the term sheet. And for the love of the FSM's noodly appendage, don't accept the term sheet thinking that you're going to screw your investors in the long term by going back to your roots once you've got your payout. [5] Doing so screws it for the rest of us.

Here's the #1 problem Monty's move has caused for anyone attempting to make Fuck You money off Open Source: it should make VCs very nervous indeed about Open Source investments. Let's examine what I would consider to be a logical thought process:

  • If we invest in an Open Source company, the most likely outcome is an acquisition by another firm.
  • If founders of projects make it a habit of storming off to fork their invention because they don't like the monetization model they helped establish, other firms are very unlikely indeed to buy Open Source companies.
  • If other companies are unlikely to buy Open Source companies, our return on investment in them will be much lower.
  • Therefore, there's no point in looking at them.

None of this impacts Monty: he presumably already has his Fuck You money.

But if I were a VC looking to invest in an Open Source company, I would insist on an enforceable non-compete if I could [6], and I would make sure that my exits prevented the founders from being able to fork. Otherwise, my assets are really only there to make the founders enough money that they can pursue their dream of working on pure Open Source code with enough money that they no longer have to try to get rich. Which is great for them, but not for the rest of us who aren't rich but wish we were.

Remember: going for the brass ring and taking VC money requires that you compromise something. If you don't like it, don't take the money. But once you have, realize that you may need to walk away from your baby once you've got the money for the sake of everybody else.

Just as an aside, bear in mind that nothing should stop you from doing Open Source work, including starting an entirely new project on the same basic idea, once you have your payoff. It happens all the time in other industries (how many networking hardware companies have been founded by the exact same executives?). But resist the urge to fork your original project. It's unseemly at best, and flat-out unethical at worst. If Monty had started MariaDB from scratch, that would be one thing. But he didn't. And that's the thing that makes this all seem, well, just a little bit wrong to me.

Footnotes

[1]: By cake, I mean chedda/dead presidents/papa. Cash money, yo.
[2]: Clearly more than once.
[3]: Hence I am totally unqualified to comment here. I'm doing so anyway, because if you keep reading, this turns less Monty-directed and more general-parable.
[4]: There's a reason nobody ever saw the code for my Compete-with-MySQL Open Source Database startup.
[5]: I'm not actually accusing Monty of this, nor do I believe it to be the case (believe it or not). I think there's something else going on here. But I could see that some people might be tempted to think in that unethical way, and you really shouldn't.
[6]: Yes, there are ways to structure this, usually during the M&A stage, by having deferred payments to the founders which don't trigger if they fork for some period of time, that even comply with California and UK restraint-of-trade law.

Monday, April 27, 2009

Sybase JDBC Drivers Ignore Your Database Selection

A very common strategy in doing database-driven functional tests is for each user to have their own database instance on a shared test machine. This means that they can do whatever they want on that database instance without impacting other users, and that multiple people can run potentially destructive test cases simultaneously.

A common approach to handling that is to append the user name to the database name, and use a system property to choose the correct database instance for that test run (for example, naming your DataSource bean in your Spring context.xml something like db_${user.name}). You then have the JDBC connection declared to have the user name in the connection string (a la jdbc:foo:bar:blahblahblah/myapp_${user.name}). All works great.
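As a rough sketch of that naming scheme in plain Java (outside of Spring's ${user.name} placeholder substitution), something like the following would build the per-user database name and connection string. The class name, method names and the "myapp" prefix are purely illustrative, and the base URL is the same placeholder used above, not a real driver URL.

```java
final class PerUserDatabase {
    /** Appends the user name to the application's database name, e.g. myapp_kirk. */
    static String databaseName(String appName, String userName) {
        return appName + "_" + userName;
    }

    /** Builds a connection string that ends in the per-user database name. */
    static String jdbcUrl(String baseUrl, String appName, String userName) {
        return baseUrl + "/" + databaseName(appName, userName);
    }

    public static void main(String[] args) {
        // user.name is a standard JVM system property holding the OS account name.
        String user = System.getProperty("user.name");
        System.out.println(jdbcUrl("jdbc:foo:bar:blahblahblah", "myapp", user));
    }
}
```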

Except with Sybase.

In its voyage of annoying developers who have to use their managed language drivers (the C#/ADO.Net ones are particularly noxious), Sybase have decided to screw you by ignoring this parameter when it's not valid.

The Sybase low-level protocol for this scenario largely consists of the following steps (completely simplified):

  1. Connect as user Foo
  2. Connection is now bound to the default database defined for user Foo
  3. Issue a use correct_database statement
  4. Connection is now bound to the database specified in step #3

Here's the problem: if step 3 fails because the desired database doesn't exist, Sybase will silently swallow the failure and leave you in the user's default database without telling you (seriously, there's no log or exception or any sign at all of which database you're in). Even better, the Sybase driver doesn't even store this in any field that you can introspect in the debugger, so the Sybase client library thinks it's in the database instance you want, when in fact it's not.

If you have a shared test database (with some real data in it for doing manual/gui-directed testing), and surrogate databases for each user's "blow-away-the-world" style testing, this is Very Bad Behavior Indeed.

Solution: Stop using Sybase. Seriously. Yes, it may have been en vogue 10 years ago, but it sucks.
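For anyone who can't drop Sybase, one possible guard against the silent fallback described above is to ask the server itself which database the connection is bound to right after connecting, and fail loudly on a mismatch. This is a minimal sketch, not anything the driver provides: the DatabaseGuard class and its methods are hypothetical names, and it assumes the server answers select db_name() with the current database, as Sybase ASE does.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

final class DatabaseGuard {
    /** Asks the server (not the driver) which database this connection is using. */
    static String currentDatabase(Connection connection) throws SQLException {
        try (Statement statement = connection.createStatement();
             ResultSet rows = statement.executeQuery("select db_name()")) {
            if (!rows.next()) {
                throw new SQLException("select db_name() returned no rows");
            }
            return rows.getString(1);
        }
    }

    /** Throws if the database we landed in is not the one we asked for. */
    static void requireDatabase(String expected, String actual) {
        if (!expected.equals(actual)) {
            throw new IllegalStateException(
                    "Wanted database '" + expected + "' but connection is bound to '" + actual + "'");
        }
    }

    /** Opens a connection and closes it again if it ended up in the wrong database. */
    static Connection connectChecked(String url, String user, String password,
                                     String expectedDatabase) throws SQLException {
        Connection connection = DriverManager.getConnection(url, user, password);
        try {
            requireDatabase(expectedDatabase, currentDatabase(connection));
            return connection;
        } catch (SQLException | RuntimeException e) {
            connection.close();
            throw e;
        }
    }
}
```

Calling connectChecked from test setup, with the per-user database name as expectedDatabase, would turn a quiet drop into the shared default database into an exception before any destructive test runs.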

Tuesday, April 21, 2009

Oracle + Sun : A Java Perspective

Smarter people than I have written about this, but having worked at several of the major players here (a summer at Oracle proper, 12 months at BEA working on WebLogic Server, 2 years working at M7, which got bought by BEA, which got bought by Oracle), I figured I'd pipe in my $0.02.

My #1 concern here is that Larry is going to attempt to use everything in the IP arsenal that he's just acquired to screw IBM. He's done it before, and he'll probably continue to do it. Considering the amount of investment that IBM has made into the Java ecosystem (at least as much as BEA, particularly when you consider Eclipse), and the amount of hostility between Oracle and IBM, this wouldn't surprise me one whit. Towards that end, here's what I'd like to see clarification on:

  • Eclipse/SWT. For too long Sun's ridiculous love-fest with NetBeans and Swing has blocked any reasonable approach towards dealing with Eclipse and SWT. The worry that I have here is that since Eclipse == IBM in many people's minds, and both JDeveloper and NetBeans are now under the same company umbrella, Oracle may decide to let commercial considerations (sticking it to IBM) and staff considerations (keeping Sun developers who have a thing for NetBeans, plus everybody internally who's committed to JDeveloper) override ecosystem considerations (we all like Eclipse way more than NetBeans or JDeveloper). My request: please properly support SWT. Not to the exclusion of Swing, but don't fight against it.
  • OSGi. OSGi is probably tainted by Sun as being an Eclipse-technology, and therefore hurting NetBeans and helping IBM or some similar ridiculousness. Support it, please, and kill off anything whose sole raison d'etre is to replace it with something lamer. Modularizing the JRE is one thing, providing something completely useless except to replace OSGi is something altogether different.
  • Open Java Implementations. Stephen Colebourne has been talking about this at length, and blogged about the handover. Please don't be such jerks on the JCP.
  • Open Source Projects. Whither Glassfish and Metro and all the rest of it? They compete with existing BEA/Oracle assets, but are extremely valuable in making the ecosystem valuable enough to allow Oracle to extract value from their proprietary assets.
  • API Neutrality. One thing Sun has been good at, because they've never had any world-beating middleware or application infrastructure technologies, is helping to craft APIs that are by and large vendor neutral (such as JDBC and JMS). Oracle, however, has no such constraint. Is Oracle going to follow a path like Microsoft has with the .Net APIs, where there's enhanced support for whatever Microsoft is shilling and second-class support for everything else (ADO.Net anyone?), or is it going to realize that supporting the ecosystem means vendor neutrality as much as possible? Sun had no choice as all their app infrastructure is second-rate at best, but Oracle has a choice.
  • The Whole JCP Itself. What's going to happen to it? How is Oracle going to behave in general? Will we see projects falling out from under the JCP umbrella going forward?
  • ZFS [1]. ZFS rocks. Massively. But Oracle's been working on Btrfs in part because Sun refuses to allow ZFS to be licensed in such a way that it can be included in the Linux kernel. Given Oracle's investment in Linux, we can has ZFS in Linux?

Mostly, I think we still have yet to see whether Oracle is going to behave like the BEA side, or the Oracle side: is Oracle going to help build the ecosystem, or is it going to use its new IP assets for proprietary advantage against IBM and Microsoft? I hope it's the former, but only time will tell.

Footnotes

[1]: Yes, it's not Java. But really, I can has? Plz?

Sunday, April 19, 2009

My Ideal Social Network

I'm relatively plugged in. I blog (you're soaking in it now!); I have a FriendFeed; I tweet (and those of you who need to know who I am probably already do); I'm on Google Reader; I used to be on Orkut until I started getting befriended by Brazilians I had never met; I'm on LinkedIn (spoiler alert! it discloses who Derivatives Company A and Big Bank B are). I've never gotten onto Facebook, largely because I've never seen any attraction to it (and if I want to be super-poked, I'd just as soon meet you first, thank you very much).

I don't actually like any of them as a full aggregation basis. I know that FriendFeed is attempting to aggregate everything together, but it's not quite what I need or want.

So all you Web 2.0 guys, here's what I want:

  • Subscriber Control: I want to have control over who subscribes to my feeds. In particular, I need to be able to divide my life into at least four categories:
    • Technical Contacts. People who follow me because periodically I write about something someone might find potentially meaningful from a technical perspective.
    • Personal Friends. People who I know personally and might be interested in zoos and what train I got home.
    • Current Coworkers. The main crux here is that there are things that I might want to share with ex-coworkers and personal friends, but not the current ones I have to see every day.
    • Family/Near-Family [1]. These are people who might be interested in stuff going on in my life that I'm not ready to share with other categories.
    Note that nowhere in there is "random gits wot I don't know who want to voyeuristically follow my life." [2] When I do this, I need to be able to either pre-approve a subscriber to a feed, or silently dump a subscriber (see later).
  • Publication Buckets: I need to be able to publish any particular item into a bucket, or a whole feed into that bucket.
  • Single Republication: I need to be able to take a URL from anywhere (e.g. treat something from Google Reader the same way as something I type directly into a text box) and have it appear to consumers the same way no matter how they choose to subscribe. [3]
  • Single Inbox: Sometimes I consume stuff on my phone, sometimes I consume stuff on my laptop, sometimes I consume stuff from Random Web Application. I want "I read this" to mean the same thing on everything.
  • Selective Subscribes: I might want to read stuff by Zack Urlocker on MySQL and Sun, but really not care how many miles he ran that day [4]. I should be able to do that, combined with publication buckets.
  • API Access: Anything I do with your web app I need to be able to do from any arbitrary app, potentially outside your control.
  • Silent Unsubscribes: I need to be able to unsubscribe from someone without them knowing that I've done it. (More on why this is important anon).

Here's the thing about the silent unsubscribe thing. Social Networks largely thrive on number of connections, because that is Very Important to them. That's fine. But I need to be able to structure things in terms of feeds that I follow without feeling like I've given a slight to someone by not re-"friend"ing them (following/friending/whatever), or by dropping them later on. In particular, I may have personal friends who produce drivel I don't have the time or energy to keep up with, but I don't want to slight them by making my personal subscription/publication decisions transparent to them. Fixing this is the heart of the "social" part of networking, and is why eventually every single social network fails as the number of connections grows beyond the desire for humans to have contact to that level.

The core thing here is that this isn't about whether I'm friends with anybody. Many of the people that are my closest friends in the world don't participate in any of this stuff at all. It's about my ability to manage the flow of information in and out of myself that I deem relevant. That's a completely different matter, and the whole focus on "Social" Networks as being about friends and connections is rubbish: it's about my ability to selectively publish and subscribe to categorized feeds that are relevant to my interests at any given point in time.

Someone who can actually do web development should do this [5]. It would rock.

Footnotes

[1]: Note to readers: If I've ever camped out in your house, or you have offspring who refer to me as "Uncle Kirk", you're in here whether I share DNA segments with you or not.
[2]: Seriously, my life isn't actually that interesting. Except that I play with Pumas and Sun Bears and Ocelots and you don't.
[3]: Note that this may mean either compliance on the part of the republishing services (e.g. Google Reader), or it may mean that I have a queue of pending stuff that I have to process before it gets republished; I'm fine with both approaches working together.
[4]: I can say this. M7 Alumni In Da Hizzy!
[5]: Not it. I'm secure enough in my 5k1LLz that I can say that these days, you want me to stay as far the heck away from the browser in a day-to-day coding perspective as possible.