Sunday, May 11, 2008

Dynamic vs. Static Generation

I've been doing a fair amount of work recently with large systems dealing with semi-dynamic documents in the financial services space (nothing to do with web sites, BTW). One question that inevitably comes up is how much you pre-compute, and how much you rely on your dynamic generation system to do at runtime.

For example, the really old-skool CMS systems (think Vignette circa 1999) were pure generation systems: text went in, went through a recompute-the-world process, and out came fully formatted, interlinked HTML. If you've ever looked at CNET, it's clear that they're using at least something based on this technique, because I recognize the classic Vignette-style file names.

However, in a purely dynamic system, all content is in the form of some low-level structured persistence store (e.g. RDBMS) and each page is uniquely generated for each request.

The key differences come down to what you think your principal scalability issues are:
  • Static systems are ideal for sites that update seldom (in the grand scheme of things), have a massive number of readers (think New York Times or BBC News), and need minimal per-user customization.
  • Dynamic systems are ideal for sites that update constantly, have relatively fewer readers, and need maximal per-user customization (think web stores).
  • Static systems also have the downside that all your content must be generated before it can be displayed. With a nearly infinite space of items, that means you have to have everything you might possibly serve already on disk. Dynamic systems just generate it as needed.
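Here's a toy Python sketch of the trade-off in that last point. It isn't taken from any real CMS, and `render`, `StaticSite` and `DynamicSite` are hypothetical names. The static site renders every page up front, including pages nobody ever requests, and anything it didn't pre-generate simply doesn't exist. The dynamic site renders only what's asked for, but it pays for that on every request.

```python
def render(item_id: int) -> str:
    """Hypothetical stand-in for an expensive page render."""
    return f"<html><body>Item {item_id}</body></html>"


class StaticSite:
    """Pre-generates every page up front; serving is just a lookup."""

    def __init__(self, item_ids):
        self.pages = {i: render(i) for i in item_ids}
        self.renders = len(self.pages)  # all rendering happens here

    def serve(self, item_id):
        # Anything not generated ahead of time simply isn't there (None).
        return self.pages.get(item_id)


class DynamicSite:
    """Renders each page only when it is requested."""

    def __init__(self):
        self.renders = 0

    def serve(self, item_id):
        self.renders += 1  # every request costs a render
        return render(item_id)


catalogue = range(10_000)            # hypothetical item IDs
static_site = StaticSite(catalogue)  # 10,000 renders before the first request
dynamic_site = DynamicSite()
for item in [7, 7, 42]:              # hypothetical request log
    assert static_site.serve(item) == dynamic_site.serve(item)
print(static_site.renders, dynamic_site.renders)  # 10000 3
```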

But is any of that really accurate, particularly with modern page layout technology?

Consider Jeff Atwood's post on how WordPress gobbles CPU time. I know from experience that back when we thought dynamic content was expensive (because CPUs WERE expensive), we spent as much time as possible optimizing away CPU effort by statically generating everything we could. How much of a typical blog entry really needs to be dynamic?

Well, you've got:

  • The text itself. This changes seldom, so seldom that you'll generally want a special annotation saying it has been updated.
  • The little bits on the right. These don't change with every post unless you're doing some sort of randomization or prioritization on your blogroll, and that you can handle with a dynamic iframe.
  • Comments.
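Here's a minimal Python sketch of that split. It uses a hypothetical `render_post_page` helper. The post text and the comments are baked into one static HTML file. The sidebar is pulled in through an iframe that points at a hypothetical `/sidebar.html`, so changing the blogroll doesn't mean regenerating every post. Comments are escaped because readers write them. The post body is treated as the author's own trusted markup.

```python
from html import escape


def render_post_page(title, body_html, comments, sidebar_url="/sidebar.html"):
    """Render one blog entry as a self-contained static HTML page.

    body_html is the author's own (trusted) markup and is inserted as-is;
    reader comments are escaped. The sidebar is not baked in: an iframe
    loads it from sidebar_url (a hypothetical path), so the blogroll can
    change without regenerating every post.
    """
    items = "\n".join(f"  <li>{escape(c)}</li>" for c in comments)
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><title>{escape(title)}</title></head><body>\n"
        f"<article>\n<h1>{escape(title)}</h1>\n{body_html}\n</article>\n"
        f'<iframe src="{escape(sidebar_url)}" title="sidebar"></iframe>\n'
        f'<ul class="comments">\n{items}\n</ul>\n'
        "</body></html>\n"
    )


page = render_post_page(
    "Dynamic vs. Static Generation",
    "<p>Disk is cheap.</p>",
    ["Great post!", "<script>alert(1)</script>"],  # hypothetical comments
)
```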

It's the last bit, I think, that has driven people crazy with purely dynamic systems. How often is someone posting a comment? Even on a super-hot post, I contend it's maybe once per minute for a few hours. How long does it take to regenerate that page? Maybe a tenth of a second for a super-hot post.
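The back-of-the-envelope numbers above can be checked directly. This sketch takes the post's figures (one comment per minute, 0.1 s per render). It then compares re-rendering once per comment with rendering once per read. The figure of 1,000 reads per minute is hypothetical and is there only for comparison.

```python
def cpu_fraction(renders_per_minute: float, seconds_per_render: float) -> float:
    """Fraction of one CPU core spent rendering a single page."""
    return renders_per_minute * seconds_per_render / 60.0


RENDER_SECONDS = 0.1        # from the post: about a tenth of a second per render
COMMENTS_PER_MINUTE = 1     # from the post: about one comment a minute on a hot post
READS_PER_MINUTE = 1_000    # hypothetical readership, for comparison only

static_cost = cpu_fraction(COMMENTS_PER_MINUTE, RENDER_SECONDS)  # re-render per comment
dynamic_cost = cpu_fraction(READS_PER_MINUTE, RENDER_SECONDS)    # render per read

print(f"static (per comment): {static_cost:.2%} of one core")
print(f"dynamic (per read):   {dynamic_cost:.2%} of one core")
```

With these inputs, regenerating once per comment costs about 0.17% of one core. Rendering once per read costs about 1.7 cores. The ratio between the two is simply reads per comment.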

So why not follow what we used to do back in the day for systems like this:
  • Statically generate the world. Disk is cheap. Super cheap.
  • Have an event-based system that takes in new "events" (comment, new posting) and modifies the output content as a result?
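Here's a minimal sketch of that event-based approach. It assumes a hypothetical in-memory `StaticBlog` with just two event types, `new_post` and `new_comment`. Each event regenerates only the pages it touches. Each page is written to a temp file and then renamed into place, so the web server never serves a half-written file. The slugs, file layout, and HTML are illustrative and aren't taken from any real blog engine.

```python
import os
import tempfile
from html import escape


def write_atomically(path, text):
    """Write to a temp file beside `path`, then rename it into place.

    os.replace() is atomic on POSIX, so a web server reading `path`
    sees either the old page or the new one, never a partial file.
    """
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".", suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(text)
    os.chmod(tmp, 0o644)  # mkstemp creates 0600; let the web server read it
    os.replace(tmp, path)


class StaticBlog:
    """Hypothetical blog that regenerates only the pages an event touches."""

    def __init__(self, out_dir):
        self.out_dir = out_dir
        self.posts = {}  # slug -> {"title": ..., "body": ..., "comments": [...]}

    def _post_html(self, slug):
        p = self.posts[slug]
        comments = "".join(f"<li>{escape(c)}</li>" for c in p["comments"])
        return f"<h1>{escape(p['title'])}</h1>\n{p['body']}\n<ul>{comments}</ul>\n"

    def _index_html(self):
        links = "".join(
            f'<li><a href="{slug}.html">{escape(p["title"])}</a></li>'
            for slug, p in self.posts.items()
        )
        return f"<ul>{links}</ul>\n"

    def handle(self, event, **data):
        """Apply one event and return the names of the files rewritten."""
        if event == "new_post":
            slug = data["slug"]
            self.posts[slug] = {"title": data["title"], "body": data["body"], "comments": []}
            dirty = [slug, "index"]  # the post itself plus the front page
        elif event == "new_comment":
            slug = data["slug"]
            self.posts[slug]["comments"].append(data["text"])
            dirty = [slug]           # only that post's page changes
        else:
            raise ValueError(f"unknown event: {event!r}")

        written = []
        for name in dirty:
            html = self._index_html() if name == "index" else self._post_html(name)
            write_atomically(os.path.join(self.out_dir, f"{name}.html"), html)
            written.append(f"{name}.html")
        return written
```

A new comment rewrites one file. A new post rewrites the post and the index. Everything else on disk stays put and keeps being served as plain files.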

The key issue here is whether your "system" captures all external interactions with your underlying data model, or whether you're likely to go around it and poke your database manually. I contend that with most systems you'll do this seldom enough that an occasional "rebuild the world" operation will suffice. The performance gains you get from a mostly-static system (using things like iframes for the dynamic content) are large enough to be worth the effort.
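Here's one way the "rebuild the world" fallback might look. It's sketched with Python's built-in `sqlite3` standing in for the real database, and the `posts`/`comments` schema is hypothetical. The function regenerates every page straight from the tables and deletes pages whose rows are gone. That means it also picks up any changes someone made by poking the database manually.

```python
import os
import sqlite3
import tempfile
from html import escape

# Hypothetical schema, for illustration only.
SCHEMA = """
CREATE TABLE posts (slug TEXT PRIMARY KEY, title TEXT NOT NULL, body TEXT NOT NULL);
CREATE TABLE comments (id INTEGER PRIMARY KEY, post_slug TEXT NOT NULL, text TEXT NOT NULL);
"""


def _write_atomically(path, text):
    # Temp file plus rename, so readers never see a partial page.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".", suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(text)
    os.chmod(tmp, 0o644)
    os.replace(tmp, path)


def rebuild_world(conn: sqlite3.Connection, out_dir: str) -> list:
    """Regenerate every page from the database, the source of truth.

    Meant to run after anyone edits the database behind the event
    system's back. Pages for rows that no longer exist are deleted.
    """
    posts = conn.execute("SELECT slug, title, body FROM posts ORDER BY slug").fetchall()
    wanted = {"index.html"}
    for slug, title, body in posts:
        comments = conn.execute(
            "SELECT text FROM comments WHERE post_slug = ? ORDER BY id", (slug,)
        ).fetchall()
        items = "".join(f"<li>{escape(text)}</li>" for (text,) in comments)
        _write_atomically(
            os.path.join(out_dir, f"{slug}.html"),
            f"<h1>{escape(title)}</h1>\n{body}\n<ul>{items}</ul>\n",
        )
        wanted.add(f"{slug}.html")

    links = "".join(
        f'<li><a href="{slug}.html">{escape(title)}</a></li>' for slug, title, _ in posts
    )
    _write_atomically(os.path.join(out_dir, "index.html"), f"<ul>{links}</ul>\n")

    # Remove stale pages, e.g. for a post someone deleted by hand in SQL.
    for name in os.listdir(out_dir):
        if name.endswith(".html") and name not in wanted:
            os.remove(os.path.join(out_dir, name))
    return sorted(wanted)
```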

By the way, why do I care about all this? Static content can be served fast. Super-duper-ultra fast. sendfile fast. Kernel server fast. And it guarantees that you're not stuck in user-level code that requires a lot of explicit concurrency control, which people always get wrong (meaning it isn't fast enough).
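Here's a rough illustration of "sendfile fast". Python's `socket.sendfile()` uses `os.sendfile()` where the platform supports it. The page's bytes then go from the kernel's page cache to the socket without being copied through user space. Where that isn't available, it falls back to plain `send()`. The `serve_static_file` helper below is a hypothetical sketch that writes a bare-bones HTTP/1.0 response. A real server would also parse the request, handle errors, and set caching headers.

```python
import os
import socket


def serve_static_file(conn: socket.socket, path: str) -> int:
    """Send a minimal HTTP/1.0 response whose body the kernel copies.

    `conn` must be a blocking SOCK_STREAM socket; socket.sendfile()
    does not support non-blocking sockets. Returns the body bytes sent.
    """
    size = os.path.getsize(path)
    header = (
        "HTTP/1.0 200 OK\r\n"
        "Content-Type: text/html; charset=utf-8\r\n"
        f"Content-Length: {size}\r\n"
        "\r\n"
    ).encode("ascii")
    conn.sendall(header)
    with open(path, "rb") as f:  # sendfile needs a regular file opened in binary mode
        return conn.sendfile(f)
```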