Take a look at what you have to be able to handle if you want to consume live market data from the major exchange feeds:
April 2009 Capacity Statistics.
These statistics are the aggregate peak message flow on various industry feeds of live market data. In other words, if you want to keep up with the flow, you have to be able to consume
at least that many messages per second during peak periods.
The most interesting ones (to me at least) are the multi-exchange aggregated feeds for options (SIAC OPRA and NYSE ArcaBook Options). OPRA hit a peak of 869,109 mps in April, and NYSE ArcaBook Options hit a peak of 565,522 mps. If you want to consume
all the major feeds listed in one box, you'd need to be able to handle more than 2MM mps.[1] That's a lot of individual messages for
any software-only product to handle.
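To put those rates in perspective, here is a rough back-of-the-envelope sketch of the average time a consumer gets per message at each quoted peak. The two per-feed figures and the 2MM aggregate come from the numbers above; the function and variable names are just mine for illustration, and simply adding the peaks overstates the real load, since the feeds peak at different times.

```python
# Peak rates quoted above for April 2009, in messages per second (mps).
PEAK_MPS = {
    "SIAC OPRA": 869_109,
    "NYSE ArcaBook Options": 565_522,
}

# Rough figure from the text for all the major feeds in one box.
# Assumption: treated as exactly 2,000,000 for the arithmetic.
ALL_FEEDS_MPS = 2_000_000


def budget_ns(mps: int) -> float:
    """Average time available per message, in nanoseconds, at a given rate."""
    return 1e9 / mps


# Naive sum of the two options feeds; an upper bound, not a measured peak.
combined_options_mps = sum(PEAK_MPS.values())

if __name__ == "__main__":
    for feed, mps in PEAK_MPS.items():
        print(f"{feed}: {mps:,} mps, about {budget_ns(mps):.0f} ns per message")
    print(
        f"Both options feeds: {combined_options_mps:,} mps, "
        f"about {budget_ns(combined_options_mps):.0f} ns per message"
    )
    print(
        f"All major feeds: {ALL_FEEDS_MPS:,} mps, "
        f"about {budget_ns(ALL_FEEDS_MPS):.0f} ns per message"
    )
```

At 2MM mps that works out to about half a microsecond per message, on average, for everything you do with it: parsing, normalizing, and handing it off.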
This is yet another reason why I believe systems like Tervela and Solace are going to be key to the next generation of market feed aggregation: at levels like that, you need something hardware-accelerated to handle the first round of aggregation. Furthermore, using AMQP (which both either support already or at least plan to), you can then integrate your favorite second-tier tick distribution system (RabbitMQ, Qpid, 29West [2]). Hardware handles ingress and feeds the software that handles egress. [3] Who says you have to choose?
Footnotes
[1]: Yes, I know you wouldn't, because they peak at different times of the day, and there is overlap, particularly between the ArcaBook and SIAC feeds.
[2]: I know 29West is a member of the AMQP working group, but I can't see anything explicit about AMQP support on their web site yet, so I presume it's forthcoming.
[3]: Assuming you don't just want to go hardware-only end-to-end, which is the approach that BarCap is taking, for example.