Essentially, the Google Protocol Buffers wire format can be seen as the instruction stream for a small stack-based state machine that runs inside a Builder. The Builder holds the state of a particular message under construction (an Address or an AddressBook, say) and runs through the bytes of your wire representation, updating that state one field at a time. When you think you've consumed everything, you extract your Address or AddressBook from the Builder.
The commands of this state machine are pretty simple:
- Set the value of Field Number X to value Y (encoded with type Z)
- Push a new context onto the stack for a sub-Message
- Pop the context off the stack to go back to the parent message
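To make the first command concrete, here is a minimal sketch of how a wire-format key splits into "Field Number X" and "type Z". The class name `WireKey` is hypothetical, and the sketch assumes the key fits in a single byte (field numbers 1 to 15); real keys are varints and can span several bytes.

```java
// Hypothetical helper, not part of the Protocol Buffers library.
final class WireKey {
    final int fieldNumber;
    final int wireType;

    WireKey(int fieldNumber, int wireType) {
        this.fieldNumber = fieldNumber;
        this.wireType = wireType;
    }

    // Assumption: the key is a single byte, i.e. the field number is 1..15.
    // The low three bits carry the wire type, the remaining bits the field number.
    static WireKey decode(int keyByte) {
        return new WireKey((keyByte & 0xFF) >>> 3, keyByte & 0x7);
    }

    // Human-readable names for the wire types defined by the encoding.
    static String describe(int wireType) {
        switch (wireType) {
            case 0: return "varint";
            case 1: return "64-bit";
            case 2: return "length-delimited";
            case 3: return "start group";
            case 4: return "end group";
            case 5: return "32-bit";
            default: return "unknown";
        }
    }
}
```

For example, `WireKey.decode(0x0A)` yields field number 1 with wire type 2 (length-delimited), which is how a string in field 1 starts on the wire.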
The state-machine view seems like a little piece of trivia, until you realize that a top-level Protocol Buffers message, unlike every other network message representation that I've ever worked with, lacks both a size prefix and a terminator. (Embedded sub-messages carry a length on the wire; the outermost message does not.) Remember, it's a state machine, so it's just going to keep processing.
Again, trivia.
Until you try to write a sequence of discrete messages to a file or a socket. If you don't explicitly do your own termination or size prefixing, the Builder will just keep processing commands: you'll consume the entire stream and get a single message out, holding only the final value of each singular field (repeated fields accumulate instead). So if I'm trying to save two Address messages, the first having a name of "Kirk Wylie" and the second having a name of "Wylie, Kirk", I'll only get one output, with "Wylie, Kirk".
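The last-value-wins behaviour can be reproduced without the real library. The following is only a toy sketch: `MergeDemo` and its methods are hypothetical, it handles a single length-delimited string field, and it assumes field numbers up to 15 and payloads under 128 bytes so that the key and the length each fit in one byte.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical toy encoder/parser, not the Protocol Buffers API.
final class MergeDemo {
    // Encode one string field as: key byte, one-byte length, UTF-8 payload.
    static byte[] encodeStringField(int fieldNumber, String value) {
        byte[] payload = value.getBytes(StandardCharsets.UTF_8);
        if (fieldNumber < 1 || fieldNumber > 15 || payload.length > 127) {
            throw new IllegalArgumentException("outside the sketch's assumptions");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write((fieldNumber << 3) | 2); // wire type 2 = length-delimited
        out.write(payload.length);
        out.write(payload, 0, payload.length);
        return out.toByteArray();
    }

    // Keep consuming fields until the bytes run out, as a Builder would.
    // A later value for the same field number overwrites the earlier one.
    static Map<Integer, String> parseAll(byte[] wire) {
        Map<Integer, String> fields = new LinkedHashMap<>();
        int pos = 0;
        while (pos < wire.length) {
            int key = wire[pos++] & 0xFF;
            if ((key & 0x7) != 2) {
                throw new IllegalArgumentException("sketch only handles wire type 2");
            }
            int length = wire[pos++] & 0xFF;
            fields.put(key >>> 3, new String(wire, pos, length, StandardCharsets.UTF_8));
            pos += length;
        }
        return fields;
    }

    // Simulate writing two messages back to back with no framing.
    static byte[] concat(byte[] a, byte[] b) {
        byte[] joined = new byte[a.length + b.length];
        System.arraycopy(a, 0, joined, 0, a.length);
        System.arraycopy(b, 0, joined, a.length, b.length);
        return joined;
    }
}
```

Parsing the concatenation of `encodeStringField(1, "Kirk Wylie")` and `encodeStringField(1, "Wylie, Kirk")` gives a map with one entry, because nothing in the bytes marks where the first message stopped.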
This also has the side effect, in Java, of forcing an otherwise unnecessary byte copy. You have to read the size prefix for the next message (and computing that size before you do the serialization costs you CPU time on the writing side), extract the next N bytes from the stream into a byte array, and then have your Builder parse the byte array.
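A minimal sketch of that do-it-yourself size prefixing, using only `java.io`. The `Framing` class is hypothetical, and the 4-byte big-endian length is just one possible choice of prefix. Messages are treated as opaque byte arrays here; with real generated classes they would presumably come from `toByteArray()` and go back in through `parseFrom(byte[])`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical framing helper: [4-byte length][message bytes], repeated.
final class Framing {
    static void writeFrame(OutputStream out, byte[] message) throws IOException {
        DataOutputStream data = new DataOutputStream(out);
        data.writeInt(message.length); // the size prefix
        data.write(message);
        data.flush();
    }

    // Returns the next message, or null when no further length prefix can be read.
    static byte[] readFrame(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int length;
        try {
            length = data.readInt();
        } catch (EOFException endOfStream) {
            return null;
        }
        if (length < 0) {
            throw new IOException("negative frame length: " + length);
        }
        byte[] message = new byte[length]; // the extra copy described above
        data.readFully(message);
        return message;
    }

    // In-memory round trip, mainly to show the two halves working together.
    static List<byte[]> roundTrip(List<byte[]> messages) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            for (byte[] message : messages) {
                writeFrame(sink, message);
            }
            InputStream source = new ByteArrayInputStream(sink.toByteArray());
            List<byte[]> result = new ArrayList<>();
            for (byte[] m = readFrame(source); m != null; m = readFrame(source)) {
                result.add(m);
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Each array returned by `readFrame` holds exactly one message, so each one can be handed to its own Builder. Later releases of the Java library also offer `writeDelimitedTo` and `parseDelimitedFrom`, which do comparable framing with a varint length, if your version has them.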
All annoyances more than anything else. But probably useful for other people to know.