Burgess' Hot Chips presentation lays out Bobcat's internals in some detail, and, unlike with Bulldozer, I don't believe AMD is holding back any major bits of information about the architecture at this point, because the products are coming to market soon.
Overall, Bobcat looks to be a very modern processor architecture with a 13-stage main pipeline. The L1 instruction and data caches are 32KB each, and the L2 cache is 512KB. As far as we know, the L2 cache isn't shared between Ontario's two cores; each core has its own. Dual-issue execution cores seem to be in vogue at AMD right now, and Bobcat takes the same path as Bulldozer there. As we've mentioned, instructions can be executed out of order, which should yield higher performance per clock than the in-order Atom. The load/store engine is also out-of-order capable, with the ability to move stores ahead of loads.
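To illustrate why out-of-order execution should help per-clock performance, here is a toy issue model in Python. It is not Bobcat's or Atom's actual scheduler: it assumes registers are already renamed (every destination name is unique), ignores structural hazards, caches, and retirement, and uses made-up latencies. Each cycle it issues up to `width` ready operations; in in-order mode, the first stalled operation blocks everything younger. The `Op`, `schedule`, and `PROGRAM` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    name: str
    dest: str               # destination register (assumed already renamed, so unique)
    srcs: tuple[str, ...]   # source registers
    latency: int            # cycles until the result can be consumed (>= 1)


def schedule(ops, width=2, in_order=False):
    """Toy issue model: return {op name: issue cycle}.

    Up to `width` ops issue per cycle. Out of order, any op whose sources are
    ready may issue; in order, the first stalled op blocks all younger ones.
    """
    # Results produced by ops in this program start out unavailable; any other
    # register (a live-in value) is treated as ready at cycle 0.
    ready_at = {op.dest: float("inf") for op in ops}
    issued = {}
    pending = list(ops)  # program order
    cycle = 0
    while pending:
        slots = width
        for op in list(pending):
            if slots == 0:
                break
            if all(ready_at.get(r, 0) <= cycle for r in op.srcs):
                issued[op.name] = cycle
                ready_at[op.dest] = cycle + op.latency
                pending.remove(op)
                slots -= 1
            elif in_order:
                break  # a stalled op blocks everything behind it
        cycle += 1
    return issued


# Hypothetical four-op sequence: a slow load, a dependent add, two independent ops.
PROGRAM = [
    Op("load", "r1", ("p",), 3),
    Op("add1", "r2", ("r1",), 1),
    Op("mul", "r3", ("a",), 1),
    Op("add2", "r4", ("b",), 1),
]

if __name__ == "__main__":
    print("out of order:", schedule(PROGRAM))
    print("in order:    ", schedule(PROGRAM, in_order=True))
```

In this made-up sequence, the independent `mul` and `add2` slip past the stalled `add1` when issue is out of order, so the last operation issues in cycle 3 rather than cycle 4. Real gains depend on the workload and on far more machinery than this sketch models.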
Of course, the thing that most distinguishes Bobcat from current Phenoms or Bulldozer is its focus on keeping power consumption low. Ticking off check boxes on a feature list won't always convey the impact of the thousands of little choices chip architects and designers make in fashioning a product like this one. For what it's worth, though, Bobcat does include fine-grained clock gating, power gating, and a low-power C6 idle state. AMD has also used physical register files for local storage, to avoid the power overhead associated with dynamic register mapping.
One problem for low-power x86 processors is how to handle another sort of overhead: the cost of decoding x86 instructions, which are often complex, into the simpler "micro-ops" or internal instructions the processor actually executes. Intel took a nearly CISC-like approach with the Atom, in which 96% of x86 instructions are translated into single or "fused" dual micro-ops. AMD's approach with Bobcat is similar. Burgess says 89% of x86 instructions decode to a single micro-op and 10% to a pair of micro-ops, while the remaining <1%, the more complex instructions, are handled in microcode.
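A quick back-of-the-envelope calculation shows why this decode mix keeps overhead low. The sketch below weights Burgess' percentages to get the expected micro-ops per x86 instruction. Two assumptions are ours: the "<1%" microcoded share is treated as an upper bound of exactly 1%, and the number of micro-ops a microcoded instruction expands to is a hypothetical input, since the presentation gives no figure for it.

```python
from fractions import Fraction

# Decode mix quoted for Bobcat: micro-ops produced -> share of x86 instructions.
FAST_PATH_MIX = {
    1: Fraction(89, 100),  # single micro-op
    2: Fraction(10, 100),  # pair of micro-ops
}
# The source gives "<1%" for microcoded instructions; 1% is used as an upper bound.
MICROCODE_SHARE = Fraction(1, 100)


def avg_uops_per_instruction(microcode_uops):
    """Expected micro-ops per x86 instruction.

    `microcode_uops` is a hypothetical average expansion for microcoded
    instructions; the source gives no figure for it.
    """
    fast_path = sum(n * share for n, share in FAST_PATH_MIX.items())
    return fast_path + MICROCODE_SHARE * microcode_uops


if __name__ == "__main__":
    for guess in (4, 10, 20):
        print(guess, float(avg_uops_per_instruction(guess)))
```

Even if each microcoded instruction expanded to a hypothetical 20 micro-ops, the average would be 1.29 micro-ops per x86 instruction. The rare microcoded cases barely move the total; what matters is that almost everything stays on the one- or two-micro-op fast path.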
Bobcat's x86 ISA support is quite extensive, too, with support for AMD's 64-bit extensions and for SSE through SSE3, plus SSSE3 and AMD's own SSE4A. The newest floating-point extensions, such as AVX and FMA, aren't supported, but they don't really square with Bobcat's mission in life. Notably, Bobcat does support AMD's secure virtualization instructions, which suggests this core might be employed as part of a cloud server platform at some point in the future.
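For readers who want to check which of these extensions a given machine exposes, here is a rough sketch for Linux that parses the `flags` field of `/proc/cpuinfo`. The flag names (`lm`, `ssse3`, `sse4a`, `svm`, `avx`, `fma`, `fma4`) are the Linux kernel's own; the helper names `parse_flags` and `feature_report` are hypothetical, and the sketch only reports what the kernel advertises. It doesn't identify the processor.

```python
# Linux /proc/cpuinfo flag names for the extensions discussed above.
FEATURES = {
    "lm": "AMD64 (64-bit long mode)",
    "ssse3": "SSSE3",
    "sse4a": "SSE4A",
    "svm": "AMD-V (Secure Virtual Machine)",
    "avx": "AVX",
    "fma": "FMA3",
    "fma4": "FMA4",
}


def parse_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def feature_report(flags):
    """Map each feature of interest to True if its flag is present."""
    return {label: name in flags for name, label in FEATURES.items()}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as fh:
            print(feature_report(parse_flags(fh.read())))
    except OSError:
        print("/proc/cpuinfo is unavailable (probably not a Linux system)")
```

On a chip with Bobcat's feature set as described above, one would expect `SSSE3`, `SSE4A`, and the AMD-V entry to come back `True` and the AVX and FMA entries `False`.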
Indeed, we're fascinated by the variety of prospects for Bobcat to expand its mission beyond notebooks and low-cost desktops when AMD so chooses. Intel has stated quite explicitly that Atom will push into ever-smaller form factors as progress allows, rather than gaining additional MIPS and FLOPS. Over the next few years, advances in manufacturing process and further integration could make it possible for Bobcat to fit into pocketable devices like smartphones, too, but AMD is keeping mum on that subject for now.
AMD could also choose to license the Bobcat core to third parties, much like ARM does with its processor architectures. Intel has tiptoed around the possibility with the Atom, but we've not seen much actual progress. Since AMD is now a pure-play design house—that is, it has spun off its manufacturing capacity into GlobalFoundries and focuses solely on chip design—licensing might make more sense for it than for Intel. In fact, AMD's graphics unit, the former ATI, has some experience on this front, having designed the Xbox 360 GPU for Microsoft. The fact that Ontario will be manufactured at TSMC opens up some possibilities for collaboration with other TSMC customers, although we'd be surprised to see AMD go the licensing route in this first generation of Bobcat-based devices.
For those who'd like to see them, we've assembled Burgess' detailed slides on the Bobcat microarchitecture in the image gallery attached to this story.