AMD’s reverse-HyperThreading declared a myth

Rumors regarding so-called “reverse-HyperThreading” have been running
wild ever since a report on French site X86-Secret suggested AMD was
preparing such a technology for an upcoming CPU architecture.
Reverse-HyperThreading was said to allow multiple CPU cores to emulate a
single one in order to improve performance in single-threaded
applications. X-bit labs later reported that AMD already had the
technology implemented in its current processors and was only waiting to
enable it, and folks on the XtremeSystems Forums claimed to have found
evidence that Intel had a similar technology implemented in its Core 2
processors.

All things must come to an end, it seems: both The Inquirer and X-bit
labs now report that AMD has no plans for a reverse-HT feature. The Inq
denies the technology’s existence completely, while X-bit labs’ source
says there is no such implementation in AMD’s current dual-core
processors. To top off these reports, Jon “Hannibal” Stokes from Ars
Technica has attempted to debunk initial claims about reverse-HT.

First off, there’s no way this would work the way the author [of one of the
stories] seems to think it would. How would the cores’ pipelines
support this in any phase of execution? In the fetch phase, there would
have to be some arbitration mechanism whereby the two cores fetched
alternate instruction blocks from the I-cache, thus distributing the
instruction stream across two processors.

Then, once the instruction stream is fragmented inside the two cores,
how are the register files kept in sync? If an add in one line of code
writes its result to a register in one core, then how could a test
instruction in the other core read that distant register to see if it
needs to branch? Or how would out-of-order execution work across two
cores? Would the instruction schedulers have their own separate bus to
communicate over?
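The register-file problem Stokes describes can be made concrete with a toy model. The Python sketch below is purely illustrative and rests on hypothetical assumptions: a fetch arbiter (`assign_cores`) that hands alternating blocks of instructions to two cores, and a four-instruction x86-like snippet written as (destination, sources) pairs. It does not model any real AMD or Intel pipeline; it only counts how many operand reads would have to fetch a register that was last written in the other core.

```python
from typing import Dict, List, Tuple

# Toy model (hypothetical): each instruction is (destination register, source registers).
Instr = Tuple[str, Tuple[str, ...]]


def assign_cores(num_instrs: int, block_size: int) -> List[int]:
    """Hypothetical fetch arbiter: alternate blocks of `block_size` instructions
    between core 0 and core 1."""
    return [(i // block_size) % 2 for i in range(num_instrs)]


def cross_core_reads(stream: List[Instr], block_size: int) -> int:
    """Count source operands whose most recent producer ran on the other core.

    Each such read would need the value shipped between the two cores'
    separate register files.
    """
    owner = assign_cores(len(stream), block_size)
    last_writer: Dict[str, int] = {}  # register name -> core that last wrote it
    transfers = 0
    for core, (dest, srcs) in zip(owner, stream):
        for reg in srcs:
            producer = last_writer.get(reg)
            if producer is not None and producer != core:
                transfers += 1
        last_writer[dest] = core
    return transfers


# An add, a move, a compare, then a conditional branch that reads the flags.
program: List[Instr] = [
    ("eax", ("eax", "ebx")),    # add eax, ebx
    ("ecx", ("eax",)),          # mov ecx, eax
    ("flags", ("ecx", "edx")),  # cmp ecx, edx
    ("eip", ("flags",)),        # jz target (needs the flags from cmp)
]

for size in (1, 2, 4):
    print(size, cross_core_reads(program, size))  # prints: 1 3 / 2 1 / 4 0
```

Smaller blocks spread the work more evenly but force more cross-core register traffic. Because every instruction in this snippet depends on the one before it, any split that actually uses both cores needs at least one transfer; only the "split" that leaves everything on core 0 avoids it.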

Stokes closes by saying that attempting to go into detail about the
rumored tech is akin to “asking how Superman could lift an entire
continent up into space without it breaking apart.”

Comments closed
    • danazar
    • 15 years ago

    Regarding the Superman thing, that’s actually somewhat easy to explain. The “continent” was a […]

    • Usacomp2k3
    • 15 years ago

    I’d be interested in hearing it straight from the horse’s mouth at both companies.

    • TSchniede
    • 15 years ago

    True, it is impossible without some serious combined logic (for both cores) to add both cores together to get one logical one. The biggest trouble is that the obvious way, just “gluing” both together to get one, doesn’t work for the reasons mentioned above. This would actually be something like an HT CPU with more logic to allow each “core” to work like a true one with half the pipelines. Unfortunately that is VERY difficult because of the high bandwidth needed to transport all the data between the registers, etc. (see above).
    A split CPU approach, where the pipelines and register files are separate, on the other hand returns diminishing results, because it is very unlikely to find instructions that are that independent.
    Other options would use the unused core for something else or share some resources, but practically every imaginable use either requires quite tight interlocking of the two cores (the design is completely integrated, not 2x single core + glue logic), or the possible advantage is probably less than the slowdown caused by communication.
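The point that instructions are rarely independent enough to split can be quantified with the textbook critical-path bound. The sketch below assumes unit latency, perfect register renaming, and a scalar core that finishes one instruction per step, which is generous to the two-core case (real superscalar cores already overlap independent instructions on their own). The function names and the two example sequences are hypothetical.

```python
from typing import Dict, List, Tuple

# Toy model (hypothetical): (destination register, source registers), unit latency.
Instr = Tuple[str, Tuple[str, ...]]


def critical_path_length(stream: List[Instr]) -> int:
    """Longest chain of read-after-write dependencies, in instructions.

    Assumes perfect register renaming, so only true dependencies matter.
    """
    ready: Dict[str, int] = {}  # register -> step at which its latest value exists
    longest = 0
    for dest, srcs in stream:
        step = 1 + max((ready.get(r, 0) for r in srcs), default=0)
        ready[dest] = step
        longest = max(longest, step)
    return longest


def two_core_speedup_bound(stream: List[Instr]) -> float:
    """Best-case speedup from spreading one thread over two scalar cores,
    ignoring ALL communication cost: min(2, instructions / critical path)."""
    if not stream:
        return 1.0
    return min(2.0, len(stream) / critical_path_length(stream))


chain = [("r1", ("r0",)), ("r2", ("r1",)), ("r3", ("r2",)), ("r4", ("r3",))]
independent = [("r1", ("r0",)), ("r2", ("r0",)), ("r3", ("r0",)), ("r4", ("r0",))]

print(two_core_speedup_bound(chain))        # 1.0: every step waits on the previous one
print(two_core_speedup_bound(independent))  # 2.0: capped by the number of cores
```

A pure dependency chain gains nothing however the cores are glued together, and even fully independent code tops out at 2x before any of the communication cost discussed in this thread is paid.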

    • duffy
    • 15 years ago

    News ‘reports’ that end with a question mark don’t need to be declared a myth. They’re that unless or until they can be verified.

    But thanks, X-bit labs, for admitting what we always knew about your credibility.

    • sluggo
    • 15 years ago

    It’s easy – Superman merely flies at super-speed and blows upward with his super-breath, thus supporting the entire continent. With his hands he catches the bunnies and other burrowing animals that fall through.

      • Vrock
      • 15 years ago

      Hmm, are you sure he can catch all the bunnies though? They’re so soft and cute and floofy….I couldn’t bear to think of a single bunny slipping through the cracks. It would make me cry.

      • Kreshna Aryaguna Nurzaman
      • 15 years ago

      y[

    • sigher
    • 15 years ago

    Although I obviously see the point in regard to the practicality of branching etc., I’d still say it would theoretically be possible to use the second core as a ‘math coprocessor’, for instance.
    Of course most math-intensive apps are MP-enabled and would not need it; still, wouldn’t it theoretically be doable to use the second core to speed up the first on single-threaded apps with such trickery?
    In the vein of using a GPU to do physics, although GPUs aren’t designed for it.

      • UberGerbil
      • 15 years ago

      Your analogy is flawed. GPUs are designed to do a lot of floating point math in a fire-and-forget way. That’s what the physics libraries do. So it’s not much of a stretch — they are in fact using GPUs for precisely what they’re designed for (the GPU doesn’t know or care why it is transforming a set of floating point values — it could be to render a scene, it could be to calculate forces, it makes no difference). Moreover, they can’t just upload some part of the x86 instruction stream to the GPU to do that: the code has to be written with that in mind. And if you’re rewriting your code, you can certainly do the same thing with a second core: just spawn a thread.

      But spawning a second thread is one thing; using two cores to execute a single thread is quite another. You can’t just send off a chunk of the instructions in a single thread to the other core. First you have communication issues: the K8 has no communication between the two L1 caches (unlike Conroe) and it doesn’t have a shared L2 cache (unlike Conroe), so your communication would have to happen across the System Request Queue. Latency would thus be worse than an L2 cache hit. It wouldn’t pay just for single instructions — by the time you got the calculation back, you could’ve dispatched and retired it locally. And you’d have to send more than that instruction, anyway: for it to work, you’d have to transfer all processor state it might depend on (architected and rename registers, etc.). That’s a lot of overhead, so to get any benefit you’d have to send several instructions, and the entire processor state. And then you have to deal with things like aliases (is the second core reading from cache/memory that the first is about to modify?) and dependent branches, etc. You risk stalling both processors transferring data around to keep them in sync.

      Now, if you’re going to go that far, it might be possible to do something like what Sun was talking about with their “scout threads” or what Intel demonstrated with their Mitosis project, using the second core to speculatively process ahead in the thread — but notice that both of those projects required not just work in the CPU but also changes in the compiler — it wouldn’t work with existing code. And neither has progressed beyond lab research. Lots of smart people have looked at this: if there was an easy win, they’d be doing it.
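The overhead argument in the comment above (state has to be shipped across and the result shipped back, so single instructions never pay) can be put into a back-of-the-envelope cost model. This sketch is best case by construction: it assumes the instructions are fully independent, cost the same on either core, and that state transfer plus round trip can be lumped into one fixed `overhead_cycles` figure. The function names and the numbers in the usage lines are hypothetical, not measured K8 or Conroe figures.

```python
def split_time(n: int, cycles_per_instr: float, overhead_cycles: float, offloaded: int) -> float:
    """Cycles to finish `n` mutually independent instructions when `offloaded`
    of them are shipped to the other core.

    The local core runs the rest; the offloaded part additionally pays a fixed
    `overhead_cycles` (hypothetical lump for state transfer plus round trip).
    """
    local = (n - offloaded) * cycles_per_instr
    remote = (overhead_cycles + offloaded * cycles_per_instr) if offloaded else 0.0
    return max(local, remote)


def min_profitable_chunk(cycles_per_instr: float, overhead_cycles: float) -> int:
    """Smallest n for which offloading half of n independent instructions
    finishes sooner than running all n locally."""
    if cycles_per_instr <= 0:
        raise ValueError("cycles_per_instr must be positive")
    n = 1
    while split_time(n, cycles_per_instr, overhead_cycles, n // 2) >= n * cycles_per_instr:
        n += 1
    return n


# Hypothetical numbers: 1 cycle per instruction, 100 cycles of combined overhead.
print(split_time(10, 1.0, 100.0, 5))     # 105.0, versus 10 cycles run locally
print(min_profitable_chunk(1.0, 100.0))  # 201
```

Under these assumptions, offloading half the work only wins once the chunk exceeds roughly 2 × overhead / cycles-per-instruction instructions; alias checks and dependent branches, which the model ignores, would push that threshold higher still.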

        • sigher
        • 15 years ago

        If you read my post you would see I understand that you cannot run normal CPU tasks written for one core on two cores, but I said that it might be possible to do math, for instance.
        I’d like to point out that AMD’s chips have a HyperTransport bus, which AMD recently even exposed to a socket, the reason being to allow add-in cards that do things like FPU calculations faster, plus other dedicated tasks. My thought was that rather than trying to make it one combined CPU, you make one core do slave tasks for the other. Since the cores are also on the HT bus, it IS theoretically possible to use one as a dedicated slave to the other for, let’s say, an FPU calculation speedup; the only question is whether the benefits would actually give real-world added power or whether the extra effort required would negate them.
        As for the non-shared caches, I think the HT bus is pretty speedy and might be able to do something in that regard.
        And as for the ‘if it was that simple it would have been done’ argument, that’s just silly. There are millions of dead-simple inventions that nobody ever bothered to try, or that never found a willing ear to help try them; if we thought like you suggest, the wheel would never have been invented. I’m afraid there are many obvious things that would work and are just waiting to finally be adopted, waiting for open minds.

          • Flying Fox
          • 15 years ago

          Co-processors mean you have to write specific code for them, or even spawn a “thread” that can run on them. So there is no way at present to abstract these and present them as one “virtual processor”.

          BTW, HyperTransport may be fast, but will never be the same as running full speed on the CPU itself.

            • sigher
            • 15 years ago

            I’m impressed by the sheer quality of ignoring what people say and just repeating the same lines over and over and over again, instead of responding, that I see in these comments. You should all have wonderful careers in politics.

          • UberGerbil
          • 15 years ago

          I read your post. Did you read mine? The System Request Interface I mentioned would be how the two cores exchange cache coherency data, and would be the obvious mechanism for your proposed coprocessor implementation: it comes […]

    • Fighterpilot
    • 15 years ago

    Glorious=pwned…good call Shin.

      • leor
      • 15 years ago

      stop sweatin your man crush 😛

      • Flying Fox
      • 15 years ago

      Myth != joke, so that thread title is still wrong. 🙂

    • Proesterchen
    • 15 years ago

    Bummer day for AMDroids. Poor souls, all caught in Reverse Hype. Cheapness is the order of the Ruiz. All hail the Ruiz.

      • sigher
      • 15 years ago

      “all caught”? what makes you think that? or is that how things work for you and are you projecting?

        • Proesterchen
        • 15 years ago

        Didn’t you see the hordes of AMDroids sticking to the Reverse Hype cause it was their last glimmer of hope to battle Conroe? I’m sure you can look it up in the recent news comments, and probably in the forums here at TR, too.

          • sigher
          • 15 years ago

          I guess I missed it, oh well.
          BTW, even if it was real it would only be a savior in single-threaded apps, and those are getting pretty long in the tooth and won’t last.

    • Shintai
    • 15 years ago

    Surprise, surprise… now let’s focus on some cheap consumer chips instead of the fantasy. And let’s drop the AMD+ATI or whatever it is as well these days 😛

      • Ricardo Dawkins
      • 15 years ago

      ahaha… you funny. So will you be coming here after Core 2’s successful launch?

        • Shintai
        • 15 years ago

        And what do you mean by that statement?

          • Ricardo Dawkins
          • 15 years ago

          nothing, dude… just chill a bit. 😀

        • Krogoth
        • 15 years ago

        It is more like the official NDA has fallen not really a product launch.
