TR Forums

NeRve · Thu Dec 30, 2004 11:22 pm

I heard that AMD64 is far superior in its 64-bit prowess to Intel's version, which actually borrowed some of the CPU instructions from AMD.

Hance · Thu Dec 30, 2004 11:28 pm

From what I have read, they are identical. Yes, Intel did copy AMD's instruction set, but that's a lot better than if they had gone out on their own. We would have had two different instruction sets, and that would have been a mess.

Corrado · Thu Dec 30, 2004 11:28 pm

NeRve wrote:
I heard that AMD64 is far superior in its 64-bit prowess to Intel's version, which actually borrowed some of the CPU instructions from AMD.

That is 100% true. Intel's '64bit' is really just 64 bit memory addressing, AMD's 64 is a true 64 bit implementation of x86.

Illissius · Fri Dec 31, 2004 5:25 am

Corrado wrote:
NeRve wrote:
I heard that AMD64 is far superior in its 64-bit prowess to Intel's version, which actually borrowed some of the CPU instructions from AMD.

That is 100% true. Intel's '64bit' is really just 64 bit memory addressing, AMD's 64 is a true 64 bit implementation of x86.

I don't think so. The only difference iirc is the way they handle >4GB of memory -- AMD handles it seamlessly, while Intel does some remapping thing that makes it slow, as far as I've managed to gather. The 'just 64 bit memory addressing' part is just marketing fluff, it's Intel's way of downplaying x86-64's importance. They both get a nice boost from 64-bit code. Unfortunately, despite the large number of articles on the topic, I have yet to see *one* where they have Intel - 32bit, Intel - 64bit, AMD - 32bit, AMD - 64bit matched up to see how their 64-bit implementations compare, which is what the real question is.

Fri Dec 31, 2004 11:13 am

Corrado wrote:
That is 100% true. Intel's '64bit' is really just 64 bit memory addressing, AMD's 64 is a true 64 bit implementation of x86.

No, actually that's not correct. Intel has been trying to spin it that way to avoid the perception that EM64T competes with Itanium, but it is just that -- spin.

EM64T includes all of the same 64-bit instructions that AMD64 does. It is compatible at the object code level -- Intel cloned the AMD64 instruction set. That would not be possible if it were "just 64-bit memory addressing".

Now, that said, there have also been rumors floating around that the first generation of EM64T-enabled CPUs from Intel will perform poorly in 64-bit mode. If these rumors are borne out, it may be an indication that the internal architecture isn't designed to do 64-bit computing efficiently. But this is something that Intel could easily address in the next revision.

just brew it! · Fri Dec 31, 2004 11:22 am

Illissius wrote:
I don't think so. The only difference iirc is the way they handle >4GB of memory -- AMD handles it seamlessly, while Intel does some remapping thing that makes it slow, as far as I've managed to gather.

You are thinking of PAE (Physical Address Extension), which has been present on Intel and AMD CPUs for a while now. The CPU is physically capable of addressing more than 4GB of RAM, but the instruction set is not 64-bit aware. AWE (Address Windowing Extensions) is a kludgy API for remapping memory beyond 4GB into a "window" in the low 4GB under application control, so that 32-bit instructions can access it. This is analogous to the EMS scheme used back in the DOS days to address memory beyond 640K.
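
To make the "window" idea concrete, here is a toy sketch of the remapping scheme described above. It is only a model: `WindowedMemory`, `map`, `read` and `write` are made-up names for illustration and are not the Windows AWE calls, and the page counts are arbitrary. The point is that the program can only touch a few small slots, and has to remap a slot by hand whenever it wants a different piece of the larger memory.

```python
PAGE = 4096  # bytes per page in this toy model

class WindowedMemory:
    """Toy model: a large pool of physical pages seen through a few window slots."""

    def __init__(self, physical_pages, window_slots):
        self.physical = [bytearray(PAGE) for _ in range(physical_pages)]
        self.window = [None] * window_slots  # slot index -> physical page index

    def map(self, slot, page):
        """Point a window slot at a physical page (the explicit remapping step)."""
        self.window[slot] = page

    def _locate(self, addr):
        slot, offset = divmod(addr, PAGE)
        page = self.window[slot]
        if page is None:
            raise ValueError("window slot is not mapped")
        return self.physical[page], offset

    def write(self, addr, value):
        page, offset = self._locate(addr)
        page[offset] = value

    def read(self, addr):
        page, offset = self._locate(addr)
        return page[offset]

# 16 "physical" pages, but the program only ever sees a 2-page window.
mem = WindowedMemory(physical_pages=16, window_slots=2)
mem.map(0, 12)            # slot 0 now shows physical page 12
mem.write(5, 0xAB)        # a small address that lands in page 12
mem.map(0, 3)             # remap: the same address now shows page 3
print(mem.read(5))        # page 3 was never written
mem.map(0, 12)
print(hex(mem.read(5)))   # the earlier write is still in page 12
```

Every access to a page outside the window costs an explicit `map` call first, which is the overhead a flat 64-bit address space avoids.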

Illissius · Fri Dec 31, 2004 12:02 pm

just brew it! wrote:
Illissius wrote:
I don't think so. The only difference iirc is the way they handle >4GB of memory -- AMD handles it seamlessly, while Intel does some remapping thing that makes it slow, as far as I've managed to gather.

You are thinking of PAE (Physical Address Extension), which has been present on Intel and AMD CPUs for a while now. The CPU is physically capable of addressing more than 4GB of RAM, but the instruction set is not 64-bit aware. AWE (Address Windowing Extensions) is a kludgy API for remapping memory beyond 4GB into a "window" in the low 4GB under application control, so that 32-bit instructions can access it. This is analogous to the EMS scheme used back in the DOS days to address memory beyond 640K.

Thanks for the explanation. So do both AMD and Intel have AWE, or just Intel? I've read in multiple places something along the lines of AMD having more efficient 64-bit addressing -- is that because Intel uses AWE even in 64-bit mode, or what?

Fri Dec 31, 2004 12:29 pm

Illissius wrote:
Thanks for the explanation. So do both AMD and Intel have AWE, or just Intel? I've read in multiple places something along the lines of AMD having more efficient 64-bit addressing -- is that because Intel uses AWE even in 64-bit mode, or what?

AWE is a feature of the OS. Historically I don't think it has been used on 32-bit AMD processors, because there weren't any Socket A chipsets or motherboards that supported >4GB of physical RAM. AFAIK Xeon-based servers are the only place it is ever used.

UberGerbil · Fri Dec 31, 2004 12:53 pm

AWE is software -- it's the API (in server versions of Windows) that 32bit apps use to map more than 2GB of real memory into their address space. This works today on P4s and Athlon XPs etc; it's ugly and slow and inefficient, and it is only available on server versions of Windows, but it has no real effect on the design of 64bit processors. AWE is implemented using something called PAE, which is an additional set of page table entries available (but traditionally unused) in the 32bit processors. These PTEs are part of the chip's memory management system (AWE is software, PAE is hardware) and they have extra bits beyond 32 (36 bits for memory addressing, and additional status bits). Windows XP SP2 actually turns these on when it runs on the Athlon 64 and the latest Xeons, even though it is running the chip in 32bit mode, to enable the NX bit (there would be nowhere to store that extra bit of information just using 32bit PTEs). This is done "under the covers" in the HAL, so software running on XP SP2 doesn't see anything different. Even if they exposed the new level of PTEs fully, this (by itself) wouldn't magically turn the system into a 64bit system: the registers are still 32bit and the software is only expecting to see pointers that are 32bit (actually 31 bit, as the upper 2GB of the address space is reserved for the system). Without using AWE, software can't take advantage of the larger PTEs, and even with it, it can only map up to 36 bits (64GB) of memory into its 32bit address space manually -- this is not 64bit computing.

When AMD designed the instruction set for the Opteron/A64, they did pretty much what anyone familiar with the x86 would expect to see in a 64bit version: they extended all the registers to 64bits, they added additional registers (because the x86 has always been a register-starved design, so if you're making a major architectural change you should try to fix some other warts too) and they extended addressing as well. They didn't actually extend addressing to a full 64bits, because that is simply more than anyone will need in the foreseeable future (16EB -- that's Exabytes). In 64bit mode the design supports 48bits (256TB) of virtual address space (pointers are 64bit, but the upper 16 bits must be a sign extension of bit 47) and 40bits (1TB) of physical memory. This means 64bit applications have "flat" addressable memory up to these limits and don't have to play messy games with things like AWE to access it.
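
For anyone who wants to check the arithmetic behind those limits, here is a quick sketch. The helper names (`human`, `address_space`, `is_canonical`) are mine, purely for illustration. The last function checks the AMD64 "canonical address" rule for a 48-bit implementation: bits 47 through 63 of a pointer all have to be the same.

```python
def human(n_bytes):
    """Format a power-of-two byte count using binary units (1KB = 1024B)."""
    units = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]
    i = 0
    while n_bytes >= 1024 and i < len(units) - 1:
        n_bytes //= 1024
        i += 1
    return f"{n_bytes}{units[i]}"

def address_space(bits):
    """Size of an address space that is `bits` wide."""
    return human(1 << bits)

def is_canonical(addr, va_bits=48):
    """True if bits (va_bits - 1)..63 of a 64-bit address are all 0 or all 1."""
    top = addr >> (va_bits - 1)                # bit 47 and everything above it
    all_ones = (1 << (64 - va_bits + 1)) - 1   # 17 one-bits when va_bits is 48
    return top == 0 or top == all_ones

for bits in (32, 36, 40, 48, 64):
    print(f"{bits} bits -> {address_space(bits)}")

print(is_canonical(0x00007FFFFFFFFFFF))  # top of the lower half
print(is_canonical(0xFFFF800000000000))  # bottom of the upper half
print(is_canonical(0x0000800000000000))  # in the hole between the two halves
```

The loop reproduces the figures quoted in this thread: 4GB for 32 bits, 64GB for 36, 1TB for 40, 256TB for 48 and 16EB for the full 64.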

When Intel swallowed its pride and added x86-64 to Xeon and eventually the P4 designs, it pretty much copied AMD exactly (this is nothing new: AMD copied Intel's SSE instructions exactly, and so on -- thanks to a lot of lawyers on both sides, they can both do this). And Intel pretty much had to: Microsoft was already writing Windows for the AMD version of x86-64, and there was no way Intel was going to convince MS to do a third version of 64bit Windows (Microsoft has been shipping Windows-64 for Itanium for some time). There are some minor differences between the two implementations, the most notable being that while the Xeon with EM64T supports 48bits of virtual memory just like AMD, it only supports 36bits (64GB) of physical memory. Note that this is exactly the same as the amount of memory supported on 32bits using PAE/AWE: systems implementing this have been available for some time, so it likely made for an easier transition for Intel and its partners. There's nothing to stop them from extending the supported physical memory to match AMD in future versions of their processors. (Incidentally, while it is essentially meaningless, the Itanium does support a full 64bit virtual address space, so don't be surprised if Intel gets around to touting that as an advantage, though of course in practical terms it really isn't).

Now, there's been some talk that Intel's initial implementation of 64bit (aka EM64T) is something of a kludge: the Xeon supports the full instruction set, addressing range, etc, but -- allegedly -- it internally is handling many of the operations as 32bit (they are still 64bit operations, but the chip has to do twice as much work internally to make them happen). This wouldn't be too surprising, given the rapid time to market and the fact that all modern x86 processors decode the x86 instruction set and handle it very differently internally anyway. However, this should not be construed (despite the wishes of some fanbois) as making the Xeon "not a true 64bit processor." By every external measure, it is a 64bit chip: it does the addressing, it has 64bit registers, it implements the full instruction set. If there is indeed an internal kludge, that will manifest itself as poorer relative performance. The Opteron and Xeon are both 64bit chips; the Opteron may just be more efficient. Since both are decoding their instructions to micro-ops, it is just as fair to say that neither is a true "x86-64" chip (since they are both executing x86 instructions as micro-ops) and it may be that the Opteron -- in this iteration -- is a better x86-64 chip. I have not seen any benchmarks comparing the two yet on true 64bit code, so that's still subject to debate AFAIK.
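
To illustrate what "a 64bit operation done as two 32bit steps" means in general, here is a minimal sketch of a 64-bit add built from two 32-bit adds with a carry between them. This is only the textbook technique; it is not a description of what the Xeon does internally (nobody in this thread knows that), and `add64_via_32` is a made-up name.

```python
MASK32 = 0xFFFFFFFF

def add64_via_32(a, b):
    """Add two 64-bit values using only 32-bit-wide pieces plus a carry."""
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32

    lo = a_lo + b_lo                       # first 32-bit add
    carry = lo >> 32                       # 1 if the low half overflowed
    hi = (a_hi + b_hi + carry) & MASK32    # second 32-bit add, dependent on the first

    return (hi << 32) | (lo & MASK32)

print(hex(add64_via_32(0xFFFFFFFF, 1)))    # carry ripples into the high half
print(hex(add64_via_32(2**64 - 1, 1)))     # wraps around like a 64-bit register
```

The result is identical to a native 64-bit add; the only difference is that the second step has to wait for the carry out of the first, which is where any performance penalty would come from.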

Intel will undoubtedly improve the Xeon in the future (to extend the physical address space, among other things). But the Opteron still has a huge advantage in its on-chip memory controller. This gives it lower latency, which often matters now, and equal or better bandwidth, which will matter especially when we start seeing multiple cores on die. And having the memory controller on the chip means memory bandwidth scales as you add processors without requiring more complicated and expensive chipsets. Because systems accessing large amounts of memory typically use more than one processor, this is a huge win in the small server market. For these reasons HP, for one, is already telling customers looking at its 4P Proliant servers that Opteron-based systems are better performing than Xeon-based ones (HP sells both) -- even for 32bit code.

You can read an HP document (warning: 1.3MB PDF) that compares the two designs in more detail and actually makes the recommendation I mentioned (page 22).

Edit: fixed some typos, etc.

Fri Dec 31, 2004 1:03 pm

UberGerbil wrote:
Intel will undoubtably improve the Opteron in the future (to extend the physical address space, among other things).

Should say "Xeon". Although if Intel improves the Xeon, that will indirectly force AMD to improve the Opteron as well. :wink:

Excellent post, BTW. Sums up the entire AMD64/EM64T situation quite well.

Flowboy · Fri Dec 31, 2004 2:57 pm

Also, 64-bit Windows is an LLP64 system, i.e. in C/C++ only long longs and pointers move out to 64 bits. This has the advantage that only values that need to be 64 bits are resized; this way structures and classes take less space, and that has a positive effect on performance due to lower cache / memory consumption. It might also help Intel if their internal design is splitting 64 bit ops into two 32 bit ops.

Linux is LP64 on 64 bits: longs and pointers move out to 64 bits, while ints stay at 32 bits.
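
If you want to see which data model your own system uses, here is a small sketch that asks the platform's C ABI for the type sizes through Python's standard `ctypes` module. The `data_model` helper and its lookup table are my own illustration; the conventional names are ILP32 (everything 32-bit), LLP64 (only long long and pointers are 64-bit) and LP64 (long and pointers are 64-bit, int stays 32-bit).

```python
import ctypes

# (int, long, long long, pointer) sizes in bytes -> conventional model name
MODELS = {
    (4, 4, 8, 4): "ILP32",
    (4, 4, 8, 8): "LLP64",
    (4, 8, 8, 8): "LP64",
    (8, 8, 8, 8): "ILP64",
}

def data_model():
    """Return (model name, sizes) for the C ABI this Python build was compiled against."""
    sizes = (
        ctypes.sizeof(ctypes.c_int),
        ctypes.sizeof(ctypes.c_long),
        ctypes.sizeof(ctypes.c_longlong),
        ctypes.sizeof(ctypes.c_void_p),
    )
    return MODELS.get(sizes, "other"), sizes

model, sizes = data_model()
print("int, long, long long, pointer:", sizes)
print("data model:", model)
```

The answer reflects how the Python interpreter itself was built, so a 32-bit Python on a 64-bit OS will still report ILP32.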

APWNH · Fri Jan 07, 2005 10:28 pm

Ubergerbil, you work in this field, right?

UberGerbil · Mon Jan 10, 2005 3:39 am

APWNH wrote:
Ubergerbil, you work in this field, right?

Uh, yeah. For longer than I'd care to admit. :-?

HiggsBoson · Mon Jan 10, 2005 4:49 am

UberGerbil++

w00t!

Yahoolian · Mon Jan 10, 2005 6:53 pm

A64 3500+ vs EM64T Xeon -Anandtech

But it's unclear if any 64bit math operations were performed.

I think a better way of finding out if Intel's 64bit is doing 2x32bit for each 64bit register would be to compare a P4 without 64bit with the equivalent EM64T processor.

UberGerbil · Tue Jan 11, 2005 8:29 pm

Yahoolian wrote:
A64 3500+ vs EM64T Xeon -Anandtech

But it's unclear if any 64bit math operations were performed.

I think a better way of finding out if Intel's 64bit is doing 2x32bit for each 64bit register would be to compare a P4 without 64bit with the equivalent EM64T processor.

Yeah, I saw that article when it came out. It's close to useless (when it first came out it was completely useless because they were using the wrong graphs). Only a couple of the tests are 64bit and there's not enough information about the code that was emitted by the compiler. You'd really have to do synthetic tests on 64bit values, and even then it wouldn't necessarily be clear. Ultimately it will be the guys writing the compilers that are likely to notice any oddities about Intel's implementation.

danazar · Tue Jan 11, 2005 9:03 pm

This may sound bizarre... but... Do the EM64T Xeons have double the registers in 64-bit mode like the Opterons do?

I could see Intel doing something like this, supporting 64-bit calculations and native 64-bit memory addressing but leaving the number of registers the same as in 32-bit mode to "cripple" x86-64, so that people would continue to see a real performance benefit when using Itanium...

Of course, this would render 64-bit software already coded for x86-64 incompatible if it were written to take advantage of all those registers... could this be why Microsoft delayed XP-64 for so long after Intel announced its 64-bit variant, to make sure XP-64 would run on these things too?

Edit: Never mind, I went and read the whitepaper on EM64T, and it does note that the Xeons support "eight additional general-purpose registers" in 64-bit mode.

UberGerbil · Tue Jan 11, 2005 9:17 pm

danazar wrote:
This may sound bizarre... but... Do the EM64T Xeons have double the registers in 64-bit mode like the Opterons do?

Yes, they do. As I said above, other than the physical addressing difference, Intel's implementation is identical to AMD's wrt registers, instruction set, and virtual address space. Note that in addition to the added GPRs there are also 8 additional SSE registers (which should help for matrix operations for 3D, etc).

1.2.2. 64-Bit Mode
64-bit mode is used by 64-bit applications running under a 64-bit operating system. It supports the following features:
• Architectural support for 64-bits of linear address; however IA-32 processors supporting 64-bit extension technology may decide to implement less then 64-bits, see Section 1.3.3.3 and Section 1.5.2.
• Register extensions accessible through a set of new opcode prefixes (REX)
• Existing general purpose registers are widened to 64-bits (RAX, RBX, RCX, RDX, RSI, RDI, RBP, RSP)
• Eight new general purpose registers (R8–R15)
• Eight new 128-bit streaming SIMD extension (SSE) registers (XMM8–XMM15)
• A 64-bit instruction pointer (RIP)
• A New RIP-relative data addressing mode
• Can use flat address space with single code, data, and stack space
• Extended and new instructions
• Physical address support greater than 64 GB; however the actual physical address size of IA-32 processors supporting 64-bit extension technology is implementation specific.
• New interrupt priority control mechanism
64-bit mode is enabled by the operating system on a code-segment basis. Its default address size is 64 bits; and its default operand size is 32 bits. Note that these defaults can be overridden on an instruction-by-instruction basis using the new REX opcode prefixes. The REX prefix allows a 64-bit operand to be specified when operating in 64-bit mode. By using this mechanism, many existing instructions have been modified or redefined to allow usage of the larger 64-bit registers and 64-bit addresses.

-- Section 1.2.2, page 1-2, Intel "64-Bit Extension Technology Software Developer’s Guide, Volume 1" (1.7MB PDF)
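
To show how the REX prefix mentioned in that excerpt fits together, here is a small sketch that builds the prefix byte and encodes a register-to-register MOV. The function names are made up for illustration. What is standard x86-64 encoding: the REX byte has the bit layout 0100WRXB, W selects a 64-bit operand, R and B supply the extra fourth bit needed to reach R8 through R15, and 0x89 is the "MOV r/m, r" opcode. Only the register-to-register form is handled here.

```python
def rex_prefix(w=0, r=0, x=0, b=0):
    """Build a REX prefix byte: 0100 W R X B."""
    return 0x40 | (w << 3) | (r << 2) | (x << 1) | b

def encode_mov_reg_reg(dst, src, wide=True):
    """Encode MOV dst, src (opcode 0x89) for register numbers 0-15.

    Register numbers: 0=RAX/EAX, 1=RCX, 2=RDX, 3=RBX, ..., 8-15=R8-R15.
    """
    rex = rex_prefix(w=int(wide), r=src >> 3, b=dst >> 3)
    modrm = 0xC0 | ((src & 7) << 3) | (dst & 7)   # mod=11: both operands are registers
    body = bytes([0x89, modrm])
    # With no bits set (0x40) the prefix is not needed for these registers.
    return body if rex == 0x40 else bytes([rex]) + body

print(encode_mov_reg_reg(0, 3).hex(" "))              # mov rax, rbx
print(encode_mov_reg_reg(0, 3, wide=False).hex(" "))  # mov eax, ebx (default 32-bit operand)
print(encode_mov_reg_reg(8, 0).hex(" "))              # mov r8, rax
```

Note how the 32-bit form needs no prefix at all, which matches the "default operand size is 32 bits" statement in the excerpt: the extra byte is only paid when an instruction actually uses 64-bit operands or the new registers.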

Microsoft may have been delayed somewhat in testing Windows-64 because they would want to test it on Intel's implementation, but that would not have slowed their coding and they would have had silicon from Intel before it was released for sale (Microsoft has found compatibility bugs in pre-release Intel chips in the past); in any case, since Xeons with EM64T are now available and Windows-64 is not, that isn't the gating factor -- drivers are. Microsoft has a whole lot of other companies they have to convince to write 64bit code and test with Windows before they can go out the door.

The 64-bit debate: AMD64 vs Intel EMT64