Hypercube Memory on Xeon Phi (Knights Landing?)

Posted on Sun Jul 06, 2014 1:59 pm

I read up on the Hybrid Memory Cube ("Hypercube") memory that Micron and Intel are using in an upcoming Xeon chip. Does anyone think it'll catch on in commodity PCs? What sort of cooling challenges do you think it might involve?

I hope we don't end up with an MCM "Hershey Bar" à la Slot A.
Hz so good
Gerbil XP
 
Posts: 466
Joined: Wed Dec 04, 2013 5:08 pm

Re: Hypercube Memory on Xeon Phi (Knights Landing?)

Posted on Sun Jul 06, 2014 2:15 pm

It'll probably trickle down to commodity systems eventually, assuming the tech catches on at all. It sounds like a good way to pack more (and faster) RAM into a smaller footprint. Yeah, power dissipation might be an issue, but AFAIK most of the power consumed by DRAM is due to the bus interface, so I'd say there's a good chance these things won't require active cooling.
(this space intentionally left blank)
just brew it!
Administrator
Gold subscriber
 
 
Posts: 37510
Joined: Tue Aug 20, 2002 10:51 pm
Location: Somewhere, having a beer

Re: Hypercube Memory on Xeon Phi (Knights Landing?)

Posted on Sun Jul 06, 2014 2:36 pm

just brew it! wrote:It'll probably trickle down to commodity systems eventually, assuming the tech catches on at all. It sounds like a good way to pack more (and faster) RAM into a smaller footprint. Yeah, power dissipation might be an issue, but AFAIK most of the power consumed by DRAM is due to the bus interface, so I'd say there's a good chance these things won't require active cooling.



That's awesome! I was worried about the cooling. I think Intel or Micron stated they can embed 16GB of RAM on it, which should make for a very interesting L3 cache. That's more RAM than my computer plus video card combined! I've only got 8GB DDR3, and 3GB GDDR5.
Hz so good
Gerbil XP
 
Posts: 466
Joined: Wed Dec 04, 2013 5:08 pm

Re: Hypercube Memory on Xeon Phi (Knights Landing?)

Posted on Sun Jul 06, 2014 4:37 pm

If HMC can get 16 GB capacity in say 4 cubes*, I could potentially see a Skylake SoC using it as primary memory. 16 GB is going to be a 'good enough' amount of memory for many users even 2 years from now. HMC offers many of the benefits the current notebook market is trending towards: smaller due to less board space, lower power, and higher performance. The only negative would be cost, which would be compounded by Intel likely pairing HMC with a die with a more powerful GPU to make use of the extra memory bandwidth. The cooling aspect of HMC shouldn't be any better/worse than current Haswell GT3e parts with the external embedded DRAM chip in the package. From a system level power perspective, HMC should be on par with the eDRAM power consumption, so the external DDR3L memory could effectively be removed from the equation. This of course is my own speculation.

*OK, terminology issue here. HMC is to be included in the same package as the processor. You can't say HMC die because, well, it is multiple dies. You can't say HMC package because it really isn't its own package that'd be included in the processor package. It's also not a package-on-package situation like some ultra-mobile SoCs use, either.
Dual Opteron 6376, 128 GB DDR3, Asus KGPE-D16, Radeon 6970
Mac Pro Dual Xeon E5645, 48 GB DDR3, GTX 770
Core i7 3930K@4.2 Ghz, 32 GB DDR3, GA-X79-UP5-Wifi
Core i7 2600K@4.4 Ghz, 16 GB DDR3, Radeon 6870, GA-X68XP-UD4
the
Gerbil
Gold subscriber
 
 
Posts: 57
Joined: Tue Jun 29, 2010 2:26 am

Re: Hypercube Memory on Xeon Phi (Knights Landing?)

Posted on Sun Jul 06, 2014 6:16 pm

The first consumer application will be for GPUs...I feel certain I remember an NVidia slide saying they would have HMC for a chip in a few generations, something like 2016.
MadManOriginal
Graphmaster Gerbil
 
Posts: 1404
Joined: Wed Jan 30, 2002 7:00 pm
Location: In my head...

Re: Hypercube Memory on Xeon Phi (Knights Landing?)

Posted on Sun Jul 06, 2014 7:37 pm

MadManOriginal wrote:The first consumer application will be for GPUs...I feel certain I remember an NVidia slide saying they would have HMC for a chip in a few generations, something like 2016.


nVidia first announced support for 3D memory with their Volta chip back in 2013. Since then, Volta has been moved to presumably a 2017/2018 spot, with Pascal appearing in 2016. The arrival of 3D memory hasn't changed, as Pascal has that feature targeted for the high end, though nVidia has backed down from their 1 TB/s target bandwidth in 2016, citing the costs involved.

The only one who hasn't put a form of stacked memory on their roadmap has been AMD. The reasoning for that is rather straightforward: they don't publish a discrete GPU roadmap for the public (they do for CPUs and their SoCs).
the
Gerbil
Gold subscriber
 
 
Posts: 57
Joined: Tue Jun 29, 2010 2:26 am

Re: Hypercube Memory on Xeon Phi (Knights Landing?)

Posted on Sun Jul 06, 2014 9:30 pm

the wrote:If HMC can get 16 GB capacity in say 4 cubes*, I could potentially see a Skylake SoC using it as primary memory. 16 GB is going to be a 'good enough' amount of memory for many users even 2 years from now. HMC offers many of the benefits the current notebook market is trending towards: smaller due to less board space, lower power, and higher performance. The only negative would be cost, which would be compounded by Intel likely pairing HMC with a die with a more powerful GPU to make use of the extra memory bandwidth. The cooling aspect of HMC shouldn't be any better/worse than current Haswell GT3e parts with the external embedded DRAM chip in the package. From a system level power perspective, HMC should be on par with the eDRAM power consumption, so the external DDR3L memory could effectively be removed from the equation. This of course is my own speculation.

*OK, terminology issue here. HMC is to be included in the same package as the processor. You can't say HMC die because, well, it is multiple dies. You can't say HMC package because it really isn't its own package that'd be included in the processor package. It's also not a package-on-package situation like some ultra-mobile SoCs use, either.




Wouldn't the extra bandwidth be better spent, as you say, on multiple users simultaneously accessing that resource on a server, rather than on a stupid GPU? nVidia already allows GPU sharing in VMware, and I don't see Intel wanting to fight that battle. Maybe this is more a shot at ARM, by allowing a bunch of users to share that CPU with minimal speed hit.
Hz so good
Gerbil XP
 
Posts: 466
Joined: Wed Dec 04, 2013 5:08 pm

Re: Hypercube Memory on Xeon Phi (Knights Landing?)

Posted on Sun Jul 06, 2014 10:02 pm

Hz so good wrote:Wouldn't the extra bandwidth be better spent, as you say, on multiple users simultaneously accessing that resource on a server, rather than on a stupid GPU? nVidia already allows GPU sharing in VMware, and I don't see Intel wanting to fight that battle. Maybe this is more a shot at ARM, by allowing a bunch of users to share that CPU with minimal speed hit.


It is about what Intel is currently doing: dropping power consumption and driving smaller form factors. HMC with a mobile CPU enables the removal of the DRAM on the motherboard while maintaining power consumption in the package. The result is lower platform power for a motherboard that will go into smaller/thinner/lighter laptops. This also significantly boosts the speed of Intel's GPU in this area, as it should provide more bandwidth than the eDRAM while cutting a tad bit of the latency involved in the extra layer of cache.*

For the server market, Intel could use HMC as an L4 cache in the future, as 16 GB isn't that large as main memory for many server data sets. That's why Intel is outfitting Knights Landing with DDR4 memory controllers in addition to the HMC pool. This enables some interesting possibilities with Knights Landing, like certain database applications (think Hadoop or Cassandra) that would be impractical on Knights Corner given the need for data shuffling.

*The L4 cache on Haswell GT3e is purely a victim cache design that gets checked before main memory. The latency improvement comes from removing the worst-case scenario of a miss in the L4, with HMC presumably having similar overall latency to the eDRAM. Win-win.
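
For anyone unfamiliar with the term, here is a rough toy model of a victim cache sitting between a last-level cache and main memory. It is only a sketch of the general idea: the class names, the LRU policy, the tiny capacities and the dictionary standing in for DRAM are all assumptions made for illustration, not a description of how the GT3e eDRAM is actually managed.

```python
from collections import OrderedDict

_MISS = object()  # sentinel so that cached values may legitimately be None


class VictimCache:
    """Toy LRU victim cache: it is filled only with lines evicted from the level above."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def insert_evicted(self, addr, data):
        # Store a line that the upper cache just threw out.
        self.lines[addr] = data
        self.lines.move_to_end(addr)
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # drop the oldest victim

    def lookup(self, addr):
        # On a hit the line is removed here, because it moves back up a level.
        return self.lines.pop(addr, _MISS)


class Hierarchy:
    """Hypothetical L3 -> victim L4 -> memory read path."""

    def __init__(self, l3_capacity, l4_capacity, memory):
        self.l3 = OrderedDict()
        self.l3_capacity = l3_capacity
        self.l4 = VictimCache(l4_capacity)
        self.memory = memory  # plain dict standing in for DRAM
        self.stats = {"l3": 0, "l4": 0, "mem": 0}

    def read(self, addr):
        if addr in self.l3:
            self.l3.move_to_end(addr)
            self.stats["l3"] += 1
            return self.l3[addr]
        data = self.l4.lookup(addr)  # checked before going to main memory
        if data is _MISS:
            data = self.memory[addr]
            self.stats["mem"] += 1
        else:
            self.stats["l4"] += 1
        self._fill_l3(addr, data)
        return data

    def _fill_l3(self, addr, data):
        self.l3[addr] = data
        if len(self.l3) > self.l3_capacity:
            victim_addr, victim_data = self.l3.popitem(last=False)
            self.l4.insert_evicted(victim_addr, victim_data)
```

Reading addresses 0, 1, 2 and then 0 again through `Hierarchy(2, 2, memory)` pushes address 0 out of the L3 and into the victim cache, so the final read is served from the L4 rather than from memory. The miss path is the part that costs you: a lookup that misses in the L4 has paid for the check and still has to go out to DRAM.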

Edit: clarified HMC in server CPUs as cache, as 16 GB isn't enough for main memory.
Last edited by the on Sun Jul 13, 2014 11:43 pm, edited 1 time in total.
the
Gerbil
Gold subscriber
 
 
Posts: 57
Joined: Tue Jun 29, 2010 2:26 am

Re: Hypercube Memory on Xeon Phi (Knights Landing?)

Posted on Sun Jul 06, 2014 10:17 pm

the wrote:
Hz so good wrote:Wouldn't the extra bandwidth be better spent, as you say, on multiple users simultaneously accessing that resource on a server, rather than on a stupid GPU? nVidia already allows GPU sharing in VMware, and I don't see Intel wanting to fight that battle. Maybe this is more a shot at ARM, by allowing a bunch of users to share that CPU with minimal speed hit.


It is about what Intel is currently doing: dropping power consumption and driving smaller form factors. HMC with a mobile CPU enables the removal of the DRAM on the motherboard while maintaining power consumption in the package. The result is lower platform power for a motherboard that will go into smaller/thinner/lighter laptops. This also significantly boosts the speed of Intel's GPU in this area, as it should provide more bandwidth than the eDRAM while cutting a tad bit of the latency involved in the extra layer of cache.*

For the server market, Intel could use HMC as an L4 cache in the future, but for servers 16 GB isn't that large for many data sets. That's why Intel is outfitting Knights Landing with DDR4 memory controllers in addition to the HMC pool. This enables some interesting possibilities with Knights Landing, like certain database applications (think Hadoop or Cassandra) that would be impractical on Knights Corner given the need for data shuffling.

*The L4 cache on Haswell GT3e is purely a victim cache design that gets checked before main memory. The latency improvement comes from removing the worst-case scenario of a miss in the L4, with HMC presumably having similar overall latency to the eDRAM. Win-win.



Ooooh, OK! I like reading about chips and have a fairly decent grasp on how they work, but I couldn't tell you the best use for one, even if you handed me a directions sheet. :)

I like perusing Ars (and read Hannibal's book), RealWorldTech, SemiAccurate, TR, and even the dreaded AnandTech sometimes. Makes me feel smarter than I am. :P

The only prognostication I think I'm correct about is Huawei US poaching a bunch of Cisco ASIC guys. If you believe Der Spiegel, I think Huawei unknowingly hired one or more people who could sneak backdoors into all their ASICs for the NSA. That, or my tinfoil hat needs adjusting.

This chart, however, makes me wanna wrap the whole house in tinfoil!
Hz so good
Gerbil XP
 
Posts: 466
Joined: Wed Dec 04, 2013 5:08 pm

Re: Hypercube Memory on Xeon Phi (Knights Landing?)

Posted on Sun Jul 06, 2014 10:56 pm

the wrote:
For the server market, Intel could use HMC as an L4 cache in the future, but for servers 16 GB isn't that large for many data sets. That's why Intel is outfitting Knights Landing with DDR4 memory controllers in addition to the HMC pool. This enables some interesting possibilities with Knights Landing, like certain database applications (think Hadoop or Cassandra) that would be impractical on Knights Corner given the need for data shuffling.



That got me thinking, while having a smoke and walking the dogs. I have zero experience with large DBs (beyond *hating* MS-SQL classes), and when you mentioned data shuffling, it got me thinking about some ASICs found in switches. Cisco normally just gives you a high level overview of how their ASICs work, but they did put a very heavy emphasis on TCAM. I have to know about it, since it allows you to complete a search of the entire table of MAC addresses and VACLs in a single cycle. Does enterprise-level DB hardware (think IBM, Oracle) contain any TCAM to speed up lookups and comparisons? Or is it too expensive to use in bulk amounts (like 1T-SRAM)?
Hz so good
Gerbil XP
 
Posts: 466
Joined: Wed Dec 04, 2013 5:08 pm

Re: Hypercube Memory on Xeon Phi (Knights Landing?)

Posted on Sun Jul 06, 2014 11:08 pm

CAM does not scale well and is rather inflexible, since the comparison logic needs to be wired into the physical hardware. It isn't practical to use it for arbitrary sized database tables with arbitrary keys.
(this space intentionally left blank)
just brew it!
Administrator
Gold subscriber
 
 
Posts: 37510
Joined: Tue Aug 20, 2002 10:51 pm
Location: Somewhere, having a beer

Re: Hypercube Memory on Xeon Phi (Knights Landing?)

Posted on Mon Jul 07, 2014 1:04 pm

just brew it! wrote:CAM does not scale well and is rather inflexible, since the comparison logic needs to be wired into the physical hardware. It isn't practical to use it for arbitrary sized database tables with arbitrary keys.



So there's never some portion of a DB that constantly gets compared more often than other parts? Genuinely curious, cause like I said, I barely stayed awake in SQL courses.

So brute forcing it via DDR4 or some Cube memory is the better way to go? Well, it worked for the GPU industry, so why not! :)
Hz so good
Gerbil XP
 
Posts: 466
Joined: Wed Dec 04, 2013 5:08 pm

Re: Hypercube Memory on Xeon Phi (Knights Landing?)

Posted on Mon Jul 07, 2014 1:39 pm

Hz so good wrote:So there's never some portion of a DB that constantly gets compared more often than other parts? Genuinely curious, cause like I said, I barely stayed awake in SQL courses.

Well, there certainly *could* be.

But the additional complexity and overhead of figuring out what part of the database index to keep in the CAM, narrow applicability, and the fact that the data record you actually care about probably still needs to be fetched from traditional RAM or secondary storage anyway (you'd want to keep just the index in CAM because CAM is very expensive), means it just isn't going to be economical even for frequently accessed data.

CAMs are great for looking up small items based on small keys, in situations where you need response times measured in nanoseconds. That's why they are used in network switching/routing equipment. In a typical database application, other delays in the system are going to be on the order of microseconds or even milliseconds; reducing the lookup time for some subset of your database keys to the nanosecond level (at great expense!) just doesn't make sense.
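
To make the "small items, small keys" point concrete, here is a rough software model of a ternary CAM lookup of the kind a switch might use for prefix matching. It is a sketch under stated assumptions: `TcamEntry`, `tcam_lookup`, the port names and the example prefixes are all made up for illustration, and the "first matching entry wins" rule is an assumed priority scheme. The key difference from real hardware is that a TCAM compares the key against every entry at once, while this loop does it one entry at a time.

```python
from typing import List, NamedTuple, Optional


class TcamEntry(NamedTuple):
    value: int   # bit pattern to match
    mask: int    # 1 bits must match, 0 bits are "don't care"
    result: str  # what the lookup returns on a match (e.g. an egress port)


def tcam_lookup(table: List[TcamEntry], key: int) -> Optional[str]:
    """Return the result of the first entry whose masked bits equal the key's.

    Hardware evaluates every entry in parallel and a priority encoder picks
    the winner; this loop is only a sequential stand-in for that behaviour.
    """
    for entry in table:
        if (key & entry.mask) == (entry.value & entry.mask):
            return entry.result
    return None


# Hypothetical IPv4 forwarding table, most specific prefix first.
TABLE = [
    TcamEntry(0x0A010000, 0xFFFF0000, "port2"),    # 10.1.0.0/16
    TcamEntry(0x0A000000, 0xFF000000, "port1"),    # 10.0.0.0/8
    TcamEntry(0x00000000, 0x00000000, "default"),  # matches anything
]
```

With that table, `tcam_lookup(TABLE, 0x0A010203)` (10.1.2.3) hits the /16 entry, while `tcam_lookup(TABLE, 0xC0A80001)` (192.168.0.1) falls through to the default. Note that the whole table has to fit in the device and each entry is a fixed-width pattern, which is a long way from an arbitrary-sized database index with arbitrary keys.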

I could maybe see it being useful in a HFT (High Frequency Trading) system. But that's a pretty specialized type of database, in a niche where nanoseconds make the difference between making and losing large sums of money. For the pieces of infrastructure requiring that level of response time, the HFT guys roll their own hardware (using FPGAs, ASICs, etc.); they're not using general purpose commercial databases.
(this space intentionally left blank)
just brew it!
Administrator
Gold subscriber
 
 
Posts: 37510
Joined: Tue Aug 20, 2002 10:51 pm
Location: Somewhere, having a beer

Re: Hypercube Memory on Xeon Phi (Knights Landing?)

Posted on Mon Jul 07, 2014 1:48 pm

just brew it! wrote:
Hz so good wrote:So there's never some portion of a DB that constantly gets compared more often than other parts? Genuinely curious, cause like I said, I barely stayed awake in SQL courses.

Well, there certainly *could* be.

But the additional complexity and overhead of figuring out what part of the database index to keep in the CAM, narrow applicability, and the fact that the data record you actually care about probably still needs to be fetched from traditional RAM or secondary storage anyway (you'd want to keep just the index in CAM because CAM is very expensive), means it just isn't going to be economical even for frequently accessed data.

CAMs are great for looking up small items based on small keys, in situations where you need response times measured in nanoseconds. That's why they are used in network switching/routing equipment. In a typical database application, other delays in the system are going to be on the order of microseconds or even milliseconds; reducing the lookup time for some subset of your database keys to the nanosecond level (at great expense!) just doesn't make sense.

I could maybe see it being useful in a HFT (High Frequency Trading) system. But that's a pretty specialized type of database, in a niche where nanoseconds make the difference between making and losing large sums of money. For the pieces of infrastructure requiring that level of response time, the HFT guys roll their own hardware (using FPGAs, ASICs, etc.); they're not using general purpose commercial databases.



Ah, ok. I gotcha now. I was just idly curious, since I know TCAMs can look up an address and apply Virtual ACLs all in one op. I guess it wasn't as handy as I thought. It does make me wonder how the "Big Iron" IBM machines can do all that work as fast as they do. I've only got manuals for their lines, ranging from AS/400 to the new z/whatever, but I don't fully understand it.
Hz so good
Gerbil XP
 
Posts: 466
Joined: Wed Dec 04, 2013 5:08 pm

Re: Hypercube Memory on Xeon Phi (Knights Landing?)

Posted on Mon Jul 07, 2014 1:50 pm

just brew it! wrote:the HFT guys roll their own hardware (using FPGAs, ASICs, etc.); they're not using general purpose commercial databases.



Oh, btw, Cisco has a Nexus 4000 line expressly for HFT. I only know this from studying for the CCNA-DC exam, and tinkering with the NX-OS simulator. Not that I'll ever get to play with those 4000 series...
Hz so good
Gerbil XP
 
Posts: 466
Joined: Wed Dec 04, 2013 5:08 pm

Re: Hypercube Memory on Xeon Phi (Knights Landing?)

Posted on Mon Jul 07, 2014 1:59 pm

Hz so good wrote:
just brew it! wrote:the HFT guys roll their own hardware (using FPGAs, ASICs, etc.); they're not using general purpose commercial databases.

Oh, btw, Cisco has a Nexus 4000 line expressly for HFT. I only know this from studying for the CCNA-DC exam, and tinkering with the NX-OS simulator. Not that I'll ever get to play with those 4000 series...

Yup, given how that industry operates, HFT firms would be willing to pay a hefty premium just to shave a few nanoseconds off of their ping times. In the vast majority of non-HFT use cases, a few nanoseconds of network latency here and there just doesn't matter.
(this space intentionally left blank)
just brew it!
Administrator
Gold subscriber
 
 
Posts: 37510
Joined: Tue Aug 20, 2002 10:51 pm
Location: Somewhere, having a beer

Re: Hypercube Memory on Xeon Phi (Knights Landing?)

Posted on Mon Jul 07, 2014 2:08 pm

just brew it! wrote:
Hz so good wrote:
just brew it! wrote:the HFT guys roll their own hardware (using FPGAs, ASICs, etc.); they're not using general purpose commercial databases.

Oh, btw, Cisco has a Nexus 4000 line expressly for HFT. I only know this from studying for the CCNA-DC exam, and tinkering with the NX-OS simulator. Not that I'll ever get to play with those 4000 series...

Yup, given how that industry operates, HFT firms would be willing to pay a hefty premium just to shave a few nanoseconds off of their ping times. In the vast majority of non-HFT use cases, a few nanoseconds of network latency here and there just doesn't matter.



And that's the stuff I'm not allowed within 50ft of. Sure, let me run amok in COs and co-locations, and let me tinker with 6500 series Catalysts, T3 routers, multi-mile DS3 radios, MPLS routers, SONET muxers, DSLAMs, aggregators, blade servers, and fabric extenders. But god forbid I even look at those funny.... :roll:
Hz so good
Gerbil XP
 
Posts: 466
Joined: Wed Dec 04, 2013 5:08 pm

Re: Hypercube Memory on Xeon Phi (Knights Landing?)

Posted on Mon Jul 07, 2014 2:11 pm

just brew it! wrote: In the vast majority of non-HFT use cases, a few nanoseconds of network latency here and there just doesn't matter.


Heh, I remember when a 2 minute outage due to 802.1d spanning tree electing a new root bridge and reconverging was normal. Now, if Joe Bob has a 4 second outage, he'll start screaming about not being able to reach Facebook. *sigh*

I do have one outage story that's funny, and was tragic enough that I made a 2am truck roll just to help that one guy out. It's not PG-13, so I probably shouldn't share here.

Better yet, here it is in spoiler text. Not exactly safe for work language:

Long ago, there was an install in a certain famous college town. In order to extend WiFi coverage from the office (where all the gear was) to an apartment complex down the road, the engineer before me decided to mount a 2.4GHz Yagi on a tall sign, to "fire" the backhaul signal to that complex (the subscriber unit was on the apartment complex office building, and coverage spread from there). Problem is, that Yagi shot went directly over an elevated train trestle. In his defense, when he installed it, trains didn't run through there that often. By the time I got there, they ran on very regular schedules, especially at midnight. Anyways, I got a support call from a very despondent customer about how every time he watched porn around midnight, the train would block the signal, invariably cutting off the money shot. Dude had suffered from blue balls for 3 days straight, and sounded so pitiful. I felt so horrible for him (while doing my damnedest to not die laughing) that I truck rolled at 2AM and re-engineered that entire backhaul link for him. I've never seen a customer so grateful.
Hz so good
Gerbil XP
 
Posts: 466
Joined: Wed Dec 04, 2013 5:08 pm

Re: Hypercube Memory on Xeon Phi (Knights Landing?)

Posted on Mon Jul 07, 2014 10:44 pm

just brew it! wrote:
Hz so good wrote:So there's never some portion of a DB that constantly gets compared more often than other parts? Genuinely curious, cause like I said, I barely stayed awake in SQL courses.

Well, there certainly *could* be.

But the additional complexity and overhead of figuring out what part of the database index to keep in the CAM, narrow applicability, and the fact that the data record you actually care about probably still needs to be fetched from traditional RAM or secondary storage anyway (you'd want to keep just the index in CAM because CAM is very expensive), means it just isn't going to be economical even for frequently accessed data.

CAMs are great for looking up small items based on small keys, in situations where you need response times measured in nanoseconds. That's why they are used in network switching/routing equipment. In a typical database application, other delays in the system are going to be on the order of microseconds or even milliseconds; reducing the lookup time for some subset of your database keys to the nanosecond level (at great expense!) just doesn't make sense.


Generally agree. The cost benefit isn't worth it, especially with low overall utility.

There is one niche I'd like to point out in this space that's a good fit: classical sorting. Being able to look at an entire array at once, instead of comparing pairs of values, offers a remarkable speedup in the algorithm. Even so, the cost isn't justified for such a niche use.
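
One way to picture that, purely as my own illustration (the function name and the tie-breaking rule are assumptions, and this is not any particular hardware design): each element is compared against the whole array in one shot to find its rank, and the rank is its position in the sorted output. Hardware with a comparator per element could do each of those "compare against everything" steps in parallel; the Python below does them sequentially.

```python
def rank_sort(values):
    """Sort by computing each element's rank against the entire array.

    rank(v) = number of elements that must come before v. The inner count
    is the step an associative memory could evaluate for all elements at
    once; here it is an ordinary loop, so this version is O(n^2).
    """
    n = len(values)
    out = [None] * n
    for i, v in enumerate(values):
        # Ties are broken by original index so duplicates get distinct slots.
        rank = sum(
            1
            for j, w in enumerate(values)
            if w < v or (w == v and j < i)
        )
        out[rank] = v
    return out
```

For example, `rank_sort([3, 1, 2, 3, 0])` places every value directly into its final slot without ever swapping neighbours, which is where the speedup would come from if the counting were free.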

just brew it! wrote:I could maybe see it being useful in a HFT (High Frequency Trading) system. But that's a pretty specialized type of database, in a niche where nanoseconds make the difference between making and losing large sums of money. For the pieces of infrastructure requiring that level of response time, the HFT guys roll their own hardware (using FPGAs, ASICs, etc.); they're not using general purpose commercial databases.


It is mainly FPGAs, from my understanding. The time to develop, test and validate an ASIC makes for a poor cost-benefit, considering the resources involved and the likelihood of the ASIC's algorithm being deprecated before it reaches the market. The time to market, flexibility and performance of FPGAs make them ideal.
the
Gerbil
Gold subscriber
 
 
Posts: 57
Joined: Tue Jun 29, 2010 2:26 am

Re: Hypercube Memory on Xeon Phi (Knights Landing?)

Posted on Thu Jul 10, 2014 12:23 pm

Hz so good wrote:I read up on the Hypercube memory Micron and Intel are using in an upcoming Xeon chip. Anyone think that'll catch on in commodity PCs? What sort of cooling challenges do you think it might involve?

I hope we don't end up with a MCM "Hershey Bar" ala Slot A.


Well, Samsung's new 3D NAND found in the Samsung 850 SSDs is technically very similar; it's a 3D stacked concept but with NAND instead of RAM. And that will certainly catch on fast in the NAND market given the benefits it brings.

As for the Hypercube memory, that Intel is going to use it on the Xeon Phi so soon is very promising for the technology! I hope it means Intel will start to trickle it down into their eDRAM processors and then onto other chips... will be very fun to see what NVIDIA can do with the tech themselves!
Kougar
Gerbil Team Leader
 
Posts: 245
Joined: Tue Dec 02, 2008 2:12 am
Location: Texas

