chuckula
Minister of Gerbil Affairs
Topic Author
Posts: 2109
Joined: Wed Jan 23, 2008 9:18 pm
Location: Probably where I don't belong.

Home-Grown Chinese Supercomputer #1 in Top 500

Mon Jun 20, 2016 7:38 am

This is rather interesting: http://www.top500.org/news/china-tops-s ... p-machine/ and http://www.top500.org/lists/2016/06/highlights/
The Green 500 most efficient system list is also interesting, although it hasn't been officially posted quite yet: http://www.top500.org/green500/

Apparently the Chinese built a supercomputer named TaihuLight with 40,960 nodes and a custom processor to capture the #1 spot in the Top 500 supercomputer list at 93 Petaflops. The chips are "SW26010" processors, which apparently include multiple modules with a total of 260 cores per processor. I am taking a SWAG that these cores are some form of MIPS derivative since the Chinese have used that architecture in the past, but there is very little public information about the chip architecture or how it performs outside of LinPack. The article itself mentions rumors about the Alpha architecture.

As an aside, 40,960 nodes using Knights Landing chips at about 3 TFlops each would reach a peak of around 122 Petaflops.
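For anyone who wants to check that figure, it's just node count times per-node peak. A minimal Python sketch, assuming one ~3 TFlops (double-precision theoretical peak) Knights Landing chip per node; the function name is made up for illustration:

```python
def system_peak_pflops(nodes: int, tflops_per_node: float) -> float:
    """Theoretical peak of a homogeneous system in petaflops (1 PF = 1000 TF)."""
    return nodes * tflops_per_node / 1000.0

# 40,960 nodes, each assumed to hold one ~3 TFlops Knights Landing chip
knl_peak = system_peak_pflops(40_960, 3.0)
print(f"{knl_peak:.1f} PF")  # 122.9 PF
```

That is a theoretical peak, not a LinPack result; TaihuLight itself turns in 93 PF on LinPack against a peak of roughly 125 PF.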

One other interesting quote from the article to remember when looking at LinPack results without other context:
Memory-wise, each node contains 32 GB, adding up to a little over 1.3 PB for the whole machine. While that seems like a lot, it’s not much memory considering the number of cores it must feed. The much smaller 10-petaflop K supercomputer at RIKEN, for example, is outfitted with 1.4 PB of memory, and most of the other large systems on TOP500 list have much better bytes-to-FLOPS ratios than that of TaihuLight. It also relies on the older DDR3 technology, which is slower and more power-hungry than the newer DDR4 memory.
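The bytes-to-FLOPS comparison in that quote can be roughed out from its own numbers. A sketch, assuming decimal petabytes and using the LinPack-level figures quoted (93 PF for TaihuLight, about 10 PF for K) rather than theoretical peak; the helper name is hypothetical:

```python
PB = 1e15  # bytes (decimal petabyte)
PF = 1e15  # floating-point operations per second

def bytes_per_flops(memory_bytes: float, flops: float) -> float:
    """Memory capacity per unit of floating-point throughput (bytes per FLOP/s)."""
    return memory_bytes / flops

taihulight = bytes_per_flops(1.3 * PB, 93 * PF)
k_computer = bytes_per_flops(1.4 * PB, 10 * PF)

print(f"TaihuLight: {taihulight:.3f} B/FLOPS")  # 0.014
print(f"K computer: {k_computer:.3f} B/FLOPS")  # 0.140
print(f"K has about {k_computer / taihulight:.0f}x more memory per FLOPS")  # 10x
```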



A few other interesting things:
1. The number of machines on the list that use GPUs or other custom accelerators actually dropped from the last list in 2015.
2. Intel may not be in the #1 system anymore, but Intel's overall share increased slightly to 455 systems (91%).
3. In order to get on the list a system needs at least 285.9 TFlops of Linpack performance, and 958 TFlops to crack the top 100, so the top-100 mark is probably going over 1 Petaflop by the time they release the November list.
4770K @ 4.7 GHz; 32GB DDR3-2133; Officially RX-560... that's right AMD you shills!; 512GB 840 Pro (2x); Fractal Define XL-R2; NZXT Kraken-X60
--Many thanks to the TR Forum for advice in getting it built.
 
Deanjo
Graphmaster Gerbil
Posts: 1212
Joined: Tue Mar 03, 2009 11:31 am

Re: Home-Grown Chinese Supercomputer #1 in Top 500

Mon Jun 20, 2016 8:07 am

It will be interesting to see how many existing supercomputers currently running Tesla systems upgrade to the new Tesla P100, since they are drop-in replacements for the older ones. I know the Swiss have already committed to upgrading theirs.
 
anotherengineer
Gerbil Jedi
Posts: 1688
Joined: Fri Sep 25, 2009 1:53 pm
Location: Northern, ON Canada, Yes I know, Up in the sticks

Re: Home-Grown Chinese Supercomputer #1 in Top 500

Mon Jun 20, 2016 8:15 am

Deanjo wrote:
It will be interesting to see how many existing supercomputers that are currently running Tesla systems upgrade to the new Tesla P100 since they are drop in replacements for older ones. I know the Swiss have already committed to upgrading theirs.



How do you know that?
Life doesn't change after marriage, it changes after children!
 
chuckula
Minister of Gerbil Affairs
Topic Author
Posts: 2109
Joined: Wed Jan 23, 2008 9:18 pm
Location: Probably where I don't belong.

Re: Home-Grown Chinese Supercomputer #1 in Top 500

Mon Jun 20, 2016 8:18 am

Deanjo wrote:
It will be interesting to see how many existing supercomputers that are currently running Tesla systems upgrade to the new Tesla P100 since they are drop in replacements for older ones. I know the Swiss have already committed to upgrading theirs.


I talked to the Swiss and they were rather neutral about the whole thing.
 
Deanjo
Graphmaster Gerbil
Posts: 1212
Joined: Tue Mar 03, 2009 11:31 am

Re: Home-Grown Chinese Supercomputer #1 in Top 500

Mon Jun 20, 2016 8:19 am

anotherengineer wrote:
Deanjo wrote:
It will be interesting to see how many existing supercomputers that are currently running Tesla systems upgrade to the new Tesla P100 since they are drop in replacements for older ones. I know the Swiss have already committed to upgrading theirs.



How do you know that?


I can read.

http://www.anandtech.com/show/10433/nvi ... tesla-p100

Finally, buried in the PCIe Tesla P100 announcement, NVIDIA has also reconfirmed that the Piz Daint supercomputer upgrade project is on schedule for later this year. The Swiss National Supercomputing Center will be doing a drop-in upgrade, replacing the supercomputer’s 4,500 Tesla K20X cards with Tesla P100 PCIe cards. This will be, to our knowledge, the first Pascal P100 based supercomputer to come online once the upgrade is completed.


http://nvidianews.nvidia.com/news/nvidi ... e-than-30x

Later this year, Tesla P100 accelerators for PCIe will power an upgraded version of Europe's fastest supercomputer, the Piz Daint system at the Swiss National Supercomputing Center in Lugano, Switzerland.
 
just brew it!
Administrator
Posts: 54500
Joined: Tue Aug 20, 2002 10:51 pm
Location: Somewhere, having a beer

Re: Home-Grown Chinese Supercomputer #1 in Top 500

Mon Jun 20, 2016 8:41 am

Ahh, the "my supercomputer has more FLOPs than yours" e-peen competition brings back some (mostly) fond memories. It has been a long time since I worked in that field. Nice to see it is still going strong, over 2 decades later! :lol:
Nostalgia isn't what it used to be.
 
the
Gerbil Elite
Posts: 941
Joined: Tue Jun 29, 2010 2:26 am

Re: Home-Grown Chinese Supercomputer #1 in Top 500

Mon Jun 20, 2016 8:44 am

Something doesn't sit right with the new TaihuLight system. The chip lacks a significant amount of cache, which is very surprising. In fact, the chip reads much like a GPU architecture, with a management processor and a set of 64 compute cores in each core group. (Edit: the management processor has 256 KB of L2 cache, which makes the GPU comparison more apt and the overall architecture a bit more sane.) The interconnect also seems rather weak at 16 GB/s between nodes. The bandwidth and latency issues could be sidestepped by having a high number of links per node and doing things like a 5D torus configuration. Otherwise the interconnect will be a huge bottleneck given the 40,960 nodes in the system. (Edit: There are three tiers of interconnect, with the first tier linking 256 nodes into a supernode. Bandwidth within a supernode is very good at 70 TB/s aggregate. The second tier is at the cabinet level, linking four supernodes, and the third tier is the cabinet-to-cabinet link. Bandwidth and latency figures for these other tiers are not given.)
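The tier sizes in that edit imply how many supernodes and cabinets the machine has. A quick Python check, assuming every tier is fully populated (256 nodes per supernode, four supernodes per cabinet); the resulting counts are derived here, not taken from the PDF:

```python
TOTAL_NODES = 40_960
NODES_PER_SUPERNODE = 256    # tier 1
SUPERNODES_PER_CABINET = 4   # tier 2

def tier_counts(total_nodes: int, per_supernode: int, per_cabinet: int) -> tuple[int, int]:
    """Return (supernodes, cabinets), assuming every tier is fully populated."""
    if total_nodes % per_supernode:
        raise ValueError("nodes do not divide evenly into supernodes")
    supernodes = total_nodes // per_supernode
    if supernodes % per_cabinet:
        raise ValueError("supernodes do not divide evenly into cabinets")
    return supernodes, supernodes // per_cabinet

supernodes, cabinets = tier_counts(TOTAL_NODES, NODES_PER_SUPERNODE, SUPERNODES_PER_CABINET)
print(f"{supernodes} supernodes in {cabinets} cabinets")  # 160 supernodes in 40 cabinets
```

Under that assumption, any given node shares the fast supernode fabric with only 255 others; everything else crosses the tiers whose figures weren't published.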

The memory configuration is just weird. It has 136 GB/s of bandwidth per node using DDR3 technology. That would imply a 512-bit-wide bus (576 bits with ECC) with DDR3-2133 memory. The interesting part here is that each channel would only have 4 GB of memory. I strongly suspect that this configuration was chosen to minimize latency and reduce the number of actual memory chips, as it would be trivial to double the amount of memory per node at that speed. (Edit: The PDF linked below has a picture of a node, and it does strongly imply a 576-bit ECC memory bus due to the number of memory chips.)
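The bus-width inference follows from peak DRAM bandwidth = bus width in bytes × transfer rate. A minimal sketch, assuming DDR3-2133 (2133 MT/s), standard 64-bit channels, and ECC bits excluded from the bandwidth figure; the function name is hypothetical:

```python
def peak_bandwidth_gbs(bus_width_bits: int, megatransfers_per_s: int) -> float:
    """Peak DRAM bandwidth in GB/s (decimal): bytes per transfer x transfers per second."""
    return bus_width_bits / 8 * megatransfers_per_s * 1e6 / 1e9

bus_bits = 512                    # data bits only; ECC would add 64 more (576 total)
bw = peak_bandwidth_gbs(bus_bits, 2133)
channels = bus_bits // 64         # standard 64-bit DDR3 channels
gb_per_channel = 32 / channels    # 32 GB per node spread across the channels

print(f"{bw:.1f} GB/s over {channels} channels, {gb_per_channel:.0f} GB each")
# 136.5 GB/s over 8 channels, 4 GB each
```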

I'm also curious if this system is based upon the old Alpha architecture. It's a blast from the past, and it just makes me sad that HP chose Itanium over Alpha. (Edit: The compute processors are described as 62-bit with a 264-bit vector unit. Taken literally, that would imply something other than Alpha, but those are most likely typos for 64-bit and 256-bit. The management and compute cores are different designs, with the management core having either dual vector units or a single vector unit that is twice as wide as a compute core's. The management cores can be used by applications for processing and are included in the aggregate FLOPS figure.)
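If the vector units really are 256 bits wide, the peak figure can be reconstructed from core counts. A sketch under stated assumptions: a 1.45 GHz clock (not mentioned in this thread), 8 double-precision FLOPs per cycle per compute core (four 64-bit lanes with fused multiply-add), twice that per management core as in the edit above, four core groups of 64 compute cores plus one management core (which gives the 260 cores), and one chip per node:

```python
CLOCK_HZ = 1.45e9          # assumption: SW26010 clock speed
CORE_GROUPS = 4            # 4 x (64 + 1) = 260 cores per chip
CPES_PER_GROUP = 64        # compute cores per group
MPES_PER_GROUP = 1         # management core per group
FLOPS_CPE = 8              # assumption: 256-bit FMA = 4 doubles x 2 ops per cycle
FLOPS_MPE = 2 * FLOPS_CPE  # assumption: management core has twice the vector width
NODES = 40_960             # assumption: one chip per node

flops_per_cycle = CORE_GROUPS * (CPES_PER_GROUP * FLOPS_CPE + MPES_PER_GROUP * FLOPS_MPE)
chip_tflops = flops_per_cycle * CLOCK_HZ / 1e12
system_pflops = chip_tflops * NODES / 1000

print(f"{CORE_GROUPS * (CPES_PER_GROUP + MPES_PER_GROUP)} cores, "
      f"{chip_tflops:.2f} TFLOPS per chip, {system_pflops:.1f} PFLOPS system peak")
# 260 cores, 3.06 TFLOPS per chip, 125.4 PFLOPS system peak
```

That lands on the roughly 125 PFLOP peak the PDF quotes, which is at least consistent with the 62/264-bit figures being typos.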

The summary in the PDF draws a conclusion I can agree with: real-world performance is going to be far lower than the 125 PFLOP peak this system is capable of, due to its low memory bandwidth relative to compute. The relatively low memory capacity per node is seen as another issue, even though the system has 1.3 PB of memory in aggregate. Another pitfall is that this system doesn't use nonvolatile memory anywhere. I would have thought that a modern supercomputer would make use of some SSD caching in its storage system.
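One way to put a number on "memory bandwidth compared to compute" is machine balance: bytes of DRAM bandwidth per peak floating-point operation, whose inverse is the ridge point of a roofline model. A rough sketch, assuming the 136 GB/s per node estimated earlier in this post and the 125 PFLOP peak split evenly over 40,960 nodes:

```python
NODES = 40_960
PEAK_PFLOPS = 125.0        # system peak from the PDF summary
BW_PER_NODE_GBS = 136.0    # per-node DRAM bandwidth estimated above

node_peak_gflops = PEAK_PFLOPS * 1e6 / NODES    # PF -> GF, per node
balance = BW_PER_NODE_GBS / node_peak_gflops    # bytes per FLOP

# FLOPs a kernel must do per byte loaded to be compute-bound (roofline ridge point)
ridge_flops_per_byte = 1 / balance

print(f"{node_peak_gflops:.0f} GFLOPS/node, {balance:.3f} B/FLOP, "
      f"ridge at {ridge_flops_per_byte:.0f} FLOP/byte")
# 3052 GFLOPS/node, 0.045 B/FLOP, ridge at 22 FLOP/byte
```

A streaming kernel like a double-precision triad (two loads, one store, two FLOPs per element) does well under one FLOP per byte, so it would sit deep in bandwidth-bound territory on a node like this.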

Major edits: I was finally able to download this PDF, which goes into more detail on the architecture. Also in the paper: the NDA for Intel's Knights Landing chip expires at 6 PM tonight.
Last edited by the on Mon Jun 20, 2016 9:20 am, edited 2 times in total.
Dual Opteron 6376, 96 GB DDR3, Asus KGPE-D16, GTX 970
Mac Pro Dual Xeon E5645, 48 GB DDR3, GTX 770
Core i7 [email protected] Ghz, 32 GB DDR3, GA-X79-UP5-Wifi
Core i7 [email protected] Ghz, 16 GB DDR3, GTX 970, GA-X68XP-UD4
 
chuckula
Minister of Gerbil Affairs
Topic Author
Posts: 2109
Joined: Wed Jan 23, 2008 9:18 pm
Location: Probably where I don't belong.

Re: Home-Grown Chinese Supercomputer #1 in Top 500

Mon Jun 20, 2016 8:47 am

the wrote:
Something doesn't sit right with the new TaihuLight system.


All your points are quite valid, and I even noticed a few caveats in the press release. One of the issues is that the Top 500 rankings are just based off of LinPack, and we all know that LinPack can be gamed for fun and profit.

As for your analysis of the memory controller layout, the article implies that each processor is actually a quad-chip module in a single package. If each chip has a rather ordinary dual-channel (128-bit) memory controller, then 4 chips in a package gets you right to a 512-bit interface using a single 4 GB DIMM for each channel.

I would hope that the Chinese designed and built this system for something other than a publicity stunt. Hopefully there will be more public information available about their setup with applications that go beyond simply running a benchmark.
 
just brew it!
Administrator
Posts: 54500
Joined: Tue Aug 20, 2002 10:51 pm
Location: Somewhere, having a beer

Re: Home-Grown Chinese Supercomputer #1 in Top 500

Mon Jun 20, 2016 9:11 am

It is entirely possible (even likely...) that this machine was developed for the sole purpose of running specific benchmarks as fast as possible, to demonstrate China's technological "superiority".
 
Waco
Maximum Gerbil
Posts: 4850
Joined: Tue Jan 20, 2009 4:14 pm
Location: Los Alamos, NM

Re: Home-Grown Chinese Supercomputer #1 in Top 500

Mon Jun 20, 2016 9:25 am

just brew it! wrote:
It is entirely possible (even likely...) that this machine was developed for the sole purpose of running specific benchmarks as fast as possible, to demonstrate China's technological "superiority".

That's all their last machine was as well...although it's an engineering hurdle they are getting better at clearing.
Victory requires no explanation. Defeat allows none.
 
the
Gerbil Elite
Posts: 941
Joined: Tue Jun 29, 2010 2:26 am

Re: Home-Grown Chinese Supercomputer #1 in Top 500

Mon Jun 20, 2016 9:52 am

just brew it! wrote:
It is entirely possible (even likely...) that this machine was developed for the sole purpose of running specific benchmarks as fast as possible, to demonstrate China's technological "superiority".


It'll do rather well on a few benchmarks, but even for real applications it is getting between 30 and 40 PFLOP, which is in the range the previous number 1 system was able to reach on benchmarks. Tianhe-2 and Titan also suffer from inefficiencies due to their hybrid CPU + discrete accelerator design. The last system that really stressed high efficiency and carried it over to real-world applications was BlueGene/Q (which still ranks rather high in the Top 500).

I'm not worried about this surge in Chinese supercomputing when there are numerous upgrades on the horizon for US systems. IBM POWER8 + nVidia Tesla P100 will be a very potent combination as those systems come online in 2017. A couple of noteworthy Xeon Phi clusters will also appear in the US in 2017. The Top 500, which has been rather dull for the past couple of years, is primed for a major shake-up in 2017.
 
Waco
Maximum Gerbil
Posts: 4850
Joined: Tue Jan 20, 2009 4:14 pm
Location: Los Alamos, NM

Re: Home-Grown Chinese Supercomputer #1 in Top 500

Mon Jun 20, 2016 10:27 am

the wrote:
I'm not worried about this surge in Chinese supercomputing when there are numerous upgrades on the horizon for US systems. IBM POWER8 + nVidia Tesla P100 will be a very potent combination as those systems come online in 2017. A couple of noteworthy Xeon Phi clusters will also appear in the US in 2017. The Top 500, which has been rather dull for the past couple of years, is primed for a major shake-up in 2017.

As someone in the (fast) energy delivery field...I'm worried.

The CORAL (Power8 + Tesla) systems will be similarly unbalanced in terms of memory capacity. Sure, they're going to crank out a lot of FLOPs, but that race is pointless if you can't run big problems. The BlueGene/Q had the same problem. The workloads they run generally aren't system-scale, so they're competing in a different arena (they do throughput-style workloads with lots of small jobs).

Trinity (LANL's machine) has its KNL portion arriving right now. It should be in the 40 PFLOP range, but more importantly, it has a lot of memory bandwidth and capacity. It should be online (with the KNL chips) this fall.

Now, China. They're building all this stuff in-house. They're throwing a LOT of resources at it. With ECP moving incredibly slowly, we aren't really poised to counter their single-mindedness...
 
anotherengineer
Gerbil Jedi
Posts: 1688
Joined: Fri Sep 25, 2009 1:53 pm
Location: Northern, ON Canada, Yes I know, Up in the sticks

Re: Home-Grown Chinese Supercomputer #1 in Top 500

Mon Jun 20, 2016 11:25 am

Deanjo wrote:
anotherengineer wrote:
How do you know that?


I can read.


Lies

you have friends on the "inside" ;)
 
paco
Minister of Gerbil Affairs
Posts: 2083
Joined: Wed Jul 21, 2004 7:14 pm
Location: So Cal

Re: Home-Grown Chinese Supercomputer #1 in Top 500

Mon Jun 20, 2016 11:55 am

chuckula wrote:
All your points are quite valid, and I even noticed a few caveats in the press release. One of the issues is that the Top 500 rankings are just based off of LinPack, and we all know that LinPack can be gamed for fun and profit.

As for your analysis of the memory controller layout, the article implies that each processor is actually a quad-chip module in a single package. If each chip has a rather ordinary dual-channel (128-bit) memory controller, then 4 chips in a package gets you right to a 512-bit interface using a single 4 GB DIMM for each channel.

I would hope that the Chinese designed and built this system for something other than a publicity stunt. Hopefully there will be more public information available about their setup with applications that go beyond simply running a benchmark.


HPC Wire is already reporting that it's doing very poorly in other benchmarks:
However, as we know LINPACK does not tell the whole story. On the HPCG benchmark, Sunway TaihuLight reported only .371 petaflops, which is 0.3 percent of peak. Compare this with 0.580 petaflops on Tianhe-2 (1.1 percent of peak) and .322 petaflops on Titan (1.2 percent of peak). RIKEN’s K computer reports 0.460 petaflops on HPCG, 4.1 percent of theoretical peak.
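Those percent-of-peak numbers are just the HPCG result divided by theoretical peak. A small check, assuming the June 2016 Top 500 Rpeak values (roughly 125.4 PF for TaihuLight, 54.9 PF for Tianhe-2, 27.1 PF for Titan and 11.28 PF for K), which are not given in the quote itself:

```python
# HPCG results (PF) are from the quote; peak values (PF) are assumed Top 500 Rpeak figures
systems = {
    "Sunway TaihuLight": (0.371, 125.4),
    "Tianhe-2": (0.580, 54.9),
    "Titan": (0.322, 27.1),
    "K computer": (0.460, 11.28),
}

def percent_of_peak(achieved_pf: float, peak_pf: float) -> float:
    """Sustained result as a percentage of theoretical peak."""
    return 100.0 * achieved_pf / peak_pf

for name, (hpcg, peak) in systems.items():
    print(f"{name:18s} {percent_of_peak(hpcg, peak):.1f}% of peak")
# TaihuLight 0.3%, Tianhe-2 1.1%, Titan 1.2%, K computer 4.1%
```

TaihuLight comes out at 0.3 percent of peak, not 3 percent, which makes it the least efficient of the four on HPCG.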

 
Waco
Maximum Gerbil
Posts: 4850
Joined: Tue Jan 20, 2009 4:14 pm
Location: Los Alamos, NM

Re: Home-Grown Chinese Supercomputer #1 in Top 500

Mon Jun 20, 2016 12:26 pm

Now, I'm a skeptical person, but does it bug anyone else that the report by Dongarra (http://www.netlib.org/utk/people/JackDo ... t-2016.pdf) has only renderings of the components/system? I'm surprised there are no actual pictures of anything, from the system board, CPUs, etc...not even one of any component in the system (or the system itself).

Not that I think they're lying (totally), but it does raise my eyebrow. :P
 
Deanjo
Graphmaster Gerbil
Posts: 1212
Joined: Tue Mar 03, 2009 11:31 am

Re: Home-Grown Chinese Supercomputer #1 in Top 500

Mon Jun 20, 2016 12:29 pm

anotherengineer wrote:

Lies

you have friends on the "inside" ;)


I neither confirm nor deny said accusation.
 
chuckula
Minister of Gerbil Affairs
Topic Author
Posts: 2109
Joined: Wed Jan 23, 2008 9:18 pm
Location: Probably where I don't belong.

Re: Home-Grown Chinese Supercomputer #1 in Top 500

Mon Jun 20, 2016 12:33 pm

Waco wrote:
Now, I'm a skeptical person, but does it bug anyone else that the report by Dongarra (http://www.netlib.org/utk/people/JackDo ... t-2016.pdf) has only renderings of the components/system? I'm surprised there are no actual pictures of anything, from the system board, CPUs, etc...not even one of any component in the system (or the system itself).

Not that I think they're lying (totally), but it does raise my eyebrow. :P



That Dongarra guy isn't directly involved with the project. He's a professor at the University of Tennessee, which you might recall isn't all that far from Oak Ridge which is home to Titan (and a crap ton of radioactive material!).

I actually think he's trying to talk up TaihuLight. That's because making the Chinese machine seem as capable as possible is good for convincing politicians to drop money on better systems in the U.S. After all, Dr. Strangelove wouldn't want us to have an HPC gap.
 
Waco
Maximum Gerbil
Posts: 4850
Joined: Tue Jan 20, 2009 4:14 pm
Location: Los Alamos, NM

Re: Home-Grown Chinese Supercomputer #1 in Top 500

Mon Jun 20, 2016 12:39 pm

chuckula wrote:
That Dongarra guy isn't directly involved with the project. He's a professor at the University of Tennessee, which you might recall isn't all that far from Oak Ridge which is home to Titan (and a crap ton of radioactive material!).

I actually think he's trying to talk up TaihuLight. That's because making the Chinese machine seem as capable as possible is good for convincing politicians to drop money on better systems in the U.S. After all, Dr. Strangelove wouldn't want us to have an HPC gap.

I'm well aware, I was just looking forward to some actual hardware pictures! It's not like the Chinese have any reason to make things up. /sarcasm :lol:

I'm all for this system scaring politicians into moving forward. HPC is my field, and ECP has been at a standstill for too long already. :)