 
chuckula
Minister of Gerbil Affairs
Topic Author
Posts: 2109
Joined: Wed Jan 23, 2008 9:18 pm
Location: Probably where I don't belong.

How to crash RyZen in an automated manner

Fri Aug 04, 2017 8:36 am

The bugs in RyZen that have been lurking since launch, and that haven't been fully addressed via firmware, have been the subject of extensive discussion in the Linux community, where the chips tend to get driven quite a bit harder than in Cinebench mode on Windows.

Part of the problem has been that the bugs are what we like to call "Heisenbugs" that don't occur in a neatly deterministic manner that is easily debugged.

However, thanks to some Gentoo developers it's gotten a whole lot easier to trigger the crashes in an automated manner: https://phoronix.com/scan.php?page=news ... Stress-Run

That article uses a non-overclocked 1800X as a test subject. Oh, and you can crash the non-overclocked chip even with low-clocked DDR4-2133 RAM and with SMT turned off:
"We'll see now if AMD will provide public comments or if they investigate further as they now have another reproducible test case to slam the Ryzen chips hard in just a few minutes even with SMT disabled and running at DDR4-2133."


While something tells me that the video blogs that got personally engraved Threadripper CPUs for review aren't going to have the technical competence to install and run this software, I'm 99% sure that Threadripper will be just as crashy as (if not more so than) a single-die RyZen part. This should also give some people pause when thinking about the even more complex Epyc platform. It's easy to show off some canned benchmark scores, but people don't drop money on servers to run canned benchmarks.
4770K @ 4.7 GHz; 32GB DDR3-2133; Officially RX-560... that's right AMD you shills!; 512GB 840 Pro (2x); Fractal Define XL-R2; NZXT Kraken-X60
--Many thanks to the TR Forum for advice in getting it built.
 
SkyWarrior
Gerbil
Posts: 54
Joined: Wed Jun 21, 2006 10:27 pm
Location: Turkey

Re: How to crash RyZen in an automated manner

Fri Aug 04, 2017 8:49 am

I feel relieved that I went with an i9 for my work/compute station for genome analysis. When it crunches samples it runs 6 or 7 days non-stop with at least 80 percent of the cores busy and 80 to 90 gigs of RAM in use.

I don't know how I would feel if I had a TR system that crashed in the middle of an analysis.
Rare is common where I work...
 
just brew it!
Administrator
Posts: 54500
Joined: Tue Aug 20, 2002 10:51 pm
Location: Somewhere, having a beer

Re: How to crash RyZen in an automated manner

Fri Aug 04, 2017 8:56 am

I've been following this thread over at the AMD forums. The S/N ratio is pretty bad (a lot of people are reporting issues which sound like they are just good ol' fashioned unstable builds due to flaky RAM/mobo), but there's some good info interspersed as well.

Bottom line: There seems to be fairly convincing evidence for incorrect behavior under heavily multi-threaded workloads. Multi-threaded gcc compiles of large projects (e.g. Linux kernel) are capable of triggering the issue on many systems. A couple of people have also come up with stress tests which seem to reliably reproduce the problem. There's also some circumstantial evidence that hyperthreading and/or ASLR affect how likely you are to hit the issue.
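For anyone who wants to try this kind of test themselves, here is a minimal sketch of the "run a heavy job over and over until something breaks" idea. It is not the stress test mentioned above; `run_until_failure` and the stand-in command are hypothetical names made up for illustration, and the build command in the comment is only an example of what one might substitute.

```python
import subprocess
import sys


def run_until_failure(cmd, max_runs=100):
    """Run `cmd` repeatedly and report the first run that fails.

    Returns (run_number, returncode) for the first non-zero exit,
    or None if every run succeeded.
    """
    for run in range(1, max_runs + 1):
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        # On POSIX a negative return code means the direct child was
        # killed by a signal (e.g. -11 for SIGSEGV on Linux).
        if result.returncode != 0:
            return run, result.returncode
    return None


# Stand-in workload so the sketch runs anywhere; a real test would use a
# parallel build command instead (hypothetical example: ["make", "-j16"]).
ok_cmd = [sys.executable, "-c", "pass"]
print(run_until_failure(ok_cmd, max_runs=3))
```

A non-zero exit can of course also be an ordinary build error, so this only means something if the same tree builds cleanly every time on known-good hardware.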

The resolution (or lack thereof) of this issue will be the primary determining factor for if/when I pull the trigger on a Ryzen build. I do software development on Linux, so large parallel gcc runs are something I actually do on a semi-regular basis. The symptoms seem to be indicative of an underlying hardware issue (either with the CPU or the platform) of some sort; until the problem is understood I'm not going to trust it.
Nostalgia isn't what it used to be.
 
Glorious
Gerbilus Supremus
Posts: 12343
Joined: Tue Aug 27, 2002 6:35 pm

Re: How to crash RyZen in an automated manner

Fri Aug 04, 2017 9:07 am

JBI wrote:
The S/N ratio is pretty bad (a lot of people are reporting issues which sound like they are just good ol' fashioned unstable builds due to flaky RAM/mobo), but there's some good info interspersed as well.


I gave up, did amdmatt ever come back with an update?
 
just brew it!
Administrator
Posts: 54500
Joined: Tue Aug 20, 2002 10:51 pm
Location: Somewhere, having a beer

Re: How to crash RyZen in an automated manner

Fri Aug 04, 2017 9:12 am

Glorious wrote:
I gave up, did amdmatt ever come back with an update?

I need to get caught up; the thread has grown a lot in the past few days. As of the last time I skimmed the thread (2-3 days ago) he seemed to have gone silent.
 
Glorious
Gerbilus Supremus
Posts: 12343
Joined: Tue Aug 27, 2002 6:35 pm

Re: How to crash RyZen in an automated manner

Fri Aug 04, 2017 9:22 am

Ugh.

Two weeks ago it had already been over a month.

Not looking good...
 
chuckula
Minister of Gerbil Affairs
Topic Author
Posts: 2109
Joined: Wed Jan 23, 2008 9:18 pm
Location: Probably where I don't belong.

Re: How to crash RyZen in an automated manner

Fri Aug 04, 2017 9:27 am

Glorious wrote:
Ugh.

Two weeks ago it had already been over a month.

Not looking good...


Whatever this bug is, it has been lurking for a while. Another note here is that the stress test definitely puts a heavy load on the CPU and the memory controller but it is not using any of the "exotic" AVX instructions or weird processing modes. It's just a very heavy-duty compiler benchmark that pretty much hits the main 64-bit execution paths while churning plenty of threads and memory accesses.
 
Concupiscence
Gerbil Elite
Posts: 709
Joined: Tue Sep 25, 2012 7:58 am
Location: Dallas area, Texas, USA

Re: How to crash RyZen in an automated manner

Tue Aug 08, 2017 9:51 am

I really wonder what it is. My 1700 on Windows 10 has handled non-stop BOINC workloads and Handbrake transcodes for days at a time without a whimper.
Science: Core i9 7940x, 64 gigs RAM, Vega FE, Xubuntu 20.04
Work: Ryzen 5 3600, 32 gigs RAM, Radeon RX 580, Win10 Pro
Tinker: Core i5 2400, 8 gigs RAM, Radeon R9 280x, Xubuntu 20.04 + MS-DOS 7.10

Read me at https://www.wallabyjones.com/
 
just brew it!
Administrator
Posts: 54500
Joined: Tue Aug 20, 2002 10:51 pm
Location: Somewhere, having a beer

Re: How to crash RyZen in an automated manner

Tue Aug 08, 2017 10:26 am

Concupiscence wrote:
I really wonder what it is. My 1700 on Windows 10 has handled non-stop BOINC workloads and Handbrake transcodes for days at a time without a whimper.

May have something to do with memory access patterns. Not sure about BOINC, but transcodes will probably tend to stream a lot. Code compilation, OTOH, will probably tend to have a lot of randomness in the access patterns (lots of hash lookups and such).
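A rough way to picture the difference between the two patterns (this is only an illustration, not a reproduction of the bug): the sketch below compares the average distance between consecutive accesses for an in-order walk and for a shuffled walk over the same elements. `mean_stride` is a hypothetical helper, and the shuffled order is just a stand-in for hash-lookup-style access.

```python
import random


def mean_stride(indices):
    """Average absolute distance between consecutive accessed indices.

    Expects at least two indices.
    """
    gaps = [abs(b - a) for a, b in zip(indices, indices[1:])]
    return sum(gaps) / len(gaps)


n = 100_000

# Streaming: touch element 0, 1, 2, ... in order.
streaming = list(range(n))

# Hash-lookup-like: the same elements, visited in a shuffled order.
scattered = list(range(n))
random.Random(0).shuffle(scattered)

print("streaming mean stride:", mean_stride(streaming))
print("scattered mean stride:", mean_stride(scattered))
```

The in-order walk always moves by one element, which prefetchers handle well, while the shuffled walk jumps across the whole array on almost every step. Timing this in Python would mostly measure interpreter overhead, so the stride is used here instead of a stopwatch.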
 
Concupiscence
Gerbil Elite
Posts: 709
Joined: Tue Sep 25, 2012 7:58 am
Location: Dallas area, Texas, USA

Re: How to crash RyZen in an automated manner

Tue Aug 08, 2017 10:49 am

just brew it! wrote:
Concupiscence wrote:
I really wonder what it is. My 1700 on Windows 10 has handled non-stop BOINC workloads and Handbrake transcodes for days at a time without a whimper.

May have something to do with memory access patterns. Not sure about BOINC, but transcodes will probably tend to stream a lot. Code compilation, OTOH, will probably tend to have a lot of randomness in the access patterns (lots of hash lookups and such).


It's also weird that it's chiefly happening under Linux. Nary a complaint from anyone running Visual Studio, as far as I can tell.
 
srg86
Gerbil Team Leader
Posts: 262
Joined: Tue Apr 25, 2006 7:57 am
Location: Madison, WI

Re: How to crash RyZen in an automated manner

Tue Aug 08, 2017 11:29 am

Concupiscence wrote:
just brew it! wrote:
Concupiscence wrote:
I really wonder what it is. My 1700 on Windows 10 has handled non-stop BOINC workloads and Handbrake transcodes for days at a time without a whimper.

May have something to do with memory access patterns. Not sure about BOINC, but transcodes will probably tend to stream a lot. Code compilation, OTOH, will probably tend to have a lot of randomness in the access patterns (lots of hash lookups and such).


It's also weird that it's chiefly happening under Linux. Nary a complaint from anyone running Visual Studio, as far as I can tell.


I've found Linux to be vastly more efficient at parallel compiling than Windows, so this doesn't surprise me. Actually Windows just feels slow. I can very much imagine a compile job on Linux stressing the CPU more.
Intel Core i7 4790K, Z97, 16GB RAM, 128GB m4 SSD, 480GB M500 SSD, 500GB WD Vel, Intel HD4600, Corsair HX650, Fedora x64.
Thinkpad T460p, Intel Core i5 6440HQ, 8GB RAM, 512GB SSD, Intel HD 530 IGP, Fedora x64, Win 10 x64.
 
just brew it!
Administrator
Posts: 54500
Joined: Tue Aug 20, 2002 10:51 pm
Location: Somewhere, having a beer

Re: How to crash RyZen in an automated manner

Tue Aug 08, 2017 12:01 pm

Different compilers, different CPU scheduler & memory management, etc... it's pretty clear that it is a corner case of some sort, or AMD would've noticed and fixed it before Ryzen launched.
 
ludi
Lord High Gerbil
Posts: 8646
Joined: Fri Jun 21, 2002 10:47 pm
Location: Sunny Colorado front range

Re: How to crash RyZen in an automated manner

Tue Aug 08, 2017 10:35 pm

If the Linux crash test were run under Ubuntu on Windows 10, would it still happen?
Abacus Model 2.5 | Quad-Row FX with 256 Cherry Red Slider Beads | Applewood Frame | Water Cooling by Brita Filtration
 
Redocbew
Minister of Gerbil Affairs
Posts: 2495
Joined: Sat Mar 15, 2014 11:44 am

Re: How to crash RyZen in an automated manner

Tue Aug 08, 2017 10:44 pm

That's another one of your unanswerable questions, but in this case if the bug is difficult to reproduce under Windows natively, then I wouldn't think running a Linux VM/container/whatever it is would change that. That's a weird configuration for a large scale parallel compilation job anyway.
Do not meddle in the affairs of archers, for they are subtle and you won't hear them coming.
 
synthtel2
Gerbil Elite
Posts: 956
Joined: Mon Nov 16, 2015 10:30 am

Re: How to crash RyZen in an automated manner

Tue Aug 08, 2017 11:35 pm

I read somewhere or other that using Windows Subsystem for Linux still makes it easy to crash. No source, sorry.

What's recent news to me is that disabling SMT doesn't entirely fix it. I'm fine with not using SMT for the time being, but if there's really no way around it, it's a much bigger problem.

"Performance marginality problem" is terrible as PR-speak, but the terminology seems plausibly from an engineering department. That implies that giving something or other 20mV more might be the fix. It probably isn't Vcore, though, and not all voltages are so easily tweaked.
