Personal computing discussed

Moderators: renee, SecretSquirrel, notfred

 
Waco
Maximum Gerbil
Topic Author
Posts: 4850
Joined: Tue Jan 20, 2009 4:14 pm
Location: Los Alamos, NM

Amusing bug in RHEL 7.4 - [UPDATED]

Wed Apr 24, 2019 9:55 am

Turns out it doesn't handle EPYC + 1 TB of memory well - we have a box that will only stay up for 2 days before tossing out lots of out-of-memory errors and invoking the OOM-killer. Digging in over the next week or so to find out why it's bombing like this, but seeing those errors on an idle box with a terabyte of memory made me laugh. :P


UPDATE: Turns out someone misconfigured a mail server on the box and actually managed to accumulate a terabyte of debug emails. They conveniently didn't tell me this when they reported the box was crashing. :P
Last edited by Waco on Mon May 06, 2019 10:50 am, edited 1 time in total.
Victory requires no explanation. Defeat allows none.
 
just brew it!
Administrator
Posts: 54500
Joined: Tue Aug 20, 2002 10:51 pm
Location: Somewhere, having a beer

Re: Amusing bug in RHEL 7.4 -

Wed Apr 24, 2019 10:25 am

We've seen some superficially similar issues with Debian and 768GB of RAM IIRC.
Nostalgia isn't what it used to be.
 
Waco
Maximum Gerbil
Topic Author
Posts: 4850
Joined: Tue Jan 20, 2009 4:14 pm
Location: Los Alamos, NM

Re: Amusing bug in RHEL 7.4 -

Wed Apr 24, 2019 1:44 pm

This is more annoying to me than anything because I've had boxes with 2 TB+ of DRAM on RHEL 6. No special kernel or anything.
Victory requires no explanation. Defeat allows none.
 
Topinio
Gerbil Jedi
Posts: 1839
Joined: Mon Jan 12, 2015 9:28 am
Location: London

Re: Amusing bug in RHEL 7.4 -

Fri Apr 26, 2019 4:17 pm

Gah, is this EPYC-specific or does 7 just not like gobs of RAM? Got a link? Asking as I've a couple of old 6 y.o. servers with 1.5 TB RAM each that I'm planning on upgrading from RHEL6 to RHEL7 in the summer (or failing that, summer 2020, EOSL for affordable RHEL6 being 18 months away).
Desktop: 750W Snow Silent, X11SAT-F, E3-1270 v5, 32GB ECC, RX 5700 XT, 500GB P1 + 250GB BX100 + 250GB BX100 + 4TB 7E8, XL2730Z + L22e-20
HTPC: X-650, DH67GD, i5-2500K, 4GB, GT 1030, 250GB MX500 + 1.5TB ST1500DL003, KD-43XH9196 + KA220HQ
Laptop: MBP15,2
 
NovusBogus
Graphmaster Gerbil
Posts: 1408
Joined: Sun Jan 06, 2013 12:37 am

Re: Amusing bug in RHEL 7.4 -

Fri Apr 26, 2019 9:19 pm

Oh, deary me. And here I was dreaming about upgrading to Epyc rather than dual-channel fake Xeons for my future workstation. Though in that case the memory budget is substantially smaller, so maybe it's not an issue.
 
Waco
Maximum Gerbil
Topic Author
Posts: 4850
Joined: Tue Jan 20, 2009 4:14 pm
Location: Los Alamos, NM

Re: Amusing bug in RHEL 7.4 -

Sun Apr 28, 2019 4:41 pm

Topinio wrote:
Gah, is this EPYC-specific or does 7 just not like gobs of RAM? Got a link? Asking as I've a couple of old 6 y.o. servers with 1.5 TB RAM each that I'm planning on upgrading from RHEL6 to RHEL7 in the summer (or failing that, summer 2020, EOSL for affordable RHEL6 being 18 months away).

I don't know yet - I'm hoping to spend some time debugging in the next week or two to figure that out. I would *assume* that it's Epyc (and potentially BIOS-setting) specific at this point, since RHEL5+ has been able to handle this kind of DRAM quantity forever.
Victory requires no explanation. Defeat allows none.
 
Waco
Maximum Gerbil
Topic Author
Posts: 4850
Joined: Tue Jan 20, 2009 4:14 pm
Location: Los Alamos, NM

Re: Amusing bug in RHEL 7.4 - [UPDATED]

Mon May 06, 2019 10:51 am

Sigh. Easy solution: someone misconfigured mail and accumulated a terabyte of debug emails that couldn't be sent out. No apparent bugs in RHEL. :)
Victory requires no explanation. Defeat allows none.
 
wizardz
Gerbil
Posts: 93
Joined: Thu Nov 02, 2006 12:58 pm
Location: Montreal, Canada

Re: Amusing bug in RHEL 7.4 - [UPDATED]

Mon May 06, 2019 11:13 am

a terabyte of debug email every 2 days??

wow. that's impressive.
 
just brew it!
Administrator
Posts: 54500
Joined: Tue Aug 20, 2002 10:51 pm
Location: Somewhere, having a beer

Re: Amusing bug in RHEL 7.4 - [UPDATED]

Mon May 06, 2019 12:43 pm

wizardz wrote:
a terabyte of debug email every 2 days??

wow. that's impressive.

You haven't seen the logs from some of the systems I work with. :wink:
Nostalgia isn't what it used to be.
 
Waco
Maximum Gerbil
Topic Author
Posts: 4850
Joined: Tue Jan 20, 2009 4:14 pm
Location: Los Alamos, NM

Re: Amusing bug in RHEL 7.4 - [UPDATED]

Mon May 06, 2019 1:50 pm

wizardz wrote:
a terabyte of debug email every 2 days??

Pretty easy when misconfigured really badly. :P
Victory requires no explanation. Defeat allows none.
 
Topinio
Gerbil Jedi
Posts: 1839
Joined: Mon Jan 12, 2015 9:28 am
Location: London

Re: Amusing bug in RHEL 7.4 - [UPDATED]

Tue May 07, 2019 3:06 am

just brew it! wrote:
You haven't seen the logs from some of the systems I work with. :wink:

Hmm, a backup software product whose vendor was swallowed up in 2005 and spat out again in 2016?
Desktop: 750W Snow Silent, X11SAT-F, E3-1270 v5, 32GB ECC, RX 5700 XT, 500GB P1 + 250GB BX100 + 250GB BX100 + 4TB 7E8, XL2730Z + L22e-20
HTPC: X-650, DH67GD, i5-2500K, 4GB, GT 1030, 250GB MX500 + 1.5TB ST1500DL003, KD-43XH9196 + KA220HQ
Laptop: MBP15,2
 
just brew it!
Administrator
Posts: 54500
Joined: Tue Aug 20, 2002 10:51 pm
Location: Somewhere, having a beer

Re: Amusing bug in RHEL 7.4 - [UPDATED]

Tue May 07, 2019 6:17 am

Topinio wrote:
just brew it! wrote:
You haven't seen the logs from some of the systems I work with. :wink:

Hmm, a backup software product whose vendor was swallowed up in 2005 and spat out again in 2016?

Nope. A data storage product whose vendor was swallowed up in 2015 and is still being slowly digested...
Nostalgia isn't what it used to be.
 
Topinio
Gerbil Jedi
Posts: 1839
Joined: Mon Jan 12, 2015 9:28 am
Location: London

Re: Amusing bug in RHEL 7.4 - [UPDATED]

Tue May 07, 2019 6:27 am

just brew it! wrote:
Nope. A data storage product whose vendor was swallowed up in 2015 and is still being slowly digested...

Ahh, fun :)
Desktop: 750W Snow Silent, X11SAT-F, E3-1270 v5, 32GB ECC, RX 5700 XT, 500GB P1 + 250GB BX100 + 250GB BX100 + 4TB 7E8, XL2730Z + L22e-20
HTPC: X-650, DH67GD, i5-2500K, 4GB, GT 1030, 250GB MX500 + 1.5TB ST1500DL003, KD-43XH9196 + KA220HQ
Laptop: MBP15,2
 
DragonDaddyBear
Gerbil Elite
Posts: 985
Joined: Fri Jan 30, 2009 8:01 am

Re: Amusing bug in RHEL 7.4 - [UPDATED]

Tue May 07, 2019 7:12 am

Waco wrote:
wizardz wrote:
a terabyte of debug email every 2 days??

Pretty easy when misconfigured really badly. :P

Having touched mail systems (postfix and sendmail) briefly, I can say that's surprisingly easy to do. What shocks me more than anything is that someone didn't think to check the memory of processes or logs. That's a pretty basic troubleshooting step.

EDIT: I say that having just experienced an incident yesterday when I asked someone "did you look at the logs?" "No" was the reply. I haven't heard back since.
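
For anyone wanting to script the "check the memory of processes" step mentioned above, here is a minimal sketch, assuming a Linux box where /proc/<pid>/status is readable. The function names (parse_vmrss_kib, top_resident_processes) are made up for illustration and are not from any tool named in this thread. One caveat: memory held in a tmpfs or RAM disk does not show up as process RSS, so this only catches the case where a process itself is holding the memory.

```python
import os


def parse_vmrss_kib(status_text):
    """Return the VmRSS value in KiB from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:     12345 kB".
            return int(line.split()[1])
    return None


def parse_name(status_text):
    """Return the process name from the status text, or "?" if absent."""
    for line in status_text.splitlines():
        if line.startswith("Name:"):
            parts = line.split(None, 1)
            return parts[1].strip() if len(parts) > 1 else "?"
    return "?"


def top_resident_processes(limit=10, proc_root="/proc"):
    """List (rss_kib, pid, name) tuples, largest resident set first.

    proc_root is a parameter only so the function can be pointed at a
    test directory; on a real system leave it at /proc.
    """
    results = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "status")) as handle:
                text = handle.read()
        except OSError:
            continue  # the process exited or is not readable
        rss = parse_vmrss_kib(text)
        if rss is None:
            continue  # kernel threads report no VmRSS
        results.append((rss, int(entry), parse_name(text)))
    results.sort(reverse=True)
    return results[:limit]


if __name__ == "__main__":
    for rss_kib, pid, name in top_resident_processes():
        print(f"{rss_kib:>12} KiB  pid {pid:<8} {name}")
```

Running it as a script prints the ten largest processes by resident memory, which is roughly the first thing to look at before the box gets too far gone to fork anything.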
 
just brew it!
Administrator
Posts: 54500
Joined: Tue Aug 20, 2002 10:51 pm
Location: Somewhere, having a beer

Re: Amusing bug in RHEL 7.4 - [UPDATED]

Tue May 07, 2019 7:24 am

On a couple of occasions I have been guilty of accidentally redirecting log output to a dynamically sized RAM disk. That's a ticking time bomb...

One of my co-workers can top that though. He configured a service that was running as root to send its log output to /dev/null; but the service did its own log rotation. Once it had written a certain amount of data to the "log", it renamed /dev/null. Hilarity ensued.
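
A rotation routine can guard against that failure by refusing to rename anything that is not a regular file. What follows is only a sketch of that idea, not the code of the service in question (which is not named here); rotate_log and its suffix parameter are hypothetical.

```python
import os
import stat


def rotate_log(path, suffix=".1"):
    """Rename path to path + suffix, but only if path is a regular file.

    Returns True if a rotation happened, False if it was skipped.
    Device nodes such as /dev/null, directories, and symlinks are left
    alone, even when the caller is running as root.
    """
    try:
        info = os.lstat(path)  # lstat, so a symlink is judged as a symlink
    except FileNotFoundError:
        return False
    if not stat.S_ISREG(info.st_mode):
        return False
    os.replace(path, path + suffix)
    return True
```

With this check, rotate_log("/dev/null") returns False and leaves the device node where it is, while an ordinary log file is renamed as usual.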
Nostalgia isn't what it used to be.
 
Topinio
Gerbil Jedi
Posts: 1839
Joined: Mon Jan 12, 2015 9:28 am
Location: London

Re: Amusing bug in RHEL 7.4 - [UPDATED]

Tue May 07, 2019 7:32 am

Wow, that's a special one alright :o
Desktop: 750W Snow Silent, X11SAT-F, E3-1270 v5, 32GB ECC, RX 5700 XT, 500GB P1 + 250GB BX100 + 250GB BX100 + 4TB 7E8, XL2730Z + L22e-20
HTPC: X-650, DH67GD, i5-2500K, 4GB, GT 1030, 250GB MX500 + 1.5TB ST1500DL003, KD-43XH9196 + KA220HQ
Laptop: MBP15,2
 
just brew it!
Administrator
Posts: 54500
Joined: Tue Aug 20, 2002 10:51 pm
Location: Somewhere, having a beer

Re: Amusing bug in RHEL 7.4 - [UPDATED]

Tue May 07, 2019 7:42 am

...and this, boys and girls, is why it is a bad idea to run random services as root! :D

(Well, OK, one reason among several...)
Nostalgia isn't what it used to be.
 
chuckula
Minister of Gerbil Affairs
Posts: 2109
Joined: Wed Jan 23, 2008 9:18 pm
Location: Probably where I don't belong.

Re: Amusing bug in RHEL 7.4 - [UPDATED]

Tue May 07, 2019 7:51 am

This is why you need Optane. With 12 TB of memory your misconfigured mail server can go for a full 24 days between crashes!
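
The arithmetic behind that figure, as a tiny sketch. It assumes the fill rate implied earlier in the thread (about 1 TB every 2 days, so 0.5 TB per day); days_until_full is a made-up helper name.

```python
def days_until_full(capacity_tb, fill_rate_tb_per_day):
    """Days until memory of the given size is exhausted at a constant fill rate."""
    if fill_rate_tb_per_day <= 0:
        raise ValueError("fill rate must be positive")
    return capacity_tb / fill_rate_tb_per_day


# Assumed rate: 1 TB accumulated every 2 days.
RATE_TB_PER_DAY = 1 / 2

print(days_until_full(1, RATE_TB_PER_DAY))   # the 1 TB box from the first post
print(days_until_full(12, RATE_TB_PER_DAY))  # the hypothetical 12 TB box
```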
4770K @ 4.7 GHz; 32GB DDR3-2133; Officially RX-560... that's right AMD you shills!; 512GB 840 Pro (2x); Fractal Define XL-R2; NZXT Kraken-X60
--Many thanks to the TR Forum for advice in getting it built.
 
Waco
Maximum Gerbil
Topic Author
Posts: 4850
Joined: Tue Jan 20, 2009 4:14 pm
Location: Los Alamos, NM

Re: Amusing bug in RHEL 7.4 - [UPDATED]

Tue May 07, 2019 8:51 am

DragonDaddyBear wrote:
Having touched mail systems (postfix and sendmail) briefly, I can say that's surprisingly easy to do. What shocks me more than anything is that someone didn't think to check the memory of processes or logs. That's a pretty basic troubleshooting step.

EDIT: I say that having just experienced an incident yesterday when I asked someone "did you look at the logs?" "No" was the reply. I haven't heard back since.

They attempted, but when they caught the system misbehaving it was too far gone to figure out wtf was going on. Couldn't even fork a process. :)

Since it was a recent standup and still being configured, none of the typical log forwarding / monitoring had been configured yet.
Last edited by Waco on Tue May 07, 2019 1:22 pm, edited 1 time in total.
Victory requires no explanation. Defeat allows none.
 
wizardz
Gerbil
Posts: 93
Joined: Thu Nov 02, 2006 12:58 pm
Location: Montreal, Canada

Re: Amusing bug in RHEL 7.4 - [UPDATED]

Tue May 07, 2019 1:04 pm

just brew it! wrote:
wizardz wrote:
a terabyte of debug email every 2 days??

wow. that's impressive.

You haven't seen the logs from some of the systems I work with. :wink:

Working in a small shop here (less than 100 employees) and our ENTIRE storage infrastructure is less than 10TB... backups and all..
Production data is probably ~1TB.

I'm somewhat jealous about the kind of systems some of you are working with.
 
Aranarth
Graphmaster Gerbil
Posts: 1435
Joined: Tue Jan 17, 2006 6:56 am
Location: Big Rapids, Mich. (Est Time Zone)

Re: Amusing bug in RHEL 7.4 - [UPDATED]

Tue May 07, 2019 1:49 pm

wizardz wrote:
just brew it! wrote:
wizardz wrote:
a terabyte of debug email every 2 days??

wow. that's impressive.

You haven't seen the logs from some of the systems I work with. :wink:

Working in a small shop here (less than 100 employees) and our ENTIRE storage infrastructure is less than 10TB... backups and all..
Production data is probably ~1TB.

I'm somewhat jealous about the kind of systems some of you are working with.

God I wish I had 10 TB of storage.
If I totted up all my storage from all my servers I MIGHT hit 1 TB.
I just bought 10x256 GB thumb drives because people have HUGE (50 GB) email archives that I can't back up to the server.
Then I realized that 2.56 TB of thumb drives is about 10 times the storage on one of my servers with a RAID array... :-/
Main machine: Core I7 -2600K @ 4.0Ghz / 16 gig ram / Radeon RX 580 8gb / 500gb toshiba ssd / 5tb hd
Old machine: Core 2 quad Q6600 @ 3ghz / 8 gig ram / Radeon 7870 / 240 gb PNY ssd / 1tb HD
 
Waco
Maximum Gerbil
Topic Author
Posts: 4850
Joined: Tue Jan 20, 2009 4:14 pm
Location: Los Alamos, NM

Re: Amusing bug in RHEL 7.4 - [UPDATED]

Tue May 07, 2019 2:06 pm

I didn't ever think I'd have more crap on my NAS than some whole business shops with 100+ employees have in production.
Victory requires no explanation. Defeat allows none.
 
Usacomp2k3
Gerbil God
Posts: 23043
Joined: Thu Apr 01, 2004 4:53 pm
Location: Orlando, FL

Re: Amusing bug in RHEL 7.4 - [UPDATED]

Tue May 07, 2019 2:15 pm

Waco wrote:
I didn't ever think I'd have more crap on my NAS than some whole business shops with 100+ employees have in production.

I have more crap on my NAS than our Business Unit has total (outside of engineering drawings/models which I have no idea what the size is).
 
just brew it!
Administrator
Posts: 54500
Joined: Tue Aug 20, 2002 10:51 pm
Location: Somewhere, having a beer

Re: Amusing bug in RHEL 7.4 - [UPDATED]

Tue May 07, 2019 2:20 pm

My home file server is currently ~17TB.
Nostalgia isn't what it used to be.
 
SuperSpy
Minister of Gerbil Affairs
Posts: 2403
Joined: Thu Sep 12, 2002 9:34 pm
Location: TR Forums

Re: Amusing bug in RHEL 7.4 - [UPDATED]

Tue May 07, 2019 2:41 pm

I think my home network is sitting around the 26TB mark between a 14TB freeNAS/plex server, an 8TB backup freeNAS server, and a 4-ish TB freeNAS scratch machine.

Work's primary and secondary backup servers are 40TB and 28TB but that's primarily because I over-specced them for projects that of course got cancelled like a month after I set up the machines.
Desktop: i7-4790K @4.8 GHz | 32 GB | EVGA Gefore 1060 | Windows 10 x64
Laptop: MacBook Pro 2017 2.9GHz | 16 GB | Radeon Pro 560
 
Topinio
Gerbil Jedi
Posts: 1839
Joined: Mon Jan 12, 2015 9:28 am
Location: London

Re: Amusing bug in RHEL 7.4 - [UPDATED]

Tue May 07, 2019 4:07 pm

Are we talking raw or formatted here?

I'll go raw: home = 11.2, work = 177.6 + 48 + 48 + 96 but it's old cruft and I'm working on replacing most of it in the next 3 months hopefully.
Desktop: 750W Snow Silent, X11SAT-F, E3-1270 v5, 32GB ECC, RX 5700 XT, 500GB P1 + 250GB BX100 + 250GB BX100 + 4TB 7E8, XL2730Z + L22e-20
HTPC: X-650, DH67GD, i5-2500K, 4GB, GT 1030, 250GB MX500 + 1.5TB ST1500DL003, KD-43XH9196 + KA220HQ
Laptop: MBP15,2
 
just brew it!
Administrator
Posts: 54500
Joined: Tue Aug 20, 2002 10:51 pm
Location: Somewhere, having a beer

Re: Amusing bug in RHEL 7.4 - [UPDATED]

Tue May 07, 2019 4:24 pm

I was quoting formatted, including RAID overhead.
Nostalgia isn't what it used to be.
 
MileageMayVary
Gerbil XP
Posts: 370
Joined: Thu Dec 10, 2015 9:18 am
Location: Baltimore

Re: Amusing bug in RHEL 7.4 - [UPDATED]

Tue May 07, 2019 5:03 pm

I love what the culprit turned out to be.

The storage team at my work is setting up a 1 PB all flash NetApp cluster this year.
Main rig: Ryzen 3600X, R9 290@1100MHz, 16GB@2933MHz, 1080-1440-1080 Ultrasharps.
 
just brew it!
Administrator
Posts: 54500
Joined: Tue Aug 20, 2002 10:51 pm
Location: Somewhere, having a beer

Re: Amusing bug in RHEL 7.4 - [UPDATED]

Tue May 07, 2019 5:45 pm

MileageMayVary wrote:
The storage team at my work is setting up a 1 PB all flash NetApp cluster this year.

FWIW one of the managers at work mentioned today that one of our larger customers has a data "ingest rate" in excess of 1 PB per day. :o

(And no, the array isn't all flash in this case...)
Nostalgia isn't what it used to be.
 
Waco
Maximum Gerbil
Topic Author
Posts: 4850
Joined: Tue Jan 20, 2009 4:14 pm
Location: Los Alamos, NM

Re: Amusing bug in RHEL 7.4 - [UPDATED]

Tue May 07, 2019 6:03 pm

just brew it! wrote:
MileageMayVary wrote:
The storage team at my work is setting up a 1 PB all flash NetApp cluster this year.

FWIW one of the managers at work mentioned today that one of our larger customers has a data "ingest rate" in excess of 1 PB per day. :o

(And no, the array isn't all flash in this case...)

One of my in-house designed systems averaged 1 PiB per day of real data for 3 weeks straight when a particular user unleashed hell on us. Singular user with a mission.

Another user who shall remain unnamed managed to nearly fill a pair of 40 PiB filesystems in a workday. Things move quickly at 1 TB/s+.
Victory requires no explanation. Defeat allows none.
