morphine wrote:Then I think your specs should suffice.
Alright, well at least I'm on the right track.
morphine wrote:Regarding the key/value store, not trying to tell you how to do your job, but it sounds like you'd be better off if you used a database of some sort, potentially coupling it with Memcache. The reason being that RDBMSs tend to be much better at handling caching by keeping hotly accessed data in memory, you can take better advantage of indexes, etc.
I'll defend the decision here for a second: all modern operating systems already do exactly that, automatically. Anything recently accessed is cached in RAM as long as there's memory to spare. So we figured, why not let the OS take care of caching for us rather than setting up and maintaining a dedicated DB? All we need is a simple key/value store, and if that turns out not to be good enough we can add a Redis layer on top. Besides, on most modern filesystems finding a file is a B-tree (or similar indexed) lookup, so it stays fast even for large numbers of entries. A filesystem is just a database at heart, and we're looking to leverage that to our advantage since we don't need any heavy joining/filtering/etc.
But I do see what you mean. We're opting for simplicity at the cost of a slight performance penalty, and we're hoping that penalty is negligible.
JdL wrote:The reason it works is because the app is NOT using most of the traditional RDBMS features and behaves as a simple key-value lookup.
That's what we're hoping for. Maybe I'll look into pre-emptive caching like you describe, but our dataset can be several gigabytes with several million entries.
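If pre-emptive/in-process caching ever becomes worth it, a bounded LRU layer in front of the disk reads is the usual middle ground before reaching for Redis. A sketch (the class and cap are illustrative, not anything from the thread) that exploits the insertion-order iteration of a JavaScript `Map` to track recency, with an entry cap so a multi-gigabyte dataset can't blow up the Node heap:

```javascript
// Small in-process LRU cache: a Map iterates in insertion order, so
// re-inserting on every hit keeps the least recently used key first.
class LruCache {
  constructor(maxEntries) {
    this.maxEntries = maxEntries;
    this.map = new Map();
  }
  get(key) {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key);
    this.map.delete(key); // re-insert to mark as most recently used
    this.map.set(key, value);
    return value;
  }
  set(key, value) {
    this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.maxEntries) {
      // First key in iteration order is the least recently used one.
      this.map.delete(this.map.keys().next().value);
    }
  }
}

const cache = new LruCache(2);
cache.set('a', 1);
cache.set('b', 2);
cache.get('a');    // touch 'a', so 'b' is now least recently used
cache.set('c', 3); // evicts 'b'
console.log(cache.get('b')); // undefined
```

On a cache miss you'd fall through to the filesystem read; since the OS page cache is already doing most of the work, this layer mainly saves the syscall and hashing overhead for the very hottest keys.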
JdL wrote:As developers / administrators, having less configuration / apps running to deal with / maintain is almost as important as the functionality itself. KISS principle.
morphine wrote:JdL, that's correct, but the OP specifically said he's reading from disk, not from memory, with disk I/O being heavy. If he were just storing stuff in memory, then I'd never recommend another layer of complexity.
I'm replacing an old system that stores everything in memory and is constantly running out of it. We're using the disk because scaling is much, much easier and, as you both noted, it's simple.
JdL wrote:Spec-wise, it is ideal to have at least 1 CPU thread available per-nodejs-process, plus 1 or 2 free for OS / network resources. So make sure your CPU(s) can juggle enough threads.
Noted.
JdL wrote:Also - RAM is cheap right now. 16GB is low if your db (whatever it is) grows. SSDs will help mitigate RAM sizes, but then you're eating into your disk space.
Haha, I don't think any of my teammates are going to let me get away with less than 32GB, from the sound of it. The K/V DB doesn't cache anything itself; I'm cheating by relying on the filesystem cache, so the additional RAM will help us out.