Welch wrote: Thanks for the very detailed explanation, Maxx. This is exactly the kind of info I was hoping for.
I can tell you that at this point I'm using Avast IS. I'm surprised to hear that the OS of the host scan machine would play a major role in scan times. I would have thought all relevant information from the AV would be loaded into memory and run from there. There is no doubt in my mind about the speed benefits of SSDs in general; I'm just curious what most popular AV software actually does, step by step, to process a file, especially when it comes across something it has to decompress. Care to elaborate a bit more and possibly explain the steps the software would generally take, and why it's more advantageous to have faster cores rather than more of them? As for cache and cache speed, that makes sense to me, since instructions can be fetched very quickly without going through the rest of the system's slower resources (even system memory).
So I'd assume a Core i3-4130 with 2x4GB of RAM and a high-IOPS (4K) SSD would drastically decrease scan times compared to an "8-core" AMD equivalent with the same RAM and SSD?
This seems like something that would be very interesting for TR to do a write-up on, considering most of these parts are already in-house. It would provide a lot of real-world data for techs like myself who want to build a machine for a very specific purpose.
Sure.
File scanning comes in a few forms: hash, reputation, and behavior analytics.
Hashing is the #1 method of AV scanning for any and all vendors. It relies on the "definitions", "pattern files", "check files", or whatever the vendor labels them, which are large databases of MD5/SHA-1 hashes of known bad things.
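To make the hash lookup concrete, here is a minimal sketch. It assumes the "definitions" can be modeled as a plain in-memory set of MD5 hex digests; the set `KNOWN_BAD_MD5` and the helper names are hypothetical, and real vendor pattern files are proprietary formats that look nothing like this.

```python
import hashlib

# Hypothetical stand-in for a vendor definitions file: a set of known-bad
# MD5 hex digests. Real pattern files are proprietary binary databases.
KNOWN_BAD_MD5 = {
    hashlib.md5(b"pretend this is malware").hexdigest(),
}


def md5_of_bytes(data: bytes) -> str:
    """Hash an in-memory buffer."""
    return hashlib.md5(data).hexdigest()


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file in 1 MiB chunks so large files never need to fit in RAM."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def is_known_bad(digest: str) -> bool:
    # Set membership is an average-case O(1) lookup, so the expensive part
    # of a hash scan is reading the file, not searching the database.
    return digest in KNOWN_BAD_MD5
```

The point of the design is that once the file has been read and hashed, the lookup itself is nearly free, which is why disk reads dominate.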
So to run this hash scan, which is mostly what you are doing, the scanner needs to determine a path of action:
--Is the file a container I recognize? (zip, rar, jar, class, eml, msg, msi, cab, spreadsheet, Word file, PowerPoint, Flash file, etc.)
**If it is not, the scanner goes straight off, computes the MD5/SHA-1 of that file, and looks it up in its local database (held in the pattern/definitions files). If it gets a reasonable hit, the file is flagged as bad.
**If it is a file that can be extracted as a container object, the scanner "decomposes" (extracts) the object *after* determining how many layers and how many files it will have to scan. This extraction always goes to the install directory of the application (scanner) or to a temp folder on the operating system drive specified by the scanner at install. The extraction is VERY IO-intensive, as it is literally like extracting a zip, and then the hash scan runs on those extracted objects.
But the catch is this: if it is a nested zip or a nested MIME container (zips in zips, or Word docs with Word docs embedded in them), those objects must themselves be extracted and then hash scanned as well. Extraction does load the CPU, but not as much as you'd think; a faster CPU will help with extraction times, but the biggest impact will be the SSD used for the temp work. Any file that must be extracted and is not scanned "in memory" has to be extracted to the scanner's system drive and then scanned.
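The container path above, including the nested case, can be sketched as a recursive walk. This is an illustration under stated assumptions, not how any particular vendor does it: only ZIP is recognized (note that modern .docx/.xlsx/.pptx files are ZIP containers too), extraction goes to a temp folder on the system drive, and `MAX_DEPTH`, `KNOWN_BAD_MD5`, and `scan_path` are hypothetical names.

```python
import hashlib
import os
import tempfile
import zipfile

# Hypothetical definitions set and nesting cap (guards against zip bombs).
KNOWN_BAD_MD5 = {hashlib.md5(b"pretend this is malware").hexdigest()}
MAX_DEPTH = 5


def md5_file(path):
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def scan_path(path, label=None, depth=0, hits=None):
    """Return (label, md5) pairs for every object that matched a bad hash.

    `label` records the nesting chain, e.g. "outer.zip!inner.zip!evil.bin".
    """
    if hits is None:
        hits = []
    if label is None:
        label = path
    # Step 1: is this a container we recognize? (ZIP only in this sketch;
    # past MAX_DEPTH the archive is just hashed as an opaque file.)
    if depth < MAX_DEPTH and zipfile.is_zipfile(path):
        # Step 2: extract to a temp folder on disk. This is the IO-heavy part:
        # every member is written out, then read back to be hashed.
        with tempfile.TemporaryDirectory() as workdir, zipfile.ZipFile(path) as zf:
            zf.extractall(workdir)
            for root, _dirs, files in os.walk(workdir):
                for name in files:
                    member = os.path.join(root, name)
                    rel = os.path.relpath(member, workdir)
                    # Step 3: recurse, because a member may itself be a container.
                    scan_path(member, f"{label}!{rel}", depth + 1, hits)
        return hits
    # Not a container: hash it and look it up in the local database.
    digest = md5_file(path)
    if digest in KNOWN_BAD_MD5:
        hits.append((label, digest))
    return hits
```

Each nesting level adds another round of disk writes and reads, which is why a zip inside a zip costs far more IO than its size suggests.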
Reputation: If the scanner does reputation lookups (SONAR, Download Insight, etc.), these are not as disk-intensive. There is a read to get the hash, but that hash is then transmitted to the reputation servers for a lookup there.
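A rough sketch of why reputation lookups are light on the disk: the file is read once to hash it, and only the short digest goes over the network. The `query` callable stands in for a vendor's cloud service and is purely hypothetical, as is the local verdict cache; SHA-1 is used only to match the hash types mentioned above.

```python
import hashlib
from typing import Callable, Dict


class ReputationClient:
    """Sketch: send each unique hash to a (hypothetical) cloud service once."""

    def __init__(self, query: Callable[[str], str]):
        # Hypothetical remote lookup: takes a hex digest, returns a verdict.
        self._query = query
        self._cache: Dict[str, str] = {}
        self.remote_calls = 0

    def verdict_for_bytes(self, data: bytes) -> str:
        # The only disk-bound step in a real scanner is reading `data`.
        digest = hashlib.sha1(data).hexdigest()
        if digest not in self._cache:
            self.remote_calls += 1
            # Only 40 hex characters leave the machine, not the file itself.
            self._cache[digest] = self._query(digest)
        return self._cache[digest]
```

Usage would look like `ReputationClient(my_lookup).verdict_for_bytes(data)`, where `my_lookup` is whatever talks to the reputation servers.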
Behavior: Behavior scanning is also very CPU- and disk-intensive, *but* only if the file is executed. Behavior is a runtime scan for API calls and file and folder touches that might clue the scanner in to a bad file doing suspicious things (like modifying the hosts file).
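To illustrate the kind of rule a behavior engine evaluates, here is a toy sketch. In a real product the events come from runtime hooks or a driver watching the executing process; here they are fed in by hand, and the `Event` type and the single hosts-file rule are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Event:
    process: str
    action: str  # e.g. "file_write", "file_read", "api_call"
    target: str


# Hypothetical rule set: paths whose modification is treated as suspicious.
SUSPICIOUS_WRITE_TARGETS = (
    r"c:\windows\system32\drivers\etc\hosts",
)


def suspicious_events(events: Iterable[Event]) -> List[Event]:
    """Return the events that match a suspicious-write rule."""
    flagged = []
    for ev in events:
        # Reading the hosts file is normal; writing to it is what we flag.
        if ev.action == "file_write" and ev.target.lower() in SUSPICIOUS_WRITE_TARGETS:
            flagged.append(ev)
    return flagged
```

This only fires when something actually runs and produces events, which is why behavior scanning costs nothing during an on-demand scan of files that are never executed.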
Back to what you're doing: scanning mounted drives.
#1 issue is the disk being mounted in a USB cradle. While it may have a fast transfer rate, that rate is measured on large sequential transfers, not on many small file reads and writes. If you use HD Tune, you will find a Raptor drive in a USB 3 cradle has a much lower transfer rate and a much lower IOPS rate than an eSATA-mounted drive. eSATA is hardware accelerated; USB is not, is CPU-dependent, and will not be able to keep up even with a great CPU.
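If you want a quick feel for small random reads without HD Tune, a crude probe like the one below times 4 KiB reads at random offsets of an existing file. It is a rough sketch, not a replacement for a proper benchmark: the OS page cache will inflate the number for a file that was read recently, so point it at a large file on the drive you are testing (for example, once through the USB cradle and once over eSATA). `random_read_iops` is a hypothetical helper name.

```python
import os
import random
import time


def random_read_iops(path: str, block_size: int = 4096, reads: int = 2000, seed: int = 0) -> float:
    """Rough random-read rate (reads per second) for 4 KiB blocks of `path`.

    Caveat: cached data is served from RAM, so use a file much larger than
    memory, or one not read recently, to measure the drive itself.
    """
    size = os.path.getsize(path)
    if size < block_size:
        raise ValueError("file is smaller than one block")
    rng = random.Random(seed)
    last_block = size // block_size - 1
    with open(path, "rb", buffering=0) as f:  # unbuffered: one syscall per read
        start = time.perf_counter()
        for _ in range(reads):
            f.seek(rng.randint(0, last_block) * block_size)
            f.read(block_size)
        elapsed = time.perf_counter() - start
    return reads / elapsed
```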
So eliminate the first bottleneck in the chain of data handling: get better IOPS and lower CPU usage by getting off the USB channel.
#2: The second bottleneck will be the scanner's host drive. With a platter drive you're averaging 70-120 operations a second, compared to around 15,000 for some of the cheapest 128 GB SSDs out there. Any time the scanner has to transfer data from the guest drive to the host OS drive, you will be waiting on that drive's IOPS-rated writes, so faster is better: the sooner it is written, the sooner we can scan it. The scan itself is also IOPS-heavy, so again it is limited by the 70-120 ops of a platter setup.
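A back-of-the-envelope calculation shows why the IOPS gap matters so much. The workload here is an assumption for illustration: 100,000 small extracted files, each costing two random operations on the host drive (one write during extraction, one read to hash it), using roughly 100 IOPS for a platter drive and 15,000 for a cheap SSD as above.

```python
def io_bound_seconds(files: int, ops_per_file: int, iops: float) -> float:
    """Lower bound on scan time when every file costs `ops_per_file` random IOs."""
    return files * ops_per_file / iops


# Hypothetical workload: 100,000 extracted files, 2 random IOs each.
for label, iops in (("platter (~100 IOPS)", 100), ("cheap SSD (~15,000 IOPS)", 15_000)):
    print(f"{label}: {io_bound_seconds(100_000, 2, iops):,.0f} s")
```

That works out to about 2,000 seconds (over half an hour) of pure disk waiting on the platter drive versus about 13 seconds on the SSD, before the CPU does any hashing at all.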
#3: CPU speed. If you pop open Task Manager during a scan, you will find (at least on my 4-core systems and the VMs I do AV testing on) that the scanner will not peg the CPU, not even close. This is because, while most scanners (ours included) may be multithreaded, little coding has been done to scale beyond 2 CPUs. Also, most vendors keep the scanner in a "safe mode" for scanning: they limit the number of cycles it can use so as not to impact the host. Since your host *IS* a dedicated scanner, see if there is a performance slider to turn up. Ours has a slider with "application performance" on one side and "scan performance" on the other. It is much faster when set to scan performance, but even then it won't peg my VMware guest systems, to protect the user. However, even though it is limited to x number of cycles, the speed of those cycles is set by the CPU's clock speed, so more GHz means faster execution of what it is doing (hashing).
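One simple way a "performance slider" could be implemented is a duty-cycle throttle: do work for a fraction of each time slice, then sleep for the rest. This is a hypothetical sketch, not how our product or any other vendor's actually throttles. It does show the point above: at a fixed duty cycle, a faster core gets more hashes done in each busy slice, so clock speed still matters.

```python
import hashlib
import time
from typing import Iterable, List


def throttled_md5s(blobs: Iterable[bytes], duty_cycle: float = 0.5, slice_s: float = 0.01) -> List[str]:
    """Hash blobs while staying busy only `duty_cycle` of the time.

    duty_cycle=1.0 behaves like "scan performance"; lower values leave
    CPU time for the user, like "application performance".
    """
    if not 0.0 < duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be in (0, 1]")
    digests = []
    slice_start = time.perf_counter()
    for blob in blobs:
        digests.append(hashlib.md5(blob).hexdigest())
        busy = time.perf_counter() - slice_start
        if busy >= slice_s * duty_cycle:
            # Idle long enough that busy / (busy + idle) == duty_cycle.
            time.sleep(busy * (1.0 - duty_cycle) / duty_cycle)
            slice_start = time.perf_counter()
    return digests
```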
My recommendation: a quad-core AMD or quad-core Intel at 3.5 GHz. Go full fat, no Celerons or cheapo Athlons; you want the bigger L3 and L2 caches to maximize cache hits. (FYI, an FX-8320 or FX-8350 is what I mean by an AMD quad core.)
#4: Memory. 2 GB is sufficient, but 4 is better and 8 is a waste. For us, at least, we load not into user-space RAM but into the nonpaged pool, so it is limited in what it can load there, and gobs of RAM won't help. *But* during extractions and the scan itself, the scanner may need more resources to run. 4 GB is exactly what you need.
Cybert said: Capitalization and periods are hard for you, aren't they? I've given over $100 to techforums. I should have you banned for my money.