Personal computing discussed

Moderators: renee, SecretSquirrel, just brew it!

 
Aranarth
Graphmaster Gerbil
Topic Author
Posts: 1435
Joined: Tue Jan 17, 2006 6:56 am
Location: Big Rapids, Mich. (Est Time Zone)
Contact:

Need help parsing a log file in .csv to excell

Tue Apr 19, 2011 12:57 pm

First let me say I SUCK at programming.
I can usually read and modify basic scripts but suck at writing them.
I'm hoping that someone can put together a quick script that I can modify later.

Here's the issue...
I get a daily .csv file from our webserver that includes every file a person has accessed the day before.
Unfortunately this list has items I don't need logged, files are listed in order of usage (greatest to least), and in some cases the same file is listed twice because of letter case, and if a file is not accessed at all it is not listed with a zero.
Each day this log has items in a different order or items may not be listed at all if the usage was zero.

How to grab each file (item) listed in the log that I do need, check for duplicates (and add them together), and put the results in the correct order?

I don't necessarily need to update the running log directly as part of the script if I can just copy and paste from another file into the Excel spreadsheet. (2003)

Currently I spend about 15 minutes a day on this to around an hour on Mondays since there are three days to enter.

Now my Boss does not want just the stats for the little website but the stats for the big one as well which may take 2 hours or more to sort though the whole thing.

Data is listed pretty simply as:

"Filename" "Number of times accessed"
Main machine: Core I7 -2600K @ 4.0Ghz / 16 gig ram / Radeon RX 580 8gb / 500gb toshiba ssd / 5tb hd
Old machine: Core 2 quad Q6600 @ 3ghz / 8 gig ram / Radeon 7870 / 240 gb PNY ssd / 1tb HD
 
steelcity_ballin
Gerbilus Supremus
Posts: 12072
Joined: Mon May 26, 2003 5:55 am
Location: Pittsburgh PA

Re: Need help parsing a log file in .csv to excell

Tue Apr 19, 2011 1:20 pm

Did you just want an .exe that would do this? Can you provide a link to an example file?
 
UberGerbil
Grand Admiral Gerbil
Posts: 10368
Joined: Thu Jun 19, 2003 3:11 pm

Re: Need help parsing a log file in .csv to excell

Tue Apr 19, 2011 1:25 pm

steelcity_ballin wrote:
Did you just want an .exe that would do this? Can you provide a link to an example file?
It sounds like the OP is just asking for an Excel macro. This doesn't sound particularly complicated (the devil is always in the detail of course). It's unclear if this is supposed to update an ongoing permanent record (ie accumulate the numbers over time), or just produce a one-off report each day. The former would actually argue for a database app, though a lot of bosses want to see everything in Excel whether it's appropriate or not.
 
Aranarth
Graphmaster Gerbil
Topic Author
Posts: 1435
Joined: Tue Jan 17, 2006 6:56 am
Location: Big Rapids, Mich. (Est Time Zone)
Contact:

Re: Need help parsing a log file in .csv to excell

Tue Apr 19, 2011 2:08 pm

UberGerbil wrote:
steelcity_ballin wrote:
Did you just want an .exe that would do this? Can you provide a link to an example file?
It sounds like the OP is just asking for an Excel macro. This doesn't sound particularly complicated (the devil is always in the detail of course). It's unclear if this is supposed to update an ongoing permanent record (ie accumulate the numbers over time), or just produce a one-off report each day. The former would actually argue for a database app, though a lot of bosses want to see everything in Excel whether it's appropriate or not.


Correct I'm looking for a vb script to add to an ongoing record.

I'm hoping to get it as a vbscript in the hopes that I can understand the gist and modify it to my current needs and make any changes I need in the future.

I do tech support mostly. I just happen to have the "free time" to do the parsing manually.

Here is a sample if the input file:

############################################################
# Report: name of website....
# Report Name: Requested Pages
# Date Range: 04/18/2011 - 04/18/2011
############################################################
filename1 548
filename2 342
filename3 338
FileName3 330
Filename4 101
filename5 55
filename10 20


(as you can see files 6-9 are missing and one is repeated)

And tomorrow those filename may be in a completely differnet order or even missing with new ones in their place.
There are about 130 files total and I need to get info on about 1/3 or so of them.
Main machine: Core I7 -2600K @ 4.0Ghz / 16 gig ram / Radeon RX 580 8gb / 500gb toshiba ssd / 5tb hd
Old machine: Core 2 quad Q6600 @ 3ghz / 8 gig ram / Radeon 7870 / 240 gb PNY ssd / 1tb HD
 
steelcity_ballin
Gerbilus Supremus
Posts: 12072
Joined: Mon May 26, 2003 5:55 am
Location: Pittsburgh PA

Re: Need help parsing a log file in .csv to excell

Tue Apr 19, 2011 3:01 pm

Should be able to do this with an array (a dictionary in your case). Use the space as the delimiter for array items 0,1 (filename, numTimesAccessed) then populate your array and sort, then loop to combine dupes. Since this would be VBA though, and not VB (vba has no support for sorting) you'd probably have to macro the data to actually populate a worksheet and then sort it via excel's functionality.

http://stackoverflow.com/questions/1309 ... ray-in-vba I wrote something a few years back that basically did this but with more data, if I can dig it up It wouldn't be any trouble to modify it for your uses - I'll bug IT to see if my old machine is still around. In the mean time try using link and learn a little about the dictionary with respect to keys and values. Essentially you read in your data to a dictionary giving it a key(filename) and a value (numTimesAccessed). Then, for each item in your file, if your item already exists in the dictionary you've created, update it's value +numTimesAccessed, otherwise add it as a new entry. When you're all done you should be able to spit it all back out with another loop, and in your case display it too the worksheet in excel. Hope that gets you started!
 
Aranarth
Graphmaster Gerbil
Topic Author
Posts: 1435
Joined: Tue Jan 17, 2006 6:56 am
Location: Big Rapids, Mich. (Est Time Zone)
Contact:

Re: Need help parsing a log file in .csv to excell

Tue Apr 19, 2011 3:46 pm

Yep that's what I thought but I have no idea how the write the code.
I can read it and modify it to add or remove items etc. but the actual writing I can't do.
(I've tried and my creative juices just don't run that way.)

So how do I load the file I'm looking at and then write back the data after its been massaged?

btw I looked at the code in the link and it was so far over my head I did not even hear it go by... :(
Main machine: Core I7 -2600K @ 4.0Ghz / 16 gig ram / Radeon RX 580 8gb / 500gb toshiba ssd / 5tb hd
Old machine: Core 2 quad Q6600 @ 3ghz / 8 gig ram / Radeon 7870 / 240 gb PNY ssd / 1tb HD
 
UberGerbil
Grand Admiral Gerbil
Posts: 10368
Joined: Thu Jun 19, 2003 3:11 pm

Re: Need help parsing a log file in .csv to excell

Tue Apr 19, 2011 4:03 pm

If you record a macro to do the first couple of steps you'll see something like:
    Workbooks.Open Filename:=  "C:\whatever\testdata.csv"
    Columns("A:A").TextToColumns Destination:=Range("A1"), DataType:=xlDelimited, _
        TextQualifier:=xlDoubleQuote, ConsecutiveDelimiter:=True, Tab:=True, _
        Semicolon:=False, Comma:=False, Space:=True, Other:=False, FieldInfo _
        :=Array(Array(1, 2), Array(2, 1)), TrailingMinusNumbers:=True
That will suck in the data and parse it into columns; you then need to remove the header junk (if you can be absolutely certain it's the same number of lines every time you could just blindly delete them, but if they always begin with a # character it's much safer to just iterate row by row deleting them). You can UCase the fieldname column to make comparisons easier, and then just iterate through the rows looking for (and consolidating) dupes.

At least, that's the Excel-centric way to do it. Personally, as a more traditional coder, I'd probably do it the way steelcity suggests, by using an in-memory data structure (ie a dictionary) and doing all the work in code before spitting it out into Excel.

I'm kind of pressed for time at the moment but I may be able to do a rough pass at something like this later.
 
PrecambrianRabbit
Gerbil
Posts: 46
Joined: Mon Jun 14, 2010 1:04 pm

Re: Need help parsing a log file in .csv to excell

Tue Apr 19, 2011 10:39 pm

If you can use Perl, this is really easy. You can run the following program on the log file, and it'll produce output that you can open as a CSV. Seriously, everyone should know Perl ;-).

while (<>) {
# skip lines starting with '#'
if (/^#/) {
next;
}
# split the line on whitespace into filename and count
my ($filename, $count) = split(/\s+/, $_);
# Convert filename to lowercase and add to a table
$datatable{lc($filename)} += $count;
}

# Output all records in CSV format, sorted by filename
foreach my $filename (sort keys %datatable) {
print "$filename,$datatable{$filename}\n";
}
 
Aranarth
Graphmaster Gerbil
Topic Author
Posts: 1435
Joined: Tue Jan 17, 2006 6:56 am
Location: Big Rapids, Mich. (Est Time Zone)
Contact:

Re: Need help parsing a log file in .csv to excell

Wed Apr 20, 2011 7:16 am

Thanks for your help guys.

These last two pieces of code are pretty obvious to me.

I had no idea perl was such a straightforward language. I'll have to look into it.

The only thing is that I also need to test for those items that do not exist while discarding the items I don't need.

I was thinking something along the lines of:

open log file
Create variable "filename1" and set the number to Zero
look for filename1 and add its value to variable filename1
continue looking for any other entries of the same name ignoring case and add to variable.
Write variable name and value to end of output file.
Repeat for variable filename2, filename3, etc.

This will give me output with the filenames in the right order and still give me zeroes for those items not in the log.
(It looks like that is exactly what the perl script does...)

I know its not elegant but its straight forward easily modifiable code to me.
Main machine: Core I7 -2600K @ 4.0Ghz / 16 gig ram / Radeon RX 580 8gb / 500gb toshiba ssd / 5tb hd
Old machine: Core 2 quad Q6600 @ 3ghz / 8 gig ram / Radeon 7870 / 240 gb PNY ssd / 1tb HD
 
notfred
Maximum Gerbil
Posts: 4610
Joined: Tue Aug 10, 2004 10:10 am
Location: Ottawa, Canada

Re: Need help parsing a log file in .csv to excell

Wed Apr 20, 2011 8:17 am

Your method will take a lot longer than the Perl as it is going through the log file once for each entry rather than once in total - depending on the logfile size you could be looking at hours rather than seconds. All you actually need to change in the Perl script if you want all the filenames in there in order is to pre-populate the array with the filenames in the order you want, you could even read that in from a separate file. I'm an Awk hacker rather than a Perl hacker so I'll leave that for someone else to give you the code to do it otherwise I'm likely to make some mistakes.
 
Aranarth
Graphmaster Gerbil
Topic Author
Posts: 1435
Joined: Tue Jan 17, 2006 6:56 am
Location: Big Rapids, Mich. (Est Time Zone)
Contact:

Re: Need help parsing a log file in .csv to excell

Wed Apr 20, 2011 8:52 am

PrecambrianRabbit wrote:
If you can use Perl, this is really easy. You can run the following program on the log file, and it'll produce output that you can open as a CSV. Seriously, everyone should know Perl ;-).

while (<>) {
# skip lines starting with '#'
if (/^#/) {
next;
}
# split the line on whitespace into filename and count
my ($filename, $count) = split(/\s+/, $_);
# Convert filename to lowercase and add to a table
$datatable{lc($filename)} += $count;
}

# Output all records in CSV format, sorted by filename
foreach my $filename (sort keys %datatable) {
print "$filename,$datatable{$filename}\n";
}


Ok I've installed the strawberry command interpreter.
How do I tell the script which file to load?

I would assume it is something like (c: prompt)> logscript.pl | logfile.csv > output.csv?
Or is that something that needs to be added to the script?
How do I populate the datatable?

Thanks for your help Rabbit (and the rest of you guys.)
Main machine: Core I7 -2600K @ 4.0Ghz / 16 gig ram / Radeon RX 580 8gb / 500gb toshiba ssd / 5tb hd
Old machine: Core 2 quad Q6600 @ 3ghz / 8 gig ram / Radeon 7870 / 240 gb PNY ssd / 1tb HD
 
PrecambrianRabbit
Gerbil
Posts: 46
Joined: Mon Jun 14, 2010 1:04 pm

Re: Need help parsing a log file in .csv to excell

Wed Apr 20, 2011 10:52 pm

Aranarth wrote:
Ok I've installed the strawberry command interpreter.
How do I tell the script which file to load?

I would assume it is something like (c: prompt)> logscript.pl | logfile.csv > output.csv?
Or is that something that needs to be added to the script?
How do I populate the datatable?

Thanks for your help Rabbit (and the rest of you guys.)


I've never used Strawberry and haven't used the Windows shell in about 10 years, but I'd expect the syntax to be:
logscript.pl logfile.csv > output.csv

The datatable gets populated as the script reads the log - the line:
$datatable{lc($filename)} += $count

creates an entry in the table with the filename as the key and the count as the value.
 
Aranarth
Graphmaster Gerbil
Topic Author
Posts: 1435
Joined: Tue Jan 17, 2006 6:56 am
Location: Big Rapids, Mich. (Est Time Zone)
Contact:

Re: Need help parsing a log file in .csv to excell

Thu Apr 21, 2011 8:05 am

thanks...

I also had to spend some time finding and installing the text/csv.pl library.
Its setup kind of confusing till you realize they have made every sound a lot more complicated that it needs to be.
Just copy the package to the right place in the directory structure and there you go.

I was playing around with csv data structures a little bit yesterday but didn't get very far because this was missing.

Once I got this working I just need to figure how to get the information in the order I want it.

So far got "hello world" to work! :D
Main machine: Core I7 -2600K @ 4.0Ghz / 16 gig ram / Radeon RX 580 8gb / 500gb toshiba ssd / 5tb hd
Old machine: Core 2 quad Q6600 @ 3ghz / 8 gig ram / Radeon 7870 / 240 gb PNY ssd / 1tb HD
 
morphine
TR Staff
Posts: 11600
Joined: Fri Dec 27, 2002 8:51 pm
Location: Portugal (that's next to Spain)

Re: Need help parsing a log file in .csv to excell

Thu Apr 21, 2011 9:09 am

Not to bash VB by itself (it has its place), but learning Perl/PHP/Python/Ruby/whatever is always a good idea, because you can create these small-but-useful scripts that save you hours of work.
There is a fixed amount of intelligence on the planet, and the population keeps growing :(
 
Aphasia
Grand Gerbil Poohbah
Posts: 3710
Joined: Tue Jan 01, 2002 7:00 pm
Location: Solna/Sweden
Contact:

Re: Need help parsing a log file in .csv to excell

Thu Apr 21, 2011 10:53 am

And its not like there is too few resources for others that have done the same stuff earlier. I havent learned perl, but having worked quite abit with logs, especially squid, and also some log analyzer tools, there isnt much you cant do if you want to.

Even Microsofts log parser gets the work done if you doesnt have access to anything else. But the good thing with perl is that you can usually find scripts to do what you want, say exchanging default squid time format for human readable, etc. And if there isnt perl, theres usually awk or other things available. But yeah, I've been thinking of getting to know awk and perhaps perl someday, when I can find the time for it.
 
ShadowTiger
Gerbil First Class
Posts: 125
Joined: Wed Oct 01, 2008 2:39 pm

Re: Need help parsing a log file in .csv to excell

Tue Apr 26, 2011 12:34 am

I think everyone should learn a scripting language. Perl is pretty easy to learn but it is easy to write difficult to read code with it. I prefer Python because it is simple to read and write.

Anyways, I think there is an easier way to achieve what you want using VB.

First of all, excel can import a csv and turn it into a spreadsheet. What you can do is import the CSV each work into Page 2. Then run a script that scans all of the rows (using a for each loop) in Page 1 (this should be all of the important ones) and see if there is a corresponding value in Page 2. If there is, add them together and save it in Page 1.

Since you don't know how to script this could take you several hours to figure out, but i suspect this would only be 10 lines of code.

However, VB is pretty limited, and Perl is much more flexible and useful IMO. It also allows you to automate your tasks using shell scripts.

Who is online

Users browsing this forum: No registered users and 1 guest
GZIP: On