Database performance help please

From Visual Basic to GNU C, this is the place to talk programming.

Moderators: SecretSquirrel, just brew it!

Database performance help please

Postposted on Tue Oct 06, 2009 7:56 pm

Hi all,
I have written an application which accumulates lightning strike data, and saves it to an Access-type database. The problem is over the past 2 years it has gotten rather large, 1.45G to be more precise. 3 days worth of data logging can net me another 123K worth of strikes, and this is on a non stormy day (strikes all the way in Africa from NY).

So needless to say, when I start the application and load in the initial data which consists of the past 6 days for a little history, it takes a LOOOONNNG time to load, upwards of several minutes. Usually 300K records are returned. All time is spent in the last line of the following:
Code: Select all
                String SQL = getEventGatheringSQL(startTime, stopTime, m_DatabaseThreshold);

                myAdapter = new OleDbDataAdapter(SQL, mainConnection);

                myDataset = new DataSet();
                //Finally pull the data into the dataset...
                myAdapter.Fill(myDataset);


The SQL string generator looks like this:

Code: Select all
            String SQLString =
                "SELECT * FROM LightningEvents" +
                " WHERE (([eventTime] > #" + startTime + "#)" +
                " AND ([eventTime] < #" + stopTime + "#))";


Meanwhile one core of the CPU is pegged while loading, but the hard disk is not. Seems something is limiting performance, and I would like to learn how to improve load times.

I am currently connecting to the database using a basic file connection. Is there a better, higher-performance way I should be doing this, like creating a connection through the control panel, and somehow using that? If that would increase my performance, I would appreciate knowing how to build up a connection string, etc.

But then I have the problem of making an installer for dummies that can get this set up for them.

I am now in the process of adding a threshold that a user can set to prevent huge amounts of data being saved, so only larger, more powerful strikes are saved in the database. This should drop the daily rate from 40K down to <4K or better. Cleaning up the old database may be a struggle though.

-Mark
liquidsquid
Minister of Gerbil Affairs
 
Posts: 2452
Joined: Wed May 29, 2002 10:49 am
Location: New York

Re: Database performance help please

Postposted on Tue Oct 06, 2009 10:08 pm

This is probably because you don't have an index on the [eventTime] column. Adding an index on that column may do the trick. See this link for details:
http://office.microsoft.com/en-us/acces ... 71033.aspx
b_naresh
Gerbil First Class
 
Posts: 177
Joined: Wed Mar 22, 2006 4:25 pm

Re: Database performance help please

Postposted on Tue Oct 06, 2009 10:14 pm

Is the database table indexed at all? You really need to have an index on the eventTime column if you're going to run queries like that against a large table. Edit: Looks like b_naresh beat me to that one.

I'm not entirely sure what you mean by "Access-type" -- is it an Access database or not? In general, Access is not very good at handling large data sets; if indexing doesn't help, I don't think changing the way you connect to the database is going to fix things.

If it is still too slow after adding an index, my recommendation would be to migrate the data to a real database engine. SQL Server is the obvious choice if you're more comfortable sticking with a Microsoft product (there's a free "Express" edition available); there are also some good cross-platform Open Source databases available (MySQL and PostgreSQL).
(this space intentionally left blank)
just brew it!
Administrator
Gold subscriber
 
 
Posts: 38142
Joined: Tue Aug 20, 2002 10:51 pm
Location: Somewhere, having a beer

Re: Database performance help please

Postposted on Tue Oct 06, 2009 11:20 pm

I also recommend upgrading. I imagine Microsoft has a decent and painless wizard for upgrading to Sql Server Express. After that happens, several things start working in your favor to improve retrieval times (eg, stored procedures, 64-bit, cached execution, more efficient .NET data adapters). Your also approaching Access's 2GB file size limit.

In the meantime, I assume you've run the 'Compact & Repair' routine in Access. That silly thing usually makes a dramatic improvement -but you have to make sure no one will want to write or read data for ~5 minutes.

Regardless if you use MySQL, Access or Sql Server, check if you are storing unnecessarily large data types. Also, do you really need to retrieve every column (ie, "SELECT * FROM LightningEvent") and every row between those two times? Is there a smaller subset that satisfies the requirements?
wibeasley
Gerbil Elite
Gold subscriber
 
 
Posts: 952
Joined: Sat Mar 29, 2008 3:19 pm
Location: Norman OK

Re: Database performance help please

Postposted on Wed Oct 07, 2009 6:32 am

Thanks all, this gives me some help.

First off thanks for the recommendation of indexing on startTime, I don't know why I didn't think of that other than perhaps I did not have a startTime when I started. Instead I have an Index, which only has the benefit of telling me how many entries are in the database. For the most part unused except for diagnostics. At best I will get a 2x or better gain from that while reducing database size.

The records themselves are small, and I am using Access (*.MDB) files out of convenience and lack of database experience. Also partly due to the fact it will hopefully become a consumer/hobbyist product and I can't have people purchasing expensive servers just to store data locally. I really don't want to spin my own file system. XML I looked at, but a text-based database gets monstrous huge quickly. All that is stored is:
Index
StrikeMagnitude
StrikeAngle
StrikeStartTIme
StrikeDuration
StrikeType
and eventually
StrikeLattitude
StrikeLongitude

All are required fields. I thought of making a simple sequential file-based storage, but then you have to deal with file corruption issues and seek times since you don't know how far into the file you need to go without searching at startup. I may wind up going this way by simply saving a new file for each month, which will be much faster. In that way if a file goes corrupt, at most you loose a month of data. However dealing with such beasts may be really messy development-wise.

I will try moving the index to time and see how it goes. I am afraid earlier records may not have any uniqueness to them as I fought with bugs. Oh well, time to clean house anyhow.

SQL servers are simply out of the question as they would simply be too complex to support a casual user. I think.
liquidsquid
Minister of Gerbil Affairs
 
Posts: 2452
Joined: Wed May 29, 2002 10:49 am
Location: New York

Re: Database performance help please

Postposted on Wed Oct 07, 2009 7:11 am

liquidsquid wrote:...
Also partly due to the fact it will hopefully become a consumer/hobbyist product and I can't have people purchasing expensive servers just to store data locally.

SQL Server Express, MySQL, and PostgreSQL are all available for free. SQL Server probably has the advantage of being closer to what you're used to; the other two have the advantage of being truly Open Source (not just free), with no vendor lock-in hazards (e.g. the vendor can't just unilaterally declare that the free version is no longer supported on the next version of Windows).
SQL servers are simply out of the question as they would simply be too complex to support a casual user. I think.

You have a point here; installation of your app will probably be a bit trickier than if you just use .mdb files. But figure it out once, write some good instructions, and you should be set? Once it is installed it should appear the same to the user.
(this space intentionally left blank)
just brew it!
Administrator
Gold subscriber
 
 
Posts: 38142
Joined: Tue Aug 20, 2002 10:51 pm
Location: Somewhere, having a beer

Re: Database performance help please

Postposted on Wed Oct 07, 2009 7:51 am

I've done some projects that have used MySQL and with it being Open Source you can just embed it in your app. I really wouldn't push Access - everything I've read about it has convinced me to stay well clear of it as it quickly reaches its limits.
notfred
Grand Gerbil Poohbah
 
Posts: 3775
Joined: Tue Aug 10, 2004 10:10 am
Location: Ottawa, Canada

Re: Database performance help please

Postposted on Wed Oct 07, 2009 8:43 am

A few more questions: If I decide to go to something like MySQL, is that something easily dealt with in a C# program? I am using C# since I have so far been able to port it to Linux very easily with minimum fuss. Ideally I would just change the connection string in my program, correct?

Thanks again,
Mark
liquidsquid
Minister of Gerbil Affairs
 
Posts: 2452
Joined: Wed May 29, 2002 10:49 am
Location: New York

Re: Database performance help please

Postposted on Wed Oct 07, 2009 9:07 am

There may be some slight syntactic differences in the SQL queries (each vendor implements different extensions to the base SQL language), but yeah basically you just change the connection string.
(this space intentionally left blank)
just brew it!
Administrator
Gold subscriber
 
 
Posts: 38142
Joined: Tue Aug 20, 2002 10:51 pm
Location: Somewhere, having a beer

Re: Database performance help please

Postposted on Wed Oct 07, 2009 10:44 am

I recommend SQL Server Express definitely. I think from within Access they have some migration wizards already.

As for redistribution, you just need to "register" on Microsoft's site (they say it is for critical patches/fixes notification) and there is no cost/royalty after that. For 2005/2008, they give you an EXE where you just need to ask the users to run (guide here). Once you figure out the command line parameters it is not difficult to incorporate this into your installer (if you have one). Only if you want an advanced, integrated one-stop installer then you need to figure out chaining and all that stuff, but one step at a time.

You also probably need to look at paging your data set with the proper data controls, that should reduce the need to download a giant dataset on load?

So is this data viewing app or users can load new data in as well? Is the database shared?
Image
The Model M is not for the faint of heart. You either like them or hate them.

Gerbils unite! Fold for UnitedGerbilNation, team 2630.
Flying Fox
Gerbil God
 
Posts: 24584
Joined: Mon May 24, 2004 2:19 am

Re: Database performance help please

Postposted on Wed Oct 07, 2009 10:54 am

What about going to SQLite? Then you are only running the database when your application is running and it should have a smaller memory footprint. Installation should also be a little simpler. You've also got an open source ADO.NET Wrapper for it.
Intel i7 860, Asus P7P55D Pro, 4x2GB Corsair XMS3 1600 (CMX4GX3M2A1600C9), EVGA GTX 560 Ti Superclocked
Seagate 7200.7 160GB, WD Caviar Black 640GB, WD Caviar Green 1TB, WD Caviar Green 2TB
Dell 2408WFP and Dell 2407WFP-HC for dual-24" goodness
emorgoch
Gerbil Elite
 
Posts: 690
Joined: Tue Mar 27, 2007 11:26 am
Location: Toronto, ON

Re: Database performance help please

Postposted on Wed Oct 07, 2009 12:38 pm

I don't usually post but I had to chime in on this as I spent a lot of my time cutting my teeth on databasing with MS Access.

1. It is possible that given the size of your MDB you may be approaching the limitations of that version of Access. IIRC MDB's are limited to (2GB - System Objects)

2. Is your data normalized? Without looking at your tables/relationships this is not something I can judge. But I have worked with data similar to yours in the past. What I did, was instead of:

TblData
IDField | DataField | Date

I split the table into 2 tables:

TblData
IDField | DataField | DateID

and

TblDate
DateID | Date

Relate the DateID from TblDate to TblData. At first glance this seems like you are taking up more space but with how Access/Jet works indexes you are actually cutting down on significant amounts of work it has to perform while running a query.

3. Instead of "SELECT * ..." You could change this to Select the actual field that you need and thus cut down on the amount of data Access/Jet returns to you, unless of course, you need all those fields. I usually accomplish this on some type of report. I'll place the date range at the top of my report and then only return the fields that are necessary for my needs. Of course, this may not be your case depending on how you have things set up.

4. I am assuming you are connecting to a SQL server or at least it appears you are in your first code block. Access is not very good over a network, at least I have seen some nasty things happen especially with wireless and Access. It is entirely possible that a large amount of this is due to network latency. If you are storing all of this data in a backend database I would make local tables within your database that you import all that data to when are you beginning work. It's my experience that local tables are blazing fast compared to working with a backend over a network.

Hopefully some of this will help you, good luck.
ASherbuck
Gerbil
 
Posts: 12
Joined: Fri Jan 25, 2008 1:38 am

Re: Database performance help please

Postposted on Thu Oct 08, 2009 5:36 am

Ok, I resolved my performance issues except for data load times, but that has been handled using a Background Worker to pull up data from the file into the main collection as the user can interact with the UI. The illusion is the program is loaded and ready almost instantly. I really only needed to find my ID (primary Key) once from where I needed to start, and once I had that, I based everything off of it. Probably a 5x performance gain in reality, and almost a 500x apparent increase.

It is really the reason why I went through a dB so multiple threads could add/read data as originally I intended a lightweight data-logger service which strictly wrote to the database, and a separate UI for reading/displaying the data which would socket-in to the logging app for realtime data.

Now I have to write a utility for scraping out uninteresting data to shrink the database back to something reasonable. The question is how to programatically clean/compress the database rather than go through the Control Panel.
liquidsquid
Minister of Gerbil Affairs
 
Posts: 2452
Joined: Wed May 29, 2002 10:49 am
Location: New York

Re: Database performance help please

Postposted on Thu Oct 08, 2009 1:27 pm

why all this work for an access database? if you plan on making this into a product, you DEFINITELY want to use SQL Server Express. It's free and embeddable. Also, the syntax is almost identical to Access.
Science is forbidden. Laboratories manufacture danger!
sativa
Grand Gerbil Poohbah
 
Posts: 3045
Joined: Sun Apr 14, 2002 7:22 pm
Location: lafayette, la

Re: Database performance help please

Postposted on Thu Oct 08, 2009 3:30 pm

Well, I guess it is partly lack of experience on database programming. I honestly don't know how to make a table and fields dynamically through code anymore (I could once in VB, but it seems that functionality is well-hidden).

For example; say I install my application on a users machine, but don't deliver a "seed" database with it. How would I create a brand-spanking new database in some SQL server with my required tables? Due to my (lack of) experience, I created the initial database in Access, and went from there. Anyone have some basic examples in C# or VB.NET where this is done? Luckily my code is heavily modularized so I can easily switch the underpinnings of the database to almost anything.

Thanks again,

-Mark
liquidsquid
Minister of Gerbil Affairs
 
Posts: 2452
Joined: Wed May 29, 2002 10:49 am
Location: New York

Re: Database performance help please

Postposted on Thu Oct 08, 2009 5:19 pm

liquidsquid wrote:say I install my application on a users machine.
Admittedly that's trickier, but it's done and documented a lot (see the bottom of this post). You might consider the Compact Edition also; that's advertised to be even easier to package and deploy with a .NET app. From your position, I think it would be preferable to support this approach, than to support all the users installing & configuring Access by themselves correctly; there's plenty of ways for them to mess up that.

But I think there's a better architectural solution: Don't distribute a bunch of databases -instead set up one with a 3rd party host and have your users write to that central database. I imagine you can find a decent provider for Sql Server for less than $20/mo; I bet a MySQL host charges half that for the same quality of service. From an analysis perspective, a central location of data has many advantages. From a development perspective, it might be a little harder (than distributing isolated databases). From a deployment & support perspective, this will be much easier.

Deployment examples:
1) See FF's post
2) http://msdn.microsoft.com/en-us/library/bb522540.aspx
3) http://www.trap17.com/index.php/Install ... 60470.html
wibeasley
Gerbil Elite
Gold subscriber
 
 
Posts: 952
Joined: Sat Mar 29, 2008 3:19 pm
Location: Norman OK

Re: Database performance help please

Postposted on Thu Oct 08, 2009 8:10 pm

Ok, having a remote database makes good sense for another aspect of the program, since this local user client will also connect to a remote server which will use the lightning data sent from multiple clients correlate strikes. From this data it will triangulate the strike location, then it will send back a bunch of events with the triangulated data. The main server will process and store this central data and spit it back as needed.

However you are not always guaranteed an internet connection when running this program (stormy weather after all), so it is nice to have a locally stored database as well so that when a connection is restored, data can later be sent for processing and refining the data stored on the server, or loaded for review.

It is a big chunk of effort to chew by myself for sure, but that's what I do when I don't watch TV.

Oh, on a stormy day, I may get 500 events a minute or more. Right now, things are dead. I am only measuring 15.3 a minute. These would be individual records, so an update may be 500 at once.

-Mark
liquidsquid
Minister of Gerbil Affairs
 
Posts: 2452
Joined: Wed May 29, 2002 10:49 am
Location: New York

Re: Database performance help please

Postposted on Thu Oct 08, 2009 10:10 pm

There are basically 2 ways of creating that "seed" database of yours:
1. Do a database export and import on install. You need to supply the data files with your installer but the command line to import is pre-coded and just needs to be run (usually a batch file).
2. Include your database creation script. Internet, SQL Server Management Studio Express can help you write those CREATE TABLE statements and stuff.

There is actually a 3rd way but I'm not sure if it is risky (or totally not work). AFAIK it involves "detaching" the .mdb files (yes SQL Server uses that extension too) from your own database, copying the those files over to the target machine, and then "attach" them to the new SQL Server instance.
Image
The Model M is not for the faint of heart. You either like them or hate them.

Gerbils unite! Fold for UnitedGerbilNation, team 2630.
Flying Fox
Gerbil God
 
Posts: 24584
Joined: Mon May 24, 2004 2:19 am

Re: Database performance help please

Postposted on Thu Oct 08, 2009 11:48 pm

Flying Fox wrote: AFAIK it involves "detaching" the .mdb files (yes SQL Server uses that extension too) from your own database, copying the those files over to the target machine, and then "attach" them to the new SQL Server instance.
Could you mean the '.mdf' file? (e.g., http://support.microsoft.com/kb/224071) I've never deployed a database with an automated installer. With respect to the last two ways, it might be tricky getting DB Admin privileges on a user's SQL Server instance computer -but I'm speculating.
wibeasley
Gerbil Elite
Gold subscriber
 
 
Posts: 952
Joined: Sat Mar 29, 2008 3:19 pm
Location: Norman OK

Re: Database performance help please

Postposted on Thu Oct 08, 2009 11:58 pm

liquidsquid wrote:However you are not always guaranteed an internet connection when running this program (stormy weather after all), so it is nice to have a locally stored database as well so that when a connection is restored, data can later be sent for processing and refining the data stored on the server, or loaded for review.
It sounds like you have a cool project. I've never used designed a system that had a DB for a temporary store before writing to a central db, but I read about a few of approaches when considering if it was worth the effort. Is it likely that the internet would go down, but not the power for the computers and sensors? If so, something like this maybe be helpful: http://code.msdn.microsoft.com/Release/ ... aseId=1200. There are more examples of this approach, but I'm not finding them.
wibeasley
Gerbil Elite
Gold subscriber
 
 
Posts: 952
Joined: Sat Mar 29, 2008 3:19 pm
Location: Norman OK


Return to Developer's Den

Who is online

Users browsing this forum: No registered users and 3 guests