TR Forums

wibeasley · Mon Jan 12, 2009 5:17 pm

I'm a statistics grad student who programs a lot of simulations. One of my simulations spans several computers (due to licensing contrainsts). The stat results for the current simulation consumes 4GB on SqlServer 2005. The performance of the current schema was fine when I was developing it on an empty database. But now that the project is almost completed, the deletes are taking a very long time. Deleting one tblBatch row used to finish in less than a second; now it's over 20 seconds.

I'm asking this question because I want to design my next simulation database better. I don't think I'll change the current schema because rows are only deleted if they were created by a unit test; and I can just increase the TimeOut value for the tests. The performance of the reads and writes is still fine.

From a database perspective, would it have been better to do the many-to-many relationship more conventionally? I'm thinking tblModelSingle should have been a junction table by making:
-one tblBatch row to many tblChunk rows,
-one tblChunk row to many tblModelSingle rows,
-one tblDataFile row to many tblModelSingle rows,
-one tblModelChunk row to many tblModelSingle rows, and
-one tblModelSingle row to one tblModelSpec row.

When I first started this project 18 months ago, I avoided doing this conventional way because my C# program would have had to be more chatty with the database. But now I'm guessing that would have been an acceptable tradeoff.

In addition to quicker deletes, are there likely to be other benefits if I had used the conventional approach.

Currently, one tblBatch row has 16 tblChunk rows. One tblChunk row has 100 tblDataFile rows and 243 tblModelChunkRows --and thus 24,300 tblModelSingle and 24,300 tblModelSpec rows. (One reason for the complication is that the expensive stat software estimates one model for one dataset in 10sec, but estimates one model for 100 datasets in 20sec).

Additional Details
-My database server is on P4 with 3GB of ram, but task manager shows that the PF usage never exceeds 2GB.
-SqlServer 2005 Standard and Windows Server 2003 32-bit.
-Two 7,200 Barracudas with plenty of space. (One drive is for the OS and the log file, the other drive is for the MDF)
-All the indices in all the tables have been rebuilt/defragged.
-The physical MDF file doesn't need to be defragged. It's 4GB in only 3 fragments.

If you see other weaknesses, I'm open to any criticisms. If you're still reading at this point in the post, Thanks.

Tue Jan 13, 2009 9:10 am

I am no expert in DB schema design, but I see 2 ways to get from the tblBatch to tblModelSingle. May be that is why the DB is having a harder time?

How would you mean by "conventional"? Denormalize this? That's not "conventional", no?

wibeasley · Tue Jan 13, 2009 11:30 am

This is the schema that I was trying to verbally describe. Is it not the conventional way to do it? (I'm really asking. I don't know.)

lugnplu · Mon Feb 09, 2009 1:28 am

How are you deleting? (giving the tsql would be key here)
What are the indexes for those tables?

Just seems odd that the delete is getting slower... and said indexes dont have frags.

wibeasley · Tue Feb 10, 2009 12:56 pm

Thanks for asking. There are two ways I delete (both are slow).

--Deleting an incomplete batch (GKey is a GUID):
DELETE from tblBatch WHERE GKey = '40E659FE-3C72-4D7A-B035-F11E6E2816EF'
--Deleting results from unit and integration tests:
DELETE from tblBatch WHERE UnitTest=1

Over the weekend I made small change to the schema which improved the delete time. I switched the delete rule between tblModelChunk and tblModelSingle to cascade, while the relationship between tblDataFile and tblModelSingle is now 'no action'. All the other relationships in the schema have always been cascade.

When deleting 35 test rows in tblBatch, the time decreased from 210sec to 30. This may be because there are more tblModelChunkRows per Chunk than tblDataFileRows (243 vs 100).

By the way since my last post, I've upgraded my database server to a E6400 with 8GB 800MHz ram (and a 64-bit OS). The main HDD is a Caviar black, while the Sql Server log and the mdf files are written to separate Velociraptors (no raid). I got a small grant for my dissertation, which is a different project than this. I couldn't bear to let the hardware sit idle for the next few weeks while I'm developing that code.

wibeasley · Tue Feb 10, 2009 1:16 pm

Here are the indices by table. A column named 'GKey' is a GUID; a column named 'ID' is a 32 or 64 bit integer.
tblBatch - primary key on GKey
tblChunk - primary key on ID, unique index on the combintation of BatchKey, DIFCondition, and GroupSubjectCount
tblDataFile - primary key on ID, unique index on the combination of ChunkID and ChunkReplicationIndex
tblModelChunk - primary key on GKey
tblModelSingle - primary key on ID, unique index on the combination of DataFileID and ModelChunkKey
tblModelSpec- primary key on ModelSingleID

I'd appreciate any advice -related or not to my initial question. I may be wrong, but I thnk that this is pretty good hardware for a small project. Therefore I think the weakness belongs to the schema I created.

TR Forums

Many-to-many Database Schema

Many-to-many Database Schema

Re: Many-to-many Database Schema

Re: Many-to-many Database Schema

Re: Many-to-many Database Schema

Re: Many-to-many Database Schema

Re: Many-to-many Database Schema

Who is online