SQL Selecting from very large datasets on non-indexed fields

Started by MButchers. 3 comments, last by MButchers 18 years, 10 months ago
Hi there,

I have a slight problem and wondered if anyone out there may have a solution other than the obvious one of adding an index. I have an SQL database and have to select a subset of data out of certain tables; the worst case is 4 million+ rows. The field I need to select on is a non-indexed datetime field, and the selection times out. The only solution I can think of (without modifying the database) is to use the ID field and do selects on the datetime field within a range of IDs, e.g.

SELECT [Fields] FROM [Table]
WHERE [IndexedField] BETWEEN N AND N + 1000
  AND [NonIndexedField] >= Value

Does anyone out there know of any other methods which would solve this problem without modifying the source database?

Thanks in advance,
Mark
Hmmm, you could select into a temporary table which is ordered by the field you're interested in and index that - but then you're still having to select and sort the data once up front. I think the only real solution is creating an index, unless you're certain the date/time values are stored in incremental order in the database (which it sounds like you aren't).
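For what it's worth, a minimal sketch of that temp-table idea in T-SQL; the table and column names (Orders, OrderID, OrderDate) are made up for illustration:

-- One expensive scan to copy just the columns needed into a temp table
SELECT OrderID, OrderDate
INTO #DateLookup
FROM Orders

-- Index the copy so date-range queries against it are cheap
CREATE INDEX IX_DateLookup_Date ON #DateLookup (OrderDate)

-- Subsequent range queries hit the new index instead of scanning the source
SELECT OrderID
FROM #DateLookup
WHERE OrderDate >= '20050101'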

Datetime fields are difficult to deal with; I just had to do some comparisons on a 150-million-record dataset.
You mean you're stuck querying a huge table on a field that isn't indexed and you don't want to build an index? Hmm... You could always look at what indexes you have available, and see if you can approximate your query somehow... but you might not get the accuracy you want.

What kind of application is this for? "relatively time-critical" or "overnight data-crunch that isn't actually time-critical, just my stupid thing keeps timing out"?

My other question: out of the 4 million rows, how many is this thing expected to return, on average?

One way to do it is close to what you're talking about with the indexed field: if you have an indexed field that's relatively uniformly distributed across the rows, then you can break the query down into ranges based on that index and do some local processing, as in the sketch below.
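Assuming the PK is a monotonically increasing integer ID, the chunked version might look something like this (table and column names are hypothetical):

DECLARE @id INT, @maxid INT, @cutoff DATETIME
SET @cutoff = '20050101'
SELECT @id = MIN(ID), @maxid = MAX(ID) FROM BigTable

WHILE @id <= @maxid
BEGIN
    -- Each iteration seeks on the indexed ID range, then applies the
    -- non-indexed datetime filter only within that small slice
    SELECT ID, SomeDate
    FROM BigTable
    WHERE ID BETWEEN @id AND @id + 999
      AND SomeDate >= @cutoff

    SET @id = @id + 1000
END

Note that each pass through the loop returns a separate result set, so the client has to stitch the chunks together; the upside is no single statement scans the whole table, so nothing times out.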
You could select the contents of the field you want, plus another field which is indexed (ideally the PK), into a table in another database (one that is writeable) on the same server.

You would then be able to do a join, provided you are using a database server which supports cross-catalogue joins - which I believe you are, since you're using MS SQL?

Of course, the SELECT INTO would take as long as, if not considerably longer than, the original select, but you'd only have to do it once; from then on you could use a join, which would be much faster. Something like the example below.
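A sketch of what that might look like, assuming a writeable ScratchDB alongside the read-only SourceDB (all names hypothetical):

-- One-off: copy the PK and the datetime into the writeable database
USE ScratchDB
SELECT OrderID, OrderDate
INTO dbo.DateLookup
FROM SourceDB.dbo.Orders

-- Index the copy on the datetime
CREATE INDEX IX_DateLookup_Date ON dbo.DateLookup (OrderDate)

-- From then on, filter on the indexed copy and join back for the rest
SELECT o.*
FROM dbo.DateLookup d
JOIN SourceDB.dbo.Orders o ON o.OrderID = d.OrderID
WHERE d.OrderDate >= '20050101'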

Mark
Many thanks for the suggestions guys,

MarkR, how do you mean I would only need to SELECT INTO once? Surely when the data in the field I'm selecting on changes, I'd need to push those changes into the other database, or the index would be incorrect. If that's the case, I'd still need a way to identify the records which have recently changed. An obvious workaround is to place a trigger on the source table which pushes the relevant information into the new table when a record changes (see the sketch below), but unfortunately I don't have the luxury of being able to do that!
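(For reference, the kind of trigger being ruled out here would be something along these lines; all names are hypothetical:)

CREATE TRIGGER trg_Orders_SyncLookup ON dbo.Orders
FOR INSERT, UPDATE
AS
BEGIN
    -- Remove stale copies of any rows just touched...
    DELETE d
    FROM ScratchDB.dbo.DateLookup d
    JOIN inserted i ON i.OrderID = d.OrderID

    -- ...and re-insert them with the current datetime values
    INSERT INTO ScratchDB.dbo.DateLookup (OrderID, OrderDate)
    SELECT OrderID, OrderDate
    FROM inserted
END

(A separate DELETE trigger would likewise need to remove rows from the lookup table via the deleted pseudo-table.)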

The database is indeed MS SQL Server 7.0, and the application is fairly 'time-critical'.

Cheers

Mark

