SQL Selecting from very large datasets on non-indexed fields

Started by MButchers. 3 comments, last by MButchers 18 years, 10 months ago
Hi there,

I have a slight problem and wondered if anyone out there may have a solution other than the obvious one of adding an index. I have an SQL database and have to select a subset of data out of certain tables; the worst case is 4 million+ rows. The field I need to select on is a non-indexed datetime field, and the selection times out. The only solution I can think of (without modifying the database) is to use the ID field and do selects on the datetime field within a range of IDs, e.g.

SELECT [Fields] FROM [Table]
WHERE [IndexedField] BETWEEN N AND N + 1000
  AND [NonIndexedField] >= Value

Does anyone out there know of any other methods which would solve this problem without modifying the source database?

Thanks in advance,
Mark
Hmmm, you could select into a temporary table which is ordered by the field you're interested in and index that - but then you're still having to select and sort the data once up front. I think the only real solution is creating an index, unless you're certain the date/time values are stored in incremental order in the database (which it sounds like you aren't).
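For what it's worth, a minimal sketch of that temp-table idea in T-SQL; the table and column names (Orders, OrderID, OrderDate) are made up for illustration:

-- One expensive scan to copy just the columns needed into a temp table
SELECT OrderID, OrderDate
INTO #DateLookup
FROM Orders

-- Index the copy so date-range queries against it are cheap
CREATE INDEX IX_DateLookup_Date ON #DateLookup (OrderDate)

-- Subsequent range queries hit the new index instead of scanning the source
SELECT OrderID
FROM #DateLookup
WHERE OrderDate >= '20050101'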

Datetime fields are difficult to deal with; I just had to do some comparisons on a 150-million-record dataset.
You mean you're stuck querying a huge table on a field that isn't indexed and you don't want to build an index? Hmm... You could always look at what indexes you have available, and see if you can approximate your query somehow... but you might not get the accuracy you want.

What kind of application is this for? "relatively time-critical" or "overnight data-crunch that isn't actually time-critical, just my stupid thing keeps timing out"?

My other question: out of the 4 million rows, how many is this thing expected to return, on average?

One way to do it is close to what you're talking about with the indexed field: if you have an indexed field that's relatively uniformly distributed across the rows, then you can break the query down into ranges based on that index and do some local processing, as in the sketch below.
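Assuming the PK is a monotonically increasing integer ID, the chunked version might look something like this (table and column names are hypothetical):

DECLARE @id INT, @maxid INT, @cutoff DATETIME
SET @cutoff = '20050101'
SELECT @id = MIN(ID), @maxid = MAX(ID) FROM BigTable

WHILE @id <= @maxid
BEGIN
    -- Each iteration seeks on the indexed ID range, then applies the
    -- non-indexed datetime filter only within that small slice
    SELECT ID, SomeDate
    FROM BigTable
    WHERE ID BETWEEN @id AND @id + 999
      AND SomeDate >= @cutoff

    SET @id = @id + 1000
END

Note that each pass through the loop returns a separate result set, so the client has to stitch the chunks together; the upside is no single statement scans the whole table, so nothing times out.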
You could select the contents of the field you want, plus another field which is indexed (ideally the PK), into a table in another database (one that is writeable) on the same server.

You would then be able to do a join, provided you are using a database server which supports cross-catalogue joins - which I believe you are, since you're using MS SQL?

Of course, the SELECT INTO would take as long as, if not considerably longer than, the original select, but you'd only have to do it once; from then on you could use a join, which would be much faster. Something like the example below.
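A sketch of what that might look like, assuming a writeable ScratchDB alongside the read-only SourceDB (all names hypothetical):

-- One-off: copy the PK and the datetime into the writeable database
USE ScratchDB
SELECT OrderID, OrderDate
INTO dbo.DateLookup
FROM SourceDB.dbo.Orders

-- Index the copy on the datetime
CREATE INDEX IX_DateLookup_Date ON dbo.DateLookup (OrderDate)

-- From then on, filter on the indexed copy and join back for the rest
SELECT o.*
FROM dbo.DateLookup d
JOIN SourceDB.dbo.Orders o ON o.OrderID = d.OrderID
WHERE d.OrderDate >= '20050101'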

Mark
Many thanks for the suggestions guys,

MarkR, how do you mean I would only need to SELECT INTO once? Surely when the data in the field I'm selecting on changes, I'd need to push those changes into the other database, or the index would be incorrect. If that's the case, I'd still need a way to identify the records which have recently changed. An obvious workaround is to place a trigger on the source table which pushes the relevant information into the new table when a record changes (see the sketch below), but unfortunately I don't have the luxury of being able to do that!
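(For reference, the kind of trigger being ruled out here would be something along these lines; all names are hypothetical:)

CREATE TRIGGER trg_Orders_SyncLookup ON dbo.Orders
FOR INSERT, UPDATE
AS
BEGIN
    -- Remove stale copies of any rows just touched...
    DELETE d
    FROM ScratchDB.dbo.DateLookup d
    JOIN inserted i ON i.OrderID = d.OrderID

    -- ...and re-insert them with the current datetime values
    INSERT INTO ScratchDB.dbo.DateLookup (OrderID, OrderDate)
    SELECT OrderID, OrderDate
    FROM inserted
END

(A separate DELETE trigger would likewise need to remove rows from the lookup table via the deleted pseudo-table.)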

The database is indeed MS SQL Server 7.0, and the application is fairly 'time-critical'.

Cheers

Mark

