Speeding up mySQL Queries

Started by
16 comments, last by graveyard filla 18 years, 2 months ago
A friend of mine and I are working for the first time together on a web project, a Stock Management prototype... Now, he argues with me that all tables in the database should have as Primary key an ID int field that auto-increments (whats the diference between Primary and Index keys? so lost...) I would never dare say I'm an SQL guru, but I allways thought your primary key should be the field that you search for the most. So, if you have a table that has the product ID (pID) and the product's name (pName), and you always search that table for pID, then pID should be your index field, right?
Advertisement
Right...... your product ID should be the int field that auto-increments is what he means.....

Whoa whoa whoa...

Sometimes it's entirely inappropriate to have a key at all; it doesn't offer any extra information, and the only thing it can do is speed up a query by making the heap of information a bit more organized.

The difference between a Primary key and an Index key is simple; a Primary key is an Index key that is used first (and usually, most often). Using a primary key extremely speeds up queries where you've got very sequential data, but if you've got an associative table (a table that's entire purpose is to join the information in other tables effectively and supply a sort order), a primary key can be detrimental.

So yes, you're right. A primary key is the key you use the most, and should always be. And in this case, pID should definitely be the primary if you're doing more selects on it than on the name (honestly, I'd think you'd be using the pName search more, but I dunno, it's very specific to your setup).

If you want to know for sure, make the table twice, load both up with identical sets of bogus (or real if you've got it) information, and do an EXPLAIN on the query to each. It'll show you which query is faster (though parsing the info can be rough at first, Google it to learn how).
Quote:Original post by Arkantis
Right...... your product ID should be the int field that auto-increments is what he means.....


Right, well, in our case we don't define a Product's ID code, it already has one that we need to enter into the DB, but that's beside the point, what matters is that we search for the ID first, so that should be our primary key, I think.

What he tries to argue with me, and I don't agree, is that all tables should have an id auto-incrementable field as the primary key.

See this example. We got a table, it has two fields, an alfanumeric barcode and the product's name for that barcode, some example entries could be:
AA320 - Jeep Tire
AA320R - Jeep Tire for Rain

Now, I would put the barcode field as the primary key, thats what we'll be searching for, but he argues that the table should have as the primary key an id field, even though we won't be using that for anything at all.. yes, if later I want to link to a specific record in that table, I can link to that ID, but otherwise it doesnt make much sense to me...
In the case of PK's - some tables have fields which "naturally" fit as primary keys... for example, in a "user" table (that stores users login names and passwords), the user_name field would make a good fit for a primary key, since obviously you want to force users to create unique usernames.

Other tables may not have any fields which fit to be primary keys... in this case, you can make an *_ID field which is auto-incremented.

Of course nothing is written in stone here - there are many different situations that call for different things. For example you can force uniqueness w/o having to make it a primary key. Or, maybe you want to set up relations, and to be more efficient you might put a *_ID field on your "user" table. Every situation calls for something different...

If you want to optimize searches, thats when you start setting up indices. Again, this is a case-by-case basis, and varies depending on the table's fields, its relations, what people are searching it based on, how often it is hit, etc...
FTA, my 2D futuristic action MMORPG
Quote:Original post by Prozak

What he tries to argue with me, and I don't agree, is that all tables should have an id


And he's wrong. In this case, you don't need one. Put the primary key on the Product ID and be done.

edit: this kind of error is something people new to SQL often feel the need to do, especially people who haven't had good experience with joins. An Auto-incrementing primary key is only useful if you're creating a lot of unique data (such as call-logs or IP hits or something). If you've got a huge set of already well defined data, adding a primary integer key just adds 4 bytes to each field and makes the SQL server have to keep up with that information. With your data, you've already got a well defined key, and that's all you need.
Quote:Original post by Prozak
Quote:Original post by Arkantis
Right...... your product ID should be the int field that auto-increments is what he means.....


Right, well, in our case we don't define a Product's ID code, it already has one that we need to enter into the DB, but that's beside the point, what matters is that we search for the ID first, so that should be our primary key, I think.

What he tries to argue with me, and I don't agree, is that all tables should have an id auto-incrementable field as the primary key.

See this example. We got a table, it has two fields, an alfanumeric barcode and the product's name for that barcode, some example entries could be:
AA320 - Jeep Tire
AA320R - Jeep Tire for Rain

Now, I would put the barcode field as the primary key, thats what we'll be searching for, but he argues that the table should have as the primary key an id field, even though we won't be using that for anything at all.. yes, if later I want to link to a specific record in that table, I can link to that ID, but otherwise it doesnt make much sense to me...


Your friend is wrong..... you should use the most appropriate field as the primary key...
FTA, my 2D futuristic action MMORPG
Quote:Original post by Prozak
What he tries to argue with me, and I don't agree, is that all tables should have an id auto-incrementable field as the primary key.
Yes, this is a good sign of a person who sort of learned databases off tutorials and junk and really has very little idea what they're doing.

Primary keys go on tables that should have primary keys. They go on columns which are logical choices for primary keys (and sometimes that is a compound key of several columns). They are the data type that it most makes sense for them to be. In your given example, the bar code should be the primary key; I assume it's a varchar of some length, which is fine.
SlimDX | Ventspace Blog | Twitter | Diverse teams make better games. I am currently hiring capable C++ engine developers in Baltimore, MD.
All tables should have a primary key. This means that at least one column (or set of columns) should be unique.

This makes sense, as tables without a primary key can contain duplicate rows, which are no use to anyone.

Not every table needs to have an integer primary key which is auto incrementing, but it's often a good idea.

Normally I put an integer, auto incrementing PK on every table and call it "id".

However, there are some cases - notably cross-reference tables - where doing so doesn't help at all. Rather, there are two foreign keys which comprise the primary key for that table between them.

I should stress that all tables MUST have a primary key. Some RDBMSs require it, on all others it's merely an incredibly good idea.

Using username for a primary key on a users table is NOT a good idea, as this will break things if you ever need to rename a user.

A rule of thumb is that you should never use any "meaningful" data as a primary key - anything that the user can see or would have to record anywhere (or enter).

Updating a primary key is of course, usually a bad idea, as you'd have to update many foreign keys at the same time which reference it. If foreign key constraints are enabled, you'd have to do some horrible jiggery-pokery to achieve this.

Mark
First off a PK doesn't have to have anything to do with indexing. All you're saying when you define a PK is that this field or fields will be the way to identify a single record. The query plan can use the PK like a unique index (it knows that it defines a unique record), but indexing is a whole other issue than intro table design. For example, what columns you define your clustered index (generally you can have 1 per table as it stores the actual table data in the order of the clustered index) on will matter more than anything you do with a PK.

Some data does have a naturally defined PK. A natural PK occurs when there are columns of data in the data you're storing that can be used to uniquely identify a record.

What your friend is talking about is a surrogate PK. After designing/working with/reading about dbs for years now I generally think that having a surrogate PK is a good idea, even when you have a natural PK. My reasoning follows:

1) No matter how well I think I know the data, it WILL change. If what you thought was a natural PK changes then you have to change all the tables that had foreign key relationships to your table. I'm not sure how much experience you've had with DBs, but fixing something like this is a nightmare :)

2) Someone mentioned the size issue up above, albeit incorrectly. If my natural PK is a a varchar product ID field or a combination of fields it ends up using more space by not having a surrogate auto-increment int field when doing foreign keys to other tables.

3) It makes queries easier by always giving you a single PK/FK pair to join tables on.

To answer your last question indexes aren't really keys. As mentioned a PK is just a unique way to identify a single record. It's then used by the query plan as a unique index (I don't know mysqls query engine that well so I'm guessing here).

Indexes are structures built external to the actual data table that store additional information for the query plan to use in order to retrieve the data. There are all sorts of indexes depending on which db you're using. When to add indexes and the implications of doing so are probably better left to another question :)

This topic is closed to new replies.

Advertisement