Sign in to follow this  

Speeding up mySQL Queries

This topic is 4337 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

A friend of mine and I are working for the first time together on a web project, a Stock Management prototype... Now, he argues with me that all tables in the database should have as Primary key an ID int field that auto-increments (whats the diference between Primary and Index keys? so lost...) I would never dare say I'm an SQL guru, but I allways thought your primary key should be the field that you search for the most. So, if you have a table that has the product ID (pID) and the product's name (pName), and you always search that table for pID, then pID should be your index field, right?

Share this post


Link to post
Share on other sites
Whoa whoa whoa...

Sometimes it's entirely inappropriate to have a key at all; it doesn't offer any extra information, and the only thing it can do is speed up a query by making the heap of information a bit more organized.

The difference between a Primary key and an Index key is simple; a Primary key is an Index key that is used first (and usually, most often). Using a primary key extremely speeds up queries where you've got very sequential data, but if you've got an associative table (a table that's entire purpose is to join the information in other tables effectively and supply a sort order), a primary key can be detrimental.

So yes, you're right. A primary key is the key you use the most, and should always be. And in this case, pID should definitely be the primary if you're doing more selects on it than on the name (honestly, I'd think you'd be using the pName search more, but I dunno, it's very specific to your setup).

If you want to know for sure, make the table twice, load both up with identical sets of bogus (or real if you've got it) information, and do an EXPLAIN on the query to each. It'll show you which query is faster (though parsing the info can be rough at first, Google it to learn how).

Share this post


Link to post
Share on other sites
Quote:
Original post by Arkantis
Right...... your product ID should be the int field that auto-increments is what he means.....


Right, well, in our case we don't define a Product's ID code, it already has one that we need to enter into the DB, but that's beside the point, what matters is that we search for the ID first, so that should be our primary key, I think.

What he tries to argue with me, and I don't agree, is that all tables should have an id auto-incrementable field as the primary key.

See this example. We got a table, it has two fields, an alfanumeric barcode and the product's name for that barcode, some example entries could be:
AA320 - Jeep Tire
AA320R - Jeep Tire for Rain

Now, I would put the barcode field as the primary key, thats what we'll be searching for, but he argues that the table should have as the primary key an id field, even though we won't be using that for anything at all.. yes, if later I want to link to a specific record in that table, I can link to that ID, but otherwise it doesnt make much sense to me...

Share this post


Link to post
Share on other sites
In the case of PK's - some tables have fields which "naturally" fit as primary keys... for example, in a "user" table (that stores users login names and passwords), the user_name field would make a good fit for a primary key, since obviously you want to force users to create unique usernames.

Other tables may not have any fields which fit to be primary keys... in this case, you can make an *_ID field which is auto-incremented.

Of course nothing is written in stone here - there are many different situations that call for different things. For example you can force uniqueness w/o having to make it a primary key. Or, maybe you want to set up relations, and to be more efficient you might put a *_ID field on your "user" table. Every situation calls for something different...

If you want to optimize searches, thats when you start setting up indices. Again, this is a case-by-case basis, and varies depending on the table's fields, its relations, what people are searching it based on, how often it is hit, etc...

Share this post


Link to post
Share on other sites
Quote:
Original post by Prozak

What he tries to argue with me, and I don't agree, is that all tables should have an id


And he's wrong. In this case, you don't need one. Put the primary key on the Product ID and be done.

edit: this kind of error is something people new to SQL often feel the need to do, especially people who haven't had good experience with joins. An Auto-incrementing primary key is only useful if you're creating a lot of unique data (such as call-logs or IP hits or something). If you've got a huge set of already well defined data, adding a primary integer key just adds 4 bytes to each field and makes the SQL server have to keep up with that information. With your data, you've already got a well defined key, and that's all you need.

Share this post


Link to post
Share on other sites
Quote:
Original post by Prozak
Quote:
Original post by Arkantis
Right...... your product ID should be the int field that auto-increments is what he means.....


Right, well, in our case we don't define a Product's ID code, it already has one that we need to enter into the DB, but that's beside the point, what matters is that we search for the ID first, so that should be our primary key, I think.

What he tries to argue with me, and I don't agree, is that all tables should have an id auto-incrementable field as the primary key.

See this example. We got a table, it has two fields, an alfanumeric barcode and the product's name for that barcode, some example entries could be:
AA320 - Jeep Tire
AA320R - Jeep Tire for Rain

Now, I would put the barcode field as the primary key, thats what we'll be searching for, but he argues that the table should have as the primary key an id field, even though we won't be using that for anything at all.. yes, if later I want to link to a specific record in that table, I can link to that ID, but otherwise it doesnt make much sense to me...


Your friend is wrong..... you should use the most appropriate field as the primary key...

Share this post


Link to post
Share on other sites
Quote:
Original post by Prozak
What he tries to argue with me, and I don't agree, is that all tables should have an id auto-incrementable field as the primary key.
Yes, this is a good sign of a person who sort of learned databases off tutorials and junk and really has very little idea what they're doing.

Primary keys go on tables that should have primary keys. They go on columns which are logical choices for primary keys (and sometimes that is a compound key of several columns). They are the data type that it most makes sense for them to be. In your given example, the bar code should be the primary key; I assume it's a varchar of some length, which is fine.

Share this post


Link to post
Share on other sites
All tables should have a primary key. This means that at least one column (or set of columns) should be unique.

This makes sense, as tables without a primary key can contain duplicate rows, which are no use to anyone.

Not every table needs to have an integer primary key which is auto incrementing, but it's often a good idea.

Normally I put an integer, auto incrementing PK on every table and call it "id".

However, there are some cases - notably cross-reference tables - where doing so doesn't help at all. Rather, there are two foreign keys which comprise the primary key for that table between them.

I should stress that all tables MUST have a primary key. Some RDBMSs require it, on all others it's merely an incredibly good idea.

Using username for a primary key on a users table is NOT a good idea, as this will break things if you ever need to rename a user.

A rule of thumb is that you should never use any "meaningful" data as a primary key - anything that the user can see or would have to record anywhere (or enter).

Updating a primary key is of course, usually a bad idea, as you'd have to update many foreign keys at the same time which reference it. If foreign key constraints are enabled, you'd have to do some horrible jiggery-pokery to achieve this.

Mark

Share this post


Link to post
Share on other sites
First off a PK doesn't have to have anything to do with indexing. All you're saying when you define a PK is that this field or fields will be the way to identify a single record. The query plan can use the PK like a unique index (it knows that it defines a unique record), but indexing is a whole other issue than intro table design. For example, what columns you define your clustered index (generally you can have 1 per table as it stores the actual table data in the order of the clustered index) on will matter more than anything you do with a PK.

Some data does have a naturally defined PK. A natural PK occurs when there are columns of data in the data you're storing that can be used to uniquely identify a record.

What your friend is talking about is a surrogate PK. After designing/working with/reading about dbs for years now I generally think that having a surrogate PK is a good idea, even when you have a natural PK. My reasoning follows:

1) No matter how well I think I know the data, it WILL change. If what you thought was a natural PK changes then you have to change all the tables that had foreign key relationships to your table. I'm not sure how much experience you've had with DBs, but fixing something like this is a nightmare :)

2) Someone mentioned the size issue up above, albeit incorrectly. If my natural PK is a a varchar product ID field or a combination of fields it ends up using more space by not having a surrogate auto-increment int field when doing foreign keys to other tables.

3) It makes queries easier by always giving you a single PK/FK pair to join tables on.

To answer your last question indexes aren't really keys. As mentioned a PK is just a unique way to identify a single record. It's then used by the query plan as a unique index (I don't know mysqls query engine that well so I'm guessing here).

Indexes are structures built external to the actual data table that store additional information for the query plan to use in order to retrieve the data. There are all sorts of indexes depending on which db you're using. When to add indexes and the implications of doing so are probably better left to another question :)

Share this post


Link to post
Share on other sites
Quote:
Original post by graveyard filla
In the case of PK's - some tables have fields which "naturally" fit as primary keys... for example, in a "user" table (that stores users login names and passwords), the user_name field would make a good fit for a primary key, since obviously you want to force users to create unique usernames.


Good example, but I'd add one corollary. A primary key should have two features:

1. It uniquely identifies the record. (This is required.)
2. It is static. (This is recommended.)

Your example works best if the user_name field never changes. If the user can change his name, the database will have to perform cascading updates in order to maintain data integrity; so in that case, it would make sense to have a separate auto-incrementing primary key, and a unique index on the user_name field.

Share this post


Link to post
Share on other sites
Yeah, you're right.... Like I said, every situation has different variables.....in a web app I did for work I used the username as the PK - because no one ever changed their username (In fact, this wasn't even possible by the software.) It doesn't even seem like something that would be too common in any software, anyway...... have you ever changed your username for anything?

Share this post


Link to post
Share on other sites
Quote:
Original post by graveyard filla
Yeah, you're right.... Like I said, every situation has different variables.....in a web app I did for work I used the username as the PK - because no one ever changed their username (In fact, this wasn't even possible by the software.) It doesn't even seem like something that would be too common in any software, anyway...... have you ever changed your username for anything?


Most web-based forums I've seen allow user names to be changed. The users might not have permission to do it themselves, but admins can. In your app, since it's not possible, it shouldn't be an issue. I just mentioned it as something to consider during database design.

Share this post


Link to post
Share on other sites
Having looked at the code for a number of web forums, most of them use a numeric ID as the primary key for their users table. They keep the usernames unique by doing a search through the table when creating a new user or changing the username and not allowing it if the search comes up with any existing rows.

Share this post


Link to post
Share on other sites
Quote:
Original post by graveyard filla
Yeah, you're right.... Like I said, every situation has different variables.....in a web app I did for work I used the username as the PK - because no one ever changed their username (In fact, this wasn't even possible by the software.) It doesn't even seem like something that would be too common in any software, anyway...... have you ever changed your username for anything?


It would've been more efficient storage wise to use a surrogate key even if the user name never changed. An int would give you 2.1B unique numbers in only 4 bytes of space. All of your foreign keys would also only be 4 bytes. A varchar would vary, but anyone with a user name >4 chars long would be less efficient spacewise than the int.

As far as performance, joining using the int datatype is more efficient performance wise than either char or varchar (at least on mssql). Also, b/c of the reason above, more data can possibly fit in your data pages that will again make a join more efficient timewise. An argument could be made that since you have the username in the other tables you won't have to join the user table to them at all, but then what about the other tables that you may need to join together such as roles, prefs, posts, whatever.

Very rarely in practice have I seen a natural PK workout better than a surrogate one in storage efficiency, speed, and maintainability.

Share this post


Link to post
Share on other sites
You're right..... however, in my case, it was done simply because it was easier and cleaner to do. The app was for a start up company with light load and a small database. It simplified queries, etc..... Unfortunately my boss was a cheap bastard so these are the kind of things I had to do [grin].

Share this post


Link to post
Share on other sites
Quote:
Original post by graveyard filla
Unfortunately my boss was a cheap bastard so these are the kind of things I had to do [grin].


Understood. I tell my current boss either we can do it right now or do it right later. No matter what we do it will eventually have to get done right. Only takes a couple examples before managers start to realize a little extra time up front can save so much time in the back end. This is even more true on the database side of things b/c once you have a structure built with live data and code built on top of it, it's so hard to change in a graceful manner.

Share this post


Link to post
Share on other sites

This topic is 4337 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this