About the article
Learning the basics of SQL shouldn't take more than a very short while to get started. If you're used to programming and some of the theory behind data storage this tutorial will be easily understood.
However, I'm more than eager to help everybody learn so I will try to make the examples easy. This is an article for beginners; you don't need to know much to get started.
This article does not show you examples in languages like PHP or ASP. It only introduces you to the powerful language of SQL.
Introduction to SQL
SQL (Structured Query Language), pronounced "sequel" (from the former name SEQL, Structured English Query Language) is the language generally used to speak to any kind of database out there. Examples of databases are mySql (my favourite, it's free, it's fast, it's usable), Microsoft Access, Microsoft SQL Server, Oracle or literally any alternative in existence. ODBC is a driver that makes (or made, it's pretty outdated) it easier to connect to the database, but nowadays it's more common to connect directly to the database since that's generally faster.
SQL has been out there for many years and a decent SQL administrator can easily earn some serious money if he/she knows enough about it. Optimizing SQL queries is an easy way to secure your job future, since most sites want to have faster web pages (duh!). However, becoming an SQL expert takes years in the field and I wouldn't even consider myself to be an expert.
Databases are becoming more relevant to all fields of programming. Modern games connect to a global database (perhaps via a web interface) and get data for high scores, member access (ever played/watched EverQuest? A lot of database storage there my friends), etc. Also on the application front databases are becoming more and more popular on the users' edge. For instance budgeting applications, but also any other kind of application where you'd want a storage solution that is expandable, searchable and fast using as little memory as possible.
Creating tables and getting started
Most database today come with a user interface which is like any windows/kde/gnome application out there, making it easy to watch, add, edit and remove tables in the database. Tables are the "groups" in the database where information is stored. Normally you start off with a name for the database (most software allows you to have several databases within the product). Call your database the name of your webpage for instance, to make it easier to relate the database to the software that is using it.
The tables in the database are containers of data. For instance, for a site like GameDev.net, the names of the tables might be "Member", "News" and "Article".
You can add tables through SQL queries but personally I think it is more trouble than its worth, so use your favourite GUI to try it out. If you are on Windows I really recommend you try Microsoft Access just because it's so easy to get started with. If you are on Linux/BSD I would suggest you to download one of the graphical interfaces available for mySql.
Inside the tables there are entities. Just as in a programming language, entities are like variables. There are different types, and the types are named differently between different databases but I'll try to describe what they should be and you can try to find the equivalent in the software you choose to use. They also have a name, and of course the data which is stored in them. So, to sum up, Name, Type and Data.
When you design your tables you only need to enter the Name and the Type, data will be added later. For now, we will do a simple news system for our favourite homepage. I'll try to be as brief as I can; please read carefully and try to understand each part I'm talking about.
Let's take a look at the following graphics for the tables and try to implement it in your database; I'm explaining each fuzziness under the image.
The member table
As we would like to store the member's name, password, email and URL to a web page (if the user has such), we need to store just that in the database.
First of all, the ID field hits you. Is this auto generated? Well, in a sense it is, in another it isn't. You need to create it yourself, but you should make sure you find the right data type for it. In Access you could choose "Autonumber", and while you're at it, make sure it runs with "no duplicates" and check that "required" is also set to yes.
The ID field you just created will need to be a primary key. I showed this with an empty circle in front of the name. There is only one primary key per table, and personally I think it should ALWAYS be a separate field called ID. The big thing about a primary key is that it has to be unique. You might think "yeah, but the email address is unique, can't I use that?". Well, you can. But I wouldn't recommend it. First of all, the email address is a text string containing up to 50 characters in our example. Making the database need to connect tables through textfields is not only slow, but a fairly risky process since the foreign keys (I'll talk about them in a moment) will need to be more complex. Second, it will also require a lot more space due to these foreign keys. Back to them later.
By the way, you can set any entity of the table to required and "allow zero length" to true, etc. This is up to you - how much you want the database to control for you. Personally, I do this check in the WebPages instead, not by checking error values from the database. I guess this is a matter of taste and also a matter of how you are using databases. Try different methods out and use whatever works for you.
The memberName, memberPassword, memberEmail and memberWeb are named using a "member" prefix, because when you do complex queries, it's nice to know which table uses which entities. The ID's are an exception to this, if I want to know which table they belong I will have to write [Table].id to get it. More about that later.
Each of these entities is a text field with a maximum length set to 50. This does not only restrict users from entering names that are too long, but also saves storage (in theory yes, in practical database implementations a string never takes up more than length(data_text)+1 so to speak, but you don't need to bother about this, it's good to set a limit anyhow). I guess the database also figures out a good strategy for saving the data intelligently according to the limits of each entity too so setting everything up is quite a good idea.
That's basically the member table. A good idea to speed up databases is to put the entity field "index" to true on such entities that are likely to be used in comparisons with other tables in the database. Primary and foreign keys are especially targeted.
The news table
Here we also have an ID field just as with the member table. Having the same name and settings on these fields in all tables simplifies things since we never have to check the database or write down what the settings are on each of these tables.
Second you will see a field called memberID, which is our first foreign key (open up the champagne ladies and gentlemen)! This is a reference (like a link, if that makes you more comfortable) to the member table. In our case it's a reference to the member table because each news item also has an author. Now, why didn't we call this authorID, memberAuthorID or something similar? Well, I don't know. Once again, you have to make up your own mind here, I've just stuck with calling references to other tables [Tablename]ID. Of course, there can be several links to the member table from other tables. For instance all of the following could be members of an article table: memberCreatorID, memberAuthorID, memberAuthorizationID etc, for the one who created the article item in the article table, one who is actually responsible for the contents of the article, and one that verifies the article for posting on the web. When I write memberID, I'm referring to the one who created the news item, author or not.
The newsSubject entity is the same type as the member specific entities, only this entity can be 255 characters long.
The newsBody item is a new one on the heap. The memo type is from Access and simply describes the type for a whole lot of text. I believe varchar is used for text in mySql mySql and memo in Microsoft SQL Server, but I think you get the idea. This is storage that in practical use doesn't have any limits. Great for storing bodies in news items for sure!
Last but not least is the newsDate. This datafield contains the date when the table row (data is stored in "rows", entities can be seen as "columns") was created. It's a good idea to automate this feature so you don't have to enter the date yourself in your SQL queries. This is how you do it (one more time in Access - forgive me, but it's easy to start off in Access!) In the "Default" field of the entity newsDate, enter "Now()". Now() is a method that gets the date. So when the row is created, instead of entering the date through the query, it will set the date automatically. Sweet!
Now the tables are done. You could try to enter some data through the graphical interface, but please remove them when you're done so we can continue with the article without confusion. Btw, note how nicely the dates are generated in the news table.
In most databases, there is something called relations between tables. This is so you can set up the foreign keys to point at the primary keys of the targeted table. This is a safety issue. While we are just messing around I don't think we want to set these, but if you're building a serious web site you should consider using this. It simply restricts you from removing the user "George Smithers" if he has written news items. In that case, you will simply be forced to first remove all the news items he has written, and then, after that is completed, remove the user himself. This stops the possibility of removing a user and having unreferenced material in the database (trash so to speak). As I said, when messing around, it doesn't matter what you do, but when you gain experience, you should really look this up because it will save you some database space in the long run, trust me.
Now when we're done setting up the tables (we should be up to 60 minutes by now, damn me, hehe).
INSERT something into the database
As the heading suggests revealed, when inserting (adding) something to the database, we're likely to use the INSERT command through SQL. Remember, a query is only a string command, you have to set up the database connection and objects in the code for yourself, but I'll post two examples.
This is the basic syntax for inserting a member into the database through a single SQL query:
INSERT INTO member(memberName, memberPassword, memberEmail, memberWeb) VALUES('Albert Sandberg', 'secret', 'firstname.lastname@example.org', 'www.thec.org') (All the text in above example should be on one row, I just cut it down for nicer page layout)
It's pretty self explanatory: first we have the command, INSERT, which tells the database we're about to add something, next is the INTO statement followed up by the table name which we are supposed to store the data in. Inside the parentheses we just tell the database which entities we want to store, and inside the VALUES comes the data, in the same order as specified inside the first set of parentheses. No problem? Didn't think so!
It's really that simple. The hardest part is to program a web page that contains forms and links, corrections for incorrectly entered data, etc. The SQL should be the easy part when you've gotten used to it!
Here's an example of entering some data into the news table:
INSERT INTO news(memberID, newsSubject, newsBody) VALUES(1, 'I know something about you.', 'You're really impressed by SQL too!') Right here, we're using some of the magic of SQL. First, we have the memberID field, now referring to the first entry of the member table, which is 1. The member with id=1 in the member table is Mr "Albert Sandberg" since we've just entered him above. Note: If you have entered some test info and then deleted it when you created your tables, the first id will be the the one after the last one you entered in the test, because, as we discussed before, the id field should contain NO DUPLICATES! This is a feature you know!
The date is covered by the database. We never need to manually set it through the query, since the entities we're not entering will be filled in with their default values. If we hadn't entered Now() for the newsDate field, it would probably have been an empty field instead.
UPDATE changed data
Once again, you might have an idea of what is going down. The UPDATE command changes things in the database. To move on forward let's skip the talk and get down to business:
UPDATE member set memberPassword='w?nderbaum' where memberName='Albert Sandberg' This should be fairly self-explanatory. The above will work fine if there is only one Albert Sandberg in the database, but since there is one more guy with that name which I know of, chances are we might some day enter the same website that you have coded, and I wouldn't like to not be able to log in when the other guy changes his password. You see, the where statement in the sql query returns ALL rows which match the execution argument. So if there had been two Albert Sandberg's, both passwords would have been changed. So we need to write something smarter, like:
UPDATE member set memberPassword='w?nderbaum' where id=1 Now there's no doubt. Since you keep my login in a cookie on the web server (or session variable, still a cookie though) you will know my id, and when I choose to change my password, you will enter it at the end of the line. In practice you never use static foreign/primary keys, as they are always dynamic.
SELECT (view) our data
All right, you want to present your news on the first page of your web site. Well, wait no further!
SELECT * FROM news That's the basics. SELECT just picks every row in the database and since we don't have a WHERE statement on this one, it will simply select every row in the table.
That's not as interesting though, since there could be 200 news items when we just want to see 20 of them, and perhaps in reversed order, or at least ordered by date (note, SQL has the power to choose in which order the items will be shown if we don't pass directions, so look ahead):
SELECT TOP 20 * FROM news ORDER BY newsDate DESC Now we're talking. The "TOP 20" tells the database we want the 20 first occurrences of news items, sorted by newsDate! DESC just tells the order to be reversed. With dates, that means newest first, oldest last. In some databases you would use the LIMIT command instead of TOP, but TOP is used in Access. Try to read the database documentation for further hints and usage descriptions.
All right, I have one more thing to show you before I move on to the next command. It's called joins and it is a little bit tricky at first, but I will try to make this as clear as I possibly can. Think about this: when you have SELECTed each news item, how do you fetch the member's name and id?
- The bad way. You take each row of the returned news data and create a second query to the database where you do: SELECT * from member where id=1 (of course you know the id=1 from the news.memberID field, right?). Anyway, this is not the correct way of doing it. When you are forced to make a second SQL query for fetching information you know belongs to the first query to start with, you know you need to update your SQL skills. This will make a lot more sense after looking at the next option.
- You do it the good way, like this: SELECT TOP 20 news.*, member.memberName, member.id FROM news,member WHERE news.memberID=member.id ORDER BY newsDate DESC Felt like punch in the belly? No? Great, open up the champagne (or just fill up your glass). Here we do a join between two tables, plus we use the basics we already used. We want the TOP 20 results, we want to SELECT the info FROM both the news and the member table (which we select by using [table].entity; * is a wildcard, telling the database that we want everything), we tell the query compiler we only want entities WHERE news.memberID matches member.id (this one actually tells the database to bind up two tables into one result, but let's just stay there for a while) and we want to ORDER it by the newsDate, but in reverse order, that is, DESCending.
All right, let's face it. Albert Sandberg is the most boring guy on the planet, so you want to get rid of him and destroy all evidence of his existence. You will have to start with removing all rows in the database where the id matches Albert Sandberg's, so you do the following:
DELETE * FROM news where memberID=1 That's how you remove the row matching the where clause in Microsoft Access. In any decent database you do it without the wildcard (why would you try to delete 50% of the entities anyway?). To remove the user from the member table in, for example, a mySql database, you'd use the following:
DELETE FROM member where id=1 Done. Albert Sandberg ceases to exist. Finally, the hero knows his powers and is ready to save the world! Hurray!
How do I use this knowledge on the web?
You visit www.php.net or www.aspin.com and look up some decent tutorials on the language you want to use. Search terms such as "connect to database" would be a good start. Now that you know a little more about SQL the examples should be easy as 1,2,3.
How do I search for news newer than [a date]?
This one gave me a hard time when I was a beginner. The answer for returning all news post OLDER than 2003-01-01 is:
SELECT * from news where newsDate<'#2003-01-01#' ORDER BY newsDate DESC It's the ## which does the magic. < means "less than" and > means "greater than". Figure it out!
How do I search a field for a given string?
This is good if you want to do a little search function for your forums or news perhaps. Use the LIKE statement as follows:
SELECT * from news WHERE newsBody LIKE "%futurama%" The % signs are wildcards telling the database that there can be other text both in front of and after the search string. This query will return all news where the body contains the word futurama.
How do I count news items?
This is also a very common question, and a nice feature to add for that little extra on your page. Try the following code to count the number of news posts Albert Sandberg has posted (before it was all deleted by the hero):
SELECT COUNT(*) from news where memberID=1 How do I figure out the average age of my members?
Let's say you have a entity of the table member which is called memberAge, where you store the age of the member visiting your site:
SELECT AVG(memberAge) AS age from member Here I showcased the AS command too. This way you can name your results. This comes in handy when you select multiple (otherwise) nameless variables. In the following examples, utilizing some other nice functions in SQL, using no naming simply wouldn't be appropriate (it would still work though):
SELECT AVG(memberAge) AS average, MIN(memberAge) AS youngest, MAX(memberAge) AS oldest FROM member That's it!
This has been one crazy night for me. My hands hurt from all the typing and I would really appreciate if you told me if you liked the article or have anything to add about it. My email at [email="email@example.com"]firstname.lastname@example.org[/email] is always open for your comments and I'd love to hear from you. I hope this article has given you something to build on and that it has been some pleasant reading.
SQL is really a powerful tool to know, and the more you know, the more fun it will be. There are many aspects of SQL which I haven't shown here. Perhaps I'll write another article some time, only time can tell.
Thank you for your time.
Albert "thec" Sandberg
About the author
You're going to be surprised. Although I love electronics and know a lot of programming in many senses and have been studying programming at the university for a while, I'm a tile layer. That's like a bricklayer only that I just do the tiles part. That is, I do bathrooms, kitchens, floors and balconies and such. Please check my homepage www.thec.org for some pictures and extended information about myself.
During a couple of years in the industry and lots of home programming I mostly program/manage hobby sites and such, as well as doing some C++ code when I lose my mind... sorry, I mean am in the right mood for it!
Other than computers, I love cars and snowboarding, so that's basically my real life activities.