[web] SQL problem solving

Started by
7 comments, last by cyn 18 years, 10 months ago
I believe I've bitched about it before, but I don't think I can go much longer having to create a situation specific solution (that comes way too close to having holes and problems if the right is used) everytime I come up against this problem. For reference this is under SQL Server 2000. The problem is simple. You want to list the TOP 1 item from a list based on a specified sorting order. However you don't just want one item, you want to take a table/view and find the first item in the sorting order for each item in a group. Lets take a hypothetical example. We have a table for employees: CREATE TABLE Employees(Name varchar(100), Age integer, Salary integer, DepartmentID integer); As you can see, for each employee we know their age and their department and how much they make. What we want to do is find the oldest employee in each department, how old they are, and how much they make. If there is a tie, then we want to break it by the order of their name. For example to find the oldest in one department the query would look like this: SELECT * FROM Employees WHERE DepartmentID=# ORDER BY Age DESC, Name However we don't want one department at a time. We want to be able to list the oldest employee in each department in one query. Something along the lines of this (note, the FIRST aggregate function only exists in JET, and you can't make user defined aggregate functions unless you go to something like SQL Server 2005 or Oracle 9. Even in JET I'm not sure if this is valid or not as I'm not a pro-Access programmer). SELECT FIRST(Name), FIRST(age), FIRST(Salary), DepartmentID FROM Employees GROUP BY DepartmentID ORDER BY Age DESC, Name There is a big rate up if anyone can find a general purpose, robust solution to this problem. I'm also rating up to anyone with insightful comments or sucking up.
Advertisement
You can do it with a correlated subquery, but that's horribly inefficient.

I'm thinking that there might be a better way of doing it with a clever outer join, joining the employee table with itself (possibly with group by).

SELECT employee.id, employee.name, employee.age FROM employee e where age = (SELECT MAX(age) FROM employee oldest WHERE oldest.dept = e.dept)


Mark
Hmm ok, this does it a bit better:

SELECT e.id, e.name, e.dept, e.age, count(older.id) FROMEMPLOYEE eLEFT OUTER JOIN employee older ON e.dept = older.dept AND e.age < older.ageGROUP BY e.id, e.name, e.dept, e.ageHAVING count(older.id) = 0


Effectively we're joining everyone with the group of rows of people in their own department who are older than then, and counting them with the count aggregate function.

Then using HAVING, we select only those where the count is 0, who must therefore be the oldest people in the department.

Mark
Quote:Original post by markr
You can do it with a correlated subquery, but that's horribly inefficient.

I'm thinking that there might be a better way of doing it with a clever outer join, joining the employee table with itself (possibly with group by).

SELECT employee.id, employee.name, employee.age FROM employee e where age = (SELECT MAX(age) FROM employee oldest WHERE oldest.dept = e.dept)


Mark



That's pretty much the horrible method I've been using now. The problem with it is that you get only one column, which breaks a lot of stuff in one of two ways. First is that you can't "sort" by two or more columns (the primary sort column might have only 2 or 3 possible values, making the second column very important to the ordering). The second problem is again related to that, in that the record chosen is based entirely on that one column. In a lot of cases that column is not unique. And you are right, you can do it with a join too.

Just for reference, the other horrible method involves user defined functions that execute the single item TOP 1 method. While technically robust, you can see how inefficient it is (worst plan ever).
Quote:Original post by markr
Hmm ok, this does it a bit better:

SELECT e.id, e.name, e.dept, e.age, count(older.id) FROMEMPLOYEE eLEFT OUTER JOIN employee older ON e.dept = older.dept AND e.age < older.ageGROUP BY e.id, e.name, e.dept, e.ageHAVING count(older.id) = 0


Effectively we're joining everyone with the group of rows of people in their own department who are older than then, and counting them with the count aggregate function.

Then using HAVING, we select only those where the count is 0, who must therefore be the oldest people in the department.

Mark


I think I like it. I will have to try it out to see if it works with all my situations and what the speed is. You got your rate up.
We're still running into the issue of sorting by more then one column. When 2 employees have the same age, they will both be included in the result set.
Which of course we can solve with a simple condition:

(e.age < older.age OR (e.age = older.age and e.name > older.name))

Great solution markr.
select * From Employees e1 WHERE Name =(Select top 1 Name From Employees e2              WHERE e1.DepartmentID = e2.DepartmentID              ORDER BY e2.Age desc, e2.Name)


EDIT:

Source data:
Name    Age     Salary  DepartmentIDBob	27	15	1Mike	57	35	1Jim	28	17	2Clive	17	10	3John	18	10	3Clem	18	14	3



Results:
Name    Age     Salary  DepartmentIDMike	57	35	1Jim	28	17	2Clem	18	14	3

Best I can come up with is writing a user-defined function that takes the department and returns the record of choice. All the custom function would need to do is execute "SELECT TOP 1 FROM employees WHERE Dept = @deptVar ORDER BY Age, Name". Shouldn't leave too many holes and could be used as a template for similar sorts.

Then your main query would be "SELECT DISTINCT Dept, dept_Function(Dept) FROM employees;"

I'm not currently at work and do not have Enterprise manager up to test this, but like I said, it's the best I could come up with. Hope it helps.

cyn

This topic is closed to new replies.

Advertisement