Sign in to follow this  
chadsxe

SQL question.

Recommended Posts

So I am fairly new to SQL programming and have a questions. I wrote this query

SELECT EMP_ID, SUM(ESTIMATE_LINE_QTY * SERVICE_COST) AS totalSales 
FROM scf_estimate, scf_estimate_line, scf_service
WHERE scf_estimate.ESTIMATE_ID = scf_estimate_line.ESTIMATE_ID
AND scf_service.SERVICE_ID = scf_estimate_line.SERVICE_ID
AND scf_estimate.ESTIMATE_ID IN
(
SELECT ESTIMATE_ID
FROM scf_estimate_line
WHERE INVOICE_ID IS NOT NULL
)
GROUP BY EMP_ID;


Which returns
EMP_ID  totalSales
14 5972.2
15 60103.35
16 52888.5
22 8458.5
23 42228.8


I am now trying to figure out how to go about and get a result that only shows the MIN or MAX of the totalSales column. I imagine I have to subquery the table that results from the query above but I am not sure on how to do that. Any advice would be great.

Thanks

Chad


Share this post


Link to post
Share on other sites
So I tried this


SELECT d.EMP_ID, MAX(d.totalSales)
FROM
(
SELECT EMP_ID, SUM(ESTIMATE_LINE_QTY * SERVICE_COST) AS totalSales
FROM scf_estimate, scf_estimate_line, scf_service
WHERE scf_estimate.ESTIMATE_ID = scf_estimate_line.ESTIMATE_ID
AND scf_service.SERVICE_ID = scf_estimate_line.SERVICE_ID
AND scf_estimate.ESTIMATE_ID IN
(
SELECT ESTIMATE_ID
FROM scf_estimate_line
WHERE INVOICE_ID IS NOT NULL
)
GROUP BY EMP_ID
) AS d;

And it results in this

EMP_ID MAX(d.totalSales)
14 60103.35

As you can see it shows the correct totalSales value but not the correct EMP_ID

Any thoughts

Chad

Share this post


Link to post
Share on other sites
Maybe not optimal (if the query is really executed twice), but that should work for the max:

SELECT totalSales from ( your_first_query ) WHERE totalSales = (SELECT MAX(totalSales) from ( your_first_query ) )

(add another condition if you want the min in the same result).

Share this post


Link to post
Share on other sites
There are several ways to approach this. Some depend on the database you are targeting.

I've included a few examples here (please excuse any odd mistakes - I don't have a database here to play with. I believe that the best way to write an efficient query is to build it up in pieces, using your database's "explain" functionality throughout to tweak it performance-wise).

Oracle:

SELECT
emp_id,
totalSales
from
(
SELECT
EMP_ID,
SUM(ESTIMATE_LINE_QTY * SERVICE_COST) AS totalSales,
rownum rn
FROM scf_estimate, scf_estimate_line, scf_service
WHERE scf_estimate.ESTIMATE_ID = scf_estimate_line.ESTIMATE_ID
AND scf_service.SERVICE_ID = scf_estimate_line.SERVICE_ID
AND scf_estimate.ESTIMATE_ID IN
(
SELECT ESTIMATE_ID
FROM scf_estimate_line
WHERE INVOICE_ID IS NOT NULL
)
GROUP BY EMP_ID
order by totalSales desc
)
WHERE rn=1





MySQL:

SELECT
EMP_ID,
SUM(ESTIMATE_LINE_QTY * SERVICE_COST) AS totalSales
FROM scf_estimate, scf_estimate_line, scf_service
WHERE scf_estimate.ESTIMATE_ID = scf_estimate_line.ESTIMATE_ID
AND scf_service.SERVICE_ID = scf_estimate_line.SERVICE_ID
AND scf_estimate.ESTIMATE_ID IN
(
SELECT ESTIMATE_ID
FROM scf_estimate_line
WHERE INVOICE_ID IS NOT NULL
)
GROUP BY EMP_ID
order by totalSales desc
LIMIT 1;







You should also consider the odd cases which can crop up when writing queries like this - if more than one employee has the max, do you show them all or just an arbitrary one?

Some Oracle examples which will show all "top" rows:


select emp_id, totalSales
from
(
SELECT
emp_id,
totalSales
dense_rank over (order by totalSales desc) rank
FROM
(
SELECT
EMP_ID,
SUM(ESTIMATE_LINE_QTY * SERVICE_COST) AS totalSales,
FROM scf_estimate, scf_estimate_line, scf_service
WHERE scf_estimate.ESTIMATE_ID = scf_estimate_line.ESTIMATE_ID
AND scf_service.SERVICE_ID = scf_estimate_line.SERVICE_ID
AND scf_estimate.ESTIMATE_ID IN
(
SELECT ESTIMATE_ID
FROM scf_estimate_line
WHERE INVOICE_ID IS NOT NULL
)
GROUP BY EMP_ID
order by totalSales desc
)
)
WHERE rank=1;






WITH grouping as
(
your original query
)
select * from grouping where totalSales = (select max(totalSales) from grouping);



Share this post


Link to post
Share on other sites
Quote:
Original post by WavyVirus
There are several ways to approach this. Some depend on the database you are targeting.

I've included a few examples here (please excuse any odd mistakes - I don't have a database here to play with. I believe that the best way to write an efficient query is to build it up in pieces, using your database's "explain" functionality throughout to tweak it performance-wise).

Oracle:
*** Source Snippet Removed ***

MySQL:
*** Source Snippet Removed ***



You should also consider the odd cases which can crop up when writing queries like this - if more than one employee has the max, do you show them all or just an arbitrary one?

Some Oracle examples which will show all "top" rows:

*** Source Snippet Removed ***

*** Source Snippet Removed ***


Thank you for the great explination. I helped a lot. I have another question for the sake of it. If you noticed, I have a nested subquery in my query. This can be removed by adding another "AND" with a join.

SELECT EMP_ID, SUM(ESTIMATE_LINE_QTY * SERVICE_COST) AS totalSales 
FROM scf_estimate, scf_estimate_line, scf_service
WHERE scf_estimate.ESTIMATE_ID = scf_estimate_line.ESTIMATE_ID
AND scf_service.SERVICE_ID = scf_estimate_line.SERVICE_ID
AND scf_estimate_line.INVOICE_ID IS NOT NULL
GROUP BY EMP_ID;



Which version would be the better way to do it?

Share this post


Link to post
Share on other sites
Quote:
Original post by chadsxe
Thank you for the great explination. I helped a lot. I have another question for the sake of it. If you noticed, I have a nested subquery in my query. This can be removed by adding another "AND" with a join.

*** Source Snippet Removed ***

Which version would be the better way to do it?


No problem. Again, there are a few ways to go about doing this.

Note that the query you posted above will not consider any estimate_line rows with a null invoice_id, while the subquery version will include all estimate_line rows where any line in the estimate has an associated invoice.


SELECT EMP_ID, SUM(ESTIMATE_LINE_QTY * SERVICE_COST) AS totalSales
FROM scf_estimate, scf_estimate_line, scf_service
WHERE scf_estimate.ESTIMATE_ID = scf_estimate_line.ESTIMATE_ID
AND scf_service.SERVICE_ID = scf_estimate_line.SERVICE_ID
AND scf_estimate.ESTIMATE_ID in
(
select estimate_id from scf_estimate_line where invoice_id is not null
)
GROUP BY EMP_ID;






This works fine. The query optimizer will probably process this in much the same way as:


SELECT EMP_ID, SUM(ESTIMATE_LINE_QTY * SERVICE_COST) AS totalSales
FROM scf_estimate, scf_estimate_line, scf_service,
(select distinct estimate_id from scf_estimate_line where invoice_id is not null) e
WHERE scf_estimate.ESTIMATE_ID = scf_estimate_line.ESTIMATE_ID
AND scf_service.SERVICE_ID = scf_estimate_line.SERVICE_ID
AND scf_estimate.ESTIMATE_ID = e.estimate_id
GROUP BY EMP_ID;






The subquery is generally evaluated and then sorted, hashed or something similar ready to be joined to the rest.

(Note that without the "distinct", mutiple rows are produced by the subquery for each estimate_id, and so rows are "duplicated" when it is joined to the rest).

Another approach is to use "exists" with a co-related subquery:


SELECT EMP_ID, SUM(ESTIMATE_LINE_QTY * SERVICE_COST) AS totalSales
FROM scf_estimate, scf_estimate_line, scf_service
WHERE scf_estimate.ESTIMATE_ID = scf_estimate_line.ESTIMATE_ID
AND scf_service.SERVICE_ID = scf_estimate_line.SERVICE_ID
AND exists
(
select 1 from scf_estimate_line line where line.invoice_id is not null
and line.estimate_id = scf_estimate.ESTIMATE_ID
)
GROUP BY EMP_ID;






This could be thought of like doing a "short circuit" evaluation of the subquery for each estimate_id, stopping when one row is found. Instead of selecting "1", you could select anything - even null. If there is a nice index on estimate_line.estimate_id then the series of index lookups may be quicker than materializing the subquery as with the "in" example.

There is some great information about this kind of thing at AskTom.com - it's Oracle-centric, but there is a mountain of very good general advice on all things database and SQL related, including "in vs exists".

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this