SELECT
main.*,
previous.page AS previous_page,
next.page AS next_page
FROM
previous_next AS main
LEFT JOIN (
SELECT
p_main.id AS id,
p_previous.page AS page
FROM
previous_next AS p_main,
previous_next AS p_previous
WHERE
p_previous.id < p_main.id AND p_previous.session = p_main.session
GROUP BY p_previous.id
) AS previous ON (previous.id=main.id)
LEFT JOIN (
SELECT
n_next.id AS id,
n_main.page AS page
FROM
previous_next AS n_main,
previous_next AS n_next
WHERE
n_next.id < n_main.id AND n_next.session = n_main.session
GROUP BY n_next.id
) AS `next` ON (next.id=main.id);
[web] [MySQL] Return matching "previous" and "next" values (or null if nonexistent)
Here's one that's puzzling me; let's say you have a table tracking page views. Every time a page is visited, its name is logged, along with an autoincrementing ID and a session ID (unique to each user). When pulling data off the table, I would like to retrieve the current page ID, but also retrieve the previous and next page for each row. If there was no previous or next page, null should be returned instead.
A couple of LEFT JOINs (the query at the top of this post) does the trick, and works beautifully on a 7-row table (named previous_next).
However, this is evidently not the way to go as it takes 70 seconds on a 50,000-row table (though it does produce the correct results). Is there a better way to do this?
Edit: This question is now purely academic, as I now log the previous/next pages directly onto the log table every time a page is viewed. I'd still be interested to see the "correct" solution.
[Edited by - benryves on October 25, 2009 10:48:53 AM]
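For reference, databases with window functions answer this kind of "neighbouring row" question directly: LAG and LEAD look one row back or forward within a partition and return NULL at the edges, which is exactly the behaviour asked for. This wasn't available in 2009-era MySQL, but MySQL 8.0+ and SQLite 3.25+ support it. A minimal sketch, using Python's sqlite3 as a stand-in for MySQL (table and column names follow the thread; the sample rows are invented for illustration):

```python
# Sketch of the window-function approach. Requires SQLite >= 3.25
# (bundled with recent Pythons); the same SQL works on MySQL 8.0+.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE previous_next (id INTEGER PRIMARY KEY, session INTEGER, page TEXT)"
)
conn.executemany(
    "INSERT INTO previous_next VALUES (?, ?, ?)",
    [(1, 1, "home"), (2, 1, "about"), (3, 2, "home"),
     (4, 1, "contact"), (5, 2, "news")],
)

# LAG/LEAD fetch the previous/next page within each session, ordered by id;
# at the start/end of a session they return NULL, as the question requires.
query = """
SELECT
    id, session, page,
    LAG(page)  OVER (PARTITION BY session ORDER BY id) AS previous_page,
    LEAD(page) OVER (PARTITION BY session ORDER BY id) AS next_page
FROM previous_next
ORDER BY id
"""
for row in conn.execute(query):
    print(row)
```

The whole result is computed in a single ordered pass per partition, so it avoids the self-join blow-up that made the LEFT JOIN version take 70 seconds.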
I would think that subqueries would work better than LEFT JOINs. But I'd guess that even if performance improved with that technique, it wouldn't bring your query time down from 70 seconds enough to be practical, either.
Have you defined indexes that cover the query? That is always the first thing to check when you have a really slow query. You should have indexes on id and session, at the least.
SELECT
main.*,
(
SELECT p_previous.page
FROM previous_next AS p_previous
WHERE p_previous.id < main.id AND p_previous.session = main.session
GROUP BY p_previous.id
) AS previous_page,
(
SELECT n_next.page
FROM previous_next AS n_next
WHERE n_next.id > main.id AND n_next.session = main.session
GROUP BY n_next.id
) AS next_page
FROM previous_next AS main;
Yes, indices are set up correctly. In fact, the table has been directly imported from an old site, and I've been trying to recreate a report that used to run on the old site without access to its source. The old site would complete generating a report in under five seconds.
The subquery method is significantly slower, sadly:
SELECT
main.*,
(
SELECT p_previous.page
FROM previous_next AS p_previous
WHERE p_previous.id < main.id AND p_previous.session = main.session
GROUP BY p_previous.id
ORDER BY id DESC
LIMIT 1
) AS previous_page,
(
SELECT n_next.page
FROM previous_next AS n_next
WHERE n_next.id > main.id AND n_next.session = main.session
GROUP BY n_next.id
ORDER BY id ASC
LIMIT 1
) AS next_page
FROM previous_next AS main;
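For what it's worth, the ORDER BY ... LIMIT 1 subqueries above can be fast if a composite (session, id) index lets each one resolve as a single index seek, and the redundant GROUP BY can be dropped since LIMIT 1 already picks one row. A sketch of that shape, using Python's sqlite3 as a stand-in for MySQL (sample data invented for illustration):

```python
# Sketch: correlated LIMIT 1 subqueries backed by a composite (session, id)
# index, so each neighbour lookup is one index seek rather than a scan.
# SQLite stands in for MySQL here; the SQL itself is portable.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE previous_next (id INTEGER PRIMARY KEY, session INTEGER, page TEXT)"
)
conn.execute("CREATE INDEX idx_session_id ON previous_next (session, id)")
conn.executemany(
    "INSERT INTO previous_next VALUES (?, ?, ?)",
    [(1, 1, "home"), (2, 1, "about"), (3, 2, "home"), (4, 1, "contact")],
)

# No GROUP BY needed: ORDER BY + LIMIT 1 already selects a single row.
query = """
SELECT
    main.*,
    (SELECT p.page FROM previous_next AS p
     WHERE p.session = main.session AND p.id < main.id
     ORDER BY p.id DESC LIMIT 1) AS previous_page,
    (SELECT n.page FROM previous_next AS n
     WHERE n.session = main.session AND n.id > main.id
     ORDER BY n.id ASC LIMIT 1) AS next_page
FROM previous_next AS main
ORDER BY main.id
"""
for row in conn.execute(query):
    print(row)
```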
Quote:Original post by benryves
Edit: This question is now purely academic, as I now log the previous/next pages directly onto the log table every time a page is viewed. I'd still be interested to see the "correct" solution.
Is it possible the old site simply selected all the required rows and generated the report itself (i.e. in a procedural language)? That's probably the way I'd do it.
(Actually, I'd probably do what you're doing now and log the required data directly in the table.)
Quote:Original post by Codeka
Is it possible the old site simply selected all the required rows and generated the report itself (i.e. in a procedural language)? That's probably the way I'd do it.
Possibly. I hacked something together in a minute to populate the new "next page" and "previous page" fields for the historical data and it took a couple of minutes to run, but that could have been my naïve implementation.
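As a rough illustration of what logging the previous/next pages at write time might look like: each new page view back-fills next_page on the session's latest row and carries that row's page into the new row's previous_page. This is only a sketch under the thread's schema; the helper name log_page_view is hypothetical, and Python's sqlite3 stands in for MySQL:

```python
# Hypothetical sketch of write-time logging of previous/next pages.
# Table and column names follow the thread; log_page_view is invented.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE previous_next (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    session INTEGER,
    page TEXT,
    previous_page TEXT,
    next_page TEXT
)""")

def log_page_view(session, page):
    # Find the most recent row for this session, if any.
    last = conn.execute(
        "SELECT id, page FROM previous_next "
        "WHERE session = ? ORDER BY id DESC LIMIT 1",
        (session,),
    ).fetchone()
    previous_page = last[1] if last else None
    conn.execute(
        "INSERT INTO previous_next (session, page, previous_page) "
        "VALUES (?, ?, ?)",
        (session, page, previous_page),
    )
    if last:
        # Back-fill next_page on the row we just superseded.
        conn.execute(
            "UPDATE previous_next SET next_page = ? WHERE id = ?",
            (page, last[0]),
        )

log_page_view(1, "home")
log_page_view(1, "about")
log_page_view(2, "home")
log_page_view(1, "contact")

for row in conn.execute("SELECT * FROM previous_next ORDER BY id"):
    print(row)
```

The trade-off is one extra SELECT and UPDATE per page view in exchange for trivially cheap reads, which matches what the thread converged on.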
Quote:Original post by benryves
Yes, indices are set up correctly. In fact, the table has been directly imported from an old site, and I've been trying to recreate a report that used to run on the old site without access to its source. The old site would complete generating a report in under five seconds.
The subquery method is significantly slower, sadly:
*** Source Snippet Removed ***
I'm curious, too. The GROUP BY might be the culprit. What happens if those GROUP BYs are removed and replaced with MIN or MAX, as appropriate?
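A sketch of that MIN/MAX idea: find the neighbouring row's id with MAX (below) or MIN (above) per session, then fetch that row's page. Python's sqlite3 stands in for MySQL here, with invented sample data:

```python
# Sketch of the MIN/MAX variant: locate the neighbour's id first,
# then look up its page. SQLite stands in for MySQL; the SQL is portable.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE previous_next (id INTEGER PRIMARY KEY, session INTEGER, page TEXT)"
)
conn.executemany(
    "INSERT INTO previous_next VALUES (?, ?, ?)",
    [(1, 1, "home"), (2, 1, "about"), (3, 2, "home"), (4, 1, "contact")],
)

query = """
SELECT
    main.*,
    (SELECT p.page FROM previous_next AS p
     WHERE p.id = (SELECT MAX(p2.id) FROM previous_next AS p2
                   WHERE p2.session = main.session AND p2.id < main.id)
    ) AS previous_page,
    (SELECT n.page FROM previous_next AS n
     WHERE n.id = (SELECT MIN(n2.id) FROM previous_next AS n2
                   WHERE n2.session = main.session AND n2.id > main.id)
    ) AS next_page
FROM previous_next AS main
ORDER BY main.id
"""
for row in conn.execute(query):
    print(row)
```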
I'm no expert on query optimization, so I would ask if you've tried running "EXPLAIN" on your query to see if there's something else happening (assuming you're dealing with MySQL)? Maybe, for some wild reason, an index isn't getting used. Or maybe it's fetching way more rows than it really needs to.