[web] [MySQL] Return matching "previous" and "next" values (or null if nonexistant)

Started by
5 comments, last by aryx 14 years, 6 months ago
Here's one that's puzzling me; let's say you have a table tracking page views. Every time a page is visited its name is logged, along with an autoincrementing ID and a session ID (unique to each user). When pulling data off the table I would like to retrieve the current page ID, but also retrieve the previous and next page for each row. If there was no previous or next page, null should be returned instead. A couple of LEFT JOINs does the trick, and works beautifully on a 7 row table (named previous_next):
SELECT
    main.*,
    previous.page AS previous_page,
    next.page AS next_page
FROM
    previous_next AS main
LEFT JOIN (
    SELECT
        p_main.id AS id,
        p_previous.page AS page
    FROM
        previous_next AS p_main,
        previous_next AS p_previous
    WHERE
        p_previous.id<p_main.id AND p_previous.session=p_main.session
    GROUP BY p_previous.id
) AS previous ON (previous.id=main.id)
LEFT JOIN (
    SELECT
        n_next.id AS id,
        n_main.page AS page
    FROM
        previous_next AS n_main,
        previous_next AS n_next
    WHERE
        n_next.id<n_main.id AND n_next.session=n_main.session
    GROUP BY n_next.id
) AS `next` ON (next.id=main.id);
However, this is evidently not the way to go as it takes 70 seconds on a 50,000 row table (though it does produce the correct results). Is there a better way to do this? Edit: This question is now purely academic, as I now log the previous/next pages directly onto the log table every time a page is viewed. I'd still be interested to see the "correct" solution. [Edited by - benryves on October 25, 2009 10:48:53 AM]

[Website] [+++ Divide By Cucumber Error. Please Reinstall Universe And Reboot +++]

Advertisement
I would think that subqueries would work better than left joins. But I would guess if performance improved with that technique, it wouldn't reduce your query time enough from 70 seconds to be practical, either.

Have you defined indexes that cover the query? That is always the first thing to check when you have a really slow query. You should have indexes on id and session, at the least.

SELECT    main.*,    (   SELECT            p_previous.page        FROM            previous_next AS p_previous        WHERE            p_previous.id < main.id AND p_previous.session=main.session        GROUP BY p_previous.id    ) AS previous_page,    (   SELECT            n_main.page AS page        FROM            previous_next AS n_next        WHERE            n_next.id < main.id AND n_next.session=main.session        GROUP BY n_next.id    ) AS next_pageFROM    previous_next AS main;

Yes, indices are set up correctly. In fact, the table has been directly imported from an old site, and I've been trying to recreate a report that used to run on the old site without access to its source. The old site would complete generating a report in under five seconds.

The subquery method is significantly slower, sadly:

SELECT    main.*,    (   SELECT            p_previous.page        FROM            previous_next AS p_previous        WHERE            p_previous.id < main.id AND p_previous.session=main.session        GROUP BY p_previous.id        ORDER BY id DESC        LIMIT 1    ) AS previous_page,    (   SELECT            n_next.page AS page        FROM            previous_next AS n_next        WHERE            n_next.id > main.id AND n_next.session=main.session        GROUP BY n_next.id        ORDER BY id ASC        LIMIT 1    ) AS next_pageFROM    previous_next AS main;

[Website] [+++ Divide By Cucumber Error. Please Reinstall Universe And Reboot +++]

Quote:Original post by benryves
Edit: This question is now purely academic, as I now log the previous/next pages directly onto the log table every time a page is viewed. I'd still be interested to see the "correct" solution.
Is it possible the old site simply selected all required the rows and generated the report itself (i.e. in a procedural language)? That's probably the way I'd do it.

(Actually, I'd probably do what you're doing now and log the required data directly in the table)
Quote:Original post by Codeka
Quote:Original post by benryves
Edit: This question is now purely academic, as I now log the previous/next pages directly onto the log table every time a page is viewed. I'd still be interested to see the "correct" solution.
Is it possible the old site simply selected all required the rows and generated the report itself (i.e. in a procedural language)? That's probably the way I'd do it.
Possibly. I hacked something together in a minute to populate the new "next page" and "previous page" fields for the historical data and it took a couple of minutes to run, but that could have been my naïve implementation.

[Website] [+++ Divide By Cucumber Error. Please Reinstall Universe And Reboot +++]

Quote:Original post by benryves
Yes, indices are set up correctly. In fact, the table has been directly imported from an old site, and I've been trying to recreate a report that used to run on the old site without access to its source. The old site would complete generating a report in under five seconds.

The subquery method is significantly slower, sadly:

*** Source Snippet Removed ***


I'm curious, too. The GROUP BY might be the culprit. What happens if those are removed, and replaced with MIN or MAX, as appropriate?
I'm no expert on query optimization, so I would ask if you've tried running "EXPLAIN" on your query to see if there's something else happening (assuming you're dealing MySQL)? Maybe, for some wild reason, an index isn't getting used. Or maybe it's fetching way more rows than it really needs to.

This topic is closed to new replies.

Advertisement