If we're talking about the normal DCT/IDCT used in image processing (e.g. JPEG), then it's not a memory issue. In JPEG (or MPEG) the data is decompressed from a Huffman- or arithmetic-coded entropy stream into its quantized coefficients. The coefficients take up as much space as the final output. So unless you plan to do the entropy decode on the video card (not entirely impossible in theory, though not easy I'd imagine), you're not saving much by working directly on the DCT coefficients as opposed to the post-transform data.
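A quick back-of-envelope on the sizes involved for a single 8x8 block (the 16-bit coefficient storage is an assumption for illustration; exact widths vary by decoder):

```python
# One 8x8 block, one channel.
pixel_bytes = 8 * 8 * 1   # 64 bytes: one 8-bit sample per pixel
coeff_bytes = 8 * 8 * 2   # 128 bytes: dequantized DCT coefficients can
                          # exceed 8 bits, so 16-bit signed is a common choice

# The coefficient buffer is at least as large as the final pixel output,
# which is why skipping the IDCT doesn't save you memory.
assert coeff_bytes >= pixel_bytes
```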
In the paper they are clearly performing the edge detection on the decompressed coefficients, not on the compressed/entropy-encoded data stream. The IDCT is so fast compared to all the other operations involved that it's kind of a non-issue IMO.
You mention a few things, though, that might imply you don't fully understand the question. Not trying to be antagonistic (tone is very hard to convey in writing). The discrete cosine transform (like that used in JPEG or MPEG) in and of itself does not perform compression. I'll describe it briefly, but you may want to google around for JPEG; pictures can really make it easier to understand.
The original source image data (you called it RAW I think) is divided into 8x8 blocks of pixels (merely for ease of computation; the DCT can use blocks of any size, but 8x8 is the de facto standard). Each pixel in the 8x8 block is a single 8-bit byte (in the case of color images, each channel is compressed separately). Each block is transformed by the forward discrete cosine transform, abbreviated to DCT.

The output of the forward DCT isn't any smaller; in fact it can be larger. When I played around with the DCT (years ago) you could see that the result would sometimes exceed the 8 bits of the source image, so an in-place transform would cause issues if this wasn't taken into account. Some papers I read used pre-scaling, some used other methods (I was lazy and just used floats as the intermediate); each method had its pros and cons. Bottom line is, you're not saving memory by doing the forward DCT.

The coefficients after the forward DCT are then quantized, which reduces their magnitude and is the 'lossy' step of compression. There are all sorts of standard quantization matrices, but in general they tend to preserve the top-left (low-frequency) coefficients and throw out the bottom-right (high-frequency) ones. After this the entropy encoding occurs. This is the step that actually compresses the data; up till now you've actually made things larger/more difficult to work with.
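To make the dynamic-range point concrete, here's a minimal sketch of the forward 8x8 DCT in plain Python (orthonormal DCT-II, level-shifted by 128 as JPEG does; the flat quantization step of 16 at the end is purely illustrative, real codecs use per-coefficient tables):

```python
import math

def dct2_8x8(block):
    """Forward 2-D DCT-II (orthonormal) of an 8x8 block."""
    n = 8
    scale = lambda k: math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = scale(u) * scale(v) * s
    return out

# An all-white block (255), level-shifted by -128 as JPEG does.
block = [[255 - 128] * 8 for _ in range(8)]
coeffs = dct2_8x8(block)

# The DC coefficient here is 8 * 127 = 1016 -- far outside the 8-bit range
# of the source pixels, so a naive in-place 8-bit transform would overflow.

# Quantization then divides each coefficient by a step size and rounds;
# a flat step of 16 is just for illustration.
quantized = [[round(c / 16) for c in row] for row in coeffs]
```

For a constant block like this one, all the energy lands in the DC coefficient and every AC coefficient comes out (numerically) zero, which is exactly why the quantizer can throw most of them away cheaply.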