I managed to get this working with a lot of optimizing, while basically keeping the original setup.
- Switched from a Sobel filter to a cross filter for a faster edge-detection fragment shader (half the number of texture reads). Not as pretty, but OK in this context.
- I had been sloppy and used fragment discards in many shaders. Now I only use discard for a few edge triangles, which are drawn with a special shader.
- Lots of small tweaks as suggested above and in the imgtec document.
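
To illustrate the first point, here is a rough CPU-side sketch in Python (not the actual shader code) of why a cross filter needs half the texture reads of Sobel. It assumes "cross filter" means sampling only the four axis-aligned neighbours; `CountingTexture`, `sobel_edge` and `cross_edge` are hypothetical names made up for this example.

```python
class CountingTexture:
    """Grayscale image with clamp-to-edge sampling that counts every read."""

    def __init__(self, rows):
        self.rows = rows
        self.reads = 0

    def sample(self, x, y):
        self.reads += 1
        y = min(max(y, 0), len(self.rows) - 1)
        x = min(max(x, 0), len(self.rows[0]) - 1)
        return self.rows[y][x]


def sobel_edge(tex, x, y):
    # Eight reads: the centre texel has weight 0 in both Sobel kernels.
    tl = tex.sample(x - 1, y - 1)
    t = tex.sample(x, y - 1)
    tr = tex.sample(x + 1, y - 1)
    l = tex.sample(x - 1, y)
    r = tex.sample(x + 1, y)
    bl = tex.sample(x - 1, y + 1)
    b = tex.sample(x, y + 1)
    br = tex.sample(x + 1, y + 1)
    gx = (tr + 2 * r + br) - (tl + 2 * l + bl)
    gy = (bl + 2 * b + br) - (tl + 2 * t + tr)
    return abs(gx) + abs(gy)


def cross_edge(tex, x, y):
    # Four reads: only the left/right and up/down neighbours (assumed layout).
    l = tex.sample(x - 1, y)
    r = tex.sample(x + 1, y)
    t = tex.sample(x, y - 1)
    b = tex.sample(x, y + 1)
    return abs(r - l) + abs(b - t)


# A small test image with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1] for _ in range(4)]

sobel_tex = CountingTexture(image)
cross_tex = CountingTexture(image)
sobel_value = sobel_edge(sobel_tex, 1, 1)
cross_value = cross_edge(cross_tex, 1, 1)
print(sobel_tex.reads, cross_tex.reads)
```

Both filters respond to the edge, but the cross version ignores the diagonal neighbours, which is probably why the result looks a bit rougher.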
I didn't use the alpha channel for the depth buffer. It seemed to produce artifacts, and I didn't notice any performance gain (although I'm not sure I did it right). I guess I could do more research to see if this is feasible somehow, but the frame rate is OK now, so I'm leaving it as a future to-do item.