That sounds like a pretty expensive approach, even with a low-res mask. On one of the games I worked on, we had a feature where the player could "sign" their name on the touch screen. As the player dragged their finger across the screen, I added verts to a tristrip that followed their finger. That gave me a relatively small object describing the player's signature (just a tristrip with a couple hundred points) instead of a large texture.
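For what it's worth, here's a rough sketch of how I built that tristrip (the function name and ribbon width are made up for illustration, and it assumes at least two drag points):

```python
import math

def ribbon_verts(points, width=4.0):
    """Build a triangle-strip vertex list that follows a dragged path.

    For each input point, emit two verts offset perpendicular to the
    local direction of travel, +/- half the ribbon width.
    """
    half = width / 2.0
    verts = []
    for i, (x, y) in enumerate(points):
        # Direction of travel: toward the next point, or away from the
        # previous point for the last vertex in the path.
        if i < len(points) - 1:
            dx, dy = points[i + 1][0] - x, points[i + 1][1] - y
        else:
            dx, dy = x - points[i - 1][0], y - points[i - 1][1]
        length = math.hypot(dx, dy) or 1.0
        # Unit normal (perpendicular) to the travel direction.
        nx, ny = -dy / length, dx / length
        verts.append((x + nx * half, y + ny * half))
        verts.append((x - nx * half, y - ny * half))
    return verts
```

Each pair of verts forms one "rung" of the strip, so the renderer can draw it directly as a triangle strip. You'd probably also want to skip new points that are too close to the last one, or the ribbon gets noisy.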
If I were in your place, I would attempt something similar. As the player drags their finger, I would build a line strip that follows it; when they release, I'd compare that line strip to the "correct" line strip for the gesture. Matching the two strips is probably the tricky part. I would normalize the player's line strip first (scale it so the min x becomes -1, the max x becomes 1, and likewise for y), then march along both the player's strip and the correct strip at fixed intervals, summing the distance from each player sample point to the corresponding correct sample point. If the total distance across all sample points is below some threshold, the gesture passes; otherwise it fails.
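A rough sketch of that normalize/march/sum idea (the sample count and threshold here are made-up numbers you'd have to tune, and each strip is assumed to have at least two points):

```python
import math

def normalize(points):
    """Scale/translate points so x and y each span [-1, 1]."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    min_x, max_x = min(xs), max(xs)
    min_y, max_y = min(ys), max(ys)
    # Guard against degenerate strips (e.g. a perfectly horizontal line).
    span_x = (max_x - min_x) or 1.0
    span_y = (max_y - min_y) or 1.0
    return [(2.0 * (x - min_x) / span_x - 1.0,
             2.0 * (y - min_y) / span_y - 1.0) for x, y in points]

def resample(points, n):
    """March along the strip, taking n evenly spaced samples by arc length."""
    # Cumulative arc length at each vertex.
    dists = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dists.append(dists[-1] + math.hypot(x1 - x0, y1 - y0))
    total = dists[-1] or 1.0
    samples = []
    seg_i = 0
    for k in range(n):
        target = total * k / (n - 1)
        # Advance to the segment containing this arc length.
        while seg_i < len(dists) - 2 and dists[seg_i + 1] < target:
            seg_i += 1
        seg_len = dists[seg_i + 1] - dists[seg_i]
        t = (target - dists[seg_i]) / seg_len if seg_len > 0 else 0.0
        (x0, y0), (x1, y1) = points[seg_i], points[seg_i + 1]
        samples.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return samples

def gesture_distance(player, correct, n=32):
    """Sum of distances between corresponding samples of the two strips."""
    a = resample(normalize(player), n)
    b = resample(normalize(correct), n)
    return sum(math.hypot(ax - bx, ay - by)
               for (ax, ay), (bx, by) in zip(a, b))

def matches(player, correct, threshold=4.0, n=32):
    return gesture_distance(player, correct, n) < threshold
```

Sampling by arc length (rather than just stepping through the raw input points) keeps the comparison fair when the player draws quickly and their strip has far fewer points than the reference.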
Keep in mind I've never actually implemented a gesture system, so the matching part might need some work, but what I suggested is probably a good place to start and should be faster than comparing two textures (even a modest 100x50 texture would require 5,000 per-pixel comparisons).
Edited by Samith, 09 November 2013 - 10:52 AM.