In fact there is a lot of triangle intersection code algorithms, out of all, most interesting are:
which is just a standard either single-side or double-side barycentric test. It doesn't need any additional storage and in general is very fast on GPUs and CPUs.
Woop http://jcgt.org/published/0002/01/05/paper.pdf
I find it very fast on the CPU. It needs more storage compared to MT test (plus for shading you will need original vertex coordinates), so I find it to behave a bit worse (on some of the GPUs) compared to MT.
Out of others - Plucker test is one of the most robust, Badouel (projection) test behave very fast on SIMD SSE on CPU, and so on. I'm also linking to (a bit older) good summary of algorithms and their comparsions in terms of speed - http://gggj.ujaen.es/docs/grapp14_jimenez.pdf
I hope this can help. I could link you to my implementation of gpu-raytracer on GitHub, but that repo is heavily WIP and I haven't pushed for few weeks (I have modifications locally), plus the setup is a bit harder as of now (but I'm working on it!). https://github.com/Zgragselus/OpenTracer ... there is obsolete CPU code with variation of barycentric test and woop test in OpenCL (which is used in runtime).