I just got done doing this in OpenGL ES 2.0.
The problem I had was that the GPU handles every vertex attribute as a float (even if it was uploaded as an int), so the rounding error between the CPU and GPU was killing me. The CPU said 1.0, but the GPU was generating 0.99999, which got truncated to 0 when converted to an int.
I solved it by adding 0.5 before the conversion, like this:
index.x = int(blendIndices.x + 0.5);
index.y = int(blendIndices.y + 0.5);
index.z = int(blendIndices.z + 0.5);
index.w = int(blendIndices.w + 0.5);
finalPosition = blendWeights.x * ( boneMatrix[ index.x] * vec4( position, 1.0));
finalPosition += blendWeights.y * ( boneMatrix[ index.y] * vec4( position, 1.0));
finalPosition += blendWeights.z * ( boneMatrix[ index.z] * vec4( position, 1.0));
finalPosition += blendWeights.w * ( boneMatrix[ index.w] * vec4( position, 1.0));
Never had a problem again.
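To show why the + 0.5 matters, here is a minimal CPU-side sketch in C. It assumes the float-to-int conversion drops the fractional part in C the same way the shader's int() constructor does. The function names are made up for this example and are not from my shader.

```c
/* Mimics int(x) in the shader: the fractional part is simply dropped,
   so a value that arrives as 0.99999 becomes 0. */
static int index_truncated(float attribute_value) {
    return (int)attribute_value;
}

/* Mimics int(x + 0.5): for non-negative values this rounds to the
   nearest integer, so 0.99999 becomes 1 as intended. */
static int index_rounded(float attribute_value) {
    return (int)(attribute_value + 0.5f);
}
```

Bone indices are never negative, so the simple + 0.5 trick is enough here. Negative values would need a different rounding rule.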
I forget whether I had to load the indices in 0123 order or 3210 order, but I was also moving from a big endian machine to a little endian machine at the time, so that part might be unrelated to your problem.
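If the order question does turn out to be an endianness issue, one possible cause (an assumption, I can't confirm it was my case) is packing the four indices into a single 32-bit word on the CPU and then uploading that word as four unsigned bytes. A rough C sketch with hypothetical helper names:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: pack four indices into one 32-bit word with shifts.
   The order of the bytes in memory then depends on the CPU's endianness. */
static uint32_t pack_indices_word(uint8_t i0, uint8_t i1, uint8_t i2, uint8_t i3) {
    return (uint32_t)i0 | ((uint32_t)i1 << 8) |
           ((uint32_t)i2 << 16) | ((uint32_t)i3 << 24);
}

/* Hypothetical helper: store the indices as individual bytes.
   Memory order is i0, i1, i2, i3 on any machine. */
static void write_indices_bytes(uint8_t out[4], uint8_t i0, uint8_t i1,
                                uint8_t i2, uint8_t i3) {
    out[0] = i0;
    out[1] = i1;
    out[2] = i2;
    out[3] = i3;
}

/* Returns 1 on a little endian CPU and 0 on a big endian one. */
static int is_little_endian(void) {
    uint32_t one = 1;
    uint8_t first_byte;
    memcpy(&first_byte, &one, 1);
    return first_byte == 1;
}
```

With indices 0, 1, 2, 3 the packed word lands in memory as 0123 on a little endian machine and 3210 on a big endian one, while the byte array stays 0123 everywhere. Writing the indices as separate bytes sidesteps the question.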