As far as I understand things, a typical view matrix is actually "inverted": if the camera matrix places the camera in world space, then the matrix needed to transform things from world space into the camera's space is the inverse of that camera matrix, and that inverse is what is usually called the view matrix. It is just a case of confusing naming. I use "camera matrix" for the matrix that defines the camera's location and direction in world space, and "view matrix" for the inverted camera matrix.
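You can confirm this from many code samples where the view matrix is constructed. The code rarely performs an actual general matrix inversion, though, since in the view matrix case the inverse can be calculated easily: a camera matrix is normally just a rotation plus a translation, so its inverse is the transposed rotation combined with the negated translation rotated by that transpose.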
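To make the "easy inverse" concrete, here is a minimal sketch. It assumes 4x4 matrices stored as nested lists, a column-vector convention (point' = M * point), and a camera matrix made only of rotation and translation. The function names are made up for illustration and do not come from any particular library.

```python
def camera_matrix(right, up, forward, position):
    """Camera-to-world matrix: basis vectors in the columns, position in the last column."""
    return [
        [right[0], up[0], forward[0], position[0]],
        [right[1], up[1], forward[1], position[1]],
        [right[2], up[2], forward[2], position[2]],
        [0.0, 0.0, 0.0, 1.0],
    ]


def view_from_camera(cam):
    """Invert a rigid camera matrix [R | t] as [R^T | -R^T t], with no general inverse."""
    view = [[0.0] * 4 for _ in range(4)]
    for i in range(3):
        for j in range(3):
            view[i][j] = cam[j][i]  # transposed rotation
        # Translation part: minus the dot product of basis vector i with the position.
        view[i][3] = -sum(cam[j][i] * cam[j][3] for j in range(3))
    view[3][3] = 1.0
    return view


def transform_point(m, p):
    """Apply a 4x4 matrix to a 3D point (w = 1)."""
    x, y, z = p
    return tuple(m[i][0] * x + m[i][1] * y + m[i][2] * z + m[i][3] for i in range(3))


# Camera at (1, 2, 3), rotated 90 degrees around the Y axis.
cam = camera_matrix((0.0, 0.0, -1.0), (0.0, 1.0, 0.0), (1.0, 0.0, 0.0), (1.0, 2.0, 3.0))
view = view_from_camera(cam)
```

With this view matrix the camera's own world position ends up at the origin of view space, and a point one unit ahead of the camera ends up one unit along the view-space forward axis, which is what you would expect from a world-to-camera transform.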
Yes, in the code the ray is transformed into the sphere's local/object space by the inverse of the sphere's world matrix. The beauty of this is that in local space the sphere is located at the origin (0,0,0), so translation doesn't have to be accounted for in the ray-sphere intersection test.
The advantage of this technique is that it also supports things like uniform and non-uniform scaling in the world matrix. The ray-sphere test always remains the same, since only the ray's position and direction change.
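A rough sketch of the idea, under simplifying assumptions: the world matrix is taken to be just translation times (possibly non-uniform) scale, with no rotation, so its inverse can be written out by hand instead of inverting a full 4x4 matrix. The function names are hypothetical. Note that the local-space direction is deliberately not renormalized, so the returned t is still valid along the original world-space ray.

```python
import math


def intersect_unit_sphere(origin, direction):
    """Smallest t >= 0 where origin + t * direction hits the unit sphere at the origin, else None."""
    a = sum(d * d for d in direction)
    b = 2.0 * sum(o * d for o, d in zip(origin, direction))
    c = sum(o * o for o in origin) - 1.0
    disc = b * b - 4.0 * a * c
    if a == 0.0 or disc < 0.0:
        return None  # degenerate direction, or the ray misses the sphere
    root = math.sqrt(disc)
    for t in ((-b - root) / (2.0 * a), (-b + root) / (2.0 * a)):
        if t >= 0.0:
            return t
    return None  # both hits are behind the ray origin


def intersect_scaled_sphere(ray_origin, ray_dir, center, scale):
    """Ray test against a unit sphere whose world matrix is translate(center) * scale(scale).

    The inverse world matrix maps a point p to (p - center) / scale
    and a direction d to d / scale (directions ignore translation).
    """
    local_origin = tuple((o - c) / s for o, c, s in zip(ray_origin, center, scale))
    local_dir = tuple(d / s for d, s in zip(ray_dir, scale))  # not renormalized on purpose
    return intersect_unit_sphere(local_origin, local_dir)


# Sphere stretched to radius 2 along X, centered at (5, 0, 0); ray along +X from the world origin.
hit_t = intersect_scaled_sphere((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0), (2.0, 1.0, 1.0))
miss_t = intersect_scaled_sphere((0.0, 5.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0), (2.0, 1.0, 1.0))
```

Here the stretched sphere spans x = 3 to x = 7 in world space, so the first ray hits at t = 3 while the second ray passes above it. With a full world matrix that includes rotation, the same structure applies, except the ray would be multiplied by the actual inverse matrix.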