These are called pixels.
A pixel is a single element of a (usually) rectangular grid:
+-+-+-
| | |
+-+-+-
| | |
A voxel is the 3D equivalent. Instead of small squares, you have small boxes.
There is no real mathematics behind it, no more than it takes to draw a crossword puzzle or a tic-tac-toe board.
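To show how little is involved, here is a minimal sketch of a voxel grid kept in one flat list. The class name `VoxelGrid` and its methods are made up for this example, and the x-fastest memory layout is just one common choice, not the only one.

```python
class VoxelGrid:
    """A dense 3D grid of voxels stored in one flat list (hypothetical helper)."""

    def __init__(self, width, height, depth, fill=0):
        self.width, self.height, self.depth = width, height, depth
        self.cells = [fill] * (width * height * depth)

    def _index(self, x, y, z):
        # Map a 3D coordinate to a position in the flat list (x varies fastest).
        if not (0 <= x < self.width and 0 <= y < self.height and 0 <= z < self.depth):
            raise IndexError("voxel coordinate out of range")
        return x + self.width * (y + self.height * z)

    def get(self, x, y, z):
        return self.cells[self._index(x, y, z)]

    def set(self, x, y, z, value):
        self.cells[self._index(x, y, z)] = value


grid = VoxelGrid(4, 3, 2)   # 4 x 3 x 2 = 24 boxes
grid.set(1, 2, 1, 7)        # put material "7" into one box
print(grid.get(1, 2, 1))
```

The only arithmetic is the index calculation, which is the same trick used for 2D pixel buffers with one more dimension.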
Voxels are problematic because of storage. To represent a complex object at sufficient resolution, one needs vastly more storage than is available today. A petabyte (1 million gigabytes) would be quite a good start.
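A rough back-of-the-envelope calculation shows where such numbers come from. This sketch assumes a dense, uncompressed cubic grid with one byte per voxel; the function name `dense_grid_bytes` and the chosen resolutions are only illustrative.

```python
PETABYTE = 10 ** 15  # bytes, decimal definition (1 million gigabytes)


def dense_grid_bytes(side, bytes_per_voxel=1):
    """Storage for a cube of side x side x side voxels, with no compression."""
    return side ** 3 * bytes_per_voxel


for side in (1_000, 10_000, 100_000):
    size = dense_grid_bytes(side)
    print(f"{side:>7} voxels per side: {size:.3e} bytes = {size / PETABYTE:.6f} PB")
```

Under these assumptions a cube with 100,000 voxels per side (for example 100 m at 1 mm resolution) already needs 10^15 bytes, which is one petabyte. Every tenfold increase in resolution costs a thousandfold increase in storage.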
So voxel-handling techniques deal mostly with solving this problem. Even where storage is cheap, we cannot access it fast enough.
Simple voxel engines don't complicate things: they split the world into equal rectangular chunks and draw only those that are visible. The stuff presented in the video comes closer to the cutting edge, so it uses a combination of techniques.
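The simple approach can be sketched roughly as follows. The chunk size, the function names and the "visibility" rule are all assumptions made for this example: a real engine would use frustum or occlusion culling, while this sketch only keeps non-empty chunks within a fixed distance of the camera.

```python
from collections import defaultdict

CHUNK_SIZE = 16  # voxels per chunk edge; an arbitrary choice for this sketch


def chunk_of(x, y, z):
    """Return the coordinates of the chunk containing voxel (x, y, z)."""
    return (x // CHUNK_SIZE, y // CHUNK_SIZE, z // CHUNK_SIZE)


def build_chunks(filled_voxels):
    """Group filled voxel coordinates by the chunk they fall into."""
    chunks = defaultdict(list)
    for voxel in filled_voxels:
        chunks[chunk_of(*voxel)].append(voxel)
    return chunks


def chunks_to_draw(chunks, camera_voxel, view_distance):
    """Pick non-empty chunks at most view_distance chunks from the camera.

    This is a stand-in for a real visibility test, not an actual one.
    """
    cam = chunk_of(*camera_voxel)
    return sorted(
        c for c in chunks
        if all(abs(a - b) <= view_distance for a, b in zip(c, cam))
    )


world = [(1, 2, 3), (17, 0, 0), (200, 5, 5)]  # three filled voxels
chunks = build_chunks(world)
print(chunks_to_draw(chunks, camera_voxel=(0, 0, 0), view_distance=2))
# [(0, 0, 0), (1, 0, 0)]
```

Empty chunks never appear in the dictionary at all, so they cost nothing, and the far-away chunk at (12, 0, 0) is skipped without touching its voxels.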
The background needed to implement such engines is covered by a standard low-end CS curriculum, which covers computer graphics, linear algebra and computer architecture.
Some other applicable semi-mathematical fields would be information theory (encoding, entropy) and compression.
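As a small illustration of why these fields apply, the sketch below measures the Shannon entropy of a row of voxels and compresses it with run-length encoding. Run-length encoding is picked here only because it is the simplest scheme to show; the function names are made up for this example and nothing here is claimed to be what any particular engine does.

```python
import math
from collections import Counter
from itertools import groupby


def entropy_bits(values):
    """Shannon entropy in bits per voxel, from the observed value frequencies."""
    counts = Counter(values)
    total = len(values)
    return -sum(n / total * math.log2(n / total) for n in counts.values())


def run_length_encode(values):
    """Collapse each run of equal values into a (value, run_length) pair."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def run_length_decode(pairs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, count in pairs for _ in range(count)]


row = [0] * 12 + [1] * 4             # mostly empty space, then some solid voxels
encoded = run_length_encode(row)
print(encoded)                        # [(0, 12), (1, 4)]
print(run_length_decode(encoded) == row)
print(f"{entropy_bits(row):.3f} bits per voxel")
```

Voxel worlds tend to contain long runs of identical cells (air, solid rock), which is why even crude schemes like this one shrink them considerably.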
Voxels are, at least today, an engineering problem, since the hardware we have isn't capable of working with them directly. As far as their structure goes, they are absurdly simple - just a bunch of boxes.