It might be a single instruction, but that doesn't make it a fast instruction. Some light Googling places the idiv instructions at about 20-40 clock cycles, whereas shifts take around 3 or 4. I don't know whether this still holds on modern processors, or how relevant it is given that the processor is usually waiting on memory anyway.
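To make the divide-versus-shift comparison concrete, here is a minimal sketch in plain C. The function names are made up for illustration, and it assumes unsigned values and a constant power-of-two divisor (signed values need extra care, since a right shift rounds differently from division for negatives). With optimisation turned on, a compiler will typically turn the plain `/` and `%` versions into the shift and mask itself, but check the generated code for your own compiler rather than taking that on faith.

```c
#include <stdint.h>

/* Plain division by a constant power of two. An optimising compiler
   will typically emit a shift here instead of a divide instruction. */
uint32_t div_by_8(uint32_t x) { return x / 8u; }

/* The same operation written by hand as a shift (unsigned only). */
uint32_t div_by_8_shift(uint32_t x) { return x >> 3; }

/* Remainder by a constant power of two... */
uint32_t mod_by_8(uint32_t x) { return x % 8u; }

/* ...and the equivalent bit mask (unsigned only). */
uint32_t mod_by_8_mask(uint32_t x) { return x & 7u; }
```

If the two versions compile to the same instructions, the hand-written shift buys nothing, and the plain `/` is easier to read.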
Using assembly can actually slow down your program unless you know exactly what you are doing. Some compilers, e.g. Microsoft's, treat assembly as a "black box" which they won't optimise or inline. Because the compiler can't see what happens inside that black box, it also has fewer optimisations available for the surrounding code.
If you don't know assembly, I'd recommend you get pretty good at it first, before you try to use it in a real program. Otherwise you'll just make your code more brittle, with no meaningful improvement in performance or even a slowdown.