> And the next step after that would be stepping up to skinned meshes vs. rigid-body animation. But I don't think even DX11 and the latest card could do it: 125 characters onscreen at once without slowdowns at 15fps. That would be 62 characters at once at 30fps, or 31 skinned-mesh characters onscreen at once at 60fps. Games can't really do this yet, can they? Total War draws a lot of characters, but they're not high resolution like a character in a typical shooter.

In the sports games I've worked on, we've usually got ~30 players and referees on the field, plus ~32 low-detail spectators (which are then instanced/impostered to fill up to 100,000 stadium seats). That's at 30Hz on DX9/2006-era consoles (with about half the frame time spent on post-processing), and 60Hz on the new DX11/2014-era ones.
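To put those rates in milliseconds, here is a quick back-of-the-envelope sketch. The "about half" post-processing share is the approximate figure from the post above. The function names are made up for illustration, and no split is assumed for the 60Hz case.

```python
def frame_budget_ms(hz: float) -> float:
    """Milliseconds available per frame at a given frame rate."""
    return 1000.0 / hz


def remaining_after_post(hz: float, post_fraction: float) -> float:
    """Time left for everything else (including characters) once post-processing takes its share."""
    return frame_budget_ms(hz) * (1.0 - post_fraction)


for hz in (30, 60):
    print(f"{hz} Hz: {frame_budget_ms(hz):.1f} ms per frame")
# 30 Hz: 33.3 ms per frame
# 60 Hz: 16.7 ms per frame

# Approximate: "about half the frame time" went to post-processing at 30 Hz.
print(f"30 Hz, half on post: {remaining_after_post(30, 0.5):.1f} ms left")
# 30 Hz, half on post: 16.7 ms left
```

So under that rough split, the 30Hz DX9-era games had about the same time left for characters and everything else (~16.7ms) as a whole 60Hz frame.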
Bigger rival companies were doing it at 60Hz in the DX9 era too...
Play any newish Assassin's Creed or Hitman game and you'll see crowds of easily 100 animated NPCs, which the player can interact with (interrupt/push/etc.).
Going back a ways, any Quake 3-derived shooter (e.g. every Call of Duty game) supports 32-player multiplayer on DX9.
Quake 3 was on the cusp between the CPU doing the skinning and the GPU's vertex shader taking over that role. These days almost everyone uses GPU skinning. GPUs can crunch *millions* of pixels per frame with highly complex pixel shaders, so 30 characters × 10k verts (300k vertices) is a breeze.
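For anyone unfamiliar with what the vertex shader is actually doing per vertex, here is a minimal CPU-side sketch of linear blend skinning, the common textbook approach. It is not taken from Quake 3 or any particular engine. It assumes each bone matrix is a 3x4 affine transform with the inverse bind pose already baked in, and that each vertex carries a list of (bone index, weight) pairs whose weights sum to 1. The names and data layout are hypothetical.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
# Hypothetical layout: 3 rows of [r0, r1, r2, translation], inverse bind pose already applied.
Mat3x4 = Sequence[Sequence[float]]


def transform(m: Mat3x4, v: Vec3) -> Vec3:
    """Apply a 3x4 affine bone matrix to a point."""
    x, y, z = v
    return tuple(row[0] * x + row[1] * y + row[2] * z + row[3] for row in m)


def skin_vertex(v: Vec3,
                influences: Sequence[Tuple[int, float]],
                bones: Sequence[Mat3x4]) -> Vec3:
    """Linear blend skinning: weighted sum of the vertex transformed by each influencing bone."""
    out = [0.0, 0.0, 0.0]
    for bone_index, weight in influences:
        p = transform(bones[bone_index], v)
        for k in range(3):
            out[k] += weight * p[k]
    return tuple(out)


identity = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
moved = [[1, 0, 0, 2], [0, 1, 0, 0], [0, 0, 1, 0]]  # bone translated +2 along x

# A vertex weighted 50/50 between the two bones ends up halfway between them.
print(skin_vertex((1.0, 0.0, 0.0), [(0, 0.5), (1, 0.5)], [identity, moved]))
# (2.0, 0.0, 0.0)

# Work per frame for the sports-game example: one skin_vertex-style evaluation per vertex.
total_vertices = 30 * 10_000
print(total_vertices)
# 300000
```

On the GPU, the body of `skin_vertex` runs once per vertex in the vertex shader, with the bone matrices uploaded as shader constants. That is why a few hundred thousand skinned vertices per frame is a small load next to the millions of pixels the same hardware shades.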