Someone else mentioned the word "culling" which is a key part of the equation here. This is related to those for loops.
If 15,000 objects aren't on your screen, there's no need to loop over them when drawing. You already have a quad tree for physics so you've got the basics of the solution in place already: use a scene partitioning system to efficiently find the subset of objects that are actually visible. If 6 sprites are visible and 15,000 are not, your rendering code should only even try to draw those 6 sprites and entirely ignore the other 15,000. A good scene graph (or even just a quad-tree) helps here a massive amount. You can hypothetically reduce your O(N) rendering to O(log(N)) or less which is a huge win.
One of your for loops then basically goes away. You never need to iterate over the static objects. Never. Both graphics and physics should be using an efficient graph of some kind to find relevant objects. (Note that physics and graphics might use _different_ graphs, because what makes the most sense for physics isn't necessarily true for graphics.)
You can then also heavily reduce the load on your dynamic objects for loop. After all, collision detection is already using an efficient data structure and you'll be changing your graphics to also use a tree of some kind, so the only work for dynamic objects that remains is to apply velocity. Luckily, that work only needs to be done on objects that _have_ a non-zero velocity. You can thus split your dynamic objects into two sets: active and inactive objects. The active objects are the ones that are currently moving.
After that split, you might notice that there's no longer any practical difference between static and dynamic objects, so you can perhaps remove that entire concept from the code. Objects with velocity (or some other forces) are put into the active set and are removed from the active set when they stop moving.
The physics code need only loop over the active objects to apply velocity and forces. The collision system uses an efficient graph. Rendering uses an efficient graph. Unless you (foolishly) make all 15,000 objects active at the same time, you never need to loop over all your objects even once, much less twice.
The next big step then is the batching improvements others have recommended.
The third big step I'd recommend is to use an accelerator like PyPy or whatever's in vogue today. Remember that even the simplest operation like adding two small integers should just be a single ADD instruction on the CPU, but in plain ol' CPython that same operation can take hundreds of CPU instructions, branches, memory accesses, and so on. Python does not translate to CPU code very well and requires a sophisticated JIT engine to do even marginally well in terms of performance, and Python does not include such a JIT engine out of the box like JavaScript engines do, so you have to use an add-on/replacement like PyPy.