Sure, you can use many GPU queues, and then use fences on the GPU side to synchronize them, but that's a lot of extra overhead.
Are there any other options?
You could also pass ownership of your single GPU queue from thread to thread, so that the thread processing the first range of your list performs its own submission, then the thread that owns the second range performs its submission, and so on. IMHO that would be much more complex and require more synchronization work than just having N write-command-list jobs, followed by a single submit job that depends on all of them. The latter should easily fit into any modern engine's job system.
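To make the "N write-command-list jobs, then one submit job" idea concrete, here's a rough sketch. Everything in it is a stand-in: `Command`, `CommandList`, `record_range` and `build_frame` are hypothetical names, the "commands" are just strings, and `std::async` plays the role of the engine's job system. It only shows the dependency structure, not a real graphics API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <string>
#include <vector>

// Hypothetical stand-ins for the engine's real types (not a real graphics API).
using Command = std::string;
using CommandList = std::vector<Command>;

// Recording job: writes commands for items [begin, end) into its own private
// list, so the recording jobs share no mutable state and need no locking.
CommandList record_range(const std::vector<int>& items,
                         std::size_t begin, std::size_t end) {
    CommandList list;
    for (std::size_t i = begin; i < end; ++i)
        list.push_back("draw " + std::to_string(items[i]));
    return list;
}

// Launches job_count recording jobs, then runs the single "submit" step once
// all of them have finished. Returns the commands in submission order.
std::vector<Command> build_frame(const std::vector<int>& items,
                                 std::size_t job_count) {
    assert(job_count > 0);

    // Split the item list into job_count contiguous ranges.
    const std::size_t chunk = (items.size() + job_count - 1) / job_count;

    std::vector<std::future<CommandList>> jobs;
    for (std::size_t j = 0; j < job_count; ++j) {
        const std::size_t begin = std::min(j * chunk, items.size());
        const std::size_t end = std::min(begin + chunk, items.size());
        jobs.push_back(std::async(std::launch::async, [&items, begin, end] {
            return record_range(items, begin, end);
        }));
    }

    // The submit job: future::get() is the dependency on each recording job.
    // Lists are appended in range order, so the order on the single queue
    // stays deterministic no matter which job finished first.
    std::vector<Command> queue;
    for (auto& job : jobs) {
        CommandList list = job.get();
        queue.insert(queue.end(), list.begin(), list.end());
    }
    return queue;
}
```

Calling `build_frame(items, 4)` records four ranges in parallel and returns them in list order. In a real engine the futures would be job-system dependencies, and the final loop would be one submission of all the recorded lists to the queue (D3D12's `ExecuteCommandLists` and Vulkan's `vkQueueSubmit` both accept several command lists/buffers in one call).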