Is cache management manual, like flushing a specific variable?
It depends on the platform. You shouldn't have to mess with it on Windows, for instance. Platforms where it's useful will generally have API functions for managing it. It's only necessary when you have DMA going on and a DMA device may want to use memory that's still cached on the CPU. It's actually quite rare, even in those situations. I think I only ever used it for the texture swizzle function and the texture hue-change function on the aforementioned device. I just had to put a writeback call at the end of those functions and everything worked fine.
Interrupts can be signaled from asm. The instruction is INT on x86. http://pdos.csail.mit.edu/6.828/2004/readings/i386/s02_06.htm
(Edit - If you pick a fight with the thread scheduler the skeleton man will eat you.)
With the virtual memory model, programs are more or less compiled as though they have exclusive access to the machine. The thread scheduler interacts with them from the outside. It's sort of like an emulator save-state: if you have the register state recorded and the memory recorded, you can restore both and resume execution as if there had been no interruption. VMM allows this because it can easily swap out pages for different processes, so each process basically has its own memory within physical memory (or the swap file, depending on memory conditions). Since the stack for a thread is stored in that thread's stack page, the register state can be pushed to that stack and then the stack page can be swapped to that of the next thread and the registers popped. This restores both the memory and the register state of the upcoming thread, so it continues as if nothing had ever happened.
When a PC boots it has a specific location that it looks to, usually on the primary disk volume, to find a startup module. The BIOS handles getting everything out of bed and then does basic loading to get the ball rolling. After that the startup module loads the OS and everything snowballs into place. On other platforms (like REDACTED that I was mentioning) the platform only runs one program at a time and the OS is just the first program to start. When it wants to run a program it places the path of the program in a specific place in memory then does a sort of soft-boot and the indicated program gets control of more or less the whole device.
When you run an exe, the Windows loader does some pretty interesting black magic to map the different parts of the exe file into memory. The 'entry point' is calculated from some values in structs near the beginning of the exe file. It's a pretty horrific mess in there TBH. You can read some about it here, but it may make you want to claw your eyes out. Basically the exe includes information about how big of a stack it needs, how big of a heap, etc. The loader maps in the pages it needs and then just sets eip to the entry point and lets it do what it wants. Sometimes it has to fiddle with some relative addresses within the file image in memory first (relocations), but VMM can usually eliminate that.
Just explain to me how I would, in assembly, instruct the computer to carry out two different threads at once.
How I would say something basic like "do A, B, and C in separate caches. When you've done A and B, do D."
That's more basic concurrency than threading. You'd mean 'cores' or possibly 'fibers' there rather than 'caches'. This is actually a better concurrency model than threading in many cases.
I'd really like to see what you're talking about implemented in C++, though I hear rumors that some compilers have custom features for it.
Starting an actual thread in ASM is no different than doing so in C/C++. You have to invoke the operating system's functions for it. I think what you may want in this situation is just a task queue that feeds into more than one thread for execution, but I suspect that you may not get the results you're looking for. The thing about the Windows thread scheduler is that Windows usually has a pretty large number of threads running all over the place. It's not just the program in the foreground that's getting scheduled. The kernel needs to do work, programs in the background are doing work, etc, etc. So even if you split up work using threads you're not really getting pure concurrency, since those threads will basically execute when they damn well please. Your work won't necessarily be parallelized the way you laid it out, and you'll probably end up waiting on results when you don't need to.
In other words, you may be a little ahead of the state of the art there, but people are working on it. If anyone knows about these rumored compilers that offer keywords for concurrency I'd like to know about them as well.