On which physical cpu/core does a process run?

15 comments, last by hydroo 16 years, 2 months ago
Oh, if you go the modify-the-kernel route, this will help if you intend to add a syscall that returns CPU info to a userspace process:

http://tldp.org/HOWTO/html_single/Implement-Sys-Call-Linux-2.6-i386/

BTW, it looks like a getcpu() vsyscall has been proposed and tested before on x86-64:

http://sourceware.org/ml/libc-alpha/2006-06/msg00024.html
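
On systems where glibc exposes this as sched_getcpu() (the man page lists glibc 2.6 on top of the getcpu syscall from kernel 2.6.19; whether a given machine has it is an assumption, not something this thread confirms), usage from userspace might look like the sketch below. The helper names current_cpu and count_cpu_changes are made up for illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* needed for sched_getcpu() */
#endif
#include <sched.h>

/* CPU the calling thread is running on right now, or -1 on error. */
static int current_cpu(void)
{
    return sched_getcpu();
}

/* Poll the CPU number `samples` times and count how often it changed.
 * Returns -1 if sched_getcpu() fails. */
static int count_cpu_changes(int samples)
{
    int changes = 0;
    int prev = sched_getcpu();
    if (prev < 0)
        return -1;
    for (int i = 0; i < samples; i++) {
        int cur = sched_getcpu();
        if (cur < 0)
            return -1;
        if (cur != prev) {
            changes++;
            prev = cur;
        }
    }
    return changes;
}
```

The value is only a snapshot: the scheduler can move the thread right after the call returns, so it is best treated as "where I was a moment ago".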


[Edited by - TimothyFarrar on January 17, 2008 4:47:20 PM]
_|imothy Farrar :: www.farrarfocus.com/atom
Thanks for your input.

Given that this proposal was made 1.5 years ago and still hasn't been implemented, it probably never will be, right?

Timing is not the issue. We already have a measurement environment. By contract we have to record the process location "somehow". Making a kernel patch/module would be the worst case.
Yeah, a getcpu() syscall will probably never happen in the main branch.

Still, your own personal kernel patch might be the best option. I've done it before (a custom-modified HTTP server running kernel-side), and it's not bad if you are using it for testing only.

If you do go the kernel modification route, I think more recent kernels have a common vdso page that is mapped into every process. This page is read/execute only and supplies the entry/exit for the vsyscall. I'm not sure whether this has been done yet, but one kernel optimization effort was to put a vgettimeofday area into the vdso page so that you could simply read that area to get the time instead of doing an actual syscall. Since the vdso page is always a virtual mapping of the same physical page in the kernel, you might be able to have the scheduler write the pid currently running on each processor in there as well. Then just poll from userspace to find which processor your pid is on.

Also I'm not 100% up to date on glibc2 pthreads anymore, so you might have to check that each thread gets its own pid still. I believe there is a way to create threads (processes on linux) which share pids. So you might need a second number for identification...
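
To check this on a given system, something like the following compares the two numbers. It is a sketch assuming Linux with NPTL threads, where getpid() is shared by all threads of a process and the per-thread id comes from the gettid syscall (called through syscall() here because older glibc versions ship no wrapper for it). thread_id and is_main_thread are hypothetical helper names.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* for syscall() */
#endif
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Kernel-level id of the calling thread (the "second number"). */
static pid_t thread_id(void)
{
    return (pid_t)syscall(SYS_gettid);
}

/* In the initial thread of a process the thread id equals the pid;
 * in any other thread of the same process it differs. */
static int is_main_thread(void)
{
    return thread_id() == getpid();
}
```

If threads do share a pid on your system, thread_id() is the value to record next to the CPU number.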

In theory it should work...

BTW, if you do find/implement a good way of getting current cpu id from userland, please post what you did :)
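
One option that needs no patch, assuming /proc is mounted: proc(5) documents field 39 of /proc/[pid]/stat ("processor") as the CPU the task last executed on. A rough sketch of reading it follows; last_cpu_from_proc is a made-up name, and the value can already be stale by the time it is parsed.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns the CPU number this process last ran on according to
 * /proc/self/stat, or -1 if the file cannot be read or parsed. */
static int last_cpu_from_proc(void)
{
    char buf[2048];
    FILE *f = fopen("/proc/self/stat", "r");
    if (!f)
        return -1;
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* Field 2 (the command name) may contain spaces and parentheses,
     * so start counting after the LAST ')' in the line. */
    char *p = strrchr(buf, ')');
    if (!p)
        return -1;

    int field = 2;                       /* fields 1 and 2 are behind us */
    for (char *tok = strtok(p + 1, " "); tok; tok = strtok(NULL, " ")) {
        field++;
        if (field == 39)                 /* "processor" per proc(5) */
            return atoi(tok);
    }
    return -1;
}
```

/proc/self/stat describes the main thread; for other threads the same field should be available under /proc/self/task/<tid>/stat.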
_|imothy Farrar :: www.farrarfocus.com/atom
Thanks so far.
To lower your enthusiasm a bit:
I have actually never done any kernel hacking. And my priorities at work are totally different (I am working on the visualization part). So I won't be the one to implement it. Chances are that a student will take on this task.
I will keep you updated, but don't expect anything soon.

It's a bit sad, since ordinary (non kernel patching) linux users won't be able to use this feature.
Quote:Original post by hydroo
It's a bit sad, since ordinary (non kernel patching) linux users won't be able to use this feature.


Unless you try to land your patch in the mainline kernel. I don't know what application you're working on exactly, but if knowing the CPU something runs on is required for some kind of QA or certification or something, it might make it in :-)

Sander Marechal

Can you explain why you need to know this? I really think you are on the wrong path in trying to find this information out, because it's simply not useful to know in isolation. So, please, explain why you think you need to know.

Quote:Original post by TimothyFarrar
How often does a process (thread is a process on linux) switch CPUs? My guess is that it happens more often than you would expect.


A thread is a task. A process is a task. A thread is not a process.

For switching, it's actually probably less often. A good task scheduler aims for good task/CPU affinity. That is, it tries very hard to retain execution locality, because running a task on the same CPU it ran last time is always going to be more efficient than running it on another CPU.

Quote:Original post by TimothyFarrar
Also gettimeofday() is a syscall, so you might want to use RDTSC in x86 chips to simply read the CPU clock cycle counter instead (but keep in mind it wraps around quickly...)


It's kinda tricky to use RDTSC properly. For example, on many multicore AMD CPUs and almost all multiprocessor systems, the RDTSC is not synchronized between CPUs, so the results can be all over the place. If you're measuring very small things you're probably safe thanks to task affinity, but otherwise you need to force the task to run only on a single CPU for the duration of the time between two RDTSC instructions or use some other timing method.
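
Forcing the task onto a single CPU for the duration of a measurement can be done with sched_setaffinity(). A minimal sketch, assuming Linux and glibc; pin_to_cpu is a hypothetical helper, not something from this thread.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* for cpu_set_t and the CPU_* macros */
#endif
#include <sched.h>

/* Restrict the calling thread to exactly one CPU.
 * Returns 0 on success, -1 on error (errno is set by the kernel call). */
static int pin_to_cpu(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    /* pid 0 means "the calling thread" */
    return sched_setaffinity(0, sizeof set, &set);
}
```

Save the old mask with sched_getaffinity() first if you want to undo the pinning after the second RDTSC.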

Quote:Original post by TimothyFarrar
// for 32bit machines
typedef signed long long is8;
static inline is8 VCycle(void) { is8 x;
asm volatile("rdtsc\n\tmov %%edx, %%ecx\n\t" :"=A" (x)); return x; }

// or for when running in 64bit mode
typedef unsigned int iu4;
typedef signed long is8;
static inline is8 VCycle(void) { iu4 aa,dd;
asm volatile("rdtsc" : "=a" (aa), "=d" (dd));
return ((is8)aa)|(((is8)dd)<<32); }


This is super, super wrong. Let me count the ways:

1. RDTSC does not provide accurate measurements without a serializing instruction beforehand. CPUID is usually a good choice for this.
2. If the code was written properly you wouldn't need separate implementations for x86 and x86_64.
3. The "32-bit" version silently clobbers ECX, which will cause very subtle bugs. You need to tell the compiler that you are clobbering registers.
4. The EDX->ECX move would be unnecessary if the appropriate asm constraints had been used. See also #2.
5. The typedefs are unnecessary.
6. The result of RDTSC is totally not a signed value.
7. The return in the "64-bit" version appears to be the result of an explosion at the parens factory.
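
Putting those points together, a corrected reader might look roughly like this. It is a sketch only, assuming GCC or Clang on x86 or x86_64; read_tsc is a made-up name, and CPUID leaf 0 is used purely as the serializing instruction from point 1.

```c
#include <stdint.h>

/* Serialize with CPUID, then read the 64-bit time stamp counter.
 * The same code compiles for 32-bit and 64-bit x86. */
static inline uint64_t read_tsc(void)
{
    uint32_t lo, hi, b, c;
    /* CPUID overwrites EAX, EBX, ECX and EDX. All four are declared as
     * outputs, so the compiler knows they change and nothing is
     * clobbered silently. RDTSC then leaves the low half in EAX and
     * the high half in EDX. */
    __asm__ __volatile__(
        "cpuid\n\t"
        "rdtsc"
        : "=a"(lo), "=d"(hi), "=b"(b), "=c"(c)
        : "a"(0)
        : "memory");
    (void)b;
    (void)c;
    return ((uint64_t)hi << 32) | lo;
}
```

Take the difference of two read_tsc() calls as an unsigned 64-bit value. Very old GCC versions may refuse the EBX constraint when building 32-bit position-independent code; in that case EBX has to be saved and restored by hand.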
My rating perfectly reflects the pathetic yes-men in-crowd attitude of this forum.
Quote:Original post by truthsayer
Can you explain why you need to know this? I really think you are on the wrong path in trying to find this information out, because it's simply not useful to know in isolation. So, please, explain why you think you need to know.


just in case you are still reading:
"It might be interesting for the user." - plain

You can analyze the connection between the topology of your system and the topology of your software (MPI + threading, possibly nested). This kind of information is especially valuable on clustered SMP machines.

This topic is closed to new replies.
