hydroo

On which physical cpu/core does a process run?


Is there a way to find that out? I need this for performance evaluation. I found out that it might be task_struct->oncpu, which unfortunately is only accessible inside the kernel :(

Quote:
Original post by hydroo
Is there a way to find that out?
I need this for performance evaluation.

I found out that it might be task_struct->oncpu, which unfortunately is only accessible inside the kernel :(


A multithreaded process may run on any or all cores at the same time. There is no reliable way to find out from outside the kernel, because the query requires a context switch into the kernel, so (a) the process may not even be running when the query is made and (b) the answer may already be out of date by the time the kernel call returns.

It's not that important for the timing to be 100% correct, by the way.
I'd give it a try and see how accurate it is.

Or can I get notified when a process is moved to another CPU's queue?

On Windows you can request that a thread run on a particular processor/core (SetThreadAffinityMask(), I believe). I assume you can do something similar on the other major operating systems. However, in general it is a bad idea, as the OS knows a lot better than you how to schedule threads/processes.
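On Linux, the comparable call is sched_setaffinity() from <sched.h> (glibc exposes it under _GNU_SOURCE). A minimal sketch, assuming Linux with glibc; the helper name pin_to_first_allowed_cpu is made up for illustration:

```c
#define _GNU_SOURCE
#include <sched.h>

/* Hypothetical helper: pin the calling thread to the lowest-numbered CPU
 * it is currently allowed to run on.
 * Returns that CPU number, or -1 on failure. */
int pin_to_first_allowed_cpu(void)
{
    cpu_set_t allowed;

    /* pid 0 means "the calling thread". */
    if (sched_getaffinity(0, sizeof allowed, &allowed) != 0)
        return -1;

    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &allowed)) {
            cpu_set_t one;
            CPU_ZERO(&one);
            CPU_SET(cpu, &one);
            if (sched_setaffinity(0, sizeof one, &one) != 0)
                return -1;
            return cpu;
        }
    }
    return -1;
}
```

Once a thread is pinned like this, the question of where it runs has a fixed answer, at the cost of overriding the scheduler's own placement decisions.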

I am pretty sure that you can find out where it is, because even when a process is not running it's still in the queue of its CPU.
And as I said, it's not the goal to be 1ms-correct.

thanks for your replies
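For reference, the proc(5) man page documents a "processor" entry (field 39) in /proc/[pid]/stat, described as the CPU number the task last executed on. That is a snapshot and may be stale by the time it is read, which fits the "not 1ms-correct" requirement. A minimal sketch, assuming a Linux system with /proc mounted; the helper names last_cpu_of and my_last_cpu are made up for illustration:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: CPU number that task `pid` last ran on, taken from
 * field 39 ("processor") of /proc/<pid>/stat. Returns -1 on error. */
int last_cpu_of(long pid)
{
    char path[64];
    char buf[4096];

    snprintf(path, sizeof path, "/proc/%ld/stat", pid);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* Field 2 (the command name) is parenthesised and may itself contain
     * spaces or parentheses, so start counting after the LAST ')'. */
    char *p = strrchr(buf, ')');
    if (p == NULL || p[1] != ' ')
        return -1;
    p++;                      /* p is now the space in front of field 3 */

    /* Advance from the space before field 3 to the space before field 39. */
    for (int field = 3; field < 39; field++) {
        p = strchr(p + 1, ' ');
        if (p == NULL)
            return -1;
    }
    return atoi(p + 1);
}

/* Convenience wrapper for the calling process. */
int my_last_cpu(void)
{
    return last_cpu_of((long)getpid());
}
```

For a multithreaded process this reports the main thread only; the per-thread files under /proc/<pid>/task/<tid>/stat use the same layout.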

Quote:
Original post by hydroo
I am pretty sure that you can find out where it is, because even when a process is not running it's still in the queue of its CPU.


Is it? Or is it in the ready queue, not assigned to a CPU until it's running? If that's the case, no matter how often you look, your process will never appear to be getting any CPU at all on a single-processor system, and chances are it won't appear to be running on a multiprocessor system either.

Your best bet is to read the Linux kernel code to understand how the scheduler works.

Quote:
Original post by Bregma
Quote:
Original post by hydroo
I am pretty sure that you can find out where it is, because even when a process is not running it's still in the queue of its CPU.


Is it? Or is it in the ready queue, not assigned to a CPU until it's running? If that's the case, no matter how often you look, your process will never appear to be getting any CPU at all on a single-processor system, and chances are it won't appear to be running on a multiprocessor system either.

Your best bet is to read the Linux kernel code to understand how the scheduler works.


I am pretty sure that there is no central queue (I read about the O(1) scheduler; I don't know how CFS handles things). A process will most likely be scheduled on the CPU it ran on before, for example to exploit locality. Only under certain circumstances will it be moved to another CPU.
This wouldn't be possible without some record of where a process has been, would it?


So I guess there is no easy way to do it.

Probably one of the best questions on this forum in a while.

I don't know of any easy way to do this. You could try to modify the kernel to log the information (the volume of which is going to be tremendous, BTW, and will slow everything down).

Doing this from user space in your app would get messy.

How often does a process (a thread is a process on Linux) switch CPUs? My guess is that it happens more often than you would expect.

You could keep track of the gettimeofday() time at common thread entry/blocking points (after IO blocking, a mutex, etc.). That won't tell you which CPU you are on, but it will give you an idea of what really matters, i.e. how the threads are interacting and blocking.

Also, gettimeofday() is a syscall, so you might want to use RDTSC on x86 chips to simply read the CPU clock cycle counter instead (but keep in mind it wraps around quickly...):


// for 32bit machines
typedef signed long long is8;
static inline is8 VCycle(void) { is8 x;
asm volatile("rdtsc\n\tmov %%edx, %%ecx\n\t" :"=A" (x)); return x; }

// or for when running in 64bit mode
typedef unsigned int iu4;
typedef signed long is8;
static inline is8 VCycle(void) { iu4 aa,dd;
asm volatile("rdtsc" : "=a" (aa), "=d" (dd));
return ((is8)aa)|(((is8)dd)<<32); }



BTW, I know the 64-bit one works because I use it all the time; I forget whether I tested the 32-bit version!

Oh, if you go the kernel-modification route, this will help if you intend to add a syscall that returns CPU info to a userspace process:

http://tldp.org/HOWTO/html_single/Implement-Sys-Call-Linux-2.6-i386/

BTW, looks like a getcpu() vsyscall has been proposed and tested before on x86-64

http://sourceware.org/ml/libc-alpha/2006-06/msg00024.html


[Edited by - TimothyFarrar on January 17, 2008 4:47:20 PM]

Thanks for your input.

Given that this proposal was made 1.5 years ago and hasn't been implemented yet, it will not be implemented, right?

Timing is not the issue. We already have a measurement environment. By contract we have to record the process location "somehow". Making a kernel patch/module would be the worst case.

Yeah, a getcpu() syscall will probably never happen in the main branch.

Still your own personal kernel patch might be the best option. I've done it before (custom modified http server running kernel side), and it's not bad if you are using it for testing only.

If you do go the kernel modification route, I think on more recent kernels there is a common vDSO page which is mapped into every process. This page is read/execute only and supplies the entry/exit for the vsyscall. I'm not sure if this has been done yet, but one kernel optimization effort was to get a vgettimeofday area into the vDSO page, so that you could simply read that area to get the time of day instead of doing an actual syscall. Since the vDSO page is always virtually mapped to the same physical page in the kernel, you might be able to write pids in there as well (from inside the scheduler), one slot per processor holding the pid currently running on it. Then just poll from userspace to find which processor your current pid is on.

Also I'm not 100% up to date on glibc2 pthreads anymore, so you might have to check that each thread gets its own pid still. I believe there is a way to create threads (processes on linux) which share pids. So you might need a second number for identification...

In theory it should work...

BTW, if you do find/implement a good way of getting current cpu id from userland, please post what you did :)
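For reference, the getcpu(2) man page states that a getcpu() system call was added in Linux 2.6.19, and glibc has offered a sched_getcpu() wrapper since version 2.6, so on reasonably current systems no kernel patch should be needed. A minimal sketch, assuming Linux with glibc; the helper name report_cpu is made up for illustration:

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

/* Hypothetical helper: print and return the CPU the calling thread is
 * running on at the moment of the call. The value can be stale as soon as
 * it is returned, because the scheduler may migrate the thread at any time.
 * Returns -1 if sched_getcpu() fails. */
int report_cpu(const char *label)
{
    int cpu = sched_getcpu();
    if (cpu < 0) {
        perror("sched_getcpu");
        return -1;
    }
    printf("%s: running on CPU %d\n", label, cpu);
    return cpu;
}
```

Calling report_cpu("worker") at the same entry/blocking points where timestamps are already recorded would give a rough per-event location trace.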

Thanks so far.
To lower your enthusiasm a bit:
I have actually never done any kernel hacking, and my priorities at work are totally different (I am working on the visualization part). So I won't be the one to implement it. Chances are that a student will take on this task.
I will keep you updated, but don't expect anything soon.

It's a bit sad, since ordinary (non kernel patching) linux users won't be able to use this feature.

Quote:
Original post by hydroo
It's a bit sad, since ordinary (non kernel patching) linux users won't be able to use this feature.


Unless you try to land your patch in the mainline kernel. I don't know what application you're working on exactly, but if knowing the CPU something runs on is required for some kind of QA or certification or something, it might make it in :-)

Can you explain why you need to know this? I really think you are on the wrong path in trying to find this information out, because it's simply not useful to know in isolation. So, please, explain why you think you need to know.

Quote:
Original post by TimothyFarrar
How often does a process (a thread is a process on Linux) switch CPUs? My guess is that it happens more often than you would expect.


A thread is a task. A process is a task. A thread is not a process.

For switching, it's actually probably less often. A good task scheduler aims for good task/CPU affinity. That is, it tries very hard to retain execution locality, because running a task on the same CPU it ran last time is always going to be more efficient than running it on another CPU.
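How often migrations really happen can be estimated empirically by polling the current CPU and counting changes. A rough sketch, assuming Linux with glibc's sched_getcpu(); the function name count_cpu_changes is made up, and the result is only a lower bound because migrations between two samples go unnoticed:

```c
#define _GNU_SOURCE
#include <sched.h>

/* Hypothetical helper: sample the current CPU `samples` times and count how
 * often it differs from the previous sample.
 * Returns the number of observed changes, or -1 if sched_getcpu() fails. */
int count_cpu_changes(long samples)
{
    int changes = 0;
    int prev = sched_getcpu();
    if (prev < 0)
        return -1;

    for (long i = 0; i < samples; i++) {
        int cur = sched_getcpu();
        if (cur < 0)
            return -1;
        if (cur != prev) {
            changes++;
            prev = cur;
        }
    }
    return changes;
}
```

On an otherwise idle machine one would expect the count to stay small for a busy loop like this, which is what the affinity argument above predicts.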

Quote:
Original post by TimothyFarrar
Also gettimeofday() is a syscall, so you might want to use RDTSC in x86 chips to simply read the CPU clock cycle counter instead (but keep in mind it wraps around quickly...)


It's kinda tricky to use RDTSC properly. For example, on many multicore AMD CPUs and almost all multiprocessor systems, the TSC is not synchronized between CPUs, so the results can be all over the place. If you're measuring very small things you're probably safe thanks to task affinity, but otherwise you need to force the task to run only on a single CPU for the duration of the time between two RDTSC instructions, or use some other timing method.

Quote:
Original post by TimothyFarrar
// for 32bit machines
typedef signed long long is8;
static inline is8 VCycle(void) { is8 x;
asm volatile("rdtsc\n\tmov %%edx, %%ecx\n\t" :"=A" (x)); return x; }

// or for when running in 64bit mode
typedef unsigned int iu4;
typedef signed long is8;
static inline is8 VCycle(void) { iu4 aa,dd;
asm volatile("rdtsc" : "=a" (aa), "=d" (dd));
return ((is8)aa)|(((is8)dd)<<32); }


This is super, super wrong. Let me count the ways:

1. RDTSC does not provide accurate measurements without a serializing instruction beforehand. CPUID is usually a good choice for this.
2. If the code was written properly you wouldn't need separate implementations for x86 and x86_64.
3. The "32-bit" version silently clobbers ECX, which will cause very subtle bugs. You need to tell the compiler that you are clobbering registers.
4. The EDX->ECX move would be unnecessary if the appropriate asm constraints had been used. See also #2.
5. The typedefs are unnecessary.
6. The result of RDTSC is totally not a signed value.
7. The return in the "64-bit" version appears to be the result of an explosion at the parens factory.
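A version that addresses the points above might look like the following. This is a sketch only, assuming GCC or Clang extended inline asm on x86 or x86-64; the function name read_tsc is made up. CPUID is used as the serializing instruction, as suggested in point 1, and everything it overwrites is declared to the compiler.

```c
#include <stdint.h>

/* Hypothetical helper: read the 64-bit time stamp counter.
 * The same code compiles for 32-bit and 64-bit x86, because the result is
 * assembled in C from the two 32-bit halves that RDTSC leaves in EAX/EDX. */
static inline uint64_t read_tsc(void)
{
    uint32_t lo, hi;

    __asm__ __volatile__(
        "cpuid\n\t"              /* serialize: earlier instructions finish first */
        "rdtsc"                  /* EDX:EAX = time stamp counter */
        : "=a"(lo), "=d"(hi)     /* outputs: EAX and EDX */
        : "a"(0)                 /* input: CPUID leaf 0 */
        : "ebx", "ecx", "memory" /* CPUID also overwrites EBX and ECX */
    );
    return ((uint64_t)hi << 32) | lo;
}
```

CPUID is slow, so the serialization overhead itself shows up when timing very short code sequences; the caveat about unsynchronized counters on different CPUs still applies.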

Quote:
Original post by truthsayer
Can you explain why you need to know this? I really think you are on the wrong path in trying to find this information out, because it's simply not useful to know in isolation. So, please, explain why you think you need to know.


Just in case you are still reading:
"It might be interesting for the user." - plain

You can analyze the relationship between the topology of your system and the topology of your software (MPI + threading, possibly nested). Especially on clustered SMP machines, this kind of information is valuable.
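For this kind of topology analysis the NUMA node can be as interesting as the CPU number. The Linux getcpu system call reports both. A minimal sketch, assuming a Linux kernel that provides SYS_getcpu and calling it through the generic syscall() wrapper; the function name cpu_and_node is made up for illustration:

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: store the CPU and NUMA node the calling thread is
 * currently running on. Returns 0 on success, -1 on failure (errno set).
 * The third argument of the raw system call is unused and passed as NULL. */
int cpu_and_node(unsigned *cpu, unsigned *node)
{
    return (int)syscall(SYS_getcpu, cpu, node, NULL);
}
```

Each MPI rank or thread could log this pair next to its rank/thread id, which gives the software-to-hardware mapping described above, as a snapshot.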
