• ### Popular Now

• 10
• 12
• 12
• 14
• 15

This topic is 634 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

## Recommended Posts

Hi guys,

If I have large array that goes through a heavy computational process, is it safe to have threads work on different sections of this array without locking it.

For example have one thread work on the top half and another work on the bottom half. As long as I can indicate to the main program that both threads have completed processing, there shouldn't be any need to lock the array. Right?

##### Share on other sites

Interesting.

So you'd be better off copying the array to separate arrays to ensure this doesn't happen?

Something like this?

|            MainArray[1024]             |


And then merge the two after calculation has completed again?

Edited by DarkRonin

##### Share on other sites

Copying the array means either one of two things:
A) Your array is so small that copying is cheap; if it's that small, you shouldn't be multithreading it anyway.
or B) The array is large, but you're taking the easy way out instead of understanding the problem (which sometimes is the economic thing to do, if you don't have time to waste doing the research).

If you are going to take the easy way out, the easiest way is leaving it singlethreaded. But if you are wanting to multithread it, you can do so without splitting up the array.

Another easy and safe way to split up workloads is to use a single thread and simply do half the array on odd frames, and the other half on even frames. Depending on the nature of your problem.

Generally, yes it will be ok... but technically it's unsafe, and you need to be aware of the actual compiler/hardware behavior to justify it as being safe to do.

To get into specifics... if you're compiling for a PC (x86 / x86-64), then the CPU can atomically write 4-byte sized variables.
If the elements within your array have sizeof/alignof > 4, then you'll be ok.

If the elements within your array have sizeof < 4, then you're in trouble
e.g. say you've got an array of six unsigned short's, shared by two threads:

| thread 1  | thread 2  |
|u16|u16|u16|u16|u16|u16|
|block 1|block 2|block 3|
As far as the CPU is concerned, it can only read or write to these 4-byte blocks, labelled block 1/2/3 above.
So if thread#1 tries to write to index [2] and thread 2 tries to write to index [3] at the same time, then both threads will be performing a read-modify-write operation on "block 2", which is a race condition.

You're talking about making each variable word-aligned. But isn't another issue ensuring that two threads aren't operating on the same cache-line at the same time as well?

I'm an noob when it comes to multi-threading (i.e. I read about it but have never played with it), so correct me if I'm wrong here, but shouldn't each work-load be cache-aligned if there is a chance that at least one of the two threads will write something to the same cache that the other thread wants to read from, or the memory will have to be repeatedly shuffled from CoreA's L1 up to L2, L3, or even main RAM, and then back then to CoreB's L1, if the hardware architecture means they aren't sharing the same cache (e.g. if the two threads aren't hyperthreads on the same core).

Wouldn't that shoot your performance in the foot? i.e. if your elements aren't word-aligned, you can have undefined behavior. But even if word-aligned elements, if your thread workloads aren't cache-aligned, you might have unexpected and hard to detect slowdowns. I think.  :ph34r:

##### Share on other sites

Copying the array means either one of two things:
A) Your array is so small that copying is cheap; if it's that small, you shouldn't be multithreading it anyway.

Seems a bit general. Wouldn't this depend on what the calculations are that I am doing on the data?

##### Share on other sites

You're talking about making each variable word-aligned. But isn't another issue ensuring that two threads aren't operating on the same cache-line at the same time as well?

Wouldn't that shoot your performance in the foot?

Yes

- making the data word-aligned affects program correctness (avoiding race conditions).

- making the data cache-line-aligned affects program performance (false-sharing could be a performance pitfall).

I didn't mention it because it's a performance optimization issue, not an "is it safe" issue :)

Generally, if your array is large enough and both threads are advancing linearly, by the time that thread#1 gets to the end of its range, thread#2 will also be near the end of its range, so it's very unlikely that both threads will write to this shared cache line at the same time anyway.

##### Share on other sites

Copying the array means either one of two things:
A) Your array is so small that copying is cheap; if it's that small, you shouldn't be multithreading it anyway.

Seems a bit general. Wouldn't this depend on what the calculations are that I am doing on the data?

True; I was assuming small element sizes since we were already discussing word-alignment.  :)

##### Share on other sites
Doing a quick read up on cache lines, it appears that Intel architecture cache lines are 64 byte. So if the data is divisible by 64 like a 1024 byte array, this should avoid performance hits shouldn't it?

##### Share on other sites

Copying the array means either one of two things: A) Your array is so small that copying is cheap; if it's that small, you shouldn't be multithreading it anyway. or B) The array is large, but you're taking the easy way out instead of understanding the problem (which sometimes is the economic thing to do, if you don't have time to waste doing the research).

Or the array is small and cheap to copy, but the computation you're performing is expensive and makes sense to spread over multiple threads...

##### Share on other sites

Doing a quick read up on cache lines, it appears that Intel architecture cache lines are 64 byte. So if the data is divisible by 64 like a 1024 byte array, this should avoid performance hits shouldn't it?

Under the assumption that your data actually starts at a cache-line boundary, I think so.

struct { char x;  char manyx[1024]; }