This is a fairly general question about a situation that comes up often: an array is populated by computing its entries in a parallelised loop. What is the best way to prevent the slowdown caused by cache line contention? What I have done in the past is to make each process operate on its own contiguous range of indices. For example, with 4 processors and an array A(4N), process k populates A(i) for i = (k-1)*N+1, ..., k*N. Is this the best method?
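To make the block scheme above concrete, here is a minimal sketch in C. It is only an illustration under stated assumptions: the names block_range, block_for and boundary_shares_line are hypothetical, the 64-byte cache line is a common value rather than a guarantee, and the array is assumed to start on a cache line boundary. Indices are 1-based to match the A(i) notation.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed cache line size in bytes; 64 is common but hardware-dependent. */
#define CACHE_LINE_BYTES 64

/* 1-based, inclusive index range; empty when first > last. */
typedef struct {
    size_t first;
    size_t last;
} block_range;

/* Contiguous block of an n-element array assigned to process k (1-based)
 * out of nproc. When nproc does not divide n, the first (n % nproc)
 * processes each take one extra element. For n = 4N and nproc = 4 this
 * gives first = (k-1)*N + 1 and last = k*N. */
block_range block_for(size_t k, size_t nproc, size_t n)
{
    assert(nproc > 0 && k >= 1 && k <= nproc);
    size_t base  = n / nproc;
    size_t extra = n % nproc;
    size_t km1   = k - 1;
    size_t start = km1 * base + (km1 < extra ? km1 : extra); /* 0-based offset */
    size_t len   = base + (km1 < extra ? 1 : 0);
    block_range r = { start + 1, start + len };
    return r;
}

/* Returns 1 if the element at 1-based index `last` and the element after it
 * lie in the same cache line, assuming the array starts on a cache line
 * boundary. Such a line is written by two neighbouring processes. */
int boundary_shares_line(size_t last, size_t elem_bytes)
{
    return (last * elem_bytes) % CACHE_LINE_BYTES != 0;
}
```

With contiguous blocks, two processes can only write to the same cache line at the nproc-1 block boundaries. For example, with 8-byte elements, blocks of 100 elements end mid-line (800 is not a multiple of 64), while blocks of 104 elements end exactly on a line boundary, so under the assumptions above no line is written by more than one process.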
Thanks
Gib