[Novalug] balancing the load on a 4-processor system

Nino R. Pereira ninorpereira@gmail.com
Tue Jul 24 21:00:49 EDT 2012


List,

Here's something that baffles me. Can you give me some insight?

I run a Fortran program written for parallel execution with MPI.
How it does that in detail I don't know, but it's supposed to
distribute the computations over 4 processors on my system, and over
64 or 4096 or however many are available on other systems.

It used to work that way, maybe a year or so ago, but no longer.

Now, with ps -aux, I see:

pereira  18066 93.2  1.1 139324 42736 pts/1    R    18:04 146:50 accept
pereira  18067 62.0  1.1 139584 43036 pts/1    R    18:04  97:38 accept
pereira  18068 79.8  1.1 139584 42956 pts/1    R    18:04 125:44 accept
pereira  18069 62.7  0.8 139928 32316 pts/1    R    18:04  98:49 accept

Apparently, two processors are working hard while the others are slacking.

'top' shows that this is indeed so, but not the 'why':

18066 pereira   20   0  136m  41m 2588 R  100  1.1 157:04.69 accept
18069 pereira   20   0  136m  31m 3552 R  100  0.9 109:08.93 accept
18067 pereira   20   0  136m  42m 2896 R   50  1.1 102:51.41 accept
18068 pereira   20   0  136m  41m 2812 R   50  1.1 130:58.56 accept
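A quick sanity check on the top numbers, as a sketch: in top's default mode 100% means one fully busy CPU, so adding up the %CPU column (the 9th field in the lines above) shows how many CPUs' worth of work the four processes are getting at that moment. The ps %CPU column is a different measure: it is CPU time divided by the time the process has been running, i.e. a lifetime average. The helper name `cpu_percent_by_pid` is made up for illustration.

```python
# Sum top's %CPU column for the four 'accept' processes quoted above.
TOP_LINES = """\
18066 pereira   20   0  136m  41m 2588 R  100  1.1 157:04.69 accept
18069 pereira   20   0  136m  31m 3552 R  100  0.9 109:08.93 accept
18067 pereira   20   0  136m  42m 2896 R   50  1.1 102:51.41 accept
18068 pereira   20   0  136m  41m 2812 R   50  1.1 130:58.56 accept
"""


def cpu_percent_by_pid(text):
    """Map PID -> %CPU from top's default columns
    (PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND)."""
    usage = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        usage[int(fields[0])] = float(fields[8])  # index 8 is %CPU
    return usage


if __name__ == "__main__":
    usage = cpu_percent_by_pid(TOP_LINES)
    total = sum(usage.values())
    print(f"total %CPU: {total:.0f} (= {total / 100:.1f} CPUs' worth)")
```

For this snapshot the total is 300, i.e. three CPUs' worth spread over four processes. Two processes at 50% each would be consistent with them sharing a single CPU, but that is an inference from the numbers, not something top reports directly.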

Are the ones carrying PIDs 18067 and ..8 old and tired? Or too hot?
Should I blow the dust off the fan so that they stay cooler?
Any idea what's going on? And how would you find out?
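One way to start on "how would you find out?", assuming Linux: check which CPU each process last ran on and which CPUs it is allowed to run on. `ps -o pid,psr,pcpu,comm` shows the former in its PSR column and `taskset -p PID` shows the latter. The sketch below reads the same information directly: field 39 of /proc/PID/stat is the CPU the process last executed on, and Python's `os.sched_getaffinity()` returns the set of allowed CPUs. The function names are hypothetical.

```python
import os
import sys


def last_cpu_from_stat(stat_text):
    """Return the 'processor' field (field 39) of a /proc/PID/stat line.

    The command name (field 2) is wrapped in parentheses and may itself
    contain spaces or parentheses, so split after the LAST ')'.
    """
    after_comm = stat_text.rsplit(")", 1)[1].split()
    # after_comm[0] is field 3 (state), so field 39 sits at index 36.
    return int(after_comm[36])


def report(pids):
    """Print last-used CPU and allowed CPU set for each PID (Linux only)."""
    for pid in pids:
        try:
            with open(f"/proc/{pid}/stat") as f:
                cpu = last_cpu_from_stat(f.read())
            allowed = sorted(os.sched_getaffinity(pid))
        except (FileNotFoundError, ProcessLookupError):
            print(f"{pid}: no such process")
            continue
        print(f"{pid}: last ran on CPU {cpu}, allowed CPUs {allowed}")


if __name__ == "__main__":
    report(int(arg) for arg in sys.argv[1:])
```

Run it a few times with the four PIDs, e.g. `python3 whichcpu.py 18066 18067 18068 18069` (script name hypothetical). If two of them keep reporting the same CPU, or their allowed sets are narrower than all four CPUs, that would point toward process placement or CPU affinity as the thing to look into.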

  Thank you,

Nino
