[Novalug] balancing the load on a 4-processor system
Nino R. Pereira
ninorpereira@gmail.com
Tue Jul 24 21:00:49 EDT 2012
List,
here's something that baffles me. Can you give insight?
I run a Fortran program written for parallel execution with MPI.
How it does that in detail I don't know, but it's supposed to
distribute the computations: on my system over 4 processors, and on
other systems over 64 or 4096 or however many are available.
It used to work this way, maybe a year ago or so, but no longer.
Now, with 'ps -aux', I see:
pereira 18066 93.2 1.1 139324 42736 pts/1 R 18:04 146:50 accept
pereira 18067 62.0 1.1 139584 43036 pts/1 R 18:04 97:38 accept
pereira 18068 79.8 1.1 139584 42956 pts/1 R 18:04 125:44 accept
pereira 18069 62.7 0.8 139928 32316 pts/1 R 18:04 98:49 accept
Apparently, two processors are working hard while the others are slackers.
'top' shows that this is indeed so, but not the 'why':
18066 pereira 20 0 136m 41m 2588 R 100 1.1 157:04.69 accept
18069 pereira 20 0 136m 31m 3552 R 100 0.9 109:08.93 accept
18067 pereira 20 0 136m 42m 2896 R 50 1.1 102:51.41 accept
18068 pereira 20 0 136m 41m 2812 R 50 1.1 130:58.56 accept
Are the ones carrying PID 18067 and ..8 old and tired? Or too hot?
Should I blow away the dust on the fan so that they remain cooler?
Any idea what's going on? And, how would you find out?
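One possible way to find out, sketched here under assumptions rather than as a definitive diagnosis: sample each process's CPU time over a short interval, and note which core it last ran on and which cores it is allowed to run on. The sketch assumes Linux with /proc mounted and Python 3; the function names parse_stat, cpu_percent and sample are made up for this illustration. If two processes keep reporting the same last core, or share a narrow set of allowed cores, that might hint that they are competing for one core, though other causes are possible.

```python
import os
import time


def parse_stat(text):
    """Return (utime + stime in clock ticks, last CPU) from /proc/<pid>/stat text."""
    # The command name is in parentheses and may contain spaces, so cut
    # at the last ')' before splitting the remaining fields.
    rest = text[text.rindex(")") + 2:].split()
    # rest[0] is field 3 (state); utime is field 14, stime is field 15,
    # and the CPU the process last ran on is field 39.
    utime, stime = int(rest[11]), int(rest[12])
    last_cpu = int(rest[36])
    return utime + stime, last_cpu


def cpu_percent(ticks_before, ticks_after, interval, ticks_per_second):
    """CPU usage over the interval, as a percentage of one core."""
    return 100.0 * (ticks_after - ticks_before) / (ticks_per_second * interval)


def sample(pids, interval=1.0):
    """Return one (pid, %cpu, last_cpu, allowed_cpus) row per process."""
    hz = os.sysconf("SC_CLK_TCK")  # clock ticks per second

    def read(pid):
        with open(f"/proc/{pid}/stat") as f:
            return parse_stat(f.read())

    before = {pid: read(pid) for pid in pids}
    time.sleep(interval)
    rows = []
    for pid in pids:
        ticks, last_cpu = read(pid)
        pct = cpu_percent(before[pid][0], ticks, interval, hz)
        # sched_getaffinity is Linux-only: the set of cores the
        # scheduler may place this process on.
        allowed = sorted(os.sched_getaffinity(pid))
        rows.append((pid, pct, last_cpu, allowed))
    return rows
```

With the four PIDs from the listings, a call such as sample([18066, 18067, 18068, 18069]) would return one row per process; repeating it a few times shows whether the processes move between cores or stay put.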
Thank you,
Nino