[Novalug] for Peter Larsen, with thanks

Peter Larsen plarsen@famlarsen.homelinux.com
Mon Jul 30 12:12:26 EDT 2012


On Sat, 2012-07-28 at 13:03 -0400, Nino R. Pereira wrote: 
> Delivery to the following recipient failed permanently:
> 
>       plarsen@famlarsen.homelinux.com
> To: Peter Larsen<plarsen@famlarsen.homelinux.com>
> 
> Peter,
> 
> thank you for your reaction.
> >  First off - I'm pleased to learn that Fortran is multi-threaded. That's
> >  really news to me :)
> It's been like that for quite some time. I slowly moved away from
> computations in the mid-1980s, when multi-processing wasn't yet
> in the picture (as far as I knew), but I've always understood that the
> various clusters (Beowulf?) were originally intended for heavy-duty
> computing such as is normally done with Fortran codes.
> 
> I've completely skipped this transition, to the point that I don't
> really know how this all works. So, I'm not sure if my answers
> to your questions make sense, and worse, if I can understand any
> of your reactions. But, given that, here it goes.
> 
> What kind of hardware do you have? Number of sockets, physical cores and
> hyperthreaded cores?
> While HT gives you the illusion of more cores - they're not true cores.
> In some cases they will perform worse than having true different cores -
> and heavy computations would do that. I wonder if your thread system
> knows that and simply doesn't use HT cores.
> 
> cat /proc/cpuinfo gives a lot of stuff that I don't want to send to the list
> (that's why I reply only to you, but, if you think others can profit
> from your reactions, please copy the list):

You should see more than one processor listed in the output. Look for
the following attributes:

> physical id    : 0
> siblings    : 3
> core id        : 0
> cpu cores    : 3
> apicid        : 0
> initial apicid    : 0
> flags        : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
> cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt
> pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nonstop_tsc
> extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic
> cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt

Your flags include "ht" (HyperThreading), which would normally suggest
that some of your "processors" are not physically there - they're
"virtual". The flag alone doesn't prove that, though: it only says the
CPU reports the capability, and multi-core chips set it even when no
hyperthreading is in use. What confused me at first is "cpu cores : 3" -
I've never seen an odd number there. Since it matches "siblings" I guess
it's not a paste error, and matching numbers usually mean that every
listed CPU is a real core. It may help if you pasted the full list here.
There's nothing compromising in the list at all.

"cpu cores" indicates how many physical cores are in the package.
"siblings" is the number of logical CPUs in that same package, and
"physical id" identifies the physical CPU (socket). You'll see the core
id count up for each listed "cpu"; with three cores it should normally
end on 2 (0-2 = 3). On some systems you'll find that the count of cores
says 2 but you see 4 CPUs listed. Short and sweet - the extra two are
"HT" or logical/virtual CPUs. In normal usage, you shouldn't notice the
difference at all. But since the logical CPUs share the execution units
of a physical core, you'll find that certain tasks run slower on them,
sometimes even slower than they would on a single processor. Heavy FPU
computations are a good candidate for that.
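
A rough sketch of that check in Python, for anyone who would rather not
count by eye. It assumes the x86 layout of /proc/cpuinfo (one
blank-line-separated entry per logical CPU, with "physical id" and
"core id" fields). The names summarize_cpuinfo and SAMPLE are made up
for this example, and SAMPLE is invented data, not anyone's real output.

```python
from collections import defaultdict


def summarize_cpuinfo(text):
    """Count logical CPUs and distinct physical cores per socket."""
    sockets = defaultdict(lambda: {"logical": 0, "cores": set()})
    entry = {}
    # /proc/cpuinfo separates the per-CPU entries with blank lines;
    # the extra "" flushes the last entry.
    for line in text.splitlines() + [""]:
        if line.strip():
            key, _, value = line.partition(":")
            entry[key.strip()] = value.strip()
        elif entry:
            socket = sockets[entry.get("physical id", "0")]
            socket["logical"] += 1
            # Two entries with the same core id share one physical core.
            socket["cores"].add(entry.get("core id", entry.get("processor")))
            entry = {}
    return {
        pid: {"logical": s["logical"], "physical": len(s["cores"])}
        for pid, s in sockets.items()
    }


# Made-up sample: one socket, two cores, two logical CPUs per core.
SAMPLE = "\n\n".join(
    f"processor : {n}\nphysical id : 0\ncore id : {n // 2}" for n in range(4)
)

if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            text = handle.read()
    except OSError:
        text = SAMPLE  # not on Linux: fall back to the made-up sample
    for pid, counts in summarize_cpuinfo(text).items():
        print(f"socket {pid}: {counts['physical']} cores, "
              f"{counts['logical']} logical CPUs")
```

If "logical" comes out larger than "physical" for a socket, the extra
entries are hyperthreaded siblings; if the two match, every listed CPU
is its own core.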

On a more advanced level you can pin a process to certain CPUs using
cgroups - which would allow you to run your code only on the physical
cores. It would be an interesting test, but I think the answer may be a
bit more straightforward.
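
For a quick experiment, pinning can also be tried without setting up
cgroups. The sketch below uses the per-process CPU affinity call that
Python exposes as os.sched_setaffinity. That is a different and simpler
mechanism than cgroups, and it is Linux only. The helper name
pin_to_cpus and the choice of "first allowed CPU" are assumptions made
for illustration.

```python
import os


def pin_to_cpus(cpus):
    """Restrict the calling process to the given logical CPU numbers.

    Returns the affinity set that was in effect before the change so
    the caller can restore it. Linux only: os.sched_setaffinity is not
    available on every platform.
    """
    previous = os.sched_getaffinity(0)  # 0 means "this process"
    os.sched_setaffinity(0, cpus)
    return previous


if __name__ == "__main__":
    allowed = sorted(os.sched_getaffinity(0))
    # Hypothetical choice: keep only the first CPU we are allowed to use.
    before = pin_to_cpus({allowed[0]})
    print("was:", sorted(before), "now:", sorted(os.sched_getaffinity(0)))
    pin_to_cpus(before)  # restore the original affinity
```

Which CPU numbers correspond to separate physical cores has to be read
from /proc/cpuinfo first; the numbers are not guaranteed to follow any
particular order.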

When you do concurrent processing, locks are used to avoid having two
processors trying to read/write the same address at the exact same time.
Also, traditional delays like IO queues will still hold up your
execution - multi-threaded or not. In addition, you have the job
scheduler, which tries to create the illusion of multiple processes
running at once even on a single-processor system. You'll have a
scheduler per core, but locks and semaphores still apply: if your
threads are all trying to read/write the same global variables they'll
wait for each other and you don't get much "umpfff" out of the CPUs at
all.

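To make the waiting concrete, here is a minimal Python sketch of several
threads updating one global counter behind a lock. The names
(counter, counter_lock, add_shared, run_shared) are invented for the
example. Note that CPython's own interpreter lock already serializes
pure-Python code, so this shows the pattern and not a timing result.

```python
import threading

counter = 0                      # shared global state
counter_lock = threading.Lock()  # guards every update to counter


def add_shared(n):
    """Increment the shared counter n times, taking the lock each time."""
    global counter
    for _ in range(n):
        with counter_lock:  # other threads wait here while one updates
            counter += 1


def run_shared(num_threads=4, n=10_000):
    """Start num_threads workers and return the final counter value."""
    global counter
    counter = 0
    threads = [threading.Thread(target=add_shared, args=(n,))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


if __name__ == "__main__":
    print(run_shared())
```

The result is correct, but every single increment goes through the same
lock, so adding threads mostly adds waiting.
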
This is the trick with multi-threaded code. It's why we try to make
every thread totally self-contained, operating only on local variables.
That way there will be no waits at all once the thread is running. In
Java we use the keyword "synchronized" to indicate that something has to
be locked and accessible by only a single thread at a time. The trick is
to have as few of those as possible. We even have the concept of
"thread-safe", which comes down to whether code depends on shared global
data structures or not. If it does, without protecting them, it's not
considered thread-safe and should NOT be used in multi-threaded
environments. I would be interested in knowing how your Fortran deals
with that. Last time I did F77 it was nothing but global constructs and
I cannot imagine how you would create independent threads out of that.
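
As an illustration of the "self-contained thread" idea, here is a small
Python sketch (not Fortran, and not a claim about how any Fortran
compiler does it). Each thread sums its own slice of the data into a
local variable and writes one result into its own slot, so no lock is
needed while it runs. The names partial_sum and parallel_sum are made
up for the example.

```python
import threading


def partial_sum(data, start, stop, results, slot):
    """Sum data[start:stop] using only local state, then store once."""
    total = 0.0  # local variable: no other thread can see it
    for value in data[start:stop]:
        total += value
    results[slot] = total  # each thread writes to its own slot


def parallel_sum(data, num_threads=4):
    """Split data into num_threads chunks and sum them in threads."""
    chunk = (len(data) + num_threads - 1) // num_threads
    results = [0.0] * num_threads
    threads = []
    for slot in range(num_threads):
        start = slot * chunk
        t = threading.Thread(target=partial_sum,
                             args=(data, start, start + chunk, results, slot))
        threads.append(t)
        t.start()
    for t in threads:
        t.join()
    return sum(results)  # combine only after every thread has finished


if __name__ == "__main__":
    print(parallel_sum(list(range(101))))
```

The only shared write is one store per thread at the very end, which is
the opposite of code built entirely on global constructs.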

-- 
Best Regards
  Peter Larsen

Wise words of the day:
And now for something completely different.