[Novalug] Algorithms As Art

Bryan J Smith b.j.smith@ieee.org
Wed May 27 16:24:46 EDT 2015


On Wed, May 27, 2015 at 3:34 PM, William Sutton <william@trilug.org> wrote:
> Interesting to see how the curriculum evolves over the years.
> When I graduated with my BS in CS from Auburn in 2000, the CS and Computer
> Engineering majors were both part of the CS department, attached to the
> College of Engineering.  Course track for both was largely identical, with
> some additional hardware and VLSI classes for the CPE folks, and minors
> required for the CS folks.

I completed my EE+CpE by early '96.  My official graduation date is
'97 because I had left some general and core classes that were not
prerequisites to offset others (e.g., I didn't finish Comp 1&2 until I
was in my 4th and 5th year, which was allowed at my Alma Mater).

> We all got the same programming/data
> structures/algorithms/compilers/operating systems/digital logic curriculum.
> The digital logic (from the EE department) covered basic 8086/8088 Intel
> chip design, a subset of the assembler, and a discussion of logic, op codes,
> etc., not as a "here's how we optimize this" track, but more as a "this is
> how a particular implementation of this works from gates to logic to op
> codes to assembler" discussion; with side discussions on how it worked in,
> say RISC.  The goal being fundamental understanding of how your 3G language
> translated all the way down to the circuit logic level.

The 3G language, namely C, allowed EEs to radically modify the
instruction set to fit layout and timing by the early '80s.

By the late '90s, some x86 implementations had become RISC underneath,
which meant optimizing legacy x86 CISC could be self-defeating.
NexGen (acquired by AMD) had proven that run-time optimization could
be done by pre-decoding x86 into direct MEAG ("one-hot") traces,
figuring out availability, doing register renaming, etc. -- things we
associate with RISC, but implemented in an entirely different way.

I.e., don't get me started on the design failure of the Pentium ALU.
"Pentium optimizations" were often work-arounds that used the FPU for
the buggy ALU, because the FPU was the one area where a different team
actually got a few ops correct (although precision was another story
-- well beyond the published FPU bug).  NexGen's ALU had 3x the load
speed of the Pentium's ALU; even the trick of using a 64-bit FPU load
to move 2 x 32-bit integers was still slower than the NexGen loading
integers with plain, existing x86 code.
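
For anyone who hasn't seen that old trick: the idea was simply to move
pairs of 32-bit integers through one 64-bit load/store instead of two
32-bit ones.  Here is a rough C sketch of just the data movement (the
real work-arounds were hand-written x87 assembly; copy_pairs_64 is a
made-up name for illustration):

    #include <stdint.h>
    #include <string.h>

    /* Illustration only: copy an even count of 32-bit integers using
     * 64-bit moves -- the same data movement the old "use the FPU to
     * load two ints at once" Pentium work-arounds relied on. */
    static void copy_pairs_64(uint32_t *dst, const uint32_t *src,
                              size_t n_pairs)
    {
        for (size_t i = 0; i < n_pairs; i++) {
            uint64_t tmp;
            memcpy(&tmp, src + 2 * i, sizeof tmp);  /* one 64-bit load  */
            memcpy(dst + 2 * i, &tmp, sizeof tmp);  /* one 64-bit store */
        }
    }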

> That said, I enjoyed the digital logic class immensely.  it was also an
> opportunity to learn why not to do things certain ways.  In my case, we had
> to implement (in stages) a software emulator for a subset of x86 assembler.
> the emulator would take certain op codes and run them.

We wrote an assembler in my CS minor as well.

Unfortunately, an exercise in writing an assembler only teaches one
about lexical processing (CS), and totally jack about computer
architecture.  Lexical processing _is_ a very, very useful, "real
world" skill for anyone who codes regularly.
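
To be clear, that part is worth the time: lexical processing is the
piece an assembler exercise genuinely drills.  A minimal sketch of the
kind of tokenizing involved, assuming a toy "mnemonic dst, src" syntax
that isn't tied to any real assembler:

    #include <ctype.h>
    #include <stdio.h>
    #include <string.h>

    /* Toy lexer: split one line of a made-up "mnemonic dst, src"
     * syntax into tokens.  This is the lexical-processing half of the
     * classic "write an assembler" exercise; it says nothing about
     * how the hardware executes the result. */
    static int tokenize(const char *line, char tokens[][16], int max_tokens)
    {
        int count = 0;
        const char *p = line;

        while (*p && count < max_tokens) {
            while (*p && (isspace((unsigned char)*p) || *p == ','))
                p++;                            /* skip separators */
            if (!*p)
                break;
            int len = 0;
            while (p[len] && !isspace((unsigned char)p[len]) &&
                   p[len] != ',' && len < 15)
                len++;
            memcpy(tokens[count], p, len);
            tokens[count][len] = '\0';
            count++;
            p += len;
        }
        return count;
    }

    int main(void)
    {
        char toks[8][16];
        int n = tokenize("mov ax, 42", toks, 8);
        for (int i = 0; i < n; i++)
            printf("token %d: %s\n", i, toks[i]);
        return 0;
    }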

The problem is that too many CS majors perpetuate the claim that it's
also good for learning computer architecture, when that is not true at all.
I cannot stress this enough.  A key point is to keep one thing in mind
...

Instruction sets were designed by mathematicians for mathematicians so
they could programmatically code computers without manual
intervention.  The fact that they used boolean logic, and replaced the
mathematician pulling switches with a clock, is where most of our
current issues in semiconductors come from.  Why?

The world is still very much analog ... unless you're just programming
an FPGA, of course.  ;)

I say this because most solutions today can be optimized by just
coding an FPGA, where you don't have to worry about layout or about
developing a high-speed ASIC; you use basic, digital logic.  You
bypass the whole concept of instruction sets, and you're literally
coding MEAGs ("one-hots") to switch gates.
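
For anyone who hasn't run into the term: "one-hot" just means one
signal per state, with exactly one asserted at a time, so the
next-state logic is plain gating with no encode/decode step.  Real
FPGA work would be written in an HDL; the C below only simulates the
encoding idea, with made-up states and a made-up transition rule:

    #include <stdio.h>

    /* One-hot state encoding: one bit per state, exactly one set at a
     * time.  The states and transitions are invented purely to show
     * the encoding; on an FPGA this would be gates and flip-flops. */
    enum { S_IDLE = 1u << 0, S_RUN = 1u << 1, S_DONE = 1u << 2 };

    static unsigned next_state(unsigned state, int start, int finish)
    {
        /* Each next-state bit is a simple AND/OR of current bits and
         * inputs -- no binary decode of a packed state number. */
        int idle = (state & S_IDLE) && !start;
        int run  = ((state & S_IDLE) && start) ||
                   ((state & S_RUN)  && !finish);
        int done = ((state & S_RUN)  && finish) || (state & S_DONE);
        return (idle ? S_IDLE : 0) | (run ? S_RUN : 0) | (done ? S_DONE : 0);
    }

    int main(void)
    {
        unsigned s = S_IDLE;
        s = next_state(s, 1, 0);  /* IDLE -> RUN  */
        s = next_state(s, 0, 1);  /* RUN  -> DONE */
        printf("final state bits: 0x%x\n", s);  /* prints 0x4 */
        return 0;
    }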

There are a lot of algorithms in the crypto, financial trading and
other industries where FPGA development is heavy and constantly being
updated.  To design a static microcontroller for such algorithms would
be self-defeating.  In fact, even in the microprocessor world, one of
the smartest things AMD ever did was design a superior, 3-issue FPU so
they could quickly adapt and decode any SIMD instruction Intel added for
their next design.

I know I'm likely getting a lot of "dumb stares" now ... but as one
young IT enthusiast, 18 when I first met him, told me at 24, after he
completed his EE program ... "You just had to study it to know."  So
unless you go down to the FPGA/gate level, none of what one learns
that way is applicable to computer architecture.

I mean, Intel tried to optimize at the instruction set with IA-64's
EPIC and predication, and it spectacularly failed to achieve any
performance benefits, just as Digital Semi said it wouldn't.  Itanium
didn't fail to catch on because it didn't have full x86 compatibility,
it failed to perform even with its own, native instruction set,
because its pseudo-VLIW was a poor design in the first place.

And just like with the Pentium ALU, Digital fixed the problem, so
even the Itanium II had run-time optimization with branch prediction.
But in the end, Intel still had a solution that didn't perform even
with its own native code.  Optimizing at the instruction set never works.
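
For readers who haven't met the term, predication means the hardware
evaluates both paths and keeps one result based on a predicate bit,
instead of branching.  The C below is only a source-level sketch of
that idea, not IA-64 code; whether a compiler actually emits
branchless or predicated instructions is its own decision:

    /* Branch version: the CPU has to predict which way 'cond' goes. */
    int select_branch(int cond, int a, int b)
    {
        if (cond)
            return a;
        return b;
    }

    /* "Predicated" flavor: both values are computed and the condition
     * just masks one in, with no branch in the logic.  This is only
     * the source-level analogue of what IA-64 does per instruction. */
    int select_predicated(int cond, int a, int b)
    {
        int mask = -(cond != 0);           /* all ones if cond, else 0 */
        return (a & mask) | (b & ~mask);
    }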

> My lab partner and I
> eschewed the designated memory space in favor of push'ing our op codes onto
> the processor stack, and pop'ing them when we were ready to use them.  for
> (N-2) labs, that worked beautifully.  When we hit the (N-1) lab, we started
> getting random junk back from the stack.  after asking the lab TA, we found
> our problem--we'd hit the stack limit, so the processor helpfully moved the
> stack somewhere else in memory and assigned a new stack.  when it came time
> to pop the old stack's contents, it didn't know where they were, and we got
> junk.  we had to scramble to re-architect it :-)  Interesting lesson
> learned--know what you're doing, why you're doing it, and what the potential
> resource implications are.

Big O is a great study, including tricks like getting away from
traditional procedure calls to optimize recursion.  But as I quickly
learned, just using C macros instead of calls would speed things up
tremendously, which is where I took issue with some traditional CS
education.  I even locked horns with one instructor over it,
especially since I had been through recursion and Big O way too many
times.
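
The macro-versus-call point, as a concrete sketch.  Modern compilers
largely make this moot with inlining, so read it as the early-'90s
trade-off it was; max_fn and MAX_M are just illustrative names:

    #include <stdio.h>

    /* Function version: a call/return per comparison (pre-inlining). */
    static int max_fn(int a, int b)
    {
        return a > b ? a : b;
    }

    /* Macro version: expanded in place, no call overhead -- but it
     * evaluates its arguments more than once, the classic trade-off. */
    #define MAX_M(a, b) ((a) > (b) ? (a) : (b))

    int main(void)
    {
        int x = 3, y = 7;
        printf("%d %d\n", max_fn(x, y), MAX_M(x, y));  /* prints 7 7 */
        return 0;
    }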

Which is another pet peeve of mine: stack-based architectures.  They
work great for a calculator, and as with many engineers, I learned to
use RPN overnight with a single statics assignment, never to go back
to arithmetic entry (good riddance).  But why Sun stuck with a
stack-based approach for Java and its picoJava processors is beyond
me.  The Perl guys had an interesting idea with Parrot, to build a
CISC-style VM implementation for its p-code execution, but that was
yet another CS ideal with mixed results so far.
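
To be fair to the calculator case, the appeal is obvious: RPN
evaluation is just a push/pop loop.  A minimal, integer-only sketch
(the operator set and the space-separated input format are made up
for illustration):

    #include <ctype.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Tiny RPN evaluator over space-separated tokens, e.g. "3 4 + 2 *".
     * Operands are pushed; an operator pops two and pushes one.  This
     * is where a stack-based model genuinely shines. */
    static long eval_rpn(const char *expr)
    {
        long stack[64];
        int top = 0;
        char buf[256];

        strncpy(buf, expr, sizeof buf - 1);
        buf[sizeof buf - 1] = '\0';

        for (char *tok = strtok(buf, " "); tok; tok = strtok(NULL, " ")) {
            if (isdigit((unsigned char)tok[0])) {
                stack[top++] = strtol(tok, NULL, 10);
            } else {
                long b = stack[--top], a = stack[--top];
                switch (tok[0]) {
                case '+': stack[top++] = a + b; break;
                case '-': stack[top++] = a - b; break;
                case '*': stack[top++] = a * b; break;
                case '/': stack[top++] = a / b; break;
                }
            }
        }
        return stack[top - 1];
    }

    int main(void)
    {
        printf("%ld\n", eval_rpn("3 4 + 2 *"));  /* prints 14 */
        return 0;
    }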

Now I'm really on my "opinionated, high horse," and side-tracking well
into a tangent, away from the OP.  Sorry about that, but I'm just in a
very, very small minority out there.

I mean, when people hear me say don't use clocked boolean for
circuits, they are dumbfounded there could be anything else.  That
right there is the 100% proof that there's an almost absolute, base,
mathematical bias to how everyone learns computer architecture and
system design ... even today.  ;)



-- 
Bryan J Smith - http://www.linkedin.com/in/bjsmith


