[Novalug] hadoop

cliff@palmercs.com
Fri Jul 13 18:42:16 EDT 2012


We use MapReduce daily and I will be glad to give you a brief overview tomorrow
after the meeting.
If it's a topic of interest we can talk about doing a presentation at a future
meeting.

See you tomorrow
Cliff Palmer


On July 13, 2012 at 3:25 PM greg pryzby <greg@pryzby.org> wrote:

> On Fri, Jul 13, 2012 at 2:52 PM, Peter Larsen
> <plarsen@famlarsen.homelinux.com> wrote:
> > On Fri, 2012-07-13 at 11:44 -0400, greg pryzby wrote:
> >> MapReduce is a Java class (maybe classes, I haven't gotten that far)
> >> that allows distributed search. So it can run the Java app on any (or
> >> multiple) nodes and search in parallel. In the end there is a dataset
> >> that contains the information that matches the search criteria. This
> >> can be put into an RDBMS or other store for future reference if the
> >> results need to last, or parsed further with other advanced tools.
> >
> > MapReduce is the methodology used by Google to make the internet
> > searchable. Granted, it's not the Hadoop way of MapReduce but it's the
> > same principle.
> >
> > http://en.wikipedia.org/wiki/MapReduce
> >
> > There's no need for an RDBMS in the traditional way here. The point is
> > that maps are a natural component of most languages, and the structures
> > returned by MapReduce are simple "map" collections. It's native to the
> > code, and hence extremely well adapted for processing in the native
> > language.
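
As a rough illustration of the point about "map" collections, here is a minimal single-process sketch of the map, shuffle, and reduce steps in Python. It is not Hadoop code; the function names (map_phase, shuffle, reduce_phase, word_count) and the sample lines are made up for this example, and a real framework would run the map and reduce steps on many nodes.

```python
from collections import defaultdict

def map_phase(line):
    """Emit one (word, 1) pair per word in the line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Collapse all values for one key into a single result."""
    return key, sum(values)

def word_count(lines):
    """Run map, shuffle, and reduce in sequence; the result is a plain dict."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

# Hypothetical input lines, standing in for records spread across nodes.
counts = word_count(["the quick brown fox", "the lazy dog", "The end"])
print(counts["the"])
```

The result is an ordinary dictionary keyed by word, which is the kind of native "map" structure the paragraph above refers to.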
>
> My English needs work.
>
> What I was trying to say was the results COULD be stored in an RDBMS
> (saw that in a pic, which led me to believe it is common).
>
>
> > Furthermore, it's not like a clustered RDBMS either. Even with
> > clustering, you would always have a whole record at the very least
>
> Don't think I said that. It wasn't what I thought at all. I do find it
> interesting that it uses 64M blocks (vs 4 or 8k with most filesystems).
> I understand the NameNode and replication to the DataNodes.
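
To put the block-size comparison in numbers, here is a small arithmetic sketch in Python. The 1 GiB file size is a made-up example, and the replication factor of 3 is an assumption (it is the usual HDFS default, but a cluster can be configured differently).

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

def block_count(file_bytes, block_bytes):
    """Blocks needed to hold a file; the last block may be only partly full."""
    return (file_bytes + block_bytes - 1) // block_bytes

file_size = 1 * GIB                               # hypothetical 1 GiB log file
hdfs_blocks = block_count(file_size, 64 * MIB)    # 64M blocks, as in the thread
fs_blocks = block_count(file_size, 4 * KIB)       # 4k blocks, typical local fs
replicas = hdfs_blocks * 3                        # assumed replication factor

print(hdfs_blocks, fs_blocks, replicas)
```

With large blocks the NameNode only has to track a handful of entries per file, which is one plausible reason for the much larger block size.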
>
> The only use of an RDBMS was to store the results of MapReduce IF desired.
>
>
> > located at a given node. MapReduce operates on attribute levels and can
> > spread out a record over multiple nodes and hence read it concurrently
> > on all nodes, to join (reduce) them together in the result as a single
> > record. With RDBMSes, you would only get one node to return the whole
> > record. MapReduce allows you to locate data optimally depending on how
> > you access it. It's a very different approach from traditional relational
> > distributed systems where the DBA estimated locations and once data was
> > "tagged" it never moved from its logical position. MapReduce does this
> > dynamically and hence for _some_ datasets it's extremely efficient. As
> > Google has proven, it works great for its purposes.
>
>
> yep...
>
>
> > I must admit I giggled when I saw your oversimplified grep example. It's
> > not even close: it doesn't even address the "map" side of the equation
> > and covers only a very small piece of the reduce (query) side. In
> > particular I wonder how you solve the "key" vs. "data" with grep.
>
>
> It doesn't map, but it does reduce, I think. If all weblogs were in
> directories this would definitely reduce to the common thread. It
> depends on what you are looking for. Correct?
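
One way to see the difference being argued here is the sketch below, in Python rather than Hadoop's Java API. The log format (client IP, path, status) and all names are invented for this example. A grep-style filter returns whole matching lines with no key; a map step picks a key for each line, and the reduce step combines the values per key.

```python
from collections import Counter

# Hypothetical access-log lines: client IP, request path, HTTP status.
LOG_LINES = [
    "10.0.0.1 /index.html 200",
    "10.0.0.2 /missing 404",
    "10.0.0.1 /missing 404",
    "10.0.0.3 /index.html 200",
    "10.0.0.2 /gone 404",
]

def grep(lines, pattern):
    """grep-style filter: keeps whole matching lines, with no key attached."""
    return [line for line in lines if pattern in line]

def map_404(line):
    """Map step: emit (client_ip, 1) for each 404, using the IP as the key."""
    ip, _path, status = line.split()
    return [(ip, 1)] if status == "404" else []

def reduce_counts(pairs):
    """Reduce step: sum the values emitted for each key."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

matches = grep(LOG_LINES, " 404")
per_client = reduce_counts(p for line in LOG_LINES for p in map_404(line))
print(matches)
print(per_client)
```

The grep result still needs further processing to answer "which client hit the most missing pages", while the keyed version answers it directly. That is roughly the "key" vs. "data" distinction raised above.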
>
>
> > That said - your initial thought "this is not new" is quite right.
> > Clustered data sources have been around for a long time, and the idea of
> > distributing by attribute values isn't new either. But the
> > implementation/concept of MapReduce is rather new.
>
>
> I think MapReduce and Hadoop are two pieces, correct? We can discuss
> that they go together, but Hadoop can stand alone.
>
> Hadoop is a 'better' NFS (for a number of reasons)
> MapReduce is a better grid/HPC/MPI for specific data sets
>
> Would you buy that?
>
>
> > It's not just for Java though - lots of languages have a "map" data
> > structure and this works well for them too.
>
> To date (and I haven't gotten to MapReduce yet), all the MapReduce
> material I've seen referenced says Java.
>
> --
> greg pryzby                              greg at pryzby dot org
> http://www.linkedin.com/in/gpryzby
>
> WEB:  http://www.MakeRoomForArt.com/
> TWTR: gpryzby