Tuesday, September 13, 2011

Go on Hadoop

It's easy to use Go (or any other language) with Hadoop streaming. Here's a little "word count" example.

System Setup:
  • Hadoop running locally (Cloudera cdh3u0)
  • A copy of hadoop-streaming-0.20.2-cdh3u0.jar in the local directory
  • A copy of "Alice In Wonderland" under /user/miki/alice.txt on HDFS
mapper.go
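
The listing for this file is not included here, so the following is only a minimal sketch of what a streaming mapper for word count could look like, not the original code. It assumes a "word" is a run of ASCII letters (the wordRe pattern and the words helper are illustrative names), so the counts it produces may differ from the ones shown further down.

```go
// mapper.go: a sketch of a Hadoop streaming mapper for word count.
package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
)

// wordRe is an assumed definition of a word: one or more ASCII letters.
var wordRe = regexp.MustCompile(`[A-Za-z]+`)

// words returns the words found in a single line of input.
func words(line string) []string {
	return wordRe.FindAllString(line, -1)
}

func main() {
	scanner := bufio.NewScanner(os.Stdin)
	out := bufio.NewWriter(os.Stdout)
	defer out.Flush()

	// Hadoop streaming feeds input lines on stdin and expects
	// "key<TAB>value" lines on stdout.
	for scanner.Scan() {
		for _, w := range words(scanner.Text()) {
			fmt.Fprintf(out, "%s\t1\n", w)
		}
	}
	if err := scanner.Err(); err != nil {
		fmt.Fprintln(os.Stderr, "mapper:", err)
	}
}
```

Build it with `go build -o mapper mapper.go`; you can try it locally by piping a text file into `./mapper`.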

reducer.go

run-job.sh

After the job has run, you can view the output and check the most common words:

hadoop fs -cat /user/miki/words-out/part-00000 | sort -k 2 -n -r | head
the	1686
and	869
to	799
a	672
of	606
I	545
it	540
she	509
said	456
in	414

Comments:

  1. I've been working on a small library to make Hadoop Streaming code easier to write in Go. It handles the un/marshaling and the line aggregation; you just need to write the Mapper and the Reducer. It's on GitHub at https://github.com/dgryski/dmrgo .

