Tuesday, September 13, 2011

Go on Hadoop

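Hadoop streaming makes it easy to use Go (or any other language) with Hadoop: the mapper and the reducer are ordinary programs that read lines from stdin and write tab-separated key/value lines to stdout. Here's a little "word count" example.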

System Setup:
  • Hadoop running locally (Cloudera cdh3u0)
  • A copy of hadoop-streaming-0.20.2-cdh3u0.jar in the local directory
  • A copy of "Alice in Wonderland" at /user/miki/alice.txt on HDFS
mapper.go

reducer.go

run-job.sh

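After the job has run, you can view the output and list the most common words by sorting numerically on the count (second) column in descending order and keeping the first ten lines: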

hadoop fs -cat /user/miki/words-out/part-00000 | sort -k 2 -n -r | head
the	1686
and	869
to	799
a	672
of	606
I	545
it	540
she	509
said	456
in	414

Comments:

  1. I've been working on a small library to make Hadoop Streaming code easier to write in Go. It handles the un/marshaling and the line aggregation; you just need to write the Mapper and the Reducer. It's on GitHub: https://github.com/dgryski/dmrgo

