
Tuesday, September 13, 2011

Go on Hadoop

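It's easy to use Go (or any other language) with Hadoop streaming: the mapper and reducer are ordinary executables that read lines from standard input and write tab-separated key/value lines to standard output. Here's a little "word count" example.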

System Setup:
  • Hadoop running locally (Cloudera cdh3u0)
  • A copy of hadoop-streaming-0.20.2-cdh3u0.jar in the local directory
  • A copy of "Alice In Wonderland" under /user/miki/alice.txt on HDFS
mapper.go

reducer.go

run-job.sh

After the job has run, you can view the output and check the most common words:

hadoop fs -cat /user/miki/words-out/part-00000 | sort -k 2 -n -r | head
the	1686
and	869
to	799
a	672
of	606
I	545
it	540
she	509
said	456
in	414
/* MIKI: Analytics */