Tuesday, September 13, 2011

Go on Hadoop

It's easy to use Go (or any other language) with Hadoop streaming. Here's a little "word count" example.

System Setup:
  • Hadoop running locally (Cloudera cdh3u0)
  • A copy of hadoop-streaming-0.20.2-cdh3u0.jar in the local directory
  • A copy of "Alice In Wonderland" under /user/miki/alice.txt on HDFS
mapper.go

reducer.go

run-job.sh

After the job has run, you can view the output and check the most common words:

hadoop fs -cat /user/miki/words-out/part-00000 | sort -k 2 -n -r | head
the	1686
and	869
to	799
a	672
of	606
I	545
it	540
she	509
said	456
in	414

Comments:

  1. I've been working on a small library to make Hadoop Streaming code easier to write in Go. It handles the un/marshaling and the line aggregation; you just need to write the Mapper and the Reducer. It's on GitHub at https://github.com/dgryski/dmrgo .

