Hadoop on Windows Azure: Visualizing Data

We are currently experimenting with a 32-node Apache Hadoop cluster on Windows Azure. The setup includes a web-based interactive JavaScript console that lets you put data into HDFS, launch MapReduce jobs, and visualize the results with HTML5 charts, and it is very easy to use.

Using our in-house tools, we generated a report showing which keywords on Twitter were recently used together with the word “xbox”, and which user platforms were used to tweet about it:

[Screenshot: in-house report of “xbox” tweet keywords and user platforms]

In this example, we exported the data shown on each side of the screen as two tab-delimited text files:

  • xbox_tweets_keywords.txt
  • xbox_tweets_platforms.txt

The files are then uploaded to HDFS using the fs.put() command in the JavaScript console, which lets you upload a file from your desktop:

js> fs.put()

File uploaded.

[Screenshot: file upload in the JavaScript console]

We verify that both data files have been uploaded:

js> #ls

Found 3 items

drwxr-xr-x   - itrend supergroup          0 2012-01-15 06:48 /user/itrend/.oink

-rw-r--r--   3 itrend supergroup        895 2012-01-15 08:39 /user/itrend/xbox_tweets_keywords.txt

-rw-r--r--   3 itrend supergroup        244 2012-01-15 08:28 /user/itrend/xbox_tweets_platforms.txt

Once a file is in HDFS, we can read its contents:

js> file = fs.read("xbox_tweets_platforms.txt")

411 web

229 twitter for iphone

142 twitterfeed

130 twitter for android

117 mobile web

98 raptr

58 tweetdeck

58 echofon

50 twitter for blackberry

47 txt

33 google

29 kit link

26 dlvr.it

25 slickdeals

25 xbox

21 bersocial for blackberry

301 Other

… and parse the tab-delimited data (only the first few records are shown here for simplicity):

js> data = parse(file.data, "Tweets:integer, Platform")

[

    0: {

        Tweets: "411"

        Platform: "web"

    }

    1: {

        Tweets: "229"

        Platform: "twitter for iphone"

    }

    2: {

        Tweets: "142"

        Platform: "twitterfeed"

    }

...

]
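For readers without access to the console, the parsing step can be approximated in plain JavaScript. The sketch below is a hypothetical stand-in, not the console's own parse() implementation: the function name parseTabDelimited and the way it interprets the schema string are assumptions based only on the call shown above. Unlike the echoed output above, which displays the counts in quotes, this sketch converts columns marked as integer to numbers.

```javascript
// Hypothetical stand-in for the console's parse(); the real implementation is not shown in this post.
function parseTabDelimited(text, schema) {
  // Turn "Tweets:integer, Platform" into column descriptors.
  // A column without an explicit type is treated as a string (an assumption).
  const columns = schema.split(",").map((spec) => {
    const [name, type = "string"] = spec.trim().split(":");
    return { name: name.trim(), type: type.trim() };
  });

  return text
    .split(/\r?\n/)                          // one record per line
    .filter((line) => line.trim() !== "")    // skip blank lines
    .map((line) => {
      const fields = line.split("\t");       // tab-delimited fields
      const record = {};
      columns.forEach((column, i) => {
        const raw = (fields[i] || "").trim();
        record[column.name] =
          column.type === "integer" ? parseInt(raw, 10) : raw;
      });
      return record;
    });
}

// First three rows of xbox_tweets_platforms.txt, written with explicit tabs.
const sample = "411\tweb\n229\ttwitter for iphone\n142\ttwitterfeed\n";
const records = parseTabDelimited(sample, "Tweets:integer, Platform");
console.log(records);
```

Each record comes back as an object keyed by the column names in the schema string, which is the same shape that the options object later refers to through its x and y fields.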

Next, we prepare the chart options:

js> options = { title: "Twitter User Platforms", orientation: 25, x: "Platform", y: "Tweets" }

{

    title: "Twitter User Platforms"

    orientation: 25

    x: "Platform"

    y: "Tweets"

}

Then, generate a bar graph:

js> graph.bar(data, options)

[Screenshot: bar graph of tweets by platform]

… followed by a pie chart:

js> graph.pie(data, options)

[Screenshot: pie chart of tweets by platform]
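A pie chart represents each platform as a share of the total, and those shares can be checked by hand. The sketch below is plain JavaScript, independent of the console's graph API. The counts are copied from the xbox_tweets_platforms.txt listing earlier in this post, while the helper name withShares and the Share field are hypothetical names introduced only for illustration.

```javascript
// Counts copied from the xbox_tweets_platforms.txt listing in this post.
const platformCounts = [
  { Tweets: 411, Platform: "web" },
  { Tweets: 229, Platform: "twitter for iphone" },
  { Tweets: 142, Platform: "twitterfeed" },
  { Tweets: 130, Platform: "twitter for android" },
  { Tweets: 117, Platform: "mobile web" },
  { Tweets: 98, Platform: "raptr" },
  { Tweets: 58, Platform: "tweetdeck" },
  { Tweets: 58, Platform: "echofon" },
  { Tweets: 50, Platform: "twitter for blackberry" },
  { Tweets: 47, Platform: "txt" },
  { Tweets: 33, Platform: "google" },
  { Tweets: 29, Platform: "kit link" },
  { Tweets: 26, Platform: "dlvr.it" },
  { Tweets: 25, Platform: "slickdeals" },
  { Tweets: 25, Platform: "xbox" },
  { Tweets: 21, Platform: "bersocial for blackberry" },
  { Tweets: 301, Platform: "Other" },
];

// Hypothetical helper: adds each row's percentage of the overall total.
function withShares(rows) {
  const total = rows.reduce((sum, row) => sum + row.Tweets, 0);
  return rows.map((row) => ({ ...row, Share: (100 * row.Tweets) / total }));
}

const totalTweets = platformCounts.reduce((sum, row) => sum + row.Tweets, 0);
const shares = withShares(platformCounts);

console.log("Total tweets:", totalTweets); // 1800
shares.forEach((row) => {
  console.log(row.Platform + ": " + row.Share.toFixed(1) + "%");
});
```

The 17 rows add up to 1,800 tweets, so the "web" slice, for example, is 411 / 1800, or roughly 22.8% of the pie.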

A similar process is used for the keyword data (this sample clearly needs a different visualization mechanism):

[Screenshot: chart of the keyword data]

Next week, we will be posting a more interesting example.

by Michael Alatortsev

Technologist, parallel entrepreneur. Interests: travel, photography, big data, analytics, predictive modeling.

One comment on “Hadoop on Windows Azure: Visualizing Data”
  1. Michael Alatortsev says:

    Yury, I wasn’t sure 32 nodes was going to cut it. The spreadsheet was 17 x 2, that’s 34 cells. But it all worked out in the end!
