Leap Second Caused Hadoop Cluster’s Slowness

We all know that the leap second caused major outages at lots of big sites, such as Reddit, Gawker, LinkedIn, Foursquare and Yelp, as reported in early July. I didn't understand why it happened, and never thought it would affect the servers that host our Hadoop cluster. When we got into the office on Monday morning, we noticed that most of our processing tasks had been running very slowly, and we had received lots of "Hadoop namenode overloaded" warning emails from Opsview.

Initially we thought it was caused by our monthly processing, which is quite resource intensive as it needs to process terabytes of data. However, the problem persisted even after the monthly processing had finished: the warnings kept coming, and the Hive server was constantly going up and down.

On Tuesday night, I worked with our system administrator until midnight trying to restart all of the Hadoop datanodes, but with no luck.

Eventually we concluded that it was a Linux kernel bug triggered by the leap second. The solution was to apply the kernel bug fix and reset the server's clock.
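The workaround that circulated widely at the time was to stop the NTP daemon and force-set the system clock, which clears the kernel's leap-second state. A sketch (init script name assumes a Red Hat-style system; requires root):

```shell
# Stop ntpd so it does not immediately re-arm the leap-second flag
/etc/init.d/ntpd stop

# Force-setting the clock (even to its current value) clears the
# kernel's leap-second state
date -s "$(date)"

# Re-enable time synchronisation afterwards
/etc/init.d/ntpd start
```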

It wasn't an obvious fix, but we all learned from it. Next time it will be easier to identify, but who knows when that will be?


Another Hadoop Deployment

We have just done another Hadoop deployment to our processing system, this time to process our demographic data on a daily, weekly and monthly basis. This is the third Hadoop-based processing release in about a month. Everything went really well, and no problems have been found so far.

We will continue with our fourth new Hadoop-powered feature in the coming weeks.

Great work team!

Enable Snappy Compression For Flume

Snappy is a compression/decompression library developed by Google. It aims for very high speed and reasonable compression (its output may be larger than that of other standard compression algorithms, but it compresses and decompresses much faster). Snappy ships with Hadoop, unlike LZO compression, which is excluded due to licensing issues. To enable Snappy in your Flume installation, follow the steps below.

Install on Red Hat systems:

$ sudo yum install hadoop-0.20-native

Install on Ubuntu systems:

$ sudo apt-get install hadoop-0.20-native

This should create a directory under /usr/lib/hadoop/lib/native/ containing the native Hadoop libraries.

Then create environment config for Flume:

$ cp /usr/lib/flume/bin/flume-env.sh.template /usr/lib/flume/bin/flume-env.sh

Then update the last line in the file to set the native library path.

For a 32-bit platform:

export JAVA_LIBRARY_PATH=/usr/lib/hadoop/lib/native/Linux-i386-32

For a 64-bit platform:

export JAVA_LIBRARY_PATH=/usr/lib/hadoop/lib/native/Linux-amd64-64
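If the same flume-env.sh is shared across machines, the two cases above can be collapsed into a single architecture check (a sketch; the directory names are the ones created by the hadoop-0.20-native package):

```shell
# Select the native-library directory based on the machine architecture
case "$(uname -m)" in
  x86_64) export JAVA_LIBRARY_PATH=/usr/lib/hadoop/lib/native/Linux-amd64-64 ;;
  *)      export JAVA_LIBRARY_PATH=/usr/lib/hadoop/lib/native/Linux-i386-32 ;;
esac
```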

Next, update Flume's configuration file under "/etc/flume/conf/flume-site.xml" on the collector node to set the output compression codec:

<property>
  <name>flume.collector.dfs.compress.codec</name>
  <value>SnappyCodec</value>
  <description>Writes formatted data compressed in specified codec to
  dfs. Value is None, GzipCodec, DefaultCodec (deflate), BZip2Codec, SnappyCodec
  or any other Codec Hadoop is aware of </description>
</property>

And then, finally, restart the flume-node service:

$ /etc/init.d/flume-node restart

Your next file updates in HDFS will look something like the following (files still being written carry a .tmp suffix):

-rw-r--r--   3 flume supergroup          0 2011-10-21 14:01 /data/traffic/Y2011_M9_W37_D254/R0_P0/C1_20111021-140124175+1100.955183363700204.00000244.snappy.tmp
-rw-r--r--   3 flume supergroup   35156526 2011-10-20 16:51 /data/traffic/Y2011_M9_W37_D254/R0_P0/C2_20111020-164928958+1100.780424004236302.00000018.snappy
-rw-r--r--   3 flume supergroup     830565 2011-10-20 17:15 /data/traffic/Y2011_M9_W37_D254/R0_P0/C2_20111020-171423368+1100.781918413572302.00000018.snappy
-rw-r--r--   3 flume supergroup          0 2011-10-20 17:19 /data/traffic/Y2011_M9_W37_D254/R0_P0/C2_20111020-171853599+1100.782188644505302.00000042.snappy.tmp
-rw-r--r--   3 flume supergroup    1261171 2011-10-20 17:37 /data/traffic/Y2011_M9_W37_D254/R0_P0/C2_20111020-173728225+1100.783303271088302.00000018.snappy
-rw-r--r--   3 flume supergroup    2128701 2011-10-20 17:40 /data/traffic/Y2011_M9_W37_D254/R0_P0/C2_20111020-174024045+1100.783479090669302.00000046.snappy
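Since hadoop fs -text picks a decompression codec based on the file extension, it can be used to spot-check that the .snappy files are readable (assuming the native Snappy libraries are also on the client's library path), for example:

```shell
$ hadoop fs -text /data/traffic/Y2011_M9_W37_D254/R0_P0/C2_20111020-164928958+1100.780424004236302.00000018.snappy | head
```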

Happy Fluming!