After Hadoop is install manually using binary package on CentOS, Snappy compression is not supported by default and there are extra steps required in order for Snappy to work in Hadoop. It is straightforward but might not be obvious if you don’t know what to do.
Firstly, if you are using 64 bit version of CentOS, you will need to replace the default native hadoop library which is shipped with Hadoop (it is only compiled for 32 bit), you can try to download it from here, and then put it under “$HADOOP_HOME/lib/native” directory. If there is a symlink, you can just remove the symlink with the actual file. If it still doesn’t work, then you might need to compile yourself on your machine which is out of scope of this post, you can follow instructions on this site.
Secondly you will need to install native snappy library for your operating system (CentOS 6.3 in my case):
$ sudo yum install snappy snappy-devel
This will create a file called libsnappy.so under /usr/lib64 directory, we need to create a link to this file under “$HADOOP_HOME/lib/native”
sudo ln -s /usr/lib64/libsnappy.so $HADOOP_HOME/lib/native/libsnappy.so
Then update three configuration files:
<property> <name>io.compression.codecs</name> <value>org.apache.hadoop.io.compress.SnappyCodec</value> </property>
<property> <name>mapreduce.map.output.compress</name> <value>true</value> </property> <property> <name>mapred.map.output.compress.codec</name> <value>org.apache.hadoop.io.compress.SnappyCodec</value> </property>
And finally add the following line into $HADOOP_HOME/etc/hadoop/hadoop-env.sh to tell Hadoop to load the native library from the exact location:
That’s it, just restart HDFS and Yarn by running:
Now you should be able to create hive tables with Snappy compressed.