Enabling Snappy Compression Support in Hadoop 2.4 under CentOS 6.3

After Hadoop is installed manually from the binary package on CentOS, Snappy compression is not supported by default, and a few extra steps are required to make Snappy work in Hadoop. The process is straightforward, but it might not be obvious if you don’t know what to do.

Firstly, if you are using the 64-bit version of CentOS, you will need to replace the default native Hadoop library that ships with Hadoop (it is compiled for 32-bit only). You can try to download a 64-bit build from here, and then put it under the “$HADOOP_HOME/lib/native” directory. If there is a symlink, just replace the symlink with the actual file. If it still doesn’t work, you might need to compile the library yourself on your machine, which is out of the scope of this post; you can follow the instructions on this site.
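If you are not sure which architecture the bundled library was built for, a quick check is to compare it against your machine (assuming the usual libhadoop.so* file names under $HADOOP_HOME/lib/native):

$ uname -m
$ file $HADOOP_HOME/lib/native/libhadoop.so*

If uname reports x86_64 but file reports a 32-bit ELF shared object, the library needs to be replaced.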

Secondly, you will need to install the native Snappy library for your operating system (CentOS 6.3 in my case):

$ sudo yum install snappy snappy-devel

This will create a file called libsnappy.so under the /usr/lib64 directory. We then need to create a link to this file under “$HADOOP_HOME/lib/native”:

$ sudo ln -s /usr/lib64/libsnappy.so $HADOOP_HOME/lib/native/libsnappy.so
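To verify that both the 64-bit Hadoop library and the Snappy symlink are now in place, a simple listing is enough:

$ ls -l $HADOOP_HOME/lib/native/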

Then update three configuration files:

$HADOOP_HOME/etc/hadoop/core-site.xml

<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>

$HADOOP_HOME/etc/hadoop/yarn-site.xml

<property>
  <name>mapreduce.map.output.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.map.output.compress.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>

And finally, add the following line to $HADOOP_HOME/etc/hadoop/hadoop-env.sh to tell Hadoop to load the native library from the exact location (adjust the path if your $HADOOP_HOME is different):

export JAVA_LIBRARY_PATH="/usr/local/hadoop/lib/native"

That’s it. Just restart HDFS and YARN by running:

$HADOOP_HOME/sbin/stop-all.sh
$HADOOP_HOME/sbin/start-all.sh

Now you should be able to create Hive tables with Snappy compression.
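If you want to double-check before touching Hive, the hadoop checknative command (available in recent Hadoop 2.x releases, including 2.4 as far as I know) reports whether the native and Snappy libraries were loaded:

$ hadoop checknative -a

Snappy should show up as true, pointing at the libsnappy.so link created earlier.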

Load Data From File Into Compressed Hive Table

Disk might be cheap, but when it comes to dealing with terabytes of data, you might want to consider compression for your data storage.

When you want to create a table with compression enabled, you will need to use “STORED AS SEQUENCEFILE” when creating the table in Hive:

CREATE TABLE compressed_table (data STRING)
   ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
   STORED AS SEQUENCEFILE;

However, you will not be able to use the “LOAD DATA” command to load data from a text file into this compressed table; Hive will complain about the file format.

There is a trick to bypass this, however. What you need to do is create a temp table to hold the data from the file, as “LOAD DATA” works for plain text file storage, and “INSERT OVERWRITE TABLE” works for the compressed table. Follow the steps below:

CREATE TABLE tmp_table (data STRING)
   ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
 
CREATE TABLE compressed_table (data STRING)
   ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
   STORED AS SEQUENCEFILE;
 
LOAD DATA LOCAL INPATH '/tmp/file.txt' INTO TABLE tmp_table;

INSERT OVERWRITE TABLE compressed_table SELECT * FROM tmp_table; 

-- then drop the tmp table
DROP TABLE tmp_table;
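One thing to keep in mind: the INSERT only produces Snappy-compressed sequence files if compression is turned on for the session. A minimal sketch of the usual settings to run before the INSERT OVERWRITE (using the older mapred.* property names, which Hadoop 2.4 still accepts):

-- run these before the INSERT OVERWRITE above
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
SET mapred.output.compression.type=BLOCK;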

Another way (faster) is to use an external table, which saves the time spent loading data into tmp_table, but you will need to put the file into HDFS first (a sample hdfs dfs command follows the example below):

CREATE EXTERNAL TABLE IF NOT EXISTS tmp_table (data STRING)
   ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
   LOCATION 'hdfs://hadoop-namenode:8020/directory_name';
 
CREATE TABLE compressed_table (data STRING)
   ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
   STORED AS SEQUENCEFILE;

INSERT OVERWRITE TABLE compressed_table SELECT * FROM tmp_table; 

-- then drop the tmp table
DROP TABLE tmp_table;
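For reference, putting the file into HDFS for the external table might look like this (directory_name is just the placeholder used above):

$ hdfs dfs -mkdir -p /directory_name
$ hdfs dfs -put /tmp/file.txt /directory_name/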

Hopefully a future Hive release will make our lives easier by removing these limitations.

Enable Snappy Compression For Flume

Snappy is a compression/decompression library developed by Google. It aims for very high speed and reasonable compression (the output might be bigger than with other standard compression algorithms, but compression and decompression are much faster). Snappy is shipped with Hadoop, unlike LZO compression, which is excluded due to licensing issues. To enable Snappy in your Flume installation, follow the steps below:

Install on Red Hat systems:

$ sudo yum install hadoop-0.20-native

Install on Ubuntu systems:

$ sudo apt-get install hadoop-0.20-native

This should create a directory under /usr/lib/hadoop/lib/native/ containing the native Hadoop libraries.
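You can confirm the platform-specific directories referenced below are there:

$ ls /usr/lib/hadoop/lib/native/

You should see subdirectories such as Linux-i386-32 and Linux-amd64-64.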

Then create environment config for Flume:

$ cp /usr/lib/flume/bin/flume-env.sh.template /usr/lib/flume/bin/flume-env.sh

And update the last line of the file to read:

For a 32-bit platform:

export JAVA_LIBRARY_PATH=/usr/lib/hadoop/lib/native/Linux-i386-32

For a 64-bit platform:

export JAVA_LIBRARY_PATH=/usr/lib/hadoop/lib/native/Linux-amd64-64

Next, update Flume’s configuration file under “/etc/flume/conf/flume-site.xml” on the collector node to include:

  <property>
    <name>flume.collector.dfs.compress.codec</name>
    <value>SnappyCodec</value>
    <description>Writes formatted data compressed in specified codec to
    dfs. Value is None, GzipCodec, DefaultCodec (deflate), BZip2Codec, SnappyCodec
    or any other Codec Hadoop is aware of </description>
  </property>

Finally, restart the flume-node service:

$ /etc/init.d/flume-node restart

Your next file update in HDFS will look something like the following:

-rw-r--r--   3 flume supergroup          0 2011-10-21 14:01 /data/traffic/Y2011_M9_W37_D254/R0_P0/C1_20111021-140124175+1100.955183363700204.00000244.snappy.tmp
-rw-r--r--   3 flume supergroup   35156526 2011-10-20 16:51 /data/traffic/Y2011_M9_W37_D254/R0_P0/C2_20111020-164928958+1100.780424004236302.00000018.snappy
-rw-r--r--   3 flume supergroup     830565 2011-10-20 17:15 /data/traffic/Y2011_M9_W37_D254/R0_P0/C2_20111020-171423368+1100.781918413572302.00000018.snappy
-rw-r--r--   3 flume supergroup          0 2011-10-20 17:19 /data/traffic/Y2011_M9_W37_D254/R0_P0/C2_20111020-171853599+1100.782188644505302.00000042.snappy.tmp
-rw-r--r--   3 flume supergroup    1261171 2011-10-20 17:37 /data/traffic/Y2011_M9_W37_D254/R0_P0/C2_20111020-173728225+1100.783303271088302.00000018.snappy
-rw-r--r--   3 flume supergroup    2128701 2011-10-20 17:40 /data/traffic/Y2011_M9_W37_D254/R0_P0/C2_20111020-174024045+1100.783479090669302.00000046.snappy

Happy Fluming..