How to Index LZO files in Hadoop

Today I was trying to index an LZO file using the following hadoop command:

hadoop jar /opt/cloudera/parcels/GPLEXTRAS-5.7.0-1.cdh5.7.0.p0.40/lib/hadoop/lib/hadoop-lzo.jar com.hadoop.compression.lzo.DistributedLzoIndexer /tmp/lzo_test

However, it failed with the following error:

16/09/10 03:05:51 INFO mapreduce.Job: Task Id : attempt_1473404927068_0005_m_000000_0, Status : FAILED
Error: java.lang.NullPointerException
    at com.hadoop.mapreduce.LzoSplitRecordReader.initialize(LzoSplitRecordReader.java:50)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:548)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:786)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at …
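The post's resolution is truncated above; for context only, one setting that is commonly checked when LzoSplitRecordReader throws a NullPointerException during initialize is whether the LZO codecs are registered in io.compression.codecs, since the codec lookup for the .lzo suffix returns null otherwise. The fragment below is an illustrative core-site.xml snippet, not quoted from the post:

<!-- Illustrative only: register the LZO codecs so the .lzo suffix maps to LzopCodec -->
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
</property>

With the codecs registered on both the client and the cluster, the same DistributedLzoIndexer command can be rerun against the files under /tmp/lzo_test.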

Enable Snappy Compression For Flume

Snappy is a compression/decompression library developed by Google. It aims for very high speed and reasonable compression (the output may be larger than that of other standard compression algorithms, but compression and decompression are much faster). Snappy ships with Hadoop, unlike LZO compression, which is excluded due to licensing issues. To enable Snappy in your …
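The excerpt stops short of the configuration itself; as a rough sketch, Snappy output from Flume's HDFS sink is typically enabled through the hdfs.fileType and hdfs.codeC sink properties. The agent and sink names below (agent1, hdfs-sink) and the HDFS path are illustrative, not taken from the post:

# Illustrative Flume agent configuration for a Snappy-compressed HDFS sink
agent1.sinks.hdfs-sink.type = hdfs
agent1.sinks.hdfs-sink.hdfs.path = hdfs:///flume/events
agent1.sinks.hdfs-sink.hdfs.fileType = CompressedStream
agent1.sinks.hdfs-sink.hdfs.codeC = snappy

For this to work, the Snappy native libraries typically also need to be available to the Flume agent's JVM (for example via java.library.path).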

Compile Hadoop LZO Compression Library on CentOS

To compile and install Hadoop’s LZO compression library on CentOS, follow the steps below: Download the hadoop-lzo source from Kevin’s Hadoop LZO Project. If you are using an Ant version older than 1.7, please download the latest Ant binary package from Apache Ant; otherwise you will get the following error when compiling: …
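The full steps are truncated above; for orientation, a typical build of the Ant-based hadoop-lzo project on CentOS looks roughly like the following. Package names and paths are the usual defaults, not quoted from the post:

# Native build prerequisites
sudo yum install -y lzo lzo-devel gcc ant

# Point the build at your JDK and build the native bits (64-bit flags shown)
export JAVA_HOME=/usr/java/default
export CFLAGS=-m64
export CXXFLAGS=-m64
ant clean compile-native tar

The resulting jar and native libraries under build/ are then copied into Hadoop's lib and lib/native directories on each node.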