Copy Hadoop Data From One HDFS to Another

If you have two HDFS clusters operating in two different places (production vs. alpha, for example), you might sometimes want to copy data from one cluster to the other. This is easy to do with Hadoop’s built-in “distcp” command:

    hadoop distcp hdfs://hadoop-namenode/data/2013/01 hdfs:///data/2013/

We have the following directory structure in …
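
As a rough illustration of the distcp invocation quoted in this excerpt, here is a minimal shell sketch that wraps it in a dry-run guard, so the command can be inspected before any data is copied. The two URIs are the ones from the excerpt; the `DRY_RUN` variable and the `distcp_cmd` helper are hypothetical names introduced only for this example and are not part of Hadoop.

```shell
#!/bin/sh
# Source and destination URIs, taken from the excerpt above.
SRC="hdfs://hadoop-namenode/data/2013/01"
DST="hdfs:///data/2013/"

# DRY_RUN is a hypothetical switch for this sketch: 1 (the default) only
# prints the command, any other value actually runs it.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the distcp command line for a source/destination pair.
distcp_cmd() {
    printf 'hadoop distcp %s %s\n' "$1" "$2"
}

CMD=$(distcp_cmd "$SRC" "$DST")

if [ "$DRY_RUN" = "1" ]; then
    # Show what would be executed without touching either cluster.
    echo "$CMD"
else
    # Run the copy; this requires the hadoop client on PATH and access to both clusters.
    hadoop distcp "$SRC" "$DST"
fi
```

Running the script with `DRY_RUN=0` performs the actual copy. Keeping the dry run as the default is just a cautious design choice for this sketch, since a copy between clusters can move a lot of data.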

Another Hadoop Deployment

We have just completed another Hadoop deployment to our processing system, this time to process our demographic data on a daily, weekly and monthly basis. This is the third Hadoop-based processing release in about a month. Everything has gone well and no problems have been found so far. We will continue with our fourth …

Enable Snappy Compression For Flume

Snappy is a compression/decompression library developed by Google. It aims for very high speed with reasonable compression: its output may be larger than that of other standard compression algorithms, but it compresses and decompresses faster. Snappy is shipped with Hadoop, unlike LZO compression, which is excluded due to licensing issues. To enable Snappy in your …