Timestamp stored in Parquet file format in Impala Showing GMT Value

This article explains why Impala and Hive return different timestamp values for the same table when that table was created, and its values inserted, from Hive. It also outlines the steps to force Impala to apply local time zone conversion when reading timestamp fields stored in Parquet format. When Hive stores a timestamp …
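
The excerpt stops before the details, so the following is only a toy model of the effect described in the title. It assumes the stored value has been normalised to UTC on write, and that one reader returns it unchanged (the GMT value) while the other converts it back to local time. The fixed UTC+10 zone and the function names are hypothetical, chosen so the sketch needs no time zone database.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical writer time zone: a fixed UTC+10 offset (no DST), used only
# so the example runs without a tz database.
LOCAL_TZ = timezone(timedelta(hours=10))


def store_as_utc(local_value: datetime) -> datetime:
    """Normalise a naive local wall-clock timestamp to naive UTC before storing it."""
    return local_value.replace(tzinfo=LOCAL_TZ).astimezone(timezone.utc).replace(tzinfo=None)


def read_raw(stored: datetime) -> datetime:
    """Return the stored value unchanged, i.e. the GMT/UTC wall-clock time."""
    return stored


def read_as_local(stored: datetime) -> datetime:
    """Interpret the stored value as UTC and convert it back to local wall-clock time."""
    return stored.replace(tzinfo=timezone.utc).astimezone(LOCAL_TZ).replace(tzinfo=None)


if __name__ == "__main__":
    inserted = datetime(2015, 5, 11, 20, 42, 55)
    stored = store_as_utc(inserted)
    print("stored (UTC):", stored)                 # 2015-05-11 10:42:55
    print("raw read:    ", read_raw(stored))       # 2015-05-11 10:42:55
    print("local read:  ", read_as_local(stored))  # 2015-05-11 20:42:55
```

Under these assumptions the two readers disagree by exactly the writer's UTC offset, which is the kind of discrepancy the article describes.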

How to control the number of mappers required for a Hive query

This article explains how to increase or decrease the number of mappers used by a particular Hive query. Setting both “mapreduce.input.fileinputformat.split.maxsize” and “mapreduce.input.fileinputformat.split.minsize” to the same value will, in most cases, control the number of mappers (either increasing or decreasing it) used when Hive is running a particular …
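
Why pinning both settings works can be seen from the split arithmetic in Hadoop's FileInputFormat, where the split size is max(minSize, min(maxSize, blockSize)). Once min and max are equal, the block size no longer matters. The Python below is a sketch of that per-file calculation, not Hadoop code. Hive's default CombineHiveInputFormat groups files and blocks differently, so real mapper counts may differ from this estimate.

```python
# Mirrors the per-file split arithmetic of Hadoop's FileInputFormat:
# splitSize = max(minSize, min(maxSize, blockSize)), plus a 10% "slop"
# so a small tail is folded into the last split instead of becoming its own.
SPLIT_SLOP = 1.1


def compute_split_size(block_size: int, min_size: int, max_size: int) -> int:
    return max(min_size, min(max_size, block_size))


def estimate_mappers(file_size: int, block_size: int, min_size: int, max_size: int) -> int:
    """Estimate the number of input splits (one mapper each) for one splittable file."""
    split_size = compute_split_size(block_size, min_size, max_size)
    remaining = file_size
    splits = 0
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    if remaining != 0:
        splits += 1  # the final split, up to 1.1x the split size
    return splits


if __name__ == "__main__":
    MiB = 1024 * 1024
    block = 128 * MiB
    for size_mib in (64, 128, 256):
        n = estimate_mappers(1024 * MiB, block, size_mib * MiB, size_mib * MiB)
        print(f"min = max = {size_mib} MiB -> {n} mappers")
```

For a 1 GiB file with 128 MiB blocks, pinning both values to 64 MiB gives 16 splits, and pinning them to 256 MiB gives 4. A smaller value therefore increases the mapper count and a larger one decreases it.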

Sqoop Fails with FileNotFoundException in CDH

The following exceptions occur when executing Sqoop on a cluster managed by Cloudera Manager:

15/05/11 20:42:55 WARN security.UserGroupInformation: PriviledgedActionException as:root (auth:SIMPLE) cause:java.io.FileNotFoundException: File does not exist: hdfs://nameservice1/mnt/var/opt/CDH-5.3.3-1.cdh5.3.3.p0.5/lib/sqoop/lib/hsqldb-1.8.0.10.jar
15/05/11 20:42:55 ERROR tool.ImportTool: Encountered IOException running import job: java.io.FileNotFoundException: File does not exist: hdfs://nameservice1/mnt/var/opt/CDH-5.3.3-1.cdh5.3.3.p0.5/lib/sqoop/lib/hsqldb-1.8.0.10.jar
    at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1093)
    at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1085)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1085)
…
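
The excerpt stops before giving a cause. As a diagnostic aid, a small helper can pull the missing file out of log output like this and split the URI into scheme, authority and path. The helper is hypothetical (missing_paths is not part of Sqoop or Hadoop). Its output makes it easy to see that the jar is being looked up on HDFS (hdfs://nameservice1) under what appears to be a local parcel directory.

```python
import re
from urllib.parse import urlparse

# Matches the URI reported by java.io.FileNotFoundException in the log text.
MISSING_FILE = re.compile(r"File does not exist: (\S+)")


def missing_paths(log_text: str) -> list[tuple[str, str, str]]:
    """Return unique (scheme, authority, path) tuples for each missing file in the log."""
    results = []
    # dict.fromkeys removes duplicate URIs while keeping first-seen order.
    for uri in dict.fromkeys(MISSING_FILE.findall(log_text)):
        parsed = urlparse(uri)
        results.append((parsed.scheme, parsed.netloc, parsed.path))
    return results


if __name__ == "__main__":
    log = (
        "15/05/11 20:42:55 ERROR tool.ImportTool: Encountered IOException running import job: "
        "java.io.FileNotFoundException: File does not exist: "
        "hdfs://nameservice1/mnt/var/opt/CDH-5.3.3-1.cdh5.3.3.p0.5/lib/sqoop/lib/hsqldb-1.8.0.10.jar"
    )
    for scheme, authority, path in missing_paths(log):
        print(f"filesystem: {scheme}://{authority}  path: {path}")
```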

Hive Shows NULL Values for a New Column Added to a Partitioned Table With Existing Data

Today I discovered a bug where Hive cannot recognise the existing data for a newly added column in a partitioned external table. In this post, I explain the steps to reproduce the issue as well as the workaround. Firstly, I prepared the data in text format, in a file called test.txt, …
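
The excerpt stops before the cause. One commonly described explanation, assumed here rather than taken from the post, is that each existing partition keeps the column list it had when it was created, so adding a column changes only the table-level schema. The Python below is a toy model of that behaviour, not Hive code. read_partition, the comma delimiter and the sample rows are all hypothetical.

```python
from typing import Dict, List, Optional


def read_partition(
    lines: List[str],
    partition_columns: List[str],
    table_columns: List[str],
    delimiter: str = ",",
) -> List[Dict[str, Optional[str]]]:
    """Toy reader: map fields by position using the partition's own column list."""
    rows = []
    for line in lines:
        fields = line.split(delimiter)
        # Only columns the partition metadata knows about receive a value;
        # any extra fields present in the file are ignored.
        known = dict(zip(partition_columns, fields))
        # Columns that exist only in the table-level schema come back as None (NULL).
        rows.append({col: known.get(col) for col in table_columns})
    return rows


if __name__ == "__main__":
    data = ["1,alice,x", "2,bob,y"]  # hypothetical rows; the file already holds new_col
    table_cols = ["id", "name", "new_col"]
    print(read_partition(data, ["id", "name"], table_cols))  # stale partition schema
    print(read_partition(data, table_cols, table_cols))      # partition schema updated
```

In this model the data file already contains the new column's values. They only become visible once the partition's own column list includes the new column.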