Month: May 2015

Timestamp stored in Parquet file format in Impala Showing GMT Value

This article explains why Impala and Hive return different timestamp values for the same table when the table was created, and its values inserted, from Hive. It also outlines the steps to force Impala to apply local time-zone conversion when reading timestamp fields stored in the Parquet file format. When Hive stores a timestamp …
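The conversion described above can be enabled through a real impalad startup flag; how the flag is passed to the daemons depends on your deployment, so the snippet below is a sketch rather than exact restart instructions:

```
# Restart each impalad with local time-zone conversion enabled for
# timestamps that Hive wrote into Parquet files. The flag itself is a
# genuine impalad option; the restart mechanics vary by deployment.
#
#   --convert_legacy_hive_parquet_utc_timestamps=true
```

With the flag on, Impala converts the UTC values Hive stored back to the server's local time zone at read time, at some cost to scan performance.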

How to ask Sqoop to empty NULL valued fields when importing into Hive

Data imported from Postgres into Hive has many fields with “null” as the literal string value, including fields with the BIGINT data type. When Impala tries to read a table containing such data, it produces many warning messages: WARNINGS: Backend 2:Error converting column: 6 TO BIGINT (Data is: null) To force …
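Sqoop's `--null-string` and `--null-non-string` options control what gets written for NULL database values; asking for Hive's default NULL marker (`\N`) instead of the literal string "null" is one way to address this. The connection details below are placeholders for illustration:

```
# Sketch: write \N (Hive's default NULL representation) for NULLs in both
# string and non-string columns. Host, database, and table names are
# illustrative placeholders.
sqoop import \
  --connect jdbc:postgresql://dbhost/mydb \
  --table my_table \
  --hive-import \
  --null-string '\\N' \
  --null-non-string '\\N'
```

The double backslash is needed so that `\N`, not a literal newline escape, reaches the generated code.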

How to control the number of mappers required for a Hive query

This article explains how to increase or decrease the number of mappers used for a particular Hive query. Setting both “mapreduce.input.fileinputformat.split.maxsize” and “mapreduce.input.fileinputformat.split.minsize” to the same value will, in most cases, control the number of mappers (either increasing or decreasing it) when Hive is running a particular …
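As a sketch of the technique above, pinning both split-size properties to the same value in the Hive session fixes the input split size, so the mapper count becomes roughly total input size divided by that value (256 MB here is just an example figure):

```
-- Pin the input split size to 256 MB (268435456 bytes).
SET mapreduce.input.fileinputformat.split.maxsize=268435456;
SET mapreduce.input.fileinputformat.split.minsize=268435456;
-- A smaller value yields more mappers; a larger value yields fewer.
```

These are session-level settings, so they only affect queries run after the `SET` statements in the same session.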

Hive Shows NULL Value to New Column Added to a Partitioned Table With Existing Data

Today I discovered a bug where Hive cannot recognise the existing data for a newly added column on a partitioned external table. In this post, I explain the steps to reproduce the issue as well as the workaround. Firstly, I prepared the data in text format, called test.txt, …
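The issue and one common workaround can be sketched as follows; the table and column names are hypothetical, and the `CASCADE` clause assumes a Hive version that supports it (it was added in Hive 1.1.0):

```
-- Reproduce (sketch): add a column to a partitioned external table whose
-- partitions already contain data for that column.
ALTER TABLE test_table ADD COLUMNS (new_col STRING);
-- Queries against existing partitions now return NULL for new_col,
-- because the old partition metadata does not include the new column.

-- Workaround (assumption: Hive 1.1.0+): CASCADE pushes the schema change
-- down into the metadata of existing partitions as well.
ALTER TABLE test_table ADD COLUMNS (new_col STRING) CASCADE;
```

Without `CASCADE`, the change applies only to the table-level schema and to partitions created afterwards.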

