Impala Failed to Read Parquet Decimal Data

If you are using Hive JDBC/ODBC driver to create Parquet data via Hive, and you face below error when reading from Impala:

File 'hdfs://namespace1/path/to/parquet/file/15110390563421' column 'column1' has a scale that does not match the table metadata scale. 
File metadata scale: 0 Table metadata scale: 2 when we try to execute in Impala.

It is most likely that you hit a known issue HIVE-15519.

Due to this bug, the DECIMAL scale value returned from the driver was incorrect (returns null), hence caused the data being written to parquet file with wrong scale value for those columns. And Impala will complain that the column’s definition at metadata side is not matching with the column type stored in Parquet file, due to different scale values.

Currently there is no workarounds available for this issue, and a patch is required. Or you will have to wait it to be fixed. At the time of writing, the latest Cloudera CDH is at 5.13.1.

Leave a Reply

Your email address will not be published. Required fields are marked *