Impala query failed with error: “Incompatible Parquet Schema”

Yesterday I was dealing with an issue where a very simple Impala SELECT query failed with an “Incompatible Parquet schema” error. I confirmed that the following workflow triggers the error: (1) a Parquet file is created by an external library; (2) the Parquet file is loaded into a Hive/Impala table; (3) the table is queried …

Impala Reported Corrupt Parquet File After Failed With OutOfMemory Error

Recently I was dealing with an issue where Impala reported a corrupt Parquet file after it failed with an OutOfMemory error; when it did not fail, no corruption was reported. The following error message was reported in the Impala Daemon logs: Memory limit exceeded HdfsParquetScanner::ReadDataPage() failed to allocate 65535 bytes for decompressed …

How to redirect parquet’s log message into STDERR rather than STDOUT

This article explains the steps needed to redirect Parquet’s log messages from STDOUT to STDERR, so that the Hive output is not polluted should the user want to capture the query result on the command line. In its code base, Parquet writes its logging information directly to STDOUT, …