When an Impala query failed with an OOM error, it also reported a corrupt Parquet file:

```
Memory limit exceeded
HdfsParquetScanner::ReadDataPage() failed to allocate 65535 bytes for decompressed data.
Corrupt Parquet file 'hdfs://nameservice1/path/to/file/914164e7120e6076-cdae1be60000001f_169433548_data.0.parq': column 'client_ord_id' had 1024 remaining values but expected 0 _
[Executed: 4/29/2017 5:28:58 AM] [Execution: 588ms]
```

This is reported in the upstream JIRA IMPALA-5197. It can happen in the following scenarios (an example query for the LIMIT case follows the list):
- Query failed with OOM error
- There is a LIMIT clause in the query
- Query is manually cancelled by the user
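As a hedged illustration of the LIMIT case (the table name below is hypothetical): once enough rows have been returned, Impala cancels the remaining scanner threads, and on versions without the IMPALA-5197 fix the Parquet scanner can log the misleading "Corrupt Parquet file ... remaining values" message even though the file is fine:

```sql
-- Hypothetical query: the LIMIT lets Impala stop scanning partway through
-- a Parquet row group, so the scanner sees "remaining values" it never
-- consumed and, on affected versions, wrongly reports the file as corrupt.
SELECT *
FROM trades
WHERE client_ord_id IS NOT NULL
LIMIT 10;
```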
For the first scenario, the OOM failure, the workaround is to increase the memory limit for the query:

```sql
SET MEM_LIMIT=10g;
```

For the other two causes, we will need to wait for IMPALA-5197 to be fixed.

Update: IMPALA-5197 has been fixed since CDH5.12.0, as well as CDH5.10.2, CDH5.9.3 and CDH5.11.2.
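As a sketch of how the workaround might be applied in a session (the query, table, and column names are made up for illustration; 10g is only an example value):

```sql
-- Raise the per-query memory limit for this session before re-running
-- the query that failed with the OOM error.
SET MEM_LIMIT=10g;

-- Hypothetical query that previously hit "Memory limit exceeded".
SELECT client_ord_id, COUNT(*) AS cnt
FROM orders
GROUP BY client_ord_id;

-- Setting MEM_LIMIT back to 0 removes the session-level limit so the
-- cluster defaults apply again.
SET MEM_LIMIT=0;
```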
It has been fixed in Impala version 2.9.0.
Thanks Surendranatha for your update. Yes, it has been fixed since CDH5.12.0, as well as CDH5.10.2, CDH5.9.3 and CDH5.11.2.