Impala query failed with error: “Incompatible Parquet Schema”

ghostwriter uni wien Yesterday, I was dealing with an issue that when running a very simple Impala SELECT query, it failed with “Incompatible Parquet schema” error. I have confirmed the following workflow that triggered the error:

  1. Parquet file is created from external library
  2. Load the parquet file into Hive/Impala table
  3. Query the table through Impala will fail with below error message cheap essays for sale incompatible Parquet schema for column 'db_name.tbl_name.col_name'. Column type: DECIMAL(19, 0), Parquet schema:\noptional byte_array col_name [i:2 d:1 r:0]

  4. The same query works well in Hive

go This is due to impala currently does not support all decimal specs that are supported by Parquet. Currently Parquet supports the following specs:

  • int32: for 1 <= precision <= 9
  • int64: for 1 <= precision <= 18; precision < 10 will produce a warning
  • fixed_len_byte_array: precision is limited by the array size. Length n can store <= floor(log_10(2^(8*n - 1) - 1)) base-10 digits
  • binaryprecision is not limited, but is required. The minimum number of bytes to store the unscaled value should be used.

Please refer to Parquet Logical Type Definitions page for details. old dominion admissions essay However, Impala only supports fixed_len_byte_array, but no others. This has been reported in the upstream JIRA: IMPALA-2494

legitimate custom writing services The only workaround for now is to create a parquet file that will use supported specs for Decimal column, or simply create parquet file through either Hive or Impala.

Impala Query fails with NoSuchObjectException error

follow site In the last few months, I have seem CDH users hitting Impala query returning NoSuchObjectException error very often. This happens when running query against a particular table with INT partition types and it failed with below message:

Custom Writings Discount Code WARNINGS: ImpalaRuntimeException: Error making 'alter_partitions' RPC to Hive Metastore: CAUSED BY: InvalidOperationException: Alter partition operation failed: NoSuchObjectException(message:partition values=[2017, 6, 1, 8]) We have confirmed that the table has four partitions with Integer data type, and select individual partition works.

The following scenario will trigger such error:

  • Partitions with INT data type
  • Partition data was inserted from Hive with zero prefixes, something like below query:

    see INSERT OVERWRITE TABLE test_tbl PARTITION (year = '2017', month = '06'....) .....

  • Partition data will be created under HDFS location like below:

    next hdfs://nameservice1/user/hive/warehouse/test_tbl/year=2017/month=06/day=01/hour=08

  • When query through Impala, since the data type is INT, Impala will convert values from “06” to 6, “01” to 1 etc, and will be looking for location : hdfs://nameservice1/user/hive/warehouse/test_tbl/year=2017/month=6/day=1/hour=8

    instead​ of: hdfs://nameservice1/user/hive/warehouse/test_tbl/year=2017/month=06/day=01/hour=08 hence triggered NoSuchObjectException error.

district manager loss prevention resume To fix the issue, there are two options:

  1. Convert the data type of partition columns to String, instead of Integer:

    1000 words essay ALTER TABLE test_tbl PARTITION COLUMN (year string); ALTER TABLE test_tbl PARTITION COLUMN (month string); ALTER TABLE test_tbl PARTITION COLUMN (day string); ALTER TABLE test_tbl PARTITION COLUMN (hour string);

  2. if integer type need to be kept, then we will need to re-build the table into a new one and store them into locations without leading zeros. This can be done by running the following queries from Impala: CREATE TABLE new_test_tbl LIKE test_tbl; INSERT OVERWRITE TABLE new_test_tbl PARTITION (year, month, day, hour) as SELECT * FROM test_tbl;

    writing an accounting dissertation The new table will have leading zeros in partitions removed and then we can switch over to use the new table. When writing more data into the new table through Hive, please be sure to remove all leading zeros to prevent the issue from happening again.

watch Above steps should help resolve the issue. Hope they will help.

Unable to Import Data as Parquet into Encrypted HDFS Zone

here Recently I have discovered an issue in Sqoop that when it is importing data into Hive table, whose location is in an encrypted HDFS zone, the Sqoop command will fail with the following errors:

source site Command:

home page sqoop import --connect <postgres_url> --username <username> --password <password> \ --table sourceTable --split-by id --hive-import --hive-database staging \ --hive-table hiveTable --as-parquetfile Errors:

distracted from homework 2017-05-24 13:38:51,539 INFO [Thread-84] Setting job diagnostics to Job commit failed: Could not move contents of hdfs://nameservice1/tmp/staging/. temp/job_1495453174050_1035/mr/job_1495453174050_1035 to hdfs://nameservice1/user/hive/warehouse/staging.db/hiveTable at at at at$MergeOutputCommitter.commitJob( at$EventProcessor.handleJobCommit( at$ at java.util.concurrent.ThreadPoolExecutor.runWorker( at java.util.concurrent.ThreadPoolExecutor$ at Caused by: org.apache.hadoop.ipc.RemoteException( /tmp/staging/.temp/job_1495453174050_1035/mr/job_1495453174050_1035/964f7b5e-2f55-421d-bfb6-7613cc4bf26e.parquet can't be moved into an encryption zone. at org.apache.hadoop.hdfs.server.namenode.EncryptionZoneManager.checkMoveValidity( at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedRenameTo( at org.apache.hadoop.hdfs.server.namenode.FSDirectory.renameTo( at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameToInternal( at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameToInt( at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameTo(

custom writing essay reviews After some research, I have found out that it is caused by a known Sqoop bug: SQOOP-2943. This happens because Sqoop currently uses the Kite SDK to generate Parquet file, and the Kite SDK uses the /tmp directory to generate the parquet file on the fly. Because the /tmp directory is not encrypted and the hive warehouse directory is encrypted, the final move command to move the parquet file from the /tmp directory to hive warehouse will fail due to the encryption.

transfer college essay The import only fails with parquet format, the text file format currently works as expected.

enter As SQOOP-2943 is not fixed at this stage, and there is no direct workarounds, I would suggest the following two methods for importing the data into a Hive parquet table, inside the encrypted warehouse:

  • Import the data as text file format into Hive temporary table inside the Hive warehouse (encrypted), and then use Hive query to copy data into destination parquet table
  • Import the data as parquet file into non-encrypted temporary directory outside of Hive warehouse, and then again use Hive to copy data into destination parquet table inside the Hive warehouse (encrypted) who can help me write a paper for money Hope above can help with anyone who noticed the similar issues.

Hive Query Failed with Token Renewer Error | Hive on Spark

click If you run Hive on Spark on some CDH versions, you might run into issues when Hive is trying to renew HDFS delegation tokens. See below error message (that you can find from HiveServer2 server logs):

dont do your childs homework 2017-06-27 17:04:08,836 INFO org.apache.hive.spark.client.SparkClientImpl: [stderr-redir-1]: 17/06/27 17:04:08 WARN security.UserGroupInformation: PriviledgedActionException as:testuser (auth:PROXY) via hive/ (auth:KERBEROS) testuser tries to renew a token with renewer hive

source site If you do some googling, you should be able to locate the corresponding upstream Hive JIRA for this issue: HIVE-15485. And from this JIRA, you should also be able to identify that the issue was introduced by HIVE-14383. This is due to the fact that Spark needs the principal/keytab passed in via –principal and –keytab options, and does the renewal by copying the keytab to the cluster and handling login to kerberos inside the application. But the option –principal and –keytab could not work with –proxy-user in, so at this moment we could support either the token renewal or the impersonation, but not both. The only way to avoid such issue is to upgrade CDH to the version that has the fix for HIVE-15485, which has been fixed in the following releases:

CDH5.8.5 CDH5.9.2 CDH5.10.1, CDH5.10.2 CDH5.11.0, CDH5.11.1

Since HIVE-14383 was introduced in the following CDH:

CDH5.8.3, CDH5.8.4, CDH5.8.5
CDH5.9.1, CDH5.9.2
CDH5.10.0, CDH5.10.1, CDH5.10.2 
CDH5.11.0, CDH5.11.1

This makes the following CDH currently will have such issues:

CDH5.8.3, CDH5.8.4, CDH5.9.1, CDH5.10.0

Please deploy the latest maintenance release for your major version to avoid such issue in Hive on Spark.

Hope above helps.

Read Files under Sub-Directories for Hive and Impala

Sometimes you might want to store data under sub-directories in HDFS and then you want Hive or Impala to read from those sub-directories. For example, you have the following directory structure:

root hdfs     231206 2017-06-30 02:45 /test/table1/000000_0
root hdfs          0 2017-06-30 02:45 /test/table1/child_directory
root hdfs     231206 2017-06-30 02:45 /test/table1/child_directory/000000_0

By default, Hive will only look for files in the root of directory specified, in my test case is /test/table1. However, Hive supports to read all data under the root table’s sub-directories as well. This can be achieved by updating the following settings:

SET mapred.input.dir.recursive=true;
SET hive.mapred.supports.subdirectories=true;

Impala however, on the other side, currently does not support reading files from table’s sub-directories. This has been reported in the upstream JIRA of IMPALA-1944. Currently there is no immediate plan to support such feature, but it might be in the future release of Impala.

Hope above information is useful.