sqoop import --connect --username --password \
    --table sourceTable --split-by id --hive-import --hive-database staging \
    --hive-table hiveTable --as-parquetfile

Errors:
2017-05-24 13:38:51,539 INFO [Thread-84] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Setting job diagnostics to Job commit failed: org.kitesdk.data.DatasetIOException: Could not move contents of hdfs://nameservice1/tmp/staging/.temp/job_1495453174050_1035/mr/job_1495453174050_1035 to hdfs://nameservice1/user/hive/warehouse/staging.db/hiveTable
    at org.kitesdk.data.spi.filesystem.FileSystemUtil.stageMove(FileSystemUtil.java:117)
    at org.kitesdk.data.spi.filesystem.FileSystemDataset.merge(FileSystemDataset.java:406)
    at org.kitesdk.data.spi.filesystem.FileSystemDataset.merge(FileSystemDataset.java:62)
    at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat$MergeOutputCommitter.commitJob(DatasetKeyOutputFormat.java:387)
    at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:274)
    at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:237)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): /tmp/staging/.temp/job_1495453174050_1035/mr/job_1495453174050_1035/964f7b5e-2f55-421d-bfb6-7613cc4bf26e.parquet can't be moved into an encryption zone.
    at org.apache.hadoop.hdfs.server.namenode.EncryptionZoneManager.checkMoveValidity(EncryptionZoneManager.java:284)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedRenameTo(FSDirectory.java:564)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.renameTo(FSDirectory.java:478)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameToInternal(FSNamesystem.java:3929)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameToInt(FSNamesystem.java:3891)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameTo(FSNamesystem.java:3856)

This is caused by a known Sqoop bug: SQOOP-2943. Sqoop uses the Kite SDK to generate Parquet files, and the Kite SDK stages the Parquet output under the /tmp directory before moving it into place. Because /tmp is not encrypted while the Hive warehouse directory is inside an HDFS encryption zone, the final move of the Parquet files from /tmp into the Hive warehouse fails, since HDFS does not allow renaming files into an encryption zone. The import only fails with Parquet format; text file format works as expected. Currently SQOOP-2943 is not fixed and there is no direct workaround. For the time being, the workarounds are to:
- Import the data in text file format into a temporary Hive table, then use a Hive query to copy the data into the destination Parquet table; OR
- Import the data as Parquet files into a temporary directory outside of the Hive warehouse, then again use Hive to copy the data into the destination Parquet table.
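The first workaround can be sketched as a two-step job. This is only a sketch, assuming the destination Parquet table staging.hiveTable already exists; the temporary table name hiveTable_txt, the JDBC URL, and the credentials are all placeholders, not values from the failing job above:

```shell
#!/bin/sh
# Step 1: import as plain text into a temporary Hive table. Text imports
# bypass the Kite SDK, so nothing is staged under /tmp and no cross-zone
# rename into the encrypted warehouse is attempted.
sqoop import --connect "$JDBC_URL" --username "$DB_USER" -P \
    --table sourceTable --split-by id \
    --hive-import --hive-database staging \
    --hive-table hiveTable_txt

# Step 2: copy the rows into the real Parquet table with a Hive query;
# the write happens entirely inside the encrypted warehouse directory,
# then drop the temporary text table.
hive -e "
INSERT OVERWRITE TABLE staging.hiveTable
SELECT * FROM staging.hiveTable_txt;
DROP TABLE staging.hiveTable_txt;"
```

Since Hive writes the Parquet files directly inside the warehouse's encryption zone, no rename across zone boundaries occurs and the job commit succeeds.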
Hi Eric,
Is this issue resolved? I have run into exactly the same problem.
Thanks.
Hi Kanhaiya,
Thanks for visiting my blog.
SQOOP-2943 is duplicated by SQOOP-3313, which has been fixed. However, the fix is not in CDH, though it might land in CDP in the near future. For now, you still have to use the workarounds.
Cheers
Eric