The following exceptions occur when executing Sqoop on a cluster managed by Cloudera Manager:

15/05/11 20:42:55 WARN security.UserGroupInformation: PriviledgedActionException as:root (auth:SIMPLE) cause:java.io.FileNotFoundException: File does not exist: hdfs://nameservice1/mnt/var/opt/CDH-5.3.3-1.cdh5.3.3.p0.5/lib/sqoop/lib/hsqldb-1.8.0.10.jar
15/05/11 20:42:55 ERROR tool.ImportTool: Encountered IOException running import job: java.io.FileNotFoundException: File does not exist: hdfs://nameservice1/mnt/var/opt/CDH-5.3.3-1.cdh5.3.3.p0.5/lib/sqoop/lib/hsqldb-1.8.0.10.jar
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1093)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1085)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1085)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:267)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:388)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:481)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1295)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1292)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1292)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1313)
at org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:198)
at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:171)
at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:268)
at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:665)
at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:118)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
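
For reference, an ordinary import of roughly the following shape is enough to hit this code path; the JDBC URL, database, user and table below are placeholders, not the actual values from the failing job:

# Placeholder connection details; any plain Sqoop import submits a MapReduce job
# and fails the same way if the host lacks the YARN client configuration.
sqoop import \
    --connect jdbc:mysql://dbhost:3306/testdb \
    --username dbuser -P \
    --table employees \
    --target-dir /user/root/employees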

This is caused by Sqoop needing the Hadoop client configuration on the host where it runs, which is deployed through a YARN Gateway role.

To fix this problem, in Cloudera Manager, go to:

YARN > Instances > Add Role Instances > Gateway (Select Hosts) > (select the target hosts) > OK

Then a “Deploy Client Configuration” is required; go to:

YARN > Instances > Actions (top right corner) > Deploy Client Configuration
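
If you want to confirm the configuration actually reached the host before retrying, the deployed client configuration normally ends up under /etc/hadoop/conf (an alternatives symlink managed by Cloudera Manager), so a quick check like this on the Sqoop host should show the YARN and MapReduce config files:

# Client configs deployed by Cloudera Manager are symlinked here via alternatives.
ls -l /etc/hadoop/conf
ls /etc/hadoop/conf/yarn-site.xml /etc/hadoop/conf/mapred-site.xml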

Now run the Sqoop command again and it should work.

The reason the YARN Gateway is needed is that Sqoop jobs are actually MapReduce jobs: the host running Sqoop has to know where the ResourceManager is, whether jobs run in local or cluster mode, and so on, in order to submit them. Deploying the client configuration gives the host that information, so it knows what to do when Sqoop is run.
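
As a rough illustration, two of the properties the client configuration carries are mapreduce.framework.name (which switches job submission from local mode to YARN) and yarn.resourcemanager.hostname / yarn.resourcemanager.address (which tell clients where the ResourceManager lives). On a host with the Gateway configuration deployed you should be able to see them with something like:

# mapreduce.framework.name should be "yarn"; yarn.resourcemanager.* points at the RM.
grep -A1 'mapreduce.framework.name' /etc/hadoop/conf/mapred-site.xml
grep -A1 'yarn.resourcemanager' /etc/hadoop/conf/yarn-site.xml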

The solution is not obvious, so I hope this helps.
