My Patch for SQOOP-3042 Committed

I have received a lot of complaints from Cloudera customers that the table class JAR files are not cleaned up after a Sqoop job finishes. By default, they are saved under /tmp/sqoop-{username}/compile for use by the currently running job. They are no longer needed once the job finishes, so they should be cleaned up.

The contents of the directory look like this:

[root@localhost ~]# ll /tmp/sqoop-hadoop/compile/
total 16
drwxrwxr-x. 2 hadoop hadoop 4096 Jun  6 08:56 1496d8f8400052af2a7d3ede2cfe496d
drwxrwxr-x. 2 hadoop hadoop 4096 Jun  6 08:45 6360b964ea0c1fdf6bf6aaed7a35b986
drwxrwxr-x. 2 hadoop hadoop 4096 Jun  6 08:45 d4ccb83934494ba2874b5c6d1b51d2ac
drwxrwxr-x. 2 hadoop hadoop 4096 Jun  6 08:50 df37a566defbfac477f6f309cf227dec
[root@localhost ~]# ll /tmp/sqoop-hadoop/compile/1496d8f8400052af2a7d3ede2cfe496d
total 56
-rw-rw-r--. 1 hadoop hadoop   620 Jun  6 08:56 SQOOP_3042$1.class
-rw-rw-r--. 1 hadoop hadoop   617 Jun  6 08:56 SQOOP_3042$2.class
-rw-rw-r--. 1 hadoop hadoop   620 Jun  6 08:56 SQOOP_3042$3.class
-rw-rw-r--. 1 hadoop hadoop   516 Jun  6 08:56 SQOOP_3042.avsc
-rw-rw-r--. 1 hadoop hadoop 10389 Jun  6 08:56 SQOOP_3042.class
-rw-rw-r--. 1 hadoop hadoop   237 Jun  6 08:56 SQOOP_3042$FieldSetterCommand.class
-rw-rw-r--. 1 hadoop hadoop  6063 Jun  6 08:56 SQOOP_3042.jar
-rw-rw-r--. 1 hadoop hadoop 12847 Jun  6 08:56 SQOOP_3042.java

I created an upstream JIRA, SQOOP-3042, in Nov 2016 to track and fix the issue. I provided a patch at the time, but it was never looked at due to a lack of reviewers.

After getting help from the Cloudera Sqoop engineers on our Budapest team, I finally got the JIRA moving in the last few weeks, and the patch was committed to Sqoop trunk yesterday. Details can be seen here: https://github.com/apache/sqoop/commit/0cfbf56713f7574568ea3754f6854e82f5540954

The fix adds a new command-line option, "--delete-compile-dir", so that users can instruct Sqoop to remove those temporary directories after the job finishes. The reason for making this an opt-in option is to avoid changing Sqoop's default behaviour, while still letting users who want the cleanup request it explicitly.

An example command looks like this:

sqoop import --connect jdbc:mysql://localhost/test --username root --password pass --table SQOOP_3042 --target-dir /tmp/erictest --delete-target-dir --verbose --delete-compile-dir

In --verbose mode, you can see the message below, which confirms that the directory and its files are removed:

....
18/06/06 17:39:27 INFO mapreduce.ImportJobBase: Transferred 52 bytes in 29.6139 seconds (1.7559 bytes/sec)
18/06/06 17:39:27 INFO mapreduce.ImportJobBase: Retrieved 4 records.
18/06/06 17:39:27 DEBUG util.ClassLoaderStack: Restoring classloader: sun.misc.Launcher$AppClassLoader@6f1fba17
18/06/06 17:39:28 DEBUG util.DirCleanupHook: Removing directory: /tmp/sqoop-hadoop/compile/a9d8a87bc02a5f823a82014c49516736 in the clean up hook.
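
The util.DirCleanupHook line in the log above suggests that the removal runs in a JVM shutdown hook. As a rough illustration of that general technique, here is a minimal sketch using only the JDK. It is not the code from the actual commit: the class name CompileDirCleanupSketch and its methods are hypothetical, and the real implementation may differ in how it deletes files and logs errors.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical sketch only; NOT the actual Sqoop DirCleanupHook source.
class CompileDirCleanupSketch {

    // Delete a directory tree, removing children before their parents.
    static void deleteRecursively(Path dir) {
        if (!Files.exists(dir)) {
            return; // nothing to clean up
        }
        try (Stream<Path> paths = Files.walk(dir)) {
            // Reverse order visits deeper paths first, so every
            // directory is already empty by the time it is deleted.
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Register a JVM shutdown hook that removes the directory at exit.
    // The hook is returned so that a caller can unregister it if needed.
    static Thread registerCleanupHook(Path dir) {
        Thread hook = new Thread(() -> deleteRecursively(dir), "compile-dir-cleanup");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a per-job compile directory (assumed name, for the demo only).
        Path compileDir = Files.createTempDirectory("sqoop-compile-sketch");
        Files.createFile(compileDir.resolve("SQOOP_3042.jar"));
        registerCleanupHook(compileDir);
        System.out.println("Will remove at JVM exit: " + compileDir);
    }
}
```

Registering the cleanup as a shutdown hook means the directory stays available for the whole lifetime of the job and is only removed when the JVM exits, which matches the timing of the last log line above.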