Securely Managing Passwords In Sqoop

Apache Sqoop became the Top-Level Project in Apache in March 2012. Since then, Sqoop has developed a lot and become very popular amongst Hadoop ecosystem. In this post, I will cover the ways to specify database passwords to Sqoop in a secure way.

The following ways are common to pass database passwords to Sqoop:

sqoop import --connect jdbc:mysql://myexample.com/test \
             --username myuser -P \
             --table mytable
sqoop import --connect jdbc:mysql://myexample.com/test 
             --username myuser \
             --password mypassword \
             --table mytable

The first one is secure as other people can’t see the password, however, it is only practical to use in the command line.

And we all agree that the second one is insecure as everyone can see what the password is to access the database.

The more secure way of passing the password is through the use of so called password file. The command as follows:

echo -n "password" > /home/ericlin/.mysql.password
chmod 400 /home/ericlin/.mysql.password
sqoop import --connect jdbc:mysql://myexample.com/test \
             --username myuser \
             --password-file /home/ericlin/.mysql.password \
             --table mytable

Please note that we need “-n” option for the “echo” command so that no newline will be added to the end of the password. And, please do not use “vim” to create the file as “vim” will add newline automatically to the end of the file, which will cause Sqoop to fail as the password contains a newline character.

However, storing password in a text file is still considered not secure even though we have set the permissions. As of Sqoop 1.4.5, Sqoop supports the use of JAVA Key Store to store passwords, so that you do not need to store passwords in clear text in a file.

To generate the key:

[ericlin@localhost ~] $ hadoop credential create mydb.password.alias -provider jceks://hdfs/user/ericlin/mysql.password.jceks
Enter password: 
Enter password again: 
mysql.password has been successfully created.
org.apache.hadoop.security.alias.JavaKeyStoreProvider has been updated.

On prompt, enter the password that will be used to access the database.

The “mydb.password.alias” is the alias that we can use to pass to Sqoop when running the command, so that no password is needed.

Then you can run the following Sqoop command:

sqoop import -Dhadoop.security.credential.provider.path=jceks://hdfs/user/ericlin/mysql.password.jceks \
             -–connect ‘jdbc:mysql://myexample.com/test’ \
             -–table mytable \
             -–username myuser \
             -–password-alias mydb.password.alias

This way password is hidden inside jceks://hdfs/user/ericlin/mysql.password.jceks and no one is able to see it.

Hope this helps.

    18 Comments

    1. Thank you for this, Eric! Very helpful. With the hdfs method of storing password, it would only work if you have a persisted hdfs, correct? Do you have any suggestions for EMR as well? Thanks.

      Reply

      1. Hi,

        I have not played with EMR before, so I don’t have much to say about it. But yes, you need to have persisted HDFS to store the password file.

        Reply

    2. I followed same steps, but i got this error ” INFO [main] sqoop.Sqoop: Running Sqoop version: 1.4.6
      CredentialProvider facility not available in the hadoop environment used

      I run his command
      sqoop import -Dhadoop.security.credential.provider.path=jceks://hdfs/user/hadoop/mysql.password.jceks –connect ‘jdbc:mysql://localhost/practice’ –table authors –username root –password-alias mydb.password.alias
      Whats wrong? Please reply.

      Reply

      1. I run this command to generate password
        hadoop credential create mydb.password.alias -provider jceks://hdfs/user/hadoop/mysql.password.jceks

        Reply

      2. Hi Venu,

        Sorry for the late reply. Can you please provide the verbose output of Sqoop command? So add “–verbose” to sqoop and paste the output here.

        Thanks

        Reply

    3. Hi,
      I run the same command , but it is giving unrecognised argument at –password-alias like below

      14/04/05 13:57:53 ERROR tool.BaseSqoopTool: Error parsing arguments for import:
      16/07/13 13:57:53 ERROR tool.BaseSqoopTool: Unrecognized argument: -Dhadoop.security.credential.provider.path=jceks://hdfs/user/vijay/myPassword.jceks
      16/07/13 13:57:53 ERROR tool.BaseSqoopTool: Unrecognized argument: –password-alias
      16/07/13 13:57:53 ERROR tool.BaseSqoopTool: Unrecognized argument: db2-dev-password

      plz help me….

      Reply

    4. Thank you very much Eric! Fantastic article, helped me to understand how the password credential provider works.

      Regards
      Sivakumar

      Reply

    5. very nice and clear post – I was very confused reading official hadoop documentation but your post and examples cleared this up for me. thanks Eric!

      Reply

    6. Hi Eric,

      Is there any protection to the credential provider? My point is if everyone can access the provider and retrieve the password, that’ll be essentially the same as everyone being able to see the password in plain text. Right?

      Reply

      1. Hi Rui,

        Sorry about the late reply. I am not sure any good ways, but I would think setting the file to be 600 will prevent others from accessing it. The password is already encrypted and it is very hard to steal password already.

        Reply

        1. Thanks Eric for the reply. So we need to rely on file permission to protect the key store, which is the same as the (plain text) password file method. But the difference is, unlike password file, the password is encrypted in key store file. Is this correct?

          Reply

          1. Hi Rui,

            Yeah, that’s correct, at least based on my understanding. If you try to cat the jceks file, you will noticed garbage outputs.

            Reply

    Leave a Reply

    Your email address will not be published. Required fields are marked *