Access S3 in Hive through hadoop.security.credential.provider.path

A couple of years ago, I wrote a blog about how to securely managing passwords in Sqoop, so that the RDBMS’ password won’t be exposed to end users when running Sqoop jobs. The details can be found here.

Recently, we have one customer tested such feature with Hive accessing S3 credentials, and tried to set hadoop.security.credential.provider.path in Hive with the value of the jceks file’s location on HDFS, but not able to get it working. I have spent quite a few days researching on this topic and concluded that currently Hive does not support such feature to read S3A passwords from JCEK credential file.

I have the following findings:

1. When you create credential store files using below command:

hadoop credential create fs.s3a.access.key -value XXXXXXXXXXX \
    -provider jceks://hdfs/keystore/s3-key.jceks

Hadoop will produce below warning:

WARNING: You have accepted the use of the default provider password
by not configuring a password in one of the two following locations:
    * In the environment variable HADOOP_CREDSTORE_PASSWORD
    * In a file referred to by the configuration entry
      hadoop.security.credstore.java-keystore-provider.password-file.
Please review the documentation regarding provider passwords in
the keystore passwords section of the Credential Provider API
Continuing with the default provider password.

I believe that the default password for HADOOP_CREDSTORE_PASSWORD is “none” if you do not set it before running the command.

2. Both HiveServer2 and HiveMetaStore will generate their own HADOOP_CREDSTORE_PASSWORD by Cloudera Manager after each restart. To get the values for them, you can try to run below commands on their hosts:

strings /proc/$(pgrep -f HiveServer2)/environ | grep HADOOP_CREDSTORE_PASSWORD

strings /proc/$(pgrep -f HiveMetaStore)/environ | grep HADOOP_CREDSTORE_PASSWORD

In my case, it returned below output:

strings /proc/$(pgrep -f HiveServer2)/environ | grep HADOOP_CREDSTORE_PASSWORD
HADOOP_CREDSTORE_PASSWORD=dq440oxl9rgopo9c593k86vch

strings /proc/$(pgrep -f HiveMetaStore)/environ | grep HADOOP_CREDSTORE_PASSWORD
HADOOP_CREDSTORE_PASSWORD=8sttjj54e9ffdo6ovqcufxmp4

You can see that the HADOOP_CREDSTORE_PASSWORD is different between them, so both HiveServer2 and HiveMetaStore will not be able to read the manually generated jceks files as password does not match.

3. I also tried to override the environment variable HADOOP_CREDSTORE_PASSWORD for both HiveServer2 and HiveMetaStore, via “Hive Metastore Server Environment Advanced Configuration Snippet (Safety Valve)” and “HiveServer2 Environment Advanced Configuration Snippet (Safety Valve)” under Cloudera Manager > Hive > Configuration page, however, this will break HiveMetaStore with below errors:

2018-11-03 02:44:39,569 ERROR org.apache.hadoop.hive.metastore.HiveMetaStore: [main]: Metastore Thrift Server threw an exception...
java.lang.RuntimeException: Error getting metastore password: null
	at org.apache.hadoop.hive.metastore.ObjectStore.getDataSourceProps(ObjectStore.java:403)
	at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:279)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
	at org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:60)
	at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:69)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:682)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:660)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:709)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:508)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:78)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:84)
	at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:6475)
	at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:6470)
	at org.apache.hadoop.hive.metastore.HiveMetaStore.startMetaStore(HiveMetaStore.java:6720)
	at org.apache.hadoop.hive.metastore.HiveMetaStore.main(HiveMetaStore.java:6647)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.io.IOException
	at org.apache.hadoop.hive.shims.Hadoop23Shims.getPassword(Hadoop23Shims.java:1144)
	at org.apache.hadoop.hive.metastore.ObjectStore.getDataSourceProps(ObjectStore.java:397)
	... 21 more
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.hadoop.hive.shims.Hadoop23Shims.getPassword(Hadoop23Shims.java:1138)
	... 22 more
Caused by: java.io.IOException: Configuration problem with provider path.
	at org.apache.hadoop.conf.Configuration.getPasswordFromCredentialProviders(Configuration.java:2118)
	at org.apache.hadoop.conf.Configuration.getPassword(Configuration.java:2037)
	... 27 more
Caused by: java.io.IOException: Keystore was tampered with, or password was incorrect
	at com.sun.crypto.provider.JceKeyStore.engineLoad(JceKeyStore.java:865)
	at java.security.KeyStore.load(KeyStore.java:1445)
	at org.apache.hadoop.security.alias.AbstractJavaKeyStoreProvider.locateKeystore(AbstractJavaKeyStoreProvider.java:335)
	at org.apache.hadoop.security.alias.AbstractJavaKeyStoreProvider.<init>(AbstractJavaKeyStoreProvider.java:88)
	at org.apache.hadoop.security.alias.LocalJavaKeyStoreProvider.<init>(LocalJavaKeyStoreProvider.java:58)
	at org.apache.hadoop.security.alias.LocalJavaKeyStoreProvider.<init>(LocalJavaKeyStoreProvider.java:50)
	at org.apache.hadoop.security.alias.LocalJavaKeyStoreProvider$Factory.createProvider(LocalJavaKeyStoreProvider.java:177)
	at org.apache.hadoop.security.alias.CredentialProviderFactory.getProviders(CredentialProviderFactory.java:63)
	at org.apache.hadoop.conf.Configuration.getPasswordFromCredentialProviders(Configuration.java:2098)
	... 28 more

I have not got time to investigate into why HiveMetaStore would fail, but HiveServer2 is OK. However, this confirms that such feature is currently not supported in Hive.

I think this is a good feature to have because currently you have to have the S3 access and secret keys saved in Hive’s configuration file. This means that everyone will be able to have access to S3. Having the keys saved in crednetial files, only the user who has access to those files will be able to access S3, so it should be more secure. So I filed Cloudera internal JIRA to request this feature, but at the time of writing, it is not fixed yet.

For now, if you want Hive table to access S3 location, the access and secret key need to be configured inside the configuration XML file at global level.

Leave a Reply

Your email address will not be published. Required fields are marked *