WebHCat Request Failed With Error “id: HTTP: no such user”

WebHCat, previously known as Templeton, is the REST API for HCatalog, a table and storage management layer for Hadoop. Users can use WebHCat to access metadata from HCatalog, as well as to submit MapReduce, Hive, and Pig jobs.

Below is an example of how to retrieve a list of databases via the WebHCat API:

curl --negotiate -u: http://{webhcat-hostname}:50111/templeton/v1/ddl/database/

Note that 50111 is the default port for WebHCat. Sample output looks like this:

{"databases":["default","mytest","s3_test","udf_db"]}

However, I recently ran into an issue where a WebHCat request failed with the following error:

2018-08-08 17:18:44,413 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hive (auth:PROXY) via HTTP/{webhcat-hostname}@CDH511.COM (auth:KERBEROS) cause:org.apache.thrift.transport.TTransportException: java.net.SocketException: Connection reset
2018-08-08 17:18:44,414 ERROR org.apache.hive.hcatalog.templeton.CatchallExceptionMapper: java.lang.reflect.UndeclaredThrowableException
java.io.IOException: java.lang.reflect.UndeclaredThrowableException
...
Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketException: Connection reset
....
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDelegationToken(HiveMetaStoreClient.java:1882)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDelegationToken(HiveMetaStoreClient.java:1872)
....

From the stack trace, we can see that WebHCat failed while trying to obtain a delegation token from the Hive Metastore (HMS). Checking the HMS server log, I found the following error:

2018-08-08 17:18:19,675 WARN  org.apache.hadoop.security.ShellBasedUnixGroupsMapping: [pool-7-thread-2]: unable to return groups for user HTTP
PartialGroupNameException The user name 'HTTP' is not found. id: HTTP: no such user
id: HTTP: no such user

It is pretty clear that HMS failed because the “HTTP” user was missing. Adding the “HTTP” user on the HMS server host resolved the issue.
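To confirm the diagnosis before changing anything, you can check on the HMS host whether the account resolves at all. The snippet below is a minimal sketch; the helper names user_exists and check_proxy_user are my own and not part of Hive or Hadoop. It relies on the same id command that produced the "no such user" message in the log.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; run on the HMS server host.

# Return 0 if the given OS account resolves on this host, non-zero otherwise.
user_exists() {
    id -u "$1" >/dev/null 2>&1
}

# Print whether the account needed for impersonation is present.
check_proxy_user() {
    if user_exists "$1"; then
        echo "user $1 exists"
    else
        echo "user $1 is missing"
    fi
}

check_proxy_user HTTP
```

If the user is reported as missing, create it with whatever your environment uses for account management (for example useradd as root, or your directory service if accounts come from LDAP), then run the check again.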

Researching further, I realized that this happened because, in the Hive configuration, hadoop.proxyuser.hive.groups was set to a list of groups rather than “*”, and “HTTP” was one of the entries in that list. You will not get this error if hadoop.proxyuser.hive.groups is set to “*”; it only fails when “HTTP” has been added manually. (It must be on the list if the value is not “*”, because the “hive” user needs to be able to impersonate the “HTTP” user for the request to work.)

The reason for the failure is that when hadoop.proxyuser.hive.groups is set to “*”, Hive does not bother to check whether the user exists, since every user is allowed. However, when an explicit list of groups is defined, Hive has to look up the groups of the user it impersonates on the host where it runs, to confirm that the user is covered by the list. In our case, the “HTTP” user did not exist on the HMS host, so the lookup failed and HMS returned the error we saw earlier. So we just need to add this user to resolve the issue.
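The difference between the two settings can be illustrated with a small shell model. This is a simplified sketch of the behaviour described above, not Hadoop's actual implementation; the function name may_impersonate and its return codes are assumptions made for the example.

```shell
#!/bin/sh
# Simplified model of a proxy-user group check; not Hadoop's actual code.
# $1 = allowed groups (comma-separated, or "*"), $2 = user to impersonate.
# Returns 0 if allowed, 1 if not in any allowed group, 2 if lookup failed.
may_impersonate() {
    allowed=$1
    user=$2
    # Wildcard: every user is allowed, so no group lookup is performed.
    [ "$allowed" = "*" ] && return 0
    # Otherwise the user's groups must be resolved on this host.
    # This is the step that fails when the account does not exist.
    user_groups=$(id -Gn "$user" 2>/dev/null) || return 2
    for g in $user_groups; do
        case ",$allowed," in
            *",$g,"*) return 0 ;;   # user belongs to an allowed group
        esac
    done
    return 1
}
```

With "*" as the first argument the function returns success without ever calling id, which matches the observation that the wildcard setting hides a missing account. With an explicit list, a non-existent user makes the lookup itself fail, which corresponds to the "no such user" error in the HMS log.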

I hope this helps anyone who runs into the same issue.