Impala Query Profile Explained – Part 1

If you work with Impala but have no idea how to interpret Impala query PROFILEs, it can be very hard to understand what’s going on and how to make your query run at its full potential. I think this is the case for lots of Impala users, so I would like to write a simple blog post to share my experience and hope that it can help anyone who would like to learn more.

This is Part 1 of the series, so I will go with the basics and just cover the main things to look out for when examining a PROFILE.

So first things first, how do you collect an Impala query PROFILE? Well, there are a couple of ways. The simplest way is to just run “PROFILE” after your query in impala-shell, like below:

[impala-daemon-host.com:21000] > SELECT COUNT(*) FROM sample_07;
Query: SELECT COUNT(*) FROM sample_07
Query submitted at: 2018-09-14 15:57:35 (Coordinator: https://impala-daemon-host.com:25000)
Query progress can be monitored at: https://impala-daemon-host.com:25000/query_plan?query_id=36433472787e1cab:29c30e7800000000
+----------+
| count(*) |
+----------+
| 823      |
+----------+
Fetched 1 row(s) in 6.68s

[impala-daemon-host.com:21000] > PROFILE; <-- Simply run "PROFILE" as a query
Query Runtime Profile:
Query (id=36433472787e1cab:29c30e7800000000):
Summary:
Session ID: 443110cc7292c92:6e3ff4d76f0c5aaf
Session Type: BEESWAX
.....

You can also collect it from the Cloudera Manager Web UI, by navigating to CM > Impala > Queries, locating the query you just ran and clicking on “Query Details”.

Then scroll down a bit to locate the “Download Profile” button:

Last, but not least, you can navigate to the Impala Daemon’s web UI and download it from there. Go to the Impala Daemon that was used as the coordinator to run the query:

https://{impala-daemon-url}:25000/queries

The list of queries will be displayed:

Click through the “Details” link and then the “Profile” tab:

All right, so we have the PROFILE now, let’s dive into the details.

Below is the snippet of Query PROFILE we will go through today, which is the Summary section at the top of the PROFILE:

Query (id=36433472787e1cab:29c30e7800000000):
Summary:
Session ID: 443110cc7292c92:6e3ff4d76f0c5aaf
Session Type: BEESWAX
Start Time: 2018-09-14 15:57:35.883111000
End Time: 2018-09-14 15:57:42.565042000
Query Type: QUERY
Query State: FINISHED
Query Status: OK
Impala Version: impalad version 2.11.0-cdh5.14.x RELEASE (build 50eddf4550faa6200f51e98413de785bf1bf0de1)
User: hive@VPC.CLOUDERA.COM
Connected User: hive@VPC.CLOUDERA.COM
Delegated User:
Network Address: ::ffff:172.26.26.117:58834
Default Db: default
Sql Statement: SELECT COUNT(*) FROM sample_07
Coordinator: impala-daemon-url.com:22000
Query Options (set by configuration):
Query Options (set by configuration and planner): MT_DOP=0
Plan:
----------------

Let’s break it into sections and walk through them one by one. There are a few important pieces of information here that are used more often than others:

a. Query ID:

Query (id=36433472787e1cab:29c30e7800000000):

This is useful for identifying relevant query-related information in the Impala Daemon logs. Simply search for this query ID and you can find out what it was doing behind the scenes, which is especially useful for finding related error messages.
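
For example, a quick way to do that on the coordinator host would be something like the below (the log path is the typical CDH default and may differ in your installation):

# Search the Impala daemon log for everything related to this query ID
grep "36433472787e1cab:29c30e7800000000" /var/log/impalad/impalad.INFO

# Warnings and errors are also written to separate glog files
grep "36433472787e1cab:29c30e7800000000" /var/log/impalad/impalad.WARNING /var/log/impalad/impalad.ERROR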

b. Session Type:

Session Type: BEESWAX

This tells us where the connection is from. BEESWAX means that the query was run from the impala-shell client. If you run it from Hue, the type will be “HIVESERVER2”, since Hue connects via the HiveServer2 Thrift interface.

c. Start and End time:

Start Time: 2018-09-14 15:57:35.883111000
End Time: 2018-09-14 15:57:42.565042000

This is useful for telling how long the query ran. Please keep in mind that this time includes session idle time. If you run a simple query that returns in a few seconds in Hue, the time shown here might be longer than expected, because Hue keeps the query handle open until the session is closed or the user runs another query. If the query is run through impala-shell, however, the start and end time should match the actual run time, since impala-shell closes the query handle straight away after the query finishes.
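
If you want to turn the two timestamps above into an exact duration, a small sketch using GNU date and bc (assuming both are available on your host) would be:

start="2018-09-14 15:57:35.883111000"
end="2018-09-14 15:57:42.565042000"

# Convert both timestamps to fractional epoch seconds and subtract
echo "$(date -d "$end" +%s.%N) - $(date -d "$start" +%s.%N)" | bc

This gives roughly 6.68 seconds, which matches the “Fetched 1 row(s) in 6.68s” line that impala-shell printed earlier.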

d. Query status:

Query Status: OK

This tells whether the query finished successfully or not. OK means good. If there were errors, they will normally show here, for example: cancelled by user, session timeout, exceptions, etc.

e. Impala version:

Impala Version: impalad version 2.11.0-cdh5.14.x RELEASE (build 50eddf4550faa6200f51e98413de785bf1bf0de1)

This confirms the version that was used to run the query. If it does not match your installation, then something is not set up properly.

f. User information:

User: hive@XXX.XXXXXX.COM
Connected User: hive@XXX.XXXXXX.COM
Delegated User:

You can find out who ran the query from this session, so you know who to blame :).

g. DB selected on connection:

Default Db: default

Not used a lot, but good to know.

h. The query that was used to produce this PROFILE:

Sql Statement: SELECT COUNT(*) FROM sample_07

You will need this info if you are helping others to troubleshoot, as you need to know how the query was constructed and what tables are involved. In lots of cases a simple rewrite of the query will help resolve issues or boost query performance.

i. The Impala daemon that was used to run the query, which we call the Coordinator:

Coordinator: impala-daemon-host.com:22000

This is an important piece of information, as it determines which host to get the Impala daemon logs from, should you wish to check the INFO, WARNING and ERROR level logs.

j. Query Options used for this query:

Query Options (set by configuration):
Query Options (set by configuration and planner): MT_DOP=0

This section tells you what QUERY OPTIONS were applied to the current query, if any. This is useful for seeing whether there are any user-level or pool-level overrides that affect this query. One example: if the Impala Daemon’s memory is set at, say, 120GB, but a small query still fails with an OutOfMemory error, this is the place to check whether the user accidentally set MEM_LIMIT in their session to a lower value that could result in the OutOfMemory error.
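
As a quick sketch of that check (the hostname is a placeholder, and I am assuming SET is accepted through -q the same way it is interactively), you can inspect the session-level query options from impala-shell:

# SET with no arguments prints the current query options; look for MEM_LIMIT
impala-shell -i impala-daemon-host.com -q "SET" | grep -i MEM_LIMIT

# To clear an accidental override in an interactive session:
#   SET MEM_LIMIT=0;   -- 0 removes the per-query limit
#   SET MEM_LIMIT=2g;  -- or pin an explicit limit instead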

This concludes Part 1 of the series, explaining the Summary section of the PROFILE to understand the basic query information. In the next part of the series, I will explain the Query Plan as well as the Execution Summary of the PROFILE in detail.

If you have any comments or suggestions, please let me know in the comments section below. Thanks!

Simple Tool to Enable SSL/TLS for CM/CDH Cluster

Earlier this year, Cloudera started a new program that allows each Support Engineer to do a full week of offline self-learning. Topics can be chosen by each individual engineer so long as the outcome has value to the business: it can be either the engineer skilling up with a certification that helps with day-to-day work, or a presentation to share with the rest of the team what he/she learnt from the week of self-learning. Last week, from the 27th of August to the 31st of August, was my turn.

After careful consideration, I thought that my knowledge in the SSL/TLS area needed to be improved, so I decided to find some SSL/TLS related courses on either SafariOnline or Lynda, and then see if I could enable Cloudera Manager as well as most of the CDH services with SSL/TLS, ideally putting everything into a script so that the process could be automated. I discussed this with my manager and we agreed on the plan.

In the first two days, I found a couple of very useful video courses on Lynda.com, see the links below:

SSL Certificates For Web Developers
Learning Secure Sockets Layer

They were very useful in helping me get a better understanding of the fundamentals of SSL/TLS and how to generate keys and sign certificates all by yourself.
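
As a minimal sketch of what “sign certificates all by yourself” looks like (the file names and CN values below are made up for illustration), the basic openssl workflow is:

# 1. Create a private CA key and a self-signed CA certificate
openssl req -new -x509 -days 365 -nodes -subj "/CN=my-test-ca" -keyout ca.key -out ca.crt

# 2. Create a host key and a certificate signing request (CSR)
openssl req -new -nodes -subj "/CN=host1.example.com" -keyout host1.key -out host1.csr

# 3. Sign the CSR with your own CA to produce the host certificate
openssl x509 -req -in host1.csr -CA ca.crt -CAkey ca.key -CAcreateserial -days 365 -out host1.crt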

After that, I reviewed Cloudera’s official online documentation on how to enable SSL/TLS for Cloudera Manager as well as the rest of the CDH services, and built a little tool, written in shell script, that allows anyone to generate certificates on the fly and enable SSL/TLS for his/her cluster with a couple of simple commands.

The documentation links can be found below:

Configuring TLS Encryption for Cloudera Manager
Configuring TLS/SSL Encryption for CDH Services

I have published this little tool on GitHub and it is available here. Currently it supports enabling SSL/TLS for the following services:

Cloudera Manager (from Level 1 to Level 3 security)
HDFS
YARN
Hive
Impala
Oozie
HBase
Hue

With this tool, users can enable SSL/TLS for any of the above services with ease in a few minutes.

If you have any suggestions or comments, please leave them in the comment section below, thanks.

How to Clean Up Deleted Projects in Cloudera Data Science Workbench

We have had a few customer complaints about the fact that currently Cloudera Data Science Workbench (CDSW) does not release the underlying project files on disk after a project is deleted from within the CDSW web console.

This can be reproduced easily by creating a dummy project in CDSW, checking the project directory created under /var/lib/cdsw/current/projects/projects/0, and then deleting the project again; you will see that the newly created project directory is not removed from the file system.

This is a known bug reported internally in Cloudera; however, it has not been fixed yet.

To work around the issue, you can set up a simple shell script to detect the orphaned project directories automatically so that they can be archived or removed.

Steps as below:

1. Get the list of project IDs from the directory /var/lib/cdsw/current/projects/projects/0/ on the master host; in my example, it returned the below:

ls /var/lib/cdsw/current/projects/projects/0/
1  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48

2. Run the ‘cdsw status’ command on the master host to capture the DB pod ID:

Sending detailed logs to [/tmp/cdsw_status_T8jRig.log] ...
CDSW Version: [1.3.0:9bb84f6]
OK: Application running as root check
OK: Sysctl params check
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|                 NAME                |   STATUS   |           CREATED-AT          |   VERSION   |   EXTERNAL-IP   |          OS-IMAGE         |         KERNEL-VERSION         |   GPU   |   STATEFUL   |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   ericlin-xxx.xxx-1.com   |    True    |   2018-07-11 03:30:45+00:00   |   v1.6.11   |       None      |   CentOS Linux 7 (Core)   |   3.10.0-514.26.2.el7.x86_64   |    0    |    False     |
|   ericlin-xxx.xxx-2.com   |    True    |   2018-07-11 03:30:32+00:00   |   v1.6.11   |       None      |   CentOS Linux 7 (Core)   |   3.10.0-514.26.2.el7.x86_64   |    0    |     True     |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2/2 nodes are ready.
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|                             NAME                            |   READY   |    STATUS   |   RESTARTS   |           CREATED-AT          |       POD-IP      |      HOST-IP      |   ROLE   |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|             etcd-ericlin-xxx.xxx-1.com                      |    1/1    |   Running   |      5       |   2018-07-11 03:31:44+00:00   |   172.26.12.157   |   172.26.12.157   |   None   |
|        kube-apiserver-ericlin-xxx.xxx-1.com                 |    1/1    |   Running   |      5       |   2018-07-11 03:30:31+00:00   |   172.26.12.157   |   172.26.12.157   |   None   |
|   kube-controller-manager-ericlin-xxx.xxx-1.com             |    1/1    |   Running   |      5       |   2018-07-11 03:31:54+00:00   |   172.26.12.157   |   172.26.12.157   |   None   |
|                  kube-dns-3911048160-30l05                  |    3/3    |   Running   |      15      |   2018-07-11 03:30:45+00:00   |    100.66.128.1   |   172.26.12.157   |   None   |
|                       kube-proxy-c4xk7                      |    1/1    |   Running   |      4       |   2018-07-11 03:30:45+00:00   |    172.26.14.58   |    172.26.14.58   |   None   |
|                       kube-proxy-k95s2                      |    1/1    |   Running   |      5       |   2018-07-11 03:30:45+00:00   |   172.26.12.157   |   172.26.12.157   |   None   |
|        kube-scheduler-ericlin-xxx.xxx-1.com                 |    1/1    |   Running   |      5       |   2018-07-11 03:31:57+00:00   |   172.26.12.157   |   172.26.12.157   |   None   |
|               node-problem-detector-v0.1-0624z              |    1/1    |   Running   |      5       |   2018-07-11 03:32:15+00:00   |   172.26.12.157   |   172.26.12.157   |   None   |
|               node-problem-detector-v0.1-b80tt              |    1/1    |   Running   |      4       |   2018-07-11 03:32:15+00:00   |    172.26.14.58   |    172.26.14.58   |   None   |
|                       weave-net-469fb                       |    2/2    |   Running   |      12      |   2018-07-11 03:30:45+00:00   |   172.26.12.157   |   172.26.12.157   |   None   |
|                       weave-net-8dzx6                       |    2/2    |   Running   |      10      |   2018-07-11 03:30:45+00:00   |    172.26.14.58   |    172.26.14.58   |   None   |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
All required pods are ready in cluster kube-system.
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|                  NAME                  |   READY   |    STATUS   |   RESTARTS   |           CREATED-AT          |       POD-IP      |      HOST-IP      |           ROLE           |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|         cron-1906902965-wzkp5          |    1/1    |   Running   |      2       |   2018-08-10 08:13:55+00:00   |    100.66.128.9   |   172.26.12.157   |           cron           |
|          db-1165222207-dg98q           |    1/1    |   Running   |      5       |   2018-07-11 03:32:15+00:00   |    100.66.128.6   |   172.26.12.157   |            db            |
|           engine-deps-1rvcl            |    1/1    |   Running   |      5       |   2018-07-11 03:32:15+00:00   |    100.66.128.4   |   172.26.12.157   |       engine-deps        |
|           engine-deps-njwlc            |    1/1    |   Running   |      4       |   2018-07-11 03:32:15+00:00   |     100.66.0.5    |    172.26.14.58   |       engine-deps        |
|   ingress-controller-684706958-6fzh3   |    1/1    |   Running   |      5       |   2018-07-11 03:32:14+00:00   |   172.26.12.157   |   172.26.12.157   |    ingress-controller    |
|        livelog-2502658797-kmq4l        |    1/1    |   Running   |      5       |   2018-07-11 03:32:15+00:00   |    100.66.128.3   |   172.26.12.157   |         livelog          |
|      reconciler-2738760185-1nnsp       |    1/1    |   Running   |      2       |   2018-08-10 08:13:55+00:00   |    100.66.128.2   |   172.26.12.157   |        reconciler        |
|       spark-port-forwarder-krtw6       |    1/1    |   Running   |      5       |   2018-07-11 03:32:15+00:00   |   172.26.12.157   |   172.26.12.157   |   spark-port-forwarder   |
|       spark-port-forwarder-rbhc6       |    1/1    |   Running   |      4       |   2018-07-11 03:32:15+00:00   |    172.26.14.58   |    172.26.14.58   |   spark-port-forwarder   |
|          web-3320989329-7php0          |    1/1    |   Running   |      2       |   2018-08-10 08:13:55+00:00   |    100.66.128.7   |   172.26.12.157   |           web            |
|          web-3320989329-ms63k          |    1/1    |   Running   |      5       |   2018-07-11 03:32:15+00:00   |    100.66.128.5   |   172.26.12.157   |           web            |
|          web-3320989329-zdpcj          |    1/1    |   Running   |      2       |   2018-08-10 08:13:55+00:00   |    100.66.128.8   |   172.26.12.157   |           web            |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
All required pods are ready in cluster default.
All required Application services are configured.
All required config maps are ready.
All required secrets are available.
Persistent volumes are ready.
Persistent volume claims are ready.
Ingresses are ready.
Checking web at url: http://ericlin-xxx.xxx-1.com
OK: HTTP port check
Cloudera Data Science Workbench is ready!

You can see from the above example that the DB pod ID is: db-1165222207-dg98q

3. Run the below command to connect to the CDSW database:

kubectl exec db-1165222207-dg98q -ti -- psql -U sense

4. At the prompt, run the below PostgreSQL query to check which projects have been deleted (the ones on disk but not in the DB):

sense=# SELECT id, user_id, name, slug FROM projects;
 id | user_id |      name      |      slug
----+---------+----------------+----------------
 41 |       1 | Impala Project | impala-project
 34 |       3 | Test           | test
  1 |       1 | Test           | test
 47 |      10 | tensortest     | tensortest
 44 |       9 | TestEnvVar     | testenvvar
 40 |       4 | hbase          | hbase
 36 |       2 | Scala Test     | scala-test
 46 |       1 | R Project      | r-project
 45 |       4 | spackshell     | spackshell
 37 |       4 | tim            | tim
 48 |      10 | rtest          | rtest
 35 |       1 | Scala Project  | scala-project
 39 |       5 | salim          | salim
 38 |       4 | timtest        | timtest
(14 rows)

To put all of the above together, I have the below quick shell script that can do the job:

# Resolve the DB pod name once, rather than on every loop iteration
db_pod=$(cdsw status | grep 'db-' | cut -d '|' -f 2 | sed 's/ //g')

for project_id in `ls /var/lib/cdsw/current/projects/projects/0/`
do
  echo "Processing project $project_id"
  # "0 rows" in the psql output means the project no longer exists in the CDSW database
  rows=`kubectl exec $db_pod -ti -- psql -U sense -c "SELECT * FROM projects WHERE ID = $project_id" | grep '0 row' | wc -l`
  if [ $rows -gt 0 ]; then
    echo "Project $project_id has been deleted, you can archive directory /var/lib/cdsw/current/projects/projects/0/$project_id"
  fi
done

The output looks something like the below:

Processing project 1
Processing project 34
Processing project 35
Processing project 36
Processing project 37
Processing project 38
Processing project 39
Processing project 40
Processing project 41
Processing project 42
Project 42 has been deleted, you can archive directory /var/lib/cdsw/current/projects/projects/0/42
Processing project 43
Project 43 has been deleted, you can archive directory /var/lib/cdsw/current/projects/projects/0/43
Processing project 44
Processing project 45
Processing project 46
Processing project 47
Processing project 48

So, until there is a fix for this issue, I hope the above simple shell script can help.

If you have any suggestions or ideas, please let me know in the comments section below, thanks a lot in advance.

Oozie SSH Action Does Not Support Chained Commands – OOZIE-1974

I have seen quite a few CDH users who try to run chained Linux commands via Oozie’s SSH Action. An example is shown below:

<action name="sshTest">
  <ssh xmlns="uri:oozie:ssh-action:0.1">
    <host>${sshUserHost}</host>
    <command>kinit test.keytab test@TEST.COM ; python ....</command>
    <capture-output/>
  </ssh>
  <ok to="nextAction"/>
  <error to="kill"/>
</action>

We can see that the command to run on the remote host is as below:

kinit test.keytab test@TEST.COM ; python ....

This is OK if both commands finish successfully very quickly. However, it will cause the SSH action to fail if the python command needs to run for a while, say more than 5-10 minutes. Below are example log messages produced in Oozie’s server log while the SSH action is running:

2018-08-13 10:01:48,215 WARN org.apache.oozie.command.wf.CompletedActionXCommand: SERVER[{oozie-host}] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000234-172707121423674-oozie-oozi-W] ACTION[0000234-172707121423674-oozie-oozi-W@sshTest] Received early callback for action still in PREP state; will wait [10,000]ms and requeue up to [5] more times
2018-08-13 10:01:48,216 WARN org.apache.oozie.command.wf.CompletedActionXCommand: SERVER[{oozie-host}] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000234-172707121423674-oozie-oozi-W] ACTION[0000234-172707121423674-oozie-oozi-W@sshTest] Received early callback for action still in PREP state; will wait [10,000]ms and requeue up to [5] more times

....

2018-08-13 10:02:38,243 ERROR org.apache.oozie.command.wf.CompletedActionXCommand: SERVER[cdlpf1hdpm1004.es.ad.adp.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000234-172707121423674-oozie-oozi-W] ACTION[0000234-172707121423674-oozie-oozi-W@sshTest] XException, 
org.apache.oozie.command.CommandException: E0822: Received early callback for action [0000234-172707121423674-oozie-oozi-W@sshTest] while still in PREP state and exhausted all requeues
 at org.apache.oozie.command.wf.CompletedActionXCommand.execute(CompletedActionXCommand.java:114)
 at org.apache.oozie.command.wf.CompletedActionXCommand.execute(CompletedActionXCommand.java:39)
 at org.apache.oozie.command.XCommand.call(XCommand.java:286)
 at java.util.concurrent.FutureTask.run(FutureTask.java:262)
 at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:179)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)

The reason for the failure is that Oozie currently does not support chained Linux commands in the SSH Action, which is tracked via the upstream JIRA OOZIE-1974.

Below is what happens behind the scenes:

1. The ssh-base.sh and ssh-wrapper.sh files will be copied to the target host:
https://github.com/apache/oozie/blob/master/core/src/main/resources

2. Oozie will run the below command from the Oozie server via ssh directly against the target host:

sh ssh-base.sh FLATTEN_ARGS curl "http://{oozie-host}:11000/oozie/callback?id=0000234-172707121423674-oozie-oozi-W@sshTest&status=#status" \
"--data-binary%%%@#stdout%%%--request%%%POST%%%--header%%%\"content-type:text/plain\"" \
0000234-172707121423674-oozie-oozi-W@sshTest@3 kinit test.keytab test@TEST.COM ; python ....

3. Based on the command above, we can see that the command was rebuilt, and the full command is now broken into two commands:

sh ssh-base.sh FLATTEN_ARGS curl "http://{oozie-host}:11000/oozie/callback?id=0000234-172707121423674-oozie-oozi-W@sshTest&status=#status" \
"--data-binary%%%@#stdout%%%--request%%%POST%%%--header%%%\"content-type:text/plain\"" \
0000234-172707121423674-oozie-oozi-W@sshTest@3 kinit test.keytab test@TEST.COM

and

python ....

So instead of running the original “kinit test.keytab test@TEST.COM ; python ….” as one chained command under the wrapper, the “;” splits it: only the kinit part runs under the wrapper, and the python part runs separately outside of it.

4. The ssh-base.sh script will in turn run the below command:

sh ssh-wrapper.sh FLATTEN_ARGS curl "http://{oozie-host}:11000/oozie/callback?id=0000234-172707121423674-oozie-oozi-W@sshTest&status=#status" \
"--data-binary%%%@#stdout%%%--request%%%POST%%%--header%%%\"content-type:text/plain\"" \
0000234-172707121423674-oozie-oozi-W@sshTest@3 kinit test.keytab test@TEST.COM

The wrapped kinit command finishes very quickly and triggers the callback curl call immediately; however, the “python” command keeps the SSH job running until it finishes. This means the callback arrives while the Oozie action is still in the PREP state, and because the Oozie action state and the SSH job state are not consistent, the callback eventually fails after its requeues are exhausted.

So until OOZIE-1974 is fixed, the workaround is to put both commands inside a single script file and make it available to run on the remote host.
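
As a rough sketch of that workaround (the script path, keytab location and python entry point are made up for illustration), the chained commands go into one script on the remote host, and the SSH action’s <command> then simply points at that script:

#!/bin/bash
# /home/test/run_job.sh -- placed on the remote host; the SSH action's
# <command> would then just be /home/test/run_job.sh
set -e                                           # stop if kinit fails

kinit -kt /home/test/test.keytab test@TEST.COM   # authenticate first
python /home/test/my_job.py                      # then run the long-running job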

Hope the above helps.

WebHCat Request Failed With Error “id: HTTP: no such user”

WebHCat, previously known as Templeton, is the REST API for HCatalog, a table and storage management layer for Hadoop. Users can use WebHCat to access metadata information from HCatalog, as well as to submit MapReduce, Hive and Pig jobs.

Below is an example of how to retrieve a list of databases via WebHCat API:

curl --negotiate -u: http://{webhcat-hostname}:50111/templeton/v1/ddl/database/

Please note that port 50111 is the default port number for WebHCat. Sample output looks like below:

{"databases":["default","mytest","s3_test","udf_db"]}

However, recently I faced an issue where a WebHCat request failed with the below error:

2018-08-08 17:18:44,413 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hive (auth:PROXY) via HTTP/{webhcat-hostname}@CDH511.COM (auth:KERBEROS) cause:org.apache.thrift.transport.TTransportException: java.net.SocketException: Connection reset
2018-08-08 17:18:44,414 ERROR org.apache.hive.hcatalog.templeton.CatchallExceptionMapper: java.lang.reflect.UndeclaredThrowableException
java.io.IOException: java.lang.reflect.UndeclaredThrowableException
...
Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketException: Connection reset
....
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDelegationToken(HiveMetaStoreClient.java:1882)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDelegationToken(HiveMetaStoreClient.java:1872)
....

From the stacktrace, we can see that WebHCat failed when trying to collect a delegation token from the Hive Metastore (HMS). Checking the HMS server log, I found the below error:

2018-08-08 17:18:19,675 WARN  org.apache.hadoop.security.ShellBasedUnixGroupsMapping: [pool-7-thread-2]: unable to return groups for user HTTP
PartialGroupNameException The user name 'HTTP' is not found. id: HTTP: no such user
id: HTTP: no such user

It is pretty clear that HMS failed because the “HTTP” user is missing. Adding the “HTTP” user on the HMS server host resolved the issue.

Researching further, I realized that this was because, in the Hive configuration, hadoop.proxyuser.hive.groups was set to a list of groups rather than “*”, and “HTTP” was one of the entries in that list. You will not get such an error if hadoop.proxyuser.hive.groups is set to “*”; it only fails when “HTTP” is added manually (it is required to be on this list if the value is not “*”, because the “hive” user needs to be able to impersonate the “HTTP” user for the request to work).

The reason for the failure is that when hadoop.proxyuser.hive.groups is set to “*”, Hive does not bother to check for the user’s existence, since every user is allowed. However, when an explicit list is defined, Hive has to resolve the groups of the user being impersonated (via ShellBasedUnixGroupsMapping, as shown in the log above), which requires that user to exist on the host where HMS runs. In our case, the “HTTP” user did not exist on the HMS host, so HMS failed with the error we saw earlier. We just needed to add this user to resolve the issue.
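
As a minimal sketch of the fix, to be run on the HMS host (the useradd flags just create a system account with no home directory or login shell):

# Check whether the HTTP user exists, and create it if not
id HTTP >/dev/null 2>&1 || sudo useradd -r -M -s /sbin/nologin HTTP

# Verify that group resolution now works -- this is what
# ShellBasedUnixGroupsMapping relies on
id -Gn HTTP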

Hope the above helps anyone who has the same issue.