How to enable HiveServer2 audit log through Cloudera Manager

This article explains the steps required to enable audit logging for HiveServer2, so that all queries run through HiveServer2 are captured in a central audit log file.

Please follow the steps below:

  1. Go to Cloudera Manager home page > Hive > Configuration
  2. Tick “Enable Audit Collection”
  3. Ensure the “Audit Log Directory” location points to a path that has enough disk space
  4. Go to Cloudera Manager home page > click on “Cloudera Management Service” > Instances
  5. Click on “Add Role Instances” button on the top right corner of the page
  6. Choose a host for Navigator Audit Server & Navigator Metadata Server
  7. Then follow the on-screen instructions to finish adding the new roles
  8. Once the roles are added successfully, Cloudera Manager will ask you to restart a few services, including Hive
  9. Go ahead and restart Hive

After restarting, Hive’s audit log will be enabled and written to the /var/log/hive/audit directory by default.
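To confirm that events are being captured, run a query through HiveServer2 and then check the directory on the HiveServer2 host. A minimal sketch; the exact audit file name varies by version, so the wildcard below is illustrative:

ls -l /var/log/hive/audit/
tail -f /var/log/hive/audit/*    # each audited query should appear as a new event line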

Please note that you are not required to start the Navigator services; if you don’t need them running, you can leave them in the STOPPED state and Hive’s audit logs will still function as normal. However, Navigator must be installed for the audit log to function properly, as some libraries shipped with Navigator are required for auditing to work.

How to query a multi-delimited table in Hive

This article explains how to query a multi-delimited Hive table in CDH5.3.x.

The use case is as follows.

Given the following table definition:

CREATE TABLE test_multi 
(a string, b string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe' 
WITH SERDEPROPERTIES (
    "field.delim"="#|",
    "collection.delim"=":",
    "mapkey.delim"="@"
);
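To reproduce, you can load a sample row whose fields are separated by the two-character delimiter "#|" defined in the table’s SERDEPROPERTIES. A minimal sketch; the file path and JDBC URL are placeholders, and the file must exist on the HiveServer2 host, since LOCAL INPATH is resolved there:

# One row: column a = "eric", column b = "test more", separated by "#|"
echo 'eric#|test more' > /tmp/test_multi.txt

beeline -u 'jdbc:hive2://localhost:10000/default' \
    -e "LOAD DATA LOCAL INPATH '/tmp/test_multi.txt' INTO TABLE test_multi;"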

Querying all columns works fine:

select * from test_multi;
+---------------+---------------+
| test_multi.a  | test_multi.b  |
+---------------+---------------+
| eric          | test more     |
+---------------+---------------+
1 row selected (1.58 seconds)

However, querying a single column produces the following error in HS2:

select a from test_multi;
Error: Error while processing statement: FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.mr.MapRedTask (state=08S01,code=2)

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassNotFoundException: 
Class org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe not found
        at org.apache.hadoop.hive.ql.exec.MapOperator.getConvertedOI(MapOperator.java:334)
        at org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:352)
        at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:126)
        ... 22 more
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe not found
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1953)
        at org.apache.hadoop.hive.ql.exec.MapOperator.getConvertedOI(MapOperator.java:304)
        ... 24 more

This happens in CDH5.3.3, which ships Hive 0.13, and I am not sure whether it also applies to CDH5.4.x and CDH5.5.x.

This is caused by the class org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe not being loaded into HiveServer2’s classpath at startup. We need to add the JAR file that contains this class to Hive’s AUX directory.

The steps are as follows:

1) Locate the AUX directory for HiveServer2. If you don’t have one, create one and update Hive’s configuration through Cloudera Manager. If you don’t use Cloudera Manager, simply create a directory on the HiveServer2 host; in my case it is /hive/jars.
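For example, a minimal sketch for creating the directory, assuming HiveServer2 runs as the default hive service user:

sudo mkdir -p /hive/jars
sudo chown hive:hive /hive/jars    # HiveServer2 must be able to read this path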

2) From within /hive/jars, create a symlink to the file /opt/cloudera/parcels/CDH/lib/hive/lib/hive-contrib.jar (or, if not using Cloudera Manager, /usr/lib/hive/lib/hive-contrib.jar):

ln -s /opt/cloudera/parcels/CDH/lib/hive/lib/hive-contrib.jar /hive/jars/hive-contrib.jar

3) If you don’t use Cloudera Manager, add the following to hive-site.xml for HiveServer2:

<property>
  <name>hive.aux.jars.path</name>
  <value>/hive/jars/hive-contrib.jar</value>
</property>

If you use Cloudera Manager, simply go to step 4.

4) Restart HiveServer2.

This should remove the error we saw earlier in the post and get the Hive query working.
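To verify, rerun the single-column query that previously failed; the JDBC URL below is a placeholder for your HiveServer2 host:

beeline -u 'jdbc:hive2://localhost:10000/default' -e 'SELECT a FROM test_multi;'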

Hope this helps.

Kerberos connections to HiveServer2 not working cross-domain

The following is the scenario of the cross-domain problem with a Kerberized cluster:

1. The cluster is within the realm “DEV.EXAMPLE.COM”
2. The client is outside the cluster, in the realm “EXAMPLE.COM”
3. Connecting to Impala from the client machine works
4. Connecting to HS2 from the client machine fails with the following error:

java.lang.IllegalArgumentException: Illegal principal name <user>@EXAMPLE.COM: org.apache.hadoop.security.authentication.util.KerberosName$NoMatchingRule: 
No rules applied to <user>@EXAMPLE.COM
	at org.apache.hadoop.security.User.<init>(User.java:50)
	at org.apache.hadoop.security.User.<init>(User.java:43)
	at org.apache.hadoop.security.UserGroupInformation.createRemoteUser(UserGroupInformation.java:1221)
	at org.apache.hadoop.security.UserGroupInformation.createRemoteUser(UserGroupInformation.java:1205)
	at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:689)
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.security.authentication.util.KerberosName$NoMatchingRule: No rules applied to <user>@EXAMPLE.COM
	at org.apache.hadoop.security.authentication.util.KerberosName.getShortName(KerberosName.java:389)
	at org.apache.hadoop.security.User.<init>(User.java:48)
	... 8 more

This is caused by Hadoop’s auth_to_local rules (configured under the HDFS service) not mapping the principal from the remote realm to a local user in the cluster. To fix the issue, follow the steps below:

1. In Cloudera Manager, go to HDFS > Configuration > search for “Trusted Kerberos Realms” > add “EXAMPLE.COM” to the list
2. Restart HS2 first
3. Confirm that we can now connect to HS2 from the client (see the verification sketch after this list)
4. Restart the rest of the services
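Hadoop ships a small utility for testing how the auth_to_local rules resolve a principal, which is handy for step 3. Run it on a cluster node after the updated client configuration has been deployed:

# Replace <user> with the real user name
hadoop org.apache.hadoop.security.HadoopKerberosName <user>@EXAMPLE.COM
# Expected output once the realm is trusted:
# Name: <user>@EXAMPLE.COM to <user>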

This should allow users to connect to HS2 from outside the cluster’s realm.
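As an end-to-end check from the client machine, a hedged sketch (the host name is a placeholder, and hive/_HOST@DEV.EXAMPLE.COM assumes the default HiveServer2 service principal; note that the principal in the URL stays in the cluster’s realm even though the client ticket is from EXAMPLE.COM):

kinit <user>@EXAMPLE.COM
beeline -u 'jdbc:hive2://hs2-host.dev.example.com:10000/default;principal=hive/_HOST@DEV.EXAMPLE.COM'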