Setting up Cloudera ODBC driver on Windows 10

Recently I have seen many CDH users have trouble setting up the Hive/Impala ODBC drivers on a Windows 10 machine to connect to a remote Kerberized cluster. The connection keeps failing with Kerberos-related error messages like the ones below:

[Cloudera][Hardy] (34) Error from server: SASL(-1): generic failure: GSSAPI Error: Unspecified GSS failure. Minor code may provide more information (Credential cache is empty).

OR

[Cloudera][ImpalaODBC] (100) Error from the Impala Thrift API: SASL(-1): generic failure: GSSAPI Error: Unspecified GSS failure. Minor code may provide more information (No credentials cache found)

To help CDH users get it working without much hassle, I have compiled a list of steps below for reference. I have tested them in a Windows 10 VM.

1. For Kerberos authentication to work, you need a valid Kerberos ticket on your client machine, which is Windows 10. Hence, you will need to download and install the MIT Kerberos client tool so that you can authenticate yourself against the remote cluster, much like running “kinit” on Linux. To get the tool, please visit the MIT Kerberos website and follow the download links.

2. In order for the client machine to talk to the remote KDC server that holds the principal database, we need a valid krb5 configuration file on the client side. This file normally sits at /etc/krb5.conf on Linux. On Windows 10, it should be at C:\ProgramData\MIT\Kerberos5\krb5.ini. Please copy the krb5.conf file from your cluster to this location on your Windows machine. Be aware that the file name on Windows should be krb5.ini, not krb5.conf. Also note that C:\ProgramData is a hidden directory, so you will need to unhide it in File Explorer before you can access the files underneath it.

3. Make sure that you connect to the correct port number. For Hive, it is normally 10000 by default. For Impala, it should be 21050, NOT 21000, which is used by impala-shell. If you have a Load Balancer set up for either Hive or Impala, the port number could be different; please consult your system admin to get the correct port number in that case.

4. Add a Windows system variable KRB5CCNAME with the value “C:\krb5\krb5cc”, where “krb5cc” is the file name of the Kerberos ticket cache. It can be anything, but we commonly use krb5cc or krb5cache. To do so, please follow the steps below:

a. open “File Explorer”
b. right click on “This PC”
c. select “Properties”
d. next to “Computer name”, click on “Change settings”
e. click on “Advanced” tab and then “Environment Variables”
f. under “System Variables”, click on “New”
g. enter “KRB5CCNAME” in “Variable name” and “C:\krb5\krb5cc” in “Variable value” (without double quotes)
h. click on “OK” and then “OK” again
i. restart Windows

5. If you have SSL enabled for either Hive or Impala, you will also need to tick “Enable SSL” for the ODBC driver. This can be found in the “SSL Options” popup window.

Please note that “SSL Options” is only available in newer versions of the ODBC driver. If you do not see this option, please upgrade the ODBC driver to the latest version. At the time of writing, the Hive ODBC Driver is at 2.5.24.
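
For step 2, it can help to sanity-check the copied file. A minimal krb5.ini typically has the shape below. The realm EXAMPLE.COM and the host kdc.example.com are hypothetical placeholders; the real values come from the krb5.conf on your cluster:

```ini
[libdefaults]
    ; realm used when a principal is given without one
    default_realm = EXAMPLE.COM

[realms]
    EXAMPLE.COM = {
        ; host that runs the KDC for this realm
        kdc = kdc.example.com
        admin_server = kdc.example.com
    }

[domain_realm]
    ; map cluster host names to the realm
    .example.com = EXAMPLE.COM
    example.com = EXAMPLE.COM
```

If the default realm or the KDC host in your file does not match the cluster, obtaining a ticket will fail before the ODBC driver is even involved.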

That should be it. The above are the steps most commonly missed by Windows users when trying to connect to Hive or Impala via ODBC. If you have encountered other problems that need extra steps, please leave a comment below and I will update my post.

Hope above helps.

“No data or no sasl data in the stream” Error in HiveServer2 Log

I have seen lots of users complain about seeing many “No data or no sasl data in the stream” errors in the HiveServer2 server log, yet they have not noticed any performance impact or query failures in Hive. So I think it would be good to write a blog post about the possible reason behind this, to clarify things and remove the concerns that users have.

The following shows the full error message and stacktrace taken from HiveServer2 log:

ERROR org.apache.thrift.server.TThreadPoolServer: [HiveServer2-Handler-Pool: Thread-533556]: Error occurred during processing of message.
java.lang.RuntimeException: org.apache.thrift.transport.TSaslTransportException: No data or no sasl data in the stream
at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(
at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory$
at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory$
at Method)
at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory.getTransport(
at org.apache.thrift.server.TThreadPoolServer$
at java.util.concurrent.ThreadPoolExecutor.runWorker(
at java.util.concurrent.ThreadPoolExecutor$
Caused by: org.apache.thrift.transport.TSaslTransportException: No data or no sasl data in the stream

The likely cause is the combination of the following:

  1. You have Kerberos enabled
  2. You have multiple HiveServer2 hosts
  3. You have a Load Balancer enabled in front of all the HS2 servers that show such errors

If you have the above setup, the error message you see in the HiveServer2 log is harmless and can be safely ignored. It just indicates that SASL negotiation failed for one particular Hive client, which in this case is the Load Balancer that regularly pings those HiveServer2 instances to check connectivity. Those pings from the LB use a plain TCP connection with no SASL handshake, hence the messages.
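
To check this explanation on your own cluster, you can imitate a basic health check by hand: open a TCP connection to the HiveServer2 port and close it without sending anything. The sketch below assumes bash (for its /dev/tcp pseudo-device) and the coreutils timeout command; the function name tcp_probe and the host name in the usage line are made up for illustration. Each successful probe against a Kerberized HiveServer2 should produce one such error in its log.

```shell
#!/usr/bin/env bash
# Open a TCP connection and close it again without sending a single byte,
# which is roughly what a basic load balancer health check does.
tcp_probe() {
  local host="$1" port="$2"
  # The inner bash opens the socket on file descriptor 3 and exits,
  # which closes the connection; timeout guards against hanging hosts.
  if timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
    echo "open"
  else
    echo "closed"
    return 1
  fi
}

# Usage (hypothetical host name):
#   tcp_probe hs2-host.example.com 10000
```

If the timestamps of the errors in the log line up with your manual probes, the Load Balancer is very likely the source of the rest.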

There are a couple of ways to avoid those messages:

1. Reduce the frequency of pings from the LB. This will reduce the errors in the log, but it won’t eliminate them. I do not know of a way to configure the LB to avoid plain TCP connections, and this is outside the scope of this blog; you might need to consult the F5 or HAProxy manual for further info.

2. Add a filter to HiveServer2’s logging to filter out those exceptions:

a. Using Cloudera Manager, navigate to Hive > Configuration > “HiveServer2 Logging Advanced Configuration Snippet (Safety Valve)”
b. Copy and paste the following configuration into the safety valve:

log4j.appender.RFA.filter.1.Expression=EXCEPTION ~= org.apache.thrift.transport.TSaslTransportException 

c. Then save and restart HiveServer2 service through Cloudera Manager.
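
Note that an Expression property on its own only describes what to match. A log4j filter normally also needs its class declared and needs to be told to drop matching events. A fuller version might look like the sketch below. It assumes HiveServer2 uses log4j 1.x, that the ExpressionFilter class from log4j-extras is on its classpath, and that the appender is named RFA as in the line above, so please verify it against your release before relying on it:

```properties
# Declare filter 1 on the RFA appender as an expression filter
log4j.appender.RFA.filter.1=org.apache.log4j.filter.ExpressionFilter
# Match log events whose exception text contains the SASL transport exception
log4j.appender.RFA.filter.1.Expression=EXCEPTION ~= org.apache.thrift.transport.TSaslTransportException
# Drop the matching events instead of accepting them
log4j.appender.RFA.filter.1.AcceptOnMatch=false
```

Keep in mind that this hides the exception for every client, so a real client failing SASL negotiation would no longer show up in the log either.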

Hope above helps.

Sqoop job failed with ClassNotFoundException

In the last few weeks, I was dealing with an issue where importing data from DB2 into HDFS kept failing with NoClassDefFoundError. Below were the command details:


And the error message was:

java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
        at org.apache.hadoop.util.ReflectionUtils.newInstance(
        at org.apache.sqoop.mapreduce.db.DBRecordReader.createValue(
        at org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(
        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(
        at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(
        at org.apache.hadoop.mapred.MapTask.runNewMapper(
        at org.apache.hadoop.mapred.YarnChild$
        at Method)
        at org.apache.hadoop.mapred.YarnChild.main(
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(
        at java.lang.reflect.Constructor.newInstance(
        at org.apache.hadoop.util.ReflectionUtils.newInstance(
        ... 14 more
Caused by: java.lang.NoClassDefFoundError: ZZZ001$_part_table$1
        at ZZZ001$_part_table.init0(ZZZ001$
        at ZZZ001$_part_table.<init>(ZZZ001$
        ... 19 more
Caused by: java.lang.ClassNotFoundException: ZZZ001$_part_table$1
        at Method)
        at java.lang.ClassLoader.loadClass(
        at sun.misc.Launcher$AppClassLoader.loadClass(
        at java.lang.ClassLoader.loadClass(
        ... 21 more

Looking at the error message, the class name ZZZ001$_part_table$1 looked highly suspicious. It came from the table name in DB2, which contained “$”: ZZZ001$.part_table. So when Sqoop generated the class, the name became ZZZ001$_part_table$1, which Java could not resolve as a class name.

To bypass this issue, the workaround is to force Sqoop to generate a custom class name by passing the “--class-name” parameter. So the new command becomes:
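
As an illustration only, a command of this shape could look like the sketch below. Apart from the table name and the --class-name option, everything in it (JDBC URL, user name, target directory, the class name part_table) is a hypothetical placeholder and not the original command:

```shell
#!/usr/bin/env bash
# All connection details below are hypothetical placeholders.
DB2_URL="jdbc:db2://db2-host.example.com:50000/SAMPLE"
SRC_TABLE='ZZZ001$.part_table'   # single quotes stop the shell from expanding "$"
CLASS_NAME="part_table"          # a plain Java identifier without "$"

SQOOP_ARGS=(
  import
  --connect "$DB2_URL"
  --username db2user
  -P                             # prompt for the password
  --table "$SRC_TABLE"
  --class-name "$CLASS_NAME"     # overrides the class name derived from the table
  --target-dir /tmp/part_table
)

# Print the command for review; run it with: sqoop "${SQOOP_ARGS[@]}"
echo "sqoop ${SQOOP_ARGS[*]}"
```

Keeping the arguments in an array means the table name reaches Sqoop with its “$” intact, and the generated class gets the name you chose.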


Hope above helps.

Hive Lateral View to Flatten Array Data

In this blog post, I will show how to take a Hive table whose columns hold arrays (the sample data is shown in step 2 below) and turn it into a table with one row per array position (the output is shown in step 3).

1. To prepare, we first need to create a dummy table with one record:

CREATE TABLE dummy (a int);
INSERT INTO dummy VALUES (1);

This allows us to insert array data into our test table by SELECTing from the one-row dummy table.

2. Create a table with array columns and populate it:

CREATE TABLE array_test (a array<int>, b array<int>, c array<int>);
INSERT INTO array_test SELECT array(1,2,3), array(3,4,5), array(6,7,8) FROM dummy;

This will get us ready with a table having the following data:

| a           | b           | c           |
| [1,2,3]     | [3,4,5]     | [6,7,8]     |

3. Now running the following query using LATERAL VIEW will give the output we want:

SELECT key1, key2, key3 FROM array_test 
LATERAL VIEW posexplode(a) t1 AS q1, key1 
LATERAL VIEW posexplode(b) t2 AS q2, key2 
LATERAL VIEW posexplode(c) t3 AS q3, key3 
WHERE q1 = q2 AND q1 = q3;

| key1  | key2  | key3  |
| 1     | 3     | 6     |
| 2     | 4     | 7     |
| 3     | 5     | 8     |

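Each posexplode() returns the position and the value of every array element. The three lateral views produce every combination of elements, and the WHERE clause keeps only the combinations where all three positions match.

This might not be the best solution, but at least it works. However, you will have to test and confirm for yourself how well it performs on a real Big Data set. Hope above helps.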
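
One possible alternative, offered as a sketch rather than a tested recommendation, is to explode only one array and look up the other two by the same position, which avoids building every combination first. It assumes all three arrays have the same length. The JDBC URL and the function name run_flatten_query are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical JDBC URL; adjust host, port and principal for your cluster.
JDBC_URL='jdbc:hive2://hs2-host.example.com:10000/default;principal=hive/_HOST@EXAMPLE.COM'

# Explode only array "a" and index "b" and "c" with the same position.
QUERY='
SELECT key1, b[pos] AS key2, c[pos] AS key3
FROM array_test
LATERAL VIEW posexplode(a) t1 AS pos, key1;'

# Runs the query through beeline; call this after obtaining a Kerberos ticket.
run_flatten_query() {
  beeline -u "$JDBC_URL" -e "$QUERY"
}
```

Calling run_flatten_query against the sample table should give the same three rows as the query in step 3, but with a single lateral view.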

Enabling Kerberos Debug for Hive

From time to time, we need to troubleshoot and locate the root cause of a Kerberos failure in Hive. Below I outline the steps to turn on debug messages on both the client side and the HiveServer2 side.

  1. To enable on the Hive client side (beeline), simply add the following export commands before you run the beeline command:
    export HADOOP_JAAS_DEBUG=true;
    export HADOOP_OPTS=''

    Then the debug message will be printed on the shell when you run beeline.

  2. To enable Kerberos debug on the HiveServer2 side (assuming you are using Cloudera Manager):
    1. Go to CM > Hive > Configuration
    2. Locate “HiveServer2 Environment Advanced Configuration Snippet (Safety Valve)”
    3. Add the following to the text area:
    4. Save and restart the Hive service

    Once restarted, you will be able to find the Kerberos debug messages in HiveServer2’s process directory on the server host, at /var/run/cloudera-scm-agent/process/XXX-hive-HIVESERVER2/logs/stdout.log, where XXX is the largest number among the HiveServer2 directories.
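
The option values themselves are not shown in the steps above. The JVM system property commonly used to print Kerberos debug output is sun.security.krb5.debug, so a client-side sketch, assuming that property is what was intended, could look like this:

```shell
#!/usr/bin/env bash
# Print JAAS login-module debug output
export HADOOP_JAAS_DEBUG=true
# Assumption: sun.security.krb5.debug is the JVM property intended here.
# Append rather than overwrite, so any existing options are kept.
export HADOOP_OPTS="${HADOOP_OPTS:-} -Dsun.security.krb5.debug=true"

# Show what the next beeline invocation in this shell will inherit
echo "HADOOP_OPTS=${HADOOP_OPTS}"
```

The same JVM property is presumably what goes into the safety valve on the HiveServer2 side, but please confirm the exact variable name and format for your Cloudera Manager version.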

A sample Kerberos debug message looks like this:

Java config name: null
Native config name: /etc/krb5.conf
Loaded from native config
[UnixLoginModule]: succeeded importing info:
uid = 0
gid = 0
supp gid = 0
Debug is true storeKey false useTicketCache true useKeyTab false doNotPrompt true ticketCache is null isInitiator true KeyTab is null refreshKrb5Config is false principal is null tryFirstPass is false useFirstPass is false storePass is false clearPass is false
Acquire TGT from Cache
>>>KinitOptions cache name is /tmp/krb5cc_0
>>>DEBUG client principal is impala/{host-name}@REAL.COM
>>>DEBUG server principal is krbtgt/REAL.COM@REAL.COM
>>>DEBUG key type: 23
>>>DEBUG auth time: Sun Aug 13 21:07:46 PDT 2017
>>>DEBUG start time: Sun Aug 13 21:07:46 PDT 2017
>>>DEBUG end time: Mon Aug 14 07:07:46 PDT 2017
>>>DEBUG renew_till time: Sun Aug 20 21:07:46 PDT 2017
>>>DEBUG client principal is impala/{host-name}@REAL.COM
>>>DEBUG server principal is X-CACHECONF:/krb5_ccache_conf_data/pa_type/krbtgt/REAL.COM@REAL.COM
>>>DEBUG key type: 0
>>>DEBUG auth time: Wed Dec 31 16:00:00 PST 1969
>>>DEBUG start time: null
>>>DEBUG end time: Wed Dec 31 16:00:00 PST 1969
>>>DEBUG renew_till time: null
>>> CCacheInputStream: readFlags()
Principal is impala/{host-name}@REAL.COM
[UnixLoginModule]: added UnixPrincipal,
to Subject
Commit Succeeded

Search Subject for Kerberos V5 INIT cred (<>,
Found ticket for impala/{host-name}@REAL.COM to go to krbtgt/REAL.COM@REAL.COM expiring on Mon Aug 14 07:07:46 PDT 2017
Entered Krb5Context.initSecContext with state=STATE_NEW
Found ticket for impala/{host-name}@REAL.COM to go to krbtgt/REAL.COM@REAL.COM expiring on Mon Aug 14 07:07:46 PDT 2017
Service ticket not found in the subject
>>> Credentials acquireServiceCreds: same realm
default etypes for default_tgs_enctypes: 23.
>>> CksumType:
>>> EType:
>>> KdcAccessibility: reset
>>> KrbKdcReq send: TCP:88, timeout=3000, number of retries =3, #bytes=1607
>>> KDCCommunication: TCP:88, timeout=3000,Attempt =1, #bytes=1607
>>>DEBUG: TCPClient reading 1581 bytes
>>> KrbKdcReq send: #bytes read=1581
>>> KdcAccessibility: remove
>>> EType:
>>> KrbApReq: APOptions are 00100000 00000000 00000000 00000000
>>> EType:
Krb5Context setting mySeqNumber to: 789412608
Created InitSecContextToken:

From the above message, you can see at least the following info:

  • Client config file for Kerberos: /etc/krb5.conf
  • Ticket cache file: /tmp/krb5cc_0
  • Client principal name: impala/{host-name}@REAL.COM
  • The KDC server is contacted over TCP on port 88 (TCP:88)
  • and a lot more that might be useful for your troubleshooting
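
When the debug output is long, a small filter can pull out just the lines that carry the facts listed above. This is a sketch; the function name summarize_krb_debug is made up, and the patterns are taken from the sample output in this post, so they may need adjusting for other Java versions:

```shell
#!/usr/bin/env bash
# Print the distinct lines that name the config file, the ticket cache,
# the client principal and the KDC request.
summarize_krb_debug() {
  local logfile="$1"
  grep -E 'Native config name|KinitOptions cache name|client principal is|KrbKdcReq send' "$logfile" | sort -u
}

# Usage, with XXX as described in step 2:
#   summarize_krb_debug /var/run/cloudera-scm-agent/process/XXX-hive-HIVESERVER2/logs/stdout.log
```

The sort -u step collapses the repeated principal lines, so the summary stays short even for a long session.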

Hope above helps.