Impala query failed with error “Failed to parse view-definition statement” after upgrade

If you have just upgraded CDH from 5.11 or earlier to 5.12 or later (that is, Impala from 2.8 or earlier to 2.9 or later), and you notice that Impala queries against certain VIEWs fail with the error below:

AnalysisException: Failed to load metadata for table: 'db.view_name' CAUSED BY: 
TableLoadingException: Failed to parse view-definition statement of view: db.view_name

Then please read on.

First, confirm whether your VIEW references a newly reserved keyword such as “SORT” by running “SHOW CREATE TABLE view_name”. If the output looks something like this:

CREATE VIEW default.test_view AS SELECT a, sort FROM default.test

Notice the word “sort” in the query. If you see something similar, you are hitting the issue described in this post.

This happens because CDH 5.12.x / Impala 2.9 introduced IMPALA-4166, which adds support for the “SORT BY” clause and makes “SORT” a reserved keyword from that version onwards. Because “SORT” is now a keyword, any unquoted reference to it elsewhere in a query, for example as a column name, makes the statement invalid and causes the query to fail.
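
For illustration, a minimal way to end up in this state (the column types below are made up; the table and view names match the example above) is:

-- On the pre-upgrade version (Impala 2.8 / CDH 5.11 or earlier), "sort" is not
-- yet a keyword, so both statements succeed:
CREATE TABLE default.test (a INT, sort INT);
CREATE VIEW default.test_view AS SELECT a, sort FROM default.test;

-- After upgrading to Impala 2.9 / CDH 5.12 or later, the stored view definition
-- can no longer be parsed, so this query fails with the error above:
SELECT * FROM default.test_view;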

You will see a stacktrace like the one below in the Impala Coordinator log:

CAUSED BY: TableLoadingException: Failed to parse view-definition statement of view: db.view_name
@ 0x83efb9 impala::Status::Status()
@ 0xb747c2 impala::JniUtil::GetJniExceptionMsg()
@ 0xa7dfcb impala::Frontend::GetExecRequest()
@ 0xa98fc8 impala::ImpalaServer::ExecuteInternal()
@ 0xaa0278 impala::ImpalaServer::Execute()
@ 0xadf28e impala::ImpalaServer::query()
@ 0xd71cd5 beeswax::BeeswaxServiceProcessor::process_query()
@ 0xd74fd4 beeswax::BeeswaxServiceProcessor::dispatchCall()
@ 0x80ecdc apache::thrift::TDispatchProcessor::process()
@ 0x1b596ab apache::thrift::server::TThreadPoolServer::Task::run()
@ 0x1b40fc9 apache::thrift::concurrency::ThreadManager::Worker::run()
@ 0x9d5c69 impala::ThriftThread::RunRunnable()
@ 0x9d6a42 boost::detail::function::void_function_obj_invoker0<>::invoke()
@ 0xbd6fe2 impala::Thread::SuperviseThread()
@ 0xbd7744 boost::detail::thread_data<>::run()
@ 0xe6417a (unknown)

There are two ways to fix the issue:

1. Quote the column named “sort” with backticks (“`”) by dropping and re-creating the view, like below:

DROP VIEW default.test_view;
Query: DROP VIEW default.test_view
Fetched 0 row(s) in 4.35s

CREATE VIEW default.test_view AS SELECT a, `sort` FROM default.test;
Query: CREATE VIEW default.test_view AS SELECT a, `sort` FROM default.test
Fetched 0 row(s) in 3.87s

SELECT * FROM default.test_view;
Query: SELECT * FROM default.test_view
Fetched 0 row(s) in 3.63s

2. If your view is complex, DROP and re-CREATE the VIEW in beeline (that is, through Hive), without modifying the VIEW creation query.

This works because Hive adds the “`” quoting for you automatically when it stores the view definition, which Impala does not.
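
For example, from beeline (the JDBC URL below is just a placeholder for your own HiveServer2 host):

beeline -u "jdbc:hive2://<hiveserver2-host>:10000/default"

DROP VIEW default.test_view;
CREATE VIEW default.test_view AS SELECT a, sort FROM default.test;

Since the view is re-created through Hive, you will most likely also need to run “INVALIDATE METADATA default.test_view” in impala-shell so that Impala picks up the new view definition.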

This should help to resolve your cluster upgrade issue.

If this post helps, please share your experience below in the comments, thanks.

Enable Debug Logging for YARN Application in Hive

When troubleshooting Hadoop-related issues, we often need to enable DEBUG level logging so that we can see more of what Hadoop is doing when running jobs. This post shows how to enable DEBUG logging for YARN application jobs when you run them through Hive.

Doing so is simple. Just run the three statements below in your Hive session before running your query:

SET mapreduce.map.log.level=DEBUG;
SET mapreduce.reduce.log.level=DEBUG;
SET yarn.app.mapreduce.am.log.level=DEBUG;

This turns the Mapper, Reducer and Application Master logs up to DEBUG level. If you only need one of them, simply leave out the others.

Then you can just run your query as normal:

SELECT COUNT(*) FROM tablename;
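
Once you are done troubleshooting, you can set the levels back to the default INFO in the same session:

SET mapreduce.map.log.level=INFO;
SET mapreduce.reduce.log.level=INFO;
SET yarn.app.mapreduce.am.log.level=INFO;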

Before enabling DEBUG, the Mapper log looked like this:

2018-05-22 21:06:30,267 WARN [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Metrics system not started: org.apache.commons.configuration.ConfigurationException: Unable to load the configuration from the URL file:/run/cloudera-scm-agent/process/1370-yarn-NODEMANAGER/hadoop-metrics2.properties
2018-05-22 21:06:30,338 INFO [main] org.apache.hadoop.mapred.YarnChild: Executing with tokens:
2018-05-22 21:06:30,339 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: mapreduce.job, Service: job_1526873868324_0001, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@54b83d10)
2018-05-22 21:06:30,686 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: HDFS_DELEGATION_TOKEN, Service: 10.17.100.142:8020, Ident: (token for hive: HDFS_DELEGATION_TOKEN owner=hive/host-10-17-100-142.coe.cloudera.com@CDH511.COM, renewer=yarn, realUser=, issueDate=1527048353000, maxDate=1527653153000, sequenceNumber=787, masterKeyId=173)
2018-05-22 21:06:31,556 INFO [main] org.apache.hadoop.mapred.YarnChild: Sleeping for 0ms before retrying again. Got null now.
2018-05-22 21:06:32,194 INFO [main] org.apache.hadoop.mapred.YarnChild: mapreduce.cluster.local.dir for child: /yarn/nm2/usercache/hive/appcache/application_1526873868324_0001
2018-05-22 21:06:33,738 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
2018-05-22 21:06:34,271 INFO [main] org.apache.hadoop.mapred.Task:  Using ResourceCalculatorProcessTree : [ ]

After DEBUG is enabled:

2018-05-22 21:08:00,607 DEBUG [main] org.apache.hadoop.mapred.YarnChild: Child starting
2018-05-22 21:08:01,936 DEBUG [main] org.apache.hadoop.metrics2.lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with annotation @org.apache.hadoop.metrics2.annotation.Metric(valueName=Time, about=, value=[Rate of successful kerberos logins and latency (milliseconds)], always=false, type=DEFAULT, sampleName=Ops)
2018-05-22 21:08:01,968 DEBUG [main] org.apache.hadoop.metrics2.lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure with annotation @org.apache.hadoop.metrics2.annotation.Metric(valueName=Time, about=, value=[Rate of failed kerberos logins and latency (milliseconds)], always=false, type=DEFAULT, sampleName=Ops)
2018-05-22 21:08:01,968 DEBUG [main] org.apache.hadoop.metrics2.lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.getGroups with annotation @org.apache.hadoop.metrics2.annotation.Metric(valueName=Time, about=, value=[GetGroups], always=false, type=DEFAULT, sampleName=Ops)
2018-05-22 21:08:01,969 DEBUG [main] org.apache.hadoop.metrics2.lib.MutableMetricsFactory: field private org.apache.hadoop.metrics2.lib.MutableGaugeLong org.apache.hadoop.security.UserGroupInformation$UgiMetrics.renewalFailuresTotal with annotation @org.apache.hadoop.metrics2.annotation.Metric(valueName=Time, about=, value=[Renewal failures since startup], always=false, type=DEFAULT, sampleName=Ops)
2018-05-22 21:08:01,969 DEBUG [main] org.apache.hadoop.metrics2.lib.MutableMetricsFactory: field private org.apache.hadoop.metrics2.lib.MutableGaugeInt org.apache.hadoop.security.UserGroupInformation$UgiMetrics.renewalFailures with annotation @org.apache.hadoop.metrics2.annotation.Metric(valueName=Time, about=, value=[Renewal failures since last successful login], always=false, type=DEFAULT, sampleName=Ops)
2018-05-22 21:08:01,970 DEBUG [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: UgiMetrics, User and group related metrics
2018-05-22 21:08:02,125 DEBUG [main] org.apache.hadoop.security.SecurityUtil: Setting hadoop.security.token.service.use_ip to true
2018-05-22 21:08:02,169 DEBUG [main] org.apache.hadoop.security.Groups:  Creating new Groups object
2018-05-22 21:08:02,211 DEBUG [main] org.apache.hadoop.util.Shell: Failed to detect a valid hadoop home directory
java.io.IOException: HADOOP_HOME or hadoop.home.dir are not set.
	at org.apache.hadoop.util.Shell.checkHadoopHome(Shell.java:351)
	at org.apache.hadoop.util.Shell.<clinit>(Shell.java:376)
	at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:79)
	at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:168)
	at org.apache.hadoop.security.Groups.<init>(Groups.java:132)
	at org.apache.hadoop.security.Groups.<init>(Groups.java:100)
	at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:435)
	at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:337)
	at org.apache.hadoop.security.UserGroupInformation.setConfiguration(UserGroupInformation.java:374)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:78)
2018-05-22 21:08:02,425 DEBUG [main] org.apache.hadoop.util.Shell: setsid exited with exit code 0
2018-05-22 21:08:02,521 DEBUG [main] org.apache.hadoop.security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000; warningDeltaMs=5000
2018-05-22 21:08:02,548 DEBUG [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: from system property: null
2018-05-22 21:08:02,549 DEBUG [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: from environment variable: null
2018-05-22 21:08:02,614 DEBUG [main] org.apache.commons.configuration.ConfigurationUtils: ConfigurationUtils.locate(): base is null, name is hadoop-metrics2-maptask.properties
2018-05-22 21:08:02,616 DEBUG [main] org.apache.commons.configuration.ConfigurationUtils: ConfigurationUtils.locate(): base is null, name is hadoop-metrics2.properties
2018-05-22 21:08:02,617 DEBUG [main] org.apache.commons.configuration.ConfigurationUtils: Loading configuration from the context classpath (hadoop-metrics2.properties)
2018-05-22 21:08:02,620 WARN [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Metrics system not started: org.apache.commons.configuration.ConfigurationException: Unable to load the configuration from the URL file:/run/cloudera-scm-agent/process/1370-yarn-NODEMANAGER/hadoop-metrics2.properties
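
If the job has already finished, you can also pull the full task logs from the command line with the yarn logs command (the application ID below is the one from the sample output above):

yarn logs -applicationId application_1526873868324_0001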

Hope the above helps!

Patch for SENTRY-2240 – DROP UDF Permission Issue in Sentry

Last week, I discovered an issue in Sentry where it does not check permissions properly when a user tries to DROP a function. Reproducing it is easy: create a function under a database using an admin account, and make sure that a particular user does not have ANY permissions on the database the UDF was created under. Then, try to DROP the function as that user, as shown in the sketch below.
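
A rough sketch of the reproduction steps in SQL (the database, function and JAR names below are made up for illustration):

-- As an admin user, create a function under a database that the test user
-- has no privileges on:
CREATE FUNCTION test_db.my_upper AS 'com.example.udf.MyUpper'
  USING JAR 'hdfs:///tmp/udfs/my-udf.jar';

-- Now connect as a user with no privileges at all on test_db and run:
DROP FUNCTION test_db.my_upper;
-- Before SENTRY-2240 is fixed, this DROP succeeds when it should be rejected.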

I immediately checked whether an upstream JIRA had been reported, but was not able to find any, so I filed a new one: SENTRY-2240.

I have contributed patches to Sqoop and Hive before, but not to Sentry yet, so I thought this JIRA would be a good one to start with. I checked out the Sentry code from GitHub and examined it to see what was wrong, and could see that for CREATE and DROP FUNCTION calls, Sentry does not care which database the function belongs to:

HiveAuthzBindingHook.java#L226
HiveAuthzBindingHook.java#L230

      case HiveParser.TOK_CREATEFUNCTION:
        ........

        // create/drop function is allowed with any database
        currDB = Database.ALL;
        break;
      case HiveParser.TOK_DROPFUNCTION:
        // create/drop function is allowed with any database
        currDB = Database.ALL;
        break;

I spent last weekend working on a fix and updating the affected test cases. I have forked the repository and created a branch to track my changes until the final version; please refer to https://github.com/ericlin05/sentry/tree/SENTRY-2240.

If you have any comments on my patch or want to discuss it, please add your comments below.

Configure Postfix to use Gmail as a Mail Relay

We all know that Gmail is popular these days, from personal use to heavy business use. I myself use Gmail on a daily basis, both for my personal email and at work.

My WordPress site recently got infected with malware, which kept creating script files and trying to send emails through my Postfix server, which runs on the same host as my blog.

Before this, I had never spent time sitting down and setting up scripts to properly monitor my host and website. In the latest incident, the mail log filled up my disk space because of endless attempts to send emails through my host, every one of which failed. Now I think I need to spend some time getting monitoring set up properly.

The first step is to set up my Postfix server properly so that it can send monitoring emails to me. This is important because I need daily visibility into server logs as well as cron job output. The easiest way is to use Gmail as a mail relay, so that emails from the Postfix server on my host go out through Gmail.

I found a useful article on the HowtoForge website that details the steps; I will reproduce them here for Ubuntu, for my own reference:

1. Install necessary packages:

sudo apt-get update && sudo apt-get install postfix mailutils

2. Configure Gmail Authentication

Create or modify a password file which will be used by Postfix to establish authentication with Gmail. In the authentication information below, replace username with your Gmail username and password with your Gmail password. If you are using a custom Gmail Apps domain name, you may replace gmail.com with your Google Apps domain.

The password file will reside in the Postfix configuration directory. The file can be named whatever you like, but the recommended filename is sasl_passwd.

a. Open file for edit:

vi /etc/postfix/sasl_passwd

b. Add the following line to the file above:

[smtp.gmail.com]:587    username@gmail.com:password

c. Then make sure that the file is readable only by its owner:

chmod 600 /etc/postfix/sasl_passwd

3. Configure Postfix

Now, we need to configure Postfix by adding the following lines to its main configuration file. Open /etc/postfix/main.cf using your favourite editor and then add the content below:

relayhost = [smtp.gmail.com]:587
smtp_use_tls = yes
smtp_sasl_auth_enable = yes
smtp_sasl_security_options =
smtp_sasl_password_maps = hash:/etc/postfix/sasl_passwd
smtp_tls_CAfile = /etc/ssl/certs/ca-certificates.crt

4. Process Password File

Use postmap to compile and hash the contents of sasl_passwd. The results will be stored in your Postfix configuration directory in the file sasl_passwd.db.

postmap /etc/postfix/sasl_passwd
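
You can double-check that Postfix has picked up the relay settings from step 3 with postconf, for example:

postconf relayhost smtp_sasl_password_maps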

5. Restart the Postfix server so that the configuration changes take effect:

sudo systemctl restart postfix.service

6. Lastly, we need to enable “Less Secure Apps” in Gmail

By default, only the most secure sign-ins, such as logging in to Gmail on the web, are allowed for your Gmail account. To permit relay requests, log in to your Gmail account and turn on the “Allow less secure apps” setting.

For more information, please refer to “Allowing less secure apps to access your account.”

7. To confirm everything is working, run the command below to test:

mail -s "Test subject" recipient@domain.com
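
Note that mail reads the message body from standard input, so for a quick non-interactive test you can pipe a short body in:

echo "Postfix relay test" | mail -s "Test subject" recipient@domain.com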

If no email arrives, check the log file at /var/log/mail.log to see what error Postfix reported and fix it accordingly.
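
For example, you can watch the log while re-sending the test message:

tail -f /var/log/mail.log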

Hope this also helps anyone who lands on my blog.

Oozie SSH Action Failed With “externalId cannot be empty” Error

Last week I was working on an issue where a very simple SSH action run through Oozie kept failing with an “externalId cannot be empty” error. The workflow had only a single SSH action and nothing else. See the workflow example below:

<workflow-app name="SSH Action Test" xmlns="uri:oozie:workflow:0.5">
    <start to="ssh-5c4d"/>
    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <action name="ssh-5c4d">
        <ssh xmlns="uri:oozie:ssh-action:0.1">
            <host>user1@another-server-url</host>
            <command>ls / &gt;&gt; /tmp/test.log</command>
            <capture-output/>
        </ssh>
        <ok to="End"/>
        <error to="Kill"/>
    </action>
    <end name="End"/>
</workflow-app>

The error message from the Oozie server looked like this:

2018-01-03 06:12:45,347 ERROR org.apache.oozie.command.wf.ActionStartXCommand: SERVER[{oozie-server-url}] USER[admin] GROUP[-] TOKEN[] APP[SSH Action Test] JOB[0000000-180103010440574-oozie-oozi-W] ACTION[0000000-180103010440574-oozie-oozi-W@ssh-5c4d] Exception,
java.lang.IllegalArgumentException: externalId cannot be empty
        at org.apache.oozie.util.ParamChecker.notEmpty(ParamChecker.java:90)
        at org.apache.oozie.util.ParamChecker.notEmpty(ParamChecker.java:74)
        at org.apache.oozie.WorkflowActionBean.setStartData(WorkflowActionBean.java:503)
        at org.apache.oozie.command.wf.ActionXCommand$ActionExecutorContext.setStartData(ActionXCommand.java:387)
        at org.apache.oozie.action.ssh.SshActionExecutor.start(SshActionExecutor.java:269)
        at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:232)
        at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:63)
        at org.apache.oozie.command.XCommand.call(XCommand.java:286)
        at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:332)
        at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:261)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:179)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

We confirmed that the passwordless SSH connection from the Oozie server to the remote server worked correctly without issues.

After digging through the Oozie source code, I found that the cause is that Oozie uses Java’s Runtime.exec API to execute the command remotely. Runtime.exec does not behave like a shell; in particular, it does not support shell features such as redirecting output to a file. Under the hood, Oozie splits the full command “ls / >> /tmp/test.log” into the tokens “ls”, “/”, “>>” and “/tmp/test.log” and passes them to Runtime.exec, which treats every token apart from “ls” as an argument to the “ls” command. As you would expect, “>>” is not a file, so “ls” complains that the file does not exist and returns a non-zero exit status instead of 0.
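
The snippet below is not Oozie’s actual code, just a minimal standalone illustration of the Runtime.exec behaviour described above:

import java.io.BufferedReader;
import java.io.InputStreamReader;

public class ExecDemo {
    public static void main(String[] args) throws Exception {
        // Runtime.exec() does not go through a shell, so ">>" is passed to "ls"
        // as a literal argument instead of redirecting the output to a file.
        Process p = Runtime.getRuntime().exec(
                new String[] {"ls", "/", ">>", "/tmp/test.log"});
        try (BufferedReader err = new BufferedReader(
                new InputStreamReader(p.getErrorStream()))) {
            String line;
            while ((line = err.readLine()) != null) {
                // e.g. "ls: cannot access '>>': No such file or directory"
                System.out.println(line);
            }
        }
        // The exit status is non-zero, which is why the action is treated as failed.
        System.out.println("exit status: " + p.waitFor());
    }
}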

Oozie then tried to capture the PID of the remote process, failed, and hence returned the “externalId cannot be empty” error.

The workaround is simple: put the full command you want to run into a new script file and have Oozie execute that script instead:

1. Create a file “ssh-action.sh” on the target host, for example, under /home/{user}/scripts/ssh-action.sh
2. Add the command “ls / >> /tmp/ssh.log” to the file (a full sketch of the script is shown after step 4)
3. Make the file executable by running:

chmod 744 /home/{user}/scripts/ssh-action.sh

4. Update the Oozie workflow to run the new shell script instead:

<ssh xmlns="uri:oozie:ssh-action:0.1">
    <host>user@remote-server-url</host>
    <command>/home/{user}/scripts/ssh-action.sh</command>
    <capture-output/>
</ssh>
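
Putting steps 1 to 3 together, the script on the remote host could look like this (the shebang line is added here as good practice, and the path is just an example):

#!/bin/bash
# /home/{user}/scripts/ssh-action.sh
ls / >> /tmp/ssh.log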

And then the SSH action should work perfectly.