How to Disable Actions in Oozie

Oozie is an orchestration system for managing Hadoop jobs. Currently it supports various actions, including, but not limited to, Spark1, Spark2, Shell, SSH, Hive, Sqoop and Java etc. However, due to certain business requirements, sometimes we want to disable some of the actions so that we can control how users use Oozie to run jobs.

For example, currently when you setup a SSH action, it is required that you need to setup passwordless login for a certain user from Oozie server host to the target host. This will also allow any user to be able to setup a job to run on the remote machine, so long as you know the username and remote host domain name. This is a security concern. There are thoughts on how to improve it, but not yet have a solution at this stage.

So if business has such concerns, we can disable SSH action easily. Please follow below steps (assuming that you are using Cloudera Manager to manage CDH Hadoop):

1. Go to Cloudera Manager home page > Oozie > Configuration
2. Locate configuration called “Oozie Server Advanced Configuration Snippet (Safety Valve) for oozie-site.xml”
3. Click on the “Add” button and enter “oozie.service.ActionService.executor.classes” for name and value as below:

org.apache.oozie.action.decision.DecisionActionExecutor,org.apache.oozie.action.hadoop.JavaActionExecutor,org.apache.oozie.action.hadoop.FsActionExecutor,org.apache.oozie.action.hadoop.MapReduceActionExecutor,org.apache.oozie.action.hadoop.PigActionExecutor,org.apache.oozie.action.hadoop.HiveActionExecutor,org.apache.oozie.action.hadoop.ShellActionExecutor,org.apache.oozie.action.hadoop.SqoopActionExecutor,org.apache.oozie.action.hadoop.DistcpActionExecutor,org.apache.oozie.action.hadoop.Hive2ActionExecutor,org.apache.oozie.action.oozie.SubWorkflowActionExecutor,org.apache.oozie.action.email.EmailActionExecutor,org.apache.oozie.action.hadoop.SparkActionExecutor

The full list is:

org.apache.oozie.action.decision.DecisionActionExecutor,org.apache.oozie.action.hadoop.JavaActionExecutor,org.apache.oozie.action.hadoop.FsActionExecutor,org.apache.oozie.action.hadoop.MapReduceActionExecutor,org.apache.oozie.action.hadoop.PigActionExecutor,org.apache.oozie.action.hadoop.HiveActionExecutor,org.apache.oozie.action.hadoop.ShellActionExecutor,org.apache.oozie.action.hadoop.SqoopActionExecutor,org.apache.oozie.action.hadoop.DistcpActionExecutor,org.apache.oozie.action.hadoop.Hive2ActionExecutor,org.apache.oozie.action.ssh.SshActionExecutor,org.apache.oozie.action.oozie.SubWorkflowActionExecutor,org.apache.oozie.action.email.EmailActionExecutor,org.apache.oozie.action.hadoop.SparkActionExecutor

so we just need to remove org.apache.oozie.action.ssh.SshActionExecutor action class. Basically, just remove the corresponding action classes that you do not want Oozie to support.

4. Save and restart Oozie

After that, if you try to run a SSH action through Oozie, it will fail. And sample error from Hue looks like below:

Please keep in mind that if you do make this change, remember to check the version of Oozie that you will upgrade to in the future, to make sure that the new supported classes are added to this list, otherwise other jobs will fail. For example, currently Oozie can only support running Spark1 action, Spark2 is not supported. However, in latest CDH version 6.x, Spark2 is now supported and the list will need to be updated.

Impala Query Profile Explained – Part 1

If you work with Impala, but have no idea how to interpret the Impala query PROFILEs, it would be very hard to understand what’s going on and how to make your query run at its full potential. I think this is the case for lots of Impala users, so I would like to write a simple blog post to share my experience and hope that it can help with anyone who like to learn more.

This is the Part 1 of the series, so I will go with the basics and just cover the main things to look out for when examining the PROFILE.

So first thing first, how do you collect Impala query PROFILE? Well, there are a couple of ways. The simplest way is to just run “PROFILE” after your query in impala-shell, like below:

[impala-daemon-host.com:21000] > SELECT COUNT(*) FROM sample_07;
Query: SELECT COUNT(*) FROM sample_07
Query submitted at: 2018-09-14 15:57:35 (Coordinator: https://impala-daemon-host.com:25000)
dQuery progress can be monitored at: https://impala-daemon-host.com:25000/query_plan?query_id=36433472787e1cab:29c30e7800000000
+----------+
| count(*) |
+----------+
| 823      |
+----------+
Fetched 1 row(s) in 6.68s

[impala-daemon-host.com:21000] > PROFILE; <-- Simply run "PROFILE" as a query
Query Runtime Profile:
Query (id=36433472787e1cab:29c30e7800000000):
Summary:
Session ID: 443110cc7292c92:6e3ff4d76f0c5aaf
Session Type: BEESWAX
.....

You can also collect from Cloudera Manager Web UI, by navigating to CM > Impala > Queries, locate the query you just ran and click on “Query Details”

Then scroll down a bit to locate “Download Profile” button:

Last, but not least, you can navigate to Impala Daemon’s web UI and download from there. Go to the Impala Daemon that is used as the coordinator to run the query:

https://{impala-daemon-url}:25000/queries

The list of queries will be displayed:

Click through the “Details” link and then to “Profile” tab:

All right, so we have the PROFILE now, let’s dive into the details.

Below is the snippet of Query PROFILE we will go through today, which is the Summary section at the top of the PROFILE:

Query (id=36433472787e1cab:29c30e7800000000):
Summary:
Session ID: 443110cc7292c92:6e3ff4d76f0c5aaf
Session Type: BEESWAX
Start Time: 2018-09-14 15:57:35.883111000
End Time: 2018-09-14 15:57:42.565042000
Query Type: QUERY
Query State: FINISHED
Query Status: OK
Impala Version: impalad version 2.11.0-cdh5.14.x RELEASE (build 50eddf4550faa6200f51e98413de785bf1bf0de1)
User: hive@VPC.CLOUDERA.COM
Connected User: hive@VPC.CLOUDERA.COM
Delegated User:
Network Address: ::ffff:172.26.26.117:58834
Default Db: default
Sql Statement: SELECT COUNT(*) FROM sample_07
Coordinator: impala-daemon-url.com:22000
Query Options (set by configuration):
Query Options (set by configuration and planner): MT_DOP=0
Plan:
----------------

Let’s break it into sections and walk through one by one. There are a few important information here that used more often:

a. Query ID:

Query (id=36433472787e1cab:29c30e7800000000):

This is useful to identify relevant Query related information from Impala Daemon logs. Simply search this query ID and you can find out what it was doing behind the scene, especially useful for finding out related error messages.

b. Session Type:

Session Type: BEESWAX

This can tell us where the connection is from. BEESWAX means that the query ran from impala-shell client. If you run from Hue, the type will be “HIVESERVER2” since Hue connects via HiveServer2 thrift.

c. Start and End time:

Start Time: 2018-09-14 15:57:35.883111000
End Time: 2018-09-14 15:57:42.565042000

This is useful to tell how long the query ran for. Please keep it in mind that this time includes session idle time. So if you run a simple query that returns in a few seconds in Hue, since Hue keeps session open until session is closed or user runs another query, so the time here might show longer time than normal. The start and end time should match exactly the run time if run through impala-shell however, since impala-shell closes query handler straightaway after query finishes.

d. Query status:

Query Status: OK

This tells if the query finished successfully or not. OK means good. If there are errors, normally will show here, for example, cancelled by user, session timeout, Exceptions etc.

e. Impala version:

Impala Version: impalad version 2.11.0-cdh5.14.x RELEASE (build 50eddf4550faa6200f51e98413de785bf1bf0de1)

This confirms the version that is used to run the query, if you see this is not matching with your installation, then something is not setup properly.

f. User information:

User: hive@XXX.XXXXXX.COM
Connected User: hive@XXX.XXXXXX.COM
Delegated User:

You can find out who ran the query from this session, so you know who to blame :).

g. DB selected on connection:

Default Db: default

Not used a lot, but good to know.

h. The query that used to return this PROFILE:

Sql Statement: SELECT COUNT(*) FROM sample_07

You will need this info if you are helping others to troubleshoot, as you need to know how query was constructed and what tables are involved. In lots of cases that a simple rewrite of the query will help to resolve issues or boost query performance.

i. The impala daemon that is used to run the query, what we called the Coordinator:

Coordinator: impala-daemon-host.com:22000

This is important piece of information, as you will determine which host to get the impala daemon log should you wish to check for INFO, WARNING and ERROR level logs.

j. Query Options used for this query:

Query Options (set by configuration):
Query Options (set by configuration and planner): MT_DOP=0

This section tells you what kind of QUERY OPTIONS being applied to the current query, if there are any. This is useful to see if there is any user level, or pool level overrides that will affect this query. One example would be if Impala Daemon’s memory is set at, say 120GB, but a small query still fails with OutOfMemory error. This is the place you will check if user accidentally set MEM_LIMIT in their session to a lower value that could results in OutOfMemory error.

This concludes the part 1 of the series to explain the Summary section of the query to understand the basic information. In the next part of the series, I will explain in detail on Query Plan as well as the Execution Summary of the PROFILE.

Any comments or suggestions, please let me know from the comments section below. Thanks

Simple Tool to Enable SSL/TLS for CM/CDH Cluster

Since earlier this year, Cloudera has started a new program that allows each Support Engineer to do a full week offline self-learning. Topics can be chosen by each individual engineer so long as the outcome has a value to the business, It can be either the engineer skilled up with a certification that helps with day to day work, or a presentation to share with the rest of the team what he/she had learnt from the week doing self-learning. Last week, from 27th of August to 31st of August was my turn.

After a careful consideration, I thought that my knowledge on SSL/TLS area needed to be skilled up, so I had decided to find some SSL/TLS related courses on either SafariOnline or Lynda, and then see if I could try to enable Cloudera Manager as well as most of the CDH services with SSL/TLS, ideally to put everything into a script so that this process can be automated. I discussed this with my manager and we agreed on my plan.

On the first two days, I found a couple of very useful video courses from Lynda.com, see below link:

SSL Certificates For Web Developers
Learning Secure Sockets Layer

They were very useful in helping me getting a better understanding of the fundamental of SSL/TLS and how to generate keys and sign the cerficate all by yourself.

After that I reviewed Cloudera’s official online documentation on how to enable SSL/TLS for Cloudera Manager as well as the rest of CDH services and built a little tool that is written in shell script to allow anyone to generate certificates on the fly and enable SSL/TLS for his/her cluster with a simple couple of commands.

The documentation links can be found below:

Configuring TLS Encryption for Cloudera Manager
Configuring TLS/SSL Encryption for CDH Services

I have published this little tool on github and is available here. Currently it supports enabling SSL/TLS for the following services:

Cloudera Manager (from Level 1 to Level 3 security)
HDFS
YARN
Hive
Impala
Oozie
HBase
Hue

With this tool, user can enable SSL/TLS for any of the above services with ease in a few minutes.

If you have any suggestions or comments, please leave them in the comment section below, thanks.

How to Clean Up Deleted Projects in Cloudera Data Science Workbench

We have got a few customer complains about the fact that currently Cloudera Data Science Workbench (CDSW) does not release the underlining project files on disk after the project is deleted from within the CDSW web console.

This can be re-produced easily by creating a dummy project in CDSW, check the project directory created under /var/lib/cdsw/current/projects/projects/0, and then delete the project again, you will see that the newly created project directory is not removed on the file system.

This is a known bug and reported internally in Cloudera, however, it has not been fixed as yet.

To workaround the issue, you can setup a simple shell script to detect the orphaned project and remove it automatically.

Steps as below:

1. Get the list of Project IDs from the directory /var/lib/cdsw/current/projects/projects/0/ on the master host, in my example, it returned below:

ls /var/lib/cdsw/current/projects/projects/0/
1  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48

2. Run ‘cdsw status’ command on the master host to capture the DB pod id:

Sending detailed logs to [/tmp/cdsw_status_T8jRig.log] ...
CDSW Version: [1.3.0:9bb84f6]
OK: Application running as root check
OK: Sysctl params check
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|                 NAME                |   STATUS   |           CREATED-AT          |   VERSION   |   EXTERNAL-IP   |          OS-IMAGE         |         KERNEL-VERSION         |   GPU   |   STATEFUL   |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   ericlin-xxx.xxx-1.com   |    True    |   2018-07-11 03:30:45+00:00   |   v1.6.11   |       None      |   CentOS Linux 7 (Core)   |   3.10.0-514.26.2.el7.x86_64   |    0    |    False     |
|   ericlin-xxx.xxx-2.com   |    True    |   2018-07-11 03:30:32+00:00   |   v1.6.11   |       None      |   CentOS Linux 7 (Core)   |   3.10.0-514.26.2.el7.x86_64   |    0    |     True     |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2/2 nodes are ready.
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|                             NAME                            |   READY   |    STATUS   |   RESTARTS   |           CREATED-AT          |       POD-IP      |      HOST-IP      |   ROLE   |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|             etcd-ericlin-xxx.xxx-1.com                      |    1/1    |   Running   |      5       |   2018-07-11 03:31:44+00:00   |   172.26.12.157   |   172.26.12.157   |   None   |
|        kube-apiserver-ericlin-xxx.xxx-1.com                 |    1/1    |   Running   |      5       |   2018-07-11 03:30:31+00:00   |   172.26.12.157   |   172.26.12.157   |   None   |
|   kube-controller-manager-ericlin-xxx.xxx-1.com             |    1/1    |   Running   |      5       |   2018-07-11 03:31:54+00:00   |   172.26.12.157   |   172.26.12.157   |   None   |
|                  kube-dns-3911048160-30l05                  |    3/3    |   Running   |      15      |   2018-07-11 03:30:45+00:00   |    100.66.128.1   |   172.26.12.157   |   None   |
|                       kube-proxy-c4xk7                      |    1/1    |   Running   |      4       |   2018-07-11 03:30:45+00:00   |    172.26.14.58   |    172.26.14.58   |   None   |
|                       kube-proxy-k95s2                      |    1/1    |   Running   |      5       |   2018-07-11 03:30:45+00:00   |   172.26.12.157   |   172.26.12.157   |   None   |
|        kube-scheduler-ericlin-xxx.xxx-1          .com       |    1/1    |   Running   |      5       |   2018-07-11 03:31:57+00:00   |   172.26.12.157   |   172.26.12.157   |   None   |
|               node-problem-detector-v0.1-0624z              |    1/1    |   Running   |      5       |   2018-07-11 03:32:15+00:00   |   172.26.12.157   |   172.26.12.157   |   None   |
|               node-problem-detector-v0.1-b80tt              |    1/1    |   Running   |      4       |   2018-07-11 03:32:15+00:00   |    172.26.14.58   |    172.26.14.58   |   None   |
|                       weave-net-469fb                       |    2/2    |   Running   |      12      |   2018-07-11 03:30:45+00:00   |   172.26.12.157   |   172.26.12.157   |   None   |
|                       weave-net-8dzx6                       |    2/2    |   Running   |      10      |   2018-07-11 03:30:45+00:00   |    172.26.14.58   |    172.26.14.58   |   None   |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
All required pods are ready in cluster kube-system.
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|                  NAME                  |   READY   |    STATUS   |   RESTARTS   |           CREATED-AT          |       POD-IP      |      HOST-IP      |           ROLE           |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|         cron-1906902965-wzkp5          |    1/1    |   Running   |      2       |   2018-08-10 08:13:55+00:00   |    100.66.128.9   |   172.26.12.157   |           cron           |
|          db-1165222207-dg98q           |    1/1    |   Running   |      5       |   2018-07-11 03:32:15+00:00   |    100.66.128.6   |   172.26.12.157   |            db            |
|           engine-deps-1rvcl            |    1/1    |   Running   |      5       |   2018-07-11 03:32:15+00:00   |    100.66.128.4   |   172.26.12.157   |       engine-deps        |
|           engine-deps-njwlc            |    1/1    |   Running   |      4       |   2018-07-11 03:32:15+00:00   |     100.66.0.5    |    172.26.14.58   |       engine-deps        |
|   ingress-controller-684706958-6fzh3   |    1/1    |   Running   |      5       |   2018-07-11 03:32:14+00:00   |   172.26.12.157   |   172.26.12.157   |    ingress-controller    |
|        livelog-2502658797-kmq4l        |    1/1    |   Running   |      5       |   2018-07-11 03:32:15+00:00   |    100.66.128.3   |   172.26.12.157   |         livelog          |
|      reconciler-2738760185-1nnsp       |    1/1    |   Running   |      2       |   2018-08-10 08:13:55+00:00   |    100.66.128.2   |   172.26.12.157   |        reconciler        |
|       spark-port-forwarder-krtw6       |    1/1    |   Running   |      5       |   2018-07-11 03:32:15+00:00   |   172.26.12.157   |   172.26.12.157   |   spark-port-forwarder   |
|       spark-port-forwarder-rbhc6       |    1/1    |   Running   |      4       |   2018-07-11 03:32:15+00:00   |    172.26.14.58   |    172.26.14.58   |   spark-port-forwarder   |
|          web-3320989329-7php0          |    1/1    |   Running   |      2       |   2018-08-10 08:13:55+00:00   |    100.66.128.7   |   172.26.12.157   |           web            |
|          web-3320989329-ms63k          |    1/1    |   Running   |      5       |   2018-07-11 03:32:15+00:00   |    100.66.128.5   |   172.26.12.157   |           web            |
|          web-3320989329-zdpcj          |    1/1    |   Running   |      2       |   2018-08-10 08:13:55+00:00   |    100.66.128.8   |   172.26.12.157   |           web            |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
All required pods are ready in cluster default.
All required Application services are configured.
All required config maps are ready.
All required secrets are available.
Persistent volumes are ready.
Persistent volume claims are ready.
Ingresses are ready.
Checking web at url: http://ericlin-xxx.xxx-1.com
OK: HTTP port check
Cloudera Data Science Workbench is ready!

You can see that from above example, the DB pod ID is: db-1165222207-dg98q

3. Run below command to connect to CDSW database:

kubectl exec db-1165222207-dg98q -ti -- psql -U sense

4. On prompt, run PostgreSQL to check which project has been delete (the one not in the DB)

sense=# SELECT id, user_id, name, slug FROM projects;
 id | user_id |      name      |      slug
----+---------+----------------+----------------
 41 |       1 | Impala Project | impala-project
 34 |       3 | Test           | test
  1 |       1 | Test           | test
 47 |      10 | tensortest     | tensortest
 44 |       9 | TestEnvVar     | testenvvar
 40 |       4 | hbase          | hbase
 36 |       2 | Scala Test     | scala-test
 46 |       1 | R Project      | r-project
 45 |       4 | spackshell     | spackshell
 37 |       4 | tim            | tim
 48 |      10 | rtest          | rtest
 35 |       1 | Scala Project  | scala-project
 39 |       5 | salim          | salim
 38 |       4 | timtest        | timtest
(14 rows)

To put all above together, I have below quick shell script that can do the job:

for project_id in `ls /var/lib/cdsw/current/projects/projects/0/`
do
  echo "Processing $project_id"
  rows=`kubectl exec $(cdsw status | grep 'db-' | cut -d '|' -f 2 | sed 's/ //g') -ti -- psql -U sense -c "SELECT * FROM projects WHERE ID = $project_id" | grep '0 row' | wc -l`
  if [ $rows -gt 0 ]; then
    echo "Project $project_id has been deleted, you can archive directory /var/lib/cdsw/current/projects/projects/0/$proejct_id"
  fi
done

The output looks like something below:

Processing project 1
Processing project 34
Processing project 35
Processing project 36
Processing project 37
Processing project 38
Processing project 39
Processing project 40
Processing project 41
Processing project 42
Project 42 has been deleted, you can archive directory /var/lib/cdsw/current/projects/projects/0/
Processing project 43
Project 43 has been deleted, you can archive directory /var/lib/cdsw/current/projects/projects/0/
Processing project 44
Processing project 45
Processing project 46
Processing project 47
Processing project 48

So, until there is a fix for this issue, I hope above simple shell script can help.

Any suggestions or ideas, please let me know in the comments section below, thanks a lot in advance.

Oozie SSH Action Does Not Support Chained Commands – OOZIE-1974

I have seen quite a few CDH users who try to run chained Linux command via Oozie’s SSH Action. Example is like below:

<action name="sshTest">
  <ssh xmlns="uri:oozie:ssh-action:0.1">
    <host>${sshUserHost}</host>
    <command>kinit test.keytab test@TEST.COM ; python ....</command>
    <capture-output/>
  </ssh>
  <ok to="nextActino"/>
  <error to="kill"/>
</action>

We can see that the command to run on remote host is as below:

kinit test.keytab test@TEST.COM ; python ....

This is OK if both commands can finish successfully very quickly. However it will cause the SSH action to fail if the python command needs to run for a certain time, say more than 5-10 minutes. Below is the example log messages produced in Oozie’s server log while the SSH action is running:

2018-08-13 10:01:48,215 WARN org.apache.oozie.command.wf.CompletedActionXCommand: SERVER[{oozie-host}] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000234-172707121423674-oozie-oozi-W] ACTION[0000234-172707121423674-oozie-oozi-W@sshTest] Received early callback for action still in PREP state; will wait [10,000]ms and requeue up to [5] more times
2018-08-13 10:01:48,216 WARN org.apache.oozie.command.wf.CompletedActionXCommand: SERVER[{oozie-host}] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000234-172707121423674-oozie-oozi-W] ACTION[0000234-172707121423674-oozie-oozi-W@sshTest] Received early callback for action still in PREP state; will wait [10,000]ms and requeue up to [5] more times

....

2018-08-13 10:02:38,243 ERROR org.apache.oozie.command.wf.CompletedActionXCommand: SERVER[cdlpf1hdpm1004.es.ad.adp.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000234-172707121423674-oozie-oozi-W] ACTION[0000234-172707121423674-oozie-oozi-W@sshTest] XException, 
org.apache.oozie.command.CommandException: E0822: Received early callback for action [0000234-172707121423674-oozie-oozi-W@sshTest] while still in PREP state and exhausted all requeues
 at org.apache.oozie.command.wf.CompletedActionXCommand.execute(CompletedActionXCommand.java:114)
 at org.apache.oozie.command.wf.CompletedActionXCommand.execute(CompletedActionXCommand.java:39)
 at org.apache.oozie.command.XCommand.call(XCommand.java:286)
 at java.util.concurrent.FutureTask.run(FutureTask.java:262)
 at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:179)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)

The reason for the failure is because Oozie currently does not support chained Linux commands on SSH Action, which is tracked via upstream JIRA OOZIE-1974.

Below is what happened behind the scene:

1. ssh-base.sh and ssh-wrapper.sh files will be copied to target host
https://github.com/apache/oozie/blob/master/core/src/main/resources

2. Oozie will run below command from Oozie server via ssh directly to the target host:

sh ssh-base.sh FLATTEN_ARGS curl "http://{oozie-host}:11000/oozie/callback?id=0000234-172707121423674-oozie-oozi-W@sshTest&status=#status" \
"--data-binary%%%@#stdout%%%--request%%%POST%%%--header%%%\"content-type:text/plain\"" \
0000234-172707121423674-oozie-oozi-W@sshTest@3 kinit test.keytab test@TEST.COM ; python ....

3. based on the command from above, we can see that the command was rebuilt, now the full command will be broken into two commands:

sh ssh-base.sh FLATTEN_ARGS curl "http://{oozie-host}:11000/oozie/callback?id=0000234-172707121423674-oozie-oozi-W@sshTest&status=#status" \
"--data-binary%%%@#stdout%%%--request%%%POST%%%--header%%%\"content-type:text/plain\"" \
0000234-172707121423674-oozie-oozi-W@sshTest@3 kinit test.keytab test@TEST.COM

and

python ....

Not the original “kinit test.keytab test@TEST.COM” and “python ….”

4. ssh-base.sh script will in term run below command:

sh ssh-wrapper.sh FLATTEN_ARGS curl "http://{oozie-host}:11000/oozie/callback?id=0000234-172707121423674-oozie-oozi-W@sshTest&status=#status" \
"--data-binary%%%@#stdout%%%--request%%%POST%%%--header%%%\"content-type:text/plain\"" \
0000234-172707121423674-oozie-oozi-W@sshTest@3 kinit test.keytab test@TEST.COM

This command will finish very quickly and triggered callback curl call immediately, however, the “python” command will cause the SSH job to not finish until it finishes. This is causing the Oozie job in the pending state and causing the callback to fail after timeout because the Oozie job and SSH job states are not consistent.

So until OOZIE-1974 is fixed, the solution is to put both the commands inside a single script file and make it available to run on the remote host.

Hope above helps.