Oozie Spark Action Not Loading Spark Configurations

Recently I was working on an issue where Oozie was not picking up Spark’s configuration, which caused the job to fail. I knew the configuration was not being loaded because Spark had “spark.authenticate=true” set in its configuration file, /etc/spark/conf/spark-defaults.conf:

$ head /etc/spark/conf/spark-defaults.conf
spark.authenticate=true
spark.authenticate.enableSaslEncryption=false
spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.executorIdleTimeout=60
spark.dynamicAllocation.minExecutors=0
....

And I confirmed that the Oozie job failure could be resolved by adding “--conf spark.authenticate=true” to the workflow.xml file. In theory, if Spark already has the setting, Oozie should just pick it up.
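For example, the manual workaround was to pass the setting explicitly through the Spark action’s `<spark-opts>` element in workflow.xml (a sketch of the relevant fragment only, not a full workflow):

```xml
<spark-opts>--conf spark.authenticate=true</spark-opts>
```

This works, but it has to be repeated in every workflow, which is exactly what loading spark-defaults.conf should avoid.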

By checking Oozie’s configuration file oozie-site.xml, I noticed that the setting required to load Spark’s configuration was missing: oozie.service.SparkConfigurationService.spark.configurations. Without this setting, Oozie cannot load those settings and apply them to Spark Action jobs.

The remedy is easy if you are using Cloudera Manager: simply go to:

Cloudera Manager > Oozie > Configuration > search for “Spark on Yarn Service”

Then select “Spark” instead of “none” and restart Oozie.

After restarting, you can check the oozie-site.xml file for Oozie’s process and confirm that the config below is present:

<property>
    <name>oozie.service.SparkConfigurationService.spark.configurations</name>
    <value>*=/etc/spark/conf</value>
</property>

After this change, Oozie should pick up Spark’s default configurations without the need to specify them manually for every Spark Action.

Oozie Hive2 Action Failed with Error: “HiveSQLException: Failed to execute session hooks”

If you have an Oozie Hive2 job that fails intermittently with the error message below, which can be found in Oozie’s server log (located by default under /var/log/oozie):

2018-06-02 09:00:01,103 WARN org.apache.oozie.action.hadoop.Hive2Credentials: SERVER[hlp3058p.oocl.com] 
USER[dmsa_appln] GROUP[-] TOKEN[] APP[DMSA_CMTX_PCON_ETL_ONLY] JOB[0010548-180302135253124-oozie-oozi-W] 
ACTION[0010548-180302135253124-oozie-oozi-W@spark-6799] Exception in addtoJobConf
org.apache.hive.service.cli.HiveSQLException: Failed to execute session hooks
        at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:241)
        at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:232)
        at org.apache.hive.jdbc.HiveConnection.openSession(HiveConnection.java:491)
        at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:181)
        at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
        at java.sql.DriverManager.getConnection(DriverManager.java:571)
        at java.sql.DriverManager.getConnection(DriverManager.java:233)
        at org.apache.oozie.action.hadoop.Hive2Credentials.addtoJobConf(Hive2Credentials.java:66)
        at org.apache.oozie.action.hadoop.JavaActionExecutor.setCredentialTokens(JavaActionExecutor.java:1213)
        at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:1063)
        at org.apache.oozie.action.hadoop.JavaActionExecutor.start(JavaActionExecutor.java:1295)
        at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:232)
        at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:63)
        at org.apache.oozie.command.XCommand.call(XCommand.java:286)
        at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:332)
        at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:261)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:179)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hive.service.cli.HiveSQLException: Failed to execute session hooks
        at org.apache.hive.service.cli.session.SessionManager.openSession(SessionManager.java:308)
        at org.apache.hive.service.cli.CLIService.openSession(CLIService.java:178)
        at org.apache.hive.service.cli.thrift.ThriftCLIService.getSessionHandle(ThriftCLIService.java:422)
        at org.apache.hive.service.cli.thrift.ThriftCLIService.OpenSession(ThriftCLIService.java:316)
        at org.apache.hive.service.cli.thrift.TCLIService$Processor$OpenSession.getResult(TCLIService.java:1253)
        at org.apache.hive.service.cli.thrift.TCLIService$Processor$OpenSession.getResult(TCLIService.java:1238)
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
        at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:746)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
        ... 3 more
Caused by: java.lang.IllegalStateException: zip file closed
        at java.util.zip.ZipFile.ensureOpen(ZipFile.java:634)
        at java.util.zip.ZipFile.getEntry(ZipFile.java:305)
        at java.util.jar.JarFile.getEntry(JarFile.java:227)
        at sun.net.www.protocol.jar.URLJarFile.getEntry(URLJarFile.java:128)
        at sun.net.www.protocol.jar.JarURLConnection.connect(JarURLConnection.java:132)
        at sun.net.www.protocol.jar.JarURLConnection.getInputStream(JarURLConnection.java:150)
        at java.net.URLClassLoader.getResourceAsStream(URLClassLoader.java:233)
        at javax.xml.parsers.SecuritySupport$4.run(SecuritySupport.java:94)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.xml.parsers.SecuritySupport.getResourceAsStream(SecuritySupport.java:87)
        at javax.xml.parsers.FactoryFinder.findJarServiceProvider(FactoryFinder.java:283)
        at javax.xml.parsers.FactoryFinder.find(FactoryFinder.java:255)
        at javax.xml.parsers.DocumentBuilderFactory.newInstance(DocumentBuilderFactory.java:121)
        at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2526)
        at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2513)
        at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2409)
        at org.apache.hadoop.conf.Configuration.get(Configuration.java:982)
        at org.apache.sentry.binding.hive.conf.HiveAuthzConf.<init>(HiveAuthzConf.java:162)
        at org.apache.sentry.binding.hive.HiveAuthzBindingHook.loadAuthzConf(HiveAuthzBindingHook.java:131)
        at org.apache.sentry.binding.hive.HiveAuthzBindingSessionHook.run(HiveAuthzBindingSessionHook.java:108)
        at org.apache.hive.service.cli.session.SessionManager.executeSessionHooks(SessionManager.java:420)
        at org.apache.hive.service.cli.session.SessionManager.openSession(SessionManager.java:300)
        ... 12 more

It is likely that you are hitting a JDK issue; please refer to HADOOP-13809 for details. There is no proof at this stage that it is a JDK bug, but the workaround is at the JDK level. As mentioned in the JIRA, you can add the parameter below to HiveServer2’s Java options:

-Djavax.xml.parsers.DocumentBuilderFactory=com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl

If you are using Cloudera Manager, you can go to:

CM > Hive > Configuration > Search “Java configuration options for HiveServer2”

and append the parameter above to the end of the string; don’t forget to add a space before it.
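For example, if the field already contains other options (the existing value below is just a hypothetical placeholder), the result would look like this:

```
-XX:MaxPermSize=512M -Djavax.xml.parsers.DocumentBuilderFactory=com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl
```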

Then restart HiveServer2 through CM. This should help you avoid the issue.

Oozie Spark Actions Fail with Error “Spark config without ‘=’: --conf”

Currently Oozie provides an easy interface for Spark1 jobs via the Spark1 action, so that users do not have to embed spark-submit inside a shell action. However, I recently discovered a bug in Oozie’s parsing of Spark configurations that results in an incorrectly generated spark-submit command. Checking the Oozie launcher’s stderr.log revealed the error below:

Error: Spark config without '=': --conf
Run with --help for usage help or --verbose for debug output
Intercepting System.exit(1)
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], exit code [1]

Also, by checking the stdout.log, I could see the incorrect command below that was generated for Spark:

  --conf
  spark.yarn.security.tokens.hive.enabled=false
  --conf
  --conf
  spark.executor.extraClassPath=/opt/cloudera/parcels/CDH/lib/hive/lib/*:$PWD/*
  --conf
  spark.driver.extraClassPath=$PWD/*

You can see that Oozie generated a doubled “--conf” in the Spark command, which explains the earlier error “Spark config without ‘=’: --conf”.

This is caused by a known issue reported upstream: OOZIE-2923.

The bug is on the Oozie side: it wrongly parses the configs below:

--conf spark.executor.extraClassPath=...
--conf spark.driver.extraClassPath=...
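The effect of the doubled flag can be illustrated with a small Python sketch of how spark-submit consumes “--conf key=value” pairs (a simplification for illustration, not Spark’s actual code):

```python
def parse_spark_confs(args):
    """Consume '--conf key=value' pairs the way spark-submit does (simplified)."""
    confs = {}
    it = iter(args)
    for tok in it:
        if tok == "--conf":
            value = next(it, None)
            # spark-submit requires the token after --conf to be key=value
            if value is None or "=" not in value:
                raise ValueError("Spark config without '=': %s" % value)
            key, _, val = value.partition("=")
            confs[key] = val
    return confs

# The doubled "--conf" emitted by affected Oozie versions: the second
# "--conf" becomes the *value* of the first, has no '=', and parsing fails.
broken = ["--conf", "spark.yarn.security.tokens.hive.enabled=false",
          "--conf", "--conf",
          "spark.executor.extraClassPath=/opt/cloudera/parcels/CDH/lib/hive/lib/*"]
try:
    parse_spark_confs(broken)
except ValueError as e:
    print(e)  # Spark config without '=': --conf

# With the duplicate removed, parsing succeeds.
fixed = ["--conf", "spark.yarn.security.tokens.hive.enabled=false",
         "--conf", "spark.executor.extraClassPath=/opt/cloudera/parcels/CDH/lib/hive/lib/*"]
print(parse_spark_confs(fixed)["spark.yarn.security.tokens.hive.enabled"])  # false
```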

The workaround is to remove the “--conf” in front of the first instance of spark.executor.extraClassPath, so that Oozie adds it itself. For example, if you have the following:

<spark-opts>
--files /etc/hive/conf/hive-site.xml 
--driver-memory 4G 
--executor-memory 2G 
... 
--conf spark.yarn.security.tokens.hive.enabled=false 
--conf spark.executor.extraClassPath=/opt/cloudera/parcels/CDH/lib/hive/lib/*
</spark-opts>

Simply remove the first “--conf” before spark.executor.extraClassPath, so it becomes:

<spark-opts>
--files /etc/hive/conf/hive-site.xml 
--driver-memory 4G 
--executor-memory 2G 
... 
--conf spark.yarn.security.tokens.hive.enabled=false  
spark.executor.extraClassPath=/opt/cloudera/parcels/CDH/lib/hive/lib/*
</spark-opts>

This will allow you to avoid the issue.

However, the downside is that if you later upgrade to a CDH version that contains the fix, you will need to add “--conf” back.

OOZIE-2923 affects CDH5.10.x, CDH5.11.0, and CDH5.11.1; CDH5.11.2 and CDH5.12.x and above contain the fix.

Oozie SSH Action Failed With “externalId cannot be empty” Error

Last week I was working on an issue where a very simple SSH action run through Oozie kept failing with an “externalId cannot be empty” error. The workflow had only a single SSH action and nothing else. See the workflow example below:

<workflow-app name="SSH Action Test" xmlns="uri:oozie:workflow:0.5">
    <start to="ssh-5c4d"/>
    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <action name="ssh-5c4d">
        <ssh xmlns="uri:oozie:ssh-action:0.1">
            <host>user1@another-server-url</host>
            <command>ls / &gt;&gt; /tmp/test.log</command>
            <capture-output/>
        </ssh>
        <ok to="End"/>
        <error to="Kill"/>
    </action>
    <end name="End"/>
</workflow-app>

And the error message from the Oozie server looked like the one below:

2018-01-03 06:12:45,347 ERROR org.apache.oozie.command.wf.ActionStartXCommand: 
SERVER[{oozie-server-url}] USER[admin] GROUP[-] TOKEN[] APP[SSH Action Test] JOB[0000000-180103010440574-ooz
ie-oozi-W] ACTION[0000000-180103010440574-oozie-oozi-W@ssh-5c4d] Exception,
java.lang.IllegalArgumentException: externalId cannot be empty
        at org.apache.oozie.util.ParamChecker.notEmpty(ParamChecker.java:90)
        at org.apache.oozie.util.ParamChecker.notEmpty(ParamChecker.java:74)
        at org.apache.oozie.WorkflowActionBean.setStartData(WorkflowActionBean.java:503)
        at org.apache.oozie.command.wf.ActionXCommand$ActionExecutorContext.setStartData(ActionXCommand.java:387)
        at org.apache.oozie.action.ssh.SshActionExecutor.start(SshActionExecutor.java:269)
        at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:232)
        at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:63)
        at org.apache.oozie.command.XCommand.call(XCommand.java:286)
        at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:332)
        at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:261)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:179)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

We confirmed that the passwordless connection from the Oozie server to the remote server worked correctly without issues.

After digging through the Oozie source code, I found that Oozie uses Java’s Runtime.exec to execute the command remotely. Runtime.exec does not behave like a shell; in particular, it does not support redirecting output to a file at all. Under the hood, Oozie split the full command “ls / >> /tmp/test.log” into the tokens “ls”, “/”, “>>”, “/tmp/test.log” and passed them all to Runtime.exec. When Runtime.exec executed the command, it treated every token apart from “ls” as an argument to the “ls” command. As you would expect, “>>” is not a file, so “ls” failed, complaining that the file does not exist, and returned a non-zero exit status rather than 0.

Oozie then tried to capture the PID of the remote process, failed, and hence returned the “externalId cannot be empty” error.
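The mismatch can be reproduced outside Java with Python’s subprocess module, which behaves like Runtime.exec when given a token list (a sketch of the mechanism, not Oozie’s actual code):

```python
import subprocess

# Runtime.exec-style invocation: the command is passed as a token list, so
# ">>" reaches ls as a literal argument instead of a shell redirection.
rc_exec = subprocess.run(
    ["ls", "/", ">>", "/tmp/test.log"],
    stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
).returncode
print(rc_exec != 0)  # True: ls complains that '>>' does not exist

# The same command line run through a shell performs the redirection and succeeds.
rc_shell = subprocess.run("ls / >> /tmp/test.log", shell=True).returncode
print(rc_shell)  # 0
```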

The workaround is simple: store the full command you want to run in a new script file and ask Oozie to execute that script instead:

1. Create a file “ssh-action.sh” on the target host, for example, under /home/{user}/scripts/ssh-action.sh
2. Add command “ls / >> /tmp/ssh.log” to the file
3. Make the file executable by running:

chmod 744 /home/{user}/scripts/ssh-action.sh

4. Update Oozie workflow to run the new shell script instead:

<ssh xmlns="uri:oozie:ssh-action:0.1">
    <host>user@remote-server-url</host>
    <command>/home/{user}/scripts/ssh-action.sh</command>
    <capture-output/>
</ssh>
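For reference, the ssh-action.sh script from step 2 needs nothing more than a shebang and the original command, since the remote login shell, not Runtime.exec, now interprets the redirection:

```shell
#!/bin/bash
# The ">>" is interpreted by the shell running this script,
# so the listing of / is appended to /tmp/ssh.log as intended.
ls / >> /tmp/ssh.log
```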

And then the SSH action should work perfectly.

Oozie Server Failed to Start with Error java.lang.NoSuchFieldError: EXTERNAL_PROPERTY

This issue happens in the CDH distribution of Hadoop managed by Cloudera Manager (and possibly in other distributions as well, given the known upstream JIRA, but I have not tested them). Oozie fails to start after enabling Oozie HA through the Cloudera Manager user interface.

The full error message from Oozie’s process stdout.log file (which can be found under the /var/run/cloudera-scm-agent/process/XXX-oozie-OOZIE_SERVER/logs directory) looks like this:

Wed Jan 25 11:07:41 GST 2017 
JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera 
using 5 as CDH_VERSION 
using /var/lib/oozie/tomcat-deployment as CATALINA_BASE 
Copying JDBC jar from /usr/share/java/oracle-connector-java.jar to /var/lib/oozie 

ERROR: Oozie could not be started 

REASON: java.lang.NoSuchFieldError: EXTERNAL_PROPERTY 

Stacktrace: 
----------------------------------------------------------------- 
java.lang.NoSuchFieldError: EXTERNAL_PROPERTY 
at org.codehaus.jackson.map.introspect.JacksonAnnotationIntrospector._findTypeResolver(JacksonAnnotationIntrospector.java:777) 
at org.codehaus.jackson.map.introspect.JacksonAnnotationIntrospector.findPropertyTypeResolver(JacksonAnnotationIntrospector.java:214) 
at org.codehaus.jackson.map.ser.BeanSerializerFactory.findPropertyTypeSerializer(BeanSerializerFactory.java:370) 
at org.codehaus.jackson.map.ser.BeanSerializerFactory._constructWriter(BeanSerializerFactory.java:772) 
at org.codehaus.jackson.map.ser.BeanSerializerFactory.findBeanProperties(BeanSerializerFactory.java:586) 
at org.codehaus.jackson.map.ser.BeanSerializerFactory.constructBeanSerializer(BeanSerializerFactory.java:430) 
at org.codehaus.jackson.map.ser.BeanSerializerFactory.findBeanSerializer(BeanSerializerFactory.java:343) 
at org.codehaus.jackson.map.ser.BeanSerializerFactory.createSerializer(BeanSerializerFactory.java:287) 
at org.codehaus.jackson.map.ser.StdSerializerProvider._createUntypedSerializer(StdSerializerProvider.java:782) 
at org.codehaus.jackson.map.ser.StdSerializerProvider._createAndCacheUntypedSerializer(StdSerializerProvider.java:735) 
at org.codehaus.jackson.map.ser.StdSerializerProvider.findValueSerializer(StdSerializerProvider.java:344) 
at org.codehaus.jackson.map.ser.StdSerializerProvider.findTypedValueSerializer(StdSerializerProvider.java:420) 
at org.codehaus.jackson.map.ser.StdSerializerProvider._serializeValue(StdSerializerProvider.java:601) 
at org.codehaus.jackson.map.ser.StdSerializerProvider.serializeValue(StdSerializerProvider.java:256) 
at org.codehaus.jackson.map.ObjectMapper._configAndWriteValue(ObjectMapper.java:2566) 
at org.codehaus.jackson.map.ObjectMapper.writeValue(ObjectMapper.java:2056) 
at org.apache.oozie.util.FixedJsonInstanceSerializer.serialize(FixedJsonInstanceSerializer.java:65) 
at org.apache.curator.x.discovery.details.ServiceDiscoveryImpl.internalRegisterService(ServiceDiscoveryImpl.java:201) 
at org.apache.curator.x.discovery.details.ServiceDiscoveryImpl.registerService(ServiceDiscoveryImpl.java:186) 
at org.apache.oozie.util.ZKUtils.advertiseService(ZKUtils.java:217) 
at org.apache.oozie.util.ZKUtils.<init>(ZKUtils.java:141) 
at org.apache.oozie.util.ZKUtils.register(ZKUtils.java:154) 
at org.apache.oozie.service.ZKLocksService.init(ZKLocksService.java:70) 
at org.apache.oozie.service.Services.setServiceInternal(Services.java:386) 
at org.apache.oozie.service.Services.setService(Services.java:372) 
at org.apache.oozie.service.Services.loadServices(Services.java:305) 
at org.apache.oozie.service.Services.init(Services.java:213) 
at org.apache.oozie.servlet.ServicesLoader.contextInitialized(ServicesLoader.java:46) 
at org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:4210) 
at org.apache.catalina.core.StandardContext.start(StandardContext.java:4709) 
at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:802) 
at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:779) 
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:583) 
at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:944) 
at org.apache.catalina.startup.HostConfig.deployWARs(HostConfig.java:779) 
at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:505) 
at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1322) 
at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:325) 
at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:142) 
at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1068) 
at org.apache.catalina.core.StandardHost.start(StandardHost.java:822) 
at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1060) 
at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:463) 
at org.apache.catalina.core.StandardService.start(StandardService.java:525) 
at org.apache.catalina.core.StandardServer.start(StandardServer.java:759) 
at org.apache.catalina.startup.Catalina.start(Catalina.java:595) 
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
at java.lang.reflect.Method.invoke(Method.java:606) 
at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289) 
at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414) 

To fix the issue, please follow the steps below:

  1. Delete or move the following files under CDH’s parcel directory (most likely they are symlinks):
    /opt/cloudera/parcels/CDH/lib/oozie/libserver/hive-exec.jar
    /opt/cloudera/parcels/CDH/lib/oozie/libtools/hive-exec.jar
    
  2. Download hive-exec-{cdh version}-core.jar file from the Cloudera repo, for example, for CDH5.8.2, please go to:
    https://repository.cloudera.com/cloudera/cloudera-repos/org/apache/hive/hive-exec/1.1.0-cdh5.8.2/

    and put the file under the following directories on the Oozie server:

    /opt/cloudera/parcels/CDH/lib/oozie/libserver/
    /opt/cloudera/parcels/CDH/lib/oozie/libtools/
    
  3. Download kryo-2.22.jar from the maven repository:
    http://repo1.maven.org/maven2/com/esotericsoftware/kryo/kryo/2.22/kryo-2.22.jar

    and put it under the following directories on the Oozie server:

    /opt/cloudera/parcels/CDH/lib/oozie/libserver/
    /opt/cloudera/parcels/CDH/lib/oozie/libtools/
    
  4. Finally, restart the Oozie service.

This is a known Oozie issue reported in the upstream JIRA OOZIE-2621, which has been resolved and targeted for the 4.3.0 release.

Hope this helps.