Oozie Server Failed to Start with Error java.lang.NoSuchFieldError: EXTERNAL_PROPERTY

This issue happens in the CDH distribution of Hadoop managed by Cloudera Manager (possibly in other distributions as well, since the underlying bug is a known upstream JIRA, but I have not tested them). Oozie fails to start after enabling Oozie HA through the Cloudera Manager user interface.

The full error message from Oozie’s process stdout.log file (found under the /var/run/cloudera-scm-agent/process/XXX-oozie-OOZIE_SERVER/logs directory) looks like below:

Wed Jan 25 11:07:41 GST 2017 
JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera 
using 5 as CDH_VERSION 
using /var/lib/oozie/tomcat-deployment as CATALINA_BASE 
Copying JDBC jar from /usr/share/java/oracle-connector-java.jar to /var/lib/oozie 

ERROR: Oozie could not be started 

REASON: java.lang.NoSuchFieldError: EXTERNAL_PROPERTY 

Stacktrace: 
----------------------------------------------------------------- 
java.lang.NoSuchFieldError: EXTERNAL_PROPERTY 
at org.codehaus.jackson.map.introspect.JacksonAnnotationIntrospector._findTypeResolver(JacksonAnnotationIntrospector.java:777) 
at org.codehaus.jackson.map.introspect.JacksonAnnotationIntrospector.findPropertyTypeResolver(JacksonAnnotationIntrospector.java:214) 
at org.codehaus.jackson.map.ser.BeanSerializerFactory.findPropertyTypeSerializer(BeanSerializerFactory.java:370) 
at org.codehaus.jackson.map.ser.BeanSerializerFactory._constructWriter(BeanSerializerFactory.java:772) 
at org.codehaus.jackson.map.ser.BeanSerializerFactory.findBeanProperties(BeanSerializerFactory.java:586) 
at org.codehaus.jackson.map.ser.BeanSerializerFactory.constructBeanSerializer(BeanSerializerFactory.java:430) 
at org.codehaus.jackson.map.ser.BeanSerializerFactory.findBeanSerializer(BeanSerializerFactory.java:343) 
at org.codehaus.jackson.map.ser.BeanSerializerFactory.createSerializer(BeanSerializerFactory.java:287) 
at org.codehaus.jackson.map.ser.StdSerializerProvider._createUntypedSerializer(StdSerializerProvider.java:782) 
at org.codehaus.jackson.map.ser.StdSerializerProvider._createAndCacheUntypedSerializer(StdSerializerProvider.java:735) 
at org.codehaus.jackson.map.ser.StdSerializerProvider.findValueSerializer(StdSerializerProvider.java:344) 
at org.codehaus.jackson.map.ser.StdSerializerProvider.findTypedValueSerializer(StdSerializerProvider.java:420) 
at org.codehaus.jackson.map.ser.StdSerializerProvider._serializeValue(StdSerializerProvider.java:601) 
at org.codehaus.jackson.map.ser.StdSerializerProvider.serializeValue(StdSerializerProvider.java:256) 
at org.codehaus.jackson.map.ObjectMapper._configAndWriteValue(ObjectMapper.java:2566) 
at org.codehaus.jackson.map.ObjectMapper.writeValue(ObjectMapper.java:2056) 
at org.apache.oozie.util.FixedJsonInstanceSerializer.serialize(FixedJsonInstanceSerializer.java:65) 
at org.apache.curator.x.discovery.details.ServiceDiscoveryImpl.internalRegisterService(ServiceDiscoveryImpl.java:201) 
at org.apache.curator.x.discovery.details.ServiceDiscoveryImpl.registerService(ServiceDiscoveryImpl.java:186) 
at org.apache.oozie.util.ZKUtils.advertiseService(ZKUtils.java:217) 
at org.apache.oozie.util.ZKUtils.<init>(ZKUtils.java:141) 
at org.apache.oozie.util.ZKUtils.register(ZKUtils.java:154) 
at org.apache.oozie.service.ZKLocksService.init(ZKLocksService.java:70) 
at org.apache.oozie.service.Services.setServiceInternal(Services.java:386) 
at org.apache.oozie.service.Services.setService(Services.java:372) 
at org.apache.oozie.service.Services.loadServices(Services.java:305) 
at org.apache.oozie.service.Services.init(Services.java:213) 
at org.apache.oozie.servlet.ServicesLoader.contextInitialized(ServicesLoader.java:46) 
at org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:4210) 
at org.apache.catalina.core.StandardContext.start(StandardContext.java:4709) 
at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:802) 
at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:779) 
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:583) 
at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:944) 
at org.apache.catalina.startup.HostConfig.deployWARs(HostConfig.java:779) 
at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:505) 
at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1322) 
at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:325) 
at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:142) 
at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1068) 
at org.apache.catalina.core.StandardHost.start(StandardHost.java:822) 
at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1060) 
at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:463) 
at org.apache.catalina.core.StandardService.start(StandardService.java:525) 
at org.apache.catalina.core.StandardServer.start(StandardServer.java:759) 
at org.apache.catalina.startup.Catalina.start(Catalina.java:595) 
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
at java.lang.reflect.Method.invoke(Method.java:606) 
at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289) 
at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414) 

To fix the issue, please follow the steps below:

  1. Delete or move the following files under CDH’s parcel directory (most likely they are symlinks):
    /opt/cloudera/parcels/CDH/lib/oozie/libserver/hive-exec.jar
    /opt/cloudera/parcels/CDH/lib/oozie/libtools/hive-exec.jar
    
  2. Download the hive-exec-{cdh version}-core.jar file from the Cloudera repository. For example, for CDH 5.8.2, go to:
    https://repository.cloudera.com/cloudera/cloudera-repos/org/apache/hive/hive-exec/1.1.0-cdh5.8.2/

    and put the file under the following directories on the Oozie server:

    /opt/cloudera/parcels/CDH/lib/oozie/libserver/
    /opt/cloudera/parcels/CDH/lib/oozie/libtools/
    
  3. Download kryo-2.22.jar from the Maven repository:
    http://repo1.maven.org/maven2/com/esotericsoftware/kryo/kryo/2.22/kryo-2.22.jar

    and put it under the following directories on the Oozie server:

    /opt/cloudera/parcels/CDH/lib/oozie/libserver/
    /opt/cloudera/parcels/CDH/lib/oozie/libtools/
    
  4. Finally, restart the Oozie service. (A consolidated shell sketch of steps 1 to 3 follows this list.)
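
For convenience, here is a rough shell sketch of steps 1 to 3 (run on the Oozie server host; the CDH 5.8.2 version string, URLs, and parcel paths are the example values from above, so adjust them for your own CDH version):

# Parcel location of the Oozie libraries (adjust if your parcel path differs)
CDH_LIB=/opt/cloudera/parcels/CDH/lib/oozie

# Step 1: move the offending hive-exec.jar symlinks out of the way
for d in libserver libtools; do
    mv "$CDH_LIB/$d/hive-exec.jar" "$CDH_LIB/$d/hive-exec.jar.bak"
done

# Step 2: fetch the "core" hive-exec jar from the Cloudera repository
wget https://repository.cloudera.com/cloudera/cloudera-repos/org/apache/hive/hive-exec/1.1.0-cdh5.8.2/hive-exec-1.1.0-cdh5.8.2-core.jar

# Step 3: fetch kryo 2.22 from the Maven repository
wget http://repo1.maven.org/maven2/com/esotericsoftware/kryo/kryo/2.22/kryo-2.22.jar

# Copy both jars into the two Oozie lib directories
for d in libserver libtools; do
    cp hive-exec-1.1.0-cdh5.8.2-core.jar kryo-2.22.jar "$CDH_LIB/$d/"
done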

This is a known Oozie issue, reported in the upstream JIRA OOZIE-2621, which has been resolved and targeted for the 4.3.0 release.

Hope this helps.

How to Load a Different Version of Spark into Oozie

This article explains the steps needed to load Spark 2 into Oozie under CDH 5.9.x, which ships with Spark 1.6. Although this was tested under CDH 5.9.0, the procedure should be similar for earlier releases.

Please follow the steps below:

  1. Locate the current shared-lib directory by running:
    oozie admin -oozie http://<oozie-server-host>:11000/oozie -sharelibupdate
    

    you will get something like below:

    [ShareLib update status]
    host = http://<oozie-server-host>:11000/oozie
    status = Successful
    sharelibDirOld = hdfs://<oozie-server-host>:8020/user/oozie/share/lib/lib_20161202183044
    sharelibDirNew = hdfs://<oozie-server-host>:8020/user/oozie/share/lib/lib_20161202183044
    

    This tells me that the current sharelib directory is /user/oozie/share/lib/lib_20161202183044

  2. Create a new directory for Spark 2 under this directory:
    hadoop fs -mkdir /user/oozie/share/lib/lib_20161202183044/spark2
    
  3. Put all your Spark 2 jars under this directory; also make sure that oozie-sharelib-spark-4.1.0-cdh5.9.0.jar is there too (a consolidated shell recap follows this list)
  4. Update the sharelib by running:
    oozie admin -oozie http://<oozie-server-host>:11000/oozie -sharelibupdate
    
  5. Confirm that spark2 has been added to the sharelib path:
    oozie admin -oozie http://<oozie-server-host>:11000/oozie -shareliblist
    

    you should get something like below:

    [Available ShareLib]
    spark2
    oozie
    hive
    distcp
    hcatalog
    sqoop
    mapreduce-streaming
    spark
    pig
    
  6. Go back to your Spark workflow and add the following configuration under the Spark action:
    <property>
        <name>oozie.action.sharelib.for.spark</name>
        <value>spark2</value>
    </property>
    
  7. Save the workflow and run it to test whether it picks up the correct JARs now.
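
As a quick recap, the HDFS side of steps 1 to 5 looks roughly like this (the timestamped sharelib directory and the sharelib jar version are the example values from above; spark2-jars/ is an assumed local staging directory holding your Spark 2 jars):

# Step 1: report the current sharelib directory (see sharelibDirNew in the output)
oozie admin -oozie http://<oozie-server-host>:11000/oozie -sharelibupdate

# Step 2: create the spark2 directory under the timestamped sharelib path
hadoop fs -mkdir /user/oozie/share/lib/lib_20161202183044/spark2

# Step 3: upload the Spark 2 jars, then copy in the Oozie spark sharelib jar
# (assuming it is present in the existing spark sharelib directory)
hadoop fs -put spark2-jars/*.jar /user/oozie/share/lib/lib_20161202183044/spark2/
hadoop fs -cp /user/oozie/share/lib/lib_20161202183044/spark/oozie-sharelib-spark-4.1.0-cdh5.9.0.jar \
    /user/oozie/share/lib/lib_20161202183044/spark2/

# Steps 4 and 5: refresh the sharelib and confirm that spark2 is listed
oozie admin -oozie http://<oozie-server-host>:11000/oozie -sharelibupdate
oozie admin -oozie http://<oozie-server-host>:11000/oozie -shareliblist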

Please be advised that although this can work, it puts the Oozie Spark action into a configuration that is not supported by Cloudera, because it has not been tested there, so it is not recommended. But if you are still willing to go ahead, the steps above should help.

Sqoop Action with --query Fails in Oozie When Using the <command> Tag

Yesterday I discovered an Oozie bug: the <command> element does not handle the --query parameter for the Sqoop action properly. See my example Sqoop action XML below:

<action name="sqoop-ed7d" cred="hive2">
    <sqoop xmlns="uri:oozie:sqoop-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <prepare>
              <delete path="${nameNode}/user/eric/sqoop-import">
        </delete></prepare>
        <command></command>import  --connect jdbc:mysql://node6.lab.cloudera.com/test --username root --password cloudera 
--target-dir hdfs://node5.lab.cloudera.com:8020/user/eric/sqoop-import --query "SELECT * FROM test WHERE \$CONDITIONS" -m 1
        <file>/user/eric/mysql-connector-java-5.1.37-bin.jar#mysql-connector-java-5.1.37-bin.jar</file>
    </sqoop>
    <ok to="End">
    <error to="Kill">
</error></ok></action>

Looking at the error log from the Oozie Launcher, I found the following:

Sqoop command arguments :
             import
             --connect
             jdbc:mysql://node6.lab.cloudera.com/test
             --username
             root
             --password
             cloudera
             --target-dir
             hdfs://node5.lab.cloudera.com:8020/user/eric/sqoop-import
             --query
             "SELECT
             *
             FROM
             test
             WHERE
             \$CONDITIONS"
             -m
             1

WARN  org.apache.sqoop.tool.SqoopTool  - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
INFO  org.apache.sqoop.Sqoop  - Running Sqoop version: 1.4.5-cdh5.4.7
WARN  org.apache.sqoop.tool.BaseSqoopTool  - Setting your password on the command-line is insecure. Consider using -P instead.
ERROR org.apache.sqoop.tool.BaseSqoopTool  - Error parsing arguments for import:
ERROR org.apache.sqoop.tool.BaseSqoopTool  - Unrecognized argument: *
ERROR org.apache.sqoop.tool.BaseSqoopTool  - Unrecognized argument: FROM
ERROR org.apache.sqoop.tool.BaseSqoopTool  - Unrecognized argument: test
ERROR org.apache.sqoop.tool.BaseSqoopTool  - Unrecognized argument: WHERE
ERROR org.apache.sqoop.tool.BaseSqoopTool  - Unrecognized argument: \$CONDITIONS"
ERROR org.apache.sqoop.tool.BaseSqoopTool  - Unrecognized argument: -m
ERROR org.apache.sqoop.tool.BaseSqoopTool  - Unrecognized argument: 1
Intercepting System.exit(1)

From the error message, it looks like the whole “SELECT” statement was split into tokens, just like the rest of the parameters, even though we have double quotes around it.
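
For comparison, the same command runs fine when typed directly into a shell, because the shell’s quoting rules keep the query together as one argument. A sketch using the example values from the workflow above:

# Run from a shell on a Sqoop client host; quoting preserves the query,
# and \$ keeps $CONDITIONS literal for Sqoop to substitute later
sqoop import \
    --connect jdbc:mysql://node6.lab.cloudera.com/test \
    --username root \
    --password cloudera \
    --target-dir hdfs://node5.lab.cloudera.com:8020/user/eric/sqoop-import \
    --query "SELECT * FROM test WHERE \$CONDITIONS" \
    -m 1

Oozie’s <command> handling, by contrast, has no concept of quoting.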

Looking at the Oozie code, I found the following:

String[] args;
if (actionXml.getChild("command", ns) != null) {
    String command = actionXml.getChild("command", ns).getTextTrim();
    StringTokenizer st = new StringTokenizer(command, " ");
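    // StringTokenizer splits purely on spaces and has no notion of quoting,
    // so a quoted --query value is broken into separate tokens here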
    List<String> l = new ArrayList<String>();
    while (st.hasMoreTokens()) {
        l.add(st.nextToken());
    }
    args = l.toArray(new String[l.size()]);
}
else {
    List<Element> eArgs = (List<Element>) actionXml.getChildren("arg", ns);
    args = new String[eArgs.size()];
    for (int i = 0; i < eArgs.size(); i++) {
        args[i] = eArgs.get(i).getTextTrim();
    }
}

Apparently this is a bug.

Until the bug is fixed, I suggest the following workaround:

<action name="sqoop-ed7d" cred="hive2">
    <sqoop xmlns="uri:oozie:sqoop-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <prepare>
              <delete path="${nameNode}/user/eric/sqoop-import">
        </delete></prepare>
          <arg>import</arg>
          <arg>--connect</arg>
          <arg>jdbc:mysql://node6.lab.cloudera.com/test</arg>
          <arg>--username</arg>
          <arg>root</arg>
          <arg>--password</arg>
          <arg>cloudera</arg>
          <arg>--target-dir</arg>
          <arg>hdfs://node5.lab.cloudera.com:8020/user/eric/sqoop-import</arg>
          <arg>--query</arg>
          <arg>SELECT * FROM test WHERE $CONDITIONS</arg>
          <arg>-m</arg>
          <arg>1</arg>
    </sqoop>
    <ok to="End">
    <error to="Kill">
</error></ok></action>

Basically, replace the <command> element with <arg> tags so that the whole query string is treated as a single argument rather than being split into multiple arguments.

Hope this helps.

Oozie Error E0743: Multiple “ok to” Transitions To The Same Node Are Not Allowed

I have been working with Oozie for quite a few weeks, and the experience so far has been positive. It is easy to learn, given that you understand XML and the Hadoop ecosystem.

However, there is one limitation that is quite annoying, although I understand the purpose behind the design. When you have a decision node that branches to another node at a later stage, no other node may transition to that particular node within the same workflow XML.

To illustrate the problem, see the workflow XML below:

<decision name="decision">
    <switch>
        <case to="node1">${wf:conf('my_variable') == 'true'}</case>
        <default to="node2" />
    </switch>
</decision>
<action name="node1">
    <ok to="node2" />
    <error to="fail" />
</action>
<action name="node2">
    <ok to="end" />
    <error to="fail" />
</action>

You can see that “node2” can be reached either from the “decision” node or from the “node1” node. This is not allowed by Oozie’s workflow validation by default, and you will get the following error:

Error: E0743 : E0743: Multiple "ok to" transitions to the same node are not allowed

In order to force it to run, you will need to set the following config in the job properties file:

oozie.validate.ForkJoin=false

This should get Oozie going without complaining about the potential problems.
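
Alternatively, the same flag can be passed at submission time through the Oozie CLI, without editing the properties file (a sketch; the host name and properties file are placeholders):

# -D properties given on the command line override job.properties values
oozie job -oozie http://<oozie-server-host>:11000/oozie \
    -config job.properties \
    -Doozie.validate.ForkJoin=false \
    -run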