Oozie Spark Actions Fail with Error “Spark config without ‘=’: --conf”

Currently, Oozie provides an easy interface for Spark1 jobs via its Spark1 action, so that users do not have to embed spark-submit inside a shell action. However, I recently discovered a bug in Oozie: it parses Spark configurations incorrectly and generates an invalid spark-submit command when submitting Spark jobs. Checking the Oozie launcher’s stderr.log, I found the error below:

Error: Spark config without '=': --conf
Run with --help for usage help or --verbose for debug output
Intercepting System.exit(1)
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], exit code [1]

Also, by checking the stdout.log, I could see the incorrect command that was generated for Spark:

  --conf
  spark.yarn.security.tokens.hive.enabled=false
  --conf
  --conf
  spark.executor.extraClassPath=/opt/cloudera/parcels/CDH/lib/hive/lib/*:$PWD/*
  --conf
  spark.driver.extraClassPath=$PWD/*

You can see that Oozie generated the “--conf” flag twice in a row in the Spark command. This explains the “Spark config without ‘=’: --conf” error we saw earlier.

This is caused by a known issue reported upstream: OOZIE-2923.

The bug is on the Oozie side: it incorrectly parses the following configs:

--conf spark.executor.extraClassPath=...
--conf spark.driver.extraClassPath=...

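To see why the doubled flag triggers this particular message: spark-submit requires that every “--conf” be followed by a key=value pair, so when the token after one “--conf” is another “--conf” (which contains no “=”), the check fails. The sketch below is a loose mimic of that validation in Python, not Spark’s actual source, run against the argument list from the stdout.log above:

```python
def validate_spark_conf_args(args):
    """Loosely mimic Spark's check that every --conf flag is
    followed by a key=value pair; return a list of error messages."""
    errors = []
    i = 0
    while i < len(args):
        if args[i] == "--conf":
            # The token after --conf must contain '=' to be a valid config.
            if i + 1 >= len(args) or "=" not in args[i + 1]:
                bad = args[i + 1] if i + 1 < len(args) else "<missing>"
                errors.append("Spark config without '=': " + bad)
                i += 1
            else:
                i += 2
        else:
            i += 1
    return errors

# The argument list Oozie actually generated (from stdout.log above):
broken = [
    "--conf", "spark.yarn.security.tokens.hive.enabled=false",
    "--conf", "--conf",
    "spark.executor.extraClassPath=/opt/cloudera/parcels/CDH/lib/hive/lib/*:$PWD/*",
    "--conf", "spark.driver.extraClassPath=$PWD/*",
]
print(validate_spark_conf_args(broken))
# Flags the second "--conf", reproducing the error from stderr.log.
```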
The workaround is to remove the “--conf” in front of the first instance of spark.executor.extraClassPath, so that Oozie will add it itself. For example, if you have the following:

<spark-opts>
--files /etc/hive/conf/hive-site.xml 
--driver-memory 4G 
--executor-memory 2G 
... 
--conf spark.yarn.security.tokens.hive.enabled=false 
--conf spark.executor.extraClassPath=/opt/cloudera/parcels/CDH/lib/hive/lib/*
</spark-opts>

Simply remove the first “--conf” before spark.executor.extraClassPath, so it becomes:

<spark-opts>
--files /etc/hive/conf/hive-site.xml 
--driver-memory 4G 
--executor-memory 2G 
... 
--conf spark.yarn.security.tokens.hive.enabled=false  
spark.executor.extraClassPath=/opt/cloudera/parcels/CDH/lib/hive/lib/*
</spark-opts>

This will allow you to avoid the issue.
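For context, here is where `<spark-opts>` sits inside a full Oozie Spark action. This is a hypothetical sketch with the workaround applied; the action name, master, class, and jar path are placeholder assumptions, not values from any real workflow:

```
<action name="spark-job">
    <spark xmlns="uri:oozie:spark-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <master>yarn-cluster</master>
        <name>MySparkJob</name>
        <class>com.example.MyMainClass</class>
        <jar>${nameNode}/user/oozie/apps/my-spark-job.jar</jar>
        <!-- Workaround applied: no "--conf" before spark.executor.extraClassPath -->
        <spark-opts>--conf spark.yarn.security.tokens.hive.enabled=false spark.executor.extraClassPath=/opt/cloudera/parcels/CDH/lib/hive/lib/*</spark-opts>
    </spark>
    <ok to="end"/>
    <error to="fail"/>
</action>
```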

However, the downside is that if you later upgrade to a version of CDH that contains the fix for this issue, you will need to add the “--conf” back.

OOZIE-2923 affects CDH5.10.x, CDH5.11.0 and CDH5.11.1. CDH5.11.2, CDH5.12.x and above contain the fix.
