How to load a different version of Spark into Oozie

This article explains how to load Spark 2 into Oozie under CDH 5.9.x, which ships with Spark 1.6. The steps were tested on CDH 5.9.0; they should be similar for earlier releases. Please follow the steps below:
  1. Locate the current sharelib directory by running:
    oozie admin -oozie http://<oozie-server-host>:11000/oozie -sharelibupdate
    
    You will get output similar to the following:
    [ShareLib update status]
    host = http://<oozie-server-host>:11000/oozie
    status = Successful
    sharelibDirOld = hdfs://<oozie-server-host>:8020/user/oozie/share/lib/lib_20161202183044
    sharelibDirNew = hdfs://<oozie-server-host>:8020/user/oozie/share/lib/lib_20161202183044
    
    This tells me that the current sharelib directory is /user/oozie/share/lib/lib_20161202183044
  2. Create a new directory named spark2 under it:
    hadoop fs -mkdir /user/oozie/share/lib/lib_20161202183044/spark2
    
  3. Put all of your Spark 2 JARs into this directory. Also make sure that oozie-sharelib-spark-4.1.0-cdh5.9.0.jar is there too.
  4. Update the sharelib by running:
    oozie admin -oozie http://<oozie-server-host>:11000/oozie -sharelibupdate
    
  5. Confirm that spark2 has been added to the sharelib:
    oozie admin -oozie http://<oozie-server-host>:11000/oozie -shareliblist
    
    You should see output similar to the following:
    [Available ShareLib]
    spark2
    oozie
    hive
    distcp
    hcatalog
    sqoop
    mapreduce-streaming
    spark
    pig
    
  6. Go back to your Spark workflow and add the following property to the Spark action's <configuration> element:
    <property>
        <name>oozie.action.sharelib.for.spark</name>
        <value>spark2</value>
    </property>
    
  7. Save the workflow and run it to confirm that it now picks up the Spark 2 JARs.
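If you want to script step 1, the sharelib path can be pulled out of the -sharelibupdate output. The function below is a minimal sketch that assumes the output format shown in step 1, that is, a line of the form "sharelibDirNew = hdfs://host:port/path". The name current_sharelib_dir is a hypothetical helper, and other Oozie versions may print this line differently.

```shell
#!/usr/bin/env bash
# Sketch: extract the current sharelib directory (HDFS path only) from the
# output of "oozie admin -oozie <url> -sharelibupdate".
# Assumption: the output contains a line such as
#   sharelibDirNew = hdfs://<host>:<port>/user/oozie/share/lib/lib_<timestamp>
# Other Oozie versions may format this differently.

# Reads the sharelibupdate output on stdin and prints only the path part of
# sharelibDirNew, with the hdfs://host:port prefix stripped.
current_sharelib_dir() {
  sed -n 's#^[[:space:]]*sharelibDirNew[[:space:]]*=[[:space:]]*hdfs://[^/]*##p'
}
```

For example, piping the command from step 1 into it (oozie admin -oozie http://<oozie-server-host>:11000/oozie -sharelibupdate | current_sharelib_dir) should print /user/oozie/share/lib/lib_20161202183044 for the sample output above.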
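Steps 2 and 3 can be sketched as a single shell function. This is an assumption-laden sketch rather than the procedure used above: stage_spark2_sharelib, run and DRY_RUN are hypothetical names, the Spark 2 JARs are assumed to sit in a local directory, and oozie-sharelib-spark-*.jar is assumed to be copied from the existing spark directory in the same sharelib. Setting DRY_RUN=1 prints the commands instead of running them, which is a cheap way to review them first.

```shell
#!/usr/bin/env bash
# Sketch of steps 2 and 3: create the spark2 sharelib directory and fill it.
# Assumptions (hypothetical, not from the original setup):
#   - the Spark 2 JARs are in a local directory passed as the 2nd argument;
#   - the oozie-sharelib-spark JAR is copied from the existing "spark"
#     directory that sits next to the new "spark2" directory.
# Set DRY_RUN=1 to print each command instead of executing it.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

stage_spark2_sharelib() {
  local sharelib_dir="$1"   # e.g. /user/oozie/share/lib/lib_20161202183044
  local local_jar_dir="$2"  # hypothetical local directory with Spark 2 JARs
  local target="$sharelib_dir/spark2"

  # Step 2: create the new directory under the current sharelib.
  run hadoop fs -mkdir "$target"
  # Step 3: upload the Spark 2 JARs (the glob is expanded locally).
  run hadoop fs -put "$local_jar_dir"/*.jar "$target"
  # Step 3 (cont.): copy the Oozie Spark launcher JAR from the existing
  # spark sharelib. The glob is quoted so HDFS expands it, not the shell.
  run hadoop fs -cp "$sharelib_dir/spark/oozie-sharelib-spark-*.jar" "$target"
}
```

A dry run such as DRY_RUN=1 stage_spark2_sharelib /user/oozie/share/lib/lib_20161202183044 /path/to/spark2/jars prints the three hadoop fs commands without touching HDFS.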
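Steps 4 and 5 can likewise be scripted. The sketch below assumes that -shareliblist prints one sharelib name per line, as in the sample output in step 5; has_sharelib and verify_spark2_sharelib are hypothetical helper names. An exact, whitespace-trimmed comparison is used so that the existing "spark" entry is not mistaken for "spark2".

```shell
#!/usr/bin/env bash
# Sketch of steps 4 and 5: refresh the sharelib, then check that spark2
# is listed. Assumption: -shareliblist prints one sharelib name per line.

# Reads -shareliblist output on stdin; exits 0 if a line equals $1 exactly
# (after trimming surrounding spaces/tabs), 1 otherwise.
has_sharelib() {
  awk -v name="$1" '
    { gsub(/^[ \t]+|[ \t]+$/, "") }   # trim leading/trailing whitespace
    $0 == name { found = 1 }          # exact match, so "spark" != "spark2"
    END { exit !found }
  '
}

# Hypothetical wrapper around the two commands from steps 4 and 5.
verify_spark2_sharelib() {
  local oozie_url="$1"  # e.g. http://<oozie-server-host>:11000/oozie
  oozie admin -oozie "$oozie_url" -sharelibupdate &&
    oozie admin -oozie "$oozie_url" -shareliblist | has_sharelib spark2
}
```

verify_spark2_sharelib returns a non-zero status if either command fails or spark2 is missing from the list, so it can gate the workflow test in step 7.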

Please be advised that, although this works, it leaves the Spark action in Oozie in a state that Cloudera does not support, because this setup has not been tested and is not recommended. If you still want to go ahead, the steps above should help.
