Spark jobs failed with delegation token renewal error

An Oozie Spark job failed with the following error:

Job aborted due to stage failure: Task 103 in stage 194576.0 failed 4 times, most recent failure: Lost task 103.3 in stage 194576.0 
(TID 119674041, ): org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): 
token (token for sparkpse: HDFS_DELEGATION_TOKEN owner=@HADOOP.CHARTER.COM, renewer=yarn, realUser=, issueDate=1482494610879, 
maxDate=1483099410879, sequenceNumber=274718, masterKeyId=166) can't be found in cache 
at org.apache.hadoop.ipc.Client.call(Client.java:1471) 
at org.apache.hadoop.ipc.Client.call(Client.java:1408) 
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230) 
at com.sun.proxy.$Proxy16.getFileInfo(Unknown Source)
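
The issueDate and maxDate fields in the message are milliseconds since the Unix epoch, so they can be decoded to see how long the token was allowed to live. Below is a minimal Python sketch for doing that; it is not part of the original job, and the helper name ms_to_utc is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Values copied from the error message above (milliseconds since the Unix epoch).
ISSUE_DATE_MS = 1482494610879
MAX_DATE_MS = 1483099410879


def ms_to_utc(ms: int) -> datetime:
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


# Difference between maxDate and issueDate is the token's maximum lifetime.
max_lifetime = timedelta(milliseconds=MAX_DATE_MS - ISSUE_DATE_MS)

print("issued: ", ms_to_utc(ISSUE_DATE_MS).isoformat())
print("expires:", ms_to_utc(MAX_DATE_MS).isoformat())
print("max lifetime:", max_lifetime)  # 7 days, 0:00:00
```

The gap is exactly seven days, which matches the usual Hadoop default for dfs.namenode.delegation.token.max-lifetime, assuming the cluster did not override it. A job that outlives its token, or whose token is not renewed, would be expected to hit this kind of InvalidToken error.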

This happens when a long-running Spark job runs in a Kerberized environment: checkpointing fails because the HDFS delegation token is not renewed properly.

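The workaround is to add "--conf spark.hadoop.fs.hdfs.impl.disable.cache=true" to the Spark job's command-line parameters. This disables Hadoop's FileSystem cache for HDFS on the Spark side, so the job stops reusing a cached client that still holds the stale token.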
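
As a rough illustration of where the option goes, here is a small Python sketch that assembles a spark-submit command line with the workaround included. The function name build_spark_submit, the class com.example.StreamingJob, and the jar streaming-job.jar are all hypothetical placeholders; only the --conf setting itself comes from this post.

```python
import shlex

# The workaround setting described in this post.
DISABLE_FS_CACHE = "spark.hadoop.fs.hdfs.impl.disable.cache=true"


def build_spark_submit(main_class, app_jar, app_args=()):
    """Return a spark-submit argument list with the workaround applied.

    main_class, app_jar and app_args are placeholders for your own job.
    """
    return [
        "spark-submit",
        "--master", "yarn",
        "--conf", DISABLE_FS_CACHE,  # options must come before the application jar
        "--class", main_class,
        app_jar,
        *app_args,
    ]


# Hypothetical job; substitute your own class and jar.
cmd = build_spark_submit("com.example.StreamingJob", "streaming-job.jar")
print(shlex.join(cmd))
```

For an Oozie Spark action, the same --conf option would typically go in the action's spark-opts element rather than on a hand-typed command line.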
