Recently I needed to experiment with Shark so that I could compare its performance with Hive at work. To do this, I had to set up a single-node cluster on one of my VirtualBox machines.

Installing Hadoop and Hive was quite smooth; you can refer back to my earlier post for the instructions. However, I ran into problems when trying to install Shark and Spark.

Spark ran fine, but Shark kept failing whenever I tried to run a query:


shark> select * from test;
17.096: [Full GC 71198K->24382K(506816K), 0.3150970 secs]
Exception in thread "main" java.lang.VerifyError: class org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$SetOwnerRequestProto overrides final method getUnknownFields.()Lcom/google/protobuf/UnknownFieldSet;
	at java.lang.ClassLoader.defineClass1(Native Method)
	at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
	at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
	at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
	at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
	at java.lang.Class.getDeclaredMethods0(Native Method)
	at java.lang.Class.privateGetDeclaredMethods(Class.java:2531)
	at java.lang.Class.privateGetPublicMethods(Class.java:2651)
	at java.lang.Class.privateGetPublicMethods(Class.java:2661)
	at java.lang.Class.getMethods(Class.java:1467)
	at sun.misc.ProxyGenerator.generateClassFile(ProxyGenerator.java:426)
	at sun.misc.ProxyGenerator.generateProxyClass(ProxyGenerator.java:323)
	at java.lang.reflect.Proxy.getProxyClass0(Proxy.java:636)
	at java.lang.reflect.Proxy.newProxyInstance(Proxy.java:722)
	at org.apache.hadoop.ipc.ProtobufRpcEngine.getProxy(ProtobufRpcEngine.java:92)
	at org.apache.hadoop.ipc.RPC.getProtocolProxy(RPC.java:537)
	at org.apache.hadoop.hdfs.NameNodeProxies.createNNProxyWithClientProtocol(NameNodeProxies.java:334)
	at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:241)
	at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:141)
	at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:576)
	at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:521)
	at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:146)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2397)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
	at org.apache.hadoop.hive.ql.Context.getScratchDir(Context.java:180)
	at org.apache.hadoop.hive.ql.Context.getMRScratchDir(Context.java:231)
	at org.apache.hadoop.hive.ql.Context.getMRTmpFileURI(Context.java:288)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1274)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1059)
	at shark.parse.SharkSemanticAnalyzer.analyzeInternal(SharkSemanticAnalyzer.scala:137)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:279)
	at shark.SharkDriver.compile(SharkDriver.scala:215)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:909)
	at shark.SharkCliDriver.processCmd(SharkCliDriver.scala:338)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
	at shark.SharkCliDriver$.main(SharkCliDriver.scala:235)
	at shark.SharkCliDriver.main(SharkCliDriver.scala)

The root cause is not obvious from the trace, but it comes down to the "protobuf" package: the jar file "hive-exec-0.11.0-shark-0.9.1.jar", located at $SHARK_HOME/lib_managed/jars/edu.berkeley.cs.shark/hive-exec, bundles its own older copy of the com/google/protobuf classes. Hadoop's HDFS protocol classes are generated against a newer protobuf and override getUnknownFields(), which is still final in the old bundled runtime; loading the bundled copy first therefore triggers the VerifyError. All you need to do is delete the bundled package from the jar:

cd $SHARK_HOME/lib_managed/jars/edu.berkeley.cs.shark/hive-exec
cp hive-exec-0.11.0-shark-0.9.1.jar hive-exec-0.11.0-shark-0.9.1.jar.bak   # keep a backup
zip -d hive-exec-0.11.0-shark-0.9.1.jar 'com/google/protobuf/*'

Note that simply unzipping the jar, deleting the extracted protobuf files, and re-running "zip -r" over the same jar does not work: zip -r only adds or updates entries, it never removes ones that are missing on disk, so the old protobuf classes would remain inside the archive. "zip -d" deletes the entries in place.
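If you want to sanity-check this approach before touching the real jar, you can reproduce it on a throwaway archive. Everything below (the jar name, class names, and directory layout) is invented purely for illustration:

```shell
# create a throwaway jar that bundles a fake com/google/protobuf package
mkdir -p demo/com/google/protobuf demo/org/example
echo stub > demo/com/google/protobuf/Message.class
echo stub > demo/org/example/Keep.class
(cd demo && zip -qr ../demo.jar com org)

# delete the bundled protobuf entries from the archive in place;
# zip's '*' wildcard matches across path separators, so this removes
# everything under com/google/protobuf/
zip -qd demo.jar 'com/google/protobuf/*'

# list the remaining entries: org/example/Keep.class survives,
# the protobuf entries are gone
unzip -l demo.jar
```

The same zip -d invocation, pointed at the real hive-exec jar, is all the fix amounts to.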

That's it. Restart the "shark" shell and your queries should work now.
