Impala Auto Update Metadata Support

Many CDH users have requested that Impala support automatic metadata updates, so that they do not need to run “INVALIDATE METADATA” every time a table is created or data is updated through other components, like Hive or Pig.

This has been reported upstream and is tracked in JIRA as IMPALA-3124. It is not yet fixed, and it requires a detailed design to make sure it will work properly.

There is no ETA at this stage for when this feature will be added. I would advise anyone interested in this feature to add comments to the JIRA: the more votes, the better the chance that it will be prioritized.

Hope the above information is helpful.
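Until that feature lands, the manual commands remain the way to go. As a quick sketch (the table names here are hypothetical), REFRESH is the lighter-weight option when data in an existing table changes, while INVALIDATE METADATA is needed when objects are created or dropped outside Impala:

```sql
-- After Hive or Pig appends data files to an existing table:
REFRESH staging.my_table;

-- After a new table is created outside Impala (e.g. in Hive):
INVALIDATE METADATA staging.my_new_table;
```

Scoping INVALIDATE METADATA to a single table, as above, is much cheaper than running it with no arguments across the whole catalog.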

Oozie Server failed to Start with error java.lang.NoSuchFieldError: EXTERNAL_PROPERTY

This issue happens in the CDH distribution of Hadoop managed by Cloudera Manager (and possibly in other distributions as well, since the root cause is a known upstream JIRA, but I have not tested others). Oozie fails to start after enabling Oozie HA through the Cloudera Manager user interface.

The full error message from Oozie’s process stdout.log file (found under the /var/run/cloudera-scm-agent/process/XXX-oozie-OOZIE_SERVER/logs directory) looks like the below:

Wed Jan 25 11:07:41 GST 2017 
JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera 
using 5 as CDH_VERSION 
using /var/lib/oozie/tomcat-deployment as CATALINA_BASE 
Copying JDBC jar from /usr/share/java/oracle-connector-java.jar to /var/lib/oozie 

ERROR: Oozie could not be started 

REASON: java.lang.NoSuchFieldError: EXTERNAL_PROPERTY 

Stacktrace: 
----------------------------------------------------------------- 
java.lang.NoSuchFieldError: EXTERNAL_PROPERTY 
at org.codehaus.jackson.map.introspect.JacksonAnnotationIntrospector._findTypeResolver(JacksonAnnotationIntrospector.java:777) 
at org.codehaus.jackson.map.introspect.JacksonAnnotationIntrospector.findPropertyTypeResolver(JacksonAnnotationIntrospector.java:214) 
at org.codehaus.jackson.map.ser.BeanSerializerFactory.findPropertyTypeSerializer(BeanSerializerFactory.java:370) 
at org.codehaus.jackson.map.ser.BeanSerializerFactory._constructWriter(BeanSerializerFactory.java:772) 
at org.codehaus.jackson.map.ser.BeanSerializerFactory.findBeanProperties(BeanSerializerFactory.java:586) 
at org.codehaus.jackson.map.ser.BeanSerializerFactory.constructBeanSerializer(BeanSerializerFactory.java:430) 
at org.codehaus.jackson.map.ser.BeanSerializerFactory.findBeanSerializer(BeanSerializerFactory.java:343) 
at org.codehaus.jackson.map.ser.BeanSerializerFactory.createSerializer(BeanSerializerFactory.java:287) 
at org.codehaus.jackson.map.ser.StdSerializerProvider._createUntypedSerializer(StdSerializerProvider.java:782) 
at org.codehaus.jackson.map.ser.StdSerializerProvider._createAndCacheUntypedSerializer(StdSerializerProvider.java:735) 
at org.codehaus.jackson.map.ser.StdSerializerProvider.findValueSerializer(StdSerializerProvider.java:344) 
at org.codehaus.jackson.map.ser.StdSerializerProvider.findTypedValueSerializer(StdSerializerProvider.java:420) 
at org.codehaus.jackson.map.ser.StdSerializerProvider._serializeValue(StdSerializerProvider.java:601) 
at org.codehaus.jackson.map.ser.StdSerializerProvider.serializeValue(StdSerializerProvider.java:256) 
at org.codehaus.jackson.map.ObjectMapper._configAndWriteValue(ObjectMapper.java:2566) 
at org.codehaus.jackson.map.ObjectMapper.writeValue(ObjectMapper.java:2056) 
at org.apache.oozie.util.FixedJsonInstanceSerializer.serialize(FixedJsonInstanceSerializer.java:65) 
at org.apache.curator.x.discovery.details.ServiceDiscoveryImpl.internalRegisterService(ServiceDiscoveryImpl.java:201) 
at org.apache.curator.x.discovery.details.ServiceDiscoveryImpl.registerService(ServiceDiscoveryImpl.java:186) 
at org.apache.oozie.util.ZKUtils.advertiseService(ZKUtils.java:217) 
at org.apache.oozie.util.ZKUtils.<init>(ZKUtils.java:141) 
at org.apache.oozie.util.ZKUtils.register(ZKUtils.java:154) 
at org.apache.oozie.service.ZKLocksService.init(ZKLocksService.java:70) 
at org.apache.oozie.service.Services.setServiceInternal(Services.java:386) 
at org.apache.oozie.service.Services.setService(Services.java:372) 
at org.apache.oozie.service.Services.loadServices(Services.java:305) 
at org.apache.oozie.service.Services.init(Services.java:213) 
at org.apache.oozie.servlet.ServicesLoader.contextInitialized(ServicesLoader.java:46) 
at org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:4210) 
at org.apache.catalina.core.StandardContext.start(StandardContext.java:4709) 
at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:802) 
at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:779) 
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:583) 
at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:944) 
at org.apache.catalina.startup.HostConfig.deployWARs(HostConfig.java:779) 
at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:505) 
at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1322) 
at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:325) 
at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:142) 
at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1068) 
at org.apache.catalina.core.StandardHost.start(StandardHost.java:822) 
at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1060) 
at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:463) 
at org.apache.catalina.core.StandardService.start(StandardService.java:525) 
at org.apache.catalina.core.StandardServer.start(StandardServer.java:759) 
at org.apache.catalina.startup.Catalina.start(Catalina.java:595) 
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
at java.lang.reflect.Method.invoke(Method.java:606) 
at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289) 
at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414) 

To fix the issue, please follow the steps below:

  1. Delete or move the following files under CDH’s parcel directory (most likely they are symlinks):
    /opt/cloudera/parcels/CDH/lib/oozie/libserver/hive-exec.jar
    /opt/cloudera/parcels/CDH/lib/oozie/libtools/hive-exec.jar
    
  2. Download the hive-exec-{cdh version}-core.jar file from the Cloudera repo. For example, for CDH 5.8.2, go to:
    https://repository.cloudera.com/cloudera/cloudera-repos/org/apache/hive/hive-exec/1.1.0-cdh5.8.2/

    and put the file under the following directories on the Oozie server:

    /opt/cloudera/parcels/CDH/lib/oozie/libserver/
    /opt/cloudera/parcels/CDH/lib/oozie/libtools/
    
  3. Download kryo-2.22.jar from the maven repository:
    http://repo1.maven.org/maven2/com/esotericsoftware/kryo/kryo/2.22/kryo-2.22.jar

    and put it under the same directories on the Oozie server:

    /opt/cloudera/parcels/CDH/lib/oozie/libserver/
    /opt/cloudera/parcels/CDH/lib/oozie/libtools/
    
  4. Finally, restart the Oozie service
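The jar steps above can be sketched as a small shell script. This version is a dry run that only prints the commands so you can review them first; the parcel paths and the CDH 5.8.2 / hive-exec 1.1.0 versions are assumptions to adjust for your cluster:

```shell
#!/bin/sh
# Dry-run sketch of steps 1-3: prints the commands instead of running
# them. Review the output, then pipe it to sh to apply the changes.
HIVE_EXEC_VER="1.1.0-cdh5.8.2"   # assumed CDH version; adjust as needed
HIVE_URL="https://repository.cloudera.com/cloudera/cloudera-repos/org/apache/hive/hive-exec/${HIVE_EXEC_VER}/hive-exec-${HIVE_EXEC_VER}-core.jar"
KRYO_URL="http://repo1.maven.org/maven2/com/esotericsoftware/kryo/kryo/2.22/kryo-2.22.jar"

for dir in /opt/cloudera/parcels/CDH/lib/oozie/libserver \
           /opt/cloudera/parcels/CDH/lib/oozie/libtools; do
    # Step 1: move the offending hive-exec.jar symlink out of the way
    echo "mv ${dir}/hive-exec.jar ${dir}/hive-exec.jar.bak"
    # Steps 2 and 3: fetch the core hive-exec jar and kryo 2.22
    echo "wget -P ${dir} ${HIVE_URL}"
    echo "wget -P ${dir} ${KRYO_URL}"
done
```

After applying the changes, restart the Oozie service (step 4) from Cloudera Manager.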

This is a known Oozie issue, reported in the upstream JIRA as OOZIE-2621, which has been resolved and targeted for the 4.3.0 release.

Hope this helps.

Unable to Import Data as Parquet into Encrypted HDFS Zone

Recently I discovered an issue in Sqoop: when importing data into a Hive table whose location is in an encrypted HDFS zone, the import fails with a “can’t be moved into an encryption zone” error:

Command:

sqoop import --connect <postgres_url> --username <username> --password <password> \
--table sourceTable --split-by id --hive-import --hive-database staging \
--hive-table hiveTable --as-parquetfile

Errors:

2017-05-24 13:38:51,539 INFO [Thread-84] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: 
Setting job diagnostics to Job commit failed: org.kitesdk.data.DatasetIOException: Could not move contents of hdfs://nameservice1/tmp/staging/.
temp/job_1495453174050_1035/mr/job_1495453174050_1035 to 
hdfs://nameservice1/user/hive/warehouse/staging.db/hiveTable
        at org.kitesdk.data.spi.filesystem.FileSystemUtil.stageMove(FileSystemUtil.java:117)
        at org.kitesdk.data.spi.filesystem.FileSystemDataset.merge(FileSystemDataset.java:406)
        at org.kitesdk.data.spi.filesystem.FileSystemDataset.merge(FileSystemDataset.java:62)
        at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat$MergeOutputCommitter.commitJob(DatasetKeyOutputFormat.java:387)
        at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:274)
        at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:237)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): 
/tmp/staging/.temp/job_1495453174050_1035/mr/job_1495453174050_1035/964f7b5e-2f55-421d-bfb6-7613cc4bf26e.parquet 
can't be moved into an encryption zone.
        at org.apache.hadoop.hdfs.server.namenode.EncryptionZoneManager.checkMoveValidity(EncryptionZoneManager.java:284)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedRenameTo(FSDirectory.java:564)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.renameTo(FSDirectory.java:478)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameToInternal(FSNamesystem.java:3929)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameToInt(FSNamesystem.java:3891)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameTo(FSNamesystem.java:3856)

This is caused by a known Sqoop bug: SQOOP-2943. It happens because Sqoop uses the Kite SDK to generate Parquet files, and the Kite SDK stages the Parquet file under the /tmp directory on the fly. Because /tmp is not encrypted while the Hive warehouse directory is, the final move of the Parquet file from /tmp into the Hive warehouse fails the encryption-zone check. The import only fails with the Parquet format; text file format works as expected.

Currently SQOOP-2943 is not fixed and there is no direct fix, so for the time being the workaround is to either:

  1. Import the data in text file format into a temporary Hive table, then use a Hive query to copy the data into the destination Parquet table; OR
  2. Import the data as Parquet files into a temporary directory outside of the Hive warehouse, then again use Hive to copy the data into the destination Parquet table
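For workaround 1, the Sqoop command is the same as the one shown earlier minus --as-parquetfile (so the data lands as plain text), and the second step is a plain Hive copy. A sketch, with hypothetical database and table names:

```sql
-- Hive: copy from the plain-text staging table into the Parquet
-- destination table inside the encrypted warehouse.
-- Database and table names below are hypothetical.
INSERT OVERWRITE TABLE secure_db.dest_table
SELECT * FROM staging.hiveTable_text;
```

Since the INSERT is executed by Hive itself, the data is written directly inside the encryption zone and no cross-zone rename is needed.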

Hive on Spark query failed with ConnectTimeoutException

Recently I have been dealing with an issue where a Hive on Spark job intermittently failed with a ConnectTimeoutException. The connection timed out while the ApplicationMaster was trying to communicate back to HiveServer2 on a random port, failing after only 2 seconds of trying to connect. See the stack trace below for details:

17/05/03 03:20:06 INFO yarn.ApplicationMaster: Waiting for spark context initialization
17/05/03 03:20:06 INFO yarn.ApplicationMaster: Waiting for spark context initialization ...
17/05/03 03:20:06 INFO client.RemoteDriver: Connecting to: <hs2-host>:35915
17/05/03 03:20:08 ERROR yarn.ApplicationMaster: User class threw exception: java.util.concurrent.ExecutionException: 
io.netty.channel.ConnectTimeoutException: connection timed out: <hs2-host>/172.19.22.11:35915 
java.util.concurrent.ExecutionException: io.netty.channel.ConnectTimeoutException: connection timed out: <hs2-host>/172.19.22.11:35915 
at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:37) 
at org.apache.hive.spark.client.RemoteDriver.<init>(RemoteDriver.java:156) 
at org.apache.hive.spark.client.RemoteDriver.main(RemoteDriver.java:556) 
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
at java.lang.reflect.Method.invoke(Method.java:606) 
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:542) 
Caused by: io.netty.channel.ConnectTimeoutException: connection timed out: <hs2-host>/172.19.22.11:35915 
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:220) 
at io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38) 
at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:120) 
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357) 
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357) 
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) 
at java.lang.Thread.run(Thread.java:745) 
17/05/03 03:20:08 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.util.concurrent.ExecutionException: 
io.netty.channel.ConnectTimeoutException: connection timed out: <hs2-host>/172.19.22.11:35915) 
17/05/03 03:20:16 ERROR yarn.ApplicationMaster: SparkContext did not initialize after waiting for 100000 ms. Please check earlier log output for errors. Failing the application. 
17/05/03 03:20:16 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with FAILED (diag message: User class threw exception: java.util.concurrent.ExecutionException: 
io.netty.channel.ConnectTimeoutException: connection timed out: <hs2-host>/172.19.22.11:35915) 
17/05/03 03:20:16 INFO yarn.ApplicationMaster: Deleting staging directory .sparkStaging/application_1492040605432_11445 
17/05/03 03:20:16 INFO util.ShutdownHookManager: Shutdown hook called

We can see from the above log that the timeout happened 2 seconds after the connection attempt, which is a surprisingly short period for a connection timeout.

After digging further into the code, it turned out that this timeout is controlled by a Hive setting called hive.spark.client.connect.timeout. Its default value is 1000ms, only 1 second, which explains the behaviour.

This issue only happens when the cluster is under high load and HiveServer2 is not able to respond back to the ApplicationMaster within 1 second, so the connection times out.

To bypass this issue, we can simply increase this timeout value to, say, 5 seconds:

SET hive.spark.client.connect.timeout=5000;

# Your query here
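The SET command above only applies to the current session. To make the change permanent, the same property can be added to hive-site.xml (for Cloudera Manager deployments, via the HiveServer2 safety valve); this is a sketch of the property block:

```xml
<property>
  <name>hive.spark.client.connect.timeout</name>
  <value>5000</value>
  <description>Timeout (ms) for the remote Spark driver to connect back to HiveServer2</description>
</property>
```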

I have reported this issue upstream in JIRA as HIVE-16794, and I will submit a patch to increase this default timeout soon.

Impala Reported Corrupt Parquet File After Failing With OutOfMemory Error

Recently I was dealing with an issue where Impala reported a corrupt Parquet file after a query failed with an OutOfMemory error; if the query does not fail, no corruption is reported.

See the error message below, reported in the Impala Daemon logs:

Memory limit exceeded
HdfsParquetScanner::ReadDataPage() failed to allocate 65535 bytes for decompressed data.
Corrupt Parquet file 'hdfs://nameservice1/path/to/file/914164e7120e6076-cdae1be60000001f_169433548_data.0.parq': column 'client_ord_id' had 1024 remaining values but expected 0 _
[Executed: 4/29/2017 5:28:58 AM] [Execution: 588ms]

This is reported in the upstream JIRA: IMPALA-5197. It can happen in the following scenarios:

  • Query failed with OOM error
  • There is a LIMIT clause in the query
  • Query is manually cancelled by the user

These corrupt-file messages do not mean the file is really corrupted; they are caused by the Impala bug mentioned earlier, IMPALA-5197.

If it is caused by an OutOfMemory error, simply increase the memory limit for the query and try again:

SET MEM_LIMIT=10g;

For the other two causes, we will need to wait for IMPALA-5197 to be fixed.