This article explains one possible cause of the Hive Metastore (HMS) server taking a long time (more than 10 minutes) to start up.

The symptom:

Every time Hive is restarted through Cloudera Manager (CM), it takes more than 10 minutes for the Hive services to turn green and for users to be able to use the beeline CLI.

This is from the HMS log:


2015-07-21 20:35:17,359 INFO org.apache.hadoop.hive.metastore.HiveMetaStore: Starting hive metastore on port 9083
.........
2015-07-21 20:44:44,495 INFO org.apache.sentry.hdfs.MetastorePlugin: #### Metastore Plugin initialization complete !!
2015-07-21 20:44:44,495 INFO org.apache.sentry.hdfs.MetastorePlugin: #### Finished flushing queued updates to Sentry !!

You can see that HMS started at 8:35 PM and did not finish the sync with Sentry until almost 8:45 PM.
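To put an exact number on the gap, the two timestamps can be subtracted. The following is a minimal Python sketch that assumes the log lines keep the `YYYY-MM-DD HH:MM:SS,mmm` prefix shown above; the helper names `parse_timestamp` and `startup_seconds` are made up for this illustration.

```python
from datetime import datetime

# The first and last relevant lines, copied from the HMS log excerpt above.
LOG_LINES = [
    "2015-07-21 20:35:17,359 INFO org.apache.hadoop.hive.metastore.HiveMetaStore: Starting hive metastore on port 9083",
    "2015-07-21 20:44:44,495 INFO org.apache.sentry.hdfs.MetastorePlugin: #### Metastore Plugin initialization complete !!",
]


def parse_timestamp(line):
    """Parse the leading 23-character timestamp, e.g. '2015-07-21 20:35:17,359'."""
    return datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S,%f")


def startup_seconds(start_line, end_line):
    """Return the elapsed time between two log lines, in seconds."""
    return (parse_timestamp(end_line) - parse_timestamp(start_line)).total_seconds()


elapsed = startup_seconds(LOG_LINES[0], LOG_LINES[1])
print(f"{elapsed:.3f} seconds ({elapsed / 60:.1f} minutes)")
```

For the excerpt above this works out to roughly nine and a half minutes between the metastore starting and the Sentry plugin finishing its initialization.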

The cause:

One possible cause is that both of the following conditions are true:

  • Sentry HDFS sync is enabled
  • There are a lot of tables, or there are tables with a lot of partitions (hundreds of thousands of partitions)

What you need to do:

When both of the above conditions are met, HMS has to scan through all the tables and partitions in the HMS database at startup and then sync them with their HDFS directories one by one. The more tables and partitions there are, the more HDFS directories have to be synced, and the longer startup takes.
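To check whether the second condition applies, you can count partitions per table in the HMS backing database. The sketch below is only an illustration: it assumes the metastore schema has `DBS`, `TBLS` and `PARTITIONS` tables joined on `DB_ID` and `TBL_ID` (verify this against your metastore version), and it runs the query against a small in-memory SQLite database with made-up data rather than a real metastore. In practice you would run the same SQL with the client for your metastore database.

```python
import sqlite3

# Query to run against the HMS backing database. The table and column names
# are an assumption about the metastore schema; check them for your version.
PARTITION_COUNT_SQL = """
SELECT d.NAME AS db_name,
       t.TBL_NAME AS table_name,
       COUNT(p.PART_ID) AS partition_count
FROM TBLS t
JOIN DBS d ON d.DB_ID = t.DB_ID
LEFT JOIN PARTITIONS p ON p.TBL_ID = t.TBL_ID
GROUP BY d.NAME, t.TBL_NAME
ORDER BY partition_count DESC
"""


def build_demo_metastore():
    """Create a toy stand-in for the metastore database (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE DBS (DB_ID INTEGER PRIMARY KEY, NAME TEXT);
        CREATE TABLE TBLS (TBL_ID INTEGER PRIMARY KEY, DB_ID INTEGER, TBL_NAME TEXT);
        CREATE TABLE PARTITIONS (PART_ID INTEGER PRIMARY KEY, TBL_ID INTEGER, PART_NAME TEXT);
        INSERT INTO DBS VALUES (1, 'default');
        INSERT INTO TBLS VALUES (1, 1, 'events');
        INSERT INTO TBLS VALUES (2, 1, 'lookup');
    """)
    # 'events' gets 21 daily partitions; 'lookup' is unpartitioned.
    conn.executemany(
        "INSERT INTO PARTITIONS (TBL_ID, PART_NAME) VALUES (?, ?)",
        [(1, f"dt=2015-07-{day:02d}") for day in range(1, 22)],
    )
    return conn


def partition_counts(conn):
    """Return (db_name, table_name, partition_count) rows, largest first."""
    return conn.execute(PARTITION_COUNT_SQL).fetchall()


rows = partition_counts(build_demo_metastore())
for db_name, table_name, count in rows:
    print(f"{db_name}.{table_name}: {count} partitions")
```

Tables at the top of this list are the first candidates to drop, merge or redesign.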

If this is the cause, the fix is simply to keep the number of tables, and the number of partitions per table, down:

  • If possible, drop the tables that you do not need
  • If you need to keep tables that have lots of partitions, try to merge those partitions by copying the data into a new table with coarser partitions
  • If there are hundreds of thousands of partitions that you cannot merge, then it is time to redesign your tables so that fewer partitions are needed
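To get a feel for how much merging partitions can help, here is a small Python sketch that estimates the reduction when a table partitioned by day is rewritten as a table partitioned by month. The partition key names `dt` and `month` and the three-year date range are hypothetical; substitute your own partition scheme.

```python
from datetime import date, timedelta


def daily_partitions(start, end):
    """List daily partition names such as 'dt=2015-07-21' for an inclusive range."""
    days = (end - start).days + 1
    return [f"dt={start + timedelta(days=i)}" for i in range(days)]


def to_monthly(partition):
    """Map a daily partition name to its month, e.g. 'dt=2015-07-21' -> 'month=2015-07'."""
    value = partition[len("dt="):]
    return "month=" + value[:7]


# Three years of daily partitions for one hypothetical table.
daily = daily_partitions(date(2013, 1, 1), date(2015, 12, 31))
monthly = sorted({to_monthly(p) for p in daily})

print(len(daily), "daily partitions ->", len(monthly), "monthly partitions")
```

In this made-up case 1095 daily partitions collapse into 36 monthly ones, which means far fewer HDFS directories for HMS to sync at startup. Coarser partitions also mean less partition pruning at query time, so the right granularity depends on how the table is queried.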
