This article explains how table locking works in Hive by running a series hive commands and their outputs. To do this, I have created two simple tables in my small cluster called “test” and “test_partitioned”.

Initially, when no query is running against the “test” table, the table should have no locks:

hive> SHOW LOCKS test;
OK
Time taken: 0.305 seconds

If you do a simple query like:

SELECT count(1) FROM test;

you will see that the table will be “SHARED” locked:

hive> SHOW LOCKS test;
OK
default@test	SHARED
Time taken: 0.159 seconds, Fetched: 1 row(s)

“SHARED” lock is also called a “READ” lock, meaning, other people can still read from the table, but any writes will have to wait for it to finish.

Now if you want to write data to the table using query:

INSERT OVERWRITE TABLE test SELECT COUNT(1) FROM sample_07;

the table will be locked “EXCLUSIVELY”

hive> SHOW LOCKS test;
OK
default@test	EXCLUSIVE
Time taken: 0.158 seconds, Fetched: 1 row(s)

“EXCLUSIVE” lock is also called a “WRITE” lock, meaning no one else is able to read or write to the table while the lock is present, all other queries will have to wait for the current query to finish before they can start.

You can also manually lock tables whenever you want:

hive> LOCK TABLE test SHARED;
OK
Time taken: 0.244 seconds
hive> SHOW LOCKS test;
OK
default@test	SHARED
Time taken: 0.107 seconds, Fetched: 1 row(s)
hive> UNLOCK TABLE test;
OK
Time taken: 0.255 seconds
hive> SHOW LOCKS test;
OK
Time taken: 0.114 seconds

hive> LOCK TABLE test EXCLUSIVE;
OK
Time taken: 0.154 seconds
hive> SHOW LOCKS test;
OK
default@test	EXCLUSIVE
Time taken: 0.083 seconds, Fetched: 1 row(s)
hive> UNLOCK TABLE test;
OK
Time taken: 0.127 seconds
hive> SHOW LOCKS test;
OK
Time taken: 0.232 seconds

The locking can also be applied to table partitions:

hive> LOCK TABLE test_partitioned PARTITION (p='p1') EXCLUSIVE;
OK
Time taken: 0.31 seconds
hive> SHOW LOCKS test_partitioned PARTITION (p='p1');
OK
default@test_partitioned@p=p1	EXCLUSIVE
Time taken: 0.189 seconds, Fetched: 1 row(s)
hive> SHOW LOCKS test_partitioned;
OK
Time taken: 0.105 seconds
hive> UNLOCK TABLE test_partitioned PARTITION (p='p1');
OK
Time taken: 0.136 seconds
hive> SHOW LOCKS test_partitioned PARTITION (p='p1');
OK
Time taken: 0.123 seconds
hive> SHOW LOCKS test_partitioned;
OK
Time taken: 0.081 seconds 

When you write to a partition of a table using static partitioning, an EXCLUSIVE lock will be applied to the partition that will be written to, and SHARED lock will be applied to the table itself:

INSERT OVERWRITE TABLE test_partitioned PARTITION (p='p1') SELECT salary FROM sample_07;

hive> SHOW LOCKS test_partitioned;
OK
default@test_partitioned	SHARED
Time taken: 1.345 seconds, Fetched: 1 row(s)
hive> SHOW LOCKS test_partitioned PARTITION (p='p1');
OK
default@test_partitioned@p=p1	EXCLUSIVE
Time taken: 0.243 seconds, Fetched: 1 row(s)

However, there is a bug in Hive that when you try to run an “INSERT OVERWRITE” using dynamic partitioning, because Hive is unable to figure out which partitions need to be locked, it currently only applies “SHARED” lock to the table being updated.

To be absolutely safe, Hive should have applied an EXCLUSIVE lock to the table to prevent any further update to the table and all partitions, but it does not.

So for the same query I ran above, I am able to run the following same query twice in two different Hive sessions and they will race with each other:

hive> INSERT OVERWRITE TABLE test_partitioned PARTITION (p) SELECT salary, 'p1' AS p FROM sample_07;

hive> INSERT OVERWRITE TABLE test_partitioned PARTITION (p) SELECT salary, 'p1' AS p FROM sample_07;

Of course, you will have to enable dynamic partitioning for the above query to run.

They can be run at the same time and whoever finishes the last will overwrite the results from the previous one.

I tested this under CDH5.3 and CDH5.4 and both of them have the bug present. There is a upstream Hive JIRA issue,
you can have a look at Exclusive locks are not acquired when using dynamic partitions for more information.

That’s all I have to say about locks in Hive, if I missed anything, please let me know in the comments.

    12 Comments

      1. Eric Lin

        Hi Boris,

        Thanks for visiting.

        I believe in that case the attempt to obtain the exclusive lock will have to wait until the shared lock is released (that’s the purpose of shared lock to prevent others from writing it).

        Hope this helps.

    1. radhika

      i am getting the below error while executing “show locks tablename”
      FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. show Locks LockManager not specified

      1. Eric Lin

        Hi Radhika,

        Thanks for visiting and your question on my blog. Based on the error message “LockManager not specified”, I suspect that you are either missing one or both of below:

        1. hive.support.concurrency is not set to “true”
        2. you do not have ZooKeeper enabled, which is required for locking to work in Hive.

        Cheers

      1. Eric Lin

        Hi Amulyam,

        Thanks for your comments.

        Hive currently has ACID Transactions feature available. Hortonworks supports it, however, Cloudera still considers this feature is experimental and currently does not support ACID in CDH.

        Normally currently users do not use manual locking on Hive tables, because Hive queries themselves will take care of that automatically. However, if user decides for whatever reason, he/she does not want others to view or update the table, then locking can be used.

        Hope above helps.

        1. Amulyam Agrawal

          I have a usecase where I want the whole table to be locked as I want to overwrite it but there are many read queries which are coming all this time. What do you suggest in above scenario?

          1. Eric Lin

            Hi Amulyam,

            In that case the READ operation really has to wait. If you force READING while WRITING, then the READ operation will return unexpected results. Currently Hive will enforce EXCLUSIVE lock while writing, which will cause any subsequent read on the same table will have to wait. You don’t really have a choice, and you should not anyway for consistent.

            If you have the table partitioned, and READING and WRITING are operating on different partitions, then that will work, as they will not interfere with each other. The EXCLUSIVE lock will only apply to certain partitions that the query will run over.

            Cheers.

      1. Eric Lin

        Hi Raushan,

        Thanks for visiting my blog. To answer your question, no, you can’t drop a table if there is a shared lock. Hive will wait for the lock to be released before dropping the table. If the lock is there for a long time, then the drop query might eventually fail.

        Cheers

    Leave a Reply

    Your email address will not be published. Required fields are marked *