Read Files under Sub-Directories for Hive and Impala

Sometimes you might want to store data under sub-directories in HDFS and then you want Hive or Impala to read from those sub-directories. For example, you have the following directory structure:

root hdfs     231206 2017-06-30 02:45 /test/table1/000000_0
root hdfs          0 2017-06-30 02:45 /test/table1/child_directory
root hdfs     231206 2017-06-30 02:45 /test/table1/child_directory/000000_0

By default, Hive will only look for files in the root of directory specified, in my test case is /test/table1. However, Hive supports to read all data under the root table’s sub-directories as well. This can be achieved by updating the following settings:

SET mapred.input.dir.recursive=true;
SET hive.mapred.supports.subdirectories=true;

Impala however, on the other side, currently does not support reading files from table’s sub-directories. This has been reported in the upstream JIRA of IMPALA-1944. Currently there is no immediate plan to support such feature, but it might be in the future release of Impala.

Hope above information is useful.

Leave a Reply

Your email address will not be published. Required fields are marked *