Sometimes you might want to store data under sub-directories in HDFS and have Hive or Impala read from those sub-directories. For example, suppose you have the following directory structure:
root hdfs 231206 2017-06-30 02:45 /test/table1/000000_0
root hdfs      0 2017-06-30 02:45 /test/table1/child_directory
root hdfs 231206 2017-06-30 02:45 /test/table1/child_directory/000000_0
By default, Hive only looks for files directly under the table's root directory, which in my test case is /test/table1. However, Hive can also read all data under the table's sub-directories. This can be achieved by applying the following settings:
SET mapred.input.dir.recursive=true;
SET hive.mapred.supports.subdirectories=true;
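As a rough sketch of how these settings might be used in a session, the example below defines an external table over the test directory and queries it after enabling recursive reads. The table name table1 and its single STRING column are assumptions made for illustration only; the real schema depends on what the data files contain.

```sql
-- Hypothetical external table over the example directory.
-- The single STRING column is an assumption; adjust the schema to your files.
CREATE EXTERNAL TABLE IF NOT EXISTS table1 (line STRING)
LOCATION '/test/table1';

-- Enable recursive reads for the current session only.
SET mapred.input.dir.recursive=true;
SET hive.mapred.supports.subdirectories=true;

-- With both settings in effect, this query should also scan
-- /test/table1/child_directory/000000_0.
SELECT COUNT(*) FROM table1;
```

Because SET only applies to the current session, the two properties would need to be set again in each new session, or placed in the Hive configuration, if the behaviour should persist.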
Impala, on the other hand, does not currently support reading files from a table's sub-directories. This has been reported upstream as IMPALA-1944. There is currently no immediate plan to support this feature, but it might be added in a future release of Impala.
I hope the above information is useful.