set mapreduce.input.fileinputformat.split.maxsize=100000; set mapreduce.input.fileinputformat.split.minsize=100000;will trigger two mappers for the map reduce job setting the value of
set mapreduce.input.fileinputformat.split.maxsize=50000; set mapreduce.input.fileinputformat.split.minsize=50000;will trigger 4 mappers for the the same job. If hive.input.format is set to “org.apache.hadoop.hive.ql.io.CombineHiveInputFormat” which is the default in newer version of Hive, Hive will also combine small files whose file size are smaller than mapreduce.input.fileinputformat.split.minsize, so the number of mappers will be reduced to reduce overhead of starting too many mappers. However, this is also affected by the data locality of each HDFS blocks. For example, if there are lots of small files that are stored on different data nodes, Hive will not be able to reduce the number of mappers by combining those files because they are not stored on the same machine.