Beeline Failed To Start With OOM Error When Calling getConsoleReader Method

You may get the following error when trying to start Beeline from the command line:

Exception in thread "main" java.lang.OutOfMemoryError: Requested array size exceeds VM limit 
at java.util.Arrays.copyOf(Arrays.java:2271) 
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113) 
at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93) 
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:122) 
at org.apache.hive.beeline.BeeLine.getConsoleReader(BeeLine.java:854) 
at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:766) 
at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:480) 
at org.apache.hive.beeline.BeeLine.main(BeeLine.java:463) 
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
at java.lang.reflect.Method.invoke(Method.java:606) 
at org.apache.hadoop.util.RunJar.run(RunJar.java:221) 
at org.apache.hadoop.util.RunJar.main(RunJar.java:136) 

Based on the stack trace, we can see that Beeline was in its startup phase and was initializing through the getConsoleReader method, which reads data from Beeline's history file:

    try {
      // now load in the previous history
      if (hist != null) {
        History h = consoleReader.getHistory();
        if (h instanceof FileHistory) {
          ((FileHistory) consoleReader.getHistory()).load(new ByteArrayInputStream(hist
              .toByteArray()));
        } else {
          consoleReader.getHistory().add(hist.toString());
        }
      }
    } catch (Exception e) {
        handleException(e);
    }

By default, the history file is located at ~/.beeline/history, and Beeline loads the latest 500 entries into memory at startup. If those queries are very large, containing lots of characters, the history file can grow to several gigabytes. When Beeline tries to load such a large history file into memory, it eventually fails with an OutOfMemoryError.
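To check whether you are hitting this problem, inspect the history file from the shell (assuming the default location):

ls -lh ~/.beeline/history    # how large the file has grown
wc -l ~/.beeline/history     # how many history entries it contains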

I have reported this issue in the upstream Hive JIRA as HIVE-15166, and I am in the middle of submitting a patch for it.

For the time being, the best workaround is to remove the ~/.beeline/history file before you fire up Beeline.
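For example, a minimal way to do this from the shell (keeping a backup in case you want the old history later):

mv ~/.beeline/history ~/.beeline/history.bak
# or, if you do not need the history at all:
rm ~/.beeline/history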

Beeline Options Need To Be Placed Before The "-e" Option

Recently I dealt with an issue where a user specified "--incremental=true" as a Beeline command-line option, because Beeline was failing with an OutOfMemoryError when fetching results from HiveServer2. This option should help with the OOM problem; however, it did not in this particular case. The command was run as below:

beeline --hiveconf mapred.job.queue.name=queue_name --silent=true 
-u 'jdbc:hive2://<hs2-host>:10000/default;principal=hive/<hs2-host>@<REALM>' 
--outputformat=csv2 --silent=true -e 'select * from table_name' 
--incremental=true > output.csv

It failed with the following error:

org.apache.thrift.TException: Error in calling method FetchResults
    at org.apache.hive.jdbc.HiveConnection$SynchronizedHandler.invoke(HiveConnection.java:1271)
    at com.sun.proxy.$Proxy8.FetchResults(Unknown Source)
    at org.apache.hive.jdbc.HiveQueryResultSet.next(HiveQueryResultSet.java:363)
    at org.apache.hive.beeline.BufferedRows.<init>(BufferedRows.java:42)
    at org.apache.hive.beeline.BeeLine.print(BeeLine.java:1756)
    at org.apache.hive.beeline.Commands.execute(Commands.java:826)
    at org.apache.hive.beeline.Commands.sql(Commands.java:670)
    at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:974)
    at org.apache.hive.beeline.BeeLine.initArgs(BeeLine.java:716)
    at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:753)
    at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:480)
    at org.apache.hive.beeline.BeeLine.main(BeeLine.java:463)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.lang.Double.valueOf(Double.java:521)
    at org.apache.hive.service.cli.thrift.TDoubleColumn$TDoubleColumnStandardScheme.read(TDoubleColumn.java:454)
    at org.apache.hive.service.cli.thrift.TDoubleColumn$TDoubleColumnStandardScheme.read(TDoubleColumn.java:433)
    at org.apache.hive.service.cli.thrift.TDoubleColumn.read(TDoubleColumn.java:367)
    at org.apache.hive.service.cli.thrift.TColumn.standardSchemeReadValue(TColumn.java:318)
    at org.apache.thrift.TUnion$TUnionStandardScheme.read(TUnion.java:224)
    at org.apache.thrift.TUnion$TUnionStandardScheme.read(TUnion.java:213)
    at org.apache.thrift.TUnion.read(TUnion.java:138)
    at org.apache.hive.service.cli.thrift.TRowSet$TRowSetStandardScheme.read(TRowSet.java:573)
    at org.apache.hive.service.cli.thrift.TRowSet$TRowSetStandardScheme.read(TRowSet.java:525)
    at org.apache.hive.service.cli.thrift.TRowSet.read(TRowSet.java:451)
    at org.apache.hive.service.cli.thrift.TFetchResultsResp$TFetchResultsRespStandardScheme.read(TFetchResultsResp.java:518)
    at org.apache.hive.service.cli.thrift.TFetchResultsResp$TFetchResultsRespStandardScheme.read(TFetchResultsResp.java:486)
    at org.apache.hive.service.cli.thrift.TFetchResultsResp.read(TFetchResultsResp.java:408)
    at org.apache.hive.service.cli.thrift.TCLIService$FetchResults_result$FetchResults_resultStandardScheme.read(TCLIService.java:13171)
    at org.apache.hive.service.cli.thrift.TCLIService$FetchResults_result$FetchResults_resultStandardScheme.read(TCLIService.java:13156)
    at org.apache.hive.service.cli.thrift.TCLIService$FetchResults_result.read(TCLIService.java:13103)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
    at org.apache.hive.service.cli.thrift.TCLIService$Client.recv_FetchResults(TCLIService.java:501)
    at org.apache.hive.service.cli.thrift.TCLIService$Client.FetchResults(TCLIService.java:488)
    at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hive.jdbc.HiveConnection$SynchronizedHandler.invoke(HiveConnection.java:1263)
    at com.sun.proxy.$Proxy8.FetchResults(Unknown Source)
    at org.apache.hive.jdbc.HiveQueryResultSet.next(HiveQueryResultSet.java:363)
    at org.apache.hive.beeline.BufferedRows.<init>(BufferedRows.java:42)
    at org.apache.hive.beeline.BeeLine.print(BeeLine.java:1756)
    at org.apache.hive.beeline.Commands.execute(Commands.java:826)
    at org.apache.hive.beeline.Commands.sql(Commands.java:670)
    at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:974)
    at org.apache.hive.beeline.BeeLine.initArgs(BeeLine.java:716)
Error: org.apache.thrift.TApplicationException: CloseOperation failed: out of sequence response (state=08S01,code=0)
Error: Error while cleaning up the server resources (state=,code=0)

From this stack trace, we can see that the BufferedRows class was used; if "--incremental=true" had taken effect, the IncrementalRows class would have been used instead. This confirms that the "--incremental=true" option was not applied.

After further experimentation, I found that "--incremental=true" needs to go before the "-e" option for it to take effect. Running the command as below:

beeline --hiveconf mapred.job.queue.name=queue_name --silent=true 
-u 'jdbc:hive2://<hs2-host>:10000/default;principal=hive/<hs2-host>@<REALM>' 
--outputformat=csv2 --silent=true --incremental=true
-e 'select * from table_name' > output.csv

resolved the issue. I did not look into the details of why, but this should help anyone who runs into a similar problem.

Beeline Exit Codes Explained

Beeline returns a non-zero exit code on failure from CDH 5.2.2 onwards.

The exit code is simply the number of errors that occurred during Beeline's execution. For example, the following command results in an exit code of 2, because the first two "show" commands fail and the last one succeeds:

beeline -u jdbc:hive2://localhost:10000 -e "show table" -e "show t" -e "show tables"

In a bash script, the exit code is the only way to catch Beeline failures reliably.
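For example, a minimal bash sketch that checks the exit code (the connection string, query and output path are just placeholders):

beeline -u jdbc:hive2://localhost:10000 -e "show tables" > output.txt
rc=$?
if [ $rc -ne 0 ]; then
  echo "beeline failed with exit code $rc" >&2
  exit $rc
fi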