Alternative Timestamp Support in Hive (ISO-8601)

Hive does not support for ISO-8601 timestamp format, like this “2017-02-16T11:24:29.000Z” by default.

Check the following test case:

1. Create a file with the following content:

2017-02-16T11:24:29.000Z
2017-02-16 11:24:29

2. Put the file in HDFS:

hadoop fs -put test.txt /tmp/test/data

3. Create an external table links to it:

CREATE EXTERNAL TABLE ts_test (a timestamp) ROW FORMAT DELIMITED FIELDS TERMINATED by ',' LOCATION '/tmp/test/data';

4. When you select the table, first record will be NULL:

+------------------------+--+
|       ts_test.a        |
+------------------------+--+
| NULL                   |
| 2017-02-16 11:24:29.0  |
+------------------------+--+

This is due to Hive not able to recognise timestamp format of “2017-02-16T11:24:29.000Z”.

As of CDH5.7.x or Hive 1.2, Hive supports reading alternative timestamp formats, see HIVE-9298

To make it work, run the following Hive query:

ALTER TABLE ts_test SET SERDEPROPERTIES ("timestamp.formats"="yyyy-MM-dd'T'HH:mm:ss.SSSZ");

Then data can be read correctly by Hive:

+------------------------+--+
|       ts_test.a        |
+------------------------+--+
| 2017-02-16 03:24:29.0  |
| 2017-02-16 11:24:29.0  |
+------------------------+--+

The different values is due to timezone conversion (Z is for UTC). Hive treats “2017-02-16T11:24:29.000Z” as UTC and then converts it to server’s local time, in the case of second value of “2017-02-16 11:24:29”, no conversion is done so original value is returned.

Hope this helps.

Leave a Reply

Your email address will not be published. Required fields are marked *