In the last few weeks, I have been dealing the Teradata Support for a Sqoop issue that the value with Timestamp(6) data type in Teradata will lost last 3 digits of nano seconds after importing into HDFS using Sqoop command.

The following test case validates the issue:

  1. Create a table in Teradata:

    ​CREATE TABLE vmtest.test (a integer, b timestamp(6) 
    FORMAT 'yyyy-mm-ddbhh:mi:ss.s(6)') PRIMARY INDEX (a);
    
    INSERT INTO vmtest.test VALUES (1, '2016-04-05 11:27:24.699022');
    
  2. And sqoop import command:

    sqoop import --connect jdbc:teradata://<teradata-host>/database=vmtest \
        --username dbc --password dbc --target-dir /tmp/test --delete-target-dir \
        --as-textfile --fields-terminated-by "," --table test
    
  3. data stored in HDFS as below:
    [cloudera@quickstart ~]$ hadoop fs -cat /tmp/test/part*
    1,2016-04-05 11:27:24.699
    

Notice the nano seconds part truncated from 699022 to 699

This is caused by a bug in TDCH (TeraData Connector for Hadoop) from Teradata, which is used by Cloudera Connector Powered by Teradata.

The workaround is to make sure that the timestamp value is in String format before passing it to Sqoop, so that no conversion will happen. Below Sqoop command is an example:

sqoop import --connect jdbc:teradata://<teradata-host>/database=vmtest \
    --username dbc --password dbc --target-dir /tmp/test \
    --delete-target-dir --as-textfile --fields-terminated-by "," \
    --query "SELECT a, cast(cast(b as format 'YYYY-MM-DD HH:MI:SS.s(6)') as char(40)) from test WHERE \$CONDITIONS" \
    --split-by a

After import, data is stored in HDFS correctly:

[cloudera@quickstart ~]$ hadoop fs -cat /tmp/test/part*
1,2016-04-0511:27:24.699022

Leave a Reply

Your email address will not be published. Required fields are marked *