Hive Lateral View to Flatten Array Data

In this blog post, I will show how to transform the following data in Hive

[1,2,3],[4,5,6],[7,8,9]

and turn it into a table with the following output:

1,4,7
2,5,8
3,6,9

1. To prepare the table, firstly need to create a dummy table with one record:

CREATE TABLE dummy (a int);
INSERT INTO TABLE dummy VALUES (1);

This is to allow us to insert array data into our testing table by SELECTing from our 1 row dummy table.

2. Create a table with array data type:

CREATE TABLE array_table (a array<int>, b array<int>, c array<int>);
INSERT INTO array_test SELECT array(1,2,3), array(3,4,5), array(6,7,8) FROM dummy;

This will get us ready with a table having the following data:

+-------------+-------------+-------------+--+
| a           | b           | c           |
+-------------+-------------+-------------+--+
| [1,2,3]     | [3,4,5]     | [6,7,8]     |
+-------------+-------------+-------------+--+

3. Now run the following query using LATERAL VIEW will give the output we want:

SELECT key1, key2, key3 FROM array_test 
LATERAL VIEW posexplode(a) t1 AS q1, key1 
LATERAL VIEW posexplode(b) t2 AS q2, key2 
LATERAL VIEW posexplode(c) t3 AS q3, key3 
WHERE q1 = q2 AND q1 = q3;

+-------+-------+-------+--+
| key1  | key2  | key3  |
+-------+-------+-------+--+
| 1     | 3     | 6     |
| 2     | 4     | 7     |
| 3     | 5     | 8     |
+-------+-------+-------+--+

This might not be the best solution, but at least it works. However, how well the performance goes on real Big Data set, you have to test out and confirm yourself. Hope above helps.

Leave a Reply

Your email address will not be published. Required fields are marked *