Reading JSON data in Hive is super easy, you can simply use the custom JSON SerDe library that has been created by someone else and load it into Hive and you are ready to go. No UDFs or UDAFs are required. Just follow the steps below: 1. Assuming you have the following json data in HDFS under /tmp/json-test/test.json, one per line [generic] { “details”: { “first_name” : “First Name”, “last_name” : “Last Name” }, orders: [ {“id” : 1}, {“id” : 2} ] } { “details”: { “first_name” : “Foo”, “last_name” : “Bar” }, orders: [ {“id” : 4}, {“id” : 43}, {“id”: 111} ] } 2. Download binary jar file based on your CDH version at: http://www.congiu.net/hive-json-serde/1.3/ to /tmp. You can also read the source code of the project from GitHub. 3. Enter hive command line either using Beeline or Hive 4. Enter the following commands:
ADD JAR /tmp/json-serde-1.3-jar-with-dependencies.jar;

DROP TABLE IF EXISTS customer;

CREATE EXTERNAL TABLE IF NOT EXISTS customer ( 
    details struct,
    orders array>
) 
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' 
LOCATION '/tmp/json-test';
5. Now you can query the data using the following HQL:
SELECT CONCAT(details.first_name, ' ', details.last_name), orders[0].id FROM customer;
SELECT CONCAT(details.first_name, ' ', details.last_name), size(orders) FROM customer;
Note how easy it is to query the JSON data now in Hive. And of course, you will have to know the JSON data structure very well. Hope this helps.

One Comment

Leave a Reply

Your email address will not be published. Required fields are marked *