Simple Tool to Enable SSL/TLS for CM/CDH Cluster

Earlier this year, Cloudera started a new program that gives each Support Engineer a full week of offline self-learning. Each engineer chooses the topic, so long as the outcome has value to the business. The outcome can be either a certification that helps with day-to-day work, or a presentation that shares with the rest of the team what he/she learnt during the week. Last week, from the 27th of August to the 31st of August, was my turn.

After careful consideration, I decided that my knowledge of SSL/TLS needed improving. My plan was to find some SSL/TLS courses on either SafariOnline or Lynda, and then try to enable SSL/TLS for Cloudera Manager as well as most of the CDH services, ideally putting everything into a script so that the process could be automated. I discussed this with my manager and we agreed on the plan.

On the first two days, I found a couple of very useful video courses on Lynda.com:

SSL Certificates For Web Developers
Learning Secure Sockets Layer

They were very useful in helping me get a better understanding of the fundamentals of SSL/TLS, and of how to generate keys and sign certificates all by yourself.
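As a rough illustration of what generating keys and signing a certificate yourself looks like, the sketch below uses the `openssl` command line to create a throwaway certificate authority and sign one server certificate with it. It is not taken from the courses or from the tool described in this post. The function name `make_signed_cert`, the file names, the subject names, the 2048-bit key size and the 365-day validity are all assumptions made for the example.

```shell
#!/usr/bin/env bash
# Sketch only: create a test CA and sign one server certificate with it.
# make_signed_cert, the file names and the subject names are hypothetical.
make_signed_cert() {
  local out_dir="$1" host="$2"
  mkdir -p "$out_dir" || return 1

  # 1. Private key and self-signed certificate for the test CA.
  openssl genrsa -out "$out_dir/ca.key" 2048 2>/dev/null || return 1
  openssl req -x509 -new -key "$out_dir/ca.key" -sha256 -days 365 \
    -subj "/CN=Example Test CA" -out "$out_dir/ca.pem" || return 1

  # 2. Private key and certificate signing request (CSR) for the host.
  openssl genrsa -out "$out_dir/server.key" 2048 2>/dev/null || return 1
  openssl req -new -key "$out_dir/server.key" \
    -subj "/CN=$host" -out "$out_dir/server.csr" || return 1

  # 3. Sign the CSR with the CA key to produce the host certificate.
  openssl x509 -req -in "$out_dir/server.csr" \
    -CA "$out_dir/ca.pem" -CAkey "$out_dir/ca.key" -CAcreateserial \
    -sha256 -days 365 -out "$out_dir/server.pem" 2>/dev/null || return 1
}
```

Calling `make_signed_cert /tmp/tls-demo node1.example.com` should leave `ca.pem`, `server.key` and `server.pem` in that directory, and `openssl verify -CAfile /tmp/tls-demo/ca.pem /tmp/tls-demo/server.pem` can then be used to check the chain. A real cluster setup typically involves further steps, such as importing the results into Java keystores, which this sketch leaves out.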

After that, I reviewed Cloudera’s official online documentation on enabling SSL/TLS for Cloudera Manager and the rest of the CDH services, and built a little tool, written as a shell script, that lets anyone generate certificates on the fly and enable SSL/TLS for his/her cluster with a couple of simple commands.

The documentation links can be found below:

Configuring TLS Encryption for Cloudera Manager
Configuring TLS/SSL Encryption for CDH Services

I have published this little tool on GitHub, and it is available here. Currently it supports enabling SSL/TLS for the following services:

Cloudera Manager (from Level 1 to Level 3 security)
HDFS
YARN
Hive
Impala
Oozie
HBase
Hue

With this tool, users can enable SSL/TLS for any of the above services in a few minutes.

If you have any suggestions or comments, please leave them in the comment section below, thanks.

How to use the Cloudera Manager API to check whether a service role exited unexpectedly

This blog explains how to use the Cloudera Manager API to check whether a service role in CDH has exited unexpectedly, so that proper action can be taken.

To check a service role’s status via Cloudera Manager API, please follow the steps below (I am taking Impala as an example):

  1. Determine the version of API you are using:

    curl -u username:password http://<cm-host>:7180/api/version
    

    in CDH5.7.x, it should return “v12”.

  2. get the cluster name from the output of:

    curl -u username:password http://<cm-host>:7180/api/v12/clusters
    

    sample output:

    {
      "items" : [ {
        "name" : "cluster",
        "displayName" : "Cluster 1",
        "version" : "CDH5",
        "fullVersion" : "5.7.0",
        "maintenanceMode" : false,
        "maintenanceOwners" : [ ],
        "clusterUrl" : "http://<cm-host>:7180/cmf/clusterRedirect/cluster",
        "hostsUrl" : "http://<cm-host>:7180/cmf/clusterRedirect/cluster/hosts",
        "entityStatus" : "GOOD_HEALTH"
      } ]
    }
    

    take note of the value for “name” attribute, in my case it is “cluster”.
  3. get the services in the cluster:

    curl -u username:password http://<cm-host>:7180/api/v12/clusters/<cluster-name>/services
    

    substitute with our cluster’s name:

    curl -u username:password http://<cm-host>:7180/api/v12/clusters/cluster/services
    

    please locate the impala service and get its “name”, in my case it is “impala”:

    {
        "name" : "impala",
        "type" : "IMPALA",
        "clusterRef" : {
          "clusterName" : "cluster"
        },
        ....
    }
    
  4. get all the roles under the impala service:

    curl -u username:password http://<cm-host>:7180/api/v12/clusters/<cluster-name>/services/<impala-name>/roles
    

    in my case it should be:

    curl -u username:password http://<cm-host>:7180/api/v12/clusters/cluster/services/impala/roles
    

    locate the role that you want to monitor; I picked the Statestore:

    {
        "name" : "impala-STATESTORE-52cc0fbf54f5cc038b2b0a67634034fe",
        "type" : "STATESTORE",
        "serviceRef" : {
          "clusterName" : "cluster",
          "serviceName" : "impala"
        },
        ....
    }
    

    in my case it is “impala-STATESTORE-52cc0fbf54f5cc038b2b0a67634034fe”

  5. get the status for the role you want to monitor:

    curl -u username:password http://<cm-host>:7180/api/v12/clusters/<cluster-name>/services/<impala-name>/roles/<role-name>
    

    after substitution, it should be:

    curl -u username:password http://<cm-host>:7180/api/v12/clusters/cluster/services/impala/roles/impala-STATESTORE-52cc0fbf54f5cc038b2b0a67634034fe
    

    this will give you full status output for this particular role:

    {
      "name" : "impala-STATESTORE-52cc0fbf54f5cc038b2b0a67634034fe",
      "type" : "STATESTORE",
      "serviceRef" : {
        "clusterName" : "cluster",
        "serviceName" : "impala"
      },
      "hostRef" : {
        "hostId" : "eff96b49-739e-48d4-a19b-e5865a83b164"
      },
      .....
      "configStalenessStatus" : "FRESH",
      "maintenanceMode" : false,
      "maintenanceOwners" : [ ],
      "commissionState" : "COMMISSIONED",
      "roleConfigGroupRef" : {
        "roleConfigGroupName" : "impala-STATESTORE-BASE"
      },
      "entityStatus" : "GOOD_HEALTH"
    }
    

    look for the last attribute, “entityStatus”, which has the following possible values:

    UNKNOWN	
    NONE	
    STOPPED	
    DOWN	
    UNKNOWN_HEALTH	
    DISABLED_HEALTH	
    CONCERNING_HEALTH	
    BAD_HEALTH	
    GOOD_HEALTH	
    STARTING	
    STOPPING	
    HISTORY_NOT_AVAILABLE
    

    if the role has exited unexpectedly, the value will be “DOWN”, so we can programmatically decide whether to restart it or not.
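The final step lends itself to a small script. The sketch below is one possible way to do it and assumes the pretty-printed JSON layout shown in the sample output, pulling the value out with `grep` and `sed`; a JSON-aware tool would be more robust. The helper names `extract_entity_status`, `get_role_status` and `role_is_down` are made up for this example and are not part of the Cloudera Manager API.

```shell
#!/usr/bin/env bash
# Sketch only: the helper names below are hypothetical.

# Read a role's JSON from stdin and print the value of "entityStatus".
# Relies on the '"entityStatus" : "VALUE"' layout of the sample output.
extract_entity_status() {
  grep -o '"entityStatus"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | head -n 1 \
    | sed 's/.*"\([^"]*\)"$/\1/'
}

# Fetch one role from Cloudera Manager and print its entityStatus.
# Arguments: base URL, API version, cluster, service, role, user:password.
get_role_status() {
  local base_url="$1" api_version="$2" cluster="$3"
  local service="$4" role="$5" creds="$6"
  curl -s -u "$creds" \
    "$base_url/api/$api_version/clusters/$cluster/services/$service/roles/$role" \
    | extract_entity_status
}

# Succeed (exit status 0) only when the given status is DOWN.
role_is_down() {
  [ "$1" = "DOWN" ]
}

# Example usage (not executed here):
#   status="$(get_role_status "http://<cm-host>:7180" v12 cluster impala \
#     impala-STATESTORE-52cc0fbf54f5cc038b2b0a67634034fe username:password)"
#   if role_is_down "$status"; then echo "role is down"; fi
```

Keeping the parsing separate from the HTTP call means the status check can be tried against a saved copy of the JSON before pointing it at a live Cloudera Manager.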

Please note that if you have enabled SSL for Cloudera Manager, the URL should be changed to https://<cm-host>:7183 instead of http://<cm-host>:7180.
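If the same script has to work against both kinds of cluster, the scheme and port can be chosen in one place. This is only a sketch; `cm_base_url` and its `true`/`false` second argument are hypothetical, and the ports are simply the two mentioned in this post.

```shell
#!/usr/bin/env bash
# Sketch only: cm_base_url is a hypothetical helper.
# Prints the Cloudera Manager base URL for a host.
#   $1 = CM host name
#   $2 = "true" if SSL/TLS is enabled for Cloudera Manager (default: false)
cm_base_url() {
  local host="$1" use_tls="${2:-false}"
  if [ "$use_tls" = "true" ]; then
    printf 'https://%s:7183\n' "$host"
  else
    printf 'http://%s:7180\n' "$host"
  fi
}

# Example usage (not executed here):
#   curl -u username:password "$(cm_base_url cm.example.com true)/api/version"
```

When Cloudera Manager's certificate is signed by a private CA, curl also has to be told to trust that CA, for example with its `--cacert` option.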

More information about the apiRole entity can be found here: apiRole

Timestamp stored in Parquet file format in Impala Showing GMT Value

This article explains why Impala and Hive return different timestamp values for the same table when the table was created and populated from Hive. It also outlines the steps to force Impala to apply local time zone conversion when reading timestamp fields stored in Parquet file format.

When Hive stores a timestamp value into Parquet format, it converts local time into UTC time, and when it reads data out, it converts back to local time.

Impala, on the other hand, does no conversion when it reads the timestamp field out; hence, UTC time is returned instead of local time.

Both behaviors are by design and work as intended. More information can be found at: TIMESTAMP Data Type
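To make the difference concrete, the sketch below mimics the two conversions with GNU `date`. It only illustrates the arithmetic and is not how Hive or Impala implement it. The function names are hypothetical, GNU `date` is assumed, and `EST5` is a POSIX time zone string meaning five hours behind UTC with no daylight saving, chosen so the example does not depend on an installed time zone database.

```shell
#!/usr/bin/env bash
# Sketch only: assumes GNU date; the function names are hypothetical.

# What Hive does on write: interpret the value in the local zone
# and store the equivalent UTC value.
local_to_utc() {
  local zone="$1" ts="$2"
  TZ=UTC date -d "TZ=\"$zone\" $ts" '+%Y-%m-%d %H:%M:%S'
}

# What Hive does on read: take the stored UTC value and display it
# in the local zone.
utc_to_local() {
  local zone="$1" ts="$2"
  TZ="$zone" date -d "$ts UTC" '+%Y-%m-%d %H:%M:%S'
}

# Example usage (not executed here):
#   stored="$(local_to_utc EST5 '2016-01-01 10:00:00')"  # value in the file
#   utc_to_local EST5 "$stored"                          # what Hive shows
#   echo "$stored"                                       # what Impala shows
```

With these inputs the stored value is `2016-01-01 15:00:00`. Hive converts it back to `2016-01-01 10:00:00` on read, while Impala, doing no conversion, returns the stored `15:00:00` value as it is.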

However, Impala can also be set to apply the conversion to timestamp fields stored in Parquet file format (only available in Cloudera Manager 5.4), as mentioned in the link above. To do this, follow the steps below:

  1. Go to Impala Services home page
  2. Click on “Configuration”
  3. On the left side under “Filters”, click “Impala Daemon” under “Scope” and “Advanced” under “Category”
  4. Locate “Impala Daemon Command Line Argument Advanced Configuration Snippet (Safety Valve)”, and then enter the following:

    --convert_legacy_hive_parquet_utc_timestamps=true

  5. Save the changes
  6. Restart all Impala Daemons

To confirm that the change takes effect, follow the steps below:

  1. Go to Impala Home page
  2. Click on “Instances” tab
  3. Click on any “Impala Daemon” link (make sure you have restarted all of them)
  4. Under “Summary” > “Quick Links”, click on “Impala Daemon Web UI”
  5. A new page will open; click on the last tab at the top of the page, named “/varz”
  6. Search for “convert_legacy_hive_parquet_utc_timestamps” and confirm that it is set to “true”: --convert_legacy_hive_parquet_utc_timestamps=true
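The same check can be sketched from the command line by fetching the `/varz` page and searching its text. This assumes the page lists flags in the `--name=value` form shown in the last step, and that the Impala Daemon web UI listens on port 25000, which is the usual default but may differ on your cluster. `flag_is_true` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: flag_is_true is a hypothetical helper.
# Reads the text of the /varz page from stdin and succeeds (exit 0)
# only if the named flag appears as --<flag>=true.
flag_is_true() {
  local flag="$1"
  grep -qF -- "--${flag}=true"
}

# Example usage (not executed here; host and port are assumptions):
#   curl -s "http://<impalad-host>:25000/varz" \
#     | flag_is_true convert_legacy_hive_parquet_utc_timestamps \
#     && echo "conversion enabled"
```

Since every Impala Daemon has to be restarted for the change to apply, it may be worth running the check against each daemon host rather than just one.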


This enables Impala to do the time zone conversion when reading timestamp fields from Parquet files.

Update:

Please be warned that going down this path comes with a performance hit. Please refer to the upstream Impala JIRA IMPALA-3316 for more details.