Hive MetaStore migration from Embedded Postgres to MySQL

Recently I was dealing with a case where a customer wanted to migrate from Cloudera Manager’s Embedded PostgreSQL to MySQL. The Embedded PostgreSQL shipped with Cloudera Manager is not recommended for production use; an external database such as MySQL, PostgreSQL or Oracle is highly recommended instead. The Embedded PostgreSQL should only be considered for development or testing clusters.

For more information, please refer to Cloudera’s documentation regarding Cloudera Manager and Managed Service Datastores.

So, this article focuses on how to migrate from Embedded PostgreSQL to MySQL. A plain dump and restore will not work, due to incompatibilities between the Hive database schemas under MySQL and PostgreSQL (see HIVE-7018 for details; at the time of writing, HIVE-7018 is not available in any CDH release). Assuming that you are using Cloudera Manager, please follow the steps below:

  1. Stop the cluster so that no changes happen to it during the migration
  2. Backup the old PostgreSQL database, including schema and data
  3. Create a new database and user in MySQL, with proper permissions:
    CREATE DATABASE metastore DEFAULT CHARACTER SET utf8;
    GRANT ALL ON metastore.* TO 'hive'@'%' IDENTIFIED BY 'hive_password';
    
  4. Update the database connection details in Cloudera Manager > Hive > Configuration page to use the new database
  5. Go to the Hive > “Actions” menu in the top right corner and click “Create Hive MetaStore Database Tables”
  6. Export the PostgreSQL DB without schema definition, i.e., data only, using the following command:
    pg_dump -U <username> --column-inserts --format p -h <postgre-host> -p <port-number> -f hive-in.sql -a <database>
    

    the “-a” flag is the “data only” option

  7. Download a PHP program from http://www.lightbox.ca/pg2mysql/pg2mysql-1.9.tar.bz2. This program helps convert the SQL dump from PostgreSQL syntax to MySQL syntax
  8. Install php-cli on your machine. If you use CentOS, simply run:
    sudo yum install php-cli
    
  9. Run the following command to generate the hive-out.sql file (tested with PHP 5.3):
    php pg2mysql-1.9/pg2mysql_cli.php hive-in.sql hive-out.sql InnoDB
    
  10. Open the hive-out.sql file in any editor and add:
    SET FOREIGN_KEY_CHECKS=0;
    

    to the beginning of the file and:

    SET FOREIGN_KEY_CHECKS=1;
    

    to the end of the file, so that no foreign keys will be checked during data import.

  11. Change the table names in the following three INSERT statements to UPPER CASE, i.e., from:
    INSERT INTO next_compaction_queue_id (ncq_next) VALUES (1);
    INSERT INTO next_lock_id (nl_next) VALUES (1);
    INSERT INTO next_txn_id (ntxn_next) VALUES (1);
    

    to:

    INSERT INTO NEXT_COMPACTION_QUEUE_ID (ncq_next) VALUES (1);
    INSERT INTO NEXT_LOCK_ID (nl_next) VALUES (1);
    INSERT INTO NEXT_TXN_ID (ntxn_next) VALUES (1);
    

    because those tables are named in upper case in the MySQL version of the schema.

  12. Remove the following line in the hive-out.sql file:
    INSERT INTO VERSION (VER_ID, SCHEMA_VERSION, VERSION_COMMENT) VALUES (1, '1.1.0', 'Hive release version 1.1.0');
    

    This is because Cloudera Manager already generated the schema in MySQL at step 5, and that step populated this table automatically.

  13. Copy the hive-out.sql file to the MySQL server host
  14. Log into MySQL and then run command:
    USE metastore;
    SOURCE /path/to/hive-out.sql;
    
  15. Now you are ready to “Start” the Hive service and verify that everything works.
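The manual edits in steps 10 to 12 can also be scripted. Below is a minimal sketch that uses a tiny demo file in place of the real pg2mysql output; the sed patterns are assumptions based on the statements shown above and should be verified against your actual dump before use:

```shell
# Demo input standing in for the real pg2mysql output (hive-out.sql):
cat > hive-out.sql <<'EOF'
INSERT INTO next_compaction_queue_id (ncq_next) VALUES (1);
INSERT INTO next_lock_id (nl_next) VALUES (1);
INSERT INTO next_txn_id (ntxn_next) VALUES (1);
INSERT INTO VERSION (VER_ID, SCHEMA_VERSION, VERSION_COMMENT) VALUES (1, '1.1.0', 'Hive release version 1.1.0');
EOF

{
  echo 'SET FOREIGN_KEY_CHECKS=0;'    # step 10: disable FK checks during import
  # steps 11-12: upper-case the three table names, drop the VERSION insert
  sed -e 's/^INSERT INTO next_compaction_queue_id /INSERT INTO NEXT_COMPACTION_QUEUE_ID /' \
      -e 's/^INSERT INTO next_lock_id /INSERT INTO NEXT_LOCK_ID /' \
      -e 's/^INSERT INTO next_txn_id /INSERT INTO NEXT_TXN_ID /' \
      -e '/^INSERT INTO VERSION /d' \
      hive-out.sql
  echo 'SET FOREIGN_KEY_CHECKS=1;'    # step 10: re-enable FK checks
} > hive-out.fixed.sql
```

If the result looks right, SOURCE the fixed file in step 14 instead of the raw hive-out.sql.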

Sqoop1 Import Job Failed With Error “java.io.IOException: No columns to generate for ClassWriter”

Recently, while testing a Sqoop1 command in my CDH cluster, I kept getting the “java.io.IOException: No columns to generate for ClassWriter” error.

The full command was like below:

sqoop import --connect jdbc:mysql://<mysql-host>/test \
    --table test \
    --username <username> \
    --password <password> \
    --target-dir sqoop_test \
    -m 1

And the full stack trace:

16/08/20 03:03:13 ERROR manager.SqlManager: Error reading from database: java.sql.SQLException: Streaming result set com.mysql.jdbc.RowDataDynamic@7cd1be26 is still active. No statements may be issued when any streaming result sets are open and in use on a given connection. Ensure that you have called .close() on any active streaming result sets before attempting more queries.
java.sql.SQLException: Streaming result set com.mysql.jdbc.RowDataDynamic@7cd1be26 is still active. No statements may be issued when any streaming result sets are open and in use on a given connection. Ensure that you have called .close() on any active streaming result sets before attempting more queries.
	at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:934)
	at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:931)
	at com.mysql.jdbc.MysqlIO.checkForOutstandingStreamingData(MysqlIO.java:2735)
	at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1899)
	at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2151)
	at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2619)
	at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2569)
	at com.mysql.jdbc.StatementImpl.executeQuery(StatementImpl.java:1524)
	at com.mysql.jdbc.ConnectionImpl.getMaxBytesPerChar(ConnectionImpl.java:3003)
	at com.mysql.jdbc.Field.getMaxBytesPerCharacter(Field.java:602)
	at com.mysql.jdbc.ResultSetMetaData.getPrecision(ResultSetMetaData.java:445)
	at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:305)
	at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:260)
	at org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:246)
	at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:327)
	at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1846)
	at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1646)
	at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:107)
	at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:488)
	at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:615)
	at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
	at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
16/08/20 03:03:13 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: No columns to generate for ClassWriter
	at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1652)
	at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:107)
	at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:488)
	at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:615)
	at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
	at org.apache.sqoop.Sqoop.main(Sqoop.java:236)

Although I do not have the RCA for this issue yet, I do have a workaround, which is adding “--driver com.mysql.jdbc.Driver” to the Sqoop parameters. So the full command becomes:

sqoop import --connect jdbc:mysql://<mysql-host>/test \
    --table test \
    --username <username> \
    --password <password> \
    --target-dir sqoop_test \
    -m 1 \
    --driver com.mysql.jdbc.Driver

Hopefully this helps anyone who hits the same issue.

Hive Metastore Upgrade Failed with Error: Specified key was too long; max key length is 1000 bytes

Symptom:

After upgrading a cluster from CDH 5.3 to CDH 5.4 with Cloudera Manager (CM), the Hive Metastore database schema also needed to be upgraded:

[Screenshot: the “Upgrade Hive Metastore Database Schema” action in Cloudera Manager]

However, it failed with the following message:

[Screenshot: MySQL error “Specified key was too long; max key length is 1000 bytes”]

It is not immediately apparent what needs to be done. You might initially think that some of the indexes contain too many characters; however, that was not the case in this scenario.

Cause:

After examination, we found that this was caused by corrupted InnoDB log files in MySQL.

The MySQL Metastore had been configured to use the InnoDB engine, but when the InnoDB log files became corrupted, MySQL fell back to MyISAM for all tables, even though they had been created with InnoDB.

mysql> SHOW ENGINES;
+------------+---------+------------------------------------------------------------+--------------+------+------------+
| Engine     | Support | Comment                                                    | Transactions | XA   | Savepoints |
+------------+---------+------------------------------------------------------------+--------------+------+------------+
| MRG_MYISAM | YES     | Collection of identical MyISAM tables                      | NO           | NO   | NO         |
| CSV        | YES     | CSV storage engine                                         | NO           | NO   | NO         |
| MyISAM     | DEFAULT | Default engine as of MySQL 3.23 with great performance     | NO           | NO   | NO         |
| MEMORY     | YES     | Hash based, stored in memory, useful for temporary tables  | NO           | NO   | NO         |
+------------+---------+------------------------------------------------------------+--------------+------+------------+
4 rows in set (0.00 sec)

This showed that InnoDB was not supported, even though nothing in the my.cnf file explicitly disabled InnoDB.

Resolution:

The solution was simple: stop the MySQL service, remove the problematic InnoDB log files located in the /var/lib/mysql directory (this depends on your installation), and finally bring MySQL back online:

[root@localhost ~]# service mysqld stop
[root@localhost ~]# ll /var/lib/mysql/
total 36876
drwx------ 2 mysql mysql     4096 Jun 26 06:46 hive_metastore
-rw-rw---- 1 mysql mysql 27262976 Jun 26 06:47 ibdata1
-rw-rw---- 1 mysql mysql  5242880 Jun 26 06:47 ib_logfile0
-rw-rw---- 1 mysql mysql  5242880 Jun 26 06:46 ib_logfile1
drwx------ 2 mysql mysql     4096 Jun 11 01:20 mysql
srwxrwxrwx 1 mysql mysql        0 Jun 11 01:20 mysql.sock
drwx------ 2 mysql mysql     4096 Jun 26 05:44 test
[root@localhost ~]#

So simply run:

[root@localhost ~]# cd /var/lib/mysql
[root@localhost mysql]# rm -f ib_logfile0 ib_logfile1
[root@localhost mysql]# service mysqld start

If you don’t feel confident removing them, you can rename the files instead:

[root@localhost ~]# cd /var/lib/mysql
[root@localhost mysql]# mv ib_logfile0 ib_logfile0.bak
[root@localhost mysql]# mv ib_logfile1 ib_logfile1.bak
[root@localhost mysql]# service mysqld start

Then re-run the “Upgrade Hive Metastore Database Schema” action, and finally restart the Hive Metastore Server (HMS).
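To confirm the fix worked, it is worth checking that the Metastore tables report InnoDB again. A quick check via information_schema (the database name hive_metastore matches the directory listing above; substitute your own):

```sql
SELECT TABLE_NAME, ENGINE
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'hive_metastore';
```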

This is of course not the solution for all cases, but it is worth a try.

MySQL GROUP_CONCAT Character Limit

GROUP_CONCAT in MySQL is quite a handy function, and I have used it many times to solve problems in my professional life. Today I used it to fix a memory issue in a PHP script of mine that needs to group data.

It all works fine on a very small data set, but problems arise when you need to process hundreds of millions of rows: MySQL does not return the full concatenated result for each group, and simply truncates it based on the group_concat_max_len configuration value, whose default is only 1024.

If you are working with a big data set, it is advisable to set this variable at run time so that you won’t get this unexpected behaviour:

SET SESSION group_concat_max_len = 100000;

Be mindful that although you can raise the limit for GROUP_CONCAT, the upper bound is still controlled by max_allowed_packet; please refer to the MySQL documentation for more details.
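Before relying on a raised limit, you can inspect both system variables to see what is actually in effect:

```sql
SHOW VARIABLES LIKE 'group_concat_max_len';
SHOW VARIABLES LIKE 'max_allowed_packet';
```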

MAMP MySQL Access Denied Error

After installing MAMP 2.0.5 and changing MySQL’s default root password, I kept getting access denied error messages every time I started the MAMP server.

phpMyAdmin still worked, because I had already updated config.inc.php, but the popup on every start of the MAMP server was just annoying. I have found a way to fix this.

To fix the first error, open the file “/Applications/MAMP/bin/quickCheckMysqlUpgrade.sh” and update the password in it:

/Applications/MAMP/Library/bin/mysqlcheck --quick --check-upgrade -u root -p{changeme} --socket=/Applications/MAMP/tmp/mysql/mysql.sock mysql

Do the same for the following files too:

“/Applications/MAMP/bin/upgradeMysql.sh”
“/Applications/MAMP/bin/checkMysql.sh”
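All three scripts can be patched in one pass with sed. A minimal sketch, using a demo copy of quickCheckMysqlUpgrade.sh in place of the real files under /Applications/MAMP/bin, and a placeholder password (avoid passwords containing "/" with this pattern):

```shell
NEWPASS='mynewpassword'   # placeholder for your actual root password

# Demo file standing in for the real MAMP helper script:
cat > quickCheckMysqlUpgrade.sh <<'EOF'
/Applications/MAMP/Library/bin/mysqlcheck --quick --check-upgrade -u root -p{changeme} --socket=/Applications/MAMP/tmp/mysql/mysql.sock mysql
EOF

# Rewrite the -p<password> argument in each script:
for f in quickCheckMysqlUpgrade.sh; do   # add upgradeMysql.sh and checkMysql.sh
  sed "s/-p[^ ]*/-p${NEWPASS}/" "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done
```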

To fix the second error, update the password as well in the file “/Applications/MAMP/bin/mamp/index.php”:

    
<?php
include_once 'php/functions.php';

$port = '3306';
$link = @mysql_connect(':/Applications/MAMP/tmp/mysql/mysql.sock', 'root', '{changeme}');

if (!$link) {
    exit('Error: Could not connect to MySQL server!');
}
mysql_close($link);
?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>MAMP</title>
</head>
<frameset rows="39,*" frameborder="no" border="0" framespacing="0">
<frame src="<?php echo $language; ?>/navigation.php?language=<?php print $language; ?>" name="navigationFrame" id="navigationFrame" scrolling="No" noresize="noresize" />
<frame src="<?php echo $language; ?>/index.php?language=<?php print $language; ?>" name="contentFrame" id="contentFrame" />
</frameset>
</html>

After the changes, both errors should just disappear.