How to Disable the Facebook Friend Finder Suggestion

I have been REALLY annoyed by Facebook constantly suggesting strangers I do not know, on a daily basis, both on the Facebook site and especially on my mobile phone. I am getting so fed up that I am about to uninstall Facebook if I can’t find a solution. Lots of people say there is no way to turn it off, as it is built in by Facebook and there are no settings to control it.

Today I found this article, How to Disable the Facebook Friend Finder Suggestion, and thought I would give it a try. The article suggests that those friend suggestions are based on your imported contact list, so I followed the steps and took the following screenshots.

Step 1:

Navigate to your Facebook page, find the “PEOPLE YOU MAY KNOW” section and click on the “See All” link:

Step 2:

Click “Manage imported contacts.”

Step 3:

Click “Remove all contacts,” and then click “Remove.” A status message appears, advising that a confirmation notice will be sent to you.

Step 4:

Currently I am still waiting for the confirmation of this action; 30 minutes have passed and my contacts are still in my list.

Let’s see what happens tomorrow.

Sqoop1 Import Job Failed With Error “java.io.IOException: No columns to generate for ClassWriter”

Recently, when I was testing a Sqoop1 command in my CDH cluster, I kept getting the “java.io.IOException: No columns to generate for ClassWriter” error.

The full command was as below:

sqoop import --connect jdbc:mysql://<mysql-host>/test \
    --table test \
    --username <username> \
    --password <password> \
    --target-dir sqoop_test \
    -m 1

And the full stack trace:

16/08/20 03:03:13 ERROR manager.SqlManager: Error reading from database: java.sql.SQLException: Streaming result set com.mysql.jdbc.RowDataDynamic@7cd1be26 is still active. No statements may be issued when any streaming result sets are open and in use on a given connection. Ensure that you have called .close() on any active streaming result sets before attempting more queries.
java.sql.SQLException: Streaming result set com.mysql.jdbc.RowDataDynamic@7cd1be26 is still active. No statements may be issued when any streaming result sets are open and in use on a given connection. Ensure that you have called .close() on any active streaming result sets before attempting more queries.
	at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:934)
	at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:931)
	at com.mysql.jdbc.MysqlIO.checkForOutstandingStreamingData(MysqlIO.java:2735)
	at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1899)
	at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2151)
	at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2619)
	at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2569)
	at com.mysql.jdbc.StatementImpl.executeQuery(StatementImpl.java:1524)
	at com.mysql.jdbc.ConnectionImpl.getMaxBytesPerChar(ConnectionImpl.java:3003)
	at com.mysql.jdbc.Field.getMaxBytesPerCharacter(Field.java:602)
	at com.mysql.jdbc.ResultSetMetaData.getPrecision(ResultSetMetaData.java:445)
	at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:305)
	at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:260)
	at org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:246)
	at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:327)
	at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1846)
	at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1646)
	at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:107)
	at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:488)
	at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:615)
	at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
	at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
16/08/20 03:03:13 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: No columns to generate for ClassWriter
	at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1652)
	at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:107)
	at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:488)
	at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:615)
	at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
	at org.apache.sqoop.Sqoop.main(Sqoop.java:236)

Although I do not have the root cause for this issue yet, I do have a workaround, which is adding “--driver com.mysql.jdbc.Driver” to the Sqoop parameters. So the full command becomes:

sqoop import --connect jdbc:mysql://<mysql-host>/test \
    --table test \
    --username <username> \
    --password <password> \
    --target-dir sqoop_test \
    -m 1 \
    --driver com.mysql.jdbc.Driver
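
As a side note, and not part of the workaround itself: if you want to rule out basic connectivity or permission problems first, a quick check with sqoop eval (using the same placeholder connection parameters as above) should confirm that the table is reachable and actually has columns:

# Optional sanity check: confirm the connection works and the table has columns.
# <mysql-host>, <username> and <password> are the same placeholders as above.
sqoop eval \
    --connect jdbc:mysql://<mysql-host>/test \
    --username <username> \
    --password <password> \
    --query "SELECT * FROM test LIMIT 1"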

Hopefully this can help anyone who might have the same issue.

Yarn Job Failed with Error: “Split metadata size exceeded 10000000”

When you run a really big job in Hive, it might fail with the following error:

2016-06-28 18:55:36,830 INFO [Thread-58] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Setting job diagnostics to Job init failed : org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: Split metadata size exceeded 10000000. Aborting job job_1465344841306_1317
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1568)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1432)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1390)
at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:996)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:138)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1289)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1057)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$4.run(MRAppMaster.java:1500)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1496)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1429)
Caused by: java.io.IOException: Split metadata size exceeded 10000000. Aborting job job_1465344841306_1317
at org.apache.hadoop.mapreduce.split.SplitMetaInfoReader.readSplitMetaInfo(SplitMetaInfoReader.java:53)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1563)
... 17 more

This indicates that the value of mapreduce.job.split.metainfo.maxsize is too small for your job (the default value is 10000000).

There are two options to fix this:

1. Set the value of mapreduce.job.split.metainfo.maxsize to be “-1” (unlimited) specifically for this job just before running it:

SET mapreduce.job.split.metainfo.maxsize=-1;

This should remove the limit. However, be warned that it effectively lets YARN create an unlimited amount of split metadata; if there are not enough resources on your cluster, it has the potential to bring down the host.

2. The safer way is to increase the value, maybe to double the default of 10000000:

SET mapreduce.job.split.metainfo.maxsize=20000000;

You could gradually increase the value and monitor your cluster to make sure that it will not bring down your machines.

I have seen other posts on Google suggesting to set the value of mapreduce.job.split.metainfo.maxsize in the mapred-site.xml configuration file. In my opinion, this issue only affects a small number of queries that run against a very BIG data set, so it is better to set the value at the job level, so that no cluster restart is required.
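
If the query is run from a script rather than an interactive session, the same job-level setting can also be passed on the command line. Below is a minimal sketch, assuming a hypothetical HiveQL file called query.hql that contains the big query, and <hs2-host> as a placeholder for your HiveServer2 host:

# Job-level setting via the Hive CLI, no mapred-site.xml change or restart needed.
# query.hql is a hypothetical script containing the big query.
hive --hiveconf mapreduce.job.split.metainfo.maxsize=20000000 -f query.hql

# Or through Beeline against HiveServer2:
beeline -u "jdbc:hive2://<hs2-host>:10000" \
    --hiveconf mapreduce.job.split.metainfo.maxsize=20000000 \
    -f query.hql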

Please note that if you are using MapReduce V1, the setting should be mapreduce.jobtracker.split.metainfo.maxsize instead, which does the same thing.

Hope this helps.

Sentry HDFS sync will NOT sync URI privilege

I have seen lots of cases where Hive users try to give a user permission on a certain directory by using the Sentry GRANT URI command in a cluster with Sentry HDFS sync enabled. This seems logical, however it will not work.

Sentry HDFS sync will only sync Sentry privileges to HDFS ACLs at database or table level, and it will ignore all the privileges for URIs.

So suppose your username is “test”, your GROUP is “test_group” and your ROLE is “test_role”, which has the following privilege:

0: jdbc:hive2://ausplcdhedge03.us.dell.com:10> show grant role test_role;
+---------------------------------+-------+-----------+--------+----------------+----------------+------------+--------------+------------------+---------+
| database                        | table | partition | column | principal_name | principal_type | privilege  | grant_option | grant_time       | grantor |
+---------------------------------+-------+-----------+--------+----------------+----------------+------------+--------------+------------------+---------+
| hdfs://nameservice1/path/to/dir |       |           |        | test_role      | ROLE           | *          | false        | 1468340836037000 | --      |
+---------------------------------+-------+-----------+--------+----------------+----------------+------------+--------------+------------------+---------+

If you run “getfacl” on the path “hdfs://nameservice1/path/to/dir”, it will not show that GROUP “test_group” has READ and WRITE permissions. In order to get the Sentry privilege synced, a table needs to be linked to the URI.

Try the following:

-- Create a dummy database and an external table pointing at the URI, then grant
-- on the table so that Sentry HDFS sync picks up the path.
CREATE DATABASE dummy;
USE dummy;
CREATE EXTERNAL TABLE dummy (a INT) LOCATION '/path/to/dir';
GRANT ALL ON TABLE dummy TO ROLE test_role;

Now if you run “hdfs dfs -getfacl /path/to/dir”, test_group should show up with “rwx” permissions, as shown below.
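
For reference, the check looks like the following; the exact ACL entries will vary from cluster to cluster, and the output line in the comment is only illustrative:

# Verify that the Sentry privilege has been synced to an HDFS ACL.
# /path/to/dir is the same example path used above.
hdfs dfs -getfacl /path/to/dir

# Expect an entry similar to the following among the ACLs (illustrative only):
#   group:test_group:rwx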

Hive’s Staging Directory Not Cleaned Up Properly

This article explains a situation that causes Hive to leave behind the staging directories it created during processing, instead of cleaning them up after the job finishes successfully.

The issue happens when a user runs a Hive query through Hue’s Hive Editor; it does not apply to queries run through Beeline, the Hive CLI or a JDBC driver.

To reproduce the issue, simply run a SELECT COUNT(*) query against any table through Hue’s Hive Editor, and then check the staging location afterwards (defined by the hive.exec.stagingdir property). You will notice that a directory named something like “.hive-staging_hive_2015-12-15_10-46-52_381_5695733254813362445-1329” remains there.
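
A quick way to spot the leftovers is sketched below. It assumes a hypothetical table location of /user/hive/warehouse/mydb.db/mytable and the default hive.exec.stagingdir, which places “.hive-staging…” directories under the query’s target directory, so adjust both to your environment:

# List leftover staging directories under an example table location.
# /user/hive/warehouse/mydb.db/mytable is a hypothetical path; adjust it to your
# own warehouse location and hive.exec.stagingdir setting.
hdfs dfs -ls /user/hive/warehouse/mydb.db/mytable | grep '\.hive-staging'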

This is because Hue keeps each query open for a long time, or never closes it, depending on its configuration, and the staging directory is only cleaned up when the query connection is closed. In Beeline, the Hive CLI or a JDBC driver, the connection is closed as soon as the query finishes and the data is returned to the client, so the cleanup is triggered straight away. Hue, on the other hand, keeps the query connection open so that the user can come back and retrieve the query result at a later stage, and so that the query keeps running in the background when the user navigates away from the query page, rather than being killed forcibly. As a result, the staging directories never get cleaned up.

There are two possible ways to work around this:

  1. To force Hue to close the query when the user navigates away from the page, you can do the following:
      1. go to CM > Hue > Configuration
      2. add the following to “Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini”
        [beeswax]
        close_queries=true
        

        if the [beeswax] section already exists, simply add “close_queries=true” to it

      3. Save and restart Hue

    However, there are downsides to this workaround:

    1. Users will no longer be able to retrieve the results of finished queries through Hue
    2. If the user navigates away from the Hive Editor page while a query is running, the query will be closed and killed, rather than continuing to run in the background
  2. Set the following HiveServer2 parameters to control the session and operation/query timeout values:
    hive.server2.session.check.interval = 1 hour
    hive.server2.idle.operation.timeout = 1 day
    hive.server2.idle.session.timeout = 3 days
    

    This ensures that any HS2 session is closed after 3 days of inactivity, with a session check every hour, and that any operation/query kept open for more than 1 day is also closed. Forcing queries to be closed this way triggers the cleanup of the staging directories. These settings can be found by going to CM > Hive Configuration and looking for the following configuration names (a quick way to check the effective values from a client session is sketched after this list):

    Session Check Interval
    Idle Operation Timeout
    Idle Session Timeout
    

    The above are recommended values; however, they should be adjusted based on cluster usage and query running times.
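
As a side note, here is a minimal sketch of how to check the values that HiveServer2 is actually using, from any Beeline session. <hs2-host> is a placeholder for your HiveServer2 host, 10000 is the common default port, and extra connection options (for example a Kerberos principal) may be required on secured clusters:

# Check the effective timeout values from a client session.
# <hs2-host> is a placeholder; adjust the JDBC URL to your environment.
beeline -u "jdbc:hive2://<hs2-host>:10000" \
    -e "SET hive.server2.session.check.interval; SET hive.server2.idle.operation.timeout; SET hive.server2.idle.session.timeout;"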