In my previous Ranger tutorials, I introduced tag-based policies in Ranger. For these to work, Ranger needs to sync tag/classification information from Atlas, so that it knows which entities and attributes have which tags attached.
Recently I worked on an issue where the error below appeared in the TagSync service log, and tag information for certain entities was not being synced properly to Ranger:
2020-04-24 10:12:14,635 [http-bio-6167-exec-4361] INFO org.apache.ranger.common.RESTErrorUtil (RESTErrorUtil.java:63) - Request failed. loginId=rangertagsync, logMessage=Error Populating XXServiceResource. No Service found with name: cluster12_hive javax.ws.rs.WebApplicationException
In this post, I will briefly explain why this happens and how to fix it.
First, we need to understand what information is stored in Atlas. See the screenshot below:
The one that we need to pay attention to is the qualifiedName field. For Hive entities, it has the following format:
For HDFS entities, it has the following format:
The database_name, table_name and HDFS paths are pretty self-explanatory. The service_name is the service/repository name that you have set up in Ranger, shown on the home page. In my case it is c2393, see the screenshot below:
After tag information is updated on the Atlas side, a Kafka event is sent to a topic called ATLAS_ENTITIES, and the TagSync service acts as the consumer of that topic. While processing the data from Kafka, TagSync recognizes that the entity is a Hive table, so it appends the suffix “_hive” to the extracted service_name. In our case, the result is c2393_hive. TagSync then pushes the information it gathered to the c2393_hive service in Ranger.
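To make the naming rule concrete, here is a minimal sketch of the derivation described above. It is not TagSync's actual code. It assumes the Hive qualifiedName looks like database_name.table_name@service_name, and the function name ranger_service_for and the sample table default.customers are made up for illustration.

```python
def ranger_service_for(qualified_name: str, component: str = "hive") -> str:
    """Derive the Ranger service name from an Atlas qualifiedName.

    Assumption: the text after the last "@" is the service_name, and the
    component suffix (here "hive") is appended with an underscore.
    """
    _, sep, service_name = qualified_name.rpartition("@")
    if not sep or not service_name:
        raise ValueError(f"no service_name in qualifiedName: {qualified_name!r}")
    return f"{service_name}_{component}"


# Hypothetical entity on the c2393 service from this post.
print(ranger_service_for("default.customers@c2393"))  # c2393_hive
```

If the part after the "@" does not match a service defined in Ranger, the derived name will not match either, which is the situation the error in the log describes.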
If the service_name in Atlas is invalid, meaning there is no corresponding service/repository defined in Ranger, TagSync logs the error message shown above. The failure affects only the entities that have bad data; other entities sync as normal.
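The per-entity nature of the failure can be sketched as follows. This is only an illustration under assumptions: the qualifiedName format database_name.table_name@service_name, the helper name split_by_known_service, and the sample table names are all hypothetical, and the set of Ranger services is typed in by hand rather than fetched from Ranger.

```python
def split_by_known_service(qualified_names, ranger_services, component="hive"):
    """Separate entities whose derived service exists in Ranger from those
    whose derived service does not (the ones that would hit the error)."""
    synced, rejected = [], []
    for name in qualified_names:
        # Assumption: the text after the last "@" is the service_name.
        service = name.rpartition("@")[2] + "_" + component
        (synced if service in ranger_services else rejected).append(name)
    return synced, rejected


# Hypothetical data: only c2393_hive is defined in Ranger.
entities = ["default.customers@c2393", "default.orders@cluster12"]
ok, bad = split_by_known_service(entities, {"c2393_hive"})
print(ok)   # ['default.customers@c2393']
print(bad)  # ['default.orders@cluster12']
```

Running a check like this over the qualifiedName values you see in Atlas is one way to spot which entities point at a service that Ranger does not know about.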
I am not an expert in Atlas, and I could not find an option in the Atlas UI for deleting entities. To fix the problem of an entity's tag/classification information not being synced to Ranger, the simple solution is to create a new entity with the correct service name. Of course, it would be better to also clean up the invalid entities, but that falls outside the scope of this post. If you know how to do so, please leave a comment and let me know.