Introduction to Apache Ranger – Part II – Architecture Overview

In the last episode, I briefly introduced the main features that Ranger provides, the main differences between Ranger and Sentry from the end user's point of view, and the main reason Cloudera chose Ranger as the replacement for Sentry in its latest product, CDP. If you missed it, please read Introduction to Apache Ranger – Part I – Ranger vs Sentry.

In this second episode, I would like to introduce the basic architecture of Ranger and the components that together form the full Ranger product.

To start with, let’s list all the components inside Ranger:

  • Ranger Admin Server/Portal
  • Ranger Policy Server
  • Ranger Plugins
  • Ranger User/Group Sync
  • Ranger Tag Sync
  • Ranger Audit Server

And below is a nice architecture diagram that shows the relationships between the components:

Image source: https://kymr.github.io/files/hadoop-summit/security/ranger_architecture.png

Now, let’s look in more detail at what each component does.

Ranger Admin Server/Portal

  • Central interface for security administration
  • Admin users can
    • Define repositories
    • Create and update policies
    • Manage Ranger users/groups
    • Define audit policies
    • View audit activities
  • Runs an embedded Tomcat server
  • Provides the Ranger REST API

Ranger Policy Server

  • Allows admin users to define/update policy details
  • Allows admin users to specify which users are delegate admins, who are allowed to modify policies
  • Policies can be divided into different security zones
    • One resource can only be assigned to one security zone
    • If a resource matches a security zone, only the policies in that zone will be checked
    • If a resource matches no security zone, the policies under the default (unnamed) zone will be used
  • Supports both allow and deny policies
    • Denials will be checked before allowances
  • Policies can apply at the user or group level
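
The evaluation rules above (zone selection first, then denials before allowances, for users or groups) can be sketched in a few lines of Python. This is only an illustrative model, not Ranger's actual policy engine: the `Policy` class, the prefix-based resource matching and all sample names are assumptions made for the example.

```python
from dataclasses import dataclass, field

DEFAULT_ZONE = ""  # stands in for the unnamed default zone


@dataclass
class Policy:
    """Hypothetical, heavily simplified policy record."""
    zone: str
    resource_prefix: str
    allow_users: set = field(default_factory=set)
    allow_groups: set = field(default_factory=set)
    deny_users: set = field(default_factory=set)
    deny_groups: set = field(default_factory=set)


def find_zone(resource, zones):
    """Return the zone whose resource prefixes cover the resource, else the default zone."""
    for name, prefixes in zones.items():
        if any(resource.startswith(prefix) for prefix in prefixes):
            return name
    return DEFAULT_ZONE


def is_allowed(user, groups, resource, policies, zones):
    """Only policies in the matched zone are considered; denials are checked first."""
    zone = find_zone(resource, zones)
    matching = [
        p for p in policies
        if p.zone == zone and resource.startswith(p.resource_prefix)
    ]
    for p in matching:  # deny pass
        if user in p.deny_users or groups & p.deny_groups:
            return False
    for p in matching:  # allow pass
        if user in p.allow_users or groups & p.allow_groups:
            return True
    return False  # nothing granted access


# Sample data (all names are made up for illustration)
zones = {"finance": ["/data/finance"]}
policies = [
    Policy("finance", "/data/finance", allow_groups={"analysts"}, deny_users={"bob"}),
    Policy(DEFAULT_ZONE, "/data", allow_users={"bob", "carol"}),
]
```

With this sample data, `bob` is refused on `/data/finance/q1` even though he is in the `analysts` group, because the deny is checked first, and `carol` is refused there as well, because the default-zone policy that names her is never consulted for a resource that belongs to the `finance` zone.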

Ranger User/Group Sync

  • Synchronisation utility that pulls users and groups from the following sources:
    • Unix
    • LDAP
    • AD
  • User/group information is stored in the Ranger Admin policy DB and used for policy definition

Ranger Plugins

  • Lightweight Java programs installed in Hadoop components such as HDFS or Hive
  • Pull policies regularly from the Admin Server and cache them locally
  • Act as the authorisation module and evaluate user requests against security policies
    • If no policy is found, HDFS requests fall back to HDFS ACLs; access is denied for all other components
  • Trigger audit data store requests (to both HDFS and Solr)
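
To make the plugin behaviour above more concrete, here is a minimal Python sketch of the same flow: refresh a local policy cache periodically, evaluate the request, fall back to a native check for HDFS only, and record an audit event. Real plugins are Java programs; the `PluginSketch` class, the `fetch_policies` callable, the policy shape (resource mapped to allowed users) and the 30-second refresh value are all assumptions for illustration.

```python
import time


class PluginSketch:
    """Toy model of a plugin; not the real Ranger plugin API."""

    def __init__(self, component, fetch_policies, refresh_seconds=30.0,
                 clock=time.monotonic):
        self.component = component
        self.fetch_policies = fetch_policies  # hypothetical call to the Admin Server
        self.refresh_seconds = refresh_seconds  # illustrative value only
        self.clock = clock
        self.cache = {}
        self.last_refresh = None
        self.audit_events = []  # stands in for the HDFS/Solr audit destinations

    def _refresh_if_stale(self):
        """Pull policies again only when the cached copy is older than the interval."""
        now = self.clock()
        if self.last_refresh is None or now - self.last_refresh >= self.refresh_seconds:
            self.cache = dict(self.fetch_policies())
            self.last_refresh = now

    def authorize(self, user, resource, native_check=lambda user, resource: False):
        self._refresh_if_stale()
        allowed_users = self.cache.get(resource)
        if allowed_users is not None:
            allowed = user in allowed_users  # a policy covers this resource
        elif self.component == "hdfs":
            allowed = native_check(user, resource)  # fall back to HDFS ACLs
        else:
            allowed = False  # no policy found: deny for other components
        self.audit_events.append(
            {"user": user, "resource": resource, "allowed": allowed}
        )
        return allowed
```

The `native_check` parameter is a placeholder for whatever native permission check the host component performs; in this sketch it is only consulted when the component is `"hdfs"` and no cached policy covers the resource.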

Ranger Audit Server

  • Audits are configured via policies (for each policy, the user specifies whether access that matches it should be audited)
  • Audits are stored in both HDFS and Solr by default
    • Data in Solr will be used to display audit data in Ranger admin UI
    • Data in HDFS serves as a backup and is not used by the Ranger admin UI (as far as my understanding goes)
    • Audits to DB are no longer supported since 0.5
  • Supports Audit Log Summarisation
    • Since Apache Ranger 0.5
    • Similar logs within a defined period that differ only by timestamp are aggregated into a single audit entry, to avoid a large number of audit logs
    • The period defaults to 5 seconds
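
The summarisation idea can be illustrated with a short Python sketch: events that are identical apart from their timestamp and fall within the same window are folded into one entry with a count. The event shape and the `summarise` function are assumptions for this example, and Ranger's real implementation may bound its windows differently.

```python
def summarise(events, window_seconds=5.0):
    """Fold events that differ only by timestamp into one entry per window.

    Each event is a dict with a numeric 'timestamp' (seconds) plus other
    hashable fields such as user, resource and action.
    """
    summaries = []
    open_entries = {}  # key (all fields except timestamp) -> current summary
    for event in sorted(events, key=lambda e: e["timestamp"]):
        fields = {k: v for k, v in event.items() if k != "timestamp"}
        key = tuple(sorted(fields.items()))
        ts = event["timestamp"]
        current = open_entries.get(key)
        if current is not None and ts - current["first_seen"] < window_seconds:
            # Same event within the window: bump the counter only
            current["count"] += 1
            current["last_seen"] = ts
        else:
            # First occurrence, or the window has elapsed: start a new entry
            current = dict(fields, first_seen=ts, last_seen=ts, count=1)
            open_entries[key] = current
            summaries.append(current)
    return summaries


# Sample events (made up): alice reads the same path four times, bob once
events = [
    {"timestamp": 0.0, "user": "alice", "resource": "/a", "action": "read"},
    {"timestamp": 1.0, "user": "alice", "resource": "/a", "action": "read"},
    {"timestamp": 2.0, "user": "bob", "resource": "/a", "action": "read"},
    {"timestamp": 4.9, "user": "alice", "resource": "/a", "action": "read"},
    {"timestamp": 6.0, "user": "alice", "resource": "/a", "action": "read"},
]
summary = summarise(events)
```

Here the five raw events collapse into three audit entries: one for alice's first three reads, one for bob, and a new one for alice's read at 6.0 seconds, which falls outside the first 5-second window.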

Ranger Tag Sync

  • Since Apache Ranger 0.6
  • It separates resource-classification from access-authorisation
  • One tag policy can apply to multiple components, as long as the resources have the same tag attached
    • Helps to reduce the number of policies needed in Ranger
  • Requires Apache Atlas to manage metadata (Hive DBs/tables, HDFS paths, Kafka topics, tags/classifications, etc.)
  • Event based
    • Any change in Hive etc. sends an event to a Kafka topic (ATLAS_HOOK), and Atlas then picks up the change
    • Any change in Atlas sends an event to a Kafka topic (ATLAS_ENTITIES), and Ranger Tag Sync then picks up the change
  • Tag policies are evaluated before resource-based policies
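
As a rough illustration of how one tag policy can cover several components and how it is consulted before resource-based policies, consider the Python sketch below. The data structures, the `decide` function and all names are hypothetical. The sketch also makes a simplifying assumption that any decision reached by a tag policy is final; how Ranger combines a tag-based allow with a resource-based deny is not modelled here.

```python
def decide(user, component, resource, resource_tags, tag_policies, resource_policies):
    """Return (allowed, source), consulting tag policies before resource policies."""
    tags = resource_tags.get((component, resource), set())
    matched = [tag_policies[t] for t in sorted(tags) if t in tag_policies]

    # 1. Tag policies: denials first, then allowances
    if any(user in p["deny"] for p in matched):
        return False, "tag"
    if any(user in p["allow"] for p in matched):
        return True, "tag"

    # 2. Resource-based policies, only when no tag policy decided
    policy = resource_policies.get((component, resource))
    if policy is not None:
        if user in policy["deny"]:
            return False, "resource"
        if user in policy["allow"]:
            return True, "resource"

    return False, "default"  # nothing granted access


# Tags as they might arrive from Atlas via Tag Sync (sample data, made up)
resource_tags = {
    ("hive", "sales.customers"): {"PII"},
    ("hdfs", "/warehouse/sales/customers"): {"PII"},
    ("kafka", "customer-events"): {"PII"},
}
# A single tag policy covers all three components above
tag_policies = {"PII": {"allow": {"alice"}, "deny": {"bob"}}}
resource_policies = {
    ("hive", "sales.customers"): {"allow": {"bob"}, "deny": set()},
    ("hive", "sales.orders"): {"allow": {"bob"}, "deny": set()},
}
```

With this data, `alice` gets access to the Hive table, the HDFS path and the Kafka topic through the single `PII` tag policy, while `bob` is refused on `sales.customers` by the tag policy even though a resource-based policy would have allowed him.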

As you can see, there is a lot happening inside Ranger, and I think the overview above should give you a fair idea of how Ranger functions as a whole. If you have any comments, please post them below.

Stay tuned for Part III of the series in the coming days.

2 Comments

  1. I’m looking for a concise explanation of what Ranger can do in terms of attribute-based access control so I can share it with… people who need to know. This series is helpful, certainly. But what would be really, really useful is an explanation of how tag-based policies work, in the sense of “what are tags, what can be tagged, how do tags combine to define policies, etc.”
