In this article we will discuss various approaches for achieving data isolation or data layer security in a multi-tenancy data warehouse.
I will refer multi-tenancy data warehouse as an architecture where an instance of OLAP database is shared by many customers. In a multi-tenancy platform data isolation (row level data access) is usually performed in application layer.
Challenges
- What are considerations for moving data isolation functionality at database level?
- What are the requirements for data security and permission?
- Does relational database like MySQL or PostgreSQL offer row level data access and security?
- What would it take to decouple the BI backend (database interface) from front end to allow access and consumption of BI data by reporting systems like Power BI, Tableau, Sisense
- Would it be possible to make customer data accessible by non technical users for instance by providing a managed BI provider which requires minimal configuration and yet data is secured?
- How does an external BI (bring your own BI) generally connect to the data warehouse to access data?
- What are considerations for API and DB access requirements?
- How to mitigate database throttling i.e. users ruing problematice SQL queries?
Data Virtualization
Data Virtualization is an approach to integrate data set without moving or duplicating data. For example Denedo views are automatically and securely exposed through REST API.
There are number of data virtualization technologies/techniques can be used for logical data management to find and use data. They offer secure data access and delivery layer. Here are list of virtualization products. It is worth finding out more about their capabilities in term data layer and access:
Fine-graind access and data management with Row Level Security (RLS)
Row Level Security and role based access policy lets you filter data and enables access to specific rows in a table based on qualifying user conditions.
Both Redshift and PostgreSQL provide feature to support for RLS.
It allows you to define policies to control access to individual rows. To enable RLS for a table, you need to execute the following SQL statement:
ALTER TABLE table_name ENABLE ROW LEVEL SECURITY;
After enabling RLS, you have to define policy and rule to control access to data in a table. Example of a SQL statement for creating a policy:
-- create policy
CREATE POLICY clients_data_isolation_policy ON table_name FOR SELECT USING ("company_id" = current_setting('rls.company_id')::integer);
-- create role and grant select privilege to the role
CREATE ROLE application_user;
GRANT select ON table_name TO application_user;
MySQL does not have a feature to support row level security on tables. However, you can implement the functionality in application layer using views and custom procedures/functions.
Conclusion
By enabling Row Level Security feature and leveraging policy and role provides a cost-effective, scalable, and secure solution for managing tenant data in multi-tenancy data warehouse.
Ref:
- Multi-tenant data isolation with PostgreSQL Row Level Security | AWS Database Blog (amazon.com)
- Row-level security recommendations – AWS Prescriptive Guidance (amazon.com)
- aws-samples/aws-saas-factory-postgresql-rls: Multi-Tenant Data Isolation Using PostgreSQL Row Level Security Policies
Example Implementation of Row Level Security (RLS) in PostgreSQL database
DDL and DML
CREATE TABLE salespeople (id serial primary key, name text); CREATE TABLE clients (id serial primary key, name text, salesperson_id integer);
INSERT INTO salespeople (name) values ('Picard');
INSERT INTO salespeople (name) values ('Crusher');
INSERT INTO clients (name, salesperson_id) values ('client1', 1);
INSERT INTO clients (name, salesperson_id) values ('client2', 2);
INSERT INTO clients (name, salesperson_id) values ('client3', 2);
ALTER TABLE clients ENABLE ROW LEVEL SECURITY;
CREATE POLICY clients_isolation_policy ON clients FOR SELECT USING ("salesperson_id" = current_setting('rls.salesperson_id')::integer);
CREATE ROLE salespeople;
GRANT select, insert ON clients TO salespeople;
-- create admin user
CREATE USER appuser WITH ENCRYPTED PASSWORD 'password';
GRANT ALL PRIVILEGES ON DATABASE database_name TO appuser;
GRANT salespeople TO appuser;
test using python script
# Create the database engine
engine = create_engine(url)
# Establish a connection to the database
conn = engine.connect()
# Define your SQL query
query = text(f"""
set role salespeople;
select set_config('rls.salesperson_id', '{args.id}', false);
SELECT * FROM clients
""")
# Read the data into a pandas DataFrame
df = pd.read_sql_query(query, conn)
how to use session variable interactively – psql
SET ROLE salespeople;
SELECT set_config('rls.salesperson_id', '{args.id}', false); SELECT * FROM clients
delete role
REASSIGN OWNED BY salespeople TO postgres;
DROP OWNED BY salespeople;