Multi-Source Financial Data Platform with Row-Level Security

Feb 15, 2025

For a fintech company managing financial data across multiple business entities, I architected and implemented a unified data platform combining event-driven data replication, batch API ingestion, and multi-tenant security via PostgreSQL Row-Level Security. The system consolidates data from heterogeneous sources into a centralized analytical warehouse with DBT-automated RLS policy deployment, enabling secure self-service analytics while ensuring strict data isolation between business entities.

The Challenge

A fintech company operated with fragmented data across multiple systems:

  • Event-driven architecture: Domain events published to Kafka topics from microservices
  • External data providers: Multiple third-party APIs requiring scheduled ingestion
  • No unified analytics: Each business entity managed separately, no consolidated view
  • Security requirements: Strict data isolation between business entities for compliance
  • Manual reporting: Engineers manually extracting and aggregating data across sources
  • Scalability concerns: Ad-hoc queries against production event stores impacting performance

They needed:

  • Unified data warehouse consolidating all financial data sources
  • Automated data ingestion from both internal databases and external APIs
  • Multi-tenant security enforcing business-level data isolation
  • Self-service analytics for business stakeholders
  • Real-time and batch data pipeline architecture
  • Scalable transformation layer for analytical models

System Architecture

I designed a multi-layered data platform separating ingestion, transformation, and consumption with security integrated throughout:

```mermaid
graph TB
    subgraph Sources
        MS[Microservices<br/>Domain Events]
        API[External APIs<br/>Financial Data]
    end
    subgraph Ingestion
        KAFKA[Kafka Topics<br/>Event Streams]
        DAGSTER[Dagster<br/>Orchestration]
    end
    subgraph Processing
        KC[Kafka Connect<br/>JDBC Sink<br/>+ SMT Pipeline]
        BATCH[Batch API<br/>Integration]
    end
    subgraph Storage
        PG[(PostgreSQL RDS<br/>Analytical Warehouse<br/>+ Row-Level Security)]
    end
    subgraph Transformation
        DBT[DBT<br/>Models + RLS Policies]
    end
    subgraph Consumption
        SUPERSET[Apache Superset<br/>Dashboards]
        AI[AI Data Analyst<br/>Natural Language]
    end
    MS --> KAFKA
    API --> DAGSTER
    KAFKA --> KC
    DAGSTER --> BATCH
    KC --> PG
    BATCH --> PG
    PG --> DBT
    DBT --> PG
    PG --> SUPERSET
    PG --> AI
    DAGSTER -.->|Triggers| DBT
```

Data Ingestion Layer:

Event-Driven Data Replication:

I architected Kafka Connect infrastructure for event-driven data ingestion:

Kafka Connect JDBC Sink Connectors:

  • PostgreSQL RDS sink connectors: Consume domain events from Kafka topics, write to analytical warehouse
  • Connector configurations: Account, business, order, transaction, and reporting entity topics
  • Schema evolution: Automatic table creation and schema updates as event structures change
  • Upsert semantics: Primary key-based upserts ensuring idempotent event processing
  • Delete handling: Tombstone records properly handled for data consistency
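
Below is a minimal sketch of the connector registration pattern described above, submitted through the Kafka Connect REST API. The connector name, topic, connection details, and key field are illustrative placeholders rather than the production configuration.

```python
import json

import requests

# Illustrative JDBC sink connector: upsert semantics keyed on the record key,
# automatic table creation/evolution, and tombstone records handled as deletes.
connector = {
    "name": "pg-sink-transaction-events",  # placeholder connector name
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
        "topics": "transaction-events",  # placeholder topic
        "connection.url": "jdbc:postgresql://warehouse:5432/analytics",
        "connection.user": "connect_writer",
        "connection.password": "<redacted>",
        "insert.mode": "upsert",      # primary key-based upserts -> idempotent processing
        "pk.mode": "record_key",
        "pk.fields": "id",
        "delete.enabled": "true",     # tombstone records become DELETEs
        "auto.create": "true",        # create the target table on first write
        "auto.evolve": "true",        # add columns as event schemas evolve
    },
}

# Register the connector with a Connect worker (fails if it already exists).
response = requests.post(
    "http://kafka-connect:8083/connectors",
    data=json.dumps(connector),
    headers={"Content-Type": "application/json"},
    timeout=30,
)
response.raise_for_status()
```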

Single Message Transforms (SMT) Pipeline:

I designed sophisticated transformation chains for event processing:

  • Event filtering: JMESPath predicates selecting specific event types from polymorphic topics
  • Field extraction: Conditional extraction of event-specific payloads from nested structures
  • Schema flattening: Nested JSON flattened to relational columns for PostgreSQL
  • Field exclusions: Sensitive data dropped during ingestion (keys, credentials, internal identifiers)
  • Key strategy: Custom document ID strategies for upsert operations
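
The fragment below sketches how such a transform chain attaches to the sink connector's config map from the previous example. Field and header names are assumptions, and the production event-type predicate is a custom JMESPath-based implementation; the stock header predicate here only stands in for it.

```python
# Merged into the sink connector's "config" map; names are illustrative.
smt_config = {
    "transforms": "extractPayload,flatten,dropSensitive",
    # Conditionally pull the event-specific payload out of the event envelope.
    "transforms.extractPayload.type": "org.apache.kafka.connect.transforms.ExtractField$Value",
    "transforms.extractPayload.field": "payload",
    "transforms.extractPayload.predicate": "isOrderEvent",
    # Flatten nested JSON structures into relational columns.
    "transforms.flatten.type": "org.apache.kafka.connect.transforms.Flatten$Value",
    "transforms.flatten.delimiter": "_",
    # Drop sensitive fields before they reach the warehouse.
    "transforms.dropSensitive.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
    "transforms.dropSensitive.exclude": "api_key,credentials,internal_ref",
    # Stand-in predicate; production uses a custom JMESPath predicate on the payload.
    "predicates": "isOrderEvent",
    "predicates.isOrderEvent.type": "org.apache.kafka.connect.transforms.predicates.HasHeaderKey",
    "predicates.isOrderEvent.name": "order-event",
}
```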

Batch API Ingestion:

I built Dagster orchestration for scheduled external data ingestion:

  • API integration assets: Dagster assets fetching data from external financial data providers
  • Partitioned assets: Time-based partitioning enabling incremental processing
  • Retry logic: Exponential backoff with configurable retry limits for API failures
  • Data validation: Pydantic schemas validating API responses before storage
  • Incremental loading: Tracking watermarks to fetch only new/updated records
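
A minimal Dagster asset sketch of this pattern is shown below. The provider URL, response shape, and schema fields are assumptions; in the platform the validated records would then be loaded into the PostgreSQL landing schema.

```python
from datetime import datetime

import requests
from dagster import (
    AssetExecutionContext,
    Backoff,
    DailyPartitionsDefinition,
    RetryPolicy,
    asset,
)
from pydantic import BaseModel


class ProviderRecord(BaseModel):
    """Illustrative schema for one record returned by the external provider."""

    record_id: str
    business_id: str
    amount: float
    updated_at: datetime


daily = DailyPartitionsDefinition(start_date="2024-01-01")


@asset(
    partitions_def=daily,
    retry_policy=RetryPolicy(max_retries=5, delay=2, backoff=Backoff.EXPONENTIAL),
)
def provider_transactions(context: AssetExecutionContext) -> list[dict]:
    """Fetch and validate one day of data from a (hypothetical) provider endpoint."""
    day = context.partition_key  # the partition doubles as the incremental watermark
    response = requests.get(
        "https://api.example-provider.com/v1/transactions",  # placeholder URL
        params={"date": day},
        timeout=60,
    )
    response.raise_for_status()
    # Reject malformed payloads before anything is written to the warehouse.
    records = [ProviderRecord.model_validate(item) for item in response.json()["items"]]
    return [record.model_dump() for record in records]
```

With this shape, backfills reduce to launching runs for the missing partitions, and the retry policy absorbs transient provider failures.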

Data Transformation Layer:

DBT Analytical Models:

I designed a DBT project with reusable macros and dimensional models:

Macro Library:

I created reusable DBT macros for common analytical operations:

  • Balance snapshots: Computes end-of-day balance snapshots per business entity
  • Transaction metrics: Aggregates transaction metrics (volume, count, fees) by day and business
  • Cross-business aggregations: Multi-tenant balance aggregation for specific dates
  • RLS policy automation: Automated Row-Level Security policy deployment (detailed below)

Model Organization:

  • Staging models: Raw data cleaning and type casting
  • Intermediate models: Business logic and metric calculations
  • Mart models: Final analytical tables optimized for consumption
  • Tests: Data quality checks (uniqueness, referential integrity, null checks)

Multi-Tenant Security Architecture:

PostgreSQL Row-Level Security (RLS):

I designed and implemented automated RLS policy deployment via DBT macros:

Permission Model:

I created a permission table mapping users to their authorized business entities, establishing the foundation for RLS enforcement. This table stores user email addresses alongside business identifiers, enabling granular access control at the database level.
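
A minimal sketch of that permission table, assuming an `analytics` schema and these column names (the production names may differ):

```python
import psycopg2

PERMISSION_TABLE_DDL = """
CREATE TABLE IF NOT EXISTS analytics.user_business_permission (
    user_email  text NOT NULL,
    business_id text NOT NULL,
    PRIMARY KEY (user_email, business_id)
);
"""

# Applied once over an administrative connection; the DSN is a placeholder.
with psycopg2.connect("postgresql://admin@warehouse:5432/analytics") as conn:
    with conn.cursor() as cur:
        cur.execute(PERMISSION_TABLE_DDL)
```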

RLS Policy Automation:

I designed a DBT macro automating RLS policy deployment across all analytical models. The macro enables Row-Level Security on target tables, creates policies checking user permissions against the permission table, and grants appropriate access to database roles. This automation eliminates manual policy creation, ensures consistency, and simplifies adding new models.
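
The actual implementation is a DBT (Jinja) macro attached as a post-hook; the Python function below only renders the kind of DDL that macro deploys, with assumed permission-table, policy, role, and session-variable names.

```python
def rls_post_hook_sql(relation: str, rls_role: str = "ai_readonly") -> str:
    """Render the DDL an RLS post-hook applies to one analytical model.

    `relation` is the built table (e.g. "analytics.mart_daily_balances");
    the permission table, role, and session variable names are illustrative.
    """
    table_name = relation.split(".")[-1]
    policy_name = f"{table_name}_business_isolation"
    return f"""
ALTER TABLE {relation} ENABLE ROW LEVEL SECURITY;

DROP POLICY IF EXISTS {policy_name} ON {relation};
CREATE POLICY {policy_name} ON {relation}
    FOR SELECT
    TO {rls_role}
    USING (
        EXISTS (
            SELECT 1
            FROM analytics.user_business_permission p
            WHERE p.business_id = {table_name}.business_id
              AND p.user_email = current_setting('app.current_user_email', true)
        )
    );

GRANT SELECT ON {relation} TO {rls_role};
"""
```

In DBT itself the equivalent macro is wired in as a model post-hook, so the policy is re-applied every time a model is rebuilt and new models only need a single line of post-hook configuration.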

RLS Enforcement:

The system enforces multi-tenant isolation through session-scoped filtering:

  • Each database connection sets a user session variable identifying the current user
  • Permission table queries filter by this session variable
  • RLS policies on analytical tables use EXISTS clauses joining to the filtered permission table
  • Database-kernel enforcement prevents application-layer bypass
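
A sketch of what a user-scoped connection looks like from application code, assuming the `ai_readonly` role and the `app.current_user_email` session variable used above (DSN, table, and email are placeholders):

```python
import psycopg2

conn = psycopg2.connect("postgresql://ai_readonly@warehouse:5432/analytics")
with conn, conn.cursor() as cur:
    # Identify the end user for this session; the RLS policies read this value.
    cur.execute(
        "SELECT set_config('app.current_user_email', %s, false)",
        ("analyst@example.com",),
    )
    # Only rows belonging to businesses this user is permitted to see come back.
    cur.execute(
        "SELECT business_id, SUM(amount) FROM analytics.mart_transactions GROUP BY business_id"
    )
    for business_id, total in cur.fetchall():
        print(business_id, total)
conn.close()
```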

Multi-Role Architecture:

I created a role hierarchy supporting different access patterns:

  • Read-only AI role: RLS-enforced access for AI applications requiring user-scoped data
  • Cross-business analytics role: Bypasses RLS for legitimate multi-tenant dashboards
  • Transformation role: DDL privileges for schema management, bypasses RLS during builds
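
The role definitions below sketch this hierarchy; role names are illustrative, and in PostgreSQL terms the isolation hinges on which roles carry the BYPASSRLS attribute.

```python
# Applied by a superuser/migration; NOBYPASSRLS is the PostgreSQL default.
ROLE_DDL = """
-- RLS-enforced role used by the AI data analyst.
CREATE ROLE ai_readonly LOGIN;
GRANT USAGE ON SCHEMA analytics TO ai_readonly;
GRANT SELECT ON ALL TABLES IN SCHEMA analytics TO ai_readonly;

-- Cross-business BI role: explicitly allowed to bypass RLS for multi-tenant dashboards.
CREATE ROLE analytics_bi LOGIN BYPASSRLS;
GRANT USAGE ON SCHEMA analytics TO analytics_bi;
GRANT SELECT ON ALL TABLES IN SCHEMA analytics TO analytics_bi;

-- Transformation role: owns and rebuilds the models, so it needs DDL rights on the schema.
-- As table owner it is not subject to RLS on its own tables unless FORCE ROW LEVEL SECURITY is set.
CREATE ROLE dbt_transformer LOGIN;
GRANT USAGE, CREATE ON SCHEMA analytics TO dbt_transformer;
"""
```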

Analytics Consumption Layer:

Apache Superset Dashboards:

I deployed a self-service BI platform enabling business stakeholders to explore financial data:

  • Containerized deployment: Superset with PostgreSQL backend
  • Database connection: Direct connection to analytical warehouse with cross-business analytics privileges
  • Pre-built dashboards: Executive summary, business performance, transaction analytics
  • Role-based access: Superset RBAC mapping to warehouse roles
  • SQL Lab: Ad-hoc query interface for power users

Technical Implementation

Dagster as Orchestration Backbone:

Dagster serves as the central orchestration engine coordinating all data workflows:

  • Asset-based architecture with automatic dependency resolution between batch ingestion and DBT transformations
  • Sensors triggering downstream processing on new data arrival
  • Time-based partitioning for incremental processing and backfills
  • Integrated observability with run logs, asset lineage visualization, and failure alerting
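
A condensed sketch of the sensor pattern, with a placeholder mart asset and watermark check standing in for the real DBT assets and landing tables:

```python
from dagster import (
    AssetSelection,
    Definitions,
    RunRequest,
    SensorEvaluationContext,
    asset,
    define_asset_job,
    sensor,
)


@asset(group_name="marts")
def daily_business_metrics() -> None:
    """Placeholder mart asset; in the platform this is a DBT model loaded as an asset."""


refresh_marts = define_asset_job("refresh_marts", selection=AssetSelection.groups("marts"))


def latest_landing_watermark() -> str:
    """Placeholder: the real check reads MAX(updated_at) from the landing tables."""
    return "2025-01-01T00:00:00"


@sensor(job=refresh_marts, minimum_interval_seconds=300)
def new_data_sensor(context: SensorEvaluationContext):
    """Trigger downstream processing when new data has arrived since the last cursor."""
    watermark = latest_landing_watermark()
    if context.cursor != watermark:
        yield RunRequest(run_key=watermark)
        context.update_cursor(watermark)


defs = Definitions(
    assets=[daily_business_metrics],
    jobs=[refresh_marts],
    sensors=[new_data_sensor],
)
```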

DBT-Dagster Integration:

DBT models are materialized as Dagster assets, enabling unified workflow management. Incremental models process only changed data for efficiency, while post-hooks automatically deploy RLS policies after table creation. Automated data quality tests run on every transformation, ensuring data integrity.
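
The integration follows the standard dagster-dbt pattern; the manifest path and asset/resource names below are assumptions about the project layout.

```python
from pathlib import Path

from dagster import AssetExecutionContext, Definitions
from dagster_dbt import DbtCliResource, dbt_assets

DBT_PROJECT_DIR = Path("warehouse_dbt")  # placeholder project location


@dbt_assets(manifest=DBT_PROJECT_DIR / "target" / "manifest.json")
def warehouse_dbt_assets(context: AssetExecutionContext, dbt: DbtCliResource):
    """Materialize every DBT model as a Dagster asset.

    `dbt build` runs models and tests together, and model-level post-hooks
    (including the RLS macro) execute as each table is (re)built.
    """
    yield from dbt.cli(["build"], context=context).stream()


defs = Definitions(
    assets=[warehouse_dbt_assets],
    resources={"dbt": DbtCliResource(project_dir=str(DBT_PROJECT_DIR))},
)
```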

Production Operations:

The platform runs in distributed mode with Kafka Connect workers processing events continuously, offset management combined with idempotent upserts providing effectively exactly-once results, and comprehensive monitoring via JMX metrics exported to Prometheus for observability.

Key Design Decisions

Dual ingestion strategy (Event-driven + Batch):

  • Real-time event ingestion from Kafka topics for operational data requiring low latency
  • Batch API ingestion for external providers with rate limits and historical data
  • Unified landing in PostgreSQL RDS for consistent transformations

DBT macro for RLS automation:

  • Eliminates manual policy creation reducing human error
  • Ensures consistent security policies across all models
  • Enables testing RLS policies in development environments
  • Simplifies adding new models (single post-hook configuration)

Three-role security model:

  • Read-only AI role with RLS enforcement for user-scoped data access
  • Cross-business analytics role bypassing RLS for legitimate multi-tenant dashboards
  • Transformation role with DDL privileges for schema management during builds

PostgreSQL RDS as single source of truth:

  • Centralizes data from Kafka events and external APIs
  • Enables SQL-based transformations (DBT)
  • Native RLS support for multi-tenant security
  • ACID transactions for data consistency

Results & Impact

  • Consolidated data from Kafka event streams and external APIs into unified PostgreSQL RDS warehouse
  • Implemented automated Row-Level Security via DBT macros ensuring business-level data isolation
  • Built Dagster orchestration managing batch ingestion from external providers
  • Deployed Kafka Connect event-driven replication with sophisticated SMT pipelines
  • Created reusable DBT macro library for analytical models (balances, metrics, aggregations)
  • Enabled self-service analytics via Apache Superset while maintaining security compliance
  • Established foundation for AI-powered data analyst leveraging RLS architecture
  • Reduced manual reporting effort through automated daily metric calculations

Technologies

Python, Dagster, Kafka Connect, DBT, PostgreSQL RDS, Row-Level Security (RLS), Apache Superset, Docker, Pydantic, SQL