Multi-Source Financial Data Platform with Row-Level Security

Feb 15, 2025

For a fintech company managing financial data across multiple business entities, I architected and implemented a unified data platform combining event-driven data replication, batch API ingestion, and multi-tenant security via PostgreSQL Row-Level Security. The system consolidates data from heterogeneous sources into a centralized analytical warehouse with DBT-automated RLS policy deployment, enabling secure self-service analytics while ensuring strict data isolation between business entities.

The Challenge

A fintech company operated with fragmented data across multiple systems:

  • Event-driven architecture: Domain events published to Kafka topics from microservices
  • External data providers: Multiple third-party APIs requiring scheduled ingestion
  • No unified analytics: Each business entity managed separately, no consolidated view
  • Security requirements: Strict data isolation between business entities for compliance
  • Manual reporting: Engineers manually extracting and aggregating data across sources
  • Scalability concerns: Ad-hoc queries against production event stores impacting performance

They needed:

  • Unified data warehouse consolidating all financial data sources
  • Automated data ingestion from both internal databases and external APIs
  • Multi-tenant security enforcing business-level data isolation
  • Self-service analytics for business stakeholders
  • Real-time and batch data pipeline architecture
  • Scalable transformation layer for analytical models

System Architecture

I designed a multi-layered data platform separating ingestion, transformation, and consumption with security integrated throughout:

```mermaid
graph TB
    subgraph Sources
        MS[Microservices<br/>Domain Events]
        API[External APIs<br/>Financial Data]
    end
    subgraph Ingestion
        KAFKA[Kafka Topics<br/>Event Streams]
        DAGSTER[Dagster<br/>Orchestration]
    end
    subgraph Processing
        KC[Kafka Connect<br/>JDBC Sink<br/>+ SMT Pipeline]
        BATCH[Batch API<br/>Integration]
    end
    subgraph Storage
        PG[(PostgreSQL RDS<br/>Analytical Warehouse<br/>+ Row-Level Security)]
    end
    subgraph Transformation
        DBT[DBT<br/>Models + RLS Policies]
    end
    subgraph Consumption
        SUPERSET[Apache Superset<br/>Dashboards]
        AI[AI Data Analyst<br/>Natural Language]
    end
    MS --> KAFKA
    API --> DAGSTER
    KAFKA --> KC
    DAGSTER --> BATCH
    KC --> PG
    BATCH --> PG
    PG --> DBT
    DBT --> PG
    PG --> SUPERSET
    PG --> AI
    DAGSTER -.->|Triggers| DBT
```

Data Ingestion Layer:

Event-Driven Data Replication:

I architected Kafka Connect infrastructure for event-driven data ingestion:

Kafka Connect JDBC Sink Connectors:

  • PostgreSQL RDS sink connectors: Consume domain events from Kafka topics, write to analytical warehouse
  • Connector configurations: Account, business, order, transaction, and reporting entity topics
  • Schema evolution: Automatic table creation and schema updates as event structures change
  • Upsert semantics: Primary key-based upserts ensuring idempotent event processing
  • Delete handling: Tombstone records properly handled for data consistency
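
Below is a minimal sketch of the connector registration pattern described above, submitted through the Kafka Connect REST API. The connector name, topic, connection details, and key field are illustrative placeholders rather than the production configuration.

```python
import json

import requests

# Illustrative JDBC sink connector: upsert semantics keyed on the record key,
# automatic table creation/evolution, and tombstone records handled as deletes.
connector = {
    "name": "pg-sink-transaction-events",  # placeholder connector name
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
        "topics": "transaction-events",  # placeholder topic
        "connection.url": "jdbc:postgresql://warehouse:5432/analytics",
        "connection.user": "connect_writer",
        "connection.password": "<redacted>",
        "insert.mode": "upsert",      # primary key-based upserts -> idempotent processing
        "pk.mode": "record_key",
        "pk.fields": "id",
        "delete.enabled": "true",     # tombstone records become DELETEs
        "auto.create": "true",        # create the target table on first write
        "auto.evolve": "true",        # add columns as event schemas evolve
    },
}

# Register the connector with a Connect worker (fails if it already exists).
response = requests.post(
    "http://kafka-connect:8083/connectors",
    data=json.dumps(connector),
    headers={"Content-Type": "application/json"},
    timeout=30,
)
response.raise_for_status()
```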

Single Message Transforms (SMT) Pipeline:

I designed sophisticated transformation chains for event processing:

  • Event filtering: JMESPath predicates selecting specific event types from polymorphic topics
  • Field extraction: Conditional extraction of event-specific payloads from nested structures
  • Schema flattening: Nested JSON flattened to relational columns for PostgreSQL
  • Field exclusions: Sensitive data dropped during ingestion (keys, credentials, internal identifiers)
  • Key strategy: Custom document ID strategies for upsert operations
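
The fragment below sketches how such a transform chain attaches to the sink connector's config map from the previous example. Field and header names are assumptions, and the production event-type predicate is a custom JMESPath-based implementation; the stock header predicate here only stands in for it.

```python
# Merged into the sink connector's "config" map; names are illustrative.
smt_config = {
    "transforms": "extractPayload,flatten,dropSensitive",
    # Conditionally pull the event-specific payload out of the event envelope.
    "transforms.extractPayload.type": "org.apache.kafka.connect.transforms.ExtractField$Value",
    "transforms.extractPayload.field": "payload",
    "transforms.extractPayload.predicate": "isOrderEvent",
    # Flatten nested JSON structures into relational columns.
    "transforms.flatten.type": "org.apache.kafka.connect.transforms.Flatten$Value",
    "transforms.flatten.delimiter": "_",
    # Drop sensitive fields before they reach the warehouse.
    "transforms.dropSensitive.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
    "transforms.dropSensitive.exclude": "api_key,credentials,internal_ref",
    # Stand-in predicate; production uses a custom JMESPath predicate on the payload.
    "predicates": "isOrderEvent",
    "predicates.isOrderEvent.type": "org.apache.kafka.connect.transforms.predicates.HasHeaderKey",
    "predicates.isOrderEvent.name": "order-event",
}
```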

Batch API Ingestion:

I built Dagster orchestration for scheduled external data ingestion:

  • API integration assets: Dagster assets fetching data from external financial data providers
  • Partitioned assets: Time-based partitioning enabling incremental processing
  • Retry logic: Exponential backoff with configurable retry limits for API failures
  • Data validation: Pydantic schemas validating API responses before storage
  • Incremental loading: Tracking watermarks to fetch only new/updated records
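
A minimal Dagster asset sketch of this pattern is shown below. The provider URL, response shape, and schema fields are assumptions; in the platform the validated records would then be loaded into the PostgreSQL landing schema.

```python
from datetime import datetime

import requests
from dagster import (
    AssetExecutionContext,
    Backoff,
    DailyPartitionsDefinition,
    RetryPolicy,
    asset,
)
from pydantic import BaseModel


class ProviderRecord(BaseModel):
    """Illustrative schema for one record returned by the external provider."""

    record_id: str
    business_id: str
    amount: float
    updated_at: datetime


daily = DailyPartitionsDefinition(start_date="2024-01-01")


@asset(
    partitions_def=daily,
    retry_policy=RetryPolicy(max_retries=5, delay=2, backoff=Backoff.EXPONENTIAL),
)
def provider_transactions(context: AssetExecutionContext) -> list[dict]:
    """Fetch and validate one day of data from a (hypothetical) provider endpoint."""
    day = context.partition_key  # the partition doubles as the incremental watermark
    response = requests.get(
        "https://api.example-provider.com/v1/transactions",  # placeholder URL
        params={"date": day},
        timeout=60,
    )
    response.raise_for_status()
    # Reject malformed payloads before anything is written to the warehouse.
    records = [ProviderRecord.model_validate(item) for item in response.json()["items"]]
    return [record.model_dump() for record in records]
```

With this shape, backfills reduce to launching runs for the missing partitions, and the retry policy absorbs transient provider failures.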

Data Transformation Layer:

DBT Analytical Models:

I designed a DBT project with reusable macros and dimensional models:

Macro Library:

I created reusable DBT macros for common analytical operations:

  • Balance snapshots: Computes end-of-day balance snapshots per business entity
  • Transaction metrics: Aggregates transaction metrics (volume, count, fees) by day and business
  • Cross-business aggregations: Multi-tenant balance aggregation for specific dates
  • RLS policy automation: Automated Row-Level Security policy deployment (detailed below)

Model Organization:

  • Staging models: Raw data cleaning and type casting
  • Intermediate models: Business logic and metric calculations
  • Mart models: Final analytical tables optimized for consumption
  • Tests: Data quality checks (uniqueness, referential integrity, null checks)

Multi-Tenant Security Architecture:

PostgreSQL Row-Level Security (RLS):

I designed and implemented automated RLS policy deployment via DBT macros:

Permission Model:

I created a permission table mapping users to their authorized business entities, establishing the foundation for RLS enforcement. This table stores user email addresses alongside business identifiers, enabling granular access control at the database level.
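
A minimal sketch of that permission table, assuming an `analytics` schema and these column names (the production names may differ):

```python
import psycopg2

PERMISSION_TABLE_DDL = """
CREATE TABLE IF NOT EXISTS analytics.user_business_permission (
    user_email  text NOT NULL,
    business_id text NOT NULL,
    PRIMARY KEY (user_email, business_id)
);
"""

# Applied once over an administrative connection; the DSN is a placeholder.
with psycopg2.connect("postgresql://admin@warehouse:5432/analytics") as conn:
    with conn.cursor() as cur:
        cur.execute(PERMISSION_TABLE_DDL)
```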

RLS Policy Automation:

I designed a DBT macro automating RLS policy deployment across all analytical models. The macro enables Row-Level Security on target tables, creates policies checking user permissions against the permission table, and grants appropriate access to database roles. This automation eliminates manual policy creation, ensures consistency, and simplifies adding new models.
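
The actual implementation is a DBT (Jinja) macro attached as a post-hook; the Python function below only renders the kind of DDL that macro deploys, with assumed permission-table, policy, role, and session-variable names.

```python
def rls_post_hook_sql(relation: str, rls_role: str = "ai_readonly") -> str:
    """Render the DDL an RLS post-hook applies to one analytical model.

    `relation` is the built table (e.g. "analytics.mart_daily_balances");
    the permission table, role, and session variable names are illustrative.
    """
    table_name = relation.split(".")[-1]
    policy_name = f"{table_name}_business_isolation"
    return f"""
ALTER TABLE {relation} ENABLE ROW LEVEL SECURITY;

DROP POLICY IF EXISTS {policy_name} ON {relation};
CREATE POLICY {policy_name} ON {relation}
    FOR SELECT
    TO {rls_role}
    USING (
        EXISTS (
            SELECT 1
            FROM analytics.user_business_permission p
            WHERE p.business_id = {table_name}.business_id
              AND p.user_email = current_setting('app.current_user_email', true)
        )
    );

GRANT SELECT ON {relation} TO {rls_role};
"""
```

In DBT itself the equivalent macro is wired in as a model post-hook, so the policy is re-applied every time a model is rebuilt and new models only need a single line of post-hook configuration.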

RLS Enforcement:

The system enforces multi-tenant isolation through session-scoped filtering:

  • Each database connection sets a user session variable identifying the current user
  • Permission table queries filter by this session variable
  • RLS policies on analytical tables use EXISTS clauses joining to the filtered permission table
  • Database-kernel enforcement prevents application-layer bypass
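
A sketch of what a user-scoped connection looks like from application code, assuming the `ai_readonly` role and the `app.current_user_email` session variable used above (DSN, table, and email are placeholders):

```python
import psycopg2

conn = psycopg2.connect("postgresql://ai_readonly@warehouse:5432/analytics")
with conn, conn.cursor() as cur:
    # Identify the end user for this session; the RLS policies read this value.
    cur.execute(
        "SELECT set_config('app.current_user_email', %s, false)",
        ("analyst@example.com",),
    )
    # Only rows belonging to businesses this user is permitted to see come back.
    cur.execute(
        "SELECT business_id, SUM(amount) FROM analytics.mart_transactions GROUP BY business_id"
    )
    for business_id, total in cur.fetchall():
        print(business_id, total)
conn.close()
```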

Multi-Role Architecture:

I created a role hierarchy supporting different access patterns:

  • Read-only AI role: RLS-enforced access for AI applications requiring user-scoped data
  • Cross-business analytics role: Bypasses RLS for legitimate multi-tenant dashboards
  • Transformation role: DDL privileges for schema management, bypasses RLS during builds
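
The role definitions below sketch this hierarchy; role names are illustrative, and in PostgreSQL terms the isolation hinges on which roles carry the BYPASSRLS attribute.

```python
# Applied by a superuser/migration; NOBYPASSRLS is the PostgreSQL default.
ROLE_DDL = """
-- RLS-enforced role used by the AI data analyst.
CREATE ROLE ai_readonly LOGIN;
GRANT USAGE ON SCHEMA analytics TO ai_readonly;
GRANT SELECT ON ALL TABLES IN SCHEMA analytics TO ai_readonly;

-- Cross-business BI role: explicitly allowed to bypass RLS for multi-tenant dashboards.
CREATE ROLE analytics_bi LOGIN BYPASSRLS;
GRANT USAGE ON SCHEMA analytics TO analytics_bi;
GRANT SELECT ON ALL TABLES IN SCHEMA analytics TO analytics_bi;

-- Transformation role: owns and rebuilds the models, so it needs DDL rights on the schema.
-- As table owner it is not subject to RLS on its own tables unless FORCE ROW LEVEL SECURITY is set.
CREATE ROLE dbt_transformer LOGIN;
GRANT USAGE, CREATE ON SCHEMA analytics TO dbt_transformer;
"""
```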

Analytics Consumption Layer:

Apache Superset Dashboards:

I deployed a self-service BI platform enabling business stakeholders to explore financial data:

  • Containerized deployment: Superset with PostgreSQL backend
  • Database connection: Direct connection to analytical warehouse with cross-business analytics privileges
  • Pre-built dashboards: Executive summary, business performance, transaction analytics
  • Role-based access: Superset RBAC mapping to warehouse roles
  • SQL Lab: Ad-hoc query interface for power users

Technical Implementation

Dagster as Orchestration Backbone:

Dagster serves as the central orchestration engine coordinating all data workflows:

  • Asset-based architecture with automatic dependency resolution between batch ingestion and DBT transformations
  • Sensors triggering downstream processing on new data arrival
  • Time-based partitioning for incremental processing and backfills
  • Integrated observability with run logs, asset lineage visualization, and failure alerting
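
A condensed sketch of the sensor pattern, with a placeholder mart asset and watermark check standing in for the real DBT assets and landing tables:

```python
from dagster import (
    AssetSelection,
    Definitions,
    RunRequest,
    SensorEvaluationContext,
    asset,
    define_asset_job,
    sensor,
)


@asset(group_name="marts")
def daily_business_metrics() -> None:
    """Placeholder mart asset; in the platform this is a DBT model loaded as an asset."""


refresh_marts = define_asset_job("refresh_marts", selection=AssetSelection.groups("marts"))


def latest_landing_watermark() -> str:
    """Placeholder: the real check reads MAX(updated_at) from the landing tables."""
    return "2025-01-01T00:00:00"


@sensor(job=refresh_marts, minimum_interval_seconds=300)
def new_data_sensor(context: SensorEvaluationContext):
    """Trigger downstream processing when new data has arrived since the last cursor."""
    watermark = latest_landing_watermark()
    if context.cursor != watermark:
        yield RunRequest(run_key=watermark)
        context.update_cursor(watermark)


defs = Definitions(
    assets=[daily_business_metrics],
    jobs=[refresh_marts],
    sensors=[new_data_sensor],
)
```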

DBT-Dagster Integration:

DBT models are materialized as Dagster assets, enabling unified workflow management. Incremental models process only changed data for efficiency, while post-hooks automatically deploy RLS policies after table creation. Automated data quality tests run on every transformation, ensuring data integrity.
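
The integration follows the standard dagster-dbt pattern; the manifest path and asset/resource names below are assumptions about the project layout.

```python
from pathlib import Path

from dagster import AssetExecutionContext, Definitions
from dagster_dbt import DbtCliResource, dbt_assets

DBT_PROJECT_DIR = Path("warehouse_dbt")  # placeholder project location


@dbt_assets(manifest=DBT_PROJECT_DIR / "target" / "manifest.json")
def warehouse_dbt_assets(context: AssetExecutionContext, dbt: DbtCliResource):
    """Materialize every DBT model as a Dagster asset.

    `dbt build` runs models and tests together, and model-level post-hooks
    (including the RLS macro) execute as each table is (re)built.
    """
    yield from dbt.cli(["build"], context=context).stream()


defs = Definitions(
    assets=[warehouse_dbt_assets],
    resources={"dbt": DbtCliResource(project_dir=str(DBT_PROJECT_DIR))},
)
```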

Production Operations:

The platform runs in distributed mode with Kafka Connect workers processing events continuously, offset management combined with idempotent upserts providing effectively exactly-once results, and comprehensive monitoring via JMX metrics exported to Prometheus for observability.

Key Design Decisions

Dual ingestion strategy (Event-driven + Batch):

  • Real-time event ingestion from Kafka topics for operational data requiring low latency
  • Batch API ingestion for external providers with rate limits and historical data
  • Unified landing in PostgreSQL RDS for consistent transformations

DBT macro for RLS automation:

  • Eliminates manual policy creation reducing human error
  • Ensures consistent security policies across all models
  • Enables testing RLS policies in development environments
  • Simplifies adding new models (single post-hook configuration)

Three-role security model:

  • Read-only AI role with RLS enforcement for user-scoped data access
  • Cross-business analytics role bypassing RLS for legitimate multi-tenant dashboards
  • Transformation role with DDL privileges for schema management during builds

PostgreSQL RDS as single source of truth:

  • Centralizes data from Kafka events and external APIs
  • Enables SQL-based transformations (DBT)
  • Native RLS support for multi-tenant security
  • ACID transactions for data consistency

Results & Impact

  • Consolidated data from Kafka event streams and external APIs into unified PostgreSQL RDS warehouse
  • Implemented automated Row-Level Security via DBT macros ensuring business-level data isolation
  • Built Dagster orchestration managing batch ingestion from external providers
  • Deployed Kafka Connect event-driven replication with sophisticated SMT pipelines
  • Created reusable DBT macro library for analytical models (balances, metrics, aggregations)
  • Enabled self-service analytics via Apache Superset while maintaining security compliance
  • Established foundation for AI-powered data analyst leveraging RLS architecture
  • Reduced manual reporting effort through automated daily metric calculations

Technologies

Python, Dagster, Kafka Connect, DBT, PostgreSQL RDS, Row-Level Security (RLS), Apache Superset, Docker, Pydantic, SQL