pgsquash-engine: The Technology Behind CAPYSQUASH
The open-source engine that powers CAPYSQUASH - PostgreSQL parser-based consolidation with Docker validation
New to CAPYSQUASH? Start with the CAPYSQUASH Platform for one-click squashing and team collaboration. For command-line usage, see capysquash-cli. This page covers the underlying engine technology for developers who want to understand how it works or use it as a library.
pgsquash-engine is the technology foundation that powers CAPYSQUASH. It's available as an open-source Go library (MIT license) and handles all the parsing, analysis, and consolidation logic. The engine uses PostgreSQL's actual parser (pg_query_go) rather than regex patterns, and validates results with Docker-based schema comparison.
What is pgsquash-engine?
- Foundation: The technology that powers CAPYSQUASH Platform and capysquash-cli
- Go Library: Available for custom integrations and tooling
- Open Source: MIT licensed, fully transparent and auditable
- Parser-Based: Uses PostgreSQL's `pg_query_go` parser for accurate SQL analysis
How pgsquash-engine Works
Parser-Based Approach: pgsquash-engine uses PostgreSQL's actual parser (pg_query_go) to analyze SQL, then validates results through Docker-based schema comparison to ensure the consolidated migrations produce an equivalent schema.
Technical Approach
| Aspect | pgsquash-engine Approach |
|---|---|
| SQL Parsing | PostgreSQL's pg_query_go parser (not regex) |
| Validation | Docker-based schema comparison |
| Dependencies | DAG-based dependency resolution |
| Safety | 4 configurable safety levels |
| Extensibility | Plugin system for third-party patterns |
| License | MIT License - open source |
pgsquash-engine processes migrations through a 5-phase pipeline that handles parsing, analysis, consolidation, generation, and validation.
Phase 1: Parse & Track
Convert raw SQL into structured object lifecycles using PostgreSQL's actual parser
- Uses `pg_query_go` (PostgreSQL's actual C parser) for accurate SQL understanding
- Builds Abstract Syntax Trees for every statement
- Tracks object state across all migrations (CREATE → ALTER → DROP)
- Extracts dependencies and relationships between objects
Key Technologies: pg_query_go, AST analysis, object lifecycle tracking
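The lifecycle-tracking idea can be sketched in a few lines. This is an illustrative stand-in, not the engine's real code: the names (`Lifecycle`, `trackLifecycles`) are hypothetical, and where the real engine walks `pg_query_go` ASTs, this sketch uses a crude keyword scan.

```go
package main

import (
	"fmt"
	"strings"
)

// Lifecycle records every statement that touches one object, in migration
// order. Illustrative only; the engine's real types live in internal/tracking.
type Lifecycle struct {
	Object     string
	Statements []string
}

// trackLifecycles groups statements by the table they affect, using a naive
// keyword scan where the real engine inspects parsed ASTs.
func trackLifecycles(statements []string) map[string]*Lifecycle {
	lifecycles := map[string]*Lifecycle{}
	for _, stmt := range statements {
		fields := strings.Fields(stmt)
		var obj string
		for i, f := range fields {
			if strings.EqualFold(f, "TABLE") && i+1 < len(fields) {
				obj = strings.TrimRight(fields[i+1], ";(")
				break
			}
		}
		if obj == "" {
			continue
		}
		lc, ok := lifecycles[obj]
		if !ok {
			lc = &Lifecycle{Object: obj}
			lifecycles[obj] = lc
		}
		lc.Statements = append(lc.Statements, stmt)
	}
	return lifecycles
}

func main() {
	lcs := trackLifecycles([]string{
		"CREATE TABLE users (id int);",
		"ALTER TABLE users ADD COLUMN email text;",
		"DROP TABLE sessions;",
	})
	// "users" accumulates its CREATE and ALTER; "sessions" only its DROP.
	fmt.Println(len(lcs["users"].Statements), len(lcs["sessions"].Statements))
}
```

Once each object's history is grouped like this, the later phases can fold or cancel statements within a single lifecycle instead of scanning every migration repeatedly.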
Phase 2: Analyze Dependencies
Build intelligent dependency graphs and detect consolidation opportunities
- Creates directed acyclic graphs (DAGs) of object dependencies
- Performs topological sorting for optimal execution order
- Detects circular dependencies and classifies risk levels
- Identifies consolidation opportunities and potential conflicts
Output: Sorted object list + risk assessment + cycle warnings
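The ordering step here is a standard topological sort over the dependency DAG. A self-contained sketch using Kahn's algorithm with cycle detection (illustrative, not the engine's implementation):

```go
package main

import "fmt"

// topoSort orders objects so every dependency precedes its dependents.
// deps maps an object to the objects it depends on. The boolean reports
// whether a circular dependency was found; the engine classifies cycles
// as higher-risk rather than failing outright.
func topoSort(deps map[string][]string) ([]string, bool) {
	indegree := map[string]int{}
	dependents := map[string][]string{}
	for obj, ds := range deps {
		if _, ok := indegree[obj]; !ok {
			indegree[obj] = 0
		}
		for _, d := range ds {
			if _, ok := indegree[d]; !ok {
				indegree[d] = 0
			}
			indegree[obj]++
			dependents[d] = append(dependents[d], obj)
		}
	}
	var queue, order []string
	for obj, deg := range indegree {
		if deg == 0 {
			queue = append(queue, obj)
		}
	}
	for len(queue) > 0 {
		obj := queue[0]
		queue = queue[1:]
		order = append(order, obj)
		for _, dep := range dependents[obj] {
			indegree[dep]--
			if indegree[dep] == 0 {
				queue = append(queue, dep)
			}
		}
	}
	// Leftover nodes mean a cycle kept their indegree above zero.
	return order, len(order) != len(indegree)
}

func main() {
	// The orders table depends on users; the index depends on orders.
	order, cyclic := topoSort(map[string][]string{
		"orders":          {"users"},
		"idx_orders_user": {"orders"},
	})
	fmt.Println(order, cyclic) // [users orders idx_orders_user] false
}
```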
Phase 3: Apply Consolidation Rules
Transform migrations based on safety level
Each safety level applies different consolidation strategies:
Conservative (Safest):
- Multiple CREATE consolidation
- CREATE → ALTER folding
- Column evolution tracking
Standard (Recommended):
- All Conservative rules
- DROP → CREATE cycle handling
- RLS policy consolidation
- Transaction boundary respect
Aggressive (Maximum optimization):
- All Standard rules
- Function deduplication
- Advanced pattern matching
Paranoid (Requires DB connection):
- All Aggressive rules
- Dead code removal
- Live schema analysis
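As one concrete example of a Conservative-level rule, CREATE → ALTER folding merges a later `ADD COLUMN` into the original `CREATE TABLE`. A string-level sketch of the idea, with hypothetical names; the real engine performs this transformation on `pg_query_go` ASTs, not on raw strings:

```go
package main

import (
	"fmt"
	"strings"
)

// foldAddColumn folds `ALTER TABLE t ADD COLUMN ...` into a matching
// `CREATE TABLE t (...)` statement. Purely illustrative.
func foldAddColumn(createStmt, alterStmt string) string {
	const marker = "ADD COLUMN "
	i := strings.Index(alterStmt, marker)
	if i < 0 {
		return createStmt
	}
	column := strings.TrimSuffix(strings.TrimSpace(alterStmt[i+len(marker):]), ";")
	// Insert the new column just before the CREATE's closing parenthesis.
	j := strings.LastIndex(createStmt, ")")
	if j < 0 {
		return createStmt
	}
	return createStmt[:j] + ", " + column + createStmt[j:]
}

func main() {
	folded := foldAddColumn(
		"CREATE TABLE users (id int);",
		"ALTER TABLE users ADD COLUMN email text;",
	)
	fmt.Println(folded) // CREATE TABLE users (id int, email text);
}
```

Two migration files collapse into one statement, which is why even the safest level already delivers a meaningful file-count reduction.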
Phase 4: Generate Optimized SQL
Build final SQL with proper ordering and post-processing
- Category-based grouping (Extensions → Tables → Functions → Indexes → Data)
- Dependency-aware sorting within each category
- Intelligent post-processing fixes:
- Fix malformed DROP TRIGGER syntax
- Ensure extension dependency order
- Remove orphaned ALTER statements
- Clean function artifacts
- Add missing semicolons
Result: Production-ready SQL with consistent formatting
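The category-based grouping amounts to a stable sort by category rank, which preserves the dependency-aware order within each category. An illustrative sketch (the ranks mirror the ordering above; the type names are hypothetical):

```go
package main

import (
	"fmt"
	"sort"
)

// categoryRank mirrors the output ordering described above:
// Extensions → Tables → Functions → Indexes → Data.
var categoryRank = map[string]int{
	"extension": 0,
	"table":     1,
	"function":  2,
	"index":     3,
	"data":      4,
}

type stmt struct {
	category string
	sql      string
}

// orderStatements groups statements by category; the stable sort keeps the
// dependency-aware order intact within each category.
func orderStatements(stmts []stmt) []stmt {
	out := append([]stmt(nil), stmts...)
	sort.SliceStable(out, func(i, j int) bool {
		return categoryRank[out[i].category] < categoryRank[out[j].category]
	})
	return out
}

func main() {
	ordered := orderStatements([]stmt{
		{"index", "CREATE INDEX idx_users_email ON users (email);"},
		{"extension", `CREATE EXTENSION IF NOT EXISTS "uuid-ossp";`},
		{"table", "CREATE TABLE users (id int, email text);"},
	})
	for _, s := range ordered {
		fmt.Println(s.sql) // extension, then table, then index
	}
}
```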
Phase 5: Validate Equivalence
Rigorous validation of schema equivalence
Three validation approaches available:
TWO_CONTAINERS (Most accurate):
- Separate containers for original and squashed migrations
- Apply migrations independently
- Compare schemas via `pg_dump`
TWO_DATABASES (Balanced):
- Single container, two databases
- Apply migrations to different databases
- Compare via SQL diff queries
SCHEMA_DIFF (Fastest):
- Single database, sequential application
- Capture schema state before/after
- Compare captured schemas
Result: Validated evidence that the squashed migrations produce an identical schema
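Whichever approach is chosen, the final step boils down to comparing two schema dumps after normalization. A simplified sketch of that comparison; the real validator normalizes `pg_dump` output far more carefully than this:

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeDump strips comments and blank lines so cosmetic differences
// between two pg_dump outputs don't produce false mismatches.
func normalizeDump(dump string) []string {
	var lines []string
	for _, line := range strings.Split(dump, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "--") {
			continue
		}
		lines = append(lines, line)
	}
	return lines
}

// schemasEquivalent reports whether two normalized dumps match line for line.
func schemasEquivalent(original, squashed string) bool {
	a, b := normalizeDump(original), normalizeDump(squashed)
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func main() {
	original := "-- dumped by pg_dump\nCREATE TABLE users (id int);\n"
	squashed := "CREATE TABLE users (id int);\n-- consolidated\n"
	fmt.Println(schemasEquivalent(original, squashed)) // true
}
```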
Public API Packages
pgsquash-engine now exports 4 public packages for different use cases:
pkg/cli - CLI Execution API
Build custom CLI wrappers or integrate the full CLI into your applications.
```go
import "github.com/CAPYSQUASH/pgsquash-engine/pkg/cli"

// Execute the full CLI
cli.SetVersionInfo("0.9.7", "2025-10-21", "abc123")
cli.SetBrandName("capysquash")
cli.Execute()
```

Use cases: Custom branded CLIs, CI/CD integration, CLI automation
pkg/engine - Library API
Use pgsquash programmatically for custom migration tools and workflows.
```go
import (
	"fmt"

	"github.com/CAPYSQUASH/pgsquash-engine/pkg/engine"
)

// Squash migrations programmatically
result, err := engine.SquashDirectory("./migrations", &engine.Config{
	SafetyLevel: engine.Standard,
	Verbose:     true,
})
if err != nil {
	panic(err)
}
fmt.Println(result.SQL)
```

Use cases: Custom migration tools, automated workflows, batch processing, library integration
pkg/plugins - Plugin Management
Register and manage third-party integrations.
```go
import "github.com/CAPYSQUASH/pgsquash-engine/pkg/plugins"

// Register all built-in plugins (Supabase, Clerk, Prisma, Drizzle)
plugins.RegisterDefault()
```

pkg/utils - Logging Utilities
Structured logging for your applications.
```go
import (
	"os"

	"github.com/CAPYSQUASH/pgsquash-engine/pkg/utils"
)

logger := utils.NewLogger(utils.LogLevelInfo, os.Stdout)
utils.SetDefaultLogger(logger)
```

View complete API documentation →
Technical Architecture
Domain-Driven Design
pgsquash-engine is organized into 8 specialized domains, each with clear responsibilities:
Parser Domain (internal/parser/)
- Purpose: Convert raw SQL into structured data
- Key Files: `parser.go`, `dependencies.go`, `statement_analyzer.go`
- Technology: `pg_query_go` for PostgreSQL-grade parsing
- Output: Structured `Migration` objects with `Statement` arrays
Tracking Domain (internal/tracking/)
- Purpose: Build object lifecycle representations
- Key Files: `tracker.go`, `lifecycle.go`, `dependency_graph.go`
- Features: Cycle detection, risk assessment, consolidation rules
- Output: `ObjectLifecycle` objects + consolidation results
Squasher Domain (internal/squasher/)
- Purpose: Orchestrate the entire consolidation process
- Key Files: `engine.go`, `unified_resolver.go`, `extension_detector.go`
- Responsibility: Phase orchestration, plugin management, statistics
Builder Domain (internal/builder/)
- Purpose: Construct optimized SQL output
- Key Files: `sql_builder.go`, `formatter.go`
- Features: Category grouping, dependency sorting, consistent formatting
Validation Domain (internal/validation/)
- Purpose: Verify schema equivalence
- Key Files: `docker_validator.go`, `schema_diff.go`, `extension_detector.go`
- Technology: Docker containers, schema comparison algorithms
Plugin Domain (internal/plugins/)
- Purpose: Third-party integration support
- Supported: Clerk, Supabase, Auth0, NextAuth patterns
- Features: Pattern detection, compatibility SQL, custom rules
Advanced Features
Plugin System
pgsquash-engine automatically detects and activates plugins for third-party services:
```shell
# Example: Clerk JWT v2 detection
[plugins] Discovering plugins from 15 migrations...
[plugins] Activated 2 plugins: [clerk supabase]
```

Supported Plugins:
- Clerk: JWT v2 table pattern detection and preservation
- Supabase: Auth schema, RLS policies, and function patterns
- Auth0: Custom authentication table patterns
- Neon: Serverless PostgreSQL specific optimizations
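Conceptually, a plugin implements detection against the migration set and, when its patterns match, contributes its own consolidation rules. A hypothetical interface sketch; these names are illustrative, not the engine's real `pkg/plugins` API:

```go
package main

import (
	"fmt"
	"strings"
)

// Plugin is a hypothetical shape for a third-party integration: it inspects
// the migrations and activates itself when its patterns are present.
type Plugin interface {
	Name() string
	Detect(migrations []string) bool
}

// clerkPlugin activates when Clerk-prefixed tables appear in the migrations.
// The `clerk_` prefix is an assumption for the sake of the example.
type clerkPlugin struct{}

func (clerkPlugin) Name() string { return "clerk" }
func (clerkPlugin) Detect(migrations []string) bool {
	for _, m := range migrations {
		if strings.Contains(m, "clerk_") {
			return true
		}
	}
	return false
}

// discover returns the names of plugins whose patterns match, echoing the
// "Activated N plugins" log line shown above.
func discover(plugins []Plugin, migrations []string) []string {
	var active []string
	for _, p := range plugins {
		if p.Detect(migrations) {
			active = append(active, p.Name())
		}
	}
	return active
}

func main() {
	active := discover(
		[]Plugin{clerkPlugin{}},
		[]string{"CREATE TABLE clerk_users (id text);"},
	)
	fmt.Println(active) // [clerk]
}
```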
Safety Levels
Choose your optimization aggressiveness based on your needs:
| Level | File Reduction | Safety | Best For |
|---|---|---|---|
| Paranoid | 20-30% | Maximum | First-time use, production databases |
| Conservative | 50-60% | High | Production databases, risk-averse teams |
| Standard | 70-80% | Balanced | Most use cases (recommended) |
| Aggressive | 85-95% | Moderate | Development, fresh starts |
AI-Powered Analysis (Optional)
When enabled, pgsquash uses AI for advanced analysis:
- Function equivalency detection: Identify duplicate functions with different names
- Dead code detection: Find unused tables, indexes, and functions
- Semantic analysis: Understand business logic patterns
- Optimization suggestions: Recommend further improvements
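The simplest form of function equivalency detection needs no AI at all: normalize function bodies before comparing them. A naive sketch of the idea; real detection (AST-based or AI-assisted) goes much further:

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeBody lowercases and collapses whitespace so trivially reformatted
// copies of a function compare equal.
func normalizeBody(body string) string {
	return strings.Join(strings.Fields(strings.ToLower(body)), " ")
}

// duplicateBodies groups function names by normalized body and returns the
// groups with more than one member: candidates for deduplication.
func duplicateBodies(funcs map[string]string) [][]string {
	byBody := map[string][]string{}
	for name, body := range funcs {
		key := normalizeBody(body)
		byBody[key] = append(byBody[key], name)
	}
	var dups [][]string
	for _, names := range byBody {
		if len(names) > 1 {
			dups = append(dups, names)
		}
	}
	return dups
}

func main() {
	dups := duplicateBodies(map[string]string{
		"touch_updated_at": "BEGIN NEW.updated_at = now(); RETURN NEW; END;",
		"set_updated_at":   "begin\n  new.updated_at = now();\n  return new;\nend;",
		"audit_insert":     "BEGIN INSERT INTO audit VALUES (NEW.id); RETURN NEW; END;",
	})
	fmt.Println(len(dups)) // one duplicate group
}
```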
Example Results
Case Study: Startup Migration Cleanup
Before pgsquash:
- 287 migration files
- 15 minutes deployment time
- Frequent merge conflicts
- 2.3GB migration history
After pgsquash (Standard safety):
- 12 migration files (96% reduction)
- 45 seconds deployment time (20x faster)
- No merge conflicts
- 120MB optimized history
Performance Metrics
| Metric | Before | After | Improvement |
|---|---|---|---|
| File Count | 287 | 12 | 96% reduction |
| Deployment Time | 15 min | 45 sec | 20x faster |
| Repository Size | 2.3GB | 120MB | 95% smaller |
| CI/CD Time | 8 min | 2 min | 4x faster |
Using pgsquash-engine
As a Library (Recommended)
```shell
# Install as a Go module
go get github.com/CAPYSQUASH/pgsquash-engine
```

```go
import "github.com/CAPYSQUASH/pgsquash-engine/pkg/engine"

// Use programmatically
result, err := engine.SquashDirectory("./migrations", &engine.Config{
	SafetyLevel: engine.Standard,
})
```

View Library API Documentation →
Via capysquash-cli (For CLI Usage)
```shell
# Install capysquash-cli for command-line usage
brew install capysquash-cli

# Or download from releases
# https://github.com/CAPYSQUASH/capysquash-cli/releases

# Use the CLI
capysquash analyze migrations/
capysquash squash migrations/ --safety=standard
```

Configuration
Create pgsquash.config.json:
```json
{
  "safetyLevel": "standard",
  "plugins": {
    "supabase": true,
    "clerk": true
  },
  "validation": {
    "enabled": true,
    "approach": "two_databases"
  },
  "ai": {
    "enabled": false,
    "provider": "openai"
  }
}
```

Advanced Usage
API Server Mode
Run pgsquash as an HTTP API server:
```shell
# Start the API server
api-server --port 8080

# Analyze via API
curl -X POST http://localhost:8080/analyze \
  -H "Content-Type: application/json" \
  -d '{"migrations": [...], "safetyLevel": "standard"}'
```

GitHub Integration
Set up automatic PR analysis:
```shell
# Install GitHub App (via platform)
# Configure webhook endpoint
# Automatic analysis on PR push/update
```

Docker Validation
Verify schema equivalence:
```shell
# Validate consolidation
pgsquash validate original/ squashed/ --approach=two_containers

# Result: Validation successful: Schemas are equivalent
```

When to Use Each Component
Use CAPYSQUASH Platform (Recommended) When:
- Getting started with migration squashing - one-click setup
- Team collaboration and shared projects are needed
- Automated GitHub integration and PR analysis are required
- Visual interface for migration analysis and optimization
- Analytics and reporting on optimization impact are valuable
- Scheduled automation and webhook notifications are desired
Use capysquash-cli (Power Users) When:
- Local development and quick migration analysis
- CI/CD pipelines needing command-line automation
- Individual developers who prefer terminal workflows
- Quick checks before committing migrations
Use pgsquash-engine (Library) When:
- Building custom tools that need programmatic migration optimization
- Integrating into existing systems and workflows
- Creating custom automation with full control
- Understanding the internals of how CAPYSQUASH works
Next Steps
- CAPYSQUASH Platform - Get started with one-click squashing
- capysquash-cli Documentation - Command-line usage
- Library API Guide - Use pgsquash-engine programmatically
- Configuration Guide - Advanced configuration options
- PostgreSQL Internals - Deep dive into parser technology
Ready to start? Try CAPYSQUASH Platform for the easiest experience, or install capysquash-cli for command-line usage. Developers can use the Library API for custom integrations.