pgsquash-engine: The Technology Behind CAPYSQUASH

The open-source engine that powers CAPYSQUASH - PostgreSQL parser-based consolidation with Docker validation

New to CAPYSQUASH? Start with the CAPYSQUASH Platform for one-click squashing and team collaboration. For command-line usage, see capysquash-cli. This page covers the underlying engine technology for developers who want to understand how it works or use it as a library.

pgsquash-engine is the technology foundation that powers CAPYSQUASH. It's available as an open-source Go library (MIT license) and handles all the parsing, analysis, and consolidation logic. The engine uses PostgreSQL's actual parser (pg_query_go) rather than regex patterns, and validates results with Docker-based schema comparison.

What is pgsquash-engine?

  • Foundation: The technology that powers CAPYSQUASH Platform and capysquash-cli
  • Go Library: Available for custom integrations and tooling
  • Open Source: MIT licensed, fully transparent and auditable
  • Parser-Based: Uses PostgreSQL's pg_query_go parser for accurate SQL analysis

🎯 How pgsquash-engine Works

Parser-Based Approach: pgsquash-engine uses PostgreSQL's actual parser (pg_query_go) to analyze SQL, then validates results through Docker-based schema comparison to ensure the consolidated migrations produce an equivalent schema.

Technical Approach

  • SQL Parsing: PostgreSQL's pg_query_go parser (not regex)
  • Validation: Docker-based schema comparison
  • Dependencies: DAG-based dependency resolution
  • Safety: 4 configurable safety levels
  • Extensibility: Plugin system for third-party patterns
  • License: MIT License (open source)

pgsquash-engine processes migrations through a 5-phase pipeline that handles parsing, analysis, consolidation, generation, and validation.

Phase 1: Parse & Track

Convert raw SQL into structured object lifecycles using PostgreSQL's actual parser

  • Uses pg_query_go (PostgreSQL's C parser) for perfect SQL understanding
  • Builds Abstract Syntax Trees for every statement
  • Tracks object state across all migrations (CREATE → ALTER → DROP)
  • Extracts dependencies and relationships between objects

Key Technologies: pg_query_go, AST analysis, object lifecycle tracking
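The lifecycle-tracking idea can be sketched in Go. This is a simplified illustration, not the engine's implementation: instead of pg_query_go ASTs it classifies statements by their leading keyword, and the `Lifecycle` and `Track` names are invented for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// Lifecycle records every statement that touches one object, in order.
// The real engine derives this from pg_query_go ASTs; this sketch
// classifies statements by their first keyword for illustration only.
type Lifecycle struct {
	Object     string
	Statements []string // e.g. ["CREATE", "ALTER", "DROP"]
}

// Track folds a migration history into per-object lifecycles.
// Each entry is a (object name, SQL statement) pair.
func Track(history [][2]string) map[string]*Lifecycle {
	out := map[string]*Lifecycle{}
	for _, entry := range history {
		obj, sql := entry[0], entry[1]
		lc, ok := out[obj]
		if !ok {
			lc = &Lifecycle{Object: obj}
			out[obj] = lc
		}
		verb := strings.ToUpper(strings.Fields(sql)[0]) // CREATE / ALTER / DROP
		lc.Statements = append(lc.Statements, verb)
	}
	return out
}

// Dead reports objects created and later dropped: their entire history
// can be omitted from the squashed output.
func (l *Lifecycle) Dead() bool {
	n := len(l.Statements)
	return n > 0 && l.Statements[0] == "CREATE" && l.Statements[n-1] == "DROP"
}

func main() {
	lcs := Track([][2]string{
		{"users", "CREATE TABLE users (id int)"},
		{"users", "ALTER TABLE users ADD COLUMN email text"},
		{"temp_jobs", "CREATE TABLE temp_jobs (id int)"},
		{"temp_jobs", "DROP TABLE temp_jobs"},
	})
	fmt.Println(lcs["users"].Statements) // [CREATE ALTER]
	fmt.Println(lcs["temp_jobs"].Dead()) // true
}
```

Tracking full lifecycles (rather than individual statements) is what lets later phases fold a CREATE plus its ALTERs into one statement, or drop a created-then-dropped object entirely.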

Phase 2: Analyze Dependencies

Build intelligent dependency graphs and detect consolidation opportunities

  • Creates directed acyclic graphs (DAGs) of object dependencies
  • Performs topological sorting for optimal execution order
  • Detects circular dependencies and classifies risk levels
  • Identifies consolidation opportunities and potential conflicts

Output: Sorted object list + risk assessment + cycle warnings
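Phase 2's topological sort and cycle detection can be sketched with Kahn's algorithm over a map of object dependencies. This is a minimal stand-in for the engine's DAG machinery; the `topoSort` name and shape are illustrative.

```go
package main

import (
	"fmt"
	"sort"
)

// topoSort orders objects so every dependency precedes its dependents
// (Kahn's algorithm). deps maps an object to the objects it requires.
// ok is false when a cycle prevents a complete ordering, mirroring the
// engine's cycle warnings.
func topoSort(deps map[string][]string) (order []string, ok bool) {
	indeg := map[string]int{}
	dependents := map[string][]string{}
	for obj, reqs := range deps {
		if _, seen := indeg[obj]; !seen {
			indeg[obj] = 0
		}
		for _, r := range reqs {
			if _, seen := indeg[r]; !seen {
				indeg[r] = 0
			}
			dependents[r] = append(dependents[r], obj)
			indeg[obj]++
		}
	}
	var queue []string
	for obj, d := range indeg {
		if d == 0 {
			queue = append(queue, obj)
		}
	}
	sort.Strings(queue) // deterministic starting order
	for len(queue) > 0 {
		obj := queue[0]
		queue = queue[1:]
		order = append(order, obj)
		for _, dep := range dependents[obj] {
			indeg[dep]--
			if indeg[dep] == 0 {
				queue = append(queue, dep)
			}
		}
	}
	return order, len(order) == len(indeg)
}

func main() {
	order, ok := topoSort(map[string][]string{
		"uuid-ossp":       {},
		"users":           {"uuid-ossp"},
		"orders":          {"users"},
		"orders_user_idx": {"orders"},
	})
	fmt.Println(order, ok) // [uuid-ossp users orders orders_user_idx] true
}
```

When the sort cannot consume every node, the remaining nodes form a cycle, which is exactly the condition the engine surfaces as a cycle warning and risk classification.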

Phase 3: Apply Consolidation Rules

Transform migrations based on safety level

Each safety level applies different consolidation strategies:

Conservative (Safest):

  • Multiple CREATE consolidation
  • CREATE β†’ ALTER folding
  • Column evolution tracking

Standard (Recommended):

  • All Conservative rules
  • DROP β†’ CREATE cycle handling
  • RLS policy consolidation
  • Transaction boundary respect

Aggressive (Maximum optimization):

  • All Standard rules
  • Function deduplication
  • Advanced pattern matching

Paranoid (Requires DB connection):

  • All Aggressive rules
  • Dead code removal
  • Live schema analysis
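The cumulative relationship between levels (each includes all rules of the level below it) can be modeled as nested rule sets. The rule names below are shorthand for the bullets above, not the engine's real identifiers.

```go
package main

import "fmt"

// SafetyLevel selects which consolidation rules run. Each level is a
// strict superset of the one below it, as described above.
type SafetyLevel int

const (
	Conservative SafetyLevel = iota
	Standard
	Aggressive
	Paranoid
)

// rulesByLevel holds only the rules each level *adds*; names are
// illustrative shorthand for the bullets above.
var rulesByLevel = [][]string{
	Conservative: {"create-consolidation", "create-alter-folding", "column-evolution"},
	Standard:     {"drop-create-cycles", "rls-consolidation", "transaction-boundaries"},
	Aggressive:   {"function-dedup", "advanced-patterns"},
	Paranoid:     {"dead-code-removal", "live-schema-analysis"},
}

// ActiveRules returns the cumulative rule set for a level.
func ActiveRules(level SafetyLevel) []string {
	var rules []string
	for l := Conservative; l <= level; l++ {
		rules = append(rules, rulesByLevel[l]...)
	}
	return rules
}

func main() {
	for _, l := range []SafetyLevel{Conservative, Standard, Aggressive, Paranoid} {
		fmt.Println(l, len(ActiveRules(l)))
	}
}
```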

Phase 4: Generate Optimized SQL

Build final SQL with proper ordering and post-processing

  • Category-based grouping (Extensions → Tables → Functions → Indexes → Data)
  • Dependency-aware sorting within each category
  • Intelligent post-processing fixes:
    • Fix malformed DROP TRIGGER syntax
    • Ensure extension dependency order
    • Remove orphaned ALTER statements
    • Clean function artifacts
    • Add missing semicolons

Result: Production-ready SQL with consistent formatting
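The category-then-dependency ordering can be sketched as a stable two-key sort. Within a category the real builder uses the Phase 2 dependency order; this illustration simply sorts by name to keep the sketch small.

```go
package main

import (
	"fmt"
	"sort"
)

// Stmt is a consolidated statement tagged with its output category.
type Stmt struct {
	Category string // extension, table, function, index, data
	Name     string
	SQL      string
}

// categoryRank encodes the Extensions → Tables → Functions → Indexes → Data
// ordering described above.
var categoryRank = map[string]int{
	"extension": 0, "table": 1, "function": 2, "index": 3, "data": 4,
}

// OrderOutput sorts statements by category, then by name within a
// category (a stand-in for dependency-aware sorting).
func OrderOutput(stmts []Stmt) []Stmt {
	out := append([]Stmt(nil), stmts...)
	sort.SliceStable(out, func(i, j int) bool {
		ri, rj := categoryRank[out[i].Category], categoryRank[out[j].Category]
		if ri != rj {
			return ri < rj
		}
		return out[i].Name < out[j].Name
	})
	return out
}

func main() {
	for _, s := range OrderOutput([]Stmt{
		{"index", "users_email_idx", "CREATE INDEX ..."},
		{"table", "users", "CREATE TABLE ..."},
		{"extension", "uuid-ossp", "CREATE EXTENSION ..."},
	}) {
		fmt.Println(s.Category, s.Name)
	}
}
```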

Phase 5: Mathematical Proof

Rigorous validation of schema equivalence

Three validation approaches available:

TWO_CONTAINERS (Most accurate):

  • Separate containers for original and squashed migrations
  • Apply migrations independently
  • Compare schemas via pg_dump

TWO_DATABASES (Balanced):

  • Single container, two databases
  • Apply migrations to different databases
  • Compare via SQL diff queries

SCHEMA_DIFF (Fastest):

  • Single database, sequential application
  • Capture schema state before/after
  • Compare captured schemas

Result: Mathematical proof that squashed migrations produce identical schemas
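The comparison step shared by these approaches can be sketched as a normalize-and-diff over two schema dumps. The real validator produces the dumps with pg_dump inside Docker containers; this sketch only shows an order-insensitive textual comparison and is not the engine's actual algorithm.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// SchemasEquivalent normalizes two schema dumps (drops comments and
// blank lines, ignores statement order) and compares the results.
// A real comparison would parse statements rather than match lines.
func SchemasEquivalent(dumpA, dumpB string) bool {
	norm := func(dump string) []string {
		var stmts []string
		for _, line := range strings.Split(dump, "\n") {
			line = strings.TrimSpace(line)
			if line == "" || strings.HasPrefix(line, "--") {
				continue
			}
			stmts = append(stmts, line)
		}
		sort.Strings(stmts) // order-insensitive comparison
		return stmts
	}
	a, b := norm(dumpA), norm(dumpB)
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func main() {
	original := "-- original dump\nCREATE TABLE users (id int);\nCREATE INDEX i ON users (id);"
	squashed := "CREATE INDEX i ON users (id);\nCREATE TABLE users (id int);"
	fmt.Println(SchemasEquivalent(original, squashed)) // true
}
```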

📦 Public API Packages

pgsquash-engine exports four public packages for different use cases:

pkg/cli - CLI Execution API

Build custom CLI wrappers or integrate the full CLI into your applications.

import "github.com/CAPYSQUASH/pgsquash-engine/pkg/cli"

// Execute the full CLI
cli.SetVersionInfo("0.9.7", "2025-10-21", "abc123")
cli.SetBrandName("capysquash")
cli.Execute()

Use cases: Custom branded CLIs, CI/CD integration, CLI automation

pkg/engine - Library API

Use pgsquash programmatically for custom migration tools and workflows.

import (
    "fmt"

    "github.com/CAPYSQUASH/pgsquash-engine/pkg/engine"
)

// Squash migrations programmatically
result, err := engine.SquashDirectory("./migrations", &engine.Config{
    SafetyLevel: engine.Standard,
    Verbose:     true,
})
if err != nil {
    panic(err)
}
fmt.Println(result.SQL)

Use cases: Custom migration tools, automated workflows, batch processing, library integration

pkg/plugins - Plugin Management

Register and manage third-party integrations.

import "github.com/CAPYSQUASH/pgsquash-engine/pkg/plugins"

// Register all built-in plugins (Supabase, Clerk, Prisma, Drizzle)
plugins.RegisterDefault()

pkg/utils - Logging Utilities

Structured logging for your applications.

import (
    "os"

    "github.com/CAPYSQUASH/pgsquash-engine/pkg/utils"
)

logger := utils.NewLogger(utils.LogLevelInfo, os.Stdout)
utils.SetDefaultLogger(logger)

View complete API documentation →

πŸ—οΈ Technical Architecture

Domain-Driven Design

pgsquash-engine is organized into 8 specialized domains, each with clear responsibilities:

Parser Domain (internal/parser/)

  • Purpose: Convert raw SQL into structured data
  • Key Files: parser.go, dependencies.go, statement_analyzer.go
  • Technology: pg_query_go for PostgreSQL-grade parsing
  • Output: Structured Migration objects with Statement arrays

Tracking Domain (internal/tracking/)

  • Purpose: Build object lifecycle representations
  • Key Files: tracker.go, lifecycle.go, dependency_graph.go
  • Features: Cycle detection, risk assessment, consolidation rules
  • Output: ObjectLifecycle objects + consolidation results

Squasher Domain (internal/squasher/)

  • Purpose: Orchestrate the entire consolidation process
  • Key Files: engine.go, unified_resolver.go, extension_detector.go
  • Responsibility: Phase orchestration, plugin management, statistics

Builder Domain (internal/builder/)

  • Purpose: Construct optimized SQL output
  • Key Files: sql_builder.go, formatter.go
  • Features: Category grouping, dependency sorting, consistent formatting

Validation Domain (internal/validation/)

  • Purpose: Verify schema equivalence
  • Key Files: docker_validator.go, schema_diff.go, extension_detector.go
  • Technology: Docker containers, schema comparison algorithms

Plugin Domain (internal/plugins/)

  • Purpose: Third-party integration support
  • Supported: Clerk, Supabase, Auth0, NextAuth patterns
  • Features: Pattern detection, compatibility SQL, custom rules

πŸŽ›οΈ Advanced Features

Plugin System

pgsquash-engine automatically detects and activates plugins for third-party services:

# Example: Clerk JWT v2 detection
[plugins] Discovering plugins from 15 migrations...
[plugins] Activated 2 plugins: [clerk supabase]

Supported Plugins:

  • Clerk: JWT v2 table pattern detection and preservation
  • Supabase: Auth schema, RLS policies, and function patterns
  • Auth0: Custom authentication table patterns
  • Neon: Serverless PostgreSQL specific optimizations
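A plugin system like this is commonly modeled as a detection interface: each plugin inspects migration SQL and reports whether its patterns occur. The interface and the naive Supabase detector below are illustrative sketches, not pgsquash-engine's real plugin API.

```go
package main

import (
	"fmt"
	"strings"
)

// Plugin is an illustrative shape for third-party pattern detection.
// The engine's actual plugin interface may differ.
type Plugin interface {
	Name() string
	Detect(migrationSQL string) bool
}

type supabasePlugin struct{}

func (supabasePlugin) Name() string { return "supabase" }

// Detect uses a naive signal: references to the auth schema or RLS.
func (supabasePlugin) Detect(sql string) bool {
	return strings.Contains(sql, "auth.") || strings.Contains(sql, "ROW LEVEL SECURITY")
}

// Discover returns the plugins whose patterns appear in any migration,
// mirroring the "[plugins] Activated ..." log line above.
func Discover(plugins []Plugin, migrations []string) []string {
	var active []string
	for _, p := range plugins {
		for _, m := range migrations {
			if p.Detect(m) {
				active = append(active, p.Name())
				break
			}
		}
	}
	return active
}

func main() {
	active := Discover(
		[]Plugin{supabasePlugin{}},
		[]string{"ALTER TABLE posts ENABLE ROW LEVEL SECURITY;"},
	)
	fmt.Println(active) // [supabase]
}
```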

Safety Levels

Choose your optimization aggressiveness based on your needs:

  • Paranoid: 20-30% file reduction, maximum safety. Best for first-time use and production databases.
  • Conservative: 50-60% file reduction, high safety. Best for production databases and risk-averse teams.
  • Standard: 70-80% file reduction, balanced safety. Best for most use cases (recommended).
  • Aggressive: 85-95% file reduction, moderate safety. Best for development and fresh starts.

AI-Powered Analysis (Optional)

When enabled, pgsquash uses AI for advanced analysis:

  • Function equivalency detection: Identify duplicate functions with different names
  • Dead code detection: Find unused tables, indexes, and functions
  • Semantic analysis: Understand business logic patterns
  • Optimization suggestions: Recommend further improvements

📊 Example Results

Case Study: Startup Migration Cleanup

Before pgsquash:

  • 287 migration files
  • 15 minutes deployment time
  • Frequent merge conflicts
  • 2.3GB migration history

After pgsquash (Standard safety):

  • 12 migration files (96% reduction)
  • 45 seconds deployment time (20x faster)
  • No merge conflicts
  • 120MB optimized history

Performance Metrics

  • File Count: 287 → 12 (96% reduction)
  • Deployment Time: 15 min → 45 sec (20x faster)
  • Repository Size: 2.3GB → 120MB (95% smaller)
  • CI/CD Time: 8 min → 2 min (4x faster)

🚀 Using pgsquash-engine

As a Go Library (For Developers)

# Install as a Go module
go get github.com/CAPYSQUASH/pgsquash-engine

import "github.com/CAPYSQUASH/pgsquash-engine/pkg/engine"

// Use programmatically
result, err := engine.SquashDirectory("./migrations", &engine.Config{
    SafetyLevel: engine.Standard,
})

View Library API Documentation →

Via capysquash-cli (For CLI Usage)

# Install capysquash-cli for command-line usage
brew install capysquash-cli

# Or download from releases
# https://github.com/CAPYSQUASH/capysquash-cli/releases

# Use the CLI
capysquash analyze migrations/
capysquash squash migrations/ --safety=standard

View CLI Documentation →

Configuration

Create pgsquash.config.json:

{
  "safetyLevel": "standard",
  "plugins": {
    "supabase": true,
    "clerk": true
  },
  "validation": {
    "enabled": true,
    "approach": "two_databases"
  },
  "ai": {
    "enabled": false,
    "provider": "openai"
  }
}

🔧 Advanced Usage

API Server Mode

Run pgsquash as an HTTP API server:

# Start the API server
api-server --port 8080

# Analyze via API
curl -X POST http://localhost:8080/analyze \
  -H "Content-Type: application/json" \
  -d '{"migrations": [...], "safetyLevel": "standard"}'

GitHub Integration

Set up automatic PR analysis:

# Install GitHub App (via platform)
# Configure webhook endpoint
# Automatic analysis on PR push/update

Docker Validation

Ensure mathematical equivalence:

# Validate consolidation
pgsquash validate original/ squashed/ --approach=two_containers

# Result: ☑ Validation successful: Schemas are equivalent

🎯 When to Use Each Component

Use CAPYSQUASH Platform When:

  • Getting started with migration squashing - one-click setup
  • Team collaboration and shared projects are needed
  • Automated GitHub integration and PR analysis are required
  • Visual interface for migration analysis and optimization
  • Analytics and reporting on optimization impact are valuable
  • Scheduled automation and webhook notifications are desired

Use capysquash-cli (Power Users) When:

  • Local development and quick migration analysis
  • CI/CD pipelines needing command-line automation
  • Individual developers who prefer terminal workflows
  • Quick checks before committing migrations

Use pgsquash-engine (Library) When:

  • Building custom tools that need programmatic migration optimization
  • Integrating into existing systems and workflows
  • Creating custom automation with full control
  • Understanding the internals of how CAPYSQUASH works

📚 Next Steps


Ready to start? Try CAPYSQUASH Platform for the easiest experience, or install capysquash-cli for command-line usage. Developers can use the Library API for custom integrations.
