pgsquash-engine: The Technology Behind CAPYSQUASH
The open-source engine that powers CAPYSQUASH - PostgreSQL parser-based consolidation with Docker validation
New to CAPYSQUASH? Start with the CAPYSQUASH Platform for one-click squashing and team collaboration. For command-line usage, see capysquash-cli. This page covers the underlying engine technology for developers who want to understand how it works or use it as a library.
pgsquash-engine is the technology foundation that powers CAPYSQUASH. It's available as an open-source Go library (MIT license) and handles all the parsing, analysis, and consolidation logic. The engine uses PostgreSQL's actual parser (pg_query_go) rather than regex patterns, and validates results with Docker-based schema comparison.
What is pgsquash-engine?
- Foundation: The technology that powers CAPYSQUASH Platform and capysquash-cli
- Go Library: Available for custom integrations and tooling
- Open Source: MIT licensed, fully transparent and auditable
- Parser-Based: Uses PostgreSQL's `pg_query_go` parser for accurate SQL analysis
How pgsquash-engine Works
Parser-Based Approach: pgsquash-engine uses PostgreSQL's actual parser (pg_query_go) to analyze SQL, then validates results through Docker-based schema comparison to ensure the consolidated migrations produce an equivalent schema.
Technical Approach
| Aspect | pgsquash-engine Approach |
|---|---|
| SQL Parsing | PostgreSQL's pg_query_go parser (not regex) |
| Validation | Docker-based schema comparison |
| Dependencies | DAG-based dependency resolution |
| Safety | 4 configurable safety levels |
| Extensibility | Plugin system for third-party patterns |
| License | MIT License - open source |
pgsquash-engine processes migrations through a 5-phase pipeline that handles parsing, analysis, consolidation, generation, and validation.
Phase 1: Parse & Track
Convert raw SQL into structured object lifecycles using PostgreSQL's actual parser
- Uses `pg_query_go` (PostgreSQL's actual C parser) for accurate SQL understanding
- Builds Abstract Syntax Trees for every statement
- Tracks object state across all migrations (CREATE → ALTER → DROP)
- Extracts dependencies and relationships between objects
Key Technologies: pg_query_go, AST analysis, object lifecycle tracking
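The lifecycle-tracking idea can be sketched in a few lines. This is an illustrative stand-in, not the engine's real code: the names (`Lifecycle`, `trackLifecycles`) are hypothetical, and where the real engine walks `pg_query_go` ASTs, this sketch uses a crude keyword scan.

```go
package main

import (
	"fmt"
	"strings"
)

// Lifecycle records every statement that touches one object, in migration
// order. Illustrative only; the engine's real types live in internal/tracking.
type Lifecycle struct {
	Object     string
	Statements []string
}

// trackLifecycles groups statements by the table they affect, using a naive
// keyword scan where the real engine inspects parsed ASTs.
func trackLifecycles(statements []string) map[string]*Lifecycle {
	lifecycles := map[string]*Lifecycle{}
	for _, stmt := range statements {
		fields := strings.Fields(stmt)
		var obj string
		for i, f := range fields {
			if strings.EqualFold(f, "TABLE") && i+1 < len(fields) {
				obj = strings.TrimRight(fields[i+1], ";(")
				break
			}
		}
		if obj == "" {
			continue
		}
		lc, ok := lifecycles[obj]
		if !ok {
			lc = &Lifecycle{Object: obj}
			lifecycles[obj] = lc
		}
		lc.Statements = append(lc.Statements, stmt)
	}
	return lifecycles
}

func main() {
	lcs := trackLifecycles([]string{
		"CREATE TABLE users (id int);",
		"ALTER TABLE users ADD COLUMN email text;",
		"DROP TABLE sessions;",
	})
	// "users" accumulates its CREATE and ALTER; "sessions" only its DROP.
	fmt.Println(len(lcs["users"].Statements), len(lcs["sessions"].Statements))
}
```

Once each object's history is grouped like this, the later phases can fold or cancel statements within a single lifecycle instead of scanning every migration repeatedly.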
Phase 2: Analyze Dependencies
Build intelligent dependency graphs and detect consolidation opportunities
- Creates directed acyclic graphs (DAGs) of object dependencies
- Performs topological sorting for optimal execution order
- Detects circular dependencies and classifies risk levels
- Identifies consolidation opportunities and potential conflicts
Output: Sorted object list + risk assessment + cycle warnings
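The ordering step here is a standard topological sort over the dependency DAG. A self-contained sketch using Kahn's algorithm with cycle detection (illustrative, not the engine's implementation):

```go
package main

import "fmt"

// topoSort orders objects so every dependency precedes its dependents.
// deps maps an object to the objects it depends on. The boolean reports
// whether a circular dependency was found; the engine classifies cycles
// as higher-risk rather than failing outright.
func topoSort(deps map[string][]string) ([]string, bool) {
	indegree := map[string]int{}
	dependents := map[string][]string{}
	for obj, ds := range deps {
		if _, ok := indegree[obj]; !ok {
			indegree[obj] = 0
		}
		for _, d := range ds {
			if _, ok := indegree[d]; !ok {
				indegree[d] = 0
			}
			indegree[obj]++
			dependents[d] = append(dependents[d], obj)
		}
	}
	var queue, order []string
	for obj, deg := range indegree {
		if deg == 0 {
			queue = append(queue, obj)
		}
	}
	for len(queue) > 0 {
		obj := queue[0]
		queue = queue[1:]
		order = append(order, obj)
		for _, dep := range dependents[obj] {
			indegree[dep]--
			if indegree[dep] == 0 {
				queue = append(queue, dep)
			}
		}
	}
	// Leftover nodes mean a cycle kept their indegree above zero.
	return order, len(order) != len(indegree)
}

func main() {
	// The orders table depends on users; the index depends on orders.
	order, cyclic := topoSort(map[string][]string{
		"orders":          {"users"},
		"idx_orders_user": {"orders"},
	})
	fmt.Println(order, cyclic) // [users orders idx_orders_user] false
}
```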
Phase 3: Apply Consolidation Rules
Transform migrations based on safety level
Each safety level applies different consolidation strategies:
Conservative (Safest):
- Multiple CREATE consolidation
- CREATE → ALTER folding
- Column evolution tracking
Standard (Recommended):
- All Conservative rules
- DROP → CREATE cycle handling
- RLS policy consolidation
- Transaction boundary respect
Aggressive (Maximum optimization):
- All Standard rules
- Function deduplication
- Advanced pattern matching
Paranoid (Requires DB connection):
- All Aggressive rules
- Dead code removal
- Live schema analysis
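As one concrete example of a Conservative-level rule, CREATE → ALTER folding merges a later `ADD COLUMN` into the original `CREATE TABLE`. A string-level sketch of the idea, with hypothetical names; the real engine performs this transformation on `pg_query_go` ASTs, not on raw strings:

```go
package main

import (
	"fmt"
	"strings"
)

// foldAddColumn folds `ALTER TABLE t ADD COLUMN ...` into a matching
// `CREATE TABLE t (...)` statement. Purely illustrative.
func foldAddColumn(createStmt, alterStmt string) string {
	const marker = "ADD COLUMN "
	i := strings.Index(alterStmt, marker)
	if i < 0 {
		return createStmt
	}
	column := strings.TrimSuffix(strings.TrimSpace(alterStmt[i+len(marker):]), ";")
	// Insert the new column just before the CREATE's closing parenthesis.
	j := strings.LastIndex(createStmt, ")")
	if j < 0 {
		return createStmt
	}
	return createStmt[:j] + ", " + column + createStmt[j:]
}

func main() {
	folded := foldAddColumn(
		"CREATE TABLE users (id int);",
		"ALTER TABLE users ADD COLUMN email text;",
	)
	fmt.Println(folded) // CREATE TABLE users (id int, email text);
}
```

Two migration files collapse into one statement, which is why even the safest level already delivers a meaningful file-count reduction.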
Phase 4: Generate Optimized SQL
Build final SQL with proper ordering and post-processing
- Category-based grouping (Extensions → Tables → Functions → Indexes → Data)
- Dependency-aware sorting within each category
- Intelligent post-processing fixes:
- Fix malformed DROP TRIGGER syntax
- Ensure extension dependency order
- Remove orphaned ALTER statements
- Clean function artifacts
- Add missing semicolons
Result: Production-ready SQL with consistent formatting
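The category-based grouping amounts to a stable sort by category rank, which preserves the dependency-aware order within each category. An illustrative sketch (the ranks mirror the ordering above; the type names are hypothetical):

```go
package main

import (
	"fmt"
	"sort"
)

// categoryRank mirrors the output ordering described above:
// Extensions → Tables → Functions → Indexes → Data.
var categoryRank = map[string]int{
	"extension": 0,
	"table":     1,
	"function":  2,
	"index":     3,
	"data":      4,
}

type stmt struct {
	category string
	sql      string
}

// orderStatements groups statements by category; the stable sort keeps the
// dependency-aware order intact within each category.
func orderStatements(stmts []stmt) []stmt {
	out := append([]stmt(nil), stmts...)
	sort.SliceStable(out, func(i, j int) bool {
		return categoryRank[out[i].category] < categoryRank[out[j].category]
	})
	return out
}

func main() {
	ordered := orderStatements([]stmt{
		{"index", "CREATE INDEX idx_users_email ON users (email);"},
		{"extension", `CREATE EXTENSION IF NOT EXISTS "uuid-ossp";`},
		{"table", "CREATE TABLE users (id int, email text);"},
	})
	for _, s := range ordered {
		fmt.Println(s.sql) // extension, then table, then index
	}
}
```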
Phase 5: Validate Equivalence
Rigorous validation of schema equivalence
Three validation approaches available:
TWO_CONTAINERS (Most accurate):
- Separate containers for original and squashed migrations
- Apply migrations independently
- Compare schemas via `pg_dump`
TWO_DATABASES (Balanced):
- Single container, two databases
- Apply migrations to different databases
- Compare via SQL diff queries
SCHEMA_DIFF (Fastest):
- Single database, sequential application
- Capture schema state before/after
- Compare captured schemas
Result: Validated evidence that the squashed migrations produce an identical schema
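Whichever approach is chosen, the final step boils down to comparing two schema dumps after normalization. A simplified sketch of that comparison; the real validator normalizes `pg_dump` output far more carefully than this:

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeDump strips comments and blank lines so cosmetic differences
// between two pg_dump outputs don't produce false mismatches.
func normalizeDump(dump string) []string {
	var lines []string
	for _, line := range strings.Split(dump, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "--") {
			continue
		}
		lines = append(lines, line)
	}
	return lines
}

// schemasEquivalent reports whether two normalized dumps match line for line.
func schemasEquivalent(original, squashed string) bool {
	a, b := normalizeDump(original), normalizeDump(squashed)
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func main() {
	original := "-- dumped by pg_dump\nCREATE TABLE users (id int);\n"
	squashed := "CREATE TABLE users (id int);\n-- consolidated\n"
	fmt.Println(schemasEquivalent(original, squashed)) // true
}
```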
Public API Packages
pgsquash-engine now exports 4 public packages for different use cases:
pkg/cli - CLI Execution API
Build custom CLI wrappers or integrate the full CLI into your applications.
```go
import "github.com/CAPYSQUASH/pgsquash-engine/pkg/cli"

// Execute the full CLI
cli.SetVersionInfo("0.9.7", "2025-10-21", "abc123")
cli.SetBrandName("capysquash")
cli.Execute()
```

Use cases: Custom branded CLIs, CI/CD integration, CLI automation
pkg/engine - Library API
Use pgsquash programmatically for custom migration tools and workflows.
```go
import (
	"fmt"

	"github.com/CAPYSQUASH/pgsquash-engine/pkg/engine"
)

// Squash migrations programmatically
result, err := engine.SquashDirectory("./migrations", &engine.Config{
	SafetyLevel: engine.Standard,
	Verbose:     true,
})
if err != nil {
	panic(err)
}
fmt.Println(result.SQL)
```

Use cases: Custom migration tools, automated workflows, batch processing, library integration
pkg/plugins - Plugin Management
Register and manage third-party integrations.
```go
import "github.com/CAPYSQUASH/pgsquash-engine/pkg/plugins"

// Register all built-in plugins (Supabase, Clerk, Prisma, Drizzle)
plugins.RegisterDefault()
```

pkg/utils - Logging Utilities
Structured logging for your applications.
```go
import (
	"os"

	"github.com/CAPYSQUASH/pgsquash-engine/pkg/utils"
)

logger := utils.NewLogger(utils.LogLevelInfo, os.Stdout)
utils.SetDefaultLogger(logger)
```

View complete API documentation →
Technical Architecture
Domain-Driven Design
pgsquash-engine is organized into 8 specialized domains, each with clear responsibilities:
Parser Domain (internal/parser/)
- Purpose: Convert raw SQL into structured data
- Key Files: `parser.go`, `dependencies.go`, `statement_analyzer.go`
- Technology: `pg_query_go` for PostgreSQL-grade parsing
- Output: Structured `Migration` objects with `Statement` arrays
Tracking Domain (internal/tracking/)
- Purpose: Build object lifecycle representations
- Key Files: `tracker.go`, `lifecycle.go`, `dependency_graph.go`
- Features: Cycle detection, risk assessment, consolidation rules
- Output: `ObjectLifecycle` objects + consolidation results
Squasher Domain (internal/squasher/)
- Purpose: Orchestrate the entire consolidation process
- Key Files: `engine.go`, `unified_resolver.go`, `extension_detector.go`
- Responsibility: Phase orchestration, plugin management, statistics
Builder Domain (internal/builder/)
- Purpose: Construct optimized SQL output
- Key Files: `sql_builder.go`, `formatter.go`
- Features: Category grouping, dependency sorting, consistent formatting
Validation Domain (internal/validation/)
- Purpose: Verify schema equivalence
- Key Files: `docker_validator.go`, `schema_diff.go`, `extension_detector.go`
- Technology: Docker containers, schema comparison algorithms
Plugin Domain (internal/plugins/)
- Purpose: Third-party integration support
- Supported: Clerk, Supabase, Auth0, NextAuth patterns
- Features: Pattern detection, compatibility SQL, custom rules
Advanced Features
Plugin System
pgsquash-engine automatically detects and activates plugins for third-party services:
```shell
# Example: Clerk JWT v2 detection
[plugins] Discovering plugins from 15 migrations...
[plugins] Activated 2 plugins: [clerk supabase]
```

Supported Plugins:
- Clerk: JWT v2 table pattern detection and preservation
- Supabase: Auth schema, RLS policies, and function patterns
- Auth0: Custom authentication table patterns
- Neon: Serverless PostgreSQL specific optimizations
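Conceptually, a plugin implements detection against the migration set and, when its patterns match, contributes its own consolidation rules. A hypothetical interface sketch; these names are illustrative, not the engine's real `pkg/plugins` API:

```go
package main

import (
	"fmt"
	"strings"
)

// Plugin is a hypothetical shape for a third-party integration: it inspects
// the migrations and activates itself when its patterns are present.
type Plugin interface {
	Name() string
	Detect(migrations []string) bool
}

// clerkPlugin activates when Clerk-prefixed tables appear in the migrations.
// The `clerk_` prefix is an assumption for the sake of the example.
type clerkPlugin struct{}

func (clerkPlugin) Name() string { return "clerk" }
func (clerkPlugin) Detect(migrations []string) bool {
	for _, m := range migrations {
		if strings.Contains(m, "clerk_") {
			return true
		}
	}
	return false
}

// discover returns the names of plugins whose patterns match, echoing the
// "Activated N plugins" log line shown above.
func discover(plugins []Plugin, migrations []string) []string {
	var active []string
	for _, p := range plugins {
		if p.Detect(migrations) {
			active = append(active, p.Name())
		}
	}
	return active
}

func main() {
	active := discover(
		[]Plugin{clerkPlugin{}},
		[]string{"CREATE TABLE clerk_users (id text);"},
	)
	fmt.Println(active) // [clerk]
}
```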
Safety Levels
Choose your optimization aggressiveness based on your needs:
| Level | File Reduction | Safety | Best For |
|---|---|---|---|
| Paranoid | 20-30% | Maximum | First-time use, production databases |
| Conservative | 50-60% | High | Production databases, risk-averse teams |
| Standard | 70-80% | Balanced | Most use cases (recommended) |
| Aggressive | 85-95% | Moderate | Development, fresh starts |
AI-Powered Analysis (Optional)
When enabled, pgsquash uses AI for advanced analysis:
- Function equivalency detection: Identify duplicate functions with different names
- Dead code detection: Find unused tables, indexes, and functions
- Semantic analysis: Understand business logic patterns
- Optimization suggestions: Recommend further improvements
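The simplest form of function equivalency detection needs no AI at all: normalize function bodies before comparing them. A naive sketch of the idea; real detection (AST-based or AI-assisted) goes much further:

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeBody lowercases and collapses whitespace so trivially reformatted
// copies of a function compare equal.
func normalizeBody(body string) string {
	return strings.Join(strings.Fields(strings.ToLower(body)), " ")
}

// duplicateBodies groups function names by normalized body and returns the
// groups with more than one member: candidates for deduplication.
func duplicateBodies(funcs map[string]string) [][]string {
	byBody := map[string][]string{}
	for name, body := range funcs {
		key := normalizeBody(body)
		byBody[key] = append(byBody[key], name)
	}
	var dups [][]string
	for _, names := range byBody {
		if len(names) > 1 {
			dups = append(dups, names)
		}
	}
	return dups
}

func main() {
	dups := duplicateBodies(map[string]string{
		"touch_updated_at": "BEGIN NEW.updated_at = now(); RETURN NEW; END;",
		"set_updated_at":   "begin\n  new.updated_at = now();\n  return new;\nend;",
		"audit_insert":     "BEGIN INSERT INTO audit VALUES (NEW.id); RETURN NEW; END;",
	})
	fmt.Println(len(dups)) // one duplicate group
}
```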
Example Results
Case Study: Startup Migration Cleanup
Before pgsquash:
- 287 migration files
- 15 minutes deployment time
- Frequent merge conflicts
- 2.3GB migration history
After pgsquash (Standard safety):
- 12 migration files (96% reduction)
- 45 seconds deployment time (20x faster)
- No merge conflicts
- 120MB optimized history
Performance Metrics
| Metric | Before | After | Improvement |
|---|---|---|---|
| File Count | 287 | 12 | 96% reduction |
| Deployment Time | 15 min | 45 sec | 20x faster |
| Repository Size | 2.3GB | 120MB | 95% smaller |
| CI/CD Time | 8 min | 2 min | 4x faster |
Using pgsquash-engine
As a Library (Recommended)
```shell
# Install as a Go module
go get github.com/CAPYSQUASH/pgsquash-engine
```

```go
import "github.com/CAPYSQUASH/pgsquash-engine/pkg/engine"

// Use programmatically
result, err := engine.SquashDirectory("./migrations", &engine.Config{
	SafetyLevel: engine.Standard,
})
```

View Library API Documentation →
Via capysquash-cli (For CLI Usage)
```shell
# Install capysquash-cli for command-line usage
brew install capysquash-cli

# Or download from releases
# https://github.com/CAPYSQUASH/capysquash-cli/releases

# Use the CLI
capysquash analyze migrations/
capysquash squash migrations/ --safety=standard
```

Configuration
Create pgsquash.config.json:
```json
{
  "safetyLevel": "standard",
  "plugins": {
    "supabase": true,
    "clerk": true
  },
  "validation": {
    "enabled": true,
    "approach": "two_databases"
  },
  "ai": {
    "enabled": false,
    "provider": "openai"
  }
}
```

Advanced Usage
API Server Mode
Run pgsquash as an HTTP API server:
```shell
# Start the API server
api-server --port 8080

# Analyze via API
curl -X POST http://localhost:8080/analyze \
  -H "Content-Type: application/json" \
  -d '{"migrations": [...], "safetyLevel": "standard"}'
```

GitHub Integration
Set up automatic PR analysis:
```shell
# Install GitHub App (via platform)
# Configure webhook endpoint
# Automatic analysis on PR push/update
```

Docker Validation
Verify schema equivalence:
```shell
# Validate consolidation
pgsquash validate original/ squashed/ --approach=two_containers

# Result: Validation successful: Schemas are equivalent
```

When to Use Each Component
Use CAPYSQUASH Platform (Recommended) When:
- Getting started with migration squashing - one-click setup
- Team collaboration and shared projects are needed
- Automated GitHub integration and PR analysis are required
- Visual interface for migration analysis and optimization
- Analytics and reporting on optimization impact are valuable
- Scheduled automation and webhook notifications are desired
Use capysquash-cli (Power Users) When:
- Local development and quick migration analysis
- CI/CD pipelines needing command-line automation
- Individual developers who prefer terminal workflows
- Quick checks before committing migrations
Use pgsquash-engine (Library) When:
- Building custom tools that need programmatic migration optimization
- Integrating into existing systems and workflows
- Creating custom automation with full control
- Understanding the internals of how CAPYSQUASH works
Next Steps
- CAPYSQUASH Platform - Get started with one-click squashing
- capysquash-cli Documentation - Command-line usage
- Library API Guide - Use pgsquash-engine programmatically
- Configuration Guide - Advanced configuration options
- PostgreSQL Internals - Deep dive into parser technology
Ready to start? Try CAPYSQUASH Platform for the easiest experience, or install capysquash-cli for command-line usage. Developers can use the Library API for custom integrations.