MIGRATION ANALYSIS
Learn how capysquash analyzes your migrations to safely consolidate them.
OVERVIEW
capysquash uses a sophisticated multi-phase analysis engine to understand your migrations, track object lifecycles, and identify safe consolidation opportunities.
HOW IT WORKS
PARSING
Parse SQL using PostgreSQL's official parser
TRACKING
Track object lifecycles and changes over time
DEPENDENCY ANALYSIS
Build dependency graph and detect conflicts
CONSOLIDATION
Identify safe consolidation opportunities
GENERATION
Generate optimized, dependency-sorted migrations
PHASE 1: PARSING
PostgreSQL Parser
capysquash uses PostgreSQL's official parser (pg_query_go), so statements are parsed exactly as PostgreSQL itself parses them.
Why this matters:
- ☑ Understands ALL PostgreSQL syntax
- ☑ Handles complex queries, CTEs, window functions
- ☑ Handles syntax used with third-party extensions (PostGIS, pgvector, etc.)
- ☑ Same parser PostgreSQL itself uses
Example:
-- capysquash can parse complex statements like:
CREATE TABLE users (
  id uuid DEFAULT gen_random_uuid() PRIMARY KEY,
  data jsonb NOT NULL,
  search tsvector GENERATED ALWAYS AS (
    to_tsvector('english', data->>'name')
  ) STORED
);

Statement Extraction
From each migration file, capysquash extracts:
📊 METADATA
- ► Object type (table, index, function)
- ► Object name and schema
- ► Operation type (CREATE, ALTER, DROP)
- ► Source file and line number
🔗 RELATIONSHIPS
- ► Foreign key references
- ► Function dependencies
- ► Schema dependencies
- ► Extension requirements
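For example, given the foreign-key migration from the tracking example below, the extracted information might look like this (the annotations are illustrative, not capysquash's actual output format):

-- 012_add_fk.sql
ALTER TABLE posts
  ADD CONSTRAINT fk_author
  FOREIGN KEY (author_id) REFERENCES users(id);

-- Metadata (illustrative): object type = constraint, operation = ALTER,
--   object = posts.fk_author, source = 012_add_fk.sql
-- Relationships (illustrative): foreign key posts.author_id → users.id,
--   so the users table must exist before this statement runs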
PHASE 2: OBJECT TRACKING
Lifecycle Tracking
capysquash tracks how each database object evolves:
Table: users
├─ Migration 001: CREATE TABLE users (id INT)
├─ Migration 003: ALTER TABLE users ADD COLUMN email TEXT
├─ Migration 007: ALTER TABLE users ADD COLUMN name TEXT
├─ Migration 012: ALTER TABLE users ALTER COLUMN email SET NOT NULL
└─ Final state: users (id INT, email TEXT NOT NULL, name TEXT)

Consolidation opportunity: Merge into a single CREATE TABLE statement.
State Management
For each object, capysquash tracks:
| Property | Description | Example |
|---|---|---|
| Current State | Latest definition | users (id, email, name) |
| Change History | All modifications | 4 ALTERs tracked |
| Dependencies | What it references | FK to profiles table |
| Dependents | What references it | Index idx_users_email |
| Lifecycle Events | CREATE, ALTER, DROP | Created in 001, altered 3x |
Example Tracking
Input migrations:
-- 001_init.sql
CREATE TABLE posts (
  id serial PRIMARY KEY
);
-- 003_add_title.sql
ALTER TABLE posts ADD COLUMN title text;
-- 007_add_author.sql
ALTER TABLE posts ADD COLUMN author_id int;
-- 012_add_fk.sql
ALTER TABLE posts
  ADD CONSTRAINT fk_author
  FOREIGN KEY (author_id) REFERENCES users(id);

Tracked state:
Table: posts
Created: 001_init.sql
Columns added: title (003), author_id (007)
Constraints added: fk_author (012)
Dependencies: users table
Final statement:

CREATE TABLE posts (
  id serial PRIMARY KEY,
  title text,
  author_id int,
  CONSTRAINT fk_author FOREIGN KEY (author_id) REFERENCES users(id)
);

PHASE 3: DEPENDENCY ANALYSIS
Dependency Graph
capysquash builds a directed acyclic graph (DAG) of dependencies:
Extensions
│
├─→ Schemas
│ │
│ ├─→ Custom Types (ENUMs, Composite)
│ │ │
│ │ └─→ Tables
│ │ │
│ │ ├─→ Constraints
│ │ │
│ │ ├─→ Indexes
│ │ │
│ │ └─→ Functions/Triggers
│ │
│ └─→ RLS Policies
│
└─→ Data (INSERTs)

Dependency Types
DEPENDENCY CATEGORIES
1. SCHEMA DEPENDENCIES
   Table must be created before an FK constraint references it
   users → posts.fk_author

2. TYPE DEPENDENCIES
   ENUM must exist before a column uses it
   CREATE TYPE status → users.status (column of type status)

3. FUNCTION DEPENDENCIES
   Function before its trigger
   handle_updated_at() → trigger on users

4. EXTENSION DEPENDENCIES
   Extension before its features
   uuid-ossp → uuid_generate_v4()

5. DATA DEPENDENCIES
   Table before INSERTs
   CREATE TABLE → INSERT INTO

Circular Dependency Detection
capysquash detects and resolves circular dependencies:
Example problem:
-- Table A references B
CREATE TABLE a (b_id INT REFERENCES b(id));
-- Table B references A
CREATE TABLE b (a_id INT REFERENCES a(id));

Solution:
-- Create tables without FKs first
CREATE TABLE a (b_id INT);
CREATE TABLE b (a_id INT);
-- Add FKs after both exist
ALTER TABLE a ADD CONSTRAINT fk_b FOREIGN KEY (b_id) REFERENCES b(id);
ALTER TABLE b ADD CONSTRAINT fk_a FOREIGN KEY (a_id) REFERENCES a(id);

PHASE 4: CONSOLIDATION DETECTION
Consolidation Opportunities
capysquash identifies multiple types of consolidation:
☑ SAFE CONSOLIDATIONS
- ► CREATE + ALTER → Single CREATE
- ► Multiple ALTERs → Combined ALTER
- ► Duplicate indexes → Deduplicated
- ► Identical functions → Single definition
- ► Multiple GRANTs → Consolidated
⚠️ ADVANCED (Higher safety levels)
- ► DROP → CREATE cycles → Single CREATE
- ► Unused columns → Removed
- ► Dead code → Eliminated
- ► RLS policies → Optimized
Example: CREATE + ALTER Consolidation
Before (3 migrations):
-- Migration 001
CREATE TABLE users (id INT PRIMARY KEY);
-- Migration 005
ALTER TABLE users ADD COLUMN email TEXT;
-- Migration 012
ALTER TABLE users ADD COLUMN name TEXT NOT NULL;

After (consolidated):
CREATE TABLE users (
  id INT PRIMARY KEY,
  email TEXT,
  name TEXT NOT NULL
);

Safety: ☑ Safe - final schema is identical
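Example: Duplicate Index Deduplication

Duplicate indexes, such as two indexes on the same column created in different migrations, are another consolidation target. A sketch of what deduplication might look like (the migration numbers and index names are hypothetical):

Before (2 migrations):

-- Migration 004 (hypothetical)
CREATE INDEX idx_users_email ON users (email);

-- Migration 019 (hypothetical)
CREATE INDEX users_email_idx ON users (email);

After (deduplicated):

CREATE INDEX idx_users_email ON users (email);

Safety: ☑ Safe - both definitions index the same column, so one index serves the same queries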
PHASE 5: AI-POWERED ANALYSIS (Optional)
When enabled, AI analysis provides semantic understanding:
What AI Analyzes
- Function Similarity
  - Identifies functionally identical but differently formatted code
  - Detects semantic duplicates
- Dead Code Detection
  - Finds unused functions
  - Identifies orphaned triggers
  - Detects unreferenced types
- Optimization Suggestions
  - Better index strategies
  - RLS policy improvements
  - Query performance hints
Example AI Insight
-- AI detects these are semantically identical:
CREATE FUNCTION update_timestamp_v1()
RETURNS TRIGGER AS $$
BEGIN
  NEW.updated_at = NOW();
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE FUNCTION update_modified_at()
RETURNS TRIGGER AS $$
BEGIN
  NEW.updated_at = CURRENT_TIMESTAMP;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

AI suggests: Consolidate to a single function (NOW() and CURRENT_TIMESTAMP are equivalent).
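The dead code pass produces similar insights. A hypothetical finding, with migration and object names invented for illustration: a trigger is dropped in a later migration, leaving its function unreferenced.

-- Hypothetical earlier migration
CREATE FUNCTION refresh_order_cache()
RETURNS TRIGGER AS $$
BEGIN
  NEW.cached_total = NEW.quantity * NEW.unit_price;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER refresh_cache
  BEFORE INSERT OR UPDATE ON order_items
  FOR EACH ROW EXECUTE FUNCTION refresh_order_cache();

-- Hypothetical later migration
DROP TRIGGER refresh_cache ON order_items;

AI suggests: refresh_order_cache() is no longer referenced by any trigger and can be removed.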
PHASE 6: OUTPUT GENERATION
Dependency-Sorted Output
capysquash generates migrations in dependency order:
Organized Categories
| Category | Contents | Why This Order |
|---|---|---|
| Extensions | CREATE EXTENSION | Required by other objects |
| Schemas & Types | CREATE SCHEMA, TYPE, ENUM | Referenced by tables |
| Tables | CREATE TABLE (structure only) | Core schema |
| Constraints | FK, CHECK, UNIQUE | After tables exist |
| Indexes | CREATE INDEX | After constraints |
| Functions/Triggers | Functions, triggers, views | After tables |
| RLS/Security | RLS policies, GRANTs | After functions |
| Data | INSERT statements | After everything |
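Conceptually, the consolidated output follows that order. A shortened, illustrative sketch (the object names are made up; actual output will differ):

-- Extensions
CREATE EXTENSION IF NOT EXISTS pgcrypto;

-- Schemas & Types
CREATE TYPE order_status AS ENUM ('pending', 'shipped', 'delivered');

-- Tables
CREATE TABLE orders (
  id uuid DEFAULT gen_random_uuid() PRIMARY KEY,
  status order_status NOT NULL DEFAULT 'pending'
);

-- Indexes
CREATE INDEX idx_orders_status ON orders (status);

-- RLS/Security
ALTER TABLE orders ENABLE ROW LEVEL SECURITY;

-- Data
INSERT INTO orders (status) VALUES ('pending');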
SAFETY GUARANTEES
capysquash provides multiple safety guarantees:
SAFETY MECHANISMS
☑ PARSE VALIDATION
Every statement is parsed with PostgreSQL's parser both before and after consolidation
☑ DEPENDENCY VERIFICATION
All dependencies resolved before output generation
☑ SCHEMA COMPARISON (with Docker)
Byte-for-byte comparison of resulting schemas
☑ ROLLBACK SUPPORT
Keep originals, test consolidated versions separately
☑ DRY RUN MODE
Preview changes without modifying files
PERFORMANCE OPTIMIZATIONS
For large migration sets (500+ files):
Streaming Mode
capysquash squash migrations/*.sql --streaming --memory-limit 512

Benefits:
- Processes files in batches
- Reduced memory footprint
- Handles massive migration sets
Parallel Processing
capysquash squash migrations/*.sql --parallel 8

Benefits:
- Parse files concurrently
- Faster for large projects
- Utilizes multiple CPU cores
ANALYSIS METRICS
When you run analysis, you see:
📊 ANALYSIS RESULTS
Files: 156 migrations
Statements: 2,847 total
Size: 12.4 MB
🎯 CONSOLIDATION POTENTIAL: 78%
Opportunities detected:
► CREATE + ALTER pairs: 89 instances
► Duplicate indexes: 23 found
► Duplicate functions: 12 found
► RLS policies: 34 can be optimized
► Dead code: 8 unused functions
Estimated output:
► Files: 34 (78% reduction)
► Statements: 1,124 (61% reduction)
► Size: ~4.2 MB (66% reduction)