CAPYSQUASH

Best Practices Guide

Expert strategies for using CAPYSQUASH effectively and writing production-grade PostgreSQL migrations

This comprehensive guide covers CAPYSQUASH-specific best practices alongside battle-tested PostgreSQL migration patterns to help you optimize safely and ship confidently.

Applies To

These best practices apply to CAPYSQUASH Platform, capysquash-cli, and pgsquash-engine integrations.

When to Use CAPYSQUASH

✓ Good Times to Squash

After Milestones

  • ✓ After beta launch
  • ✓ After 1.0 release
  • ✓ After major refactor

Regular Cadence

  • ✓ Monthly or quarterly
  • ✓ When consolidation > 50%
  • ✓ Before major releases

When Onboarding

  • ✓ Before new developers join
  • ✓ Before open-sourcing
  • ✓ Before client handoff

✗ Bad Times to Squash

Avoid These Scenarios

During Active Development: Wait for feature branches to merge—don't squash mid-sprint.

Right Before Critical Deploy: Squash early in the release cycle and test thoroughly.

Without Validation: Always validate squashed migrations. Never skip Docker validation.


Safety Level Selection

Choose the appropriate safety level based on your environment, project stage, and team size:

By Environment

Development

Recommended: aggressive or standard

  • Fast iteration is the priority
  • Easy to rebuild if issues occur
  • Lower stakes if something goes wrong
capysquash squash migrations/*.sql --safety aggressive

Staging

Recommended: standard or conservative

  • Balance between speed and safety
  • Testing ground for production
  • Should mirror production data
capysquash squash migrations/*.sql --safety standard

Production

Recommended: conservative or paranoid

  • Maximum safety is critical
  • Zero tolerance for data loss
  • Thorough validation required
capysquash squash migrations/*.sql --safety paranoid

By Project Stage

Project Stage                   Recommended Safety Level   Rationale
Early startup (< 6 months)      aggressive                 Rapid iteration, easy to rebuild
Growing (6-18 months)           standard                   Balancing speed with stability
Established (18+ months)        conservative               Production data, customer impact
Enterprise / critical systems   paranoid                   Zero-downtime requirements

By Team Size

Team Size             Recommended Safety Level
Solo developer        aggressive or standard
Small team (2-5)      standard
Medium team (6-20)    conservative
Large team (20+)      conservative or paranoid

Migration Organization

File Naming Conventions

Recommended: Sequential with descriptive names

001_extensions_and_schemas.sql
002_user_authentication.sql
003_content_management.sql
004_analytics_tables.sql
005_indexes_and_constraints.sql
006_rls_policies.sql
007_functions_and_triggers.sql
008_seed_data.sql

Benefits:

  • Clear ordering maintains dependency chain
  • Self-documenting purpose
  • Easy team navigation
  • Git-friendly for conflict resolution

Migration Categories

Organize migrations by dependency order to ensure safe consolidation:

Extensions & Schemas

Enable PostgreSQL extensions and create custom schemas first

CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
CREATE SCHEMA IF NOT EXISTS analytics;

Types & Enums

Define custom types before tables that use them

CREATE TYPE user_role AS ENUM ('admin', 'user', 'guest');

Tables & Core Structure

Create table definitions

CREATE TABLE users (
  id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
  email TEXT NOT NULL,
  role user_role NOT NULL
);

Constraints & Foreign Keys

Add relationships after all tables exist

ALTER TABLE orders
ADD CONSTRAINT fk_orders_users
FOREIGN KEY (user_id) REFERENCES users(id);

Indexes

Create indexes for query performance

CREATE INDEX CONCURRENTLY idx_users_email ON users(email);

Functions & Triggers

Add procedural code

CREATE FUNCTION update_modified_column()
RETURNS TRIGGER AS $$
BEGIN
  NEW.updated_at = NOW();
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;
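
On its own the function does nothing until a trigger references it. A minimal sketch, assuming the users table has an updated_at column (EXECUTE FUNCTION requires PostgreSQL 11+; older versions use EXECUTE PROCEDURE):

-- Keep updated_at current on every UPDATE
CREATE TRIGGER users_set_updated_at
BEFORE UPDATE ON users
FOR EACH ROW
EXECUTE FUNCTION update_modified_column();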

Security & RLS

Apply Row Level Security policies and grants

ALTER TABLE users ENABLE ROW LEVEL SECURITY;
CREATE POLICY user_isolation ON users
FOR ALL USING (auth.uid() = id);

Seed Data

Insert reference or initial data last

INSERT INTO roles (name) VALUES ('admin'), ('user');
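
If seed migrations might ever be re-applied, an idempotent variant avoids duplicate-key errors. A sketch assuming roles.name carries a unique constraint:

INSERT INTO roles (name) VALUES ('admin'), ('user')
ON CONFLICT (name) DO NOTHING;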

PostgreSQL Best Practices

Core Principles

Safety First

Never risk data loss or extended downtime

Minimize Locks

Keep operations non-blocking whenever possible (see the lock-monitoring query below)

Test Everything

Validate migrations against production-like data
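
To keep the "Minimize Locks" principle honest while a migration runs, watch for sessions blocked behind your locks. A minimal sketch using pg_blocking_pids (available since PostgreSQL 9.6):

-- List sessions currently waiting on locks, and which backends are blocking them
SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       wait_event_type,
       state,
       left(query, 80) AS query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;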

Migration Anti-Patterns

☒ Anti-Pattern 1: Adding Non-Nullable Columns

Don't Do This

ALTER TABLE users ADD COLUMN email VARCHAR(255) NOT NULL;

Why it's bad:

  • Requires full table rewrite
  • Locks table for duration (could be minutes on large tables)
  • Fails if table has existing rows without a default

Do this instead (Option A, works on all supported PostgreSQL versions):

-- Step 1: Add column as nullable
ALTER TABLE users ADD COLUMN email VARCHAR(255);

-- Step 2: Backfill data (in batches for large tables)
UPDATE users
SET email = 'placeholder@example.com'
WHERE email IS NULL;

-- Step 3: Add NOT NULL constraint
ALTER TABLE users ALTER COLUMN email SET NOT NULL;

Or (Option B, PostgreSQL 11+):

-- Step 1: Add column with default (PostgreSQL 11+ optimizes this!)
ALTER TABLE users
ADD COLUMN email VARCHAR(255)
DEFAULT 'placeholder@example.com';

-- Step 2: Remove default (doesn't affect existing rows)
ALTER TABLE users ALTER COLUMN email DROP DEFAULT;

-- Step 3: Add NOT NULL constraint
ALTER TABLE users ALTER COLUMN email SET NOT NULL;

PostgreSQL 11+ stores constant column defaults in the catalog, so Option B adds the column without rewriting the table.

☒ Anti-Pattern 2: Creating Indexes on Large Tables

Don't Do This

CREATE INDEX idx_users_email ON users(email);

Why it's bad:

  • Acquires SHARE lock on table
  • Blocks writes for entire index build
  • On a 10M row table, could take 30+ seconds

Do this instead:

CREATE INDEX CONCURRENTLY idx_users_email ON users(email);

What CONCURRENTLY does:

  • ✓ Builds index without blocking writes
  • ✓ Takes longer but doesn't lock table
  • ✓ Safe for production use

Important caveats:

  • Cannot run inside a transaction
  • Requires slightly more disk space during build
  • May fail if canceled (leaves invalid index)

Check for failed concurrent indexes:

SELECT schemaname, tablename, indexname
FROM pg_indexes
WHERE indexname IN (
    SELECT indexrelid::regclass::text
    FROM pg_index
    WHERE NOT indisvalid
);
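
If that query turns up an invalid index, drop it concurrently and rebuild it (reusing the index name from the example above); on PostgreSQL 12+, REINDEX INDEX CONCURRENTLY achieves the same in one step:

-- Remove the half-built index without blocking writes, then retry the build
DROP INDEX CONCURRENTLY idx_users_email;
CREATE INDEX CONCURRENTLY idx_users_email ON users(email);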

☒ Anti-Pattern 3: Changing Column Types Carelessly

Dangerous Operation

ALTER TABLE users ALTER COLUMN id TYPE BIGINT;

Why it's risky:

  • Requires full table rewrite
  • Locks table during rewrite
  • Can break foreign key constraints
  • May require rebuilding all indexes

Safe type changes (no rewrite):

  • VARCHAR(50) → VARCHAR(100)
  • NUMERIC(10,2) → NUMERIC(12,2)
  • VARCHAR → TEXT

For unsafe changes, use the shadow column pattern:

Add New Column

ALTER TABLE users ADD COLUMN id_new BIGINT;

Backfill Data

-- In batches for large tables (UPDATE has no LIMIT clause, so pick the batch in a subquery)
UPDATE users
SET id_new = id
WHERE id IN (
    SELECT id FROM users
    WHERE id_new IS NULL
    LIMIT 10000
);
-- Repeat until no rows remain

Swap Columns

BEGIN;
ALTER TABLE users RENAME COLUMN id TO id_old;
ALTER TABLE users RENAME COLUMN id_new TO id;
COMMIT;

Cleanup

ALTER TABLE users DROP COLUMN id_old;

Note: because id is the primary key in this example, you also need to move the primary key constraint, any identity or sequence default, and any foreign keys referencing it onto the new column before dropping the old one.

☒ Anti-Pattern 4: Adding Foreign Keys Without NOT VALID

Blocking Operation

ALTER TABLE orders
ADD CONSTRAINT fk_orders_users
FOREIGN KEY (user_id) REFERENCES users(id);

Why it's bad: Takes a lock on both tables and scans every existing row of orders to validate the constraint, blocking writes for the duration.

Do this instead:

-- Step 1: Add constraint as NOT VALID (doesn't check existing rows)
ALTER TABLE orders
ADD CONSTRAINT fk_orders_users
FOREIGN KEY (user_id) REFERENCES users(id) NOT VALID;

-- Step 2: Validate constraint (checks existing rows without exclusive lock)
ALTER TABLE orders VALIDATE CONSTRAINT fk_orders_users;

Why this works:

  • NOT VALID adds constraint instantly
  • VALIDATE scans table without blocking writes
  • New rows are checked immediately
  • Existing rows validated in background
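
To confirm the validation step actually ran, check the constraint's flag in the catalog:

-- convalidated flips to true once VALIDATE CONSTRAINT succeeds
SELECT conname, convalidated
FROM pg_constraint
WHERE conname = 'fk_orders_users';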

Batching Large Updates

For updates affecting millions of rows, use batching to avoid long-running transactions:

DO $$
DECLARE
    batch_size INT := 10000;
    updated INT;
    total_updated BIGINT := 0;
BEGIN
    LOOP
        UPDATE users
        SET status = 'active'
        WHERE status IS NULL
        AND id IN (
            SELECT id FROM users
            WHERE status IS NULL
            LIMIT batch_size
        );

        GET DIAGNOSTICS updated = ROW_COUNT;
        total_updated := total_updated + updated;

        RAISE NOTICE 'Updated % rows (total: %)', updated, total_updated;

        EXIT WHEN updated = 0;

        -- Commit each batch so locks and row versions are released promptly.
        -- Transaction control inside DO blocks requires PostgreSQL 11+ and the
        -- block must not be run inside an outer transaction.
        COMMIT;

        -- Pause to let other transactions through
        PERFORM pg_sleep(0.1);
    END LOOP;

    RAISE NOTICE 'Migration complete. Total rows updated: %', total_updated;
END $$;

PostgreSQL Gotchas to Avoid

Quick Tips

TEXT vs VARCHAR: Functionally identical in PostgreSQL; prefer TEXT unless you specifically need VARCHAR(n)'s enforced length limit

SERIAL vs IDENTITY: Use GENERATED ALWAYS AS IDENTITY (PostgreSQL 10+) for better SQL standard compliance

TIMESTAMP vs TIMESTAMPTZ: Always use TIMESTAMPTZ for event and user-generated times; it stores an absolute instant (normalized to UTC), while plain TIMESTAMP silently ignores time zones
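
Putting these tips together in one table definition (a hypothetical events table, purely for illustration):

CREATE TABLE events (
  id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,  -- IDENTITY instead of SERIAL
  payload TEXT,                                         -- TEXT instead of VARCHAR
  created_at TIMESTAMPTZ NOT NULL DEFAULT now()         -- TIMESTAMPTZ, not TIMESTAMP
);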


Version Control Workflow

Before Squashing

Create Backup Branch

git checkout -b backup-migrations-$(date +%Y%m%d)
git push origin backup-migrations-$(date +%Y%m%d)

Create Feature Branch

git checkout main
git checkout -b chore/consolidate-migrations

Run Squash

capysquash squash migrations/*.sql \
  --output clean/ \
  --safety conservative

Validate Results

capysquash validate migrations/ clean/

Replace Migrations

mv migrations/ migrations_original/
mv clean/ migrations/

Commit Changes

git add migrations/
git commit -m "chore: consolidate migrations (287 → 12 files)"

Create Pull Request

gh pr create \
  --title "Consolidate migrations" \
  --body "Reduces migration count by 96%. See details in commit."

Commit Message Template

chore: consolidate migrations

- Reduced from 287 to 12 files (96% reduction)
- Validated with pgsquash Docker validation
- Safety level: conservative
- All schemas identical

Consolidation details:
- CREATE + ALTER pairs: 89 consolidated
- Duplicate indexes removed: 23
- RLS policies optimized: 34
- Deployment time: 8min → 45sec

Team Collaboration

Communication Checklist

Before squashing:

  • ✓ Announce in team chat
  • ✓ Ensure no active migration work in progress
  • ✓ Check for open PRs touching migrations
  • ✓ Plan timing (end of sprint, between features)

After squashing:

  • ✓ Share validation results
  • ✓ Update team documentation
  • ✓ Notify about file location changes
  • ✓ Update migration trackers or wikis

Code Review Checklist

Reviewer Checklist

Reviewers should verify:

  • Validation passed successfully
  • Safety level is documented
  • File count reduction is documented
  • Commit message is clear and detailed
  • No open PRs conflict with this change
  • Deployment process updated (if needed)
  • CI/CD pipelines still work correctly
  • Team has been notified of changes

Validation & Testing

Always Validate

Never Skip Validation

# ALWAYS run validation before deploying
capysquash validate migrations/ clean/

Never skip validation for:

  • Production migrations
  • Shared branches (main, develop)
  • Client projects
  • First-time squashing

When Validation Fails

Don't Panic

Validation is designed to catch issues—this is expected behavior.

Review the Diff

capysquash validate --show-diff migrations/ clean/

Try More Conservative

capysquash squash migrations/*.sql --safety paranoid

Check for Edge Cases

Complex triggers, custom types, or ORM-specific patterns may need special handling.

Report Issues

If paranoid mode fails validation, file an issue with reproduction details.

Testing Migrations

Test against production-like data:

# 1. Clone production schema (without data)
pg_dump --schema-only production > schema.sql
psql test_db < schema.sql

# 2. Generate synthetic data
psql test_db <<EOF
INSERT INTO users (name, email)
SELECT
    'User ' || generate_series,
    'user' || generate_series || '@example.com'
FROM generate_series(1, 1000000);
EOF

# 3. Run migration
psql test_db < migration.sql

# 4. Measure performance
psql test_db -c "EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'test@example.com';"

Configuration Management

Environment-Specific Configs

Create different configurations for each environment:

File: pgsquash.dev.json

{
  "safety_level": "aggressive",
  "rules": {
    "remove_dead_code": true,
    "deduplicate_functions": true
  },
  "performance": {
    "parallel_workers": 8
  }
}

Usage:

capysquash squash migrations/*.sql --config pgsquash.dev.json

File: pgsquash.prod.json

{
  "safety_level": "conservative",
  "rules": {
    "remove_dead_code": false,
    "deduplicate_functions": false
  },
  "output": {
    "preserve_comments": true,
    "add_headers": true
  }
}

Usage:

capysquash squash migrations/*.sql --config pgsquash.prod.json

Keep Config in Version Control

git add pgsquash.*.json
git commit -m "docs: add pgsquash configurations for each environment"

Common Pitfalls

☒ Don't: Squash in Feature Branch

Bad Practice

# Bad: Squashing in feature branch
git checkout feature/new-table
capysquash squash migrations/*.sql --output clean/
# Now you'll have conflicts with main!

Do instead: Squash in a separate chore branch from main

☒ Don't: Skip Validation

Dangerous

# Bad: No validation
capysquash squash migrations/*.sql --output clean/
mv clean/ migrations/
git commit -m "squashed" # 😱 Hope and pray!

Do instead: Always validate first—it takes seconds and prevents disasters

☒ Don't: Lose Original Migrations

Data Loss Risk

# Bad: Overwriting originals without backup
capysquash squash migrations/*.sql --output migrations/ --force
# Lost originals forever!

Do instead: Output to a new directory and keep backups


Rollback Strategy

Keep Backups

# Before consolidating
cp -r migrations/ migrations_backup_$(date +%Y%m%d)/

# Or use git tags
git tag backup-migrations-$(date +%Y%m%d)
git push origin --tags

Rollback if Needed

# If issues arise after consolidation
mv migrations/ migrations_consolidated/
mv migrations_backup_20240115/ migrations/

# Or with git
git revert <consolidation-commit>

ORM-Specific Patterns

Prisma

# Squash Prisma migrations
capysquash squash prisma/migrations/*/migration.sql --output consolidated/

# Keep Prisma metadata intact
# Don't delete _prisma_migrations table
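
As a sanity check before and after consolidating, compare what Prisma has recorded (a sketch assuming the standard _prisma_migrations columns):

-- Applied migrations as Prisma recorded them
SELECT migration_name, finished_at
FROM _prisma_migrations
ORDER BY started_at;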

Drizzle

# Squash Drizzle migrations
capysquash squash drizzle/migrations/*.sql --output consolidated/

# Preserve __drizzle_migrations table

Supabase

# Squash Supabase migrations
capysquash squash supabase/migrations/*.sql --output consolidated/

CAPYSQUASH automatically preserves Supabase-specific patterns:

  • Auth schema and functions
  • Storage schema and buckets
  • RLS policies
  • Realtime functions

Monitoring & Maintenance

Track Migration Health

Create a monthly check script:

#!/bin/bash
# monthly-migration-check.sh

DATE=$(date +%Y-%m-%d)
mkdir -p reports

capysquash analyze migrations/*.sql \
  --report-format json > "reports/$DATE.json"

# Extract metrics
FILE_COUNT=$(jq '.file_count' "reports/$DATE.json")
CONSOLIDATION=$(jq '.consolidation_percentage' "reports/$DATE.json")

echo "$DATE,$FILE_COUNT,$CONSOLIDATION" >> metrics.csv

# Alert if consolidation opportunity is high
if (( $(echo "$CONSOLIDATION > 60" | bc -l) )); then
  echo "⚠️  Time to consolidate migrations ($CONSOLIDATION% opportunity)"
fi

Set Consolidation Thresholds

Consolidation Opportunity   Action
< 30%                       No action needed
30-50%                      Consider consolidating soon
50-70%                      Plan consolidation within the sprint
> 70%                       Consolidate ASAP (technical debt building)
