CAPYSQUASH

Best Practices Guide

Expert strategies for using CAPYSQUASH effectively and writing production-grade PostgreSQL migrations

This comprehensive guide covers CAPYSQUASH-specific best practices alongside battle-tested PostgreSQL migration patterns to help you optimize safely and ship confidently.

Applies To

These best practices apply to CAPYSQUASH Platform, capysquash-cli, and pgsquash-engine integrations.

When to Use CAPYSQUASH

✓ Good Times to Squash

After Milestones

  • ✓ After beta launch
  • ✓ After 1.0 release
  • ✓ After major refactor

Regular Cadence

  • ✓ Monthly or quarterly
  • ✓ When consolidation > 50%
  • ✓ Before major releases

When Onboarding

  • ✓ Before new developers join
  • ✓ Before open-sourcing
  • ✓ Before client handoff

✗ Bad Times to Squash

Avoid These Scenarios

During Active Development: Wait for feature branches to merge—don't squash mid-sprint.

Right Before Critical Deploy: Squash early in the release cycle and test thoroughly.

Without Validation: Always validate squashed migrations. Never skip Docker validation.


Safety Level Selection

Choose the appropriate safety level based on your environment, project stage, and team size:

By Environment

Development

Recommended: aggressive or standard

  • Fast iteration is the priority
  • Easy to rebuild if issues occur
  • Lower stakes if something goes wrong
capysquash squash migrations/*.sql --safety aggressive

Staging

Recommended: standard or conservative

  • Balance between speed and safety
  • Testing ground for production
  • Should mirror production data
capysquash squash migrations/*.sql --safety standard

Production

Recommended: conservative or paranoid

  • Maximum safety is critical
  • Zero tolerance for data loss
  • Thorough validation required
capysquash squash migrations/*.sql --safety paranoid

By Project Stage

Project Stage                   Recommended Safety Level   Rationale
Early startup (< 6 months)      aggressive                 Rapid iteration, easy to rebuild
Growing (6-18 months)           standard                   Balancing speed with stability
Established (18+ months)        conservative               Production data, customer impact
Enterprise / critical systems   paranoid                   Zero-downtime requirements

By Team Size

Team Size             Recommended Safety Level
Solo developer        aggressive or standard
Small team (2-5)      standard
Medium team (6-20)    conservative
Large team (20+)      conservative or paranoid

Migration Organization

File Naming Conventions

Recommended: Sequential with descriptive names

001_extensions_and_schemas.sql
002_user_authentication.sql
003_content_management.sql
004_analytics_tables.sql
005_indexes_and_constraints.sql
006_rls_policies.sql
007_functions_and_triggers.sql
008_seed_data.sql

Benefits:

  • Clear ordering maintains dependency chain
  • Self-documenting purpose
  • Easy team navigation
  • Git-friendly for conflict resolution

Migration Categories

Organize migrations by dependency order to ensure safe consolidation:

Extensions & Schemas

Enable PostgreSQL extensions and create custom schemas first

CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
CREATE SCHEMA IF NOT EXISTS analytics;

Types & Enums

Define custom types before tables that use them

CREATE TYPE user_role AS ENUM ('admin', 'user', 'guest');

Tables & Core Structure

Create table definitions

CREATE TABLE users (
  id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
  email TEXT NOT NULL,
  role user_role NOT NULL
);

Constraints & Foreign Keys

Add relationships after all tables exist

ALTER TABLE orders
ADD CONSTRAINT fk_orders_users
FOREIGN KEY (user_id) REFERENCES users(id);

Indexes

Create indexes for query performance

CREATE INDEX CONCURRENTLY idx_users_email ON users(email);

Functions & Triggers

Add procedural code

CREATE FUNCTION update_modified_column()
RETURNS TRIGGER AS $$
BEGIN
  NEW.updated_at = NOW();
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;
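
On its own the function does nothing until a trigger references it. A minimal sketch, assuming the users table has an updated_at column (EXECUTE FUNCTION requires PostgreSQL 11+; older versions use EXECUTE PROCEDURE):

-- Keep updated_at current on every UPDATE
CREATE TRIGGER users_set_updated_at
BEFORE UPDATE ON users
FOR EACH ROW
EXECUTE FUNCTION update_modified_column();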

Security & RLS

Apply Row Level Security policies and grants

ALTER TABLE users ENABLE ROW LEVEL SECURITY;
CREATE POLICY user_isolation ON users
FOR ALL USING (auth.uid() = id);

Seed Data

Insert reference or initial data last

INSERT INTO roles (name) VALUES ('admin'), ('user');
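
If seed migrations might ever be re-applied, an idempotent variant avoids duplicate-key errors. A sketch assuming roles.name carries a unique constraint:

INSERT INTO roles (name) VALUES ('admin'), ('user')
ON CONFLICT (name) DO NOTHING;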

PostgreSQL Best Practices

Core Principles

Safety First

Never risk data loss or extended downtime

Minimize Locks

Keep operations non-blocking whenever possible (see the lock-monitoring query below)

Test Everything

Validate migrations against production-like data
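
To keep the "Minimize Locks" principle honest while a migration runs, watch for sessions blocked behind your locks. A minimal sketch using pg_blocking_pids (available since PostgreSQL 9.6):

-- List sessions currently waiting on locks, and which backends are blocking them
SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       wait_event_type,
       state,
       left(query, 80) AS query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;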

Migration Anti-Patterns

☒ Anti-Pattern 1: Adding Non-Nullable Columns

Don't Do This

ALTER TABLE users ADD COLUMN email VARCHAR(255) NOT NULL;

Why it's bad:

  • Requires full table rewrite
  • Locks table for duration (could be minutes on large tables)
  • Fails if table has existing rows without a default

Do this instead (Option A, works on all supported PostgreSQL versions):

-- Step 1: Add column as nullable
ALTER TABLE users ADD COLUMN email VARCHAR(255);

-- Step 2: Backfill data (in batches for large tables)
UPDATE users
SET email = 'placeholder@example.com'
WHERE email IS NULL;

-- Step 3: Add NOT NULL constraint
ALTER TABLE users ALTER COLUMN email SET NOT NULL;

Or (Option B, PostgreSQL 11+):

-- Step 1: Add column with default (PostgreSQL 11+ optimizes this!)
ALTER TABLE users
ADD COLUMN email VARCHAR(255)
DEFAULT 'placeholder@example.com';

-- Step 2: Remove default (doesn't affect existing rows)
ALTER TABLE users ALTER COLUMN email DROP DEFAULT;

-- Step 3: Add NOT NULL constraint
ALTER TABLE users ALTER COLUMN email SET NOT NULL;

PostgreSQL 11+ stores constant column defaults in the catalog, so Option B adds the column without rewriting the table.

☒ Anti-Pattern 2: Creating Indexes on Large Tables

Don't Do This

CREATE INDEX idx_users_email ON users(email);

Why it's bad:

  • Acquires SHARE lock on table
  • Blocks writes for entire index build
  • On a 10M row table, could take 30+ seconds

Do this instead:

CREATE INDEX CONCURRENTLY idx_users_email ON users(email);

What CONCURRENTLY does:

  • ✓ Builds index without blocking writes
  • ✓ Takes longer but doesn't lock table
  • ✓ Safe for production use

Important caveats:

  • Cannot run inside a transaction
  • Requires slightly more disk space during build
  • May fail if canceled (leaves invalid index)

Check for failed concurrent indexes:

SELECT schemaname, tablename, indexname
FROM pg_indexes
WHERE indexname IN (
    SELECT indexrelid::regclass::text
    FROM pg_index
    WHERE NOT indisvalid
);
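
If that query turns up an invalid index, drop it concurrently and rebuild it (reusing the index name from the example above); on PostgreSQL 12+, REINDEX INDEX CONCURRENTLY achieves the same in one step:

-- Remove the half-built index without blocking writes, then retry the build
DROP INDEX CONCURRENTLY idx_users_email;
CREATE INDEX CONCURRENTLY idx_users_email ON users(email);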

☒ Anti-Pattern 3: Changing Column Types Carelessly

Dangerous Operation

ALTER TABLE users ALTER COLUMN id TYPE BIGINT;

Why it's risky:

  • Requires full table rewrite
  • Locks table during rewrite
  • Can break foreign key constraints
  • May require rebuilding all indexes

Safe type changes (no rewrite):

  • VARCHAR(50) → VARCHAR(100)
  • NUMERIC(10,2) → NUMERIC(12,2)
  • VARCHAR → TEXT

For unsafe changes, use the shadow column pattern:

Add New Column

ALTER TABLE users ADD COLUMN id_new BIGINT;

Backfill Data

-- In batches for large tables (UPDATE has no LIMIT clause, so pick the batch in a subquery)
UPDATE users
SET id_new = id
WHERE id IN (
    SELECT id FROM users
    WHERE id_new IS NULL
    LIMIT 10000
);
-- Repeat until no rows remain

Swap Columns

BEGIN;
ALTER TABLE users RENAME COLUMN id TO id_old;
ALTER TABLE users RENAME COLUMN id_new TO id;
COMMIT;

Cleanup

ALTER TABLE users DROP COLUMN id_old;

Note: because id is the primary key in this example, you also need to move the primary key constraint, any identity or sequence default, and any foreign keys referencing it onto the new column before dropping the old one.

☒ Anti-Pattern 4: Adding Foreign Keys Without NOT VALID

Blocking Operation

ALTER TABLE orders
ADD CONSTRAINT fk_orders_users
FOREIGN KEY (user_id) REFERENCES users(id);

Why it's bad: Takes a lock on both tables and scans every existing row of orders to validate the constraint, blocking writes for the duration.

Do this instead:

-- Step 1: Add constraint as NOT VALID (doesn't check existing rows)
ALTER TABLE orders
ADD CONSTRAINT fk_orders_users
FOREIGN KEY (user_id) REFERENCES users(id) NOT VALID;

-- Step 2: Validate constraint (checks existing rows without exclusive lock)
ALTER TABLE orders VALIDATE CONSTRAINT fk_orders_users;

Why this works:

  • NOT VALID adds constraint instantly
  • VALIDATE scans table without blocking writes
  • New rows are checked immediately
  • Existing rows validated in background
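
To confirm the validation step actually ran, check the constraint's flag in the catalog:

-- convalidated flips to true once VALIDATE CONSTRAINT succeeds
SELECT conname, convalidated
FROM pg_constraint
WHERE conname = 'fk_orders_users';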

Batching Large Updates

For updates affecting millions of rows, use batching to avoid long-running transactions:

DO $$
DECLARE
    batch_size INT := 10000;
    updated INT;
    total_updated BIGINT := 0;
BEGIN
    LOOP
        UPDATE users
        SET status = 'active'
        WHERE status IS NULL
        AND id IN (
            SELECT id FROM users
            WHERE status IS NULL
            LIMIT batch_size
        );

        GET DIAGNOSTICS updated = ROW_COUNT;
        total_updated := total_updated + updated;

        RAISE NOTICE 'Updated % rows (total: %)', updated, total_updated;

        EXIT WHEN updated = 0;

        -- Commit each batch so locks and row versions are released promptly.
        -- Transaction control inside DO blocks requires PostgreSQL 11+ and the
        -- block must not be run inside an outer transaction.
        COMMIT;

        -- Pause to let other transactions through
        PERFORM pg_sleep(0.1);
    END LOOP;

    RAISE NOTICE 'Migration complete. Total rows updated: %', total_updated;
END $$;

PostgreSQL Gotchas to Avoid

Quick Tips

TEXT vs VARCHAR: Functionally identical in PostgreSQL; prefer TEXT unless you specifically need VARCHAR(n)'s enforced length limit

SERIAL vs IDENTITY: Use GENERATED ALWAYS AS IDENTITY (PostgreSQL 10+) for better SQL standard compliance

TIMESTAMP vs TIMESTAMPTZ: Always use TIMESTAMPTZ for event and user-generated times; it stores an absolute instant (normalized to UTC), while plain TIMESTAMP silently ignores time zones
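
Putting these tips together in one table definition (a hypothetical events table, purely for illustration):

CREATE TABLE events (
  id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,  -- IDENTITY instead of SERIAL
  payload TEXT,                                         -- TEXT instead of VARCHAR
  created_at TIMESTAMPTZ NOT NULL DEFAULT now()         -- TIMESTAMPTZ, not TIMESTAMP
);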


Version Control Workflow

Before Squashing

Create Backup Branch

git checkout -b backup-migrations-$(date +%Y%m%d)
git push origin backup-migrations-$(date +%Y%m%d)

Create Feature Branch

git checkout main
git checkout -b chore/consolidate-migrations

Run Squash

capysquash squash migrations/*.sql \
  --output clean/ \
  --safety conservative

Validate Results

capysquash validate migrations/ clean/

Replace Migrations

mv migrations/ migrations_original/
mv clean/ migrations/

Commit Changes

git add migrations/
git commit -m "chore: consolidate migrations (287 → 12 files)"

Create Pull Request

gh pr create \
  --title "Consolidate migrations" \
  --body "Reduces migration count by 96%. See details in commit."

Commit Message Template

chore: consolidate migrations

- Reduced from 287 to 12 files (96% reduction)
- Validated with pgsquash Docker validation
- Safety level: conservative
- All schemas identical

Consolidation details:
- CREATE + ALTER pairs: 89 consolidated
- Duplicate indexes removed: 23
- RLS policies optimized: 34
- Deployment time: 8min → 45sec

Team Collaboration

Communication Checklist

Before squashing:

  • ✓ Announce in team chat
  • ✓ Ensure no active migration work in progress
  • ✓ Check for open PRs touching migrations
  • ✓ Plan timing (end of sprint, between features)

After squashing:

  • ✓ Share validation results
  • ✓ Update team documentation
  • ✓ Notify about file location changes
  • ✓ Update migration trackers or wikis

Code Review Checklist

Reviewer Checklist

Reviewers should verify:

  • Validation passed successfully
  • Safety level is documented
  • File count reduction is documented
  • Commit message is clear and detailed
  • No open PRs conflict with this change
  • Deployment process updated (if needed)
  • CI/CD pipelines still work correctly
  • Team has been notified of changes

Validation & Testing

Always Validate

Never Skip Validation

# ALWAYS run validation before deploying
capysquash validate migrations/ clean/

Never skip validation for:

  • Production migrations
  • Shared branches (main, develop)
  • Client projects
  • First-time squashing

When Validation Fails

Don't Panic

Validation is designed to catch issues—this is expected behavior.

Review the Diff

capysquash validate --show-diff migrations/ clean/

Try More Conservative

capysquash squash migrations/*.sql --safety paranoid

Check for Edge Cases

Complex triggers, custom types, or ORM-specific patterns may need special handling.

Report Issues

If paranoid mode fails validation, file an issue with reproduction details.

Testing Migrations

Test against production-like data:

# 1. Clone production schema (without data)
pg_dump --schema-only production > schema.sql
psql test_db < schema.sql

# 2. Generate synthetic data
psql test_db <<EOF
INSERT INTO users (name, email)
SELECT
    'User ' || generate_series,
    'user' || generate_series || '@example.com'
FROM generate_series(1, 1000000);
EOF

# 3. Run migration
psql test_db < migration.sql

# 4. Measure performance
psql test_db -c "EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'test@example.com';"

Configuration Management

Environment-Specific Configs

Create different configurations for each environment:

File: pgsquash.dev.json

{
  "safety_level": "aggressive",
  "rules": {
    "remove_dead_code": true,
    "deduplicate_functions": true
  },
  "performance": {
    "parallel_workers": 8
  }
}

Usage:

capysquash squash migrations/*.sql --config pgsquash.dev.json

File: pgsquash.prod.json

{
  "safety_level": "conservative",
  "rules": {
    "remove_dead_code": false,
    "deduplicate_functions": false
  },
  "output": {
    "preserve_comments": true,
    "add_headers": true
  }
}

Usage:

capysquash squash migrations/*.sql --config pgsquash.prod.json

Keep Config in Version Control

git add pgsquash.*.json
git commit -m "docs: add pgsquash configurations for each environment"

Common Pitfalls

☒ Don't: Squash in Feature Branch

Bad Practice

# Bad: Squashing in feature branch
git checkout feature/new-table
capysquash squash migrations/*.sql --output clean/
# Now you'll have conflicts with main!

Do instead: Squash in a separate chore branch from main

☒ Don't: Skip Validation

Dangerous

# Bad: No validation
capysquash squash migrations/*.sql --output clean/
mv clean/ migrations/
git commit -m "squashed" # 😱 Hope and pray!

Do instead: Always validate first—it takes seconds and prevents disasters

☒ Don't: Lose Original Migrations

Data Loss Risk

# Bad: Overwriting originals without backup
capysquash squash migrations/*.sql --output migrations/ --force
# Lost originals forever!

Do instead: Output to a new directory and keep backups


Rollback Strategy

Keep Backups

# Before consolidating
cp -r migrations/ migrations_backup_$(date +%Y%m%d)/

# Or use git tags
git tag backup-migrations-$(date +%Y%m%d)
git push origin --tags

Rollback if Needed

# If issues arise after consolidation
mv migrations/ migrations_consolidated/
mv migrations_backup_20240115/ migrations/

# Or with git
git revert <consolidation-commit>

ORM-Specific Patterns

Prisma

# Squash Prisma migrations
capysquash squash prisma/migrations/*/migration.sql --output consolidated/

# Keep Prisma metadata intact
# Don't delete _prisma_migrations table
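
As a sanity check before and after consolidating, compare what Prisma has recorded (a sketch assuming the standard _prisma_migrations columns):

-- Applied migrations as Prisma recorded them
SELECT migration_name, finished_at
FROM _prisma_migrations
ORDER BY started_at;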

Drizzle

# Squash Drizzle migrations
capysquash squash drizzle/migrations/*.sql --output consolidated/

# Preserve __drizzle_migrations table

Supabase

# Squash Supabase migrations
capysquash squash supabase/migrations/*.sql --output consolidated/

CAPYSQUASH automatically preserves Supabase-specific patterns:

  • Auth schema and functions
  • Storage schema and buckets
  • RLS policies
  • Realtime functions

Monitoring & Maintenance

Track Migration Health

Create a monthly check script:

#!/bin/bash
# monthly-migration-check.sh

DATE=$(date +%Y-%m-%d)
mkdir -p reports

capysquash analyze migrations/*.sql \
  --report-format json > "reports/$DATE.json"

# Extract metrics
FILE_COUNT=$(jq '.file_count' "reports/$DATE.json")
CONSOLIDATION=$(jq '.consolidation_percentage' "reports/$DATE.json")

echo "$DATE,$FILE_COUNT,$CONSOLIDATION" >> metrics.csv

# Alert if consolidation opportunity is high
if (( $(echo "$CONSOLIDATION > 60" | bc -l) )); then
  echo "⚠️  Time to consolidate migrations ($CONSOLIDATION% opportunity)"
fi

Set Consolidation Thresholds

Consolidation Opportunity   Action
< 30%                       No action needed
30-50%                      Consider consolidating soon
50-70%                      Plan consolidation within the sprint
> 70%                       Consolidate ASAP (technical debt building)
