CAPYSQUASH

Dependency Resolution

How capysquash builds and uses dependency graphs for safe consolidation

DEPENDENCY RESOLUTION

The brain behind safe migration consolidation

THE PROBLEM

SQL operations have complex interdependencies:

WHY ORDER MATTERS

-- This works:
CREATE TABLE users (id SERIAL PRIMARY KEY);
CREATE TABLE posts (author_id INT REFERENCES users(id));

-- This fails:
CREATE TABLE posts (author_id INT REFERENCES users(id));
CREATE TABLE users (id SERIAL PRIMARY KEY);
-- Error: relation "users" does not exist

Consolidating migrations requires understanding which operations depend on which objects.

HOW capysquash TRACKS DEPENDENCIES

Step 1: Object Lifecycle Tracking

capysquash tracks every database object through its entire lifecycle:

// Internal tracking structure
type ObjectLifecycle struct {
    Name       string           // e.g., "users", "posts", "idx_email"
    Type       ObjectType       // TABLE, INDEX, CONSTRAINT, FUNCTION, etc.
    CreatedIn  *Migration       // Which migration created it
    ModifiedIn []*Migration     // Which migrations modified it
    DroppedIn  *Migration       // Which migration dropped it (if any)
    DependsOn  []string         // Other objects this depends on
}

Step 2: Build Dependency Graph

EXAMPLE MIGRATIONS

-- 001_users.sql
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    email TEXT
);

-- 002_posts.sql
CREATE TABLE posts (
    id SERIAL PRIMARY KEY,
    author_id INT
);

-- 003_foreign_key.sql
ALTER TABLE posts
ADD CONSTRAINT fk_author
FOREIGN KEY (author_id)
REFERENCES users(id);

-- 004_email_index.sql
CREATE INDEX idx_email
ON users(email);

DEPENDENCY GRAPH

users (table)
  ├─→ users_id_seq (sequence)
  ├─→ users.id (column)
  ├─→ users.email (column)
  │     └─→ idx_email (index)
  └─→ users_pkey (constraint)

posts (table)
  ├─→ posts_id_seq (sequence)
  ├─→ posts.id (column)
  ├─→ posts.author_id (column)
  │     └─→ fk_author (constraint)
  │           └─→ depends on users.id
  └─→ posts_pkey (constraint)

Step 3: Topological Sorting

capysquash uses topological sort to determine safe consolidation order:

// Simplified algorithm
func TopologicalSort(objects []Object) ([]Object, error) {
    sorted := []Object{}
    visited := make(map[string]bool)

    for _, obj := range objects {
        if err := visit(obj, &sorted, visited); err != nil {
            return nil, err // Circular dependency detected
        }
    }

    return sorted, nil
}

func visit(obj Object, sorted *[]Object, visited map[string]bool) error {
    if visited[obj.Name] {
        return nil // Already processed
    }

    // Visit dependencies first
    for _, dep := range obj.Dependencies {
        if err := visit(dep, sorted, visited); err != nil {
            return err
        }
    }

    visited[obj.Name] = true
    *sorted = append(*sorted, obj)
    return nil
}

DEPENDENCY TYPES

1. Direct Table Dependencies

Foreign Keys:

-- posts depends on users
ALTER TABLE posts ADD CONSTRAINT fk_author
    FOREIGN KEY (author_id) REFERENCES users(id);

Detection:

  • Parse REFERENCES clause
  • Extract referenced table and column
  • Add posts.fk_author → users.id dependency

2. Column Dependencies

Columns on Tables:

-- users.email depends on users table existing
ALTER TABLE users ADD COLUMN email TEXT;

Detection:

  • Table must exist before adding columns
  • Add users.email → users dependency

3. Index Dependencies

Indexes on Columns:

-- idx_email depends on users.email existing
CREATE INDEX idx_email ON users(email);

Detection:

  • Index requires target column(s) to exist
  • Add idx_email → users.email dependency

4. Constraint Dependencies

Check Constraints:

-- Constraint depends on column existing
ALTER TABLE users ADD CONSTRAINT check_email
    CHECK (email ~* '^[A-Za-z0-9._%+-]+@');

Detection:

  • Parse constraint expression
  • Extract referenced columns
  • Add check_email → users.email dependency

5. Function Dependencies

Functions Using Tables:

CREATE FUNCTION count_user_posts(user_id INT)
RETURNS INT AS $$
    SELECT COUNT(*) FROM posts WHERE author_id = user_id;
$$ LANGUAGE sql;

Detection:

  • Parse function body for table references
  • Add count_user_posts → posts dependency
  • Note: This is best-effort (PL/pgSQL is complex)

6. Trigger Dependencies

Triggers on Tables:

CREATE TRIGGER update_timestamp
    BEFORE UPDATE ON users
    FOR EACH ROW
    EXECUTE FUNCTION update_modified_column();

Detection:

  • Trigger depends on table existing
  • Trigger depends on function existing
  • Add update_timestamp → users and update_timestamp → update_modified_column dependencies

7. View Dependencies

Views Referencing Tables:

CREATE VIEW active_users AS
    SELECT * FROM users WHERE active = true;

Detection:

  • Parse view query for table references
  • Add active_users → users dependency

8. Type Dependencies

Composite Types:

CREATE TYPE address AS (
    street TEXT,
    city TEXT
);

ALTER TABLE users ADD COLUMN addr address;

Detection:

  • Column type references custom type
  • Add users.addr → address dependency

CONSOLIDATION STRATEGIES

Safe Consolidation (Conservative)

RULES:

  • ☑ Merge operations on same object if no intermediate dependencies
  • ☑ Preserve all dependency order
  • ☑ Never reorder across dependency boundaries
  • ☒ Don't merge if any risk detected

Example:

-- Original:
CREATE TABLE users (id INT);
ALTER TABLE users ADD email TEXT;
ALTER TABLE users ADD name TEXT;

-- Consolidated:
CREATE TABLE users (id INT, email TEXT, name TEXT);
-- Safe: All modifications on same table, no external dependencies

Aggressive Consolidation

RULES:

  • ☑ Reorder operations if dependency graph allows
  • ☑ Merge even with intermediate steps
  • ☑ Eliminate redundant operations
  • ⚠️ Higher risk of subtle behavior changes

Example:

-- Original:
CREATE INDEX idx_a ON users(email);
CREATE INDEX idx_b ON posts(title);
CREATE INDEX idx_c ON users(name);

-- Consolidated (reordered):
CREATE INDEX idx_a ON users(email);
CREATE INDEX idx_c ON users(name);
CREATE INDEX idx_b ON posts(title);
-- Safe: Independent indexes can be reordered

CIRCULAR DEPENDENCY DETECTION

capysquash detects circular dependencies and prevents consolidation:

EXAMPLE: CIRCULAR REFERENCE

-- This creates a circular dependency:
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    primary_address_id INT
);

CREATE TABLE addresses (
    id SERIAL PRIMARY KEY,
    user_id INT REFERENCES users(id)
);

ALTER TABLE users
    ADD CONSTRAINT fk_primary_address
    FOREIGN KEY (primary_address_id)
    REFERENCES addresses(id);

Dependency Graph:

users → addresses (via addresses.user_id FK)
addresses → users (via users.primary_address_id FK)

capysquash Action:

  • ☒ Cannot consolidate into single migration
  • ☑ Keeps in separate migrations to allow deferred constraints
  • ⚠️ Warns user about circular dependency

ADVANCED SCENARIOS

Scenario 1: Rename Tracking

-- Migration 001
CREATE TABLE old_users (id INT);

-- Migration 002
INSERT INTO old_users VALUES (1), (2), (3);

-- Migration 003
ALTER TABLE old_users RENAME TO users;

-- Migration 004
ALTER TABLE users ADD COLUMN email TEXT;

Dependency Tracking:

  • users is a renamed version of old_users
  • Migration 004 operates on users, but data exists in old_users
  • capysquash tracks the rename chain

Consolidation:

-- Consolidated (cannot merge due to data migration):
CREATE TABLE old_users (id INT);
INSERT INTO old_users VALUES (1), (2), (3);
ALTER TABLE old_users RENAME TO users;
ALTER TABLE users ADD COLUMN email TEXT;

-- Alternative (if no data migration):
CREATE TABLE users (id INT, email TEXT);

Scenario 2: Drop and Recreate

-- Migration 001
CREATE TABLE users (id INT);

-- Migration 002
CREATE INDEX idx_id ON users(id);

-- Migration 003
DROP INDEX idx_id;

-- Migration 004
CREATE INDEX idx_id ON users(id);

Dependency Tracking:

  • First idx_id created in migration 002
  • idx_id dropped in migration 003
  • New idx_id created in migration 004 (different object, same name)

Consolidation:

-- Aggressive mode: Eliminate the drop/recreate cycle
CREATE TABLE users (id INT);
CREATE INDEX idx_id ON users(id);

-- Conservative mode: Preserve history
CREATE TABLE users (id INT);
CREATE INDEX idx_id ON users(id);
DROP INDEX idx_id;
CREATE INDEX idx_id ON users(id);

Scenario 3: Cross-Schema Dependencies

-- Migration 001
CREATE SCHEMA auth;
CREATE TABLE auth.users (id INT);

-- Migration 002
CREATE SCHEMA app;
CREATE TABLE app.posts (
    id INT,
    author_id INT REFERENCES auth.users(id)
);

Dependency Tracking:

  • app.posts depends on auth.users
  • auth.users depends on auth schema
  • app.posts depends on app schema

Consolidation:

-- Must preserve schema creation order and cross-schema references
CREATE SCHEMA auth;
CREATE TABLE auth.users (id INT);
CREATE SCHEMA app;
CREATE TABLE app.posts (
    id INT,
    author_id INT REFERENCES auth.users(id)
);

PERFORMANCE OPTIMIZATIONS

Parallel Dependency Resolution

For large migration sets (500+ files), capysquash uses parallel processing:

// Parse migrations in parallel
results := make(chan ParseResult, len(files))
for _, file := range files {
    go func(f string) {
        ast, err := parser.Parse(f)
        results <- ParseResult{File: f, AST: ast, Err: err}
    }(file)
}

// Collect results and build dependency graph
graph := NewDependencyGraph()
for range files {
    result := <-results
    if result.Err == nil {
        graph.AddStatements(result.AST)
    }
}

Incremental Graph Updates

When analyzing individual migrations, capysquash updates the graph incrementally:

// Instead of rebuilding entire graph:
graph.RemoveMigration("005_old.sql")
graph.AddMigration("005_new.sql")

// Recomputes only affected subgraph
graph.RecalculateDependencies("users")

DEBUGGING DEPENDENCIES

Visualize Dependency Graph

capysquash analyze migrations/ --show-deps --format=dot > deps.dot
dot -Tpng deps.dot -o deps.png

Output:

  • Graphviz DOT format
  • Render with any graph visualization tool
  • See all dependencies visually

Explain Dependency Chain

capysquash analyze migrations/ --explain-dep posts.fk_author

Output:

posts.fk_author (constraint)
  └─→ Created in: 003_foreign_key.sql
  └─→ Depends on:
      ├─→ posts.author_id (column)
      │   └─→ Created in: 002_posts.sql
      │   └─→ Depends on:
      │       └─→ posts (table)
      │           └─→ Created in: 002_posts.sql
      └─→ users.id (column)
          └─→ Created in: 001_users.sql
          └─→ Depends on:
              └─→ users (table)
                  └─→ Created in: 001_users.sql

NEXT STEPS


Questions about dependencies? Check GitHub Discussions

How is this guide?

On this page