Dependency Resolution
How capysquash builds and uses dependency graphs for safe consolidation
DEPENDENCY RESOLUTION
The brain behind safe migration consolidation
THE PROBLEM
SQL operations have complex interdependencies:
WHY ORDER MATTERS
-- This works:
CREATE TABLE users (id SERIAL PRIMARY KEY);
CREATE TABLE posts (author_id INT REFERENCES users(id));
-- This fails:
CREATE TABLE posts (author_id INT REFERENCES users(id));
CREATE TABLE users (id SERIAL PRIMARY KEY);
-- Error: relation "users" does not existConsolidating migrations requires understanding which operations depend on which objects.
HOW capysquash TRACKS DEPENDENCIES
Step 1: Object Lifecycle Tracking
capysquash tracks every database object through its entire lifecycle:
// Internal tracking structure
type ObjectLifecycle struct {
Name string // e.g., "users", "posts", "idx_email"
Type ObjectType // TABLE, INDEX, CONSTRAINT, FUNCTION, etc.
CreatedIn *Migration // Which migration created it
ModifiedIn []*Migration // Which migrations modified it
DroppedIn *Migration // Which migration dropped it (if any)
DependsOn []string // Other objects this depends on
}Step 2: Build Dependency Graph
EXAMPLE MIGRATIONS
-- 001_users.sql
CREATE TABLE users (
id SERIAL PRIMARY KEY,
email TEXT
);
-- 002_posts.sql
CREATE TABLE posts (
id SERIAL PRIMARY KEY,
author_id INT
);
-- 003_foreign_key.sql
ALTER TABLE posts
ADD CONSTRAINT fk_author
FOREIGN KEY (author_id)
REFERENCES users(id);
-- 004_email_index.sql
CREATE INDEX idx_email
ON users(email);DEPENDENCY GRAPH
users (table)
├─→ users_id_seq (sequence)
├─→ users.id (column)
├─→ users.email (column)
│ └─→ idx_email (index)
└─→ users_pkey (constraint)
posts (table)
├─→ posts_id_seq (sequence)
├─→ posts.id (column)
├─→ posts.author_id (column)
│ └─→ fk_author (constraint)
│ └─→ depends on users.id
└─→ posts_pkey (constraint)Step 3: Topological Sorting
capysquash uses topological sort to determine safe consolidation order:
// Simplified algorithm
func TopologicalSort(objects []Object) ([]Object, error) {
sorted := []Object{}
visited := make(map[string]bool)
for _, obj := range objects {
if err := visit(obj, &sorted, visited); err != nil {
return nil, err // Circular dependency detected
}
}
return sorted, nil
}
func visit(obj Object, sorted *[]Object, visited map[string]bool) error {
if visited[obj.Name] {
return nil // Already processed
}
// Visit dependencies first
for _, dep := range obj.Dependencies {
if err := visit(dep, sorted, visited); err != nil {
return err
}
}
visited[obj.Name] = true
*sorted = append(*sorted, obj)
return nil
}DEPENDENCY TYPES
1. Direct Table Dependencies
Foreign Keys:
-- posts depends on users
ALTER TABLE posts ADD CONSTRAINT fk_author
FOREIGN KEY (author_id) REFERENCES users(id);Detection:
- Parse
REFERENCESclause - Extract referenced table and column
- Add
posts.fk_author → users.iddependency
2. Column Dependencies
Columns on Tables:
-- users.email depends on users table existing
ALTER TABLE users ADD COLUMN email TEXT;Detection:
- Table must exist before adding columns
- Add
users.email → usersdependency
3. Index Dependencies
Indexes on Columns:
-- idx_email depends on users.email existing
CREATE INDEX idx_email ON users(email);Detection:
- Index requires target column(s) to exist
- Add
idx_email → users.emaildependency
4. Constraint Dependencies
Check Constraints:
-- Constraint depends on column existing
ALTER TABLE users ADD CONSTRAINT check_email
CHECK (email ~* '^[A-Za-z0-9._%+-]+@');Detection:
- Parse constraint expression
- Extract referenced columns
- Add
check_email → users.emaildependency
5. Function Dependencies
Functions Using Tables:
CREATE FUNCTION count_user_posts(user_id INT)
RETURNS INT AS $$
SELECT COUNT(*) FROM posts WHERE author_id = user_id;
$$ LANGUAGE sql;Detection:
- Parse function body for table references
- Add
count_user_posts → postsdependency - Note: This is best-effort (PL/pgSQL is complex)
6. Trigger Dependencies
Triggers on Tables:
CREATE TRIGGER update_timestamp
BEFORE UPDATE ON users
FOR EACH ROW
EXECUTE FUNCTION update_modified_column();Detection:
- Trigger depends on table existing
- Trigger depends on function existing
- Add
update_timestamp → usersandupdate_timestamp → update_modified_columndependencies
7. View Dependencies
Views Referencing Tables:
CREATE VIEW active_users AS
SELECT * FROM users WHERE active = true;Detection:
- Parse view query for table references
- Add
active_users → usersdependency
8. Type Dependencies
Composite Types:
CREATE TYPE address AS (
street TEXT,
city TEXT
);
ALTER TABLE users ADD COLUMN addr address;Detection:
- Column type references custom type
- Add
users.addr → addressdependency
CONSOLIDATION STRATEGIES
Safe Consolidation (Conservative)
RULES:
- ☑ Merge operations on same object if no intermediate dependencies
- ☑ Preserve all dependency order
- ☑ Never reorder across dependency boundaries
- ☒ Don't merge if any risk detected
Example:
-- Original:
CREATE TABLE users (id INT);
ALTER TABLE users ADD email TEXT;
ALTER TABLE users ADD name TEXT;
-- Consolidated:
CREATE TABLE users (id INT, email TEXT, name TEXT);
-- Safe: All modifications on same table, no external dependenciesAggressive Consolidation
RULES:
- ☑ Reorder operations if dependency graph allows
- ☑ Merge even with intermediate steps
- ☑ Eliminate redundant operations
- ⚠️ Higher risk of subtle behavior changes
Example:
-- Original:
CREATE INDEX idx_a ON users(email);
CREATE INDEX idx_b ON posts(title);
CREATE INDEX idx_c ON users(name);
-- Consolidated (reordered):
CREATE INDEX idx_a ON users(email);
CREATE INDEX idx_c ON users(name);
CREATE INDEX idx_b ON posts(title);
-- Safe: Independent indexes can be reorderedCIRCULAR DEPENDENCY DETECTION
capysquash detects circular dependencies and prevents consolidation:
EXAMPLE: CIRCULAR REFERENCE
-- This creates a circular dependency:
CREATE TABLE users (
id SERIAL PRIMARY KEY,
primary_address_id INT
);
CREATE TABLE addresses (
id SERIAL PRIMARY KEY,
user_id INT REFERENCES users(id)
);
ALTER TABLE users
ADD CONSTRAINT fk_primary_address
FOREIGN KEY (primary_address_id)
REFERENCES addresses(id);Dependency Graph:
users → addresses (via addresses.user_id FK)
addresses → users (via users.primary_address_id FK)capysquash Action:
- ☒ Cannot consolidate into single migration
- ☑ Keeps in separate migrations to allow deferred constraints
- ⚠️ Warns user about circular dependency
ADVANCED SCENARIOS
Scenario 1: Rename Tracking
-- Migration 001
CREATE TABLE old_users (id INT);
-- Migration 002
INSERT INTO old_users VALUES (1), (2), (3);
-- Migration 003
ALTER TABLE old_users RENAME TO users;
-- Migration 004
ALTER TABLE users ADD COLUMN email TEXT;Dependency Tracking:
usersis a renamed version ofold_users- Migration 004 operates on
users, but data exists inold_users - capysquash tracks the rename chain
Consolidation:
-- Consolidated (cannot merge due to data migration):
CREATE TABLE old_users (id INT);
INSERT INTO old_users VALUES (1), (2), (3);
ALTER TABLE old_users RENAME TO users;
ALTER TABLE users ADD COLUMN email TEXT;
-- Alternative (if no data migration):
CREATE TABLE users (id INT, email TEXT);Scenario 2: Drop and Recreate
-- Migration 001
CREATE TABLE users (id INT);
-- Migration 002
CREATE INDEX idx_id ON users(id);
-- Migration 003
DROP INDEX idx_id;
-- Migration 004
CREATE INDEX idx_id ON users(id);Dependency Tracking:
- First
idx_idcreated in migration 002 idx_iddropped in migration 003- New
idx_idcreated in migration 004 (different object, same name)
Consolidation:
-- Aggressive mode: Eliminate the drop/recreate cycle
CREATE TABLE users (id INT);
CREATE INDEX idx_id ON users(id);
-- Conservative mode: Preserve history
CREATE TABLE users (id INT);
CREATE INDEX idx_id ON users(id);
DROP INDEX idx_id;
CREATE INDEX idx_id ON users(id);Scenario 3: Cross-Schema Dependencies
-- Migration 001
CREATE SCHEMA auth;
CREATE TABLE auth.users (id INT);
-- Migration 002
CREATE SCHEMA app;
CREATE TABLE app.posts (
id INT,
author_id INT REFERENCES auth.users(id)
);Dependency Tracking:
app.postsdepends onauth.usersauth.usersdepends onauthschemaapp.postsdepends onappschema
Consolidation:
-- Must preserve schema creation order and cross-schema references
CREATE SCHEMA auth;
CREATE TABLE auth.users (id INT);
CREATE SCHEMA app;
CREATE TABLE app.posts (
id INT,
author_id INT REFERENCES auth.users(id)
);PERFORMANCE OPTIMIZATIONS
Parallel Dependency Resolution
For large migration sets (500+ files), capysquash uses parallel processing:
// Parse migrations in parallel
results := make(chan ParseResult, len(files))
for _, file := range files {
go func(f string) {
ast, err := parser.Parse(f)
results <- ParseResult{File: f, AST: ast, Err: err}
}(file)
}
// Collect results and build dependency graph
graph := NewDependencyGraph()
for range files {
result := <-results
if result.Err == nil {
graph.AddStatements(result.AST)
}
}Incremental Graph Updates
When analyzing individual migrations, capysquash updates the graph incrementally:
// Instead of rebuilding entire graph:
graph.RemoveMigration("005_old.sql")
graph.AddMigration("005_new.sql")
// Recomputes only affected subgraph
graph.RecalculateDependencies("users")DEBUGGING DEPENDENCIES
Visualize Dependency Graph
capysquash analyze migrations/ --show-deps --format=dot > deps.dot
dot -Tpng deps.dot -o deps.pngOutput:
- Graphviz DOT format
- Render with any graph visualization tool
- See all dependencies visually
Explain Dependency Chain
capysquash analyze migrations/ --explain-dep posts.fk_authorOutput:
posts.fk_author (constraint)
└─→ Created in: 003_foreign_key.sql
└─→ Depends on:
├─→ posts.author_id (column)
│ └─→ Created in: 002_posts.sql
│ └─→ Depends on:
│ └─→ posts (table)
│ └─→ Created in: 002_posts.sql
└─→ users.id (column)
└─→ Created in: 001_users.sql
└─→ Depends on:
└─→ users (table)
└─→ Created in: 001_users.sqlNEXT STEPS
- SQL Parsing - How capysquash parses SQL accurately
- Docker Validation - Validating consolidated migrations
- Safety Levels - Choosing consolidation aggressiveness
Questions about dependencies? Check GitHub Discussions
How is this guide?