Autogenerating and Reviewing Migration Scripts with Alembic

Alembic's revision --autogenerate command compares your SQLAlchemy MetaData object against the live database schema and emits a migration script containing the op-directives needed to bring them in sync — part of the Alembic async migrations and schema evolution workflow every production team needs to master. This guide covers the full cycle: wiring target_metadata, tuning the diff engine, editing generated revisions, writing data migrations, and managing branch graphs.

Concept & Execution Model

Autogenerate sits at the intersection of two graphs: the model graph (your Python Table and DeclarativeBase definitions) and the reflected graph (the live schema Alembic reads from the database at migration time). The diff engine walks both graphs, column by column and constraint by constraint, and translates differences into op.add_column, op.drop_index, op.alter_column, and similar directives.

This works only if env.py is wired correctly. The critical line is:

# alembic/env.py
from myapp.models import Base  # imports all mapped classes via the registry

target_metadata = Base.metadata

Without target_metadata, autogenerate has no model graph to compare against and will emit an empty migration every time. Import the metadata after all modules that define tables have been imported — a common mistake is importing Base from a module that itself does not import submodules where Order, Product, or Tenant models live. A robust pattern is a dedicated model loader:

# alembic/env.py — safe model import pattern
from myapp.models import Base
# Explicitly import every subpackage so that their Table registrations execute
import myapp.models.users      # noqa: F401
import myapp.models.orders     # noqa: F401
import myapp.models.products   # noqa: F401
import myapp.models.invoices   # noqa: F401
import myapp.models.tenants    # noqa: F401

target_metadata = Base.metadata

The # noqa: F401 comments suppress linter warnings for unused imports — these imports are intentionally side-effect-only.

Alembic autogenerate diff pipeline Shows model metadata and reflected DB schema feeding into the diff engine, which outputs op directives written into a migration script file. SQLAlchemy MetaData Table: users Table: orders Table: products Table: invoices ix_user_email FK: orders→users target_metadata Base.metadata Live Database Schema Table: users Table: orders products absent ✗ invoices absent ✗ ix_user_email FK missing ✗ inspector.reflect dialect introspect Diff Engine compare_type=True compare_server_default naming_convention include_object hook include_schemas Migration Script op.create_table("products", ...) op.add_column / op.create_foreign_key model graph reflected graph

Running Autogenerate

# Generate a named revision against the configured database URL
alembic revision --autogenerate -m "add_products_table_and_orders_fk"

# Override the URL without editing alembic.ini
alembic revision --autogenerate -m "add_invoices" \
    --url postgresql+asyncpg://app:secret@localhost/appdb

Alembic connects to the database URL configured in alembic.ini, reflects the schema, diffs against target_metadata, and writes a new file under alembic/versions/. The filename starts with a random hex token — that token is the revision identifier used in depends_on and down_revision chains. Never rename or re-token a revision file after committing it; doing so severs the version chain for anyone who has already applied it.

What Autogenerate Can and Cannot Detect

Understanding the detection boundaries prevents nasty surprises in production.

Detected automaticallyRequires flag or manual step
New and dropped tablesStored procedures, functions, triggers, views
New, dropped, and modified columnsPostgres SEQUENCE objects
Primary key changesPartial indexes (WHERE clause)
Foreign key additions and dropsColumn-level comments
Index additions and dropsSchema creation / drop
Unique constraint changesEnum type bodies (CREATE TYPE)
Server default changes (compare_server_default=True)Row-level data changes
Column type changes (compare_type=True)Any dialect-specific extension objects

This breakdown is stable across SQLAlchemy 2.0 and Alembic 1.13+. For partial index detection, you can write a custom render_item hook that emits a raw op.execute("CREATE INDEX ... WHERE ...") instead of using op.create_index. That hook must also handle the corresponding downgrade path manually.

Query Construction & Async Execution Patterns

Because autogenerate needs to reflect the live schema, it requires a database connection. In async projects wired through configuring Alembic with async SQLAlchemy engines, env.py must run the reflection inside asyncio.run() using run_sync:

# alembic/env.py — complete async env.py for autogenerate
import asyncio
from logging.config import fileConfig

from alembic import context
from sqlalchemy.ext.asyncio import async_engine_from_config

# Side-effect imports: register all ORM models with Base.metadata
from myapp.models import Base          # noqa: F401
import myapp.models.users              # noqa: F401
import myapp.models.orders             # noqa: F401
import myapp.models.products           # noqa: F401
import myapp.models.invoices           # noqa: F401

config = context.config
fileConfig(config.config_file_name)
target_metadata = Base.metadata


def do_run_migrations(connection):
    context.configure(
        connection=connection,
        target_metadata=target_metadata,
        compare_type=True,
        compare_server_default=True,
        include_schemas=True,
    )
    with context.begin_transaction():
        context.run_migrations()


async def run_async_migrations() -> None:
    connectable = async_engine_from_config(
        config.get_section(config.config_ini_section),
        prefix="sqlalchemy.",
    )
    async with connectable.connect() as connection:
        await connection.run_sync(do_run_migrations)
    await connectable.dispose()


def run_migrations_online() -> None:
    asyncio.run(run_async_migrations())


run_migrations_online()

The run_sync call hands a synchronous DBAPI connection to Alembic's reflection machinery, which does not know about async. This is the canonical pattern — attempting to call context.run_migrations() directly inside an async def without run_sync raises sqlalchemy.exc.MissingGreenlet: greenlet_spawn has not been called.

For offline mode (generating SQL without a live database connection), the async engine is unavailable. Autogenerate requires online mode; offline mode is only suitable for alembic upgrade --sql to generate a SQL script from an already-generated revision.

State Management & Session Boundaries

Autogenerate reads schema state at the moment the connection is made. If your application uses multiple MetaData objects — for example, one per tenant schema or one for a read model — you must either merge them or use include_object / include_schemas filters (covered fully in the excluding tables and schemas guide) to keep Alembic from generating spurious drop directives for tables it does not own.

To target multiple MetaData objects in a single autogenerate run, pass them as a list:

# alembic/env.py — multiple MetaData objects
from myapp.app_models import AppBase
from myapp.admin_models import AdminBase

target_metadata = [AppBase.metadata, AdminBase.metadata]

Alembic 1.7+ supports this. Each MetaData object must use the same naming_convention or constraint detection will produce unpredictable results — the diff engine compares constraint names, so inconsistent conventions produce false positives.

Transaction Boundaries for Data Migrations

Alembic wraps each migration in a transaction by default (transaction_per_migration=True). If you run DDL and DML together — creating a column and then back-filling values — both must succeed atomically or the schema ends up in a partially migrated state. PostgreSQL supports transactional DDL, so this works cleanly. MySQL does not; on MySQL separate the DDL and DML into distinct revisions.

The op.get_bind() function returns the DBAPI connection bound to the current migration context. Use it inside upgrade() to execute raw SQL or SQLAlchemy Core expressions for data work. Avoid importing your ORM models directly inside migration files — model definitions change over time, but migrations are frozen at a specific schema state.

Advanced Autogenerate Patterns

compare_type and compare_server_default

By default, Alembic does not compare column types or server defaults, because type comparison across different database dialects is lossy. Enable both flags when you need fine-grained drift detection:

context.configure(
    connection=connection,
    target_metadata=target_metadata,
    compare_type=True,            # detect VARCHAR(100) → VARCHAR(255) changes
    compare_server_default=True,  # detect DEFAULT now() → DEFAULT CURRENT_TIMESTAMP
)

compare_type=True triggers dialect.compare_type(inspector_column, metadata_column) — a method some dialects implement incompletely. For custom TypeDecorator subclasses, override __repr__ so the string comparison produces stable output:

# myapp/types.py
import sqlalchemy as sa


class TenantId(sa.TypeDecorator):
    """UUID stored as a VARCHAR(36) with tenant isolation semantics."""

    impl = sa.String(36)
    cache_ok = True

    def __repr__(self) -> str:
        # Stable repr prevents autogenerate from treating every column as changed
        return "TenantId()"

    def process_bind_param(self, value, dialect):
        return str(value) if value is not None else None

    def process_result_value(self, value, dialect):
        return value

Verify idempotency by running autogenerate twice in a row — the second run must emit no changes. If it does, the type repr is not stable.

compare_server_default=True is similarly fragile for expressions. PostgreSQL renders nextval('seq'::regclass) while your model might declare server_default=text("nextval('seq')"). Use a custom process_revision_directives hook to normalize both sides before comparison, or simply exclude known-unstable columns from comparison using include_object.

MetaData Naming Conventions

Without a naming_convention, Alembic cannot reliably detect renamed constraints because the database assigns auto-generated names that differ from what your model implies. Set a naming_convention on your MetaData object and Alembic will construct predictable names it can compare:

# myapp/models.py
from sqlalchemy import MetaData
from sqlalchemy.orm import DeclarativeBase

NAMING_CONVENTION = {
    "ix": "ix_%(column_0_label)s",
    "uq": "uq_%(table_name)s_%(column_0_name)s",
    "ck": "ck_%(table_name)s_%(constraint_name)s",
    "fk": "fk_%(table_name)s_%(column_0_name)s_%(referred_table_name)s",
    "pk": "pk_%(table_name)s",
}


class Base(DeclarativeBase):
    metadata = MetaData(naming_convention=NAMING_CONVENTION)

With this in place, running alembic revision --autogenerate after adding a UniqueConstraint will emit:

op.create_unique_constraint(
    "uq_orders_invoice_number",
    "orders",
    ["invoice_number"],
)

...rather than the database-assigned orders_invoice_number_key. Deterministic names mean autogenerate can compare the model's expected constraint name against the reflected name and detect true renames versus spurious differences.

Apply the naming convention before your first migration. Retrofitting it to an existing database requires a migration that drops and recreates every anonymous constraint under its new canonical name — a non-trivial operation covered in the zero-downtime schema migration strategies guide.

Editing Generated Revisions

Autogenerate is a starting point, not the final word. Every generated migration should be reviewed and often edited before committing. Open the revision file in alembic/versions/ after generation and check it against this list:

  1. Verify operation order. op.create_table must precede op.create_foreign_key targeting it. Autogenerate generally gets this right, but multi-table cycles require manual reordering.
  2. Remove spurious op.drop_table directives. If a third-party table appears in the DB but not in your metadata, autogenerate will propose dropping it. Always check these before committing.
  3. Check nullable changes. If a column was nullable=True in the DB and you declare nullable=False in the model, autogenerate emits op.alter_column(..., nullable=False) — which fails if the column has existing NULL values. Add a data-cleanup step before the alter_column.
  4. Review index naming. Without a naming convention, index names may differ between the model and the database even when the columns are identical.

Data Migrations: op.bulk_insert and Chunked op.execute

For seeding reference data, op.bulk_insert is cleaner and dialect-portable compared to raw op.execute:

# Seeding reference data in a migration
from alembic import op
import sqlalchemy as sa

invoice_statuses = sa.table(
    "invoice_statuses",
    sa.column("code", sa.String),
    sa.column("label", sa.String),
    sa.column("sort_order", sa.Integer),
)


def upgrade() -> None:
    op.create_table(
        "invoice_statuses",
        sa.Column("code", sa.String(8), primary_key=True),
        sa.Column("label", sa.String(64), nullable=False),
        sa.Column("sort_order", sa.Integer, nullable=False, server_default="0"),
    )
    op.bulk_insert(
        invoice_statuses,
        [
            {"code": "DRAFT",  "label": "Draft",    "sort_order": 1},
            {"code": "SENT",   "label": "Sent",      "sort_order": 2},
            {"code": "PAID",   "label": "Paid",      "sort_order": 3},
            {"code": "VOID",   "label": "Voided",    "sort_order": 4},
            {"code": "REFUND", "label": "Refunded",  "sort_order": 5},
        ],
    )


def downgrade() -> None:
    op.drop_table("invoice_statuses")

op.bulk_insert uses the bound connection directly and does not go through the ORM session, so there is no risk of triggering validators, events, or lazy loads during migration.

For back-filling large tables, batch chunked updates to avoid lock escalation:

# upgrade() — chunked back-fill migration for orders.status_code
from alembic import op
import sqlalchemy as sa

orders_table = sa.table(
    "orders",
    sa.column("id", sa.Integer),
    sa.column("status", sa.String),
    sa.column("status_code", sa.SmallInteger),
)


def upgrade() -> None:
    # Step 1: add column as nullable
    op.add_column(
        "orders",
        sa.Column("status_code", sa.SmallInteger, nullable=True),
    )

    # Step 2: back-fill in 10 000-row chunks
    conn = op.get_bind()
    while True:
        result = conn.execute(
            sa.update(orders_table)
            .where(orders_table.c.status_code.is_(None))
            .where(orders_table.c.status.isnot(None))
            .values(
                status_code=sa.case(
                    (orders_table.c.status == "pending",   0),
                    (orders_table.c.status == "completed", 1),
                    (orders_table.c.status == "cancelled", 2),
                    else_=99,
                )
            )
            .returning(orders_table.c.id)
            .limit(10_000)
        )
        if result.rowcount == 0:
            break

    # Step 3: enforce NOT NULL now that all rows are populated
    op.alter_column("orders", "status_code", nullable=False)


def downgrade() -> None:
    op.drop_column("orders", "status_code")

The RETURNING ... LIMIT 10_000 pattern on PostgreSQL is efficient — it updates rows, returns their IDs, and the loop exits when no more NULL rows remain. On MySQL (which lacks RETURNING), replace with a SELECT ... LIMIT 10_000 followed by an UPDATE ... WHERE id IN (...).

Branch and Merge Migrations

Alembic's version graph is a DAG, not a linear chain. When two developers each run alembic revision --autogenerate from the same head, they create two branches with the same down_revision:

base
 └─ 001_initial
     ├─ 002a_add_products  (developer A, down_revision=001)
     └─ 002b_add_invoices  (developer B, down_revision=001)

Running alembic upgrade head against a branched graph raises Multiple head revisions are present for given argument 'head'. Resolve it with:

# Inspect the branch points
alembic heads

# Merge the two heads into a single revision
alembic merge -m "merge_products_and_invoices_branches" 002a 002b

This generates a merge revision with down_revision = ("002a...", "002b..."). The merge revision's upgrade() and downgrade() should be no-ops unless there are real conflicts — for example, both branches adding the same column to the same table. In that case, remove the duplicate op.add_column from the merge revision and add a try/except guard or an existence check.

Avoid branches entirely in single-developer or CI-gated pipelines by always generating revisions from the confirmed head:

# Verify before generating
alembic heads

# Generate from the single confirmed head
alembic revision --autogenerate -m "add_tenant_isolation_column"

# CI gate: fail if there are multiple heads after generation
python -c "
import subprocess, sys
result = subprocess.run(['alembic', 'heads'], capture_output=True, text=True)
heads = [l for l in result.stdout.splitlines() if l.strip() and '(head)' in l]
if len(heads) > 1:
    print(f'ERROR: {len(heads)} heads detected — resolve before merging', file=sys.stderr)
    sys.exit(1)
"

Hybrid Architectures & Migration Strategies

When mixing SQLAlchemy Core and ORM within the same application, both styles contribute tables to Base.metadata as long as you define Table objects using the same MetaData instance. Core tables defined with sa.Table("products", Base.metadata, ...) appear in autogenerate output identically to ORM-mapped classes.

For the 1.4 → 2.0 migration path, the most important change in env.py is replacing engine.execute() calls with explicit connection contexts — the legacy execution model is removed in 2.0. Also replace connection.execute(text(...)) with connection.execute(sa.text(...)) and ensure every text() clause uses bound parameters (:param_name style) rather than f-string interpolation.

For multi-tenant architectures with one Postgres schema per tenant, use version_table_schema to pin the alembic_version table to a shared schema rather than letting it land in each tenant schema:

context.configure(
    connection=connection,
    target_metadata=target_metadata,
    include_schemas=True,
    version_table_schema="public",   # write alembic_version to the public schema
    version_table="alembic_version",
)

Without version_table_schema, Alembic creates an alembic_version table in every schema it encounters, which triggers spurious drop directives on subsequent autogenerate runs against other schemas.

For microservice architectures where each service owns a subset of tables in a shared database, the safest pattern is to define a per-service MetaData and use include_object to scope autogenerate to only that service's tables. The excluding tables and schemas guide provides production-ready include_object and include_name implementations for this pattern.

Production Pitfalls & Anti-Patterns

  • Forgetting to import submodules before setting target_metadata. If myapp/models/__init__.py does not import orders.py, users.py, etc., those tables are absent from Base.metadata. Autogenerate then emits op.drop_table for every table it sees in the database but not in the metadata. Fix: explicitly import all model modules in env.py. Verify with python -c "from myapp.models import Base; print(sorted(Base.metadata.tables.keys()))".
  • Running autogenerate against a hand-altered dev database. If a developer ran ALTER TABLE manually without going through Alembic, the reflected schema diverges from the migration history. The generated revision is correct against the live database but breaks when applied to a freshly migrated schema. Always autogenerate against a database brought up entirely via alembic upgrade head from a clean state.
  • Checking in unreviewed autogenerated scripts. Autogenerate frequently emits op.drop_table for tables belonging to PostGIS, Celery, or other applications sharing the same database. These drop directives will silently delete production data. Every generated script must be reviewed with git diff and manually executed against a staging database before committing.
  • Using compare_type=True with TypeDecorator subclasses without stable __repr__. If your custom type does not have a deterministic string representation, type comparison treats every column using it as changed on every autogenerate run. Add def __repr__(self): return "MyType()" and run autogenerate twice — the second run must produce no changes.
  • Raw SQL strings in op.execute for portable DDL. A migration using op.execute("ALTER TABLE orders ADD COLUMN amount NUMERIC DEFAULT 0") is not dialect-portable and breaks on MySQL. Use op.add_column with sa.Column wherever Alembic has an abstraction, and reserve op.execute for DDL that has no Alembic equivalent (triggers, stored procedures, custom Postgres types).
  • Generating a revision while alembic_version is behind the physical schema. The diff engine reflects the current physical schema, not the schema implied by alembic_version. If someone ran a manual CREATE INDEX or ADD COLUMN in production, the physical schema is ahead. The next autogenerate run includes those changes in the migration, causing DuplicateObject errors when applied to databases that were migrated properly. Run alembic current and alembic history before generating, and audit any discrepancies.

Frequently Asked Questions

Why does autogenerate emit an empty migration even though I added a model column? The model module was not imported before target_metadata was assigned. Add import myapp.models.orders to env.py before the target_metadata = Base.metadata line. Verify with python -c "from myapp.models import Base; print(sorted(Base.metadata.tables.keys()))" that all expected tables appear. If the table is present but the column is missing, check that the column is defined at the class level and not only at the instance level.

Can I use autogenerate with multiple MetaData objects? Yes. Pass a list: target_metadata = [AppBase.metadata, AdminBase.metadata]. Alembic 1.7+ supports this. Each MetaData must use the same naming_convention, otherwise constraint detection will be unreliable. The diff engine processes each metadata object independently and merges the resulting op lists.

Does autogenerate detect changes inside server_default expressions? Only when compare_server_default=True is set. Even then, dialect-specific rendering differences cause false positives. For PostgreSQL, text("now()") in the model and CURRENT_TIMESTAMP in the reflected schema will not match even though they are semantically equivalent. Write a process_revision_directives hook to normalize both sides, or use include_object to exclude known-unstable columns from server-default comparison.

How do I make autogenerate ignore the spatial_ref_sys table from PostGIS? Use an include_object hook in env.py. Pass it to context.configure alongside target_metadata. The excluding tables and schemas guide shows a complete implementation for PostGIS, Celery, and other third-party tables.

What happens if two branches both add the same column to the same table? Alembic's merge revision includes both op.add_column calls. At upgrade time, whichever branch applies second attempts to add a column that already exists, causing DuplicateColumnError (PostgreSQL) or OperationalError: duplicate column name (MySQL/SQLite). Resolve by manually editing the merge revision to remove the duplicate operation and add a guard comment explaining the resolution.