Autogenerating and Reviewing Migration Scripts with Alembic
Alembic's revision --autogenerate command compares your SQLAlchemy MetaData object against the live database schema and emits a migration script containing the op-directives needed to bring them in sync — part of the Alembic async migrations and schema evolution workflow every production team needs to master. This guide covers the full cycle: wiring target_metadata, tuning the diff engine, editing generated revisions, writing data migrations, and managing branch graphs.
Concept & Execution Model
Autogenerate sits at the intersection of two graphs: the model graph (your Python Table and DeclarativeBase definitions) and the reflected graph (the live schema Alembic reads from the database at migration time). The diff engine walks both graphs, column by column and constraint by constraint, and translates differences into op.add_column, op.drop_index, op.alter_column, and similar directives.
This works only if env.py is wired correctly. The critical line is:
# alembic/env.py
from myapp.models import Base # imports all mapped classes via the registry
target_metadata = Base.metadata
Without target_metadata, autogenerate has no model graph to compare against and will emit an empty migration every time. Import the metadata after all modules that define tables have been imported — a common mistake is importing Base from a module that itself does not import submodules where Order, Product, or Tenant models live. A robust pattern is a dedicated model loader:
# alembic/env.py — safe model import pattern
from myapp.models import Base
# Explicitly import every subpackage so that their Table registrations execute
import myapp.models.users # noqa: F401
import myapp.models.orders # noqa: F401
import myapp.models.products # noqa: F401
import myapp.models.invoices # noqa: F401
import myapp.models.tenants # noqa: F401
target_metadata = Base.metadata
The # noqa: F401 comments suppress linter warnings for unused imports — these imports are intentionally side-effect-only.
Running Autogenerate
# Generate a named revision against the configured database URL
alembic revision --autogenerate -m "add_products_table_and_orders_fk"
# Override the URL without editing alembic.ini
alembic revision --autogenerate -m "add_invoices" \
--url postgresql+asyncpg://app:secret@localhost/appdb
Alembic connects to the database URL configured in alembic.ini, reflects the schema, diffs against target_metadata, and writes a new file under alembic/versions/. The filename starts with a random hex token — that token is the revision identifier used in depends_on and down_revision chains. Never rename or re-token a revision file after committing it; doing so severs the version chain for anyone who has already applied it.
What Autogenerate Can and Cannot Detect
Understanding the detection boundaries prevents nasty surprises in production.
| Detected automatically | Requires flag or manual step |
|---|---|
| New and dropped tables | Stored procedures, functions, triggers, views |
| New, dropped, and modified columns | Postgres SEQUENCE objects |
| Primary key changes | Partial indexes (WHERE clause) |
| Foreign key additions and drops | Column-level comments |
| Index additions and drops | Schema creation / drop |
| Unique constraint changes | Enum type bodies (CREATE TYPE) |
Server default changes (compare_server_default=True) | Row-level data changes |
Column type changes (compare_type=True) | Any dialect-specific extension objects |
This breakdown is stable across SQLAlchemy 2.0 and Alembic 1.13+. For partial index detection, you can write a custom render_item hook that emits a raw op.execute("CREATE INDEX ... WHERE ...") instead of using op.create_index. That hook must also handle the corresponding downgrade path manually.
Query Construction & Async Execution Patterns
Because autogenerate needs to reflect the live schema, it requires a database connection. In async projects wired through configuring Alembic with async SQLAlchemy engines, env.py must run the reflection inside asyncio.run() using run_sync:
# alembic/env.py — complete async env.py for autogenerate
import asyncio
from logging.config import fileConfig
from alembic import context
from sqlalchemy.ext.asyncio import async_engine_from_config
# Side-effect imports: register all ORM models with Base.metadata
from myapp.models import Base # noqa: F401
import myapp.models.users # noqa: F401
import myapp.models.orders # noqa: F401
import myapp.models.products # noqa: F401
import myapp.models.invoices # noqa: F401
config = context.config
fileConfig(config.config_file_name)
target_metadata = Base.metadata
def do_run_migrations(connection):
context.configure(
connection=connection,
target_metadata=target_metadata,
compare_type=True,
compare_server_default=True,
include_schemas=True,
)
with context.begin_transaction():
context.run_migrations()
async def run_async_migrations() -> None:
connectable = async_engine_from_config(
config.get_section(config.config_ini_section),
prefix="sqlalchemy.",
)
async with connectable.connect() as connection:
await connection.run_sync(do_run_migrations)
await connectable.dispose()
def run_migrations_online() -> None:
asyncio.run(run_async_migrations())
run_migrations_online()
The run_sync call hands a synchronous DBAPI connection to Alembic's reflection machinery, which does not know about async. This is the canonical pattern — attempting to call context.run_migrations() directly inside an async def without run_sync raises sqlalchemy.exc.MissingGreenlet: greenlet_spawn has not been called.
For offline mode (generating SQL without a live database connection), the async engine is unavailable. Autogenerate requires online mode; offline mode is only suitable for alembic upgrade --sql to generate a SQL script from an already-generated revision.
State Management & Session Boundaries
Autogenerate reads schema state at the moment the connection is made. If your application uses multiple MetaData objects — for example, one per tenant schema or one for a read model — you must either merge them or use include_object / include_schemas filters (covered fully in the excluding tables and schemas guide) to keep Alembic from generating spurious drop directives for tables it does not own.
To target multiple MetaData objects in a single autogenerate run, pass them as a list:
# alembic/env.py — multiple MetaData objects
from myapp.app_models import AppBase
from myapp.admin_models import AdminBase
target_metadata = [AppBase.metadata, AdminBase.metadata]
Alembic 1.7+ supports this. Each MetaData object must use the same naming_convention or constraint detection will produce unpredictable results — the diff engine compares constraint names, so inconsistent conventions produce false positives.
Transaction Boundaries for Data Migrations
Alembic wraps each migration in a transaction by default (transaction_per_migration=True). If you run DDL and DML together — creating a column and then back-filling values — both must succeed atomically or the schema ends up in a partially migrated state. PostgreSQL supports transactional DDL, so this works cleanly. MySQL does not; on MySQL separate the DDL and DML into distinct revisions.
The op.get_bind() function returns the DBAPI connection bound to the current migration context. Use it inside upgrade() to execute raw SQL or SQLAlchemy Core expressions for data work. Avoid importing your ORM models directly inside migration files — model definitions change over time, but migrations are frozen at a specific schema state.
Advanced Autogenerate Patterns
compare_type and compare_server_default
By default, Alembic does not compare column types or server defaults, because type comparison across different database dialects is lossy. Enable both flags when you need fine-grained drift detection:
context.configure(
connection=connection,
target_metadata=target_metadata,
compare_type=True, # detect VARCHAR(100) → VARCHAR(255) changes
compare_server_default=True, # detect DEFAULT now() → DEFAULT CURRENT_TIMESTAMP
)
compare_type=True triggers dialect.compare_type(inspector_column, metadata_column) — a method some dialects implement incompletely. For custom TypeDecorator subclasses, override __repr__ so the string comparison produces stable output:
# myapp/types.py
import sqlalchemy as sa
class TenantId(sa.TypeDecorator):
"""UUID stored as a VARCHAR(36) with tenant isolation semantics."""
impl = sa.String(36)
cache_ok = True
def __repr__(self) -> str:
# Stable repr prevents autogenerate from treating every column as changed
return "TenantId()"
def process_bind_param(self, value, dialect):
return str(value) if value is not None else None
def process_result_value(self, value, dialect):
return value
Verify idempotency by running autogenerate twice in a row — the second run must emit no changes. If it does, the type repr is not stable.
compare_server_default=True is similarly fragile for expressions. PostgreSQL renders nextval('seq'::regclass) while your model might declare server_default=text("nextval('seq')"). Use a custom process_revision_directives hook to normalize both sides before comparison, or simply exclude known-unstable columns from comparison using include_object.
MetaData Naming Conventions
Without a naming_convention, Alembic cannot reliably detect renamed constraints because the database assigns auto-generated names that differ from what your model implies. Set a naming_convention on your MetaData object and Alembic will construct predictable names it can compare:
# myapp/models.py
from sqlalchemy import MetaData
from sqlalchemy.orm import DeclarativeBase
NAMING_CONVENTION = {
"ix": "ix_%(column_0_label)s",
"uq": "uq_%(table_name)s_%(column_0_name)s",
"ck": "ck_%(table_name)s_%(constraint_name)s",
"fk": "fk_%(table_name)s_%(column_0_name)s_%(referred_table_name)s",
"pk": "pk_%(table_name)s",
}
class Base(DeclarativeBase):
metadata = MetaData(naming_convention=NAMING_CONVENTION)
With this in place, running alembic revision --autogenerate after adding a UniqueConstraint will emit:
op.create_unique_constraint(
"uq_orders_invoice_number",
"orders",
["invoice_number"],
)
...rather than the database-assigned orders_invoice_number_key. Deterministic names mean autogenerate can compare the model's expected constraint name against the reflected name and detect true renames versus spurious differences.
Apply the naming convention before your first migration. Retrofitting it to an existing database requires a migration that drops and recreates every anonymous constraint under its new canonical name — a non-trivial operation covered in the zero-downtime schema migration strategies guide.
Editing Generated Revisions
Autogenerate is a starting point, not the final word. Every generated migration should be reviewed and often edited before committing. Open the revision file in alembic/versions/ after generation and check it against this list:
- Verify operation order.
op.create_tablemust precedeop.create_foreign_keytargeting it. Autogenerate generally gets this right, but multi-table cycles require manual reordering. - Remove spurious
op.drop_tabledirectives. If a third-party table appears in the DB but not in your metadata, autogenerate will propose dropping it. Always check these before committing. - Check
nullablechanges. If a column wasnullable=Truein the DB and you declarenullable=Falsein the model, autogenerate emitsop.alter_column(..., nullable=False)— which fails if the column has existing NULL values. Add a data-cleanup step before thealter_column. - Review index naming. Without a naming convention, index names may differ between the model and the database even when the columns are identical.
Data Migrations: op.bulk_insert and Chunked op.execute
For seeding reference data, op.bulk_insert is cleaner and dialect-portable compared to raw op.execute:
# Seeding reference data in a migration
from alembic import op
import sqlalchemy as sa
invoice_statuses = sa.table(
"invoice_statuses",
sa.column("code", sa.String),
sa.column("label", sa.String),
sa.column("sort_order", sa.Integer),
)
def upgrade() -> None:
op.create_table(
"invoice_statuses",
sa.Column("code", sa.String(8), primary_key=True),
sa.Column("label", sa.String(64), nullable=False),
sa.Column("sort_order", sa.Integer, nullable=False, server_default="0"),
)
op.bulk_insert(
invoice_statuses,
[
{"code": "DRAFT", "label": "Draft", "sort_order": 1},
{"code": "SENT", "label": "Sent", "sort_order": 2},
{"code": "PAID", "label": "Paid", "sort_order": 3},
{"code": "VOID", "label": "Voided", "sort_order": 4},
{"code": "REFUND", "label": "Refunded", "sort_order": 5},
],
)
def downgrade() -> None:
op.drop_table("invoice_statuses")
op.bulk_insert uses the bound connection directly and does not go through the ORM session, so there is no risk of triggering validators, events, or lazy loads during migration.
For back-filling large tables, batch chunked updates to avoid lock escalation:
# upgrade() — chunked back-fill migration for orders.status_code
from alembic import op
import sqlalchemy as sa
orders_table = sa.table(
"orders",
sa.column("id", sa.Integer),
sa.column("status", sa.String),
sa.column("status_code", sa.SmallInteger),
)
def upgrade() -> None:
# Step 1: add column as nullable
op.add_column(
"orders",
sa.Column("status_code", sa.SmallInteger, nullable=True),
)
# Step 2: back-fill in 10 000-row chunks
conn = op.get_bind()
while True:
result = conn.execute(
sa.update(orders_table)
.where(orders_table.c.status_code.is_(None))
.where(orders_table.c.status.isnot(None))
.values(
status_code=sa.case(
(orders_table.c.status == "pending", 0),
(orders_table.c.status == "completed", 1),
(orders_table.c.status == "cancelled", 2),
else_=99,
)
)
.returning(orders_table.c.id)
.limit(10_000)
)
if result.rowcount == 0:
break
# Step 3: enforce NOT NULL now that all rows are populated
op.alter_column("orders", "status_code", nullable=False)
def downgrade() -> None:
op.drop_column("orders", "status_code")
The RETURNING ... LIMIT 10_000 pattern on PostgreSQL is efficient — it updates rows, returns their IDs, and the loop exits when no more NULL rows remain. On MySQL (which lacks RETURNING), replace with a SELECT ... LIMIT 10_000 followed by an UPDATE ... WHERE id IN (...).
Branch and Merge Migrations
Alembic's version graph is a DAG, not a linear chain. When two developers each run alembic revision --autogenerate from the same head, they create two branches with the same down_revision:
base
└─ 001_initial
├─ 002a_add_products (developer A, down_revision=001)
└─ 002b_add_invoices (developer B, down_revision=001)
Running alembic upgrade head against a branched graph raises Multiple head revisions are present for given argument 'head'. Resolve it with:
# Inspect the branch points
alembic heads
# Merge the two heads into a single revision
alembic merge -m "merge_products_and_invoices_branches" 002a 002b
This generates a merge revision with down_revision = ("002a...", "002b..."). The merge revision's upgrade() and downgrade() should be no-ops unless there are real conflicts — for example, both branches adding the same column to the same table. In that case, remove the duplicate op.add_column from the merge revision and add a try/except guard or an existence check.
Avoid branches entirely in single-developer or CI-gated pipelines by always generating revisions from the confirmed head:
# Verify before generating
alembic heads
# Generate from the single confirmed head
alembic revision --autogenerate -m "add_tenant_isolation_column"
# CI gate: fail if there are multiple heads after generation
python -c "
import subprocess, sys
result = subprocess.run(['alembic', 'heads'], capture_output=True, text=True)
heads = [l for l in result.stdout.splitlines() if l.strip() and '(head)' in l]
if len(heads) > 1:
print(f'ERROR: {len(heads)} heads detected — resolve before merging', file=sys.stderr)
sys.exit(1)
"
Hybrid Architectures & Migration Strategies
When mixing SQLAlchemy Core and ORM within the same application, both styles contribute tables to Base.metadata as long as you define Table objects using the same MetaData instance. Core tables defined with sa.Table("products", Base.metadata, ...) appear in autogenerate output identically to ORM-mapped classes.
For the 1.4 → 2.0 migration path, the most important change in env.py is replacing engine.execute() calls with explicit connection contexts — the legacy execution model is removed in 2.0. Also replace connection.execute(text(...)) with connection.execute(sa.text(...)) and ensure every text() clause uses bound parameters (:param_name style) rather than f-string interpolation.
For multi-tenant architectures with one Postgres schema per tenant, use version_table_schema to pin the alembic_version table to a shared schema rather than letting it land in each tenant schema:
context.configure(
connection=connection,
target_metadata=target_metadata,
include_schemas=True,
version_table_schema="public", # write alembic_version to the public schema
version_table="alembic_version",
)
Without version_table_schema, Alembic creates an alembic_version table in every schema it encounters, which triggers spurious drop directives on subsequent autogenerate runs against other schemas.
For microservice architectures where each service owns a subset of tables in a shared database, the safest pattern is to define a per-service MetaData and use include_object to scope autogenerate to only that service's tables. The excluding tables and schemas guide provides production-ready include_object and include_name implementations for this pattern.
Production Pitfalls & Anti-Patterns
- Forgetting to import submodules before setting
target_metadata. Ifmyapp/models/__init__.pydoes not importorders.py,users.py, etc., those tables are absent fromBase.metadata. Autogenerate then emitsop.drop_tablefor every table it sees in the database but not in the metadata. Fix: explicitly import all model modules inenv.py. Verify withpython -c "from myapp.models import Base; print(sorted(Base.metadata.tables.keys()))". - Running autogenerate against a hand-altered dev database. If a developer ran
ALTER TABLEmanually without going through Alembic, the reflected schema diverges from the migration history. The generated revision is correct against the live database but breaks when applied to a freshly migrated schema. Always autogenerate against a database brought up entirely viaalembic upgrade headfrom a clean state. - Checking in unreviewed autogenerated scripts. Autogenerate frequently emits
op.drop_tablefor tables belonging to PostGIS, Celery, or other applications sharing the same database. These drop directives will silently delete production data. Every generated script must be reviewed withgit diffand manually executed against a staging database before committing. - Using
compare_type=TruewithTypeDecoratorsubclasses without stable__repr__. If your custom type does not have a deterministic string representation, type comparison treats every column using it as changed on every autogenerate run. Adddef __repr__(self): return "MyType()"and run autogenerate twice — the second run must produce no changes. - Raw SQL strings in
op.executefor portable DDL. A migration usingop.execute("ALTER TABLE orders ADD COLUMN amount NUMERIC DEFAULT 0")is not dialect-portable and breaks on MySQL. Useop.add_columnwithsa.Columnwherever Alembic has an abstraction, and reserveop.executefor DDL that has no Alembic equivalent (triggers, stored procedures, custom Postgres types). - Generating a revision while
alembic_versionis behind the physical schema. The diff engine reflects the current physical schema, not the schema implied byalembic_version. If someone ran a manualCREATE INDEXorADD COLUMNin production, the physical schema is ahead. The next autogenerate run includes those changes in the migration, causingDuplicateObjecterrors when applied to databases that were migrated properly. Runalembic currentandalembic historybefore generating, and audit any discrepancies.
Frequently Asked Questions
Why does autogenerate emit an empty migration even though I added a model column?
The model module was not imported before target_metadata was assigned. Add import myapp.models.orders to env.py before the target_metadata = Base.metadata line. Verify with python -c "from myapp.models import Base; print(sorted(Base.metadata.tables.keys()))" that all expected tables appear. If the table is present but the column is missing, check that the column is defined at the class level and not only at the instance level.
Can I use autogenerate with multiple MetaData objects?
Yes. Pass a list: target_metadata = [AppBase.metadata, AdminBase.metadata]. Alembic 1.7+ supports this. Each MetaData must use the same naming_convention, otherwise constraint detection will be unreliable. The diff engine processes each metadata object independently and merges the resulting op lists.
Does autogenerate detect changes inside server_default expressions?
Only when compare_server_default=True is set. Even then, dialect-specific rendering differences cause false positives. For PostgreSQL, text("now()") in the model and CURRENT_TIMESTAMP in the reflected schema will not match even though they are semantically equivalent. Write a process_revision_directives hook to normalize both sides, or use include_object to exclude known-unstable columns from server-default comparison.
How do I make autogenerate ignore the spatial_ref_sys table from PostGIS?
Use an include_object hook in env.py. Pass it to context.configure alongside target_metadata. The excluding tables and schemas guide shows a complete implementation for PostGIS, Celery, and other third-party tables.
What happens if two branches both add the same column to the same table?
Alembic's merge revision includes both op.add_column calls. At upgrade time, whichever branch applies second attempts to add a column that already exists, causing DuplicateColumnError (PostgreSQL) or OperationalError: duplicate column name (MySQL/SQLite). Resolve by manually editing the merge revision to remove the duplicate operation and add a guard comment explaining the resolution.
Related
- Alembic async migrations and schema evolution — parent pillar covering the full migration lifecycle from project setup to production deployment.
- Configuring Alembic with async SQLAlchemy engines — prerequisite: wiring
env.pyforasyncpgbefore autogenerate can connect to the database. - Excluding tables and schemas from Alembic autogenerate —
include_objectandinclude_namehooks to suppress spurious drop directives for third-party tables. - Zero-downtime schema migration strategies — sequencing migrations on live systems, applying naming conventions retroactively, and safe column alterations.