Paginating Large Result Sets with Keyset Pagination in SQLAlchemy
Keyset pagination (also called seek pagination) replaces OFFSET with a WHERE (created_at, id) > (:last_ts, :last_id) clause so every page fetch is an O(log N) index seek instead of an O(N) sequential scan — this page belongs to the Streaming Large Result Sets with yield_per cluster.
Quick Answer
Before — OFFSET pagination that degrades at scale:
from __future__ import annotations
import asyncio
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession, create_async_engine
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column
import datetime
class Base(DeclarativeBase):
pass
class Order(Base):
__tablename__ = "orders"
id: Mapped[int] = mapped_column(primary_key=True)
created_at: Mapped[datetime.datetime] = mapped_column(index=True)
customer_id: Mapped[int] = mapped_column()
total_cents: Mapped[int] = mapped_column()
engine = create_async_engine("postgresql+asyncpg://user:pass@host/db")
PAGE_SIZE = 100
async def fetch_orders_offset(page: int) -> list[Order]:
"""OFFSET grows O(N) — Postgres discards `page * PAGE_SIZE` rows on every call."""
async with AsyncSession(engine) as session:
result = await session.execute(
select(Order)
.order_by(Order.created_at, Order.id)
.limit(PAGE_SIZE)
.offset(page * PAGE_SIZE) # Full scan cost grows with each page
)
return list(result.scalars())
After — keyset pagination with constant cost per page:
from __future__ import annotations
import asyncio
import datetime
from typing import Optional
from sqlalchemy import select, tuple_
from sqlalchemy.ext.asyncio import AsyncSession, create_async_engine
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column
class Base(DeclarativeBase):
pass
class Order(Base):
__tablename__ = "orders"
id: Mapped[int] = mapped_column(primary_key=True)
created_at: Mapped[datetime.datetime] = mapped_column()
customer_id: Mapped[int] = mapped_column()
total_cents: Mapped[int] = mapped_column()
engine = create_async_engine("postgresql+asyncpg://user:pass@host/db")
PAGE_SIZE = 100
async def fetch_orders_keyset(
last_created_at: Optional[datetime.datetime] = None,
last_id: Optional[int] = None,
) -> list[Order]:
"""Keyset seek: always O(log N) index seek regardless of page depth."""
async with AsyncSession(engine) as session:
stmt = select(Order).order_by(Order.created_at, Order.id).limit(PAGE_SIZE)
if last_created_at is not None and last_id is not None:
# tuple_() emits: WHERE (created_at, id) > (:p1, :p2)
stmt = stmt.where(
tuple_(Order.created_at, Order.id) > (last_created_at, last_id)
)
result = await session.execute(stmt)
rows = list(result.scalars())
return rows
Execution Context & Async Workflow Integration
Why OFFSET degrades at scale
PostgreSQL evaluates OFFSET N by executing the full query plan, scanning the index, fetching each heap row, and then discarding the first N results before returning the page. For a table of one million orders, fetching page 500 at 100 rows per page means Postgres reads and discards 50,000 rows every single time. The cost is strictly proportional to the offset value — doubling the page number doubles the query time.
Keyset pagination eliminates this by converting "skip N rows" into a direct index seek. The compound condition WHERE (created_at, id) > (:last_ts, :last_id) lets Postgres start reading from exactly where the previous page ended. With a composite index on (created_at, id), the database descends the B-tree to the cursor position in O(log N) time and reads forward. Page 5,000 costs the same as page 1.
How SQLAlchemy constructs the keyset WHERE clause
SQLAlchemy 2.0 exposes tuple_() from sqlalchemy to emit row-value comparisons. tuple_(Order.created_at, Order.id) > (last_ts, last_id) compiles directly to the PostgreSQL row-value expression (created_at, id) > ($1, $2), which the query planner recognises as an index range condition and satisfies with a single B-tree descent.
The unique tie-breaking column — id in the examples above — is mandatory. Without it, two orders with identical created_at timestamps that straddle a page boundary can be silently omitted or duplicated as the cursor advances.
Async implementation with AsyncSession
Because AsyncSession.execute() is a coroutine, keyset pagination integrates cleanly into asyncio-based web handlers and background tasks. The cursor state (last_created_at, last_id) is carried between requests either in a signed token returned to the API caller, or held server-side in a Redis key for background batch jobs.
from __future__ import annotations
import asyncio
import datetime
from dataclasses import dataclass
from typing import Optional
from sqlalchemy import select, tuple_
from sqlalchemy.ext.asyncio import AsyncSession, create_async_engine
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column
class Base(DeclarativeBase):
pass
class Order(Base):
__tablename__ = "orders"
id: Mapped[int] = mapped_column(primary_key=True)
created_at: Mapped[datetime.datetime] = mapped_column()
customer_id: Mapped[int] = mapped_column()
total_cents: Mapped[int] = mapped_column()
@dataclass
class PageCursor:
last_created_at: datetime.datetime
last_id: int
engine = create_async_engine("postgresql+asyncpg://user:pass@host/db")
PAGE_SIZE = 100
async def paginate_orders(
cursor: Optional[PageCursor] = None,
) -> tuple[list[Order], Optional[PageCursor]]:
"""Return one page of Orders and a cursor for the next page."""
async with AsyncSession(engine) as session:
stmt = select(Order).order_by(Order.created_at, Order.id).limit(PAGE_SIZE)
if cursor is not None:
stmt = stmt.where(
tuple_(Order.created_at, Order.id)
> (cursor.last_created_at, cursor.last_id)
)
result = await session.execute(stmt)
rows = list(result.scalars())
next_cursor: Optional[PageCursor] = None
if len(rows) == PAGE_SIZE:
last = rows[-1]
next_cursor = PageCursor(last_created_at=last.created_at, last_id=last.id)
return rows, next_cursor
This approach complements server-side cursor streaming covered in Using yield_per to Stream Millions of Rows in Async. Use keyset pagination when callers need page-by-page access with a resumable cursor (API responses, exports triggered by HTTP requests). Use yield_per when the consuming code processes all rows in a single long-lived connection without exposing page tokens externally.
Resolving Warnings, Errors & Common Mistakes
| Warning / Error | Root Cause | Production Fix |
|---|---|---|
| Rows silently duplicated or skipped between pages | ORDER BY created_at without a unique tie-breaker. Two rows sharing the same created_at value straddle the keyset boundary unpredictably. | Always append a unique column — typically id — as the final sort key and include it in the keyset: ORDER BY created_at, id and WHERE (created_at, id) > (:ts, :id). |
sqlalchemy.exc.ProgrammingError: column "..." does not exist | A column referenced in ORDER BY or the keyset WHERE clause is absent from the mapped table or not projected by the query. This occurs when the ORM class is out of sync with the schema. | Run alembic upgrade head to apply pending migrations. Verify Order.__tablename__ reflects the correct table. Use inspect(engine) to confirm column presence at startup. |
| Index not used — sequential scan on large table | No composite index on the keyset columns, or only a single-column index on created_at without id. PostgreSQL falls back to a sequential scan when it cannot satisfy both sort columns from one index. | Create CREATE INDEX idx_orders_created_at_id ON orders (created_at, id);. Confirm with EXPLAIN (ANALYZE, BUFFERS) that the plan shows Index Scan using idx_orders_created_at_id. |
OFFSET on 10 M-row table causing 30+ second queries | Legacy pagination code still using .offset(page * size). Each call forces Postgres to read and discard millions of rows before returning the requested page. | Replace .offset() with keyset WHERE (created_at, id) > (:ts, :id). For one-time migrations where random-access pages are unavoidable, consider materialised cursor tables. |
| Keyset returns wrong results after mid-pagination inserts | New rows inserted with created_at values earlier than the current cursor position are invisible; rows inserted with values equal to the last keyset boundary can appear twice. | This is expected and correct for append-only workloads. For mutable datasets requiring strict consistency, snapshot the dataset into a temporary table or use FOR UPDATE SKIP LOCKED with a cursor queue. |
| Non-deterministic ordering breaks keyset stability | ORDER BY RANDOM() or any non-deterministic expression makes the keyset meaningless — different executions return different orderings for the same cursor value. | Keyset pagination requires a stable, deterministic ORDER BY. Use immutable columns (e.g., created_at, id) and never randomise the sort in a paginated query. |
Advanced Keyset Pagination Optimization
Bidirectional pagination with a dual-cursor
Most REST APIs expose both next and previous page tokens. Implementing backward navigation with keyset pagination requires reversing the sort direction while keeping the index usable.
The trick is to reverse the comparison operator and the ORDER BY direction for the previous-page fetch, then reverse the result list before returning it to the caller. This avoids storing the full result set and keeps both directions O(log N).
from __future__ import annotations
import asyncio
import datetime
from dataclasses import dataclass
from typing import Optional
from sqlalchemy import select, tuple_
from sqlalchemy.ext.asyncio import AsyncSession, create_async_engine
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column
class Base(DeclarativeBase):
pass
class Order(Base):
__tablename__ = "orders"
id: Mapped[int] = mapped_column(primary_key=True)
created_at: Mapped[datetime.datetime] = mapped_column()
customer_id: Mapped[int] = mapped_column()
total_cents: Mapped[int] = mapped_column()
engine = create_async_engine("postgresql+asyncpg://user:pass@host/db")
PAGE_SIZE = 100
async def fetch_previous_page(
before_created_at: datetime.datetime,
before_id: int,
) -> list[Order]:
"""
Fetch the page of Orders immediately before the given cursor.
Reverses ORDER BY to seek backwards, then reverses results to
restore ascending order for the caller.
"""
async with AsyncSession(engine) as session:
stmt = (
select(Order)
# Reverse sort for the backward seek
.order_by(Order.created_at.desc(), Order.id.desc())
.where(
tuple_(Order.created_at, Order.id) < (before_created_at, before_id)
)
.limit(PAGE_SIZE)
)
result = await session.execute(stmt)
rows = list(result.scalars())
# Re-invert to ascending order before returning
rows.reverse()
return rows
Covering index to eliminate heap fetches
When the API only returns a subset of columns (e.g., id, created_at, total_cents for an order list view), a covering index that includes those columns lets Postgres satisfy the entire query from the index without visiting the heap at all. This further reduces I/O on large tables.
-- Covering index: satisfies ORDER BY, keyset WHERE, and projected columns
-- without any heap access for the common list-view query.
CREATE INDEX idx_orders_keyset_covering
ON orders (created_at, id)
INCLUDE (customer_id, total_cents);
SQLAlchemy can take advantage of this automatically — no ORM changes needed. Verify with EXPLAIN (ANALYZE): the plan should show Index Only Scan with zero heap fetches once the visibility map is current.
Nullable columns in the keyset with NULLS LAST
When sorting by an optional column such as Invoice.settled_at, null values require explicit handling. PostgreSQL defaults to NULLS LAST in ascending order and NULLS FIRST in descending order, but the keyset comparison must match the sort order exactly or results will be inconsistent.
from __future__ import annotations
import asyncio
import datetime
from typing import Optional
from sqlalchemy import select, tuple_, nulls_last
from sqlalchemy.ext.asyncio import AsyncSession, create_async_engine
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column
class Base(DeclarativeBase):
pass
class Invoice(Base):
__tablename__ = "invoices"
id: Mapped[int] = mapped_column(primary_key=True)
settled_at: Mapped[Optional[datetime.datetime]] = mapped_column(nullable=True)
customer_id: Mapped[int] = mapped_column()
amount_cents: Mapped[int] = mapped_column()
engine = create_async_engine("postgresql+asyncpg://user:pass@host/db")
PAGE_SIZE = 100
async def fetch_invoices_nullable_keyset(
last_settled_at: Optional[datetime.datetime],
last_id: int,
) -> list[Invoice]:
"""
Paginate invoices sorted by nullable settled_at (NULLS LAST) then id.
Rows where settled_at IS NULL appear at the end of the result set.
"""
async with AsyncSession(engine) as session:
stmt = (
select(Invoice)
.order_by(nulls_last(Invoice.settled_at.asc()), Invoice.id)
.limit(PAGE_SIZE)
)
if last_settled_at is not None:
# Settled rows: standard tuple comparison works
stmt = stmt.where(
tuple_(Invoice.settled_at, Invoice.id) > (last_settled_at, last_id)
)
else:
# Cursor is within the NULL partition — filter by id only
stmt = stmt.where(
Invoice.settled_at.is_(None),
Invoice.id > last_id,
)
result = await session.execute(stmt)
return list(result.scalars())
This pattern also applies to Window Functions and Analytical Queries where NULLS LAST ordering affects frame boundaries.
Frequently Asked Questions
Can I use keyset pagination with SQLAlchemy ORM relationships and joined eager loading?
Yes, but the keyset WHERE clause must reference columns on the primary entity being paginated, not on a joined relationship. Add .options(selectinload(Order.line_items)) after constructing the base keyset query. Avoid joinedload with keyset pagination — the LIMIT applies before the join is expanded, which causes joinedload to return incomplete relationship collections for the last row on each page. selectinload fires a second query keyed on the primary IDs returned and avoids this entirely. See Complex Joins and Relationship Loading Strategies for a full comparison.
When is OFFSET still acceptable?
For tables under roughly 50,000 rows, the performance difference is negligible and OFFSET keeps the implementation simple. OFFSET is also the only option when the caller needs true random page access (e.g., "jump to page 47") rather than sequential forward/backward traversal. Keyset cursors are inherently sequential — they cannot skip to an arbitrary page without iterating through all preceding pages.
How do I encode the cursor safely for an HTTP API?
Serialize (last_created_at.isoformat(), last_id) as a JSON string, then base64-encode it and sign it with itsdangerous.URLSafeTimedSerializer or similar. Never trust a raw cursor value from an untrusted client — validate that both fields are present and of the correct type before issuing the database query. An unsigned cursor allows callers to probe arbitrary database positions.
Does keyset pagination work with yield_per for background batch processing?
They serve complementary roles. yield_per holds a server-side cursor open for the lifetime of a single long-running async with AsyncSession block and streams rows in memory-bounded chunks — ideal for ETL or data exports within a single process. Keyset pagination stores only the cursor value between independent requests, making it suitable for stateless API handlers or distributed workers that cannot share a database connection. When connection pool exhaustion is a concern — a risk covered in depth at Handling Connection Leaks and Pool Exhaustion — keyset pagination releases the connection after each page fetch and is therefore safer under high concurrency than a long-lived yield_per cursor.
Related
- Streaming Large Result Sets with yield_per — parent cluster covering server-side cursor streaming for batch processing workloads
- Using yield_per to Stream Millions of Rows in Async — when to use a persistent cursor instead of a keyset token
- High-Performance Bulk Inserts and Updates — efficient write patterns that complement large-table read strategies
- Handling Connection Leaks and Pool Exhaustion — why releasing connections between keyset pages matters under concurrency