Paginating Large Result Sets with Keyset Pagination in SQLAlchemy

Keyset pagination (also called seek pagination) replaces OFFSET with a WHERE (created_at, id) > (:last_ts, :last_id) clause so every page fetch is an O(log N) index seek instead of an O(N) sequential scan — this page belongs to the Streaming Large Result Sets with yield_per cluster.

Quick Answer

Before — OFFSET pagination that degrades at scale:

from __future__ import annotations

import asyncio
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession, create_async_engine
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column
import datetime


class Base(DeclarativeBase):
    pass


class Order(Base):
    __tablename__ = "orders"

    id: Mapped[int] = mapped_column(primary_key=True)
    created_at: Mapped[datetime.datetime] = mapped_column(index=True)
    customer_id: Mapped[int] = mapped_column()
    total_cents: Mapped[int] = mapped_column()


engine = create_async_engine("postgresql+asyncpg://user:pass@host/db")

PAGE_SIZE = 100


async def fetch_orders_offset(page: int) -> list[Order]:
    """OFFSET grows O(N) — Postgres discards `page * PAGE_SIZE` rows on every call."""
    async with AsyncSession(engine) as session:
        result = await session.execute(
            select(Order)
            .order_by(Order.created_at, Order.id)
            .limit(PAGE_SIZE)
            .offset(page * PAGE_SIZE)  # Full scan cost grows with each page
        )
        return list(result.scalars())

After — keyset pagination with constant cost per page:

from __future__ import annotations

import asyncio
import datetime
from typing import Optional

from sqlalchemy import select, tuple_
from sqlalchemy.ext.asyncio import AsyncSession, create_async_engine
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column


class Base(DeclarativeBase):
    pass


class Order(Base):
    __tablename__ = "orders"

    id: Mapped[int] = mapped_column(primary_key=True)
    created_at: Mapped[datetime.datetime] = mapped_column()
    customer_id: Mapped[int] = mapped_column()
    total_cents: Mapped[int] = mapped_column()


engine = create_async_engine("postgresql+asyncpg://user:pass@host/db")

PAGE_SIZE = 100


async def fetch_orders_keyset(
    last_created_at: Optional[datetime.datetime] = None,
    last_id: Optional[int] = None,
) -> list[Order]:
    """Keyset seek: always O(log N) index seek regardless of page depth."""
    async with AsyncSession(engine) as session:
        stmt = select(Order).order_by(Order.created_at, Order.id).limit(PAGE_SIZE)

        if last_created_at is not None and last_id is not None:
            # tuple_() emits: WHERE (created_at, id) > (:p1, :p2)
            stmt = stmt.where(
                tuple_(Order.created_at, Order.id) > (last_created_at, last_id)
            )

        result = await session.execute(stmt)
        rows = list(result.scalars())
        return rows

Execution Context & Async Workflow Integration

Why OFFSET degrades at scale

PostgreSQL evaluates OFFSET N by executing the full query plan, scanning the index, fetching each heap row, and then discarding the first N results before returning the page. For a table of one million orders, fetching page 500 at 100 rows per page means Postgres reads and discards 50,000 rows every single time. The cost is strictly proportional to the offset value — doubling the page number doubles the query time.

Keyset pagination eliminates this by converting "skip N rows" into a direct index seek. The compound condition WHERE (created_at, id) > (:last_ts, :last_id) lets Postgres start reading from exactly where the previous page ended. With a composite index on (created_at, id), the database descends the B-tree to the cursor position in O(log N) time and reads forward. Page 5,000 costs the same as page 1.

OFFSET vs Keyset Pagination Cost Bar chart showing that OFFSET pagination query cost grows linearly with page number while keyset pagination maintains constant cost at every page. Rows scanned 0 25k 50k 75k 100k p.1 p.2 p.3 p.4 p.5 Page Number OFFSET (linear scan) Keyset (index seek)

How SQLAlchemy constructs the keyset WHERE clause

SQLAlchemy 2.0 exposes tuple_() from sqlalchemy to emit row-value comparisons. tuple_(Order.created_at, Order.id) > (last_ts, last_id) compiles directly to the PostgreSQL row-value expression (created_at, id) > ($1, $2), which the query planner recognises as an index range condition and satisfies with a single B-tree descent.

The unique tie-breaking column — id in the examples above — is mandatory. Without it, two orders with identical created_at timestamps that straddle a page boundary can be silently omitted or duplicated as the cursor advances.

Async implementation with AsyncSession

Because AsyncSession.execute() is a coroutine, keyset pagination integrates cleanly into asyncio-based web handlers and background tasks. The cursor state (last_created_at, last_id) is carried between requests either in a signed token returned to the API caller, or held server-side in a Redis key for background batch jobs.

from __future__ import annotations

import asyncio
import datetime
from dataclasses import dataclass
from typing import Optional

from sqlalchemy import select, tuple_
from sqlalchemy.ext.asyncio import AsyncSession, create_async_engine
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column


class Base(DeclarativeBase):
    pass


class Order(Base):
    __tablename__ = "orders"

    id: Mapped[int] = mapped_column(primary_key=True)
    created_at: Mapped[datetime.datetime] = mapped_column()
    customer_id: Mapped[int] = mapped_column()
    total_cents: Mapped[int] = mapped_column()


@dataclass
class PageCursor:
    last_created_at: datetime.datetime
    last_id: int


engine = create_async_engine("postgresql+asyncpg://user:pass@host/db")
PAGE_SIZE = 100


async def paginate_orders(
    cursor: Optional[PageCursor] = None,
) -> tuple[list[Order], Optional[PageCursor]]:
    """Return one page of Orders and a cursor for the next page."""
    async with AsyncSession(engine) as session:
        stmt = select(Order).order_by(Order.created_at, Order.id).limit(PAGE_SIZE)
        if cursor is not None:
            stmt = stmt.where(
                tuple_(Order.created_at, Order.id)
                > (cursor.last_created_at, cursor.last_id)
            )
        result = await session.execute(stmt)
        rows = list(result.scalars())

    next_cursor: Optional[PageCursor] = None
    if len(rows) == PAGE_SIZE:
        last = rows[-1]
        next_cursor = PageCursor(last_created_at=last.created_at, last_id=last.id)

    return rows, next_cursor

This approach complements server-side cursor streaming covered in Using yield_per to Stream Millions of Rows in Async. Use keyset pagination when callers need page-by-page access with a resumable cursor (API responses, exports triggered by HTTP requests). Use yield_per when the consuming code processes all rows in a single long-lived connection without exposing page tokens externally.


Resolving Warnings, Errors & Common Mistakes

Warning / ErrorRoot CauseProduction Fix
Rows silently duplicated or skipped between pagesORDER BY created_at without a unique tie-breaker. Two rows sharing the same created_at value straddle the keyset boundary unpredictably.Always append a unique column — typically id — as the final sort key and include it in the keyset: ORDER BY created_at, id and WHERE (created_at, id) > (:ts, :id).
sqlalchemy.exc.ProgrammingError: column "..." does not existA column referenced in ORDER BY or the keyset WHERE clause is absent from the mapped table or not projected by the query. This occurs when the ORM class is out of sync with the schema.Run alembic upgrade head to apply pending migrations. Verify Order.__tablename__ reflects the correct table. Use inspect(engine) to confirm column presence at startup.
Index not used — sequential scan on large tableNo composite index on the keyset columns, or only a single-column index on created_at without id. PostgreSQL falls back to a sequential scan when it cannot satisfy both sort columns from one index.Create CREATE INDEX idx_orders_created_at_id ON orders (created_at, id);. Confirm with EXPLAIN (ANALYZE, BUFFERS) that the plan shows Index Scan using idx_orders_created_at_id.
OFFSET on 10 M-row table causing 30+ second queriesLegacy pagination code still using .offset(page * size). Each call forces Postgres to read and discard millions of rows before returning the requested page.Replace .offset() with keyset WHERE (created_at, id) > (:ts, :id). For one-time migrations where random-access pages are unavoidable, consider materialised cursor tables.
Keyset returns wrong results after mid-pagination insertsNew rows inserted with created_at values earlier than the current cursor position are invisible; rows inserted with values equal to the last keyset boundary can appear twice.This is expected and correct for append-only workloads. For mutable datasets requiring strict consistency, snapshot the dataset into a temporary table or use FOR UPDATE SKIP LOCKED with a cursor queue.
Non-deterministic ordering breaks keyset stabilityORDER BY RANDOM() or any non-deterministic expression makes the keyset meaningless — different executions return different orderings for the same cursor value.Keyset pagination requires a stable, deterministic ORDER BY. Use immutable columns (e.g., created_at, id) and never randomise the sort in a paginated query.

Advanced Keyset Pagination Optimization

Bidirectional pagination with a dual-cursor

Most REST APIs expose both next and previous page tokens. Implementing backward navigation with keyset pagination requires reversing the sort direction while keeping the index usable.

The trick is to reverse the comparison operator and the ORDER BY direction for the previous-page fetch, then reverse the result list before returning it to the caller. This avoids storing the full result set and keeps both directions O(log N).

from __future__ import annotations

import asyncio
import datetime
from dataclasses import dataclass
from typing import Optional

from sqlalchemy import select, tuple_
from sqlalchemy.ext.asyncio import AsyncSession, create_async_engine
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column


class Base(DeclarativeBase):
    pass


class Order(Base):
    __tablename__ = "orders"

    id: Mapped[int] = mapped_column(primary_key=True)
    created_at: Mapped[datetime.datetime] = mapped_column()
    customer_id: Mapped[int] = mapped_column()
    total_cents: Mapped[int] = mapped_column()


engine = create_async_engine("postgresql+asyncpg://user:pass@host/db")
PAGE_SIZE = 100


async def fetch_previous_page(
    before_created_at: datetime.datetime,
    before_id: int,
) -> list[Order]:
    """
    Fetch the page of Orders immediately before the given cursor.
    Reverses ORDER BY to seek backwards, then reverses results to
    restore ascending order for the caller.
    """
    async with AsyncSession(engine) as session:
        stmt = (
            select(Order)
            # Reverse sort for the backward seek
            .order_by(Order.created_at.desc(), Order.id.desc())
            .where(
                tuple_(Order.created_at, Order.id) < (before_created_at, before_id)
            )
            .limit(PAGE_SIZE)
        )
        result = await session.execute(stmt)
        rows = list(result.scalars())

    # Re-invert to ascending order before returning
    rows.reverse()
    return rows

Covering index to eliminate heap fetches

When the API only returns a subset of columns (e.g., id, created_at, total_cents for an order list view), a covering index that includes those columns lets Postgres satisfy the entire query from the index without visiting the heap at all. This further reduces I/O on large tables.

-- Covering index: satisfies ORDER BY, keyset WHERE, and projected columns
-- without any heap access for the common list-view query.
CREATE INDEX idx_orders_keyset_covering
    ON orders (created_at, id)
    INCLUDE (customer_id, total_cents);

SQLAlchemy can take advantage of this automatically — no ORM changes needed. Verify with EXPLAIN (ANALYZE): the plan should show Index Only Scan with zero heap fetches once the visibility map is current.

Nullable columns in the keyset with NULLS LAST

When sorting by an optional column such as Invoice.settled_at, null values require explicit handling. PostgreSQL defaults to NULLS LAST in ascending order and NULLS FIRST in descending order, but the keyset comparison must match the sort order exactly or results will be inconsistent.

from __future__ import annotations

import asyncio
import datetime
from typing import Optional

from sqlalchemy import select, tuple_, nulls_last
from sqlalchemy.ext.asyncio import AsyncSession, create_async_engine
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column


class Base(DeclarativeBase):
    pass


class Invoice(Base):
    __tablename__ = "invoices"

    id: Mapped[int] = mapped_column(primary_key=True)
    settled_at: Mapped[Optional[datetime.datetime]] = mapped_column(nullable=True)
    customer_id: Mapped[int] = mapped_column()
    amount_cents: Mapped[int] = mapped_column()


engine = create_async_engine("postgresql+asyncpg://user:pass@host/db")
PAGE_SIZE = 100


async def fetch_invoices_nullable_keyset(
    last_settled_at: Optional[datetime.datetime],
    last_id: int,
) -> list[Invoice]:
    """
    Paginate invoices sorted by nullable settled_at (NULLS LAST) then id.
    Rows where settled_at IS NULL appear at the end of the result set.
    """
    async with AsyncSession(engine) as session:
        stmt = (
            select(Invoice)
            .order_by(nulls_last(Invoice.settled_at.asc()), Invoice.id)
            .limit(PAGE_SIZE)
        )

        if last_settled_at is not None:
            # Settled rows: standard tuple comparison works
            stmt = stmt.where(
                tuple_(Invoice.settled_at, Invoice.id) > (last_settled_at, last_id)
            )
        else:
            # Cursor is within the NULL partition — filter by id only
            stmt = stmt.where(
                Invoice.settled_at.is_(None),
                Invoice.id > last_id,
            )

        result = await session.execute(stmt)
        return list(result.scalars())

This pattern also applies to Window Functions and Analytical Queries where NULLS LAST ordering affects frame boundaries.


Frequently Asked Questions

Can I use keyset pagination with SQLAlchemy ORM relationships and joined eager loading?

Yes, but the keyset WHERE clause must reference columns on the primary entity being paginated, not on a joined relationship. Add .options(selectinload(Order.line_items)) after constructing the base keyset query. Avoid joinedload with keyset pagination — the LIMIT applies before the join is expanded, which causes joinedload to return incomplete relationship collections for the last row on each page. selectinload fires a second query keyed on the primary IDs returned and avoids this entirely. See Complex Joins and Relationship Loading Strategies for a full comparison.

When is OFFSET still acceptable?

For tables under roughly 50,000 rows, the performance difference is negligible and OFFSET keeps the implementation simple. OFFSET is also the only option when the caller needs true random page access (e.g., "jump to page 47") rather than sequential forward/backward traversal. Keyset cursors are inherently sequential — they cannot skip to an arbitrary page without iterating through all preceding pages.

How do I encode the cursor safely for an HTTP API?

Serialize (last_created_at.isoformat(), last_id) as a JSON string, then base64-encode it and sign it with itsdangerous.URLSafeTimedSerializer or similar. Never trust a raw cursor value from an untrusted client — validate that both fields are present and of the correct type before issuing the database query. An unsigned cursor allows callers to probe arbitrary database positions.

Does keyset pagination work with yield_per for background batch processing?

They serve complementary roles. yield_per holds a server-side cursor open for the lifetime of a single long-running async with AsyncSession block and streams rows in memory-bounded chunks — ideal for ETL or data exports within a single process. Keyset pagination stores only the cursor value between independent requests, making it suitable for stateless API handlers or distributed workers that cannot share a database connection. When connection pool exhaustion is a concern — a risk covered in depth at Handling Connection Leaks and Pool Exhaustion — keyset pagination releases the connection after each page fetch and is therefore safer under high concurrency than a long-lived yield_per cursor.