How DuckDB Spatial Handles Coordinate Systems

DuckDB Spatial decouples geometric primitives from coordinate reference system (CRS) metadata to keep columnar execution lean. Some engines embed an SRID in every geometry's binary payload (PostGIS EWKB, for example). DuckDB Spatial's GEOMETRY values do not: they store only coordinates, with no SRID or CRS tag attached. This reduces serialization overhead but makes CRS propagation the query author's responsibility. Each table's CRS has to be tracked outside the geometry column, and ST_Transform must be told both the source and the target CRS. The DuckDB Spatial Architecture & Fundamentals specification covers this design in more detail. Because no CRS tag exists, nothing stops a query from comparing coordinates that come from two different CRSs. Functions such as ST_Distance and ST_Intersects treat them as plain planar numbers, and this frequently causes silent misalignment in downstream analytical joins.
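
Because the CRS lives outside the geometry column, a pipeline has to carry it explicitly. The Python sketch below shows one minimal way to do that. It assumes a hypothetical LayerInfo record per table; the class, the CrsMismatchError exception and the table names are illustrative and not part of DuckDB. A join is refused unless both sides declare the same CRS string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayerInfo:
    """Hypothetical out-of-band record: the GEOMETRY column holds only coordinates,
    so each table's CRS is declared here by the pipeline owner."""
    table: str
    crs: str  # e.g. "EPSG:4326"


class CrsMismatchError(ValueError):
    """Raised when two layers would be compared in different coordinate systems."""


def require_same_crs(left: LayerInfo, right: LayerInfo) -> str:
    """Return the shared CRS (normalized to upper case) or refuse the join."""
    if left.crs.upper() != right.crs.upper():
        raise CrsMismatchError(
            f"{left.table} is {left.crs} but {right.table} is {right.crs}; "
            "re-project one side with ST_Transform before joining"
        )
    return left.crs.upper()


if __name__ == "__main__":
    parcels = LayerInfo("parcels", "EPSG:3857")  # hypothetical tables
    sensors = LayerInfo("sensors", "EPSG:4326")
    try:
        require_same_crs(parcels, sensors)
    except CrsMismatchError as err:
        print("join blocked:", err)
```

Comparing CRS strings is deliberately naive here. For example, 'EPSG:4326' and 'OGC:CRS84' share the WGS 84 datum but differ in axis order. A real registry would therefore normalize identifiers before comparing them.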

Ingestion Pathways & CRS Metadata Handling

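GeoJSON ingestion follows RFC 7946, which fixes the CRS to WGS 84 longitude/latitude in decimal degrees and drops the crs member of the earlier 2008 GeoJSON specification. A file can be loaded with ST_Read (via GDAL) or parsed with ST_GeomFromGeoJSON. Either way the result is a plain GEOMETRY column, so any legacy crs member in the input does not survive ingestion.

GeoParquet works differently. Each geometry column's CRS is recorded as a PROJJSON object under the geo key of the Parquet footer metadata. Under the GeoParquet specification, an absent crs means OGC:CRS84 (WGS 84 longitude/latitude), and an explicit null means the CRS is unknown. With the spatial extension loaded, DuckDB converts the geometry column itself to GEOMETRY, but the resulting values carry no CRS. Nothing downstream flags a malformed or missing PROJJSON block; distance and predicate functions simply run as Cartesian calculations on whatever coordinates the file contains. Engineers must validate GeoParquet CRS blocks prior to ingestion, and record the resolved CRS alongside the table, to prevent silent drift.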
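
GeoParquet CRS resolution can be checked before ingestion. This Python sketch takes the raw JSON value of the footer's geo key, which can be read, for example, with DuckDB's parquet_kv_metadata() table function. It resolves a CRS label per geometry column using the GeoParquet specification's defaults. It handles only PROJJSON objects that carry an id or ids member. The "UNKNOWN" label and the choice to raise on anything else are assumptions of this sketch.

```python
import json

OGC_CRS84 = "OGC:CRS84"  # GeoParquet's default when a column has no "crs" key


def _projjson_id(crs: dict):
    """Return (authority, code) from a PROJJSON 'id' (or first entry of 'ids'), else None."""
    ident = crs.get("id")
    if ident is None and isinstance(crs.get("ids"), list) and crs["ids"]:
        ident = crs["ids"][0]
    if isinstance(ident, dict) and "authority" in ident and "code" in ident:
        return ident["authority"], ident["code"]
    return None


def resolve_geoparquet_crs(geo_metadata: str) -> dict:
    """Map each geometry column to a CRS label using GeoParquet's rules.

    "crs" absent -> OGC:CRS84; "crs": null -> "UNKNOWN"; PROJJSON with an id ->
    "AUTHORITY:CODE"; anything else raises ValueError so the file is not ingested.
    """
    meta = json.loads(geo_metadata)
    resolved = {}
    for column, spec in meta["columns"].items():
        if "crs" not in spec:
            resolved[column] = OGC_CRS84
        elif spec["crs"] is None:
            resolved[column] = "UNKNOWN"
        elif isinstance(spec["crs"], dict) and _projjson_id(spec["crs"]):
            authority, code = _projjson_id(spec["crs"])
            resolved[column] = f"{authority}:{code}"
        else:
            raise ValueError(f"column {column!r}: crs is not PROJJSON with an identifier")
    return resolved


if __name__ == "__main__":
    footer = json.dumps({
        "version": "1.0.0",
        "primary_column": "geometry",
        "columns": {
            "geometry": {"encoding": "WKB", "geometry_types": [],
                         "crs": {"type": "ProjectedCRS", "name": "WGS 84 / Pseudo-Mercator",
                                 "id": {"authority": "EPSG", "code": 3857}}},
            "centroid": {"encoding": "WKB", "geometry_types": []},
        },
    })
    print(resolve_geoparquet_crs(footer))  # {'geometry': 'EPSG:3857', 'centroid': 'OGC:CRS84'}
```

The resolved labels can then be stored in whatever side table the pipeline uses to track CRS per layer.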

Transformation Pipeline & Memory/IO Boundaries

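Coordinate transformations execute via the PROJ library embedded in the extension. ST_Transform(geom, source_crs, target_crs) takes both CRS definitions as strings such as 'EPSG:4326', plus an optional fourth always_xy argument. Unless always_xy is true, PROJ applies each CRS's authority-defined axis order. For EPSG:4326 that order is latitude/longitude, not the longitude/latitude order most data uses.

ST_Transform materializes the transformed coordinate arrays in contiguous memory, so transformation memory scales linearly with vertex count. On datasets exceeding roughly 10M rows this can breach the default memory_limit (also exposed as max_memory), which is 80% of system RAM. To prevent out-of-memory termination, cap memory explicitly with PRAGMA memory_limit='12GB'; and batch transformations using partitioned COPY operations or windowed ROW_NUMBER() filters. Lowering parallelism with SET threads=4; and letting sorts and other blocking operators spill to the temp_directory reduces peak memory pressure, but it introduces sequential I/O latency. Projection logic and datum grid caching follow the CRS Mapping & Transformations pipeline, which stores PROJ grids in ~/.duckdb/proj to eliminate redundant network fetches during bulk conversions.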
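
The standalone Python sketch below makes the per-vertex cost and the axis-order pitfall concrete. It implements the spherical Web Mercator (EPSG:3857) forward formula, which a 4326-to-3857 transform evaluates for every vertex. It also includes simple helpers for sizing ROW_NUMBER() batches. This is not DuckDB's or PROJ's code. The 16-bytes-per-2D-vertex figure assumes float64 coordinates and counts only output buffers, and the 2 GiB budget in the demo is an arbitrary example.

```python
import math

R = 6378137.0            # WGS 84 semi-major axis, used by EPSG:3857 as a sphere radius
BYTES_PER_ORDINATE = 8   # one float64 per x or y value


def lonlat_to_web_mercator(lon: float, lat: float) -> tuple:
    """Spherical Web Mercator forward projection of a single vertex (degrees -> meters)."""
    if abs(lat) >= 90.0:
        raise ValueError("Web Mercator is undefined at the poles")
    x = R * math.radians(lon)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


def transform_coords(coords):
    """Allocate one output vertex per input vertex: cost grows linearly with vertex count."""
    return [lonlat_to_web_mercator(lon, lat) for lon, lat in coords]


def rows_per_batch(budget_bytes: int, avg_vertices_per_row: int, dims: int = 2) -> int:
    """Rows whose transformed coordinates fit in a memory budget (output buffers only)."""
    bytes_per_row = avg_vertices_per_row * dims * BYTES_PER_ORDINATE
    return max(1, budget_bytes // bytes_per_row)


def batch_ranges(total_rows: int, batch_size: int):
    """Inclusive 1-based (lo, hi) ranges, usable as ROW_NUMBER() BETWEEN lo AND hi filters."""
    for lo in range(1, total_rows + 1, batch_size):
        yield lo, min(lo + batch_size - 1, total_rows)


if __name__ == "__main__":
    # Axis order matters: swapping lon/lat silently projects a different place.
    print(lonlat_to_web_mercator(4.89, 52.37))   # lon/lat order (what always_xy assumes)
    print(lonlat_to_web_mercator(52.37, 4.89))   # lat/lon fed in as if it were lon/lat
    size = rows_per_batch(2 * 1024**3, avg_vertices_per_row=500)
    print(size, list(batch_ranges(10, 4)))       # 268435 [(1, 4), (5, 8), (9, 10)]
```

Deriving the batch size from a memory budget, rather than fixing a row count, keeps batches safe when average vertex counts vary between datasets.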

Spatial Indexing Internals & CRS-Agnostic Execution

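DuckDB’s spatial index (CREATE INDEX ... USING RTREE (geom)) is an R-tree built over the raw bounding-box coordinates of each geometry. The index is strictly CRS-agnostic: it does not account for projection distortion, geodesic curvature, or unit normalization. Predicates such as ST_DWithin and ST_Intersects likewise operate on the stored coordinates as-is. ST_DWithin's distance argument is read in those same units, which means degrees for EPSG:4326 data. Cross-CRS joins therefore compare incompatible numbers. They produce false negatives, or spurious matches wherever two numeric ranges happen to overlap.

Always normalize geometries to a common CRS before index construction: materialize the transformed column first, then build the index on it. At query time, transform the probe geometry into the indexed column's CRS rather than wrapping the indexed column in ST_Transform. A predicate on a transformed expression cannot use the index.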
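
The CRS-agnostic failure is easy to reproduce outside DuckDB. The Python sketch below uses a plain bounding-box overlap test, the same kind of numeric comparison an R-tree node check performs. It shows a false negative when one side is in degrees and the other in meters. It also shows a spurious hit when two unrelated numeric ranges happen to overlap. The BBox type and helper names are illustrative, and this is not DuckDB's index code.

```python
import math
from typing import NamedTuple

R = 6378137.0  # EPSG:3857 sphere radius


class BBox(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def intersects(a: BBox, b: BBox) -> bool:
    """Numeric overlap test on raw coordinates, as a bounding-box index performs: no CRS awareness."""
    return a.xmin <= b.xmax and b.xmin <= a.xmax and a.ymin <= b.ymax and b.ymin <= a.ymax


def lonlat_bbox_to_web_mercator(b: BBox) -> BBox:
    """Project a lon/lat box corner-wise; exact because x depends only on lon and y only on lat."""
    def fwd(lon: float, lat: float):
        return R * math.radians(lon), R * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    x0, y0 = fwd(b.xmin, b.ymin)
    x1, y1 = fwd(b.xmax, b.ymax)
    return BBox(x0, y0, x1, y1)


if __name__ == "__main__":
    city_deg = BBox(4.70, 52.30, 5.10, 52.45)                               # EPSG:4326 (degrees)
    parcel_m = lonlat_bbox_to_web_mercator(BBox(4.85, 52.35, 4.95, 52.40))  # EPSG:3857 (meters)
    print(intersects(city_deg, parcel_m))                                   # mixed CRS: False
    print(intersects(lonlat_bbox_to_web_mercator(city_deg), parcel_m))      # common CRS: True
    # Spurious hit: a box spanning 40-70 E / 40-70 N in degrees vs. a 10 m box near (0, 0) in meters.
    print(intersects(BBox(40, 40, 70, 70), BBox(50, 50, 60, 60)))           # True, but meaningless
```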

CRS Drift Troubleshooting & Incident Resolution

When spatial joins yield zero matches despite overlapping geometries, verify CRS alignment using diagnostic queries and enforce strict fallback routing.

-- Inspect column types (DuckDB exposes no per-column CRS metadata)
SELECT column_name, data_type
FROM duckdb_columns()
WHERE table_name = 'target_table';

-- DuckDB has no SRID to "assign"; if a layer's CRS is only known out-of-band,
-- transform it from the assumed source CRS into your working CRS.
CREATE OR REPLACE TABLE target_norm AS
SELECT * EXCLUDE (geometry),
       -- always_xy = true keeps lon/lat (x/y) order; EPSG:4326 is lat/lon by definition
       ST_Transform(geometry, 'EPSG:4326', 'EPSG:3857', true) AS geometry
FROM target_table;

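-- Validate bounding box consistency post-transformation (dataset-wide extent;
-- ST_Extent is a per-row scalar, so aggregate the per-row bounds instead)
SELECT MIN(ST_XMin(geometry)) AS xmin, MIN(ST_YMin(geometry)) AS ymin,
       MAX(ST_XMax(geometry)) AS xmax, MAX(ST_YMax(geometry)) AS ymax
FROM target_norm;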

Fallback routing for an unknown CRS involves applying a manual affine correction via ST_Affine once the transform parameters are known. Because DuckDB has no ST_SRID, implement pre-flight validation on coordinate ranges instead: reject rows whose extents fall outside the expected bounds. Guard at ingestion:

-- Reject obviously mis-projected rows (lon/lat must sit within ±180/±90);
-- check all four bounds, not just the minimums
DELETE FROM target_table
WHERE NOT (ST_XMin(geometry) >= -180 AND ST_XMax(geometry) <= 180
       AND ST_YMin(geometry) >= -90 AND ST_YMax(geometry) <= 90);
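
A row-level range guard only catches impossible coordinates. A layer-level pre-flight check goes one step further. It compares the dataset-wide extent, for example the MIN/MAX of ST_XMin, ST_YMin, ST_XMax and ST_YMax over the table, against the CRS the layer claims to use. The Python sketch below assumes a hypothetical allowlist of two CRSs and their valid coordinate ranges. A range check can prove a declared CRS impossible but cannot prove it correct, because a small extent in meters near (0, 0) also fits inside the degree range.

```python
from typing import Dict, List, Tuple

Extent = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

# Hypothetical candidate set: valid coordinate ranges of the CRSs this pipeline accepts.
CANDIDATE_BOUNDS: Dict[str, Extent] = {
    "EPSG:4326": (-180.0, -90.0, 180.0, 90.0),
    "EPSG:3857": (-20037508.342789244, -20037508.342789244,
                  20037508.342789244, 20037508.342789244),
}


def plausible_crs(extent: Extent) -> List[str]:
    """CRSs whose valid range contains the whole extent; more than one hit means ambiguous."""
    xmin, ymin, xmax, ymax = extent
    return [
        crs for crs, (bx0, by0, bx1, by1) in CANDIDATE_BOUNDS.items()
        if bx0 <= xmin and xmax <= bx1 and by0 <= ymin and ymax <= by1
    ]


def check_declared_crs(extent: Extent, declared: str) -> None:
    """Reject a layer whose extent cannot exist in the CRS it claims to use."""
    candidates = plausible_crs(extent)
    if declared not in candidates:
        raise ValueError(f"extent {extent} is impossible in {declared}; plausible: {candidates}")
```

For instance, plausible_crs((4.7, 52.3, 5.1, 52.45)) returns both candidates, while an extent around (523000, 6850000) rules out EPSG:4326 outright.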

Enterprise Deployment & Access Control

In multi-tenant environments, isolate spatial workloads using dedicated DuckDB instances. Set PRAGMA threads=1 where reproducible, deterministic execution matters more than throughput. Access control integrates with external IAM providers via federated credential injection. However, any CRS information carried alongside the data, such as GeoParquet footer metadata or a side table of CRS labels, is stored unencrypted. Encrypt sensitive data at rest, or restrict access by exposing curated views and attaching the database read-only; DuckDB has no GRANT or row-level security. For high-throughput ingestion, route GeoParquet files through a validation proxy that enforces strict PROJJSON compliance before they reach the execution engine. Enforce projection consistency with coordinate-range filters inside those views rather than per-row CRS checks.
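
A validation proxy like the one mentioned above could start from a check such as the following Python sketch. It applies a strict policy to the raw geo footer value of a GeoParquet file. It rejects columns that rely on the implicit OGC:CRS84 default and rejects null or non-object crs values. It also requires a few PROJJSON members and accepts only allowlisted authority codes. The allowlist, the required-member list and the function name are assumptions for illustration; they are not part of DuckDB or the GeoParquet specification.

```python
import json
from typing import List

# Hypothetical deployment policy: the only CRSs the proxy lets through.
ALLOWED_CRS = {("EPSG", 4326), ("EPSG", 3857)}
# Members this sketch insists on in every PROJJSON object (a strict subset policy).
REQUIRED_MEMBERS = ("type", "name", "id")
_ABSENT = object()  # sentinel distinguishing a missing "crs" key from "crs": null


def crs_policy_violations(geo_metadata: str) -> List[str]:
    """Strictly check a GeoParquet 'geo' footer value; an empty list means accept."""
    try:
        meta = json.loads(geo_metadata)
    except json.JSONDecodeError as err:
        return [f"geo metadata is not valid JSON: {err.msg}"]
    columns = meta.get("columns") if isinstance(meta, dict) else None
    if not isinstance(columns, dict) or not columns:
        return ["geo metadata has no 'columns' object"]
    problems = []
    for name, spec in columns.items():
        crs = spec.get("crs", _ABSENT) if isinstance(spec, dict) else None
        if crs is _ABSENT:
            # Strict mode: do not silently accept the spec's OGC:CRS84 default.
            problems.append(f"{name}: crs absent (implicit OGC:CRS84)")
            continue
        if not isinstance(crs, dict):
            problems.append(f"{name}: crs is null or not a PROJJSON object")
            continue
        missing = [m for m in REQUIRED_MEMBERS if m not in crs]
        if missing:
            problems.append(f"{name}: PROJJSON missing {missing}")
            continue
        ident = crs["id"]
        key = (ident.get("authority"), ident.get("code")) if isinstance(ident, dict) else None
        if key not in ALLOWED_CRS:
            problems.append(f"{name}: CRS {key} is not allowlisted")
    return problems
```

Returning a list of violations instead of raising on the first one lets the proxy log every problem in a file before rejecting it.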