{"schemaVersion":"1.0.0","docId":"adapters","source":"adapters","slug":"adapters","path":"/docs/adapters","raw_path":"/raw/adapters.md","title":"Adapters","headings":[],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/adapters/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: Adapters\nnavTitle: Adapters\ndescription: Storage adapters for the LakeQL write pipeline — Parquet generation, S3/MinIO uploads, and Hive table management.\nentrypoint: /docs/adapters/overview/introduction\n---\n","description":"Storage adapters for the LakeQL write pipeline — Parquet generation, S3/MinIO uploads, and Hive table management.","navTitle":"Adapters","keywords":["adapters","storage","lakeql","write","pipeline"]}
{"schemaVersion":"1.0.0","docId":"adapters/overview/hive-table-manager","source":"adapters","slug":"overview/hive-table-manager","path":"/docs/adapters/overview/hive-table-manager","raw_path":"/raw/adapters/overview/hive-table-manager.md","title":"Hive Table Manager","headings":[{"level":2,"text":"How it works","id":"how-it-works"},{"level":2,"text":"External table locations","id":"external-table-locations"},{"level":2,"text":"Table strategies","id":"table-strategies"},{"level":2,"text":"Usage","id":"usage"},{"level":2,"text":"Generated DDL","id":"generated-ddl"},{"level":2,"text":"Configuration","id":"configuration"},{"level":2,"text":"Table definition","id":"table-definition"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/adapters/overview/hive-table-manager/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: Hive Table Manager\nnavTitle: Hive Table Manager\ndescription: How the write pipeline manages Hive external tables in Trino.\n---\n\nThe Hive Table Manager is responsible for creating and maintaining external tables in Trino's Hive catalog. After the write pipeline uploads Parquet files to object storage, the table manager ensures a corresponding Hive table exists that points to that data — making it immediately queryable.\n\n## How it works\n\nEvery mutation write goes through this flow:\n\n1. Data is written as Parquet to S3/MinIO\n2. The existing Hive table is dropped (if present)\n3. A new external table is created pointing to the uploaded data\n\nThis \"drop + create\" approach ensures the table schema always matches the current data, even if fields were added or removed.\n\n## External table locations\n\nHive external tables point to a **directory** in object storage, not a single file. 
The table manager uses the `s3a://` URI scheme, which is required by Hive's underlying Hadoop FileSystem.\n\n```\ns3a://<bucket>/<basePath>/latest.parquet/\n         │         │              │\n         │         │              └─ Directory containing Parquet file(s)\n         │         └─ Configured basePath for the endpoint\n         └─ Configured bucket name\n```\n\n<Note>\n  The `s3a://` scheme is always used, regardless of whether the storage is AWS\n  S3 or MinIO. The standard `s3://` scheme is not recognized by Hadoop's\n  FileSystem.\n</Note>\n\n## Table strategies\n\nDepending on the [load strategy](/docs/adapters/write-pipeline/load-strategies), different tables are managed:\n\n| Strategy           | Tables created                                                               |\n| ------------------ | ---------------------------------------------------------------------------- |\n| `full_load`        | `<tableName>` → `latest.parquet/`                                            |\n| `full_load_append` | `<tableName>_latest` → `latest.parquet/`, `<tableName>_all` → `all.parquet/` |\n| `append`           | `<tableName>` → `all.parquet/`                                               |\n\nFor `full_load_append`, the two tables are managed as a pair: if creating either one fails, the manager attempts a best-effort rollback by dropping both.\n\n## Usage\n\nThe table manager is typically used internally by `executeWritePipeline`. 
For direct usage:\n\n```ts\nimport { createHiveTableManager } from \"@lakeql/adapters\"\nimport { TrinoClient } from \"@lakeql/trino-client\"\n\nconst trinoClient = new TrinoClient({\n  host: \"https://trino.example.com\",\n  port: 8443,\n  auth: { type: \"basic\", username: \"admin\", password: \"secret\" },\n  catalog: \"hive\",\n})\n\nconst hiveManager = createHiveTableManager({\n  client: trinoClient,\n  bucket: \"my-datalake\",\n})\n\n// Build a properly formatted location URI\nconst location = hiveManager.buildExternalLocation(\n  \"analytics/events/latest.parquet/\"\n)\n// => \"s3a://my-datalake/analytics/events/latest.parquet/\"\n\n// Create a table pointing to that location\nawait hiveManager.recreateTable({\n  catalog: \"hive\",\n  schema: \"analytics\",\n  tableName: \"events\",\n  externalLocation: location,\n  columns: [\n    { name: \"event_id\", type: \"VARCHAR\" },\n    { name: \"message\", type: \"VARCHAR\" },\n    { name: \"created_at\", type: \"TIMESTAMP(3)\" },\n  ],\n})\n```\n\n## Generated DDL\n\nThe above produces:\n\n```sql\nDROP TABLE IF EXISTS hive.analytics.events;\n\nCREATE TABLE hive.analytics.events (\n  event_id VARCHAR,\n  message VARCHAR,\n  created_at TIMESTAMP(3)\n)\nWITH (\n  external_location = 's3a://my-datalake/analytics/events/latest.parquet/',\n  format = 'PARQUET'\n);\n```\n\n## Configuration\n\n<InterfaceReference\n  file=\"adapters/src/hive-table-manager\"\n  name=\"HiveTableManagerConfig\"\n  mode=\"declaration\"\n/>\n\n## Table definition\n\n<InterfaceReference\n  file=\"adapters/src/hive-table-manager\"\n  name=\"HiveTableDefinition\"\n  mode=\"declaration\"\n/>\n","description":"How the write pipeline manages Hive external tables in Trino.","navTitle":"Hive Table Manager","keywords":["table","external","manager","write","pipeline"]}
{"schemaVersion":"1.0.0","docId":"adapters/overview/introduction","source":"adapters","slug":"overview/introduction","path":"/docs/adapters/overview/introduction","raw_path":"/raw/adapters/overview/introduction.md","title":"Introduction","headings":[{"level":2,"text":"What it does","id":"what-it-does"},{"level":2,"text":"Package exports","id":"package-exports"},{"level":2,"text":"Architecture","id":"architecture"},{"level":2,"text":"Storage types","id":"storage-types"},{"level":2,"text":"Installation","id":"installation"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/adapters/overview/introduction/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: Introduction\nnavTitle: Introduction\ndescription: Overview of the @lakeql/adapters package and how it powers the mutation write pipeline.\n---\n\nThe `@lakeql/adapters` package provides the storage layer for LakeQL mutations. When a GraphQL mutation is executed, the adapters package handles every step of persisting its data: converting records to Parquet, uploading them to object storage, and managing Hive external table DDL in Trino.\n\n## What it does\n\nThe package is responsible for three core tasks:\n\n1. **Parquet generation** — Converts input records to columnar Parquet format using `@lakeql/parquet`\n2. **Object storage** — Uploads Parquet files to S3 or S3-compatible storage (MinIO, etc.)\n3. **Hive DDL management** — Creates and manages external tables in Trino's Hive catalog so the data is immediately queryable\n\n## Package exports\n\nThe main entry point (`@lakeql/adapters`) exports storage operations, the write pipeline, and the Hive table manager. 
See the [API Reference](/docs/adapters/api-reference) for the full list of exports.\n\n## Architecture\n\n```mermaid\ngraph TD\n    A[GraphQL Mutation] --> B[executeWritePipeline]\n    B --> C[writeParquet]\n    B --> D[Storage Operations]\n    B --> E[Hive Table Manager]\n    C --> D\n    D --> F[S3 / MinIO]\n    E --> G[Trino DDL]\n```\n\nThe `executeWritePipeline` function orchestrates the full flow. It accepts records, a JSON Schema describing their shape, and a pipeline configuration that determines the load strategy, storage target, and table definition.\n\n## Storage types\n\nThe adapters support two storage backends:\n\n| Type    | Description                 | Credentials                                      |\n| ------- | --------------------------- | ------------------------------------------------ |\n| `s3`    | AWS S3                      | `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`     |\n| `minio` | S3-compatible (MinIO, etc.) | `MINIO_ACCESS_KEY_ID`, `MINIO_SECRET_ACCESS_KEY` |\n\nBoth use the same S3 protocol under the hood. The `minio` type requires an explicit `endpoint` configuration.\n\n## Installation\n\n<Command variant=\"install\">@lakeql/adapters</Command>\n\nPeer dependencies for S3/MinIO storage:\n\n<Command variant=\"install\">\n  @aws-sdk/client-s3 @aws-sdk/s3-presigned-post @aws-sdk/s3-request-presigner\n</Command>\n","description":"Overview of the @lakeql/adapters package and how it powers the mutation write pipeline.","navTitle":"Introduction","keywords":["package","introduction","overview","lakeqladapters","powers"]}
{"schemaVersion":"1.0.0","docId":"adapters/storage/storage-operations","source":"adapters","slug":"storage/storage-operations","path":"/docs/adapters/storage/storage-operations","raw_path":"/raw/adapters/storage/storage-operations.md","title":"Storage Operations","headings":[{"level":2,"text":"createStorageOperations","id":"create-storage-operations"},{"level":2,"text":"Configuration","id":"configuration"},{"level":2,"text":"Operations","id":"operations"},{"level":2,"text":"Environment variables","id":"environment-variables"},{"level":2,"text":"Error handling","id":"error-handling"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/adapters/storage/storage-operations/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: Storage Operations\nnavTitle: Storage Operations\ndescription: Low-level S3/MinIO storage operations for uploading and managing Parquet files.\n---\n\nStorage operations provide the low-level interface for interacting with S3 or S3-compatible object storage. The write pipeline uses these internally, but they are exported for advanced use cases.\n\n## createStorageOperations\n\nCreates a storage operations instance backed by `files-sdk`. 
The adapter selection is based on `config.type`:\n\n- `\"s3\"` — reads credentials from `AWS_*` environment variables\n- `\"minio\"` — reads credentials from `MINIO_*` environment variables\n\n```ts\nimport { createStorageOperations } from \"@lakeql/adapters\"\n\nconst storage = createStorageOperations({\n  type: \"minio\",\n  bucket: \"my-datalake\",\n  endpoint: \"http://localhost:9000\",\n})\n```\n\n## Configuration\n\n<InterfaceReference\n  file=\"adapters/src/storage-operations\"\n  name=\"StorageConfig\"\n  mode=\"declaration\"\n/>\n\n## Operations\n\nThe returned object provides these methods:\n\n| Method                 | Description                                           |\n| ---------------------- | ----------------------------------------------------- |\n| `upload(buffer, path)` | Upload a `Uint8Array` to the given path in the bucket |\n| `deletePrefix(prefix)` | Delete all objects under the given prefix             |\n\n## Environment variables\n\n### S3\n\n| Variable                | Description              |\n| ----------------------- | ------------------------ |\n| `AWS_ACCESS_KEY_ID`     | AWS access key           |\n| `AWS_SECRET_ACCESS_KEY` | AWS secret key           |\n| `AWS_DEFAULT_REGION`    | Default region           |\n| `AWS_ENDPOINT_URL`      | Optional custom endpoint |\n\n### MinIO\n\n| Variable                  | Description      |\n| ------------------------- | ---------------- |\n| `MINIO_ACCESS_KEY_ID`     | MinIO access key |\n| `MINIO_SECRET_ACCESS_KEY` | MinIO secret key |\n\n<Note>\n  The `endpoint` field in the config is required for MinIO. 
For S3, it's\n  optional and defaults to the standard AWS endpoint for the region.\n</Note>\n\n## Error handling\n\nStorage errors are wrapped in a `StorageError` with a descriptive message including the operation and path:\n\n```ts\nimport { StorageError } from \"@lakeql/adapters\"\n\n// Assumes `storage` was created with createStorageOperations() (see above)\n// and `buffer` is a Uint8Array holding the Parquet file contents.\ntry {\n  await storage.upload(buffer, \"path/to/file.parquet\")\n} catch (error) {\n  if (error instanceof StorageError) {\n    console.error(error.message)\n    // e.g. Storage upload failed for \"path/to/file.parquet\": The specified bucket does not exist\n  }\n}\n```\n\n<InlineReference\n  file=\"adapters/src/storage-operations\"\n  include={[\"StorageError\"]}\n/>\n","description":"Low-level S3/MinIO storage operations for uploading and managing Parquet files.","navTitle":"Storage Operations","keywords":["operations","storage","low-level","s3minio","uploading"]}
{"schemaVersion":"1.0.0","docId":"adapters/write-pipeline/execute-write-pipeline","source":"adapters","slug":"write-pipeline/execute-write-pipeline","path":"/docs/adapters/write-pipeline/execute-write-pipeline","raw_path":"/raw/adapters/write-pipeline/execute-write-pipeline.md","title":"executeWritePipeline","headings":[{"level":2,"text":"Signature","id":"signature"},{"level":2,"text":"Input","id":"input"},{"level":2,"text":"Configuration","id":"configuration"},{"level":2,"text":"Usage","id":"usage"},{"level":2,"text":"Pipeline steps","id":"pipeline-steps"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/adapters/write-pipeline/execute-write-pipeline/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: executeWritePipeline\nnavTitle: executeWritePipeline\ndescription: The main entry point for persisting mutation data through the write pipeline.\n---\n\n`executeWritePipeline` is the primary function for persisting GraphQL mutation input. 
It converts records to Parquet, uploads them to object storage, and manages the corresponding Hive external table in Trino.\n\n## Signature\n\n```ts\nfunction executeWritePipeline(input: WritePipelineInput): Promise<void>\n```\n\n## Input\n\n<InterfaceReference\n  file=\"adapters/src/write-pipeline\"\n  name=\"WritePipelineInput\"\n  mode=\"declaration\"\n/>\n\n## Configuration\n\n<InterfaceReference\n  file=\"adapters/src/write-pipeline\"\n  name=\"WritePipelineConfig\"\n  mode=\"declaration\"\n/>\n\n## Usage\n\n```ts\nimport { executeWritePipeline } from \"@lakeql/adapters\"\nimport { TrinoClient } from \"@lakeql/trino-client\"\n\nconst trinoClient = new TrinoClient({\n  host: \"https://trino.example.com\",\n  port: 8443,\n  auth: { type: \"basic\", username: \"admin\", password: \"secret\" },\n  catalog: \"hive\",\n})\n\nawait executeWritePipeline({\n  records: [\n    {\n      event_id: \"abc-123\",\n      message: \"Hello\",\n      timestamp: \"2025-01-15T10:30:00Z\",\n    },\n  ],\n  jsonSchema: {\n    type: \"object\",\n    properties: {\n      event_id: { type: \"string\" },\n      message: { type: \"string\" },\n      timestamp: { type: \"string\", format: \"date-time\" },\n    },\n  },\n  config: {\n    loadStrategy: \"full_load\",\n    type: \"minio\",\n    bucket: \"my-datalake\",\n    basePath: \"analytics/events\",\n    endpoint: \"http://localhost:9000\",\n    table: {\n      catalog: \"hive\",\n      schema: \"analytics\",\n      tableName: \"events\",\n    },\n    trinoClient,\n  },\n})\n```\n\n## Pipeline steps\n\nThe pipeline executes these steps in order:\n\n1. **Convert to Parquet** — Records are serialized to a Parquet buffer using the provided JSON Schema\n2. **Upload to storage** — The Parquet file is uploaded to the configured bucket/path\n3. **Manage Hive DDL** — The external table is dropped and recreated pointing to the new data location\n\nIf any step fails, the pipeline stops immediately and throws the error. 
There is no automatic rollback of earlier steps.\n\n<Note>\n  The `jsonSchema` property is typically generated by the CLI as\n  `json-schema.json` in each endpoint directory. You don't need to write it by\n  hand.\n</Note>\n","description":"The main entry point for persisting mutation data through the write pipeline.","navTitle":"executeWritePipeline","keywords":["pipeline","executewritepipeline","entry","point","persisting"]}
{"schemaVersion":"1.0.0","docId":"adapters/write-pipeline/load-strategies","source":"adapters","slug":"write-pipeline/load-strategies","path":"/docs/adapters/write-pipeline/load-strategies","raw_path":"/raw/adapters/write-pipeline/load-strategies.md","title":"Load Strategies","headings":[{"level":2,"text":"Available strategies","id":"available-strategies"},{"level":2,"text":"Partitioning interaction","id":"partitioning-interaction"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/adapters/write-pipeline/load-strategies/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: Load Strategies\nnavTitle: Load Strategies\ndescription: How full_load, full_load_append, and append strategies control data storage and table management.\n---\n\nThe `loadStrategy` configuration determines how new data is stored and how Hive tables are managed. Each strategy provides a different tradeoff between data freshness, history retention, and storage costs.\n\n## Available strategies\n\n### full_load\n\nThe default strategy. Replaces all existing data with the new records on every write.\n\n**Steps:**\n\n1. Delete all existing files at the base path\n2. Upload new Parquet file to `<basePath>/latest.parquet/<uuid>.parquet`\n3. Drop and recreate the Hive table pointing to the `latest.parquet/` directory\n\n**Use when:** You always want the table to reflect the most recent payload. No history is kept.\n\n**Table structure:**\n\n| Table | Location |\n|-------|----------|\n| `<tableName>` | `s3a://<bucket>/<basePath>/latest.parquet/` |\n\n### full_load_append\n\nCombines `full_load` with historical archiving. The latest data is always queryable, and all historical writes are preserved in an append-only directory.\n\n**Steps:**\n\n1. Upload Parquet file to `<basePath>/latest.parquet/<uuid>.parquet` (replaces previous)\n2. Upload the same Parquet file to `<basePath>/all.parquet/<partition_path>`\n3. 
Drop and recreate both `_latest` and `_all` Hive tables\n\n**Use when:** You need both a \"current state\" view and a historical log of all writes.\n\n**Table structure:**\n\n| Table | Location |\n|-------|----------|\n| `<tableName>_latest` | `s3a://<bucket>/<basePath>/latest.parquet/` |\n| `<tableName>_all` | `s3a://<bucket>/<basePath>/all.parquet/` |\n\n### append\n\nOnly appends data. No \"latest\" snapshot is maintained.\n\n**Steps:**\n\n1. Upload Parquet file to `<basePath>/all.parquet/<partition_path>`\n2. Drop and recreate the Hive table pointing to the `all.parquet/` directory\n\n**Use when:** You're building a log or event stream where every write adds to the history.\n\n**Table structure:**\n\n| Table | Location |\n|-------|----------|\n| `<tableName>` | `s3a://<bucket>/<basePath>/all.parquet/` |\n\n## Partitioning interaction\n\nFor `full_load`, partitioning configuration is ignored: data is always written as a single file in `latest.parquet/`.\n\nFor `full_load_append` and `append`, the partition path within `all.parquet/` is determined by the [partitioning configuration](/docs/adapters/write-pipeline/partitioning).\n\n<Note>\n  The `full_load` strategy deletes the entire base path before uploading. Make\n  sure you don't share the `basePath` between multiple endpoints.\n</Note>\n","description":"How full_load, full_load_append, and append strategies control data storage and table management.","navTitle":"Load Strategies","keywords":["strategies","fullload","fullloadappend","append","control"]}
{"schemaVersion":"1.0.0","docId":"adapters/write-pipeline/partitioning","source":"adapters","slug":"write-pipeline/partitioning","path":"/docs/adapters/write-pipeline/partitioning","raw_path":"/raw/adapters/write-pipeline/partitioning.md","title":"Partitioning","headings":[{"level":2,"text":"Partitioning modes","id":"partitioning-modes"},{"level":2,"text":"Timestamp mode","id":"timestamp-mode"},{"level":2,"text":"Disabled mode","id":"disabled-mode"},{"level":2,"text":"Field mode","id":"field-mode"},{"level":2,"text":"Custom mode","id":"custom-mode"},{"level":2,"text":"Related types","id":"related-types"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/adapters/write-pipeline/partitioning/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: Partitioning\nnavTitle: Partitioning\ndescription: How data is partitioned into Hive-style paths within the all.parquet/ directory.\n---\n\nPartitioning controls how data is organized within the `all.parquet/` directory for `full_load_append` and `append` strategies. It determines the subdirectory structure using Hive-style `key=value` paths.\n\n<Note>\n  Partitioning only applies to `full_load_append` and `append` strategies. 
For\n  `full_load`, partitioning is ignored.\n</Note>\n\n## Partitioning modes\n\nThe `partitioning` config value determines the mode:\n\n| Value                   | Mode      | Description                                               |\n| ----------------------- | --------- | --------------------------------------------------------- |\n| `true` (default)        | Timestamp | Partitions by write timestamp                             |\n| `false`                 | Disabled  | Flat file layout (UUID-only paths)                        |\n| `\"fieldName\"`           | Field     | Partitions by a record field's date value                 |\n| `\"field:component/...\"` | Custom    | Multi-segment partitioning with date component extraction |\n\n## Timestamp mode\n\nWhen `partitioning: true` (the default), each write is partitioned by the current timestamp. The pipeline automatically injects `load_timestamp`, `load_timestamp_year`, and `load_timestamp_month` columns into the records and schema.\n\n```\nall.parquet/year=2025/month=01/day=15/<uuid>.parquet\n```\n\nThe `partitioningFormat` controls granularity:\n\n| Format             | Example path                               |\n| ------------------ | ------------------------------------------ |\n| `\"year\"`           | `year=2025/<uuid>.parquet`                 |\n| `\"year/month\"`     | `year=2025/month=01/<uuid>.parquet`        |\n| `\"year/month/day\"` | `year=2025/month=01/day=15/<uuid>.parquet` |\n\n## Disabled mode\n\nWhen `partitioning: false`, files are placed directly in `all.parquet/` with a flat UUID-based path:\n\n```\nall.parquet/<uuid>.parquet\n```\n\n## Field mode\n\nWhen `partitioning` is a simple field name (e.g. 
`\"event_date\"`), the pipeline extracts the date value from each record's field and partitions accordingly:\n\n```json\n{ \"partitioning\": \"event_date\", \"partitioningFormat\": \"year/month\" }\n```\n\nFor a record with `event_date: \"2025-03-20\"`:\n\n```\nall.parquet/year=2025/month=03/<uuid>.parquet\n```\n\nRecords are grouped by their partition key — records with the same partition path are written to the same Parquet file.\n\n<Warning>\n  If a record is missing the partition field, has a null value, or contains an\n  unparseable date, the pipeline throws a `PartitionFieldError`.\n</Warning>\n\n## Custom mode\n\nCustom partitioning allows multi-segment paths with field extraction and date component parsing. The format uses `/` to separate segments and `:` to specify a date component:\n\n```json\n{ \"partitioning\": \"region/event_date:year/event_date:month\" }\n```\n\nFor a record with `region: \"eu-west-1\"` and `event_date: \"2025-03-20\"`:\n\n```\nall.parquet/region=eu-west-1/year=2025/month=03/<uuid>.parquet\n```\n\n### Segment types\n\n| Segment            | Description              | Example output     |\n| ------------------ | ------------------------ | ------------------ |\n| `fieldName`        | Raw field value          | `customer_id=acme` |\n| `fieldName:year`   | Year from ISO date       | `year=2025`        |\n| `fieldName:month`  | Month from ISO date      | `month=03`         |\n| `fieldName:day`    | Day from ISO date        | `day=20`           |\n| `fieldName:hour`   | Hour from ISO datetime   | `hour=14`          |\n| `fieldName:minute` | Minute from ISO datetime | `minute=30`        |\n| `fieldName:second` | Second from ISO datetime | `second=45`        |\n\n## Related types\n\n<InterfaceReference\n  file=\"adapters/src/write-pipeline\"\n  name=\"ResolvedPartitioning\"\n  mode=\"declaration\"\n/>\n\n<InlineReference\n  file=\"adapters/src/write-pipeline\"\n  include={[\"PartitionMode\", \"PartitionSegment\", 
\"PartitionFieldError\"]}\n/>\n","description":"How data is partitioned into Hive-style paths within the all.parquet/ directory.","navTitle":"Partitioning","keywords":["partitioning","partitioned","hive-style","paths","within"]}
{"schemaVersion":"1.0.0","docId":"api","source":"api","slug":"api","path":"/docs/api","raw_path":"/raw/api.md","title":"API","headings":[],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/api/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: API\nnavTitle: API\ndescription: GraphQL API server with Hono, Yoga, and Pothos.\nentrypoint: /docs/api/overview/introduction\n---\n","description":"GraphQL API server with Hono, Yoga, and Pothos.","navTitle":"API","keywords":["graphql","server","pothos"]}
{"schemaVersion":"1.0.0","docId":"api/authentication/jwt-authentication","source":"api","slug":"authentication/jwt-authentication","path":"/docs/api/authentication/jwt-authentication","raw_path":"/raw/api/authentication/jwt-authentication.md","title":"JWT Authentication","headings":[{"level":2,"text":"GetUserResolver type","id":"get-user-resolver-type"},{"level":2,"text":"Default behavior","id":"default-behavior"},{"level":2,"text":"Mock authentication","id":"mock-authentication"},{"level":2,"text":"Custom JWT verification","id":"custom-jwt-verification"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/api/authentication/jwt-authentication/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: JWT Authentication\nnavTitle: JWT Authentication\ndescription: Configure user authentication via JWT tokens or mock authentication for development.\n---\n\nLakeQL uses JWT-based authentication to identify the current user on each request. The `getUser` resolver extracts user information from the `Authorization` header and makes it available in the GraphQL context.\n\n## GetUserResolver type\n\n```ts\ntype GetUserResolver = (\n  req: Request\n) => Promise<JWTPayload | null | undefined> | JWTPayload | null | undefined\n```\n\nReturning `null` or `undefined` means the request is unauthenticated. The `JWTPayload` type extends jose's `JWTPayload` with an additional `userName` field:\n\n```ts\n// Extends jose JWTPayload\ninterface JWTPayload {\n  userName: string\n  // ...standard JWT claims (iss, sub, aud, exp, etc.)\n}\n```\n\n## Default behavior\n\nThe built-in `getUser` resolver supports mock authentication for local development. 
It does not perform real JWT verification — you must provide a custom resolver for production.\n\n## Mock authentication\n\nSet the following environment variables to enable mock auth:\n\n```bash\nAUTH_MOCK=true\nAUTH_MOCK_TOKEN=my-dev-token\n```\n\nThen pass the token as the `Authorization` header and the username via `x-username`:\n\n```bash\ncurl -X POST http://localhost:4000/graphql \\\n  -H \"Authorization: my-dev-token\" \\\n  -H \"x-username: developer\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"query\": \"{ __typename }\"}'\n```\n\n## Custom JWT verification\n\nFor production, provide a custom `getUser` resolver that verifies tokens using your identity provider:\n\n```ts path=\"src/auth.ts\"\nimport type { GetUserResolver } from \"@lakeql/api/types\"\nimport { jwtVerify, createRemoteJWKSet } from \"jose\"\n\nconst jwks = createRemoteJWKSet(\n  new URL(\"https://auth.example.com/.well-known/jwks.json\")\n)\n\nexport const getUser: GetUserResolver = async (req) => {\n  const authHeader = req.headers.get(\"authorization\")\n  if (!authHeader?.startsWith(\"Bearer \")) {\n    return null\n  }\n\n  const token = authHeader.slice(7)\n\n  try {\n    const { payload } = await jwtVerify(token, jwks, {\n      issuer: \"https://auth.example.com\",\n      audience: \"lakeql-api\",\n    })\n\n    return {\n      ...payload,\n      userName: payload.sub ?? 
\"unknown\",\n    }\n  } catch {\n    return null\n  }\n}\n```\n\nPass it to `defineConfig` in your `src/config.ts`:\n\n```ts path=\"src/config.ts\"\nimport { defineConfig } from \"@lakeql/api/config\"\n\nimport { getUser } from \"./auth\"\nimport { allConfigs } from \"./config-registry\"\n\nexport const config = defineConfig({\n  allConfigs,\n  baseDir: import.meta.dirname,\n  getUser,\n  schemaPath: \"./schemas\",\n})\n```\n\nThe resolved user is available in every GraphQL resolver via `context.currentUser`.\n","description":"Configure user authentication via JWT tokens or mock authentication for development.","navTitle":"JWT Authentication","keywords":["authentication","configure","tokens","development","getuserresolver"]}
{"schemaVersion":"1.0.0","docId":"api/authentication/permissions","source":"api","slug":"authentication/permissions","path":"/docs/api/authentication/permissions","raw_path":"/raw/api/authentication/permissions.md","title":"Permissions","headings":[{"level":2,"text":"Permission interface","id":"permission-interface"},{"level":2,"text":"Wildcard access","id":"wildcard-access"},{"level":2,"text":"createPermission helper","id":"create-permission-helper"},{"level":2,"text":"How permissions are evaluated","id":"how-permissions-are-evaluated"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/api/authentication/permissions/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: Permissions\ndescription: Define table-level permission rules for technical users via the Permission interface.\n---\n\nPermissions define which catalogs, schemas, and tables a technical user can access. They act as an application-level allow list on top of Trino's built-in authorization and are primarily useful for service accounts that execute queries via a shared system user.\n\n## Permission interface\n\n<InterfaceReference file=\"api/src/types\" name=\"Permission\" />\n\n## Wildcard access\n\nUse `[\"*\"]` in the `tables` array to grant access to all tables within a catalog/schema combination:\n\n```ts\n{\n  catalog: \"hive\",\n  schema: \"analytics\",\n  tables: [\"*\"] // access all tables in hive.analytics\n}\n```\n\n## createPermission helper\n\nFor type-safe permission construction that validates against your generated configs, build the `createPermission` helper from the same `allConfigs` registry you pass to `defineConfig`:\n\n```ts path=\"src/permissions.ts\"\nimport { createPermission as createPermissionFromApi } from \"@lakeql/api/helpers\"\nimport type { Permission } from \"@lakeql/api/types\"\n\nimport { allConfigs } from \"./config-registry\"\n\nconst createPermission = createPermissionFromApi(allConfigs)\n\nexport const permissions: Permission[] = [\n  {\n    
name: \"data-pipeline-service\",\n    useSystemUser: true,\n    permissions: {\n      Query: [\n        createPermission(\"hive\", \"raw_data\", [\"events\", \"users\", \"sessions\"]),\n      ],\n      Mutation: [createPermission(\"hive\", \"processed\", [\"aggregated_events\"])],\n    },\n  },\n  {\n    name: \"reporting-service\",\n    useSystemUser: true,\n    permissions: {\n      Query: [createPermission(\"hive\", \"analytics\", [\"*\"])],\n      Mutation: [],\n    },\n  },\n]\n```\n\n## How permissions are evaluated\n\n- **Read (Query):** If no permission entry exists for a user, reads are allowed (Trino handles auth for human users). If rules exist, at least one must match the requested catalog/schema/table.\n- **Write (Mutation):** If no permission entry exists, writes are denied. Rules must explicitly grant access.\n\nSee [Scope Authorization](/docs/api/authentication/scope-authorization) for the full evaluation logic.\n","description":"Define table-level permission rules for technical users via the Permission interface.","keywords":["permission","permissions","interface","define","table-level"]}
{"schemaVersion":"1.0.0","docId":"api/authentication/scope-authorization","source":"api","slug":"authentication/scope-authorization","path":"/docs/api/authentication/scope-authorization","raw_path":"/raw/api/authentication/scope-authorization.md","title":"Scope Authorization","headings":[{"level":2,"text":"Auth scopes","id":"auth-scopes"},{"level":2,"text":"Read permission logic","id":"read-permission-logic"},{"level":2,"text":"Write permission logic","id":"write-permission-logic"},{"level":2,"text":"Custom resolvers","id":"custom-resolvers"},{"level":2,"text":"Applying scopes to fields","id":"applying-scopes-to-fields"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/api/authentication/scope-authorization/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: Scope Authorization\ndescription: Three authorization scopes control access to GraphQL fields — authorized, readPermission, and writePermission.\n---\n\nLakeQL uses Pothos's scope auth plugin to enforce authorization at the field level. Every query or mutation field can declare which scope is required to access it.\n\n## Auth scopes\n\n| Scope             | Check                                              | Description                            |\n| ----------------- | -------------------------------------------------- | -------------------------------------- |\n| `authorized`      | `!!context.currentUser`                            | User is authenticated (any valid JWT)  |\n| `readPermission`  | Evaluates catalog/schema/table against permissions | User can read from the specified table |\n| `writePermission` | Evaluates catalog/schema/table against permissions | User can write to the specified table  |\n\n## Read permission logic\n\nThe default `hasReadPermission` resolver follows this decision model:\n\n1. **No user** → deny\n2. **No permission entry for user** → allow (Trino handles auth for human users)\n3. 
**Permission entry exists but no Query rules** → allow\n4. **Query rules exist** → at least one rule must match the catalog, schema, and table name\n\nThis default-allow model works because human users (OAuth2 Authorization Code Flow) typically have direct Trino identities. Technical users that go through a shared system account need explicit rules.\n\n## Write permission logic\n\nThe default `hasWritePermission` resolver is stricter:\n\n1. **No user** → deny\n2. **No permission entry for user** → deny\n3. **Permission entry exists but no Mutation rules** → deny\n4. **Mutation rules exist** → at least one rule must match the catalog, schema, and table name\n\nWrites always require explicit permission because they execute via a system user in Trino and bypass Trino's per-user authorization.\n\n## Custom resolvers\n\nOverride the default logic by providing custom resolvers in `defineConfig`:\n\n```ts path=\"src/config.ts\"\nimport { defineConfig } from \"@lakeql/api/config\"\n\nimport { allConfigs } from \"./config-registry\"\n\nexport const config = defineConfig({\n  allConfigs,\n  hasReadPermission: ({ context, catalog, schema, tableName }) => {\n    // Custom logic: check external authorization service\n    return checkExternalAuthService(context.currentUser, {\n      action: \"read\",\n      resource: `${catalog}.${schema}.${tableName}`,\n    })\n  },\n  hasWritePermission: ({ context, catalog, schema, tableName }) => {\n    // Custom logic: all writes require admin role\n    return context.currentUser?.role === \"admin\"\n  },\n})\n```\n\n## Applying scopes to fields\n\nWhen building custom query schemas, use `authScopes` to protect fields:\n\n```ts\nimport { builder } from \"@lakeql/api/builder\"\n\nbuilder.queryField(\"sensitiveData\", (t) =>\n  t.field({\n    type: \"String\",\n    authScopes: {\n      readPermission: {\n        catalog: \"hive\",\n        schema: \"internal\",\n        tableName: \"secrets\",\n      },\n    },\n    resolve: () => 
\"classified information\",\n  })\n)\n```\n\nFor mutations, use `writePermission`:\n\n```ts\nbuilder.mutationField(\"updateRecord\", (t) =>\n  t.field({\n    type: \"Boolean\",\n    authScopes: {\n      writePermission: {\n        catalog: \"hive\",\n        schema: \"production\",\n        tableName: \"records\",\n      },\n    },\n    args: {\n      id: t.arg.string({ required: true }),\n      value: t.arg.string({ required: true }),\n    },\n    resolve: async (_parent, args, context) => {\n      // perform the write operation\n      return true\n    },\n  })\n)\n```\n\nFields without `authScopes` are publicly accessible (no authentication required).\n","description":"Three authorization scopes control access to GraphQL fields — authorized, readPermission, and writePermission.","keywords":["scopes","authorization","fields","permission","logic"]}
{"schemaVersion":"1.0.0","docId":"api/customization/cors-configuration","source":"api","slug":"customization/cors-configuration","path":"/docs/api/customization/cors-configuration","raw_path":"/raw/api/customization/cors-configuration.md","title":"CORS Configuration","headings":[{"level":2,"text":"Default CORS configuration","id":"default-cors-configuration"},{"level":2,"text":"Customizing CORS","id":"customizing-cors"},{"level":2,"text":"Adding custom middleware","id":"adding-custom-middleware"},{"level":2,"text":"Adding authentication middleware","id":"adding-authentication-middleware"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/api/customization/cors-configuration/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: CORS Configuration\nnavTitle: CORS Configuration\ndescription: Customize CORS settings and add additional Hono middleware to the API server.\n---\n\nBy default, `createApiServer` applies permissive CORS settings on the GraphQL endpoint. 
For production deployments, you'll typically want to restrict the allowed origins and customize headers.\n\n## Default CORS configuration\n\nThe built-in CORS middleware is applied to POST, GET, and OPTIONS requests on the GraphQL path:\n\n| Setting        | Default Value                       |\n| -------------- | ----------------------------------- |\n| `origin`       | `\"*\"` (all origins)                 |\n| `allowMethods` | `[\"POST\", \"GET\", \"OPTIONS\"]`        |\n| `allowHeaders` | `[\"content-type\", \"authorization\"]` |\n| `credentials`  | `true`                              |\n\n## Customizing CORS\n\nSince `createApiServer` returns the Hono `app` instance, you can add or override middleware:\n\n```ts\nimport { createApiServer } from \"@lakeql/api/server\"\nimport { cors } from \"hono/cors\"\n\nconst { app } = await createApiServer({\n  schemaPath: \"./src/schemas\",\n})\n\n// Add restrictive CORS for a specific path\napp.use(\n  \"/graphql/*\",\n  cors({\n    origin: [\"https://app.example.com\", \"https://admin.example.com\"],\n    allowMethods: [\"POST\", \"OPTIONS\"],\n    allowHeaders: [\"content-type\", \"authorization\", \"x-request-id\"],\n    credentials: true,\n    maxAge: 86400,\n  })\n)\n```\n\n## Adding custom middleware\n\nThe Hono app supports any compatible middleware. 
Add rate limiting, request logging, or custom headers:\n\n```ts\nimport { createApiServer } from \"@lakeql/api/server\"\nimport { serve } from \"@hono/node-server\"\n\nconst { app } = await createApiServer({\n  schemaPath: \"./src/schemas\",\n})\n\n// Custom request ID header\napp.use(\"*\", async (c, next) => {\n  const requestId = crypto.randomUUID()\n  c.header(\"x-request-id\", requestId)\n  await next()\n})\n\n// Custom health endpoint\napp.get(\"/ready\", (c) =>\n  c.json({ status: \"ready\", timestamp: new Date().toISOString() })\n)\n\n// Start manually with the customized app\nserve({\n  fetch: app.fetch,\n  port: 4000,\n})\n```\n\n## Adding authentication middleware\n\nFor scenarios where you need pre-GraphQL authentication checks:\n\n```ts\nimport { createApiServer } from \"@lakeql/api/server\"\n\nconst { app } = await createApiServer({\n  schemaPath: \"./src/schemas\",\n})\n\n// Block unauthenticated requests before they reach Yoga\napp.use(\"/graphql/*\", async (c, next) => {\n  const auth = c.req.header(\"authorization\")\n  if (!auth && c.req.method !== \"OPTIONS\") {\n    return c.json({ error: \"Authorization required\" }, 401)\n  }\n  await next()\n})\n```\n\nNote that this is separate from the GraphQL-level `authScopes` — it blocks requests at the HTTP layer before they reach the GraphQL resolver.\n","description":"Customize CORS settings and add additional Hono middleware to the API server.","navTitle":"CORS Configuration","keywords":["middleware","configuration","adding","customize","settings"]}
{"schemaVersion":"1.0.0","docId":"api/customization/custom-queries-mutations","source":"api","slug":"customization/custom-queries-mutations","path":"/docs/api/customization/custom-queries-mutations","raw_path":"/raw/api/customization/custom-queries-mutations.md","title":"Custom Queries & Mutations","headings":[{"level":2,"text":"Creating an endpoint definition","id":"creating-an-endpoint-definition"},{"level":2,"text":"Generating the endpoint","id":"generating-the-endpoint"},{"level":2,"text":"Query-only endpoints","id":"query-only-endpoints"},{"level":2,"text":"Field options","id":"field-options"},{"level":2,"text":"Mutation load strategies","id":"mutation-load-strategies"},{"level":2,"text":"File discovery","id":"file-discovery"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/api/customization/custom-queries-mutations/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: Custom Queries & Mutations\nnavTitle: Custom Queries & Mutations\ndescription: Add custom query and mutation endpoints using the CLI endpoint builder.\n---\n\nThe recommended way to add custom queries and mutations is through the CLI's `create-endpoint` command. 
It generates all necessary files (schema, config, types, JSON Schema) from a simple endpoint definition — the same way the `pull` command works, but from a JSON definition instead of an existing Trino table.\n\n## Creating an endpoint definition\n\nDefine your endpoint as a JSON file:\n\n```json path=\"my-endpoint.json\"\n{\n  \"version\": \"1.0\",\n  \"tableName\": \"user_events\",\n  \"catalog\": \"hive\",\n  \"schema\": \"analytics\",\n  \"fields\": [\n    { \"name\": \"event_id\", \"type\": \"String\", \"options\": { \"required\": true } },\n    { \"name\": \"message\", \"type\": \"String\" },\n    { \"name\": \"timestamp\", \"type\": \"DateTime\" }\n  ],\n  \"mutation\": {\n    \"loadStrategy\": \"full_load\",\n    \"type\": \"minio\",\n    \"bucket\": \"my-datalake\",\n    \"basePath\": \"analytics/user_events\",\n    \"endpoint\": \"http://localhost:9000\"\n  }\n}\n```\n\n## Generating the endpoint\n\nRun the CLI to generate all files:\n\n<Command variant=\"exec\">\n  lakeql-cli create-endpoint --from-file ./my-endpoint.json\n</Command>\n\nThis generates:\n\n```\nsrc/schemas/custom/hive/analytics/user_events/\n├── config.ts              # Hive + storage config\n├── endpoint.json          # Persisted definition\n├── json-schema.json       # JSON Schema for Parquet serialization\n├── query-schema.ts        # GraphQL query with sorting, filtering, paging\n└── mutation-schema.ts     # GraphQL mutation with write pipeline\n```\n\nThe `config-registry.ts` is updated automatically to include the new endpoint.\n\n## Query-only endpoints\n\nOmit the `mutation` field to generate a query-only endpoint:\n\n```json path=\"query-only.json\"\n{\n  \"version\": \"1.0\",\n  \"tableName\": \"reports\",\n  \"catalog\": \"hive\",\n  \"schema\": \"analytics\",\n  \"fields\": [\n    { \"name\": \"report_id\", \"type\": \"String\" },\n    { \"name\": \"title\", \"type\": \"String\" },\n    { \"name\": \"created_at\", \"type\": \"DateTime\" }\n  ]\n}\n```\n\nOr explicitly disable 
mutations:\n\n```json\n{ \"mutation\": false }\n```\n\n## Field options\n\nEach field supports options for mutation input behavior:\n\n| Option        | Default | Description                                                    |\n| ------------- | ------- | -------------------------------------------------------------- |\n| `required`    | `false` | Field is required in mutation input                            |\n| `readOnly`    | `false` | Field appears in queries but is excluded from mutation input   |\n| `validations` | `[]`    | Zod validation refinements (email, url, uuid, min, max, regex) |\n\n```json\n{\n  \"name\": \"email\",\n  \"type\": \"String\",\n  \"options\": {\n    \"required\": true,\n    \"validations\": [{ \"type\": \"email\" }]\n  }\n}\n```\n\n## Mutation load strategies\n\nThe `mutation` config supports three strategies. See [Load Strategies](/docs/adapters/write-pipeline/load-strategies) for details.\n\n| Strategy           | Behavior                        |\n| ------------------ | ------------------------------- |\n| `full_load`        | Replace all data on every write |\n| `full_load_append` | Replace latest + keep history   |\n| `append`           | Append only                     |\n\n## File discovery\n\nThe API server automatically discovers all schema files at startup by scanning for:\n\n```\nschemas/**/query-schema.{ts,js,mjs}\nschemas/**/mutation-schema.{ts,js,mjs}\n```\n\nNo manual registration is needed — just generate and start the server.\n","description":"Add custom query and mutation endpoints using the CLI endpoint builder.","navTitle":"Custom Queries & Mutations","keywords":["endpoint","custom","mutation","endpoints","queries"]}
{"schemaVersion":"1.0.0","docId":"api/customization/extending-core","source":"api","slug":"customization/extending-core","path":"/docs/api/customization/extending-core","raw_path":"/raw/api/customization/extending-core.md","title":"Extending Core","headings":[{"level":2,"text":"Custom GetUserResolver","id":"custom-get-user-resolver"},{"level":2,"text":"Custom ReadPermissionResolver","id":"custom-read-permission-resolver"},{"level":2,"text":"Custom WritePermissionResolver","id":"custom-write-permission-resolver"},{"level":2,"text":"Registering custom resolvers","id":"registering-custom-resolvers"},{"level":2,"text":"Adding custom scalars","id":"adding-custom-scalars"},{"level":2,"text":"Overriding built-in scalar serialization","id":"overriding-built-in-scalar-serialization"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/api/customization/extending-core/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: Extending Core\ndescription: Provide custom resolvers for authentication, permissions, and add additional scalars.\n---\n\nLakeQL's extension points let you replace built-in behavior with custom implementations. The primary customization targets are the authentication resolver, permission resolvers, and scalar types.\n\n## Custom GetUserResolver\n\nReplace the built-in mock auth with your own JWT verification:\n\n```ts\nimport type { GetUserResolver } from \"@lakeql/api/types\"\nimport { jwtVerify } from \"jose\"\n\nexport const customGetUser: GetUserResolver = async (req) => {\n  const token = req.headers.get(\"authorization\")?.replace(\"Bearer \", \"\")\n  if (!token) return null\n\n  try {\n    const { payload } = await jwtVerify(token, secretKey)\n    return { ...payload, userName: payload.sub ?? 
\"unknown\" }\n  } catch {\n    return null\n  }\n}\n```\n\n## Custom ReadPermissionResolver\n\nOverride how read permissions are evaluated:\n\n```ts\nimport type { ReadPermissionResolver } from \"@lakeql/api/types\"\n\nexport const customReadPermission: ReadPermissionResolver = ({\n  context,\n  catalog,\n  schema,\n  tableName,\n}) => {\n  // Allow all reads for admin users\n  if (context.currentUser?.role === \"admin\") {\n    return true\n  }\n\n  // Check against external policy engine\n  return checkPolicy(context.currentUser, \"read\", {\n    catalog,\n    schema,\n    tableName,\n  })\n}\n```\n\n## Custom WritePermissionResolver\n\nOverride how write permissions are evaluated:\n\n```ts\nimport type { WritePermissionResolver } from \"@lakeql/api/types\"\n\nexport const customWritePermission: WritePermissionResolver = ({\n  context,\n  catalog,\n  schema,\n  tableName,\n}) => {\n  // Only service accounts can write\n  if (!context.currentUser?.isServiceAccount) {\n    return false\n  }\n\n  return checkPolicy(context.currentUser, \"write\", {\n    catalog,\n    schema,\n    tableName,\n  })\n}\n```\n\n## Registering custom resolvers\n\nPass all custom resolvers via `defineConfig`:\n\n```ts\nimport { defineConfig } from \"@lakeql/api/config\"\nimport { allConfigs } from \"./generated/configs\"\nimport { customGetUser } from \"./auth\"\nimport { customReadPermission, customWritePermission } from \"./permissions\"\n\nexport default defineConfig({\n  allConfigs,\n  getUser: customGetUser,\n  hasReadPermission: customReadPermission,\n  hasWritePermission: customWritePermission,\n})\n```\n\n## Adding custom scalars\n\nRegister additional scalars on the shared builder in a `query-schema.ts` file. 
Place it in a directory that sorts alphabetically before your other schemas (e.g., `00-scalars/`) to ensure the scalars are available when other schema files are loaded.\n\n```ts path=\"src/schemas/00-scalars/query-schema.ts\"\nimport { builder } from \"@lakeql/api/builder\"\nimport { JSONResolver, BigIntResolver } from \"graphql-scalars\"\n\nbuilder.addScalarType(\"JSON\", JSONResolver, {})\nbuilder.addScalarType(\"BigInt\", BigIntResolver, {})\n```\n\n<Callout type=\"info\">\n  `graphql` and `graphql-scalars` are transitive dependencies of `@lakeql/api`.\n  To avoid issues if those internals ever change, install them explicitly in\n  your project.\n\n<Command variant=\"install\">graphql graphql-scalars</Command>\n\n</Callout>\n\nThese scalars become available in all other schema files loaded after this one.\n\n## Overriding built-in scalar serialization\n\nGraphQL's built-in `Int` scalar only accepts 32-bit signed integers (max ~2.1 billion). Both Hive/Trino `INT` and `BIGINT` columns are mapped to the `Integer` field type by the `pull` command, so values beyond the 32-bit range can reach GraphQL at runtime. When that happens, the default `Int.serialize` throws a range error.\n\nYou can monkey-patch the serializer at startup to handle these edge cases.\n\n### Setup\n\nCreate a `query-schema.ts` file in a directory that sorts alphabetically before your other schemas (e.g., `00-scalars/`). LakeQL loads all `query-schema.{ts,js,mjs}` and `mutation-schema.{ts,js,mjs}` files in alphabetical directory order, so the override will be active before any resolvers execute.\n\n```text\nsrc/\n└── schemas/\n    ├── 00-scalars/\n    │   └── query-schema.ts   ← Int override goes here\n    ├── my-catalog/\n    │   └── ...\n    └── ...\n```\n\n```ts path=\"src/schemas/00-scalars/query-schema.ts\"\nimport { GraphQLInt } from \"graphql\"\n\n// Preserve the original serializer as a fallback\nconst originalIntSerialize = GraphQLInt.serialize.bind(GraphQLInt)\n\nGraphQLInt.serialize = (value: unknown) => {\n  // Pass through safe integers (|n| <= Number.MAX_SAFE_INTEGER) directly, even beyond the 32-bit range\n  if (typeof value === \"number\" && Number.isSafeInteger(value)) {\n    return value\n  }\n\n  // Handle numeric strings (e.g., from database drivers returning string IDs)\n  if (typeof value === \"string\" && value.trim() !== \"\") {\n    const num = Number(value)\n    // Only accept strings that convert to a safe integer (no silent precision loss)\n    if (Number.isSafeInteger(num)) return num\n  }\n\n  // Fall back to default behavior for anything else\n  return originalIntSerialize(value)\n}\n```\n\nSince this file is named `query-schema.ts` and lives in `00-scalars/`, LakeQL will import it automatically — no additional configuration or explicit import is needed.\n","description":"Provide custom resolvers for authentication, permissions, and add additional scalars.","keywords":["custom","resolvers","scalars","extending","provide"]}
{"schemaVersion":"1.0.0","docId":"api/overview/introduction","source":"api","slug":"overview/introduction","path":"/docs/api/overview/introduction","raw_path":"/raw/api/overview/introduction.md","title":"Introduction","headings":[{"level":2,"text":"How it connects","id":"how-it-connects"},{"level":2,"text":"Schema loading","id":"schema-loading"},{"level":2,"text":"Quick start","id":"quick-start"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/api/overview/introduction/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: Introduction\ndescription: What @lakeql/api provides and how it fits into the LakeQL ecosystem.\n---\n\n`@lakeql/api` is the runtime GraphQL server package in the LakeQL ecosystem. It exposes a fully typed GraphQL API that queries data from Trino — a distributed SQL query engine — and returns structured, paginated results to clients.\n\nThe package combines three core libraries into a cohesive server stack:\n\n- **Hono** — A lightweight, high-performance HTTP framework that handles routing, CORS, and middleware.\n- **GraphQL Yoga** — A spec-compliant GraphQL server with built-in support for subscriptions, file uploads, and health checks.\n- **Pothos** — A code-first, type-safe GraphQL schema builder that generates the schema from TypeScript definitions.\n\n## How it connects\n\n`@lakeql/api` sits at the center of the LakeQL architecture. 
It depends on several sibling packages:\n\n| Package                        | Role                                                          |\n| ------------------------------ | ------------------------------------------------------------- |\n| `@lakeql/trino-client`         | Executes SQL queries against Trino                            |\n| `@lakeql/query-builder`        | Constructs SQL from GraphQL arguments (filters, pagination)   |\n| `@lakeql/response-transformer` | Transforms raw Trino responses into GraphQL-compatible shapes |\n| `@lakeql/helpers`              | Shared utility functions                                      |\n| `@lakeql/logger`               | Structured logging                                            |\n\n## Schema loading\n\nQuery and mutation schemas are loaded automatically at startup using glob patterns. The server scans for files matching `schemas/**/{query,mutation}-schema.{ts,js,mjs}` in the configured schema directory. Each schema file registers its types and resolvers on the shared Pothos builder — no manual wiring required.\n\n## Quick start\n\nInstall the required packages:\n\n<Command variant=\"install\">\n  @lakeql/api @lakeql/query-builder @lakeql/trino-client @t3-oss/env-core zod\n</Command>\n\nSet up the configuration (`src/config.ts`):\n\n```ts path=\"src/config.ts\"\nimport { defineConfig } from \"@lakeql/api/config\"\n\nimport { getUser } from \"./auth\"\nimport { allConfigs } from \"./config-registry\"\nimport { permissions } from \"./permissions\"\n\nconst baseDir = import.meta.dirname\n\nexport const config = defineConfig({\n  allConfigs,\n  baseDir,\n  getUser,\n  graphqlPath: \"/graphql\",\n  healthCheckEndpoint: \"/live\",\n  permissions,\n  port: 4000,\n  schemaPath: \"./schemas\",\n})\n```\n\nCreate the entry point (`src/index.ts`):\n\n```ts path=\"src/index.ts\"\nimport { config } from \"./config\"\n\nawait config.startServer()\n```\n\nThe server starts on `http://localhost:4000` with GraphiQL available at 
`/graphql` in development mode.\n\n<Note title=\"Use @lakeql/create-app for a ready-to-go setup\">\n  Instead of wiring everything manually, use\n  [`@lakeql/create-app`](/docs/lakeql/create-app/usage) to scaffold a fully\n  functional project with auth, permissions, environment validation, and CLI\n  integration pre-configured.\n</Note>\n","description":"What @lakeql/api provides and how it fits into the LakeQL ecosystem.","keywords":["introduction","lakeqlapi","provides","lakeql","ecosystem"]}
{"schemaVersion":"1.0.0","docId":"api/schema-builder/builder-configuration","source":"api","slug":"schema-builder/builder-configuration","path":"/docs/api/schema-builder/builder-configuration","raw_path":"/raw/api/schema-builder/builder-configuration.md","title":"Builder Configuration","headings":[{"level":2,"text":"Builder setup","id":"builder-setup"},{"level":2,"text":"Context type","id":"context-type"},{"level":2,"text":"Auth scopes","id":"auth-scopes"},{"level":2,"text":"SortDirection enum","id":"sort-direction-enum"},{"level":2,"text":"Importing the builder","id":"importing-the-builder"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/api/schema-builder/builder-configuration/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: Builder Configuration\ndescription: Pothos SchemaBuilder setup with scope auth, validation plugins, and the GraphQL context type.\n---\n\nThe Pothos `SchemaBuilder` is the foundation of LakeQL's type-safe GraphQL schema. It's pre-configured with auth scopes and validation, and shared across all query schema files.\n\n## Builder setup\n\n```ts\nimport SchemaBuilder from \"@pothos/core\"\nimport ScopeAuthPlugin from \"@pothos/plugin-scope-auth\"\nimport ValidationPlugin from \"@pothos/plugin-validation\"\n\nconst builder = new SchemaBuilder<{\n  Context: Context\n  Scalars: Partial<UserScalars[\"Scalars\"]>\n  AuthScopes: {\n    authorized: boolean\n    readPermission: PermissionFields\n    writePermission: PermissionFields\n  }\n}>({\n  plugins: [ScopeAuthPlugin, ValidationPlugin],\n  scopeAuth: {\n    /* ... */\n  },\n  validation: {\n    /* ... 
*/\n  },\n})\n```\n\n## Context type\n\nEvery resolver receives this context object:\n\n<InterfaceReference file=\"api/src/types\" name=\"Context\" />\n\nA `logger` instance is also injected into the context by the Yoga server setup.\n\n## Auth scopes\n\nThe builder defines three scopes that can be applied to any field:\n\n```ts\nAuthScopes: {\n  authorized: boolean // user is authenticated\n  readPermission: PermissionFields // { catalog, schema, tableName }\n  writePermission: PermissionFields // { catalog, schema, tableName }\n}\n```\n\n## SortDirection enum\n\nA shared enum for ordering query results:\n\n```ts\nimport { SortDirection } from \"@lakeql/api/builder\"\n// Values: \"ASC\" | \"DESC\"\n```\n\n## Importing the builder\n\nCustom query schemas import the shared builder instance to register types and fields:\n\n```ts\nimport { builder } from \"@lakeql/api/builder\"\n\nbuilder.queryField(\"hello\", (t) =>\n  t.string({\n    resolve: () => \"world\",\n  })\n)\n```\n\nSchema files named `query-schema.{ts,js,mjs}` or `mutation-schema.{ts,js,mjs}` are discovered automatically during schema loading, so any types and fields they register on `builder` need no explicit registration.\n","description":"Pothos SchemaBuilder setup with scope auth, validation plugins, and the GraphQL context type.","keywords":["builder","setup","context","configuration","pothos"]}
{"schemaVersion":"1.0.0","docId":"api/schema-builder/comparison-types","source":"api","slug":"schema-builder/comparison-types","path":"/docs/api/schema-builder/comparison-types","raw_path":"/raw/api/schema-builder/comparison-types.md","title":"Comparison Types","headings":[{"level":2,"text":"Available comparison types","id":"available-comparison-types"},{"level":2,"text":"Usage in GraphQL queries","id":"usage-in-graph-ql-queries"},{"level":2,"text":"Importing comparison types","id":"importing-comparison-types"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/api/schema-builder/comparison-types/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: Comparison Types\ndescription: Pre-built GraphQL input types for filtering queries with typed comparison operators.\n---\n\nLakeQL provides seven comparison input types that power the generated `Where` filter inputs. Each type exposes operators appropriate for its scalar type.\n\n## Available comparison types\n\n### StringFieldComparison\n\n| Operator  | Type       | Description                |\n| --------- | ---------- | -------------------------- |\n| `eq`      | `String`   | Equal to                   |\n| `neq`     | `String`   | Not equal to               |\n| `like`    | `String`   | SQL LIKE pattern match     |\n| `notLike` | `String`   | SQL NOT LIKE pattern match |\n| `in`      | `[String]` | Value in list              |\n| `notIn`   | `[String]` | Value not in list          |\n\n### IntFieldComparison\n\n| Operator | Type    | Description           |\n| -------- | ------- | --------------------- |\n| `eq`     | `Int`   | Equal to              |\n| `neq`    | `Int`   | Not equal to          |\n| `lt`     | `Int`   | Less than             |\n| `lte`    | `Int`   | Less than or equal    |\n| `gt`     | `Int`   | Greater than          |\n| `gte`    | `Int`   | Greater than or equal |\n| `in`     | `[Int]` | Value in list         |\n| `notIn`  | 
`[Int]` | Value not in list     |\n\n### FloatFieldComparison\n\n| Operator | Type      | Description           |\n| -------- | --------- | --------------------- |\n| `eq`     | `Float`   | Equal to              |\n| `neq`    | `Float`   | Not equal to          |\n| `lt`     | `Float`   | Less than             |\n| `lte`    | `Float`   | Less than or equal    |\n| `gt`     | `Float`   | Greater than          |\n| `gte`    | `Float`   | Greater than or equal |\n| `in`     | `[Float]` | Value in list         |\n| `notIn`  | `[Float]` | Value not in list     |\n\n### BooleanFieldComparison\n\n| Operator | Type      | Description |\n| -------- | --------- | ----------- |\n| `is`     | `Boolean` | Is true     |\n| `isNot`  | `Boolean` | Is false    |\n\n### DateFieldComparison\n\n| Operator | Type   | Description           |\n| -------- | ------ | --------------------- |\n| `eq`     | `Date` | Equal to              |\n| `neq`    | `Date` | Not equal to          |\n| `lt`     | `Date` | Less than             |\n| `lte`    | `Date` | Less than or equal    |\n| `gt`     | `Date` | Greater than          |\n| `gte`    | `Date` | Greater than or equal |\n\n### DateTimeFieldComparison\n\n| Operator | Type       | Description           |\n| -------- | ---------- | --------------------- |\n| `eq`     | `DateTime` | Equal to              |\n| `neq`    | `DateTime` | Not equal to          |\n| `lt`     | `DateTime` | Less than             |\n| `lte`    | `DateTime` | Less than or equal    |\n| `gt`     | `DateTime` | Greater than          |\n| `gte`    | `DateTime` | Greater than or equal |\n\n### IDFieldComparison\n\n| Operator | Type   | Description       |\n| -------- | ------ | ----------------- |\n| `eq`     | `ID`   | Equal to          |\n| `neq`    | `ID`   | Not equal to      |\n| `in`     | `[ID]` | Value in list     |\n| `notIn`  | `[ID]` | Value not in list |\n\n## Usage in GraphQL queries\n\nThese comparison types appear in generated `Where` input types for each table. 
Use them to filter query results:\n\n```graphql\nquery FilteredUsers {\n  users(\n    where: {\n      age: { gte: 18, lt: 65 }\n      email: { like: \"%@example.com\" }\n      status: { in: [\"active\", \"pending\"] }\n    }\n  ) {\n    nodes {\n      id\n      name\n      email\n    }\n  }\n}\n```\n\n## Importing comparison types\n\nIf you need to reference these types in custom schemas:\n\n```ts\nimport {\n  StringFieldComparison,\n  IntFieldComparison,\n  FloatFieldComparison,\n  BooleanFieldComparison,\n  DateFieldComparison,\n  DateTimeFieldComparison,\n  IDFieldComparison,\n} from \"@lakeql/api/builder\"\n```\n","description":"Pre-built GraphQL input types for filtering queries with typed comparison operators.","keywords":["comparison","types","graphql","queries","pre-built"]}
{"schemaVersion":"1.0.0","docId":"api/schema-builder/input-validation","source":"api","slug":"schema-builder/input-validation","path":"/docs/api/schema-builder/input-validation","raw_path":"/raw/api/schema-builder/input-validation.md","title":"Input Validation","headings":[{"level":2,"text":"How it works","id":"how-it-works"},{"level":2,"text":"Error response format","id":"error-response-format"},{"level":2,"text":"Built-in validation","id":"built-in-validation"},{"level":2,"text":"Adding validation to custom inputs","id":"adding-validation-to-custom-inputs"},{"level":2,"text":"Validation on query arguments","id":"validation-on-query-arguments"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/api/schema-builder/input-validation/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: Input Validation\ndescription: Validate GraphQL input fields using Zod schemas via the Pothos ValidationPlugin.\n---\n\nLakeQL integrates Pothos's ValidationPlugin with Zod to validate input arguments before they reach your resolvers. Invalid inputs return a structured `VALIDATION_FAILED` error with detailed issues.\n\n## How it works\n\nThe ValidationPlugin is registered on the shared builder and uses Zod schemas to validate input field values. 
When validation fails, a custom error handler formats the response with error code `200` (VALIDATION_FAILED) and the list of Zod issues.\n\n## Error response format\n\nWhen validation fails, the GraphQL error includes:\n\n```json\n{\n  \"errors\": [\n    {\n      \"message\": \"Validation failed\",\n      \"extensions\": {\n        \"code\": \"VALIDATION_FAILED\",\n        \"http\": { \"status\": 400 },\n        \"additionalInformation\": [\n          {\n            \"message\": \"Value must be less than or equal to 2000\",\n            \"path\": [\"perPage\"]\n          }\n        ]\n      }\n    }\n  ]\n}\n```\n\n## Built-in validation\n\nThe `Paging` input type uses validation to enforce `perPage` limits:\n\n```ts\nimport { z } from \"zod\"\nimport { builder, getMaxRecordsPerPage } from \"@lakeql/api/builder\"\n\nconst Paging = builder.inputType(\"Paging\", {\n  fields: (t) => ({\n    page: t.int({ defaultValue: 1 }),\n    perPage: t.int({\n      defaultValue: 100,\n      validate: z\n        .number()\n        .min(1)\n        .refine((value) => value <= getMaxRecordsPerPage(), {\n          message: `Value must be less than or equal to ${getMaxRecordsPerPage()}`,\n        }),\n    }),\n  }),\n})\n```\n\n## Adding validation to custom inputs\n\nUse the `validate` option on any input field to attach a Zod schema:\n\n```ts\nimport { z } from \"zod\"\nimport { builder } from \"@lakeql/api/builder\"\n\nconst CreateUserInput = builder.inputType(\"CreateUserInput\", {\n  fields: (t) => ({\n    email: t.string({\n      required: true,\n      validate: z.string().email(\"Must be a valid email address\"),\n    }),\n    name: t.string({\n      required: true,\n      validate: z.string().min(2).max(100),\n    }),\n    age: t.int({\n      validate: z.number().min(0).max(150),\n    }),\n  }),\n})\n```\n\n## Validation on query arguments\n\nYou can also validate query-level arguments:\n\n```ts\nimport { z } from \"zod\"\nimport { builder } from 
\"@lakeql/api/builder\"\n\nbuilder.queryField(\"search\", (t) =>\n  t.field({\n    type: [\"String\"],\n    args: {\n      query: t.arg.string({\n        required: true,\n        validate: z\n          .string()\n          .min(3, \"Search query must be at least 3 characters\"),\n      }),\n      limit: t.arg.int({\n        defaultValue: 10,\n        validate: z.number().min(1).max(100),\n      }),\n    },\n    resolve: (_parent, args) => {\n      // args.query is guaranteed to be at least 3 chars\n      // args.limit is guaranteed to be between 1 and 100\n      return [`Result for: ${args.query}`]\n    },\n  })\n)\n```\n","description":"Validate GraphQL input fields using Zod schemas via the Pothos ValidationPlugin.","keywords":["validation","input","validate","graphql","fields"]}
{"schemaVersion":"1.0.0","docId":"api/schema-builder/pagination","source":"api","slug":"schema-builder/pagination","path":"/docs/api/schema-builder/pagination","raw_path":"/raw/api/schema-builder/pagination.md","title":"Pagination","headings":[{"level":2,"text":"Paging input","id":"paging-input"},{"level":2,"text":"PageInfo output","id":"page-info-output"},{"level":2,"text":"ConnectionInterface","id":"connection-interface"},{"level":2,"text":"Configuring maxRecordsPerPage","id":"configuring-max-records-per-page"},{"level":2,"text":"Example query","id":"example-query"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/api/schema-builder/pagination/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: Pagination\ndescription: Built-in page-based pagination with Paging input, PageInfo output, and ConnectionInterface.\n---\n\nLakeQL uses offset-based pagination (page + perPage) rather than cursor-based pagination. This maps naturally to Trino's `LIMIT` and `OFFSET` clauses and provides predictable navigation for tabular data.\n\n## Paging input\n\nThe `Paging` input type is available on all generated query fields:\n\n```graphql\ninput Paging {\n  page: Int = 1\n  perPage: Int = 100\n}\n```\n\n| Field     | Default | Constraints                         |\n| --------- | ------- | ----------------------------------- |\n| `page`    | `1`     | Must be ≥ 1                         |\n| `perPage` | `100`   | Must be ≥ 1 and ≤ maxRecordsPerPage |\n\n## PageInfo output\n\nEvery paginated response includes a `PageInfo` object:\n\n```graphql\ntype PageInfo {\n  currentPage: Int!\n  hasNext: Boolean!\n  hasPrevious: Boolean!\n  maxPages: Int!\n  nextPage: Int\n  previousPage: Int\n}\n```\n\n| Field          | Description                                           |\n| -------------- | ----------------------------------------------------- |\n| `currentPage`  | The current page number                               
|\n| `hasNext`      | Whether more pages exist after the current one        |\n| `hasPrevious`  | Whether pages exist before the current one            |\n| `maxPages`     | Total number of pages based on totalCount and perPage |\n| `nextPage`     | Page number for the next page (null if on last page)  |\n| `previousPage` | Page number for the previous page (null if on first)  |\n\n## ConnectionInterface\n\nAll paginated query responses conform to the `ConnectionInterface`:\n\n```ts\ninterface ConnectionInterface<T> {\n  totalCount: number\n  pageInfo: PageInfoInterface\n  nodes: T[]\n}\n```\n\nIn GraphQL, this looks like:\n\n```graphql\ntype UserConnection {\n  totalCount: Int!\n  pageInfo: PageInfo!\n  nodes: [User!]!\n}\n```\n\n## Configuring maxRecordsPerPage\n\nThe maximum allowed value for `perPage` is controlled by:\n\n1. The `API_MAX_RECORDS_PER_PAGE` environment variable (default: `2000`)\n2. The `maxRecordsPerPage` option in `defineConfig`, which overrides the environment variable\n\n```ts\nimport { defineConfig } from \"@lakeql/api/config\"\nimport { allConfigs } from \"./config-registry\"\n\nexport default defineConfig({\n  allConfigs,\n  maxRecordsPerPage: 500, // override the env default\n})\n```\n\nRequesting a `perPage` larger than `maxRecordsPerPage` returns a `VALIDATION_FAILED` error.\n\n## Example query\n\n```graphql\nquery PaginatedUsers {\n  users(paging: { page: 2, perPage: 25 }) {\n    totalCount\n    pageInfo {\n      currentPage\n      hasNext\n      hasPrevious\n      maxPages\n      nextPage\n      previousPage\n    }\n    nodes {\n      id\n      name\n      email\n    }\n  }\n}\n```\n","description":"Built-in page-based pagination with Paging input, PageInfo output, and ConnectionInterface.","keywords":["pagination","paging","input","pageinfo","output"]}
{"schemaVersion":"1.0.0","docId":"api/schema-builder/scalar-types","source":"api","slug":"schema-builder/scalar-types","path":"/docs/api/schema-builder/scalar-types","raw_path":"/raw/api/schema-builder/scalar-types.md","title":"Scalar Types","headings":[{"level":2,"text":"Date","id":"date"},{"level":2,"text":"DateTime","id":"date-time"},{"level":2,"text":"File","id":"file"},{"level":2,"text":"Source code","id":"source-code"},{"level":2,"text":"Usage in custom schemas","id":"usage-in-custom-schemas"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/api/schema-builder/scalar-types/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: Scalar Types\ndescription: Custom GraphQL scalar types for Date, DateTime, and File uploads.\n---\n\nLakeQL registers three custom scalar types on the shared builder. These extend the standard GraphQL scalars to handle common data types from Trino.\n\n## Date\n\nA date-only scalar (no time component). Uses `DateResolver` from `graphql-scalars`.\n\n- **Input:** ISO 8601 date string (e.g., `\"2024-01-15\"`)\n- **Output:** JavaScript `Date` object\n\n## DateTime\n\nA full timestamp scalar with timezone support. Uses `DateTimeResolver` from `graphql-scalars`.\n\n- **Input:** ISO 8601 datetime string (e.g., `\"2024-01-15T10:30:00Z\"`)\n- **Output:** JavaScript `Date` object\n\n## File\n\nAn input-only scalar for file uploads. 
Attempting to serialize (output) a `File` scalar throws an error.\n\n- **Input:** `File` object (from multipart form data)\n- **Output:** Not supported (throws `\"Uploads can only be used as input types\"`)\n\n## Source code\n\n```ts\nimport { DateResolver, DateTimeResolver } from \"graphql-scalars\"\nimport { builder } from \"@lakeql/api/builder\"\n\nbuilder.addScalarType(\"Date\", DateResolver, {})\nbuilder.addScalarType(\"DateTime\", DateTimeResolver, {})\n\nbuilder.scalarType(\"File\", {\n  serialize: () => {\n    throw new Error(\"Uploads can only be used as input types\")\n  },\n})\n```\n\n## Usage in custom schemas\n\nReference these scalars by name when defining fields:\n\n```ts\nimport { builder } from \"@lakeql/api/builder\"\n\nconst EventType = builder\n  .objectRef<{\n    id: string\n    name: string\n    createdAt: Date\n    occurredOn: Date\n  }>(\"Event\")\n  .implement({\n    fields: (t) => ({\n      id: t.exposeID(\"id\"),\n      name: t.exposeString(\"name\"),\n      createdAt: t.expose(\"createdAt\", { type: \"DateTime\" }),\n      occurredOn: t.expose(\"occurredOn\", { type: \"Date\" }),\n    }),\n  })\n```\n","description":"Custom GraphQL scalar types for Date, DateTime, and File uploads.","keywords":["scalar","types","custom","datetime","graphql"]}
{"schemaVersion":"1.0.0","docId":"api/server-setup/create-api-server","source":"api","slug":"server-setup/create-api-server","path":"/docs/api/server-setup/create-api-server","raw_path":"/raw/api/server-setup/create-api-server.md","title":"createApiServer","headings":[{"level":2,"text":"Signature","id":"signature"},{"level":2,"text":"Return type","id":"return-type"},{"level":2,"text":"What it sets up","id":"what-it-sets-up"},{"level":2,"text":"Usage","id":"usage"},{"level":2,"text":"Starting the server","id":"starting-the-server"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/api/server-setup/create-api-server/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: createApiServer\nnavTitle: createApiServer\ndescription: Create a configured Hono application with GraphQL Yoga integration, CORS, and logging middleware.\n---\n\n`createApiServer` is the primary factory function for building a LakeQL API instance. It wires together Hono, GraphQL Yoga, and all middleware into a ready-to-use server object.\n\n## Signature\n\n```ts\nfunction createApiServer(options?: ApiRuntimeConfig): Promise<ApiServer>\n```\n\n## Return type\n\n<InterfaceReference file=\"api/src/server\" name=\"ApiServer\" mode=\"declaration\" />\n\n## What it sets up\n\n1. Creates a structured logger via `@lakeql/logger`\n2. Initializes GraphQL Yoga with the loaded schema and context\n3. Mounts Hono logger middleware on all routes\n4. Configures CORS on the GraphQL path (POST, GET, OPTIONS)\n5. 
Connects the Yoga request handler to Hono\n\n## Usage\n\n```ts path=\"src/server.ts\"\nimport { createApiServer } from \"@lakeql/api/server\"\n\nconst { app, logger, yoga } = await createApiServer({\n  baseDir: import.meta.dirname,\n  schemaPath: \"./schemas\",\n  graphqlPath: \"/graphql\",\n  port: 4000,\n})\n\n// Add custom middleware or routes to the Hono app\napp.get(\"/health\", (c) => c.json({ status: \"ok\" }))\n```\n\n## Starting the server\n\nFor most projects, use `defineConfig` with `startServer()` instead of calling `createApiServer` directly:\n\n```ts path=\"src/index.ts\"\nimport { config } from \"./config\"\n\nawait config.startServer()\n```\n\nIf you don't need `defineConfig`, use `startApiServer`, which calls `createApiServer` internally and binds the app to a port:\n\n```ts path=\"src/index.ts\"\nimport { startApiServer } from \"@lakeql/api/server\"\n\nawait startApiServer({\n  baseDir: import.meta.dirname,\n  schemaPath: \"./schemas\",\n  port: 4000,\n})\n```\n\n`startApiServer` uses `@hono/node-server` to serve the application via the Node.js `http` module.\n","description":"Create a configured Hono application with GraphQL Yoga integration, CORS, and logging middleware.","navTitle":"createApiServer","keywords":["createapiserver","create","configured","application","graphql"]}
{"schemaVersion":"1.0.0","docId":"api/server-setup/define-config","source":"api","slug":"server-setup/define-config","path":"/docs/api/server-setup/define-config","raw_path":"/raw/api/server-setup/define-config.md","title":"defineConfig","headings":[{"level":2,"text":"Signature","id":"signature"},{"level":2,"text":"Configuration options","id":"configuration-options"},{"level":2,"text":"Return type","id":"return-type"},{"level":2,"text":"Usage","id":"usage"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/api/server-setup/define-config/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: defineConfig\nnavTitle: defineConfig\ndescription: Type-safe configuration wrapper that provides createYogaServer and startServer methods.\n---\n\n`defineConfig` is the recommended way to configure your LakeQL API server. It provides full type safety for permissions by inferring catalog, schema, and table names from your generated configs.\n\n## Signature\n\n```ts\nfunction defineConfig<const TConfig extends readonly SchemaConfigEntry[]>(\n  input: TConfig | DefineConfigOptions<TConfig>\n): DefinedApiConfig<TConfig>\n```\n\n## Configuration options\n\nThe `DefineConfigOptions` interface extends `ApiRuntimeConfig` with a required `allConfigs` field:\n\n<InterfaceReference file=\"api/src/config\" name=\"ApiRuntimeConfig\" />\n\n## Return type\n\n`defineConfig` returns a `DefinedApiConfig` object that includes all your config options plus two convenience methods:\n\n```ts\ninterface DefinedApiConfig<TConfig> {\n  // ...all config options\n  allConfigs: TConfig\n  createYogaServer: (logger) => Promise<ApiYoga>\n  startServer: () => Promise<void>\n}\n```\n\n## Usage\n\n### Minimal — pass only configs\n\n```ts path=\"src/config.ts\"\nimport { defineConfig } from \"@lakeql/api/config\"\nimport { allConfigs } from \"./config-registry\"\n\nexport const config = defineConfig(allConfigs)\n```\n\n### Full configuration 
(matching the template)\n\n```ts path=\"src/config.ts\"\nimport { defineConfig } from \"@lakeql/api/config\"\n\nimport { getUser } from \"./auth\"\nimport { allConfigs } from \"./config-registry\"\nimport { permissions } from \"./permissions\"\n\nconst baseDir = import.meta.dirname\n\nexport const config = defineConfig({\n  allConfigs,\n  baseDir,\n  getUser,\n  graphqlPath: \"/graphql\",\n  healthCheckEndpoint: \"/live\",\n  permissions,\n  port: 4000,\n  schemaPath: \"./schemas\",\n})\n```\n\n### Using startServer\n\n```ts path=\"src/index.ts\"\nimport { config } from \"./config\"\n\nawait config.startServer()\n```\n\nThis calls `startApiServer` internally with your defined configuration.\n","description":"Type-safe configuration wrapper that provides createYogaServer and startServer methods.","navTitle":"defineConfig","keywords":["configuration","defineconfig","type-safe","wrapper","provides"]}
{"schemaVersion":"1.0.0","docId":"api/server-setup/yoga-configuration","source":"api","slug":"server-setup/yoga-configuration","path":"/docs/api/server-setup/yoga-configuration","raw_path":"/raw/api/server-setup/yoga-configuration.md","title":"Yoga Configuration","headings":[{"level":2,"text":"Type","id":"type"},{"level":2,"text":"Available overrides","id":"available-overrides"},{"level":2,"text":"Default behavior","id":"default-behavior"},{"level":2,"text":"Usage","id":"usage"},{"level":2,"text":"Logging configuration","id":"logging-configuration"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/api/server-setup/yoga-configuration/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: Yoga Configuration\ndescription: Customize GraphQL Yoga options like GraphiQL, error masking, logging, and landing page.\n---\n\nThe `yogaConfig` option in `defineConfig` (or `createApiServer`) lets you override GraphQL Yoga's built-in behavior. 
It accepts all Yoga options except `schema` and `context`, which are managed internally.\n\n## Type\n\n```ts\ntype YogaConfigOverrides = Omit<\n  NonNullable<Parameters<typeof createYoga>[0]>,\n  \"schema\" | \"context\"\n>\n```\n\n## Available overrides\n\n| Option                | Type                  | Default                   | Description                              |\n| --------------------- | --------------------- | ------------------------- | ---------------------------------------- |\n| `healthCheckEndpoint` | `string`              | `\"/live\"`                 | Path for the health check endpoint       |\n| `graphiql`            | `boolean \\| object`   | `true` in development     | Enable/configure GraphiQL IDE            |\n| `logging`             | `boolean \\| LogLevel` | Based on `API_LOGGER` env | Control Yoga's internal logging          |\n| `maskedErrors`        | `boolean`             | `true`                    | Hide internal error details from clients |\n| `landingPage`         | `boolean`             | `false`                   | Show Yoga's default landing page         |\n\n## Default behavior\n\nWithout any overrides, the server applies these defaults:\n\n- **GraphiQL** is enabled when `NODE_ENV=development`\n- **maskedErrors** is `true` — internal errors are replaced with generic messages\n- **Logging** level is controlled by the `API_LOGGER` environment variable (default: `warn`). 
Set to `silent` to disable Yoga logging entirely.\n- **Landing page** is disabled\n\n## Usage\n\n```ts path=\"src/config.ts\"\nimport { defineConfig } from \"@lakeql/api/config\"\n\nimport { getUser } from \"./auth\"\nimport { allConfigs } from \"./config-registry\"\nimport { permissions } from \"./permissions\"\n\nconst baseDir = import.meta.dirname\n\nexport const config = defineConfig({\n  allConfigs,\n  baseDir,\n  getUser,\n  permissions,\n  port: 4000,\n  schemaPath: \"./schemas\",\n  yogaConfig: {\n    graphiql: {\n      title: \"LakeQL Explorer\",\n      defaultQuery: \"{ __typename }\",\n    },\n    // expose full error details only in development\n    maskedErrors: process.env.NODE_ENV !== \"development\",\n    healthCheckEndpoint: \"/healthz\",\n  },\n})\n```\n\n## Logging configuration\n\nThe `API_LOGGER` environment variable accepts these values:\n\n| Value    | Effect                        |\n| -------- | ----------------------------- |\n| `debug`  | All messages                  |\n| `info`   | Info, warnings, and errors    |\n| `warn`   | Warnings and errors (default) |\n| `error`  | Errors only                   |\n| `silent` | No Yoga logging output        |\n\n```bash\n# .env\nAPI_LOGGER=debug\n```\n","description":"Customize GraphQL Yoga options like GraphiQL, error masking, logging, and landing page.","keywords":["configuration","logging","customize","graphql","options"]}
{"schemaVersion":"1.0.0","docId":"cli","source":"cli","slug":"cli","path":"/docs/cli","raw_path":"/raw/cli.md","title":"LakeQL CLI","headings":[],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/cli/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: LakeQL CLI\nnavTitle: CLI\ndescription: Schema introspection and code generation CLI.\nentrypoint: /docs/cli/overview/installation\n---\n","description":"Schema introspection and code generation CLI.","navTitle":"CLI","keywords":["lakeql","schema","introspection","generation"]}
{"schemaVersion":"1.0.0","docId":"cli/commands/create-endpoint","source":"cli","slug":"commands/create-endpoint","path":"/docs/cli/commands/create-endpoint","raw_path":"/raw/cli/commands/create-endpoint.md","title":"create-endpoint","headings":[{"level":1,"text":"create-endpoint","id":"create-endpoint"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/cli/commands/create-endpoint/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: create-endpoint\nnavTitle: create-endpoint\ndescription: Generate a custom endpoint from a JSON definition file.\n---\n\n# create-endpoint\n\nGenerate a custom endpoint (query + mutation) from a JSON definition file. The definition file can be created using the [Endpoint Builder](/endpoint-builder) UI or written by hand.\n\n## Generated files\n\nFiles are written to `schemas/custom/<catalog>/<schema>/<tableName>/`:\n\n- `config.ts` — Endpoint configuration\n- `interface.ts` — TypeScript interface for the endpoint fields\n- `query-schema.ts` — GraphQL query schema definition\n- `mutation-schema.ts` — GraphQL mutation schema with write pipeline resolver (only when mutation is configured)\n- `validations.ts` — Zod validation schema (only when fields have validations configured)\n- `json-schema.json` — JSON Schema representation\n- `endpoint.json` — The endpoint definition (for re-generation)\n\n<Note>\n  The `mutation-schema.ts` and `validations.ts` files are only generated when\n  the endpoint definition includes mutation configuration. 
Query-only endpoints\n  skip these files entirely.\n</Note>\n\n## Usage\n\n```bash\nlakeql-cli create-endpoint --from-file ./my-endpoint.json\n```\n\n### With mutation enabled\n\n```\n✔ Loaded definition: user_events (catalog: hive, schema: analytics)\n  → schemas/custom/hive/analytics/user_events/config.ts\n  → schemas/custom/hive/analytics/user_events/interface.ts\n  → schemas/custom/hive/analytics/user_events/query-schema.ts\n  → schemas/custom/hive/analytics/user_events/mutation-schema.ts\n  → schemas/custom/hive/analytics/user_events/validations.ts\n  → schemas/custom/hive/analytics/user_events/json-schema.json\n  → schemas/custom/hive/analytics/user_events/endpoint.json\n  Load strategy: full_load\n✔ Config registry updated\n```\n\n### Without mutation (query-only)\n\n```\n✔ Loaded definition: my_table (catalog: analytics, schema: tracking)\n  → schemas/custom/analytics/tracking/my_table/config.ts\n  → schemas/custom/analytics/tracking/my_table/interface.ts\n  → schemas/custom/analytics/tracking/my_table/query-schema.ts\n  → schemas/custom/analytics/tracking/my_table/json-schema.json\n  → schemas/custom/analytics/tracking/my_table/endpoint.json\n  mutation: disabled\n✔ Config registry updated\n```\n\n## Definition file format\n\nThe JSON definition file must conform to the `EndpointDefinitionFormat` schema:\n\n```json\n{\n  \"version\": \"1.0\",\n  \"tableName\": \"user_events\",\n  \"catalog\": \"hive\",\n  \"schema\": \"analytics\",\n  \"fields\": [\n    {\n      \"name\": \"email\",\n      \"type\": \"String\",\n      \"options\": {\n        \"required\": true,\n        \"validations\": [{ \"type\": \"email\" }]\n      }\n    },\n    {\n      \"name\": \"age\",\n      \"type\": \"Integer\",\n      \"options\": {\n        \"required\": false,\n        \"validations\": [\n          { \"type\": \"min\", \"value\": 0 },\n          { \"type\": \"max\", \"value\": 150 }\n        ]\n      }\n    },\n    { \"name\": \"event_id\", \"type\": \"String\" },\n    { 
\"name\": \"timestamp\", \"type\": \"DateTime\" },\n    {\n      \"name\": \"metadata\",\n      \"type\": \"Object\",\n      \"fields\": [{ \"name\": \"source\", \"type\": \"String\" }]\n    },\n    {\n      \"name\": \"tags\",\n      \"type\": \"Array\",\n      \"items\": { \"type\": \"String\" }\n    }\n  ],\n  \"mutation\": {\n    \"loadStrategy\": \"full_load\",\n    \"type\": \"s3\",\n    \"bucket\": \"my-datalake\",\n    \"basePath\": \"warehouse/analytics/user_events\",\n    \"endpoint\": \"https://s3.eu-central-1.amazonaws.com\"\n  }\n}\n```\n\n### Supported field types\n\n- **Primitives:** String, Integer, Float, Boolean, Date, DateTime\n- **Object:** Nested object with child fields (max 5 levels)\n- **Array:** Array of primitives or objects\n\n### Mutation configuration\n\nThe optional `mutation` field controls whether a write pipeline is generated for this endpoint. Set it to `false` (or omit it) for query-only endpoints. Set it to a configuration object to enable the mutation pipeline.\n\n<InterfaceReference\n  file=\"schema-generator/src/endpoint-schema\"\n  name=\"MutationConfig\"\n/>\n\n<Note>\n  The `type` field defaults to `\"s3\"`. Use `\"minio\"` for local development with\n  a MinIO-compatible endpoint. See the [Mutations\n  guide](/lakeql/guides/mutations#storage-configuration) for environment\n  variable details.\n</Note>\n\n#### Mutation with partitioning options\n\n```json\n{\n  \"mutation\": {\n    \"loadStrategy\": \"append\",\n    \"type\": \"s3\",\n    \"bucket\": \"my-datalake\",\n    \"basePath\": \"warehouse/raw/events\",\n    \"partitioning\": \"event_date\",\n    \"partitioningFormat\": \"year/month\"\n  }\n}\n```\n\nThe `partitioning` and `partitioningFormat` fields are omitted from generated configs when `loadStrategy` is `full_load`. 
See the [Mutations guide](/lakeql/guides/mutations#partitioning) for full details on each partitioning mode.\n\nSee the [Load Strategies](/lakeql/guides/load-strategies) guide for details on when to use each strategy.\n\n<Note>\n  Endpoints generated via the `pull` command always set `mutation: false` since\n  pulled tables are query-only by default.\n</Note>\n\n### Field options\n\nEach field can include an optional `options` object to control mutation input behavior:\n\n<InterfaceReference\n  file=\"schema-generator/src/endpoint-schema\"\n  name=\"FieldOptions\"\n/>\n\n<Note>\n  When timestamp-based partitioning is active, the Endpoint Builder\n  automatically adds `load_timestamp` (DateTime), `load_timestamp_year`\n  (Integer), and `load_timestamp_month` (Integer) fields with `readOnly: true`.\n  These fields are populated by the write pipeline at runtime and are queryable\n  in Hive and directly in Parquet, but users cannot provide them as mutation\n  input.\n</Note>\n\n#### Available validations\n\n| Validation | Applies to     | Description                     | Example                                    |\n| ---------- | -------------- | ------------------------------- | ------------------------------------------ |\n| `email`    | String         | Must be a valid email address   | `{ \"type\": \"email\" }`                      |\n| `url`      | String         | Must be a valid URL             | `{ \"type\": \"url\" }`                        |\n| `uuid`     | String         | Must be a valid UUID            | `{ \"type\": \"uuid\" }`                       |\n| `min`      | Integer, Float | Minimum numeric value           | `{ \"type\": \"min\", \"value\": 0 }`            |\n| `max`      | Integer, Float | Maximum numeric value           | `{ \"type\": \"max\", \"value\": 150 }`          |\n| `regex`    | String         | Must match a regular expression | `{ \"type\": \"regex\", \"pattern\": \"^[A-Z]\" }` |\n\n### Validation rules\n\n- `tableName`, `catalog`, 
`schema`: alphanumeric + underscore, no leading digit, max 128 chars\n- Field names: alphanumeric + underscore, no leading digit, max 64 chars\n- No duplicate field names at the same nesting level\n- Object fields must have at least 1 child field\n\n## Workflow\n\n1. Use the [Endpoint Builder](/endpoint-builder) to define your schema visually\n2. Configure mutation support and load strategy if write operations are needed\n3. Download the JSON definition file\n4. Run `lakeql-cli create-endpoint --from-file ./your-endpoint.json`\n5. The generated mutation resolver is fully wired to the write pipeline — no manual stub implementation needed\n\n## Options\n\n<CliCommandDetails command=\"create-endpoint\" />\n","description":"Generate a custom endpoint from a JSON definition file.","navTitle":"create-endpoint","keywords":["create-endpoint","generate","custom","endpoint","definition"]}
{"schemaVersion":"1.0.0","docId":"cli/commands/create-registry","source":"cli","slug":"commands/create-registry","path":"/docs/cli/commands/create-registry","raw_path":"/raw/cli/commands/create-registry.md","title":"lakeql-cli create-registry","headings":[{"level":2,"text":"Syntax","id":"syntax"},{"level":2,"text":"Usage","id":"usage"},{"level":2,"text":"Options","id":"options"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/cli/commands/create-registry/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: \"lakeql-cli create-registry\"\nnavTitle: \"create-registry\"\ndescription: Generates the config registry for type-safe permissions.\n---\n\nScans your project for all `schemas/**/config.ts` files and generates a unified `config-registry.ts`. This registry enables type-safe usage of `createPermission` across your endpoints.\n\n## Syntax\n\n```bash\nlakeql-cli create-registry [options]\n```\n\n## Usage\n\n```bash\nlakeql-cli create-registry\n```\n\n```\n✔ Found 4 config files in schemas/\n✔ Generated config-registry.ts\n```\n\nThe generated file imports all discovered config modules and re-exports them as a typed registry object.\n\n## Options\n\n<CliCommandDetails command=\"create-registry\" />\n","description":"Generates the config registry for type-safe permissions.","navTitle":"create-registry","keywords":["lakeql-cli","create-registry","generates","config","registry"]}
{"schemaVersion":"1.0.0","docId":"cli/commands/generate-import-config","source":"cli","slug":"commands/generate-import-config","path":"/docs/cli/commands/generate-import-config","raw_path":"/raw/cli/commands/generate-import-config.md","title":"lakeql-cli generate-import-config","headings":[{"level":2,"text":"Syntax","id":"syntax"},{"level":2,"text":"Usage","id":"usage"},{"level":2,"text":"How it works","id":"how-it-works"},{"level":2,"text":"Options","id":"options"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/cli/commands/generate-import-config/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: \"lakeql-cli generate-import-config\"\nnavTitle: \"generate-import-config\"\ndescription: Generate an import.config.mjs from already-pulled schemas.\ncliCommand: \"generate-import-config\"\n---\n\nScans the `schemas/generated/` directory and generates a ready-to-use `import.config.mjs` for the [`pull --bulk`](/cli/commands/pull#bulk-mode) command.\n\nThis is useful when you have already pulled a set of schemas and want to reuse them as a bulk config without maintaining the file by hand.\n\n## Syntax\n\n```bash\nlakeql-cli generate-import-config [options]\n```\n\n## Usage\n\n```bash\nlakeql-cli generate-import-config\n```\n\nIf the output file does not exist yet, the command writes it directly:\n\n```\nWritten to \"/my-project/import.config.mjs\".\n```\n\n### Overwriting an existing file\n\nIf the output file already exists, the command warns and prompts for confirmation before overwriting.\n\nUse `--force` to skip the prompt entirely:\n\n```bash\nlakeql-cli generate-import-config --force\n```\n\n### Custom output path\n\n```bash\nlakeql-cli generate-import-config --output ./config/bulk.mjs\n```\n\n## How it works\n\nLike all other commands, it reads your `lakeql.config` file to determine the base path for generated files. 
The `--source-path` flag overrides the config value if needed.\n\nIt then scans the directory structure under `schemas/generated/`:\n\n```\nschemas/generated/\n└── <catalog>/\n    └── <schema>/\n        └── <table>/       ← each subdirectory becomes a table entry\n```\n\nEach catalog/schema combination becomes one entry in the generated config.\nSchemas that contain no table subdirectories are silently skipped.\n\n## Options\n\n<CliCommandDetails command=\"generate-import-config\" />\n","description":"Generate an import.config.mjs from already-pulled schemas.","navTitle":"generate-import-config","keywords":["lakeql-cli","generate-import-config","generate","importconfigmjs","already-pulled"]}
{"schemaVersion":"1.0.0","docId":"cli/commands/init","source":"cli","slug":"commands/init","path":"/docs/cli/commands/init","raw_path":"/raw/cli/commands/init.md","title":"lakeql-cli init","headings":[{"level":2,"text":"Syntax","id":"syntax"},{"level":2,"text":"Usage","id":"usage"},{"level":2,"text":"Options","id":"options"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/cli/commands/init/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: \"lakeql-cli init\"\nnavTitle: \"init\"\ndescription: Initialize a lakeql.config.json configuration file in your project.\n---\n\nScaffolds a `lakeql.config.json` file in the current directory. This config controls where generated code is placed.\n\nIf no `src/` directory is detected, the CLI prompts you to choose or specify a source path. If `src/` exists, it is used automatically.\n\n## Syntax\n\n```bash\nlakeql-cli init\n```\n\n## Usage\n\n```bash\nlakeql-cli init\n```\n\n```\nDetected src/ directory — generated code will be placed in src/\nCreated lakeql.config.json at /path/to/project/lakeql.config.json\n```\n\nIf `lakeql.config.json` already exists, the CLI asks whether to overwrite it.\n\n## Options\n\nThis command has no options.\n","description":"Initialize a lakeql.config.json configuration file in your project.","navTitle":"init","keywords":["lakeql-cli","initialize","lakeqlconfigjson","configuration","project"]}
{"schemaVersion":"1.0.0","docId":"cli/commands/list-columns","source":"cli","slug":"commands/list-columns","path":"/docs/cli/commands/list-columns","raw_path":"/raw/cli/commands/list-columns.md","title":"lakeql-cli list-columns","headings":[{"level":2,"text":"Syntax","id":"syntax"},{"level":2,"text":"Usage","id":"usage"},{"level":2,"text":"Options","id":"options"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/cli/commands/list-columns/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: \"lakeql-cli list-columns\"\nnavTitle: \"list-columns\"\ndescription: Lists columns for the specified table including types and descriptions.\n---\n\nQueries Trino and prints all columns for a specific table, including their data types, extra metadata, and descriptions.\n\n## Syntax\n\n```bash\nlakeql-cli list-columns [options]\n```\n\n## Usage\n\n```bash\nlakeql-cli list-columns --catalog hive --schema analytics --table users\n```\n\n```\n┌─────────────┬───────────┬───────┬─────────────────────┐\n│ Column Name │ Type      │ Extra │ Description         │\n├─────────────┼───────────┼───────┼─────────────────────┤\n│ id          │ varchar   │       │ Unique identifier   │\n│ email       │ varchar   │       │ User email address  │\n│ created_at  │ timestamp │       │ Account creation ts │\n│ is_active   │ boolean   │       │ Whether user active │\n└─────────────┴───────────┴───────┴─────────────────────┘\n```\n\n## Options\n\n<CliCommandDetails command=\"list-columns\" />\n","description":"Lists columns for the specified table including types and descriptions.","navTitle":"list-columns","keywords":["lakeql-cli","list-columns","lists","columns","specified"]}
{"schemaVersion":"1.0.0","docId":"cli/commands/list-schemas","source":"cli","slug":"commands/list-schemas","path":"/docs/cli/commands/list-schemas","raw_path":"/raw/cli/commands/list-schemas.md","title":"lakeql-cli list-schemas","headings":[{"level":2,"text":"Syntax","id":"syntax"},{"level":2,"text":"Usage","id":"usage"},{"level":2,"text":"Options","id":"options"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/cli/commands/list-schemas/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: \"lakeql-cli list-schemas\"\nnavTitle: \"list-schemas\"\ndescription: Lists available schemas for the configured catalog.\n---\n\nQueries Trino and prints all schemas available in the specified catalog.\n\n## Syntax\n\n```bash\nlakeql-cli list-schemas [options]\n```\n\n## Usage\n\n```bash\nlakeql-cli list-schemas --catalog hive\n```\n\n```\n┌─────────────┐\n│ Schema Name │\n├─────────────┤\n│ default     │\n│ analytics   │\n│ staging     │\n│ raw         │\n└─────────────┘\n```\n\n## Options\n\n<CliCommandDetails command=\"list-schemas\" />\n","description":"Lists available schemas for the configured catalog.","navTitle":"list-schemas","keywords":["lakeql-cli","list-schemas","lists","available","schemas"]}
{"schemaVersion":"1.0.0","docId":"cli/commands/list-tables","source":"cli","slug":"commands/list-tables","path":"/docs/cli/commands/list-tables","raw_path":"/raw/cli/commands/list-tables.md","title":"lakeql-cli list-tables","headings":[{"level":2,"text":"Syntax","id":"syntax"},{"level":2,"text":"Usage","id":"usage"},{"level":2,"text":"Options","id":"options"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/cli/commands/list-tables/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: \"lakeql-cli list-tables\"\nnavTitle: \"list-tables\"\ndescription: Lists available tables for the configured catalog and schema.\n---\n\nQueries Trino and prints all tables in the specified catalog and schema. If `--schema` is not provided, the CLI prompts you to select one interactively.\n\n## Syntax\n\n```bash\nlakeql-cli list-tables [options]\n```\n\n## Usage\n\n```bash\nlakeql-cli list-tables --catalog hive --schema analytics\n```\n\n```\n┌────────────────┐\n│ Table Name     │\n├────────────────┤\n│ users          │\n│ events         │\n│ sessions       │\n│ page_views     │\n└────────────────┘\n```\n\n## Options\n\n<CliCommandDetails command=\"list-tables\" />\n","description":"Lists available tables for the configured catalog and schema.","navTitle":"list-tables","keywords":["lakeql-cli","list-tables","lists","available","tables"]}
{"schemaVersion":"1.0.0","docId":"cli/commands/list-views","source":"cli","slug":"commands/list-views","path":"/docs/cli/commands/list-views","raw_path":"/raw/cli/commands/list-views.md","title":"lakeql-cli list-views","headings":[{"level":2,"text":"Syntax","id":"syntax"},{"level":2,"text":"Usage","id":"usage"},{"level":2,"text":"Options","id":"options"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/cli/commands/list-views/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: \"lakeql-cli list-views\"\nnavTitle: \"list-views\"\ndescription: Lists available views for the configured catalog and schema.\n---\n\nQueries Trino and prints all views in the specified catalog and schema. If `--schema` is not provided, the CLI prompts you to select one interactively.\n\n## Syntax\n\n```bash\nlakeql-cli list-views [options]\n```\n\n## Usage\n\n```bash\nlakeql-cli list-views --catalog hive --schema analytics\n```\n\n```\n┌──────────────────────┐\n│ View Name            │\n├──────────────────────┤\n│ active_users         │\n│ daily_revenue        │\n│ session_aggregates   │\n└──────────────────────┘\n```\n\n## Options\n\n<CliCommandDetails command=\"list-views\" />\n","description":"Lists available views for the configured catalog and schema.","navTitle":"list-views","keywords":["lakeql-cli","list-views","lists","available","views"]}
{"schemaVersion":"1.0.0","docId":"cli/commands/pull","source":"cli","slug":"commands/pull","path":"/docs/cli/commands/pull","raw_path":"/raw/cli/commands/pull.md","title":"lakeql-cli pull","headings":[{"level":2,"text":"Usage","id":"usage"},{"level":2,"text":"Bulk mode","id":"bulk-mode"},{"level":2,"text":"Error output","id":"error-output"},{"level":2,"text":"Generated files","id":"generated-files"},{"level":2,"text":"Options","id":"options"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/cli/commands/pull/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: \"lakeql-cli pull\"\nnavTitle: \"pull\"\ndescription: Interactive query endpoint generation based on a remote table.\ncliCommand: \"pull\"\n---\n\nConnects to Trino, introspects table columns, and generates a full set of TypeScript files for type-safe query endpoints. When run without flags, the command walks you through interactive prompts to select a schema and tables.\n\n## Usage\n\n```bash\nlakeql-cli pull [options]\n```\n\n```bash\nlakeql-cli pull --catalog hive --schema myschema --table users\n```\n\n```\n› Pulling 1 item(s) from hive.myschema into ./src/schemas/generated...\n❯ Pull 1 item(s)\n  ✓ hive.myschema.users\n✓ Pull 1 item(s)\n✓ Create registry\n✔ Pull completed: 1 item(s) generated under ./src/schemas/generated/hive/myschema\n```\n\nThe registry is generated once after all selected items are processed.\n\nWhen more than 10 tables are selected in non-bulk mode, `pull` switches to a compact live progress view (`Completed X/Y | Active A/B`). This view lists the items currently being pulled instead of rendering one task line per table.\nUse `--concurrency <count>` to override the default limit of `8` concurrent pull operations.\n\n## Bulk mode\n\nWhen `--bulk` is specified, the command reads a config file and processes multiple schemas and tables in parallel — instead of using interactive prompts.\n\n### Syntax\n\n```bash\nlakeql-cli pull --bulk 
[options]\n```\n\n### Config file\n\nThe config file is automatically detected by looking for `import.config.{mjs,ts,js,json}` in the current directory (powered by [c12](https://github.com/unjs/c12)). You can override this with `--bulk-config`.\n\nUse the `@type` JSDoc annotation for type-safety and autocomplete:\n\n```javascript\n// import.config.mjs\n\n/** @type {import('@lakeql/cli').BulkPullConfig} */\nexport default [\n  {\n    schema: \"sales\",\n    tables: [\"orders\", \"customers\", \"products\"],\n    views: [\"daily_revenue\"],\n  },\n  {\n    schema: \"analytics\",\n    tables: [\"events\", \"sessions\"],\n  },\n  {\n    schema: \"inventory\",\n    catalog: \"warehouse\", // optional catalog override per entry\n    tables: [\"stock_levels\"],\n    views: [\"low_stock_alerts\"],\n  },\n]\n```\n\n#### Supported formats\n\nThe config file can be any of the following (in precedence order):\n\n1. `import.config.mjs`\n2. `import.config.ts`\n3. `import.config.js`\n4. `import.config.json`\n\n### Config schema\n\nEach entry in the array has the following shape:\n\n| Field     | Type       | Required | Description                      |\n| --------- | ---------- | -------- | -------------------------------- |\n| `schema`  | `string`   | Yes      | The schema to pull from          |\n| `catalog` | `string`   | No       | Catalog override for this entry  |\n| `tables`  | `string[]` | No       | Non-empty list of tables to pull |\n| `views`   | `string[]` | No       | Non-empty list of views to pull  |\n\nAt least one non-empty list (`tables` or `views`) must be provided per entry.\nEntries with both lists missing or empty fail validation before execution.\n\n### Catalog precedence\n\nThe catalog is resolved in the following order (first match wins):\n\n1. `--catalog` CLI flag (highest priority)\n2. `catalog` field in the config entry\n3. 
`HIVE_CATALOG` environment variable (fallback)\n\n### Execution behavior\n\n- All schema entries are processed **in parallel** for faster execution.\n- Within an entry of up to 10 items, tables and views are processed sequentially.\n- Bulk item pulls are capped globally at 8 concurrent operations across the whole bulk run by default.\n- Bulk entries with more than 10 items switch to bounded parallel item processing under that global cap.\n- Use `--concurrency <count>` to raise or lower that limit for both bulk and non-bulk multi-item pulls.\n- The config registry is generated **once** at the end (not per entry).\n- If one entry fails, the remaining entries continue to execute.\n- Progress is displayed using a structured task list in the terminal.\n- Bulk entries with more than 10 items switch to the same compact live progress view used by large non-bulk pulls.\n\n### Usage\n\n```bash\n# Auto-detect config file (import.config.mjs, .ts, .js, or .json)\nlakeql-cli pull --bulk\n\n# Using a custom config file\nlakeql-cli pull --bulk --bulk-config=./my-import.config.mjs\n\n# With global catalog override\nlakeql-cli pull --bulk --catalog my_catalog\n\n# With a custom concurrency limit\nlakeql-cli pull --bulk --concurrency 5\n\n# Skip registry generation\nlakeql-cli pull --bulk --skip-registry\n```\n\n### Terminal output\n\n```\n⠋ Pull data\n  ✓ hive/sales — 4 item(s) pulled\n  ⠋ hive/analytics — 11 item(s)\n    › Completed 6/11 | Active 5/8\n    ›   - hive.analytics.events_6\n    ›   - hive.analytics.events_7\n    ›   - hive.analytics.events_8\n    ›   - hive.analytics.events_9\n    ›   - hive.analytics.events_10\n  ✓ warehouse/inventory — 2 item(s) pulled\n✓ Pull data\n✓ Create registry\n```\n\n### Type export\n\nThe `BulkPullConfig` and `BulkPullEntry` types are exported from `@lakeql/cli` for use in your config file:\n\n```ts\nimport type { BulkPullConfig, BulkPullEntry } from \"@lakeql/cli\"\n```\n\n## Error output\n\nWhen a request fails, the CLI prints structured output with context and hints:\n\n```text\n✖ LakeQL CLI failed.\n› Reason: Failed to list schemas.\n› Context: list-schemas (catalog=hive)\n› Root cause: fetch failed\n› Error code: ECONNREFUSED\n› Hint: 
Verify HIVE_HOST/HIVE_PORT, credentials and network reachability to Trino.\n```\n\nFor non-error aborts (for example, a cancelled prompt), the headline is shown as a warning and the command exits with code `0`.\n\n## Generated files\n\nFor each selected table, the following files are created under `schemas/generated/{catalog}/{schema}/{table}/`:\n\n- `config.ts` — Endpoint configuration\n- `interface.ts` — TypeScript interface for the table columns\n- `query-schema.ts` — GraphQL query schema definition\n- `json-schema.json` — JSON Schema representation\n- `endpoint.json` — Endpoint definition for re-generation\n\n`pull` generates query-only endpoints, so `mutation-schema.ts` is not created for pulled tables.\n\nField names from source schemas are normalized to valid identifier names during generation (for example, spaces become underscores).\nIf two source fields normalize to the same generated name, generation fails with a clear collision error instead of producing ambiguous output.\n\n## Options\n\n<CliCommandDetails command=\"pull\" />\n","description":"Interactive query endpoint generation based on a remote table.","navTitle":"pull","keywords":["lakeql-cli","interactive","query","endpoint","generation"]}
{"schemaVersion":"1.0.0","docId":"cli/configuration/config-file","source":"cli","slug":"configuration/config-file","path":"/docs/cli/configuration/config-file","raw_path":"/raw/cli/configuration/config-file.md","title":"Config File","headings":[{"level":2,"text":"Supported formats","id":"supported-formats"},{"level":2,"text":"Configuration","id":"configuration"},{"level":2,"text":"Path resolution","id":"path-resolution"},{"level":2,"text":"Examples","id":"examples"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/cli/configuration/config-file/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: Config File\ndescription: Reference for the lakeql config file.\n---\n\nThe lakeql config file controls where the CLI places generated code. It is created by running `lakeql-cli init`.\n\n## Supported formats\n\nThe CLI supports multiple config file formats, loaded in the following precedence order (first match wins):\n\n1. `lakeql.config.mjs` (recommended)\n2. `lakeql.config.ts`\n3. `lakeql.config.js`\n4. 
`lakeql.config.json`\n\nConfig loading is powered by [c12](https://github.com/unjs/c12).\n\n## Configuration\n\n<InterfaceReference file=\"cli/src/config\" name=\"LakeQLConfig\" />\n\n## Path resolution\n\n- Relative paths are resolved from the directory where the config file lives.\n- If `src/` is detected during `lakeql-cli init`, `sourcePath` is set to `\"src\"` automatically.\n- The CLI `--source-path` flag overrides the config value for that invocation.\n\n## Examples\n\n### MJS (recommended)\n\n```javascript\n// lakeql.config.mjs\n\n/** @type {import('@lakeql/cli').LakeQLConfig} */\nexport default {\n  sourcePath: \"src\",\n}\n```\n\n### JSON\n\n```json\n{\n  \"sourcePath\": \"src\"\n}\n```\n\nWith either config, running `lakeql-cli pull` places generated files under `src/schemas/generated/{catalog}/{schema}/{table}/`.\n","description":"Reference for the lakeql config file.","keywords":["config","reference","lakeql","supported","formats"]}
{"schemaVersion":"1.0.0","docId":"cli/configuration/environment-variables","source":"cli","slug":"configuration/environment-variables","path":"/docs/cli/configuration/environment-variables","raw_path":"/raw/cli/configuration/environment-variables.md","title":"Environment Variables","headings":[{"level":2,"text":"Required variables","id":"required-variables"},{"level":2,"text":"Optional variables","id":"optional-variables"},{"level":2,"text":"Example .env file","id":"example-env-file"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/cli/configuration/environment-variables/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: Environment Variables\ndescription: All environment variables used by the LakeQL CLI for Trino connectivity.\n---\n\nThe LakeQL CLI reads connection details from environment variables. These are required for any command that communicates with Trino.\n\n## Required variables\n\n| Variable        | Type   | Description                                                          |\n| --------------- | ------ | -------------------------------------------------------------------- |\n| `HIVE_HOST`     | string | Trino host URL including protocol (e.g. `https://trino.example.com`) |\n| `HIVE_PORT`     | number | Trino port (e.g. 
`443`)                                              |\n| `HIVE_USERNAME` | string | Authentication username                                              |\n| `HIVE_PASSWORD` | string | Authentication password                                              |\n| `HIVE_CATALOG`  | string | Default catalog name used when `--catalog` is not provided           |\n\n## Optional variables\n\n| Variable      | Type   | Default | Description                                        |\n| ------------- | ------ | ------- | -------------------------------------------------- |\n| `HIVE_SOURCE` | string | —       | Optional source identifier sent to Trino           |\n| `LOG_LEVEL`   | string | `warn`  | Log verbosity: `debug`, `info`, `warn`, or `error` |\n\n## Example .env file\n\n```bash\n# .env\nHIVE_HOST=https://trino.example.com\nHIVE_PORT=443\nHIVE_USERNAME=service-account\nHIVE_PASSWORD=s3cur3-p4ssw0rd\nHIVE_CATALOG=hive\nLOG_LEVEL=info\n```\n\nThe CLI does not load `.env` automatically.\nMake sure variables are present in the process environment before running commands.\n\nYou can do this in different ways:\n\n- export variables in your shell session\n- use a wrapper like `dotenv-cli`\n- run commands through your process manager or CI environment\n\nExample with `dotenv-cli`:\n\n```bash\ndotenv -e ./.env -- lakeql-cli pull\n```\n","description":"All environment variables used by the LakeQL CLI for Trino connectivity.","keywords":["variables","environment","lakeql","trino","connectivity"]}
{"schemaVersion":"1.0.0","docId":"cli/overview/installation","source":"cli","slug":"overview/installation","path":"/docs/cli/overview/installation","raw_path":"/raw/cli/overview/installation.md","title":"Installation","headings":[{"level":2,"text":"Install globally","id":"install-globally"},{"level":2,"text":"Run without installing","id":"run-without-installing"},{"level":2,"text":"Run within the monorepo","id":"run-within-the-monorepo"},{"level":2,"text":"Environment variables","id":"environment-variables"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/cli/overview/installation/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: Installation\ndescription: How to install and run the LakeQL CLI for schema introspection and code generation.\n---\n\nThe LakeQL CLI (`@lakeql/cli`) orchestrates schema introspection from Trino and generates TypeScript code — interfaces, query schemas, and configs — that power type-safe GraphQL query endpoints in your project.\n\n## Install globally\n\n<Command variant=\"install\">-g @lakeql/cli</Command>\n\n## Run without installing\n\nUse `npx` or `pnpm dlx` to invoke the CLI without a global install:\n\n<Command variant=\"exec\">lakeql-cli</Command>\n\n## Run within the monorepo\n\nIf you're working inside the LakeQL monorepo:\n\n```bash\npnpm -F cli cli\n```\n\n## Environment variables\n\nThe CLI connects to Trino for schema introspection. 
Set the following environment variables before running any command that talks to Trino:\n\n| Variable        | Required | Description                       |\n| --------------- | -------- | --------------------------------- |\n| `HIVE_HOST`     | Yes      | Trino host URL including protocol |\n| `HIVE_PORT`     | Yes      | Trino port                        |\n| `HIVE_USERNAME` | Yes      | Authentication username           |\n| `HIVE_PASSWORD` | Yes      | Authentication password           |\n| `HIVE_CATALOG`  | Yes      | Default catalog name              |\n\nCreate a `.env` file in your project root:\n\n```bash\n# .env\nHIVE_HOST=https://trino.example.com\nHIVE_PORT=443\nHIVE_USERNAME=your-username\nHIVE_PASSWORD=your-password\nHIVE_CATALOG=hive\n```\n\nThe CLI does not auto-load `.env` files.\nLoad variables into the process environment before invocation.\n\n```bash\n# Option 1: export in your shell\nexport HIVE_HOST=https://trino.example.com\nexport HIVE_PORT=443\nexport HIVE_USERNAME=your-username\nexport HIVE_PASSWORD=your-password\nexport HIVE_CATALOG=hive\nlakeql-cli pull\n\n# Option 2: use dotenv-cli\ndotenv -e ./.env -- lakeql-cli pull\n```\n","description":"How to install and run the LakeQL CLI for schema introspection and code generation.","keywords":["install","installation","lakeql","schema","introspection"]}
{"schemaVersion":"1.0.0","docId":"lakeql","source":"lakeql","slug":"lakeql","path":"/docs/lakeql","raw_path":"/raw/lakeql.md","title":"LakeQL","headings":[],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/lakeql/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: LakeQL\nnavTitle: LakeQL\ndescription: Type-safe GraphQL APIs over Trino — overview, setup, and guides.\nentrypoint: /docs/lakeql/introduction/overview\n---\n","description":"Type-safe GraphQL APIs over Trino — overview, setup, and guides.","navTitle":"LakeQL","keywords":["lakeql","type-safe","graphql","trino","overview"]}
{"schemaVersion":"1.0.0","docId":"lakeql/architecture/data-flow","source":"lakeql","slug":"architecture/data-flow","path":"/docs/lakeql/architecture/data-flow","raw_path":"/raw/lakeql/architecture/data-flow.md","title":"Data Flow","headings":[{"level":2,"text":"The Generation Pipeline","id":"the-generation-pipeline"},{"level":2,"text":"Stage 1: Trino Introspection","id":"stage-1-trino-introspection"},{"level":2,"text":"Stage 2: Column Parsing","id":"stage-2-column-parsing"},{"level":2,"text":"Stage 3: Schema Generation","id":"stage-3-schema-generation"},{"level":2,"text":"Stage 4: File Generation","id":"stage-4-file-generation"},{"level":2,"text":"Re-running the Pipeline","id":"re-running-the-pipeline"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/lakeql/architecture/data-flow/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: Data Flow\ndescription: How LakeQL's CLI pipeline transforms Trino metadata into generated TypeScript files.\n---\n\n## The Generation Pipeline\n\nWhen you run `lakeql-cli pull`, a multi-stage pipeline transforms Trino metadata into ready-to-use TypeScript source files.\n\n```mermaid preview\nflowchart LR\n    trino[\"Trino Instance\"] -->|SHOW COLUMNS| cp[\"@lakeql/column-parser\"]\n    cp -->|Structured columns| sg[\"@lakeql/schema-generator\"]\n    sg -->|Models + JSON Schema| fg[\"@lakeql/file-generator\"]\n    fg -->|Writes to disk| files[\"Generated Files<br/>config.ts · interface.ts<br/>query-schema.ts · json-schema.json\"]\n```\n\n## Stage 1: Trino Introspection\n\n`@lakeql/trino-client` connects to your Trino instance and executes metadata queries:\n\n```sql\n-- The CLI issues this for each table\nSHOW COLUMNS FROM hive.sales.orders\n```\n\nTrino returns rows like:\n\n| Column      | Type                                 | Extra | Comment |\n| ----------- | ------------------------------------ | ----- | ------- |\n| id          | bigint                       
        |       |         |\n| customer_id | bigint                               |       |         |\n| status      | varchar                              |       |         |\n| metadata    | row(source varchar, version integer) |       |         |\n| tags        | array(varchar)                       |       |         |\n\n## Stage 2: Column Parsing\n\n`@lakeql/column-parser` transforms raw type strings into structured objects. It handles Trino's complex type syntax, including nested rows, arrays, and maps:\n\n```ts\n// Input: \"array(row(id bigint, name varchar))\"\n// Output:\n{\n  kind: \"array\",\n  element: {\n    kind: \"row\",\n    fields: [\n      { name: \"id\", type: { kind: \"scalar\", base: \"bigint\" } },\n      { name: \"name\", type: { kind: \"scalar\", base: \"varchar\" } }\n    ]\n  }\n}\n```\n\nThe parser covers scalar types such as `varchar`, `bigint`, `integer`, `double`, `boolean`, `date`, and `timestamp`, as well as the complex types `array(T)`, `map(K, V)`, and `row(...)`.\n\n## Stage 3: Schema Generation\n\n`@lakeql/schema-generator` takes the parsed column definitions and produces:\n\n- **JSON Schema** — Describes the response structure for runtime transformation\n- **GraphQL model definitions** — Pothos-compatible type and field definitions\n- **TypeScript interface fields** — Mapped types for the interface file\n\nThis stage maps Trino types to TypeScript and GraphQL types:\n\n| Trino Type  | TypeScript       | GraphQL          |\n| ----------- | ---------------- | ---------------- |\n| `bigint`    | `number`         | `Int` or `Float` |\n| `varchar`   | `string`         | `String`         |\n| `boolean`   | `boolean`        | `Boolean`        |\n| `date`      | `Date`           | `Date`           |\n| `timestamp` | `Date`           | `DateTime`       |\n| `array(T)`  | `T[]`            | `[T]`            |\n| `row(...)`  | nested interface | nested type      |\n\n## Stage 4: File Generation\n\n`@lakeql/file-generator` writes the final TypeScript source files to disk. 
Each file is formatted and ready to be imported by the API server.\n\nAfter generation, the CLI also updates the **config registry** — an aggregated index file that imports all generated configs:\n\n```ts path=\"src/config-registry.ts\"\nimport { ordersConfig } from \"./schemas/generated/hive/sales/orders/config\"\nimport { customersConfig } from \"./schemas/generated/hive/sales/customers/config\"\n\nexport const allConfigs = [ordersConfig, customersConfig] as const\n```\n\nThis registry is what the API server uses to discover and load all available table schemas.\n\n## Re-running the Pipeline\n\nThe pipeline is idempotent. Running `pull` again overwrites existing generated files with fresh versions based on current Trino metadata. This makes it safe to re-run whenever table schemas change.\n\n<Warning>\n  If you've made manual edits to generated files (e.g. adding custom fields to\n  `query-schema.ts`), those changes will be lost on the next `pull`. Place\n  custom resolvers in separate files instead.\n</Warning>\n","description":"How LakeQL's CLI pipeline transforms Trino metadata into generated TypeScript files.","keywords":["stage","pipeline","generation","trino","lakeqls"]}
{"schemaVersion":"1.0.0","docId":"lakeql/architecture/request-lifecycle","source":"lakeql","slug":"architecture/request-lifecycle","path":"/docs/lakeql/architecture/request-lifecycle","raw_path":"/raw/lakeql/architecture/request-lifecycle.md","title":"Request Lifecycle","headings":[{"level":2,"text":"Runtime Request Flow","id":"runtime-request-flow"},{"level":2,"text":"Step-by-Step Breakdown","id":"step-by-step-breakdown"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/lakeql/architecture/request-lifecycle/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: Request Lifecycle\ndescription: How a GraphQL request flows through LakeQL's API runtime, from HTTP to Trino and back.\n---\n\n## Runtime Request Flow\n\nEvery GraphQL query follows a predictable path through LakeQL's API runtime. Understanding this flow helps with debugging and extending the system.\n\n```mermaid\ngraph TD\n    client[\"Client Request\"]\n    hono[\"1. Hono Server<br/><small>HTTP routing, CORS, logging</small>\"]\n    yoga[\"2. GraphQL Yoga<br/><small>Query parsing, validation, execution</small>\"]\n    auth[\"3. Auth (getUser)<br/><small>Resolve current user from request</small>\"]\n    perm[\"4. Permission Check<br/><small>Check read/write access for table</small>\"]\n    resolve[\"5. Resolve Info Extraction<br/><small>Extract selected fields from query</small>\"]\n    qb[\"6. Query Builder<br/><small>Generate Trino SQL with CTEs</small>\"]\n    trino[\"7. Trino Client<br/><small>Execute SQL, poll for results</small>\"]\n    rt[\"8. Response Transformer<br/><small>Array → typed objects</small>\"]\n    page[\"9. Pagination<br/><small>Calculate pageInfo metadata</small>\"]\n    response[\"10. 
GraphQL Response<br/><small>Connection type response</small>\"]\n\n    client --> hono\n    hono --> yoga\n    yoga --> auth\n    auth --> perm\n    perm --> resolve\n    resolve --> qb\n    qb --> trino\n    trino --> rt\n    rt --> page\n    page --> response\n```\n\n## Step-by-Step Breakdown\n\n### 1. Hono Server\n\nThe request hits the Hono HTTP framework. Middleware handles:\n\n- Request logging via `hono/logger`\n- CORS headers for cross-origin requests\n- Routing to the GraphQL endpoint (default: `/graphql`)\n\n```ts\napp.on(\n  [\"POST\", \"GET\", \"OPTIONS\"],\n  \"/graphql/*\",\n  cors({ origin: \"*\", allowHeaders: [\"content-type\", \"authorization\"] }),\n  serveYoga({ yoga })\n)\n```\n\n### 2. GraphQL Yoga\n\nGraphQL Yoga takes over, parsing the query string, validating it against the schema, and beginning field resolution. The Pothos-generated schema defines which fields and types are available.\n\n### 3. Authentication (getUser)\n\nThe context factory calls `getUser(req)` to resolve the current user from the request. Two modes are available:\n\n- **Mock auth** (default implementation) — When `AUTH_MOCK=true`, any request with the correct `AUTH_MOCK_TOKEN` is authenticated. The `x-username` header sets the user identity.\n- **Custom auth** — You can provide your own `getUser` resolver in `defineConfig` to integrate JWT validation, OAuth2, or any other auth mechanism.\n\n```ts\n// Default mock auth behavior\nif (\n  authHeader &&\n  env.AUTH_MOCK === true &&\n  authHeader === env.AUTH_MOCK_TOKEN\n) {\n  return {\n    userName: req.headers.get(\"x-username\") ?? \"###FALLBACK_MOCK_USER###\",\n  }\n}\n```\n\n### 4. Permission Check\n\nBefore executing the resolver, the system checks whether the authenticated user may access the requested table:\n\n- **Read permission** — Allowed by default for users without explicit rules, because Trino handles authorization. Technical users without a matching `Query` rule are explicitly denied.\n- **Write permission** — Default-deny. 
Requires an explicit `Mutation` rule matching the catalog, schema, and table.\n\n### 5. Resolve Info Extraction\n\nThe resolver extracts the requested fields from GraphQL's `resolveInfo` object. Only fields the client actually selected are included in the SQL query:\n\n```ts\n// If the client requests { nodes { id, name } }\n// selectFields becomes [\"id\", \"name\"]\nconst selectFields = getSelectFields(info, true)\n```\n\n### 6. Query Builder\n\n`@lakeql/query-builder` generates a Trino SQL statement using Kysely. The query uses two CTEs:\n\n```sql\n-- CTE 1: Count total matching records\nWITH total_count AS (\n  SELECT COUNT(*) AS total_records\n  FROM hive.sales.orders\n  WHERE status = 'shipped'\n),\n-- CTE 2: Fetch the requested page of records\nrecords AS (\n  SELECT id, name\n  FROM hive.sales.orders\n  WHERE status = 'shipped'\n  ORDER BY id ASC\n  FETCH FIRST 10 ROWS ONLY\n)\n-- Combine both results in a single response\nSELECT * FROM total_count FULL JOIN records ON TRUE\n```\n\nThis dual-CTE approach retrieves both the total count and the paginated results in a single query to Trino.\n\n### 7. Trino Client\n\n`@lakeql/trino-client` submits the SQL to Trino's REST API. It handles:\n\n- Statement submission via POST\n- Polling the `nextUri` until results are ready\n- Authentication headers (Basic or Bearer)\n- Error handling and retry\n\n### 8. Response Transformer\n\nTrino returns data as arrays (e.g. `[1001, 42, \"shipped\", 249.99]`). The response transformer maps these arrays to named objects using the JSON schema definition:\n\n```ts\n// Input from Trino:  [1001, 42, \"shipped\", 249.99]\n// Output:            { id: 1001, customer_id: 42, status: \"shipped\", total: 249.99 }\n```\n\nThis also handles nested objects (Trino `row` types) and arrays.\n\n### 9. 
Pagination Calculation\n\nThe helpers package calculates pagination metadata from the total count, current offset, and page size:\n\n```ts\n{\n  hasNext: true,\n  hasPrevious: false,\n  currentPage: 1,\n  maxPages: 15,\n  nextPage: 2,\n  previousPage: null\n}\n```\n\n### 10. GraphQL Response\n\nThe final response is returned as a GraphQL Connection type:\n\n```json\n{\n  \"data\": {\n    \"orders\": {\n      \"totalCount\": 142,\n      \"pageInfo\": { \"hasNext\": true, \"currentPage\": 1, \"maxPages\": 15 },\n      \"nodes\": [{ \"id\": 1001, \"status\": \"shipped\" }]\n    }\n  }\n}\n```\n","description":"How a GraphQL request flows through LakeQL's API runtime, from HTTP to Trino and back.","keywords":["request","runtime","lifecycle","graphql","flows"]}
{"schemaVersion":"1.0.0","docId":"lakeql/architecture/system-overview","source":"lakeql","slug":"architecture/system-overview","path":"/docs/lakeql/architecture/system-overview","raw_path":"/raw/lakeql/architecture/system-overview.md","title":"System Overview","headings":[{"level":2,"text":"Package Architecture","id":"package-architecture"},{"level":2,"text":"Package Dependency Diagram","id":"package-dependency-diagram"},{"level":2,"text":"CLI Flow — Introspection and Generation","id":"cli-flow-introspection-and-generation"},{"level":2,"text":"API Flow — Serving and Querying","id":"api-flow-serving-and-querying"},{"level":2,"text":"Design Principles","id":"design-principles"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/lakeql/architecture/system-overview/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: System Overview\ndescription: Package dependency diagram and the two main operational flows in LakeQL.\n---\n\n## Package Architecture\n\nLakeQL's packages are organized around two distinct operational flows: the **CLI flow** for introspection and code generation, and the **API flow** for serving GraphQL queries at runtime.\n\n## Package Dependency Diagram\n\n```mermaid preview\ngraph LR\n    subgraph CLI Flow\n        cli[\"@lakeql/cli\"]\n        tc1[\"@lakeql/trino-client\"]\n        cp[\"@lakeql/column-parser\"]\n        sg[\"@lakeql/schema-generator\"]\n        fg[\"@lakeql/file-generator\"]\n\n        cli --> tc1\n        tc1 --> cp\n        cp --> sg\n        sg --> fg\n        cli --> fg\n    end\n\n    subgraph API Flow\n        api[\"@lakeql/api\"]\n        qb[\"@lakeql/query-builder\"]\n        tc2[\"@lakeql/trino-client\"]\n        rt[\"@lakeql/response-transformer\"]\n\n        api --> qb\n        qb --> tc2\n        tc2 --> rt\n        api --> rt\n    end\n\n    subgraph Shared\n        helpers[\"@lakeql/helpers\"]\n        logger[\"@lakeql/logger\"]\n    end\n\n    cli 
--> helpers\n    cli --> logger\n    api --> helpers\n    api --> logger\n```\n\n## CLI Flow — Introspection and Generation\n\nThe CLI flow runs at development time. Its job is to connect to Trino, discover table structures, and write TypeScript source files.\n\n| Step | Package                    | Action                                                   |\n| ---- | -------------------------- | -------------------------------------------------------- |\n| 1    | `@lakeql/cli`              | Parses CLI arguments, orchestrates the pipeline          |\n| 2    | `@lakeql/trino-client`     | Executes `SHOW COLUMNS` against Trino                    |\n| 3    | `@lakeql/column-parser`    | Parses raw type strings into structured definitions      |\n| 4    | `@lakeql/schema-generator` | Generates GraphQL models, JSON schemas, type definitions |\n| 5    | `@lakeql/file-generator`   | Writes TypeScript files to disk                          |\n\nThe output of this flow is committed to your repository and consumed by the API flow at runtime.\n\n## API Flow — Serving and Querying\n\nThe API flow runs at production time. 
It serves a GraphQL endpoint and translates queries into Trino SQL.\n\n| Step | Package                        | Action                                                  |\n| ---- | ------------------------------ | ------------------------------------------------------- |\n| 1    | `@lakeql/api`                  | Receives HTTP requests, runs auth, loads Pothos schemas |\n| 2    | `@lakeql/query-builder`        | Translates GraphQL resolve info into Kysely SQL         |\n| 3    | `@lakeql/trino-client`         | Submits SQL to Trino and polls for results              |\n| 4    | `@lakeql/response-transformer` | Converts array responses to typed objects               |\n| 5    | `@lakeql/helpers`              | Calculates pagination metadata                          |\n\n## Design Principles\n\n- **Single responsibility** — Each package does one thing well\n- **Shared nothing** — Packages communicate through well-defined interfaces, not shared state\n- **Generated over hand-written** — Prefer code generation over manual resolver authoring\n- **Type safety everywhere** — TypeScript types flow from Trino metadata through to GraphQL responses\n","description":"Package dependency diagram and the two main operational flows in LakeQL.","keywords":["package","dependency","diagram","system","overview"]}
{"schemaVersion":"1.0.0","docId":"lakeql/configuration/authentication","source":"lakeql","slug":"configuration/authentication","path":"/docs/lakeql/configuration/authentication","raw_path":"/raw/lakeql/configuration/authentication.md","title":"Authentication","headings":[{"level":2,"text":"Authentication Model","id":"authentication-model"},{"level":2,"text":"Default Mock Authentication","id":"default-mock-authentication"},{"level":2,"text":"Custom getUser Resolver","id":"custom-get-user-resolver"},{"level":2,"text":"JWTPayload Interface","id":"jwt-payload-interface"},{"level":2,"text":"Permission System","id":"permission-system"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/lakeql/configuration/authentication/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: Authentication\ndescription: Configure JWT authentication, mock auth for development, and custom user resolvers.\n---\n\n## Authentication Model\n\nLakeQL uses a pluggable authentication system. The `getUser` function resolves the current user from each incoming request and makes it available in the GraphQL context.\n\n## Default Mock Authentication\n\nFor development, LakeQL provides mock authentication controlled by environment variables:\n\n```bash\nAUTH_MOCK=true\nAUTH_MOCK_TOKEN=\"my-dev-token\"\n```\n\nWhen mock auth is enabled, requests are authenticated if:\n\n1. `AUTH_MOCK` is `true`\n2. The `Authorization` header matches `AUTH_MOCK_TOKEN`\n\nThe user's identity is read from the `x-username` header, falling back to a placeholder if not provided:\n\n```ts\n// Default getUser implementation (simplified)\nasync function getUser(req: Request): Promise<JWTPayload | null> {\n  const authHeader = req.headers.get(\"authorization\")\n\n  if (\n    authHeader &&\n    env.AUTH_MOCK === true &&\n    authHeader === env.AUTH_MOCK_TOKEN\n  ) {\n    return {\n      userName: req.headers.get(\"x-username\") ?? 
\"###FALLBACK_MOCK_USER###\",\n    }\n  }\n\n  return null\n}\n```\n\n## Custom getUser Resolver\n\nFor production, provide your own `getUser` function that validates real tokens:\n\n```ts\nimport { defineConfig } from \"@lakeql/api\"\nimport { jwtVerify } from \"jose\"\nimport { allConfigs } from \"./config-registry\"\n\nexport default defineConfig({\n  allConfigs,\n  async getUser(req) {\n    const authHeader = req.headers.get(\"authorization\")\n    if (!authHeader?.startsWith(\"Bearer \")) {\n      return null\n    }\n\n    const token = authHeader.slice(7)\n\n    try {\n      const { payload } = await jwtVerify(token, publicKey, {\n        issuer: \"https://auth.example.com\",\n        audience: \"lakeql-api\",\n      })\n\n      return { userName: payload.sub ?? \"unknown\" }\n    } catch {\n      return null\n    }\n  },\n})\n```\n\n## JWTPayload Interface\n\nThe user object conforms to the `JWTPayload` interface from the `jose` library, extended with a `userName` field:\n\n```ts\ninterface JWTPayload {\n  userName: string\n  // ... additional JWT standard claims available\n}\n```\n\nThe `userName` is used by the permission system to look up user-specific access rules.\n\n## Permission System\n\nAfter authentication, LakeQL checks table-level permissions using two functions:\n\n### Read Permission (`hasReadPermission`)\n\n- **Unauthenticated users** — Always denied\n- **Users without explicit rules** — Allowed (Trino enforces access)\n- **Users with Query rules** — Must match the target catalog, schema, and table\n\nThe default-allow model for reads makes sense because human users typically exist in Trino and are subject to Trino's own authorization. 
Technical users (service accounts) that don't exist in Trino should have explicit rules.\n\n### Write Permission (`hasWritePermission`)\n\n- **Unauthenticated users** — Always denied\n- **Users without explicit rules** — Always denied\n- **Users with Mutation rules** — Must match the target catalog, schema, and table\n\nWrites default to deny because they often execute under a shared system user in Trino.\n\n### Permission Configuration\n\nDefine permissions in `defineConfig`:\n\n```ts\nimport { defineConfig } from \"@lakeql/api\"\nimport { allConfigs } from \"./config-registry\"\n\nexport default defineConfig({\n  allConfigs,\n  permissions: [\n    {\n      name: \"data-ingestion-service\",\n      useSystemUser: true,\n      permissions: {\n        Query: [{ catalog: \"hive\", schema: \"raw\", tables: [\"*\"] }],\n        Mutation: [\n          { catalog: \"hive\", schema: \"raw\", tables: [\"events\", \"users\"] },\n        ],\n      },\n    },\n    {\n      name: \"analytics-dashboard\",\n      useSystemUser: false,\n      permissions: {\n        Query: [{ catalog: \"hive\", schema: \"curated\", tables: [\"*\"] }],\n        Mutation: [],\n      },\n    },\n  ],\n})\n```\n\n<Note>\n  The wildcard `\"*\"` in the tables array grants access to all tables in that\n  catalog/schema combination. 
Use with caution for write permissions.\n</Note>\n\n### Custom Permission Resolvers\n\nOverride the default permission logic entirely:\n\n```ts\nexport default defineConfig({\n  allConfigs,\n  hasReadPermission({ context, catalog, schema, tableName }) {\n    // Your custom read permission logic\n    return context.currentUser !== null\n  },\n  hasWritePermission({ context, catalog, schema, tableName }) {\n    // Your custom write permission logic\n    return context.currentUser?.userName === \"admin\"\n  },\n})\n```\n","description":"Configure JWT authentication, mock auth for development, and custom user resolvers.","keywords":["authentication","custom","configure","development","resolvers"]}
{"schemaVersion":"1.0.0","docId":"lakeql/configuration/environment-variables","source":"lakeql","slug":"configuration/environment-variables","path":"/docs/lakeql/configuration/environment-variables","raw_path":"/raw/lakeql/configuration/environment-variables.md","title":"Environment Variables","headings":[{"level":2,"text":"Overview","id":"overview"},{"level":2,"text":"Complete Reference","id":"complete-reference"},{"level":2,"text":"Validation Skip Conditions","id":"validation-skip-conditions"},{"level":2,"text":"Programmatic Configuration","id":"programmatic-configuration"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/lakeql/configuration/environment-variables/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: Environment Variables\ndescription: Complete reference for all environment variables used across LakeQL packages.\n---\n\n## Overview\n\nLakeQL validates environment variables at startup using `@t3-oss/env-core` with Zod schemas. Missing or invalid values cause an immediate, descriptive failure so misconfiguration is caught before the server starts handling requests.\n\n## Complete Reference\n\n| Variable                   | Package  | Type      | Required | Default       | Description                                                  |\n| -------------------------- | -------- | --------- | -------- | ------------- | ------------------------------------------------------------ |\n| `HIVE_HOST`                | api, cli | `string`  | Yes      | —             | Trino host URL (e.g. `http://localhost`)                     |\n| `HIVE_PORT`                | api, cli | `number`  | Yes      | —             | Trino HTTP port (e.g. 
`8080`)                                |\n| `HIVE_USERNAME`            | api, cli | `string`  | Yes      | —             | Trino authentication username                                |\n| `HIVE_PASSWORD`            | api, cli | `string`  | Yes      | —             | Trino authentication password                                |\n| `HIVE_CATALOG`             | api, cli | `string`  | Yes      | —             | Default Trino catalog (e.g. `hive`)                          |\n| `HIVE_SOURCE`              | api, cli | `string`  | No       | —             | Value for `X-Trino-Source` header                            |\n| `API_PORT`                 | api      | `number`  | No       | `4000`        | GraphQL API server port                                      |\n| `API_LOGGER`               | api      | `enum`    | No       | `warn`        | Log level: `silent`, `debug`, `info`, `warn`, `error`        |\n| `API_MAX_RECORDS_PER_PAGE` | api      | `number`  | No       | `2000`        | Maximum records per paginated response                       |\n| `AUTH_MOCK`                | api      | `boolean` | No       | `false`       | Enable mock authentication                                   |\n| `AUTH_MOCK_TOKEN`          | api      | `string`  | No       | —             | Token for mock auth (checked against `Authorization` header) |\n| `NODE_ENV`                 | api      | `enum`    | No       | `development` | Environment: `development`, `production`, `test`             |\n\n## Validation Skip Conditions\n\nEnvironment validation is skipped when:\n\n- Running in CI (`CI` env var is set)\n- Running `npm run lint` or `npm run test` lifecycle events\n\nThis allows tooling like linters and type checkers to run without requiring a full `.env` file.\n\n## Programmatic Configuration\n\nMany settings can also be overridden programmatically via `defineConfig`:\n\n```ts\nimport { defineConfig } from \"@lakeql/api\"\nimport { allConfigs } from \"./config-registry\"\n\nexport 
default defineConfig({\n  allConfigs,\n  port: 3000, // Overrides API_PORT\n  maxRecordsPerPage: 500, // Overrides API_MAX_RECORDS_PER_PAGE\n  graphqlPath: \"/api/graphql\", // Custom GraphQL endpoint path\n  healthCheckEndpoint: \"/live\", // Health check path\n})\n```\n\nProgrammatic values take precedence over environment variables for runtime configuration like `port`, `maxRecordsPerPage`, `graphqlPath`, and `healthCheckEndpoint`.\n","description":"Complete reference for all environment variables used across LakeQL packages.","keywords":["environment","variables","complete","reference","across"]}
{"schemaVersion":"1.0.0","docId":"lakeql/configuration/trino-connection","source":"lakeql","slug":"configuration/trino-connection","path":"/docs/lakeql/configuration/trino-connection","raw_path":"/raw/lakeql/configuration/trino-connection.md","title":"Trino Connection","headings":[{"level":2,"text":"Connection Configuration","id":"connection-configuration"},{"level":2,"text":"Basic Auth","id":"basic-auth"},{"level":2,"text":"Bearer Auth","id":"bearer-auth"},{"level":2,"text":"Host and Port","id":"host-and-port"},{"level":2,"text":"Default Catalog and Schema","id":"default-catalog-and-schema"},{"level":2,"text":"Source Header","id":"source-header"},{"level":2,"text":"Connection in Code","id":"connection-in-code"},{"level":2,"text":"Query Execution Flow","id":"query-execution-flow"},{"level":2,"text":"Troubleshooting","id":"troubleshooting"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/lakeql/configuration/trino-connection/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: Trino Connection\ndescription: Configure the Trino connection including authentication, host settings, and catalog defaults.\n---\n\n## Connection Configuration\n\nLakeQL connects to Trino via its REST API. Both the CLI (for introspection) and the API server (for query execution) use the same connection settings.\n\n## Basic Auth\n\nThe default authentication method uses username and password, sent as HTTP Basic Auth:\n\n```bash\nHIVE_HOST=\"https://trino.example.com\"\nHIVE_PORT=8446\nHIVE_USERNAME=\"my-user\"\nHIVE_PASSWORD=\"my-password\"\n```\n\nThese credentials are included in every request to Trino as Basic Authentication headers.\n\n## Bearer Auth\n\nFor environments that use token-based authentication (e.g. OAuth2 access tokens), configure the Trino client to send a Bearer token instead. 
This is handled at the system user level for write operations where the API impersonates a service account.\n\n## Host and Port\n\n| Variable    | Description                                  | Example                                           |\n| ----------- | -------------------------------------------- | ------------------------------------------------- |\n| `HIVE_HOST` | Protocol + hostname of the Trino coordinator | `http://localhost`, `https://trino.prod.internal` |\n| `HIVE_PORT` | HTTP port Trino listens on                   | `8080` (default HTTP), `8446` (typical HTTPS)     |\n\nThe full Trino endpoint is constructed as `${HIVE_HOST}:${HIVE_PORT}/v1/statement`.\n\n<Note>\n  Use `https://` for production Trino instances. The Trino client respects the\n  protocol from `HIVE_HOST` for TLS connections.\n</Note>\n\n## Default Catalog and Schema\n\n```bash\nHIVE_CATALOG=hive\n```\n\nThe `HIVE_CATALOG` variable sets the default catalog context. It is used:\n\n- By the CLI when no `--catalog` flag is provided\n- By the API server as the default catalog for generated queries\n- In the `X-Trino-Catalog` session header\n\n## Source Header\n\n```bash\nHIVE_SOURCE=\"lakeql\"\n```\n\nThe optional `HIVE_SOURCE` variable sets the `X-Trino-Source` header, which identifies the application in Trino's query logs. This makes it easier to tell which system submitted a given query when monitoring or debugging.\n\n## Connection in Code\n\nThe Trino client is configured internally using these environment variables:\n\n```ts\nimport Bourne from \"@hapi/bourne\"\nimport got from \"got\"\n\n// Simplified excerpt: `env` holds the validated environment variables\n// The client sends requests to Trino's statement endpoint\nconst endpoint = `${env.HIVE_HOST}:${env.HIVE_PORT}/v1/statement`\n\n// Requests include authentication and session headers\nconst headers = {\n  \"X-Trino-User\": env.HIVE_USERNAME,\n  \"X-Trino-Source\": env.HIVE_SOURCE ?? 
\"lakeql\",\n  \"X-Trino-Catalog\": env.HIVE_CATALOG,\n  Authorization: `Basic ${btoa(`${env.HIVE_USERNAME}:${env.HIVE_PASSWORD}`)}`,\n}\n```\n\n## Query Execution Flow\n\nWhen the API server submits a query to Trino:\n\n1. **POST** the SQL statement to `/v1/statement`\n2. Trino returns a response with a `nextUri` if results aren't ready yet\n3. The client **polls** the `nextUri` until the query completes\n4. Final response contains column metadata and row data as arrays\n\n<Warning>\n  Large result sets are streamed by Trino across multiple pages. The LakeQL\n  trino-client handles pagination internally, but long-running queries may time\n  out depending on your Trino cluster configuration. Use `paging.limit` in your\n  GraphQL queries to keep result sizes manageable.\n</Warning>\n\n## Troubleshooting\n\n| Symptom                  | Likely Cause                                                 |\n| ------------------------ | ------------------------------------------------------------ |\n| `ECONNREFUSED`           | Trino is not running or `HIVE_HOST`/`HIVE_PORT` is wrong     |\n| `401 Unauthorized`       | Invalid `HIVE_USERNAME` or `HIVE_PASSWORD`                   |\n| `Catalog does not exist` | `HIVE_CATALOG` doesn't match a configured Trino catalog      |\n| `Schema does not exist`  | The target schema hasn't been created in Trino yet           |\n| Query timeout            | Trino cluster is overloaded or the query scans too much data |\n","description":"Configure the Trino connection including authentication, host settings, and catalog defaults.","keywords":["connection","trino","catalog","configure","including"]}
{"schemaVersion":"1.0.0","docId":"lakeql/contributing/contribution-guide","source":"lakeql","slug":"contributing/contribution-guide","path":"/docs/lakeql/contributing/contribution-guide","raw_path":"/raw/lakeql/contributing/contribution-guide.md","title":"Contribution Guide","headings":[{"level":2,"text":"Getting Started","id":"getting-started"},{"level":2,"text":"Project Structure","id":"project-structure"},{"level":2,"text":"Development Workflow","id":"development-workflow"},{"level":2,"text":"Code Style","id":"code-style"},{"level":2,"text":"Changesets","id":"changesets"},{"level":2,"text":"Commits and PRs","id":"commits-and-p-rs"},{"level":2,"text":"Adding a New Dataset Template","id":"adding-a-new-dataset-template"},{"level":2,"text":"Running Tests","id":"running-tests"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/lakeql/contributing/contribution-guide/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: Contribution Guide\ndescription: Guidelines for contributing to the LakeQL project.\n---\n\n## Getting Started\n\n1. Fork and clone the repository\n2. Follow the [Local Development](/docs/lakeql/contributing/local-development) guide to set up your environment\n3. 
Create a feature branch from `main`\n\n## Project Structure\n\n```\nlakeql/\n├── apps/\n│   └── docs/              # Documentation site\n├── packages/\n│   ├── adapters/          # Storage adapters (Hive/S3, future: Iceberg, ClickHouse)\n│   ├── api/               # GraphQL API (Pothos + Yoga)\n│   ├── cli/               # CLI tool\n│   ├── query-builder/     # SQL query builder (Kysely-based)\n│   ├── trino-client/      # Trino REST API client\n│   └── ...\n├── tooling/\n│   └── test-data/         # Test data generation + seed tooling\n├── .minitrino/            # Local Trino cluster configuration\n└── templates/             # App scaffolding templates\n```\n\n## Development Workflow\n\n```bash\n# Start local infrastructure\npnpm mt:start\npnpm seed --all\n\n# Run the development server\npnpm dev:backend\n\n# Run tests\npnpm test\n\n# Type checking\npnpm typecheck\n\n# Linting\npnpm lint\n```\n\n## Code Style\n\n- TypeScript strict mode is required\n- Follow existing patterns in the codebase\n\nBefore committing, run:\n\n```bash\n# Lint (check)\npnpm lint\n\n# Lint (auto-fix)\npnpm lint:fix\n\n# Format (check)\npnpm format\n\n# Format (auto-fix)\npnpm format:fix\n```\n\n## Changesets\n\nWe use [changesets](https://github.com/changesets/changesets) to manage versioning and changelogs for published packages.\nWhen your change affects a published `@lakeql/*` package, create a changeset:\n\n```bash\npnpm cs\n```\n\nThis opens an interactive prompt to select affected packages and describe the change.\nThe changeset file is committed alongside your code.\n\nWhen a changeset is **not** needed:\n\n- Changes to `tooling/`, docs, or dev infrastructure\n- Changes to `private: true` packages\n\nWhen a changeset **is** needed:\n\n- Any change to a published `@lakeql/*` package — including bug fixes, features, and refactors\n\n## Commits and PRs\n\n- Keep commits focused on a single change\n- Use clear commit messages describing the \"what\" and \"why\"\n- PRs should reference 
any related issues\n\n## Adding a New Dataset Template\n\nThe seed system uses dataset templates to generate test data for local development.\nEach template defines a set of Trino columns and a generator function that produces Parquet files.\nIf you need test data with a different structure — for example when working on a new feature that requires a specific schema — you can add a new template.\n\n1. Create a new file in `tooling/test-data/src/datasets/`\n2. Export the column list (Trino column definitions) and the generator function (writes a Parquet file and returns its path)\n3. Import both in `tooling/test-data/seed.config.ts` and pass them as `columns` and `generate` in a seed definition\n\n```ts\n// tooling/test-data/src/datasets/my-dataset.ts\nimport type { ColumnDefinition } from \"../seed/config\"\n\nexport const myColumns: ColumnDefinition[] = [\n  { name: \"id\", type: \"BIGINT\" },\n  { name: \"value\", type: \"VARCHAR\" },\n]\n\nexport async function myGenerate(\n  amount: number,\n  targetDir: string\n): Promise<string> {\n  // Write a Parquet file with `amount` records to `targetDir` and return its path\n  throw new Error(\"myGenerate is not implemented yet\")\n}\n```\n\n## Running Tests\n\n```bash\n# All tests\npnpm test\n\n# Specific package\npnpm -F @lakeql/trino-client test\n\n# With coverage\npnpm test -- --coverage\n```\n","description":"Guidelines for contributing to the LakeQL project.","keywords":["project","contribution","guide","guidelines","contributing"]}
{"schemaVersion":"1.0.0","docId":"lakeql/contributing/local-development","source":"lakeql","slug":"contributing/local-development","path":"/docs/lakeql/contributing/local-development","raw_path":"/raw/lakeql/contributing/local-development.md","title":"Local Development","headings":[{"level":2,"text":"Requirements","id":"requirements"},{"level":2,"text":"Quick Start","id":"quick-start"},{"level":2,"text":"Minitrino","id":"minitrino"},{"level":2,"text":"Seeding Test Data","id":"seeding-test-data"},{"level":2,"text":"Querying Trino","id":"querying-trino"},{"level":2,"text":"Connection Details","id":"connection-details"},{"level":2,"text":"Default Users","id":"default-users"},{"level":2,"text":"Troubleshooting","id":"troubleshooting"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/lakeql/contributing/local-development/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: Local Development\ndescription: Set up a local Trino/MinIO environment with test data for LakeQL development.\n---\n\n## Requirements\n\n- [uv](https://docs.astral.sh/uv/) — Python package manager (for minitrino)\n- [Docker](https://www.docker.com/) — container runtime\n- [pnpm](https://pnpm.io/) — Node.js package manager\n\n## Quick Start\n\n```bash\n# 1. Install dependencies\nuv sync\npnpm install\n\n# 2. Start the local Trino cluster (Hive, MinIO, LDAP, OAuth2)\npnpm mt:start\n\n# 3. Seed test data\npnpm seed --all\n\n# 4. Verify\npnpm query \"SELECT count(*) FROM hive.test.products\"\n```\n\n## Minitrino\n\nThe local environment uses [minitrino](https://minitrino.readthedocs.io/) to provision a Trino cluster with all necessary modules. 
The minitrino library configuration is stored in `.minitrino/lib/` within the project root — every developer gets the same setup.\n\n### Commands\n\n| Command           | Description                                         |\n| ----------------- | --------------------------------------------------- |\n| `pnpm mt:start`   | Provision and start the cluster (incl. MinIO proxy) |\n| `pnpm mt:restart` | Restart the cluster                                 |\n| `pnpm mt:stop`    | Stop the cluster (keep volumes)                     |\n| `pnpm mt:clean`   | Stop and remove everything                          |\n\n### MinIO S3 Proxy\n\nMinitrino only exposes the MinIO Console (port 9001) by default. The `mt:start` command automatically starts a lightweight proxy that makes the S3 API available on `localhost:9000` for local tooling.\n\n## Seeding Test Data\n\nThe seed command generates Parquet files, uploads them to MinIO, and creates the corresponding Trino schemas and tables.\n\n### Commands\n\n| Command                         | Description                         |\n| ------------------------------- | ----------------------------------- |\n| `pnpm seed --all`               | Seed all definitions                |\n| `pnpm seed -d <name>`           | Seed a specific definition          |\n| `pnpm seed -d <name> -d <name>` | Seed multiple definitions           |\n| `pnpm seed --all --amount 500`  | Custom record count (default: 1000) |\n\nEach run is a full reset: existing tables and data are replaced with freshly generated records.\n\n### Configuration\n\nSeed definitions live in `tooling/test-data/seed.config.ts`:\n\n```ts\nimport { defineSeeds } from \"./src/seed/config\"\nimport { simpleColumns, simpleGenerate } from \"./src/datasets/simple\"\nimport { complexColumns, complexGenerate } from \"./src/datasets/complex\"\n\nexport default defineSeeds([\n  {\n    name: \"products\",\n    schema: \"test\",\n    table: \"products\",\n    connector: \"hive\",\n    
columns: simpleColumns,\n    generate: simpleGenerate,\n  },\n  {\n    name: \"orders\",\n    schema: \"test\",\n    table: \"orders\",\n    connector: \"hive\",\n    columns: complexColumns,\n    generate: complexGenerate,\n  },\n])\n```\n\nTo add a custom dataset, define `columns` and `generate` inline or create a new file under `tooling/test-data/src/datasets/`.\n\n## Querying Trino\n\nA lightweight query tool is included for quick verification:\n\n```bash\n# Table output (default)\npnpm query \"SELECT * FROM hive.test.products LIMIT 5\"\n\n# JSON output\npnpm query \"SELECT * FROM hive.test.orders LIMIT 3\" -f json\n\n# CSV output\npnpm query \"SELECT count(*) FROM hive.test.products\" -f csv\n```\n\n## Connection Details\n\n| Service       | URL                                 | Credentials                 |\n| ------------- | ----------------------------------- | --------------------------- |\n| Trino         | `http://localhost:8080`             | user: `admin`, no password  |\n| Trino UI      | `https://localhost:8443/ui`         | `admin@minitrino.com`       |\n| MinIO Console | `http://localhost:9001`             | `access-key` / `secret-key` |\n| MinIO S3 API  | `http://localhost:9000` (via proxy) | `access-key` / `secret-key` |\n\n<Note>\n  The Trino UI requires `https://` since the setup uses OAuth2. Make sure your\n  browser allows bypassing the `net::ERR_CERT_AUTHORITY_INVALID` error.\n</Note>\n\n## Default Users\n\nThe local setup uses minitrino's default users for LDAP and OAuth:\n\n- [LDAP users](https://minitrino.readthedocs.io/en/latest/modules/security/ldap.html#default-usernames-and-passwords)\n- [OAuth2 principals](https://minitrino.readthedocs.io/en/latest/modules/security/oauth2.html#default-oauth2-principals)\n\n## Troubleshooting\n\n### Seed fails with \"bucket does not exist\"\n\nEnsure the MinIO proxy is running. 
It starts automatically with `pnpm mt:start`, but you can restart it manually with `pnpm mt:proxy`.\n\n### Seed fails with \"401 Unauthorized\"\n\nTrino uses username-only authentication (no password) for local development, so this should work out of the box with the minitrino defaults. If it still fails, check that the seed connects as the `admin` user without a password (see Connection Details above).\n\n### Query fails with connection error\n\nMake sure minitrino is running: `pnpm mt:start`\n","description":"Set up a local Trino/MinIO environment with test data for LakeQL development.","keywords":["local","development","trinominio","environment","lakeql"]}
{"schemaVersion":"1.0.0","docId":"lakeql/create-app/post-creation","source":"lakeql","slug":"create-app/post-creation","path":"/docs/lakeql/create-app/post-creation","raw_path":"/raw/lakeql/create-app/post-creation.md","title":"Post-Creation Steps","headings":[{"level":2,"text":"After Scaffolding","id":"after-scaffolding"},{"level":2,"text":"Common Next Steps","id":"common-next-steps"},{"level":2,"text":"Troubleshooting","id":"troubleshooting"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/lakeql/create-app/post-creation/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: Post-Creation Steps\ndescription: What to do after scaffolding a new LakeQL project.\n---\n\n## After Scaffolding\n\nOnce `@lakeql/create-app` finishes, follow these steps to get your API running.\n\n<Stepper>\n<StepperItem title=\"Configure environment\">\n\nCopy the example env file and fill in your Trino connection details:\n\n```bash\ncd my-api\ncp .env.example .env\n```\n\nEdit `.env`:\n\n```bash\nHIVE_HOST=\"http://localhost\"\nHIVE_PORT=8080\nHIVE_CATALOG=hive\nHIVE_USERNAME=\"trino\"\nHIVE_PASSWORD=\"your-password\"\n\nAUTH_MOCK=true\nAUTH_MOCK_TOKEN=\"dev-token\"\n\nAPI_PORT=4000\nAPI_LOGGER=\"info\"\n```\n\n</StepperItem>\n<StepperItem title=\"Pull schemas from Trino\">\n\nGenerate TypeScript files from your Trino tables:\n\n<Command variant=\"run\">cli pull --catalog hive --schema myschema</Command>\n\nThis creates:\n\n- `src/schemas/generated/hive/myschema/<table>/config.ts` — Table metadata\n- `src/schemas/generated/hive/myschema/<table>/interface.ts` — TypeScript interface\n- `src/schemas/generated/hive/myschema/<table>/query-schema.ts` — GraphQL query schema\n- `src/schemas/generated/hive/myschema/<table>/mutation-schema.ts` — GraphQL mutation schema\n- `src/schemas/generated/hive/myschema/<table>/json-schema.json` — Response schema\n- `src/schemas/generated/hive/myschema/<table>/endpoint.json` — Endpoint 
definition\n- `src/config-registry.ts` — Aggregated config index\n\nTo pull multiple tables at once, pass `--table` multiple times:\n\n<Command variant=\"run\">\n  cli pull --catalog hive --schema sales --table orders --table customers\n  --table products\n</Command>\n\n</StepperItem>\n<StepperItem title=\"Start the development server\">\n\nLaunch the development server:\n\n<Command variant=\"run\">dev</Command>\n\nYou should see:\n\n```\n* Server URL: http://localhost:4000/\n* GraphQL Endpoint: http://localhost:4000/graphql\n```\n\n</StepperItem>\n<StepperItem title=\"Open GraphiQL\">\n\nNavigate to [http://localhost:4000/graphql](http://localhost:4000/graphql) in your browser.\n\nSet the authorization header in GraphiQL's headers panel:\n\n```json\n{\n  \"Authorization\": \"dev-token\"\n}\n```\n\nRun a test query:\n\n```graphql\nquery {\n  __schema {\n    queryType {\n      fields {\n        name\n      }\n    }\n  }\n}\n```\n\nThis returns all available query fields — one for each table you pulled.\n\n</StepperItem>\n</Stepper>\n\n## Common Next Steps\n\n### Pull additional schemas\n\nAs you add tables to your lakehouse, pull them into the project:\n\n<Command variant=\"run\">cli pull --catalog hive --schema new_schema</Command>\n\n### Configure permissions\n\nSet up user-level read/write permissions for production. See [Authentication](/docs/lakeql/configuration/authentication).\n\n## Troubleshooting\n\n| Issue                                  | Solution                                                                                |\n| -------------------------------------- | --------------------------------------------------------------------------------------- |\n| `pull` fails with connection error     | Verify `HIVE_HOST` and `HIVE_PORT` in `.env`. Check Trino is running.                   |\n| Server starts but no queries available | Run `pnpm cli pull` first. Check that `schemaPath` points to the correct directory.     
|\n| Auth errors in GraphiQL                | Set `AUTH_MOCK=true` and include the `Authorization` header matching `AUTH_MOCK_TOKEN`. |\n| Generated types have wrong fields      | Re-run `pull` to sync with current Trino table structure.                               |\n","description":"What to do after scaffolding a new LakeQL project.","keywords":["steps","after","scaffolding","post-creation","lakeql"]}
{"schemaVersion":"1.0.0","docId":"lakeql/create-app/template-structure","source":"lakeql","slug":"create-app/template-structure","path":"/docs/lakeql/create-app/template-structure","raw_path":"/raw/lakeql/create-app/template-structure.md","title":"Template Structure","headings":[{"level":2,"text":"Project Layout","id":"project-layout"},{"level":2,"text":"File Details","id":"file-details"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/lakeql/create-app/template-structure/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: Template Structure\ndescription: Understand the project layout created by `@lakeql/create-app`.\n---\n\n## Project Layout\n\nAfter scaffolding, your project has this structure:\n\n```\nmy-lakeql-project/\n├── src/\n│   ├── index.ts             # Entry point — starts the API server\n│   ├── config.ts            # defineConfig with runtime options\n│   ├── auth.ts              # Custom getUser resolver\n│   ├── permissions.ts       # Permission rules for technical users\n│   ├── env.ts               # Environment variable validation (t3-env + Zod)\n│   ├── config-registry.ts   # Auto-generated config index (empty initially)\n│   └── schemas/             # Generated schemas go here after pull\n│       └── generated/\n├── .env.example             # Template environment variables\n├── .nvmrc                   # Node.js major version used by the template\n├── package.json             # Dependencies and scripts\n├── tsconfig.json            # TypeScript configuration\n├── tsdown.config.ts         # Build configuration\n└── lakeql.config.json       # LakeQL CLI configuration\n```\n\n## File Details\n\n### src/index.ts\n\nThe entry point that starts the API server using the defined configuration:\n\n```ts\nimport { config } from \"./config\"\n\nawait config.startServer()\n```\n\n### src/config.ts\n\nConfigures the API server with `defineConfig`. 
This is where you wire together auth, permissions, and schema loading:\n\n```ts\nimport { defineConfig } from \"@lakeql/api/config\"\n\nimport { getUser } from \"./auth\"\nimport { allConfigs } from \"./config-registry\"\nimport { permissions } from \"./permissions\"\n\nconst baseDir = import.meta.dirname\n\nexport const config = defineConfig({\n  allConfigs,\n  baseDir,\n  getUser,\n  graphqlPath: \"/graphql\",\n  healthCheckEndpoint: \"/live\",\n  permissions,\n  port: 4000,\n  schemaPath: \"./schemas\",\n})\n```\n\nThe `schemaPath` is resolved relative to `baseDir` (the `src/` directory). Generated schemas placed in `src/schemas/` are loaded automatically at startup.\n\n### src/auth.ts\n\nA starter authentication resolver. By default, it supports mock auth for local development:\n\n```ts\nimport type { GetUserResolver } from \"@lakeql/api/types\"\n\nimport { env } from \"./env\"\n\nexport const getUser: GetUserResolver = async (request) => {\n  const authHeader = request.headers.get(\"authorization\")\n\n  if (authHeader) {\n    if (env.AUTH_MOCK && authHeader === env.AUTH_MOCK_TOKEN) {\n      return { userName: \"testuser\" }\n    }\n    return null\n  }\n\n  return null\n}\n```\n\nReplace this with real JWT verification (e.g. using `jose`) for production.\n\n### src/permissions.ts\n\nDefines per-user permission rules for technical users. Initially empty — add rules as needed:\n\n```ts\nimport { createPermission as createPermissionFromApi } from \"@lakeql/api/helpers\"\n\nimport { allConfigs } from \"./config-registry\"\n\nconst createPermission = createPermissionFromApi(allConfigs)\n\nexport const permissions = [\n  // {\n  //   name: \"testuser\",\n  //   useSystemUser: false,\n  //   permissions: {\n  //     Query: [createPermission(\"hive\", \"schema_name\", [\"table_name\"])],\n  //     Mutation: [],\n  //   },\n  // },\n]\n```\n\n### src/env.ts\n\nValidates environment variables at startup using `@t3-oss/env-core` with Zod schemas. 
If a required variable is missing or invalid, the process exits with a clear error message:\n\n```ts\nimport { createEnv } from \"@t3-oss/env-core\"\nimport { coerce, number, string, enum as zodEnum } from \"zod\"\n\nexport const env = createEnv({\n  runtimeEnv: process.env,\n  server: {\n    API_LOGGER: zodEnum([\"debug\", \"info\", \"warn\", \"error\", \"silent\"]).default(\n      \"warn\"\n    ),\n    API_PORT: coerce.number().default(4000),\n    AUTH_MOCK: coerce.boolean().default(false),\n    AUTH_MOCK_TOKEN: string().optional(),\n    HIVE_CATALOG: string().min(1),\n    HIVE_HOST: string(),\n    HIVE_PASSWORD: string().min(1),\n    HIVE_PORT: string()\n      .transform((s) => Number.parseInt(s, 10))\n      .pipe(number()),\n    HIVE_SOURCE: string().optional(),\n    HIVE_USERNAME: string().min(1),\n    NODE_ENV: zodEnum([\"development\", \"production\", \"test\"]).default(\n      \"development\"\n    ),\n  },\n})\n```\n\nNote that `coerce.boolean()` converts the raw string with JavaScript's `Boolean()`, so any non-empty value, including `\"false\"`, parses as `true`. To disable mock authentication, remove `AUTH_MOCK` from `.env` rather than setting it to `false`.\n\n### src/config-registry.ts\n\nAuto-generated by `lakeql-cli create-registry` (or automatically after `pull`). 
Initially empty — populated once you pull schemas:\n\n```ts\nexport const allConfigs = [] as const\n\nexport type AvailableCatalogs = (typeof allConfigs)[number][\"catalog\"]\nexport type AvailableSchemas = (typeof allConfigs)[number][\"schema\"]\nexport type AvailableTables = (typeof allConfigs)[number][\"tableName\"]\n```\n\nAfter pulling schemas, it looks like:\n\n```ts\nimport { ordersConfig } from \"./schemas/generated/hive/sales/orders/config\"\n\nexport const allConfigs = [ordersConfig] as const\n```\n\n### .env.example\n\nTemplate for required environment variables:\n\n```bash\n######\n# Hive Database\n######\nHIVE_HOST=\"http://localhost\"\nHIVE_PORT=8080\nHIVE_CATALOG=hive\nHIVE_USERNAME=admin\nHIVE_PASSWORD=\"secret123\"\nHIVE_SOURCE=\"lakeql\"\n\n######\n# Mock\n######\nAUTH_MOCK=true\nAUTH_MOCK_TOKEN=\"1234\"\n\n######\n# GraphQL API\n######\nAPI_PORT=4000\nAPI_LOGGER=\"warn\"\n```\n\n### package.json\n\nPre-configured with LakeQL packages and development tooling:\n\n```json\n{\n  \"name\": \"lakeql-app\",\n  \"type\": \"module\",\n  \"scripts\": {\n    \"build\": \"tsdown\",\n    \"dev\": \"pnpm with-env tsx ./src/index.ts\",\n    \"preview\": \"pnpm with-env node ./dist/index.mjs\",\n    \"start\": \"node ./dist/index.mjs\",\n    \"typecheck\": \"tsc --noEmit\",\n    \"with-env\": \"dotenv -e ./.env --\",\n    \"cli\": \"pnpm with-env lakeql-cli\"\n  },\n  \"dependencies\": {\n    \"@lakeql/api\": \"^latest\",\n    \"@lakeql/query-builder\": \"^latest\",\n    \"@lakeql/trino-client\": \"^latest\",\n    \"@t3-oss/env-core\": \"^latest\",\n    \"zod\": \"^latest\"\n  },\n  \"devDependencies\": {\n    \"@lakeql/cli\": \"^latest\",\n    \"dotenv-cli\": \"^latest\",\n    \"tsdown\": \"^latest\",\n    \"tsx\": \"^latest\",\n    \"typescript\": \"^latest\"\n  },\n  \"engines\": {\n    \"node\": \">=24\"\n  }\n}\n```\n\nKey scripts:\n\n| Script    | Command                               | Purpose                          |\n| --------- | 
------------------------------------- | -------------------------------- |\n| `dev`     | `pnpm with-env tsx ./src/index.ts`    | Development with env vars loaded |\n| `build`   | `tsdown`                              | Production build                 |\n| `start`   | `node ./dist/index.mjs`               | Run production build             |\n| `preview` | `pnpm with-env node ./dist/index.mjs` | Run production build with .env   |\n| `cli`     | `pnpm with-env lakeql-cli`            | LakeQL CLI with env vars loaded  |\n\n### lakeql.config.json\n\nConfigures the CLI's code generation output path:\n\n```json\n{\n  \"sourcePath\": \"src\"\n}\n```\n\nThis tells the CLI to write generated files relative to `src/`, resulting in paths like `src/schemas/generated/{catalog}/{schema}/{table}/`.\n","description":"Understand the project layout created by `@lakeql/create-app`.","keywords":["project","layout","template","structure","understand"]}
{"schemaVersion":"1.0.0","docId":"lakeql/create-app/usage","source":"lakeql","slug":"create-app/usage","path":"/docs/lakeql/create-app/usage","raw_path":"/raw/lakeql/create-app/usage.md","title":"Usage","headings":[{"level":2,"text":"Scaffolding a New Project","id":"scaffolding-a-new-project"},{"level":2,"text":"Quick Start","id":"quick-start"},{"level":2,"text":"Interactive Mode","id":"interactive-mode"},{"level":2,"text":"CLI Flags","id":"cli-flags"},{"level":2,"text":"What Happens","id":"what-happens"},{"level":2,"text":"Project Name Validation","id":"project-name-validation"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/lakeql/create-app/usage/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: Usage\ndescription: How to use `@lakeql/create-app` to scaffold a new project.\n---\n\n## Scaffolding a New Project\n\n`@lakeql/create-app` is LakeQL's project scaffolding tool. It downloads a starter template, configures dependencies, and sets up your project structure.\n\n## Quick Start\n\n<Command variant=\"create\">@lakeql/app my-api</Command>\n\n## Interactive Mode\n\nWhen run without arguments, the CLI prompts you for:\n\n1. **Project name** — lowercase letters, numbers, hyphens, and underscores\n2. **Install dependencies** — yes/no\n3. 
**Package manager** — npm, pnpm, or yarn\n\n<Command variant=\"create\">@lakeql/create-app@latest</Command>\n\n```\n◆  create-lakeql-app\n│\n◇  What is your project name?\n│  my-lakeql-api\n│\n◇  Install dependencies?\n│  Yes\n│\n◇  Which package manager?\n│  pnpm\n│\n◇  Template downloaded successfully!\n◇  Package.json updated!\n◇  Dependencies installed!\n│\n└  🎉 Project created successfully!\n```\n\n## CLI Flags\n\n| Flag                                   | Description                                        |\n| -------------------------------------- | -------------------------------------------------- |\n| `[project-name]`                       | Project name as positional argument (skips prompt) |\n| `--package-manager=<pm>` or `-pm=<pm>` | Package manager: `npm`, `pnpm`, `yarn`, `bun`      |\n| `--no-install`                         | Skip dependency installation                       |\n| `--quiet` or `-q`                      | Suppress output (useful for scripts/CI)            |\n\n### Examples\n\n```bash\n# Non-interactive with pnpm\nnpx @lakeql/create-app@latest my-api --package-manager=pnpm\n\n# Skip install for CI\nnpx @lakeql/create-app@latest my-api --no-install --quiet\n\n# Short form\nnpx @lakeql/create-app@latest my-api -pm=yarn\n```\n\n## What Happens\n\n<Stepper>\n<StepperItem title=\"Download template\">\n\nThe tool downloads the app template from the LakeQL GitHub repository using [giget](https://github.com/unjs/giget). 
No git clone is required.\n\n</StepperItem>\n<StepperItem title=\"Update package.json\">\n\nThe `package.json` is updated with:\n\n- Your chosen project name\n- Latest published versions of `@lakeql/api` and `@lakeql/cli` (fetched from the npm registry)\n- Removal of workspace-specific fields\n\n</StepperItem>\n<StepperItem title=\"Install dependencies\">\n\nIf selected, dependencies are installed with your chosen package manager.\n\n</StepperItem>\n</Stepper>\n\n## Project Name Validation\n\nProject names must match the pattern: `^[a-z0-9-_]+$`\n\n- Lowercase letters, numbers, hyphens, and underscores only\n- No spaces or special characters\n- The name becomes both the directory name and the `name` field in `package.json`\n\n<Warning>\n  The scaffolding tool will exit with an error if the target directory already\n  exists. Choose a new name or remove the existing directory first.\n</Warning>\n","description":"How to use `@lakeql/create-app` to scaffold a new project.","keywords":["project","usage","lakeqlcreate-app","scaffold","scaffolding"]}
{"schemaVersion":"1.0.0","docId":"lakeql/getting-started/environment-configuration","source":"lakeql","slug":"getting-started/environment-configuration","path":"/docs/lakeql/getting-started/environment-configuration","raw_path":"/raw/lakeql/getting-started/environment-configuration.md","title":"Environment Configuration","headings":[{"level":2,"text":"Environment Variables","id":"environment-variables"},{"level":2,"text":"Example .env File","id":"example-env-file"},{"level":2,"text":"Validation Behavior","id":"validation-behavior"},{"level":2,"text":"Docker Environment","id":"docker-environment"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/lakeql/getting-started/environment-configuration/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: Environment Configuration\ndescription: Complete reference for all environment variables used by LakeQL packages.\n---\n\n## Environment Variables\n\nLakeQL uses environment variables for configuration. All variables are validated at startup using Zod schemas via `@t3-oss/env-core`.\n\n### Trino Connection\n\n| Variable        | Type   | Default  | Description                                                             |\n| --------------- | ------ | -------- | ----------------------------------------------------------------------- |\n| `HIVE_HOST`     | string | —        | Trino host URL (e.g. `http://localhost` or `https://trino.example.com`) |\n| `HIVE_PORT`     | number | —        | Trino HTTP port (e.g. `8080` or `8446`)                                 |\n| `HIVE_USERNAME` | string | —        | Username for Trino authentication                                       |\n| `HIVE_PASSWORD` | string | —        | Password for Trino authentication                                       |\n| `HIVE_CATALOG`  | string | —        | Default Trino catalog (e.g. 
`hive`)                                     |\n| `HIVE_SOURCE`   | string | `lakeql` | Value for the `X-Trino-Source` header                                   |\n\n### API Server\n\n| Variable                   | Type   | Default | Description                                           |\n| -------------------------- | ------ | ------- | ----------------------------------------------------- |\n| `API_PORT`                 | number | `4000`  | Port the GraphQL API server listens on                |\n| `API_LOGGER`               | enum   | `warn`  | Log level: `silent`, `debug`, `info`, `warn`, `error` |\n| `API_MAX_RECORDS_PER_PAGE` | number | `2000`  | Maximum records returned per page                     |\n\n### Authentication\n\n| Variable          | Type    | Default | Description                                 |\n| ----------------- | ------- | ------- | ------------------------------------------- |\n| `AUTH_MOCK`       | boolean | `false` | Enable mock authentication for development  |\n| `AUTH_MOCK_TOKEN` | string  | —       | Token value that grants access in mock mode |\n\n### Runtime\n\n| Variable   | Type | Default       | Description                                      |\n| ---------- | ---- | ------------- | ------------------------------------------------ |\n| `NODE_ENV` | enum | `development` | Environment: `development`, `production`, `test` |\n\n## Example .env File\n\n```bash\n# Trino Connection\nHIVE_HOST=\"http://localhost\"\nHIVE_PORT=8080\nHIVE_CATALOG=hive\nHIVE_USERNAME=\"trino\"\nHIVE_PASSWORD=\"trino-password\"\nHIVE_SOURCE=\"lakeql\"\n\n# Authentication\nAUTH_MOCK=true\nAUTH_MOCK_TOKEN=\"my-dev-token\"\n\n# API Server\nAPI_PORT=4000\nAPI_LOGGER=\"info\"\nAPI_MAX_RECORDS_PER_PAGE=500\n\n# Runtime\nNODE_ENV=development\n```\n\n## Validation Behavior\n\nEnvironment variables are validated on startup. 
If a required variable is missing or has an invalid value, the server fails fast with a descriptive error message.\n\n<Note>\n  Validation is skipped in CI environments and during `lint` or `test` npm\n  lifecycle events to avoid requiring full configuration for development\n  tooling.\n</Note>\n\n## Docker Environment\n\nWhen running with Docker Compose (e.g. the local lakehouse stack), use a separate `.env.docker` file:\n\n```bash\nHIVE_HOST=\"http://trino\"\nHIVE_PORT=8080\nHIVE_CATALOG=hive\nHIVE_USERNAME=\"admin\"\nHIVE_PASSWORD=\"admin\"\n\nMINIO_ROOT_USER=admin\nMINIO_ROOT_PASSWORD=password\nMINIO_BUCKET=warehouse\n\nPOSTGRES_DB=metastore\nPOSTGRES_USER=hive\nPOSTGRES_PASSWORD=hive\n\nS3_ENDPOINT=http://minio:9000\nS3_REGION=eu-central-1\n```\n","description":"Complete reference for all environment variables used by LakeQL packages.","keywords":["environment","variables","configuration","complete","reference"]}
{"schemaVersion":"1.0.0","docId":"lakeql/getting-started/first-run","source":"lakeql","slug":"getting-started/first-run","path":"/docs/lakeql/getting-started/first-run","raw_path":"/raw/lakeql/getting-started/first-run.md","title":"First Run","headings":[{"level":2,"text":"Pulling a Table Schema","id":"pulling-a-table-schema"},{"level":2,"text":"Starting the API Server","id":"starting-the-api-server"},{"level":2,"text":"Executing a Query","id":"executing-a-query"},{"level":2,"text":"What Happens Under the Hood","id":"what-happens-under-the-hood"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/lakeql/getting-started/first-run/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: First Run\ndescription: Walk through pulling your first schema, understanding the generated files, and executing a GraphQL query.\n---\n\n## Pulling a Table Schema\n\nAfter configuring your environment variables, pull a schema from Trino:\n\n<Command variant=\"run\">cli pull --catalog hive --schema sales</Command>\n\nThis command introspects every table in `hive.sales` and generates five files per table.\n\n### Generated Files\n\nFor a table called `orders`, the CLI creates:\n\n```\nsrc/schemas/generated/hive/sales/orders/\n├── config.ts            # Table metadata and docs settings\n├── interface.ts         # TypeScript interface for the table\n├── query-schema.ts      # Pothos query schema with resolvers\n├── json-schema.json     # JSON Schema for response transformation\n└── endpoint.json        # Endpoint definition (for re-generation)\n```\n\n**config.ts** — Contains the table's catalog, schema, table name, and docs settings:\n\n```ts\nexport const hiveConfig = {\n  catalog: \"hive\",\n  schema: \"sales\",\n  tableName: \"orders\",\n} as const\n\nexport const docsConfig = {\n  query: true,\n  mutation: false,\n  queryName: \"orders\",\n  mutationName: null,\n} as const\n```\n\n**interface.ts** — A TypeScript interface 
matching the table's column types:\n\n```ts\nexport interface Orders {\n  id: number\n  customer_id: number\n  status: string\n  total: number\n  created_at: Date\n}\n```\n\n**query-schema.ts** — A Pothos query schema defining the GraphQL type, comparison inputs, and resolver with filtering, sorting, and pagination:\n\n```ts\nimport { builder } from \"@lakeql/api\"\nimport { hiveConfig } from \"./config\"\n\n// Defines: OrdersType, OrdersFilter, OrdersSort, OrdersConnection\n// Registers: Query.orders(filter, sorting, paging) → OrdersConnection\n```\n\n**json-schema.json** — Used at runtime by `@lakeql/response-transformer` to map Trino's array responses into typed objects:\n\n```json\n{\n  \"type\": \"object\",\n  \"properties\": {\n    \"id\": { \"type\": \"integer\" },\n    \"customer_id\": { \"type\": \"integer\" },\n    \"status\": { \"type\": \"string\" },\n    \"total\": { \"type\": \"number\" },\n    \"created_at\": { \"type\": \"string\" }\n  }\n}\n```\n\n## Starting the API Server\n\nWith schemas generated, start the server:\n\n<Command variant=\"run\">dev</Command>\n\nThe server loads all query schemas from the configured `schemaPath` directory and registers them with the Pothos builder. On startup you'll see:\n\n```\n* Server URL: http://localhost:4000/\n* GraphQL Endpoint: http://localhost:4000/graphql\n```\n\n## Executing a Query\n\nOpen GraphiQL at `http://localhost:4000/graphql` and run:\n\n```graphql\nquery {\n  orders(\n    filter: { and: [{ status: { eq: \"shipped\" } }] }\n    paging: { limit: 10, offset: 0 }\n    sorting: [{ field: \"created_at\", direction: \"DESC\" }]\n  ) {\n    totalCount\n    pageInfo {\n      hasNext\n      hasPrevious\n      currentPage\n      maxPages\n      nextPage\n      previousPage\n    }\n    nodes {\n      id\n      customer_id\n      status\n      total\n    }\n  }\n}\n```\n\n<Note>\n  If `AUTH_MOCK=true`, include the `Authorization` header with your\n  `AUTH_MOCK_TOKEN` value. 
You can also set `x-username` to simulate different\n  users.\n</Note>\n\n### Example Response\n\n```json\n{\n  \"data\": {\n    \"orders\": {\n      \"totalCount\": 142,\n      \"pageInfo\": {\n        \"hasNext\": true,\n        \"hasPrevious\": false,\n        \"currentPage\": 1,\n        \"maxPages\": 15,\n        \"nextPage\": 2,\n        \"previousPage\": null\n      },\n      \"nodes\": [\n        {\n          \"id\": 1001,\n          \"customer_id\": 42,\n          \"status\": \"shipped\",\n          \"total\": 249.99\n        },\n        {\n          \"id\": 998,\n          \"customer_id\": 17,\n          \"status\": \"shipped\",\n          \"total\": 89.5\n        }\n      ]\n    }\n  }\n}\n```\n\n## What Happens Under the Hood\n\n1. GraphQL Yoga receives the query and resolves the `orders` field\n2. The resolver extracts selected fields from the GraphQL resolve info\n3. `@lakeql/query-builder` generates a Trino SQL query with two CTEs (`total_count` and `records`)\n4. `@lakeql/trino-client` executes the SQL against your Trino instance\n5. `@lakeql/response-transformer` converts Trino's array response into typed objects using the JSON schema\n6. Pagination metadata is calculated and the connection response is returned\n","description":"Walk through pulling your first schema, understanding the generated files, and executing a GraphQL query.","keywords":["first","pulling","schema","executing","query"]}
{"schemaVersion":"1.0.0","docId":"lakeql/getting-started/prerequisites","source":"lakeql","slug":"getting-started/prerequisites","path":"/docs/lakeql/getting-started/prerequisites","raw_path":"/raw/lakeql/getting-started/prerequisites.md","title":"Prerequisites","headings":[{"level":2,"text":"System Requirements","id":"system-requirements"},{"level":2,"text":"Trino Instance","id":"trino-instance"},{"level":2,"text":"Hive Catalog","id":"hive-catalog"},{"level":2,"text":"Verify Your Setup","id":"verify-your-setup"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/lakeql/getting-started/prerequisites/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: Prerequisites\ndescription: System requirements and dependencies needed to run LakeQL.\n---\n\n## System Requirements\n\nBefore getting started with LakeQL, make sure your environment meets these requirements:\n\n| Requirement    | Version                          | Notes                                       |\n| -------------- | -------------------------------- | ------------------------------------------- |\n| Node.js        | >= 22                            | Required for ESM and modern APIs            |\n| pnpm           | Latest                           | Recommended package manager (npm works too) |\n| Trino          | Any recent version               | Accessible via HTTP                         |\n| Hive Metastore | Compatible with your Trino setup | At least one schema with tables             |\n\n## Trino Instance\n\nYou need a running Trino instance that LakeQL can connect to. 
Options:\n\n- **Docker** — Run Trino locally with `docker run -p 8080:8080 trinodb/trino`\n- **minitrino** — Use [minitrino](https://github.com/jefflester/minitrino) for a full local lakehouse stack (Trino + Hive Metastore + MinIO)\n- **Existing cluster** — Connect to your team's Trino deployment\n\n<Note>\n  LakeQL uses the Trino REST API for both introspection (CLI) and query\n  execution (API). Make sure your Trino instance is reachable over HTTP/HTTPS\n  from your development machine.\n</Note>\n\n## Hive Catalog\n\nLakeQL generates schemas from tables registered in a Hive catalog. You need at minimum:\n\n- A Hive catalog configured in Trino (typically named `hive`)\n- At least one schema within that catalog\n- At least one table with columns in that schema\n\nIf you're using a local Docker setup, you can create test tables using Trino's SQL interface:\n\n```sql\nCREATE SCHEMA IF NOT EXISTS hive.myschema\nWITH (location = 's3a://warehouse/myschema');\n\nCREATE TABLE IF NOT EXISTS hive.myschema.users (\n  id BIGINT,\n  name VARCHAR,\n  email VARCHAR,\n  created_at TIMESTAMP\n)\nWITH (format = 'PARQUET');\n```\n\n## Verify Your Setup\n\nConfirm that Trino is accessible:\n\n```bash\n# Check Trino is responding\ncurl http://localhost:8080/v1/info\n```\n\nYou should see a JSON response with cluster information. Once this works, you're ready to scaffold a LakeQL project.\n","description":"System requirements and dependencies needed to run LakeQL.","keywords":["system","requirements","prerequisites","dependencies","needed"]}
{"schemaVersion":"1.0.0","docId":"lakeql/getting-started/quickstart","source":"lakeql","slug":"getting-started/quickstart","path":"/docs/lakeql/getting-started/quickstart","raw_path":"/raw/lakeql/getting-started/quickstart.md","title":"Quickstart","headings":[{"level":2,"text":"Create a new project","id":"create-a-new-project"},{"level":2,"text":"Configure environment variables","id":"configure-environment-variables"},{"level":2,"text":"Pull schemas from Trino","id":"pull-schemas-from-trino"},{"level":2,"text":"Start the API server","id":"start-the-api-server"},{"level":2,"text":"Run your first query","id":"run-your-first-query"},{"level":2,"text":"What's next","id":"whats-next"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/lakeql/getting-started/quickstart/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: Quickstart\ndescription: Scaffold a new LakeQL project and run your first GraphQL query in under 5 minutes.\n---\n\n## Create a new project\n\nUse `@lakeql/create-app` to bootstrap a project with all dependencies pre-configured:\n\n<Command variant=\"create\">@lakeql/create-app@latest my-api</Command>\n\nFollow the interactive prompts to select your package manager and install dependencies.\n\n## Configure environment variables\n\nNavigate into your project and create a `.env` file from the template:\n\n```bash\ncd my-api\ncp .env.example .env\n```\n\nEdit `.env` with your Trino connection details:\n\n```bash\nHIVE_HOST=\"http://localhost\"\nHIVE_PORT=8080\nHIVE_CATALOG=hive\nHIVE_USERNAME=\"trino\"\nHIVE_PASSWORD=\"your-password\"\n\nAUTH_MOCK=true\nAUTH_MOCK_TOKEN=\"1234\"\n\nAPI_PORT=4000\nAPI_LOGGER=\"info\"\n```\n\n## Pull schemas from Trino\n\nGenerate TypeScript schemas from your Trino tables using the built-in CLI alias:\n\n<Command variant=\"run\">cli pull --catalog hive --schema myschema</Command>\n\nThis introspects every table in `hive.myschema` and generates typed 
config, interface, and query-schema files under `src/schemas/generated/`.\n\n## Start the API server\n\nLaunch the GraphQL server:\n\n<Command variant=\"run\">dev</Command>\n\nYou should see output like:\n\n```\n* Server URL: http://localhost:4000/\n* GraphQL Endpoint: http://localhost:4000/graphql\n```\n\n## Run your first query\n\nOpen [http://localhost:4000/graphql](http://localhost:4000/graphql) in your browser to access GraphiQL.\n\nSet the authorization header in the headers panel:\n\n```json\n{\n  \"Authorization\": \"1234\"\n}\n```\n\nRun a query against one of your generated schemas:\n\n```graphql\nquery {\n  users(paging: { page: 1, perPage: 5 }) {\n    totalCount\n    pageInfo {\n      hasNext\n      currentPage\n      maxPages\n    }\n    nodes {\n      id\n      name\n      email\n    }\n  }\n}\n```\n\n## What's next\n\n- Learn about [environment configuration](/docs/lakeql/getting-started/environment-configuration) for all available options\n- Understand the [generated files](/docs/lakeql/getting-started/first-run) in detail\n- Explore the [architecture](/docs/lakeql/architecture/system-overview) to see how the pieces fit together\n","description":"Scaffold a new LakeQL project and run your first GraphQL query in under 5 minutes.","keywords":["project","first","query","quickstart","scaffold"]}
{"schemaVersion":"1.0.0","docId":"lakeql/guides/custom-resolvers","source":"lakeql","slug":"guides/custom-resolvers","path":"/docs/lakeql/guides/custom-resolvers","raw_path":"/raw/lakeql/guides/custom-resolvers.md","title":"Custom Resolvers","headings":[{"level":2,"text":"Adding Custom Resolvers","id":"adding-custom-resolvers"},{"level":2,"text":"Creating a Custom Query","id":"creating-a-custom-query"},{"level":2,"text":"File Discovery","id":"file-discovery"},{"level":2,"text":"Custom Types","id":"custom-types"},{"level":2,"text":"Accessing Context","id":"accessing-context"},{"level":2,"text":"Best Practices","id":"best-practices"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/lakeql/guides/custom-resolvers/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: Custom Resolvers\ndescription: Add custom queries and mutations beyond the generated schema.\n---\n\n## Adding Custom Resolvers\n\nGenerated schemas cover standard CRUD patterns, but you'll often need custom queries, computed fields, or business-logic-specific endpoints. 
LakeQL makes this straightforward — create a new query schema file and it will be picked up automatically.\n\n## Creating a Custom Query\n\nCreate a new file in your schemas directory:\n\n```ts path=\"src/schemas/custom/hive/analytics/analytics-query-schema.ts\"\nimport { builder } from \"@lakeql/api/builder\"\n\nconst AnalyticsResult = builder.objectRef<{\n  period: string\n  totalOrders: number\n  revenue: number\n}>(\"AnalyticsResult\")\n\nAnalyticsResult.implement({\n  fields: (t) => ({\n    period: t.exposeString(\"period\"),\n    totalOrders: t.exposeInt(\"totalOrders\"),\n    revenue: t.exposeFloat(\"revenue\"),\n  }),\n})\n\nbuilder.queryField(\"orderAnalytics\", (t) =>\n  t.field({\n    type: [AnalyticsResult],\n    args: {\n      schema: t.arg.string({ required: true }),\n      startDate: t.arg.string({ required: true }),\n      endDate: t.arg.string({ required: true }),\n    },\n    resolve: async (_root, args, context) => {\n      // Your custom Trino query logic here\n      // Caution: validate args before interpolating them into SQL (e.g. allow-list\n      // the schema name and parse the dates), or this query is open to SQL injection\n      const sql = `\n        SELECT\n          date_format(created_at, '%Y-%m') AS period,\n          COUNT(*) AS total_orders,\n          SUM(total) AS revenue\n        FROM hive.${args.schema}.orders\n        WHERE created_at BETWEEN TIMESTAMP '${args.startDate}' AND TIMESTAMP '${args.endDate}'\n        GROUP BY date_format(created_at, '%Y-%m')\n        ORDER BY period DESC\n      `\n\n      // Execute via trino-client\n      // Return mapped results\n      return []\n    },\n  })\n)\n```\n\n## File Discovery\n\nThe API server loads all schema files from the configured `schemaPath` directory relative to `baseDir`. 
Any file ending in `-schema.ts` that imports and uses `builder` will be included in the final GraphQL schema.\n\n```ts path=\"src/config.ts\"\nimport { defineConfig } from \"@lakeql/api/config\"\nimport { allConfigs } from \"./config-registry\"\n\nconst baseDir = import.meta.dirname\n\nexport const config = defineConfig({\n  allConfigs,\n  baseDir,\n  schemaPath: \"./schemas\",\n  graphqlPath: \"/graphql\",\n  healthCheckEndpoint: \"/live\",\n  port: 4000,\n})\n```\n\n<Note>\n  Custom schema files are loaded alongside generated ones. Make sure your custom\n  types and field names don't conflict with generated names.\n</Note>\n\n## Custom Types\n\nDefine custom GraphQL types that aren't tied to a Trino table:\n\n```ts path=\"src/schemas/custom/health-query-schema.ts\"\nimport { builder } from \"@lakeql/api/builder\"\n\nconst HealthStatus = builder.objectRef<{\n  status: string\n  version: string\n  uptime: number\n}>(\"HealthStatus\")\n\nHealthStatus.implement({\n  fields: (t) => ({\n    status: t.exposeString(\"status\"),\n    version: t.exposeString(\"version\"),\n    uptime: t.exposeFloat(\"uptime\"),\n  }),\n})\n\nbuilder.queryField(\"health\", (t) =>\n  t.field({\n    type: HealthStatus,\n    resolve: () => ({\n      status: \"healthy\",\n      version: process.env.npm_package_version ?? 
\"unknown\",\n      uptime: process.uptime(),\n    }),\n  })\n)\n```\n\n## Accessing Context\n\nCustom resolvers have full access to the GraphQL context, including the authenticated user:\n\n```ts\nimport { builder } from \"@lakeql/api/builder\"\n\n// UserProfile (an object type) and fetchProfile (a data-access helper) are\n// placeholders for code defined elsewhere in your project\n\nbuilder.queryField(\"myProfile\", (t) =>\n  t.field({\n    type: UserProfile,\n    resolve: async (_root, _args, context) => {\n      if (!context.currentUser) {\n        throw new Error(\"Not authenticated\")\n      }\n\n      // Use context.currentUser.userName\n      return fetchProfile(context.currentUser.userName)\n    },\n  })\n)\n```\n\n## Best Practices\n\n- **Separate files** — Keep custom resolvers in `schemas/custom/` so `lakeql-cli pull` won't overwrite them\n- **Name carefully** — Prefix custom types to avoid conflicts (e.g. `CustomAnalyticsResult` vs generated `OrdersResult`)\n- **Use the builder** — Always import `builder` from `@lakeql/api/builder` to ensure types are registered in the same schema\n- **Check permissions** — Use auth scopes or implement your own auth checks in custom resolvers\n","description":"Add custom queries and mutations beyond the generated schema.","keywords":["custom","resolvers","queries","mutations","beyond"]}
{"schemaVersion":"1.0.0","docId":"lakeql/guides/deploying","source":"lakeql","slug":"guides/deploying","path":"/docs/lakeql/guides/deploying","raw_path":"/raw/lakeql/guides/deploying.md","title":"Deploying","headings":[{"level":2,"text":"Production Build","id":"production-build"},{"level":2,"text":"Docker","id":"docker"},{"level":2,"text":"Docker Compose","id":"docker-compose"},{"level":2,"text":"Environment Variables","id":"environment-variables"},{"level":2,"text":"Health Check","id":"health-check"},{"level":2,"text":"Production Checklist","id":"production-checklist"},{"level":2,"text":"Scaling","id":"scaling"},{"level":2,"text":"Platform-Specific Notes","id":"platform-specific-notes"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/lakeql/guides/deploying/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: Deploying\ndescription: Build and deploy a LakeQL API to production with Docker, Compose, and environment variable management.\n---\n\n## Production Build\n\nLakeQL projects use `tsdown` for building TypeScript into production-ready JavaScript:\n\n```bash\npnpm run build\n```\n\nThis produces optimized output in the `dist/` directory. The entry point is `dist/index.mjs`.\n\n```bash\n# Run the production build locally\nnode dist/index.mjs\n```\n\n## Docker\n\nThe app template includes a multi-stage `Dockerfile` that produces a minimal production image:\n\n```dockerfile path=\"Dockerfile\"\n# Build stage\nFROM node:24-alpine AS builder\n\nWORKDIR /app\n\nARG PNPM_VERSION=11\nRUN npm i -g pnpm@${PNPM_VERSION} && \\\n    pnpm config set store-dir ~/.pnpm-store\n\n# Install dependencies\nCOPY package.json pnpm-lock.yaml ./\nRUN pnpm install --frozen-lockfile\n\n# Copy source and build\nCOPY . 
.\nRUN pnpm run build\n\n# Production stage\nFROM node:24-alpine AS runner\n\nWORKDIR /app\n\nENV NODE_ENV=production\n\n# OCI image labels for traceability\nARG COMMIT_SHA\nARG SOURCE_URL\nARG VERSION\nARG LICENSE\nARG AUTHORS\n\nLABEL org.opencontainers.image.revision=${COMMIT_SHA} \\\n      org.opencontainers.image.source=${SOURCE_URL} \\\n      org.opencontainers.image.version=${VERSION} \\\n      org.opencontainers.image.licenses=${LICENSE} \\\n      org.opencontainers.image.authors=${AUTHORS}\n\n# Copy built output, package.json, and node_modules (node_modules still\n# contains the devDependencies installed in the build stage)\nCOPY --from=builder /app/dist ./dist\nCOPY --from=builder /app/package.json ./\nCOPY --from=builder /app/node_modules ./node_modules\n\n# OpenShift compatibility: allow random UID (root group) to access files\nRUN chgrp -R 0 /app && \\\n    chmod -R g=u /app\n\nUSER 1001\n\nEXPOSE 4000\n\nHEALTHCHECK --interval=30s --timeout=3s --retries=3 \\\n  CMD wget --no-verbose --tries=1 --spider http://localhost:4000/live || exit 1\n\nCMD [\"node\", \"dist/index.mjs\"]\n```\n\nBuild and run:\n\n```bash\ndocker build -t my-lakeql-api \\\n  --build-arg COMMIT_SHA=$(git rev-parse HEAD) \\\n  --build-arg SOURCE_URL=$(git remote get-url origin) \\\n  --build-arg VERSION=1.0.0 \\\n  .\ndocker run -p 4000:4000 --env-file .env.production my-lakeql-api\n```\n\n## Docker Compose\n\nThe template includes a `compose.yaml` for running the API alongside MinIO (S3-compatible storage for the mutation write pipeline):\n\n```bash\n# Start all services\ndocker compose up -d\n\n# View logs\ndocker compose logs -f app\n\n# Stop\ndocker compose down\n```\n\nThe Compose setup includes:\n\n- **app** — The LakeQL API server on port 4000\n- **minio** — S3-compatible storage on port 9000 (console on 9001)\n- **minio-init** — Creates the default bucket on startup\n\n<Note>\n  The Compose file does not include Trino or Hive Metastore. These are\n  infrastructure services that should be managed separately. 
Configure the\n  `HIVE_HOST` environment variable to point to your Trino cluster.\n</Note>\n\n## Environment Variables\n\nFor production, set environment variables through your deployment platform rather than `.env` files:\n\n```bash\ndocker run -p 4000:4000 \\\n  -e HIVE_HOST=\"https://trino.prod.internal\" \\\n  -e HIVE_PORT=8446 \\\n  -e HIVE_USERNAME=\"lakeql-service\" \\\n  -e HIVE_PASSWORD=\"$TRINO_PASSWORD\" \\\n  -e HIVE_CATALOG=hive \\\n  -e API_PORT=4000 \\\n  -e API_LOGGER=warn \\\n  -e AUTH_MOCK=false \\\n  -e NODE_ENV=production \\\n  my-lakeql-api\n```\n\n| Variable          | Required | Description                                  |\n| ----------------- | -------- | -------------------------------------------- |\n| `HIVE_HOST`       | Yes      | Trino server URL                             |\n| `HIVE_PORT`       | Yes      | Trino port (default: 8080)                   |\n| `HIVE_CATALOG`    | Yes      | Trino catalog name                           |\n| `HIVE_USERNAME`   | Yes      | Trino username                               |\n| `HIVE_PASSWORD`   | No       | Trino password                               |\n| `HIVE_SOURCE`     | No       | Source identifier (default: \"lakeql\")        |\n| `AUTH_MOCK`       | No       | Enable mock authentication (default: false)  |\n| `AUTH_MOCK_TOKEN` | No       | Token for mock auth                          |\n| `API_PORT`        | No       | Server port (default: 4000)                  |\n| `API_LOGGER`      | No       | Log level: error, warn, info (default: warn) |\n\n<Warning>\n  Never include `.env` files with real credentials in Docker images. 
Use secrets\n  management (AWS Secrets Manager, Vault, Kubernetes secrets) for sensitive\n  values.\n</Warning>\n\n## Health Check\n\nLakeQL exposes a configurable health check endpoint (default: `/live`):\n\n```ts path=\"src/config.ts\"\nimport { defineConfig } from \"@lakeql/api/config\"\nimport { allConfigs } from \"./config-registry\"\n\nconst baseDir = import.meta.dirname\n\nexport const config = defineConfig({\n  allConfigs,\n  baseDir,\n  healthCheckEndpoint: \"/live\",\n  // ...\n})\n```\n\nUse this for container orchestrator probes (Docker, Kubernetes, ECS):\n\n```yaml\n# Kubernetes liveness probe\nlivenessProbe:\n  httpGet:\n    path: /live\n    port: 4000\n  initialDelaySeconds: 5\n  periodSeconds: 10\n```\n\n## Production Checklist\n\n<Stepper>\n<StepperItem title=\"Disable mock auth\">\n\nSet `AUTH_MOCK=false` and implement a real `getUser` resolver with JWT validation in `src/auth.ts`.\n\n</StepperItem>\n<StepperItem title=\"Set appropriate limits\">\n\nConfigure `API_MAX_RECORDS_PER_PAGE` to prevent excessively large responses that could overwhelm Trino or the network.\n\n</StepperItem>\n<StepperItem title=\"Configure logging\">\n\nSet `API_LOGGER=warn` or `API_LOGGER=error` in production. Use `info` only for troubleshooting.\n\n</StepperItem>\n<StepperItem title=\"Secure Trino credentials\">\n\nStore `HIVE_USERNAME` and `HIVE_PASSWORD` in a secrets manager. Rotate credentials regularly.\n\n</StepperItem>\n<StepperItem title=\"Set NODE_ENV\">\n\nAlways set `NODE_ENV=production` for deployments. This affects error messages and performance optimizations.\n\n</StepperItem>\n<StepperItem title=\"Check generated schemas in CI\">\n\nRun `lakeql-cli pull` in your CI pipeline and fail the build if the regenerated files differ from the committed ones, so schemas stay in sync with the target Trino cluster.\n\n</StepperItem>\n</Stepper>\n\n## Scaling\n\n- **Stateless** — LakeQL API servers are stateless. Scale horizontally behind a load balancer.\n- **Connection pooling** — Trino handles connections efficiently. 
Multiple API instances can share the same Trino cluster.\n- **Caching** — Consider adding a caching layer (e.g. a CDN for HTTP responses or Redis for query results) for frequently accessed, slowly changing data.\n- **Rate limiting** — Add rate limiting middleware to protect Trino from excessive query load.\n\n## Platform-Specific Notes\n\n### AWS ECS / Fargate\n\n```bash\n# Build and push to ECR\naws ecr get-login-password --region eu-central-1 | docker login --username AWS --password-stdin <account>.dkr.ecr.eu-central-1.amazonaws.com\ndocker build -t lakeql-api .\ndocker tag lakeql-api:latest <account>.dkr.ecr.eu-central-1.amazonaws.com/lakeql-api:latest\ndocker push <account>.dkr.ecr.eu-central-1.amazonaws.com/lakeql-api:latest\n```\n\n### Google Cloud Run\n\n```bash\ngcloud run deploy lakeql-api \\\n  --source . \\\n  --port 4000 \\\n  --set-env-vars \"HIVE_HOST=...\" \\\n  --region europe-west1\n```\n\n### Kubernetes\n\n```yaml path=\"k8s/deployment.yaml\"\napiVersion: apps/v1\nkind: Deployment\nmetadata:\n  name: lakeql-api\nspec:\n  replicas: 2\n  selector:\n    matchLabels:\n      app: lakeql-api\n  template:\n    metadata:\n      labels:\n        app: lakeql-api\n    spec:\n      containers:\n        - name: lakeql-api\n          image: my-registry/lakeql-api:latest\n          ports:\n            - containerPort: 4000\n          env:\n            - name: HIVE_HOST\n              valueFrom:\n                secretKeyRef:\n                  name: trino-credentials\n                  key: host\n          livenessProbe:\n            httpGet:\n              path: /live\n              port: 4000\n            initialDelaySeconds: 5\n            periodSeconds: 10\n          resources:\n            requests:\n              memory: \"128Mi\"\n              cpu: \"100m\"\n            limits:\n              memory: \"512Mi\"\n              cpu: \"500m\"\n```\n","description":"Build and deploy a LakeQL API to production with Docker, Compose, and environment variable 
management.","keywords":["production","docker","build","compose","environment"]}
{"schemaVersion":"1.0.0","docId":"lakeql/guides/extending-schema","source":"lakeql","slug":"guides/extending-schema","path":"/docs/lakeql/guides/extending-schema","raw_path":"/raw/lakeql/guides/extending-schema.md","title":"Extending Schema","headings":[{"level":2,"text":"Extending Generated Schema","id":"extending-generated-schema"},{"level":2,"text":"Adding Fields to Generated Types","id":"adding-fields-to-generated-types"},{"level":2,"text":"Custom Comparison Types","id":"custom-comparison-types"},{"level":2,"text":"Computed Fields","id":"computed-fields"},{"level":2,"text":"Custom Enum Types","id":"custom-enum-types"},{"level":2,"text":"Tips for Extension Files","id":"tips-for-extension-files"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/lakeql/guides/extending-schema/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: Extending Schema\ndescription: Extend generated GraphQL types with additional fields, custom comparison operators, and computed values.\n---\n\n## Extending Generated Schema\n\nGenerated query schemas provide standard fields, filtering, and pagination. 
You can extend them with additional fields, custom input types, or computed values without modifying generated files.\n\n## Adding Fields to Generated Types\n\nCreate a companion file that adds fields to an existing generated type:\n\n```ts path=\"src/schemas/custom/orders-extensions.ts\"\nimport { builder } from \"@lakeql/api/builder\"\n\n// Reference the generated type by name\nconst OrdersType = builder.objectRef<{ id: number; total: number }>(\"Orders\")\n\n// Add a computed field\nbuilder.objectField(OrdersType, \"formattedTotal\", (t) =>\n  t.string({\n    resolve: (parent) => `$${parent.total.toFixed(2)}`,\n  })\n)\n\n// Add a field that fetches from another source\nbuilder.objectField(OrdersType, \"customerName\", (t) =>\n  t.string({\n    nullable: true,\n    resolve: async (parent) => {\n      // Fetch customer name from a secondary source\n      return null\n    },\n  })\n)\n```\n\n<Note>\n  Extension files must be placed in the schema path directory (e.g.\n  `src/schemas/custom/`). Never place custom code inside generated directories —\n  it will be overwritten on the next `pull`.\n</Note>\n\n## Custom Comparison Types\n\nGenerated schemas include standard comparison operators (`eq`, `neq`, `gt`, `lt`, `like`, etc.). 
You can define custom input types for more specific filtering:\n\n```ts path=\"src/schemas/custom/date-range-input.ts\"\nimport { builder } from \"@lakeql/api/builder\"\n\nexport const DateRangeInput = builder.inputType(\"DateRangeInput\", {\n  fields: (t) => ({\n    from: t.string({ required: true }),\n    to: t.string({ required: true }),\n  }),\n})\n\nexport const AmountRangeInput = builder.inputType(\"AmountRangeInput\", {\n  fields: (t) => ({\n    min: t.float({ required: true }),\n    max: t.float({ required: true }),\n  }),\n})\n```\n\nUse these inputs in custom query fields:\n\n```ts\nimport { builder } from \"@lakeql/api/builder\"\nimport { DateRangeInput, AmountRangeInput } from \"../custom/date-range-input\"\n\nbuilder.queryField(\"ordersByRange\", (t) =>\n  t.field({\n    type: [\"Orders\"],\n    args: {\n      dateRange: t.arg({ type: DateRangeInput, required: true }),\n      amountRange: t.arg({ type: AmountRangeInput }),\n    },\n    resolve: async (_root, args, _context) => {\n      // Build custom query with range filters\n      return []\n    },\n  })\n)\n```\n\n## Computed Fields\n\nAdd fields that derive values from existing data at the GraphQL layer:\n\n```ts path=\"src/schemas/custom/computed-fields.ts\"\nimport { builder } from \"@lakeql/api/builder\"\n\nconst OrdersType = builder.objectRef<{\n  total: number\n  tax_rate: number\n  status: string\n  created_at: Date\n}>(\"Orders\")\n\n// Tax amount computed from total and tax_rate\nbuilder.objectField(OrdersType, \"taxAmount\", (t) =>\n  t.float({\n    resolve: (parent) => parent.total * parent.tax_rate,\n  })\n)\n\n// Human-readable status\nbuilder.objectField(OrdersType, \"statusLabel\", (t) =>\n  t.string({\n    resolve: (parent) => {\n      const labels: Record<string, string> = {\n        pending: \"Pending Review\",\n        shipped: \"Shipped\",\n        delivered: \"Delivered\",\n        cancelled: \"Cancelled\",\n      }\n      return labels[parent.status] ?? 
parent.status\n    },\n  })\n)\n\n// Age in days\nbuilder.objectField(OrdersType, \"ageInDays\", (t) =>\n  t.int({\n    resolve: (parent) => {\n      const now = Date.now()\n      const created = new Date(parent.created_at).getTime()\n      return Math.floor((now - created) / (1000 * 60 * 60 * 24))\n    },\n  })\n)\n```\n\n## Custom Enum Types\n\nDefine enum types for fields that have a fixed set of values:\n\n```ts path=\"src/schemas/custom/enums.ts\"\nimport { builder } from \"@lakeql/api/builder\"\n\nexport const OrderStatus = builder.enumType(\"OrderStatus\", {\n  values: [\n    \"pending\",\n    \"processing\",\n    \"shipped\",\n    \"delivered\",\n    \"cancelled\",\n  ] as const,\n})\n```\n\n## Tips for Extension Files\n\n- **Don't modify generated files** — They'll be overwritten on the next `pull`\n- **Use descriptive file names** — e.g. `orders-extensions.ts`, `orders-computed.ts`\n- **Place in `schemas/custom/`** — Keep custom code separate from generated directories\n- **Test independently** — Extensions can introduce runtime errors if the parent type changes after a `pull`\n","description":"Extend generated GraphQL types with additional fields, custom comparison operators, and computed values.","keywords":["types","generated","fields","custom","extending"]}
{"schemaVersion":"1.0.0","docId":"lakeql/guides/load-strategies","source":"lakeql","slug":"guides/load-strategies","path":"/docs/lakeql/guides/load-strategies","raw_path":"/raw/lakeql/guides/load-strategies.md","title":"Load Strategies","headings":[{"level":2,"text":"Load Strategies","id":"load-strategies"},{"level":2,"text":"full_load","id":"full-load"},{"level":2,"text":"full_load_append","id":"full-load-append"},{"level":2,"text":"append","id":"append"},{"level":2,"text":"Choosing a Strategy","id":"choosing-a-strategy"},{"level":2,"text":"Default Behavior","id":"default-behavior"},{"level":2,"text":"Partition Path Format","id":"partition-path-format"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/lakeql/guides/load-strategies/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: Load Strategies\ndescription: Understand the three load strategies (full_load, full_load_append, append) and when to use each for your mutation pipeline.\n---\n\n## Load Strategies\n\nThe mutation pipeline supports three load strategies that control how data is stored on S3 and how Hive external tables are managed. Choose a strategy based on your data lifecycle needs.\n\n## full_load\n\nReplaces all existing data on every write. 
The pipeline deletes the previous file, uploads a fresh snapshot, and recreates the Hive table.\n\n### File layout\n\n```\n<basePath>/\n  latest.parquet        ← single file, overwritten on each write\n```\n\n### Hive tables\n\n| Table name    | Points to                   |\n| ------------- | --------------------------- |\n| `<tableName>` | `<basePath>/latest.parquet` |\n\n### When to use\n\n- **Dimension tables** — small lookup data that changes infrequently (countries, categories, status codes)\n- **Reference data** — configuration or mapping tables where you always want the full current state\n- **Small datasets** — tables where the complete dataset fits comfortably in a single Parquet file\n\n### Pipeline steps\n\n1. Delete existing data at `<basePath>/latest.parquet`\n2. Upload new Parquet file to `<basePath>/latest.parquet`\n3. DROP TABLE IF EXISTS + CREATE TABLE pointing to latest\n\n### Example\n\n```json\n{\n  \"mutation\": {\n    \"loadStrategy\": \"full_load\",\n    \"type\": \"s3\",\n    \"bucket\": \"my-datalake\",\n    \"basePath\": \"warehouse/config/status_codes\"\n  }\n}\n```\n\n---\n\n## full_load_append\n\nMaintains both a latest snapshot and a historical log. Each write updates the current state and appends a timestamped copy to the history partition.\n\n### File layout\n\n```\n<basePath>/\n  latest.parquet                              ← current snapshot (overwritten)\n  all.parquet/\n    year=2024/month=06/day=15/<uuid>.parquet   ← historical partition\n    year=2024/month=06/day=16/<uuid>.parquet\n    year=2024/month=07/day=01/<uuid>.parquet\n```\n\nThe partition structure above reflects the default `partitioningFormat` of `\"year/month/day\"`. When partitioning is disabled (`partitioning: false`), historical files are written flat under `all.parquet/<uuid>.parquet`. 
When using field-based partitioning, files are grouped by the specified field's date value instead of write timestamp.\n\n### Hive tables\n\n| Table name           | Points to                   |\n| -------------------- | --------------------------- |\n| `<tableName>_latest` | `<basePath>/latest.parquet` |\n| `<tableName>_all`    | `<basePath>/all.parquet/`   |\n\n### When to use\n\n- **Datasets needing both current state and history** — product catalogs, pricing tables, inventory snapshots\n- **Slowly changing dimensions** — where you want to query \"what was the state on date X?\"\n- **Compliance scenarios** — where you need to prove what data looked like at a given point in time\n\n### Pipeline steps\n\n1. Delete existing data at `<basePath>/latest.parquet`\n2. Upload new Parquet file to `<basePath>/latest.parquet`\n3. Upload same Parquet file to `<basePath>/all.parquet/year=YYYY/month=MM/day=DD/<uuid>.parquet`\n4. DROP + CREATE both `_latest` and `_all` tables (with rollback on partial failure)\n\n### Example\n\n```json\n{\n  \"mutation\": {\n    \"loadStrategy\": \"full_load_append\",\n    \"type\": \"s3\",\n    \"bucket\": \"my-datalake\",\n    \"basePath\": \"warehouse/products/catalog\"\n  }\n}\n```\n\n<Note>\n  The `_all` table is partitioned by date, so you can query historical snapshots\n  efficiently using `WHERE year = '2024' AND month = '06'`.\n</Note>\n\n---\n\n## append\n\nOnly adds data. Never deletes or overwrites existing files. Each write creates a new partition file.\n\n### File layout\n\n```\n<basePath>/\n  all.parquet/\n    year=2024/month=06/day=15/<uuid-1>.parquet\n    year=2024/month=06/day=15/<uuid-2>.parquet   ← multiple writes per day\n    year=2024/month=06/day=16/<uuid-3>.parquet\n```\n\nThe partition structure depends on the `partitioningFormat` setting. When partitioning is disabled (`partitioning: false`), files are written flat under `all.parquet/<uuid>.parquet`. 
When using field-based partitioning, files are grouped by the specified field's date value.\n\n### Hive tables\n\n| Table name    | Points to                 |\n| ------------- | ------------------------- |\n| `<tableName>` | `<basePath>/all.parquet/` |\n\n### When to use\n\n- **Event streams** — clickstream data, page views, user interactions\n- **Time-series data** — sensor readings, metrics, measurements\n- **Audit logs** — system events, change logs, access records\n- **Any write-once data** — where historical records should never be modified\n\n### Pipeline steps\n\n1. Upload Parquet file to `<basePath>/all.parquet/year=YYYY/month=MM/day=DD/<uuid>.parquet`\n2. Recreate single Hive table pointing to `<basePath>/all.parquet/`\n\n### Example\n\n```json\n{\n  \"mutation\": {\n    \"loadStrategy\": \"append\",\n    \"type\": \"s3\",\n    \"bucket\": \"my-datalake\",\n    \"basePath\": \"warehouse/raw/click_events\"\n  }\n}\n```\n\n---\n\n## Choosing a Strategy\n\n| Question                                      | Recommendation     |\n| --------------------------------------------- | ------------------ |\n| Do I need historical versions of the data?    | `full_load_append` |\n| Is the data write-once (events, logs)?        | `append`           |\n| Is this a small lookup/config table?          | `full_load`        |\n| Do I need both \"current state\" and \"history\"? | `full_load_append` |\n| Is the dataset large and ever-growing?        | `append`           |\n| Will I always send the full dataset?          | `full_load`        |\n\n## Default Behavior\n\nWhen no `loadStrategy` is specified in the mutation configuration, the pipeline defaults to `full_load`. 
This default suits most use cases because the table always reflects the latest complete dataset. Each write discards the previous data, so every write must contain the full dataset.\n\n## Partition Path Format\n\nFor strategies that use partitioned storage (`full_load_append` and `append`), the partition path format is configurable via the `partitioningFormat` field:\n\n| `partitioningFormat` | Path structure                             |\n| -------------------- | ------------------------------------------ |\n| `\"year/month/day\"`   | `year=2024/month=06/day=15/<uuid>.parquet` |\n| `\"year/month\"`       | `year=2024/month=06/<uuid>.parquet`        |\n| `\"year\"`             | `year=2024/<uuid>.parquet`                 |\n\nThe default is `\"year/month/day\"`. The UUID filename prevents collisions when multiple writes occur within the same partition. Date-based partitioning enables efficient partition pruning when querying historical data.\n\n### Partitioning modes\n\nThe `partitioning` field controls whether files are partitioned and which values determine the partition paths:\n\n- **`true`** (default) — Partitions by the write timestamp. A `load_timestamp` field (DateTime, readOnly) is added to the endpoint definition and populated automatically at runtime. Additionally, `load_timestamp_year` (Integer, readOnly) and `load_timestamp_month` (Integer, readOnly) are injected as materialized partition columns for direct Parquet filtering without Hive metastore awareness.\n- **`false`** — Disables partitioning entirely. Files are written flat under `all.parquet/<uuid>.parquet` with no date directories.\n- **`\"field_name\"`** — Partitions by the value of the named date/datetime field in the record data. No `load_timestamp` field is added.\n- **Custom format string** (e.g. `\"customer_id/event_date:year/event_date:month\"`) — Partitions by a combination of fields and date components. 
See the [Mutations guide](/docs/lakeql/guides/mutations#custom-partition-format) for syntax details.\n\n<Note>\n  The `full_load` strategy ignores partitioning entirely — it always writes a\n  single `latest.parquet` file regardless of the `partitioning` or\n  `partitioningFormat` settings.\n</Note>\n","description":"Understand the three load strategies (full_load, full_load_append, append) and when to use each for your mutation pipeline.","keywords":["strategies","fullload","fullloadappend","append","understand"]}
{"schemaVersion":"1.0.0","docId":"lakeql/guides/mutations","source":"lakeql","slug":"guides/mutations","path":"/docs/lakeql/guides/mutations","raw_path":"/raw/lakeql/guides/mutations.md","title":"Mutations","headings":[{"level":2,"text":"Working with Mutations","id":"working-with-mutations"},{"level":2,"text":"The Mutation Pipeline","id":"the-mutation-pipeline"},{"level":2,"text":"Enabling Mutations","id":"enabling-mutations"},{"level":2,"text":"Load Strategies","id":"load-strategies"},{"level":2,"text":"Mutation Configuration Reference","id":"mutation-configuration-reference"},{"level":2,"text":"Partitioning","id":"partitioning"},{"level":2,"text":"Storage Configuration","id":"storage-configuration"},{"level":2,"text":"Generated Resolver","id":"generated-resolver"},{"level":2,"text":"Input Validation","id":"input-validation"},{"level":2,"text":"Write Permission Model","id":"write-permission-model"},{"level":2,"text":"System User Impersonation","id":"system-user-impersonation"},{"level":2,"text":"Error Handling","id":"error-handling"},{"level":2,"text":"Custom Mutations","id":"custom-mutations"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/lakeql/guides/mutations/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: Mutations\ndescription: Enable write operations with the mutation pipeline, configure load strategies, and use system user impersonation.\n---\n\n## Working with Mutations\n\nLakeQL provides a complete mutation pipeline that handles data persistence end-to-end. When you configure mutation support in your endpoint definition, the generated resolver automatically handles validation, Parquet conversion, S3 upload, and Hive table management — no manual stub implementation needed.\n\n## The Mutation Pipeline\n\nWhen a mutation is executed, the generated resolver runs through this pipeline:\n\n1. **Input arrives** via GraphQL mutation\n2. 
**Validation** — Input is checked against a generated Zod schema (if field validations are configured)\n3. **Parquet conversion** — Records are serialized into Parquet format via `@lakeql/parquet`\n4. **S3 upload** — The Parquet file is uploaded to the configured base path\n5. **Hive DDL** — The external table is recreated to point at the new data\n\nIf any step fails, the pipeline stops immediately and propagates the error to the GraphQL error layer.\n\n## Enabling Mutations\n\nAdd a `mutation` configuration to your endpoint definition:\n\n```json path=\"my-endpoint.json\"\n{\n  \"version\": \"1.0\",\n  \"tableName\": \"user_events\",\n  \"catalog\": \"hive\",\n  \"schema\": \"analytics\",\n  \"fields\": [\n    {\n      \"name\": \"email\",\n      \"type\": \"String\",\n      \"options\": {\n        \"required\": true,\n        \"validations\": [{ \"type\": \"email\" }]\n      }\n    },\n    {\n      \"name\": \"age\",\n      \"type\": \"Integer\",\n      \"options\": {\n        \"required\": false,\n        \"validations\": [\n          { \"type\": \"min\", \"value\": 0 },\n          { \"type\": \"max\", \"value\": 150 }\n        ]\n      }\n    },\n    { \"name\": \"event_type\", \"type\": \"String\" },\n    { \"name\": \"timestamp\", \"type\": \"DateTime\" }\n  ],\n  \"mutation\": {\n    \"loadStrategy\": \"full_load\",\n    \"type\": \"s3\",\n    \"bucket\": \"my-datalake\",\n    \"basePath\": \"warehouse/analytics/user_events\",\n    \"region\": \"eu-central-1\",\n    \"endpoint\": \"https://s3.eu-central-1.amazonaws.com\"\n  }\n}\n```\n\nThen generate the endpoint:\n\n```bash\nlakeql-cli create-endpoint --from-file ./my-endpoint.json\n```\n\nThis produces a `mutation-schema.ts` with a fully wired resolver that invokes the write pipeline, and a `validations.ts` with the Zod schema for input validation.\n\n## Load Strategies\n\nThe `loadStrategy` field controls how data is persisted. 
Choose based on your use case:\n\n| Strategy           | Best for                           | Behavior                                     |\n| ------------------ | ---------------------------------- | -------------------------------------------- |\n| `full_load`        | Dimension tables, reference data   | Replaces all data on every write             |\n| `full_load_append` | Datasets needing current + history | Maintains latest snapshot and historical log |\n| `append`           | Event streams, audit logs          | Only adds data, never replaces               |\n\nSee the [Load Strategies](/lakeql/guides/load-strategies) guide for detailed explanations and file layout diagrams.\n\n## Mutation Configuration Reference\n\n<InterfaceReference\n  file=\"schema-generator/src/endpoint-schema\"\n  name=\"MutationConfig\"\n/>\n\n## Partitioning\n\nThe `partitioning` and `partitioningFormat` fields control how files are organized in storage. These settings apply to `full_load_append` and `append` strategies only — `full_load` ignores partitioning entirely.\n\n### Write-timestamp partitioning (default)\n\nWhen `partitioning` is `true` (the default), files are partitioned by the current UTC timestamp. A `load_timestamp` field is added to the endpoint definition with `readOnly: true` — it appears in the query schema and Hive DDL but is excluded from the mutation input. The write pipeline populates it automatically at runtime. Additionally, `load_timestamp_year` (Integer) and `load_timestamp_month` (Integer) fields are injected as materialized partition columns, enabling direct filtering in tools that read Parquet files without Hive metastore awareness (e.g. 
Jupyter notebooks with PyArrow, Pandas, or DuckDB).\n\n```json\n{\n  \"fields\": [\n    { \"name\": \"event_type\", \"type\": \"String\" },\n    {\n      \"name\": \"load_timestamp\",\n      \"type\": \"DateTime\",\n      \"options\": { \"readOnly\": true }\n    },\n    {\n      \"name\": \"load_timestamp_year\",\n      \"type\": \"Integer\",\n      \"options\": { \"readOnly\": true }\n    },\n    {\n      \"name\": \"load_timestamp_month\",\n      \"type\": \"Integer\",\n      \"options\": { \"readOnly\": true }\n    }\n  ],\n  \"mutation\": {\n    \"loadStrategy\": \"append\",\n    \"type\": \"s3\",\n    \"bucket\": \"my-datalake\",\n    \"basePath\": \"warehouse/raw/events\",\n    \"partitioning\": true,\n    \"partitioningFormat\": \"year/month/day\"\n  }\n}\n```\n\n<Note>\n  The Endpoint Builder adds the `load_timestamp`, `load_timestamp_year`, and\n  `load_timestamp_month` fields automatically when timestamp partitioning is\n  active. Fields with `readOnly: true` are queryable via GraphQL but cannot be\n  provided as mutation input.\n</Note>\n\n### Disabled partitioning (flat layout)\n\nWhen `partitioning` is `false`, files are written to a single flat directory with no date-based subdirectories. No `load_timestamp` field is added.\n\n```json\n{\n  \"mutation\": {\n    \"loadStrategy\": \"append\",\n    \"type\": \"s3\",\n    \"bucket\": \"my-datalake\",\n    \"basePath\": \"warehouse/raw/events\",\n    \"partitioning\": false\n  }\n}\n```\n\nFile layout with partitioning disabled:\n\n```\n<basePath>/\n  all.parquet/\n    <uuid-1>.parquet\n    <uuid-2>.parquet\n```\n\n### Field-based partitioning\n\nWhen `partitioning` is set to a field name string, records are grouped by the value of that date/datetime field and written to separate partition paths. 
No `load_timestamp` field is added.\n\n```json\n{\n  \"mutation\": {\n    \"loadStrategy\": \"append\",\n    \"type\": \"s3\",\n    \"bucket\": \"my-datalake\",\n    \"basePath\": \"warehouse/raw/events\",\n    \"partitioning\": \"event_date\",\n    \"partitioningFormat\": \"year/month\"\n  }\n}\n```\n\nThe field must exist in every record and contain a valid ISO 8601 date or datetime value. Field names must start with a letter or underscore, contain only alphanumeric characters and underscores, and be 1–64 characters long.\n\n### Custom partition format\n\nFor advanced use cases you can define a custom partition format string that combines multiple fields and date components. The format uses `/` to separate path segments and `:` to extract date components from a field.\n\n```json\n{\n  \"mutation\": {\n    \"loadStrategy\": \"append\",\n    \"type\": \"s3\",\n    \"bucket\": \"my-datalake\",\n    \"basePath\": \"warehouse/raw/events\",\n    \"partitioning\": \"customer_id/event_date:year/event_date:month\"\n  }\n}\n```\n\nThis produces paths like:\n\n```\nall.parquet/\n  customer_id=42/year=2024/month=06/<uuid>.parquet\n  customer_id=99/year=2024/month=06/<uuid>.parquet\n```\n\n#### Format syntax\n\nEach segment is either a plain field name or a field with a date component:\n\n| Segment            | Behavior                                                  |\n| ------------------ | --------------------------------------------------------- |\n| `customer_id`      | Extracts the raw field value → `customer_id=<value>`      |\n| `event_date:year`  | Parses the field as ISO date, extracts year → `year=2024` |\n| `event_date:month` | Extracts month (zero-padded) → `month=06`                 |\n| `event_date:day`   | Extracts day (zero-padded) → `day=15`                     |\n| `ts:hour`          | Extracts hour (zero-padded) → `hour=08`                   |\n| `ts:minute`        | Extracts minute (zero-padded) → `minute=30`               |\n| `ts:second`        | 
Extracts second (zero-padded) → `second=07`               |\n\nSupported date components: `year`, `month`, `day`, `hour`, `minute`, `second`.\n\nRecords are grouped by their resolved partition key — records with the same key are written to the same Parquet file.\n\nWhen using a custom format, the `partitioningFormat` field is ignored (the format is fully determined by the custom string). No `load_timestamp` field is added.\n\n<Note>\n  Fields referenced in the custom format must exist in every record. Date\n  component segments require valid ISO 8601 date or datetime values. A\n  `PartitionFieldError` is thrown if any record has a missing, null, or\n  unparseable field value.\n</Note>\n\n<Note>\n  The `partitioning` and `partitioningFormat` fields are not included in the\n  generated config when `loadStrategy` is `full_load`, since full-load always\n  overwrites the entire dataset as a single file.\n</Note>\n\n## Storage Configuration\n\nThe `type` field in the mutation configuration selects the storage adapter used by the write pipeline.\n\n### S3 (default)\n\nS3 is the default storage backend. Credentials and region are read from standard AWS environment variables:\n\n| Environment Variable    | Description                   |\n| ----------------------- | ----------------------------- |\n| `AWS_ACCESS_KEY_ID`     | AWS access key                |\n| `AWS_SECRET_ACCESS_KEY` | AWS secret key                |\n| `AWS_DEFAULT_REGION`    | Default region (fallback)     |\n| `AWS_ENDPOINT_URL`      | Custom S3 endpoint (fallback) |\n\n```json\n{\n  \"mutation\": {\n    \"loadStrategy\": \"full_load\",\n    \"type\": \"s3\",\n    \"bucket\": \"my-datalake\",\n    \"basePath\": \"warehouse/analytics/user_events\"\n  }\n}\n```\n\n### MinIO (local development)\n\nMinIO is supported for local development and self-hosted S3-compatible storage. The `endpoint` field is required when using MinIO. 
Credentials are read from MinIO-specific environment variables:\n\n| Environment Variable      | Description                 |\n| ------------------------- | --------------------------- |\n| `MINIO_ACCESS_KEY_ID`     | MinIO access key            |\n| `MINIO_SECRET_ACCESS_KEY` | MinIO secret key            |\n| `MINIO_ENDPOINT`          | MinIO server URL (fallback) |\n\n```json\n{\n  \"mutation\": {\n    \"loadStrategy\": \"full_load\",\n    \"type\": \"minio\",\n    \"bucket\": \"local-datalake\",\n    \"basePath\": \"warehouse/analytics/user_events\",\n    \"endpoint\": \"http://localhost:9000\"\n  }\n}\n```\n\n<Note>\n  When `type` is `\"minio\"`, the `endpoint` field is required either in the\n  configuration or via the `MINIO_ENDPOINT` environment variable.\n</Note>\n\n## Generated Resolver\n\nThe generated `mutation-schema.ts` integrates with `@lakeql/adapters` to execute the full write pipeline. Here's what the generated code looks like conceptually:\n\n```ts path=\"schemas/custom/hive/analytics/user_events/mutation-schema.ts\"\nimport { builder } from \"@lakeql/api\"\nimport { executeWritePipeline } from \"@lakeql/adapters\"\nimport { validationSchema } from \"./validations\"\nimport jsonSchema from \"./json-schema.json\"\n\n// GraphQL input type generated from field definitions\nconst UserEventsInput = builder.inputType(\"UserEventsInput\", {\n  fields: (t) => ({\n    email: t.string({ required: true }),\n    age: t.int({ required: false }),\n    event_type: t.string({ required: false }),\n    timestamp: t.string({ required: false }),\n  }),\n})\n\nbuilder.mutationField(\"writeUserEvents\", (t) =>\n  t.field({\n    type: \"Boolean\",\n    args: {\n      input: t.arg({ type: [UserEventsInput], required: true }),\n    },\n    resolve: async (_root, args, context) => {\n      // 1. Validate input against Zod schema\n      const validated = validationSchema.parse(args.input)\n\n      // 2. 
Execute the write pipeline\n      await executeWritePipeline({\n        records: validated,\n        jsonSchema,\n        config: {\n          loadStrategy: \"full_load\",\n          basePath: \"warehouse/analytics/user_events\",\n          s3: context.s3Config,\n          table: {\n            catalog: \"hive\",\n            schema: \"analytics\",\n            tableName: \"user_events\",\n          },\n        },\n      })\n\n      return true\n    },\n  })\n)\n```\n\n<Note>\n  You don't write this code manually — it's generated by `create-endpoint`. The\n  example above shows what the generated resolver does under the hood.\n</Note>\n\n## Input Validation\n\nWhen fields have `options.validations` configured, a `validations.ts` file is generated with a Zod schema. The mutation resolver validates input before invoking the pipeline:\n\n- If validation passes, the pipeline proceeds\n- If validation fails, Zod errors are returned as GraphQL field errors without any data being written\n\nSee the [create-endpoint](/cli/commands/create-endpoint) reference for the full list of available validations.\n\n## Write Permission Model\n\nLakeQL's write permission model is **default-deny**. 
Users must have explicit `Mutation` rules to perform any write operation.\n\nThis design is intentional:\n\n- Write operations are more sensitive than reads\n- Writes often execute via a shared system user in Trino\n- Each table typically has a defined data owner (source system)\n- Explicit rules enforce data ownership boundaries\n\n### Configuring Write Permissions\n\n```ts\nimport { defineConfig } from \"@lakeql/api\"\nimport { allConfigs } from \"./config-registry\"\n\nexport default defineConfig({\n  allConfigs,\n  permissions: [\n    {\n      name: \"ingestion-service\",\n      useSystemUser: true,\n      permissions: {\n        Query: [{ catalog: \"hive\", schema: \"raw\", tables: [\"*\"] }],\n        Mutation: [\n          {\n            catalog: \"hive\",\n            schema: \"raw\",\n            tables: [\"events\", \"user_actions\"],\n          },\n        ],\n      },\n    },\n    {\n      name: \"admin-user\",\n      useSystemUser: false,\n      permissions: {\n        Query: [{ catalog: \"hive\", schema: \"*\", tables: [\"*\"] }],\n        Mutation: [{ catalog: \"hive\", schema: \"config\", tables: [\"*\"] }],\n      },\n    },\n  ],\n})\n```\n\n### Permission Resolution\n\nFor write operations, the permission check follows this logic:\n\n1. **No authenticated user** → Denied\n2. **No `Mutation` rules for this user** → Denied\n3. **Rules exist but don't match catalog/schema/table** → Denied\n4. **Matching rule found** → Allowed\n\n## System User Impersonation\n\nWrite statements in Trino often need to execute as a system user with broad permissions, even when the request originates from a regular user.\n\nWhen `useSystemUser: true` is set for a permission entry, the Trino client uses system credentials for that user's write operations instead of their own identity. 
This means:\n\n- Trino sees the system user as the executor\n- The application layer controls which tables a given client can write to\n- Audit trails should track both the requesting user and the executing identity\n\n```ts\n{\n  name: \"etl-pipeline\",\n  useSystemUser: true,  // Execute writes as system user in Trino\n  permissions: {\n    Query: [],\n    Mutation: [\n      { catalog: \"hive\", schema: \"staging\", tables: [\"*\"] }\n    ]\n  }\n}\n```\n\n<Note>\n  When `useSystemUser` is `false`, the write is executed with the authenticated\n  user's Trino credentials. This requires that user to exist in Trino with\n  appropriate table privileges.\n</Note>\n\n## Error Handling\n\nThe mutation pipeline uses fail-fast error propagation:\n\n| Failure point      | Behavior                                                        |\n| ------------------ | --------------------------------------------------------------- |\n| Zod validation     | Returns field errors to GraphQL, no data written                |\n| Parquet conversion | Error propagated, no S3 or DDL operations attempted             |\n| S3 upload          | Error propagated, no DDL operations attempted                   |\n| Hive DDL           | Error propagated (with rollback attempt for `full_load_append`) |\n\nFor `full_load_append`, if one of the two Hive table creations fails, the system attempts a best-effort rollback of both tables before propagating the original error.\n\n<Warning>\n  The pipeline does not provide transactional guarantees across S3 and Hive. If\n  the DDL step fails after a successful S3 upload, the Parquet file remains in\n  S3. Re-running the mutation will overwrite it on the next successful\n  execution.\n</Warning>\n\n## Custom Mutations\n\nFor use cases not covered by the pipeline (e.g., custom SQL, cross-table operations), you can still create manual mutation resolvers. 
See the [Custom Resolvers](/lakeql/guides/custom-resolvers) guide.\n","description":"Enable write operations with the mutation pipeline, configure load strategies, and use system user impersonation.","keywords":["mutations","mutation","write","pipeline","strategies"]}
{"schemaVersion":"1.0.0","docId":"lakeql/introduction/key-concepts","source":"lakeql","slug":"introduction/key-concepts","path":"/docs/lakeql/introduction/key-concepts","raw_path":"/raw/lakeql/introduction/key-concepts.md","title":"Key Concepts","headings":[{"level":2,"text":"Data Lakehouse","id":"data-lakehouse"},{"level":2,"text":"GraphQL-over-Trino","id":"graph-ql-over-trino"},{"level":2,"text":"Schema Introspection","id":"schema-introspection"},{"level":2,"text":"Code Generation","id":"code-generation"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/lakeql/introduction/key-concepts/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: Key Concepts\ndescription: Core concepts behind LakeQL — data lakehouses, GraphQL-over-Trino, schema introspection, and code generation.\n---\n\n## Data Lakehouse\n\nA data lakehouse combines the flexibility of data lakes with the structured query capabilities of data warehouses. In a typical LakeQL setup:\n\n- **Storage** — Data lives in object storage (S3, MinIO) in open formats like Parquet or ORC\n- **Metastore** — Apache Hive Metastore tracks table metadata (schemas, partitions, locations)\n- **Query Engine** — Trino provides SQL access across all catalogs and schemas\n\nLakeQL sits on top of this stack, exposing lakehouse data through a GraphQL API without requiring you to build a separate application layer.\n\n## GraphQL-over-Trino\n\nLakeQL translates incoming GraphQL queries into Trino SQL. 
This translation uses Kysely as a type-safe query builder. A GraphQL query like this:\n\n```graphql\nquery {\n  orders(filter: { status: { eq: \"shipped\" } }, paging: { limit: 10 }) {\n    nodes { id, status, total }\n    pageInfo { hasNext, currentPage }\n  }\n}\n```\n\nbecomes a Trino SQL query like this:\n\n```sql\nWITH total_count AS (\n  SELECT COUNT(*) AS total_records\n  FROM hive.sales.orders\n  WHERE status = 'shipped'\n),\nrecords AS (\n  SELECT id, status, total\n  FROM hive.sales.orders\n  WHERE status = 'shipped'\n  ORDER BY id ASC\n  FETCH FIRST 10 ROWS ONLY\n)\nSELECT * FROM total_count FULL JOIN records ON TRUE\n```\n\nThe query builder handles field selection, WHERE clause generation, pagination (FETCH/OFFSET), and sorting — all derived from the GraphQL resolve info and input arguments.\n\n## Schema Introspection\n\nLakeQL discovers table structures by querying Trino metadata. When you run `lakeql-cli pull`, the CLI:\n\n1. Connects to your Trino instance via the REST API\n2. Executes `SHOW COLUMNS FROM catalog.schema.table` for each table\n3. Parses column type strings (including complex types like `array(row(...))`)\n4. 
Produces structured column definitions with names, types, and nullability\n\nThis keeps your GraphQL schema in sync with the actual state of your lakehouse tables as of the last `pull`.\n\n## Code Generation\n\nFrom introspected metadata, LakeQL generates six artifacts per table:\n\n| File                 | Purpose                                                           |\n| -------------------- | ----------------------------------------------------------------- |\n| `config.ts`          | Table metadata — catalog, schema, table name, column mappings     |\n| `interface.ts`       | TypeScript interface matching the table's column types            |\n| `query-schema.ts`    | Pothos query schema with filtering, sorting, and pagination       |\n| `mutation-schema.ts` | Pothos mutation schema with input types and resolver stub         |\n| `json-schema.json`   | JSON Schema used by the response transformer at runtime           |\n| `endpoint.json`      | Endpoint definition for re-generation via CLI or Endpoint Builder |\n\nThese files are committed to your repository and imported by the API server at startup. When your table schema changes, re-run `pull` to regenerate them.\n\n```text\n# Generated file structure after pulling the \"orders\" table\nsrc/schemas/generated/\n└── sales/\n    └── orders/\n        ├── config.ts\n        ├── interface.ts\n        ├── query-schema.ts\n        ├── mutation-schema.ts\n        ├── json-schema.json\n        └── endpoint.json\n```\n","description":"Core concepts behind LakeQL — data lakehouses, GraphQL-over-Trino, schema introspection, and code generation.","keywords":["concepts","graphql-over-trino","schema","introspection","generation"]}
{"schemaVersion":"1.0.0","docId":"lakeql/introduction/overview","source":"lakeql","slug":"introduction/overview","path":"/docs/lakeql/introduction/overview","raw_path":"/raw/lakeql/introduction/overview.md","title":"Overview","headings":[{"level":2,"text":"What is LakeQL?","id":"what-is-lake-ql"},{"level":2,"text":"The Problem","id":"the-problem"},{"level":2,"text":"The Solution","id":"the-solution"},{"level":2,"text":"Key Benefits","id":"key-benefits"},{"level":2,"text":"How It Works","id":"how-it-works"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/lakeql/introduction/overview/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: Overview\ndescription: LakeQL is a toolkit for generating type-safe GraphQL APIs on top of Trino data lakehouses.\n---\n\n## What is LakeQL?\n\nLakeQL is a toolkit that automatically generates a type-safe GraphQL API on top of your Trino data lakehouse. It introspects your Trino schemas, generates TypeScript interfaces and query schemas, and serves them through a production-ready GraphQL API with built-in pagination, filtering, and sorting.\n\n## The Problem\n\nManually writing GraphQL resolvers for data lakehouse tables is repetitive and error-prone. Every time a table changes, you update interfaces, resolvers, and type definitions by hand. Schema drift, missing fields, and inconsistent filtering logic creep in fast — especially when your lakehouse has dozens of tables across multiple schemas.\n\n## The Solution\n\nLakeQL flips the workflow. 
Instead of writing resolvers manually, you introspect your Trino catalog and generate everything:\n\n- **TypeScript interfaces** from column metadata\n- **Pothos query schemas** with filtering, sorting, and pagination baked in\n- **JSON schemas** for response transformation\n- **A config registry** that ties it all together at runtime\n\nThe generated code is fully typed end-to-end, from GraphQL input to SQL output.\n\nPull a schema from Trino and generate all artifacts:\n\n<Command variant=\"exec\">lakeql-cli pull --catalog hive --schema sales</Command>\n\n## Key Benefits\n\n- **Type safety** — Generated TypeScript interfaces ensure compile-time correctness from GraphQL layer to SQL queries\n- **Automatic schema generation** — Run one CLI command to generate resolvers, types, and configs for entire schemas\n- **Pothos-based schema builder** — Leverage the Pothos GraphQL schema builder for type-safe, declarative schema construction\n- **Built-in pagination** — Connection-based pagination with `pageInfo` metadata out of the box\n- **Flexible filtering** — AND/OR combinable filters with operators like `eq`, `neq`, `like`, `in`, `gt`, `lt`, and more\n- **Sorting** — Multi-field sorting with ASC/DESC direction\n- **Permission model** — Read/write permission checks per table with support for user-level and system-user flows\n- **Kysely SQL generation** — SQL queries are built with Kysely, giving you type-safe query construction without raw strings\n\n## How It Works\n\n```mermaid preview\nflowchart LR\n    trino[\"Trino Catalog\"] -->|introspect| cli[\"LakeQL CLI<br/>(pull)\"]\n    cli -->|generate| ts[\"Generated TypeScript\"]\n    ts -->|load| api[\"LakeQL API<br/>(serve)\"]\n    api --> gql[\"GraphQL Endpoint\"]\n```\n\n1. **Introspect** — The CLI connects to Trino and discovers table structures\n2. **Generate** — TypeScript files are written with interfaces, query schemas, and configs\n3. 
**Serve** — The API server loads generated schemas and exposes a GraphQL endpoint\n4. **Query** — Clients send GraphQL queries that are translated into optimized Trino SQL\n","description":"LakeQL is a toolkit for generating type-safe GraphQL APIs on top of Trino data lakehouses.","keywords":["lakeql","overview","toolkit","generating","type-safe"]}
{"schemaVersion":"1.0.0","docId":"lakeql/introduction/package-map","source":"lakeql","slug":"introduction/package-map","path":"/docs/lakeql/introduction/package-map","raw_path":"/raw/lakeql/introduction/package-map.md","title":"Package Map","headings":[{"level":2,"text":"Packages","id":"packages"},{"level":2,"text":"Dependency Flow","id":"dependency-flow"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/lakeql/introduction/package-map/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: Package Map\ndescription: All packages in the LakeQL monorepo and how they relate to each other.\n---\n\n## Packages\n\nLakeQL is organized as a monorepo with focused, single-responsibility packages.\n\n### @lakeql/api\n\nGraphQL API server built on Hono, GraphQL Yoga, and Pothos. Handles HTTP serving, schema loading, authentication, and permission checks.\n\n### @lakeql/cli\n\nCommand-line tool for schema introspection and code generation. Connects to Trino, discovers tables, and writes TypeScript source files.\n\n### @lakeql/query-builder\n\nKysely-based SQL query generation. Translates GraphQL resolve info (selected fields, filters, sorting, pagination) into Trino-compatible SQL with CTEs.\n\n### @lakeql/trino-client\n\nHTTP client for the Trino REST API. Handles statement submission, polling for results, and authentication (Basic and Bearer).\n\n### @lakeql/schema-generator\n\nGenerates GraphQL model definitions, Hive table DDL, and JSON schemas from parsed column metadata.\n\n### @lakeql/column-parser\n\nParses Trino column type strings (e.g. 
`varchar`, `array(row(id bigint, name varchar))`) into structured type objects.\n\n### @lakeql/response-transformer\n\nTransforms Trino's array-based responses into typed JavaScript objects using JSON Schema definitions.\n\n### @lakeql/file-generator\n\nGenerates TypeScript source files — config, interfaces, and query schemas — from schema-generator output.\n\n### @lakeql/helpers\n\nShared utilities for pagination calculation, code formatting, and object manipulation.\n\n### @lakeql/logger\n\nStructured logging with loglayer and winston. Includes sensitive field redaction.\n\n### @lakeql/create-app\n\nProject scaffolding tool (`@lakeql/create-app`). Downloads a starter template via giget and configures dependencies.\n\n## Dependency Flow\n\nThe packages connect through two main flows:\n\n### CLI Flow (Introspection + Generation)\n\n```mermaid preview\nflowchart LR\n    cli[\"@lakeql/cli\"] --> tc[\"@lakeql/trino-client\"]\n    tc --> cp[\"@lakeql/column-parser\"]\n    cp --> sg[\"@lakeql/schema-generator\"]\n    sg --> fg[\"@lakeql/file-generator\"]\n    cli --> fg\n```\n\n### API Flow (Serving + Querying)\n\n```mermaid preview\nflowchart LR\n    api[\"@lakeql/api\"] --> qb[\"@lakeql/query-builder\"]\n    qb --> tc[\"@lakeql/trino-client\"]\n    tc --> rt[\"@lakeql/response-transformer\"]\n    api --> rt\n```\n\n### Shared Packages\n\nBoth flows use these shared packages:\n\n- `@lakeql/helpers` — Pagination math, formatting utilities\n- `@lakeql/logger` — Structured logging with redaction\n","description":"All packages in the LakeQL monorepo and how they relate to each other.","keywords":["packages","package","lakeql","monorepo","relate"]}
{"schemaVersion":"1.0.0","docId":"query-builder","source":"query-builder","slug":"query-builder","path":"/docs/query-builder","raw_path":"/raw/query-builder.md","title":"Query Builder","headings":[],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/query-builder/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: Query Builder\nnavTitle: Query Builder\ndescription: Translates GraphQL queries into Trino SQL.\nentrypoint: /docs/query-builder/overview/introduction\n---\n","description":"Translates GraphQL queries into Trino SQL.","navTitle":"Query Builder","keywords":["query","builder","translates","graphql","queries"]}
{"schemaVersion":"1.0.0","docId":"query-builder/filtering/combining-filters","source":"query-builder","slug":"filtering/combining-filters","path":"/docs/query-builder/filtering/combining-filters","raw_path":"/raw/query-builder/filtering/combining-filters.md","title":"Combining Filters","headings":[{"level":2,"text":"WhereOperator Enum","id":"where-operator-enum"},{"level":2,"text":"normalizeUserQuery","id":"normalize-user-query"},{"level":2,"text":"normalizeFilter","id":"normalize-filter"},{"level":2,"text":"Processing Pipeline","id":"processing-pipeline"},{"level":2,"text":"Nested Example","id":"nested-example"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/query-builder/filtering/combining-filters/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: Combining Filters\ndescription: How AND/OR logic works and how filters are normalized before query compilation.\n---\n\n## WhereOperator Enum\n\nThe `WhereOperator` enum defines the two logical combinators available for grouping filter conditions:\n\n```ts\nimport { WhereOperator } from \"@lakeql/query-builder\"\n\nenum WhereOperator {\n  AND = \"and\",\n  OR = \"or\",\n}\n```\n\nEvery `Where` object must have either `and` or `or` as its root key. Conditions within an `and` array are combined with SQL `AND`; conditions within an `or` array are combined with SQL `OR`.\n\n## normalizeUserQuery\n\nUser-supplied filter objects might not always follow the strict `{ and: [...] }` or `{ or: [...] }` root format. 
The `normalizeUserQuery` function ensures a consistent structure by wrapping bare field objects into an `and` root.\n\n```ts\nimport { normalizeUserQuery } from \"@lakeql/query-builder\"\n\n// Input: bare fields without a root operator\nconst input = { status: { eq: \"active\" }, region: { eq: \"us-east\" } }\n\n// Output: wrapped in an AND root\nconst normalized = normalizeUserQuery(input)\n// { and: [{ status: { eq: \"active\" } }, { region: { eq: \"us-east\" } }] }\n```\n\n## normalizeFilter\n\nThe `normalizeFilter` function handles another edge case: filter objects where multiple fields are combined in a single array element. It splits multi-field objects into separate entries so each condition is processed independently.\n\n```ts\nimport { normalizeFilter } from \"@lakeql/query-builder\"\n\n// Input: multiple fields in one object\nconst input = {\n  and: [{ status: { eq: \"active\" }, region: { eq: \"us-east\" } }],\n}\n\n// Output: each field in its own object\nconst normalized = normalizeFilter(input)\n// { and: [{ status: { eq: \"active\" } }, { region: { eq: \"us-east\" } }] }\n```\n\n## Processing Pipeline\n\nInside `generateQuery`, both normalization functions are applied before building the WHERE clause:\n\n1. `normalizeUserQuery(userQuery)` — ensures there's a root `and`/`or` key\n2. `normalizeFilter(...)` — splits multi-field objects into single-field entries\n3. 
`conditionBuilder(...)` — recursively walks the tree and builds Kysely expressions\n\n## Nested Example\n\n```ts\nimport type { Where } from \"@lakeql/query-builder\"\n\nconst complexFilter: Where = {\n  and: [\n    { status: { eq: \"active\" } },\n    {\n      or: [\n        { region: { eq: \"us-east\" } },\n        {\n          and: [{ region: { eq: \"eu-west\" } }, { tier: { eq: \"premium\" } }],\n        },\n      ],\n    },\n  ],\n}\n```\n\nThis translates to:\n\n```sql\nWHERE \"status\" = 'active'\n  AND (\n    \"region\" = 'us-east'\n    OR (\"region\" = 'eu-west' AND \"tier\" = 'premium')\n  )\n```\n","description":"How AND/OR logic works and how filters are normalized before query compilation.","keywords":["filters","combining","andor","logic","works"]}
{"schemaVersion":"1.0.0","docId":"query-builder/filtering/operators","source":"query-builder","slug":"filtering/operators","path":"/docs/query-builder/filtering/operators","raw_path":"/raw/query-builder/filtering/operators.md","title":"Operators","headings":[{"level":2,"text":"Supported Operators","id":"supported-operators"},{"level":2,"text":"Pattern Matching Examples","id":"pattern-matching-examples"},{"level":2,"text":"Comparison Examples","id":"comparison-examples"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/query-builder/filtering/operators/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: Operators\ndescription: All supported filter operators and their SQL translations.\n---\n\n## Supported Operators\n\nThe query builder supports a comprehensive set of comparison and pattern-matching operators. Each operator in `FieldOptions` maps to a specific SQL expression.\n\n| Operator        | SQL Output           | Description                               |\n| --------------- | -------------------- | ----------------------------------------- |\n| `eq`            | `= value`            | Equals                                    |\n| `neq`           | `!= value`           | Not equals                                |\n| `in`            | `IN (values)`        | Matches any value in the list             |\n| `notIn`         | `NOT IN (values)`    | Does not match any value in the list      |\n| `lt`            | `< value`            | Less than                                 |\n| `lte`           | `<= value`           | Less than or equal                        |\n| `gt`            | `> value`            | Greater than                              |\n| `gte`           | `>= value`           | Greater than or equal                     |\n| `like`          | `LIKE '%value%'`     | Contains (wraps value with wildcards)     |\n| `notLike`       | `NOT LIKE '%value%'` | Does not contain         
                 |\n| `startsWith`    | `LIKE 'value%'`      | Starts with the given prefix              |\n| `notStartsWith` | `NOT LIKE 'value%'`  | Does not start with the given prefix      |\n| `endsWith`      | `LIKE '%value'`      | Ends with the given suffix                |\n| `notEndsWith`   | `NOT LIKE '%value'`  | Does not end with the given suffix        |\n| `is`            | `= value`            | Equality check (typically for booleans)   |\n| `isNot`         | `!= value`           | Inequality check (typically for booleans) |\n\n## Pattern Matching Examples\n\n```ts\nimport type { Where } from \"@lakeql/query-builder\"\n\n// Contains \"smith\" anywhere in the name\nconst containsFilter: Where = {\n  and: [{ name: { like: \"smith\" } }],\n}\n// SQL: WHERE \"name\" LIKE '%smith%'\n\n// Starts with \"prod-\"\nconst prefixFilter: Where = {\n  and: [{ environment: { startsWith: \"prod-\" } }],\n}\n// SQL: WHERE \"environment\" LIKE 'prod-%'\n\n// Ends with \".csv\"\nconst suffixFilter: Where = {\n  and: [{ filename: { endsWith: \".csv\" } }],\n}\n// SQL: WHERE \"filename\" LIKE '%.csv'\n```\n\n## Comparison Examples\n\n```ts\nimport type { Where } from \"@lakeql/query-builder\"\n\n// Numeric range\nconst rangeFilter: Where = {\n  and: [{ price: { gte: \"10\" } }, { price: { lte: \"100\" } }],\n}\n// SQL: WHERE \"price\" >= '10' AND \"price\" <= '100'\n\n// Membership\nconst membershipFilter: Where = {\n  and: [{ status: { in: [\"active\", \"pending\"] } }],\n}\n// SQL: WHERE \"status\" IN ('active', 'pending')\n```\n","description":"All supported filter operators and their SQL translations.","keywords":["operators","supported","examples","filter","their"]}
{"schemaVersion":"1.0.0","docId":"query-builder/filtering/where-interface","source":"query-builder","slug":"filtering/where-interface","path":"/docs/query-builder/filtering/where-interface","raw_path":"/raw/query-builder/filtering/where-interface.md","title":"Where Interface","headings":[{"level":2,"text":"Overview","id":"overview"},{"level":2,"text":"Example Structure","id":"example-structure"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/query-builder/filtering/where-interface/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: Where Interface\ndescription: The recursive type structure that represents filter conditions.\n---\n\n## Overview\n\nThe query builder uses a recursive type system to represent arbitrarily nested filter conditions. The root `Where` type can contain AND/OR groups, each of which may hold field conditions or further nested groups.\n\n## Example Structure\n\n```ts\nimport type { Where } from \"@lakeql/query-builder\"\n\nconst filter: Where = {\n  and: [\n    { status: { eq: \"active\" } },\n    {\n      or: [{ region: { eq: \"us-east\" } }, { region: { eq: \"eu-west\" } }],\n    },\n    { amount: { gte: \"100\" } },\n  ],\n}\n```\n\nThis produces SQL equivalent to:\n\n```sql\nWHERE \"status\" = 'active'\n  AND (\"region\" = 'us-east' OR \"region\" = 'eu-west')\n  AND \"amount\" >= '100'\n```\n","description":"The recursive type structure that represents filter conditions.","keywords":["structure","where","interface","recursive","represents"]}
{"schemaVersion":"1.0.0","docId":"query-builder/overview/introduction","source":"query-builder","slug":"overview/introduction","path":"/docs/query-builder/overview/introduction","raw_path":"/raw/query-builder/overview/introduction.md","title":"Introduction","headings":[{"level":2,"text":"What is the Query Builder?","id":"what-is-the-query-builder"},{"level":2,"text":"Core Responsibilities","id":"core-responsibilities"},{"level":2,"text":"DummyDriver Approach","id":"dummy-driver-approach"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/query-builder/overview/introduction/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: Introduction\ndescription: Learn how the query builder uses Kysely to generate Trino-compatible SQL from GraphQL queries.\n---\n\n## What is the Query Builder?\n\nThe `@lakeql/query-builder` package is responsible for translating GraphQL query parameters — field selections, filters, sorting, and pagination — into Trino-compatible SQL statements. It uses [Kysely](https://kysely.dev/) as a type-safe SQL query builder under the hood.\n\nThe package never executes queries against a database. Instead, it uses Kysely's `DummyDriver` to compile SQL strings that are later sent to Trino via the `@lakeql/trino-client`. 
This separation keeps query construction fully testable and free of I/O.\n\n## Core Responsibilities\n\n- Extract requested fields from GraphQL resolve info\n- Build parameterized SQL with CTEs for total count and paginated records\n- Apply user-supplied filters (AND/OR logic with multiple operators)\n- Apply sorting and Trino-specific paging syntax\n- Map GraphQL field names to database column names via `transformFields`\n- Wrap date/time columns in `to_unixtime()` for serialization\n\n```ts\nimport { generateQuery, getSelectFields } from \"@lakeql/query-builder\"\n\nconst query = generateQuery({\n  catalog: \"hive\",\n  schema: \"sales\",\n  table: \"orders\",\n  selectFields: [\"id\", \"customer_name\", \"created_at\"],\n  userQuery: { and: [{ status: { eq: \"shipped\" } }] },\n  sorting: [{ field: \"created_at\", direction: \"desc\" }],\n  paging: { limit: 25, offset: 0 },\n  dateFields: [\"created_at\"],\n})\n```\n\n## DummyDriver Approach\n\nKysely normally requires a database dialect with a real driver. The query builder configures Kysely with a `PostgresAdapter`, `PostgresQueryCompiler`, and a `DummyDriver` that never opens a connection. This gives you full SQL compilation (including parameter binding) without any runtime dependency on a database.\n\n```ts\nimport {\n  DummyDriver,\n  Kysely,\n  PostgresAdapter,\n  PostgresIntrospector,\n  PostgresQueryCompiler,\n} from \"kysely\"\n\nconst db = new Kysely({\n  dialect: {\n    createAdapter: () => new PostgresAdapter(),\n    createDriver: () => new DummyDriver(),\n    createIntrospector: (db) => new PostgresIntrospector(db),\n    createQueryCompiler: () => new PostgresQueryCompiler(),\n  },\n})\n```\n","description":"Learn how the query builder uses Kysely to generate Trino-compatible SQL from GraphQL queries.","keywords":["query","builder","introduction","learn","kysely"]}
{"schemaVersion":"1.0.0","docId":"query-builder/sorting-and-paging/paging","source":"query-builder","slug":"sorting-and-paging/paging","path":"/docs/query-builder/sorting-and-paging/paging","raw_path":"/raw/query-builder/sorting-and-paging/paging.md","title":"Paging","headings":[{"level":2,"text":"PagingInput Interface","id":"paging-input-interface"},{"level":2,"text":"Trino Pagination Syntax","id":"trino-pagination-syntax"},{"level":2,"text":"Usage Example","id":"usage-example"},{"level":2,"text":"Default Behavior","id":"default-behavior"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/query-builder/sorting-and-paging/paging/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: Paging\ndescription: Trino-specific pagination using FETCH FIRST/NEXT and OFFSET.\n---\n\n## PagingInput Interface\n\n<InterfaceReference file=\"query-builder/src/index\" name=\"PagingInput\" />\n\n## Trino Pagination Syntax\n\nTrino accepts both `LIMIT` and the SQL-standard `FETCH FIRST N ROWS ONLY` syntax; the query builder always emits the standard `FETCH` form. 
The query builder generates two different patterns depending on whether an offset is provided.\n\n### Without Offset\n\nWhen `offset` is not set or is `0`, the query uses:\n\n```sql\nFETCH FIRST 25 ROWS ONLY\n```\n\n### With Offset\n\nWhen `offset` is a positive number, the query uses:\n\n```sql\nOFFSET 50\nFETCH NEXT 25 ROWS ONLY\n```\n\nNote the difference: `FETCH FIRST` becomes `FETCH NEXT` when an offset is present. Trino treats `FIRST` and `NEXT` as synonyms in this clause, so both forms return the same rows.\n\n## Usage Example\n\n```ts\nimport { generateQuery } from \"@lakeql/query-builder\"\n\n// Example table type\ninterface MyTable {\n  id: number\n  event_type: string\n  timestamp: string\n  user_id: number\n}\n\n// First page: 25 rows, no offset\nconst page1 = generateQuery<MyTable>({\n  catalog: \"hive\",\n  schema: \"analytics\",\n  table: \"events\",\n  selectFields: [\"id\", \"event_type\", \"timestamp\"],\n  userQuery: {},\n  sorting: [{ field: \"timestamp\", direction: \"desc\" }],\n  paging: { limit: 25 },\n})\n\n// Second page: next 25 rows\nconst page2 = generateQuery<MyTable>({\n  catalog: \"hive\",\n  schema: \"analytics\",\n  table: \"events\",\n  selectFields: [\"id\", \"event_type\", \"timestamp\"],\n  userQuery: {},\n  sorting: [{ field: \"timestamp\", direction: \"desc\" }],\n  paging: { limit: 25, offset: 25 },\n})\n```\n\n## Default Behavior\n\nIf no `paging` option is provided, the query builder defaults to `FETCH FIRST 100 ROWS ONLY` with no offset. This prevents unbounded result sets from being returned accidentally.\n","description":"Trino-specific pagination using FETCH FIRST/NEXT and OFFSET.","keywords":["pagination","paging","trino-specific","using","fetch"]}
{"schemaVersion":"1.0.0","docId":"query-builder/sorting-and-paging/sorting","source":"query-builder","slug":"sorting-and-paging/sorting","path":"/docs/query-builder/sorting-and-paging/sorting","raw_path":"/raw/query-builder/sorting-and-paging/sorting.md","title":"Sorting","headings":[{"level":2,"text":"SortInput Interface","id":"sort-input-interface"},{"level":2,"text":"Multiple Sort Fields","id":"multiple-sort-fields"},{"level":2,"text":"transformFields Integration","id":"transform-fields-integration"},{"level":2,"text":"Why Sorting Matters for Paging","id":"why-sorting-matters-for-paging"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/query-builder/sorting-and-paging/sorting/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: Sorting\ndescription: Multi-field sorting with configurable direction for query results.\n---\n\n## SortInput Interface\n\n<InterfaceReference file=\"query-builder/src/index\" name=\"SortInput\" />\n\nEach `SortInput` specifies a column to order by and the sort direction. The `field` property uses Kysely's `SelectExpression` type for type safety, but in practice it's a string matching a column name.\n\n## Multiple Sort Fields\n\nThe `sorting` parameter accepts an array. 
Fields are applied in order — the first entry is the primary sort key, subsequent entries are secondary, tertiary, etc.\n\n```ts\nimport { generateQuery } from \"@lakeql/query-builder\"\n\ninterface OrderTable {\n  id: number\n  status: string\n  created_at: string\n  customerName: string\n}\n\nconst compiled = generateQuery<OrderTable>({\n  catalog: \"hive\",\n  schema: \"sales\",\n  table: \"orders\",\n  selectFields: [\"id\", \"status\", \"created_at\"],\n  userQuery: {},\n  sorting: [\n    { field: \"status\", direction: \"asc\" },\n    { field: \"created_at\", direction: \"desc\" },\n  ],\n})\n```\n\nProduces:\n\n```sql\nORDER BY \"status\" ASC, \"created_at\" DESC\n```\n\n## transformFields Integration\n\nIf a sorted field has a mapping in `transformFields`, the query builder uses the database column name in the ORDER BY clause:\n\n```ts\nimport { generateQuery } from \"@lakeql/query-builder\"\n\ninterface OrderTable {\n  id: number\n  status: string\n  created_at: string\n  customerName: string\n}\n\nconst compiled = generateQuery<OrderTable>({\n  catalog: \"hive\",\n  schema: \"sales\",\n  table: \"orders\",\n  selectFields: [\"id\", \"status\", \"created_at\"],\n  userQuery: {},\n  sorting: [{ field: \"customerName\", direction: \"asc\" }],\n  transformFields: { customerName: \"customer_name\" },\n})\n```\n\nProduces:\n\n```sql\nORDER BY \"customer_name\" ASC\n```\n\n## Why Sorting Matters for Paging\n\nWithout at least one sort field, Trino does not guarantee row ordering between pages, so a client paginating through results might see duplicate rows or miss rows entirely. The same applies to rows that tie on every sort key. When using pagination, end the sort list with a key that uniquely identifies each row (such as a primary key); a timestamp is only safe as the final key if no two rows share a value.\n","description":"Multi-field sorting with configurable direction for query results.","keywords":["sorting","multi-field","configurable","direction","query"]}
{"schemaVersion":"1.0.0","docId":"trino-client","source":"trino-client","slug":"trino-client","path":"/docs/trino-client","raw_path":"/raw/trino-client.md","title":"Trino Client","headings":[],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/trino-client/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: Trino Client\nnavTitle: Trino Client\ndescription: HTTP client for Trino's REST API.\nentrypoint: /docs/trino-client/overview/introduction\n---\n","description":"HTTP client for Trino's REST API.","navTitle":"Trino Client","keywords":["client","trino","trinos"]}
{"schemaVersion":"1.0.0","docId":"trino-client/configuration/client-setup","source":"trino-client","slug":"configuration/client-setup","path":"/docs/trino-client/configuration/client-setup","raw_path":"/raw/trino-client/configuration/client-setup.md","title":"Client Setup","headings":[{"level":2,"text":"Basic Authentication","id":"basic-authentication"},{"level":2,"text":"Bearer Token Authentication","id":"bearer-token-authentication"},{"level":2,"text":"Retry Configuration","id":"retry-configuration"},{"level":2,"text":"Timeout","id":"timeout"},{"level":2,"text":"Default Headers","id":"default-headers"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/trino-client/configuration/client-setup/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: Client Setup\ndescription: Configuration options and authentication for the Trino client.\n---\n\n## Basic Authentication\n\n```ts\nimport { TrinoClient } from \"@lakeql/trino-client\"\n\nconst client = new TrinoClient({\n  host: \"https://trino.example.com\",\n  port: 8443,\n  auth: {\n    type: \"basic\",\n    username: \"analyst\",\n    password: \"s3cr3t\",\n  },\n  catalog: \"hive\",\n  schema: \"sales\",\n})\n```\n\nThe username and password are Base64-encoded and sent as an `Authorization: Basic ...` header on every request.\n\n## Bearer Token Authentication\n\n```ts\nimport { TrinoClient } from \"@lakeql/trino-client\"\n\nconst client = new TrinoClient({\n  host: \"https://trino.example.com\",\n  port: 8443,\n  auth: {\n    type: \"bearer\",\n    token: \"eyJhbGciOiJSUzI1NiIs...\",\n  },\n  catalog: \"iceberg\",\n  source: \"my-etl-service\",\n})\n```\n\nThe token is sent as an `Authorization: Bearer ...` header.\n\n## Retry Configuration\n\nThe client automatically retries on transient failures (HTTP 429, 500, 502, 503, 504 and network errors). 
Configure retry behavior via the `retry` option:\n\n```ts\nimport { TrinoClient } from \"@lakeql/trino-client\"\n\nconst client = new TrinoClient({\n  host: \"https://trino.example.com\",\n  port: 8443,\n  auth: { type: \"basic\", username: \"analyst\", password: \"secret\" },\n  catalog: \"hive\",\n  retry: {\n    maxRetries: 5,\n    initialDelay: 500,\n    maxDelay: 30000,\n    backoffMultiplier: 2,\n  },\n})\n```\n\n<InterfaceReference file=\"trino-client/src/retry\" name=\"RetryConfig\" />\n\n## Timeout\n\nSet a default timeout (in milliseconds) for all requests. If a request exceeds this duration, it will be aborted:\n\n```ts\nimport { TrinoClient } from \"@lakeql/trino-client\"\n\nconst client = new TrinoClient({\n  host: \"https://trino.example.com\",\n  port: 8443,\n  auth: { type: \"basic\", username: \"analyst\", password: \"secret\" },\n  catalog: \"hive\",\n  timeout: 30_000, // 30 seconds\n})\n```\n\nYou can also pass a per-query `signal` for more fine-grained control — see [Executing Queries](/docs/trino-client/queries/executing-queries).\n\n## Default Headers\n\nOn construction, the client sets the following Trino headers automatically:\n\n| Header            | Value                                         |\n| ----------------- | --------------------------------------------- |\n| `X-Trino-Source`  | The `source` option, or `\"nodejs\"` if omitted |\n| `X-Trino-Catalog` | The `catalog` option                          |\n| `X-Trino-Schema`  | The `schema` option (only if provided)        |\n\nYou can modify headers after construction using `setHeader()` or `setRawHeader()`:\n\n```ts\nclient.setHeader(\"X-Trino-Schema\", \"production\")\nclient.setRawHeader(\"X-Custom-Header\", \"value\")\n```\n","description":"Configuration options and authentication for the Trino client.","keywords":["authentication","client","configuration","setup","options"]}
{"schemaVersion":"1.0.0","docId":"trino-client/metadata/inspecting-catalogs","source":"trino-client","slug":"metadata/inspecting-catalogs","path":"/docs/trino-client/metadata/inspecting-catalogs","raw_path":"/raw/trino-client/metadata/inspecting-catalogs.md","title":"Inspecting Catalogs","headings":[{"level":2,"text":"Overview","id":"overview"},{"level":2,"text":"schemas","id":"schemas"},{"level":2,"text":"tables","id":"tables"},{"level":2,"text":"views","id":"views"},{"level":2,"text":"columns","id":"columns"},{"level":2,"text":"Full Discovery Example","id":"full-discovery-example"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/trino-client/metadata/inspecting-catalogs/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: Inspecting Catalogs\ndescription: List schemas, tables, views, and columns from any Trino catalog.\n---\n\n## Overview\n\nThe Trino client provides convenience methods for inspecting catalog metadata. 
Each method executes a SQL query internally and returns a simplified result — typically an array of strings or tuples.\n\nSee the [API Reference](/docs/trino-client/api-reference) for full method signatures and return types.\n\n## schemas\n\nLists all schemas in a catalog.\n\n```ts\nimport { TrinoClient } from \"@lakeql/trino-client\"\n\nconst client = new TrinoClient({\n  host: \"https://trino.example.com\",\n  port: 8443,\n  auth: { type: \"basic\", username: \"analyst\", password: \"secret\" },\n  catalog: \"hive\",\n})\n\nconst schemas = await client.schemas({ catalog: \"hive\" })\n// [\"default\", \"sales\", \"analytics\", \"information_schema\"]\n```\n\n## tables\n\nLists all base tables in a catalog and schema.\n\n```ts\nimport { TrinoClient } from \"@lakeql/trino-client\"\n\nconst client = new TrinoClient({\n  host: \"https://trino.example.com\",\n  port: 8443,\n  auth: { type: \"basic\", username: \"analyst\", password: \"secret\" },\n  catalog: \"hive\",\n})\n\nconst tables = await client.tables({ catalog: \"hive\", schema: \"sales\" })\n// [\"orders\", \"customers\", \"products\", \"invoices\"]\n```\n\n## views\n\nLists all views in a catalog and schema.\n\n```ts\nimport { TrinoClient } from \"@lakeql/trino-client\"\n\nconst client = new TrinoClient({\n  host: \"https://trino.example.com\",\n  port: 8443,\n  auth: { type: \"basic\", username: \"analyst\", password: \"secret\" },\n  catalog: \"hive\",\n})\n\nconst views = await client.views({ catalog: \"hive\", schema: \"sales\" })\n// [\"monthly_revenue\", \"top_customers\"]\n```\n\n## columns\n\nLists all columns for a specific table, including their types and metadata.\n\n```ts\nimport { TrinoClient } from \"@lakeql/trino-client\"\n\nconst client = new TrinoClient({\n  host: \"https://trino.example.com\",\n  port: 8443,\n  auth: { type: \"basic\", username: \"analyst\", password: \"secret\" },\n  catalog: \"hive\",\n})\n\nconst columns = await client.columns({\n  catalog: \"hive\",\n  schema: 
\"sales\",\n  table: \"orders\",\n})\n// [\n//   [\"id\", \"varchar\", \"\", \"Order identifier\"],\n//   [\"amount\", \"double\", \"\", \"Total order amount\"],\n//   [\"created_at\", \"timestamp(3)\", \"\", \"When the order was placed\"],\n// ]\n```\n\n### Typed objects with `asObject`\n\nPass `asObject: true` to get `ColumnInfo[]` objects instead of raw tuples:\n\n```ts\nimport { TrinoClient } from \"@lakeql/trino-client\"\n\nconst client = new TrinoClient({\n  host: \"https://trino.example.com\",\n  port: 8443,\n  auth: { type: \"basic\", username: \"analyst\", password: \"secret\" },\n  catalog: \"hive\",\n})\n\nconst columns = await client.columns({\n  catalog: \"hive\",\n  schema: \"sales\",\n  table: \"orders\",\n  asObject: true,\n})\n// [\n//   { name: \"id\", type: \"varchar\", extra: \"\", description: \"Order identifier\" },\n//   { name: \"amount\", type: \"double\", extra: \"\", description: \"Total order amount\" },\n// ]\n\nfor (const col of columns) {\n  console.log(`${col.name}: ${col.type}`)\n}\n```\n\n## Full Discovery Example\n\n```ts\nimport { TrinoClient } from \"@lakeql/trino-client\"\n\nconst client = new TrinoClient({\n  host: \"https://trino.example.com\",\n  port: 8443,\n  auth: { type: \"bearer\", token: \"eyJhbGciOiJSUzI1NiIs...\" },\n  catalog: \"iceberg\",\n})\n\n// Discover all tables across all schemas\nconst schemas = await client.schemas({ catalog: \"iceberg\" })\n\nfor (const schema of schemas) {\n  if (schema === \"information_schema\") {\n    continue\n  }\n\n  const tables = await client.tables({ catalog: \"iceberg\", schema })\n  console.log(`${schema}: ${tables.join(\", \")}`)\n\n  for (const table of tables) {\n    const columns = await client.columns({ catalog: \"iceberg\", schema, table })\n    console.log(\n      `  ${table}: ${columns.map(([name, type]) => `${name} (${type})`).join(\", \")}`\n    )\n  }\n}\n```\n","description":"List schemas, tables, views, and columns from any Trino 
catalog.","keywords":["schemas","tables","views","columns","inspecting"]}
{"schemaVersion":"1.0.0","docId":"trino-client/overview/introduction","source":"trino-client","slug":"overview/introduction","path":"/docs/trino-client/overview/introduction","raw_path":"/raw/trino-client/overview/introduction.md","title":"Introduction","headings":[{"level":2,"text":"What is the Trino Client?","id":"what-is-the-trino-client"},{"level":2,"text":"Core Capabilities","id":"core-capabilities"},{"level":2,"text":"Quick Example","id":"quick-example"},{"level":2,"text":"How It Works","id":"how-it-works"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/trino-client/overview/introduction/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: Introduction\ndescription: An HTTP client for executing queries and inspecting metadata on Trino clusters.\n---\n\n## What is the Trino Client?\n\nThe `@lakeql/trino-client` package provides a typed HTTP client for communicating with a Trino cluster via its [REST API](https://trino.io/docs/current/develop/client-protocol.html). It handles authentication, paginated result fetching, query cancellation, and catalog metadata inspection.\n\nBuilt on native `fetch` with zero runtime dependencies. 
Supports basic and bearer token authentication, automatic pagination through `nextUri` links, configurable retry with exponential backoff, and an async-generator streaming mode for memory-efficient processing of large result sets.\n\n## Core Capabilities\n\n- **Query execution** — Send SQL statements and collect all result pages into a single array\n- **Streaming** — Yield rows one at a time via an async generator for large datasets\n- **Row transforms** — Map raw row arrays to typed objects with a `transform` function\n- **Query cancellation** — Cancel in-flight queries via `AbortSignal` or `cancelQuery()`\n- **Retry with backoff** — Automatic retry on transient failures (429, 5xx, network errors)\n- **Metadata inspection** — List schemas, tables, views, and columns for any catalog\n- **Authentication** — Basic auth (username/password) and bearer token auth\n- **User impersonation** — Execute queries on behalf of another user via the `X-Trino-User` header\n\n## Quick Example\n\n```ts\nimport { TrinoClient } from \"@lakeql/trino-client\"\n\nconst client = new TrinoClient({\n  host: \"https://trino.example.com\",\n  port: 8443,\n  auth: { type: \"basic\", username: \"analyst\", password: \"secret\" },\n  catalog: \"hive\",\n  schema: \"sales\",\n})\n\nconst rows = await client.query<[string, number]>({\n  sql: \"SELECT region, SUM(amount) FROM orders GROUP BY region\",\n})\n\nconsole.log(rows)\n// [[\"us-east\", 42000], [\"eu-west\", 31500], ...]\n```\n\n## How It Works\n\n1. The client sends a `POST` request to `/v1/statement` with the SQL body\n2. Trino responds with an initial result (possibly empty) and a `nextUri`\n3. The client follows `nextUri` links with `GET` requests until no more pages remain\n4. 
All `data` arrays from each page are concatenated into the final result\n\nThis pagination is handled transparently — you call `query()` and get back the full result set.\n","description":"An HTTP client for executing queries and inspecting metadata on Trino clusters.","keywords":["client","trino","introduction","executing","queries"]}
{"schemaVersion":"1.0.0","docId":"trino-client/queries/executing-queries","source":"trino-client","slug":"queries/executing-queries","path":"/docs/trino-client/queries/executing-queries","raw_path":"/raw/trino-client/queries/executing-queries.md","title":"Executing Queries","headings":[{"level":2,"text":"Basic Usage","id":"basic-usage"},{"level":2,"text":"Row Transforms","id":"row-transforms"},{"level":2,"text":"User Impersonation","id":"user-impersonation"},{"level":2,"text":"Query Cancellation","id":"query-cancellation"},{"level":2,"text":"Error Handling","id":"error-handling"},{"level":2,"text":"How Pagination Works","id":"how-pagination-works"},{"level":2,"text":"QueryProps Reference","id":"query-props-reference"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/trino-client/queries/executing-queries/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: Executing Queries\ndescription: Run SQL statements against Trino and collect all result pages into a single array.\n---\n\n## Basic Usage\n\nThe `query` method executes a SQL statement and automatically follows Trino's `nextUri` pagination links until all result pages have been collected.\n\n```ts\nimport { TrinoClient } from \"@lakeql/trino-client\"\n\nconst client = new TrinoClient({\n  host: \"https://trino.example.com\",\n  port: 8443,\n  auth: { type: \"basic\", username: \"analyst\", password: \"secret\" },\n  catalog: \"hive\",\n  schema: \"sales\",\n})\n\nconst rows = await client.query<[string, string, number]>({\n  sql: \"SELECT id, region, amount FROM orders WHERE status = 'shipped'\",\n})\n\nfor (const [id, region, amount] of rows) {\n  console.log(`Order ${id}: ${region} - $${amount}`)\n}\n```\n\n## Row Transforms\n\nUse the `transform` option to map raw row arrays into typed objects:\n\n```ts\nimport { TrinoClient } from \"@lakeql/trino-client\"\nimport type { Column } from \"@lakeql/trino-client\"\n\nconst client = new 
TrinoClient({\n  host: \"https://trino.example.com\",\n  port: 8443,\n  auth: { type: \"basic\", username: \"analyst\", password: \"secret\" },\n  catalog: \"hive\",\n  schema: \"sales\",\n})\n\ninterface Order {\n  id: string\n  region: string\n  amount: number\n}\n\nconst orders = await client.query<Order>({\n  sql: \"SELECT id, region, amount FROM orders\",\n  transform: (row: unknown[], columns: Column[]) => ({\n    id: row[0] as string,\n    region: row[1] as string,\n    amount: row[2] as number,\n  }),\n})\n\n// orders: Order[]\nconsole.log(orders[0].region)\n```\n\nThe `transform` function receives the raw row array and the column metadata, and returns the desired shape.\n\n## User Impersonation\n\nPass `impersonateAs` to execute the query as a different Trino user. This sets the `X-Trino-User` header for that request only:\n\n```ts\nimport { TrinoClient } from \"@lakeql/trino-client\"\n\nconst client = new TrinoClient({\n  host: \"https://trino.example.com\",\n  port: 8443,\n  auth: { type: \"basic\", username: \"analyst\", password: \"secret\" },\n  catalog: \"hive\",\n})\n\nconst rows = await client.query<[string]>({\n  sql: \"SELECT current_user\",\n  impersonateAs: \"data-team-service\",\n})\n// rows: [[\"data-team-service\"]]\n```\n\n## Query Cancellation\n\nCancel a running query using an `AbortSignal`:\n\n```ts\nimport { TrinoClient, TrinoCancellationError } from \"@lakeql/trino-client\"\n\nconst client = new TrinoClient({\n  host: \"https://trino.example.com\",\n  port: 8443,\n  auth: { type: \"basic\", username: \"analyst\", password: \"secret\" },\n  catalog: \"hive\",\n})\n\nconst controller = new AbortController()\n\n// Cancel after 10 seconds\nsetTimeout(() => controller.abort(), 10_000)\n\ntry {\n  const rows = await client.query({\n    sql: \"SELECT * FROM very_large_table\",\n    signal: controller.signal,\n  })\n} catch (error) {\n  if (error instanceof TrinoCancellationError) {\n    console.log(\"Query was cancelled\")\n  }\n}\n```\n\nYou 
can also cancel queries by ID:\n\n```ts\nimport { TrinoClient } from \"@lakeql/trino-client\"\n\nconst client = new TrinoClient({\n  host: \"https://trino.example.com\",\n  port: 8443,\n  auth: { type: \"basic\", username: \"analyst\", password: \"secret\" },\n  catalog: \"hive\",\n})\n\n// Cancel a specific query\nawait client.cancelQuery(\"20240101_123456_00001_abcde\")\n\n// Cancel all active queries\nawait client.cancelAllQueries()\n\n// Check what's currently running\nconst active = client.getActiveQueries()\n```\n\n## Error Handling\n\nThe client throws typed errors depending on the failure:\n\n```ts\nimport {\n  TrinoClient,\n  TrinoClientError,\n  TrinoQueryError,\n  TrinoCancellationError,\n} from \"@lakeql/trino-client\"\n\nconst client = new TrinoClient({\n  host: \"https://trino.example.com\",\n  port: 8443,\n  auth: { type: \"basic\", username: \"analyst\", password: \"secret\" },\n  catalog: \"hive\",\n})\n\ntry {\n  const rows = await client.query({ sql: \"SELECT * FROM nonexistent_table\" })\n} catch (error) {\n  if (error instanceof TrinoQueryError) {\n    // Trino returned an error in the response body\n    console.error(error.errorName) // e.g. \"TABLE_NOT_FOUND\"\n    console.error(error.errorType) // e.g. \"USER_ERROR\"\n    console.error(error.queryId)\n  } else if (error instanceof TrinoCancellationError) {\n    // Query was cancelled via AbortSignal (checked before the generic HTTP error)\n    console.log(\"Cancelled:\", error.queryId)\n  } else if (error instanceof TrinoClientError) {\n    // HTTP-level error (non-2xx status)\n    console.error(error.statusCode) // e.g. 503\n    console.error(error.message)\n  }\n}\n```\n\n## How Pagination Works\n\n1. A `POST` is sent to `/v1/statement` with the SQL body\n2. Trino returns an initial response with (optionally) data and a `nextUri`\n3. The client follows `nextUri` with `GET` requests, collecting `data` arrays\n4. When no `nextUri` is returned, all pages have been fetched\n5. 
The concatenated result array is returned\n\nThis is entirely transparent — you call `query()` and get the complete result regardless of how many pages Trino needed internally.\n\n## QueryProps Reference\n\n<InterfaceReference file=\"trino-client/src/types\" name=\"QueryProps\" />\n","description":"Run SQL statements against Trino and collect all result pages into a single array.","keywords":["executing","queries","statements","against","trino"]}
{"schemaVersion":"1.0.0","docId":"trino-client/queries/streaming","source":"trino-client","slug":"queries/streaming","path":"/docs/trino-client/queries/streaming","raw_path":"/raw/trino-client/queries/streaming.md","title":"Streaming","headings":[{"level":2,"text":"Basic Usage","id":"basic-usage"},{"level":2,"text":"Streaming with Transforms","id":"streaming-with-transforms"},{"level":2,"text":"Cancelling a Stream","id":"cancelling-a-stream"},{"level":2,"text":"When to Use stream vs query","id":"when-to-use-stream-vs-query"},{"level":2,"text":"Memory Efficiency","id":"memory-efficiency"}],"documentType":"unknown","contentOrigin":"static-doc","canonicalUrl":"/docs/trino-client/queries/streaming/","buildId":"local-1782300258864","generatedAt":"2026-06-24T11:24:18.864Z","content":"---\ntitle: Streaming\ndescription: Stream query results row-by-row using an async generator for memory-efficient processing.\n---\n\n## Basic Usage\n\nThe `stream` method executes a SQL statement and returns an async generator that yields rows one at a time as pages are fetched. 
This makes it well suited to large result sets where holding every row in memory would be impractical.\n\n```ts\nimport { TrinoClient } from \"@lakeql/trino-client\"\n\nconst client = new TrinoClient({\n  host: \"https://trino.example.com\",\n  port: 8443,\n  auth: { type: \"basic\", username: \"analyst\", password: \"secret\" },\n  catalog: \"hive\",\n  schema: \"logs\",\n})\n\nconst stream = await client.stream<[string, string, number]>({\n  sql: \"SELECT timestamp, level, duration FROM request_logs\",\n})\n\nfor await (const [timestamp, level, duration] of stream) {\n  if (duration > 5000) {\n    console.warn(`Slow request at ${timestamp}: ${duration}ms`)\n  }\n}\n```\n\n## Streaming with Transforms\n\nCombine `stream` with a `transform` function to receive typed objects instead of raw row arrays:\n\n```ts\nimport { TrinoClient } from \"@lakeql/trino-client\"\nimport type { Column } from \"@lakeql/trino-client\"\n\nconst client = new TrinoClient({\n  host: \"https://trino.example.com\",\n  port: 8443,\n  auth: { type: \"basic\", username: \"analyst\", password: \"secret\" },\n  catalog: \"hive\",\n  schema: \"logs\",\n})\n\ninterface LogEntry {\n  timestamp: string\n  level: string\n  duration: number\n}\n\nconst stream = await client.stream<LogEntry>({\n  sql: \"SELECT timestamp, level, duration FROM request_logs\",\n  transform: (row: unknown[], columns: Column[]) => ({\n    timestamp: row[0] as string,\n    level: row[1] as string,\n    duration: row[2] as number,\n  }),\n})\n\nfor await (const entry of stream) {\n  if (entry.level === \"ERROR\") {\n    console.error(`Error at ${entry.timestamp}: ${entry.duration}ms`)\n  }\n}\n```\n\n## Cancelling a Stream\n\nPass an `AbortSignal` to cancel a stream mid-flight:\n\n```ts\nimport { TrinoClient } from \"@lakeql/trino-client\"\n\nconst client = new TrinoClient({\n  host: \"https://trino.example.com\",\n  port: 8443,\n  auth: { type: \"basic\", username: \"analyst\", password: \"secret\" },\n  catalog: \"hive\",\n  schema: \"logs\",\n})\n\nconst controller = new AbortController()\n\nconst stream = await client.stream({\n  sql: \"SELECT * FROM events\",\n  signal: controller.signal,\n})\n\nlet count = 0\nfor await (const row of stream) {\n  count += 1\n  // Stop after the first 1,000 rows: abort the query, then leave the loop\n  if (count >= 1000) {\n    controller.abort()\n    break\n  }\n}\n```\n\n## When to Use stream vs query\n\n| Scenario                          | Method   |\n| --------------------------------- | -------- |\n| Result fits comfortably in memory | `query`  |\n| Large or unbounded result sets    | `stream` |\n| Need all rows before processing   | `query`  |\n| Can process rows incrementally    | `stream` |\n| ETL pipelines / file exports      | `stream` |\n\n## Memory Efficiency\n\nWith `query`, all pages are collected into a single array before the promise resolves. With `stream`, each page is fetched on demand and its rows are yielded immediately, so only one page of data is held in memory at a time.\n\n```ts\nimport { TrinoClient } from \"@lakeql/trino-client\"\n\nconst client = new TrinoClient({\n  host: \"https://trino.example.com\",\n  port: 8443,\n  auth: { type: \"basic\", username: \"analyst\", password: \"secret\" },\n  catalog: \"hive\",\n  schema: \"analytics\",\n})\n\n// Processing millions of rows without loading them all into memory\nconst stream = await client.stream<[string, number]>({\n  sql: \"SELECT user_id, event_count FROM daily_aggregates\",\n})\n\nlet processed = 0\nfor await (const [userId, count] of stream) {\n  console.log(`${userId}: ${count}`)\n  processed += 1\n}\n\nconsole.log(`Processed ${processed} rows`)\n```\n","description":"Stream query results row-by-row using an async generator for memory-efficient processing.","keywords":["stream","streaming","query","results","row-by-row"]}