Data Flow

How LakeQL’s CLI pipeline transforms Trino metadata into generated TypeScript files.

The Generation Pipeline

When you run lakeql-cli pull, a four-stage pipeline transforms Trino metadata into ready-to-use TypeScript source files: Trino introspection, column parsing, schema generation, and file generation.

Stage 1: Trino Introspection

@lakeql/trino-client connects to your Trino instance and executes metadata queries:

-- The CLI issues this for each table
SHOW COLUMNS FROM hive.sales.orders

Trino returns rows like:

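| Column      | Type                                  | Extra | Comment |
|-------------|---------------------------------------|-------|---------|
| id          | bigint                                |       |         |
| customer_id | bigint                                |       |         |
| status      | varchar                               |       |         |
| metadata    | row(source varchar, version integer)  |       |         |
| tags        | array(varchar)                        |       |         |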
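
As a rough illustration of this stage, the sketch below builds the metadata query for one table and names the four result columns. It is not the actual @lakeql/trino-client API: `ColumnRow`, `quoteIdent`, `showColumnsQuery`, and `toColumnRows` are hypothetical names, and the sketch assumes the client hands back each result row as a positional array of strings.

```typescript
// Shape of one SHOW COLUMNS result row (field names follow the table above).
interface ColumnRow {
  column: string;
  type: string;
  extra: string;
  comment: string;
}

// Hypothetical helper: double-quote an identifier so unusual names stay valid SQL.
function quoteIdent(name: string): string {
  return `"${name.replace(/"/g, '""')}"`;
}

// Hypothetical helper: build the metadata query for a single table.
function showColumnsQuery(catalog: string, schema: string, table: string): string {
  return `SHOW COLUMNS FROM ${[catalog, schema, table].map(quoteIdent).join(".")}`;
}

// Hypothetical helper: turn positional rows into named fields.
// Missing trailing cells (empty Extra or Comment) default to "".
function toColumnRows(rows: string[][]): ColumnRow[] {
  return rows.map((r) => ({
    column: r[0] ?? "",
    type: r[1] ?? "",
    extra: r[2] ?? "",
    comment: r[3] ?? "",
  }));
}

console.log(showColumnsQuery("hive", "sales", "orders"));
console.log(toColumnRows([["id", "bigint"], ["tags", "array(varchar)"]]));
```

Quoting every identifier is a design choice of this sketch; the unquoted form shown earlier is equivalent for simple lowercase names.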

Stage 2: Column Parsing

@lakeql/column-parser transforms raw type strings into structured objects. This handles Trino's complex type syntax including nested rows, arrays, and maps:

// Input: "array(row(id bigint, name varchar))"
// Output:
{
  kind: "array",
  element: {
    kind: "row",
    fields: [
      { name: "id", type: { kind: "scalar", base: "bigint" } },
      { name: "name", type: { kind: "scalar", base: "varchar" } }
    ]
  }
}

The parser handles the following Trino types: varchar, bigint, integer, double, boolean, date, timestamp, array(T), map(K, V), and row(...).
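
To make the parsing step concrete, here is a minimal recursive sketch that produces the same output shape as the example above. It is an illustration only, not the @lakeql/column-parser source: `ParsedType`, `splitTopLevel`, and `parseType` are hypothetical names, the `map` node shape (`key`, `value`) is an assumption, and the sketch does not handle quoted field names or anonymous row fields.

```typescript
type ParsedType =
  | { kind: "scalar"; base: string }
  | { kind: "array"; element: ParsedType }
  | { kind: "map"; key: ParsedType; value: ParsedType } // assumed shape
  | { kind: "row"; fields: { name: string; type: ParsedType }[] };

// Split on commas that are not nested inside parentheses.
function splitTopLevel(input: string): string[] {
  const parts: string[] = [];
  let depth = 0;
  let start = 0;
  for (let i = 0; i < input.length; i++) {
    const ch = input[i];
    if (ch === "(") depth++;
    else if (ch === ")") depth--;
    else if (ch === "," && depth === 0) {
      parts.push(input.slice(start, i).trim());
      start = i + 1;
    }
  }
  parts.push(input.slice(start).trim());
  return parts;
}

function parseType(raw: string): ParsedType {
  const text = raw.trim();
  const open = text.indexOf("(");
  // No argument list: a plain scalar such as "bigint".
  if (open === -1 || !text.endsWith(")")) return { kind: "scalar", base: text };

  const head = text.slice(0, open).trim().toLowerCase();
  const [first, second, ...rest] = splitTopLevel(text.slice(open + 1, -1));

  if (head === "array" && first !== undefined && second === undefined) {
    return { kind: "array", element: parseType(first) };
  }
  if (head === "map" && first !== undefined && second !== undefined && rest.length === 0) {
    return { kind: "map", key: parseType(first), value: parseType(second) };
  }
  if (head === "row") {
    const args = [first, second, ...rest].filter((a): a is string => a !== undefined);
    const fields = args.map((arg) => {
      // Each field is "<name> <type>"; the type itself may contain spaces.
      const space = arg.indexOf(" ");
      if (space === -1) throw new Error(`row field needs a name: ${arg}`);
      return { name: arg.slice(0, space), type: parseType(arg.slice(space + 1)) };
    });
    return { kind: "row", fields };
  }
  // Parameterized scalars such as varchar(255) keep their full text.
  return { kind: "scalar", base: text };
}

console.log(JSON.stringify(parseType("array(row(id bigint, name varchar))"), null, 2));
```

The key step is splitting only on top-level commas, which is what lets a row nested inside an array (or a map inside a row) be parsed by plain recursion.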

Stage 3: Schema Generation

@lakeql/schema-generator takes the parsed column definitions and produces:

  • JSON Schema — Describes the response structure for runtime transformation
  • GraphQL model definitions — Pothos-compatible type and field definitions
  • TypeScript interface fields — Mapped types for the interface file

This stage maps Trino types to TypeScript and GraphQL types:

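| Trino Type | TypeScript       | GraphQL      |
|------------|------------------|--------------|
| bigint     | number           | Int or Float |
| varchar    | string           | String       |
| boolean    | boolean          | Boolean      |
| date       | Date             | Date         |
| timestamp  | Date             | DateTime     |
| array(T)   | T[]              | [T]          |
| row(...)   | nested interface | nested type  |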
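
The mapping in the table can be sketched as a pair of small recursive functions. This is an assumption-laden illustration rather than the @lakeql/schema-generator implementation: `toTypeScript`, `toGraphQL`, and `rowTypeName` are hypothetical names, the sketch covers only the rows listed in the table, and it picks Float for bigint where the table says "Int or Float".

```typescript
type ParsedType =
  | { kind: "scalar"; base: string }
  | { kind: "array"; element: ParsedType }
  | { kind: "row"; fields: { name: string; type: ParsedType }[] };

// Scalar rows of the table. Assumption: bigint maps to Float here,
// because 64-bit values can exceed the 32-bit range of GraphQL Int.
const SCALARS = new Map<string, { ts: string; gql: string }>([
  ["bigint", { ts: "number", gql: "Float" }],
  ["varchar", { ts: "string", gql: "String" }],
  ["boolean", { ts: "boolean", gql: "Boolean" }],
  ["date", { ts: "Date", gql: "Date" }],
  ["timestamp", { ts: "Date", gql: "DateTime" }],
]);

function lookupScalar(base: string): { ts: string; gql: string } {
  const hit = SCALARS.get(base);
  if (!hit) throw new Error(`no mapping listed for ${base}`);
  return hit;
}

// Render a TypeScript type expression; rows become inline object types.
function toTypeScript(t: ParsedType): string {
  switch (t.kind) {
    case "scalar":
      return lookupScalar(t.base).ts;
    case "array":
      return `${toTypeScript(t.element)}[]`;
    case "row":
      return `{ ${t.fields.map((f) => `${f.name}: ${toTypeScript(f.type)}`).join("; ")} }`;
  }
}

// Render a GraphQL type reference. A row is referenced by a caller-supplied
// name (hypothetical); declaring that nested type is out of scope here.
function toGraphQL(t: ParsedType, rowTypeName: string): string {
  switch (t.kind) {
    case "scalar":
      return lookupScalar(t.base).gql;
    case "array":
      return `[${toGraphQL(t.element, rowTypeName)}]`;
    case "row":
      return rowTypeName;
  }
}

const tags: ParsedType = { kind: "array", element: { kind: "scalar", base: "varchar" } };
console.log(toTypeScript(tags), toGraphQL(tags, "Unused"));
```

Throwing on an unlisted scalar keeps the sketch honest about its coverage; a real generator would need entries for every type the parser accepts.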

Stage 4: File Generation

@lakeql/file-generator writes the final TypeScript source files to disk. Each file is formatted and ready to be imported by the API server.

After generation, the CLI also updates the config registry — an aggregated index file that imports all generated configs:

// src/config-registry.ts
import { ordersConfig } from "./schemas/generated/hive/sales/orders/config"
import { customersConfig } from "./schemas/generated/hive/sales/customers/config"

export const allConfigs = [ordersConfig, customersConfig] as const

This registry is what the API server uses to discover and load all available table schemas.
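
One way to picture how such a registry file could be produced is to render it from the list of pulled tables. The sketch below is hypothetical and not the @lakeql/file-generator code: `TableRef`, `configName`, and `renderRegistry` are invented names, and the rule that a table named `orders` yields `ordersConfig` is inferred from the example above.

```typescript
interface TableRef {
  catalog: string;
  schema: string;
  table: string;
}

// Assumed naming rule: snake_case table name -> camelCase + "Config".
function configName(table: string): string {
  const camel = table.replace(/_+([a-z0-9])/g, (_m, c: string) => c.toUpperCase());
  return `${camel}Config`;
}

// Render the registry source text: one import per table, then the aggregate export.
function renderRegistry(tables: TableRef[]): string {
  const imports = tables.map(
    (t) =>
      `import { ${configName(t.table)} } from "./schemas/generated/${t.catalog}/${t.schema}/${t.table}/config"`
  );
  const names = tables.map((t) => configName(t.table)).join(", ");
  return [...imports, "", `export const allConfigs = [${names}] as const`, ""].join("\n");
}

console.log(
  renderRegistry([
    { catalog: "hive", schema: "sales", table: "orders" },
    { catalog: "hive", schema: "sales", table: "customers" },
  ])
);
```

Because the output depends only on the input list, rendering the same tables twice yields identical text. Note that this sketch does not disambiguate two tables with the same name in different schemas.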

Re-running the Pipeline

The pipeline is idempotent. Running lakeql-cli pull again overwrites the existing generated files with fresh versions based on the current Trino metadata, so it is safe to re-run whenever table schemas change.

If you've made manual edits to generated files (e.g. adding custom fields to query-schema.ts), those changes will be lost on the next pull. Place custom resolvers in separate files instead.
