Data Lakehouse
A data lakehouse combines the flexibility of data lakes with the structured query capabilities of data warehouses. In a typical LakeQL setup:
- Storage — Data lives in object storage (S3, MinIO) in open formats like Parquet or ORC
- Metastore — Apache Hive Metastore tracks table metadata (schemas, partitions, locations)
- Query Engine — Trino provides SQL access across all catalogs and schemas
LakeQL sits on top of this stack, exposing lakehouse data through a GraphQL API without requiring you to build a separate application layer.
GraphQL-over-Trino
LakeQL translates incoming GraphQL queries into Trino SQL. This translation uses Kysely as a type-safe query builder:
```
// A GraphQL query like this:
query {
  orders(filter: { status: { eq: "shipped" } }, paging: { limit: 10 }) {
    nodes { id, status, total }
    pageInfo { hasNext, currentPage }
  }
}

// Becomes a Trino SQL query like this:
WITH total_count AS (
  SELECT COUNT(*) AS total_records
  FROM hive.sales.orders
  WHERE status = 'shipped'
),
records AS (
  SELECT id, status, total
  FROM hive.sales.orders
  WHERE status = 'shipped'
  ORDER BY id ASC
  FETCH FIRST 10 ROWS ONLY
)
SELECT * FROM total_count FULL JOIN records ON TRUE
```
The query builder handles field selection, WHERE clause generation, pagination (FETCH/OFFSET), and sorting — all derived from the GraphQL resolve info and input arguments.
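To make the translation step concrete, here is a minimal sketch that assembles the same two-CTE query from a filter, a field list, and paging arguments. It uses plain string assembly rather than Kysely so that it runs without dependencies, and it only supports `eq` filters. The `QueryArgs` shape and the `buildPagedQuery`, `ident`, `literal`, and `rowCount` names are hypothetical, not part of LakeQL's API.

```typescript
// Hypothetical input shapes; LakeQL's real argument types may differ.
type Filter = Record<string, { eq: string | number }>;

interface QueryArgs {
  table: string; // fully qualified, e.g. "hive.sales.orders"
  fields: string[]; // columns selected under `nodes`
  filter?: Filter;
  paging?: { limit: number; offset?: number };
  orderBy?: string; // defaults to the first selected field
}

// Accept only plain, optionally dot-qualified identifiers so they can be inlined.
function ident(name: string): string {
  if (!/^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)*$/.test(name)) {
    throw new Error(`Unsafe identifier: ${name}`);
  }
  return name;
}

// Render a SQL literal, doubling single quotes inside strings.
function literal(value: string | number): string {
  if (typeof value === "number") {
    if (!Number.isFinite(value)) throw new Error("Non-finite number");
    return String(value);
  }
  return `'${value.replace(/'/g, "''")}'`;
}

// Row counts for OFFSET and FETCH must be non-negative integers.
function rowCount(n: number): number {
  if (!Number.isInteger(n) || n < 0) throw new Error(`Invalid row count: ${n}`);
  return n;
}

function buildPagedQuery(args: QueryArgs): string {
  if (args.fields.length === 0) throw new Error("At least one field is required");
  const table = ident(args.table);
  const conditions = Object.entries(args.filter ?? {}).map(
    ([column, op]) => `${ident(column)} = ${literal(op.eq)}`
  );
  const where = conditions.length > 0 ? ` WHERE ${conditions.join(" AND ")}` : "";
  const columns = args.fields.map(ident).join(", ");
  const order = ` ORDER BY ${ident(args.orderBy ?? String(args.fields[0]))} ASC`;
  // In Trino, OFFSET comes before FETCH FIRST.
  const offset = args.paging?.offset ? ` OFFSET ${rowCount(args.paging.offset)} ROWS` : "";
  const fetch = args.paging ? ` FETCH FIRST ${rowCount(args.paging.limit)} ROWS ONLY` : "";
  return (
    `WITH total_count AS (SELECT COUNT(*) AS total_records FROM ${table}${where}), ` +
    `records AS (SELECT ${columns} FROM ${table}${where}${order}${offset}${fetch}) ` +
    `SELECT * FROM total_count FULL JOIN records ON TRUE`
  );
}
```

Calling `buildPagedQuery` with the `orders` example above (table `hive.sales.orders`, fields `id`, `status`, `total`, filter `status eq "shipped"`, limit 10) yields a single-line equivalent of the SQL shown earlier. A real query builder would bind values as parameters where possible instead of inlining literals.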
Schema Introspection
LakeQL discovers table structures by querying Trino metadata. When you run lakeql-cli pull, the CLI:
- Connects to your Trino instance via the REST API
- Executes `SHOW COLUMNS FROM catalog.schema.table` for each table
- Parses column type strings (including complex types like `array(row(...))`)
- Produces structured column definitions with names, types, and nullability
This means your GraphQL schema reflects the actual state of your lakehouse tables as of the most recent pull.
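The type-string parsing step can be sketched as a small recursive parser. This is an illustration under assumptions, not LakeQL's actual parser: the `TrinoType` shape and the `parseTrinoType` and `splitTopLevel` names are hypothetical, and the sketch does not handle quoted field names or anonymous row fields whose type contains a space.

```typescript
// Hypothetical AST for a parsed Trino type string.
type TrinoType =
  | { kind: "scalar"; name: string }
  | { kind: "array"; element: TrinoType }
  | { kind: "map"; key: TrinoType; value: TrinoType }
  | { kind: "row"; fields: { name: string; type: TrinoType }[] };

// Split on commas that are not nested inside parentheses.
function splitTopLevel(input: string): string[] {
  const parts: string[] = [];
  let depth = 0;
  let start = 0;
  for (let i = 0; i < input.length; i++) {
    const ch = input.charAt(i);
    if (ch === "(") depth++;
    else if (ch === ")") depth--;
    else if (ch === "," && depth === 0) {
      parts.push(input.slice(start, i).trim());
      start = i + 1;
    }
  }
  parts.push(input.slice(start).trim());
  return parts;
}

function parseTrinoType(raw: string): TrinoType {
  const text = raw.trim();
  const open = text.indexOf("(");
  // No wrapping parentheses: a plain scalar such as "bigint",
  // or one with a suffix such as "timestamp(3) with time zone".
  if (open === -1 || !text.endsWith(")")) return { kind: "scalar", name: text };

  const head = text.slice(0, open).toLowerCase();
  const inner = text.slice(open + 1, -1);

  if (head === "array") return { kind: "array", element: parseTrinoType(inner) };

  if (head === "map") {
    const parts = splitTopLevel(inner);
    if (parts.length !== 2) throw new Error(`Malformed map type: ${text}`);
    const [key, value] = parts as [string, string];
    return { kind: "map", key: parseTrinoType(key), value: parseTrinoType(value) };
  }

  if (head === "row") {
    const fields = splitTopLevel(inner).map((field) => {
      // Each field is "name type"; the name ends at the first space.
      const space = field.indexOf(" ");
      if (space === -1) return { name: "", type: parseTrinoType(field) };
      return { name: field.slice(0, space), type: parseTrinoType(field.slice(space + 1)) };
    });
    return { kind: "row", fields };
  }

  // Parameterized scalars such as "varchar(255)" or "decimal(10,2)" are kept whole.
  return { kind: "scalar", name: text };
}
```

For example, `parseTrinoType("array(row(id bigint, tags array(varchar)))")` produces an `array` node whose element is a `row` with two fields, `id` (scalar `bigint`) and `tags` (an array of scalar `varchar`).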
Code Generation
From introspected metadata, LakeQL generates six artifacts per table:
| File | Purpose |
|---|---|
| `config.ts` | Table metadata: catalog, schema, table name, column mappings |
| `interface.ts` | TypeScript interface matching the table's column types |
| `query-schema.ts` | Pothos query schema with filtering, sorting, and pagination |
| `mutation-schema.ts` | Pothos mutation schema with input types and resolver stub |
| `json-schema.json` | JSON Schema used by the response transformer at runtime |
| `endpoint.json` | Endpoint definition for re-generation via CLI or Endpoint Builder |
These files are committed to your repository and imported by the API server at startup. When your table schema changes, re-run pull to regenerate them.
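As a rough illustration of how column definitions could turn into something like `interface.ts`, the sketch below renders a TypeScript interface from a list of columns. The `ColumnDef` shape, the `SCALAR_TYPES` mapping, and the `toTsType` and `renderInterface` names are assumptions made for this example, and LakeQL's generated output may look different. Only scalar and `array(...)` types are handled here.

```typescript
// Hypothetical column shape produced by introspection.
interface ColumnDef {
  name: string;
  type: string; // Trino type string, e.g. "varchar" or "array(bigint)"
  nullable: boolean;
}

// Assumed mapping from Trino scalar types to TypeScript types.
// bigint is mapped to number for simplicity, which can lose precision
// above Number.MAX_SAFE_INTEGER; decimal stays a string for the same reason.
const SCALAR_TYPES: Record<string, string> = {
  boolean: "boolean",
  tinyint: "number",
  smallint: "number",
  integer: "number",
  bigint: "number",
  real: "number",
  double: "number",
  decimal: "string",
  varchar: "string",
  char: "string",
  date: "string",
  timestamp: "string",
};

function toTsType(trinoType: string): string {
  const text = trinoType.trim().toLowerCase();
  const arrayMatch = /^array\((.*)\)$/.exec(text);
  if (arrayMatch) return `${toTsType(arrayMatch[1] ?? "")}[]`;
  // Drop parameters and suffixes, such as the "(10,2)" in "decimal(10,2)".
  const base = text.replace(/\(.*$/, "");
  // Anything unmapped (row, map, varbinary, ...) falls back to unknown.
  return SCALAR_TYPES[base] ?? "unknown";
}

function renderInterface(name: string, columns: ColumnDef[]): string {
  const lines = columns.map((col) => {
    const tsType = toTsType(col.type);
    return `  ${col.name}: ${col.nullable ? `${tsType} | null` : tsType};`;
  });
  return [`export interface ${name} {`, ...lines, "}"].join("\n");
}
```

Because the output is derived entirely from the column list, re-running the generator after a schema change produces an updated interface, which mirrors the regenerate-on-pull workflow described above.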
```
# Generated file structure after pulling the "orders" table
src/schemas/generated/
└── sales/
    └── orders/
        ├── config.ts
        ├── interface.ts
        ├── query-schema.ts
        ├── mutation-schema.ts
        ├── json-schema.json
        └── endpoint.json
```