LakeQL - Contributing - Contribution Guide

Getting Started #

Fork and clone the repository
Follow the Local Development guide to set up your environment
Create a feature branch from main

Project Structure #

lakeql/
├── apps/
│   └── docs/              # Documentation site
├── packages/
│   ├── adapters/          # Storage adapters (Hive/S3, future: Iceberg, ClickHouse)
│   ├── api/               # GraphQL API (Pothos + Yoga)
│   ├── cli/               # CLI tool
│   ├── query-builder/     # SQL query builder (Kysely-based)
│   ├── trino-client/      # Trino REST API client
│   └── ...
├── tooling/
│   └── test-data/         # Test data generation + seed tooling
├── .minitrino/            # Local Trino cluster configuration
└── templates/             # App scaffolding templates

Development Workflow #

# Start local infrastructure
pnpm mt:start
pnpm seed --all

# Run the development server
pnpm dev:backend

# Run tests
pnpm test

# Type checking
pnpm typecheck

# Linting
pnpm lint

Code Style #

TypeScript strict mode is required
Follow existing patterns in the codebase

Before committing, run:

# Lint (check)
pnpm lint

# Lint (auto-fix)
pnpm lint:fix

# Format (check)
pnpm format

# Format (auto-fix)
pnpm format:fix

Changesets #

We use changesets to manage versioning and changelogs for published packages. When your change affects a published @lakeql/* package, create a changeset:

pnpm cs

This opens an interactive prompt to select affected packages and describe the change. The changeset file is committed alongside your code.

When a changeset is not needed:

Changes to tooling/, docs, or dev infrastructure
Changes to private: true packages

When a changeset is needed:

Any change to a published @lakeql/* package, including bug fixes, features, and refactors
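
As a rough illustration, the two lists above reduce to a check on a package's manifest. The sketch below is not part of the repository: `PackageManifest` and `needsChangeset` are hypothetical names, and it assumes "published" means a package scoped under @lakeql/ that is not marked private: true.

```typescript
// Hypothetical shape: only the package.json fields this sketch reads.
interface PackageManifest {
  name: string
  private?: boolean
}

// Returns true when a change to this package should ship with a changeset.
// Assumption: "published" means scoped under @lakeql/ and not marked private.
function needsChangeset(pkg: PackageManifest): boolean {
  if (pkg.private === true) return false
  return pkg.name.startsWith("@lakeql/")
}

// "@lakeql/internal-example" is a made-up package name used for illustration.
console.log(needsChangeset({ name: "@lakeql/trino-client" })) // true
console.log(needsChangeset({ name: "@lakeql/internal-example", private: true })) // false
```

Changes under tooling/ or the docs fall outside this check entirely, since they do not touch a published package.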

Commits and PRs #

Keep commits focused on a single change
Use clear commit messages describing the "what" and "why"
PRs should reference any related issues

Adding a New Dataset Template #

The seed system uses dataset templates to generate test data for local development. Each template defines a set of Trino columns and a generator function that produces Parquet files. If you need test data with a different structure (for example, when a new feature requires a specific schema), you can add a new template.

Create a new file in tooling/test-data/src/datasets/
Export columns (Trino column definitions) and generate (Parquet generator function)
Import and use in tooling/test-data/seed.config.ts

// tooling/test-data/src/datasets/my-dataset.ts
import type { ColumnDefinition } from "../seed/config"

export const myColumns: ColumnDefinition[] = [
  { name: "id", type: "BIGINT" },
  { name: "value", type: "VARCHAR" },
]

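export async function myGenerate(
  amount: number,
  targetDir: string
): Promise<string> {
  // Generate a Parquet file with `amount` rows under `targetDir`,
  // then return the path of the written file.
  throw new Error("myGenerate is not implemented yet")
}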
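
The row-building half of a generator might look like the following minimal sketch. It is self-contained and makes several assumptions: `ColumnDefinition` is redeclared locally with just the `name` and `type` fields seen in the template above, `buildRows` and `Row` are hypothetical names, and BIGINT values are modelled as plain numbers for simplicity. Writing the rows to a Parquet file is deliberately left out, since the writer the seed tooling uses is not shown here.

```typescript
// Assumed shape, mirroring the { name, type } objects in the template above.
interface ColumnDefinition {
  name: string
  type: string
}

// Hypothetical row type: one value per column name.
type Row = Record<string, number | string>

const columns: ColumnDefinition[] = [
  { name: "id", type: "BIGINT" },
  { name: "value", type: "VARCHAR" },
]

// Builds `amount` deterministic rows; a real generator would then
// write them to a Parquet file and return that file's path.
function buildRows(cols: ColumnDefinition[], amount: number): Row[] {
  const rows: Row[] = []
  for (let i = 0; i < amount; i++) {
    const row: Row = {}
    for (const col of cols) {
      // Only the two column types used in the example are handled.
      row[col.name] = col.type === "BIGINT" ? i + 1 : `value-${i + 1}`
    }
    rows.push(row)
  }
  return rows
}

const rows = buildRows(columns, 3)
console.log(rows.length) // 3
console.log(rows.map((r) => r.value).join(",")) // value-1,value-2,value-3
```

Deterministic values keep seeded data reproducible between runs, which tends to make test failures easier to compare.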

Running Tests #

# All tests
pnpm test

# Specific package
pnpm -F @lakeql/trino-client test

# With coverage
pnpm test -- --coverage