LakeQL

Hive Table Manager

How the write pipeline manages Hive external tables in Trino.

The Hive Table Manager is responsible for creating and maintaining external tables in Trino's Hive catalog. After the write pipeline uploads Parquet files to object storage, the table manager ensures a corresponding Hive table exists that points to that data — making it immediately queryable.

How it works

Every mutation write goes through this flow:

  1. Data is written as Parquet to S3/MinIO
  2. The existing Hive table is dropped (if present)
  3. A new external table is created pointing to the uploaded data

This "drop + create" approach ensures the table schema always matches the current data, even if fields were added or removed.
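The ordering of the three steps can be sketched as a small orchestration function. This is only an illustration under stated assumptions: `PipelineSteps`, `uploadParquet`, `runSql`, and `writeThenRecreate` are hypothetical names, not part of LakeQL, and the caller injects the real storage and Trino operations.

```typescript
// Hypothetical stand-ins for the storage upload and the Trino client.
interface PipelineSteps {
  uploadParquet(objectPath: string): Promise<void>;
  runSql(statement: string): Promise<void>;
}

// Runs the three steps strictly in order; any failure stops the flow.
async function writeThenRecreate(
  steps: PipelineSteps,
  objectPath: string,
  qualifiedTable: string,
  createStatement: string,
): Promise<void> {
  // 1. Write the data as Parquet to object storage.
  await steps.uploadParquet(objectPath);
  // 2. Drop the existing table, if there is one.
  await steps.runSql(`DROP TABLE IF EXISTS ${qualifiedTable}`);
  // 3. Create a fresh external table over the uploaded data.
  await steps.runSql(createStatement);
}
```

In this sketch the table does not exist between steps 2 and 3, so a query issued in that window would fail.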

External table locations

Hive external tables point to a directory in object storage, not a single file. The table manager uses the s3a:// URI scheme, which is required by Hive's underlying Hadoop FileSystem.

s3a://<bucket>/<basePath>/latest.parquet/
         │         │              │
         │         │              └─ Directory containing Parquet file(s)
         │         └─ Configured basePath for the endpoint
         └─ Configured bucket name
The s3a:// scheme is always used — regardless of whether the storage is AWS S3 or MinIO. The standard s3:// scheme is not recognized by Hadoop's FileSystem.
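As a rough sketch of how such a location could be assembled, the hypothetical helper below (`toS3aLocation` is an illustrative name, not LakeQL's API) trims stray slashes and always appends the trailing slash that marks a directory. The real `buildExternalLocation` may normalize its input differently.

```typescript
// Hypothetical helper, not LakeQL's implementation: joins a bucket and an
// object-store path into an s3a:// directory URI.
function toS3aLocation(bucket: string, path: string): string {
  // Drop leading and trailing slashes so the join never produces "//".
  const cleanBucket = bucket.replace(/^\/+|\/+$/g, "");
  const cleanPath = path.replace(/^\/+|\/+$/g, "");
  if (cleanBucket === "") {
    throw new Error("bucket must not be empty");
  }
  // Always end with "/" so the URI names a directory, not a single file.
  return cleanPath === ""
    ? `s3a://${cleanBucket}/`
    : `s3a://${cleanBucket}/${cleanPath}/`;
}

console.log(toS3aLocation("my-datalake", "/analytics/events/latest.parquet"));
// prints "s3a://my-datalake/analytics/events/latest.parquet/"
```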

Table strategies

Depending on the load strategy, different tables are managed:

| Strategy | Tables created |
| --- | --- |
| full_load | <tableName> → latest.parquet/ |
| full_load_append | <tableName>_latest → latest.parquet/, <tableName>_all → all.parquet/ |
| append | <tableName> → all.parquet/ |
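The mapping above can be written down as a small lookup function. This is an illustrative sketch only: `LoadStrategy`, `ManagedTable`, and `tablesForStrategy` are hypothetical names chosen here, and the directories are relative to the endpoint's basePath.

```typescript
type LoadStrategy = "full_load" | "full_load_append" | "append";

interface ManagedTable {
  tableName: string;
  directory: string; // relative to the endpoint's basePath
}

// Illustrative only: mirrors the strategy mapping in this section,
// not LakeQL's source code.
function tablesForStrategy(
  strategy: LoadStrategy,
  tableName: string,
): ManagedTable[] {
  switch (strategy) {
    case "full_load":
      return [{ tableName, directory: "latest.parquet/" }];
    case "full_load_append":
      return [
        { tableName: `${tableName}_latest`, directory: "latest.parquet/" },
        { tableName: `${tableName}_all`, directory: "all.parquet/" },
      ];
    case "append":
      return [{ tableName, directory: "all.parquet/" }];
  }
}
```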

For full_load_append, both tables are managed as a unit: if creating either one fails, the manager attempts a best-effort rollback by dropping both. The rollback is not transactional, so a failed cleanup can still leave a table behind.
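A best-effort rollback of this kind might look like the sketch below. It assumes hypothetical `create` and `drop` callbacks (grouped here as `TableOps`) that issue DDL through Trino; none of these names come from LakeQL.

```typescript
// Hypothetical stand-ins: `create` and `drop` would issue DDL through Trino.
interface TableOps {
  create(table: string): Promise<void>;
  drop(table: string): Promise<void>;
}

// Creates both tables; if either creation fails, tries to drop both and
// then rethrows the original error.
async function createPair(
  ops: TableOps,
  latestTable: string,
  allTable: string,
): Promise<void> {
  try {
    await ops.create(latestTable);
    await ops.create(allTable);
  } catch (err) {
    for (const table of [latestTable, allTable]) {
      try {
        await ops.drop(table);
      } catch {
        // Best effort: ignore cleanup failures so the original error surfaces.
      }
    }
    throw err;
  }
}
```

Swallowing errors from the cleanup is what makes this "best effort": the caller always sees the creation failure, but nothing guarantees that both tables were actually removed.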

Usage

The table manager is typically used internally by executeWritePipeline. For direct usage:

import { createHiveTableManager } from "@lakeql/adapters"
import { TrinoClient } from "@lakeql/trino-client"

const trinoClient = new TrinoClient({
  host: "https://trino.example.com",
  port: 8443,
  auth: { type: "basic", username: "admin", password: "secret" },
  catalog: "hive",
})

const hiveManager = createHiveTableManager({
  client: trinoClient,
  bucket: "my-datalake",
})

// Build a properly formatted location URI
const location = hiveManager.buildExternalLocation(
  "analytics/events/latest.parquet/"
)
// => "s3a://my-datalake/analytics/events/latest.parquet/"

// Create a table pointing to that location
await hiveManager.recreateTable({
  catalog: "hive",
  schema: "analytics",
  tableName: "events",
  externalLocation: location,
  columns: [
    { name: "event_id", type: "VARCHAR" },
    { name: "message", type: "VARCHAR" },
    { name: "created_at", type: "TIMESTAMP(3)" },
  ],
})

Generated DDL

The recreateTable call in the Usage example produces:

DROP TABLE IF EXISTS hive.analytics.events;

CREATE TABLE hive.analytics.events (
  event_id VARCHAR,
  message VARCHAR,
  created_at TIMESTAMP(3)
)
WITH (
  external_location = 's3a://my-datalake/analytics/events/latest.parquet/',
  format = 'PARQUET'
);
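To make the relationship between a table definition and these statements concrete, here is a minimal renderer sketch. It is an assumption about how such DDL could be produced, not LakeQL's implementation; `ColumnDef`, `TableDef`, and `renderRecreateDdl` are hypothetical names, and identifiers are interpolated without quoting or validation.

```typescript
interface ColumnDef {
  name: string;
  type: string;
}

interface TableDef {
  catalog: string;
  schema: string;
  tableName: string;
  externalLocation: string;
  columns: ColumnDef[];
}

// Illustrative renderer: returns the DROP and CREATE statements, in order.
// Real code should validate or quote identifiers before interpolating them.
function renderRecreateDdl(def: TableDef): string[] {
  const fqn = `${def.catalog}.${def.schema}.${def.tableName}`;
  const cols = def.columns.map((c) => `  ${c.name} ${c.type}`).join(",\n");
  // Escape single quotes so the location is a valid SQL string literal.
  const location = def.externalLocation.replace(/'/g, "''");
  const create = [
    `CREATE TABLE ${fqn} (`,
    cols,
    `)`,
    `WITH (`,
    `  external_location = '${location}',`,
    `  format = 'PARQUET'`,
    `)`,
  ].join("\n");
  return [`DROP TABLE IF EXISTS ${fqn}`, create];
}
```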

Configuration

| Property | Type | Description |
| --- | --- | --- |
| client | TrinoClient | The Trino client instance to use for DDL operations. |
| bucket | string | S3 bucket name for external table locations. |

Table definition

| Property | Type | Description |
| --- | --- | --- |
| catalog | string | The catalog name. |
| schema | string | The schema name. |
| tableName | string | The table name. |
| externalLocation | string | S3 location for the external table. |
| columns | Object[] | SQL column definitions from JSON Schema. |
| └ name | string | |
| └ type | string | |
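Since the columns are described as derived from JSON Schema, one way to picture that derivation is the sketch below. The type mapping is an assumption made purely for illustration (LakeQL's actual rules are not documented here and may differ), and `JsonSchemaProperty`, `toSqlType`, and `columnsFromSchema` are hypothetical names.

```typescript
interface JsonSchemaProperty {
  type: string;
  format?: string;
}

interface ColumnDef {
  name: string;
  type: string;
}

// Assumed mapping for illustration only; the real rules may differ.
function toSqlType(prop: JsonSchemaProperty): string {
  if (prop.type === "string") {
    return prop.format === "date-time" ? "TIMESTAMP(3)" : "VARCHAR";
  }
  switch (prop.type) {
    case "integer":
      return "BIGINT";
    case "number":
      return "DOUBLE";
    case "boolean":
      return "BOOLEAN";
    default:
      throw new Error(`unsupported JSON Schema type: ${prop.type}`);
  }
}

// Turns a JSON Schema `properties` object into column definitions,
// preserving the key order of the input object.
function columnsFromSchema(
  properties: Record<string, JsonSchemaProperty>,
): ColumnDef[] {
  const columns: ColumnDef[] = [];
  for (const name of Object.keys(properties)) {
    const prop = properties[name];
    if (prop === undefined) continue;
    columns.push({ name, type: toSqlType(prop) });
  }
  return columns;
}
```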
