Adapters - Overview - Hive Table Manager

The Hive Table Manager is responsible for creating and maintaining external tables in Trino's Hive catalog. After the write pipeline uploads Parquet files to object storage, the table manager ensures a corresponding Hive table exists that points to that data — making it immediately queryable.

How it works

Every mutation write goes through this flow:

1. Data is written as Parquet to S3/MinIO.
2. The existing Hive table is dropped (if present).
3. A new external table is created pointing to the uploaded data.

This "drop + create" approach ensures the table schema always matches the current data, even if fields were added or removed.

External table locations

Hive external tables point to a directory in object storage, not a single file. The table manager uses the s3a:// URI scheme, which is required by Hive's underlying Hadoop FileSystem.

s3a://<bucket>/<basePath>/latest.parquet/
         │         │              │
         │         │              └─ Directory containing Parquet file(s)
         │         └─ Configured basePath for the endpoint
         └─ Configured bucket name

The s3a:// scheme is always used — regardless of whether the storage is AWS S3 or MinIO. The standard s3:// scheme is not recognized by Hadoop's FileSystem.
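As a rough illustration of the location format described above, the following sketch joins a bucket and a relative path into an s3a:// directory URI. The function name `buildS3aLocation` and its slash-normalizing behavior are assumptions made for this example; it is not the library's `buildExternalLocation` implementation.

```typescript
// Hypothetical helper (not part of @lakeql/adapters): builds an s3a://
// directory URI from a bucket name and a path relative to that bucket.
function buildS3aLocation(bucket: string, path: string): string {
  // Strip leading and trailing slashes so the join never produces "//".
  const trimmed = path.replace(/^\/+|\/+$/g, "");
  if (trimmed === "") {
    // No path given: point at the bucket root.
    return `s3a://${bucket}/`;
  }
  // External table locations are directories, so end with exactly one slash.
  return `s3a://${bucket}/${trimmed}/`;
}

console.log(buildS3aLocation("my-datalake", "analytics/events/latest.parquet/"));
// prints "s3a://my-datalake/analytics/events/latest.parquet/"
```

Normalizing the slashes in one place means callers can pass the path with or without a trailing slash and still get a directory-style location.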

Table strategies

Depending on the load strategy, different tables are managed:

Strategy	Tables created
`full_load`	`<tableName>` → `latest.parquet/`
`full_load_append`	`<tableName>_latest` → `latest.parquet/`, `<tableName>_all` → `all.parquet/`
`append`	`<tableName>` → `all.parquet/`

For full_load_append, both tables are managed as a unit: if creating either one fails, the manager attempts a best-effort rollback by dropping both.
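The mapping in the table above, together with the rollback behavior, can be sketched roughly as follows. All names here (`LoadStrategy`, `ManagedTable`, `tablesForStrategy`, `recreateAll`) and the injected `create`/`drop` callbacks are hypothetical and chosen for illustration; the real manager may be structured differently.

```typescript
// Hypothetical types for this sketch; not exported by the library.
type LoadStrategy = "full_load" | "full_load_append" | "append";

interface ManagedTable {
  tableName: string;
  directory: string; // directory under the endpoint's basePath
}

// Returns the tables that a given load strategy manages.
function tablesForStrategy(strategy: LoadStrategy, tableName: string): ManagedTable[] {
  switch (strategy) {
    case "full_load":
      return [{ tableName, directory: "latest.parquet/" }];
    case "full_load_append":
      return [
        { tableName: `${tableName}_latest`, directory: "latest.parquet/" },
        { tableName: `${tableName}_all`, directory: "all.parquet/" },
      ];
    case "append":
      return [{ tableName, directory: "all.parquet/" }];
  }
}

// Creates every table in order. If any creation fails, tries to drop all of
// them (ignoring drop errors) and then rethrows the original error.
async function recreateAll(
  tables: ManagedTable[],
  create: (table: ManagedTable) => Promise<void>,
  drop: (table: ManagedTable) => Promise<void>,
): Promise<void> {
  try {
    for (const table of tables) {
      await create(table);
    }
  } catch (err) {
    for (const table of tables) {
      try {
        await drop(table);
      } catch (_) {
        // Best effort only: a failed drop must not mask the original error.
      }
    }
    throw err;
  }
}
```

Rethrowing the original error keeps the creation failure visible to the caller even when the cleanup itself partly fails.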

Usage

The table manager is typically used internally by executeWritePipeline. For direct usage:

import { createHiveTableManager } from "@lakeql/adapters"
import { TrinoClient } from "@lakeql/trino-client"

const trinoClient = new TrinoClient({
  host: "https://trino.example.com",
  port: 8443,
  auth: { type: "basic", username: "admin", password: "secret" },
  catalog: "hive",
})

const hiveManager = createHiveTableManager({
  client: trinoClient,
  bucket: "my-datalake",
})

// Build a properly formatted location URI
const location = hiveManager.buildExternalLocation(
  "analytics/events/latest.parquet/"
)
// => "s3a://my-datalake/analytics/events/latest.parquet/"

// Create a table pointing to that location
await hiveManager.recreateTable({
  catalog: "hive",
  schema: "analytics",
  tableName: "events",
  externalLocation: location,
  columns: [
    { name: "event_id", type: "VARCHAR" },
    { name: "message", type: "VARCHAR" },
    { name: "created_at", type: "TIMESTAMP(3)" },
  ],
})

Generated DDL

The above produces:

DROP TABLE IF EXISTS hive.analytics.events;

CREATE TABLE hive.analytics.events (
  event_id VARCHAR,
  message VARCHAR,
  created_at TIMESTAMP(3)
)
WITH (
  external_location = 's3a://my-datalake/analytics/events/latest.parquet/',
  format = 'PARQUET'
);
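To make the link between a table definition and the DDL above concrete, here is a minimal sketch of how the two statements could be assembled. The `TableDef` shape mirrors the properties documented on this page, but `buildRecreateStatements` is a hypothetical function and not the library's code; it also assumes identifiers need no quoting.

```typescript
// Shapes assumed for this sketch, based on the documented properties.
interface ColumnDef {
  name: string;
  type: string;
}

interface TableDef {
  catalog: string;
  schema: string;
  tableName: string;
  externalLocation: string;
  columns: ColumnDef[];
}

// Returns [dropStatement, createStatement], without trailing semicolons.
function buildRecreateStatements(def: TableDef): [string, string] {
  // Assumption: names are plain identifiers that do not need quoting.
  const qualifiedName = `${def.catalog}.${def.schema}.${def.tableName}`;
  const columnLines = def.columns.map((c) => `  ${c.name} ${c.type}`).join(",\n");
  // Double any single quotes so the location stays a valid SQL string literal.
  const location = def.externalLocation.replace(/'/g, "''");

  const drop = `DROP TABLE IF EXISTS ${qualifiedName}`;
  const create =
    `CREATE TABLE ${qualifiedName} (\n${columnLines}\n)\n` +
    `WITH (\n  external_location = '${location}',\n  format = 'PARQUET'\n)`;
  return [drop, create];
}

const [dropSql, createSql] = buildRecreateStatements({
  catalog: "hive",
  schema: "analytics",
  tableName: "events",
  externalLocation: "s3a://my-datalake/analytics/events/latest.parquet/",
  columns: [
    { name: "event_id", type: "VARCHAR" },
    { name: "message", type: "VARCHAR" },
    { name: "created_at", type: "TIMESTAMP(3)" },
  ],
});
console.log(dropSql);
console.log(createSql);
```

Returning the statements separately reflects the drop-then-create order described earlier and lets the caller execute them one at a time.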

Configuration

Property	Type	Description
client	`TrinoClient`	The Trino client instance to use for DDL operations.
bucket	`string`	S3 bucket name for external table locations.

Table definition

Property	Type	Description
catalog	`string`	The catalog name.
schema	`string`	The schema name.
tableName	`string`	The table name.
externalLocation	`string`	S3 location for the external table.
columns	`Object[]`	SQL column definitions from JSON Schema.
└ name	`string`
└ type	`string`