Configuration for the Hive Table Manager.
Properties
| Property | Type | Modifiers |
|---|---|---|
| client | TrinoClient | — |
The Trino client instance to use for DDL operations. | ||
| bucket | string | — |
S3 bucket name for external table locations. | ||
Definition of a Hive external table to create.
Properties
| Property | Type | Modifiers |
|---|---|---|
| catalog | string | — |
The catalog name. | ||
| schema | string | — |
The schema name. | ||
| tableName | string | — |
The table name. | ||
| externalLocation | string | — |
S3 location for the external table. | ||
| columns | Array<{ name: string; type: string; }> | — |
SQL column definitions from JSON Schema. | ||
Manages Hive external table DDL operations (DROP + CREATE) for the mutation write pipeline.
Properties
| Property | Type | Modifiers |
|---|---|---|
| recreateTable | (definition: HiveTableDefinition) => Promise<void> | — |
Drops and recreates a single Hive external table. Executes DROP TABLE IF EXISTS followed by CREATE TABLE. | ||
| recreateTablePair | (latestDefinition: HiveTableDefinition, allDefinition: HiveTableDefinition) => Promise<void> | — |
For full_load_append: manages both _latest and _all tables. Creates both tables, and attempts rollback (best-effort drop both) if either creation fails. | ||
| buildExternalLocation | (path: string) => string | — |
Builds a properly-formatted external location URI for Hive tables.
Uses the | ||
Creates a HiveTableManager that executes DDL via the Trino client.
Parameters
| Parameter | Type | Default Value |
|---|---|---|
| config | HiveTableManagerConfig | — |
Returns
HiveTableManagerType
"s3" | "minio"Storage configuration for S3 or MinIO adapters.
Credentials are read internally from environment variables:
- S3: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_DEFAULT_REGION, AWS_ENDPOINT_URL
- MinIO: MINIO_ACCESS_KEY_ID, MINIO_SECRET_ACCESS_KEY, requires explicit endpoint
Properties
| Property | Type | Modifiers |
|---|---|---|
| type | StorageType | — |
Storage adapter type. | ||
| bucket | string | — |
Bucket name. | ||
| region? | string | — |
Region override. Falls back to AWS_DEFAULT_REGION env var for S3. | ||
| endpoint? | string | — |
Custom endpoint. Required for MinIO, optional for S3 (falls back to AWS_ENDPOINT_URL env var). | ||
Type
StorageConfigStorage operations interface for S3-compatible object stores.
Properties
| Property | Type | Modifiers |
|---|---|---|
| upload | (buffer: Uint8Array, targetPath: string) => Promise<void> | — |
Uploads a file buffer to the specified S3 path. | ||
| deletePrefix | (prefix: string) => Promise<void> | — |
Deletes all objects under the given prefix. | ||
Error class for storage operation failures, providing path and operation context.
Constructor
new StorageError(message, path, operation, options?)Parameters
| Parameter | Type | Default Value |
|---|---|---|
| message | string | — |
| path | string | — |
| operation | "upload" | "delete" | — |
| options? | ErrorOptions | undefined | — |
Properties
| Property | Type | Default Value | |
|---|---|---|---|
| path | string | — | |
Modifiers: public, readonly | |||
| operation | "upload" | "delete" | — | |
Modifiers: public, readonly | |||
Extends
ErrorCreates storage operations backed by files-sdk. Adapter selection is based on config.type:
- “s3”: reads credentials from AWS_* environment variables
- ”minio”: reads credentials from MINIO_* environment variables
Parameters
| Parameter | Type | Default Value |
|---|---|---|
| config | StorageConfig | — |
Returns
StorageOperationsA table definition describing what to create in the storage backend.
Properties
| Property | Type | Modifiers |
|---|---|---|
| catalog | string | — |
The catalog name. | ||
| schema | string | — |
The schema name. | ||
| table | string | — |
The table name. | ||
| columns | Array<ColumnDefinition> | — |
Column definitions (name + backend-specific type). | ||
Base configuration shared by all adapters.
Properties
| Property | Type | Modifiers |
|---|---|---|
| type | string | — |
Unique identifier for this adapter type. | ||
The storage adapter interface.
Each adapter implements these methods to provide table management for a specific storage backend (e.g., Hive/S3, Iceberg, ClickHouse).
Type Parameters
| Parameter | Constraint | Default |
|---|---|---|
TConfig | AdapterConfig | AdapterConfig |
Properties
| Property | Type | Modifiers |
|---|---|---|
| type | TConfig["type"] | readonly |
The adapter type identifier. | ||
| createTable | (definition: TableDefinition) => Promise<void> | — |
Creates a table in the storage backend. Uses IF NOT EXISTS by default. | ||
| dropTable | (catalog: string, schema: string, table: string) => Promise<void> | — |
Drops a table from the storage backend. Uses IF EXISTS — does not throw if the table doesn’t exist. | ||
| replaceTable | (definition: TableDefinition) => Promise<void> | — |
Drops and recreates a table. Useful for replacing external tables with updated schemas. | ||
Type
"full_load" | "full_load_append" | "append"Configuration for the write pipeline.
Properties
| Property | Type | Modifiers |
|---|---|---|
| loadStrategy? | LoadStrategy | — |
The load strategy for this endpoint. | ||
| type? | StorageType | — |
Storage adapter type. | ||
| bucket | string | — |
Bucket name for storing Parquet files. | ||
| basePath | string | — |
The base path for storing Parquet files. | ||
| region? | string | — |
Optional region override. | ||
| endpoint? | string | — |
Optional custom endpoint for S3-compatible storage. | ||
| table | { catalog: string; schema: string; tableName: string; } | — |
Hive table definition for DDL management. | ||
| trinoClient | TrinoClient | — |
The Trino client instance for DDL operations. | ||
| partitioning? | PartitioningValue | — |
Partitioning mode. | ||
| partitioningFormat? | "year" | "year/month" | "year/month/day" | — |
Partition format granularity. | ||
Input for the write pipeline execution.
Properties
| Property | Type | Modifiers |
|---|---|---|
| records | Array<Record<string, any>> | — |
The records to persist. Accepts any array of objects with string keys. | ||
| jsonSchema | JsonSchema | — |
The JSON Schema describing the record structure. | ||
| config | WritePipelineConfig | — |
Pipeline configuration. | ||
Generates a Hive-style partition path based on date and format. Format options:
- “year”: year=YYYY/ <uuid> .parquet
- ”year/month”: year=YYYY/month=MM/ <uuid> .parquet
- ”year/month/day”: year=YYYY/month=MM/day=DD/ <uuid> .parquet
Parameters
| Parameter | Type | Default Value |
|---|---|---|
| date | Date | — |
| format? | "year" | "year/month" | "year/month/day" | year/month/day |
Returns
stringGenerates a flat file path (no partitioning).
Returns
stringType
"disabled" | "timestamp" | "field" | "custom"The normalized partitioning configuration used internally by the pipeline.
Properties
| Property | Type | Modifiers |
|---|---|---|
| mode | PartitionMode | — |
| format | "year" | "year/month" | "year/month/day" | — |
| fieldName? | string | — |
| formatString? | string | — |
Normalizes the raw partitioning config into a resolved structure.
Parameters
| Parameter | Type | Default Value |
|---|---|---|
| partitioning? | PartitioningValue | true |
| partitioningFormat? | "year" | "year/month" | "year/month/day" | year/month/day |
Returns
ResolvedPartitioningAdds load_timestamp, load_timestamp_year, and load_timestamp_month to JSON Schema for consistent Parquet + Hive DDL derivation. Returns a new schema object (does not mutate the input).
Parameters
| Parameter | Type | Default Value |
|---|---|---|
| jsonSchema | JsonSchema | — |
Returns
JsonSchemaInjects load_timestamp, load_timestamp_year, and load_timestamp_month into each record. Returns new record array (does not mutate input).
Parameters
| Parameter | Type | Default Value |
|---|---|---|
| records | Array<Record<string, unknown>> | — |
| timestamp | Date | — |
Returns
Array<Record<string, unknown>>Error thrown when a record’s partition field is missing, null, or invalid.
Constructor
new PartitionFieldError(fieldName, reason, recordIndex, value?)Parameters
| Parameter | Type | Default Value |
|---|---|---|
| fieldName | string | — |
| reason | "missing" | "null" | "invalid_date" | — |
| recordIndex | number | — |
| value? | unknown | — |
Properties
| Property | Type | Default Value | |
|---|---|---|---|
| fieldName | string | — | |
Modifiers: readonly | |||
| reason | "missing" | "null" | "invalid_date" | — | |
Modifiers: readonly | |||
| recordIndex | number | — | |
Modifiers: readonly | |||
| value? | unknown | — | |
Modifiers: readonly | |||
Extends
ErrorA parsed segment from a custom partition format string.
Properties
| Property | Type | Modifiers |
|---|---|---|
| fieldName | string | — |
The field name to extract from the record. | ||
| component? | "year" | "month" | "day" | "hour" | "minute" | "second" | — |
If set, extract this date component from the field’s ISO date value. | ||
Parses a custom partition format string into an array of segments.
Format: “segment/segment/…“ where each segment is either:
- A plain field name: “customer_id” → extracts raw value
- A field with date component: “event_date:year” → extracts year from ISO date
Parameters
| Parameter | Type | Default Value |
|---|---|---|
| format | string | — |
Returns
Array<PartitionSegment>Generates a custom partition path from a record and parsed segments. For each segment:
-
Without component: formats as
fieldName=<value> -
With component: parses the field as ISO date, extracts the component, formats as
component=<value>Appends/<uuid>.parquetat the end.
Parameters
| Parameter | Type | Default Value |
|---|---|---|
| record | Record<string, unknown> | — |
| segments | Array<PartitionSegment> | — |
| recordIndex? | number | 0 |
Returns
stringGroups records by their custom partition path. Records with the same partition key go to the same file (one UUID per unique partition key).
Parameters
| Parameter | Type | Default Value |
|---|---|---|
| records | Array<Record<string, unknown>> | — |
| segments | Array<PartitionSegment> | — |
Returns
Map<string, Array<Record<string, unknown>>>Executes the write pipeline:
- Convert records to Parquet via
Parameters
| Parameter | Type | Default Value |
|---|---|---|
| input | WritePipelineInput | — |
Returns
Promise<void>asyncConfiguration for the Trino + Hive + S3 adapter.
Extends
AdapterConfigProperties
| Property | Type | Modifiers |
|---|---|---|
| type | "trino-hive-s3" | — |
| client | TrinoClient | — |
The Trino client instance to use for DDL operations. | ||
| bucket | string | — |
S3 bucket name for external table locations. | ||
| prefix? | string | — |
Optional base prefix within the bucket (default: “”). | ||
| format? | "PARQUET" | "ORC" | "AVRO" | "JSON" | — |
Storage format (default: “PARQUET”). | ||
Adapter for Hive external tables stored on S3.
Generates CREATE TABLE statements with:
-
external_locationpointing tos3://<bucket>/<prefix>/<schema>/<table> -
formatset to the configured format (default: PARQUET)
Parameters
| Parameter | Type | Default Value |
|---|---|---|
| config? | Omit<TrinoHiveS3Config, "type"> | {"prefix":"","format":"PARQUET"} |
Returns
StorageAdapter<TrinoHiveS3Config>