Adapters - API Reference

Configuration for the Hive Table Manager.

Properties

Property	Type	Modifiers
client	`TrinoClient`	`—`
The Trino client instance to use for DDL operations.
bucket	`string`	`—`
S3 bucket name for external table locations.

Definition of a Hive external table to create.

Properties

Property	Type	Modifiers
catalog	`string`	`—`
The catalog name.
schema	`string`	`—`
The schema name.
tableName	`string`	`—`
The table name.
externalLocation	`string`	`—`
S3 location for the external table.
columns	`Array<{ name: string; type: string; }>`	`—`
SQL column definitions from JSON Schema.

Manages Hive external table DDL operations (DROP + CREATE) for the mutation write pipeline.

Properties

Property	Type	Modifiers
recreateTable	`(definition: HiveTableDefinition) => Promise<void>`	`—`
Drops and recreates a single Hive external table. Executes DROP TABLE IF EXISTS followed by CREATE TABLE.
recreateTablePair	`(latestDefinition: HiveTableDefinition, allDefinition: HiveTableDefinition) => Promise<void>`	`—`
For full_load_append: manages both _latest and _all tables. Creates both tables, and attempts rollback (best-effort drop both) if either creation fails.
buildExternalLocation	`(path: string) => string`	`—`
Builds a properly-formatted external location URI for Hive tables. Uses the `s3a://` scheme required by Hive’s Hadoop FileSystem.

Creates a HiveTableManager that executes DDL via the Trino client.

Parameters

Parameter	Type	Default Value
config	`HiveTableManagerConfig`	—

Returns

HiveTableManager

Type

"s3" | "minio"

Storage configuration for S3 or MinIO adapters.

Credentials are read internally from environment variables:

S3: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_DEFAULT_REGION, AWS_ENDPOINT_URL
MinIO: MINIO_ACCESS_KEY_ID, MINIO_SECRET_ACCESS_KEY, requires explicit endpoint

Properties

Property	Type	Modifiers
type	`StorageType`	`—`
Storage adapter type.
bucket	`string`	`—`
Bucket name.
region?	`string`	`—`
Region override. Falls back to AWS_DEFAULT_REGION env var for S3.
endpoint?	`string`	`—`
Custom endpoint. Required for MinIO, optional for S3 (falls back to AWS_ENDPOINT_URL env var).

Type

StorageConfig

Storage operations interface for S3-compatible object stores.

Properties

Property	Type	Modifiers
upload	`(buffer: Uint8Array, targetPath: string) => Promise<void>`	`—`
Uploads a file buffer to the specified S3 path.
deletePrefix	`(prefix: string) => Promise<void>`	`—`
Deletes all objects under the given prefix.

Error class for storage operation failures, providing path and operation context.

Constructor

new StorageError(message, path, operation, options?)

Parameters

Parameter	Type	Default Value
message	`string`	—
path	`string`	—
operation	`"upload" \| "delete"`	—
options?	`ErrorOptions \| undefined`	—

Properties

Property	Type	Default Value
path	`string`	—
Modifiers:`public, readonly`
operation	`"upload" \| "delete"`	—
Modifiers:`public, readonly`

Extends

Error

Creates storage operations backed by files-sdk. Adapter selection is based on config.type:

“s3”: reads credentials from AWS_* environment variables
”minio”: reads credentials from MINIO_* environment variables

Parameters

Parameter	Type	Default Value
config	`StorageConfig`	—

Returns

StorageOperations

A table definition describing what to create in the storage backend.

Properties

Property	Type	Modifiers
catalog	`string`	`—`
The catalog name.
schema	`string`	`—`
The schema name.
table	`string`	`—`
The table name.
columns	`Array<ColumnDefinition>`	`—`
Column definitions (name + backend-specific type).

Base configuration shared by all adapters.

Properties

Property	Type	Modifiers
type	`string`	`—`
Unique identifier for this adapter type.

The storage adapter interface.

Each adapter implements these methods to provide table management for a specific storage backend (e.g., Hive/S3, Iceberg, ClickHouse).

Type Parameters

Parameter	Constraint	Default
`TConfig`	`AdapterConfig`	`AdapterConfig`

Properties

Property	Type	Modifiers
type	`TConfig["type"]`	`readonly`
The adapter type identifier.
createTable	`(definition: TableDefinition) => Promise<void>`	`—`
Creates a table in the storage backend. Uses IF NOT EXISTS by default.
dropTable	`(catalog: string, schema: string, table: string) => Promise<void>`	`—`
Drops a table from the storage backend. Uses IF EXISTS — does not throw if the table doesn’t exist.
replaceTable	`(definition: TableDefinition) => Promise<void>`	`—`
Drops and recreates a table. Useful for replacing external tables with updated schemas.

Type

"full_load" | "full_load_append" | "append"

Configuration for the write pipeline.

Properties

Property	Type	Modifiers
loadStrategy?	`LoadStrategy`	`—`
The load strategy for this endpoint.
type?	`StorageType`	`—`
Storage adapter type.
bucket	`string`	`—`
Bucket name for storing Parquet files.
basePath	`string`	`—`
The base path for storing Parquet files.
region?	`string`	`—`
Optional region override.
endpoint?	`string`	`—`
Optional custom endpoint for S3-compatible storage.
table	`{ catalog: string; schema: string; tableName: string; }`	`—`
Hive table definition for DDL management.
trinoClient	`TrinoClient`	`—`
The Trino client instance for DDL operations.
partitioning?	`PartitioningValue`	`—`
Partitioning mode. `true` partitions by write timestamp, `false` disables, or a string for field-based/custom.
partitioningFormat?	`"year" \| "year/month" \| "year/month/day"`	`—`
Partition format granularity.

Input for the write pipeline execution.

Properties

Property	Type	Modifiers
records	`Array<Record<string, any>>`	`—`
The records to persist. Accepts any array of objects with string keys.
jsonSchema	`JsonSchema`	`—`
The JSON Schema describing the record structure.
config	`WritePipelineConfig`	`—`
Pipeline configuration.

Generates a Hive-style partition path based on date and format. Format options:

“year”: year=YYYY/ <uuid> .parquet
”year/month”: year=YYYY/month=MM/ <uuid> .parquet
”year/month/day”: year=YYYY/month=MM/day=DD/ <uuid> .parquet

Parameters

Parameter	Type	Default Value
date	`Date`	—
format?	`"year" \| "year/month" \| "year/month/day"`	`year/month/day`

Returns

string

Generates a flat file path (no partitioning).

Returns

string

Type

"disabled" | "timestamp" | "field" | "custom"

The normalized partitioning configuration used internally by the pipeline.

Properties

Property	Type	Modifiers
mode	`PartitionMode`	`—`
format	`"year" \| "year/month" \| "year/month/day"`	`—`
fieldName?	`string`	`—`
formatString?	`string`	`—`

Normalizes the raw partitioning config into a resolved structure.

Parameters

Parameter	Type	Default Value
partitioning?	`PartitioningValue`	`true`
partitioningFormat?	`"year" \| "year/month" \| "year/month/day"`	`year/month/day`

Returns

ResolvedPartitioning

Adds load_timestamp, load_timestamp_year, and load_timestamp_month to JSON Schema for consistent Parquet + Hive DDL derivation. Returns a new schema object (does not mutate the input).

Parameters

Parameter	Type	Default Value
jsonSchema	`JsonSchema`	—

Returns

JsonSchema

Injects load_timestamp, load_timestamp_year, and load_timestamp_month into each record. Returns new record array (does not mutate input).

Parameters

Parameter	Type	Default Value
records	`Array<Record<string, unknown>>`	—
timestamp	`Date`	—

Returns

Array<Record<string, unknown>>

Error thrown when a record’s partition field is missing, null, or invalid.

Constructor

new PartitionFieldError(fieldName, reason, recordIndex, value?)

Parameters

Parameter	Type	Default Value
fieldName	`string`	—
reason	`"missing" \| "null" \| "invalid_date"`	—
recordIndex	`number`	—
value?	`unknown`	—

Properties

Property	Type	Default Value
fieldName	`string`	—
Modifiers:`readonly`
reason	`"missing" \| "null" \| "invalid_date"`	—
Modifiers:`readonly`
recordIndex	`number`	—
Modifiers:`readonly`
value?	`unknown`	—
Modifiers:`readonly`

Extends

Error

A parsed segment from a custom partition format string.

Properties

Property	Type	Modifiers
fieldName	`string`	`—`
The field name to extract from the record.
component?	`"year" \| "month" \| "day" \| "hour" \| "minute" \| "second"`	`—`
If set, extract this date component from the field’s ISO date value.

Parses a custom partition format string into an array of segments.

Format: “segment/segment/…“ where each segment is either:

A plain field name: “customer_id” → extracts raw value
A field with date component: “event_date:year” → extracts year from ISO date

Parameters

Parameter	Type	Default Value
format	`string`	—

Returns

Array<PartitionSegment>

Generates a custom partition path from a record and parsed segments. For each segment:

Without component: formats as fieldName=<value>
With component: parses the field as ISO date, extracts the component, formats as component=<value> Appends /<uuid>.parquet at the end.

Parameters

Parameter	Type	Default Value
record	`Record<string, unknown>`	—
segments	`Array<PartitionSegment>`	—
recordIndex?	`number`	`0`

Returns

string

Groups records by their custom partition path. Records with the same partition key go to the same file (one UUID per unique partition key).

Parameters

Parameter	Type	Default Value
records	`Array<Record<string, unknown>>`	—
segments	`Array<PartitionSegment>`	—

Returns

Map<string, Array<Record<string, unknown>>>

Executes the write pipeline:

Convert records to Parquet via

Parameters

Parameter	Type	Default Value
input	`WritePipelineInput`	—

Returns

Promise<void>

Modifiers:async

Configuration for the Trino + Hive + S3 adapter.

Extends

AdapterConfig

Properties

Property	Type	Modifiers
type	`"trino-hive-s3"`	`—`
client	`TrinoClient`	`—`
The Trino client instance to use for DDL operations.
bucket	`string`	`—`
S3 bucket name for external table locations.
prefix?	`string`	`—`
Optional base prefix within the bucket (default: “”).
format?	`"PARQUET" \| "ORC" \| "AVRO" \| "JSON"`	`—`
Storage format (default: “PARQUET”).

Adapter for Hive external tables stored on S3.

Generates CREATE TABLE statements with:

external_location pointing to s3://<bucket>/<prefix>/<schema>/<table>
format set to the configured format (default: PARQUET)

Parameters

Parameter	Type	Default Value
config?	`Omit<TrinoHiveS3Config, "type">`	`{"prefix":"","format":"PARQUET"}`

Returns

StorageAdapter<TrinoHiveS3Config>

Configuration for the Hive Table Manager.

Properties

Property	Type	Modifiers
client	`TrinoClient`	`—`
The Trino client instance to use for DDL operations.
bucket	`string`	`—`
S3 bucket name for external table locations.

Definition of a Hive external table to create.

Properties

Property	Type	Modifiers
catalog	`string`	`—`
The catalog name.
schema	`string`	`—`
The schema name.
tableName	`string`	`—`
The table name.
externalLocation	`string`	`—`
S3 location for the external table.
columns	`Array<{ name: string; type: string; }>`	`—`
SQL column definitions from JSON Schema.

Manages Hive external table DDL operations (DROP + CREATE) for the mutation write pipeline.

Properties

Property	Type	Modifiers
recreateTable	`(definition: HiveTableDefinition) => Promise<void>`	`—`
Drops and recreates a single Hive external table. Executes DROP TABLE IF EXISTS followed by CREATE TABLE.
recreateTablePair	`(latestDefinition: HiveTableDefinition, allDefinition: HiveTableDefinition) => Promise<void>`	`—`
For full_load_append: manages both _latest and _all tables. Creates both tables, and attempts rollback (best-effort drop both) if either creation fails.
buildExternalLocation	`(path: string) => string`	`—`
Builds a properly-formatted external location URI for Hive tables. Uses the `s3a://` scheme required by Hive’s Hadoop FileSystem.

Creates a HiveTableManager that executes DDL via the Trino client.

Parameters

Parameter	Type	Default Value
config	`HiveTableManagerConfig`	—

Returns

HiveTableManager

Type

"s3" | "minio"

Storage configuration for S3 or MinIO adapters.

Credentials are read internally from environment variables:

S3: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_DEFAULT_REGION, AWS_ENDPOINT_URL
MinIO: MINIO_ACCESS_KEY_ID, MINIO_SECRET_ACCESS_KEY, requires explicit endpoint

Properties

Property	Type	Modifiers
type	`StorageType`	`—`
Storage adapter type.
bucket	`string`	`—`
Bucket name.
region?	`string`	`—`
Region override. Falls back to AWS_DEFAULT_REGION env var for S3.
endpoint?	`string`	`—`
Custom endpoint. Required for MinIO, optional for S3 (falls back to AWS_ENDPOINT_URL env var).

Type

StorageConfig

Storage operations interface for S3-compatible object stores.

Properties

Property	Type	Modifiers
upload	`(buffer: Uint8Array, targetPath: string) => Promise<void>`	`—`
Uploads a file buffer to the specified S3 path.
deletePrefix	`(prefix: string) => Promise<void>`	`—`
Deletes all objects under the given prefix.

Error class for storage operation failures, providing path and operation context.

Constructor

new StorageError(message, path, operation, options?)

Parameters

Parameter	Type	Default Value
message	`string`	—
path	`string`	—
operation	`"upload" \| "delete"`	—
options?	`ErrorOptions \| undefined`	—

Properties

Property	Type	Default Value
path	`string`	—
Modifiers:`public, readonly`
operation	`"upload" \| "delete"`	—
Modifiers:`public, readonly`

Extends

Error

Creates storage operations backed by files-sdk. Adapter selection is based on config.type:

“s3”: reads credentials from AWS_* environment variables
”minio”: reads credentials from MINIO_* environment variables

Parameters

Parameter	Type	Default Value
config	`StorageConfig`	—

Returns

StorageOperations

A table definition describing what to create in the storage backend.

Properties

Property	Type	Modifiers
catalog	`string`	`—`
The catalog name.
schema	`string`	`—`
The schema name.
table	`string`	`—`
The table name.
columns	`Array<ColumnDefinition>`	`—`
Column definitions (name + backend-specific type).

Base configuration shared by all adapters.

Properties

Property	Type	Modifiers
type	`string`	`—`
Unique identifier for this adapter type.

The storage adapter interface.

Each adapter implements these methods to provide table management for a specific storage backend (e.g., Hive/S3, Iceberg, ClickHouse).

Type Parameters

Parameter	Constraint	Default
`TConfig`	`AdapterConfig`	`AdapterConfig`

Properties

Property	Type	Modifiers
type	`TConfig["type"]`	`readonly`
The adapter type identifier.
createTable	`(definition: TableDefinition) => Promise<void>`	`—`
Creates a table in the storage backend. Uses IF NOT EXISTS by default.
dropTable	`(catalog: string, schema: string, table: string) => Promise<void>`	`—`
Drops a table from the storage backend. Uses IF EXISTS — does not throw if the table doesn’t exist.
replaceTable	`(definition: TableDefinition) => Promise<void>`	`—`
Drops and recreates a table. Useful for replacing external tables with updated schemas.

Type

"full_load" | "full_load_append" | "append"

Configuration for the write pipeline.

Properties

Property	Type	Modifiers
loadStrategy?	`LoadStrategy`	`—`
The load strategy for this endpoint.
type?	`StorageType`	`—`
Storage adapter type.
bucket	`string`	`—`
Bucket name for storing Parquet files.
basePath	`string`	`—`
The base path for storing Parquet files.
region?	`string`	`—`
Optional region override.
endpoint?	`string`	`—`
Optional custom endpoint for S3-compatible storage.
table	`{ catalog: string; schema: string; tableName: string; }`	`—`
Hive table definition for DDL management.
trinoClient	`TrinoClient`	`—`
The Trino client instance for DDL operations.
partitioning?	`PartitioningValue`	`—`
Partitioning mode. `true` partitions by write timestamp, `false` disables, or a string for field-based/custom.
partitioningFormat?	`"year" \| "year/month" \| "year/month/day"`	`—`
Partition format granularity.

Input for the write pipeline execution.

Properties

Property	Type	Modifiers
records	`Array<Record<string, any>>`	`—`
The records to persist. Accepts any array of objects with string keys.
jsonSchema	`JsonSchema`	`—`
The JSON Schema describing the record structure.
config	`WritePipelineConfig`	`—`
Pipeline configuration.

Generates a Hive-style partition path based on date and format. Format options:

“year”: year=YYYY/ <uuid> .parquet
”year/month”: year=YYYY/month=MM/ <uuid> .parquet
”year/month/day”: year=YYYY/month=MM/day=DD/ <uuid> .parquet

Parameters

Parameter	Type	Default Value
date	`Date`	—
format?	`"year" \| "year/month" \| "year/month/day"`	`year/month/day`

Returns

string

Generates a flat file path (no partitioning).

Returns

string

Type

"disabled" | "timestamp" | "field" | "custom"

The normalized partitioning configuration used internally by the pipeline.

Properties

Property	Type	Modifiers
mode	`PartitionMode`	`—`
format	`"year" \| "year/month" \| "year/month/day"`	`—`
fieldName?	`string`	`—`
formatString?	`string`	`—`

Normalizes the raw partitioning config into a resolved structure.

Parameters

Parameter	Type	Default Value
partitioning?	`PartitioningValue`	`true`
partitioningFormat?	`"year" \| "year/month" \| "year/month/day"`	`year/month/day`

Returns

ResolvedPartitioning

Adds load_timestamp, load_timestamp_year, and load_timestamp_month to JSON Schema for consistent Parquet + Hive DDL derivation. Returns a new schema object (does not mutate the input).

Parameters

Parameter	Type	Default Value
jsonSchema	`JsonSchema`	—

Returns

JsonSchema

Injects load_timestamp, load_timestamp_year, and load_timestamp_month into each record. Returns new record array (does not mutate input).

Parameters

Parameter	Type	Default Value
records	`Array<Record<string, unknown>>`	—
timestamp	`Date`	—

Returns

Array<Record<string, unknown>>

Error thrown when a record’s partition field is missing, null, or invalid.

Constructor

new PartitionFieldError(fieldName, reason, recordIndex, value?)

Parameters

Parameter	Type	Default Value
fieldName	`string`	—
reason	`"missing" \| "null" \| "invalid_date"`	—
recordIndex	`number`	—
value?	`unknown`	—

Properties

Property	Type	Default Value
fieldName	`string`	—
Modifiers:`readonly`
reason	`"missing" \| "null" \| "invalid_date"`	—
Modifiers:`readonly`
recordIndex	`number`	—
Modifiers:`readonly`
value?	`unknown`	—
Modifiers:`readonly`

Extends

Error

A parsed segment from a custom partition format string.

Properties

Property	Type	Modifiers
fieldName	`string`	`—`
The field name to extract from the record.
component?	`"year" \| "month" \| "day" \| "hour" \| "minute" \| "second"`	`—`
If set, extract this date component from the field’s ISO date value.

Parses a custom partition format string into an array of segments.

Format: “segment/segment/…“ where each segment is either:

A plain field name: “customer_id” → extracts raw value
A field with date component: “event_date:year” → extracts year from ISO date

Parameters

Parameter	Type	Default Value
format	`string`	—

Returns

Array<PartitionSegment>

Generates a custom partition path from a record and parsed segments. For each segment:

Without component: formats as fieldName=<value>
With component: parses the field as ISO date, extracts the component, formats as component=<value> Appends /<uuid>.parquet at the end.

Parameters

Parameter	Type	Default Value
record	`Record<string, unknown>`	—
segments	`Array<PartitionSegment>`	—
recordIndex?	`number`	`0`

Returns

string

Groups records by their custom partition path. Records with the same partition key go to the same file (one UUID per unique partition key).

Parameters

Parameter	Type	Default Value
records	`Array<Record<string, unknown>>`	—
segments	`Array<PartitionSegment>`	—

Returns

Map<string, Array<Record<string, unknown>>>

Executes the write pipeline:

Convert records to Parquet via

Parameters

Parameter	Type	Default Value
input	`WritePipelineInput`	—

Returns

Promise<void>

Modifiers:async

Configuration for the Trino + Hive + S3 adapter.

Extends

AdapterConfig

Properties

Property	Type	Modifiers
type	`"trino-hive-s3"`	`—`
client	`TrinoClient`	`—`
The Trino client instance to use for DDL operations.
bucket	`string`	`—`
S3 bucket name for external table locations.
prefix?	`string`	`—`
Optional base prefix within the bucket (default: “”).
format?	`"PARQUET" \| "ORC" \| "AVRO" \| "JSON"`	`—`
Storage format (default: “PARQUET”).

Adapter for Hive external tables stored on S3.

Generates CREATE TABLE statements with:

external_location pointing to s3://<bucket>/<prefix>/<schema>/<table>
format set to the configured format (default: PARQUET)

Parameters

Parameter	Type	Default Value
config?	`Omit<TrinoHiveS3Config, "type">`	`{"prefix":"","format":"PARQUET"}`

Returns

StorageAdapter<TrinoHiveS3Config>

API Reference

HiveTableManagerConfig

Properties

HiveTableDefinition

Properties

HiveTableManager

Properties

createHiveTableManager

Parameters

Returns

StorageType

Type

StorageConfig

Properties

S3Config

Type

StorageOperations

Properties

StorageError

Constructor

Parameters

Properties

Extends

createStorageOperations

Parameters

Returns

TableDefinition

Properties

AdapterConfig

Properties

StorageAdapter

Type Parameters

Properties

LoadStrategy

Type

WritePipelineConfig

Properties

WritePipelineInput

Properties

generatePartitionPath

Parameters

Returns

generateFlatPath

Returns

PartitionMode

Type

ResolvedPartitioning

Properties

resolvePartitioningConfig

Parameters

Returns

enrichJsonSchemaWithTimestamp

Parameters

Returns

injectLoadTimestamp

Parameters

Returns

PartitionFieldError

Constructor

Parameters

Properties

Extends

PartitionSegment

Properties

parsePartitioningFormat

Parameters

Returns

generateCustomPartitionPath

Parameters

Returns

groupRecordsByCustomPartition

Parameters

Returns

executeWritePipeline

Parameters

Returns

TrinoHiveS3Config

Extends

Properties

createTrinoHiveS3Adapter