New Destination: HVR - AWS S3 with Apache Iceberg Table Format
This request is for HVR to support AWS S3 with the Apache Iceberg table format as a destination.
The goal is that Oracle and SAP HANA databases can be used as sources and AWS S3 with Iceberg as the destination, enabling change data capture (CDC) from these sources into Iceberg tables.
It should be possible to read the Iceberg tables from Snowflake as external Iceberg tables.
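Log-based CDC produces inserts, updates and deletes, which is one reason version 2 of the Iceberg specification matters here: v2 adds row-level delete files. The sketch below is a minimal in-memory model, not HVR's or Iceberg's actual implementation, of how a batch of captured changes could be written as an upsert-style commit using equality deletes and read back with merge-on-read. It assumes every source table has a single-column primary key; the names Commit, plan_commit and read_table and the I/U/D operation codes are hypothetical. It follows the spec rule that an equality delete only applies to data with a strictly lower sequence number.

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    """Hypothetical model of one Iceberg v2 commit (snapshot)."""
    sequence_number: int
    data_rows: dict = field(default_factory=dict)       # primary key -> row written in this commit
    equality_deletes: set = field(default_factory=set)  # primary keys deleted by equality


def plan_commit(changes, sequence_number):
    """Turn an ordered batch of CDC changes into one upsert-style commit.

    `changes` is a list of (op, key, row) tuples with op in {"I", "U", "D"}.
    Every touched key gets an equality delete, which masks older versions of
    the row (a harmless no-op for brand-new keys); keys whose last change is
    an insert or update also get their latest row written as new data.
    """
    last_change = {}
    for op, key, row in changes:
        last_change[key] = (op, row)  # later changes to the same key win
    commit = Commit(sequence_number)
    for key, (op, row) in last_change.items():
        commit.equality_deletes.add(key)
        if op in ("I", "U"):
            commit.data_rows[key] = row
    return commit


def read_table(commits):
    """Merge-on-read: a row is live unless an equality delete for its key
    exists in a commit with a strictly higher sequence number."""
    live = {}
    for commit in commits:
        for key, row in commit.data_rows.items():
            masked = any(
                key in other.equality_deletes
                and other.sequence_number > commit.sequence_number
                for other in commits
            )
            if not masked:
                live[key] = row
    return live


snapshot_1 = plan_commit([("I", 1, {"id": 1, "v": "a"}), ("I", 2, {"id": 2, "v": "b"})], 1)
snapshot_2 = plan_commit([("U", 1, {"id": 1, "v": "a2"}), ("D", 2, None)], 2)
print(read_table([snapshot_1, snapshot_2]))  # {1: {'id': 1, 'v': 'a2'}}
```

Writing a delete for every touched key keeps the logic simple and also covers a delete followed by a re-insert of the same key within one batch.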
Acceptance criteria:
- AWS Glue is supported as the Iceberg catalog
- Oracle and SAP HANA databases are supported as sources
- Metadata conforming to version 2 of the Apache Iceberg specification is generated
- Identity partition columns don't exceed 32 bytes
- Parquet files don't use the unsigned integer logical type
- Manifest files don’t contain duplicates
- Parquet metadata is UTF-8 compliant
- The table’s metadata statistics (for example, RowCount or NullCount) match the data content
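Several of these acceptance criteria can be checked mechanically. A minimal sketch, assuming the table metadata JSON is available as a string and that per-file facts (manifest statistics, Parquet integer annotations, Parquet key/value metadata and the actual rows) have already been extracted into the hypothetical DataFileInfo structure; reading real Avro manifests and Parquet footers is out of scope. Only the format-version and partition-specs keys follow the Iceberg metadata JSON layout; everything else is illustrative. The 32-byte rule is interpreted here, as an assumption, as the UTF-8 length of identity partition values.

```python
import json
from collections import Counter
from dataclasses import dataclass, field

MAX_IDENTITY_PARTITION_BYTES = 32  # limit taken from the acceptance criteria


@dataclass
class DataFileInfo:
    """Hypothetical summary of one Parquet data file and its manifest entry."""
    path: str
    partition: dict      # partition field name -> value
    record_count: int    # record count recorded in the manifest
    null_counts: dict    # column -> null count recorded in the manifest
    rows: list           # actual rows, used to cross-check the statistics
    int_logical_types: dict = field(default_factory=dict)  # column -> is_signed
    kv_metadata: dict = field(default_factory=dict)        # key -> raw bytes value


def validate(metadata_json, files):
    """Return a list of human-readable violations (empty list = all checks pass)."""
    problems = []
    meta = json.loads(metadata_json)
    if meta.get("format-version") != 2:
        problems.append(f"format-version is {meta.get('format-version')!r}, expected 2")

    # Names of all partition fields that use the identity transform.
    identity_cols = {
        pf["name"]
        for spec in meta.get("partition-specs", [])
        for pf in spec.get("fields", [])
        if pf.get("transform") == "identity"
    }

    # The same data file listed more than once counts as a duplicate entry.
    counts = Counter(f.path for f in files)
    problems += [f"duplicate manifest entry: {p}" for p, n in counts.items() if n > 1]

    for f in files:
        for col in identity_cols:
            value = f.partition.get(col)
            if value is not None and len(str(value).encode("utf-8")) > MAX_IDENTITY_PARTITION_BYTES:
                problems.append(f"{f.path}: identity partition value for {col!r} "
                                f"exceeds {MAX_IDENTITY_PARTITION_BYTES} bytes")
        for col, is_signed in f.int_logical_types.items():
            if not is_signed:
                problems.append(f"{f.path}: column {col!r} uses an unsigned integer logical type")
        for key, raw in f.kv_metadata.items():
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError:
                problems.append(f"{f.path}: metadata value for {key!r} is not valid UTF-8")
        if f.record_count != len(f.rows):
            problems.append(f"{f.path}: record_count {f.record_count} != {len(f.rows)} rows")
        for col, expected in f.null_counts.items():
            actual = sum(1 for r in f.rows if r.get(col) is None)
            if actual != expected:
                problems.append(f"{f.path}: null_count for {col!r} is {expected}, actual {actual}")
    return problems
```

For example, a metadata JSON with "format-version": 1 and no data files yields the single violation "format-version is 1, expected 2".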
Additional considerations:
- Ideally, a proposal is made for compaction of Iceberg tables, either leveraging AWS Glue's built-in compaction or another compaction approach. The goal is to keep the number of underlying Parquet files manageable and to achieve good read performance.
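One possible compaction approach, if AWS Glue's built-in compaction is not used, is a periodic bin-pack rewrite that merges small files into files near a target size. The sketch below only plans the rewrite groups. It is loosely modeled on the bin-pack strategy of Iceberg's rewrite_data_files procedure but is not that implementation; the 512 MB target, the 75% small-file threshold and the minimum of 5 input files per group are assumptions for illustration, and FileStat and plan_compaction are hypothetical names.

```python
from dataclasses import dataclass

MB = 1024 * 1024


@dataclass
class FileStat:
    """Hypothetical size record for one Parquet data file."""
    path: str
    size_bytes: int


def plan_compaction(files, target_bytes=512 * MB, min_input_files=5):
    """Group small files into rewrite groups of at most target_bytes each.

    Files at or above 75% of the target are left alone; groups with fewer
    than min_input_files files are dropped as not worth rewriting.
    """
    small_threshold = int(target_bytes * 0.75)
    small = sorted(
        (f for f in files if f.size_bytes < small_threshold),
        key=lambda f: f.size_bytes,
        reverse=True,
    )
    groups = []  # each entry: [total_bytes, list_of_files]
    for f in small:
        # First-fit decreasing: put the file in the first group that still has room.
        for group in groups:
            if group[0] + f.size_bytes <= target_bytes:
                group[0] += f.size_bytes
                group[1].append(f)
                break
        else:
            groups.append([f.size_bytes, [f]])
    return [files_in_group for _, files_in_group in groups if len(files_in_group) >= min_input_files]
```

First-fit decreasing tends to keep the number of output files low while never exceeding the target size; each planned group would then be rewritten and committed as one atomic snapshot.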
Hi Dominik,
Have you looked at the Fivetran Managed Service offering that supports Iceberg tables? https://fivetran.com/docs/destinations/s3-data-lake
Naturally, this approach does not support everything you are looking for. However, some of it is available, including support for Oracle Database as a source and populating the AWS Glue catalog (I am unsure to what extent your Parquet format requirements are met).
It would be good to understand how close the Fivetran solution is to what you are looking for, so that we can plan future product enhancements accordingly.
Thanks,
Mark.
Dear @Mark, thanks for the reply! This feature request is about Fivetran HVR, not Fivetran SaaS. While Fivetran SaaS (your link) supports writing to Iceberg, this feature request is about enabling exactly the same capability in the HVR product.
In particular, it should be possible to read from HANA and Oracle databases using HVR's log-based CDC and write into Iceberg tables stored in AWS S3.
Iceberg has really become the standard interoperable table format now, so the lack of Iceberg support in HVR is becoming a noticeable gap.
Since Fivetran SaaS already supports Iceberg, having the same capability in HVR would align the product much more closely with where most architectures are heading.