Destination Improvement: Add Compaction for Managed Data Lake (Apache Iceberg)
Answered
We use Fivetran's minute sync frequency on quite a few connectors.
Fivetran does not perform compaction or allow downstream automatic AWS Glue compaction on Apache Iceberg databases. This raises S3 costs, because reading many small files means many more requests, and it degrades query performance on the Iceberg tables themselves. Costs will keep rising and queries will keep getting slower until compaction is added.
Official comment
Thanks for the feedback @Chase.
We are currently exploring how to enhance Fivetran's Managed Data Lake compaction strategy. Today we run compaction on the data included in the particular sync. For most workloads this results in sufficient compaction to reduce S3 I/O costs and prevent Iceberg performance degradation.
I'd be happy to chat more about the specifics of your use case. Would you mind emailing me at (casey dot karst at fivetran.com)?
Thanks,
Casey
This is pretty shocking to me, honestly. I expected compaction to be a core part of the managed experience, particularly for near real-time Apache Iceberg datasets.
Compaction isn’t really optional in Iceberg - it’s a fundamental maintenance task. If Fivetran doesn’t provide this as part of the managed service, there at least needs to be a clear and supported way for users to perform compaction themselves.
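To make the small-file concern concrete, here is a rough sketch of the bin-packing idea behind compaction. It is not Fivetran's or Iceberg's implementation: the function name `plan_compaction`, the 128 MiB target, and the assumption that each minute-level sync writes one 1 MiB file are all hypothetical, chosen only for illustration.

```python
from typing import List

# Assumed target size for compacted files; illustrative, not a Fivetran setting.
TARGET_BYTES = 128 * 1024 * 1024


def plan_compaction(file_sizes: List[int], target: int = TARGET_BYTES) -> List[List[int]]:
    """Group file sizes into bins of at most `target` bytes (first-fit decreasing).

    Each returned bin represents one larger file that the small files
    would be rewritten into.
    """
    bins: List[List[int]] = []
    totals: List[int] = []
    for size in sorted(file_sizes, reverse=True):
        # Place the file in the first bin that still has room for it.
        for i, total in enumerate(totals):
            if total + size <= target:
                bins[i].append(size)
                totals[i] += size
                break
        else:
            # No existing bin fits, so start a new one.
            bins.append([size])
            totals.append(size)
    return bins


# Hypothetical workload: one day of minute-level syncs, one 1 MiB file per sync.
small_files = [1024 * 1024] * 1440
plan = plan_compaction(small_files)
print(len(small_files), "files ->", len(plan), "files after compaction")
```

Under these assumptions a full table scan touches 12 files instead of 1,440, which is why uncompacted minute-level syncs add up so quickly in S3 requests and query planning time.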