Destination Improvement: Add Compaction for Managed Data Lake (Apache Iceberg)

Answered

Comments

2 comments

  • Official comment

    Thanks for the feedback @Chase. 
    We are currently exploring how to enhance Fivetran's Managed Data Lake compaction strategy. Today we run compaction on the data included in each particular sync. For most workloads this results in sufficient compaction to reduce S3 I/O costs and prevent Iceberg performance degradation.

    I'd be happy to chat more about the specifics of your use case. Would you mind emailing me at (casey dot karst at fivetran.com)?

    Thanks, 
    Casey

     

    This is pretty shocking to me, honestly. I expected compaction to be a core part of the managed experience, particularly for near real-time Apache Iceberg datasets.

    Compaction isn’t really optional in Iceberg - it’s a fundamental maintenance task. If Fivetran doesn’t provide this as part of the managed service, there at least needs to be a clear and supported way for users to perform compaction themselves.