Partitioning on the _fivetran_synced column when syncing data from MySQL to Iceberg
Hi Team,
We use Fivetran to sync data from MySQL binlogs to Iceberg/S3.
Our request is to support partitioning on the date of the _fivetran_synced column for selected tables, chosen by the user. This would let our Spark jobs use partition pruning and read only the relevant data files, improving performance and reducing resource usage.
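To make the request concrete, here is a minimal sketch of what it amounts to in Iceberg terms. It only builds the Spark SQL strings: the first uses Iceberg's ALTER TABLE ... ADD PARTITION FIELD syntax (which needs the Iceberg Spark SQL extensions enabled), and the second is the kind of filtered read that could then be pruned. The table name lake.mysql_db.orders and the helper function names are hypothetical, and we are assuming the column is Fivetran's _fivetran_synced timestamp. Whether a Fivetran-managed table allows its partition spec to be changed this way is exactly what we are asking about.

```python
def add_day_partition_sql(table: str, column: str = "_fivetran_synced") -> str:
    """Build Iceberg Spark SQL that adds a day-granularity partition field.

    Assumption: `column` is a timestamp column on the target table.
    """
    return f"ALTER TABLE {table} ADD PARTITION FIELD days({column})"


def pruned_read_sql(table: str, since: str, column: str = "_fivetran_synced") -> str:
    """Build a read whose filter on the partition source column allows pruning."""
    return f"SELECT * FROM {table} WHERE {column} >= TIMESTAMP '{since}'"


if __name__ == "__main__":
    table = "lake.mysql_db.orders"  # hypothetical catalog.schema.table
    print(add_day_partition_sql(table))
    print(pruned_read_sql(table, "2024-01-01 00:00:00"))
```

In a Spark session each string would be passed to spark.sql(...). As far as we understand, adding a partition field only affects newly written files, so existing data would still need to be rewritten before reads on older data benefit.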
As I understand it, Fivetran periodically runs maintenance jobs that remove orphan files and compact many small files into larger ones to improve storage and query efficiency. Could the same jobs also rewrite the data into the partitioned layout where required?
Please let us know whether this can be supported, or if there are any considerations we should be aware of.
Thanks,