Destination Improvement: S3 Data Lake - Date Partitions
Planned
The S3 data lake feature uses Parquet format and normalised tables. We need a way to reduce the data scanned in queries when filtering by dates.
If there were an option to partition data by Year, Month, and Day, queries filtering on a date range would only need to read the matching partitions, which could reduce the data scanned by as much as 1000x.
I would expect this to be preconfigured when selecting the Schema in a connector. It would define the partitions for default properties such as Dates.
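To illustrate the request, here is a minimal sketch of the kind of layout I have in mind, assuming Hive-style `year=/month=/day=` key prefixes (a common convention that query engines such as Athena can prune on). The table name `tickets` and both helper functions are hypothetical and not part of the product; this only shows how a date filter narrows the set of prefixes a query would read.

```python
from datetime import date, timedelta


def partition_prefix(table: str, day: date) -> str:
    """Build a Hive-style key prefix (year=/month=/day=) for one day of data."""
    return f"{table}/year={day.year:04d}/month={day.month:02d}/day={day.day:02d}/"


def prefixes_for_range(table: str, start: date, end: date) -> list[str]:
    """List the prefixes a query filtered to [start, end] would need to read."""
    days = (end - start).days + 1
    return [partition_prefix(table, start + timedelta(days=i)) for i in range(days)]


# Three years of daily partitions for a hypothetical "tickets" table.
all_prefixes = prefixes_for_range("tickets", date(2021, 1, 1), date(2023, 12, 31))

# A query filtered to a single week only touches seven of those prefixes.
week = prefixes_for_range("tickets", date(2023, 6, 1), date(2023, 6, 7))

print(week[0])
print(len(week), "of", len(all_prefixes), "partitions scanned")
```

The actual saving depends on how evenly data is spread across days and on how wide the date filter is, so the ratio above is only indicative.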
-
Official comment
Hi David,
Thank you for your feedback! We are currently developing a partitioning strategy for the S3 Destination and plan to start implementation in Q4 of this year.
Stay tuned!
-
Hi,
Any update on the progress of this? Was it started in Q4 as planned?
Regards,
David
-
Hi David,
Thanks for following up! We've been working on ways to improve the performance of the data lake writer and as a result the partitioning work has been pushed down the roadmap. I will report back when I have a better sense of timing.
Best,
Coral