Destination Improvement: S3 Data Lake - Date Partitions

Planned

David Rooney User

July 31, 2023 11:19

The S3 data lake feature uses Parquet format and normalised tables. We need a way to reduce the data scanned in queries when filtering by dates.

If there were an option to partition data by year, month, and day, queries that filter on a date range could skip every other partition, reducing the data scanned by as much as 1000x.

I would expect this to be preconfigured when selecting the schema in a connector, with partitions defined on default properties such as dates.
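
A minimal sketch of the kind of layout being asked for, assuming Hive-style `year=/month=/day=` key prefixes, a common convention that query engines such as Athena can prune on. The table name `orders` and both function names are hypothetical and only illustrate the idea; nothing here reflects how the destination actually writes files.

```python
from datetime import date, timedelta
from typing import List


def partition_prefix(table: str, day: date) -> str:
    """Build a Hive-style key prefix for one day of one table (assumed layout)."""
    return f"{table}/year={day.year:04d}/month={day.month:02d}/day={day.day:02d}/"


def prefixes_for_range(table: str, start: date, end: date) -> List[str]:
    """List the prefixes a query filtered to [start, end] would need to read."""
    days = (end - start).days
    return [partition_prefix(table, start + timedelta(days=i)) for i in range(days + 1)]


if __name__ == "__main__":
    # A three-day filter touches three prefixes instead of the whole table.
    for prefix in prefixes_for_range("orders", date(2023, 7, 30), date(2023, 8, 1)):
        print(prefix)
```

With a layout like this, a query restricted to a few days only reads the Parquet files under the matching prefixes, which is where the reduction in scanned data would come from.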

Comments

3 comments

Official comment

Coral Trivedi User
- August 13, 2023 20:06
Hi David,

Thank you for your feedback! We are currently developing a partitioning strategy for the S3 Destination and plan to start implementation in Q4 of this year.

Stay tuned!

David Rooney User
- January 29, 2024 11:34
Hi,

Any update on the progress of this? Was it started in Q4 as planned?

Regards,

David

Coral Trivedi User
- February 26, 2024 16:48
Hi David,

Thanks for following up! We've been working on ways to improve the performance of the data lake writer, and as a result the partitioning work has been pushed down the roadmap. I will report back when I have a better sense of timing.

Best,

Coral