
Destination Improvement: COPY INTO for Databricks

marco blanca

August 01, 2023 15:46

HVR 5 used the following steps:

Source -> temp ADLS -> Target Table

The second step did not use a burst table; it loaded the target directly with COPY INTO.

HVR 6 does:

Source -> temp ADLS -> burst table -> target table

This leads to several problems:

1. Lower performance: in an append-only scenario the burst table is not required.

2. COPY INTO is idempotent while the burst step is not, which can lead to duplicates that affect the repair feature.

3. The burst table changes the order of the records, because it processes changes by change type and based on time.
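To illustrate point 2, here is a toy in-memory model, not Databricks or HVR code. It assumes COPY INTO behaves as documented by Databricks, skipping files it has already loaded, and it models the burst path as a blind append with no record of earlier attempts, which is the behavior described above. The `Table` class and both load functions are hypothetical names. Running each load twice simulates a retry after a failure.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Toy in-memory stand-in for a target table (hypothetical, not Databricks)."""
    rows: list = field(default_factory=list)
    # Files already ingested; models how COPY INTO skips files loaded before.
    loaded_files: set = field(default_factory=set)


def copy_into(target: Table, staged: dict) -> None:
    """Idempotent load: each staged file is appended at most once."""
    for name, rows in staged.items():
        if name in target.loaded_files:
            continue  # already loaded on an earlier attempt, so skip it
        target.rows.extend(rows)
        target.loaded_files.add(name)


def burst_then_insert(target: Table, staged: dict) -> None:
    """Non-idempotent load (assumed): fill a burst table, then append all of it."""
    burst = [row for rows in staged.values() for row in rows]
    target.rows.extend(burst)  # nothing records what was applied before


staged = {"part-0001.parquet": [(1, "a"), (2, "b")]}

t1, t2 = Table(), Table()
for _ in range(2):  # the second iteration simulates a retry after a failure
    copy_into(t1, staged)
    burst_then_insert(t2, staged)

print(len(t1.rows), len(t2.rows))  # 2 4
```

After the retry, the COPY INTO path still holds 2 rows, while the blind-append path holds 4. Those are the duplicates that a repair would later have to deal with.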


Comments

4 comments

Ankit kumar
- August 02, 2023 07:45
Upvote

Mark Van de Wiel
- August 15, 2023 21:58
Ankit,

Thank you for submitting this request.

Integration into LDP targets is uniform, i.e., it works the same way for all targets. The behavior in HVR 5 was different because delivery into Databricks went through the AgentPlugin. Now that version 6.1 supports Databricks natively, delivery works the same way as on other platforms.

As for idempotent behavior: Databricks does not currently support multi-statement transactions. This means our default goal of keeping a transactionally consistent image on the target cannot be met. However, individual statements, such as the COPY INTO command, are still idempotent. I recommend you set Integrate CommitFrequency to STATEMENT to tell LDP that this is the behavior to expect. It should then recover properly without introducing duplicates.
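The reasoning above can be shown with a toy model. Without multi-statement transactions, a failure mid-cycle leaves the earlier statements committed. Replaying the cycle is still safe if every statement is idempotent. This sketch does not show how LDP implements recovery. The key-based `upsert` statement and `run_cycle` are hypothetical stand-ins.

```python
from typing import Callable, Dict, List

Table = Dict[int, str]
Statement = Callable[[Table], None]


def upsert(key: int, value: str) -> Statement:
    """Idempotent statement: applying it twice leaves the same state as once."""
    def apply(table: Table) -> None:
        table[key] = value
    return apply


def run_cycle(table: Table, statements: List[Statement], fail_at: int = -1) -> None:
    """Apply statements one by one; each commits on its own (no multi-statement transaction)."""
    for i, stmt in enumerate(statements):
        if i == fail_at:
            raise RuntimeError("simulated failure in the middle of a cycle")
        stmt(table)


cycle = [upsert(1, "a"), upsert(2, "b"), upsert(3, "c")]

clean: Table = {}
run_cycle(clean, cycle)

recovered: Table = {}
try:
    run_cycle(recovered, cycle, fail_at=2)  # the first two statements are already committed
except RuntimeError:
    pass
run_cycle(recovered, cycle)  # recovery replays the whole cycle

print(recovered == clean)  # True
```

If one of the statements were a blind append instead of a key-based upsert, the replay would duplicate its rows. That is why idempotency per statement matters once cross-statement atomicity is gone.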

Regarding order: the only reliable transaction order is the one given by the TimeKey column, which we recommend you populate with {hvr_integ_seq}.
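This is a minimal sketch of relying on that column rather than on physical row order. It assumes the rows carry a TimeKey value whose sort order matches commit order. Here it is a hypothetical integer field `time_key`, standing in for whatever {hvr_integ_seq} produces. The rows are listed grouped by change type, as they might land after burst processing.

```python
from operator import itemgetter

# Hypothetical rows in the order they landed, grouped by change type.
# "time_key" stands in for a TimeKey column populated from {hvr_integ_seq};
# we assume its values sort in the order the changes were committed.
landed = [
    {"op": "insert", "id": 1, "time_key": 1},
    {"op": "insert", "id": 2, "time_key": 4},
    {"op": "update", "id": 1, "time_key": 3},
    {"op": "delete", "id": 2, "time_key": 5},
    {"op": "update", "id": 1, "time_key": 2},
]

# Physical arrival order is not reliable; sort on the TimeKey value instead.
replayed = sorted(landed, key=itemgetter("time_key"))
print([(r["op"], r["id"]) for r in replayed])
```

The sorted output lists the insert of row 1, its two updates, then the insert and delete of row 2. This is the commit order, regardless of how the burst step grouped the rows.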

I will consider your enhancement as a general improvement for all LDP targets and put it on the backlog.

Thanks,
Mark.

marco blanca
- May 02, 2024 10:56
Hi,
Is there any update on this feature?

The introduction of the MERGE statement has a high cost impact of around 30-40%.

Thanks

Mark Van de Wiel
- May 04, 2024 12:38
Hi Marco,

Thank you for your suggestion.

We are working on append optimizations, both generically and for other platforms. We may get to this in the second half of calendar year 2024.

Thanks,
Mark.