Destination Improvement: COPY INTO for Databricks

Comments

4 comments

  • Mark Van de Wiel User

    Ankit,

    Thank you for submitting this request.

    Integration into LDP's targets is uniform, i.e., it works the same for all targets. The behavior in HVR 5 was different because it used the AgentPlugin to deliver into Databricks. Now that version 6.1 supports Databricks natively, delivery is identical to other platforms.

    As it relates to idempotent behavior: Databricks does not currently support multi-statement transactions. This means that our default goal of keeping a transactionally consistent image on the target cannot be met. However, individual statements, such as the COPY INTO command, are still idempotent. I recommend you set Integrate CommitFrequency to STATEMENT to inform LDP that this is the behavior. It should then recover properly without introducing duplicates.
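
To illustrate why statement-level idempotence is enough for a clean recovery, here is a toy Python model, not LDP or Databricks code. `ToyTable`, `copy_into`, `integrate` and the file names are all hypothetical. The only behavior it borrows is that a file-based load which has already completed is skipped when the same statement is replayed.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Hypothetical stand-in for a target table that remembers loaded files."""
    rows: list = field(default_factory=list)
    loaded_files: set = field(default_factory=set)

    def copy_into(self, file_name, file_rows):
        """Load a file once; repeating the call for the same file is a no-op."""
        if file_name in self.loaded_files:
            return 0  # already loaded, so a retry adds nothing
        self.rows.extend(file_rows)
        self.loaded_files.add(file_name)
        return len(file_rows)


def integrate(table, staged_files, fail_after=None):
    """Run one statement per staged file, each committed on its own.

    fail_after simulates a crash after that many statements have completed.
    """
    for done, (name, rows) in enumerate(staged_files.items()):
        if fail_after is not None and done == fail_after:
            raise RuntimeError("simulated failure mid-cycle")
        table.copy_into(name, rows)


# Assumed sample data: two staged files for one integrate cycle.
staged = {
    "orders_001.csv": [("o1", 10), ("o2", 20)],
    "orders_002.csv": [("o3", 30)],
}

table = ToyTable()
try:
    integrate(table, staged, fail_after=1)  # first file loads, then the crash
except RuntimeError:
    pass

integrate(table, staged)  # recovery replays every statement from the start
```

After the replay the table holds each row exactly once, because the statement for the first file is skipped rather than applied a second time.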

    Regarding order: the only reliable indicator of transaction order is the TimeKey column, which we recommend you populate with {hvr_integ_seq}.
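
As a rough sketch of how a consumer of the appended rows might use that column, the Python below picks the most recent change per key by sorting on the sequence value instead of relying on arrival order. The column names `id` and `value`, the function `latest_per_key`, and the integer sequence values are assumptions for illustration; the real {hvr_integ_seq} values may have a different type, and the sketch only assumes they compare in transaction order.

```python
def latest_per_key(changes, key="id", seq="hvr_integ_seq"):
    """Return the last change per key, ordered by the sequence column.

    Arrival order in `changes` is ignored; only the sequence value decides
    which change is the most recent one.
    """
    latest = {}
    for change in sorted(changes, key=lambda c: c[seq]):
        latest[change[key]] = change  # later sequence values overwrite earlier
    return latest


# Assumed sample rows, deliberately listed out of sequence order.
changes = [
    {"id": "a", "value": "v3", "hvr_integ_seq": 30},
    {"id": "a", "value": "v1", "hvr_integ_seq": 10},
    {"id": "b", "value": "w1", "hvr_integ_seq": 20},
    {"id": "a", "value": "v2", "hvr_integ_seq": 25},
]

current = latest_per_key(changes)
```

Here `current["a"]` is the row with sequence 30 even though it appears first in the input, which is the point of ordering by the sequence column.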

    I will consider your enhancement as a general improvement for all LDP targets and put it on the backlog.

    Thanks,
    Mark.

    Hi,
    Is there any update on the feature?

    The introduction of the MERGE statement has a high cost impact, around 30-40%.

    Thanks

  • Mark Van de Wiel User

    Hi Marco,

    Thank you for your suggestion.

    We are working on append optimizations generically and for other platforms. It looks like we may get to this in the second half of calendar year 2024.

    Thanks,
    Mark.