Aurora MySQL Connector - Table Re-sync Control Improvements
We understand there are a couple of scenarios, listed in your MySQL documentation, in which Fivetran triggers a re-sync of a table. Our data squads constantly add data and run schema migrations in Aurora MySQL, and these can trigger a Fivetran re-sync.
Environment: Aurora MySQL 8 using binlog-based replication.
However, a re-sync of a table also causes the following issues:
1) Increases the load on the primary Aurora MySQL source OLTP environment. Load on MySQL also increases when Fivetran performs "Data Check" validation (a feature we see in the MySQL connector). We have no control over its schedule. We observed that these are CPU/IO-intensive queries.
2) Incremental sync of the re-synced table is paused until the re-sync (historical backfill) is complete.
3) Slows down or pauses incremental binlog replication of non-impacted tables to the destination.
4) Overall, it delays downstream users' ability to query fresh data in downstream systems such as Snowflake and MDLS. The problem gets worse when a large table is re-synced and we have a significant number of tables to replicate.
We would like Fivetran to give us controls/knobs to pause the automatic re-sync of a table, so that customers can perform such maintenance during a low-traffic window outside normal business hours as required.
The current lack of control seriously limits our ability to use Fivetran to deliver data on time for our business, impacting our overall data latency SLA to destinations.
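As a rough illustration of the kind of control requested here, the sketch below checks whether a given time falls inside a low-traffic window before a re-sync would be allowed to start. The window bounds (22:00 to 05:00) and the function name are assumptions made up for this example; they are not Fivetran settings or part of any Fivetran API.

```python
from datetime import datetime, time

# Hypothetical low-traffic window (an assumption for illustration only).
WINDOW_START = time(22, 0)
WINDOW_END = time(5, 0)


def in_maintenance_window(now: datetime,
                          start: time = WINDOW_START,
                          end: time = WINDOW_END) -> bool:
    """Return True if `now` falls inside the [start, end) window."""
    current = now.time()
    if start <= end:
        # Window lies within a single day, e.g. 01:00 to 04:00.
        return start <= current < end
    # Window wraps past midnight, e.g. 22:00 to 05:00.
    return current >= start or current < end
```

With a gate like this, a pending re-sync could be held until the check returns True instead of starting during business hours.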
Thanks
Ramki
-
Hi Ramki,
Could you please describe specific scenarios in which Fivetran forces a full re-sync?
Thanks,
Val Kulichenko, Fivetran Product Team
-
Hi
As listed in the Fivetran documentation for Aurora MySQL and Postgres, these are the scenarios in which Fivetran performs automatic table re-syncs:
https://fivetran.com/docs/connectors/databases#automatictableresyncforsqldatabases
Thanks
Ramki
-
Hi Ramki,
Sorry, let me rephrase this. We generally treat these scenarios on a case-by-case basis, so which specific scenarios are causing re-syncs for you? Any particular schema changes, for example?
Thanks,
Val Kulichenko, Fivetran Product Team
-
Hi
On Aurora MySQL, we plan to change the primary key (PK) column from INT to BIGINT to handle growth in data. We expect that this PK change can trigger a re-sync. We are currently replicating more than 790 tables.
Our source squads constantly perform Ruby on Rails Active Record based LHM data and schema migrations on Aurora MySQL and Postgres, and they typically reorder columns on tables we are syncing to Snowflake via logical replication (binlogs). This has caused Fivetran to perform table re-syncs as well.
Overall, for the concerns listed originally, we would like better control over any table re-sync so that we can pause and trigger it ourselves. Ideally, the historical table re-sync would run against a secondary MySQL instance while binlog replication stays on the primary MySQL instance (which also hosts app traffic), with both merged into the same table in the destination, to minimize CPU/IO contention on the primary.
Thanks
Ramki