Skip to main content

Community

Union single tenant databases with the same schema

Answered

Please sign in to leave a comment.

Comments

2 comments

  • Official comment
    Fraser User

    Hi Scott - this is a very common use case!  We have customers successfully scaling to 100,000's of tables sharing the same schemas across thousands of databases. Our recommended architecture is to replicate each individual database into its own schema, then use a VIEW to re-combine the tables at query time. This side-steps combining the sharded data in the ETL pipeline, avoiding reliability & data integrity errors of having a single pipeline working across many databases. The data analysts working with the data queries the VIEW as a single logical table even though it’s physically split on disk.

    This strategy leverages the fact that modern data warehouses natively handle computations across tables sharded over multiple nodes — it’s how data is stored under the hood. In practice, querying VIEWs that are UNION SELECT’s of the underlying sharded tables is highly performant. The data warehouse query planner is smart enough to avoid materializing the intermediary VIEW.

    We wrote a case study about how to automatically create the VIEW that ties the tables together: https://fivetran.com/case-studies/replicating-sharded-databases-a-case-study-of-salesloft-citus-data-and-fivetran

    Thanks Fraser.  It sounds like I'm doing it the right way then!