Community

Improve Managing Fivetran related metadata

Planned

Alex Thornbury User

June 24, 2026 20:17

Which connector?:

At a minimum, this pertains to fivetran_metadata and Salesforce-related connections.

Additional details:

These two connections (and possibly others) create multiple metadata-related tables whose data accumulates quickly. Managing these tables and their data retention falls entirely on the customer.

At least three of these tables collect a lot of data:

- fivetran_metadata.log

- <<salesforce>>.fivetran_api_call

- <<salesforce>>.fivetran_query

A common data retention practice is to keep a fixed number of days, months, or years of data, and these tables have columns that can support that. However, they do not have indexes on those columns, which makes DELETE statements slow and expensive. Adding an index is possible, but it is an added burden for the customer to manage.

An even easier way to manage these tables would be to partition them by date range. When the Destination is a PostgreSQL cluster, this is made very easy with the pg_partman extension. Once a table is managed by pg_partman, adding new partitions and dropping old ones according to a company's data retention policies is easy.

The first two tables, fivetran_metadata.log and <<salesforce>>.fivetran_api_call, can support this because their existing primary key constraints already contain the column (time_stamp and start, respectively) that would be used as the partitioning column in PostgreSQL. You can create the partitioned parent table with its child partitions, PostgreSQL's partitioning takes over, and Fivetran continues to work without any problem.

The third table, <<salesforce>>.fivetran_query, has a column that would support this and make management easier, but it is not part of the primary key constraint, so partitioning cannot work on it. Further, the constraint cannot be changed, because a Fivetran sync will notice the discrepancy between the target DDL and the expected DDL, drop the new primary key constraint, and try to re-create the existing primary key constraint, which fails.
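
The underlying PostgreSQL restriction can be reproduced on a scratch table. This is a minimal sketch, assuming a throwaway table named demo_query (hypothetical, not a Fivetran object): a primary key on a partitioned table is only accepted if it includes every partitioning column.

```sql
-- Hypothetical scratch table; not part of any Fivetran schema.
-- Rejected by PostgreSQL: the primary key (id) does not include
-- the partitioning column ("start").
CREATE TABLE demo_query (
    id      varchar(36) NOT NULL,
    "start" timestamptz NOT NULL,
    CONSTRAINT demo_query_pkey PRIMARY KEY (id)
) PARTITION BY RANGE ("start");

-- Accepted: the partitioning column is part of the primary key.
CREATE TABLE demo_query (
    id      varchar(36) NOT NULL,
    "start" timestamptz NOT NULL,
    CONSTRAINT demo_query_pkey PRIMARY KEY (id, "start")
) PARTITION BY RANGE ("start");
```

The first statement errors out without creating anything, so the second can be run right after it in the same session.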

At a minimum, please update the start column to NOT NULL and the primary key constraint for <<salesforce>>.fivetran_query from just "id" to "id, start". The id column is already unique, so adding start will not affect the uniqueness. I have not seen an example where start is NULL so changing it to NOT NULL should not cause any problems either.
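
To illustrate the request, here is a rough sketch of a check for NULL values followed by the DDL the change amounts to. The schema name salesforce_schema is a placeholder for the Salesforce connection's schema, and the constraint name fivetran_query_pkey is an assumption about how the existing key is named. The ALTER statements only show the desired end state; this is something Fivetran would need to apply on its side.

```sql
-- salesforce_schema is a placeholder for the Salesforce connection's schema.
-- A count of 0 suggests SET NOT NULL would not fail on existing rows.
SELECT count(*) AS null_start_rows
FROM salesforce_schema.fivetran_query
WHERE "start" IS NULL;

-- Desired end state (illustration only): "start" is NOT NULL and
-- part of the primary key, so the table could be range partitioned on it.
ALTER TABLE salesforce_schema.fivetran_query
    ALTER COLUMN "start" SET NOT NULL;

ALTER TABLE salesforce_schema.fivetran_query
    DROP CONSTRAINT fivetran_query_pkey,
    ADD CONSTRAINT fivetran_query_pkey PRIMARY KEY (id, "start");
```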

Even better, please review these metadata features and put some effort into making them easier to manage across the board. At the very least, maybe make them an optional feature that is disabled by default for net new connections?

Comments

7 comments

Official comment

Sadie Martin Fivetranner
- July 06, 2026 19:36
Hi Alex,
Thanks again for the detail. I understand your main concern around data retention and ease of warehouse maintenance. Let me address your points directly:
- Opt-in vs. opt-out for metadata features: The Fivetran metadata connector is opt-out rather than opt-in to ensure this data is available when customers need it. This minimizes delays in troubleshooting or operational tasks that rely on historical metadata. Given that opting out is already supported for the fivetran_metadata connector and that partitioning is supported for the log table in particular, can you let me know if you're experiencing a specific limitation with this connector's current approach?
- Table partitioning and index management: For tables like fivetran_metadata.log and <<salesforce>>.fivetran_api_call, it is possible to partition these tables as you've demonstrated. While I understand this is a non-trivial process, Fivetran generally leaves warehouse tuning, such as indexing and partitioning, to customers to accommodate warehouse-specific and environment-specific requirements. We can certainly consider adding documentation for migrating these tables to partitioned tables in PostgreSQL. Would that be helpful to someone in your shoes?
- salesforce.fivetran_query schema request: I recognize the limitation with <<salesforce>>.fivetran_query and the inability to partition this table due to the current key constraint. We have added your request to make the start column non-null and update the primary key constraint to our backlog. We will update this ticket when we are able to prioritize this work and/or define a solution.
Sadie
Sadie Martin Fivetranner
- June 26, 2026 13:29
Hi Alex,
Sadie from the Product Management team here. This is a valuable suggestion regarding managing Fivetran-related metadata, especially around efficient data retention. We appreciate your clear feedback and suggestions.
The request has been added to our feature improvements backlog. To help us better prioritize and scope this, could you share more about your specific challenges? What's your current workaround and how often are you managing data retention related to these metadata tables?
We will keep this thread updated with any progress or changes regarding this request.
Thanks,
Sadie
Alex Thornbury User
- June 29, 2026 23:09
- Edited
Sadie-

We are trying to keep a rolling three months of data in these tables. I currently have 9 PostgreSQL destinations. All of them have one or more Connections from a Salesforce source, and 8 of them have a fivetran_metadata source. So it is a tedious task to set up initially.

So far, I have successfully migrated "fivetran_metadata.log" and "<<salesforce>>.fivetran_api_call" to partitioned tables managed by pg_partman. It is not possible to convert an existing table to a partitioned table in PostgreSQL, so the migration process is as follows.
This assumes that the pg_partman extension is already installed in a schema called partman.

Rename the table and any associated objects (constraints, indexes, etc.)

ALTER TABLE fivetran_metadata.log RENAME TO log_old;
ALTER INDEX fivetran_metadata.log_pkey RENAME TO log_old_pkey;

Create a partitioned version of the table using the CREATE TABLE ... LIKE syntax, e.g.
```
CREATE TABLE fivetran_metadata.log (
LIKE fivetran_metadata.log_old INCLUDING DEFAULTS INCLUDING COMMENTS
) PARTITION BY RANGE (time_stamp);

ALTER TABLE fivetran_metadata.log ADD PRIMARY KEY (id, time_stamp);
```
Configure the table to be managed by pg_partman, setting:
- the partitioning column
- the partitioning type
- the partitioning interval
- a default catch-all partition
- a starting date 3 months in the past, so that partitions are created for the backfill
```
SELECT partman.create_parent(
	p_parent_table => 'fivetran_metadata.log',
	p_control => 'time_stamp',
	p_type => 'range',
	p_interval => '1 month',
	p_premake => 1,
	p_default_table => true,
	p_start_partition => '2026-03-01'
);
```
Configure pg_partman with the desired data retention settings
```
UPDATE partman.part_config
SET retention = '3 months',
retention_keep_table = false
WHERE parent_table = 'fivetran_metadata.log';
```
Run pg_partman's partition maintenance function (and schedule it accordingly with whatever scheduling option you use)
```
SELECT partman.run_maintenance();
```
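One way to confirm the result is to list the child partitions and the stored settings. This is a minimal sketch, assuming pg_partman's show_partitions function is available in the partman schema and that part_config exposes the columns named here:

```sql
-- List the child partitions pg_partman currently manages for the table.
SELECT partition_schemaname, partition_tablename
FROM partman.show_partitions('fivetran_metadata.log');

-- Review the partitioning and retention settings stored for the table.
SELECT parent_table, control, partition_interval, premake,
       retention, retention_keep_table
FROM partman.part_config
WHERE parent_table = 'fivetran_metadata.log';
```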
Once that is done, backfill the newly created partitioned table using the renamed table as the source. Since it is a lot of data, I prefer to insert in batches to minimize the impact. This is a preference, not a requirement; the backfill can be done in many other ways as well.
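
A batched backfill could look roughly like the sketch below. The one-day batch size and the ON CONFLICT clause are assumptions for illustration, not part of the original process. It requires PostgreSQL 11 or later and must be run outside an explicit transaction block so that the COMMIT inside the DO block is allowed.

```sql
DO $$
DECLARE
    -- Assumption: only the retained window is backfilled, starting at
    -- the beginning of the month three months ago.
    batch_start timestamptz := date_trunc('month', now()) - interval '3 months';
    batch_end   timestamptz;
    stop_at     timestamptz := now();
BEGIN
    WHILE batch_start < stop_at LOOP
        batch_end := batch_start + interval '1 day';  -- arbitrary batch size

        INSERT INTO fivetran_metadata.log
        SELECT *
        FROM fivetran_metadata.log_old
        WHERE time_stamp >= batch_start
          AND time_stamp <  batch_end
        ON CONFLICT DO NOTHING;  -- skip rows already present in the new table

        COMMIT;  -- keep each batch in its own short transaction
        batch_start := batch_end;
    END LOOP;
END
$$;
```

SELECT * works here because CREATE TABLE ... LIKE keeps the column order of the source table.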

pg_partman's maintenance function, run_maintenance, will add new partitions and detach and drop old ones according to the configuration settings of each managed table. These actions are DDL in nature, not DML, so they should be nearly instantaneous compared to expensive DELETE statements.
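
As one scheduling option, the maintenance call can be registered with pg_cron. This is a sketch under the assumption that the pg_cron extension is installed in the database; the job name and schedule are made up for illustration, and any external scheduler that runs the same SELECT would work as well.

```sql
-- Assumption: the pg_cron extension is installed in this database.
SELECT cron.schedule(
    'partman-maintenance',                -- hypothetical job name
    '0 3 * * *',                          -- daily at 03:00 in pg_cron's configured time zone
    $$SELECT partman.run_maintenance()$$  -- same call as the manual run
);
```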
Alex Thornbury User
- June 29, 2026 23:23
<<salesforce>>.fivetran_query on the other hand will not currently support this option. The candidate column, start, is scoped as NULLABLE and is not part of the existing primary key constraint. It is not possible to alter the primary key constraint because the DDL discrepancy will be detected on the next sync job. When this happens, the job will drop the new PK and attempt to recreate the old PK. If you did attempt to manage this with pg_partman and created a partitioned table, PostgreSQL will not allow the old primary key to be created.

So for the moment, the only option is to use a DELETE statement BUT the other problem is that while the "start" column is a candidate to use in this action, there is NO index on the column. Thus the deletion is an expensive operation. Oddly enough, even though you cannot alter a primary key constraint, you can add an index and the sync job doesn't seem to care.
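
A minimal sketch of that interim approach, using the same schema name as the examples in this comment. The index name and the three-month cutoff are illustrative choices.

```sql
-- Index the retention column so the DELETE does not need a full table scan.
CREATE INDEX IF NOT EXISTS fivetran_query_start_idx
    ON sf_onepinnacol.fivetran_query ("start");

-- Remove rows older than the start of the month three months ago.
-- Rows where "start" is NULL never match this predicate and are kept.
DELETE FROM sf_onepinnacol.fivetran_query
WHERE "start" < date_trunc('month', now()) - interval '3 months';
```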

Initially getting this table under control is painful. Since it can be large if unmanaged, it is generally easier to backfill a new table with applicable data from the original table.
- Rename the table, constraints, and add an index on the renamed table:
```
ALTER TABLE sf_onepinnacol.fivetran_query RENAME TO fivetran_query_old;
ALTER TABLE sf_onepinnacol.fivetran_query_old RENAME CONSTRAINT fivetran_query_pkey TO fivetran_query_old_pkey;

CREATE INDEX ON sf_onepinnacol.fivetran_query_old ("start");
```
- Re-create the table and any associated grants
```
CREATE TABLE sf_onepinnacol.fivetran_query (
 id varchar(36) NOT NULL,
 "start" timestamptz NULL,
 done timestamptz NULL,
 source_object varchar(256) NULL,
 source_api varchar(256) NULL,
 modified_field varchar(256) NULL,
 modified_since_inclusive timestamptz NULL,
 query varchar(16384) NULL,
 merge_mode varchar(256) NULL,
 rows_updated_or_inserted int8 NULL,
 _fivetran_synced timestamptz NULL,
 CONSTRAINT fivetran_query_pkey PRIMARY KEY (id)
);

ALTER TABLE sf_onepinnacol.fivetran_query OWNER TO ...; --if necessary
GRANT SELECT ON TABLE sf_onepinnacol.fivetran_query TO ...;
...
```
- Backfill the new table from the old table. This might be doable with a simple INSERT INTO if the amount of data is small enough; otherwise, other methods such as pg_dump or batched insertion are necessary.
```
-- Insert data from the start of the month three months ago to current
INSERT INTO sf_onepinnacol.fivetran_query
SELECT
 *
FROM
 sf_onepinnacol.fivetran_query_old
WHERE
 "start"  >= date_trunc('MONTH', now()) - INTERVAL '3 month';
```
- Create an index on the new table to assist in the future (it is generally easier to insert first, then index)
```
CREATE INDEX fivetran_query_start_idx ON sf_onepinnacol.fivetran_query USING btree ("start");
```
- Drop the old table
```
DROP TABLE IF EXISTS sf_onepinnacol.fivetran_query_old;
```
Alex Thornbury User
- June 29, 2026 23:31
After I have all of the affected clusters in an initial good state, then the frequency of the maintenance will be monthly going forward. However, it will require two different mechanisms at present:
- use pg_partman's partition maintenance function to easily manage 2 of the 3 tables
- use a DELETE statement to manage the third table. this could be:
I would much rather have a single and simple mechanism moving forward, in other words pg_partman.

Even if I can drop the fivetran_metadata connection from our connections assuming there is no dependency upon them, it still leaves me with the two tables created for Salesforce Connections.
Alex Thornbury User
- June 29, 2026 23:45
The fivetran_metadata Connector is optional, but requires an opt-out as far as I know. In other words, you create a Connector and the fivetran_metadata Connector is created automatically, then you have to delete it afterwards. An opt-in feature would be preferable in my opinion.

Ideally, I would think that the Salesforce additional metadata functionality would be an opt-in feature, not an enabled by default feature that cannot be disabled.

If that is not doable, then any logging or similar append-only tables should be constructed with a reasonable attempt at data management. Ideally, when PostgreSQL is the destination, the tables are configured such that PostgreSQL partitioning can be used. pg_partman is not required for this, but it is very easy and flexible to use. At the very least, the tables should have the necessary indexes so that DELETE statements are easy.

In addition, if the management of these tables falls upon the customer then it should be well documented on how to do it. If this is documented, then it was not easily found in the documents.
Alex Thornbury User
- July 01, 2026 14:18
Managing the partitions with pg_partman's maintenance script drops the oldest partition and successfully creates the newest future partition based on the table settings in pg_partman's part_config table in 1 to 2 seconds. Being able to manage large volumes of data easily like this is a huge win. Much easier than expensive and slow DELETE statements.