Connector Improvement: Google Search Console Connector Improvement
Not planned
Add a page column in the site_report_by_page table, to know the number of clicks per page as soon as possible.
Also, before you say that we can get that from the page_report table: the number of clicks in Google Search Console doesn't match the sum of clicks per page I get from the "page_report" table, whereas "search_report_by_page" gives the right sum of clicks for a particular site over a particular timeframe or day.
-
Official comment
Hi everyone,
Luke from the Fivetran Product Team here. I want to provide some clarity on this request, because the underlying issue is a Google Search Console API design constraint rather than something we can resolve by adding a column.

Why `site_report_by_page` doesn't include a `page` column

The naming of this table is admittedly confusing. "By page" here refers to Google's aggregation method, not a per-page breakdown. The Google Search Console API supports two aggregation modes: "by property" and "by page." These modes change how impressions, clicks, CTR, and position are calculated. When aggregating "by page," each unique URL is counted separately in the metrics (for example, if two URLs from your site appear in the same search result, that counts as two impressions rather than one). When aggregating "by property," those would be deduplicated into a single impression for the site.
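As a rough illustration of the two-URL example above, here is a toy sketch in Python. The data, the `Appearance` record, and both function names are made up for illustration; this is not how Google or Fivetran actually compute these metrics, only a way to see why the two modes produce different impression counts.

```python
from collections import namedtuple

# Hypothetical toy data: each record is one URL from the site
# appearing in one search result.
Appearance = namedtuple("Appearance", ["search_id", "url"])

appearances = [
    Appearance("search-1", "https://example.com/a"),
    Appearance("search-1", "https://example.com/b"),  # second URL in the same result
    Appearance("search-2", "https://example.com/a"),
]


def impressions_by_page(rows):
    # "By page": every distinct (search, URL) pair counts as its own impression.
    return len({(r.search_id, r.url) for r in rows})


def impressions_by_property(rows):
    # "By property": all URLs from the site in one search collapse
    # into a single impression.
    return len({r.search_id for r in rows})


print(impressions_by_page(appearances))      # 3
print(impressions_by_property(appearances))  # 2
```

The first search shows two URLs from the same site, so it contributes two impressions under the "by page" count and only one under the "by property" count.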
So `site_report_by_page` gives you site-level totals that use the "by page" aggregation math. That's why the click totals in this table tend to match what you see in the GSC UI when you're viewing the Performance report at the page level. Sean's later comment in this thread correctly identified this distinction.

Why `page_report` click totals don't match the GSC UI

This is the core of Abhimanyu's original concern, and Luke (Roy) hit the same issue. The discrepancy is not a Fivetran bug. It's a documented behavior of the Google Search Console API.
Google's own API documentation (https://developers.google.com/webmaster-tools/v1/how-tos/all-your-data) states that when you include `page` as a dimension in your API request, you get "greater detail... at the expense of losing some data." Google drops data when the `page` dimension is present in order to keep query computation within their resource limits. This means summing clicks across all pages in `page_report` will almost always produce a lower total than the site-level aggregated number you see in `site_report_by_page` or in the GSC UI.

Independent testing has confirmed this empirically. When `page` is the only dimension (or combined with just `date`), the totals match the "by page" aggregation. But the moment you combine `page` with other dimensions like `country`, `device`, or `query`, Google's API returns significantly fewer impressions and clicks, sometimes more than 50% lower. This is the data loss Google describes in their documentation, and it is outside Fivetran's control.

Why adding a `page` column to `site_report_by_page` wouldn't solve the problem

If we were to add `page` as a dimension to the API call that populates `site_report_by_page`, two things would happen. First, the table would effectively become identical to `page_report`, since the API response is determined by the dimensions you request. Second, and more importantly, the accurate site-level totals that `site_report_by_page` currently provides would be lost, because adding the `page` dimension triggers the data-loss behavior described above. You'd end up with per-page rows, but the numbers would no longer match the GSC UI totals, which is the opposite of what this request is trying to achieve.

Recommended approach
The connector already provides tables designed for per-page analysis: `page_report`, `keyword_page_report`, and their hourly variants. These include the `page` column along with clicks, impressions, CTR, and position. The tradeoff is that their totals will be slightly lower than the site-level aggregated numbers due to the Google API constraint described above.

If your use case requires both per-page breakdowns and accurate site-level totals, the best approach is to use both sets of tables: `site_report_by_page` for accurate aggregate metrics, and `page_report` for the per-page breakdown, with the understanding that the two will not sum to the same totals. This is the same tradeoff that exists in the Google Search Console UI itself when you toggle between the summary view and the Pages tab.

For more detail on these aggregation differences, see our documentation on data discrepancies: https://fivetran.com/docs/connectors/applications/google-search-console/troubleshooting/kb-search-console-data-discrepancies.
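If you want to see how far the two sets of tables diverge for your own data, one option is a per-day comparison of the site-level clicks against the summed per-page clicks. The sketch below is only an illustration: the rows are made up, the `daily_click_gap` helper is hypothetical, and it assumes both tables expose `date` and `clicks` columns (plus `page` in the per-page table), which you should check against your own schema.

```python
from collections import defaultdict

# Hypothetical rows standing in for query results; real column sets may differ.
site_report_by_page = [
    {"date": "2024-01-01", "clicks": 120},
    {"date": "2024-01-02", "clicks": 95},
]
page_report = [
    {"date": "2024-01-01", "page": "https://example.com/a", "clicks": 70},
    {"date": "2024-01-01", "page": "https://example.com/b", "clicks": 40},
    {"date": "2024-01-02", "page": "https://example.com/a", "clicks": 95},
]


def daily_click_gap(site_rows, page_rows):
    """Return {date: site_clicks - summed_page_clicks} for each date in site_rows."""
    page_totals = defaultdict(int)
    for row in page_rows:
        page_totals[row["date"]] += row["clicks"]
    return {row["date"]: row["clicks"] - page_totals[row["date"]] for row in site_rows}


gaps = daily_click_gap(site_report_by_page, page_report)
print(gaps)  # {'2024-01-01': 10, '2024-01-02': 0}
```

A positive gap on a given day is the portion of clicks that the per-page rows do not account for, which is the expected direction of the difference given the API behavior described above.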
Hopefully this helps anyone else that comes across this post.
Cheers,
Luke
-
Hi Abhimanyu Mahajan, Drew from the Product Team here!
I'd be happy to work on getting additional fields added. Could you confirm again which table you are referring to? Are you referring to the SITE_REPORT_BY_PAGE table?
The table you referenced is not in our schema.
-
Hi Drew
Yes, I wanted the page column to be added in the "site_report_by_page" table, to get the right number of clicks per page.
Thanks!
-
We have the same problem, where the SITE_REPORT_BY_PAGE and KEYWORD_SITE_REPORT_BY_PAGE tables don't include page URLs so we can't tell which page each row is for, making those tables unusable for us in their current state.
After contacting Fivetran support about this, it appears this issue might be specific to the newer "domain properties" added to Google Search Console in 2019 (versus the older "URL-prefix properties"), but I've confirmed the Google Search Console API for those reports can return page URLs for domain properties if `PAGE` is specified as one of the dimensions.
So we'd also like a PAGE column to be added to the SITE_REPORT_BY_PAGE and KEYWORD_SITE_REPORT_BY_PAGE tables.
-
I've belatedly realized that the PAGE_REPORT and KEYWORD_PAGE_REPORT tables contain the page-specific data I was looking for in the SITE_REPORT_BY_PAGE tables, and that the SITE_REPORT_BY_PAGE tables appropriately aggregate at the site level with "by page" referring to using Google's particular "aggregate by page" method of calculating the metrics, so I've removed my upvote for this request.
-
Hi Sean, were you able to get a workaround for this? Everything Abhimanyu is saying is exactly my current situation!
-
Neither of those tables reflects what Google Search Console shows for clicks by page; the number of clicks is always off between the two.