Skip to main content

Community

Connector Improvement: Google Search Console Connector Improvement

Not planned

Please sign in to leave a comment.

Comments

8 comments

  • Official comment

    Hi everyone,

    Luke from the Fivetran Product Team here. I want to provide some clarity on this request, because the underlying issue is a Google Search Console API design constraint rather than something we can resolve by adding a column.

    Why site_report_by_page doesn't include a page column

    The naming of this table is admittedly confusing. "By page" here refers to Google's aggregation method, not a per-page breakdown. The Google Search Console API supports two aggregation modes: "by property" and "by page." These modes change how impressions, clicks, CTR, and position are calculated. When aggregating "by page," each unique URL is counted separately in the metrics (for example, if two URLs from your site appear in the same search result, that counts as two impressions rather than one). When aggregating "by property," those would be deduplicated into a single impression for the site.

    So site_report_by_page gives you site-level totals that use the "by page" aggregation math. That's why the click totals in this table tend to match what you see in the GSC UI when you're viewing the Performance report at the page level. Sean's later comment in this thread correctly identified this distinction.

    Why page_report click totals don't match the GSC UI

    This is the core of Abhimanyu's original concern, and Luke (Roy) hit the same issue. The discrepancy is not a Fivetran bug. It's a documented behavior of the Google Search Console API.

    Google's own API documentation (https://developers.google.com/webmaster-tools/v1/how-tos/all-your-data) states that when you include page as a dimension in your API request, you get "greater detail... at the expense of losing some data." Google drops data when the page dimension is present in order to keep query computation within their resource limits. This means summing clicks across all pages in page_report will almost always produce a lower total than the site-level aggregated number you see in site_report_by_page or in the GSC UI.

    Independent testing has confirmed this empirically. When page is the only dimension (or combined with just date), the totals match the "by page" aggregation. But the moment you combine page with other dimensions like country, device, or query, Google's API returns significantly fewer impressions and clicks, sometimes more than 50% lower. This is the data loss Google describes in their documentation, and it is outside Fivetran's control.

    Why adding a page column to site_report_by_page wouldn't solve the problem

    If we were to add page as a dimension to the API call that populates site_report_by_page, two things would happen. First, the table would effectively become identical to page_report, since the API response is determined by the dimensions you request. Second, and more importantly, the accurate site-level totals that site_report_by_page currently provides would be lost, because adding the page dimension triggers the data-loss behavior described above. You'd end up with per-page rows, but the numbers would no longer match the GSC UI totals, which is the opposite of what this request is trying to achieve.

    Recommended approach

    The connector already provides tables designed for per-page analysis: page_report, keyword_page_report, and their hourly variants. These include the page column along with clicks, impressions, CTR, and position. The tradeoff is that their totals will be slightly lower than the site-level aggregated numbers due to the Google API constraint described above.

    If your use case requires both per-page breakdowns and accurate site-level totals, the best approach is to use both sets of tables: site_report_by_page for accurate aggregate metrics, and page_report for the per-page breakdown, with the understanding that the two will not sum to the same totals. This is the same tradeoff that exists in the Google Search Console UI itself when you toggle between the summary view and the Pages tab.

    For more detail on these aggregation differences, see our documentation on data discrepancies: https://fivetran.com/docs/connectors/applications/google-search-console/troubleshooting/kb-search-console-data-discrepancies.

    Hopefully this helps anyone else that comes across this post.

    Cheers,
    Luke

    Hi Abhimanyu Mahajan, Drew from the Product Team here!

    I'd be happy to work on getting additional fields added, could you confirm again which table you are referring to? Are you referring to the SITE_REPORT_BY_PAGE table?

    The table you referenced is not in our schema.

    Hi Drew

    Yes, I wanted the page column to be added in the "site_report_by_page" table, to get the right number of clicks per page.

    Thanks!

    We have the same problem, where the SITE_REPORT_BY_PAGE and KEYWORD_SITE_REPORT_BY_PAGE tables don't include page URLs so we can't tell which page each row is for, making those tables unusable for us in their current state.

    After contacting Fivetran support about this, it appears this issue might be specific to the newer "domain properties" added to Google Search Console in 2019 (versus the older "URL-prefix properties"), but I've confirmed the Google Search Console API for those reports can return page URLs for domain properties if `PAGE` is specified as one of the dimensions.

    So we'd also like a PAGE column to be added to the SITE_REPORT_BY_PAGE and KEYWORD_SITE_REPORT_BY_PAGE tables.

    Hi Drew, Do we have any updates on this product improvement ? 

    I've belatedly realized that the PAGE_REPORT and KEYWORD_PAGE_REPORT tables contain the page-specific data I was looking for in the SITE_REPORT_BY_PAGE tables, and that the SITE_REPORT_BY_PAGE tables appropriately aggregate at the site level with "by page" referring to using Google's particular "aggregate by page" method of calculating the metrics, so I've removed my upvote for this request.

    Hi Sean, were you able to get a workaround for this? Everything Abhimanyu is saying is exactly my current situation!

    Both of those tables are not reflecting what the Google Search Console is showing by clicks by page the number of clicks is always off between the two.