Other: Connector SDK Schema Retrieval Function

Answered

Comments

5 comments

  • Official comment

    Hi Jake,

    This is a really interesting feature request that we have been considering. There are a couple of aspects to it.

    1. The ability in the UI to control what gets delivered to the warehouse

    2. The ability for the connector.py code to find out which tables are deselected in the UI so it can process data differently (e.g., skip deselected tables)

    In the case you describe I'm not quite sure I'm following. I understand:

    • you have a source that has multiple reports available via an API

    • Some reports are not available until 'permission' is granted so you want to write the connector to always pull data for all the reports every sync so when new reports become available no code iteration is required.

    • Some of the available reports are not actually wanted in the destination warehouse and you want your end users to use the UI to indicate which reports to actually deliver.

    If the above is correct, I would be interested to understand if API quotas and sync performance are significant factors here?

    We are close to releasing a feature that will allow the Schema tab UI to be used to indicate that some tables are not wanted in the destination. When a table is deselected, we will stop delivering its data to the destination. However, your code will continue to pull the data from the source and pass it to Fivetran, and we will drop it there.

    The question we have been debating is, "is this actually useful?" as the sync will be (likely significantly) longer due to the extra data being pulled but not delivered. We were thinking it could speed up getting the schema the end customer wants while decoupling the engineering to remove the unwanted data from the connector. I'd love to know if this functionality would be enough for your use case or if you want/need to change the connector.py behavior based on the schema selection information if it were available to the connector code?

    Looking forward to discussing further

    Thanks,
    Alison

    So, the solution suggested would help move the dial towards where we want to be, but the delay caused by pulling all files would be problematic, and hitting API throttling would become a real issue as well.

    To add more color to my use case, I have an application that contains reports. I have a role that is configured to consume and process any number of reports dynamically. From a permission standpoint I have access to almost all reports in the app via the API client I am using, but I only want to load a subset of those reports.

    Additionally, users are constantly adding new reports to the app and reaching out to us to integrate them into Fivetran. This currently requires a code change, which seems like an unnecessary waste of time and effort for something all other Fivetran connectors expose as a button click in the UI.

    Design wise, I'd embed a call to the schema method to list all accessible reports from an API endpoint in the system. The user could then select or de-select these as desired and configure if they want to auto add new tables and columns or not. When the "update" method is called, it would be nice to have either a self.get_selected_schema() in-built method, or an additional (optional) argument that gets passed to the method containing schema details. Then when I'm iterating through the reports, I would use this to smartly select which reports I do and don't want to sync into Snowflake.

    Thank you Jake for the additional description.

    There is an interesting order of operations question introduced by the need to query the source to get all the possible tables and then query the user to get what they actually want - all ideally before the code executes.

    I'm wondering if your use case might be better thought of as a configuration problem?

    If so then there are a couple of setup form approaches you could take:

    • request the user enter a comma separated list of reports as a text entry field (available today)
    • develop a setup form to make the above approach easier to use: visible parameters, help text, drop-downs, etc. (available in a month or two).
    • upload a csv or other txt file that contains the reports to be sync'ed by this connector (being considered)
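
    The comma-separated text field from the first option can be sketched as follows. This is a minimal illustration, not Fivetran's API: the configuration key "reports" and the helper name parse_report_list are hypothetical, and it assumes the connector receives its configuration as a dict of string values.

```python
def parse_report_list(configuration: dict, key: str = "reports") -> list[str]:
    """Split a comma-separated configuration value into clean report names.

    The key name "reports" is a hypothetical choice for this sketch.
    Blank entries and duplicates are dropped; input order is preserved.
    """
    raw = configuration.get(key, "")
    seen = set()
    reports = []
    for item in raw.split(","):
        name = item.strip()
        if name and name not in seen:
            seen.add(name)
            reports.append(name)
    return reports


# Example configuration as the connector might receive it (all values are strings).
configuration = {"reports": "sales_summary, inventory ,, sales_summary,returns"}
for report in parse_report_list(configuration):
    print(f"would sync report: {report}")
```

    Inside the update function, the connector would loop over the returned names and skip any report that is not listed.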

    I'd love to understand your reaction to these ideas.

    Best - Alison

    Regarding this point: 

    • "request the user enter a comma separated list of reports as a text entry field (available today)"
    • We are already using the comma-separated list in the input table. This gets difficult to manage because by default it is marked as a secret, so we cannot easily go in and "edit" a row whose existing contents are unreadable.

    Regarding this point:

    • develop a setup form to make the above approach easier to use: visible parameters, help text, drop-downs, etc. (available in a month or two).
    • I'm assuming this would be editable in the edit connector flow as well?

    The major point we want to address is day 2 operations and adjustments. The CSV suggestion is nice, but it doesn't get me away from having to push code instead of hitting a button in the UI.

    Another option would be to allow users to dynamically add new configuration key: value pairs to the connector in the UI once the app is deployed. If that were enabled, I could iterate through all configs starting with "report_" at runtime. This would allow me to add new reports without code changes. The downside of this approach is that I would have to ask customers for report IDs, instead of having a drop-down list of available reports to sync like I would get in the schema approach initially suggested.
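
    The prefix-based idea above could look roughly like this. It is only a sketch: it assumes the configuration arrives as a flat dict of strings, and the helper name reports_from_config and the example keys are hypothetical.

```python
def reports_from_config(configuration: dict, prefix: str = "report_") -> dict[str, str]:
    """Collect every configuration entry whose key starts with the prefix.

    Returns a mapping of report name (the key without the prefix) to the
    report id stored in the value. Entries with an empty name or value are
    skipped. Keys are visited in sorted order so results are deterministic.
    """
    reports = {}
    for key in sorted(configuration):
        if not key.startswith(prefix):
            continue
        name = key[len(prefix):]
        value = str(configuration[key]).strip()
        if name and value:
            reports[name] = value
    return reports


# Hypothetical configuration: two report entries plus an unrelated credential.
configuration = {
    "api_token": "not-a-report",
    "report_sales": "1001",
    "report_inventory": " 1002 ",
}
for name, report_id in reports_from_config(configuration).items():
    print(f"would sync report {name} (id {report_id})")
```

    Adding a new "report_..." key in the UI would then be picked up on the next sync without a code change.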

    Jake, 

    This is not a Connector SDK-specific matter. You can retrieve this information via the REST API, as you mentioned. See https://fivetran.com/docs/rest-api/api-reference/connection-schema/connection-schema-config. Feel free to call the REST API in your connector code.
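
    A rough sketch of what that call might look like from connector code, using only the Python standard library. The endpoint path, the Basic authentication scheme, and the response shape (data.schemas.<schema>.tables.<table>.enabled) are assumptions here and should be checked against the reference linked above. The function names and the connection id and credential parameters are hypothetical.

```python
import base64
import json
import urllib.request

# Assumed base URL for the REST API; verify against the linked reference.
API_BASE = "https://api.fivetran.com/v1"


def fetch_schema_config(connection_id: str, api_key: str, api_secret: str) -> dict:
    """Fetch the schema config for one connection (performs a network call)."""
    url = f"{API_BASE}/connections/{connection_id}/schemas"  # assumed path
    token = base64.b64encode(f"{api_key}:{api_secret}".encode()).decode()
    request = urllib.request.Request(
        url,
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def enabled_tables(schema_config: dict) -> set[str]:
    """Return the names of tables marked enabled in an assumed response shape.

    Tables under a disabled schema are ignored. A missing "enabled" flag is
    treated as enabled, which is a choice made for this sketch.
    """
    tables = set()
    schemas = schema_config.get("data", {}).get("schemas", {})
    for schema in schemas.values():
        if not schema.get("enabled", True):
            continue
        for table_name, table in schema.get("tables", {}).items():
            if table.get("enabled", True):
                tables.add(table_name)
    return tables


# Offline example with a hand-written payload in the assumed shape.
sample = {
    "data": {
        "schemas": {
            "reports": {
                "enabled": True,
                "tables": {
                    "sales": {"enabled": True},
                    "inventory": {"enabled": False},
                },
            }
        }
    }
}
print(sorted(enabled_tables(sample)))
```

    The connector could call fetch_schema_config once at the start of a sync and skip any report whose table is not in the returned set, which avoids pulling data that would be dropped anyway.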

    Best