Other: Connector SDK Schema Retrieval Function
Answered
I would like to be able to directly access, from the Python code in the Connector SDK, the schema data exposed by the connector in the UI, so that I can identify which ones have been selected or deselected. If I understand the code correctly, this isn't currently available without hitting the Fivetran API.
Our use case is that we would like to create a connector that pulls ALL available reports via the API and exposes that list via the schema list in the UI. For Connector SDK connectors that generate multiple different reports that can be parsed generically and dynamically at runtime, this would let us create the connector once and use the Fivetran UI to identify and ADD any new reports as soon as the user gives our service account permission to read them. This would be a significant time saver, since new reports would be a button click away instead of a Jira story added to our backlog.
Official comment
Hi Jake,
This is a really interesting feature request that we have been considering. There are a couple of aspects to it.
- The ability in the UI to control what gets delivered to the warehouse
- The ability for the connector.py code to find out about tables deselected in the UI in order to process data differently (e.g. skip deselected tables)
In the case you describe I'm not quite sure I'm following. I understand:
- You have a source that has multiple reports available via an API.
- Some reports are not available until 'permission' is granted, so you want to write the connector to always pull data for all the reports on every sync, so that when new reports become available no code iteration is required.
- Some of the available reports are not actually wanted in the destination warehouse, and you want your end users to use the UI to indicate which reports to actually deliver.
If the above is correct, I would be interested to understand if API quotas and sync performance are significant factors here?
We are close to releasing a feature that will allow the Schema tab in the UI to be used to indicate that some tables are not wanted in the destination. On deselecting a table we will stop delivering the data to the destination; however, your code will continue to pull the data from the source and pass it to Fivetran, and we will drop it there.
The question we have been debating is, "Is this actually useful?", as the sync will be (likely significantly) longer due to the extra data being pulled but not delivered. We were thinking it could speed up getting the schema the end customer wants while decoupling that from the engineering work to remove the unwanted data from the connector. I'd love to know if this functionality would be enough for your use case, or if you want/need to change the connector.py behavior based on the schema selection information if it were available to the connector code.
Looking forward to discussing further.
Thanks,
Alison
So, the solution suggested would help move the dial towards where we want to be, but the delay caused by pulling all files would be problematic, and hitting API throttling would become a real issue as well.
To add more color to my use case, I have an application that contains reports. I have a role that is configured to consume and process any number of reports dynamically. From a permission standpoint I have access to almost all reports in the app via the API client I am using, but I only want to load a subset of those reports.
Additionally, the users are constantly adding new reports into the app, and they reach out to us to integrate the new reports into Fivetran. This currently requires a code change, which seems unnecessarily wasteful in time and effort for what all other Fivetran connectors expose as a button click in the UI.
Design-wise, I'd embed a call in the schema method to list all accessible reports from an API endpoint in the system. The user could then select or deselect these as desired and configure whether they want to auto-add new tables and columns or not. When the "update" method is called, it would be nice to have either a built-in self.get_selected_schema() method, or an additional (optional) argument passed to the method containing the schema details. Then, when I'm iterating through the reports, I would use this to decide which reports I do and don't want to sync into Snowflake.
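To make the proposal above concrete, here is a rough sketch of how it might look. It is only an illustration of the request: the selected_tables argument does not exist in the Connector SDK today, fetch_report is a hypothetical stand-in for a call to the reporting API, and the list-of-dicts schema shape is assumed to match what a connector's schema function returns.

```python
from typing import Callable, Iterable, Iterator, List, Optional, Set, Tuple


def schema_from_reports(report_names: Iterable[str]) -> List[dict]:
    """Declare one table per accessible report (assumed schema shape)."""
    return [{"table": name} for name in sorted(report_names)]


def rows_for_selected_reports(
    available: Iterable[str],
    fetch_report: Callable[[str], Iterable[dict]],  # hypothetical API call
    selected_tables: Optional[Set[str]] = None,     # hypothetical SDK argument
) -> Iterator[Tuple[str, dict]]:
    """Yield (table, row) pairs, skipping reports deselected in the UI.

    When selected_tables is None, no selection info is available and every
    report is pulled, which matches today's behavior.
    """
    for name in sorted(available):
        if selected_tables is not None and name not in selected_tables:
            # Deselected report: never call the source API for it.
            continue
        for row in fetch_report(name):
            yield name, row
```

The point of filtering before fetch_report is called is that deselected reports cost no API quota and no sync time.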
Thank you Jake for the additional description.
There is an interesting order of operations question introduced by the need to query the source to get all the possible tables and then query the user to get what they actually want - all ideally before the code executes.
I'm wondering if your use case might be better thought of as a configuration problem?
If so, then there are a couple of setup form approaches you could take:
- request that the user enter a comma-separated list of reports in a text entry field (available today)
- develop a setup form to make the above approach easier to use - visible parameters, help text, drop-downs, etc. (available in a month or two)
- upload a CSV or other text file that contains the reports to be synced by this connector (being considered)
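For the first option, a minimal sketch of reading a comma-separated list from the connector configuration might look like the following. The key name "reports" is an assumption chosen for illustration; configuration values are treated as plain strings.

```python
from typing import List


def parse_report_list(configuration: dict, key: str = "reports") -> List[str]:
    """Split a comma-separated configuration value into clean report names.

    The key name is hypothetical. Blank entries and duplicates are dropped,
    and the original order is preserved.
    """
    raw = configuration.get(key, "")
    seen = set()
    reports = []
    for item in raw.split(","):
        name = item.strip()
        if name and name not in seen:
            seen.add(name)
            reports.append(name)
    return reports
```

The update function could then loop over parse_report_list(configuration) and pull only those reports.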
I'd love to understand your reaction to these ideas.
Best - Alison
Regarding this point:
- "request the user enter a comma separated list of reports as a text entry field (available today)"
- We are already using the comma-separated list in the input table. This gets difficult to manage because by default the value is marked as a secret, so we cannot easily go in and "edit" a row whose existing contents are unreadable.
Regarding this point:
- "develop a setup form to make the above approach easier to use - visible parameters, help text, drop downs etcs (available in a month or two)."
- I'm assuming this would be editable in the edit connector flow as well?
The major point we want to address is day 2 operations and adjustments. The CSV suggestion is nice, but it doesn't get me away from having to push code instead of hitting a button in the UI.
Alternatively, another option would be to allow users to dynamically add new configuration key-value pairs to the connector in the UI once the app is deployed. If that were enabled, I could iterate through all configs starting with "report_" at runtime. This would allow me to dynamically add new reports without code changes. The downside of this approach is that I would have to go and ask customers for report IDs, instead of having a drop-down list of available reports to sync like I would get in the schema approach initially suggested.
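As a sketch of that runtime iteration: assuming extra keys could be added to the configuration after deployment (which is the feature being asked for, not something confirmed to exist), the connector code could collect every key with the "report_" prefix like this. The mapping of key suffix to report ID is an illustrative choice.

```python
from typing import Dict


def reports_from_config(configuration: dict, prefix: str = "report_") -> Dict[str, str]:
    """Collect report IDs from configuration keys that share a prefix.

    Example: {"report_sales": "123"} becomes {"sales": "123"}.
    Keys without the prefix and keys with blank values are ignored.
    """
    reports = {}
    for key, value in sorted(configuration.items()):
        if not key.startswith(prefix):
            continue
        report_id = str(value).strip()
        if report_id:
            # Use the part after the prefix as the table/report name.
            reports[key[len(prefix):]] = report_id
    return reports
```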
Jake,
This is not a Connector SDK-specific matter. You can retrieve this information via the REST API, as you mentioned. See https://fivetran.com/docs/rest-api/api-reference/connection-schema/connection-schema-config. Feel free to call the REST API in your connector code.
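A minimal sketch of calling the REST API from connector code is shown below. The endpoint path, the Basic authentication scheme, and the response layout (data.schemas.<schema>.tables.<table>.enabled) are assumptions that should be checked against the reference linked above; the function names are hypothetical.

```python
import base64
import json
import urllib.request
from typing import Set

API_BASE = "https://api.fivetran.com/v1"  # assumed base URL


def fetch_schema_config(connection_id: str, api_key: str, api_secret: str) -> dict:
    """Fetch the schema config for one connection (assumed endpoint path)."""
    token = base64.b64encode(f"{api_key}:{api_secret}".encode()).decode()
    request = urllib.request.Request(
        f"{API_BASE}/connections/{connection_id}/schemas",
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def enabled_tables(payload: dict) -> Set[str]:
    """Return the names of tables marked enabled in an assumed payload shape.

    A schema or table with no "enabled" flag is treated as enabled.
    """
    selected = set()
    schemas = payload.get("data", {}).get("schemas", {})
    for schema in schemas.values():
        if not schema.get("enabled", True):
            continue  # whole schema deselected
        for table_name, table in schema.get("tables", {}).items():
            if table.get("enabled", True):
                selected.add(table_name)
    return selected
```

In the update function, the result of enabled_tables(fetch_schema_config(...)) could be used to skip deselected reports before any source API call is made. The connection ID and API credentials would need to be supplied through the connector configuration.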
Best