Connector Improvement: source-side filtering for JIRA source
Problem statement
"Null values in the Jira tables ISSUE_MULTISELECT_HISTORY and ISSUE_FIELD_HISTORY represent wasted MAR charges. We are paying for rows with incomplete or unusable data that are replicated from these Jira tables into the Delta Lake target."
Investigation results
ISSUE_FIELD_HISTORY: 51 million MAR from entries with null values, 31 million MAR from entries with valid values.
select count(*) from edl_raw.api_jira_default.issue_field_history where value is null -- 530471286 rows
select count(*) from edl_raw.api_jira_default.issue_multiselect_history where value is NOT null -- 4066336 rows
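The two counts above each show one side of one table. A minimal sketch of getting the null and non-null breakdown in a single pass is shown here, using an in-memory SQLite database so it can be run anywhere. The column list and sample rows are made up for illustration and are not the real connector schema; only the `value` column and the table name come from the queries above. The `SUM(CASE ...)` pattern is standard SQL and should carry over to the Delta Lake tables with the fully qualified table names.

```python
import sqlite3

# Single-pass breakdown of null vs. non-null `value` rows.
# The table name is interpolated into the SQL, so pass trusted names only.
BREAKDOWN_SQL = """
SELECT
    COUNT(*) AS total_rows,
    SUM(CASE WHEN value IS NULL THEN 1 ELSE 0 END) AS null_rows,
    SUM(CASE WHEN value IS NOT NULL THEN 1 ELSE 0 END) AS non_null_rows
FROM {table}
"""


def null_breakdown(conn, table):
    """Return total, null, and non-null row counts for `table`."""
    total, nulls, non_nulls = conn.execute(
        BREAKDOWN_SQL.format(table=table)
    ).fetchone()
    # SUM() yields NULL on an empty table, so fall back to 0.
    return {"total": total, "null": nulls or 0, "non_null": non_nulls or 0}


def demo_connection():
    """Build a toy table with illustrative columns and made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE issue_field_history "
        "(issue_id INTEGER, field_id TEXT, value TEXT)"
    )
    conn.executemany(
        "INSERT INTO issue_field_history VALUES (?, ?, ?)",
        [(1, "assignee", None), (1, "assignee", "u1"), (2, "priority", None)],
    )
    return conn


if __name__ == "__main__":
    print(null_breakdown(demo_connection(), "issue_field_history"))
```

Running the same breakdown against both ISSUE_FIELD_HISTORY and ISSUE_MULTISELECT_HISTORY would give directly comparable null and non-null figures for each table.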
Expectations
"It would be beneficial if the Jira connector had an option to block null value entries from loading into the target Delta tables ISSUE_MULTISELECT_HISTORY and ISSUE_FIELD_HISTORY."
Official comment
Hi Prasad,
Can you help provide an example of null values in the history for a specific issue field? The purpose of the history table is to provide a timeline of the field history and that could include null values if the field was null for a certain period of time.
Thanks,
Frank