Analyze the Relationship Between Data and Integrations
Learning Objectives
After completing this unit, you’ll be able to:
- Identify the interdependencies between data and integrations.
- Describe the impact and influence of data capabilities on integrations, and vice versa.
Understand the Relationship Between Data and Integration Capabilities
Data and integrations might appear to be distinct concepts, but they have a symbiotic, mutually beneficial relationship. Good data leads to better integration quality, and efficient integrations lead to better data quality. If your data isn’t integrated properly, you can’t get a holistic picture. Similarly, if an integration doesn’t process the data properly, it degrades the quality of insights and analysis.
If this sounds complicated, you can use the matrix below to help you unravel the interdependencies between your data and your integrations. The table highlights some of the many considerations that are critical to achieving synergy between data and integrations. Use it to better understand the interdependencies as you embark on data and integration projects at your organization.
In this matrix, the rows are data traits and the columns are integration steps.

| Data Traits | Data Generation & Retrieval | Data Transformations | Data Consumption & Sharing |
|---|---|---|---|
| Completeness | Is the extraction complete? Did you ensure the right filters are in place to extract the right, complete data set? | Will the data retain its completeness after transformations are applied? For example, summarizing or rolling up data can remove the ability to drill down later if details aren’t retained explicitly, and transforming a date field to keep only the month and year loses the day, which can’t be recovered in the future. | Is the complete data set being consumed in the target application? How are errors and exceptions handled so that partial consumption doesn’t leave the target application with incomplete data? |
| Accuracy | Does the data remain accurate as you extract it? How do you ensure that incorrect joins, data structures, nesting depth, and so on don’t introduce inaccuracies? | Is the data losing accuracy as it’s transformed in the integration flow? If the requirement is to convert units (for example, km/h to mph) or change formats (for example, dd-mm-yy to mm-dd-yy), is that requirement communicated clearly? If not, transformations can compromise the accuracy of the data. | Does the data remain accurate during and after processing? Are there controls to keep data accurate if an update is skipped or processed multiple times? If data gets corrupted along the integration layer, are there controls to discard it? |
| Integrity/Consistency | As you retrieve data from a (relational) source, have you ensured that associations, parent-child relationships, data hierarchies, and dependencies are retrieved as well? | Transformations made and propagated across applications can lead to data inconsistency and compromise data integrity. | As data is consumed, have you considered the order of processing to ensure data integrity is retained? How do you prevent orphan records and records with incorrect associations? |
| Timeliness | Is data being retrieved on time? If a scheduled extraction is missed, what measures are in place to read the data before it’s modified or overwritten at the source? | Do transformations negatively affect the timeliness and availability of data? Can complex transformation logic be simplified or delegated upstream or downstream so transformations don’t slow down processing? | Is the data processed before the next set of updates becomes available? Are there controls in place to handle data if an update is skipped? |
| Uniqueness | If the source contains duplicate data for valid reasons, does it all need to be extracted? What’s the logic for identifying duplicate data in the source? | Could transformations and mappings introduce duplicates? Should transformations be applied in a specific order so data doesn’t lose its uniqueness by accident? Have you evaluated the transformation logic for merging duplicates? | How are duplicates handled when data is consumed? Have you evaluated techniques for handling uniqueness, such as discarding the incoming data or overwriting the existing data? |
| Validity | Is the data valid and relevant for the use case? Have you evaluated the data set against the use case to ensure you’re not extracting too little or too much data? | Are transformations such as datatype conversions, rounding, and format changes tested against the data? Can the transformations handle the data nuances that appear in real-life scenarios? | As data is consumed, is it usable and does it provide valid information as intended? If the integration layer can provide only part of the data, what impact does that have on the validity of the broader data set and its consumption? |
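To make a few of these considerations concrete, here’s a minimal Python sketch. The field names, record shape, and in-memory “target” are hypothetical and not tied to any specific platform; the point is to show a transformation that preserves completeness (keeping the full date alongside a month roll-up) and accuracy (an explicit km/h-to-mph conversion), plus a consumption step that enforces uniqueness by upserting on a stable key.

```python
from datetime import date

def transform(record: dict) -> dict:
    """Transform one extracted record while preserving completeness and accuracy."""
    transformed = dict(record)  # keep the original fields so no detail is lost

    # Accuracy: unit conversion is explicit and documented (km/h -> mph).
    if "speed_kmh" in record:
        transformed["speed_mph"] = round(record["speed_kmh"] * 0.621371, 2)

    # Completeness: derive month/year for roll-ups, but keep the full date
    # so drill-down remains possible later.
    recorded_on: date = record["recorded_on"]
    transformed["recorded_month"] = recorded_on.strftime("%Y-%m")

    return transformed

def load(records: list[dict], target: dict) -> None:
    """Consume records idempotently so skipped or repeated updates don't corrupt data."""
    for rec in records:
        key = rec["id"]  # Uniqueness: a stable key decides merge vs. insert.
        existing = target.get(key)
        # Consistency/timeliness: only overwrite with data that isn't older.
        if existing is None or rec["recorded_on"] >= existing["recorded_on"]:
            target[key] = rec

if __name__ == "__main__":
    extracted = [
        {"id": 1, "speed_kmh": 100.0, "recorded_on": date(2024, 5, 17)},
        {"id": 1, "speed_kmh": 100.0, "recorded_on": date(2024, 5, 17)},  # duplicate
    ]
    warehouse: dict = {}
    load([transform(r) for r in extracted], warehouse)
    print(warehouse)  # one record, full date retained, mph derived
```

A real integration would move these checks into your middleware or ETL tooling, but the design choice is the same: keep source detail alongside derived values, make conversions explicit, and key every write on a stable identifier so reprocessing stays safe.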
You should now have a better understanding of why data and integrations need to be worked on in tandem. Good data integration practices and high-quality data help you achieve the end goal of deriving meaningful insights and providing enhanced experiences to customers and stakeholders. With this understanding, you can give proper consideration to data and integrations together and analyze their dependencies as you embark on your next project.