4 Tips for Preserving Data Integrity During Integration

Integrated systems use shared data to fuel analytics, improve big data and AI initiatives, and inform machine learning algorithms. But imagine making major decisions on marketing budgets, sales processes, and customer service programs using completely inaccurate data.

That would be crazy, right? Not that crazy...

A whopping 94% of businesses suspect their customer data is wrong. And, in a recent survey of data executives, 82% cited data quality concerns as a barrier to data integration projects. Solid data integrity is simply not there, and it’s costing businesses big time.

So, what can you do? Take steps to improve and maintain your data integrity for a smoother and more productive integration (and accurate analytics in the long term!). We’ll offer four key tips for improving data integrity in this post.

Characteristics of Data Integrity

You may think data quality and data integrity are the same, but they’re actually slightly different. Data quality represents the USABILITY of information to serve an intended purpose. Is it fit for use in operations, decision making, or planning?

Data integrity is a subset of data quality that focuses more on the RELIABILITY of that information. Is it accurate? Is it valid? Here are the characteristics that embody data integrity:

Data Completeness: Data integrity relies on comprehensive sets of data that are not missing any of the key elements necessary to address your information requirements. For example, if you have a database of first and last names but no emails to pair those names with, that data is incomplete.
Data Validity: This data integrity characteristic questions whether data can be traced and connected to other data sources to be verified. If you can’t track where the data came from and validate its format and attributes, then the information lacks integrity.
Data Uniqueness: If your datasets are duplicated, they’re not unique. Data uniqueness cannot exist when a field, record, or entire data set has a duplicate counterpart. That duplicate will inhibit data integrity and throw off any analytics or calculations you try to produce with it.
Data Timeliness: If the data is not up-to-date, it’s not timely. Especially with the data protection standards in place across Europe and the US, this question of data timeliness is even more important. The value of the data decays over time and will become irrelevant to future decision making.
Data Accuracy: What makes data accurate? We must ask ourselves how well the data reflects the real-world context of our business. If you’ve ever heard someone reference a “single source of truth,” that’s where data accuracy comes into play. You must have a source to check against in order to verify that yes, this is accurate, based on the origins of this data and the objects associated with it in this location.
Data Consistency: Consistency can be compared within the same database and/or across databases. Is the same information present? Is it reflected correctly within the database or across the two compared databases? For example, if a record shows 40 active users on an account but the account only has 30 registered users, we can tell there’s an inconsistency within that database. Where are the other 10 registered users? An example when comparing databases might be if our marketing automation platform says our client’s annual revenue is $100 Million but our CRM cites the annual revenue at $250 Million. If we integrate the systems and we’re planning to segment accounts based on annual revenue, we may end up targeting the wrong users.
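
To make a few of these characteristics concrete, here is a minimal Python sketch of checks for completeness, uniqueness, and cross-system consistency. The records, field names, and function names are all made up for illustration; real checks would run against your own schema.

```python
from collections import Counter

# Hypothetical sample records; the field names are illustrative only.
contacts = [
    {"id": 1, "first": "Ana", "last": "Diaz", "email": "ana@example.com"},
    {"id": 2, "first": "Ben", "last": "Okoro", "email": ""},
    {"id": 3, "first": "Ana", "last": "Diaz", "email": "ana@example.com"},
]

def incomplete_records(records, required):
    """Completeness: return records missing any required field."""
    return [r for r in records if any(not r.get(f) for f in required)]

def duplicate_keys(records, key_fields):
    """Uniqueness: return key tuples that appear more than once."""
    counts = Counter(tuple(r.get(f) for f in key_fields) for r in records)
    return [key for key, n in counts.items() if n > 1]

def revenue_mismatches(system_a, system_b):
    """Consistency: accounts whose annual revenue differs between two systems."""
    shared = system_a.keys() & system_b.keys()
    return {
        acct: (system_a[acct], system_b[acct])
        for acct in shared
        if system_a[acct] != system_b[acct]
    }

# The $100 Million vs. $250 Million example from above, as two lookups.
marketing = {"acme": 100_000_000, "globex": 5_000_000}
crm = {"acme": 250_000_000, "globex": 5_000_000}

print(incomplete_records(contacts, ["first", "last", "email"]))
print(duplicate_keys(contacts, ["first", "last", "email"]))
print(revenue_mismatches(marketing, crm))
```

Checks like these only flag the problem. Deciding which system holds the correct revenue figure still takes a person and a trusted source.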

Threats to Data Integrity

Data integrity issues during integration aren’t the shortfall of any one thing. They usually stem from an assortment of issues including migration errors that force duplications or unintended formatting changes, security misconfigurations and software bugs, and human errors that either already existed in the database or were accidentally added in during the programming of an integration field. The key to combatting poor data integrity is to stay alert and address all the potential shortfalls.

4 Tips for Preserving Data Integrity During Integration

Data quality and its underlying data integrity must be preserved so your information will correctly inform your major business decisions. As you look to integrate core systems, keep these data integrity tips in mind.

#1 Clean the Data!

Guys, I literally cannot stress this enough. Clean your data before you start ANY integration project. Get rid of the duplicates, null values, extra spaces, extra numbers, etc. If you don’t have set rules for how data should be entered into your systems, you must establish some. Apply whatever guidelines you set consistently across databases as you clean them. So, for example, if you decide that addresses should be written as 123 N. Sesame St. versus 123 North Sesame St., then you must ensure that is reflected across all addresses wherever they exist.
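
Here is a rough Python sketch of what those cleaning rules can look like in practice: trimming extra spaces, applying the address convention from the example, and dropping duplicates. The rule table and function names are assumptions for illustration, and the address logic is deliberately naive (it would mangle "St. Louis", for instance).

```python
import re

def clean_text(value):
    """Trim the ends and collapse runs of whitespace; treat None as empty."""
    return re.sub(r"\s+", " ", value or "").strip()

# Assumed house rule: abbreviate directions and street types.
# Keys are lowercase words with any trailing period removed.
ADDRESS_RULES = {
    "north": "N.", "n": "N.",
    "south": "S.", "s": "S.",
    "east": "E.", "e": "E.",
    "west": "W.", "w": "W.",
    "street": "St.", "st": "St.",
}

def normalize_address(address):
    """Rewrite each word that matches a rule; leave all other words alone."""
    words = []
    for word in clean_text(address).split(" "):
        key = word.rstrip(".").lower()
        words.append(ADDRESS_RULES.get(key, word))
    return " ".join(words)

def dedupe(records, key_fields):
    """Keep the first record for each cleaned, case-insensitive key.

    Assumes the key fields hold strings (or None).
    """
    seen, unique = set(), []
    for record in records:
        key = tuple(clean_text(record.get(f)).lower() for f in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique

print(normalize_address("  123  North   Sesame Street "))
print(normalize_address("123 N. Sesame St."))
```

Both calls should produce the same address, which is the point: one rule, applied everywhere the data lives.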

#2 Validate Data Received from External Sources

Ever buy a list for marketing or sales prospecting? If you brought in data from an external source, you must validate it. This is true both before integration, and really, just in general. No one wants to inadvertently spam a potential client or have corrupted or incorrect data ruining their analytics after an integration.
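
As a sketch of what that validation might look like for a purchased list, the Python below splits rows into accepted and rejected and records why each rejected row failed. The column names and the required-field list are hypothetical, and the email pattern only checks the shape of the address. It cannot tell you whether the mailbox exists.

```python
import re

# Deliberately loose pattern: one "@", no whitespace, a dot in the domain.
EMAIL_SHAPE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def validate_row(row, required=("first_name", "last_name", "email")):
    """Return a list of problems found in one row (empty list means valid)."""
    problems = [f"missing {f}" for f in required if not (row.get(f) or "").strip()]
    email = (row.get("email") or "").strip()
    if email and not EMAIL_SHAPE.match(email):
        problems.append("malformed email")
    return problems

def split_valid(rows):
    """Separate rows into accepted rows and (row, problems) rejections."""
    accepted, rejected = [], []
    for row in rows:
        problems = validate_row(row)
        if problems:
            rejected.append((row, problems))
        else:
            accepted.append(row)
    return accepted, rejected

# Hypothetical purchased list.
purchased = [
    {"first_name": "Ana", "last_name": "Diaz", "email": "ana@example.com"},
    {"first_name": "Ben", "last_name": "", "email": "ben@example"},
]

accepted, rejected = split_valid(purchased)
for row, problems in rejected:
    print(row["first_name"], "->", problems)
```

Keeping the rejected rows and their reasons, rather than silently dropping them, gives you something to send back to the list vendor.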

#3 Profile your Data

Data will be mixing and moving around when you integrate. Reviewing the source data, understanding its structure, and identifying its interrelationships are all part of a complete data integrity process. Data profiling helps you identify the data’s potential for your integration and any anomalies that may affect your data quality. If there are concerns with any aspects of the data or its relationships to other data, they can be addressed prior to testing and loading the data in the target system. It’s also a great way to discover and assess your metadata as it relates to data integrity.
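
A very small profiling pass can be sketched in a few lines of Python. This one summarizes each column by row count, blanks, distinct values, and the data types it actually contains. The sample rows and the `profile` function are illustrative assumptions, not part of any particular tool, and dedicated profiling tools go much further.

```python
from collections import Counter

def profile(rows):
    """Summarize every column seen in a list of dict rows.

    A column missing from a row counts as blank. Values must be hashable.
    """
    names = sorted({name for row in rows for name in row})
    report = {}
    for name in names:
        values = [row.get(name) for row in rows]
        present = [v for v in values if v is not None and v != ""]
        report[name] = {
            "rows": len(values),
            "blank": len(values) - len(present),
            "distinct": len(set(present)),
            "types": dict(Counter(type(v).__name__ for v in present)),
        }
    return report

# Hypothetical source extract with a few planted anomalies.
sample = [
    {"id": 1, "revenue": "100"},
    {"id": 2, "revenue": 250},
    {"id": 3, "revenue": None},
    {"id": 3},
]

for column, stats in profile(sample).items():
    print(column, stats)
```

In this sample the profile surfaces three things worth a closer look before loading: a repeated id, blank revenue values, and revenue stored as both text and numbers.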

#4 Use ETL Testing

Many integration solutions are ETL-based. For example, StarfishETL is an ETL-based iPaaS (integration Platform as a Service). ETL stands for Extract – Transform – Load. This process is meant to pull the data from the source system, transform it into a consistent data type, and then load it into the destination system. By testing the ETL you’re making sure the data transfer is being applied as you intended before you do the full data load for the integration. Proper ETL testing is done in ordered stages:

1. Clearly defined business requirements outline the scope of the project and the flow of information.
2. Data is cleaned and validated.
3. Scenarios are mapped and SQL scripts are ready for test cases. Make sure the mapping document contains all the information you intend!
4. The initial extraction from the source system occurs. During this stage of ETL testing you are looking for any bugs or defects that must be addressed.
5. The transformation logic is applied to ensure the data type matches the mapping document for each column and table.
6. Data is loaded in the initial test and checked to ensure the record counts match, and any invalid data was rejected.
7. A summary report is created so all stakeholders can see the outcome of the test and the details of what was resolved.
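
The transform, test load, and record count check can be sketched with Python's built-in sqlite3 module standing in for the target system. The `accounts` table, its columns, and the sample rows are assumptions made up for this example. This is not how StarfishETL or any specific platform runs its tests.

```python
import sqlite3

def run_test_load(source_rows):
    """Transform rows and load them into an in-memory target table.

    Returns the connection and a list of (row, reason) rejections.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL, annual_revenue INTEGER NOT NULL)"
    )
    rejected = []
    for row in source_rows:
        try:
            # Transform: enforce the types a (hypothetical) mapping document specifies.
            record = (int(row["id"]), row["name"].strip(), int(row["annual_revenue"]))
            if not record[1]:
                raise ValueError("blank name")
            conn.execute("INSERT INTO accounts VALUES (?, ?, ?)", record)
        except (KeyError, ValueError, TypeError, AttributeError,
                sqlite3.IntegrityError) as exc:
            rejected.append((row, str(exc)))
    conn.commit()
    return conn, rejected

def reconcile(conn, source_rows, rejected):
    """Summary report: every source row should be either loaded or rejected."""
    loaded = conn.execute("SELECT COUNT(*) FROM accounts").fetchone()[0]
    return {
        "source": len(source_rows),
        "loaded": loaded,
        "rejected": len(rejected),
        "counts_match": loaded + len(rejected) == len(source_rows),
    }

# Hypothetical extract: one good row, one bad revenue value, one duplicate id.
source = [
    {"id": "1", "name": "Acme", "annual_revenue": "100000000"},
    {"id": "2", "name": "Globex", "annual_revenue": "n/a"},
    {"id": "1", "name": "Acme duplicate", "annual_revenue": "5"},
]

conn, rejected = run_test_load(source)
print(reconcile(conn, source, rejected))
```

The count comes from the target table itself, so a row that vanished without being loaded or rejected would show up as a mismatch in the summary.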

The average business uses 137 unique SaaS apps. Integrating some of those systems together just makes sense!

It helps teams collaborate, it gives analytics and AI a powerful boost, and it reveals new patterns that can be used to scale and grow revenue streams. However, none of this can be achieved if the data you’re using to make those decisions lacks integrity and quality.

Keep data integrity top of mind during your integration and establish a long term plan to maintain it in your organization to truly reap the benefits that integrated solutions can bring.
