Dealing with Multiple Sources of Information
Authors: Oded Weiss – BI solution expert
Overview
In certain cases, our database is connected to different data sources – e.g., cloud, local files, CRM systems, etc.
Usually, the number of data sources isn’t a problem for designing the database.
However, there are scenarios where the data we receive from different sources isn’t coherent and can lead to some issues regarding data integrity.
The problem
For example, suppose we are a company that receives information from two different banks about that resident in the same country. Each bank has its own dim table that stores data about the country the bank resident in e.g., country name, continent, language, currency, and on).
It is a possible scenario that one of the banks will decide to store its currency data as dollars, and the other bank will decide to store its courtesy data as NIS as seen in the next table:
Country | Currency | Language |
Israel | NIS | Hebrew |
Israel | Dollar | Hebrew |
When combining the data into one dimension table, we would face the problem of storing two different currencies for the same country. This creates some trouble performing analysis on these tables, especially as there are two currency values to compute instead of one.
In addition, the maintenance for such a table would be relatively high because for any given country, you have more than one row to update the data.
Suggestions for a solution
We want to make sure that the country id we store in our fact table is unique.
To achieve this, we need to implement a process that will prevent the storing of duplicated values for each country.
We can create a temporary table that will store all the duplicating values. the purpose of such a table will be to retain the data temporarily for the QA department to re-examine it and choose to keep or change the currency accordingly.
Additionally, we will choose one currency for each country to be displayed in the dimension table, and all transactions will be stored in that manner – transactions with different currencies will be kept in the latter form, and will also be stored in our temporary table for the QA re-examination.
In some cases, however, it’s more likely that we will get a null value from one of our sources of information. In such situations, we can create an unknown value for all the countries with null currencies, so we could still present the data for all the missing currency values.