Nowadays, many organizations are use Business Intelligence (BI) to make strategic decisions with expectations of achieving competitive benefits for their business. BI software is usually created with the help of data warehouses and data marts which collect data from other applications, flat files, CRMs, websites, big data stores and data sources. BI is a procedure of collecting raw data which required for business and converting it into useful and meaningful information which can be further used for data analysis, mining, reporting, predicting and interpretive analysis.
A data warehouse is a collection of data from various heterogeneous sources which enables organizations to have consolidated data for separate analysis from transactions. It is a database mainly designed for query and analysis rather than transactions.
Role of ETL in the organizations
ETL (Extract Transformation and Load) is process which collects data from different data sources, converts it into preferred format of data based on business rules and loads it into a cohesive database known as data warehouse. ETL transforms data in a unified set of data from various dissimilar formats of data. As multiple data sources are involved, integrity and implementation of business intelligence is a complex task. Success of a project depends on the various aspects of the quality of data which makes testing a critical and extremely important task throughout the ETL process. This guarantees data transformation, data integrity, data completeness and data consistency of the migrated data.
ETL testing is different than conventional testing in terms of difficulty and scope. You need a different approach to ETL. The testing procedures primarily target identifying and mitigating data defects in the transformed data and overall errors occurred prior to processing of data for analysis and reporting.
ETL testing process
ETL testing process follows the steps below:
ETL Testing approaches
As data is moved into the production systems from multiple data sources, flat files, spreadsheets among others, it has to be in the correct order and arranged according to the customer requirement.
Verifying completeness of the data into production system
This step includes confirming whether data transformed from the source is expected data in the target/production database. Verification of all the data which is loaded into target database to validate compare counts, totals the actual data with that of processed or non-processed target data.
This mainly involves verifying the data types, data length, indexes and constraints into the transformed data.
Verification of transformed data for business rules validation with the help of writing multiple SQL queries and comparing the output with the target data.
This test includes syntax and reference checks to avoid any errors due to invalid formats of dates and unique ids from a business perspective. The syntax test includes invalid characters, uppercase or lower case letters. In addition, the data quality check consists of number check, date check, precision and null values verification.
Includes verification of inserts and upgrades getting processed correctly during the ETL process.
Challenges in ETL testing
ETL testing is dissimilar to conventional testing. There are multiple challenges testers face during ETL testing. Some of them are listed below:
The above challenges can be overcome by taking precautionary actions with the help of the following measures:
For an organization, data warehouse is an important asset which contains invaluable data used for real-time analysis, data mining, analysis of historical data, and corporate decision making. Thus, ETL testing plays a substantial role in validating completeness, integrity, reliability, accuracy and security of data. Defects found in later stages of application development are costlier to resolve than earlier stages of software applications. Therefore, ETL testing is crucial to ensure that the developed data warehouse is a quality product.
References
http://queforum.com/etl-testing/93052-testing-challenges-etl-testing-process.html
http://qainsights.com/challenges-in-data-warehouse-testing/
http://www.developer.com/db/big-data-practical-tips-for-etl-testing.html
http://www.pqatesting.com/our_ideas/blog/etl_and_data_warehouse_testing