Data is often transformed, which might require complex SQL queries for comparing the data. ETL testing is very dependent on the availability of test data covering different test scenarios. Although the types of tests vary slightly from project to project, the most common types of tests for ETL testing are listed below.
ETL Testing Categories
Data Type Check: Verify that the table and column data type definitions match the data model design specifications.
Data Length Check: Verify that the lengths of the database columns match the data model design specifications.
Verify that the unique key and foreign key columns are indexed as per the requirement. Verify that the tables are named according to the table naming convention.
Metadata Naming Standards Check: Verify that the names of database metadata objects such as tables, columns and indexes follow the naming standards.
Metadata Check Across Environments: Compare table and column metadata across environments to ensure that metadata changes have been migrated properly to the test and production environments. Track changes to table metadata over a period of time. Compare column data types between the source and target environments. Validate reference data between spreadsheets and databases, or across environments.
Record Count Validation: Compare the count of records in the primary source table and the target table. Example: a simple count of records comparison between the source and target tables.
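As a minimal sketch of this count comparison, assuming hypothetical source and target tables named customer_src and customer_tgt (the table names are placeholders, not from the original text), the two counts can be pulled into a single result set:

```sql
-- Record count comparison between the hypothetical source and target tables.
-- The two counts should match if the load did not drop or duplicate records.
SELECT 'customer_src' AS table_name, COUNT(*) AS record_count FROM customer_src
UNION ALL
SELECT 'customer_tgt' AS table_name, COUNT(*) AS record_count FROM customer_tgt;
```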
Some of the common data profile comparisons that can be done between the source and target are: Compare unique values in a column between the source and target. Compare max, min, avg, max length and min length values for columns, depending on the data type. Compare null values in a column between the source and target. For important columns, compare the data distribution frequency in a column between the source and target.
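These profile comparisons can be computed with plain aggregate queries. A sketch, again assuming hypothetical customer_src and customer_tgt tables with salary and customer_type columns:

```sql
-- Column profile for the source; run the same query against customer_tgt
-- and compare the two result rows.
SELECT
    COUNT(*)                                        AS row_count,
    COUNT(DISTINCT salary)                          AS distinct_values,
    SUM(CASE WHEN salary IS NULL THEN 1 ELSE 0 END) AS null_values,
    MIN(salary)                                     AS min_value,
    MAX(salary)                                     AS max_value,
    AVG(salary)                                     AS avg_value
FROM customer_src;

-- Data distribution frequency for an important column, compared source vs. target.
SELECT customer_type, COUNT(*) AS frequency
FROM customer_src
GROUP BY customer_type
ORDER BY customer_type;
```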
Data Profile Test Case: Automatically computes the profile of the source and target query results — count, count distinct, nulls, avg, max, min, max length and min length.
Component Test Case: Provides a visual test case builder that can be used to compare multiple sources and targets.
Query Compare Test Case: Simplifies the comparison of results from source and target queries.
Duplicate Data Checks: Look for duplicate rows with the same unique key column, or a unique combination of columns, as per the business requirement.
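A minimal duplicate check sketch, assuming a hypothetical customer_tgt table whose business key is customer_id (or, per the business requirement, a combination of columns):

```sql
-- Business keys that appear more than once in the target table.
SELECT customer_id, COUNT(*) AS occurrences
FROM customer_tgt
GROUP BY customer_id
HAVING COUNT(*) > 1;

-- The same check on a unique combination of columns.
SELECT first_name, last_name, birth_date, COUNT(*) AS occurrences
FROM customer_tgt
GROUP BY first_name, last_name, birth_date
HAVING COUNT(*) > 1;
```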
Count of records with null foreign key values in the child table. Count of invalid foreign key values in the child table that do not have a corresponding primary key in the parent table (see the sketch below).
Data Rules Test Plan: Define data rules and execute them on a periodic basis to check for data that violates them.
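A sketch of the two referential integrity counts above, assuming hypothetical parent and child tables customer_tgt and order_tgt linked by customer_id:

```sql
-- Child records with a NULL foreign key.
SELECT COUNT(*) AS null_fk_count
FROM order_tgt
WHERE customer_id IS NULL;

-- Child records whose foreign key has no matching parent record (orphans).
SELECT COUNT(*) AS invalid_fk_count
FROM order_tgt o
LEFT JOIN customer_tgt c ON c.customer_id = o.customer_id
WHERE o.customer_id IS NOT NULL
  AND c.customer_id IS NULL;
```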
Transformation testing using White Box approach
The steps to be followed are listed below: Review the source to target mapping design document to understand the transformation design. Apply the transformation logic on the test data (for example, using SQL or a procedural language). Compare the results of the transformed test data with the data in the target table.
Example: Review the requirement and design for calculating the interest. Implement the logic using your favourite programming language. Compare your output with the data in the target table.
Transformation testing using Black Box approach
Black-box testing is a method of software testing that examines the functionality of an application without peering into its internal structures or workings.
The steps to be followed are listed below: Review the requirements document to understand the transformation requirements.
Prepare test data in the source systems to reflect different transformation scenarios. Come up with the transformed data values, or the expected values, for the test data from the previous step. Compare the results of the transformed test data in the target table with the expected values.
Example: Review the requirement for calculating the interest. Set up test data for various scenarios of daily account balance in the source system. Come up with the expected interest values for the test data. Compare the transformed data in the target table with the expected values for the test data.
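For the final comparison step, one approach is to load the manually calculated expected values into a helper table and compare them with the ETL output. A sketch, assuming hypothetical tables expected_interest (the prepared expected values) and account_interest_tgt (the target loaded by the ETL):

```sql
-- Accounts where the loaded interest differs from the expected value
-- (a small tolerance absorbs rounding differences).
SELECT e.account_id, e.expected_interest, t.interest_amount AS loaded_interest
FROM expected_interest e
JOIN account_interest_tgt t ON t.account_id = e.account_id
WHERE ABS(t.interest_amount - e.expected_interest) > 0.01;

-- Test accounts that are missing from the target altogether.
SELECT e.account_id
FROM expected_interest e
LEFT JOIN account_interest_tgt t ON t.account_id = e.account_id
WHERE t.account_id IS NULL;
```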
Visual Test Case Builder: The component test case has a visual test case builder that makes it easy to rebuild the transformation logic for testing purposes. This makes it easy for the tester to implement transformations and compare the results using a Script Component.
Benchmark Capability: Makes it easy to baseline the expected data in the target table and compare the latest data with the baselined data.
Regression testing by baselining target data
Often testers need to regression test an existing ETL mapping with a number of transformations.
Here are the steps: Execute the ETL before the change and make a copy of the target table. Execute the modified ETL that needs to be regression tested. Compare the data in the target table with the data in the baselined table to identify differences (see the comparison sketch below).
Verify that data conforms to reference data standards
Data model standards dictate that the values in certain columns should adhere to values in a domain.
Example: Compare country codes between the development, test and production environments.
Track reference data changes: Baseline the reference data and compare it with the latest copy so that changes to the reference data can be tracked and validated.
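Both the regression test above and reference data tracking reduce to comparing a baselined copy of a table with its latest version. A sketch using set operators, assuming a hypothetical customer_tgt table and a baselined copy customer_tgt_baseline (use MINUS instead of EXCEPT on Oracle):

```sql
-- Rows in the latest load that are new or changed relative to the baseline.
SELECT * FROM customer_tgt
EXCEPT
SELECT * FROM customer_tgt_baseline;

-- Rows in the baseline that are missing or changed in the latest load.
SELECT * FROM customer_tgt_baseline
EXCEPT
SELECT * FROM customer_tgt;
```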
Define data rules to verify that the data conforms to the domain values.
Duplicate Data Checks: When a source record is updated, the incremental ETL should be able to look up the existing record in the target table and update it, rather than insert a duplicate.
Compare Data Values: Verify that changed data values in the source are reflected correctly in the target data. Is the latest record tagged as the latest record by a flag?
Are the old records end-dated appropriately?
End-to-End Data Testing
Integration testing of the ETL process and the related applications involves the following steps: Set up test data in the source system. Execute the ETL process to load the test data into the target. View or process the data in the target system. Validate the data and the application functionality that uses the data.
The relationship between the source input, the transformation rules and the target output provides the information needed for creating physical tests that validate the ETL processes.
We will use a simple example below to explain the ETL testing mechanism. A source table has both individual and corporate customers. The requirement is that an ETL process should take only the corporate customers and populate the data in a target table.
The test cases required to validate the ETL process reconcile the source input with the target output data. The transformation rule also specifies that the output should contain only corporate customers.
Physical Test 1: Count the corporate customers in the source. Count the customers in the target table. If the ETL transformation is correct, the counts should be an exact match. This is a reasonably good test; however, the count can be misleading if the same record is loaded more than once, because a count cannot distinguish between individual customers.
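A sketch of Physical Test 1, assuming hypothetical tables customer_src (with a customer_type column) and corporate_customer_tgt; the filter on the source side mirrors the transformation rule:

```sql
-- The two counts should be an exact match if the 'corporate customers only'
-- rule was applied correctly and no record was loaded twice or dropped.
SELECT 'source corporate customers' AS side, COUNT(*) AS record_count
FROM customer_src
WHERE customer_type = 'CORPORATE'
UNION ALL
SELECT 'target customers' AS side, COUNT(*) AS record_count
FROM corporate_customer_tgt;
```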
Physical Test 2: Compare each corporate customer in the source to the corresponding customer in the target. This kind of reconciliation can be done at the row or attribute level. Hence, it will not only validate the counts but will also prove that each customer is exactly the same on both sides.
There are various permutations and combinations of these types of rules, with increasing complexity. This example also shows that the concepts behind ETL testing are quite different from those of GUI-based software testing.
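A sketch of Physical Test 2, reusing the same hypothetical tables and comparing at the attribute level; this assumes a database that supports FULL OUTER JOIN:

```sql
-- Rows missing on either side, or whose attributes differ between source and target.
SELECT s.customer_id   AS source_id,
       t.customer_id   AS target_id,
       s.customer_name AS source_name,
       t.customer_name AS target_name
FROM (SELECT * FROM customer_src WHERE customer_type = 'CORPORATE') s
FULL OUTER JOIN corporate_customer_tgt t
  ON t.customer_id = s.customer_id
WHERE s.customer_id IS NULL                 -- extra row in the target
   OR t.customer_id IS NULL                 -- corporate customer dropped by the ETL
   OR s.customer_name <> t.customer_name;   -- attribute-level mismatch
```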
As discussed earlier in the article, ETL tests need specialized software. In the scenario below, we have a set of ETL processes that read, transform and load customer, order and shipment data. We will take these examples and then create test cases and rules in iCEDQ to certify the processes. The examples will also clarify the thought process and the principles behind ETL testing. Even though the downstream ETL processes are not responsible for the incoming upstream data from the source system, it is still important to validate the source data.
Source to target reconciliation will certify that ETL1 has not dropped data or added extra data in the process of copying the data from the file to the stage. This rule tests whether ETL Process 1 has loaded the source data correctly into the target. Usually such tests need to connect across two different systems; in this case, between a file server and a database.
Consider the alternative: since the products are the same, a data validation rule can be set up to flag data that falls outside the expected range, and this can work.
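A sketch of such a data validation rule, assuming a hypothetical staging table product_rating_stg whose rating column is expected to stay between 1 and 5; a scheduler can run the query periodically and raise a notification whenever it returns rows:

```sql
-- Ratings that are missing or fall outside the expected range.
SELECT product_id, rating
FROM product_rating_stg
WHERE rating IS NULL
   OR rating < 1
   OR rating > 5;
```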
However, if the product rating data is important for the business and there is another source that can provide the data, the data should be reconciled against that source. If another source is not available, then data reconciliation can be done with the previous day's data. This can capture sudden changes, or cases where the previous day's data is sent again by the source system.
Validate whether the data follows the business rules. These rules are independent of the ETL processes.
The order table is populated by the ETL2 process, and the process can be very complicated. However, a simple rule that the business is aware of can point to a data or processing issue.
Raw data consists of the records of the daily transactions of an organization, such as interactions with customers, administration of finances, management of employees, and so on.
What is a Data Warehouse?
A data warehouse is a database that is designed for query and analysis rather than for transaction processing. The data warehouse is constructed by integrating data from multiple heterogeneous sources. It enables a company or organization to consolidate data from several sources and separates the analysis workload from the transaction workload.
Data is turned into high-quality information to meet all enterprise reporting requirements for all levels of users.
What is ETL?
ETL stands for Extract-Transform-Load, and it is the process by which data is loaded from the source system into the data warehouse. Data is extracted from an OLTP database, transformed to match the data warehouse schema and loaded into the data warehouse database. Many data warehouses also incorporate data from non-OLTP systems such as text files, legacy systems and spreadsheets.
Let us see how it works. For example, consider a retail store that has different departments such as sales, marketing and logistics. Each of them handles customer information independently, and the way they store that data is quite different.
The solution is to use a data warehouse to store information from different sources in a uniform structure using ETL. ETL can transform dissimilar data sets into a unified structure.
Later, BI tools can be used to derive meaningful insights and reports from this data.
The various types of keys are the primary key, alternate key, foreign key, composite key and surrogate key. The data warehouse owns these keys and never allows any other entity to assign them. Cleaning involves omitting unwanted data as well as identifying and fixing errors in the data.
In addition to these, this system creates metadata that is used to diagnose source system problems and to improve data quality. The main steps in the process are: 1. Identifying data sources and requirements. 2. Data acquisition. 3. Implementing business logic and dimensional modeling. 4. Building and populating the data.
To support your business decisions, the data in your production systems has to be in the correct order.
The Informatica Data Validation Option provides ETL testing automation and management capabilities to ensure that production systems are not compromised by bad data.
Source to Target Testing (Validation Testing): This type of testing is carried out to validate whether the transformed data values are the expected data values.
Application Upgrades: This type of ETL test can be generated automatically, saving substantial test development time. It checks whether the data extracted from an older application or repository is exactly the same as the data in the new repository or application.
Data Completeness Testing: Data completeness testing is done to verify that all the expected data is loaded into the target from the source. Some of the tests that can be run are comparing and validating counts, aggregates and actual data between the source and target for columns with simple transformations or no transformation.
Data Accuracy Testing: This testing is done to ensure that the data is accurately loaded and transformed as expected.
Data Transformation Testing: Testing data transformation is done because in many cases it cannot be achieved by writing a single source SQL query and comparing the output with the target. Multiple SQL queries may need to be run for each row to verify the transformation rules.
Data Quality Testing: Data quality testing is done in order to avoid errors due to dates or order numbers during the business process. Syntax tests report dirty data, based on invalid characters, character patterns, incorrect upper or lower case order, etc.
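A sketch of a simple syntax test, assuming a hypothetical customer_tgt table whose first_name column should contain only letters; the regular expression function varies by database (REGEXP_LIKE on Oracle and MySQL 8+, the ~ operator on PostgreSQL):

```sql
-- Report rows whose first_name contains characters other than letters,
-- spaces, hyphens or apostrophes (dirty data).
SELECT customer_id, first_name
FROM customer_tgt
WHERE NOT REGEXP_LIKE(first_name, '^[A-Za-z][A-Za-z ''-]*$');
```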