Fandom

Scratchpad

ETL Processing SAD mwiki

216,082pages on
this wiki
Add New Page
Discuss this page0 Share

Ad blocker interference detected!


Wikia is a free-to-use site that makes money from advertising. We have a modified experience for viewers using ad blockers

Wikia is not accessible if you’ve made further modifications. Remove the custom ad blocker rule(s) and the page will load as expected.



Title page


Software Architecture Document

Introduction

The document provides a comprehensive overview of the software architecture components supporting the ETL processing within the Client Hub environment.

Extract, transform and load (ETL) is a major component within the Client Hub that is responsible for extracting data from source systems, transforming the data into required formats and loading the transformed data into the databases within the Client Hub environment.

Purpose

This document provides an architectural overview of the ETL processing system, using a number of different use cases and architectural views to depict different aspects of the ETL processing within the Client Hub. It is intended to capture and convey the significant architectural decisions which have been made on the system.

Scope

The document focuses on the ETL software components that process and transform data from the source systems and load the processed data into a database of different stages within the Client Hub.

<Examples of different database stages could be

  • Pre-landing Database
  • Landing Database
  • Staging Database
  • Target Hub Database>

Definitions, Acronyms and Abbreviations

Please refer to the Glossary section in the Appendix, which contains a list of the main definitions, acronyms and abbreviations used in this document.

References

The documents referenced in this document can be found in the Appendix A – References section. Please note that some of the referenced documents are still in draft version.

Overview

The document begins with a high-level architectural representation of the overall ETL process and subsequently provides a lower-level detail of each architecture component.

Architectural Representation

The high-level ETL processing architecture shows major system and software components that involve in the movement of source data into the Client Hub. The system components include the source systems that provide data for the Client Hub and the Client Hub which is the target data destination of the ETL processing. The software components, in contrast, are software programs that are used for extracting, transforming and loading of data into the databases within the Client Hub environment. The software components comprise of the source extract programs, ETL processes as well as other supporting software components that complement the ETL processes.

System Components

  • Source Systems:
  • Client Hub’s Databases:
  • External Data Providers Iif any for data enrichment)

Software Components

  • Extract Program: An extract program residing on the source system that is responsible for extracting and providing source data to the Client Hub
  • ETL: ETL processes that involve in the data extraction, transformation and loading of data into a database within the Client Hub
  • Supporting ETL: Software programs that are auxiliary to the main ETL processes. They are responsible for logging execution status of ETL jobs, processing of exception handling rules and capturing of exception data.

Architectural Goals and Constraints

List the key high level architectural goals and constraints related to the ETL processes, the databases that will be involved etc.

Use-Case View

Examples of the use cases within the client hub

  • Receiving data
  • Persisting data
  • Unique key generation
  • Log exceptions

Business Scenarios (Examples)

There are two primary business scenarios related to the MDM Repository ETL Processing:

Initial Load Scenario

This business scenario takes place when the data store within the Client Hub is to be initially loaded with data from the source systems. The scenario may also occur when the database within the Client Hub has to be reloaded if the data is wiped out or if the database has been re-instantiated.

Delta (Incremental) Load Scenario

The Delta Load scenario entails the ETL processing of delta data from the source systems that normally occurs at a regular interval (daily, weekly or monthly). The delta data from the source system is defined as new data records, data records that have been updated or data records that have been deleted in the source systems. Delta data as a result of a change made to the source systems since the previous occurrence of delta data extraction will be processed by the Delta Load ETL Processing scenario.

Sequence Diagram

Sequence diagram illustrating the sequence of use case interactions and system boundaries where a use case is residing.

Functionality

The section describes functionalities of the ETL processing software component. Some of the examples are

Sample functionality is described below

Receive Source Data

Support for Extracting Required Source Data Elements from Original Extract

Support for Ensuring that No Files are Overwritten Unintentionally

Support for Verifying Completeness of Received Extract Files

Persisting Source Data in the Pre-Landing Database Functionality

Support for persisting or storing the source data records in the Pre-Landing Database

Load Business Data Elements to Pre-Landing Database

Load Technology Data Elements to Pre-Landing Database

Maintain Original Source Data Content

Maintain Original Source Data Format

Support for Tracing Staging Data to its Original in Source Extract

Persisting Source Data in the Landing Database Functionality

Support for persisting or storing the source data records in the Landing Database

Load Business Data Elements to Landing Database

Load Technology Data Elements to Landing Database

Maintain Original Source Data Content

Maintain Original Source Data Format

Support for Tracing Staging Data to its Original in Source Extract

Persist Data in Staging Database Functionality

Transform Source Data to Staging Database’s Required Formats

Translation of Source Codes to Standardized Codes

Support for Surrogate Key Generation

Logical View

This section should provide a logical view of the system and the database components

System Components

Typically the various landing and staging areas, source systems etc.

Examples: Source Systems: The in-scope source systems.

  • Pre-Landing Area: The Pre-Landing Area will store source data as it comes in an extract file. The Pre-Landing Area will be used for storing data from a source system that may not provide extract files in the Client Hub’s defined formats.
  • Landing Area: The Landing Area will only store Client Hub’s required data elements from the source system.
  • Staging Area: The Staging Area is a database that is modeled after the hub Conceptual Data Model, which is client-focused and independent of the choice of MDM product that will be deployed in the Target Hub. At this stage, the same data domain (i.e., person, business, account, etc.) from a different source system will be persisted in the same table in the Staging database.
  • Target Hub: The Target Hub in logical view is the database of a chosen MDM product.
  • Cross Reference: The Cross Reference system component is a database component for storing cross references profile data residing in the source system to the profile data that is stored in the MDM repository. The key data element in the Cross Reference is the source system key of profile data in the source system and the unqiue ID generated by a Client Hub process, in the Client Hub.

Software Components

The software components typically include the major ETL processes: Examples are

  • Flow 1A: Source-to-Pre-Landing ETL

The Source-to-Pre-Landing ETL is a data movement process that reads source data files and loads the data into the Pre-Landing database. There should be no transformation on the raw data during this process. However, due to the record structure and the format of the data from certain system (e.g., mainframe) that may have sub records or repeating values and the source data types that may not be supported by the RDBMS in the distributed environment, the Source-to-Pre-Landing ETL may apply data transformation rules to the extent that the source data could be loaded into the Pre-Landing database.

  • Flow 1B: Pre-Landing-to-Landing ETL

The Pre-Landing-to-Landing ETL is a data movement process that performs column filtering rules to remove unnecessary data fields from the Pre-Landing database and load only required data fields into the Landing database. Note that in this ETL software component, another filtering rule based on a list of branch numbers may be applied to facilitate the testing allowing the test team to look at smaller, more manageable volume of data.

  • Flow 2: Landing-to-Staging ETL

The Landing-to-Staging ETL is a data movement process that performs extensive data transformation for populating the database in the Staging Area. It reads source data that is residing in the Landing Area’s database. The following classifications of data transformations occur in the Landing-to-Staging ETL:

  • Validations: Validate source data for invalid data content or formats.
  • Business Rules: Apply business rules on source data.
  • Transformation/Translations: Apply data transformation rules on source data or translate source codes, types or indicators into standardized values.
  • Enrichment: Extend, enhance or improve data with data quality rules.
  • Correlation: Apply a defined logic for correlating pieces of data to establish a relationship between data.
  • Flow 4: ETL to support interfaces to the external data providers

Process View

The process view of the ETL processing architecture reveals ETL processes that support the initial as well as the delta ETL processing requirements.

Example: Source-to-Landing ETL

The logical view of the Source-to-Landing ETL assumes that the source data files from source systems contain data content and formats that meet the requirements of the Client Hub. In the process view of the Flow-1 or the Source-to-Landing ETL, however, reveals the fact that the requirements may not be met by certain source systems. For example, the files may have different file formats or contain many more data fields than needed by the Client Hub.

In such case, the Source-to-Landing ETL can be broken down into two ETL processes:

  • Flow 1A: Source-to-Pre-Landing ETL Process
  • Flow 1B: Pre-Landing-to-Landing ETL Process

Major Processing Steps:

  • FTP source data files from the source system to a designated file system area on the ETL server.
  • Apply transformation rules to create load-ready files containing unfiltered data into Pre-Landing Area database. In this step, transformation rules are minimal and may include adding audit fields.
  • FTP the load-ready files from the ETL server to a designated file system area on the database server where the Pre-Landing Database resides.
  • Load data in the load-ready files into the Pre-Landing Database.

Deployment View

This deployment view of the ETL processing architecture provides a level of details as to how ETL architecture components reside in the source systems, the ETL environment as well as in the Client Hub environment.

ETL Processing SAD mwiki2 01 1024

1: Sample ETL Processing Deployment View

Implementation View

Overview (Example)

Major Processing Steps and Design Specifications:

Step Description Design & Program Spec
1A.1    
   

FTP Source Data Files to ETL Server

  • Program Logic Overview
  • Program Specifications
Program  
Name  
Overview  
Dependencies  
Parameters  
Program Steps  
Inputs  
Outputs  
Lookup/Hash  
Before Logic  
After Logic  
Related Documents  
  • Exception Handlings

The following exception-handling rules defined for this program.

#
Exception Conditions/Rules Handling Rules Exception Code* Exception Information to be Logged
       

Job Sequence and Operational View

Job Sequence Diagram of Source-to-Pre-Landing ETL Process (illustrative Example)

ETL Processing SAD mwiki2 02 1024

Data View

The section provides a detail of each data architectural components that are relevant to the ETL Processing

Examples : Source file names; details of the relevant data models etc.

Source System File Name (Logical) Description Source System Physical File Name (DSN)
TBCAMAAA Account TEST.SORT.KDA522P0.CAMSTAAA.DX060391
     
     
ETL Processing SAD mwiki2 03 1024

Size and Performance

Appendix A – References (sample)

Standards, Guidelines and Best Practices

MDM Software Requirements Specification (SRS)

Identification Management SRS

Client Hub Services

Client Hub Infrastructure

Data Model

Source Extract Layouts

Data Mappings

Deployment Information of Software Components

The section is for informational purposes only.

ETL Environment

Component Name Development QA Acceptance Production
Software Components        
         
Files        
         

Services Environment

Component Name Development QA Acceptance Production
Software Components        
Message Queue TBD      

Database Environment

Component Name Development QA Acceptance Production
Database Components        
Files        

Key Trigger File Locations

Key Name Staging Field Name Location
Profile ID pfl_oid $\PFL_OID\profilekey.trg

Note: $ is default home directory for trigger files specified by ETL

Exception Classification and Exception Codes (example)

Category Exception Name Code
Process Authorization Exception PRC-01
Data Data Audit Exception DAT-01


|}

[[category:uncategorized]]

Also on Fandom

Random wikia