MDM Hub Software Architecture Document – Identification Management
This document provides a comprehensive overview of the software architecture components supporting Identification Management in the Client Hub. One of the main purposes of the Client Hub is to raise alerts for suspected duplicate profiles that may exist within the system for a given party and to provide tools to manage those duplicate profiles (e.g. merging profiles). The Identification Management process is necessary to support these requirements.
This document provides an architectural overview of the ETL processing system and Hub Services, using a number of different use cases and architectural views to depict different aspects of Identification Management within the Client Hub. It is intended to capture and convey the significant architectural decisions that have been made for the system.
The document focuses on the ETL software components and Hub services that support the functionality to manage profiles of Client data records as part of Identification Management.
Definitions, Acronyms and Abbreviations
Please refer to the Glossary section in the Appendix, which contains a list of the main definitions, acronyms and abbreviations used in this document.
<Name and location of the referenced document>
The document begins with a high-level architectural representation of the overall processes and then provides lower-level detail on each architecture component.
The high-level Identification Management architecture shows two main components: 1) the Identification Management ETL processing architecture and 2) the Hub Services architecture.
The ETL architecture component includes the source systems that provide data for the Client Hub, and the Client Hub itself, which is the target destination of the ETL processing. It also includes the External Data Provider, which is used for data enrichment and to obtain links between client records based on matching results provided by the vendor system. The software components of the ETL architecture are the programs that extract, transform and load data into the databases within the Client Hub environment.
The Hub Services architecture component comprises the services that are required for Identification Management tasks.
ETL System Components
- Source Systems:
- Client Hub’s Databases:
- External Data Providers (if any, for data enrichment)
ETL Software Components
- Extract Program: A program residing on the source system that extracts data and provides the extract to the External Vendor for the Identification Link
- ETL: Processes that extract, transform and load data into a database within the Client Hub
- Supporting ETL: Software programs that are auxiliary to the main ETL processes. They are responsible for sending client data to the external vendor through FTP, receiving and processing enriched data from the external vendor, logging the execution status of ETL jobs, processing exception-handling rules and capturing exception data.
Hub Services Components (Examples)
- GUID Management Services
- Key Generation Services:
- Cross Reference Services
- Business Enterprise Services:
Architectural Goals and Constraints
List the key high-level architectural goals and constraints related to the ETL processes and Hub Services.
Examples of the use cases within the Client Hub
- Obtain External Link
- Compute GUID
- Match profile records
- Assign GUID
- Merge/Split Client Profile records
- Update Cross Reference
- Log exceptions
Business Scenarios (Examples)
When source data from the Enterprise Business Systems is loaded into the staging area, Client Hub profiles are created. Each profile created in the staging area is assigned a unique profile ID. The goal of the GUID computation process is to assign a GUID to each profile. When the same GUID is assigned to multiple profiles, it signifies that, under the defined matching rules, those profiles belong to the same party.
If a profile meets the minimum requirements for an External Identification Link assignment, its name and address information, along with the profile key, is sent to the vendor, and the vendor returns the External Link. If the minimum requirements for GUID calculation are met, the GUID is computed. For some profiles the data may not be sufficient to obtain the External Identification Link and/or the GUID. The GUID values remain blank in these scenarios.
The following table illustrates the order of invocation of the use cases required to implement the scenario, along with the corresponding functionality provided by each use case.
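The eligibility checks described above can be sketched as follows. This is a minimal illustration, not the Client Hub implementation: the field names (`profile_key`, `name`, `address`, `dob`) and both minimum-requirement lists are assumptions, since the actual requirements are defined by the business rules.

```python
from typing import Optional

# ASSUMPTION: placeholder requirement lists; the real minimum requirements
# come from the Identification Management business rules.
LINK_REQUIRED_FIELDS = ("profile_key", "name", "address")
GUID_REQUIRED_FIELDS = ("name", "dob")


def has_all(profile: dict, fields: tuple) -> bool:
    """True when every listed field is present and non-empty."""
    return all(profile.get(field) for field in fields)


def vendor_payload(profile: dict) -> Optional[dict]:
    """Profile key, name and address to send to the vendor, or None if ineligible."""
    if not has_all(profile, LINK_REQUIRED_FIELDS):
        return None
    return {field: profile[field] for field in LINK_REQUIRED_FIELDS}


def can_compute_guid(profile: dict) -> bool:
    """True when the (assumed) minimum data for GUID computation is present."""
    return has_all(profile, GUID_REQUIRED_FIELDS)


# Hypothetical profiles: one complete, one too sparse for either step.
complete = {"profile_key": "P1", "name": "Jane Doe",
            "address": "1 Main St", "dob": "1970-01-01"}
sparse = {"profile_key": "P2", "name": "Jane Doe", "address": None, "dob": None}
```

For the `sparse` profile neither step applies, so both the External Link and the GUID would stay blank.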
Change Data Capture
When a critical piece of data changes on a profile, the GUID may need to be recomputed. This GUID change may trigger GUID re-assignment on other profiles. The example below illustrates the scenario.
Initially all three records were assigned different GUIDs, which indicates that the profiles represent three different parties. The end user updated the second record by providing the SSN and DOB.
After the change the first record is linked to the second one through the SSN and address. The second and the third records are linked by the name, address, and date of birth. As a result all three records are linked by the same GUID. The chaining effect caused the linkage between record 1 and record 3 even though no data change occurred on these profiles and they represented different parties before the change in the second record.
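The chaining effect is a transitive closure over pairwise matches, which can be sketched with a union-find structure. The two matching rules below mirror the example (SSN plus address; name, address and date of birth), but the field values are invented, and using the root profile ID of each group as a stand-in for the GUID is an assumption made only for illustration.

```python
from itertools import combinations


def make_profiles(second_record_updated: bool) -> dict:
    """Three hypothetical profiles; all values are invented for illustration."""
    return {
        1: {"name": "J. Smith", "address": "1 Main St",
            "ssn": "123-45-6789", "dob": None},
        2: {"name": "John Smith", "address": "1 Main St",
            "ssn": "123-45-6789" if second_record_updated else None,
            "dob": "1970-01-01" if second_record_updated else None},
        3: {"name": "John Smith", "address": "1 Main St",
            "ssn": None, "dob": "1970-01-01"},
    }


def is_match(a: dict, b: dict) -> bool:
    """Pairwise rules from the example; a missing value never matches."""
    def same(*fields):
        return all(a[f] is not None and a[f] == b[f] for f in fields)
    return same("ssn", "address") or same("name", "address", "dob")


def group_profiles(profiles: dict) -> dict:
    """Map each profile ID to a group ID (stand-in for the GUID)."""
    parent = {pid: pid for pid in profiles}

    def find(pid):
        while parent[pid] != pid:
            parent[pid] = parent[parent[pid]]  # path halving
            pid = parent[pid]
        return pid

    for p, q in combinations(profiles, 2):
        if is_match(profiles[p], profiles[q]):
            parent[find(p)] = find(q)  # merge the two groups

    return {pid: find(pid) for pid in profiles}


before = group_profiles(make_profiles(second_record_updated=False))
after = group_profiles(make_profiles(second_record_updated=True))
```

Before the update each profile is in its own group; afterwards all three share one group, even though records 1 and 3 never match each other directly.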
The sequence diagram illustrates the use case interactions and the system boundary within which each use case resides.
This section describes the functionality of the ETL processing software component. Sample functionality is listed below:
Extract, Transform and Create Package for External Vendor - Batch Mode, Real-Time Mode
Send Data to External Vendor System - Batch Mode, Real-Time Mode
Receive Enriched Data from External Vendor System - Batch Mode, Real-Time Mode
Extract, Transform and Load Enriched Data to Client Hub System - Batch Mode, Real-Time Mode
Run matching rules to identify sets of matching profiles
Generate Keys for Landing, Staging and/or Target Area
Manage Keys for Landing, Staging and/or Target Area
Update Cross Reference Database
This section should provide a logical view of the services, system and database components.
Logical View – ETL Services
Logical View - Hub Services
The process view of the ETL processing architecture reveals ETL processes that support the Identification Management requirements.
Example: Client Hub-to-External Vendor ETL
The ETL software component is primarily responsible for sending data files and control files to the vendor via the Enterprise Data Exchange Server. It is also responsible for identifying the profile records that need to be sent out and creating a package of their client data elements.
The major processing steps for the component are given below:
- Flow 1A: Extract Profile Data from Client Hub
- Flow 1B: Validate Profile Data
- Flow 1C: Transform Data into Vendor Format
- Flow 1D: Generate data files
- Flow 1E: FTP Data Files via Enterprise Data Exchange Server
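Flows 1A to 1E can be sketched as a small pipeline. This is an illustration under stated assumptions rather than the actual design: the `needs_link` flag, the pipe-delimited layout, the file names, and the `RECORD_COUNT` control-file line are all hypothetical, and the transfer step is passed in as a callable (`send`) instead of talking to the Enterprise Data Exchange Server.

```python
import csv
from pathlib import Path


def extract_profiles(hub_rows):
    """Flow 1A: pick the profiles flagged for sending (flag is hypothetical)."""
    return [row for row in hub_rows if row.get("needs_link")]


def is_valid(row):
    """Flow 1B: require a profile key, name and address."""
    return all(row.get(field) for field in ("profile_key", "name", "address"))


def to_vendor_format(row):
    """Flow 1C: assumed vendor layout, upper-cased key|name|address."""
    return [row["profile_key"], row["name"].upper(), row["address"].upper()]


def generate_files(records, out_dir):
    """Flow 1D: write a data file and a control file with the record count."""
    data_path = Path(out_dir) / "profiles.dat"
    ctl_path = Path(out_dir) / "profiles.ctl"
    with data_path.open("w", newline="") as handle:
        writer = csv.writer(handle, delimiter="|", lineterminator="\n")
        writer.writerows(records)
    ctl_path.write_text(f"RECORD_COUNT={len(records)}\n")
    return data_path, ctl_path


def run_flow(hub_rows, out_dir, send):
    """Run Flows 1A-1D, then hand each file to `send` (Flow 1E)."""
    extracted = extract_profiles(hub_rows)
    valid = [row for row in extracted if is_valid(row)]
    rejected = [row for row in extracted if not is_valid(row)]
    files = generate_files([to_vendor_format(row) for row in valid], out_dir)
    for path in files:
        send(path)  # e.g. an FTP upload in a real deployment
    return files, rejected
```

Injecting `send` keeps the sketch testable without a network connection; rejected rows are returned so that they can be routed to exception handling.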
This deployment view of the ETL and Hub Services architecture for Identification Management details how these architecture components reside in the source systems, the ETL environment and the Client Hub environment.
Major Processing Steps and Design Specifications:
|Step||Description||Design & Program Spec|
FTP Client Hub data to Vendor System
- Program Logic Overview
- Program Specifications
- Exception Handling
The following exception-handling rules are defined for this program:
|#||Exception Conditions/Rules||Handling Rules||Exception Code*||Exception Information to be Logged|
Job Sequence and Operational View
This section provides details of each data architecture component relevant to the ETL processing.
Examples: source file names, details of the relevant data models, etc.
|Source System File Name (Logical)||Description||Source System Physical File Name (DSN)|
Size and Performance
Appendix A – References (sample)
Standards, Guidelines and Best Practices
MDM Software Requirements Specification (SRS)
Identification Management SRS
Client Hub Services
Client Hub Infrastructure
Source Extract Layouts
Deployment Information of Software Components
This section is for informational purposes only.
Key Trigger File Locations
|Key Name||Staging Field Name||Location|
Exception Classification and Exception Codes (example)
|Data||Data Audit Exception||DAT-01|
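One way to carry such codes through the ETL jobs is a small catalogue plus a structured log record, sketched below. Only `DAT-01` comes from the table above; reading its three columns as classification, description and code is an assumption, and the `job_name`, `profile_key` and `detail` fields and the pipe-delimited log layout are hypothetical.

```python
import logging
from dataclasses import dataclass
from enum import Enum


class ExceptionCode(Enum):
    # (classification, description, code); only DAT-01 is taken from the table.
    DATA_AUDIT = ("Data", "Data Audit Exception", "DAT-01")

    @property
    def classification(self) -> str:
        return self.value[0]

    @property
    def description(self) -> str:
        return self.value[1]

    @property
    def code(self) -> str:
        return self.value[2]


@dataclass
class ExceptionRecord:
    code: ExceptionCode
    job_name: str       # hypothetical: ETL job that raised the exception
    profile_key: str    # hypothetical: profile the exception relates to
    detail: str         # free-text information to be logged

    def as_log_line(self) -> str:
        return "|".join([self.code.code, self.code.classification,
                         self.job_name, self.profile_key, self.detail])


def log_exception(record: ExceptionRecord,
                  logger: logging.Logger = logging.getLogger("etl.exceptions")) -> str:
    """Write the record to the log and return the line that was written."""
    line = record.as_log_line()
    logger.error(line)
    return line


record = ExceptionRecord(ExceptionCode.DATA_AUDIT, "load_staging", "P1", "missing name")
```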