Master Data Management: Suspect Duplicate Processing in IBM MDM Server

Evaluating data, finding suspect duplicates in it, and collapsing them based on rules is an exhausting process. If two suspects are not a high-probability match, what action should be taken? How can automated merging be used to minimize the manual work of collapsing data?

The business wants answers to many such questions before it can make an informed decision. Let's go over the basic terminology that business users should know when talking about Suspect Duplicate Processing (SDP), also called Duplicate Suspect Processing (DSP).

What is SDP?
IBM MDM can identify duplicate parties in real time, as part of adding or updating party data, or offline, as part of Evergreening. The Suspect Duplicate Processing (SDP) feature provides the mechanism for identifying these duplicate parties.

Terminology business users should know:
- Critical Data
- Match/Non-Match Score
- Match Category
- Match Matrix

What is Critical Data?
The term ‘Critical Data’ refers to the data elements that the business selects for comparison in SDP. If all Critical Data fields match between two records, the records are considered an exact match. For example: Last Name, SSN, Address Line One.
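
To make the exact-match rule concrete, here is a minimal Python sketch. It is not IBM MDM code: the field names, the `normalize` helper, and the choice to ignore case and surrounding whitespace are all assumptions made for illustration.

```python
# Hypothetical field names standing in for Last Name, SSN, Address Line One.
CRITICAL_FIELDS = ("last_name", "ssn", "address_line_one")


def normalize(value):
    """Trim and upper-case a value; empty or missing values become None.

    Normalizing before comparing is an assumption of this sketch.
    """
    cleaned = value.strip().upper() if value else ""
    return cleaned or None


def is_exact_match(new, existing, fields=CRITICAL_FIELDS):
    """Return True only if every critical field is present in both records and equal."""
    for field in fields:
        a = normalize(new.get(field))
        b = normalize(existing.get(field))
        if a is None or b is None or a != b:
            return False
    return True
```

A record that lacks any critical field is never an exact match in this sketch, which follows the rule that all Critical Data fields must match.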

What is Match/Non-Match Score?
Each critical data element is given a match relevancy score and a non-match relevancy score.
For example:

Critical Data	Match Relevancy Score	Non-Match Relevancy Score
Last Name	1	1
SSN	2	2
Address Line One	8	8
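
As a rough illustration of how these relevancy scores could add up for a pair of records, the following Python sketch sums them. The `RELEVANCY` mapping, the field names, and `total_scores` are hypothetical and not part of the product; the sketch also assumes that an element missing from either record adds nothing to either total.

```python
# Example relevancy scores: field -> (match score, non-match score).
# Field names are hypothetical stand-ins for the critical data elements.
RELEVANCY = {
    "last_name": (1, 1),
    "ssn": (2, 2),
    "address_line_one": (8, 8),
}


def total_scores(new, existing, relevancy=RELEVANCY):
    """Return (match_score, non_match_score) for two party records."""
    match_total = 0
    non_match_total = 0
    for field, (match_score, non_match_score) in relevancy.items():
        a = new.get(field)
        b = existing.get(field)
        if not a or not b:
            continue  # missing from one or both records: no contribution
        if a == b:
            match_total += match_score
        else:
            non_match_total += non_match_score
    return match_total, non_match_total
```

With the example scores, two records that agree on all three elements get a match score of 11 and a non-match score of 0.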

What is Match Category?
The match category is based on the Match/Non-Match Scores. Out of the box (OOTB) there are 4 match categories:
A1 - Match/Non-Match scores indicate that a definite duplicate party has been found.
A2 - Match/Non-Match scores indicate a high probability that a duplicate party has been found.
B - Match/Non-Match scores indicate that it is fairly unlikely that a duplicate party has been found.
C - Match/Non-Match scores indicate that the suspect party is not a duplicate.
These categories can be customized.

What is the Match Matrix?
The Match Matrix brings together Match/Non-Match Scores and Match Categories. Each cell for a critical data element holds one of the following:
- 0 means the data element is missing from the new record, the existing record, or both
- A negative value means the data element is present in both the new and existing records and does not match
- A positive value means the data element is present in both the new and existing records and matches

Last Name	SSN	Address Line One	Match Score	Non-Match Score	Category
1	2	8	11	0	A1
-1	-2	-8	0	11	C
1	2	0	3	0	A2
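
The matrix lookup can be sketched in Python as follows. This is only an illustration, not the MDM implementation: the field names and functions are hypothetical, the matrix holds just the three sample rows, and each negative cell is assumed to carry the element's own non-match relevancy score (so an SSN mismatch is -2).

```python
FIELDS = ("last_name", "ssn", "address_line_one")  # hypothetical field names

# field -> (match relevancy score, non-match relevancy score)
RELEVANCY = {"last_name": (1, 1), "ssn": (2, 2), "address_line_one": (8, 8)}

# Only the sample rows; a real matrix would cover every combination
# the business cares about.
MATCH_MATRIX = {
    (1, 2, 8): "A1",
    (-1, -2, -8): "C",
    (1, 2, 0): "A2",
}


def matrix_row(new, existing):
    """Build the signed row: 0 if missing, +score on match, -score on mismatch."""
    row = []
    for field in FIELDS:
        match_score, non_match_score = RELEVANCY[field]
        a, b = new.get(field), existing.get(field)
        if not a or not b:
            row.append(0)
        elif a == b:
            row.append(match_score)
        else:
            row.append(-non_match_score)
    return tuple(row)


def categorize(new, existing):
    """Return (row, match_score, non_match_score, category or None)."""
    row = matrix_row(new, existing)
    match_score = sum(v for v in row if v > 0)
    non_match_score = -sum(v for v in row if v < 0)
    return row, match_score, non_match_score, MATCH_MATRIX.get(row)
```

A row that is not in the matrix returns `None` as its category here; what category such a row should receive is a business decision.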

The categories in the match matrix are decided by the business. Hopefully this provides a basic overview of the Suspect Duplicate Processing concept in IBM MDM. Feel free to leave comments or ask questions.

Wednesday, May 5, 2010