Thursday, May 6, 2010

What does the business need?


An excerpt from Data Intelligence Gap 

So, exactly what is it that the business needs to know that the data can’t provide? Here are some examples:
What the business wants to know: Can I lower my inventory costs and purchase prices? Can I get discounts on high volume items purchased?
Data needed: Reliable inventory data.
What’s inhibiting peak efficiency: Multiple ERP and SCM systems. Duplicate part numbers. Duplicate inventory items. No standardization on parts descriptions and numbers. Global data existing in different code pages and languages.

What the business wants to know: Are my marketing programs effective? Am I giving customers and prospects every opportunity to love our company?
Data needed: Customer attrition rates. Results of marketing programs.
What’s inhibiting peak efficiency: Typos. Lack of standardization of name and address. Multiple CRM systems. Many countries and systems.

What the business wants to know: Are any customers or prospects “bad guys”? Are we complying with all international laws?
Data needed: Reliable customer data for comparison to “watch” lists.
What’s inhibiting peak efficiency: Lack of standards. Ability to match names that may have slight variations against watch lists. Missing values.

What the business wants to know: Am I driving the company in the right direction?
Data needed: Reliable business metrics. Financial trends.
What’s inhibiting peak efficiency: Extra effort and time needed to compile sales and finance data, plus time to cross-check results.

What the business wants to know: Is the company we’re buying worth it?
Data needed: Fast comprehension of the reliability of the information provided by the seller.
What’s inhibiting peak efficiency: Ability to quickly check the accuracy of the data, especially the customer lists, inventory level accuracy, financial metrics, and the existence of “bad guys” in the data.

Again, these are just some of the many cases in which data lacks intelligence and can’t meet the needs of the corporation.

Wednesday, May 5, 2010

Suspect Duplicate Processing in IBM MDM Server

Evaluating data, finding suspects in it and collapsing them based on rules is an exhausting process. If the suspects do not have a high probability of being a match, what action should be taken? How can automated merge be leveraged so that the manual work of collapsing data is minimized?

The business wants answers to a lot of questions before it can make an informed decision. Let's go over the basic terminology the business should know when talking about Suspect Duplicate Processing (SDP), also called Duplicate Suspect Processing (DSP).

What is SDP?
IBM MDM can identify duplicate parties in real time, as part of adding or updating party data, or offline, as part of Evergreening. The Suspect Duplicate Processing (SDP) feature provides the mechanism to identify these duplicate parties.

Terminology business users should know:
- Critical Data
- Match/Non-Match Score
- Match Category
- Match Matrix

What is Critical Data?
The term ‘Critical Data’ refers to the data elements that the business selects to be used for comparison in SDP. If all Critical Data fields match between two records, the records are considered an exact match. For example: Last Name, SSN, Address Line One.
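The exact-match rule can be illustrated with a minimal Python sketch. It is not IBM MDM code: the field names, the `normalize` helper and the `is_exact_match` function are hypothetical, and the trimming and lowercasing step is an assumption made here so that trivial formatting differences do not block a match.

```python
# Hypothetical field names; the real Critical Data set is chosen by the business.
CRITICAL_FIELDS = ("last_name", "ssn", "address_line_one")


def normalize(value):
    """Trim and lowercase a value; treat empty or missing values as None.

    This normalization is an assumption for the sketch, not product behavior.
    """
    if value is None:
        return None
    cleaned = value.strip().lower()
    return cleaned or None


def is_exact_match(new_party, existing_party, fields=CRITICAL_FIELDS):
    """Return True only if every critical field is present in both records and equal."""
    for field in fields:
        new_value = normalize(new_party.get(field))
        existing_value = normalize(existing_party.get(field))
        if new_value is None or existing_value is None:
            return False
        if new_value != existing_value:
            return False
    return True
```

With this sketch, two records that agree on Last Name, SSN and Address Line One are an exact match, while a record with a missing or different SSN is not.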

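What is Match/Non-Match Score?
Each critical data element is given two scores: a match relevancy score, used when the element matches, and a non-match relevancy score, used when it does not.
For example: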

Critical Data    | Match Relevancy Score | Non-Match Relevancy Score
Last Name        | 1                     | 1
SSN              | 2                     | 2
Address Line One | 8                     | 8


What is Match Category?
The match category is based on the Match/Non-Match Score. Out of the box (OOTB) there are 4 match categories:
A1 - The Match/Non-Match scores indicate that a definite duplicate party has been found.
A2 - The Match/Non-Match scores indicate a high probability that a duplicate party has been found.
B - The Match/Non-Match scores indicate that it is fairly unlikely that a duplicate party has been found.
C - The Match/Non-Match scores indicate that the suspect party is not a duplicate.
These categories can be customized.

What is the Match Matrix?
Match Matrix brings together Match/Non-Match Scores and Match Categories.
- 0 means the data element is missing from the new record, the existing record, or both
- A negative value means the data element is present in both the new and existing records and does not match
- A positive value means the data element is present in both the new and existing records and matches

Last Name | SSN | Address Line One | Match Score | Non-Match Score | Category
1         | 2   | 8                | 11          | 0               | A1
-1        | -2  | -8               | 0           | 11              | C
1         | 2   | 0                | 3           | 0               | A2
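
To make the arithmetic concrete, here is a minimal Python sketch of how a match matrix lookup could work, using the relevancy scores from the example above. It is an illustration only, not IBM MDM code: the names `RELEVANCY`, `MATCH_MATRIX`, `element_value`, `score` and `categorize` are hypothetical, and the matrix holds just the three rows shown in this post. A real matrix would cover every score combination the business cares about.

```python
# (match relevancy score, non-match relevancy score) per critical data element,
# taken from the example scores in this post. Field names are hypothetical.
RELEVANCY = {
    "last_name": (1, 1),
    "ssn": (2, 2),
    "address_line_one": (8, 8),
}

# (match score, non-match score) -> category. Only the three example rows
# are listed; the business decides the full mapping.
MATCH_MATRIX = {
    (11, 0): "A1",
    (0, 11): "C",
    (3, 0): "A2",
}


def element_value(field, new_value, existing_value):
    """Signed value for one element: 0 if missing, + if matching, - if not."""
    if not new_value or not existing_value:
        return 0
    match_score, non_match_score = RELEVANCY[field]
    return match_score if new_value == existing_value else -non_match_score


def score(new_party, existing_party):
    """Return (match score, non-match score) summed over all critical elements."""
    values = [
        element_value(field, new_party.get(field), existing_party.get(field))
        for field in RELEVANCY
    ]
    match_total = sum(v for v in values if v > 0)
    non_match_total = -sum(v for v in values if v < 0)
    return match_total, non_match_total


def categorize(new_party, existing_party):
    """Look up the category; returns None for combinations not in this sketch."""
    return MATCH_MATRIX.get(score(new_party, existing_party))
```

For instance, two records that agree on Last Name and SSN while one of them has no Address Line One score 1 + 2 = 3 on the match side and 0 on the non-match side, which this matrix maps to A2.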


The categories in the match matrix are decided by the business. Hopefully this has provided a basic overview of the Suspect Duplicate Processing concept in IBM MDM. Feel free to leave comments or ask questions.