A lot of the apps on the AppExchange are rule-based, which presents certain usability problems. First of all, it takes time and resources to create all of the rules and filters. Creating a new rule every time a duplicate is discovered is simply unsustainable and, ultimately, a futile effort when we think about all of the possible “fuzzy” duplicates. Then we need to think about the effort required to manage those rules to make sure they do not interfere with business processes. For example, some of the deduping rules can block web-to-lead submissions from coming in.

Finally, the rule-based approach is not scalable. For example, let’s say that you have a modest number of records, such as 100,000, and you decide to upload a spreadsheet with 50,000 additional records. A rule-based system would need to perform 5,000,000,000 comparisons (100,000 × 50,000) to identify possible duplicates. Imagine the number of comparisons that would need to be made for organizations with hundreds of thousands or even millions of records.

This is why DataGroomr uses machine learning for duplicate detection: it allows the user to avoid all of these issues and simplifies usability as well. There is no complicated setup process and no rules to manage, and machine learning takes a smarter, more scalable approach to deduplication. More about this will be discussed in the next section.

What separates machine learning from rule-based deduplication is that it recreates the human thought process. In other words, just as you would look at two records and label them as duplicates (or not), machine learning applies the same logic. We train the machine learning algorithms through various string metrics that calculate whether or not two field entries are the same. When the algorithms identify potentially duplicate records, they will present you with a side-by-side view of the master record and the suspected duplicate, either inside DataGroomr or Salesforce. You will then be able to compare the field values in the master record with those in the suspected duplicate record; fields whose values differ are highlighted in red. Over time, as you label records as either unique or duplicates, the system will “learn” from your choices and apply the same logic to subsequent records.
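To make the idea of a string metric more concrete, here is a minimal Python sketch of how two records could be scored field by field. It is an illustration only, not DataGroomr's actual implementation: the post does not say which metrics are used, so Python's standard `difflib.SequenceMatcher` stands in for them, and the field names, sample records, and the `0.85` threshold are all hypothetical.

```python
from difflib import SequenceMatcher


def field_similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity score, ignoring case and extra whitespace."""
    a_norm = " ".join(a.lower().split())
    b_norm = " ".join(b.lower().split())
    return SequenceMatcher(None, a_norm, b_norm).ratio()


def compare_records(master: dict, candidate: dict) -> dict:
    """Score every field that both records have in common."""
    shared = master.keys() & candidate.keys()
    return {f: field_similarity(master[f], candidate[f]) for f in shared}


def differing_fields(master: dict, candidate: dict) -> list:
    """Fields whose values are not identical after normalization
    (the ones a side-by-side view would highlight)."""
    scores = compare_records(master, candidate)
    return sorted(f for f, s in scores.items() if s < 1.0)


def looks_like_duplicate(master: dict, candidate: dict,
                         threshold: float = 0.85) -> bool:
    """Flag a pair when the average field score reaches the
    (hypothetical) threshold."""
    scores = compare_records(master, candidate)
    return bool(scores) and sum(scores.values()) / len(scores) >= threshold


# Hypothetical sample records
master = {"name": "Jon Smith", "email": "jon.smith@example.com",
          "company": "Acme Inc"}
suspect = {"name": "John Smith", "email": "jon.smith@example.com",
           "company": "ACME Inc."}
unrelated = {"name": "Maria Garcia", "email": "m.garcia@example.org",
             "company": "Globex"}

print(differing_fields(master, suspect))       # ['company', 'name']
print(looks_like_duplicate(master, suspect))   # True
print(looks_like_duplicate(master, unrelated)) # False
```

In this sketch the threshold is fixed by hand. A learning system of the kind described above would instead adjust how much each field counts based on the pairs you label as duplicates or unique.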