Dedupe Records During Job Run

Justin Kuehlthau
Administrator - Author
Post count: 24

"Native deduping" could mean a lot of things. The simplest solution I can think of would be to designate a field, or a concatenation of fields, as the key for detecting duplicate records: say Account Name, or Contact First Name & " " & Contact Last Name. Starfish would then write each name to the local database, and when it comes upon a duplicate, it would mark a field in that record as "Possible Duplicate". We would also need to go back and mark the first record that we matched on as a Possible Duplicate. The matching algorithm could start as a simple exact match and later be extended with fuzzy logic, for example treating "Inc." and "Incorporated" as the same.

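To make the idea concrete, here is a minimal sketch in Python of the exact-match approach with a small normalization step. It is not Starfish code: the function names, the "possible_duplicate" field, and the suffix map are all made up for illustration, and an in-memory dict stands in for the local database.

```python
import re

# Hypothetical alias map for illustration; extend it for your own data.
SUFFIX_ALIASES = {
    "inc": "incorporated",
    "corp": "corporation",
    "co": "company",
    "ltd": "limited",
}


def normalize(name):
    """Lowercase, replace punctuation with spaces, and expand known suffixes."""
    words = re.sub(r"[^\w\s]", " ", name.lower()).split()
    return " ".join(SUFFIX_ALIASES.get(word, word) for word in words)


def flag_possible_duplicates(records, key_fields):
    """Set rec["possible_duplicate"] on every record whose key repeats.

    records    -- list of dicts, one per source row
    key_fields -- field names concatenated (space-separated) into the match key
    """
    first_seen = {}  # normalized key -> index of the first record with that key
    for index, rec in enumerate(records):
        rec.setdefault("possible_duplicate", False)
        key = normalize(" ".join(str(rec.get(f, "")) for f in key_fields))
        if not key:
            continue  # an empty key is not evidence of a duplicate
        if key in first_seen:
            rec["possible_duplicate"] = True
            # Go back and mark the first record we matched on as well.
            records[first_seen[key]]["possible_duplicate"] = True
        else:
            first_seen[key] = index
    return records


accounts = [
    {"Account Name": "Acme Inc."},
    {"Account Name": "Globex"},
    {"Account Name": "ACME Incorporated"},
]
flag_possible_duplicates(accounts, ["Account Name"])
flags = [rec["possible_duplicate"] for rec in accounts]
```

Here the first and third accounts normalize to the same key, so both get flagged, while "Globex" does not. For contacts you would pass two key fields, for example ["Contact First Name", "Contact Last Name"].
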
The other option I thought of was to leverage the free, open-source tool Elasticsearch (ES) for deduping. We would feed the name field into ES to be indexed, have ES detect duplicates, and then update the detected duplicates with a Possible Duplicate flag. There is a lot of information on ES online. One page I found was:

Comments (1)

Justin Kuehlthau

You can now do this with the new SQLite Xref functionality:

I put a simple sample script on the wiki:
