Dedupe Records During Job Run

ATTENTION: This forum is no longer active. Please navigate to our new support site at https://support.starfishetl.com/
Justin Kuehlthau
Administrator - Author

"Native Deduping" could mean a lot of things. The simplest solution I can think of would be to designate a field, or a concatenation of fields, as the key for finding duplicate records: for example, Account Name, or Contact First Name & " " & Contact Last Name. Starfish would then write each key value to the local database, and whenever it came upon a duplicate, it would mark a field in that record as "Possible Duplicate". We would also need to go back and mark the first record it matched as a Possible Duplicate. The matching algorithm could start as a simple exact match and later be extended with fuzzy logic, for example treating "Inc." and "Incorporated" as equivalent.
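A minimal sketch of the key-and-flag idea above, written in plain Python rather than as a Starfish job. A dict stands in for the local database, and the field name `DupFlag`, the helper names, and the small suffix-alias table used for the "fuzzy" step are all hypothetical, chosen only for illustration.

```python
# Hypothetical suffix aliases: a first step from exact matching toward fuzzy
# matching, so "Acme Inc." and "Acme Incorporated" produce the same key.
ALIASES = {"inc.": "inc", "incorporated": "inc", "corp.": "corp", "corporation": "corp"}


def match_key(record, key_fields, fuzzy=False):
    """Concatenate the chosen fields with a space, like First Name & " " & Last Name."""
    key = " ".join(str(record[field]) for field in key_fields)
    if fuzzy:
        # Lower-case, collapse whitespace, and map known suffix variants.
        key = " ".join(ALIASES.get(word, word) for word in key.lower().split())
    return key


def flag_duplicates(records, key_fields, fuzzy=False, flag_field="DupFlag"):
    """Set flag_field to "Possible Duplicate" on every record whose key repeats.

    The dict maps each key to the index of the first record that produced it,
    so that first record can be flagged too once a later match turns up.
    """
    first_seen = {}
    for idx, record in enumerate(records):
        record[flag_field] = ""
        key = match_key(record, key_fields, fuzzy)
        if key in first_seen:
            record[flag_field] = "Possible Duplicate"
            # Go back and flag the earlier record this one matched.
            records[first_seen[key]][flag_field] = "Possible Duplicate"
        else:
            first_seen[key] = idx
    return records


accounts = [{"Name": "Acme Inc."}, {"Name": "Globex"}, {"Name": "Acme Incorporated"}]
flag_duplicates(accounts, ["Name"], fuzzy=True)
print([a["DupFlag"] for a in accounts])  # ['Possible Duplicate', '', 'Possible Duplicate']
```

With `fuzzy=False` the same three accounts produce no flags, because "Acme Inc." and "Acme Incorporated" are not an exact match. The alias table is only a placeholder for real fuzzy matching.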

The other option I thought of was to leverage the free, open source tool Elasticsearch (ES) for deduping. We would feed the name field into ES to be indexed, have ES detect duplicates, and then update the detected duplicates with a Possible Duplicate flag. There is a lot of information on ES online; one page I found is http://zmievski.org/2011/03/duplicates-detection-with-elasticsearch
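Elasticsearch needs a running server, so the sketch below does not call it. It is a small in-process stand-in for the same idea. An inverted index over name tokens surfaces candidate pairs, and a Jaccard token-overlap score decides which pairs count as possible duplicates. The scoring is a swapped-in similarity measure, not Elasticsearch's own relevance scoring. The function names and the 0.5 threshold are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def tokenize(name):
    """Lower-case word tokens, roughly what a search engine's analyzer produces."""
    return set(name.lower().replace(",", " ").replace(".", " ").split())


def candidate_duplicates(names, threshold=0.5):
    """Return (i, j, score) for name pairs whose token Jaccard score >= threshold.

    The inverted index (token -> positions of names containing it) limits the
    comparison to pairs sharing at least one token instead of all n^2 pairs.
    """
    token_sets = [tokenize(n) for n in names]
    index = defaultdict(set)
    for i, toks in enumerate(token_sets):
        for tok in toks:
            index[tok].add(i)
    pairs = set()
    for ids in index.values():
        pairs.update(combinations(sorted(ids), 2))
    results = []
    for i, j in sorted(pairs):
        a, b = token_sets[i], token_sets[j]
        score = len(a & b) / len(a | b)  # shared tokens / all distinct tokens
        if score >= threshold:
            results.append((i, j, score))
    return results


names = ["Acme Widgets, Inc.", "ACME Widgets", "Globex Corporation"]
matches = candidate_duplicates(names)
print([(i, j, round(s, 2)) for i, j, s in matches])  # [(0, 1, 0.67)]
# Both members of each matching pair would get the Possible Duplicate flag.
flagged = {k for i, j, _ in matches for k in (i, j)}
```

A real search engine adds analyzers, stemming, and fuzzy term matching on top of this. The inverted index is what keeps the comparison cheap as the record count grows.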

Justin Kuehlthau
Administrator - Author

You can now do this with the new SQLite Xref functionality: http://wiki.starfishetl.com/index.php/Cross-reference_(Xref)#SQLite_Xref

I put a simple sample script on the wiki: http://wiki.starfishetl.com/index.php/Check_for_duplicates_using_xref
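The wiki script is not reproduced here, and the snippet below assumes nothing about Starfish's actual Xref functions. It is a rough sketch of the underlying idea only, using Python's built-in sqlite3 module. A cross-reference table maps each match key to the ID of the first record seen with it, and each later record looks up its key to find the record it duplicates. The table name `dedupe_xref` and both helper functions are hypothetical.

```python
import sqlite3


def open_xref(path=":memory:"):
    """Open (or create) a hypothetical match-key -> record-ID cross-reference table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dedupe_xref ("
        "match_key TEXT PRIMARY KEY, record_id TEXT NOT NULL)"
    )
    return conn


def check_duplicate(conn, match_key, record_id):
    """Return the ID of an earlier record with the same key, or None.

    INSERT OR IGNORE stores only the first record seen for each key; the
    PRIMARY KEY constraint makes later inserts for that key no-ops.
    """
    conn.execute(
        "INSERT OR IGNORE INTO dedupe_xref (match_key, record_id) VALUES (?, ?)",
        (match_key, record_id),
    )
    conn.commit()
    (stored_id,) = conn.execute(
        "SELECT record_id FROM dedupe_xref WHERE match_key = ?", (match_key,)
    ).fetchone()
    # Re-processing the original record is not a duplicate of itself.
    return None if stored_id == record_id else stored_id


xref = open_xref()  # pass a file path instead to keep keys between job runs
print(check_duplicate(xref, "jane doe", "C-001"))  # None
print(check_duplicate(xref, "jane doe", "C-002"))  # C-001
```

When a non-None ID comes back, the job can flag both the current record and the returned one as Possible Duplicates.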
