Return of GEDCOM Import

Started by Mike Stangel on Friday, February 22, 2019

Showing 61-90 of 191 posts

I'm working very hard to merge all of the duplicates by the GEDCOM uploads. I've merged 20 thousand in the past two months. It's really altruistic and noble of me to do this work without pay, but my life was saved from a car accident and it's the least I can do to pay it forward. Stop complaining about duplication and merge, goddamnit!

Private we're grateful for your hard work, but you do not need to take all of this on yourself. Are you seeing more than 3 generations of duplication from GEDCOM imports? If so, that doesn't sound like the way it should work. (The initial import will do 5 generations of ancestors plus their siblings... after that, it should import 3 generations of ancestors+siblings at a time, but only if there are no nearby tree matches). Is the problem that our tree matching algorithm isn't spotting the duplicates? Or it is, but somehow the GEDCOM importer is blowing past those and continuing to import more than 3 generations above a match?
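
To make the rule in the post above concrete, here is a minimal sketch of the batching logic as described: 5 generations on the first pass, then 3 generations at a time, and only while no nearby tree match has been found. This is not Geni's importer code. `ImportState`, `next_batch_size` and the constants are hypothetical names invented for illustration.

```python
from dataclasses import dataclass

# Numbers taken from the post above; everything else is an assumption.
INITIAL_GENERATIONS = 5   # first pass: 5 generations of ancestors plus siblings
FOLLOWUP_GENERATIONS = 3  # later passes: 3 generations at a time


@dataclass
class ImportState:
    """Hypothetical bookkeeping for one GEDCOM import."""
    generations_imported: int = 0
    nearby_tree_match: bool = False  # set once tree matching finds a nearby existing profile


def next_batch_size(state: ImportState) -> int:
    """Return how many more generations the import may bring in next."""
    if state.generations_imported == 0:
        # Initial import always covers the first 5 generations.
        return INITIAL_GENERATIONS
    if state.nearby_tree_match:
        # A nearby match means the world tree already continues from here.
        return 0
    return FOLLOWUP_GENERATIONS
```

Under this reading, an import that keeps going for more than 3 generations above a match means either the match was never detected or the stop condition was not applied, which are the two cases the post asks about.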

Please, Jonathan, stop merging. The main part of these duplicates should instead be marked as duplicates and deleted.

Remi makes a good point... if there's a duplicate in a lower generation and then a few generations of duplicates above it, just cut loose the upper generations, mark them fictional so they can't be merged and then send me a link to one of the profiles. I'll delete the isolated dups.

Mike Stangel From what I’ve seen there might be several scenarios to explain Gedcom imports with dates before 1700 (I’m being arbitrary on that date).

1. Failure to detect matching Geni profiles

2. Match detected but import continues

3. Import begun from an “uptree” profile that is not the member (I don’t think I’ve seen this for a fact however)

I add my voice to the request to “not” merge duplicate trees. With analysis, “new to Geni” profiles can be merged at ONE match, and the rest isolated. Remi is getting this done more efficiently than I am, so his advice is best.

Great that it has been fixed; the opportunity to merge the two trees has been removed.

Jon Trondson Trondsøn Benkestokk
Added by: Kim René Jensen on January 31, 2016
Managed by: Kim René Jensen

Jon Trondson Trondsøn Benkestokk
Added by: Kelvin Olav Jonck on May 27, 2019
Managed by: Kelvin Olav Jonck

...is Geni still stating that GEDCOM has no issues!?

quote from above "I'm working very hard to merge all of the duplicates by the GEDCOM uploads. I've merged 20 thousand in the past two months"

Take for example the profile of Gerson Sagalowich which is an ancestor's brother who I added to Geni over 11 years ago. When I noticed it was duplicated due to a GEDCOM upload, instead of complaining and isolating the merge and marking tree for deletion and pissing off the clueless uploader, what did I do? I merged and merged and kept those ugly automatic "GEDCOM Source" biographies on their profiles and now I am sure the uploader is pleased. He is now a manager of all these profiles which I uploaded 11 years ago and I am fine with that because this is a collaborative web site.

I had one about 2 months ago and I only deleted about 5 or 6 profiles in that GEDCOM upload and merged the rest of it in.

I'm happy to have the owner of that GEDCOM upload as a manager of the remaining profiles in his upload. I also got him to stop the GEDCOM upload, but never even thought of having his upload removed.

You are describing different scenarios, and therefore different solutions, from what I’m encountering.

This is an example of where we do not want an upload to merge in Colonial America (before 1776).

Thomas Lawthrop/Lowthroppe

- Notice close to 100 managers and on Geni since 2007.
- (possibly) my 11th great Grandfather, certainly many others
- management granted by request (manager options, request management, will route to me as primary manager)

Instead -

1. Find the lowest point (most recent) of the matching tree
2. Merge the one match there
3. Disconnect the upper tree, mark Fictional
4. Send to Mike as he described

If unsure on Colonial America, post in the project discussion, “Need help merging Colonial Americans?”

https://www.geni.com/discussions/185015?msg=1237704

Bolesław III Wrymouth

GEDCOM Import:

Boleslaw III (Krzywousty, den snedmynte) Piast av Polen

Private I always contact the GEDCOM uploader and explain why it is not necessary to import profiles already on Geni. I also pinpoint where their GEDCOM import can be attached to the world tree by tracing their imported tree and comparing it with the world tree. In my experience it is a lot easier to do it this way than merging the duplicates of the GEDCOM import.

Usually, when it is explained that users get the same access to dead public profiles whether they are an administrator or not, most of them will understand. So far I have only got positive feedback on this explanation.

So my advice to you, Jonathan, is to stop this merging. It will make your life easier, and it will put a lot less demand on your fellow curators, since most of us won't merge thousands of duplicate profiles, but will instead separate the duplicate tree and mark it for deletion, after explaining it to the GEDCOM uploader.

I haven't tried this new gedcom import (my import was done many years ago, and is merged and fine), but how would I compare a gedcom file with the geni.com tree before uploading, to know what to upload and not to upload?

I've also noticed a lot of duplicates lately, but it's also a pity if uploaded trees are just discarded. They might still contain some additional info... And it is a collaborative tree...

Kenneth Ekman,

There are some tools that are able to compare GEDCOM files. So in theory it would be possible to do a GEDCOM download from Geni and compare that with another GEDCOM to find the duplicates. You will then have to remove the duplicates (can still be tricky if the tool does not do that for you) and upload a GEDCOM with only profiles that are not in Geni already.
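
As a rough sketch of that first approach, the Python below reads two GEDCOM texts, keys each person on a normalized name plus birth year, and splits one file into "already present" and "new" records. It assumes standard GEDCOM lines (`0 @I1@ INDI`, `1 NAME`, `1 BIRT` / `2 DATE`). The function names and the matching rule are illustrative assumptions, far cruder than what a real comparison tool or Geni's own tree matching would do.

```python
import re


def parse_individuals(gedcom_text: str) -> dict:
    """Collect NAME and birth DATE for each INDI record, keyed by xref id."""
    people = {}
    current = None    # record currently being read, or None outside INDI records
    in_birth = False  # True while inside a level-1 BIRT structure
    for raw in gedcom_text.splitlines():
        parts = raw.strip().split(" ", 2)
        if len(parts) < 2 or not parts[0].isdigit():
            continue
        level, tag = int(parts[0]), parts[1]
        value = parts[2] if len(parts) > 2 else ""
        if level == 0:
            # Level-0 records look like "0 @I1@ INDI": the xref sits where the tag usually is.
            current = people.setdefault(tag, {}) if value == "INDI" else None
            in_birth = False
        elif current is not None:
            if level == 1:
                in_birth = tag == "BIRT"
                if tag == "NAME" and "name" not in current:
                    current["name"] = value
            elif level == 2 and in_birth and tag == "DATE":
                current["birth"] = value
    return people


def match_key(person: dict) -> tuple:
    """Crude match key (an assumption): lowercased name plus 4-digit birth year."""
    name = re.sub(r"\s+", " ", person.get("name", "").replace("/", " ")).strip().lower()
    year = re.search(r"\b(\d{4})\b", person.get("birth", ""))
    return name, (year.group(1) if year else "")


def split_new_and_duplicate(mine: dict, existing: dict) -> tuple:
    """Return (new_xrefs, duplicate_xrefs) for `mine` compared against `existing`."""
    existing_keys = {match_key(p) for p in existing.values()}
    new, duplicates = [], []
    for xref, person in mine.items():
        key = match_key(person)
        # Nameless records are never treated as duplicates.
        (duplicates if key[0] and key in existing_keys else new).append(xref)
    return new, duplicates
```

Running `split_new_and_duplicate(parse_individuals(my_text), parse_individuals(geni_export_text))` gives the ids to drop before uploading. Anything flagged still needs a human check, since identical names and years do not prove two records are the same person.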

Another way would be using a local program that can find matches with MyHeritage and Geni and import a GEDCOM in such a program. By excluding profiles with a Geni match you could export a new GEDCOM from it that has no Geni matches and import that into Geni.

Still another method (probably the easiest way) would be to create a free MyHeritage account and upload a GEDCOM there. That will find the matches between the Geni and MyHeritage trees. If there are many matches, the easiest way to get missing data into Geni would be using SmartCopy (https://www.geni.com/projects/SmartCopy/18783).
Only if there are whole MyHeritage branches that are not on Geni, it may be easier to export those missing branches from MyHeritage to separate GEDCOM files and import those in Geni. Even then SmartCopy can be used to get any missing data from MyHeritage to Geni for profiles that are already on Geni.

A bit more on the last method you can find in https://www.geni.com/projects/Verbinden-met-de-wereldstamboom/17983

Private - You said "I've merged 20 thousand in the past two months" -- were all of these Public Profiles?
If not, did you check if the Private profiles had active managers, and if so, wait for them to respond -- or did you instead just focus on seeing that they matched and merging as fast as you could?

I'll undo all of them over the next few months.

I'll start by undoing all the David Sealtiel profiles. Is everyone cool with that? It may take 7 months. I'll post my progress periodically of the reduplication and zombification.

Private,

Don't do that.

Private I am sure you have a lot of time, but wait a minute before you do something that cost a lot of time.

Merging one or two profiles doesn't take a lot of time.
Mostly, you don't know how many to expect... :-)

Waiting for another manager to do the merge might never give any results.... Maybe they're no longer on the site...

Merging is good, don't try to put down those who do good work....

The most important thing GENI can do right now is implement a D-M Soundex into their search engine. Every different spelling of the same name needs to appear in searches.

They did once use regular Soundex, and it worked really badly, causing a lot of bad merge suggestions, since it targets English names only. I would not expect "Jewish Soundex" and "Eastern European Soundex" to work better, since the target is even narrower.

Geni could use information from merges that were done (and were not reversed) to learn what kinds of names, in which period and in what region, may indicate possible merges. That kind of information could also be used to enhance the search.

Jonathan, you don't need to undo the merges you have done. That will only result in several months of unnecessary work.

Thanks for the history Bjorn. What I would say is don't use the Soundex for merge suggestions, but only when a researcher is using the search box to try to find someone (and give the researcher the option of using the Soundex or not). There is a real problem with the way the search is working now. For example, if I put in the surname Johnson, I don't get Jonson or Johnsen. That means I have to run an individual search on every spelling variation I can think of. This is not good.
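
To illustrate the Johnson / Jonson / Johnsen point, here is a minimal sketch of an opt-in phonetic surname search. It uses the classic American Soundex, not the Daitch-Mokotoff variant requested earlier in the thread, because the classic rules are short enough to show in full. `find_variants` is a hypothetical helper, not a Geni feature.

```python
# Classic American Soundex letter groups; vowels, H, W and Y carry no code.
SOUNDEX_CODES = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}


def soundex(name: str) -> str:
    """Return the 4-character American Soundex code, or '' if no A-Z letters."""
    # Letters outside A-Z (for example 'ø') are simply dropped in this sketch.
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    code = letters[0]
    previous = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = SOUNDEX_CODES.get(ch, "")
        if digit and digit != previous:
            code += digit
        if ch not in "HW":
            # H and W do not separate letters that share a code; vowels do.
            previous = digit
    return (code + "000")[:4]


def find_variants(query: str, surnames: list) -> list:
    """Return surnames whose Soundex code equals that of the query."""
    target = soundex(query)
    return [s for s in surnames if soundex(s) == target]
```

With this, `find_variants("Johnson", ["Jonson", "Johnsen", "Nilsson"])` returns the first two names, since all three spellings map to `J525`. The same code also matches Jensen, which is a small example of the over-matching described above and a reason to keep such a search optional and out of merge suggestions.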

Yes, Geni even supported wildcards in searches in the early stages. I miss it a bit, but as I said, that also caused a lot of problems with people who did not realize that identical names do not necessarily mean the same person. Hopefully Geni will adopt some of the search engines from MyHeritage.

A lot of people use the Google or MyHeritage search engines for the Geni database. It would be wise if Geni would use the search engine of MyHeritage. It would prevent double import of profiles. Alternatively, at the moment a new profile is added by GEDCOM or some other way, a search could check whether that profile is already in the Geni database, and if so the manager would get a message.

I found many (100+) samples of what are most likely Ancestry/GEDCOM uploads in the Swedish tree, reaching back well before 1800. We have by now fixed several of them.

Sample 1: Anna Catharina Catarina Svinhufvud i Västergötland (Svinhufvud af Västergötland), + AD02 40G fdna

She was born in 1736, but this profile is fixed now (see revisions). It was uploaded by Johan Fält, and it seems that he has now also started to change his uploads, which got a strange numbering code. We are still working on fixing these uploads, which is a tremendous amount of work. He uploaded duplicates of many known nobles in Sweden, changed their names and added suffixes with a numbering code. See revisions.

Johan Fält

Anna Catharina Svinhufvud i Västergötland

Sample 2: Niklas Lovefall added several (I don’t know how many) profiles with birth names and surnames like Iii:3 – these are not only suffixes, these are “real” names.

Niklas Lovefall

https://www.geni.com/family-tree/index/6000000077316164026

Erik Nilsson

Erik Nilsson was born in 1728.

Sample 3: Molly Mills has added a lot of coded samples.

https://www.geni.com/family-tree/index/6000000090605054651

https://www.geni.com/family-tree/index/6000000007511657023?highlight_id=6000000009383779293&resolve=6000000009383779293#6000000009383779293

Name: Nils
Middle name: Andersson
Surname: Ilii:3
Birth Surname: Iii:3
Display name: Nils Iii:3

Margareta Warg

This is Margareta Warg with the code 5 and a conflict. Margareta Warg was born in 1738, and many profiles have a suffix code.

We have started to clean up the profiles involved, but we do not know how many uploads there have been and, even worse, how many will come. Instead of researching, we spend our time cleaning up.

“There are some restrictions, however -- you cannot import a GEDCOM for someone born before 1800, and all GEDCOM imports will stop once they go back to 1600. Read the full details here”

This is the claim of the gedcom-upload.
Does this really work? The profiles I mentioned are from the 18th century.

Furthermore, I would think about the following changes (I guess you have already done this).

(1) Uploads will generate doubles and conflicts. A non-pro cannot merge and resolve conflicts.

(2) I frequently use MH uploads with original household documents and well done information of MH users. MH uploads allow you: main profile, husband, parents, siblings and children- that’s it and it works quite fine – before the upload every single field can be changed if necessary. A simple overwrite does not occur because it will result in a data conflict or a profile with the name “no name” and now I can fix the error.

(3) If a GEDCOM import shows a Geni match, the uploader can obviously choose to use her/his own information, which in sample 1 resulted in a lot of established names of Swedish nobility being changed and degraded.

Only a curator can, I guess, open the list of profiles these uploaders manage and find suspicious profiles. Everybody else is trying to find these profiles in a jungle or in the activity list of the uploader.
The suspected uploaders I mentioned have added about 6000 profiles within 2 to 3 years.
Not all of them have a code or something else attached. Maybe 200+.

Did you contact the uploader?

We can't have some people working against "the common good"...

Also, is it possible to revert a gedcom upload?
