Data Conflicts & Search results improvements

Started by Peter Rohel (c) on Thursday, March 5, 2015

Showing 1-30 of 35 posts
3/5/2015 at 7:25 AM

Mike Stangel - would it be feasible to make Program enhancements for a) Reduction of Data Conflicts b) improve Search results & Reduce data entry?

a) 1st Letter data entry Default = Uppercase in Name(s) fields. *allow users to change to Lowercase (for van XX, von XX, de XX, etc.)

b) Name Variations = Added by app in the AKA field (from tables), e.g.:

Maria = adds to aka: Marie, Mary, etc.
Josef = adds to aka: Joseph, Joe, Joey, etc.
Susan = adds to aka: Susana, Susanna, Susannah, Suzana, Suzanna, Suzannah, Susann, Suzan, Suzann, Susanne, Suzanne, Suzette, Sue, Suse, Susie, Sanna, Suzi, Suzy, Suzie, Suki
etc........

* Tables would include Global given name variations / translations (found on Wikipedia, etc.)

Private User
3/5/2015 at 7:43 AM

ad a.) don't forget special Dutch names like 's Jacob and de l'Ordre, etcetera

ad b.)

Wouldn't it be better to define a new field for 'VOORVOEGSELS', since Belgian users prefer them in upper case, etc.?

Private User
3/5/2015 at 7:46 AM
Private User
3/5/2015 at 11:27 PM

Jeanette please clarify a "VOORVOEGSEL"

I'm not sure what the intention of all the Name Variations in the AKA are for Peter? I like to use that field for names my profile actually used..

I would prefer the search function to have that logic built in and find the Name variations automatically (This is how MyHeritage's SuperSearch works I believe, and other Sites have a button to switch on and off for 'Name variations')

Private User
3/5/2015 at 11:42 PM

A voorvoegsel in English? Oh, I'd have to look for a dictionary...
Not exactly a prefix: it is part of the full surname but not the main surname itself, e.g.
* van der
* de l'
* Van 't
* der
* etcetera....
Is that an explanation you can work with?

Private User
3/5/2015 at 11:47 PM

Private User

It's rather complicated for foreign users of Dutch and/or Belgian surnames: in our country we do NOT like the 'voorvoegsel' in upper case and we put the surname in CAPITALS, but Belgian people always alphabetize their 'Van ABC' names under V instead of A.

Private User
3/6/2015 at 12:09 AM

Aah OK so the Voorvoegsel is the 'van' and the 'de' etc.. Thanks Jeanette

Private User
3/6/2015 at 7:55 AM

I think this falls partly under the "Language" module - it may be a nuisance keeping track of different name variations in different languages, but less so than trying to read names in a dozen or more different languages, some of which do not use the Roman alphabet.

(I can manage to muddle through several European languages - and I realize that I'm quite unusual for a US citizen - but Greek, Cyrillic, and so forth just lose me completely.)

Private User
3/6/2015 at 8:24 AM

I think it's called preposition, or prefix, or article, or...
maybe "son OF a " also could be some kind of voorvoegsel?
Have wondered if it was the Germans who started it,
a lot of Von there, but the Dutch have it as well.
In Sweden mostly reserved for knighted people, in three variations,
"av" "af" "von".

3/6/2015 at 12:29 PM

In a) (1st letter Uppercase) - *allow users to change the Default 1st letter to Lowercase. "..enforce case sensitivity in data entry." - thus Reducing user time & improving Geni's quality.

In b) starting with OR building only the Roman alphabet Given name variation Tables - would result in tremendous time saving

List top 5 aka names in the AKA field OR a new field, since current Naming fields are still being debated. After the 5 names - provide a "see more" hyperlink.

This way, only users interested in viewing All the AKA's - would see them, while many Given names "could" fall in the top 5.

The tables could be "maintained" by Curators or perhaps, even Pro users - thus Not wasting Geni technical resources. Table creation is mainly a "One time" function - utilized over & over again by the app.

1) could the multi language modules work - together with the Tables?

2) or - are users to enter each of the Given name variations - in the Different languages?

*all ideas welcomed - just trying to get the system app to do more of the work (repetitive, error prone, non-existent) VS. being done Manually by users.

3/6/2015 at 1:05 PM

Back to Peter Rohel's suggestion. Doesn't it make sense to have synonyms be part of the search algorithm rather than having the system add them to aka? The aka should only include name variants that apply directly to the specific person.

For example, you might have gone by Pete, but not Petros or Petr. The algorithm should know that they are the same name, but the aka or language variants of your name, should only include names that apply specifically to you.
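
A minimal sketch of the point made here: keep the variant table inside the search step and leave each profile's aka untouched. The table contents, the profile layout, and the names NAME_VARIANTS, expand_query and search are all assumptions for illustration, not anything Geni or MyHeritage actually uses.

```python
# Hypothetical variant table; a real one would be far larger and curated.
NAME_VARIANTS = [
    {"peter", "pete", "petr", "petros"},
    {"josef", "joseph", "joe", "joey"},
    {"maria", "marie", "mary"},
]


def build_index(groups):
    """Map each lowercased name to the full set of names in its group."""
    index = {}
    for group in groups:
        for name in group:
            index[name] = group
    return index


VARIANT_INDEX = build_index(NAME_VARIANTS)


def expand_query(given_name):
    """Return the search terms for a name: the name itself plus known variants."""
    key = given_name.strip().casefold()
    return VARIANT_INDEX.get(key, {key})


def search(profiles, given_name):
    """Return profiles whose first name equals the query or one of its variants."""
    terms = expand_query(given_name)
    return [p for p in profiles if p["first_name"].casefold() in terms]


# Assumed profile layout: the aka list stays reserved for names the person used.
profiles = [
    {"first_name": "Pete", "aka": []},
    {"first_name": "Petros", "aka": []},
    {"first_name": "Maria", "aka": []},
]
matches = search(profiles, "Peter")
```

Searching for "Peter" finds the Pete and Petros profiles here, and nothing is written into anyone's aka field.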

3/6/2015 at 2:02 PM

a) My first reaction to this is that the data-conflict resolution should just be indifferent to case (as now the first and middle name fields are, but not maiden and last), however it does bug me to see lazily-entered profiles such as "mike stangel" -- I'll play around with some JavaScript to capitalize the first character in each field only once, and see if it's not too annoying to override when needed.

b) This absolutely belongs in the search/matching system and not in every single aka field. We will get there, with MyHeritage's help.
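
The two ideas in (a) can be sketched roughly as follows. This is only an illustration in Python, not the JavaScript mentioned above; NameField, on_input and names_conflict are hypothetical names, and the real entry form and conflict check may work quite differently.

```python
def names_conflict(a, b):
    """Report a conflict only when two name values differ by more than case."""
    return a.strip().casefold() != b.strip().casefold()


class NameField:
    """Hypothetical entry field that applies the upper-case default only once."""

    def __init__(self):
        self.value = ""
        self._auto_capitalized = False

    def on_input(self, text):
        # The first time non-empty text arrives, capitalize its first character.
        # After that, keep whatever the user types, so a correction back to
        # lower case (for "van ...", "de ...") is not overridden again.
        if text and not self._auto_capitalized:
            text = text[:1].upper() + text[1:]
            self._auto_capitalized = True
        self.value = text
        return self.value


field = NameField()
first = field.on_input("v")              # default applied once
second = field.on_input("van Staveren")  # user's lower case is kept
```

With this comparison, "mike stangel" and "Mike Stangel" would not raise a data conflict, while Josef and Joseph still would.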

Private User
3/6/2015 at 2:18 PM

I typically use the AKA field for nicknames, if someone appears in records with multiple spellings or versions of their name (names tend to change over time), or if someone changed countries and therefore changed their name from say Pierre in France to Peter in the US.

To use the AKA for every possible variation of a name sounds like a big mess ;-)

3/6/2015 at 2:44 PM

Wendi, I agree 100% with you.

Mike Stangel, I agree with you. I think the capitalization can become annoying. A rather better solution will probably be if "prefix(es)", like van, van der, du, etc. can be recognized by the software and the first letter after that capitalized. If this application only applies to the SA tree, and not the global tree, I think we are back to square one.

I'd then rather prefer the status quo. Going back whenever a "prefix" first letter must be changed to lower case, will most certainly have a negative impact on capturing speed.
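
For what it's worth, the "recognize the prefix and capitalize the first letter after it" idea could look something like this. The prefix list is a short, made-up sample and capitalize_surname is a hypothetical helper; a real list would need input from users in each country.

```python
# Hypothetical sample of prefixes, sorted longest first so that "van der"
# is tried before "van".
PREFIXES = sorted(
    ["van der", "van 't", "de l'", "van", "von", "der", "de", "du"],
    key=len,
    reverse=True,
)


def capitalize_surname(raw):
    """Capitalize the first letter after a known prefix, else the first letter."""
    text = raw.strip()
    for prefix in PREFIXES:
        # A prefix ending in an apostrophe attaches directly to the name;
        # every other prefix must be followed by a space.
        head = prefix if prefix.endswith("'") else prefix + " "
        if len(text) > len(head) and text[:len(head)].lower() == head:
            rest = text[len(head):]
            # The prefix is left exactly as typed ("van" or "Van").
            return text[:len(head)] + rest[:1].upper() + rest[1:]
    return text[:1].upper() + text[1:]
```

Typing "van der merwe" would give "van der Merwe", and "Van Buren" would be left as entered, since the prefix itself is never re-cased.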

Private User
3/7/2015 at 1:14 AM

Mike Stangel

Capitals for f.i. Hein van Staveren will annoy us enormously ....! Be SURE of that !

3/7/2015 at 8:01 AM

I'm worried that any attempt to "fix" names automatically could have unintended consequences.

First, there are geographical differences. Names with von, van, van der, and de might be lower case in the original country but are often capitalized when some of the family moves to America. Members of the New Amsterdam project (and related projects) put a lot of work into getting it right.

Second, there are differences across time. Medieval nobles in England and Scotland don't capitalize de, but Victorians who added the particle back into their names do capitalize it. It's very frustrating when people mistakenly capitalize the de for medieval people, but it's not the kind of thing that can be fixed easily by a system sweep.

3/7/2015 at 9:04 AM

Mike – thank you for giving the 1st letter Uppercase some consideration. Perhaps you have (or can create) some stats on the problem.

Justin - users will be able to type "those names" in the format they want, as they do now. However, the majority of names on Geni, etc. use an uppercase 1st letter for given names & surnames, and those Geni should standardise.

Hopefully, mixed Surnames starting with Lowercase, Double Surnames – can have special routines. But if NOT - manually changing the 1st letter back to Lowercase during entry & continuing – will hopefully Not cause a major delay or irritation for those names.

My word processor & email new paragraphs - Default the 1st letter to Uppercase. When I change it to Lowercase and continue typing - it remains Lowercase. Only Geni Date utilizes “ease of use fields” with: a) drop downs b) field editing, formatting. When I paste: 19/2/1989 (European format) - I manually change it to: 2/19/1989 (my format choice) - otherwise the result would be: February, 1989.

* I am aware Tree view Preferences allow 3 formats - but it is faster for me to make the date change on the Entry screen vs. changing the formatting back & forth in the Preferences

*A few Merge & Data conflict tests:
test test DIAMANT = merge test Diamant on top of - Test Diamant (no Data conflict)
test test DIAMANT = merge test Diamant on top of - Test DIAMANT (data conflict = DIAMANT) http://www.geni.com/merge/resolve/6000000031984501294
test test DIAMANT = merge TEST DIAMANT – on top of test Diamant (data conflict = DIAMANT) http://www.geni.com/merge/resolve/6000000031984501294
test test DIAMANT = merge test DIAMANT on top of - test Diamant (data conflict = DIAMANT) http://www.geni.com/merge/resolve/6000000031984278600
Josef Josef DIAMANT = merge Josef DIAMANT on top of - Joseph DIAMANT (data conflict = Joseph) http://www.geni.com/merge/resolve/6000000031985250875
Josef Josef DIAMANT = merge Josef DIAMANT on top of - Josef Diamant (data conflict = Diamant) http://www.geni.com/merge/resolve/6000000031985250875

3/7/2015 at 9:14 AM

Justin is absolutely correct.

Names should be as they are written by the people in the country where they lived, not as some other group decides. And how names are written may change.

van ~ Van and de ~ De are two good examples.

I fight with this in profiles I manage or curate when a group speaking one language changes names into their idea of what the name was. For example converting Yiddish names to their modern Israeli equivalents. Or converting Argentine Spanish names to some possible modern Israeli equivalent.

If the name was spelled "Van ...." in America, that's how it should be represented.

3/7/2015 at 9:28 AM

Hatte Blejer (absent until Nov 1) and Justin Durand I support both your points of view 100%.

It is not for one group to impose their viewpoint on the global community, and then expect the global community to adapt to their system and manually correct it.

That is exactly why I am in support of the status quo.

3/7/2015 at 12:54 PM

I trust Mike (geni staff) to determine the pros & cons of:
1) changing New names entry - to prevent all lower case or uppercase
2) changing Existing names - to Xxxxx Xyzxyz *with conditions (mentioned several times above)

Based on: existing statistics, development time, processing resources, response time & most Importantly - End result, effect on Data & Users. Those are normal development & improvement metrics - used by small, mid & large corporations. I will not attempt to tell Geni staff - how to do the programming, they're very capable.

* 1985-88 we developed programs to: check 1st character (given & surname) for case, selecting them & the 3rd character, checked for spaces in names (xx Xyzxyz, xxx Xyzxyz), checked for numeric, special character, etc. - creating a "matchcode" for Duplicate elimination & Merging. We used drop downs in Data entry, selected tables, had data field help, etc. All that, to Reduce Customer service Errors - and Reduce labour time (=staff). That was our mandate from senior management.

Sure, Geni users can continue with unnecessary edits, data conflicts and inconsistent Names - due to lower case only and upper case names. But, just because it was Not incorporated in 2007 - does Not mean, it should not be done (improved) in 2015. Be they New entries only 1) or Old entries as well 2) mentioned above.
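
To make the "matchcode" idea a bit more concrete, here is a rough sketch. The post does not say what the 1985-88 matchcode actually looked like, so the key format below (letters only, accents dropped, upper case, surname first) and the names matchcode, find_duplicates and entry_warnings are purely assumptions.

```python
import unicodedata


def matchcode(given, surname):
    """Build a hypothetical comparison key; the format is an assumption."""

    def clean(part):
        # Split accented letters into base letter + accent, then keep only
        # letters (drops accents, spaces, digits and punctuation).
        decomposed = unicodedata.normalize("NFKD", part)
        return "".join(ch for ch in decomposed if ch.isalpha()).upper()

    return clean(surname) + "|" + clean(given)


def find_duplicates(people):
    """Group (given, surname) pairs that share a matchcode."""
    groups = {}
    for given, surname in people:
        groups.setdefault(matchcode(given, surname), []).append((given, surname))
    return [group for group in groups.values() if len(group) > 1]


def entry_warnings(name):
    """Flag the entry problems mentioned above, without changing anything."""
    warnings = []
    if name.islower():
        warnings.append("all lower case")
    if name.isupper():
        warnings.append("all upper case")
    if any(ch.isdigit() for ch in name):
        warnings.append("contains a digit")
    return warnings
```

A key like this would treat Josef DIAMANT and josef Diamant as the same person for matching purposes, while Joseph Diamant would still be kept apart.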

3/7/2015 at 1:02 PM

My concern is only about changing old entries programmatically. The number of variables makes it possible to think it could be done, but it would be very hazardous if any are missed or misapplied.

I have no doubt Mike will make the best decision for Geni, but before he makes any decision at all this discussion is important so he understands the potential problems.

3/7/2015 at 1:11 PM

If it were up to me, here's what I would do -- give curators a script they could run on public profiles within a family group. The script could have options that would normalize capitalization according to a few different standards. Say, lower casing surnames, but capitalizing or not capitalizing de, van, etc. Then, let people ask in the curator thread for help in specific areas.

Much safer than doing an elaborately constructed system-wide sweep.
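
A small sketch of what such a curator script might do, assuming a family group is handed over as a plain list of profile records. The field names (id, public, last_name), the particle list, and the functions normalize_surname and normalize_family_group are hypothetical; no such Geni tool is described in this thread.

```python
PARTICLES = {"de", "van", "von", "der", "du"}  # assumed sample, not complete


def normalize_surname(surname, capitalize_particles):
    """Re-case a surname according to one chosen standard."""
    words = surname.split()
    result = []
    for i, word in enumerate(words):
        is_last = i == len(words) - 1
        if not is_last and word.lower() in PARTICLES:
            result.append(word.capitalize() if capitalize_particles else word.lower())
        elif word.isupper() or word.islower():
            # Only touch words typed entirely in one case (DIAMANT, diamant).
            result.append(word[:1].upper() + word[1:].lower())
        else:
            # Mixed case such as McCann is left exactly as entered.
            result.append(word)
    return " ".join(result)


def normalize_family_group(profiles, capitalize_particles, dry_run=True):
    """Return (id, old, new) for each public profile whose surname would change."""
    changes = []
    for profile in profiles:
        if not profile["public"]:
            continue  # private profiles are never touched
        new = normalize_surname(profile["last_name"], capitalize_particles)
        if new != profile["last_name"]:
            changes.append((profile["id"], profile["last_name"], new))
            if not dry_run:
                profile["last_name"] = new
    return changes
```

Running it with dry_run=True first would let the curator review the proposed changes for that one family group before anything is saved.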

3/7/2015 at 1:41 PM

Justin Durand - I hear you Justin. Suggestions, alternatives, implementations, best practices - are welcomed - to solve the problem :)

Those who think "it is Not a problem" - and nothing needs to change - are welcomed to state their opinion & facts - all are welcome, it's an open public discussion.

3/7/2015 at 3:35 PM

Erica, I don't want to go much further with this because you and I will just fall back into our standard practice of driving each other nuts with all the bickering and hair splitting.

However, to answer your question, Geni has already defined family group. I don't know whether curators still have it, but once upon a time there was a tool for check living, or maybe it was check privacy, that grabbed and checked and changed the profiles in the immediate vicinity.

Like you, I've also been around the block on software manipulation of data fields but my experience was far less happy. Users saying, "Oh, I forgot to tell you about that scenario". Programmers saying, "Sorry, I didn't think there was a difference between this and that."

If you take just the simple case of whether or not to capitalize de, I think you might be able to see some of the potential dangers. First, figure out whether it is a French de or a Dutch de. Then look at the time period. Then look at the place.

I have to wonder if it is really quite so easy to isolate British profiles born before "about" 1400 so the system can lower-case their de, then identify all the British profiles from about 1750 to 1900 so that if the surname is "British" the de needs to be capitalized but if it's from a French immigrant then it needs to be lower-cased. Then move on to Sweden and America, where there were many French and British immigrants with de names, and maybe leave those alone because there is no rule.

It seems to me that in a programming landscape like that, it would be impossible to get it exactly right. Perhaps the biggest problem would be just trusting that the place of birth is correct and can be easily parsed. I think it would be very likely that large areas of the tree would be missed by a "no call" scenario, and many would be changed that shouldn't have been changed because there wasn't enough initial effort to identify every scenario.

Just to be clear -- I'm an enthusiastic supporter of any new system that will help people get the capitalization right for new profiles, even if it's not perfect, but I have serious reservations about trying to do a sweep to clean up existing problems.
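
To show how quickly the "no call" cases pile up, here is a toy version of just the two British rules described above. The function suggest_de_case and its inputs are hypothetical, the cut-off years are the rough ones from this post, and it cannot tell a "British" surname from a French immigrant's, which is part of the problem being described.

```python
def suggest_de_case(country, birth_year):
    """Return "lower", "upper", or None when no rule applies (a "no call")."""
    if country is None or birth_year is None:
        return None  # missing or unparsed place/date: nothing can be decided
    if country == "Britain" and birth_year < 1400:
        return "lower"  # medieval usage
    if country == "Britain" and 1750 <= birth_year <= 1900:
        return "upper"  # later usage, but wrong for French immigrants
    return None  # Sweden, America, other periods: no rule
```

Everything outside those two windows, and every profile with an unreliable birthplace, falls through to None.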

3/7/2015 at 4:04 PM

Perhaps a nice compromise between full automation and no automation would be simply recognizing when a name like that is being entered. Once identified the interface could present a question to the user, "Are you sure the name was used in that way?", and present a link to documentation on the various naming conventions. In this way Geni would be empowering the user to learn about the naming conventions and the user gets to verify the name is being entered correctly.
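
A minimal sketch of that compromise, assuming a short made-up particle list and hypothetical helpers needs_confirmation and entry_prompt. The name is never changed; the user is only asked.

```python
PARTICLES = {"de", "van", "von", "der", "du", "af", "av"}  # assumed sample


def needs_confirmation(surname):
    """True if the surname starts with a known particle, in either case."""
    words = surname.split()
    return len(words) > 1 and words[0].lower() in PARTICLES


def entry_prompt(surname):
    """Return the question to show the user, or None if no prompt is needed."""
    if needs_confirmation(surname):
        # The interface would also link to documentation on naming
        # conventions here; no specific page is assumed.
        return "Are you sure the name was used in that way?"
    return None
```

Entering "de Vere" or "Van Buren" would trigger the question, while "Diamant" would go straight through.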

3/7/2015 at 4:39 PM

Erica, you misunderstand. I'm not talking about the family groups of living people. Hopefully, we can allow users the privilege of knowing best how to spell the names of their own parents and grandparents.

I'm talking about the family group defined by a historical person's immediate family, out to 3rd great grandparents, and 4th cousins. In other words, exactly like the check living tool, and not like some imaginary other definition that's easier to argue about ;)

It's my opinion that it would be better and safer for data integrity to allow spot fixes in a small area rather than trying to sweep all of Geni. I can understand that you have a different opinion, but I hope you will give up this vendetta you have that makes you rush to tell me what an idiot I am whenever I post something.

3/7/2015 at 7:21 PM

Justin is right that "it would be better and safer for data integrity to allow spot fixes in a small area rather than trying to sweep all of Geni." A sweep of that kind is what most genealogical programs like PAF, RootsMagic, etc. call "Global Replace". It can wreak complete havoc, be a total disaster, and ruin a database. You can see the evidence in many GEDCOM databases with very weird place names. It has to be used very carefully and wisely.

3/7/2015 at 8:37 PM

Not only do I work in software but I work in language software and specifically have a lot of experience in software related to recognizing and parsing names.

The pitfalls that Justin points out are very real, as any of us who have seen a lot of profiles on Geni should recognize.

Things that seem simple and straightforward are not, and name handling in an automated fashion in a multilingual, crowd-sourced environment is far from simple.

Private
3/7/2015 at 8:47 PM

I agree with Justin about this. I have a lot of Nordic names that became americized when they left for the US, and it's a hassle...

3/8/2015 at 7:38 AM

Private - interesting, above your surname appears as mcCann (display name) - while your profile uses McCann (last name).

Michael, what did you mean by "americized" ?
They did Not change it to: MCCANN or mccann - same for your LaPoint names - correct ?

*Naming Conventions and Merge issues (partially related): http://www.geni.com/discussions/145450?msg=1005989&page=1

