Clean-up again, with some automation

Started by Private User on Thursday, January 18, 2018

Showing 1-30 of 34 posts
Private User
1/18/2018 at 10:11 AM

From the dates of the discussions, it's been a long long time since the last drive of About Me clean-up. I only recently started doing some, and have come to my own preference. If you'll allow me to describe it briefly, then I'll turn to making a computer script that does clean-up on a massive scale.

My way of doing it:

Straight copy & paste from the "intro" section of the English Wikipedia page, add the link to the name (between square brackets, and making the name bold), remove any reference numbers, and italicize wherever needed (e.g. book titles). I remove the dates (after checking they are already entered in the profile) because I want to be able to see the first sentence from the project's peek view. I sometimes try to use the short form of the name; the long form is up on the profile.
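
The "remove any reference numbers" step lends itself to a small script. Below is a minimal Python sketch, assuming the pasted text carries footnote markers such as [1], [a], [note 2] or [citation needed]. The pattern and the function name `strip_reference_markers` are illustrative assumptions, not part of any existing Geni or Wikipedia tool, and the pattern is not exhaustive.

```python
import re

# Footnote markers commonly left behind when pasting from Wikipedia.
# This list is an assumption; extend it if other markers show up.
REF_MARKER = re.compile(r"\[(?:\d+|[a-z]|note \d+|citation needed)\]")

def strip_reference_markers(text: str) -> str:
    """Remove footnote markers and tidy the spacing they leave behind."""
    cleaned = REF_MARKER.sub("", text)
    # Collapse runs of spaces created by the removal.
    cleaned = re.sub(r"[ \t]{2,}", " ", cleaned)
    # Drop a space that ends up directly before punctuation.
    cleaned = re.sub(r" ([,.;:])", r"\1", cleaned)
    return cleaned.strip()

sample = "He was a Baptist minister[1] and activist.[2][a] He led the movement [citation needed]."
print(strip_reference_markers(sample))
```

Because Geni links also use square brackets, a script like this should run on the raw Wikipedia text before any links are added.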

Most recently I started adding Geni links to any reference to another person. Geni is probably the unique place where all names can be linked. I found it much easier and more fun to read or skim, and it invites the viewer to "jump" to a totally different branch of the tree. I also experimented with adding links to all repeating references (of the same name), which is not the wikipedia practice. Here are two profiles that I just did:

Dr. Martin Luther King, Jr.
Coretta Scott King

And I would remove the rest of the wikipedia article that someone else has entered, IF it was just a straight copy. If I see some editing work (making headlines, remove extraneous characters) I would leave it alone. I know Randy Stebbing prefers to give a "wikipedia summary", with lots of ellipsis. My hope is that we can come to some sort of agreement on the "best practice".

1/18/2018 at 11:19 AM

Private User I’m liking this idea except for removing dates from the narrative. When resolving data conflicts on an MP (I get these requests daily) it’s a big help to have dates in the “about” rather than go to an external Source window. In fact, once date / locations are in the overview, I feel much more secure in locking basic fields (about matches fields). In addition if I’m looking at family members it’s a big help to have dates in the notable’s window.

Love the Geni profile linking within overview.

Private User
1/18/2018 at 12:20 PM

I'm open to including the dates. It usually doesn't help with the one-line peek anyways.

Private User
1/18/2018 at 12:28 PM

Let me add (the obvious) that the links within Geni can be hovered over to see the picture and immediate family, which also works when logged out. This is a relatively recent feature, and we should make better use of it.

Two methods for including a link: 1) the full URL (you can shorten the "name part" in the middle if it is too long), a space, and the name as you want it to appear; and 2) just the Geni ID between square brackets. Geni will display the latter according to the viewer's preference. I prefer the former.
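
As a rough illustration of the two link forms, here is a short Python sketch. It assumes the first form is also wrapped in square brackets, as the opening post describes. The function names, the URL, and the ID are made-up placeholders, not real profiles or a real Geni API.

```python
def geni_link_with_name(url: str, display_name: str) -> str:
    """Method 1: full URL, a space, then the display name, in square brackets."""
    return f"[{url} {display_name}]"

def geni_link_by_id(profile_id: str) -> str:
    """Method 2: just the Geni ID in square brackets."""
    return f"[{profile_id}]"

# Placeholder values for illustration only.
example_url = "https://www.geni.com/people/Example-Person/6000000000000000000"
example_id = "6000000000000000000"

print(geni_link_with_name(example_url, "Example Person"))
print(geni_link_by_id(example_id))
```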

1/18/2018 at 12:53 PM

I didn't know that about "hover," thank you.

Does copying Wikipedia have any copyright issues?

What happens to information that is already on the "About Me" that is not Wikipedia?

1/19/2018 at 4:32 AM

>add the link to the name (between square bracket, and making the name bold),
While this looks cool it isn't a particularly clear method for crediting wiki, vis Leanne's point.

I suggest starting the About with an italicised "From Wikipedia, the free encyclopedia", which is the very first line of every Wikipedia article. I then use the square bracket method to turn the word Wikipedia into the link to the article I am copying from.

1/19/2018 at 11:54 AM

I’m no expert in copyright but Wikipedia has a Creative Commons redistribution note.

Yao, maybe a last updated date stamp, because articles update (part of the reason for a live link !)

Adding the web address and "Accessed <date>" would let people easily go to the page and know how old the information is. If the Wikipedia page has since changed, it also allows us to explain why.

Sometimes wikipedia pages get hacked - how do we know that the information in wikipedia is correct?

1/19/2018 at 2:35 PM

Not just hacked, can be incorrect. But this is what we have Curators and managers for.

I use [SIC: should be JOHN] to annotate quotes from sources that have a detail wrong

But how does the checking happen if the process is automated?

1/19/2018 at 6:42 PM

If a profile you follow is updated you'll get a notification.

Also I'm pretty sure Yao would be working with well known notables with Wikipedia bios already in Geni --- I have many of those.

BUT if there are additional genealogy notes made from other sources than Wikipedia I hope those are retained. This would be a Wikipedia article update only I think / hope.

Private User
1/20/2018 at 6:56 PM

I added the suggestions to the two King profiles. Let me know if you think it is okay. (I'm not so keen on using the most standard citation, "Wikipedia contributors, ...", which I have seen.)

I didn't have it all thought through how to make things automated. Ideally this would be something Geni should keep track of—which profile is associated with which wikipedia page, and update it periodically.

I know that wikipedia is often not very reliable or well-written, especially when we leave the world of European/American notables. But I believe the majority of our notable MPs are already quoting wikipedia. I wanted to clean up that part of the About Me only. (I realized after starting this post that this "About" Cleanup project had a much wider scope.)

I'm probably going to test it out with a few profiles. I thought the US presidents would be a good place to start, but all those profiles have super long About Me due to multiple merges. Any suggestions? I could also start from the wikipedia side, say from their Featured Articles, which are of better quality.

1/20/2018 at 9:48 PM

> but all those profiles have super long About Me due to multiple merges.

All the more reason to clean them up, though not perhaps what you are looking for to experiment with.

1/21/2018 at 11:55 AM

I like the idea of American presidents and getting rid of all those copy & paste dup dup dup. But there certainly are presidents we have additional notes for. I've worked on trees for Washington, Lincoln, the Adams, Obama, Trump, and (for my sins) Nixon. Probably many others. Probably safest to work with the 20th / 21st century first.

Is it worth creating a project "Bio on Wikipedia" and adding profiles to it as we go along?
That way there will be at least a list of them.

1/22/2018 at 3:43 PM

Sorry off-topic, but inspired by this discussion:
Would it be possible to generate a narrative About description from the timeline of a profile and related profiles and some information about the time of the events (may be extracted from projects) when the About is empty?

1/22/2018 at 3:50 PM

I mean something like this:
Franciscus Bernardus was born on August 4, 1825 in Goes, during the French Age. He was the oldest son of J.F. Lippens and B.T. Claeys. Two years later he got a sister Maria Barbera (33+ years old). When Francis was 30 years old, his father dies on 2 February 1855. However, 3 weeks later, he married Anna Catharina Koenen on February 21, 1855 in Middelburg. The couple lived in Goes. Three months after their wedding day their first child is born, Jacobus A.F., who dies after only 4 months. Within a year, the family settles in Amsterdam, where 11 more children are born, 4 of whom die as babies. F.B. performed professions as music master and office clerk. He died at the age of 68 in Amsterdam, on October 23, 1893. Franciscus was the one who moved from the south to 'the big city' in the north and stayed there.

But generated from the data available with Geni and possible Wikipedia

1/23/2018 at 12:01 AM

Wikitree and ancestry do (what looks like a) basic automated narrative. But picking up projects, timeline, listing family members, this is inspired. I can tell you it would make detangling & keeping trees orderly easier, not to mention making the profile more interesting. But perhaps only if the overview is blank or with a SmartCopy citation only, and not replacing those links, just appending to the "about."

1/23/2018 at 1:15 AM

Yes, that is what I was thinking about. There are many empty Abouts and Abouts with only SmartCopy entries.

Maybe I should post the same question to Jeff and see if he could build something?

If Smart Copy could do a basic template - it would be easier to tweak rather than type it all in again.

1/23/2018 at 4:17 AM

It's hard to imagine this as something more than a fleshing out of the Timeline. It is hard (for me at least) to imagine a computer writing that paragraph about Franciscus Bernardus based on the BDM data gleaned from the surrounding tree.

Private User
1/23/2018 at 8:18 AM

That would be the job of an AI, which may not be too far in the future:) On a related note, I thought MyHeritage had an AI-ish tool that extracts data from a narrative like an obituary, which is the reverse of what Job was proposing. (No idea how well it works.)

I haven't started on scripting it, but did a couple of US presidents and UK PMs by hand. It seems that the UK profiles are much cleaner (kudos to the curators). The particular style I proposed, with links to Geni profiles, should lend itself to a "dumb" computer fairly well.

I'm not so sure if I wanted to have a project for "bio on wikipedia". I could start with an existing project (of notables), and insert a computer-generated paragraph at the beginning, and solicit help from the collaborators to proof-read and do the actual "clean-up" of duplicate wikipedia copy&paste. Any of you want to offer your project for testing, with permission from other (active) collaborators?

To be clear, I would NOT touch other information in About Me. IMO it is best to have extra information that is of special interest to genealogists.

Private User
3/6/2018 at 11:45 AM

I like Job's idea, but with links to the sources. If it is based on Wikidata, link to Wikidata. If it is based on WikiTree, link there, especially if the WikiTree article is very well sourced.

But links to primary sources like birth certificates are better.

To give an idea of what a script would look like, I will do only the beginning, just to show Alex Moes how simple it is for me and Liu.

{firstname.ego} was born on {birthmonth.ego} {birthday.ego}, {birthyear.ego} in {birthplace.ego}, during the {namebirthage} Age. He was the {numberofsibling.of.parentpair.in.words} {gendersibling.of.parentpair.in.words} of {first.name.father} {lastname.father} and {first.name.mother} {lastname.mother}. Two years later he got a sister Maria Barbera (33+ years old). When Francis was 30 years old, his father dies on 2 February 1855. However, 3 weeks later, he married Anna Catharina Koenen on February 21, 1855 in Middelburg. The couple lived in Goes. Three months after their wedding day their first child is born, Jacobus A.F., who dies after only 4 months. Within a year, the family settles in Amsterdam, where 11 more children are born, 4 of whom die as babies. F.B. performed professions as music master and office clerk. He died at the age of 68 in Amsterdam, on October 23, 1893. Franciscus was the one who moved from the south to 'the big city' in the north and stayed there.
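
The placeholder idea above can be tried out with a few lines of Python. This is only a sketch under stated assumptions: the placeholder names are copied from the template, the values come from the Franciscus Bernardus example earlier in the thread, and `fill_template` is a hypothetical helper. A regular expression is used instead of `str.format` because `str.format` would read the dot in `{firstname.ego}` as attribute access.

```python
import re

# First sentence of the template, with the same placeholder names.
TEMPLATE = (
    "{firstname.ego} was born on {birthmonth.ego} {birthday.ego}, "
    "{birthyear.ego} in {birthplace.ego}."
)

def fill_template(template: str, values: dict) -> str:
    """Replace each {placeholder} with its value; leave unknown ones untouched."""
    def substitute(match):
        key = match.group(1)
        return values.get(key, match.group(0))
    return re.sub(r"\{([^{}]+)\}", substitute, template)

# Values taken from the example narrative; in practice they would come
# from the profile data.
values = {
    "firstname.ego": "Franciscus Bernardus",
    "birthmonth.ego": "August",
    "birthday.ego": "4",
    "birthyear.ego": "1825",
    "birthplace.ego": "Goes",
}
print(fill_template(TEMPLATE, values))
```

Leaving unknown placeholders in place, rather than failing, makes gaps in the profile data easy to spot in the generated text.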

BTW I can also write in several programming languages and translate English into pseudo script. What did you think my BSc is in???

Private User
3/6/2018 at 12:09 PM

Only the last line, "Franciscus was the one who moved from the south to 'the big city' in the north and stayed there", is very funny, but Job, you would need to write that yourself, because that can only be done by peopleware and not by software.

Private User
3/6/2018 at 12:34 PM

And of course we need to write another story for a middle sibling or for a youngest sibling. So if oldest, use Job's story; if middle, use a second story; and if youngest, use the "benjamin" story. If there are no siblings, use an only-child story. Of course, the moment more siblings are added, the story should be rewritten.
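
The branching described here could look roughly like the Python sketch below. The function name `pick_story` and the variant labels are hypothetical. It assumes the birth order and the number of children of the same parents are already known from the tree.

```python
def pick_story(birth_order: int, child_count: int) -> str:
    """Return which narrative variant to use.

    birth_order is 1-based among the children of the same parents;
    child_count is the total number of those children, including
    the profile itself.
    """
    if child_count < 1 or not 1 <= birth_order <= child_count:
        raise ValueError("birth_order must be between 1 and child_count")
    if child_count == 1:
        return "only_child"
    if birth_order == 1:
        return "oldest"
    if birth_order == child_count:
        return "youngest"
    return "middle"

print(pick_story(1, 12))  # oldest of twelve children
print(pick_story(3, 3))   # last of three children
```

Since the result depends on `child_count`, re-running it after siblings are added covers the point that the story has to be rewritten when the family changes.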

Private User
2/7/2019 at 8:22 AM

Ok, finally found this discussion.... I made a little tool that anyone could use:

https://beta.observablehq.com/@liuyao12/format-wikipedia-biography-...

It doesn't automatically link to other Geni profiles, which would require maintaining a database. But hopefully it's helpful to others. Any bugs or suggestions are welcome!

2/7/2019 at 9:31 AM

I think it's a great tool, Yao, and look forward to using it.

Jeroen, Wikipedia is a wonderful "start" point for historic profiles, and often has citations. Copying the header information, as this tool does, also helps boost Geni's Google hits, and keeps Geni "in sync."

Private User
2/7/2019 at 9:44 AM

If I take the example of Ronnie Brunswijk, the base should be https://www.wikidata.org/wiki/Q1968705 and from there a Geni bio should be built. If you use https://en.wikipedia.org/wiki/Ronnie_Brunswijk then the disclaimer

"Text is available under the Creative Commons Attribution-Share Alike License; additional terms may apply. By using this site, you agree to the Terms of Use and Privacy Policy. Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc., a non-profit organization."

should be added.

Private User
2/7/2019 at 9:48 AM

In general the facts are free, but the texts are subject to licenses and policies. I see no problem with linking to Wikipedia or Wikidata, as linking helps Geni's search engine hits.
