Reasons for Keeping a Record of Data Elided by Data Conflict Resolutions after Merges.

Started by Sharon Doubell on Wednesday, November 21, 2012

Showing 1-30 of 78 posts

Evidence suggests that the Data Conflicts on profiles tend to accumulate unmanageably because too many people are wary of making irretrievable choices about 'correct values'.

This reticence results in wholesale 'dumping' of data when the stacked profile data is too numerous to sift through easily, which weakens Geni's content validity in the long run.

Logging the elided data in a Discussion is an attempt to resolve this problem.

So, I have already resolved the Data Conflict,
and am simply making sure that all the managers involved in that merge know what data of theirs I removed.

I am doing this as a courtesy, against the possibility that the other managers, or future ones, may want to re-engage with the deleted data. (Sending a private message means there is no record for any future managers of the profile - of which we hope there will be many.)

It is not a query, and it does not require a comment,
unless you disagree with the way the Data Conflict was resolved or, as many curators like Justin, Sheldon and Harald sometimes do, you want to add useful info about the data presented that others can benefit from when resolving Conflicts on that profile in the future.

“Why bother doing it?”

I'm doing this in case anyone would have wanted to make a different decision about the data included. THIS WAY THAT DATA OPTION ISN'T LOST to what may seem to be an arbitrary Curator decision.

It allows people to later go back and make a different decision, while logging their thought process for those choices on the Discussion. That PREVENTS ‘RE-INVENTING THE WHEEL’, in the future.

“Isn’t that what MPs are for?”

I am making 2 presumptions here:

1) Data on unlocked MPs will not always be resolved to comply with the original.
So, over time, without this kind of record and engagement - they can / will come to represent as 'gospel' data that the original MP never actually had.
THAT LEAVES GENI OPEN TO REPRESENTING AS 'SET IN STONE', DATA THAT IS ACTUALLY DEGRADING OVER TIME. This is a very bad use of MPs, and will give Geni a bad name.

2) Curators don't always know everything, AND OTHER MANAGERS on the profile might actually want to be ALERTED TO THE OPPORTUNITY TO ENGAGE WITH DATA CHOICES being made on the MP by them and others.

“Isn't it better to leave resolving these data conflicts if you don't know what the correct value is?”

Yes, but I am presuming that there will be many cases where the 'correct value' is best established/ maintained by giving the managers whose data have been elided the chance to dispute it.

We also need to be careful not to exclude anyone who doesn't see him/herself as an indisputable expert on the profile’s data. Geni is about accumulating mass knowledge from the population (including that which has hitherto been inaccessible to 'experts') and skewing it towards validity over time.

Exactly as Jadra points out - history is not static. The record of our negotiations about how the data will be represented is what written History is made of. In the case of huge crowd sourced data projects like Geni, this record is the mechanism by which the data progresses towards greater and greater depth and 'accuracy', instead of going around in circles.

That is my intention in using a profile Discussion to keep a record of the data we elide / delete as we resolve data conflicts.

The way I see it, it is polluting the discussions and will stick to the profile forever.

When I come across data conflicts where I am in doubt about what is correct, I always check the profile, and if still in doubt I add it to the About Me.

Only when a discussion is needed is a discussion started, either as a discussion between the managers or, if needed, as a public discussion.

The thing is that some of these discussion topics have no merit at all, one of the dates being clearly wrong or the spelling of a name coming from a time when the spelling of names was fluid.

One might perhaps discuss conflicting data in documents, but not this kind of conflict. There really isn't anything to discuss.

Guys, there are not too many other ways to say that these are not intended as topics - meritorious or not.
I am not logging them because I'm in need of help as to what is correct.

I've pretty much said ad nauseam that it is about creating a courtesy record for the profile's managers to comment on if they object to the changes.

I get that you don't like them. I moved the conversation here so we could stop 'polluting'? the profile logs with your comments about the fact that you don't like them.

These are my reasons for logging the data. I'm the one putting in the extra work. If you don't like them, just ignore them.

Oh, they're not really discussions at all?

If you only want to alert the managers of the profile, then I would agree with June that only the managers should be contacted.

Sending a private message means there is no record for any future managers of the profile - of which we hope there will be many.

I'm doing this in case anyone would have wanted to make a different decision about the data included. THIS WAY THAT DATA OPTION ISN'T LOST to what may seem to be an arbitrary Curator decision.

It allows people to later go back and make a different decision, while logging their thought process for those choices on the Discussion. That PREVENTS ‘RE-INVENTING THE WHEEL’, in the future.

I still don't see why anyone should be reminded that, for example, someone once thought Jacques de Savoy might have been born in 1710.

Or whether Helena Smit, daughter of Alewyn Smit, was in fact Burger. The source documents are attached to her profile and cannot be argued with, unless contradictory documents are found later. THEN it can be discussed.

I don't think mistakes of this kind should be perpetuated. They should be allowed to quietly die.

The problem arises when someone deletes information that may later prove to be correct, especially if there is no supporting documentation. Daan Botes in Zambia has suggested a viable alternative to either just deleting the data or creating discussions about it. He's suggested creating a document with the deleted data in it and storing it on the individual's profile. You can do this by clicking on the Media tab and then on 'Create New Document'. Give the document a name and when you click through after that, you can paste in the deleted info. Make a note in the About section that this has been done and (hopefully) voila!

You find all changed data in the revision history of a profile, so there is no need to duplicate it.

Bjorn - No, only the Revision History of the Primary Profile is shown. Any time Profiles are merged, the Revision History of the one not chosen to be the Primary Profile is Lost.

Bjorn, the revision history provides
1) no possibility to engage with other managers’ Reasons - should we want to go back and see why data was discarded and
2) no alert to the managers that choices have just been made that elide some people's data (eg Burger, in Jansi's example), in case they want to add to it or object to it.

Jansi, Lee's point is exactly it:
The problem arises when someone deletes information that may later prove to be correct,
or when someone changes information that reflects their own preference for presentation, that others might like to engage with (eg Charlemagne’s many name options);
or when there simply is not enough data in the sources and someone has to make an arbitrary choice between data sets (eg Charlemagne’s naming sources).

When the data is so similar that it makes no difference, I don't record it.
But when there is a difference (even, as in your much used example- when I'm sure the one is incorrect) I still record the fact that I've deleted it, because that error may be an alert to the profile manager that, for eg, their profile was mis-merged into another one, or their generations are out; or the data source they're using is poor, and may be so on other profiles.

I don't feel comfortable deleting people's info without alerting them.
And I do think it is more efficient to record the deleted info in a designated public discussion record per profile, for exactly your reasons: so that the info can be reviewed and ‘the same mistakes are not perpetuated.’

We'll have to agree to disagree, Sharon.

I agree with the rest of Lee's proposal, to create a document with the deleted data in it.

Dang! Geni's software developers definitely missed a trick here.

There should be a way to keep the profile manager's original information intact, and then to provide a separate forum in which changes/additions can be made by others. Especially on our own family trees, all of us feel a certain attachment to the records we've created - and Geni really should have thought of a way to preserve these while simultaneously allowing for research cooperation amongst collaborators.

I personally find it rather distressing to spend so much time setting up a tree in my own style and using information I've managed to dig up, only to find the profile merged, and the style and content changed completely ...

Yes, Lee, collaboration has to be done sensitively, imo, or it just ends up being a matter of taking other people's info without acknowledging their investment in it.

The creation of a Deleted Items Doc is great, except that it doesn't solve the problem of providing an instant alert to the managers that a deletion / change may have occurred on the profile. A Discussion does this.

I have asked for such an option to be created as the best possible solution; but Geni is inundated with design requests so this is not likely to happen anytime soon.

"I don't think mistakes of this kind should be perpetuated. They should be allowed to quietly die. ..."

Unfortunately "spurious data" on internet trees doesn't just die - it comes around to haunt with every new tree.

A discussion may be exactly the most efficient way to solve this.

Sharon, I see your system as a trade-off.

On one hand, I feel like I need to go check each of those messages, and that takes time. Nine times out of ten, there is nothing I care about. Sometimes I get a little annoyed. I'm aware that I'm the only one saying I have to check, but I wish I didn't feel like I have to.

Then, suddenly I see one message where I care very much. That one message makes all the others worthwhile, because it is the kind of alert I'm looking for.

I agree Erica, and I really don't think we're giving enough thought on Geni to preventing data deterioration.

Curators tend to think we will be sufficient as the frontline guardians if we could only watch every profile carefully enough - which 'expert only' attitude is contributing to the problem. (MPs, for example, can entrench bad data, if it accidentally becomes attached to them; and there is presently no good mechanism to alert people to the opportunity / necessity to watch out for it.)

There needs to be a system in place that acknowledges that our only chance to maintain / keep improving data quality at this number of profiles is by keeping the mass users engaged in the process of checking the data as it changes - not by excluding them.

Justin, I hear you. I feel like I've got to reply to every comment on the discussion too. :-( Ditto about annoying and then worth it.

I'm trying to put in an explanatory opening paragraph on all of them, so people don't feel they have to comment.

I do think Geni needs to create a parallel discussion thread for this, but without that - this process is at least designed so that it will still be there to go back to later; making it fine to ignore them until you find yourself working on that profile in the future.

All it really is, is a fail safe, but it also prevents the comeback of users suddenly wanting to know what we curators did to their data without telling them.

If someone is annoyed by the messages, it's easy to avoid them -- just unfollow the profile. I don't want to sound harsh, but at a particular point you either care about changes or you don't ;)

And there's always "relinquish management" as well. :)

Sharon I applaud you for a creative use of existing Geni tools.

I do appreciate the annoyance of "extra noise" (particularly in notifications to one's own email), but perhaps there are easy ways that Geni can develop to reduce that volume, if need be. They were able to quickly implement a "mute" setting for inadvertent "group forwards" for instance.

So we should keep an eye out for any need like that as well.

If you notice, this is about the first time I've posted, as I've looked at the discussions and thought - "OK, I can move on now ..."

Oh, one more point. Preserving outmoded data on a profile with a text document created "on the fly" is a fabulous tool. I use it for "cleaning overviews."

:-)

I somewhat prefer the discussion method to the document, in the sense that a "message" is kept: the discussion is automatically recorded against the profile, and it remains available if one is not, at the creation stage, in a position to attend to the matter.
The pollution issue Bjorn referred to may be somewhat alleviated by using the same discussion thread on a profile where previous data conflicts were logged.

Yes. Very much so. That is exactly the intention :-)

Sharon, - your "job" as a curator is to make a decision when there is a data conflict, not to send the ball out again. If you can't, leave it.

Bjorn, I think you are mixing up the job of curator with that of an auditor.
