
Source Fields + associated Notess, + possible duplicate branch

Posted: 02 Jul 2020 18:30
by Billread
Hi Mike
Still on the GEDCOM that the relative caused many issues with. Having deleted the rest of the rubbish and solved the UIDs, I note there are some other things I may be able to do to clean the GED up a bit. The first is that many source entries (most or all, in fact) are obsolete, plus the associated Note Record for the source. If I know the name of the source or what the Note Record says, how easy is it to search for and bulk delete those? I looked for a way, and played, but?

Also, as a lot of this is still tied to the duplicate records, how do I tell if there are maybe duplicate branches? I know that the feature exists, and it occurs to me that may be part of the problem. But Jenny tells me she entered a lot of GRO refs in some people's Note Record for the birth, rather than as a source. I can see this when I open the 2 copies of the individual in View ... records lists, Individuals. But as in Focus view I only ever see one of each duplicated individual, if I can identify a second set lurking, they may all be the ones not opened and modified each time she opened the file and added an entry. Therefore I might be able to bulk delete the whole second set of people once found somehow!!

The problem is identifying the second set of 100 people that appeared.
Guess it's still a merge, comparing 2 at a time, though, if it's random as to which focus individual is modified each time, so they are all mixed up.
Hope that makes sense
thanks Bill

Re: Source Fields + associated Notess, + possible duplicate branch

Posted: 02 Jul 2020 18:55
by tatewise
If possible, it would be easier if we could focus on one problem at a time.
Also, a screenshot or two for that one problem might simplify the description.

Re: Source Fields + associated Notess, + possible duplicate branch

Posted: 02 Jul 2020 20:26
by Billread
OK, the first is probably how to select all records with certain field entries. If there is a way to do that when a field has known information in it, then I assume the same process would work for every field??

Looking at one screenshot attached: in most records I would like to select all with Source ...From A Cox split....etc, and delete. Would that delete ALL sub fields, like certainty, Note, and Note Record? Because there are just A FEW, however, where I would like to clear the SOURCE but keep the Note Record if possible... or move the note first? See the Johan record (of which there are 2, as he is one of the duplicates). I want to clear the Alan Cox source as obsolete, but the note has a lot of information included that I wish to keep!
Johan duplicate  keep Note Record delete source if possible.png
Source to select  delete .png
Where I get some confusion is that these notes are attached to INDI records. Then there is a separate NOTES tag... where are those notes, in that they obviously are entered from a different place, and I may want to bulk delete some of those?

Re: Source Fields + associated Notess, + possible duplicate branch

Posted: 02 Jul 2020 20:37
by Billread
I'm assuming that if I go to the Sources tab, I can select the Alan Cox Source, delete, etc. But as in my last post, I didn't want to lose some data in the occasional note like Johan's that I mentioned. Although all those with Alan's Source generally can go.

Just afraid that if I delete the wrong things, I lose others that are child fields or similar.
Bill

Re: Source Fields + associated Notess, + possible duplicate branch

Posted: 02 Jul 2020 20:48
by Billread
On the same subject, notes and deletion of them: under the Notes tab are many as in the screenshot... saying just Keep Keep. Each has a record ID,
but I cannot see how to determine whose record (INDI) they appear in. Again, nearly all can be deleted I think, but one at a time, or select all and delete?
Keep Keep.png

Re: Source Fields + associated Notess, + possible duplicate branch

Posted: 02 Jul 2020 21:38
by tatewise
We can look at this in more detail tomorrow, but here are a few tips:
  1. To see what happens when you delete a particular item, just do it, paying attention to the before and after.
    If necessary take screenshots of the before to compare with the after.
    Then use Edit > Undo ... to reverse the deletion, and then Edit > Redo ... and then Edit > Undo ... to flip between before and after. You will soon see what is included in the deletion.
  2. To delete records with similar text values, first click on the column heading that contains that text.
    So regarding your Notes tab screenshot, click the Note Records column heading.
    That will bring all the __KEEP keep records together, so they can easily be selected.
    You can do the same for any column, so clicking the Links column heading would group together any with 0 links.
    Similarly, in the Sources tab, you can group all records with the same Source Records Titles, etc.
  3. Then to bulk delete...
    In the Records Window open the Named List pane using Lists > Named Lists Pane command.
    Choose an empty Named List such as Bookmarks that has Items set to 0.
    Now, select the group to be deleted by holding down the Shift key while selecting the first and last record in the group.
    Use the Lists > Add to Selected Named List command then Lists > Delete Named List Records and confirm.
    That is explained in Delete Any Number of Records (how_to:delete_a_large_number_of_records).
BTW: Deleting a Source link will not delete a Note Record link immediately below as that is a sibling item and not a child item.
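The sibling versus child distinction comes from the level numbers in the GEDCOM file itself. As a rough illustration only (this is not how FH works internally, and `drop_source_citations` plus the sample record are hypothetical), a small Python sketch can remove a Source link together with its higher-level child lines while leaving a Note Record link at the same level untouched:

```python
def drop_source_citations(lines, source_id):
    """Remove each 'SOUR @source_id@' link and its child lines (higher level numbers)."""
    kept = []
    skip_below = None  # level of the citation currently being removed, if any
    for line in lines:
        parts = line.strip().split(" ", 2)
        if len(parts) < 2 or not parts[0].isdigit():
            if skip_below is None:
                kept.append(line)
            continue
        level = int(parts[0])
        if skip_below is not None:
            if level > skip_below:
                continue  # child of the removed citation, so it goes too
            skip_below = None  # same or lower level: a sibling or a new record
        is_link = (level > 0 and parts[1] == "SOUR" and len(parts) == 3
                   and parts[2].strip() == f"@{source_id}@")
        if is_link:
            skip_below = level
            continue
        kept.append(line)
    return kept


# Hypothetical record: the citation has two child lines, followed by a sibling Note link.
record = [
    "0 @I1@ INDI",
    "1 NAME Johan /Example/",
    "1 SOUR @S1@",
    "2 QUAY 2",
    "2 NOTE citation-specific note",
    "1 NOTE @N1@",
]
cleaned = drop_source_citations(record, "S1")
print("\n".join(cleaned))
```

The level 2 lines under the Source link disappear with it, while the level 1 Note Record link survives because it is at the same level as the Source link. Always work on a copy of the file if trying anything like this outside FH.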

BTW: To find where a record is used, use the View > Record Links command, or a Plugin as explained in the advice at Finding where Records are used (how_to:finding_where_records_are_used).
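For anyone curious what "where a record is used" means in the file, here is a minimal Python sketch that scans GEDCOM lines and lists which records link to each Note record. It assumes GEDCOM 5.5 style lines, where a link looks like `NOTE @N1@` and a Note record starts with `0 @N1@ NOTE`. The function names and the sample data are made up for illustration, and it is no substitute for View > Record Links:

```python
from collections import defaultdict


def note_links(lines):
    """Map each Note record id to the set of record ids that link to it."""
    links = defaultdict(set)
    current = None  # id of the record currently being read
    for line in lines:
        parts = line.strip().split(" ", 2)
        if len(parts) < 2 or not parts[0].isdigit():
            continue
        if int(parts[0]) == 0:
            current = parts[1].strip("@") if parts[1].startswith("@") else None
        elif parts[1] == "NOTE" and len(parts) == 3 and current:
            value = parts[2].strip()
            if value.startswith("@") and value.endswith("@"):
                links[value.strip("@")].add(current)
    return links


def note_ids(lines):
    """Collect the ids of all Note records (level 0 lines tagged NOTE)."""
    ids = set()
    for line in lines:
        parts = line.strip().split(" ", 2)
        if len(parts) == 3 and parts[0] == "0" and parts[2].split(" ", 1)[0] == "NOTE":
            ids.add(parts[1].strip("@"))
    return ids


# Hypothetical sample: N1 is linked from I1's birth, N2 is linked from nowhere.
sample = [
    "0 HEAD",
    "0 @I1@ INDI",
    "1 NAME Johan /Example/",
    "1 BIRT",
    "2 NOTE @N1@",
    "0 @I91@ INDI",
    "1 NAME Johan /Example/",
    "0 @N1@ NOTE GRO reference (placeholder text)",
    "0 @N2@ NOTE Keep Keep",
    "0 TRLR",
]
links = note_links(sample)
unlinked = sorted(note_ids(sample) - set(links))
print(dict(links), unlinked)
```

Notes that show up in `unlinked` correspond to the ones with 0 in the Links column of the Notes tab.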

Re: Source Fields + associated Notess, + possible duplicate branch

Posted: 02 Jul 2020 22:15
by Billread
Ok Mike
I have used named lists before, so I think Ok there, I will play a bit.
Thanks
Duplicates, and knowing where they are in the scheme, i.e. a new branch, is the next headache.
goodnight
Bill

Re: Source Fields + associated Notess, + possible duplicate branch

Posted: 03 Jul 2020 16:24
by Billread
Hi Mike
Managed to delete a lot of the rubbish.

So how do I find the duplicate branch tool, in case that will help sort all these duplicates? If I open a diagram for everyone I see only one copy of each person, so are the rest hidden? How, apart from in the Individual list, do I find them, and how do I identify if the copies are interlinked somehow? Jenny, who did the work, said she never created any duplicates, and I know the file I sent her had only 1 in. Now 100.
Thanks
bill

Re: Source Fields + associated Notess, + possible duplicate branch

Posted: 03 Jul 2020 16:49
by LornaCraig
If your relative added extra individual records they will be the records with the highest ID numbers. (Assuming that you haven't added any more individuals yourself since she worked on it). So in the Records window, individuals tab, click the Record ID column heading to sort them. You may need to hold down the Alt key while clicking on the heading, to bring the highest numbers to the top. Does that help?

You could add these records to a named list for systematic comparison with other records with the same names.

Do these records have a different Pool number from most of the others in your project? If so they are an independent group not linked in any way to your other records (although they may be duplicates of your other records). If they have the same Pool number as the rest then they must be linked somewhere. Try creating an All Relatives diagram for one of these newest records and see if that gives any clues.
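The systematic comparison of records with the same names can also be roughed out on the GEDCOM file outside FH. The Python sketch below groups Individuals by a normalised name and reports the gap between their Record Id numbers. The function names and sample records are hypothetical, and matching on name alone will throw up false positives, so treat the output only as a list of candidates to inspect:

```python
import re
from collections import defaultdict


def same_name_groups(lines):
    """Group Individual record ids by normalised NAME; keep names used more than once."""
    by_name = defaultdict(list)
    current = None  # id of the Individual record being read, if any
    for line in lines:
        parts = line.strip().split(" ", 2)
        if len(parts) < 2:
            continue
        if parts[0] == "0":
            current = parts[1] if len(parts) == 3 and parts[2] == "INDI" else None
        elif current and parts[0] == "1" and parts[1] == "NAME" and len(parts) == 3:
            # Drop the surname slashes, lower-case, and collapse spaces.
            key = " ".join(parts[2].replace("/", " ").lower().split())
            if current not in by_name[key]:
                by_name[key].append(current)
    return {name: ids for name, ids in by_name.items() if len(ids) > 1}


def id_number(xref):
    """Extract the numeric part of an id such as '@I91@'."""
    match = re.search(r"\d+", xref)
    return int(match.group()) if match else None


# Hypothetical sample with two candidate pairs and one unmatched person.
sample = [
    "0 @I1@ INDI", "1 NAME Johan /Example/",
    "0 @I2@ INDI", "1 NAME Mary /Sample/",
    "0 @I91@ INDI", "1 NAME Johan /Example/",
    "0 @I92@ INDI", "1 NAME Mary /Sample/",
    "0 @I93@ INDI", "1 NAME Ann /Other/",
]
groups = same_name_groups(sample)
gaps = {name: id_number(ids[-1]) - id_number(ids[0]) for name, ids in groups.items()}
print(groups, gaps)
```

A constant gap across many pairs would suggest a whole set was added in one go, but dates and relatives still need checking before deleting anything.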

Re: Source Fields + associated Notess, + possible duplicate branch

Posted: 03 Jul 2020 17:48
by tatewise
Lorna's advice about highest Record Id may work, but the records were added in Legacy so that may not be true.

When you run the Find Duplicate Individuals Plugin its Result Set lists the Record Id of each duplicate pair.
Could you post a screenshot of at least the first few duplicates in the Result Set so I can analyse them.
I suspect that the Father, Mother, Spouse, Child columns will be 0 indicating no duplicate family branch.

From your description of the All Relatives Diagram it sounds like the duplicates are not linked.

To discover the Relationship Pool each Individual belongs to run the Search For Orphans standard Query.
Any Individuals who only exist in their own unique Pool number are totally unrelated to anybody else.

Re: Source Fields + associated Notess, + possible duplicate branch

Posted: 03 Jul 2020 18:41
by Billread
Hi
Yes, I had noted the fact that all except 3 of the duplicates are 90 apart, but they aren't all at the end. If I do a sort on ID then I can read through the Individual list, and they match the same order for both sets, but then towards the end of the second set, newer records start appearing as well that are not all duplicates. So I need to be careful.

What is the Pool number you both mention?
screenshot attached.

I want to check the GRO entries I mentioned, which I can see in the Notes tab, but I cannot identify from the ID there which Individual in the Individuals tab each note belongs to?? If I check some quickly I can tell whether they have been added to the original ID or the duplicate??
Thanks
Duplicates.png
Bill

Re: Source Fields + associated Notess, + possible duplicate branch

Posted: 03 Jul 2020 19:22
by tatewise
A Relationship Pool number is assigned by FH to every Individual record.

All Individuals who are related to one another either directly or via marriage have the same Pool number.
Your main family tree of Individuals will be related, so will all have the same Pool number, usually 1.
Other groups of Individuals, unrelated to your main tree, will be assigned a different (usually higher) Pool number.
So, Individuals who are related to nobody will have their own unique Pool number.
Use View > Standard Queries > Search For Orphans to list all Individuals sorted by Pool number.
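As an illustration of the idea only, the grouping behind Pools can be sketched in Python by treating Individuals and Families as linked items and collecting everything that is connected. This assumes the standard GEDCOM link tags (FAMS, FAMC, HUSB, WIFE, CHIL); the function name and sample data are hypothetical, and the sketch does not try to reproduce the actual Pool numbers that FH assigns:

```python
LINK_TAGS = {"FAMS", "FAMC", "HUSB", "WIFE", "CHIL"}


def relationship_pools(lines):
    """Return lists of Individual ids, one list per connected group, largest first."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # shorten the chain as we walk up
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    individuals = []
    current = None  # id of the INDI or FAM record being read, if any
    for line in lines:
        parts = line.strip().split(" ", 2)
        if len(parts) < 2 or not parts[0].isdigit():
            continue
        if parts[0] == "0":
            current = None
            if len(parts) == 3 and parts[2] in ("INDI", "FAM"):
                current = parts[1]
                find(current)  # register the record even if it has no links
                if parts[2] == "INDI":
                    individuals.append(current)
        elif current and parts[0] == "1" and parts[1] in LINK_TAGS and len(parts) == 3:
            union(current, parts[2].strip())

    groups = {}
    for person in individuals:
        groups.setdefault(find(person), []).append(person)
    return sorted(groups.values(), key=len, reverse=True)


# Hypothetical sample: a family of three plus one person related to nobody.
sample = [
    "0 @I1@ INDI", "1 FAMS @F1@",
    "0 @I2@ INDI", "1 FAMS @F1@",
    "0 @I3@ INDI", "1 FAMC @F1@",
    "0 @I4@ INDI",
    "0 @F1@ FAM", "1 HUSB @I1@", "1 WIFE @I2@", "1 CHIL @I3@",
]
pools = relationship_pools(sample)
print(pools)
```

A group containing a single Individual corresponds to somebody related to nobody else.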

However, looking at your screenshot, most duplicates have scores for close relatives (Father, Mother, Spouse, Child).
So they must be in a duplicated family group.
If you select one duplicate Individual in that Result Set and produce an All Relatives Diagram you can see the group.
Select the other duplicate and do the same to see their family group.

Can we please focus on one issue at a time. Put the GRO entries on the back burner until we have sorted out duplicates, or forget about duplicates while we resolve the GRO issue, but not both at the same time.

Re: Source Fields + associated Notess, + possible duplicate branch

Posted: 03 Jul 2020 19:30
by Billread
I have 10 orphans, which I can see from doing an Everyone diagram. Is that a plugin?

But whether I click on what I believe is the original ID person or their 90+ ID copy, if I focus, both seem to be OK if I then do an Everyone diagram...

I checked some of the GRO notes by opening each in the Notes tab and seeing who it was, then looking for that person in the Individuals list. For those I checked, the original ID person seems correct.
Maybe I can select the 100 duplicates and just delete them?
Bill

Re: Source Fields + associated Notess, + possible duplicate branch

Posted: 03 Jul 2020 19:35
by Billread
So as it looks like the higher-numbered persons are the duplicates, how do I identify them as a set? There are 329 people in this GEDCOM, of which 100 are duplicates. If the 100 are linked and will show in an Everyone diagram if I select 1 of them (the duplicate), surely that diagram will differ by 129 people if I select the original of that person??
Takes a bit of thinking, this!!

Re: Source Fields + associated Notess, + possible duplicate branch

Posted: 03 Jul 2020 19:41
by Billread
Tried one person, and focused on him, and then his duplicate; an Everyone diagram gave the exact same statistics in both cases. ??

Re: Source Fields + associated Notess, + possible duplicate branch

Posted: 03 Jul 2020 20:01
by tatewise
An Everyone Diagram can be very confusing as it literally displays everyone, including all the duplicates, regardless of which Individual you select. That is why the statistics are always the same whoever you focus on.

I advised you to experiment with the All Relatives Diagram, which only includes relatives of the selected Individual.

Re: Source Fields + associated Notess, + possible duplicate branch

Posted: 03 Jul 2020 20:07
by Billread
You mentioned Orphans, so I ran that query, but it's not very clear what it shows. The first person is in a pool of 12, then it reduces until loads are in a pool of one, and none in a pool of zero.
attached screenshot
Orphans.png
Bill

PS I know from the diagram there are 10 people in the Everyone diagram who are completely alone... are they not orphans?

Re: Source Fields + associated Notess, + possible duplicate branch

Posted: 03 Jul 2020 20:22
by Billread
Well, finally, after removing the 10 floating records (I have always called them floaters), then tidying up a few more things, I now have an Everyone diagram where a big chunk of people appear twice. First at the top of the diagram, around 90 plus I would guess, with coloured links off to groups below. Then a bit further down the diagram they appear again, just as one large group, with no links to any other smaller groups, unlike the first set. So I assume these are the duplicate set.

I am aware I can select them in the diagram and delete the records. Can you think of a reason I should do it another way, or anything to check on?
thanks, I really appreciate the help
Bill

Re: Source Fields + associated Notess, + possible duplicate branch

Posted: 03 Jul 2020 20:30
by tatewise
The numbers are NOT the size of the Pool, but the Pool identification number.
So Charles DRUCE is a lone orphan in Pool number 12.
Ditto for the next seven in Pools 11 - 6 and also Pool 4.
There are two people in Pool 5.
But whether any of those are duplicates is not clear.

The WILLCOCKS family are in Pool 3.

The remainder are in Pool 2 or the largest Pool 1.

Go ahead and delete them as you presumably have a backup GEDCOM.

Re: Source Fields + associated Notess, + possible duplicate branch

Posted: 03 Jul 2020 21:10
by Billread
I think I came up with a good option. Having backed up the GEDCOM, I opened an Everyone Diagram, selected all the group I could see were duplicates, added them to a named list, and exported them as a separate GEDCOM, so Jenny can use these if something is wrong in the set retained. Then I deleted the named list contents. The GED exported is all the higher-numbered of 67 duplicates.
Then I realised from a clearer diagram that there was another group, of a father and 8 children. When I looked, he appears twice and has differing numbers of kids, so I merged him and sorted the kids, which were added to a new named list.
Went back a second time, ran the duplicates plugin, and found some 12 more duplicates, so selected the higher number again of each, added them to a new named list, exported a GEDCOM again, then deleted again.

Now I have a clean, clear gedcom, that still seems to have all GRO records etc mentioned.

And at least the exported geds contain any data if later found to be missing something.

Seems like a reasonable answer. Do you think I missed anything? And thanks for the help.
Bill

Re: Source Fields + associated Notess, + possible duplicate branch

Posted: 03 Jul 2020 21:33
by tatewise
Difficult to tell whether you missed anything as I cannot look over your shoulder :D

Re: Source Fields + associated Notess, + possible duplicate branch

Posted: 03 Jul 2020 21:35
by Billread
really?
felt like a tawny sitting on my right shoulder :lol: