* Check for possible duplicated Media 1.4

peterbel
Superstar
Posts: 348
Joined: 21 Nov 2014 20:24
Family Historian: V7
Location: Cornwall

Post by peterbel »

Ran this today and it came up with an error.
Unsure if it is relevant but this is the first time I have run it since putting the FH folder on to OneDrive.
[Attachment: image.png]
Tracing the Devon Bellamy family along with their partners.
tatewise
Megastar
Posts: 28436
Joined: 25 May 2010 11:00
Family Historian: V7
Location: Torbay, Devon, UK

Post by tatewise »

Sorry about that. It should not be affected by using OneDrive. It happens when loading a Media file's contents.
I don't think you have been involved in the prototype trials in the thread "Is there an easier way to use merge" (22115).

I have not yet added any special memory management to this plugin as I hoped it would not need any.

Do you have any very large Media files, e.g. video clips or audio clips?
If so, they may be the cause, and I will have to adjust the plugin to deal with them.
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry

Post by peterbel »

Thanks for the prompt response Mike.
Yes, I do have some video files in the Media folder, one of about 720,000 KB, so pretty big!

Post by tatewise »

I have modified the plugin temporarily to only load the first 100MB of any Media file.

Try the attached Check for Possible Duplicated Media plugin Version 1.4.1 Date 04 Dec 2023.

Does running that avoid the 'not enough memory' error message?

If that works then I need to think about how best to handle such large files.
Should the plugin ignore them completely?
Alternatively, should it check their file size and just the first 100MB against other files to detect possible duplicates, with the risk that there may be a difference beyond the first 100MB?
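
A rough sketch of that second option, written in Python for illustration rather than in the plugin's own Lua: it pairs the file size with an MD5 of at most the first 100MB, read in small blocks so the whole file is never held in memory. The function name, the block size and the choice of MD5 here are assumptions, not the plugin's actual code.

```python
import hashlib
import os

PREFIX_LIMIT = 100 * 1024 * 1024  # 100MB cap, the figure quoted in this post
BLOCK_SIZE = 1024 * 1024          # assumed read size: 1MB per block


def prefix_signature(path, limit=PREFIX_LIMIT):
    """Return (file size, MD5 hex digest of at most the first `limit` bytes)."""
    digest = hashlib.md5()
    remaining = limit
    with open(path, "rb") as handle:
        while remaining > 0:
            block = handle.read(min(BLOCK_SIZE, remaining))
            if not block:
                break  # reached end of file before the cap
            digest.update(block)
            remaining -= len(block)
    return os.path.getsize(path), digest.hexdigest()
```

Two files with equal signatures are only possible duplicates, because any difference beyond the cap goes unseen.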
Last edited by tatewise on 02 Feb 2024 15:58, edited 1 time in total.
Reason: Attachment deleted as 'Check for Possible Duplicate Media (FH7)' is in the Plugin Store.

Post by peterbel »

Thanks Mike. It ran without an error, finding 7 possible duplicates.

Post by tatewise »

OK, that suggests there is a maximum file size that can be handled, and I've confirmed with tests that it is about 400MB.

Attached is the Check for Possible Duplicated Media plugin Version 1.4.2 Date 05 Dec 2023 which reports any such Large File in the Result Set. Please give it a try with your large video files.

Such files will not be checked for possible duplicates.
Last edited by tatewise on 02 Feb 2024 15:59, edited 1 time in total.
Reason: Attachment deleted as 'Check for Possible Duplicate Media (FH7)' is in the Plugin Store.

Post by peterbel »

It ran without error and reported my large avi which presumably tripped up the earlier version.

As I want to proceed with caution, I tend to always choose Do Not Merge, then I can inspect the Result Set and decide on an action line by line. Perhaps it is my particular issue, but the majority of what it finds is because I have named a media file incorrectly (against my convention), recognised it, and then corrected it in Explorer but not everywhere else. I then have to hunt down where that is, which can be testing, even with your Where Used Record Links plugin.

Post by tatewise »

Peter, can you explain your file renaming issue in more detail?

Just renaming a Media file in Explorer would result in a broken link in the FH Media record.

What are you hunting down with Where Used Record Links?
Changing file links/names in a Media record has no impact on links from other records.

None of the above should result in duplicated file links in FH.

Post by peterbel »

Perhaps I have incorrectly described the reason??
Snapshot of my test using 1.4.2
[Attachment: Capture.JPG]
Looking in the folder with Explorer only the image with the John and Jane section exists, for that one line.

My convention, which I moved to after realising I needed one to sort on, is:
Year (If appropriate) Event Place Name(s).
So an example could be: 1779 Baptism Chagford Elizabeth Ballamy John Ballamy
This tells me that both Elizabeth and John feature on that one image, which of course should be linked to them both in FH, provided I have records for both of them. Often I find I have just one and more research is needed.

Post by tatewise »

Remember that the Media Record Titles are quite separate from the Linked Media File names.

The entries in your screenshot are Media Record Titles and their Record Id.
So it says Media Record 622 and Media Record 765 link to identical files in your Media folder.

You have to inspect those two Media Records to check whether the same file is linked or two separate but identical files are linked.

You need to decide whether you need both Media Records or if one is redundant.

Do they both have links from other records? If one has 0 Links then it is probably redundant.

Does one have the correct Title and the other does not? Maybe they should be merged.

If any of the above does not make sense to you then ask and I can post some helpful screenshots.

Post by peterbel »

Thanks for the explanation Mike. I think I was confusing Media Title and Media File Name.
Thanks also for the offer of more help
I will start investigating !

Post by peterbel »

Ugh! 1.4.2 is erroring when it is checking the avi file again ??

Post by tatewise »

Maybe either the max size of file accepted by the plugin needs to be reduced or more aggressive memory management applied. I'll investigate.

Post by peterbel »

In case it helps, I wound back to 1.4.1 and no error?

Post by tatewise »

V1.4.1 set the limit at 100MB whereas V1.4.2 allows up to 400MB and that may sometimes be too large.
Mark1834
Megastar
Posts: 2519
Joined: 27 Oct 2017 19:33
Family Historian: V7
Location: South Cheshire, UK

Post by Mark1834 »

Just a thought - I assume the plugin checks file size first, and only looks at detailed content if sizes are identical, so what are the chances of two very large video files with different content having exactly the same file sizes?

It can happen for relatively small jpg files (I've just checked my media folders out of curiosity), but not so sure about large files...
Mark Draper

Post by Mark1834 »

A quick check suggests that you are testing all files, even where sizes are different - I know that my media records need a bit of TLC, and the plugin took 96 seconds to locate 15 duplicates. However, a quick and dirty plugin just to compare media file size took just 6 seconds to locate 21 instances of files with identical sizes.

If you screen file size first, and save the detailed check only for files with identical size, won't you make the plugin much faster, and probably avoid the need to fully load very large files? Granted my plugin only runs in FH7, but if you are going to fork an existing plugin on the grounds that it has not been fully optimised for FH7, does that matter?
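
The size-first screening suggested here might look something like this minimal Python sketch (not the plugin's Lua, and not the quick plugin mentioned above, whose code is not shown in this thread). Files are grouped by size and only groups with two or more members are hashed. `find_duplicates` and `md5_of_file` are hypothetical names.

```python
import hashlib
import os
from collections import defaultdict


def md5_of_file(path, block_size=1024 * 1024):
    """Hash a file block by block so it is never loaded whole."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def find_duplicates(paths, hash_file=md5_of_file):
    """Return lists of paths whose size and hash both match."""
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    duplicates = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size cannot have a duplicate, so skip hashing
        by_hash = defaultdict(list)
        for path in same_size:
            by_hash[hash_file(path)].append(path)
        duplicates.extend(sorted(group) for group in by_hash.values() if len(group) > 1)
    return duplicates
```

A very large video with a unique size is never opened at all under this scheme; it would only be read if another file happened to have exactly the same size.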

Post by tatewise »

Yes, I have been thinking along similar lines but there are several wrinkles.

The design of the plugin is derived from the earlier Check For Possible Duplicate Media and uses a data file to hold previously checked Media file data, i.e. its MD5 encoding and Modification Date-Time stamp.
The idea is to save time reading every file each time the plugin is run.
That data file would need to be updated to include file Size but cater for it missing from previous users' data files.
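
One way the data file could gain a Size field while still accepting older entries, sketched in Python with a plain dictionary standing in for the data file. The entry layout (`md5`, `mtime`, `size`), the function name and the back-fill behaviour are all assumptions for illustration; the plugin's real data file format is not shown in this thread.

```python
import os


def lookup_or_refresh(cache, path, hash_file):
    """Return (md5, was_rehashed) for `path`, updating `cache` in place.

    `cache` maps a file path to {"md5": ..., "mtime": ..., "size": ...};
    entries saved by an older version are assumed to lack "size".
    `hash_file` is any callable that returns the hash of a file path.
    """
    info = os.stat(path)
    entry = cache.get(path)
    if entry is not None and entry.get("mtime") == info.st_mtime:
        if "size" not in entry:
            # Older entry: trust its hash and back-fill the missing size.
            entry["size"] = info.st_size
            return entry["md5"], False
        if entry["size"] == info.st_size:
            return entry["md5"], False
    # New file, changed timestamp, or changed size: read the file again.
    md5 = hash_file(path)
    cache[path] = {"md5": md5, "mtime": info.st_mtime, "size": info.st_size}
    return md5, True
```

Back-filling keeps the time saving for existing users; the stricter alternative is to treat a missing size as stale and rehash once.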

The plugin cannot rely on using LFS to get file attributes Modification Date-Time and Size because LFS does not support UTF8 file paths.

So I am considering two options.
1) Use FSO to copy each Media file to a non-UTF8 'safe' file and use LFS on that copy.
2) Use FSO to directly access file attributes Mod Date-Time and Size. FSO reports the Mod Date-Time in a different format from the LFS value held in the data file, but if the data file is rebuilt anyway to include file Size, that is not a problem.

The plugin will still have to cater for loading large files if they happen to have matching attributes and set a limit or add traps to avoid exhausting memory.

Also, the file 'duplication' is not necessarily because there are two copies of identical files.
It may also be where two (or more) Media records link to the same file.
Perhaps that should be detected by comparing Media record File Link paths.
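
Comparing File Link paths could be sketched like this in Python. The input is assumed to be a list of (Record Id, file path) pairs that the plugin has already collected; the function name and the path normalisation are illustrative assumptions, not the plugin's code.

```python
import os
from collections import defaultdict


def records_sharing_a_file(links):
    """Map each normalised path to the Record Ids linking to it, where more than one does."""
    by_path = defaultdict(list)
    for record_id, path in links:
        # normpath tidies "." and ".." segments; normcase folds case on Windows only.
        key = os.path.normcase(os.path.normpath(path))
        by_path[key].append(record_id)
    return {path: ids for path, ids in by_path.items() if len(ids) > 1}
```

This check needs no file reading at all, so it could run before any size or hash comparison.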

The consequence of the above ideas involves a radical redesign of the plugin, which I had hoped to avoid.
I only got involved in the first place because the earlier Check For Possible Duplicate Media failed for various scenarios and the author was unable to fix it so I offered a quick fix and the rest is history.

Mark, would you like to take on this plugin?

Post by Mark1834 »

It sounds like there is good scope for an FH7-optimised plugin using a different design and up-to-date tools. However, having a third plugin doing basically the same thing makes no sense at all.

At the moment, the Plugin Store offers Check for Possible Duplicated Media, which "replaces the earlier plugin Check for Possible Duplicate Media but fully supports FH V7", but also continues to offer the original plugin, which is described as compatible with FH5, 6 & 7, with no mention of any limit to its functionality.

Clearly it didn't concern Store Admin when the new plugin was first published, but what are users supposed to make of this apparent contradiction?

IMO, it needs a fully joined-up approach from the three plugin authors to unravel this Gordian knot:
  1. I am happy to produce a new FH7 version (Check for Possible Duplicate Media FH7), but I would do it in a completely different way, using fhFileUtils() to screen out all media files of unique size (so cannot be duplicated), then using Windows itself to create the hashes for the remaining files using fhShellExecute() to call a series of certutil -hashfile commands. To me it makes no sense making Lua jump through hoops to do something that Windows does easily and reliably. I'd need some reassurance that the Plugin Police would not object to this approach, as they have given mixed messages in the past, depending on who was the duty constable.
  2. The original plugin be descoped to just FH5 and FH6. If it's not fully compatible with FH7, it needs to say so. If it is, we don't need any other version.
  3. The newer fork, Check for Possible Duplicated Media, be deleted from the Store once a new FH7 version is produced, as it is then redundant.
For me, this happens together or not at all, so are all three authors happy with this approach (or of course, do you have a better idea?).
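
For point 1, a hedged Python sketch of delegating the hash to Windows. `certutil -hashfile <file> MD5` is a real Windows command, but the parsing below assumes its output contains one line made up solely of the hex digest (some Windows versions separate the bytes with spaces), and the helper names are hypothetical. A real plugin would launch the command from Lua, for example via fhShellExecute() as proposed above.

```python
import subprocess

HEX_DIGITS = set("0123456789abcdef")


def certutil_command(path, algorithm="MD5"):
    """Build the argument list for certutil's -hashfile verb."""
    return ["certutil", "-hashfile", path, algorithm]


def parse_certutil_output(text):
    """Return the first line that is purely hex digits (spaces ignored), in lowercase."""
    for line in text.splitlines():
        candidate = line.replace(" ", "").strip().lower()
        if len(candidate) >= 32 and set(candidate) <= HEX_DIGITS:
            return candidate
    raise ValueError("no hash line found in certutil output")


def windows_md5(path):
    """Run certutil (Windows only) and return the file's MD5 as lowercase hex."""
    completed = subprocess.run(
        certutil_command(path), capture_output=True, text=True, check=True
    )
    return parse_certutil_output(completed.stdout)
```

Combined with size screening, only the few files that share a size with another file would ever be passed to `windows_md5`.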
ColeValleyGirl
Megastar
Posts: 5509
Joined: 28 Dec 2005 22:02
Family Historian: V7
Location: Cirencester, Gloucestershire

Post by ColeValleyGirl »

I don't think the plugin store managers should remove plugins that might still be in use by users of FH5 or FH6, or edit submissions by plugin authors. And I'd say it's up to an author to decide whether their plugin has been superseded or not (there's even a checkbox to do so).

Post by tatewise »

The older plugin does not support UTF8 extended characters anywhere in the Media file path.
It also suffers from the very large file problem that has arisen recently.
It does not offer to automatically merge identical duplicate Media records, which users have requested.

If the proposed new plugin is just for FH V7 then that leaves FH V5 & V6 users with a less functional plugin.

Apart from fixing the older plugin issues my plugin offers to automatically merge identical duplicate Media.
The proposed new plugin should do the same so my new fork can be deleted.

Post by Mark1834 »

ColeValleyGirl wrote (06 Dec 2023 18:42): "I'd say it's up to an author to decide whether their plugin has been superseded or not (there's even a checkbox to do so)."
It's interesting looking back at the history. Jane's original plugin has been around since the days of FH5 and has had over 1600 downloads, but the last update was nearly three years ago, shortly after FH7 was released. Mike's fork appears to have first seen the light of day on FHUG shortly after that, in spring 2021, when Helen suggested that the two authors liaised to avoid potential confusion.

Little appeared to change until a few days ago, when Mike's fork was published in the store, and the new problems emerged.

I have absolutely no problems with any remaining FH5/6 users having to use a less functional older plugin, as it is their choice not to upgrade, but I don't think we can make any progress until we hear Jane's view on whether her original plugin is still actively supported in FH7, or has now been abandoned following publication of the fork.
Jane
Site Admin
Posts: 8518
Joined: 01 Nov 2002 15:00
Family Historian: V7
Location: Somerset, England

Post by Jane »

I cannot really see the problem, as it is not uncommon on extension stores to see multiple related/similar plugins.

I am happy to remove any or all of my plugins from the store, but I do not think it is Calico's place to remove plugins unless they are dangerous. My plugin was never intended to support Unicode, as it was written long before FH supported it, and it works fine for early versions and normal-sized files.
Jane
My Family History : My Photography "Knowledge is knowing that a tomato is a fruit. Wisdom is not putting it in a fruit salad."

Post by Mark1834 »

Thanks Jane - agree that overlapping plugins are not a problem (I've written one or two myself ;) ), but I think it needs to be clear to store users what the differences are.

Would you be happy to amend the store description of your original plugin to make it clear that it does not support non-Latin file names (I think that is a better description for general users than "Unicode") or large files such as video?

That would give users a clear choice between two alternative plugins with different but overlapping application.

Mike's description of the original plugin as "not fully supporting FH7" appears to be a bit of a red herring, as these limitations apply to FH6 as well.

Post by Mark1834 »

Jane has updated her original plugin description in the store to confirm that it is fully functional in FH5/6/7, apart from Unicode file names or very large files such as video. I will post a first draft alternative plugin in a separate thread, probably tomorrow, but there are issues with Mike's fork that are worth reporting now (although unlikely to be worth fixing as the plugin will be withdrawn shortly, and nobody appears to have downloaded it yet apart from me).

I ran all three plugins against a copy of my current main project for testing. I get the same record listing as Jane (but will have a more detailed report), but Mike's version appears to flag both false positives and false negatives.

1. The first line of the fork description says it detects "where the file names are different but the contents are the same", but appears to miss examples where I deliberately have duplicated records. This would typically be where unrelated baptisms or burials are recorded in the same image, so I have two separate copies of the same file, each named according to the relevant Principal, and date fields corresponding to the event date.

2. More puzzling is the plugin recording an alleged match between two Media Records containing completely separate files, which have different file sizes so cannot have identical contents and should not share an MD5 hash. The screen grabs show the plugin report, plus the two Property Box displays that show clearly that the files are completely different images.
[Attachments: Report.png, File 1.png, File 2.png]