
Ver 7.0.17.1 - Duplicate Media Prevention

Posted: 05 Nov 2022 16:54
by tatewise
FH V7.0.17.1 Updates say:
If you add a picture or other media item into a project, Family Historian will now check for duplicates, skip unnecessary file copies, and avoid creating a new Media record if there is already a Media record for the item in question.
However, it does not explain the criteria for detecting duplicates and there seems to be nothing in the Help.
There are four scenarios when adding Media and considering possible duplicates.
The file paths referred to here are the ones in the Media record File Link.
  1. Both files have different file paths and different contents, so are definitely NOT duplicates.
  2. Both files have different file paths but identical contents, so in some sense are duplicates.
  3. Both files have the same file path but different contents, so are definitely NOT duplicates.
  4. Both files have the same file path and identical contents, so are definite duplicates.
FH detects only the last case, Case 4, as a duplicate, and says nothing about it when such a file is added manually.
It just quietly skips the file copy and uses the existing Media record.
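
The four scenarios above can be sketched as a small classifier. This is only an illustration, not how FH does it: `classify` and its parameters are hypothetical names, the "link" arguments stand in for the Media record File Link, and the contents check uses Python's standard `filecmp` module.

```python
import filecmp
import os


def classify(new_link, new_file, existing_link, existing_file):
    """Return the scenario number (1 to 4) for a new item versus an existing one.

    new_link / existing_link: the File Link paths being compared.
    new_file / existing_file: where the actual bytes can be read from.
    """
    # normcase makes the comparison case-insensitive on Windows only.
    same_path = os.path.normcase(new_link) == os.path.normcase(existing_link)
    # shallow=False forces a comparison of the file contents.
    same_contents = filecmp.cmp(new_file, existing_file, shallow=False)
    if same_path:
        return 4 if same_contents else 3
    return 2 if same_contents else 1
```

On this reading, only a result of 4 is treated as a duplicate by FH; a result of 2 is what the plugin is still needed for.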

Case 3. leads to files with a suffix like (n) without FH giving any warning.

Case 2. can arise if the file is left outside FH or copied to a different folder than its existing duplicate.
So this case still needs the 'Check for Possible Duplicate Media' plugin to find such files.

The original trigger for this change was imported Projects where multiple Local Media Objects all referred to the same Media file (i.e. Case 4, above) and produced multiple identical Media Records and multiple file copies.
The change fixes that problem, but misses the opportunity to detect the other cases.

Re: Ver 7.0.17.1 - Duplicate Media Prevention

Posted: 05 Nov 2022 17:49
by NickWalker
The logic seems to be:
When opening a file, check which folder it is being copied to
If the folder already contains a file with the same name and the file contents are the same, then don't bother copying it in and use the existing media record instead.

It wouldn't spot if I copied the file into a different folder in the project.
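
A rough sketch of that apparent logic, purely as a guess from observed behaviour: `add_media` is a hypothetical name, and the exact "(n)" suffix format is an assumption based on the files described earlier in the thread.

```python
import filecmp
import os
import shutil
from typing import Tuple


def add_media(source: str, target_folder: str) -> Tuple[str, bool]:
    """Return (path to link the Media record to, whether a copy was made)."""
    name = os.path.basename(source)
    target = os.path.join(target_folder, name)
    if not os.path.exists(target):
        shutil.copy2(source, target)
        return target, True
    if filecmp.cmp(source, target, shallow=False):
        # Same name and same contents: skip the copy, reuse what is there.
        return target, False
    # Same name but different contents: copy under a free "name (n).ext".
    stem, ext = os.path.splitext(name)
    n = 1
    while os.path.exists(os.path.join(target_folder, f"{stem} ({n}){ext}")):
        n += 1
    candidate = os.path.join(target_folder, f"{stem} ({n}){ext}")
    shutil.copy2(source, candidate)
    return candidate, True
```

Because only `target_folder` is ever examined, an identical file sitting in any other folder goes unnoticed, which matches what is seen.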

I can fully understand why they've not gone further with this: they'd need to do a full search for duplicates whenever an image was added, which could take a very long time (some people have thousands of media records). Any lengthier process would have to be a separate 'search for duplicates', with warnings that it could take a long time. A plugin already does this, I think?

Re: Ver 7.0.17.1 - Duplicate Media Prevention

Posted: 05 Nov 2022 18:35
by Mark1834
This seems to be an “I wouldn’t design it this way if it were my app” point rather than a definite bug in FH? I tend to agree that expecting FH to screen all possibilities would be an unreasonable overhead, so the plugin remains the best option for users who may have an issue.

Perhaps performance on large media sets could be improved by omitting the resource-intensive contents comparison if the file sizes are different (so cannot be the same content).
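
As a minimal sketch of that idea (the function name `same_contents` is made up for illustration), the size check comes first and the byte-by-byte read only happens when the sizes match:

```python
import os


def same_contents(path_a, path_b, chunk_size=64 * 1024):
    """Return True if both files hold identical bytes."""
    # Cheap check first: files of different sizes cannot be identical.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    # Sizes match, so compare the contents one chunk at a time.
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            chunk_a = fa.read(chunk_size)
            chunk_b = fb.read(chunk_size)
            if chunk_a != chunk_b:
                return False
            if not chunk_a:  # both files ended at the same point
                return True
```

Python's own `filecmp.cmp` applies a similar size check before reading any contents.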

Re: Ver 7.0.17.1 - Duplicate Media Prevention

Posted: 05 Nov 2022 18:49
by tatewise
Mark has hit the nail on the head. FH does not need to compare files if they are different sizes.
Anyway, it could keep a cache of encoded file contents, just as the plugin did before FH V7 and as my substitute does.
Then checking all files would be very quick.

The lack of Help documentation does not help!

The Check for Possible Duplicate Media error (21125) thread suggested that this new FH V7 feature could make the plugin redundant - but no, not as it stands!

Re: Ver 7.0.17.1 - Duplicate Media Prevention

Posted: 05 Nov 2022 19:23
by NickWalker
Well, obviously it wouldn't compare them if they were different sizes, but it would still be a significant job to scan all of the potentially tens of thousands of files comparing file sizes, particularly on a network drive or non-SSD.

What I was trying to say is that a full system to do all this comparison and caching and dealing with various other issues that would occur is quite a significant development and not this relatively simple additional step added in a minor update.

Re: Ver 7.0.17.1 - Duplicate Media Prevention

Posted: 05 Nov 2022 20:48
by tatewise
As I said, FH would not need to actually scan any files at all if it held a cache of the relevant details.
For example: file path, size, and MD5 hex sum, just like the Check for Possible Duplicate Media plugin.
It could even be indexed by file size to provide an almost instant lookup. It is not rocket science.
The cache just needs a tiny update each time a file is added.
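
To illustrate the sort of cache meant here, a minimal in-memory sketch follows. The names `MediaCache` and `md5_hex` are hypothetical, this is not the plugin's actual code, and a real version would also need to persist the cache and cope with files changed outside FH.

```python
import hashlib
import os
from collections import defaultdict


def md5_hex(path, chunk_size=65536):
    """Return the MD5 hex digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


class MediaCache:
    """Holds (path, md5) entries, indexed by file size."""

    def __init__(self):
        self._by_size = defaultdict(list)

    def add(self, path):
        # The "tiny update" made each time a file is added.
        self._by_size[os.path.getsize(path)].append((path, md5_hex(path)))

    def find_duplicates(self, path):
        # Look up by size first; only hash the new file if a size matches.
        candidates = self._by_size.get(os.path.getsize(path))
        if not candidates:
            return []
        digest = md5_hex(path)
        return [p for p, d in candidates if d == digest]
```

Checking a new file then reads only that one file, and only when some cached entry has the same size.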

BTW: "this relatively simple additional step" has taken over 7 years to achieve since I reported it! I am not impressed. Sorry!