* Ver 7.0.17.1 - Duplicate Media Prevention

tatewise
Megastar
Posts: 27074
Joined: 25 May 2010 11:00
Family Historian: V7
Location: Torbay, Devon, UK

Post by tatewise » 05 Nov 2022 16:54

FH V7.0.17.1 Updates say:
If you add a picture or other media item into a project, Family Historian will now check for duplicates, skip unnecessary file copies, and avoid creating a new Media record if there is already a Media record for the item in question.
However, it does not explain the criteria for detecting duplicates and there seems to be nothing in the Help.
There are four scenarios when adding Media and considering possible duplicates.
The file paths referred to here are the ones in the Media record File Link.
  1. Both files have different file paths and different contents, so are definitely NOT duplicates.
  2. Both files have different file paths but identical contents, so in some sense are duplicates.
  3. Both files have the same file path but different contents, so are definitely NOT duplicates.
  4. Both files have the same file path and identical contents, so are definite duplicates.
FH detects only the last case (4) as a duplicate, and says nothing about it when such a file is added manually.
It just quietly does not copy the file and uses the existing Media record.
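
The four scenarios above can be written down as a small decision table. This is only an illustrative sketch of the classification described in this post, not FH code; the names `Case`, `classify`, and `fh_reuses_existing_record` are hypothetical.

```python
from enum import Enum


class Case(Enum):
    DIFFERENT = 1        # different file path, different contents
    SAME_CONTENTS = 2    # different file path, identical contents
    SAME_PATH = 3        # same file path, different contents
    EXACT_DUPLICATE = 4  # same file path, identical contents


def classify(same_path: bool, same_contents: bool) -> Case:
    """Map the two comparisons onto the four scenarios listed above."""
    if same_path:
        return Case.EXACT_DUPLICATE if same_contents else Case.SAME_PATH
    return Case.SAME_CONTENTS if same_contents else Case.DIFFERENT


def fh_reuses_existing_record(case: Case) -> bool:
    """As observed in this post, only case 4 is treated as a duplicate."""
    return case is Case.EXACT_DUPLICATE
```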

Case 3 leads to files with a suffix like (n), without FH giving any warning.

Case 2 can arise if the file is left outside FH or copied to a different folder from its existing duplicate.
So this case still needs the 'Check for Possible Duplicate Media' plugin to find such files.

The original trigger for this change was imported Projects where multiple Local Media Objects all referred to the same Media file (i.e. Case 4, above) and produced multiple identical Media Records and multiple file copies.
The change fixes that problem, but misses the opportunity to detect the other cases.
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry

NickWalker
Megastar
Posts: 2401
Joined: 02 Jan 2004 17:39
Family Historian: V7
Location: Lancashire, UK

Re: Ver 7.0.17.1 - Duplicate Media Prevention

Post by NickWalker » 05 Nov 2022 17:49

The logic seems to be:
When opening a file, check which folder it is being copied to.
If that folder already contains a file with the same name and the same contents, then don't bother copying it in; use the existing media record instead.
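
As a rough sketch of that guessed logic (not FH's actual code), the check might look like the following. The function name `add_media`, its return value, and the exact `name (n).ext` renaming scheme are assumptions for illustration; the thread only says that a suffix like (n) appears.

```python
import filecmp
import shutil
from pathlib import Path
from typing import Tuple


def add_media(source: Path, target_folder: Path) -> Tuple[Path, bool]:
    """Return (path of the file in the project, whether a copy was made)."""
    target = target_folder / source.name
    if target.exists():
        # Same name in the same folder: compare the contents byte by byte.
        if filecmp.cmp(source, target, shallow=False):
            return target, False  # duplicate: reuse the existing file/record
        # Same name but different contents: pick a free "name (n).ext".
        n = 1
        while True:
            candidate = target_folder / f"{source.stem} ({n}){source.suffix}"
            if not candidate.exists():
                target = candidate
                break
            n += 1
    shutil.copy2(source, target)
    return target, True
```

Note that nothing here looks in any other folder, which is why a copy placed elsewhere in the project goes unnoticed.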

It wouldn't spot if I copied the file into a different folder in the project.

I can fully understand why they've not gone further with this because they'd need to do a full search for duplicates whenever an image was added which could take a very long time (some people have thousands of media records). Any lengthier process would have to be a separate 'search for duplicates' with warnings that this could take a long time. A plugin already does this I think?
Nick Walker
Ancestral Sources Developer

https://fhug.org.uk/kb/kb-article/ancestral-sources/

Mark1834
Megastar
Posts: 2145
Joined: 27 Oct 2017 19:33
Family Historian: V7
Location: South Cheshire, UK

Re: Ver 7.0.17.1 - Duplicate Media Prevention

Post by Mark1834 » 05 Nov 2022 18:35

This seems to be an “I wouldn’t design it this way if it were my app” point rather than a definite bug in FH? I tend to agree that expecting FH to screen all possibilities would be an unreasonable overhead, so the plugin remains the best option for users who may have an issue.

Perhaps performance on large media sets could be improved by omitting the resource-intensive contents comparison if the file sizes are different (so cannot be the same content).
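
A minimal sketch of that size-first shortcut, assuming plain files on disk; `same_contents` and its `chunk_size` parameter are hypothetical names, not anything from FH or the plugin.

```python
import os


def same_contents(path_a, path_b, chunk_size=1 << 16) -> bool:
    """Compare two files, skipping the byte comparison when sizes differ."""
    # Cheap check first: files of different sizes cannot be identical.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    # Sizes match, so fall back to reading both files chunk by chunk.
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if a != b:
                return False
            if not a:  # both files ended at the same point
                return True
```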
Mark Draper

Post by tatewise » 05 Nov 2022 18:49

Mark has hit the nail on the head. FH does not need to compare files if they are different sizes.
Anyway, it could keep a cache of encoded file contents, just as the plugin did before FH V7 and as my substitute does now.
Then checking all files would be very quick.

The lack of Help documentation does not help!

The Check for Possible Duplicate Media error (21125) thread suggested that this new FH V7 feature could make the plugin redundant - but no, not as it stands!

Post by NickWalker » 05 Nov 2022 19:23

Well obviously, it wouldn't compare them if they were different sizes, but it would still be a significant job to scan all the potentially tens of thousands of files comparing file sizes, particularly on a network drive or non-SSD.

What I was trying to say is that a full system to do all this comparison and caching and dealing with various other issues that would occur is quite a significant development and not this relatively simple additional step added in a minor update.

Post by tatewise » 05 Nov 2022 20:48

As I said, FH would not need to actually scan any files at all if it held a cache of the relevant details.
e.g. file path, size, and MD5 hex sum, just like the Check for Possible Duplicate Media plugin.
It could even be indexed by file size to provide an almost instant lookup. It is not rocket science.
The cache just needs a tiny update each time a file is added.
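
To make the idea concrete, here is a hedged sketch of such a cache: each entry holds the file path and MD5 hex sum, and entries are indexed by file size. The class `MediaCache` and its methods are hypothetical; this is not how FH or the plugin is known to store anything, and a real version would also need to persist the cache and cope with files changed outside FH.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


class MediaCache:
    """In-memory index: file size -> list of (path, MD5 hex sum)."""

    def __init__(self):
        self._by_size = defaultdict(list)

    @staticmethod
    def _md5_hex(path):
        digest = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 16), b""):
                digest.update(chunk)
        return digest.hexdigest()

    def add(self, path):
        """The 'tiny update' done once, when a file is added."""
        path = Path(path)
        entry = (path, self._md5_hex(path))
        self._by_size[path.stat().st_size].append(entry)

    def find_duplicates(self, candidate):
        """Return cached paths whose contents match the candidate file."""
        candidate = Path(candidate)
        bucket = self._by_size.get(candidate.stat().st_size)
        if not bucket:
            return []  # no cached file of this size, so nothing to hash
        digest = self._md5_hex(candidate)
        return [p for p, d in bucket if d == digest and p != candidate]
```

Only the candidate file is read at lookup time, and only when some cached file has the same size.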

BTW: "this relatively simple additional step" has taken over 7 years to achieve since I reported it! I am not impressed. Sorry!