So, I’m self-hosting Immich. The issue is that we tend to take a lot of pictures of the same scene/thing so we can later pick the best one, and well, we can end up with 5~10 photos which are basically duplicates, but not quite.
Some duplicate-finding programs rate those images at 95% or higher similarity.

I’m wondering if there’s any way, probably at the filesystem level, for these near-identical images to be compressed together.
Maybe deduplication?
Have any of you guys handled a similar situation?

  • simplymath@lemmy.world · 10 months ago

    Compressed length is already known to be a powerful metric for classification tasks, but it means compressing pairs of items, so the cost grows polynomially with the size of your library. As much as I hate to admit it, you’re better off using a neural network, since embeddings are computed once per image (linear time), or figuring out how to apply the kernel trick to the metric outlined in the paper linked below (rough gzip sketch after the links).

    a formal paper on using compression length as a measure of similarity: https://arxiv.org/pdf/cs/0111054

    a blog post on this topic, applied to image classification: https://jakobs.dev/solving-mnist-with-gzip/
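
    If it helps to see it concretely, here’s a minimal sketch of the normalized compression distance (NCD) idea from the paper above, using gzip on decoded pixels (compressing the raw JPEG bytes directly doesn’t tell you much, since they’re already entropy-coded). The file names and the 256×256 grayscale downscale are just placeholder assumptions:

    ```python
    import gzip

    from PIL import Image  # Pillow, assumed available for decoding

    def raw_pixels(path: str) -> bytes:
        """Decode to a small grayscale raster so gzip sees pixel data, not JPEG-coded bytes."""
        return Image.open(path).convert("L").resize((256, 256)).tobytes()

    def ncd(x: bytes, y: bytes) -> float:
        """Normalized compression distance: ~0 for near-duplicates, ~1 for unrelated images."""
        cx, cy, cxy = len(gzip.compress(x)), len(gzip.compress(y)), len(gzip.compress(x + y))
        return (cxy - min(cx, cy)) / max(cx, cy)

    # Hypothetical file names -- any two shots from the same burst.
    a = raw_pixels("IMG_0001.jpg")
    b = raw_pixels("IMG_0002.jpg")
    print(f"NCD = {ncd(a, b):.3f}")
    ```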

        • simplymath@lemmy.world · 10 months ago

          Yeah. That’s what an MP4 does, but I was just saying that first you have to figure out which images are “close enough” to encode this way.
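
          As a rough sketch of that idea (not something Immich does today), once a burst has been grouped you could let a video codec store mostly the differences between the shots, e.g. by handing them to ffmpeg. The file pattern, frame rate, and CRF below are arbitrary placeholders:

          ```python
          import subprocess

          # Hypothetical burst of near-duplicates named IMG_0001.jpg, IMG_0002.jpg, ...
          # libx264 encodes mostly the differences between frames instead of each photo in full.
          # Requires ffmpeg on PATH; note libx264 also wants even pixel dimensions.
          subprocess.run(
              ["ffmpeg", "-y", "-framerate", "1", "-i", "IMG_%04d.jpg",
               "-c:v", "libx264", "-crf", "18", "burst.mp4"],
              check=True,
          )
          ```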

      • simplymath@lemmy.world · 10 months ago

        Yeah, I understand. But first you have to cluster your images so you know which ones are similar and can then do the deduplication, and a compression-based metric would be a powerful way to do that clustering. It’s just expensive compared to other clustering approaches.
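
        To make the tradeoff concrete, here’s a minimal sketch of that clustering step, assuming the gzip-based NCD from the paper above and a hand-picked distance threshold; the O(n²) pairwise comparisons are exactly the expensive part:

        ```python
        import gzip
        from itertools import combinations
        from pathlib import Path

        from PIL import Image  # Pillow, assumed available

        def raw_pixels(path: Path) -> bytes:
            """Decode to a small grayscale raster so gzip compares pixels, not JPEG-coded bytes."""
            return Image.open(path).convert("L").resize((128, 128)).tobytes()

        def ncd(x: bytes, y: bytes) -> float:
            """Normalized compression distance: ~0 for near-duplicates, ~1 for unrelated images."""
            cx, cy, cxy = len(gzip.compress(x)), len(gzip.compress(y)), len(gzip.compress(x + y))
            return (cxy - min(cx, cy)) / max(cx, cy)

        def cluster_bursts(folder: str, threshold: float = 0.3) -> list[set[Path]]:
            """Group images whose pairwise NCD is below `threshold` (value picked arbitrarily)."""
            paths = sorted(Path(folder).glob("*.jpg"))
            pixels = {p: raw_pixels(p) for p in paths}
            parent = {p: p for p in paths}  # union-find over image paths

            def find(p: Path) -> Path:
                while parent[p] != p:
                    parent[p] = parent[parent[p]]  # path halving
                    p = parent[p]
                return p

            # The expensive part: O(n^2) compressions over all pairs.
            for a, b in combinations(paths, 2):
                if ncd(pixels[a], pixels[b]) < threshold:
                    parent[find(a)] = find(b)

            groups: dict[Path, set[Path]] = {}
            for p in paths:
                groups.setdefault(find(p), set()).add(p)
            return [g for g in groups.values() if len(g) > 1]

        print(cluster_bursts("photos/"))
        ```

        In practice you’d probably only run something like this inside small candidate sets (e.g. photos taken within a few seconds of each other) rather than across the whole library.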