[IPOL discuss] Repository of test images

Fri Mar 15 16:23:30 CET 2013

> More than a page or repository, I think this dataset should be
> published as an article, like what has already been done for a 3D scan
> dataset: http://dx.doi.org/10.5201/ipol.2011.dalmm_ps

Yes, it seems that a complete article would be more useful, indeed.

> ... and aperture and lens and color balance and ... In fact, the best
> solution is to provide all the content of the meaningful original EXIF
> data. EXIF can probably be exported to a machine-readable XML file and
> a human-readable text format.

Well, for the images taken directly from the camera we have all that
information, but some of the images may be altered before they are
used as test images.
For example, the images used in the denoising demos were zoomed out
several times to remove noise from them, so many of the
characteristics of the original image have been transformed. Still, the
image was taken with some parameters that may be worth reporting
(hopefully without confusing the users): both the original
camera configuration and the transformations applied afterwards.
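As a minimal sketch of such a record, assuming a hypothetical XML layout (the element names `image`, `camera`, `processing` and `step`, and all example values, are illustrative and not an agreed IPOL format), the acquisition parameters and the later transformations could be stored side by side:

```python
import xml.etree.ElementTree as ET


def build_image_record(name, camera, transforms):
    """Build an XML record for one test image.

    Hypothetical layout, not an agreed IPOL schema:
    - camera: dict of original acquisition parameters (e.g. taken from EXIF)
    - transforms: ordered list of (operation, params) applied afterwards
    """
    root = ET.Element("image", name=name)

    # Original camera configuration, one child element per parameter.
    cam = ET.SubElement(root, "camera")
    for key, value in camera.items():
        ET.SubElement(cam, key).text = str(value)

    # Processing history, in the order the steps were applied.
    proc = ET.SubElement(root, "processing")
    for order, (operation, params) in enumerate(transforms, start=1):
        attrs = {"order": str(order), "operation": operation}
        attrs.update({k: str(v) for k, v in params.items()})
        ET.SubElement(proc, "step", attrs)
    return root


# Hypothetical example: an image zoomed out twice by a factor of 2.
record = build_image_record(
    "example.png",
    {"exposure_time": "1/125", "f_number": 5.6, "iso": 200},
    [("zoom_out", {"factor": 2}), ("zoom_out", {"factor": 2})],
)
print(ET.tostring(record, encoding="unicode"))
```

Keeping the processing steps ordered makes it explicit which EXIF values describe the capture and which no longer hold after the transformations.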

> Usually, I see both checksums used: SHA1SUM and MD5SUM, because both
> have some known weaknesses. On the other hand, these checksums
> identify the file, not the image. I am wondering if we could provide
> a checksum independent of the (lossless) file format
> (PNG and TIFF for example), but still easy to use for anyone.

According to Wikipedia, "SHA-1 appears to provide greater
resistance to attacks[citation needed], supporting the NSA’s assertion
that the change increased the security" (that passage compares SHA-1
with its predecessor SHA-0). Anyway, we only want to use the hash
function as a quick check of the file, so I think SHA-1 is enough.
And it is very simple for users to check a file, for example with
"shasum" or the coreutils "sha1sum" on a GNU/Linux system.

If we proposed a new format to verify the contents of the file, it would
be more difficult for users to verify the files. I think it should
be very quick and straightforward: you have an input PNG image and you
want to verify that it is exactly the same image used in IPOL. You just
compute the checksum of your file and compare it with the expected
value.
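The check described above amounts to hashing the raw bytes of the file and comparing hex strings. A minimal Python sketch using the standard `hashlib` module (the function names `sha1_of_file` and `verify_file` are illustrative, not part of any IPOL tool):

```python
import hashlib
import os
import tempfile


def sha1_of_file(path, chunk_size=1 << 16):
    """Return the hexadecimal SHA-1 digest of the file's raw bytes."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        # Read in chunks so large images do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_file(path, expected_hex):
    """True if the file's SHA-1 matches the published value (case-insensitive)."""
    return sha1_of_file(path) == expected_hex.strip().lower()


# Demonstration on a temporary file containing b"abc",
# whose SHA-1 is the standard test vector below.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
print(verify_file(tmp.name, "a9993e364706816aba3e25717850c26c9cd0d89d"))  # True
os.remove(tmp.name)
```

The digest is the same hex string printed as the first field by `sha1sum FILE`, so a single published value serves both command-line users and scripts.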

And if we looked at the contents, we would run into problems such as
identical intensity values but different ICC color profiles, and other
ugly issues. I'd prefer a simple SHA-1 checksum of the binary file
and that's all.
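To illustrate why a file checksum identifies the file rather than the image, the following self-contained sketch (standard library only; `gray_png` and `idat_payload` are hypothetical helpers) writes the same 2x2 gray image twice, once with an extra `tEXt` metadata chunk. The file SHA-1 values differ while the decompressed pixel stream is identical. It only parses files it wrote itself; a real content checksum would need a full PNG decoder (undoing scanline filters, handling every color type and bit depth) and would still have to decide what to do with ancillary chunks such as ICC profiles.

```python
import hashlib
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def chunk(kind, data):
    """Serialize one PNG chunk: length, type, data, CRC-32 of type+data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def gray_png(pixels, comment=None):
    """Encode rows of 8-bit gray values as a PNG, optionally with a tEXt comment."""
    height, width = len(pixels), len(pixels[0])
    # width, height, bit depth 8, color type 0 (gray), default methods.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # Each scanline starts with filter byte 0 (no filtering).
    raw = b"".join(b"\x00" + bytes(row) for row in pixels)
    out = PNG_SIGNATURE + chunk(b"IHDR", ihdr)
    if comment is not None:
        out += chunk(b"tEXt", b"Comment\x00" + comment.encode("latin-1"))
    out += chunk(b"IDAT", zlib.compress(raw)) + chunk(b"IEND", b"")
    return out


def idat_payload(png):
    """Return the decompressed IDAT stream (scanlines with filter bytes)."""
    pos, data = len(PNG_SIGNATURE), b""
    while pos < len(png):
        length, kind = struct.unpack(">I4s", png[pos:pos + 8])
        if kind == b"IDAT":
            data += png[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC
    return zlib.decompress(data)


pixels = [[0, 64], [128, 255]]
plain = gray_png(pixels)
tagged = gray_png(pixels, comment="IPOL test image")

print(hashlib.sha1(plain).hexdigest() == hashlib.sha1(tagged).hexdigest())  # False
print(idat_payload(plain) == idat_payload(tagged))  # True
```

The two files contain the same image but would need two different published checksums, which is the trade-off accepted by checksumming the binary file.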

> By the way, I started some discussions with datadryad.org, to use
> their infrastructure to store heavy datasets published in IPOL. Still
> discussing and considering the options for the moment.

The more distributed IPOL is, the better, clearly.