[IPOL discuss] Fast DCT using FFTW3
Nicolas Limare
nicolas.limare at cmla.ens-cachan.fr
Tue Feb 28 15:18:49 CET 2012
Hi Miguel,
Thanks for sharing the code, maybe you can put it somewhere in the
wiki?
> The code sniplet is here:
> http://dev.ipol.im/~colom/tools/sniplets/DCT_sniplet.c
I also used FFTW3 to perform a DCT for the Retinex code, and I
remember a few things from my attempts to get a faster program. They
may be useful to others too:
* The float variant is faster, for the simple reason that the data
takes less place in the CPU cache, hence less memory<->CPU
transfers. If 6 decimals are enough for you, which is probable if
your input data comes from a 8bit image, single precision is worth
trying.
* Instead of copying the data from u to FFT_in, I think you could
directly process a block of u with the advanced FFTW3 interface
-> http://fftw.org/fftw3_doc/Advanced-Real_002dto_002dreal-Transforms.html
This would save a possibly expensive copy.
* If you want to copy from u to FFT_in, you could use a memcpy instead
of the inner for() loop; this could be faster, nested loops are
often a big performance penalty. You could do something like
for (int j = 0; j < w; j++)
memcpy(FFT_in + j*w, u + (y+j)*Nx+x, w * sizeof(double));
* Still about loops, you could use a single loop instead of the two
external ones, something like
/* z is the upper left block corner indice */
for (int z = 0; z < Nx * (Ny - w), z++) {
if (z % Nx > Nx - w)
continue; /* skip borders */
for (int i = 0; i < w; i++)
memcpy(FFT_in + i*w, u + z+Nx*i, w * sizeof(double));
fftw_execute(fft_plan);
...
}
But without more comments, this code is harder to understand.
* Multiplications are faster than divisions, so you could do
double inv_2w = 1. / ( 2 * w);
for (...)
FFT_out[...] *= inv_2w;
Same thing for sqrt, you only need it once:
double inv_sqrt2 = 1. / sqrt(2);
for (...)
FFT_out[...] *= inv_sqrt2;
Some compiler may recognize these optimizations automatically, some
will not. This formula may also have a small impact on the rounding
error.
--
Nicolas LIMARE - CMLA - ENS Cachan http://www.cmla.ens-cachan.fr/~limare/
IPOL - image processing on line http://www.ipol.im/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 198 bytes
Desc: Digital signature
URL: <http://tools.ipol.im/mailman/archive/discuss/attachments/20120228/1ff9c228/attachment.pgp>
More information about the discuss
mailing list