[IPOL discuss] Fast DCT using FFTW3

Nicolas Limare nicolas.limare at cmla.ens-cachan.fr
Tue Feb 28 15:18:49 CET 2012


Hi Miguel, 

Thanks for sharing the code, maybe you can put it somewhere in the
wiki?

> The code sniplet is here:
> http://dev.ipol.im/~colom/tools/sniplets/DCT_sniplet.c

I also used FFTW3 to perform a DCT for the Retinex code, and I
remember a few things from my attempts to get a faster program. They
may be useful to others too:

* The float variant is faster, for the simple reason that the data
  takes less place in the CPU cache, hence less memory<->CPU
  transfers. If 6 decimals are enough for you, which is probable if
  your input data comes from a 8bit image, single precision is worth
  trying.

* Instead of copying the data from u to FFT_in, I think you could
  directly process a block of u with the advanced FFTW3 interface
  -> http://fftw.org/fftw3_doc/Advanced-Real_002dto_002dreal-Transforms.html
  This would save a possibly expensive copy.

* If you want to copy from u to FFT_in, you could use a memcpy instead
  of the inner for() loop; this could be faster, nested loops are
  often a big performance penalty. You could do something like

      for (int j = 0; j < w; j++)
          memcpy(FFT_in + j*w, u + (y+j)*Nx+x, w * sizeof(double));

* Still about loops, you could use a single loop instead of the two
  external ones, something like

      /* z is the upper left block corner indice */
      for (int z = 0; z < Nx * (Ny - w), z++) { 
          if (z % Nx > Nx - w)
              continue; /* skip borders */
          for (int i = 0; i < w; i++)
              memcpy(FFT_in + i*w, u + z+Nx*i, w * sizeof(double));
          fftw_execute(fft_plan);
          ...
      }

  But without more comments, this code is harder to understand.

* Multiplications are faster than divisions, so you could do

    double inv_2w = 1. / ( 2 * w);

    for (...)
        FFT_out[...] *= inv_2w;

  Same thing for sqrt, you only need it once:

    double inv_sqrt2 = 1. / sqrt(2);

    for (...)
       FFT_out[...] *= inv_sqrt2;

  Some compiler may recognize these optimizations automatically, some
  will not. This formula may also have a small impact on the rounding
  error.

-- 
Nicolas LIMARE - CMLA - ENS Cachan    http://www.cmla.ens-cachan.fr/~limare/
IPOL - image processing on line                          http://www.ipol.im/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 198 bytes
Desc: Digital signature
URL: <http://tools.ipol.im/mailman/archive/discuss/attachments/20120228/1ff9c228/attachment.pgp>


More information about the discuss mailing list