[IPOL discuss] Fast DCT using FFTW3

Pascal Getreuer getreuer at gmail.com
Tue Feb 28 19:19:10 CET 2012

Dear all,

I would like to add that DCT transforms can be used in certain cases to
perform convolution *even faster* than FFT-based convolution.  Thus having
good DCT code is very useful.

DCT transforms can be used to perform fast convolutions with symmetric
boundary handling under the condition that the filter is symmetric, e.g.,
the filter can be a Gaussian or a Laplacian.

With the FFT, convolution with symmetric boundary handling can be achieved
by first reflecting the signal to twice its length,

f1,f2,...,fN  →  f1,f2,...,fN,fN,...,f2,f1

In this way, the periodization implied by the FFT produces half-sample
symmetric boundaries.  A length-2N real-to-complex FFT transform can be
applied to obtain the transform as N+1 complex values.  Then do (complex)
multiplication with the transformed filter, perform the
complex-to-real inverse FFT to obtain a length-2N result, and take the
first N values to obtain the convolution.

With the DCT-II transform, the symmetric boundaries are already implied by
the transform.  To convolve a length-N signal, compute its DCT-II transform
to obtain N real values.  Then multiply the transform values with the DCT-I
(not DCT-II) transform of the filter.  Perform the inverse DCT-II transform
and you are done.  The key point is that we never needed arrays longer than
N floats to do it, so the convolution is more memory efficient.

1D symmetric convolution of length N
DCT-based: requires N floats
FFT-based: requires 2N+2 floats (N+1 complex values)

The advantages are even better in higher dimensions:

2D symmetric convolution of size N x N
DCT-based: requires N^2 floats
FFT-based: requires 4N^2 + 4N floats (2N x (N + 1) complex array)

3D symmetric convolution of size N x N x N
DCT-based: requires N^3 floats
FFT-based: requires 8N^3 + 4N^2 floats (2N x 2N x (N + 1) complex array)

For more details, I describe DCT-based convolution in the section
"Convolution with Boundary Handling" of


Also see the paper by Martucci, “Symmetric convolution and the discrete
sine and cosine transforms.” <http://dx.doi.org/10.1109/78.295213>


On Tue, Feb 28, 2012 at 11:04, Pascal Monasse <monasse at imagine.enpc.fr>wrote:

> Hi Miguel,
> > It even analyzes the numerical expressions and tries to compute it the
> > shorter and faster way.
> > For example, if y = f(y) + 2*f(y) + 3*f(y), it doesn't call f(y) three
> > times.
> I doubt that the compiler would dare such an aggressive optimization. f may
> have side effects that the compiler is not able to analyze, therefore it
> may
> think it is safer to call f(y) three times...
> Best,
> Pascal
> _______________________________________________
> discuss mailing list
> discuss at list.ipol.im
> http://tools.ipol.im/mailman/listinfo/discuss
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://tools.ipol.im/mailman/archive/discuss/attachments/20120228/ca6e2ba7/attachment.html>

More information about the discuss mailing list