[IPOL discuss] RGB->gray conversion and speed comparison

Nicolas Limare nicolas.limare at cmla.ens-cachan.fr
Fri Jul 29 11:43:37 CEST 2011


> I will make some timing tests to see if there is any benefit using
> integers. 

I tested various implementations of this RGB->Y conversion. My
benchmark code is attached, and is in a git repository:
    http://dev.ipol.im/git/?p=nil/rgby_bench.git

I used the cpucycles code[1] by D. J. Bernstein to measure the cost
of these operations in CPU cycles. This is the first time I have made
such precise measurements, so I may have made some mistakes.

The results for the two main compilers, gcc and icc, on my machine are
in the attached file report.txt. My interpretation is that integer
code is faster than floating-point code. The difference is even
clearer when you measure the cost of many identical operations
(BLOCK_SIZE > 1 in bench.c), which I think means that CPU pipelining
is more efficient for integer than for floating-point operations.
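
To illustrate what measuring "many identical operations" means, here
is a minimal sketch. It is not the attached bench.c: the names
y_block_int, y_block_float and time_block and the BLOCK_SIZE value are
hypothetical, the standard clock() stands in for cpucycles (so it
reports seconds, not cycles), and double is assumed for the
floating-point variant. It uses the truncated integer formula given
just below.

```c
#include <stddef.h>
#include <time.h>

#define BLOCK_SIZE 4096 /* pixels per timed block; hypothetical value */

typedef unsigned long (*y_kernel)(const unsigned char *rgb, size_t npix);

/* integer * + >>, shift truncated; returns the sum of all Y values */
static unsigned long y_block_int(const unsigned char *rgb, size_t npix)
{
    unsigned long acc = 0;
    size_t i;
    for (i = 0; i < npix; i++) {
        const unsigned char *p = rgb + 3 * i;
        acc += (6969ul * p[0] + 23434ul * p[1] + 2365ul * p[2]) >> 15;
    }
    return acc;
}

/* floating-point * + /, cast truncated; same weights, in double */
static unsigned long y_block_float(const unsigned char *rgb, size_t npix)
{
    unsigned long acc = 0;
    size_t i;
    for (i = 0; i < npix; i++) {
        const unsigned char *p = rgb + 3 * i;
        acc += (unsigned long) ((6969. * p[0] + 23434. * p[1]
                                 + 2365. * p[2]) / 32768.);
    }
    return acc;
}

/* Average time per pixel, in seconds, over `repeats` passes (> 0).
 * The results are accumulated into *sink so that the compiler cannot
 * discard the computation. */
static double time_block(y_kernel f, const unsigned char *rgb,
                         size_t npix, int repeats, unsigned long *sink)
{
    clock_t t0 = clock();
    int k;
    for (k = 0; k < repeats; k++)
        *sink += f(rgb, npix);
    return (double) (clock() - t0) / CLOCKS_PER_SEC
        / repeats / (double) npix;
}
```

Calling time_block() once with y_block_int and once with y_block_float
on the same buffer gives two per-pixel times to compare. clock() is
much coarser than a cycle counter, so the number of repeats has to be
large for the figures to mean anything.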

The libpng code
    Y = (6969 * R + 23434 * G + 2365 * B) >> 15
is always the fastest (sometimes tied), but the following variation is
often close in performance and has better numerical quality, because
it rounds instead of truncating:
    Y = (((6969 * R + 23434 * G + 2365 * B) >> 15)
         + (((6969 * R + 23434 * G + 2365 * B) & (1 << 14)) >> 14))
I am aware that this kind of expression, with bitwise & and shifts >>,
would need plenty of comments in the code to be clear to everyone.
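
As an example of how these two expressions could be written with
comments, here is a minimal sketch. The function and macro names are
hypothetical and this is not the io_png code; the third function is an
equivalent formulation added only for comparison. Storing the weighted
sum in a temporary presumably corresponds to the "with tmp" variant in
the report.

```c
#include <stdint.h>

/* Weights from the formulas above, scaled by 2^15 = 32768.
 * They sum to exactly 32768, so white (255,255,255) maps to 255. */
#define CR 6969u
#define CG 23434u
#define CB 2365u

/* Truncated: Y = floor(sum / 2^15), the 15 low bits are dropped. */
static uint8_t rgb_to_y_trunc(uint8_t r, uint8_t g, uint8_t b)
{
    uint32_t sum = CR * r + CG * g + CB * b;
    return (uint8_t) (sum >> 15);
}

/* Rounded: bit 14 is the most significant dropped bit. It is set
 * exactly when the dropped fraction is >= 1/2, so adding it to the
 * truncated value rounds to the nearest integer. */
static uint8_t rgb_to_y_round(uint8_t r, uint8_t g, uint8_t b)
{
    uint32_t sum = CR * r + CG * g + CB * b;
    return (uint8_t) ((sum >> 15) + ((sum & (1u << 14)) >> 14));
}

/* Same rounding, written as "add one half, then truncate". */
static uint8_t rgb_to_y_round_add(uint8_t r, uint8_t g, uint8_t b)
{
    uint32_t sum = CR * r + CG * g + CB * b;
    return (uint8_t) ((sum + (1u << 14)) >> 15);
}
```

For instance, with (R,G,B) = (0,1,0) the weighted sum is 23434, that is
about 0.715 once divided by 32768: the truncated version gives 0 and
the rounded versions give 1.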

Conclusion: integer operations are faster and can be used for a fast
and accurate implementation of the RGB->Y conversion. I think I will
modify io_png to use this version.

I am also interested in the results of this benchmark on your
machine. To generate a benchmark report, save the attached
rgby_bench.tar.gz and run
    tar xvzf rgby_bench.tar.gz
    cd rgby_bench
    make count                # shows the results for your default compiler
    make report > report.txt  # tests various compilers and options
and send me the report.txt file.

[1] http://ebats.cr.yp.to/cpucycles.html

> And I will ask libpng implementers why they used an integer
> approximation and truncation resulting in 50% wrong results.
> 
> PS: In the libpng source code, I observed that the actual
>     coefficients used by libpng are
>     Y = (6968 R + 23434 G + 2366 B) / 32768
> instead of the ones in the libpng documentation
>     Y = (6969 R + 23434 G + 2365 B) / 32768
> I will raise this issue on the libpng mailing-list.

libpng bug reported here:
https://sourceforge.net/tracker/?func=detail&aid=3381606&group_id=5624&atid=105624

-- 
Nicolas LIMARE - CMLA - ENS Cachan    http://www.cmla.ens-cachan.fr/~limare/
IPOL - image processing on line                          http://www.ipol.im/
-------------- next part --------------
#  Intel(R) Core(TM) i5 CPU       M 520  @ 2.40GHz
# /home/nil/.local/bin/gcc
# gcc -O2 -fomit-frame-pointer -march=native -mtune=native -c bench.c -o bench.o
# gcc  bench.o cpucycles.o -lm -o bench
21 cycles	integer * + /, division truncated
21 cycles	integer * + >>, shift rounded
21 cycles	integer * + >>, shift rounded with tmp
21 cycles	integer * + >>, shift truncated
24 cycles	floating-point * +, cast rounded
24 cycles	floating-point * +, cast truncated
24 cycles	integer * + floating-point /, cast rounded
39 cycles	floating-point * +, floor rounded
42 cycles	integer * + floating-point /, floor rounded
# icc -O2 -fomit-frame-pointer -xHost -fp-model precise -c bench.c -o bench.o
# icc  bench.o cpucycles.o -lm -o bench
21 cycles	integer * + /, division truncated
21 cycles	integer * + >>, shift rounded
21 cycles	integer * + >>, shift rounded with tmp
21 cycles	integer * + >>, shift truncated
24 cycles	floating-point * +, cast rounded
24 cycles	floating-point * +, cast truncated
24 cycles	integer * + floating-point /, cast rounded
27 cycles	floating-point * +, floor rounded
27 cycles	integer * + floating-point /, floor rounded
-------------- next part --------------
#  Intel(R) Core(TM) i5 CPU       M 520  @ 2.40GHz
# /home/nil/.local/bin/gcc
# gcc -O2 -fomit-frame-pointer -march=native -mtune=native -c bench.c -o bench.o
# gcc  bench.o cpucycles.o -lm -o bench
30 cycles	integer * + >>, shift truncated
33 cycles	floating-point * +, cast truncated
33 cycles	integer * + /, division truncated
33 cycles	integer * + floating-point /, cast rounded
33 cycles	integer * + >>, shift rounded
33 cycles	integer * + >>, shift rounded with tmp
36 cycles	floating-point * +, cast rounded
48 cycles	floating-point * +, floor rounded
54 cycles	integer * + floating-point /, floor rounded
# icc -O2 -fomit-frame-pointer -xHost -fp-model precise -c bench.c -o bench.o
# icc  bench.o cpucycles.o -lm -o bench
30 cycles	integer * + >>, shift truncated
51 cycles	integer * + /, division truncated
60 cycles	integer * + floating-point /, cast rounded
60 cycles	integer * + >>, shift rounded
60 cycles	integer * + >>, shift rounded with tmp
72 cycles	floating-point * +, cast truncated
78 cycles	floating-point * +, cast rounded
108 cycles	integer * + floating-point /, floor rounded
180 cycles	floating-point * +, floor rounded
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rgby_bench.tar.gz
Type: application/octet-stream
Size: 2541 bytes
Desc: not available
URL: <http://tools.ipol.im/mailman/archive/discuss/attachments/20110729/d9171985/attachment.obj>