avutil/pixelutils: faster pixelutils_sad_[au]_16x16
author Clément Bœsch <u@pkh.me>
Thu, 14 Aug 2014 20:30:55 +0000 (22:30 +0200)
committer Clément Bœsch <u@pkh.me>
Sat, 23 Aug 2014 08:18:53 +0000 (10:18 +0200)
commit 45c7f3997ea11c3d1007b2126b1c0049a8c27105
tree 4b197592039c1b5cfd458db0e12b604f94e0dca4
parent c82a288f8747a92278ba2e1a8c30380c18254bbd
~560 → ~500 decicycles

This is following the comments from Michael in
https://ffmpeg.org/pipermail/ffmpeg-devel/2014-August/160599.html

Using 2 registers for the accumulator didn't help. On the other hand,
reordering the movs relative to the psadbw instructions brought it from
~538 to ~500.
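For readers unfamiliar with psadbw, here is a minimal C sketch of what a 16x16 SAD computes and how a single-accumulator SSE2 loop maps onto it. Everything here is an illustration under stated assumptions, not the code from this commit: the function names are hypothetical, the argument order is assumed to mirror `av_pixelutils_sad_fn`, and an x86 target with SSE2 is assumed (`_mm_sad_epu8` is the intrinsic for psadbw). Grouping the loads ahead of the psadbw calls only mirrors the reordering idea; a C compiler is free to reschedule them, unlike the hand-written asm.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <emmintrin.h> /* SSE2: _mm_sad_epu8 compiles to psadbw */

/* Hypothetical scalar reference: sum of |src1 - src2| over a 16x16 block. */
static int sad_16x16_ref(const uint8_t *src1, ptrdiff_t stride1,
                         const uint8_t *src2, ptrdiff_t stride2)
{
    int sum = 0;
    for (int y = 0; y < 16; y++) {
        for (int x = 0; x < 16; x++)
            sum += abs(src1[x] - src2[x]);
        src1 += stride1;
        src2 += stride2;
    }
    return sum;
}

/* Hypothetical SSE2 sketch of an unaligned ("u") variant: one psadbw per
 * row, a single accumulator register, and the loads for two rows grouped
 * before the psadbw instructions that consume them. */
static int sad_16x16_sse2_u(const uint8_t *src1, ptrdiff_t stride1,
                            const uint8_t *src2, ptrdiff_t stride2)
{
    __m128i acc = _mm_setzero_si128();
    for (int y = 0; y < 16; y += 2) {
        /* Issue all four unaligned loads first. */
        __m128i a0 = _mm_loadu_si128((const __m128i *)src1);
        __m128i b0 = _mm_loadu_si128((const __m128i *)src2);
        __m128i a1 = _mm_loadu_si128((const __m128i *)(src1 + stride1));
        __m128i b1 = _mm_loadu_si128((const __m128i *)(src2 + stride2));
        /* psadbw yields one partial sum in each 64-bit half. */
        acc = _mm_add_epi64(acc, _mm_sad_epu8(a0, b0));
        acc = _mm_add_epi64(acc, _mm_sad_epu8(a1, b1));
        src1 += 2 * stride1;
        src2 += 2 * stride2;
    }
    /* Fold the high 64-bit half onto the low half and extract the total
     * (at most 16 * 16 * 255 = 65280, so 32 bits are enough). */
    acc = _mm_add_epi64(acc, _mm_srli_si128(acc, 8));
    return _mm_cvtsi128_si32(acc);
}
```

The scalar version is useful as a correctness oracle when experimenting with instruction order, since any reordering must leave the result bit-identical.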
libavutil/x86/pixelutils.asm