arm: vp9itxfm: Optimize 16x16 and 32x32 idct dc by unrolling
author Martin Storsjö <martin@martin.st>
Wed, 4 Jan 2017 11:08:51 +0000 (13:08 +0200)
committer Martin Storsjö <martin@martin.st>
Sat, 11 Mar 2017 11:14:48 +0000 (13:14 +0200)
commit 758302e4bc14e93989e7feb1135ec3f807c3310d
tree 10a16d328653707ab878de0f6ce61a2882a8e479
parent 045e33ae3fee74e39b1321dddf727eacb1ecf541

This work is sponsored by, and copyright, Google.

Before:                            Cortex A7      A8      A9     A53
vp9_inv_dct_dct_16x16_sub1_add_neon:   273.0   189.5   211.7   235.8
vp9_inv_dct_dct_32x32_sub1_add_neon:   752.0   459.2   862.2   553.9
After:
vp9_inv_dct_dct_16x16_sub1_add_neon:   226.5   145.0   225.1   171.8
vp9_inv_dct_dct_32x32_sub1_add_neon:   721.2   415.7   727.6   475.0

This is cherry-picked from libav commit
a76bf8cf1277ef6feb1580b578f5e6ca327e713c.

Signed-off-by: Martin Storsjö <martin@martin.st>
libavcodec/arm/vp9itxfm_neon.S
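
For readers unfamiliar with the "idct dc" case named in the subject line, the following is a minimal scalar C sketch of the general idea, not the NEON code changed by this commit. When only the DC coefficient is nonzero, the inverse transform reduces to adding one constant to every pixel of the block, so most of the cost is the row loop, and handling more rows per iteration reduces loop overhead. All function names here are hypothetical. The 11585 / 2^14 scaling and the final rounding shift of 6 are assumptions based on the usual VP9 fixed-point DCT conventions, and the unroll factor of two is chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp an int to the 8-bit pixel range. */
static uint8_t clip_pixel(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Scale the DC coefficient as two 1-D passes would: multiply by
 * 11585 / 2^14 (about cos(pi/4)) twice with rounding, then apply the
 * final rounding shift (assumed to be 6 for 16x16 and 32x32).
 * Relies on arithmetic right shift for negative values. */
static int dc_offset(int16_t dc)
{
    int t = (dc * 11585 + (1 << 13)) >> 14;
    t = (t * 11585 + (1 << 13)) >> 14;
    return (t + 32) >> 6;
}

/* Add the same offset to every pixel of one row. */
static void add_row(uint8_t *row, int size, int offset)
{
    for (int x = 0; x < size; x++)
        row[x] = clip_pixel(row[x] + offset);
}

/* Reference version: one row per loop iteration. */
static void idct_dc_add(uint8_t *dst, ptrdiff_t stride, int size, int16_t dc)
{
    int offset = dc_offset(dc);
    for (int y = 0; y < size; y++)
        add_row(dst + y * stride, size, offset);
}

/* Unrolled version: two rows per iteration, so the loop counter and
 * branch are executed half as often. */
static void idct_dc_add_unrolled(uint8_t *dst, ptrdiff_t stride, int size,
                                 int16_t dc)
{
    int offset = dc_offset(dc);
    assert(size % 2 == 0); /* holds for 16x16 and 32x32 */
    for (int y = 0; y < size; y += 2) {
        add_row(dst, size, offset);
        add_row(dst + stride, size, offset);
        dst += 2 * stride;
    }
}
```

Both versions produce identical output; the difference is only in how often the loop control runs, which is where in-order cores tend to gain the most from unrolling.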