aarch64: vp9lpf: Use dup+rev16+uzp1 instead of dup+lsr+dup+trn1
author     Martin Storsjö <martin@martin.st>
           Thu, 23 Feb 2017 21:33:58 +0000 (23:33 +0200)
committer  Martin Storsjö <martin@martin.st>
           Sat, 11 Mar 2017 11:14:50 +0000 (13:14 +0200)
commit     f32690a298badbf2df66319e9b38236ad3d3e321
tree       63a763f839fd8739fd459c72293b6aee9a0e674d
parent     3fbbad29847c79f422128ad88f174c53a5f6c449

This is one cycle faster in total, and three instructions fewer.

Before:
vp9_loop_filter_mix2_v_44_16_neon: 123.2
After:
vp9_loop_filter_mix2_v_44_16_neon: 122.2

This is cherry-picked from libav commit
3bf9c48320f25f3d5557485b0202f22ae60748b0.

Signed-off-by: Martin Storsjö <martin@martin.st>
libavcodec/aarch64/vp9lpf_neon.S