avcodec/aarch64/neon.S: Update neon.s for transpose_4x4H
author zjh8890 <243186085@qq.com>
Sat, 21 Nov 2015 16:07:35 +0000 (00:07 +0800)
committer Michael Niedermayer <michael@niedermayer.cc>
Mon, 14 Dec 2015 15:51:01 +0000 (16:51 +0100)
The transpose_4x4H macro is wrong: the r2 and r3 operands of the second trn1/trn2 pair are in the wrong order.
This bug cost me a lot of time to track down while I was writing AArch64 assembly that uses the macro.
(cherry picked from commit c18176bd551b4616757080376707637e30547fd0)

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
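
The effect of the operand order can be checked with a small scalar model. The sketch below is only an illustration, not part of the patch: it assumes the usual TRN1/TRN2 lane semantics (TRN1 interleaves the even-numbered lanes of its two sources, TRN2 the odd-numbered ones), and the helper names trn1, trn2, as_2s and first_output_row are hypothetical. It follows only the instructions that write \r4, \r7 and \r0 in the diff below.

```python
def trn1(n, m):
    """Model of TRN1: interleave the even-numbered lanes of n and m."""
    out = []
    for i in range(0, len(n), 2):
        out += [n[i], m[i]]
    return out


def trn2(n, m):
    """Model of TRN2: interleave the odd-numbered lanes of n and m."""
    out = []
    for i in range(1, len(n), 2):
        out += [n[i], m[i]]
    return out


def as_2s(v):
    """View four 16-bit lanes (.4H) as two 32-bit lanes (.2S)."""
    return [(v[0], v[1]), (v[2], v[3])]


def first_output_row(swapped):
    """Compute what ends up in r0 after the trn1 .4H / trn1 .2S steps.

    swapped=True models the old operand order (r3, r2);
    swapped=False models the patched order (r2, r3).
    """
    # Hypothetical input: row i of the 4x4 matrix lives in r<i>.
    r0 = ["a0", "a1", "a2", "a3"]
    r1 = ["b0", "b1", "b2", "b3"]
    r2 = ["c0", "c1", "c2", "c3"]
    r3 = ["d0", "d1", "d2", "d3"]

    r4 = trn1(r0, r1)                              # a0 b0 a2 b2
    r7 = trn1(r3, r2) if swapped else trn1(r2, r3)
    pairs = trn1(as_2s(r4), as_2s(r7))             # trn1 on the .2S view
    return [lane for pair in pairs for lane in pair]


if __name__ == "__main__":
    print("patched:", first_output_row(swapped=False))
    print("old:    ", first_output_row(swapped=True))
```

With the patched order the model yields column 0 of the input (a0 b0 c0 d0) in r0; with the old order the last two lanes come out exchanged (a0 b0 d0 c0), which matches the description of the bug above.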
libavcodec/aarch64/neon.S

index 619aec6..a227cbd 100644
 .macro  transpose_4x4H  r0, r1, r2, r3, r4, r5, r6, r7
         trn1            \r4\().4H,  \r0\().4H,  \r1\().4H
         trn2            \r5\().4H,  \r0\().4H,  \r1\().4H
-        trn1            \r7\().4H,  \r3\().4H,  \r2\().4H
-        trn2            \r6\().4H,  \r3\().4H,  \r2\().4H
+        trn1            \r7\().4H,  \r2\().4H,  \r3\().4H
+        trn2            \r6\().4H,  \r2\().4H,  \r3\().4H
         trn1            \r0\().2S,  \r4\().2S,  \r7\().2S
         trn2            \r3\().2S,  \r4\().2S,  \r7\().2S
         trn1            \r1\().2S,  \r5\().2S,  \r6\().2S