This prevents a call to bytestream_get_be16() from emitting a movzwl both
before and after the ror instruction, which is needlessly inefficient. ARM
uses the same trick.
Sintel decoding goes from 9.856 +/- 0.003 sec to 9.797 +/- 0.003 sec
(avg +/- SD).
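
For illustration only, a minimal standalone sketch of the two variants. The
names bswap16_narrow, bswap16_wide and bswap16_selfcheck are hypothetical and
not part of the tree, and the plain-C fallback is an assumption added so the
sketch also builds on non-x86 targets. With the wide variant the value stays
in a full-width register and only its low 16 bits are rotated, so the caller
is expected to pass a value whose upper bits are already zero.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define SKETCH_X86_ASM 1
#else
#define SKETCH_X86_ASM 0
#endif

/* Old shape: 16-bit argument and return type. The compiler may have to
 * zero-extend the value around the asm when the caller works on ints. */
static inline uint16_t bswap16_narrow(uint16_t x)
{
#if SKETCH_X86_ASM
    __asm__("rorw $8, %0" : "+r"(x));
    return x;
#else
    /* Portable fallback (assumption, for non-x86 builds of this sketch). */
    return (uint16_t)((x >> 8) | (x << 8));
#endif
}

/* New shape: full-width argument and return type. The %w0 operand modifier
 * selects the 16-bit name of the register, so rorw still rotates only the
 * low 16 bits and leaves the upper bits of x untouched. */
static inline unsigned bswap16_wide(unsigned x)
{
#if SKETCH_X86_ASM
    __asm__("rorw $8, %w0" : "+r"(x));
    return x;
#else
    /* Portable fallback mirroring rorw: swap the two low bytes and keep
     * the upper bits as they are. */
    return (x & ~0xffffu) | ((x & 0xffu) << 8) | ((x >> 8) & 0xffu);
#endif
}

/* Both variants agree for every input that fits in 16 bits. */
static void bswap16_selfcheck(void)
{
    unsigned v;

    for (v = 0; v <= 0xffffu; v++)
        assert(bswap16_wide(v) == bswap16_narrow((uint16_t)v));
}
```

The self-check only covers inputs that fit in 16 bits, which is the range a
16-bit big-endian read produces.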
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
#include "libavutil/attributes.h"
#define av_bswap16 av_bswap16
-static av_always_inline av_const uint16_t av_bswap16(uint16_t x)
+static av_always_inline av_const unsigned av_bswap16(unsigned x)
{
- __asm__("rorw $8, %0" : "+r"(x));
+ __asm__("rorw $8, %w0" : "+r"(x));
return x;
}