sws: implement MMX/SSE2/SSSE3/SSE4 versions for horizontal scaling.