ffmpeg.git
2 years agopthread_frame: do not run hwaccel decoding asynchronously unless it's safe
Anton Khirnov [Thu, 24 Nov 2016 14:14:22 +0000 (15:14 +0100)]
pthread_frame: do not run hwaccel decoding asynchronously unless it's safe

Certain hardware decoding APIs are not guaranteed to be thread-safe, so
having the user access decoded hardware surfaces while the decoder is
running in another thread can cause failures (this is mainly known to
happen with DXVA2).

For such hwaccels, only allow the decoding thread to run while the user
is inside a lavc decode call (avcodec_send_packet/receive_frame).
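
The restriction described above can be sketched as a hand-off in which the worker thread only makes progress while the caller is blocked inside the decode call. This is a minimal illustration, not the pthread_frame code: Worker, worker_start(), decode_call() and worker_stop() are hypothetical names, and the "decoding" is a stand-in multiplication.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not lavc's pthread_frame code. */
typedef struct Worker {
    pthread_t       thread;
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            has_work;   /* a packet was handed to the worker   */
    bool            finished;   /* the worker is done with that packet */
    bool            quit;
    int             input, output;
} Worker;

static void *worker_main(void *arg)
{
    Worker *w = arg;
    pthread_mutex_lock(&w->lock);
    for (;;) {
        /* Sleep until the caller enters decode_call() or asks us to quit. */
        while (!w->has_work && !w->quit)
            pthread_cond_wait(&w->cond, &w->lock);
        if (w->quit)
            break;
        w->output   = w->input * 2;   /* stand-in for touching hw surfaces */
        w->has_work = false;
        w->finished = true;
        pthread_cond_broadcast(&w->cond);
    }
    pthread_mutex_unlock(&w->lock);
    return NULL;
}

static int worker_start(Worker *w)
{
    w->has_work = w->finished = w->quit = false;
    w->input = w->output = 0;
    pthread_mutex_init(&w->lock, NULL);
    pthread_cond_init(&w->cond, NULL);
    return pthread_create(&w->thread, NULL, worker_main, w);
}

/* The "decode call": the worker only runs while the caller blocks here. */
static int decode_call(Worker *w, int packet)
{
    pthread_mutex_lock(&w->lock);
    w->input    = packet;
    w->finished = false;
    w->has_work = true;
    pthread_cond_broadcast(&w->cond);
    while (!w->finished)
        pthread_cond_wait(&w->cond, &w->lock);
    int out = w->output;
    pthread_mutex_unlock(&w->lock);
    return out;
}

static void worker_stop(Worker *w)
{
    pthread_mutex_lock(&w->lock);
    w->quit = true;
    pthread_cond_broadcast(&w->cond);
    pthread_mutex_unlock(&w->lock);
    pthread_join(w->thread, NULL);
    pthread_mutex_destroy(&w->lock);
    pthread_cond_destroy(&w->cond);
}
```

Once decode_call() has returned, the worker is parked in pthread_cond_wait(), so the caller can touch the returned surface without racing against it. The price is that decoding no longer overlaps with the caller's own work.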

Merges Libav commit d4a91e65.

Signed-off-by: wm4 <nfxjfg@googlemail.com>
Tested-by: Michael Niedermayer <michael@niedermayer.cc>
2 years agopthread_frame: ensure the threads don't run simultaneously with hwaccel
Anton Khirnov [Sat, 3 Dec 2016 14:21:40 +0000 (15:21 +0100)]
pthread_frame: ensure the threads don't run simultaneously with hwaccel

Merges Libav commit 8dfba25c.

Signed-off-by: wm4 <nfxjfg@googlemail.com>
2 years agopthread_frame: use better memory orders for frame progress
Wan-Teh Chang [Fri, 9 Dec 2016 17:54:47 +0000 (09:54 -0800)]
pthread_frame: use better memory orders for frame progress

This improves commit 59c70227405c214b29971e6272f3a3ff6fcce3d0.

In ff_thread_report_progress(), the fast code path can load
progress[field] with the relaxed memory order, and the slow code path
can store progress[field] with the release memory order. These changes
are mainly intended to avoid confusion when one inspects the source code.
They are unlikely to yield a measurable performance improvement.

ff_thread_report_progress() and ff_thread_await_progress() form a pair.
ff_thread_await_progress() reads progress[field] with the acquire memory
order (in the fast code path). Therefore, one expects to see
ff_thread_report_progress() write progress[field] with the matching
release memory order.

In the fast code path in ff_thread_report_progress(), the atomic load of
progress[field] doesn't need the acquire memory order because the
calling thread is trying to make the data it just decoded visible to the
other threads, rather than trying to read the data decoded by other
threads.

In ff_thread_get_buffer(), initialize progress[0] and progress[1] using
atomic_init().
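
The pairing of memory orders described above can be sketched with C11 atomics as follows. This is a simplified illustration under assumptions: Progress, progress_init(), report_progress() and await_progress() are hypothetical stand-ins for the per-frame progress[field] state and the ff_thread_*_progress() functions, and the initial value of -1 is chosen here only to mean "nothing decoded yet".

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for one progress[field] slot plus its lock. */
typedef struct Progress {
    atomic_int      value;
    pthread_mutex_t lock;
    pthread_cond_t  cond;
} Progress;

static void progress_init(Progress *p)
{
    /* atomic_init() is the proper way to give an atomic its first value. */
    atomic_init(&p->value, -1);
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->cond, NULL);
}

/* Producer: publish that rows up to n are decoded. */
static void report_progress(Progress *p, int n)
{
    /* Fast path: relaxed is enough, the producer is not trying to
     * observe data written by other threads here. */
    if (atomic_load_explicit(&p->value, memory_order_relaxed) >= n)
        return;

    pthread_mutex_lock(&p->lock);
    /* Release: everything decoded before this store becomes visible to a
     * consumer that reads the value with acquire. */
    atomic_store_explicit(&p->value, n, memory_order_release);
    pthread_cond_broadcast(&p->cond);
    pthread_mutex_unlock(&p->lock);
}

/* Consumer: wait until rows up to n are decoded. */
static void await_progress(Progress *p, int n)
{
    /* Fast path: acquire pairs with the release store above. */
    if (atomic_load_explicit(&p->value, memory_order_acquire) >= n)
        return;

    pthread_mutex_lock(&p->lock);
    /* Slow path: the mutex already orders these accesses. */
    while (atomic_load_explicit(&p->value, memory_order_relaxed) < n)
        pthread_cond_wait(&p->cond, &p->lock);
    pthread_mutex_unlock(&p->lock);
}
```

A decoding thread would call report_progress() after finishing each row, while a thread that needs reference data calls await_progress() before reading it.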

Signed-off-by: Wan-Teh Chang <wtc@google.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Merges Libav commit 343e2833.

Signed-off-by: wm4 <nfxjfg@googlemail.com>
2 years agopthread_frame: Unreference hw_frames_ctx on per-thread codec contexts
Mark Thompson [Thu, 3 Nov 2016 00:13:35 +0000 (00:13 +0000)]
pthread_frame: Unreference hw_frames_ctx on per-thread codec contexts

When decoding with threads enabled, the get_format callback will be
called with one of the per-thread codec contexts rather than with the
outer context.  If a hwaccel is in use too, this will add a reference
to the hardware frames context on that codec context, which will then
propagate to all of the other per-thread contexts for decoding.  Once
the decoder finishes, however, the per-thread contexts are not freed
normally, so these references leak.
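
The leak and the fix named in the subject line can be sketched with a toy reference-counted buffer. Everything here is hypothetical (RefBuf, ThreadCtx, live_buffers and the helper functions); the real code works on AVBufferRef through av_buffer_ref() and av_buffer_unref(), which this sketch only imitates.

```c
#include <stdlib.h>

/* Counts buffers that are still allocated, so a leak is observable. */
static int live_buffers = 0;

/* Toy stand-in for a reference-counted hardware frames context. */
typedef struct RefBuf {
    int refcount;
} RefBuf;

static RefBuf *refbuf_alloc(void)
{
    RefBuf *b = malloc(sizeof(*b));
    if (!b)
        return NULL;
    b->refcount = 1;
    live_buffers++;
    return b;
}

static RefBuf *refbuf_ref(RefBuf *b)
{
    b->refcount++;
    return b;
}

/* Drop one reference, clear the caller's pointer, free at zero. */
static void refbuf_unref(RefBuf **pb)
{
    RefBuf *b = *pb;
    *pb = NULL;
    if (b && --b->refcount == 0) {
        free(b);
        live_buffers--;
    }
}

/* Toy stand-in for a per-thread codec context. */
typedef struct ThreadCtx {
    RefBuf *hw_frames;
} ThreadCtx;

/* Each per-thread context takes its own reference. */
static void propagate_hw_frames(ThreadCtx *ctxs, int n, RefBuf *src)
{
    for (int i = 0; i < n; i++)
        ctxs[i].hw_frames = refbuf_ref(src);
}

/* The fix: explicitly drop those references at teardown. Without this
 * loop the refcount never reaches zero and the buffer is never freed. */
static void release_thread_contexts(ThreadCtx *ctxs, int n)
{
    for (int i = 0; i < n; i++)
        refbuf_unref(&ctxs[i].hw_frames);
}
```

If release_thread_contexts() is skipped, the owner's final refbuf_unref() leaves the count above zero and live_buffers stays at 1, which is the leak the commit message describes.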

Merges Libav commit fd0fae60.

Signed-off-by: wm4 <nfxjfg@googlemail.com>
2 years agopthread_frame: properly propagate the hw frame context across frame threads
Anton Khirnov [Mon, 7 Nov 2016 13:21:18 +0000 (14:21 +0100)]
pthread_frame: properly propagate the hw frame context across frame threads

Merges Libav commit 84f22568.

Signed-off-by: wm4 <nfxjfg@googlemail.com>
2 years agopthread_frame: use atomics for frame progress
Anton Khirnov [Sun, 17 Jul 2016 22:04:16 +0000 (00:04 +0200)]
pthread_frame: use atomics for frame progress

Merges Libav commit 59c70227.

Signed-off-by: wm4 <nfxjfg@googlemail.com>
2 years agopthread_frame: use atomics for PerThreadContext.state
Anton Khirnov [Sun, 17 Jul 2016 22:04:16 +0000 (00:04 +0200)]
pthread_frame: use atomics for PerThreadContext.state

Merges Libav commit 64a31b28.

Signed-off-by: wm4 <nfxjfg@googlemail.com>
2 years agoffmpeg: don't unnecessarily use a deprecated API function
wm4 [Thu, 16 Mar 2017 04:18:59 +0000 (05:18 +0100)]
ffmpeg: don't unnecessarily use a deprecated API function

Since we've disabled side data merging in ffmpeg.c, this really changes
nothing.

2 years agoavcodec, avformat: deprecate anything related to side data merging
wm4 [Thu, 16 Mar 2017 03:52:55 +0000 (04:52 +0100)]
avcodec, avformat: deprecate anything related to side data merging

This patch deprecates anything that has to do with merging/splitting
side data. Automatic side data merging (and splitting), as well as all
API symbols involved in it, are removed completely.

Two FF_API_ defines are dedicated to deprecating API symbols related to
this: FF_API_MERGE_SD_API removes av_packet_split/merge_side_data in
libavcodec, and FF_API_LAVF_KEEPSIDE_FLAG deprecates
AVFMT_FLAG_KEEP_SIDE_DATA in libavformat.

Since it was claimed that changing the default from merging side data to
not doing it is an ABI change, there are two additional FF_API_ defines,
which stop using the side data merging/splitting by default (and remove
any code in avformat/avcodec doing this): FF_API_MERGE_SD in libavcodec,
and FF_API_LAVF_MERGE_SD in libavformat.

It is very much intended that FF_API_MERGE_SD and FF_API_LAVF_MERGE_SD
are quickly defined to 0 in the next ABI bump, while the API symbols are
retained for a longer time for the sake of compatibility.
AVFMT_FLAG_KEEP_SIDE_DATA will (very much intentionally) do nothing for
most of the time it will still be defined. Keep in mind that no code
exists that actually tries to unset this flag for any reason, nor does
such code need to exist. Code setting this flag explicitly will work as
before. Thus it's ok for AVFMT_FLAG_KEEP_SIDE_DATA to do nothing once
side data merging has been removed from libavformat.

To keep anyone from doing this incorrectly in the future, here is a
small guide on how to update the internal code on bumps:

- next ABI bump (probably soon):
  - define FF_API_LAVF_MERGE_SD to 0, and remove all code covered by it
  - define FF_API_MERGE_SD to 0, and remove all code covered by it
- next API bump (typically two years in the future or so):
  - define FF_API_LAVF_KEEPSIDE_FLAG to 0, and remove all code covered
    by it
  - define FF_API_MERGE_SD_API to 0, and remove all code covered by it
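
How such an FF_API_ define gates code can be sketched as follows. The version macro, the threshold of 58 and the function are assumptions made for illustration; only the name FF_API_MERGE_SD comes from the message above.

```c
/* Hypothetical major version; a real library would define this in its
 * version header. */
#define MYLIB_VERSION_MAJOR 57

/* The define is true until the major version reaches the (assumed) bump
 * target. "Defining it to 0" at the bump means raising the major version
 * or hard-coding the value to 0. */
#ifndef FF_API_MERGE_SD
#define FF_API_MERGE_SD (MYLIB_VERSION_MAJOR < 58)
#endif

/* Reports whether the legacy merging path is still compiled in. */
static int legacy_merge_compiled_in(void)
{
#if FF_API_MERGE_SD
    /* Legacy path: removed from the source once the define is 0. */
    return 1;
#else
    return 0;
#endif
}
```

Because the check happens in the preprocessor, code covered by a define that evaluates to 0 is no longer built at all, which is why the guide says it can then be removed.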

This forces anyone who actually wants packet side data to temporarily
use deprecated API to get it all. If you ask me, this is batshit fucked
up crazy, but it's how we roll. Setting AVFMT_FLAG_KEEP_SIDE_DATA by
default was rejected as an ABI change, so I'm going all the way
to get rid of this once and for all.

Reviewed-by: James Almer <jamrial@gmail.com>
Reviewed-by: Rostislav Pehlivanov <atomnuker@gmail.com>
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
2 years agoadd signature filter for MPEG7 video signature
Gerion Entrup [Mon, 2 Jan 2017 01:08:57 +0000 (02:08 +0100)]
add signature filter for MPEG7 video signature

This filter does not implement all features of MPEG7. Missing features:
- compression of signature files
- work only on (cropped) parts of the video

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2 years agolavc/nvenc: misc cosmetics to reduce diff with Libav
Clément Bœsch [Mon, 20 Mar 2017 22:04:28 +0000 (23:04 +0100)]
lavc/nvenc: misc cosmetics to reduce diff with Libav

2 years agoMerge commit '70de2ea4261f860457a04e3d0c58c5543f403325'
Clément Bœsch [Mon, 20 Mar 2017 21:57:28 +0000 (22:57 +0100)]
Merge commit '70de2ea4261f860457a04e3d0c58c5543f403325'

* commit '70de2ea4261f860457a04e3d0c58c5543f403325':
  nvenc: Extended rate-control support as provided by SDK 7

This commit is a noop, see facc19ef06a753515a3fa604269dd1aa412dc08f

Merged-by: Clément Bœsch <u@pkh.me>
2 years agoMerge commit '358c887a9fa0fb2e7ce089eaea71ab924a3e47a7'
Clément Bœsch [Mon, 20 Mar 2017 21:56:01 +0000 (22:56 +0100)]
Merge commit '358c887a9fa0fb2e7ce089eaea71ab924a3e47a7'

* commit '358c887a9fa0fb2e7ce089eaea71ab924a3e47a7':
  nvenc: Add support for high bitdepth

This commit is a noop, see d1bf8a3aa878003f5019bb97c3228f8027e5d116

Merged-by: Clément Bœsch <u@pkh.me>
2 years agoMerge commit 'e02e2515b24bfc37ede6ca1744696230be55e50b'
Clément Bœsch [Mon, 20 Mar 2017 21:42:25 +0000 (22:42 +0100)]
Merge commit 'e02e2515b24bfc37ede6ca1744696230be55e50b'

* commit 'e02e2515b24bfc37ede6ca1744696230be55e50b':
  nvenc: Add some easier to understand presets that match x264 terminology

This commit is a noop, see a81b000a392e5c7119d2eddb3f4c90ab9f1e0554 and
faffff88c21c24765e5a3c87ffc657b191c4efc0.

Merged-by: Clément Bœsch <u@pkh.me>
2 years agoMerge commit '352741b5ead1543d775ccf6040f33023e4491186'
Clément Bœsch [Mon, 20 Mar 2017 21:38:14 +0000 (22:38 +0100)]
Merge commit '352741b5ead1543d775ccf6040f33023e4491186'

* commit '352741b5ead1543d775ccf6040f33023e4491186':
  nvenc: Make sure that enum and array index match

This commit is a noop, see a81b000a392e5c7119d2eddb3f4c90ab9f1e0554

Merged-by: Clément Bœsch <u@pkh.me>
2 years agoMerge commit '12004a9a7f20e44f4da2ee6c372d5e1794c8d6c5'
Clément Bœsch [Mon, 20 Mar 2017 21:28:38 +0000 (22:28 +0100)]
Merge commit '12004a9a7f20e44f4da2ee6c372d5e1794c8d6c5'

* commit '12004a9a7f20e44f4da2ee6c372d5e1794c8d6c5':
  audiodsp/x86: yasmify vector_clipf_sse
  audiodsp: reorder arguments for vector_clipf

Merged the version from Libav after a discussion with James Almer on
IRC:

19:22 <ubitux> jamrial: opinion on 12004a9a7f20e44f4da2ee6c372d5e1794c8d6c5?
19:23 <ubitux> it was apparently yasmified differently
19:23 <ubitux> (it depends on the previous commit arg shuffle)
19:24 <ubitux> i don't see the magic movsxdifnidn in your port btw
19:24 <ubitux> it's a port from 1d36defe94c7d7ebf995d4dbb4f878d06272f9c6
19:25 <jamrial> seems better thanks to said arg shuffle
19:25 <jamrial> the loop is the same, but init is simpler
19:25 <jamrial> probably worth merging
19:25 <ubitux> OK
19:25 <ubitux> thanks
19:26 <jamrial> curious they didn't make len ptrdiff_t after the previous bunch of commits, heh
19:26 <ubitux> yeah indeed

Both commits are merged at the same time to prevent a conflict with our
existing yasmified ff_vector_clipf_sse.

Merged-by: Clément Bœsch <u@pkh.me>
2 years agoconfigure: fix crystalhd detection
Clément Bœsch [Mon, 20 Mar 2017 18:45:48 +0000 (19:45 +0100)]
configure: fix crystalhd detection

Regression since 4563a86f011b54977b390c72ec3901cace35f8da.

See 20c4fb2e010fff7e3f8acd36ad132c0140fec5fb for more information.

Tested-by: Michael Niedermayer <michael@niedermayer.cc>
2 years agoMerge commit 'bf58545aace7d14522ce4fa680c7b3ff62109a3a'
Clément Bœsch [Mon, 20 Mar 2017 18:11:44 +0000 (19:11 +0100)]
Merge commit 'bf58545aace7d14522ce4fa680c7b3ff62109a3a'

* commit 'bf58545aace7d14522ce4fa680c7b3ff62109a3a':
  audiodsp: fix vector_clipf documentation

Merged-by: Clément Bœsch <u@pkh.me>
2 years agoMerge commit 'e9ef6171396dc4106526aaa86b620c61ca3d1017'
Clément Bœsch [Mon, 20 Mar 2017 18:10:56 +0000 (19:10 +0100)]
Merge commit 'e9ef6171396dc4106526aaa86b620c61ca3d1017'

* commit 'e9ef6171396dc4106526aaa86b620c61ca3d1017':
  checkasm: add tests for audiodsp

Merged-by: Clément Bœsch <u@pkh.me>
2 years agoMerge commit '2eb97af66af90ca3978229da151f0b8b3a5d9370'
Clément Bœsch [Mon, 20 Mar 2017 18:04:12 +0000 (19:04 +0100)]
Merge commit '2eb97af66af90ca3978229da151f0b8b3a5d9370'

* commit '2eb97af66af90ca3978229da151f0b8b3a5d9370':
  checkasm: add a test for blockdsp

Merged-by: Clément Bœsch <u@pkh.me>
2 years agolavc/arm: fix indent in blockdsp_init_neon
Clément Bœsch [Mon, 20 Mar 2017 18:01:25 +0000 (19:01 +0100)]
lavc/arm: fix indent in blockdsp_init_neon

2 years agoMerge commit 'eea9857bfd6925d0c34382c00b971ee6df12ad44'
Clément Bœsch [Mon, 20 Mar 2017 17:59:40 +0000 (18:59 +0100)]
Merge commit 'eea9857bfd6925d0c34382c00b971ee6df12ad44'

* commit 'eea9857bfd6925d0c34382c00b971ee6df12ad44':
  blockdsp: drop the high_bit_depth parameter

This commit is a noop, see 562ba4a827ceb9ed5b7d056484a9c2312a5458c5

Merged-by: Clément Bœsch <u@pkh.me>
2 years agoMerge commit '340f12f71207513672b5165d810cb6c8622c6b21'
Clément Bœsch [Mon, 20 Mar 2017 17:54:33 +0000 (18:54 +0100)]
Merge commit '340f12f71207513672b5165d810cb6c8622c6b21'

* commit '340f12f71207513672b5165d810cb6c8622c6b21':
  hwcontext_cuda: Add P010 and YUV444P16 pixel format

This commit is a noop, we already have P010 and P016.

18:52 <@BtbN> Adding AV_PIX_FMT_YUV444P16 won't hurt, but doesn't gain anything.
18:53 <@BtbN> I'd say just noop it. If we'll ever need it, it will be added in turn.

Merged-by: Clément Bœsch <u@pkh.me>
2 years agoMerge commit '75d98e30afab61542faab3c0f11880834653bd6b'
Clément Bœsch [Mon, 20 Mar 2017 17:44:00 +0000 (18:44 +0100)]
Merge commit '75d98e30afab61542faab3c0f11880834653bd6b'

* commit '75d98e30afab61542faab3c0f11880834653bd6b':
  audiodsp/x86: clear the high bits of the order parameter on 64bit

Merged-by: Clément Bœsch <u@pkh.me>
2 years agoMerge commit '1d6c76e11febb58738c9647c47079d02b5e10094'
Clément Bœsch [Mon, 20 Mar 2017 17:41:49 +0000 (18:41 +0100)]
Merge commit '1d6c76e11febb58738c9647c47079d02b5e10094'

* commit '1d6c76e11febb58738c9647c47079d02b5e10094':
  audiodsp/x86: fix ff_vector_clip_int32_sse2

No functional changes, only cosmetics. This issue was fixed in
9a9e2f1c8aa4539a261625145e5c1f46a8106ac2.

Merged-by: Clément Bœsch <u@pkh.me>
2 years agoMerge commit '07e1f99a1bb41d1a615676140eefc85cf69fa793'
Clément Bœsch [Mon, 20 Mar 2017 17:38:07 +0000 (18:38 +0100)]
Merge commit '07e1f99a1bb41d1a615676140eefc85cf69fa793'

* commit '07e1f99a1bb41d1a615676140eefc85cf69fa793':
  x86util: Document SBUTTERFLY macro

Merged-by: Clément Bœsch <u@pkh.me>
2 years agoMerge commit 'de64dd13cbd47fd54334b6aa2a2cd3c7c36daae2'
Clément Bœsch [Mon, 20 Mar 2017 17:37:00 +0000 (18:37 +0100)]
Merge commit 'de64dd13cbd47fd54334b6aa2a2cd3c7c36daae2'

* commit 'de64dd13cbd47fd54334b6aa2a2cd3c7c36daae2':
  avcodec: Add the extended pixel format profile for HEVC

This commit is a noop, see 5a41999d81459297183c4e27618e38f8ba719853

Merged-by: Clément Bœsch <u@pkh.me>
2 years agoMerge commit '136f55207521f0b03194ef5b55ba70f1635d6aee'
Clément Bœsch [Mon, 20 Mar 2017 17:34:06 +0000 (18:34 +0100)]
Merge commit '136f55207521f0b03194ef5b55ba70f1635d6aee'

* commit '136f55207521f0b03194ef5b55ba70f1635d6aee':
  mpegvideo_motion: Handle edge emulation even without unrestricted_mv

This commit is a noop, see 7b1e0beb2da31a0a8847bc9c68a87a120b71fa8a

Merged-by: Clément Bœsch <u@pkh.me>
2 years agoMerge commit '15fcf6292ed79be274c824fedb099c2665f4cc15'
Clément Bœsch [Mon, 20 Mar 2017 17:30:03 +0000 (18:30 +0100)]
Merge commit '15fcf6292ed79be274c824fedb099c2665f4cc15'

* commit '15fcf6292ed79be274c824fedb099c2665f4cc15':
  build: remove hardcoded name of version header

This commit is a noop, our version.sh is completely different.

Merged-by: Clément Bœsch <u@pkh.me>
2 years agoMerge commit '8c201dde0ab62e5cd581d958e78d7609e0ba710d'
Clément Bœsch [Mon, 20 Mar 2017 17:29:06 +0000 (18:29 +0100)]
Merge commit '8c201dde0ab62e5cd581d958e78d7609e0ba710d'

* commit '8c201dde0ab62e5cd581d958e78d7609e0ba710d':
  build: doc: more fine-grained dependencies for generated texi files

This commit is a noop, we have a different system for handling the
documentation.

Merged-by: Clément Bœsch <u@pkh.me>
2 years agoconfigure: error out if jni is enabled and cannot be found
Matthieu Bouron [Wed, 15 Mar 2017 14:23:34 +0000 (15:23 +0100)]
configure: error out if jni is enabled and cannot be found

2 years agoMerge commit 'bc7399934def210c2a84ea51375d50f79c676c96'
Clément Bœsch [Mon, 20 Mar 2017 15:53:56 +0000 (16:53 +0100)]
Merge commit 'bc7399934def210c2a84ea51375d50f79c676c96'

* commit 'bc7399934def210c2a84ea51375d50f79c676c96':
  libdc1394: Distinguish between enumeration errors and no cameras found

This commit is a noop, see 384251daffb98d88b0fe897b341bb68445f885de

Merged-by: Clément Bœsch <u@pkh.me>
2 years agoMerge commit 'df3795025337479a639cb3cd26c93a4e82ccd4db'
Clément Bœsch [Mon, 20 Mar 2017 15:47:41 +0000 (16:47 +0100)]
Merge commit 'df3795025337479a639cb3cd26c93a4e82ccd4db'

* commit 'df3795025337479a639cb3cd26c93a4e82ccd4db':
  rtsp: Fix a crash with the RTSP muxer

This commit is a noop, see f8a13c72132a65e34e05b878dc780ad330dd7371

Merged-by: Clément Bœsch <u@pkh.me>
2 years agoMerge commit 'bdf7610eb266fd3de650040c97328791868abd82'
Clément Bœsch [Mon, 20 Mar 2017 15:44:53 +0000 (16:44 +0100)]
Merge commit 'bdf7610eb266fd3de650040c97328791868abd82'

* commit 'bdf7610eb266fd3de650040c97328791868abd82':
  vf_scale_vaapi: Crop input surface to active region

Merged-by: Clément Bœsch <u@pkh.me>
2 years agoMerge commit '3a9662af6c741f8354b1ca97642f78f5c02e2e8f'
Clément Bœsch [Mon, 20 Mar 2017 15:42:50 +0000 (16:42 +0100)]
Merge commit '3a9662af6c741f8354b1ca97642f78f5c02e2e8f'

* commit '3a9662af6c741f8354b1ca97642f78f5c02e2e8f':
  vaapi_h264: Fix HRD bit_rate/cpb_size scaling

This commit is a noop, see 06d73d002e7f911f26ae1548b46e442a6ece9a4a

Merged-by: Clément Bœsch <u@pkh.me>
2 years agoMerge commit '7081620aca36e616ea96f71fd71d2703e3abae09'
Clément Bœsch [Mon, 20 Mar 2017 15:07:11 +0000 (16:07 +0100)]
Merge commit '7081620aca36e616ea96f71fd71d2703e3abae09'

* commit '7081620aca36e616ea96f71fd71d2703e3abae09':
  hwcontext_vdpau: Fix missing subscripts

This commit is a noop, see f7e9275f83ec116fc859367d61998eae8af438fc

Merged-by: Clément Bœsch <u@pkh.me>
2 years agoMerge commit '09a145b3c837273b1379321e44386a3233156e75'
Clément Bœsch [Mon, 20 Mar 2017 15:02:43 +0000 (16:02 +0100)]
Merge commit '09a145b3c837273b1379321e44386a3233156e75'

* commit '09a145b3c837273b1379321e44386a3233156e75':
  hwcontext_vdpau: Remove duplicate definition of GET_CALLBACK

Merged-by: Clément Bœsch <u@pkh.me>
2 years agoMerge commit 'de452e503734ebb0fdbce86e9d16693b3530fad3'
Clément Bœsch [Mon, 20 Mar 2017 12:47:29 +0000 (13:47 +0100)]
Merge commit 'de452e503734ebb0fdbce86e9d16693b3530fad3'

* commit 'de452e503734ebb0fdbce86e9d16693b3530fad3':
  pixblockdsp: Change type of stride parameters to ptrdiff_t

Merged-by: Clément Bœsch <u@pkh.me>
2 years agoavcodec/wmaprodec: reset offsets when error happens
Paul B Mahol [Mon, 20 Mar 2017 14:05:01 +0000 (15:05 +0100)]
avcodec/wmaprodec: reset offsets when error happens

Fixes #6250.

Signed-off-by: Paul B Mahol <onemda@gmail.com>
2 years agoconfigure: add stdint.h to x264 and xavs checks
Ricardo Constantino [Mon, 20 Mar 2017 14:10:34 +0000 (14:10 +0000)]
configure: add stdint.h to x264 and xavs checks

Regression from 4563a86f011b54977b390c72ec3901cace35f8da.
Both need stdint.h included before the respective x264.h and xavs.h.

Old require() used different, separate checks that didn't actually
need stdint.h to work. require2()'s (now require) check_func_headers()
does include stdint.h but only after the custom headers.

For libxavs this would also be fixed by Libav's commit
20abcaa273a6e77d0a2e1a98c643c73562c6f8f2, which hasn't been merged yet.

2 years agoavcodec/vp9: avx2 implementation of ipred_dl_16x16_16
Ilia [Sun, 12 Mar 2017 22:06:26 +0000 (05:06 +0700)]
avcodec/vp9: avx2 implementation of ipred_dl_16x16_16

vp9_diag_downleft_16x16_10bpp_c: 263.0
vp9_diag_downleft_16x16_10bpp_sse2: 44.7
vp9_diag_downleft_16x16_10bpp_ssse3: 32.5
vp9_diag_downleft_16x16_10bpp_avx: 31.9
vp9_diag_downleft_16x16_10bpp_avx2: 25.7
vp9_diag_downleft_16x16_12bpp_c: 264.7
vp9_diag_downleft_16x16_12bpp_sse2: 44.4
vp9_diag_downleft_16x16_12bpp_ssse3: 32.0
vp9_diag_downleft_16x16_12bpp_avx: 32.4
vp9_diag_downleft_16x16_12bpp_avx2: 25.5

Benchmarked with 10000 runs

Signed-off-by: Ilia <zakne0ne@gmail.com>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2 years agoh264pred: added AVX2 implementation for tm_vp8 16x16.
Mirage Abeysekara [Sat, 18 Mar 2017 19:50:53 +0000 (01:20 +0530)]
h264pred: added AVX2 implementation for tm_vp8 16x16.

checkasm --bench results with 5000 runs

pred16x16_tm_vp8_c: 302.8
pred16x16_tm_vp8_mmx: 101.4
pred16x16_tm_vp8_mmxext: 95.5
pred16x16_tm_vp8_sse2: 95.1
pred16x16_tm_vp8_avx2: 38.2

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2 years agowmavoice: remove unused or write-only variables.
Ronald S. Bultje [Thu, 22 Dec 2016 14:02:32 +0000 (09:02 -0500)]
wmavoice: remove unused or write-only variables.

2 years agoMerge commit 'ab3554e1a7c04a5ea30f9c905de92348478ef7c8'
Clément Bœsch [Mon, 20 Mar 2017 11:18:45 +0000 (12:18 +0100)]
Merge commit 'ab3554e1a7c04a5ea30f9c905de92348478ef7c8'

* commit 'ab3554e1a7c04a5ea30f9c905de92348478ef7c8':
  configure: Drop check_lib()/require() in favor of check_lib2()/require2()

Merged-by: Clément Bœsch <u@pkh.me>
2 years agoMerge commit '468bfe38c66d4d020984158e53b09a6a5749f394'
Clément Bœsch [Mon, 20 Mar 2017 11:08:11 +0000 (12:08 +0100)]
Merge commit '468bfe38c66d4d020984158e53b09a6a5749f394'

* commit '468bfe38c66d4d020984158e53b09a6a5749f394':
  ppc: mpegvideo: Add proper runtime AltiVec detection

Merged-by: Clément Bœsch <u@pkh.me>
2 years agoMerge commit '6ce93757ee6b81fe727bfdc9f546fd0ddf9139c3'
Clément Bœsch [Mon, 20 Mar 2017 11:05:34 +0000 (12:05 +0100)]
Merge commit '6ce93757ee6b81fe727bfdc9f546fd0ddf9139c3'

* commit '6ce93757ee6b81fe727bfdc9f546fd0ddf9139c3':
  ppc: Update #endif comments

This commit is mostly a noop as we seem to support PPC LE (see
902ce2a6c4364fd27ae3f1db78cd275caf79c006). Only the h264 chunks are
updated.

Merged-by: Clément Bœsch <u@pkh.me>
2 years agoMerge commit 'caccb3a0cdc7ee32cbed7eab156d35025133eadc'
Clément Bœsch [Mon, 20 Mar 2017 10:57:32 +0000 (11:57 +0100)]
Merge commit 'caccb3a0cdc7ee32cbed7eab156d35025133eadc'

* commit 'caccb3a0cdc7ee32cbed7eab156d35025133eadc':
  audiodsp: ppc: Add VSX variant

Merged-by: Clément Bœsch <u@pkh.me>
2 years agoMerge commit 'e89cef40506d990a982aefedfde7d3ca4f88c524'
Clément Bœsch [Mon, 20 Mar 2017 10:55:20 +0000 (11:55 +0100)]
Merge commit 'e89cef40506d990a982aefedfde7d3ca4f88c524'

* commit 'e89cef40506d990a982aefedfde7d3ca4f88c524':
  checkasm: Read the unsigned value as it should

Merged-by: Clément Bœsch <u@pkh.me>
2 years agoMerge commit '75d642a944d5579e4ef20ff3701422a64692afcf'
Clément Bœsch [Mon, 20 Mar 2017 10:51:57 +0000 (11:51 +0100)]
Merge commit '75d642a944d5579e4ef20ff3701422a64692afcf'

* commit '75d642a944d5579e4ef20ff3701422a64692afcf':
  vaapi_vp8: Explicitly include libva vp8 decode header
  vaapi_decode: Ignore the profile when not useful
  lavc/vaapi: Add VP8 decode hwaccel
  vp8: Add hwaccel hooks

This merge is a noop as these commits are already under review on the
mailing list. doc/libav-merge.txt is updated to track its progress.

Merged-by: Clément Bœsch <u@pkh.me>
2 years agoMerge commit '131a85a1fed9966bbd38517f76abfac0237e39dc'
Clément Bœsch [Mon, 20 Mar 2017 10:31:27 +0000 (11:31 +0100)]
Merge commit '131a85a1fed9966bbd38517f76abfac0237e39dc'

* commit '131a85a1fed9966bbd38517f76abfac0237e39dc':
  utvideo: Change type of array stride parameters to ptrdiff_t

Merged-by: Clément Bœsch <u@pkh.me>
2 years agoMerge commit '52730e0f867fe77b7d2353d8b44e92edb7079ca5'
Clément Bœsch [Mon, 20 Mar 2017 10:26:00 +0000 (11:26 +0100)]
Merge commit '52730e0f867fe77b7d2353d8b44e92edb7079ca5'

* commit '52730e0f867fe77b7d2353d8b44e92edb7079ca5':
  iir_filter: Change type of array stride parameters to ptrdiff_t

The merge also updates the MIPS code and drops the extra log.h include.

Merged-by: Clément Bœsch <u@pkh.me>
2 years agoMerge commit '6b52762951fa138eef59e2628dabb389e0500e40'
Clément Bœsch [Mon, 20 Mar 2017 10:10:46 +0000 (11:10 +0100)]
Merge commit '6b52762951fa138eef59e2628dabb389e0500e40'

* commit '6b52762951fa138eef59e2628dabb389e0500e40':
  error_resilience: Change type of array stride parameters to ptrdiff_t

Merged-by: Clément Bœsch <u@pkh.me>
2 years agoMerge commit 'ec903058447ad5be34d89533962e9ae1aa1c78f7'
Clément Bœsch [Mon, 20 Mar 2017 10:04:50 +0000 (11:04 +0100)]
Merge commit 'ec903058447ad5be34d89533962e9ae1aa1c78f7'

* commit 'ec903058447ad5be34d89533962e9ae1aa1c78f7':
  configure: Simplify clock_gettime() test

nanosleep check also updated.

Merged-by: Clément Bœsch <u@pkh.me>
2 years agoMerge commit '3aa9d37d03da3c9b482d19b3988659287815280e'
Clément Bœsch [Mon, 20 Mar 2017 10:00:07 +0000 (11:00 +0100)]
Merge commit '3aa9d37d03da3c9b482d19b3988659287815280e'

* commit '3aa9d37d03da3c9b482d19b3988659287815280e':
  build: Fix directory dependencies of tests/pixfmts.mak target

This might not be necessary given our mkdirs in the configure, but it
probably doesn't hurt.

Merged-by: Clément Bœsch <u@pkh.me>
2 years agoMerge commit '0e5dde739943168d6f61d3fb40b3f622e7abfeff'
Clément Bœsch [Mon, 20 Mar 2017 09:47:01 +0000 (10:47 +0100)]
Merge commit '0e5dde739943168d6f61d3fb40b3f622e7abfeff'

* commit '0e5dde739943168d6f61d3fb40b3f622e7abfeff':
  configure: Fix --disable-pod2man / --disable-texi2html

This commit is a noop, we have a dedicated documentation option for
this purpose.

Merged-by: Clément Bœsch <u@pkh.me>
2 years agoconfigure: remove pod2man from the config list
Clément Bœsch [Mon, 20 Mar 2017 09:45:18 +0000 (10:45 +0100)]
configure: remove pod2man from the config list

The configure has the --disable-manpages option for this purpose, and
--disable-pod2man is currently ignored due to that. This is also
consistent with the other documentation options.

2 years agoMerge commit 'b8c2d407efa41c3db6813ad67fadd51b814765bd'
Clément Bœsch [Mon, 20 Mar 2017 08:48:22 +0000 (09:48 +0100)]
Merge commit 'b8c2d407efa41c3db6813ad67fadd51b814765bd'

* commit 'b8c2d407efa41c3db6813ad67fadd51b814765bd':
  configure: Simplify libopenjpeg check

This commit is a noop, our libopenjpeg check is already "simpler".

Merged-by: Clément Bœsch <u@pkh.me>
2 years agoMerge commit '2610c9528f86286e4c6e174411a26ff5b4815cde'
Clément Bœsch [Mon, 20 Mar 2017 08:46:33 +0000 (09:46 +0100)]
Merge commit '2610c9528f86286e4c6e174411a26ff5b4815cde'

* commit '2610c9528f86286e4c6e174411a26ff5b4815cde':
  configure: Move initial VAAPI check to a more sensible place

This commit is a noop, see 17989dcf540c13a7122663f64c09dc830ffc3a41

Merged-by: Clément Bœsch <u@pkh.me>
2 years agoMerge commit '5b5ed92d92252a685e891a5d636870e223b63228'
Clément Bœsch [Mon, 20 Mar 2017 08:43:52 +0000 (09:43 +0100)]
Merge commit '5b5ed92d92252a685e891a5d636870e223b63228'

* commit '5b5ed92d92252a685e891a5d636870e223b63228':
  sanm: Change type of array pitch parameters to ptrdiff_t

Merged-by: Clément Bœsch <u@pkh.me>
2 years agolavc/copy_block: style fix
Clément Bœsch [Mon, 20 Mar 2017 08:23:15 +0000 (09:23 +0100)]
lavc/copy_block: style fix

2 years agoMerge commit '73f5e17a203713c4ac4e5a821809823b383b195f'
Clément Bœsch [Mon, 20 Mar 2017 08:22:36 +0000 (09:22 +0100)]
Merge commit '73f5e17a203713c4ac4e5a821809823b383b195f'

* commit '73f5e17a203713c4ac4e5a821809823b383b195f':
  copy_block: Change type of array stride parameters to ptrdiff_t

Merged-by: Clément Bœsch <u@pkh.me>
2 years agoMerge commit '21e500ba647aec233d5930d3d1081489d0d53ceb'
Clément Bœsch [Mon, 20 Mar 2017 08:17:34 +0000 (09:17 +0100)]
Merge commit '21e500ba647aec233d5930d3d1081489d0d53ceb'

* commit '21e500ba647aec233d5930d3d1081489d0d53ceb':
  svq1dec: Change type of array pitch parameters to ptrdiff_t

Merged-by: Clément Bœsch <u@pkh.me>
2 years agoMerge commit '746c56b7730ce09397d3a8354acc131285e9d829'
Clément Bœsch [Mon, 20 Mar 2017 08:07:57 +0000 (09:07 +0100)]
Merge commit '746c56b7730ce09397d3a8354acc131285e9d829'

* commit '746c56b7730ce09397d3a8354acc131285e9d829':
  indeo: Change type of array pitch parameters to ptrdiff_t

Merged-by: Clément Bœsch <u@pkh.me>
2 years agoMerge commit '4fb311c804098d78e5ce5f527f9a9c37536d3a08'
Clément Bœsch [Mon, 20 Mar 2017 07:52:07 +0000 (08:52 +0100)]
Merge commit '4fb311c804098d78e5ce5f527f9a9c37536d3a08'

* commit '4fb311c804098d78e5ce5f527f9a9c37536d3a08':
  Drop memalign hack

Merged, as this may indeed be unneeded since
46e3936fb04d06550151e667357065e3f646da1a.

Merged-by: Clément Bœsch <u@pkh.me>
2 years agoMerge commit 'f01f7a7846529b7c3ef343f117eaa2c0a1457af0'
Clément Bœsch [Mon, 20 Mar 2017 07:37:40 +0000 (08:37 +0100)]
Merge commit 'f01f7a7846529b7c3ef343f117eaa2c0a1457af0'

* commit 'f01f7a7846529b7c3ef343f117eaa2c0a1457af0':
  hwcontext_dxva2: use the special UC copy for downloading frames

Merged-by: Clément Bœsch <u@pkh.me>
2 years agoMerge commit 'd7bc52bf456deba0f32d9fe5c288ec441f1ebef5'
Clément Bœsch [Mon, 20 Mar 2017 07:30:42 +0000 (08:30 +0100)]
Merge commit 'd7bc52bf456deba0f32d9fe5c288ec441f1ebef5'

* commit 'd7bc52bf456deba0f32d9fe5c288ec441f1ebef5':
  imgutils: add a function for copying image data from GPU mapped memory

Merged-by: Clément Bœsch <u@pkh.me>
2 years agoMerge commit '24da430324735f95880c4a4a54298dc8023125bb'
Clément Bœsch [Mon, 20 Mar 2017 07:26:09 +0000 (08:26 +0100)]
Merge commit '24da430324735f95880c4a4a54298dc8023125bb'

* commit '24da430324735f95880c4a4a54298dc8023125bb':
  Changelog: mark the release 12 branch

This commit is a noop.

Merged-by: Clément Bœsch <u@pkh.me>
2 years agoMerge commit '851960f6f8cf1f946fe42fa36cf6598fac68072c'
Clément Bœsch [Mon, 20 Mar 2017 07:25:01 +0000 (08:25 +0100)]
Merge commit '851960f6f8cf1f946fe42fa36cf6598fac68072c'

* commit '851960f6f8cf1f946fe42fa36cf6598fac68072c':
  lavc: Remove old vaapi decode infrastructure
  avconv_vaapi: Convert to use hw_frames_ctx only
  vaapi_mpeg4: Convert to use the new VAAPI hwaccel code
  vaapi_vc1: Convert to use the new VAAPI hwaccel code
  vaapi_mpeg2: Convert to use the new VAAPI hwaccel code
  vaapi_h264: Convert to use the new VAAPI hwaccel code
  lavc: Rewrite VAAPI decode infrastructure

This merge is a noop, these commits have already been cherry-picked.

Merged-by: Clément Bœsch <u@pkh.me>
2 years agoMerge commit '72eba6558ee4f10239ba3f472c0b033ec70082a7'
Clément Bœsch [Mon, 20 Mar 2017 07:21:09 +0000 (08:21 +0100)]
Merge commit '72eba6558ee4f10239ba3f472c0b033ec70082a7'

* commit '72eba6558ee4f10239ba3f472c0b033ec70082a7':
  wmavoice: Simplify GetBitContext initialization

This commit is a noop. We don't have that code anymore since
3deb4b54a24f8cddce463d9f5751b01efeb976af.

Merged-by: Clément Bœsch <u@pkh.me>
2 years agoMerge commit '80fc75d51e3312e1890591048eb6a3d499b6e49d'
Clément Bœsch [Mon, 20 Mar 2017 07:19:03 +0000 (08:19 +0100)]
Merge commit '80fc75d51e3312e1890591048eb6a3d499b6e49d'

* commit '80fc75d51e3312e1890591048eb6a3d499b6e49d':
  Changelog: Mention mov with multiple stsd

Merged-by: Clément Bœsch <u@pkh.me>
2 years agoMerge commit '728e80cd2e1d4b7c3e26489efcd77bd7a9e84a99'
Clément Bœsch [Mon, 20 Mar 2017 07:17:09 +0000 (08:17 +0100)]
Merge commit '728e80cd2e1d4b7c3e26489efcd77bd7a9e84a99'

* commit '728e80cd2e1d4b7c3e26489efcd77bd7a9e84a99':
  High Definition Compatible Digital (HDCD) decoder filter, using libhdcd

This commit is a noop, we have that code natively.

Merged-by: Clément Bœsch <u@pkh.me>
2 years agoMerge commit '95f80293456d9d4b1b096621260c38bc90325ec0'
Clément Bœsch [Mon, 20 Mar 2017 07:12:57 +0000 (08:12 +0100)]
Merge commit '95f80293456d9d4b1b096621260c38bc90325ec0'

* commit '95f80293456d9d4b1b096621260c38bc90325ec0':
  avprobe: Fix memory leak

This commit is a noop, ffprobe is not affected.

Merged-by: Clément Bœsch <u@pkh.me>
2 years agodoc/APIchanges: fill date & hash for AV_PIX_FMT_FLAG_BAYER
Clément Bœsch [Mon, 20 Mar 2017 07:10:54 +0000 (08:10 +0100)]
doc/APIchanges: fill date & hash for AV_PIX_FMT_FLAG_BAYER

2 years agoMerge commit '8db804e8f549d5b86a1edf62736e0ef80f160da9'
Clément Bœsch [Mon, 20 Mar 2017 07:09:15 +0000 (08:09 +0100)]
Merge commit '8db804e8f549d5b86a1edf62736e0ef80f160da9'

* commit '8db804e8f549d5b86a1edf62736e0ef80f160da9':
  mov: Remove old b-frame/video delay heuristic

This commit is a noop, see 425be3c810e019c7a1298be7219536fa28f7ba49

Merged-by: Clément Bœsch <u@pkh.me>
2 years agoMerge commit 'eb96505b761eb02b6a3efc76d854afa6a41941ff'
Clément Bœsch [Mon, 20 Mar 2017 07:08:31 +0000 (08:08 +0100)]
Merge commit 'eb96505b761eb02b6a3efc76d854afa6a41941ff'

* commit 'eb96505b761eb02b6a3efc76d854afa6a41941ff':
  mov: Remove ancient heuristic hack

This commit is a noop, see 04f8d312877ffdcb816c7ff74b94eaa06dd6e1f0

Merged-by: Clément Bœsch <u@pkh.me>
2 years ago swscale: cosmetics in is{RGB,BGR}inInt
Clément Bœsch [Sun, 19 Mar 2017 22:42:10 +0000 (23:42 +0100)]
swscale: cosmetics in is{RGB,BGR}inInt

Reduce diff with Libav.

2 years ago swscale: remove unused is{RGB,BGR}inBytes
Clément Bœsch [Sun, 19 Mar 2017 22:36:29 +0000 (23:36 +0100)]
swscale: remove unused is{RGB,BGR}inBytes

2 years ago swscale: use a (more correct) function for isPacked
Clément Bœsch [Sun, 19 Mar 2017 14:28:19 +0000 (15:28 +0100)]
swscale: use a (more correct) function for isPacked

2 years ago swscale: use a function for isAnyRGB
Clément Bœsch [Sun, 19 Mar 2017 14:15:10 +0000 (15:15 +0100)]
swscale: use a function for isAnyRGB

2 years ago swscale: use a function for isBayer
Clément Bœsch [Sun, 19 Mar 2017 14:04:53 +0000 (15:04 +0100)]
swscale: use a function for isBayer

2 years ago lavu: add AV_PIX_FMT_FLAG_BAYER
Clément Bœsch [Sun, 19 Mar 2017 21:34:31 +0000 (22:34 +0100)]
lavu: add AV_PIX_FMT_FLAG_BAYER

2 years ago swscale: use a function for isGray
Clément Bœsch [Sun, 19 Mar 2017 13:57:29 +0000 (14:57 +0100)]
swscale: use a function for isGray

2 years ago fate: add fate-sws-pixdesc-query
Clément Bœsch [Sun, 19 Mar 2017 13:48:32 +0000 (14:48 +0100)]
fate: add fate-sws-pixdesc-query

Test the pixel format querying within libswscale.

2 years ago avcodec/mjpegdec: quant_matrixes can be up to 65535, use uint16_t
Michael Niedermayer [Fri, 17 Mar 2017 02:25:18 +0000 (03:25 +0100)]
avcodec/mjpegdec: quant_matrixes can be up to 65535, use uint16_t

Fixes invalid shift
Fixes: 870/clusterfuzz-testcase-5649105424482304
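
As an illustration of why the element type matters, here is a minimal
sketch. It is not the mjpegdec code: the helper names are hypothetical, and
it assumes the table previously used a signed 16-bit element type and that
int is at least 32 bits wide. A value of 65535 fits in uint16_t and stays
non-negative, so a later left shift remains well defined.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: store a raw 16-bit bitstream value in a uint16_t
 * table entry and read it back as int. Any value in 0..65535 fits. */
static int load_unsigned_entry(unsigned raw)
{
    uint16_t entry = (uint16_t)raw;
    return entry;
}

/* Hypothetical helper: nonzero if `v << shift` is well defined for int,
 * i.e. v is non-negative and the shifted result still fits in int. */
static int shift_is_defined(int v, unsigned shift)
{
    if (v < 0 || shift >= sizeof(int) * CHAR_BIT - 1)
        return 0;
    return v <= (INT_MAX >> shift);
}
```

With a signed 16-bit entry, 65535 would typically read back as -1, and
shift_is_defined(-1, 4) returns 0, which is the kind of shift a sanitizer
reports as invalid.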

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2 years ago avcodec/mjpegdec: Check quant_matrixes values for being non zero
Michael Niedermayer [Fri, 17 Mar 2017 02:25:17 +0000 (03:25 +0100)]
avcodec/mjpegdec: Check quant_matrixes values for being non zero

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2 years ago avcodec/vp56: Check avctx->error_concealment before enabling EC
Michael Niedermayer [Thu, 16 Mar 2017 10:20:46 +0000 (11:20 +0100)]
avcodec/vp56: Check avctx->error_concealment before enabling EC

Fixes timeout with 847/clusterfuzz-testcase-5291877358108672
Fixes timeout with 850/clusterfuzz-testcase-5721296509861888

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2 years ago avcodec/h264_direct: Fix runtime error: signed integer overflow: -9 - 2147483647...
Michael Niedermayer [Thu, 16 Mar 2017 02:02:50 +0000 (03:02 +0100)]
avcodec/h264_direct: Fix runtime error: signed integer overflow: -9 - 2147483647 cannot be represented in type 'int'

Fixes: 864/clusterfuzz-testcase-4774385942528000

See: [FFmpeg-devel] [PATCH 1/2] avcodec/h264_direct: Fix runtime error: signed integer overflow: 2147483647 - -14133 cannot be represented in type 'int'
See: [FFmpeg-devel] [PATCH 2/2] avcodec/h264_direct: Fix runtime error: signed integer overflow: -9 - 2147483647 cannot be represented in type 'int'
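
The overflow class reported above can be avoided by widening before
subtracting. The following is a generic sketch under that assumption, not
the actual h264_direct change; abs_diff is a hypothetical helper name, and
the expected values in the usage note assume a 32-bit int.

```c
#include <stdint.h>

/* Hypothetical helper: absolute difference of two ints, computed in
 * 64 bits so that cases such as -9 - 2147483647 cannot overflow. */
static int64_t abs_diff(int a, int b)
{
    int64_t d = (int64_t)a - (int64_t)b;
    return d < 0 ? -d : d;
}
```

For the two operand pairs quoted in the sanitizer messages, abs_diff(-9,
2147483647) yields 2147483656 and abs_diff(2147483647, -14133) yields
2147497780, both of which are outside the range of a 32-bit int.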

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2 years ago avcodec/tiff: Check stripsize strippos for overflow
Michael Niedermayer [Thu, 16 Mar 2017 01:00:17 +0000 (02:00 +0100)]
avcodec/tiff: Check stripsize strippos for overflow

Fixes: 861/clusterfuzz-testcase-5688284384591872
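
A bounds check of this kind is usually written so that the sum of position
and size is never formed. The sketch below shows one common pattern; it is
an assumption for illustration, not the check that was added to the TIFF
decoder, and strip_in_bounds and its parameters are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if the byte range [pos, pos + size) lies
 * inside a buffer of buf_size bytes. The test subtracts instead of adding,
 * so pos + size is never computed and cannot wrap around. */
static int strip_in_bounds(uint32_t pos, uint32_t size, size_t buf_size)
{
    if (pos > buf_size)
        return 0;
    return size <= buf_size - pos;
}
```

A naive `pos + size <= buf_size` test in 32-bit arithmetic would accept
pos = 0xFFFFFFF0 with size = 0x20, because the sum wraps to 0x10.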

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2 years ago aarch64: vp9itxfm16: Do a simpler half/quarter idct16/idct32 when possible
Martin Storsjö [Sat, 25 Feb 2017 22:38:48 +0000 (00:38 +0200)]
aarch64: vp9itxfm16: Do a simpler half/quarter idct16/idct32 when possible

This work is sponsored by, and copyright, Google.

This avoids loading and calculating coefficients that we know will
be zero, and avoids filling the temp buffer with zeros in places
where we know the second pass won't read.
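
The idea can be sketched in scalar C. This is only an illustration under
stated assumptions: it uses an 8-point Walsh-Hadamard basis as a stand-in
for the DCT, plain integers instead of the NEON fixed-point code, and
hypothetical function names. When the caller knows that only the first
`nonzero` coefficients can be non-zero, the reduced variant never loads or
multiplies the rest and still produces the same output.

```c
#include <stdint.h>

#define N 8

/* +1/-1 entry of an N-point Walsh-Hadamard basis (stand-in for the DCT). */
static int basis(unsigned k, unsigned n)
{
    unsigned bits = k & n, parity = 0;
    while (bits) {
        parity ^= bits & 1u;
        bits >>= 1;
    }
    return parity ? -1 : 1;
}

/* Full inverse transform: every output sums over all N coefficients. */
static void inv_full(const int32_t in[N], int32_t out[N])
{
    for (unsigned n = 0; n < N; n++) {
        int32_t sum = 0;
        for (unsigned k = 0; k < N; k++)
            sum += basis(k, n) * in[k];
        out[n] = sum;
    }
}

/* Reduced variant: the caller guarantees in[nonzero..N-1] are all zero,
 * so those coefficients are neither loaded nor multiplied. */
static void inv_partial(const int32_t in[N], int32_t out[N], unsigned nonzero)
{
    for (unsigned n = 0; n < N; n++) {
        int32_t sum = 0;
        for (unsigned k = 0; k < nonzero && k < N; k++)
            sum += basis(k, n) * in[k];
        out[n] = sum;
    }
}
```

Calling inv_partial with nonzero set to N / 2 or N / 4 corresponds loosely
to the half and quarter variants named in the commit title.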

This gives a pretty substantial speedup for the smaller subpartitions.

The code size increases from 21512 bytes to 31400 bytes.

The idct16/32_end macros are moved above the individual functions; the
instructions themselves are unchanged, but since new functions are added
at the same place where the code is moved from, the diff looks rather
messy.

Before:
vp9_inv_dct_dct_16x16_sub1_add_10_neon:     284.6
vp9_inv_dct_dct_16x16_sub2_add_10_neon:    1902.7
vp9_inv_dct_dct_16x16_sub4_add_10_neon:    1903.0
vp9_inv_dct_dct_16x16_sub8_add_10_neon:    2201.1
vp9_inv_dct_dct_16x16_sub12_add_10_neon:   2510.0
vp9_inv_dct_dct_16x16_sub16_add_10_neon:   2821.3
vp9_inv_dct_dct_32x32_sub1_add_10_neon:    1011.6
vp9_inv_dct_dct_32x32_sub2_add_10_neon:    9716.5
vp9_inv_dct_dct_32x32_sub4_add_10_neon:    9704.9
vp9_inv_dct_dct_32x32_sub8_add_10_neon:   10641.7
vp9_inv_dct_dct_32x32_sub12_add_10_neon:  11555.7
vp9_inv_dct_dct_32x32_sub16_add_10_neon:  12499.8
vp9_inv_dct_dct_32x32_sub20_add_10_neon:  13403.7
vp9_inv_dct_dct_32x32_sub24_add_10_neon:  14335.8
vp9_inv_dct_dct_32x32_sub28_add_10_neon:  15253.6
vp9_inv_dct_dct_32x32_sub32_add_10_neon:  16179.5

After:
vp9_inv_dct_dct_16x16_sub1_add_10_neon:     282.8
vp9_inv_dct_dct_16x16_sub2_add_10_neon:    1142.4
vp9_inv_dct_dct_16x16_sub4_add_10_neon:    1139.0
vp9_inv_dct_dct_16x16_sub8_add_10_neon:    1772.9
vp9_inv_dct_dct_16x16_sub12_add_10_neon:   2515.2
vp9_inv_dct_dct_16x16_sub16_add_10_neon:   2823.5
vp9_inv_dct_dct_32x32_sub1_add_10_neon:    1012.7
vp9_inv_dct_dct_32x32_sub2_add_10_neon:    6944.4
vp9_inv_dct_dct_32x32_sub4_add_10_neon:    6944.2
vp9_inv_dct_dct_32x32_sub8_add_10_neon:    7609.8
vp9_inv_dct_dct_32x32_sub12_add_10_neon:   9953.4
vp9_inv_dct_dct_32x32_sub16_add_10_neon:  10770.1
vp9_inv_dct_dct_32x32_sub20_add_10_neon:  13418.8
vp9_inv_dct_dct_32x32_sub24_add_10_neon:  14330.7
vp9_inv_dct_dct_32x32_sub28_add_10_neon:  15257.1
vp9_inv_dct_dct_32x32_sub32_add_10_neon:  16190.6

Signed-off-by: Martin Storsjö <martin@martin.st>
2 years ago arm: vp9itxfm16: Do a simpler half/quarter idct16/idct32 when possible
Martin Storsjö [Fri, 24 Feb 2017 15:39:00 +0000 (17:39 +0200)]
arm: vp9itxfm16: Do a simpler half/quarter idct16/idct32 when possible

This work is sponsored by, and copyright, Google.

This avoids loading and calculating coefficients that we know will
be zero, and avoids filling the temp buffer with zeros in places
where we know the second pass won't read.

This gives a pretty substantial speedup for the smaller subpartitions.

The code size increases from 14516 bytes to 22484 bytes.

The idct16/32_end macros are moved above the individual functions; the
instructions themselves are unchanged, but since new functions are added
at the same place where the code is moved from, the diff looks rather
messy.

Before:                                 Cortex A7       A8       A9      A53
vp9_inv_dct_dct_16x16_sub1_add_10_neon:     454.0    270.7    418.5    295.4
vp9_inv_dct_dct_16x16_sub2_add_10_neon:    3840.2   3244.8   3700.1   2337.9
vp9_inv_dct_dct_16x16_sub4_add_10_neon:    4212.5   3575.4   3996.9   2571.6
vp9_inv_dct_dct_16x16_sub8_add_10_neon:    5174.4   4270.5   4615.5   3031.9
vp9_inv_dct_dct_16x16_sub12_add_10_neon:   5676.0   4908.5   5226.5   3491.3
vp9_inv_dct_dct_16x16_sub16_add_10_neon:   6403.9   5589.0   5839.8   3948.5
vp9_inv_dct_dct_32x32_sub1_add_10_neon:    1710.7    944.7   1582.1   1045.4
vp9_inv_dct_dct_32x32_sub2_add_10_neon:   21040.7  16706.1  18687.7  13193.1
vp9_inv_dct_dct_32x32_sub4_add_10_neon:   22197.7  18282.7  19577.5  13918.6
vp9_inv_dct_dct_32x32_sub8_add_10_neon:   24511.5  20911.5  21472.5  15367.5
vp9_inv_dct_dct_32x32_sub12_add_10_neon:  26939.5  24264.3  23239.1  16830.3
vp9_inv_dct_dct_32x32_sub16_add_10_neon:  29419.5  26845.1  25020.6  18259.9
vp9_inv_dct_dct_32x32_sub20_add_10_neon:  31146.4  29633.5  26803.3  19721.7
vp9_inv_dct_dct_32x32_sub24_add_10_neon:  33376.3  32507.8  28642.4  21174.2
vp9_inv_dct_dct_32x32_sub28_add_10_neon:  35629.4  35439.6  30416.5  22625.7
vp9_inv_dct_dct_32x32_sub32_add_10_neon:  37269.9  37914.9  32271.9  24078.9

After:
vp9_inv_dct_dct_16x16_sub1_add_10_neon:     454.0    276.0    418.5    295.1
vp9_inv_dct_dct_16x16_sub2_add_10_neon:    2336.2   1886.0   2251.0   1458.6
vp9_inv_dct_dct_16x16_sub4_add_10_neon:    2531.0   2054.7   2402.8   1591.1
vp9_inv_dct_dct_16x16_sub8_add_10_neon:    3848.6   3491.1   3845.7   2554.8
vp9_inv_dct_dct_16x16_sub12_add_10_neon:   5703.8   4831.6   5230.8   3493.4
vp9_inv_dct_dct_16x16_sub16_add_10_neon:   6399.5   5567.0   5832.4   3951.5
vp9_inv_dct_dct_32x32_sub1_add_10_neon:    1722.1    938.5   1577.3   1044.5
vp9_inv_dct_dct_32x32_sub2_add_10_neon:   15003.5  11576.8  13105.8   9602.2
vp9_inv_dct_dct_32x32_sub4_add_10_neon:   15768.5  12677.2  13726.0  10138.1
vp9_inv_dct_dct_32x32_sub8_add_10_neon:   17278.8  14825.4  14907.5  11185.7
vp9_inv_dct_dct_32x32_sub12_add_10_neon:  22335.7  21544.5  20379.5  15019.8
vp9_inv_dct_dct_32x32_sub16_add_10_neon:  24165.6  23881.7  21938.6  16308.2
vp9_inv_dct_dct_32x32_sub20_add_10_neon:  31082.2  30860.9  26835.3  19711.3
vp9_inv_dct_dct_32x32_sub24_add_10_neon:  33102.6  31922.8  28638.3  21161.0
vp9_inv_dct_dct_32x32_sub28_add_10_neon:  35104.9  34867.5  30411.7  22621.2
vp9_inv_dct_dct_32x32_sub32_add_10_neon:  37438.1  39103.4  32217.8  24067.6

Signed-off-by: Martin Storsjö <martin@martin.st>
2 years ago aarch64: vp9itxfm16: Move the load_add_store macro out from the itxfm16 pass2 function
Martin Storsjö [Fri, 24 Feb 2017 14:49:12 +0000 (16:49 +0200)]
aarch64: vp9itxfm16: Move the load_add_store macro out from the itxfm16 pass2 function

This allows reusing the macro for a separate implementation of the
pass2 function.

Signed-off-by: Martin Storsjö <martin@martin.st>
2 years ago aarch64: vp9itxfm16: Make the larger core transforms standalone functions
Martin Storsjö [Fri, 24 Feb 2017 14:10:25 +0000 (16:10 +0200)]
aarch64: vp9itxfm16: Make the larger core transforms standalone functions

This work is sponsored by, and copyright, Google.

This reduces the code size of libavcodec/aarch64/vp9itxfm_16bpp_neon.o from
26288 to 21512 bytes.

This gives a small slowdown of a couple of tens of cycles, but makes
it more feasible to add more optimized versions of these transforms.

Before:
vp9_inv_dct_dct_16x16_sub4_add_10_neon:    1887.4
vp9_inv_dct_dct_16x16_sub16_add_10_neon:   2801.5
vp9_inv_dct_dct_32x32_sub4_add_10_neon:    9691.4
vp9_inv_dct_dct_32x32_sub32_add_10_neon:  16154.9

After:
vp9_inv_dct_dct_16x16_sub4_add_10_neon:    1899.5
vp9_inv_dct_dct_16x16_sub16_add_10_neon:   2827.2
vp9_inv_dct_dct_32x32_sub4_add_10_neon:    9714.7
vp9_inv_dct_dct_32x32_sub32_add_10_neon:  16175.9

Signed-off-by: Martin Storsjö <martin@martin.st>
2 years ago arm: vp9itxfm16: Make the larger core transforms standalone functions
Martin Storsjö [Fri, 24 Feb 2017 14:02:23 +0000 (16:02 +0200)]
arm: vp9itxfm16: Make the larger core transforms standalone functions

This work is sponsored by, and copyright, Google.

This reduces the code size of libavcodec/arm/vp9itxfm_16bpp_neon.o from
17500 to 14516 bytes.

This gives a small slowdown of a couple of tens of cycles, up to around
150 cycles for the full case of the largest transform, but makes
it more feasible to add more optimized versions of these transforms.

Before:                                 Cortex A7       A8       A9      A53
vp9_inv_dct_dct_16x16_sub4_add_10_neon:    4237.4   3561.5   3971.8   2525.3
vp9_inv_dct_dct_16x16_sub16_add_10_neon:   6371.9   5452.0   5779.3   3910.5
vp9_inv_dct_dct_32x32_sub4_add_10_neon:   22068.8  17867.5  19555.2  13871.6
vp9_inv_dct_dct_32x32_sub32_add_10_neon:  37268.9  38684.2  32314.2  23969.0

After:
vp9_inv_dct_dct_16x16_sub4_add_10_neon:    4375.1   3571.9   4283.8   2567.2
vp9_inv_dct_dct_16x16_sub16_add_10_neon:   6415.6   5578.9   5844.6   3948.3
vp9_inv_dct_dct_32x32_sub4_add_10_neon:   22653.7  18079.7  19603.7  13905.3
vp9_inv_dct_dct_32x32_sub32_add_10_neon:  37593.2  38862.2  32235.8  24070.9

Signed-off-by: Martin Storsjö <martin@martin.st>
2 years ago aarch64: vp9itxfm16: Restructure the idct32 store macros
Martin Storsjö [Sun, 26 Feb 2017 11:43:10 +0000 (13:43 +0200)]
aarch64: vp9itxfm16: Restructure the idct32 store macros

This avoids concatenation, which can't be used if the whole macro
is wrapped within another macro.

Signed-off-by: Martin Storsjö <martin@martin.st>
2 years ago aarch64: vp9itxfm16: Avoid .irp when it doesn't save any lines
Martin Storsjö [Sat, 25 Feb 2017 22:28:12 +0000 (00:28 +0200)]
aarch64: vp9itxfm16: Avoid .irp when it doesn't save any lines

This makes the code a bit more readable.

Signed-off-by: Martin Storsjö <martin@martin.st>
2 years ago aarch64: vp9itxfm16: Fix a typo in a comment
Martin Storsjö [Sat, 25 Feb 2017 22:24:50 +0000 (00:24 +0200)]
aarch64: vp9itxfm16: Fix a typo in a comment

Signed-off-by: Martin Storsjö <martin@martin.st>
2 years ago arm: vp9itxfm16: Avoid reloading the idct32 coefficients
Martin Storsjö [Fri, 24 Feb 2017 22:20:25 +0000 (00:20 +0200)]
arm: vp9itxfm16: Avoid reloading the idct32 coefficients

Keep the idct32 coefficients in narrow form in q6-q7, and idct16
coefficients in lengthened 32 bit form in q0-q3. Avoid clobbering
q0-q3 in the pass1 function, and squeeze the idct16 coefficients
into q0-q1 in the pass2 function to avoid reloading them.

The idct16 coefficients are clobbered and reloaded within idct32_odd
though, since that turns out to be faster than narrowing them and
swapping them into q6-q7.

Before:                            Cortex       A7        A8        A9      A53
vp9_inv_dct_dct_32x32_sub4_add_10_neon:    22653.8   18268.4   19598.0  14079.0
vp9_inv_dct_dct_32x32_sub32_add_10_neon:   37699.0   38665.2   32542.3  24472.2
After:
vp9_inv_dct_dct_32x32_sub4_add_10_neon:    22270.8   18159.3   19531.0  13865.0
vp9_inv_dct_dct_32x32_sub32_add_10_neon:   37523.3   37731.6   32181.7  24071.2

Signed-off-by: Martin Storsjö <martin@martin.st>
2 years ago arm: vp9itxfm16: Fix vertical alignment
Martin Storsjö [Fri, 24 Feb 2017 22:07:22 +0000 (00:07 +0200)]
arm: vp9itxfm16: Fix vertical alignment

Signed-off-by: Martin Storsjö <martin@martin.st>
2 years ago arm: vp9itxfm16: Use the right lane size
Martin Storsjö [Fri, 24 Feb 2017 15:36:05 +0000 (17:36 +0200)]
arm: vp9itxfm16: Use the right lane size

This makes the code slightly clearer, but doesn't make any functional
difference.

Signed-off-by: Martin Storsjö <martin@martin.st>
2 years ago arm/aarch64: vp9: Fix vertical alignment
Martin Storsjö [Sun, 8 Jan 2017 22:04:19 +0000 (00:04 +0200)]
arm/aarch64: vp9: Fix vertical alignment

Align the second/third operands as they usually are.

Due to the wildly varying sizes of the written out operands
in aarch64 assembly, the column alignment is usually not as clear
as in arm assembly.

This is cherrypicked from libav commit
7995ebfad12002033c73feed422a1cfc62081e8f.

Signed-off-by: Martin Storsjö <martin@martin.st>