webm/libwebp.git
39 hours ago Merge "Bugfix: Incremental decode of lossy-alpha" master
James Zern [Tue, 22 Apr 2014 23:33:12 +0000 (16:33 -0700)]
Merge "Bugfix: Incremental decode of lossy-alpha"

40 hours ago Bugfix: Incremental decode of lossy-alpha 49/69849/2
Urvang Joshi [Tue, 22 Apr 2014 22:13:14 +0000 (15:13 -0700)]
Bugfix: Incremental decode of lossy-alpha

When remapping buffer, br->eos_ was wrongly being set to true for
certain images.

Also, refactored the end-of-stream detection as a function.

Reported in http://crbug.com/364830

Change-Id: I716ce082ef2b505fe24246b9c14912d8e97b5d84

2 days ago MIPS: fix error with number of registers. 42/69842/2
Djordje Pesut [Tue, 22 Apr 2014 09:56:44 +0000 (11:56 +0200)]
MIPS: fix error with number of registers.

Some compiler versions, in debug builds, can't find
a register in class 'GR_REGS' while reloading 'asm'.

This fix reduces the number of registers used.

Change-Id: I7d7b8172b8f37f1de4db3d8534a346d7a72c5065

5 days ago Merge "Move the HuffmanCost() function to dsp lib"
skal [Fri, 18 Apr 2014 19:08:22 +0000 (12:08 -0700)]
Merge "Move the HuffmanCost() function to dsp lib"

5 days ago Move the HuffmanCost() function to dsp lib 17/69817/3
skal [Fri, 18 Apr 2014 15:14:46 +0000 (08:14 -0700)]
Move the HuffmanCost() function to dsp lib

This is to help further optimizations.
(like in https://gerrit.chromium.org/gerrit/#/c/69787/)

There's a small slowdown (~0.5% at -z 9 quality) due to
function pointer usage. Note that, for speed, it's important
to return VP8LStreaks by value, and not pass a pointer.

Change-Id: Id4167366765fb7fc5dff89c1fd75dee456737000
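
The pattern described in the HuffmanCost() commit (dispatch through a function pointer, with the result struct returned by value rather than written through an output pointer) can be sketched as below. This is only an illustration: the `Streaks` struct is a hypothetical stand-in for VP8LStreaks, whose real layout is not shown in this log, and `CountStreaks_C` is an invented example function.

```c
#include <stdint.h>

/* Hypothetical stand-in for VP8LStreaks; the real layout is not shown here. */
typedef struct {
  int counts[2];  /* [0]: number of runs of zeros, [1]: runs of non-zeros */
} Streaks;

/* Function-pointer type: the result is returned by value. */
typedef Streaks (*CountStreaksFunc)(const uint32_t* population, int length);

/* Plain-C implementation (invented for this example). */
static Streaks CountStreaks_C(const uint32_t* population, int length) {
  Streaks s = { { 0, 0 } };
  int i;
  for (i = 0; i < length; ++i) {
    const int is_nonzero = (population[i] != 0);
    /* A new run starts at i == 0 or when the zero/non-zero state flips. */
    if (i == 0 || is_nonzero != (population[i - 1] != 0)) {
      ++s.counts[is_nonzero];
    }
  }
  return s;
}

/* Dispatch pointer; a platform init function could swap in another version. */
static CountStreaksFunc CountStreaks = CountStreaks_C;
```

A caller would write `const Streaks s = CountStreaks(population, length);`. The indirect call is the likely source of the small slowdown the commit mentions.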

5 days ago MIPS: fix assembler error revealed by clang's debug build 19/69819/1
Djordje Pesut [Fri, 18 Apr 2014 15:53:37 +0000 (17:53 +0200)]
MIPS: fix assembler error revealed by clang's debug build

.set at -  Indicates that macro expansions may clobber
           the assembler temporary ($at, i.e. $1 on MIPS) register.
           Some macros may not be expanded without this
           and will generate an error message if noat
           is in effect.

"at" also added to the clobber list.

Change-Id: I67feebbd9f2944fc7f26c28496e49e1e2348529d

6 days ago enc_mips32: fix unused symbol warning in debug 15/69815/1
James Zern [Fri, 18 Apr 2014 06:35:36 +0000 (23:35 -0700)]
enc_mips32: fix unused symbol warning in debug

move kC1 / kC2 under __OPTIMIZE__
missed in:
8dec120 enc_mips32: disable ITransform(One) in debug builds

Change-Id: Ic9a12e6d73090c8c06b0e7a4bc56dd9c76b8e596

6 days ago enc_mips32: disable ITransform(One) in debug builds 14/69814/3
James Zern [Fri, 18 Apr 2014 02:37:44 +0000 (19:37 -0700)]
enc_mips32: disable ITransform(One) in debug builds

avoids:
src/dsp/enc_mips32.c: In function 'ITransformOne':
src/dsp/enc_mips32.c:123:3: can't find a register in class 'GR_REGS' while reloading 'asm'
src/dsp/enc_mips32.c:123:3: 'asm' operand has impossible constraints

Change-Id: Ic469667ee572f25e502c9873c913643cf7bbe89d

6 days ago enc_neon: convert Disto4x4 to intrinsics 00/69800/3
James Zern [Sun, 13 Apr 2014 00:46:57 +0000 (17:46 -0700)]
enc_neon: convert Disto4x4 to intrinsics

Change-Id: I0f00d5af2de2301e8237c2a38a9612d3645abad6

8 days ago cosmetics: 86/69786/3
Pascal Massimino [Wed, 16 Apr 2014 07:12:34 +0000 (00:12 -0700)]
cosmetics:

* remove MIPS32 suffix from static function names
* fix a long line in enc_neon.c

Change-Id: Ia1294ae46f471b3eb1e9ba43c6aa1b29a7aeb447

8 days ago enc_neon: cosmetics 85/69785/1
James Zern [Wed, 16 Apr 2014 06:57:03 +0000 (23:57 -0700)]
enc_neon: cosmetics

fix/remove incorrect comments
+ whitespace

Change-Id: Id1b86beb23e5bf946e73c34ab7066b6ca177f33b

8 days ago Merge "WIP: extract the float-calculation of HuffmanCost from loop"
skal [Tue, 15 Apr 2014 18:33:11 +0000 (11:33 -0700)]
Merge "WIP: extract the float-calculation of HuffmanCost from loop"

9 days ago Merge "replace some mult-long (vmull_u8) with mult-long-accumulate (vmlal_u8)"
skal [Tue, 15 Apr 2014 14:09:12 +0000 (07:09 -0700)]
Merge "replace some mult-long (vmull_u8) with mult-long-accumulate (vmlal_u8)"

9 days ago MIPS: MIPS32r1: Added optimizations for ExtraCost functions. 72/69772/2
Djordje Pesut [Tue, 15 Apr 2014 10:55:20 +0000 (12:55 +0200)]
MIPS: MIPS32r1: Added optimizations for ExtraCost functions.

ExtraCost and ExtraCostCombined

Change-Id: I7eceb9ce2807296c6b43b974e4216879ddcd79f2

9 days ago WIP: extract the float-calculation of HuffmanCost from loop 33/69733/2
skal [Mon, 14 Apr 2014 15:57:26 +0000 (17:57 +0200)]
WIP: extract the float-calculation of HuffmanCost from loop

new function: VP8FinalHuffmanCost()

Change-Id: I42102f8e5ef6d7a7af66490af77b7dc2048a9cb9

9 days ago Merge "NEON intrinsics version of CollectHistogram"
skal [Tue, 15 Apr 2014 10:00:45 +0000 (03:00 -0700)]
Merge "NEON intrinsics version of CollectHistogram"

10 days ago NEON intrinsics version of CollectHistogram 32/69732/1
skal [Mon, 14 Apr 2014 14:41:48 +0000 (16:41 +0200)]
NEON intrinsics version of CollectHistogram

apparently faster, but we might save some load/store to/from memory
once we settle for the intrinsics-based FTransform()

(also: fixed some #ifdef USE_INTRINSICS problems)

Change-Id: I426dea299cea0c64eb21c4d81a04a960e0c263c7

10 days ago replace some mult-long (vmull_u8) with mult-long-accumulate (vmlal_u8) 31/69731/2
skal [Mon, 14 Apr 2014 12:45:44 +0000 (14:45 +0200)]
replace some mult-long (vmull_u8) with mult-long-accumulate (vmlal_u8)

saves a few instructions

Change-Id: If8f464bb2894a209bba94825a4db9267df126d47

10 days ago fix lossless_neon.c 30/69730/1
skal [Mon, 14 Apr 2014 12:25:52 +0000 (14:25 +0200)]
fix lossless_neon.c

* some extra {xx , 0 } in initializers
* replaced by vget_lane_u32() where appropriate

Change-Id: Iabcd8ec34d7c853920491fb147a10d4472280a36

10 days ago NEON intrinsics version of FTransform 99/69599/4
skal [Fri, 11 Apr 2014 18:01:29 +0000 (20:01 +0200)]
NEON intrinsics version of FTransform

a little bit slower than inlined asm, it seems.
So disabled for now.

Change-Id: I8c942846f9bedaed57275675ea9dbbcb8dfd9ccd

13 days ago Merge "MIPS: MIPS32r1: Added optimizations for FastLog2"
Jovan Zelincevic [Thu, 10 Apr 2014 15:54:12 +0000 (08:54 -0700)]
Merge "MIPS: MIPS32r1: Added optimizations for FastLog2"

13 days ago MIPS: MIPS32r1: Added optimizations for FastLog2 75/69575/2
Jovan Zelincevic [Mon, 24 Mar 2014 13:47:19 +0000 (14:47 +0100)]
MIPS: MIPS32r1: Added optimizations for FastLog2

Functions VP8LFastLog2Slow and VP8LFastSLog2Slow

also: replaced some "% y" by "& (y-1)" in the C-version
(since y is a power-of-two)

Change-Id: I875170384e3c333812ca42d6ce7278aecabd60f0
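
The "% y" to "& (y-1)" replacement mentioned in the FastLog2 commit relies on y being a non-zero power of two and on the operands being unsigned. A minimal sketch of the identity follows; `ModPow2` is a hypothetical helper name, not a libwebp function.

```c
#include <stdint.h>

/* Returns x % y, assuming y is a non-zero power of two.
 * (y - 1) is then a mask of the low bits that make up the remainder. */
static uint32_t ModPow2(uint32_t x, uint32_t y) {
  return x & (y - 1);
}
```

For example, with y = 8 the mask is 7 (binary 111), so only the three low bits of x are kept. The identity does not hold for values of y that are not powers of two.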

2 weeks ago NEON functions for lossless coding 44/69544/5
skal [Wed, 9 Apr 2014 16:40:02 +0000 (18:40 +0200)]
NEON functions for lossless coding

Verified OK, but right now they don't seem faster.
So they are disabled behind a USE_INTRINSICS flag (off for now)

Change-Id: I72a1c4fa3798f98c1e034f7ca781914c36d3392c

2 weeks ago MIPS: MIPS32r1: Added optimizations for SSE functions. 40/69540/1
Slobodan Prijic [Fri, 7 Mar 2014 13:52:45 +0000 (14:52 +0100)]
MIPS: MIPS32r1: Added optimizations for SSE functions.

Change-Id: I1287fa65064192cc2edc5c4be2b1974be665b9b4

2 weeks ago Merge "fix the gcc-4.6.0 bug by implementing alternative method"
skal [Wed, 9 Apr 2014 06:25:59 +0000 (23:25 -0700)]
Merge "fix the gcc-4.6.0 bug by implementing alternative method"

2 weeks ago fix the gcc-4.6.0 bug by implementing alternative method 24/69524/3
skal [Tue, 8 Apr 2014 19:43:27 +0000 (21:43 +0200)]
fix the gcc-4.6.0 bug by implementing alternative method

previous functions are a bit faster with gcc-4.8, so we keep them
for now.

Change-Id: I4081e5af66fbf606295d8a83875c1b889729b4dc

2 weeks ago enc_mips32.c: fix file mode 26/69526/1
James Zern [Sun, 6 Apr 2014 01:50:16 +0000 (18:50 -0700)]
enc_mips32.c: fix file mode

Change-Id: I5a43320e2ea2eebc88c65398acb9ea59b63af1fd

2 weeks ago MIPS: MIPS32r1: Add optimization for GetResidualCost 19/69519/5
Slobodan Prijic [Tue, 25 Feb 2014 15:22:18 +0000 (16:22 +0100)]
MIPS: MIPS32r1: Add optimization for GetResidualCost

+ reorganize the cost-evaluation code by moving some functions
to cost.h/cost.c and exposing VP8Residual

Change-Id: Id976299b5d4484e65da8bed31b3d2eb9cb4c1f7d

2 weeks ago Merge "MIPS: MIPS32r1: Added optimization for FTransform"
pascal massimino [Tue, 8 Apr 2014 11:17:27 +0000 (04:17 -0700)]
Merge "MIPS: MIPS32r1: Added optimization for FTransform"

2 weeks ago MIPS: MIPS32r1: Added optimization for FTransform 18/69518/2
Djordje Pesut [Wed, 26 Feb 2014 11:48:26 +0000 (12:48 +0100)]
MIPS: MIPS32r1: Added optimization for FTransform

Change-Id: I9384dac483e8f98bcfdd277a0a3d6ec7c7a7b297

2 weeks ago ~30% encoding speedup: use NEON for QuantizeBlock() 06/69506/6
skal [Mon, 7 Apr 2014 16:02:25 +0000 (18:02 +0200)]
 ~30% encoding speedup: use NEON for QuantizeBlock()

also revamped the signature to avoid having to pass the 'first' parameter

Change-Id: Ief9af1747dcfb5db0700b595d0073cebd57542a5

2 weeks ago enc_neon: convert FTransformWHT to intrinsics 17/69517/2
James Zern [Sat, 5 Apr 2014 18:46:19 +0000 (20:46 +0200)]
enc_neon: convert FTransformWHT to intrinsics

slightly faster than the inline asm
in practice not much faster than the C-code in a full NEON build, but
still better overall in an Android-like one that only enables NEON for
certain files.

Change-Id: I69534016186064fd92476d5eabc0f53462d53146

2 weeks ago MIPS: MIPS32r1: Added optimization for Disto4x4 (TTransform) 05/69505/1
Djordje Pesut [Thu, 20 Feb 2014 09:56:39 +0000 (10:56 +0100)]
MIPS: MIPS32r1: Added optimization for Disto4x4 (TTransform)

Change-Id: Ieb20c5c52b964247cfe46f45f9a7415725bf7c02

2 weeks ago MIPS: MIPS32r1: Added optimization for QuantizeBlock 04/69504/1
Jovan Zelincevic [Wed, 19 Feb 2014 15:27:41 +0000 (16:27 +0100)]
MIPS: MIPS32r1: Added optimization for QuantizeBlock

Change-Id: I6047ab107e4d474e35b5af1dac391d5b3d8c049b

2 weeks ago Merge "MIPS: MIPS32r1: Add optimization for ITransform"
Djordje Pesut [Sat, 5 Apr 2014 17:36:05 +0000 (10:36 -0700)]
Merge "MIPS: MIPS32r1: Add optimization for ITransform"

2 weeks ago lossless_neon: disable VP8LConvert* functions 95/69495/1
James Zern [Sat, 5 Apr 2014 03:25:05 +0000 (03:25 +0000)]
lossless_neon: disable VP8LConvert* functions

due to breakage with NDK/gcc-4.6 builds

Change-Id: Id96258e710ee33e08a023354b3227f27da986620

2 weeks ago NEON intrinsics for encoding 79/69479/3
skal [Fri, 4 Apr 2014 12:57:59 +0000 (14:57 +0200)]
NEON intrinsics for encoding

* inverse transform is actually slower with intrinsics + gcc-4.6,
  so is left disabled for now.
  With gcc-4.8, it's a bit faster than inlined assembly.

* Sum of Squared Error functions provide a 2-3% speed-up.
  They're enabled by default (since there's no inlined-asm equivalent)

Change-Id: I361b3f0497bc935da4cf5b35e330e379e71f498a

2 weeks ago MIPS: MIPS32r1: Add optimization for ITransform 74/69474/1
Djordje Pesut [Wed, 19 Feb 2014 14:33:50 +0000 (15:33 +0100)]
MIPS: MIPS32r1: Add optimization for ITransform

Change-Id: Ie4c8b9bc3a7826bd443cdebf05386786fafe8c56

2 weeks ago dec_neon: use vst_lane instead of vget_lane 67/69467/1
James Zern [Thu, 3 Apr 2014 21:56:26 +0000 (14:56 -0700)]
dec_neon: use vst_lane instead of vget_lane

results in fewer instructions, small speed improvement

Change-Id: I98de632d09ff09f295368c0d744cb4397b585084

2 weeks ago Intrinsics NEON version of TransformOne 58/69458/5
skal [Thu, 3 Apr 2014 07:25:01 +0000 (09:25 +0200)]
Intrinsics NEON version of TransformOne

+ misc cosmetics

* seems 4% slower than inlined-asm with gcc-4.6
* is a tad faster (<1%) with gcc-4.8
(disabled for now)

Change-Id: Iea6cd00053a2e9c1b1ccfdad1378be26584f1095

3 weeks ago Merge "dec_neon: use vld?_lane instead of vset?_lane"
pascal massimino [Thu, 3 Apr 2014 08:16:29 +0000 (01:16 -0700)]
Merge "dec_neon: use vld?_lane instead of vset?_lane"

3 weeks ago upsampling_neon: drop NEON suffix from local functions 57/69457/1
James Zern [Thu, 3 Apr 2014 02:49:16 +0000 (19:49 -0700)]
upsampling_neon: drop NEON suffix from local functions

Change-Id: I6583ad74aacf78dcbeb5a0ff0218a39bc3460e5a

3 weeks ago upsampling_sse2: drop SSE2 suffix from local functions 56/69456/1
James Zern [Thu, 3 Apr 2014 02:49:16 +0000 (19:49 -0700)]
upsampling_sse2: drop SSE2 suffix from local functions

Change-Id: I2349c1a8e5e15e1d204642096f84f3202721c297

3 weeks ago enc_sse2: drop SSE2 suffix from local functions 55/69455/1
James Zern [Thu, 3 Apr 2014 02:49:16 +0000 (19:49 -0700)]
enc_sse2: drop SSE2 suffix from local functions

Change-Id: I5d61605a9d410761d50b689b046114f0ab3ba24e

3 weeks ago dec_sse2: drop SSE2 suffix from local functions 54/69454/1
James Zern [Thu, 3 Apr 2014 02:49:16 +0000 (19:49 -0700)]
dec_sse2: drop SSE2 suffix from local functions

Change-Id: Ie171778b84038d5b04c5dc6972f6015caf555882

3 weeks ago dec_neon: use vld?_lane instead of vset?_lane 52/69452/1
James Zern [Thu, 3 Apr 2014 06:03:18 +0000 (23:03 -0700)]
dec_neon: use vld?_lane instead of vset?_lane

results in fewer instructions, small speed improvement

Change-Id: I61ab48d09a5ce7c5158eac8244d28287457edc7a

3 weeks ago cosmetic: fix long line 51/69451/1
Pascal Massimino [Thu, 3 Apr 2014 06:00:50 +0000 (23:00 -0700)]
cosmetic: fix long line

Change-Id: Id04b368aea5784a98c705f323b32d35b362742ea

3 weeks ago Merge "add intrinsics NEON code for chroma strong-filtering"
James Zern [Thu, 3 Apr 2014 05:57:44 +0000 (22:57 -0700)]
Merge "add intrinsics NEON code for chroma strong-filtering"

3 weeks ago add intrinsics NEON code for chroma strong-filtering 36/69436/2
skal [Wed, 2 Apr 2014 09:24:10 +0000 (11:24 +0200)]
add intrinsics NEON code for chroma strong-filtering

The nice trick is to pack 8 u + 8 v samples into a single uint8x16_t
register, and re-use the previous (luma) functions

Change-Id: Idf50ed2d6b7137ea080d603062bc9e0c66d79f38
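
The packing trick from the chroma strong-filtering commit can be illustrated in portable C, without NEON: 8 u and 8 v samples are copied into one 16-lane buffer so that a routine written for 16 luma samples handles both planes in one call. The sketch below is an assumption-laden illustration. `Filter16` is a hypothetical placeholder (a simple clamp), not the actual loop filter, and `Filter8x2` is an invented name.

```c
#include <stdint.h>
#include <string.h>

/* Placeholder for a 16-lane routine originally written for luma:
 * here it simply clamps every lane to at most 'limit'. */
static void Filter16(uint8_t lanes[16], uint8_t limit) {
  int i;
  for (i = 0; i < 16; ++i) {
    if (lanes[i] > limit) lanes[i] = limit;
  }
}

/* Processes 8 u and 8 v samples with a single call to the 16-lane routine. */
static void Filter8x2(uint8_t u[8], uint8_t v[8], uint8_t limit) {
  uint8_t packed[16];
  memcpy(packed, u, 8);      /* lanes 0..7  <- u samples */
  memcpy(packed + 8, v, 8);  /* lanes 8..15 <- v samples */
  Filter16(packed, limit);   /* re-use the 16-wide (luma) code path */
  memcpy(u, packed, 8);      /* unpack the results */
  memcpy(v, packed + 8, 8);
}
```

With NEON intrinsics, the packing step would presumably be a vcombine_u8() of two 8-lane loads, which keeps all 16 lanes of the register busy instead of half of them.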

3 weeks ago Merge "Add SSE2 version of forward cross-color transform"
pascal massimino [Wed, 2 Apr 2014 21:18:59 +0000 (14:18 -0700)]
Merge "Add SSE2 version of forward cross-color transform"

3 weeks ago Add SSE2 version of forward cross-color transform 45/69445/1
Urvang Joshi [Wed, 2 Apr 2014 19:21:20 +0000 (12:21 -0700)]
Add SSE2 version of forward cross-color transform

Encoding speed is roughly the same.

Change-Id: I6b976d0eb24e1847714e719762cb8403768da66c

3 weeks ago Use histogram_bits to initialize transform_bits. 41/69441/1
Vikas Arora [Wed, 2 Apr 2014 18:46:04 +0000 (11:46 -0700)]
Use histogram_bits to initialize transform_bits.

This change gains back 1% in compression density for method=3 and 0.5% for
method=4, at the expense of 10% slower compression speed.

Change-Id: I491aa1c726def934161d4a4377e009737fbeff82

3 weeks ago Merge "Add strong filtering intrinsics (inner and outer edges)"
James Zern [Wed, 2 Apr 2014 07:10:01 +0000 (00:10 -0700)]
Merge "Add strong filtering intrinsics (inner and outer edges)"

3 weeks ago Add strong filtering intrinsics (inner and outer edges) 22/69422/3
skal [Tue, 1 Apr 2014 15:41:25 +0000 (17:41 +0200)]
Add strong filtering intrinsics (inner and outer edges)

+ added some work-arounds for gcc-4.6 to make it compile (except one function).
+ lots of revamping

All variants tested ok.
Speed-up is ~5-7%

Change-Id: I5ceda2ee5debfada090907fe3696889eb66269c3

3 weeks ago Add SSE2 function for Inverse Cross-color Transform 25/69425/1
Urvang Joshi [Tue, 1 Apr 2014 22:52:25 +0000 (15:52 -0700)]
Add SSE2 function for Inverse Cross-color Transform

Lossless decoding is now ~3% faster.

Change-Id: Idafb5c73e5cfb272cc3661d841f79971f9da0743

3 weeks ago dec_neon: add strong loopfilter intrinsics 13/69413/2
James Zern [Sun, 30 Mar 2014 02:40:35 +0000 (19:40 -0700)]
dec_neon: add strong loopfilter intrinsics

vertical only currently, 2.5-3% faster
placed under USE_INTRINSICS as this change depends on the simple
loopfilter
improves the simple loopfilter slightly thanks to some reorganization

Change-Id: I6611441fa54228549b21ea74c013cb78d53c7155

3 weeks ago Merge "add intrinsics version of SimpleHFilter16NEON()"
James Zern [Tue, 1 Apr 2014 07:57:11 +0000 (00:57 -0700)]
Merge "add intrinsics version of SimpleHFilter16NEON()"

3 weeks ago windows: fix dll builds 14/69414/1
James Zern [Tue, 1 Apr 2014 00:46:12 +0000 (17:46 -0700)]
windows: fix dll builds

WebPSafe* need to be marked external to allow mux/demux to access them
through libwebp.dll

Change-Id: Ib6620e00d376f7aa5a0550e1e244f759977f97a0

3 weeks ago Merge "add some colorspace conversion functions in NEON"
skal [Mon, 31 Mar 2014 20:15:18 +0000 (13:15 -0700)]
Merge "add some colorspace conversion functions in NEON"

3 weeks ago SSE2 variants of Subtract-Green: Rectify loop condition 11/69411/1
Urvang Joshi [Mon, 31 Mar 2014 17:51:45 +0000 (10:51 -0700)]
SSE2 variants of Subtract-Green: Rectify loop condition

When 4 pixels are left, they should be processed with SSE2.

Decoding is marginally faster (~0.4%).
Encoding speed: No observable difference.

Change-Id: I3cf21c07145a560ff795451e65e64faf148d5c3e
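
The loop-condition issue in the Subtract-Green commit is a typical off-by-one in a batched loop: with a bound like "i + 4 < n", a final group of exactly four pixels falls through to the scalar tail. The sketch below uses plain C in place of SSE2 and shows the inclusive bound. The function names are hypothetical and the original loop bound is an assumption, since the diff is not part of this log.

```c
#include <stdint.h>

/* Subtracts green from red and blue for one ARGB pixel (scalar reference). */
static uint32_t SubtractGreenPixel(uint32_t argb) {
  const uint32_t green = (argb >> 8) & 0xff;
  const uint32_t new_r = (((argb >> 16) & 0xff) - green) & 0xff;
  const uint32_t new_b = ((argb & 0xff) - green) & 0xff;
  return (argb & 0xff00ff00u) | (new_r << 16) | new_b;
}

/* Processes pixels in batches of four, then a scalar tail.
 * Returns the number of pixels handled by the batched path. */
static int SubtractGreen(uint32_t* argb, int num_pixels) {
  int i = 0;
  int batched;
  /* 'i + 4 <= num_pixels' (not '<'), so that exactly four remaining
   * pixels still go through the batched path. */
  for (; i + 4 <= num_pixels; i += 4) {
    int k;
    for (k = 0; k < 4; ++k) argb[i + k] = SubtractGreenPixel(argb[i + k]);
  }
  batched = i;
  for (; i < num_pixels; ++i) argb[i] = SubtractGreenPixel(argb[i]);
  return batched;
}
```

The output is identical with either bound. Only the share of pixels taking the fast path changes, which matches the marginal speed difference reported in the commit.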

3 weeks ago add some colorspace conversion functions in NEON 07/69407/3
skal [Mon, 31 Mar 2014 14:36:33 +0000 (16:36 +0200)]
add some colorspace conversion functions in NEON

new file: lossless_neon.c
speedup is ~5%

gcc 4.6.3 seems to be doing some sub-optimal things here,
storing register on stack using 'vstmia' and such.
Looks similar to gcc.gnu.org/bugzilla/show_bug.cgi?id=51509

I've tried adding  -fno-split-wide-types and it does help
the generated assembly. But the overall speed gets worse with
this flag. We should only compile lossless_neon.c with it -> urk.

Change-Id: I2ccc0929f5ef9dfb0105960e65c0b79b5f18d3b0

3 weeks ago add intrinsics version of SimpleHFilter16NEON() 06/69406/1
skal [Mon, 31 Mar 2014 14:29:55 +0000 (16:29 +0200)]
add intrinsics version of SimpleHFilter16NEON()

It's disabled for now, because it crashes gcc-4.6.3 during compilation
with -O2 or -O3. It's been tested OK with -O1.

Code is still globally disabled with USE_INTRINSICS, though.

Change-Id: I3ca6cf83f3b9545ad8909556f700758b3cefa61c

3 weeks ago add light filtering NEON intrinsics 05/69405/3
Pascal Massimino [Sun, 30 Mar 2014 16:12:41 +0000 (09:12 -0700)]
add light filtering NEON intrinsics

disabled for now (but tested OK), thanks to the USE_INTRINSICS #define
We'll activate the code when we're on par with non-intrinsics

Change-Id: Idbfb9cb01f4c7c9f5131b270f8c11b70d0d485ff

3 weeks ago fix typo in STORE_WHT 90/69390/2
Pascal Massimino [Fri, 28 Mar 2014 08:53:53 +0000 (01:53 -0700)]
fix typo in STORE_WHT

was working ok because dst == out

Change-Id: I27095129a11f468422250dd2b8fad8b3bd4e5bbd

3 weeks ago Tune HistogramCombineBin for large images. 91/69391/1
Vikas Arora [Fri, 28 Mar 2014 14:06:54 +0000 (07:06 -0700)]
Tune HistogramCombineBin for large images.

Tune HistogramCombineBin for hard images that are larger than 1-2
megapixels and represent photographic images.

This speeds up lossless encoding on a 1000-image corpus by 10-12%, with a
compression penalty of 0.1-0.2%.

Change-Id: Ifd03b75c503b9e886098e5fe6f86be0391ca8e81

3 weeks ago use WebPSafe[CM]alloc/WebPSafeFree instead of [cm]alloc/free 62/69362/5
skal [Thu, 27 Mar 2014 22:27:32 +0000 (23:27 +0100)]
use WebPSafe[CM]alloc/WebPSafeFree instead of [cm]alloc/free

there's still some malloc/free in the external example
This is an encoder API change because of the introduction
of WebPMemoryWriterClear() for symmetry reasons.

The MemoryWriter object should probably go in examples/ instead
of being in the main lib, though.
mux_types.h still contains some inlined free()/malloc() that are
harder to remove (we need to put them in the libwebputils lib
and make sure link is ok). Left as a TODO for now.

Also: WebPDecodeRGB*() functions are still returning a pointer
that needs to be free()'d. We should call WebPSafeFree() on
these, but it means exposing the whole mechanism. TODO(later).

Change-Id: Iad2c9060f7fa6040e3ba489c8b07f4caadfab77b
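
The point of routing allocations through a WebPSafe[CM]alloc/WebPSafeFree pair is that size checks and the matching release live in one place. The sketch below only shows the general idea of an overflow-checked allocator. `SafeMalloc` and `SafeFree` are hypothetical simplified helpers, and libwebp's real functions may have different signatures and additional limits.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper in the spirit of WebPSafeMalloc(): returns NULL
 * instead of allocating when nmemb * size would overflow size_t. */
static void* SafeMalloc(size_t nmemb, size_t size) {
  if (size != 0 && nmemb > SIZE_MAX / size) return NULL;  /* would overflow */
  return malloc(nmemb * size);
}

/* Matching release function, so callers never mix allocators. */
static void SafeFree(void* ptr) {
  free(ptr);
}
```

This also suggests why the commit flags the pointers returned by WebPDecodeRGB*() as a TODO: memory obtained from one allocator should be released by its counterpart, and that requires exposing the counterpart to callers.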

3 weeks ago lossless_sse2: relocate VP8LDspInitSSE2 proto 85/69385/1
James Zern [Thu, 27 Mar 2014 22:05:04 +0000 (15:05 -0700)]
lossless_sse2: relocate VP8LDspInitSSE2 proto

this is in line with the other dsp files and silences a build warning.

Change-Id: I03ee3032c11d4c731cc10bfa0a2dcb6866756ba2

3 weeks ago separate SSE2 lossless functions into its own file 79/69379/1
skal [Thu, 27 Mar 2014 20:43:21 +0000 (21:43 +0100)]
separate SSE2 lossless functions into its own file

expose the predictor array as function pointers instead
of each individual sub-function

+ merged Average2() into ClampedAddSubtractHalf directly
+ unified the signature as "VP8LProcessBlueAndRedFunc"

no speed diff observed

Change-Id: Ic3c45dff11884a8330a9ad38c2c8e82491c6e044

4 weeks ago VP8LConvertFromBGRA: use conversion function pointers 76/69376/1
skal [Thu, 27 Mar 2014 08:00:35 +0000 (09:00 +0100)]
VP8LConvertFromBGRA: use conversion function pointers

Change-Id: I863b97119d7487e4eef337e5df69e1ae2a911d4c

4 weeks ago dsp/dec: TransformDCUV: use VP8TransformDC 72/69372/1
James Zern [Wed, 26 Mar 2014 23:43:47 +0000 (16:43 -0700)]
dsp/dec: TransformDCUV: use VP8TransformDC

rather than forcing the C version; this is similar to TransformUV

Change-Id: I2778194f05fca33e9b2b71323e92947c0b395e9a

4 weeks ago Merge "fix out-of-bound read during alpha-plane decoding"
skal [Wed, 26 Mar 2014 22:22:42 +0000 (15:22 -0700)]
Merge "fix out-of-bound read during alpha-plane decoding"

4 weeks ago Merge "dsp: reuse wht transform from dec in encoder"
James Zern [Wed, 26 Mar 2014 22:13:07 +0000 (15:13 -0700)]
Merge "dsp: reuse wht transform from dec in encoder"

4 weeks ago Merge "Add SSE2 version of ARGB -> BGR/RGB/... conversion functions"
skal [Wed, 26 Mar 2014 22:01:46 +0000 (15:01 -0700)]
Merge "Add SSE2 version of ARGB -> BGR/RGB/... conversion functions"

4 weeks ago fix out-of-bound read during alpha-plane decoding 63/69363/2
skal [Wed, 26 Mar 2014 16:02:51 +0000 (17:02 +0100)]
fix out-of-bound read during alpha-plane decoding

With -bypass_filter switched on, the lossless-compressed
data is decoded ahead of time (before being transformed and
displayed). Hence, the last row was called twice.

http://code.google.com/p/webp/issues/detail?id=193

Change-Id: I9e13f495f6bd6f75fa84c4a21911f14c402d4b10

4 weeks ago 2-5% faster trellis with clang/MacOS 29/69329/2
skal [Sat, 22 Mar 2014 09:20:42 +0000 (10:20 +0100)]
2-5% faster trellis with clang/MacOS
(and ~2-3% on ARM)

We don't need to store cost/score for each node, but only for
the current and previous one -> simplify code and save some memory.

Also made the 'Node' structure tighter.

Change-Id: Ie3ad7d3b678992b396242f56e2ac387fe43852e6
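
The memory saving described in the trellis commit follows from a general property of this kind of dynamic programming: the cost of each node depends only on the previous position, so two rows of costs suffice. The sketch below is a generic illustration with an invented cost model (`NUM_STATES`, `BestPathCost` and the |a - b| transition cost are all assumptions), not the encoder's actual trellis.

```c
#include <limits.h>

#define NUM_STATES 3

/* Hypothetical cost model: node_cost[n][s] is the cost of choosing state s at
 * position n, and moving from state a to state b costs |a - b|.
 * Requires num_nodes >= 1. Returns the cost of the cheapest path. */
static int BestPathCost(int node_cost[][NUM_STATES], int num_nodes) {
  int prev[NUM_STATES], cur[NUM_STATES];
  int n, s, p, best;
  for (s = 0; s < NUM_STATES; ++s) prev[s] = node_cost[0][s];
  for (n = 1; n < num_nodes; ++n) {
    for (s = 0; s < NUM_STATES; ++s) {
      int best_prev = INT_MAX;
      for (p = 0; p < NUM_STATES; ++p) {
        const int transition = (p > s) ? p - s : s - p;
        if (prev[p] + transition < best_prev) best_prev = prev[p] + transition;
      }
      cur[s] = best_prev + node_cost[n][s];
    }
    /* Only the previous row is needed from here on: roll the buffers. */
    for (s = 0; s < NUM_STATES; ++s) prev[s] = cur[s];
  }
  best = prev[0];
  for (s = 1; s < NUM_STATES; ++s) {
    if (prev[s] < best) best = prev[s];
  }
  return best;
}
```

This version only returns the best cost. Recovering the chosen path would still need a back-pointer per node, but not a stored cost per node.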

4 weeks ago Add SSE2 version of ARGB -> BGR/RGB/... conversion functions 41/69341/3
skal [Tue, 25 Mar 2014 07:46:24 +0000 (08:46 +0100)]
Add SSE2 version of ARGB -> BGR/RGB/... conversion functions

~4-6% faster lossless decoding

Change-Id: I3ed1131ff2b2a0217da315fac143cd0d58293361

4 weeks ago dsp: reuse wht transform from dec in encoder 66/69366/1
James Zern [Sat, 22 Mar 2014 19:18:54 +0000 (12:18 -0700)]
dsp: reuse wht transform from dec in encoder

Change-Id: Ide663db9eaecb7a37fe0e6ad4cd5f37de190c717

4 weeks ago Android.mk: fix build with APP_ABI=armeabi-v7a-hard 06/69306/1
James Zern [Fri, 21 Mar 2014 06:22:32 +0000 (23:22 -0700)]
Android.mk: fix build with APP_ABI=armeabi-v7a-hard

added in r9d; relax the check to build neon code

Change-Id: Ic52b3fbd3bf53617ee52b07a55b0ed05f6f9b26f

5 weeks ago Merge "cosmetics:"
Pascal Massimino [Tue, 18 Mar 2014 11:02:33 +0000 (04:02 -0700)]
Merge "cosmetics:"

5 weeks ago cosmetics: 73/69273/3
Pascal Massimino [Mon, 17 Mar 2014 22:10:23 +0000 (15:10 -0700)]
cosmetics:

 - use VP8ScanUV, separate from VP8Scan[] (for luma)
 - fix indentation
 - few missing consts
 - change TrellisQuantizeBlock() signature

Change-Id: I94b437d791cbf887015772b5923feb83dd145530

5 weeks ago AssignSegments: quiet array-bounds warning 59/69259/1
James Zern [Sat, 15 Mar 2014 01:47:52 +0000 (18:47 -0700)]
AssignSegments: quiet array-bounds warning

nb (enc->segment_hdr_.num_segments_) will be in the range
[1, NUM_MB_SEGMENTS].

Change-Id: I5c2bd0bb82b17c99aff39c98b6b1747fc040dc16

5 weeks ago Merge "UpdateHistogramCost: avoid implicit double->float"
James Zern [Fri, 14 Mar 2014 22:50:57 +0000 (15:50 -0700)]
Merge "UpdateHistogramCost: avoid implicit double->float"

5 weeks ago UpdateHistogramCost: avoid implicit double->float 45/69245/1
James Zern [Fri, 14 Mar 2014 18:18:52 +0000 (11:18 -0700)]
UpdateHistogramCost: avoid implicit double->float

all the functions involved return double and later these locals are used
in double calculations. fixes a vs build warning

Change-Id: Idb547104ef00b48c71c124a774ef6f2ec5f30f14

5 weeks ago Extend the search space for GetBestGreenRedToBlue 42/69242/1
Vikas Arora [Fri, 14 Mar 2014 16:54:58 +0000 (09:54 -0700)]
Extend the search space for GetBestGreenRedToBlue

Get back some of the compression gains by extending the search space for
GetBestGreenRedToBlue. Also removed the SkipRepeatedPixels call, as it was not
helping much in yielding better compression density.

Before:
 1000 files, 63530337 pixels, 1 loops => 45.0s (45.0 ms/file/iterations)
 Compression (output/input): 2.463/3.268 bpp, Encode rate (raw data): 1.347 MP/s

After:
1000 files, 63530337 pixels, 1 loops => 45.9s (45.9 ms/file/iterations)
 Compression (output/input): 2.461/3.268 bpp, Encode rate (raw data): 1.321 MP/s

Change-Id: I044ba9d3f5bec088305e94a7c40c053ca237fd9d

5 weeks ago Fix few nits 31/69231/1
Vikas Arora [Thu, 13 Mar 2014 20:56:30 +0000 (13:56 -0700)]
Fix few nits

Add/remove few casts, fixed indentation.

Change-Id: Icd141694201843c04e476f09142ce4be6e502dff

5 weeks ago Optimize and re-structure VP8LGetHistoImageSymbols 28/69228/2
Vikas Arora [Thu, 13 Mar 2014 18:34:12 +0000 (11:34 -0700)]
Optimize and re-structure VP8LGetHistoImageSymbols

Optimize and restructure the VP8LGetHistoImageSymbols method by using the
bin-hash to merge the Histograms more efficiently, instead of the randomized
heuristic of the existing method HistogramCombine.

This change speeds up lossless encoding by 40-50% (for method=4 and Q > 50)
with a 0.8% penalty in compression density. For lower methods, the speed-up is
25-30%, with a 0.4% penalty in compression density.

Change-Id: If61adadb1a041b95def6405aa1fe3b83c3cb25ce

5 weeks ago Optimize lossless decoding. 27/69227/1
Vikas Arora [Thu, 13 Mar 2014 18:26:28 +0000 (11:26 -0700)]
Optimize lossless decoding.

Restructure PredictorInverseTransform & ColorSpaceInverseTransform to remove
one 'if' condition inside the main/critical loop. Also separate TransformColor &
TransformColorInverse into distinct functions, avoiding one 'if' condition
inside this critical method.

This change speeds up lossless decoding by about 5% for the Lenna image and by
3-4% for the 1000-image corpus.

Change-Id: I4bd390ffa4d3bcf70ca37ef2ff2e81bedbba197d

5 weeks ago Do a binary search to get the optimum cache bits. 26/69226/1
Vikas Arora [Thu, 13 Mar 2014 17:29:50 +0000 (10:29 -0700)]
Do a binary search to get the optimum cache bits.

This speeds up the lossless encoder by a bit (1-2%), without impacting the
compression density.

Change-Id: Ied6fb38fab58eef9ded078697e0463fe7c560b26
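
A binary search over the cache-bits range only finds the optimum if the cost behaves well. The sketch below assumes the cost strictly decreases down to a single minimum and strictly increases afterwards. The names (`CostFunc`, `BestCacheBits`, `ExampleCost`) are hypothetical, and the toy cost stands in for the encoder's entropy estimate, which is not shown in this log.

```c
/* Hypothetical cost callback: estimated cost when using 'cache_bits'. */
typedef double (*CostFunc)(int cache_bits);

/* Toy cost with its minimum at 7 bits, for demonstration only. */
static double ExampleCost(int cache_bits) {
  const double d = (double)(cache_bits - 7);
  return d * d;
}

/* Returns the value in [lo, hi] with minimal cost, assuming the cost
 * strictly decreases up to a single minimum and strictly increases after. */
static int BestCacheBits(CostFunc cost, int lo, int hi) {
  while (lo < hi) {
    const int mid = lo + (hi - lo) / 2;
    if (cost(mid) < cost(mid + 1)) {
      hi = mid;      /* the minimum is at mid or to its left */
    } else {
      lo = mid + 1;  /* the minimum is to the right of mid */
    }
  }
  return lo;
}
```

Compared with evaluating every candidate, this needs on the order of 2 * log2(hi - lo + 1) cost evaluations, which is consistent with a small speed-up when each evaluation is expensive.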

6 weeks ago Merge "allow 'cwebp -o -' to emit output to stdout"
James Zern [Wed, 12 Mar 2014 21:01:15 +0000 (14:01 -0700)]
Merge "allow 'cwebp -o -' to emit output to stdout"

6 weeks ago allow 'cwebp -o -' to emit output to stdout 09/69209/2
skal [Wed, 12 Mar 2014 18:48:00 +0000 (19:48 +0100)]
allow 'cwebp -o -' to emit output to stdout

Change-Id: I423d25898e1ba317ccbf456bb28ce45663a3b3d2

6 weeks ago allow some more stdin/stdout I/O 08/69208/1
skal [Wed, 12 Mar 2014 18:32:16 +0000 (19:32 +0100)]
allow some more stdin/stdout I/O

 * allow reading from stdin for dwebp / vwebp
 * allow writing to stdout for gif2webp

by introducing a new function ExUtilReadFromStdin()

Example use: cat in.webp | dwebp -o - -- - > out.png

Note that the '-- -' option must appear *last*
(as per general fashion for '--' option parsing)

Change-Id: I8df0f3a246cc325925d6b6f668ba060f7dd81d68
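
Reading from stdin differs from reading a file in that the total size is unknown up front, so the buffer has to grow as data arrives. The sketch below shows one way to do that. `ReadAll` is a hypothetical helper and not the actual ExUtilReadFromStdin(), whose signature is not given in this log.

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads 'in' until EOF into a malloc'ed buffer. On success, returns 1 and
 * stores the buffer (to be free()'d by the caller) and its size.
 * Returns 0 on allocation failure or read error. */
static int ReadAll(FILE* in, unsigned char** data, size_t* size) {
  size_t capacity = 0, used = 0;
  unsigned char* buf = NULL;
  for (;;) {
    size_t n;
    if (used == capacity) {
      /* Buffer is full (or not yet allocated): double its capacity. */
      const size_t new_capacity = (capacity == 0) ? 4096 : 2 * capacity;
      unsigned char* const tmp = (unsigned char*)realloc(buf, new_capacity);
      if (tmp == NULL) { free(buf); return 0; }
      buf = tmp;
      capacity = new_capacity;
    }
    n = fread(buf + used, 1, capacity - used, in);
    used += n;
    if (n == 0) break;  /* EOF or read error */
  }
  if (ferror(in)) { free(buf); return 0; }
  *data = buf;
  *size = used;
  return 1;
}
```

A tool would call `ReadAll(stdin, &data, &size)` when the input name is "-". On platforms that distinguish text and binary streams, stdin would presumably also need to be switched to binary mode first.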

6 weeks ago fix cwebp.1 typos after patch #69199 00/69200/1
skal [Tue, 11 Mar 2014 22:46:50 +0000 (23:46 +0100)]
fix cwebp.1 typos after patch #69199

Change-Id: I046a54dbb4210319ddb156f49dd9d13d47b0d035

6 weeks ago add a -z option to cwebp, and WebPConfigLosslessPreset() function 99/69199/1
skal [Tue, 11 Mar 2014 22:25:35 +0000 (23:25 +0100)]
add a -z option to cwebp, and WebPConfigLosslessPreset() function

These are presets for lossless coding, similar to zlib.
The shortcut for lossless coding is now, e.g.:
   cwebp -z 5 in.png -o out_lossless.webp

There are 10 possible values for -z parameter:
   0 (fastest, lowest compression)
to 9 (slowest, best compression)

A reasonable tradeoff is e.g. -z 6.
-z 9 can be quite slow, so use with care.

This -z option is just a shortcut for some pre-defined
'-lossless -m xx -q yy' combinations.

Change-Id: I6ae716456456aea065469c916c2d5ca4d6c6cf04
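
Since -z is described as a shortcut for pre-defined '-lossless -m xx -q yy' combinations, the preset mechanism amounts to a 10-entry lookup table. The sketch below shows that shape only. The (method, quality) pairs are placeholders chosen for illustration and are NOT libwebp's actual values, and `GetLosslessPreset` is a hypothetical function, not WebPConfigLosslessPreset().

```c
#include <stddef.h>

typedef struct {
  int method;   /* corresponds to '-m' */
  int quality;  /* corresponds to '-q' */
} LosslessPreset;

/* Placeholder values for illustration only; NOT libwebp's actual table. */
static const LosslessPreset kPresets[10] = {
  { 0, 0 },  { 1, 10 }, { 2, 20 }, { 2, 30 }, { 3, 40 },
  { 3, 50 }, { 4, 60 }, { 4, 70 }, { 5, 85 }, { 6, 100 }
};

/* Returns 1 and fills '*preset' for level in [0, 9]; returns 0 otherwise. */
static int GetLosslessPreset(int level, LosslessPreset* preset) {
  if (level < 0 || level > 9 || preset == NULL) return 0;
  *preset = kPresets[level];
  return 1;
}
```

Rejecting out-of-range levels instead of clamping them lets the command-line tool report a bad -z argument to the user.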

7 weeks ago 4-5% faster trellis by removing some unneeded calculations. 30/69130/2
skal [Wed, 5 Mar 2014 22:21:43 +0000 (23:21 +0100)]
4-5% faster trellis by removing some unneeded calculations.

(We don't need the exact value of max_error:
we can work with relative values instead of absolute ones.)

Output is bitwise the same as before.

Change-Id: I67aeaaea5f81bfd9ca8e1158387a5083a2b6c649

7 weeks ago histogram.c: reindent after b33e8a0 04/69104/1
James Zern [Tue, 4 Mar 2014 08:38:14 +0000 (00:38 -0800)]
histogram.c: reindent after b33e8a0

b33e8a0 Refactor code for HistogramCombine.

Change-Id: Ia1b4b545c5f4e29cc897339df2b58f18f83c15b3

7 weeks ago Merge "~3-4% faster lossless encoding"
skal [Tue, 4 Mar 2014 08:17:52 +0000 (00:17 -0800)]
Merge "~3-4% faster lossless encoding"

7 weeks ago ~3-4% faster lossless encoding 86/69086/2
skal [Mon, 3 Mar 2014 23:00:40 +0000 (00:00 +0100)]
~3-4% faster lossless encoding

by re-arranging some code from SkipRepeatedPixel()

Change-Id: I6c1fd7cd9af22cd9be4234217ff67d7b94f44137

7 weeks ago Merge "few cosmetics after patch #69079"
James Zern [Mon, 3 Mar 2014 23:13:25 +0000 (15:13 -0800)]
Merge "few cosmetics after patch #69079"

7 weeks ago few cosmetics after patch #69079 84/69084/1
skal [Mon, 3 Mar 2014 22:53:08 +0000 (23:53 +0100)]
few cosmetics after patch #69079

Change-Id: Ifa758420421b5a05825a593f6b43504887603ee7

7 weeks ago Refactor code for HistogramCombine. 80/69080/1
Vikas Arora [Mon, 3 Mar 2014 21:49:54 +0000 (13:49 -0800)]
Refactor code for HistogramCombine.

Refactor code for HistogramCombine and optimize it by calculating
the combined entropy and avoiding unnecessary Histogram merges.

This speeds up lossless encoding by 1-2%, with almost no impact on compression
density.

Change-Id: Iedfcf4c1f3e88077bc77fc7b8c780c4cd5d6362b