Xfermode: SSE2 implementation of colordodge and colorburn modes

With SSE2 optimization, the related benchmarks improve by about 45% for
Xfermode_ColorDodge and only slightly (about 2 to 3%) for Xfermode_ColorBurn
on a desktop i7-3770. The small improvement for Xfermode_ColorBurn is
probably because the portable version mostly takes the fast if branch,
while the SSE2 version computes all three if-else branches for every pixel.
Benchmark results:
before:
Xfermode_ColorDodge   8888:  cmsecs =  73.71   565:  cmsecs =  82.88
 Xfermode_ColorBurn   8888:  cmsecs =  46.46   565:  cmsecs =  52.23
after:
Xfermode_ColorDodge   8888:  cmsecs =  39.70   565:  cmsecs =  47.45
 Xfermode_ColorBurn   8888:  cmsecs =  45.02   565:  cmsecs =  51.15
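The cost difference described above can be illustrated with a minimal sketch. It is not the code from this change: it uses the W3C separable color-dodge formula on normalized, non-premultiplied floats instead of Skia's integer premultiplied math, assumes an x86/x86-64 target, and the names dodge_scalar, select_ps, dodge_sse and dodge_versions_agree are hypothetical. The scalar version returns early from the first matching branch, while the SSE version always evaluates the division for all four lanes and then picks per lane with compare masks.

```c
#include <emmintrin.h>  /* SSE2 header; also provides the SSE float ops used here */
#include <stdbool.h>

/* Scalar color dodge on one normalized channel (W3C separable form).
   cb = backdrop (destination), cs = source. Each call takes one branch. */
static float dodge_scalar(float cb, float cs) {
    if (cb == 0.0f) return 0.0f;          /* branch 1: cheap early exit */
    if (cs == 1.0f) return 1.0f;          /* branch 2: cheap early exit */
    float r = cb / (1.0f - cs);           /* branch 3: needs a division */
    return r < 1.0f ? r : 1.0f;
}

/* Per lane: mask ? a : b. Mask lanes are all-ones or all-zeros. */
static __m128 select_ps(__m128 mask, __m128 a, __m128 b) {
    return _mm_or_ps(_mm_and_ps(mask, a), _mm_andnot_ps(mask, b));
}

/* Branch-free version for 4 lanes: every branch is computed, then selected. */
static __m128 dodge_sse(__m128 cb, __m128 cs) {
    const __m128 zero = _mm_setzero_ps();
    const __m128 one  = _mm_set1_ps(1.0f);

    /* Branch 3 for every lane, even lanes that branch 1 or 2 will claim.
       Lanes with cs == 1 divide by zero (inf or NaN); the selects below
       overwrite them. */
    __m128 general = _mm_min_ps(one, _mm_div_ps(cb, _mm_sub_ps(one, cs)));

    __m128 cb_is_zero = _mm_cmpeq_ps(cb, zero);
    __m128 cs_is_one  = _mm_cmpeq_ps(cs, one);

    /* Apply the lower-priority branch first so branch 1 wins, matching
       the order of the if-else chain in dodge_scalar. */
    __m128 r = select_ps(cs_is_one, one, general);
    return select_ps(cb_is_zero, zero, r);
}

/* Runs both versions on 4 sample lanes; false on mismatch or NaN. */
static bool dodge_versions_agree(const float cb[4], const float cs[4]) {
    float out[4];
    _mm_storeu_ps(out, dodge_sse(_mm_loadu_ps(cb), _mm_loadu_ps(cs)));
    for (int i = 0; i < 4; ++i) {
        float d = out[i] - dodge_scalar(cb[i], cs[i]);
        if (!(d <= 1e-6f && d >= -1e-6f)) return false;
    }
    return true;
}
```

When most pixels would hit a cheap early branch, as the message suggests for ColorBurn, the scalar code skips the division entirely while the vector code pays for it on every lane, which limits how much the SIMD version can gain.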

BUG=skia:
R=mtklein@google.com

Author: qiankun.miao@intel.com

Review URL: https://codereview.chromium.org/224823004

git-svn-id: http://skia.googlecode.com/svn/trunk/src@14377 2bbb7eff-a529-9590-31e7-b0007b416f81
2 files changed