Xfermode: SSE2 implementation of darken and lighten modes

With the SSE2 optimization, the run time of the two related benchmarks
drops by about 45% on a desktop i7-3770. Benchmark data:
before:
Xfermode_Lighten   8888:  cmsecs =     33.60   565:  cmsecs =     48.84
 Xfermode_Darken   8888:  cmsecs =     34.16   565:  cmsecs =     48.99
after:
Xfermode_Lighten   8888:  cmsecs =     18.71   565:  cmsecs =     25.41
 Xfermode_Darken   8888:  cmsecs =     18.39   565:  cmsecs =     25.40
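
For reference, a minimal scalar sketch of the per-channel math that darken
and lighten compute on premultiplied pixels, using the standard compositing
formulas. This is not the code in this change: the helper names
(div255_round, darken_channel, lighten_channel) and the rounding choice are
assumptions made for illustration. An SSE2 version would apply the same
arithmetic to several channels at once.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper: rounded division by 255 for a product of two
// 8-bit values. The rounding choice is an assumption of this sketch.
static inline uint32_t div255_round(uint32_t x) {
    return (x + 127) / 255;
}

// Darken, one premultiplied channel: s + d - max(s*da, d*sa) / 255.
// Assumes valid premultiplied input (s <= sa, d <= da), so the result
// fits in 8 bits without clamping.
static inline uint8_t darken_channel(uint32_t s, uint32_t d,
                                     uint32_t sa, uint32_t da) {
    return static_cast<uint8_t>(s + d - div255_round(std::max(s * da, d * sa)));
}

// Lighten, one premultiplied channel: s + d - min(s*da, d*sa) / 255.
// Same input assumption as darken_channel.
static inline uint8_t lighten_channel(uint32_t s, uint32_t d,
                                      uint32_t sa, uint32_t da) {
    return static_cast<uint8_t>(s + d - div255_round(std::min(s * da, d * sa)));
}
```

For the alpha channel, s == sa and d == da, so both formulas reduce to
sa + da - sa*da/255 and the same helper can be used for all four channels.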

BUG=skia:
R=mtklein@google.com

Author: qiankun.miao@intel.com

Review URL: https://codereview.chromium.org/234653002

git-svn-id: http://skia.googlecode.com/svn/trunk/src@14395 2bbb7eff-a529-9590-31e7-b0007b416f81
1 file changed