Rework 32-bit SSE loads yet again.

The existing code in vec_avx.h produced
  warning: dereferencing type-punned pointer will break
   strict-aliasing rules
 with gcc 6.4.0.
We already had a macro to work around this within the rules of the
 C standard, but trying to use that here does not get optimized
 into a single MOVD as we had hoped.
Replacing it with memcpy() instead does get optimized correctly,
 but requires switching from a macro to an inline function in order
 to be able to declare a local variable and return a value.
We already have such an inline function in NSQ_del_dec_avx2.c, so
 hoist that out and use it everywhere, and then convert vec_avx.h
 to use it also.
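A minimal sketch of the memcpy()-based approach, assuming SSE2 (which is baseline on x86-64, so no extra compiler flags are needed there). The names `load_si32` and `widen4_s8` are hypothetical. This is not the helper actually hoisted out of NSQ_del_dec_avx2.c. `widen4_s8` only illustrates the kind of caller that vec_avx.h might contain.

```c
#include <stdint.h>
#include <string.h>
#include <emmintrin.h> /* SSE2 intrinsics */

/* Hypothetical helper. Loads 32 bits from any address (no alignment
   or type requirement) into the low lane of an __m128i and zeroes
   the upper three lanes. Copying into a local int32_t with memcpy()
   is the standard-conforming way to reinterpret the bytes, so there
   is no strict-aliasing violation. Optimizing compilers usually
   reduce this to a single MOVD, but that is not guaranteed. */
static inline __m128i load_si32(const void *p)
{
    int32_t v;
    memcpy(&v, p, sizeof(v));
    return _mm_cvtsi32_si128(v);
}

/* Hypothetical caller: sign-extend four int8 values to four int32
   lanes using only SSE2. The type-punned form
   _mm_cvtsi32_si128(*(const int *)w) is the pattern that triggers
   the gcc warning. */
static inline __m128i widen4_s8(const int8_t *w)
{
    __m128i b = load_si32(w);
    b = _mm_unpacklo_epi8(b, b);   /* bytes: w0 w0 w1 w1 w2 w2 w3 w3 ... */
    b = _mm_unpacklo_epi16(b, b);  /* each 32-bit lane: wi wi wi wi */
    return _mm_srai_epi32(b, 24);  /* arithmetic shift sign-extends wi */
}
```

Whether the copy actually collapses to one MOVD depends on the compiler and optimization level. Inspecting the generated assembly (for example with gcc -O2 -S) is the way to confirm it.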
3 files changed