Fastor V0.5.1

@romeric romeric released this 08 Apr 01:51
· 369 commits to master since this release

Although it carries only a minor version tag, Fastor V0.5.1 includes major changes, especially in API design, performance and stability:

  1. SIMDVector has been reworked to fix the long-standing issue with the fall-back to non-SIMD code for non-64-bit types. The fall-back is now always to the correct scalar type where a scalar specialisation is available (i.e. float, double, int32_t, int64_t), and to a fixed array of size 1 holding the type in all other cases. The API is now a lot closer to Vc and std::experimental::simd. SIMDVector for floating-point types is now also activated at the SSE2 level, so any compiler that automatically defines __SSE2__ can vectorise Fastor's code without -march=native, since all compilers these days define __SSE2__ at -O2/-O3 levels.
  2. Fixed a long-standing bug in network tensor contraction. The opmin_meta/cost models have been reworked to be truly compile-time recursive, using a depth-first search. Strided contractions have been removed completely for networks and deactivated for pairs. Tensor contraction of networks now dispatches to by-pair einsum, which has many specialisations, including dispatching to matmul. This gives more than an order of magnitude performance gain in certain cases.
  3. Extremely fast matmul/gemm routines. Fastor now provides potentially the fastest gemm routine for small to medium-sized single- and double-precision tensors, as far as static dispatch is concerned. Benchmarks have been added here. Many flavours of matmul implementations are now available for different sizes, with remainder handling and masked loading/storing.
  4. AVX512 support for single- and double-precision floats
  5. Better macro handling through a series of new FASTOR_... macros
  6. Accurate timeit function based on rdtsc, together with memory clobber and serialisation for further accuracy
  7. Fastor is now Windows compatible. The whole test suite runs and passes on MSVC 2019
  8. Quite a few bugs and compiler warnings have been fixed along the way
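
The fall-back behaviour described in item 1 can be illustrated with a small, self-contained sketch. This is not Fastor's implementation or API: the `sketch` namespace, `lane_count`, `vec` and `sum` are hypothetical names invented for illustration, and the lane counts are simply the register width divided by the element size. The idea shown is that the lane count is chosen from the instruction-set macros the compiler defines (such as `__SSE2__`), and any type without a specialisation falls back to a fixed array of size 1.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace sketch {  // hypothetical namespace, not part of Fastor

// Default: no SIMD specialisation, so fall back to a single lane.
template <typename T>
struct lane_count { static constexpr std::size_t value = 1; };

// Pick the widest floating-point lane count the compiler advertises.
#if defined(__AVX512F__)
template <> struct lane_count<float>  { static constexpr std::size_t value = 16; };
template <> struct lane_count<double> { static constexpr std::size_t value = 8; };
#elif defined(__AVX__)
template <> struct lane_count<float>  { static constexpr std::size_t value = 8; };
template <> struct lane_count<double> { static constexpr std::size_t value = 4; };
#elif defined(__SSE2__)
template <> struct lane_count<float>  { static constexpr std::size_t value = 4; };
template <> struct lane_count<double> { static constexpr std::size_t value = 2; };
#endif

// A plain-array stand-in for a SIMD vector; a real library would wrap
// intrinsic register types here instead of an array.
template <typename T>
struct vec {
    static constexpr std::size_t size = lane_count<T>::value;
    T lanes[size];

    // Broadcast a scalar to every lane.
    explicit vec(T v) {
        for (std::size_t i = 0; i < size; ++i) lanes[i] = v;
    }

    T operator[](std::size_t i) const {
        assert(i < size);
        return lanes[i];
    }
};

// Lane-wise addition.
template <typename T>
vec<T> operator+(const vec<T>& a, const vec<T>& b) {
    vec<T> r(T(0));
    for (std::size_t i = 0; i < vec<T>::size; ++i) r.lanes[i] = a.lanes[i] + b.lanes[i];
    return r;
}

// Horizontal sum over all lanes.
template <typename T>
T sum(const vec<T>& a) {
    T s = T(0);
    for (std::size_t i = 0; i < vec<T>::size; ++i) s += a.lanes[i];
    return s;
}

}  // namespace sketch
```

With this sketch, `sketch::vec<std::int16_t>::size` is always 1 because no specialisation exists for that type, while `sketch::vec<float>::size` depends on which of the macros above the compiler defines for the build.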