
Thread: Small libs to use inside compression libraries

  1. #1
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts

    Small libs to use inside compression libraries

    I believe that projects like xxhash and lz5 can significantly reduce the burden of portability/maintenance work by offloading it to auxiliary libraries developed by nemequ:

  2. The Following User Says Thank You to Bulat Ziganshin For This Useful Post:

    nemequ (13th February 2017)

  3. #2
    Member
    Join Date
    Jul 2013
    Location
    United States
    Posts
    194
    Thanks
    44
    Thanked 140 Times in 69 Posts
    I guess I can finish that thought… Stuff I maintain and/or wrote includes

    * Hedley — Hedley is a C header you can include in your project to enable compiler specific features while maintaining compatibility, with a focus on making your API harder to misuse by providing hints to compilers and static analyzers. This one may actually be my favorite project; it's amazingly helpful if you're into good warning/error messages, portability, and it even has some nice helpers for optimization.
    * µnit — µnit is a small and portable unit testing framework for C which includes pretty much everything you might expect from a C testing framework, plus a few pleasant surprises, wrapped in a nice API. Some nice features for compression libraries include a PRNG and inclusion of timing information in the output. I mentioned this here a while back.
    * TinyCThread — TinyCThread is a cross-platform implementation of the C11 threading API which uses POSIX or the Windows API, allowing you to use the standard C11 API on systems which don’t natively support it.
    * configure-cmake — Wrapper which provides an autotools-style configure script for CMake projects. Shell script, so it should work anywhere configure would. If your project uses CMake, please consider this; it makes CMake projects much easier for your users to build.
    * portable-snippets — Curated collection of miscellaneous portable code snippets (in C) for various tasks which are typically compiler or platform-specific. Currently this includes atomics, endianness, and portable GCC builtin-style functions backed by MSVC intrinsics or pure C (I plan to implement MSVC intrinsics using GCC eventually, too).
    * safe-math — Set of portable, overflow and underflow-safe math functions.
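
    To illustrate what "overflow-safe" means for the safe-math entry above, here is a minimal sketch of a checked signed addition. The name example_add_int_checked is hypothetical and is not safe-math's real API; the GCC 5+ builtin branch is an assumption about the compiler, with a plain C fallback otherwise.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper, NOT safe-math's real API: stores a + b in *res and
 * returns 1, or returns 0 (leaving *res untouched) if the sum would not fit. */
static int example_add_int_checked(int a, int b, int *res) {
  assert(res != NULL);
#if defined(__GNUC__) && (__GNUC__ >= 5)
  /* GCC 5+ provides a builtin that reports overflow directly. */
  int tmp;
  if (__builtin_add_overflow(a, b, &tmp))
    return 0;
  *res = tmp;
  return 1;
#else
  /* Portable check done before the addition, so the signed overflow
   * (undefined behavior in C) never actually happens. */
  if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
    return 0;
  *res = a + b;
  return 1;
#endif
}
```

    A caller would check the return value before trusting *res, e.g. when adding a header size to a user-supplied length.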

    I'd also like to point out some great libraries like this written and/or maintained by other people:

    * parg — Written by Jørgen Ibsen (Jibz around here). It's basically a portable getopt()/getopt_long() which helps you create great command-line interfaces with ease. Much better than trying to roll your own CLI.
    * pstdint.h — stdint.h for compilers which don't have it (like MSVC up until 2013).
    * win-iconv — Implementation of iconv() for Windows. I'm not sure how useful this will be for compression libraries, but if you have to do charset conversion this is a great way to go.

    If anyone has any ideas for additions, I'd be happy to add them to the list in the portable-snippets README.

  4. The Following 6 Users Say Thank You to nemequ For This Useful Post:

    Bulat Ziganshin (13th February 2017),Cyan (13th February 2017),inikep (13th February 2017),JamesB (14th February 2017),jibz (16th February 2017),seth (14th February 2017)

  5. #3
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    I have seen those definitions for years in my own projects like tornado, then in Cyan's projects, flying around from lz4 to xxhash to zstd and lz5. It may be a good time to outsource the compiler compatibility layer to those micro-libs.

    It would be great to hear what inikep and Cyan think about it.

    Good news is that TinyCThread uses the same API as the C11 standard: https://github.com/tinycthread/tinyc...ment-279466593 , I somehow lost sight of it

    I think that moving atomics into TinyCThread library may be a good idea. Just mention in the docs that the two parts of library are completely independent. Otherwise, you can just point in the TinyCThread docs to the atomics library, so users employing threads will know where to find atomics. Again, implementation via C/C++ standard libs will make the atomics more portable and seems pretty obvious in implementation.

    The next API level I would be glad to see in TinyCThread is an implementation of a concurrent queue and thread pool on top of these primitives. It's the standard set of APIs for task-based parallelism, and in particular it could be used to implement MTZSTD at a higher level.

    Finally, you have just started the portable builtins library. What about doing it the other way: 1) implement psnip_* primitives abstracting the compiler API, essentially just copying existing implementations of these functions from ZSTD or similar, 2) provide emulation of gcc/msvc builtins via calls to the psnip_* functions. That way, projects can rely on the psnip_* API and don't limit themselves to the gcc or msvc set of intrinsics.

  6. #4
    Member
    Join Date
    Jul 2013
    Location
    United States
    Posts
    194
    Thanks
    44
    Thanked 140 Times in 69 Posts
    Quote Originally Posted by Bulat Ziganshin View Post
    I have seen those definitions for years in my own projects like tornado, then in Cyan's projects, flying around from lz4 to xxhash to zstd and lz5. It may be a good time to outsource the compiler compatibility layer to those micro-libs.
    Yeah, most of these started because I was sick of seeing lots of incomplete little implementations of the same stuff in all my projects. µnit is a bit of an exception to that (I wasn't really happy with anything out there).

    Quote Originally Posted by Bulat Ziganshin View Post
    I think that moving atomics into TinyCThread library may be a good idea. Just mention in the docs that the two parts of library are completely independent. Otherwise, you can just point in the TinyCThread docs to the atomics library, so users employing threads will know where to find atomics. Again, implementation via C/C++ standard libs will make the atomics more portable and seems pretty obvious in implementation.
    I disagree. If I could implement the C11 atomics API I'd be interested in doing this, but it's just not possible to emulate C11 atomics without compiler support. Since the API has to be different, I think it's a better fit for portable-snippets.

    Quote Originally Posted by Bulat Ziganshin View Post
    The next API level I would be glad to see in TinyCThread is an implementation of a concurrent queue and thread pool on top of these primitives. It's the standard set of APIs for task-based parallelism, and in particular it could be used to implement MTZSTD at a higher level.
    Again, I don't know about putting those in TinyCThread, but I've been wanting a C11-based thread pool for a long time, just haven't gotten around to it.

    I think a higher-level C library which uses TinyCThread (or a libc-provided C11 thread API) could be very nice. It could also contain things like promises, cancellables, etc.

    I actually wrote a library called Bump a while back which could be an interesting source of ideas (the User Guide is a good place to start). It's written in Vala, but most of the concepts map pretty well to C (Vala maps pretty well to C). Some personal favorites include a callback-based API for mutexes/semaphores (lock, run callback, unlock), and a resource pool.
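
    The "lock, run callback, unlock" idea maps to C roughly as follows. This is only a sketch on top of POSIX threads (an assumption; a C11 mtx_t would work the same way), and example_mutex_execute is a made-up name, not Bump's actual API.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Callback type run while the mutex is held (hypothetical name). */
typedef void (*example_locked_fn)(void *user_data);

/* Locks the mutex, runs the callback, then unlocks. The caller can no
 * longer forget the unlock on an early-return path. */
static void example_mutex_execute(pthread_mutex_t *mutex,
                                  example_locked_fn fn, void *user_data) {
  assert(mutex != NULL && fn != NULL);
  pthread_mutex_lock(mutex);
  fn(user_data);
  pthread_mutex_unlock(mutex);
}

/* Example callback: increments the int that user_data points to. */
static void example_increment(void *user_data) {
  int *counter = user_data;
  ++*counter;
}
```

    The appeal is that the critical section becomes a single call, e.g. example_mutex_execute(&m, example_increment, &counter).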

    Quote Originally Posted by Bulat Ziganshin View Post
    Finally, you have just started the portable builtins library. What about doing it the other way: 1) implement psnip_* primitives abstracting the compiler API, essentially just copying existing implementations of these functions from ZSTD or similar, 2) provide emulation of gcc/msvc builtins via calls to the psnip_* functions. That way, projects can rely on the psnip_* API and don't limit themselves to the gcc or msvc set of intrinsics.
    That's kind of what I'm doing, but I'm taking the inspiration from compiler primitives instead of projects like zstd. For example, GCC has __builtin_ffs. portable-snippets has a psnip_builtin_ffs with the same arguments and return value that works exactly the same as __builtin_ffs. If __builtin_ffs is available, psnip just defines psnip_builtin_ffs to __builtin_ffs. If not, it checks to see if _BitScanForward exists (an MSVC intrinsic) and, if so, will implement psnip_builtin_ffs using _BitScanForward. If it isn't available, psnip will define a fully portable version which should work with any C/C++ compiler.
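
    The three-step fallback described above can be sketched like this. It is not the actual portable-snippets source; example_ffs and example_ffs_portable are hypothetical names, and the compiler detection is deliberately simplistic.

```c
#if defined(_MSC_VER)
#  include <intrin.h>
#endif

/* Fully portable version: returns one plus the index of the least
 * significant set bit, or 0 if v is 0 (same contract as __builtin_ffs). */
static int example_ffs_portable(int v) {
  unsigned int u = (unsigned int) v;
  int pos = 1;
  if (u == 0)
    return 0;
  while ((u & 1u) == 0) {
    u >>= 1;
    ++pos;
  }
  return pos;
}

/* Dispatcher: prefer the GCC builtin, then the MSVC intrinsic, then plain C. */
static int example_ffs(int v) {
#if defined(__GNUC__)
  return __builtin_ffs(v);
#elif defined(_MSC_VER)
  unsigned long idx;
  /* _BitScanForward returns 0 when the mask is 0, otherwise sets idx. */
  return _BitScanForward(&idx, (unsigned long) v) ? (int) idx + 1 : 0;
#else
  return example_ffs_portable(v);
#endif
}
```

    Callers only ever see example_ffs, so which branch was compiled in is invisible to them.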

    At some point I'd also like to have a psnip_BitScanForward (or maybe PsnipBitScanForward, I don't really care) which is implemented using __builtin_ffs if available, or portable code if not.

    Currently, there is support for ffs, clz, ctz, clrsb, popcount, and parity. bswap is the only one missing, and I only didn't bother with that because of endian.h, but it would be easy to implement.
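
    For reference, a bswap along the same lines is only a few lines. This is a sketch with hypothetical names (example_bswap32 and friends), not code from portable-snippets; the GCC 4.3 version check for __builtin_bswap32 is my assumption about when the builtin became available.

```c
#include <stdint.h>
#if defined(_MSC_VER)
#  include <stdlib.h> /* _byteswap_ulong */
#endif

/* Plain C fallback: reverses the four bytes of v. */
static uint32_t example_bswap32_portable(uint32_t v) {
  return ((v & 0x000000FFu) << 24) |
         ((v & 0x0000FF00u) << 8)  |
         ((v & 0x00FF0000u) >> 8)  |
         ((v & 0xFF000000u) >> 24);
}

/* Dispatcher: compiler builtin or intrinsic when known, plain C otherwise. */
static uint32_t example_bswap32(uint32_t v) {
#if defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 3))
  return __builtin_bswap32(v);
#elif defined(_MSC_VER)
  return (uint32_t) _byteswap_ulong((unsigned long) v);
#else
  return example_bswap32_portable(v);
#endif
}
```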

    To be clear, people will be free to mix and match; there is nothing to stop people from using both psnip_builtin_ffs *and* psnip_BitScanForward. As for shared functions that aren't intrinsics in either compiler, I don't have any problem adding another header to psnip for that.

    Honestly, my biggest problem with a lot of these libraries is that I still end up copying stuff, though admittedly not as much. It's great to have single-file libraries that you can drop in anywhere, but if you don't allow dependencies you still end up copying. For example, endian.h could use a bswap from builtin-gnu.h to cut out some logic. µnit could be improved by depending on atomic.h. Pretty much everything could be improved by using Hedley.

    I'm tempted to suggest just rolling everything into one big library, but then you end up with glib, and look at how many people avoid using glib…
    Last edited by nemequ; 14th February 2017 at 00:51. Reason: add 3rd to last paragraph (about mixing apis)

  7. #5
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    1. The real inline directive for VS is __forceinline - others are often ignored. For gcc it's __attribute__((always_inline))
    2. It's also useful to have an anti-inline macro - __declspec(noinline) / __attribute__((noinline))
    3. Other useful builtins are structure alignment, "restrict", __assume, __assume_aligned, __builtin_expect

  8. #6
    Member
    Join Date
    Jul 2013
    Location
    United States
    Posts
    194
    Thanks
    44
    Thanked 140 Times in 69 Posts
    Quote Originally Posted by Shelwien View Post
    1. The real inline directive for VS is __forceinline - others are often ignored. For gcc it's __attribute__((always_inline))
    2. It's also useful to have an anti-inline macro - __declspec(noinline) / __attribute__((noinline))
    3. Other useful builtins are structure alignment, "restrict", __assume, __assume_aligned, __builtin_expect
    With the exception of __assume_aligned and, debatably, __assume, this is all in Hedley:

    1. HEDLEY_INLINE, HEDLEY_ALWAYS_INLINE
    2. HEDLEY_NEVER_INLINE
    3. HEDLEY_RESTRICT, HEDLEY_LIKELY/HEDLEY_UNLIKELY
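
    For readers who haven't seen this kind of wrapper, the general shape is sketched below. These EX_* macros are hypothetical stand-ins written for this example, not Hedley's real definitions (Hedley's version checks are far more thorough).

```c
/* Hypothetical wrappers, NOT Hedley's definitions. */
#if defined(_MSC_VER)
#  define EX_ALWAYS_INLINE __forceinline
#  define EX_NEVER_INLINE  __declspec(noinline)
#  define EX_UNLIKELY(x)   (!!(x))
#elif defined(__GNUC__)
#  define EX_ALWAYS_INLINE __inline__ __attribute__((always_inline))
#  define EX_NEVER_INLINE  __attribute__((noinline))
#  define EX_UNLIKELY(x)   __builtin_expect(!!(x), 0)
#else
#  define EX_ALWAYS_INLINE /* unknown compiler: no hint */
#  define EX_NEVER_INLINE
#  define EX_UNLIKELY(x)   (!!(x))
#endif

/* Hot helper we want inlined everywhere. */
static EX_ALWAYS_INLINE unsigned ex_min_u(unsigned a, unsigned b) {
  return a < b ? a : b;
}

/* Cold error path we want kept out of the hot loop. */
static EX_NEVER_INLINE int ex_report_error(int code) {
  return -code;
}

/* Returns the number of bytes to copy, or a negative error code. */
static int ex_copy_len(unsigned want, unsigned avail) {
  if (EX_UNLIKELY(want > avail))
    return ex_report_error(1);
  return (int) ex_min_u(want, avail);
}
```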

    __assume_aligned is available in a dev branch, and has been for a while, TBH I forgot about it. It should be in the next version, assuming I don't find any issues during testing.

    HEDLEY_UNREACHABLE is like __assume(0). Other than that, I'm not sure how much value __assume() adds when we already have assert() and likely/unlikely macros. Not saying "no", just trying to understand the use case before determining what to do.

  9. #7
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    > I'm not sure how much value __assume() adds when we already have assert()

    #define __assume_aligned(x,y) __assume( (((byte*)x)-((byte*)0))%(y)==0 )

    This works for VS.

  10. #8
    Member
    Join Date
    Jul 2013
    Location
    United States
    Posts
    194
    Thanks
    44
    Thanked 140 Times in 69 Posts
    Quote Originally Posted by Shelwien View Post
    #define __assume_aligned(x,y) __assume( (((byte*)x)-((byte*)0))%(y)==0 )

    This works for VS.
    That's basically how HEDLEY_ASSUME_ALIGNED is implemented for MSVC. So if we do end up with a HEDLEY_ASSUME_ALIGNED, can you think of any other reason to add a HEDLEY_ASSUME macro? AFAIK it's only implemented by MSVC (and ICC, for compatibility with MSVC, but they seem to prefer __assume_aligned).

    I remembered why I was hesitating about HEDLEY_ASSUME_ALIGNED… __builtin_assume_aligned has slightly different semantics, which makes things difficult. __builtin_assume_aligned returns a value, "and allows the compiler to assume that the returned pointer is at least align bytes aligned." So, __builtin_assume_aligned

    void *x = __builtin_assume_aligned (arg, 16);
    /* use x */


    vs. everyone else

    __assume_aligned(arg, 16);
    /* use arg */


    I need to figure out whether or not calling __builtin_assume_aligned tells GCC anything about the argument, or just the return value. If not, maybe something like this would work:

    #define HEDLEY_ASSUME_ALIGNED(ptr, align) ((((((char*) (ptr)) - ((char*) 0)) % (align)) == 0) ? (void) 0 : __builtin_unreachable())


    i.e., basically recreate __assume for GCC and use it to implement something more like __assume_aligned. Unfortunately __builtin_unreachable is only available in GCC 4.5+…
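
    To make the return-value semantics concrete, here is a small sketch of the GCC style in use: the hint is attached to the pointer that comes back, so the code works through that pointer. EXAMPLE_ALIGNED_PTR and example_sum_aligned16 are hypothetical names, and treating every clang as supporting the builtin is an assumption.

```c
#include <stddef.h>

/* Hypothetical wrapper: yields a pointer the compiler may assume is aligned.
 * Falls back to a plain cast where the builtin is not known to exist. */
#if defined(__clang__) || \
    (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 7)))
#  define EXAMPLE_ALIGNED_PTR(type, ptr, align) \
     ((type) __builtin_assume_aligned((ptr), (align)))
#else
#  define EXAMPLE_ALIGNED_PTR(type, ptr, align) ((type) (ptr))
#endif

/* Sums n floats. The caller promises that data is 16-byte aligned;
 * passing an unaligned pointer is undefined behavior when the hint is on. */
static float example_sum_aligned16(const float *data, size_t n) {
  const float *p = EXAMPLE_ALIGNED_PTR(const float *, data, 16);
  float sum = 0.0f;
  size_t i;
  for (i = 0; i < n; ++i)
    sum += p[i]; /* read through p, the pointer carrying the hint */
  return sum;
}
```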

  11. #9
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    TinyCThread:

    i imagined a library that includes the following:
    - C11 thread API as already implemented by TinyCThread.[hc]
    - separate module implementing concurrent queue on top of it
    - another module implementing thread pool on top of previous modules
    - yet another module implementing atomics, independent of the other modules

    The point is that each feature is in a separate module, so a library user just needs to copy the modules he needs and drop everything else, thus limiting size inflation of his codebase

    This can be implemented by creating a new library, copying the TinyCThread.[hc] and adding new modules. Or, if you don't insist that TinyCThread should be limited to C11 APIs emulation, these new modules can be included directly into TinyCThread

    The overall goal is a minimalistic library useful for small projects like ZSTD, and thus built in an incremental manner rather than as an all-or-nothing monolithic design. Maybe a pure C library implementing that already exists, but I don't know of any.

    And I don't mean that the atomics module should mimic the entire C11 API, just that existing functions can provide alternative implementations via C11/C++11 standard APIs, thus improving the portability of this module.

    I've seen/implemented ~5-10 m/t compression libraries, and except for one case, they all used the same model: a pool of worker threads and queues to communicate with the main thread. That's why I propose to implement this feature set first. This feature set will probably be highly useful in other areas too, covering most scenarios of simple multithreading usage.
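
    A bare-bones version of that model (worker threads pulling jobs from one shared queue) might look like the sketch below. It assumes POSIX threads rather than the C11 API, uses a small fixed-capacity queue so the producer blocks when workers fall behind, and every ex_* name is made up for this example.

```c
#include <pthread.h>
#include <stddef.h>

#define EX_QUEUE_CAP 4

typedef struct { void (*fn)(void *); void *arg; } ex_job;

typedef struct {
  ex_job jobs[EX_QUEUE_CAP];
  size_t head, count;
  int closed;
  pthread_mutex_t lock;
  pthread_cond_t not_empty, not_full;
} ex_queue;

static void ex_queue_init(ex_queue *q) {
  q->head = q->count = 0;
  q->closed = 0;
  pthread_mutex_init(&q->lock, NULL);
  pthread_cond_init(&q->not_empty, NULL);
  pthread_cond_init(&q->not_full, NULL);
}

/* Blocks while the queue is full (back-pressure on the producer). */
static void ex_queue_push(ex_queue *q, ex_job job) {
  pthread_mutex_lock(&q->lock);
  while (q->count == EX_QUEUE_CAP)
    pthread_cond_wait(&q->not_full, &q->lock);
  q->jobs[(q->head + q->count) % EX_QUEUE_CAP] = job;
  q->count++;
  pthread_cond_signal(&q->not_empty);
  pthread_mutex_unlock(&q->lock);
}

/* Returns 1 with a job in *out, or 0 once the queue is closed and drained. */
static int ex_queue_pop(ex_queue *q, ex_job *out) {
  int ok = 0;
  pthread_mutex_lock(&q->lock);
  while (q->count == 0 && !q->closed)
    pthread_cond_wait(&q->not_empty, &q->lock);
  if (q->count > 0) {
    *out = q->jobs[q->head];
    q->head = (q->head + 1) % EX_QUEUE_CAP;
    q->count--;
    ok = 1;
    pthread_cond_signal(&q->not_full);
  }
  pthread_mutex_unlock(&q->lock);
  return ok;
}

/* No more jobs will be pushed; wakes all workers so they can exit. */
static void ex_queue_close(ex_queue *q) {
  pthread_mutex_lock(&q->lock);
  q->closed = 1;
  pthread_cond_broadcast(&q->not_empty);
  pthread_mutex_unlock(&q->lock);
}

static void *ex_worker(void *arg) {
  ex_queue *q = arg;
  ex_job job;
  while (ex_queue_pop(q, &job))
    job.fn(job.arg);
  return NULL;
}

static void ex_square(void *arg) { int *v = arg; *v = *v * *v; }

/* Demo: squares 0..njobs-1 on a pool of workers and returns the sum. */
static int ex_sum_squares_parallel(int nthreads, int njobs) {
  ex_queue q;
  pthread_t threads[8];
  int values[64], i, sum = 0;
  if (nthreads < 1) nthreads = 1;
  if (nthreads > 8) nthreads = 8;
  if (njobs > 64) njobs = 64;
  ex_queue_init(&q);
  for (i = 0; i < nthreads; i++)
    pthread_create(&threads[i], NULL, ex_worker, &q);
  for (i = 0; i < njobs; i++) {
    ex_job j;
    values[i] = i;
    j.fn = ex_square;
    j.arg = &values[i];
    ex_queue_push(&q, j);
  }
  ex_queue_close(&q);
  for (i = 0; i < nthreads; i++)
    pthread_join(threads[i], NULL);
  for (i = 0; i < njobs; i++)
    sum += values[i];
  pthread_mutex_destroy(&q.lock);
  pthread_cond_destroy(&q.not_empty);
  pthread_cond_destroy(&q.not_full);
  return sum;
}
```

    A real library would add a result queue and error handling for the pthread calls; the sketch leaves both out to stay short.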
    Last edited by Bulat Ziganshin; 14th February 2017 at 17:49.

  12. #10
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    That's kind of what I'm doing, but I'm taking the inspiration from compiler primitives instead of projects like zstd. For example, GCC has __builtin_ffs. portable-snippets has a psnip_builtin_ffs with the same arguments and return value that works exactly the same as __builtin_ffs.
    Well, imho the function name is too long for comfortable use, and psnip_ffs would be better; ideally the library should provide an option to define just ffs. I agree with the idea of defining these functions' APIs exactly as their compiler originals, so we may have both psnip_ffs and psnip_BitScanForward and then, if someone wishes, emulation of gcc/msvc builtins on top of that. BTW, it may serve as a sort of forward-compatibility library, e.g. providing bswap for older gcc versions
    Last edited by Bulat Ziganshin; 14th February 2017 at 19:15.

  13. #11
    Member
    Join Date
    Jul 2013
    Location
    United States
    Posts
    194
    Thanks
    44
    Thanked 140 Times in 69 Posts
    Looks like HEDLEY_ASSUME_ALIGNED isn't feasible for GCC, sorry.

    I'll file a bug against GCC, but I'm not holding my breath.

  14. #12
    Member
    Join Date
    Nov 2015
    Location
    ?l?nsk, PL
    Posts
    81
    Thanks
    9
    Thanked 13 Times in 11 Posts
    How about
    Code:
    #define ASSUME_ALIGNED(arg, alignment) do {void *p = arg; if (p != __builtin_assume_aligned(p, alignment)) __builtin_unreachable();} while (0)
    ?

  15. The Following User Says Thank You to m^3 For This Useful Post:

    nemequ (16th February 2017)

  16. #13
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    Unfortunately it's a pointer's attribute in gcc, not something inferred from known information about the pointer (like in VS/IC).
    __builtin_assume_aligned doesn't align the pointer, it just sets that attribute.
    That attribute can then be used e.g. for compiling vector instructions for aligned memory access instead of unaligned ones,
    or for not generating an unaligned branch for the next loop.

    Update: https://godbolt.org/g/mdJNbE vs https://godbolt.org/g/juOepA
    So I was right up to gcc 6.3. But gcc7 treats them the same.

  17. The Following User Says Thank You to Shelwien For This Useful Post:

    nemequ (16th February 2017)

  18. #14
    Member
    Join Date
    Jul 2013
    Location
    United States
    Posts
    194
    Thanks
    44
    Thanked 140 Times in 69 Posts
    Quote Originally Posted by m^3 View Post
    How about
    Code:
    #define ASSUME_ALIGNED(arg, alignment) do {void *p = arg; if (p !=  __builtin_assume_aligned(p, alignment)) __builtin_unreachable();} while  (0)
    ?
    Good thinking.

    I think the best I can do may be something like

    #if defined(HEDLEY_PRAGMA)
    # undef HEDLEY_PRAGMA
    #endif
    #if defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 199901L)
    # define HEDLEY_PRAGMA(value) _Pragma(#value)
    #elif HEDLEY_GCC_VERSION_CHECK(3,0,0)
    # define HEDLEY_PRAGMA(value) _Pragma(#value)
    #elif HEDLEY_MSVC_VERSION_CHECK(15,0,0)
    # define HEDLEY_PRAGMA(value) __pragma(value)
    #else
    # define HEDLEY_PRAGMA(value)
    #endif

    #if defined(HEDLEY_OPENMP_ALIGNED)
    # undef HEDLEY_OPENMP_ALIGNED
    #endif
    #if defined(_OPENMP) && (_OPENMP >= 201307L)
    # define HEDLEY_OPENMP_ALIGNED(value, alignment) HEDLEY_PRAGMA(omp simd aligned(value:alignment))
    #else
    # define HEDLEY_OPENMP_ALIGNED(value, alignment)
    #endif

    #if defined(HEDLEY_ASSUME_ALIGNED)
    # undef HEDLEY_ASSUME_ALIGNED
    #endif
    #if HEDLEY_INTEL_VERSION_CHECK(9,0,0)
    # define HEDLEY_ASSUME_ALIGNED(ptr, align) HEDLEY_OPENMP_ALIGNED(ptr, align) __assume_aligned(ptr, align)
    #elif HEDLEY_MSVC_VERSION_CHECK(13,10,0)
    # define HEDLEY_ASSUME_ALIGNED(ptr, align) HEDLEY_OPENMP_ALIGNED(ptr, align) __assume((((char*) ptr) - ((char*) 0)) % (align) == 0)
    #elif HEDLEY_GCC_HAS_BUILTIN(__builtin_assume_aligned,4,7,0)
    # define HEDLEY_ASSUME_ALIGNED(ptr, align) HEDLEY_OPENMP_ALIGNED(ptr, align) (ptr = (__typeof__(ptr)) __builtin_assume_aligned((ptr), align))
    #elif HEDLEY_GCC_HAS_BUILTIN(__builtin_unreachable,4,5,0)
    # define HEDLEY_ASSUME_ALIGNED(ptr, align) HEDLEY_OPENMP_ALIGNED(ptr, align) ((((char*) ptr) - ((char*) 0)) % (align) == 0) ? 1 : (__builtin_unreachable(), 0)
    #else
    # define HEDLEY_ASSUME_ALIGNED(ptr, align) HEDLEY_OPENMP_ALIGNED(ptr, align)
    #endif


    And just add a big warning that the arguments may be evaluated multiple times, so it must be a variable, *not* an arbitrary expression. The const issue (i.e., what if the type of ptr is `double* const`) still bothers me a bit, but in practice you don't see that very much. However, explicitly stating that the arguments may be evaluated multiple times opens up the option to use OpenMP in addition to compiler-specific options. I'm not sure how the OpenMP pragma interacts with other methods of conveying alignment information, though; I'm planning to contact some OpenMP people to talk about it.
    Last edited by nemequ; 16th February 2017 at 07:10.

  19. #15
    Member
    Join Date
    Jul 2013
    Location
    United States
    Posts
    194
    Thanks
    44
    Thanked 140 Times in 69 Posts
    Quote Originally Posted by Bulat Ziganshin View Post
    This can be implemented by creating a new library, copying the TinyCThread.[hc] and adding new modules. Or, if you don't insist that TinyCThread should be limited to C11 APIs emulation, these new modules can be included directly into TinyCThread
    Sorry, I'm going to insist that TinyCThread be limited to the C11 API, so this would need to be a separate project. I don't want to bloat TinyCThread, and I think "implementation of the C11 threads API" is a nice, clean line to define the scope of the project. If it were possible to have a portable "implementation of the C11 atomics API" I might be willing to add that, but not something new.

    Quote Originally Posted by Bulat Ziganshin View Post
    The overall goal is a minimalistic library useful for small projects like ZSTD, and thus built in an incremental manner rather than as an all-or-nothing monolithic design. Maybe a pure C library implementing that already exists, but I don't know of any.

    And I don't mean that the atomics module should mimic the entire C11 API, just that existing functions can provide alternative implementations via C11/C++11 standard APIs, thus improving the portability of this module.

    I've seen/implemented ~5-10 m/t compression libraries, and except for one case, they all used the same model: a pool of worker threads and queues to communicate with the main thread. That's why I propose to implement this feature set first. This feature set will probably be highly useful in other areas too, covering most scenarios of simple multithreading usage.
    psnip already has an atomics module which doesn't try to recreate the C11 API. I'm going to try to put together some additional implementations soon; PGI doesn't seem to have any support for atomics (outside of OpenACC/OpenMP), so I think I'm going to end up adding some assembly implementations… assembly isn't exactly my strength, so any help here would be appreciated. I'd also like to add OpenACC and OpenMP implementations, but AFAIK neither supports CAS, so I'm not sure how to proceed there.
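
    As an illustration of why CAS matters here: once a compare-and-swap exists, other atomic operations can be built as retry loops on top of it. The sketch below uses the GCC __sync builtin or the MSVC intrinsic; the ex_* names are hypothetical and this is not the psnip atomics API.

```c
#if defined(_MSC_VER)
#  include <intrin.h>
#endif

/* Returns 1 if *ptr equaled expected and was replaced by desired, else 0. */
static int ex_cas_long(volatile long *ptr, long expected, long desired) {
#if defined(__GNUC__)
  return __sync_bool_compare_and_swap(ptr, expected, desired) ? 1 : 0;
#elif defined(_MSC_VER)
  /* The intrinsic returns the value *ptr held before the operation. */
  return _InterlockedCompareExchange(ptr, desired, expected) == expected;
#else
#  error "no compare-and-swap primitive known for this compiler"
#endif
}

/* Atomically raises *ptr to candidate if candidate is larger.
 * Returns the value stored in *ptr after the operation. */
static long ex_atomic_max(volatile long *ptr, long candidate) {
  for (;;) {
    long seen = *ptr; /* snapshot; may be stale by the time we CAS */
    if (seen >= candidate)
      return seen;
    if (ex_cas_long(ptr, seen, candidate))
      return candidate;
    /* Another thread changed *ptr in between: retry with a fresh snapshot. */
  }
}
```

    Without a CAS (or an equivalent such as LL/SC), this retry pattern has nothing to build on, which is the difficulty with a backend that lacks one.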

    I think a thread pool would be appropriate for psnip, if you'd like to house it there. And I'd certainly be interested in using that code, in several projects.

    Quote Originally Posted by Bulat Ziganshin View Post
    Well, imho the function name is too large for comfortable use, and psnip_ffs will be better, and ideally the library should provide an option to define just ffs.
    Hm. I'm generally not too worried about the length of names; I have a very strong preference for readability, especially in public APIs. psnip_builtin_* and psnip_intrin_* would make it very clear that the function is a GCC-style builtin or MSVC-style intrinsic… Removing the "_builtin"/"_intrin" would also allow for collisions, but it seems like newer intrinsics use CamelCase, so as long as there are no *existing* conflicts it's probably okay. I'm leaning slightly towards the psnip_ffs style, but does anyone else have an opinion?

    Just "ffs" would be okay as long as it's off by default.

    Quote Originally Posted by Bulat Ziganshin View Post
    I agree with the idea of defining these functions' APIs exactly as their compiler originals, so we may have both psnip_ffs and psnip_BitScanForward and then, if someone wishes, emulation of gcc/msvc builtins on top of that. BTW, it may serve as a sort of forward-compatibility library, e.g. providing bswap for older gcc versions
    FWIW, that's already how things work; it falls back on the plain C for older versions of compilers, too.

    I haven't done the MSVC version yet, though. If anyone is interested, that could be a fun little project; it's easy to test so it doesn't require a great deal of expertise, but it could be a rather informative project for someone. Plus, it's a great excuse to spend some time playing with Bit Twiddling Hacks, which is always fun (not being sarcastic, that's one of my favorite web pages).

  20. #16
    Member
    Join Date
    Nov 2015
    Location
    ?l?nsk, PL
    Posts
    81
    Thanks
    9
    Thanked 13 Times in 11 Posts
    Quote Originally Posted by nemequ View Post
    Good thinking.

    I think the best I can do may be something like

    #if defined(HEDLEY_PRAGMA)
    # undef HEDLEY_PRAGMA
    #endif
    #if defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 199901L)
    # define HEDLEY_PRAGMA(value) _Pragma(#value)
    #elif HEDLEY_GCC_VERSION_CHECK(3,0,0)
    # define HEDLEY_PRAGMA(value) _Pragma(#value)
    #elif HEDLEY_MSVC_VERSION_CHECK(15,0,0)
    # define HEDLEY_PRAGMA(value) __pragma(value)
    #else
    # define HEDLEY_PRAGMA(value)
    #endif

    #if defined(HEDLEY_OPENMP_ALIGNED)
    # undef HEDLEY_OPENMP_ALIGNED
    #endif
    #if defined(_OPENMP) && (_OPENMP >= 201307L)
    # define HEDLEY_OPENMP_ALIGNED(value, alignment) HEDLEY_PRAGMA(omp simd aligned(value:alignment))
    #else
    # define HEDLEY_OPENMP_ALIGNED(value, alignment)
    #endif

    #if defined(HEDLEY_ASSUME_ALIGNED)
    # undef HEDLEY_ASSUME_ALIGNED
    #endif
    #if HEDLEY_INTEL_VERSION_CHECK(9,0,0)
    # define HEDLEY_ASSUME_ALIGNED(ptr, align) HEDLEY_OPENMP_ALIGNED(ptr, align) __assume_aligned(ptr, align)
    #elif HEDLEY_MSVC_VERSION_CHECK(13,10,0)
    # define HEDLEY_ASSUME_ALIGNED(ptr, align) HEDLEY_OPENMP_ALIGNED(ptr, align) __assume((((char*) ptr) - ((char*) 0)) % (align) == 0)
    #elif HEDLEY_GCC_HAS_BUILTIN(__builtin_assume_aligned,4,7,0)
    # define HEDLEY_ASSUME_ALIGNED(ptr, align) HEDLEY_OPENMP_ALIGNED(ptr, align) (ptr = (__typeof__(ptr)) __builtin_assume_aligned((ptr), align))
    #elif HEDLEY_GCC_HAS_BUILTIN(__builtin_unreachable,4,5,0)
    # define HEDLEY_ASSUME_ALIGNED(ptr, align) HEDLEY_OPENMP_ALIGNED(ptr, align) ((((char*) ptr) - ((char*) 0)) % (align) == 0) ? 1 : (__builtin_unreachable(), 0)
    #else
    # define HEDLEY_ASSUME_ALIGNED(ptr, align) HEDLEY_OPENMP_ALIGNED(ptr, align)
    #endif


    And just add a big warning that the arguments may be evaluated multiple times, so it must be a variable, *not* an arbitrary expression. The const issue (i.e., what if the type of ptr is `double* const`) still bothers me a bit, but in practice you don't see that very much. However, explicitly stating that the arguments may be evaluated multiple times opens up the option to use OpenMP in addition to compiler-specific options. I'm not sure how the OpenMP pragma interacts with other methods of conveying alignment information, though; I'm planning to contact some OpenMP people to talk about it.
    Can you mitigate the problems with const using
    Code:
    __builtin_types_compatible_p(const typeof(p), typeof(p))
    and just skip the assignment if you can't do it, leaving plain __builtin_assume_aligned?

    ADDED: I think it's best to just skip gcc, because treating it differently makes it likely that people will write code that works perfectly elsewhere but doesn't compile there.

  21. #17
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    you can steal a thread pool code here

    > I agree with the idea of defining these functions' APIs exactly as their compiler originals, so we may have both psnip_ffs and psnip_BitScanForward and then, if someone wishes, emulation of gcc/msvc builtins on top of that. BTW, it may serve as a sort of forward-compatibility library, e.g. providing bswap for older gcc versions

    FWIW, that's already how things work; it falls back on the plain C for older versions of compilers, too.
    no. what i mean:
    1) psnip defines its own psnip_* functions
    2) psnip provides compatibility module that emulates gcc builtin_* functions via psnip_* ones
    3) this module can be included not only by msvc and other builtin-lacking compilers but by older gcc versions too, giving them forward compatibility with later gcc versions

    also, instead of trying to slowly develop the builtins module yourself, you can steal this code from zstd and other libraries around


    I understand that even microlibs are simpler to maintain if they use other microlibs. In order to simplify both development and usage of your libs, I would propose copy-pasting the appropriate parts of imported libs (such as Hedley) at the start of your header files (such as TinyCThread.h). Anyway, the more important problem is handling version mismatches in diamond dependencies

    PGI doesn't seem to have any support for atomics (outside of OpenACC/OpenMP), so I think I'm going to end up adding some assembly implementations…
    it probably supports C11/C++11
    Last edited by Bulat Ziganshin; 16th February 2017 at 14:14.

  22. #18
    Member
    Join Date
    Dec 2011
    Location
    Cambridge, UK
    Posts
    437
    Thanks
    137
    Thanked 152 Times in 100 Posts
    Thread pools can be tricky depending on what you're trying to do.

    If you want to share a thread pool between multiple independent jobs (e.g. a decoder job, a do-something-interesting job and an encoder job) then you need multiple queues (think of them as pipes in a unix pipeline) so you can keep track of which order the results come in. Then we start hitting the issue that one task may be running much faster than the others (lz decode vs lz encode), and you don't want the reader to get too far ahead and start slurping the entire file into memory with a massive backlog of tasks for the other stages in the pipeline. This is probably somewhere coroutines work well instead of threads, as you can flip-flop between states, but assuming a standard threading model this boils down to having queues with size limits.

    I also found that sometimes giving the system more threads than it can keep busy actually slows things down. This turned out to be a poor implementation of CPU frequency auto-scaling in the (albeit old) linux kernel, or maybe the bios. For example, maybe I can drive 10 threads flat out on a 16-core system. However, if I give it 16 threads it notices they spend a lot of time waiting, scales the frequency down *by a huge degree*, and the entire thing runs slower than just giving it 10. I fixed this by keeping track of the incoming job queue size. If the user asked for N threads and we currently have M working (where M < N) then wake up another worker only if we have >M jobs in the input queue waiting to be processed. This ensures there is always at least one more task for each of those workers. The workers then work in loops and keep processing a single queue until they have nothing left, at which point they check other queues (essentially task stealing), so it auto-balances based on the rate that the I/O can handle. It's not perfect, but it had a major impact on speed when offering up too many threads.
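
    The wake-up rule from the paragraph above is tiny when written down. This is just a restatement of that rule as a predicate, with a made-up function name; it is not taken from the htslib code.

```c
#include <stddef.h>

/* requested: N, the number of threads the user asked for.
 * working:   M, the number of workers currently busy.
 * queued:    jobs waiting in the input queue.
 * Returns 1 if one more worker should be woken, 0 otherwise. */
static int example_should_wake_worker(size_t requested, size_t working,
                                      size_t queued) {
  return working < requested && queued > working;
}
```

    With 16 threads requested and 10 busy, an 11th worker is woken only once more than 10 jobs are waiting, so idle threads don't spin up just to find nothing to do.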

    It all gets rather ghastly. I've been putting together such a monstrosity for samtools / htslib over here: https://github.com/samtools/htslib/b.../thread_pool.c

    It ain't pretty!

    You may also want to look at TBB (https://software.intel.com/en-us/intel-tbb), but I don't know how public it all is.

  23. The Following User Says Thank You to JamesB For This Useful Post:

    Bulat Ziganshin (23rd February 2017)

  24. #19
    Member
    Join Date
    Jul 2013
    Location
    United States
    Posts
    194
    Thanks
    44
    Thanked 140 Times in 69 Posts
    Quote Originally Posted by m^3 View Post
    Can you mitigate the problems with const using
    Code:
    __builtin_types_compatible_p(const typeof(p), typeof(p))
    and just skip the assignment if you can't do it, leaving plain __builtin_assume_aligned?
    Unfortunately, no. According to the __builtin_types_compatible_p docs, "This built-in function ignores top level qualifiers (e.g., const, volatile). For example, int is equivalent to const int."

    Quote Originally Posted by m^3 View Post
    ADDED: I think it's best to just skip gcc. Because treating it differently makes it likely to make code that works perfectly elsewhere, but doesn't compile there.
    I'm not sure… GCC is a pretty important target, and in practice I'm not sure how much of an issue this would really be. It seems like we would be ignoring a lot of good because of a corner case that not many people are likely to hit.

    Besides, I think GCC's style makes more sense, so I don't really feel like punishing them for it.

    Does anyone else have an opinion on whether to

    (a) Skip GCC and clang support
    (b) Do (ptr = (__typeof__(ptr)) __builtin_assume_aligned((ptr), align)) on GCC (and clang)
    (c) Just skip an ASSUME_ALIGNED macro altogether, at least for now
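For reference, option (b) might look roughly like the sketch below. `DEMO_ASSUME_ALIGNED` and `demo_sum` are placeholder names, and the `__GNUC__` check assumes a GCC or clang recent enough to provide `__builtin_assume_aligned`:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__GNUC__) /* clang defines this too */
#  define DEMO_ASSUME_ALIGNED(ptr, align) \
     ((ptr) = (__typeof__(ptr)) __builtin_assume_aligned((ptr), (align)))
#else
   /* Other compilers: no hint, and also no assignment to ptr. */
#  define DEMO_ASSUME_ALIGNED(ptr, align) ((void) 0)
#endif

static float demo_sum(float *data, size_t n) {
  float total = 0.0f;
  size_t i;
  /* The hint is a promise to the compiler, so check it in debug builds. */
  assert(((uintptr_t) data % 16) == 0);
  DEMO_ASSUME_ALIGNED(data, 16);
  for (i = 0; i < n; i++)
    total += data[i];
  return total;
}

/* The corner case under discussion: if the parameter were declared as
 * `float *const data`, the assignment inside the macro would be rejected
 * by GCC and clang, while the no-op branch would still compile. */
```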

    Quote Originally Posted by Bulat Ziganshin View Post
    you can steal a thread pool code here
    Hm, right now everything in the portable-snippets repo is public domain… not sure how strongly people feel about keeping it that way. 3-clause BSD is okay for me, but I know some people don't like the 2nd clause (about having to reproduce the copyright declaration in documentation)…

    Quote Originally Posted by Bulat Ziganshin View Post
    no. what i mean:
    1) psnip defines its own psnip_* functions
    2) psnip provides compatibility module that emulates gcc builtin_* functions via psnip_* ones
    3) this module can be included not only by msvc and other builtin-lacking compilers but by older gcc versions too, giving them forward compatibility with later gcc versions
    Ah, I see, so for #2 we would actually define, for example, __builtin_ffs on platforms where it's not available. As long as it's opt-in, that's okay with me.

    Instead of a separate module, though, how about just requiring something like

    #define PSNIP_DEFINE_BUILTIN_ALIASES
    #include <.../psnip-builtin.h>


    That would be a bit easier to maintain, both from psnip's point of view and likely anyone trying to use the code.
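Roughly what such an opt-in could look like inside the header. This is a sketch with placeholder `demo_`/`DEMO_` names, not the actual psnip code, and the `__GNUC__` check stands in for proper per-compiler version detection:

```c
#include <assert.h>

/* Portable fallback with the same contract as GCC's __builtin_ffs:
 * 1-based index of the least significant set bit, or 0 if value is 0. */
static int demo_builtin_ffs(int value) {
  unsigned int v = (unsigned int) value;
  int pos = 1;
  if (v == 0)
    return 0;
  while ((v & 1u) == 0) {
    v >>= 1;
    pos++;
  }
  return pos;
}

/* Opt-in: only users who ask for it get the compiler's name defined,
 * and only on compilers that lack the real builtin. Names starting
 * with a double underscore are reserved, which is why this must not
 * happen by default. */
#if defined(DEMO_DEFINE_BUILTIN_ALIASES) && !defined(__GNUC__)
#  define __builtin_ffs(value) demo_builtin_ffs(value)
#endif

/* Usage: the prefixed name is always safe to call directly. */
static void demo_ffs_usage(void) {
  assert(demo_builtin_ffs(0) == 0);
  assert(demo_builtin_ffs(8) == 4);
}
```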

    Quote Originally Posted by Bulat Ziganshin View Post
    also, instead of trying to slowly develop builins module yourself, you can steal this code from zstd and other libraries around
    It's pretty easy to find implementations for inspiration, but direct copying is complicated by copyrights… That said, the stuff on that bit-twiddling hacks site is public domain, so it would be okay.

    Quote Originally Posted by Bulat Ziganshin View Post
    I understand that even microlibs are simpler to maintain if they use other microlibs. In order to simplify both development and usage of your libs, i may propose to copypaste appropriate parts of imported libs (such as Hedley) at the start of your header files (such as TinyCThread.h). Anyway, the more important problem is handling version mismatch in diamond dependencies
    Hedley actually has a pretty strict policy about this. It looks like

    #if !defined(HEDLEY_VERSION) || (HEDLEY_VERSION < 2)
    #if defined(HEDLEY_VERSION)
    # undef HEDLEY_VERSION
    #endif
    #define HEDLEY_VERSION 2

    #if defined(HEDLEY_VERSION_ENCODE)
    # undef HEDLEY_VERSION_ENCODE
    #endif
    #define HEDLEY_VERSION_ENCODE(major,minor,revision) (((major) * 1000000) + ((minor) * 1000) + (revision))

    #endif /* !defined(HEDLEY_VERSION) || (HEDLEY_VERSION < 2) */


    So, if someone includes an older version of hedley.h it will just be ignored. If someone includes a newer version it's okay because we don't make API or ABI incompatible changes.
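To see the effect, here is the same guard pattern applied twice in one translation unit, as if two libraries each bundled their own copy of a header. `DEMO_VERSION` and `DEMO_ABS` are invented for this sketch and are not Hedley macros:

```c
#include <assert.h>

/* ---- copy bundled by library A: version 2 ---- */
#if !defined(DEMO_VERSION) || (DEMO_VERSION < 2)
#  if defined(DEMO_VERSION)
#    undef DEMO_VERSION
#  endif
#  define DEMO_VERSION 2
#  if defined(DEMO_ABS)
#    undef DEMO_ABS
#  endif
#  define DEMO_ABS(x) (((x) < 0) ? -(x) : (x))
#endif

/* ---- copy bundled by library B: version 1 (older) ---- */
/* DEMO_VERSION is already 2 here, so this whole block is skipped. */
#if !defined(DEMO_VERSION) || (DEMO_VERSION < 1)
#  if defined(DEMO_VERSION)
#    undef DEMO_VERSION
#  endif
#  define DEMO_VERSION 1
#  if defined(DEMO_ABS)
#    undef DEMO_ABS
#  endif
#  define DEMO_ABS(x) (-1) /* pretend version 1 was broken */
#endif

static int demo_version_seen(void) { return DEMO_VERSION; }
```

If the two copies appeared in the opposite order, the version 1 definitions would be made first and then replaced by the version 2 block, so the newest copy wins either way.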

    it probably supports C11/C++11
    It does, but only in C11 mode, and there is no way to tell in the preprocessor what mode the compiler is in.

  25. #20
    Member
    Join Date
    Nov 2015
    Location
    ?l?nsk, PL
    Posts
    81
    Thanks
    9
    Thanked 13 Times in 11 Posts
    Quote Originally Posted by nemequ View Post
    Unfortunately, no. According to the __builtin_types_compatible_p docs, "This built-in function ignores top level qualifiers (e.g., const, volatile). For example, int is equivalent to const int."



    I'm not sure… GCC is a pretty important target, and in practice I'm not sure how much of an issue this would really be. It seems like we would be ignoring a lot of good because of a corner case that not many people are likely to hit.

    Besides, I think GCC's style makes more sense, so I don't really feel like punishing them for it.
    I agree that gcc is an important target, but look at what the macro does. It gives the compiler a hint that may or may not have some effect on optimisation. In some cases it's a major thing, but I expect that the average use will have a small effect. And definitely a small one compared to not compiling.

    I thought you could use C11 _Generic, but according to http://en.cppreference.com/w/c/language/generic gcc strips const qualifiers. You may verify it if you wish.

  26. #21
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    Hm, right now everything in the portable-snippets repo is public domain… not sure how strongly people feel about keeping it that way. 3-clause BSD is okay for me, but I know some people don't like the 2nd clause (about having to reproduce the copyright declaration in documentation)…
    probably Yann can share it at whatever license you ask - it's not core part of his libraries. The same applies to builtins-emulation code from zstd/fse/xxhash

    Instead of a separate module, though, how about just requiring something like
    #define PSNIP_DEFINE_BUILTIN_ALIASES
    #include <.../psnip-builtin.h>
    you may go further - always define psnip_* functions and provide 3 #defines to define builtin_*, MSVC-compatible intrinsics and bare function names like "ffs". except for the header size, it looks like the best solution

    So, if someone includes an older version of hedley.h it will just be ignored. If someone includes a newer version its okay because we don't make API or ABI incompatible changes.
    yeah, it's a perfect solution. but what about your other libraries - they may also be included twice by different header-only libs?

  27. #22
    Member
    Join Date
    Jul 2013
    Location
    United States
    Posts
    194
    Thanks
    44
    Thanked 140 Times in 69 Posts
    Quote Originally Posted by Bulat Ziganshin View Post
    probably Yann can share it at whatever license you ask - it's not core part of his libraries.
    Yann, are you watching this thread? I'd like to steal the thread pool and port it over to the C11 threads API (and, therefore, TinyCThread) for portable-snippets. Do you think it would be possible to get the thread pool code CC0-"licensed"? I know that may be a no-go since I think it's owned by FB not you…

    Quote Originally Posted by Bulat Ziganshin View Post
    The same applies to builtins-emulation code from zstd/fse/xxhash
    I took a quick look at some of them. Some of them seem to use a faster implementation, but are limited to specific bit-widths. The methods appear to be copied from that bit twiddling hacks page, so I'm aware of them, but portability was my #1 concern, especially as they are just portable fallbacks for compiler intrinsics. Once I get the portable versions done I may go back and look at providing a couple different versions of the portable variants for when we know we have specific widths.

    Quote Originally Posted by Bulat Ziganshin View Post
    you may go further - always define psnip_* functions and provide 3 #defines to define builtin_*, MSVC-compatible intrinsics and bare function names like "ffs". except for the header size, it looks like the best solution
    I spent some time working on the builtins in portable-snippets. I added some MSVC intrinsics, mainly to make sure everything is ready for them as far as the architecture of the code is concerned. What I ended up with is `psnip_builtin_*` for GCC-style built-ins and `psnip_intrin_*` for MSVC-style intrinsics. If you define `PSNIP_BUILTIN_EMULATE_NATIVE` prior to including the header, it will also fill in the native names on compilers where they don't already exist, i.e., it will `#define __builtin_ffs(value) psnip_builtin_ffs(value)` on everything but GCC ≥ 3.3.

    This actually made testing a bit easier since we can just unconditionally test `psnip_builtin_foo` against `__builtin_foo` (the result just won't mean much for platforms which don't have `__builtin_foo`). Most of those tests just pass random values to both functions and make sure the results match.
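A stripped-down illustration of that kind of test, using popcount as the example. The `demo_*` names and the xorshift generator are assumptions made for this sketch; the real tests use µnit and the psnip functions:

```c
#include <assert.h>
#include <stdint.h>

/* Portable fallback: clear the lowest set bit until none remain. */
static int demo_popcount(unsigned int v) {
  int n = 0;
  while (v != 0) {
    v &= v - 1;
    n++;
  }
  return n;
}

/* Small xorshift32 generator so the run is reproducible. */
static uint32_t demo_rand(uint32_t *state) {
  uint32_t x = *state;
  x ^= x << 13;
  x ^= x >> 17;
  x ^= x << 5;
  *state = x;
  return x;
}

/* Feeds the same random values to the fallback and to the native
 * builtin and counts disagreements. Without a native builtin there is
 * nothing to compare against, so the count stays at 0. */
static unsigned int demo_count_mismatches(unsigned int iterations) {
  uint32_t state = 0x12345678u;
  unsigned int i, bad = 0;
  for (i = 0; i < iterations; i++) {
    unsigned int v = (unsigned int) demo_rand(&state);
#if defined(__GNUC__)
    if (demo_popcount(v) != __builtin_popcount(v))
      bad++;
#else
    (void) v;
#endif
  }
  return bad;
}
```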

    The updated README for builtin.h is probably worth a quick look, but the main benefit (IMHO) is that adding MSVC stuff is really easy now, so… well, patches welcome.

    I also moved my safe-math code into the portable-snippets repo, including code to define `__builtin_*_overflow` if requested, just like for the other builtins. The implementation is just a bit too complex to put in builtin.h. I'll probably land something later today for fixed-width types (i.e., stdint.h).

    yeah, it's a perfect solution. but what about your other libraries - they may also be included twice by different header-only libs?
    They all have include guards, so they can be included multiple times, but version mismatches can be a problem. Hedley really makes a very strong effort here because it is designed to be used as part of a public API. Most of the other stuff is really more useful for a private API.

    If there is a lot of interest it would be possible to do something similar for other libraries… it's just a lot of extra bloat to maintain, and inline functions (which can't be `#undef`ed) need to be versioned.
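One way that versioning of inline functions could work, as a sketch only: put the version in the function name and point an unversioned macro at the newest one. `DEMO_LIB_VERSION` and `demo_clamp` are invented names, not something any of these libraries does today:

```c
#include <assert.h>

#if !defined(DEMO_LIB_VERSION) || (DEMO_LIB_VERSION < 2)
#  if defined(DEMO_LIB_VERSION)
#    undef DEMO_LIB_VERSION
#  endif
#  define DEMO_LIB_VERSION 2

/* The version is part of the function name, so an older copy's
 * demo_clamp_v1 (if one was already included) does not clash. */
static inline int demo_clamp_v2(int v, int lo, int hi) {
  return (v < lo) ? lo : ((v > hi) ? hi : v);
}

/* The macro is the only part that can be redirected with #undef. */
#  if defined(demo_clamp)
#    undef demo_clamp
#  endif
#  define demo_clamp demo_clamp_v2
#endif
```

Every release that changes a function then needs a new suffix, which is the extra bloat mentioned above.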

  28. #23
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    856
    Thanks
    447
    Thanked 254 Times in 103 Posts
    Zstandard's thread pool was largely contributed by Nick Terrell,
    and indeed, the code belongs to Facebook, as it was created during paid work time.
    Facebook released it under BSD-3, its standard OSS license.

    I have no idea how much effort it would take to change the license for this file,
    but it doesn't look straightforward (what incentive would the authorizing party have to make such an exception?).
    Easiest / shortest route could be to keep the license as is, since it's quite permissive.

  29. The Following User Says Thank You to Cyan For This Useful Post:

    nemequ (25th February 2017)

  30. #24
    Member
    Join Date
    Jul 2013
    Location
    United States
    Posts
    194
    Thanks
    44
    Thanked 140 Times in 69 Posts
    I've been working on portable-snippets some more, and I think it's in a pretty useful state now. It would be very helpful if people could try it out.

    Most of the improvements revolve around exact-width types, which there is now a header for in psnip. It will use <stdint.h> if available, or there is a fallback for older MSVC (using __int8, __int16, __int32, and __int64), or a version which only depends on <limits.h>, or projects which already have something in place can just use that.
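The selection logic is along these lines. This sketch covers only a 32-bit unsigned type, uses a placeholder name (`demo_uint32_t`) and is not the header's actual contents; the other widths would follow the same pattern:

```c
#include <assert.h>
#include <limits.h>

#if defined(_MSC_VER) && (_MSC_VER < 1600)
   /* Older MSVC (before Visual Studio 2010) has no <stdint.h>. */
   typedef unsigned __int32 demo_uint32_t;
#elif (defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 199901L)) || \
      (defined(__cplusplus) && (__cplusplus >= 201103L)) || \
      defined(_MSC_VER)
#  include <stdint.h>
   typedef uint32_t demo_uint32_t;
#elif UINT_MAX == 0xffffffffUL
   /* Last resort: pick a type by its range from <limits.h>. */
   typedef unsigned int demo_uint32_t;
#elif ULONG_MAX == 0xffffffffUL
   typedef unsigned long demo_uint32_t;
#else
#  error "no 32-bit unsigned type found"
#endif

/* Compile-time check: the array size is negative if the width is wrong. */
typedef char demo_uint32_check[(sizeof(demo_uint32_t) * CHAR_BIT == 32) ? 1 : -1];
```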

    This made it possible to add exact-width 32-bit and 64-bit variants of the GCC builtins, such as `psnip_builtin_ffs32` and `psnip_builtin_ffs64`. GCC only defines functions for int, long, and long long, so the *32 and *64 functions should be quite useful for either increasing portability or eliminating some ifdefs. Is there any need for 8 or 16 bit versions?
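As an illustration of what an exact-width variant involves, here is a sketch with placeholder names (`demo_ffs32`, `demo_ffs64`), not the psnip implementation. It forwards to the GCC builtin when the native type happens to have the right width and otherwise falls back to a loop:

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* 1-based index of the least significant set bit, 0 if v is 0. */
static int demo_ffs64(uint64_t v) {
#if defined(__GNUC__) && (ULLONG_MAX == 0xffffffffffffffffULL)
  return __builtin_ffsll((long long) v); /* long long is 64 bits here */
#else
  int pos = 1;
  if (v == 0)
    return 0;
  while ((v & 1) == 0) {
    v >>= 1;
    pos++;
  }
  return pos;
#endif
}

static int demo_ffs32(uint32_t v) {
#if defined(__GNUC__) && (UINT_MAX == 0xffffffffU)
  return __builtin_ffs((int) v); /* int is 32 bits here */
#else
  int pos = 1;
  if (v == 0)
    return 0;
  while ((v & 1) == 0) {
    v >>= 1;
    pos++;
  }
  return pos;
#endif
}
```

The caller writes `demo_ffs64(x)` and never has to work out whether `long` or `long long` is the 64-bit type on the current platform.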

    There are still builtins and intrinsics to actually implement, but I'd also like to start thinking about low-level functions like this that compilers don't have builtins for. Are there any other common operations which are useful for compression which would benefit from a common implementation? Unaligned loads/stores, maybe?
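For unaligned access, the usual portable baseline is a `memcpy` of the exact width, which compilers commonly turn into a single load or store where the target allows it. A sketch with made-up names:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reads 4 bytes from any address, in host byte order, without
 * dereferencing a possibly misaligned uint32_t pointer. */
static uint32_t demo_load_u32(const void *src) {
  uint32_t v;
  memcpy(&v, src, sizeof v);
  return v;
}

/* Writes 4 bytes to any address, in host byte order. */
static void demo_store_u32(void *dst, uint32_t v) {
  memcpy(dst, &v, sizeof v);
}

/* Usage: round-trip through an odd (misaligned) offset. */
static void demo_unaligned_usage(void) {
  unsigned char buf[8] = {0};
  demo_store_u32(buf + 1, 0xdeadbeefu);
  assert(demo_load_u32(buf + 1) == 0xdeadbeefu);
}
```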

  31. #25
    Member
    Join Date
    Nov 2015
    Location
    ?l?nsk, PL
    Posts
    81
    Thanks
    9
    Thanked 13 Times in 11 Posts
    I noticed a bug in the readme of this:
    https://github.com/nemequ/portable-s.../master/endian

    psnip_uint64_t psnip_endian_be64(psnip_uint16_t v);
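Presumably the parameter should be 64 bits wide to match the return type; that reading is an assumption, since only the README line is quoted here. For context, a conversion of this kind can be sketched as follows, with `demo_endian_be64` as a placeholder name and plain shifts rather than whatever the library actually uses:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Converts between host order and big-endian. The same function works
 * in both directions: it is a byte swap on little-endian hosts and a
 * no-op on big-endian ones. */
static uint64_t demo_endian_be64(uint64_t v) {
  unsigned char bytes[8];
  uint64_t out;
  int i;
  /* Lay the value out most significant byte first. */
  for (i = 0; i < 8; i++)
    bytes[i] = (unsigned char) (v >> (56 - 8 * i));
  memcpy(&out, bytes, sizeof out);
  return out;
}
```

With a 16-bit parameter, as in the quoted prototype, the upper 48 bits of the argument would be dropped before the function ever saw them.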

  32. The Following User Says Thank You to m^3 For This Useful Post:

    nemequ (27th February 2017)
