# The mem* framework

The framework handles the following mem* functions:
 - `memcpy`
 - `memmove`
 - `memset`
 - `bzero`
 - `bcmp`
 - `memcmp`

## Building blocks

These functions can be built out of a set of lower-level operations:
 - **`block`** : operates on a block of `SIZE` bytes.
 - **`tail`** : operates on the last `SIZE` bytes of the buffer (e.g., `[dst + count - SIZE, dst + count)`).
 - **`head_tail`** : operates on the first and last `SIZE` bytes. This is the same as calling `block` and `tail`; the two blocks may overlap.
 - **`loop_and_tail`** : calls `block` in a loop to consume as much as possible of the `count` bytes, then handles the remaining bytes with a `tail` operation.

As an illustration, consider a trivial `memset` implementation:

```C++
extern "C" void memset(char* dst, int value, size_t count) {
    if (count == 0) return;
    if (count == 1) return Memset<1>::block(dst, value);
    if (count == 2) return Memset<2>::block(dst, value);
    if (count == 3) return Memset<3>::block(dst, value);
    if (count <= 8) return Memset<4>::head_tail(dst, value, count);  // Note that 0 to 4 bytes are written twice.
    if (count <= 16) return Memset<8>::head_tail(dst, value, count); // Same here: 0 to 7 bytes are written twice.
    return Memset<16>::loop_and_tail(dst, value, count);
}
```

Now let's have a look at the `Memset` structure:

```C++
template <size_t Size>
struct Memset {
  static constexpr size_t SIZE = Size;

  LIBC_INLINE static void block(Ptr dst, uint8_t value) {
    // Implement me
  }

  LIBC_INLINE static void tail(Ptr dst, uint8_t value, size_t count) {
    block(dst + count - SIZE, value);
  }

  LIBC_INLINE static void head_tail(Ptr dst, uint8_t value, size_t count) {
    block(dst, value);
    tail(dst, value, count);
  }

  LIBC_INLINE static void loop_and_tail(Ptr dst, uint8_t value, size_t count) {
    size_t offset = 0;
    do {
      block(dst + offset, value);
      offset += SIZE;
    } while (offset < count - SIZE);
    tail(dst, value, count);
  }
};
```
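
To make the structure concrete, here is a minimal, self-contained sketch that fills in `block` with a plain byte loop and reuses the dispatch logic from the earlier example. The `Ptr` alias, the `memset_sketch` name, and the `assert` preconditions are assumptions made for illustration; they are not taken from the library.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Assumption: a byte pointer type standing in for the library's Ptr.
using Ptr = unsigned char *;

template <size_t Size>
struct Memset {
  static constexpr size_t SIZE = Size;

  // Placeholder body: a fixed-trip-count byte loop the compiler can unroll.
  static inline void block(Ptr dst, uint8_t value) {
    for (size_t i = 0; i < SIZE; ++i)
      dst[i] = value;
  }

  // Writes the last SIZE bytes; only valid when count >= SIZE.
  static inline void tail(Ptr dst, uint8_t value, size_t count) {
    assert(count >= SIZE);
    block(dst + count - SIZE, value);
  }

  // Covers SIZE <= count <= 2 * SIZE with two possibly overlapping blocks.
  static inline void head_tail(Ptr dst, uint8_t value, size_t count) {
    assert(count >= SIZE && count <= 2 * SIZE);
    block(dst, value);
    tail(dst, value, count);
  }

  // Requires count > SIZE so that `count - SIZE` cannot wrap around.
  static inline void loop_and_tail(Ptr dst, uint8_t value, size_t count) {
    assert(count > SIZE);
    size_t offset = 0;
    do {
      block(dst + offset, value);
      offset += SIZE;
    } while (offset < count - SIZE);
    tail(dst, value, count);
  }
};

// Hypothetical name, chosen to avoid clashing with the real memset.
inline void memset_sketch(Ptr dst, int value, size_t count) {
  const uint8_t v = static_cast<uint8_t>(value);
  if (count == 0) return;
  if (count == 1) return Memset<1>::block(dst, v);
  if (count == 2) return Memset<2>::block(dst, v);
  if (count == 3) return Memset<3>::block(dst, v);
  if (count <= 8) return Memset<4>::head_tail(dst, v, count);
  if (count <= 16) return Memset<8>::head_tail(dst, v, count);
  return Memset<16>::loop_and_tail(dst, v, count);
}
```

Each branch only calls an operation whose precondition on `count` holds, which is why the overlapping writes in `head_tail` and `tail` never leave the buffer.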

As you can see, `tail`, `head_tail` and `loop_and_tail` are higher-order functions that build on each other. Only `block` really needs to be implemented.
In earlier designs we implemented these higher-order functions as templated functions, but having the implementation explicitly stated turned out to be more readable.
**This design is useful because it provides customization points**. For instance, for `bcmp` on `aarch64` we can provide a better implementation of `head_tail` using vector reduction intrinsics.
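
As a rough illustration of such a customization point, the sketch below defines a portable `Bcmp` whose default `head_tail` exits early, and a second scope that overrides only `head_tail` with a branch-free variant. The `custom` namespace, the `CPtr` alias, and the OR-based combination are hypothetical stand-ins; a real `aarch64` version would use vector reduction intrinsics instead.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Assumption: a const byte pointer type for the two input buffers.
using CPtr = const unsigned char *;

namespace generic {
template <size_t Size>
struct Bcmp {
  static constexpr size_t SIZE = Size;

  // Returns nonzero if and only if the two SIZE-byte blocks differ.
  static inline uint32_t block(CPtr p1, CPtr p2) {
    uint32_t acc = 0;
    for (size_t i = 0; i < SIZE; ++i)
      acc |= static_cast<uint32_t>(p1[i] ^ p2[i]);
    return acc;
  }

  static inline uint32_t tail(CPtr p1, CPtr p2, size_t count) {
    assert(count >= SIZE);
    return block(p1 + count - SIZE, p2 + count - SIZE);
  }

  // Default behavior: stop as soon as the first block differs.
  static inline uint32_t head_tail(CPtr p1, CPtr p2, size_t count) {
    if (uint32_t diff = block(p1, p2))
      return diff;
    return tail(p1, p2, count);
  }
};
} // namespace generic

namespace custom { // Hypothetical scope, for illustration only.
template <size_t Size>
struct Bcmp : generic::Bcmp<Size> {
  // Customized head_tail: compare both blocks, then combine without a branch.
  static inline uint32_t head_tail(CPtr p1, CPtr p2, size_t count) {
    using Base = generic::Bcmp<Size>;
    return Base::block(p1, p2) | Base::tail(p1, p2, count);
  }
};
} // namespace custom
```

Callers keep the same `Bcmp<N>::head_tail(p1, p2, count)` shape and only change the scope they pick it from.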

## Scoped specializations

We can have several specializations of the `Memset` structure. Depending on the target requirements we can use one or several scopes for the same implementation.

In the following example we use the `generic` implementation for the small sizes but use the `x86` implementation for the loop.

```C++
extern "C" void memset(char* dst, int value, size_t count) {
    if (count == 0) return;
    if (count == 1) return generic::Memset<1>::block(dst, value);
    if (count == 2) return generic::Memset<2>::block(dst, value);
    if (count == 3) return generic::Memset<3>::block(dst, value);
    if (count <= 8) return generic::Memset<4>::head_tail(dst, value, count);
    if (count <= 16) return generic::Memset<8>::head_tail(dst, value, count);
    return x86::Memset<16>::loop_and_tail(dst, value, count);
}
```

### The `builtin` scope

Ultimately we would like the compiler to provide the code for the `block` function. For this we rely on dedicated builtins available in Clang (e.g., [`__builtin_memset_inline`](https://clang.llvm.org/docs/LanguageExtensions.html#guaranteed-inlined-memset)).
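
A `block` in this scope could look roughly like the sketch below. It calls `__builtin_memset_inline` when the compiler reports support for it and otherwise falls back to a byte loop, so it also builds with compilers that lack the builtin. The `SKETCH_HAS_MEMSET_INLINE` macro, the `Ptr` alias, and the fallback are assumptions added for illustration.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Detect the Clang builtin without breaking compilers that lack __has_builtin.
#if defined(__has_builtin)
#if __has_builtin(__builtin_memset_inline)
#define SKETCH_HAS_MEMSET_INLINE 1
#endif
#endif

// Assumption: a byte pointer type standing in for the library's Ptr.
using Ptr = unsigned char *;

namespace builtin {
template <size_t Size>
struct Memset {
  static constexpr size_t SIZE = Size;

  static inline void block(Ptr dst, uint8_t value) {
#ifdef SKETCH_HAS_MEMSET_INLINE
    // The size argument must be a compile-time constant; Size qualifies.
    __builtin_memset_inline(dst, value, Size);
#else
    // Fallback for compilers without the builtin (not part of the README).
    for (size_t i = 0; i < Size; ++i)
      dst[i] = value;
#endif
  }

  // Built on block, exactly as in the generic structure.
  static inline void tail(Ptr dst, uint8_t value, size_t count) {
    assert(count >= SIZE);
    block(dst + count - SIZE, value);
  }
};
} // namespace builtin
```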

### The `generic` scope

In this scope we define pure C++ implementations using native integral types and Clang vector extensions.
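
As a sketch of the native integral type approach, the block below replicates the byte value across an unsigned integer of the block's width and stores it in one go. The `UInt` mapping, the `Ptr` alias, and the use of `memcpy` for the unaligned store are assumptions for illustration; vector extensions are left out to keep the example portable.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Assumption: a byte pointer type standing in for the library's Ptr.
using Ptr = unsigned char *;

namespace generic {
// Hypothetical helper: maps a block size to an unsigned type of that width.
template <size_t Size> struct UInt;
template <> struct UInt<1> { using type = uint8_t; };
template <> struct UInt<2> { using type = uint16_t; };
template <> struct UInt<4> { using type = uint32_t; };
template <> struct UInt<8> { using type = uint64_t; };

template <size_t Size>
struct Memset {
  static constexpr size_t SIZE = Size;
  using T = typename UInt<Size>::type;

  static inline void block(Ptr dst, uint8_t value) {
    // Replicate the byte into every lane (0xAB becomes 0xABAB...AB),
    // then truncate to the width of T.
    const T splat = static_cast<T>(0x0101010101010101ULL * value);
    // memcpy with a constant size expresses an unaligned store of T.
    memcpy(dst, &splat, sizeof(T));
  }

  static inline void tail(Ptr dst, uint8_t value, size_t count) {
    assert(count >= SIZE);
    block(dst + count - SIZE, value);
  }
};
} // namespace generic
```

Because every byte of the stored integer is identical, the result does not depend on the target's endianness.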

### The arch-specific scopes

Then come implementations that use architecture- or microarchitecture-specific features (e.g., `rep;movsb` for `x86` or `dc zva` for `aarch64`).

The purpose here is to rely on builtins as much as possible and fall back to `asm volatile` as a last resort.
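
For the last-resort case, a wrapper around `rep movsb` might look like the sketch below. It uses GCC/Clang extended inline assembly on `x86-64` and a plain byte loop everywhere else, and it assumes the direction flag is clear on entry, as the common calling conventions require. The `x86_sketch` namespace, the `repmovsb` name, and the fallback are hypothetical.

```cpp
#include <assert.h>
#include <stddef.h>

namespace x86_sketch { // Hypothetical scope name, for illustration only.

// Copies `count` bytes forward from `src` to `dst`.
static inline void repmovsb(void *dst, const void *src, size_t count) {
  assert((dst && src) || count == 0);
#if defined(__x86_64__) && defined(__GNUC__)
  // RDI holds the destination, RSI the source, RCX the byte count.
  // The instruction updates all three, hence the read-write constraints.
  asm volatile("rep movsb" : "+D"(dst), "+S"(src), "+c"(count) : : "memory");
#else
  // Portable fallback so the sketch also builds on other targets.
  auto *d = static_cast<unsigned char *>(dst);
  const auto *s = static_cast<const unsigned char *>(src);
  for (size_t i = 0; i < count; ++i)
    d[i] = s[i];
#endif
}

} // namespace x86_sketch
```

The `"memory"` clobber tells the compiler that the statement reads and writes memory it cannot see through the operands, which keeps surrounding loads and stores from being reordered across it.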