5d48b312 | 06-May-2018 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - Refactor bcmp, bcopy, bzero, memset
* For now continue to use stosq/stosb, movsq/movsb, cmpsq/cmpsb sequences, which are well optimized on AMD and Intel. Do not just use the '*b' string op. While this is optimized on Intel, it is not optimized on AMD.
* Note that two string ops in a row result in a serious pessimization. To fix this, for now, conditionalize the movsb, stosb, or cmpsb op so it is only executed when the remaining count is non-zero. That is, assume nominal 8-byte alignment.
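  A minimal sketch of the conditional-tail idea for the bzero() case, assuming GCC/Clang x86-64 inline asm; bzero_sketch is a hypothetical stand-in, not the kernel source. bcopy and bcmp follow the analogous movsq/movsb and cmpsq/cmpsb shape. The trailing rep stosb is skipped entirely when the length is a multiple of 8, so the two string ops are not issued back to back in the nominal aligned case:

    #include <stddef.h>

    static void
    bzero_sketch(void *dst, size_t len)
    {
        size_t quads = len >> 3;   /* number of 8-byte stores */
        size_t tail = len & 7;     /* 0-7 byte remainder */

        __asm__ __volatile__(
            "rep stosq\n\t"          /* zero 8 bytes at a time */
            "movq %3, %%rcx\n\t"
            "jrcxz 1f\n\t"           /* skip the byte op when no tail remains */
            "rep stosb\n"
            "1:"
            : "+D" (dst), "+c" (quads)
            : "a" (0UL), "r" (tail)
            : "memory");
    }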
* Refactor pagezero() to use a movq/addq/jne sequence. This is significantly faster than movsq on AMD and only very slightly slower than movsq on Intel.
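  A rough illustration of the loop shape, assuming a 4096-byte, 8-byte-aligned page and GCC/Clang x86-64 inline asm; zero_page_sketch is hypothetical and not the kernel's pagezero():

    static void
    zero_page_sketch(void *page)
    {
        __asm__ __volatile__(
            "addq $4096, %0\n\t"         /* point just past the end of the page */
            "movq $-4096, %%rax\n"       /* negative index counts up toward zero */
            "1:\n\t"
            "movq $0, (%0,%%rax)\n\t"    /* 8-byte zero store */
            "addq $8, %%rax\n\t"         /* sets ZF after the final store */
            "jne 1b"
            : "+r" (page)
            :
            : "rax", "memory", "cc");
    }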
* Also use the above adjusted kernel code in libc for these functions, with minor modifications. Since we are copying the code wholesale, replace the copyright for the related files in libc.
* Refactor libc's memset() to replicate the fill byte across all 64 bits and then use code similar to bzero().
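  A C-level sketch of the replication step, under the assumption that the stores then follow the same quad-word-plus-conditional-tail shape as the bzero() sketch above; memset_sketch is a hypothetical stand-in, not the libc assembly:

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    static void *
    memset_sketch(void *dst, int c, size_t len)
    {
        uint64_t fill = (uint8_t)c * 0x0101010101010101ULL;  /* byte -> all 64 bits */
        unsigned char *p = dst;

        for (; len >= 8; p += 8, len -= 8)   /* quad-word stores (stosq-like) */
            memcpy(p, &fill, sizeof(fill));
        if (len != 0) {                      /* conditional 1-7 byte tail (stosb-like) */
            do {
                *p++ = (unsigned char)c;
            } while (--len);
        }
        return dst;
    }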
Reported-by: mjg_ (info on pessimizations)