1.\" $NetBSD: atomic_loadstore.9,v 1.6 2020/09/03 00:23:57 riastradh Exp $ 2.\" 3.\" Copyright (c) 2019 The NetBSD Foundation 4.\" All rights reserved. 5.\" 6.\" This code is derived from software contributed to The NetBSD Foundation 7.\" by Taylor R. Campbell. 8.\" 9.\" Redistribution and use in source and binary forms, with or without 10.\" modification, are permitted provided that the following conditions 11.\" are met: 12.\" 1. Redistributions of source code must retain the above copyright 13.\" notice, this list of conditions and the following disclaimer. 14.\" 2. Redistributions in binary form must reproduce the above copyright 15.\" notice, this list of conditions and the following disclaimer in the 16.\" documentation and/or other materials provided with the distribution. 17.\" 18.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS 19.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED 20.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR 21.\" PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS 22.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR 23.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF 24.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS 25.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN 26.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) 27.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE 28.\" POSSIBILITY OF SUCH DAMAGE. 29.\" 30.Dd November 25, 2019 31.Dt ATOMIC_LOADSTORE 9 32.Os 33.Sh NAME 34.Nm atomic_load_relaxed , 35.Nm atomic_load_acquire , 36.Nm atomic_load_consume , 37.Nm atomic_store_relaxed , 38.Nm atomic_store_release 39.Nd atomic and ordered memory operations 40.Sh SYNOPSIS 41.In sys/atomic.h 42.Ft T 43.Fn atomic_load_relaxed "const volatile T *p" 44.Ft T 45.Fn atomic_load_acquire "const volatile T *p" 46.Ft T 47.Fn atomic_load_consume "const volatile T *p" 48.Ft void 49.Fn atomic_store_relaxed "volatile T *p" "T v" 50.Ft void 51.Fn atomic_store_release "volatile T *p" "T v" 52.Sh DESCRIPTION 53These type-generic macros implement memory operations that are 54.Em atomic 55and that have 56.Em memory ordering constraints . 57Aside from atomicity and ordering, the load operations are equivalent 58to 59.Li * Ns Fa p 60and the store operations are equivalent to 61.Li * Ns Fa p Li "=" Fa v . 62The pointer 63.Fa p 64must be aligned, even on architectures like x86 which generally lack 65strict alignment requirements; see 66.Sx SIZE AND ALIGNMENT 67for details. 68.Pp 69.Em Atomic 70means that the memory operations cannot be 71.Em fused 72or 73.Em torn : 74.Bl -bullet 75.It 76.Em Fusing 77is combining multiple memory operations on a single object into one 78memory operation, such as replacing 79.Bd -literal -compact 80 *p = v; 81 x = *p; 82.Ed 83by 84.Bd -literal -compact 85 *p = v; 86 x = v; 87.Ed 88since the compiler can prove that 89.Li \&*p 90will yield 91.Li v 92after 93.Li \&*p\ =\ v . 94For 95.Em atomic 96memory operations, the implementation 97.Em will not 98assume that 99.Bl -dash -compact 100.It 101consecutive loads of the same object will return the same value, or 102.It 103a store followed by a load of the same object will return the value 104stored, or 105.It 106consecutive stores of the same object are redundant. 
.El
Thus, the implementation will not replace two consecutive atomic loads
by one, will not elide an atomic load following a store, and will not
combine two consecutive atomic stores into one.
.Pp
For example,
.Bd -literal
	atomic_store_relaxed(&flag, 1);
	while (atomic_load_relaxed(&flag))
		continue;
.Ed
.Pp
may be used to set a flag and then busy-wait until another thread
clears it, whereas
.Bd -literal
	flag = 1;
	while (flag)
		continue;
.Ed
.Pp
may be transformed into the infinite loop
.Bd -literal
	flag = 1;
	while (1)
		continue;
.Ed
.It
.Em Tearing
is implementing a memory operation on a large data unit such as a
32-bit word by issuing multiple memory operations on smaller data units
such as 8-bit bytes.
The implementation will not tear
.Em atomic
loads or stores into smaller ones.
Thus, as far as any interrupt, other thread, or other CPU can tell, an
atomic memory operation is issued either all at once or not at all.
.Pp
For example, if a 32-bit word
.Va w
is written with
.Pp
.Dl atomic_store_relaxed(&w,\ 0x00010002);
.Pp
then an interrupt, other thread, or other CPU reading it with
.Li atomic_load_relaxed(&w)
will never witness it partially written, whereas
.Pp
.Dl w\ =\ 0x00010002;
.Pp
might be compiled into a pair of separate 16-bit store instructions
instead of one single word-sized store instruction, in which case other
threads may see the intermediate state with only one of the halves
written.
.El
.Pp
Atomic operations on any single object occur in a total order shared by
all interrupts, threads, and CPUs, which is consistent with the program
order in every interrupt, thread, and CPU.
A single program without interruption or other threads or CPUs will
always observe its own loads and stores in program order, but another
program in an interrupt handler, in another thread, or on another CPU
may issue loads that return values as if the first program's stores
occurred out of program order, and vice versa.
Two different threads might each observe a third thread's memory
operations in different orders.
.Pp
The
.Em memory ordering constraints
make limited guarantees of ordering relative to memory operations on
.Em other
objects as witnessed by interrupts, other threads, or other CPUs, and
have the following meanings:
.Bl -tag -width relaxed
.It relaxed
No ordering relative to memory operations on any other objects is
guaranteed.
Relaxed ordering is the default for ordinary non-atomic memory
operations like
.Li "*p"
and
.Li "*p = v" .
.Pp
Atomic operations with relaxed ordering are cheap: they are not
read/modify/write atomic operations, and they do not involve any kind
of inter-CPU ordering barriers.
.It acquire
This memory operation happens before all subsequent memory operations
in program order.
However, prior memory operations in program order may be reordered to
happen after this one.
For example, assuming no aliasing between the pointers, the
implementation is allowed to treat
.Bd -literal
	int x = *p;
	if (atomic_load_acquire(q)) {
		int y = *r;
		*s = x + y;
		return 1;
	}
.Ed
.Pp
as if it were
.Bd -literal
	if (atomic_load_acquire(q)) {
		int x = *p;
		int y = *r;
		*s = x + y;
		return 1;
	}
.Ed
.Pp
but
.Em not
as if it were
.Bd -literal
	int x = *p;
	int y = *r;
	*s = x + y;
	if (atomic_load_acquire(q)) {
		return 1;
	}
.Ed
.It consume
This memory operation happens before all memory operations on objects
at addresses that are computed from the value returned by this one.
Otherwise, no ordering relative to memory operations on other objects
is implied.
.Pp
For example, the implementation is allowed to treat
.Bd -literal
	struct foo *foo0, *foo1;

	struct foo *f0 = atomic_load_consume(&foo0);
	struct foo *f1 = atomic_load_consume(&foo1);
	int x = f0->x;
	int y = f1->y;
.Ed
.Pp
as if it were
.Bd -literal
	struct foo *foo0, *foo1;

	struct foo *f1 = atomic_load_consume(&foo1);
	struct foo *f0 = atomic_load_consume(&foo0);
	int y = f1->y;
	int x = f0->x;
.Ed
.Pp
but loading
.Li f0->x
is guaranteed to happen after loading
.Li foo0
even if the CPU had a cached value for the address that
.Li f0->x
happened to be at, and likewise for
.Li f1->y
and
.Li foo1 .
.Pp
.Fn atomic_load_consume
functions like
.Fn atomic_load_acquire
as long as the memory operations that must happen after it are limited
to addresses that depend on the value returned by it, but it is almost
always as cheap as
.Fn atomic_load_relaxed .
See
.Sx ACQUIRE OR CONSUME?
below for more details.
.It release
All prior memory operations in program order happen before this one.
However, subsequent memory operations in program order may be reordered
to happen before this one too.
For example, assuming no aliasing between the pointers, the
implementation is allowed to treat
.Bd -literal
	int x = *p;
	*q = x;
	atomic_store_release(r, 0);
	int y = *s;
	return x + y;
.Ed
.Pp
as if it were
.Bd -literal
	int y = *s;
	int x = *p;
	*q = x;
	atomic_store_release(r, 0);
	return x + y;
.Ed
.Pp
but
.Em not
as if it were
.Bd -literal
	atomic_store_release(r, 0);
	int x = *p;
	int y = *s;
	*q = x;
	return x + y;
.Ed
.El
.Ss PAIRING ORDERED MEMORY OPERATIONS
In general, each
.Fn atomic_store_release
.Em must
be paired with either
.Fn atomic_load_acquire
or
.Fn atomic_load_consume
in order to have an effect \(em it is only when a release operation
synchronizes with an acquire or consume operation that any ordering is
guaranteed between memory operations
.Em before
the release operation and memory operations
.Em after
the acquire/consume operation.
.Pp
For example, to set up an entry in a table and then mark the entry
ready, you should:
.Bl -enum
.It
Perform memory operations to initialize the data.
.Bd -literal
	tab[i].x = ...;
	tab[i].y = ...;
.Ed
.It
Issue
.Fn atomic_store_release
to mark it ready.
.Bd -literal
	atomic_store_release(&tab[i].ready, 1);
.Ed
.It
Possibly in another thread, issue
.Fn atomic_load_acquire
to ascertain whether it is ready.
.Bd -literal
	if (atomic_load_acquire(&tab[i].ready) == 0)
		return EWOULDBLOCK;
.Ed
.It
Perform memory operations to use the data.
.Bd -literal
	do_stuff(tab[i].x, tab[i].y);
.Ed
.El
.Pp
Similarly, if you want to create an object, initialize it, and then
publish it to be used by another thread, then you should:
.Bl -enum
.It
Perform memory operations to initialize the object.
.Bd -literal
	struct mumble *m = kmem_alloc(sizeof(*m), KM_SLEEP);
	m->x = x;
	m->y = y;
	m->z = m->x + m->y;
.Ed
.It
Issue
.Fn atomic_store_release
to publish it.
.Bd -literal
	atomic_store_release(&the_mumble, m);
.Ed
.It
Possibly in another thread, issue
.Fn atomic_load_consume
to get it.
.Bd -literal
	struct mumble *m = atomic_load_consume(&the_mumble);
.Ed
.It
Perform memory operations to use the object's members.
.Bd -literal
	m->y &= m->x;
	do_things(m->x, m->y, m->z);
.Ed
.El
.Pp
In both examples, assuming that the value written by
.Fn atomic_store_release
in step\~2
is read by
.Fn atomic_load_acquire
or
.Fn atomic_load_consume
in step\~3, this guarantees that all of the memory operations in
step\~1 complete before any of the memory operations in step\~4 \(em
even if they happen on different CPUs.
.Pp
Without
.Em both
the release operation in step\~2
.Em and
the acquire or consume operation in step\~3, no ordering is guaranteed
between the memory operations in steps\~1 and\~4.
In fact, without
.Em both
release and acquire/consume, even the assignment
.Li m->z\ =\ m->x\ +\ m->y
in step\~1 might read values of
.Li m->x
and
.Li m->y
that were written in step\~4.
.Ss ACQUIRE OR CONSUME?
You must use
.Fn atomic_load_acquire
when subsequent memory operations in program order that must happen
after the load are on objects at
.Em addresses that might not depend arithmetically on the resulting value .
This applies particularly when the choice of whether to do the
subsequent memory operation depends on a
.Em control-flow decision based on the resulting value :
.Bd -literal
	struct gadget {
		int ready, x;
	} the_gadget;

	/* Producer */
	the_gadget.x = 42;
	atomic_store_release(&the_gadget.ready, 1);

	/* Consumer */
	if (atomic_load_acquire(&the_gadget.ready) == 0)
		return EWOULDBLOCK;
	int x = the_gadget.x;
.Ed
.Pp
Here the
.Em decision of whether to load
.Li the_gadget.x
depends on a control-flow decision based on the value loaded from
.Li the_gadget.ready ,
and loading
.Li the_gadget.x
must happen after loading
.Li the_gadget.ready .
Using
.Fn atomic_load_acquire
guarantees that the compiler and CPU do not conspire to load
.Li the_gadget.x
before we have ascertained that it is ready.
.Pp
You may use
.Fn atomic_load_consume
if all subsequent memory operations in program order that must happen
after the load are performed on objects at
.Em addresses computed arithmetically from the resulting value ,
such as loading a pointer to a structure object and then dereferencing
it:
.Bd -literal
	struct gizmo {
		int x, y, z;
	};
	struct gizmo null_gizmo;
	struct gizmo *the_gizmo = &null_gizmo;

	/* Producer */
	struct gizmo *g = kmem_alloc(sizeof(*g), KM_SLEEP);
	g->x = 12;
	g->y = 34;
	g->z = 56;
	atomic_store_release(&the_gizmo, g);

	/* Consumer */
	struct gizmo *g = atomic_load_consume(&the_gizmo);
	int y = g->y;
.Ed
.Pp
Here the
.Em address
of
.Li g->y
depends on the value of the pointer loaded from
.Li the_gizmo .
Using
.Fn atomic_load_consume
guarantees that we do not witness a stale cache for that address.
.Pp
In some cases it may be unclear.
For example:
.Bd -literal
	int x[2];
	bool b;

	/* Producer */
	x[0] = 42;
	atomic_store_release(&b, 0);

	/* Consumer 1 */
	int y = atomic_load_???(&b) ? x[0] : x[1];

	/* Consumer 2 */
	int y = x[atomic_load_???(&b) ? 0 : 1];

	/* Consumer 3 */
	int y = x[atomic_load_???(&b) ^ 1];
.Ed
.Pp
Although the three consumers seem to be equivalent, by the letter of
C11 consumers\~1 and\~2 require
.Fn atomic_load_acquire
because the value determines the address of a subsequent load only via
control-flow decisions in the
.Li ?:
operator, whereas consumer\~3 can use
.Fn atomic_load_consume .
However, if you're not sure, you should err on the side of
.Fn atomic_load_acquire
until C11 implementations have ironed out the kinks in the semantics.
.Pp
On all CPUs other than DEC Alpha,
.Fn atomic_load_consume
is cheap \(em it is identical to
.Fn atomic_load_relaxed .
In contrast,
.Fn atomic_load_acquire
usually implies an expensive memory barrier.
.Ss SIZE AND ALIGNMENT
The pointer
.Fa p
must be aligned \(em that is, if the object it points to is
.\"
2\c
.ie t \s-2\v'-0.4m'n\v'+0.4m'\s+2
.el ^n
.\"
bytes long, then the low-order
.Ar n
bits of
.Fa p
must be zero.
.Pp
All
.Nx
ports support atomic loads and stores on units of data up to 32 bits.
Some ports additionally support atomic loads and stores on larger
quantities, like 64-bit quantities, if
.Dv __HAVE_ATOMIC64_LOADSTORE
is defined.
The macros are not allowed on larger quantities of data than the port
supports atomically; attempts to use them for such quantities should
result in a compile-time assertion failure.
.Pp
For example, as long as you use
.Fn atomic_store_*
to write a 32-bit quantity, you can safely use
.Fn atomic_load_relaxed
to optimistically read it outside a lock, but for a 64-bit quantity it
must be conditional on
.Dv __HAVE_ATOMIC64_LOADSTORE
\(em otherwise it will lead to compile-time errors on platforms without
64-bit atomic loads and stores:
.Bd -literal
	struct foo {
		kmutex_t	f_lock;
		uint32_t	f_refcnt;
		uint64_t	f_ticket;
	};

	if (atomic_load_relaxed(&foo->f_refcnt) == 0)
		return 123;
#ifdef __HAVE_ATOMIC64_LOADSTORE
	if (atomic_load_relaxed(&foo->f_ticket) == ticket)
		return 123;
#endif
	mutex_enter(&foo->f_lock);
	if (foo->f_refcnt == 0 || foo->f_ticket == ticket)
		ret = 123;
	...
#ifdef __HAVE_ATOMIC64_LOADSTORE
	atomic_store_relaxed(&foo->f_ticket, foo->f_ticket + 1);
#else
	foo->f_ticket++;
#endif
	...
	mutex_exit(&foo->f_lock);
.Ed
.Sh C11 COMPATIBILITY
These macros are meant to follow
.Tn C11
semantics, in terms of
.Li atomic_load_explicit()
and
.Li atomic_store_explicit()
with the appropriate memory order specifiers, and are meant to make
future adoption of the
.Tn C11
atomic API easier.
Eventually it may be mandatory to use the
.Tn C11
.Vt _Atomic
type qualifier or equivalent for the operands.
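.Pp
For reference, the intended correspondence is roughly as follows.
This is only a sketch: it assumes the operands are
.Vt _Atomic Ns -qualified ,
which these macros do not currently require.
.Bd -literal
	/* atomic_load_relaxed(p) */
	atomic_load_explicit(p, memory_order_relaxed)

	/* atomic_load_acquire(p) */
	atomic_load_explicit(p, memory_order_acquire)

	/* atomic_load_consume(p) */
	atomic_load_explicit(p, memory_order_consume)

	/* atomic_store_relaxed(p, v) */
	atomic_store_explicit(p, v, memory_order_relaxed)

	/* atomic_store_release(p, v) */
	atomic_store_explicit(p, v, memory_order_release)
.Ed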
.Sh LINUX ANALOGUES
The Linux kernel provides two macros
.Li READ_ONCE(x)
and
.Li WRITE_ONCE(x,\ v)
which are similar to
.Li atomic_load_consume(&x)
and
.Li atomic_store_relaxed(&x,\ v) ,
respectively.
However, while Linux's
.Li READ_ONCE
and
.Li WRITE_ONCE
prevent fusing, they may in some cases be torn \(em and therefore fail
to guarantee atomicity \(em because:
.Bl -bullet
.It
They do not require the address
.Li "&x"
to be aligned.
.It
They do not require
.Li sizeof(x)
to be at most the largest size of available atomic loads and stores on
the host architecture.
.El
.Sh MEMORY BARRIERS AND ATOMIC READ/MODIFY/WRITE
The atomic read/modify/write operations in
.Xr atomic_ops 3
have relaxed ordering by default, but can be combined with the memory
barriers in
.Xr membar_ops 3
for the same effect as an acquire operation and a release operation for
the purposes of pairing with
.Fn atomic_store_release
and
.Fn atomic_load_acquire
or
.Fn atomic_load_consume .
If
.Li atomic_r/m/w()
is an atomic read/modify/write operation in
.Xr atomic_ops 3 ,
then
.Bd -literal
	membar_exit();
	atomic_r/m/w(obj, ...);
.Ed
.Pp
functions like a release operation on
.Va obj ,
and
.Bd -literal
	atomic_r/m/w(obj, ...);
	membar_enter();
.Ed
.Pp
functions like an acquire operation on
.Va obj .
.Pp
.Sy WARNING :
The combination of
.Fn atomic_load_relaxed
and
.Xr membar_enter 3
.Em does not
make an acquire operation; only read/modify/write atomics may be
combined with
.Xr membar_enter 3
this way.
.Pp
On architectures where
.Dv __HAVE_ATOMIC_AS_MEMBAR
is defined, all the
.Xr atomic_ops 3
imply release and acquire operations, so the
.Xr membar_enter 3
and
.Xr membar_exit 3
are redundant.
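.Pp
For instance, dropping a reference count might combine an atomic
decrement with both barriers.
The following is only a sketch of that pairing; the object
.Va obj ,
its
.Va refcnt
member, and
.Fn obj_destroy
are hypothetical:
.Bd -literal
	/* Drop our reference; destroy the object on the last drop. */
	membar_exit();				/* release ordering */
	if (atomic_dec_uint_nv(&obj->refcnt) != 0)
		return;
	membar_enter();				/* acquire ordering */
	obj_destroy(obj);
.Ed
.Pp
Here the
.Xr membar_exit 3
before the decrement ensures that all prior uses of the object happen
before the count drops to zero, and the
.Xr membar_enter 3
after it ensures that the destruction happens after every such use.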
.Sh EXAMPLES
Maintaining lossy counters.
These may lose some counts, because the read/modify/write cycle as a
whole is not atomic.
But this guarantees that the count will increase by at most one each
time.
In contrast, without atomic operations, in principle a write to a
32-bit counter might be torn into multiple smaller stores, which could
appear to happen out of order from another CPU's perspective, leading
to nonsensical counter readouts.
(For frequent events, consider using per-CPU counters instead in
practice.)
.Bd -literal
	unsigned count;

	void
	record_event(void)
	{
		atomic_store_relaxed(&count,
		    1 + atomic_load_relaxed(&count));
	}

	unsigned
	read_event_count(void)
	{
		return atomic_load_relaxed(&count);
	}
.Ed
.Pp
Initialization barrier.
.Bd -literal
	int ready;
	struct data d;

	void
	setup_and_notify(void)
	{
		setup_data(&d.things);
		atomic_store_release(&ready, 1);
	}

	void
	try_if_ready(void)
	{
		if (atomic_load_acquire(&ready))
			do_stuff(d.things);
	}
.Ed
.Pp
Publishing a pointer to the current snapshot of data.
(Caller must arrange that only one call to
.Li take_snapshot()
happens at any given time; generally this should be done in
coordination with
.Xr pserialize 9
or similar to enable resource reclamation.)
.Bd -literal
	struct data *current_d;

	void
	take_snapshot(void)
	{
		struct data *d = kmem_alloc(sizeof(*d), KM_SLEEP);

		d->things = ...;

		atomic_store_release(&current_d, d);
	}

	struct data *
	get_snapshot(void)
	{
		return atomic_load_consume(&current_d);
	}
.Ed
.Sh CODE REFERENCES
.Pa sys/sys/atomic.h
.Sh SEE ALSO
.Xr atomic_ops 3 ,
.Xr membar_ops 3 ,
.Xr pserialize 9
.Sh HISTORY
These atomic operations first appeared in
.Nx 9.0 .
.Sh CAVEATS
C11 formally specifies that all subexpressions, except the left
operands of the
.Ql && ,
.Ql || ,
.Ql ?: ,
and
.Ql \&,
operators and the
.Li kill_dependency()
macro, carry dependencies for which
.Dv memory_order_consume
guarantees ordering, but most or all implementations to date simply
treat
.Dv memory_order_consume
as
.Dv memory_order_acquire
and do not take advantage of data dependencies to elide costly memory
barriers or load-acquire CPU instructions.
.Pp
Instead, we implement
.Fn atomic_load_consume
as
.Fn atomic_load_relaxed
followed by
.Xr membar_datadep_consumer 3 ,
which is equivalent to
.Xr membar_consumer 3
on DEC Alpha and
.Xr __insn_barrier 3
elsewhere.
.Sh BUGS
Some idiot decided to call it
.Em tearing ,
depriving us of the opportunity to say that atomic operations prevent
fusion and
.Em fission .