1.\"	$NetBSD: atomic_loadstore.9,v 1.6 2020/09/03 00:23:57 riastradh Exp $
2.\"
3.\" Copyright (c) 2019 The NetBSD Foundation
4.\" All rights reserved.
5.\"
6.\" This code is derived from software contributed to The NetBSD Foundation
7.\" by Taylor R. Campbell.
8.\"
9.\" Redistribution and use in source and binary forms, with or without
10.\" modification, are permitted provided that the following conditions
11.\" are met:
12.\" 1. Redistributions of source code must retain the above copyright
13.\"    notice, this list of conditions and the following disclaimer.
14.\" 2. Redistributions in binary form must reproduce the above copyright
15.\"    notice, this list of conditions and the following disclaimer in the
16.\"    documentation and/or other materials provided with the distribution.
17.\"
18.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
19.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
20.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
21.\" PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
22.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
23.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
24.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
25.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
26.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
27.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
28.\" POSSIBILITY OF SUCH DAMAGE.
29.\"
30.Dd November 25, 2019
31.Dt ATOMIC_LOADSTORE 9
32.Os
33.Sh NAME
34.Nm atomic_load_relaxed ,
35.Nm atomic_load_acquire ,
36.Nm atomic_load_consume ,
37.Nm atomic_store_relaxed ,
38.Nm atomic_store_release
39.Nd atomic and ordered memory operations
40.Sh SYNOPSIS
41.In sys/atomic.h
42.Ft T
43.Fn atomic_load_relaxed "const volatile T *p"
44.Ft T
45.Fn atomic_load_acquire "const volatile T *p"
46.Ft T
47.Fn atomic_load_consume "const volatile T *p"
48.Ft void
49.Fn atomic_store_relaxed "volatile T *p" "T v"
50.Ft void
51.Fn atomic_store_release "volatile T *p" "T v"
52.Sh DESCRIPTION
53These type-generic macros implement memory operations that are
54.Em atomic
55and that have
56.Em memory ordering constraints .
57Aside from atomicity and ordering, the load operations are equivalent
58to
59.Li * Ns Fa p
60and the store operations are equivalent to
61.Li * Ns Fa p Li "=" Fa v .
62The pointer
63.Fa p
64must be aligned, even on architectures like x86 which generally lack
65strict alignment requirements; see
66.Sx SIZE AND ALIGNMENT
67for details.
68.Pp
69.Em Atomic
70means that the memory operations cannot be
71.Em fused
72or
73.Em torn :
74.Bl -bullet
75.It
76.Em Fusing
77is combining multiple memory operations on a single object into one
78memory operation, such as replacing
79.Bd -literal -compact
80	*p = v;
81	x = *p;
82.Ed
83by
84.Bd -literal -compact
85	*p = v;
86	x = v;
87.Ed
88since the compiler can prove that
89.Li \&*p
90will yield
91.Li v
92after
93.Li \&*p\ =\ v .
94For
95.Em atomic
96memory operations, the implementation
97.Em will not
98assume that
99.Bl -dash -compact
100.It
101consecutive loads of the same object will return the same value, or
102.It
103a store followed by a load of the same object will return the value
104stored, or
105.It
106consecutive stores of the same object are redundant.
107.El
108Thus, the implementation will not replace two consecutive atomic loads
109by one, will not elide an atomic load following a store, and will not
110combine two consecutive atomic stores into one.
111.Pp
112For example,
113.Bd -literal
114	atomic_store_relaxed(&flag, 1);
115	while (atomic_load_relaxed(&flag))
116		continue;
117.Ed
118.Pp
119may be used to set a flag and then busy-wait until another thread
120clears it, whereas
121.Bd -literal
122	flag = 1;
123	while (flag)
124		continue;
125.Ed
126.Pp
127may be transformed into the infinite loop
128.Bd -literal
129	flag = 1;
130	while (1)
131		continue;
132.Ed
133.It
134.Em Tearing
135is implementing a memory operation on a large data unit such as a
13632-bit word by issuing multiple memory operations on smaller data units
137such as 8-bit bytes.
138The implementation will not tear
139.Em atomic
140loads or stores into smaller ones.
141Thus, as far as any interrupt, other thread, or other CPU can tell, an
142atomic memory operation is issued either all at once or not at all.
143.Pp
144For example, if a 32-bit word
145.Va w
146is written with
147.Pp
148.Dl atomic_store_relaxed(&w,\ 0x00010002);
149.Pp
150then an interrupt, other thread, or other CPU reading it with
151.Li atomic_load_relaxed(&w)
152will never witness it partially written, whereas
153.Pp
154.Dl w\ =\ 0x00010002;
155.Pp
156might be compiled into a pair of separate 16-bit store instructions
157instead of one single word-sized store instruction, in which case other
158threads may see the intermediate state with only one of the halves
159written.
160.El
161.Pp
162Atomic operations on any single object occur in a total order shared by
163all interrupts, threads, and CPUs, which is consistent with the program
164order in every interrupt, thread, and CPU.
165A single program without interruption or other threads or CPUs will
166always observe its own loads and stores in program order, but another
167program in an interrupt handler, in another thread, or on another CPU
168may issue loads that return values as if the first program's stores
169occurred out of program order, and vice versa.
170Two different threads might each observe a third thread's memory
171operations in different orders.
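.Pp
For example, in the following sketch (with two hypothetical words
.Va x
and
.Va y ,
both initially zero), each thread performs its store before its load in
program order:
.Bd -literal
	/* Thread A */
	atomic_store_relaxed(&x, 1);
	a = atomic_load_relaxed(&y);

	/* Thread B */
	atomic_store_relaxed(&y, 1);
	b = atomic_load_relaxed(&x);
.Ed
.Pp
Nevertheless, the outcome
.Li a\ =\ b\ =\ 0
is possible: each thread may witness the other's store as if it had
happened after the other's load.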
.Pp
The
.Em memory ordering constraints
make limited guarantees of ordering relative to memory operations on
.Em other
objects as witnessed by interrupts, other threads, or other CPUs, and
have the following meanings:
.Bl -tag -width relaxed
.It relaxed
No ordering relative to memory operations on any other objects is
guaranteed.
Relaxed ordering is the default for ordinary non-atomic memory
operations like
.Li "*p"
and
.Li "*p = v" .
.Pp
Atomic operations with relaxed ordering are cheap: they are not
read/modify/write atomic operations, and they do not involve any kind
of inter-CPU ordering barriers.
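.Pp
For example, relaxed ordering alone does not make the following sketch
(with hypothetical variables
.Va datum
and
.Va done ,
both initially zero) safe:
.Bd -literal
	/* Producer */
	atomic_store_relaxed(&datum, 123);
	atomic_store_relaxed(&done, 1);

	/* Consumer */
	if (atomic_load_relaxed(&done))
		x = atomic_load_relaxed(&datum);	/* may still read 0! */
.Ed
.Pp
The consumer may observe
.Li done\ =\ 1
and nevertheless read a stale
.Li datum ;
ruling that out requires the release and acquire (or consume)
operations described below.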
.It acquire
This memory operation happens before all subsequent memory operations
in program order.
However, prior memory operations in program order may be reordered to
happen after this one.
For example, assuming no aliasing between the pointers, the
implementation is allowed to treat
.Bd -literal
	int x = *p;
	if (atomic_load_acquire(q)) {
		int y = *r;
		*s = x + y;
		return 1;
	}
.Ed
.Pp
as if it were
.Bd -literal
	if (atomic_load_acquire(q)) {
		int x = *p;
		int y = *r;
		*s = x + y;
		return 1;
	}
.Ed
.Pp
but
.Em not
as if it were
.Bd -literal
	int x = *p;
	int y = *r;
	*s = x + y;
	if (atomic_load_acquire(q)) {
		return 1;
	}
.Ed
.It consume
This memory operation happens before all memory operations on objects
at addresses that are computed from the value returned by this one.
Otherwise, no ordering relative to memory operations on other objects
is implied.
.Pp
For example, the implementation is allowed to treat
.Bd -literal
	struct foo *foo0, *foo1;

	struct foo *f0 = atomic_load_consume(&foo0);
	struct foo *f1 = atomic_load_consume(&foo1);
	int x = f0->x;
	int y = f1->y;
.Ed
.Pp
as if it were
.Bd -literal
	struct foo *foo0, *foo1;

	struct foo *f1 = atomic_load_consume(&foo1);
	struct foo *f0 = atomic_load_consume(&foo0);
	int y = f1->y;
	int x = f0->x;
.Ed
.Pp
but loading
.Li f0->x
is guaranteed to happen after loading
.Li foo0
even if the CPU had a cached value for the address that
.Li f0->x
happened to be at, and likewise for
.Li f1->y
and
.Li foo1 .
.Pp
.Fn atomic_load_consume
functions like
.Fn atomic_load_acquire
as long as the memory operations that must happen after it are limited
to addresses that depend on the value returned by it, but it is almost
always as cheap as
.Fn atomic_load_relaxed .
See
.Sx ACQUIRE OR CONSUME?
below for more details.
.It release
All prior memory operations in program order happen before this one.
However, subsequent memory operations in program order may be reordered
to happen before this one too.
For example, assuming no aliasing between the pointers, the
implementation is allowed to treat
.Bd -literal
	int x = *p;
	*q = x;
	atomic_store_release(r, 0);
	int y = *s;
	return x + y;
.Ed
.Pp
as if it were
.Bd -literal
	int y = *s;
	int x = *p;
	*q = x;
	atomic_store_release(r, 0);
	return x + y;
.Ed
.Pp
but
.Em not
as if it were
.Bd -literal
	atomic_store_release(r, 0);
	int x = *p;
	int y = *s;
	*q = x;
	return x + y;
.Ed
.El
.Ss PAIRING ORDERED MEMORY OPERATIONS
In general, each
.Fn atomic_store_release
.Em must
be paired with either
.Fn atomic_load_acquire
or
.Fn atomic_load_consume
in order to have an effect \(em it is only when a release operation
synchronizes with an acquire or consume operation that any ordering is
guaranteed between memory operations
.Em before
the release operation and memory operations
.Em after
the acquire/consume operation.
.Pp
For example, to set up an entry in a table and then mark the entry
ready, you should:
.Bl -enum
.It
Perform memory operations to initialize the data.
.Bd -literal
	tab[i].x = ...;
	tab[i].y = ...;
.Ed
.It
Issue
.Fn atomic_store_release
to mark it ready.
.Bd -literal
	atomic_store_release(&tab[i].ready, 1);
.Ed
.It
Possibly in another thread, issue
.Fn atomic_load_acquire
to ascertain whether it is ready.
.Bd -literal
	if (atomic_load_acquire(&tab[i].ready) == 0)
		return EWOULDBLOCK;
.Ed
.It
Perform memory operations to use the data.
.Bd -literal
	do_stuff(tab[i].x, tab[i].y);
.Ed
.El
.Pp
Similarly, if you want to create an object, initialize it, and then
publish it to be used by another thread, then you should:
.Bl -enum
.It
Perform memory operations to initialize the object.
.Bd -literal
	struct mumble *m = kmem_alloc(sizeof(*m), KM_SLEEP);
	m->x = x;
	m->y = y;
	m->z = m->x + m->y;
.Ed
.It
Issue
.Fn atomic_store_release
to publish it.
.Bd -literal
	atomic_store_release(&the_mumble, m);
.Ed
.It
Possibly in another thread, issue
.Fn atomic_load_consume
to get it.
.Bd -literal
	struct mumble *m = atomic_load_consume(&the_mumble);
.Ed
.It
Perform memory operations to use the object's members.
.Bd -literal
	m->y &= m->x;
	do_things(m->x, m->y, m->z);
.Ed
.El
.Pp
In both examples, assuming that the value written by
.Fn atomic_store_release
in step\~2
is read by
.Fn atomic_load_acquire
or
.Fn atomic_load_consume
in step\~3, this guarantees that all of the memory operations in
step\~1 complete before any of the memory operations in step\~4 \(em
even if they happen on different CPUs.
.Pp
Without
.Em both
the release operation in step\~2
.Em and
the acquire or consume operation in step\~3, no ordering is guaranteed
between the memory operations in steps\~1 and\~4.
In fact, without
.Em both
release and acquire/consume, even the assignment
.Li m->z\ =\ m->x\ +\ m->y
in step\~1 might read values of
.Li m->x
and
.Li m->y
that were written in step\~4.
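.Pp
Putting the second example together, a minimal sketch (the function
names and the
.Dv NULL
check are hypothetical) looks like:
.Bd -literal
	struct mumble *the_mumble;

	void
	publish_mumble(int x, int y)
	{
		struct mumble *m = kmem_alloc(sizeof(*m), KM_SLEEP);

		/* 1. Initialize the object. */
		m->x = x;
		m->y = y;
		m->z = m->x + m->y;

		/* 2. Publish it with a release operation. */
		atomic_store_release(&the_mumble, m);
	}

	void
	use_mumble(void)
	{
		/* 3. Fetch it with a consume operation. */
		struct mumble *m = atomic_load_consume(&the_mumble);

		if (m == NULL)		/* nothing published yet */
			return;

		/* 4. Use its members. */
		do_things(m->x, m->y, m->z);
	}
.Ed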
.Ss ACQUIRE OR CONSUME?
You must use
.Fn atomic_load_acquire
when subsequent memory operations in program order that must happen
after the load are on objects at
.Em addresses that might not depend arithmetically on the resulting value .
This applies particularly when the choice of whether to do the
subsequent memory operation depends on a
.Em control-flow decision based on the resulting value :
.Bd -literal
	struct gadget {
		int ready, x;
	} the_gadget;

	/* Producer */
	the_gadget.x = 42;
	atomic_store_release(&the_gadget.ready, 1);

	/* Consumer */
	if (atomic_load_acquire(&the_gadget.ready) == 0)
		return EWOULDBLOCK;
	int x = the_gadget.x;
.Ed
.Pp
Here the
.Em decision of whether to load
.Li the_gadget.x
depends on a control-flow decision depending on the value loaded from
.Li the_gadget.ready ,
and loading
.Li the_gadget.x
must happen after loading
.Li the_gadget.ready .
Using
.Fn atomic_load_acquire
guarantees that the compiler and CPU do not conspire to load
.Li the_gadget.x
before we have ascertained that it is ready.
.Pp
You may use
.Fn atomic_load_consume
if all subsequent memory operations in program order that must happen
after the load are performed on objects at
.Em addresses computed arithmetically from the resulting value ,
such as loading a pointer to a structure object and then dereferencing
it:
.Bd -literal
	struct gizmo {
		int x, y, z;
	};
	struct gizmo null_gizmo;
	struct gizmo *the_gizmo = &null_gizmo;

	/* Producer */
	struct gizmo *g = kmem_alloc(sizeof(*g), KM_SLEEP);
	g->x = 12;
	g->y = 34;
	g->z = 56;
	atomic_store_release(&the_gizmo, g);

	/* Consumer */
	struct gizmo *g = atomic_load_consume(&the_gizmo);
	int y = g->y;
.Ed
.Pp
Here the
.Em address
of
.Li g->y
depends on the value of the pointer loaded from
.Li the_gizmo .
Using
.Fn atomic_load_consume
guarantees that we do not witness a stale cache for that address.
.Pp
In some cases it may be unclear.
For example:
.Bd -literal
	int x[2];
	bool b;

	/* Producer */
	x[0] = 42;
	atomic_store_release(&b, 0);

	/* Consumer 1 */
	int y = atomic_load_???(&b) ? x[0] : x[1];

	/* Consumer 2 */
	int y = x[atomic_load_???(&b) ? 0 : 1];

	/* Consumer 3 */
	int y = x[atomic_load_???(&b) ^ 1];
.Ed
.Pp
Although the three consumers seem to be equivalent, by the letter of
C11 consumers\~1 and\~2 require
.Fn atomic_load_acquire
because the value determines the address of a subsequent load only via
control-flow decisions in the
.Li ?:
operator, whereas consumer\~3 can use
.Fn atomic_load_consume .
However, if you're not sure, you should err on the side of
.Fn atomic_load_acquire
until C11 implementations have ironed out the kinks in the semantics.
.Pp
On all CPUs other than DEC Alpha,
.Fn atomic_load_consume
is cheap \(em it is identical to
.Fn atomic_load_relaxed .
In contrast,
.Fn atomic_load_acquire
usually implies an expensive memory barrier.
.Ss SIZE AND ALIGNMENT
The pointer
.Fa p
must be aligned \(em that is, if the object it points to is
.\"
2\c
.ie t \s-2\v'-0.4m'n\v'+0.4m'\s+2
.el ^n
.\"
bytes long, then the low-order
.Ar n
bits of
.Fa p
must be zero.
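.Pp
For example, a 32-bit object must be 4-byte aligned, so the low-order
two bits of its address must be zero.
A diagnostic check along these lines (an illustrative sketch, not
something the macros require you to write) would be:
.Bd -literal
	uint32_t *p = ...;

	/* 4-byte object: low-order 2 bits of the address must be zero. */
	KASSERT(((uintptr_t)p & (sizeof(*p) - 1)) == 0);
	atomic_store_relaxed(p, 0);
.Ed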
.Pp
All
.Nx
ports support atomic loads and stores on units of data up to 32 bits.
Some ports additionally support atomic loads and stores on larger
quantities, like 64-bit quantities, if
.Dv __HAVE_ATOMIC64_LOADSTORE
is defined.
The macros are not allowed on larger quantities of data than the port
supports atomically; attempts to use them for such quantities should
result in a compile-time assertion failure.
.Pp
For example, as long as you use
.Fn atomic_store_*
to write a 32-bit quantity, you can safely use
.Fn atomic_load_relaxed
to optimistically read it outside a lock, but for a 64-bit quantity it
must be conditional on
.Dv __HAVE_ATOMIC64_LOADSTORE
\(em otherwise it will lead to compile-time errors on platforms without
64-bit atomic loads and stores:
.Bd -literal
	struct foo {
		kmutex_t	f_lock;
		uint32_t	f_refcnt;
		uint64_t	f_ticket;
	};

	if (atomic_load_relaxed(&foo->f_refcnt) == 0)
		return 123;
#ifdef __HAVE_ATOMIC64_LOADSTORE
	if (atomic_load_relaxed(&foo->f_ticket) == ticket)
		return 123;
#endif
	mutex_enter(&foo->f_lock);
	if (foo->f_refcnt == 0 || foo->f_ticket == ticket)
		ret = 123;
	...
#ifdef __HAVE_ATOMIC64_LOADSTORE
	atomic_store_relaxed(&foo->f_ticket, foo->f_ticket + 1);
#else
	foo->f_ticket++;
#endif
	...
	mutex_exit(&foo->f_lock);
.Ed
.Sh C11 COMPATIBILITY
These macros are meant to follow
.Tn C11
semantics, in terms of
.Li atomic_load_explicit()
and
.Li atomic_store_explicit()
with the appropriate memory order specifiers, and are meant to make
future adoption of the
.Tn C11
atomic API easier.
Eventually it may be mandatory to use the
.Tn C11
.Vt _Atomic
type qualifier or equivalent for the operands.
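.Pp
For reference, assuming operands declared with the
.Tn C11
.Vt _Atomic
qualifier, the intended correspondence is approximately:
.Bd -literal
	atomic_load_relaxed(p)
	    atomic_load_explicit(p, memory_order_relaxed)
	atomic_load_acquire(p)
	    atomic_load_explicit(p, memory_order_acquire)
	atomic_load_consume(p)
	    atomic_load_explicit(p, memory_order_consume)
	atomic_store_relaxed(p, v)
	    atomic_store_explicit(p, v, memory_order_relaxed)
	atomic_store_release(p, v)
	    atomic_store_explicit(p, v, memory_order_release)
.Ed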
.Sh LINUX ANALOGUES
The Linux kernel provides two macros
.Li READ_ONCE(x)
and
.Li WRITE_ONCE(x,\ v)
which are similar to
.Li atomic_load_consume(&x)
and
.Li atomic_store_relaxed(&x,\ v) ,
respectively.
However, while Linux's
.Li READ_ONCE
and
.Li WRITE_ONCE
prevent fusing, they may in some cases be torn \(em and therefore fail
to guarantee atomicity \(em because:
.Bl -bullet
.It
They do not require the address
.Li "&x"
to be aligned.
.It
They do not require
.Li sizeof(x)
to be at most the largest size of available atomic loads and stores on
the host architecture.
.El
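.Pp
Subject to those caveats, the rough correspondence is:
.Bd -literal
	READ_ONCE(x)		atomic_load_consume(&x)
	WRITE_ONCE(x, v)	atomic_store_relaxed(&x, v)
.Ed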
.Sh MEMORY BARRIERS AND ATOMIC READ/MODIFY/WRITE
The atomic read/modify/write operations in
.Xr atomic_ops 3
have relaxed ordering by default, but can be combined with the memory
barriers in
.Xr membar_ops 3
for the same effect as an acquire operation and a release operation for
the purposes of pairing with
.Fn atomic_store_release
and
.Fn atomic_load_acquire
or
.Fn atomic_load_consume .
If
.Li atomic_r/m/w()
is an atomic read/modify/write operation in
.Xr atomic_ops 3 ,
then
.Bd -literal
	membar_exit();
	atomic_r/m/w(obj, ...);
.Ed
.Pp
functions like a release operation on
.Va obj ,
and
.Bd -literal
	atomic_r/m/w(obj, ...);
	membar_enter();
.Ed
.Pp
functions like an acquire operation on
.Va obj .
.Pp
.Sy WARNING :
The combination of
.Fn atomic_load_relaxed
and
.Xr membar_enter 3
.Em does not
make an acquire operation; only read/modify/write atomics may be
combined with
.Xr membar_enter 3
this way.
.Pp
On architectures where
.Dv __HAVE_ATOMIC_AS_MEMBAR
is defined, all the
.Xr atomic_ops 3
imply release and acquire operations, so the
.Xr membar_enter 3
and
.Xr membar_exit 3
are redundant.
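.Pp
For example, here is a sketch of the usual reference-counting idiom
built from these pieces (the
.Vt struct obj
and
.Fn obj_destroy
are hypothetical): dropping a reference releases the object, and the
thread that drops the last reference acquires it before tearing it
down.
.Bd -literal
	void
	obj_rele(struct obj *o)
	{
		/* Release: order our prior uses of *o before the decrement. */
		membar_exit();
		if (atomic_dec_uint_nv(&o->o_refcnt) != 0)
			return;
		/* Acquire: order the decrement before freeing *o. */
		membar_enter();
		obj_destroy(o);
	}
.Ed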
.Sh EXAMPLES
Maintaining lossy counters.
These may lose some counts, because the read/modify/write cycle as a
whole is not atomic.
But this guarantees that the count will increase by at most one each
time.
In contrast, without atomic operations, in principle a write to a
32-bit counter might be torn into multiple smaller stores, which could
appear to happen out of order from another CPU's perspective, leading
to nonsensical counter readouts.
(For frequent events, consider using per-CPU counters instead in
practice.)
.Bd -literal
	unsigned count;

	void
	record_event(void)
	{
		atomic_store_relaxed(&count,
		    1 + atomic_load_relaxed(&count));
	}

	unsigned
	read_event_count(void)
	{
		return atomic_load_relaxed(&count);
	}
.Ed
.Pp
Initialization barrier.
.Bd -literal
	int ready;
	struct data d;

	void
	setup_and_notify(void)
	{
		setup_data(&d.things);
		atomic_store_release(&ready, 1);
	}

	void
	try_if_ready(void)
	{
		if (atomic_load_acquire(&ready))
			do_stuff(d.things);
	}
.Ed
.Pp
Publishing a pointer to the current snapshot of data.
(Caller must arrange that only one call to
.Li take_snapshot()
happens at any
given time; generally this should be done in coordination with
.Xr pserialize 9
or similar to enable resource reclamation.)
.Bd -literal
	struct data *current_d;

	void
	take_snapshot(void)
	{
		struct data *d = kmem_alloc(sizeof(*d), KM_SLEEP);

		d->things = ...;

		atomic_store_release(&current_d, d);
	}

	struct data *
	get_snapshot(void)
	{
		return atomic_load_consume(&current_d);
	}
.Ed
.Sh CODE REFERENCES
.Pa sys/sys/atomic.h
.Sh SEE ALSO
.Xr atomic_ops 3 ,
.Xr membar_ops 3 ,
.Xr pserialize 9
.Sh HISTORY
These atomic operations first appeared in
.Nx 9.0 .
.Sh CAVEATS
C11 formally specifies that all subexpressions, except the left
operands of the
.Ql && ,
.Ql || ,
.Ql ?: ,
and
.Ql \&,
operators and the
.Li kill_dependency()
macro, carry dependencies for which
.Dv memory_order_consume
guarantees ordering, but most or all implementations to date simply
treat
.Dv memory_order_consume
as
.Dv memory_order_acquire
and do not take advantage of data dependencies to elide costly memory
barriers or load-acquire CPU instructions.
.Pp
Instead, we implement
.Fn atomic_load_consume
as
.Fn atomic_load_relaxed
followed by
.Xr membar_datadep_consumer 3 ,
which is equivalent to
.Xr membar_consumer 3
on DEC Alpha and
.Xr __insn_barrier 3
elsewhere.
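.Pp
That is, conceptually (an illustrative expansion, not the literal
implementation):
.Bd -literal
	/* atomic_load_consume(p), conceptually */
	v = atomic_load_relaxed(p);
	membar_datadep_consumer();
.Ed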
.Sh BUGS
Some idiot decided to call it
.Em tearing ,
depriving us of the opportunity to say that atomic operations prevent
fusion and
.Em fission .