/*
 * CDDL HEADER START
 *
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License, Version 1.0 only
 * (the "License").  You may not use this file except in compliance
 * with the License.
 *
 * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
 * or http://www.opensolaris.org/os/licensing.
 * See the License for the specific language governing permissions
 * and limitations under the License.
 *
 * When distributing Covered Code, include this CDDL HEADER in each
 * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]
 *
 * CDDL HEADER END
 */
/*
 * Copyright 2004 Sun Microsystems, Inc.  All rights reserved.
 * Use is subject to license terms.
 */

#pragma ident	"%Z%%M%	%I%	%E% SMI"

/*
 * Big Theory Statement for mutual exclusion locking primitives.
 *
 * A mutex serializes multiple threads so that only one thread
 * (the "owner" of the mutex) is active at a time.  See mutex(9F)
 * for a full description of the interfaces and programming model.
 * The rest of this comment describes the implementation.
 *
 * Mutexes come in two flavors: adaptive and spin.  mutex_init(9F)
 * determines the type based solely on the iblock cookie (PIL) argument.
 * PIL > LOCK_LEVEL implies a spin lock; everything else is adaptive.
 *
 * Spin mutexes block interrupts and spin until the lock becomes available.
 * A thread may not sleep, or call any function that might sleep, while
 * holding a spin mutex.  With few exceptions, spin mutexes should only
 * be used to synchronize with interrupt handlers.
 *
 * Adaptive mutexes (the default type) spin if the owner is running on
 * another CPU and block otherwise.  This policy is based on the assumption
 * that mutex hold times are typically short enough that the time spent
 * spinning is less than the time it takes to block.  If you need mutual
 * exclusion semantics with long hold times, consider an rwlock(9F) as
 * RW_WRITER.  Better still, reconsider the algorithm: if it requires
 * mutual exclusion for long periods of time, it's probably not scalable.
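 *
 * For example (an illustrative sketch; see mutex_init(9F) for the
 * authoritative usage -- 'xsp', 'xs_lock', 'xs_hi_lock' and 'xs_hi_ibc'
 * are made-up names), a driver lock initialized with a NULL interrupt
 * cookie, or with the cookie of a low-level interrupt, becomes adaptive,
 * while one initialized with the iblock cookie of a high-level
 * (PIL > LOCK_LEVEL) interrupt becomes a spin lock:
 *
 *      mutex_init(&xsp->xs_lock, NULL, MUTEX_DRIVER, NULL);
 *      mutex_init(&xsp->xs_hi_lock, NULL, MUTEX_DRIVER, xs_hi_ibc);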
 *
 * Adaptive mutexes are overwhelmingly more common than spin mutexes,
 * so mutex_enter() assumes that the lock is adaptive.  We get away
 * with this by structuring mutexes so that an attempt to acquire a
 * spin mutex as adaptive always fails.  When mutex_enter() fails
 * it punts to mutex_vector_enter(), which does all the hard stuff.
 *
 * mutex_vector_enter() first checks the type.  If it's a spin mutex,
 * we just call lock_set_spl() and return.  If it's an adaptive mutex,
 * we check to see what the owner is doing.  If the owner is running,
 * we spin until the lock becomes available; if not, we mark the lock
 * as having waiters and block.
 *
 * Blocking on a mutex is a surprisingly delicate dance because, for speed,
 * mutex_exit() doesn't use an atomic instruction.  Thus we have to work
 * a little harder in the (rarely-executed) blocking path to make sure
 * we don't block on a mutex that's just been released -- otherwise we
 * might never be woken up.
 *
 * The logic for synchronizing mutex_vector_enter() with mutex_exit()
 * in the face of preemption and relaxed memory ordering is as follows:
 *
 * (1) Preemption in the middle of mutex_exit() must cause mutex_exit()
 *     to restart.  Each platform must enforce this by checking the
 *     interrupted PC in the interrupt handler (or on return from trap --
 *     whichever is more convenient for the platform).  If the PC
 *     lies within the critical region of mutex_exit(), the interrupt
 *     handler must reset the PC back to the beginning of mutex_exit().
 *     The critical region consists of all instructions up to, but not
 *     including, the store that clears the lock (which, of course,
 *     must never be executed twice.)
 *
 *     This ensures that the owner will always check for waiters after
 *     resuming from a previous preemption.
 *
 * (2) A thread resuming in mutex_exit() does (at least) the following:
 *
 *      when resuming:  set CPU_THREAD = owner
 *                      membar #StoreLoad
 *
 *      in mutex_exit:  check waiters bit; do wakeup if set
 *                      membar #LoadStore|#StoreStore
 *                      clear owner
 *                      (at this point, other threads may or may not grab
 *                      the lock, and we may or may not reacquire it)
 *
 *      when blocking:  membar #StoreStore (due to disp_lock_enter())
 *                      set CPU_THREAD = (possibly) someone else
 *
 * (3) A thread blocking in mutex_vector_enter() does the following:
 *
 *      set waiters bit
 *      membar #StoreLoad (via membar_enter())
 *      check CPU_THREAD for each CPU; abort if owner running
 *      membar #LoadLoad (via membar_consumer())
 *      check owner and waiters bit; abort if either changed
 *      block
 *
 * Thus the global memory orderings for (2) and (3) are as follows:
 *
 * (2M) mutex_exit() memory order:
 *
 *                      STORE   CPU_THREAD = owner
 *                      LOAD    waiters bit
 *                      STORE   owner = NULL
 *                      STORE   CPU_THREAD = (possibly) someone else
 *
 * (3M) mutex_vector_enter() memory order:
 *
 *                      STORE   waiters bit = 1
 *                      LOAD    CPU_THREAD for each CPU
 *                      LOAD    owner and waiters bit
 *
 * It has been verified by exhaustive simulation that all possible global
 * memory orderings of (2M) interleaved with (3M) result in correct
 * behavior.  Moreover, these ordering constraints are minimal: changing
 * the ordering of anything in (2M) or (3M) breaks the algorithm, creating
 * windows for missed wakeups.  Note: the possibility that other threads
 * may grab the lock after the owner drops it can be factored out of the
 * memory ordering analysis because mutex_vector_enter() won't block
 * if the lock isn't still owned by the same thread.
 *
 * The only requirements of code outside the mutex implementation are
 * (1) mutex_exit() preemption fixup in interrupt handlers or trap return,
 * and (2) a membar #StoreLoad after setting CPU_THREAD in resume().
 * Note: idle threads cannot grab adaptive locks (since they cannot block),
 * so the membar may be safely omitted when resuming an idle thread.
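 *
 * To see why the #StoreLoad barriers in (2) and (3) matter, consider
 * the interleaving they forbid.  If the blocking thread's store of the
 * waiters bit could drift past its subsequent loads, it might read a
 * stale CPU_THREAD for every CPU (owner apparently not running), a
 * stale owner (apparently unchanged), and its own forwarded waiters
 * bit, and so decide to block.  Meanwhile the owner, resuming into
 * mutex_exit(), might load the waiters bit before that store became
 * visible, see it clear, skip the wakeup, and drop the lock -- a missed
 * wakeup.  With both barriers this becomes a Dekker-style handshake:
 * at least one side must observe the other's store, so either the
 * blocker sees the owner running and spins, or the owner sees the
 * waiters bit and does the wakeup.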
 *
 * When a mutex has waiters, mutex_vector_exit() has several options:
 *
 * (1) Choose a waiter and make that thread the owner before waking it;
 *     this is known as "direct handoff" of ownership.
 *
 * (2) Drop the lock and wake one waiter.
 *
 * (3) Drop the lock, clear the waiters bit, and wake all waiters.
 *
 * In many ways (1) is the cleanest solution, but if a lock is moderately
 * contended it defeats the adaptive spin logic.  If we make some other
 * thread the owner, but he's not ONPROC yet, then all other threads on
 * other cpus that try to get the lock will conclude that the owner is
 * blocked, so they'll block too.  And so on -- it escalates quickly,
 * with every thread taking the blocking path rather than the spin path.
 * Thus, direct handoff is *not* a good idea for adaptive mutexes.
 *
 * Option (2) is the next most natural-seeming option, but it has several
 * annoying properties.  If there's more than one waiter, we must preserve
 * the waiters bit on an unheld lock.  On cas-capable platforms, where
 * the waiters bit is part of the lock word, this means that both 0x0
 * and 0x1 represent unheld locks, so we have to cas against *both*.
 * Priority inheritance also gets more complicated, because a lock can
 * have waiters but no owner to whom priority can be willed.  So while
 * it is possible to make option (2) work, it's surprisingly vile.
 *
 * Option (3), the least-intuitive at first glance, is what we actually do.
 * It has the advantage that because you always wake all waiters, you
 * never have to preserve the waiters bit.  Waking all waiters seems like
 * begging for a thundering herd problem, but consider: under option (2),
 * every thread that grabs and drops the lock will wake one waiter -- so
 * if the lock is fairly active, all waiters will be awakened very quickly
 * anyway.  Moreover, this is how adaptive locks are *supposed* to work.
 * The blocking case is rare; the more common case (by 3-4 orders of
 * magnitude) is that one or more threads spin waiting to get the lock.
 * Only direct handoff can prevent the thundering herd problem, but as
 * mentioned earlier, that would tend to defeat the adaptive spin logic.
 * In practice, option (3) works well because the blocking case is rare.
 */

/*
 * delayed lock retry with exponential delay for spin locks
 *
 * It is noted above that for both the spin locks and the adaptive locks,
 * spinning is the dominant mode of operation.  So long as there is only
 * one thread waiting on a lock, the naive spin loop works very well in
 * cache-based architectures.  The lock data structure is pulled into the
 * cache of the processor with the waiting/spinning thread and no further
 * memory traffic is generated until the lock is released.  Unfortunately,
 * once two or more threads are waiting on a lock, the naive spin has
 * the property of generating maximum memory traffic from each spinning
 * thread as the spinning threads contend for the lock data structure.
 *
 * By executing a delay loop before retrying a lock, a waiting thread
 * can reduce its memory traffic by a large factor, depending on the
 * size of the delay loop.  A large delay loop greatly reduces the memory
 * traffic, but has the drawback of having a period of time when
 * no thread is attempting to gain the lock even though several threads
 * might be waiting.  A small delay loop has the drawback of not
 * much reduction in memory traffic, but reduces the potential idle time.
 * The theory of the exponential delay code is to start with a short
 * delay loop and double the waiting time on each iteration, up to
 * a preselected maximum.  BACKOFF_BASE provides the equivalent
 * of 2 to 3 memory references of delay for US-III+ and US-IV architectures.
 * BACKOFF_CAP is the equivalent of 50 to 100 memory references of
 * time (less than 12 microseconds for a 1000 MHz system).
 *
 * To determine appropriate BACKOFF_BASE and BACKOFF_CAP values,
 * studies on US-III+ and US-IV systems using 1 to 66 threads were
 * done.  A range of possible values was studied.
 * Performance differences below 10 threads were not large.  For
 * systems with more threads, substantial increases in total lock
 * throughput were observed with the given values.  For cases where
 * more than 20 threads were waiting on the same lock, lock throughput
 * increased by a factor of 5 or more using the backoff algorithm.
 */
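
/*
 * Illustrative sketch of the doubling described above: an isolated,
 * stand-alone model that is not used by the code below (the helper name
 * is made up).  With the values defined below (BACKOFF_BASE = 50,
 * BACKOFF_CAP = 1600), successive retries run the delay loop
 * 50, 100, 200, 400, 800, 1600, 1600, ... times; the cap is reached
 * after five doublings.
 */
static int
backoff_after_n_misses(int n)
{
        int backoff = 50;                       /* BACKOFF_BASE */

        while (n-- > 0) {
                backoff = backoff << 1;         /* double it */
                if (backoff > 1600)             /* BACKOFF_CAP */
                        backoff = 1600;
        }
        return (backoff);
}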

#include <sys/param.h>
#include <sys/time.h>
#include <sys/cpuvar.h>
#include <sys/thread.h>
#include <sys/debug.h>
#include <sys/cmn_err.h>
#include <sys/sobject.h>
#include <sys/turnstile.h>
#include <sys/systm.h>
#include <sys/mutex_impl.h>
#include <sys/spl.h>
#include <sys/lockstat.h>
#include <sys/atomic.h>
#include <sys/cpu.h>
#include <sys/stack.h>

#define	BACKOFF_BASE	50
#define	BACKOFF_CAP	1600

/*
 * The sobj_ops vector exports a set of functions needed when a thread
 * is asleep on a synchronization object of this type.
 */
static sobj_ops_t mutex_sobj_ops = {
        SOBJ_MUTEX, mutex_owner, turnstile_stay_asleep, turnstile_change_pri
};

/*
 * If the system panics on a mutex, save the address of the offending
 * mutex in panic_mutex_addr, and save the contents in panic_mutex.
 */
static mutex_impl_t panic_mutex;
static mutex_impl_t *panic_mutex_addr;

static void
mutex_panic(char *msg, mutex_impl_t *lp)
{
        if (panicstr)
                return;

        if (casptr(&panic_mutex_addr, NULL, lp) == NULL)
                panic_mutex = *lp;

        panic("%s, lp=%p owner=%p thread=%p",
            msg, lp, MUTEX_OWNER(&panic_mutex), curthread);
}

/*
 * mutex_vector_enter() is called from the assembly mutex_enter() routine
 * if the lock is held or is not of type MUTEX_ADAPTIVE.
 */
void
mutex_vector_enter(mutex_impl_t *lp)
{
        kthread_id_t    owner;
        hrtime_t        sleep_time = 0; /* how long we slept */
        uint_t          spin_count = 0; /* how many times we spun */
        cpu_t           *cpup, *last_cpu;
        extern cpu_t    *cpu_list;
        turnstile_t     *ts;
        volatile mutex_impl_t *vlp = (volatile mutex_impl_t *)lp;
        int             backoff;        /* current backoff */
        int             backctr;        /* ctr for backoff */

        ASSERT_STACK_ALIGNED();

        if (MUTEX_TYPE_SPIN(lp)) {
                lock_set_spl(&lp->m_spin.m_spinlock, lp->m_spin.m_minspl,
                    &lp->m_spin.m_oldspl);
                return;
        }

        if (!MUTEX_TYPE_ADAPTIVE(lp)) {
                mutex_panic("mutex_enter: bad mutex", lp);
                return;
        }

        /*
         * Adaptive mutexes must not be acquired from above LOCK_LEVEL.
         * We can migrate after loading CPU but before checking CPU_ON_INTR,
         * so we must verify by disabling preemption and loading CPU again.
         */
        cpup = CPU;
        if (CPU_ON_INTR(cpup) && !panicstr) {
                kpreempt_disable();
                if (CPU_ON_INTR(CPU))
                        mutex_panic("mutex_enter: adaptive at high PIL", lp);
                kpreempt_enable();
        }

        CPU_STATS_ADDQ(cpup, sys, mutex_adenters, 1);

        backoff = BACKOFF_BASE;
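
        /*
         * Each pass through the loop below delays with exponential
         * backoff, then either acquires the lock if it has become free,
         * keeps spinning if the owner is running on some CPU, or sets
         * the waiters bit and blocks on the turnstile.
         */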
        for (;;) {
spin:
                spin_count++;
                /*
                 * Add an exponential backoff delay before trying again
                 * to touch the mutex data structure.
                 * The spin_count test and call to nulldev are to prevent
                 * the compiler optimizer from eliminating the delay loop.
                 */
                for (backctr = backoff; backctr; backctr--) {   /* delay */
                        if (!spin_count) (void) nulldev();
                }
                backoff = backoff << 1;                 /* double it */
                if (backoff > BACKOFF_CAP) {
                        backoff = BACKOFF_CAP;
                }

                SMT_PAUSE();

                if (panicstr)
                        return;

                if ((owner = MUTEX_OWNER(vlp)) == NULL) {
                        if (mutex_adaptive_tryenter(lp))
                                break;
                        continue;
                }

                if (owner == curthread)
                        mutex_panic("recursive mutex_enter", lp);

                /*
                 * If lock is held but owner is not yet set, spin.
                 * (Only relevant for platforms that don't have cas.)
                 */
                if (owner == MUTEX_NO_OWNER)
                        continue;

                /*
                 * When searching the other CPUs, start with the one where
                 * we last saw the owner thread.  If owner is running, spin.
                 *
                 * We must disable preemption at this point to guarantee
                 * that the list doesn't change while we traverse it
                 * without the cpu_lock mutex.  While preemption is
                 * disabled, we must revalidate our cached cpu pointer.
                 */
                kpreempt_disable();
                if (cpup->cpu_next == NULL)
                        cpup = cpu_list;
                last_cpu = cpup;        /* mark end of search */
                do {
                        if (cpup->cpu_thread == owner) {
                                kpreempt_enable();
                                goto spin;
                        }
                } while ((cpup = cpup->cpu_next) != last_cpu);
                kpreempt_enable();

                /*
                 * The owner appears not to be running, so block.
                 * See the Big Theory Statement for memory ordering issues.
                 */
                ts = turnstile_lookup(lp);
                MUTEX_SET_WAITERS(lp);
                membar_enter();
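                /*
                 * The membar_enter() above supplies the membar #StoreLoad
                 * from step (3) of the Big Theory Statement: the waiters
                 * bit must be globally visible before the CPU_THREAD loads
                 * below.
                 */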

                /*
                 * Recheck whether owner is running after waiters bit hits
                 * global visibility (above).  If owner is running, spin.
                 *
                 * Since we are at ipl DISP_LEVEL, kernel preemption is
                 * disabled; however, we still need to revalidate our cached
                 * cpu pointer to make sure the cpu hasn't been deleted.
                 */
                if (cpup->cpu_next == NULL)
                        last_cpu = cpup = cpu_list;
                do {
                        if (cpup->cpu_thread == owner) {
                                turnstile_exit(lp);
                                goto spin;
                        }
                } while ((cpup = cpup->cpu_next) != last_cpu);
                membar_consumer();

                /*
                 * If owner and waiters bit are unchanged, block.
                 */
                if (MUTEX_OWNER(vlp) == owner && MUTEX_HAS_WAITERS(vlp)) {
                        sleep_time -= gethrtime();
                        (void) turnstile_block(ts, TS_WRITER_Q, lp,
                            &mutex_sobj_ops, NULL, NULL);
                        sleep_time += gethrtime();
                } else {
                        turnstile_exit(lp);
                }
        }

        ASSERT(MUTEX_OWNER(lp) == curthread);

        if (sleep_time == 0) {
                LOCKSTAT_RECORD(LS_MUTEX_ENTER_SPIN, lp, spin_count);
        } else {
                LOCKSTAT_RECORD(LS_MUTEX_ENTER_BLOCK, lp, sleep_time);
        }

        LOCKSTAT_RECORD0(LS_MUTEX_ENTER_ACQUIRE, lp);
}

/*
 * mutex_vector_tryenter() is called from the assembly mutex_tryenter()
 * routine if the lock is held or is not of type MUTEX_ADAPTIVE.
 */
int
mutex_vector_tryenter(mutex_impl_t *lp)
{
        int s;

        if (MUTEX_TYPE_ADAPTIVE(lp))
                return (0);             /* we already tried in assembly */

        if (!MUTEX_TYPE_SPIN(lp)) {
                mutex_panic("mutex_tryenter: bad mutex", lp);
                return (0);
        }

        s = splr(lp->m_spin.m_minspl);
        if (lock_try(&lp->m_spin.m_spinlock)) {
                lp->m_spin.m_oldspl = (ushort_t)s;
                return (1);
        }
        splx(s);
        return (0);
}
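
/*
 * Illustrative usage sketch (hypothetical caller; 'xsp' and 'xs_lock'
 * are made-up names): the common mutex_tryenter(9F) pattern whose miss
 * path ends up in mutex_vector_tryenter() above.
 *
 *      if (mutex_tryenter(&xsp->xs_lock)) {
 *              (short critical section that must not block on the lock)
 *              mutex_exit(&xsp->xs_lock);
 *      } else {
 *              (lock busy; back off or defer the work)
 *      }
 */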

/*
 * mutex_vector_exit() is called from mutex_exit() if the lock is not
 * adaptive, has waiters, or is not owned by the current thread (panic).
 */
void
mutex_vector_exit(mutex_impl_t *lp)
{
        turnstile_t *ts;

        if (MUTEX_TYPE_SPIN(lp)) {
                lock_clear_splx(&lp->m_spin.m_spinlock, lp->m_spin.m_oldspl);
                return;
        }

        if (MUTEX_OWNER(lp) != curthread) {
                mutex_panic("mutex_exit: not owner", lp);
                return;
        }

        ts = turnstile_lookup(lp);
        MUTEX_CLEAR_LOCK_AND_WAITERS(lp);
        if (ts == NULL)
                turnstile_exit(lp);
        else
                turnstile_wakeup(ts, TS_WRITER_Q, ts->ts_waiters, NULL);
        LOCKSTAT_RECORD0(LS_MUTEX_EXIT_RELEASE, lp);
}

int
mutex_owned(kmutex_t *mp)
{
        mutex_impl_t *lp = (mutex_impl_t *)mp;

        if (panicstr)
                return (1);

        if (MUTEX_TYPE_ADAPTIVE(lp))
                return (MUTEX_OWNER(lp) == curthread);
        return (LOCK_HELD(&lp->m_spin.m_spinlock));
}

kthread_t *
mutex_owner(kmutex_t *mp)
{
        mutex_impl_t *lp = (mutex_impl_t *)mp;
        kthread_id_t t;

        if (MUTEX_TYPE_ADAPTIVE(lp) && (t = MUTEX_OWNER(lp)) != MUTEX_NO_OWNER)
                return (t);
        return (NULL);
}
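
/*
 * Illustrative usage note (hypothetical caller; 'xsp' and 'xs_lock' are
 * made-up names): mutex_owned() is intended for ASSERTions about lock
 * state, e.g.
 *
 *      ASSERT(mutex_owned(&xsp->xs_lock));
 *
 * Note that it returns 1 unconditionally once the system has panicked,
 * and that for a spin mutex it reports only that the lock is held, not
 * that the caller holds it, so it is not a reliable run-time predicate.
 */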

/*
 * The iblock cookie 'ibc' is the spl level associated with the lock;
 * this alone determines whether the lock will be ADAPTIVE or SPIN.
 *
 * Adaptive mutexes created in zeroed memory do not need mutex_init();
 * allocation in this fashion guarantees their initialization, e.g.
 * adaptive mutexes created as statics within the BSS or allocated
 * by kmem_zalloc().
 */
/* ARGSUSED */
void
mutex_init(kmutex_t *mp, char *name, kmutex_type_t type, void *ibc)
{
        mutex_impl_t *lp = (mutex_impl_t *)mp;

        ASSERT(ibc < (void *)KERNELBASE);       /* see 1215173 */

        if ((intptr_t)ibc > ipltospl(LOCK_LEVEL) && ibc < (void *)KERNELBASE) {
                ASSERT(type != MUTEX_ADAPTIVE && type != MUTEX_DEFAULT);
                MUTEX_SET_TYPE(lp, MUTEX_SPIN);
                LOCK_INIT_CLEAR(&lp->m_spin.m_spinlock);
                LOCK_INIT_HELD(&lp->m_spin.m_dummylock);
                lp->m_spin.m_minspl = (int)(intptr_t)ibc;
        } else {
                ASSERT(type != MUTEX_SPIN);
                MUTEX_SET_TYPE(lp, MUTEX_ADAPTIVE);
                MUTEX_CLEAR_LOCK_AND_WAITERS(lp);
        }
}

void
mutex_destroy(kmutex_t *mp)
{
        mutex_impl_t *lp = (mutex_impl_t *)mp;

        if (lp->m_owner == 0 && !MUTEX_HAS_WAITERS(lp)) {
                MUTEX_DESTROY(lp);
        } else if (MUTEX_TYPE_SPIN(lp)) {
                LOCKSTAT_RECORD0(LS_MUTEX_DESTROY_RELEASE, lp);
                MUTEX_DESTROY(lp);
        } else if (MUTEX_TYPE_ADAPTIVE(lp)) {
                LOCKSTAT_RECORD0(LS_MUTEX_DESTROY_RELEASE, lp);
                if (MUTEX_OWNER(lp) != curthread)
                        mutex_panic("mutex_destroy: not owner", lp);
                if (MUTEX_HAS_WAITERS(lp)) {
                        turnstile_t *ts = turnstile_lookup(lp);
                        turnstile_exit(lp);
                        if (ts != NULL)
                                mutex_panic("mutex_destroy: has waiters", lp);
                }
                MUTEX_DESTROY(lp);
        } else {
                mutex_panic("mutex_destroy: bad mutex", lp);
        }
}

/*
 * Simple C support for the cases where spin locks miss on the first try.
 */
void
lock_set_spin(lock_t *lp)
{
        int spin_count = 1;
        int backoff;    /* current backoff */
        int backctr;    /* ctr for backoff */

        if (panicstr)
                return;

        if (ncpus == 1)
                panic("lock_set: %p lock held and only one CPU", lp);

        backoff = BACKOFF_BASE;
        while (LOCK_HELD(lp) || !lock_spin_try(lp)) {
                if (panicstr)
                        return;
                spin_count++;
                /*
                 * Add an exponential backoff delay before trying again
                 * to touch the lock data structure.
                 * The spin_count test and call to nulldev are to prevent
                 * the compiler optimizer from eliminating the delay loop.
                 */
                for (backctr = backoff; backctr; backctr--) {   /* delay */
                        if (!spin_count) (void) nulldev();
                }

                backoff = backoff << 1;                 /* double it */
                if (backoff > BACKOFF_CAP) {
                        backoff = BACKOFF_CAP;
                }
                SMT_PAUSE();
        }

        if (spin_count) {
                LOCKSTAT_RECORD(LS_LOCK_SET_SPIN, lp, spin_count);
        }

        LOCKSTAT_RECORD0(LS_LOCK_SET_ACQUIRE, lp);
}

void
lock_set_spl_spin(lock_t *lp, int new_pil, ushort_t *old_pil_addr, int old_pil)
{
        int spin_count = 1;
        int backoff;    /* current backoff */
        int backctr;    /* ctr for backoff */

        if (panicstr)
                return;

        if (ncpus == 1)
                panic("lock_set_spl: %p lock held and only one CPU", lp);

        ASSERT(new_pil > LOCK_LEVEL);

        backoff = BACKOFF_BASE;
        do {
                splx(old_pil);
                while (LOCK_HELD(lp)) {
                        if (panicstr) {
                                *old_pil_addr = (ushort_t)splr(new_pil);
                                return;
                        }
                        spin_count++;
                        /*
                         * Add an exponential backoff delay before trying
                         * again to touch the lock data structure.
                         * The spin_count test and call to nulldev are to
                         * prevent the compiler optimizer from eliminating
                         * the delay loop.
                         */
                        for (backctr = backoff; backctr; backctr--) {
                                if (!spin_count) (void) nulldev();
                        }
                        backoff = backoff << 1;         /* double it */
                        if (backoff > BACKOFF_CAP) {
                                backoff = BACKOFF_CAP;
                        }

                        SMT_PAUSE();
                }
                old_pil = splr(new_pil);
        } while (!lock_spin_try(lp));

        *old_pil_addr = (ushort_t)old_pil;

        if (spin_count) {
                LOCKSTAT_RECORD(LS_LOCK_SET_SPL_SPIN, lp, spin_count);
        }

        LOCKSTAT_RECORD(LS_LOCK_SET_SPL_ACQUIRE, lp, spin_count);
}