.\" $OpenBSD: pctr.4,v 1.7 2013/07/16 16:05:49 schwarze Exp $
.\"
.\" Pentium performance counter driver for OpenBSD.
.\" Copyright 1996 David Mazieres <dm@lcs.mit.edu>.
.\"
.\" Modification and redistribution in source and binary forms is
.\" permitted provided that due credit is given to the author and the
.\" OpenBSD project by leaving this copyright notice intact.
.\"
.Dd $Mdocdate: July 16 2013 $
.Dt PCTR 4 amd64
.Os
.Sh NAME
.Nm pctr
.Nd driver for CPU performance counters
.Sh SYNOPSIS
.Cd "pseudo-device pctr 1"
.Sh DESCRIPTION
The
.Nm
device provides access to the performance counters on AMD and Intel brand
processors, and to the TSC on others.
.Pp
Intel processors have two 40-bit performance counters which can be
programmed to count events such as cache misses, branch target buffer hits,
TLB misses, dual-issues, interrupts, pipeline flushes, and more.
While AMD processors have four 48-bit counters, their precision is decreased
to 40 bits.
.Pp
There is one
.Em ioctl
call to read the status of all counters, and one
.Em ioctl
call to program the function of each counter.
Both calls require the following includes:
.Bd -literal -offset indent
#include <sys/types.h>
#include <machine/cpu.h>
#include <machine/pctr.h>
.Ed
.Pp
The current state of all counters can be read with the
.Dv PCIOCRD
.Em ioctl ,
which takes an argument of type
.Dv "struct pctrst" :
.Bd -literal -offset indent
#define PCTR_NUM	4
struct pctrst {
	u_int pctr_fn[PCTR_NUM];
	pctrval pctr_tsc;
	pctrval pctr_hwc[PCTR_NUM];
};
.Ed
.Pp
In this structure,
.Em pctr_fn
contains the functions of the counters, as previously set by the
.Dv PCIOCS0 ,
.Dv PCIOCS1 ,
.Dv PCIOCS2
and
.Dv PCIOCS3
ioctls (see below).
.Em pctr_hwc
contains the current values of the hardware counters.
.Em pctr_tsc
is a free-running, 64-bit cycle counter.
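.Pp
As an illustration of the
.Dv PCIOCRD
call, the following sketch takes one snapshot of all counters and prints it.
It is a minimal example under stated assumptions, not a reference
implementation: the function name
.Fn dump_counters
is hypothetical, the extra includes are the usual ones for the libc calls
used, and it assumes that a read-only descriptor on
.Pa /dev/pctr
is sufficient for reading.
.Bd -literal -offset indent
#include <sys/types.h>
#include <sys/ioctl.h>
#include <machine/cpu.h>
#include <machine/pctr.h>

#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: print the TSC and every counter once. */
void
dump_counters(void)
{
	struct pctrst st;
	int fd, i;

	/* Assumption: reading does not need a writeable descriptor. */
	if ((fd = open("/dev/pctr", O_RDONLY)) == -1)
		err(1, "/dev/pctr");
	if (ioctl(fd, PCIOCRD, &st) == -1)
		err(1, "PCIOCRD");

	printf("tsc %llu\en", (unsigned long long)st.pctr_tsc);
	for (i = 0; i < PCTR_NUM; i++)
		printf("ctr%d fn %#x value %llu\en", i, st.pctr_fn[i],
		    (unsigned long long)st.pctr_hwc[i]);
	close(fd);
}
.Ed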
.Pp
The functions of the counters can be programmed with the ioctls
.Dv PCIOCS0 ,
.Dv PCIOCS1 ,
.Dv PCIOCS2
and
.Dv PCIOCS3 ,
which require a writeable file descriptor and take an argument of type
.Dv "unsigned int" . \&
The meaning of this integer is dependent on the particular CPU.
.Ss Time stamp counter
The time stamp counter is available on most AMD and Intel CPUs.
It is set to zero at boot time, and then increments with each cycle.
Because the counter is 64 bits wide, it will not overflow in practice.
.Pp
The time stamp counter can be read directly from user-mode using
the
.Fn rdtsc
macro, which returns a 64-bit value of type
.Em pctrval .
The following example illustrates a simple use of
.Fn rdtsc
to measure the execution time of a hypothetical subroutine called
.Fn functionx :
.Bd -literal -offset indent
void
time_functionx(void)
{
	pctrval tsc;

	tsc = rdtsc();
	functionx();
	tsc = rdtsc() - tsc;
	printf("Functionx took %llu cycles.\en", tsc);
}
.Ed
.Pp
The value of the time stamp counter is also returned by the
.Dv PCIOCRD
.Em ioctl ,
so that one can get an exact timestamp on readings of the hardware
event counters.
.Pp
The performance counters can be read directly from user-mode without
invoking the kernel.
The macro
.Fn rdpmc ctr
takes 0, 1, 2 or 3 as an argument to specify a counter, and returns that
counter's 40-bit value (which will be of type
.Em pctrval ) .
This is generally preferable to making a system call as it introduces
less distortion in measurements.
.Pp
Counter functions supported by these CPUs contain several parts.
The most significant byte (an 8-bit integer shifted left by
.Dv PCTR_CM_SHIFT )
contains a
.Em "counter mask" .
If non-zero, this sets a threshold for the number of times an event
must occur in one cycle for the counter to be incremented.
The
.Em "counter mask"
can therefore be used to count cycles in which an event
occurs at least some number of times.
The next byte contains several flags:
.Bl -tag -width PCTR_EN
.It Dv PCTR_U
Enables counting of events that occur in user mode.
.It Dv PCTR_K
Enables counting of events that occur in kernel mode.
You must set at least one of
.Dv PCTR_K
and
.Dv PCTR_U
to count anything.
.It Dv PCTR_E
Counts edges rather than cycles.
For some functions this allows you
to get an estimate of the number of events rather than the number of
cycles occupied by those events.
.It Dv PCTR_EN
Enables counters.
This bit must be set in the function for counter 0
in order for either of the counters to be enabled.
This bit should probably be set in counter 1 as well.
.It Dv PCTR_I
Inverts the sense of the
.Em "counter mask" . \&
When this bit is set, the counter only increments on cycles in which
there are no
.Em more
events than specified in the
.Em "counter mask" .
.El
.Pp
The next byte (shifted left by
.Dv PCTR_UM_SHIFT )
contains flags specific to the event being counted, also known as the
.Em "unit mask" .
.Pp
For events dealing with the L2 cache, the following flags are valid
on Intel brand processors:
.Bl -tag -width PCTR_UM_M
.It Dv PCTR_UM_M
Count events involving modified cache coherency state lines.
.It Dv PCTR_UM_E
Count events involving exclusive cache coherency state lines.
.It Dv PCTR_UM_S
Count events involving shared cache coherency state lines.
.It Dv PCTR_UM_I
Count events involving invalid cache coherency state lines.
.El
.Pp
To measure all L2 cache activity, all these bits should be set.
They can be set with the macro
.Dv PCTR_UM_MESI ,
which contains the bitwise OR of all of the above.
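.Pp
As a sketch of how these parts combine, the function below builds a
counter function for an L2 cache event and installs it on counter 0.
The helper name
.Fn count_l2
and its
.Fa event
argument are hypothetical, and the event byte is CPU-specific and is not
chosen here.
The sketch also assumes that the flag and
.Dv PCTR_UM_*
macros are already positioned in their respective bytes, and that the
ioctl takes a pointer to the integer.
.Bd -literal -offset indent
#include <sys/types.h>
#include <sys/ioctl.h>
#include <machine/cpu.h>
#include <machine/pctr.h>

/* Hypothetical helper; fd must be /dev/pctr opened for writing. */
int
count_l2(int fd, unsigned int event)
{
	unsigned int fn;

	fn = event & 0xff;		/* event type, lowest byte */
	fn |= PCTR_UM_MESI;		/* assumed pre-shifted unit mask */
	fn |= PCTR_U | PCTR_K;		/* user and kernel mode */
	fn |= PCTR_EN;			/* enable counting */

	/* Returns -1 and sets errno on failure, as ioctl(2) does. */
	return (ioctl(fd, PCIOCS0, &fn));
}
.Ed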
.Pp
For event types dealing with bus transactions, there is another flag
that can be set in the
.Em "unit mask" :
.Bl -tag -width PCTR_UM_A
.It Dv PCTR_UM_A
Count all appropriate bus events, not just those initiated by the
processor.
.El
.Pp
Events marked
.Em (MESI)
require the
.Dv PCTR_UM_[MESI]
bits in the
.Em "unit mask" . \&
Events marked
.Em (A)
can take the
.Dv PCTR_UM_A
bit.
.Pp
Finally, the least significant byte of the counter function is the
event type to count.
A list of possible event functions can be obtained by running the
.Xr pctr 1
command with the
.Fl l
option.
.Sh FILES
.Bl -tag -width /dev/pctr -compact
.It Pa /dev/pctr
.El
.Sh ERRORS
.Bl -tag -width "[ENODEV]"
.It Bq Er ENODEV
An attempt was made to set the counter functions on a CPU that does
not support counters.
.It Bq Er EINVAL
An invalid counter function was provided as an argument to the
.Dv PCIOCSx
.Em ioctl .
.It Bq Er EPERM
An attempt was made to set the counter functions, but the device was
not open for writing.
.El
.Sh SEE ALSO
.Xr pctr 1 ,
.Xr ioctl 2
.Sh HISTORY
A
.Nm
device first appeared in
.Ox 2.0 .
Support for the amd64 architecture appeared in
.Ox 4.3 .
.Sh AUTHORS
.An -nosplit
The
.Nm
device was written by
.An David Mazieres Aq Mt dm@lcs.mit.edu .
Support for the amd64 architecture was written by
.An Mike Belopuhov Aq Mt mikeb@openbsd.org .
.Sh BUGS
Not all counter functions are completely accurate.
Some of the functions may not make any sense at all.
Also, be aware that an interrupt may occur between
invocations of
.Fn rdpmc
and/or
.Fn rdtsc ,
which can decrease the accuracy of measurements.