Following are some observations about the BSD hp300 pmap module that
may prove useful for other pmap modules:

1. pmap_remove should be efficient with large, sparsely populated ranges.

   Profiling of exec/exit-intensive workloads showed that much time was
   being spent in pmap_remove, primarily due to calls from exec when
   deallocating the stack segment.  Since the current implementation of
   the stack is to "lazy allocate" the maximum possible stack size
   (typically 16-32MB) when the process is created, pmap_remove will be
   called with a large chunk of mostly empty address space.  It is
   important that this routine be able to quickly skip over large chunks
   of allocated but unpopulated VA space.  The hp300 pmap module did
   check for unpopulated "segments" (which map 4MB chunks) and skipped
   them fairly efficiently, but once it found a valid segment descriptor
   (STE) it rather clumsily moved forward over the PTEs mapping that
   segment.  Particularly bad was that for every PTE it would recheck
   that the STE was valid, even though that was already established
   (see the sketch at the end of this item).

   pmap_protect can benefit from similar optimizations, though it is
   (currently) not called with large regions.

   Another solution would be to change the way stack allocation is done
   (i.e. don't preallocate the entire address range) but I think it is
   important to be able to efficiently support such large, sparse ranges
   that might show up in other applications (e.g. a randomly accessed
   large mapped file).
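
   For illustration, here is a minimal sketch of the loop structure
   described above.  All of the names used (sd_valid, pte_valid,
   pmap_ste, pmap_pte, pmap_remove_pte, NBSEG, NBPG) are hypothetical
   stand-ins, not the actual hp300 interfaces:

	/*
	 * Sketch only: check the STE once per 4MB segment, skipping
	 * unpopulated segments outright, and walk the PTEs of a valid
	 * segment without revalidating the STE on every iteration.
	 */
	void
	pmap_remove_sketch(pmap_t pmap, vm_offset_t sva, vm_offset_t eva)
	{
		vm_offset_t va, segeva;

		for (va = sva; va < eva; ) {
			if (!sd_valid(pmap_ste(pmap, va))) {
				/* whole segment unpopulated, skip it */
				va = (va + NBSEG) & ~(NBSEG - 1);
				continue;
			}
			/* walk PTEs to the end of this segment (or eva) */
			segeva = (va + NBSEG) & ~(NBSEG - 1);
			if (segeva > eva)
				segeva = eva;
			for (; va < segeva; va += NBPG)
				if (pte_valid(pmap_pte(pmap, va)))
					pmap_remove_pte(pmap, va);
		}
	}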

2. Bit operations (e.g. ~, &, |) are more efficient than bitfields.

   This is a 68k/gcc issue, but if you are trying to squeeze out maximum
   performance...  An illustrative comparison follows.
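
   For example (illustrative only; the field layout and PG_V value shown
   are made up, not the real hp300 PTE format):

	/* bitfield version */
	struct pte {
		unsigned int pg_pfnum:20,	/* page frame number */
			     pg_pad:10,		/* unused */
			     pg_prot:1,		/* write protect */
			     pg_v:1;		/* valid */
	};

	/* explicit-mask version */
	#define	PG_V	0x00000001		/* valid bit */

	/*
	 * gcc on the 68k generates noticeably better code for the
	 * mask forms:
	 *
	 *	if (*pte & PG_V)	vs.	if (p->pg_v)
	 *	*pte &= ~PG_V;		vs.	p->pg_v = 0;
	 */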

3. Don't flush TLB/caches for inactive mappings.

   On the hp300 the TLBs are either designed as, or used in such a way
   that, they are flushed on every context switch (i.e. there are no
   "process tags").  Hence, doing TLB flushes on mappings that aren't
   associated with either the kernel or the currently running process is
   a waste.  Seems pretty obvious, but I missed it for many years.  An
   analogous argument applies to flushing untagged virtually addressed
   caches (a la the 320/350).  A sketch of the check follows.
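
   A minimal sketch of the idea; pmap_is_current() is a hypothetical
   helper meaning "pmap belongs to the currently running process", and
   the flush primitive name is a stand-in as well:

	void
	pmap_flush_tlb_entry(pmap_t pmap, vm_offset_t va)
	{
		/*
		 * Only kernel mappings and the current process's
		 * mappings can be resident in an untagged TLB (or
		 * untagged VAC), so only those need flushing.
		 */
		if (pmap == kernel_pmap || pmap_is_current(pmap))
			flush_tlb_entry(va);	/* stand-in primitive */
	}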