Following are some observations about the BSD hp300 pmap module that
may prove useful for other pmap modules:

1. pmap_remove should be efficient with large, sparsely populated ranges.

   Profiling of exec/exit intensive workloads showed that much time was
   being spent in pmap_remove.  This was primarily due to calls from exec
   when deallocating the stack segment.  Since the current implementation
   of the stack is to "lazy allocate" the maximum possible stack size
   (typically 16-32mb) when the process is created, pmap_remove will be
   called with a large chunk of largely empty address space.  It is
   important that this routine be able to quickly skip over large chunks
   of allocated but unpopulated VA space.  The hp300 pmap module did check
   for unpopulated "segments" (which map 4mb chunks) and skipped them
   fairly efficiently, but once it found a valid segment descriptor (STE),
   it rather clumsily moved forward over the PTEs mapping that segment.
   Particularly bad was that for every PTE it would recheck that the STE
   was valid even though we should already know that.

   pmap_protect can benefit from similar optimizations, though it is
   (currently) not called with large regions.

   Another solution would be to change the way stack allocation is done
   (i.e. don't preallocate the entire address range), but I think it is
   important to be able to efficiently support such large, sparse ranges
   that might show up in other applications (e.g. a randomly accessed
   large mapped file).
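The skip logic described in item 1 can be sketched roughly as below.  This is a simplified toy, not the actual hp300 code: the names (toy_pmap_remove, SG_V, PG_V), the flat static tables, and the 4mb-segment/4k-page layout are illustrative assumptions.  The point is that the STE validity test is hoisted out of the inner loop, so an unpopulated 4mb chunk costs one check, and a populated one is walked without re-validating its STE per PTE.

```c
#include <stdint.h>

#define PAGE_SIZE    4096u
#define SEG_SIZE     (4u * 1024 * 1024)          /* one STE maps 4mb */
#define PTES_PER_SEG (SEG_SIZE / PAGE_SIZE)
#define NSEG         8                           /* toy 32mb address space */

#define SG_V 0x1                                 /* segment descriptor valid */
#define PG_V 0x1                                 /* page table entry valid */

static uint32_t ste[NSEG];                       /* toy segment table */
static uint32_t pte[NSEG][PTES_PER_SEG];         /* toy page tables */
static unsigned removed;                         /* pages actually removed */

/*
 * Remove all mappings in [sva, eva).  The STE is tested once per 4mb
 * segment: an invalid STE skips the whole chunk in one step, and a
 * valid one lets us walk its PTEs without rechecking the STE.
 */
static void
toy_pmap_remove(uint32_t sva, uint32_t eva)
{
	uint32_t va = sva;

	while (va < eva) {
		uint32_t seg = va / SEG_SIZE;
		uint32_t segend = (seg + 1) * SEG_SIZE;  /* end of segment */

		if (segend > eva)
			segend = eva;                    /* clip to range */

		if ((ste[seg] & SG_V) == 0) {
			va = segend;     /* unpopulated 4mb chunk: one step */
			continue;
		}
		for (; va < segend; va += PAGE_SIZE) {
			uint32_t *p = &pte[seg][(va % SEG_SIZE) / PAGE_SIZE];

			if (*p & PG_V) {
				*p = 0;  /* invalidate; real code would also
					    flush the TLB entry here */
				removed++;
			}
		}
	}
}
```

With this shape, deallocating a 32mb lazily allocated stack of which only a few pages were ever touched costs a handful of STE checks rather than thousands of per-page STE rechecks.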
2. Bit operations (i.e. ~, &, |) are more efficient than bitfields.

   This is a 68k/gcc issue, but it matters if you are trying to squeeze
   out maximum performance...

3. Don't flush TLB/caches for inactive mappings.

   On the hp300 the TLBs are either designed as, or used in such a way
   that, they are flushed on every context switch (i.e. there are no
   "process tags").  Hence, doing TLB flushes on mappings that aren't
   associated with either the kernel or the currently running process is
   a waste.  Seems pretty obvious, but I missed it for many years.  An
   analogous argument applies to flushing untagged virtually addressed
   caches (a la the 320/350).
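The check in item 3 can be sketched as below.  This is a hypothetical skeleton, not hp300 source: struct toy_pmap, active_pmap, and the counter are invented for illustration.  It shows the one guard that makes the flush conditional: on hardware with no process tags, where the whole TLB is flushed at every context switch anyway, only mappings that could currently be resident (kernel or the running process) need an explicit flush.

```c
#include <stddef.h>

/* Toy pmap: just distinct identities, enough to show the active check. */
struct toy_pmap { int dummy; };

static struct toy_pmap kernel_map, proc_a, proc_b;
static struct toy_pmap *kernel_pmap = &kernel_map;
static struct toy_pmap *active_pmap = &proc_a;  /* currently running process */

static int tlb_flushes;                         /* count flushes for the demo */

static void
tlb_flush(void)
{
	tlb_flushes++;          /* stand-in for the real TLB flush */
}

/*
 * Invalidate a mapping belonging to pm.  With untagged TLBs that are
 * flushed wholesale on every context switch, a flush is only needed
 * when the stale entry could actually be in the TLB now: i.e. the
 * mapping belongs to the kernel or to the currently running process.
 */
static void
toy_pmap_remove_page(struct toy_pmap *pm)
{
	/* ... clear the PTE here ... */
	if (pm == kernel_pmap || pm == active_pmap)
		tlb_flush();
	/* inactive pmap: skip the flush; the next context switch
	   flushes the whole TLB before that process runs again */
}
```

The same guard applies to flushing an untagged virtually addressed cache: if the mapping's process isn't running and the cache is purged on context switch, the flush buys nothing.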