# Operating System – Page III

• Structure of the Page Table

The most common techniques for structuring the page table

• Hierarchical Paging

Most modern computer systems support a large logical-address space. In such an environment, the page table itself becomes excessively large. The one page table already isn’t enough for so large space.

One simple solution to this problem is to divide the page table into smaller pieces. There are several ways to accomplish this division.

One way is to use a two-level paging algorithm, in which the page table itself is also paged.

• forward-mapped page table

A page is divided into two page. Using a page to find other page. The address translation works from the outer page table inward.

The Pentium-II uses this architecture.

• section

The VAX architecture also support a variation of two-level paging. The VAX is 32-bit machine with page size of 512 bytes.

The logical-address of a process is divided into four equal sections, each of which consists of 230 bytes.

Each section represents a different part of the logical-address space of a process. The first 2 high-order bits of the logical address designate the appropriate section.

By partitioning the page table in this manner, the operating system can leave partitions unused until a process needs them.

Disadvantage

For 64-bit : For a system with a 64-bit logical-address space, a two-level paging scheme is no longer appropriate. The outer page will become large. The obvious method to avoid such a larger table is to divide the outer page table into smaller pieces. This approach is also used on some 32-bit processors for added flexibility and efficiency.

However, for 64-bit architectures, hierarchical page tables are generally considered inappropriate.

• Hashed Pag Tables

A common approach for handling address spaces larger than 32 bits is to use a hased page table, with the hash value being the virtual-page number. Each entry in the hash table contains a linked list of elements that hash to the same location (to handle collisions).

Each element consists of three fields:

1. the virtual page number
2. the value of the mapped page frame
3. a pointer to the next element in the linked list.

Algorithm

The virtual page number in the virtual address is hased into the hash table.

The virtual page number is compared to field(1) [the virtual page number] in the first element in the linked list.

If there is a match, the corresponding page frame( field(2) [the value of the mapped page frame] ) is used to form the desired physical address.

If there is no match, subsequent entries in the linked list are searched for a matching virtual page number.

Clustered page tables

A variation to this scheme that is favorable for 64-bit address spaces has been proposed.

Clustered page tables are similar to hased page tables except that each entry in the hash table refers to several pages (such as 16) rather than a single page. Therefore, a signle page-table entry can store the mappings for multiple physical-page frame.

Clustered page tables are particularly useful for sparse address spaces where memory reference are nocontiguous and scattered throughout the address space.

• Inverted Page Table
• Face the problem

The table is sorted by virtual address, the operating system is able to calculate where in the table the associated physical-address entry is, and to use that value directly.

One of the drawbacks of this method is that each page table may consist of millions of entries. These tables may consume large amounts of physical memory, which is required just to keep track of how the other physical memory is being used.

• Inverted Page Table

To solve this problem, we can use an inverted page table.

An inverted page table has one entry for each real page (or frame) of memory. Each entry consists of the virtual address of the page stored in that real memory location, with information about the process that owns that page. Thus, only one page table is in the system, and it has only one entry for each page of physical memory.

Because only one page table is in the system yet there are usually several different address spaces mapping physical memory, inverted page tables often require an address-space identifier stored in each entry of the page table. Storing the address-space identifier ensures the mapping of a logical page for a particular process to the corresponding physical page frame.

The system used inverted page tables include the 64-bit UltraSPARC and PowerPC

• Simple virtual memory address
<process-id, page-page, offset> Each inverted page-table entry is a pair <process-id, page-number> where the process-id assumes the role of the address-space identifier.

When a memory reference occurs, part of the virtual address, consisting of <process-id, page-number >, is presented to the memory subsystem.

The inverted page table is then searched for a match.
If a match is found – say, at entry i – then the physical address <i, offset> is generated.
If no match is found, then an illegal address access as been attempted.
• The problem

Although this scheme decreases the amount of memory needed to store each page table, it increases the amount of time needed to search the table when a page reference occurs.

Because the inverted page is sorted by a physical address, but lookups occur on virtual addresses, the whole table might need to be searched for a match. This search would take far too long.

• The improvement for this problem

To alleviate this problem, we use a hash table to limit the search to one – or at most a few – page-table entries.

Of course, each access to the hash table adds a memory reference to the procedure, so one virtual-memory reference require at least two real-memory reads:
one for the hash-table entry
one for the page table

To improve performance, recall that the TLB is search first, before the hash table is consulted.

• Shared Pages

The sharing common code is particularly important in a time-sharing environment.

If the code is reentrant code, it can be shared. Reentrant code (or pure code) is non-self-modifying code. If the code is reentrant, then it never changes during execution. Thus, two or more processes can execute the same code at the same time. Each process, has own copy of registers and data storage to hold the data for the process’ execution. The data for two different processes will, of course, vary for each process.

Other heavily used programs can also be shared – compilers, window system, run-time libraries, database system, and so on.

To be sharable, the code must be reentrant. The read-only nature of shared code should not be left to the correctness of the code; operating system should enforce this property.

This sharing of memory among processes on a system is similar to the sharing of the address space of a task by threads.

Some operating systems implement shared memory using shared pages.

Systems that use inverted page tables have difficulty implementing shared memory.
Shared memory is usually implemented as multiple virtual addresses (one for each process sharing the memory) that are mapped to one physical address.
This standard method cannot be used, however, as there is only one virtual page entry for every physical page, so one physical page cannot have two (or more) shared virtual addresses.

Organizing memory according to pages provides numerous other benefits in addition to allowing several processes to share the same physical pages.

# Operating System – Page II

• Protection

Memory protection in a paged environment is accomplished by protection bits that are associated with each frame.

• The one bit

Normally, these bits are kept in the page table. One bit can define a page to be read-write or read-only.

The protection bits can be checked to verify that no writes are being made to a read-only page. An attempt to write to a read-only page causes a hardware trap to the operating system (or memory-protection violation)

• Expand one bit

Providing a finer level of protection. We can create hardware to provide read-only, read-write, or execute-only protection. Or, by providing separate protection bits for each kind of access, we can allow any combination of these access; illegal attempts will be trapped to operating system.

• One more bits

One more bits is generally attached to each entry in the page table: a valid-invalid bit.

When this bit is set to “valid”, this value indicates that the associated page is in the process’ logical-address space, and is thus a legal (or valid) page.

If the bit is set to “invalid”, this value indicates that the page is not in the process’s logical-address space. Illegal address are trapped by using the valid-invalid bit.

The operating system sets this bit for each page to allow or disallow accesses to that page.

• page-table length register

Raely does a process use all its address range. In fact, many processes use only a small faction of the address space available to them. It would be wasteful in these causes to create a page with entries for every page in the address range. Most of this table would be unused, but would take up valuable memory space.

Some system provide hardware, in the form of a page-table length register (PTLR), to indicate the size of the page table. This value is checked against every logical address to verify that the address is in the valid range for the process. Failure of this test causes an error trap to the operating system.

# Operating System – Page I

The memory-management scheme that permits the physical-address space of a process to be noncontiguous. Paging avoids the considerable problem of fitting the varying-sized memory chunks onto the backing store.

The paging in its various forms is commonly used in most operating systems.

• Frames Vs. Paging
• Frame
Physical memory is broken into fixed-sized blocks called frame
• Pages
Logical memory is broken into blocks of the same size called pages

When a process is to executed, it’s pages are loaded into any available memory frames from the blocking store. The backing store is divided into fixed-sized blocks that are of the same size as the memory frames.

Paging itself is a form of dynamic relocation. Every logical address is bound by the paging hardware to some physical address. Using paging is similar to using a table of base (or relocation) register, one for each frame of memory.

• Page Table
Every address generated by the CPU is divided into two parts: a page number p and a page offset d
• page number

The page number is used as an index into a page table.

page table contains the base address of each page in physical memory. The base address is combined with the page offset to define the physical memory address that is sent to the memory unit.

• page size

The page size (like the frame size) is defined by the hardware. The size of a page is typically a power of 2, varying between 512 bytes and 16 MB per-page, depending on the computer architecture.The high-order m-n bits of a logical address designed the page number, and the n low-order bits designed the page offset.If process size is independent of page size, we expect internal fragmentation to average one-half page per process. This consideration suggests that small page sizes are desirable.However, overhead is involved in each page-table entry, and this overhead is reduced as the size of the pages increases. Also, disk I/O is more efficient when the number of data being transferred is larger.

Generally, page sizes have grown over time as processes, data sets, and main memory have become larger. Today pages typically are between 4KB and 8KB, and some systems support even larger page size.

Each page-table entry is usually 4 bytes long, but that size can vary as well.

overhead: Work or information that provides support – possible critical support – for a computing process but is not an intrinsic part of the operation or data. Overhead often adds to processing time but is generally necessary.

• Advantage and disadvantage
Advantage
no external fragmentation
When we use a paging scheme, we have no external fragmentation:
Any free frame can be allocated to a process that needs it.

Disadvantage
have internal fragmentation
We may have some internal fragmentation. Notice that frames are allocated as units. If the memory requirements of a process do not happen to fall on page boundaries, the last frame allocated may not be completely full.
In the worst case, a process would need n pages plus one byte. It would be allocated n+1 frames, resulting in an internal fragmentation of almost an entire frame ( extra need a frame, only add the 1 byte)

• The produce from the page to frame

When a process arrives in the system to be executed, its size, expressed in pages, is examined.Each page of the process needs one frame. Thus if the process require n pages, at least n frames must be available in memory. If n frames are available, they are allocated to this arriving process.The fist page of the process is loaded into one of the allocated frames, and the frame number is put in the page table for this process. The next page is loaded into another frame, and its frame number is put into the page table

• User aspect and operating system aspect

User aspect
The user program views that memory as one single contiguous space, containing only this one program.

System aspect
System view the actual physical memory. In fact, the user program is scattered throughout physical memory, which also holds other programs.

The difference between the user’s view of memory and the actual physical memory is reconciled by the address-translation hardware. The logical addresses are translated into physical address.
This mapping is hidden from the user and is controlled by the operating system.

The user process by definition is unable to access memory it does not own. It has no way of addressing memory outside of its page table, and the table includes only those pages that the process owns.

• The things managed by operating system
• Frame table
The frame table has one entry for each physical page frame, indicating whether the latter is free or allocated and, if it is allocated, to which page of which process or processes.
• Operation of the user process in user space
If a user makes a system call (to do I/O, for example) and provides an address as a parameter(a buffer for instance), that address must be mapped to produce the correct physical address.The operating system maintains a copy of the page table for each process, just as it maintains a copy of the instruction counter and register contents. This copy is used to translate logical address to physical addresses whenever the operating system must map a logical address to a physical address manually. It is also used by the CPU dispatcher to define the hardware page table when a process is to be allocated the CPU. Paging therefore increases the context-switch time.
• The hardware support in paging

Each operating system has its own methods for storing page tables.

A pointer to the page table is stored with the other register values (like the instruction counter) in the process control block. When the dispatcher is told to start a process, it must reload the user registers and define the correct hardware page-table values from the stored user page table.

• The different way storing the page-table

A set of dedicated register
A the simplest case, the page table is implemented as a set of dedicated registers.
These registers should be built with very high-speed logic to make the paging-address translation efficient. Every access to memory must go through the paging map, so efficiency is a major consideration.
The CPU dispatcher reload these registers, just as it reloads the other registers. Instructions to load or modify the page-table register are, of course, privileged, so that only the operating system can change the memory map.

This scheme only satisfy the the page table is small. When the page table to be very large (for example 1 million entries), the use of fast registers to implement the page table is not feasible.

A page-table base register (PTBR)
The page table is kept in main memory, and a page-table base register (PTBR) points to the page table. Changing page table requires changing only this one register, substantially reducing context-switch time.

The problem with this approach is the time required to access a user memory location.
First, finding the page-table with PTBR need to access a user memory location
Second, accessing the desired place in memory also need to access a user memory locations

Translation look-aside buffer (TLB)
To use a special, small, fast-lookup hardware cache, called translation look-aside buffer (TLB).
The TLB is associative, high-speed memory.
Each entry in the TLB consists of two parts: a key( or tag) and a value.
When the associative memory is presented with an item, it is compared with all keys simultaneously.

• If the item is found:
The corresponding value field is returned. The search is fast; the hardware, however, is expensive

Typically, the number of entries in a TLB is small, often numbering between 64 and 1,024

• The mechanism of the TLBThe TLB contains only a few of the page-table entries.
When a logical address is generated by the CPU, its page number is presented to the TLB.

If the page number is found
its frame number is immediately available and is used to access memory. The whole task is very fast.

If the page number is not in the TLB
known as a TLB miss
A memory reference to the page table must be made. When the frame number is obtained, we can use it to access memory. In addition, we add the page number and frame number to the TLB, so that they will be found quickly on the next reference.

If the TL is already full of entries
The operating system must select one for replacement.
Replacement policies ranger from leat recently used (LRU) to random.
Furthermore, some TLBs allow entries to be wired down, meaning that they cannot be removed from the TLB, TLB entries for kernel code are often wired down.

• The protection method of the TLB

support address-space identifiers (ASIDs)
Some TLBs store address-space identifiers (ASIDs) in each entry of the TLB. An ASID uniquely identifies each process and is used to provide address space protection for that process.
When the TLB attempts to resolve virtual page numbers, it ensures the ASID for the currently running process matches the ASID associated with the virtual page.
If the ASIDs do not match, they are treated as a TLB miss.
In addition to providing address-space protection, an ASID allows the TLB to contain entries for several different processes simultaneously.
not support ASIDs
every time a new page table is selected (for instace, each context switch), the TLB must be flushed (or erased) to ensure that the next executing process does not use the wrong translation information.
Otherwise, there could be old entries in the TLB that contain vaild virtual addresses but have incorrect or invalid physical addresses left over from the previous process.

The percentage of times that a particular page number is found in the TLB is called the hit ratio.

# Operating System MM IV

The memory is usually divided into two partition:
One for the resident operating system, and one for the user processes
We may place the operating system in either low memory or high memory. The major factor affecting this decision is the location of the interrupt vector. Since the interrupt vector is often in low memory, programmers usually place the operating system in low memory as well.

In the contiguous memory allocation,each process is contained in a single continuous section of memory.

• Memory Protection
protecting the operation system from the user processes and protecting the user processes from one another The relocation register is used for this protection. The relocation register contains the value of the smallest physical address; the limit register contains the range of logical address. With relocation and limit register, each logical address must be less than the limit register; the MMU maps the logical address dynamically by adding the value in the relocation register.The CPU can check the every address generated, so it can protect the operating system and the other users programs and data from being modified by this running process.

The relocation register scheme provides an effective way to allow the operating system size to change dynamically. Like the device drive, it is loaded to memory when it is needed.
Such code is sometimes called transient operating-system code; it comes and goes as needed.

• Memory Allocation
• The hole

All memory is available for user processes, and is consider as one large block of available memory, a hole.When a process arrives and needs memory, we search for a hole large enough for this process. If we find one, we allocate only as much memory as is needed, keeping the rest available to satisfy future requests.

• The allocation procedure from the process

As processes enter the system, they are put into an input queue. The operating system takes into account the memory requirements of each process and the amount of available memory space in determining which processes are allocated memory.When a process is allocated space, it is loaded into memory and it can then compete for the CPU.When a process terminates, it releases its memory, which the operating system may then fill with another process from the input queue.

• The allocation procedure from the input queue

At any given time, we have a list of available block sizes and the input queue. The operating system can order the input queue according to a scheduling algorithm.Memory is allocated to processes until, finally, the memory requirements of the next process cannot be satisfied; no available block of memory (or hole) is larger enough to hold process.The operating system can then wait until a large enough block is available, or it can skip down the input queue to see whether the smaller memory requirements of some other process can be met.

• The choice of hole during allocation

In general, a set of holes, of various sizes, is scattered throughout memory at any given time.When a process arrives and needs memory, the system searches this set of hole that is large enough for this process.If the hole is too large, it is split into two: One part is allocated to the arriving process; the other is returned to the set of holes.If the new hole is adjacent holes are merged to form one larger hole.After this point, the system may need to check whether these are processes waiting for memory and whether this newly freed and recombined memory could satisfy the demands of any these waiting processes.

• dynamic storage-allocation problem
How to satisfy a request of size n from a list of free holes.
The most common ways used to select a free hole from the set of available holes.
• First fit
Allocate the first hole that is big enough. first satisfy
• Best fit
Allocate the smallest hole that is big enough. available
• Worst fit
Allocate the largest hole. largest

first fit and best fit are better than worst fit in terms of decreasing both time and storage utilization.

Neither first fit nor best fit is clearly better in terms of storage utilization, but first fit is generally fast.

• external fragmentation

As processes are loaded and removed from memory, the free memory space is broken into little pieces.External fragmentation exists when enough total memory space exists to satisfy a request, but it is not contiguous; storage is fragmented into a large number of small holes.

The external fragmentation in first-fit and best-fit
The selection of the first-fit versus best-fit strategies can affect the amount of fragmentation. (First fit is better for some systems, whereas best fit is better for others)

Another factor is which end of a free block is allocated. (Which is the leftover piece – the one on the top, or the one on the bottom?)

No matter which algorithm is used, external fragmentation will be a problem.

50-percent rule Statistical analysis of first fit, given N allocated blocks, another 0.5N blocks will be lost due to fragmentation. That is one-third of memory be unusable!

One solution of the problem of external fragmentation is compaction. The goal is to shuffle the memory contents to place all free memory together in one large block.

If relocation is static and is done at assembly or load time, compaction cannot be done;
Compaction is possible only if relocation is dynamic, and is done at execution.

The simplest compaction algorithm can be expensive.

• internal fragmentation

The overhead to keep track of this hole will be substantially larger than the hole itself.The general approach is to break the physical memory into fixed-sized blocks, and allocate memory in unit of block sizes. With this approach, the memory allocated to a process may be slightly larger than the requested memory. The difference between these two numbers is internal fragmentation – memory that is internal to a partition but is not being used.

Another possible solution to the external-fragmentation problem is to permit the logical-address space of a process to be noncontiguous, thus allowing a process to be allocated physical memory wherever the latter is available.

Two complementary techniques achieve this solution: paging and segmentation. These thechniques can also be combined.

# Operating System (MM) III

How to execute for process in the memory. The process is round-robin and need execute by order. So the swapping is necessary.

• Swapping

A process, however, can be swapped temporarily out of memory to a backing store, and then brought back into memory for continuous.

The quantum must also be sufficiently large that reasonable amount of computing are done between swaps.

A variant of this swapping policy is used for priority-based scheduling algorithm. This variant of swapping is sometimes called roll out, roll in.

• Address Binding
Normally a process that is swapped out will be swapped back into the same memory space that it occupied previously.
• If binding is done at assembly or load time the process cannot be moved to different locations
• If execution-time binding is being used a process can be swapped into a different memory space, because the physical addresses are computed during execution time.
• Backing Store
The backing store is commonly a fast disk.It must be large enough to accommodate copies of all memory images for all users, and it must provide direct access to these memory images.The system maintains a ready queue consisting of all processes whose memory image are on the backing store or in memory and are ready to run.
• Whenever the CPU schedule decide to execute a process, it calls the dispatcher . The dispatcher checks to see whether the next process in the queue is in memory. If not, and there is no free memory region, the dispatcher swaps out a process currently in memory and swaps in the desired process. It the reloads registers as normal and transfers control to the selected process.
• Performance
• time
For efficient CPU utilization, we want our execution time for each process to be long relative to the swap time.
• The factors affected to swapping performance
• Memory Allocation

The major part of the swap time is transfer time. The total transfer time is directly proportional to the amount of memory swapped.Therefore, it would be useful to know exactly how much memory a user process is using, not simply how much it might be using. Then, we would need to swap only what is actually used, reducing swap time. For this method to be effective, the user must keep the system informed of any changes in memory requirements. Thus, a process with dynamic memory requirements will need to issue system call (request memory and release memory) to inform the operating system of its changing memory needs.

• Waiting I/O operation

Swapping is constrained by other factors as well. A process may be waiting for an I/O operation when we want to swap that process to free up its memory. However, if the I/O is asynchronously accessing the user memory for I/O buffers, then the process cannot be swapped. Assume that the I/O operation was queued because the device was busy.

The two main solution to this problem are never to swap a process with pending I/O or execute I/O operations only into operating-system buffers. Transfers between operating-system buffers and process memory then occur only when the process is swapped in.

• The modification of swapping
The modification of swapping is used in many versions of UNIX.Swapping was normally disabled, but would start if many processes were running and were using a threshold amount of memory. Swapping would again be halted if the load on the system were reduced.

# Operating System (MM) II

How to optimise the memory usage:
Dynamic Loading, Dynamic Linking, Shared Libraries, Overlays

• ### Dynamic Loading

To obtain better memory-space utilisation, we can use dynamic loading. With dynamic loading, a routines are kept on disk in a relocatable load format.
When a routine needs to call another routine, the calling routine first checks to see whether the other routine has been loaded. If not, the relocatable linking loader is called to load the desired routine into memory and to update the program’s address tables to reflect this change. Then, control is passed to the newly loaded routine.

Advantage: Although the total program size may be large, the portion that is used (and hence loaded) may be much smaller.

Not require special support from the operating system. It is the responsibility of the users to design their programs to take advantage of such a method.
The dynamic loading belongs to programmer. It can be controlled by user

• ### Dynamic Linking

Static Linking: the system language libraries are treated like any other object module and are combined by the loader into the library.

With dynamic linking, a stub is included in the image for each library-routine reference. This stub is a small piece of code that indicates how to locate the appropriate memory-resident library routine, or how to load the library if the routine is not already present.

When this stub is executed, it checks to see whether the needed routine is already in memory. If not, the program loads the routine. Either way, the stub replaces itself with the address of the routine, and executes the routine. Thus, the next time that code segment is reached, the library routine is executed directly, incurring no cost for dynamic linking.

Under this scheme, all processes that use a language library execute only copy of the library code.

Unlike dynamic loading, dynamic linking generally requires help from the operating system.

• ### Overlays

To enable a process to be larger than the amount of memory allocated to it, we can use overlays.

The idea of overlays is to keep in memory only those instructions and data that are needed at any given time. When other instructions are needed, they are loaded into space occupied previously by instructions that are no longer needed.

As in dynamic loading, overlays do not require any special support from the operating system. They can be implemented completely by the user with simple file structures, reading from the files into memory and then jumping to that memory and executing the newly read instructions.

The operating system notice only that there is more I/O than usual.

• The programmer, on the other hand, must design and program the overlay structure properly.
• This task can be a major undertaking, requiring complete knowledge of the structure of the program,its code, and its data structures.
• Because the program is, by definition, large – small program do not need to be overlaid – obtaining a sufficient understanding of the program may be difficult.
• The use of overlays is currently limited to microcomputer and other system that have limited amounts of physical memory and that lack hardware support for more advanced techniques.

# Operating System (MM) I

Ignore how a memory address is generated by program. Interesting in only the sequence of memory addresses generated by the running program

• Addressing Binding

The process must be executed in the memory, so memory manager is needed by every process

Input Queue: The collection of processes on the disk that is waiting to be brought into memory for execution forms the input queue

The normal procedure is to select one of the processes in the input-queue and to load that process into memory

Most system allow a user process to reside in any part of the physical memory. Address may be represented in different ways.

• Source code: symbolic address
• Complie: bind these symbolic address to relocatable addresses
• Linking or Loading: bind these relocatable addresses to absolute address

Each binding is a mapping from one address space to another

• Compile Time:

The literal text contained with address is defined in programming lanuage. The complie know this address. The absolute code can be generated. The code is recompiled when it is changed.

• Load Time:

In this step, the address already had been generated by the compiler. This address is called relocatable code. The binding is delayed until load time. The code only need be reloaded to incorporate the change value.

• Execution Time:

The binding address must be delayed until run time. Special hardware must be available for this scheme to work

• Logical Address space vs. Physical Address Space

The memory can be understaned as address and the space started in this address. The memory’s manipulation is also operating that space.

Logical-Address Space: an address generated by the CPU
Physical-Address Space: an address loaded into the memory-address register

• Compile-time, load-time:

address-binding methods generate identical logical and physical address.

• execution-time:

result in differing logical and physical address. logical address = virtual address

The set of all logical address generated by a program is a logical-address space.

The set of all physical address corresponding to these logical address is a physical-address space

MMU: The run-time mapping from virtual to physical addresses is done by a hardware device called the memory-management unit (MMU)

The user program never sees the real physical address. The user program supplies logical address; these logical address must be mapped to physical addresses before they are used.