Breaking the O(n) Barrier for the Best-Fit Algorithm.

Dhruv Matani
[email protected]

What is the Best-fit Algorithm?

If you have n blocks of memory each not necessarily the same size, and not necessarily adjacently placed, then the task of finding that block of memory which fits most closely for a given request, is called the best-fit algorithm. So, by definition, the best-fit algorithm chooses the memory block which results in the least wastage of memory for that particular request. In other words, the algorithm can be described as that algorithm which chooses a block out of n blocks whose size is greater than or equal to the size requested, and there is no other block from the n blocks is smaller than the selected block. The algorithm can also be described as one which selects a block from the n given blocks such the difference between the size(S) of the selected block and the request size(R) is non-negative and is minimum.




How has this algorithm been implemented traditionally?

Consider you have a linear address space B-bytes long. Now, traditionally, we create a linked list of n blocks from this linear address space of B-bytes. Each of these n-blocks need not be same in size or placed adjacently. Thus, each of these n-blocks contain some book-keeping information such as the next block in the linked list, the previous block in the linked list, and the size of the block, etc...1

Let us for simplicity consider that the n-block are equal in size, and that there is no book-keeping overhead. The linked-list is an address ordered linked list, meaning that the the head of the linked list is placed at the lowest address, and the last node(tail) of the linked list will be placed at the highest address. Thus, the address of the next node after each node will be higher that the node itself or will be the address of the node itself. For example, Head->next >= Head is the invariant maintained, and by induction, the tail node has the greatest address.

The traditional approach consists of visiting each node in the linked-list of free nodes (called as a free-list, and referred to as FL in the rest of this document), and checking whether this node's size is the smallest size greater than or equal to the size requested.

Pseudo code for the above:

Best-Fit(request_size)
begin
node rover = head;
node min = rover;
// Assume that head is always a valid node. Also, head may be the only node in the FL, thus making:
// Head->next = tail.

do
{
diff = rover->size - request_size;
if (diff >= 0 && rover->size < min->size)
min = rover;
rover = rover->next;
} while (rover != tail);

if (min->size >= request_size)
// Block was found.
else
// Search was unsuccessful.
end

As you can clearly see, the algorithm complexity is to the order of O(n) where, n is the total number of blocks in the FL.

Is this complexity too much? Can the algorithm be implemented in a cheaper manner? Is the algorithm used sufficiently widely that an improvement will make any difference to the running times of any real application?

The answer to all the above questions would be a YES!


How can we go about reducing the complexity of the Best-fit algorithm?

Now, if instead of maintaining the Free-blocks as a linked-list, we maintain the same information in the form of a Balanced Binary Tree2, then we could probably do better. Lets see how:

We maintain the free nodes in the form of a balanced binary tree. The right children will be having sizes greater than the root, and the left children of the root will have sizes lesser than the size of the root node. All nodes of the same size will be maintained as a linked list with the head of that linked list associated directly with the balanced binary tree. We do this to basically group all nodes with the same size together so that they can be retrieved very quickly if we maintain some sort of cache. Also, this reduces the height of the balanced binary tree, which further reduces search times for the individual nodes having varying sizes.

So much for the setting up of the data structure!



How will this data structure help us reducing the search times?

Consider that the nodes corresponding to the free memory are arranged as a balanced binary tree as explained above. Now, finding the best-fit for the given request would be analogous to finding the node with the least size from the class/set of nodes having size greater than or equal to the request size. We may note that if we find a node of size equal to the request size, then we need not search any further because a node of a smaller size satisfying the above criterion is not possible once we have a node whose size is equal to the request size. The above problem can be solved by recursively reducing the set of all nodes that satisfy the criterion: "Have size greater than the request size". Once the above set has the size 1, there are no more sub-divisions possible, so we have arrived at the answer automatically!

Thus, even if we consider that no duplicate size nodes exist, the complexity of the algorithm reduces from O(n) to O(lg(n))!

The pseudo code for the same is given below:
The algorithm assumes that the root pointer points to the head of the balanced binary tree, and is never non-zero(ie is always a valid pointer).

Best-Fit(request_size)
begin
node rover = head;
node min = a node such that min->size == INFINITY;

while (rover != NULL)
{
if (request_size == rover_size)
{
min = rover;
goto do_return;
}
else
if (request_size < rover->size)
{
min = rover;
rover = rover->left;
}
else
{
rover = rover->right;
}
}

do_return:
if (min->size != INFINITY)
{
// 1. Remove single node from the linked list of nodes having the same size as min.
if (node->empty())
{
delete_node(node);
if (root == NULL)
root = new node.
}
// Return the removed node.
}
else
{
// Create a new node having the required size, and return that to the requester.
}
end




Other improvements to the above algorithm.

The algorithm discussed above can be improved in many ways with regards to:

1. Memory fragmentation.
2. Memory locality.

1. Memory Fragmentation:

A] External Fragmentation: The problem of external memory fragmentation arises when many small blocks of memory are carved out of a large block, and then are not coalesced together to re-form the larger block when possible. Consider that the user requests the allocator for many blocks of very small sizes, and then it gives them all back to the allocator. When a request for a large block arrives at a later stage, the allocator will not be able to serve that request without asking for more memory from the OS, even though it has that much memory free! This problem definitely needs to be solved, otherwise the OS will run out of memory some time or the other. What we can do to counter this is that we maintain a linked list of the free and used blocks along with maintaining the free blocks as a balanced binary tree. This means that the coalesceing operation can be performed in constant time, because an address ordered linked list of all the blocks is maintained. Also, this address ordered linked-list does not interfere with the operation of the balanced binary tree, so all the operations can be implemented with the required complexity.

B] Internal Fragmentation: The problem of internal fragmentation arises when the allocator gives the user a block of size greater than what the user asked for, thus resulting in a wastage of memory. Suppose the user asked for 10 bytes, and the balanced binary tree has only nodes of size 5, 6, 25, and 36, then the best-fit would be the node of size 25. Thus, 15 bytes would be wasted every time a request of size 10 bytes comes along. Consider that the user makes a 10 byte request 100 times during the life of the program. Thus wastage would cumulate up to 15*100 bytes, which is 1500 bytes. Almost 1.5kB! Which is again definitely not acceptable for long running programs. The manner in which we tackle this problem is the exact opposite manner in which we tackle external fragmentation. What we do here is that we split the best-fit node to fit the user's request size, and the re-insert the remainder of the node back into the tree. this means that in the current example, a new node of size 15 bytes would get added to the tree after the 25 byte node was split up into a 10 byte node and a 15 byte node. Thus, the next time a request for 10 bytes came along, the best fit would be the node consisting of 10 byte entries. This would again be split up into 10 bytes and 5 bytes. Thus, there is no memory wastage. Only if the program were to never use the 5 bytes chunks, would memory be wasted. but then again, the total wastage would only be 5 bytes for every 2 requests, which turns out to be 5*100/2 = 5*50 = 250 bytes, which is not even 1/4kB!

2. Memory Locality:

The problem of memory locality crops up when the allocator is used to allocate memory for linked-list nodes. The allocator should then as far as possible return memory in increasing memory order as opposed to any random address order. However, we know that the linked lists will mostly ask for nodes all of which have the same size in memory, so what we can do is that we maintain the linked list for each size sorted in address order. this would mean that the requests going out would also be in increasing address order, and hence the memory locality would be preserved. This task of maintaining the nodes in address sorted order however involves some sort of extra computation, and if the linked list for a particular size grows too big, then to maintain it in address sorted order would be quite expensive, and would probably outweigh the gains obtained from the improved algorithm. So, this trick should be used with utmost care. However, for the general case, we need not be too worried, because program behaviour is not absolutely erratic or random, and there is definitely some sort of periodicity and regularity with which certain operations are carried out, so such extreme cases would probably not arise in the common case.





Notes:

[1] The book-keeping information may vary from implementation to implementation depending upon what operations it wants to support and with what complexity.

[2] Balanced Binary Tree here refers to either a Red Black Tree implementation, or AVL Trees, or some other algorithm, which guarantees that the tree will not degenerate into a linked list, and will remain height balanced after a series of random insertions and deletions.





Hosted by www.Geocities.ws

1