Breaking the O(n) Barrier for the Best-Fit Algorithm.
What
is the Best-fit Algorithm?
If you have n blocks of memory each not necessarily the same size, and
not necessarily adjacently placed, then the task of finding that block
of memory which fits most closely for a given request, is called the
best-fit algorithm. So, by definition, the best-fit algorithm chooses
the memory block which results in the least wastage of memory for that
particular request. In other words, the algorithm can be described as
that algorithm which chooses a block out of n blocks whose size is
greater than or equal to the size requested, and there is no other block
from the n blocks is smaller than the selected block. The algorithm can
also be described as one which selects a block from the n given blocks
such the difference between the size(S) of the selected block and the
request size(R) is non-negative and is minimum.
How
has this algorithm been implemented traditionally?
Consider you have a linear address space B-bytes long. Now,
traditionally, we create a linked list of n blocks from this linear
address space of B-bytes. Each of these n-blocks need not be same in
size or placed adjacently. Thus, each of these n-blocks contain some
book-keeping information such as the next block in the linked list, the
previous block in the linked list, and the size of the block, etc...1
Let us for simplicity consider that the n-block are equal in size, and
that there is no book-keeping overhead. The linked-list is an address
ordered linked list, meaning that the the head of the linked list is
placed at the lowest address, and the last node(tail) of the linked list
will be placed at the highest address. Thus, the address of the next
node after each node will be higher that the node itself or will be the
address of the node itself. For example, Head->next >= Head is the
invariant maintained, and by induction, the tail node has the greatest
address.
The traditional approach consists of visiting each node in the
linked-list of free nodes (called as a free-list, and referred to as FL
in the rest of this document), and checking whether this node's size is
the smallest size greater than or equal to the size requested.
Pseudo code for the above:
Best-Fit(request_size)
begin
node
rover = head;
node
min = rover;
// Assume that head is always a valid
node. Also, head may be the only node in the FL, thus making:
// Head->next = tail.
do
{
diff = rover->size - request_size;
if (diff >= 0 &&
rover->size < min->size)
min = rover;
rover = rover->next;
} while
(rover != tail);
if
(min->size >= request_size)
// Block was found.
else
// Search was unsuccessful.
end
As you can clearly see, the algorithm complexity is to the order of
O(n) where, n is the total number of blocks in the FL.
Is this complexity too much? Can the algorithm be implemented in a
cheaper manner? Is the algorithm used sufficiently widely that an
improvement will make any difference to the running times of any real
application?
The answer to all the above questions would be a YES!
How
can we go about reducing the complexity of the Best-fit algorithm?
Now, if instead of maintaining the Free-blocks as a linked-list, we
maintain the same information in the form of a Balanced Binary Tree2, then we could probably do better.
Lets see how:
We maintain the free nodes in the form of a balanced binary tree. The
right children will be having sizes greater than the root, and the left
children of the root will have sizes lesser than the size of the root
node. All nodes of the same size will be maintained as a linked list
with the head of that linked list associated directly with the balanced
binary tree. We do this to basically group all nodes with the same size
together so that they can be retrieved very quickly if we maintain some
sort of cache. Also, this reduces the height of the balanced binary
tree, which further reduces search times for the individual nodes having
varying sizes.
So much for the setting up of the data structure!
How
will this data structure help us reducing the search times?
Consider that the nodes corresponding to the free memory are arranged
as a balanced binary tree as explained above. Now, finding the best-fit
for the given request would be analogous to finding the node with the
least size from the class/set of nodes having size greater than or equal
to the request size. We may note that if we find a node of size equal to
the request size, then we need not search any further because a node of
a smaller size satisfying the above criterion is not possible once we
have a node whose size is equal to the request size. The above problem
can be solved by recursively reducing the set of all nodes that satisfy
the criterion: "Have size greater than the request size". Once the above
set has the size 1, there are no more sub-divisions possible, so we have
arrived at the answer automatically!
Thus, even if we consider that no duplicate size nodes exist, the
complexity of the algorithm reduces from O(n) to O(lg(n))!
The pseudo code for the same is given below:
The algorithm assumes that the root pointer points to the head of the
balanced binary tree, and is never non-zero(ie is always a valid
pointer).
Best-Fit(request_size)
begin
node
rover = head;
node
min = a node such that min->size == INFINITY;
while
(rover != NULL)
{
if
(request_size == rover_size)
{
min = rover;
goto
do_return;
}
else
if
(request_size < rover->size)
{
min = rover;
rover = rover->left;
}
else
{
rover = rover->right;
}
}
do_return:
if
(min->size != INFINITY)
{
// 1. Remove single node from the
linked list of nodes having the same size as min.
if
(node->empty())
{
delete_node(node);
if (root == NULL)
root = new node.
}
// Return the removed node.
}
else
{
// Create a new node having the
required size, and return that to the requester.
}
end
Other
improvements to the above algorithm.
The algorithm discussed above can be improved in many ways with regards
to:
1. Memory fragmentation.
2. Memory locality.
1. Memory
Fragmentation:
A] External
Fragmentation: The problem of external memory fragmentation
arises when many small blocks of memory are carved out of a large block,
and then are not coalesced together to re-form the larger block when
possible. Consider that the user requests the allocator for many blocks
of very small sizes, and then it gives them all back to the allocator.
When a request for a large block arrives at a later stage, the allocator
will not be able to serve that request without asking for more memory
from the OS, even though it has that much memory free! This problem
definitely needs to be solved, otherwise the OS will run out of memory
some time or the other. What we can do to counter this is that we
maintain a linked list of the free and used blocks along with
maintaining the free blocks as a balanced binary tree. This means that
the coalesceing operation can be performed in constant time, because an
address ordered linked list of all the blocks is maintained. Also, this
address ordered linked-list does not interfere with the operation of the
balanced binary tree, so all the operations can be implemented with the
required complexity.
B] Internal
Fragmentation: The problem of internal fragmentation arises when
the allocator gives the user a block of size greater than what the user
asked for, thus resulting in a wastage of memory. Suppose the user asked
for 10 bytes, and the balanced binary tree has only nodes of size 5, 6,
25, and 36, then the best-fit would be the node of size 25. Thus, 15
bytes would be wasted every time a request of size 10 bytes comes along.
Consider that the user makes a 10 byte request 100 times during the life
of the program. Thus wastage would cumulate up to 15*100 bytes, which
is 1500 bytes. Almost 1.5kB! Which is again definitely not acceptable
for long running programs. The manner in which we tackle this problem
is the exact opposite manner in which we tackle external fragmentation.
What we do here is that we split the best-fit node to fit the user's
request size, and the re-insert the remainder of the node back into the
tree. this means that in the current example, a new node of size 15
bytes would get added to the tree after the 25 byte node was split up
into a 10 byte node and a 15 byte node. Thus, the next time a request
for 10 bytes came along, the best fit would be the node consisting of
10 byte entries. This would again be split up into 10 bytes and 5
bytes. Thus, there is no memory wastage. Only if the program were to
never use the 5 bytes chunks, would memory be wasted. but then again,
the total wastage would only be 5 bytes for every 2 requests, which
turns out to be 5*100/2 = 5*50 = 250 bytes, which is not even 1/4kB!
2. Memory
Locality:
The problem of memory locality crops up
when the allocator is used to allocate memory for linked-list nodes. The
allocator should then as far as possible return memory in increasing
memory order as opposed to any random address order. However, we know
that the linked lists will mostly ask for nodes all of which have the
same size in memory, so what we can do is that we maintain the linked
list for each size sorted in address order. this would mean that the
requests going out would also be in increasing address order, and hence
the memory locality would be preserved. This task of maintaining the
nodes in address sorted order however involves some sort of extra
computation, and if the linked list for a particular size grows too big,
then to maintain it in address sorted order would be quite expensive,
and would probably outweigh the gains obtained from the improved
algorithm. So, this trick should be used with utmost care. However, for
the general case, we need not be too worried, because program behaviour
is not absolutely erratic or random, and there is definitely some sort
of periodicity and regularity with which certain operations are carried
out, so such extreme cases would probably not arise in the common case.
Notes:
[1] The book-keeping information may vary from
implementation to implementation depending upon what operations it wants
to support and with what complexity.
[2] Balanced Binary Tree here refers to either a
Red Black Tree implementation, or AVL Trees, or some other algorithm,
which guarantees that the tree will not degenerate into a linked list,
and will remain height balanced after a series of random insertions and
deletions.