FYBSc(IT)
Semester – II
Design
& Analysis of Algorithms.
Introduction
An algorithm can be defined a s a set of finite rules that gives a sequence of operations for solving a specific type of problem. An algorithm has five inportant features:
1)
Finiteness
: An algorithm must always terminate
after a finite number of steps.
2)
Definiteness
: Each step of an algorithm must be
precisely defined; the actions to be carried out must be rigorously and
unambiguously specified for each case.
3)
Input
: An algorithm has zero or more inputs. Quantities that are given to it
initially before the algorithm begins, or dynamically as the algorithm runs.
These inputs are taken from a specified set of objects.
4)
Output
: An algorithm has one or more outputs. Quantities that have a specified
relation to the inputs.
5)
Effectiveness
: An algorithm is also generally expected to be effective, in the sense that
its operations alone should be sufficient to work out the solution on a paper
or in any programming language.
A useful algorithm should require not only a finite
number of steps, but a reasonable number. One criteria of goodness is the
length of time taken to perform the algorithm; this can be expressed in terms
of the number of times each step is executed. Other criteria are the
adaptability of the algorithm to different kind of computers, its simplicity
and elegance.
The study of algorithms include many important &
active areas.
i)
How
to devise an algorithm.
ii)
How
to express an algorithm,
iii)
How
to validate an algorithm,
iv)
How
to analyze algorithm &
v)
How
to test a program.
An algorithm is devised in such a way that it should be
general solution to the problem under consideration. The algorithms can be
expressed in any structured programming languages such as Sparks, Pascal etc..
Once an algorithm is designed it is necessary to show that it computes the
correct answer for all possible legal inputs. This ensures that the algorithm
will work correctly independent of the issues concerning the programming
language it will be ultimately written in. Once the validation has been done
the algorithm has to be verified. Once it is executed, it makes the use of the
CPU to perform various operations and it also uses memory to store the program
and its data. Analysis of algorithm refers to the process of determining how
much computing time and storage an algorithm will require. When the algorithm
is analyzed and validated it needs to be tested by expressing it any
programming language. The testing of any algorithm is a two step process.
a)
Debugging
b)
Profiling.
Debugging refers to executing programs
on sample data sets and tracing faulty results. Profiling is the process of
executing correct program on data sets and measuring the time and space
required to compute the results.
Running time of a Program.
When solving a problem we frequently face a choice among
algorithms. On what basis should we choose? There aer two often contradictory
goals.
a)
Choose
an algorithm that is easy to understand, code and debug.
b)
Choose
an algorithm that makes efficient use of the computers resources, especially
that one which runs as fast as possible. As now-a-days memories have become
quite cheaper, we can overlook the space factor.
When we are writing a program to be used once or a few
times, goal (1) is most important. Here the cost of programmer’s time will
likely exceed by far the cost of running the program. When presented with a
problem whose solution is to be used many times, the cost of running the
program may far exceed the cost of writing it. Then it is sound to implement a
complicated algorithm, provided that the resulting program will run
significantly faster than the simpler program.
So, in building a
complex system it is often desirable to implement a simple prototype on which
measurements and simulation can be performed, before coming to the final
design. The programmers must not only be
aware of ways of making programs run fast, but must know when to apply
these techniques and when not to bother.
Measuring the running time of a Program.
The running time of a program depends on the factors such
as
a)
the
input to the program,
b)
the
quality of the code generated by the comiler used to create the object program,
c)
the
nature and speed of the instructions on the machine used to execute the program
and,
d)
the
time complexity of the algorithm underlying the program.
Often the running time depends not on the exact input but
the size of the input. A good example is the process of sorting algorithms. The
more the numbers are the more time it takes to produce the output. Then we can
talk of T(n), the running time of the program
with inputs of size n. Some program may have a running time T(n)=cn2
where c is a constant. Thus we can think of T(n) as being the
number of instructions executed on a ideal computer. In the worst case T(n)
will be maximum. So we consider Tavg (n), the average over
all inputs of size n. But in practice the average running time is harder
to determine because average input has no obvious mathematical meaning. So the
worst case running time is used as the principle measure of the time
complexity.
Data comes in all shapes and sizes, but often it can be organized in the same way. For example consider a list of things to do, a list of ingredients in a recipe, or a attendance list for a class. Although each contains a different type of data, they all contain data organized data in a similar way : a list. A list is one example of a data structure. There are many other common ways to organize data as stacks, queues, sets, trees, heaps etc.
To solve any problem we need to store data. Once a data structure has been chosen for a particular application , it is given life by the logical instructions that manipulate the related data items stored in it. Thus, a study of data structures is also necessarily a study of algorithms that control them.
Queues.
The distinguishing characteristics of a Queue is that it stores and retrieves data in a First In First Out(FIFO) manner. This means that the first element placed in the queue is the first to be removed. Insertions are limited at one end while deletions may occour only on the other end. A simple way to think of a queue is as a line at the bus-stop. As the line grows, newcommers join in at the end. When a bus arrives, the person at the head of the line boards the bus. In computing, to place an element at the end/tail of a queue, we enqueue it; to remove an element from the head we dequeqe it.
Queues can be implemented using arrays and using linked lists.The array representation of queues is discussed in detail.
Array Representation of Queues
Let us consider computer jobs being scheduled in a batch processing environment. We shall further suppose that all job names are integer values and that the jobs are scheduled strictly in the order in which they arrive. Then an array and two pointers can be used to implement the scheduling queue. Let the two pointers be Rear(top) and Front(bottom). The rear and front are initially set to 0 and 1 respectively.
Let us consider an empty queue as shown in fig 1(a). Recalling that the insertions may be made only at the rear (top) of the queue, suppose that job 4456 now arrives to be processed. The queue changes to look like fig 1(b)

Arraysize Arraysize
________ ________
3 ________ 3 ________
2 ________ 2 ________ Rear
= 1
![]()
1
Front =
1 1 4456
Rear = 0 Front
= 1
Fig 1(a) Fig 1(b)
1024
Arraysize 100 Rear
________
________
3 ___3740__
2 ___5688__
Front = 2
1 4456
Fig 1(c)
which shows that the addition of any ITEM to the queue requires two steps:
i) Increment Rear &
ii) Queue(Rear) := Item
If the system is ready to process the job 4456, the front entry must be removed from the queue to an appropriate location identified by ITEM and the Front be decreased by one i.e
i) Item := Queue(Front)
ii) Front := Front + 1
The algorithms that adds up an element and delets an element from the queue are as listed below.
Algorithm ADD( QUEUE, ARRAYSIZE, REAR, ITEM, FULL)
( * General procedure to add ITEM to REAR of QUEUE * )
VAR QUEUE(ARRAYSIZE), REAR, ITEM : INTEGER
VAR FULL : BOOLEAN
IF REAR = ARRAYSIZE THEN
FULL := TRUE
ELSE
FULL := FALSE
REAR := REAR + 1
QUEUE[REAR] := ITEM
ENDIF
RETURN
END ADD
Algorithm DELETE( QUEUE, ARRAYSIZE, FRONT, ITEM, EMPTY)
( * General procedure to Remove Front entry from QUEUE * )
VAR QUEUE(ARRAYSIZE), FRONT, ITEM : INTEGER
VAR EMPTY : BOOLEAN
IF REAR < FRONT THEN
EMPTY := TRUE
ELSE
EMPTY := FALSE
ITEM := QUEUE[FRONT]
FRONT := FRONT - 1
ENDIF
RETURN
END ADD
Linked List implementation of Queues
Defn : Linked lists is a structure that not only contains a data field but also one pointer to other nodes in the list.
Linked lists are useful in maintaining collections of data, similar to the ways that arrays are often used. However they are considerably more efficient in performing insertions and deletions. Linked lists also make use of dynamically allocated storage, which is storage allocated at runtime. Since in many applications the size of the data is not known at compile time.
![]()
DEF
ABC
![]()
![]()
![]()
![]()
![]()
![]()
![]()
Head
\ Null Fig 2
The linked implementation allows the queue to be completely dynamic with size restrictions imposed only by the pool of the available nodes. The queue is implemented as a linked list with an additional REAR pointer to the last node in the list so that the list need not to be traversed to find this node. A special FRONT pointer need not to be maintained, because LINK(HEAD) leads directly to the first item in the queue.
Summarising the conditional checks that signal a single, an empty, one-entry, or full queue.
Empty queue ŕ REAR = HEAD
One-entry-queue ŕ REAR = LINK(HEAD)
Full queue ŕ Depends upon memory space available
Priority Queues
Priority queues are used to set some priorities in processing elements of a queue. For example, at a university computer centre, students in introsuctory computer science courses may receive the highest priority for their jobs to encourage a quick turnaround. Students in upper division courses may have the next priority, whereas jobs related to faculty research,
which requires a great deal of computation, get the
lowest priority. These type of jobs could be called A, B, and C
respectively. Any A job is serviced before B or C, any B job is serviced
before any C job. A data structure capable of representing such a queue
would require just one FRONT pointer but three rear pointers – one for each
A, B, and C.
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
Stacks
The Distinguishing feature of a stack is that it can store and retrieve data in the last in first out, or LIFO manner. This means that the last element placed on a stack is the first to be removed. A convenient way to think of a stack is as a can of tennis balls. When we place balls in the can, the can is filled up from the bottom to the top. When we remove the balls, the can is emptied from the top to the bottom. In case we need a particular ball in the stach then we will have to remove all the balls above it. In computing, to place an element on the top of a stack, we push it; to remove an element from the top, we pop it.
Array implementation of a stack.
A strategy similar to that used for an array implementation of a queue can be followed. However, because insertions and deletions occour at the same end of a stack,only one pointer will be needed instead of two required for a queue. We will call the pointeras TOP
Thus to push an item into the stack, the operations will be
i) TOP := TOP + 1
ii) STACK(TOP) := ITEM
Popping an entry from the stack into ITEM requires
i) ITEM := STACK(TOP)
ii) TOP := TOP – 1
The complete algorithms to PUSH and POP an element in the array are as discussed below.
Algorithm PUSH(ITEM, STACK, ARRAYSIZE, TOP, FULL)
( * Push ITEM into Stack, FULL is a flag indicating whether *)
( * or not there is room on STACK to perform this operationn *)
VAR ITEM, TOP : Integer
VAR FULL : BOOLEAN
IF TOP = ARRAYSIZE THEN
FULL := TRUE
ELSE
FULL := FALSE
TOP := TOP + 1
STACK(TOP) := ITEM
ENDIF
RETURN
END PUSH
Algorithm POP(ITEM, STACK, ARRAYSIZE, TOP, EMPTY)
( * Push ITEM into Stack, FULL is a flag indicating whether *)
( * or not there is room on STACK to perform this operationn *)
VAR ITEM, TOP : Integer
VAR FULL : BOOLEAN
IF TOP = 0 THEN
EMPTY := TRUE
ELSE
EMPTY := FALSE
ITEM := STACK(TOP)
TOP := TOP - 1
ENDIF
RETURN
END PUSH
Linked Implementation of a Stack
When we use the link list implementation, we are using a relatively small amount of memory space needed to maintain pointers for the dynamic allocation of stack space. Following the usual convention of maintaining a dummy header node at the beginning of a linked list, a stack with two entries 65 & 43 would appear as shown in the fig 4 below.
43
65
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
Head
![]()
Top
\ Null
Fig 4 : Linked implementation of a stack with 2 data
nodes and a dummy header
The condition to test for an empty stack would be TOP = HEAD. A full stack occours only when the there is no memory left to create a node. The procedures to push and pop now become nothing more than inserts and deletes from the beginning of a linked list.
Trees
Most of us are famalier of ‘Family Trees’ or genealogical trees which is a simple way of expressing a hierarchical relationship between parents, children, brothers, sisters, cousins, aunts, uncles, and so on. The crutial relationship in such a tree is between parent and a child. Such a parent child relationship is an excellent example of a hierarchical relationship in which there exists a well-defined order of precedence between two items being related. A Tree is a data structure that represents hierarchical relationships between indivudial data items. The figure below is an a example of a tree

Fig 5. A tree
Every element of a tree is called as a node. In the above tree the element A is at the highest level ( 0 ) of the tree and is called its root (also called as the root node). The nodes A,B & C are the child nodes of the node P as they are directly connected to the root node. The node R` is the parent node of A, B & C. The child nodes of any given parents constitutes a set of siblings. Thus A, B, C & D,E are siblings. Thus A,B & C are at level 1. A link between a parent and a child is called a branch in a tree structure.
A subtree is a subset of a tree that itself a tree. In the above fig 5. The nodes C, D & E forms a subtree.
Now we can define a tree in a recursive fashion as a set of nodes that :
i) Is either empty or
ii) Has a designated node called the root from which descend zero or more subtrees. Each subtree itself satisfies the definition of a tree.
One of the most important of all tree structures is the binary tree. In a binary tree, each node has no more than two child nodes( or at the most two child nodes). In other words in a binary tree each node has two subtrees
( null or non-null ) known as left subtree and the right subtree.

Fig 5. A Binary tree
Tree structures arise quite naturally in both computer science and data processing applications. In data processing, file index schemes and hierarchical database management systems, such as Information Management System (IMS) developed by IBM, typically make use of trees. In computer science tree structures are in compilers for parsing algebric expressions. The hierarchy of algebric operations involved in mathematical expressions can be represented by the hierarchical nature of a tree.
Let us consider a algebric expression
(A – B) + C * (E / F)
The tree for the above expression is as shown in fig.6.

Fig .6
Traversing Trees
Traversing a tree means visiting each and every node of a tree starting from the root node. There are three types of tree traversing techniques for binary trees
· Preorder Traversal
· Inorder Traversal
· Postorder Traversal
Preorder Traversal
The steps involved in preorder traversal are as follows
i) Process the root node
ii) Process the left subtree recursively
iii) Process the right subtree recursively
The sequence of nodes visited in preorder traversal of the binary tree in fig 6 will be + - A B * C / E F
Inorder Traversal
The steps involved in preorder traversal are as follows
i) Process the left subtree recursively
ii) Process the root node
iii) Process the right subtree recursively
The sequence of nodes visited in inorder traversal of the binary tree in fig 6 will be A – B + C * E / F
Postorder Traversal
The steps involved in preorder traversal are as follows
i) Process the left subtree recursively
ii) Process the right subtree recursively
iii) Process the root node
The sequence of nodes visited in postorder traversal of the binary tree in fig 6 will be A B – C E F / * +
Linked representation of Binary tree
A binary tree is maintained in the memory by means of a linked representation which uses two pointer fields (one for each child) and one data field i.e LLINK, RLINK & INFO respectively. In the linked representation, insertions and deletions involve no data movement except the rearrangement of pointers.
Let T be a binary tree. First of all each node N of T will correspond to a location K such that
i) INFO[K] contains the data at node N
ii) LLINK[K] contains the location of the left child of node N
iii) RLINK[K] contains the location of the right child of node N
LLINK INFO RLINK
+
![]()
![]()

![]()
![]()

![]()
![]()
Fig. 7 A Linked representation of the binary tree
The ROOT will contain the location of the root of T. If any subtree is empty, then the corresponding pointer will contain null value; if the tree is empty, then the ROOT will contain the null value
Although for most purposes the linked representation of the binary tree is most efficient, it does have certain disadvantages
a) Wasted memory space in null pointers.
b) Given a node, it is difficult to determine its parent.
c) Its implementation algorithm is more difficult in languages that do not offer dynamic storage techniques; for example FORTRAN, COBOL, BASIC etc.
The drawback b) can be overcome by adding a parent pointer field to a node, provided you can afford using more memory.
In older languages such as BASIC and FORTRAN, it can be implemented using arrays for both pointers and data field.
Heaps
A heap is a tree, usually a binary tree, in which each child node has a smaller value than its parent. Thus the root node is the largest node in the tree. We may also choose to arrange a heap such that each child node has a largeer value than the parent. In this case, the root node is the smallest node. These like these are partially ordered because, although the nodes along every branch have a specific order to them, the nodes at one level aer not necessarily ordered with respect to the nodes at another level. A heap in
( a ) Heap
|
11 |
1 |
5 |
7 |
6 |
12 |
17 |
8 |
4 |
10 |
2 |
( b ) Heap represented in an array.
Fig .8 Heaps
which each child is smaller than its parent is top-heavy.
This is because the largest node is on the top. While the heap in which each
child is larger than its parent is bottom—heavy.
Heapsort
The heap sort is a sorting algorithm whose efficiency is roughly equivalient to that of the quick sort. Its average efficiency is O(n(log2n)).
Let us consider an array as shown fig.8(b). The first phase of the tree appears as shown in fig.8(b). The goal of the first phase is now to sort the data elements along each path from leaf node level to the root node. If we wish to sort in ascending orde, then the numbers along any path from leaf node to root should be in increasing order.
Phase I
1) Process the node which is the parent of the rightmost node on the lowest level. If its value is less than the value of its largest child, swap these values, otherwise do nothing.
2) Move left on the same level. Compare the value of the parent node with the value of the child nodes. If the parent is smaller than the largest child, swap them.
3) When the left end of this level is reached, move up a level, and, beginning with the rightmost parent node, repeat step ( 2 ). Continue swapping the original parent with the larger of its children until it is larger than its children.
4) Repeat step ( 3 ) until all level 1 nodes have been processed (Root is at level 0).
Phase II of the heap sort finds the node with the largest value in the tree and cuts it off from the tree. This is then repeated to find te second largest value, which is also removed from the tree. The process continues until only two nodes are left in the tree, which are exchanged if necessary. The precise steps are as follows
Phase II
1) Compare the root node with its children, swapping it with the largest child if the largest child is larger than the root.
2) If a swap occurred in step ( 1 ), then continue swapping the value which was originally in the root position until it is larger than its children. In effect, this original root node value is now being walked down a path in the tree to ensure that all paths retain values arranged in ascending order from leaf node to root node.
3) Swap the root node with the bottom rightmost child.
4) Repeat steps ( 1 ) through ( 3 ) until only two elements are left.
Graphs
In general a Graph is a structure that represents a less restrictive relationship between data items. A common example of a graph is a collection of cities and the roads between them; two cities are related if there is a route between them. Their relationship is not hierarchical because traffic on the routes between the cities can flow in either direction. Graphs are frequently used in areas such as transportation networks, electrical circuitry, artificial intelligence, chemical structure of crystals, analysis of programming languages etc.
A graph may be formally defined as a set of objects called nodes and edges. In this context a node is a data element of the graph; an edge is a path between two nodes.
Defn : A Graph consists of two sets V and E. The set V is a finite, nonempty set of vertices. The set E is a set of pairs of vertices; these pairs are called edges. The notations V(G) and E(G) represent the sets of vertices and edges respectively of graph G. A graph can also be represented as
G = (V,E) .
In an undirected graph, an edge between two nodes is not directionally oriented. Thus, in a undirected graph if AG (A to G) is an edge, so is GA.
In a directed graph or digraph, the edges between nodes are directionally oriented.


(a) G1 (b) G2
Fig .8 Graphs
In the abve fig (a) is an undirected graph and (b) is a directed graph.
The degree of a vertex is the number of its adjacent vertices. For a directed graph the indegree of a node is the number of edges coming into it, while the outdegree is the number of edges going out of that node.
The set of representations of the graph G1 is as follows
V(G1) = { M, P, N }
E(G1) = { (M,N), (M,P), (P,N) }
Representation of graphs
There are two standard ways of maintaining a graph G in the memory of a computer. One way called the sequential representation of G, is by means of its adjacency matrix. The other way, called the linked representation of G, is by means of linked lists.
Adjacency matrix.
Suppose G is a simple directed graph with m nodes, and suppose the nodes of G have been ordered are called v1, v2, ……, vm. Then the adjacency matrix A = (aij ) of the graph G is the m X n matrix defined as follows
Aij
= 1 if vi is adjacent
to vj, if there is an edge (vi, vj)
0 otherwise
Such a matrix A, which contains entries of only 0 and 1, is called a bit matrix or boolean matrix. A graph containing n nodes can be represented by a matrix containing n rows and n columns. The martix is formed by placing a 1 in the i th row and j th column ( the (i,j)th entry) of the matrix if there is an edge between node i and node j of the graph. The adjacency matrix A of the graph G does depend on the ordering of the nodes of G ; that is, a different ordering of the nodes may result in a different adjacency matrix.
Suppose G is an undirected graph, then the adjacency matrix A of G will be symmetric matrix i.e one in which aij = aji for every i and j.
The adjacency materices for the graphs G1 & G2 in fig. 8 are as follows
A B C
D E 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0
M P N A
M P N
|
0 |
1 |
B |
||
|
1 |
0 |
C |
||
|
1 |
1 |
D |
E
(a)
(b)
Fig.9 Adjacency matrix
Divide and Conquer
The divide and conquer strategy suggests splitting the inputs into k distinct subsets resulting k subproblems. These subproblems must be solved indivudially and a procedure should be used to combine the solutions into the solution of the initial problem. If the subproblems are relatively large, then reapply the divide and conquer strategy to further break them into subproblems. Often the subproblems resulting from the reapplication of the divide and conquer strtegy are of the same type as the original problem. This strategy is reapplied till the subproblems are small enough to be solved. The way the divide and conquer splits the subproblems can be solved by writing a control abstraction. ‘Control abstraction is a procedure whose flow of control is clear but whose primary operations are specified by some other procedures whose precise meanings are undefined.’
For ex. If P is a problem to be solved, we can determine whether the input size is small and that the answer can be computed without splitting, by invoking a procedure say SMALL(P). Otherwise P is splitted into subproblems P1, P2,……, Pk and solved by recursively applying divide and conquer to each of these subproblems.
Binary Search
Binary search is a sequential search technique and should be used when records are sorted without any consideration given to order. By paying what may initially seem like a small price, we can dramatically increase the efficiency of our search effort using a simple technique called the binary search. The price we must pay is
a) The list of keys must be maintained in physical order (ex. In an array in
which is a squencial block of memory)
b) The number of the keys must be maintained (i.e the length of the
array)
c) There must be a direct access to the relative keys in the last (i.e any
location in the array should be accessible directly)
Let ai, 1 < I < n be a list of elements which are stored in nondecreasing order. Consider the problem of determining whether a given element is in the list. Divide and conquer suggests breaking up any instance of this search into subinstances and search for the desired element in those subinstances. For.ex consider a list in the fig.10 below
In the fig.10 we
wish to locate the data associated
with key 7630. The strategy is to divide the list into two halves by taking
the 5th key as the mid element. Here
we consider that the 5th element is the mid. We now begin the
search from position 5 towards the right as the value being sought is
greater than the mid. We again reapply the divide and conquer strategy to
this subinstance from 5 to 8. Now the mid is computed as 6 as shown below.
|
|
2574 |
|
2 |
3685 |
|
3 |
4569 |
|
4 |
5124 |
|
5 |
6165 |
|
6 |
7630 |
|
7 |
8467 |
|
8 |
9020 |
Fig
10. List for binary search.
(6 + 8) / 2 = 6
A General procedure for the above problem is shown below
BIN_SEARCH (A, N, X, J)
Arguments : A is the array of elements
N number of elements in the array
X the element to be searched.
J the position of the element to be determined.
LOCAL : LOW, HIGH, MID
BEGIN
LOW := 1
HIGH := N
WHILE LOW < HIGH DO
MID := | (LOW + HIGH) / 2 |
IF (X < A[MID]) THEN
HIGH := MID – 1
ELSE IF (X > A[MID]) THEN
LOW := MID + 1
ELSE
RETURN MID
ENDIF
END WHILE
RETURN
END BIN_SEARCH
Now talking about the efficiency of this procedure let us consider an example
A : (1) (2) (3) (4) (5) (6) (7) (8) (9)
Elements : -15 -6 0 7 9 23 54 82 101
Comparisons : 3 2 3 3 1 3 2 3 4
No element requires more than 4 comparisons to be found. The average is obtained by summing the comparisons needed to find all nine items and dividing it by 9 resulting 25/9, or approx 2.77 comparisions per successful search on an average.
MergeSort
Another example of divide and conquer is the merge sort algorithm. The most advantageous thing about this algorithm is that it has the worst case complexity in the order of O(n log2 n). We shall assume that the elements are to be sorted in a non decreasing order. The general idea is as follows.
Given a sequence of n elements A(1), A(2),…….,A(n) the general idea is to imagine them split into two sets A(1), …., A(n/2) and A((n/2) +1),….,A(n).
Each sequence is indivudially sorted and the resulting sequences are merged to produce a single sorted sequence of n elements.
Let us consider a file containing the data
(2, 6, 3, 1, 4, 31, 23, 8, 11, 19, 21, 37, 14, 57, 28)
We divide it into two original files as
O1 = (2, 6, 3, 1, 4, 31, 23)
O2 = (8, 11, 19, 21, 37, 14, 57, 28)
After the first path we have segments of length 1, so we have
Y1 = {(2,8), (3,19), (4,37), (23,57)}
Y2 = {(6,11), (1,21), (14,31), 28}
After the 2nd pass we have segments of length 2
O1 = {(2,6,8,11), (4,14,31,37)}
O2 = {(1,3,19,21),(23,28,57)}
After 3rd pass we have segments of length 4
Y1 = {(1,2,3,6,8,11,19,21)}
Y2 = {(4,14,23,28,31,37,57)}
After the 4th pass we have segemnts we get a block of length 8
O1 = {(1,2,3,4,6,8,11,1419,21,23,28,31,37,57)}
O2 = EMPTY
The procedure was described beginning with the segments of length 1. Subsequently larger segments can be stored in main memory, so the efficiency of the algorithm can be enhanced by taking conveniently larger segemnts. The procedure of merge sirt usually deals with external file media and therefore is system dependent.
Selection Sort
The main idea behind selection sort is to find the smallest entry among in an array A( j ), A( j+1 ),…….,A( n ), and then interchange it with A( j ). This process is then repeated for each value of j.
The algorithm is as below
SELECTION_SORT(A,N)
Arguments : A the array of elements.
N number of elements in the array A..
BEGIN
LOCAL : I, J, K, N, ITEM
FOR I := 1 TO N-1 DO
K := I
ITEM : = A[ K ]
FOR J := I + 1 TO N DO
IF A[ J ] < ITEM THEN
ITEM := A[ J ]
K := J
ENDIF
END FOR
ITEM : = A[ I ]
A [ I ] := A [ K ]
A[ K ] := ITEM
END FOR
RETURN
END SELECTION_SORT
The above procedure can be visualised as shown below
77 33 44 11 88 22 66 55
![]()
Pivot
![]()
11 33 44 77 88 22 66 55
Pivot
![]()
11 22 44 77 88 33 66 55
![]()
Pivot
![]()
11 22 33 77 88 44 66 55
![]()
Pivot
![]()
![]()
11 22 33 44 88 77 66 55
Pivot
![]()
![]()
11 22 33 44 55 77 66 88
Pivot
![]()
11 22 33 44 55 66 77 88
![]()
Pivot
11 22 33 44 55 66 77 88 Sorted
Inspite of the superiority of the selection sort over bubble sort and insertion sort there is no significant gain in run time; its efficiency is O(n2) for n data items.
Hashing
In order to understand the concept of hashing, let us consider a data structure which uses hashing. A symbol table is a data structure which allows to easily determine the presence or absence of an element in the program body. It also permits easy insertion and deletion of symbols by a compiler. The simplest structure of a symbol table can be visualised as a two dimentional array with one column for the symbol and other for the address of that symbol. Access to the symbol table can be made sequentially using a binary search or some other search technique. But if the symbol under consideration is at the first location in a table of size 9 then it needs atleast 3 comparisions. The most practical technique for maintaining a symbol table is hashing. Here the address of a symbol is obtained by computing some arithmatic function, f, of X. f(X) gives the address where X should be placed in the table. This address will be referred to as the hash address and the function f(X) is called as the hasing function. If the hash generated by the function alreasy contains some symbol then a collision occours. The desired properties of this function are that it is easily computable and that it minimises the number of collisions. Thus it can be thought of as a functio which generates a random number in the range of 0 to n where n is the maximum index of the symbol table. One of the simplest example of a hashing function is
f(X) = X mod M
This will give an index in the range of 0 to M-1. The choice of M has to be made properly generally some power of 2.
We can think of many methods of resolving collisions. Some of the popular methods are sequential and rehashing.
Sequential method.
Suppose the collision occours at a hash address 4, then place the symbol in any free location in sequence to the hash address generated. For ex. In the fig below the hash address generateed is 4. The next subsequent free locations are searched and the symbol is placed in the location 7 which is free.
1 2 3 4 5 6 7 8 9 10 Hash address = 4 Value = K
|
Fig. 11 Collision |
1524 |
||
|
T |
|
||
|
V |
8857 |
||
|
B |
|
||
|
E |
5712 |
||
|
A |
1001 |
||
|
|
|
||
|
R |
6698 |
||
|
B |
1212 |
||
|
|
|
Rehashing.
In rehashing once a collision occours the hashing function is called again to generate a new address where the symbol could be stored. If that location is again non empth then again it is rehashed. But taking into consideration the possiblity that the hashing function may generate the same hash address again and again, some parameteres passed to the hasing function may be changed. For ex
On 1st collision the hash function called will be
f(X+1)
on 2nd collision
f(X+2)
and so on till a free location is foucnd to place the symbol X.
Graph Traversal
In almost all practical applications of graphs, we need to visit systematically all the nodes. On e such application occurs when the organizers of a political campaign want their candidate to visit all important political centers. The presence or absence of direct transportation routes (edges) between centers will determine the possible ways in which all the centers could be visited. At this moment, the only concern is to develop an algorithm that ensures that all the nodes are visited and moreover to make it economical.
Depth-First Search (DFS)
The main logic of the Depth-First Search algorithm is similar to the preorder traversal of a tree. It can be accomplished as follows
1) Choose any node in the graph. Call it as the search node and mark it visited.
2) Using the adjacency matrix of the graph, find a node adjacent to the search node (that is connected by an edge from the search node) that has not been visited yet. Designate this as the new search node (but remember the previous one) and mark it as visited.
3) Repeat Step 2 using the new search node. If no nodes satisfying (2) can be found, return to the previous search node and continue from there.
4) When a return to the previous search node in (3) is impossible, the search from the originally chosen node is complete.
5) If the graph still contains unvisited nodes, choose any node that has not been visited and repeat Steps (1) through (4).
This algorithm is called Depth-first-search because the search continues progressively deeper in a recursive manner. The sequence of nodes visited in the DFS traversal of the graph I fig. 12 is PQTSR.
Breadth first search.
Another widely used scheme
for graph traversal is the breadth-first-search. It can be used
both for directed and undirected graphs. Essentially the BFS begins at a
given node and then proceeds to all nodes directly connected to that node.
The following steps are involved in the algorithm. Fig 12.

1. Begin with any node, mark it visited.
2. Proceed to the next node having the edge connection to the node I
step (1), and mark it as visited.
3. Come back to the node in step (1), descend along an edge toward an
unvisited node, and mark the new node as visited.
4. Repeat step (3) until all nodes adjacent to the node in step (1) have
been visited.
5. Repeat step (1) through (4) from the node visited in (2), then starting
from the nodes visited in step (3) in the order visited.
In a BFS, the graph is searched broadly by exploring all the nodes adjacent to a node. The sequence of nodes visited in the BFS traversal of the graph I fig. 12 is PQRST.
Minimum
Spanning Trees
In many natural applications of graphs, not only edges between nodes, but also the weights of these edges, play an important role. The weight of an edge is just its value. For example an airline may be interested in not only creating an air route between two cities but also the shortest such air route. The distance between the two cities is the weight of the edge connecting these two cities. A graph with weighted edges is called a network. Refer fig 13

Fig 13 (a). Network
with weighted edges

Fig 13 (b) Minimal spanning
tree for the graph in fig 13(a)
Spanning tree: Defn-
Let G = (V,E) be an undirected connected graph. A subgraph
T = (V’,E’) is a spanning tree of G iff T is a tree. The above tree in fig.13(b) is a spanning tree for the graph in fig 13(a) .
One can assume the nodes to be cities and the edges as the distance between the two cities. Given such a weighted graph, we can select
Given a network , you must connect all the nodes in the network so that the total edge weight is minimized. This tree which is the part of a graph is called as the minimum spanning tree.
Greedy Method
The greedy method is the most straightforward design technique, applicable to a variety of problems. Almost all of these problems have n inputs and requires us to find a subset that satisfies some constraints ( max or min). Any subset that satisfies these constraints is called a feasible solution. We are now required to find a feasible solution that either maximizes or minimizes a given objective function. A feasible solution that does this is called an optimal solution.
Using the greedy method one can devise an algorithm that works in stages, considering one input at a time. At every stage, a decision is made whether or not a solution obtained is an optimal solution. This is done by considering some selection procedure. If the next input leads to a feasible solution then it is included in the set of feasible solution otherwise it is left out. The selection of the inputs also needs some optimization measures such that they result in an optimal solution.
Optimal
Storage on Tapes
Let there be n programs p1,p2,……..,pn having retrieval time t1,t2,……tn respectively. The programs are to be placed on a magnetic tape in such a way that the mean retrieval time is reduced. This problem can be easily solved using the greedy method. If we consider all ways to organize these n programs, we will get n! feasible solutions. Let us consider an example with three programs to be stored on a magnetic tape.
Ex : Let (p1,p2,p3) be three programs having lengths (l1,l2,l3) = (3,15,23) to be stored on a tape of length L. Find out the optimal ordering of the programs so as to minimize the mean retrieval time.
The mean retrieval time can be minimized if we find an optimal ordering of the programs such that the distance traveled is minimum.
Here we will get 6 feasible solutions( i.e 3 programs organized in 6 different ways on the tape). The se solutions are as follows
i) l1 + ( l1 + l2 ) + ( l1 + l2 + l3 ) = 3+3+15+3+15+23 = 62
ii) l1 + ( l1 + l3 ) + ( l1 + l3 + l2 ) = 3+3+23+3+23+15 = 70
iii) l2 + ( l2 + l1 ) + ( l2 + l1 + l3 ) = 15+15+3+15+3+23=74
iv) l2 + ( l2 + l3 ) + ( l2 + l3 + l1) = 15+15+23+15+23+3=94
v) l3 + ( l3 + l1 ) + ( l3 + l1 + l2) = 23+23+3+23+3+15=90
vi) l3 + ( l3 + l2 ) + ( l3 + l2 + l1) = 23+23+15+23+15+3=102
From the above five feasible solutions obtained the solution ( i ) i.e the ordering l1,l2,l3 is the optimal one.
The greedy method just requires to store the programs in the nondecreasing order of their lengths.
Job
Sequencing with deadlines.
In this problem we are given n jobs. With every job i, di >0 is the deadline and pi >0 is the profit. For any job i the profit pi is earned iff the job is completed by its deadline. In order to process a job one has to execute it on a machine for one unit of time. Only one machine is available for processing these jobs. A feasible solution for this problem is a combination of jobs fed to the machine such that the jobs are completed by its deadline and the profit is maximized. An optimal feasible solution is a feasible solution with maximum profit value.
Ex.
Let n=3, (p1,p2,p3) = ( 40,30,20) and (d1,d2,d3) = ( 1,2,1) be the profits and deadlines for jobs j1,j2,j3 respectively. Find the optimal feasible solution.
Soln :- The feasible solutions will be those combination of jobs such that they are completed in their respective deadlines.
The feasible solutions and profits earned are
Feasible solution Processing sequence Profits
i) ( 1 ) 1 40
ii) ( 2 ) 2 30
iii) ( 3 ) 3 20
iv) (1,2) 1,2 70
v) (2,3) 3,2 50
Now, applying the greedy method we choose solution ( iv ) as the optimal feasible solution with profit 70.
Optimal Merge
Patterns
When more than two sorted files are to be merged together the merge can be accomplished by repeatedly merging sorted files in pairs. Thus if files F1, F2, F3 are to be merged we could first merge F1 & F2 to get X1 and then merge X1 with F3 to get the resultant merged sorted file X2. Alternatively, we could first merge F2 & F3 to get X1 and then merge X1 & F1 to get X2. Different pairings require different amount of computing time. The concern will be now to determine an optimal way to pairwise merge n sorted files together.
A greedy attempt to obtain an optimal merge pattern is easy to formulate. Since merging an n record file and m record file requires possibly N+m record moves, the obvious way to select a criteria for choosing the files to be merge will be : at each stage merge the two smallest size files together. For ex if we have five files (f1,f2,…,f5) with sizes ( 7, 20, 15,60,25) the greedy method would generate the following merge pattern
f1+f3 = x1 (22)
f2+x1 = x2 (42)
f5+x2 = x3 (67)
f4+x3 = x4 (127)
The total number of record moves is 127. One can verify that this is the optimal merge pattern for the given problem instance. This merge pattern can be represented in form of a binary tree. The leaf nodes are drawn as squares and represent the given five files. These nodes will be called as external nodes . The remaining nodes are drawn circular and are called internal nodes. Each internal node has exactly two children. The number in each node is the length of that file represented by that node.

Fig 14. Merge Pattern
Single Source
Shortest Path
Graphs may be used to represent the network of roads in a state or a country with the vertices representing cities and edges representing sections of the roads. The edges may be assigned weights which might be the distance between the two cities connected by that edge or the time to drive from one city to the other. A person wishing to travel from city A to city B will naturally be interested in some questions
i) Is there a path from A to B ?
ii) If there is more than one path from A to B, which is the shortest path ?
The total length of a path connecting two cities might be the sum of all the weights of the edges which lie in the path connecting the cities.
The starting vertex of the path is called as the source and the last vertex is the destination. In diagraphs one way streets may occur limiting the paths from a node A to B.
In order to devise a greedy based algorithm to generate an optimal sequence of nodes, we will have to consider all the possible paths between the two nodes one by one. The greedy way to generate shortest paths would be to generate paths in nondecreasing order of their path lengths. First a shortest path to the nearest vertex is generated, then a shortest path to the second nearest vertex is generated and so on. For the graph in fig. 13 (a) the shortest path between A & C would be ABC with a total weight of 1000 units.
Knapsack Problem
The greedy method can also be used to solve some more complicated problems like the Knapsack problem. Given n objects and a knapsack with capacity M, objects i1,i2,….,in having weights w1,w2,…..,wn. If a fraction xi, 0 <= xi <= 1, of the object I is placed in the knapsack then a profit of pixi is earned. The objective is to fill the knapsack such that it maximizes the profit earned. Since the knapsack capacity is M, we require the total weight of all chosen objects to be at the most M.
Mathematically the problem can be stated as
Maximize ∑ pi xi
1 ≤ i ≤ n
Subject to ∑ wi xi ≤ M
1 ≤ i ≤ n
and 0 ≤ xi ≤ 1, 1 ≤ i ≤ n
The profits and weights are positive integers.
Ex. Consider the following instance of the knapsack problem :
N=3, M=30, (p1,p2,p3) = (20,40,30), and (w1,w2,w3)= ( 10,15,45)
Soln :-
The feasible solutions are
( x1,x2,x3) ∑wi xi ∑pi xi
(0,1,3/9) 30 50
(1,1,0) 25 60
(1,1,1/9) 30 63.33
Of these three feasible solutions, solution ( iii) yields the maximum profit. In case the sum of all the weights is less or equal to M, then xi = 1, is an optimal solution. So let us assume that sum of the weights exceeds M. Now all the xi’s cannot be 1. It is important to note that an optimal solution will fill the knapsack exactly. This can be done by increasing some fractional amount of some object i until the total weight is equal to M.
A 0/1 Knapsack problem is a problem where either 0 or whole of the object can be placed in the knapsack. No fraction of the object is allowed.
CODE OPTIMIZATION
The function of a compiler is to translate programs written in some source language into an equivalent assembly language or machine language program. The translation will depend on the particular assembly language and hence the machine.
We shall assume a simple machine model. This machine has only one register called the accumulator. All the arithmetic has to be performed in this register. For simplicity we shall restrict the machine’s operation to +, - , * and / operations only.
Let the required assembly language instructions are:
LOAD X : load the accumulator with contents of memory location X.
STORE X : stores contents of accumulator into memory location X.
OP X : OP may be ADD, SUB, MUL, DIV.
The instruction OP X computes the operator OP using the contents of the accumulator as the left operand and that of the memory location X and stores the result in the accumulator.
For ex. Consider an arithmetic expression (a+b) * (c-d)
Two possible assembly language versions of the code are
LOAD a LOAD c
ADD b SUB d
STORE T1 STORE T1
LOAD c LOAD a
SUB d ADD b
STORE T2 MUL T1
LOAD T1
MUL T2
A translation of an expression E into the machine or assembly language of a given machine is optimal iff it has a minimum number of instructions.
Out of the possible assembly codes generated, our concern now is to find out the optimal one. The form in which we have seen expressions is the Infix form i.e. the operators are in between the operands. In generating the optimal code, it is convenient to represent arithmetic expressions in form of binary trees. Each non leaf node in the binary tree will represent an operator and will be called an internal node. The left subtree of an internal node P will represent the binary tree form of the left operand of the operator represented at P while the right subtree represents the right operand. A leaf node represents either a variable or a constant. This binary tree representing the arithmetic expression is called as expression tree. Similar type of trees are used by the compiler for parsing.
An expression tree for (a+b) * (c-d) is as given below in fig. 15

Fig. 15 Expression tree
Backtracking
Many problems which deal with searching for a set of solutions or which ask for an optimal solution satisfying some constraints can be solved using backtracking formulation. In order to apply backtrack method, the desired solution must be expressible as an n-tuple (x1,x2,…..,xn) where xi are chosen from some finite set Si. We often use backtracking while proving theorems. In the flow, suppose we are unable to prove the theorem we reconsider the initial condition or change the lemma(some proven facts) applied. A classic combinatorial problem is to place eight queens on an
8 X 8 chessboard so that no two queens attack each other. Let us number the rows and columns of the chessboard 1 through 8. The queens may also be numbered 1 through 8. All the solutions to the 8-queens can be represented as 8-tuples (x1,x2,….,x8) where xi is the column number in which the ith queen is placed. One of the solutions to the problem is (4,6,8,2,7,1,3,5) and is shown in the fig.16 below.
1 2 3 4 5 6 7 8
|
|
|
|
Q |
|
|
|
|
|
|
|
|
|
|
Q |
|
|
|
|
|
|
|
|
|
|
Q |
|
|
Q |
|
|
|
|
|
|
|
|
|
|
|
|
|
Q |
|
|
Q |
|
|
|
|
|
|
|
|
|
|
Q |
|
|
|
|
|
|
|
|
|
|
Q |
|
|
|
1 2 3 4 5 6 7 8
Column
Row
Fig 16. Solution to 8-Queens problem
Backtracking algorithm determine problem solutions by systematically searching the solution space for the given problem instance. The search is facilitated by using a tree organization for the solution space. For a given solution space many tree organizations may be possible. The tree organization of the 4-queens solution space is given below. Nodes are numbered as in depth first search.
Algorithm to find all possible solutions of n-queens problem.
Procedure NQUEEN(n)
Integer k, n, X(1:n)
X(1) := 0;
K := 1 //k is the current row; X(k) the current column//
WHILE k > 0 DO
X(k) := X(k) + 1 //move to the next column//
WHILE X(k) ≤ n AND NOT PLACE(k) DO // PLACE returns
true if a queen can be placed otherwise false.
X(k) := X(k) + 1
REPEAT
IF X(k) ≤ n
THEN IF k = n
THEN PRINT(X)
ELSE k := k + 1
X(k) := 0
ENDIF
ELSE
K := k – 1 //Backtrack//
ENDIF
REPEAT
END NQUEEN
If we consider how effective this procedure is, for an 8X8 chessboard there are 864 possible ways to place 8 pieces or approximately 4.4 billion 8-tuples to examine. However by allowing placement of queens on distinct rows and columns we require the examination of at most 8! or only 40,320 8-tuples.
Graph Coloring
Let G be a graph and m be a given positive integer. We want to discover if the nodes of G can be colored in such a way that no two adjacent nodes have the same color yet only m colors are used. This is termed as the m-colorability decision problem. The m-colorability optimization problem asks for the smallest integer m for which the graph G can be colored. This integer is referred to as chromatic number.
A graph is said to be a planner iff it can be drawn in a plane in such a way that no two edges cross each other. A special case of the m-colorability decision problem is the 4-color problem for planner graphs. This problem asks the following question : given any map, can the regions be colored in such a way that no two adjacent regions have the same color yet only four colors are used. It has been proved that 4 colors are sufficient to solve any such problem. A map and its planner representation is as shown below.
Branch & Bound
The term branch and bound refers to all state space search methods in which all children of the node E generated before any other live node can become the E-node. The branch and bound strategy suggests that the problem is bounded (terminated) as soon as no feasible node (state) is reached with an input. The branch is terminated at such instance. The 4-queen state space tree generated by branch and bound is as shown below.