FYBSc(IT)

Semester – II

 

Design & Analysis of Algorithms.

 

Introduction

An algorithm can be defined a s a set of finite rules that gives a sequence of operations for solving a specific type of problem. An algorithm has five inportant features:

1)     Finiteness :  An algorithm must always terminate after a finite number of steps.

2)     Definiteness :  Each step of an algorithm must be precisely defined; the actions to be carried out must be rigorously and unambiguously specified for each case.

3)     Input : An algorithm has zero or more inputs. Quantities that are given to it initially before the algorithm begins, or dynamically as the algorithm runs. These inputs are taken from a specified set of objects.

4)     Output : An algorithm has one or more outputs. Quantities that have a specified relation to the inputs.

5)     Effectiveness : An algorithm is also generally expected to be effective, in the sense that its operations alone should be sufficient to work out the solution on a paper or in any programming language.

A useful algorithm should require not only a finite number of steps, but a reasonable number. One criteria of goodness is the length of time taken to perform the algorithm; this can be expressed in terms of the number of times each step is executed. Other criteria are the adaptability of the algorithm to different kind of computers, its simplicity and elegance.

The study of algorithms include many important & active areas.

i)                   How to devise an algorithm.

ii)                 How to express an algorithm,

iii)               How to validate an algorithm,

iv)               How to analyze algorithm &

v)                 How to test a program.

An algorithm is devised in such a way that it should be general solution to the problem under consideration. The algorithms can be expressed in any structured programming languages such as Sparks, Pascal etc.. Once an algorithm is designed it is necessary to show that it computes the correct answer for all possible legal inputs. This ensures that the algorithm will work correctly independent of the issues concerning the programming language it will be ultimately written in. Once the validation has been done the algorithm has to be verified. Once it is executed, it makes the use of the CPU to perform various operations and it also uses memory to store the program and its data. Analysis of algorithm refers to the process of determining how much computing time and storage an algorithm will require. When the algorithm is analyzed and validated it needs to be tested by expressing it any programming language. The testing of any algorithm is a two step process.

a)     Debugging

b)    Profiling.

          Debugging refers to executing programs on sample data sets and tracing faulty results. Profiling is the process of executing correct program on data sets and measuring the time and space required to compute the results.

 

Running time of a Program.

When solving a problem we frequently face a choice among algorithms. On what basis should we choose? There aer two often contradictory goals.

a)     Choose an algorithm that is easy to understand, code and debug.

b)    Choose an algorithm that makes efficient use of the computers resources, especially that one which runs as fast as possible. As now-a-days memories have become quite cheaper, we can overlook the space factor.

When we are writing a program to be used once or a few times, goal (1) is most important. Here the cost of programmer’s time will likely exceed by far the cost of running the program. When presented with a problem whose solution is to be used many times, the cost of running the program may far exceed the cost of writing it. Then it is sound to implement a complicated algorithm, provided that the resulting program will run significantly faster than the simpler program.

 So, in building a complex system it is often desirable to implement a simple prototype on which measurements and simulation can be performed, before coming to the final design. The programmers must not only be  aware of ways of making programs run fast, but must know when to apply these techniques and when not to bother.

 

Measuring the running time of a Program.

The running time of a program depends on the factors such as

a)     the input to the program,

b)    the quality of the code generated by the comiler used to create the object program,

c)     the nature and speed of the instructions on the machine used to execute the program and,

d)    the time complexity of the algorithm underlying the program.

Often the running time depends not on the exact input but the size of the input. A good example is the process of sorting algorithms. The more the numbers are the more time it takes to produce the output. Then we can talk of T(n), the running time of the program  with inputs of size n. Some program may have a running time T(n)=cn2 where c is a constant. Thus we can think of T(n) as being the number of instructions executed on a ideal computer. In the worst case T(n) will be maximum. So we consider Tavg (n), the average over all inputs of size n. But in practice the average running time is harder to determine because average input has no obvious mathematical meaning. So the worst case running time is used as the principle measure of the time complexity.

 

 

 

Elementary Data structures

Data comes in all shapes and sizes, but often it can be organized in the same way. For example consider a list of things to do, a list of ingredients in a recipe, or a attendance list for a class. Although each contains a different type of data, they all contain data organized data in a similar way : a list. A list is one example of a data structure. There are many other common ways to organize data as stacks, queues, sets, trees, heaps etc.

To solve any problem we need to store data. Once a data structure has been chosen for a particular application , it is given life by the logical instructions that manipulate the related data items stored in it. Thus, a study of data structures is also necessarily a study of algorithms that control them.

 

 

 

Queues.

The distinguishing characteristics of a Queue is that it stores and retrieves data in a First In First Out(FIFO) manner. This means that the first element placed in the queue is the first to be removed. Insertions are limited at one end while deletions may occour only on the other end. A simple way to think of a queue is as a line at the bus-stop. As the line grows, newcommers join in at the end. When a bus arrives, the person at the head of the line boards the bus. In computing, to place an element at the end/tail of a queue, we enqueue it; to remove an element from the head we dequeqe it.

Queues can be implemented using arrays and using linked lists.The array representation of queues is discussed in detail.

 

Array Representation of Queues

 Let us consider computer jobs being scheduled in a batch processing environment. We shall further suppose that all job names are integer values and that the jobs are scheduled strictly in the order in which they arrive. Then an array and two pointers can be used to implement the scheduling queue. Let the two pointers be Rear(top) and Front(bottom). The rear and front are initially set to 0 and 1 respectively.

Let us consider an empty queue as shown in fig 1(a). Recalling that the insertions may be made only at the rear (top) of the queue, suppose that job 4456 now arrives to be processed. The queue changes to look like fig 1(b)

 

 

 

 Arraysize                                           Arraysize     

 

 

 

 

 

                ________                                             ________

           3   ________                                        3   ________

           2   ________                                        2   ________     Rear = 1

           1                                    Front = 1         1     4456

                                              Rear = 0                                    Front = 1

                   Fig 1(a)                                               Fig 1(b)                    

 

 

1024

 
          Arraysize              100                                  Rear

                                           ________

 

 

 

 

                                           ________

                                       3  ___3740__

                                       2  ___5688__             Front = 2                 

                                       1       4456

 

Fig 1(c)

which shows that the addition of any ITEM to the queue requires two steps:

i)                   Increment Rear &

ii)                 Queue(Rear) := Item

 

If the system is ready to process the job 4456, the front entry must be removed from the queue to an appropriate location identified by ITEM and the Front be decreased by one i.e

i)                   Item := Queue(Front)

ii)                 Front := Front + 1

The algorithms that adds up an element and delets an element from the queue are as listed below.

 

 

Algorithm  ADD( QUEUE, ARRAYSIZE, REAR, ITEM, FULL)

( * General procedure to add ITEM to REAR of QUEUE * )

 

VAR QUEUE(ARRAYSIZE), REAR, ITEM  : INTEGER

VAR FULL  : BOOLEAN

 

IF REAR = ARRAYSIZE THEN

          FULL := TRUE

ELSE

          FULL := FALSE

          REAR := REAR + 1

          QUEUE[REAR] := ITEM

ENDIF

RETURN

END ADD

 

 

 

 

Algorithm  DELETE( QUEUE, ARRAYSIZE, FRONT, ITEM, EMPTY)

( * General procedure to Remove Front entry from QUEUE * )

 

VAR QUEUE(ARRAYSIZE), FRONT, ITEM  : INTEGER

VAR EMPTY  : BOOLEAN

 

IF REAR < FRONT THEN

          EMPTY := TRUE

ELSE

          EMPTY := FALSE

          ITEM := QUEUE[FRONT]    

          FRONT := FRONT - 1

ENDIF

RETURN

END ADD

 

 

 

Linked List implementation of Queues

Defn : Linked lists is a structure that not only contains a data field but also one pointer to other nodes in the list.

Linked lists are useful in maintaining collections of data, similar to the ways that arrays are often used. However they are considerably more efficient in performing insertions and deletions. Linked lists also make use of dynamically allocated storage, which is storage allocated at runtime. Since in many applications the size of the data is not known at compile time.

 

 

 

DEF

 

ABC

 
     Head

 

 

 

 \ Null                                      Fig 2

 

The linked implementation allows the queue to be completely dynamic with size restrictions imposed only by the pool of the available nodes. The queue is implemented as a linked list with an additional REAR pointer to the last node in the list so that the list need not to be traversed to find this node. A special FRONT pointer need not to be maintained, because LINK(HEAD) leads directly to the first item in the queue.

Summarising the conditional checks that signal a single, an empty, one-entry, or full queue.

 

          Empty queue ŕ REAR = HEAD

          One-entry-queue ŕ REAR = LINK(HEAD)

          Full queue ŕ Depends upon memory space available

 

Priority Queues

Priority queues are used to set some priorities in processing elements of a queue. For example, at a university computer centre, students in introsuctory computer science courses may receive the highest priority for their jobs to encourage a quick turnaround. Students in upper division courses may have the next priority, whereas jobs related to faculty research,

which requires a great deal of computation, get the lowest priority. These type of jobs could be called A, B, and C respectively. Any A job is serviced before B or C, any B job is serviced before any C job. A data structure capable of representing such a queue would require just one FRONT pointer but three rear pointers – one for each A, B, and C.

 
                            

 

HEAD

 

STATS

PRINT

REARA

 
BANK

COPY

HEADB

 
CHECK

UPDATE

HEADC

 
AVER

TEST

 

Fig 3

 

Stacks

The Distinguishing feature of a stack is that it can store and retrieve data in the last in first out, or LIFO manner. This means that the last element placed on a stack is the first to be removed. A convenient way to think of a stack is as a can of tennis balls. When we place balls in the can, the can is filled up from the bottom to the top. When we remove the balls, the can is emptied from the top to the bottom. In case we need a particular ball in the stach then we will have to remove all the balls above it. In computing, to place an element on the top of a stack, we push it; to remove an element from the top, we pop it.

 

Array implementation of a stack.

A strategy similar to that used for an array implementation of a queue can be followed. However, because insertions and deletions occour at the same end of a stack,only one pointer will be needed instead of two required for a queue. We will call the pointeras TOP

Thus to push an item into the stack, the operations will be

i)                   TOP := TOP + 1

ii)                 STACK(TOP) := ITEM

Popping an entry from the stack into ITEM requires

i)                   ITEM := STACK(TOP)

ii)                 TOP := TOP – 1

The complete algorithms to PUSH and POP an element in the array are as discussed below.

 

Algorithm PUSH(ITEM, STACK, ARRAYSIZE, TOP, FULL)

( * Push ITEM into Stack, FULL is a flag indicating whether *)

( * or not there is room on STACK to perform this operationn *)

 

VAR ITEM, TOP : Integer

VAR   FULL : BOOLEAN

IF TOP = ARRAYSIZE THEN

          FULL := TRUE

ELSE

          FULL := FALSE

          TOP := TOP + 1

          STACK(TOP) := ITEM

ENDIF

RETURN

END PUSH

 

 

 

 

Algorithm POP(ITEM, STACK, ARRAYSIZE, TOP, EMPTY)

( * Push ITEM into Stack, FULL is a flag indicating whether *)

( * or not there is room on STACK to perform this operationn *)

 

VAR ITEM, TOP : Integer

VAR   FULL : BOOLEAN

IF TOP = 0 THEN

          EMPTY := TRUE

ELSE

          EMPTY := FALSE

          ITEM := STACK(TOP)

          TOP := TOP - 1

ENDIF

RETURN

END PUSH

 

 

Linked Implementation of a Stack

When we use the link list implementation, we are using a relatively small amount of memory space needed to maintain pointers for the dynamic allocation of stack space. Following the usual convention of maintaining a dummy header node at the beginning of a linked list, a stack with two entries 65 & 43 would appear as shown in the fig 4 below.

 

 

43

 

65

 
     Head

 

 


                                                   Top

 

 \ Null                                     

                   Fig 4 : Linked implementation of a stack with 2 data

                                     nodes and a dummy header

 

The condition to test for an empty stack would be TOP = HEAD. A full stack occours only when the there is no memory left to create a node. The procedures to push and pop now become nothing more than inserts and deletes from the beginning of a linked list.

 

 

 

 

 

 

Trees

Most of us are famalier of ‘Family Trees’ or genealogical trees which is  a simple way of expressing a hierarchical relationship between parents, children, brothers, sisters, cousins, aunts, uncles, and so on. The crutial relationship in such a tree is between parent and a child. Such a parent child relationship is an excellent example of a hierarchical relationship in which there exists a well-defined order of precedence between two items being related. A Tree is a data structure that represents hierarchical relationships between indivudial data items. The figure below is an a example of a tree

 

 

 

 

 

 

 

 

 

 

 

 


Fig 5. A tree

 

Every element of a tree is called as a node. In the above tree the element A is at the highest level ( 0 ) of  the tree and is called its root (also called as the root node). The nodes A,B & C are the child nodes of the node P as they are directly connected to the root node. The node R` is the parent node of A, B & C. The child nodes of any given parents constitutes a set of siblings. Thus A, B, C  & D,E are siblings. Thus A,B & C are at level 1. A link between a parent and a child is called a branch in a tree structure.

A subtree is a subset of a tree that itself a tree. In the above fig 5. The nodes C, D & E forms a subtree.

Now we can define a tree in a recursive fashion as a set of nodes that :

i)                   Is either empty or

ii)                 Has a designated node called the root from which descend zero or more subtrees. Each subtree itself satisfies the definition of a tree.

 

One of the most important of all tree structures is the binary tree. In a binary tree, each node has no more than two child nodes( or at the most two child nodes). In other words in a binary tree each node has two subtrees

( null or non-null ) known as left subtree and the right subtree.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 


Fig 5. A Binary tree

 

Tree structures arise quite naturally in both computer science and data processing applications. In data processing, file index schemes and hierarchical database management systems, such as Information Management System (IMS) developed by IBM, typically make use of trees. In computer science tree structures are in compilers for parsing algebric expressions. The hierarchy of algebric operations involved in mathematical expressions can be represented by the hierarchical nature of a tree.

Let us consider a algebric expression

          (A – B) + C * (E / F)

The tree for the above expression is as shown in fig.6.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 


Fig .6

 

 

 

 

Traversing Trees

Traversing a tree means visiting each and every node of a tree starting from the root node. There are three types of tree traversing techniques for binary trees

·        Preorder Traversal

·        Inorder Traversal

·        Postorder Traversal

 

Preorder Traversal

The steps involved in preorder traversal are as follows

i)                   Process the root node

ii)                 Process the left subtree recursively

iii)               Process the right subtree recursively

The sequence of nodes visited in preorder traversal of the binary tree in fig 6 will  be  + - A B * C / E F

 

Inorder Traversal

The steps involved in preorder traversal are as follows

i)                   Process the left subtree recursively

ii)                 Process the root node

iii)               Process the right subtree recursively

The sequence of nodes visited in inorder traversal of the binary tree in fig 6 will be  A – B + C * E / F

 

Postorder Traversal

The steps involved in preorder traversal are as follows

i)                   Process the left subtree recursively

ii)                 Process the right subtree recursively

iii)               Process the root node

The sequence of nodes visited in postorder traversal of the binary tree in fig 6 will be A B – C E F / * +

 

 

Linked representation of Binary tree

A binary tree is maintained in the memory by means of a linked representation which uses two pointer fields (one for each child) and one data field i.e LLINK, RLINK & INFO respectively. In the linked representation, insertions and deletions involve no data movement except the rearrangement of pointers.

 

 

 

 

Let T be a binary tree. First of all each node N of T will correspond to a location K such that

i)                   INFO[K]  contains the data at node N

ii)                 LLINK[K] contains the location of the left child of node N

iii)               RLINK[K] contains the location of the right child of node N

 

 

                             LLINK     INFO    RLINK

+

 
   

 


                                        

 

 


                                                                                       

                                        

 

 

 


Fig. 7 A Linked representation of the binary tree

 

The ROOT will contain the location of the root of T. If any subtree is empty, then the corresponding pointer will contain null value; if the tree is empty, then the ROOT will contain the null value

Although for most purposes the linked representation of the binary tree is most efficient, it does have certain disadvantages

a)     Wasted memory space in null pointers.

b)    Given a node, it is difficult to determine its parent.

c)     Its implementation algorithm is more difficult in languages that do not offer dynamic storage techniques; for example FORTRAN, COBOL, BASIC etc.

The drawback b) can be overcome by adding a parent pointer field to a node, provided you can afford using more memory.

In older languages such as BASIC and FORTRAN, it can be implemented using arrays for both pointers and data field.

 

 

Heaps

A heap is a tree, usually a binary tree, in which each child node has a smaller value than its parent. Thus the root node is the largest node in the tree. We may also choose to arrange a heap such that each child node has a largeer value than the parent. In this case, the root node is the smallest node. These like these are partially ordered because, although the nodes along every branch have a specific order to them, the nodes at one level aer not necessarily ordered with respect to the nodes at another level. A heap in

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 


( a ) Heap                                                                           

 

 

11

1

5

7

6

12

17

8

4

10

2

 

( b ) Heap represented in an array.

 

Fig .8  Heaps

 

which each child is smaller than its parent is top-heavy. This is because the largest node is on the top. While the heap in which each child is larger than its parent is bottom—heavy.

 

Heapsort

The heap sort is a sorting algorithm whose efficiency is roughly equivalient to that of the quick sort. Its average efficiency is O(n(log2n)).

Let us consider an array as shown fig.8(b). The first phase of the tree appears as shown in fig.8(b). The goal of the first phase is now to sort the data elements along each path from leaf node level to the root node. If we wish to sort in ascending orde, then the numbers along any path from leaf node to root should be in increasing order.

 

Phase I

1)     Process the node which is the parent of the rightmost node on the lowest level. If its value is less than the value of its largest child, swap these values, otherwise do nothing.

2)     Move left on the same level. Compare the value of the parent node with the value of the child nodes. If the parent is smaller than the largest child, swap them.

3)     When the left end of this level is reached, move up a level, and, beginning with the rightmost parent node, repeat step ( 2 ). Continue swapping the original parent with the larger of its children until it is larger than its children.

4)     Repeat step ( 3 ) until all level 1 nodes have been processed (Root is at level 0).

Phase II of the heap sort finds the node with the largest value in the tree and cuts it off from the tree. This is then repeated to find te second largest value, which is also removed from the tree. The process continues until only two nodes are left in the tree, which are exchanged if necessary. The precise steps are as follows

 

Phase II

1)     Compare the root node with its children, swapping it with the largest child if the largest child is larger than the root.

2)     If a swap occurred in step ( 1 ), then continue swapping the value which was originally in the root position until it is larger than its children. In effect, this original root node value is now being walked down a path in the tree to ensure that all paths retain values arranged in ascending order from leaf node to root node.

3)     Swap the root node with the bottom rightmost child.

4)     Repeat steps ( 1 ) through ( 3 ) until only two elements are left.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Graphs

In general a Graph is a structure that represents a less restrictive relationship between data items. A common example of a graph is a collection of cities and the roads between them; two cities are related if there is a route between them. Their relationship is not hierarchical because traffic on the routes between the cities can flow in either direction. Graphs are frequently used in areas such as transportation networks, electrical circuitry, artificial intelligence, chemical structure of crystals, analysis of programming languages etc.

A graph may be formally defined as a set of objects called nodes and edges. In this context a node is a data element of the graph; an edge is a path between two nodes.

Defn : A Graph consists of two sets V and E. The set V is a finite, nonempty set of vertices. The set E is  a set of pairs of vertices; these pairs are called edges. The notations V(G) and E(G) represent the sets of vertices and edges respectively of graph G. A graph can also be represented as

 G = (V,E) .

In an undirected graph, an edge between two nodes is not directionally oriented. Thus, in a undirected graph if AG (A to G) is an edge, so is GA.

In a directed graph or digraph, the edges between nodes are directionally oriented.

 

 

 

 

 

 

 

 

 

 


     (a) G1                                                              (b) G2

 

 

Fig .8 Graphs

In the abve fig (a) is an undirected graph and (b) is a directed graph.

The degree of a vertex is the number of its adjacent vertices. For a directed graph the indegree of a node is the number of edges coming into it, while the outdegree is the number of edges going out of that node.

The set of representations of the graph G1 is as follows

          V(G1) = { M, P, N }

          E(G1) = { (M,N), (M,P), (P,N) }

 

 

Representation of graphs

There are two standard ways of maintaining a graph G in the memory of a computer. One way called the sequential representation of G, is by means of its adjacency matrix. The other way, called the linked representation of G, is by means of linked lists.

 

Adjacency matrix.

 Suppose G is a simple directed graph with m nodes, and suppose the nodes of G have been ordered are called v1, v2, ……, vm. Then the adjacency matrix A = (aij ) of the graph G is the m X n matrix defined as follows

 

          Aij =           1 if vi is adjacent to vj, if there is an edge (vi, vj)

                             0 otherwise

 

Such a matrix A, which contains entries of only 0 and 1, is called a bit matrix or boolean matrix. A graph containing n nodes can be represented by a matrix containing n rows and n columns. The martix is formed by placing a 1 in the i th row and j th column ( the (i,j)th entry) of the matrix if there is an edge between node i and node j of the graph. The adjacency matrix A of the graph G does depend on the ordering of the nodes of G ; that is, a different ordering of the nodes may result in a different adjacency matrix.

Suppose G is an undirected graph, then the adjacency matrix A of G will be symmetric matrix i.e one in which aij = aji for every i and j.

The adjacency materices for the graphs G1 & G2 in fig. 8 are as follows

 

 A          B       C       D       E     

0

1

0

0

0

0

0

1

0

0

0

0

0

1

0

0

0

0

0

1

0

0

0

0

0

 

 
 

 


M

 

P

 

N

 

A        

 

 
           M           P         N

0

1

B                        

 

 
1

1

0

C        

 

 
1

1

1

D        

 

 
0

 

E        

 

 
 


                   (a)

 

 

                                                                             (b)

 

Fig.9 Adjacency matrix

 

 

 

 

 

Divide and Conquer

The divide and conquer strategy suggests splitting the inputs into k distinct subsets resulting k subproblems. These subproblems must be solved indivudially and a procedure should be used to combine the solutions into the solution of the initial problem. If the subproblems are relatively large, then reapply the divide and conquer strategy to further break them into subproblems. Often the subproblems resulting from the reapplication of the divide and conquer strtegy are of the same type as the original problem. This strategy is reapplied till the subproblems are small enough to be solved. The way the divide and conquer splits the subproblems can be solved by writing a control abstraction. ‘Control abstraction is a procedure whose flow of control is clear but whose primary operations are specified by some other procedures whose precise meanings are undefined.’

For ex. If P is a problem to be solved, we can determine whether the input size is small and that the answer can be computed without splitting, by invoking a procedure say SMALL(P). Otherwise P is splitted into subproblems P1, P2,……, Pk and solved by recursively applying divide and conquer to each of these subproblems.

 

Binary Search

Binary search is a sequential search technique and should be used when records are sorted without any consideration given to order. By paying what may initially seem like a small price, we can dramatically increase the efficiency of our search effort using a simple technique called the binary search. The price we must pay is

a)     The list of keys must be maintained in physical order (ex. In an array in

      which is a squencial block of memory)

b)    The number of the keys must be maintained (i.e the length of the

      array)

c)     There must be a direct access to the relative keys in the last (i.e any

     location in the array should be accessible directly)

 

Let ai, 1 < I < n be a list of elements which are stored in nondecreasing order. Consider the problem of determining whether a given element is in the list. Divide and conquer suggests breaking up any instance of this search into subinstances and search for the desired element in those subinstances.  For.ex consider a list in the fig.10 below

 

 

 

 

In the fig.10 we wish to locate the data  associated with key 7630. The strategy is to divide the list into two halves by taking the 5th key as the mid element. Here we consider that the 5th element is the mid. We now begin the search from position 5 towards the right as the value being sought is greater than the mid. We again reapply the divide and conquer strategy to this subinstance from 5 to 8. Now the mid is computed as 6 as shown below.

 
 

 


1

2574

2

3685

3

4569

4

5124

5

6165

6

7630

7

8467

8

9020

 

            Fig 10. List for binary search.

 

(6 + 8) / 2 = 6

A General procedure for the above problem is shown below

 

BIN_SEARCH (A, N, X, J)

Arguments : A is the array of elements

                    N number of elements in the array

                   X the element to be searched.

                   J the position of the element to be determined.

LOCAL      : LOW, HIGH, MID

BEGIN

          LOW := 1

          HIGH := N

          WHILE LOW < HIGH DO

             MID := | (LOW + HIGH) / 2 |

             IF (X < A[MID]) THEN

                   HIGH := MID – 1

             ELSE IF (X > A[MID]) THEN

                   LOW := MID + 1

             ELSE

                   RETURN MID

             ENDIF

          END WHILE

          RETURN

END BIN_SEARCH

 

Now talking about the efficiency of this procedure let us consider an example

 

 

 

          A         :       (1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)     (9)

Elements       :      -15      -6       0        7        9        23      54      82     101

Comparisons :       3        2        3        3        1        3        2        3        4

 

No element requires more than 4 comparisons to be found. The average is obtained by summing the comparisons needed to find all nine items and dividing it by 9 resulting 25/9, or approx 2.77 comparisions per successful search on an average.

 

 

MergeSort

Another example of divide and conquer is the merge sort algorithm. The most advantageous thing about this algorithm is that it has the worst case complexity in the order of O(n log2 n). We shall assume that the elements are to be sorted in a non decreasing order. The general idea is as follows.

Given a sequence of n elements A(1), A(2),…….,A(n) the general idea is to imagine them split into two sets A(1), …., A(n/2) and A((n/2) +1),….,A(n).

Each sequence is indivudially sorted and the resulting sequences are merged to produce a single sorted sequence of n elements.

 Let us consider a file containing the data

          (2, 6, 3, 1, 4, 31, 23, 8, 11, 19, 21, 37, 14, 57, 28)

We divide it into two original files as

          O1 = (2, 6, 3, 1, 4, 31, 23)

          O2 = (8, 11, 19, 21, 37, 14, 57, 28)

After the first path we have segments of length 1, so we have

          Y1 = {(2,8), (3,19), (4,37), (23,57)}

          Y2 = {(6,11), (1,21), (14,31), 28}

After the 2nd pass we have segments of length 2

          O1 = {(2,6,8,11), (4,14,31,37)}

          O2 = {(1,3,19,21),(23,28,57)}

After 3rd pass we have segments of length 4

          Y1 = {(1,2,3,6,8,11,19,21)}

          Y2 = {(4,14,23,28,31,37,57)}

After the 4th pass we have segemnts we get a block of length 8

          O1 = {(1,2,3,4,6,8,11,1419,21,23,28,31,37,57)}

          O2 = EMPTY

The procedure was described beginning with the segments of length 1. Subsequently larger segments can be stored in main memory, so the efficiency of the algorithm can be enhanced by taking conveniently larger segemnts. The procedure of merge sirt usually deals with external file media and therefore is system dependent.

 

 

 

Selection Sort

The main idea behind selection sort is to find the smallest entry among in an array A( j ), A( j+1 ),…….,A( n ), and then interchange it with A( j ). This process is then repeated for each value of j.

The algorithm is as below

 

SELECTION_SORT(A,N)

Arguments : A the array of elements.

                   N number of elements in the array A..

BEGIN

 LOCAL : I, J, K, N, ITEM

  FOR I := 1 TO N-1 DO

          K := I

          ITEM : = A[ K ]

          FOR J := I + 1 TO N DO

             IF A[ J ] < ITEM THEN

                   ITEM := A[ J ]

                   K := J

             ENDIF

          END FOR

          ITEM : = A[ I ]

          A [ I ] := A [ K ]

          A[ K ] := ITEM

   END FOR

RETURN

 END SELECTION_SORT

 

The above procedure can be visualised as shown below

 

                                     

          77      33      44      11      88      22      66      55

 


          Pivot

 


          11      33      44      77      88      22      66      55

 


                   Pivot

 


          11      22      44      77      88      33      66      55

 


                             Pivot

 

 

 

          11      22      33      77      88      44      66      55

 


                                      Pivot

 


          11      22      33      44      88      77      66      55

 

                                                Pivot

 


          11      22      33      44      55      77      66      88

 

                                                          Pivot

 


          11      22      33      44      55      66      77      88

 


                                                                   Pivot

 

          11      22      33      44      55      66      77      88      Sorted

 

Inspite of the superiority of the selection sort over bubble sort and insertion sort there is no significant gain in run time; its efficiency is O(n2) for n data items.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Hashing

In order to understand the concept of hashing, let us consider a data structure which uses hashing. A symbol table is a data structure which allows to easily determine the presence or absence of an element in the program body. It also permits easy insertion and deletion of symbols by a compiler. The simplest structure of a symbol table can be visualised as a two dimentional array with one column for the symbol and other for the address of that symbol. Access to the symbol table can be made sequentially using a binary search or some other search technique. But if the symbol under consideration is at the first location in a table of size 9 then it needs atleast 3 comparisions. The most practical technique for maintaining a symbol table is hashing. Here the address of a symbol is obtained by computing some arithmatic function, f, of X. f(X) gives the address where X should be placed in the table. This address will be referred to as the hash address and the function f(X) is called as the hasing function. If the hash generated by the function alreasy contains some symbol then a collision occours. The desired properties of this function are that it is easily computable and that it minimises the number of collisions. Thus it can be thought of as a functio which generates a random number in the range of 0 to n where n is the maximum index of the symbol table. One of the simplest example of a hashing function is

                             f(X) = X mod M

This will give an index in the range of 0 to M-1. The choice of M has to be made properly generally some power of 2.

We can think of many methods of resolving collisions. Some of the popular methods are sequential and rehashing.

Sequential method.

Suppose the collision occours at a hash address 4, then place the symbol in any free location in sequence to the hash address generated. For ex. In the fig below the hash address generateed is 4. The next subsequent free locations are searched and the symbol is placed in the location 7 which is free.

1

2

3

4

5

6

7

8

9

10

 

Hash address = 4

Value  = K

 
 


Fig. 11 Collision

 
C

1524

T

5567

V

8857

B

1020

E

5712

A

1001

 

R

6698

B

1212

 

 

Rehashing.

In rehashing once a collision occours the hashing function is called again to generate a new address where the symbol could be stored. If that location is again non empth then again it is rehashed. But taking into consideration the possiblity that the hashing function may generate the same hash address again and again, some parameteres passed to the hasing function may be changed. For ex

          On 1st collision the hash function called will be

                             f(X+1)

          on 2nd collision

                             f(X+2)

and so on till a free location is foucnd to place the symbol X.


Graph Traversal

In almost all practical applications of graphs, we need to visit systematically all the nodes. On e such application occurs when the organizers of a political campaign want their candidate to visit all important political centers. The presence or absence of direct transportation routes (edges) between centers will determine the possible ways in which all the centers could be visited. At this moment, the only concern is to develop an algorithm that ensures that all the nodes are visited and moreover to make it economical.

 

Depth-First Search (DFS)

The main logic of the Depth-First Search algorithm is similar to the preorder traversal of a tree. It can be accomplished as follows

1)     Choose any node in the graph. Call it as the search node and mark it visited.

2)     Using the adjacency matrix of the graph, find a node adjacent to the search node (that is connected by an edge from the search node) that has not been visited yet. Designate this as the new search node (but remember the previous one) and mark it as visited.

3)     Repeat Step 2 using the new search node. If no nodes satisfying (2) can be found, return to the previous search node and continue from there.

4)     When a return to the previous search node in (3) is impossible, the search from the originally chosen node is complete.

5)     If the graph still contains unvisited nodes, choose any node that has not been visited and repeat Steps (1) through (4).

This algorithm is called Depth-first-search because the search continues progressively deeper in a recursive manner. The sequence of nodes visited in the DFS traversal of the graph I fig. 12 is PQTSR.

 

 

Breadth first search.

 

Another widely used scheme for graph traversal is the breadth-first-search.  It can be used both for directed and undirected graphs. Essentially the BFS begins at a given node and then proceeds to all nodes directly connected to that node. The following steps are involved in the algorithm.

 

Fig 12.

 
 

 

 

 

 

 

 

 

 

 

 


1. Begin with any node, mark it visited.

2. Proceed to the next node having the edge connection to the node I 

    step (1), and mark it as visited.

3. Come back to the node in step (1), descend along an edge toward an

    unvisited node, and mark the new node as visited.

4. Repeat step (3) until all nodes adjacent to the node in step (1) have

    been visited.

5. Repeat step (1) through (4) from the node visited in (2), then starting

    from the nodes visited in step (3) in the order visited.

In a BFS, the graph is searched broadly by exploring all the nodes adjacent to a node. The sequence of nodes visited in the BFS traversal of the graph I fig. 12 is PQRST.

 

 

 

 

Minimum Spanning Trees

In many natural applications of graphs, not only edges between nodes, but also the weights of these edges, play an important role. The weight of an edge is just its value. For example an airline may be interested in not only creating an air route between two cities but also the shortest such air route. The distance between the two cities is the weight of the edge connecting these two cities. A graph with weighted edges is called a network. Refer fig 13

 

 

 

 

 

 

 

 

 


Fig 13 (a). Network with weighted edges

 

 

 

 

 

 

 

 

 


Fig 13 (b) Minimal spanning tree for the graph in fig 13(a)

Spanning tree: Defn-

 Let G = (V,E) be an undirected connected graph. A subgraph

T = (V’,E’) is a spanning tree of G iff T is a tree. The above tree in fig.13(b) is a spanning tree for the graph in fig 13(a) .

 

One can assume the nodes to be cities and the edges as the distance between the two cities. Given such a weighted graph, we can select

Given a network , you must connect all the nodes in the network so that the total edge weight is minimized. This tree which is the part of a graph is called as the minimum spanning tree.

 

 

 

Greedy Method

The greedy method is the most straightforward design technique, applicable to a variety of problems. Almost all of these problems have n inputs and requires us to find a subset that satisfies some constraints       ( max or min). Any subset that satisfies these constraints is called a feasible solution. We are now required to find a feasible solution that either maximizes or minimizes a given objective function. A feasible solution that does this is called an optimal solution.

Using the greedy method one can devise an algorithm that works in stages, considering one input at a time. At every stage, a decision is made whether or not a solution obtained is an optimal solution. This is done by considering some selection procedure. If the next input leads to a feasible solution then it is included in the set of feasible solution otherwise it is left out. The selection of the inputs also needs some optimization measures such that they result in an optimal solution.

 

Optimal Storage on Tapes

Let there be n programs p1,p2,……..,pn having retrieval time t1,t2,……tn respectively. The programs are to be placed on a magnetic tape in such a way that the mean retrieval time is reduced. This problem can be easily solved using the greedy method. If we consider all ways to organize these n programs, we will get n! feasible solutions. Let us consider an example with three programs to be stored on a magnetic tape.

Ex : Let (p1,p2,p3) be three programs having lengths (l1,l2,l3) = (3,15,23)  to be stored on a tape of length L. Find out the optimal ordering of the programs so as to minimize the mean retrieval time.

     The mean retrieval time can be minimized if we find an optimal ordering of the programs such that the distance traveled is minimum.

Here we will get 6 feasible solutions( i.e 3 programs organized in 6 different ways on the tape). The se solutions are as follows

i)                   l1 + ( l1 + l2 ) + ( l1 + l2 + l3 ) = 3+3+15+3+15+23 = 62

ii)                 l1 + ( l1 + l3 ) + ( l1 + l3 + l2 ) = 3+3+23+3+23+15 = 70

iii)               l2 + ( l2 + l1 ) + ( l2 + l1 + l3 ) = 15+15+3+15+3+23=74

iv)               l2 + ( l2 + l3 ) + ( l2 + l3 + l1) = 15+15+23+15+23+3=94

v)                 l3 + ( l3 + l1 ) + ( l3 + l1 + l2) = 23+23+3+23+3+15=90

vi)               l3 + ( l3 + l2 ) + ( l3 + l2 + l1) = 23+23+15+23+15+3=102

 

From the above five feasible solutions obtained the solution ( i )  i.e the ordering l1,l2,l3 is the optimal one.

The greedy method just requires to store the programs in the nondecreasing order of their lengths.

 

 

Job Sequencing with deadlines.

 

In this problem we are given n jobs. With every job i, di >0 is the deadline and pi >0 is the profit. For any job i the profit pi is earned iff the job is completed by its deadline. In order to process a job one has to execute it on a machine for one unit of time. Only one machine is available for processing these jobs. A feasible solution for this problem is a combination of jobs fed to the machine such that the jobs are completed by its deadline and the profit is maximized. An optimal feasible solution is a feasible solution with maximum profit value.

 

Ex.

 Let n=3, (p1,p2,p3) = ( 40,30,20) and (d1,d2,d3) = ( 1,2,1) be the profits and deadlines for jobs j1,j2,j3 respectively. Find the optimal feasible solution.

 Soln :- The feasible solutions will be those combination of jobs such that they are completed in their respective deadlines.

The feasible solutions and profits earned are

 

     Feasible solution              Processing sequence        Profits

i)                   ( 1 )                                1                               40

ii)                 ( 2 )                                2                               30

iii)               ( 3 )                                3                               20

iv)               (1,2)                                1,2                            70

v)                 (2,3)                                3,2                            50

 

Now, applying the greedy method we choose solution ( iv ) as the optimal feasible solution with profit 70.

 

 

 

Optimal Merge Patterns

When more than two sorted files are to be merged together the merge can be accomplished by repeatedly merging sorted files in pairs. Thus if files F1, F2, F3 are to be merged we could first merge F1 & F2 to get X1 and then merge X1 with F3 to get the resultant merged sorted file X2. Alternatively, we could first merge F2 & F3 to get X1 and then merge X1 & F1 to get X2. Different pairings require different amount of computing time. The concern will be now to determine an optimal way to pairwise merge n sorted files together.

A greedy attempt to obtain an optimal merge pattern is easy to formulate. Since merging an n record file and m record file requires possibly N+m record moves, the obvious way to select a criteria for choosing the files to be merge will be : at each stage merge the two smallest size files together. For ex if we have five files (f1,f2,…,f5) with sizes ( 7, 20, 15,60,25) the greedy method would generate the following merge pattern

 

f1+f3 = x1 (22)

f2+x1 = x2 (42)

f5+x2 = x3 (67)

f4+x3 = x4 (127)

The total number of record moves is 127. One can verify that this is the optimal merge pattern for the given problem instance. This merge pattern can be represented in form of a binary tree. The leaf nodes are drawn as squares and represent the given five files. These nodes will be called as external nodes . The remaining nodes are drawn circular and are called internal nodes. Each internal node has exactly two children. The number in each node is the length of that file represented by that node.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 


Fig 14. Merge Pattern

          

 

Single Source Shortest Path

Graphs may be used to represent the network of roads in a state or a country with the vertices representing cities and edges representing sections of the roads. The edges may be assigned weights which might be the distance between the two cities connected by that edge or the time to drive from one city to the other. A person wishing to travel from city A to city B will naturally be interested in some questions

i)       Is there a path from A to B ?

ii)     If there is more than one path from A to B, which is the shortest path ?

     The total length of a path connecting two cities might be the sum of all the weights of the edges which lie in the path connecting the cities.

The starting vertex of the path is called as the source and the last vertex is the destination. In diagraphs one way streets may occur limiting the paths from a node A to B.

In order to devise a greedy based algorithm to generate an optimal sequence of nodes, we will have to consider all the possible paths between the two nodes one by one. The greedy way to generate shortest paths would be to generate paths in nondecreasing order of their path lengths. First a shortest path to the nearest vertex is generated, then a shortest path to the second nearest vertex is generated and so on. For the graph in fig. 13 (a) the shortest path between A & C would be ABC with a total weight of 1000 units.

 

Knapsack Problem

The greedy method can also be used to solve some more complicated problems like the Knapsack problem. Given n objects and a knapsack with capacity M, objects i1,i2,….,in having weights w1,w2,…..,wn. If a fraction xi, 0 <= xi <= 1, of the object I is placed in the knapsack then a profit of pixi is earned. The objective is to fill the knapsack such that it maximizes the profit earned. Since the knapsack capacity is M, we require the total weight of all chosen objects to be at the most M.

Mathematically the problem can be stated as

      Maximize   ∑ pi xi

                   1 ≤ i  ≤ n

 

      Subject to     wi xi  ≤ M

                   1 ≤ i  ≤ n

 

and 0 ≤ xi ≤ 1, 1 ≤ i ≤ n

The profits and weights are positive integers.

 

Ex. Consider the following instance of the knapsack problem :

N=3, M=30, (p1,p2,p3) = (20,40,30), and  (w1,w2,w3)= ( 10,15,45)

Soln :-

The feasible solutions are

     ( x1,x2,x3)                  ∑wi xi                   ∑pi xi

     

      (0,1,3/9)                        30                50

      (1,1,0)                           25                60

      (1,1,1/9)                        30                63.33

 

Of these three feasible solutions, solution ( iii) yields the maximum profit. In case the sum of all the weights is less or equal to M, then xi = 1, is an optimal solution. So let us assume that sum of the weights exceeds M. Now all the xi’s cannot be 1. It is important to note that an optimal solution will fill the knapsack exactly. This can be done by increasing some fractional amount of some object i until the total weight is equal to M.

A 0/1 Knapsack problem is a problem where either 0 or whole of the object can be placed in the knapsack. No fraction of the object is allowed.

 

CODE OPTIMIZATION

The function of a compiler is to translate programs written in some source language into an equivalent assembly language or machine language program. The translation will depend on the particular assembly language and hence the machine.

We shall assume a simple machine model. This machine has only one register called the accumulator. All the arithmetic has to be performed in this register. For simplicity we shall restrict the machine’s operation to +, - , * and / operations only.

Let the required assembly language instructions are:

 LOAD X : load the accumulator with contents of memory location X.

 STORE X : stores contents of accumulator into memory location X.

OP X : OP may be ADD, SUB, MUL, DIV.

 

The instruction OP X computes the operator OP using the contents of the accumulator as the left operand and that of the memory location X and stores the result in the accumulator.

For ex. Consider an arithmetic expression (a+b) * (c-d)

Two possible assembly language versions of the code are

 

      LOAD a                             LOAD c

      ADD b                               SUB d

      STORE T1                         STORE T1

      LOAD c                                       LOAD a

      SUB d                                 ADD b

      STORE T2                         MUL T1

      LOAD T1

      MUL T2

 

A translation of an expression E into the machine or assembly language of a given machine is optimal iff it has a minimum number of instructions.

Out of the possible assembly codes generated, our concern now is to find out the optimal one. The form in which we have seen expressions is the Infix form i.e. the operators are in between the operands. In generating the optimal code, it is convenient to represent arithmetic expressions in form of binary trees. Each non leaf node in the binary tree will represent an operator and will be called an internal node. The left subtree of an internal node P will represent the binary tree form of the left operand of the operator represented at P while the right subtree represents the right operand. A leaf node represents either a variable or a constant. This binary tree representing the arithmetic expression is called as expression tree. Similar type of trees are used by the compiler for parsing.

An expression tree for (a+b) * (c-d) is as given below in fig. 15

 

 

 

 

 

 

 

 

 

 

 

 

 


Fig. 15 Expression tree

 

 

Backtracking

Many problems which deal with searching for a set of solutions or which ask for an optimal solution satisfying some constraints can be solved using backtracking formulation. In order to apply backtrack method, the desired solution must be expressible as an n-tuple (x1,x2,…..,xn) where xi are chosen from some finite set Si. We often use backtracking while proving theorems. In the flow, suppose we are unable to prove the theorem we reconsider the initial condition or change the lemma(some proven facts) applied. A classic combinatorial problem is to place eight queens on an

8 X 8 chessboard so that no two queens attack each other. Let us number the rows and columns of the chessboard 1 through 8. The queens may also be numbered 1 through 8.  All the solutions to the 8-queens can be represented as 8-tuples (x1,x2,….,x8) where xi is the column number in which the ith queen is placed. One of the solutions to the problem is (4,6,8,2,7,1,3,5) and is shown in the fig.16 below.

 

                           1      2       3      4       5       6     7        8

 

 

 

Q

 

 

 

 

 

 

 

 

 

Q

 

 

 

 

 

 

 

 

 

Q

 

Q

 

 

 

 

 

 

 

 

 

 

 

 

Q

 

Q

 

 

 

 

 

 

 

 

 

Q

 

 

 

 

 

 

 

 

 

Q

 

 

 

1

2

3

4

5

6

7

8

 
    Column

Row           

                  

                  

 

                  

                  

                  

 

 

 

 

Fig 16. Solution to 8-Queens problem

 

Backtracking algorithm determine problem solutions by systematically searching the solution space for the given problem instance. The search is facilitated by using a tree organization for the solution space. For a given solution space many tree organizations may be possible. The tree organization of the 4-queens solution space is given below. Nodes are numbered as in depth first search.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Algorithm to find all possible solutions of n-queens problem.

 

Procedure NQUEEN(n)

Integer k, n, X(1:n)

X(1) := 0;

K := 1                   //k is the current row; X(k) the current column//

WHILE k > 0 DO

          X(k) := X(k) + 1   //move to the next column//

          WHILE X(k) ≤ n AND NOT PLACE(k) DO // PLACE returns

true if a queen can be  placed otherwise false.

                   X(k) := X(k) + 1

          REPEAT

          IF X(k) ≤ n

            THEN IF k = n

                   THEN PRINT(X)

                   ELSE k := k + 1

                             X(k) := 0

            ENDIF

          ELSE 

            K := k – 1 //Backtrack//

          ENDIF

REPEAT

END NQUEEN

 

If we consider how effective this procedure is, for an 8X8 chessboard there are 864 possible ways to place 8 pieces or approximately 4.4 billion 8-tuples to examine. However by allowing placement of queens on distinct rows and columns we require the examination of at most 8! or only 40,320 8-tuples.

 

 

Graph Coloring

 Let G be a graph and m be a given positive integer. We want to discover if the nodes of G can be colored in such a way that no two adjacent nodes have the same color yet only m colors are used. This is termed as the m-colorability decision problem. The m-colorability optimization problem asks for the smallest integer m for which the graph G can be colored. This integer is referred to as chromatic number.

A graph is said to be a planner iff it can be drawn in a plane in such a way that no two edges cross each other. A special case of the m-colorability decision problem is the 4-color problem for planner graphs. This problem asks the following question : given any map, can the regions be colored in such a way that no two adjacent regions have the same color yet only four colors are used. It has been proved that 4 colors are sufficient to solve any such problem. A map and its planner representation is as shown below.

 

 

 

Branch & Bound

The term branch and bound refers to all state space search methods in which all children of the node E generated before any other live node can become the E-node. The branch and bound strategy suggests that the problem is bounded (terminated) as soon as no feasible node (state) is reached with an input. The branch is terminated at such instance. The 4-queen state space tree generated by branch and bound is as shown below.