A distributed approach for solving a system of linear equations

Dhruv Matani.
Third Year Computer Engg.
September 4, 2004
[email protected]

Abstract

This paper discusses an algorithm and its implementation for solving a system of linear equations on a distributed system. The algorithm used is The Cramer's rule for solving a large number of linear equations simultaneously. Hence, a pivotal part of this algorithm is solving a determinant of a square matrix in a distributed manner. This topic of determinant solving is discussed in detail, followed by some results of running the program with different number of computers on the network.

Introduction.
Practical applications of solving a system of linear equations.
Existing solutions.
A distributed approach for solving a system of linear equations.
A distributed approach for finding the determinant of a square matrix.
Complexity analysis.
The implementation in detail.
Results of tests conducted measuring the improvements in time required for solving the system of linear equations.
Future developments.
Conclusion.
Acknowledgments.
References.

1. Introduction

A system of linear equations has many applications in the fields of chemistry, physics, mathematics, and also in our day to day lives. Solving slightly non-trivial problems become very easy if we can express them as a system of linear equations, and solve them simultaneously. They provide just the abstraction that the human mind needs to solve the original problem without taking into account other unrelated data.

We describe in this paper a method for solving a large number of such simultaneous linear equations quickly and efficiently on a distributed system. The rationale for choosing a distributed system lies in the fact that they are very cheap to build, because a large number of inexpensive computers connected by a high-speed network can form a very powerful distributed network.

We first describe a distributed approach for solving the determinant of a square matrix, and then apply Cramer's rule to solve the system of linear equations simultaneously. We shall show how these two tasks are very much inter-related, and closely coupled for the correct functioning of the final algorithm.

We have implemented this program on a Linux based system, and have measured the timings for runs of the program for solving the same number of equations, but with a different number of clients connected to the main server.

2. Practical applications of solving a system of linear equations

Digital signal processing (DSP) is the study of signals in a digital representation and the processing methods of these signals. DSP and analog signal processing are subsets of signal processing. It has three major subfields: audio signal processing, digital image processing and speech processing.
In DSP, engineers most commonly study digital signals in one of the following domains: time domain (one-dimensional signals), spatial domain (multidimensional signals), frequency domain, autocorrelation domain, and wavelet domains. They choose the domain in which to process a signal by making an educated guess (or trying out different possibilities) as to which domain best represents the essential characteristics of the signal. A sequence of samples from a measuring device produces a time or spatial domain representation, whereas a discrete Fourier transform produces the frequency domain information. The autocorrelation is, loosely speaking, defined as the expected value of correlation of the signal with itself on some distance in time or spatial distance. http://www.fact-index.com/d/di/digital_signal_processing.html
Linear programming is the process of solving a system of linear equalities and linear inequalities over a set of unknown real variables, along with a linear objective function to be maximized.

Many practical problems in operations research can be expressed as linear programming problems. For instance, if x1 is the number of acres planted with wheat and x2 is the number planted with corn, and a farmer has a limited number of acres A, and has a limited permissible amount F of fertilizer and P of insecticide which can be used, each of which is required in different amounts per acre (F1, F2, P1, P2) for wheat and corn respectively, and knows the selling price of wheat S1 and the selling price of corn S2, then the optimal number of acres to plant with wheat vs corn can be expressed as a linear program. http://www.fact-index.com/l/li/linear_programming.html
Numerical analysis is that branch of applied mathematics which studies the methods and algorithms to find (approximate) numerical solutions to various mathematical problems, using a finite sequence of arithmetic and logical operations. Most solutions of numerical problems build on the theory of linear algebra. http://www.fact-index.com/n/nu/numerical_analysis_1.html

Nonlinear Least Squares Curve Fitting: Simple linear curve fitting deals with functions that are linear in the parameters, even though they may be nonlinear in the variables. For example, a parabola y=a+b*x+c*x*x is a nonlinear function of x (because of the x-squared term), but fitting a parabola to a set of data is a relatively simple linear curve-fitting problem because the parameters enter into the formula as simple multipliers of terms that are added together. Another example of a linear curve-fitting problem is y= a+b*Log(x)+c/x; the terms involve nonlinear functions of the independent variable x, but the parameters enter into the formula in a simple, linear way. http://members.aol.com/johnp71/nonlin.html

3. Existing Solutions

Many solutions currently exist for solving such a large number of simultaneous linear equations. Many of them are iterative methods like the Gauss-Jordan iterative method. Others include the Gauss elimination, and matrix-methods which involve manipulation of the contents of a matrix to get the final result. The matrix methods including the Gauss elimination method essentially use the following relation:

Ax = B

where,

A --> The coefficient matrix.
x --> The list of variables.
B --> The column matrix representing the RHS of the equations.

The Gauss-Jordan method, which is an iterative method uses the values of the unknowns computed at the previous step in its next step while refining the values. thus, there is a dependence of one step on the previous step making this method not very attractive for implementing on a distributed system.

The Gauss elimination method involves converting the original matrix into an upper or lower triangular matrix, and then computing the individual values by re-substituting them into the resulting equations. This method too suffers from the defect that each step is dependent on the previous step. there is one more dependency introduced in the fact that the same operations must be performed on the column matrix on the other side as is being performed on the original matrix. Thus, making it unsuitable for our purposes. [MFA2001]

The matrix-inversion method again involves converting the original matrix to a unit matrix, and suffers from the same defects as the Gauss-elimination method, making it unsuitable for our purposes.

4. A distributed approach for solving a system of linear equations

In our distributed approach, we use the Cramer's rule for solving the system of linear equations. Cramer's rule is defined as:

Given a system of 'n' linear equations in 'n' distinct unknowns represented as:

a₁x + a₂y + ... + a_nz = k₁
b₁x + b₂y + ... + b_nz = k₂
.
.
.
c₁x + c₂y + ... + c_nz = k₃

We can find the unknowns x, y, ... , z using the following formulae:

Let

∆ = det(coefficient matrix)
∆_x = det(coefficient matrix, with the first column replaced by RHS column matrix)
∆_y = det(coefficient matrix, with the second column replaced by RHS column matrix)
and so on...
∆_z = det(coefficient matrix, with the n^th column replaced by RHS column matrix)

Thus,

x = ∆_x / ∆
y = ∆_y / ∆
.
.
.
z = ∆_z / ∆

As you can clearly see, the process of calculating the values ∆, ∆_x, ∆_y, ..., ∆_z is independent of any other process, and
also each of the above values can be calculated independently of each other. So, this method seems to be very attractive for implementing on a distributed system, since it involves a series of steps which can be performed independently of each other. So, in our approach, we use the Cramer's rule for solving the system of linear equations simultaneously.

As you might have noticed, now that we know that each of the above operations can be performed independently of each other the fact still remains that the procedure for calculating the determinant of a square matrix is a very time consuming process, and should be done efficiently for the efficient operation of our algorithm. We describe in detail in the following section the procedure for calculating the determinant of a square matrix. Also described is the algorithm for calculating the same in a distributed environment. [DAB2002] [LN012003]

5. A distributed approach for finding the determinant of a square matrix

The process of calculating the determinant of a square matrix is a recursive one, as we shall show shortly. Thus, it should be possible to break up the process into distinct independent units, each operating in isolation from the others.

Consider a 1 X 1 matrix given by:

The determinant of this matrix is trivial to find, and is the value 'a' itself.

Now, consider a 2 X 2 matrix given by:

a b
c d

The determinant of the above 2 X 2 matrix is given by the formula:

a(d) - b(c)

Now, consider a 3 X 3 matrix given by:

a   b   c
d   e   f
g   h   i

To calculate the determinant of the above matrix, we use the formula:

a(e(h) - f(h)) - b(d(i) - f(g)) + c(d(h) - e(g))

As you can see, the above step of calculating the determinant of a 3 X 3 matrix involves calculating the determinant of a 2 X 2 matrix 3 times. And, the process of calculating the determinant of a 2 X 2 matrix involves calculating the determinant of a 1 X 1 matrix 2 times.

Generalizing, we conclude that the process of calculating the determinant of an n X n matrix involves calculating the determinant of an (n-1) X (n-1) matrix 'n' times. This shows us that the process of calculating the determinant of a square matrix is in fact a recursive process.

Now, coming to the point of calculating the determinant of a square matrix in a distributed environment, we make the following assumptions:

The computer containing the n X n matrix whose determinant is to be calculated is called the 'master'.

The 'master' is connected to at least 'n' client machines by means of a high speed network connection. Each of these 'n' clients are capable of calculating the determinant of a square matrix.

Each of these 'n' clients may be connected to any number of sub-clients, which may in turn be connected to sub-sub-clients, and so on and so forth... Thus, the client acts as a server for the sub-client, and this kind of relationship is replicated at each level.

It is not necessary for the correctness of the algorithm for the points (1..3) to hold, but they are necessary for showing that the proposed algorithm in such a scenario is very efficient in solving the determinant of an n X n matrix.

Once the above environment is set up, the 'master' starts the process of calculating the determinant of the n X n matrix by delegating to each of the 'n' clients the task of calculating the determinant of each of the 'n' (n-1) X (n-1) matrices. These 'n' clients may in turn delegate part of their work-load to sub-clients connected to them, and so on and so forth... Thus, which each client has finished performing the task that its master assigned it, the master is passed back the result from each of its clients, where it is consolidated and the final result is computed by the master.

We can express the algorithm as:

The master delegates the task of calculating the determinant of 'n' (n-1) X (n-1) matrices to each of the 'n' clients connected to itself.

Each of the clients in turn delegate the work to clients connected to them, and this continues till there is no client with a client connected to it. At this stage, the determinant is calculated locally on that machine.

These clients at various stages in the tree-hierarchy then pass the computer results up the tree to their respective masters.

The masters who are waiting for the computer results accept them, and consolidate the computed determinants of the 'n' sub-matrices, and compute the value of the determinant of the square matrix that they have been assigned.

6. Complexity analysis

The complexity for finding the determinant of an n X n matrix can be expressed by the following recurrence:

O(n) = n + n(O(n-1)) ______(1)

The first 'n' stands for the 'n' multiplication operations performed after each of the 'n' (n-1) X (n-1) determinants have been computed, and the n(O(n-1)) stands for the complexity of calculating the value of the 'n' (n-1) X (n-1) determinants 'n' times.

Solving the above recurrence, we get:

O(n) = nⁿ _______(2)

The above complexity represents that for a serial execution of the determinant algorithm on a uni-processor system, or more simply said, on a single machine.

To reduce the complexity, we have decided to use the distributed approach for computing the determinants.

The complexity for finding the determinant of an n X n matrix in a distributed environment is quite different from that in a non-distributed one.

We notice that each of the (n-1) X (n-1) determinants are being computed simultaneously(in parallel) on the various clients that have been assigned the mutually independent tasks of computing the determinants.

Thus, in the eqn.(1), the term n(O(n-1) becomes simply O(n-1).
The new recurrence for a distributed system can be given as:

O(n) = n + O(n-1) ______(3)

Solving the above recurrence, we get:

O(n) = n² _________(4)

Cramer's rule solves the determinant of an n X n matrix 'n' times, so the first serial(non-distributed) method gives us an asymptotic upper bound of:

O(n) = n x (2)
O(n) = n x nⁿ
O(n) = n⁽ⁿ⁺¹⁾ ______(5)

However, for a distributed system, the asymptotic upper bound is given by:

O(n) = n + (4)
O(n) = n + n²
O(n) = n² _______(6)

Thus, the asymptotic upper bound for solving 'n' linear equations in 'n' unknowns is: n².

7. The implementation in detail

The ideas expressed in the document up to now have been implemented in standard C++ in the form of a computer program, which can be compiled and run on a Computer system. This section discusses the implementation in detail, highlighting some of the features and pit-falls of the current implementation.

The implementation is currently split up into 7 logical units.

Client

The client is the part of the code that runs on the client machines, and is responsible for connection to the central server.
The client accepts requests like 'start', 'reset_client', 'reset_subclient'and 'quit' from the server, and acts on them. See below for a complete list of commands supported by the client, and how each one of them is acted upon by the client.

Server

The server is responsible for polling continuously for new connections from clients. Once a clients tries to connect to the server, and the server accepts this connection, the client is said to have been connected to the server, and the server stores the unique ID associated with this connection locally withing its memory.
This server also partially manages the task of distributing the work-load of solving the determinant remotely along with the determinant module.
It most be noted that the server code is executed on both the server(master) machine as well as the client machines. This is because the client may acts as a server for other sub-clients, etc...

File parser

The file parser defines a set of rules for the syntax of equations that can be understood by the parser, and reads them off the input file. These set of rules are in the form of a CFL(Context Free Language) represented by a set of productions.
The rules are given below.

Determinant module

The determinant module is the main part of the Equation Cruncher application which is responsible for solving the determinants both remotely and locally.
It contains various helper functions to assist itself in the correct and efficient execution of the determinant solving logic.
This piece of code makes heavy use of threads, and a lot of the operations being mutually independent can be carried out concurrently. So, having the master as an SMP system would be highly beneficial from the efficiency and speed of execution point of view.
This module interacts extensively with both the Socket module as well as the Node manager.
It extracts free clients from the Node manager, and instructs them to carry out various operations. A list of operations and how these interactions take place are given below.
The interaction of this module with the Socket module is also very large, and there is strong coupling between these two modules, as is the case with this module and the Node manager.

Node manager

The Node manager is responsible for effectively and correctly maintaining the list of all clients currently connected to the master after the server has recognized such clients and has accepted their connections.
This module contains functions that send matrices across the network to clients, and on the client side, another set of functions receive the data, and re-create the matrix in its original form from the extra data provided in the command header. A description of the packer header is given below.
It contains atomic operations for getting a client socket, and replenishing it.
This piece of code is tightly coupled with the server code.

Socket manager

This contains basic wrapper functions around standard UNIX socket functions, and implements basic error checking.
It throws exceptions if there is any error in the execution of the underlying functions.
The list of exceptions is given below.

Synchronization manager

This is just a thin wrapper around the POSIX thread API, and provides a solitary Mutex class which allows programs to have mutually exclusive access to any piece of shared data.

List of commands accepted by the client, and actions taken when these commands are received

Command	Action taken
start	When the client receives a start command, it first receives the complete matrix from its server, and then starts evaluating the determinant of that square matrix received. When the determinant calculation is complete, it sends the server a result command in return along with the actual result.
reset_client	When the client receives a reset_client command, it performs 2 main functions. Firstly, it sends all its clients a reset_subclient command, and secondly it disconnects itself from its server by closing the open socket, and then starts waiting for a connection once again on the same IP address and port as before.
reset_subclient	When the SubClients receive a reset_subclient command, they just display the appropriate message to the user, and propagate the message to their clients.
quit	When the client receives a quit command, it closes the connection with all it's clients, and its server, and gracefully exits.

Set of productions defining the Context Free Language(syntax) of the equations appearing in the input file.

EQN --> LHS = RHS
DOUBLE -->(0..9)^* | (0..9)^*.(0..9)⁺
RHS --> (+ | - | e)DOUBLE
LHS --> LHS(+ | -)TERM | TERM
TERM --> (+ | - | e)DOUBLE_VAR
VAR --> (a..z)⁺

Structure of the Packet Header

struct pkt_header
{
    int size; // Amount of data after the packer header gets over.
    int type; // Type of packet: { data, command }
    int code; // Command issued: { quit, start, result, reset_client }
    double reserved; // Reserved for the implementation to use as it feels right.
};

List of exceptions thrown by the Socket module

Exception Name	Thrown When?	Public Base Classes
eqn_cruncher::socket_error	Any general socket error while creating the socket, connecting to the server, listening for connections, accepting connections, sending/receiving data.	std::runtime_exception
eqn_cruncher::host_error	Host name lookup error.	eqn_cruncher::socket_error
eqn_cruncher::port_error	Error binding to the specified port number.	eqn_cruncher::socket_error

List of exceptions thrown by the File parser

Exception Name	Thrown When?	Public Base Classes
eqn_cruncher::file_error	Input file does not exist.	std::runtime_exception
eqn_cruncher::parse_error	Error encountered during parsing the input file. This error is a syntax error encountered during scanning/reading the input file data.dat. This exception suggests that the format of the equations is not correct, and does not adhere to the list of productions given above.	std::runtime_exception

8. Results of tests conducted measuring the improvements in time required for solving the system of linear equations

The measurements were taken on systems running Mandrake Linux - 9. They had a hardware configuration of Intel Celeron 2400MHz CPU with 128MB RAM. The various machines were connected to each other by a 10/100MBPs fast ethernet connection. In all, there were 9 systems connected to each other, with one acting as the server, and the other 9 as clients.

We solved 10 simultaneous linear equations in 10 unknowns on these systems increasing the number of clients each time from 0(zero) right up to 8(eight). The results are given below in a tabular format, followed by a graphical representation.

Number of clients connected to the server.	Time taken (in seconds) to solve 10 equations.
0	340.525
1	132.95
2	80.2574
3	37.3861
4	20.8911
5	13.2871
6	9.9505
7	7.56436
8	5.38614

Graph showing the variation in the number of clients connected v/s time taken to evaluate 10 linear equations in 10 unknowns.

As you can see, the time taken to solve the equations reduces as the number of clients connected to the server increases. Thus, showing that the implementation is fairly scalable, and will perform well under various loads with many clients connected to the server(master) machine.

Shown below is the tree-hierarchy structure for the arrangement of clients wrt the server(master) machine that will result in optimal performance.

Tree-hierarchy structure for optimal perfromance

9. Future work

These are the areas which we shall be concentrating on in the near future. Any help in these or other related areas is greatly welcomed, and will be highly appreciated.

Optimizations:

Network speed test: This should test the speed between each client, and the server by sending ping like packets.
Client speed(CPU speed) test: This should test the speed of the client in terms of CPU speed.
Client memory(RAM) capacity test: This should test whether the client machine has enough memory to handle the request being sent to it without making sufficient use of it's swap file.

Load Balancing:

This algorithm should balance the load between the various clients by relinquishing some unused clients in some server to some other client who needs them(and their processing speed) to distribute its work-load.

Fault Tolerance:

If some client goes down while evaluating the determinant, then the server should immediately recognize this situation, and take remedial steps such as sending the request to another client if there exists a free client or queuing up the request for a client as soon as some client gets free, or to evaluate the determinant itself.

10. Conclusion

We have seen the implementation of an algorithm which solves a system of linear equations on distributed systems. The test results show that it is more efficient compared to the serial algorithm, which executes on a single machine. We have also computed the asymptotic upper bound for both the approaches, and have seen that given a greater number of processors, the asymptotic upper bound for out distributed algorithm tends towards n², and compared to nⁿ for the serial algorithm.

We have also seen the timings for solving 9 equations in 9 unknowns on a distributed system, each containing a different number of clients. We observe that as the number of clients increases, the speed of the algorithm also increases, thus showing that the algorithm is scalable. We would have liked to perform the tests with an SMP/server system as the master, but such a computer was not available at the time of testing.

Future work in the areas of optimization, Load balancing, and creating a Fault tolerant system will be encouraged, and greatly appreciated. We hope to accomplish the points mentioned in (9) in the year to follow.

There is some scope for improving the algorithm of finding the determinant, and improvements which use properties of determinants to eliminate redundant calculations will be encouraged, and implemented. For example, one optimization already in place is that if the element with which the determinant of a sub-matrix is to be multiplied is 0, then the value of the determinant is not calculated, thus saving a lot of time. This will prove highly beneficial for sparse matrices. Another area that has been explored is cutting down the number of iterations, and network data flow. When we wish to calculate the determinant of a 3 X 3 matrix or a square matrix of a smaller dimension, then the formula is used directly instead of using the recursive function.

We would also like to explore the possibility of using a grid based approach instead of the current tree-like hierarchy based approach, since we believe that it would alleviate the problem of load balancing to some extent.

11. Acknowledgments

The author would like to thank Siddharth Shah for introducing him to the world of clusters, grids, and helping him out with the basics of Networking in computers, without which this project would never have been possible. This project was actually made because Siddharth wanted to do something related to networking, and I wanted to do something related to Algorithms. So, we reached a compromise, and this baby was born! I would also like to thank Apurva Mehta, who helped me test the initial client-server network model, which I had implemented in C++. Thanks to Aniket for proof reading this document, reporting the mistakes and helping me explain certain concepts in a better manner. My mom needs a special mention, because she was the one who reminded me that there existed something called food while I was busy thinking about the whole concept and later implementing the ideas.

We would like to pay our tributes to Dr. Cramer, without whom this project would never have been possible!

12. References

[MFA2001] Mark F. Adams. A Distributed Memory Unstructured Gauss-Seidel Algorithm for Multigrid Smoothers. August 17, 2001.

[DAB2002] David A Bader, Bernard M.E. Moret, and Peter Sanders. Algorithm Engineering for Parallel Computation.

[LN012003] Lecture notes. Csanky's Algorithm: matrix determinant € NC². 10/14/2003.

Hosted by www.Geocities.ws