Grid Systems – Computing meets
Economics and the Life Sciences

EMTM 650 - Emerging Technologies

Olufemi Anthony

Introduction

Grid computing provides one of the most compelling examples of technological synthesis to emerge in computer science and engineering for some time. Computational grids and peer-to-peer systems are being used to solve large-scale problems in business, science, and engineering, ranging from drug discovery to otherwise intractable mathematical problems. The best-known real-world example is SETI@home, the Search for Extra-Terrestrial Intelligence project, which runs on my home computer as a screensaver whenever the machine is idle.

Description

Grid computing involves harnessing the computational power of disparate, heterogeneous, geographically separated computer systems to solve large problems. These systems often differ in many ways, including:

Platform, e.g. PCs, workstations, supercomputers

Hardware resources – CPU, I/O, memory

Software applications

The grid is an architecture that encompasses the following elements:

Computers - Networked PCs, workstations, supercomputers, clusters, PDAs, laptops.
Software - Applications that do job scheduling on remote machines connected to the grid, as well as applications that run the job on the machines themselves.
Data and Databases - Databases that reside on one or more grid machines, e.g. human genome database.
People - Consumers/users who request computational resources on the grid, and resource providers who make resources available via their machines connected to the grid.
Distributed network - The Internet, and other smaller networks that make the grid possible.

There are two aspects of synthesis I would like to highlight in this paper. The first is what one might call an extraneous or enabling synthesis, whereby grid computing is used to enhance another field of endeavor. Grid computing is proving to be extremely valuable in drug discovery and biological computation. It is enhancing the field of bioinformatics and is helping to reduce the computation time needed for drug discovery screening by many orders of magnitude.

One real-world example of this synthesis was the screening phase of the Anthrax Research project, a collaborative effort among United Devices (a grid computing company), the Department of Defense, and IBM. Using grid computing, participating members contributed their computers to help screen 3.57 billion molecules in the search for a drug to treat advanced-stage anthrax.[1] The process was completed in 24 days.

The second is an externally influenced synthesis, whereby grid computing is itself influenced and enhanced by another field. This synthesis is a new paradigm that marries economics to computation.
Managing the resources in a grid system while satisfying the needs of multiple, similarly disparate, geographically dispersed users is a non-trivial task.
There are various models for grid resource management architecture including[2]:

Hierarchical - employed in current systems
Abstract Owner model
Economy/Market model

Current systems for resource management and scheduling use conventional models in which jobs are scheduled according to simple cost functions. These cost functions focus on system parameters such as throughput and CPU utilization rather than on the utility of the application processing. For example, in such systems resources have a constant cost, regardless of time and availability. Such systems, while suitable for small to midsize dedicated grids, turn out to be inadequate for large, Internet-scale grids. On that scale the grid becomes a utility, and users have to compete with each other for resources. In such a scenario, a purely centralized, system-centric view does not allocate scarce resources efficiently, since the grid is itself a highly decentralized and distributed system with many heterogeneous parameters. A promising model for resource management of large grids comes from the field of economics, using a market-based approach. The problem is aptly modeled in a market-economy framework because it involves demand for and supply of limited resources under various constraints.
There are many benefits to economy-based resource management including the following: [3]

It uses the profit motive to incentivize resource owners to make their idle computing resources available for use in the computational grid.
It helps regulate demand and supply in the grid.
It helps in the prioritization of resource use on the grid.
It provides a decentralized framework and avoids the need for a central coordinator.
It provides user-centric rather than system-centric scheduling policies.
It provides a system for the efficient allocation and management of resources.
It allows for the exchange of all kinds of resources – computational power, memory, storage, and bandwidth.
It helps to provide access to grid resources in a fair manner.
It facilitates the building of a scalable system as it helps decentralize the decision-making process across the network.
It empowers both suppliers and consumers as they can make decisions that best fit their needs.

In the computational market framework, the resource owners are modeled as producers/suppliers and the resource users as consumers, and their interactions follow the economic laws of supply and demand. A user/consumer is in competition with other users, and a resource owner with other resource owners. However, the resources being exchanged are in the computational domain.

An architecture for grid computational economy consists of the following elements:

Grid Users
Grid Resource Broker
Grid Middleware services
Grid Service Providers

The basic architecture for such an economy is illustrated in [4] (figure not reproduced here).

Brief Description of Grid Computing Economic Models

Some of the economic models that are being investigated for managing resources in a grid computing environment include[5]:

Commodity Market Model
Tendering/Contract
Monopoly and Oligopoly
Bid-Based Proportional Resource Sharing
Bartering/Coalition/Community
Bargaining
Auction – including English, Dutch, Vickrey, and first price sealed-bid auctions.
Posted price

Of these, the commodity market, auction, tendering and proportional resource sharing models are the ones that are being most widely researched.

Commodity Market
In this model, services are supplied at the market equilibrium price based on laws of supply and demand.
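
As a rough illustration of what "market equilibrium price" means here, the following minimal sketch searches for the price at which demand for CPU-hours equals supply. The linear demand and supply curves, the price range, and all function names are assumptions made up for this example; they are not taken from any of the grid systems discussed in this paper.

```python
def excess_demand(price, demand, supply):
    """Demand minus supply at a given price (positive means a shortage)."""
    return demand(price) - supply(price)


def equilibrium_price(demand, supply, low=0.0, high=100.0, tol=1e-6):
    """Bisect on price until demand and supply balance.

    Assumes demand falls and supply rises with price, and that the
    equilibrium lies somewhere between `low` and `high`.
    """
    if excess_demand(low, demand, supply) < 0 or excess_demand(high, demand, supply) > 0:
        raise ValueError("equilibrium is not bracketed by [low, high]")
    while high - low > tol:
        mid = (low + high) / 2
        if excess_demand(mid, demand, supply) > 0:
            low = mid    # shortage: the price must rise
        else:
            high = mid   # surplus or balance: the price must fall
    return (low + high) / 2


# Hypothetical curves: CPU-hours demanded and offered at a given unit price.
def demand(price):
    return max(0.0, 1000.0 - 20.0 * price)


def supply(price):
    return 30.0 * price


price = equilibrium_price(demand, supply)
print(f"equilibrium price is about {price:.2f}")
```

With these made-up curves the two sides balance where 1000 - 20p = 30p, that is, at a price of 20 per CPU-hour. A real grid market would have to estimate demand and supply from observed behavior rather than from closed-form curves.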

Auction
In this model a single seller negotiates with multiple consumers with the seller trying to obtain the best price as in a real-world auction.
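
Two of the sealed-bid variants listed earlier differ only in what the winner pays, which a small sketch can make concrete. The function name, the bidder names, and the bid values below are hypothetical; the first-price and Vickrey (second-price) rules themselves are the standard ones.

```python
def sealed_bid_auction(bids, second_price=False):
    """Return (winner, price) for a sealed-bid auction.

    bids: dict mapping bidder name to bid amount.
    second_price=False: first-price rule, the winner pays their own bid.
    second_price=True:  Vickrey rule, the winner pays the second-highest bid.
    Ties are broken in favour of the bidder listed first.
    """
    if not bids:
        raise ValueError("at least one bid is required")
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    winner, top_bid = ranked[0]
    if second_price and len(ranked) > 1:
        return winner, ranked[1][1]
    return winner, top_bid


# Hypothetical bids for one hour on a shared cluster.
bids = {"alice": 12.0, "bob": 15.0, "carol": 9.0}
print(sealed_bid_auction(bids))                     # first-price rule
print(sealed_bid_auction(bids, second_price=True))  # Vickrey rule
```

Under both rules the highest bidder wins; only the price differs. The Vickrey rule is often of interest because paying the second-highest bid removes the incentive to bid below one's true valuation.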

Tendering

This model is based on the contracting/bidding process used by businesses to govern the exchange of goods and services. In this case, a user submits a request for computing resources and the sellers submit bids in an attempt to win the contract.
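
The tendering process can be sketched as a simple selection over the bids received. In this minimal example the `Tender` record, the `award_contract` function, and the deadline criterion are all assumptions introduced for illustration; a real system could weigh many other factors, such as reputation or data locality.

```python
from dataclasses import dataclass


@dataclass
class Tender:
    provider: str
    price: float   # total price quoted for the job
    hours: float   # completion time promised by the provider


def award_contract(tenders, deadline_hours):
    """Pick the cheapest tender that meets the deadline, or None if none does."""
    eligible = [t for t in tenders if t.hours <= deadline_hours]
    if not eligible:
        return None
    return min(eligible, key=lambda t: t.price)


# Hypothetical responses to one user's request for computing resources.
tenders = [
    Tender("site-x", 500.0, 30.0),
    Tender("site-y", 650.0, 20.0),
    Tender("site-z", 800.0, 10.0),
]
winner = award_contract(tenders, deadline_hours=24.0)
print(winner)
```

Here the cheapest tender misses the 24-hour deadline, so the contract goes to the next cheapest provider that can meet it.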

Proportional resource sharing

In this model users are allocated credits that they use to access resources. Resources are allocated to users based on the value of their bid in relation to the bids of other users.
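
The allocation rule described above can be written in a few lines: each user receives a fraction of the resource equal to their bid divided by the sum of all bids. The function name, the user names, and the credit amounts are hypothetical.

```python
def proportional_shares(bids, capacity):
    """Split `capacity` among users in proportion to their bids.

    bids: dict mapping user name to the credits they bid.
    capacity: total amount of the resource on offer (e.g. CPU-hours).
    """
    total = sum(bids.values())
    if total <= 0:
        raise ValueError("the bids must sum to a positive amount")
    return {user: capacity * bid / total for user, bid in bids.items()}


# Hypothetical bids, in credits, for 100 CPU-hours.
shares = proportional_shares({"lab-a": 10, "lab-b": 30}, capacity=100)
print(shares)
```

A user who bids three times as many credits as another receives three times the share, and no user is shut out entirely as long as their bid is nonzero.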

Development issues

Grid computing has had many hurdles to overcome on its way to becoming a viable field.

The development of the Internet is the primary factor that has made grid computing possible. Without a network such as the Internet, these disparate computational resources could not be connected to each other to make the process possible. The Internet also serves as the super-highway connecting idle machines whose CPU time can then be used as part of the World Wide Grid (WWG).

In line with the development of the Internet, the development of open, standard networking protocols such as TCP/IP, and of the Java language, which is fast becoming the language of network computing, has enabled the interoperability of applications across geographically disparate, heterogeneous systems. This is another critical factor that helps make grid computing possible. In addition, the advent of secure networking protocols such as SSL and of large-key encryption makes users more confident about outsourcing their large computing jobs to resource providers on the grid.

Before the advent of the Internet, machines could not be efficiently networked together to create the World Wide Grid that is now developing. This grid is now being used for projects such as SETI@home and FightAIDS@Home.

Before the advent of a networked language such as Java, interoperability of applications across computers on this emerging WWG would have been impossible. Given that most machines either have a Java Virtual Machine (JVM), or can easily download one, writing applications that will run on almost any platform is much easier.

Commoditization of personal computers has made them ubiquitous in society. There is now a trend away from proprietary supercomputers toward those based on commodity hardware and software components.[6] This has led to research in cluster technology by industry stalwarts such as IBM and Sun Microsystems, which have poured millions of dollars into such projects.

Had these hurdles not been overcome (the lack of a distributed network to connect machines together, and the lack of open standards and a common language like Java), the synthesis between computation and biology for use in drug discovery would not be possible. Projects such as the Anthrax research screening could not be completed in such a short time, even on today's massively parallel supercomputers. The development of applications, especially graphics applications, has also contributed significantly to making drug modeling possible.

Another hurdle that has to be overcome, and which I deal with at length in this paper, concerns the allocation, management, and scheduling of computing resources across the grid. This capability is missing from most of the grid systems currently in use. It is an important hurdle because, for the field to become a commercial success, it needs sound economic underpinnings that will compel users to make use of the grid and incentivize suppliers to provide resources for it. Only now is a viable marketplace model being developed to regulate resource demand and supply in computational grid systems.

Development Horizon

The synthesis between computing and biology for use in drug discovery has been long in the making. The Internet was first developed in the 1960s as ARPANET, but it was not until the 1990s that it became the information superhighway that we now know it to be. Similarly, the PC only came into its own in the late 1980s, and it became a mass-market commodity item only in the late 1990s. Thus one can say that this synthesis has become possible only after 30 years of development of the Internet and the PC industry, as well as advances in networking protocols, processing power, and software applications.

The current research that marries computing and economics for resource allocation in computational grid systems has a much shorter history. It is the advent of grid computing itself, together with the lack of a viable economic model, that is driving the development of a market-based resource management architecture and framework. Work on this synthesis has been under way for about 2 years and is still ongoing.

Drivers of the syntheses

Drug companies are the main drivers behind the use of grid computing to shorten screening times in drug discovery. These companies wish to reduce their costs, and grid computing appears to be a viable, cheaper alternative to purchasing a massively parallel machine or renting expensive time on one. Companies such as Celera Genomics (in partnership with grid computing providers Parabon and Platform Computing), Entelos[7] (a disease modeling firm), GlaxoSmithKline[8], Merck, and Johnson & Johnson[9] are all spearheading efforts to use grid computing technology for high-performance computing in various capacities in the pharmaceutical industry. The federal government is also playing a large role: it has announced plans for the National Science Foundation to help fund and develop TeraGrid[10], which would provide massive computing resources as a utility over the Internet. TeraGrid is being developed in cooperation with IBM, Oracle, Qwest, and Intel, to name a few. It is easy to envisage institutions like the National Institutes of Health (NIH) using the resources of TeraGrid in the research and development of vaccines for diseases such as HIV/AIDS.

In the case of resource allocation and management for grid computation, the work is mainly being done in university research labs at this point. One of the most promising systems in this area is being developed by a group of researchers at Monash University in Australia. The system is named GRACE – Grid Architecture for Computational Economy[11]. This resource management infrastructure can be used with existing grid computing systems such as Nimrod/G, Globus and others to provide an economically efficient means of allocation of resources on such systems. The architecture of the GRACE system consists of the following components, as described above:

A resource broker that can act on behalf of a user (Nimrod/G). The role of the resource broker is to interact with the grid and try to fulfill the user's request for resources. Nimrod/G manages applications and computational grids based on one of the economic models described previously. It performs resource discovery, uses the Grid Trading Server to trade resources, manages data, and provides an interface for the user to interact with the grid.

The Nimrod/G resource broker architecture is illustrated in [12] (figure not reproduced here).

Resource trading protocols for different economic models
Grid Market Directory – a mediator for negotiating between users and resource providers. It corresponds to the Grid middleware services layer mentioned above.
Grid Trading Server – This software carries out the transaction between the broker acting on behalf of the user and the resource provider.
Specification of Pricing Policy – This identifies which economic model is to be used for trading of resources.
Accounting and payment systems
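
To make the division of labor among these components more concrete, here is a deliberately simplified sketch of what a broker might do once a directory has returned a set of posted-price offers: buy from the cheapest providers first until the job is covered or the user's budget runs out. This is not the Nimrod/G or GRACE API. The `Offer` record, the `broker_allocate` function, the budget constraint, and the greedy strategy are all assumptions invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    provider: str
    price_per_cpu_hour: float   # assumed to be strictly positive
    cpu_hours_available: float


def broker_allocate(offers, cpu_hours_needed, budget):
    """Greedily buy CPU-hours from the cheapest offers within a budget.

    Returns (plan, spent, remaining): the list of (provider, cpu_hours)
    purchases, the total cost, and the CPU-hours still uncovered.
    """
    plan = []
    spent = 0.0
    remaining = cpu_hours_needed
    for offer in sorted(offers, key=lambda o: o.price_per_cpu_hour):
        affordable = (budget - spent) / offer.price_per_cpu_hour
        take = min(remaining, offer.cpu_hours_available, affordable)
        if take <= 0:
            break   # the job is covered or the budget is exhausted
        plan.append((offer.provider, take))
        spent += take * offer.price_per_cpu_hour
        remaining -= take
    return plan, spent, remaining


# Hypothetical offers, as a market directory might list them.
offers = [
    Offer("site-a", 2.0, 10.0),
    Offer("site-b", 1.0, 5.0),
    Offer("site-c", 4.0, 100.0),
]
plan, spent, remaining = broker_allocate(offers, cpu_hours_needed=20.0, budget=40.0)
print(plan, spent, remaining)
```

In this example the budget runs out before the job is fully covered, which is the kind of outcome a broker would have to report back to the user or resolve by renegotiating. The accounting and payment step would then record `spent` against the user's account.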

The GRACE system is being used in the development of the DesignDrug@Home application, which is a "virtual laboratory for Molecular Modeling for Drug Design" on a peer-to-peer grid[13]. This system can be used for drug discovery screening and it is the culmination of both syntheses I have discussed in this paper - synthesis of computation and biology in drug discovery, and use of market economy models in the allocation of resources in the peer-to-peer grid. It is not commercially applicable, but it provides a framework or reference for the development of real-world applications.

Evaluation of syntheses

In the area of drug discovery and research, grid computing is here to stay. It has already proved itself as a cheap alternative to more expensive supercomputers in providing massively parallel computing power for this area. The Anthrax screening project discussed earlier is a prime example. The project was judged a success because it drastically reduced the time the screening would otherwise have taken. It was reported that without grid computing, the process would have taken years instead of the 24 days that it took.[14]

Another example of the power of grid computing comes from considering how long it would take to screen 200,000 compounds as part of a drug design. Assuming each job takes 2 hours, the screening would take a total of 400,000 hours, or more than 45 years, on a single machine. A cluster-based supercomputer with 64 nodes would take just under a year. A large-scale grid of supercomputers would solve the problem in a day. However, one can use a WWG model of peer-to-peer machines to solve the problem in a few hours.[15]
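
The arithmetic behind these estimates is easy to check. The short sketch below assumes perfect parallelism with no scheduling or communication overhead, which is an idealization; the function names are made up for this example.

```python
import math

HOURS_PER_YEAR = 24 * 365


def wall_clock_hours(jobs, hours_per_job, workers):
    """Ideal elapsed time, assuming perfect parallelism and no overhead."""
    return jobs * hours_per_job / workers


def workers_needed(jobs, hours_per_job, target_hours):
    """Smallest number of workers that finishes within `target_hours`."""
    return math.ceil(jobs * hours_per_job / target_hours)


total_hours = wall_clock_hours(200_000, 2, workers=1)
years_on_one_machine = total_hours / HOURS_PER_YEAR
days_on_64_nodes = wall_clock_hours(200_000, 2, workers=64) / 24
machines_for_one_day = workers_needed(200_000, 2, target_hours=24)

print(f"{total_hours:,.0f} hours in total")
print(f"{years_on_one_machine:.1f} years on one machine")
print(f"{days_on_64_nodes:.0f} days on a 64-node cluster")
print(f"{machines_for_one_day:,} machines to finish in one day")
```

Under these assumptions, finishing in a single day takes on the order of seventeen thousand machines running in parallel, which gives a sense of why a peer-to-peer grid of idle PCs is attractive for this kind of workload.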

There are numerous volunteer projects involving grid computing in drug discovery, including FightAIDS@Home, computeagainstcancer.org, and the Intel-United Devices Cancer Research Project. While none of these projects has yet found a cure for these deadly diseases, they provide valuable computing power that would be impossible to obtain otherwise.

Other efforts involve molecular docking, protein synthesis, and so on. In this regard, the synthesis is already a success and will continue to be so. Test beds outside drug discovery include SETI@home, NASA IPG, EcoGrid, and Gusto.

The jury is still out as to which model for allocating resources in a grid framework will eventually prevail. However, it seems that some such model will have to be adopted for grid computing to become ubiquitous and to provide resources on tap. Assuming that the Internet becomes a utility, the laws of supply and demand will have to hold sway so that these resources are allocated in the most efficient manner. Thus, one can say that a computational economy is necessary for the survival of the WWG. In the meantime, the results in the laboratory are very promising.

Future of Grid Computing

Grid computing appears to be a potentially lucrative industry. This can be seen from the field of entrants, many of whom are industry leaders, among them IBM, Sun Microsystems, and Intel. All of these companies have grid computing initiatives into which they are pouring millions of research dollars. For example, Sun already offers a Grid Engine software product to the life sciences market. The life sciences computing market alone was worth a staggering $2.54 billion, of which grid computing accounted for a mere 3%. The market is forecast to grow to $4 billion by 2007[16]. Under a conservative assumption of 25% market penetration by grid computing, this makes it a $1 billion market in life sciences alone. Thus there appears to be a real market opportunity for grid computing. It lowers costs for companies, levels the playing field, and provides supercomputing power to companies that would otherwise be unable to afford it.
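
The market figures above reduce to two multiplications, sketched here for reference. The 25% penetration rate is the assumption stated in the text; the function and variable names are made up for this example.

```python
def market_share(market_size, share):
    """Revenue attributable to a given fractional share of a market."""
    return market_size * share


current_grid = market_share(2.54e9, 0.03)    # grid's slice of the current market
projected_grid = market_share(4.0e9, 0.25)   # assumed 25% of the 2007 forecast
growth_factor = projected_grid / current_grid

print(f"current grid revenue:   ${current_grid / 1e6:,.1f} million")
print(f"projected grid revenue: ${projected_grid / 1e9:,.1f} billion")
print(f"implied growth factor:  about {growth_factor:.0f}x")
```

Seen this way, the projection implies that grid computing revenue in the life sciences would have to grow by more than an order of magnitude, so the conclusion is sensitive to the assumed penetration rate.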

Conclusions

It seems a safe bet that grid computing will become the primary means of achieving high-performance computing, assuming current trends continue and the research continues to yield promising results. The development of the TeraGrid by the NSF will only serve to accelerate and enhance this trend. (Recall that it was investment by the US Government that enabled the development of the Internet.)

This grid infrastructure will also use economics-based management systems to manage the resources of the grid and to facilitate the exchange of resources between its providers and users. It is also a safe bet that once grid systems such as TeraGrid become widely available and accessible, easy-to-use, scalable end-user applications will be developed and become prominent. The impact of a World Wide Grid will be to the 21st century what the electric grid was to the 20th century: it will provide high-performance computation capability 'on tap'. We are witnessing only the beginnings of this worldwide revolution in computing, which will shape the future of the 21st century.
