Olufemi
Anthony
[email protected]
4/6/02
Grid computing provides one of the most compelling examples of technological
synthesis to emerge in the computer science and engineering field for some
time. Computational grids and peer-to-peer systems are being used to solve
large-scale problems in business, science, and engineering. Problems such as
drug discovery and intractable mathematical problems are being tackled with
grid computing. The best-known real-world example is SETI@home, the Search for
Extra-Terrestrial Intelligence project, which runs on my home computer as a
screensaver whenever my terminal is idle.
Description
Grid computing involves harnessing the computational power of disparate,
heterogeneous, geographically separated computer systems to solve large
problems. These systems often differ in many ways, including:
Platform: e.g. PCs, workstations, supercomputers
Hardware resources: CPU, I/O, memory
Software applications
The grid is an architecture that encompasses the following elements:
There are two aspects of synthesis I would like to highlight in this paper. One
involves what one might call an extraneous or enabling synthesis, whereby grid
computing is used to enhance another field of endeavor. Grid computing is
proving to be extremely valuable in the field of drug discovery and biological
computation. It is enhancing the field of bioinformatics and is helping to
reduce the computation time needed for drug discovery screening by many orders
of magnitude.
One real-world example of this synthesis was the screening phase of the Anthrax
Research Project, a collaborative effort among United Devices (a grid
computing company), the Department of Defense, and IBM. Using grid computing,
participating members used their computers to help screen 3.57 billion
molecules in the search for a drug to treat advanced-stage anthrax.[1]
The process was completed in 24 days.
I also discuss an externally influenced synthesis, whereby grid computing
is itself influenced and enhanced by another field. This synthesis is a new
paradigm that marries economics to computation.
Managing the resources in a grid system while satisfying the needs of multiple,
similarly disparate, geographically dispersed users is a non-trivial task.
There are various models for grid resource management architecture including[2]:
The current systems for resource management and scheduling use conventional
models in which jobs are scheduled based on simple cost functions. These cost
functions focus on system parameters such as throughput and CPU utilization
rather than on the utility of application processing. For example, in such
systems resources have a constant cost, regardless of time and availability.
Such systems, while suitable for small to midsize dedicated grids, turn out to
be inadequate for large, Internet-scale grids. This is because at such a scale
the grid becomes a utility and users have to compete with each other for
resources. In such a scenario, a purely centralized, system-centric view does
not allocate scarce resources in the most efficient manner, since the grid is
itself a highly decentralized and distributed system with many heterogeneous
parameters. A promising model for resource management on large grids comes
from the field of economics, using a market-based approach. The problem is
aptly modeled in a market-economy framework because it involves demand for and
supply of limited resources under various constraints.
There are many benefits to economy-based resource management including the
following: [3]
In the computational market framework, the resource owners are modeled as
producers/suppliers and the resource users as consumers, and their interactions
follow the economic laws of supply and demand. A user/consumer is in
competition with other users, and a resource owner with other resource owners.
However, the resources being exchanged are in the computational domain.
An architecture for grid computational economy consists of the following elements:
The basic architecture for such an economy is shown below [4]:

Brief Description of Grid Computing Economic Models
Some of the economic models that are being investigated for managing resources
in a grid computing environment include[5]:
Of these, the commodity market,
auction, tendering and proportional resource sharing models are the ones that
are being most widely researched.
Commodity Market
In this model, services are supplied at the market equilibrium price based on
laws of supply and demand.
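As a minimal sketch of how such an equilibrium price could be found, the Python
snippet below bisects for the price at which demand equals supply. The linear
demand and supply curves, their coefficients, and all function names are
illustrative assumptions and are not taken from any of the cited systems.

```python
def equilibrium_price(demand, supply, low, high, tol=1e-6):
    """Bisect for the price at which demand(p) equals supply(p).

    Assumes demand falls and supply rises as the price increases, and that
    the crossing point lies between `low` and `high`.
    """
    while high - low > tol:
        mid = (low + high) / 2
        if demand(mid) > supply(mid):
            low = mid    # excess demand: the price must rise
        else:
            high = mid   # excess supply: the price must fall
    return (low + high) / 2


# Hypothetical curves: CPU-hours as a function of the price per CPU-hour.
def demand(price):
    return max(0.0, 1000 - 100 * price)


def supply(price):
    return 150 * price


price = equilibrium_price(demand, supply, 0.0, 10.0)
print(f"equilibrium price: {price:.2f}, CPU-hours traded: {supply(price):.0f}")
```

With these assumed curves the two sides balance at a price of about 4 per
CPU-hour, where roughly 600 CPU-hours are both demanded and supplied.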
Auction
In this model a single seller negotiates with multiple consumers with the
seller trying to obtain the best price as in a real-world auction.
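One simple way to picture this is a sealed-bid, first-price auction with a
reserve price, sketched below in Python. The choice of auction form, the
reserve price, and the names used here are assumptions for illustration; the
models under research also cover other auction forms.

```python
def run_auction(bids, reserve_price=0.0):
    """Pick the highest bidder for a resource.

    `bids` maps a consumer name to the price offered. Returns a
    (winner, price) tuple, or None if no bid reaches the reserve price.
    """
    eligible = {user: bid for user, bid in bids.items() if bid >= reserve_price}
    if not eligible:
        return None
    winner = max(eligible, key=eligible.get)
    return winner, eligible[winner]


# Hypothetical bids for one hour on a cluster node.
result = run_auction({"alice": 3.0, "bob": 5.5, "carol": 4.0}, reserve_price=2.0)
print(result)
```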
Tendering
This model is based on the
contracting/bidding process used by businesses to govern the exchange of goods
and services. In this case, a user submits a request for computing resources
and the sellers submit bids in an attempt to win the contract.
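The tendering process can be sketched as follows. In this hedged Python
example the user's request is reduced to a maximum price and a deadline, and
the contract goes to the cheapest bid that satisfies both. The `Bid` fields
and the selection rule are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Bid:
    provider: str   # name of the resource owner submitting the bid
    price: float    # total price quoted for the job
    hours: float    # promised completion time in hours


def award_contract(bids, max_price, deadline_hours):
    """Return the cheapest bid within budget and deadline, or None."""
    acceptable = [
        b for b in bids if b.price <= max_price and b.hours <= deadline_hours
    ]
    if not acceptable:
        return None
    return min(acceptable, key=lambda b: b.price)


# Hypothetical bids submitted in response to one request.
bids = [
    Bid("alpha", price=120.0, hours=10.0),
    Bid("beta", price=90.0, hours=20.0),
    Bid("gamma", price=60.0, hours=48.0),   # cheapest, but misses the deadline
]
winner = award_contract(bids, max_price=100.0, deadline_hours=24.0)
print(winner)
```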
Proportional resource sharing
In this model users are allocated
credits that they use to access resources. Resources are allocated to users
based on the value of their bid in relation to the bids of other users.
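A minimal sketch of this rule, assuming that each user's share of a resource
equals that user's bid divided by the sum of all bids, is shown below in
Python. The function name and the sample credit values are hypothetical.

```python
def proportional_shares(bids, capacity):
    """Split `capacity` among users in proportion to their bids.

    `bids` maps a user name to the credits that user has bid.
    """
    total = sum(bids.values())
    if total == 0:
        return {user: 0.0 for user in bids}
    return {user: capacity * bid / total for user, bid in bids.items()}


# Hypothetical example: 100 CPU-hours shared between two bidders.
shares = proportional_shares({"alice": 10, "bob": 30}, capacity=100)
print(shares)
```

Here a user who bids three times as many credits as another receives three
times the share, and the shares always sum to the available capacity.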
Development issues
Grid computing has had many hurdles to overcome on its way to becoming a viable
field.
The development of the Internet is the primary factor that has made grid
computing possible. Without a network such as the Internet, these disparate
computational resources could not be connected to each other. The Internet
also serves as the super-highway connecting idle machines whose CPU time can
then be utilized as part of the World Wide Grid (WWG).
In line with the development of the Internet, the development of open, standard
networking protocols such as TCP/IP, and of the Java language, which is fast
becoming the language of network computing, has enabled the interoperability of
applications across geographically disparate, heterogeneous systems. This is
another critical factor that helps make grid computing possible. Also, the
advent of secure networking protocols such as SSL and of large-key encryption
makes users more confident about outsourcing their large computing jobs to
resource providers on the grid.
Before the advent of the Internet, machines could not be efficiently networked
together to create the World Wide Grid that is now developing. This grid is now
being used for projects such as SETI and Fight Against AIDS, among others.
Before the advent of a networked language such as Java, interoperability of
applications across computers on this emerging WWG would have been impossible.
Given that most machines either have a Java Virtual Machine (JVM) or can
easily download one, writing applications that will run on almost any platform
is much easier.
The commoditization of personal computers has made them ubiquitous in society.
There is now a trend to move away from proprietary supercomputers to those
based on commodity hardware and software components.[6]
This has led to research in cluster technology by industry stalwarts such as
IBM and Sun Microsystems, who have poured millions of dollars into such
projects.
Without overcoming the hurdles above (the lack of a distributed network to
connect machines together, and the lack of open standards and a common language
like Java), the synthesis between computation and biology for use in drug
discovery would not be possible. Projects such as the Anthrax research
screening could not be done in so short a time, even on the massively parallel
supercomputers of today. Also, the development of applications, especially
graphics applications, has contributed significantly to making drug modeling
possible.
Another hurdle that has to be overcome, and which I deal with at length in this
paper, is the allocation, management and scheduling of computing resources
across the grid. This capability is missing in most of the grid systems
currently in use. It is an important hurdle because, for the field to become a
commercial success, it has to have sound economic underpinnings that will
compel users to make use of the grid and give suppliers an incentive to provide
resources for it. It is only now that a viable marketplace model is being
developed for the regulation of resource demand and supply in computational
grid systems.
Development Horizon
The synthesis between computing and biology for use in drug discovery has been
long in the making. The Internet was first developed in the 1960s as ARPANET,
but it was not until the 1990s that it became the information superhighway that
we now know it to be. Similarly, the PC only came into its own in the late
1980s and became a mass-market commodity item only in the late 1990s. Thus one
can say that this synthesis has become possible only after 30 years of
development of the Internet and PC industry, as well as advances in networking
protocols, processing power and software applications.
The current research that marries computing and economics for resource
allocation in computational grid systems has a much shorter history. It is the
advent of grid computing itself, together with the lack of a viable economic
model for it, that is driving the development of a market-based resource
management architecture and framework. Work on this synthesis has been under
way for about 2 years and is still ongoing.
Drivers of the syntheses
Drug companies are the main drivers behind the use of grid computing to shorten
the screening times for drug discovery. These companies wish to reduce their
costs, and grid computing seems to be a viable, cheaper alternative to renting
expensive computer time on a massively parallel machine or purchasing one.
Companies such as Celera Genomics (in partnership with grid computing providers
Parabon and Platform Computing), Entelos[7], a disease modeling firm,
GlaxoSmithKline[8], Merck, and Johnson & Johnson[9] are all spearheading
efforts to use grid computing technology for high-performance computing in
various capacities in the pharmaceutical industry. The federal government is
also playing a large role, since it announced plans for the National Science
Foundation to help fund and develop TeraGrid[10], which would provide massive
computing resources as a utility over the Internet. TeraGrid is being developed
in cooperation with IBM, Oracle, Qwest, and Intel, to name a few. It is easy to
envisage institutions like the National Institutes of Health (NIH) utilizing
the resources of TeraGrid in the research and development of vaccines for
diseases such as HIV/AIDS.
In the case of resource allocation and management for grid computation, the
work is mainly being done in university research labs at this point. One of the
most promising systems in this area is being developed by a group of
researchers at Monash University in Australia. The system is named GRACE (Grid
Architecture for Computational Economy)[11].
This resource management infrastructure can be used with existing grid
computing systems such as Nimrod/G, Globus and others to provide an
economically efficient means of allocating resources on such systems. The
architecture of the GRACE system consists of the following components, as
described above:
The Nimrod/G resource broker architecture is illustrated below: [12]

The GRACE system is being used in the development of the DesignDrug@Home
application, which is a "virtual laboratory for Molecular Modeling for Drug
Design" on a peer-to-peer grid[13]. This system can be used for drug discovery
screening, and it is the culmination of both syntheses I have discussed in this
paper: the synthesis of computation and biology in drug discovery, and the use
of market economy models in the allocation of resources in the peer-to-peer
grid. It is not commercially applicable, but it provides a framework or
reference for the development of real-world applications.
Evaluation of syntheses
In the area of drug discovery and research, grid computing is here to stay. It
has already proved itself as a cheap alternative to more expensive
supercomputers in providing massively parallel computing power for this area.
The Anthrax screening project discussed earlier is a prime example. It was
judged a success because it drastically reduced the time the screening would
have taken had grid computing not been used. It was reported that without grid
computing the process would have taken years instead of the 24 days that it
took.[14]
Another example of the power of grid computing comes from considering how long
it would take to screen 200,000 compounds as part of a drug design effort.
Assuming each job takes 2 hours, screening them one after another on a single
machine would take a total of 400,000 hours, or around 45 years. A
cluster-based supercomputer with 64 nodes would take just under a year. A
large-scale grid of supercomputers would solve the problem in a day. However,
one can use a WWG model of peer-to-peer machines to solve the problem
in a few hours.[15]
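The arithmetic behind these estimates can be checked with a short Python
sketch. It assumes perfect parallel speedup with no network or scheduling
overhead, which real grids will not achieve, and the three-hour target is only
an illustrative stand-in for "a few hours". The function names are hypothetical.

```python
import math

HOURS_PER_YEAR = 24 * 365

COMPOUNDS = 200_000
HOURS_PER_JOB = 2
TOTAL_HOURS = COMPOUNDS * HOURS_PER_JOB   # 400,000 hours of computation


def wall_clock_hours(processors):
    """Elapsed time if the jobs are spread evenly over `processors` machines."""
    return TOTAL_HOURS / processors


def processors_needed(target_hours):
    """Smallest number of machines that finishes within `target_hours`."""
    return math.ceil(TOTAL_HOURS / target_hours)


serial_years = wall_clock_hours(1) / HOURS_PER_YEAR   # single machine
cluster_days = wall_clock_hours(64) / 24              # 64-node cluster

print(f"single machine: {serial_years:.1f} years")
print(f"64-node cluster: {cluster_days:.0f} days")
print(f"machines to finish in one day: {processors_needed(24)}")
print(f"machines to finish in three hours: {processors_needed(3)}")
```

Under these assumptions a single machine needs between 45 and 46 years, the
64-node cluster about 260 days, and finishing within a day takes on the order
of 17,000 machines, which is why a volunteer grid of many thousands of PCs is
needed to reach the "few hours" range.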
There are numerous volunteer projects involving grid computing in drug
discovery, including FightAIDS@HOME, computeagainstcancer.org, and the
Intel-United Devices Cancer Research Project. While none of these projects has
yet found a cure for these deadly diseases, they provide valuable computing
power that would be impossible to obtain otherwise.
Other efforts involve molecular docking, protein synthesis and so on. Thus in
this regard, the synthesis is already a success and will continue to be so.
Test beds outside drug discovery include SETI, NASA IPG, EcoGrid and GUSTO.
The jury is still out as to which model for allocating resources in a grid
framework will eventually prevail. However, it seems that some such model will
have to be adopted for grid computing to become ubiquitous and to provide
resources on tap. Assuming that the Internet becomes a utility, the laws of
supply and demand will have to hold sway so that these resources are allocated
in the most efficient manner. Thus, one can say that a computational economy is
necessary for the survival of the WWG. In the meantime, the results in the
laboratory are very promising.
Future of Grid Computing
Grid computing seems to be a potentially lucrative industry. This can be seen
from the field of entrants, many of whom are leading industry lights. Among
them are IBM, Sun Microsystems, and Intel. All these companies have grid
computing initiatives into which they are pouring millions of research dollars.
For example, Sun already has a Grid Engine software product that it offers to
the life sciences market. The life sciences computing market alone was a
staggering $2.54 billion, of which grid computing accounted for a mere 3%. The
market is forecast to grow to $4 billion by 2007[16].
Under a conservative assumption of 25% market penetration by grid computation,
this makes it a $1 billion market in life sciences alone! Thus there seems to
be a real market opportunity for grid computing. It lowers costs for
companies, levels the playing field and provides supercomputing power to
companies who would otherwise be unable to afford it.
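The market figures above reduce to two multiplications, sketched here in
Python. The 25% penetration figure is the assumption stated in the text, not a
forecast from the cited article, and the variable names are illustrative.

```python
def segment_size(market_dollars, share_percent):
    """Dollar value of a segment holding `share_percent` of a market."""
    return market_dollars * share_percent / 100


CURRENT_MARKET = 2_540_000_000    # life sciences computing market today
FORECAST_MARKET = 4_000_000_000   # forecast size of that market in 2007

grid_today = segment_size(CURRENT_MARKET, 3)     # grid's current 3% share
grid_2007 = segment_size(FORECAST_MARKET, 25)    # assumed 25% penetration

print(f"grid computing today: ${grid_today / 1e6:.1f} million")
print(f"grid computing in 2007: ${grid_2007 / 1e9:.1f} billion")
```

On these numbers grid computing would grow from roughly $76 million to $1
billion within life sciences, about a thirteenfold increase.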
Conclusions
It seems like a safe bet to assert that grid computing will become the primary
means of achieving high-performance computing, assuming current trends continue
and the research continues to yield promising results. The development of the
TeraGrid by the NSF will only serve to accelerate and enhance this trend.
(Recall that it was the investment by the US Government that enabled the
development of the Internet.)
Also, this grid infrastructure will utilize economy-based management systems to
manage the resources of the grid and facilitate the exchange of resources
between providers and users of the grid. It is also a safe bet that once grid
systems such as TeraGrid become widely available and accessible, easy-to-use,
scalable end-user applications will be developed and become prominent. The
impact of a World Wide Grid will be to the 21st century what the electric grid
was to the 20th century: it will provide high-performance computation
capability 'on tap'. We are only witnessing the beginnings of this worldwide
revolution in computing that will shape the future of the 21st century.
[1] Anthrax Research Project - http://members.ud.com/projects/anthrax
[2] Rajkumar Buyya, Steve Chapin, David DiNucci: Architectural Models for Resource Management in the Grid - http://www.csse.monash.edu.au/~rajkumar/papers/gridmodels.pdf
[3] Rajkumar Buyya, David Abramson, Jonathan Giddy: Economy Driven Resource Management Architecture for Global Computational Power Grids - http://www.csse.monash.edu.au/~rajkumar/papers/GridEconomy.pdf
[4] Rajkumar Buyya: GRid Meets Economics - http://www.csse.monash.edu.au/~rajkumar/talks/euroglobus.ppt
[5] Rajkumar Buyya, David Abramson, Jonathan Giddy: Economic Models for Resource Management and Scheduling in Grid Computing - http://www.csse.monash.edu.au/~rajkumar/papers/emodelsgrid.pdf
[6] R. Buyya, S. Vazhkudai: Compute Power Market: Towards a Market-Oriented Grid - http://www.buyya.com/papers/cdm.pdf
[7] Grid Computing Planet: Grid Computing To Power Disease Simulation Project - http://www.gridcomputingplanet.com/news/article/0,,3281_948561,00.html
[8] CIO Insight: Grid Computing Gathers Speed - http://www.cioinsight.com/article/0,3658,s%253D301%2526a%253D20213,00.asp
[9] Business 2.0, Dec 2001: Give it to a Grid - http://www.business2.com/articles/mag/0,1640,35005,FF.html
[10] NSF TeraGrid Project: TeraGrid Website - http://www.teragrid.org
[11] Rajkumar Buyya, David Abramson, Jonathan Giddy: Economy Driven Resource Management Architecture for Global Computational Power Grids - http://www.csse.monash.edu.au/~rajkumar/papers/GridEconomy.pdf
[12] ibid 4.
[13] ibid 4.
[14] ibid 1.
[15] Rajkumar Buyya, Kim Branson, David Abramson, Jonathan Giddy: The Virtual Laboratory: A Toolset to Enable Distributed Molecular Modeling for Drug Design on the World-Wide Grid
[16] IT Vendors Hyping Grid and Distributed Computing Still Find Pharma a Hard Sell - http://www.cognigencorp.com/content/aboutus/articles/bioinform_article.html