The Spread of the Code-Red Worm (CRv2)

On July 19, 2001 more than 359,000 computers were infected with the Code-Red (CRv2) worm in less than 14 hours. At the peak of the infection frenzy, more than 2,000 new hosts were infected each minute. 43% of all infected hosts were in the United States, while 11% originated in Korea followed by 5% in China and 4% in Taiwan. The .NET Top Level Domain (TLD) accounted for 19% of all compromised machines, followed by .COM with 14% and .EDU with 2%. We also observed 136 (0.04%) .MIL and 213 (0.05%) .GOV hosts infected by the worm. An animation of the geographic expansion of the worm is available.


Background and Methodology

At around 10:00 UTC on the morning of July 19th, 2001, a random seed variant of the Code-Red worm (CRv2) began to infect hosts running unpatched versions of Microsoft's IIS webserver. The worm spreads by probing random IP addresses and infecting all hosts vulnerable to the IIS exploit. As noted by others, there are at least two variants of the worm: one that uses a fixed, static seed for its random number generator, and another that uses a random seed. This paper examines the characteristics of the random seed variant of Code-Red (CRv2).

This analysis covers the spread of the worm during the 24-hour period beginning July 19th at midnight UTC. The data used for this preliminary study were collected from two locations: a /8 network at UCSD and two /16 networks at Lawrence Berkeley Laboratory (LBL). Two types of data from the UCSD network are used to maximize coverage of the expansion of the worm. Between midnight and 16:30 UTC, a passive network monitor recorded headers of all packets destined for the /8 research network. After 16:30 UTC, a filter installed on a campus router to reduce congestion caused by the worm blocked all external traffic to this network. Because this filter was put into place upstream of the monitor, we were unable to capture IP packet headers after 16:30 UTC. However, a second UCSD data set consisting of sampled netflow output from the filtering router was available at the UCSD site throughout the 24-hour period. Vern Paxson provided probe information collected by Bro on the LBL networks between 10:00 UTC on July 19th and 7:00 on July 20th.

Host Infection Rate

We observed more than 359,000 unique infected hosts in this 24-hour period. Hosts were considered to be infected if they sent TCP SYN packets on port 80 to nonexistent hosts on these networks. To determine the rate of host infection, we recorded the time of each infected host's first attempt to spread the worm. Because the data represent a sample of all probes sent by infected machines, the number of hosts monitored provides a lower bound on the number of hosts that have been compromised at any given point in time. Figure 1 shows the number of infected hosts over time as monitored on the UCSD and LBL networks. The growth curve of the hosts measured with passively tapped packet headers is steepest because the monitor sees all packets destined for approximately 1/256th of the address space. The curve based on the netflow data grows more slowly because the netflow data represents a sampling of all traffic, and some flow export packets were dropped before being recorded. The LBL curve increases more slowly because although Bro monitors all packets, it observes a much smaller portion of the address space. Assuming random selection of addresses probed by infected hosts, a larger address space will be probed more often than a smaller one. All of the horizontal (no growth) segments on the curves are caused by periods in which data could not be collected.
Figure 1
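
The effect of monitored address space on detection can be made concrete with a short calculation. The sketch below assumes that each probe targets a uniformly random IPv4 address, which is a simplification of the worm's actual address generator. The function name and the probe counts are illustrative and are not taken from the measurements.

```python
# Fraction of the IPv4 address space covered by each monitor.
IPV4_SPACE = 2 ** 32
UCSD_FRACTION = 2 ** 24 / IPV4_SPACE       # one /8  -> 1/256
LBL_FRACTION = 2 * 2 ** 16 / IPV4_SPACE    # two /16 -> 1/32768


def detection_probability(fraction: float, probes: int) -> float:
    """Chance that at least one of `probes` uniformly random probes
    lands inside a monitored block covering `fraction` of the space."""
    return 1.0 - (1.0 - fraction) ** probes


if __name__ == "__main__":
    # Hypothetical numbers of probes sent by a single infected host.
    for probes in (100, 1_000, 10_000, 100_000):
        ucsd = detection_probability(UCSD_FRACTION, probes)
        lbl = detection_probability(LBL_FRACTION, probes)
        print(f"{probes:>7} probes: /8 {ucsd:.3f}   two /16s {lbl:.3f}")
```

Under this assumption, a host must send far more probes before the two /16 networks are likely to see it than before the /8 network does, which is consistent with the LBL curve lagging the UCSD curve.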

Figure 2 shows the number of infected hosts over time across all three data sources. The growth of the curve between 11:00 and 16:30 UTC is exponential, as can be seen in the logarithmic scale plot (Figure 3). On the surface, it seems to fit reasonably with the growth model for the worm infection proposed by Stuart Staniford. Discrepancies between the upper ranges of the growth model and our data are caused both by the fixed cutoff time of the worm itself and by hosts repaired or isolated throughout the day.
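
Growth that is exponential at first and then levels off is the behavior of a simple logistic (epidemic) model, a common way to describe random-scanning spread. The sketch below is illustrative only: the rate constant, the midpoint, and the size of the vulnerable population are assumed values chosen for demonstration. They are not fitted to the data and are not taken from Staniford's analysis.

```python
import math


def logistic_infected(t_hours: float, k: float, t_mid: float, population: int) -> float:
    """Infected hosts at time t under dN/dt = k * N * (1 - N / population)."""
    return population / (1.0 + math.exp(-k * (t_hours - t_mid)))


if __name__ == "__main__":
    # All three parameters are illustrative assumptions, not fitted values.
    K = 1.5              # growth rate per hour
    T_MID = 15.0         # hour (UTC) at which half the population is infected
    POPULATION = 360_000

    for hour in range(9, 24, 2):
        n = logistic_infected(hour, K, T_MID, POPULATION)
        # Early on, log10(n) rises by a constant amount per hour, which
        # appears as a straight segment on a logarithmic-scale plot.
        print(f"{hour:02d}:00 UTC  infected ~{n:>9.0f}  log10 = {math.log10(n):.2f}")
```

In such a model the curve bends away from the exponential as the pool of uninfected vulnerable hosts shrinks. A fixed cutoff time or hosts being repaired during the day would pull observed counts below the model's upper range.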

Figure 4 provides a more detailed view of the spread of the worm in terms of the number of new hosts seen in one-minute periods throughout the day. Unfortunately, the peak of the initial curve occurs at about the same time that the passive monitor data became unavailable, so the duration of the 2,000 host/minute infection rate is unknown. The large spike of 7,700 hosts is an anomaly caused by a small gap in the collected netflow data: all hosts infected during the downtime were counted at the moment collection resumed. The spike therefore represents all the hosts infected between 16:51 and 17:21 UTC. We believe that the actual infection rate from 16:30 to 18:00 UTC tapered smoothly.
Figure 4
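
The per-minute counts, and the way a collection gap produces an artificial spike, can be illustrated with a small sketch. The record format (a timestamp in seconds and a source address) and the function names are hypothetical. The actual processing of the packet headers, netflow output, and Bro logs is not described at this level of detail.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Probe = Tuple[int, str]  # (timestamp in seconds, source IP); hypothetical format


def first_seen(probes: Iterable[Probe]) -> Dict[str, int]:
    """Map each source address to the time of its earliest observed probe."""
    earliest: Dict[str, int] = {}
    for ts, src in probes:
        if src not in earliest or ts < earliest[src]:
            earliest[src] = ts
    return earliest


def new_hosts_per_minute(probes: Iterable[Probe]) -> Dict[int, int]:
    """Count hosts by the minute in which they were first observed."""
    return dict(Counter(ts // 60 for ts in first_seen(probes).values()))


if __name__ == "__main__":
    # Nothing is recorded during minutes 1-2 (a collection gap). Hosts "a"
    # and "b" may have been infected during the gap, but their first
    # *observed* probes land in minute 3, alongside host "c".
    observed = [(10, "x"), (50, "x"), (185, "a"), (190, "b"), (200, "c")]
    print(new_hosts_per_minute(observed))  # {0: 1, 3: 3}
```

Because a host is counted in the minute of its first observed probe, every host infected while collection was down is attributed to the minute in which collection resumed.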

Again, 359,104 hosts were compromised in approximately 13 hours. Although the growth was slowing, had the worm not been programmed to stop spreading at midnight, additional hosts would have been compromised. The infection rate would have continued to decrease once the vast majority of vulnerable machines were infected. We speculate that the memory resident status of this worm would have allowed reinfection of many hosts.

Host Deactivation

During the course of the day, many initially infected machines were patched, rebooted, or filtered and consequently ceased to probe networks for vulnerable hosts. Figure 5 shows the total number of inactive hosts over time. The majority of hosts stopped probing in the last hour before midnight UTC on July 20th. At midnight, the worm was programmed to switch from an "infection phase" to an "attack phase", so the large rise in host inactivity is due to this design. The end of day phase change can be seen clearly in Figure 6, which shows the number of newly inactive hosts per minute. As in previous graphs, the spike near 16:30 is due to a gap in data collection.

Host Characterization

To understand the attributes of hosts compromised by Code-Red (other than the fact that they were running unpatched versions of IIS on Microsoft Windows platforms), we examined the domains, geographic locations, and top level domains (TLDs) of the infected hosts. Table 1 shows the breakdown of hosts by country, as placed by ipmapper. Surprisingly, Korea is the second most prevalent source country of compromised machines, with 10.57% of all infected hosts.

Top 10 Countries
Country           #      %
--------------------------
US           157694  43.91
KR            37948  10.57
CN            18141   5.05
TW            15124   4.21
CA            12469   3.47
UK            11918   3.32
DE            11762   3.28
AU             8587   2.39
JP             8282   2.31
NL             7771   2.16

Table 2 provides a breakdown of infected machines by TLD. NET, COM, and EDU are all represented in proportions roughly equivalent to their share of all existing hosts. We also observed that 136 MIL and 213 GOV hosts were infected by the worm. Approximately 50% of all infected hosts had no reverse DNS records, so they could not be classified by their domain names. These included, for example, the 390 hosts with addresses in 10.0.0.0/8. These machines were probably on private networks and were infected via either an external interface or another machine accessible via both internal and external networks. This suggests that many more hosts on internal networks may have been compromised.

Top 10 TLDs
TLD               #      %
--------------------------
Unknown      169584  47.22
net           67486  18.79
com           51740  14.41
edu            8495   2.37
tw             7150   1.99
jp             4770   1.33
ca             4003   1.11
it             3076   0.86
fr             2677   0.75
nl             2633   0.73
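
The TLD classification described above can be sketched as follows. This is a minimal illustration that assumes reverse DNS lookups have already been performed. The function names and sample hosts are hypothetical, and the study's own tooling may have worked differently.

```python
import ipaddress
from typing import Optional

PRIVATE_10 = ipaddress.ip_network("10.0.0.0/8")


def tld_of(reverse_name: Optional[str]) -> str:
    """Return the TLD of a reverse DNS name, or "Unknown" if there is none."""
    name = reverse_name.rstrip(".") if reverse_name else ""
    if not name:
        return "Unknown"
    # The TLD is the last dot-separated label of the name.
    return name.rsplit(".", 1)[-1].lower()


def is_in_10_slash_8(ip: str) -> bool:
    """True if the address falls inside the 10.0.0.0/8 private block."""
    return ipaddress.ip_address(ip) in PRIVATE_10


if __name__ == "__main__":
    # Hypothetical (address, reverse DNS name) pairs.
    hosts = [("192.0.2.10", "host.example.net"), ("10.1.2.3", None)]
    for ip, name in hosts:
        print(ip, tld_of(name), "private" if is_in_10_slash_8(ip) else "public")
```

Hosts without a reverse DNS record fall into the "Unknown" row of the table, which is why addresses in 10.0.0.0/8 appear there.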

Table 3 shows the individual domains with the most infected hosts. Note that the top domain names are providers of home and small business connectivity, suggesting that hosts maintained by individuals at home are an important aspect of global Internet health.

Top 10 Domains
Domain                      #      %
------------------------------------
Unknown                169584  47.22
home.com                10610   2.95
rr.com                   5862   1.63
t-dialin.net             5514   1.54
pacbell.net              3937   1.10
uu.net                   3653   1.02
aol.com                  3595   1.00
hinet.net                3491   0.97
net.tw                   3401   0.95
edu.tw                   2942   0.82

Animations

To help get a feel for the global impact of the worm and its growth, we created an animation of the geographic spread of the worm in five-minute intervals between midnight UTC on July 19, 2001 and midnight UTC on July 20, 2001. For the animation, infected hosts were mapped to latitude and longitude values using ipmapper, and aggregated by the number at each unique location. The radius of each circle is sized relative to the infected hosts mapped to the center of the circle using the formula 1+ln(total-infected-hosts). When smaller circles are obscured by larger circles, their totals are not combined with the larger circle; the smaller data points are hidden from view. Although attempts were made to identify the geographic location of each host as accurately as possible, in many cases, the granularity of the location was limited to the country of origin. These hosts are plotted at the center of their respective countries. Thus, the rapidly expanding central regions of most countries are an artifact of the localization method.
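
The circle-sizing rule can be sketched in a few lines. The coordinates below are invented for illustration and the function name is hypothetical; only the formula 1+ln(total-infected-hosts) comes from the description above.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple

Location = Tuple[float, float]  # (latitude, longitude)


def circle_radii(host_locations: Iterable[Location]) -> Dict[Location, float]:
    """Aggregate hosts per unique location and size each circle as 1 + ln(n)."""
    totals = Counter(host_locations)
    return {loc: 1.0 + math.log(n) for loc, n in totals.items()}


if __name__ == "__main__":
    # Invented sample: three hosts at one location, one host at another.
    locations = [(37.0, -122.0)] * 3 + [(51.5, 0.0)]
    for loc, radius in circle_radii(locations).items():
        print(loc, round(radius, 3))
```

Because the radius grows logarithmically, a location with 1,000 hosts is drawn at only about eight times the radius of a location with a single host, which keeps densely infected locations from dominating the map.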

Flipbook animation of geographic spread of Code-Red worm (Preferred format 207k .fli)
Note: The recommended way to view the flipbook format is to use xanim on a Unix platform, or QuickTime Player 5 on Macintosh and Windows boxes. Use the "open URL" feature of a QuickTime player and paste in the URL.

Conclusions

The primary observation to make about the Code-Red worm is the speed at which a malicious exploit of a ubiquitous software bug can incapacitate host machines. In particular, physical and geographical boundaries are meaningless in the face of a virulent attack. In less than 14 hours, 359,104 hosts were compromised. The global Internet community dodged a bullet with the Code-Red worm: little damage was actually inflicted in the attack. The worm did no significant damage to the machines it infected. It had a preset cutoff time. Although it attempted to launch a Denial of Service (DoS) attack against www1.whitehouse.gov, it orchestrated the attack against the IP address of the server, rather than the domain name, and actually checked to make sure that port 80 at the whitehouse.gov IP address was active before launching the denial of service phase of the attack. These features made it trivially easy to disable the Denial of Service (phase 2) portion of the attack. We cannot expect such weaknesses in the design of future attacks.

This assault also demonstrates that machines operated by home users or small businesses (hosts less likely to be maintained by a professional sysadmin) are integral to the robustness of the global Internet. As is the case with biologically active pathogens, vulnerable hosts can and do put everyone at risk, regardless of the significance of their role in the population.

The Code-Red worm is a wake-up call. This exploit demonstrates clearly the need to keep machines up-to-date with security developments. This exploit also underscores the need to back up critical systems; the worm could easily have corrupted data, reformatted hard drives, or caused other irreparable damage. Indeed, in the final analysis, we should all be uncomfortable with the extent to which luck, rather than proactive diligence, maintains the stability of the Internet infrastructure.

Code-Red also provides the Internet community a chance to test its response to a virulent security threat with minimal long-term damage. There was, however, some unexpected collateral damage to infrastructure: printers, routers, switches, DSL modems, and other devices with web interfaces crashed, rebooted, or were otherwise damaged by the worm's probes. We should assess our response to the attack: How quickly and reliably can we disseminate news about the threat? How quickly can infected hosts be located, isolated, and repaired? In the case of the Code-Red worm, even windowsupdate.microsoft.com was infected, and many hosts were re-infected during attempts to patch them.

Finally, we should all be concerned that it seems to take a global, catastrophic incident to motivate us to respond to a known threat. The exploit was discovered on June 18, 2001, and the first version of the Code-Red worm emerged on July 12th, 2001. The truly virulent strain of the worm began to spread on July 19th, a full 31 days after the initial discovery of the exploit and four days after the detection of the first (static seed) attack. In the future, we cannot afford to remain complacent in the face of such blatant warnings.
