(As an aside: if you didn't peruse it when you saw the link on a page from the last lesson, you might be interested in this tutorial: COMPUTER NETWORKING BASICS. It is pretty good, and if you are interested in knowing a little more about networking, it's well worth pursuing.)
We all understand somewhat how Internet applications work: (many of you will probably be familiar with only e-mail and The Web)
And indeed, this is how almost all Internet interchanges happen. What we want to look at in this section, though, is the first and second parts of that process. (You will be playing with the third part in the exercise associated with this lesson.) How does one system on the Internet find another one? The answer is something called an IP address (alternate definition): a unique identifying number assigned to a particular host that is connected to the Internet. (This describes the current scheme, IPv4. Address allocation under it has moved to a scheme called CIDR, Classless Inter-Domain Routing, and the next generation of the IP protocol, IPv6, uses longer addresses.) An IPv4 address is simply a four-part number; the parts are separated by periods and each can take a value between 0 and 255, e.g. the IP address for your MSU Blackboard site is 147.133.1.23. These IP addresses are the real way to find your path on the Internet and are, as the name implies, part of the IP protocol.
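To make the "4 part number" concrete, here is a minimal Python sketch using the standard `ipaddress` module. The helper name `describe_ipv4` is made up for illustration; it shows that the dotted form is just a readable way of writing a single 32-bit number.

```python
import ipaddress


def describe_ipv4(text):
    """Return the four parts of a dotted address and the single number behind it."""
    # IPv4Address raises ValueError if a part is missing or outside 0-255.
    address = ipaddress.IPv4Address(text)
    parts = [int(part) for part in text.split(".")]
    return parts, int(address)


parts, number = describe_ipv4("147.133.1.23")
print(parts)  # the four parts as integers
# The same address as one number: each part is a base-256 "digit".
print(number == 147 * 256**3 + 133 * 256**2 + 1 * 256 + 23)
```

An address such as 147.133.1.256 would be rejected, because 256 does not fit in one part.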
So what about all those funny addresses and names on the Internet? "After all", you might ask, "I don't get to Yahoo by going to some strange 4 part number, but by simply going to www.yahoo.com." (Actually you could use a funny number to get to Yahoo. One of the funny numbers that will get you to Yahoo is 216.115.108.245. We will use it later in your exercise.) The next question, then, is how the Internet names of sites and companies get assigned, and once assigned, how they are related to those IP addresses that are the real addressing system on the Internet.
First of all, we should start using proper terminology. The core part of any Internet site name is called the "domain name" for that site, and this is associated with a certain IP address range called the "domain". The whole process of assigning IP addresses and also assigning domain names is managed by an organization called InterNIC (alternate definition). I'm sure that as a user of the Internet you will have noticed that most site names end in one of about 5 ways:
These and other possible such endings are called top-level domains (alternate definition). (There have been a couple of pushes over recent years to construct more abbreviations for these top-level domains.) When someone wants to start a new Internet service, the first thing they must do is register a domain name for themselves. There are many companies today that offer domain name registration services. In order to complete the registration process, though, you must have a place to "park", as they say, your domain name. What this means is that every domain must have an associated IP address or address range; after all, domain names are only mnemonics for the underlying IP addresses they point to. (FAQ about domain name registration)
There are many stories about domain name theft, domain name rights and domain name squatting. Such controversies are inevitable since the domain name is the public face of whatever organization it represents, and if an organization has an established word, name or phrase that represents it (or that it might have trademarked), it will want to own any domain name that is associated with this word or phrase.
The domain name is only a part of the address needed to identify a particular host on the Internet; you also need what is called the hostname. Since the domain, and its associated domain name, can describe a whole network and all of the network services available in that network, the hostname is the part which identifies a particular host or Internet service in the domain. Thus, to uniquely identify the particular Internet host, server or service you are looking for, you need what is called a fully qualified domain name (FQDN), with a hostname, domain name and top-level domain name all put together, e.g. "www.yahoo.com".
As you can see from the example, the parts of an FQDN are strung together with periods separating them, just like an IP address. (In Internet lingo the whole FQDN is sometimes referred to as the "hostname", which, if you consider the Internet as one big network, would be a correct usage of the word.)
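The way an FQDN breaks down can be sketched in a few lines of Python. The function name `split_fqdn` is invented for this illustration, and it assumes the simple hostname.domain.tld shape used in the example; names with more elaborate structure would need more care.

```python
def split_fqdn(fqdn):
    """Split a simple FQDN into (hostname, domain name, top-level domain)."""
    labels = fqdn.rstrip(".").split(".")
    if len(labels) < 3:
        raise ValueError("expected at least hostname.domain.tld")
    hostname = labels[0]             # e.g. "www"
    domain_name = ".".join(labels[1:])  # e.g. "yahoo.com"
    top_level = labels[-1]           # e.g. "com"
    return hostname, domain_name, top_level


print(split_fqdn("www.yahoo.com"))
```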
Now we are getting close to putting it all together. We know that computers are really identified on the Internet by their IP address, but that users generally reference them using their hostname (we will start using the word "hostname" from here forward to refer to an FQDN on the Internet, since the only network we will be talking about from here on out is the Internet). So how are the two things related? There obviously must be some system somewhere that translates hostnames into IP addresses. Indeed there is such a system, and it is called the DNS (alternate definition) ("Domain Name System", or, if you are referring to a particular machine which does that translation, DNS can stand for Domain Name Server). So each domain, and all its associated hostnames, must be hosted and entered into a domain name server somewhere for it to be accessible on the Internet. Not all domains run their own domain name server, though. For instance, if I rent some space with an Internet Service Provider (ISP) and have this ISP host my domain, they will use their system's DNS to host my domain and many others as well. However, a large organization like MSU runs its own DNS, as do all such organizations that run and manage their own network hardware.
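The hostname-to-address translation can be tried from Python, whose standard `socket.gethostbyname` asks the system's resolver (and so, normally, a DNS server) for the answer. The wrapper name `resolve` and its `lookup` parameter are assumptions made for this sketch; the parameter only exists so the function can be exercised without a network connection.

```python
import socket


def resolve(hostname, lookup=socket.gethostbyname):
    """Translate a hostname into a dotted IPv4 address, or None if it cannot be found."""
    try:
        return lookup(hostname)
    except OSError:  # socket.gaierror (lookup failure) is a subclass of OSError
        return None
```

Calling `resolve("www.yahoo.com")` on a connected machine should return one of Yahoo's current addresses; the exact number will vary from place to place and over time.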
We are now only missing one small piece of this network naming and addressing system, and that is the concept of a network port. Just as IP addresses are a built-in addressing scheme for the IP layer of the Internet protocol stack, ports are an addressing scheme for the next protocol layer, the TCP layer. If an IP address specifies a unique host computer system, then the finer gradations of ports must specify some subsystem of that computer. Specifically, these ports are logical (in computer lingo this just means they are virtual, not physical, ports) access points into separate network services that the host system makes available to the TCP/IP protocol stack (i.e. the Internet). So various network applications on the specified host system are said to "listen" on a particular port; this means they are waiting for some other host system on the Internet to request whatever type of service that application provides.
Maybe you are wondering: if these ports are an essential part of how we access TCP/IP (i.e. Internet) applications and services, how come I don't need to specify one when I access a service like www.yahoo.com's web pages? The answer is that for a large number of Internet services everyone has agreed to use a specific port number, and most Internet clients, like your web browser, know and use these ports by default if the user does not specify a different one. So, unbeknownst to you, your Internet client software has been quietly adding the appropriate port numbers to the hostnames you specify whenever you left them out. Here is a short list of the most common of these accepted port numbers:
| Internet Application Protocol | Port Number |
|---|---|
| FTP - File Transfer Protocol | 21 |
| TELNET | 23 |
| SMTP - Simple Mail Transfer Protocol | 25 |
| Gopher | 70 |
| Finger | 79 |
| HTTP - Hyper Text Transfer Protocol - Used for WWW | 80 |
| POP3 - Post Office Protocol | 110 |
| NNTP - Network News Transfer Protocol - Net News | 119 |
| HTTPS - Secure HyperText Transfer Protocol - Used for WWW | 443 |
(In a section titled "WELL KNOWN PORT NUMBERS", RFC 1700 has a fairly extensive listing of commonly accepted port numbers.) All these standardized port numbers are below 1024 (that is, in the range 0 to 1023), so system and Internet service administrators should use numbers of 1024 or above when assigning their own port numbers to services.
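What a client does when you leave the port out can be sketched roughly as follows. This is only an illustration under stated assumptions: the dictionary holds a handful of the entries from the table above, and the names `WELL_KNOWN_PORTS`, `with_default_port` and `is_well_known` are invented here, not part of any real client.

```python
# A few of the agreed-upon ports from the table above.
WELL_KNOWN_PORTS = {
    "ftp": 21,
    "telnet": 23,
    "smtp": 25,
    "http": 80,
    "pop3": 110,
    "https": 443,
}


def with_default_port(protocol, hostname, port=None):
    """Return (hostname, port), filling in the agreed port if none was given."""
    if port is None:
        port = WELL_KNOWN_PORTS[protocol.lower()]
    return hostname, port


def is_well_known(port):
    """True if the port falls in the standardized range below 1024."""
    return 0 <= port < 1024


print(with_default_port("http", "www.yahoo.com"))        # port filled in for you
print(with_default_port("http", "www.yahoo.com", 8080))  # an explicit port wins
print(is_well_known(8080))
```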
Now we know how to address a specific host (via hostname or IP address) and any network services that host offers (via the appropriate port number). The next step would be to access specific data, information or other resources available through that network service, e.g. we want to access a specific web page from a web server running on a certain host machine. The addressing format used to find individual pieces of information or resources is called a URL, Uniform Resource Locator (quite aptly named, wouldn't you say?). So, the URL is not actually a way to address a host but rather a way to locate a particular resource on that host. In fact the hostname is only a small part of what makes up a URL. (You should peruse these two links: 1. A Beginner's Guide to URLs, 2. Web Naming and Addressing Overview (URIs, URLs, ...).)
The most basic structure of a URL is Scheme:Source, where the scheme is what we might call the protocol or service being referenced for the resource, and the source is some reference to the location of that resource on the host system. The most common schemes are:
The parts of the source vary considerably depending upon the scheme type and the resource being sought. For two of the most common protocols the source looks as follows:
Many of these parts are optional and proper URLs come in a wide variety of shapes and sizes combining any number of these parts. Some examples would be:
source part,
only the hostname is specified. This still works because (as we will learn more about later when we discuss web sites and web servers) most servers have a default document, commonly called index.html, that they will try to serve as the default page if none other is specified. So, this URL is typically equivalent to http://www.yahoo.com/index.html
searchpart of the URL (Can you look up, or guess, which country this site is hosted in from its URL?)
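The pieces of a URL can be pulled apart with Python's standard `urllib.parse.urlsplit`. In this sketch the helper name `url_parts` is invented, and the second URL (www.example.com on port 8080 with a search part) is a made-up example, not a real site from this lesson.

```python
from urllib.parse import urlsplit


def url_parts(url):
    """Return the main pieces of a URL as a dictionary."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,      # protocol or service, e.g. "http"
        "hostname": parts.hostname,  # the host to contact
        "port": parts.port,          # None when the URL does not name a port
        "path": parts.path,          # location of the resource on the host
        "query": parts.query,        # the searchpart after "?", if any
    }


print(url_parts("http://www.yahoo.com/index.html"))
print(url_parts("http://www.example.com:8080/search?q=ports"))
```

When the port comes back as `None`, the client is expected to fall back on the agreed port for the scheme.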
TCP (the Transmission Control Protocol) carries most Internet application protocols, and two of the oldest of these are Telnet and FTP. What does this mean for us? Well, a telnet client does little more than open a TCP connection to a host and port and pass along whatever you type, and the other application protocols stack on top of TCP in the same way (see the diagrams of the Internet network model from last lesson). Therefore we can use the telnet client to talk to other server applications (via their appropriate port number, of course) running on a host machine.
We will do just this in the exercise associated with this lesson. We will use telnet to issue commands, through a network port, to the application that handles the HTTP protocol, the HTTP server (otherwise known as a web server). We will in essence become an HTTP client (better known as a web browser) and interact directly with the web server over the network. (An exercise for you to use some FTP commands will come next week.)
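For comparison with the telnet session in the exercise, the same idea can be sketched in Python with a plain TCP socket: connect to the web server's port and send the text of a request by hand. The function names `build_get_request` and `fetch_raw` are invented for this sketch, and the request shown is a minimal HTTP/1.1 GET rather than everything a real browser would send.

```python
import socket


def build_get_request(hostname, path="/"):
    """Build the text of a minimal HTTP/1.1 GET request, as you would type it in telnet."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {hostname}",   # HTTP/1.1 requires the Host header
        "Connection: close",   # ask the server to hang up when it is done
        "",                    # blank line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch_raw(hostname, port=80, path="/", timeout=10):
    """Send the request over a TCP connection and return the raw bytes of the reply."""
    with socket.create_connection((hostname, port), timeout=timeout) as conn:
        conn.sendall(build_get_request(hostname, path))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

On a connected machine, `fetch_raw("www.yahoo.com")` should return the server's response headers followed by whatever body it chooses to send.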
If, while you are doing the exercise, you feel a bit like a hacker, you are right in some respects. The ability to talk directly to application protocols on their network ports, with tools like telnet and FTP, is a major source of vulnerability in networked systems, and many attacks on networked systems reach their targets through exposed network services like these. The recent scare about the viruses attacking Microsoft web servers was a worm which propagated itself over the network in this same way.
As mentioned above, the protocol that is used to communicate between web client and web server is the Hypertext Transfer Protocol, HTTP (and its more paranoid and secure cousin, HTTPS). The version of this protocol described here is version 1.1, as specified in RFC 2616. (Read and understand at least sections 1.4, 3.3.1 (you'll need these standard data formats for properly making good web pages), 4.1-4.3, 5.1-5.3, 6.1.1 and 9.3-9.6.)
This protocol specifies the steps that make up a Web transaction between client and server and how they both will exchange information. Most of the information they exchange via this protocol is in the form of statements in what is called the "header" of the transaction. The protocol provides, via these headers, the capabilities for both the client and the server to send information about themselves and the resources they are going to transmit or request.
The basic steps of an HTTP transaction are:
The 2 most common HTTP request methods (request methods are the ways a client asks for resources from a server, see section 5.1 of RFC 2616) are GET and POST. (N.b. all web designers specify one of these methods or the other whenever they make a form in an HTML page. We will discuss the difference between these 2 methods later when we discuss forms and server-side scripts.) These are the 2 request methods that return web pages to a web browser. Two other common methods are:
HEAD: identical to GET, except that the server returns only the response headers and no message body (section 9.4 of RFC 2616).
PUT: asks the server to store the data sent with the request under the requested URL (section 9.6 of RFC 2616).
When the server issues a response, the first thing it does is issue a response code which tells of the success or failure of the request. This is called the Status-Code. The first digit of the Status-Code defines the class of response. There are 5 values for the first digit:
You can see all of the possible status codes listed in section 6.1.1 of RFC 2616 and all of section 10 of that RFC is dedicated to defining these status codes in detail.
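The five classes named in section 6.1.1 of RFC 2616 can be written down as a small lookup. The names `STATUS_CLASSES` and `status_class` are made up for this sketch.

```python
# The five response classes from section 6.1.1 of RFC 2616, keyed by first digit.
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def status_class(status_code):
    """Return the class of response for a three-digit status code."""
    first_digit = int(status_code) // 100
    if first_digit not in STATUS_CLASSES:
        raise ValueError(f"not a valid HTTP status code: {status_code}")
    return STATUS_CLASSES[first_digit]


print(status_class(200))  # the usual code for a page that was found
print(status_class(404))  # the familiar "Not Found"
```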
Data about the server, the client, the resource the client is requesting, and the resource the server is returning are all embedded in the HTTP header. This data is sent in various HTTP header fields. There are too many possible header fields to list here, but you can see a number of them (both client and server) from a little script I have which returns all of the HTTP header fields that the script sees from both client and server. An important one that is not in this list is the "Last-Modified" header field. All of the possible header fields are discussed in RFC 2616.
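To see how the status line and header fields sit in a response, here is a rough Python sketch that splits the top of a raw response into its pieces. The function name `parse_response_head` is invented, and the sample response text is made up for illustration; a real server would send many more fields.

```python
def parse_response_head(raw):
    """Split raw response text into (version, status code, reason, header fields)."""
    # Headers end at the first blank line; everything after it is the body.
    head, _, _body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Field names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers


# A made-up response, shaped like what a web server sends back.
sample = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "Last-Modified: Mon, 01 Jan 2001 00:00:00 GMT\r\n"
    "\r\n"
    "<html></html>"
)
print(parse_response_head(sample))
```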
Most Web designers rarely see HTTP because the web server, which delivers their pages, and the user's web client, which views their pages, handle all of the HTTP transparently in the background to properly deliver the page. However, even a simple web page designer can benefit from knowing it, since there are ways to embed HTTP into a web page to force certain actions from the web server and web client when they handle your page. Web programmers, on the other hand, can't always depend upon the client and server to assist in their transactions, and so they must sometimes send HTTP commands directly as part of their scripts.
ping www.yahoo.com . I use this all the time if I have network trouble; it's a simple, reliable tool that will allow me to check my network connections without a complex client application (like a web browser) between me and the Internet.
© Copyright Scott A. Wymer, 2001. All rights reserved.