A protocol is the set of rules that
governs communication on a network. A protocol defines standardized formats
for data packets, techniques for detecting and correcting errors, and so on.
To understand
the concept of a communication protocol, let us assume that A and B need to
talk to one another. They want to exchange their ideas. But it turns out
that both A and B are egoists. They start talking simultaneously,
then pause for breath simultaneously, and then start talking again. Now imagine
the confusion and chaos. To avoid it, they must follow a set of rules while
talking. For instance, A must talk first and then give B a
chance to put forward his or her ideas, and so on. This common set of rules
would be known as the communication protocol for A and B.
Thus, for
effective use of a network, its participants must follow a standardized
protocol. Various protocols are used in various types of networks, for example
TCP/IP, HTTP and SMTP. We will discuss some of these protocols in the
following sections.
4.1 TCP/IP
TCP/IP
(Transmission Control Protocol / Internet Protocol) is the main suite of
protocols used for the Internet. This set of protocols includes TCP, IP,
HTTP, FTP, PPP, and many others.
TCP/IP
was designed as an open standard, to be capable of implementation on all
types of hardware and software systems.
TCP adds
a great deal of functionality to the IP service it is layered over:
·
Streams. TCP data is organized as a
stream of bytes, much like a file. The datagram nature of the network is
concealed. A mechanism (the Urgent Pointer) exists to let
out-of-band data be specially flagged.
·
Reliable delivery. Sequence numbers are used to
coordinate which data has been transmitted and received. TCP will arrange
for retransmission if it determines that data has been lost.
·
Network adaptation. TCP will dynamically learn the
delay characteristics of a network and adjust its operation to maximize
throughput without overloading the network.
·
Flow control. TCP manages data buffers, and
coordinates traffic so its buffers will never overflow. Fast senders will
be stopped periodically so that slower receivers can keep up.
Full-duplex Operation
No
matter what the particular application, TCP almost always operates
full-duplex. The algorithms described below operate in both directions, in
an almost completely independent manner. It's sometimes useful to think of
a TCP session as two independent byte streams, traveling in opposite
directions. No TCP mechanism exists to associate data in the forward and
reverse byte streams. Only during connection start and close sequences can
TCP exhibit asymmetric behavior (i.e. data transfer in the forward
direction but not in the reverse, or vice versa).
Sequence Numbers
TCP uses
a 32-bit sequence number that counts bytes in the data stream.
Each TCP packet contains the starting sequence number of the data in that
packet, and the sequence number (called the acknowledgment number)
of the next byte expected from the remote peer. With this information, a
sliding-window protocol is implemented. Forward and reverse sequence
numbers are completely independent, and each TCP peer must track both its
own sequence numbering and the numbering being used by the remote peer.
TCP uses
a number of control flags to manage the connection. Some of these flags
pertain to a single packet, such as the URG flag indicating valid data in
the Urgent Pointer field, but two flags (SYN and FIN),
require reliable delivery as they mark the beginning and end of the data
stream. In order to ensure reliable delivery of these two flags, they are
assigned spots in the sequence number space. Each flag occupies a single
byte of that space.
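The bookkeeping described above can be sketched in a few lines of Python. This is only an illustration, not TCP itself: the function name `next_seq` and its parameters are assumptions made for this example. It advances a 32-bit sequence number by the payload length, adds one for each SYN or FIN flag carried by the segment, and wraps around at 2^32.

```python
SEQ_SPACE = 2 ** 32  # sequence numbers are 32 bits wide and wrap around


def next_seq(seq: int, payload_len: int, syn: bool = False, fin: bool = False) -> int:
    """Return the sequence number that follows a segment starting at `seq`.

    Every data byte consumes one sequence number, and so does each
    SYN or FIN flag carried by the segment.
    """
    consumed = payload_len + int(syn) + int(fin)
    return (seq + consumed) % SEQ_SPACE


# Hypothetical connection: initial sequence number 1000, then 512 data bytes.
after_syn = next_seq(1000, 0, syn=True)
after_data = next_seq(after_syn, 512)
print(after_syn, after_data)
```

Because the SYN consumes a sequence number, the first data byte is numbered one past the initial sequence number.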
Window Size and Buffering
Each
endpoint of a TCP connection will have a buffer for storing data that is
transmitted over the network before the application is ready to read the
data. This lets network transfers take place while applications are busy
with other processing, improving overall performance.
To avoid
overflowing the buffer, TCP sets a Window Size field in each
packet it transmits. This field contains the amount of data that may be
transmitted into the buffer. If this number falls to zero, the remote TCP
can send no more data. It must wait until buffer space becomes available
and it receives a packet announcing a non-zero window size.
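The interplay between the buffer and the advertised window can be modelled with a toy class. This is a simplified sketch under stated assumptions: `ReceiveBuffer` and its methods are hypothetical names, and the model counts bytes only, ignoring sequence numbers and real packets.

```python
class ReceiveBuffer:
    """Toy model of a TCP receive buffer that advertises its free space."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.used = 0

    @property
    def window(self) -> int:
        # The advertised window is the free space left in the buffer.
        return self.capacity - self.used

    def receive(self, nbytes: int) -> int:
        """Accept at most `window` bytes from the network; return how many fit."""
        accepted = min(nbytes, self.window)
        self.used += accepted
        return accepted

    def read(self, nbytes: int) -> int:
        """The application consumes data, which frees buffer space."""
        consumed = min(nbytes, self.used)
        self.used -= consumed
        return consumed


buf = ReceiveBuffer(capacity=10)
buf.receive(15)    # only 10 bytes fit, so the window drops to zero
print(buf.window)
buf.read(4)        # the application reads 4 bytes, reopening the window
print(buf.window)
```

Once the window reaches zero the sender has to wait, exactly as described above, until the application reads some data and a non-zero window is announced again.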
Sometimes,
the buffer space is too small. This happens when the network's
bandwidth-delay product exceeds the buffer size. The simplest solution is
to increase the buffer, but for extreme cases the protocol itself becomes
the bottleneck (because it doesn't support a large enough Window Size).
Under these conditions, the network is termed an LFN (Long Fat Network -
pronounced elephant).
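A short calculation makes the bandwidth-delay product concrete. The sketch below assumes the classic 16-bit Window Size field, which caps the window at 65,535 bytes (a detail not stated in the text above); the function names and the example link figures are illustrative assumptions.

```python
MAX_WINDOW = 65535  # largest value a 16-bit Window Size field can hold


def bandwidth_delay_product(bits_per_second: float, rtt_seconds: float) -> float:
    """Bytes that must be in flight to keep the link busy for one round trip."""
    return bits_per_second * rtt_seconds / 8


def max_throughput(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound in bytes per second: at most one window per round trip."""
    return window_bytes / rtt_seconds


def is_lfn(bits_per_second: float, rtt_seconds: float, window_bytes: int = MAX_WINDOW) -> bool:
    """True when the bandwidth-delay product exceeds the available window."""
    return bandwidth_delay_product(bits_per_second, rtt_seconds) > window_bytes


# Hypothetical link: 100 Mbit/s with a 100 ms round-trip time.
print(bandwidth_delay_product(100e6, 0.1))  # about 1.25 million bytes
print(max_throughput(MAX_WINDOW, 0.1))      # about 655 thousand bytes per second
print(is_lfn(100e6, 0.1))
```

With these figures the link could carry roughly 1.25 MB per round trip, but a 65,535-byte window lets the sender use only about a twentieth of it.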
Round-Trip Time Estimation
When a
host transmits a TCP packet to its peer, it must wait a period of time for
an acknowledgment. If the reply does not come within the expected period,
the packet is assumed to have been lost and the data is retransmitted. The
obvious question - How long do we wait? - lacks a simple answer. Over an
Ethernet, no more than a few microseconds should be needed for a reply. If
the traffic must flow over the wide-area Internet, a second or two might be
reasonable during peak utilization times. If we're talking to an instrument
package on a satellite hurtling toward Mars, minutes might be required
before a reply. There is no one answer to the question - How long?
All
modern TCP implementations seek to answer this question by monitoring the
normal exchange of data packets and developing an estimate of how long is
"too long". This process is called Round-Trip Time (RTT)
estimation. RTT estimates are one of the most important performance
parameters in a TCP exchange, especially when you consider that on an
indefinitely large transfer, all TCP implementations eventually
drop packets and retransmit them, no matter how good the quality of the
link. If the RTT estimate is too low, packets are retransmitted
unnecessarily; if too high, the connection can sit idle while the host
waits to timeout.
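The text does not say which estimation algorithm is meant. One widely used approach is the smoothed estimator recommended in RFC 6298, sketched below. The class name `RttEstimator` is an assumption for this example, and the sketch leaves out the clock-granularity term and the minimum timeout that the RFC also specifies.

```python
class RttEstimator:
    """Smoothed RTT estimator in the style of RFC 6298 (simplified)."""

    ALPHA = 1 / 8  # weight of a new sample in the smoothed RTT
    BETA = 1 / 4   # weight of a new sample in the RTT variation

    def __init__(self) -> None:
        self.srtt = None    # smoothed round-trip time, in seconds
        self.rttvar = None  # smoothed mean deviation of the RTT

    def update(self, sample: float) -> float:
        """Feed one measured RTT and return the new retransmission timeout."""
        if self.srtt is None:
            # First measurement: initialise both estimates from the sample.
            self.srtt = sample
            self.rttvar = sample / 2
        else:
            # The variation is updated first, using the previous smoothed RTT.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - sample)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * sample
        return self.srtt + 4 * self.rttvar


estimator = RttEstimator()
for rtt in (0.100, 0.200, 0.120):  # hypothetical measurements in seconds
    print(round(estimator.update(rtt), 4))
```

Adding four times the variation to the smoothed RTT keeps the timeout above most normal fluctuations, which addresses the "too low" case, while still tracking the link closely enough to avoid long idle waits.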
Internet Protocol
·
IP is responsible for routing the packets created by TCP.
·
IP is a connectionless
protocol. It does not guarantee that the data actually reaches the
recipient; it only makes a best effort to move each datagram toward its
designated destination.
·
IP adds its own header to the segment created by TCP, resulting in a
total of two different headers added to the original source data.
·
The IP header includes the following information:
·
A header checksum to provide a
means of checking the integrity of the IP header at each stopover point.
·
A hop count or time to
live, which determines the maximum number of hops a packet can make.
·
Both source and destination addresses are also included in the IP
header.
·
The IP protocol is used to determine the route a data packet will
take to its destination. If the destination IP address is not known by the
local gateway, that gateway will pass the packet on to its default gateway.
This process will continue until the desired destination is reached.
·
Using IP, different datagrams from a
single data source may take different routes to their destination, thus
causing some packets to arrive out of order. To avoid this randomness, it
is also possible to prescribe a set route for the data to take.
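The header checksum mentioned above is the standard Internet checksum: the one's-complement sum of the header's 16-bit words, complemented. The following sketch assumes that definition; the function name and the sample header bytes (a commonly cited 20-byte example header) are illustrative.

```python
import struct


def internet_checksum(header: bytes) -> int:
    """One's-complement checksum over 16-bit big-endian words."""
    if len(header) % 2:
        header += b"\x00"  # pad odd-length input with a zero byte
    total = sum(word for (word,) in struct.iter_unpack("!H", header))
    while total >> 16:
        # Fold any carry out of the low 16 bits back into the sum.
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Sample header with its checksum field (bytes 10-11) set to zero.
zeroed = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
checksum = internet_checksum(zeroed)
print(hex(checksum))

# A receiver recomputes over the header including the checksum;
# a result of zero means the header arrived intact.
filled = zeroed[:10] + struct.pack("!H", checksum) + zeroed[12:]
print(internet_checksum(filled))
```

Because a router decrements the time-to-live field at each hop, it also has to recompute this checksum before forwarding the packet.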
IP Addressing
·
In order to participate in a TCP/IP network, each computer (or host) must have a unique IP
address. These addresses may be automatically assigned using DHCP (Dynamic Host Configuration
Protocol) or manually entered into the host computer.
·
An IP address is made up of a single 32-bit number (meaning it has
32 ones or zeros). This number is usually divided into four 8-bit segments
separated by dots. Each 8-bit segment has a value between 0 and 255.
Example: 01111111.00111111.00011111.00000111
= 127.63.31.7.
·
Dotted decimal notation refers to writing IP
addresses using four decimal numbers (numbers between 0 and 255) separated
by dots.
·
The first portion of an IP address is usually used to identify the
network, while the second portion identifies a particular machine within
that network.
·
An IP address composed of the network portion of the IP followed by
all zeros identifies the network itself. Example: 192.168.0.0 refers to the 192.168
network.
·
An IP address composed of the network portion of the IP followed by
all 255s is called a broadcast
address. Example: A packet
addressed to 192.168.255.255 would be delivered to every machine on the
192.168 network.
·
The address range 192.168.x.x is reserved for private networks.
·
The current version of IP addressing is IPv4 (version 4). Its 32-bit
addresses allow about 4.3 billion distinct values, which is proving
insufficient. A new version, called IPng
(IP next generation) or IPv6, is
currently being phased in; its 128-bit addresses provide far more values (about 3.4 x 10^38).
·
IP addresses are divided into the following classes:
·
Class A: Highest-order bit set to zero;
IP address range from 1.x.x.x to 126.x.x.x; first octet makes up the
network portion of the IP address. There may be 126 Class
A networks, each having up to 16,777,214 connected hosts. All Class A
networks are currently taken.
·
Special: Addresses beginning with 127 (such as 127.0.0.1) are
reserved for loopback tests.
·
Class B: Highest-order bits set to 10; IP
address range from 128.0.x.x to 191.255.x.x; first two octets make up the
network portion of the IP address. There are no Class B addresses free.
·
Class C: Highest order bits set to 110;
IP address range from 192.0.0.x to 223.255.255.x; first three octets
determine network portion of IP address.
·
Class D: Highest order bits set to 1110;
used exclusively for multicasting
(delivery to a group of host computers).
·
Class E: Highest order bits set to 1111;
reserved for experimental use.
·
A newer addressing scheme called CIDR
(Classless Inter-Domain Routing) drops the fixed class boundaries, so that
address blocks of any prefix length, including blocks smaller than a Class C,
can be allocated to fit the needs of different organizations.
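Several of the points in this list can be checked with Python's standard `ipaddress` module. The sketch below is illustrative: the helper names `binary_to_dotted` and `address_class` are assumptions, and the class lookup simply tests the leading bits of the first octet without special-casing reserved values such as 0 and 127.

```python
import ipaddress


def binary_to_dotted(binary: str) -> str:
    """Convert four dot-separated 8-bit groups to dotted decimal notation."""
    return ".".join(str(int(octet, 2)) for octet in binary.split("."))


def address_class(address: str) -> str:
    """Classify an IPv4 address by the leading bits of its first octet."""
    first_octet = int(ipaddress.IPv4Address(address)) >> 24
    if first_octet < 128:   # leading bit 0
        return "A"
    if first_octet < 192:   # leading bits 10
        return "B"
    if first_octet < 224:   # leading bits 110
        return "C"
    if first_octet < 240:   # leading bits 1110
        return "D"
    return "E"              # leading bits 1111


print(binary_to_dotted("01111111.00111111.00011111.00000111"))
print(address_class("192.168.0.1"))

# The 192.168 network: a 16-bit network portion and a 16-bit host portion.
network = ipaddress.ip_network("192.168.0.0/16")
print(network.network_address, network.broadcast_address)
print(ipaddress.ip_address("192.168.1.1").is_private)
```

The `/16` suffix is CIDR notation for "the first 16 bits are the network portion", which is how the 192.168 network from the examples above is written today.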
Subnet Masks
·
A subnet mask is a way of
dividing a single network into multiple physical networks by reallocating
the hosts portion of the IP addressing scheme. The new IP address scheme
has a network portion, a subnet portion, and a host address that is shorter
than under the original scheme.
·
Subnets help reduce network traffic by keeping local traffic on one
side of a router and isolating the information from the LAN on the other
side of the router.
·
A router must be used to implement a subnet scheme.
·
To define a subnet mask, convert the network portion of the IP
address into binary notation. Next, select the number of binary digits to
use for the subnet mask. Finally, calculate the new dotted decimal ranges
available under each subnet.
Example:
·
Key: the first 24 bits are the network portion, the next 2 bits are the
subnet portion, and the last 6 bits (x) are the host portion.
·
IP Network Address:
172.25.16.x
·
Binary IP Network Address:
10101100 00011001 00010000 xxxxxxxx
·
Add Subnet Mask:
10101100 00011001 00010000 11xxxxxx
·
Four New Subnets Available:
A. 10101100 00011001 00010000 00xxxxxx
B. 10101100 00011001 00010000 01xxxxxx
C. 10101100 00011001 00010000 10xxxxxx
D. 10101100 00011001 00010000 11xxxxxx
·
Dotted Decimals of New Subnets:
A. 172.25.16.0 to 172.25.16.63
B. 172.25.16.64 to 172.25.16.127
C. 172.25.16.128 to 172.25.16.191
D. 172.25.16.192 to 172.25.16.255
·
On a subnet, the first available address in the subnet class is the
new network number and the last available address is the new broadcast
number.
Example: In subnet A above, 172.25.16.0 is the
network number and 172.25.16.63 is the subnet broadcast number.
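The worked example can be reproduced with the standard `ipaddress` module. This sketch assumes the 172.25.16.x network has a 24-bit network portion and borrows two host bits for the subnet, as in the example; the helper name `split_into_subnets` is hypothetical.

```python
import ipaddress


def split_into_subnets(network: str, subnet_bits: int):
    """Return (network number, broadcast number, mask) for each new subnet."""
    net = ipaddress.ip_network(network)
    return [
        (str(sub.network_address), str(sub.broadcast_address), str(sub.netmask))
        for sub in net.subnets(prefixlen_diff=subnet_bits)
    ]


# Two borrowed bits give 2**2 = 4 subnets of 64 addresses each.
for number, broadcast, mask in split_into_subnets("172.25.16.0/24", 2):
    print(number, "to", broadcast, "mask", mask)
```

Each subnet's mask is 255.255.255.192: the last octet is 11000000 in binary, which marks the two borrowed subnet bits as part of the network side of the address.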
4.2 UDP
UDP
provides users access to IP-like services. UDP packets are delivered just
like IP packets - connection-less datagrams that
may be discarded before reaching their targets. UDP is useful when TCP
would be too complex, too slow, or just unnecessary.
UDP
provides a few functions beyond that of IP:
·
Port Numbers. UDP provides 16-bit port numbers
to let multiple processes use UDP services on the same host. A UDP address
is the combination of a 32-bit IP address and the 16-bit port number.
·
Checksumming. Unlike IP, which checksums only its own header, UDP checksums
its data, so corrupted packets can be detected. A
packet failing checksum is simply discarded, with no further action taken.
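Both functions can be seen in the layout of a UDP datagram over IPv4: an 8-byte header holding source port, destination port, length and checksum, followed by the data. The sketch below builds such a datagram and verifies it. It assumes the standard rule that the checksum also covers a pseudo-header containing the two IP addresses; the function names, addresses and port numbers are illustrative assumptions.

```python
import socket
import struct

UDP_PROTOCOL = 17  # IP protocol number assigned to UDP


def ones_complement_sum(data: bytes) -> int:
    """Sum 16-bit big-endian words, folding carries back into the low bits."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def pseudo_header(src_ip: str, dst_ip: str, udp_length: int) -> bytes:
    """Addresses, a zero byte, the protocol number and the UDP length."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!BBH", 0, UDP_PROTOCOL, udp_length))


def build_udp_datagram(src_ip, dst_ip, src_port, dst_port, payload: bytes) -> bytes:
    length = 8 + len(payload)  # 8-byte header plus data
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)
    total = ones_complement_sum(pseudo_header(src_ip, dst_ip, length) + header + payload)
    checksum = 0xFFFF - total
    if checksum == 0:
        checksum = 0xFFFF  # zero on the wire would mean "no checksum"
    return struct.pack("!HHHH", src_port, dst_port, length, checksum) + payload


def checksum_ok(src_ip: str, dst_ip: str, datagram: bytes) -> bool:
    """A valid datagram sums to all ones once its checksum is included."""
    total = ones_complement_sum(pseudo_header(src_ip, dst_ip, len(datagram)) + datagram)
    return total == 0xFFFF


datagram = build_udp_datagram("192.168.0.1", "192.168.0.199", 5000, 53, b"hello")
print(checksum_ok("192.168.0.1", "192.168.0.199", datagram))

# Flip one bit of the payload: the receiver would discard this datagram.
corrupted = datagram[:-1] + bytes([datagram[-1] ^ 0x01])
print(checksum_ok("192.168.0.1", "192.168.0.199", corrupted))
```

A receiver that finds the check failing simply drops the datagram; UDP has no retransmission, so any recovery is left to the application.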
4.3 HTTP
Purpose
The
Hypertext Transfer Protocol (HTTP) is an application-level protocol for
distributed, collaborative, hypermedia information systems. HTTP has been
in use by the World-Wide Web global information initiative since 1990. The
first version of HTTP, referred to as HTTP/0.9, was a simple protocol for
raw data transfer across the Internet. HTTP/1.0 improved the protocol by
allowing messages to be in the format of MIME-like messages, containing meta-information
about the data transferred and modifiers on the request/response semantics.
However, HTTP/1.0 does not sufficiently take into consideration the effects
of hierarchical proxies, caching, the need for persistent connections, or
virtual hosts. In addition, the proliferation of incompletely implemented
applications calling themselves
"HTTP/1.0" has necessitated a protocol version change in order
for two communicating applications to determine each other's true
capabilities.
This
specification defines the protocol referred to as "HTTP/1.1".
This protocol includes more stringent requirements than HTTP/1.0 in order
to ensure reliable implementation of its features.
Practical
information systems require more functionality than simple retrieval,
including search, front-end update, and annotation. HTTP allows an
open-ended set of methods and headers that indicate the purpose of a
request. It builds on the discipline of reference provided by the Uniform
Resource Identifier (URI), as a location (URL) or name (URN), for
indicating the resource to which a method is to be applied. Messages are
passed in a format similar to that used by Internet mail [9] as defined by
the Multipurpose Internet Mail Extensions (MIME).
HTTP is
also used as a generic protocol for communication between user agents and
proxies/gateways to other Internet systems, including those supported by
the SMTP, NNTP, FTP, Gopher, and WAIS protocols. In this way, HTTP allows
basic hypermedia access to resources available from diverse applications.
Requirements
The key
words "MUST", "MUST NOT", "REQUIRED",
"SHALL", "SHALL NOT", "SHOULD", "SHOULD
NOT", "RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in
RFC 2119.
An
implementation is not compliant if it fails to satisfy one or more of the
MUST or REQUIRED level requirements for the protocols that it implements. An
implementation that satisfies all the MUST or REQUIRED level and all the
SHOULD level requirements for its protocols is said to be
"unconditionally compliant"; one that satisfies all the MUST
level requirements but not all the SHOULD level requirements for its
protocols is said to be "conditionally compliant."
Terminology
This
specification uses a number of terms to refer to the roles played by
participants in, and objects of, the HTTP communication.
connection
A transport layer virtual circuit
established between two programs for the purpose of communication.
message
The basic unit of HTTP communication,
consisting of a structured sequence of octets matching the syntax and
transmitted via the connection.
request
An HTTP request message.
response
An HTTP response message.
resource
A network data object or service that can be
identified by a URI. Resources may be available in multiple representations
(e.g. multiple languages, data formats, size, and resolutions) or vary in
other ways.
entity
The information transferred as the payload
of a request or response. An entity consists of meta-information in the
form of entity-header fields and content in the form of an entity-body.
representation
An entity included with a response that is
subject to content negotiation. There may exist multiple
representations associated with a particular response status.
content negotiation
The mechanism for selecting the appropriate
representation when servicing a request. The representation of entities in
any response can be negotiated (including error responses).
variant
A resource may have one, or more than one,
representation(s) associated with it at any given instant. Each of these
representations is termed a `variant'. Use of the term `variant' does not
necessarily imply that the resource is subject to content negotiation.
client
A program that establishes connections for
the purpose of sending requests.
user agent
The client which initiates a request. These
are often browsers, editors, spiders (web-traversing robots), or other end
user tools.
server
An application program that accepts
connections in order to service requests by sending back responses. Any
given program may be capable of being both a client and a server; our use
of these terms refers only to the role being performed by the program for a
particular connection, rather than to the program's capabilities in
general. Likewise, any server may act as an origin server, proxy, gateway,
or tunnel, switching behavior based on the nature of each request.
origin server
The server on which a given resource resides
or is to be created.
proxy
An intermediary program which acts as both
a server and a client for the purpose of making requests on behalf of other
clients. Requests are serviced internally or by passing them on, with
possible translation, to other servers. A proxy MUST implement both the
client and server requirements of this specification. A "transparent
proxy" is a proxy that does not modify the request or response beyond
what is required for proxy authentication and identification. A
"non-transparent proxy" is a proxy that modifies the request or
response in order to provide some added service to the user agent, such as
group annotation services, media type transformation, protocol reduction,
or anonymity filtering. Except where either transparent or non-transparent
behavior is explicitly stated, the HTTP proxy requirements apply to both
types of proxies.
gateway
A server which acts as an intermediary for
some other server. Unlike a proxy, a gateway receives requests as if it
were the origin server for the requested resource; the requesting client
may not be aware that it is communicating with a gateway.
tunnel
An intermediary program which is acting as a
blind relay between two connections. Once active, a tunnel is not considered
a party to the HTTP communication, though the tunnel may have been
initiated by an HTTP request. The tunnel ceases to exist when both ends of
the relayed connections are closed.
cache
A program's local store of response messages
and the subsystem that controls its message storage, retrieval, and
deletion. A cache stores cacheable responses in order to reduce the
response time and network bandwidth consumption on future, equivalent
requests. Any client or server may include a cache, though a cache cannot
be used by a server that is acting as a tunnel.
cacheable
A response is cacheable if a cache is
allowed to store a copy of the response message for use in answering
subsequent requests. The rules for determining the cacheability
of HTTP responses are defined in section 13 of the HTTP/1.1
specification. Even if a resource is
cacheable, there may be additional constraints on whether a cache can use
the cached copy for a particular request.
first-hand
A response is first-hand if it comes
directly and without unnecessary delay from the origin server, perhaps via
one or more proxies. A response is also first-hand if its validity has just
been checked directly with the origin server.
explicit expiration time
The time at which the origin server intends
that an entity should no longer be returned by a cache without further
validation.
heuristic expiration time
An expiration time assigned by a cache when
no explicit expiration time is available.
age
The age of a response
is the time since it was sent by, or successfully validated with, the
origin server.
freshness lifetime
The length of time between the generation of
a response and its expiration time.
fresh
A response is fresh
if its age has not yet exceeded its freshness lifetime.
stale
A response is stale if its age has passed
its freshness lifetime.
semantically transparent
A cache behaves in a "semantically
transparent" manner, with respect to a particular response, when its
use affects neither the requesting client nor the origin server, except to
improve performance. When a cache is semantically transparent, the client
receives exactly the same response (except for hop-by-hop headers) that it
would have received had its request been handled directly by the origin
server.
validator
A protocol element (e.g., an entity tag or a
Last-Modified time) that is used to find out whether a cache entry is an
equivalent copy of an entity.
upstream/downstream
Upstream and downstream describe the flow of
a message: all messages flow from upstream to downstream.
inbound/outbound
Inbound and outbound refer to the request
and response paths for messages: "inbound" means "traveling
toward the origin server", and "outbound" means
"traveling toward the user agent".
Overall Operation
The HTTP
protocol is a request/response protocol. A client sends a request to the
server in the form of a request method, URI, and protocol version, followed
by a MIME-like message containing request modifiers, client information,
and possible body content over a connection with a server. The server
responds with a status line, including the message's protocol version and a
success or error code, followed by a MIME-like message containing server
information, entity meta-information, and possible entity-body content.
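The message shapes described above can be illustrated with two small helpers, one that assembles a request and one that splits a status line into its parts. This is a minimal sketch, not a full HTTP implementation: the function names and the host `www.example.com` are assumptions, and the `Host` header is included because HTTP/1.1 requires it.

```python
def build_request(method: str, uri: str, host: str, headers=None, body: bytes = b"") -> bytes:
    """Assemble a request line, header lines, a blank line and an optional body."""
    lines = [f"{method} {uri} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    if body:
        lines.append(f"Content-Length: {len(body)}")
    # Header lines end with CRLF; an empty line separates them from the body.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii") + body


def parse_status_line(line: str):
    """Split a status line into (protocol version, status code, reason phrase)."""
    parts = line.rstrip("\r\n").split(" ", 2)
    reason = parts[2] if len(parts) > 2 else ""
    return parts[0], int(parts[1]), reason


print(build_request("GET", "/index.html", "www.example.com"))
print(parse_status_line("HTTP/1.1 404 Not Found\r\n"))
```

The request line carries the method, URI and protocol version, and the status line carries the version and the success or error code, matching the description in the paragraph above.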
Most
HTTP communication is initiated by a user agent and consists of a request
to be applied to a resource on some origin server. In the simplest case,
this may be accomplished via a single connection (v) between the user agent
(UA) and the origin server (O).
          request chain ------------------------>
       UA -------------------v------------------- O
          <----------------------- response chain
A more
complicated situation occurs when one or more intermediaries are present in
the request/response chain. There are three common forms of intermediary:
proxy, gateway, and tunnel. A proxy is a forwarding agent, receiving
requests for a URI in its absolute form, rewriting all or part of the
message, and forwarding the reformatted request toward the server
identified by the URI. A gateway is a receiving agent, acting as a layer
above some other server(s) and, if necessary, translating the requests to
the underlying server's protocol. A tunnel acts as a relay point between
two connections without changing the messages; tunnels are used when the
communication needs to pass through an intermediary (such as a firewall)
even when the intermediary cannot understand the contents of the messages.
          request chain -------------------------------------->
       UA -----v----- A -----v----- B -----v----- C -----v----- O
          <------------------------------------- response chain
The
figure above shows three intermediaries (A, B, and C) between the user
agent and origin server. A request or response message that travels the
whole chain will pass through four separate connections. This distinction
is important because some HTTP communication options may apply only to the
connection with the nearest, non-tunnel neighbor, only to the end-points of
the chain, or to all connections along the chain. Although the diagram is
linear, each participant may be engaged in multiple, simultaneous
communications. For example, B may be receiving requests from many clients
other than A, and/or forwarding requests to servers other than C, at the
same time that it is handling A's request.
Any
party to the communication which is not acting as a tunnel may employ an
internal cache for handling requests. The effect of a cache is that the
request/response chain is shortened if one of the participants along the
chain has a cached response applicable to that request. The following
illustrates the resulting chain if B has a cached copy of an earlier
response from O (via C) for a request, which has not been cached by UA or
A.
          request chain ---------->
       UA -----v----- A -----v----- B - - - - - - C - - - - - - O
          <--------- response chain
Not all
responses are usefully cacheable, and some requests may contain modifiers,
which place special requirements on cache behavior.
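The way a cache shortens the chain can be simulated with a toy model. Everything here is an assumption made for illustration: the `Node` class, its `handle` method and the URI `/page` are hypothetical, and every response is treated as cacheable, which the paragraph above notes is not always true.

```python
class Node:
    """A participant in the request/response chain with an optional cache."""

    def __init__(self, name: str, upstream=None) -> None:
        self.name = name
        self.upstream = upstream  # next hop toward the origin server
        self.cache = {}

    def handle(self, uri: str, path=None):
        """Return (response, names of the nodes the request visited)."""
        path = (path or []) + [self.name]
        if self.upstream is None:
            # No upstream neighbour: this node is the origin server.
            return f"content of {uri}", path
        if uri in self.cache:
            # A cached copy answers the request without going further.
            return self.cache[uri], path
        response, path = self.upstream.handle(uri, path)
        self.cache[uri] = response
        return response, path


origin = Node("O")
c = Node("C", origin)
b = Node("B", c)
a = Node("A", b)
ua = Node("UA", a)

# An earlier request arriving at B travels to the origin, and B caches the reply.
print(b.handle("/page")[1])
# A later request from UA now stops at B, as in the last figure.
print(ua.handle("/page")[1])
```

The second request never reaches C or O, which is the shortened chain shown in the figure.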
In fact,
there is a wide variety of architectures and configurations of caches and
proxies currently being experimented with or deployed across the World Wide
Web. These systems include national hierarchies of proxy caches to save
transoceanic bandwidth, systems that broadcast or
multicast cache entries, organizations that distribute subsets of
cached data via CD-ROM, and so on. HTTP systems are used in corporate
intranets over high-bandwidth links, and for access via PDAs
with low-power radio links and intermittent connectivity. The goal of
HTTP/1.1 is to support the wide diversity of configurations already
deployed while introducing protocol constructs that meet the needs of those
who build web applications that require high reliability and, failing that,
at least reliable indications of failure.
HTTP communication usually takes place over TCP/IP
connections. The default port is TCP 80, but other ports can be used. This
does not preclude HTTP from being implemented on top of any other protocol
on the Internet, or on other networks. HTTP only presumes a reliable
transport; any protocol that provides such guarantees can be used; the
mapping of the HTTP/1.1 request and response structures onto the transport
data units of the protocol in question is outside the scope of this
specification.
In
HTTP/1.0, most implementations used a new connection for each
request/response exchange. In HTTP/1.1, a connection may be used for one or
more request/response exchanges, although connections may be closed for a variety
of reasons.
4.4 SMTP
Simple
Mail Transfer Protocol (SMTP) is the Internet's standard host-to-host mail
transport protocol and traditionally operates over TCP, port 25. In other
words, a UNIX user can type telnet hostname 25 and connect with
an SMTP server, if one is present.
SMTP
uses a style of asymmetric request-response protocol popular in the early 1980s, and still seen occasionally, most often in mail
protocols. The protocol is designed to be equally useful to either a
computer or a human, though not too forgiving of the human. From the
server's viewpoint, a clear set of commands is provided and well documented
in the RFC. For the human, all the commands are clearly terminated by newlines and a HELP command lists all of them. From the
sender's viewpoint, the command replies always take the form of text lines,
each starting with a three-digit code identifying the result of the operation,
a continuation character to indicate whether another line follows, and then
arbitrary text information designed to be informative to a human.
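The reply format described above is simple enough to parse by hand. The sketch below assumes the usual SMTP convention, which the text does not spell out, that a hyphen after the code means more lines follow and a space marks the last line; the function name `parse_reply` and the sample replies are illustrative.

```python
def parse_reply(lines):
    """Parse one SMTP reply into (code, list of text lines)."""
    code = None
    text = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        line_code = int(line[:3])  # three-digit result code
        separator = line[3:4]      # "-" means another line follows
        if code is None:
            code = line_code
        elif line_code != code:
            raise ValueError("reply lines carry different codes")
        text.append(line[4:])      # human-readable information
        if separator != "-":
            return code, text
    raise ValueError("reply ended without a final line")


# Hypothetical server replies.
print(parse_reply(["220 mail.example.com ready\r\n"]))
print(parse_reply(["250-mail.example.com", "250-SIZE 10240000", "250 HELP"]))
```

A program only needs the numeric code to decide what to do next, while the trailing text is there for a person reading the exchange.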
If mail
delivery fails, sendmail (the most important SMTP
implementation) will queue mail messages and retry delivery later. However,
a backoff algorithm is used, and no mechanism
exists to poll all Internet hosts for mail, nor does SMTP provide any
mailbox facility, or any special features beyond mail transport. For these
reasons, SMTP isn't a good choice for hosts situated behind highly
unpredictable lines (like modems). Instead, a better-connected host can be
designated as a DNS mail exchanger, and a relay scheme arranged from it.
Currently, there are two main configurations that can be used. One is to
configure POP mailboxes and a POP server on the exchange host, and let all
users use POP-enabled mail clients. The other possibility is to arrange for
a periodic SMTP mail transfer from the exchange host to another, local SMTP
exchange host which has been queuing all the outbound mail. Of course,
since this solution does not allow full-time Internet access, it is the
less preferred of the two.
Extended
SMTP, or ESMTP, is by definition
extensible, allowing new service extensions to be defined and registered
with IANA. Probably the most important extension currently available is
Delivery Status Notification (DSN).
4.5 FTP
Terminology
ASCII
The ASCII character set is as defined in the ARPA-Internet Protocol
Handbook. In FTP, ASCII characters are defined to be the lower half of an
eight-bit code set (i.e., the most significant bit is zero).
access controls
Access controls
define users' access privileges to the use of a system, and to the files in
that system. Access controls are necessary to
prevent unauthorized or accidental use of files. It is the prerogative of a
server-FTP process to invoke access controls.
byte size
There are two byte
sizes of interest in FTP: the
logical byte size of the file, and the transfer byte
size used for the transmission of the data. The transfer byte size is always 8
bits. The transfer byte size is
not necessarily the byte size in which data is to be stored in a system, nor the logical byte size for interpretation of the
structure of the data.
control connection
The
communication path between the USER-PI and SERVER-PI for the exchange of
commands and replies. This connection follows the Telnet
Protocol.
data connection
A full duplex
connection over which data is transferred, in a specified mode and type.
The data transferred may be a part of a file, an entire file or a number of
files. The path may be between a
server-DTP and a user-DTP, or between two server-DTPs.
data port
The passive data
transfer process "listens" on the data port for a connection from
the active transfer process in order to open the data connection.
DTP
The data transfer
process establishes and manages the data connection. The DTP can be passive or active.
End-of-Line
The end-of-line
sequence defines the separation of printing lines. The sequence is Carriage Return,
followed by Line Feed.
EOF
The end-of-file condition that defines the end of a file being
transferred.
EOR
The end-of-record condition that defines the end of a record being
transferred.
error recovery
A procedure that allows a user to recover from certain errors such
as failure of either host
system or transfer process. In FTP, error recovery may involve
restarting a file transfer at
a given checkpoint.
FTP commands
A set of commands that comprise the control information flowing from
the user-FTP to
the server-FTP process.
file
An ordered set of computer data (including programs), of arbitrary
length, uniquely
identified by a pathname.
mode
The mode in which
data is to be transferred via the data connection. The mode defines the data format
during transfer including EOR and EOF.
The transfer modes defined in FTP are described in the Section on
Transmission Modes.
NVT
The Network Virtual Terminal as defined in the Telnet Protocol.
NVFS
The Network Virtual
File System. A concept which
defines a standard network file system
with standard commands
and pathname conventions.
page
A file may be structured as a set of independent parts called
pages. FTP supports the
transmission of discontinuous files as
independent indexed pages.
pathname
Pathname is defined
to be the character string which must be input to a file system by a user in
order to identify a file. A pathname normally contains device and/or
directory names, and a file name specification. FTP does not yet specify a
standard pathname convention. Each user must
follow the file naming conventions of the file systems involved in the
transfer.
PI
The protocol
interpreter. The user and server sides of the protocol have distinct roles
implemented in a user-PI and a server-PI.
record
A sequential file may
be structured as a number of contiguous parts called records.
Record structures are
supported by FTP but a file need not have record structure.
reply
A reply is an
acknowledgment (positive or negative) sent from server to user via the
control connection in response to FTP commands. The general form of a reply is a
completion code (including error codes) followed by a text string. The codes are for use by programs
and the text
is usually intended for human users.
server-DTP
The data transfer
process, in its normal "active" state, establishes the data
connection with the "listening" data port. It sets up parameters
for transfer and storage, and transfers data on command from its PI. The DTP can be placed in a "passive"
state to listen for, rather than initiate a connection on the data port.
server-FTP process
A process or set of
processes which perform the function of file transfer in cooperation with a
user-FTP process and, possibly, another server. The functions consist of a protocol interpreter (PI)
and a data transfer process (DTP).
server-PI
The server protocol interpreter "listens" on Port L for a
connection from a user-PI and
establishes a control communication
connection. It receives
standard FTP commands
from the user-PI, sends replies, and
governs the server-DTP.
type
The data representation type used for data transfer and
storage. Type implies certain
transformations between the time of data
storage and data transfer. The
representation
types defined in FTP are described in the
Section on Establishing Data Connections.
user
A person or a process
on behalf of a person wishing to obtain file transfer service. The human user may interact directly
with a server-FTP process, but use of a user-FTP process is preferred since
the protocol design is weighted towards automata.
user-DTP
The data transfer process "listens" on the data port for a
connection from a server-FTP
process. If two servers are transferring data
between them, the user-DTP is inactive.
user-FTP process
A set of functions
including a protocol interpreter, a data transfer process and a user
interface which together perform the function of file transfer in
cooperation with one or more server-FTP processes. The user interface allows a local
language to be used in the command-reply dialogue with the user.
user-PI
The user protocol
interpreter initiates the control connection from its port U to the
server-FTP process, initiates FTP commands, and governs the user-DTP if
that process is part of
the file transfer.
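As a small illustration of the "reply" entry above, the following Python sketch splits a single-line reply into its completion code and its text string. The function names parse_reply and is_positive are made up for this example, and multi-line replies are not handled. The grouping of codes into positive (1xx to 3xx) and negative (4xx and 5xx) follows the usual FTP reply code conventions.

```python
def parse_reply(line: str) -> tuple[int, str]:
    """Split one single-line FTP reply into (completion code, text)."""
    line = line.rstrip("\r\n")
    code, text = line[:3], line[4:]
    # A reply starts with a three-digit completion code.
    if len(code) != 3 or not code.isdigit():
        raise ValueError(f"not an FTP reply: {line!r}")
    return int(code), text


def is_positive(code: int) -> bool:
    """1xx, 2xx and 3xx replies are positive; 4xx and 5xx are negative."""
    return 100 <= code < 400


if __name__ == "__main__":
    code, text = parse_reply("226 Transfer complete\r\n")
    print(code, text, is_positive(code))
```

A program would branch on the numeric code, while the text part is what a human user would normally be shown.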
The FTP Model
With the above
definitions in mind, the following model (shown in Fig. 4.1) may be
diagrammed for an FTP service.
                                        -------------
                                        |/---------\|
                                        ||   User  ||    --------
                                        ||Interface|<--->| User |
                                        |\----^----/|    --------
              ----------                |     |     |
              |/------\|  FTP Commands  |/----V----\|
              ||Server|<---------------->|   User  ||
              ||  PI  ||   FTP Replies  ||    PI   ||
              |\--^---/|                |\----^----/|
              |   |    |                |     |     |
  --------    |/--V---\|      Data      |/----V----\|    --------
  | File |<--->|Server|<---------------->|  User   |<--->| File |
  |System|    || DTP  ||   Connection   ||   DTP   ||    |System|
  --------    |\------/|                |\---------/|    --------
              ----------                -------------

              Server-FTP                   USER-FTP

Fig. 4.1 Model for FTP Use
NOTES: 1. The data connection may be used in
either direction.
2. The data connection need not exist
all of the time.
In the model
described in Fig. 4.1, the user-protocol interpreter initiates the control
connection. The control
connection follows the Telnet protocol. At the initiation of the user,
standard FTP commands are generated by the user-PI and transmitted to the server
process via the control connection.
(The user may establish a direct control connection to the
server-FTP, from a TAC terminal for example, and generate standard FTP
commands independently, bypassing the user-FTP process.) Standard replies
are sent from the server-PI to the user-PI over the control connection in
response to the commands. The FTP commands specify the parameters for the
data connection (data port, transfer mode, representation type, and
structure) and the nature of file system operation (store, retrieve,
append, delete, etc.). The
user-DTP or its designate should "listen" on the specified data
port, and the server initiate the data connection and data transfer in
accordance with the specified parameters. It should be noted that the data
port need not be in the same host that initiates the FTP commands via the
control connection, but the user or the user-FTP process must ensure a
"listen" on the specified data port. It should also be noted that the data
connection may be used for simultaneous sending and receiving.
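To make the idea of "specifying the data port" concrete, here is a minimal Python sketch of how the argument of the FTP PORT command is usually written: the four bytes of the IPv4 host address followed by the high and low bytes of the 16-bit port, all as comma-separated decimal numbers. The function name port_command and the example address are assumptions made for illustration only.

```python
def port_command(host: str, port: int) -> str:
    """Build a PORT command line for an IPv4 host and a TCP port."""
    octets = host.split(".")
    if len(octets) != 4 or not all(o.isdigit() and int(o) < 256 for o in octets):
        raise ValueError(f"not a dotted IPv4 address: {host!r}")
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    # The port is sent as two bytes: port = p1 * 256 + p2.
    p1, p2 = divmod(port, 256)
    return "PORT " + ",".join(octets + [str(p1), str(p2)])


if __name__ == "__main__":
    # 5001 = 19 * 256 + 137
    print(port_command("192.168.1.10", 5001))
```

The host in the command does not have to be the host that sends it, which is what allows the data port to live on a different machine than the one issuing FTP commands.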
In another situation
a user might wish to transfer files between two hosts, neither of which is
a local host. The user sets up control connections to the two servers and
then arranges for a data connection between them. In this manner, control information
is passed to the user-PI but data is transferred between the server data
transfer processes. Following
is a model of this server-server interaction.
                Control     ------------   Control
                ---------->| User-FTP |<-----------
                |          | User-PI  |           |
                |          |   "C"    |           |
                V          ------------           V
        --------------                        --------------
        | Server-FTP |   Data Connection      | Server-FTP |
        |    "A"     |<---------------------->|    "B"     |
        --------------  Port (A)      Port (B) --------------

Fig. 4.2 Server-Server Interaction
The protocol requires
that the control connections be open while data transfer is in
progress. It is the
responsibility of the user to request the closing of the control
connections when finished using the FTP service, while it is the server who
takes the action. The server
may abort data transfer if the control connections are closed without
command.
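For the server-server case, one common way to arrange the data connection is for the user-PI to send PASV to server "A", read the address and port out of A's 227 reply, and pass the same six numbers to server "B" in a PORT command. The Python sketch below shows only that translation step. The function names and the sample reply text are assumptions for illustration, and the exact wording of a 227 reply varies between servers.

```python
import re

# Six comma-separated numbers: h1,h2,h3,h4,p1,p2
_SIX_NUMBERS = re.compile(r"(\d+),(\d+),(\d+),(\d+),(\d+),(\d+)")


def parse_pasv_reply(reply: str) -> tuple[str, int]:
    """Extract (host, port) from the text of a 227 reply."""
    match = _SIX_NUMBERS.search(reply)
    if match is None:
        raise ValueError(f"no address found in reply: {reply!r}")
    numbers = [int(g) for g in match.groups()]
    host = ".".join(str(n) for n in numbers[:4])
    port = numbers[4] * 256 + numbers[5]
    return host, port


def port_command_for_other_server(reply_from_a: str) -> str:
    """Turn server A's 227 reply into the PORT command to send to server B."""
    match = _SIX_NUMBERS.search(reply_from_a)
    if match is None:
        raise ValueError(f"no address found in reply: {reply_from_a!r}")
    return "PORT " + ",".join(match.groups())


if __name__ == "__main__":
    reply = "227 Entering Passive Mode (10,0,0,5,4,1)"
    print(parse_pasv_reply(reply))
    print(port_command_for_other_server(reply))
```

After this exchange the user-PI would typically ask one server to store the file and the other to retrieve it, so that the file travels directly between "A" and "B" while only control information passes through "C".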
The Relationship between FTP and Telnet
The FTP uses the
Telnet protocol on the control connection. This can be achieved in two
ways: first, the user-PI or the server-PI may implement the rules of the
Telnet Protocol directly in their own procedures; or, second, the user-PI
or the server-PI may make use of the existing Telnet module in the system.
Ease of implementation,
sharing code, and modular programming argue for the second approach. Efficiency and independence argue
for the first approach. In
practice, FTP relies on very little of the Telnet Protocol, so the first
approach does not necessarily involve a large amount of code.
4.6
CSMA/CD
The
acronym CSMA/CD signifies
Carrier Sense Multiple Access with Collision Detection and describes how
the Ethernet protocol regulates communication among nodes. While the term
may seem intimidating, if we break it apart into its component concepts we
will see that it describes rules very similar to those people use in polite
conversation. To help illustrate the operation of Ethernet, we will use an
analogy of a dinner table conversation. Let's represent our Ethernet
segment as a dinner table, and let several people engaged in polite
conversation at the table represent the nodes. The term Multiple Access covers what we
already discussed above. When one Ethernet station transmits, all the
stations on the medium hear the transmission, just as when one person at
the table talks, everyone present is able to hear him or her.
Now
let's imagine that you are at the table and you have something you would
like to say. At the moment, however, I am talking. Since this is a polite
conversation, rather than immediately speaking up and interrupting, you would
wait until I finished talking before making your statement. This is the
same concept described in the Ethernet protocol as Carrier Sense. Before a station transmits, it
"listens" to the medium to determine if another station is
transmitting. If the medium is quiet, the station recognizes that this is
an appropriate time to transmit.
Carrier
Sense Multiple Access gives us a good start in regulating our conversation,
but there is one scenario we still need to address. Let's go back to
our dinner table analogy and imagine that there is a momentary lull in the
conversation. You and I both have something we would like to add and we
both "sense the carrier" based on the silence, so we begin
speaking at approximately the same time. In Ethernet terminology, a collision occurs when we both speak
at once. In our conversation, we can handle this situation gracefully. We
will both hear the other speak at the same time we are speaking. Then we
can stop to give the other person a chance to go on. Ethernet nodes also
listen to the medium while they transmit to ensure that they are the only stations
transmitting at that time. If the stations hear their own transmission
returning in a garbled form, as would happen if some other station had
begun to transmit its own message at the same time, then they know that a
collision occurred. A single Ethernet segment is sometimes called a collision domain because no two
stations on the segment can transmit at the same time without causing a
collision. When stations detect a collision, they cease transmission, wait
a random amount of time, and attempt to transmit when they again detect
silence on the medium.
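The listen, transmit, detect and retry cycle described above can be sketched as a small simulation in Python. This is a deliberately simplified model and not real Ethernet timing: time is divided into slots, every frame fits in one slot, and a station that collides waits a uniformly random number of slots before trying again. The names simulate_segment, max_wait and max_slots are assumptions made for this example.

```python
import random


def simulate_segment(stations, max_wait=8, seed=0, max_slots=10_000):
    """Return a list of (slot, station) pairs in the order frames got through."""
    rng = random.Random(seed)
    # Every station has one frame ready at slot 0.
    ready_at = {name: 0 for name in stations}
    delivered = []
    slot = 0
    while ready_at and slot < max_slots:
        # Stations whose wait has expired find the medium quiet and transmit.
        talkers = [name for name, t in ready_at.items() if t <= slot]
        if len(talkers) == 1:
            # Only one station transmitted: the frame gets through.
            delivered.append((slot, talkers[0]))
            del ready_at[talkers[0]]
        elif len(talkers) > 1:
            # Collision: each station stops and waits a random time.
            for name in talkers:
                ready_at[name] = slot + 1 + rng.randrange(max_wait)
        slot += 1
    return delivered


if __name__ == "__main__":
    for slot, name in simulate_segment(["A", "B", "C"]):
        print(f"slot {slot}: station {name} transmitted successfully")
```

Because all three stations start at slot 0, the first slot is always a collision, and the random waits are what eventually separate the stations.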
The
random pause and retry is an important part of the protocol. If two
stations collide when transmitting once, then both will need to transmit
again. At the next appropriate chance to transmit, both stations involved
with the previous collision will have data ready to transmit. If they
transmitted again at the first opportunity, they would most likely collide
again and again indefinitely. Instead, the random delay makes it unlikely
that any two stations will collide more than a few times in a row.
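A short calculation shows why a few collisions in a row quickly become unlikely. The text above only says the delay is random. The Python sketch below assumes the scheme usually described for classic Ethernet, truncated binary exponential backoff, in which a station that has collided n times picks its wait from 2^n slots (capped at 2^10). It also assumes only two stations are competing. The function names are made up for this example.

```python
from fractions import Fraction


def backoff_window(collisions: int) -> int:
    """Number of slots to choose from after this many collisions."""
    return 2 ** min(collisions, 10)


def prob_consecutive_collisions(n: int) -> Fraction:
    """Chance that two stations that have just collided once also collide
    on each of their next n retries (n further collisions in a row)."""
    p = Fraction(1)
    for i in range(1, n + 1):
        # Two independent uniform picks from W slots match with chance 1/W.
        p *= Fraction(1, backoff_window(i))
    return p


if __name__ == "__main__":
    for n in range(1, 5):
        print(n, prob_consecutive_collisions(n))
```

Under these assumptions the chance of three more collisions in a row is 1/2 x 1/4 x 1/8 = 1/64, and it keeps shrinking rapidly with each further retry.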
4.7
Other Network Protocols
·
SLIP (Serial Line Internet Protocol) is
a protocol used to transmit IP traffic over serial lines, such as with a modem over
phone lines. SLIP is a simple protocol with low overhead, but its
lack of some desired features (such as password encryption and error checking)
has caused it to be largely replaced by PPP.
·
PPP (Point-to-Point Protocol) is
commonly used to establish remote connections to Internet service providers
or LANs. PPP can run over various types of connections, provides error
detection, supports automatic TCP/IP
configuration, and provides a number of other benefits over SLIP, although
it does demand higher overhead.
·
PPTP (Point-to-Point
Tunneling Protocol) is a Microsoft-created protocol that uses PPP to create
a Virtual Private Network (VPN).
To use PPTP, a user establishes a PPP connection to the desired server, then launches a PPTP connection. In effect, the user is
then connected to the server via PPP, but is able to transfer information
securely from within the PPP connection thanks to the PPTP session. PPTP is
currently not a standard and is not supported by all operating systems.
·
POP3 (Post Office Protocol 3) is
used to retrieve mail from a mail server. When used by an email
client, POP3 typically downloads all of the client's messages available on the server.
IMAP (Internet Message
Access Protocol) is a different protocol for retrieving mail from an email
server; unlike POP3 as typically used, IMAP supports downloading selected messages only and
leaving the rest on the server.
·
The NNTP (Network News
Transfer Protocol) provides the facilities for transferring information on
newsgroups (Usenet news). The protocol allows posting, distribution, and
retrieval of the messages among both clients and servers.
·
LDAP (Lightweight Directory Access
Protocol) is an open protocol for accessing information directories, which
supply such data as email addresses and names.
·
Gopher was a method for organizing and
displaying files on an Internet server before the advent of the World Wide
Web. This system has largely been replaced by the web.
·
TELNET is a protocol used mostly on Unix servers, which allows users to log onto a remote
computer and use it as if they were sitting at the console themselves.
·
LPR (Line Printer Remote) allows a
user to send a print file to a remote server for printing.
|