Transport Layer

urls:

http://docs.sun.com/app/docs/doc/806-4015/6jd4gh8fj?a=view - the tunable parameters manual

http://www.scit.wlv.ac.uk/~jphb/comms/tcp.html - information on the transport layer

http://www.faqs.org/rfcs/rfc2001.html - slow start algorithm RFC

http://www.sean.de/Solaris/tune.html - water - Solaris tuning

http://alive.znep.com/~marcs/mtu - MTU Path Discovery

 

 

The job of the Transport layer is to prepare data to go to the appropriate destination.  The implementations of its protocols put a transport header on a stream or message of actual data and pass the message to the Internet layer.  They also see that the stream or message is broken into packets of the size necessary to pass through all interfaces, and direct the packets, called "datagrams", via a "port number" in the header, to the appropriate program (such as telnet or finger) once they reach their destination. The two protocols of the transport layer, TCP and UDP, also determine how data is sent from the originating system and acknowledged by the destination system.

 

Software written in accordance with the TCP protocol is 1) connection-oriented, 2) stateful and 3) reliable.  It 1) establishes a connection with the destination host before sending data (connection-oriented), 2) keeps track of whether packets have arrived at the destination (stateful), and 3) acknowledges the arrival of packets and retransmits those that are not acknowledged (reliable).  It is also slower than UDP and demands more computing resources and bandwidth. TCP is said to form a "virtual circuit connection" since both hosts are aware of the communications between them and each packet is acknowledged. The stream of packets is said to be "full duplex" because the application can process received packets and generate packets at the same time. In order to increase efficiency, outgoing packets may also contain acknowledgement of incoming packets - a process called "piggybacking." The TCP header contains an acknowledgement bit (ACK) that indicates that the packet contains an acknowledgement, and an acknowledgement number containing the number of the next byte the sender expects to receive from the other host.  Those two fields permit piggybacking, since the rest of the packet may be used for anything. TCP has an unstructured stream orientation - it receives a stream of data from the application and breaks it into segments, each of which it packages with a header and passes to the IP layer implementation.  The TCP protocol is used by telnet, ftp and mount. NFS uses UDP early in the boot process, then switches to TCP as the boot progresses.
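The stream orientation described above can be sketched in a few lines of Python. This is a simplified illustration, not how any real TCP stack is written: the function names and the segment size are hypothetical, and real sequence numbers wrap around at 2^32 and start from a negotiated ISN.

```python
def segment_stream(data: bytes, mss: int, isn: int = 0):
    """Split a byte stream into (sequence_number, payload) pairs.

    Each sequence number is the position of the payload's first byte,
    offset by the starting number `isn` (simplified: no wrap-around).
    """
    if mss <= 0:
        raise ValueError("mss must be positive")
    return [(isn + off, data[off:off + mss]) for off in range(0, len(data), mss)]


def reassemble(segments):
    """Rebuild the original stream from segments that arrived in any order."""
    # Sorting by sequence number restores the order the sender used.
    return b"".join(payload for _seq, payload in sorted(segments))
```

For example, segment_stream(b"abcdefghij", 4, 1000) yields three segments numbered 1000, 1004 and 1008, and reassemble() returns the original ten bytes even if the list is reversed first.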

 

 UDP is 1) connectionless, 2) stateless and 3) unreliable.  It 1) sends packets without first contacting the receiving host (connectionless), 2) does not keep track of which packets have arrived (stateless) and 3) sends no acknowledgement of packets (unreliable). It is also fast and uses fewer computing resources and less bandwidth than does TCP.  It is used where it is not important to be sure that all packets arrive at the destination, or in applications where verification is built into the application. (The "ping" command is sometimes given as an example, but it uses ICMP rather than UDP.)  UDP accepts packets already broken up by the application - it does not fragment.  SNMP, DHCP, RIP and NIS, as well as DNS queries, use UDP. Multicast packets are handled with UDP. The use of one protocol or another is up to the developer of the application, and cannot be figured out in any simple way.
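A minimal sketch of the connectionless behaviour, using Python's standard socket module over the loopback interface. The function name and the timeout are illustrative choices; on a real network the datagram could simply be lost, and nothing in UDP would report it.

```python
import socket


def udp_roundtrip(message: bytes, timeout: float = 2.0) -> bytes:
    """Send one datagram to a local socket and return what was received."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Port 0 asks the system to pick any free port.
        receiver.bind(("127.0.0.1", 0))
        receiver.settimeout(timeout)
        # No connection is set up first and no acknowledgement comes back:
        # the sender just addresses the datagram and lets it go.
        sender.sendto(message, receiver.getsockname())
        data, _addr = receiver.recvfrom(65535)
        return data
    finally:
        sender.close()
        receiver.close()
```

Calling udp_roundtrip(b"hello") returns the same bytes on a working loopback interface. Any check that the data really arrived has to be written by the application, as it is here by waiting on the receiving socket.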

 

TCP also differs from UDP in the way it handles fragmentation of the stream of data produced by the application.  When data is sent over the network, a router may break it down into smaller pieces to go through the interface into a network using a different protocol and possessing a smaller MTU. This causes all kinds of problems. Pieces may be lost, and the router doing the fragmenting slows down. TCP header information may be lost, since the router fragments at the IP layer. Firewalls may not permit these packets to cross. To avoid these problems TCP sends out packets with the "do not fragment" (DF) bit set. That bit prevents gateway routers from repackaging the data based on the MTU of the network to which the packet is about to be sent.  When the packet reaches an interface that it is too large to pass through, the packet is discarded, and an ICMP "Destination Unreachable" message is sent back to the source along with the MTU of the next network hop and an indication that fragmentation was needed but the DF (don't fragment) bit was set.  TCP then breaks up the stream from the application into pieces of the correct size (576 for the Internet, for example) and sends them out. These packets have all the necessary header information and do not need to be fragmented. The size of the smallest MTU in a route is called the Path MTU. The process of finding it and sizing packets to fit is called Path MTU Discovery, and it may be repeated multiple times as packets encounter smaller and smaller MTUs.  With UDP, the application itself breaks the data into packets of the correct size and passes them to UDP, which sends them on.  If the pieces are too large, they are repackaged by the gateway router - it is up to the generating application to make sure that this does not cause problems.
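The discovery loop can be modelled without touching a network. In this sketch the route is just a list of link MTUs, and the "ICMP reply" is the MTU of the first link too small for the packet. The function name and the sample MTU values are hypothetical.

```python
def discover_path_mtu(link_mtus, first_hop_mtu):
    """Return (path_mtu, probes) for a route given as a list of link MTUs.

    `probes` records every packet size tried, in order.
    """
    size = first_hop_mtu
    probes = []
    while True:
        probes.append(size)
        # A DF packet is dropped by the first link it does not fit through,
        # and that link's MTU is reported back to the sender.
        blocking = next((mtu for mtu in link_mtus if mtu < size), None)
        if blocking is None:
            return size, probes  # the packet fit through every link
        size = blocking  # retry with the reported, smaller size
```

With links of 1500, 1006, 1500 and 576 bytes and a first packet of 1500 bytes, the sender tries 1500, then 1006, then 576, which shows why the process may repeat several times along one route.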

 

Because TCP acknowledges all packets, it is important to maximize the likelihood that packets arrive without requiring that each individual packet be acknowledged. So TCP sends packets in bursts, and waits for acknowledgement that the receiving host has gotten the last byte sent before it discards its local copies of those packets and sends out the next group of packets.  However, if too many packets arrive at a destination at once, they will overflow the input buffer (usually in memory), and some will be discarded.  Therefore, on the destination host, TCP establishes additional input and output buffers to hold packets until it can process them - a feature of TCP called "buffered transfer". Each time the receiving host acknowledges a group of packets, it informs the sending host how many bytes of data it can receive in the next burst of packets, using the "window" field of the TCP header. This information is called a "window advertisement". Up to that many bytes will be sent and held in the buffer until all packets arrive and are reassembled in the correct order. As the application processes data out of the buffer, the size of the window gets larger, and the sending host can send more data. This is called a "sliding window," since the window size gets larger and smaller depending on how fast the application processes data in the buffer and space in the buffer opens.  The window size is NOT negotiable. The window size sent to the sending host is the final word. If that size is zero, the sending host cannot send any more packets. A receiving host will generally wait to send an acknowledgement for some period of time, in order to be able to acknowledge a group of packets at once, but it will only acknowledge the last packet received IN ORDER. If a packet is missing, no packets farther down the data stream will be acknowledged until that missing packet shows up. TCP allows a transmit window of up to 1 Gbyte.
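The in-order acknowledgement rule and the window advertisement can each be written as a one-step calculation. This is a sketch with hypothetical function names; the buffered segments are represented as a dictionary mapping each segment's first byte number to its length.

```python
def cumulative_ack(next_expected: int, received: dict) -> int:
    """Return the number of the next byte expected, given buffered segments.

    `received` maps a segment's starting byte number to its length.
    The acknowledgement only advances across segments with no gap before them.
    """
    while next_expected in received:
        next_expected += received[next_expected]
    return next_expected


def advertised_window(buffer_size: int, bytes_buffered: int) -> int:
    """Bytes the receiver can still accept: free space left in its buffer."""
    return max(buffer_size - bytes_buffered, 0)
```

If segments starting at bytes 0, 100 and 300 (100 bytes each) have arrived, cumulative_ack(0, {0: 100, 100: 100, 300: 100}) is 200: the segment at 300 is held but not acknowledged until the one at 200 shows up, after which the acknowledgement jumps to 400. A full buffer gives advertised_window(16384, 16384) == 0, which stops the sender.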

 

The network itself will affect how many packets can be sent in a burst as well. TCP controls packet loss due to network capacity by using a "slow start" algorithm. The sending host puts out one packet. When it gets an acknowledgement for that packet it sends a burst of two packets. The number of bytes that can be sent at any time is called the "congestion window (cwnd)." When packets are received without loss, the sending system doubles the number of packets it sends in one burst (up to the receive window size) until a packet is lost. The sending host determines that a packet has been lost when no acknowledgement is received before a timeout occurs (the value of the timeout, called the "retransmission timeout" or "RTO", is complex and dynamic. Four kernel parameters control it, all beginning with "tcp_rexmit_interval". Do not change them).  Then the congestion window is cut back (RFC 2001 remembers half of the current window as a threshold and, after a timeout, restarts from one packet) and the send is retried. This process is constantly repeated as network characteristics change.  Thus the speed at which packets are transmitted can be optimized for high speed networks and interfaces, or slowed down for low speed networks or slow interfaces.
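The growth of the congestion window can be traced with a small simulation. This sketch follows the RFC 2001 description as an assumption: a timeout saves half of the current burst as a threshold (at least two packets) and restarts from one packet, and above the threshold the window grows by one packet per round trip. It counts whole packets rather than bytes, and the function name and the choice of which rounds lose a packet are hypothetical.

```python
def simulate_cwnd(rounds: int, receive_window: int, loss_rounds=()):
    """Return the number of packets sent in each round trip."""
    cwnd = 1                    # congestion window, in packets
    ssthresh = receive_window   # slow start threshold
    sent = []
    for rnd in range(rounds):
        # The sender may never exceed what the receiver advertised.
        burst = min(cwnd, receive_window)
        sent.append(burst)
        if rnd in loss_rounds:
            # Timeout: remember half the burst, restart from one packet.
            ssthresh = max(burst // 2, 2)
            cwnd = 1
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)  # slow start: double per round
        else:
            cwnd += 1                       # past the threshold: grow by one
    return sent
```

With a receive window of 16 packets and a loss in the fifth round, simulate_cwnd(10, 16, {4}) gives [1, 2, 4, 8, 16, 1, 2, 4, 8, 9]: doubling up to the receive window, a restart from one packet after the loss, then slower growth once the threshold of 8 is reached.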

Acronyms:

ISN – Initial Sequence Number

TCP - Transmission Control Protocol - a stateful, reliable, connection-oriented Transport layer communication protocol.

UDP - User Datagram Protocol - a stateless, unreliable, connectionless Transport layer communication protocol.

TI-RPC - Transport Independent Remote Procedure Call - a protocol that allows a program to be written independent of the transport protocol used to implement it.

Definitions:

 

Path MTU - the smallest MTU of any link or interface in a path. This may change over time - routes are not always the same as routers go up or down, etc.

Path MTU discovery - a process conducted by TCP, which passes a packet out onto the network with the "don't fragment" bit set.  Where the packet cannot pass through an interface, the router sends back an ICMP message with the MTU of the next hop (usually this will be 576 since that is the MTU for the internet).  TCP then breaks the stream into packets of the size specified in the ICMP message. UDP does not use this process, since the application itself breaks up packets and has to account for fragmentation issues.

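slow start algorithm - used by TCP to keep from overwhelming the network: a router along the path may be overwhelmed and lose packets.  The sender starts with one packet and doubles the burst as bursts are acknowledged. If a packet is lost, it backs the rate off by half and then increases the congestion window by one packet at a time.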

end to end communication – transportation of data to and from the correct application

connection-oriented – a logical connection is established before data is exchanged.

connectionless – no connection established before data is sent.

congestion window - the amount of data the sending host allows itself to send in one burst. The amount actually sent is never more than the smaller of the congestion window and the receive window. The congestion window depends on the ability of the network to handle bursts of packets rather than the ability of the receiving host to store bursts of packets in the buffer.

full duplex connection - data flows in both directions at once in TCP. This differs from the full duplexing discussed in the hardware section. That "full duplexing" means that data physically flows in both directions at the same time in the cabling and through the interface, and implies hardware that supports full duplexing. A full duplex connection may be sent over hardware that supports either half or full duplexing. If the hardware only supports half duplex, then data will flow in bursts in one direction then the other, but the software will process incoming packets and outgoing packets at the same time, regardless of the physical situation. A full duplex connection moves data between the interface and the application at the same time.

piggybacking - a packet of transmitted data may also carry an acknowledgement of a previously received packet. This improves efficiency.

datagram – a packet produced by the Transport layer, containing data from the application and a port number.

stateful – some of the data sent is about the client state, so that the server can keep track of the client.

stateless – no data is sent about the client state, and the server does not keep track of any information about the client.

reliable protocol – each transmission is acknowledged. If no acknowledgement is received, the packet is re-sent.

receive window - the number of bytes the receiving host can accept. Its size depends on how fast the application can process bytes already held in the TCP buffer, and it may change constantly if the application is slow, or always be the same, if the application is fast. Maximum value: 1 Gbyte.

unreliable protocol – transmissions are not acknowledged.

unstructured stream orientation - this phrase refers to TCP's ability to break a stream of data from an application into segments, each of which is given a TCP header.

buffered transfer - TCP data moves into and out of buffers in memory to make sure it flows as fast as possible.

synchronization number – number which helps identify order of packets sent using TCP.

window advertisement – receiving host tells sending host how much data it can take, and how much it has received.

sliding window principle – window size changes as packets go and are acknowledged.  Maximum size: 1 Gbyte.

virtual circuit connection - the connection created between sending and receiving hosts in a TCP transaction, in which each packet is acknowledged.
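The 1 Gbyte maximum quoted above for the receive and sliding windows can be checked with simple arithmetic. The window field in the TCP header is only 2 bytes, so the larger figure relies on the window scale option of RFC 1323, which these notes do not otherwise cover; the constant and function names below are illustrative.

```python
MAX_WINDOW_FIELD = 0xFFFF  # largest value a 2-byte window field can hold
MAX_SHIFT = 14             # largest scale factor allowed by RFC 1323


def max_scaled_window(shift: int) -> int:
    """Largest window in bytes when the window field is shifted left by `shift` bits."""
    if not 0 <= shift <= MAX_SHIFT:
        raise ValueError("shift must be between 0 and 14")
    return MAX_WINDOW_FIELD << shift
```

max_scaled_window(0) is 65535 bytes, the limit without scaling, and max_scaled_window(14) is 1,073,725,440 bytes, just under 2^30 (1 Gbyte).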

Commands:

ndd -set /dev/tcp  tcp_xmit_hiwat  <value>  - sets the default send window size in bytes. Default: 16384. Without -set and <value>, ndd displays the current setting.

ndd -set /dev/tcp  tcp_recv_hiwat  <value>  - sets the default receive window size in bytes. Default: 16384.

ndd -set /dev/tcp  tcp_sack_permitted  <value>  - controls selective acknowledgement (sack), which lets the receiver report which data it has received so that only the missing packets are re-sent, instead of all packets. Allowing sack:  2 = all programs, any time (default), 1 = programs can request sack, 0 = never.

 

Files:

Misc:

Jobs of the transport layer:  1. Prepares data to go to its destination and delivers packets to the destination program.  2.  Tells the application to break the data stream into pieces and passes them to the Internet layer.  3.  Gets port numbers from /etc/services and /etc/rpc.  4.  Puts the transport header on and passes the packet to the Internet layer.  5.  Error detection and recovery.  6.  Flow control.
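The port lookup in item 3 amounts to reading a small table. The sketch below parses text in the usual /etc/services layout (service name, port/protocol, optional aliases, # comments). The function name is hypothetical and the sample entries are a few well-known ones, not a copy of any particular system's file.

```python
SAMPLE_SERVICES = """\
# name    port/proto  aliases
telnet    23/tcp
smtp      25/tcp      mail
domain    53/udp
domain    53/tcp
"""


def parse_services(text: str) -> dict:
    """Map (service name or alias, protocol) to a port number."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2 or "/" not in fields[1]:
            continue  # not a "name port/proto" entry
        port, proto = fields[1].split("/", 1)
        for name in [fields[0]] + fields[2:]:
            table[(name, proto)] = int(port)
    return table
```

parse_services(SAMPLE_SERVICES)[("telnet", "tcp")] is 23, and the alias ("mail", "tcp") resolves to 25. On a live system, Python's socket.getservbyname("telnet", "tcp") performs the same kind of lookup against the system's own services database.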

 

Characteristics of TCP:  1. Uses path MTU discovery - sends out a packet with the "don't fragment" bit set.  A router that cannot pass the packet sends back an ICMP message with the MTU, and the data stream is then broken into pieces of the correct size.  2. Establishes logical connections.  3. Is stateful, reliable and connection-oriented.  4. Provides full duplex connection - concurrent in both directions.  5. Has error correction and lost packet retransmission capability.  Examples: ftp, telnet, remote mount.

Characteristics of UDP:  1. Fast  2. Low overhead. Examples: SNMP, DHCP, RIP, NIS, DNS.

 

UDP header: source port (2 bytes), destination port (2 bytes), length (2 bytes), checksum (2 bytes)
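The four 2-byte fields map directly onto Python's struct module. This is a sketch for illustration: the function names are hypothetical, the checksum is left at zero rather than computed, and the length field counts the 8 header bytes plus the data.

```python
import struct

# Four unsigned 16-bit fields in network (big-endian) byte order.
UDP_HEADER = struct.Struct("!HHHH")


def build_udp_header(src_port: int, dst_port: int, payload: bytes, checksum: int = 0) -> bytes:
    """Pack an 8-byte UDP header for the given payload (checksum not computed)."""
    length = UDP_HEADER.size + len(payload)  # header plus data, in bytes
    return UDP_HEADER.pack(src_port, dst_port, length, checksum)


def parse_udp_header(datagram: bytes) -> dict:
    """Unpack the first 8 bytes of a datagram into named fields."""
    src_port, dst_port, length, checksum = UDP_HEADER.unpack_from(datagram)
    return {"src_port": src_port, "dst_port": dst_port,
            "length": length, "checksum": checksum}
```

For a 5-byte payload sent from port 12345 to port 53, the header is 8 bytes long and its length field reads 13.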

 

TCP header (24 bytes total):

source port: (2 bytes) The port  assigned to the outgoing packet

destination port (2 bytes) The port (often a well-known port) to which the packet is directed

 sequence number (4 bytes) Identifies the number of the first byte of data in the segment

acknowledgement number (4 bytes) The number of the next byte that the sender expects to receive from the remote host. Note that this number will depend on how many bytes have been received and will rarely be sequential.

data offset (4 bits) The length of the header measured in 32 bit (4 byte) words.

reserved space (6 bits) - doesn't contain anything!

flags (6 bits)

            URG- indicates the data is urgent

            ACK-indicates that there is a valid acknowledgement number set

            PSH - Indicates that the data should be passed to the application as soon as possible, which means any buffers collecting data should immediately be flushed to the application (pushed)

            RST - reset the connection

            SYN - establish initial agreement on sequence numbers

            FIN - the sender has sent the last packet of data.

window size (2 bytes) The amount of data that the system sending the packet is able to receive and store. The system receiving the packet will send up to that much data out without waiting for an acknowledgement.

checksum (2 bytes)

urgent pointer (2 bytes) - used only when the URG flag is set. It points to the end of the urgent data in the segment, which should be processed first.

options and  padding (4 bytes). The options are often not counted as part of the TCP header, in which case the header would be 20 bytes. There are many options possible defined in various RFCs.
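The field list above can be turned into a small parser for the fixed 20-byte part of the header. This is a sketch with hypothetical names; it ignores options and does not verify the checksum. The 4-bit data offset, 6 reserved bits and 6 flag bits share one 16-bit word, with FIN in the lowest bit and URG in the highest of the six.

```python
import struct

# src port, dst port, sequence, acknowledgement, offset+reserved+flags,
# window, checksum, urgent pointer: 20 bytes in network byte order.
TCP_FIXED = struct.Struct("!HHIIHHHH")
FLAG_NAMES = ("FIN", "SYN", "RST", "PSH", "ACK", "URG")  # bit 0 up to bit 5


def parse_tcp_header(segment: bytes) -> dict:
    """Unpack the fixed part of a TCP header into named fields."""
    (src, dst, seq, ack, off_flags,
     window, checksum, urgent) = TCP_FIXED.unpack_from(segment)
    return {
        "src_port": src,
        "dst_port": dst,
        "seq": seq,
        "ack": ack,
        # The top 4 bits give the header length in 32-bit words.
        "header_len": (off_flags >> 12) * 4,
        "flags": {name for bit, name in enumerate(FLAG_NAMES)
                  if off_flags & (1 << bit)},
        "window": window,
        "checksum": checksum,
        "urgent": urgent,
    }
```

A segment packed with a data offset of 5 and flag bits 0x12 parses to a 20-byte header with the SYN and ACK flags set. A data segment that also carries the ACK flag and an acknowledgement number is the piggybacking case: the same header serves both purposes.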

 
