HTTP Succinctly: HTTP Connections

In the last article we looked at HTTP messages and saw examples of the text commands and codes that flow from the client to the server and back in an HTTP transaction. But, how does the information in these messages move through the network? When are the network connections opened? When are the connections closed? These are some of the questions this article will answer as we look at HTTP from a low-level perspective. But first, we'll need to understand some of the abstractions below HTTP.


A Whirlwind Tour of Networking

To understand HTTP connections we have to know just a bit about what happens in the layers underneath HTTP. Network communication, like many applications, is built in layers. Each layer in a communication stack has a specific and limited set of responsibilities.

For example, HTTP is what we call an application layer protocol because it allows two applications to communicate over a network. Quite often one of the applications is a web browser, and the other application is a web server like IIS or Apache. We saw how HTTP messages allow the browser to request resources from the server. But, the HTTP specifications don't say anything about how the messages actually cross the network and reach the server; that's the job of lower-layer protocols. A message from a web browser has to travel down a series of layers, and when it arrives at the web server it travels up through a series of layers to reach the web server process.

Figure 4: Protocol layers

The layer underneath HTTP is a transport layer protocol. Almost all HTTP traffic travels over TCP (short for Transmission Control Protocol), although this isn't required by HTTP. When a user types a URL into the browser, the browser first extracts the host name from the URL (and port number, if any), and opens a TCP socket by specifying the server address (derived from the host name) and port (which defaults to 80).
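
In .NET, for instance, the System.Uri class performs this kind of parsing. The following is a minimal sketch, and the URL is arbitrary:

    using System;

    public class UrlParts
    {
        public static void Main()
        {
            // An arbitrary URL for illustration. Uri extracts the pieces
            // a client needs before it can open a socket.
            var uri = new Uri("http://www.odetocode.com/articles/741.aspx");
            Console.WriteLine(uri.Host); // www.odetocode.com
            Console.WriteLine(uri.Port); // 80, the default port for http
        }
    }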

Once an application has an open socket it can begin writing data into the socket. The only thing the browser needs to worry about is writing a properly formatted HTTP request message into the socket. The TCP layer accepts the data and ensures the message gets delivered to the server without being lost or duplicated. TCP will automatically resend any information that is lost in transit, which is why TCP is known as a reliable protocol. In addition to error detection and recovery, TCP also provides flow control: TCP's flow control algorithms ensure the sender does not send data faster than the receiver can process it. Flow control is important in this world of varied networks and devices.

In short, TCP provides services vital to the successful delivery of HTTP messages, but it does so in a transparent way so that most applications don't need to worry about TCP. As the previous figure shows, TCP is just the first layer beneath HTTP. After TCP at the transport layer comes IP as a network layer protocol.

IP is short for Internet Protocol. While TCP is responsible for error detection, flow control, and overall reliability, IP is responsible for taking pieces of information and moving them through the various switches, routers, gateways, repeaters, and other devices that move information from one network to the next and all around the world. IP tries hard to deliver the data to its destination, but it doesn't guarantee delivery; that's TCP's job. IP requires computers to have an address (the famous IP address, an example being 208.192.32.40). IP is also responsible for breaking data into packets (often called datagrams), and sometimes fragmenting and reassembling these packets so they are optimized for a particular network segment.

Everything we've talked about so far happens inside a computer, but eventually these IP packets have to travel over a piece of wire, a fiber optic cable, a wireless network, or a satellite link. This is the responsibility of the data link layer. A common choice of technology at this point is Ethernet. At this level, data packets become frames, and low-level protocols like Ethernet are focused on 1s, 0s, and electrical signals.

Eventually the signal reaches the server and comes in through a network card where the process is reversed. The data link layer delivers packets to the IP layer, which hands over data to TCP, which can reassemble the data into the original HTTP message sent by the client and push it into the web server process. It's a beautifully engineered piece of work all made possible by standards.


Quick HTTP Request With Sockets and C#

If you are wondering what it looks like to write an application that will make HTTP requests, then the following C# program is a simple example of what such code might look like. This code does not have any error handling, and tries to write any server response to the console window (so you'll need to request a textual resource), but it works for simple requests. A copy of the code sample is available from https://bitbucket.org/syncfusion/http-succinctly. The sample name is sockets-sample.
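
Here is a minimal version of such a program. The host name is arbitrary, and the details may differ from the downloadable sample:

    using System;
    using System.Net;
    using System.Net.Sockets;
    using System.Text;

    public class GetSocket
    {
        public static void Main(string[] args)
        {
            // An arbitrary host and resource for illustration.
            var host = "www.odetocode.com";
            var resource = "/";

            var result = GetResource(host, resource);
            Console.WriteLine(result);
        }

        private static string GetResource(string host, string resource)
        {
            // Look up the server's IP address from the host name.
            var hostEntry = Dns.GetHostEntry(host);
            var socket = CreateSocket(hostEntry);
            SendRequest(socket, host, resource);
            return GetResponse(socket);
        }

        private static Socket CreateSocket(IPHostEntry hostEntry)
        {
            const int httpPort = 80;
            var endPoint = new IPEndPoint(hostEntry.AddressList[0], httpPort);
            var socket = new Socket(endPoint.AddressFamily,
                                    SocketType.Stream, ProtocolType.Tcp);
            socket.Connect(endPoint);
            return socket;
        }

        private static void SendRequest(Socket socket, string host, string resource)
        {
            // A minimal HTTP 1.1 request. Connection: close asks the server
            // to close the connection when the response is complete, so the
            // read loop below knows when to stop.
            var request = String.Format(
                "GET {0} HTTP/1.1\r\nHost: {1}\r\nConnection: close\r\n\r\n",
                resource, host);
            socket.Send(Encoding.ASCII.GetBytes(request));
        }

        private static string GetResponse(Socket socket)
        {
            var buffer = new byte[4096];
            var result = new StringBuilder();
            int bytesRead;

            // Receive returns 0 when the server closes the connection.
            while ((bytesRead = socket.Receive(buffer)) > 0)
            {
                result.Append(Encoding.ASCII.GetString(buffer, 0, bytesRead));
            }
            return result.ToString();
        }
    }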

Notice how the program needs to look up the server address (using Dns.GetHostEntry), and formulate a proper HTTP message with a GET method and Host header. The actual networking part is fairly easy, because the socket implementation and TCP take care of most of the work. TCP understands, for example, how to manage multiple connections to the same server (each connection receives a different local port number). Because of this, two outstanding requests to the same server won't get confused and receive data intended for the other.


Networking and Wireshark

If you want some visibility into TCP and IP you can install a free program such as Wireshark (available for OS X and Windows from wireshark.org). Wireshark is a network analyzer that can show you every bit of information flowing through your network interfaces. Using Wireshark you can observe TCP handshakes, which are the TCP messages required to establish a connection between the client and server before the actual HTTP messages start flowing. You can also see the TCP and IP headers (20 bytes each) on every message. The following figure shows the last two steps of a handshake, followed by a GET request and a 304 (Not Modified) response.

Figure 5: Using Wireshark

With Wireshark you can see when HTTP connections are established and closed. The important part to take away from all of this is not how handshakes and TCP work at the lowest level, but that HTTP relies almost entirely on TCP to take care of all the hard work, and that TCP involves some overhead, such as handshakes. Thus, the performance characteristics of HTTP also rely on the performance characteristics of TCP, and that's the topic for the next section.


HTTP, TCP, and the Evolution of the Web

In the very old days of the web, most resources were textual. You could request a document from a web server, go off and read for five minutes, then request another document. The world was simple.

Today, most webpages require more than a single resource to fully render. Every page in a web application has one or more images, one or more JavaScript files, and one or more CSS files. It's not uncommon for the initial request for a home page to spawn 30 or 50 additional requests to retrieve all the other resources associated with the page.

In the old days it was also simple for a browser to establish a connection with a server, send a request, receive the response, and close the connection. If today's web browsers opened connections one at a time, and waited for each resource to fully download before starting the next download, the web would feel very slow. The Internet is full of latency. Signals have to travel long distances and wind their way through different pieces of hardware. There is also some overhead in establishing a TCP connection. As we saw in the Wireshark screenshot, there is a three-way handshake to complete before an HTTP transaction can begin.

The evolution from simple documents to complex pages has required some ingenuity in the practical use of HTTP.


Parallel Connections

Most user agents (aka web browsers) will not make requests in a serial one-by-one fashion. Instead, they open multiple parallel connections to a server. For example, when downloading the HTML for a page the browser might see two <img> tags in the page, so the browser will open two parallel connections to download the two images simultaneously. The number of parallel connections depends on the user agent and the agent's configuration.
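
As a rough sketch of the same idea in code, the following C# program starts two downloads before either one completes. The URLs are hypothetical:

    using System;
    using System.Net.Http;
    using System.Threading.Tasks;

    public class ParallelFetch
    {
        public static void Main()
        {
            using (var client = new HttpClient())
            {
                // Both requests start before either response arrives, so the
                // downloads can proceed over parallel connections, much like
                // a browser fetching two images referenced by one page.
                Task<byte[]> first = client.GetByteArrayAsync("http://www.odetocode.com/a.png");
                Task<byte[]> second = client.GetByteArrayAsync("http://www.odetocode.com/b.png");

                Task.WaitAll(first, second);
                Console.WriteLine("Downloaded {0} and {1} bytes",
                                  first.Result.Length, second.Result.Length);
            }
        }
    }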

For a long time, two was considered the maximum number of parallel connections a browser would create, because the most popular browser for many years, Internet Explorer (IE) 6, would only allow two simultaneous connections to a single host. IE was only obeying the rules spelled out in the HTTP 1.1 specification, which states:

A single-user client SHOULD NOT maintain more than 2 connections with any server or proxy.

To increase the number of parallel downloads, many websites use some tricks. For example, the two-connection limit is per host name, meaning a browser like IE 6 would happily make two parallel connections to www.odetocode.com, and another two parallel connections to images.odetocode.com. By hosting images on a different host name, websites could increase the number of parallel downloads and make their pages load faster. This works even if the DNS records point all of the names at the same server, because the limit is per host name, not per IP address.

Things are different today. Most user agents will use a different set of heuristics when deciding how many parallel connections to establish. For example, Internet Explorer 8 will now open up to six concurrent connections.

The real question to ask is: how many connections are too many? Parallel connections obey the law of diminishing returns. Too many connections can saturate and congest the network, particularly when mobile devices or unreliable networks are involved, and can actually hurt performance. Also, a server can only accept a finite number of connections, so if 100,000 browsers simultaneously create 100 connections each to a single web server, bad things will happen. Still, using more than one connection per agent is better than downloading everything in a serial fashion.

Fortunately, parallel connections are not the only performance optimization.


Persistent Connections

In the early days of the web, a user agent would open and close a connection for each individual request it sent to a server. This implementation was in accordance with HTTP's idea of being a completely stateless protocol. As the number of requests per page grew, so did the overhead generated by TCP handshakes and the in-memory data structures required to establish each TCP socket. To reduce this overhead and improve performance, the HTTP 1.1 specification suggests that clients and servers should implement persistent connections, and make persistent connections the default type of connection.
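
In .NET, for instance, HttpWebRequest asks for a persistent connection by default, and sequential requests to the same host can reuse one underlying socket. A minimal sketch, with an arbitrary URL:

    using System;
    using System.IO;
    using System.Net;

    public class PersistentDemo
    {
        public static void Main()
        {
            for (var i = 0; i < 2; i++)
            {
                var request = (HttpWebRequest)WebRequest.Create("http://www.odetocode.com/");
                request.KeepAlive = true; // true is the default

                using (var response = (HttpWebResponse)request.GetResponse())
                using (var reader = new StreamReader(response.GetResponseStream()))
                {
                    reader.ReadToEnd();
                    Console.WriteLine("Request {0} complete", i + 1);
                }
            }
            // In Wireshark, both requests typically appear on one TCP
            // connection, with only a single handshake at the start.
        }
    }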

A persistent connection stays open after the completion of one request-response transaction. This behavior leaves a user agent with an already open socket it can use to continue making requests to the server without the overhead of opening a new socket. Persistent connections also avoid paying the cost of TCP's slow start strategy (part of TCP congestion control) on every new connection, so requests perform better over time. In short, persistent connections reduce memory usage, reduce CPU usage, reduce network congestion, reduce latency, and generally improve the response time of a page. But, like everything in life, there is a downside.

As mentioned earlier, a server can only support a finite number of incoming connections. The exact number depends on the amount of memory available, the configuration of the server software, the performance of the application, and many other variables. It's difficult to give an exact number, but generally speaking, if you talk about supporting thousands of concurrent connections, you'll have to start testing to see if a server will support the load. In fact, many servers are configured to limit the number of concurrent connections far below the point where the server will fall over. The configuration is a security measure to help prevent denial of service attacks. It's relatively easy for someone to create a program that will open thousands of persistent connections to a server and keep the server from responding to real clients. Persistent connections are a performance optimization but also a vulnerability.

Thinking along the lines of a vulnerability, we also have to wonder how long to keep a persistent connection open. In a world of infinite scalability, connections could stay open for as long as the user-agent program was running. But, because a server supports a finite number of connections, most servers are configured to close a persistent connection if it is idle for some period of time (five seconds in Apache, for example). User agents can also close connections after a period of idle time. The only visibility into when connections close is through a network analyzer like Wireshark.
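
In Apache, for example, persistent connection behavior is controlled with a few directives. The values below are representative; the defaults vary by version:

    # Allow persistent connections (setting this to Off disables them).
    KeepAlive On
    # Close a persistent connection after this many requests.
    MaxKeepAliveRequests 100
    # Close a persistent connection after this many seconds of idle time.
    KeepAliveTimeout 5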

In addition to aggressively closing persistent connections, most web server software can be configured to disable persistent connections. This is common with shared servers. Shared servers sacrifice performance to allow as many connections as possible. Because persistent connections are the default connection style with HTTP 1.1, a server that does not allow persistent connections has to include a Connection header in every HTTP response. The following response is an example (the header values are illustrative).
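
    HTTP/1.1 200 OK
    Content-Type: text/html; charset=utf-8
    Server: Microsoft-IIS/7.0
    Connection: close
    Content-Length: 17151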

The Connection: close header is a signal to the user agent that the connection will not be persistent and should be closed as soon as possible. The agent isn't allowed to make a second request on the same connection.


Pipelined Connections

Parallel connections and persistent connections are both widely used and supported by clients and servers. The HTTP specification also allows for pipelined connections, which are not as widely supported by either servers or clients. In a pipelined connection, a user agent can send multiple HTTP requests on a connection before waiting for the first response. Pipelining allows for more efficient packing of requests into packets and can reduce latency, but the uneven support among servers, proxies, and user agents has kept it from catching on.
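
On the wire, a pipelined exchange might begin with two requests sent back to back, before any response arrives. The host and paths here are illustrative:

    GET /page1.html HTTP/1.1
    Host: www.odetocode.com

    GET /page2.html HTTP/1.1
    Host: www.odetocode.com

The server must then return the responses in the same order it received the requests.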


Where Are We?

In this article we've looked at HTTP connections and talked about some of the performance optimizations made possible by the HTTP specifications. Now that we have delved deep into HTTP messages and even examined the connections and TCP support underneath the protocol, we'll take a step back and look at the Internet from a wider perspective.
