HTTP Learning Notes
HTTP is a protocol which allows the fetching of resources, such as HTML documents.
Clients and servers communicate by exchanging individual messages (as opposed to a stream of data). The messages sent by the client, usually a Web browser, are called requests, and the messages sent by the server as an answer are called responses.
Components of HTTP-based systems
Client: The user-agent
The user-agent is any tool that acts on behalf of the user.
The Web server
The server which serves the document as requested by the client.
- load balancing
- open a TCP connection
- Send an HTTP message
- Read the response sent by the server
- Close or reuse the connection for further requests
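The client-side steps listed above can be sketched with a raw TCP socket. This is only an illustration under stated assumptions: the `HelloHandler` class, the `fetch` helper, and the local test server on 127.0.0.1 are made up for the example and are not part of these notes.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with a short text body."""

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch(host, port, path="/"):
    # 1. Open a TCP connection.
    with socket.create_connection((host, port)) as sock:
        # 2. Send an HTTP message.
        request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
        sock.sendall(request.encode("ascii"))
        # 3. Read the response sent by the server until it closes the socket.
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    # 4. Leaving the "with" block closes the connection.
    return b"".join(chunks)


server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
response = fetch("127.0.0.1", server.server_port)
server.shutdown()
server.server_close()

status_line = response.split(b"\r\n", 1)[0].decode("ascii")
print(status_line)
```

The sketch closes the connection after one exchange; reusing it for further requests would require a persistent connection.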
In 1990, Tim Berners-Lee built the World Wide Web, which consisted of four building blocks:
- A text format to represent hypertext documents, the HyperText Markup Language (HTML)
- A simple protocol to exchange these documents, the HyperText Transfer Protocol (HTTP)
- A client to display (and accidentally edit) these documents, the first Web browser, called WorldWideWeb
- A server to give access to the documents, an early version of httpd
HTTP/0.9 - The one-line protocol
The initial version of HTTP had no version number; it was later called 0.9 to differentiate it from later versions.
A very simple HTML page
HTTP/1.0 - Building extensibility
- Versioning information is now sent within each request (HTTP/1.0 is appended to the GET line)
- A status code line is also sent at the beginning of the response, allowing the browser itself to understand the success or failure of the request and to adapt its behavior in consequence (like in updating or using its local cache in a specific way)
- The notion of HTTP headers has been introduced, both for the requests and the responses, allowing metadata to be transmitted and making the protocol extremely flexible and extensible.
- With the help of the new HTTP headers, the ability to transmit other documents than plain HTML files has been added (thanks to the Content-Type header).
GET /mypage.html HTTP/1.0
User-Agent: NCSA_Mosaic/2.0 (Windows 3.1)
Date: Tue, 15 Nov 1994 08:12:31 GMT
Server: CERN/3.0 libwww/2.17
A page with an image
HTTP/1.1 - The standardized protocol
The first standardized version of HTTP, HTTP/1.1, was published in early 1997, only a few months after HTTP/1.0.
- A connection can be reused, saving the time to reopen it numerous times to display the resources embedded into the single original document retrieved.
- Pipelining has been added, allowing a second request to be sent before the answer to the first one is fully transmitted, lowering the latency of the communication.
- Chunked responses are now also supported.
- Additional cache control mechanisms have been introduced.
- Content negotiation, including language, encoding, or type, has been introduced, and allows a client and a server to agree on the most adequate content to exchange.
- Thanks to the Host header, the ability to host different domains at the same IP address now allows server colocation.
GET /en-US/docs/Glossary/Simple_header HTTP/1.1
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:50.0) Gecko/20100101 Firefox/50.0
Accept-Encoding: gzip, deflate, br
Content-Type: text/html; charset=utf-8
Date: Wed, 20 Jul 2016 10:55:30 GMT
Keep-Alive: timeout=5, max=1000
Last-Modified: Tue, 19 Jul 2016 00:59:33 GMT
Vary: Cookie, Accept-Encoding
GET /static/img/header-background.png HTTP/1.1
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:50.0) Gecko/20100101 Firefox/50.0
Accept-Encoding: gzip, deflate, br
Cache-Control: public, max-age=315360000
Date: Thu, 31 Mar 2016 13:34:46 GMT
Last-Modified: Wed, 21 Oct 2015 18:27:50 GMT
(image content of 3077 bytes)
HTTP/2 - A protocol for greater performance
- It is a binary protocol rather than text. It can no longer be read and created manually. Despite this hurdle, improved optimization techniques can now be implemented.
- It is a multiplexed protocol. Parallel requests can be handled over the same connection, removing the order and blocking constraints of the HTTP/1.x protocol.
- It compresses headers. As these are often similar among a set of requests, this removes duplication and overhead of data transmitted.
- It allows a server to populate data in a client cache, in advance of it being required, through a mechanism called the server push.
- Support of Alt-Svc allows the dissociation of the identification and the location of a given resource, allowing for a smarter CDN caching mechanism.
- The introduction of Client-Hints allows the browser, or client, to proactively communicate information about its requirements, or hardware constraints, to the server.
- The introduction of security-related prefixes in the Cookie header now helps guarantee that a secure cookie has not been altered.
HTTP messages are how data is exchanged between a server and a client. There are two types of messages: requests, sent by the client to trigger an action on the server, and responses, the answer from the server.
HTTP requests and responses share a similar structure and are composed of:
- A start-line describing the request to be implemented, or its status (whether it succeeded or failed). This start-line is always a single line.
- An optional set of HTTP headers specifying the request, or describing the body included in the message.
- A blank line indicating that all meta-information for the request has been sent.
- An optional body containing data associated with the request (like the content of an HTML form), or the document associated with a response. The presence of the body and its size are specified by the start-line and HTTP headers.
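The four parts listed above can be pulled out of a raw message with a few lines of Python. This is a minimal sketch, not a complete parser: `parse_http_message` and the sample request are made up for illustration, and folded headers, repeated headers, and chunked bodies are ignored.

```python
def parse_http_message(raw: bytes):
    """Split a raw HTTP/1.x message into start-line, headers, and body."""
    # The first blank line separates the meta-information from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lowercase.
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


raw_request = (
    b"POST /form HTTP/1.1\r\n"
    b"Host: example.com\r\n"
    b"Content-Type: application/x-www-form-urlencoded\r\n"
    b"Content-Length: 9\r\n"
    b"\r\n"
    b"name=test"
)
start_line, headers, body = parse_http_message(raw_request)
method, target, version = start_line.split(" ")
print(method, target, version)
print(headers["content-length"], body)
```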
- An HTTP method, a verb (like POST) or a noun (like OPTIONS), that describes the action to be performed.
- The request target, usually a URL or an absolute path. The protocol, port, and domain are usually implied by the request context.
- The HTTP version, which defines the structure of the remaining message, acting as an indicator of the expected version to use for the response.
HTTP headers from a request follow the same basic structure of an HTTP header: a case-insensitive string followed by a colon (’:’) and a value whose structure depends upon the header.
- General headers, like Via, apply to the message as a whole.
- Request headers, like Accept, modify the request by specifying it further (like Accept-Language), by giving context (like Referer), or by conditionally restricting it.
- Entity headers, like Content-Length, which apply to the body of the request. No such header is transmitted if there is no body in the request.
A POST request has a body.
- Single-resource bodies, consisting of one single file, defined by the two headers:
- Multiple-resource bodies, consisting of a multipart body, each containing a different bit of information. This is typically associated with HTML Forms.
The start line of an HTTP response, called the status line, contains the following information:
- The protocol version, usually
- A status code, indicating success or failure of the request. Common status codes are
- A status text. A brief, purely informational, textual description of the status code to help a human understand the HTTP message.
HTTP headers for responses follow the same structure as any other header: a case-insensitive string followed by a colon (’:’) and a value whose structure depends upon the type of the header.
- General headers, like Via, apply to the whole message.
- Response headers, like Accept-Ranges, give additional information about the server which doesn’t fit in the status line.
- Entity headers, like Content-Length, apply to the body of the response.
Not all responses have a body: responses with a status code like 204 usually don’t.
- Single-resource bodies, consisting of a single file of known length, defined by the two headers:
- Single-resource bodies, consisting of a single file of unknown length, encoded by chunks with Transfer-Encoding set to chunked.
- Multiple-resource bodies, consisting of a multipart body, each containing a different section of information. These are relatively rare.
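To make the chunked case concrete, here is a small sketch of how a body sent with Transfer-Encoding set to chunked could be reassembled. The `decode_chunked` helper is an illustrative name, and the sketch assumes a complete, well-formed input and ignores trailers.

```python
def decode_chunked(data: bytes) -> bytes:
    """Reassemble a body sent with Transfer-Encoding: chunked (trailers ignored)."""
    body = b""
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        # The chunk size is hexadecimal; anything after ";" is a chunk extension.
        size = int(data[pos:line_end].split(b";")[0], 16)
        if size == 0:
            break  # a zero-sized chunk marks the end of the body
        start = line_end + 2
        body += data[start:start + size]
        pos = start + size + 2  # skip the CRLF that follows the chunk data
    return body


encoded = b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"
print(decode_chunked(encoded))  # b'Wikipedia'
```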
A typical HTTP session
In client-server protocols, like HTTP, sessions consist of three phases:
- The client establishes a TCP connection (or the appropriate connection if the transport layer is not TCP).
- The client sends its request, and waits for the answer.
- The server processes the request, sending back its answer, providing a status code and appropriate data.
Establishing a connection
In client-server protocols, it is the client which establishes the connection. Opening a connection in HTTP means initiating a connection in the underlying transport layer, usually TCP.
Sending a client request
Once the connection is established, the user-agent can send the request.
Connection management in HTTP/1.x
Connection management is a key topic in HTTP: opening and maintaining connections largely impacts the performance of Web sites and Web applications.
- short-lived connections
- persistent connections
- HTTP pipelining
Two primary MIME types are important for the role of default types:
text/plain is the default value for textual files. A textual file should be human-readable and must not contain binary data.
application/octet-stream is the default value for all other cases. An unknown file type should use this type. Browsers take particular care when manipulating these files, attempting to safeguard the user against dangerous behaviors.
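As a rough illustration of the fallback rule, the following sketch guesses a type from the file name with Python's standard `mimetypes` module and falls back to application/octet-stream. The `content_type_for` helper and the file names are made up for the example.

```python
import mimetypes


def content_type_for(filename: str) -> str:
    """Guess a MIME type from the file name, defaulting to application/octet-stream."""
    guessed, _encoding = mimetypes.guess_type(filename)
    # guess_type returns None for the type when the extension is unknown.
    return guessed or "application/octet-stream"


print(content_type_for("notes.txt"))
print(content_type_for("archive.unknown-ext"))
```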
Choosing between www and non-www URLs
Choose one of your domains as your canonical one.
- Using HTTP 301 redirects
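A 301 redirect to the canonical domain might be decided as in the sketch below. The host name `www.example.org`, the `https` scheme, and the `canonical_redirect` helper are all assumptions made for illustration.

```python
CANONICAL_HOST = "www.example.org"  # hypothetical canonical domain


def canonical_redirect(host: str, path: str):
    """Return (status, headers) for a 301 redirect, or None if the host is canonical."""
    if host.lower() == CANONICAL_HOST:
        return None  # already on the canonical domain: serve the page normally
    # Assumption: the canonical site is served over HTTPS.
    location = f"https://{CANONICAL_HOST}{path}"
    return 301, {"Location": location}


print(canonical_redirect("example.org", "/docs?page=2"))
print(canonical_redirect("www.example.org", "/docs"))
```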
An HTTP cookie (web cookie, browser cookie) is a small piece of data that a server sends to the user’s web browser. The browser may store it and send it back with the next request to the same server. It remembers stateful information for the stateless HTTP protocol.
Cookies are mainly used for three purposes:
Session management: Logins, shopping carts, game scores, or anything else the server should remember
Personalization: User preferences, themes, and other settings
Tracking: Recording and analyzing user behavior
A server can send a Set-Cookie header with the response.
HTTP/1.0 200 OK
With every new request to the server, the browser sends back all previously stored cookies using the Cookie header.
GET /sample_page.html HTTP/1.1
Cookie: yummy_cookie=choco; tasty_cookie=strawberry
A session cookie is deleted when the client shuts down, because it does not specify an Expires or Max-Age directive.
Instead of expiring when the client closes, permanent cookies expire at a specific date (Expires) or after a specific length of time (Max-Age).
Set-Cookie: id=a3fWa; Expires=Wed, 21 Oct 2015 07:28:00 GMT;
Secure and HttpOnly cookies
A secure cookie is only sent to the server with an encrypted request over the HTTPS protocol. Even with Secure, sensitive information should never be stored in cookies, as they are inherently insecure and this flag can’t offer real protection.
Set-Cookie: id=a3fWa; Expires=Wed, 21 Oct 2015 07:28:00 GMT; Secure; HttpOnly
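The same headers can be produced and read with Python's standard `http.cookies` module. This is only a sketch of one way to do it; the cookie names and values are the ones from the examples in these notes, and the variable names are illustrative.

```python
from http.cookies import SimpleCookie

# Server side: build a Set-Cookie header with an expiry date and both flags.
cookie = SimpleCookie()
cookie["id"] = "a3fWa"
cookie["id"]["expires"] = "Wed, 21 Oct 2015 07:28:00 GMT"
cookie["id"]["secure"] = True
cookie["id"]["httponly"] = True
header_line = cookie.output()
print(header_line)

# Server side: parse the Cookie header that the browser sends back.
incoming = SimpleCookie()
incoming.load("yummy_cookie=choco; tasty_cookie=strawberry")
values = {name: morsel.value for name, morsel in incoming.items()}
print(values)
```

Note that the module may order and capitalize the attributes differently from the hand-written header; attribute names are case-insensitive, so the meaning is the same.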
Scope of cookies
The Domain and Path directives define the scope of the cookie: what URLs the cookies should be sent to.
document.cookie = "yummy_cookie=choco";
document.cookie = "tasty_cookie=strawberry";
console.log(document.cookie);
// logs "yummy_cookie=choco; tasty_cookie=strawberry"
HTTP access control (CORS)
Cross-Origin Resource Sharing (CORS) is a mechanism that uses additional HTTP headers to tell a browser to let a web application running at one origin (domain) have permission to access selected resources from a server at a different origin. A web application makes a cross-origin HTTP request when it requests a resource that has a different origin (domain, protocol, and port) than its own origin.
What requests use CORS
This cross-origin sharing standard is used to enable cross-site HTTP requests for:
- Invocations of the XMLHttpRequest or Fetch APIs in a cross-site manner, as discussed above.
- Web Fonts (for cross-domain font usage in @font-face within CSS), so that servers can deploy TrueType fonts that can only be cross-site loaded and used by web sites that are permitted to do so.
- WebGL textures.
- Images/video frames drawn to a canvas using drawImage().
- Stylesheets (for CSSOM access).
- Scripts (for unmuted exceptions).
The Cross-Origin Resource Sharing standard works by adding new HTTP headers that allow servers to describe the set of origins that are permitted to read that information using a web browser. Additionally, for HTTP request methods that can cause side-effects on server’s data (in particular, for HTTP methods other than GET, or for POST usage with certain MIME types), the specification mandates that browsers “preflight” the request, soliciting supported methods from the server with an HTTP OPTIONS request method, and then, upon “approval” from the server, sending the actual request with the actual HTTP request method. Servers can also notify clients whether “credentials” (including Cookies and HTTP Authentication data) should be sent with requests.
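On the server side, describing the permitted origins could look like the sketch below. It covers only the Access-Control-Allow-Origin header, not preflight handling or credentials. The allow-list and the `cors_headers` helper are hypothetical.

```python
ALLOWED_ORIGINS = {"https://app.example.com"}  # hypothetical allow-list


def cors_headers(origin):
    """Return the CORS response headers for the value of a request's Origin header."""
    if origin not in ALLOWED_ORIGINS:
        # Without CORS headers, the browser refuses to expose the response
        # to the cross-origin page.
        return {}
    return {
        "Access-Control-Allow-Origin": origin,
        # The response differs per Origin, so caches must take it into account.
        "Vary": "Origin",
    }


print(cors_headers("https://app.example.com"))
print(cors_headers("https://evil.example.net"))
```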
The HTTP response headers
The HTTP request headers
Compression in HTTP
Compression algorithms categories:
- Lossless compression, where the compression-decompression cycle doesn’t alter the data that is recovered. It matches (byte for byte) the original. For images, gif and png use lossless compression.
- Lossy compression, where the cycle alters the original data in a way that is imperceptible to the user. Video formats on the Web are lossy; for images, jpeg is lossy too.
Some formats, like webp, can be used for either lossless or lossy compression, and a lossy algorithm can usually be configured to compress more or less, which leads to lower or higher quality.
For compression, end-to-end compression is where the largest performance improvements of Web sites reside. End-to-end compression refers to a compression of the body of a message that is done by the server and will last unchanged until it reaches the client. Whatever the intermediate nodes are, they leave the body untouched.
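A minimal sketch of end-to-end compression with gzip, using Python's standard `gzip` module: the server compresses the body once and labels it with Content-Encoding, and the client recovers the original bytes unchanged. The sample body and the variable names are made up for illustration.

```python
import gzip

# Server side: compress the body and describe it in the headers.
body = b"<html>" + b"<p>Hello, HTTP!</p>" * 200 + b"</html>"
compressed = gzip.compress(body)
headers = {
    "Content-Encoding": "gzip",
    "Content-Length": str(len(compressed)),
}

# Client side: gzip is lossless, so decompression restores the exact bytes.
restored = gzip.decompress(compressed)
print(len(body), len(compressed), restored == body)
```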
Hop-by-hop compression, though similar to end-to-end compression, differs by one fundamental element: the compression doesn’t happen on the resource in the server, creating a specific representation that is then transmitted, but on the body of the message between any two nodes on the path between the client and the server. Connections between successive intermediate nodes may apply a different compression.
HTTP conditional requests
HTTP has a concept of conditional requests, where the result, and even the success of a request, can be changed by comparing the affected resources with the value of a validator. Such requests can be useful to validate the content of a cache and avoid a needless transfer, to verify the integrity of a document (as when resuming a download), or to prevent lost updates when uploading or modifying a document on the server.
- the date of last modification of the document, the last-modified date.
- an opaque string, uniquely identifying each version, called the entity tag, or the etag.
If-Match: Succeeds if the ETag of the distant resource is equal to one listed in this header. By default, unless the etag is prefixed with ‘W/’, it performs a strong validation.
If-None-Match: Succeeds if the ETag of the distant resource is different from each one listed in this header. By default, unless the etag is prefixed with ‘W/’, it performs a strong validation.
If-Modified-Since: Succeeds if the Last-Modified date of the distant resource is more recent than the one given in this header.
If-Unmodified-Since: Succeeds if the Last-Modified date of the distant resource is older than or the same as the one given in this header.
If-Range: Similar to If-Match or If-Unmodified-Since, but can have only a single etag or a single date. If it fails, the range request fails, and instead of a 206 Partial Content response, a 200 OK is sent with the complete resource.
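The two date-based conditions can be sketched as simple comparisons. The helper names below are illustrative, the dates are borrowed from the HTTP/1.1 example earlier in these notes, and the choice of 304 for a failed If-Modified-Since check follows common server behavior for GET requests.

```python
from email.utils import parsedate_to_datetime


def if_modified_since(last_modified: str, header_value: str) -> bool:
    """True when the resource is more recent than the date in If-Modified-Since."""
    return parsedate_to_datetime(last_modified) > parsedate_to_datetime(header_value)


def if_unmodified_since(last_modified: str, header_value: str) -> bool:
    """True when the resource is older than, or as old as, the given date."""
    return parsedate_to_datetime(last_modified) <= parsedate_to_datetime(header_value)


last_modified = "Tue, 19 Jul 2016 00:59:33 GMT"
cached_copy_date = "Wed, 20 Jul 2016 10:55:30 GMT"

# The resource has not changed since the cached copy was stored,
# so the server can answer 304 Not Modified instead of resending it.
status = 200 if if_modified_since(last_modified, cached_copy_date) else 304
print(status)
```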
HTTP Content negotiation
In HTTP, content negotiation is the mechanism that is used for serving different representations of a resource at the same URI, so that the user agent can specify which is best suited for the user (for example, which language of a document, which image format, or which content encoding).
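As an illustration of language negotiation, the sketch below picks the best available language from an Accept-Language header by its q-values. It is deliberately simplified: tags are matched exactly and the `*` wildcard is not interpreted. The `pick_language` helper and its parameters are made up for the example.

```python
def pick_language(accept_language: str, available: list, default: str) -> str:
    """Pick the available language with the highest q-value (exact tag match only)."""
    preferences = []
    for item in accept_language.split(","):
        tag, _, params = item.strip().partition(";")
        quality = 1.0  # q defaults to 1 when it is omitted
        if params.strip().startswith("q="):
            quality = float(params.strip()[2:])
        preferences.append((quality, tag.strip().lower()))
    # Highest quality first; the sort is stable, so ties keep the header order.
    for quality, tag in sorted(preferences, key=lambda p: p[0], reverse=True):
        if quality > 0 and tag in available:
            return tag
    return default


header = "fr-CH, fr;q=0.9, en;q=0.8, de;q=0.7, *;q=0.5"
print(pick_language(header, ["en", "de"], "en"))
```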
HTTP range requests
HTTP range requests allow a server to send only a portion of an HTTP message to a client. Partial requests are useful for large media or for downloading files with pause and resume functions, for example.
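A server answering a range request has to turn the Range header into byte offsets. The sketch below handles a single `bytes=` range only; multi-range requests and other units are treated as unsupported. The `parse_byte_range` helper and the sample data are assumptions for illustration.

```python
def parse_byte_range(header_value: str, size: int):
    """Resolve a single 'bytes=' range to inclusive (start, end) offsets, or None."""
    unit, _, spec = header_value.partition("=")
    if unit.strip() != "bytes" or "," in spec:
        return None  # other units and multi-range requests are out of scope here
    first, _, last = spec.strip().partition("-")
    if first == "":
        # Suffix form ("-N"): the final N bytes of the resource.
        length = int(last)
        start, end = max(size - length, 0), size - 1
    else:
        start = int(first)
        end = min(int(last), size - 1) if last else size - 1
    if start > end or start >= size:
        return None  # unsatisfiable range
    return start, end


data = bytes(range(256)) * 4  # 1024 bytes of sample content
start, end = parse_byte_range("bytes=0-99", len(data))
partial = data[start:end + 1]  # body of the 206 Partial Content response
content_range = f"bytes {start}-{end}/{len(data)}"
print(len(partial), content_range)
```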
URL redirection, also known as URL forwarding, is a technique to give a page, a form, or a whole Web application more than one URL address. HTTP provides a special kind of response, HTTP redirects, to perform this operation, which is used for numerous goals: temporary redirection while site maintenance is ongoing, permanent redirection to keep external links working after a change of the site’s architecture, progress pages when uploading a file, and so on.
There are several types of redirects and they fall into three categories: permanent, temporary and special redirections.
<meta http-equiv="refresh" content="0; URL=http://www.example.com/" />
window.location = "http://www.example.com/";
- HTTP redirects are always executed first, before any page has been transmitted, let alone read.
- HTML redirects (<meta>) are executed only if there weren’t any HTTP redirects.