HTTP Learning Notes

October 27, 2018

Intro

HTTP is a protocol which allows the fetching of resources, such as HTML documents. Clients and servers communicate by exchanging individual messages (as opposed to a stream of data). The messages sent by the client, usually a Web browser, are called requests, and the messages sent by the server as an answer are called responses.

Components of HTTP-based systems

Client: The user-agent

The user-agent is any tool that acts on behalf of the user.

The Web server

The server serves the document requested by the client.

Proxies

caching
filtering
load balancing
authentication
logging

HTTP flow

Open a TCP connection
Send an HTTP message
Read the response sent by the server
Close or reuse the connection for further requests
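
The four steps above can be sketched with a raw socket. This is only an illustration under stated assumptions: HelloHandler and fetch are hypothetical names, and the script starts a throwaway local server from Python's standard library so that it does not depend on any real site.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with a short text body."""

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch(host, port, path):
    """Walk through the HTTP flow over a raw TCP socket."""
    # 1. Open a TCP connection.
    with socket.create_connection((host, port)) as sock:
        # 2. Send an HTTP message.
        request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
        sock.sendall(request.encode("ascii"))
        # 3. Read the response sent by the server (until it closes the stream).
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    # 4. The connection is closed here; this sketch does not reuse it.
    return b"".join(chunks)


server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
response = fetch("127.0.0.1", server.server_port, "/")
server.shutdown()
server.server_close()
print(response.split(b"\r\n")[0].decode())
```

A real client would normally rely on a library such as http.client instead of hand-writing the request; the raw socket is used here only to make each step visible.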

Evolution

Invention

In 1990, Tim Berners-Lee built the World Wide Web, which consisted of four building blocks:

A text format to represent hypertext documents, the HyperText Markup Language (HTML)
A simple protocol to exchange these documents, the HyperText Transfer Protocol (HTTP)
A client to display (and accidentally edit) these documents, the first Web browser called WorldWideWeb
A server to give access to the document, an early version of httpd

HTTP/0.9 - The one-line protocol

The initial version of HTTP had no version number; it was later called 0.9 to differentiate it from later versions.

GET /mypage.html

<HTML>
A very simple HTML page
</HTML>

HTTP/1.0 - Building extensibility

Versioning information is now sent within each request (HTTP/1.0 is appended to the GET line)
A status code line is also sent at the beginning of the response, allowing the browser itself to understand the success or failure of the request and to adapt its behavior accordingly (for example, by updating or using its local cache in a specific way).
The notion of HTTP headers has been introduced, both for the requests and the responses, allowing metadata to be transmitted and making the protocol extremely flexible and extensible.
With the help of the new HTTP headers, the ability to transmit other documents than plain HTML files has been added (thanks to the Content-Type header).

GET /mypage.html HTTP/1.0
User-Agent: NCSA_Mosaic/2.0 (Windows 3.1)

HTTP/1.0 200 OK
Date: Tue, 15 Nov 1994 08:12:31 GMT
Server: CERN/3.0 libwww/2.17
Content-Type: text/html

<HTML>
A page with an image
  <IMG SRC="/myimage.gif">
</HTML>

HTTP/1.1 - The standardized protocol

HTTP/1.1, the first standardized version of HTTP, was published in early 1997, only a few months after HTTP/1.0.

Features:

A connection can be reused, saving the time to reopen it numerous times to display the resources embedded into the single original document retrieved.
Pipelining has been added, allowing a second request to be sent before the answer to the first one is fully transmitted, lowering the latency of the communication.
Chunked responses are now also supported.
Additional cache control mechanisms have been introduced.
Content negotiation, including language, encoding, or type, has been introduced, and allows a client and a server to agree on the most adequate content to exchange.
Thanks to the Host header, the ability to host different domains at the same IP address now allows server collocation.

GET /en-US/docs/Glossary/Simple_header HTTP/1.1
Host: developer.mozilla.org
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:50.0) Gecko/20100101 Firefox/50.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Referer: https://developer.mozilla.org/en-US/docs/Glossary/Simple_header

HTTP/1.1 200 OK
Connection: Keep-Alive
Content-Encoding: gzip
Content-Type: text/html; charset=utf-8
Date: Wed, 20 Jul 2016 10:55:30 GMT
Etag: "547fa7e369ef56031dd3bff2ace9fc0832eb251a"
Keep-Alive: timeout=5, max=1000
Last-Modified: Tue, 19 Jul 2016 00:59:33 GMT
Server: Apache
Transfer-Encoding: chunked
Vary: Cookie, Accept-Encoding

(content)


GET /static/img/header-background.png HTTP/1.1
Host: developer.cdn.mozilla.net
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:50.0) Gecko/20100101 Firefox/50.0
Accept: */*
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Referer: https://developer.mozilla.org/en-US/docs/Glossary/Simple_header

HTTP/1.1 200 OK
Age: 9578461
Cache-Control: public, max-age=315360000
Connection: keep-alive
Content-Length: 3077
Content-Type: image/png
Date: Thu, 31 Mar 2016 13:34:46 GMT
Last-Modified: Wed, 21 Oct 2015 18:27:50 GMT
Server: Apache

(image content of 3077 bytes)

HTTP/2 - A protocol for greater performance

Features:

It is a binary protocol rather than text. It can no longer be read and created manually. Despite this hurdle, improved optimization techniques can now be implemented.
It is a multiplexed protocol. Parallel requests can be handled over the same connection, removing the order and blocking constraints of the HTTP/1.x protocol.
It compresses headers. As these are often similar among a set of requests, this removes duplication and overhead of data transmitted.
It allows a server to populate data in a client cache, in advance of it being required, through a mechanism called the server push.

Post-HTTP/2 evolution

Features:

Support of Alt-Svc allows the dissociation of the identification and the location of a given resource, allowing for a smarter CDN caching mechanism.
The introduction of Client-Hints allows the browser, or client, to proactively communicate information about its requirements, or hardware constraints, to the server.
The introduction of security-related prefixes for cookie names (such as __Secure- and __Host-) helps guarantee that a secure cookie has not been altered.

HTTP Messages

HTTP messages are how data is exchanged between a server and a client. There are two types of messages: requests, sent by the client to trigger an action on the server, and responses, the answer from the server.

HTTP requests and responses share a similar structure and are composed of:

A start-line describing the request to be implemented, or its status (whether it succeeded or failed). This start-line is always a single line.
An optional set of HTTP headers specifying the request, or describing the body included in the message.
A blank line indicating that all meta-information for the request has been sent.
An optional body containing data associated with the request (like the content of an HTML form), or the document associated with a response. The presence of the body and its size are specified by the start-line and HTTP headers.
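
As a rough sketch of this structure, the following Python function splits a raw HTTP/1.x message into its start-line, headers, and body. The name parse_http_message and the sample request are made up for illustration, and the parser deliberately ignores details such as repeated headers or chunked bodies.

```python
def parse_http_message(raw: bytes):
    """Split a raw HTTP/1.x message into start-line, headers, and body."""
    # The blank line (CRLF CRLF) marks the end of the meta-information.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them lower-cased.
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


raw = (
    b"POST /form HTTP/1.1\r\n"
    b"Host: www.example.org\r\n"
    b"Content-Type: application/x-www-form-urlencoded\r\n"
    b"Content-Length: 7\r\n"
    b"\r\n"
    b"a=1&b=2"
)
start_line, headers, body = parse_http_message(raw)
print(start_line)
print(headers["content-length"], body)
```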

HTTP Requests

Start line

An HTTP method, either a verb (like GET, PUT, or POST) or a noun (like HEAD or OPTIONS), that describes the action to be performed.
The request target, usually a URL or an absolute path. The protocol, port, and domain are usually determined by the request context.
The HTTP version, which defines the structure of the remaining message, acting as an indicator of the expected version to use for the response.

Headers

HTTP headers from a request follow the same basic structure as any HTTP header: a case-insensitive string followed by a colon (’:’) and a value whose structure depends upon the header.

General headers, like Via, apply to the message as a whole.
Request headers, like User-Agent or Accept, modify the request by specifying it further (like Accept-Language), by giving context (like Referer), or by conditionally restricting it (like If-None-Match).
Entity headers, like Content-Length, apply to the body of the request. No such header is transmitted if there is no body in the request.

Body

Not all requests have a body; POST requests often do.

Single-resource bodies, consisting of one single file, defined by the two headers: Content-Type and Content-Length.
Multiple-resource bodies, consisting of a multipart body, each part containing a different bit of information. This is typically associated with HTML Forms.

HTTP Responses

Status line

The start line of an HTTP response, called the status line, contains the following information:

The protocol version, usually HTTP/1.1.
A status code, indicating success or failure of the request. Common status codes are 200, 404, or 302.
A status text. A brief, purely informational, textual description of the status code to help a human understand the HTTP message.

Headers

HTTP headers for responses follow the same structure as any other header: a case-insensitive string followed by a colon (’:’) and a value whose structure depends upon the type of the header.

General headers, like Via, apply to the whole message.
Response headers, like Vary and Accept-Ranges, give additional information about the server which doesn’t fit in the status line.
Entity headers, like Content-Length, apply to the body of the response.

Body

Not all responses have one: responses with a status code like 201 or 204 usually don’t.

Single-resource bodies, consisting of a single file of known length, defined by the two headers: Content-Type and Content-Length.
Single-resource bodies, consisting of a single file of unknown length, encoded by chunks with Transfer-Encoding set to chunked.
Multiple-resource bodies, consisting of a multipart body, each part containing a different section of information. These are relatively rare.

A typical HTTP session

In client-server protocols, like HTTP, sessions consist of three phases:

The client establishes a TCP connection (or the appropriate connection if the transport layer is not TCP).
The client sends its request, and waits for the answer.
The server processes the request, sending back its answer, providing a status code and appropriate data.

Establishing a connection

In client-server protocols, it is the client which establishes the connection. Opening a connection in HTTP means initiating a connection in the underlying transport layer, which is usually TCP.

Sending a client request

Once the connection is established, the user-agent can send the request.

Connection management in HTTP/1.x

Connection management is a key topic in HTTP: opening and maintaining connections largely impacts the performance of Web sites and Web applications.

Short-lived connections
Persistent connections
HTTP pipelining

MIME Types

Two primary MIME types are important for the role of default types:

text/plain is the default value for textual files. A textual file should be human-readable and must not contain binary data.
application/octet-stream is the default value for all other cases. An unknown file type should use this type. Browsers take particular care when manipulating these files, attempting to safeguard the user against dangerous behaviors.
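
A minimal sketch of the fallback rule, using Python's standard mimetypes module. The helper name guess_content_type and the file names are made up; the exact set of recognized extensions depends on the system's MIME tables.

```python
import mimetypes


def guess_content_type(filename: str) -> str:
    """Return a MIME type for filename, defaulting to application/octet-stream."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    # An unknown file type falls back to the generic binary default.
    return mime_type or "application/octet-stream"


print(guess_content_type("index.html"))
print(guess_content_type("archive.unknownext"))
```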

Choosing between www and non-www URLs

Choose one of your domains as the canonical one. There are two ways to tell clients which one it is:

Using HTTP 301 redirects
Using <link rel="canonical">
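
The 301 approach can be sketched as a small decision function. CANONICAL_HOST and canonical_redirect are hypothetical names, and the choice of the www form and of https here is only an assumption for the example.

```python
CANONICAL_HOST = "www.example.org"  # hypothetical choice of canonical domain


def canonical_redirect(host: str, path: str):
    """Return (status, headers) for a non-canonical host, or None otherwise."""
    if host.lower() == CANONICAL_HOST:
        return None  # already canonical: serve the page normally
    # 301 tells clients and crawlers that the move is permanent.
    return 301, {"Location": f"https://{CANONICAL_HOST}{path}"}


print(canonical_redirect("example.org", "/docs/page.html"))
print(canonical_redirect("www.example.org", "/docs/page.html"))
```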

HTTP cookies

An HTTP cookie (web cookie, browser cookie) is a small piece of data that a server sends to the user’s web browser. The browser may store it and send it back with the next request to the same server. Cookies let the stateless HTTP protocol remember stateful information.

Cookies are mainly used for three purposes:

Session management: Logins, shopping carts, game scores, or anything else the server should remember

Personalization: User preferences, themes, and other settings

Tracking: Recording and analyzing user behavior

Creating cookies

Server

A server can send a Set-Cookie header with the response.

HTTP/1.0 200 OK
Content-type: text/html
Set-Cookie: yummy_cookie=choco
Set-Cookie: tasty_cookie=strawberry

[page content]

Client

With every new request to the server, the browser sends back all previously stored cookies using the Cookie header.

GET /sample_page.html HTTP/1.1
Host: www.example.org
Cookie: yummy_cookie=choco; tasty_cookie=strawberry
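
The round trip between Set-Cookie and Cookie can be sketched with Python's http.cookies module. The function cookie_header_from is a made-up helper; a real browser additionally checks scope, expiry, and security attributes before sending a cookie back.

```python
from http.cookies import SimpleCookie


def cookie_header_from(set_cookie_values):
    """Build the Cookie request header value a client would send back."""
    jar = SimpleCookie()
    for value in set_cookie_values:
        jar.load(value)  # parse one Set-Cookie header value
    # The Cookie header carries only name=value pairs, separated by "; ".
    return "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())


header = cookie_header_from(["yummy_cookie=choco", "tasty_cookie=strawberry"])
print("Cookie:", header)
```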

Session cookies

A session cookie is deleted when the client shuts down, because it doesn’t specify an Expires or Max-Age directive.

Permanent cookies

Instead of expiring when the client closes, permanent cookies expire at a specific date (Expires) or after a specific length of time (Max-Age).

Set-Cookie: id=a3fWa; Expires=Wed, 21 Oct 2015 07:28:00 GMT;

Secure and HttpOnly cookies

A secure cookie is only sent to the server with an encrypted request over the HTTPS protocol. Even with Secure, sensitive information should never be stored in cookies, as they are inherently insecure and this flag can’t offer real protection.

To prevent cross-site scripting (XSS) attacks, HttpOnly cookies are inaccessible to JavaScript’s Document.cookie API; they are only sent to the server. For example, cookies that persist server-side sessions don’t need to be available to JavaScript, and the HttpOnly flag should be set.

Set-Cookie: id=a3fWa; Expires=Wed, 21 Oct 2015 07:28:00 GMT; Secure; HttpOnly

Scope of cookies

The Domain and Path directives define the scope of the cookie: what URLs the cookies should be sent to.
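
As an illustration, a Set-Cookie value combining scope and security attributes can be assembled with Python's http.cookies module. The helper build_set_cookie and the values example.org and /docs are assumptions made for the example, not part of the notes above.

```python
from http.cookies import SimpleCookie


def build_set_cookie(name, value, domain, path, max_age):
    """Return a Set-Cookie header value with scope and security attributes."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["domain"] = domain    # which hosts receive the cookie
    cookie[name]["path"] = path        # which URL paths receive the cookie
    cookie[name]["max-age"] = max_age  # lifetime in seconds (permanent cookie)
    cookie[name]["secure"] = True      # only sent over HTTPS
    cookie[name]["httponly"] = True    # hidden from Document.cookie
    return cookie[name].OutputString()


set_cookie = build_set_cookie("id", "a3fWa", "example.org", "/docs", 3600)
print("Set-Cookie:", set_cookie)
```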

JavaScript access using Document.cookie

New cookies can also be created via JavaScript using the Document.cookie property, and if the HttpOnly flag is not set, existing cookies can be accessed from JavaScript as well.

document.cookie = "yummy_cookie=choco"; 
document.cookie = "tasty_cookie=strawberry"; 
console.log(document.cookie); 
// logs "yummy_cookie=choco; tasty_cookie=strawberry"

HTTP access control (CORS)

MDN CORS

Cross-Origin Resource Sharing (CORS) is a mechanism that uses additional HTTP headers to tell a browser to let a web application running at one origin (domain) have permission to access selected resources from a server at a different origin. A web application makes a cross-origin HTTP request when it requests a resource that has a different origin (domain, protocol, and port) than its own origin.

What requests use CORS

This cross-origin sharing standard is used to enable cross-site HTTP requests for:

Invocations of the XMLHttpRequest or Fetch APIs in a cross-site manner.
Web Fonts (for cross-domain font usage in @font-face within CSS), so that servers can deploy TrueType fonts that can only be cross-site loaded and used by web sites that are permitted to do so.
WebGL textures.
Images/video frames drawn to a canvas using drawImage().
Stylesheets (for CSSOM access).
Scripts (for unmuted exceptions).

Functional overview

The Cross-Origin Resource Sharing standard works by adding new HTTP headers that allow servers to describe the set of origins that are permitted to read that information using a web browser. Additionally, for HTTP request methods that can cause side-effects on server’s data (in particular, for HTTP methods other than GET, or for POST usage with certain MIME types), the specification mandates that browsers “preflight” the request, soliciting supported methods from the server with an HTTP OPTIONS request method, and then, upon “approval” from the server, sending the actual request with the actual HTTP request method. Servers can also notify clients whether “credentials” (including Cookies and HTTP Authentication data) should be sent with requests.

The HTTP response headers

Access-Control-Allow-Origin
Access-Control-Expose-Headers
Access-Control-Max-Age
Access-Control-Allow-Credentials
Access-Control-Allow-Methods
Access-Control-Allow-Headers

The HTTP request headers

Origin
Access-Control-Request-Method
Access-Control-Request-Headers
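
To show how these request and response headers fit together, here is a rough sketch of a server-side answer to a preflight OPTIONS request. ALLOWED_ORIGINS, ALLOWED_METHODS, and preflight_headers are hypothetical names, and echoing back the requested headers is a permissive simplification that a real server would likely restrict.

```python
ALLOWED_ORIGINS = {"https://app.example.org"}  # hypothetical allow-list
ALLOWED_METHODS = ["GET", "POST", "PUT"]       # hypothetical allowed methods


def preflight_headers(request_headers):
    """Answer a preflight request; an empty dict means the request is refused."""
    origin = request_headers.get("Origin")
    method = request_headers.get("Access-Control-Request-Method")
    if origin not in ALLOWED_ORIGINS or method not in ALLOWED_METHODS:
        return {}
    response = {
        "Access-Control-Allow-Origin": origin,
        "Access-Control-Allow-Methods": ", ".join(ALLOWED_METHODS),
        "Access-Control-Max-Age": "86400",  # let the browser cache the answer
        "Vary": "Origin",                   # the response depends on the Origin
    }
    requested = request_headers.get("Access-Control-Request-Headers")
    if requested:
        # Simplification: accept whatever headers the client asked for.
        response["Access-Control-Allow-Headers"] = requested
    return response


allowed = preflight_headers({
    "Origin": "https://app.example.org",
    "Access-Control-Request-Method": "PUT",
    "Access-Control-Request-Headers": "Content-Type",
})
print(allowed)
```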

HTTP caching

Reference

Compression in HTTP

Compression algorithms categories:

Lossless compression, where the compression-uncompression cycle doesn’t alter the data that is recovered. It matches the original byte for byte. For images, gif and png use lossless compression.
Lossy compression, where the cycle alters the original data in a way that is (ideally) imperceptible to the user. Video formats on the Web are lossy; for images, jpeg is as well.

Some formats can be used for both lossless and lossy compression, like webp, and a lossy algorithm can usually be configured to compress more or less, which leads to lower or higher quality respectively.

End-to-end compression

For compression, end-to-end compression is where the largest performance improvements of Web sites reside. End-to-end compression refers to a compression of the body of a message that is done by the server and will last unchanged until it reaches the client. Whatever the intermediate nodes are, they leave the body untouched.

Common compression algorithms:

gzip
br

Hop-by-hop compression

Hop-by-hop compression, though similar to end-to-end compression, differs by one fundamental element: the compression doesn’t happen on the resource in the server, creating a specific representation that is then transmitted, but on the body of the message between any two nodes on the path between the client and the server. Connections between successive intermediate nodes may apply a different compression.

HTTP conditional requests

HTTP has a concept of conditional requests, where the result, and even the success of a request, can be changed by comparing the affected resources with the value of a validator. Such requests can be useful to validate the content of a cache (sparing a useless transfer), to verify the integrity of a document, as when resuming a download, or to prevent lost updates when uploading or modifying a document on the server.

Validators

The date of last modification of the document, the last-modified date.
An opaque string, uniquely identifying each version, called the entity tag, or the etag.

Conditional headers

If-Match: Succeeds if the ETag of the distant resource is equal to one listed in this header. By default, unless the etag is prefixed with ‘W/’, it performs a strong validation.

If-None-Match: Succeeds if the ETag of the distant resource is different from each one listed in this header. By default, unless the etag is prefixed with ‘W/’, it performs a strong validation.

If-Modified-Since: Succeeds if the Last-Modified date of the distant resource is more recent than the one given in this header.

If-Unmodified-Since: Succeeds if the Last-Modified date of the distant resource is older than or the same as the one given in this header.

If-Range: Similar to If-Match or If-Unmodified-Since, but can have only a single etag or a single date. If it fails, the range request fails, and instead of a 206 Partial Content response, a 200 OK is sent with the complete resource.
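
For cache validation, the If-None-Match check can be sketched as below. The function evaluate_if_none_match is hypothetical; it compares only the opaque part of each tag (ignoring a ‘W/’ prefix) and splits the header on commas, which is a simplification that would break on etags containing commas.

```python
def evaluate_if_none_match(if_none_match, current_etag):
    """Return 304 when a cached validator still matches, otherwise 200."""
    if if_none_match is None:
        return 200  # unconditional request: send the full resource
    if if_none_match.strip() == "*":
        return 304  # "*" matches any existing representation

    def opaque(tag):
        tag = tag.strip()
        # Ignore the weak-validator prefix for this comparison.
        return tag[2:] if tag.startswith("W/") else tag

    candidates = {opaque(tag) for tag in if_none_match.split(",")}
    return 304 if opaque(current_etag) in candidates else 200


etag = '"547fa7e369ef56031dd3bff2ace9fc0832eb251a"'
print(evaluate_if_none_match(etag, etag))
print(evaluate_if_none_match('"stale"', etag))
```

A 304 Not Modified response carries no body, so the client keeps using its cached copy.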

HTTP Content negotiation

In HTTP, content negotiation is the mechanism that is used for serving different representations of a resource at the same URI, so that the user agent can specify which is best suited for the user (for example, which language of a document, which image format, or which content encoding).
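
Server-driven negotiation on the Accept-Language header might look like the following sketch. parse_accept and negotiate_language are made-up names, and the matching is exact: a real implementation would also match language prefixes (for example, en-US against en).

```python
def parse_accept(header):
    """Parse an Accept-* header into (value, q) pairs, highest quality first."""
    choices = []
    for item in header.split(","):
        value, _, params = item.strip().partition(";")
        quality = 1.0  # q defaults to 1 when it is not given
        for param in params.split(";"):
            key, _, number = param.strip().partition("=")
            if key == "q":
                quality = float(number)
        choices.append((value.strip(), quality))
    # sorted() is stable, so equal q-values keep their order in the header.
    return sorted(choices, key=lambda choice: choice[1], reverse=True)


def negotiate_language(accept_language, available, default):
    """Pick the first available language in the client's order of preference."""
    for language, quality in parse_accept(accept_language):
        if quality > 0 and language in available:
            return language
    return default


print(negotiate_language("en-US,en;q=0.5", {"en", "fr"}, "fr"))
```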

HTTP range requests

HTTP range requests allow only a portion of an HTTP message to be sent from a server to a client. Partial requests are useful for large media files or for downloads with pause and resume functions, for example.
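
A hedged sketch of how a server could answer a single byte range follows. serve_range is a hypothetical helper; it handles only the bytes=start-end and bytes=start- forms and falls back to the full resource for anything else, including suffix and multi-part ranges.

```python
import re


def serve_range(resource: bytes, range_header):
    """Return (status, headers, body) for a single 'bytes=start-end' range."""
    match = re.fullmatch(r"bytes=(\d+)-(\d*)", range_header or "")
    if not match:
        # No (or unsupported) Range header: send the complete resource.
        return 200, {"Content-Length": str(len(resource))}, resource
    start = int(match.group(1))
    end = int(match.group(2)) if match.group(2) else len(resource) - 1
    end = min(end, len(resource) - 1)
    if start > end:
        return 416, {"Content-Range": f"bytes */{len(resource)}"}, b""
    body = resource[start:end + 1]  # both ends of the range are inclusive
    headers = {
        "Content-Range": f"bytes {start}-{end}/{len(resource)}",
        "Content-Length": str(len(body)),
    }
    return 206, headers, body


resource = bytes(range(100))
status, headers, body = serve_range(resource, "bytes=10-19")
print(status, headers)
```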

HTTP redirections

URL redirection, also known as URL forwarding, is a technique to give a page, a form or a whole Web application, more than one URL address. HTTP provides a special kind of responses, HTTP redirects, to perform this operation used for numerous goals: temporary redirection while site maintenance is ongoing, permanent redirection to keep external links working after a change of the site’s architecture, progress pages when uploading a file, and so on.

There are several types of redirects and they fall into three categories: permanent, temporary and special redirections.

Permanent redirections

Temporary redirections

Special redirections

Other ways to redirect

HTML

<head> 
  <meta http-equiv="refresh" content="0; URL=http://www.example.com/" />
</head>

JavaScript

window.location = "http://www.example.com/";

Order of precedence

HTTP redirects are always executed first, before any page has been transmitted, let alone read.
HTML redirects (<meta>) are executed only if there weren’t any HTTP redirects.
JavaScript redirects are used as the last resort, and only if JavaScript is enabled on the client side.