书格前端

HTTP Learning Notes



Intro

HTTP is a protocol that allows the fetching of resources, such as HTML documents. Clients and servers communicate by exchanging individual messages (as opposed to a stream of data). The messages sent by the client, usually a Web browser, are called requests, and the messages sent by the server as an answer are called responses.

Components of HTTP-based systems

Client: The user-agent

The user-agent is any tool that acts on behalf of the user.

The Web server

The Web server serves the document requested by the client.

Proxies

HTTP flow

  1. Open a TCP connection
  2. Send an HTTP message
  3. Read the response sent by the server
  4. Close or reuse the connection for further requests

Evolution

Invention

In 1990, Tim Berners-Lee built the World Wide Web. It consisted of four building blocks:

HTTP/0.9 - The one-line protocol

The initial version of HTTP had no version number; it was later called 0.9 to differentiate it from later versions.

GET /mypage.html
<HTML>
A very simple HTML page
</HTML>

HTTP/1.0 - Building extensibility

GET /mypage.html HTTP/1.0
User-Agent: NCSA_Mosaic/2.0 (Windows 3.1)

200 OK
Date: Tue, 15 Nov 1994 08:12:31 GMT
Server: CERN/3.0 libwww/2.17
Content-Type: text/html
<HTML> 
A page with an image
  <IMG SRC="/myimage.gif">
</HTML>

HTTP/1.1 - The standardized protocol

The first standardized version of HTTP, HTTP/1.1 was published in early 1997, only a few months after HTTP/1.0.

feature:

GET /en-US/docs/Glossary/Simple_header HTTP/1.1
Host: developer.mozilla.org
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:50.0) Gecko/20100101 Firefox/50.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Referer: https://developer.mozilla.org/en-US/docs/Glossary/Simple_header

200 OK
Connection: Keep-Alive
Content-Encoding: gzip
Content-Type: text/html; charset=utf-8
Date: Wed, 20 Jul 2016 10:55:30 GMT
Etag: "547fa7e369ef56031dd3bff2ace9fc0832eb251a"
Keep-Alive: timeout=5, max=1000
Last-Modified: Tue, 19 Jul 2016 00:59:33 GMT
Server: Apache
Transfer-Encoding: chunked
Vary: Cookie, Accept-Encoding

(content)


GET /static/img/header-background.png HTTP/1.1
Host: developer.cdn.mozilla.net
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:50.0) Gecko/20100101 Firefox/50.0
Accept: */*
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Referer: https://developer.mozilla.org/en-US/docs/Glossary/Simple_header

200 OK
Age: 9578461
Cache-Control: public, max-age=315360000
Connection: keep-alive
Content-Length: 3077
Content-Type: image/png
Date: Thu, 31 Mar 2016 13:34:46 GMT
Last-Modified: Wed, 21 Oct 2015 18:27:50 GMT
Server: Apache

(image content of 3077 bytes)

HTTP/2 - A protocol for greater performance

feature:

Post-HTTP/2 evolution

feature:

HTTP Messages

HTTP messages are how data is exchanged between a server and a client. There are two types of messages: requests, sent by the client to trigger an action on the server, and responses, the answers from the server.

HTTP requests and responses share a similar structure and are composed of:

  1. A start-line describing the request to be performed or, for a response, its status: success or failure. The start-line is always a single line.
  2. An optional set of HTTP headers specifying the request, or describing the body included in the message.
  3. A blank line indicating that all meta-information for the request has been sent.
  4. An optional body containing data associated with the request (like the content of an HTML form), or the document associated with a response. The presence of the body and its size are specified by the start-line and HTTP headers.
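The four parts listed above can be pulled apart with a few lines of string handling. The following is a minimal sketch, assuming an HTTP/1.x message held entirely in one string with CRLF line endings; `parseHttpMessage` is a hypothetical helper name, not part of any library, and repeated headers simply overwrite each other here.

```javascript
// Split a raw HTTP/1.x message into start-line, headers, and body.
// Hypothetical helper for illustration only.
function parseHttpMessage(raw) {
  // The first blank line (CRLF CRLF) ends the meta-information.
  const boundary = raw.indexOf("\r\n\r\n");
  const head = boundary === -1 ? raw : raw.slice(0, boundary);
  const body = boundary === -1 ? "" : raw.slice(boundary + 4);

  // The start-line is always the first single line of the head.
  const [startLine, ...headerLines] = head.split("\r\n");

  const headers = {};
  for (const line of headerLines) {
    const colon = line.indexOf(":");
    if (colon === -1) continue; // skip malformed lines in this sketch
    // Header names are case-insensitive, so store them in lowercase.
    const name = line.slice(0, colon).trim().toLowerCase();
    headers[name] = line.slice(colon + 1).trim();
  }
  return { startLine, headers, body };
}

const message = parseHttpMessage(
  "GET /mypage.html HTTP/1.1\r\n" +
  "Host: www.example.org\r\n" +
  "\r\n"
);
console.log(message.startLine);
console.log(message.headers.host);
```

A real parser would also honor Content-Length or Transfer-Encoding to decide where the body ends; this sketch treats everything after the blank line as the body.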

HTTP Requests

Start line

  1. An HTTP method, a verb (like GET, PUT, or POST) or a noun (like HEAD or OPTIONS), that describes the action to be performed.
  2. The request target, usually a URL or an absolute path. The protocol, port, and domain are usually determined by the request context.
  3. The HTTP version, which defines the structure of the remaining message and indicates the version expected for the response.
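As a rough illustration of these three elements, the sketch below splits a request start line on single spaces. `parseRequestLine` is a hypothetical name chosen for this example, and the validation is deliberately minimal.

```javascript
// Break a request start line into method, target, and version.
// Hypothetical helper; real servers validate far more strictly.
function parseRequestLine(line) {
  const parts = line.split(" ");
  if (parts.length !== 3) {
    throw new Error("Malformed request line: " + line);
  }
  const [method, target, version] = parts;
  if (!/^HTTP\/\d(\.\d)?$/.test(version)) {
    throw new Error("Unexpected HTTP version: " + version);
  }
  return { method, target, version };
}

console.log(parseRequestLine("GET /mypage.html HTTP/1.1"));
```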

Headers

HTTP headers in a request follow the same basic structure as any HTTP header: a case-insensitive string followed by a colon (’:’) and a value whose structure depends upon the header.

Body

Not all requests have a body; POST requests often do.

HTTP Responses

Status line

The start line of an HTTP response, called the status line, contains the following information:

  1. The protocol version, usually HTTP/1.1.
  2. A status code, indicating success or failure of the request. Common status codes are 200, 404, or 302.
  3. A status text. A brief, purely informational, textual description of the status code to help a human understand the HTTP message.
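The status line can be taken apart in the same spirit. This is a sketch under the assumption of an HTTP/1.x status line such as the ones described above; `parseStatusLine` and the `ok` field are names invented for this example.

```javascript
// Parse a response status line into version, status code, and status text.
// Hypothetical helper for illustration.
function parseStatusLine(line) {
  const match = /^(HTTP\/\d(?:\.\d)?) (\d{3})(?: (.*))?$/.exec(line);
  if (!match) {
    throw new Error("Malformed status line: " + line);
  }
  const statusCode = Number(match[2]);
  return {
    version: match[1],
    statusCode,
    statusText: match[3] || "", // purely informational, may be empty
    ok: statusCode >= 200 && statusCode <= 299, // 2xx means success
  };
}

console.log(parseStatusLine("HTTP/1.1 404 Not Found"));
```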

Headers

HTTP headers for responses follow the same structure as any other header: a case-insensitive string followed by a colon (’:’) and a value whose structure depends upon the type of the header.

Body

Not all responses have one: responses with a status code such as 201 or 204 usually don’t.

A typical HTTP session

In client-server protocols, like HTTP, sessions consist of three phases:

  1. The client establishes a TCP connection (or the appropriate connection if the transport layer is not TCP).
  2. The client sends its request, and waits for the answer.
  3. The server processes the request, sending back its answer, providing a status code and appropriate data.

Establishing a connection

In client-server protocols, it is the client which establishes the connection. Opening a connection in HTTP means initiating a connection in the underlying transport layer, usually TCP.

Sending a client request

Once the connection is established, the user-agent can send the request.

Connection management in HTTP/1.x

Connection management is a key topic in HTTP: opening and maintaining connections has a large impact on the performance of Web sites and Web applications.

MIME Types

Two primary MIME types are important for the role of default types:

Choosing between www and non-www URLs

Choose one of your domains as your canonical one.
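One common way to enforce a canonical domain is to answer requests for the other host with a permanent redirect. The sketch below assumes that approach; `CANONICAL_HOST` is a placeholder value, `canonicalRedirect` is a hypothetical helper, and the choice of status 301 is an assumption of this example rather than something these notes prescribe.

```javascript
// Assumed example value: the host chosen as canonical.
const CANONICAL_HOST = "www.example.org";

// Return redirect details if the URL is not on the canonical host,
// or null if no redirect is needed. Hypothetical helper.
function canonicalRedirect(requestUrl) {
  const url = new URL(requestUrl);
  if (url.hostname === CANONICAL_HOST) {
    return null; // already canonical
  }
  url.hostname = CANONICAL_HOST; // keep scheme, path, and query as they are
  return { status: 301, location: url.toString() };
}

console.log(canonicalRedirect("http://example.org/page?x=1"));
console.log(canonicalRedirect("http://www.example.org/page"));
```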

HTTP cookies

An HTTP cookie (web cookie, browser cookie) is a small piece of data that a server sends to the user’s web browser. The browser may store it and send it back with the next request to the same server. It remembers stateful information for the stateless HTTP protocol.

Cookies are mainly used for three purposes:

Session management: Logins, shopping carts, game scores, or anything else the server should remember

Personalization: User preferences, themes, and other settings

Tracking: Recording and analyzing user behavior

Creating cookies

Server

A server can send a Set-Cookie header with the response.

HTTP/1.0 200 OK
Content-type: text/html
Set-Cookie: yummy_cookie=choco
Set-Cookie: tasty_cookie=strawberry

[page content]

Client

With every new request to the server, the browser sends back all previously stored cookies using the Cookie header.

GET /sample_page.html HTTP/1.1
Host: www.example.org
Cookie: yummy_cookie=choco; tasty_cookie=strawberry
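On the server side, the Cookie header value has to be split back into name/value pairs. A minimal sketch, assuming the simple `name=value; name=value` form shown above; `parseCookieHeader` is a hypothetical helper, and it does no decoding of values.

```javascript
// Turn a Cookie header value into a plain object of name/value pairs.
// Hypothetical helper for illustration.
function parseCookieHeader(header) {
  const cookies = {};
  for (const pair of header.split(";")) {
    const eq = pair.indexOf("=");
    if (eq === -1) continue; // ignore fragments without a value
    const name = pair.slice(0, eq).trim();
    const value = pair.slice(eq + 1).trim();
    cookies[name] = value;
  }
  return cookies;
}

console.log(parseCookieHeader("yummy_cookie=choco; tasty_cookie=strawberry"));
```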

Session cookies

A session cookie is deleted when the client shuts down, because it does not specify an Expires or Max-Age directive.

Permanent cookies

Instead of expiring when the client closes, permanent cookies expire at a specific date (Expires) or after a specific length of time (Max-Age).

Set-Cookie: id=a3fWa; Expires=Wed, 21 Oct 2015 07:28:00 GMT;

Secure and HttpOnly cookies

A secure cookie is only sent to the server with an encrypted request over the HTTPS protocol. Even with Secure, sensitive information should never be stored in cookies, as they are inherently insecure and this flag can’t offer real protection.

To prevent cross-site scripting (XSS) attacks, HttpOnly cookies are inaccessible to JavaScript’s Document.cookie API; they are only sent to the server. For example, cookies that persist server-side sessions don’t need to be available to JavaScript, and the HttpOnly flag should be set.

Set-Cookie: id=a3fWa; Expires=Wed, 21 Oct 2015 07:28:00 GMT; Secure; HttpOnly
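The directives covered so far can be assembled programmatically. The following sketch builds a Set-Cookie header line from a small options object; `buildSetCookie` and its option names (`expires`, `maxAge`, `secure`, `httpOnly`) are invented for this example and do not correspond to a specific library.

```javascript
// Build a Set-Cookie header line from a name, a value, and optional directives.
// Hypothetical helper; values are not escaped in this sketch.
function buildSetCookie(name, value, options = {}) {
  const parts = [`${name}=${value}`];
  if (options.expires instanceof Date) {
    // toUTCString() yields the HTTP date format, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
    parts.push(`Expires=${options.expires.toUTCString()}`);
  }
  if (options.maxAge !== undefined) {
    parts.push(`Max-Age=${options.maxAge}`); // lifetime in seconds
  }
  if (options.secure) parts.push("Secure");
  if (options.httpOnly) parts.push("HttpOnly");
  return "Set-Cookie: " + parts.join("; ");
}

console.log(
  buildSetCookie("id", "a3fWa", {
    expires: new Date(Date.UTC(2015, 9, 21, 7, 28, 0)),
    secure: true,
    httpOnly: true,
  })
);
```

Leaving out both `expires` and `maxAge` produces a session cookie.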

Scope of cookies

The Domain and Path directives define the scope of the cookie: what URLs the cookies should be sent to.
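To make the scoping idea concrete, here is a simplified sketch of how a client might decide whether a cookie applies to a request. It loosely follows the domain and path matching rules of RFC 6265 but ignores edge cases such as IP addresses and public suffixes; `domainMatches` and `pathMatches` are hypothetical helper names.

```javascript
// True if `host` is the cookie domain itself or one of its subdomains.
function domainMatches(host, cookieDomain) {
  const domain = cookieDomain.replace(/^\./, "").toLowerCase();
  const normalizedHost = host.toLowerCase();
  return normalizedHost === domain || normalizedHost.endsWith("." + domain);
}

// True if `requestPath` is the cookie path itself or lies below it.
function pathMatches(requestPath, cookiePath) {
  if (requestPath === cookiePath) return true;
  if (!requestPath.startsWith(cookiePath)) return false;
  // Require a "/" boundary so that "/docs" does not match "/docsets".
  return cookiePath.endsWith("/") || requestPath[cookiePath.length] === "/";
}

console.log(domainMatches("developer.mozilla.org", "mozilla.org"));
console.log(pathMatches("/docs/Web", "/docs"));
```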

JavaScript access using Document.cookie

New cookies can also be created via JavaScript using the Document.cookie property, and if the HttpOnly flag is not set, existing cookies can be accessed from JavaScript as well.

document.cookie = "yummy_cookie=choco"; 
document.cookie = "tasty_cookie=strawberry"; 
console.log(document.cookie); 
// logs "yummy_cookie=choco; tasty_cookie=strawberry"

HTTP access control (CORS)

MDN CORS

Cross-Origin Resource Sharing (CORS) is a mechanism that uses additional HTTP headers to tell a browser to let a web application running at one origin (domain) have permission to access selected resources from a server at a different origin. A web application makes a cross-origin HTTP request when it requests a resource that has a different origin (domain, protocol, and port) than its own origin.

What requests use CORS

This cross-origin sharing standard is used to enable cross-site HTTP requests for:

Functional overview

The Cross-Origin Resource Sharing standard works by adding new HTTP headers that allow servers to describe the set of origins that are permitted to read that information using a web browser. Additionally, for HTTP request methods that can cause side-effects on server’s data (in particular, for HTTP methods other than GET, or for POST usage with certain MIME types), the specification mandates that browsers “preflight” the request, soliciting supported methods from the server with an HTTP OPTIONS request method, and then, upon “approval” from the server, sending the actual request with the actual HTTP request method. Servers can also notify clients whether “credentials” (including Cookies and HTTP Authentication data) should be sent with requests.
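A server-side view of this exchange might look like the sketch below. It assumes a request represented as a plain object with lowercase header names; the allowed origins, methods, and headers are placeholder policy values, and `corsResponseHeaders` is a hypothetical helper rather than a framework API.

```javascript
// Assumed example policy; these values are placeholders.
const ALLOWED_ORIGINS = new Set(["https://app.example.org"]);
const ALLOWED_METHODS = ["GET", "POST", "PUT", "DELETE"];
const ALLOWED_HEADERS = ["Content-Type"];

// Compute the CORS response headers for a request of the form
// { method, headers } where header names are lowercase.
function corsResponseHeaders(request) {
  const origin = request.headers["origin"];
  if (!origin || !ALLOWED_ORIGINS.has(origin)) {
    // Without CORS headers the browser will not expose the response
    // to the requesting page.
    return {};
  }
  const headers = {
    "Access-Control-Allow-Origin": origin,
    "Vary": "Origin", // the response differs per requesting origin
  };
  // A preflight is an OPTIONS request announcing the intended method.
  const isPreflight =
    request.method === "OPTIONS" &&
    request.headers["access-control-request-method"] !== undefined;
  if (isPreflight) {
    headers["Access-Control-Allow-Methods"] = ALLOWED_METHODS.join(", ");
    headers["Access-Control-Allow-Headers"] = ALLOWED_HEADERS.join(", ");
  }
  return headers;
}

console.log(
  corsResponseHeaders({
    method: "OPTIONS",
    headers: {
      "origin": "https://app.example.org",
      "access-control-request-method": "PUT",
    },
  })
);
```

To allow credentials, a server would additionally send `Access-Control-Allow-Credentials: true`; that is left out of the sketch.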

The HTTP response headers

The HTTP request headers

HTTP caching

References

Compression in HTTP

Compression algorithms categories:

Some formats, like WebP, can be used for both lossless and lossy compression, and a lossy algorithm can usually be configured to compress more or less, which leads to lower or higher quality.

End-to-end compression

For compression, end-to-end compression is where the largest performance improvements of Web sites reside. End-to-end compression refers to a compression of the body of a message that is done by the server and will last unchanged until it reaches the client. Whatever the intermediate nodes are, they leave the body untouched.

compression algorithms:

Hop-by-hop compression

Hop-by-hop compression, though similar to end-to-end compression, differs by one fundamental element: the compression doesn’t happen on the resource in the server, creating a specific representation that is then transmitted, but on the body of the message between any two nodes on the path between the client and the server. Connections between successive intermediate nodes may apply a different compression.

HTTP conditional requests

HTTP has a concept of conditional requests, where the result, and even the success of a request, can be changed by comparing the affected resources with the value of a validator. Such requests can be useful to validate the content of a cache and avoid a useless transfer, to verify the integrity of a document, as when resuming a download, or to prevent lost updates when uploading or modifying a document on the server.

Validators

Conditional headers

If-Match: Succeeds if the ETag of the distant resource is equal to one listed in this header. By default, unless the etag is prefixed with ‘W/’, it performs a strong validation.

If-None-Match: Succeeds if the ETag of the distant resource is different from each one listed in this header. By default, unless the etag is prefixed with ‘W/’, it performs a strong validation.

If-Modified-Since: Succeeds if the Last-Modified date of the distant resource is more recent than the one given in this header.

If-Unmodified-Since: Succeeds if the Last-Modified date of the distant resource is older than or the same as the one given in this header.

If-Range: Similar to If-Match or If-Unmodified-Since, but can have only a single etag or a single date. If it fails, the range request fails, and instead of a 206 Partial Content response, a 200 OK is sent with the complete resource.
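For a cache revalidation, the server's decision between 304 Not Modified and 200 OK can be sketched as follows. The example assumes lowercase header names and a resource described by an `etag` string and a `lastModified` Date; `evaluateConditionalGet` is a hypothetical helper, and weak validators (the ‘W/’ prefix) are left out for brevity.

```javascript
// Decide the status code for a conditional GET.
// `resource` is assumed to look like { etag: '"abc"', lastModified: Date }.
function evaluateConditionalGet(requestHeaders, resource) {
  const ifNoneMatch = requestHeaders["if-none-match"];
  if (ifNoneMatch !== undefined) {
    // When If-None-Match is present, If-Modified-Since is ignored.
    const tags = ifNoneMatch.split(",").map((tag) => tag.trim());
    const matched = tags.includes("*") || tags.includes(resource.etag);
    return matched ? 304 : 200;
  }

  const ifModifiedSince = requestHeaders["if-modified-since"];
  if (ifModifiedSince !== undefined) {
    const since = Date.parse(ifModifiedSince);
    // Not modified if the resource is no more recent than the given date.
    if (!Number.isNaN(since) && resource.lastModified.getTime() <= since) {
      return 304;
    }
  }
  return 200;
}

const page = {
  etag: '"547fa7e369ef56031dd3bff2ace9fc0832eb251a"',
  lastModified: new Date("2016-07-19T00:59:33Z"),
};
console.log(evaluateConditionalGet({ "if-none-match": page.etag }, page));
```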

HTTP Content negotiation

In HTTP, content negotiation is the mechanism that is used for serving different representations of a resource at the same URI, so that the user agent can specify which is best suited for the user (for example, which language of a document, which image format, or which content encoding).
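As one illustration of server-driven negotiation, the sketch below picks a language from an Accept-Language header by its quality values. `parseQualityList` and `chooseLanguage` are hypothetical helpers, the matching is simplified (exact tag or primary subtag only), and the list of available languages is an assumption of the example.

```javascript
// Parse a header such as "en-US,en;q=0.5" into entries sorted by preference.
function parseQualityList(header) {
  return header
    .split(",")
    .map((item) => {
      const [value, ...params] = item.trim().split(";");
      let q = 1; // the default quality when no q parameter is given
      for (const param of params) {
        const [key, val] = param.trim().split("=");
        if (key === "q") q = parseFloat(val);
      }
      return { value: value.trim(), q };
    })
    .filter((entry) => entry.value !== "" && entry.q > 0)
    .sort((a, b) => b.q - a.q);
}

// Pick the best available language, or the fallback if nothing matches.
function chooseLanguage(acceptLanguage, available, fallback) {
  for (const { value } of parseQualityList(acceptLanguage)) {
    if (value === "*") return available[0];
    const wanted = value.toLowerCase();
    const primary = wanted.split("-")[0];
    const match = available.find((lang) => {
      const candidate = lang.toLowerCase();
      return candidate === wanted || candidate === primary;
    });
    if (match) return match;
  }
  return fallback;
}

console.log(chooseLanguage("en-US,en;q=0.5", ["fr", "en"], "fr"));
```

A response chosen this way would normally also carry a `Vary: Accept-Language` header so that caches keep the variants apart.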

HTTP range requests

HTTP range requests allow a server to send only a portion of an HTTP message to a client. Partial requests are useful for large media or for downloading files with pause and resume functions, for example.
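To show what a server has to work out for such a request, the sketch below resolves a single-range `Range: bytes=...` header against a resource size. `parseByteRange` is a hypothetical helper; it handles only one range, and it returns null whenever it cannot produce one (malformed, multi-range, or not satisfiable), leaving the choice of response to the caller.

```javascript
// Resolve a Range header value such as "bytes=0-499" for a resource
// of `size` bytes. Returns { start, end, contentRange } or null.
function parseByteRange(rangeHeader, size) {
  const match = /^bytes=(\d*)-(\d*)$/.exec(rangeHeader.trim());
  if (!match) return null;
  const [, startText, endText] = match;
  if (startText === "" && endText === "") return null;

  let start;
  let end;
  if (startText === "") {
    // Suffix form "bytes=-N": the last N bytes of the resource.
    const suffixLength = Number(endText);
    if (suffixLength === 0) return null;
    start = Math.max(size - suffixLength, 0);
    end = size - 1;
  } else {
    start = Number(startText);
    // An open end ("bytes=N-") or an overlong end stops at the last byte.
    end = endText === "" ? size - 1 : Math.min(Number(endText), size - 1);
  }
  if (start > end || start >= size) return null;

  // Both positions are inclusive, as in the Content-Range response header.
  return { start, end, contentRange: `bytes ${start}-${end}/${size}` };
}

console.log(parseByteRange("bytes=0-499", 3077));
```

A successful result would be served with status 206 Partial Content and the computed `Content-Range` value.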

HTTP redirections

URL redirection, also known as URL forwarding, is a technique to give a page, a form, or a whole Web application more than one URL address. HTTP provides a special kind of response, the HTTP redirect, to perform this operation, which is used for numerous goals: temporary redirection while site maintenance is ongoing, permanent redirection to keep external links working after a change of the site’s architecture, progress pages when uploading a file, and so on.

There are several types of redirects and they fall into three categories: permanent, temporary and special redirections.
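The three categories can be expressed as a lookup on the status code. The mapping below reflects the commonly documented grouping of redirect status codes (it is not spelled out in these notes), and `describeRedirect` is a hypothetical helper that also resolves the Location header against the request URL.

```javascript
// Commonly documented grouping of redirect status codes.
const REDIRECT_CATEGORIES = {
  301: "permanent", // Moved Permanently
  308: "permanent", // Permanent Redirect
  302: "temporary", // Found
  303: "temporary", // See Other
  307: "temporary", // Temporary Redirect
  300: "special",   // Multiple Choices
  304: "special",   // Not Modified
};

// Return the category and absolute target of a redirect response,
// or null if the status code is not a redirect known to this sketch.
function describeRedirect(status, location, requestUrl) {
  const category = REDIRECT_CATEGORIES[status];
  if (!category) return null;
  // Location may be relative, so resolve it against the request URL.
  const target = location ? new URL(location, requestUrl).toString() : null;
  return { category, target };
}

console.log(describeRedirect(301, "/new-page", "http://www.example.com/old-page"));
```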

Permanent redirections

Temporary redirections

Special redirections

Other ways to redirect

HTML

<head> 
  <meta http-equiv="refresh" content="0; URL=http://www.example.com/" />
</head>

JavaScript

window.location = "http://www.example.com/";

Order of precedence

  1. HTTP redirects always execute first, before any page has been transmitted, let alone read.
  2. HTML redirects (<meta>) are executed only if there weren’t any HTTP redirects.
  3. JavaScript redirects are used as the last resort, and only if JavaScript is enabled on the client side.