HTTP Learning Notes
Intro
HTTP is a protocol for fetching resources, such as HTML documents.
Clients and servers communicate by exchanging individual messages (as opposed to a stream of data). The messages sent by the client, usually a Web browser, are called requests, and the messages sent by the server as an answer are called responses.
Components of HTTP-based systems
Client: The user-agent
The user-agent is any tool that acts on behalf of the user.
The Web server
The server which serves the document as requested by the client.
Proxies
- caching
- filtering
- load balancing
- authentication
- logging
HTTP flow
- Open a TCP connection
- Send an HTTP message
- Read the response sent by the server
- Close or reuse the connection for further requests
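The four steps above can be sketched with a raw socket. This is only an illustration: the `HelloHandler` class, the `fetch` helper, and the local test server are assumptions made for this example so that it runs without Internet access. They are not part of the notes.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with a tiny HTML page."""

    def do_GET(self):
        body = b"<html>A very simple HTML page</html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch(host: str, port: int, path: str) -> bytes:
    # 1. Open a TCP connection.
    with socket.create_connection((host, port), timeout=5) as conn:
        # 2. Send an HTTP message.
        request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
        conn.sendall(request.encode("ascii"))
        # 3. Read the response sent by the server (until it closes its side).
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    # 4. The connection is closed when the `with` block exits.
    return b"".join(chunks)


server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
response = fetch("127.0.0.1", server.server_address[1], "/mypage.html")
server.shutdown()
server.server_close()
print(response.decode("ascii"))
```

The sketch always closes the connection. Reusing it for further requests would mean dropping `Connection: close` and reading exactly `Content-Length` bytes instead of reading until end of stream.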
Evolution
Invention
In 1990, Tim Berners-Lee built the World Wide Web, which consisted of four building blocks:
- A text format to represent hypertext documents, the HyperText Markup Language (HTML)
- A simple protocol to exchange these documents, the HyperText Transfer Protocol (HTTP)
- A client to display (and accidentally edit) these documents, the first Web browser, called WorldWideWeb
- A server to give access to the document, an early version of httpd
HTTP/0.9 - The one-line protocol
The initial version of HTTP had no version number; it was later called 0.9 to differentiate it from later versions. A request was a single line, and the response was only the document itself:
GET /mypage.html
<HTML>
A very simple HTML page
</HTML>
HTTP/1.0 - Building extensibility
- Versioning information is now sent within each request (HTTP/1.0 is appended to the GET line)
- A status code line is also sent at the beginning of the response, allowing the browser itself to understand the success or failure of the request and to adapt its behavior accordingly (for example, by updating or using its local cache in a specific way)
- The notion of HTTP headers has been introduced, both for the requests and the responses, allowing metadata to be transmitted and making the protocol extremely flexible and extensible.
- With the help of the new HTTP headers, the ability to transmit other documents than plain HTML files has been added (thanks to the Content-Type header).
GET /mypage.html HTTP/1.0
User-Agent: NCSA_Mosaic/2.0 (Windows 3.1)
200 OK
Date: Tue, 15 Nov 1994 08:12:31 GMT
Server: CERN/3.0 libwww/2.17
Content-Type: text/html
<HTML>
A page with an image
<IMG SRC="/myimage.gif">
</HTML>
HTTP/1.1 - The standardized protocol
The first standardized version of HTTP, HTTP/1.1, was published in early 1997, only a few months after HTTP/1.0.
Features:
- A connection can be reused, saving the time needed to reopen it repeatedly to fetch the resources embedded in the original document.
- Pipelining has been added, allowing a second request to be sent before the answer to the first one is fully transmitted, lowering the latency of the communication.
- Chunked responses are now also supported.
- Additional cache control mechanisms have been introduced.
- Content negotiation, including language, encoding, or type, has been introduced, and allows a client and a server to agree on the most adequate content to exchange.
- Thanks to the Host header, the ability to host different domains at the same IP address now allows server collocation.
GET /en-US/docs/Glossary/Simple_header HTTP/1.1
Host: developer.mozilla.org
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:50.0) Gecko/20100101 Firefox/50.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Referer: https://developer.mozilla.org/en-US/docs/Glossary/Simple_header
200 OK
Connection: Keep-Alive
Content-Encoding: gzip
Content-Type: text/html; charset=utf-8
Date: Wed, 20 Jul 2016 10:55:30 GMT
Etag: "547fa7e369ef56031dd3bff2ace9fc0832eb251a"
Keep-Alive: timeout=5, max=1000
Last-Modified: Tue, 19 Jul 2016 00:59:33 GMT
Server: Apache
Transfer-Encoding: chunked
Vary: Cookie, Accept-Encoding
(content)
GET /static/img/header-background.png HTTP/1.1
Host: developer.cdn.mozilla.net
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:50.0) Gecko/20100101 Firefox/50.0
Accept: */*
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Referer: https://developer.mozilla.org/en-US/docs/Glossary/Simple_header
200 OK
Age: 9578461
Cache-Control: public, max-age=315360000
Connection: keep-alive
Content-Length: 3077
Content-Type: image/png
Date: Thu, 31 Mar 2016 13:34:46 GMT
Last-Modified: Wed, 21 Oct 2015 18:27:50 GMT
Server: Apache
(image content of 3077 bytes)
HTTP/2 - A protocol for greater performance
Features:
- It is a binary protocol rather than a text protocol. It can no longer be read and created manually. Despite this hurdle, improved optimization techniques can now be implemented.
- It is a multiplexed protocol. Parallel requests can be handled over the same connection, removing the order and blocking constraints of the HTTP/1.x protocol.
- It compresses headers. As these are often similar among a set of requests, this removes duplication and overhead of data transmitted.
- It allows a server to populate data in a client cache, in advance of it being required, through a mechanism called server push.
Post-HTTP/2 evolution
Features:
- Support of Alt-Svc allows the dissociation of the identification and the location of a given resource, allowing for a smarter CDN caching mechanism.
- The introduction of Client-Hints allows the browser, or client, to proactively communicate information about its requirements, or hardware constraints, to the server.
- The introduction of security-related prefixes in cookie names (cookie prefixes) now helps guarantee that a secure cookie has not been altered.
HTTP Messages
HTTP messages are how data is exchanged between a server and a client. There are two types of messages: requests, sent by the client to trigger an action on the server, and responses, the answer from the server.
HTTP requests and responses share a similar structure and are composed of:
- A start-line describing the request to be performed, or its status (success or failure). This start-line is always a single line.
- An optional set of HTTP headers specifying the request, or describing the body included in the message.
- A blank line indicating that all meta-information for the request has been sent.
- An optional body containing data associated with the request (like the content of an HTML form), or the document associated with a response. The presence of the body and its size are specified by the start-line and HTTP headers.
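The structure above can be made concrete with a small parser. This is a minimal sketch, not a complete HTTP implementation: `parse_message` is a hypothetical helper, it assumes CRLF line endings, and it keeps only the last value when a header is repeated.

```python
def parse_message(raw: bytes):
    """Split a raw HTTP/1.x message into start line, headers, and body."""
    # The first blank line separates the meta-information from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    start_line = lines[0]  # always a single line
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them lower-cased.
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 13\r\n"
    b"\r\n"
    b"<html></html>"
)
start_line, headers, body = parse_message(raw)
print(start_line)
print(headers)
print(body)
```

The same function works for requests, since only the start line differs between the two message types.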
HTTP Requests
Start line
- An HTTP method, a verb (like GET, PUT, POST) or a noun (like HEAD, OPTIONS), that describes the action to be performed.
- The request target, usually a URL or an absolute path. The protocol, port, and domain are usually determined by the request context.
- The HTTP version, which defines the structure of the remaining message and acts as an indicator of the expected version to use for the response.
Headers
HTTP headers from a request follow the same basic structure as any HTTP header: a case-insensitive string followed by a colon (’:’) and a value whose structure depends upon the header.
- General headers, like Via, apply to the message as a whole.
- Request headers, like User-Agent and Accept, modify the request by specifying it further (like Accept-Language), by giving context (like Referer), or by conditionally restricting it (like If-None-Match).
- Entity headers, like Content-Length, apply to the body of the request. No such header is transmitted if there is no body in the request.
Body
Not all requests have a body; requests that send data to the server, such as POST requests, often do.
- Single-resource bodies, consisting of one single file, defined by the two headers: Content-Type and Content-Length.
- Multiple-resource bodies, consisting of a multipart body, each part containing a different bit of information. This is typically associated with HTML Forms.
HTTP Responses
Status line
The start line of an HTTP response, called the status line, contains the following information:
- The protocol version, usually HTTP/1.1.
- A status code, indicating success or failure of the request. Common status codes are 200, 404, or 302.
- A status text: a brief, purely informational, textual description of the status code to help a human understand the HTTP message.
Headers
HTTP headers for responses follow the same structure as any other header: a case-insensitive string followed by a colon (’:’) and a value whose structure depends upon the type of the header.
- General headers, like Via, apply to the whole message.
- Response headers, like Vary and Accept-Ranges, give additional information about the server which doesn’t fit in the status line.
- Entity headers, like Content-Length, apply to the body of the response.
Body
Not all responses have one: responses with a status code like 201 or 204 usually don’t.
- Single-resource bodies, consisting of a single file of known length, defined by the two headers: Content-Type and Content-Length.
- Single-resource bodies, consisting of a single file of unknown length, encoded by chunks with Transfer-Encoding set to chunked.
- Multiple-resource bodies, consisting of a multipart body, each part containing a different section of information. These are relatively rare.
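For the unknown-length case, each chunk is sent as a hexadecimal size line followed by that many bytes, and a zero-sized chunk ends the body. The decoder below is a minimal sketch of that framing; `decode_chunked` is a hypothetical helper that assumes the whole body is already in memory and ignores trailers.

```python
def decode_chunked(data: bytes) -> bytes:
    """Decode a body sent with Transfer-Encoding: chunked (trailers ignored)."""
    body = bytearray()
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        # The chunk size is hexadecimal; anything after ';' is a chunk extension.
        size = int(data[pos:line_end].split(b";")[0], 16)
        if size == 0:
            break  # a zero-sized chunk marks the end of the body
        start = line_end + 2
        body += data[start:start + size]
        pos = start + size + 2  # skip the CRLF that follows the chunk data
    return bytes(body)


encoded = b"7\r\nMozilla\r\n9\r\nDeveloper\r\n7\r\nNetwork\r\n0\r\n\r\n"
print(decode_chunked(encoded))  # b'MozillaDeveloperNetwork'
```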
A typical HTTP session
In client-server protocols, like HTTP, sessions consist of three phases:
- The client establishes a TCP connection (or the appropriate connection if the transport layer is not TCP).
- The client sends its request, and waits for the answer.
- The server processes the request, sending back its answer, providing a status code and appropriate data.
Establishing a connection
In client-server protocols, it is the client which establishes the connection. Opening a connection in HTTP means initiating a connection in the underlying transport layer, which is usually TCP.
Sending a client request
Once the connection is established, the user-agent can send the request.
Connection management in HTTP/1.x
Connection management is a key topic in HTTP: opening and maintaining connections largely impacts the performance of Web sites and Web applications.
- short-lived connections
- persistent connections
- HTTP pipelining
MIME Types
Two primary MIME types are important for the role of default types:
- text/plain is the default value for textual files. A textual file should be human-readable and must not contain binary data.
- application/octet-stream is the default value for all other cases. An unknown file type should use this type. Browsers take particular care when manipulating these files, attempting to safeguard the user against dangerous behaviors.
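A server could apply the fallback rule as sketched below. `content_type_for` is a hypothetical helper, and guessing from the file extension with the standard `mimetypes` module is an assumption of this example. Real servers may also inspect file contents or use their own configuration.

```python
import mimetypes


def content_type_for(filename: str) -> str:
    """Guess a MIME type from the extension, defaulting to the generic binary type."""
    guessed, _encoding = mimetypes.guess_type(filename)
    # Unknown file types fall back to application/octet-stream.
    return guessed or "application/octet-stream"


print(content_type_for("notes.txt"))
print(content_type_for("archive.unknownext123"))
```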
Choosing between www and non-www URLs
Choose one of your domains as the canonical one. There are two techniques to indicate which one it is:
- Using HTTP 301 redirects
- Using <link rel="canonical">
Cookie
An HTTP cookie (web cookie, browser cookie) is a small piece of data that a server sends to the user’s web browser. The browser may store it and send it back with the next request to the same server. It lets the stateless HTTP protocol remember stateful information.
Cookies are mainly used for three purposes:
- Session management: Logins, shopping carts, game scores, or anything else the server should remember
- Personalization: User preferences, themes, and other settings
- Tracking: Recording and analyzing user behavior
Creating cookies
Server
A server can send a Set-Cookie header with the response.
HTTP/1.0 200 OK
Content-type: text/html
Set-Cookie: yummy_cookie=choco
Set-Cookie: tasty_cookie=strawberry
[page content]
Client
With every new request to the server, the browser sends back all previously stored cookies using the Cookie header.
GET /sample_page.html HTTP/1.1
Host: www.example.org
Cookie: yummy_cookie=choco; tasty_cookie=strawberry
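The exchange above can be reproduced with Python's standard `http.cookies` module. This is a sketch of the header handling only. The variable names are illustrative, and no real browser or server is involved.

```python
from http.cookies import SimpleCookie

# Server side: build one Set-Cookie header value per cookie.
outgoing = SimpleCookie()
outgoing["yummy_cookie"] = "choco"
outgoing["tasty_cookie"] = "strawberry"
set_cookie_values = [morsel.OutputString() for morsel in outgoing.values()]
for value in set_cookie_values:
    print("Set-Cookie:", value)

# Server side, on the next request: parse the Cookie header sent by the browser.
incoming = SimpleCookie()
incoming.load("yummy_cookie=choco; tasty_cookie=strawberry")
cookies = {name: morsel.value for name, morsel in incoming.items()}
print(cookies)
```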
Session cookies
A session cookie is deleted when the client shuts down, because it does not specify an Expires or Max-Age directive.
Permanent cookies
Instead of expiring when the client closes, permanent cookies expire at a specific date (Expires) or after a specific length of time (Max-Age).
Set-Cookie: id=a3fWa; Expires=Wed, 21 Oct 2015 07:28:00 GMT;
Secure and HttpOnly cookies
A secure cookie is only sent to the server with an encrypted request over the HTTPS protocol. Even with Secure, sensitive information should never be stored in cookies, as they are inherently insecure and this flag can’t offer real protection.
To prevent cross-site scripting (XSS) attacks, HttpOnly cookies are inaccessible to JavaScript’s Document.cookie API; they are only sent to the server. For example, cookies that persist server-side sessions don’t need to be available to JavaScript, and the HttpOnly flag should be set.
Set-Cookie: id=a3fWa; Expires=Wed, 21 Oct 2015 07:28:00 GMT; Secure; HttpOnly
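As a rough illustration, the same attributes can be set through `http.cookies`. The one-hour `Max-Age` is an arbitrary value chosen for this sketch, used in place of the fixed `Expires` date shown above.

```python
from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie["id"] = "a3fWa"
cookie["id"]["max-age"] = 3600   # permanent cookie: expires one hour from now
cookie["id"]["secure"] = True    # only sent over HTTPS
cookie["id"]["httponly"] = True  # hidden from Document.cookie in the browser

header_value = cookie["id"].OutputString()
attributes = [part.strip() for part in header_value.split(";")]
print("Set-Cookie:", header_value)
```

Leaving out both `max-age` and `expires` would produce a session cookie instead.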
Scope of cookies
The Domain and Path directives define the scope of the cookie: what URLs the cookies should be sent to.
JavaScript access using Document.cookie
New cookies can also be created via JavaScript using the Document.cookie property, and if the HttpOnly flag is not set, existing cookies can be accessed from JavaScript as well.
document.cookie = "yummy_cookie=choco";
document.cookie = "tasty_cookie=strawberry";
console.log(document.cookie);
// logs "yummy_cookie=choco; tasty_cookie=strawberry"
HTTP access control (CORS)
Cross-Origin Resource Sharing (CORS) is a mechanism that uses additional HTTP headers to tell a browser to let a web application running at one origin (domain) have permission to access selected resources from a server at a different origin. A web application makes a cross-origin HTTP request when it requests a resource that has a different origin (domain, protocol, and port) than its own origin.
What requests use CORS
This cross-origin sharing standard is used to enable cross-site HTTP requests for:
- Invocations of the XMLHttpRequest or Fetch APIs in a cross-site manner.
- Web Fonts (for cross-domain font usage in @font-face within CSS), so that servers can deploy TrueType fonts that can only be cross-site loaded and used by web sites that are permitted to do so.
- WebGL textures.
- Images/video frames drawn to a canvas using drawImage().
- Stylesheets (for CSSOM access).
- Scripts (for unmuted exceptions).
Functional overview
The Cross-Origin Resource Sharing standard works by adding new HTTP headers that allow servers to describe the set of origins that are permitted to read that information using a web browser. Additionally, for HTTP request methods that can cause side-effects on server’s data (in particular, for HTTP methods other than GET, or for POST usage with certain MIME types), the specification mandates that browsers “preflight” the request, soliciting supported methods from the server with an HTTP OPTIONS request method, and then, upon “approval” from the server, sending the actual request with the actual HTTP request method. Servers can also notify clients whether “credentials” (including Cookies and HTTP Authentication data) should be sent with requests.
The HTTP response headers
- Access-Control-Allow-Origin
- Access-Control-Expose-Headers
- Access-Control-Max-Age
- Access-Control-Allow-Credentials
- Access-Control-Allow-Methods
- Access-Control-Allow-Headers
The HTTP request headers
- Origin
- Access-Control-Request-Method
- Access-Control-Request-Headers
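How a server might answer a preflight request with these headers is sketched below. The allow-lists, the `preflight_headers` function, and the example origin are all hypothetical, and a real deployment would usually rely on its framework's CORS support rather than hand-written logic.

```python
ALLOWED_ORIGINS = {"https://app.example.org"}  # hypothetical allow-list
ALLOWED_METHODS = ["GET", "POST", "PUT"]
ALLOWED_HEADERS = ["Content-Type"]


def preflight_headers(request_headers: dict) -> dict:
    """Return CORS response headers for an OPTIONS preflight, or {} to refuse."""
    origin = request_headers.get("Origin")
    method = request_headers.get("Access-Control-Request-Method")
    if origin not in ALLOWED_ORIGINS or method not in ALLOWED_METHODS:
        return {}  # without CORS headers the browser blocks the actual request
    return {
        "Access-Control-Allow-Origin": origin,
        "Access-Control-Allow-Methods": ", ".join(ALLOWED_METHODS),
        "Access-Control-Allow-Headers": ", ".join(ALLOWED_HEADERS),
        "Access-Control-Max-Age": "86400",  # cache the preflight result for a day
        "Vary": "Origin",
    }


allowed = preflight_headers(
    {"Origin": "https://app.example.org", "Access-Control-Request-Method": "PUT"}
)
refused = preflight_headers(
    {"Origin": "https://evil.example", "Access-Control-Request-Method": "PUT"}
)
print(allowed)
print(refused)
```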
HTTP caching
Compression in HTTP
Compression algorithms categories:
- Loss-less compression, where the compression-uncompression cycle doesn’t alter the data that is recovered. It matches the original byte for byte. For images, GIF and PNG use loss-less compression.
- Lossy compression, where the cycle alters the original data in a way that is (ideally) imperceptible to the user. Video formats on the Web are lossy; among image formats, JPEG is lossy too.
Some formats, like WebP, can be used for either loss-less or lossy compression. A lossy algorithm can usually be configured to compress more or less, which leads to lower or higher quality respectively.
End-to-end compression
For compression, end-to-end compression is where the largest performance improvements of Web sites reside. End-to-end compression refers to a compression of the body of a message that is done by the server and will last unchanged until it reaches the client. Whatever the intermediate nodes are, they leave the body untouched.
Common compression algorithms:
- gzip
- br
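A gzip round trip shows both the loss-less property and the size reduction. This is a sketch using Python's standard `gzip` module on an invented, repetitive HTML body. Real savings depend on the content.

```python
import gzip

original = b"<html>" + b"A very simple HTML page. " * 50 + b"</html>"

# Server side: compress the body once and advertise the encoding.
compressed = gzip.compress(original)
response_headers = {
    "Content-Encoding": "gzip",
    "Content-Length": str(len(compressed)),
}

# Client side: undo the encoding named in Content-Encoding.
restored = gzip.decompress(compressed)

print(len(original), "bytes before,", len(compressed), "bytes after")
print("identical after round trip:", restored == original)
```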
Hop-by-hop compression
Hop-by-hop compression, though similar to end-to-end compression, differs by one fundamental element: the compression doesn’t happen on the resource in the server, creating a specific representation that is then transmitted, but on the body of the message between any two nodes on the path between the client and the server. Connections between successive intermediate nodes may apply a different compression.
HTTP conditional requests
HTTP has a concept of conditional requests, where the result, and even the success of a request, can be changed by comparing the affected resources with the value of a validator. Such requests can be useful to validate the content of a cache (sparing a useless transfer), to verify the integrity of a document (for example when resuming a download), or to prevent lost updates when uploading or modifying a document on the server.
Validators
- the date of last modification of the document, the last-modified date.
- an opaque string, uniquely identifying each version, called the entity tag, or the etag.
Conditional headers
- If-Match: Succeeds if the ETag of the distant resource is equal to one listed in this header. By default, unless the etag is prefixed with ‘W/’, it performs a strong validation.
- If-None-Match: Succeeds if the ETag of the distant resource is different from each one listed in this header. By default, unless the etag is prefixed with ‘W/’, it performs a strong validation.
- If-Modified-Since: Succeeds if the Last-Modified date of the distant resource is more recent than the one given in this header.
- If-Unmodified-Since: Succeeds if the Last-Modified date of the distant resource is older than or the same as the one given in this header.
- If-Range: Similar to If-Match or If-Unmodified-Since, but can have only one single etag or one date. If it fails, the range request fails, and instead of a 206 Partial Content response, a 200 OK is sent with the complete resource.
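Cache validation with If-None-Match can be sketched as follows. `conditional_get` is a hypothetical helper, and two simplifications are assumed here: the comparison ignores a `W/` prefix (a weak comparison), and the header is split on commas without handling commas inside a quoted etag.

```python
def conditional_get(request_headers: dict, current_etag: str, body: bytes):
    """Answer a GET, returning 304 when the client's cached copy is still valid."""

    def opaque(tag: str) -> str:
        tag = tag.strip()
        return tag[2:] if tag.startswith("W/") else tag  # drop the weak marker

    header = request_headers.get("If-None-Match")
    if header is not None:
        candidates = [opaque(tag) for tag in header.split(",")]
        if "*" in candidates or opaque(current_etag) in candidates:
            # The condition fails: nothing changed, so send no body.
            return 304, {"ETag": current_etag}, b""
    return 200, {"ETag": current_etag}, body


etag = '"547fa7e3"'  # made-up validator for the current version
print(conditional_get({}, etag, b"(content)"))
print(conditional_get({"If-None-Match": '"547fa7e3"'}, etag, b"(content)"))
print(conditional_get({"If-None-Match": '"older"'}, etag, b"(content)"))
```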
HTTP Content negotiation
In HTTP, content negotiation is the mechanism that is used for serving different representations of a resource at the same URI, so that the user agent can specify which is best suited for the user (for example, which language of a document, which image format, or which content encoding).
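Server-driven negotiation relies on quality values (`q`) in headers such as Accept-Language. The sketch below picks the best available language. `parse_accept` and `negotiate` are hypothetical helpers that only do exact matching, so wildcards and language-range prefixes are deliberately left out.

```python
def parse_accept(header: str):
    """Return (value, q) pairs from an Accept-* header, most preferred first."""
    items = []
    for part in header.split(","):
        value, *params = [piece.strip() for piece in part.split(";")]
        q = 1.0  # a missing q parameter means full preference
        for param in params:
            name, _, number = param.partition("=")
            if name.strip() == "q":
                q = float(number)
        items.append((value, q))
    # sorted() is stable, so equal q values keep their header order.
    return sorted(items, key=lambda item: item[1], reverse=True)


def negotiate(header: str, available: list, default: str) -> str:
    """Pick the client's most preferred value that the server can provide."""
    for value, q in parse_accept(header):
        if q > 0 and value in available:
            return value
    return default


print(parse_accept("en-US,en;q=0.5"))
print(negotiate("fr;q=0.9,en;q=0.5", ["en", "de"], "de"))
```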
HTTP range requests
HTTP range requests allow a server to send only a portion of an HTTP message to a client. Partial requests are useful for large media or for downloading files with pause and resume functions, for example.
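A server-side sketch of a single byte range is shown below. `serve_range` is a hypothetical helper that understands only the `bytes=start-end` and `bytes=start-` forms. Any other Range value is ignored and answered with the full resource.

```python
import re


def serve_range(resource: bytes, range_header: str):
    """Return (status, headers, body) for a single-range request."""
    total = len(resource)
    match = re.fullmatch(r"bytes=(\d+)-(\d*)", range_header.strip())
    if not match:
        # Unsupported or malformed Range: send the whole resource.
        return 200, {"Content-Length": str(total)}, resource
    start = int(match.group(1))
    end = int(match.group(2)) if match.group(2) else total - 1
    end = min(end, total - 1)  # clamp to the last byte of the resource
    if start > end:
        return 416, {"Content-Range": f"bytes */{total}"}, b""
    part = resource[start:end + 1]  # range ends are inclusive
    headers = {
        "Content-Range": f"bytes {start}-{end}/{total}",
        "Content-Length": str(len(part)),
    }
    return 206, headers, part


print(serve_range(b"0123456789", "bytes=2-5"))
print(serve_range(b"0123456789", "bytes=7-"))
print(serve_range(b"0123456789", "bytes=20-"))
```

A download manager resuming a transfer would send `bytes=N-`, where N is the number of bytes it already has.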
HTTP redirections
URL redirection, also known as URL forwarding, is a technique to give a page, a form, or a whole Web application more than one URL address. HTTP provides a special kind of response, the HTTP redirect, to perform this operation, which serves numerous goals: temporary redirection while site maintenance is ongoing, permanent redirection to keep external links working after a change of the site’s architecture, progress pages when uploading a file, and so on.
There are several types of redirects and they fall into three categories: permanent, temporary and special redirections.
Permanent redirections
- 301 Moved Permanently
- 308 Permanent Redirect
Temporary redirections
- 302 Found
- 303 See Other
- 307 Temporary Redirect
Special redirections
- 300 Multiple Choices
- 304 Not Modified
Other ways to redirect
HTML
<head>
<meta http-equiv="refresh" content="0; URL=http://www.example.com/" />
</head>
JavaScript
window.location = "http://www.example.com/";
Order of precedence
- HTTP redirects always execute first, before any page has been transmitted or read.
- HTML redirects (<meta>) execute only if there weren’t any HTTP redirects.
- JavaScript redirects are used as the last resort, and only if JavaScript is enabled on the client side.