HTTP caching mechanism
Note
Old news! Old news! It's a cliché!
What is HTTP cache
When the client requests a resource from the server, the request first reaches the browser cache. If the browser already has a copy of the requested resource, it can take the resource directly from the browser cache instead of fetching it from the origin server.
HTTP caching only takes effect from the second request for the same resource onward:
- On the first request, the server returns the resource together with its cache parameters in the response header;
- On the second request, the browser uses these cache parameters to decide whether to use its cached copy of the resource or to fetch the resource from the server.
HTTP Cache Classification
HTTP caches can be divided into two categories, depending on whether a request must be sent to the server:
- Strong cache (forced cache): while the cache is valid, the browser sends no request to the server and uses its cached copy of the resource directly.
- Negotiation cache: before using its cached copy, the browser asks the server whether the resource has been updated. If it has, the browser obtains the new resource from the server; if not, it continues to use its cached copy.
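The two categories come down to one decision the browser makes before each request. The Python sketch below models that decision under simplifying assumptions: the `CachedCopy` record and `plan_request` function are hypothetical names, and freshness is reduced to a single `max_age` value plus an optional no-cache flag.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedCopy:
    """Hypothetical record a browser keeps for one cached resource."""
    body: bytes
    stored_at: float         # when the copy was stored, in seconds
    max_age: int             # freshness lifetime in seconds
    no_cache: bool = False   # True if the response required revalidation every time


def plan_request(copy: Optional[CachedCopy], now: float) -> str:
    """Decide how the browser obtains the resource at time `now`."""
    if copy is None:
        return "fetch"          # first request: ask the server, expect 200 + body
    fresh = now - copy.stored_at < copy.max_age
    if fresh and not copy.no_cache:
        return "use-cache"      # strong cache: no request is sent at all
    return "revalidate"         # negotiation cache: ask the server, expect 304 or 200


copy = CachedCopy(body=b"<html></html>", stored_at=1000.0, max_age=60)
print(plan_request(None, 1000.0))   # fetch
print(plan_request(copy, 1030.0))   # use-cache
print(plan_request(copy, 1090.0))   # revalidate
```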
Another cache classification
Caches can also be classified as private or shared, depending on whether a cached copy serves a single user or multiple users.
This distinction mostly matters for proxy servers, that is, when requests travel browser -> proxy server -> origin server.
Private cache: the cached copy serves only a single user, such as the copy kept in that user's own browser. When another user requests the same resource for the first time, the request still has to go to the origin server, and a new cached copy is created for that user.
Shared cache: once one user's request for a resource reaches the proxy server and the proxy caches the resource, other users who request that resource through the proxy are served the proxy's cached copy. While the cache is valid, the proxy does not fetch the resource from the origin server again.
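The practical difference is what a cache uses as its lookup key. This is a minimal sketch of a shared proxy cache, assuming a hypothetical `SharedProxyCache` class and an `origin` callable that stands in for the origin server. Entries are keyed by URL alone, so one user's request fills the cache for everyone, while responses marked `private` or `no-store` are never stored. Expiration is ignored to keep the example short.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

# Stand-in for the origin server: url -> (body, Cache-Control header value)
Origin = Callable[[str], Tuple[bytes, str]]


@dataclass
class SharedProxyCache:
    """Hypothetical shared proxy cache; expiration is ignored for brevity."""
    origin: Origin
    entries: Dict[str, bytes] = field(default_factory=dict)
    origin_requests: int = 0

    def get(self, url: str) -> bytes:
        # The key is the URL alone, with no user in it: that is what makes the cache shared.
        if url in self.entries:
            return self.entries[url]
        self.origin_requests += 1
        body, cache_control = self.origin(url)
        names = {part.split("=")[0].strip().lower() for part in cache_control.split(",")}
        # A shared cache must not store responses marked private or no-store.
        if not names & {"private", "no-store"}:
            self.entries[url] = body
        return body


logo = SharedProxyCache(lambda url: (b"logo bytes", "public, max-age=3600"))
logo.get("/logo.png")          # user A: fetched from the origin server
logo.get("/logo.png")          # user B: served from the proxy's copy
print(logo.origin_requests)    # 1

inbox = SharedProxyCache(lambda url: (b"my mail", "private, max-age=60"))
inbox.get("/inbox")
inbox.get("/inbox")
print(inbox.origin_requests)   # 2: private responses go to the origin every time
```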
Main HTTP Headers
- General header fields
Fields | Description |
---|---|
Cache-Control | Control cache behavior |
Pragma | A holdover from HTTP/1.0; disables caching when its value is no-cache |
- Request Headers
Fields | Description |
---|---|
If-Match | Checks whether the resource's ETag matches the given value |
If-None-Match | Checks whether the resource's ETag does not match the given value |
If-Modified-Since | Checks whether the resource has been modified since the given time |
If-Unmodified-Since | Checks whether the resource has not been modified since the given time |
- Response Headers
Fields | Description |
---|---|
ETag | Identifier of the current version of the resource, used for matching |
- Entity header fields
Fields | Description |
---|---|
Expires | A holdover from HTTP/1.0; expiration time of the entity body |
Last-Modified | Time the resource was last modified |
Reminder
The Pragma and Expires headers come from HTTP/1.0 and have been gradually deprecated in HTTP/1.1 and later versions.
However, for backward compatibility with older browsers, most websites still declare these two fields in their response headers when setting up caching.
This article also explains these two fields, and why Cache-Control replaced them from HTTP/1.1 onward.
Reminder
Many technical articles classify these header fields directly as either strong cache or negotiation cache. Personally, I think this simple and crude division is debatable. For example, Cache-Control behaves as a strong cache or a negotiation cache depending on its value.
Pragma
The Pragma field has only one possible value, no-cache, which tells the client not to cache the resource and to send a request to the server every time.
On the client side, this is usually done by adding a meta tag to the HTML:
<meta http-equiv="Pragma" content="no-cache" />
Warning
- Only IE recognizes the meaning of this tag; other mainstream browsers do not support it.
- Even in IE, which does recognize it, Pragma is not necessarily added to the request headers, but the current page does send a new request every time. (This only applies to the HTML page itself; other resources used in the page are not affected.)
When the server sets Pragma as a response header, the browser reads the field and disables caching: subsequent requests for the same resource are sent to the server again instead of using the cache.
Reminder
Because Pragma has compatibility issues in browsers, and other fields on the server side control caching better, the Pragma field has essentially been abandoned.
_Some websites still include this field for compatibility reasons._
Expires
In HTTP/1.0, Pragma disables caching, and a separate field is needed to enable caching and define how long it lasts. Expires serves this purpose.
The value of Expires is a GMT time in HTTP-date format, such as Thu, 07 Jun 2018 14:26:45 GMT. It tells the browser when the cached resource expires; until that time has passed, the browser does not send a new request for the resource.
On the client, you can use a meta tag to tell the browser the cache time:
<meta http-equiv="expires" content="Thu, 07 Jun 2018 14:26:45 GMT" />
If you want to bypass the cache so that every page load sends a new request, set content to -1 or 0.
Reminder
Like the Pragma tag, this meta tag is only recognized correctly by IE. Moreover, it only tells IE the cache time; the field does not appear in the request headers.
If the server sets the Expires field in the response headers, the cache time takes effect correctly in any browser.
Description
If both Pragma and Expires are used, Pragma takes priority and the page sends a new request.
Reminder
Although the Expires field defines how long the cache is valid, it does so with an absolute time. When defined on the server side, this time is based on the server's clock, but once it is returned to the client, the client compares it against its own local clock. As a result, if the user sets the client's clock past the Expires time, the cache expires immediately.
Precisely because Expires cannot guarantee that the cache behaves as expected, it has gradually been deprecated.
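The clock problem is easy to reproduce. The sketch below, with a hypothetical `expires_is_fresh` helper, parses an Expires value with Python's `email.utils.parsedate_to_datetime` and compares it against whatever the client believes the current time is; moving that clock forward makes an unexpired resource look expired.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def expires_is_fresh(expires_header: str, client_now: datetime) -> bool:
    """True while the client's own clock reads earlier than the Expires time."""
    expires_at = parsedate_to_datetime(expires_header)  # HTTP-date, always in GMT
    return client_now < expires_at


expires = "Thu, 07 Jun 2018 14:26:45 GMT"
accurate_clock = datetime(2018, 6, 7, 14, 0, 0, tzinfo=timezone.utc)
print(expires_is_fresh(expires, accurate_clock))   # True: about 27 minutes left
skewed_clock = accurate_clock + timedelta(hours=1)  # user moved the clock forward
print(expires_is_fresh(expires, skewed_clock))     # False: the cache expires at once
```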
Cache-Control
Cache-Control is a header supported starting with HTTP/1.1. Its value describes how the cache is used and how long the cached copy stays valid.
Cache-Control can be declared in the request headers when a request is sent. If the request goes through a proxy server before reaching the origin server, it tells the proxy how to cache the resource and whether to fetch the latest version from the origin server.
When Cache-Control is returned in the response headers, it tells the browser how to cache the resource and how long it stays valid.
The Cache-Control syntax is as follows:
Cache-Control: <cache-directive>
- When used in request headers, cache-directive supports the following values:
Directive | Description |
---|---|
no-cache | Tells the (proxy) server not to use its cached copy without first checking with the origin server |
no-store | Tells caches not to store anything about the request or its response, including in temporary Internet files |
max-age=delta-seconds | Tells the server that the client only wants a resource whose age is no greater than delta-seconds seconds |
max-stale[=delta-seconds] | Tells the (proxy) server that the client is willing to accept a resource whose cache time has expired: by at most delta-seconds seconds if delta-seconds is given, or by any amount otherwise |
min-fresh=delta-seconds | Tells the (proxy) server that the client wants a resource that will stay fresh for at least another delta-seconds seconds |
no-transform | Tells the (proxy) server that the client wants the entity data unconverted (for example, not recompressed) |
only-if-cached | Tells the (proxy) server that the client only wants a cached copy (if any) without a request to the origin server; if there is none, the cache responds with 504 |
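To see how the request directives interact, here is a simplified sketch of the check a cache might run before answering from a stored copy. The `parse_directives` and `cache_may_serve` names are hypothetical, `age` and `freshness_lifetime` are assumed to be already known in seconds, and corner cases in the specification (such as combining max-age with max-stale) are ignored.

```python
from typing import Dict, Optional


def parse_directives(header: str) -> Dict[str, Optional[int]]:
    """Turn 'max-stale=60, no-transform' into {'max-stale': 60, 'no-transform': None}."""
    directives: Dict[str, Optional[int]] = {}
    for part in header.split(","):
        name, _, value = part.strip().partition("=")
        if name:
            directives[name.lower()] = int(value) if value.isdigit() else None
    return directives


def cache_may_serve(age: int, freshness_lifetime: int, request_cc: str) -> bool:
    """Can a stored response of this age satisfy a request with these directives?"""
    d = parse_directives(request_cc)
    if "no-cache" in d:
        return False                          # must check with the origin server first
    if d.get("max-age") is not None and age > d["max-age"]:
        return False                          # older than the client accepts
    remaining = freshness_lifetime - age      # seconds of freshness left; negative once stale
    if d.get("min-fresh") is not None and remaining < d["min-fresh"]:
        return False                          # will not stay fresh long enough
    if remaining > 0:
        return True                           # still fresh
    if "max-stale" in d:
        limit = d["max-stale"]
        return limit is None or -remaining <= limit  # stale, but within what the client allows
    return False


print(cache_may_serve(50, 60, ""))              # True
print(cache_may_serve(50, 60, "min-fresh=30"))  # False
print(cache_may_serve(90, 60, "max-stale=60"))  # True
print(cache_may_serve(90, 60, "max-stale=10"))  # False
```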
- When used in response headers, cache-directive supports the following values:
Directive | Description |
---|---|
public | Indicates that any cache may store the response, even if it would normally not be cacheable or only privately cacheable |
private[="field-name"] | Indicates that the response (or, if field-name is given, only those header fields) is intended for a single user and must not be stored by a shared cache; the user's own private cache may still store it |
no-cache | A cache may store the response but must not reuse it without first checking with the server (freshness verification) |
no-store | No content is saved to cache or temporary Internet files |
max-age=delta-seconds | Tells the client that the resource is fresh for delta-seconds seconds, during which no request to the server is needed |
s-maxage=delta-seconds | Same as max-age, but only applies to shared caches (where it overrides max-age) |
no-transform | Tells caches and proxies not to change the entity data (for example, by recompressing it) |
must-revalidate | Once the cached resource is stale, the cache must revalidate it with the origin server before reusing it; if that request fails, it returns 504 instead of the stale cached copy |
proxy-revalidate | Similar to must-revalidate, but only applies to shared caches |
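When several of these response directives are present, a cache has to pick one freshness lifetime. A sketch of that priority, assuming a plain dict of headers with canonical capitalization and a hypothetical `freshness_lifetime` function: `s-maxage` wins in a shared cache, then `max-age`, and only then the HTTP/1.0 `Expires` minus `Date`. Following the HTTP caching specification, an unparsable Expires value such as `0` or `-1` is treated as already expired.

```python
from email.utils import parsedate_to_datetime
from typing import Dict, Optional


def freshness_lifetime(headers: Dict[str, str], shared: bool) -> Optional[int]:
    """Seconds a stored response stays fresh, or None if no explicit lifetime is given."""
    directives: Dict[str, str] = {}
    for part in headers.get("Cache-Control", "").split(","):
        name, _, value = part.strip().partition("=")
        if name:
            directives[name.lower()] = value.strip('"')
    if shared and "s-maxage" in directives:   # shared caches only
        return int(directives["s-maxage"])
    if "max-age" in directives:                # overrides Expires
        return int(directives["max-age"])
    if "Expires" in headers and "Date" in headers:
        try:
            expires = parsedate_to_datetime(headers["Expires"])
            date = parsedate_to_datetime(headers["Date"])
        except (TypeError, ValueError):        # e.g. "0" or "-1": already expired
            return 0
        return max(0, int((expires - date).total_seconds()))
    return None


h = {"Date": "Thu, 07 Jun 2018 14:26:45 GMT", "Expires": "Thu, 07 Jun 2018 15:26:45 GMT"}
print(freshness_lifetime(h, shared=False))                             # 3600
print(freshness_lifetime({"Cache-Control": "max-age=120", **h}, True))  # 120
```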
- You can also declare the Cache-Control field directly in the <head> of the HTML page:
<meta http-equiv="Cache-Control" content="no-cache" />
Cache-Control directives can be combined freely:
Cache-Control: max-age=3600, must-revalidate
This declaration means the resource stays fresh in the cache for one hour, so during the next hour the user does not need to send a request to access it again. Once the hour has passed, the cache must revalidate the resource with the origin server before using it.
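The combination above can be sketched as the decision a cache makes for a stored response. The `serve_stored` function and its return strings are hypothetical; the sketch assumes the response's age is already known and models only max-age and must-revalidate. Without must-revalidate, the HTTP caching specification allows a cache that cannot reach the origin server to serve a stale copy in some circumstances, which is exactly what must-revalidate forbids.

```python
def serve_stored(age: int, cache_control: str, origin_reachable: bool) -> str:
    """Decide what a cache does with a stored response of the given age (seconds)."""
    directives = {}
    for part in cache_control.split(","):
        name, _, value = part.strip().partition("=")
        if name:
            directives[name.lower()] = value
    max_age = int(directives.get("max-age") or 0)
    if age < max_age:
        return "fresh: serve from cache"          # no request sent
    if origin_reachable:
        return "stale: revalidate with origin"    # conditional request, 304 or 200
    if "must-revalidate" in directives:
        return "504 Gateway Timeout"              # stale copy may not be used
    return "stale: may serve cached copy"         # disconnected cache without must-revalidate


header = "max-age=3600, must-revalidate"
print(serve_stored(1800, header, origin_reachable=True))   # fresh: serve from cache
print(serve_stored(4000, header, origin_reachable=True))   # stale: revalidate with origin
print(serve_stored(4000, header, origin_reachable=False))  # 504 Gateway Timeout
```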
Cache Verification
The Pragma, Expires, and Cache-Control fields let the client decide whether to send a request to the server: if the cache has not expired, the resource is read from the local cache; if it has expired, the resource is fetched from the server.
However, when the client does send a request to the server, must the server read and return the entity content of the resource?
If a resource's cache time has expired on the client but the server has not updated the resource, does the server have to return the entity content again?
If the resource is large, returning its entity content again when the cache has expired but the resource has not changed wastes bandwidth and time.
These problems can be solved with a strategy that lets the server check whether the cached file saved by the client is identical to the resource file on the server. If it is, the server tells the client to keep using the cached file and does not return the entity content. This solves the problems above and also speeds up HTTP requests.
HTTP uses the Last-Modified, ETag, If-Match, If-None-Match, If-Modified-Since, and If-Unmodified-Since fields to validate cached resources and improve cache reuse. Last-Modified and If-Modified-Since already existed in HTTP/1.0; HTTP/1.1 added the others.
Last-Modified
When the server sends a resource to the client, it adds the resource's last modification time to the entity header in the following format and returns it to the client:
Last-Modified: <Last-Modified-Value>
The client tags the resource with this information. On the next request, it includes the information in the request message and sends it to the server for checking. If the time reported by the client matches the last modification time of the corresponding resource on the server, the resource has not been modified, and the server returns a 304 status code directly.
When reporting Last-Modified, the client can use two request header fields:
If-Modified-Since, whose format is as follows:
If-Modified-Since: <Last-Modified-Value>
This field tells the server that if the last modification time reported by the client matches the last modification time on the server, it can return 304 with only the response headers.
Browsers currently use this field by default to report the saved Last-Modified value to the server.
If-Unmodified-Since, whose format is as follows:
If-Unmodified-Since: <Last-Modified-Value>
This field tells the server that if the last modification time reported by the client does not match the last modification time on the server, it should return a 412 (Precondition Failed) status code to the client.
Because Last-Modified relies on the resource's last modification time to decide whether the resource has changed, it can be inaccurate: a resource is often modified without its actual content changing. Since the last modification time has changed, the server still returns the entire entity content to the client, even though it is identical to the client's cached copy.
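On the server side, the two date-based fields can be sketched as a small function that picks the status code. The `handle_date_validators` name is hypothetical, request headers are assumed to be a dict with canonical names, and the comparison follows the usual rule that a resource counts as unmodified if its Last-Modified time is not later than the date the client sent. Python's `email.utils.format_datetime` produces the HTTP-date string for the header.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Dict


def handle_date_validators(last_modified: datetime, request_headers: Dict[str, str]) -> int:
    """Return 200, 304 or 412 for a GET, based on the date validators in the request."""
    lm = last_modified.replace(microsecond=0)  # HTTP dates only have 1-second precision
    if "If-Unmodified-Since" in request_headers:
        since = parsedate_to_datetime(request_headers["If-Unmodified-Since"])
        if lm > since:
            return 412  # changed after the client's date: Precondition Failed
    if "If-Modified-Since" in request_headers:
        since = parsedate_to_datetime(request_headers["If-Modified-Since"])
        if lm <= since:
            return 304  # unchanged: headers only, the client reuses its cached copy
    return 200          # send the full entity body with a current Last-Modified


modified_at = datetime(2018, 6, 7, 14, 26, 45, tzinfo=timezone.utc)
header_value = format_datetime(modified_at, usegmt=True)  # "Thu, 07 Jun 2018 14:26:45 GMT"
print(handle_date_validators(modified_at, {"If-Modified-Since": header_value}))  # 304
edited_at = modified_at + timedelta(hours=1)
print(handle_date_validators(edited_at, {"If-Modified-Since": header_value}))    # 200
print(handle_date_validators(edited_at, {"If-Unmodified-Since": header_value}))  # 412
```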
ETag
To address the possible inaccuracy of Last-Modified, HTTP/1.1 also introduced the ETag entity header field.
The server computes a unique identifier for the resource using some algorithm and adds it to the entity header when responding to the client:
ETag: ETag-Value
The client tags the resource with this information. On the next request, it includes the information in the request message and sends it to the server for checking. The server only needs to compare the ETag sent by the client with the current ETag of the resource to determine whether the resource has changed relative to the client's copy. If the ETags match, the server returns a 304 status code directly; otherwise it returns the new resource entity content to the client.
When reporting an ETag, the client can use two request header fields:
If-None-Match, whose format is as follows:
If-None-Match: <ETag-Value>
This field tells the server to return the new resource entity content if the ETag does not match, and to return a 304 status code directly otherwise.
Browsers currently use this field by default to report the saved ETag value to the server.
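A minimal sketch of the ETag round trip, assuming the server derives the ETag from a SHA-256 hash of the body (the text only says "some algorithm"; servers are free to choose their own). The `make_etag`, `none_match_hits` and `respond` names are hypothetical. If-None-Match uses weak comparison, so a `W/` prefix is ignored, and `*` matches any existing resource. Splitting on commas is a simplification that assumes the ETags contain no commas.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Hypothetical strong ETag: a quoted, truncated SHA-256 of the content."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def none_match_hits(if_none_match: str, current: str) -> bool:
    """True if If-None-Match lists the current ETag (weak comparison) or '*'."""
    def opaque(tag: str) -> str:
        tag = tag.strip()
        return tag[2:] if tag.startswith("W/") else tag
    tags = [t.strip() for t in if_none_match.split(",") if t.strip()]
    return any(t == "*" or opaque(t) == opaque(current) for t in tags)


def respond(body: bytes, if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a GET of a resource whose content is `body`."""
    etag = make_etag(body)
    if if_none_match is not None and none_match_hits(if_none_match, etag):
        return 304, {"ETag": etag}, b""  # unchanged: no entity body
    return 200, {"ETag": etag}, body


status, headers, _ = respond(b"v1", None)   # first request: 200 plus ETag
print(status)                               # 200
print(respond(b"v1", headers["ETag"])[0])   # 304: content unchanged
print(respond(b"v2", headers["ETag"])[0])   # 200: content changed, new ETag
```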
If-Match, whose format is as follows:
If-Match: <ETag-Value>
This field tells the server to return a 412 (Precondition Failed) status code to the client if the ETag does not match, or if the value "*" is received and the resource does not currently exist. Otherwise, the server processes the request as usual.
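Reminder
If Last-Modified and ETag are used at the same time, the HTTP/1.1 specification (RFC 7232) gives ETag priority: when a request carries both If-None-Match and If-Modified-Since, the server evaluates If-None-Match and ignores If-Modified-Since. Some server implementations nevertheless check both and return 304 only if both validations pass; if either fails, they return the resource entity with a 200 status code as usual.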
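In the HTTP/1.1 specification (RFC 7232, section 6), the conditional headers are evaluated in a fixed order, with each ETag-based field taking precedence over its date-based counterpart. A sketch of that order for a GET request follows; the `evaluate_preconditions` name is hypothetical, the resource is assumed to exist (so `*` always matches), and request headers are assumed to be a dict with canonical names. The usage lines show the consequence: a matching ETag yields 304 even when If-Modified-Since alone would not, and a mismatched ETag yields 200 even when If-Modified-Since alone would give 304.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Dict, List


def evaluate_preconditions(etag: str, last_modified: datetime, req: Dict[str, str]) -> int:
    """Status for a GET, following the RFC 7232 precedence order (simplified)."""
    def tags(name: str) -> List[str]:  # naive split: assumes ETags contain no commas
        return [t.strip() for t in req[name].split(",") if t.strip()]

    def weak(tag: str) -> str:
        return tag[2:] if tag.startswith("W/") else tag

    # Step 1: If-Match (strong comparison); If-Unmodified-Since only if If-Match is absent.
    if "If-Match" in req:
        if not any(t == "*" or (t == etag and not etag.startswith("W/")) for t in tags("If-Match")):
            return 412
    elif "If-Unmodified-Since" in req:
        if last_modified > parsedate_to_datetime(req["If-Unmodified-Since"]):
            return 412
    # Step 2: If-None-Match (weak comparison); If-Modified-Since only if If-None-Match is absent.
    if "If-None-Match" in req:
        if any(t == "*" or weak(t) == weak(etag) for t in tags("If-None-Match")):
            return 304
    elif "If-Modified-Since" in req:
        if last_modified <= parsedate_to_datetime(req["If-Modified-Since"]):
            return 304
    return 200


lm = datetime(2018, 6, 7, 14, 26, 45, tzinfo=timezone.utc)
same_time = "Thu, 07 Jun 2018 14:26:45 GMT"
old_time = "Thu, 01 Jan 2015 00:00:00 GMT"
print(evaluate_preconditions('"abc"', lm, {"If-None-Match": '"abc"', "If-Modified-Since": old_time}))   # 304
print(evaluate_preconditions('"abc"', lm, {"If-None-Match": '"old"', "If-Modified-Since": same_time}))  # 200
print(evaluate_preconditions('"abc"', lm, {"If-Match": '"old"'}))                                        # 412
```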
Secondary HTTP Headers
Although the following fields are related to caching, they are less important.
Vary
Vary tells caches which request header fields the server uses to distinguish and select cached versions of a response. Consider this problem: at a single request address, the server returns content built for IE to IE users and content for other mainstream browsers to everyone else.
Normally the server only needs to read the request's User-Agent field to decide. However, if the user's request reaches a proxy server rather than the origin server, and the proxy sends its cached IE version to non-IE clients, problems arise.
Vary is the header field for this type of problem; the server only needs to add:
Vary: User-Agent
This tells the proxy server to use the User-Agent request header field to distinguish cached versions and decide which version to pass to the client.
Vary also accepts a combination of fields:
Vary: User-Agent, Accept-Encoding
This tells the proxy server to distinguish cached versions by two request header fields, User-Agent and Accept-Encoding.
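A cache can honor Vary by adding the values of the listed request headers to its cache key. The sketch below assumes a hypothetical `vary_cache_key` function and compares header values exactly as sent; real caches may normalize values, and `Vary: *` (which means a stored response never matches) is not handled.

```python
from typing import Dict, Tuple


def vary_cache_key(url: str, vary: str, request_headers: Dict[str, str]) -> Tuple[str, ...]:
    """Cache key = URL plus the value of every request header named in Vary."""
    lowered = {name.lower(): value for name, value in request_headers.items()}
    names = sorted(n.strip().lower() for n in vary.split(",") if n.strip())
    # Header names are case-insensitive; a missing header is recorded as an empty value.
    return (url,) + tuple(f"{n}={lowered.get(n, '')}" for n in names)


vary = "User-Agent, Accept-Encoding"
ie = vary_cache_key("/page", vary, {"User-Agent": "MSIE 10.0", "Accept-Encoding": "gzip"})
chrome = vary_cache_key("/page", vary, {"User-Agent": "Chrome/120", "Accept-Encoding": "gzip"})
print(ie == chrome)  # False: different User-Agent, so separate cached versions
```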
Date, Age
The Date field is the time (GMT) at which the origin server sent the response. It helps determine whether a response came from the origin server or from a proxy server.
- If the Date value differs noticeably from the current time, or stays the same across repeated F5 refreshes, the current request is hitting the proxy server's cache.
- If the browser sends the request again on every page refresh and the Date value keeps changing, the resource is being returned directly from the origin server.
The Age field indicates how long (in seconds) the response has been stored in the proxy server's cache. If the file is modified or replaced, Age starts counting from 0 again.
Browser Behavior
Strong cache
For strongly cached resources:
When the user first accesses the resource, the server returns a 200 status code along with the resource entity content.
If the user accesses the resource again without closing the browser after the first visit, the browser does not send a request to the server. Instead, it retrieves the resource from its memory cache and marks the status code as
200 (memory cache)
If the user closes the browser after the first visit and reopens it for later visits, the browser still does not send a request to the server. Instead, it retrieves the resource from its disk cache and marks the status code as
200 (disk cache)
Negotiation cache
When the user first accesses the resource, the server returns a 200 status code along with the resource entity content.
On the second visit, the browser performs cache validation. If the resource has not been modified, the server returns a 304 status code directly and the browser uses its cached copy.
If the server's resource has been updated by the time of the second visit, the server returns a 200 status code along with the new resource entity content.