Introduction

Web cache poisoning is an advanced attack vector that forces a web cache to serve malicious content to unsuspecting users visiting the vulnerable site. Web caches are frequently employed in web application deployments for performance enhancement, as they reduce the load on the web server. By exploiting web cache poisoning, an attacker can deliver malicious content to a vast number of users. This technique can also be used to amplify the severity of existing vulnerabilities in web applications. For example, it can deliver reflected XSS payloads to all users visiting the website, thereby eliminating the need for individual user interaction typically required for exploiting reflected XSS vulnerabilities.

Commonly used web caches include Apache, Nginx, and Squid.

Inner Workings of Web Caches

Benefits of Caching

When providing a web service used by a large number of users, scalability is paramount. Increased user activity directly correlates with increased load on the web server. To mitigate web server load and distribute users among redundant web servers, Content Delivery Networks (CDNs) and reverse proxies can be utilized.

Web caches are an integral part of this performance-enhancing infrastructure. Situated between the client and the server, they serve content from their local storage rather than fetching it from the origin web server. If a client requests a resource not present in the web cache, the resource is requested from the web server. The web cache then stores this resource locally, enabling it to respond to future requests for that resource without re-querying the web server. Typically, web caches store resources for a limited time to allow changes to propagate once the cache has been refreshed.

How do Web Caches work?

As previously discussed, web caches store resources to reduce the load on the web server. These resources can be static, such as stylesheets or script files, or dynamic responses generated by the web server based on user-supplied data (e.g., search queries). To effectively serve cached resources, the web cache must differentiate between requests to determine whether two requests can be served the same cached response, or if a fresh response needs to be fetched from the web server.

Simply comparing requests byte-by-byte to determine if they should receive the same response is highly inefficient. Different browsers send various headers that do not directly influence the response, such as the User-Agent header. Furthermore, web browsers commonly populate the Referer header to inform the web server of the page from which a resource was requested; however, in most cases, this also does not directly influence the response.

To circumvent these issues, web caches utilize a subset of all request parameters to decide whether two requests should be served the same response. This subset is known as the Cache Key. In most default configurations, this includes the request path, GET parameters, and the Host header. However, cache keys can be configured individually to include or exclude any HTTP parameters or headers, optimizing them for a specific web application.
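The cache-key idea can be illustrated with a minimal sketch. The following Python snippet is not how any real cache is implemented; it merely shows that a key built only from the request path, query string, and Host header groups requests that differ in unkeyed headers:

```python
# Minimal sketch of deriving a cache key (illustrative, not a real cache).
from urllib.parse import urlsplit

def cache_key(method, url, headers):
    """Build a cache key from the path, GET parameters, and Host header only.

    All other headers (User-Agent, Accept, Referer, ...) are ignored,
    i.e., they are unkeyed.
    """
    parts = urlsplit(url)
    host = headers.get("Host", "")
    return (method, host, parts.path, parts.query)

# Two requests differing only in unkeyed headers map to the same key:
k1 = cache_key("GET", "/index.html?language=en",
               {"Host": "example.com", "User-Agent": "Chrome"})
k2 = cache_key("GET", "/index.html?language=en",
               {"Host": "example.com", "User-Agent": "Firefox"})
assert k1 == k2  # same cached response would be served
```

Changing a keyed parameter, such as the language GET parameter, produces a different key and therefore a different cache entry.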

Let's examine an example. Assuming the web cache is configured to use the default cache key (request path, GET parameters, and Host header), the following two requests would receive the same response:

GET /index.html?language=en HTTP/1.1
Host: example.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.5359.125 Safari/537.36
Accept: text/html
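
A second request for the same resource from a different client could look as follows (the header values are illustrative):

GET /index.html?language=en HTTP/1.1
Host: example.com
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36
Accept: text/html,application/xhtml+xml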

The second request has the same cache key and thus receives the same response. The requests differ in the User-Agent header, due to different operating systems, and in the Accept header; however, neither of these differences affects the cached response.

However, if a third user requests the same page in a different language via the language GET parameter, the cache key changes (since GET parameters are part of the cache key), resulting in the web cache serving a different response. This is the intended behavior, as otherwise all users would receive the same cached response in a single language, rendering the language parameter useless:
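Such a request could look like this (the header values are again illustrative):

GET /index.html?language=de HTTP/1.1
Host: example.com
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:108.0) Gecko/20100101 Firefox/108.0
Accept: text/html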

We distinguish between keyed parameters and unkeyed parameters. All parameters that are part of the cache key are called keyed, while all other parameters are unkeyed. For instance, in the above example, the User-Agent and Accept headers are unkeyed.

Web Cache Configuration

To conclude this section, let's examine a sample Nginx config file that configures a simple web cache:
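A minimal configuration along the following lines could be used; the storage path, cache name, origin address, and durations are assumptions chosen for illustration:

```nginx
# Define the cache storage location and a shared memory zone named "mycache"
proxy_cache_path /tmp/cache keys_zone=mycache:10m;

server {
    listen 80;

    location / {
        proxy_pass http://localhost:8000;        # origin web server (assumed address)
        proxy_buffering on;                       # required for caching to work
        proxy_cache mycache;                      # use the cache defined above
        proxy_cache_valid 200 10m;                # cache 200 responses for 10 minutes
        proxy_cache_key $scheme$host$uri$is_args$args;  # path, GET parameters, host
        add_header X-Cache-Status $upstream_cache_status;  # expose cache status
    }
}
```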

Here is a brief explanation of the parameters; for a more detailed overview, refer to the Nginx documentation:

  • proxy_cache_path: Sets general parameters of the cache, such as the storage location.

  • proxy_pass: Sets the location of the web server.

  • proxy_buffering: Enables caching.

  • proxy_cache: Sets the name of the cache (as defined in proxy_cache_path).

  • proxy_cache_valid: Sets the time after which the cache expires.

  • proxy_cache_key: Defines the cache key.

  • add_header: Adds the X-Cache-Status header to responses to indicate whether the response was cached.

Now, if we request the same resource twice, we can see that the first response is not cached (X-Cache-Status: MISS), while the second response is served from the cache (X-Cache-Status: HIT).
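Assuming the web cache runs locally, such a test could look like the following (hostname and responses are illustrative):

$ curl -I http://localhost/index.html
HTTP/1.1 200 OK
X-Cache-Status: MISS

$ curl -I http://localhost/index.html
HTTP/1.1 200 OK
X-Cache-Status: HIT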

The cache key can be configured to include only certain GET parameters by modifying the configuration:
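For instance, the proxy_cache_key directive from the sample configuration could be changed so that, among the GET parameters, only language is included in the key (Nginx exposes individual GET parameters as $arg_<name> variables):

```nginx
# Key on scheme, host, path, and only the "language" GET parameter
proxy_cache_key $scheme$host$uri$arg_language;
```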

With this configuration, only the language parameter is keyed, while all other GET parameters are unkeyed. This means that two requests to /index.html?language=de&timestamp=1 and /index.html?language=de&timestamp=2 would be served the same response from the cache. This is useful if not all parameters influence the response itself, such as a timestamp.
