As of 2012-12-21

(Since then we've added support for IPRO, which has changed this a lot. Someone needs to go over the code carefully to update this. The main difference is that in step 2 below we no longer decline the request right away, because we might have a cached response; we only decline it if, after checking the cache, we're sure we don't have one.)

Most of ngx_pagespeed is glue code, attaching PageSpeed to Nginx. What's attached to what, and how does it work?

The two main places we interact with an Nginx request are in a body filter and a content handler. In the body filter we pass html to PageSpeed for optimizing, while in the content handler we respond to requests for optimized PageSpeed resources (css, images, javascript). An example might be helpful.
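
First, a rough sketch of how those two hooks get wired in. The function names here (ps_init, ps_content_handler, ps_body_filter) are hypothetical and the real module's registration code differs in its details; this just shows the shape of the glue, using the standard Nginx APIs: add a handler to the content phase and splice a filter into the body filter chain.

    /* Sketch only: hypothetical ps_* names, real Nginx registration APIs. */
    static ngx_http_output_body_filter_pt ngx_http_next_body_filter;

    static ngx_int_t
    ps_init(ngx_conf_t *cf)
    {
        ngx_http_handler_pt        *h;
        ngx_http_core_main_conf_t  *cmcf;

        cmcf = ngx_http_conf_get_module_main_conf(cf, ngx_http_core_module);

        /* Content handler: one of the handlers Nginx tries during the
         * content phase (steps 2 and 14 below). */
        h = ngx_array_push(&cmcf->phases[NGX_HTTP_CONTENT_PHASE].handlers);
        if (h == NULL) {
            return NGX_ERROR;
        }
        *h = ps_content_handler;

        /* Body filter: splice ourselves into the chain of body filters
         * (steps 5 through 7 below). */
        ngx_http_next_body_filter = ngx_http_top_body_filter;
        ngx_http_top_body_filter = ps_body_filter;

        return NGX_OK;
    }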

  1. Nginx receives a request for http://example.com

     GET / HTTP/1.1
     Host: example.com
    
  2. Nginx calls our content handler [1], which declines to handle the request because it's not for an optimized .pagespeed. resource (see the content-handler sketch after this walkthrough).

  3. Nginx continues trying other content handlers until it finds one that can handle the request. This may be proxy_pass, fastcgi_pass, try_files, a static file handler, or anything else the webmaster has configured Nginx to use.

  4. Whatever content handler Nginx selects will start streaming a response as a linked list of buffers ("buffer chain").

     ngx_chain_t in:
       ngx_buf_t* buf:
         u_char* start
         u_char* end
       ngx_chain_t* next
    
  5. Nginx passes that chain of buffers through all registered body filters, which include ours. If this were not html being sent, our body filter would immediately pass the buffers on to the next registered body filter (see the body-filter sketch after this walkthrough).

  6. Our body filter will see one buffer chain at a time, but it might not contain the whole file. For static files on disk it usually will, but it might not if, for example, we're proxying from an upstream that quickly sends some layout html and then takes much longer to generate personalized content. Imagine the contents of the buffer chain on the first call to the body filter are:

     <html>
     <script src="navbar.js">
     <link rel="stylesheet" href="site.css">
     <script src="hover.js">
    
  7. We pass this to PageSpeed via a ProxyFetch. PageSpeed runs in another thread, and the ProxyFetch handles all the thread-safety complexity here.

  8. We need to give PageSpeed time to optimize this html, and it's running in a different thread, so we won't have output ready for Nginx immediately. Instead we create a pipe and tell Nginx to watch the output end; once PageSpeed has some data ready it can write a byte to the pipe to notify Nginx (see the pipe and output sketch after this walkthrough).

  9. PageSpeed parses this html chunk, identifies the three resources in it, and tells the fetcher (Serf) to retrieve them. This means a loopback fetch, where Serf requests the resources from Nginx over http.

  10. We run a Schedule thread that keeps this optimization under a very tight deadline. If Nginx takes too long to respond with the resources, or anything else slows us down, we send the html out with whatever optimizations we have completed so far. Imagine that in this case only site.css has been fetched and optimized by the time we hit our rewrite deadline.

  11. PageSpeed writes a byte to the pipe Nginx is watching, which makes Nginx invoke our code on its main thread (the only thread it knows about). We copy the output bytes from PageSpeed into an Nginx buffer chain, and Nginx sends them out to the user's browser (see the pipe and output sketch after this walkthrough):

    <html>
    <script src="navbar.js">
    <link rel="stylesheet" href="site.css.pagespeed.cf.qRa2j71s4H.css">
    <script src="hover.js">
    
  12. The rest of the html goes through the same path: as it comes into Nginx it goes to our body filter and then to PageSpeed via the ProxyFetch; then, after optimizing or hitting the deadline, PageSpeed wakes up Nginx via the pipe, and Nginx sends out the rewritten html.

  13. When the user's browser sees navbar.js and hover.js it will request them from Nginx, and while first our content handler and then our body filter will see each request, we won't do anything with them.

  14. The request for site.css.pagespeed.cf.qRa2j71s4H.css, however, will be answered by our content handler. We pass the request to PageSpeed via ResourceFetch, and PageSpeed will pull the rewritten resource out of cache. (In the unlikely event that the resource is not in cache, there is enough information in the requested filename for PageSpeed to fully reconstruct the optimized resource.) We then go through the same output flow, writing a byte to a pipe to notify Nginx that we have data to send (again, see the content-handler sketch after this walkthrough).
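
To make steps 2 and 14 concrete, here is a minimal sketch of a content handler that only claims .pagespeed. resource requests. ps_resource_fetch is a hypothetical stand-in for handing the request to PageSpeed's ResourceFetch; the rest is standard Nginx API.

    static ngx_int_t
    ps_content_handler(ngx_http_request_t *r)
    {
        size_t  i;

        /* Look for ".pagespeed." anywhere in the request URI. */
        for (i = 0; i + sizeof(".pagespeed.") - 1 <= r->uri.len; i++) {
            if (ngx_strncmp(r->uri.data + i, ".pagespeed.",
                            sizeof(".pagespeed.") - 1) == 0)
            {
                /* An optimized resource: hand it to PageSpeed's
                 * ResourceFetch (step 14); the response is produced
                 * asynchronously and sent out via the pipe mechanism. */
                return ps_resource_fetch(r);
            }
        }

        /* Anything else: decline so Nginx tries the next content handler
         * (step 3). */
        return NGX_DECLINED;
    }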

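The body filter of steps 5 through 7, sketched the same way. The ps_proxy_fetch_* calls are hypothetical stand-ins for feeding bytes to PageSpeed's ProxyFetch, and the real filter also has to track per-request state and manage flow control, which is elided here.

    static ngx_int_t
    ps_body_filter(ngx_http_request_t *r, ngx_chain_t *in)
    {
        ngx_buf_t    *b;
        ngx_chain_t  *cl;
        ngx_str_t    *type;

        type = &r->headers_out.content_type;

        /* Not html?  Pass the buffers straight through (step 5). */
        if (type->len < sizeof("text/html") - 1
            || ngx_strncasecmp(type->data, (u_char *) "text/html",
                               sizeof("text/html") - 1) != 0)
        {
            return ngx_http_next_body_filter(r, in);
        }

        /* Feed each buffer in the chain to PageSpeed via the ProxyFetch
         * (steps 6 and 7). */
        for (cl = in; cl != NULL; cl = cl->next) {
            b = cl->buf;
            ps_proxy_fetch_write(r, b->pos, b->last - b->pos);
            if (b->last_buf) {
                ps_proxy_fetch_done(r);   /* end of the document */
            }
            b->pos = b->last;             /* mark the buffer as consumed */
        }

        /* Nothing to send downstream yet; the rewritten html comes out
         * later via the pipe (steps 8 and 11).  The real filter also has
         * to set the appropriate buffering flags, omitted here. */
        return NGX_OK;
    }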

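Finally, steps 8 and 11 sketched together: setting up the pipe that lets PageSpeed wake Nginx, and copying finished bytes into an Nginx buffer chain. ps_request_ctx_t and ps_event_handler are hypothetical names for our per-request state and its event callback; ngx_get_connection, ngx_handle_read_event, and the buffer-chain calls are the real Nginx APIs.

    /* Step 8: create a pipe and have Nginx's event loop watch the read
     * end.  PageSpeed's thread writes a byte to the other end when output
     * is ready, and ps_event_handler then runs on Nginx's main thread. */
    static ngx_int_t
    ps_create_pipe(ngx_http_request_t *r, ps_request_ctx_t *ctx)
    {
        int                fds[2];
        ngx_connection_t  *c;

        if (pipe(fds) != 0) {
            return NGX_ERROR;
        }

        c = ngx_get_connection(fds[0], r->connection->log);
        if (c == NULL) {
            return NGX_ERROR;
        }

        c->data = ctx;                        /* find our state again later */
        c->read->handler = ps_event_handler;  /* runs on the main thread */

        if (ngx_handle_read_event(c->read, 0) != NGX_OK) {
            return NGX_ERROR;
        }

        ctx->pipe_write_fd = fds[1];          /* handed to PageSpeed */
        return NGX_OK;
    }

    /* Step 11: copy PageSpeed's output into an Nginx buffer chain and
     * send it on.  A content-handler response goes out through
     * ngx_http_output_filter; on the body-filter path we would call
     * ngx_http_next_body_filter instead. */
    static ngx_int_t
    ps_send_out(ngx_http_request_t *r, u_char *data, size_t len,
                unsigned last)
    {
        ngx_buf_t   *b;
        ngx_chain_t  out;

        b = ngx_create_temp_buf(r->pool, len);
        if (b == NULL) {
            return NGX_ERROR;
        }

        b->last = ngx_cpymem(b->pos, data, len);
        b->last_buf = last;                   /* 1 once PageSpeed is done */

        out.buf = b;
        out.next = NULL;

        return ngx_http_output_filter(r, &out);
    }
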
[1] As tracked in Issue #102, our content handler is not always called first. This needs to be fixed, and this description assumes we've fixed it.