This repository has been archived by the owner on Sep 10, 2024. It is now read-only.

Many bug fixes and misc. improvements #35

Open
wants to merge 40 commits into
base: master
7f81a9f
Improve some formatting, fix some typos, and add some clearer descrip…
TylerRick Nov 27, 2013
84b786e
Fix: The args passed to phantomjs command were not shell escaped.
TylerRick Nov 20, 2013
4edd2b4
Add cmd back in (but escape the arguments this time) so that we can o…
TylerRick Nov 20, 2013
be3622b
Add a debug configuration option that you can enable if you want to s…
TylerRick Nov 20, 2013
b4779a3
Minor cleanup in middleware_spec.rb
TylerRick Nov 20, 2013
a41ffcc
Add new html_url method. Fix it to work for the case where the reque…
TylerRick Nov 20, 2013
734dd74
Add some debug output to the middleware (only enabled if the debug op…
TylerRick Nov 20, 2013
4f171b6
Extract some code out of the main Middleware class into a new BaseMid…
TylerRick Nov 20, 2013
fb3745a
Extracted out a new rendering_started_at method to make it clearer wh…
TylerRick Nov 27, 2013
8d5438a
Extract out two new methods, pdf_body and pdf_headers, into base clas…
TylerRick Nov 20, 2013
1d18440
Add some tests specifically for rendering_in_progress?, already_rende…
TylerRick Nov 20, 2013
c49fe80
Fix commented-out "should render a pdf file" test
TylerRick Nov 20, 2013
2a7c09d
Spell out test_file as 2 words.
TylerRick Nov 20, 2013
4ab7a88
Actually create an expectation from the value that valid_pdf returns.
TylerRick Nov 20, 2013
000ab9d
Move test_file and valid_pdf to spec_helper.rb so that they can be re…
TylerRick Nov 20, 2013
070b999
Add app, main_app, and middleware_options methods to spec_helper.rb s…
TylerRick Nov 21, 2013
0e0d6db
Checking start_with? is simpler than matching a Regexp.
TylerRick Nov 21, 2013
04c84b7
Set the tmpdir to a random, temporary dir in the tests instead of to …
TylerRick Nov 20, 2013
9719587
Add matching reader methods to Configuration for each option that alr…
TylerRick Nov 20, 2013
f27c16c
Change the default value for the tmpdir option to use mktmpdir (a sub…
TylerRick Nov 21, 2013
da7fb96
Rename fire_phantom to render_pdf to be more consistent with other me…
TylerRick Nov 21, 2013
7e06bf6
Moved the render_pdf method to BaseMiddleware so that it can be reuse…
TylerRick Nov 21, 2013
5219231
Merge the options that were passed in to the middleware with the opti…
TylerRick Nov 21, 2013
926d725
Remove duplication between run and run! methods.
TylerRick Nov 21, 2013
fc48307
Add page_load_error and page_load_status_code methods to Phantom.
TylerRick Nov 21, 2013
050410d
Add full response details to what rasterize.js outputs so that we can…
TylerRick Nov 21, 2013
89d7176
Add a simpler middleware, SynchronousMiddleware, that runs the phanto…
TylerRick Nov 20, 2013
f770cba
Simple change to make the Middleware thread-safe: Only make changes t…
TylerRick Nov 21, 2013
7af5815
Remove a commented-out check for headers['Content-Type'].
TylerRick Nov 27, 2013
9263c91
Parse the response_headers from the response that PhantomJS receives
TylerRick Nov 27, 2013
bb49ec0
Make it possible to control which filename should be used for the dow…
TylerRick Nov 27, 2013
c3f524f
Just render directly to the output file instead of the complication o…
TylerRick Aug 5, 2014
5966ccd
Allow content disposition (inline or attachment) to be specified with…
TylerRick Oct 22, 2014
d93b729
Allow page number in header and footer
Dec 21, 2016
87a97a9
added .to_time for rendering_timed_out?
Dec 26, 2016
6386ca2
Fixing file corrupted when rendering_time is small
Apr 28, 2017
7c1b6b1
Using Thread.new instead of Process::detach fork for render_pdf
May 4, 2017
c3202be
Remove tests for copying file
TylerRick May 4, 2017
690b6ca
Add some tests and an example in the Readme about adding header/footer
TylerRick May 4, 2017
abef084
Merge branch 'master' into synchronous_middleware
TylerRick May 4, 2017
265 changes: 175 additions & 90 deletions README.md
@@ -1,14 +1,16 @@
# Shrimp
[![Build Status](https://travis-ci.org/adjust/shrimp.png?branch=master)](https://travis-ci.org/adjust/shrimp)
Creates PDFs from URLs using phantomjs
Creates PDFs from web pages using PhantomJS

Read our [blogpost](http://big-elephants.com/2012-12/pdf-rendering-with-phantomjs/) about how it works.
Read our [blog post](http://big-elephants.com/2012-12/pdf-rendering-with-phantomjs/) about how it works.

## Installation

Add this line to your application's Gemfile:

gem 'shrimp'
```ruby
gem 'shrimp'
```

And then execute:

@@ -18,71 +20,93 @@ Or install it yourself as:

$ gem install shrimp

### PhantomJS

### Phantomjs

See http://phantomjs.org/download.html on how to install phantomjs
See http://phantomjs.org/download.html for instructions on how to install PhantomJS.

## Usage

```
```ruby
require 'shrimp'
url = 'http://www.google.com'
options = { :margin => "1cm"}
Shrimp::Phantom.new(url, options).to_pdf("~/output.pdf")
```
## Configuration

```
Here is a list of configuration options that you can set. Unless otherwise noted in comments, the
value shown is the default value.

Many of these options correspond to a property of the
[WebPage module](https://github.com/ariya/phantomjs/wiki/API-Reference-WebPage) in PhantomJS. Refer
to that documentation for more information about what those options do.

```ruby
Shrimp.configure do |config|

# The path to the phantomjs executable
# defaults to `where phantomjs`
# config.phantomjs = '/usr/local/bin/phantomjs'
# The path to the phantomjs executable. Defaults to the path returned by `which phantomjs`.
config.phantomjs = '/usr/local/bin/phantomjs'

# the default pdf output format
# e.g. "5in*7.5in", "10cm*20cm", "A4", "Letter"
# config.format = 'A4'
# The paper size/format to use for the generated PDF file. Examples: "5in*7.5in", "10cm*20cm",
# "A4", "Letter". (See https://github.com/ariya/phantomjs/wiki/API-Reference-WebPage#papersize-object
# for a list of valid options.)
config.format = 'A4'

# the default margin
# config.margin = '1cm'
# The page margin to use (part of paperSize in PhantomJS)
config.margin = '1cm'

# the zoom factor
# config.zoom = 1
# The zoom factor (zoomFactor in PhantomJS)
config.zoom = 1

# the page orientation 'portrait' or 'landscape'
# config.orientation = 'portrait'
# The page orientation ('portrait' or 'landscape') (part of paperSize in PhantomJS)
config.orientation = 'portrait'

# a temporary dir used to store tempfiles
# config.tmpdir = Dir.tmpdir
# The directory where temporary files are stored, including the generated PDF files.
config.tmpdir = Dir.mktmpdir('shrimp')

# the default rendering time in ms
# increase if you need to render very complex pages
# config.rendering_time = 1000
# How long to wait (in ms) for PhantomJS to load the web page before saving it to a file.
# Increase this if you need to render very complex pages.
config.rendering_time = 1_000

# change the viewport size. If you rendering pages that have
# flexible page width and height then you may need to set this
# to enforce a specific size
# config.viewport_width = 600
# config.viewport_height = 600
# The timeout for the phantomjs rendering process (in ms). This must always be higher than
# rendering_time. If this timeout expires before the job completes, PhantomJS will abort and
# exit with an error.
config.rendering_timeout = 90_000

# the timeout for the phantomjs rendering process in ms
# this needs always to be higher than rendering_time
# config.rendering_timeout = 90000
# Change the viewport size. If you are rendering a page that adapts its layout based on the
# page width and height then you may need to set this to enforce a specific size. (viewportSize
# in PhantomJS)
config.viewport_width = 600
config.viewport_height = 600

# maximum number of redirects to follow
# by default Shrimp does not follow any redirects which means that
# if the server responds with non HTTP 200 an error will be returned
# Maximum number of redirects to follow.
# By default Shrimp does not follow any redirects, which means that if the server responds with
# something other than HTTP 200 (for example, 302), an error will be returned. Setting this to a
# value greater than 0 causes it to follow up to that many redirects and only raise an error if
# the number of redirects exceeds it.
# config.max_redirect_count = 0

# the path to a json configuration file for command-line options
# config.command_config_file = "#{Rails.root.join('config', 'shrimp', 'config.json')}"
# The path to a JSON configuration file containing command-line options to be used by PhantomJS.
# Refer to https://github.com/ariya/phantomjs/wiki/API-Reference for a list of valid options.
# The default options are listed in the Readme. To use your own file from
# config/shrimp/config.json in a Rails app, you could do this:
config.command_config_file = Rails.root.join('config/shrimp/config.json')

# Enable if you want to see details such as the phantomjs command line that it's about to execute.
config.debug = false
end
```

### Command Configuration
### Default PhantomJS Command-line Options

```
These are the PhantomJS options that will be used by default unless you set the
`config.command_config_file` option.

See the PhantomJS [API-Reference](https://github.com/ariya/phantomjs/wiki/API-Reference) for a
complete list of valid options.

```js
{
"diskCache": false,
"ignoreSslErrors": false,
@@ -94,98 +118,159 @@ end

## Middleware

Shrimp comes with a middleware that allows users to get a PDF view of any page on your site by appending .pdf to the URL.
Shrimp comes with a middleware that allows users to generate a PDF file of any page on your site
simply by appending .pdf to the URL.

For example, if your site is [example.com](http://example.com) and you go to
http://example.com/report.pdf, the middleware will detect that a PDF is being requested and will
automatically convert the web page at http://example.com/report into a PDF and send that PDF as the
response.
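
The URL mapping described above can be sketched in a few lines of Ruby. This is only an illustration of the idea, not Shrimp's actual implementation; the helper names `pdf_request?` and `html_path_for` are hypothetical.

```ruby
# Hypothetical helpers (not part of Shrimp's API) illustrating how a request
# for "/report.pdf" could be mapped back to the HTML page at "/report".

# True if the requested path asks for a PDF.
def pdf_request?(path)
  path.end_with?('.pdf')
end

# Strips a trailing ".pdf" extension to get the path of the page to render.
def html_path_for(pdf_path)
  pdf_path.sub(/\.pdf\z/, '')
end

path = '/report.pdf'
page_to_render = html_path_for(path) if pdf_request?(path)
```

Paths that do not end in `.pdf` are left untouched, so ordinary page requests would pass straight through to the app.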

If you only want to allow this for some pages but not all of them, see below for how to add
conditions.

### Middleware Setup

**Non-Rails Rack apps**

# in config.ru
require 'shrimp'
use Shrimp::Middleware
```ruby
# in config.ru
require 'shrimp'
use Shrimp::Middleware
```

**Rails apps**

# in application.rb(Rails3) or environment.rb(Rails2)
require 'shrimp'
config.middleware.use Shrimp::Middleware
```ruby
# in application.rb or an initializer (Rails 3) or environment.rb (Rails 2)
require 'shrimp'
config.middleware.use Shrimp::Middleware
```

**With Shrimp options**

# options will be passed to Shrimp::Phantom.new
config.middleware.use Shrimp::Middleware, :margin => '0.5cm', :format => 'Letter'

**With conditions to limit routes that can be generated in pdf**
```ruby
# Options will be passed to Shrimp::Phantom.new
config.middleware.use Shrimp::Middleware, :margin => '0.5cm', :format => 'Letter'
```

# conditions can be regexps (either one or an array)
config.middleware.use Shrimp::Middleware, {}, :only => %r[^/public]
config.middleware.use Shrimp::Middleware, {}, :only => [%r[^/invoice], %r[^/public]]
**With conditions to limit which paths can be requested in PDF format**

# conditions can be strings (either one or an array)
config.middleware.use Shrimp::Middleware, {}, :only => '/public'
config.middleware.use Shrimp::Middleware, {}, :only => ['/invoice', '/public']
```ruby
# conditions can be regexps (either one or an array)
config.middleware.use Shrimp::Middleware, {}, :only => %r[^/public]
config.middleware.use Shrimp::Middleware, {}, :only => [%r[^/invoice], %r[^/public]]

# conditions can be regexps (either one or an array)
config.middleware.use Shrimp::Middleware, {}, :except => [%r[^/prawn], %r[^/secret]]
# conditions can be strings (either one or an array)
config.middleware.use Shrimp::Middleware, {}, :only => '/public'
config.middleware.use Shrimp::Middleware, {}, :only => ['/invoice', '/public']

# conditions can be strings (either one or an array)
config.middleware.use Shrimp::Middleware, {}, :except => ['/secret']
# conditions can be regexps (either one or an array)
config.middleware.use Shrimp::Middleware, {}, :except => [%r[^/prawn], %r[^/secret]]

# conditions can be strings (either one or an array)
config.middleware.use Shrimp::Middleware, {}, :except => ['/secret']
```
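
To make the `:only` and `:except` conditions more concrete, here is a minimal sketch of how such rules could be evaluated against a request path. It assumes that string conditions are treated as path prefixes and that `:only` takes precedence when both are given; neither assumption is confirmed by this Readme, and the method names `matches_any?` and `pdf_allowed?` are hypothetical.

```ruby
# Hypothetical helpers, not Shrimp's actual implementation.

# True if the path matches at least one rule. A rule may be a Regexp or a
# String (assumed here to mean "path starts with this string"), and rules may
# be given as a single value or an array.
def matches_any?(path, rules)
  Array(rules).any? do |rule|
    rule.is_a?(Regexp) ? !!(path =~ rule) : path.start_with?(rule)
  end
end

# Decides whether a path may be rendered as a PDF under the given conditions.
# Assumption: :only wins over :except if both are present.
def pdf_allowed?(path, conditions = {})
  return matches_any?(path, conditions[:only]) if conditions[:only]
  return !matches_any?(path, conditions[:except]) if conditions[:except]
  true
end

pdf_allowed?('/public/report.pdf', :only => %r[^/public])
pdf_allowed?('/secret/plans.pdf', :except => ['/secret'])
```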

### Polling

To avoid deadlocks Shrimp::Middleware renders the pdf in a separate process retuning a 503 Retry-After response Header.
you can setup the polling interval and the polling offset in seconds.
To avoid tying up the web server while waiting for the PDF to be rendered (which could create a
deadlock), Shrimp::Middleware starts PDF generation in the background in a separate thread and
returns a 503 (Service Unavailable) response immediately.

It also adds a [Retry-After](http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html) response
header, which tells the user's browser that the requested PDF resource is not available yet, but
will be soon, and instructs the browser to try again after a few seconds. When the same page is
requested again in a few seconds, it will again return a 503 if the PDF is still in the process of
being generated. This process will repeat until eventually the rendering has completed, at which
point the middleware returns a 200 (OK) response with the PDF itself.

You can adjust both the `polling_offset` (how long to wait before the first retry; default is 1
second) and the `polling_interval` (how long in seconds to wait between retries; default is 1
second). Example:

config.middleware.use Shrimp::Middleware, :polling_interval => 1, :polling_offset => 5
```ruby
config.middleware.use Shrimp::Middleware, :polling_offset => 5, :polling_interval => 1
```
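
From the client's point of view, the 503/Retry-After cycle described above amounts to a simple polling loop. The following is a rough sketch of such a loop in plain Ruby, written so that it can run without a server: the `Response` struct, the `poll_for_pdf` method, and the `sleeper` parameter are all hypothetical names introduced for illustration, and the block stands in for whatever actually performs the HTTP request.

```ruby
# Hypothetical client-side sketch; none of these names come from Shrimp.
Response = Struct.new(:status, :headers, :body)

# Calls the given block (which performs one request and returns a Response)
# until it yields a 200, waiting for the number of seconds given in the
# Retry-After header after each 503.
def poll_for_pdf(max_attempts: 10, sleeper: ->(seconds) { sleep(seconds) })
  max_attempts.times do
    response = yield
    case response.status
    when 200
      return response.body
    when 503
      sleeper.call(Integer(response.headers.fetch('Retry-After', '1')))
    else
      raise "Unexpected status #{response.status}"
    end
  end
  raise "PDF was not ready after #{max_attempts} attempts"
end

# Simulated server: two "not ready yet" responses, then the finished PDF.
responses = [
  Response.new(503, { 'Retry-After' => '5' }, ''),
  Response.new(503, { 'Retry-After' => '1' }, ''),
  Response.new(200, {}, '%PDF-1.4')
]
waits = []
pdf = poll_for_pdf(sleeper: ->(seconds) { waits << seconds }) { responses.shift }
```

In the simulated run, the first wait corresponds to a `polling_offset` of 5 and the second to a `polling_interval` of 1. Injecting the `sleeper` keeps the example fast; a real client would simply use the default, which calls `sleep`.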

### Caching

To avoid rendering the page on each request you can setup some the cache ttl in seconds
To improve performance, the middleware avoids re-generating the PDF file on every request. A PDF
that was generated the *first* time a certain URL was requested is reused and sent immediately on
later requests for the same URL, as long as the file still exists and was generated within the
TTL.

The default TTL is 1 second, but can be overridden by passing a different `cache_ttl` (in seconds)
to the middleware:

```ruby
config.middleware.use Shrimp::Middleware, :cache_ttl => 3600, :out_path => "my/pdf/store"
```

To disable this caching entirely and force it to re-generate the PDF again each time a request comes
in, set `cache_ttl` to 0.
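
The freshness rule described in this section can be illustrated with a small check based on the file's modification time. This is a sketch under the assumption that the age of a cached PDF is measured from its mtime; the method name `cached_pdf_fresh?` is hypothetical and is not taken from Shrimp's source.

```ruby
require 'tempfile'

# Hypothetical helper, not Shrimp's actual implementation.
# Returns true if a previously generated PDF exists at `path` and is no older
# than `cache_ttl` seconds. A TTL of 0 (or less) disables caching entirely.
def cached_pdf_fresh?(path, cache_ttl, now = Time.now)
  return false if cache_ttl <= 0
  return false unless File.exist?(path)
  (now - File.mtime(path)) <= cache_ttl
end

# Example: a file written just now is fresh for an hour-long TTL, but is
# always treated as stale when the TTL is 0.
cached = Tempfile.new(['report', '.pdf'])
fresh_within_hour = cached_pdf_fresh?(cached.path, 3600)
fresh_with_zero_ttl = cached_pdf_fresh?(cached.path, 0)
```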

### Header/Footer

You can specify a header or footer callback, which can even include page numbers. Example:

```html
<head>
<script type="text/javascript">
var PhantomJSPrinting = {
header: {
height: "1cm",
contents: function(pageNum, numPages) { return "Page " + pageNum + "/" + numPages; }
},
footer: {
height: "1cm",
contents: function(pageNum, numPages) { return "Page " + pageNum + "/" + numPages; }
}
};
</script>
</head>
```

### Ajax requests

To include some fancy Ajax stuff with jquery
Here's an example of how to initiate an Ajax request for a PDF resource (using jQuery) and keep
polling the server until it either finishes successfully or returns with a 504 error code.

```js

var url = '/my_page.pdf'
var statusCodes = {
200: function() {
return window.location.assign(url);
},
504: function() {
console.log("Shit's being wired")
},
503: function(jqXHR, textStatus, errorThrown) {
var wait;
wait = parseInt(jqXHR.getResponseHeader('Retry-After'));
return setTimeout(function() {
return $.ajax({
url: url,
statusCode: statusCodes
});
}, wait * 1000);
}
var url = '/my_page.pdf'
var statusCodes = {
200: function() {
return window.location.assign(url);
},
504: function() {
console.log("Sorry, the request timed out.")
},
503: function(jqXHR, textStatus, errorThrown) {
var wait;
wait = parseInt(jqXHR.getResponseHeader('Retry-After'));
return setTimeout(function() {
return $.ajax({
url: url,
statusCode: statusCodes
});
}, wait * 1000);
}
}
$.ajax({
url: url,
statusCode: statusCodes
})

```

## Contributing

1. Fork it
1. Fork this repository
2. Create your feature branch (`git checkout -b my-new-feature`)
3. Commit your changes (`git commit -am 'Add some feature'`)
4. Push to the branch (`git push origin my-new-feature`)
5. Create new Pull Request
5. Create a pull request (`git pull-request` if you've installed [hub](https://github.com/github/hub))

## Copyright
Shrimp is Copyright © 2012 adeven (Manuel Kniep). It is free software, and may be redistributed under the terms
specified in the LICENSE file.

Shrimp is Copyright © 2012 adeven (Manuel Kniep). It is free software, and may be redistributed
under the terms specified in the LICENSE file.
1 change: 1 addition & 0 deletions lib/shrimp.rb
@@ -2,4 +2,5 @@
require 'shrimp/source'
require 'shrimp/phantom'
require 'shrimp/middleware'
require 'shrimp/synchronous_middleware'
require 'shrimp/configuration'