From 7f81a9f4419bea9be4e944101769b9df5de8233e Mon Sep 17 00:00:00 2001 From: Tyler Rick Date: Wed, 27 Nov 2013 14:15:24 -0800 Subject: [PATCH 01/39] Improve some formatting, fix some typos, and add some clearer descriptions to the Readme file. --- README.md | 233 ++++++++++++++++++++++++--------------- lib/shrimp/middleware.rb | 6 +- lib/shrimp/rasterize.js | 2 +- 3 files changed, 151 insertions(+), 90 deletions(-) diff --git a/README.md b/README.md index 1f1f622..0bcfcfd 100644 --- a/README.md +++ b/README.md @@ -1,14 +1,16 @@ # Shrimp [![Build Status](https://travis-ci.org/adeven/shrimp.png?branch=master)](https://travis-ci.org/adeven/shrimp) -Creates PDFs from URLs using phantomjs +Creates PDFs from web pages using PhantomJS -Read our [blogpost](http://big-elephants.com/2012-12/pdf-rendering-with-phantomjs/) about how it works. +Read our [blog post](http://big-elephants.com/2012-12/pdf-rendering-with-phantomjs/) about how it works. ## Installation Add this line to your application's Gemfile: - gem 'shrimp' +```ruby +gem 'shrimp' +``` And then execute: @@ -18,14 +20,13 @@ Or install it yourself as: $ gem install shrimp +### PhantomJS -### Phantomjs - - See http://phantomjs.org/download.html on how to install phantomjs +See http://phantomjs.org/download.html for instructions on how to install PhantomJS. ## Usage -``` +```ruby require 'shrimp' url = 'http://www.google.com' options = { :margin => "1cm"} @@ -33,51 +34,69 @@ Shrimp::Phantom.new(url, options).to_pdf("~/output.pdf") ``` ## Configuration -``` +Here is a list of configuration options that you can set. Unless otherwise noted in comments, the +value shown is the default value. + +Many of these options correspond to a property of the [WebPage module] +(https://github.com/ariya/phantomjs/wiki/API-Reference-WebPage) in PhantomJS. Refer to that +[documentation](https://github.com/ariya/phantomjs/wiki/API-Reference-WebPage) for more information +about what those options do. + +```ruby Shrimp.configure do |config| - # The path to the phantomjs executable - # defaults to `where phantomjs` - # config.phantomjs = '/usr/local/bin/phantomjs' + # The path to the phantomjs executable. Defaults to the path returned by `which phantomjs`. + config.phantomjs = '/usr/local/bin/phantomjs' - # the default pdf output format - # e.g. "5in*7.5in", "10cm*20cm", "A4", "Letter" - # config.format = 'A4' + # The paper size/format to use for the generated PDF file. Examples: "5in*7.5in", "10cm*20cm", + # "A4", "Letter". (See https://github.com/ariya/phantomjs/wiki/API-Reference-WebPage#papersize-object + # for a list of valid options.) + config.format = 'A4' - # the default margin - # config.margin = '1cm' + # The page margin to use (part of paperSize in PhantomJS) + config.margin = '1cm' - # the zoom factor - # config.zoom = 1 + # The zoom factor (zoomFactor in PhantomJS) + config.zoom = 1 - # the page orientation 'portrait' or 'landscape' - # config.orientation = 'portrait' + # The page orientation ('portrait' or 'landscape') (part of paperSize in PhantomJS) + config.orientation = 'portrait' - # a temporary dir used to store tempfiles - # config.tmpdir = Dir.tmpdir + # The directory where temporary files are stored, including the generated PDF files. + config.tmpdir = Dir.tmpdir - # the default rendering time in ms - # increase if you need to render very complex pages - # config.rendering_time = 1000 + # How long to wait (in ms) for PhantomJS to load the web page before saving it to a file. + # Increase this if you need to render very complex pages. + config.rendering_time = 1_000 - # change the viewport size. If you rendering pages that have - # flexible page width and height then you may need to set this - # to enforce a specific size - # config.viewport_width = 600 - # config.viewport_height = 600 + # The timeout for the phantomjs rendering process (in ms). This needs always to be higher than + # rendering_time. If this timeout expires before the job completes, it will cause PhantomJS to + # abort and exit with an error. + config.rendering_timeout = 90_000 - # the timeout for the phantomjs rendering process in ms - # this needs always to be higher than rendering_time - # config.rendering_timeout = 90000 + # Change the viewport size. If you are rendering a page that adapts its layout based on the + # page width and height then you may need to set this to enforce a specific size. (viewportSize + # in PhantomJS) + config.viewport_width = 600 + config.viewport_height = 600 - # the path to a json configuration file for command-line options - # config.command_config_file = "#{Rails.root.join('config', 'shrimp', 'config.json')}" + # The path to a json configuration file containing command-line options to be used by PhantomJS. + # Refer to https://github.com/ariya/phantomjs/wiki/API-Reference for a list of valid options. + # The default options are listed in the Readme. To use your own file from + # config/shrimp/config.json in Rails app, you could do this: + config.command_config_file = Rails.root.join('config/shrimp/config.json') end ``` -### Command Configuration +### Default PhantomJS Command-line Options -``` +These are the PhantomJS options that will be used by default unless you set the +`config.command_config_file` option. + +See the PhantomJS [API-Reference](https://github.com/ariya/phantomjs/wiki/API-Reference) for a +complete list of valid options. + +```js { "diskCache": false, "ignoreSslErrors": false, @@ -89,82 +108,123 @@ end ## Middleware -Shrimp comes with a middleware that allows users to get a PDF view of any page on your site by appending .pdf to the URL. +Shrimp comes with a middleware that allows users to generate a PDF file of any page on your site +simply by appending .pdf to the URL. + +For example, if your site is [example.com](http://example.com) and you go to +http://example.com/report.pdf, the middleware will detect that a PDF is being requested and will +automatically convert the web page at http://example.com/report into a PDF and send that PDF as the +response. + +If you only want to allow this for some pages but not all of them, see below for how to add +conditions. ### Middleware Setup **Non-Rails Rack apps** - # in config.ru - require 'shrimp' - use Shrimp::Middleware +```ruby +# in config.ru +require 'shrimp' +use Shrimp::Middleware +``` **Rails apps** - # in application.rb(Rails3) or environment.rb(Rails2) - require 'shrimp' - config.middleware.use Shrimp::Middleware +```ruby +# in application.rb or an initializer (Rails 3) or environment.rb (Rails 2) +require 'shrimp' +config.middleware.use Shrimp::Middleware +``` **With Shrimp options** - # options will be passed to Shrimp::Phantom.new - config.middleware.use Shrimp::Middleware, :margin => '0.5cm', :format => 'Letter' - -**With conditions to limit routes that can be generated in pdf** +```ruby +# Options will be passed to Shrimp::Phantom.new +config.middleware.use Shrimp::Middleware, :margin => '0.5cm', :format => 'Letter' +``` - # conditions can be regexps (either one or an array) - config.middleware.use Shrimp::Middleware, {}, :only => %r[^/public] - config.middleware.use Shrimp::Middleware, {}, :only => [%r[^/invoice], %r[^/public]] +**With conditions to limit which paths can be requested in PDF format** - # conditions can be strings (either one or an array) - config.middleware.use Shrimp::Middleware, {}, :only => '/public' - config.middleware.use Shrimp::Middleware, {}, :only => ['/invoice', '/public'] +```ruby +# conditions can be regexps (either one or an array) +config.middleware.use Shrimp::Middleware, {}, :only => %r[^/public] +config.middleware.use Shrimp::Middleware, {}, :only => [%r[^/invoice], %r[^/public]] - # conditions can be regexps (either one or an array) - config.middleware.use Shrimp::Middleware, {}, :except => [%r[^/prawn], %r[^/secret]] +# conditions can be strings (either one or an array) +config.middleware.use Shrimp::Middleware, {}, :only => '/public' +config.middleware.use Shrimp::Middleware, {}, :only => ['/invoice', '/public'] - # conditions can be strings (either one or an array) - config.middleware.use Shrimp::Middleware, {}, :except => ['/secret'] +# conditions can be regexps (either one or an array) +config.middleware.use Shrimp::Middleware, {}, :except => [%r[^/prawn], %r[^/secret]] +# conditions can be strings (either one or an array) +config.middleware.use Shrimp::Middleware, {}, :except => ['/secret'] +``` ### Polling -To avoid deadlocks Shrimp::Middleware renders the pdf in a separate process retuning a 503 Retry-After response Header. -you can setup the polling interval and the polling offset in seconds. +To avoid typing up the web server while waiting for the PDF to be rendered (which could create a +deadlock) Shrimp::Middleware starts PDF generation in the background in a separate process and +returns a 503 (Service Unavailable) response immediately. + +It also adds a [Retry-After](http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html) response +header, which tells the user's browser that the requested PDF resource is not available yet, but +will be soon, and instructs the browser to try again after a few seconds. When the same page is +requested again in a few seconds, it will again return a 503 if the PDF is still in the process of +being generated. This process will repeat until eventually the rendering has completed, at which +point the middleware returns a 200 (OK) response with the PDF itself. - config.middleware.use Shrimp::Middleware, :polling_interval => 1, :polling_offset => 5 +You can adjust both the `polling_offset` (how long to wait before the first retry; default is 1 +second) and the `polling_interval` (how long in seconds to wait between retries; default is 1 +second). Example: + +```ruby + config.middleware.use Shrimp::Middleware, :polling_offset => 5, :polling_interval => 1 +``` ### Caching -To avoid rendering the page on each request you can setup some the cache ttl in seconds +To improve performance and avoid having to re-generate the PDF file each time you request a PDF +resource, the existing PDF (that was generated the *first* time a certain URL was requested) will be +reused and sent again immediately if it already exists (for the same requested URL) and was +generated within the TTL. + +The default TTL is 1 second, but can be overridden by passing a different `cache_ttl` (in seconds) +to the middleware: +```ruby config.middleware.use Shrimp::Middleware, :cache_ttl => 3600, :out_path => "my/pdf/store" +``` + +To disable this caching entirely and force it to re-generate the PDF again each time a request comes +in, set `cache_ttl` to 0. ### Ajax requests -To include some fancy Ajax stuff with jquery +Here's an example of how to initiate an Ajax request for a PDF resource (using jQuery) and keep +polling the server until it either finishes successfully or returns with a 504 error code. ```js - - var url = '/my_page.pdf' - var statusCodes = { - 200: function() { - return window.location.assign(url); - }, - 504: function() { - console.log("Shit's beeing wired") - }, - 503: function(jqXHR, textStatus, errorThrown) { - var wait; - wait = parseInt(jqXHR.getResponseHeader('Retry-After')); - return setTimeout(function() { - return $.ajax({ - url: url, - statusCode: statusCodes - }); - }, wait * 1000); - } + var url = '/my_page.pdf' + var statusCodes = { + 200: function() { + return window.location.assign(url); + }, + 504: function() { + console.log("Sorry, the request timed out.") + }, + 503: function(jqXHR, textStatus, errorThrown) { + var wait; + wait = parseInt(jqXHR.getResponseHeader('Retry-After')); + return setTimeout(function() { + return $.ajax({ + url: url, + statusCode: statusCodes + }); + }, wait * 1000); + } } $.ajax({ url: url, @@ -175,12 +235,13 @@ To include some fancy Ajax stuff with jquery ## Contributing -1. Fork it +1. Fork this repository 2. Create your feature branch (`git checkout -b my-new-feature`) 3. Commit your changes (`git commit -am 'Add some feature'`) 4. Push to the branch (`git push origin my-new-feature`) -5. Create new Pull Request +5. Create a pull request (`git pull-request` if you've installed [hub](https://github.com/github/hub)) ## Copyright -Shrimp is Copyright © 2012 adeven (Manuel Kniep). It is free software, and may be redistributed under the terms -specified in the LICENSE file. + +Shrimp is Copyright © 2012 adeven (Manuel Kniep). It is free software, and may be redistributed +under the terms specified in the LICENSE file. diff --git a/lib/shrimp/middleware.rb b/lib/shrimp/middleware.rb index 96e8050..69178de 100644 --- a/lib/shrimp/middleware.rb +++ b/lib/shrimp/middleware.rb @@ -128,7 +128,7 @@ def reload_response(interval=1) -

Preparing pdf...

+

Preparing PDF file. Please wait...

HTML @@ -146,7 +146,7 @@ def ready_response - PDF ready here + PDF file ready here HTML @@ -162,7 +162,7 @@ def error_response -

Sorry request timed out...

+

Sorry, the request timed out.

HTML diff --git a/lib/shrimp/rasterize.js b/lib/shrimp/rasterize.js index 586abf9..f833e42 100644 --- a/lib/shrimp/rasterize.js +++ b/lib/shrimp/rasterize.js @@ -12,7 +12,7 @@ var page = require('webpage').create(), address, output, size, statusCode; window.setTimeout(function () { - console.log("Shit's being weird no result within: " + time_out + "ms"); + console.log("No result within " + time_out + "ms. Aborting PhantomJS."); phantom.exit(1); }, time_out); From 84b786ea78db92533ab201b7e51e1f7b21728c4d Mon Sep 17 00:00:00 2001 From: Tyler Rick Date: Tue, 19 Nov 2013 18:50:44 -0800 Subject: [PATCH 02/39] Fix: The args passed to phantomjs command were not shell escaped. This caused problems if, for example, the URL contained a '&' character, since that has special meaning to the shell. Changed it to pass an array to IO.popen instead of using `#{cmd}`. --- lib/shrimp/phantom.rb | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/lib/shrimp/phantom.rb b/lib/shrimp/phantom.rb index 45f23dd..896a7be 100644 --- a/lib/shrimp/phantom.rb +++ b/lib/shrimp/phantom.rb @@ -31,7 +31,8 @@ class Phantom # Returns the stdout output of phantomjs def run @error = nil - @result = `#{cmd}` + puts "Running command: #{cmd_array.join(' ')}" + @result = IO.popen(cmd_array).read unless $?.exitstatus == 0 @error = @result @result = nil @@ -51,14 +52,14 @@ def run! end # Public: Returns the phantom rasterize command - def cmd + def cmd_array cookie_file = dump_cookies format, zoom, margin, orientation = options[:format], options[:zoom], options[:margin], options[:orientation] rendering_time, timeout = options[:rendering_time], options[:rendering_timeout] viewport_width, viewport_height = options[:viewport_width], options[:viewport_height] @outfile ||= "#{options[:tmpdir]}/#{Digest::MD5.hexdigest((Time.now.to_i + rand(9001)).to_s)}.pdf" command_config_file = "--config=#{options[:command_config_file]}" - [Shrimp.configuration.phantomjs, command_config_file, SCRIPT_FILE, @source.to_s, @outfile, format, zoom, margin, orientation, cookie_file, rendering_time, timeout, viewport_width, viewport_height ].join(" ") + [Shrimp.configuration.phantomjs, command_config_file, SCRIPT_FILE, @source.to_s, @outfile, format, zoom, margin, orientation, cookie_file, rendering_time, timeout, viewport_width, viewport_height ].map(&:to_s) end # Public: initializes a new Phantom Object From 4edd2b4d2ea8c7e667435b1bf641c20d4354d1af Mon Sep 17 00:00:00 2001 From: Tyler Rick Date: Wed, 20 Nov 2013 09:01:54 -0800 Subject: [PATCH 03/39] Add cmd back in (but escape the arguments this time) so that we can output the command exactly as it will be run for easier command-line debugging. --- lib/shrimp/phantom.rb | 13 ++++++++++--- spec/shrimp/phantom_spec.rb | 22 +++++++++++++++++----- 2 files changed, 27 insertions(+), 8 deletions(-) diff --git a/lib/shrimp/phantom.rb b/lib/shrimp/phantom.rb index 896a7be..8f8103d 100644 --- a/lib/shrimp/phantom.rb +++ b/lib/shrimp/phantom.rb @@ -1,5 +1,7 @@ require 'uri' require 'json' +require 'shellwords' + module Shrimp class NoExecutableError < StandardError def initialize @@ -31,8 +33,8 @@ class Phantom # Returns the stdout output of phantomjs def run @error = nil - puts "Running command: #{cmd_array.join(' ')}" - @result = IO.popen(cmd_array).read + #puts "Running command: #{cmd}" + @result = `#{cmd}` unless $?.exitstatus == 0 @error = @result @result = nil @@ -51,7 +53,12 @@ def run! @result end - # Public: Returns the phantom rasterize command + # Public: Returns the arguments for the PhantomJS rasterize command as a shell-escaped string + def cmd + Shellwords.join cmd_array + end + + # Public: Returns the arguments for the PhantomJS rasterize command as an array def cmd_array cookie_file = dump_cookies format, zoom, margin, orientation = options[:format], options[:zoom], options[:margin], options[:orientation] diff --git a/spec/shrimp/phantom_spec.rb b/spec/shrimp/phantom_spec.rb index d79a971..730b3d2 100644 --- a/spec/shrimp/phantom_spec.rb +++ b/spec/shrimp/phantom_spec.rb @@ -47,11 +47,23 @@ def testfile phantom.source.should be_url end - it "should parse options into a cmd line" do - phantom = Shrimp::Phantom.new("file://#{testfile}", { :margin => "2cm" }, { }, "#{Dir.tmpdir}/test.pdf") - phantom.cmd.should include "test.pdf A4 1 2cm portrait" - phantom.cmd.should include "file://#{testfile}" - phantom.cmd.should include "lib/shrimp/rasterize.js" + describe '#cmd' do + it "should generate the correct cmd" do + phantom = Shrimp::Phantom.new("file://#{testfile}", { :margin => "2cm" }, { }, "#{Dir.tmpdir}/test.pdf") + phantom.cmd.should include "test.pdf A4 1 2cm portrait" + phantom.cmd.should include "file://#{testfile}" + phantom.cmd.should include "lib/shrimp/rasterize.js" + end + + it "cmd should escape the args" do + phantom = Shrimp::Phantom.new("http://example.com/?something") + phantom.cmd_array.should include "http://example.com/?something" + phantom.cmd. should include "http://example.com/\\?something" + + phantom = Shrimp::Phantom.new("http://example.com/path/file.html?width=100&height=100") + phantom.cmd_array.should include "http://example.com/path/file.html?width=100&height=100" + phantom.cmd. should include "http://example.com/path/file.html\\?width\\=100\\&height\\=100" + end end context "rendering to a file" do From be3622b4924bb5621e2f4ad3e33b795d91f47e53 Mon Sep 17 00:00:00 2001 From: Tyler Rick Date: Wed, 20 Nov 2013 09:21:30 -0800 Subject: [PATCH 04/39] Add a debug configuration option that you can enable if you want to see details such as the phantomjs command line that it's about to execute. --- README.md | 4 ++++ lib/shrimp/configuration.rb | 5 +++-- lib/shrimp/phantom.rb | 2 +- 3 files changed, 8 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index 0bcfcfd..3eb42f7 100644 --- a/README.md +++ b/README.md @@ -85,6 +85,10 @@ Shrimp.configure do |config| # The default options are listed in the Readme. To use your own file from # config/shrimp/config.json in Rails app, you could do this: config.command_config_file = Rails.root.join('config/shrimp/config.json') + + # Enable if you want to see details such as the phantomjs command line that it's about to execute. + config.debug = false + end ``` diff --git a/lib/shrimp/configuration.rb b/lib/shrimp/configuration.rb index 58193e9..b3107aa 100644 --- a/lib/shrimp/configuration.rb +++ b/lib/shrimp/configuration.rb @@ -4,7 +4,7 @@ class Configuration attr_accessor :default_options attr_writer :phantomjs - [:format, :margin, :zoom, :orientation, :tmpdir, :rendering_timeout, :rendering_time, :command_config_file, :viewport_width, :viewport_height].each do |m| + [:format, :margin, :zoom, :orientation, :tmpdir, :rendering_timeout, :rendering_time, :command_config_file, :viewport_width, :viewport_height, :debug].each do |m| define_method("#{m}=") do |val| @default_options[m]=val end @@ -21,7 +21,8 @@ def initialize :rendering_time => 1000, :command_config_file => File.expand_path('../config.json', __FILE__), :viewport_width => 600, - :viewport_height => 600 + :viewport_height => 600, + :debug => false } end diff --git a/lib/shrimp/phantom.rb b/lib/shrimp/phantom.rb index 8f8103d..8aa7335 100644 --- a/lib/shrimp/phantom.rb +++ b/lib/shrimp/phantom.rb @@ -33,7 +33,7 @@ class Phantom # Returns the stdout output of phantomjs def run @error = nil - #puts "Running command: #{cmd}" + puts "Running command: #{cmd}" if options[:debug] @result = `#{cmd}` unless $?.exitstatus == 0 @error = @result From b4779a3a12c8e84f716bfd5806f7f745aba60c4e Mon Sep 17 00:00:00 2001 From: Tyler Rick Date: Wed, 20 Nov 2013 09:30:18 -0800 Subject: [PATCH 05/39] Minor cleanup in middleware_spec.rb --- spec/shrimp/middleware_spec.rb | 43 ++++++++++++++++++---------------- 1 file changed, 23 insertions(+), 20 deletions(-) diff --git a/spec/shrimp/middleware_spec.rb b/spec/shrimp/middleware_spec.rb index 48e720f..9d643a9 100644 --- a/spec/shrimp/middleware_spec.rb +++ b/spec/shrimp/middleware_spec.rb @@ -1,13 +1,16 @@ require 'spec_helper' -def app; +def app Rack::Lint.new(@app) end def options - { :margin => "1cm", :out_path => Dir.tmpdir, - :polling_offset => 10, :polling_interval => 1, :cache_ttl => 3600, - :request_timeout => 1 } + { :margin => "1cm", + :out_path => Dir.tmpdir, + :polling_offset => 10, + :polling_interval => 1, + :cache_ttl => 3600, + :request_timeout => 1 } end def mock_app(options = { }, conditions = { }) @@ -27,8 +30,9 @@ def mock_app(options = { }, conditions = { }) context "matching pdf" do it "should render as pdf" do get '/test.pdf' - @middleware.send(:'render_as_pdf?').should be true + @middleware.send(:render_as_pdf?).should be true end + it "should return 503 the first time" do get '/test.pdf' last_response.status.should eq 503 @@ -42,7 +46,7 @@ def mock_app(options = { }, conditions = { }) last_response.header["Retry-After"].should eq "1" end - it "should set render to to outpath" do + it "should set render_to to out_path" do get '/test.pdf' @middleware.send(:render_to).should match (Regexp.new("^#{options[:out_path]}")) end @@ -64,42 +68,41 @@ def mock_app(options = { }, conditions = { }) it "should return a pdf with 200 after rendering" do mock_file = double(File, :read => "Hello World", :close => true, :mtime => Time.now) - File.should_receive(:'exists?').and_return true - File.should_receive(:'size').and_return 1000 - File.should_receive(:'open').and_return mock_file - File.should_receive(:'new').and_return mock_file + File.should_receive(:exists?).and_return true + File.should_receive(:size).and_return 1000 + File.should_receive(:open).and_return mock_file + File.should_receive(:new).and_return mock_file get '/test.pdf' last_response.status.should eq 200 last_response.body.should eq "Hello World" end - - end + context "not matching pdf" do it "should skip pdf rendering" do get 'http://www.example.org/test' last_response.body.should include "Hello world!" - @middleware.send(:'render_as_pdf?').should be false + @middleware.send(:render_as_pdf?).should be false end end end -describe "Conditions" do +describe Shrimp::Middleware, "Conditions" do context "only" do before { mock_app(options, :only => [%r[^/invoice], %r[^/public]]) } it "render pdf for set only option" do get '/invoice/test.pdf' - @middleware.send(:'render_as_pdf?').should be true + @middleware.send(:render_as_pdf?).should be true end it "render pdf for set only option" do get '/public/test.pdf' - @middleware.send(:'render_as_pdf?').should be true + @middleware.send(:render_as_pdf?).should be true end it "not render pdf for any other path" do get '/secret/test.pdf' - @middleware.send(:'render_as_pdf?').should be false + @middleware.send(:render_as_pdf?).should be false end end @@ -107,17 +110,17 @@ def mock_app(options = { }, conditions = { }) before { mock_app(options, :except => %w(/secret)) } it "render pdf for set only option" do get '/invoice/test.pdf' - @middleware.send(:'render_as_pdf?').should be true + @middleware.send(:render_as_pdf?).should be true end it "render pdf for set only option" do get '/public/test.pdf' - @middleware.send(:'render_as_pdf?').should be true + @middleware.send(:render_as_pdf?).should be true end it "not render pdf for any other path" do get '/secret/test.pdf' - @middleware.send(:'render_as_pdf?').should be false + @middleware.send(:render_as_pdf?).should be false end end end From a41ffcc8cb4b2ddc1d5829fb959356b49f625287 Mon Sep 17 00:00:00 2001 From: Tyler Rick Date: Wed, 20 Nov 2013 09:37:21 -0800 Subject: [PATCH 06/39] Add new html_url method. Fix it to work for the case where the request URL contains a query string. --- lib/shrimp/middleware.rb | 7 ++++++- spec/shrimp/middleware_spec.rb | 11 +++++++++++ 2 files changed, 17 insertions(+), 1 deletion(-) diff --git a/lib/shrimp/middleware.rb b/lib/shrimp/middleware.rb index 69178de..6393329 100644 --- a/lib/shrimp/middleware.rb +++ b/lib/shrimp/middleware.rb @@ -50,11 +50,16 @@ def call(env) end end + # The URL for the HTML-formatted web page that we are converting into a PDF. + def html_url + @request.url.sub(%r<\.pdf(\?|$)>, '\1') + end + private # Private: start phantom rendering in a separate process def fire_phantom - Process::detach fork { Phantom.new(@request.url.sub(%r{\.pdf$}, ''), @options, @request.cookies).to_pdf(render_to) } + Process::detach fork { Phantom.new(html_url, @options, @request.cookies).to_pdf(render_to) } end def render_to diff --git a/spec/shrimp/middleware_spec.rb b/spec/shrimp/middleware_spec.rb index 9d643a9..ec6e489 100644 --- a/spec/shrimp/middleware_spec.rb +++ b/spec/shrimp/middleware_spec.rb @@ -26,6 +26,7 @@ def mock_app(options = { }, conditions = { }) describe Shrimp::Middleware do before { mock_app(options) } + subject { @middleware } context "matching pdf" do it "should render as pdf" do @@ -76,6 +77,16 @@ def mock_app(options = { }, conditions = { }) last_response.status.should eq 200 last_response.body.should eq "Hello World" end + + describe "requesting a simple path" do + before { get '/test.pdf' } + its(:html_url) { should eq 'http://example.org/test' } + end + + describe "requesting a path with a query string" do + before { get '/test.pdf?size=10' } + its(:html_url) { should eq 'http://example.org/test?size=10' } + end end context "not matching pdf" do From 734dd74cc1b87e99bb38e84b3209f0c1bdae095c Mon Sep 17 00:00:00 2001 From: Tyler Rick Date: Wed, 20 Nov 2013 09:40:30 -0800 Subject: [PATCH 07/39] Add some debug output to the middleware (only enabled if the debug option is set to true) --- lib/shrimp/middleware.rb | 1 + 1 file changed, 1 insertion(+) diff --git a/lib/shrimp/middleware.rb b/lib/shrimp/middleware.rb index 6393329..1ec79bc 100644 --- a/lib/shrimp/middleware.rb +++ b/lib/shrimp/middleware.rb @@ -59,6 +59,7 @@ def html_url # Private: start phantom rendering in a separate process def fire_phantom + puts %(#{self.class}: Converting web page at #{(html_url).inspect} into a PDF ...) if Shrimp.configuration.default_options[:debug] Process::detach fork { Phantom.new(html_url, @options, @request.cookies).to_pdf(render_to) } end From 4f171b683e4285668f713ab64ee62b0832a0d8be Mon Sep 17 00:00:00 2001 From: Tyler Rick Date: Wed, 20 Nov 2013 10:10:35 -0800 Subject: [PATCH 08/39] Extract some code out of the main Middleware class into a new BaseMiddleware class. --- lib/shrimp/base_middleware.rb | 63 +++++++++++++++++++ lib/shrimp/middleware.rb | 110 ++++++++++------------------------ 2 files changed, 95 insertions(+), 78 deletions(-) create mode 100644 lib/shrimp/base_middleware.rb diff --git a/lib/shrimp/base_middleware.rb b/lib/shrimp/base_middleware.rb new file mode 100644 index 0000000..df43399 --- /dev/null +++ b/lib/shrimp/base_middleware.rb @@ -0,0 +1,63 @@ +module Shrimp + class BaseMiddleware + def initialize(app, options = { }, conditions = { }) + @app = app + @options = options + @conditions = conditions + end + + def render_as_pdf? + request_path_is_pdf = !!@request.path.match(%r{\.pdf$}) + + if request_path_is_pdf && @conditions[:only] + rules = [@conditions[:only]].flatten + rules.any? do |pattern| + if pattern.is_a?(Regexp) + @request.path =~ pattern + else + @request.path[0, pattern.length] == pattern + end + end + elsif request_path_is_pdf && @conditions[:except] + rules = [@conditions[:except]].flatten + rules.map do |pattern| + if pattern.is_a?(Regexp) + return false if @request.path =~ pattern + else + return false if @request.path[0, pattern.length] == pattern + end + end + return true + else + request_path_is_pdf + end + end + + def call(env) + @request = Rack::Request.new(env) + if render_as_pdf? #&& headers['Content-Type'] =~ /text\/html|application\/xhtml\+xml/ + render_as_pdf(env) + else + @app.call(env) + end + end + + # The URL for the HTML-formatted web page that we are converting into a PDF. + def html_url + @request.url.sub(%r<\.pdf(\?|$)>, '\1') + end + + private + + def render_to + file_name = Digest::MD5.hexdigest(@request.path) + ".pdf" + file_path = @options[:out_path] + "#{file_path}/#{file_name}" + end + + def concat(accepts, type) + (accepts || '').split(',').unshift(type).compact.join(',') + end + + end +end diff --git a/lib/shrimp/middleware.rb b/lib/shrimp/middleware.rb index 1ec79bc..01bae04 100644 --- a/lib/shrimp/middleware.rb +++ b/lib/shrimp/middleware.rb @@ -1,60 +1,50 @@ +require 'shrimp/base_middleware' + module Shrimp - class Middleware + class Middleware < BaseMiddleware def initialize(app, options = { }, conditions = { }) - @app = app - @options = options - @conditions = conditions + super @options[:polling_interval] ||= 1 @options[:polling_offset] ||= 1 @options[:cache_ttl] ||= 1 @options[:request_timeout] ||= @options[:polling_interval] * 10 end - def call(env) - @request = Rack::Request.new(env) - if render_as_pdf? #&& headers['Content-Type'] =~ /text\/html|application\/xhtml\+xml/ - if already_rendered? && (up_to_date?(@options[:cache_ttl]) || @options[:cache_ttl] == 0) - if File.size(render_to) == 0 - File.delete(render_to) - remove_rendering_flag - return error_response - end - return ready_response if env['HTTP_X_REQUESTED_WITH'] - file = File.open(render_to, "rb") - body = file.read - file.close - File.delete(render_to) if @options[:cache_ttl] == 0 + def render_as_pdf(env) + if already_rendered? && (up_to_date?(@options[:cache_ttl]) || @options[:cache_ttl] == 0) + if File.size(render_to) == 0 + File.delete(render_to) remove_rendering_flag - response = [body] - headers = { } - headers["Content-Length"] = (body.respond_to?(:bytesize) ? body.bytesize : body.size).to_s - headers["Content-Type"] = "application/pdf" - [200, headers, response] - else - if rendering_in_progress? - if rendering_timed_out? - remove_rendering_flag - error_response - else - reload_response(@options[:polling_interval]) - end + return error_response + end + return ready_response if env['HTTP_X_REQUESTED_WITH'] + file = File.open(render_to, "rb") + body = file.read + file.close + File.delete(render_to) if @options[:cache_ttl] == 0 + remove_rendering_flag + response = [body] + headers = { } + headers["Content-Length"] = (body.respond_to?(:bytesize) ? body.bytesize : body.size).to_s + headers["Content-Type"] = "application/pdf" + [200, headers, response] + else + if rendering_in_progress? + if rendering_timed_out? + remove_rendering_flag + error_response else - File.delete(render_to) if already_rendered? - set_rendering_flag - fire_phantom - reload_response(@options[:polling_offset]) + reload_response(@options[:polling_interval]) end + else + File.delete(render_to) if already_rendered? + set_rendering_flag + fire_phantom + reload_response(@options[:polling_offset]) end - else - @app.call(env) end end - # The URL for the HTML-formatted web page that we are converting into a PDF. - def html_url - @request.url.sub(%r<\.pdf(\?|$)>, '\1') - end - private # Private: start phantom rendering in a separate process @@ -63,12 +53,6 @@ def fire_phantom Process::detach fork { Phantom.new(html_url, @options, @request.cookies).to_pdf(render_to) } end - def render_to - file_name = Digest::MD5.hexdigest(@request.path) + ".pdf" - file_path = @options[:out_path] - "#{file_path}/#{file_name}" - end - def already_rendered? File.exists?(render_to) end @@ -97,36 +81,6 @@ def rendering_in_progress? @request.session["phantom-rendering"][render_to] end - def render_as_pdf? - request_path_is_pdf = !!@request.path.match(%r{\.pdf$}) - - if request_path_is_pdf && @conditions[:only] - rules = [@conditions[:only]].flatten - rules.any? do |pattern| - if pattern.is_a?(Regexp) - @request.path =~ pattern - else - @request.path[0, pattern.length] == pattern - end - end - elsif request_path_is_pdf && @conditions[:except] - rules = [@conditions[:except]].flatten - rules.map do |pattern| - if pattern.is_a?(Regexp) - return false if @request.path =~ pattern - else - return false if @request.path[0, pattern.length] == pattern - end - end - return true - else - request_path_is_pdf - end - end - - def concat(accepts, type) - (accepts || '').split(',').unshift(type).compact.join(',') - end def reload_response(interval=1) body = <<-HTML.gsub(/[ \n]+/, ' ').strip From fb3745aa6f530a140e54efd14d3a7e21e4db40cb Mon Sep 17 00:00:00 2001 From: Tyler Rick Date: Wed, 27 Nov 2013 11:22:03 -0800 Subject: [PATCH 09/39] Extracted out a new rendering_started_at method to make it clearer what the "rendering_flag" actually means. --- lib/shrimp/middleware.rb | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/lib/shrimp/middleware.rb b/lib/shrimp/middleware.rb index 01bae04..95286ae 100644 --- a/lib/shrimp/middleware.rb +++ b/lib/shrimp/middleware.rb @@ -72,8 +72,12 @@ def set_rendering_flag @request.session["phantom-rendering"][render_to] = Time.now end + def rendering_started_at + @request.session["phantom-rendering"][render_to] + end + def rendering_timed_out? - Time.now - @request.session["phantom-rendering"][render_to] > @options[:request_timeout] + Time.now - rendering_started_at > @options[:request_timeout] end def rendering_in_progress? From 8d5438a9710fe893aac20f646c71d350f25035dc Mon Sep 17 00:00:00 2001 From: Tyler Rick Date: Wed, 20 Nov 2013 10:48:47 -0800 Subject: [PATCH 10/39] Extract out two new methods, pdf_body and pdf_headers, into base class to simplify render_as_pdf! --- lib/shrimp/base_middleware.rb | 14 ++++++++++++++ lib/shrimp/middleware.rb | 11 +++-------- 2 files changed, 17 insertions(+), 8 deletions(-) diff --git a/lib/shrimp/base_middleware.rb b/lib/shrimp/base_middleware.rb index df43399..98749bc 100644 --- a/lib/shrimp/base_middleware.rb +++ b/lib/shrimp/base_middleware.rb @@ -49,6 +49,20 @@ def html_url private + def pdf_body + file = File.open(render_to, "rb") + body = file.read + file.close + body + end + + def pdf_headers(body) + { }.tap do |headers| + headers["Content-Length"] = (body.respond_to?(:bytesize) ? body.bytesize : body.size).to_s + headers["Content-Type"] = "application/pdf" + end + end + def render_to file_name = Digest::MD5.hexdigest(@request.path) + ".pdf" file_path = @options[:out_path] diff --git a/lib/shrimp/middleware.rb b/lib/shrimp/middleware.rb index 95286ae..fd57e30 100644 --- a/lib/shrimp/middleware.rb +++ b/lib/shrimp/middleware.rb @@ -18,16 +18,11 @@ def render_as_pdf(env) return error_response end return ready_response if env['HTTP_X_REQUESTED_WITH'] - file = File.open(render_to, "rb") - body = file.read - file.close + body = pdf_body() File.delete(render_to) if @options[:cache_ttl] == 0 remove_rendering_flag - response = [body] - headers = { } - headers["Content-Length"] = (body.respond_to?(:bytesize) ? body.bytesize : body.size).to_s - headers["Content-Type"] = "application/pdf" - [200, headers, response] + headers = pdf_headers(body) + [200, headers, [body]] else if rendering_in_progress? if rendering_timed_out? From 1d184409d3e5deccdb9b46665d667f6ca7977b5c Mon Sep 17 00:00:00 2001 From: Tyler Rick Date: Wed, 20 Nov 2013 11:26:41 -0800 Subject: [PATCH 11/39] Add some tests specifically for rendering_in_progress?, already_rendered?, and up_to_date? Add a test for the Content-Type header. Change rendering_in_progress? to return false when it would have returned nil. --- lib/shrimp/middleware.rb | 4 ++-- spec/shrimp/middleware_spec.rb | 28 +++++++++++++++++++--------- 2 files changed, 21 insertions(+), 11 deletions(-) diff --git a/lib/shrimp/middleware.rb b/lib/shrimp/middleware.rb index fd57e30..e7b754e 100644 --- a/lib/shrimp/middleware.rb +++ b/lib/shrimp/middleware.rb @@ -76,8 +76,8 @@ def rendering_timed_out? end def rendering_in_progress? - @request.session["phantom-rendering"]||={ } - @request.session["phantom-rendering"][render_to] + @request.session["phantom-rendering"] ||={ } + !!@request.session["phantom-rendering"][render_to] end diff --git a/spec/shrimp/middleware_spec.rb b/spec/shrimp/middleware_spec.rb index ec6e489..a332397 100644 --- a/spec/shrimp/middleware_spec.rb +++ b/spec/shrimp/middleware_spec.rb @@ -67,15 +67,25 @@ def mock_app(options = { }, conditions = { }) last_response.status.should eq 503 end - it "should return a pdf with 200 after rendering" do - mock_file = double(File, :read => "Hello World", :close => true, :mtime => Time.now) - File.should_receive(:exists?).and_return true - File.should_receive(:size).and_return 1000 - File.should_receive(:open).and_return mock_file - File.should_receive(:new).and_return mock_file - get '/test.pdf' - last_response.status.should eq 200 - last_response.body.should eq "Hello World" + describe "when already_rendered? and up_to_date?" do + before { + mock_file = double(File, :read => "Hello World", :close => true, :mtime => Time.now) + File.should_receive(:exists?).at_least(:once).and_return true + File.should_receive(:size).and_return 1000 + File.should_receive(:open).and_return mock_file + File.should_receive(:new).at_least(:once).and_return mock_file + get '/test.pdf' + } + + its(:rendering_in_progress?) { should eq false } + its(:already_rendered?) { should eq true } + its(:up_to_date?) { should eq true } + + it "should return a pdf with 200" do + last_response.status.should eq 200 + last_response.headers['Content-Type'].should eq 'application/pdf' + last_response.body.should eq "Hello World" + end end describe "requesting a simple path" do From c49fe80299959c4551014f79f7ad02ba7cbc2d1e Mon Sep 17 00:00:00 2001 From: Tyler Rick Date: Wed, 20 Nov 2013 12:13:48 -0800 Subject: [PATCH 12/39] Fix commented-out "should render a pdf file" test --- spec/shrimp/phantom_spec.rb | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/spec/shrimp/phantom_spec.rb b/spec/shrimp/phantom_spec.rb index 730b3d2..504c2ab 100644 --- a/spec/shrimp/phantom_spec.rb +++ b/spec/shrimp/phantom_spec.rb @@ -33,8 +33,9 @@ def testfile end it "should render a pdf file" do - #phantom = Shrimp::Phantom.new("file://#{@path}") - #phantom.to_pdf("#{Dir.tmpdir}/test.pdf").first should eq "#{Dir.tmpdir}/test.pdf" + phantom = Shrimp::Phantom.new("file://#{testfile}") + phantom.to_pdf("#{Dir.tmpdir}/test.pdf").should eq "#{Dir.tmpdir}/test.pdf" + phantom.result.should start_with "rendered to: #{Dir.tmpdir}/test.pdf" end it "should accept a local file url" do From 2a7c09df67e3efde0ff0c1f81bc0b48bf20bc2f3 Mon Sep 17 00:00:00 2001 From: Tyler Rick Date: Wed, 20 Nov 2013 12:16:07 -0800 Subject: [PATCH 13/39] Spell out test_file as 2 words. --- spec/shrimp/phantom_spec.rb | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/spec/shrimp/phantom_spec.rb b/spec/shrimp/phantom_spec.rb index 504c2ab..5466871 100644 --- a/spec/shrimp/phantom_spec.rb +++ b/spec/shrimp/phantom_spec.rb @@ -10,7 +10,7 @@ def valid_pdf(io) end end -def testfile +def test_file File.expand_path('../test_file.html', __FILE__) end @@ -26,20 +26,20 @@ def testfile end it "should initialize attributes" do - phantom = Shrimp::Phantom.new("file://#{testfile}", { :margin => "2cm" }, { }, "#{Dir.tmpdir}/test.pdf") - phantom.source.to_s.should eq "file://#{testfile}" + phantom = Shrimp::Phantom.new("file://#{test_file}", { :margin => "2cm" }, { }, "#{Dir.tmpdir}/test.pdf") + phantom.source.to_s.should eq "file://#{test_file}" phantom.options[:margin].should eq "2cm" phantom.outfile.should eq "#{Dir.tmpdir}/test.pdf" end it "should render a pdf file" do - phantom = Shrimp::Phantom.new("file://#{testfile}") + phantom = Shrimp::Phantom.new("file://#{test_file}") phantom.to_pdf("#{Dir.tmpdir}/test.pdf").should eq "#{Dir.tmpdir}/test.pdf" phantom.result.should start_with "rendered to: #{Dir.tmpdir}/test.pdf" end it "should accept a local file url" do - phantom = Shrimp::Phantom.new("file://#{testfile}") + phantom = Shrimp::Phantom.new("file://#{test_file}") phantom.source.should be_url end @@ -50,9 +50,9 @@ def testfile describe '#cmd' do it "should generate the correct cmd" do - phantom = Shrimp::Phantom.new("file://#{testfile}", { :margin => "2cm" }, { }, "#{Dir.tmpdir}/test.pdf") + phantom = Shrimp::Phantom.new("file://#{test_file}", { :margin => "2cm" }, { }, "#{Dir.tmpdir}/test.pdf") phantom.cmd.should include "test.pdf A4 1 2cm portrait" - phantom.cmd.should include "file://#{testfile}" + phantom.cmd.should include "file://#{test_file}" phantom.cmd.should include "lib/shrimp/rasterize.js" end @@ -69,7 +69,7 @@ def testfile context "rendering to a file" do before(:all) do - phantom = Shrimp::Phantom.new("file://#{testfile}", { :margin => "2cm" }, { }, "#{Dir.tmpdir}/test.pdf") + phantom = Shrimp::Phantom.new("file://#{test_file}", { :margin => "2cm" }, { }, "#{Dir.tmpdir}/test.pdf") @result = phantom.to_file end @@ -84,7 +84,7 @@ def testfile context "rendering to a pdf" do before(:all) do - @phantom = Shrimp::Phantom.new("file://#{testfile}", { :margin => "2cm" }, { }) + @phantom = Shrimp::Phantom.new("file://#{test_file}", { :margin => "2cm" }, { }) @result = @phantom.to_pdf("#{Dir.tmpdir}/test.pdf") end @@ -100,7 +100,7 @@ def testfile context "rendering to a String" do before(:all) do - phantom = Shrimp::Phantom.new("file://#{testfile}", { :margin => "2cm" }, { }) + phantom = Shrimp::Phantom.new("file://#{test_file}", { :margin => "2cm" }, { }) @result = phantom.to_string("#{Dir.tmpdir}/test.pdf") end @@ -127,7 +127,7 @@ def testfile end it "should be unable to copy file" do - phantom = Shrimp::Phantom.new("file://#{testfile}") + phantom = Shrimp::Phantom.new("file://#{test_file}") phantom.to_pdf("/foo/bar/") phantom.error.should include "Unable to copy file " end @@ -141,7 +141,7 @@ def testfile end it "should be unable to copy file" do - phantom = Shrimp::Phantom.new("file://#{testfile}") + phantom = Shrimp::Phantom.new("file://#{test_file}") expect { phantom.to_pdf!("/foo/bar/") }.to raise_error Shrimp::RenderingError end end From 4ab7a88e1044db416597088fc6dcf74d76c65bbe Mon Sep 17 00:00:00 2001 From: Tyler Rick Date: Wed, 20 Nov 2013 12:18:04 -0800 Subject: [PATCH 14/39] Actually create an expectation from the value that valid_pdf returns. --- spec/shrimp/phantom_spec.rb | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/spec/shrimp/phantom_spec.rb b/spec/shrimp/phantom_spec.rb index 5466871..d2fdbd5 100644 --- a/spec/shrimp/phantom_spec.rb +++ b/spec/shrimp/phantom_spec.rb @@ -78,7 +78,7 @@ def test_file end it "should be a valid pdf" do - valid_pdf(@result) + valid_pdf(@result).should eq true end end @@ -94,7 +94,7 @@ def test_file end it "should be a valid pdf" do - valid_pdf(@result) + valid_pdf(@result).should eq true end end @@ -109,7 +109,7 @@ def test_file end it "should be a valid pdf" do - valid_pdf(@result) + valid_pdf(@result).should eq true end end From 000ab9d7a01ce9961356c4a679db26b8b9b38fb0 Mon Sep 17 00:00:00 2001 From: Tyler Rick Date: Wed, 20 Nov 2013 12:29:27 -0800 Subject: [PATCH 15/39] Move test_file and valid_pdf to spec_helper.rb so that they can be reused from various test files. --- spec/shrimp/phantom_spec.rb | 19 +++---------------- spec/spec_helper.rb | 12 ++++++++++++ 2 files changed, 15 insertions(+), 16 deletions(-) diff --git a/spec/shrimp/phantom_spec.rb b/spec/shrimp/phantom_spec.rb index d2fdbd5..5979420 100644 --- a/spec/shrimp/phantom_spec.rb +++ b/spec/shrimp/phantom_spec.rb @@ -1,19 +1,6 @@ #encoding: UTF-8 require 'spec_helper' -def valid_pdf(io) - case io - when File - io.read[0...4] == "%PDF" - when String - io[0...4] == "%PDF" || File.open(io).read[0...4] == "%PDF" - end -end - -def test_file - File.expand_path('../test_file.html', __FILE__) -end - Shrimp.configure do |config| config.rendering_time = 1000 end @@ -78,7 +65,7 @@ def test_file end it "should be a valid pdf" do - valid_pdf(@result).should eq true + valid_pdf?(@result).should eq true end end @@ -94,7 +81,7 @@ def test_file end it "should be a valid pdf" do - valid_pdf(@result).should eq true + valid_pdf?(@result).should eq true end end @@ -109,7 +96,7 @@ def test_file end it "should be a valid pdf" do - valid_pdf(@result).should eq true + valid_pdf?(@result).should eq true end end diff --git a/spec/spec_helper.rb b/spec/spec_helper.rb index 154a571..3770b68 100644 --- a/spec/spec_helper.rb +++ b/spec/spec_helper.rb @@ -5,3 +5,15 @@ include Rack::Test::Methods end +def test_file + File.expand_path('../shrimp/test_file.html', __FILE__) +end + +def valid_pdf?(io) + case io + when File + io.read[0...4] == "%PDF" + when String + io[0...4] == "%PDF" || File.open(io).read[0...4] == "%PDF" + end +end From 070b99914790bb48446b674e8775b9b771f8a633 Mon Sep 17 00:00:00 2001 From: Tyler Rick Date: Wed, 20 Nov 2013 16:20:03 -0800 Subject: [PATCH 16/39] Add app, main_app, and middleware_options methods to spec_helper.rb so that they can be reused from various test files. Enclose the definition of mock_app within a shared_context block so that we can have different mock_app methods for different middleware test files. --- spec/shrimp/middleware_spec.rb | 39 ++++++++++++---------------------- spec/spec_helper.rb | 23 ++++++++++++++++++++ 2 files changed, 36 insertions(+), 26 deletions(-) diff --git a/spec/shrimp/middleware_spec.rb b/spec/shrimp/middleware_spec.rb index a332397..dd7e143 100644 --- a/spec/shrimp/middleware_spec.rb +++ b/spec/shrimp/middleware_spec.rb @@ -1,31 +1,16 @@ require 'spec_helper' -def app - Rack::Lint.new(@app) -end - -def options - { :margin => "1cm", - :out_path => Dir.tmpdir, - :polling_offset => 10, - :polling_interval => 1, - :cache_ttl => 3600, - :request_timeout => 1 } -end - -def mock_app(options = { }, conditions = { }) - main_app = lambda { |env| - headers = { 'Content-Type' => "text/html" } - [200, headers, ['Hello world!']] - } - - @middleware = Shrimp::Middleware.new(main_app, options, conditions) - @app = Rack::Session::Cookie.new(@middleware, :key => 'rack.session') +shared_context Shrimp::Middleware do + def mock_app(options = { }, conditions = { }) + @middleware = Shrimp::Middleware.new(main_app, options, conditions) + @app = Rack::Session::Cookie.new(@middleware, :key => 'rack.session') + end end - describe Shrimp::Middleware do - before { mock_app(options) } + include_context Shrimp::Middleware + + before { mock_app(middleware_options) } subject { @middleware } context "matching pdf" do @@ -49,7 +34,7 @@ def mock_app(options = { }, conditions = { }) it "should set render_to to out_path" do get '/test.pdf' - @middleware.send(:render_to).should match (Regexp.new("^#{options[:out_path]}")) + @middleware.send(:render_to).should match (Regexp.new("^#{middleware_options[:out_path]}")) end it "should return 504 on timeout" do @@ -109,8 +94,10 @@ def mock_app(options = { }, conditions = { }) end describe Shrimp::Middleware, "Conditions" do + include_context Shrimp::Middleware + context "only" do - before { mock_app(options, :only => [%r[^/invoice], %r[^/public]]) } + before { mock_app(middleware_options, :only => [%r[^/invoice], %r[^/public]]) } it "render pdf for set only option" do get '/invoice/test.pdf' @middleware.send(:render_as_pdf?).should be true @@ -128,7 +115,7 @@ def mock_app(options = { }, conditions = { }) end context "except" do - before { mock_app(options, :except => %w(/secret)) } + before { mock_app(middleware_options, :except => %w(/secret)) } it "render pdf for set only option" do get '/invoice/test.pdf' @middleware.send(:render_as_pdf?).should be true diff --git a/spec/spec_helper.rb b/spec/spec_helper.rb index 3770b68..1746b7f 100644 --- a/spec/spec_helper.rb +++ b/spec/spec_helper.rb @@ -17,3 +17,26 @@ def valid_pdf?(io) io[0...4] == "%PDF" || File.open(io).read[0...4] == "%PDF" end end + +# Used by rack-test when we call get +def app + Rack::Lint.new(@app) +end + +def main_app + lambda { |env| + headers = { 'Content-Type' => "text/html" } + [200, headers, ['Hello world!']] + } +end + +def middleware_options + { + :margin => "1cm", + :out_path => Dir.tmpdir, + :polling_offset => 10, + :polling_interval => 1, + :cache_ttl => 3600, + :request_timeout => 1 + } +end From 0e0d6dbb68b0c607c0655ed19e5a481a5d3fa1c1 Mon Sep 17 00:00:00 2001 From: Tyler Rick Date: Wed, 20 Nov 2013 16:53:09 -0800 Subject: [PATCH 17/39] Checking start_with? is simpler than matching a Regexp. --- spec/shrimp/middleware_spec.rb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/spec/shrimp/middleware_spec.rb b/spec/shrimp/middleware_spec.rb index dd7e143..2f4dfb8 100644 --- a/spec/shrimp/middleware_spec.rb +++ b/spec/shrimp/middleware_spec.rb @@ -34,7 +34,7 @@ def mock_app(options = { }, conditions = { }) it "should set render_to to out_path" do get '/test.pdf' - @middleware.send(:render_to).should match (Regexp.new("^#{middleware_options[:out_path]}")) + @middleware.send(:render_to).should start_with middleware_options[:out_path] end it "should return 504 on timeout" do From 04c84b769cc63f3a870207bfddb16c7040112868 Mon Sep 17 00:00:00 2001 From: Tyler Rick Date: Wed, 20 Nov 2013 15:42:11 -0800 Subject: [PATCH 18/39] Set the tmpdir to a random, temporary dir in the tests instead of to Dir.tmpdir (which is a fixed path like '/tmp'). This prevents intermittent test failures like this one that are caused by the file already existing because it was leftover from the previous test run: 1) Shrimp::Middleware matching pdf should return 503 the first time Failure/Error: last_response.status.should eq 503 expected: 503 got: 200 --- spec/shrimp/phantom_spec.rb | 18 +++++++++--------- spec/spec_helper.rb | 10 +++++++++- 2 files changed, 18 insertions(+), 10 deletions(-) diff --git a/spec/shrimp/phantom_spec.rb b/spec/shrimp/phantom_spec.rb index 5979420..d4cd99d 100644 --- a/spec/shrimp/phantom_spec.rb +++ b/spec/shrimp/phantom_spec.rb @@ -13,16 +13,16 @@ end it "should initialize attributes" do - phantom = Shrimp::Phantom.new("file://#{test_file}", { :margin => "2cm" }, { }, "#{Dir.tmpdir}/test.pdf") + phantom = Shrimp::Phantom.new("file://#{test_file}", { :margin => "2cm" }, { }, "#{tmpdir}/test.pdf") phantom.source.to_s.should eq "file://#{test_file}" phantom.options[:margin].should eq "2cm" - phantom.outfile.should eq "#{Dir.tmpdir}/test.pdf" + phantom.outfile.should eq "#{tmpdir}/test.pdf" end it "should render a pdf file" do phantom = Shrimp::Phantom.new("file://#{test_file}") - phantom.to_pdf("#{Dir.tmpdir}/test.pdf").should eq "#{Dir.tmpdir}/test.pdf" - phantom.result.should start_with "rendered to: #{Dir.tmpdir}/test.pdf" + phantom.to_pdf("#{tmpdir}/test.pdf").should eq "#{tmpdir}/test.pdf" + phantom.result.should start_with "rendered to: #{tmpdir}/test.pdf" end it "should accept a local file url" do @@ -37,7 +37,7 @@ describe '#cmd' do it "should generate the correct cmd" do - phantom = Shrimp::Phantom.new("file://#{test_file}", { :margin => "2cm" }, { }, "#{Dir.tmpdir}/test.pdf") + phantom = Shrimp::Phantom.new("file://#{test_file}", { :margin => "2cm" }, { }, "#{tmpdir}/test.pdf") phantom.cmd.should include "test.pdf A4 1 2cm portrait" phantom.cmd.should include "file://#{test_file}" phantom.cmd.should include "lib/shrimp/rasterize.js" @@ -56,7 +56,7 @@ context "rendering to a file" do before(:all) do - phantom = Shrimp::Phantom.new("file://#{test_file}", { :margin => "2cm" }, { }, "#{Dir.tmpdir}/test.pdf") + phantom = Shrimp::Phantom.new("file://#{test_file}", { :margin => "2cm" }, { }, "#{tmpdir}/test.pdf") @result = phantom.to_file end @@ -72,12 +72,12 @@ context "rendering to a pdf" do before(:all) do @phantom = Shrimp::Phantom.new("file://#{test_file}", { :margin => "2cm" }, { }) - @result = @phantom.to_pdf("#{Dir.tmpdir}/test.pdf") + @result = @phantom.to_pdf("#{tmpdir}/test.pdf") end it "should return a path to pdf" do @result.should be_a String - @result.should eq "#{Dir.tmpdir}/test.pdf" + @result.should eq "#{tmpdir}/test.pdf" end it "should be a valid pdf" do @@ -88,7 +88,7 @@ context "rendering to a String" do before(:all) do phantom = Shrimp::Phantom.new("file://#{test_file}", { :margin => "2cm" }, { }) - @result = phantom.to_string("#{Dir.tmpdir}/test.pdf") + @result = phantom.to_string("#{tmpdir}/test.pdf") end it "should return the File IO String" do diff --git a/spec/spec_helper.rb b/spec/spec_helper.rb index 1746b7f..f435890 100644 --- a/spec/spec_helper.rb +++ b/spec/spec_helper.rb @@ -5,6 +5,14 @@ include Rack::Test::Methods end +Shrimp.configure do |config| + config.tmpdir = Dir.mktmpdir('shrimp') +end + +def tmpdir + Shrimp.configuration.default_options[:tmpdir] +end + def test_file File.expand_path('../shrimp/test_file.html', __FILE__) end @@ -33,7 +41,7 @@ def main_app def middleware_options { :margin => "1cm", - :out_path => Dir.tmpdir, + :out_path => tmpdir, :polling_offset => 10, :polling_interval => 1, :cache_ttl => 3600, From 9719587b7e48cef7caa6baa54d3b6c1d4afb6500 Mon Sep 17 00:00:00 2001 From: Tyler Rick Date: Wed, 20 Nov 2013 15:55:17 -0800 Subject: [PATCH 19/39] Add matching reader methods to Configuration for each option that already has a writer method. Rename the misleading variable name @default_options to just @options, because the writer methods (such as margin=) actually *modify* this hash, so they are not just the default values but rather the current values. Replace attr_accessor :default_options with a read-only to_h method for when you need to convert the Configuration option to a Hash. Add a Shrimp.config alias for Shrimp.configuration. --- lib/shrimp/configuration.rb | 58 ++++++++++++++++++++----------------- lib/shrimp/middleware.rb | 2 +- lib/shrimp/phantom.rb | 8 ++--- spec/spec_helper.rb | 2 +- 4 files changed, 38 insertions(+), 32 deletions(-) diff --git a/lib/shrimp/configuration.rb b/lib/shrimp/configuration.rb index b3107aa..a96f681 100644 --- a/lib/shrimp/configuration.rb +++ b/lib/shrimp/configuration.rb @@ -1,38 +1,51 @@ require 'tmpdir' module Shrimp class Configuration - attr_accessor :default_options - attr_writer :phantomjs + def initialize + @options = { + :format => 'A4', + :margin => '1cm', + :zoom => 1, + :orientation => 'portrait', + :tmpdir => Dir.tmpdir, + :rendering_timeout => 90000, + :rendering_time => 1000, + :command_config_file => File.expand_path('../config.json', __FILE__), + :viewport_width => 600, + :viewport_height => 600, + :debug => false + } + end + + def to_h + @options + end [:format, :margin, :zoom, :orientation, :tmpdir, :rendering_timeout, :rendering_time, :command_config_file, :viewport_width, :viewport_height, :debug].each do |m| define_method("#{m}=") do |val| - @default_options[m]=val + @options[m] = val end - end - def initialize - @default_options = { - :format => 'A4', - :margin => '1cm', - :zoom => 1, - :orientation => 'portrait', - :tmpdir => Dir.tmpdir, - :rendering_timeout => 90000, - :rendering_time => 1000, - :command_config_file => File.expand_path('../config.json', __FILE__), - :viewport_width => 600, - :viewport_height => 600, - :debug => false - } + define_method("#{m}") do + @options[m] + end end def phantomjs @phantomjs ||= (defined?(Bundler::GemfileError) ? `bundle exec which phantomjs` : `which phantomjs`).chomp end + attr_writer :phantomjs end class << self - attr_accessor :configuration + def configuration + @configuration ||= Configuration.new + end + alias_method :config, :configuration + + def configure + yield(configuration) + end end # Configure Phantomjs someplace sensible, @@ -44,11 +57,4 @@ class << self # config.format = 'Letter' # end - def self.configuration - @configuration ||= Configuration.new - end - - def self.configure - yield(configuration) - end end diff --git a/lib/shrimp/middleware.rb b/lib/shrimp/middleware.rb index e7b754e..d3ea143 100644 --- a/lib/shrimp/middleware.rb +++ b/lib/shrimp/middleware.rb @@ -44,7 +44,7 @@ def render_as_pdf(env) # Private: start phantom rendering in a separate process def fire_phantom - puts %(#{self.class}: Converting web page at #{(html_url).inspect} into a PDF ...) if Shrimp.configuration.default_options[:debug] + puts %(#{self.class}: Converting web page at #{(html_url).inspect} into a PDF ...) if Shrimp.config.debug Process::detach fork { Phantom.new(html_url, @options, @request.cookies).to_pdf(render_to) } end diff --git a/lib/shrimp/phantom.rb b/lib/shrimp/phantom.rb index 8aa7335..1b88850 100644 --- a/lib/shrimp/phantom.rb +++ b/lib/shrimp/phantom.rb @@ -5,7 +5,7 @@ module Shrimp class NoExecutableError < StandardError def initialize - msg = "No phantomjs executable found at #{Shrimp.configuration.phantomjs}\n" + msg = "No phantomjs executable found at #{Shrimp.config.phantomjs}\n" msg << ">> Please install phantomjs - http://phantomjs.org/download.html" super(msg) end @@ -66,7 +66,7 @@ def cmd_array viewport_width, viewport_height = options[:viewport_width], options[:viewport_height] @outfile ||= "#{options[:tmpdir]}/#{Digest::MD5.hexdigest((Time.now.to_i + rand(9001)).to_s)}.pdf" command_config_file = "--config=#{options[:command_config_file]}" - [Shrimp.configuration.phantomjs, command_config_file, SCRIPT_FILE, @source.to_s, @outfile, format, zoom, margin, orientation, cookie_file, rendering_time, timeout, viewport_width, viewport_height ].map(&:to_s) + [Shrimp.config.phantomjs, command_config_file, SCRIPT_FILE, @source.to_s, @outfile, format, zoom, margin, orientation, cookie_file, rendering_time, timeout, viewport_width, viewport_height ].map(&:to_s) end # Public: initializes a new Phantom Object @@ -83,10 +83,10 @@ def cmd_array # Returns self def initialize(url_or_file, options = { }, cookies={ }, outfile = nil) @source = Source.new(url_or_file) - @options = Shrimp.configuration.default_options.merge(options) + @options = Shrimp.config.to_h.merge(options) @cookies = cookies @outfile = File.expand_path(outfile) if outfile - raise NoExecutableError.new unless File.exists?(Shrimp.configuration.phantomjs) + raise NoExecutableError.new unless File.exists?(Shrimp.config.phantomjs) end # Public: renders to pdf diff --git a/spec/spec_helper.rb b/spec/spec_helper.rb index f435890..d7322f9 100644 --- a/spec/spec_helper.rb +++ b/spec/spec_helper.rb @@ -10,7 +10,7 @@ end def tmpdir - Shrimp.configuration.default_options[:tmpdir] + Shrimp.config.tmpdir end def test_file From f27c16cc11ede9e483e4d41d7df709dfb325b03f Mon Sep 17 00:00:00 2001 From: Tyler Rick Date: Wed, 20 Nov 2013 22:05:39 -0800 Subject: [PATCH 20/39] Change the default value for the tmpdir option to use mktmpdir (a subdirectory within /tmp like /tmp/shrimp20131120-28880-1bll5ur) instead of just saving everything directly under /tmp (Dir.tmpdir). This way you can easily list or delete all shrimp-generated temp files with something like: rm -r /tmp/shrimp* --- README.md | 2 +- lib/shrimp/configuration.rb | 2 +- spec/spec_helper.rb | 1 - 3 files changed, 2 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index 3eb42f7..6a4064a 100644 --- a/README.md +++ b/README.md @@ -63,7 +63,7 @@ Shrimp.configure do |config| config.orientation = 'portrait' # The directory where temporary files are stored, including the generated PDF files. - config.tmpdir = Dir.tmpdir + config.tmpdir = Dir.mktmpdir('shrimp'), # How long to wait (in ms) for PhantomJS to load the web page before saving it to a file. # Increase this if you need to render very complex pages. diff --git a/lib/shrimp/configuration.rb b/lib/shrimp/configuration.rb index a96f681..36efd57 100644 --- a/lib/shrimp/configuration.rb +++ b/lib/shrimp/configuration.rb @@ -7,7 +7,7 @@ def initialize :margin => '1cm', :zoom => 1, :orientation => 'portrait', - :tmpdir => Dir.tmpdir, + :tmpdir => Dir.mktmpdir('shrimp'), :rendering_timeout => 90000, :rendering_time => 1000, :command_config_file => File.expand_path('../config.json', __FILE__), diff --git a/spec/spec_helper.rb b/spec/spec_helper.rb index d7322f9..73f63de 100644 --- a/spec/spec_helper.rb +++ b/spec/spec_helper.rb @@ -6,7 +6,6 @@ end Shrimp.configure do |config| - config.tmpdir = Dir.mktmpdir('shrimp') end def tmpdir From da7fb9609e484fb4a661a3d260dc681ee5bda8d0 Mon Sep 17 00:00:00 2001 From: Tyler Rick Date: Wed, 20 Nov 2013 16:36:00 -0800 Subject: [PATCH 21/39] Rename fire_phantom to render_pdf to be more consistent with other methods like render_as_pdf and already_rendered? that use the verb "render". --- lib/shrimp/middleware.rb | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/lib/shrimp/middleware.rb b/lib/shrimp/middleware.rb index d3ea143..03de42d 100644 --- a/lib/shrimp/middleware.rb +++ b/lib/shrimp/middleware.rb @@ -34,7 +34,7 @@ def render_as_pdf(env) else File.delete(render_to) if already_rendered? set_rendering_flag - fire_phantom + render_pdf reload_response(@options[:polling_offset]) end end @@ -43,7 +43,7 @@ def render_as_pdf(env) private # Private: start phantom rendering in a separate process - def fire_phantom + def render_pdf puts %(#{self.class}: Converting web page at #{(html_url).inspect} into a PDF ...) if Shrimp.config.debug Process::detach fork { Phantom.new(html_url, @options, @request.cookies).to_pdf(render_to) } end From 7e06bf6353df309459f65f545ea141f35922c122 Mon Sep 17 00:00:00 2001 From: Tyler Rick Date: Wed, 20 Nov 2013 20:22:14 -0800 Subject: [PATCH 22/39] Moved the render_pdf method to BaseMiddleware so that it can be reused by both the current middleware and other middlewares. Added log_render_pdf_start and log_render_pdf_completion, which give useful debugging information if config.debug is true. --- lib/shrimp/base_middleware.rb | 24 ++++++++++++++++++++++++ lib/shrimp/middleware.rb | 12 +++++------- 2 files changed, 29 insertions(+), 7 deletions(-) diff --git a/lib/shrimp/base_middleware.rb b/lib/shrimp/base_middleware.rb index 98749bc..148e41b 100644 --- a/lib/shrimp/base_middleware.rb +++ b/lib/shrimp/base_middleware.rb @@ -49,6 +49,30 @@ def html_url private + def render_pdf + log_render_pdf_start + Phantom.new(html_url, @options, @request.cookies).tap do |phantom| + @phantom = phantom + phantom.to_pdf(render_to) + log_render_pdf_completion + end + end + + def log_render_pdf_start + return unless Shrimp.config.debug + puts %(#{self.class}: Converting web page at #{(html_url).inspect} into a PDF ...) + end + + def log_render_pdf_completion + return unless Shrimp.config.debug + puts "#{self.class}: Finished converting web page at #{(html_url).inspect} into a PDF" + if @phantom.error? + puts "#{self.class}: Error: #{@phantom.error}" + else + puts "#{self.class}: Saved PDF to #{render_to}" + end + end + def pdf_body file = File.open(render_to, "rb") body = file.read diff --git a/lib/shrimp/middleware.rb b/lib/shrimp/middleware.rb index 03de42d..fc7b101 100644 --- a/lib/shrimp/middleware.rb +++ b/lib/shrimp/middleware.rb @@ -34,7 +34,11 @@ def render_as_pdf(env) else File.delete(render_to) if already_rendered? set_rendering_flag - render_pdf + # Start PhantomJS rendering in a separate process and then immediately render a web page + # that continuously reloads (polls) until the rendering is complete. + Process::detach fork { + render_pdf + } reload_response(@options[:polling_offset]) end end @@ -42,12 +46,6 @@ def render_as_pdf(env) private - # Private: start phantom rendering in a separate process - def render_pdf - puts %(#{self.class}: Converting web page at #{(html_url).inspect} into a PDF ...) if Shrimp.config.debug - Process::detach fork { Phantom.new(html_url, @options, @request.cookies).to_pdf(render_to) } - end - def already_rendered? File.exists?(render_to) end From 521923180db25ee08b83da20fd0a0e28ace5f593 Mon Sep 17 00:00:00 2001 From: Tyler Rick Date: Wed, 20 Nov 2013 20:40:54 -0800 Subject: [PATCH 23/39] Merge the options that were passed in to the middleware with the options configured in Shrimp.config, so that you can override an option from Shrimp.config by passing in a different value to the middleware. --- lib/shrimp/base_middleware.rb | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/lib/shrimp/base_middleware.rb b/lib/shrimp/base_middleware.rb index 148e41b..5999b67 100644 --- a/lib/shrimp/base_middleware.rb +++ b/lib/shrimp/base_middleware.rb @@ -2,7 +2,7 @@ module Shrimp class BaseMiddleware def initialize(app, options = { }, conditions = { }) @app = app - @options = options + @options = Shrimp.config.to_h.merge(options) @conditions = conditions end @@ -59,12 +59,12 @@ def render_pdf end def log_render_pdf_start - return unless Shrimp.config.debug + return unless @options[:debug] puts %(#{self.class}: Converting web page at #{(html_url).inspect} into a PDF ...) end def log_render_pdf_completion - return unless Shrimp.config.debug + return unless @options[:debug] puts "#{self.class}: Finished converting web page at #{(html_url).inspect} into a PDF" if @phantom.error? puts "#{self.class}: Error: #{@phantom.error}" From 926d72531f67559c17eabe970c6bc2c346c026f9 Mon Sep 17 00:00:00 2001 From: Tyler Rick Date: Wed, 20 Nov 2013 17:06:34 -0800 Subject: [PATCH 24/39] Remove duplication between run and run! methods. Change run! to simply call run and then check the @error status afterwards. --- lib/shrimp/phantom.rb | 13 ++++--------- 1 file changed, 4 insertions(+), 9 deletions(-) diff --git a/lib/shrimp/phantom.rb b/lib/shrimp/phantom.rb index 1b88850..b8a1c4b 100644 --- a/lib/shrimp/phantom.rb +++ b/lib/shrimp/phantom.rb @@ -30,7 +30,7 @@ class Phantom # Public: Runs the phantomjs binary # - # Returns the stdout output of phantomjs + # Returns the stdout output from phantomjs def run @error = nil puts "Running command: #{cmd}" if options[:debug] @@ -43,14 +43,9 @@ def run end def run! - @error = nil - @result = `#{cmd}` - unless $?.exitstatus == 0 - @error = @result - @result = nil - raise RenderingError.new(@error) - end - @result + run.tap { + raise RenderingError.new(@error) if @error + } end # Public: Returns the arguments for the PhantomJS rasterize command as a shell-escaped string From fc4830777cfd2d13cee337003651a00de8221215 Mon Sep 17 00:00:00 2001 From: Tyler Rick Date: Wed, 20 Nov 2013 17:42:05 -0800 Subject: [PATCH 25/39] Add page_load_error and page_load_status_code methods to Phantom. --- lib/shrimp/phantom.rb | 23 ++++++++++++++++++++++- spec/shrimp/phantom_spec.rb | 32 ++++++++++++++++++++------------ 2 files changed, 42 insertions(+), 13 deletions(-) diff --git a/lib/shrimp/phantom.rb b/lib/shrimp/phantom.rb index b8a1c4b..c31fadc 100644 --- a/lib/shrimp/phantom.rb +++ b/lib/shrimp/phantom.rb @@ -44,10 +44,31 @@ def run def run! run.tap { - raise RenderingError.new(@error) if @error + raise RenderingError.new(error) if error? } end + def error? + !!error + end + + def match_page_load_error + error.to_s.match /^(null|\S+) Unable to load the address!$/ + end + def page_load_error? + !!match_page_load_error + end + def page_load_status_code + if match = match_page_load_error + status_code = match[1].to_s + if status_code =~ /\A\d+\Z/ + status_code.to_i + else + status_code + end + end + end + # Public: Returns the arguments for the PhantomJS rasterize command as a shell-escaped string def cmd Shellwords.join cmd_array diff --git a/spec/shrimp/phantom_spec.rb b/spec/shrimp/phantom_spec.rb index d4cd99d..2230297 100644 --- a/spec/shrimp/phantom_spec.rb +++ b/spec/shrimp/phantom_spec.rb @@ -100,17 +100,25 @@ end end - context "Error" do - it "should return result nil" do - phantom = Shrimp::Phantom.new("file://foo/bar") - @result = phantom.run - @result.should be_nil - end - - it "should be unable to load the address" do - phantom = Shrimp::Phantom.new("file:///foo/bar") - phantom.run - phantom.error.should include "Unable to load the address" + context "Errors" do + describe "'Unable to load the address' error" do + before { @result = phantom.run } + + context 'an invalid http: address' do + subject(:phantom) { Shrimp::Phantom.new("http://example.com/foo/bar") } + it { @result.should be_nil } + its(:error) { should include "Unable to load the address" } + its(:page_load_error?) { should eq true } + its(:page_load_status_code) { should eq 404 } + end + + context 'an invalid file: address' do + subject(:phantom) { Shrimp::Phantom.new("file:///foo/bar") } + it { @result.should be_nil } + its(:error) { should include "Unable to load the address" } + its(:page_load_error?) { should eq true } + its(:page_load_status_code) { should eq 'null' } + end end it "should be unable to copy file" do @@ -120,7 +128,7 @@ end end - context "Error Bang!" do + context "Errors (using bang methods)" do it "should be unable to load the address" do phantom = Shrimp::Phantom.new("file:///foo/bar") From 050410d3ecde0489ab121a3055465ee7247ccbd8 Mon Sep 17 00:00:00 2001 From: Tyler Rick Date: Wed, 20 Nov 2013 19:38:00 -0800 Subject: [PATCH 26/39] Add full response details to what rasterize.js outputs so that we can get the redirect URL, etc. Add response and redirect_to methods to Phantom to make it easy to get at that data. Add a .chomp to @result to get rid of the extra newline at the end. --- lib/shrimp/phantom.rb | 19 +++++++++++++++++-- lib/shrimp/rasterize.js | 1 + shrimp.gemspec | 1 + spec/shrimp/phantom_spec.rb | 29 ++++++++++++++++++++++++++--- spec/spec_helper.rb | 28 ++++++++++++++++++++++++++++ 5 files changed, 73 insertions(+), 5 deletions(-) diff --git a/lib/shrimp/phantom.rb b/lib/shrimp/phantom.rb index c31fadc..264f845 100644 --- a/lib/shrimp/phantom.rb +++ b/lib/shrimp/phantom.rb @@ -25,7 +25,7 @@ def initialize(msg = nil) class Phantom attr_accessor :source, :configuration, :outfile - attr_reader :options, :cookies, :result, :error + attr_reader :options, :cookies, :result, :error, :response SCRIPT_FILE = File.expand_path('../rasterize.js', __FILE__) # Public: Runs the phantomjs binary @@ -35,8 +35,12 @@ def run @error = nil puts "Running command: #{cmd}" if options[:debug] @result = `#{cmd}` + if match = @result.match(response_line_regexp) + @response = JSON.parse match[1] + @result.gsub! response_line_regexp, '' + end unless $?.exitstatus == 0 - @error = @result + @error = @result.chomp @result = nil end @result @@ -48,6 +52,17 @@ def run! } end + def response_line_regexp + /^response: (.*)$\n?/ + end + def redirect? + page_load_status_code == 302 + end + def redirect_to + return unless redirect? + response['redirectURL'] if response + end + def error? !!error end diff --git a/lib/shrimp/rasterize.js b/lib/shrimp/rasterize.js index f833e42..239a295 100644 --- a/lib/shrimp/rasterize.js +++ b/lib/shrimp/rasterize.js @@ -46,6 +46,7 @@ if (system.args.length < 3 || system.args.length > 12) { // determine the statusCode page.onResourceReceived = function (resource) { if (resource.url == address) { + console.log('response: ' + JSON.stringify(resource)) statusCode = resource.status; } }; diff --git a/shrimp.gemspec b/shrimp.gemspec index 755850d..5c08e87 100644 --- a/shrimp.gemspec +++ b/shrimp.gemspec @@ -24,4 +24,5 @@ Gem::Specification.new do |gem| gem.add_development_dependency(%q, [">= 2.2.0"]) gem.add_development_dependency(%q, [">= 0.5.6"]) gem.add_development_dependency(%q, ["= 1.4.1"]) + gem.add_development_dependency(%q) end diff --git a/spec/shrimp/phantom_spec.rb b/spec/shrimp/phantom_spec.rb index 2230297..5fdc9ee 100644 --- a/spec/shrimp/phantom_spec.rb +++ b/spec/shrimp/phantom_spec.rb @@ -22,7 +22,7 @@ it "should render a pdf file" do phantom = Shrimp::Phantom.new("file://#{test_file}") phantom.to_pdf("#{tmpdir}/test.pdf").should eq "#{tmpdir}/test.pdf" - phantom.result.should start_with "rendered to: #{tmpdir}/test.pdf" + phantom.result.should include "rendered to: #{tmpdir}/test.pdf" end it "should accept a local file url" do @@ -107,15 +107,38 @@ context 'an invalid http: address' do subject(:phantom) { Shrimp::Phantom.new("http://example.com/foo/bar") } it { @result.should be_nil } - its(:error) { should include "Unable to load the address" } + its(:error) { should eq "404 Unable to load the address!" } its(:page_load_error?) { should eq true } its(:page_load_status_code) { should eq 404 } end + context 'an http: response that redirects' do + around(:each) do |example| + with_local_server do |server| + server.mount_proc '/' do |request, response| + response.body = 'Home' + raise WEBrick::HTTPStatus::OK + end + server.mount_proc '/redirect_me' do |request, response| + response['Location'] = '/' + raise WEBrick::HTTPStatus::Found + end + example.run + end + end + subject(:phantom) { Shrimp::Phantom.new("http://#{local_server_host}/redirect_me") } + it { @result.should be_nil } + its(:error) { should eq "302 Unable to load the address!" } + its(:page_load_error?) { should eq true } + its(:page_load_status_code) { should eq 302 } + its('response.keys') { should include 'redirectURL' } + its(:redirect_to) { should eq "http://#{local_server_host}/" } + end + context 'an invalid file: address' do subject(:phantom) { Shrimp::Phantom.new("file:///foo/bar") } it { @result.should be_nil } - its(:error) { should include "Unable to load the address" } + its(:error) { should eq "null Unable to load the address!" } its(:page_load_error?) { should eq true } its(:page_load_status_code) { should eq 'null' } end diff --git a/spec/spec_helper.rb b/spec/spec_helper.rb index 73f63de..81adc6e 100644 --- a/spec/spec_helper.rb +++ b/spec/spec_helper.rb @@ -1,5 +1,6 @@ require 'rack/test' require 'shrimp' +require 'webrick' RSpec.configure do |config| include Rack::Test::Methods @@ -47,3 +48,30 @@ def middleware_options :request_timeout => 1 } end + +def local_server_port + 8800 +end +def local_server_host + "localhost:#{local_server_port}" +end + +def with_local_server + webrick_options = { + :Port => local_server_port, + :AccessLog => [], + :Logger => WEBrick::Log::new(RUBY_PLATFORM =~ /mswin|mingw/ ? 'NUL:' : '/dev/null', 7) + } + begin + # The "TCPServer Error: Address already in use - bind(2)" warning here appears to be bogus, + # because it occurs even the first time we start the server and nothing else is bound to the + # port. + server = WEBrick::HTTPServer.new(webrick_options) + trap("INT") { server.shutdown } + Thread.new { server.start } + yield server + server.shutdown + ensure + server.shutdown if server + end +end From 89d7176525f0809af751e2d8550afb2950291651 Mon Sep 17 00:00:00 2001 From: Tyler Rick Date: Wed, 20 Nov 2013 12:37:50 -0800 Subject: [PATCH 27/39] Add a simpler middleware, SynchronousMiddleware, that runs the phantomjs command synchronously and doesn't use any of the fancy polling that Shrimp::Middleware uses. It also avoids showing the user an ugly "Preparing pdf..." page, which doesn't even go away after the PDF file has been downloaded. The caveat is that you must make sure that your web server is able to process any number of concurrent requests (or at least enough concurrent requests that you aren't worried). If your web server reaches its concurrency limit and then receives a request for a .pdf resource, then when phantomjs attempts to request the .html version of the resource, this new HTTP request will have to wait until the server finishes some other request and is able to serve this new HTTP request. That should generally be okay as long as most of the other requests are quick enough. But in the worst case, where each of the requests currently being served is a request for a .pdf resource (which must generate an additional request for the .html request), then it will create a deadlock, because request A blocks while waiting for request B to complete, while at the same time request B can't complete because it is blocking while waiting for request A to finish (in the simple case where the max concurrency is 2). In Rails, if you are using a multi-threaded web server such as puma, then you *must* set config.allow_concurrency = true to allow concurrent requests. Otherwise, the Rack::Lock middleware will prevent the 2nd request (that PhantomJS generates) from being processed until the 1st request (for the .pdf file) has completed, which immediately creates a deadlock! --- lib/shrimp.rb | 1 + lib/shrimp/base_middleware.rb | 12 +-- lib/shrimp/synchronous_middleware.rb | 30 ++++++ spec/shrimp/synchronous_middleware_spec.rb | 112 +++++++++++++++++++++ 4 files changed, 149 insertions(+), 6 deletions(-) create mode 100644 lib/shrimp/synchronous_middleware.rb create mode 100644 spec/shrimp/synchronous_middleware_spec.rb diff --git a/lib/shrimp.rb b/lib/shrimp.rb index f12222d..3f92c44 100644 --- a/lib/shrimp.rb +++ b/lib/shrimp.rb @@ -2,4 +2,5 @@ require 'shrimp/source' require 'shrimp/phantom' require 'shrimp/middleware' +require 'shrimp/synchronous_middleware' require 'shrimp/configuration' diff --git a/lib/shrimp/base_middleware.rb b/lib/shrimp/base_middleware.rb index 5999b67..84cf1ec 100644 --- a/lib/shrimp/base_middleware.rb +++ b/lib/shrimp/base_middleware.rb @@ -42,6 +42,12 @@ def call(env) end end + def render_to + file_name = Digest::MD5.hexdigest(@request.path) + ".pdf" + file_path = @options[:out_path] + "#{file_path}/#{file_name}" + end + # The URL for the HTML-formatted web page that we are converting into a PDF. def html_url @request.url.sub(%r<\.pdf(\?|$)>, '\1') @@ -87,12 +93,6 @@ def pdf_headers(body) end end - def render_to - file_name = Digest::MD5.hexdigest(@request.path) + ".pdf" - file_path = @options[:out_path] - "#{file_path}/#{file_name}" - end - def concat(accepts, type) (accepts || '').split(',').unshift(type).compact.join(',') end diff --git a/lib/shrimp/synchronous_middleware.rb b/lib/shrimp/synchronous_middleware.rb new file mode 100644 index 0000000..6897fea --- /dev/null +++ b/lib/shrimp/synchronous_middleware.rb @@ -0,0 +1,30 @@ +require 'shrimp/base_middleware' + +module Shrimp + class SynchronousMiddleware < BaseMiddleware + def render_as_pdf(env) + # Start PhantomJS rendering in the same process (synchronously) and wait until it completes. + render_pdf + return phantomjs_error_response if phantom.error? + + body = pdf_body() + headers = pdf_headers(body) + [200, headers, [body]] + end + + attr_reader :phantom + + private + + def phantomjs_error_response + headers = {'Content-Type' => 'text/html'} + if phantom.page_load_error? + status_code = phantom.page_load_status_code + headers['Location'] = phantom.redirect_to if phantom.redirect? + else + status_code = 500 + end + [status_code, headers, [phantom.error]] + end + end +end diff --git a/spec/shrimp/synchronous_middleware_spec.rb b/spec/shrimp/synchronous_middleware_spec.rb new file mode 100644 index 0000000..ffcff4b --- /dev/null +++ b/spec/shrimp/synchronous_middleware_spec.rb @@ -0,0 +1,112 @@ +require 'spec_helper' + +shared_context Shrimp::SynchronousMiddleware do + def mock_app(options = { }, conditions = { }) + @middleware = Shrimp::SynchronousMiddleware.new(main_app, options, conditions) + @app = Rack::Session::Cookie.new(@middleware, :key => 'rack.session') + end +end + +describe Shrimp::SynchronousMiddleware do + include_context Shrimp::SynchronousMiddleware + + before { mock_app(middleware_options) } + subject { @middleware } + + context "matching pdf" do + describe "requesting a simple path" do + before { get '/test.pdf' } + its(:html_url) { should eq 'http://example.org/test' } + its(:render_as_pdf?) { should be true } + it { @middleware.send(:render_to).should start_with middleware_options[:out_path] } + it "should return a 404 status because http://example.org/test does not exist" do + last_response.status.should eq 404 + last_response.body. should eq "404 Unable to load the address!" + @middleware.phantom.error.should eq "404 Unable to load the address!" + end + end + + describe "requesting a path with a query string" do + before { get '/test.pdf?size=10' } + its(:render_as_pdf?) { should be true } + its(:html_url) { should eq 'http://example.org/test?size=10' } + end + + describe "requesting a simple path (and we stub html_url to a file url)" do + before { @middleware.stub(:html_url).and_return "file://#{test_file}" } + before { get '/test.pdf' } + it "should return a valid pdf with 200 status" do + last_response.status.should eq 200 + last_response.headers['Content-Type'].should eq 'application/pdf' + valid_pdf?(last_response.body).should eq true + @middleware.phantom.result.should start_with "rendered to: #{@middleware.render_to}" + end + end + + context 'requesting an HTML resource that redirects' do + before { + phantom = Shrimp::Phantom.new('http://example.org/redirect_me') + phantom.should_receive(:to_pdf).and_return nil + phantom.stub :error => "302 Unable to load the address!", + :redirect_to => "http://localhost:8800/sign_in" + Shrimp::Phantom.should_receive(:new).and_return phantom + } + before { get '/redirect_me.pdf' } + it "should follow the redirect that the phantomjs request encountered" do + # This tests the phantomjs_error_response method + last_response.status.should eq 302 + last_response.headers['Content-Type'].should eq 'text/html' + last_response.headers['Location'].should eq "http://#{local_server_host}/sign_in" + @middleware.phantom.error. should eq "302 Unable to load the address!" + end + end + end + + context "not matching pdf" do + it "should skip pdf rendering" do + get 'http://www.example.org/test' + last_response.body.should include "Hello world!" + @middleware.render_as_pdf?.should be false + end + end +end + +describe Shrimp::SynchronousMiddleware, "Conditions" do + include_context Shrimp::SynchronousMiddleware + + context "only" do + before { mock_app(middleware_options, :only => [%r[^/invoice], %r[^/public]]) } + it "render pdf for set only option" do + get '/invoice/test.pdf' + @middleware.render_as_pdf?.should be true + end + + it "render pdf for set only option" do + get '/public/test.pdf' + @middleware.render_as_pdf?.should be true + end + + it "not render pdf for any other path" do + get '/secret/test.pdf' + @middleware.render_as_pdf?.should be false + end + end + + context "except" do + before { mock_app(middleware_options, :except => %w(/secret)) } + it "render pdf for set only option" do + get '/invoice/test.pdf' + @middleware.render_as_pdf?.should be true + end + + it "render pdf for set only option" do + get '/public/test.pdf' + @middleware.render_as_pdf?.should be true + end + + it "not render pdf for any other path" do + get '/secret/test.pdf' + @middleware.render_as_pdf?.should be false + end + end +end From f770cbaeee426c0c845a6680a071fd9844901abd Mon Sep 17 00:00:00 2001 From: Tyler Rick Date: Wed, 20 Nov 2013 20:30:57 -0800 Subject: [PATCH 28/39] Simple change to make the Middleware thread-safe: Only make changes to instance variables on a copy of the self. By making a copy of the middleware object, even if two web server threads started out with the same middleware object, when they call call(), a new copy is created in each thread, and only that copy is modified, so that the threads do not sharing any data. This is one of the approaches mentioned at http://railscasts.com/episodes/151-rack-middleware. --- lib/shrimp/base_middleware.rb | 8 ++++++++ lib/shrimp/configuration.rb | 5 +++-- spec/spec_helper.rb | 6 ++++++ 3 files changed, 17 insertions(+), 2 deletions(-) diff --git a/lib/shrimp/base_middleware.rb b/lib/shrimp/base_middleware.rb index 84cf1ec..3838698 100644 --- a/lib/shrimp/base_middleware.rb +++ b/lib/shrimp/base_middleware.rb @@ -34,6 +34,14 @@ def render_as_pdf? end def call(env) + if @options[:thread_safe] + dup._call(env) + else + _call(env) + end + end + + def _call(env) @request = Rack::Request.new(env) if render_as_pdf? #&& headers['Content-Type'] =~ /text\/html|application\/xhtml\+xml/ render_as_pdf(env) diff --git a/lib/shrimp/configuration.rb b/lib/shrimp/configuration.rb index 36efd57..3563c32 100644 --- a/lib/shrimp/configuration.rb +++ b/lib/shrimp/configuration.rb @@ -13,7 +13,8 @@ def initialize :command_config_file => File.expand_path('../config.json', __FILE__), :viewport_width => 600, :viewport_height => 600, - :debug => false + :debug => false, + :thread_safe => true } end @@ -21,7 +22,7 @@ def to_h @options end - [:format, :margin, :zoom, :orientation, :tmpdir, :rendering_timeout, :rendering_time, :command_config_file, :viewport_width, :viewport_height, :debug].each do |m| + [:format, :margin, :zoom, :orientation, :tmpdir, :rendering_timeout, :rendering_time, :command_config_file, :viewport_width, :viewport_height, :debug, :thread_safe].each do |m| define_method("#{m}=") do |val| @options[m] = val end diff --git a/spec/spec_helper.rb b/spec/spec_helper.rb index 81adc6e..310754c 100644 --- a/spec/spec_helper.rb +++ b/spec/spec_helper.rb @@ -7,6 +7,12 @@ end Shrimp.configure do |config| + # If we left this as the default value of true, then we couldn't check things like + # @middleware.render_as_pdf? in our tests after initiating a request with get '/test.pdf', because + # render_as_pdf? depends on @request, which doesn't get set until *after* we call call(env) with the + # request env. But when thread_safe is true, it actually prevents call(env) from changing any + # instance variables in the original object. (In the original object, @request will still be nil.) + config.thread_safe = false end def tmpdir From 7af5815a20a5ce446570cd50299651ef426d7d5f Mon Sep 17 00:00:00 2001 From: Tyler Rick Date: Wed, 27 Nov 2013 12:30:10 -0800 Subject: [PATCH 29/39] Remove a commented-out check for headers['Content-Type']. Remove unused concat method. --- lib/shrimp/base_middleware.rb | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/lib/shrimp/base_middleware.rb b/lib/shrimp/base_middleware.rb index 3838698..9387e67 100644 --- a/lib/shrimp/base_middleware.rb +++ b/lib/shrimp/base_middleware.rb @@ -43,7 +43,7 @@ def call(env) def _call(env) @request = Rack::Request.new(env) - if render_as_pdf? #&& headers['Content-Type'] =~ /text\/html|application\/xhtml\+xml/ + if render_as_pdf? render_as_pdf(env) else @app.call(env) @@ -101,9 +101,5 @@ def pdf_headers(body) end end - def concat(accepts, type) - (accepts || '').split(',').unshift(type).compact.join(',') - end - end end From 9263c9187ab7106a46d39e556e60a25ac251e339 Mon Sep 17 00:00:00 2001 From: Tyler Rick Date: Wed, 27 Nov 2013 10:16:17 -0800 Subject: [PATCH 30/39] Parse the response_headers from the response that PhantomJS receives --- lib/shrimp/phantom.rb | 5 ++++- spec/shrimp/phantom_spec.rb | 1 + 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/lib/shrimp/phantom.rb b/lib/shrimp/phantom.rb index 264f845..84c058e 100644 --- a/lib/shrimp/phantom.rb +++ b/lib/shrimp/phantom.rb @@ -25,7 +25,7 @@ def initialize(msg = nil) class Phantom attr_accessor :source, :configuration, :outfile - attr_reader :options, :cookies, :result, :error, :response + attr_reader :options, :cookies, :result, :error, :response, :response_headers SCRIPT_FILE = File.expand_path('../rasterize.js', __FILE__) # Public: Runs the phantomjs binary @@ -37,6 +37,9 @@ def run @result = `#{cmd}` if match = @result.match(response_line_regexp) @response = JSON.parse match[1] + @response_headers = @response['headers'].inject({}) {|hash, header| + hash[header['name']] = header['value']; hash + } @result.gsub! response_line_regexp, '' end unless $?.exitstatus == 0 diff --git a/spec/shrimp/phantom_spec.rb b/spec/shrimp/phantom_spec.rb index 5fdc9ee..745af48 100644 --- a/spec/shrimp/phantom_spec.rb +++ b/spec/shrimp/phantom_spec.rb @@ -132,6 +132,7 @@ its(:page_load_error?) { should eq true } its(:page_load_status_code) { should eq 302 } its('response.keys') { should include 'redirectURL' } + its('response_headers.keys') { should == ['Location', 'Server', 'Date', 'Content-Length', 'Connection'] } its(:redirect_to) { should eq "http://#{local_server_host}/" } end From bb49ec05e2b47d3edfb884b3f45a350ca7493e7d Mon Sep 17 00:00:00 2001 From: Tyler Rick Date: Wed, 27 Nov 2013 11:24:33 -0800 Subject: [PATCH 31/39] Make it possible to control which filename should be used for the downloaded PDF file, by having the HTML resource respond with a X-Pdf-Filename header containing the desired file name. (This is only possible for the SynchronousMiddleware. In the asynchronous Middleware, the code that sends the PDF file to the user doesn't have access to the Phantom object, since that object was created in the separate forked process.) --- lib/shrimp/base_middleware.rb | 23 +++++++++++++++++++++- lib/shrimp/synchronous_middleware.rb | 2 +- spec/shrimp/synchronous_middleware_spec.rb | 18 +++++++++++++++++ 3 files changed, 41 insertions(+), 2 deletions(-) diff --git a/lib/shrimp/base_middleware.rb b/lib/shrimp/base_middleware.rb index 9387e67..e385b2e 100644 --- a/lib/shrimp/base_middleware.rb +++ b/lib/shrimp/base_middleware.rb @@ -94,10 +94,31 @@ def pdf_body body end - def pdf_headers(body) + def default_pdf_options + { + :type => 'application/octet-stream'.freeze, + :disposition => 'attachment'.freeze, + } + end + + def pdf_headers(body, options = {}) { }.tap do |headers| headers["Content-Length"] = (body.respond_to?(:bytesize) ? body.bytesize : body.size).to_s headers["Content-Type"] = "application/pdf" + + # Based on send_file_headers! from actionpack/lib/action_controller/metal/data_streaming.rb + options = default_pdf_options.merge(@options).merge(options) + [:type, :disposition].each do |arg| + raise ArgumentError, ":#{arg} option required" if options[arg].nil? + end + + disposition = options[:disposition] + disposition += %(; filename="#{options[:filename]}") if options[:filename] + + headers.merge!( + 'Content-Disposition' => disposition, + 'Content-Transfer-Encoding' => 'binary' + ) end end diff --git a/lib/shrimp/synchronous_middleware.rb b/lib/shrimp/synchronous_middleware.rb index 6897fea..f0acd3b 100644 --- a/lib/shrimp/synchronous_middleware.rb +++ b/lib/shrimp/synchronous_middleware.rb @@ -8,7 +8,7 @@ def render_as_pdf(env) return phantomjs_error_response if phantom.error? body = pdf_body() - headers = pdf_headers(body) + headers = pdf_headers(body, filename: @phantom.response_headers['X-Pdf-Filename']) [200, headers, [body]] end diff --git a/spec/shrimp/synchronous_middleware_spec.rb b/spec/shrimp/synchronous_middleware_spec.rb index ffcff4b..1ebb139 100644 --- a/spec/shrimp/synchronous_middleware_spec.rb +++ b/spec/shrimp/synchronous_middleware_spec.rb @@ -43,6 +43,24 @@ def mock_app(options = { }, conditions = { }) end end + context 'requesting an HTML resource that sets a X-Pdf-Filename header' do + before { + @middleware.stub(:html_url).and_return "file://#{test_file}" + phantom = Shrimp::Phantom.new(@middleware.html_url) + phantom.stub :response_headers => { + 'X-Pdf-Filename' => 'Some Fancy Report Title.pdf' + } + Shrimp::Phantom.should_receive(:new).and_return phantom + } + before { get '/use_different_filename.pdf' } + it "should use the filename from the X-Pdf-Filename header" do + last_response.status.should eq 200 + last_response.headers['Content-Type'].should eq 'application/pdf' + last_response.headers['Content-Disposition'].should eq %(attachment; filename="Some Fancy Report Title.pdf") + valid_pdf?(last_response.body).should eq true + end + end + context 'requesting an HTML resource that redirects' do before { phantom = Shrimp::Phantom.new('http://example.org/redirect_me') From c3f524ff8646ab953e775a0c5b11a1db16178834 Mon Sep 17 00:00:00 2001 From: Tyler Rick Date: Tue, 5 Aug 2014 11:45:43 -0700 Subject: [PATCH 32/39] Just render directly to the output file instead of the complication of renaming from '_tmp.pdf'. This allows us to specify an output format other than '.pdf', like '.png'. --- lib/shrimp/rasterize.js | 15 +-------------- 1 file changed, 1 insertion(+), 14 deletions(-) diff --git a/lib/shrimp/rasterize.js b/lib/shrimp/rasterize.js index 239a295..5c7b40a 100644 --- a/lib/shrimp/rasterize.js +++ b/lib/shrimp/rasterize.js @@ -67,20 +67,7 @@ if (system.args.length < 3 || system.args.length > 12) { phantom.exit(1); } else { window.setTimeout(function () { - page.render(output + '_tmp.pdf'); - - if (fs.exists(output)) { - fs.remove(output); - } - - try { - fs.move(output + '_tmp.pdf', output); - } - catch (e) { - console.log(e); - phantom.exit(1); - throw e - } + page.render(output); console.log('rendered to: ' + output, new Date().getTime()); phantom.exit(); }, render_time); From 5966ccd3e42f5be7c6fa645e67f890be0cf0d515 Mon Sep 17 00:00:00 2001 From: Tyler Rick Date: Wed, 22 Oct 2014 09:42:39 -0700 Subject: [PATCH 33/39] Allow content disposition (inline or attachment) to be specified with a X-Pdf-Disposition header. --- lib/shrimp/synchronous_middleware.rb | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/lib/shrimp/synchronous_middleware.rb b/lib/shrimp/synchronous_middleware.rb index f0acd3b..8ed7775 100644 --- a/lib/shrimp/synchronous_middleware.rb +++ b/lib/shrimp/synchronous_middleware.rb @@ -8,7 +8,11 @@ def render_as_pdf(env) return phantomjs_error_response if phantom.error? body = pdf_body() - headers = pdf_headers(body, filename: @phantom.response_headers['X-Pdf-Filename']) + headers = pdf_headers(body, { + disposition: @phantom.response_headers['X-Pdf-Disposition'], + filename: @phantom.response_headers['X-Pdf-Filename'] + }.reject {|k, v| v.nil? } + ) [200, headers, [body]] end From d93b7291107bddd8743982538bcefc91caef86ec Mon Sep 17 00:00:00 2001 From: Chuck Yang Date: Tue, 20 Dec 2016 19:56:46 -0600 Subject: [PATCH 34/39] Allow page number in header and footer --- lib/shrimp/rasterize.js | 44 +++++++++++++++++++++++++++++++++++++++-- 1 file changed, 42 insertions(+), 2 deletions(-) diff --git a/lib/shrimp/rasterize.js b/lib/shrimp/rasterize.js index 5c7b40a..0f61efb 100644 --- a/lib/shrimp/rasterize.js +++ b/lib/shrimp/rasterize.js @@ -36,8 +36,10 @@ if (system.args.length < 3 || system.args.length > 12) { page.viewportSize = { width: viewport_width, height: viewport_height }; if (system.args.length > 3 && system.args[2].substr(-4) === ".pdf") { size = system.args[3].split('*'); - page.paperSize = size.length === 2 ? { width:size[0], height:size[1], margin:'0px' } - : { format:system.args[3], orientation:orientation, margin:margin }; + header = { height: '1cm', contents: phantom.callback(function(pageNum, numPages) { return ""; }) }; + footer = { height: '1cm', contents: phantom.callback(function(pageNum, numPages) { return ""; }) }; + page.paperSize = size.length === 2 ? { width:size[0], height:size[1], margin:'0px', header: header, footer: footer } + : { format:system.args[3], orientation:orientation, margin:margin, header: header, footer: footer }; } if (system.args.length > 4) { page.zoomFactor = system.args[4]; @@ -66,7 +68,45 @@ if (system.args.length < 3 || system.args.length > 12) { } phantom.exit(1); } else { + /* check whether the loaded page overwrites the header/footer setting, + i.e. whether a PhantomJSPriting object exists. Use that then instead + of our defaults above. + example: + + + + +

asdfadsf

asdfadsfycvx

+ + */ window.setTimeout(function () { + if (page.evaluate(function(){return typeof PhantomJSPrinting == "object";})) { + paperSize = page.paperSize; + paperSize.header.height = page.evaluate(function() { + return PhantomJSPrinting.header.height; + }); + paperSize.header.contents = phantom.callback(function(pageNum, numPages) { + return page.evaluate(function(pageNum, numPages){return PhantomJSPrinting.header.contents(pageNum, numPages);}, pageNum, numPages); + }); + paperSize.footer.height = page.evaluate(function() { + return PhantomJSPrinting.footer.height; + }); + paperSize.footer.contents = phantom.callback(function(pageNum, numPages) { + return page.evaluate(function(pageNum, numPages){return PhantomJSPrinting.footer.contents(pageNum, numPages);}, pageNum, numPages); + }); + page.paperSize = paperSize; + } page.render(output); console.log('rendered to: ' + output, new Date().getTime()); phantom.exit(); From 87a97a9a0da485e4d17f8435f7f366817dcf79d2 Mon Sep 17 00:00:00 2001 From: Chuck Yang Date: Mon, 26 Dec 2016 15:58:17 -0600 Subject: [PATCH 35/39] added .to_time for rendering_timed_out? --- lib/shrimp/middleware.rb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/shrimp/middleware.rb b/lib/shrimp/middleware.rb index fc7b101..efaec77 100644 --- a/lib/shrimp/middleware.rb +++ b/lib/shrimp/middleware.rb @@ -66,7 +66,7 @@ def set_rendering_flag end def rendering_started_at - @request.session["phantom-rendering"][render_to] + @request.session["phantom-rendering"][render_to].to_time end def rendering_timed_out? From 6386ca208c2c6697fd41824eddde2f06db6f7e8f Mon Sep 17 00:00:00 2001 From: Chuck Yang Date: Fri, 28 Apr 2017 14:55:06 -0500 Subject: [PATCH 36/39] Fixing file corrupted when rendering_time is small - ensure this is checking phantomjs is done with the rendering by writing to a done file when the phantomjs command is done - before this, the code was just checking if the file exists and if the file size is not 0 --- lib/shrimp/base_middleware.rb | 5 +++++ lib/shrimp/middleware.rb | 12 ++++++++---- 2 files changed, 13 insertions(+), 4 deletions(-) diff --git a/lib/shrimp/base_middleware.rb b/lib/shrimp/base_middleware.rb index e385b2e..cd45387 100644 --- a/lib/shrimp/base_middleware.rb +++ b/lib/shrimp/base_middleware.rb @@ -56,6 +56,10 @@ def render_to "#{file_path}/#{file_name}" end + def render_to_done + "#{render_to}.done" + end + # The URL for the HTML-formatted web page that we are converting into a PDF. def html_url @request.url.sub(%r<\.pdf(\?|$)>, '\1') @@ -69,6 +73,7 @@ def render_pdf @phantom = phantom phantom.to_pdf(render_to) log_render_pdf_completion + File.open(render_to_done, 'w') { |f| f.write('done') } unless @phantom.error? end end diff --git a/lib/shrimp/middleware.rb b/lib/shrimp/middleware.rb index efaec77..d81b5ba 100644 --- a/lib/shrimp/middleware.rb +++ b/lib/shrimp/middleware.rb @@ -13,13 +13,13 @@ def initialize(app, options = { }, conditions = { }) def render_as_pdf(env) if already_rendered? && (up_to_date?(@options[:cache_ttl]) || @options[:cache_ttl] == 0) if File.size(render_to) == 0 - File.delete(render_to) + delete_tmp_files remove_rendering_flag return error_response end return ready_response if env['HTTP_X_REQUESTED_WITH'] body = pdf_body() - File.delete(render_to) if @options[:cache_ttl] == 0 + delete_tmp_files if @options[:cache_ttl] == 0 remove_rendering_flag headers = pdf_headers(body) [200, headers, [body]] @@ -32,7 +32,7 @@ def render_as_pdf(env) reload_response(@options[:polling_interval]) end else - File.delete(render_to) if already_rendered? + delete_tmp_files if already_rendered? set_rendering_flag # Start PhantomJS rendering in a separate process and then immediately render a web page # that continuously reloads (polls) until the rendering is complete. @@ -47,13 +47,17 @@ def render_as_pdf(env) private def already_rendered? - File.exists?(render_to) + File.exists?(render_to_done) && File.exists?(render_to) end def up_to_date?(ttl = 30) (Time.now - File.new(render_to).mtime) <= ttl end + def delete_tmp_files + File.delete(render_to) + File.delete(render_to_done) + end def remove_rendering_flag @request.session["phantom-rendering"] ||={ } From 7c1b6b14b13fa2345fbfcd9a13a6a13f3ed6d8e5 Mon Sep 17 00:00:00 2001 From: Chuck Yang Date: Wed, 3 May 2017 19:01:22 -0500 Subject: [PATCH 37/39] Using Thread.new instead of Process::detach fork for render_pdf --- lib/shrimp/middleware.rb | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/lib/shrimp/middleware.rb b/lib/shrimp/middleware.rb index d81b5ba..2bf066c 100644 --- a/lib/shrimp/middleware.rb +++ b/lib/shrimp/middleware.rb @@ -36,7 +36,9 @@ def render_as_pdf(env) set_rendering_flag # Start PhantomJS rendering in a separate process and then immediately render a web page # that continuously reloads (polls) until the rendering is complete. - Process::detach fork { + # Chuck: using Thread.new instead of Process::detach fork because Process fork will cause + # database disconnection when the forked process ended + Thread.new { render_pdf } reload_response(@options[:polling_offset]) From c3202be052771f8fadf9ecaf38228a2177436aca Mon Sep 17 00:00:00 2001 From: Tyler Rick Date: Thu, 4 May 2017 13:02:04 -0700 Subject: [PATCH 38/39] Remove tests for copying file To match changes made in c3f524ff --- spec/shrimp/phantom_spec.rb | 10 ---------- 1 file changed, 10 deletions(-) diff --git a/spec/shrimp/phantom_spec.rb b/spec/shrimp/phantom_spec.rb index 745af48..2351636 100644 --- a/spec/shrimp/phantom_spec.rb +++ b/spec/shrimp/phantom_spec.rb @@ -144,12 +144,6 @@ its(:page_load_status_code) { should eq 'null' } end end - - it "should be unable to copy file" do - phantom = Shrimp::Phantom.new("file://#{test_file}") - phantom.to_pdf("/foo/bar/") - phantom.error.should include "Unable to copy file " - end end context "Errors (using bang methods)" do @@ -159,9 +153,5 @@ expect { phantom.run! }.to raise_error Shrimp::RenderingError end - it "should be unable to copy file" do - phantom = Shrimp::Phantom.new("file://#{test_file}") - expect { phantom.to_pdf!("/foo/bar/") }.to raise_error Shrimp::RenderingError - end end end From 690b6ca86c972d842d51d88b6476a84a1459320f Mon Sep 17 00:00:00 2001 From: Tyler Rick Date: Thu, 4 May 2017 15:46:10 -0700 Subject: [PATCH 39/39] Add some tests and an example in the Readme about adding header/footer with page numbers --- README.md | 20 ++++++++++++++++ lib/shrimp/middleware.rb | 4 ++-- lib/shrimp/rasterize.js | 25 ++++---------------- shrimp.gemspec | 1 + spec/shrimp/phantom_spec.rb | 15 ++++++++++++ spec/shrimp/test_file_with_page_numbers.html | 21 ++++++++++++++++ spec/spec_helper.rb | 8 +++++-- 7 files changed, 69 insertions(+), 25 deletions(-) create mode 100644 spec/shrimp/test_file_with_page_numbers.html diff --git a/README.md b/README.md index 6a4064a..d5c031d 100644 --- a/README.md +++ b/README.md @@ -204,6 +204,26 @@ to the middleware: To disable this caching entirely and force it to re-generate the PDF again each time a request comes in, set `cache_ttl` to 0. +### Header/Footer + +You can specify a header or footer callback, which can even include page numbers. Example: + +```html + + + +``` ### Ajax requests diff --git a/lib/shrimp/middleware.rb b/lib/shrimp/middleware.rb index 2bf066c..66fad5d 100644 --- a/lib/shrimp/middleware.rb +++ b/lib/shrimp/middleware.rb @@ -34,9 +34,9 @@ def render_as_pdf(env) else delete_tmp_files if already_rendered? set_rendering_flag - # Start PhantomJS rendering in a separate process and then immediately render a web page + # Start PhantomJS rendering in a separate thread and then immediately render a web page # that continuously reloads (polls) until the rendering is complete. - # Chuck: using Thread.new instead of Process::detach fork because Process fork will cause + # Using Thread.new instead of Process::detach fork because Process fork will cause # database disconnection when the forked process ended Thread.new { render_pdf diff --git a/lib/shrimp/rasterize.js b/lib/shrimp/rasterize.js index 0f61efb..36787c5 100644 --- a/lib/shrimp/rasterize.js +++ b/lib/shrimp/rasterize.js @@ -68,27 +68,10 @@ if (system.args.length < 3 || system.args.length > 12) { } phantom.exit(1); } else { - /* check whether the loaded page overwrites the header/footer setting, - i.e. whether a PhantomJSPriting object exists. Use that then instead - of our defaults above. - example: - - - - -

asdfadsf

asdfadsfycvx

- + /* Check whether the loaded page overwrites the header/footer setting, + i.e. whether a PhantomJSPriting object exists. Use that then instead + of our defaults above. + See https://github.com/ariya/phantomjs/blob/master/examples/printheaderfooter.js#L66 */ window.setTimeout(function () { if (page.evaluate(function(){return typeof PhantomJSPrinting == "object";})) { diff --git a/shrimp.gemspec b/shrimp.gemspec index 5c08e87..cd24bb8 100644 --- a/shrimp.gemspec +++ b/shrimp.gemspec @@ -25,4 +25,5 @@ Gem::Specification.new do |gem| gem.add_development_dependency(%q, [">= 0.5.6"]) gem.add_development_dependency(%q, ["= 1.4.1"]) gem.add_development_dependency(%q) + gem.add_development_dependency(%q) end diff --git a/spec/shrimp/phantom_spec.rb b/spec/shrimp/phantom_spec.rb index 2351636..55102bd 100644 --- a/spec/shrimp/phantom_spec.rb +++ b/spec/shrimp/phantom_spec.rb @@ -66,6 +66,7 @@ it "should be a valid pdf" do valid_pdf?(@result).should eq true + pdf_strings(@result).should eq "Hello\tWorld!" end end @@ -82,6 +83,7 @@ it "should be a valid pdf" do valid_pdf?(@result).should eq true + pdf_strings(Pathname(@result)).should eq "Hello\tWorld!" end end @@ -97,6 +99,7 @@ it "should be a valid pdf" do valid_pdf?(@result).should eq true + pdf_strings(@result).should eq "Hello\tWorld!" end end @@ -152,6 +155,18 @@ phantom = Shrimp::Phantom.new("file:///foo/bar") expect { phantom.run! }.to raise_error Shrimp::RenderingError end + end + + context 'test_file_with_page_numbers.html' do + let(:test_file) { super('test_file_with_page_numbers.html') } + + before do + phantom = Shrimp::Phantom.new("file://#{test_file}") + @result = phantom.to_string("#{tmpdir}/test.pdf") + end + it "PDF should contain page numbers" do + pdf_strings(@result).should eq "Header:\tPage\t1/2Footer:\tPage\t1/2Hello\tWorld!Hello\tWorld!Header:\tPage\t2/2Footer:\tPage\t2/2" + end end end diff --git a/spec/shrimp/test_file_with_page_numbers.html b/spec/shrimp/test_file_with_page_numbers.html new file mode 100644 index 0000000..77f995a --- /dev/null +++ b/spec/shrimp/test_file_with_page_numbers.html @@ -0,0 +1,21 @@ + + + + + +

Hello World!

+

Hello World!

+ + + diff --git a/spec/spec_helper.rb b/spec/spec_helper.rb index 310754c..7bb6ac4 100644 --- a/spec/spec_helper.rb +++ b/spec/spec_helper.rb @@ -1,6 +1,7 @@ require 'rack/test' require 'shrimp' require 'webrick' +require 'pdf/inspector' RSpec.configure do |config| include Rack::Test::Methods @@ -19,8 +20,8 @@ def tmpdir Shrimp.config.tmpdir end -def test_file - File.expand_path('../shrimp/test_file.html', __FILE__) +def test_file(file_name = 'test_file.html') + File.expand_path("../shrimp/#{file_name}", __FILE__) end def valid_pdf?(io) @@ -31,6 +32,9 @@ def valid_pdf?(io) io[0...4] == "%PDF" || File.open(io).read[0...4] == "%PDF" end end +def pdf_strings(pdf) + PDF::Inspector::Text.analyze(pdf).strings.join +end # Used by rack-test when we call get def app