This plugin enables 'enterprise-class' Google Sitemaps to be easily generated for a Rails site as a rake task, using a simple 'Rails Routes'-like DSL. It allows you to take care of familiar Sitemap issues like: Gzip of Sitemap files, variable priority links, paging/sorting links (e.g. my_list?page=3), SSL host links (e.g. https:), Rails apps whi…

gonzoyumo/sitemap_generator

 
 

SitemapGenerator

This plugin enables 'enterprise-class' Google Sitemaps to be easily generated for a Rails site as a rake task, using a simple 'Rails Routes'-like DSL.

Foreword

Unfortunately, Adam Salter passed away recently. Those who knew him know what an amazing guy he was, and what an excellent Rails programmer he was. His passing is a great loss to the Rails community.

I will be taking over maintaining this gem from him. -- Karl

Raison d'être

Most of the Sitemap plugins out there seem to try to recreate the Sitemap links by iterating over the Rails routes. In some cases this is possible, but in a great many cases it isn't.

a) There are probably quite a few routes in your routes file that don't need inclusion in the Sitemap. (AJAX routes, I'm looking at you.)

and

b) How would you infer the correct series of links for the following route?

map.zipcode 'location/:state/:city/:zipcode', :controller => 'zipcode', :action => 'index'

Don't tell me it's trivial, because it isn't. It just looks trivial.

So my idea is to have another file similar to 'routes.rb' called 'sitemap.rb', where you can define what goes into the Sitemap.

Here's my solution:

Zipcode.find(:all, :include => :city).each do |z|
  sitemap.add zipcode_path(:state => z.city.state, :city => z.city, :zipcode => z)
end

Easy hey?

Other Sitemap settings for the link, like lastmod, priority, changefreq and host are entered automatically, although you can override them if you need to.

Other "difficult" Sitemap issues, solved by this plugin:

  • Support for more than 50,000 urls (using a Sitemap Index file)
  • Gzip of Sitemap files
  • Variable priority of links
  • Paging/sorting links (e.g. my_list?page=3)
  • SSL host links (e.g. https:)
  • Rails apps which are installed on a sub-path (e.g. example.com/blog_app/)
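As a rough illustration of the paging case, the page paths can be computed up front and then handed to sitemap.add one by one. This is only a sketch: paged_paths, its arguments and the Item model mentioned in the comment are hypothetical names, not part of the plugin.

```ruby
# Hypothetical helper (not provided by the plugin): builds one path per
# page of a paginated list, e.g. "/my_list?page=3".
def paged_paths(base_path, total_items, per_page)
  # Integer ceiling division: number of pages needed to show every item.
  page_count = (total_items + per_page - 1) / per_page
  (1..page_count).map { |page| "#{base_path}?page=#{page}" }
end

# Inside the add_links block of config/sitemap.rb this could be used as
# (Item is an assumed model name):
#
#   paged_paths('/my_list', Item.count, 20).each do |path|
#     sitemap.add path, :priority => 0.3
#   end
```

Computing the page count from the item count keeps the Sitemap free of links to empty pages.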

Installation

As a gem

  1. Add the gem as a dependency in your config/environment.rb

    config.gem 'sitemap_generator', :lib => false, :source => 'http://gemcutter.org'

  2. $ rake gems:install

  3. Add the following line to your RAILS_ROOT/Rakefile

    begin; require 'sitemap_generator/tasks'; rescue LoadError; end

  4. $ rake sitemap:install

As a plugin

  1. Install plugin as normal

    $ ./script/plugin install git://github.com/kjvarga/sitemap_generator.git


Installation should create a 'config/sitemap.rb' file, which will contain your logic for generating the Sitemap files. (To recreate this file manually, run rake sitemap:install.)

You can run rake sitemap:refresh as needed to create Sitemap files. This will also ping all the 'major' search engines. (To disable all non-essential output, run the rake task with the -s flag: rake -s sitemap:refresh)

Sitemaps with many urls (100,000+) take quite a long time to generate, so if you need to refresh your Sitemaps regularly you can set the rake task up as a cron job. Most cron agents will only send you an email if there is output from the cron task.

Optionally, you can add the following to your robots.txt file, so that robots can find the sitemap file.

Sitemap: <hostname>/sitemap_index.xml.gz

The robots.txt Sitemap URL should be the complete URL to the Sitemap Index, such as: http://www.example.org/sitemap_index.xml.gz
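If you would rather build that robots.txt line from the same host string you use for default_host, something along these lines would do it. It is a minimal sketch: robots_sitemap_line is a hypothetical helper, not something the plugin provides, and it assumes the Sitemap Index sits directly under the host you pass in.

```ruby
require 'uri'

# Hypothetical helper (not part of the plugin): returns the complete
# "Sitemap:" line for robots.txt from a host such as
# "http://www.example.org".
def robots_sitemap_line(host, index_file = 'sitemap_index.xml.gz')
  # Make sure the host ends in a slash so URI.join appends the file name
  # instead of replacing the last path segment.
  base = host.end_with?('/') ? host : "#{host}/"
  "Sitemap: #{URI.join(base, index_file)}"
end

puts robots_sitemap_line('http://www.example.org')
```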

Example 'config/sitemap.rb'

# Set the host name for URL creation
SitemapGenerator::Sitemap.default_host = "http://www.example.com"

SitemapGenerator::Sitemap.add_links do |sitemap|
  # Put links creation logic here.
  #
  # The Root Path ('/') and Sitemap Index file are added automatically.
  # Links are added to the Sitemap output in the order they are specified.
  #
  # Usage: sitemap.add path, options
  #        (default options are used if you don't specify them)
  #
  # Defaults: :priority => 0.5, :changefreq => 'weekly', 
  #           :lastmod => Time.now, :host => default_host


  # Examples:

  # add '/articles'
  sitemap.add articles_path, :priority => 0.7, :changefreq => 'daily'

  # add all individual articles
  Article.find(:all).each do |a|
    sitemap.add article_path(a), :lastmod => a.updated_at
  end

  # add merchant path
  sitemap.add '/purchase', :priority => 0.7, :host => "https://www.example.com"

end

# Including Sitemaps from Rails Engines.
#
# These Sitemaps should be almost identical to a regular Sitemap file except 
# they needn't define their own SitemapGenerator::Sitemap.default_host since
# they will undoubtedly share the host name of the application they belong to.
#
# As an example, say we have a Rails Engine in vendor/plugins/cadability_client
# We can include its Sitemap here as follows:
# 
file = File.join(Rails.root, 'vendor/plugins/cadability_client/config/sitemap.rb')
eval(open(file).read, binding, file)

Notes

  1. Tested and working on Rails 1.x.x through 2.x.x; no guarantees are made for Rails 3.0.

  2. For large sitemaps it may be useful to split your generation into batches to avoid running out of memory. E.g.:

    # add movies
    Movie.find_in_batches(:batch_size => 1000) do |movies|
      movies.each do |movie|
        sitemap.add "/movies/show/#{movie.to_param}", :lastmod => movie.updated_at, :changefreq => 'weekly'
      end
    end

  3. New Capistrano deploys will remove your Sitemap files, unless you run rake sitemap:refresh. The way around this is to create a cap task:

    after "deploy:update_code", "deploy:copy_old_sitemap"

    namespace :deploy do
      task :copy_old_sitemap do
        run "if [ -e #{previous_release}/public/sitemap_index.xml.gz ]; then cp #{previous_release}/public/sitemap* #{current_release}/public/; fi"
      end
    end

  4. If generation of your sitemap fails for some reason, the old sitemap will remain in public/. This ensures that robots will always find a valid sitemap. If you run the task silently (rake -s sitemap:refresh) with email forwarding set up, you'll only get an email when your sitemap fails to build, and no notification when everything is fine, which will be most of the time.
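The "old sitemap survives a failed build" behaviour in note 4 can be pictured with a write-to-a-temporary-file-then-rename pattern. The sketch below is an assumption about how such behaviour could be achieved, not a description of how the plugin actually does it; write_gzip_safely and the .tmp suffix are hypothetical.

```ruby
require 'fileutils'
require 'zlib'

# Hypothetical helper (not the plugin's implementation): gzips whatever
# the block returns into `path`, but only replaces an existing file once
# the new content has been written completely.
def write_gzip_safely(path)
  tmp_path = "#{path}.tmp"
  Zlib::GzipWriter.open(tmp_path) { |gz| gz.write(yield) }
  # Only reached if generation and compression both succeeded.
  FileUtils.mv(tmp_path, path)
rescue StandardError
  # Generation failed: discard the partial file, keep the old sitemap.
  FileUtils.rm_f(tmp_path) if tmp_path
  raise
end
```

Because the error is re-raised, a cron job running the task would still produce output on failure, which is what triggers the email described above.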

Known Bugs

  • Sitemaps.org states that no Sitemap XML file should be more than 10MB uncompressed. The plugin will warn you about this, but does nothing to avoid it (such as moving some URLs into a later file).
  • There's no check on the size of a URL which isn't supposed to exceed 2,048 bytes.
  • Currently only supports one Sitemap Index file, which can contain 50,000 Sitemap files, each of which can contain 50,000 urls, so it only supports up to 2,500,000,000 (2.5 billion) urls. I personally have no need of support for more urls, but the plugin could be improved to support this.
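The limits in this list can be checked with a few lines of arithmetic. The helpers below are hypothetical and do not exist in the plugin; they only restate the numbers given above (50,000 urls per file, 50,000 files per index, 2,048 bytes per URL) as a minimal sketch of the checks you could run yourself.

```ruby
# Limits as stated in the list above.
MAX_URLS_PER_SITEMAP   = 50_000
MAX_SITEMAPS_PER_INDEX = 50_000
MAX_URL_BYTES          = 2_048

# Number of Sitemap files needed for a given number of urls
# (integer ceiling division).
def sitemap_files_needed(url_count)
  (url_count + MAX_URLS_PER_SITEMAP - 1) / MAX_URLS_PER_SITEMAP
end

# True if all urls fit under a single Sitemap Index file.
def fits_in_one_index?(url_count)
  sitemap_files_needed(url_count) <= MAX_SITEMAPS_PER_INDEX
end

# True if a single URL exceeds the 2,048 byte limit.
def url_too_long?(url)
  url.bytesize > MAX_URL_BYTES
end

puts MAX_URLS_PER_SITEMAP * MAX_SITEMAPS_PER_INDEX
```

A check like url_too_long? could be run over your own links before passing them to sitemap.add, since the plugin itself does not perform it.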

Wishlist

  • Auto coverage testing. Generate a report of broken URLs by checking the status codes of each page in the sitemap.

Thanks (in no particular order)

Follow me on:

Twitter: twitter.com/adamsalter
Github: github.com/adamsalter

Copyright (c) 2009 Adam @ Codebright.net, released under the MIT license
