MrRoboto

A simple robots.txt service. MrRoboto will fetch and parse robots.txt files for you and indicate whether a path is crawlable by your user agent. It also has primitive support for the Crawl-delay directive.

Installation

The package is available in Hex and can be installed as follows:

  1. Add mr_roboto to your list of dependencies in mix.exs:

    def deps do
      [{:mr_roboto, "~> 1.0.0"}]
    end

  2. Ensure mr_roboto is started before your application:

    def application do
      [applications: [:mr_roboto]]
    end

Usage

Checking a URL

Checking a URL is simple: send {:crawl?, {agent, url}} to the Warden server.

GenServer.call MrRoboto.Warden, {:crawl?, {"mybot", "http://www.google.com"}}

The Warden server will reply with :allowed, :disallowed or :ambiguous.

In the case of an :ambiguous response, how to proceed is at your discretion. An :ambiguous reply indicates that matching allow and disallow directives of the same length were found.
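A minimal sketch of acting on the reply (the fetch_page/1 helper and the conservative handling of :ambiguous are illustrative assumptions, not part of MrRoboto):

    case GenServer.call(MrRoboto.Warden, {:crawl?, {"mybot", url}}) do
      :allowed ->
        # the path may be crawled by "mybot"
        fetch_page(url)

      :disallowed ->
        # the path is blocked for "mybot"; skip it
        :skip

      :ambiguous ->
        # allow and disallow rules of equal length matched; err on the side of not crawling
        :skip
    end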

Crawl Delay

Crawl-delay isn't a very common directive, but MrRoboto attempts to support it. The Warden server supports a delay_info call, which returns the delay value and when the rule was last checked.

iex> GenServer.call MrRoboto.Warden, {:delay_info, {"mybot", "http://www.google.com"}}
%{delay: 1, last_checked: 1459425496}
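One way to honour the delay between requests, assuming delay is in seconds and last_checked is a Unix timestamp (as the example above suggests):

    %{delay: delay, last_checked: last_checked} =
      GenServer.call(MrRoboto.Warden, {:delay_info, {"mybot", "http://www.google.com"}})

    # Sleep for whatever portion of the delay window is still outstanding.
    elapsed = System.system_time(:second) - last_checked
    Process.sleep(max(delay - elapsed, 0) * 1000)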
