A simple robots.txt service. MrRoboto will fetch and parse robots.txt files for you and indicate whether a path is crawlable by your user agent. It also has primitive support for the Crawl-delay directive.
Available in Hex, the package can be installed as:

- Add `mr_roboto` to your list of dependencies in `mix.exs`:

      def deps do
        [{:mr_roboto, "~> 1.0.0"}]
      end

- Ensure `mr_roboto` is started before your application:

      def application do
        [applications: [:mr_roboto]]
      end
Checking a URL is simple: just send `{:crawl?, {agent, url}}` to the `Warden` server.

    GenServer.call MrRoboto.Warden, {:crawl?, {"mybot", "http://www.google.com"}}
The `Warden` server will reply with `:allowed`, `:disallowed`, or `:ambiguous`. An `:ambiguous` response indicates that the path matched allow and disallow directives of the same length; in that case it is up to your discretion how to proceed.
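As a rough sketch, a crawler might branch on the reply like this (the `fetch/1` function and the conservative handling of `:ambiguous` are assumptions for illustration, not part of MrRoboto):

```elixir
url = "http://www.google.com"

case GenServer.call(MrRoboto.Warden, {:crawl?, {"mybot", url}}) do
  :allowed ->
    # Safe to crawl; `fetch/1` is a hypothetical HTTP client call.
    fetch(url)

  :disallowed ->
    # Respect the robots.txt Disallow rule.
    :skip

  :ambiguous ->
    # Allow and Disallow directives of equal length matched; this sketch
    # errs on the side of caution and skips the URL.
    :skip
end
```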
Crawl-delay isn't a very common directive, but MrRoboto attempts to support it. The `Warden` server supports a `delay_info` call which returns the delay value and the time the rule was last checked.

    iex> GenServer.call MrRoboto.Warden, {:delay_info, {"mybot", "http://www.google.com"}}
    %{delay: 1, last_checked: 1459425496}
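A minimal sketch of how that information could be used to throttle a crawler, assuming `delay` is in seconds and `last_checked` is a Unix timestamp as the example above suggests (the `CrawlThrottle` module and `maybe_wait/2` helper are illustrative, not part of the library):

```elixir
defmodule CrawlThrottle do
  # Sleep until at least `delay` seconds have passed since the rule
  # was last checked, then return :ok.
  def maybe_wait(agent, url) do
    %{delay: delay, last_checked: last_checked} =
      GenServer.call(MrRoboto.Warden, {:delay_info, {agent, url}})

    elapsed = System.system_time(:second) - last_checked

    if elapsed < delay do
      :timer.sleep((delay - elapsed) * 1000)
    end

    :ok
  end
end
```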