0.3.6
- remove bundled htmlentities in favor of a gem dependency
- also extract links from area and frame tags
- fix etagfilter bug
0.3.5
- add max_depth option to the crawler configuration to limit the crawl to a
specific depth
- add support for http proxies including basic authentication
- remove rubyful_soup support
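The max_depth option can be pictured as a breadth-first crawl that stops following links once a page sits at the depth limit. The sketch below is hypothetical: `crawl`, its arguments, the in-memory `links` graph (standing in for real HTTP fetching) and the choice to count start URLs as depth 0 are assumptions for illustration, not RDig's actual API or semantics.

```ruby
# Hypothetical sketch of a depth-limited breadth-first crawl.
# `links` maps a URL to the URLs it links to; it replaces real HTTP fetching.
# Assumption: start URLs have depth 0; max_depth = nil means "no limit".
def crawl(start_urls, links, max_depth: nil)
  visited = {}                                # url => depth at first visit
  queue = start_urls.map { |url| [url, 0] }   # [url, depth] pairs
  until queue.empty?
    url, depth = queue.shift
    next if visited.key?(url)
    visited[url] = depth
    # Pages at the depth limit are recorded but their links are not followed.
    next if max_depth && depth >= max_depth
    links.fetch(url, []).each do |target|
      queue << [target, depth + 1] unless visited.key?(target)
    end
  end
  visited
end

links = {
  "/" => ["/a", "/b"],
  "/a" => ["/a/1"],
  "/a/1" => ["/a/1/x"]
}

p crawl(["/"], links, max_depth: 1).keys # prints ["/", "/a", "/b"]
```

Breadth-first order means each page is first reached by its shortest link path, so its recorded depth is the smallest one possible.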
0.3.4
0.3.2
- make RDig compatible with Ferret 0.10.x
- no longer works with Ferret 0.9.x and earlier
0.3.1
- Bug fix release: fixed handling of unparseable URLs
0.3.0
- file system crawling
- optional URL rewriting before indexing, e.g. to build the index directly
from the file system while linking search results via HTTP
- PDF title extraction with pdfinfo
- removed dependency on mkmf, which doesn't seem to exist in Ruby 1.8.2
- made content extractors more flexible: instances now use a given
configuration instead of the global one. This allows the
WordContentExtractor to use an HtmlContentExtractor with its own
configuration that is independent of the global config.
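The per-instance configuration change can be illustrated with a small sketch. `HtmlExtractor`, `WordExtractor`, `GLOBAL_HTML_CONFIG` and the `strip_tags` key are hypothetical, simplified stand-ins, not RDig's real classes or settings; the Word extractor assumes its input was already converted to HTML (the external conversion step is omitted), and the regex-based tag stripping is for illustration only.

```ruby
# Hypothetical stand-in for a global configuration; keys are illustrative.
GLOBAL_HTML_CONFIG = { strip_tags: %w[script style] }.freeze

# Simplified HTML extractor: reads settings from the config it was given,
# falling back to the global one only when no config is passed.
class HtmlExtractor
  def initialize(config = GLOBAL_HTML_CONFIG)
    @config = config
  end

  def extract(html)
    text = html.dup
    # Drop configured elements together with their content.
    @config.fetch(:strip_tags, []).each do |tag|
      text.gsub!(%r{<#{tag}\b[^>]*>.*?</#{tag}>}mi, "")
    end
    # Replace remaining tags with spaces and normalize whitespace.
    text.gsub(/<[^>]+>/, " ").split.join(" ")
  end
end

# Simplified Word extractor: assumes the document is already converted to
# HTML and delegates to an HtmlExtractor built from its own configuration.
class WordExtractor
  def initialize(html_config:)
    @html_extractor = HtmlExtractor.new(html_config)
  end

  def extract(converted_html)
    @html_extractor.extract(converted_html)
  end
end

html = "<p>Hi<script>x()</script></p><style>p{}</style>"
puts HtmlExtractor.new.extract(html) # prints Hi
puts WordExtractor.new(html_config: { strip_tags: %w[style] }).extract(html) # prints Hi x()
```

Because the embedded extractor receives its settings through the constructor, changing the Word extractor's configuration leaves the global behaviour untouched, which is the independence the entry above describes.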
0.2.1
- Bugfix release
0.2.0
- add PDF and Word content extraction capabilities using the tools
from the xpdf-utils and wv packages
- additional content extractors may be plugged in by extending
the ContentExtractor class
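One common Ruby pattern for "plug in by extending a base class" is a registry filled by the `inherited` hook. The sketch below is an assumption about how such a plug-in mechanism might look, not RDig's actual implementation: `registry`, `handles?`, `extractor_for`, `extract` and the two example subclasses are all hypothetical names.

```ruby
# Hypothetical plug-in registry: every subclass of ContentExtractor registers
# itself when it is defined and is later selected by content type.
class ContentExtractor
  @registry = []

  class << self
    attr_reader :registry

    # Ruby calls this hook whenever a subclass is defined.
    def inherited(subclass)
      super
      ContentExtractor.registry << subclass
    end

    # Returns an instance of the first registered extractor that handles
    # the given content type, or nil if none does.
    def extractor_for(content_type)
      klass = ContentExtractor.registry.find { |k| k.handles?(content_type) }
      klass&.new
    end
  end
end

# Example plug-ins (hypothetical).
class PlainTextExtractor < ContentExtractor
  def self.handles?(content_type)
    content_type == "text/plain"
  end

  def extract(content)
    { content: content.strip }
  end
end

class CsvExtractor < ContentExtractor
  def self.handles?(content_type)
    content_type == "text/csv"
  end

  def extract(content)
    { content: content.tr(",", " ") }
  end
end

puts ContentExtractor.extractor_for("text/csv").extract("a,b,c")[:content] # prints a b c
```

With this design, adding support for a new format only requires defining a subclass; no central list has to be edited.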
0.1.0
initial release