Releases: ronin-rb/ronin-web-spider
0.2.0
- Added Ronin::Web::Spider::Agent#every_javascript_url_string.
- Added Ronin::Web::Spider::Agent#every_javascript_relative_path_string.
- Added Ronin::Web::Spider::Agent#every_javascript_absolute_path_string.
- Added Ronin::Web::Spider::Agent#every_javascript_path_string.
- Allow Ronin::Web::Spider::Agent#every_html_comment, Ronin::Web::Spider::Agent#every_javascript, Ronin::Web::Spider::Agent#every_javascript_string, Ronin::Web::Spider::Agent#every_javascript_relative_path_string, Ronin::Web::Spider::Agent#every_javascript_absolute_path_string, Ronin::Web::Spider::Agent#every_javascript_url_string, and Ronin::Web::Spider::Agent#every_javascript_comment to also yield a Spidr::Page block argument for additional context.
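
Since Ruby blocks silently drop extra yielded arguments, callbacks written with a single block parameter should keep working after this change. The sketch below illustrates only that calling convention; FakePage and each_js_string are hypothetical stand-ins written for this example and are not part of ronin-web-spider or spidr.

```ruby
# Hypothetical stand-in for Spidr::Page; it only carries a URL for this demo.
FakePage = Struct.new(:url)

# Hypothetical stand-in for a callback method that yields each value together
# with the page it came from (the two-argument convention described above).
def each_js_string(strings, page)
  strings.each { |string| yield string, page }
end

page = FakePage.new('https://example.com/app.js')

# One-parameter block: the extra page argument is simply ignored.
old_style = []
each_js_string(['/api/v1', 'hello'], page) { |string| old_style << string }

# Two-parameter block: the page provides context for each string.
new_style = []
each_js_string(['/api/v1', 'hello'], page) do |string, pg|
  new_style << "#{pg.url}: #{string}"
end
```

With the real agent, the second block parameter would be the Spidr::Page on which the string, comment, or script was found.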
0.1.1
- Fixed Ronin::Web::Spider::Agent#every_html_comment and Ronin::Web::Spider::Agent#every_javascript when the page's Content-Type header included text/html but lacked a response body, causing page.doc to be nil.
- Fixed a bug in Ronin::Web::Spider::Agent#every_javascript where parsed JavaScript source code strings containing UTF-8 characters were being incorrectly encoded as ASCII-8BIT strings, if the page's Content-Type header did not include a charset= attribute.
- Fixed a bug in Ronin::Web::Spider::Agent#every_javascript_string where the " or ' characters inside inline JavaScript regexes (ex: /["'=]/) would incorrectly be treated as the beginning or end of JavaScript string literals. Note that while this greatly improves the accuracy of Ronin::Web::Spider::Agent#every_javascript_string, it still does not support parsing JavaScript template literals that may also contain string literals (ex: `Hello \"World\"` or `Hello ${myFunc("string literal")}`).
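
The encoding issue in the second fix can be reproduced with plain Ruby strings. This is a minimal sketch of the symptom and one common remedy, assuming the bytes really are UTF-8; it is not taken from the gem's source and may differ from how the fix was actually implemented.

```ruby
# UTF-8 bytes of a JavaScript snippet, tagged as binary (ASCII-8BIT), similar
# to a response body received without any charset information.
raw = "var greeting = 'h\u00e9llo';".b

# In a binary string every byte counts as one character, so the two-byte
# UTF-8 sequence for the accented letter inflates the length by one.
binary_length = raw.length

# Re-tag the same bytes as UTF-8 without transcoding them.
# dup keeps the original binary string untouched.
utf8 = raw.dup.force_encoding(Encoding::UTF_8)

# Confirm that the bytes form valid UTF-8 before trusting the new tag.
valid = utf8.valid_encoding?
utf8_length = utf8.length
```

Checking valid_encoding? matters because force_encoding only changes the label on the bytes; it does not verify or convert them.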
0.1.0
- Extracted and refactored from ronin-web.
- Relicensed as LGPL-3.0.
- Initial release:
  - Requires ruby >= 3.0.0.
  - Built on top of the battle-tested and versatile spidr gem.
  - Provides additional callback methods:
    - every_host - yields every unique host name that's spidered.
    - every_cert - yields every unique SSL/TLS certificate encountered while spidering.
    - every_favicon - yields every favicon file that's encountered while spidering.
    - every_html_comment - yields every HTML comment.
    - every_javascript - yields all JavaScript source code from either inline <script> tags or .js files.
    - every_javascript_string - yields every single-quoted or double-quoted String literal from all JavaScript source code.
    - every_javascript_comment - yields every JavaScript comment.
    - every_comment - yields every HTML or JavaScript comment.
  - Supports archiving spidered pages to a directory or git repository.
- Requires