- Adds request_kwargs argument to Extractor
- Adds note about URL content extraction to README
- Adds more type hints
- Converts more camel case variables to snake case
- Specifies Python 3.10 compatibility, adds version to package
- Fixes marked HTML extraction
- Adds new methods and documentation for marked HTML extraction
- Restores TextBlock.set_is_content() method
- Added 'raise_on_failure' parameter (default True) to extractors to raise exceptions when HTML extraction errors are encountered (they will be handled otherwise).
- Updated unit tests
- Fixed some camel-cased variable names
- Made some minor optimizations
- Added CI
- Updated test requirements
- Added Flake8 config
- Fixed containedTextElements variable (#1)
- Initial release.