V2 #10
base: main
…dencies with domhandler and jsdom
…anced data validation support
…ation using Zod and Effect libraries
…extraction handling in the scraper framework
…idation and optional data transformation support
… structure in the scraper framework
… and styling in scraper framework
…ata extraction and validation from HTML sources
…clarity and maintainability
…ion definitions for better readability
…type in Effect test for clarity
…ion in Effect test for improved clarity
Note
This is a draft of a rewritten v2 of `xscrape`.
Pull Request Type
Summary
This pull request adds a range of new features and updates to improve the HTML scraping and validation framework. Key changes include introducing continuous testing, adding a `defineScraper` function, implementing data validation with the Zod and Effect libraries, and creating extensive HTML fixtures and unit tests.
Changes Made
Testing Enhancements:
- Added a `test:watch` command to `package.json` for continuous testing using Vitest.
- Configured `vitest.config.ts` to include `globals` and `jsdom` as the testing environment for better DOM support.
Validation and Schema:
- Added `SchemaValidator` and `createValidator` functions in `validators.ts`, allowing flexible schema creation with Zod or Effect-based validators.
- Added type definitions in `types/main.ts`, supporting a wide range of validation and transformation requirements.
- Added a `cheerio` type file in `types/cheerio.ts` for extracting and defining HTML elements.
Define Scraper Function:
- Added the `defineScraper` function, which provides HTML scraping with validation and optional data transformation.
Testing Fixtures:
- Added HTML fixtures in `test/__fixtures__/html.ts` to cover various HTML structures for robust testing.
Unit Testing:
- Added unit tests in `test/zod.test.ts` to verify data extraction and validation across scenarios such as nested fields, missing data, and invalid data.
Dependency Upgrades:
- Updated dependencies with `jsdom` and `domhandler` to enhance DOM manipulation and parsing capabilities.
How to Test
- Run `pnpm test:watch` to verify continuous testing functionality.
- Run the full suite with `pnpm test`, especially checking data extraction from various HTML structures in `test/zod.test.ts`.
- Confirm that the `defineScraper` function properly extracts and validates data using Zod-based schemas, as well as Effect-based validation if configured.
Possible Regressions
- Changes to `exports` could impact other modules importing from `index.ts`, so ensure compatibility if there are dependencies on legacy exports.
Additional Notes
- The `defineScraper` function provides flexible validation and extraction but requires additional configuration when using nested schemas.
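
To make the extract-then-validate flow and the nested-schema case concrete, here is a minimal, dependency-free sketch. It is not the PR's implementation, and every name in it (`defineScraperSketch`, `textOf`, `Fields`, `validatePage`) is hypothetical. The regex-based `textOf` only stands in for a real selector engine such as cheerio, and the hand-written validator stands in for a Zod or Effect schema.

```typescript
// Hypothetical sketch: none of these names are taken from xscrape's source.
type Result<T> =
  | { data: T; error?: undefined }
  | { data?: undefined; error: string };

// A validator turns unknown input into typed data or an error message.
// In the PR this role is played by Zod or Effect schemas.
type Validator<T> = (input: unknown) => Result<T>;

// An extractor pulls one raw value out of an HTML string.
type Extractor = (html: string) => unknown;

// A field is either a leaf extractor or a nested group of fields.
type Fields = { [key: string]: Extractor | Fields };

// Stand-in for a selector: trimmed text of the first <tag>...</tag> match.
const textOf = (tag: string): Extractor => (html) => {
  const match = new RegExp(`<${tag}[^>]*>([^<]*)</${tag}>`, "i").exec(html);
  return match ? match[1].trim() : undefined;
};

// Walk the field tree, running extractors and recursing into nested groups.
function extractAll(html: string, fields: Fields): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, field] of Object.entries(fields)) {
    out[key] = typeof field === "function" ? field(html) : extractAll(html, field);
  }
  return out;
}

// Combine extraction and validation into a single callable scraper.
function defineScraperSketch<T>(config: { fields: Fields; validate: Validator<T> }) {
  return (html: string): Result<T> => config.validate(extractAll(html, config.fields));
}

type Page = { title: string; article: { heading: string } };

const isRecord = (value: unknown): value is Record<string, unknown> =>
  typeof value === "object" && value !== null;

// Hand-written validator for the nested Page shape.
const validatePage: Validator<Page> = (input) => {
  if (!isRecord(input)) return { error: "expected an object" };
  const { title, article } = input;
  if (typeof title !== "string") return { error: "title must be a string" };
  if (!isRecord(article)) return { error: "article must be an object" };
  const { heading } = article;
  if (typeof heading !== "string") return { error: "article.heading must be a string" };
  return { data: { title, article: { heading } } };
};

// The nested `article` group mirrors the nested object in the Page type.
const scrapePage = defineScraperSketch({
  fields: { title: textOf("title"), article: { heading: textOf("h1") } },
  validate: validatePage,
});
```

In this sketch the nested `article` group has to be declared twice, once in the field tree and once in the validator, which is one plausible reading of the "additional configuration" that nested schemas require. Calling `scrapePage` on HTML without an `<h1>` returns an `error` rather than throwing, so callers can branch on the result.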