V2 #10

Draft: 15 commits to be merged into base: main

@johnie (Owner) commented Oct 30, 2024

Note

This is a draft of a rewritten v2 of xscrape

Pull Request Type

  • Feature

Summary

This pull request reworks the HTML scraping and validation framework for v2. Key changes: a watch-mode test command, a new defineScraper function, data validation backed by Zod or Effect, and HTML fixtures with accompanying unit tests.

Changes Made

  1. Testing Enhancements:

    • Added a test:watch script to package.json that runs Vitest in watch mode.
    • Updated vitest.config.ts to enable globals and use jsdom as the test environment, so tests have DOM APIs available.
  2. Validation and Schema:

    • Implemented SchemaValidator and createValidator functions in validators.ts, allowing flexible schema creation with Zod or Effect-based validators.
    • Added TypeScript types for schema validation and scraper configuration in types/main.ts, supporting a wide range of validation and transformation requirements.
    • Created a separate cheerio type file in types/cheerio.ts for extracting and defining HTML elements.
  3. Define Scraper Function:

    • Introduced the defineScraper function, which extracts data from HTML, validates it against a schema, and optionally transforms the validated result.
  4. Testing Fixtures:

    • Added comprehensive HTML test fixtures in test/__fixtures__/html.ts to cover various HTML structures for robust testing.
  5. Unit Testing:

    • Added tests in test/zod.test.ts that cover data extraction and validation, including nested fields, missing data, and invalid data.
  6. Dependency Upgrades:

    • Added or upgraded dependencies, including jsdom and domhandler, for DOM parsing and manipulation.
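
To illustrate the validator abstraction described in item 2, here is a minimal sketch of how a createValidator function might wrap a Zod-style schema behind a common SchemaValidator interface. The type shapes below (ValidationResult, SafeParseLike) and the hand-written titleSchema are assumptions made for illustration; the actual definitions in validators.ts and types/main.ts may differ. The only Zod behavior relied on is that a schema exposes safeParse returning a success flag with either data or an error.

```typescript
// Hypothetical shapes for illustration; not taken from validators.ts.
type ValidationResult<T> =
  | { success: true; data: T }
  | { success: false; error: string };

interface SchemaValidator<T> {
  validate(input: unknown): ValidationResult<T>;
}

// Structural stand-in for a Zod schema: anything that exposes safeParse.
interface SafeParseLike<T> {
  safeParse(
    input: unknown
  ):
    | { success: true; data: T }
    | { success: false; error: { message: string } };
}

// Adapts a safeParse-style schema to the common SchemaValidator interface.
function createValidator<T>(schema: SafeParseLike<T>): SchemaValidator<T> {
  return {
    validate(input: unknown): ValidationResult<T> {
      const result = schema.safeParse(input);
      if (result.success === true) {
        return { success: true, data: result.data };
      }
      return { success: false, error: result.error.message };
    },
  };
}

// Hand-written schema used in place of a real Zod object schema.
const titleSchema: SafeParseLike<{ title: string }> = {
  safeParse(input: unknown) {
    const title = (input as { title?: unknown } | null)?.title;
    if (typeof title === "string") {
      return { success: true, data: { title } };
    }
    return { success: false, error: { message: "title must be a string" } };
  },
};

const validator = createValidator(titleSchema);
console.log(validator.validate({ title: "Hello" }));
console.log(validator.validate({ title: 42 }));
```

An Effect-based validator could presumably be adapted the same way by writing a second wrapper that returns the same ValidationResult shape, which keeps the scraper itself independent of the validation library.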

How to Test

  1. Run pnpm test:watch and confirm that Vitest starts in watch mode.
  2. Ensure tests pass with pnpm test, especially checking data extraction from various HTML structures in test/zod.test.ts.
  3. Verify that the defineScraper function properly extracts and validates data using Zod-based schemas, as well as Effect-based validation if configured.
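
As a rough mental model for step 3, the extract, validate, and transform flow can be sketched as below. This is not xscrape's defineScraper: the config shape, the ScrapeResult type, and the regex-based textOf helper (a stand-in for real selector-based extraction with cheerio) are all assumptions chosen so the example runs without dependencies.

```typescript
// Hypothetical sketch of an extract -> validate -> transform pipeline.
// Names and shapes are assumptions, not xscrape's actual API.
type ScrapeResult<R> = { data: R; error: null } | { data: null; error: string };

interface ScraperConfig<T, R> {
  // Each extractor pulls one raw field out of the HTML string.
  extract: Record<string, (html: string) => unknown>;
  // Validation step; throws on invalid data. A schema library could sit behind it.
  validate: (input: unknown) => T;
  // Optional post-validation transformation.
  transform?: (data: T) => R;
}

function defineScraper<T, R = T>(config: ScraperConfig<T, R>) {
  return (html: string): ScrapeResult<R> => {
    const raw: Record<string, unknown> = {};
    Object.keys(config.extract).forEach((key) => {
      raw[key] = config.extract[key](html);
    });
    try {
      const valid = config.validate(raw);
      const data = config.transform
        ? config.transform(valid)
        : (valid as unknown as R);
      return { data, error: null };
    } catch (e) {
      return { data: null, error: e instanceof Error ? e.message : String(e) };
    }
  };
}

// Stand-in for a selector engine: text of the first <tag>...</tag>, or undefined.
function textOf(html: string, tag: string): string | undefined {
  const match = new RegExp(`<${tag}[^>]*>([^<]*)</${tag}>`, "i").exec(html);
  return match ? match[1].trim() : undefined;
}

interface RawArticle { title: string; views: string }
interface Article { title: string; views: number }

const scrapeArticle = defineScraper<RawArticle, Article>({
  extract: {
    title: (html: string) => textOf(html, "h1"),
    views: (html: string) => textOf(html, "span"),
  },
  validate: (input: unknown): RawArticle => {
    const { title, views } = input as { title?: unknown; views?: unknown };
    if (typeof title !== "string") throw new Error("title is required");
    if (typeof views !== "string" || !/^\d+$/.test(views)) {
      throw new Error("views must be numeric");
    }
    return { title, views };
  },
  transform: (data: RawArticle): Article => ({
    title: data.title,
    views: Number(data.views),
  }),
});

console.log(scrapeArticle("<h1>Hello</h1><span>42</span>"));
console.log(scrapeArticle("<h1>Hello</h1>"));
```

Returning a result object instead of throwing keeps missing-data and invalid-data cases, like those exercised in test/zod.test.ts, easy to assert on.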

Possible Regressions

  • Node versions older than 18 may no longer work, because new dependencies require Node 18+.
  • Exports from index.ts have changed, which could break modules that import legacy exports; check any such dependents for compatibility.

Additional Notes

  • The defineScraper function provides flexible validation and extraction but requires additional configuration when using nested schemas.
  • Tests cover a range of HTML structures; however, additional cases can be added if further flexibility is required.

@johnie added the enhancement label Oct 30, 2024

changeset-bot bot commented Oct 30, 2024

⚠️ No Changeset found

Latest commit: 4f56604

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types
