Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Fix XGBoost/LightGBM test execution and scikit-learn compatibility #846

Merged
merged 11 commits into from
Feb 9, 2025

Conversation

hainaweiben
Copy link
Collaborator

@hainaweiben hainaweiben commented Feb 9, 2025

What do these changes do?

  • Fix XGBoost/LightGBM integration tests not being executed
  • Rename force_all_finite to ensure_all_finite for scikit-learn API compatibility
  • Enhance complex data structure merging in distributed environments (support OrderedDict and nested dicts)
  • Fix remote code trust issues in HuggingFace dataset loading

Related issue number

Fixes #xxxx

Check code requirements

  • tests added / passed (if needed)
  • Ensure all linting tests pass

1. Replace force_all_finite with ensure_all_finite to fix FutureWarning
   - Update validation.py, _data.py and test_validation.py
   - This change aligns with sklearn's future deprecation plan

2. Fix DataConversionWarning in LabelEncoder
   - Add ravel() preprocessing before column_or_1d
   - This ensures input data is in the correct 1D format

3. Fix ValueError in lightgbm classifier test
   - Add rechunk after boolean filtering
   - This ensures chunk sizes match the actual data size

Fixes the following issues:
- FutureWarning about force_all_finite deprecation
- DataConversionWarning about column vector input
- ValueError about mismatched chunk sizes
This fixes the F821 undefined name 'force_all_finite' lint error in validation.py.
The parameter was already renamed to ensure_all_finite in other places,
but was missed in the check_X_y function.
This fixes the black formatting issue in the CI.
- Remove strict requirement for identical keys in dictionaries
- Add support for nested dictionaries and mixed value types
- Handle list and non-list values intelligently
- Fix XGBoost training history merging issue
Due to recent security updates in the HuggingFace datasets library,
explicitly set trust_remote_code=True when loading datasets with custom code.
This change ensures tests continue to work with newer versions of the datasets library
while maintaining security best practices.
@XprobeBot XprobeBot added the bug Something isn't working label Feb 9, 2025
Copy link

codecov bot commented Feb 9, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 82.06%. Comparing base (d0ef304) to head (f3aa57c).
Report is 1 commits behind head on main.

❌ Your project status has failed because the head coverage (82.06%) is below the target coverage (90.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #846      +/-   ##
==========================================
+ Coverage   82.04%   82.06%   +0.02%     
==========================================
  Files        1071     1071              
  Lines       80155    80162       +7     
  Branches    12203    12206       +3     
==========================================
+ Hits        65760    65783      +23     
- Misses      12834    12837       +3     
+ Partials     1561     1542      -19     
Flag Coverage Δ
unittests 81.96% <ø> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@luweizheng luweizheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@luweizheng luweizheng merged commit f5cb36f into xorbitsai:main Feb 9, 2025
21 of 23 checks passed
@hainaweiben hainaweiben deleted the fix-xgboost-lightgbm branch February 9, 2025 15:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants