Fix coclustering and isprint instabilities #124

Merged
merged 4 commits into from
Jan 17, 2024

marcboulle
Collaborator

@marcboulle marcboulle commented Jan 12, 2024

Four commits, with the following main fixes:

  • coclustering instability problem
  • introduction of a portable version of isprint

@marcboulle marcboulle added the Type/Enhancement New feature or request label Jan 12, 2024
@marcboulle marcboulle linked an issue Jan 12, 2024 that may be closed by this pull request
@marcboulle marcboulle force-pushed the 122-fix-datagrid-instability branch 8 times, most recently from 6029f4f to 90925b8 Compare January 16, 2024 13:41
@marcboulle marcboulle changed the base branch from dev to release-10-2-0 January 17, 2024 09:02
@marcboulle marcboulle force-pushed the 122-fix-datagrid-instability branch from f99bf16 to 0cc0940 Compare January 17, 2024 09:03
Hard-coded launch directory for starting a debug session from Visual C++ 2022
- impacts MODL.cpp and MODL_Coclustering.cpp
Windows implementation
- clean up only if the hFinf pointer is non-null
- fix to be ported to V11
Portability.h
    - p_isprint: introduction of a portable implementation
    - impacts on all existing isprint calls:
      - KWCLex.lex
      - KWDatabaseFormatDetector::ComputeSeparatorPriority
      - KWDREncrypt::InitWorkingArrays
      - KWTest
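
For illustration only, here is a minimal sketch of what a locale-independent printable-character test could look like. The name `PortableIsPrint` is hypothetical, and the restriction to the ASCII range 0x20 to 0x7E is an assumption; the actual `p_isprint` in Portability.h is not shown in this pull request text and may differ. The motivation is that the standard `isprint` depends on the current C locale, so bytes above 0x7F can be classified differently from one platform to another.

```cpp
// Hypothetical sketch, not the actual Khiops p_isprint.
// Assumption: only printable ASCII (space 0x20 through tilde 0x7E) counts
// as printable, independently of the C locale and of the platform.
inline bool PortableIsPrint(int c)
{
    return c >= 0x20 && c <= 0x7E;
}
```

Because the function takes an `int` and does a plain range check, it is also well defined for negative values and for bytes above 0x7F, where the standard `isprint` is either locale dependent or undefined.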

Tested that results are now actually stable across Windows, Linux, and Mac
- TestKhiops/Rules/EncryptRules
- TestKhiops/Bugs/DicoSpecialChars

Fix to be ported to V11
@marcboulle marcboulle force-pushed the 122-fix-datagrid-instability branch from 0cc0940 to dfd3c81 Compare January 17, 2024 10:30
@marcboulle marcboulle changed the title WIP step 1 Fix portability instabilities Jan 17, 2024
@marcboulle marcboulle changed the title Fix portability instabilities Fix coclustering and isprint instabilities Jan 17, 2024
Member

@folmos-at-orange folmos-at-orange left a comment

Small stuff.

@marcboulle marcboulle force-pushed the 122-fix-datagrid-instability branch from dfd3c81 to 48c251d Compare January 17, 2024 14:02
Fix in the optimization algorithms:
- everywhere, test for a cost improvement by replacing ~if (dCost < dBestCost)~ with ~if (dCost < dBestCost - dEpsilon)~
- main fix in KWDataGridOptimizer::OptimizeDataGrid
- impacts, to propagate the fix everywhere
  - DTDiscretizerMODL::DiscretizeNEW
  - DTDiscretizerMODL::DiscretizeOLD
  - DTDiscretizerMODL::DiscretizeGranularizedFrequencyTableNEW (for the null cost: < dBestCost + dEpsilon)
  - DTGrouperMODL::GroupPreprocessedTable
  - DTGrouperMODL::SmallSourceNumberGroup
  - KWDiscretizerMODL::Discretize
  - KWGrouperMODL::GroupPreprocessedTable
  - KWGrouperMODL::SmallSourceNumberGroup
  - KWDensityEstimationTest::SearchBestInstanceGridSize
  - MHDiscretizerHistogramMODL::GranularizedDiscretizeValues
  - MHDiscretizerHistogramMODL_fp::OptimizeGranularity
  - MHFloatingPointFrequencyTableBuilder::InitializeDomainBounds
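
As a rough sketch of the epsilon-guarded comparison described above: the condition `dCost < dBestCost - dEpsilon` comes from the commit message, while the helper names `IsCostImproved` and `SelectBestCost` and the candidate loop are hypothetical and only illustrate the idea. With a plain `<`, two candidates whose costs differ only by floating-point rounding noise can be ranked differently on different platforms; requiring an improvement larger than a tolerance keeps the first candidate in that case.

```cpp
#include <cstddef>
#include <vector>

// True only if dCost beats dBestCost by more than the tolerance dEpsilon.
// Differences at the level of rounding noise are not counted as improvements.
inline bool IsCostImproved(double dCost, double dBestCost, double dEpsilon)
{
    return dCost < dBestCost - dEpsilon;
}

// Hypothetical helper: scans candidate costs in order and returns the index
// of the retained one. A later candidate replaces the current best only if
// it improves the cost by more than dEpsilon. Returns 0 for an empty vector.
inline std::size_t SelectBestCost(const std::vector<double>& costs, double dEpsilon)
{
    std::size_t best = 0;
    for (std::size_t i = 1; i < costs.size(); ++i)
    {
        if (IsCostImproved(costs[i], costs[best], dEpsilon))
            best = i;
    }
    return best;
}
```

The value of the tolerance is not given in the source; any concrete value passed as `dEpsilon` is the caller's choice.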

Stabilization of the random partition selection, which is based on a Shuffle followed by a Sort
- the sort gives different orders on Windows and Linux when the sort criterion is tied
- fixed by storing a random index after the Shuffle, then using that index as a secondary sort criterion
  - KWDataGridManager::SortAttributeParts: sort reimplemented using the random index in case of ties
  - KWSortableSymbolCompareValue: index used as a secondary sort criterion
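
The tie-breaking idea can be sketched as follows. The `Part` structure, its fields, the function names, and the descending order on frequency are all assumptions made for illustration; they are not taken from KWDataGridManager. The point is that `std::sort` does not specify the relative order of elements that compare equal, so the comparator is made a total order by falling back on the index recorded after the shuffle.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical element type: a part with a primary sort criterion and
// the position it received after the shuffle.
struct Part
{
    std::string label;
    int frequency;   // primary sort criterion
    int randomIndex; // position after the shuffle, unique per part
};

// Records the current (already shuffled) position of each part.
inline void AssignRandomIndexes(std::vector<Part>& parts)
{
    for (std::size_t i = 0; i < parts.size(); ++i)
        parts[i].randomIndex = static_cast<int>(i);
}

// Total order: decreasing frequency, then increasing shuffle position.
// No two distinct parts compare equal, so the sorted order does not
// depend on how the library's sort treats ties.
inline bool ComparePart(const Part& a, const Part& b)
{
    if (a.frequency != b.frequency)
        return a.frequency > b.frequency;
    return a.randomIndex < b.randomIndex;
}

inline void SortParts(std::vector<Part>& parts)
{
    std::sort(parts.begin(), parts.end(), ComparePart);
}
```

This only makes the sort step reproducible; the shuffle itself must also produce the same permutation on every platform for the final result to match.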

Tested on the 12 datasets that were unstable across Windows, Linux, and Mac
- a few reference test sets have changed
- results are now identical on all three OSes

Tested on the full LearningTest suite

Test sets stored in git for non-regression testing
- test\LearningTest\TestKhiops\Standard\IrisU
  - moved from Standard-unstable, which is now removed
- test\LearningTest\TestCoclustering\Standard\Adult2varsTiny
  - moved from Standard-unstable, which is now removed

Fix to be ported to V11
@marcboulle marcboulle force-pushed the 122-fix-datagrid-instability branch from 48c251d to 133ec96 Compare January 17, 2024 14:35
@marcboulle marcboulle merged commit 3bc17ff into release-10-2-0 Jan 17, 2024
26 checks passed
@marcboulle marcboulle deleted the 122-fix-datagrid-instability branch January 17, 2024 14:50
Labels
Type/Enhancement New feature or request
Development

Successfully merging this pull request may close these issues.

Fix datagrid instability
2 participants