Fix nvtext::generate_character_ngrams performance regression for longer strings #13874

Merged

@davidwendt (Contributor) commented Aug 14, 2023

Description

Fixes a performance regression in nvtext::generate_character_ngrams. The regression was introduced when common code was factored out while adding the nvtext::hash_character_ngrams function (reference #13654). Undoing that refactoring (defactoring) fixes the regression by reinstating the original kernel logic for nvtext::generate_character_ngrams. The two functions share only about 6 lines of code, so the defactoring is expected to require minimal maintenance.
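
For readers unfamiliar with the operation, the following is a minimal CPU sketch of what generating character ngrams means. It is not the libcudf kernel changed by this PR: the function name `character_ngrams_reference` is hypothetical, and the sketch assumes single-byte (ASCII) characters and a flat output list.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical CPU reference, NOT the libcudf implementation.
// Returns every contiguous substring of `n` bytes taken from each input
// string. Ngrams never span two strings, and strings shorter than `n`
// contribute nothing. Assumes one byte per character (ASCII).
std::vector<std::string> character_ngrams_reference(
  const std::vector<std::string>& strings, std::size_t n)
{
  assert(n > 0);  // an ngram must contain at least one character
  std::vector<std::string> out;
  for (const auto& s : strings) {
    if (s.size() < n) continue;  // too short to form a single ngram
    for (std::size_t pos = 0; pos + n <= s.size(); ++pos) {
      out.push_back(s.substr(pos, n));
    }
  }
  return out;
}
```

In this sketch, the input `{"ab", "cde", "fgh"}` with `n = 2` yields `ab`, `cd`, `de`, `fg`, `gh`. A string of length L produces L - n + 1 ngrams, so the amount of work depends on both the row count and the string length, which are the two parameters varied in the benchmarks in this conversation.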

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@davidwendt added the labels bug, 2 - In Progress, libcudf, strings, and non-breaking on Aug 14, 2023
@davidwendt self-assigned this on Aug 14, 2023
@davidwendt (Contributor, Author)

Benchmark results

Baseline (before refactor)
--------------------------------------------------------------------------------------------------------
Benchmark                                              Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------------------------------
TextNGrams/characters/4096/5/manual_time           0.083 ms        0.102 ms         7867 bytes_per_second=93.6258M/s
TextNGrams/characters/4096/10/manual_time          0.090 ms        0.108 ms         7785 bytes_per_second=215.997M/s
TextNGrams/characters/4096/20/manual_time          0.112 ms        0.130 ms         6190 bytes_per_second=345.847M/s
TextNGrams/characters/4096/40/manual_time          0.148 ms        0.165 ms         4738 bytes_per_second=526.029M/s
TextNGrams/characters/32768/5/manual_time          0.081 ms        0.098 ms         8688 bytes_per_second=768.627M/s
TextNGrams/characters/32768/10/manual_time         0.100 ms        0.116 ms         7041 bytes_per_second=1.51678G/s
TextNGrams/characters/32768/20/manual_time         0.122 ms        0.138 ms         5750 bytes_per_second=2.48205G/s
TextNGrams/characters/32768/40/manual_time         0.176 ms        0.192 ms         3963 bytes_per_second=3.42638G/s
TextNGrams/characters/262144/5/manual_time         0.097 ms        0.113 ms         7243 bytes_per_second=5.00418G/s
TextNGrams/characters/262144/10/manual_time        0.154 ms        0.170 ms         4549 bytes_per_second=7.85904G/s
TextNGrams/characters/262144/20/manual_time        0.296 ms        0.314 ms         2363 bytes_per_second=8.18583G/s
TextNGrams/characters/262144/40/manual_time        0.761 ms        0.779 ms          913 bytes_per_second=6.37766G/s
TextNGrams/characters/2097152/5/manual_time        0.303 ms        0.319 ms         2311 bytes_per_second=12.7862G/s
TextNGrams/characters/2097152/10/manual_time       0.574 ms        0.591 ms         1219 bytes_per_second=16.9356G/s
TextNGrams/characters/2097152/20/manual_time        1.53 ms         1.55 ms          456 bytes_per_second=12.6629G/s
TextNGrams/characters/2097152/40/manual_time        4.90 ms         4.92 ms          143 bytes_per_second=7.92802G/s
TextNGrams/characters/16777216/5/manual_time        1.88 ms         1.89 ms          373 bytes_per_second=16.5321G/s
TextNGrams/characters/16777216/10/manual_time       3.95 ms         3.97 ms          177 bytes_per_second=19.6652G/s
TextNGrams/characters/16777216/20/manual_time       11.7 ms         11.7 ms           60 bytes_per_second=13.3073G/s
TextNGrams/characters/16777216/40/manual_time       37.9 ms         37.9 ms           18 bytes_per_second=8.20249G/s

After refactor

--------------------------------------------------------------------------------------------------------
Benchmark                                              Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------------------------------
TextNGrams/characters/4096/5/manual_time           0.084 ms        0.102 ms         7877 bytes_per_second=92.7183M/s
TextNGrams/characters/4096/10/manual_time          0.090 ms        0.108 ms         7737 bytes_per_second=214.974M/s
TextNGrams/characters/4096/20/manual_time          0.113 ms        0.130 ms         6146 bytes_per_second=344.499M/s
TextNGrams/characters/4096/40/manual_time          0.151 ms        0.168 ms         4631 bytes_per_second=514.544M/s
TextNGrams/characters/32768/5/manual_time          0.081 ms        0.098 ms         8573 bytes_per_second=766.268M/s
TextNGrams/characters/32768/10/manual_time         0.099 ms        0.115 ms         7080 bytes_per_second=1.53308G/s
TextNGrams/characters/32768/20/manual_time         0.122 ms        0.138 ms         5764 bytes_per_second=2.48131G/s
TextNGrams/characters/32768/40/manual_time         0.176 ms        0.192 ms         3969 bytes_per_second=3.42508G/s
TextNGrams/characters/262144/5/manual_time         0.095 ms        0.111 ms         7354 bytes_per_second=5.09831G/s
TextNGrams/characters/262144/10/manual_time        0.151 ms        0.167 ms         4626 bytes_per_second=8.02426G/s
TextNGrams/characters/262144/20/manual_time        0.293 ms        0.311 ms         2384 bytes_per_second=8.26735G/s
TextNGrams/characters/262144/40/manual_time         1.06 ms         1.08 ms          657 bytes_per_second=4.5601G/s
TextNGrams/characters/2097152/5/manual_time        0.294 ms        0.310 ms         2385 bytes_per_second=13.1984G/s
TextNGrams/characters/2097152/10/manual_time       0.559 ms        0.576 ms         1251 bytes_per_second=17.4048G/s
TextNGrams/characters/2097152/20/manual_time        1.62 ms         1.63 ms          433 bytes_per_second=12.0312G/s
TextNGrams/characters/2097152/40/manual_time        8.44 ms         8.46 ms           83 bytes_per_second=4.60654G/s
TextNGrams/characters/16777216/5/manual_time        1.80 ms         1.81 ms          389 bytes_per_second=17.2602G/s
TextNGrams/characters/16777216/10/manual_time       3.82 ms         3.84 ms          183 bytes_per_second=20.3582G/s
TextNGrams/characters/16777216/20/manual_time       12.3 ms         12.4 ms           57 bytes_per_second=12.6016G/s
TextNGrams/characters/16777216/40/manual_time       66.4 ms         66.4 ms           11 bytes_per_second=4.68516G/s

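The regression mostly affects longer strings (length 40) at large row counts (2097152 and 16777216 rows), where the time rose from 4.90 ms to 8.44 ms and from 37.9 ms to 66.4 ms, respectively.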

Defactoring the code fixed the regression:

--------------------------------------------------------------------------------------------------------
Benchmark                                              Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------------------------------
TextNGrams/characters/4096/5/manual_time           0.083 ms        0.101 ms         7922 bytes_per_second=93.8831M/s
TextNGrams/characters/4096/10/manual_time          0.089 ms        0.106 ms         7888 bytes_per_second=219M/s
TextNGrams/characters/4096/20/manual_time          0.110 ms        0.127 ms         6361 bytes_per_second=353.994M/s
TextNGrams/characters/4096/40/manual_time          0.144 ms        0.160 ms         4873 bytes_per_second=541.144M/s
TextNGrams/characters/32768/5/manual_time          0.081 ms        0.099 ms         8584 bytes_per_second=761.802M/s
TextNGrams/characters/32768/10/manual_time         0.098 ms        0.114 ms         7165 bytes_per_second=1.54836G/s
TextNGrams/characters/32768/20/manual_time         0.119 ms        0.135 ms         5876 bytes_per_second=2.53698G/s
TextNGrams/characters/32768/40/manual_time         0.172 ms        0.188 ms         4069 bytes_per_second=3.5083G/s
TextNGrams/characters/262144/5/manual_time         0.095 ms        0.111 ms         7351 bytes_per_second=5.08841G/s
TextNGrams/characters/262144/10/manual_time        0.150 ms        0.165 ms         4671 bytes_per_second=8.10945G/s
TextNGrams/characters/262144/20/manual_time        0.293 ms        0.311 ms         2385 bytes_per_second=8.27545G/s
TextNGrams/characters/262144/40/manual_time        0.738 ms        0.756 ms          949 bytes_per_second=6.57469G/s
TextNGrams/characters/2097152/5/manual_time        0.303 ms        0.318 ms         2313 bytes_per_second=12.8132G/s
TextNGrams/characters/2097152/10/manual_time       0.564 ms        0.581 ms         1238 bytes_per_second=17.24G/s
TextNGrams/characters/2097152/20/manual_time        1.53 ms         1.55 ms          456 bytes_per_second=12.6741G/s
TextNGrams/characters/2097152/40/manual_time        4.75 ms         4.76 ms          147 bytes_per_second=8.1885G/s
TextNGrams/characters/16777216/5/manual_time        1.88 ms         1.89 ms          373 bytes_per_second=16.5346G/s
TextNGrams/characters/16777216/10/manual_time       3.88 ms         3.90 ms          180 bytes_per_second=20.0379G/s
TextNGrams/characters/16777216/20/manual_time       11.7 ms         11.7 ms           60 bytes_per_second=13.3186G/s
TextNGrams/characters/16777216/40/manual_time       36.8 ms         36.8 ms           19 bytes_per_second=8.46076G/s
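
To make the comparison across the three tables concrete, here is a small sketch that computes slowdown ratios for the string-length-40 rows. The times are copied from the tables above; the `Row` struct, the `kLength40` array, and the `slowdown` helper are hypothetical names introduced only for this illustration.

```cpp
#include <array>
#include <cassert>

// Times in milliseconds, copied from the length-40 rows of the three tables.
struct Row {
  long rows;             // number of rows in the benchmark
  double baseline_ms;    // before the refactor
  double refactored_ms;  // after the refactor (regressed)
  double defactored_ms;  // with this PR applied
};

constexpr std::array<Row, 3> kLength40 = {{
  {262144, 0.761, 1.06, 0.738},
  {2097152, 4.90, 8.44, 4.75},
  {16777216, 37.9, 66.4, 36.8},
}};

// Ratio of a measured time to its baseline; values above 1.0 mean slower.
double slowdown(double baseline_ms, double measured_ms)
{
  assert(baseline_ms > 0.0);
  return measured_ms / baseline_ms;
}
```

With these numbers, the refactored code is roughly 1.4x, 1.7x, and 1.75x slower than the baseline at 262144, 2097152, and 16777216 rows, while the defactored code comes in at about 0.97x of the baseline in all three cases.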

@davidwendt added the label 3 - Ready for Review and removed the label 2 - In Progress on Aug 14, 2023
@davidwendt marked this pull request as ready for review on August 14, 2023 17:37
@davidwendt requested a review from a team as a code owner on August 14, 2023 17:37
@davidwendt changed the title Fix nvtext::generate_character_ngrams performance regression for longer strings Fix nvtext::generate_character_ngrams performance regression for longer strings Aug 14, 2023
@davidwendt requested a review from bdice on August 15, 2023 14:00
@davidwendt (Contributor, Author)

/merge

@rapids-bot (bot) merged commit 709b15f into rapidsai:branch-23.10 on Aug 16, 2023
@davidwendt deleted the fix-char-ngram-perf-regression branch on August 16, 2023 13:09