Improve Pagefind's handling of special word characters
bglw committed Jul 29, 2022
1 parent ebdf4b1 commit 2e774c7
Showing 2 changed files with 35 additions and 2 deletions.
32 changes: 32 additions & 0 deletions pagefind/features/characters.feature
@@ -0,0 +1,32 @@
Feature: Character Tests
    Background:
        Given I have the environment variables:
            | PAGEFIND_SOURCE | public |
        Given I have a "public/index.html" file with the body:
            """
            <p data-url>Nothing</p>
            """

    Scenario: Pagefind matches special characters
        Given I have a "public/apiary/index.html" file with the body:
            """
            <h1>Béës</h1>
            """
        When I run my program
        Then I should see "Running Pagefind" in stdout
        Then I should see the file "public/_pagefind/pagefind.js"
        When I serve the "public" directory
        When I load "/"
        When I evaluate:
            """
            async function() {
                let pagefind = await import("/_pagefind/pagefind.js");
                let search = await pagefind.search("Béës");
                let data = await search.results[0].data();
                document.querySelector('[data-url]').innerText = data.url;
            }
            """
        Then There should be no logs
        Then The selector "[data-url]" should contain "/apiary/"
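This scenario guards against the bug the commit fixes. The sketch below is a minimal illustration. It assumes the two `replace` chains from the `search.js` hunk can be lifted into standalone functions. The names `oldNormalize` and `newNormalize` are hypothetical and are not Pagefind API.

With the old chain, the query `Béës` is reduced to `bs`. Without the `u` flag, JavaScript's `\w` matches only ASCII `[A-Za-z0-9_]`, so `é` and `ë` are treated as special characters and stripped. The search term then presumably no longer matches the word as it was indexed.

```javascript
// Hypothetical helpers reproducing the query normalization before and after this commit.

// Before: remove everything that is not \w or whitespace. Without the `u` flag,
// \w is ASCII-only ([A-Za-z0-9_]), so accented letters are removed too.
function oldNormalize(term) {
  return term.toLowerCase().trim().replace(/[^\w\s]/g, "").replace(/\s{2,}/g, " ").trim();
}

// After: remove only an explicit list of ASCII punctuation, leaving letters such as é intact.
function newNormalize(term) {
  return term
    .toLowerCase()
    .trim()
    .replace(/[\.`~!@#\$%\^&\*\(\)\{\}\[\]\\\|:;'",<>\/\?]/g, "")
    .replace(/\s{2,}/g, " ")
    .trim();
}

console.log(oldNormalize("Béës")); // bs
console.log(newNormalize("Béës")); // béës
console.log(newNormalize('  "Hello,   World?"  ')); // hello world
```

The explicit list only covers a subset of ASCII punctuation. Characters such as `-`, `_`, `=` and `+`, along with non-ASCII punctuation such as `«`, pass through the new chain unchanged.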
5 changes: 3 additions & 2 deletions pagefind/src/output/stubs/search.js
@@ -190,9 +190,10 @@ class Pagefind {
         const log = str => { if (options.verbose) console.log(str) };
         let start = Date.now();
         let ptr = await this.getPtr();
-        // Strip special characters to match the indexing operation
         let exact_search = /^\s*".+"\s*$/.test(term);
-        term = term.toLowerCase().trim().replace(/[^\w\s]/g, "").replace(/\s{2,}/g, " ").trim();
+        // Strip special characters to match the indexing operation
+        // TODO: Maybe move regex over the wasm boundary, or otherwise work to match the Rust regex engine
+        term = term.toLowerCase().trim().replace(/[\.`~!@#\$%\^&\*\(\)\{\}\[\]\\\|:;'",<>\/\?]/g, "").replace(/\s{2,}/g, " ").trim();
 
         let filter_list = [];
         for (let [filter, values] of Object.entries(options.filters)) {
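The `TODO` in this hunk notes that the JavaScript regex should ideally match the Rust regex engine used at indexing time. The sketch below shows one possible direction; it is not what the commit does, and `normalizeUnicode` is a hypothetical name.

It assumes the indexer strips characters outside a Unicode-aware `\w`, which is the default in Rust's `regex` crate. There, `\w` covers `Alphabetic`, `Mark`, `Decimal_Number`, `Connector_Punctuation` and `Join_Control`. The same class can be expressed with ECMAScript Unicode property escapes.

```javascript
// Hypothetical sketch, not part of this commit: approximate a Unicode-aware
// [^\w\s] (as defined by Rust's regex crate) using ECMAScript property escapes.
// Requires the `u` flag and an engine with ES2018 Unicode property escapes.
const NON_WORD_UNICODE = /[^\p{Alphabetic}\p{M}\p{Nd}\p{Pc}\p{Join_Control}\s]/gu;

function normalizeUnicode(term) {
  return term
    .toLowerCase()
    .trim()
    .replace(NON_WORD_UNICODE, "") // drop anything that is neither a word character nor whitespace
    .replace(/\s{2,}/g, " ")       // collapse runs of whitespace into one space
    .trim();
}

console.log(normalizeUnicode("Béës!"));       // béës
console.log(normalizeUnicode("snake_case"));  // snake_case
console.log(normalizeUnicode("«Grüß Gott»")); // grüß gott
```

Unlike the explicit list in the commit, this approach also strips `-`, `=`, `+` and non-ASCII punctuation such as `«`. Whether that matches the indexer is an assumption; a scenario like the one in `characters.feature` would be needed to confirm that both sides agree.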
