Restructuring the Word Lists #134

abuango · 2022-04-19T16:49:47Z

This PR resolves #131 & #128.

The current wordlist pages list all words for each tier on a single page, which is difficult to scale. In the last Language workstream meeting, it was suggested that it would be more efficient to have individual files per word and a listing of the words per tier.

This PR solves the following:

Provides a template for adding new words
Restructure the file listing into wordlist > tier > word files (HTML & JSON), where the wordlist and tier folders contain index files listing the words in each tier
Generate both HTML and JSON files for each word and each tier. The JSON files can be consumed as an API endpoint.
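As a rough illustration of the wordlist > tier > word layout, the sketch below groups word files into per-tier listings. It assumes each word lives at `wordlist/<tier>/<word>/index.md`; that filename and the `group_words_by_tier` helper are hypothetical, and on the real site Hugo builds these listings itself from the index files.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_words_by_tier(content_paths):
    """Group word files such as 'wordlist/tier-1/abort/index.md' by tier folder.

    Hypothetical helper: assumes one folder per word under wordlist/<tier>/.
    """
    tiers = defaultdict(list)
    for raw in content_paths:
        parts = PurePosixPath(raw).parts
        # Expected shape: ('wordlist', '<tier>', '<word>', 'index.md'); skip anything else
        if len(parts) == 4 and parts[0] == "wordlist" and parts[3] == "index.md":
            tier, word = parts[1], parts[2]
            tiers[tier].append(word)
    # Sort tiers and the words within each tier for a stable listing
    return {tier: sorted(words) for tier, words in sorted(tiers.items())}
```

For example, `group_words_by_tier(["wordlist/tier-1/abort/index.md"])` yields `{"tier-1": ["abort"]}`.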

Reference:

Build a JSON API With Hugo's Custom Output Formats

markcmiller86 · 2022-04-19T16:58:58Z

@abuango Great Work 💪🏻

This looks like a wonderful start 🎉

I went looking for a preview URL and found only this one, which I either don't understand or which is not fully working. Do you have a full preview URL anywhere handy?

abuango · 2022-04-19T17:17:40Z

Oh, I didn't know it generated a preview. It's not added to the navigation yet because I wanted to work out the structure before polishing it, but here is the preview link to the skeletal test word page and here is what the JSON file will look like.

The only difference between the two is the file extension:

/wordlist/tier-1/abort/index.json

/wordlist/tier-1/abort/index.html

Hugo will generate both files from the markdown file of the word. I will clean it up tomorrow so we can have a preview.
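Because the two outputs differ only by extension, the mapping from a word's source file to its published paths can be sketched as below. The `content/` root and the `output_paths` name are assumptions; in a real build Hugo derives these paths from the output formats enabled in the site configuration.

```python
from pathlib import PurePosixPath

# Output formats assumed to be enabled for each word page: HTML plus a custom JSON format
OUTPUT_EXTENSIONS = ("html", "json")


def output_paths(source_file):
    """Map 'content/wordlist/tier-1/abort/index.md' to its published index.* paths."""
    path = PurePosixPath(source_file)
    # Drop the assumed 'content/' root so the result is relative to the site root
    rel_dir = path.parent.relative_to("content")
    return ["/" + str(rel_dir / f"index.{ext}") for ext in OUTPUT_EXTENSIONS]
```

For example, `output_paths("content/wordlist/tier-1/abort/index.md")` returns `["/wordlist/tier-1/abort/index.html", "/wordlist/tier-1/abort/index.json"]`.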

markcmiller86 · 2022-04-19T17:39:27Z

@abuango thanks for the links to the word page and json file.

Can you confirm...do authors of new word recommendations continue to compose their recommendations in markdown (maybe as a separate file here in the proper wordlist/tier-X directory), with that information then used to generate the json file (for API endpoints) and the word-list file? If so, that sounds cool.

I might recommend that the json file contain only a sparing amount of information, such as just...

the word (or phrase) to be replaced
- Do we need to think a bit more about how best to store this information in a json context to facilitate downstream matching and replacement automation? Some issues I can think of are case sensitivity, punctuation (if the phrase has any), singular/plural forms, and derived words (e.g. segregate vs. segregation)
its tier (1, 2 or 3)
the INI recommended replacements (perhaps as an ordered array)
URL to the INI recommendation page where the word (or phrase) is reviewed.
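A minimal sketch of a record holding just the fields suggested above, assuming hypothetical field names (`term`, `tier`, `recommendations`, `url`). The `match_key` method only normalizes case and extra whitespace; punctuation, singular/plural forms, and derived words would need further rules that this sketch does not attempt.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class TermRecord:
    """One wordlist entry with a sparing set of fields (names are hypothetical)."""

    term: str  # word or phrase to be replaced
    tier: int  # 1, 2 or 3
    recommendations: List[str] = field(default_factory=list)  # ordered, most preferred first
    url: str = ""  # INI page where the term is reviewed

    def match_key(self) -> str:
        # Case-insensitive key with whitespace collapsed; no plural/derived-word handling
        return " ".join(self.term.lower().split())

    def to_json(self) -> str:
        # Sorted keys give stable output, which keeps diffs of generated files small
        return json.dumps(asdict(self), sort_keys=True)
```

Keeping the replacements as an ordered list lets downstream tools simply take the first entry as the default suggestion.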

abuango · 2022-04-19T18:10:56Z

Yes, new word recommendations are created using markdown in the respective tier folder.

And thank you for the recommended JSON fields; it will be cleaner that way.

markcmiller86 · 2022-04-21T13:50:56Z

It looks like, quite by coincidence, another developer, @jamesgeddes, has proposed an example of how the json file should be structured in this issue on the INI org repo.

jamesgeddes · 2022-04-21T13:55:36Z

Thanks @markcmiller86 !

As I suggested in inclusivenaming/org#108, the main version of the INI suggested language list should live in its own repo. This then allows any client, including the INI website, to use it as a single source of truth.

Muddling it in with the INI website could make things unclear.

jamesgeddes · 2022-04-21T14:04:54Z

Additionally, I would suggest that the process(es) for adding new terms be kept separate from the wordlist itself, so it would be best practice to split these two features into two PRs.

Another benefit of having it in its own repo is that updates can be done both via PR and via a GUI.

jamesgeddes · 2022-04-21T14:14:43Z

Regarding the efficiency of "individual files per word", I would suggest that simply allowing clients to download one file and loop over it is probably a simpler solution than building out an INI API, which would require:

many calls per client
additional hosting costs for the INI
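The single-file approach can be sketched as a client that parses the downloaded list once and then does all lookups locally, with no per-term calls. This assumes the file is a JSON array of objects with `term` and `recommendations` keys (hypothetical names) and leaves out the download itself; it also matches single words only, not multi-word phrases.

```python
import json
import re


def load_terms(raw_json):
    """Index the downloaded list by lower-cased term for quick local lookups."""
    return {entry["term"].lower(): entry for entry in json.loads(raw_json)}


def flag_terms(text, index):
    """Return (word, recommendations) for each listed single-word term found in text."""
    hits = []
    for word in re.findall(r"[A-Za-z][A-Za-z-]*", text):
        entry = index.get(word.lower())
        if entry is not None:
            hits.append((word, entry.get("recommendations", [])))
    return hits
```

The trade-off is one larger download up front in exchange for zero further requests to INI infrastructure.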

markcmiller86 · 2022-04-21T14:15:03Z

@jamesgeddes in comments 1 and 2 above you have raised logistical questions that are far beyond my bailiwick. I can be sure to bring these questions to others' attention, though.

On the specific issue of having the json file in its own repo...I see your point about it maybe being hard to find here, but I think that is perhaps fixable using other approaches. That said, our intention is for it to become an auto-generated work product, and the true source of INI language recommendations remains the hosted web pages, which include the author-crafted (and researched) word/phrase recommendations.

jamesgeddes · 2022-04-21T14:19:31Z

@markcmiller86 The true source must be the one that everyone reads from, which would be the main JSON. The INI website would be a client of it, compiling the list based on the JSON. Separately, it would also have the ability to write to it. This would still allow the website to generate and update the JSON without humans needing to write JSON manually.

From a human perspective, it makes zero difference; it's purely a technicality.

markcmiller86 · 2022-04-21T14:19:58Z

Regarding the efficiency of "individual files per word", I would suggest that simply allowing clients to download one file and loop over it is probably a simpler solution than building out an INI API, which would require

Sorry...perhaps I mis-wrote. What I think we mean is that the machine readable version of published and released INI recommendations will take the form of the json file...which will be released on a still TBD periodic basis. Perhaps for each release, we host that file somewhere other than the website repo. Downstream tools just take up that file.

jamesgeddes · 2022-04-21T14:22:18Z

Downstream tools just take up that file

I think we might be circling around agreement here 😂

jamesgeddes · 2022-04-21T14:36:25Z

Here is a very rough sketch of my suggestion.

Using this method, the INI would still be able to ensure that the SSoT is updated via GUI, but it would get the added benefits of

version control
separation
clarity
openness

abuango · 2022-04-21T16:15:14Z

I understand the approach @jamesgeddes is proposing, and I agree with separating the wordlist from the website: it makes contributing to it easier and can form a single source of truth that the website also consumes. The proposed wordlist repo can be owned by the Language workstream and used mainly for the wordlist, separate from the website repo. As @markcmiller86 mentioned, this will be shared with the rest of the folks in the workstream at the next meeting.

My only concern about maintaining a single file is scalability as the wordlist grows. Yes, it makes things a lot easier for the clients, but maintaining a single file that could grow to thousands of lines in the near future needs to be looked into carefully. At the very least, I would suggest organising the terms into separate files for each tier if individual word files are a major deal-breaker. cc: @quaid
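If one file does grow too large, the per-tier compromise could be produced mechanically from the same data. This sketch assumes each term is a dict with `term` and `tier` keys and uses a hypothetical `tier-N.json` naming scheme; sorting within each tier keeps diffs small when terms are added.

```python
import json
from collections import defaultdict


def split_by_tier(terms):
    """Split one list of term dicts into {filename: json_text}, one file per tier.

    The 'tier-N.json' filenames are hypothetical.
    """
    by_tier = defaultdict(list)
    for entry in terms:
        by_tier[entry["tier"]].append(entry)
    files = {}
    for tier in sorted(by_tier):
        # Alphabetical order within a tier keeps reviews of new additions readable
        entries = sorted(by_tier[tier], key=lambda e: e["term"].lower())
        files[f"tier-{tier}.json"] = json.dumps(entries, indent=2)
    return files
```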

abuango · 2022-04-21T16:37:01Z

@jamesgeddes The next Language Workstream meeting is April 26, 2022 at 11:30am Pacific time. You are welcome to join the call when we discuss this.

jamesgeddes · 2022-04-21T17:41:43Z

@abuango For me, the deciding factor is how often we would be updating the list. If a new list version is likely to be released every day, then I wholeheartedly agree that separate files make sense. If it is every month, then new versions can be compiled into a staging branch before they make it into the main branch. Glad we agree about separating it into its own repo 🙂

I'll be at the meeting next week, thanks for the invite!

abuango · 2022-04-26T18:54:04Z

Today's meeting didn't take place, so I could not share the PoC of the WordLists page. A preview is available and currently contains some real data mixed with test data, so we can see how it works. A page has also been created for the Word List term template. The key concept behind this template is the use of frontmatter entirely without content: Hugo generates the HTML and JSON files for each term using the data supplied in the frontmatter.
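To illustrate the frontmatter-only idea, the sketch below reads a term file whose entire content is frontmatter. It is not Hugo's parser: it handles only flat `key: value` pairs and `- item` lists, returns every value as a string, and the field names used in testing it are hypothetical. A real tool should use a proper YAML library.

```python
def parse_front_matter(text):
    """Parse a frontmatter-only term file delimited by '---' lines.

    Minimal illustration: supports 'key: value' pairs and '- item' lists only.
    """
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("expected frontmatter starting with '---'")
    data, current_list_key = {}, None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            break  # end of frontmatter; a term file has no body after this
        if stripped.startswith("- ") and current_list_key:
            data[current_list_key].append(stripped[2:].strip())
        elif ":" in stripped:
            # Split on the first colon only, so URL values stay intact
            key, _, value = stripped.partition(":")
            key, value = key.strip(), value.strip()
            if value:
                data[key] = value
                current_list_key = None
            else:
                data[key] = []  # an empty value means a list follows
                current_list_key = key
    return data
```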

quaid · 2022-05-24T19:08:05Z

Hey @abuango, we've had some missed meetings and lost track of closing this discussion. It does seem this is doing what we discussed in that previous meeting. Thank you so much! I'll look through it, see if I have any questions, and work out whether we need a meeting to decide or can do it async.

abuango · 2022-06-07T18:59:47Z

Feedback from Workstream call:

Add more details to the Template page, so contributors are well informed on how to add new content
Single JSON file for all terms
Add "Replacement terms" to JSON file
Move No-Change Tier down on the overview page and remove 0
Move "All terms" section on the overview page to the right and make it more visible.
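The "single JSON file for all terms" feedback, including replacement terms, might look like the aggregation below. It assumes each term page provides hypothetical `title`, `tier`, and `replacements` fields (for instance, as parsed from frontmatter) and is a sketch of the idea rather than how the site's Hugo templates actually build the file.

```python
import json


def build_single_file(term_pages):
    """Combine per-term page data into one JSON document covering all terms.

    Input keys ('title', 'tier', 'replacements') are hypothetical.
    """
    terms = [
        {
            "term": page["title"],
            "tier": int(page["tier"]),
            # Carry the replacement terms through, as requested in the feedback
            "replacements": list(page.get("replacements", [])),
        }
        for page in term_pages
    ]
    # Order by tier, then alphabetically, so the output is deterministic
    terms.sort(key=lambda t: (t["tier"], t["term"].lower()))
    return json.dumps({"terms": terms}, indent=2)
```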

abuango · 2022-06-10T07:12:48Z

@LarryKunz @markcmiller86 I have implemented the feedback from our last meeting, kindly review.

abuango added 2 commits April 16, 2022 05:57

Reorganize Wordlists

8ffdea5

Created wordlist layout

6f01937

abuango requested a review from a team as a code owner April 19, 2022 16:49

abuango self-assigned this Apr 19, 2022

abuango marked this pull request as draft April 19, 2022 16:50

markcmiller86 mentioned this pull request Apr 21, 2022

Adopt https://github.com/GeekZoneHQ/recommended-language inclusivenaming/org#108

Closed

Updated Wordlist configuration

ce8664e

abuango marked this pull request as ready for review April 26, 2022 18:30

abuango marked this pull request as draft April 26, 2022 18:30

quaid self-assigned this May 24, 2022

abuango added 2 commits June 7, 2022 19:19

Added existing terms

e18b8c8

Fix bug listing terms on wrong page

88dec92

Add All terms section to wordlist page

972f321

abuango changed the title ~~[WIP] Restructuring the Word Lists~~ Restructuring the Word Lists Jun 7, 2022

abuango added 2 commits June 10, 2022 05:00

Formatted pages & Created single JSON file

d0c568a

Remove link to inidividual term JSON file

7dfa5ad

Added link to Template from Wordlist overview page

73a18f0

abuango marked this pull request as ready for review July 19, 2022 20:26

taylorwaggoner approved these changes Jul 19, 2022


taylorwaggoner merged commit c4e45f1 into inclusivenaming:main Jul 19, 2022

abuango deleted the abubakar-wordlist-structure branch July 19, 2022 21:16
