Restructuring the Word Lists #134

Merged

Conversation

abuango
Contributor

@abuango abuango commented Apr 19, 2022

This PR resolves #131 & #128.

The current wordlist pages list all words for each tier on a single page, which is difficult to scale. In the last Language workstream meeting, it was suggested that it would be more efficient to have individual files per word and a listing of the words per tier.

This PR solves the following:

  • Provides a template for adding new words
  • Restructures the file listing into wordlist > tier > word files (HTML & JSON), where the wordlist and tier folders contain index files for listing the words in each tier
  • Generates both HTML and JSON files for each word and the tiers. The JSON files can be consumed as an API endpoint.

Reference:

@abuango abuango requested a review from a team as a code owner April 19, 2022 16:49
@abuango abuango self-assigned this Apr 19, 2022
@abuango abuango marked this pull request as draft April 19, 2022 16:50
@markcmiller86
Contributor

@abuango Great Work 💪🏻

This looks like a wonderful start 🎉

I went looking for a preview URL and found only this one which either I don't understand or is not fully working. Do you have a full preview URL anywhere handy?

@abuango
Contributor Author

abuango commented Apr 19, 2022

@abuango Great Work 💪🏻

This looks like a wonderful start 🎉

I went looking for a preview URL and found only this one which either I don't understand or is not fully working. Do you have a full preview URL anywhere handy?

Oh, I didn't know it generated a preview. It's not added to the navigation yet; I wanted to figure out the structure before polishing it. Here is the preview link to the skeletal test word page, and here is what the JSON file will look like.

The difference between both is the file extension:

wordlist/tier-1/abort/index.json

/wordlist/tier-1/abort/index.html

Hugo will generate both files from the markdown file of the word. I will clean it up tomorrow so we can have a preview.

@markcmiller86
Contributor

@abuango thanks for the links to the word page and json file.

Can you confirm...do authors of new word recommendations continue to compose their recommendations in markdown (maybe as a separate file here in the proper wordlist/tier-X directory), and then that information gets used to generate the json file (for API endpoints) and word-list file? If so, that sounds cool.

I might recommend that the json file contain a sparing amount of information such as just...

  • the word (or phrase) to be replaced
    • Do we need to think a bit more about how to best store this information in json context to facilitate downstream matching and replacement automation? Some issues I can think of are case sensitivity, punctuation (if the phrase has any), singular/plural, derived words (e.g. segregate vs. segregation)
  • its tier (1, 2 or 3)
  • the INI recommended replacements (perhaps as an ordered array)
  • URL to the INI recommendation page where the word (or phrase) is reviewed.
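A minimal sketch of what such a sparing record could look like, written in Python for illustration. The field names (`term`, `tier`, `replacements`, `url`), the placeholder replacement values, and the example.org URL are all assumptions, not anything agreed in this thread.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class TermRecord:
    """One word-list entry. All field names are hypothetical."""
    term: str                                              # word or phrase to be replaced
    tier: int                                              # 1, 2 or 3
    replacements: List[str] = field(default_factory=list)  # ordered, preferred first
    url: str = ""                                          # page where the term is reviewed


def to_json(record: TermRecord) -> str:
    """Serialize a record, keeping the field order declared above."""
    return json.dumps(asdict(record), indent=2)


# Placeholder data only; these are not real INI recommendations.
record = TermRecord(
    term="abort",
    tier=1,
    replacements=["placeholder-a", "placeholder-b"],
    url="https://example.org/wordlist/tier-1/abort/",
)
print(to_json(record))
```

Storing the term as a plain string leaves case, punctuation, and plural handling to the consuming tool, which is one possible answer to the matching question above, not the only one.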

@abuango
Contributor Author

abuango commented Apr 19, 2022

Can you confirm...do authors of new word recommendations continue to compose their recommendations in markdown (maybe as a separate file here in the proper wordlist/tier-X directory), and then that information gets used to generate the json file (for API endpoints) and word-list file? If so, that sounds cool.

Yes, new word recommendations are created using markdown in the respective tier folder.

And thank you for the recommended fields for JSON; it will be cleaner that way.

@markcmiller86
Contributor

It looks like, quite by coincidence, another developer, @jamesgeddes, has proposed an example of how the json file should be structured in this issue, which is on the INI org repo.

@jamesgeddes

jamesgeddes commented Apr 21, 2022

Thanks @markcmiller86 !

As I suggested in inclusivenaming/org#108, the main version of the INI suggested language list should live in its own repo. This then allows any client, including the INI website, to use it as a single source of truth.

Muddling it in with the INI website could make things unclear.

@jamesgeddes

jamesgeddes commented Apr 21, 2022

Additionally, I would suggest that the process(es) for adding new terms should be kept separate to the wordlist itself, so it would be best practice to separate these two features into two PRs.

Another benefit of having it in its own repo is that updates can be done both via PR and via a GUI.

@jamesgeddes

Regarding the efficiency of "individual files per word", I would suggest that simply allowing clients to download one file and loop over it is probably a simpler solution than building out an INI API, which would require

  • many calls per client
  • additional hosting costs for the INI
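As a rough illustration of the "download one file and loop over it" idea, here is a Python sketch of a client. The JSON layout, the field names, and the sample entries are assumptions; the string constant stands in for the contents of a downloaded file.

```python
import json
import re

# Stand-in for the contents of the single downloaded file; the layout is assumed.
WORDLIST_JSON = """
[
  {"term": "abort", "tier": 1, "replacements": ["placeholder-a"]},
  {"term": "example phrase", "tier": 2, "replacements": ["placeholder-b"]}
]
"""


def find_terms(text, entries):
    """Return (term, tier) pairs for every listed term found in text."""
    hits = []
    for entry in entries:
        # Whole-word, case-insensitive match. Plurals and derived forms
        # (e.g. "aborted") are deliberately not handled in this sketch.
        pattern = r"\b" + re.escape(entry["term"]) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            hits.append((entry["term"], entry["tier"]))
    return hits


entries = json.loads(WORDLIST_JSON)
print(find_terms("Press Ctrl-C to Abort the job.", entries))
```

With this shape a client makes one request per release and does all matching locally, so no per-term calls are needed.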

@markcmiller86
Contributor

@jamesgeddes in comments 1 and 2 above you have proposed ideas for logistical questions that are far beyond my bailiwick. I can be sure to bring these questions to others' attention though.

On the specific issue of having the json file in its own repo...I see your point about it maybe being hard to find here. But, I think that is perhaps fixable using other approaches. That said, our intention is for it to become an auto-generated work-product and the true source of INI language recommendations remains the hosted web pages which include the author-crafted (and researched) word/phrase recommendations.

@jamesgeddes

@markcmiller86 The true source must be the one that everyone reads from, which would be the JSON main. The INI website would be a client of it, compiling the list based on the JSON. Separately, it would also be able to write to it. This would still allow the website to generate and update the JSON without humans needing to manually write JSON.

From a human perspective, it makes zero difference; it's purely a technicality.

@markcmiller86
Contributor

Regarding the efficiency of "individual files per word", I would suggest that simply allowing clients to download one file and loop over it is probably a simpler solution than building out an INI API, which would require

Sorry...perhaps I mis-wrote. What I think we mean is that the machine readable version of published and released INI recommendations will take the form of the json file...which will be released on a still TBD periodic basis. Perhaps for each release, we host that file somewhere other than the website repo. Downstream tools just take up that file.

@jamesgeddes

Downstream tools just take up that file

I think we might be circling around agreement here 😂

@jamesgeddes

Here is a very rough sketch of my suggestion.

[Image: ini-json]

Using this method, the INI would still be able to ensure that the SSoT is updated via GUI, but it would get the added benefits of

  • version control
  • separation
  • clarity
  • openness

@abuango
Contributor Author

abuango commented Apr 21, 2022

I understand the approach @jamesgeddes is proposing, and I agree with separating the wordlist from the website: it makes contributing easier and can form a single source of truth that the website also consumes. The proposed wordlist repo can be owned by the Language workstream and used mainly for the wordlist, separating it from the website repo. As @markcmiller86 mentioned, this will be shared with the rest of the folks in the workstream in the next meeting.

My only concern about maintaining a single file is scalability as the wordlist grows. Yes, it makes things a lot easier for clients, but maintaining a single file that could grow to thousands of lines in the near future needs to be looked into carefully. If individual word files are a deal-breaker, I would suggest at the very least organising the terms into separate files for each tier. cc: @quaid
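One way to reconcile the two positions is to maintain the terms in per-tier files and compile them into a single file for clients. The Python sketch below assumes hypothetical file names (`tier-1.json`, `tier-2.json`, `wordlist.json`) and a hypothetical entry layout, and it uses a temporary directory so it can run anywhere.

```python
import json
import tempfile
from pathlib import Path


def merge_tiers(source_dir: Path) -> list:
    """Combine every tier-*.json file in source_dir into one sorted list."""
    merged = []
    for tier_file in sorted(source_dir.glob("tier-*.json")):
        merged.extend(json.loads(tier_file.read_text(encoding="utf-8")))
    # Sort by tier, then alphabetically, so the compiled file diffs cleanly.
    return sorted(merged, key=lambda entry: (entry["tier"], entry["term"]))


# Demo with throwaway files; the entries are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "tier-1.json").write_text(
        json.dumps([{"term": "abort", "tier": 1}]), encoding="utf-8"
    )
    (root / "tier-2.json").write_text(
        json.dumps([{"term": "sample-term", "tier": 2}]), encoding="utf-8"
    )
    combined = merge_tiers(root)
    (root / "wordlist.json").write_text(json.dumps(combined, indent=2), encoding="utf-8")
    print(len(combined), "terms written to wordlist.json")
```

Contributors would then edit only the small per-tier files, while downstream tools read the compiled file.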

@abuango
Contributor Author

abuango commented Apr 21, 2022

@jamesgeddes The next Language Workstream meeting is April 26, 2022 at 11:30am Pacific time. You are welcome to join the call when we discuss this.

@jamesgeddes

jamesgeddes commented Apr 21, 2022

@abuango For me, the deciding factor is how often we would be updating the list. If a new list version is likely to occur every day, then I wholeheartedly agree that separate files make sense. If it is every month, then new versions can be compiled into a staging branch before they make it into the main branch. Glad we agree about separating it into its own repo 🙂

I'll be at the meeting next week, thanks for the invite!

@abuango abuango marked this pull request as ready for review April 26, 2022 18:30
@abuango abuango marked this pull request as draft April 26, 2022 18:30
@abuango
Contributor Author

abuango commented Apr 26, 2022

Today's meeting didn't take place, so I could not share the PoC of the WordLists page. A preview is available and currently contains some real data mixed with test data, so we can see how it works. A page has also been created for the Word List term template. The key concept behind this template is that each term file consists entirely of frontmatter, with no content body; Hugo generates the HTML and JSON files for each term from the data supplied in the frontmatter.
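To illustrate the frontmatter-only idea outside of Hugo, here is a small Python stand-in that derives both a JSON and an HTML rendering from the same frontmatter. It is not how Hugo works internally (Hugo uses its own templates and output formats); the keys `title`, `tier`, and `recommendation` are hypothetical, and the parser only handles flat `key: value` pairs, so every value stays a string.

```python
import html
import json

# A hypothetical term file: frontmatter only, no body content.
TERM_FILE = """---
title: abort
tier: 1
recommendation: placeholder text
---
"""


def parse_frontmatter(source):
    """Parse flat `key: value` pairs between the two `---` delimiters."""
    lines = source.strip().splitlines()
    if len(lines) < 2 or lines[0] != "---" or lines[-1] != "---":
        raise ValueError("expected a frontmatter-only file")
    data = {}
    for line in lines[1:-1]:
        key, _, value = line.partition(":")
        data[key.strip()] = value.strip()
    return data


def render_json(data):
    """Machine-readable rendering of the term."""
    return json.dumps(data, indent=2)


def render_html(data):
    """Human-readable rendering of the same data as a definition list."""
    rows = "".join(
        f"<dt>{html.escape(k)}</dt><dd>{html.escape(v)}</dd>" for k, v in data.items()
    )
    return f"<dl>{rows}</dl>"


term = parse_frontmatter(TERM_FILE)
print(render_json(term))
print(render_html(term))
```

Because both renderings come from one data source, the page and the JSON cannot drift apart, which is the appeal of keeping term files content-free.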

@quaid
Contributor

quaid commented May 24, 2022

Hey @abuango we've had some missed meetings and lost track of closing this discussion. It does seem this is doing what we discussed in that previous meeting, thank you so much. I'll look through and see if I have any questions, and if we need a meeting to decide or can do it async.

@quaid quaid self-assigned this May 24, 2022
@abuango abuango changed the title [WIP] Restructuring the Word Lists Restructuring the Word Lists Jun 7, 2022
@abuango
Contributor Author

abuango commented Jun 7, 2022

Feedback from Workstream call:

  • Add more details to the Template page, so contributors are well informed on how to add new content
  • Single JSON file for all terms
  • Add "Replacement terms" to JSON file
  • Move No-Change Tier down on the overview page and remove 0
  • Move "All terms" section on the overview page to the right and make it more visible.

@abuango
Contributor Author

abuango commented Jun 10, 2022

@LarryKunz @markcmiller86 I have implemented the feedback from our last meeting, kindly review.

@abuango abuango marked this pull request as ready for review July 19, 2022 20:26
@taylorwaggoner taylorwaggoner merged commit c4e45f1 into inclusivenaming:main Jul 19, 2022
@abuango abuango deleted the abubakar-wordlist-structure branch July 19, 2022 21:16
Development

Successfully merging this pull request may close these issues.

Need a template for individual term files
5 participants