Merging language_data with langcodes #14

Open
georgkrause opened this issue Sep 6, 2024 · 2 comments
Comments

@georgkrause
Owner

I am starting the discussion here so that people can contribute to it. Since I joined the project long after it was started, I am not aware of the original reasoning for distributing language_data as its own package. I am wondering whether that is actually useful, and I am therefore considering merging the two projects into one, and the two packages into one as well.

From my perspective, the two projects are closely interconnected. You cannot run the tests in language_data without having langcodes installed, and vice versa. You would also have to run both test suites anyway to achieve high coverage. To phrase it differently: changes in language_data might break tests in langcodes, but we can't really tell, because changes in language_data don't trigger a test run in langcodes. I am aware this is a solvable problem, but solving it would add a lot of complexity.

For users, I am not sure there is much value in using only one of the packages. I have never used one of them in isolation from the other, so I'd be interested in input here. Are you using only one of the packages? What's your use case? Would it hurt to just install one langcodes package which includes the data? Let me know!

I won't rush this change, because it's a big one. I will let some months pass to allow everyone to leave a comment.

@matthewdeanmartin

2 ways to find out who uses your library, other than waiting a few years for your users to organically visit this post:

Search GitHub for requirements.txt, pyproject.toml, or Pipfile files that mention the library name, e.g.

https://github.com/search?q=path%3Arequirements.txt%20language-data&type=code

Libraries.io: it looks like they are only aware of 1 published user (and that might be a mistake)
https://libraries.io/pypi/language-data
https://libraries.io/pypi/langcodes
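
The first search can be repeated for each dependency file and for both package names. The following is a minimal sketch that only builds the search URLs in the same form as the link above; the helper name `code_search_url` is made up for illustration, and nothing here calls the GitHub API.

```python
from urllib.parse import quote, urlencode

# Dependency files named in the comment above.
DEPENDENCY_FILES = ["requirements.txt", "pyproject.toml", "Pipfile"]


def code_search_url(path: str, package: str) -> str:
    """Build a GitHub code-search URL for `package` in files at `path`.

    Hypothetical helper: it mirrors the shape of the example link, nothing more.
    """
    query = {"q": f"path:{path} {package}", "type": "code"}
    # quote_via=quote encodes spaces as %20 instead of '+', matching the link.
    return "https://github.com/search?" + urlencode(query, quote_via=quote)


if __name__ == "__main__":
    for package in ("language-data", "langcodes"):
        for path in DEPENDENCY_FILES:
            print(code_search_url(path, package))
```

Opening the printed links in a browser gives one result list per file type and package.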

Anyhow, changing namespaces is a breaking change. If you merge the libraries and change the namespace, it won't break anything, but no one gets any benefit until they start referencing the new names. I'd chalk this up to a regrettable day 1 mistake and not change it.

If you merge them anyhow, I'd suggest checking how they behave when you install the merged library alongside the standalone one, i.e. whether one will overwrite the folder of the other in the venv folder.

@georgkrause
Owner Author

> Search GitHub for requirements.txt, pyproject.toml, or Pipfile files that mention the library name, e.g.

Thank you for the idea! But since even your linked search shows more than 250 projects, I don't think I have the capacity to reach out to all of them :)

> Anyhow, changing namespaces is a breaking change

I wasn't clear enough here: I don't want to break the namespace, but rather integrate language_data into langcodes.

> If you merge them anyhow, I'd suggest checking how they behave when you install the merged library alongside the standalone one, i.e. whether one will overwrite the folder of the other in the venv folder.

If I do this, proper testing of the installations will be on my to-do list. Thank you for the hint, that's a good thing to watch out for.
