This repository has been archived by the owner on Mar 30, 2023. It is now read-only.

Remove msgcat dependency #76

Open
mondeja opened this issue Sep 15, 2020 · 8 comments
Labels
help wanted (Extra attention is needed), question (Further information is requested)

Comments

@mondeja
Contributor

mondeja commented Sep 15, 2020

The polib library is cross-platform and has a stable API. It can be used to produce the differences between PO files without needing to install gettext.

@Seluj78
Collaborator

Seluj78 commented Sep 15, 2020

Hi!

I use the polib library in https://github.com/seluj78/potodo because it is appropriate there and makes it possible to read PO files efficiently.

For powrap, @JulienPalard uses the msgcat command to fix the wrapping automatically. polib isn't able to do that.

See https://github.com/JulienPalard/powrap/blob/master/powrap/powrap.py#L36-L45
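
For readers who do not want to follow the link, here is a minimal sketch of how a msgcat call of this kind can be driven from Python. It is not a copy of powrap's code: the function names `msgcat_command` and `rewrap_with_msgcat` are hypothetical. The only msgcat features assumed are `-` (read from standard input), `-o FILE` and `--no-wrap`.

```python
import shutil
import subprocess


def msgcat_command(output_path, no_wrap=False):
    """Build the msgcat argument list (hypothetical helper).

    "-" tells msgcat to read the PO content from standard input and
    "-o" names the file it writes the re-wrapped result to.
    """
    cmd = ["msgcat", "-", "-o", output_path]
    if no_wrap:
        cmd.append("--no-wrap")
    return cmd


def rewrap_with_msgcat(po_text, output_path, no_wrap=False):
    """Pipe PO text through msgcat, which applies gettext's own wrapping."""
    if shutil.which("msgcat") is None:
        # This is the external dependency the issue is about.
        raise FileNotFoundError("msgcat not found; GNU gettext must be installed")
    subprocess.run(
        msgcat_command(output_path, no_wrap),
        input=po_text,
        encoding="utf-8",
        check=True,
    )
```

The `shutil.which` check makes the cost visible: the approach only works on machines where GNU gettext is installed.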

@Seluj78 added the invalid (This doesn't seem right) and wontfix (This will not be worked on) labels Sep 15, 2020
@mondeja
Contributor Author

mondeja commented Sep 15, 2020

Check this:

Code

import polib

po = polib.pofile("tests/bad/glossary.po")
print(po.__unicode__())

Input

Extracted from tests/bad/glossary.po:

# Copyright (C) 2001-2018, Python Software Foundation
# For licence information, see README file.
#
msgid ""
msgstr ""
"Project-Id-Version: Python 3\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2019-10-09 17:54+0200\n"
"PO-Revision-Date: 2019-12-10 09:26+0100\n"
"Last-Translator: Grenoya <[email protected]>\n"
"Language-Team: FRENCH <[email protected]>\n"
"Language: fr\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"X-Generator: Poedit 2.2.1\n"

#: ../Doc/glossary.rst:5
msgid "Glossary"
msgstr "Glossaire"

#: ../Doc/glossary.rst:10
msgid "``>>>``"
msgstr "``>>>``"

#: ../Doc/glossary.rst:12
msgid ""
"The default Python prompt of the interactive shell.  Often seen for code "
"examples which can be executed interactively in the interpreter."
msgstr "L'invite de commande utilisée par défaut dans l'interpréteur interactif. On la voit souvent dans des exemples de code qui peuvent être exécutés interactivement dans l'interpréteur."

Output

# Copyright (C) 2001-2018, Python Software Foundation
# For licence information, see README file.
#
msgid ""
msgstr ""
"Project-Id-Version: Python 3\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2019-10-09 17:54+0200\n"
"PO-Revision-Date: 2019-12-10 09:26+0100\n"
"Last-Translator: Grenoya <[email protected]>\n"
"Language-Team: FRENCH <[email protected]>\n"
"Language: fr\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"X-Generator: Poedit 2.2.1\n"

#: ../Doc/glossary.rst:5
msgid "Glossary"
msgstr "Glossaire"

#: ../Doc/glossary.rst:10
msgid "``>>>``"
msgstr "``>>>``"

#: ../Doc/glossary.rst:12
msgid ""
"The default Python prompt of the interactive shell.  Often seen for code "
"examples which can be executed interactively in the interpreter."
msgstr ""
"L'invite de commande utilisée par défaut dans l'interpréteur interactif. On "
"la voit souvent dans des exemples de code qui peuvent être exécutés "
"interactivement dans l'interpréteur."

In fact, polib lets you specify a custom wrapping width with the wrapwidth parameter, so I do not agree with marking this issue as invalid. 🤔
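
A minimal sketch of the wrapwidth usage mentioned above, assuming polib is installed. The sample PO text and the `rewrap` helper are hypothetical. `polib.pofile` accepts either a path or the PO content itself, and `wrapwidth` is its keyword argument for the wrap column.

```python
try:
    import polib  # third-party; install with: pip install polib
except ImportError:  # lets the sketch load even where polib is absent
    polib = None

# Hypothetical sample: one entry whose msgstr is much longer than 79 columns.
SAMPLE_PO = '''msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\\n"

msgid "prompt"
msgstr "L'invite de commande utilisée par défaut dans l'interpréteur interactif. On la voit souvent dans des exemples de code."
'''


def rewrap(po_source, width=79):
    """Return the PO content re-serialized by polib at the given width."""
    if polib is None:
        raise RuntimeError("polib is required for this sketch")
    po = polib.pofile(po_source, wrapwidth=width)
    # __unicode__() is the same call used in the snippet earlier in this
    # thread; it returns the serialized file as text.
    return po.__unicode__()


if __name__ == "__main__" and polib is not None:
    print(rewrap(SAMPLE_PO, width=60))
```

Whether polib's line breaks match msgcat's in every case is a separate question that this sketch does not answer.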

@Seluj78 added the help wanted (Extra attention is needed) and question (Further information is requested) labels and removed the invalid and wontfix labels Sep 15, 2020
@Seluj78
Collaborator

Seluj78 commented Sep 15, 2020

Hmm, my bad. I did see the wrapwidth parameter in the polib docs but didn't bother to check it.

One argument, though, is that you are calling the __unicode__ special method directly, which returns the Unicode representation of the file.

But it does seem that you can use __unicode__ to save the file. https://github.com/izimobil/polib/blob/master/polib.py#L424

I'll leave it up to @JulienPalard 😀

@mondeja
Contributor Author

mondeja commented Sep 15, 2020

Yes, __unicode__ is the method that polib's save methods use to build the file contents. I will send a pull request in a while.

@JulienPalard
Collaborator

JulienPalard commented Oct 7, 2020

Tried to remember what I did in https://github.com/JulienPalard/powrap/tree/from_msgcat:

The idea is to use the same implementation as poedit / msgcat (they use the same code IIRC, which explains why they indent identically). I was able to write a little main around their function, here https://github.com/JulienPalard/powrap/blob/from_msgcat/wrap.c, and it goes like this:

$ cc wrap.c -lunistring -o wrap
$ ./wrap msgstr "Pouette Pouette pouette tagada tagada t'soin t'soin le C c'est bien, le C c'est bien, tagada tagada t'soin t'soin"
msgstr ""
"Pouette Pouette pouette tagada tagada t'soin t'soin le C c'est bien, le C "
"c'est bien, tagada tagada t'soin t'soin"

So it indents properly. From here it could be wrapped as a Python module quite easily:

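#define PY_SSIZE_T_CLEAN
#include <Python.h>

/* Python-callable wrapper around wrap() from wrap.c. */
static PyObject *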
py_wrap(PyObject *self, PyObject *args)
{
    const char *prefix;
    const char *str;

    if (!PyArg_ParseTuple(args, "ss", &prefix, &str))
        return NULL;
    return PyUnicode_FromString(wrap(prefix, str, NULL));
}

static PyMethodDef POWrapMethods[] = {
    {"wrap",  py_wrap, METH_VARARGS, "Wraps a string."},
    {NULL, NULL, 0, NULL}
};

static struct PyModuleDef powrapmodule = {
    PyModuleDef_HEAD_INIT,
    "wrap",
    NULL,
    -1,
    POWrapMethods
};

PyMODINIT_FUNC
PyInit_wrap(void)
{
    PyObject *m;

    m = PyModule_Create(&powrapmodule);
    if (m == NULL)
        return NULL;

    return m;
}

But it still depends on libunistring, and I'm not sure it's the easiest path... and I haven't found much time to work on it... (there's a lot behind libunistring; it looks like it implements https://www.unicode.org/reports/tr14/tr14-45.html#Algorithm, which would probably be nice to have in Python to start with).
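
To make the gap concrete, here is a rough pure-Python sketch of PO field wrapping built on the standard library's textwrap. It is not the gettext implementation and does not implement UAX #14: it breaks only at whitespace, so results can differ from msgcat for text with other break opportunities (CJK, for instance), or when a word ends exactly at the width limit. The names `escape_po` and `wrap_po_field` are hypothetical.

```python
import textwrap

# The sample string used with ./wrap earlier in this comment.
SAMPLE = ("Pouette Pouette pouette tagada tagada t'soin t'soin le C c'est bien, "
          "le C c'est bien, tagada tagada t'soin t'soin")


def escape_po(text):
    """Escape the characters that cannot appear raw in a PO string."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\t", "\\t"))


def wrap_po_field(prefix, text, width=79):
    """Format `prefix "text"`, wrapping at spaces only (approximation)."""
    escaped = escape_po(text)
    single = f'{prefix} "{escaped}"'
    if len(single) <= width:
        return single
    chunks = textwrap.wrap(
        escaped,
        width - 2,                # leave room for the two quotes
        drop_whitespace=False,    # keep the trailing space on each line
        break_long_words=False,
        break_on_hyphens=False,
        expand_tabs=False,
        replace_whitespace=False,
    )
    return "\n".join([f'{prefix} ""'] + [f'"{chunk}"' for chunk in chunks])


if __name__ == "__main__":
    print(wrap_po_field("msgstr", SAMPLE))
```

For the sample string, this prints the same three lines as the ./wrap run earlier in this comment. That agreement covers this one input only and says nothing about inputs where UAX #14 allows breaks that whitespace splitting misses.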

@mondeja changed the title from "Use polib instead of gettext msgcat" to "Remove msgcat dependency" Oct 16, 2020
@dancergraham

Uniseg says that it implements the UAX 29 algorithm in pure Python https://pypi.org/project/uniseg/ - could that help?

@JulienPalard
Collaborator

Maybe, yes. If someone wants to look at it, they're welcome to; I won't have the time any time soon.

@jeanas

jeanas commented Feb 19, 2023

I'm going to submit a PR for this.

5 participants