CoreNLG is an easy-to-use, productivity-oriented Python library for Natural Language Generation.
It aims to provide the essential tools for developers to structure and write NLG projects.
Auto-agreement tools based on external resources are not provided in this library.
You can choose a language (French or English), and typography is then handled automatically based on it.
For example:
In French, "Ma liste d'éléments:" becomes "Ma liste d'éléments :".
In English, "My list of items :" becomes "My list of items:".
A period will always be followed by a capitalized word.
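The rules above can be pictured with a small standalone sketch. It is not CoreNLG's implementation: `fix_typography` is a hypothetical helper that only covers colon spacing and capitalization after a period, assuming plain regular expressions are enough for these two cases.

```python
import re


def fix_typography(text: str, lang: str = "fr") -> str:
    """Hypothetical helper (not part of CoreNLG): colon spacing + capitalization."""
    if lang == "fr":
        # French: exactly one space before a colon
        text = re.sub(r"\s*:", " :", text)
    else:
        # English: no space before a colon
        text = re.sub(r"\s+:", ":", text)
    # Capitalize the first lowercase letter that follows a period and whitespace
    return re.sub(
        r"(\.\s+)([a-zà-ÿ])",
        lambda m: m.group(1) + m.group(2).upper(),
        text,
    )


print(fix_typography("Ma liste d'éléments:", "fr"))
print(fix_typography("My list of items :", "en"))
```

A real implementation has to deal with many more cases (times such as 10:30, URLs, abbreviations), which this sketch deliberately ignores.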
Contractions are automatically handled based on the selected language (French or English).
word_1 = 'le dépassement'
word_2 = 'les hausses'
self.free_text('À cause de', word_1) # "À cause du dépassement"
self.free_text('À cause de', word_2) # "À cause des hausses"
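As an illustration of what such a contraction pass might look like, here is a minimal sketch. The rule table and the `contract` function are hypothetical and cover only four French contractions; how CoreNLG actually applies them is not shown here.

```python
import re

# Hypothetical rule table (pattern, replacement); French only, far from complete
FRENCH_CONTRACTIONS = [
    (r"\bde les\b", "des"),
    (r"\bde le\b", "du"),
    (r"\bà les\b", "aux"),
    (r"\bà le\b", "au"),
]


def contract(text: str) -> str:
    """Apply each contraction rule in turn to the already-joined text."""
    for pattern, replacement in FRENCH_CONTRACTIONS:
        text = re.sub(pattern, replacement, text)
    return text


print(contract("À cause de le dépassement"))  # À cause du dépassement
print(contract("À cause de les hausses"))     # À cause des hausses
```

The word boundaries (`\b`) keep "de le" from matching inside "de les", and "de la" is left untouched. The sketch does not distinguish the article "le" from the pronoun "le", which a production rule set would need to handle.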
The free_text method takes multiple strings, or nested lists/tuples of strings, and returns a single string in which the parameters are separated by spaces. It spares you from forgetting a space between elements when concatenating strings.
self.free_text(
"The variation of the",
indicator.label,
"is",
"positive" if indicator.variation > 0 else "negative" if indicator.variation < 0 else "not significant",
"compared to last year."
)
self.free_text(
"We can also use collection of strings as parameter,",
(
"if the next is true",
"this text will be written"
) if test else (
"else, we will",
"have this text"
),
"."
)
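The joining behaviour described above can be sketched in a few lines of plain Python. This standalone `free_text` function is an assumption-based stand-in, not the library's code: it flattens nested lists/tuples and joins the pieces with single spaces, skipping empty strings.

```python
def free_text(*parts) -> str:
    """Sketch: flatten strings and nested lists/tuples, join with single spaces."""

    def flatten(item):
        if isinstance(item, (list, tuple)):
            for sub in item:
                yield from flatten(sub)
        elif item is not None and str(item) != "":
            # Skip None and empty strings so they do not produce double spaces
            yield str(item)

    return " ".join(flatten(parts))


test = True
print(free_text(
    "We can also use a collection of strings,",
    ("if the test is true", "this text is written") if test else ("else", "this one"),
    ".",
))
```

Note that the sketch leaves a space before the final period; in the library, that is the kind of detail the typography pass takes care of.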
The nlg_syn method takes multiple strings as parameters and returns one of them, chosen according to one of two modes.
def synonym(self, *words, mode="smart")
- "random": one of the strings in parameter will be chosen randomly.
- "smart": the string is chosen to avoid repetition as far as possible, taking previously chosen synonyms into account.
# Basic use
self.free_text(
'I was',
self.nlg_syn('hungry', 'starving'),
'so I decided to eat',
self.nlg_syn('one apple', 'three apples'),
'.'
)
# Synonyms trees can be made
self.free_text(
'I was',
self.nlg_syn(
'hungry so I decided to eat ' + self.nlg_syn('one apple', 'three apples'),
'starving and I went to the restaurant'
),
'.'
)
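To make the difference between the two modes concrete, here is a minimal sketch. `SynonymPicker` is a hypothetical class, and its "smart" mode uses a simple least-used-first heuristic as a stand-in; the selection strategy CoreNLG really uses may be more elaborate.

```python
import random
from collections import Counter


class SynonymPicker:
    """Hypothetical illustration of 'random' vs 'smart' synonym selection."""

    def __init__(self, seed=None):
        self.used = Counter()            # how often each string was chosen so far
        self.rng = random.Random(seed)

    def pick(self, *words, mode="smart"):
        if mode == "random":
            choice = self.rng.choice(words)
        else:
            # Assumed heuristic: take the least-used word; ties go to the first one
            choice = min(words, key=lambda w: self.used[w])
        self.used[choice] += 1
        return choice


picker = SynonymPicker()
print([picker.pick("hungry", "starving") for _ in range(3)])
# ['hungry', 'starving', 'hungry']
```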
As you build complex structures, you will at some point need to know which word was chosen so that the rest of the sentence can agree with it.
Instead of a string, you can pass a tuple to the nlg_syn method; its second element is a key that becomes active when that option is chosen:
self.nlg_syn(
'one',
('three', 'PLURAL')
)
You can then use the post_eval method, which is defined as follows:
def post_eval(
key_to_check,
string_to_write_if_active='',
string_to_write_if_inactive='',
deactivate_the_key=False
)
You can now build sentences like this:
self.free_text(
'I decided to eat',
self.nlg_syn(
'one',
('three', 'PLURAL')
),
self.post_eval('PLURAL', 'apples', 'apple', True),
'.'
)
# This will give you either "I decided to eat one apple." or "I decided to eat three apples."
# The 'PLURAL' key is now deactivated so next post_eval method would not find it.
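The key mechanism can be illustrated with a simplified model. `KeyedText` and its method names are hypothetical; the sketch resolves the key immediately, whereas the library's own evaluation timing may differ.

```python
class KeyedText:
    """Hypothetical model of keys activated by a chosen option."""

    def __init__(self):
        self.active_keys = set()

    def choose(self, option):
        """Accept 'text' or ('text', 'KEY'); register the key of the chosen option."""
        if isinstance(option, tuple):
            text, key = option
            self.active_keys.add(key)
            return text
        return option

    def post_eval(self, key, if_active="", if_inactive="", deactivate=False):
        active = key in self.active_keys
        if active and deactivate:
            # Later checks of the same key will no longer find it
            self.active_keys.discard(key)
        return if_active if active else if_inactive


t = KeyedText()
number = t.choose(("three", "PLURAL"))
noun = t.post_eval("PLURAL", "apples", "apple", deactivate=True)
print("I decided to eat", number, noun)  # I decided to eat three apples
```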
The nlg_enum method takes a list of elements and several optional arguments that control the output. It returns a string.
def enum(self, my_list_of_elements,
max_elem=None,
nb_elem_bullet=None,
begin_w=None,
end_w=None,
sep=None,
last_sep=None,
capitalize_bullets=None,
text_if_empty_list=None,
end_of_bullet=None,
end_of_last_bullet=None)
class IterElems:
    def __init__(
        self,
        # maximum number of elements of the list that will be displayed
        max_elem=None,
        # if the size of the list is greater than this number, a bullet-point list is created
        nb_elem_bullet=None,
        # the output string will begin with this string
        begin_w="",
        # the output string will end with this string
        end_w="",
        # separator for each element except the last
        sep=",",
        # separator for the last item
        last_sep="and",
        # whether the beginning of each bullet point should be capitalized
        capitalize_bullets=True,
        # if the list is empty, this string will appear
        text_if_empty_list="",
        # at the end of each bullet point except the last
        end_of_bullet="",
        # at the end of the last bullet point
        end_of_last_bullet=""
    ): ...
my_list = ["six apples", "three bananas", "two peaches"]
self.nlg_enum(my_list)
# "six apples, three bananas and two peaches"
self.nlg_enum(my_list, last_sep="but also")
# "six apples, three bananas but also two peaches"
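The two simple calls above can be reproduced with a short standalone function. `enumerate_items` is a hypothetical sketch that reuses the parameter names listed earlier but only covers the inline case; bullet-point output is left out.

```python
def enumerate_items(items, sep=",", last_sep="and", begin_w="", end_w="",
                    text_if_empty_list="", max_elem=None):
    """Sketch of an inline enumeration: 'a, b and c'. No bullet-point support."""
    items = list(items)[:max_elem]       # max_elem=None keeps every element
    if not items:
        return text_if_empty_list
    if len(items) == 1:
        body = items[0]
    else:
        body = (sep + " ").join(items[:-1]) + " " + last_sep + " " + items[-1]
    # Drop empty begin/end strings so they do not add stray spaces
    return " ".join(part for part in (begin_w, body, end_w) if part)


fruits = ["six apples", "three bananas", "two peaches"]
print(enumerate_items(fruits))
print(enumerate_items(fruits, last_sep="but also"))
```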
my_list = ['apples', 'bananas', 'peaches']
self.nlg_enum(
my_list,
max_elem=2, nb_elem_bullet=2, begin_w='Fruits I like :', end_w='Delicious, right ?', end_of_bullet=',', end_of_last_bullet='.'
)
"""
Fruits I like :
- Apples,
- Bananas.
Delicious, right ?
"""
my_list = ['apples', 'bananas']
self.nlg_enum([self.free_text(
fruit,
self.nlg_syn('so', '') + ' ' + self.nlg_syn('succulent', 'tasty')
) for fruit in my_list],
begin_w='I find', end_w='.'
)
"""
One of the following:
I find apples so tasty and bananas succulent.
I find apples tasty and bananas so succulent.
I find apples so succulent and bananas tasty.
I find apples succulent and bananas so tasty.
"""
The nlg_enum method is a wrapper around nlg_iter, which can handle slightly more complex cases.
Instead of a list of elements, nlg_iter takes a list of lists and strings. During iteration it maps each element to the elements at the same position in the other lists, and it stops when the shortest list is exhausted.
my_list_of_fruits = ['apples', 'bananas', 'peaches']
my_list_of_syno = [self.nlg_syn('succulent', 'tasty') for i in range(2)]
self.nlg_iter([
my_list_of_fruits,
"are",
my_list_of_syno
])
# apples are tasty and bananas are succulent
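The positional mapping can be sketched as follows. `iterate_groups` is a hypothetical function, written under the assumption that plain strings are repeated in every group while lists are consumed position by position.

```python
def iterate_groups(parts, sep=",", last_sep="and"):
    """Sketch: combine lists position by position; plain strings repeat in each group."""
    lists = [p for p in parts if isinstance(p, (list, tuple))]
    # Stop at the shortest list; with no list at all there is nothing to iterate
    length = min((len(seq) for seq in lists), default=0)
    groups = [
        " ".join(p[i] if isinstance(p, (list, tuple)) else p for p in parts)
        for i in range(length)
    ]
    if len(groups) <= 1:
        return "".join(groups)
    return (sep + " ").join(groups[:-1]) + " " + last_sep + " " + groups[-1]


print(iterate_groups([
    ["apples", "bananas", "peaches"],
    "are",
    ["tasty", "succulent"],
]))
# apples are tasty and bananas are succulent
```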
The nlg_num method converts a number into a string according to several criteria.
def nlg_num(self, num, short="", sep=".", mile_sep=" ", dec=None, force_sign=False, remove_trailing_zeros=True)
my_number = 10000.66028
self.nlg_num(my_number, dec=3, force_sign=True)
# +10 000.66
# With dec=3 the value rounds to 10 000.660; remove_trailing_zeros then drops the final 0.
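A rough equivalent can be built on Python's format specification. `format_number` is a hypothetical sketch that mirrors most of the parameters in the signature above; the `short` parameter is omitted because its behaviour is not described here.

```python
def format_number(num, sep=".", mile_sep=" ", dec=None,
                  force_sign=False, remove_trailing_zeros=True):
    """Sketch of number formatting with custom decimal and thousands separators."""
    magnitude = abs(num)
    # ',' groups thousands; '.{dec}f' rounds to a fixed number of decimals
    text = f"{magnitude:,}" if dec is None else f"{magnitude:,.{dec}f}"
    int_part, _, frac_part = text.partition(".")
    if remove_trailing_zeros:
        frac_part = frac_part.rstrip("0")
    int_part = int_part.replace(",", mile_sep)
    sign = "-" if num < 0 else "+" if force_sign and num > 0 else ""
    return sign + int_part + (sep + frac_part if frac_part else "")


print(format_number(10000.66028, dec=3, force_sign=True))  # +10 000.66
print(format_number(-1234.5, sep=",", mile_sep="."))       # -1.234,5
```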
The nlg_tags method creates HTML tags with attributes and wraps text in them.
def nlg_tag(self, tag, text="", _class=None, **kwargs)
self.nlg_tags('br')
# <br>
self.nlg_tags('p', self.free_text(
'This is a',
self.nlg_tags('b', 'sentence with bold'),
'in a paragraph.'
))
# <p>This is a <b>sentence with bold</b> in a paragraph.</p>
self.nlg_tags('div',
self.nlg_tags('h1', "My content"),
id="title_div"
)
# <div id="title_div"><h1>My content</h1></div>
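For readers curious about how such a helper can be written, here is a minimal sketch. `make_tag` and the `VOID_TAGS` set are hypothetical; the function escapes attribute values but leaves the inner text untouched so that tags can be nested.

```python
from html import escape

# Elements that have no closing tag (partial list, for illustration only)
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


def make_tag(tag, text="", _class=None, **attrs):
    """Sketch: build '<tag attr="value">text</tag>'."""
    if _class is not None:
        # 'class' is a reserved word in Python, hence the _class parameter
        attrs = {"class": _class, **attrs}
    attr_text = "".join(
        f' {name}="{escape(str(value), quote=True)}"' for name, value in attrs.items()
    )
    if tag in VOID_TAGS:
        return f"<{tag}{attr_text}>"
    return f"<{tag}{attr_text}>{text}</{tag}>"


print(make_tag("br"))
print(make_tag("div", make_tag("h1", "My content"), id="title_div"))
```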
The no_interpret method disables NLG interpretation (automatic contractions and typographical conventions) for a given string.
# "This is a string.with a dot inside ." becomes "This is a string. With a dot inside." after NLG processing.
self.no_interpret("This is a string.with a dot inside .")
# This is a string.with a dot inside .
The Datas class is used to store the input you receive.
It should be inherited by your own custom data classes.
class Datas:
    def __init__(self, json_in): ...
class MyDatas(Datas):
    def __init__(self, json_in):
        super().__init__(json_in)
my_datas = MyDatas(input)
The Document class is your final document wrapper.
class Document:
    def __init__(self, datas, title="", log_level="ERROR", css_path="css/styles.css", lang="fr", freeze=False): ...
my_datas = MyDatas(input)
document = Document(my_datas)
It takes at least an instance of a Datas class (or your custom one) as parameter.
When the 'freeze' parameter is True, every nlg_syn call returns its first string. This is useful for non-regression tests.
The Section class is a text zone of your document that is independent of the others for the drawing of synonyms.
It is created from the Document class with the new_section method.
You can pass an HTML tag name as a parameter ('div' by default) as well as HTML attributes.
my_datas = MyDatas(input)
document = Document(my_datas)
first_paragraph_section = document.new_section(html_elem_attr={"id": "firstParagraph"})
second_paragraph_section = document.new_section(html_elem_attr={"id": "secondParagraph"})
document.write()
You should write your sections in the document with the write method of the class Document.
You can also write each section separately to manage the order of the sections in the document with the write_section method.
def write_section(self, section, parent_elem=None, parent_id=None)
You should not confuse a Section with a simple text zone.
If you want your first and second paragraphs to be independent, create sections as shown above.
If you just want two separate text zones in your document, without independent synonym draws, create tags with nlg_tags.
paragraph_section = document.new_section()
paragraph_section.text = (
paragraph_section.tools.add_tag('div', id='first_paragraph', text='First paragraph text'),
paragraph_section.tools.add_tag('div', id='two_paragraph', text='Second paragraph text')
)
In practice you will not call the tag function this way, because the TextClass object exposes nlg_tags directly.
A TextClass is a class in which you will write your text. You should create your own sub-class for each part of your text.
A TextClass takes a Section as parameter.
class MyDatas(Datas):
    def __init__(self, json_in):
        super().__init__(json_in)
        self.my_job = "developer"
class MyText(TextClass):
    def __init__(self, section):
        super().__init__(section)
        self.text = (
            "Hello",
            self.nlg_syn("world", "everyone"),
            ".",
            self.nlg_tags('br'),
            self.nlg_tags('b', "Nice to meet you."),
            "I am a",
            self.my_job,
            "."
        )
my_datas = MyDatas(input)
document = Document(my_datas)
my_section = document.new_section(html_elem_attr={"id": "mySection"})
MyText(my_section)
document.write()
# <div id="mySection">Hello everyone.<br> <b>Nice to meet you.</b> I am a developer.</div>
The TextClass is a powerful object which lets you call all the CoreNLG functions through self.
You can also access every attribute of your Datas class the same way.
Assigning to self.text writes your text into the Section that was passed to your TextClass.
You can use it with strings, nested lists, or tuples, and it does the same job as the free_text function.
Don't be afraid! The '=' operator is overridden so that each assignment appends to the text instead of replacing it. To take full advantage of it, you can write:
self.text = "Hello,"
self.text = "this is one sentence"
self.text = (
"that I am",
"writing here."
)
# Hello, this is one sentence that I am writing here.
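One way to obtain this "assignment appends" behaviour in Python is a property setter. The `TextBuffer` class below is a hypothetical sketch of that idea, not the mechanism CoreNLG itself uses.

```python
class TextBuffer:
    """Hypothetical sketch: assigning to .text appends instead of replacing."""

    def __init__(self):
        self._parts = []

    @property
    def text(self):
        return " ".join(self._parts)

    @text.setter
    def text(self, value):
        # Each assignment extends the buffer with the flattened value
        self._parts.extend(self._flatten(value))

    @staticmethod
    def _flatten(value):
        if isinstance(value, (list, tuple)):
            for item in value:
                yield from TextBuffer._flatten(item)
        elif value:
            yield str(value)


buffer = TextBuffer()
buffer.text = "Hello,"
buffer.text = "this is one sentence"
buffer.text = ("that I am", "writing here.")
print(buffer.text)  # Hello, this is one sentence that I am writing here.
```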
The TextVar is a simple object, sub-class of str, whose '+=' operator is overloaded.
It's the same principle as free_text and self.text, it works with strings and nested lists/tuples.
It aims to ease the concatenation of strings.
class MyText(TextClass):
    def __init__(self, section):
        super().__init__(section)
        self.text = self.nlg_tags('b', self.text_with_free_text())
        self.text = self.nlg_tags('b', self.text_with_text_var())
    def text_with_free_text(self):
        return self.free_text(
            "first test is true" if test_1 else "first test is false",
            "and",
            (
                "second test",
                "is true"
            ) if test_2 else (
                "second test",
                "is false"
            )
        )
    def text_with_text_var(self):
        my_text = TextVar()
        if test_1:
            my_text += "first test is true"
        else:
            my_text += "first test is false"
        my_text += "and"
        if test_2:
            my_text += "second test", "is true"
        else:
            my_text += (
                "second test",
                "is false"
            )
        return my_text
In this example, the two methods return equivalent strings. Use whichever you find easier to read, given the number of nested tests you have to write.
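The idea of a str sub-class with an overloaded '+=' can be sketched as follows. `SketchTextVar` is a hypothetical name chosen to avoid confusion with the library's TextVar. Because Python strings are immutable, `__iadd__` returns a new instance rather than modifying the existing one.

```python
def _flatten(value):
    """Yield the non-empty strings found in nested lists/tuples."""
    if isinstance(value, (list, tuple)):
        for item in value:
            yield from _flatten(item)
    elif value:
        yield str(value)


class SketchTextVar(str):
    """Hypothetical str sub-class whose '+=' joins pieces with spaces."""

    def __iadd__(self, other):
        parts = [str(self)] if self else []
        parts.extend(_flatten(other))
        # str is immutable, so build and return a new instance
        return SketchTextVar(" ".join(parts))


my_text = SketchTextVar()
my_text += "first test is true"
my_text += "and"
my_text += "second test", "is true"
print(my_text)  # first test is true and second test is true
```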
Install the library:
pip install CoreNLG
Create a basic template with cookiecutter:
pip install cookiecutter
cookiecutter https://github.com/societe-generale/core-nlg.git
You should obtain the following project structure:
MyProject
|-- ProjectEntryPoint.py
|-- MyProject
| |-- Datas
| | |-- MyDatas.py
| |-- TextClass
| | |-- Introduction.py
| | |-- Content.py
| |-- Resources
| |-- Tools
|-- inputs
| |-- test.json
ProjectEntryPoint.py is your main file; you can use it to test your application locally.
Run this file to see the HTML result in your console; your browser will also render it automatically.
Happy coding!