The data format and light syntaxes for creating datasets for supervised fine-tuning (SFT) and policy optimization (PO) of text-generation models such as large-language-models.
"Preference-Tree" is an original graph structure designed to represent text-based conversational data and preference data.
Main Node: The orange nodes
Subnode and Turn: Each main node has not only a next main node but also additional nodes called "subnodes". Subnodes do not have the next node are considered dead-ends. The main node and its associated subnodes together form "turn" in our system. There are several types of subnodes, each serving a specific role:
- Upvoted Subnode: Represents a desired, nice, or preferred message. Used for chosen response in PO.
- Downvoted Subnode: Represents a undesired, bad, or not-preferred message. Used for rejected response in PO.
- Writing Subnode: Indicates that the message is in the process of being written. Expected to be completed, e.g., by an LLM or human to make response candidates.
- Unscored Subnode: Indicates that the message is not yet been scored. Expected to be change to an upvoted or downvoted subnode, typically through human evaluation.
SFT data: The SFT data is extracted from the Preference-Tree as a sequence of main nodes:
PO data: To construct PO data, we need to prepare the preference pairs, in the form of (prompt, chosen, rejected).
These pairs are also extracted from the Preference-Tree.
For each turn
- The main node
$M_i$ is also considered a chosen sample in response to the prompt$[M_1,...,M_{i-1}]$ . - The number of preference pairs starting from
$M_i$ is$(|\mathcal{U}_i|+1)|\mathcal{D}_i|$ . - Multi-turn chosen and rejected samples are currently not available.
This structure and extraction method facilitate the creation of datasets for both SFT and PO tasks.
"Plain-Preference-Tree" is a plain-text-based lightweight format for making up the Preference-Tree.
Signs
Sign | Meaning |
---|---|
: |
Line break |
+ |
Upvoted subnode |
- |
Downvoted subnode |
* |
Writing subnode |
? |
Unscored subnode |
Main Node: Main nodes a represented by lines starting with no sign.
Example:
Hello.
Hello. How can I assist today?
I'd like to do something fun!
The line 1 and 3 are user messages, and the line 2 is assistant message.
Line Break: To add a line-break within a message, the subsequent lines need to start with :
.
Example:
Hello.
Hello. How can I assist today?
I'd like to do something fun!
:Do you have any recommendations?
The 2nd user turn (line 3-4) has 1 line break.
Subnode
You can add subnode by adding the corresponding sign at the beginning of the line.
The type of subnode and the corresponding sign is shown in the signs table.
You can also line break :
at any type of subnode.
The subnodes in the turn are appended after the main node of the turn.
Example:
Hello.
Hello. How can I assist today?
+Hello. I'm glad to see you.
:Here are news.
+Hello! You look happy!
-I'm sleepy. Bye.
*I am a
?Good morning. What are you planning to do today?
I'd like to do something fun!
:Do you have any recommendations?
In the example above, line 2-7 represents the 1st assistant turn.
- Line 2 is the main node of the 1st assistant turn.
- Line 3-8 is the subnodes of the 1st assistant turn.
- Line 3-4 is an upvoted subnode (with a line break).
- Line 5 is another upvoted subnode.
- Line 6 is a downvoted subnode.
- Line 7 is a writing subnode.
- Line 8 is an unscored subnode.
Note
We have not implemented the escape sequence for main nodes.
Lines starting with :
, +
, -
, *
or ?
are not valid main nodes.
First, clone this repository.
mkdir example
cd example
git clone https://github.com/Mya-Mya/PlainPreferenceTree/
And then import the repository.
from PlainPreferenceTree import *
Make the Plain-Preference-Tree format text.
ppttext = """Hello.
Hello. How can I assist today?
I'd like to do something fun!
:Do you have any recommendations?
How about walking around in your town?
+How about listening to music?
:It is relaxing to listen to music!
+How about reading books?
-I don't want to answer. Bye
*How about going
?So, you can play with me. Let's play together!
That sounds fun. What should I watch out for when walking?
When walking, it's important to be aware of your surroundings.
"""
Launch the parser.
The Preference-Tree is obtained by loads
method.
parser = PPTParserV1()
pt = parser.loads(ppttext)
Result:
[
Turn(role='user', main='Hello.', subnodes=[]),
Turn(role='assistant', main='Hello. How can I assist today?', subnodes=[]),
Turn(role='user', main="I'd like to do something fun!\nDo you have any recommendations?", subnodes=[]),
Turn(
role='assistant',
main='How about walking around in your town?',
subnodes=[
Subnode(type='upvoted', content='How about listening to music?\nIt is relaxing to listen to music!'),
Subnode(type='upvoted', content='How about reading books?'),
Subnode(type='downvoted', content="I don't want to answer. Bye"),
Subnode(type='writing', content='How about going'),
Subnode(type='unrated', content="So, you can play with me. Let's play together!")
]
),
Turn(role='user', main='That sounds fun. What should I watch out for when walking?', subnodes=[]),
Turn(role='assistant', main="When walking, it's important to be aware of your surroundings.", subnodes=[])
]
The SFT data is obtained by make_conversation
method.
sft_data = make_conversation(pt)
Result:
[
{'role': 'user', 'content': 'Hello.'},
{'role': 'assistant', 'content': 'Hello. How can I assist today?'},
{'role': 'user', 'content': "I'd like to do something fun!\nDo you have any recommendations?"},
{'role': 'assistant', 'content': 'How about walking around in your town?'},
{'role': 'user', 'content': 'That sounds fun. What should I watch out for when walking?'},
{'role': 'assistant', 'content': "When walking, it's important to be aware of your surroundings."}
]
The PO data is obtained by make_preferences
method.
preferences = make_preferences(pt)
Result:
[
{
'prompt': [
{'role': 'user', 'content': 'Hello.'},
{'role': 'assistant', 'content': 'Hello. How can I assist today?'},
{'role': 'user', 'content': "I'd like to do something fun!\nDo you have any recommendations?"}
],
'chosen': [
{'role': 'assistant', 'content': 'How about listening to music?\nIt is relaxing to listen to music!'}
],
'rejected': [{'role': 'assistant', 'content': "I don't want to answer. Bye"}]
},
{
'prompt': [
{'role': 'user', 'content': 'Hello.'},
{'role': 'assistant', 'content': 'Hello. How can I assist today?'},
{'role': 'user', 'content': "I'd like to do something fun!\nDo you have any recommendations?"}
],
'chosen': [{'role': 'assistant', 'content': 'How about reading books?'}],
'rejected': [{'role': 'assistant', 'content': "I don't want to answer. Bye"}]
},
{
'prompt': [
{'role': 'user', 'content': 'Hello.'},
{'role': 'assistant', 'content': 'Hello. How can I assist today?'},
{'role': 'user', 'content': "I'd like to do something fun!\nDo you have any recommendations?"}
],
'chosen': [{'role': 'assistant', 'content': 'How about walking around in your town?'}],
'rejected': [{'role': 'assistant', 'content': "I don't want to answer. Bye"}]
}
]