# Animating Emotions by Interpolating Fibonacci Chains and Iconic Gestures
The ACTR project is a 3D presentation standard for avatars. It is a method of generating real-time avatar animation from the output of natural language processing (NLP) systems. Check out the full documentation and tutorial here.
Part of the SEED token project. This is a sneak preview - there is more to come. See the Wiki for more information.
SEED democratizes AI by offering an open and independent alternative to the monopolies of a few large corporations that currently control conversational user interfaces (CUIs) and AI technologies. SEED's licensed, monetized open-source platform for bots on blockchain supports collaboration and creative compensation that will exceed the proprietary deployments from industry giants. We are also giving users back control of their personal data. Find out more about the SEED Token project at seedtoken.io. See the Connect section at the end for contact info.
Go to our Developers Garden to see all featured projects, pick one and fill out the Developer Interest Form. If you would rather discuss ideas before rolling up your sleeves, please visit our Discord channels for developers.
The purpose of ACTR is to drive realtime animations that are contextually appropriate to arbitrary natural language input or output. This may include animating an avatar based on the output of a natural language processing system, driving a VR scenegraph via voice control, and other purposes. Ultimately, ACTR generates realtime cinema from natural language strings. The following specifications detail animating avatars that are connected to chatbots.
We began with a breakout of eight emotions (though this number can vary, eight has been convenient for us) and a single null / non-emotion. For a very similar model see Plutchik's wheel of emotions (which doesn't work as well here, for reasons beyond the scope of this explanation). We'll call these "emotion buckets." We do not name the emotions, as naming is both unimportant and can even be detrimental to the development process.
We then built eight 33-second animations, each corresponding to one of the eight emotions. Two null / non-emotion animations of the same duration were also made, giving us a total of ten animations. Each of the 33-second animations was split into a Fibonacci sequence of links (1, 1, 2, 3, 5, 8, and 13 second durations). These were saved for later use, intended as a library for real-time concatenation during the conversation with the avatar. Each link animation is then assigned a name based on these values, something like d13_e7 for emotion number seven at 13 seconds. This is intended as a reference for the player to then interpolate with other animations.
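As an illustration of that naming scheme, the sketch below enumerates the link library. The helper function and data structure are hypothetical; only the d&lt;seconds&gt;_e&lt;emotion&gt; convention comes from the spec.

```python
# A minimal sketch of the link library naming (hypothetical helper code;
# only the d<seconds>_e<emotion> convention comes from the spec).
FIB_DURATIONS = [1, 1, 2, 3, 5, 8, 13]   # seconds; the pieces of each 33 s animation

def link_name(duration_s: int, emotion: int) -> str:
    """Name of one link animation, e.g. d13_e7 = emotion #7 at 13 seconds."""
    return f"d{duration_s}_e{emotion}"

# Ten emotion buckets: eight emotions plus two neutrals (#9 and #10).
library = {link_name(d, e) for e in range(1, 11) for d in FIB_DURATIONS}
# The two 1-second links per emotion are later distinguished as A/B variants.
```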
The NLP system produces a block of output text in human-readable format. In order to leverage the prepared ACTR library (above) we need a duration value and an affect value. Duration is the time it takes to speak the text, calculated by assuming that one second of speech equals 7 bytes of text. Second, the as-yet-undetermined emotion (the affect of the output NLP text) is calculated by an external service (such as WordNet Affect, manual entry, or others) and the affect report is delivered to the PHP script that monitors MultiMedia tag generation. The script then produces a control file that roughly matches that emotion and duration without repeated animations happening adjacent to one another.
A block of output text is evaluated to determine two values:
- The duration report. Listed in seconds, this is based on the number of bytes (if using a TTS system), on recording length, or on whatever else determines how long the text takes to speak.
- The affect report. Listed as an integer from 1-10 that corresponds to the emotion number in our emotional model. These two values are then used to determine the duration and emotion of the composed, or chained, animation. This chained animation is a sequence of the links mentioned above, generated by interpolating between the end-values and start-values of successive link animations.
- Note: This multimedia script also prevents recurring animations. An example of what needs to be avoided is two responses that have the same emotion and the same duration. Consider the possibility that the system produces a block of text that amounts to 9 seconds of emotion #2, then another block of text, on the subsequent output, that is also 9 seconds of emotion #2. We do not want the avatar to play precisely the same animation, so the system might use 8+1/#2 to generate the first animation, but we do not want to run that same chain (8+1/#2) again, so we instead use anything but that; one option would be 5+1+3/#2. This avoids repeats. A minimal sketch of this chaining logic follows this list.
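The sketch below illustrates the chaining step under stated assumptions: duration is estimated at 7 bytes per second as described above, the chain is a greedy Fibonacci decomposition, and a repeat of the previous chain is avoided by splitting one link into two smaller ones. The function names are hypothetical; only the 7-bytes-per-second rule, the link durations, and the repeat-avoidance requirement come from the spec.

```python
# A minimal sketch of the chaining logic described above (hypothetical helpers).
FIB_LINKS = [13, 8, 5, 3, 2, 1]          # available link durations, seconds
FIB_SPLIT = {13: (8, 5), 8: (5, 3), 5: (3, 2), 3: (2, 1), 2: (1, 1)}

def duration_seconds(text: str) -> int:
    """Duration report: one second of speech per 7 bytes of text."""
    return max(1, round(len(text.encode("utf-8")) / 7))

def compose_chain(seconds: int, emotion: int, previous: tuple = ()) -> tuple:
    """Greedily decompose a duration into Fibonacci links; if the result
    would exactly repeat the previous chain, split one link into two
    smaller ones (e.g. 8+1 becomes 5+3+1)."""
    chain, remaining = [], seconds
    for d in FIB_LINKS:
        while d <= remaining:
            chain.append(d)
            remaining -= d
    if chain and tuple(f"d{d}_e{emotion}" for d in chain) == previous:
        for i, d in enumerate(chain):
            if d in FIB_SPLIT:
                chain[i:i + 1] = FIB_SPLIT[d]
                break
    return tuple(f"d{d}_e{emotion}" for d in chain)

# Two consecutive 9-second responses, both emotion #2:
first = compose_chain(9, 2)                    # ('d8_e2', 'd1_e2')
second = compose_chain(9, 2, previous=first)   # ('d5_e2', 'd3_e2', 'd1_e2')
```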
At particular moments that require particular emphasis we use Iconic Gestures to break this up and bring attention to the words being said, such that the Iconic Gesture matches the duration and sentiment of what is being said. ACTR provides 50 default iconic gestures (see below); however, there is a spectrum of integration / customisation, which is usually gated by scope.
As well as driving individual character animation, ACTR may also be used:
- To drive the entire scene graph (sun rises, flowers bloom, etc.)
- To drive camera direction (whip-pan to CU, to Extreme CU, dolly, etc.). Used in cinematic techniques, this is key to building affective relationships with the protagonists of stories, or other characters framed by the camera.
### ACTR JSON Format Overview
The ACTR JSON format is designed to allow the communication of avatar control data to an ACTR-capable client for the purpose of rendering synchronized mouth movement, emotional expression, and iconic gestures. Required animations:
### Base English language visemes

- o [o]
- a [a]
- c [cdgknrsy]
- e [e]
- f [fv]
- m [m]
- u [u]
- neutral
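To make the bracket notation concrete, the sketch below expands it into a character-to-viseme lookup. This is illustrative helper code, not part of the ACTR API.

```python
# Illustrative expansion of the viseme table above: each viseme name
# is keyed by the characters listed in its brackets (hypothetical helper).
VISEME_CHARS = {
    "o": "o", "a": "a", "c": "cdgknrsy",
    "e": "e", "f": "fv", "m": "m", "u": "u",
}
CHAR_TO_VISEME = {ch: v for v, chars in VISEME_CHARS.items() for ch in chars}

def visemes_for(text: str) -> list[str]:
    """Map each letter to its viseme, falling back to 'neutral'."""
    return [CHAR_TO_VISEME.get(ch, "neutral") for ch in text.lower() if ch.isalpha()]

# visemes_for("hello") -> ['neutral', 'e', 'neutral', 'neutral', 'o']
```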
### Other details
{ "sequence": {
"element": string, //name of the scene element that is
// the target of this sequence
"start_delay": integer, //start time delay from "0" for this se-quence(millisecs)
"total_duration": integer, //duration of all animations together (mil-liseconds)
"speech": string, //Human readable speech text
"phoneme": string, //International Phonetic Alphabet version of speech
"visemes": array of strings, //array of viseme animation names
//mapped by server from phonemes
"entries: [ //array of animation entries
{
"type": string, //entry type [head|body|iconic]
"label": string, //label used in the model
"description": string, //human readable helper description
"duration": float, //duration of this entry in sec-onds.
"order": integer, //playback order of this entry
//(preserved within type)
"info": { //non-common fields, custom per type
"type": string, //should match parent type
"emotion": string, //emotion label, typically 1 through 10
"duration": string, //duration label
// [1000a,1000b,2000,3000,5000,8000,13000]
"mouth": float, //used for custom mixing of mouth varia-ble
"eyes": float //used for custom mixing of eye variable
}
}
]
}
}
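A hypothetical example instance of this format follows. All values are illustrative (a short happy utterance chained as 2 s + 1 s of emotion #2); none of them come from a real ACTR server.

```json
{ "sequence": {
    "element": "avatar01",
    "start_delay": 0,
    "total_duration": 3000,
    "speech": "Nice to see you again!",
    "phoneme": "naɪs tu si ju əˈɡɛn",
    "visemes": ["neutral", "a", "c", "e", "u", "c", "e", "u", "a", "c", "e", "neutral"],
    "entries": [
      { "type": "head", "label": "head_e2_d2000", "description": "happy head, 2 s",
        "duration": 2.0, "order": 1,
        "info": { "type": "head", "emotion": "2", "duration": "2000", "mouth": 1.0, "eyes": 1.0 } },
      { "type": "head", "label": "head_e2-A_d1000", "description": "happy head, 1 s (A variant)",
        "duration": 1.0, "order": 2,
        "info": { "type": "head", "emotion": "2", "duration": "1000a", "mouth": 1.0, "eyes": 1.0 } }
    ]
  }
}
```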
### Full list of ACTR animations

- The character should be split into multiple FBX files per the markup on the list
- Each FBX file MUST use the same RIG
- Each FBX file should contain one animation clip per named animation assigned to it
- It is preferred that the animations are not set to have one keyframe per frame
- Durations are in milliseconds
- At 30 frames per second, the mappings are:
- 500ms = 15F
- 1000ms = 30F
- 2000ms = 60F
- 3000ms = 90F
- 4000ms = 120F
- 5000ms = 150F
- 8000ms = 240F
- 13000ms = 390F
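The mapping follows directly from frames = milliseconds × 30 / 1000; a one-line check (illustrative only):

```python
# Frame counts at 30 fps: frames = milliseconds * 30 / 1000 (illustrative check).
def frames_at_30fps(ms: int) -> int:
    return ms * 30 // 1000

assert [frames_at_30fps(ms) for ms in (500, 1000, 2000, 13000)] == [15, 30, 60, 390]
```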
Mouth entries are so short that ACTR will just use LERP to transition between them.
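For reference, LERP here is standard linear interpolation between two poses; a minimal sketch (illustrative, not the ACTR API):

```python
def lerp(a: float, b: float, t: float) -> float:
    """Linear interpolation: t=0 returns a, t=1 returns b."""
    return a + (b - a) * t

# e.g. blending a mouth-open channel halfway between two viseme poses:
# lerp(open_amount_o, open_amount_a, 0.5)
```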
Emotions are defined according to the emotion wheel (see related image file). For head, emotions are essentially the various combinations of smile/frown with eyebrows up/down.
Here are the names of the emotions mapped to the numbers:
e1 - Mischievous
e2 - Happy
e3 - Kind/Sympathetic
e4 - Worried/Uncertain
e5 - Sad/Hurt
e6 - Disapproving/Disappointed
e7 - Angry
e8 - Determined
e9,10 - Neutral
Here is the full set of categories: e1,e2,e3,e4,e5,e6,e7,e8,e9,admiring,encouraging,appreciative,congratulatory,positive,cheers,wave,hello,goodbye,thumbsup,onethumbup,pout
NLP can send comma-delimited sets of the categories to be played back, and ACTR will generate a playlist of the corresponding length combining the different categories.
e.g. affect=thumbsup,e2
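A minimal sketch of that playlist step follows. The `build_playlist` helper and its selection strategy are hypothetical; only the category names and the affect=... input format come from the spec (the real selection logic, duration matching and repeat avoidance, is described above).

```python
# Illustrative playlist assembly from a comma-delimited affect string.
ICONIC = {"thumbsup": ["iconic_thumbsup_d3000"]}
EMOTION = {"e2": ["head_e2_d2000", "head_e2-A_d1000"]}

def build_playlist(affect: str) -> list[str]:
    """Concatenate animations for each requested category, in order."""
    playlist = []
    for category in affect.split(","):
        playlist += ICONIC.get(category, []) + EMOTION.get(category, [])
    return playlist

# build_playlist("thumbsup,e2")
# -> ['iconic_thumbsup_d3000', 'head_e2_d2000', 'head_e2-A_d1000']
```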
Head emotions should also include blinking, etc. For body, emotions should include hand gestures, breathing, etc.
Neutral emotions e9 and e10 will be "loopable" on the base neutral keyframe and so do not need transitions. Transitions to emotions and transitions from emotions must be played before each emotion animation sequence. This is true for both head emotion tracks and body emotion tracks.
Head and body for a given emotion will start and end in the "emotion" state and so are loopable, but they require a transition-to-emotion animation and a transition-from-emotion animation to be played before a given emotion sequence is played.
Iconics are also loopable, starting and ending on the "neutral" keyframe to avoid the need for specific iconic transition animations.
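A sketch of this scheduling rule, assuming the reading that the transition clips are queued ahead of the emotion block (as stated above) and that neutral e9/e10 needs none; the function and queue representation are hypothetical.

```python
# Illustrative scheduling of transition clips around emotion sequences.
NEUTRAL = {"e9", "e10"}

def schedule(track: str, prev_emotion: str, emotion: str, chain: list[str]) -> list[str]:
    """Queue for one track ('head' or 'body'): transitions, then the chain."""
    queue = []
    if prev_emotion != emotion:
        if prev_emotion not in NEUTRAL:
            queue.append(f"{track}_transition-from-{prev_emotion}_d500")
        if emotion not in NEUTRAL:
            queue.append(f"{track}_transition-to-{emotion}_d500")
    return queue + chain

# schedule("head", "e9", "e2", ["head_e2_d2000", "head_e2-A_d1000"])
# -> ['head_transition-to-e2_d500', 'head_e2_d2000', 'head_e2-A_d1000']
```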
BELOW IS THE FULL SET OF EMOTION ANIMATIONS AND TRANSITIONS IN ORDER OF IMPLEMENTATION PRIORITY.
Please NOTE the naming convention for the A and B variants on the 1-second versions of each emotion.
Please NOTE the markup indicating which glTF/glb file should contain the animation.
<mouth_and_iconic.glb>
mouth_o_d500 mouth_a_d500 mouth_c_d500 mouth_e_d500 mouth_f_d500 mouth_m_d500 mouth_u_d500 mouth_neutral_d500
</mouth_and_iconic.glb>
<e9.glb>
head_e9-A_d1000 head_e9-B_d1000 head_e9_d2000 head_e9_d3000 head_e9_d5000 head_e9_d8000 head_e9_d13000
body_e9-A_d1000 body_e9-B_d1000 body_e9_d2000 body_e9_d3000 body_e9_d5000 body_e9_d8000 body_e9_d13000
</e9.glb>
<e2.glb>
head_transition-to-e2_d500 head_transition-from-e2_d500 body_transition-to-e2_d500 body_transition-from-e2_d500
head_e2-A_d1000 head_e2-B_d1000 head_e2_d2000 head_e2_d3000 head_e2_d5000 head_e2_d8000 head_e2_d13000
body_e2-A_d1000 body_e2-B_d1000 body_e2_d2000 body_e2_d3000 body_e2_d5000 body_e2_d8000 body_e2_d13000
</e2.glb>
<e10.glb>
head_e10-A_d1000 head_e10-B_d1000 head_e10_d2000 head_e10_d3000 head_e10_d5000 head_e10_d8000 head_e10_d13000
body_e10-A_d1000 body_e10-B_d1000 body_e10_d2000 body_e10_d3000 body_e10_d5000 body_e10_d8000 body_e10_d13000
</e10.glb>
<e3.glb>
head_transition-to-e3_d500 head_transition-from-e3_d500 body_transition-to-e3_d500 body_transition-from-e3_d500
head_e3-A_d1000 head_e3-B_d1000 head_e3_d2000 head_e3_d3000 head_e3_d5000 head_e3_d8000 head_e3_d13000
body_e3-A_d1000 body_e3-B_d1000 body_e3_d2000 body_e3_d3000 body_e3_d5000 body_e3_d8000 body_e3_d13000
</e3.glb>
<e4.glb>
head_transition-to-e4_d500 head_transition-from-e4_d500 body_transition-to-e4_d500 body_transition-from-e4_d500
head_e4-A_d1000 head_e4-B_d1000 head_e4_d2000 head_e4_d3000 head_e4_d5000 head_e4_d8000 head_e4_d13000
body_e4-A_d1000 body_e4-B_d1000 body_e4_d2000 body_e4_d3000 body_e4_d5000 body_e4_d8000 body_e4_d13000
</e4.glb>
<mouth_and_iconic.glb>
iconic_admiring-A_d1000 iconic_admiring-A_d2000 iconic_admiring-A_d3000 iconic_admiring-A_d4000 iconic_admiring-A_d5000 iconic_encouraging-A_d1000 iconic_encouraging-A_d2000 iconic_encouraging-A_d3000 iconic_encouraging-A_d4000 iconic_encouraging-A_d5000 iconic_appreciative-A_d1000 iconic_appreciative-A_d2000 iconic_appreciative-A_d3000 iconic_appreciative-A_d4000 iconic_appreciative-A_d5000 iconic_congratulatory-A_d1000 iconic_congratulatory-A_d2000 iconic_congratulatory-A_d3000 iconic_congratulatory-A_d4000 iconic_congratulatory-A_d5000 iconic_positive-A_d1000 iconic_positive-A_d2000 iconic_positive-A_d3000 iconic_positive-A_d4000 iconic_positive-A_d5000
</mouth_and_iconic.glb>
<e6.glb>
head_transition-to-e6_d500 head_transition-from-e6_d500 body_transition-to-e6_d500 body_transition-from-e6_d500
head_e6-A_d1000 head_e6-B_d1000 head_e6_d2000 head_e6_d3000 head_e6_d5000 head_e6_d8000 head_e6_d13000
body_e6-A_d1000 body_e6-B_d1000 body_e6_d2000 body_e6_d3000 body_e6_d5000 body_e6_d8000 body_e6_d13000
</e6.glb>
<e5.glb>
head_transition-to-e5_d500 head_transition-from-e5_d500 body_transition-to-e5_d500 body_transition-from-e5_d500
head_e5-A_d1000 head_e5-B_d1000 head_e5_d2000 head_e5_d3000 head_e5_d5000 head_e5_d8000 head_e5_d13000
body_e5-A_d1000 body_e5-B_d1000 body_e5_d2000 body_e5_d3000 body_e5_d5000 body_e5_d8000 body_e5_d13000
</e5.glb>
<e1.glb>
head_transition-to-e1_d500 head_transition-from-e1_d500 body_transition-to-e1_d500 body_transition-from-e1_d500
head_e1-A_d1000 head_e1-B_d1000 head_e1_d2000 head_e1_d3000 head_e1_d5000 head_e1_d8000 head_e1_d13000
body_e1-A_d1000 body_e1-B_d1000 body_e1_d2000 body_e1_d3000 body_e1_d5000 body_e1_d8000 body_e1_d13000
</e1.glb>
<e8.glb>
head_transition-to-e8_d500 head_transition-from-e8_d500 body_transition-to-e8_d500 body_transition-from-e8_d500
head_e8-A_d1000 head_e8-B_d1000 head_e8_d2000 head_e8_d3000 head_e8_d5000 head_e8_d8000 head_e8_d13000
body_e8-A_d1000 body_e8-B_d1000 body_e8_d2000 body_e8_d3000 body_e8_d5000 body_e8_d8000 body_e8_d13000
</e8.glb>
<e7.glb>
head_transition-to-e7_d500 head_transition-from-e7_d500 body_transition-to-e7_d500 body_transition-from-e7_d500
head_e7-A_d1000 head_e7-B_d1000 head_e7_d2000 head_e7_d3000 head_e7_d5000 head_e7_d8000 head_e7_d13000
body_e7-A_d1000 body_e7-B_d1000 body_e7_d2000 body_e7_d3000 body_e7_d5000 body_e7_d8000 body_e7_d13000
</e7.glb>
<mouth_and_iconic.glb>
iconic_cheers_d3000 iconic_wave_d3000 iconic_hello_d3000 iconic_goodbye_d3000 iconic_thumbsup_d3000 iconic_onethumbup_d3000 iconic_pout_d3000
</mouth_and_iconic.glb>
// Iconics are also loopable, starting and ending on the "neutral" keyframe to avoid the need for specific iconic transition animations.
iconic_cheers_d1000 iconic_wave_d1000 iconic_hello_d1000 iconic_goodbye_d1000 iconic_thumbsup_d1000 iconic_onethumbup_d1000 iconic_pout_d1000
# | User Says (NLP_IN) | Gesture Description | Gesture Name / ID
---|---|---|---
1 | | (idiosyncratic) | iconic_interestedA_d1000
2 | | (idiosyncratic) | iconic_interestedB_d2000
3 | | (idiosyncratic) | iconic_interestedC_d3000
4 | | (idiosyncratic) | iconic_interestedD_d4000
5 | | (idiosyncratic) | iconic_interestedE_d5000
6 | | head cocked to left | iconic_thinkingA_d1000
7 | | chin raised, eyes narrowed | iconic_thinkingB_d2000
8 | | chin lowered, looks at user | iconic_thinkingC_d3000
9 | | holds chin in right hand, folds left arm across abdomen | iconic_thinkingD_d4000
10 | | rests chin on right fist, folds left arm across abdomen | iconic_thinkingE_d5000
11 | | Eyes widen/eyebrows raise | iconic_surprisedA_d1000
12 | | (idiosyncratic) | iconic_surprisedB_d2000
13 | | (idiosyncratic) | iconic_surprisedC_d3000
14 | | (idiosyncratic) | iconic_surprisedD_d4000
15 | | (idiosyncratic) | iconic_surprisedE_d5000
16 | | Right finger to chin | iconic_listeningA_d1000
17 | | Right finger to right cheek | iconic_listeningB_d2000
18 | | Eyes squint, nod | iconic_listeningC_d3000
19 | | one eyebrow raised | iconic_listeningD_d4000
20 | | head cocked to right | iconic_listeningE_d5000
21 | | Puts hands behind body and nods head | iconic_concernedA_d1000
22 | | Cocks head, chin moving to left, and furrows forehead | iconic_concernedB_d2000
23 | | Holds chin in left hand, folds right arm across abdomen, left elbow resting on top of right hand | iconic_concernedC_d3000
24 | | Places left palm across left cheek, folds right arm across abdomen | iconic_concernedD_d4000
25 | | Jut chin forward and up, furrow brow | iconic_concernedE_d5000
Prioritisation (top ten, ordered from most useful downward, with an evenly scattered range of durations)
- iconic_listeningA_d1000
- iconic_listeningB_d2000
- iconic_listeningC_d3000
- iconic_listeningD_d4000
- iconic_listeningE_d5000
- iconic_thinkingA_d1000
- iconic_thinkingB_d2000
- iconic_thinkingC_d3000
- iconic_thinkingD_d4000
- iconic_thinkingE_d5000
# | Bot Says (NLP_OUT) | Gesture Description | Gesture Name
---|---|---|---
1 | | Eyebrows raise, small smile | iconic_admiringA_d1000
2 | | Smile, nod head | iconic_admiringB_d2000
3 | | Eyebrows raise, head swivel so that chin moves down and to the left (head cock) | iconic_admiringC_d3000
4 | | Holds up right hand index finger, points at user (You!) | iconic_admiringD_d4000
5 | | Thumbs up, right hand | iconic_admiringE_d5000
6 | | (idiosyncratic) | iconic_encouragingA_d1000
7 | | (idiosyncratic) | iconic_encouragingB_d2000
8 | | (idiosyncratic) | iconic_encouragingC_d3000
9 | | Nods head | iconic_encouragingD_d4000
10 | | Folds hands in front of body and nods head | iconic_encouragingE_d5000
11 | | Thumbs up, right hand | iconic_appreciativeA_d1000
12 | | OK sign, right hand | iconic_appreciativeB_d2000
13 | | (idiosyncratic) | iconic_appreciativeC_d3000
14 | | (idiosyncratic) | iconic_appreciativeD_d4000
15 | | (idiosyncratic) | iconic_appreciativeE_d5000
16 | | Thumbs up | iconic_congratulatoryA_d1000
17 | | Smile, thumbs up | iconic_congratulatoryB_d2000
18 | | Claps (five beats) | iconic_congratulatoryC_d3000
19 | | Nod, smile, thumbs up | iconic_congratulatoryD_d4000
20 | | Fist pump in the air, smile | iconic_congratulatoryE_d5000
21 | | Nod head | iconic_positiveA_d1000
22 | | Close eyes, nod head two beats, open eyes | iconic_positiveB_d2000
23 | | (idiosyncratic) | iconic_positiveC_d3000
24 | | (idiosyncratic) | iconic_positiveD_d4000
25 | | (idiosyncratic) | iconic_positiveE_d5000
Prioritisation (top ten, ordered from most useful downward, with a mostly-evenly scattered range of durations)
- iconic_admiringA_d1000
- iconic_admiringB_d2000
- iconic_admiringC_d3000
- iconic_admiringD_d4000
- iconic_admiringE_d5000
- iconic_appreciativeA_d1000
- iconic_appreciativeB_d2000
- iconic_positiveA_d1000
- iconic_positiveB_d2000
- iconic_encouragingD_d4000
### Prioritization of Affect

In 100 outputs the avatar will be neutral (#9 and #10) or happy (#2) the majority of the time. Generally speaking, the left side of the chart is less commonly used, so the avatar is rarely seen with its eyebrows down, and even less commonly seen with both eyebrows and mouth down.
Group | Priority
---|---
#9 | 20%
#2 | 20%
#10 | 20% (these top three total 60% of animations)
#3 | 12%
#4 | 9%
#6 | 6%
#5 | 6%
#1 | 3%
#8 | 2%
#7 | 2%
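A sketch of sampling emotion buckets against this distribution (illustrative helper; the weights come from the table above, and `random.choices` is standard-library Python):

```python
import random

# Emotion-bucket weights from the prioritization table above (sums to 100).
AFFECT_WEIGHTS = {9: 20, 2: 20, 10: 20, 3: 12, 4: 9, 6: 6, 5: 6, 1: 3, 8: 2, 7: 2}

def sample_emotion() -> int:
    """Draw an emotion bucket according to the target distribution."""
    buckets, weights = zip(*AFFECT_WEIGHTS.items())
    return random.choices(buckets, weights=weights, k=1)[0]

# Over many outputs, roughly 60% of draws land in #9, #2, or #10.
```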
These files are made available to you on an as-is and restricted basis, and may only be redistributed or sold to any third party as expressly indicated in the Terms of Use for Seed Vault.
Patents and related intellectual property (full applications available upon request)
SYSTEMS AND METHODS FOR AN AUTONOMOUS AVATAR DRIVER;
- Application #14/536,626
- Means of using NLP and identification methods to provide a visual representation of an avatar to talk and perform in-world (virtual) functions.
SYSTEMS AND METHODS FOR CINEMATIC DIRECTION AND DYNAMIC CHARACTER CONTROL VIA NATURAL LANGUAGE OUTPUT
- Application #15/49,164
- Driving realtime 3D graphics with linguistic (NLG) output; this is a method of turning text into a cinematic sequence of animations, such as animating an avatar and an associated scenegraph.
Feel free to post general questions regarding SEED and what to expect in the following months here on GitHub at @consiliera ([email protected]) ☀️
Connect with us elsewhere
- Discuss ACTR on the actr channel in Discord
- Go to our Developers Garden to see all SEED Community projects and find resources
- Follow us on Twitter
- Always hear the latest news first on Telegram
Seed Vault Code (c) Botanic Technologies, Inc. Used under license.