
only pretraining comparisons appear in the labeling interface #36

Open
mixuala opened this issue Nov 27, 2017 · 2 comments

mixuala commented Nov 27, 2017

I got to this point following the RL-teacher Usage docs:

"Once you have finished labeling the 175 pre-training comparisons, we train the predictor to convergence on the initial comparisons. After that, it will request additional comparisons every few seconds."

I was able to use the human-feedback-api webapp to provide feedback for the 175 pre-training labels. After that, the predictor began pretraining on that feedback:

8900/10000 predictor pretraining iters... 
9000/10000 predictor pretraining iters... 
9100/10000 predictor pretraining iters... 
9200/10000 predictor pretraining iters... 
9300/10000 predictor pretraining iters... 
9400/10000 predictor pretraining iters... 
9500/10000 predictor pretraining iters... 
9600/10000 predictor pretraining iters... 
9700/10000 predictor pretraining iters... 
9800/10000 predictor pretraining iters... 
9900/10000 predictor pretraining iters... 
Starting joint training of predictor and agent

But joint training failed: the human-feedback-api webapp displayed only blank screens, and when I checked the URL for the videos in a separate tab, I got an XML error message saying "The specified key does not exist".

At the same time, the teacher.py script continued to generate video segments and upload them to Google Cloud Storage:

Operation completed over 1 objects/14.4 KiB.                                     
Copying media to gs://rl-teacher-snappi/abb3e1ed-f78e-459d-bed8-a1865ed541b1-right.mp4 in a background process
Copying media to gs://rl-teacher-snappi/c21384b2-7395-49b5-b263-5200221a3a36-right.mp4 in a background process
Copying media to gs://rl-teacher-snappi/c21384b2-7395-49b5-b263-5200221a3a36-left.mp4 in a background process
Copying file:///tmp/rl_teacher_media/c21384b2-7395-49b5-b263-5200221a3a36-left.mp4 [Content-Type=video/mp4]...
Copying file:///tmp/rl_teacher_media/c21384b2-7395-49b5-b263-5200221a3a36-right.mp4 [Content-Type=video/mp4]...
Copying file:///tmp/rl_teacher_media/abb3e1ed-f78e-459d-bed8-a1865ed541b1-right.mp4 [Content-Type=video/mp4]...
\ [1 files][ 14.8 KiB/ 14.8 KiB]                                                
Operation completed over 1 objects/14.8 KiB.                                     
\ [1 files][ 15.8 KiB/ 15.8 KiB]                                                
Operation completed over 1 objects/16.1 KiB.                                     

Operation completed over 1 objects/15.8 KiB.                             

I can manually confirm that the media files exist in the Google Cloud Storage bucket.
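
One way to double-check (a sketch using the google-cloud-storage Python client; the bucket and object names come from the upload logs above):

# Sketch: verify an uploaded segment actually exists in the bucket
from google.cloud import storage

client = storage.Client()
bucket = client.bucket('rl-teacher-snappi')
blob = bucket.blob('c21384b2-7395-49b5-b263-5200221a3a36-left.mp4')
print(blob.exists())  # True if the object is really in the bucket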

I waited many minutes, refreshed the webapp, and even clicked "can't tell" a few times, but the videos never reappeared after the (successful) pre-training.

@nottombrown (Owner)

What is the URL for the key that does not exist? Perhaps your human-feedback-api webapp doesn't know which bucket to look at.
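
A quick sanity check (a sketch; it assumes the bucket is passed via the RL_TEACHER_GCS_BUCKET environment variable, as in the setup instructions) is to confirm that the webapp process sees the same bucket the teacher is uploading to:

# Sketch: run this in the environment that serves the human-feedback-api webapp.
# Assumes the bucket is configured via the RL_TEACHER_GCS_BUCKET environment
# variable; adjust the name if your setup differs.
import os
print(os.environ.get('RL_TEACHER_GCS_BUCKET'))  # should match the gs://... bucket in the upload logs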

mixuala commented Dec 5, 2017

I'm not an expert in Django yet, so I've just been hacking away. But it seems the problem is a sort-order mismatch between the process that records video segments (ascending, order_by('+created_at')) and the way the human-feedback-api webapp displays segments (order_by('-created_at')).

I added the following hack, and it seems to fix the problem, though I suspect it's not the right long-term fix.

# ./rl-teacher/human-feedback-api/human_feedback_api/views.py
from datetime import timedelta

from django.db.models import Q
from django.utils import timezone

from human_feedback_api.models import Comparison

def _all_comparisons(experiment_name, comparison_id=None, use_locking=True):
    not_responded = Q(responded_at__isnull=True)

    cutoff_time = timezone.now() - timedelta(minutes=5)
    not_in_progress = Q(shown_to_tasker_at__isnull=True) | Q(shown_to_tasker_at__lte=cutoff_time)
    finished_uploading_media = Q(created_at__lte=timezone.now() - timedelta(seconds=25))  # Give time for upload
    ready = not_responded & not_in_progress & finished_uploading_media

    # Hack: order by created_at ascending (same order as id) so the oldest
    # unlabeled comparisons, whose media has had time to upload, come first
    ascending = True
    if ascending:
        # Sort by priority, then put the OLDEST comparisons first
        ready = not_responded & finished_uploading_media
        return Comparison.objects.filter(ready, experiment_name=experiment_name).order_by('-priority', 'id')
    else:
        # Original behavior: sort by priority, then NEWEST comparisons first
        return Comparison.objects.filter(ready, experiment_name=experiment_name).order_by('-priority', '-created_at')

But I'm not exactly clear on how RL with human feedback is supposed to work. I'm running the experiments on an old MacBook Pro, so the available recorded video always lags behind the latest comparison shown as uploading in the logfile. I give feedback on 3-5 comparisons, then come back 10-20 minutes later for the next batch.

But it seems to me that the most recent comparison/video segments have the benefit of more Q-learning, so rating those comparisons would have a greater learning benefit. If I can only provide feedback on a few comparisons every 20 minutes, would I get better results by giving feedback on the most recent ones? Does the learning algorithm still work if I offer sparse feedback, or do I need to provide feedback for every comparison?

If yes, then I suppose it would be better to record and provide feedback on video segments from the most recent experiments first, right?
