Visual Behavior release data has bad session type #1975

alexpiet · 2021-03-02T01:20:11Z

Describe the bug
When loading the list of training sessions that will be released in the March data release, some sessions do not have the correct session type format. For instance, they have '1_gratings' instead of 'TRAINING_1_gratings'. The problem is limited to two donor_ids: 744911447 and 722884873.

@dougollerenshaw @matchings say this could be an issue in the VBA repo.

To Reproduce
table = loading.get_filtered_behavior_session_table(release_data_only=True)
table.session_type.unique()
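A minimal sketch of how the malformed values can be picked out of that list, assuming (as in the example above) that a valid session type starts with either TRAINING or OPHYS. The function name and the sample values are illustrative and are not part of VBA or AllenSDK.

```python
VALID_PREFIXES = ("TRAINING", "OPHYS")  # assumed naming scheme


def find_bad_session_types(session_types):
    """Return the session types that lack an expected prefix, in order."""
    return [s for s in session_types if not s.startswith(VALID_PREFIXES)]


# Illustrative sample values, not read from LIMS or mtrain.
sample = ["TRAINING_1_gratings", "1_gratings", "OPHYS_1_images_A"]
print(find_bad_session_types(sample))
```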

Actual Behavior

Environment (please complete the following information):

OS & version: centos
Python version 3.7
AllenSDK version 2.8.0

Remaining work 3/12/2021

Ask Doug to manually update the 36 files listed below
Manually update records in mtrain to match
update relevant NWB files, and validate


djkapner · 2021-03-05T20:05:37Z

I think I have demonstrated:

there is not any underlying data issue
there is not any underlying SDK issue

so I suspect this might be a VBA problem (@alexpiet @dougollerenshaw):

In [57]: run problem_pkl.py
Getting behavior-only session data. This might take a while...
Getting all ophys sessions. This might take a while.
0 data problems
0 sdk problems

code (using dev branch of VBA and AllenSDK v2.8.0)

import numpy as np
import pandas as pd
from pathlib import Path

from visual_behavior.data_access import loading
from visual_behavior import database as db
from allensdk.brain_observatory.behavior.behavior_session import BehaviorSession

table = loading.get_filtered_behavior_session_table(release_data_only=True)
stypes = table.session_type.unique()
bad_stypes = [s for s in stypes
              if not s.startswith(("TRAINING", "OPHYS"))]

# behavior session ids with bad session_types
sids = []
for stype in bad_stypes:
    sids.extend(table[table.session_type == stype].index.values)
sids = np.unique(sids)

# get the pkl_paths for these sessions
qstring = (
        """
        SELECT wkf.storage_directory || wkf.filename as path
        FROM well_known_files as wkf
        JOIN well_known_file_types as wkft
        ON wkft.id=wkf.well_known_file_type_id
        WHERE wkft.name='StimulusPickle'
        AND attachable_id IN ({})""")
ids_as_str = ",".join([str(i) for i in sids])
qresult = db.lims_query(qstring.format(ids_as_str))
pkl_paths = [Path(i) for i in qresult['path'].values]

# check that the data itself is ok
data_problems = []
for pkl_path in pkl_paths:
    data = pd.read_pickle(pkl_path)
    s1 = data['items']['behavior']['params']['stage']
    s2 = data['items']['behavior']['cl_params']['stage']
    # Use an explicit check rather than assert: asserts are skipped under `python -O`.
    valid = ("OPHYS", "TRAINING")
    if not (s1.startswith(valid) and s2.startswith(valid) and s1 == s2):
        data_problems.append(pkl_path)
print(f"{len(data_problems)} data problems")

# check what AllenSDK says
sdk_problems = []
for behavior_id in sids:
    bs = BehaviorSession.from_lims(behavior_id)
    st = bs.task_parameters['session_type']
    # Explicit check instead of assert, so the test also runs under `python -O`.
    if not st.startswith(("OPHYS", "TRAINING")):
        sdk_problems.append(behavior_id)
print(f"{len(sdk_problems)} sdk problems")
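The per-file check in the script above can be pulled into a small helper. This is a sketch that assumes the pkl contents are a nested dict with the same keys the script reads; the helper name and the sample dicts are hypothetical.

```python
VALID_PREFIXES = ("TRAINING", "OPHYS")


def stage_is_consistent(data):
    """True if both stage fields agree and carry an expected prefix."""
    behavior = data["items"]["behavior"]
    s1 = behavior["params"]["stage"]
    s2 = behavior["cl_params"]["stage"]
    return s1 == s2 and s1.startswith(VALID_PREFIXES)


# Hypothetical stand-ins for the contents of two pkl files.
good = {"items": {"behavior": {
    "params": {"stage": "TRAINING_1_gratings"},
    "cl_params": {"stage": "TRAINING_1_gratings"}}}}
bad = {"items": {"behavior": {
    "params": {"stage": "1_gratings"},
    "cl_params": {"stage": "1_gratings"}}}}

print(stage_is_consistent(good), stage_is_consistent(bad))
```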

alexpiet · 2021-03-05T20:39:33Z

@matchings

alexpiet · 2021-03-08T17:12:51Z

I created a VBA issue: AllenInstitute/visual_behavior_analysis#722

matchings · 2021-03-08T19:45:21Z

@alexpiet @djkapner loading.get_filtered_behavior_session_table() gets data from lims using the SDK BehaviorProjectCache

https://github.com/AllenInstitute/visual_behavior_analysis/blob/dev/visual_behavior/data_access/loading.py#L232

Loading behavior sessions from the SDK directly also shows cases where session_type does not start with TRAINING.

I don't think this is a VBA issue. This may be a case where @dougollerenshaw needs to edit the pkl files to correct the session_type so that it conforms to our expected naming scheme.

djkapner · 2021-03-10T23:01:55Z

Here's what is happening:
AllenSDK's BehaviorProjectCache queries mtrain for stimulus_name (also called session_type or stage). It does this because mtrain has a table that contains both foraging_id (or behavior_session_id) and stimulus_name.
In VBA, AllenSDK is being used like this:

    cache = BehaviorProjectCache.from_lims()
    behavior_sessions = cache.get_behavior_session_table()

which creates a table of every behavior session ever recorded, about 27,000 of them. Querying the mtrain table takes 1.5 seconds.
LIMS, on the other hand, does not know about stimulus_name. Without mtrain, AllenSDK needs to open each pkl file to get that information, which takes 45 minutes for 27,000 behavior sessions.
I don't know the root cause, but some pkl files had problematic stimulus_names. Doug fixed these manually in the pkl files, but mtrain was never updated to match.
We're planning to:

manually fix the incorrect mtrain entries
update LIMS so the behavior_sessions table also has stimulus_name as a column, so that we can decouple ourselves and the API from mtrain, which we don't really have control over.
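To illustrate the mismatch described above, here is a sketch that compares the stage name mtrain holds with the one stored in the pkl file for each session. Both dicts are made-up stand-ins keyed by a hypothetical session id; no real mtrain or LIMS call is made.

```python
def find_stale_entries(mtrain_names, pkl_names):
    """Return {session_id: (mtrain_name, pkl_name)} where the two disagree."""
    stale = {}
    for session_id, pkl_name in pkl_names.items():
        mtrain_name = mtrain_names.get(session_id)
        if mtrain_name != pkl_name:
            stale[session_id] = (mtrain_name, pkl_name)
    return stale


# Made-up ids; the names follow the pattern reported in this issue.
mtrain_names = {1: "1_gratings", 2: "TRAINING_2_gratings_flashed"}
pkl_names = {1: "TRAINING_1_gratings", 2: "TRAINING_2_gratings_flashed"}
print(find_stale_entries(mtrain_names, pkl_names))
```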

djkapner · 2021-03-10T23:51:23Z

For manually updating mtrain, we should run these 9 commands:

UPDATE stages SET name='TRAINING_0_gratings_autorewards_15min' WHERE id=66
UPDATE stages SET name='TRAINING_1_gratings' WHERE id=67
UPDATE stages SET name='TRAINING_2_gratings_flashed' WHERE id=62
UPDATE stages SET name='TRAINING_3_images_a_10uL_reward' WHERE id=63
UPDATE stages SET name='TRAINING_4_images_a_training' WHERE id=65
UPDATE stages SET name='TRAINING_4_images_a_handoff_ready' WHERE id=70
UPDATE stages SET name='TRAINING_4_images_a_handoff_lapsed' WHERE id=69
UPDATE stages SET name='TRAINING_4_images_a_handoff_lapsed' WHERE id=108
UPDATE stages SET name='TRAINING_2_gratings_flashed' WHERE id=100

generated from:

import json
import numpy as np
from visual_behavior.data_access import loading
from allensdk.internal.api import db_connection_creator
from allensdk.core.auth_config import MTRAIN_DB_CREDENTIAL_MAP
from allensdk.brain_observatory.behavior.behavior_session import BehaviorSession

# visual behavior release data
table = loading.get_filtered_behavior_session_table(release_data_only=True)
stypes = table.session_type.unique()
bad_stypes = [s for s in stypes
              if not s.startswith(("TRAINING", "OPHYS"))]

# behavior session ids with bad session_types
problem_ids = []
for stype in bad_stypes:
    problem_ids.extend(table[table.session_type == stype].index.values)
problem_ids = [int(i) for i in np.unique(problem_ids)]

qstring = """
          SELECT stages.id, stages.name as session_type
          FROM behavior_sessions bs
          JOIN stages ON stages.id = bs.state_id
          WHERE bs.id='{}'"""
mtrain = db_connection_creator(fallback_credentials=MTRAIN_DB_CREDENTIAL_MAP)
update_template = "UPDATE stages SET name='{}' WHERE id={}"

# figure out what update statements we need to run for mtrain
mymap = []
my_updates = []
for problem_id in problem_ids:
    session = BehaviorSession.from_lims(problem_id)
    foraging_id = session.api.extractor.foraging_id
    lims_session_type = session.task_parameters['session_type']
    mtrain_result = mtrain.select(qstring.format(foraging_id))
    mtrain_stages_id = int(mtrain_result['id'].values[0])
    mtrain_session_type = mtrain_result['session_type'].values[0]
    assert mtrain_session_type in lims_session_type
    mymap.append({
            "behavior_session_id": problem_id,
            "lims": lims_session_type,
            "mtrain": mtrain_session_type,
            "foraging_id": foraging_id,
            "mtrain_stages_id": mtrain_stages_id})
    ustr = update_template.format(lims_session_type, mtrain_stages_id)
    if ustr not in my_updates:
        my_updates.append(ustr)
for u in my_updates:
    print(u)
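If the updates were applied from Python rather than pasted by hand, bound parameters and a single transaction would avoid string formatting in SQL and a partially applied batch. The sketch below uses an in-memory SQLite table as a stand-in for the mtrain stages table. The schema and the starting name for id 62 are assumptions, and only two of the rows listed above are shown.

```python
import sqlite3

# (new name, stages.id) pairs taken from the list in this comment.
updates = [
    ("TRAINING_1_gratings", 67),
    ("TRAINING_2_gratings_flashed", 62),
]

conn = sqlite3.connect(":memory:")  # stand-in, not the real mtrain database
conn.execute("CREATE TABLE stages (id INTEGER PRIMARY KEY, name TEXT)")
# Assumed pre-fix values, for illustration only.
conn.executemany("INSERT INTO stages (id, name) VALUES (?, ?)",
                 [(67, "1_gratings"), (62, "2_gratings_flashed")])

# The connection context manager commits on success and rolls back on error.
with conn:
    conn.executemany("UPDATE stages SET name = ? WHERE id = ?", updates)

rows = conn.execute("SELECT id, name FROM stages ORDER BY id").fetchall()
print(rows)
```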

djkapner · 2021-03-11T04:31:19Z

Wayne ran these mtrain updates, and this is fixed.
The mtrain configuration itself was fixed in October 2018; these entries predate that fix.

alexpiet · 2021-03-11T23:25:04Z

For manually updating mtrain, we should run these 9 commands:

UPDATE stages SET name='TRAINING_0_gratings_autorewards_15min' WHERE id=66
UPDATE stages SET name='TRAINING_1_gratings' WHERE id=67
UPDATE stages SET name='TRAINING_2_gratings_flashed' WHERE id=62
UPDATE stages SET name='TRAINING_3_images_a_10uL_reward' WHERE id=63
UPDATE stages SET name='TRAINING_4_images_a_training' WHERE id=65
UPDATE stages SET name='TRAINING_4_images_a_handoff_ready' WHERE id=70
UPDATE stages SET name='TRAINING_4_images_a_handoff_lapsed' WHERE id=69
UPDATE stages SET name='TRAINING_4_images_a_handoff_lapsed' WHERE id=108
UPDATE stages SET name='TRAINING_2_gratings_flashed' WHERE id=100

@djkapner This is a great solution, except that the pattern is TRAINING_3_images_A_10uL_reward, not TRAINING_3_images_a_10uL_reward. The same applies to TRAINING_4_images_a_* (the image set "a" should be capitalized as "A").
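A small sketch of the capitalization rule described here, assuming the image set is a single lowercase letter that directly follows `_images_`. The helper name is hypothetical, and it only rewrites strings; it does not touch pkl files or mtrain.

```python
import re


def capitalize_image_set(session_type):
    """Upper-case a single-letter image set that follows '_images_'."""
    return re.sub(r"(_images_)([a-z])(?=_|$)",
                  lambda m: m.group(1) + m.group(2).upper(),
                  session_type)


for name in ["TRAINING_3_images_a_10uL_reward",
             "TRAINING_4_images_a_handoff_lapsed",
             "TRAINING_1_gratings"]:
    print(capitalize_image_set(name))
```

Names without an image set, such as TRAINING_1_gratings, pass through unchanged.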

djkapner · 2021-03-11T23:42:57Z

@alexpiet
These names are what exist in the pkl files. These files are inputs to us. Thus far, when manual correction has been necessary for these files, your team has done it (@dougollerenshaw).
If you do change the files again, mtrain needs to be updated as well.

alexpiet · 2021-03-11T23:46:12Z

Ok thanks @djkapner.

@dougollerenshaw, can you update the pkl files so the session type is consistent?

djkapner · 2021-03-11T23:57:48Z

looks like 36 files:

['/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_746928360/180907132507_403491_e59bbcc2-f516-4357-9771-3cc2d40cc022.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_750415261/180910131659_403491_cf69cb33-fe93-40cd-b461-1ce9bc35facc.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_750781358/180911125733_403491_f704777c-51e6-43d0-ae06-4858f819dd13.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_752002559/180912140203_403491_a8982681-7cc7-42a0-998b-fa6786370b47.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_753011296/180913132324_403491_d128fed8-3f8a-4e9e-9d3a-be9821d7c566.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_753936007/180914144415_403491_1fb8d318-49c2-469b-9423-235dec7a1016.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_754509820/180917140145_403491_161ebbbc-379b-405e-bc71-75f476f84df8.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_754896244/180918143907_403491_75f0c88c-acaa-47d1-993e-592cbc4696dd.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_755551879/180919132333_403491_5baf86f9-48b8-4638-b31e-55f23d25b57a.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_756106020/180920143927_403491_2ed1df9a-7fe2-4460-8d35-c82f4918fe56.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_756632727/180921134342_403491_862d3092-f7cd-4752-9183-67eff6433f7a.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_757431083/180924132535_403491_dbc1c5ac-3d7d-4333-a715-7ff651deac91.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_758162988/180925143806_403491_5b73fcf3-ee2b-4313-93d0-948f96f1d5ab.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_758942142/180926142214_403491_797d7b32-f9fc-4bbe-8012-e56690cce1a3.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_759251920/180927134942_403491_32d42ee8-e718-4eba-b902-aef185c5d0a4.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_759756829/181001142502_403491_d1f7ee69-023c-4da5-b023-ef902d38cf3e.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_760007428/181002140740_403491_61a70fd4-845f-4b0b-91b8-930ec1bfbf63.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_760409951/181003130612_403491_bc0f0741-1e8a-43e7-a1c2-20030d5c31d4.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_760698062/181004141444_403491_2bc1c7d8-b233-413f-9b16-6c8999a81ee4.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_761499069/181008134549_403491_f234b78a-f6c8-4123-8e91-5b3c37ead5a8.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_761831761/181009133840_403491_25b91f1a-0c34-4c62-a157-5fa48f51119c.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_762178061/181010141410_403491_30553aab-61b3-4a0b-b581-152af630f5e7.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_762722108/181011135443_403491_94505d0f-6175-4157-a1d8-9a4a50b7c01e.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_763291103/181012133035_403491_d5d2eb0f-121f-4900-9924-a9a5962a9f1d.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_763980798/181015142459_403491_a0958e10-307e-495c-9b37-645d44589fd3.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_764495320/181016133900_403491_7acf6b69-7da7-4637-b998-3afde524f981.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_764897570/181017135239_403491_70ccb550-821e-43bb-af15-5ff9a71aa89c.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_765396393/181018125644_403491_97f212db-d589-4d9a-8c4a-9d499175ceda.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_766110635/181019141354_403491_ad044630-1a02-4f38-8ea6-569dd30ef5d1.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_766753111/181022140853_403491_8fe66351-aa3c-4212-b384-1fb11982dc18.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_767275980/181023134243_403491_0e8cf5ed-c504-40d9-a4a9-16ff7bbb39ef.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_768042187/181024131007_403491_90df0395-4c76-443a-8eb5-8d64825dc922.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_768815421/181025141743_403491_7709cadd-b08b-476b-ab0c-fc5347942992.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_769452637/181026134231_403491_4035fc05-1c10-4372-b329-f7207a830aa3.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_769902997/181029130319_403491_df72472c-1809-4847-8680-deede567e6a2.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_770244427/181030131729_403491_cbc4388d-1c2f-4ee2-953d-ca17dfb6c95c.pkl']
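To map a path in this list back to its behavior session, the id can be read from the `behavior_session_<id>` directory name. The sketch below assumes that directory naming holds for every path; the function name is hypothetical.

```python
import re
from pathlib import PurePosixPath


def behavior_session_id_from_path(pkl_path):
    """Extract the integer id from a 'behavior_session_<id>' directory."""
    for part in PurePosixPath(pkl_path).parts:
        match = re.fullmatch(r"behavior_session_(\d+)", part)
        if match:
            return int(match.group(1))
    raise ValueError(f"no behavior_session directory in {pkl_path!r}")


# First path from the list above.
path = ("/allen/programs/braintv/production/visualbehavior/prod0/"
        "specimen_722884882/behavior_session_746928360/"
        "180907132507_403491_e59bbcc2-f516-4357-9771-3cc2d40cc022.pkl")
print(behavior_session_id_from_path(path))
```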

djkapner · 2021-03-16T01:47:36Z

@wbwakeman

What I did not realize is that the pkl basename will change. So, I think there is an additional step to:

update pkl (Doug did)
replace production pkls (file below is the map)
update mtrain (I will provide update statements)
~~update LIMS (wkf name will have changed)~~
update NWB

Maybe we should go over this again in standup tomorrow to be sure.

https://app.zenhub.com/files/35236880/1b84cd23-ad6b-4f39-b4f1-11daa945a455/download

wbwakeman · 2021-03-16T14:35:45Z

Updated pkl files have been moved to the production location. Each directory now has 3 copies of the pkl file: the original one from 2018, one that was replaced on Feb 22, 2021 (".bak"), and the one replaced today (".bak2").

e.g.

-rwxrwxr-x 1 mongrel mongrel  9474829 Mar 16 07:26 /allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_755551879/180919132333_403491_5baf86f9-48b8-4638-b31e-55f23d25b57a.pkl
-rwxrwxrwx 1 mongrel mongrel 23818803 Sep 19  2018 /allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_755551879/180919132333_403491_5baf86f9-48b8-4638-b31e-55f23d25b57a.pkl.bak
-rwxrwxr-x 1 mongrel mongrel  9474815 Mar 16 07:25 /allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_755551879/180919132333_403491_5baf86f9-48b8-4638-b31e-55f23d25b57a.pkl.bak2
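One way numbered backups like these could be produced is sketched below. It is an illustration under assumptions, not a description of how the production files were actually handled: the function name is hypothetical, and the demonstration runs in a temporary directory with made-up file contents.

```python
import shutil
import tempfile
from pathlib import Path


def replace_with_backup(target, replacement):
    """Copy `replacement` over `target`, keeping the old file as a backup.

    The backup gets the first unused suffix among .bak, .bak2, .bak3, ...
    """
    target, replacement = Path(target), Path(replacement)
    n = 1
    while True:
        suffix = ".bak" if n == 1 else f".bak{n}"
        backup = target.with_name(target.name + suffix)
        if not backup.exists():
            break
        n += 1
    shutil.move(str(target), str(backup))
    shutil.copy2(str(replacement), str(target))
    return backup


# Demonstration in a temporary directory with made-up file contents.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "session.pkl"
    target.write_text("original")
    for version in ("fix 1", "fix 2"):
        new_file = Path(tmp) / "new.pkl"
        new_file.write_text(version)
        print(replace_with_backup(target, new_file).name)
    final_text = target.read_text()
    backups = sorted(p.name for p in Path(tmp).glob("session.pkl.bak*"))
print(final_text, backups)
```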

I don't think I understand this step. What wkf name has changed?

update LIMS (wkf name will have changed)

djkapner · 2021-03-16T15:26:30Z

You're right, I guess the basename has not changed. Somehow, I thought those last 10-11 characters before .pkl were different, but, they are not.

djkapner · 2021-03-16T15:32:37Z

These are the 5 mtrain update statements:

UPDATE stages SET name='TRAINING_3_images_A_10uL_reward' WHERE id=63
UPDATE stages SET name='TRAINING_4_images_A_training' WHERE id=65
UPDATE stages SET name='TRAINING_4_images_A_handoff_ready' WHERE id=70
UPDATE stages SET name='TRAINING_4_images_A_handoff_lapsed' WHERE id=69
UPDATE stages SET name='TRAINING_4_images_A_handoff_lapsed' WHERE id=108

wbwakeman · 2021-03-16T15:42:06Z

mtrain records are updated now. I will restart Behavior NWB file creation for the 36 files.

djkapner · 2021-03-16T15:51:33Z

Confirmed. I am no longer getting the lower-case "a" when using the BehaviorProjectCache/mtrain.

wbwakeman · 2021-03-16T15:55:01Z

The 36 NWB files have been regenerated

alexpiet · 2021-03-18T02:22:02Z

This looks resolved from my end. Anything else before the issue gets closed?

alexpiet added the bug label Mar 2, 2021

wbwakeman added behavior braintv relates to Insitute BrainTV program labels Mar 2, 2021

alexpiet mentioned this issue Mar 8, 2021

get_filtered_behavior_session_table has bad session_types AllenInstitute/visual_behavior_analysis#722

Open

alexpiet closed this as completed Mar 8, 2021

alexpiet reopened this Mar 8, 2021

djkapner self-assigned this Mar 10, 2021

djkapner added this to the Pika 2021-03-12 milestone Mar 10, 2021

wbwakeman modified the milestones: Pika 2021-03-12, Pika 2021-03-26 Mar 12, 2021

wbwakeman closed this as completed Mar 25, 2021
