Visual Behavior release data has bad session type #1975

Closed
3 tasks
alexpiet opened this issue Mar 2, 2021 · 19 comments

@alexpiet
Contributor

alexpiet commented Mar 2, 2021

Describe the bug
When loading the list of training sessions that will be released in the March data release, some sessions have a session type in the wrong format. For instance, they have '1_gratings' instead of 'TRAINING_1_gratings'. The problem is limited to two donor_ids: 744911447, 722884873

@dougollerenshaw and @matchings say this could be an issue in the VBA repo

To Reproduce
table = loading.get_filtered_behavior_session_table(release_data_only=True)
table.session_type.unique()
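
The check behind this report can be sketched without the VBA or SDK dependencies. This is only an illustration: `find_bad_session_types` and the sample list are hypothetical, and the rule (names must begin with `TRAINING` or `OPHYS`) is inferred from the examples in this thread rather than from a formal specification.

```python
# Hypothetical helper: flag session_type values that lack an expected prefix.
# The accepted prefixes are an assumption based on the names seen in this issue.
EXPECTED_PREFIXES = ("TRAINING", "OPHYS")


def find_bad_session_types(session_types):
    """Return the sorted unique session types that lack an expected prefix."""
    return sorted({s for s in session_types if not s.startswith(EXPECTED_PREFIXES)})


# Made-up sample resembling the values reported above.
sample = ["TRAINING_1_gratings", "1_gratings", "OPHYS_1_images_A", "1_gratings"]
bad = find_bad_session_types(sample)
print(bad)
```

In practice the input would be the `session_type` column of the table loaded above.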

Actual Behavior
[Screenshot: Screen Shot 2021-03-01 at 4 54 41 PM]

Environment (please complete the following information):

  • OS & version: centos
  • Python version 3.7
  • AllenSDK version 2.8.0

Remaining work 3/12/2021

  • Ask Doug to manually update the 36 files listed below
  • Manually update records in mtrain to match
  • update relevant NWB files, and validate
@alexpiet alexpiet added the bug label Mar 2, 2021
@wbwakeman wbwakeman added behavior braintv relates to Insitute BrainTV program labels Mar 2, 2021
@djkapner
Contributor

djkapner commented Mar 5, 2021

I think I have demonstrated:

  • there is not any underlying data issue
  • there is not any underlying SDK issue

so I suspect this might be a VBA problem (@alexpiet @dougollerenshaw):

In [57]: run problem_pkl.py
Getting behavior-only session data. This might take a while...
Getting all ophys sessions. This might take a while.
0 data problems
0 sdk problems

code (using dev branch of VBA and AllenSDK v2.8.0)

import numpy as np
import pandas as pd
from pathlib import Path

from visual_behavior.data_access import loading
from visual_behavior import database as db
from allensdk.brain_observatory.behavior.behavior_session import BehaviorSession

table = loading.get_filtered_behavior_session_table(release_data_only=True)
stypes = table.session_type.unique()
bad_stypes = [s for s in stypes
              if not s.startswith(("TRAINING", "OPHYS"))]

# behavior session ids with bad session_types
sids = []
for stype in bad_stypes:
    sids.extend(table[table.session_type == stype].index.values)
sids = np.unique(sids)

# get the pkl_paths for these sessions
qstring = (
        """
        SELECT wkf.storage_directory || wkf.filename as path
        FROM well_known_files as wkf
        JOIN well_known_file_types as wkft
        ON wkft.id=wkf.well_known_file_type_id
        WHERE wkft.name='StimulusPickle'
        AND attachable_id IN ({})""")
ids_as_str = ",".join([str(i) for i in sids])
qresult = db.lims_query(qstring.format(ids_as_str))
pkl_paths = [Path(i) for i in qresult['path'].values]

# check that the data itself is ok
data_problems = []
for pkl_path in pkl_paths:
    data = pd.read_pickle(pkl_path)
    s1 = data['items']['behavior']['params']['stage']
    s2 = data['items']['behavior']['cl_params']['stage']
    try:
        assert s1.startswith("OPHYS") | s1.startswith("TRAINING")
        assert s2.startswith("OPHYS") | s2.startswith("TRAINING")
        assert s1 == s2
    except AssertionError:
        data_problems.append(pkl_path)
print(f"{len(data_problems)} data problems")

# check what AllenSDK says
sdk_problems = []
for behavior_id in sids:
    bs = BehaviorSession.from_lims(behavior_id)
    st = bs.task_parameters['session_type']
    try:
        assert st.startswith("OPHYS") | st.startswith("TRAINING")
    except AssertionError:
        sdk_problems.append(behavior_id)
print(f"{len(sdk_problems)} sdk problems")

@alexpiet
Contributor Author

alexpiet commented Mar 5, 2021

@matchings

@alexpiet
Contributor Author

alexpiet commented Mar 8, 2021

I created a VBA issue: AllenInstitute/visual_behavior_analysis#722

@alexpiet alexpiet closed this as completed Mar 8, 2021
@matchings
Collaborator

@alexpiet @djkapner loading.get_filtered_behavior_session_table() gets data from lims using the SDK BehaviorProjectCache

https://github.com/AllenInstitute/visual_behavior_analysis/blob/dev/visual_behavior/data_access/loading.py#L232

Loading behavior sessions from the SDK directly also shows cases where session_type does not start with TRAINING.


I don't think this is a VBA issue. This may be a case where @dougollerenshaw needs to edit the pkl files to correct the session_type so that it conforms to our expected naming scheme.

@alexpiet alexpiet reopened this Mar 8, 2021
@djkapner djkapner self-assigned this Mar 10, 2021
@djkapner djkapner added this to the Pika 2021-03-12 milestone Mar 10, 2021
@djkapner
Contributor

Here's what is happening:
AllenSDK's BehaviorProjectCache queries mtrain for stimulus_name (also called session_type or stage). It does this because mtrain has a table that contains stimulus_name alongside foraging_id (or behavior_session_id).
In VBA, AllenSDK is being used like this:

    cache = BehaviorProjectCache.from_lims()
    behavior_sessions = cache.get_behavior_session_table()

which creates a table of every behavior session ever recorded, about 27,000 of them. Querying the mtrain table takes 1.5 seconds.
LIMS, on the other hand, does not know about stimulus_name. Without mtrain, AllenSDK has to open each pkl file to get that information, which takes 45 minutes for 27,000 behavior sessions.
I don't know the root cause, but some pkl files had problematic stimulus_names. Doug fixed these manually in the pkl files, but mtrain was never updated to match.
We're planning to:

  • manually fix the incorrect mtrain entries
  • update LIMS so the behavior_sessions table also has stimulus_name as a column, so that we can decouple ourselves and the API from mtrain, which we don't really have control over.
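
The mismatch described above (pkl files corrected, mtrain left stale) amounts to two sources disagreeing about the same key. A minimal sketch of that comparison follows, using plain dictionaries in place of the real mtrain query and pkl reads. The function name, ids, and values are illustrative assumptions, not actual records.

```python
def find_stale_entries(pkl_names, mtrain_names):
    """Return {session_id: (pkl_name, mtrain_name)} where the two sources disagree.

    Both arguments map a session id to a stimulus name. Ids missing from
    mtrain_names are reported with None as the mtrain value.
    """
    stale = {}
    for session_id, pkl_name in pkl_names.items():
        mtrain_name = mtrain_names.get(session_id)
        if mtrain_name != pkl_name:
            stale[session_id] = (pkl_name, mtrain_name)
    return stale


# Illustrative values only: the pkl was corrected but the fast lookup was not.
pkl_names = {1: "TRAINING_1_gratings", 2: "TRAINING_2_gratings_flashed"}
mtrain_names = {1: "1_gratings", 2: "TRAINING_2_gratings_flashed"}
stale = find_stale_entries(pkl_names, mtrain_names)
print(stale)
```

At the figures quoted above, reading every pkl works out to roughly 0.1 s per session (45 minutes over 27,000 sessions), so a cross-check like this stays cheap when restricted to a small set of sessions.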

@djkapner
Contributor

For manually updating mtrain, we should run these 9 commands:

UPDATE stages SET name='TRAINING_0_gratings_autorewards_15min' WHERE id=66
UPDATE stages SET name='TRAINING_1_gratings' WHERE id=67
UPDATE stages SET name='TRAINING_2_gratings_flashed' WHERE id=62
UPDATE stages SET name='TRAINING_3_images_a_10uL_reward' WHERE id=63
UPDATE stages SET name='TRAINING_4_images_a_training' WHERE id=65
UPDATE stages SET name='TRAINING_4_images_a_handoff_ready' WHERE id=70
UPDATE stages SET name='TRAINING_4_images_a_handoff_lapsed' WHERE id=69
UPDATE stages SET name='TRAINING_4_images_a_handoff_lapsed' WHERE id=108
UPDATE stages SET name='TRAINING_2_gratings_flashed' WHERE id=100

generated from:

import json
import numpy as np
from visual_behavior.data_access import loading
from allensdk.internal.api import db_connection_creator
from allensdk.core.auth_config import MTRAIN_DB_CREDENTIAL_MAP
from allensdk.brain_observatory.behavior.behavior_session import BehaviorSession

# visual behavior release data
table = loading.get_filtered_behavior_session_table(release_data_only=True)
stypes = table.session_type.unique()
bad_stypes = [s for s in stypes
              if not s.startswith(("TRAINING", "OPHYS"))]

# behavior session ids with bad session_types
problem_ids = []
for stype in bad_stypes:
    problem_ids.extend(table[table.session_type == stype].index.values)
problem_ids = [int(i) for i in np.unique(problem_ids)]

qstring = """
          SELECT stages.id, stages.name as session_type
          FROM behavior_sessions bs
          JOIN stages ON stages.id = bs.state_id
          WHERE bs.id='{}'"""
mtrain = db_connection_creator(fallback_credentials=MTRAIN_DB_CREDENTIAL_MAP)
update_template = "UPDATE stages SET name='{}' WHERE id={}"

# figure out what update statements we need to run for mtrain
mymap = []
my_updates = []
for problem_id in problem_ids:
    session = BehaviorSession.from_lims(problem_id)
    foraging_id = session.api.extractor.foraging_id
    lims_session_type = session.task_parameters['session_type']
    mtrain_result = mtrain.select(qstring.format(foraging_id))
    mtrain_stages_id = int(mtrain_result['id'].values[0])
    mtrain_session_type = mtrain_result['session_type'].values[0]
    assert mtrain_session_type in lims_session_type
    mymap.append({
            "behavior_session_id": problem_id,
            "lims": lims_session_type,
            "mtrain": mtrain_session_type,
            "foraging_id": foraging_id,
            "mtrain_stages_id": mtrain_stages_id})
    ustr = update_template.format(lims_session_type, mtrain_stages_id)
    if ustr not in my_updates:
        my_updates.append(ustr)
for u in my_updates:
    print(u)
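
For illustration, the effect of such updates can be tried against a throwaway database first. The sketch below uses Python's built-in `sqlite3` with a toy `stages` table. The table layout and the starting names are assumptions made for the example; this is not the mtrain schema or the procedure that was actually run.

```python
import sqlite3

# Toy in-memory table standing in for mtrain's `stages` table (an assumption;
# the real database and its schema are not reproduced here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stages (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO stages (id, name) VALUES (?, ?)",
    [(67, "1_gratings"), (62, "2_gratings_flashed")],
)

# Corrected names keyed by stage id; a dict de-duplicates repeated ids.
corrections = {67: "TRAINING_1_gratings", 62: "TRAINING_2_gratings_flashed"}

# Bound parameters avoid building SQL by string formatting.
with conn:  # commits on success, rolls back on error
    conn.executemany(
        "UPDATE stages SET name = ? WHERE id = ?",
        [(name, stage_id) for stage_id, name in corrections.items()],
    )

rows = conn.execute("SELECT id, name FROM stages ORDER BY id").fetchall()
print(rows)
```

Keying the corrections by stage id plays the same role as the `if ustr not in my_updates` check in the script above.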

@djkapner
Contributor

Wayne ran these mtrain updates, and this is fixed.
The mtrain configuration itself was fixed in October 2018; these entries predate that fix.

@alexpiet
Contributor Author

Quoting @djkapner: "For manually updating mtrain, we should run these 9 commands" (the same nine UPDATE statements listed in the earlier comment).

@djkapner This is a great solution, except the pattern is TRAINING_3_images_A_10uL_reward, not TRAINING_3_images_a_10uL_reward. The same applies to TRAINING_4_images_a_* (the image set "a" should be capitalized as "A").
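
The naming rule pointed out here can be written as a small normalisation step. This is a sketch under an assumption: the only deviation is a single lower-case image-set letter directly after `images_`. The helper name `capitalize_image_set` is hypothetical.

```python
import re

# Assumed pattern: a single lower-case letter naming the image set, directly
# after "images_" and followed by "_" or the end of the name.
_IMAGE_SET = re.compile(r"(?<=images_)([a-z])(?=_|$)")


def capitalize_image_set(session_type):
    """Upper-case the single-letter image set in a session type name."""
    return _IMAGE_SET.sub(lambda m: m.group(1).upper(), session_type)


print(capitalize_image_set("TRAINING_3_images_a_10uL_reward"))
print(capitalize_image_set("TRAINING_1_gratings"))
```

Names without an image set, such as the gratings stages, pass through unchanged.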

@djkapner
Contributor

@alexpiet
these names are what exist in the pkl files. These files are inputs to us. Thus far, when manual correction has been necessary for these files, your team has done it (@dougollerenshaw).
If you do change the files again, mtrain needs to be updated as well.

@alexpiet
Contributor Author

Ok thanks @djkapner.

@dougollerenshaw, can you update the pkl files so the session type is consistent?

@djkapner
Contributor

looks like 36 files:

['/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_746928360/180907132507_403491_e59bbcc2-f516-4357-9771-3cc2d40cc022.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_750415261/180910131659_403491_cf69cb33-fe93-40cd-b461-1ce9bc35facc.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_750781358/180911125733_403491_f704777c-51e6-43d0-ae06-4858f819dd13.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_752002559/180912140203_403491_a8982681-7cc7-42a0-998b-fa6786370b47.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_753011296/180913132324_403491_d128fed8-3f8a-4e9e-9d3a-be9821d7c566.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_753936007/180914144415_403491_1fb8d318-49c2-469b-9423-235dec7a1016.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_754509820/180917140145_403491_161ebbbc-379b-405e-bc71-75f476f84df8.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_754896244/180918143907_403491_75f0c88c-acaa-47d1-993e-592cbc4696dd.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_755551879/180919132333_403491_5baf86f9-48b8-4638-b31e-55f23d25b57a.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_756106020/180920143927_403491_2ed1df9a-7fe2-4460-8d35-c82f4918fe56.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_756632727/180921134342_403491_862d3092-f7cd-4752-9183-67eff6433f7a.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_757431083/180924132535_403491_dbc1c5ac-3d7d-4333-a715-7ff651deac91.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_758162988/180925143806_403491_5b73fcf3-ee2b-4313-93d0-948f96f1d5ab.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_758942142/180926142214_403491_797d7b32-f9fc-4bbe-8012-e56690cce1a3.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_759251920/180927134942_403491_32d42ee8-e718-4eba-b902-aef185c5d0a4.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_759756829/181001142502_403491_d1f7ee69-023c-4da5-b023-ef902d38cf3e.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_760007428/181002140740_403491_61a70fd4-845f-4b0b-91b8-930ec1bfbf63.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_760409951/181003130612_403491_bc0f0741-1e8a-43e7-a1c2-20030d5c31d4.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_760698062/181004141444_403491_2bc1c7d8-b233-413f-9b16-6c8999a81ee4.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_761499069/181008134549_403491_f234b78a-f6c8-4123-8e91-5b3c37ead5a8.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_761831761/181009133840_403491_25b91f1a-0c34-4c62-a157-5fa48f51119c.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_762178061/181010141410_403491_30553aab-61b3-4a0b-b581-152af630f5e7.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_762722108/181011135443_403491_94505d0f-6175-4157-a1d8-9a4a50b7c01e.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_763291103/181012133035_403491_d5d2eb0f-121f-4900-9924-a9a5962a9f1d.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_763980798/181015142459_403491_a0958e10-307e-495c-9b37-645d44589fd3.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_764495320/181016133900_403491_7acf6b69-7da7-4637-b998-3afde524f981.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_764897570/181017135239_403491_70ccb550-821e-43bb-af15-5ff9a71aa89c.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_765396393/181018125644_403491_97f212db-d589-4d9a-8c4a-9d499175ceda.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_766110635/181019141354_403491_ad044630-1a02-4f38-8ea6-569dd30ef5d1.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_766753111/181022140853_403491_8fe66351-aa3c-4212-b384-1fb11982dc18.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_767275980/181023134243_403491_0e8cf5ed-c504-40d9-a4a9-16ff7bbb39ef.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_768042187/181024131007_403491_90df0395-4c76-443a-8eb5-8d64825dc922.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_768815421/181025141743_403491_7709cadd-b08b-476b-ab0c-fc5347942992.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_769452637/181026134231_403491_4035fc05-1c10-4372-b329-f7207a830aa3.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_769902997/181029130319_403491_df72472c-1809-4847-8680-deede567e6a2.pkl',
 '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_770244427/181030131729_403491_cbc4388d-1c2f-4ee2-953d-ca17dfb6c95c.pkl']
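
If the behavior session ids are needed from a list like this one, they can be read from the directory name. A minimal sketch, assuming every path has a `behavior_session_<id>` parent directory as in the entries above; the function name is hypothetical.

```python
import re
from pathlib import PurePosixPath

_SESSION_DIR = re.compile(r"^behavior_session_(\d+)$")


def behavior_session_id(pkl_path):
    """Return the integer id from the `behavior_session_<id>` parent directory."""
    match = _SESSION_DIR.match(PurePosixPath(pkl_path).parent.name)
    if match is None:
        raise ValueError(f"no behavior_session directory in {pkl_path!r}")
    return int(match.group(1))


# First entry of the list above.
path = (
    "/allen/programs/braintv/production/visualbehavior/prod0/"
    "specimen_722884882/behavior_session_746928360/"
    "180907132507_403491_e59bbcc2-f516-4357-9771-3cc2d40cc022.pkl"
)
print(behavior_session_id(path))
```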

@djkapner
Contributor

djkapner commented Mar 16, 2021

@wbwakeman

What I did not realize is that the pkl basename will change. So, I think there is an additional step to:

  • update pkl (Doug did)
  • replace production pkls (file below is the map)
  • update mtrain (I will provide update statements)
  • update LIMS (wkf name will have changed)
  • update NWB

maybe we should rehash this in standup tomorrow to be sure.

https://app.zenhub.com/files/35236880/1b84cd23-ad6b-4f39-b4f1-11daa945a455/download

@wbwakeman
Contributor

Updated pkl files have been moved to the production location. Each directory now has 3 copies of the pkl file: the original one from 2018, one that was replaced on Feb 22, 2021 (".bak"), and the one replaced today (".bak2").

e.g.

-rwxrwxr-x 1 mongrel mongrel  9474829 Mar 16 07:26 /allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_755551879/180919132333_403491_5baf86f9-48b8-4638-b31e-55f23d25b57a.pkl
-rwxrwxrwx 1 mongrel mongrel 23818803 Sep 19  2018 /allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_755551879/180919132333_403491_5baf86f9-48b8-4638-b31e-55f23d25b57a.pkl.bak
-rwxrwxr-x 1 mongrel mongrel  9474815 Mar 16 07:25 /allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_755551879/180919132333_403491_5baf86f9-48b8-4638-b31e-55f23d25b57a.pkl.bak2

I don't think I understand this step. What wkf name has changed?

  • update LIMS (wkf name will have changed)
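
Keeping numbered backups when replacing a production file, as with the `.bak` and `.bak2` copies in the listing above, could be scripted along these lines. This is a sketch only: the helper name and the numbering rule (first free suffix among `.bak`, `.bak2`, `.bak3`, ...) are assumptions, not the procedure that was actually used.

```python
import shutil
import tempfile
from pathlib import Path


def replace_with_backup(target, replacement):
    """Copy `replacement` over `target`, first moving `target` to a free backup name.

    Backup names tried in order: ".bak", then ".bak2", ".bak3", ...
    Returns the backup path that was used.
    """
    target = Path(target)
    n = 1
    while True:
        suffix = ".bak" if n == 1 else f".bak{n}"
        backup = target.with_name(target.name + suffix)
        if not backup.exists():
            break
        n += 1
    target.rename(backup)
    shutil.copy2(replacement, target)
    return backup


# Demonstration in a temporary directory with made-up file contents.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    prod = tmp / "session.pkl"
    prod.write_text("original")
    for version in ("fix1", "fix2"):
        new = tmp / "incoming.pkl"
        new.write_text(version)
        replace_with_backup(prod, new)
    listing = sorted(p.name for p in tmp.iterdir())
    final = prod.read_text()
print(listing, final)
```

Because the file keeps its basename, anything that refers to it by path continues to resolve to the newest copy.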

@djkapner
Contributor

You're right, I guess the basename has not changed. Somehow, I thought those last 10-11 characters before .pkl were different, but, they are not.

@djkapner
Contributor

These are the 5 mtrain update statements:

UPDATE stages SET name='TRAINING_3_images_A_10uL_reward' WHERE id=63
UPDATE stages SET name='TRAINING_4_images_A_training' WHERE id=65
UPDATE stages SET name='TRAINING_4_images_A_handoff_ready' WHERE id=70
UPDATE stages SET name='TRAINING_4_images_A_handoff_lapsed' WHERE id=69
UPDATE stages SET name='TRAINING_4_images_A_handoff_lapsed' WHERE id=108

@wbwakeman
Contributor

mtrain records are updated now. Will restart Behavior NWB file creation for the 36 sessions.

@djkapner
Contributor

Confirmed. No longer getting a lower-case "a" when using the BehaviorProjectCache/mtrain.

@wbwakeman
Contributor

The 36 NWB files have been regenerated

@alexpiet
Contributor Author

This looks resolved from my end. Anything else before the issue gets closed?
