How is "train_data_processed_w_static.csv" obtained for out-of-domain task on MIMIC? #23

Open
jjgarciac opened this issue Feb 24, 2022 · 6 comments

Comments

@jjgarciac

jjgarciac commented Feb 24, 2022

When executing python3 src/experiments/out_of_domain.py --models PPCA, I encounter: FileNotFoundError: (...)/in-hospital-mortality/train_data_processed_w_static.csv

I performed the 6 pre-processing steps listed here to set up the MIMIC mortality benchmark. The resulting directory does not include the file causing the error; its layout is as follows:
-in-hospital-mortality/
--train/
--test/

Note: I am using the MIMIC-III-demo dataset; I was able to run python -um mimic3models.in_hospital_mortality.logistic.main --l2 --C 0.001 --output_dir mimic3models/in_hospital_mortality/logistic from mimic3-benchmarks.

@Kaleidophon
Collaborator

Hey! Sorry for the late response. I haven't touched this project in a long while, but I suspect that the preprocessing pipeline from the MIMIC benchmark produces train/test splits that are named differently than in this repo. The names of the splits are defined in this module, and you could try changing them in your local clone to match your output files.

It occurs to me, based on this script, that at least the test split output is simply named testset.csv. What is your training set file called after running the pipeline?

@jjgarciac
Author

jjgarciac commented Mar 5, 2022

No worries and thank you for the response.

After running the pipeline, the \train and \test folders contain multiple files. Inside \train, there are 46 files named <#>_episode<#>_timeseries.csv (e.g. 10013_episode1_timeseries.csv) and a single listfile.csv. Each <#>_episode<#>_timeseries.csv contains the feature values, and listfile.csv maps each timeseries file to a label (e.g. 10013_episode1_timeseries.csv, 0). The files in \test follow the same format.

What format should the files referenced in that module be in?

I also noticed that the train_X array inside this function corresponds to the feature set described in the paper (plus a couple of extra variables).
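
For reference, here is a minimal sketch of how this layout can be read, assuming one CSV per episode plus a listfile.csv per split; the listfile column names ("stay", "y_true") are an assumption about the mimic3-benchmarks header, not something taken from this repo:

```python
import os
import pandas as pd

SPLIT_DIR = "data/in-hospital-mortality/train"  # same layout for .../test

# listfile.csv maps each per-episode timeseries file to its mortality label.
listfile = pd.read_csv(os.path.join(SPLIT_DIR, "listfile.csv"))

episodes = []
for _, row in listfile.iterrows():
    ts = pd.read_csv(os.path.join(SPLIT_DIR, row["stay"]))  # e.g. 10013_episode1_timeseries.csv
    ts["y"] = row["y_true"]                                  # 0/1 in-hospital mortality label
    episodes.append(ts)

train_long = pd.concat(episodes, ignore_index=True)
print(train_long.shape)
```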

@Kaleidophon
Collaborator

Mhh, is that the output even after running the python -m mimic3benchmark.scripts.create_in_hospital_mortality data/root/ data/in-hospital-mortality/ command? I will check the format of the dataset again by the end of the week. Until then, I will also mention @karinazad and especially @LMeijerink here in case they have some advice on this issue.

@jjgarciac
Author

Yes, though this is expected, as mentioned here:
"After the above commands are done, there will be a directory data/{task} for each created benchmark task. These directories have two sub-directories: train and test. Each of them contains bunch of ICU stays and one file with name listfile.csv, which lists all samples in that particular set. Each row of listfile.csv has the following form: icu_stay, period_length, label(s). A row specifies a sample for which the input is the collection of ICU event of icu_stay that occurred in the first period_length hours of the stay and the target is/are label(s). In in-hospital mortality prediction task period_length is always 48 hours, so it is not listed in corresponding listfiles."

@Kaleidophon
Collaborator

Heyo! I was looking at some code again. Since this project happened some time ago and I hadn't worked on that specific aspect of it, I am not sure I can comprehensively help you solve this problem, unfortunately. Since I no longer work at Pacmed, I also don't have access to the data, so I can't provide any more detail on the format of the dataset. It should correspond to the names in this pickle file, though. What I understand is the following:

  • After producing the timeseries, it seems to me that the function you pointed out here can be used to read the time series. You might be able to look at this script here for eICU to get a better idea of how to process the resulting data into the format used in the paper (see the sketch at the end of this comment).
  • Afterwards, adapt the paths in this module to point to the file you created.

I am very sorry that reproducing this part is such a hassle! I am also mentioning @Giovannicina in case he can provide some more info on this problem. Best of luck, and if you do figure out the procedure, please let me know here so that we can improve the documentation of the repo in this regard.
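
In case it helps anyone who lands on this issue, below is a rough sketch of the kind of flattening step described in the bullet points above: collapsing each 48-hour episode into a single row and writing one CSV per split. The listfile column names ("stay", "y_true"), the aggregation (a per-feature mean), and the output column names are guesses rather than the actual preprocessing used in the paper; the eICU script mentioned above should be the authoritative reference for which statistics and static variables to include.

```python
import os
import pandas as pd

SPLIT_DIR = "data/in-hospital-mortality/train"
listfile = pd.read_csv(os.path.join(SPLIT_DIR, "listfile.csv"))  # columns "stay"/"y_true" assumed

rows = []
for _, entry in listfile.iterrows():
    ts = pd.read_csv(os.path.join(SPLIT_DIR, entry["stay"]))
    # Collapse the 48h timeseries into one row per ICU stay; taking the mean of the
    # numeric columns is only a placeholder for the real feature engineering.
    feats = ts.select_dtypes("number").mean().add_suffix("_mean")
    feats["y"] = entry["y_true"]
    rows.append(feats)

flat = pd.DataFrame(rows)
flat.to_csv("data/in-hospital-mortality/train_data_processed_w_static.csv", index=False)
```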

@yowald

yowald commented Jun 9, 2022

Hi there. @jjgarciac, did you find a solution to this issue?
I was also trying this out and ran into the same error. Any help from the authors would also be greatly appreciated of course.
