State of first draft data obfuscation:
We have a logging obfuscation function that simulates how often patients log their meals:
All meals - keep all meals for now
Multiple meals per day (1-2 largest meals) - Find a threshold so that we have an average of 1.8 meals logged per day
Once per day (largest meal) - Find the largest one in a day
A few times per week - Find a threshold so that we have an average of 3 meals logged per week
Never - Wipe all data
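The threshold-based cases above can be sketched as follows. This is only an illustration, not the project's implementation: it assumes each meal is reduced to a single numeric "size" (for example carbohydrate grams), and the names `threshold_for_target_rate`, `keep_meals_at_or_above` and `largest_meal_per_day` are hypothetical.

```python
def threshold_for_target_rate(meal_sizes_by_day, target_per_day):
    """Return a size threshold such that keeping meals with size >= threshold
    logs roughly `target_per_day` meals per day on average.

    `meal_sizes_by_day` is a list of days, each a list of meal sizes.
    Ties at the threshold can push the kept count slightly above the target.
    """
    sizes = sorted((s for day in meal_sizes_by_day for s in day), reverse=True)
    n_keep = min(len(sizes), round(target_per_day * len(meal_sizes_by_day)))
    if n_keep == 0:
        return float("inf")  # nothing should be kept
    return sizes[n_keep - 1]


def keep_meals_at_or_above(meal_sizes_by_day, threshold):
    """Drop every meal smaller than the threshold."""
    return [[s for s in day if s >= threshold] for day in meal_sizes_by_day]


def largest_meal_per_day(meal_sizes_by_day):
    """'Once per day' case: keep only the largest meal of each day."""
    return [[max(day)] if day else [] for day in meal_sizes_by_day]


# Illustrative data: 5 days, 3 meals each (made-up sizes).
days = [[30, 55, 80], [20, 60, 45], [70, 25, 90], [40, 65, 35], [50, 85, 15]]
threshold = threshold_for_target_rate(days, target_per_day=1.8)
kept = keep_meals_at_or_above(days, threshold)
```

For the "a few times per week" case, the same helper could be called with `target_per_day=3 / 7`, since 3 meals per week is 3/7 meals per day.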
We have a logging timing habit function that simulates when patients actually log their meals:
Temporally right skewed -> forgetful loggers - Right-skewed gamma distribution. Fixed value distribution with minor randomness.
Temporally left skewed -> hasty loggers - Left-skewed gamma distribution (less skewed because a patient probably won't log their meal too early most of the time) - Fixed value distribution with minor randomness.
Normal Distribution - Gaussian distribution with a fixed spread
Unchanged
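One way the four timing habits could be sampled is sketched below, using only the standard library. All shape, scale and spread values are placeholders rather than the parameters used in the project, and the function names are hypothetical. The left-skewed case is modelled here by negating a gamma draw, with a larger shape so that it is less skewed (gamma skewness is 2 / sqrt(shape)).

```python
import random
from datetime import datetime, timedelta


def sample_offset_minutes(habit, rng):
    """Return a signed offset in minutes (positive = logged after the meal)."""
    if habit == "forgetful":
        # Right-skewed gamma: long tail of late logs. Mean = shape * scale = 20.
        return rng.gammavariate(2.0, 10.0)
    if habit == "hasty":
        # Negated gamma is left-skewed; shape 4 gives less skew than shape 2.
        # Mean = -(4.0 * 1.5) = -6 minutes (logged early).
        return -rng.gammavariate(4.0, 1.5)
    if habit == "normal":
        # Gaussian centred on the meal time with a fixed spread (placeholder).
        return rng.gauss(0.0, 5.0)
    if habit == "unchanged":
        return 0.0
    raise ValueError(f"unknown habit: {habit!r}")


def shift_log_time(meal_time, habit, rng):
    """Apply one sampled offset to a meal timestamp."""
    return meal_time + timedelta(minutes=sample_offset_minutes(habit, rng))


rng = random.Random(0)  # seeded so runs are reproducible
meal = datetime(2020, 1, 1, 12, 30)
shifted = shift_log_time(meal, "forgetful", rng)
```

Passing the `random.Random` instance explicitly keeps the simulation reproducible per patient or per run.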
Data flow:
`data/raw/sim` -> logging obfuscation function to create `msg_type_log` -> logging timing habit function to create `msg_type_log_shifted` from `msg_type_log` -> `data/raw/obfuscated`
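The flow above could be wired together roughly as below. This is a sketch under assumptions: it treats the inputs as CSV files, passes the two steps in as plain functions over lists of row dicts, and the name `run_pipeline` is hypothetical. What the two functions write into `msg_type_log` and `msg_type_log_shifted` is left to the caller.

```python
import csv
from pathlib import Path


def run_pipeline(src_dir, dst_dir, obfuscate_rows, shift_rows):
    """Read each CSV in src_dir, apply both steps in order, write to dst_dir.

    obfuscate_rows and shift_rows each take and return a list of row dicts.
    Returns the list of files written.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(src_dir.glob("*.csv")):
        with src.open(newline="") as f:
            reader = csv.DictReader(f)
            fieldnames = list(reader.fieldnames or [])
            rows = list(reader)
        rows = shift_rows(obfuscate_rows(rows))
        # Append any columns the two steps introduced, in first-seen order.
        for row in rows:
            for key in row:
                if key not in fieldnames:
                    fieldnames.append(key)
        dst = dst_dir / src.name
        with dst.open("w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
            writer.writeheader()
            writer.writerows(rows)
        written.append(dst)
    return written
```

A call might look like `run_pipeline("data/raw/sim", "data/raw/obfuscated", obfuscate_rows, shift_rows)`, where the last two arguments are whatever implementations of the two functions are in use.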
Improvement:
Find the right proportion of each user type for both functions. For example, patients who log all of their meals might make up 25% of users rather than 30%.
Fine-tune the default distribution (we need better parameters for the gamma distribution to reflect the true behaviour of patients) or find a better distribution.
The left- and right-skewed distributions should differ. Hasty loggers might log their meals about 10 mins early on average and probably not much earlier than that, whereas forgetful loggers may log more than 40 mins late.
Remove the original csv file when generating a new file name (bug)
Investigate new line characters at the end of some files (bug?)
Clean up columns from the simulation_data_generation script. We have an `Unnamed: 0` column that we should probably have dropped.
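For the first improvement, the mix of user types could be made an explicit, tunable table and sampled from it, as in the sketch below. The 30% share for "all meals" comes from the example above; every other weight is a made-up placeholder, and the names `LOGGING_TYPE_WEIGHTS` and `assign_logging_types` are hypothetical.

```python
import random

# Placeholder shares; only the 0.30 for "all_meals" is taken from the text.
LOGGING_TYPE_WEIGHTS = {
    "all_meals": 0.30,
    "multiple_per_day": 0.25,
    "once_per_day": 0.20,
    "few_per_week": 0.15,
    "never": 0.10,
}


def assign_logging_types(patient_ids, weights, seed=None):
    """Draw one logging type per patient according to the given shares."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    rng = random.Random(seed)
    types = list(weights)
    drawn = rng.choices(types, weights=[weights[t] for t in types], k=len(patient_ids))
    return dict(zip(patient_ids, drawn))


assignment = assign_logging_types(range(10000), LOGGING_TYPE_WEIGHTS, seed=42)
```

Changing the assumed share of full loggers from 30% to 25% then only means editing the table (and rebalancing the other entries so they still sum to 1). The same pattern would apply to the timing habit types.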