Hello @avsolatorio,

There might be a bug when running tabular.fit() and tabular.sample() with device='cpu' (it might also affect relational models, but I haven't tested that).

I trained a tabular model on the CPU using a DataFrame with the columns shown in the example below. Their original data types were {integer_as_str: object[str], integer: int64, float: float64, boolean: bool, datetime: datetime64[ns], string: object[str]}.
| integer_as_str | integer | float | boolean | datetime | string |
|---|---|---|---|---|---|
| 03 | 6214 | 54.09 | false | 2002-10-15 03:07:53 | qyjib |
| 31 | 2997 | 39.15 | false | 1999-05-18 01:09:18 | mjuvv |
| 38 | 3362 | 52.91 | true | 1999-08-27 10:44:03 | ffskd |
| 47 | 2286 | 50.68 | false | 1999-02-02 05:48:06 | evqml |
| 24 | 14482 | 77.8 | true | 2001-09-08 13:56:20 | wieai |
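For anyone trying to reproduce this, here is a minimal pandas sketch that rebuilds only the five example rows with the dtypes listed above. The variable name `df` is just for illustration; the real training data lives in the attached notebook.

```python
import pandas as pd

# The five example rows, cast to the dtypes reported in the issue.
df = pd.DataFrame(
    {
        "integer_as_str": ["03", "31", "38", "47", "24"],  # strings, leading zero kept
        "integer": [6214, 2997, 3362, 2286, 14482],
        "float": [54.09, 39.15, 52.91, 50.68, 77.8],
        "boolean": [False, False, True, False, True],
        "datetime": pd.to_datetime(
            [
                "2002-10-15 03:07:53",
                "1999-05-18 01:09:18",
                "1999-08-27 10:44:03",
                "1999-02-02 05:48:06",
                "2001-09-08 13:56:20",
            ]
        ).astype("datetime64[ns]"),
        "string": ["qyjib", "mjuvv", "ffskd", "evqml", "wieai"],
    }
)
# Pin the integer width explicitly so it is int64 on every platform.
df["integer"] = df["integer"].astype("int64")
print(df.dtypes)
```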
In my case, I want the model to generate only values that are present in the training data, regardless of their type. In other words, I don't want it to generate new values that do not exist in the training data.
To achieve that, I experimented with adding a column-specific letter prefix to the beginning of every value (see the transformed example below). I expected to see no new values in any of the columns. Instead, ignoring the prefixes (a_, b_, etc.), I got values that belong to other columns. For example, the datetime column contained b_2997 (a valid value, but from the integer column!), and the float column contained e_1999-02-02 05:48:06 (again a valid value, but from the datetime column!).
| integer_as_str | integer | float | boolean | datetime | string |
|---|---|---|---|---|---|
| a_03 | b_6214 | c_54.09 | d_false | e_2002-10-15 03:07:53 | f_qyjib |
| a_31 | b_2997 | c_39.15 | d_false | e_1999-05-18 01:09:18 | f_mjuvv |
| a_38 | b_3362 | c_52.91 | d_true | e_1999-08-27 10:44:03 | f_ffskd |
| a_47 | b_2286 | c_50.68 | d_false | e_1999-02-02 05:48:06 | f_evqml |
| a_24 | b_14482 | c_77.8 | d_true | e_2001-09-08 13:56:20 | f_wieai |
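The prefixing described above, and a way to flag the cross-column mix-ups, can be sketched in plain Python. This is an illustration only, not the notebook's code: the `PREFIXES` mapping and the helpers `add_prefixes`, `source_column` and `find_cross_column_values` are hypothetical, and rows are plain dicts rather than a DataFrame. A value is flagged when its prefix points to a different column than the one it was generated in.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical column -> prefix mapping, mirroring the transformed table (a_, b_, ...).
PREFIXES: Dict[str, str] = {
    "integer_as_str": "a",
    "integer": "b",
    "float": "c",
    "boolean": "d",
    "datetime": "e",
    "string": "f",
}


def add_prefixes(rows: List[Dict[str, object]], prefixes: Dict[str, str]) -> List[Dict[str, str]]:
    """Convert every value to a string tagged with its column's prefix, e.g. 2997 -> 'b_2997'."""
    return [{col: f"{prefixes[col]}_{value}" for col, value in row.items()} for row in rows]


def source_column(value: str, prefixes: Dict[str, str]) -> Optional[str]:
    """Return the column whose prefix `value` carries, or None if it has no known prefix."""
    tag, sep, _ = value.partition("_")
    if not sep:
        return None
    for col, prefix in prefixes.items():
        if prefix == tag:
            return col
    return None


def find_cross_column_values(
    rows: List[Dict[str, object]], prefixes: Dict[str, str]
) -> List[Tuple[int, str, str, Optional[str]]]:
    """List (row index, column, value, column the prefix points to) for each mismatch."""
    mismatches = []
    for i, row in enumerate(rows):
        for col, value in row.items():
            origin = source_column(str(value), prefixes)
            if origin != col:
                mismatches.append((i, col, str(value), origin))
    return mismatches


print(add_prefixes([{"integer": 2997, "boolean": False}], PREFIXES))
# [{'integer': 'b_2997', 'boolean': 'd_False'}]

# A generated row containing the two kinds of mix-ups described in the report.
generated = [{
    "integer_as_str": "a_31",
    "integer": "b_2997",
    "float": "e_1999-02-02 05:48:06",
    "boolean": "d_true",
    "datetime": "b_2997",
    "string": "f_wieai",
}]
print(find_cross_column_values(generated, PREFIXES))
# [(0, 'float', 'e_1999-02-02 05:48:06', 'datetime'), (0, 'datetime', 'b_2997', 'integer')]
```

With pandas, the same prefix can be added per column with `df[col] = f"{prefix}_" + df[col].astype(str)`. Note that Python and pandas both render booleans as `False`/`True`, so this yields `d_False` rather than the lowercase `d_false` shown in the table.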
Note that everything works as expected when both tabular.fit() and tabular.sample() run with device='cuda'. What do you think? Could this be a bug that only occurs on the CPU?
Thanks for the quick response! I'm attaching a zip with the Colab notebook, which contains a working example so you can reproduce the issue. The section at the end checks whether any new values have been generated.
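A check along those lines could compare, column by column, the generated values against the set of values seen in training. This is a hypothetical sketch, not the notebook's actual code: the function name `unseen_values` is made up, rows are plain dicts, and values are compared as strings, which suits the prefixed data.

```python
from typing import Dict, List, Set


def unseen_values(
    train_rows: List[Dict[str, object]], generated_rows: List[Dict[str, object]]
) -> Dict[str, Set[str]]:
    """Per column, return generated values (as strings) that never appear in training."""
    seen: Dict[str, Set[str]] = {}
    for row in train_rows:
        for col, value in row.items():
            seen.setdefault(col, set()).add(str(value))

    new: Dict[str, Set[str]] = {}
    for row in generated_rows:
        for col, value in row.items():
            if str(value) not in seen.get(col, set()):
                new.setdefault(col, set()).add(str(value))
    return new


train = [
    {"integer": "b_6214", "float": "c_54.09", "datetime": "e_2002-10-15 03:07:53"},
    {"integer": "b_2997", "float": "c_39.15", "datetime": "e_1999-02-02 05:48:06"},
]
generated = [
    {"integer": "b_6214", "float": "e_1999-02-02 05:48:06", "datetime": "b_2997"},
    {"integer": "b_2997", "float": "c_39.15", "datetime": "e_2002-10-15 03:07:53"},
]
print(unseen_values(train, generated))
# {'float': {'e_1999-02-02 05:48:06'}, 'datetime': {'b_2997'}}
```

With DataFrames, the equivalent per-column test is `generated.loc[~generated[col].isin(train[col]), col]`; an empty result for every column would mean no new values were generated.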