Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify code for linear transformers #332

Open
mattansb opened this issue Dec 12, 2022 · 1 comment
Open

Simplify code for linear transformers #332

mattansb opened this issue Dec 12, 2022 · 1 comment
Assignees

Comments

@mattansb
Copy link
Member

mattansb commented Dec 12, 2022

All of the following are linear transformers:

  • center()
  • standardize()
  • slide()
  • reverse()
  • rescale()
  • normalize()

So hypothetically we should be able to simplify their code to a common underlying function. Idially, each function would compute the needed shift and scale values, and these will be passed to standardize(..., center = <shift>, scale = <scale>).
This will have 0 effect on usability, but will make maintenance of the code much easier.

Function Supports weights? Shift Scale
center() mean(x) or median(x) 1
standardize() mean(x) or median(x) sd(x) or mad(x)
slide() min(x) - lowest 1
reverse() -(max(x)+min(x)) -1
rescale() min(x)*<scale>-to[1] (to[2]-to[1])/(max(x)-min(x))
normalize() same as rescale(x,to=c(0,1)+c(-1,1)*include_bounds)
where include_bounds is a number [0,1] or 0.5/length(x)

In any place where a reference= arg is provided, replace x with reference above.


All functions would have to deal with missing values, non-finite values, and labels.

@strengejacke
Copy link
Member

I looked at the code, and I'm not sure if there's really much we can simplify. Due to the possible transformation inside formulas, we have the dw_transformer class, which requires different attributes (depending on the transformation function), so we need these information anyway, and can't "standardize" this code. Furthermore, there are a few exceptions that we must handle anyway, e.g. reversing for factors (where min()/max() don't work).

You can try to improve things here, but my first impression is that not much simplification is possible beyond what we already did.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants