
Warning on RCA with chunk labeling not starting on zero or with gaps #275

Open
grudloff opened this issue Jan 20, 2020 · 2 comments

@grudloff (Contributor)

Description

RCA chunks are expected to start at zero and increase one by one; a warning is raised if they do not start at zero or contain gaps. Although these warnings are raised, they do not affect the result of the RCA fit.

Maybe we should be more user-friendly and allow the user to specify chunk ids as arbitrary non-negative integers (a negative value is interpreted as "not in any chunk"), even if they don't start at 0 or are not contiguous, just like we (and sklearn) do for methods that are fitted on a classic class vector y.

Steps/Code to Reproduce

from metric_learn import RCA

X = [[-0.05, 3.0], [0.05, -3.0],
     [0.1, -3.55], [-0.1, 3.55],
     [-0.95, -0.05], [0.95, 0.05],
     [0.4, 0.05], [-0.4, -0.05]]
chunks = [1, 1, 2, 2, 3, 3, 4, 4]

rca = RCA()
X = rca.fit_transform(X, chunks)

Expected Results

No unexpected warnings should be thrown.

Actual Results

The following warnings are thrown:

./metric-learn/rca.py:28: RuntimeWarning: Mean of empty slice.
  chunk_data[mask] -= chunk_data[mask].mean(axis=0)
./numpy/core/_methods.py:154: RuntimeWarning: invalid value encountered in true_divide
  ret, rcount, out=ret, casting='unsafe', subok=False)

Versions

Linux-5.0.0-37-generic-x86_64-with-Ubuntu-18.04-bionic
Python 3.6.9 (default, Nov 7 2019, 10:44:02)
[GCC 8.3.0]
NumPy 1.18.1
SciPy 1.4.1
Scikit-Learn 0.22.1
Metric-Learn 0.5.0

@perimosocordiae (Contributor)

Yes, it would be nicer to do pre-processing on chunk labels with np.unique(), similar to scikit-learn. It's potentially a performance hit, though, so we might want to allow users to skip it.
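For reference, a minimal sketch of the np.unique()-based preprocessing suggested above (the `remap_chunks` helper name is hypothetical, not part of metric-learn's API). `np.unique(..., return_inverse=True)` already yields contiguous 0-based codes, so arbitrary non-negative chunk ids can be remapped while negative ids are kept as "not in any chunk":

```python
import numpy as np

def remap_chunks(chunks):
    """Remap arbitrary non-negative chunk ids to contiguous ids
    starting at 0. Negative ids mean "not in any chunk" and are
    preserved as -1. (Hypothetical helper, for illustration only.)"""
    chunks = np.asarray(chunks)
    remapped = np.full(chunks.shape, -1, dtype=int)
    mask = chunks >= 0
    # return_inverse gives, for each element, its index into the
    # sorted unique values -- i.e. a contiguous 0-based relabeling
    _, remapped[mask] = np.unique(chunks[mask], return_inverse=True)
    return remapped

print(remap_chunks([1, 1, 2, 2, 3, 3, 4, 4]))   # -> [0 0 1 1 2 2 3 3]
print(remap_chunks([-1, 5, 5, 10, 10, -1]))     # -> [-1  0  0  1  1 -1]
```

With this preprocessing, the repro snippet above would produce no empty chunk 0 and hence no RuntimeWarning.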

@grudloff (Contributor, Author)

To solve this, it should only be necessary to replace the current max() computation with a unique() computation; I don't think the performance difference would be significant. In any case, this issue arises in the chunk mean centering, which is going to be deprecated in the future.
