[ConceptDriftStream] How to generate a synthetic dataset with a set size? #1152

albertfrancajosuacosta · 2023-01-09T04:26:36Z


Hey.

I need to generate a synthetic dataset with a set size that contains concept drift.
For example:

stream1 = Sine with classification function equal to 1 and with size of 1000 samples.
stream2 = Sine with classification function equal to 0 and with size of 1000 samples.
stream3 = Sine with classification function equal to 2 and with size of 1000 samples.

final_stream = concatenate(stream1, stream2, stream3) with size of 3000 samples.

When I try:

dataset = synth.ConceptDriftStream(stream=synth.SEA(seed=42, variant=0), drift_stream=synth.SEA(seed=42, variant=1), seed=1, position=5, width=2)

I get a dataset with an infinite number of samples.

With the code above, if I call

dataset.take(2000)

will I get the first 1000 samples from stream and the last 1000 samples from drift_stream?

I also need to generate more than one drift, similar to how it's done in Weka. Is that possible?

Thank you.

Albert França Josuá Costa

MaxHalford · 2023-01-09T06:40:40Z

Maintainer

Hey there. It's a good question! The answer is actually quite simple. The trick is that synth.ConceptDriftStream is too fancy: it simulates a smooth drift, by transitioning from one dataset to another with a sigmoid.

You're looking for an abrupt drift, which can be done by simply concatenating datasets. In Python, the itertools.chain function can be used to concatenate two or more generators (River datasets are essentially generators). Here is an example:

import itertools
from river import datasets

dataset = itertools.chain(
    datasets.synth.SEA(seed=42, variant=0).take(1000),
    datasets.synth.SEA(seed=42, variant=1).take(1000)
)

for x, y in dataset:
    ...
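
The same idea extends to the three-segment case from the question. Here is a minimal sketch that uses only the standard library, so it runs without River. Note that concat_segments and constant_label_stream are hypothetical helpers made up for illustration, and the stand-in generator only mimics an infinite stream of (x, y) pairs:

```python
import itertools


def concat_segments(*segments):
    """Chain (stream, n_samples) pairs into one finite stream.

    Each stream may be any iterable, including an infinite generator;
    only the first n_samples items of each one are consumed.
    """
    return itertools.chain.from_iterable(
        itertools.islice(stream, n) for stream, n in segments
    )


def constant_label_stream(label):
    """Stand-in for an infinite synthetic stream (illustration only)."""
    for i in itertools.count():
        yield {"t": i}, label


# Three concepts of 1000 samples each, with abrupt drifts at 1000 and 2000.
final_stream = concat_segments(
    (constant_label_stream(1), 1000),
    (constant_label_stream(0), 1000),
    (constant_label_stream(2), 1000),
)

samples = list(final_stream)
labels = [y for _, y in samples]
print(len(samples))
```

With River, each stand-in would presumably be replaced by something like datasets.synth.Sine(classification_function=1, seed=42). River datasets are iterable, so itertools.islice plays the same role here as .take(1000).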

I hope that helps! Have a good week.

3 replies

albertfrancajosuacosta Jan 9, 2023
Author

Thank you for your answer.

I will try to use itertools.chain.

But I have one more question.

Can I generate a gradual drift with itertools.chain?

MaxHalford Jan 9, 2023
Maintainer

Can I generate a gradual drift with itertools.chain?

As I said, chaining two datasets with itertools.chain produces an abrupt drift: you switch from one dataset to the other.

On the other hand, synth.ConceptDriftStream will generate a gradual drift. For instance, if you want 2000 samples, with a gradual drift occurring around the 1000th sample:

from river import datasets

dataset = datasets.synth.ConceptDriftStream(
    stream=datasets.synth.SEA(seed=42, variant=0),
    drift_stream=datasets.synth.SEA(seed=42, variant=1),
    position=1000,
    width=100,
    seed=1
)

for x, y in dataset.take(2000):
    ...
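
What "transitioning with a sigmoid" means can be sketched without River. The snippet below assumes that the probability of drawing sample t from drift_stream is 1 / (1 + exp(-4 * (t - position) / width)). That form is assumed here to match what ConceptDriftStream does, so check the River source for your version. The names drift_probability and gradual_drift are illustrative, not River functions:

```python
import itertools
import math
import random


def drift_probability(t, position, width):
    """Assumed sigmoid: chance that sample t comes from the drift stream."""
    # Clamp the exponent so math.exp cannot overflow far before the drift.
    z = min(-4.0 * (t - position) / width, 700.0)
    return 1.0 / (1.0 + math.exp(z))


def gradual_drift(stream, drift_stream, position, width, seed=None):
    """Mix two streams, switching gradually around `position`.

    Both input streams are assumed to be infinite.
    """
    rng = random.Random(seed)
    old, new = iter(stream), iter(drift_stream)
    for t in itertools.count(1):
        if rng.random() < drift_probability(t, position, width):
            yield next(new)
        else:
            yield next(old)


# Stand-in streams that only report which concept a sample came from.
old_concept = ("old" for _ in itertools.count())
new_concept = ("new" for _ in itertools.count())

mixed = gradual_drift(old_concept, new_concept, position=1000, width=100, seed=1)
labels = list(itertools.islice(mixed, 2000))

print(drift_probability(1000, position=1000, width=100))  # 0.5 at the drift position
print(labels[:800].count("new"), labels[1200:].count("new"))
```

Under the same assumption, position=5 and width=2 (as in the original question) would put the switch at the fifth sample, so take(2000) would return almost exclusively drift_stream samples rather than a 1000/1000 split.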

albertfrancajosuacosta Jan 9, 2023
Author

Thank you.

albertfrancajosuacosta · 2023-03-17T17:53:17Z

Author

Resolved.

0 replies
