First commit
MogicianXD committed Dec 25, 2024
1 parent c83e9f5 commit c267dc8
Showing 184 changed files with 13,772 additions and 5 deletions.
3 changes: 2 additions & 1 deletion .gitignore
@@ -159,4 +159,5 @@ cython_debug/
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
.idea/
/dataset/*.csv
82 changes: 78 additions & 4 deletions README.md
@@ -2,8 +2,82 @@
![Stars](https://img.shields.io/github/stars/SJTU-DMTai/OnlineTSF)
[![Visits Badge](https://badges.pufler.dev/visits/SJTU-DMTai/OnlineTSF)](https://badges.pufler.dev/visits/SJTU-Quant/OnlineTSF)

This repo provides an online time series forecasting framework empowered by online model adaptation (i.e., continual model updates).

It also contains the official code of PROCEED, a novel online method proposed in [Proactive Model Adaptation Against Concept Drift for Online Time Series Forecasting](https://arxiv.org/pdf/2412.08435) (Research Track in KDD 2025).

## 🚀 Outstanding Features
### No Information Leakage
Our key observation is that the ground truth of each time series sample only becomes available after the forecast horizon has elapsed.
However, most existing online learning implementations involve information leakage:
they update the model with the most recently forecasted sample, whose ground truth has not yet been observed.

This framework therefore always updates the model with ground truth that has actually been observed, as illustrated by the sketch below.
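Below is a minimal, hypothetical sketch of such a leakage-free online loop, not the framework's actual code: each prediction is buffered, and the model is only updated once the full forecast horizon of that window has been observed. The shape conventions, loss, and function name are assumptions for illustration.

```python
# A minimal sketch of a leakage-free online loop; NOT the framework's actual code.
# Assumptions: `model` maps a window of shape [1, seq_len, n_vars] to [1, pred_len, n_vars],
# and `stream` yields one observation (a tensor of shape [n_vars]) per time step.
from collections import deque

import torch


def leakage_free_online_loop(model, stream, seq_len, pred_len, optimizer,
                             criterion=torch.nn.MSELoss()):
    history, pending, preds = [], deque(), []
    for t, obs in enumerate(stream):
        history.append(obs)
        # Update with every past window whose forecast horizon has fully elapsed,
        # i.e. whose ground truth is now completely observed.
        while pending and pending[0][1] <= t:
            x, due = pending.popleft()
            y_true = torch.stack(history[due - pred_len + 1:due + 1])
            loss = criterion(model(x.unsqueeze(0)).squeeze(0), y_true)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        # Forecast the next `pred_len` steps; the ground truth is fully observed at step t + pred_len.
        if len(history) >= seq_len:
            x = torch.stack(history[-seq_len:])
            with torch.no_grad():
                preds.append(model(x.unsqueeze(0)).squeeze(0))
            pending.append((x, t + pred_len))
    return preds
```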

### Supported Methods
- **Proceed**: Proactive Model Adaptation. [[KDD 2025]](https://arxiv.org/pdf/2412.08435) [[Code]](https://github.com/SJTU-DMTai/OnlineTSF)
- **SOLID**: Sample-level Contextualized Adapter. [[KDD 2024]](https://arxiv.org/abs/2310.14838) [[Code]](https://github.com/HALF111/calibration_CDS)
- **OneNet**: Online Ensembling Network. [[NeurIPS 2023]](https://arxiv.org/abs/2309.12659) [[Code]](https://github.com/yfzhang114/OneNet)
- **FSNet**: Learning Fast and Slow. [[ICLR 2023]](https://openreview.net/forum?id=RNMIIZwzCg) [[Code]](https://github.com/salesforce/fsnet)
- **DER++**: Dark Experience Replay. [[NeurIPS 2020]](https://arxiv.org/abs/2004.07211) [[Code]](https://github.com/aimagelab/mammoth)
- **ER**: Experience Replay.
- **Naive**: Online Gradient Descent.

### Fair Experiments and Easier Development
Another observation is that most methods are evaluated in isolated settings,
where the optimizer and other training details differ from method to method.
Moreover, their code is hard to reuse because all operations are mixed together,
which makes it difficult for followers to ensure a fair comparison without carefully auditing each implementation.

In this framework, we put effort into improving the code structure.
We define a set of basic functions that each existing method overrides,
which makes the differences between methods clearer to readers.

It is also easier for followers to explore different experiment settings by modifying the common functions shared by all methods, as sketched below.
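The pattern can be sketched as follows. All class and hook names here are assumptions for illustration, not the repo's actual API: a base learner fixes the shared optimizer and online step, while each method only overrides a few hooks (sample selection, loss terms, replay buffers, and so on).

```python
# Illustrative sketch only; the names below are assumptions, not the repo's actual API.
import torch
import torch.nn.functional as F


class OnlineLearnerBase:
    """Shared training logic; methods should differ only in the hooks they override."""

    def __init__(self, model: torch.nn.Module, online_lr: float):
        self.model = model
        self.optimizer = torch.optim.Adam(model.parameters(), lr=online_lr)  # common optimizer

    # --- hooks that individual methods override ---------------------------------
    def before_update(self, batch):          # e.g. SOLID: select similar historical samples
        return batch

    def compute_loss(self, batch):           # e.g. DER++/ER: add replay regularization terms
        x, y = batch
        return F.mse_loss(self.model(x), y)

    def after_update(self, batch):           # e.g. ER: push the latest batch into a buffer
        pass

    # --- the common online step, identical for every method ---------------------
    def online_step(self, batch):
        batch = self.before_update(batch)
        loss = self.compute_loss(batch)
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()
        self.after_update(batch)
        return loss.item()
```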

### Full Data Utilization
To mitigate data hunger and concept drift,
we also involve the validation data in online learning.
It is noteworthy that the common practice leaves out several time series samples to
avoid any overlap between the training, validation, and test data.
Since online learning is a continual process, we use these boundary samples for model updates as well (see the sketch below).
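A rough sketch of this split policy is given below; the 20:5:75 ratios follow the `--border_type online` default described in the Arguments section, and the exact boundary handling in the repo may differ.

```python
# Conceptual sketch only; the framework's actual split logic may differ in its details.
def split_indices(n_samples: int, ratios=(0.20, 0.05, 0.75)):
    """Chronological 20:5:75 split in which nothing is discarded for online learning.

    The common practice drops the boundary samples between splits (roughly the last
    `seq_len + pred_len - 1` positions of the training and validation spans) to avoid
    overlap. Here, those boundary samples and the validation samples are simply replayed
    as the earliest part of the online stream, so every observation is eventually used
    for model updates."""
    n_train = int(n_samples * ratios[0])
    n_val = int(n_samples * ratios[1])
    train = range(0, n_train)
    val = range(n_train, n_train + n_val)
    # The online phase starts right after the training span: it first re-traverses the
    # validation span and the usually-dropped boundary samples, then the test span.
    online = range(n_train, n_samples)
    return train, val, online
```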


## Usage
### Scripts
Please refer to the directory `scripts`.

First run the pretraining scripts, then run the online learning scripts.

### Arguments
Basic arguments (an example invocation is given after this list):
- `--model` decides the forecast backbone.
- `--seq_len` decides the lookback length.
- `--pred_len` decides the forecast horizon.
- `--dataset` decides the dataset of which the file path is configured in `settings.py`.
- `--learning_rate` controls the learning rate when training on historical data.
- `--online_learning_rate` controls the learning rate when training on online data.
> By default, the argument `--border_type` is set to `online` for a data split of 20:5:75 for training/validation/test.
> Empirically, we can set a higher value of `online_learning_rate` than the value tuned on the validation data,
> since the online phase is much longer.
> In practice, we can periodically tune the online learning rate on new recent data, instead of keeping a fixed one.

Hyperparameters of Proceed:
- `--concept_dim` corresponds to $d_c$ in our paper.
- `--bottleneck_dim` corresponds to $r$ in our paper.
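
For example, a pretraining run followed by an online learning run could be launched as sketched below. This is hypothetical: the entry-point name (`main.py`) and every concrete value are placeholders, so please refer to the files under `scripts` for the real invocations.

```python
# Hypothetical launcher sketch; the script name and all values below are placeholders.
import subprocess

pretrain = [
    "python", "main.py",
    "--model", "PatchTST",             # forecast backbone (placeholder name)
    "--dataset", "ETTh2",              # dataset key whose file path is set in settings.py
    "--seq_len", "512",                # lookback length
    "--pred_len", "96",                # forecast horizon
    "--learning_rate", "1e-4",         # learning rate for training on historical data
]
online = pretrain + [
    "--border_type", "online",         # 20:5:75 train/validation/test split
    "--online_learning_rate", "3e-5",  # learning rate for online updates
    "--concept_dim", "64",             # d_c in the paper (illustrative value)
    "--bottleneck_dim", "32",          # r in the paper (illustrative value)
]
subprocess.run(pretrain, check=True)   # 1) pretraining on historical data
subprocess.run(online, check=True)     # 2) online learning on the test stream
```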


## Citation
If you find this repo useful, please consider citing:
```
@InProceedings{Proceed,
  author    = {Lifan Zhao and Yanyan Shen},
  booktitle = {Proceedings of the 31st {ACM} {SIGKDD} Conference on Knowledge Discovery and Data Mining},
  title     = {Proactive Model Adaptation Against Concept Drift for Online Time Series Forecasting},
  year      = {2025},
  month     = {feb},
  publisher = {{ACM}},
  doi       = {10.1145/3690624.3709210},
}
```
19 changes: 19 additions & 0 deletions adapter/module/base.py
@@ -0,0 +1,19 @@

import torch.nn as nn


class Adaptation(object):
def __init__(self, flag_adapt_bias: bool, flag_adapt_weight: bool = True,
merge_weights: bool = True, freeze_weight: bool = True, freeze_bias: bool = True):
assert isinstance(self, nn.Module)
self.flag_adapt_bias = flag_adapt_bias
self.flag_adapt_weight = flag_adapt_weight
self.weight.requires_grad = not freeze_weight
if self.bias is not None:
self.bias.requires_grad = not freeze_bias
# Mark the weight as unmerged
self.merged = False
self.merge_weights = merge_weights

def assign_adaptation(self, adaptation):
raise NotImplementedError
181 changes: 181 additions & 0 deletions adapter/module/down_up.py
@@ -0,0 +1,181 @@
# ------------------------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License (MIT). See LICENSE in the repo root for license information.
# ------------------------------------------------------------------------------------------
import torch
import torch.nn as nn
import torch.nn.functional as F
import transformers

from adapter.module import ssf, up
from adapter.module.up import LayerNorm, BatchNorm1d, RevIN, _RevIN


def add_down_up_(parent_module: nn.Module, module_name: str, freeze_weight: bool,
merge_weights=False, load_weights=True, **kwargs):
old_module = getattr(parent_module, module_name)
if isinstance(old_module, nn.Linear):
new_module = Linear(in_features=old_module.in_features, out_features=old_module.out_features,
bias=old_module.bias is not None, freeze_weight=freeze_weight,
device=old_module.weight.device, dtype=old_module.weight.dtype,
merge_weights=merge_weights, **kwargs)
elif isinstance(old_module, transformers.Conv1D):
new_module = TFMConv1D(nx=old_module.weight.shape[0], nf=old_module.nf, merge_weights=merge_weights,
freeze_weight=freeze_weight, **kwargs)
elif isinstance(old_module, _RevIN) and old_module.affine:
new_module = RevIN(num_features=old_module.num_features, eps=old_module.eps, affine=True,
subtract_last=old_module.subtract_last,
merge_weights=merge_weights, freeze_weight=freeze_weight, **kwargs)
elif isinstance(old_module, nn.BatchNorm1d) and old_module.affine:
new_module = BatchNorm1d(num_features=old_module.num_features, eps=old_module.eps,
momentum=old_module.momentum, affine=old_module.affine,
track_running_stats=old_module.track_running_stats,
device=old_module.weight.device, dtype=old_module.weight.dtype,
merge_weights=merge_weights, freeze_weight=freeze_weight, **kwargs)
elif isinstance(old_module, nn.LayerNorm) and len(old_module.normalized_shape) == 1 and old_module.elementwise_affine:
new_module = LayerNorm(normalized_shape=old_module.normalized_shape[-1], eps=old_module.eps,
elementwise_affine=True, device=old_module.weight.device, dtype=old_module.weight.dtype,
merge_weights=merge_weights, freeze_weight=freeze_weight, **kwargs)
elif isinstance(old_module, nn.Conv1d):
new_module = Conv1d(old_module.in_channels, old_module.out_channels,
kernel_size=old_module.kernel_size, stride=old_module.stride, padding=old_module.padding,
dilation=old_module.dilation, groups=old_module.groups, bias=old_module.bias is not None,
padding_mode=old_module.padding_mode,
device=old_module.weight.device, dtype=old_module.weight.dtype,
freeze_weight=freeze_weight, merge_weights=merge_weights, **kwargs)
else:
raise NotImplementedError
if load_weights:
new_module.load_state_dict(old_module.state_dict(), strict=False)
setattr(parent_module, module_name, new_module)


class Down_Up(up.Adaptation_Up):
def __init__(self, in_features: int, *args, **kwargs):
up.Adaptation_Up.__init__(self, *args, **kwargs)
self.in_features = in_features
self.register_buffer('scale2', None, persistent=False)

def assign_adaptation(self, adaptation):
if adaptation is None:
self.scale, self.scale2, self.shift = None, None, None
else:
self.scale = adaptation[..., :self.out_features] + 1
self.scale2 = adaptation[..., self.out_features:self.out_features + self.in_features] + 1
if self.flag_adapt_bias:
self.shift = adaptation[..., -self.out_features:]
if self.scale.dim() == 2:
self.scale = self.scale.unsqueeze(1)
self.scale2 = self.scale2.unsqueeze(1)
self.shift = self.shift.unsqueeze(1)

def _merge(self, weight, bias):
weight, bias = super()._merge(weight, bias)
if weight is not None and self.flag_adapt_weight:
scale2 = self.scale2.squeeze()
if self.fan_in_fan_out:
weight = weight * scale2.reshape(scale2.shape[-1:] + (1,) * (weight.dim() - 1))
else:
weight = weight * scale2.reshape((1, scale2.shape[-1]) + (1,) * (weight.dim() - 2))
return weight, bias

def _ssf_input(self, x: torch.Tensor):
batch_size = x.size()[:-1]
x = x.reshape(self.scale2.shape[0], -1, x.shape[-1])
return (x * self.scale2).view(*batch_size, x.shape[-1])

def _ssf(self, res: torch.Tensor):
batch_size = res.size()[:-1]
res = res.view(self.scale.shape[0], -1, res.shape[-1])
if self.bias is not None:
res = res * self.scale + (self.shift + self.bias)
else:
res = res * self.scale
return res.view(*batch_size, res.shape[-1])


class Linear(Down_Up, ssf.Linear):
def __init__(
self,
in_features: int,
out_features: int,
bias: bool = True,
device=None, dtype=None,
merge_weights: bool = True, freeze_weight: bool = True,
**kwargs
):
nn.Linear.__init__(self, in_features, out_features, bias=bias, device=device, dtype=dtype)
Down_Up.__init__(self, in_features=in_features, out_features=out_features, flag_adapt_bias=bias,
merge_weights=merge_weights, freeze_weight=freeze_weight, **kwargs)

def forward(self, x: torch.Tensor):
if not hasattr(self, 'scale') or self.scale is None:
return nn.Linear.forward(self, x)
if self.merged:
return F.linear(x, self.weight, bias=self.bias)
elif self.scale.shape[0] == 1 and self.merge_weights:
weight, bias = self._merge(self.weight, self.bias)
return F.linear(x, weight, bias=bias)
else:
return self._ssf(F.linear(self._ssf_input(x), self.weight, bias=None))


class TFMConv1D(Down_Up, ssf.AttnConv1D):
def __init__(
self,
nx: int,
nf: int,
merge_weights: bool = True, freeze_weight: bool = True,
**kwargs
):
transformers.Conv1D.__init__(self, nx=nx, nf=nf)
Down_Up.__init__(self, in_features=nx, out_features=nf, flag_adapt_bias=True, fan_in_fan_out=True,
merge_weights=merge_weights, freeze_weight=freeze_weight, **kwargs)

def forward(self, x: torch.Tensor):
if not hasattr(self, 'scale') or self.scale is None:
return transformers.Conv1D.forward(self, x)
if self.merged:
size_out = x.size()[:-1] + (self.nf,)
return torch.addmm(self.bias, x.view(-1, x.size(-1)), self.weight).view(size_out)
elif self.scale.shape[0] == 1 and self.merge_weights:
weight, bias = self._merge(self.weight, self.bias)
return F.linear(x, weight.transpose(-1, -2), bias=bias)
else:
return self._ssf(F.linear(self._ssf_input(x), self.weight.transpose(-1, -2), bias=None))


class Conv1d(Down_Up, ssf.Conv1d):
def __init__(self, in_channels: int, out_channels: int, kernel_size: int,
stride=1, padding=0, dilation=1, groups: int = 1, bias: bool = True,
padding_mode: str = 'zeros', device=None, dtype=None,
merge_weights: bool = True, freeze_weight: bool = True, **kwargs):
nn.Conv1d.__init__(self, in_channels, out_channels, kernel_size, stride=stride, padding=padding, dilation=dilation,
groups=groups, bias=bias, padding_mode=padding_mode, device=device, dtype=dtype)
Down_Up.__init__(self, in_features=in_channels, out_features=out_channels, flag_adapt_bias=bias,
merge_weights=merge_weights, freeze_weight=freeze_weight, **kwargs)

def forward(self, x):
if not hasattr(self, 'scale') or self.scale is None:
return nn.Conv1d.forward(self, x)
if self.merged:
return self._conv_forward(x, self.weight, bias=self.bias)
elif self.scale.shape[0] == 1 and self.merge_weights:
weight, bias = self._merge(self.weight, self.bias)
return self._conv_forward(x, weight, bias=bias)
else:
return self._ssf(self._conv_forward(self._ssf_input(x), self.weight, bias=None))

def _ssf_input(self, x: torch.Tensor):
batch_size = x.size()[:-2]
x = x.view(self.scale2.shape[0], -1, *x.shape[-2:]) * self.scale2.unsqueeze(-1)
return x.view(*batch_size, *x.shape[-2:])

def _ssf(self, res: torch.Tensor):
batch_size = res.size()[:-2]
res = res.view(self.scale.shape[0], -1, *res.shape[-2:])
if self.bias is not None:
res = res * self.scale.unsqueeze(-1) + (self.shift + self.bias).unsqueeze(-1)
else:
res = res * self.scale.unsqueeze(-1)
return res.view(*batch_size, *res.shape[-2:])
99 changes: 99 additions & 0 deletions adapter/module/generator.py
@@ -0,0 +1,99 @@
import collections
import math

import torch
import torch.nn as nn
import transformers
import typing

from adapter.module import down_up
from adapter.module.base import Adaptation


def clip(x, max_norm=1):
x_norm = torch.norm(x, dim=-1, keepdim=True)
scale = torch.clip(max_norm / x_norm, max=1)
return x * scale, x_norm


class Bottleneck(nn.Module):
def __init__(self, in_dim: int, out_dim: int, n_layers: int,
bottleneck_dim: typing.Union[typing.List[int], int], activation=nn.LeakyReLU, need_bias: bool = False,
shared=False, rand_init: dict = None):
super().__init__()
self.out_dim = out_dim
self.activation = activation()
self.shared = shared
self.need_bias = need_bias
self.biases = nn.ParameterList([nn.Parameter(torch.empty(n_layers, 1, bottleneck_dim))])
self.weights = nn.ParameterList([nn.Linear(in_dim, bottleneck_dim, bias=False) if self.shared else
nn.Parameter(torch.empty(n_layers, 1, bottleneck_dim, in_dim))])
if self.need_bias:
self.biases.append(nn.Parameter(torch.zeros(n_layers, 1, out_dim)))
self.weights.append(nn.Parameter(torch.zeros(1 if self.shared else n_layers, 1, out_dim, bottleneck_dim)))
for i in range(0, len(self.weights) - 1):
for j in range(n_layers):
fan_in, _ = nn.init._calculate_fan_in_and_fan_out(
self.weights[i].weight if shared else self.weights[i][0, 0])
bound = 1 / math.sqrt(fan_in) if fan_in > 0 else 0
nn.init.uniform_(self.biases[i][j], -bound, bound)
if not self.shared:
nn.init.kaiming_uniform_(self.weights[i][j, 0], a=math.sqrt(5))
if self.need_bias:
nn.init.zeros_(self.biases[-1])
if rand_init is not None:
for pos, (r, in_fea) in rand_init.items():
nn.init.kaiming_uniform_(self.biases[-1][pos, :, :r * in_fea], a=math.sqrt(5))

self.memories = None
self.last_adaptation = None

def forward(self, x, mask=None, training=False):
if self.shared:
x = self.activation(self.weights[0](x) + self.biases[0]).unsqueeze(-1)
else:
x = self.activation(self.weights[0] @ x.unsqueeze(-1) + self.biases[0].unsqueeze(-1))
x = self.weights[-1] @ x
x = x.squeeze(-1)
if training and mask is not None:
x = x * mask.unsqueeze(-1)
if self.need_bias:
x = x + self.biases[-1]
return x


class AdaptGenerator(nn.Module):
def __init__(self, backbone: nn.Module, concept_features: int, activation=nn.LeakyReLU,
shared: bool = True, need_bias: bool = True, adaptive_dim: bool = False, mid_dim: int = None):
super().__init__()
self.dim_name_dict = collections.defaultdict(list)
self.bottlenecks = nn.ModuleDict()
self.loras = nn.ModuleDict()
self.memories = None
self.memories_lora = None
for name, module in backbone.named_modules():
if isinstance(module, Adaptation):
_weight = module.affine_weight if hasattr(module, 'affine_weight') else module.weight
out_features = _weight.shape[1 if isinstance(module, transformers.Conv1D) else 0]
if _weight.dim() == 1:
in_features = 1
else:
in_features = _weight.shape[0 if isinstance(module, transformers.Conv1D) else 1]
out_dim = out_features
if module.bias is not None:
out_dim += out_features
if isinstance(module, down_up.Down_Up):
out_dim += in_features
self.dim_name_dict[name.split('.')[-1] + '_' + str(out_dim)].append(name)

for key, names in self.dim_name_dict.items():
out_dim = int(key.split('_')[-1])
_hid_dims = min(mid_dim, out_dim // 4) if adaptive_dim else mid_dim
self.bottlenecks[key] = Bottleneck(concept_features, out_dim, len(names), _hid_dims,
activation, shared=shared, need_bias=need_bias)

def forward(self, x, need_clip=False):
if need_clip:
x, x_norm = clip(x)
coefs = {k: bottleneck(x) for k, bottleneck in self.bottlenecks.items()}
return coefs