Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

📝first documentation #7

Merged
merged 1 commit into from
Nov 22, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
106 changes: 103 additions & 3 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,7 +1,107 @@
# StateSpaceLearning

[![ci](https://github.com/LAMPSPUC/StateSpaceLearning/actions/workflows/ci.yml/badge.svg)](https://github.com/LAMPSPUC/StateSpaceLearning/actions/workflows/ci.yml)
| **Build Status** | **Coverage** |
|:-----------------:|:-----------------:|
| [![ci](https://github.com/LAMPSPUC/StateSpaceLearning/actions/workflows/ci.yml/badge.svg)](https://github.com/LAMPSPUC/StateSpaceLearning/actions/workflows/ci.yml) | [![codecov](https://codecov.io/gh/LAMPSPUC/StateSpaceLearning/graph/badge.svg?token=VDpuXvPSI2)](https://codecov.io/gh/LAMPSPUC/StateSpaceLearning) |

[![codecov](https://codecov.io/gh/LAMPSPUC/StateSpaceLearning/graph/badge.svg?token=VDpuXvPSI2)](https://codecov.io/gh/LAMPSPUC/StateSpaceLearning)

StateSpaceLearning.jl is a package for modeling and forecasting time series in a high-dimension regression framework.
StateSpaceLearning.jl is a package for modeling and forecasting time series in a high-dimension regression framework.

## Quickstart

```julia
using StateSpaceLearning

y = randn(100)

#Fit Model
output = StateSpaceLearning.fit_model(y)

#Main output options
model_type = output.model_type # State Space Equivalent model utilized in the estimation (default = Basic Structural).
X = output.X # High Dimension Regression utilized in the estimation.
coefs = output.coefs # High Dimension Regression coefficients estimated in the estimation.
ϵ = output.ϵ # Residuals of the model.
fitted = output.fitted # Fit in Sample of the model.
components = output.components # Dictionary containing information about each component of the model, each component has the keys: "Values" (The value of the component in each timestamp) , "Coefs" (The coefficients estimated for each element of the component) and "Indexes" (The indexes of the elements of the component in the high dimension regression "X").
residuals_variances = output.residuals_variances # Dictionary containing the estimated variances for the innovations components (that is the information that can be utilized to initialize the state space model).
s = ouotput.s # The seasonal frequency utilized in the model (default = 12).
T = output.T # The length of the original time series.
outlier = output.outlier # Boolean indicating the presence of outlier component (default = false).
valid_indexes = output.valid_indexes # Vector containing valid indexes of the time series (non valid indexes represent NaN values in the time series).
stabilize_ζ = output.stabilize_ζ # Stabilize_ζ parameter (default = 0). A non 0 value for this parameter might be important in terms of forecast for some time series to lead to more stable predictions (we recommend stabilize_ζ = 11 for monthly series).

#Forecast
prediction = StateSpaceLearning.forecast(output, 12) #Gets a 12 steps ahead prediction

```

## Features

Current features include:
* Estimation
* Components decomposition
* Forecasting
* Completion of missing values
* Predefined models, including:
* Basic Structural"
* Local Linear Trend
* Local Level

## Quick Examples

### Fitting and forecasting
Quick example of fit and forecast for the air passengers time-series.

```julia
using CSV
using DataFrames
using Plots
using StateSpaceModels

airp = CSV.File(StateSpaceModels.AIR_PASSENGERS) |> DataFrame
log_air_passengers = log.(airp.passengers)
steps_ahead = 30

output = StateSpaceLearning.fit_model(log_air_passengers)
prediction_raw = StateSpaceLearning.forecast(output, steps_ahead)
prediction = exp.(prediction_raw)

plot(airp.passengers, w=2 , color = "Black", lab = "Historical", legend = :outerbottom)
plot!(vcat(ones(output.T).*NaN, prediction), lab = "Forcast", w=2, color = "blue")
```
![quick_example_airp](./docs/assets/quick_example_airp.png)

### Completion of missing values
Quick example of completion of missing values for the air passengers time-series (artificial NaN values are added to the original time-series).

```julia
using CSV
using DataFrames
using Plots
using StateSpaceModels

airp = CSV.File(StateSpaceModels.AIR_PASSENGERS) |> DataFrame
airpassengers = (airp.passengers).*1.0
log_air_passengers = log.(airpassengers)
steps_ahead = 30

log_air_passengers[60:72] .= NaN

output = StateSpaceLearning.fit_model(log_air_passengers)

fitted_completed_missing_values = ones(144).*NaN; fitted_completed_missing_values[60:72] = exp.(output.fitted[60:72])
real_removed_valued = ones(144).*NaN; real_removed_valued[60:72] = deepcopy(airpassengers[60:72])
airpassengers[60:72] .= NaN

plot(airpassengers, w=2 , color = "Black", lab = "Historical", legend = :outerbottom)
plot!(real_removed_valued, lab = "Real Removed Values", w=2, color = "red")
plot!(fitted_completed_missing_values, lab = "Fit in Sample completed values", w=2, color = "blue")

```
![quick_example_completion_airp](./docs/assets/quick_example_completion_airp.png)

## Contributing

* PRs such as adding new models and fixing bugs are very welcome!
* For nontrivial changes, you'll probably want to first discuss the changes via issue.
Binary file added docs/assets/quick_example_airp.PNG
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added docs/assets/quick_example_completion_airp.PNG
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
41 changes: 40 additions & 1 deletion src/StateSpaceLearning.jl
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,31 @@ include("utils.jl")

export fit_model, forecast

"""
fit_model(y::Vector{Fl}; model_type::String="Basic Structural", Exogenous_X::Union{Matrix{Fl}, Missing}=missing,
estimation_procedure::String="AdaLasso", s::Int64=12, outlier::Bool=false, stabilize_ζ::Int64=0,
α::Float64=0.1, hyperparameter_selection::String="aic", adalasso_coef::Float64=0.1,
select_exogenous::Bool=true)::Output where Fl

Fits the StateSpaceLearning model using specified parameters and estimation procedures.

# Arguments
- `y::Vector{Fl}`: Vector of data.
- `model_type::String`: Type of model (default: "Basic Structural").
- `Exogenous_X::Union{Matrix{Fl}, Missing}`: Exogenous variables matrix (default: missing).
- `estimation_procedure::String`: Estimation procedure (default: "AdaLasso").
- `s::Int64`: Seasonal period (default: 12).
- `outlier::Bool`: Flag for considering outlier component (default: false).
- `stabilize_ζ::Int64`: Stabilize_ζ parameter (default: 0).
- `α::Float64`: Elastic net control factor between ridge (α=0) and lasso (α=1) (default: 0.1).
- `hyperparameter_selection::String`: Information criteria method for hyperparameter selection (default: "aic").
- `adalasso_coef::Float64`: AdaLasso adjustment coefficient (default: 0.1).
- `select_exogenous::Bool`: Flag to select exogenous variables. When false the penalty factor for these variables will be set to 0 (default: true).

# Returns
- `Output`: Output object containing model information, coefficients, residuals, etc.

"""
function fit_model(y::Vector{Fl}; model_type::String="Basic Structural", Exogenous_X::Union{Matrix{Fl}, Missing}=missing,
estimation_procedure::String="AdaLasso", s::Int64=12, outlier::Bool=false, stabilize_ζ::Int64=0,
α::Float64=0.1, hyperparameter_selection::String="aic", adalasso_coef::Float64=0.1, select_exogenous::Bool=true)::Output where Fl
Expand All @@ -25,7 +50,7 @@ function fit_model(y::Vector{Fl}; model_type::String="Basic Structural", Exogeno
@assert (model_type in AVAILABLE_MODELS) "Unavailable Model"
@assert (estimation_procedure in AVAILABLE_ESTIMATION_PROCEDURES) "Unavailable estimation procedure"
@assert (hyperparameter_selection in AVAILABLE_HYPERPARAMETER_SELECTION) "Unavailable hyperparameter selection method"
@assert 0 < α <= 1 "α must be in (0, 1], Lasso.jl cannot handle α = 0"
@assert 0 <= α <= 1 "α must be in (0, 1], Lasso.jl cannot handle α = 0"

valid_indexes = findall(i -> !isnan(i), y)
estimation_y = y[valid_indexes]
Expand All @@ -48,6 +73,20 @@ function fit_model(y::Vector{Fl}; model_type::String="Basic Structural", Exogeno
return Output(model_type, X, coefs, ϵ, fitted, components, residuals_variances, s, T, outlier, valid_indexes, stabilize_ζ)
end

"""
forecast(output::Output, steps_ahead::Int64; Exogenous_Forecast::Union{Matrix{Fl}, Missing}=missing)::Vector{Float64} where Fl

Returns the forecast for a given number of steps ahead using the provided StateSpaceLearning output and exogenous forecast data.

# Arguments
- `output::Output`: Output object obtained from model fitting.
- `steps_ahead::Int64`: Number of steps ahead for forecasting.
- `Exogenous_Forecast::Union{Matrix{Fl}, Missing}`: Exogenous variables forecast (default: missing).

# Returns
- `Vector{Float64}`: Vector containing forecasted values.

"""
function forecast(output::Output, steps_ahead::Int64; Exogenous_Forecast::Union{Matrix{Fl}, Missing}=missing)::Vector{Float64} where Fl
@assert steps_ahead > 0 "steps_ahead must be a positive integer"
Exogenous_Forecast = ismissing(Exogenous_Forecast) ? zeros(steps_ahead, 0) : Exogenous_Forecast
Expand Down
23 changes: 22 additions & 1 deletion src/estimation_procedure/adalasso.jl
Original file line number Diff line number Diff line change
@@ -1,3 +1,24 @@
"""
fit_adalasso(Estimation_X::Matrix{Tl}, estimation_y::Vector{Fl}, α::Float64,
hyperparameter_selection::String,
components_indexes::Dict{String, Vector{Int64}},
adalasso_coef::Float64, select_exogenous::Bool)::Tuple{Vector{Float64}, Vector{Float64}} where {Tl, Fl}

Fits an Adaptive Lasso (AdaLasso) regression model to the provided data and returns coefficients and residuals.

# Arguments
- `Estimation_X::Matrix{Tl}`: Matrix of predictors for estimation.
- `estimation_y::Vector{Fl}`: Vector of response values for estimation.
- `α::Float64`: Elastic net control factor between ridge (α=0) and lasso (α=1) (default: 0.1).
- `hyperparameter_selection::String`: Information Criteria method for hyperparameter selection (default: aic).
- `components_indexes::Dict{String, Vector{Int64}}`: Dictionary containing indexes for different components.
- `adalasso_coef::Float64`: AdaLasso adjustment coefficient (default: 0.1).
- `select_exogenous::Bool`: Flag for selecting exogenous variables. When false the penalty factor for these variables will be set to 0.

# Returns
- `Tuple{Vector{Float64}, Vector{Float64}}`: Tuple containing coefficients and residuals of the fitted AdaLasso model.

"""
function fit_adalasso(Estimation_X::Matrix{Tl}, estimation_y::Vector{Fl}, α::Float64,
hyperparameter_selection::String,
components_indexes::Dict{String, Vector{Int64}},
Expand All @@ -12,7 +33,7 @@ function fit_adalasso(Estimation_X::Matrix{Tl}, estimation_y::Vector{Fl}, α::Fl
if key != "initial_states" && key != "μ₁"
component = components_indexes[key]
if key != "Exogenous_X" && key != "o" && !(key in ["ν₁", "γ₁"])
κ = count(i -> i == 0, coefs[component]) < 1 ? 0 : std(coefs[component])
κ = count(i -> i != 0, coefs[component]) < 1 ? 0 : std(coefs[component])
penalty_factor[component .- 1] .= (1 / (κ + adalasso_coef))
else
penalty_factor[component .- 1] = (1 ./ (abs.(coefs[component]) .+ adalasso_coef))
Expand Down
46 changes: 46 additions & 0 deletions src/estimation_procedure/estimation_utils.jl
Original file line number Diff line number Diff line change
@@ -1,3 +1,15 @@
"""
get_dummy_indexes(Exogenous_X::Matrix{Fl}) where {Fl}

Identifies and returns the indexes of columns in the exogenous matrix that contain dummy variables.

# Arguments
- `Exogenous_X::Matrix{Fl}`: Exogenous variables matrix.

# Returns
- `Vector{Int}`: Vector containing the indexes of columns with dummy variables.

"""
function get_dummy_indexes(Exogenous_X::Matrix{Fl}) where{Fl}

T, p = size(Exogenous_X)
Expand All @@ -12,6 +24,19 @@ function get_dummy_indexes(Exogenous_X::Matrix{Fl}) where{Fl}
return dummy_indexes
end

"""
get_outlier_duplicate_columns(Estimation_X::Matrix{Tl}, components_indexes::Dict{String, Vector{Int64}}) where{Tl}

Identifies and returns the indexes of outlier columns that are duplicates of dummy variables in the exogenous matrix.

# Arguments
- `Estimation_X::Matrix{Tl}`: Matrix used for estimation.
- `components_indexes::Dict{String, Vector{Int64}}`: Dictionary containing indexes for different components.

# Returns
- `Vector{Int}`: Vector containing the indexes of outlier columns that are duplicates of dummy variables in the exogenous matrix.

"""
function get_outlier_duplicate_columns(Estimation_X::Matrix{Tl}, components_indexes::Dict{String, Vector{Int64}}) where{Tl}
o_indexes = components_indexes["o"]
exogenous_indexes = components_indexes["Exogenous_X"]
Expand All @@ -21,6 +46,27 @@ function get_outlier_duplicate_columns(Estimation_X::Matrix{Tl}, components_inde
return o_indexes[dummy_indexes] .- 1
end

"""
fit_estimation_procedure(estimation_procedure::String, Estimation_X::Matrix{Tl}, estimation_y::Vector{Fl}, α::Float64,
hyperparameter_selection::String, components_indexes::Dict{String, Vector{Int64}},
adalasso_coef::Float64, select_exogenous::Bool)::Tuple{Vector{Float64}, Vector{Float64}} where {Tl, Fl}

Fits the specified estimation procedure (currently Lasso or AdaLasso) to the provided data and returns coefficients and residuals.

# Arguments
- `estimation_procedure::String`: The chosen estimation procedure (either "Lasso" or "AdaLasso").
- `Estimation_X::Matrix{Tl}`: Matrix of predictors for estimation.
- `estimation_y::Vector{Fl}`: Vector of response values for estimation.
- `α::Float64`: Elastic net control factor between ridge (α=0) and lasso (α=1) (default: 0.1).
- `hyperparameter_selection::String`: Information Criteria method for hyperparameter selection (default: aic).
- `components_indexes::Dict{String, Vector{Int64}}`: Dictionary containing indexes for different components.
- `adalasso_coef::Float64`: AdaLasso adjustment coefficient (default: 0.1).
- `select_exogenous::Bool`: Flag for selecting exogenous variables. When false the penalty factor for these variables will be set to 0.

# Returns
- `Tuple{Vector{Float64}, Vector{Float64}}`: Tuple containing coefficients and residuals of the fitted estimation procedure.

"""
function fit_estimation_procedure(estimation_procedure::String, Estimation_X::Matrix{Tl}, estimation_y::Vector{Fl}, α::Float64,
hyperparameter_selection::String, components_indexes::Dict{String, Vector{Int64}},
adalasso_coef::Float64, select_exogenous::Bool)::Tuple{Vector{Float64}, Vector{Float64}} where {Tl, Fl}
Expand Down
17 changes: 17 additions & 0 deletions src/estimation_procedure/information_criteria.jl
Original file line number Diff line number Diff line change
@@ -1,3 +1,20 @@
"""
get_information(T::Int64, K::Int64, ϵ::Vector{Float64};
hyperparameter_selection::String = "bic", p::Int64 = 0)::Float64

Calculates information criterion value based on the provided parameters and residuals.

# Arguments
- `T::Int64`: Number of observations.
- `K::Int64`: Number of selected predictors.
- `ϵ::Vector{Float64}`: Vector of residuals.
- `hyperparameter_selection::String`: Method for hyperparameter selection (default: "bic").
- `p::Int64`: Number of total predictors (default: 0).

# Returns
- `Float64`: Information criterion value.

"""
function get_information(T::Int64, K::Int64, ϵ::Vector{Float64}; hyperparameter_selection::String = "bic", p::Int64 = 0)::Float64
if hyperparameter_selection == "bic"
return T*log(var(ϵ)) + K*log(T)
Expand Down
Loading
Loading