LAMPSPUC · andreramosfdc · Nov 22, 2023 · Nov 22, 2023
diff --git a/README.md b/README.md
@@ -1,7 +1,107 @@
 # StateSpaceLearning
 
-[![ci](https://github.com/LAMPSPUC/StateSpaceLearning/actions/workflows/ci.yml/badge.svg)](https://github.com/LAMPSPUC/StateSpaceLearning/actions/workflows/ci.yml)
+| **Build Status** | **Coverage** |
+|:-----------------:|:-----------------:|
+| [![ci](https://github.com/LAMPSPUC/StateSpaceLearning/actions/workflows/ci.yml/badge.svg)](https://github.com/LAMPSPUC/StateSpaceLearning/actions/workflows/ci.yml) | [![codecov](https://codecov.io/gh/LAMPSPUC/StateSpaceLearning/graph/badge.svg?token=VDpuXvPSI2)](https://codecov.io/gh/LAMPSPUC/StateSpaceLearning) |
 
-[![codecov](https://codecov.io/gh/LAMPSPUC/StateSpaceLearning/graph/badge.svg?token=VDpuXvPSI2)](https://codecov.io/gh/LAMPSPUC/StateSpaceLearning)
 
-StateSpaceLearning.jl is a package for modeling and forecasting time series in a high-dimension regression framework.
+StateSpaceLearning.jl is a package for modeling and forecasting time series in a high-dimension regression framework.
+
+## Quickstart
+
+```julia
+using StateSpaceLearning
+
+y = randn(100)
+
+#Fit Model
+output = StateSpaceLearning.fit_model(y)
+
+#Main output options 
+model_type = output.model_type # State Space Equivalent model utilized in the estimation (default = Basic Structural).
+X                   = output.X # High Dimension Regression utilized in the estimation.
+coefs               = output.coefs # High Dimension Regression coefficients estimated in the estimation.
+ϵ                   = output.ϵ # Residuals of the model.
+fitted              = output.fitted # Fit in Sample of the model.
+components          = output.components # Dictionary containing information about each component of the model, each component has the keys: "Values" (The value of the component in each timestamp) , "Coefs" (The coefficients estimated for each element of the component) and "Indexes" (The indexes of the elements of the component in the high dimension regression "X").
+residuals_variances = output.residuals_variances # Dictionary containing the estimated variances for the innovations components (that is the information that can be utilized to initialize the state space model).
+s                   = ouotput.s # The seasonal frequency utilized in the model (default = 12).
+T                   = output.T # The length of the original time series.
+outlier             = output.outlier # Boolean indicating the presence of outlier component (default = false).
+valid_indexes       = output.valid_indexes # Vector containing valid indexes of the time series (non valid indexes represent NaN values in the time series).
+stabilize_ζ         = output.stabilize_ζ # Stabilize_ζ parameter (default = 0). A non 0 value for this parameter might be important in terms of forecast for some time series to lead to more stable predictions (we recommend stabilize_ζ = 11 for monthly series).
+
+#Forecast
+prediction = StateSpaceLearning.forecast(output, 12) #Gets a 12 steps ahead prediction
+
+```
+
+## Features
+
+Current features include:
+* Estimation
+* Components decomposition
+* Forecasting
+* Completion of missing values
+* Predefined models, including:
+  * Basic Structural"
+  * Local Linear Trend
+  * Local Level
+
+## Quick Examples
+
+### Fitting and forecasting
+Quick example of fit and forecast for the air passengers time-series.
+
+```julia
+using CSV
+using DataFrames
+using Plots
+using StateSpaceModels
+
+airp = CSV.File(StateSpaceModels.AIR_PASSENGERS) |> DataFrame
+log_air_passengers = log.(airp.passengers)
+steps_ahead = 30
+
+output = StateSpaceLearning.fit_model(log_air_passengers)
+prediction_raw = StateSpaceLearning.forecast(output, steps_ahead)
+prediction = exp.(prediction_raw)
+
+plot(airp.passengers, w=2 , color = "Black", lab = "Historical", legend = :outerbottom)
+plot!(vcat(ones(output.T).*NaN, prediction), lab = "Forcast", w=2, color = "blue")
+```
+![quick_example_airp](./docs/assets/quick_example_airp.png)
+
+### Completion of missing values
+Quick example of completion of missing values for the air passengers time-series (artificial NaN values are added to the original time-series).
+
+```julia
+using CSV
+using DataFrames
+using Plots
+using StateSpaceModels
+
+airp = CSV.File(StateSpaceModels.AIR_PASSENGERS) |> DataFrame
+airpassengers = (airp.passengers).*1.0
+log_air_passengers = log.(airpassengers)
+steps_ahead = 30
+
+log_air_passengers[60:72] .= NaN
+
+output = StateSpaceLearning.fit_model(log_air_passengers)
+
+fitted_completed_missing_values = ones(144).*NaN; fitted_completed_missing_values[60:72] = exp.(output.fitted[60:72])
+real_removed_valued = ones(144).*NaN; real_removed_valued[60:72] = deepcopy(airpassengers[60:72])
+airpassengers[60:72] .= NaN
+
+plot(airpassengers, w=2 , color = "Black", lab = "Historical", legend = :outerbottom)
+plot!(real_removed_valued, lab = "Real Removed Values", w=2, color = "red")
+plot!(fitted_completed_missing_values, lab = "Fit in Sample completed values", w=2, color = "blue")
+
+```
+![quick_example_completion_airp](./docs/assets/quick_example_completion_airp.png)
+
+## Contributing
+
+* PRs such as adding new models and fixing bugs are very welcome!
+* For nontrivial changes, you'll probably want to first discuss the changes via issue.
diff --git a/docs/assets/quick_example_airp.PNG b/docs/assets/quick_example_airp.PNG
diff --git a/docs/assets/quick_example_completion_airp.PNG b/docs/assets/quick_example_completion_airp.PNG
diff --git a/src/StateSpaceLearning.jl b/src/StateSpaceLearning.jl
@@ -16,6 +16,31 @@ include("utils.jl")
 
 export fit_model, forecast
 
+"""
+    fit_model(y::Vector{Fl}; model_type::String="Basic Structural", Exogenous_X::Union{Matrix{Fl}, Missing}=missing,
+              estimation_procedure::String="AdaLasso", s::Int64=12, outlier::Bool=false, stabilize_ζ::Int64=0,
+              α::Float64=0.1, hyperparameter_selection::String="aic", adalasso_coef::Float64=0.1,
+              select_exogenous::Bool=true)::Output where Fl
+
+    Fits the StateSpaceLearning model using specified parameters and estimation procedures.
+
+    # Arguments
+    - `y::Vector{Fl}`: Vector of data.
+    - `model_type::String`: Type of model (default: "Basic Structural").
+    - `Exogenous_X::Union{Matrix{Fl}, Missing}`: Exogenous variables matrix (default: missing).
+    - `estimation_procedure::String`: Estimation procedure (default: "AdaLasso").
+    - `s::Int64`: Seasonal period (default: 12).
+    - `outlier::Bool`: Flag for considering outlier component (default: false).
+    - `stabilize_ζ::Int64`: Stabilize_ζ parameter (default: 0).
+    - `α::Float64`: Elastic net control factor between ridge (α=0) and lasso (α=1) (default: 0.1).
+    - `hyperparameter_selection::String`: Information criteria method for hyperparameter selection (default: "aic").
+    - `adalasso_coef::Float64`: AdaLasso adjustment coefficient (default: 0.1).
+    - `select_exogenous::Bool`: Flag to select exogenous variables. When false the penalty factor for these variables will be set to 0 (default: true).
+
+    # Returns
+    - `Output`: Output object containing model information, coefficients, residuals, etc.
+
+"""
 function fit_model(y::Vector{Fl}; model_type::String="Basic Structural", Exogenous_X::Union{Matrix{Fl}, Missing}=missing,
                     estimation_procedure::String="AdaLasso", s::Int64=12, outlier::Bool=false, stabilize_ζ::Int64=0,
                     α::Float64=0.1, hyperparameter_selection::String="aic", adalasso_coef::Float64=0.1, select_exogenous::Bool=true)::Output where Fl
@@ -25,7 +50,7 @@ function fit_model(y::Vector{Fl}; model_type::String="Basic Structural", Exogeno
     @assert (model_type in AVAILABLE_MODELS) "Unavailable Model"
     @assert (estimation_procedure in AVAILABLE_ESTIMATION_PROCEDURES) "Unavailable estimation procedure"
     @assert (hyperparameter_selection in AVAILABLE_HYPERPARAMETER_SELECTION) "Unavailable hyperparameter selection method"
-    @assert 0 < α <= 1 "α must be in (0, 1], Lasso.jl cannot handle α = 0"
+    @assert 0 <= α <= 1 "α must be in (0, 1], Lasso.jl cannot handle α = 0"
 
     valid_indexes = findall(i -> !isnan(i), y)
     estimation_y  = y[valid_indexes]
@@ -48,6 +73,20 @@ function fit_model(y::Vector{Fl}; model_type::String="Basic Structural", Exogeno
     return Output(model_type, X, coefs, ϵ, fitted, components, residuals_variances, s, T, outlier, valid_indexes, stabilize_ζ)
 end
 
+"""
+    forecast(output::Output, steps_ahead::Int64; Exogenous_Forecast::Union{Matrix{Fl}, Missing}=missing)::Vector{Float64} where Fl
+
+    Returns the forecast for a given number of steps ahead using the provided StateSpaceLearning output and exogenous forecast data.
+
+    # Arguments
+    - `output::Output`: Output object obtained from model fitting.
+    - `steps_ahead::Int64`: Number of steps ahead for forecasting.
+    - `Exogenous_Forecast::Union{Matrix{Fl}, Missing}`: Exogenous variables forecast (default: missing).
+
+    # Returns
+    - `Vector{Float64}`: Vector containing forecasted values.
+
+"""
 function forecast(output::Output, steps_ahead::Int64; Exogenous_Forecast::Union{Matrix{Fl}, Missing}=missing)::Vector{Float64} where Fl
     @assert steps_ahead > 0 "steps_ahead must be a positive integer"
     Exogenous_Forecast = ismissing(Exogenous_Forecast) ? zeros(steps_ahead, 0) : Exogenous_Forecast

diff --git a/src/estimation_procedure/adalasso.jl b/src/estimation_procedure/adalasso.jl
@@ -1,3 +1,24 @@
+"""
+    fit_adalasso(Estimation_X::Matrix{Tl}, estimation_y::Vector{Fl}, α::Float64,
+                 hyperparameter_selection::String,
+                 components_indexes::Dict{String, Vector{Int64}},
+                 adalasso_coef::Float64, select_exogenous::Bool)::Tuple{Vector{Float64}, Vector{Float64}} where {Tl, Fl}
+
+    Fits an Adaptive Lasso (AdaLasso) regression model to the provided data and returns coefficients and residuals.
+
+    # Arguments
+    - `Estimation_X::Matrix{Tl}`: Matrix of predictors for estimation.
+    - `estimation_y::Vector{Fl}`: Vector of response values for estimation.
+    - `α::Float64`: Elastic net control factor between ridge (α=0) and lasso (α=1) (default: 0.1).
+    - `hyperparameter_selection::String`: Information Criteria method for hyperparameter selection (default: aic).
+    - `components_indexes::Dict{String, Vector{Int64}}`: Dictionary containing indexes for different components.
+    - `adalasso_coef::Float64`: AdaLasso adjustment coefficient (default: 0.1).
+    - `select_exogenous::Bool`: Flag for selecting exogenous variables. When false the penalty factor for these variables will be set to 0.
+
+    # Returns
+    - `Tuple{Vector{Float64}, Vector{Float64}}`: Tuple containing coefficients and residuals of the fitted AdaLasso model.
+
+"""
 function fit_adalasso(Estimation_X::Matrix{Tl}, estimation_y::Vector{Fl}, α::Float64, 
                         hyperparameter_selection::String, 
                         components_indexes::Dict{String, Vector{Int64}},
@@ -12,7 +33,7 @@ function fit_adalasso(Estimation_X::Matrix{Tl}, estimation_y::Vector{Fl}, α::Fl
         if key != "initial_states" && key != "μ₁"
             component = components_indexes[key]
             if key != "Exogenous_X" && key != "o" && !(key in ["ν₁", "γ₁"])
-                κ = count(i -> i == 0, coefs[component]) < 1 ? 0 : std(coefs[component])
+                κ = count(i -> i != 0, coefs[component]) < 1 ? 0 : std(coefs[component])
                 penalty_factor[component .- 1] .= (1 / (κ + adalasso_coef))
             else
                 penalty_factor[component .- 1]  = (1 ./ (abs.(coefs[component]) .+ adalasso_coef))

diff --git a/src/estimation_procedure/estimation_utils.jl b/src/estimation_procedure/estimation_utils.jl
@@ -1,3 +1,15 @@
+"""
+    get_dummy_indexes(Exogenous_X::Matrix{Fl}) where {Fl}
+
+    Identifies and returns the indexes of columns in the exogenous matrix that contain dummy variables.
+
+    # Arguments
+    - `Exogenous_X::Matrix{Fl}`: Exogenous variables matrix.
+
+    # Returns
+    - `Vector{Int}`: Vector containing the indexes of columns with dummy variables.
+
+"""
 function get_dummy_indexes(Exogenous_X::Matrix{Fl}) where{Fl}
 
     T, p = size(Exogenous_X)
@@ -12,6 +24,19 @@ function get_dummy_indexes(Exogenous_X::Matrix{Fl}) where{Fl}
     return dummy_indexes
 end
 
+"""
+    get_outlier_duplicate_columns(Estimation_X::Matrix{Tl}, components_indexes::Dict{String, Vector{Int64}}) where{Tl}
+
+    Identifies and returns the indexes of outlier columns that are duplicates of dummy variables in the exogenous matrix.
+
+    # Arguments
+    - `Estimation_X::Matrix{Tl}`: Matrix used for estimation.
+    - `components_indexes::Dict{String, Vector{Int64}}`: Dictionary containing indexes for different components.
+
+    # Returns
+    - `Vector{Int}`: Vector containing the indexes of outlier columns that are duplicates of dummy variables in the exogenous matrix.
+
+"""
 function get_outlier_duplicate_columns(Estimation_X::Matrix{Tl}, components_indexes::Dict{String, Vector{Int64}}) where{Tl}
     o_indexes = components_indexes["o"]
     exogenous_indexes = components_indexes["Exogenous_X"]
@@ -21,6 +46,27 @@ function get_outlier_duplicate_columns(Estimation_X::Matrix{Tl}, components_inde
     return o_indexes[dummy_indexes] .- 1
 end
 
+"""
+    fit_estimation_procedure(estimation_procedure::String, Estimation_X::Matrix{Tl}, estimation_y::Vector{Fl}, α::Float64,
+                             hyperparameter_selection::String, components_indexes::Dict{String, Vector{Int64}},
+                             adalasso_coef::Float64, select_exogenous::Bool)::Tuple{Vector{Float64}, Vector{Float64}} where {Tl, Fl}
+
+    Fits the specified estimation procedure (currently Lasso or AdaLasso) to the provided data and returns coefficients and residuals.
+
+    # Arguments
+    - `estimation_procedure::String`: The chosen estimation procedure (either "Lasso" or "AdaLasso").
+    - `Estimation_X::Matrix{Tl}`: Matrix of predictors for estimation.
+    - `estimation_y::Vector{Fl}`: Vector of response values for estimation.
+    - `α::Float64`: Elastic net control factor between ridge (α=0) and lasso (α=1) (default: 0.1).
+    - `hyperparameter_selection::String`: Information Criteria method for hyperparameter selection (default: aic).
+    - `components_indexes::Dict{String, Vector{Int64}}`: Dictionary containing indexes for different components.
+    - `adalasso_coef::Float64`: AdaLasso adjustment coefficient (default: 0.1).
+    - `select_exogenous::Bool`: Flag for selecting exogenous variables. When false the penalty factor for these variables will be set to 0.
+
+    # Returns
+    - `Tuple{Vector{Float64}, Vector{Float64}}`: Tuple containing coefficients and residuals of the fitted estimation procedure.
+
+"""
 function fit_estimation_procedure(estimation_procedure::String, Estimation_X::Matrix{Tl}, estimation_y::Vector{Fl}, α::Float64, 
                                     hyperparameter_selection::String, components_indexes::Dict{String, Vector{Int64}}, 
                                     adalasso_coef::Float64, select_exogenous::Bool)::Tuple{Vector{Float64}, Vector{Float64}} where {Tl, Fl}

diff --git a/src/estimation_procedure/information_criteria.jl b/src/estimation_procedure/information_criteria.jl
@@ -1,3 +1,20 @@
+"""
+    get_information(T::Int64, K::Int64, ϵ::Vector{Float64};
+                    hyperparameter_selection::String = "bic", p::Int64 = 0)::Float64
+
+    Calculates information criterion value based on the provided parameters and residuals.
+
+    # Arguments
+    - `T::Int64`: Number of observations.
+    - `K::Int64`: Number of selected predictors.
+    - `ϵ::Vector{Float64}`: Vector of residuals.
+    - `hyperparameter_selection::String`: Method for hyperparameter selection (default: "bic").
+    - `p::Int64`: Number of total predictors (default: 0).
+
+    # Returns
+    - `Float64`: Information criterion value.
+
+"""
 function get_information(T::Int64, K::Int64, ϵ::Vector{Float64}; hyperparameter_selection::String = "bic", p::Int64 = 0)::Float64
     if hyperparameter_selection == "bic"
         return T*log(var(ϵ)) + K*log(T)