Skip to content

Commit

Permalink
Correct show and predict after selectmodel and remove side effects (#37)
Browse files Browse the repository at this point in the history
* deepcopy m.rr to avoid selectmodel sideeffects

* fix show and sideeffects bugs for RegularizedModel

* WIP predict() problems

* fixes predict() problem with X standardization

* add brew update to travis.yml

* changed copy! to copyto! for julia 1.0 compat

* changed predict tests from equal to approx for mac

* removed restriction that newX in predict is a float matrix and updated predit example in docs
  • Loading branch information
AsafManela authored Jan 1, 2020
1 parent 9d099f5 commit f2798e3
Show file tree
Hide file tree
Showing 7 changed files with 108 additions and 50 deletions.
7 changes: 3 additions & 4 deletions .travis.yml
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@ os:
- osx
julia:
- 1.0
- 1.1
- 1.3
- nightly
notifications:
email: false
Expand All @@ -17,8 +17,7 @@ before_install:
- if [ $TRAVIS_OS_NAME = linux ]; then
sudo apt-get install gfortran -y;
else
brew cask uninstall oclint;
brew install gcc;
brew update && brew install gcc;
fi

after_success:
Expand All @@ -27,7 +26,7 @@ after_success:
jobs:
include:
- stage: "Documentation"
julia: 1.1
julia: 1.3
os: linux
script:
- julia --project=docs/ -e 'using Pkg; Pkg.develop(PackageSpec(path=pwd())); Pkg.instantiate()'
Expand Down
7 changes: 6 additions & 1 deletion Project.toml
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
name = "Lasso"
uuid = "b4fcebef-c861-5a0f-a7e2-ba9dc32b180a"
version = "0.5.0"
version = "0.5.1"

[deps]
DSP = "717857b8-e6f2-59f4-9121-6e50c889abd2"
Expand All @@ -15,7 +15,12 @@ StatsBase = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91"
StatsModels = "3eaba693-59b7-5ba5-a881-562e759f1c8d"

[compat]
DSP = "0.6"
Distributions = "0.21"
GLM = "1.3"
MLBase = "0.8"
Reexport = "0.2"
StatsBase = "0.32"
StatsModels = "0.6"
julia = "0.7, 1"

Expand Down
34 changes: 16 additions & 18 deletions docs/src/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -41,30 +41,28 @@ julia> data = DataFrame(X=[1,2,3], Y=[2,4,7])
│ 3 │ 3 │ 7 │
julia> m = fit(LassoModel, @formula(Y ~ X), data)
StatsModels.TableRegressionModel{LassoModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredQR{Float64}}},Array{Float64,2}}
Y ~ X
LassoModel using MinAICc(2) segment of the regulatization path.
Coefficients:
──────────────────────────────────────────────────────────────────
Estimate Std. Error t value Pr(>|t|) Lower 95% Upper 95%
──────────────────────────────────────────────────────────────────
x1 3.88915 4.86043 0.800166 0.5704 -57.8684 65.6467
x2 0.222093 1.83707 0.120895 0.9234 -23.1201 23.5643
──────────────────────────────────────────────────────────────────
julia> stderror(m)
2-element Array{Float64,1}:
4.860427341926979
1.8370688588910695
────────────
Estimate
────────────
x1 3.88915
x2 0.222093
────────────
julia> predict(m)
3-element Array{Float64,1}:
4.161154511001072
4.433161908091373
4.705169305181673
4.111240223622052
4.333333333333333
4.555426443044614
julia> predict(m, data[2:end,:])
2-element Array{Union{Missing, Float64},1}:
4.333333333333333
4.555426443044614
```

To get an entire Lasso regularization path with default parameters:

```julia
Expand Down
16 changes: 7 additions & 9 deletions docs/src/lasso.md
Original file line number Diff line number Diff line change
Expand Up @@ -73,17 +73,15 @@ julia> data = DataFrame(X=[1,2,3], Y=[2,4,7])
│ 3 │ 3 │ 7 │
julia> m = fit(LassoModel, @formula(Y ~ X), data; select=MinCVmse(Kfold(3,2)))
StatsModels.TableRegressionModel{LassoModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredQR{Float64}}},Array{Float64,2}}
Y ~ X
LassoModel using MinCVmse(Kfold([3, 1, 2], 2, 1.5)) segment of the regulatization path.
Coefficients:
──────────────────────────────────────────────────────────────────
Estimate Std. Error t value Pr(>|t|) Lower 95% Upper 95%
──────────────────────────────────────────────────────────────────
x1 4.33333 5.4365 0.797081 0.5716 -64.744 73.4106
x2 0.0 2.0548 0.0 1.0000 -26.1088 26.1088
──────────────────────────────────────────────────────────────────
────────────
Estimate
────────────
x1 4.33333
x2 0.0
────────────
julia> coef(m)
2-element Array{Float64,1}:
Expand Down
16 changes: 15 additions & 1 deletion src/Lasso.jl
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,7 @@ include("TrendFiltering.jl")
using Reexport, LinearAlgebra, SparseArrays, Random, .Util, MLBase
import Random: Sampler
@reexport using GLM, Distributions, .FusedLassoMod, .TrendFiltering
using GLM: FPVector, LinPred, Link, LmResp, GlmResp, DensePredQR, updateμ!,
using GLM: FPVector, LinPred, Link, LmResp, GlmResp, cholpred, updateμ!,
linkfun, linkinv, LinPredModel
export RegularizationPath, LassoPath, GammaLassoPath, NaiveCoordinateDescent,
CovarianceCoordinateDescent, fit, fit!, coef, predict,
Expand Down Expand Up @@ -597,6 +597,20 @@ function StatsBase.predict(path::RegularizationPath, newX::AbstractMatrix{T}; of
end
end

"predicted values for data used to estimate `path`"
function StatsBase.predict(path::RegularizationPath; select=AllSeg())
X = path.m.pp.X
offset = path.m.rr.offset

# destandardize X if needed
if !isempty(path.Xnorm)
X = X ./ transpose(path.Xnorm)
end

predict(path, X; offset=offset, select=select)
end


"distribution of underlying GLM"
distfun(path::RegularizationPath{M}) where {M<:LinearModel} = Normal()
distfun(path::RegularizationPath) = path.m.rr.d
Expand Down
56 changes: 41 additions & 15 deletions src/segselect.jl
Original file line number Diff line number Diff line change
Expand Up @@ -119,15 +119,17 @@ segselect(path::RegularizationPath,
abstract type RegularizedModel <: RegressionModel end

"LassoModel represents a selected segment from a LassoPath"
struct LassoModel{M<:LinPredModel} <: RegularizedModel
struct LassoModel{M<:LinPredModel, S<:SegSelect} <: RegularizedModel
lpm::M # underlying GLM
intercept::Bool # whether path added an intercept
select::S # segment selector
end

"GammaLassoModel represents a selected segment from a GammaLassoPath"
struct GammaLassoModel{M<:LinPredModel} <: RegularizedModel
struct GammaLassoModel{M<:LinPredModel, S<:SegSelect} <: RegularizedModel
lpm::M # underlying GLM
intercept::Bool # whether path added an intercept
select::S # segment selector
end

"""
Expand All @@ -140,7 +142,7 @@ Predicted values using a selected segment of a regularization path.
m = fit(LassoModel, X, y; select=MinBIC())
predict(m, newX) # predict using BIC minimizing segment
"""
function StatsBase.predict(m::RegularizedModel, newX::AbstractMatrix{T}; kwargs...) where {T<:AbstractFloat}
function StatsBase.predict(m::RegularizedModel, newX::AbstractMatrix{T}; kwargs...) where T
# add an interecept to newX if the model has one
if m.intercept
newX = [ones(T,size(newX,1),1) newX]
Expand Down Expand Up @@ -169,8 +171,14 @@ selectmodel(path, MinCVmse(path, 5)) # 5-fold CV mse minimizing model
function selectmodel(path::R, select::SegSelect) where R<:RegularizationPath
# extract reusable path parts
m = path.m
rr = deepcopy(m.rr)
pp = m.pp
X = pp.X
X = pp.X

# destandardize X if needed
if !isempty(path.Xnorm)
X = X ./ transpose(path.Xnorm)
end

# add an interecept to X if the model has one
if hasintercept(path)
Expand All @@ -183,17 +191,21 @@ function selectmodel(path::R, select::SegSelect) where R<:RegularizationPath
beta0 = Vector{Float64}(coef(path, select))

# create new linear predictor
segpp = DensePredQR(segX, beta0)
pivot = true
p = cholpred(segX, pivot)

# rescale weights, which in GLM sum to nobs
m.rr.wts .*= nobs(path)
rr.wts .*= nobs(path)

# same things GLM does to init just before fit!
GLM.installbeta!(segpp)
updateμ!(m.rr, linpred(segpp, zero(eltype(m.rr.y))))
lp = rr.mu
copyto!(p.beta0, beta0)
fill!(p.delbeta, 0)
GLM.linpred!(lp, p, 0)
updateμ!(rr, lp)

# create a LinearModel or GeneralizedLinearModel with the new linear predictor
newglm(m, segpp)
newglm(m, rr, p)
end

"""
Expand Down Expand Up @@ -275,11 +287,11 @@ function StatsBase.fit(::Type{R}, X::AbstractMatrix{T}, y::V,
M = pathtype(R)
path = fit(M, X, y, d, l; intercept=intercept, kwargs...)

R(selectmodel(path, select), intercept)
R(selectmodel(path, select), intercept, select)
end

newglm(m::LinearModel, pp) = LinearModel(m.rr, pp)
newglm(m::GeneralizedLinearModel, pp) = GeneralizedLinearModel(m.rr, pp, true)
newglm(m::LinearModel, rr, pp) = LinearModel(rr, pp)
newglm(m::GeneralizedLinearModel, rr, pp) = GeneralizedLinearModel(rr, pp, true)

# don't add an intercept when using a @formula because we use the intercept keyword arg to add an intercept
StatsModels.drop_intercept(::Type{R}) where R<:RegularizedModel = true
Expand All @@ -291,9 +303,7 @@ for modeltype in (:LassoModel, :GammaLassoModel)
StatsBase.deviance, StatsBase.nulldeviance,
StatsBase.loglikelihood, StatsBase.nullloglikelihood,
StatsBase.dof, StatsBase.dof_residual, StatsBase.nobs,
StatsBase.stderror, StatsBase.vcov,
StatsBase.residuals, StatsBase.response,
StatsBase.coeftable
StatsBase.residuals, StatsBase.response
]
end
end
Expand All @@ -306,3 +316,19 @@ StatsBase.fit(::Type{M},
d::UnivariateDistribution=Normal(),
l::Link=canonicallink(d); kwargs...) where {M<:Union{RegularizationPath, RegularizedModel}} =
fit(M, float(X), float(y), d, l; kwargs...)

function coeftable(mm::RegularizedModel)
cc = coef(mm)
CoefTable([cc],
["Estimate"],
["x$i" for i = 1:size(mm.lpm.pp.X, 2)])
end

function Base.show(io::IO, obj::RegularizedModel)
# prefix = isa(obj.m, GeneralizedLinearModel) ? string(typeof(distfun(path)).name.name, " ") : ""
println(io, "$(typeof(obj).name.name) using $(obj.select) segment of the regulatization path.")

println(io, "\nCoefficients:\n", coeftable(obj))
end

StatsBase.vcov(obj::RegularizedModel, args...) = error("variance-covariance matrix for a regularized model is not yet implemented")
22 changes: 20 additions & 2 deletions test/segselect.jl
Original file line number Diff line number Diff line change
@@ -1,5 +1,18 @@
# tests for segment selection and StatsModels @formula+df interface
using Random
using GLM: predict

"Tests whether the output of `show(expr)` converted to String contains `expected`"
macro test_show(expr, expected)
s = quote
local o = IOBuffer()
show(o, $expr)
String(take!(o))
end
substr = quote $expected end
:(@test occursin($(esc(substr)), $(esc(s))))
end

@testset "segment/model selection" begin

datapath = joinpath(dirname(@__FILE__), "data")
Expand All @@ -25,13 +38,18 @@ datapath = joinpath(dirname(@__FILE__), "data")

Random.seed!(421)
pathpredict = Lasso.predict(path, data; select=select, offset=offset)
@test pathpredict Lasso.predict(path; select=select)

@test pathcoefs == coef(m)
if isa(dist, Normal)
@test pathpredict == GLM.predict(m, data) + offset
@test pathpredict predict(m, data) + offset
@test pathpredict predict(m)
else
@test pathpredict == GLM.predict(m, data; offset=offset)
@test pathpredict predict(m, data; offset=offset)
@test pathpredict predict(m)
end

@test_show m string(L)
end
end
end
Expand Down

2 comments on commit f2798e3

@AsafManela
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JuliaRegistrator
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Registration pull request created: JuliaRegistries/General/7415

After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.

This will be done automatically if Julia TagBot is installed, or can be done manually through the github interface, or via:

git tag -a v0.5.1 -m "<description of version>" f2798e359f73d0240e8f1d32e48b25ac2cc76383
git push origin v0.5.1

Please sign in to comment.