Add test fakes and test utils (#81)
Glenn Moynihan authored Apr 16, 2021
1 parent e95ea9c commit eb9daee
Showing 11 changed files with 169 additions and 44 deletions.
2 changes: 1 addition & 1 deletion Project.toml
@@ -1,7 +1,7 @@
name = "FeatureTransforms"
uuid = "8fd68953-04b8-4117-ac19-158bf6de9782"
authors = ["Invenia Technical Computing Corporation"]
version = "0.3.3-DEV"
version = "0.3.3"

[deps]
Dates = "ade2ca70-3891-5945-98fb-dc099432e06a"
22 changes: 11 additions & 11 deletions docs/Manifest.toml
@@ -17,15 +17,15 @@ uuid = "2a0f44e3-6c83-55bd-87e4-b1978d98bd5f"

[[CategoricalArrays]]
deps = ["DataAPI", "Future", "JSON", "Missings", "Printf", "Statistics", "StructTypes", "Unicode"]
git-tree-sha1 = "9f6101597998e8d8cc8c99b85e4aca144354403b"
git-tree-sha1 = "f713d583d10fc036252fd826feebc6c173c522a8"
uuid = "324d7699-5711-5eae-9e2f-1d82baa6b597"
version = "0.9.4"
version = "0.9.5"

[[Compat]]
deps = ["Base64", "Dates", "DelimitedFiles", "Distributed", "InteractiveUtils", "LibGit2", "Libdl", "LinearAlgebra", "Markdown", "Mmap", "Pkg", "Printf", "REPL", "Random", "SHA", "Serialization", "SharedArrays", "Sockets", "SparseArrays", "Statistics", "Test", "UUIDs", "Unicode"]
git-tree-sha1 = "919c7f3151e79ff196add81d7f4e45d91bbf420b"
git-tree-sha1 = "ac4132ad78082518ec2037ae5770b6e796f7f956"
uuid = "34da2185-b29b-5c13-b0c7-acf172513d20"
version = "3.25.0"
version = "3.27.0"

[[Crayons]]
git-tree-sha1 = "3f71217b538d7aaee0b69ab47d9b7724ca8afa0d"
@@ -86,7 +86,7 @@ uuid = "f43a241f-c20a-4ad4-852c-f6b1247861c6"
deps = ["Dates", "NamedDims", "Statistics", "Tables"]
path = ".."
uuid = "8fd68953-04b8-4117-ac19-158bf6de9782"
version = "0.3.1"
version = "0.3.3"

[[Formatting]]
deps = ["Printf"]
@@ -259,25 +259,25 @@ uuid = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"

[[StructTypes]]
deps = ["Dates", "UUIDs"]
git-tree-sha1 = "89b390141d2fb2ef3ac2dc32e336f7a5c4810751"
git-tree-sha1 = "5d8e3d60f17791c4c64baf69a2bc5e7023ee73aa"
uuid = "856f2bd8-1eba-4b0a-8007-ebc267875bd4"
version = "1.5.0"
version = "1.7.0"

[[TOML]]
deps = ["Dates"]
uuid = "fa267f1f-6049-4f14-aa54-33bafae1ed76"

[[TableTraits]]
deps = ["IteratorInterfaceExtensions"]
git-tree-sha1 = "b1ad568ba658d8cbb3b892ed5380a6f3e781a81e"
git-tree-sha1 = "c06b2f539df1c6efa794486abfb6ed2022561a39"
uuid = "3783bdb8-4a98-5b6b-af9a-565f29a5fe9c"
version = "1.0.0"
version = "1.0.1"

[[Tables]]
deps = ["DataAPI", "DataValueInterfaces", "IteratorInterfaceExtensions", "LinearAlgebra", "TableTraits", "Test"]
git-tree-sha1 = "a9ff3dfec713c6677af435d6a6d65f9744feef67"
git-tree-sha1 = "c9d2d262e9a327be1f35844df25fe4561d258dc9"
uuid = "bd369af6-aec1-5ad0-b16a-f7cc5008161c"
version = "1.4.1"
version = "1.4.2"

[[Tar]]
deps = ["ArgTools", "SHA"]
1 change: 1 addition & 0 deletions docs/make.jl
@@ -18,6 +18,7 @@ makedocs(;
"Guide to Transforms" => "transforms.md",
"Transform Interface" => "transform_interface.md",
"Examples" => "examples.md",
"TestUtils" => "test_utils.md",
"API" => "api.md",
],
strict=true,
11 changes: 11 additions & 0 deletions docs/src/test_utils.md
@@ -0,0 +1,11 @@
# [TestUtils](@id test-utils)

`FeatureTransforms.TestUtils` is used to test new data types that wish to support the [transform interface](@ref transform-interface) described in the documentation.
It provides various test fakes and utilities to help with doing so.

## API

```@autodocs
Modules=[FeatureTransforms.TestUtils]
Order=[:module, :type, :function]
```
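
As a rough illustration of how the fakes behave (a sketch that assumes FeatureTransforms v0.3.3 is installed; the expected shape mirrors the package's own tests in `test/test_utils.jl`), a fake transform can be applied to a plain vector and the result checked with `is_transformable`:

```julia
using FeatureTransforms
using FeatureTransforms.TestUtils

t = FakeOneToManyTransform()
x = [1, 2, 3]

# The fake ignores the input values and returns an array of ones whose shape
# reflects the transform's cardinality: here, two output columns for one input.
y = FeatureTransforms.apply(x, t)

@assert y == ones(3, 2)
@assert is_transformable(y)
```

When adding support for a new data type, the same pattern would presumably be repeated with an instance of that type in place of `x` and with each of the four fakes in turn.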
7 changes: 5 additions & 2 deletions docs/src/transform_interface.md
@@ -2,16 +2,18 @@

The "transform interface" is a mechanism that allows sequences of `Transform`s to be combined (with other steps) into end-to-end feature engineering pipelines.

This is supported by the return of a `Transform`s having the same type as the input.
This is supported by the return of a `Transform` having the same type as the input.
This type consistency helps to make `Transform`s _composable_, i.e., the output of one is always a valid input to another, which allows users to "stack" sequences of `Transform`s together with minimal glue code needed to keep it working.

Moreover, the end-to-end pipelines themselves should obey the same principle: you should be able to add or remove `Transform`s (or another pipeline) to the output without breaking your code.
That is, the output should also be a valid "transformable" type: either an `AbstractArray`, a `Table`, or other type for which the user has extended [`FeatureTransforms.apply`](@ref) to support.
Valid types can be checked by calling `is_transformable`, which is the first part of the transform interface.
See [FeatureTransforms.TestUtils](@ref test-utils) for this and other testing utilities.

The second part is the `transform` method stub, which users should overload when they want to "encapsulate" an end-to-end pipeline.
The exact method for doing so is an implementation detail for the user but refer to the code below as an example.
The only requirement of the transform API is that the return of the implemented `transform` method is itself "transformable", i.e. satisfies `is_transformable`.
The only requirement of the transform API is that the return of the implemented `transform` method is itself "transformable".
That is, it should satisfy `is_transformable` by defining the required [`FeatureTransforms.apply`](@ref) method(s).
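
As a rough illustration of this check (a sketch that assumes FeatureTransforms v0.3.3 is installed; the cases are taken from the package's own tests), `is_transformable` can be called directly on candidate pipeline outputs:

```julia
using FeatureTransforms.TestUtils: is_transformable

# AbstractArrays are transformable
@assert is_transformable([1, 2, 3])
@assert is_transformable([1 2 3; 4 5 6])

# A NamedTuple of equal-length vectors is a Tables.jl column table
@assert is_transformable((a = [1, 2, 3], b = [4, 5, 6]))

# Scalars, strings and dictionaries are not transformable
@assert !is_transformable(1)
@assert !is_transformable("string")
@assert !is_transformable(Dict(2 => 3))
```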

## Example

@@ -24,6 +26,7 @@ For example, if `MyModel` were being stacked with the result of a previous model
```@meta
DocTestSetup = quote
using FeatureTransforms
using FeatureTransforms.TestUtils
end
```

10 changes: 7 additions & 3 deletions src/FeatureTransforms.jl
@@ -5,10 +5,9 @@ using NamedDims: dim
using Statistics: mean, std
using Tables

export Transform, transform, transform!
export HoD, LinearCombination, OneHotEncoding, Periodic, Power
export IdentityScaling, MeanStdScaling, AbstractScaling
export Transform
export is_transformable, transform, transform!
export AbstractScaling, IdentityScaling, MeanStdScaling

include("utils.jl")
include("traits.jl")
@@ -23,4 +22,9 @@ include("power.jl")
include("scaling.jl")
include("temporal.jl")

include("test_utils.jl")

# TODO: remove in v0.4 https://github.com/invenia/FeatureTransforms.jl/issues/82
Base.@deprecate_binding is_transformable TestUtils.is_transformable

end
65 changes: 65 additions & 0 deletions src/test_utils.jl
@@ -0,0 +1,65 @@
"""
FeatureTransforms.TestUtils
Provides fake [`Transform`](@ref)s and utilities for testing purposes only.
Each fake [`Transform`](@ref) has a different `cardinality`: `OneToOne`, `OneToMany`,
`ManyToOne`, or `ManyToMany`. So when users extend FeatureTransforms.jl for new data types
they only need to test against these 4 fakes to guarantee their type can support any
[`Transform`](@ref) in the package.
Similarly, `is_transformable` is used to check that the output of a `transform` pipeline is
a transformable type.
"""

module TestUtils

using ..FeatureTransforms
using ..FeatureTransforms: OneToOne, OneToMany, ManyToOne, ManyToMany
using Tables

export FakeOneToOneTransform, FakeOneToManyTransform
export FakeManyToOneTransform, FakeManyToManyTransform
export is_transformable

for C in (:OneToOne, :OneToMany, :ManyToOne, :ManyToMany)
FT = Symbol(:Fake, C, :Transform)
@eval begin
"""
$($FT) <: Transform
A fake `$($C)` transform for test purposes. Calling `apply` will return an
array of ones with a size and dimension matching the `cardinality` of the transform.
"""
struct $FT <: Transform end
FeatureTransforms.cardinality(::$FT) = $C()
end
end

function FeatureTransforms._apply(A, ::FakeOneToOneTransform; kwargs...)
return ones(size(A))
end

function FeatureTransforms._apply(A, ::FakeOneToManyTransform; kwargs...)
return hcat(ones(size(A)), ones(size(A)))
end

function FeatureTransforms._apply(A, ::FakeManyToOneTransform; dims, kwargs...)
return ones(size(first(A)))
end

function FeatureTransforms._apply(A, ::FakeManyToManyTransform; kwargs...)
return hcat(ones(size(A)), ones(size(A)))
end

"""
is_transformable(x)
Determine if `x` is both a valid input and output of any [`Transform`](@ref), i.e. that it
follows the [`transform`](@ref) interface.
Currently, all subtypes of `Table`s and `AbstractArray`s are transformable.
"""
is_transformable(::AbstractArray) = true
is_transformable(x) = Tables.istable(x)

end
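
The `@eval` loop in `src/test_utils.jl` can be hard to follow in diff form. The sketch below reproduces the same code-generation pattern in isolation. The `Transform`, `Cardinality`, and `cardinality` definitions here are stand-ins (hypothetical, defined only so the snippet runs without the package), and the generated docstrings are omitted:

```julia
# Stand-in definitions so the sketch is self-contained (not the package's own).
abstract type Transform end
abstract type Cardinality end
struct OneToOne <: Cardinality end
struct OneToMany <: Cardinality end
struct ManyToOne <: Cardinality end
struct ManyToMany <: Cardinality end

# Fallback trait value for any transform without a more specific method.
cardinality(::Transform) = OneToOne()

# Generate one empty struct and one `cardinality` method per trait name.
for C in (:OneToOne, :OneToMany, :ManyToOne, :ManyToMany)
    FT = Symbol(:Fake, C, :Transform)  # e.g. :FakeManyToOneTransform
    @eval begin
        struct $FT <: Transform end
        cardinality(::$FT) = $C()
    end
end

@assert cardinality(FakeManyToOneTransform()) == ManyToOne()
@assert FakeOneToManyTransform <: Transform
```

Generating the four types in a loop keeps their definitions identical apart from the trait they return, which seems to be the point of having one fake per cardinality.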
10 changes: 0 additions & 10 deletions src/transform.jl
@@ -9,16 +9,6 @@ abstract type Transform end
# Make Transforms callable types
(t::Transform)(x; kwargs...) = apply(x, t; kwargs...)

"""
is_transformable(x)
Determine if `x` is both a valid input and output of any [`Transform`](@ref), i.e. that it
follows the [`transform`](@ref) interface.
Currently, all subtypes of `Table`s and `AbstractArray`s are transformable.
"""
is_transformable(::AbstractArray) = true
is_transformable(x) = Tables.istable(x)

"""
transform(::T, data)
2 changes: 1 addition & 1 deletion test/runtests.jl
@@ -19,6 +19,6 @@ using TimeZones
include("power.jl")
include("scaling.jl")
include("temporal.jl")
include("transform.jl")
include("traits.jl")
include("test_utils.jl")
end
67 changes: 67 additions & 0 deletions test/test_utils.jl
@@ -0,0 +1,67 @@
using FeatureTransforms.TestUtils

@testset "test_utils.jl" begin

@testset "FakeOneToOneTransform" begin
t = FakeOneToOneTransform()
@test cardinality(t) == OneToOne()

x = [1, 2, 3]
@test FeatureTransforms.apply(x, t) == ones(3)

M = reshape(1:9, 3, 3)
@test FeatureTransforms.apply(M, t) == ones(3, 3)
end

@testset "FakeOneToManyTransform" begin
t = FakeOneToManyTransform()
@test cardinality(t) == OneToMany()

x = [1, 2, 3]
@test FeatureTransforms.apply(x, t) == ones(3, 2)

M = reshape(1:9, 3, 3)
@test FeatureTransforms.apply(M, t) == ones(3, 6)
end

@testset "FakeManyToOneTransform" begin
t = FakeManyToOneTransform()
@test cardinality(t) == ManyToOne()

x = [1, 2, 3]
@test FeatureTransforms.apply(x, t; dims=1) == fill(1)

M = reshape(1:9, 3, 3)
@test FeatureTransforms.apply(M, t; dims=1) == ones(3)
end

@testset "FakeManyToManyTransform" begin
t = FakeManyToManyTransform()
@test cardinality(t) == ManyToMany()

x = [1, 2, 3]
@test FeatureTransforms.apply(x, t) == ones(3, 2)

M = reshape(1:9, 3, 3)
@test FeatureTransforms.apply(M, t) == ones(3, 6)
end


@testset "is_transformable" begin

# Test that AbstractArrays and Tables are transformable
@test is_transformable([1, 2, 3, 4, 5])
@test is_transformable([1 2 3; 4 5 6])
@test is_transformable(AxisArray([1 2 3; 4 5 6], foo=["a", "b"], bar=["x", "y", "z"]))
@test is_transformable(KeyedArray([1 2 3; 4 5 6], foo=["a", "b"], bar=["x", "y", "z"]))
@test is_transformable((a = [1, 2, 3], b = [4, 5, 6]))
@test is_transformable(DataFrame(:a => [1, 2, 3], :b => [4, 5, 6]))

# Test types that are not transformable
@test is_transformable(1) == false
@test is_transformable("string") == false
@test is_transformable(true) == false
@test is_transformable(Dict(2 => 3)) == false
end

end
16 changes: 0 additions & 16 deletions test/transform.jl

This file was deleted.

2 comments on commit eb9daee

@glennmoy

@JuliaRegistrator register()

@JuliaRegistrator

Registration pull request created: JuliaRegistries/General/34497

After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.

This will be done automatically if the Julia TagBot GitHub Action is installed, or can be done manually through the github interface, or via:

git tag -a v0.3.3 -m "<description of version>" eb9daeeee7371916210b5a2fd424065ef2a030e0
git push origin v0.3.3
