Add test fakes and test utils (#81)
Glenn Moynihan authored Apr 16, 2021
1 parent e95ea9c commit eb9daee
Showing 11 changed files with 169 additions and 44 deletions.
2 changes: 1 addition & 1 deletion Project.toml
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
name = "FeatureTransforms"
uuid = "8fd68953-04b8-4117-ac19-158bf6de9782"
authors = ["Invenia Technical Computing Corporation"]
version = "0.3.3-DEV"
version = "0.3.3"

Dates = "ade2ca70-3891-5945-98fb-dc099432e06a"
22 changes: 11 additions & 11 deletions docs/Manifest.toml
@@ -17,15 +17,15 @@ uuid = "2a0f44e3-6c83-55bd-87e4-b1978d98bd5f"

deps = ["DataAPI", "Future", "JSON", "Missings", "Printf", "Statistics", "StructTypes", "Unicode"]
git-tree-sha1 = "9f6101597998e8d8cc8c99b85e4aca144354403b"
git-tree-sha1 = "f713d583d10fc036252fd826feebc6c173c522a8"
uuid = "324d7699-5711-5eae-9e2f-1d82baa6b597"
version = "0.9.4"
version = "0.9.5"

deps = ["Base64", "Dates", "DelimitedFiles", "Distributed", "InteractiveUtils", "LibGit2", "Libdl", "LinearAlgebra", "Markdown", "Mmap", "Pkg", "Printf", "REPL", "Random", "SHA", "Serialization", "SharedArrays", "Sockets", "SparseArrays", "Statistics", "Test", "UUIDs", "Unicode"]
git-tree-sha1 = "919c7f3151e79ff196add81d7f4e45d91bbf420b"
git-tree-sha1 = "ac4132ad78082518ec2037ae5770b6e796f7f956"
uuid = "34da2185-b29b-5c13-b0c7-acf172513d20"
version = "3.25.0"
version = "3.27.0"

git-tree-sha1 = "3f71217b538d7aaee0b69ab47d9b7724ca8afa0d"
@@ -86,7 +86,7 @@ uuid = "f43a241f-c20a-4ad4-852c-f6b1247861c6"
deps = ["Dates", "NamedDims", "Statistics", "Tables"]
path = ".."
uuid = "8fd68953-04b8-4117-ac19-158bf6de9782"
version = "0.3.1"
version = "0.3.3"

deps = ["Printf"]
@@ -259,25 +259,25 @@ uuid = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"

deps = ["Dates", "UUIDs"]
git-tree-sha1 = "89b390141d2fb2ef3ac2dc32e336f7a5c4810751"
git-tree-sha1 = "5d8e3d60f17791c4c64baf69a2bc5e7023ee73aa"
uuid = "856f2bd8-1eba-4b0a-8007-ebc267875bd4"
version = "1.5.0"
version = "1.7.0"

deps = ["Dates"]
uuid = "fa267f1f-6049-4f14-aa54-33bafae1ed76"

deps = ["IteratorInterfaceExtensions"]
git-tree-sha1 = "b1ad568ba658d8cbb3b892ed5380a6f3e781a81e"
git-tree-sha1 = "c06b2f539df1c6efa794486abfb6ed2022561a39"
uuid = "3783bdb8-4a98-5b6b-af9a-565f29a5fe9c"
version = "1.0.0"
version = "1.0.1"

deps = ["DataAPI", "DataValueInterfaces", "IteratorInterfaceExtensions", "LinearAlgebra", "TableTraits", "Test"]
git-tree-sha1 = "a9ff3dfec713c6677af435d6a6d65f9744feef67"
git-tree-sha1 = "c9d2d262e9a327be1f35844df25fe4561d258dc9"
uuid = "bd369af6-aec1-5ad0-b16a-f7cc5008161c"
version = "1.4.1"
version = "1.4.2"

deps = ["ArgTools", "SHA"]
1 change: 1 addition & 0 deletions docs/make.jl
@@ -18,6 +18,7 @@ makedocs(;
"Guide to Transforms" => "",
"Transform Interface" => "",
"Examples" => "",
"TestUtils" => "",
"API" => "",
11 changes: 11 additions & 0 deletions docs/src/
@@ -0,0 +1,11 @@
# [TestUtils](@id test-utils)

`FeatureTransforms.TestUtils` is used to test new data types that wish to support the [transform interface](@ref transform-interface) described in the documentation.
It provides various test fakes and utilities to help with doing so.
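
As a rough sketch of the intended workflow, the snippet below extends `FeatureTransforms.apply` for a hypothetical wrapper type and checks it against the four fakes. The `Wrapped` type and its `apply` method are assumptions made for illustration; only the fakes and `FeatureTransforms.apply` come from the package, and the expected values mirror the tests added in this commit.

```julia
using FeatureTransforms
using FeatureTransforms.TestUtils: FakeOneToOneTransform, FakeOneToManyTransform
using FeatureTransforms.TestUtils: FakeManyToOneTransform, FakeManyToManyTransform

# Hypothetical new data type (not part of the package): a thin wrapper around an array.
struct Wrapped{T<:AbstractArray}
    data::T
end

# Assumed extension point: forward to the array method and re-wrap the result,
# so the output has the same kind of type as the input.
function FeatureTransforms.apply(w::Wrapped, t::Transform; kwargs...)
    return Wrapped(FeatureTransforms.apply(w.data, t; kwargs...))
end

w = Wrapped([1, 2, 3])

# One check per cardinality.
@assert FeatureTransforms.apply(w, FakeOneToOneTransform()).data == ones(3)
@assert FeatureTransforms.apply(w, FakeOneToManyTransform()).data == ones(3, 2)
@assert FeatureTransforms.apply(w, FakeManyToOneTransform(); dims=1).data == fill(1)
@assert FeatureTransforms.apply(w, FakeManyToManyTransform()).data == ones(3, 2)
```

If all four checks pass, the new type handles each cardinality that a `Transform` in the package can have.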

## API

Order=[:module, :type, :function]
7 changes: 5 additions & 2 deletions docs/src/
@@ -2,16 +2,18 @@

The "transform interface" is a mechanism that allows sequences of `Transform`s to be combined (with other steps) into end-to-end feature engineering pipelines.

This is supported by the return of a `Transform`s having the same type as the input.
This is supported by the return of a `Transform` having the same type as the input.
This type consistency helps to make `Transform`s _composable_, i.e., the output of one is always a valid input to another, which allows users to "stack" sequences of `Transform`s together with minimal glue code needed to keep it working.

Moreover, the end-to-end pipelines themselves should obey the same principle: you should be able to add or remove `Transform`s (or another pipeline) to the output without breaking your code.
That is, the output should also be a valid "transformable" type: either an `AbstractArray`, a `Table`, or another type for which the user has extended [`FeatureTransforms.apply`](@ref).
Valid types can be checked by calling `is_transformable`, which is the first part of the transform interface.
See the [FeatureTransforms.TestUtils](@ref test-utils) for this and other testing utilities.

The second part is the `transform` method stub, which users should overload when they want to "encapsulate" an end-to-end pipeline.
The exact method for doing so is an implementation detail for the user but refer to the code below as an example.
The only requirement of the transform API is that the return of the implemented `transform` method is itself "transformable", i.e. satisfies `is_transformable`.
The only requirement of the transform API is that the return of the implemented `transform` method is itself "transformable".
That is, it should satisfy `is_transformable` by defining the required [`FeatureTransforms.apply`](@ref) method(s).
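
A minimal sketch of how the two parts of the interface fit together, assuming this version of the package with `TestUtils` available. `MyPipeline` is a hypothetical type introduced for illustration, and the fake transform only stands in for real feature engineering steps.

```julia
using FeatureTransforms
using FeatureTransforms.TestUtils: FakeOneToOneTransform

# Hypothetical pipeline type, used for illustration only.
struct MyPipeline end

# Overload the `transform` stub to encapsulate the pipeline.
function FeatureTransforms.transform(::MyPipeline, data)
    # Placeholder step: the fake returns an array of ones with the same size as `data`.
    return FeatureTransforms.apply(data, FakeOneToOneTransform())
end

output = FeatureTransforms.transform(MyPipeline(), [1, 2, 3])

# The output is an AbstractArray, so it can be passed to further Transforms.
@assert FeatureTransforms.TestUtils.is_transformable(output)
```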

## Example

@@ -24,6 +26,7 @@ For example, if `MyModel` were being stacked with the result of a previous model
DocTestSetup = quote
using FeatureTransforms
using FeatureTransforms.TestUtils

10 changes: 7 additions & 3 deletions src/FeatureTransforms.jl
@@ -5,10 +5,9 @@ using NamedDims: dim
using Statistics: mean, std
using Tables

export Transform, transform, transform!
export HoD, LinearCombination, OneHotEncoding, Periodic, Power
export IdentityScaling, MeanStdScaling, AbstractScaling
export Transform
export is_transformable, transform, transform!
export AbstractScaling, IdentityScaling, MeanStdScaling

@@ -23,4 +22,9 @@ include("power.jl")


# TODO: remove in v0.4
Base.@deprecate_binding is_transformable TestUtils.is_transformable

65 changes: 65 additions & 0 deletions src/test_utils.jl
@@ -0,0 +1,65 @@
"""
Provides fake [`Transform`](@ref)s and utilities for testing purposes only.
Each fake [`Transform`](@ref) has a different `cardinality`: `OneToOne`, `OneToMany`,
`ManyToOne`, or `ManyToMany`. So when users extend FeatureTransforms.jl for new data types
they only need to test against these 4 fakes to guarantee their type can support any
[`Transform`](@ref) in the package.
Similarly, `is_transformable` is used to check that the output of a `transform` pipeline is
a transformable type.
"""
module TestUtils

using ..FeatureTransforms
using ..FeatureTransforms: OneToOne, OneToMany, ManyToOne, ManyToMany
using Tables

export FakeOneToOneTransform, FakeOneToManyTransform
export FakeManyToOneTransform, FakeManyToManyTransform
export is_transformable

for C in (:OneToOne, :OneToMany, :ManyToOne, :ManyToMany)
    FT = Symbol(:Fake, C, :Transform)
    @eval begin
        """
            $($FT) <: Transform

        A fake `$($C)` transform for test purposes. Calling `apply` will return an
        array of ones with a size and dimension matching the `cardinality` of the transform.
        """
        struct $FT <: Transform end
        FeatureTransforms.cardinality(::$FT) = $C()
    end
end

function FeatureTransforms._apply(A, ::FakeOneToOneTransform; kwargs...)
    return ones(size(A))
end

function FeatureTransforms._apply(A, ::FakeOneToManyTransform; kwargs...)
    return hcat(ones(size(A)), ones(size(A)))
end

function FeatureTransforms._apply(A, ::FakeManyToOneTransform; dims, kwargs...)
    return ones(size(first(A)))
end

function FeatureTransforms._apply(A, ::FakeManyToManyTransform; kwargs...)
    return hcat(ones(size(A)), ones(size(A)))
end

"""
Determine if `x` is both a valid input and output of any [`Transform`](@ref), i.e. that it
follows the [`transform`](@ref) interface.
Currently, all subtypes of `Table`s and `AbstractArray`s are transformable.
"""
is_transformable(::AbstractArray) = true
is_transformable(x) = Tables.istable(x)

end
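
The `for`/`@eval` loop in this file generates one fake type per cardinality. The standalone sketch below shows the same code-generation pattern using only Base Julia. The `Transform`, `OneToOne`, `ManyToOne`, and `cardinality` definitions here are local stand-ins written for this example, not the package's own.

```julia
# Local stand-ins so the sketch runs without FeatureTransforms.
abstract type Transform end
struct OneToOne end
struct ManyToOne end
function cardinality end

for C in (:OneToOne, :ManyToOne)
    # Build the type name, e.g. :FakeOneToOneTransform.
    FT = Symbol(:Fake, C, :Transform)
    @eval begin
        # Define an empty struct and attach its cardinality trait.
        struct $FT <: Transform end
        cardinality(::$FT) = $C()
    end
end

@assert cardinality(FakeOneToOneTransform()) == OneToOne()
@assert cardinality(FakeManyToOneTransform()) == ManyToOne()
```

Generating the types in a loop keeps the four fakes identical apart from their trait, which is the only property the tests rely on.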

10 changes: 0 additions & 10 deletions src/transform.jl
@@ -9,16 +9,6 @@ abstract type Transform end
# Make Transforms callable types
(t::Transform)(x; kwargs...) = apply(x, t; kwargs...)

Determine if `x` is both a valid input and output of any [`Transform`](@ref), i.e. that it
follows the [`transform`](@ref) interface.
Currently, all subtypes of `Table`s and `AbstractArray`s are transformable.
is_transformable(::AbstractArray) = true
is_transformable(x) = Tables.istable(x)

transform(::T, data)
2 changes: 1 addition & 1 deletion test/runtests.jl
@@ -19,6 +19,6 @@ using TimeZones
67 changes: 67 additions & 0 deletions test/test_utils.jl
@@ -0,0 +1,67 @@
using FeatureTransforms.TestUtils

@testset "test_utils.jl" begin

    @testset "FakeOneToOneTransform" begin
        t = FakeOneToOneTransform()
        @test cardinality(t) == OneToOne()

        x = [1, 2, 3]
        @test FeatureTransforms.apply(x, t) == ones(3)

        M = reshape(1:9, 3, 3)
        @test FeatureTransforms.apply(M, t) == ones(3, 3)
    end

    @testset "FakeOneToManyTransform" begin
        t = FakeOneToManyTransform()
        @test cardinality(t) == OneToMany()

        x = [1, 2, 3]
        @test FeatureTransforms.apply(x, t) == ones(3, 2)

        M = reshape(1:9, 3, 3)
        @test FeatureTransforms.apply(M, t) == ones(3, 6)
    end

    @testset "FakeManyToOneTransform" begin
        t = FakeManyToOneTransform()
        @test cardinality(t) == ManyToOne()

        x = [1, 2, 3]
        @test FeatureTransforms.apply(x, t; dims=1) == fill(1)

        M = reshape(1:9, 3, 3)
        @test FeatureTransforms.apply(M, t; dims=1) == ones(3)
    end

    @testset "FakeManyToManyTransform" begin
        t = FakeManyToManyTransform()
        @test cardinality(t) == ManyToMany()

        x = [1, 2, 3]
        @test FeatureTransforms.apply(x, t) == ones(3, 2)

        M = reshape(1:9, 3, 3)
        @test FeatureTransforms.apply(M, t) == ones(3, 6)
    end

    @testset "is_transformable" begin

        # Test that AbstractArrays and Tables are transformable
        @test is_transformable([1, 2, 3, 4, 5])
        @test is_transformable([1 2 3; 4 5 6])
        @test is_transformable(AxisArray([1 2 3; 4 5 6], foo=["a", "b"], bar=["x", "y", "z"]))
        @test is_transformable(KeyedArray([1 2 3; 4 5 6], foo=["a", "b"], bar=["x", "y", "z"]))
        @test is_transformable((a = [1, 2, 3], b = [4, 5, 6]))
        @test is_transformable(DataFrame(:a => [1, 2, 3], :b => [4, 5, 6]))

        # Test types that are not transformable
        @test is_transformable(1) == false
        @test is_transformable("string") == false
        @test is_transformable(true) == false
        @test is_transformable(Dict(2 => 3)) == false
    end
end
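
The expected shapes in the fake-transform tests follow from plain array operations, which can be checked without the package. The sketch below repeats the bodies of the fake `_apply` methods on the same inputs. That the many-to-one fake receives slices comparable to `eachslice(M; dims=1)` is an assumption inferred from the `dims=1` test, not something shown in this diff.

```julia
x = [1, 2, 3]
M = reshape(1:9, 3, 3)

# One-to-one: output has the same size as the input.
@assert ones(size(x)) == ones(3)
@assert ones(size(M)) == ones(3, 3)

# One-to-many and many-to-many: two copies concatenated along columns.
@assert hcat(ones(size(x)), ones(size(x))) == ones(3, 2)
@assert hcat(ones(size(M)), ones(size(M))) == ones(3, 6)

# Many-to-one (assumed slicing): output matches the size of a single slice.
rows = eachslice(M; dims=1)
@assert ones(size(first(rows))) == ones(3)
```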

16 changes: 0 additions & 16 deletions test/transform.jl

