Document all datasets
davpinto committed Nov 22, 2016
1 parent db87563 commit 20a0f1d
Showing 12 changed files with 114 additions and 8 deletions.
11 changes: 9 additions & 2 deletions DESCRIPTION
@@ -1,10 +1,16 @@
 Package: fastknn
 Version: 0.1.0
 Title: Build Fast k-Nearest Neighbor Classifiers
-Description: A fast KNN learner for binary and multinomial classification problems, build upon the ANN library. It has been developed to deal with very large datasets (> 100k rows). The 'fastknn' makes it easy to find the best 'k' and to plot beautiful decision boundaries for the classifiers. Moreover, it provides estimators for the class membership probabilities based on voting and weighted voting. The last one gives more calibrated probabilities in general, and reduces log-loss.
+Description: A fast KNN learner for binary and multinomial classification
+problems, build upon the ANN library. It has been developed to deal with very
+large datasets (> 100k rows). The 'fastknn' makes it easy to find the best 'k'
+and to plot beautiful decision boundaries for the classifiers. Moreover, it
+provides estimators for the class membership probabilities based on voting and
+weighted voting. The last one gives more calibrated probabilities in general,
+and reduces log-loss.
 Authors@R: person("David", "Pinto", , "[email protected]", c("aut", "cre"))
 Depends:
-R (>= 3.3.1),
+R (>= 3.3.1),
 ggplot2 (>= 2.1.0)
 Imports:
 RANN (>= 2.5),
@@ -20,3 +26,4 @@ License: GPL-2
 URL: https://github.com/davpinto/fastknn
 BugReports: https://github.com/davpinto/fastknn/issues
 LazyData: true
+RoxygenNote: 5.0.1
1 change: 1 addition & 0 deletions NAMESPACE
@@ -4,4 +4,5 @@ export("%>%")
 export(fastknn)
 export(fastknnCV)
 export(knnDecision)
+import(ggplot2)
 importFrom(magrittr,"%>%")
40 changes: 38 additions & 2 deletions R/data.R
@@ -2,9 +2,45 @@
 #'
 #' Bidimensional binary dataset to be used in toy examples.
 #'
-#' @format A data frame with 2000 rows and 2 variables:
+#' @format A data.frame with 2000 rows and 3 variables:
 #' \describe{
 #' \item{x1}{random generated}
 #' \item{x2}{random generated}
+#' \item{target}{class labels}
 #' }
-"chess"
+"chess"
+
+#' Spirals Dataset
+#'
+#' Bidimensional binary dataset to be used in toy examples.
+#'
+#' @format A data.frame with 2000 rows and 3 variables:
+#' \describe{
+#' \item{x1}{random generated}
+#' \item{x2}{random generated}
+#' \item{target}{class labels}
+#' }
+"spirals"
+
+#' Multi Spirals Dataset
+#'
+#' Bidimensional multiclass dataset to be used in toy examples.
+#'
+#' @format A data.frame with 2000 rows and 3 variables:
+#' \describe{
+#' \item{x1}{random generated}
+#' \item{x2}{random generated}
+#' \item{target}{class labels}
+#' }
+"multi_spirals"
+
+#' Covertype Data
+#'
+#' Sample of size 50k from the multiclass dataset Covertype from the UCI repository.
+#'
+#' @format A data.frame with 50000 rows and 55 variables:
+#' \describe{
+#' \item{V1...V54}{binary features}
+#' \item{Target}{class labels}
+#' }
+"covertype"
2 changes: 1 addition & 1 deletion R/knn.R
@@ -85,7 +85,7 @@ fastknn <- function(xtr, ytr, xte, k, method = "dist") {
 #' @param folds number of folds (default is 5) or an array with fold ids between
 #' 1 and \code{n} identifying what fold each observation is in. The fold
 #' assigment given by \code{fastknnCV} does stratified sampling.
-#' @param eval loss to use for cross-validation. Currently five options are available:
+#' @param eval.metric loss to use for cross-validation. Currently five options are available:
 #' \itemize{
 #' \item \code{eval.metric="overal_error"}: default option. It gives the overall
 #' misclassification rate.
4 changes: 3 additions & 1 deletion R/utils.R
@@ -1,3 +1,5 @@
+#' @import ggplot2
+
 #' @importFrom magrittr %>%
 #' @export
-magrittr::`%>%`
+magrittr::`%>%`
Binary file added data/covertype.rda
Binary file not shown.
Binary file removed data/covertype_sample.csv.gz
Binary file not shown.
3 changes: 2 additions & 1 deletion man/chess.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

19 changes: 19 additions & 0 deletions man/covertype.Rd

2 changes: 1 addition & 1 deletion man/fastknnCV.Rd

20 changes: 20 additions & 0 deletions man/multi_spirals.Rd

20 changes: 20 additions & 0 deletions man/spirals.Rd
