-
Notifications
You must be signed in to change notification settings - Fork 4
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #26 from SidharthMacherla/develop_v1.2.0
v1.2.0 updated.
- Loading branch information
Showing
8 changed files
with
168 additions
and
24 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,14 +1,14 @@ | ||
Package: conjurer | ||
Type: Package | ||
Title: A Parametric Method for Generating Synthetic Data | ||
Version: 1.1.1 | ||
Date: 2020-03-22 | ||
Version: 1.2.0 | ||
Date: 2020-09-06 | ||
Authors@R: person("Sidharth", "Macherla", email = "[email protected]", | ||
role = c("aut", "cre"), comment = c(ORCID = "0000-0002-4825-2026")) | ||
Description: Builds synthetic data applicable across multiple domains. This package also provides flexibility to control data distribution to make it relevant to many industry examples. | ||
Depends: R (>= 2.10) | ||
License: MIT + file LICENSE | ||
URL: https://github.com/SidharthMacherla/conjurer | ||
URL: https://www.foyi.co.nz/posts/documentation/documentationconjurer/ | ||
BugReports: https://github.com/SidharthMacherla/conjurer/issues | ||
Encoding: UTF-8 | ||
LazyData: TRUE | ||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2,6 +2,7 @@ | |
|
||
export(buildCust) | ||
export(buildNames) | ||
export(buildNum) | ||
export(buildPareto) | ||
export(buildProd) | ||
export(genTrans) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,7 +0,0 @@ | ||
## conjurer 1.1.1 (2020-03-21) | ||
### Bug fixes | ||
* Bug: Generating names based on custom training data doesn’t work #22 | ||
|
||
## conjurer 1.1.0 (2020-03-13) | ||
|
||
* Added new function buildNames. | ||
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,68 @@ | ||
#' Build Numeric Data | ||
#' @param n A number. This specifies the number of values to be generated. | ||
#' @param st A number. This defines the starting value of the number of data points. | ||
#' @param en A number. This defines the ending value of the number of data points. | ||
#' @param disp A number between \eqn{-(pi/2)} and \eqn{(pi/2)}. This defines the dispersion of the distribution. | ||
#' @param outliers A number. This signifies the presence of outliers. If set to value 1, then outliers are generated randomly. If set to value 0, then no outliers are generated. The presence of outliers is a very common occurrence and hence setting the outliers to 1 is recommended. However, there are instances where outliers are not needed. For example, if the objective of data generation is solely for visualization purposes then outliers may not be needed. | ||
#' @return A dataframe | ||
#' @details This function helps in generating numeric data such as age, height, weight etc. This function could be used along with other functions such as \code{\link{buildCust}} to make it more meaningful. The data distribution function uses the formulation of | ||
#' \deqn{sin((r*a)*x) + c} | ||
#' Where, | ||
#' \enumerate{ | ||
#' \item r is the random value such that \eqn{0.8 <= r <= 1.2}. This adds \eqn{+/-} 20\% randomness to the parameter \eqn{a}. | ||
#' \item a is the parameter such that, \eqn{-(pi/2) <= a <= (pi/2)}. | ||
#' \item x is a variable such that, \eqn{(pi/2) <= x <= (pi/2)}. | ||
#' \item c is a constant such that \eqn{2 <= c <= 5}. | ||
#' } | ||
#' | ||
#'The key component of this function is \eqn{disp}. This helps in controlling the dispersion of the distribution. Let us assume that one would like to generate age of people in years. Furthermore, let us assume that the range of the age is between 23 and 80. If \eqn{disp = 1}, then the function will generate more data with a negative slope i.e more people with age closer to 23 than 80. If \eqn{disp = 1} is used, then the opposite will be true. However, if one would like to generate data that is visually similar to normal distribution i.e more people in the middle age group and less towards 23 or 80, then \eqn{disp = 0.5} could be used. | ||
#' | ||
#'It is recommended to firstly plot the code and inspect visually to check which distribution is needed. | ||
#' | ||
#' @examples | ||
#' age <- buildNum(n = 10, st = 23, en = 80, disp = 0.5, outliers = 1) | ||
#' plot(age) #visualize the resulting distribution | ||
|
||
#' @export | ||
|
||
buildNum <- function(n, st, en, disp, outliers) | ||
{ | ||
#handle missing arguments | ||
outliers <- missingArgHandler(outliers, 1) | ||
disp <- missingArgHandler(disp, 1) | ||
|
||
#Exception handling for n | ||
if(missing(n)) | ||
{ | ||
stop("'n' is missing. Please specify the number of values to generate") | ||
}else if(n < 10) | ||
{ | ||
warning("'n' is too small. Recommended 'n' value is atleast 10.") | ||
} | ||
#Ensure that 'n' is a whole number | ||
if((n)%%1 != 0) | ||
{ | ||
warning(sprintf("'n' must be a whole number. Note that %s is rounded off to %s.", n, round(n))) | ||
} | ||
|
||
|
||
#Exception handling for disp | ||
if(disp < -(pi/2) | disp > (pi/2)) | ||
{ | ||
stop("The value of 'disp' must be between -(pi/2) and (pi/2)") | ||
} | ||
|
||
#Build distributions by calling internal function buildDistr | ||
distr <- buildDistr(st = st, en = en, cycles = "n", trend = disp, n = n) | ||
|
||
#Add outliers | ||
if(outliers == 1) | ||
{ | ||
distr <- (buildOutliers(distr)) | ||
}else if(outliers == 0) | ||
{ | ||
distr <- distr | ||
} | ||
|
||
return(distr) | ||
} |
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
Oops, something went wrong.
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
Oops, something went wrong.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters