| Title: | Various Methods for the Goodness-of-Fit Problem in D>1 Dimensions |
| Version: | 1.0.0 |
| Description: | The routine gof_test() in this package runs the goodness-of-fit test using various test statistics for multivariate data. Models under the null hypothesis can either be simple or allow for parameter estimation. p values are found via the parametric bootstrap (simulation). The routine gof_test_adjusted_pvalue() runs several tests and then finds a p value adjusted for simultaneous inference. The routine gof_power() estimates the power of the tests. hybrid_test() and hybrid_power() do the same by first generating a Monte Carlo data set under the null hypothesis and then running a number of two-sample methods. The routine run.studies() allows a user to quickly study the power of a new method and compare it to the methods included in the package via a large number of case studies. For details of the methods and references see the included vignettes. |
| License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.3 |
| LinkingTo: | Rcpp |
| Imports: | Rcpp, parallel, stats, microbenchmark, spatstat.geom, spatstat.explore, FNN, copula, mvtnorm, ggplot2, MD2sample |
| Suggests: | rmarkdown, knitr |
| VignetteBuilder: | knitr |
| Depends: | R (≥ 3.5) |
| LazyData: | true |
| NeedsCompilation: | yes |
| Packaged: | 2026-02-10 13:00:15 UTC; Wolfgang |
| Author: | Wolfgang Rolke |
| Maintainer: | Wolfgang Rolke <wolfgang.rolke@upr.edu> |
| Repository: | CRAN |
| Date/Publication: | 2026-02-12 20:40:03 UTC |
Find test statistic of Fasano–Franceschini test
Description
Find test statistic of Fasano–Franceschini test
Usage
FF(dta)
Arguments
dta |
data matrix |
Value
a test statistic
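A minimal sketch of a one-sample Fasano–Franceschini statistic, assuming the data have already been mapped to the unit square so that the null model is the uniform distribution (for example after a Rosenblatt transform). The name ff_stat_sketch() is hypothetical and this is not the package's FF() code: for every data point it compares the observed fraction of points in each of the four quadrants around that point with the probability of that quadrant under the null, and returns the largest absolute difference.

```r
# Hypothetical sketch, not MDgof::FF(). Assumes dta is an n x 2 matrix on [0,1]^2
# and that the null model is the uniform distribution on the unit square.
ff_stat_sketch <- function(dta) {
  D <- 0
  for (i in seq_len(nrow(dta))) {
    x <- dta[i, 1]; y <- dta[i, 2]
    left <- dta[, 1] <= x
    below <- dta[, 2] <= y
    # observed fractions of points in the four quadrants around (x, y)
    obs <- c(mean(left & below), mean(left & !below),
             mean(!left & below), mean(!left & !below))
    # probabilities of the same quadrants under the uniform null
    expd <- c(x * y, x * (1 - y), (1 - x) * y, (1 - x) * (1 - y))
    D <- max(D, abs(obs - expd))
  }
  D
}
set.seed(1)
u <- matrix(runif(200), ncol = 2)
ff_stat_sketch(u)      # uniform data: small value
ff_stat_sketch(u^3)    # data pushed towards the origin: larger value
```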
Ripley's K function test
Description
This function calculates the test statistic of Ripley's K function test.
Usage
RipleyK(x)
Arguments
x |
matrix with data |
Value
a number (test statistic)
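A naive illustration of a statistic built on Ripley's K function, assuming points in the unit square (area 1) and no edge correction. The function ripley_k_sketch() and its grid of radii are hypothetical choices, not the package's RipleyK(). The estimate is K(r) = A/(n(n-1)) times the number of ordered pairs i != j within distance r; under complete spatial randomness K(r) = pi r^2, so the largest deviation from pi r^2 over the grid serves as the statistic. Without edge correction the estimate is biased downward near the boundary, which is one reason real implementations (such as spatstat's) apply corrections.

```r
# Hypothetical sketch, not MDgof::RipleyK(). Points are assumed to lie in [0,1]^2.
ripley_k_sketch <- function(x, r = seq(0.01, 0.25, by = 0.01)) {
  n <- nrow(x)
  d <- as.matrix(dist(x))   # pairwise Euclidean distances
  diag(d) <- Inf            # exclude the pairs i == j
  # naive K estimate (area = 1, no edge correction) for every radius
  K <- sapply(r, function(rr) sum(d <= rr) / (n * (n - 1)))
  # largest deviation from pi * r^2, the value under complete spatial randomness
  max(abs(K - pi * r^2))
}
set.seed(2)
pts_unif <- matrix(runif(400), ncol = 2)                 # roughly random pattern
pts_clust <- matrix(0.5 + rnorm(400, sd = 0.02), ncol = 2)  # strongly clustered
ripley_k_sketch(pts_unif)
ripley_k_sketch(pts_clust)
```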
Find test statistics for continuous data
Description
Find test statistics for continuous data
Usage
TS_cont(x, pnull, param, TSextra)
Arguments
x |
A numeric matrix. |
pnull |
cdf under the null hypothesis. |
param |
parameters for pnull in case of parameter estimation. |
TSextra |
list with additional info |
Value
A numeric vector with test statistics
Find test statistics for discrete data
Description
Find test statistics for discrete data
Usage
TS_disc(x, pnull, param, TSextra)
Arguments
x |
A numeric matrix. |
pnull |
cdf under the null hypothesis. |
param |
parameters for pnull in case of parameter estimation. |
TSextra |
list with additional info |
Value
A numeric vector with test statistics
Run Bakshaev and Rudzkis Test
Description
Run Bakshaev and Rudzkis Test
Usage
bakshaev_rudzkis(dta, rnull, p, m_eval = 100L, nsim = 200L, nsim_mc = 1000L)
Arguments
dta |
data matrix. |
rnull |
generate new data. |
p |
parameters for the parametric bootstrap. |
m_eval |
=100, number of evaluation points of the kernel density estimate. |
nsim |
=200, number of simulation runs. |
nsim_mc |
=1000, number of Monte Carlo runs used to estimate the mean and variance of the kernel density estimate under the null hypothesis. |
Value
a list
This function calculates the test statistics for data
Description
This function calculates the test statistics for data
Usage
calcTS(dta, TS, typeTS, TSextra)
Arguments
dta |
data set (a matrix) |
TS |
routine that computes the test statistic(s) |
typeTS |
format of TS |
TSextra |
list passed to TS function |
Value
A vector of values of test statistic(s)
Create various case studies
Description
This function creates the functions needed to run the various case studies.
Usage
case.studies(
which,
Continuous = TRUE,
WithEstimation = FALSE,
Dim = 2,
nsample = 250,
nbins = c(5, 5),
ReturnCaseNames = FALSE
)
Arguments
which |
name or number of the case study. |
Continuous |
= TRUE for continuous data |
WithEstimation |
=FALSE, with parameter estimation |
Dim |
=2 dimension of data |
nsample |
=250, sample size. |
nbins |
=c(5,5) number of bins in x and y direction |
ReturnCaseNames |
=FALSE, just return names of case studies? |
Value
a list of functions
Create various case studies for continuous data without parameter estimation
Description
This function creates the functions needed to run the various case studies.
Usage
case.studies.cont(which, nsample = 250, ReturnCaseNames = FALSE)
Arguments
which |
name of the case study. |
nsample |
=250, sample size. |
ReturnCaseNames |
=FALSE, just return names of case studies? |
Value
a list of functions
Create various case studies for continuous data in 5 dimensions without parameter estimation
Description
This function creates the functions needed to run the various case studies.
Usage
case.studies.cont.D5(which, nsample = 250, ReturnCaseNames = FALSE)
Arguments
which |
name of the case study. |
nsample |
=250, sample size. |
ReturnCaseNames |
=FALSE, just return names of case studies? |
Value
a list of functions
Discretize 2D data from case studies
Description
This function provides the info necessary to run the case studies for discrete data.
Usage
case.studies.disc(
which,
WithEstimation = FALSE,
nbins = c(5, 5),
nsample = 250
)
Arguments
which |
name or number of desired case study. |
WithEstimation |
= FALSE, case study with or without parameter estimation. |
nbins |
=c(5, 5) number of bins to use in x and y direction |
nsample |
= 250, required sample size |
Value
a list with the information needed to run the case study
Create various case studies with parameter estimation
Description
This function creates the functions needed to run the various case studies that include parameter estimation.
Usage
case.studies.est(which, nsample = 250, ReturnCaseNames = FALSE)
Arguments
which |
name of the case study. |
nsample |
=250, sample size. |
ReturnCaseNames |
=FALSE, just return names of case studies? |
Value
a list of functions
Change Marginals of 2D Data
Description
This function creates routines to modify the marginals
Usage
change.marginals(which, side, nsample = 250, null_param)
Arguments
which |
method for modifying the data sets. |
side |
which dimension to modify |
nsample |
=250 sample size |
null_param |
parameters under null hypothesis |
Value
a list of functions
Sanity Checks
Description
This function checks whether the inputs have the correct format
Usage
check.functions(pnull, rnull, phat = function(x) -99, x)
Arguments
pnull |
cdf under the null hypothesis |
rnull |
routine to generate data under the null hypothesis |
phat |
=function(x) -99, function to estimate parameters from the data, or -99 |
x |
matrix with data |
Chi-square test for 2D data
Description
This function does the chi square goodness-of-fit test for continuous data in two dimensions.
Usage
chi_cont_test(
dta,
pnull,
phat = function(x) -99,
Ranges = matrix(c(-Inf, Inf, -Inf, Inf), 2, 2),
nbins = c(5, 5),
minexpcount = 5,
SuppressMessages = TRUE
)
Arguments
dta |
a matrix of numbers. |
pnull |
function to calculate expected counts. |
phat |
=function(x) -99, function to estimate parameters of pnull. |
Ranges |
=matrix(c(-Inf, Inf, -Inf, Inf),2,2), a 2x2 matrix with lower and upper bounds |
nbins |
=c(5,5) number of bins in x and y direction |
minexpcount |
=5 minimum counts required per bin |
SuppressMessages |
=TRUE, should informative messages be suppressed? |
Value
a matrix with statistics, p values and degrees of freedom
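For a continuous null model in two dimensions, the expected bin probabilities follow from the cdf by inclusion-exclusion: P(a1 < X <= b1, a2 < Y <= b2) = F(b1,b2) - F(a1,b2) - F(b1,a2) + F(a1,a2). The sketch below uses a hypothetical chi_cont_sketch() with user-supplied finite bin edges and the uniform distribution on the unit square as a toy null. It computes the Pearson statistic with df = number of bins - 1, that is, without parameter estimation and without any special handling of bins whose expected count falls below minexpcount, so it illustrates the idea rather than reproducing chi_cont_test().

```r
# Hypothetical sketch, not MDgof::chi_cont_test().
# pnull must accept an m x 2 matrix and return the joint cdf for each row.
chi_cont_sketch <- function(dta, pnull, xbreaks, ybreaks) {
  obs <- table(cut(dta[, 1], xbreaks, include.lowest = TRUE),
               cut(dta[, 2], ybreaks, include.lowest = TRUE))
  cdf2 <- function(a, b) pnull(cbind(a, b))
  nx <- length(xbreaks) - 1
  ny <- length(ybreaks) - 1
  prob <- matrix(0, nx, ny)
  for (i in seq_len(nx)) for (j in seq_len(ny)) {
    # bin probability by inclusion-exclusion on the joint cdf
    prob[i, j] <- cdf2(xbreaks[i + 1], ybreaks[j + 1]) - cdf2(xbreaks[i], ybreaks[j + 1]) -
                  cdf2(xbreaks[i + 1], ybreaks[j]) + cdf2(xbreaks[i], ybreaks[j])
  }
  expd <- nrow(dta) * prob
  stat <- sum((as.vector(obs) - as.vector(expd))^2 / as.vector(expd))
  df <- nx * ny - 1
  c(statistic = stat, p.value = stats::pchisq(stat, df, lower.tail = FALSE), df = df)
}
# toy null: uniform distribution on the unit square, F(x, y) = x * y
punif2 <- function(x) {
  x <- pmin(pmax(x, 0), 1)   # clip to [0, 1]; pmax/pmin keep the matrix shape
  x[, 1] * x[, 2]
}
set.seed(3)
u <- matrix(runif(1000), ncol = 2)
brk <- seq(0, 1, by = 0.2)
res_null <- chi_cont_sketch(u, punif2, brk, brk)     # data from the null
res_alt <- chi_cont_sketch(u^2, punif2, brk, brk)    # data not from the null
```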
Chi-square test for discrete 2D data
Description
This function does the chi square goodness-of-fit test for discrete data in two dimensions.
Usage
chi_disc_test(
dta,
pnull,
dnull,
phat = function(x) -99,
minexpcount = 5,
SuppressMessages = FALSE
)
Arguments
dta |
a matrix of numbers. |
pnull |
distribution function to calculate expected counts. |
dnull |
density to calculate expected counts. |
phat |
=function(x) -99, function to estimate parameters of pnull. |
minexpcount |
=5 minimum counts required per bin |
SuppressMessages |
=FALSE, should informative messages be suppressed? |
Value
a vector with statistic, p value and degrees of freedom
Power Estimation of Chi Square Tests
Description
This function finds the power of various chi-square tests.
Usage
chi_power(
pnull,
ralt,
param_alt,
phat = function(x) -99,
alpha = 0.05,
Ranges = matrix(c(-Inf, Inf, -Inf, Inf), 2, 2),
nbins = c(5, 5),
rate = 0,
minexpcount = 5,
dnull = function(x) -99,
Retry = TRUE,
SuppressMessages = TRUE,
B = 1000
)
Arguments
pnull |
distribution function to find cdf under null hypothesis |
ralt |
function to generate data under alternative hypothesis |
param_alt |
vector of parameter values for distribution under alternative hypothesis |
phat |
=function(x) -99, function to estimate parameters |
alpha |
=0.05, the level of the hypothesis test |
Ranges |
=matrix(c(-Inf, Inf, -Inf, Inf),2,2), a 2x2 matrix with lower and upper bounds, if any |
nbins |
=c(5, 5), number of bins for chi square tests |
rate |
=0 rate of Poisson if sample size is random, 0 if sample size is fixed |
minexpcount |
=5 minimal expected bin count required |
dnull |
=function(x) -99, density function to find probabilities under null hypothesis, mostly used for discrete data, or -99 if missing. |
Retry |
=TRUE, retry if test fails? |
SuppressMessages |
=TRUE, should informative messages be suppressed? |
B |
=1000 number of simulation runs to find power |
Value
A numeric matrix of power values.
Compute M statistic for one data set
Description
Compute M statistic for one data set
Usage
compute_M_for_dtaset(dta, Eval, hs, rnull, p, nsim_mc)
Arguments
dta |
data matrix. |
Eval |
matrix of evaluations |
hs |
bandwidths |
rnull |
function to generate new data under the null hypothesis |
p |
parameter values for the parametric bootstrap |
nsim_mc |
number of simulation runs |
Value
a double
Bins continuous data
Description
Bins continuous data
Usage
discretize(x, Range, nbins, ChangeVals = FALSE)
Arguments
x |
A numeric matrix with two columns. |
Range |
range of variables |
nbins |
number of bins. |
ChangeVals |
=FALSE, should values of discrete rv's be adjusted to midpoints? |
Value
A numeric matrix
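A sketch of two-dimensional binning, assuming a finite Range given as a 2x2 matrix whose columns hold the lower and upper bounds of each variable (the layout of the Ranges default used elsewhere in this manual), equal-width bins, and an output with one row per bin holding the two bin midpoints and the count. That three-column layout mirrors how the phat() functions in the examples read discrete data (values in x[,1] and x[,2], counts in x[,3]), but discretize_sketch() is hypothetical and not the package's discretize().

```r
# Hypothetical sketch, not MDgof::discretize(). Range must be finite.
discretize_sketch <- function(x, Range, nbins) {
  bx <- seq(Range[1, 1], Range[2, 1], length.out = nbins[1] + 1)  # x bin edges
  by <- seq(Range[1, 2], Range[2, 2], length.out = nbins[2] + 1)  # y bin edges
  ix <- findInterval(x[, 1], bx, rightmost.closed = TRUE, all.inside = TRUE)
  iy <- findInterval(x[, 2], by, rightmost.closed = TRUE, all.inside = TRUE)
  counts <- table(factor(ix, levels = seq_len(nbins[1])),
                  factor(iy, levels = seq_len(nbins[2])))
  mx <- (bx[-1] + bx[-length(bx)]) / 2   # bin midpoints in x
  my <- (by[-1] + by[-length(by)]) / 2   # bin midpoints in y
  grid <- expand.grid(x = mx, y = my)    # x varies fastest, like as.vector(counts)
  cbind(grid$x, grid$y, as.vector(counts))
}
set.seed(4)
z <- matrix(runif(200), ncol = 2)
d <- discretize_sketch(z, matrix(c(0, 1, 0, 1), 2, 2), c(5, 5))
head(d)
```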
Create plot for any case study
Description
This function illustrates any of the case studies.
Usage
draw_case(
which,
Continuous = TRUE,
WithEstimation = FALSE,
Dim = 2,
palt,
nsample = 1000,
Dms = c(1, 2),
AltOnly = FALSE
)
Arguments
which |
name or number of the case study. |
Continuous |
= TRUE for continuous data |
WithEstimation |
=FALSE, with parameter estimation |
Dim |
=2 dimension of data |
palt |
parameter for the alternative. If missing, the value from the case study is used. |
nsample |
=1000, sample size. |
Dms |
=c(1,2), which dimensions are to be shown (for 5D data). |
AltOnly |
= FALSE, show only the graph for the alternative? |
Value
a ggplot2 object
Estimate E and Var/n at Eval for given h, using MC from rnull
Description
Estimate E and Var/n at Eval for given h, using MC from rnull
Usage
estimateEV(rnull, p, Eval, h, nsim_mc, n)
Arguments
rnull |
generate data under the null hypothesis. |
p |
parameter values for rnull |
Eval |
matrix of evaluations |
h |
bandwidth, a double |
nsim_mc |
number of simulation runs |
n |
sample size |
Value
a matrix
examples.mdgof.vignette
Description
Objects needed to run the vignette fast enough to pass CRAN checks.
Usage
examples.mdgof.vignette
Format
'examples.mdgof.vignette'
A list
Find Gaussian kernel pdf
Description
Find Gaussian kernel pdf
Usage
gauss_kernel_matrix(Eval, S, h)
Arguments
Eval |
a matrix. |
S |
a matrix |
h |
bandwidth, a double |
Value
a matrix
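A sketch of a Gaussian kernel matrix, assuming an isotropic kernel with a single bandwidth h: entry (i, j) is the d-dimensional normal density with standard deviation h, centred at sample point S[j, ], evaluated at Eval[i, ]. The row means of this matrix give the kernel density estimate at the evaluation points. gauss_kernel_matrix_sketch() is hypothetical; the package's function may use a different normalization or layout.

```r
# Hypothetical sketch, not MDgof::gauss_kernel_matrix().
gauss_kernel_matrix_sketch <- function(Eval, S, h) {
  d <- ncol(Eval)
  # squared Euclidean distances between every evaluation point and every sample point
  sq <- outer(rowSums(Eval^2), rowSums(S^2), "+") - 2 * Eval %*% t(S)
  sq <- pmax(sq, 0)   # guard against tiny negative values from rounding
  exp(-sq / (2 * h^2)) / ((2 * pi)^(d / 2) * h^d)
}
set.seed(5)
S <- matrix(rnorm(100), ncol = 2)                 # sample
Eval <- rbind(c(0, 0), c(1, 1), c(-2, 0.5))       # evaluation points
kde <- rowMeans(gauss_kernel_matrix_sketch(Eval, S, h = 0.5))  # density estimates
```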
Create copula objects
Description
This function creates copula objects
Usage
gen.cop(family, p, d = 2)
Arguments
family |
name of copula. |
p |
parameter of copula. |
d |
dimension |
Value
a copula object
Find evaluation points
Description
Find evaluation points
Usage
gen_eval(rnull, p, m)
Arguments
rnull |
a function that generates new data. |
p |
a vector of parameters for rnull. |
m |
number of evaluation points to generate. |
Value
a matrix
Power estimation of goodness-of-fit tests.
Description
Find the power of various goodness-of-fit tests.
Usage
gof_power(
pnull,
rnull,
ralt,
param_alt,
phat = function(x) -99,
dnull = function(x) -99,
TS,
TSextra,
With.p.value = FALSE,
alpha = 0.05,
Ranges = matrix(c(-Inf, Inf, -Inf, Inf), 2, 2),
nbins = c(5, 5),
minexpcount = 5,
rate = 0,
SuppressMessages = FALSE,
maxProcessor,
B = 1000
)
Arguments
pnull |
function to find cdf under null hypothesis |
rnull |
function to generate data under null hypothesis |
ralt |
function to generate data under alternative hypothesis |
param_alt |
vector of parameter values for distribution under alternative hypothesis |
phat |
=function(x) -99, function to estimate parameters from the data, or -99 |
dnull |
=function(x) -99, density function under the null hypothesis, if available, or -99 if missing |
TS |
user supplied function to find test statistics |
TSextra |
list provided to TS (optional) |
With.p.value |
=FALSE does user supplied routine return p values? |
alpha |
=0.05, the level of the hypothesis test |
Ranges |
=matrix(c(-Inf, Inf, -Inf, Inf),2,2), a 2x2 matrix with lower and upper bounds, if any, for chi-square tests |
nbins |
=c(5, 5), number of bins for chi square tests. |
minexpcount |
=5 minimal expected bin count required for chi square tests. |
rate |
=0 rate of Poisson if sample size is random, 0 if sample size is fixed |
SuppressMessages |
=FALSE, should informative messages be suppressed? |
maxProcessor |
maximum of number of processors to use, 1 if no parallel processing is needed or number of cores-1 if missing |
B |
=1000 number of simulation runs |
Details
For details on the usage of this routine consult the vignette with vignette("MDgof","MDgof")
Value
A numeric matrix of power values.
Examples
# All examples are run with B=10 and maxProcessor=1 to pass CRAN checks.
# This is obviously MUCH TOO SMALL for any real usage.
# Power of tests if null hypothesis specifies a bivariate standard normal
# distribution but data comes from a bivariate normal with different means,
# without parameter estimation.
rnull=function() mvtnorm::rmvnorm(100, c(0, 0))
ralt=function(p) mvtnorm::rmvnorm(100, c(p, p))
pnull=function(x) {
if(!is.matrix(x)) return(mvtnorm::pmvnorm(rep(-Inf, 2), x))
apply(x, 1, function(x) mvtnorm::pmvnorm(rep(-Inf, 2), x))
}
gof_power(pnull, rnull, ralt, c(0, 1), B=10, maxProcessor = 1)
# Same as above, but now with density included
dnull=function(x) {
if(!is.matrix(x)) return(mvtnorm::dmvnorm(x))
apply(x, 1, function(x) mvtnorm::dmvnorm(x))
}
gof_power(pnull, rnull, ralt, c(0, 1), dnull=dnull, B=10, maxProcessor = 1)
# Power of tests when null hypothesis specifies a bivariate normal distribution,
# with mean parameter estimated, whereas data comes from a t distribution
rnull=function(p) mvtnorm::rmvnorm(100, p)
ralt=function(df) mvtnorm::rmvt(100, sigma=diag(2), df=df)
pnull=function(x,p) {
if(!is.matrix(x)) return(mvtnorm::pmvnorm(rep(-Inf, 2), x, mean=p))
apply(x, 1, function(x) mvtnorm::pmvnorm(rep(-Inf, 2), x, mean=p))
}
dnull=function(x, p) {
if(!is.matrix(x)) return(mvtnorm::dmvnorm(x, mean=p))
apply(x, 1, function(x) mvtnorm::dmvnorm(x, mean=p))
}
phat=function(x) apply(x, 2, mean)
gof_power(pnull, rnull, ralt, c(50, 5), dnull=dnull, phat=phat, B=10, maxProcessor = 1)
# Example of a discrete model, with parameter estimation
# Under null hypothesis: X~Bin(10, p), Y|X=x~Bin(5, 0.5+x/100)
# Under alternative hypothesis: X~Bin(10, p), Y|X=x~Bin(5, K+x/100)
rnull=function(p=0.5) {
x=stats::rbinom(1000, 10, p)
y=stats::rbinom(1000, 5, 0.5+x/100)
MDgof::sq2rec(table(x, y))
}
ralt=function(K=0.5) {
x=stats::rbinom(1000, 10, 0.5)
y=stats::rbinom(1000, 5, K+x/100)
MDgof::sq2rec(table(x, y))
}
pnull=function(x, p) {
f=function(x) sum(dbinom(0:x[1], 10, p[1])*pbinom(x[2], 5, 0.5+0:x[1]/100))
if(!is.matrix(x)) x=rbind(x)
apply(x, 1, f)
}
phat=function(x) {
tx=tapply(x[,3], x[,1], sum)
mean(rep(as.numeric(names(tx)), times=tx))/10
}
gof_power(pnull, rnull, ralt, c(0.5, 0.6), phat=phat, B=10, maxProcessor = 1)
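Power estimation by simulation reduces to counting rejections: draw B data sets from the alternative, compute a p value for each, and report the share below alpha. The sketch below shows only that outer loop with a hypothetical helper estimate_power_sketch() and, as a stand-in test, a Kolmogorov-Smirnov test of the first coordinate against N(0, 1). That test is not one of the package's methods, and gof_power() obtains its own p values internally.

```r
# Hypothetical sketch of simulation-based power estimation.
estimate_power_sketch <- function(ralt, test_pvalue, B = 500, alpha = 0.05) {
  pvals <- replicate(B, test_pvalue(ralt()))   # one p value per simulated data set
  mean(pvals < alpha)                          # share of rejections
}
# stand-in test: KS test of the first coordinate against N(0, 1)
ks_first <- function(x) stats::ks.test(x[, 1], "pnorm")$p.value
set.seed(5)
power_null <- estimate_power_sketch(function() matrix(rnorm(200), ncol = 2), ks_first)
power_alt <- estimate_power_sketch(function() matrix(rnorm(200, mean = 0.5), ncol = 2), ks_first)
```

Under the null the estimate should be close to alpha, which is a useful sanity check for any new test before comparing powers.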
Tests for the multivariate goodness-of-fit problem
Description
This function runs a number of goodness-of-fit tests using Rcpp and parallel computing.
Usage
gof_test(
x,
pnull,
rnull,
phat = function(x) -99,
dnull = function(x) -99,
TS,
TSextra,
rate = 0,
nbins = c(5, 5),
Ranges = matrix(c(-Inf, Inf, -Inf, Inf), 2, 2),
minexpcount = 5,
maxProcessor,
doMethods,
B = 5000,
ReturnTSextra = FALSE
)
Arguments
x |
a matrix with the data set |
pnull |
cdf under the null hypothesis |
rnull |
routine to generate data under the null hypothesis |
phat |
=function(x) -99, function to estimate parameters from the data, or -99 if no parameters are estimated |
dnull |
=function(x) -99, density function under the null hypothesis, if available, or -99 if missing |
TS |
user supplied function to find test statistics, if any. |
TSextra |
(optional) list passed to TS, if needed. |
rate |
=0 rate of Poisson if sample size is random, 0 if sample size is fixed |
nbins |
=c(5, 5) number of bins for chi-square tests |
Ranges |
=matrix(c(-Inf, Inf, -Inf, Inf),2,2), a 2x2 matrix with lower and upper bounds, if any, for chi-square tests |
minexpcount |
=5 minimal expected bin count required |
maxProcessor |
number of processors to use in parallel processing. |
doMethods |
a vector of codes for the methods to include. If ="all", all included tests are run; if missing, a default selection of methods is run. |
B |
=5000 number of simulation runs. If B=0 the routine returns the test statistics. |
ReturnTSextra |
=FALSE, should setup info be returned? |
Details
For details on the usage of this routine consult the vignette with vignette("MDgof","MDgof")
Value
A list with vectors of test statistics and p.values
Examples
# All examples are run with B=10 or B=20 and maxProcessor=1 to pass CRAN checks.
# This is obviously MUCH TOO SMALL for any real usage.
# Tests to see whether data comes from a bivariate standard normal distribution,
# without parameter estimation.
rnull=function() mvtnorm::rmvnorm(100, c(0, 0))
x=rnull()
pnull=function(x) {
if(!is.matrix(x)) return(mvtnorm::pmvnorm(rep(-Inf, 2), x))
apply(x, 1, function(x) mvtnorm::pmvnorm(rep(-Inf, 2), x))
}
gof_test(x, pnull, rnull, B=10, maxProcessor = 1)
# Same as above, but now with density included
dnull=function(x) {
if(!is.matrix(x)) return(mvtnorm::dmvnorm(x))
apply(x, 1, function(x) mvtnorm::dmvnorm(x))
}
gof_test(x, pnull, rnull, dnull=dnull, B=20, maxProcessor = 1)
# Tests to see whether data comes from a bivariate normal distribution,
# with mean parameter estimated.
rnull=function(p) mvtnorm::rmvnorm(100, p)
x=rnull(c(0,1))
pnull=function(x,p) {
if(!is.matrix(x)) return(mvtnorm::pmvnorm(rep(-Inf, 2), x, mean=p))
apply(x, 1, function(x) mvtnorm::pmvnorm(rep(-Inf, 2), x, mean=p))
}
dnull=function(x, p) {
if(!is.matrix(x)) return(mvtnorm::dmvnorm(x, mean=p))
apply(x, 1, function(x) mvtnorm::dmvnorm(x, mean=p))
}
phat=function(x) apply(x, 2, mean)
gof_test(x, pnull, rnull, dnull=dnull, phat=phat,B=20, maxProcessor = 1)
# Example of a discrete model, with parameter estimation
# X~Bin(10, p1), Y|X=x~Bin(5, p2+x/100)
rnull=function(p) {
x=rbinom(1000, 10, p[1])
y=rbinom(1000, 5, p[2]+x/100)
MDgof::sq2rec(table(x, y))
}
pnull=function(x, p) {
f=function(x) sum(dbinom(0:x[1], 10, p[1])*pbinom(x[2], 5, p[2]+0:x[1]/100))
if(!is.matrix(x)) x=rbind(x)
apply(x, 1, f)
}
phat=function(x) {
tx=tapply(x[,3], x[,1], sum)
p1=mean(rep(as.numeric(names(tx)), times=tx))/10
ty=tapply(x[,3], x[,2], sum)
p2=mean(rep(as.numeric(names(ty)), times=ty))/5-p1/10
c(p1, p2)
}
x=rnull(c(0.5, 0.5))
gof_test(x, pnull, rnull, phat=phat,B=10, maxProcessor = 1)
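The package description states that p values are found via the parametric bootstrap. A minimal sketch of that idea, under stated assumptions: a statistic TS(x, p) where large values speak against the null, a simulator rnull(p), and an estimator phat(x) that is re-applied to every simulated data set so that the bootstrap mimics the estimation step. bootstrap_pvalue_sketch() and the toy model (bivariate normal with identity covariance and unknown mean, statistic comparing the sample variances with 1) are hypothetical and much simpler than gof_test().

```r
# Hypothetical sketch of a parametric-bootstrap p value, not MDgof::gof_test().
bootstrap_pvalue_sketch <- function(x, TS, rnull, phat, B = 1000) {
  p_obs <- phat(x)
  t_obs <- TS(x, p_obs)
  t_sim <- replicate(B, {
    xs <- rnull(p_obs)      # simulate from the fitted null model
    TS(xs, phat(xs))        # re-estimate on the simulated data, as for x
  })
  mean(t_sim >= t_obs)      # some authors use (1 + sum(...)) / (B + 1) instead
}
# toy null model: bivariate normal, identity covariance, unknown mean
rnull_demo <- function(p) matrix(rnorm(200, mean = rep(p, each = 100)), ncol = 2)
phat_demo <- function(x) colMeans(x)
TS_demo <- function(x, p) sum(abs(apply(x, 2, var) - 1))  # p is unused here
set.seed(6)
pv_null <- bootstrap_pvalue_sketch(rnull_demo(c(1, -1)), TS_demo, rnull_demo, phat_demo, B = 200)
pv_alt <- bootstrap_pvalue_sketch(matrix(rnorm(200, sd = 2), ncol = 2),
                                  TS_demo, rnull_demo, phat_demo, B = 200)
```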
Adjusted p values
Description
This function runs a number of goodness-of-fit tests using Rcpp and parallel computing and then finds an adjusted p value for the combined tests.
Usage
gof_test_adjusted_pvalue(
x,
pnull,
rnull,
phat = function(x) -99,
dnull = function(x) -99,
B = c(5000, 1000),
nbins = c(5, 5),
minexpcount = 5,
Ranges = matrix(c(-Inf, Inf, -Inf, Inf), 2, 2),
SuppressMessages = FALSE,
maxProcessor,
doMethods
)
Arguments
x |
matrix with data |
pnull |
cdf under the null hypothesis |
rnull |
routine to generate data under the null hypothesis |
phat |
=function(x) -99, function to estimate parameters from the data, or -99 if no parameters are estimated |
dnull |
=function(x) -99, density function under the null hypothesis, if available, or -99 if missing |
B |
=c(5000, 1000), number of simulation runs for permutation test and for estimation of the empirical distribution function. |
nbins |
=c(5, 5), number of bins for chi square tests (2D only). |
minexpcount |
= 5, minimum required expected counts for chi-square tests. |
Ranges |
=matrix(c(-Inf, Inf, -Inf, Inf),2,2) a 2x2 matrix with lower and upper bounds. |
SuppressMessages |
= FALSE, suppress informative messages? |
maxProcessor |
number of cores for parallel processing. |
doMethods |
Which methods should be included? If missing a small number of methods that generally have good power are used. |
Details
For details consult the vignette("MDgof","MDgof")
Value
a vector of p values.
Examples
# All examples are run with B=10 and maxProcessor=1 to pass CRAN checks.
# This is obviously MUCH TOO SMALL for any real usage.
# Tests to see whether data comes from a bivariate standard normal distribution,
# without parameter estimation.
rnull=function() mvtnorm::rmvnorm(100, c(0, 0))
x=rnull()
pnull=function(x) {
if(!is.matrix(x)) return(mvtnorm::pmvnorm(rep(-Inf, 2), x))
apply(x, 1, function(x) mvtnorm::pmvnorm(rep(-Inf, 2), x))
}
dnull=function(x) {
if(!is.matrix(x)) return(mvtnorm::dmvnorm(x))
apply(x, 1, function(x) mvtnorm::dmvnorm(x))
}
gof_test_adjusted_pvalue(x, pnull, rnull, dnull=dnull, B=10, maxProcessor = 1)
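This manual does not spell out how the combined p value is computed. One common approach, assumed here and not confirmed as the package's method, is the min-p adjustment: run all k tests on B data sets simulated under the null, record the smallest p value of each simulated data set, and report the share of those minima that are at most the smallest observed p value. Because all tests are applied to the same simulated data sets, their correlation is accounted for automatically. minp_adjusted_sketch() is hypothetical; in the usage lines independent uniforms stand in for null p values, so the result should be near 1 - (1 - 0.02)^3, about 0.059.

```r
# Hypothetical min-p adjustment sketch.
# obs_pvals:  length-k vector, p values of k tests on the real data
# null_pvals: B x k matrix, p values of the same tests on B data sets simulated under H0
minp_adjusted_sketch <- function(obs_pvals, null_pvals) {
  min_null <- apply(null_pvals, 1, min)   # smallest p value per simulated data set
  mean(min_null <= min(obs_pvals))
}
set.seed(7)
null_p <- matrix(runif(3000), ncol = 3)   # stand-in: 3 independent tests, B = 1000
adj <- minp_adjusted_sketch(c(0.02, 0.4, 0.7), null_p)
```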
Find gradient of log(f) for a matrix of points
Description
Find gradient of log(f) for a matrix of points
Usage
grad_mat(x, f)
Arguments
x |
point of evaluation |
f |
function |
Value
a matrix of gradient vectors
Find gradient of log(f)
Description
Find gradient of log(f)
Usage
grad_vec(x, f)
Arguments
x |
point of evaluation |
f |
function |
Value
a gradient vector
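A sketch of the gradient of log(f) by central finite differences, assuming f is a positive density that takes a single point (a numeric vector) and returns a number; the row-wise version applies it to every row of a matrix, as grad_mat() is described to do. grad_log_sketch(), grad_log_mat_sketch() and the step size eps are hypothetical, and the package may compute gradients differently. For the standard bivariate normal the exact answer is -x, which makes a convenient check.

```r
# Hypothetical sketches of grad_vec()/grad_mat() via central differences.
grad_log_sketch <- function(x, f, eps = 1e-5) {
  sapply(seq_along(x), function(k) {
    e <- replace(numeric(length(x)), k, eps)       # step in coordinate k only
    (log(f(x + e)) - log(f(x - e))) / (2 * eps)
  })
}
grad_log_mat_sketch <- function(x, f) t(apply(x, 1, grad_log_sketch, f = f))
# standard bivariate normal density: grad log f(x) = -x
dnorm2 <- function(x) exp(-sum(x^2) / 2) / (2 * pi)
g <- grad_log_sketch(c(0.5, -1), dnorm2)
gm <- grad_log_mat_sketch(rbind(c(0, 0), c(1, 2)), dnorm2)
```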
hybrid.mdgof.vignette
Description
Objects needed to run the vignette MDgof-hybrid fast enough to pass CRAN checks.
Usage
hybrid.mdgof.vignette
Format
'hybrid.mdgof.vignette'
A list
Power Estimation for the multivariate goodness-of-fit problem via two-sample tests
Description
This function estimates the power of goodness-of-fit/two-sample hybrid tests using Rcpp and parallel computing by generating a Monte Carlo data set and then running a two-sample test.
Usage
hybrid_power(
rnull,
ralt,
param_alt,
phat = function(x) -99,
nMC = 1,
TS,
TSextra,
With.p.value = FALSE,
alpha = 0.05,
B = 1000,
maxProcessor,
doMethods = "all"
)
Arguments
rnull |
routine to generate data under the null hypothesis. |
ralt |
routine to generate data under the alternative hypothesis. |
param_alt |
values passed to ralt. |
phat |
=function(x) -99, function to estimate parameters from the data, if needed. |
nMC |
=1, sample size of the Monte Carlo data set. If nMC<=10, the sample size used is nMC times the sample size of x; otherwise nMC itself is the sample size. |
TS |
user supplied function to find test statistics, if any. |
TSextra |
(optional) list passed to TS, if needed. |
With.p.value |
=FALSE, does user supplied method find its own p-values? |
alpha |
=0.05 type I error rate used in tests. |
B |
=1000 number of simulation runs. If B=0 the routine returns the test statistics. |
maxProcessor |
number of processors to use in parallel processing. |
doMethods |
="all", a vector of codes for the methods to include or all of them. |
Details
For details on the usage of this routine consult the vignette with vignette("MDgof-hybrid", "MDgof")
Value
A list with vectors of test statistics and p.values
Examples
# All examples are run with B=20 and maxProcessor=1 to pass CRAN checks.
# Power of tests to see whether data comes from a bivariate standard normal distribution,
# without parameter estimation. The true distribution is bivariate normal with
# correlation r.
rnull=function() mvtnorm::rmvnorm(100, c(0, 0))
ralt=function(r) mvtnorm::rmvnorm(100, sigma=matrix(c(1,r,r,1),2,2))
hybrid_power(rnull, ralt, 0.3, B=20, maxProcessor = 1)
# Power of tests to see whether data comes from a bivariate normal distribution,
# with mean parameter estimated. True data comes from a t distribution.
rnull=function(p) mvtnorm::rmvnorm(100, p)
ralt=function(df) mvtnorm::rmvt(100, df=df)
phat=function(x) apply(x, 2, mean)
hybrid_power(rnull, ralt, 5, phat, B=20, maxProcessor = 1)
Tests for the multivariate goodness-of-fit problem via two-sample tests
Description
This function runs a number of goodness-of-fit tests using Rcpp and parallel computing by generating a Monte Carlo data set and then running a two-sample test.
Usage
hybrid_test(
x,
rnull,
phat = function(x) -99,
nMC = 1,
TS,
TSextra,
B = 1000,
maxProcessor,
doMethods = "all"
)
Arguments
x |
a matrix with the data set |
rnull |
routine to generate data under the null hypothesis. |
phat |
=function(x) -99, function to estimate parameters from the data, if needed. |
nMC |
=1, sample size of the Monte Carlo data set. If nMC<=10, the sample size used is nMC times the sample size of x; otherwise nMC itself is the sample size. |
TS |
user supplied function to find test statistics, if any. |
TSextra |
(optional) list passed to TS, if needed. |
B |
=1000 number of simulation runs. If B=0 the routine returns the test statistics. |
maxProcessor |
number of processors to use in parallel processing. |
doMethods |
="all", a vector of codes for the methods to include or all of them. |
Details
For details on the usage of this routine consult the vignette with vignette("MDgof-hybrid", "MDgof")
Value
A list with vectors of test statistics and p.values
Examples
# All examples are run with B=20 and maxProcessor=1 to pass CRAN checks.
# Tests to see whether data comes from a bivariate standard normal distribution,
# without parameter estimation.
rnull=function() mvtnorm::rmvnorm(100, c(0, 0))
x=rnull()
hybrid_test(x, rnull, B=20, maxProcessor = 1)
# Tests to see whether data comes from a bivariate normal distribution,
# with mean parameter estimated.
rnull=function(p) mvtnorm::rmvnorm(100, p)
phat=function(x) apply(x, 2, mean)
x=rnull(c(0,1))
hybrid_test(x, rnull, phat, B=20, maxProcessor = 1)
# Example of a discrete model, without parameter estimation
# X~Bin(5, 0.5), Y|X=x~Bin(4, 0.5+x/100)
rnull=function() {
x=rbinom(1000, 5, 0.5)
y=rbinom(1000, 4, 0.5+x/100)
MDgof::sq2rec(table(x, y))
}
x=rnull()
hybrid_test(x, rnull, B=50, maxProcessor = 1)
# Example of a discrete model, with parameter estimation
# X~Bin(5, p), Y|X=x~Bin(4, 0.5+x/100)
rnull=function(p) {
x=rbinom(1000, 5, p)
y=rbinom(1000, 4, 0.5+x/100)
MDgof::sq2rec(table(x, y))
}
phat=function(x) {
tx=tapply(x[,3], x[,1], sum)
p1=mean(rep(as.numeric(names(tx)), times=tx))/5
p1
}
x=rnull(0.5)
hybrid_test(x, rnull, phat, B=20, maxProcessor = 1)
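The hybrid idea described above, generate a Monte Carlo data set under the null and then run a two-sample test, can be sketched with any two-sample method. As an assumption for illustration, the hypothetical hybrid_sketch() below uses the energy distance with a permutation p value and a Monte Carlo sample of the same size as x (as with nMC=1); it is not one of the package's implementations and omits parameter estimation.

```r
# Hypothetical sketch of a goodness-of-fit/two-sample hybrid test.
# Energy distance between two groups of a pooled sample, from a precomputed
# distance matrix d; ix holds the row indices of the first group.
energy_from_dist <- function(d, ix) {
  iy <- setdiff(seq_len(nrow(d)), ix)
  2 * mean(d[ix, iy]) - mean(d[ix, ix]) - mean(d[iy, iy])
}
hybrid_sketch <- function(x, rnull, B = 200) {
  y <- rnull()                          # Monte Carlo data set under H0
  d <- as.matrix(dist(rbind(x, y)))     # pooled pairwise distances
  n <- nrow(x)
  t_obs <- energy_from_dist(d, seq_len(n))
  # permutation distribution: random relabelling of the pooled sample
  t_perm <- replicate(B, energy_from_dist(d, sample(nrow(d), n)))
  mean(t_perm >= t_obs)
}
rnull_std <- function() matrix(rnorm(200), ncol = 2)
set.seed(8)
p_same <- hybrid_sketch(rnull_std(), rnull_std)                               # H0 true
p_shift <- hybrid_sketch(matrix(rnorm(200, mean = 1), ncol = 2), rnull_std)   # H0 false
```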
Find test statistic for Kernel Stein Discrepancy test
Description
Find test statistic for Kernel Stein Discrepancy test
Usage
ksd(X, scf, p)
Arguments
X |
data set. |
scf |
function to find scores |
p |
(possible) parameters |
Value
a double (test statistic)
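A sketch of a kernel Stein discrepancy statistic, assuming a Gaussian RBF kernel k(x,y) = exp(-|x-y|^2/(2h^2)) with a fixed bandwidth h and the V-statistic form (average over all pairs i, j). Here score plays the role of scf, returning the score s(x) = grad log p(x) of the null density. For this kernel the Stein kernel is k times [s(x)'s(y) + s(x)'(x-y)/h^2 - s(y)'(x-y)/h^2 + d/h^2 - |x-y|^2/h^4]. ksd_sketch() is hypothetical; the kernel and bandwidth the package uses may differ.

```r
# Hypothetical sketch, not MDgof::ksd(). Gaussian RBF kernel, fixed bandwidth h.
ksd_sketch <- function(X, score, h = 1) {
  n <- nrow(X); d <- ncol(X)
  # n x d matrix whose row i is the score s(x_i)
  S <- matrix(apply(X, 1, score), nrow = n, byrow = TRUE)
  total <- 0
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      diff <- X[i, ] - X[j, ]
      r2 <- sum(diff^2)
      k <- exp(-r2 / (2 * h^2))
      # Stein kernel u_p(x_i, x_j) for the RBF kernel
      total <- total + k * (sum(S[i, ] * S[j, ]) +
                              sum(S[i, ] * diff) / h^2 -
                              sum(S[j, ] * diff) / h^2 +
                              d / h^2 - r2 / h^4)
    }
  }
  total / n^2   # V-statistic
}
score_std_normal <- function(x) -x    # score of N(0, I)
set.seed(9)
X0 <- matrix(rnorm(200), ncol = 2)    # sample from the null
X1 <- X0 + 1                          # shifted sample
k0 <- ksd_sketch(X0, score_std_normal)
k1 <- ksd_sketch(X1, score_std_normal)
```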
Create list with needed info
Description
This function creates a list with info needed in various parts of the package
Usage
makeTSextra(
x,
Continuous,
pnull,
rnull,
phat = function(x) -99,
dnull = function(x) -99,
Ranges,
TSextra
)
Arguments
x |
data set |
Continuous |
TRUE if the data are continuous, FALSE for discrete data. |
pnull |
cdf under the null hypothesis |
rnull |
routine to generate data under the null hypothesis |
phat |
=function(x) -99, function to estimate parameters from the data, or -99 if no parameters are estimated |
dnull |
=function(x) -99, density function under the null hypothesis, if available, or -99 if missing |
Ranges |
Range of variables |
TSextra |
(optional) list passed to TS, if needed. |
Value
A list with the setup information used in various parts of the package
Finds the empirical distribution function
Description
Finds the empirical distribution function
Usage
mdecdf(dta, pts)
Arguments
dta |
a matrix of data points |
pts |
a matrix of evaluation points |
Value
a numeric vector
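The multivariate empirical distribution function at a point p is the fraction of data points whose coordinates are all less than or equal to the corresponding coordinates of p. A minimal base R sketch of that definition follows; mdecdf_sketch() is hypothetical and the package's compiled version will be faster.

```r
# Hypothetical sketch of a multivariate empirical cdf, not MDgof::mdecdf().
mdecdf_sketch <- function(dta, pts) {
  apply(pts, 1, function(p) {
    # t(dta) <= p compares coordinate k of every data point with p[k]
    mean(colSums(t(dta) <= p) == ncol(dta))
  })
}
dta <- rbind(c(0, 0), c(1, 1), c(2, 0))
pts <- rbind(c(1, 1), c(2, 2), c(-1, 5), c(2, 0))
mdecdf_sketch(dta, pts)   # returns c(2/3, 1, 0, 2/3)
```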
Example for a new test
Description
This shows how a new test routine can be used with MDgof, based on chi-square tests
Usage
newTS(x, pnull, p, TSextra)
Arguments
x |
a data set. |
pnull |
function to calculate expected counts. |
p |
parameter for pnull, if needed |
TSextra |
a list with setup info |
Value
a vector with either values of the test statistic, or p values
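A sketch of a user-supplied statistic with the argument layout documented for newTS() and TS_cont(): data x, null cdf pnull, parameters, and the TSextra list. It returns a named vector with a Cramer-von Mises type distance between the empirical cdf and the null cdf at the data points. The function my_TS_sketch(), the statistic itself, and the convention that param equals -99 when nothing is estimated (borrowed from the phat default) are assumptions for illustration; consult the vignette for how the package actually calls TS.

```r
# Hypothetical user-supplied test statistic for continuous data.
my_TS_sketch <- function(x, pnull, param, TSextra) {
  # assumption: param is -99 when no parameters are estimated
  Fx <- if (length(param) == 1 && param == -99) pnull(x) else pnull(x, param)
  n <- nrow(x)
  # empirical cdf of the data at each data point
  Fn <- apply(x, 1, function(p) mean(colSums(t(x) <= p) == ncol(x)))
  c(CvM_sketch = n * mean((Fn - Fx)^2))   # named, so results can be labelled
}
# toy null: uniform distribution on the unit square
punif_sq <- function(x) {
  if (!is.matrix(x)) x <- rbind(x)
  x[, 1] * x[, 2]
}
set.seed(10)
u <- matrix(runif(200), ncol = 2)
s0 <- my_TS_sketch(u, punif_sq, -99, list())     # data from the null
s1 <- my_TS_sketch(u^2, punif_sq, -99, list())   # data not from the null
```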
R function order(x,y) for Rcpp
Description
R function order(x,y) for Rcpp
Usage
orderC(x, y)
Arguments
x |
first vector |
y |
second vector |
Value
a vector of integers
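orderC() is documented as the Rcpp counterpart of R's order(x, y). In base R, order(x, y) sorts by x and breaks ties using y, as this short example shows.

```r
x <- c(2, 1, 1)
y <- c(1, 3, 2)
idx <- order(x, y)   # sort by x, break ties with y
idx                  # 3 2 1
cbind(x, y)[idx, ]   # rows (1,2), (1,3), (2,1)
```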
Find probabilities from cdf for discrete data
Description
Find probabilities from cdf for discrete data
Usage
p2dC(x, cdf, p, Fx = as.numeric(c(-1)))
Arguments
x |
matrix with data |
cdf |
function to find distribution function |
p |
(possible) arguments for cdf |
Fx |
(if available) already calculated values of cdf |
Value
a matrix with probabilities added
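For integer-valued two-dimensional data, point probabilities follow from the joint cdf by inclusion-exclusion over unit steps: P(X=a, Y=b) = F(a,b) - F(a-1,b) - F(a,b-1) + F(a-1,b-1). The sketch assumes integer support and a cdf that accepts an m x 2 matrix; p2d_sketch() is hypothetical and simpler than p2dC(), which also accepts parameters and precomputed cdf values. Independent binomials serve as a check because their joint probabilities are known.

```r
# Hypothetical sketch, not MDgof::p2dC(). Assumes integer-valued data.
p2d_sketch <- function(x, cdf) {
  Fc <- function(a, b) cdf(cbind(a, b))
  a <- x[, 1]; b <- x[, 2]
  prob <- Fc(a, b) - Fc(a - 1, b) - Fc(a, b - 1) + Fc(a - 1, b - 1)
  cbind(x, prob)   # data with probabilities added as a new column
}
# toy null: independent X ~ Bin(2, 0.5) and Y ~ Bin(1, 0.3)
cdf_ind <- function(z) pbinom(z[, 1], 2, 0.5) * pbinom(z[, 2], 1, 0.3)
pts <- as.matrix(expand.grid(0:2, 0:1))
res <- p2d_sketch(pts, cdf_ind)
```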
Find power of goodness-of-fit tests for continuous data
Description
Find power of goodness-of-fit tests for continuous data
Usage
powerC(rnull, ralt, param_alt, TS, typeTS, TSextra, B = 1000L)
Arguments
rnull |
R function (generate data under null hypothesis) |
ralt |
R function to generate data under alternative |
param_alt |
parameters of ralt |
TS |
function to calculate test statistics |
typeTS |
integer indicating type of test statistic |
TSextra |
list to pass to TS |
B |
=1000 Number of simulation runs |
Value
A matrix of powers
Power estimation of tests that find p values
Description
This function finds the power for tests that find their own p values
Usage
power_pvals(
pnull,
ralt,
param_alt,
TS,
TSextra = list(aa = 0),
alpha = 0.05,
B = 1000
)
Arguments
pnull |
cdf function |
ralt |
function that generates data |
param_alt |
parameters for ralt |
TS |
routine that runs the test and returns p values |
TSextra |
=list(aa=0), a list of things passed to TS, if needed |
alpha |
=0.05 type I error probability of test |
B |
=1000 number of simulation runs |
Value
A matrix of power values
power_studies_cont_D5_results
Description
the results of the included power studies for continuous data without estimation in 5 dimensions
Usage
power_studies_cont_D5_results
Format
'power_studies_cont_D5_results'
A list of matrices with powers
power_studies_cont_est_results
Description
the results of the included power studies for continuous data with estimation
Usage
power_studies_cont_est_results
Format
'power_studies_cont_est_results'
A list of matrices with powers
power_studies_cont_hybrid_results
Description
the results of the included power studies for continuous data without estimation using two-sample methods
Usage
power_studies_cont_hybrid_results
Format
'power_studies_cont_hybrid_results'
A list of matrices with powers
power_studies_cont_nMC5_hybrid_results
Description
the results of the included power studies for continuous data without estimation using two-sample methods and nMC=5
Usage
power_studies_cont_nMC5_hybrid_results
Format
'power_studies_cont_nMC5_hybrid_results'
A list of matrices with powers
power_studies_cont_results
Description
the results of the included power studies for continuous data without estimation
Usage
power_studies_cont_results
Format
'power_studies_cont_results'
A list of matrices with powers
power_studies_disc_est_results
Description
the results of the included power studies for discrete data with estimation
Usage
power_studies_disc_est_results
Format
'power_studies_disc_est_results'
A list of matrices with powers
power_studies_disc_hybrid_results
Description
the results of the included power studies for discrete data without estimation using two-sample methods
Usage
power_studies_disc_hybrid_results
Format
'power_studies_disc_hybrid_results'
A list of matrices with powers
power_studies_disc_nMC5_hybrid_results
Description
the results of the included power studies for discrete data without estimation using two-sample methods and nMC=5
Usage
power_studies_disc_nMC5_hybrid_results
Format
'power_studies_disc_nMC5_hybrid_results'
A list of matrices with powers
power_studies_disc_results
Description
the results of the included power studies for discrete data without estimation
Usage
power_studies_disc_results
Format
'power_studies_disc_results'
A list of matrices with powers
This function performs a Rosenblatt transform
Description
This function performs a Rosenblatt transform
Usage
rosenblattC(x, cdf, p, Range)
Arguments
x |
data set (a matrix) |
cdf |
distribution function |
p |
parameters of cdf, if any |
Range |
matrix with range of data |
Value
A matrix of transformed data
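To illustrate what the transform does, here is a minimal sketch for one simple case only: a bivariate normal with standard normal margins and known correlation rho. It is not rosenblattC itself, which accepts a general cdf, parameters and ranges; the function name rosenblatt_bvn is hypothetical. The first coordinate is passed through its marginal cdf and the second through its conditional cdf given the first, so under the correct model the two output columns are independent U(0,1).

```r
# Hypothetical sketch, not MDgof::rosenblattC: Rosenblatt transform for a
# bivariate normal with standard normal margins and correlation rho.
rosenblatt_bvn <- function(x, rho) {
  x <- as.matrix(x)
  # first coordinate: marginal cdf of X1
  u1 <- pnorm(x[, 1])
  # second coordinate: conditional cdf of X2 given X1 = x1,
  # using X2 | X1 = x1 ~ N(rho * x1, 1 - rho^2)
  u2 <- pnorm((x[, 2] - rho * x[, 1]) / sqrt(1 - rho^2))
  cbind(u1, u2)
}

# simulate data from the null model and transform it
set.seed(111)
rho <- 0.7
z1 <- rnorm(1000)
z2 <- rho * z1 + sqrt(1 - rho^2) * rnorm(1000)
u <- rosenblatt_bvn(cbind(z1, z2), rho)
cor(u[, 1], u[, 2])  # close to 0 when the null model is correct
```

A goodness-of-fit test can then check the transformed data for independence and uniformity instead of checking the original multivariate distribution directly.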
Benchmarking for Multivariate Goodness-of-fit Tests
Description
This function runs the case studies included in the package.
Usage
run.studies(
study,
Continuous = TRUE,
WithEstimation = FALSE,
Dim = 2,
TS,
TSextra,
With.p.value = FALSE,
nsample = 250,
nbins = c(5, 5),
alpha = 0.05,
param_alt,
SuppressMessages = TRUE,
B = 1000,
maxProcessor
)
Arguments
study |
either the name of the study or its number in the list. If missing, all studies are run. |
Continuous |
=TRUE, run the case studies for continuous data; if FALSE, run those for discrete data. |
WithEstimation |
=FALSE, run case studies with or without parameter estimation? |
Dim |
=2, dimension of the continuous data sets, either 2 or 5. |
TS |
routine to calculate new test statistics. |
TSextra |
list passed to TS (optional). |
With.p.value |
=FALSE, does user supplied routine return p values? |
nsample |
= 250, desired sample size. 250 is used in the included case studies. |
nbins |
=c(5,5) number of bins for discretized data. |
alpha |
=0.05, type I error probability of the tests. 0.05 is used in the included case studies. |
param_alt |
(list of) values of the parameter under the alternative hypothesis. If missing, the included values are used. |
SuppressMessages |
=TRUE, should informative messages be suppressed? |
B |
= 1000, number of simulation runs. |
maxProcessor |
number of cores to use. If missing, the number of physical cores minus 1 is used. If set to 1, no parallel processing is done. |
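The documented default for maxProcessor (physical cores minus one) can be approximated in base R as below. This is a hypothetical helper, not the package's code. Note that parallel::detectCores(logical = FALSE) only reports physical cores on some platforms (on others it returns the logical count), and it can return NA, so the sketch falls back to a single core.

```r
# Hypothetical helper (not part of MDgof): physical cores minus one,
# never fewer than one.
default_cores <- function() {
  n <- parallel::detectCores(logical = FALSE)
  # detectCores() may return NA on some systems; assume a single core then
  if (is.na(n)) n <- 1L
  max(1L, n - 1L)
}

default_cores()
```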
Details
For details consult vignette(package="MDgof")
Value
A matrix (or list of matrices) of p values.
Examples
# Examples use a very small number of simulation runs (B=25) to satisfy CRAN submission rules.
# Run a new test for studies 1-3 for continuous data without parameter estimation.
# The new test is an (included) chi-square test that finds its own p value.
TSextra=list(Continuous=TRUE, WithEstimation=FALSE, Withpvalue=TRUE)
MDgof::run.studies(Continuous=TRUE, WithEstimation=FALSE,
study=1:3, TS=MDgof::newTS, TSextra=TSextra,
With.p.value = TRUE, B=25, maxProcessor = 1)
#Run included tests for studies 1-3 for discrete data and without estimation,
#but with type I error alpha=0.1
p=MDgof::power_studies_disc_results[[3]][1:3,,drop=FALSE]
MDgof::run.studies(Continuous=FALSE, WithEstimation=FALSE,
study=1:3, param_alt=p,alpha=0.1, B=25, maxProcessor = 1)
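In the usual Monte Carlo sense, the power of a test at level alpha is estimated as the fraction of simulated data sets for which its p value falls below alpha; the power matrices of the included studies presumably follow this definition. The sketch below illustrates only that calculation, with a one-sample t test as a stand-in for a goodness-of-fit test. The function mc_reject_rate is hypothetical and is not part of run.studies.

```r
# Illustrative sketch (not part of MDgof): Monte Carlo estimate of the
# rejection rate of a test, using a one-sample t test as a stand-in.
mc_reject_rate <- function(shift, nsample = 250, B = 500, alpha = 0.05) {
  # p values of B simulated data sets of size nsample
  pvals <- replicate(B, t.test(rnorm(nsample, mean = shift))$p.value)
  # proportion of data sets in which the test rejects at level alpha
  mean(pvals < alpha)
}

set.seed(123)
size  <- mc_reject_rate(shift = 0)    # estimated type I error, near alpha
power <- mc_reject_rate(shift = 0.2)  # estimated power under an alternative
c(size = size, power = power)
```

Running the same loop under the null hypothesis is a useful check that a new test actually holds its nominal type I error.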
This function does some rounding to nice numbers
Description
This function rounds the vectors in a list to a given number of digits, for nicer display.
Usage
## S3 method for class 'digits'
signif(x, d = 3)
Arguments
x |
a list of two vectors |
d |
=3, number of digits to round to |
Value
A list with rounded vectors
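A minimal sketch of this kind of rounding, assuming "digits" means significant digits as in base R's signif(); the source does not say which rounding is used. The name round_list is hypothetical.

```r
# Hypothetical stand-in for signif.digits(): round every vector in a list
# to d significant digits with base R's signif().
round_list <- function(x, d = 3) {
  lapply(x, signif, digits = d)
}

r <- round_list(list(c(0.123456, 12.3456), c(1234.5, 0.0009876)))
r
```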
Helper function to find test statistics of simulated data.
Description
Helper function to find test statistics of simulated data.
Usage
simTS(dta, TS, typeTS, TSextra, B)
Arguments
dta |
a matrix with data |
TS |
test statistic routine |
typeTS |
integer indicating the type of test statistic routine |
TSextra |
list passed to TS |
B |
number of simulation runs |
Value
a matrix
Helper function to find p values of simulated data.
Description
Helper function to find p values of simulated data.
Usage
simpvals(dta, TS, typeTS, TSextra, A, Ranges, nbins, minexpcount, B)
Arguments
dta |
a matrix with data |
TS |
test statistic routine |
typeTS |
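integer indicating the type of test statistic routine |
TSextra |
list passed to TS |
A |
a matrix |
Ranges |
a matrix with the range of the data |
nbins |
a vector with the number of bins in each dimension |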
minexpcount |
an integer |
B |
number of simulation runs |
Value
a matrix
Rearrange 2D discrete data
Description
This function converts a discrete data set given as an n x m matrix of counts into an nm x 3 matrix.
Usage
sq2rec(x)
Arguments
x |
a matrix of discrete data. |
Value
a rearranged matrix
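A minimal sketch of the rearrangement, assuming the three columns are (row index, column index, count) with cells listed in column-major order; the source does not state the column layout or order, and the actual function may use the category values instead of indices. The name sq2rec_sketch is hypothetical.

```r
# Hypothetical sketch of sq2rec(): one row per cell of the n x m count
# matrix, holding (row index, column index, count), column-major order.
sq2rec_sketch <- function(x) {
  cbind(as.vector(row(x)), as.vector(col(x)), as.vector(x))
}

counts <- matrix(1:6, nrow = 2, ncol = 3)
r <- sq2rec_sketch(counts)
r
```

The total count is unchanged; only the layout moves from a grid to a long table with one cell per row.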
Run goodness-of-fit tests for continuous data
Description
Runs goodness-of-fit tests for continuous data.
Usage
testC(dta, rnull, TS, typeTS, TSextra, B = 5000L)
Arguments
dta |
A numeric matrix of data |
rnull |
R function that generates data under the null hypothesis |
TS |
function that calculates test statistics |
typeTS |
integer indicating type of test statistic |
TSextra |
list to pass to TS |
B |
=5000, number of simulation runs |
Value
A matrix of numbers (test statistics and p values)
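The parametric bootstrap behind these p values can be sketched as follows: compute the statistic for the observed data, compute it again for B data sets generated by rnull, and report the fraction of simulated statistics at least as large as the observed one. This is a simplified illustration under the assumption that large values of the statistic indicate a poor fit; it is not the package's testC, and boot_pvalue and the toy statistic are hypothetical.

```r
# Hypothetical sketch of a parametric-bootstrap goodness-of-fit p value.
# TS() and rnull() below are illustrative stand-ins, not MDgof functions.
boot_pvalue <- function(dta, rnull, TS, B = 500) {
  tobs <- TS(dta)                    # statistic for the observed data
  tsim <- replicate(B, TS(rnull()))  # statistics for data simulated under H0
  # assumes large values of the statistic are evidence against H0
  c(statistic = tobs, p.value = mean(tsim >= tobs))
}

# H0: both coordinates are independent U(0,1), sample size 100
set.seed(42)
TS <- function(x) sum((colMeans(x) - 0.5)^2)  # toy statistic
rnull <- function() matrix(runif(2 * 100), ncol = 2)
res0 <- boot_pvalue(rnull(), rnull, TS)                            # data from H0
res1 <- boot_pvalue(matrix(rbeta(200, 2, 1), ncol = 2), rnull, TS) # data from H1
rbind(res0, res1)
```

When parameters are estimated, the same loop applies, but the parameters must be re-estimated from each simulated data set before computing its statistic.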
Estimate the run time of a test statistic routine
Description
Estimates the run time of a test statistic routine.
Usage
timecheck(dta, TS, typeTS, TSextra)
Arguments
dta |
data set |
TS |
test statistic routine |
typeTS |
format (type) of TS |
TSextra |
list passed to TS |
Value
Mean computation time
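A rough sketch of such a timing check in base R, averaging the elapsed time reported by system.time() over a few calls. The package lists microbenchmark among its imports, so timecheck may well use that instead; mean_runtime and the example statistic are hypothetical.

```r
# Hypothetical sketch (not MDgof::timecheck): mean elapsed time of a
# test-statistic routine over several calls, using base R's system.time().
mean_runtime <- function(dta, TS, reps = 10) {
  times <- replicate(reps, system.time(TS(dta))[["elapsed"]])
  mean(times)  # seconds per call
}

set.seed(1)
dta <- matrix(rnorm(2 * 250), ncol = 2)
t_mean <- mean_runtime(dta, function(x) sum(dist(x)))
t_mean
```

Multiplying such an estimate by the number of simulation runs B gives a rough idea of how long a full bootstrap test will take.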