| Type: | Package |
| Title: | Data Driving Multiple Classifier System |
| Version: | 1.0.1 |
| Description: | Provides a novel framework able to automatically develop and deploy an accurate Multiple Classifier System based on the feature-clustering distribution achieved from an input dataset. 'D2MCS' was developed with a focus on four main aspects: (i) the ability to determine an effective method to evaluate the independence of features, (ii) the identification of the optimal number of feature clusters, (iii) the training and tuning of ML models and (iv) the execution of voting schemes to combine the outputs of each classifier comprising the Multiple Classifier System. |
| Date: | 2022-08-22 |
| License: | GPL-3 |
| URL: | https://github.com/drordas/D2MCS |
| BugReports: | https://github.com/drordas/D2MCS/issues |
| Depends: | R (≥ 4.2) |
| Imports: | caret, devtools, dplyr, FSelector, ggplot2, ggrepel, gridExtra, infotheo, mccr, mltools, ModelMetrics, questionr, recipes, R6, tictoc, varhandle |
| Suggests: | grDevices, knitr, rmarkdown, testthat (≥ 3.0.2) |
| VignetteBuilder: | knitr |
| RoxygenNote: | 7.2.1 |
| Encoding: | UTF-8 |
| NeedsCompilation: | no |
| Config/testthat/edition: | 2 |
| Packaged: | 2022-08-23 11:11:05 UTC; Maite |
| Author: | David Ruano-Ordás [aut, ctb], Miguel Ferreiro-Díaz [aut, cre], José Ramón Méndez [aut, ctb], University of Vigo [cph] |
| Maintainer: | Miguel Ferreiro-Díaz <miguel.ferreiro.diaz@gmail.com> |
| Repository: | CRAN |
| Date/Publication: | 2022-08-23 11:40:02 UTC |
Computes the Accuracy measure.
Description
Computes the ratio of number of correct predictions to the total number of input samples.
Details
Accuracy = (Number of Correct Predictions) / (Total Number of Predictions)
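As a minimal sketch of the ratio above, the following base R snippet computes accuracy from a confusion matrix. The matrix values and the helper name `accuracy` are hypothetical and do not come from the package; the package itself works on a ConfMatrix object.

```r
# Hypothetical 2x2 confusion matrix: rows are predictions, columns are references.
conf <- matrix(c(40, 5,
                 10, 45),
               nrow = 2, byrow = TRUE,
               dimnames = list(Prediction = c("1", "0"),
                               Reference  = c("1", "0")))

# Correct predictions lie on the diagonal; divide by the total number of predictions.
accuracy <- function(cm) sum(diag(cm)) / sum(cm)

acc <- accuracy(conf)
print(acc)  # 0.85
```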
Super class
D2MCS::MeasureFunction -> Accuracy
Methods
Public methods
Method new()
Method for initializing the object arguments during runtime.
Usage
Accuracy$new(performance.output = NULL)
Arguments
performance.output: An optional ConfMatrix used as basis to compute the performance.
Method compute()
The function computes the Accuracy achieved by the M.L. model.
Usage
Accuracy$compute(performance.output = NULL)
Arguments
performance.output: An optional ConfMatrix parameter to define the type of object used as basis to compute the Accuracy measure.
Details
This function is automatically invoked by the ClassificationOutput object.
Returns
A numeric vector of size 1 or NULL if an error occurred.
Method clone()
The objects of this class are cloneable with this method.
Usage
Accuracy$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
See Also
MeasureFunction, ClassificationOutput,
ConfMatrix.
Plotting feature clusters following bi-class problem.
Description
BinaryPlot implements a basic plot for bi-class problems.
Super class
D2MCS::GenericPlot -> BinaryPlot
Methods
Public methods
Method new()
Empty function used to initialize the object arguments in runtime.
Usage
BinaryPlot$new()
Method plot()
Plots feature-clustering data from a bi-class problem.
Usage
BinaryPlot$plot(summary)
Arguments
summary: A data.frame comprising the elements to be plotted.
Method clone()
The objects of this class are cloneable with this method.
Usage
BinaryPlot$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
Feature-clustering based on ChiSquare method.
Description
Performs feature-clustering based on ChiSquare method.
Super class
D2MCS::GenericHeuristic -> ChiSquareHeuristic
Methods
Public methods
Method new()
Empty function used to initialize the object arguments in runtime.
Usage
ChiSquareHeuristic$new()
Method heuristic()
Function responsible for performing the ChiSquare feature-clustering operation.
Usage
ChiSquareHeuristic$heuristic(col1, col2, column.names = NULL)
Arguments
Returns
A numeric vector of length 1 or NA if an error occurs.
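The snippet below is a rough base R sketch of a chi-squared score between two columns that returns a single numeric value, or NA when the computation fails. It is not the package's implementation: the helper name `chi_square_score` is hypothetical, and returning the test statistic (rather than, say, the p-value) is an assumption.

```r
# Hypothetical helper: chi-squared statistic between two discrete columns.
chi_square_score <- function(col1, col2) {
  tryCatch({
    # Small expected counts trigger an approximation warning; silence it here.
    test <- suppressWarnings(stats::chisq.test(table(col1, col2)))
    unname(test$statistic)
  }, error = function(e) NA_real_)  # mirror the documented "NA if an error occurs"
}

a <- c(0, 0, 1, 1, 0, 1, 0, 1)
b <- c(0, 0, 1, 1, 0, 1, 1, 0)

score <- chi_square_score(a, b)        # numeric vector of length 1
bad   <- chi_square_score(a, b[1:3])   # length mismatch raises an error, so NA
```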
Method clone()
The objects of this class are cloneable with this method.
Usage
ChiSquareHeuristic$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
Implementation of Majority Voting.
Description
Implementation of the parliamentary 'majority voting' procedure. The majority class value is defined as final class. All class values have the same importance.
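The procedure described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: the `votes` data, the helper `majority_vote`, and resolving ties with a fixed `class.tie` value are all hypothetical.

```r
# Hypothetical class predictions: one column per cluster, one row per instance.
votes <- data.frame(cluster1 = c("1", "0", "1", "0"),
                    cluster2 = c("1", "0", "0", "1"),
                    cluster3 = c("0", "0", "1", "1"),
                    cluster4 = c("1", "1", "0", "0"),
                    stringsAsFactors = FALSE)

majority_vote <- function(row, classes = c("1", "0"), class.tie = "1") {
  # Every vote counts the same, regardless of the cluster it comes from.
  counts <- table(factor(row, levels = classes))
  winners <- names(counts)[counts == max(counts)]
  # A single winner is the majority class; otherwise fall back to class.tie.
  if (length(winners) == 1) winners else class.tie
}

final <- apply(votes, 1, majority_vote)
print(unname(final))  # "1" "0" "1" "1"
```

Rows three and four are ties (two votes each), so they receive the tie-breaking class.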
Super class
D2MCS::SimpleVoting -> ClassMajorityVoting
Methods
Public methods
Inherited methods
Method new()
Method for initializing the object arguments during runtime.
Usage
ClassMajorityVoting$new(cutoff = 0.5, class.tie = NULL, majority.class = NULL)
Arguments
cutoff: A character vector defining the minimum probability used to perform a positive classification. If not defined, 0.5 is used as the default value.
class.tie: A character used to define the target class value used when a tie is found. If NULL, the positive class value will be assigned.
majority.class: A character defining the value of the majority class. If NULL, the same value as in the training stage will be used.
Method getMajorityClass()
The function returns the value of the majority class.
Usage
ClassMajorityVoting$getMajorityClass()
Returns
A character vector of length 1 with the name of the majority class.
Method getClassTie()
The function gets the class value assigned to solve ties.
Usage
ClassMajorityVoting$getClassTie()
Returns
A character vector of length 1.
Method execute()
The function implements the majority voting procedure.
Usage
ClassMajorityVoting$execute(predictions, verbose = FALSE)
Arguments
predictions: A ClusterPredictions object containing all the predictions achieved for each cluster.
verbose: A logical value to specify if more verbosity is needed.
Method clone()
The objects of this class are cloneable with this method.
Usage
ClassMajorityVoting$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
See Also
D2MCS, ClassMajorityVoting,
ClassWeightedVoting, ProbAverageVoting,
ProbAverageWeightedVoting, ProbBasedMethodology
Implementation of the Weighted Voting scheme.
Description
A new implementation of ClassMajorityVoting where
each class value has different values (weights).
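A weighted vote can be sketched in base R as shown below. The sketch assumes one weight per cluster and a positive decision when the positive share of the total weight reaches the cutoff; both the decision rule and the names `votes`, `weights` and `weighted_vote` are hypothetical and may differ from what the package does.

```r
# Hypothetical class predictions: one row per instance, one column per cluster.
votes <- matrix(c("1", "0", "0",
                  "0", "1", "1",
                  "1", "1", "0"),
                nrow = 3, byrow = TRUE,
                dimnames = list(NULL, c("cluster1", "cluster2", "cluster3")))

# Hypothetical weights, one per cluster.
weights <- c(cluster1 = 0.6, cluster2 = 0.3, cluster3 = 0.1)

weighted_vote <- function(row, weights, positive = "1", negative = "0",
                          cutoff = 0.5) {
  # Share of the total weight that voted for the positive class.
  pos.share <- sum(weights[row == positive]) / sum(weights)
  if (pos.share >= cutoff) positive else negative
}

final <- apply(votes, 1, weighted_vote, weights = weights)
print(unname(final))  # "1" "0" "1"
```

In the second row two of three clusters vote positive, yet the result is negative because those clusters carry only 0.4 of the weight.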
Super class
D2MCS::SimpleVoting -> ClassWeightedVoting
Methods
Public methods
Inherited methods
Method new()
Method for initializing the object arguments during runtime.
Usage
ClassWeightedVoting$new(cutoff = 0.5, weights = NULL)
Arguments
Method getWeights()
The function returns the weights used to perform the voting scheme.
Usage
ClassWeightedVoting$getWeights()
Returns
A numeric vector.
Method setWeights()
The function allows changing the value of the weights.
Usage
ClassWeightedVoting$setWeights(weights)
Arguments
weights: A numeric vector containing the new weights.
Method execute()
The function implements the cluster-weighted majority voting procedure.
Usage
ClassWeightedVoting$execute(predictions, verbose = FALSE)
Arguments
predictions: A ClusterPredictions object containing all the predictions achieved for each cluster.
verbose: A logical value to specify if more verbosity is needed.
Method clone()
The objects of this class are cloneable with this method.
Usage
ClassWeightedVoting$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
See Also
D2MCS, ClassMajorityVoting,
ClassWeightedVoting, ProbAverageVoting,
ProbAverageWeightedVoting, ProbBasedMethodology
D2MCS Classification Output.
Description
Allows computing the classification performance values achieved
by D2MCS. The class is automatically created when D2MCS
classification method is invoked.
Methods
Public methods
Method new()
Method for initializing the object arguments during runtime.
Usage
ClassificationOutput$new(voting.schemes, models)
Arguments
voting.schemes: A list containing the voting schemes used (inherited from VotingStrategy).
models: A list containing the Model objects used during the classification stage.
Method getMetrics()
The function returns the measures used during training stage.
Usage
ClassificationOutput$getMetrics()
Returns
A character vector or NULL if training was not performed.
Method getPositiveClass()
The function gets the name of the positive class used for training/classification.
Usage
ClassificationOutput$getPositiveClass()
Returns
A character vector of size 1.
Method getModelInfo()
The function compiles all the information concerning the M.L. models used during training/classification.
Usage
ClassificationOutput$getModelInfo(metrics = NULL)
Arguments
metrics: A character vector defining the metrics used during training/classification.
Returns
A list with the information of each M.L. model.
Method getPerformances()
The function is used to compute the performance of D2MCS.
Usage
ClassificationOutput$getPerformances( test.set, measures, voting.names = NULL, metric.names = NULL, cutoff.values = NULL )
Arguments
test.set: A Subset object used to compute the performance.
measures: A character vector with the measures to be used to compute performance value (inherited from MeasureFunction).
voting.names: A character vector with the name of the voting schemes to analyze the performance. If not defined, all the voting schemes used during classification stage will be taken into account.
metric.names: A character containing the measures used during training stage. If not defined, all training metrics used during classification will be taken into account.
cutoff.values: A character vector defining the minimum probability used to perform a positive classification. If not defined, all cutoffs used during classification stage will be taken into account.
dir.path: A character vector with location where the plot will be saved.
Returns
A list of performance values.
Method savePerformances()
The function is used to save the computed performance values into a CSV file.
Usage
ClassificationOutput$savePerformances( dir.path, test.set, measures, voting.names = NULL, metric.names = NULL, cutoff.values = NULL )
Arguments
dir.path: A character vector with location where the plot will be saved.
test.set: A Subset object used to compute the performance.
measures: A character vector with the measures to be used to compute performance value (inherited from MeasureFunction).
voting.names: A character vector with the name of the voting schemes to analyze the performance. If not defined, all the voting schemes used during classification stage will be taken into account.
metric.names: A character containing the measures used during training stage. If not defined, all training metrics used during classification will be taken into account.
cutoff.values: A character vector defining the minimum probability used to perform a positive classification. If not defined, all cutoffs used during classification stage will be taken into account.
Method plotPerformances()
The function graphically visualizes the computed performance.
Usage
ClassificationOutput$plotPerformances( dir.path, test.set, measures, voting.names = NULL, metric.names = NULL, cutoff.values = NULL )
Arguments
dir.path: A character vector with location where the plot will be saved.
test.set: A Subset object used to compute the performance.
measures: A character vector with the measures to be used to compute performance value (inherited from MeasureFunction).
voting.names: A character vector with the name of the voting schemes to analyze the performance. If not defined, all the voting schemes used during classification stage will be taken into account.
metric.names: A character containing the measures used during training stage. If not defined, all training metrics used during classification will be taken into account.
cutoff.values: A character vector defining the minimum probability used to perform a positive classification. If not defined, all cutoffs used during classification stage will be taken into account.
Method getPredictions()
The function is used to obtain the computed predictions.
Usage
ClassificationOutput$getPredictions( voting.names = NULL, metric.names = NULL, cutoff.values = NULL, type = NULL, target = NULL, filter = FALSE )
Arguments
voting.names: A character vector with the name of the voting schemes to analyze the performance. If not defined, all the voting schemes used during classification stage will be taken into account.
metric.names: A character containing the measures used during training stage. If not defined, all training metrics used during classification will be taken into account.
cutoff.values: A character vector defining the minimum probability used to perform a positive classification. If not defined, all cutoffs used during classification stage will be taken into account.
type: A character to define which type of predictions should be returned. If not defined, all types of predictions will be returned. Conversely, if "prob" or "raw" is defined, the computed 'probabilistic' or 'class' values are returned, respectively.
target: A character defining the value of the positive class.
filter: A logical value used to specify if only predictions matching the target value should be returned or not. If TRUE the function returns only the predictions matching the target value. Conversely if FALSE (by default) the function returns all the predictions.
Returns
A PredictionOutput object.
Method savePredictions()
The function saves the predictions into a CSV file.
Usage
ClassificationOutput$savePredictions( dir.path, voting.names = NULL, metric.names = NULL, cutoff.values = NULL, type = NULL, target = NULL, filter = FALSE )
Arguments
dir.path: A character vector defining the location of the CSV file.
voting.names: A character vector with the name of the voting schemes to analyze the performance. If not defined, all the voting schemes used during classification stage will be taken into account.
metric.names: A character containing the measures used during training stage. If not defined, all training metrics used during classification will be taken into account.
cutoff.values: A character vector defining the minimum probability used to perform a positive classification. If not defined, all cutoffs used during classification stage will be taken into account.
type: A character to define which type of predictions should be returned. If not defined, all types of predictions will be returned. Conversely, if "prob" or "raw" is defined, the computed 'probabilistic' or 'class' values are returned, respectively.
target: A character defining the value of the positive class.
filter: A logical value used to specify if only predictions matching the target value should be returned or not. If TRUE the function returns only the predictions matching the target value. Conversely if FALSE (by default) the function returns all the predictions.
Method clone()
The objects of this class are cloneable with this method.
Usage
ClassificationOutput$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
Manages the predictions achieved on a cluster.
Description
Stores the predictions achieved by the best M.L. model of each cluster.
Methods
Public methods
Method new()
Method for initializing the object arguments during runtime.
Usage
ClusterPredictions$new(class.values, positive.class)
Arguments
Method add()
The function is used to add the prediction achieved by a specific M.L. model.
Usage
ClusterPredictions$add(prediction)
Arguments
prediction: A Prediction object containing the computed predictions.
Method get()
The function returns the predictions placed at a specific position.
Usage
ClusterPredictions$get(position)
Arguments
position: A numeric value indicating the position of the predictions to be obtained.
Returns
A Prediction object.
Method getAll()
The function returns all the predictions.
Usage
ClusterPredictions$getAll()
Returns
A list containing all computed predictions.
Method size()
The function returns the number of computed predictions.
Usage
ClusterPredictions$size()
Returns
A numeric value.
Method getPositiveClass()
The function gets the value of the positive class.
Usage
ClusterPredictions$getPositiveClass()
Returns
A character vector of size 1.
Method getClassValues()
The function returns all the values of the target class.
Usage
ClusterPredictions$getClassValues()
Returns
A character vector containing all target values.
Method clone()
The objects of this class are cloneable with this method.
Usage
ClusterPredictions$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
See Also
D2MCS, ClassificationOutput,
Prediction
Abstract class to compute the class prediction based on combination between metrics.
Description
Abstract class used as a template to define new customized strategies to combine the class predictions made by different metrics.
Methods
Public methods
Method new()
Method for initializing the object arguments during runtime.
Usage
CombinedMetrics$new(required.metrics)
Arguments
required.metrics: A character vector of length greater than 2 with the name of the required metrics.
Method getRequiredMetrics()
The function returns the required metrics that will participate in the combined metric process.
Usage
CombinedMetrics$getRequiredMetrics()
Returns
A character vector of length greater than 2 with the name of the required metrics.
Method getFinalPrediction()
Function used to implement the strategy to obtain the final prediction based on different metrics.
Usage
CombinedMetrics$getFinalPrediction( raw.pred, prob.pred, positive.class, negative.class )
Arguments
raw.pred: A character list of length greater than 2 with the class value of the predictions made by the metrics.
prob.pred: A numeric list of length greater than 2 with the probability of the predictions made by the metrics.
positive.class: A character with the value of the positive class.
negative.class: A character with the value of the negative class.
Returns
A logical value indicating if the instance is predicted as positive class or not.
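Since CombinedMetrics is an abstract template, a concrete strategy has to supply its own combination rule. The base R function below sketches one possible rule with the same argument names as getFinalPrediction(): an instance is flagged as positive only when every metric predicts the positive class. The rule, the function name `get_final_prediction` and the metric names are hypothetical, and the function is a plain closure rather than an R6 subclass.

```r
# Hypothetical rule: predict positive only when all metrics agree on it.
# prob.pred and negative.class are unused by this rule; they are kept only to
# mirror the documented signature.
get_final_prediction <- function(raw.pred, prob.pred,
                                 positive.class, negative.class) {
  all(unlist(raw.pred) == positive.class)
}

agree <- get_final_prediction(raw.pred = list(MCC = "1", PPV = "1"),
                              prob.pred = list(MCC = 0.9, PPV = 0.7),
                              positive.class = "1", negative.class = "0")

disagree <- get_final_prediction(raw.pred = list(MCC = "1", PPV = "0"),
                                 prob.pred = list(MCC = 0.9, PPV = 0.4),
                                 positive.class = "1", negative.class = "0")

print(c(agree, disagree))  # TRUE FALSE
```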
Method clone()
The objects of this class are cloneable with this method.
Usage
CombinedMetrics$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
Implementation of Combined Voting.
Description
Calculates the final prediction by combining the results of the predictions of different metrics obtained through a SimpleVoting class.
Super class
D2MCS::VotingStrategy -> CombinedVoting
Methods
Public methods
Inherited methods
Method new()
Method for initializing the object arguments during runtime.
Usage
CombinedVoting$new(voting.schemes, combined.metrics, methodology, metrics)
Arguments
voting.schemes: A list of elements inherited from SimpleVoting.
combined.metrics: An object defining the metrics used to combine the voting schemes. The object must inherit from the CombinedMetrics class.
methodology: An object specifying the methodology used to execute the combined voting. Object inherited from the Methodology object.
metrics: A character vector with the name of the metrics used to perform the combined voting operations. Metrics should be previously defined during training stage.
Method getCombinedMetrics()
The function returns the metrics used to combine the metrics results.
Usage
CombinedVoting$getCombinedMetrics()
Returns
An object inherited from CombinedMetrics class.
Method getMethodology()
The function gets the methodology used to execute the combined votings.
Usage
CombinedVoting$getMethodology()
Returns
An object inherited from Methodology class.
Method getFinalPred()
The function returns the predictions obtained after executing the combined-voting methodology.
Usage
CombinedVoting$getFinalPred(type = NULL, target = NULL, filter = NULL)
Arguments
type: A character to define which type of predictions should be returned. If not defined, all types of predictions will be returned. Conversely, if "prob" or "raw" is defined, the computed 'probabilistic' or 'class' values are returned, respectively.
target: A character defining the value of the positive class.
filter: A logical value used to specify if only predictions matching the target value should be returned or not. If TRUE the function returns only the predictions matching the target value. Conversely if FALSE (by default) the function returns all the predictions.
Returns
A data.frame with the computed predictions.
Method execute()
The function implements the combined voting scheme.
Usage
CombinedVoting$execute(predictions, verbose = FALSE)
Arguments
predictions: A ClusterPredictions object containing the predictions computed for each cluster.
verbose: A logical value to specify if more verbosity is needed.
Method clone()
The objects of this class are cloneable with this method.
Usage
CombinedVoting$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
See Also
D2MCS, ClassMajorityVoting,
ClassWeightedVoting, ProbAverageVoting,
ProbAverageWeightedVoting, ProbBasedMethodology,
SimpleVoting
Confusion matrix wrapper.
Description
Creates an R6 confusion matrix from the caret confusionMatrix object.
Methods
Public methods
Method new()
Method to create a confusion matrix object from a
caret confusionMatrix
Usage
ConfMatrix$new(confMatrix)
Arguments
confMatrix: A caret confusionMatrix argument.
Method getConfusionMatrix()
The function obtains the confusionMatrix following the same structure as defined in the caret package.
Usage
ConfMatrix$getConfusionMatrix()
Returns
A confusionMatrix object.
Method getTP()
The function is used to compute the number of True Positive values achieved.
Usage
ConfMatrix$getTP()
Returns
A numeric vector of size 1.
Method getTN()
The function computes the True Negative values.
Usage
ConfMatrix$getTN()
Returns
A numeric vector of size 1.
Method getFN()
The function returns the number of Type II errors (False Negative).
Usage
ConfMatrix$getFN()
Returns
A numeric vector of size 1.
Method getFP()
The function returns the number of Type I errors (False Positive).
Usage
ConfMatrix$getFP()
Returns
A numeric vector of size 1.
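The four counts returned by the getters above can be illustrated with a plain base R contingency table. This sketch does not use ConfMatrix or caret; the `predicted` and `reference` vectors are made up, and "1" is assumed to be the positive class.

```r
# Hypothetical predictions and ground truth; "1" is taken as the positive class.
predicted <- factor(c("1", "1", "0", "0", "1", "0", "1", "0"), levels = c("1", "0"))
reference <- factor(c("1", "0", "0", "1", "1", "0", "1", "0"), levels = c("1", "0"))

# Rows are predictions, columns are reference values.
cm <- table(Prediction = predicted, Reference = reference)

tp <- cm["1", "1"]  # predicted positive, actually positive
tn <- cm["0", "0"]  # predicted negative, actually negative
fp <- cm["1", "0"]  # Type I error: predicted positive, actually negative
fn <- cm["0", "1"]  # Type II error: predicted negative, actually positive

print(c(TP = tp, TN = tn, FP = fp, FN = fn))  # 3 3 1 1
```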
Method clone()
The objects of this class are cloneable with this method.
Usage
ConfMatrix$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
See Also
D2MCS, MeasureFunction,
ClassificationOutput
Data Driven Multiple Classifier System.
Description
The class is responsible for managing the whole process. Concretely, it builds the M.L. models (optimizing model hyperparameters), selects the best M.L. model for each cluster and executes the classification stage.
Methods
Public methods
Method new()
The function is used to initialize all parameters needed to build a Multiple Classifier System.
Usage
D2MCS$new( dir.path, num.cores = NULL, socket.type = "PSOCK", outfile = NULL, serialize = FALSE )
Arguments
dir.path: A character defining the location where the trained models should be saved.
num.cores: An optional numeric value specifying the number of CPU cores used for training the models (only if parallelization is allowed). If not defined (num.cores - 2) cores will be used.
socket.type: A character value defining the type of socket used to communicate with the workers. The default type, "PSOCK", calls makePSOCKcluster. Type "FORK" calls makeForkCluster. For more information see makeCluster.
outfile: Where to direct the stdout and stderr connection output from the workers. "" indicates no redirection (which may only be useful for workers on the local machine). Defaults to '/dev/null'.
serialize: A logical value. If TRUE (default) serialization will use XDR: where large amounts of data are to be transferred and all the nodes are little-endian, communication may be substantially faster if this is set to false.
Method train()
The function is responsible for performing the M.L. model training stage.
Usage
D2MCS$train( train.set, train.function, num.clusters = NULL, model.recipe = DefaultModelFit$new(), ex.classifiers = c(), ig.classifiers = c(), metrics = NULL, saveAllModels = FALSE )
Arguments
train.set: A Trainset object used as training input for the M.L. models.
train.function: A TrainFunction defining the training configuration options.
num.clusters: A numeric value used to define the number of clusters from the Trainset that should be utilized during the training stage. If not defined, all clusters will be taken into account for training.
model.recipe: An unprepared recipe object inherited from the GenericModelFit class.
ex.classifiers: A character vector containing the name of the M.L. models used in training stage. See getModelInfo and https://topepo.github.io/caret/available-models.html for more information about all the available models.
ig.classifiers: A character vector containing the name of the M.L. models that should be ignored when performing the training stage. See getModelInfo and https://topepo.github.io/caret/available-models.html for more information about all the available models.
metrics: A character vector containing the metrics used to perform the M.L. model hyperparameter optimization during the training stage. See SummaryFunction, UseProbability and NoProbability for more information.
saveAllModels: A logical parameter. TRUE saves all trained models, while FALSE saves only the M.L. model achieving the best performance on each cluster.
Returns
A TrainOutput object containing all the information
computed during the training stage.
Method classify()
The function is responsible for executing the classification stage.
Usage
D2MCS$classify(train.output, subset, voting.types, positive.class = NULL)
Arguments
train.output: The TrainOutput object computed in the train stage.
subset: A Subset containing the data to be classified.
voting.types: A list containing SingleVoting or CombinedVoting objects.
positive.class: An optional character parameter used to define the positive class value.
Returns
A ClassificationOutput with all the values computed
during classification stage.
Method getAvailableModels()
The function obtains all the available M.L. models.
Usage
D2MCS$getAvailableModels()
Returns
A data.frame containing the information of the available M.L. models.
Method clone()
The objects of this class are cloneable with this method.
Usage
D2MCS$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
Examples
# Specify the random number generation
set.seed(1234)
## Create Dataset Handler object.
loader <- DatasetLoader$new()
## Load 'hcc-data-complete-balanced.csv' dataset file.
data <- loader$load(filepath = system.file(file.path("examples",
"hcc-data-complete-balanced.csv"),
package = "D2MCS"),
header = TRUE, normalize.names = TRUE)
## Get column names
data$getColumnNames()
## Split data into 4 partitions keeping balance ratio of 'Class' column.
data$createPartitions(num.folds = 4, class.balance = "Class")
## Create a subset comprising the first 2 partitions for clustering purposes.
cluster.subset <- data$createSubset(num.folds = c(1, 2), class.index = "Class",
positive.class = "1")
## Create a subset comprising second and third partitions for training purposes.
train.subset <- data$createSubset(num.folds = c(2, 3), class.index = "Class",
positive.class = "1")
## Create a subset comprising last partitions for testing purposes.
test.subset <- data$createSubset(num.folds = 4, class.index = "Class",
positive.class = "1")
## Distribute the features into clusters using MCC heuristic.
distribution <- SimpleStrategy$new(subset = cluster.subset,
heuristic = MCCHeuristic$new())
distribution$execute()
## Get the best achieved distribution
distribution$getBestClusterDistribution()
## Create a train set from the computed clustering distribution
train.set <- distribution$createTrain(subset = train.subset)
## Not run:
## Initialization of D2MCS configuration parameters.
## - Defining training operation.
## + 10-fold cross-validation
## + Use only 1 CPU core.
## + Seed was set to ensure straightforward reproducibility of experiments.
trFunction <- TwoClass$new(method = "cv", number = 10, savePredictions = "final",
classProbs = TRUE, allowParallel = TRUE,
verboseIter = FALSE, seed = 1234)
## - Specify the models to be trained
ex.classifiers <- c("ranger", "lda", "lda2")
## Initialize D2MCS
d2mcs <- D2MCS$new(dir.path = tempdir(),
                   num.cores = 1)
## Execute training stage for using 'MCC' and 'PPV' measures to optimize model hyperparameters.
trained.models <- d2mcs$train(train.set = train.set,
train.function = trFunction,
ex.classifiers = ex.classifiers,
metrics = c("MCC", "PPV"))
## Execute classification stage using two different voting schemes
predictions <- d2mcs$classify(train.output = trained.models,
subset = test.subset,
voting.types = c(
SingleVoting$new(voting.schemes = c(ClassMajorityVoting$new(),
ClassWeightedVoting$new()),
metrics = c("MCC", "PPV"))))
## Compute the performance of each voting scheme using PPV and MCC measures.
predictions$getPerformances(test.subset, measures = list(MCC$new(), PPV$new()))
## Execute classification stage using multiple voting schemes (simple and combined)
predictions <- d2mcs$classify(train.output = trained.models,
subset = test.subset,
voting.types = c(
SingleVoting$new(voting.schemes = c(ClassMajorityVoting$new(),
ClassWeightedVoting$new()),
metrics = c("MCC", "PPV")),
CombinedVoting$new(voting.schemes = ClassMajorityVoting$new(),
combined.metrics = MinimizeFP$new(),
methodology = ProbBasedMethodology$new(),
metrics = c("MCC", "PPV"))))
## Compute the performance of each voting scheme using PPV and MCC measures.
predictions$getPerformances(test.subset, measures = list(MCC$new(), PPV$new()))
## End(Not run)
Iterator over a Subset object
Description
Creates a DIterator object to iterate over the
Subset.
Methods
Public methods
Method new()
Method for initializing the object arguments during runtime.
Usage
DIterator$new(data, chunk.size, verbose)
Arguments
data: A data.frame structure to be iterated.
chunk.size: An integer value indicating the size of chunks taken over each iteration. By default chunk.size is defined as 10000.
verbose: A logical value to specify if more verbosity is needed.
Method getNext()
Gets the next chunk of data. Each iteration returns as many instances (data.frame rows) as chunk.size. However, if the remaining data is smaller than chunk.size, all the remaining data is returned. Conversely, NULL is returned when there is no more pending data. By default chunk.size is defined as 10000.
Usage
DIterator$getNext()
Returns
A data.frame, or NULL if all the data have been previously returned.
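The chunking behaviour of getNext() and isLast() can be sketched with a small closure in base R. This is an illustration only: `make_chunk_iterator` is a hypothetical helper, not the DIterator class, and it keeps its position in a local variable instead of an R6 field.

```r
# Hypothetical closure-based iterator that yields consecutive row chunks.
make_chunk_iterator <- function(data, chunk.size = 10000) {
  start <- 1  # index of the first row that has not been returned yet
  list(
    getNext = function() {
      if (start > nrow(data)) return(NULL)       # no more pending data
      end <- min(start + chunk.size - 1, nrow(data))
      chunk <- data[start:end, , drop = FALSE]   # last chunk may be shorter
      start <<- end + 1
      chunk
    },
    isLast = function() start > nrow(data)
  )
}

it <- make_chunk_iterator(data.frame(x = 1:25), chunk.size = 10)
sizes <- integer(0)
while (!is.null(chunk <- it$getNext())) {
  sizes <- c(sizes, nrow(chunk))
}
print(sizes)        # 10 10 5
print(it$isLast())  # TRUE
```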
Method isLast()
Checks if the DIterator object reached the end
of the data.frame
Usage
DIterator$isLast()
Returns
A logical value indicating if the end of data.frame has been reached.
Method finalize()
Destroys the DIterator object.
Usage
DIterator$finalize()
Method clone()
The objects of this class are cloneable with this method.
Usage
DIterator$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
Simple Dataset handler.
Description
Creates a valid simple dataset object.
Methods
Public methods
Method new()
Method for initializing the object arguments during runtime.
Usage
Dataset$new( filepath, header = TRUE, sep = ",", skip = 0, normalize.names = FALSE, string.as.factor = FALSE, ignore.columns = NULL )
Arguments
filepath: The name of the file which the data are to be read from. Each row of the table appears as one line of the file. If it does not contain an _absolute_ path, the file name is _relative_ to the current working directory, 'getwd()'.
header: A logical value indicating whether the file contains the names of the variables as its first line. If missing, the value is determined from the file format: 'header' is set to 'TRUE' if and only if the first row contains one fewer field than the number of columns.
sep: The field separator character. Values on each line of the file are separated by this character.
skip: Defines the number of header lines that should be skipped.
normalize.names: A logical value indicating whether the column names should be automatically renamed to ensure R compatibility.
string.as.factor: A logical value indicating if character columns should be converted to factors (default = FALSE).
ignore.columns: Specify the columns from the input file that should be ignored.
Method getColumnNames()
Get the name of the columns comprising the dataset.
Usage
Dataset$getColumnNames()
Returns
A character vector with the name of each column.
Method getDataset()
Gets the full dataset.
Usage
Dataset$getDataset()
Returns
A data.frame with all the loaded information.
Method getNcol()
Obtains the number of columns present in the dataset.
Usage
Dataset$getNcol()
Returns
An integer of length 1 or NULL
Method getNrow()
Obtains the number of rows present in the dataset.
Usage
Dataset$getNrow()
Returns
An integer of length 1 or NULL
Method getRemovedColumns()
Gets the columns removed or ignored.
Usage
Dataset$getRemovedColumns()
Returns
A list containing the names of the removed columns.
Method cleanData()
Removes data.frame columns matching some criterion.
Usage
Dataset$cleanData(remove.funcs = NULL, remove.na = TRUE, remove.const = FALSE)
Arguments
Method removeColumns()
Applies the cleanData function over a specific set of columns.
Usage
Dataset$removeColumns( columns, remove.funcs = NULL, remove.na = FALSE, remove.const = FALSE )
Arguments
columns: Set of columns (numeric or character) where the removal operation should be applied.
remove.funcs: A vector of functions used to define which columns must be removed.
remove.na: A logical value indicating whether NA values should be removed.
remove.const: A logical value used to indicate if constant values should be removed.
Method createPartitions()
Creates a k-folds partition from the initial dataset.
Usage
Dataset$createPartitions( num.folds = NULL, percent.folds = NULL, class.balance = NULL )
Arguments
Method createSubset()
Creates a Subset for testing or classification purposes. A target class should be provided for testing purposes.
Usage
Dataset$createSubset( num.folds = NULL, opts = list(remove.na = TRUE, remove.const = FALSE), class.index = NULL, positive.class = NULL )
Arguments
num.folds: A numeric defining the number of folds that should be used to build the Subset.
opts: A list with optional parameters. Valid arguments are remove.na (removes columns with NA values) and remove.const (ignores columns with constant values).
class.index: A numeric value identifying the column representing the target class.
positive.class: Defines the positive class value.
Returns
A Subset object.
Method createTrain()
Creates a set for training purposes. A class should be defined to guarantee full compatibility with supervised models.
Usage
Dataset$createTrain( class.index, positive.class, num.folds = NULL, opts = list(remove.na = TRUE, remove.const = FALSE) )
Arguments
class.index: A numeric value identifying the column representing the target class.
positive.class: Defines the positive class value.
num.folds: A numeric defining the number of folds that should be used to build the Subset.
opts: A list with optional parameters. Valid arguments are remove.na (removes columns with NA values) and remove.const (ignores columns with constant values).
Returns
A Trainset object.
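The num.folds argument of createPartitions(), createSubset() and createTrain() refers to a k-fold split of the rows. The following base-R sketch illustrates one way such a class-balanced split can be built. The helper make_folds() and its seed argument are hypothetical and are not part of D2MCS; how the package partitions data internally is not documented here.

```r
# Hypothetical helper (not part of D2MCS): assign every row to one of
# num.folds folds while keeping class proportions similar across folds.
make_folds <- function(classes, num.folds, seed = 1) {
  set.seed(seed)
  fold.of.row <- integer(length(classes))
  for (idx in split(seq_along(classes), classes)) {
    # Shuffle the rows of this class, then deal them out round-robin.
    shuffled <- idx[sample.int(length(idx))]
    fold.of.row[shuffled] <- rep_len(seq_len(num.folds), length(shuffled))
  }
  # Return a list with one vector of row indices per fold.
  split(seq_along(classes), factor(fold.of.row, levels = seq_len(num.folds)))
}

classes <- factor(rep(c("yes", "no"), times = c(12, 8)))
folds <- make_folds(classes, num.folds = 4)
lengths(folds)
```

Each element of folds holds the row indices of one fold, so a data.frame can be subset with, for example, data[folds[[1]], ].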
See Also
Dataset creation.
Description
Wrapper class able to automatically create a Dataset or HDDataset according to the input data.
Methods
Public methods
Method new()
Empty function used to initialize the object arguments in runtime.
Usage
DatasetLoader$new()
Method load()
Stores the input source into a Dataset or HDDataset type object.
Usage
DatasetLoader$load( filepath, header = TRUE, sep = ",", skip.lines = 0, normalize.names = FALSE, string.as.factor = FALSE, ignore.columns = NULL )
Arguments
filepath: The name of the file which the data are to be read from. Each row of the table appears as one line of the file. If it does not contain an _absolute_ path, the file name is _relative_ to the current working directory, 'getwd()'.
header: A logical value indicating whether the file contains the names of the variables as its first line. If missing, the value is determined from the file format: 'header' is set to 'TRUE' if and only if the first row contains one fewer field than the number of columns.
sep: The field separator character. Values on each line of the file are separated by this character.
skip.lines: Defines the number of header lines that should be skipped.
normalize.names: A logical value indicating whether the column names should be automatically renamed to ensure R compatibility.
string.as.factor: A logical value indicating whether character columns should be converted to factors (default = FALSE).
ignore.columns: Specifies the columns from the input file that should be ignored.
Returns
A Dataset or HDDataset object.
See Also
Examples
## Not run:
library(D2MCS)
# Create the DatasetLoader object.
loader <- DatasetLoader$new()
# Load the example input file shipped with the package.
data <- loader$load(filepath = system.file(file.path("examples",
                                                     "hcc-data-complete-balanced.csv"),
                                           package = "D2MCS"),
                    header = TRUE, normalize.names = TRUE)
## End(Not run)
Default model fitting implementation.
Description
Creates the default recipe and formula objects used in the model training stage.
Super class
D2MCS::GenericModelFit -> DefaultModelFit
Methods
Public methods
Method new()
Method for initializing the object arguments during runtime.
Usage
DefaultModelFit$new()
Method createFormula()
The function is responsible for creating a formula for the M.L. model.
Usage
DefaultModelFit$createFormula(instances, class.name, simplify = FALSE)
Arguments
instances: A data.frame containing the instances used to create the recipe.
class.name: A character vector representing the name of the target class.
simplify: A logical argument defining whether the formula should be generated as simply as possible.
Returns
A formula object.
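As an illustration of what a formula built from instances and class.name can look like, the sketch below uses only base R. The helper build_formula() is hypothetical, and the interpretation of simplify (a compact "class ~ ." form versus an explicit list of predictors) is an assumption, not the documented behaviour of DefaultModelFit.

```r
# Hypothetical helper (not part of D2MCS): build a model formula from a
# data.frame and the name of its target column.
build_formula <- function(instances, class.name, simplify = FALSE) {
  if (simplify) {
    # Compact form: every remaining column is treated as a predictor.
    as.formula(paste(class.name, "~ ."))
  } else {
    # Explicit form: list each predictor by name.
    predictors <- setdiff(names(instances), class.name)
    reformulate(predictors, response = class.name)
  }
}

instances <- data.frame(a = 1:3, b = c(2.5, 1.0, 4.2), Class = c("x", "y", "x"))
explicit.formula <- build_formula(instances, "Class")
compact.formula <- build_formula(instances, "Class", simplify = TRUE)
print(explicit.formula)
print(compact.formula)
```

Both results are ordinary formula objects, so they can be passed to any modelling function that accepts a formula and a data argument.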
Method createRecipe()
The function is responsible for creating a recipe with five operations over the data: step_zv, step_nzv, step_corr, step_center, step_scale.
Usage
DefaultModelFit$createRecipe(instances, class.name)
Arguments
instances: A data.frame containing the instances used to create the recipe.
class.name: A character vector representing the name of the target class.
Details
This function is automatically invoked by D2MCS during the model training stage.
Returns
An object of class recipe.
Method clone()
The objects of this class are cloneable with this method.
Usage
DefaultModelFit$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
See Also
Clustering strategy based on dependency between features.
Description
Features are distributed according to their independence values. The strategy is divided into two steps. The first step forms groups with the features that are most dependent on each other, and also identifies the features that are independent from all the others. The second step tries different numbers of clusters until the best one is found. These clusters are formed by inserting the independent features identified previously and by distributing the features of each group formed in the first step across separate clusters. In this way, the strategy seeks to ensure that the features found in the same cluster are as independent from each other as possible.
Details
The strategy is suitable only for binary and real features. Other features are automatically grouped into a specific cluster named 'unclustered'. This class requires that the StrategyConfiguration type object implements the following methods:
- getBinaryCutoff(): Defines the interval used to consider the dependency between binary features.
- getRealCutoff(): Defines the cutoff used to consider the dependency between real features.
- tiebreak(feature, clus.candidates, fea.dep.dist.clus, corpus, heuristic, class, class.name): Resolves the ties between two (or more) features.
- qualityOfCluster(clusters, metrics): Determines the quality of a cluster.
- isImprovingClustering(clusters.deltha): Indicates whether the clustering improves as the number of clusters increases.
An example implementation, with the description of each parameter, is the DependencyBasedStrategyConfiguration class.
Super class
D2MCS::GenericClusteringStrategy -> DependencyBasedStrategy
Methods
Public methods
Inherited methods
Method new()
Method for initializing the object parameters during runtime.
Usage
DependencyBasedStrategy$new( subset, heuristic, configuration = DependencyBasedStrategyConfiguration$new() )
Arguments
subset: The Subset used to apply the feature-clustering strategy.
heuristic: The heuristic used to compute the relevance of each feature. Must inherit from the GenericHeuristic abstract class.
configuration: Optional parameter to customize configuration parameters for the strategy. Must inherit from the StrategyConfiguration abstract class.
Method execute()
Function responsible for performing the dependency-based feature clustering strategy over the defined Subset.
Usage
DependencyBasedStrategy$execute(verbose = TRUE)
Arguments
verbose: A logical value to specify if more verbosity is needed.
Method getDistribution()
Function used to obtain a specific cluster distribution.
Usage
DependencyBasedStrategy$getDistribution( num.clusters = NULL, num.groups = NULL, include.unclustered = FALSE )
Arguments
Returns
A list with the features comprising a specific clustering distribution.
Method createTrain()
The function is used to create a Trainset object from a specific clustering distribution.
Usage
DependencyBasedStrategy$createTrain( subset, num.clusters = NULL, num.groups = NULL, include.unclustered = FALSE )
Arguments
subset: The Subset object used as a basis to create the train set (see Trainset class).
num.clusters: A numeric value to select the number of clusters (define the distribution).
num.groups: A single or numeric vector value to identify a specific group that forms the clustering distribution.
include.unclustered: A logical value to determine if unclustered features should be included.
Details
If num.clusters and num.groups are not defined, the best clustering distribution is used to create the train set.
Method plot()
The function is responsible for creating a plot to visualize the clustering distribution.
Usage
DependencyBasedStrategy$plot(dir.path = NULL, file.name = NULL)
Arguments
dir.path: An optional argument to define the name of the directory where the exported plot will be saved. If not defined, the file path will be automatically assigned to the current working directory, 'getwd()'.
file.name: A character to define the name of the PDF file where the plot is exported.
Method saveCSV()
The function is used to save the clustering distribution to a CSV file.
Usage
DependencyBasedStrategy$saveCSV( dir.path = NULL, name = NULL, num.clusters = NULL )
Arguments
dir.path: The name of the directory to save the CSV file.
name: Defines the name of the CSV file.
num.clusters: An optional parameter to select the number of clusters to be saved. If not defined, all cluster distributions will be saved.
Method clone()
The objects of this class are cloneable with this method.
Usage
DependencyBasedStrategy$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
See Also
GenericClusteringStrategy,
StrategyConfiguration,
DependencyBasedStrategyConfiguration
Custom Strategy Configuration handler for the DependencyBasedStrategy strategy.
Description
Defines the default configuration parameters for the DependencyBasedStrategy strategy.
Super class
D2MCS::StrategyConfiguration -> DependencyBasedStrategyConfiguration
Methods
Public methods
Method new()
Method for initializing the object arguments during runtime.
Usage
DependencyBasedStrategyConfiguration$new( binaryCutoff = 0.6, realCutoff = 0.6, tiebreakMethod = "lfdc", metric = "dep.tar" )
Arguments
binaryCutoff: The numeric value of the binary cutoff.
realCutoff: The numeric value of the real cutoff.
tiebreakMethod: The character value of the tie-break method. The two tie-break methods available are "lfdc" (less dependence cluster with the features) and "ltdc" (less dependence cluster with the target). These methods are used to add the features to the candidate feature clusters.
metric: The character value of the metric whose mean is used to obtain the quality of a cluster. The two metrics available are "dep.tar" (dependence of cluster features on the target) and "dep.fea" (dependence between cluster features).
Method minNumClusters()
Function used to return the minimum number of cluster distributions used. By default the minimum is set to 2.
Usage
DependencyBasedStrategyConfiguration$minNumClusters(...)
Arguments
...: Further arguments passed down to the minNumClusters function.
Returns
A numeric vector of length 1.
Method maxNumClusters()
The function is responsible for returning the maximum number of cluster distributions used. By default the maximum is set to 50.
Usage
DependencyBasedStrategyConfiguration$maxNumClusters(...)
Arguments
...: Further arguments passed down to the maxNumClusters function.
Returns
A numeric vector of length 1.
Method getBinaryCutoff()
Gets the cutoff to consider the dependency between binary features.
Usage
DependencyBasedStrategyConfiguration$getBinaryCutoff()
Returns
The numeric value of binary cutoff.
Method getRealCutoff()
Gets the cutoff to consider the dependency between real features.
Usage
DependencyBasedStrategyConfiguration$getRealCutoff()
Returns
The numeric value of real cutoff.
Method setBinaryCutoff()
Sets the cutoff to consider the dependency between binary features.
Usage
DependencyBasedStrategyConfiguration$setBinaryCutoff(cutoff)
Arguments
cutoff: The new numeric value of the binary cutoff.
Method setRealCutoff()
Sets the cutoff to consider the dependency between real features.
Usage
DependencyBasedStrategyConfiguration$setRealCutoff(cutoff)
Arguments
cutoff: The new numeric value of the real cutoff.
Method tiebreak()
The function solves the ties between two (or more) features.
Usage
DependencyBasedStrategyConfiguration$tiebreak( feature, clus.candidates, fea.dep.dist.clus, corpus, heuristic, class, class.name )
Arguments
feature: A character containing the name of the feature.
clus.candidates: A single or numeric vector value to identify the candidate groups to insert the feature.
fea.dep.dist.clus: A list containing the groups chosen for the features.
corpus: A data.frame containing the features of the initial data.
heuristic: The heuristic used to compute the relevance of each feature. Must inherit from the GenericHeuristic abstract class.
class: A character vector containing all the values of the target class.
class.name: A character value representing the name of the target class.
Method qualityOfCluster()
The function determines the quality of a cluster.
Usage
DependencyBasedStrategyConfiguration$qualityOfCluster(clusters, metrics)
Arguments
Returns
A numeric vector of length 1.
Method isImprovingClustering()
The function indicates whether the clustering improves as the number of clusters increases.
Usage
DependencyBasedStrategyConfiguration$isImprovingClustering(clusters.deltha)
Arguments
clusters.deltha: A numeric vector value with the quality values of the built clusters.
Returns
A numeric vector of length 1.
Method clone()
The objects of this class are cloneable with this method.
Usage
DependencyBasedStrategyConfiguration$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
See Also
StrategyConfiguration,
DependencyBasedStrategy
Handles training of M.L. models
Description
Manages the executed M.L. models.
Methods
Public methods
Method new()
Method for initializing the object arguments during runtime.
Usage
ExecutedModels$new(dir.path)
Arguments
dir.path: The location where the executed models will be saved.
Method getNames()
The function is used to obtain the name of the ML model that achieved the best performance during the training stage.
Usage
ExecutedModels$getNames()
Returns
A character vector of length 1 or NULL if no ML model has been trained.
Method getBest()
The function is responsible for returning the model achieving the best performance value during the training stage.
Usage
ExecutedModels$getBest()
Returns
A Model object.
Method add()
The function inserts a new model to the list of executed models.
Usage
ExecutedModels$add(model, keep.best = TRUE)
Arguments
Method exist()
The function is used to discern if a specific model has been executed previously.
Usage
ExecutedModels$exist(model.name)
Arguments
model.name: A character vector with the name of the model to check for existence.
Returns
A logical value. TRUE if the model exists and FALSE otherwise.
Method size()
The function is used to compute the number of executed ML models.
Usage
ExecutedModels$size()
Returns
A numeric vector of size 1.
Method save()
The function is responsible for saving the information of all executed models into a hidden file.
Usage
ExecutedModels$save()
Method delete()
The function removes a specific model.
Usage
ExecutedModels$delete(model.name)
Arguments
model.name: A character vector with the name of the model to be removed.
Method clone()
The objects of this class are cloneable with this method.
Usage
ExecutedModels$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
See Also
Iterator over a file.
Description
Creates a FIterator object to iterate over high dimensional files.
Details
Use the HDDataset class to ensure the creation of a valid FIterator object.
Methods
Public methods
Method new()
Method for initializing the object arguments during runtime.
Usage
FIterator$new(config.params, chunk.size, verbose)
Arguments
Method getNext()
Gets the next chunk of data. Each iteration returns as many instances (data.frame rows) as chunk.size. However, if the remaining data is less than the chunk size, all the remaining data is returned. NULL is returned when there is no more pending data. By default, chunk.size is defined as 10000.
Usage
FIterator$getNext()
Returns
A data.frame or NULL if all the data have been previously returned.
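The chunked reading contract described for getNext() (full chunks, then a shorter final chunk, then NULL) can be sketched with a base-R closure over a file connection. The function make_chunk_iterator() is hypothetical and only mirrors the documented behaviour; it is not the FIterator implementation.

```r
# Hypothetical closure-based iterator (not the FIterator implementation).
make_chunk_iterator <- function(path, chunk.size = 10000, sep = ",") {
  con <- file(path, open = "r")
  # Read the header line once; scan() also strips surrounding quotes.
  header <- scan(text = readLines(con, n = 1), what = "", sep = sep, quiet = TRUE)
  finished <- FALSE
  list(
    getNext = function() {
      if (finished) return(NULL)
      lines <- readLines(con, n = chunk.size)
      if (length(lines) < chunk.size) {
        # Fewer lines than requested: the file is exhausted.
        finished <<- TRUE
        close(con)
      }
      if (length(lines) == 0) return(NULL)
      read.table(text = lines, sep = sep, col.names = header)
    },
    isLast = function() finished
  )
}

path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:5, value = (1:5) * 10), path, row.names = FALSE)
it <- make_chunk_iterator(path, chunk.size = 2)
while (!is.null(chunk <- it$getNext())) print(dim(chunk))
```

Only chunk.size rows are held in memory at a time, which is the point of iterating over a high dimensional file instead of loading it whole.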
Method isLast()
Checks if the FIterator object reached the end of the data.frame.
Usage
FIterator$isLast()
Returns
A logical value indicating if the end of data.frame has been reached.
Method finalize()
Destroys the FIterator object.
Usage
FIterator$finalize()
Method clone()
The objects of this class are cloneable with this method.
Usage
FIterator$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
See Also
Computes the False Negative errors.
Description
Computes the ratio of number of Type II errors achieved by the final M.L. model.
Super class
D2MCS::MeasureFunction -> FN
Methods
Public methods
Method new()
Method for initializing the object arguments during runtime.
Usage
FN$new(performance.output = NULL)
Arguments
performance.output: An optional ConfMatrix parameter to define the type of object used to compute the FN measure.
Method compute()
The function computes the FN achieved by the M.L. model.
Usage
FN$compute(performance.output = NULL)
Arguments
performance.output: An optional ConfMatrix parameter to define the type of object used as basis to compute the FN measure.
Details
This function is automatically invoked by the ClassificationOutput framework.
Returns
A numeric vector of size 1 or NULL if an error occurred.
Method clone()
The objects of this class are cloneable with this method.
Usage
FN$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
See Also
MeasureFunction, ClassificationOutput,
ConfMatrix
Computes the False Positive value.
Description
This is the number of individuals with a negative condition for which the test result is positive. The value entered here must be non-negative.
Super class
D2MCS::MeasureFunction -> FP
Methods
Public methods
Method new()
Method for initializing the object arguments during runtime.
Usage
FP$new(performance.output = NULL)
Arguments
performance.output: An optional ConfMatrix parameter to define the type of object used as basis to compute the FP measure.
Method compute()
The function computes the FP achieved by the M.L. model.
Usage
FP$compute(performance.output = NULL)
Arguments
performance.output: An optional ConfMatrix parameter to define the type of object used as basis to compute the FP measure.
Details
This function is automatically invoked by the ClassificationOutput object.
Returns
A numeric vector of size 1 or NULL if an error occurred.
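For reference, the false negative and false positive counts that the FN and FP measures are built on can be obtained from two label vectors in base R. The helper count_errors() is hypothetical; D2MCS itself computes these measures from a ConfMatrix object, whose internals are not shown here.

```r
# Hypothetical helper (not part of D2MCS): count both error types from
# the actual and the predicted class labels.
count_errors <- function(actual, predicted, positive.class) {
  is.pos.actual <- actual == positive.class
  is.pos.pred <- predicted == positive.class
  c(
    FN = sum(is.pos.actual & !is.pos.pred),  # positives predicted as negative
    FP = sum(!is.pos.actual & is.pos.pred)   # negatives predicted as positive
  )
}

actual    <- c("pos", "pos", "pos", "neg", "neg", "neg", "neg")
predicted <- c("pos", "neg", "neg", "neg", "pos", "neg", "neg")
errors <- count_errors(actual, predicted, positive.class = "pos")
print(errors)
```

In this toy input, two positive instances are missed (FN) and one negative instance is flagged as positive (FP).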
Method clone()
The objects of this class are cloneable with this method.
Usage
FP$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
See Also
MeasureFunction, ClassificationOutput,
ConfMatrix
Stores the prediction for a specific voting scheme.
Description
The class is used to store the computed probability after executing a specific voting scheme.
Methods
Public methods
Method new()
Method for initializing the object variables during runtime.
Usage
FinalPred$new()
Method set()
Sets the computed probabilities after executing a specific voting scheme.
Usage
FinalPred$set(prob, raw, class.values, positive.class)
Arguments
Method getProb()
Gets the probabilities of the prediction for a specific voting scheme.
Usage
FinalPred$getProb()
Returns
The vector value of probabilities of the prediction for a specific voting scheme.
Method getRaw()
Gets the raw results of the prediction for a specific voting scheme.
Usage
FinalPred$getRaw()
Returns
The vector value of raw results of the prediction for a specific voting scheme.
Method getClassValues()
Gets the class values (positive class + negative class).
Usage
FinalPred$getClassValues()
Returns
The vector value of class values.
Method getPositiveClass()
Gets the positive class.
Usage
FinalPred$getPositiveClass()
Returns
The character value of positive class.
Method getNegativeClass()
Gets the negative class.
Usage
FinalPred$getNegativeClass()
Returns
The character value of negative class.
Method clone()
The objects of this class are cloneable with this method.
Usage
FinalPred$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
See Also
Prediction, SimpleVoting,
SingleVoting, CombinedVoting,
VotingStrategy
Feature-clustering based on Fisher's Exact Test.
Description
Performs feature-clustering based on Fisher's exact test for testing the null of independence of rows and columns in a contingency table with fixed marginals.
Super class
D2MCS::GenericHeuristic -> FisherTestHeuristic
Methods
Public methods
Method new()
Empty function used to initialize the object arguments in runtime.
Usage
FisherTestHeuristic$new()
Method heuristic()
Performs the Fisher's exact test for testing the null of independence between two columns (col1 and col2).
Usage
FisherTestHeuristic$heuristic(col1, col2, column.names = NULL)
Arguments
Returns
A numeric vector of length 1 or NA if an error occurs.
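Fisher's exact test on two columns is available in base R through stats::fisher.test(). The sketch below returns the p-value of the test, or NA when the test cannot be computed. The helper fisher_p_value() is hypothetical, and the choice of the p-value as the returned score is an assumption; the source only states that a numeric of length 1 or NA is returned.

```r
# Hypothetical helper (not the FisherTestHeuristic implementation).
fisher_p_value <- function(col1, col2) {
  tryCatch(
    # Cross-tabulate both columns and run the exact test on the table.
    stats::fisher.test(table(col1, col2))$p.value,
    # For example, a constant column yields a one-row table and an error.
    error = function(e) NA_real_
  )
}

smoker  <- c(1, 1, 1, 1, 0, 0, 0, 0, 1, 0)
disease <- c(1, 1, 1, 0, 0, 0, 0, 1, 1, 0)
p.value <- fisher_p_value(smoker, disease)
print(p.value)
print(fisher_p_value(rep(1, 10), disease))
```

A small p-value is evidence against independence of the two columns, while the second call shows the NA fallback for a column without variation.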
Method clone()
The objects of this class are cloneable with this method.
Usage
FisherTestHeuristic$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
See Also
Feature-clustering based on GainRatio methodology.
Description
Performs the feature-clustering using entropy-based filters.
Super class
D2MCS::GenericHeuristic -> GainRatioHeuristic
Methods
Public methods
Method new()
Empty function used to initialize the object arguments in runtime.
Usage
GainRatioHeuristic$new()
Method heuristic()
The algorithm finds weights of discrete attributes based on their correlation with a continuous class attribute.
Usage
GainRatioHeuristic$heuristic(col1, col2, column.names = NULL)
Arguments
Returns
A numeric vector of length 1 or NA if an error occurs.
Method clone()
The objects of this class are cloneable with this method.
Usage
GainRatioHeuristic$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
See Also
Dataset, gain.ratio
Abstract Feature Clustering Strategy class.
Description
Abstract class used as a template to ensure the proper definition of new customized clustering strategies.
Details
The GenericClusteringStrategy is an archetype class so it cannot be instantiated.
Methods
Public methods
Method new()
A function responsible for creating a GenericClusteringStrategy object.
Usage
GenericClusteringStrategy$new(subset, heuristic, description, configuration)
Arguments
subset: A Subset object to perform the clustering strategy.
heuristic: The heuristic to be applied. Must inherit from the GenericHeuristic class.
description: A character vector describing the strategy operation.
configuration: Optional customized configuration parameters for the strategy. Must inherit from the StrategyConfiguration abstract class.
Method getDescription()
The function is used to obtain the description of the strategy.
Usage
GenericClusteringStrategy$getDescription()
Returns
A character vector or NULL if not defined.
Method getHeuristic()
The function returns the heuristic applied for the clustering strategy.
Usage
GenericClusteringStrategy$getHeuristic()
Returns
An object inherited from the GenericHeuristic class.
Method getConfiguration()
The function returns the configuration parameters used to perform the clustering strategy.
Usage
GenericClusteringStrategy$getConfiguration()
Returns
An object inherited from the StrategyConfiguration class.
Method getBestClusterDistribution()
The function obtains the best clustering distribution.
Usage
GenericClusteringStrategy$getBestClusterDistribution()
Returns
A list of clusters. Each list element represents a feature group.
Method getUnclustered()
The function is used to return the features that cannot be clustered due to incompatibilities with the used heuristic.
Usage
GenericClusteringStrategy$getUnclustered()
Returns
A character vector containing the unclassified features.
Method execute()
Abstract function responsible for performing the clustering strategy over the defined Subset.
Usage
GenericClusteringStrategy$execute(verbose, ...)
Arguments
verbose: A logical value to specify if more verbosity is needed.
...: Further arguments passed down to the execute function.
Method getDistribution()
Abstract function used to obtain the set of features following a specific clustering distribution.
Usage
GenericClusteringStrategy$getDistribution( num.clusters = NULL, num.groups = NULL, include.unclustered = FALSE )
Arguments
Returns
A list with the features comprising a specific clustering distribution.
Method createTrain()
Abstract function in charge of creating a Trainset object for training purposes.
Usage
GenericClusteringStrategy$createTrain( subset, num.cluster = NULL, num.groups = NULL, include.unclustered = FALSE )
Arguments
Method plot()
Abstract function responsible for creating a plot to visualize the clustering distribution.
Usage
GenericClusteringStrategy$plot(dir.path = NULL, file.name = NULL, ...)
Arguments
dir.path: An optional character argument to define the name of the directory where the exported plot will be saved. If not defined, the file path will be automatically assigned to the current working directory, 'getwd()'.
file.name: The name of the PDF file where the plot is exported.
...: Further arguments passed down to the execute function.
Method saveCSV()
Abstract function to save the clustering distribution to a CSV file.
Usage
GenericClusteringStrategy$saveCSV(dir.path, name, num.clusters = NULL)
Arguments
dir.path: The name of the directory to save the CSV file.
name: Defines the name of the CSV file.
num.clusters: An optional parameter to select the number of clusters to be saved. If not defined, all clusters will be saved.
Method clone()
The objects of this class are cloneable with this method.
Usage
GenericClusteringStrategy$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
See Also
Abstract Feature Clustering heuristic object.
Description
Abstract class used as a template to define new customized clustering heuristics.
Details
The GenericHeuristic is an archetype class so it cannot be instantiated.
Methods
Public methods
Method new()
Empty function used to initialize the object arguments in runtime.
Usage
GenericHeuristic$new()
Method heuristic()
Function used to implement the clustering heuristic.
Usage
GenericHeuristic$heuristic(col1, col2, column.names = NULL, ...)
Arguments
Returns
A numeric vector of length 1.
Method clone()
The objects of this class are cloneable with this method.
Usage
GenericHeuristic$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
See Also
Abstract class for defining model fitting method.
Description
Template to create the recipe or formula objects used in the model training stage.
Methods
Public methods
Method new()
Method for initializing the object arguments during runtime.
Usage
GenericModelFit$new()
Method createFormula()
The function is responsible for creating a formula for the M.L. model.
Usage
GenericModelFit$createFormula(instances, class.name, simplify = TRUE)
Arguments
instances: A data.frame containing the instances used to create the recipe.
class.name: A character vector representing the name of the target class.
simplify: A logical argument defining whether the formula should be generated as simply as possible.
Returns
A formula object.
Method createRecipe()
The function is responsible for creating a recipe for the M.L. model.
Usage
GenericModelFit$createRecipe(instances, class.name)
Arguments
instances: A data.frame containing the instances used to create the recipe.
class.name: A character vector representing the name of the target class.
Returns
An object of class recipe.
Method clone()
The objects of this class are cloneable with this method.
Usage
GenericModelFit$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
See Also
Pseudo-abstract class for creating feature clustering plots.
Description
The GenericPlot implements a basic plot.
Methods
Public methods
Method new()
Empty function used to initialize the object arguments in runtime.
Usage
GenericPlot$new()
Method plot()
Implements a generic plot to visualize basic feature-clustering data.
Usage
GenericPlot$plot(summary)
Arguments
summary: A data.frame comprising the elements to be plotted.
Method clone()
The objects of this class are cloneable with this method.
Usage
GenericPlot$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
See Also
High Dimensional Dataset handler.
Description
Creates a high dimensional dataset object. Only the required instances are loaded in memory to avoid unnecessary use of resources and memory.
Methods
Public methods
Method new()
Method for initializing the object arguments during runtime.
Usage
HDDataset$new( filepath, header = TRUE, sep = ",", skip = 0, normalize.names = FALSE, ignore.columns = NULL )
Arguments
filepath: The name of the file which the data are to be read from. Each row of the table appears as one line of the file. If it does not contain an _absolute_ path, the file name is _relative_ to the current working directory, 'getwd()'.
header: A logical value indicating whether the file contains the names of the variables as its first line. If missing, the value is determined from the file format: 'header' is set to 'TRUE' if and only if the first row contains one fewer field than the number of columns.
sep: The field separator character. Values on each line of the file are separated by this character.
skip: Defines the number of header lines that should be skipped.
normalize.names: A logical value indicating whether the column names should be automatically renamed to ensure R compatibility.
ignore.columns: Specifies the columns from the input file that should be ignored.
Method getColumnNames()
Gets the names of the columns comprising the dataset.
Usage
HDDataset$getColumnNames()
Returns
A character vector with the name of each column.
Method getNcol()
Obtains the number of columns present in the dataset.
Usage
HDDataset$getNcol()
Returns
An integer of length 1 or NULL
Method createSubset()
Creates a blinded HDSubset for classification purposes.
Usage
HDDataset$createSubset(column.id = FALSE, chunk.size = 1e+05)
Arguments
Returns
A HDSubset object.
See Also
Dataset, HDSubset,
DatasetLoader
High Dimensional Subset handler.
Description
Creates a high dimensional subset from a HDDataset object. Only the required instances are loaded in memory to avoid unnecessary use of resources and memory.
Details
Use HDDataset to ensure the creation of a valid HDSubset object.
Methods
Public methods
Method new()
Method for initializing the object arguments during runtime.
Usage
HDSubset$new( file.path, feature.names, feature.id, start.at = 0, sep = ",", chunk.size )
Arguments
file.path: The name of the file which the data are to be read from. Each row of the table appears as one line of the file. If it does not contain an _absolute_ path, the file name is _relative_ to the current working directory, 'getwd()'.
feature.names: A character vector specifying the names of the features that should be included in the HDDataset object.
feature.id: An integer or character indicating the column (number or name respectively) identifier. The default NULL value is valid and means that no identification column is defined.
start.at: A numeric value to identify the reading start position.
sep: The field separator character. Values on each line of the file are separated by this character.
chunk.size: An integer value indicating the size of the chunks taken over each iteration. By default chunk.size is defined as 10000.
Method getColumnNames()
Gets the name of the columns comprising the subset.
Usage
HDSubset$getColumnNames()
Returns
A character vector containing the name of each column.
Method getNcol()
Obtains the number of columns present in the dataset.
Usage
HDSubset$getNcol()
Returns
A numeric value, or 0 if the subset is empty.
Method getID()
Obtains the column identifier.
Usage
HDSubset$getID()
Returns
A character vector of size 1.
Method getIterator()
Creates the FIterator object.
Usage
HDSubset$getIterator(chunk.size = private$chunk.size, verbose = FALSE)
Arguments
Returns
A FIterator object to traverse the HDSubset instances.
Method isBlinded()
Checks if the subset contains a target class.
Usage
HDSubset$isBlinded()
Returns
A logical to specify if the subset contains a target class or not.
Method clone()
The objects of this class are cloneable with this method.
Usage
HDSubset$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
See Also
Feature-clustering based on InformationGain methodology.
Description
Performs the feature-clustering using entropy-based filters.
Super class
D2MCS::GenericHeuristic -> InformationGainHeuristic
Methods
Public methods
Method new()
Empty function used to initialize the object arguments in runtime.
Usage
InformationGainHeuristic$new()
Method heuristic()
The algorithm finds the weights of discrete attributes based on
their correlation with the continuous class attribute. In particular,
Information Gain is computed as H(Class) + H(Attribute) - H(Class, Attribute).
Usage
InformationGainHeuristic$heuristic(col1, col2, column.names = NULL)
Arguments
Returns
A numeric vector of length 1 or NA if an error occurs.
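As a hedged illustration of the expression H(Class) + H(Attribute) - H(Class, Attribute), the following base-R sketch computes it from empirical frequencies. The function names entropy and information_gain are hypothetical, and natural logarithms are an assumption made for the example; this is not the code used by InformationGainHeuristic.

```r
# Shannon entropy of a discrete variable, using natural logarithms
# (assumption made for this example).
entropy <- function(x) {
  p <- table(x) / length(x)
  p <- p[p > 0]
  -sum(p * log(p))
}

# Information gain between an attribute and the class:
# H(Class) + H(Attribute) - H(Class, Attribute).
information_gain <- function(attribute, class) {
  joint <- paste(attribute, class, sep = "|")  # one symbol per (attribute, class) pair
  entropy(class) + entropy(attribute) - entropy(joint)
}

class     <- c("a", "a", "b", "b")
perfect   <- c(1, 1, 0, 0)  # determines the class exactly
unrelated <- c(1, 0, 1, 0)  # carries no information about the class

ig_perfect   <- information_gain(perfect, class)    # H(Class) = log(2), up to rounding
ig_unrelated <- information_gain(unrelated, class)  # 0, up to rounding
```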
Method clone()
The objects of this class are cloneable with this method.
Usage
InformationGainHeuristic$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
See Also
Computes Cohen's Kappa value.
Description
Cohen's Kappa measures the agreement between two raters who each classify N items into C mutually exclusive categories.
Details
\kappa = (p_o - p_e) / (1 - p_e) = 1 - (1 - p_o) / (1 - p_e)
where p_o is the observed agreement and p_e is the agreement expected by chance.
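A minimal base-R sketch of this computation from a square confusion matrix is shown below. The function name cohen_kappa and the example counts are hypothetical; the Kappa class itself obtains its inputs from a ConfMatrix object.

```r
# Cohen's kappa from a square confusion matrix
# (rows: predicted class, columns: reference class).
cohen_kappa <- function(cm) {
  n   <- sum(cm)
  p_o <- sum(diag(cm)) / n                     # observed agreement
  p_e <- sum(rowSums(cm) * colSums(cm)) / n^2  # agreement expected by chance
  (p_o - p_e) / (1 - p_e)
}

cm <- matrix(c(20,  5,
               10, 15), nrow = 2, byrow = TRUE)
kappa_value <- cohen_kappa(cm)  # p_o = 0.7, p_e = 0.5, so kappa = 0.4
```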
Super class
D2MCS::MeasureFunction -> Kappa
Methods
Public methods
Method new()
Method for initializing the object arguments during runtime.
Usage
Kappa$new(performance.output = NULL)
Arguments
performance.output: An optional ConfMatrix used as basis to compute the performance.
Method compute()
The function computes the Kappa achieved by the M.L. model.
Usage
Kappa$compute(performance.output = NULL)
Arguments
performance.output: An optional ConfMatrix parameter to define the type of object used as basis to compute the Kappa measure.
Details
This function is automatically invoked by the ClassificationOutput object.
Returns
A numeric vector of size 1 or NULL if an error occurred.
Method clone()
The objects of this class are cloneable with this method.
Usage
Kappa$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
See Also
MeasureFunction, ClassificationOutput,
ConfMatrix
Feature-clustering based on Kendall Correlation Test.
Description
Performs the feature-clustering using Kendall correlation tests.
Details
The method estimates the association between paired samples and computes a test of the value being zero. Kendall's tau lies in the range [-1, 1], with 0 indicating no association. The method is valid only for bi-class problems.
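For orientation, Kendall's tau and its test can be obtained in base R with stats::cor.test. The sketch below uses made-up numeric data without ties and is only meant to show the statistic; it is not the code executed by KendallHeuristic.

```r
# Made-up paired samples without ties.
x <- c(1, 2, 3, 4, 5)
y <- c(2, 1, 4, 3, 5)

res <- cor.test(x, y, method = "kendall")
tau     <- unname(res$estimate)  # 8 concordant and 2 discordant pairs: (8 - 2) / 10
p_value <- res$p.value           # test of tau being zero
```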
Super class
D2MCS::GenericHeuristic -> KendallHeuristic
Methods
Public methods
Method new()
Empty function used to initialize the object arguments in runtime.
Usage
KendallHeuristic$new()
Method heuristic()
Test for association between paired samples using Kendall's tau value.
Usage
KendallHeuristic$heuristic(col1, col2, column.names = NULL)
Arguments
Returns
A numeric vector of length 1 or NA if an error occurs.
Method clone()
The objects of this class are cloneable with this method.
Usage
KendallHeuristic$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
See Also
Computes the Matthews correlation coefficient.
Description
The Matthews correlation coefficient is used in machine learning as a measure of the quality of binary (two-class) classifications. It takes into account true and false positives and negatives and is generally regarded as a balanced measure which can be used even if the classes are of very different sizes. The MCC is in essence a correlation coefficient between the observed and predicted binary classifications; it returns a value between -1 and +1.
Details
MCC = (TP × TN - FP × FN)/(\sqrt{(TP + FP) × (TP + FN) × (TN + FP) × (TN + FN)})
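The standard definition, MCC = (TP × TN - FP × FN) / sqrt((TP + FP) × (TP + FN) × (TN + FP) × (TN + FN)), can be sketched in base R as follows. The function name mcc, the example counts, and the choice to return NA when the denominator is zero are assumptions made for illustration.

```r
# Matthews correlation coefficient from the four confusion-matrix counts.
mcc <- function(TP, TN, FP, FN) {
  denominator <- sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
  # Assumption: report NA when a marginal total is zero and MCC is undefined.
  if (denominator == 0) return(NA_real_)
  (TP * TN - FP * FN) / denominator
}

mcc_value   <- mcc(TP = 20, TN = 15, FP = 5, FN = 10)  # 250 / sqrt(375000)
mcc_perfect <- mcc(TP = 10, TN = 10, FP = 0, FN = 0)   # 1 for a perfect classifier
```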
Super class
D2MCS::MeasureFunction -> MCC
Methods
Public methods
Method new()
Method for initializing the object arguments during runtime.
Usage
MCC$new(performance.output = NULL)
Arguments
performance.output: An optional ConfMatrix parameter used as basis to compute the MCC measure.
Method compute()
The function computes the MCC achieved by the M.L. model.
Usage
MCC$compute(performance.output = NULL)
Arguments
performance.output: An optional ConfMatrix parameter to define the type of object used as basis to compute the MCC measure.
Details
This function is automatically invoked by the ClassificationOutput object.
Returns
A numeric vector of size 1 or NULL if an error occurred.
Method clone()
The objects of this class are cloneable with this method.
Usage
MCC$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
See Also
MeasureFunction, ClassificationOutput,
ConfMatrix
Feature-clustering based on Matthews Correlation Coefficient score.
Description
Performs the feature-clustering using the MCC score. Valid for both bi-class and multi-class problems.
Super class
D2MCS::GenericHeuristic -> MCCHeuristic
Methods
Public methods
Method new()
Empty function used to initialize the object arguments in runtime.
Usage
MCCHeuristic$new()
Method heuristic()
Calculates the Matthews Correlation Coefficient (MCC) score.
Usage
MCCHeuristic$heuristic(col1, col2, column.names = NULL)
Arguments
Returns
A numeric vector of length 1 or NA if an error occurs.
Method clone()
The objects of this class are cloneable with this method.
Usage
MCCHeuristic$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
See Also
Archetype to define customized measures.
Description
Abstract class used as a template to define new M.L. performance measures.
Details
MeasureFunction is a fully abstract class, so it cannot
be instantiated. To ensure proper operation, the compute method is
automatically invoked by the D2MCS framework when needed.
Methods
Public methods
Method new()
Method for initializing the object arguments during runtime.
Usage
MeasureFunction$new(performance = NULL)
Arguments
performance: An optional ConfMatrix parameter to define the type of object used to compute the measure.
Method compute()
The function implements the metric used to measure the performance achieved by the M.L. model.
Usage
MeasureFunction$compute(performance.output = NULL)
Arguments
performance.output: An optional ConfMatrix parameter to define the type of object used to compute the measure.
Details
This function is automatically invoked by the D2MCS framework.
Returns
A numeric vector of size 1 or NULL if an error occurred.
Method clone()
The objects of this class are cloneable with this method.
Usage
MeasureFunction$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
See Also
Abstract class to compute the probability prediction based on combination between metrics.
Description
Abstract class used as a template to define new customized strategies to combine the probability predictions made by different metrics.
Methods
Public methods
Method new()
Method for initializing the object arguments during runtime.
Usage
Methodology$new(required.metrics)
Arguments
required.metrics: A character vector of length at least 2 with the names of the required metrics.
Method getRequiredMetrics()
The function returns the metrics required by the methodology, which are combined to compute a single value.
Usage
Methodology$getRequiredMetrics()
Returns
A character vector of length at least 2 with the names of the required metrics.
Method compute()
Function to compute the probability of the final prediction based on different metrics.
Usage
Methodology$compute(raw.pred, prob.pred, positive.class, negative.class)
Arguments
raw.pred: A character list of length greater than 2 with the class value of the predictions made by the metrics.
prob.pred: A numeric list of length greater than 2 with the probability of the predictions made by the metrics.
positive.class: A character with the value of the positive class.
negative.class: A character with the value of the negative class.
Returns
A numeric value indicating the probability that the instance is predicted as the positive class.
Method clone()
The objects of this class are cloneable with this method.
Usage
Methodology$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
See Also
Combined metric strategy to minimize FN errors.
Description
Predicts the positive class if any of the metrics predicts it; otherwise, the instance is not assigned to the positive class.
Super class
D2MCS::CombinedMetrics -> MinimizeFN
Methods
Public methods
Inherited methods
Method new()
Method for initializing the object arguments during runtime.
Usage
MinimizeFN$new(required.metrics = c("MCC", "PPV"))
Arguments
required.metrics: A character vector of length at least 2 with the names of the required metrics.
Method getFinalPrediction()
Function to obtain the final prediction based on different metrics.
Usage
MinimizeFN$getFinalPrediction( raw.pred, prob.pred, positive.class, negative.class )
Arguments
raw.pred: A character list of length greater than 2 with the class value of the predictions made by the metrics.
prob.pred: A numeric list of length greater than 2 with the probability of the predictions made by the metrics.
positive.class: A character with the value of the positive class.
negative.class: A character with the value of the negative class.
Returns
A logical value indicating if the instance is predicted as positive class or not.
Method clone()
The objects of this class are cloneable with this method.
Usage
MinimizeFN$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
See Also
Combined metric strategy to minimize FP errors.
Description
Predicts the positive class only if all of the metrics predict it; otherwise, the instance is not assigned to the positive class.
Super class
D2MCS::CombinedMetrics -> MinimizeFP
Methods
Public methods
Inherited methods
Method new()
Method for initializing the object arguments during runtime.
Usage
MinimizeFP$new(required.metrics = c("MCC", "PPV"))
Arguments
required.metrics: A character vector of length at least 2 with the names of the required metrics.
Method getFinalPrediction()
Function to obtain the final prediction based on different metrics.
Usage
MinimizeFP$getFinalPrediction( raw.pred, prob.pred, positive.class, negative.class )
Arguments
raw.pred: A character list of length greater than 2 with the class value of the predictions made by the metrics.
prob.pred: A numeric list of length greater than 2 with the probability of the predictions made by the metrics.
positive.class: A character with the value of the positive class.
negative.class: A character with the value of the negative class.
Returns
A logical value indicating if the instance is predicted as positive class or not.
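The "all metrics" rule of this strategy, and how it differs from the "any metric" rule described for MinimizeFN, can be sketched in a few lines of base R. The prediction values and class labels below are hypothetical, and the snippet does not call getFinalPrediction().

```r
# Hypothetical class predictions produced for the same instance
# by the models tuned for each required metric.
raw.pred <- list(MCC = "spam", PPV = "ham")
positive.class <- "spam"

is_positive <- unlist(raw.pred) == positive.class

predicted_any <- any(is_positive)  # rule described for MinimizeFN: TRUE here
predicted_all <- all(is_positive)  # rule described for MinimizeFP: FALSE here
```

Requiring agreement among all metrics makes a positive prediction harder to obtain, which is why it trades false positives for false negatives.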
Method clone()
The objects of this class are cloneable with this method.
Usage
MinimizeFP$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
See Also
Stores a previously trained M.L. model.
Description
Encapsulates and handles all the information and operations associated with an M.L. model.
Methods
Public methods
Method new()
Method for initializing the object arguments during runtime.
Usage
Model$new(dir.path, model)
Arguments
dir.path: The location where the executed models will be saved.
model: A Model object.
Method isTrained()
The function is used to determine whether the model has already been trained.
Usage
Model$isTrained()
Returns
A logical value. TRUE if the model has been trained and FALSE otherwise.
Method getDir()
The function returns the location path of the specific model.
Usage
Model$getDir()
Returns
A character vector specifying the location of the model.
Method getName()
The function is used to obtain the name of the model.
Usage
Model$getName()
Returns
A character vector with the name of the model.
Method getFamily()
The function gets the family of the model.
Usage
Model$getFamily()
Returns
A character vector representing the family of the ML model.
Method getDescription()
The function returns the description associated with a specific ML model.
Usage
Model$getDescription()
Returns
A character vector with the model description.
Method train()
The function is responsible for performing the model training operation.
Usage
Model$train(train.set, fitting, trFunction, metric, logs)
Arguments
train.set: A data.frame with the data used for training the model.
fitting: The model fitting formula. Must inherit from the GenericModelFit class.
trFunction: An object inherited from TrainFunction used to define how the training acts.
metric: A character vector containing the metrics used to optimize the model parameters.
logs: A character vector containing the path to store the error logs.
Method getTrainedModel()
The function allows obtaining the trained model.
Usage
Model$getTrainedModel()
Returns
A train class.
Method getExecutionTime()
The function is used to compute the time taken to perform the training operation.
Usage
Model$getExecutionTime()
Returns
A numeric vector with length 1.
Method getPerformance()
The function obtains the performance achieved by the model during training stage.
Usage
Model$getPerformance(metric = private$metric)
Arguments
metric: A character used to specify the measure used to compute the performance.
Returns
A numeric value with the performance achieved.
Method getConfiguration()
The function is used to get the configuration parameters achieved by the ML model after the training stage.
Usage
Model$getConfiguration()
Returns
A list object with the configuration parameters.
Method save()
The function is responsible for saving the model to disk as an RDS file.
Usage
Model$save(replace = TRUE)
Arguments
Method remove()
The function is used to delete a model from disk.
Usage
Model$remove()
Method clone()
The objects of this class are cloneable with this method.
Usage
Model$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
See Also
Feature-clustering based on Mutual Information Computation theory.
Description
Performs the feature-clustering using mutual information. Only valid for bi-class problems.
Super class
D2MCS::GenericHeuristic -> MultinformationHeuristic
Methods
Public methods
Method new()
Empty function used to initialize the object arguments in runtime.
Usage
MultinformationHeuristic$new()
Method heuristic()
Mutinformation takes two random variables as input and computes the mutual information in nats according to the entropy estimator method.
Usage
MultinformationHeuristic$heuristic(col1, col2, column.names = NULL)
Arguments
col1: A vector/factor denoting a random variable or a data.frame denoting a random vector where columns contain variables/features and rows contain outcomes/samples.
col2: Another random variable or random vector (vector/factor or data.frame).
column.names: An optional character vector with the names of both columns.
Returns
Returns the mutual information I(X;Y) in nats.
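To make the returned quantity concrete, the sketch below computes I(X;Y) in nats from empirical frequencies using base R only. The function name mutual_information is hypothetical, and the plug-in (empirical) estimator is an assumption; the heuristic itself may rely on a different entropy estimator.

```r
# Empirical mutual information I(X;Y) in nats:
# sum over (x, y) of p(x, y) * log(p(x, y) / (p(x) * p(y))).
mutual_information <- function(x, y) {
  joint    <- table(x, y) / length(x)  # joint distribution p(x, y)
  px       <- rowSums(joint)           # marginal p(x)
  py       <- colSums(joint)           # marginal p(y)
  expected <- outer(px, py)            # p(x) * p(y) under independence
  nz       <- joint > 0                # skip empty cells (0 * log 0 = 0)
  sum(joint[nz] * log(joint[nz] / expected[nz]))
}

x <- c(0, 0, 1, 1)
mi_same  <- mutual_information(x, x)                # H(X) = log(2) for identical variables
mi_indep <- mutual_information(x, c(0, 1, 0, 1))    # 0 for independent variables
```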
Method clone()
The objects of this class are cloneable with this method.
Usage
MultinformationHeuristic$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
See Also
Computes the Negative Predictive Value.
Description
Negative Predictive Values are the proportions of negative results in statistics and diagnostic tests that are true negative results.
Details
NPV = TN / (TN + FN)
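A short worked example with hypothetical confusion-matrix counts shows the arithmetic. PPV, defined in this manual as TP / (TP + FP), is included for comparison.

```r
# Hypothetical confusion-matrix counts.
counts <- c(TP = 20, TN = 15, FP = 5, FN = 10)

npv <- counts[["TN"]] / (counts[["TN"]] + counts[["FN"]])  # 15 / 25
ppv <- counts[["TP"]] / (counts[["TP"]] + counts[["FP"]])  # 20 / 25
```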
Super class
D2MCS::MeasureFunction -> NPV
Methods
Public methods
Method new()
Method for initializing the object arguments during runtime.
Usage
NPV$new(performance.output = NULL)
Arguments
performance.output: An optional ConfMatrix parameter to define the type of object used as basis to compute the NPV measure.
Method compute()
The function computes the NPV achieved by the M.L. model.
Usage
NPV$compute(performance.output = NULL)
Arguments
performance.output: An optional ConfMatrix parameter to define the type of object used as basis to compute the NPV measure.
Details
This function is automatically invoked by the ClassificationOutput object.
Returns
A numeric vector of size 1 or NULL if an error occurred.
Method clone()
The objects of this class are cloneable with this method.
Usage
NPV$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
See Also
MeasureFunction, ClassificationOutput,
ConfMatrix
Compute performance across resamples.
Description
Computes the performance across resamples when class probabilities cannot be computed.
Super class
D2MCS::SummaryFunction -> NoProbability
Methods
Public methods
Inherited methods
Method new()
The function defines, at runtime, the use of five measures: 'Kappa', 'Accuracy', 'TCR_9', 'MCC' and 'PPV'.
Usage
NoProbability$new()
Method execute()
The function computes the performance across resamples using the previously defined measures.
Usage
NoProbability$execute(data, lev = NULL, model = NULL)
Arguments
data: A data.frame containing the data used to compute the performance.
lev: An optional value used to define the levels of the target class.
model: An optional value used to define the M.L. model used.
Returns
A vector of performance estimates.
Method clone()
The objects of this class are cloneable with this method.
Usage
NoProbability$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
See Also
Feature-clustering based on Odds Ratio measure.
Description
Performs the feature-clustering using Odds Ratio methodology. Valid only for bi-class problems.
Super class
D2MCS::GenericHeuristic -> OddsRatioHeuristic
Methods
Public methods
Method new()
Empty function used to initialize the object arguments in runtime.
Usage
OddsRatioHeuristic$new()
Method heuristic()
Calculates the Odds Ratio.
Usage
OddsRatioHeuristic$heuristic(col1, col2, column.names = NULL)
Arguments
Returns
A numeric vector of length 1 or NA if an error occurs.
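As a rough illustration of the quantity involved, the sample odds ratio of two binary columns can be computed from their 2x2 contingency table. The function name odds_ratio and the 0/1 coding are assumptions made for this sketch; OddsRatioHeuristic may estimate the ratio differently.

```r
# Sample odds ratio of two binary (0/1) columns from their 2x2 table:
# (n11 * n00) / (n10 * n01).
odds_ratio <- function(col1, col2) {
  tab <- table(factor(col1, levels = c(0, 1)),
               factor(col2, levels = c(0, 1)))
  # The result is Inf or NaN when an off-diagonal cell is zero.
  (tab["1", "1"] * tab["0", "0"]) / (tab["1", "0"] * tab["0", "1"])
}

col1 <- c(1, 1, 1, 0, 0, 0, 1, 0)
col2 <- c(1, 1, 0, 0, 0, 1, 1, 0)
or_value <- odds_ratio(col1, col2)  # (3 * 3) / (1 * 1)
```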
Method clone()
The objects of this class are cloneable with this method.
Usage
OddsRatioHeuristic$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
See Also
Computes the Positive Predictive Value.
Description
Positive Predictive Values are the proportions of positive results in statistics and diagnostic tests that are true positive results.
Details
PPV = TP / (TP + FP)
Super class
D2MCS::MeasureFunction -> PPV
Methods
Public methods
Method new()
Method for initializing the object arguments during runtime.
Usage
PPV$new(performance.output = NULL)
Arguments
performance.output: An optional ConfMatrix parameter to define the type of object used as basis to compute the PPV measure.
Method compute()
The function computes the PPV achieved by the M.L. model.
Usage
PPV$compute(performance.output = NULL)
Arguments
performance.output: An optional ConfMatrix parameter to define the type of object used as basis to compute the PPV measure.
Details
This function is automatically invoked by the ClassificationOutput object.
Returns
A numeric vector of size 1 or NULL if an error occurred.
Method clone()
The objects of this class are cloneable with this method.
Usage
PPV$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
See Also
MeasureFunction, ClassificationOutput,
ConfMatrix
Feature-clustering based on Pearson Correlation Test.
Description
Performs the feature-clustering using Pearson correlation tests. Valid for both bi-class and multi-class problems.
Details
The test statistic is based on Pearson's product moment correlation coefficient cor(x, y) and follows a t distribution with length(x)-2 degrees of freedom if the samples follow independent normal distributions. If there are at least 4 complete pairs of observations, an asymptotic confidence interval is given based on Fisher's Z transform.
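The relationship between the correlation coefficient, the degrees of freedom and the t statistic can be checked with stats::cor.test on made-up data. This is an illustrative sketch, not the code run by PearsonHeuristic.

```r
# Made-up paired samples.
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2)

res <- cor.test(x, y, method = "pearson")
r   <- unname(res$estimate)   # Pearson's product moment correlation, cor(x, y)
df  <- unname(res$parameter)  # length(x) - 2 degrees of freedom

# The t statistic reported by the test can be recomputed from r and df.
t_manual <- r * sqrt(df / (1 - r^2))
ci <- res$conf.int            # asymptotic interval based on Fisher's Z transform
```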
Super class
D2MCS::GenericHeuristic -> PearsonHeuristic
Methods
Public methods
Method new()
Creates a PearsonHeuristic object.
Usage
PearsonHeuristic$new()
Method heuristic()
Test for association between paired samples using Pearson test.
Usage
PearsonHeuristic$heuristic(col1, col2, column.names = NULL)
Arguments
Returns
A numeric vector of length 1 or NA if an error occurs.
Method clone()
The objects of this class are cloneable with this method.
Usage
PearsonHeuristic$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
See Also
Computes the Precision Value.
Description
Precision is the fraction of relevant instances among the retrieved instances.
Details
precision = TP / (TP + FP)
Super class
D2MCS::MeasureFunction -> Precision
Methods
Public methods
Method new()
Method for initializing the object arguments during runtime.
Usage
Precision$new(performance.output = NULL)
Arguments
performance.output: An optional ConfMatrix parameter to define the type of object used as basis to compute the measure.
Method compute()
The function computes the Precision achieved by the M.L. model.
Usage
Precision$compute(performance.output = NULL)
Arguments
performance.output: An optional ConfMatrix parameter to define the type of object used as basis to compute the Precision measure.
Details
This function is automatically invoked by the ClassificationOutput object.
Returns
A numeric vector of size 1 or NULL if an error occurred.
Method clone()
The objects of this class are cloneable with this method.
Usage
Precision$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
See Also
MeasureFunction, ClassificationOutput,
ConfMatrix
Manages the prediction computed for a specific model.
Description
Obtains predictions from the provided data using a pre-trained model.
Methods
Public methods
Method new()
Method for initializing the object arguments during runtime.
Usage
Prediction$new(model, feature.id = NULL)
Arguments
Method execute()
Calculates predictions of the values passed by parameters using the corresponding model.
Usage
Prediction$execute(pred.values, class.values, positive.class)
Arguments
pred.values: A data.frame containing the values to predict.
class.values: A vector containing the class values.
positive.class: A character value containing the positive class.
Method getPrediction()
The function is used to return the prediction values computed.
Usage
Prediction$getPrediction(type = NULL, target = NULL)
Arguments
Returns
A data.frame with the computed prediction.
Method getModelName()
Gets the model name.
Usage
Prediction$getModelName()
Returns
A character value with the name of the model.
Method getModelPerformance()
Gets the performance of the model.
Usage
Prediction$getModelPerformance()
Returns
The numeric value of the model's performance.
Method clone()
The objects of this class are cloneable with this method.
Usage
Prediction$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
See Also
Encapsulates the achieved predictions.
Description
Class used to encapsulate all the computed predictions to facilitate their access and maintenance.
Methods
Public methods
Method new()
Method for initializing the object arguments during runtime.
Usage
PredictionOutput$new(predictions, type, target)
Arguments
predictions
type: A character to define which type of predictions should be returned. If not defined, all types of probabilities will be returned. Conversely, if "prob" or "raw" is defined, the computed 'probabilistic' or 'class' values are returned.
target: A character defining the value of the positive class.
Method getPredictions()
The function returns the final predictions.
Usage
PredictionOutput$getPredictions()
Returns
A list containing the final predictions or NULL if classification stage was not successfully performed.
Method getType()
The function returns the type of prediction that should be returned. If "prob" or "raw" is defined, the computed 'probabilistic' or 'class' values are returned.
Usage
PredictionOutput$getType()
Returns
A character value.
Method getTarget()
The function returns the value of the target class.
Usage
PredictionOutput$getTarget()
Returns
A character value.
Method clone()
The objects of this class are cloneable with this method.
Usage
PredictionOutput$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
See Also
Implementation of Probabilistic Average voting.
Description
Computes the final prediction as the mean of the probabilities achieved by each prediction.
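The averaging idea can be sketched in base R as follows. The function name prob_average_vote, the class labels, and the decision to treat a mean equal to the cutoff as positive are assumptions made for illustration; the sketch does not use ClusterPredictions objects.

```r
# probs: one row per instance, one column per cluster, holding the
# probability of the positive class predicted for that cluster.
prob_average_vote <- function(probs, cutoff = 0.5,
                              positive = "pos", negative = "neg") {
  avg <- rowMeans(probs)
  data.frame(prob  = avg,
             # Assumption: a mean equal to the cutoff counts as positive.
             class = ifelse(avg >= cutoff, positive, negative),
             stringsAsFactors = FALSE)
}

probs <- cbind(cluster1 = c(0.9, 0.4, 0.2),
               cluster2 = c(0.7, 0.7, 0.1))
votes <- prob_average_vote(probs)  # mean probabilities 0.80, 0.55 and 0.15
```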
Super class
D2MCS::SimpleVoting -> ProbAverageVoting
Methods
Public methods
Inherited methods
Method new()
Method for initializing the object arguments during runtime.
Usage
ProbAverageVoting$new(cutoff = 0.5, class.tie = NULL, majority.class = NULL)
Arguments
cutoff: A numeric value defining the minimum probability used to perform a positive classification. If it is not defined, 0.5 will be used as the default value.
class.tie: A character used to define the target class value used when a tie is found. If NULL, the positive class value will be assigned.
majority.class: A character defining the value of the majority class. If NULL, the same value as in the training stage will be used.
Method getMajorityClass()
The function returns the value of the majority class.
Usage
ProbAverageVoting$getMajorityClass()
Returns
A character vector of length 1 with the name of the majority class.
Method getClassTie()
The function gets the class value assigned to solve ties.
Usage
ProbAverageVoting$getClassTie()
Returns
A character vector of length 1.
Method execute()
The function implements the probabilistic average voting procedure.
Usage
ProbAverageVoting$execute(predictions, verbose = FALSE)
Arguments
predictions: A ClusterPredictions object containing all the predictions achieved for each cluster.
verbose: A logical value to specify if more verbosity is needed.
Method clone()
The objects of this class are cloneable with this method.
Usage
ProbAverageVoting$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
See Also
D2MCS, ClassMajorityVoting,
ClassWeightedVoting, ProbAverageVoting,
ProbAverageWeightedVoting, ProbBasedMethodology
Implementation of Probabilistic Average Weighted voting.
Description
Computes the final prediction as the weighted mean of the probabilities achieved by each cluster prediction. By default, the weights correspond to the performance achieved by the best M.L. model on each cluster.
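For a single instance, the weighted mean can be illustrated with stats::weighted.mean. The probabilities and the per-cluster weights below are hypothetical values chosen so that the weighted and unweighted means differ.

```r
# Hypothetical positive-class probabilities for one instance, one per cluster.
probs   <- c(cluster1 = 0.9, cluster2 = 0.2, cluster3 = 0.4)
# Hypothetical weights, e.g. the performance of the best model on each cluster.
weights <- c(0.8, 0.6, 0.6)

weighted_prob <- weighted.mean(probs, weights)  # (0.72 + 0.12 + 0.24) / 2.0
simple_prob   <- mean(probs)                    # unweighted mean for comparison
```

With these values the better-performing first cluster pulls the combined probability above the unweighted mean, which can change the final class when the result is compared with the cutoff.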
Super class
D2MCS::SimpleVoting -> ProbAverageWeightedVoting
Methods
Public methods
Inherited methods
Method new()
Method for initializing the object arguments during runtime.
Usage
ProbAverageWeightedVoting$new(cutoff = 0.5, class.tie = NULL, weights = NULL)
Arguments
cutoff: A numeric value defining the minimum probability used to perform a positive classification. If it is not defined, 0.5 will be used as the default value.
class.tie: A character used to define the target class value used when a tie is found. If NULL, the positive class value will be assigned.
weights: A numeric vector with the weights of each cluster. If NULL, the performance achieved during training will be used as default.
Method getClassTie()
The function gets the class value assigned to solve ties.
Usage
ProbAverageWeightedVoting$getClassTie()
Returns
A character vector of length 1.
Method getWeights()
The function returns the weights assigned to each cluster.
Usage
ProbAverageWeightedVoting$getWeights()
Returns
A numeric vector with the weights of each cluster.
Method setWeights()
The function allows changing the value of the weights.
Usage
ProbAverageWeightedVoting$setWeights(weights)
Arguments
weights: A numeric vector containing the new weights.
Method execute()
The function implements the cluster-weighted probabilistic voting procedure.
Usage
ProbAverageWeightedVoting$execute(predictions, verbose = FALSE)
Arguments
predictions: A ClusterPredictions object containing all the predictions achieved for each cluster.
verbose: A logical value to specify if more verbosity is needed.
Method clone()
The objects of this class are cloneable with this method.
Usage
ProbAverageWeightedVoting$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
See Also
D2MCS, ClassMajorityVoting,
ClassWeightedVoting, ProbAverageVoting,
ProbAverageWeightedVoting, ProbBasedMethodology
Methodology to obtain the combination of the probability of different metrics.
Description
Calculates the mean of the probabilities of the different metrics.
Super class
D2MCS::Methodology -> ProbBasedMethodology
Methods
Public methods
Inherited methods
Method new()
Method for initializing the object arguments during runtime.
Usage
ProbBasedMethodology$new(required.metrics = c("MCC", "PPV"))
Arguments
required.metrics: A character vector of length at least 2 with the names of the required metrics.
Method compute()
Function to compute the probability of the final prediction based on different metrics.
Usage
ProbBasedMethodology$compute( raw.pred, prob.pred, positive.class, negative.class )
Arguments
raw.pred: A character list of length greater than 2 with the class value of the predictions made by the metrics.
prob.pred: A numeric list of length greater than 2 with the probability of the predictions made by the metrics.
positive.class: A character with the value of the positive class.
negative.class: A character with the value of the negative class.
Returns
A numeric value indicating the probability that the instance is predicted as the positive class.
Method clone()
The objects of this class are cloneable with this method.
Usage
ProbBasedMethodology$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
See Also
Computes the Recall Value.
Description
Recall (also known as sensitivity) is the fraction of the total amount of relevant instances that were actually retrieved.
Details
recall = TP / (TP + FN)
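The sketch below derives recall from two hypothetical label vectors by cross-tabulating them in base R. Precision, TP / (TP + FP), is computed from the same table for contrast; the vectors and the "pos"/"neg" labels are made up for the example.

```r
# Hypothetical reference labels and model predictions.
lv <- c("pos", "neg")
reference <- factor(c("pos", "pos", "pos", "neg", "neg", "neg", "pos", "neg"), levels = lv)
predicted <- factor(c("pos", "pos", "neg", "neg", "pos", "neg", "pos", "pos"), levels = lv)

cm <- table(predicted, reference)
TP <- cm["pos", "pos"]  # predicted positive, actually positive
FN <- cm["neg", "pos"]  # predicted negative, actually positive
FP <- cm["pos", "neg"]  # predicted positive, actually negative

recall    <- TP / (TP + FN)  # 3 of the 4 actual positives were retrieved
precision <- TP / (TP + FP)  # 3 of the 5 positive predictions were correct
```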
Super class
D2MCS::MeasureFunction -> Recall
Methods
Public methods
Method new()
Method for initializing the object arguments during runtime.
Usage
Recall$new(performance.output = NULL)
Arguments
performance.output: An optional ConfMatrix parameter to define the type of object used as basis to compute the measure.
Method compute()
The function computes the Recall achieved by the M.L. model.
Usage
Recall$compute(performance.output = NULL)
Arguments
performance.output: An optional ConfMatrix parameter to define the type of object used as basis to compute the Recall measure.
Details
This function is automatically invoked by the ClassificationOutput object.
Returns
A numeric vector of size 1 or NULL if an error occurred.
Method clone()
The objects of this class are cloneable with this method.
Usage
Recall$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
See Also
MeasureFunction, ClassificationOutput,
ConfMatrix
Computes the Sensitivity Value.
Description
Sensitivity is a measure of the proportion of actual positive cases that got predicted as positive (or true positive).
Details
Sensitivity = TP / (TP + FN)
Super class
D2MCS::MeasureFunction -> Sensitivity
Methods
Public methods
Method new()
Method for initializing the object arguments during runtime.
Usage
Sensitivity$new(performance.output = NULL)
Arguments
performance.output: An optional ConfMatrix parameter to define the type of object used as basis to compute the Sensitivity measure.
Method compute()
The function computes the Sensitivity achieved by the M.L. model.
Usage
Sensitivity$compute(performance.output = NULL)
Arguments
performance.output: An optional ConfMatrix parameter to define the type of object used as basis to compute the Sensitivity measure.
Details
This function is automatically invoked by the ClassificationOutput object.
Returns
A numeric vector of size 1 or NULL if an error occurred.
Method clone()
The objects of this class are cloneable with this method.
Usage
Sensitivity$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
See Also
MeasureFunction, ClassificationOutput,
ConfMatrix
Simple feature clustering strategy.
Description
Features are sorted in descending order of the relevance value obtained after applying a specific heuristic. Next, features are distributed into N clusters following a card-dealing methodology. Finally, the distribution with the highest homogeneity is selected as the best distribution.
Details
The strategy is suitable for all features that are valid for the indicated heuristic. Invalid features are automatically grouped into a specific cluster named 'unclustered'.
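The card-dealing step described above can be pictured with a short base R sketch. It assumes that "card-dealing" means assigning the sorted features to clusters in round-robin order, which is one plausible reading and not a confirmed description of the package internals. The relevance values and the helper name `deal_features` are hypothetical.

```r
# Hypothetical relevance scores for six features (illustrative values only).
relevance <- c(f1 = 0.91, f2 = 0.15, f3 = 0.64, f4 = 0.48, f5 = 0.77, f6 = 0.30)

# Assumed reading of "card-dealing": sort by relevance (descending), then
# hand out features to the n clusters one at a time, like dealing cards.
deal_features <- function(relevance, n) {
  ordered <- names(sort(relevance, decreasing = TRUE))
  cluster_id <- rep_len(seq_len(n), length(ordered))
  unname(split(ordered, cluster_id))
}

clusters <- deal_features(relevance, n = 3)
print(clusters)
```

Repeating this for several values of n and scoring each result would then allow the most homogeneous distribution to be chosen, as the description states.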
Super class
D2MCS::GenericClusteringStrategy -> SimpleStrategy
Methods
Public methods
Inherited methods
Method new()
Method for initializing the object arguments during runtime.
Usage
SimpleStrategy$new( subset, heuristic, configuration = StrategyConfiguration$new() )
Arguments
subset: The Subset used to apply the feature-clustering strategy.
heuristic: The heuristic used to compute the relevance of each feature. Must inherit from the GenericHeuristic abstract class.
configuration: Optional parameter to customize configuration parameters for the strategy. Must inherit from the StrategyConfiguration abstract class.
Method execute()
Function responsible for performing the clustering strategy over the defined Subset.
Usage
SimpleStrategy$execute(verbose = FALSE)
Arguments
verbose: A logical value to specify if more verbosity is needed.
Method getBestClusterDistribution()
The function obtains the best clustering distribution.
Usage
SimpleStrategy$getBestClusterDistribution()
Returns
A list of clusters. Each list element represents a feature group.
Method getUnclustered()
The function is used to return the features that cannot be clustered due to incompatibilities with the used heuristic.
Usage
SimpleStrategy$getUnclustered()
Returns
A character vector containing the unclassified features.
Method getDistribution()
Function used to obtain a specific cluster distribution.
Usage
SimpleStrategy$getDistribution( num.clusters = NULL, num.groups = NULL, include.unclustered = FALSE )
Arguments
Returns
A list with the features comprising a specific clustering distribution.
Method createTrain()
The function is used to create a Trainset
object from a specific clustering distribution.
Usage
SimpleStrategy$createTrain( subset, num.clusters = NULL, num.groups = NULL, include.unclustered = FALSE )
Arguments
subset: The Subset object used as a basis to create the train set (see Trainset class).
num.clusters: A numeric value to select the number of clusters (define the distribution).
num.groups: A single or numeric vector value to identify a specific group that forms the clustering distribution.
include.unclustered: A logical value to determine if unclustered features should be included.
Details
If num.clusters and num.groups are not defined, the best clustering distribution is used to create the train set.
Returns
A Trainset object.
Method plot()
The function is responsible for creating a plot to visualize the clustering distribution.
Usage
SimpleStrategy$plot(dir.path = NULL, file.name = NULL)
Arguments
dir.path: An optional argument to define the name of the directory where the exported plot will be saved. If not defined, the file path will be automatically assigned to the current working directory, 'getwd()'.
file.name: A character to define the name of the PDF file where the plot is exported.
Method saveCSV()
The function is used to save the clustering distribution to a CSV file.
Usage
SimpleStrategy$saveCSV(dir.path, name = NULL, num.clusters = NULL)
Arguments
dir.path: The name of the directory to save the CSV file.
name: Defines the name of the CSV file.
num.clusters: An optional parameter to select the number of clusters to be saved. If not defined, all cluster distributions will be saved.
Method clone()
The objects of this class are cloneable with this method.
Usage
SimpleStrategy$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
See Also
GenericClusteringStrategy,
StrategyConfiguration
Abstract class to define simple voting schemes.
Description
Abstract class used as a template to define new customized simple voting schemes.
Methods
Public methods
Method new()
Method for initializing the object arguments during runtime.
Usage
SimpleVoting$new(cutoff = NULL)
Arguments
cutoff: A character vector defining the minimum probability used to perform a positive classification. If not defined, 0.5 is used as the default value.
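To make the role of the cutoff concrete, the following base R sketch turns positive-class probabilities into class labels. The probabilities and label names are hypothetical, and treating a probability equal to the cutoff as positive is an assumption rather than documented package behaviour.

```r
# Hypothetical probabilities of the positive class for five instances.
positive_prob <- c(0.10, 0.50, 0.49, 0.93, 0.72)
cutoff <- 0.5  # default value stated in the documentation

# Assumption: a probability equal to the cutoff counts as positive.
predicted <- ifelse(positive_prob >= cutoff, "positive", "negative")
print(predicted)
```

Raising the cutoff makes positive classifications more conservative; lowering it has the opposite effect.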
Method getCutoff()
The function obtains the minimum probabilistic value used to perform a positive classification.
Usage
SimpleVoting$getCutoff()
Returns
A numeric value.
Method getFinalPred()
The function is used to return the prediction values computed by a voting strategy.
Usage
SimpleVoting$getFinalPred(type = NULL, target = NULL, filter = NULL)
Arguments
type: A character defining which type of predictions should be returned. If not defined, all types of predictions are returned. Conversely, if 'prob' or 'raw' is defined, the computed 'probabilistic' or 'class' values are returned, respectively.
target: A character defining the value of the positive class.
filter: A logical value used to specify whether only predictions matching the target value should be returned. If TRUE, the function returns only the predictions matching the target value. Conversely, if FALSE (the default), the function returns all the predictions.
Returns
A FinalPred object.
Method execute()
Abstract function used to implement the operation of the voting scheme.
Usage
SimpleVoting$execute(predictions, verbose = FALSE)
Arguments
predictions: A ClusterPredictions object containing all the predictions achieved for each cluster.
verbose: A logical value to specify if more verbosity is needed.
Method clone()
The objects of this class are cloneable with this method.
Usage
SimpleVoting$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
See Also
D2MCS, ClassMajorityVoting,
ClassWeightedVoting, ProbAverageVoting,
ProbAverageWeightedVoting, ProbBasedMethodology,
CombinedVoting
Manages the execution of Simple Votings.
Description
The class is responsible for initializing and executing voting schemes. Additionally, to ensure proper operation, the class automatically checks the compatibility of the defined voting schemes.
Super class
D2MCS::VotingStrategy -> SingleVoting
Methods
Public methods
Inherited methods
Method new()
The function initializes the object arguments during runtime.
Usage
SingleVoting$new(voting.schemes, metrics)
Arguments
voting.schemes: A vector of voting schemes inheriting from the SimpleVoting class.
metrics: A list containing the metrics used as basis to perform the voting strategy.
Method execute()
The function is used to execute all the previously defined (and compatible) voting schemes.
Usage
SingleVoting$execute(predictions, verbose = FALSE)
Arguments
predictions: A ClusterPredictions object containing all the predictions computed in the classification stage.
verbose: A logical value to specify if more verbosity is needed.
Method clone()
The objects of this class are cloneable with this method.
Usage
SingleVoting$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
See Also
D2MCS, SimpleVoting,
CombinedVoting
Feature-clustering based on Spearman Correlation Test.
Description
Performs the feature-clustering using Spearman's rho statistic.
Details
Spearman's rho statistic is used to estimate a rank-based measure of association. This test may be used when the data do not necessarily come from a bivariate normal distribution.
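The rank-based nature of the statistic can be seen with base R's cor() function. This is only a sketch on hypothetical data; it does not show how SpearmanHeuristic computes its value internally.

```r
# Monotonic but non-linear relationship between two hypothetical features.
col1 <- c(1, 2, 3, 4, 5)
col2 <- col1^3

# Rank-based association (Spearman) versus linear association (Pearson).
rho <- cor(col1, col2, method = "spearman")
pearson <- cor(col1, col2, method = "pearson")
print(c(spearman = rho, pearson = pearson))
```

Because the ranks of both columns agree perfectly, Spearman's rho reaches 1 while the Pearson coefficient stays below 1.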
Super class
D2MCS::GenericHeuristic -> SpearmanHeuristic
Methods
Public methods
Method new()
Creates a SpearmanHeuristic object.
Usage
SpearmanHeuristic$new()
Method heuristic()
Test for correlation between paired samples using Spearman rho statistic.
Usage
SpearmanHeuristic$heuristic(col1, col2, column.names = NULL)
Arguments
Returns
A numeric vector of length 1 or NA if an error occurs.
Method clone()
The objects of this class are cloneable with this method.
Usage
SpearmanHeuristic$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
See Also
Computes the Specificity Value.
Description
Specificity is defined as the proportion of actual negatives that are predicted as negative (true negatives). The remaining actual negatives, which are predicted as positive, are the false positives.
Details
Specificity = True Negative / (True Negative + False Positive)
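As a minimal sketch, the formula above can be reproduced in base R from vectors of actual and predicted labels. The label vectors are hypothetical, and the cross-tabulation with table() stands in for the ConfMatrix object used by the package.

```r
# Hypothetical actual and predicted labels for eight instances.
actual    <- factor(c("neg", "neg", "neg", "neg", "pos", "pos", "pos", "neg"),
                    levels = c("neg", "pos"))
predicted <- factor(c("neg", "neg", "pos", "neg", "pos", "neg", "pos", "neg"),
                    levels = c("neg", "pos"))

# Cross-tabulate predictions (rows) against actual labels (columns).
confusion <- table(predicted = predicted, actual = actual)

tn <- confusion["neg", "neg"]  # actual negatives predicted as negative
fp <- confusion["pos", "neg"]  # actual negatives predicted as positive
specificity <- unname(tn / (tn + fp))
print(specificity)
```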
Super class
D2MCS::MeasureFunction -> Specificity
Methods
Public methods
Method new()
Method for initializing the object arguments during runtime.
Usage
Specificity$new(performance.output = NULL)
Arguments
performance.output: An optional ConfMatrix parameter to define the type of object used as basis to compute the measure.
Method compute()
The function computes the Specificity achieved by the M.L. model.
Usage
Specificity$compute(performance.output = NULL)
Arguments
performance.output: An optional ConfMatrix parameter to define the type of object used as basis to compute the Specificity measure.
Details
This function is automatically invoked by the ClassificationOutput object.
Returns
A numeric vector of size 1 or NULL if an error occurred.
Method clone()
The objects of this class are cloneable with this method.
Usage
Specificity$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
See Also
MeasureFunction, ClassificationOutput,
ConfMatrix
Default Strategy Configuration handler.
Description
Define default configuration parameters for the clustering strategies.
Details
The StrategyConfiguration can be used to define the
default configuration parameters for a feature clustering strategy or as an
archetype to define new customized parameters.
Methods
Public methods
Method new()
Empty function used to initialize the object arguments during runtime.
Usage
StrategyConfiguration$new()
Method minNumClusters()
Function used to return the minimum number of cluster distributions used. By default, the minimum is set to 2.
Usage
StrategyConfiguration$minNumClusters(...)
Arguments
...: Further arguments passed down to the minNumClusters function.
Returns
A numeric vector of length 1.
Method maxNumClusters()
The function is responsible for returning the maximum number of cluster distributions used. By default, the maximum is set to 50.
Usage
StrategyConfiguration$maxNumClusters(...)
Arguments
...: Further arguments passed down to the maxNumClusters function.
Returns
A numeric vector of length 1.
Method clone()
The objects of this class are cloneable with this method.
Usage
StrategyConfiguration$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
See Also
DependencyBasedStrategyConfiguration
Classification set.
Description
The Subset is used for testing or classification purposes. If a target class is defined, the Subset can be used for both testing and classification; otherwise, it is only compatible with classification.
Details
Use Dataset to ensure the creation of a valid
Subset object.
Methods
Public methods
Method new()
Method for initializing the object arguments during runtime.
Usage
Subset$new( dataset, class.index = NULL, class.values = NULL, positive.class = NULL, feature.id = NULL )
Arguments
dataset: A fully filled data.frame.
class.index: A numeric value identifying the column representing the target class.
class.values: A character vector containing all the values of the target class.
positive.class: A character value representing the positive class value.
feature.id: A numeric value specifying the column number used as identifier.
Method getColumnNames()
Get the name of the columns comprising the subset.
Usage
Subset$getColumnNames()
Returns
A character vector containing the name of each column.
Method getFeatures()
Gets the values of all features or those indicated by arguments.
Usage
Subset$getFeatures(feature.names = NULL)
Arguments
feature.names: A character vector comprising the name of the features to be obtained.
Returns
A character vector or NULL if subset is empty.
Method getID()
Gets the column name used as identifier.
Usage
Subset$getID()
Returns
A character vector of size 1 or NULL if column id is not defined.
Method getIterator()
Creates the DIterator object.
Usage
Subset$getIterator(chunk.size = private$chunk.size, verbose = FALSE)
Arguments
Returns
A DIterator object to traverse the Subset instances.
Method getClassValues()
Gets all the values of the target class.
Usage
Subset$getClassValues()
Returns
A factor vector with all the values of the target class.
Method getClassBalance()
The function is used to compute the ratio of each class
value in the Subset.
Usage
Subset$getClassBalance(target.value = NULL)
Arguments
target.value: The class value used as reference to perform the comparison.
Returns
A numeric value.
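As a rough illustration of a class ratio, the base R sketch below computes the proportion of instances for each class value. The label vector is hypothetical, and the exact ratio returned by getClassBalance() (for instance, how target.value enters the comparison) is not specified here, so this is only one plausible reading.

```r
# Hypothetical target column with an imbalanced class distribution.
class_values <- factor(c("spam", "ham", "ham", "ham", "spam", "ham", "ham", "ham"))

# Proportion of instances belonging to each class value.
class_ratio <- prop.table(table(class_values))
print(class_ratio)
```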
Method getClassIndex()
The function is used to obtain the index of the column containing the target class.
Usage
Subset$getClassIndex()
Returns
A numeric value.
Method getClassName()
The function is used to obtain the name of the column containing the target class.
Usage
Subset$getClassName()
Returns
A character value.
Method getNcol()
The function is in charge of obtaining the number of columns
comprising the Subset. See ncol for more
information.
Usage
Subset$getNcol()
Returns
An integer of length 1 or NULL.
Method getNrow()
The function is used to determine the number of rows present
in the Subset. See nrow for more information.
Usage
Subset$getNrow()
Returns
An integer of length 1 or NULL.
Method getPositiveClass()
The function returns the value of the positive class.
Usage
Subset$getPositiveClass()
Returns
A character vector of size 1 or NULL if not defined.
Method isBlinded()
The function is used to check if the Subset contains a target class.
Usage
Subset$isBlinded()
Returns
A logical value where TRUE represents the absence of target class and FALSE its presence.
See Also
Dataset, DatasetLoader,
Trainset
Abstract class for computing performance across resamples.
Description
Abstract class used as a template to define customized metrics to compute model performance during training.
Details
This class is an archetype, so it cannot be instantiated.
Methods
Public methods
Method new()
The function carries out the initialization of parameters during runtime.
Usage
SummaryFunction$new(measures)
Arguments
measures: A character vector with the measures used.
Method execute()
Abstract function used to implement the performance calculator method. To guarantee proper operation, this method is automatically invoked by the D2MCS framework.
Usage
SummaryFunction$execute()
Method getMeasures()
The function obtains the measures used to compute the performance across resamples.
Usage
SummaryFunction$getMeasures()
Returns
A character vector or NULL if measures are not defined.
Method clone()
The objects of this class are cloneable with this method.
Usage
SummaryFunction$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
See Also
Computes the True Negative value.
Description
This is the number of individuals with a negative condition for which the test result is negative. The value entered here must be non-negative.
Super class
D2MCS::MeasureFunction -> TN
Methods
Public methods
Method new()
Method for initializing the object arguments during runtime.
Usage
TN$new(performance.output = NULL)
Arguments
performance.output: An optional ConfMatrix parameter to define the type of object used to compute the TN measure.
Method compute()
The function computes the TN achieved by the M.L. model.
Usage
TN$compute(performance.output = NULL)
Arguments
performance.output: An optional ConfMatrix parameter to define the type of object used as basis to compute the TN measure.
Details
This function is automatically invoked by the ClassificationOutput object.
Returns
A numeric vector of size 1 or NULL if an error occurred.
Method clone()
The objects of this class are cloneable with this method.
Usage
TN$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
See Also
MeasureFunction, ClassificationOutput,
ConfMatrix
Computes the True Positive Value.
Description
TP is the number of individuals with a positive condition for which the test result is positive. The value entered here must be non-negative.
Super class
D2MCS::MeasureFunction -> TP
Methods
Public methods
Method new()
Method for initializing the object arguments during runtime.
Usage
TP$new(performance.output = NULL)
Arguments
performance.output: An optional ConfMatrix parameter to define the type of object used to compute the measure.
Method compute()
The function computes the TP achieved by the M.L. model.
Usage
TP$compute(performance.output = NULL)
Arguments
performance.output: An optional ConfMatrix parameter to define the type of object used as basis to compute the TP measure.
Details
This function is automatically invoked by the ClassificationOutput object.
Returns
A numeric vector of size 1 or NULL if an error occurred.
Method clone()
The objects of this class are cloneable with this method.
Usage
TP$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
See Also
MeasureFunction, ClassificationOutput,
ConfMatrix
Control parameters for train stage.
Description
Abstract class used as a template to define customized functions to control the computational nuances of the train function.
Methods
Public methods
Method new()
Function used to initialize the object parameters during execution time.
Usage
TrainFunction$new( method, number, savePredictions, classProbs, allowParallel, verboseIter, seed )
Arguments
method: The resampling method: "boot", "boot632", "optimism_boot", "boot_all", "cv", "repeatedcv", "LOOCV", "LGOCV" (for repeated training/test splits), "none" (only fits one model to the entire training set), "oob" (only for random forest, bagged trees, bagged earth, bagged flexible discriminant analysis, or conditional tree forest models), "timeslice", "adaptive_cv", "adaptive_boot" or "adaptive_LGOCV".
number: Either the number of folds or the number of resampling iterations.
savePredictions: An indicator of how much of the hold-out predictions for each resample should be saved. Values can be either "all", "final", or "none". A logical value can also be used, which is converted to "all" (for TRUE) or "none" (for FALSE). "final" saves the predictions for the optimal tuning parameters.
classProbs: A logical value. Should class probabilities be computed for classification models (along with predicted values) in each resample?
allowParallel: A logical value. If a parallel backend is loaded and available, should the function use it?
verboseIter: A logical for printing a training log.
seed: An optional integer that will be used to set the seed during the model training stage.
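The savePredictions argument accepts either a keyword or a logical value. The base R sketch below mirrors the documented conversion rule; the helper name `normalize_save_predictions` is hypothetical and is not part of D2MCS or caret.

```r
# Hypothetical helper mirroring the documented rule:
# TRUE -> "all", FALSE -> "none", otherwise one of "all", "final", "none".
normalize_save_predictions <- function(value) {
  if (is.logical(value)) {
    return(if (isTRUE(value)) "all" else "none")
  }
  match.arg(value, c("all", "final", "none"))
}

print(normalize_save_predictions(TRUE))
print(normalize_save_predictions("final"))
```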
Method create()
Creates the trainControl object required for the training stage.
Usage
TrainFunction$create(summaryFunction, search.method = "grid", class.probs)
Arguments
summaryFunction: An object inherited from the SummaryFunction class.
search.method: Either "grid" or "random", describing how the tuning parameter grid is determined.
class.probs: A logical indicating if class probabilities should be computed for classification models (along with predicted values) in each resample.
Method getResamplingMethod()
Returns the resampling method used during the training stage.
Usage
TrainFunction$getResamplingMethod()
Returns
A character vector of length 1 or NULL if not defined.
Method getNumberFolds()
Returns the number of folds or number of iterations used during training.
Usage
TrainFunction$getNumberFolds()
Returns
An integer vector of length 1 or NULL if not defined.
Method getSavePredictions()
Indicates if the predictions for each resample should be saved.
Usage
TrainFunction$getSavePredictions()
Returns
A logical value or NULL if not defined.
Method getClassProbs()
Indicates if class probabilities should be computed for classification models in each resample.
Usage
TrainFunction$getClassProbs()
Returns
A logical value.
Method getAllowParallel()
Determines if model training is performed in parallel.
Usage
TrainFunction$getAllowParallel()
Returns
A logical value. TRUE indicates parallelization is enabled and FALSE otherwise.
Method getVerboseIter()
Determines if training log should be printed.
Usage
TrainFunction$getVerboseIter()
Returns
A logical value. TRUE indicates training log is enabled and FALSE otherwise.
Method getTrFunction()
Function used to return the
trainControl object.
Usage
TrainFunction$getTrFunction()
Returns
A trainControl object.
Method getMeasures()
Returns the measures used to optimize model hyperparameters.
Usage
TrainFunction$getMeasures()
Returns
A character vector.
Method getType()
Obtains the type of classification problem ("Bi-class" or "Multi-class").
Usage
TrainFunction$getType()
Returns
A character vector with length 1. Either "Bi-class" or "Multi-class".
Method getSeed()
Indicates seed used during model training stage.
Usage
TrainFunction$getSeed()
Returns
An integer value or NULL if not defined.
Method setSummaryFunction()
Function used to change the SummaryFunction
used in the training stage.
Usage
TrainFunction$setSummaryFunction(summaryFunction)
Arguments
summaryFunction: An object inherited from the SummaryFunction class.
Method setClassProbs()
The function allows changing whether class probabilities are computed.
Usage
TrainFunction$setClassProbs(class.probs)
Arguments
class.probs: A logical indicating if class probabilities should be computed for classification models (along with predicted values) in each resample.
Method clone()
The objects of this class are cloneable with this method.
Usage
TrainFunction$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
See Also
Stores the results achieved during training.
Description
This class manages the results achieved during training stage (such as optimized hyperparameters, model information, utilized metrics).
Methods
Public methods
Method new()
Function used to initialize the object arguments during runtime.
Usage
TrainOutput$new(models, class.values, positive.class)
Arguments
Method getModels()
The function is used to obtain the best M.L. model of each cluster.
Usage
TrainOutput$getModels(metric)
Arguments
metric: A character vector which specifies the metric(s) used for configuring M.L. hyperparameters.
Returns
A list containing objects of class train.
Method getPerformance()
The function returns the performance value of M.L. models during training stage.
Usage
TrainOutput$getPerformance(metrics = NULL)
Arguments
metrics: A character vector which specifies the metric(s) used to train the M.L. models.
Returns
A character vector containing the metrics used for configuring M.L. hyperparameters.
Method savePerformance()
The function is used to save to a CSV file the performance achieved by the M.L. models during the training stage.
Usage
TrainOutput$savePerformance(dir.path, metrics = NULL)
Arguments
dir.path: The location where the CSV file with the performance of the trained M.L. models is stored.
metrics: An optional parameter specifying the metric(s) used to train the M.L. models. If not defined, all the metrics used in the train stage will be saved.
Method plot()
The function is responsible for creating a plot to visualize the performance achieved by the best M.L. model on each cluster.
Usage
TrainOutput$plot(dir.path, metrics = NULL)
Arguments
dir.path: The location where the exported plot will be saved.
metrics: An optional parameter specifying the metric(s) used to train the M.L. models. If not defined, all the metrics used in the train stage will be plotted.
Method getMetrics()
The function returns all metrics used for configuring M.L. hyperparameters during train stage.
Usage
TrainOutput$getMetrics()
Returns
A character value.
Method getClassValues()
The function is used to get the values of the target class.
Usage
TrainOutput$getClassValues()
Returns
A character containing the values of the target class.
Method getPositiveClass()
The function returns the value of the positive class.
Usage
TrainOutput$getPositiveClass()
Returns
A character vector of size 1.
Method getSize()
The function is used to get the number of the trained M.L. models. Each cluster contains the best M.L. model.
Usage
TrainOutput$getSize()
Returns
A numeric value or NULL if training was not successfully performed.
Method clone()
The objects of this class are cloneable with this method.
Usage
TrainOutput$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
See Also
Training set.
Description
The Trainset is used to perform training operations over M.L. models. A target class should be defined to guarantee full compatibility with supervised models.
Details
Use a Dataset object to ensure the creation of a valid Trainset object.
Methods
Public methods
Method new()
Method for initializing the object arguments during runtime.
Usage
Trainset$new(cluster.dist, class.name, class.values, positive.class)
Arguments
cluster.dist: The type of cluster distribution used as basis to build the Trainset. See GenericClusteringStrategy for more information.
class.name: Used to specify the name of the column containing the target class.
class.values: Specifies all the possible values of the target class.
positive.class: A character with the value of the positive class.
Method getPositiveClass()
The function is used to obtain the value of the positive class.
Usage
Trainset$getPositiveClass()
Returns
A numeric value with the positive class value.
Method getClassName()
The function is used to return the name of the target class.
Usage
Trainset$getClassName()
Returns
A character vector with length 1.
Method getClassValues()
The function is used to compute all the possible target class values.
Usage
Trainset$getClassValues()
Returns
A factor value.
Method getColumnNames()
The function returns the names of the columns comprising a specific cluster distribution.
Usage
Trainset$getColumnNames(num.cluster)
Arguments
Returns
A character vector with all column names.
Method getFeatureValues()
The function returns the values of the columns comprising a specific cluster distribution. Target class is omitted.
Usage
Trainset$getFeatureValues(num.cluster)
Arguments
Returns
A data.frame with the values of the features comprising the selected cluster distribution.
Method getInstances()
The function returns the values of the columns comprising a specific cluster distribution. Target class is included as the last column.
Usage
Trainset$getInstances(num.cluster)
Arguments
Returns
A data.frame with the values of the features comprising the selected cluster distribution.
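The difference between getFeatureValues() and getInstances() can be pictured with plain data frames. This is a base R sketch on hypothetical data and a hypothetical cluster of feature names; it does not call the Trainset class.

```r
# Hypothetical dataset with three features and a target class column.
data <- data.frame(
  f1 = c(1, 0, 1),
  f2 = c(0.2, 0.5, 0.9),
  f3 = c(3, 1, 2),
  Class = factor(c("pos", "neg", "pos"))
)

# Hypothetical cluster made of two of the features.
cluster_features <- c("f1", "f3")

# Feature values only (target class omitted).
feature_values <- data[, cluster_features, drop = FALSE]

# Instances: the same features with the target class as the last column.
instances <- cbind(feature_values, Class = data$Class)
print(instances)
```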
Method getNumClusters()
The function obtains the number of groups (clusters) that form the cluster distribution.
Usage
Trainset$getNumClusters()
Returns
A numeric vector of size 1.
See Also
Dataset, DatasetLoader,
Subset, GenericClusteringStrategy
Control parameters for train stage (Bi-class problem).
Description
Implementation to control the computational nuances of the train function for bi-class problems.
Super class
D2MCS::TrainFunction -> TwoClass
Methods
Public methods
Inherited methods
Method new()
Usage
TwoClass$new( method, number, savePredictions, classProbs, allowParallel, verboseIter, seed = NULL )
Arguments
method: The resampling method: "boot", "boot632", "optimism_boot", "boot_all", "cv", "repeatedcv", "LOOCV", "LGOCV" (for repeated training/test splits), "none" (only fits one model to the entire training set), "oob" (only for random forest, bagged trees, bagged earth, bagged flexible discriminant analysis, or conditional tree forest models), "timeslice", "adaptive_cv", "adaptive_boot" or "adaptive_LGOCV".
number: Either the number of folds or the number of resampling iterations.
savePredictions: An indicator of how much of the hold-out predictions for each resample should be saved. Values can be either "all", "final", or "none". A logical value can also be used, which is converted to "all" (for TRUE) or "none" (for FALSE). "final" saves the predictions for the optimal tuning parameters.
classProbs: A logical value. Should class probabilities be computed for classification models (along with predicted values) in each resample?
allowParallel: A logical value. If a parallel backend is loaded and available, should the function use it?
verboseIter: A logical for printing a training log.
seed: An optional integer that will be used to set the seed during the model training stage.
Method create()
Creates the trainControl object required for the training stage.
Usage
TwoClass$create(summaryFunction, search.method = "grid", class.probs = NULL)
Arguments
summaryFunction: An object inherited from the SummaryFunction class.
search.method: Either "grid" or "random", describing how the tuning parameter grid is determined.
class.probs: A logical indicating if class probabilities should be computed for classification models (along with predicted values) in each resample.
Method getTrFunction()
Function used to return the
trainControl object.
Usage
TwoClass$getTrFunction()
Returns
A trainControl object.
Method setClassProbs()
The function allows changing the class computation capabilities.
Usage
TwoClass$setClassProbs(class.probs)
Arguments
Method getMeasures()
Returns the measures used to optimize model hyperparameters.
Usage
TwoClass$getMeasures()
Returns
A character vector.
Method getType()
Obtains the type of classification problem ("Bi-class" or "Multi-class").
Usage
TwoClass$getType()
Returns
A character vector with "Bi-class" value.
Method setSummaryFunction()
Function used to change the SummaryFunction
used in the training stage.
Usage
TwoClass$setSummaryFunction(summaryFunction)
Arguments
summaryFunction: An object inherited from the SummaryFunction class.
Method clone()
The objects of this class are cloneable with this method.
Usage
TwoClass$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
See Also
Feature clustering strategy.
Description
Features are sorted in descending order of the relevance value obtained after applying a specific heuristic. Next, features are distributed into N clusters following a card-dealing methodology. Finally, the distribution with the highest homogeneity is selected as the best distribution.
Details
The strategy is suitable only for binary and real features. Other features are automatically grouped into a specific cluster named 'unclustered'.
Super class
D2MCS::GenericClusteringStrategy -> TypeBasedStrategy
Methods
Public methods
Inherited methods
Method new()
Method for initializing the object arguments during runtime.
Usage
TypeBasedStrategy$new( subset, heuristic, configuration = StrategyConfiguration$new() )
Arguments
subset: The Subset used to apply the feature-clustering strategy.
heuristic: The heuristic used to compute the relevance of each feature. Must inherit from the GenericHeuristic abstract class.
configuration: Optional parameter to customize configuration parameters for the strategy. Must inherit from the StrategyConfiguration abstract class.
Method execute()
Function responsible for performing the clustering strategy over the defined Subset.
Usage
TypeBasedStrategy$execute(verbose = FALSE)
Arguments
verbose: A logical value to specify if more verbosity is needed.
Method getDistribution()
Function used to obtain a specific cluster distribution.
Usage
TypeBasedStrategy$getDistribution( num.clusters = NULL, num.groups = NULL, include.unclustered = FALSE )
Arguments
Returns
A list with the features comprising a specific clustering distribution.
Method createTrain()
The function is used to create a Trainset object from a specific clustering distribution.
Usage
TypeBasedStrategy$createTrain( subset, num.clusters = NULL, num.groups = NULL, include.unclustered = FALSE )
Arguments
subset: The Subset object used as a basis to create the train set (see Trainset class).
num.clusters: A numeric value to select the number of clusters (define the distribution).
num.groups: A single or numeric vector value to identify a specific group that forms the clustering distribution.
include.unclustered: A logical value to determine if unclustered features should be included.
Details
If num.clusters and num.groups are not defined, the best clustering distribution is used to create the train set.
Returns
A Trainset object.
Method plot()
The function is responsible for creating a plot to visualize the clustering distribution.
Usage
TypeBasedStrategy$plot(dir.path = NULL, file.name = NULL)
Arguments
dir.path: An optional character argument to define the name of the directory where the exported plot will be saved. If not defined, the file path will be automatically assigned to the current working directory, 'getwd()'.
file.name: A character to define the name of the PDF file where the plot is exported.
Method saveCSV()
The function is used to save the clustering distribution to a CSV file.
Usage
TypeBasedStrategy$saveCSV(dir.path = NULL, name = NULL, num.clusters = NULL)
Arguments
dir.path: The name of the directory to save the CSV file.
name: Defines the name of the CSV file.
num.clusters: An optional parameter to select the number of clusters to be saved. If not defined, all cluster distributions will be saved.
Method clone()
The objects of this class are cloneable with this method.
Usage
TypeBasedStrategy$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
See Also
GenericClusteringStrategy,
StrategyConfiguration
Compute performance across resamples.
Description
Computes the performance across resamples when class probabilities can be computed.
Super class
D2MCS::SummaryFunction -> UseProbability
Methods
Public methods
Inherited methods
Method new()
The function defines at runtime the use of seven measures: 'ROC', 'Sens', 'Kappa', 'Accuracy', 'TCR_9', 'MCC' and 'PPV'.
Usage
UseProbability$new()
Method execute()
The function computes the performance across resamples using the previously defined measures.
Usage
UseProbability$execute(data, lev = NULL, model = NULL)
Arguments
data: A data.frame containing the data used to compute the performance.
lev: An optional value used to define the levels of the target class.
model: An optional value used to define the ML model used.
Returns
A vector of performance estimates.
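To make the shape of such a summary function concrete, here is a minimal base R sketch with the same `(data, lev, model)` signature. It is not the UseProbability code: `summary_sketch()` is hypothetical, it assumes the caret convention that `data` holds the factor columns `obs` and `pred`, it assumes the first level is the positive class, and it covers only five of the seven measures ('ROC' needs the class-probability columns and 'TCR_9' is omitted).

```r
# summary_sketch() is a hypothetical stand-in, not part of D2MCS.
summary_sketch <- function(data, lev = NULL, model = NULL) {
  if (is.null(lev)) lev <- levels(data$obs)
  positive <- lev[1]  # assumption: the first level is the positive class

  # Confusion-matrix cells (as numeric to avoid integer overflow).
  tp <- as.numeric(sum(data$pred == positive & data$obs == positive))
  tn <- as.numeric(sum(data$pred != positive & data$obs != positive))
  fp <- as.numeric(sum(data$pred == positive & data$obs != positive))
  fn <- as.numeric(sum(data$pred != positive & data$obs == positive))
  n <- tp + tn + fp + fn

  accuracy <- (tp + tn) / n
  # Agreement expected by chance, used by Cohen's Kappa.
  expected <- ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n^2
  mcc.den <- sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

  c(Accuracy = accuracy,
    Sens     = tp / (tp + fn),
    PPV      = tp / (tp + fp),
    Kappa    = (accuracy - expected) / (1 - expected),
    MCC      = (tp * tn - fp * fn) / mcc.den)
}

# Made-up observations and predictions for a single resample.
lev <- c("spam", "ham")
resample <- data.frame(
  obs  = factor(c("spam", "spam", "spam", "ham", "ham", "ham", "ham", "spam"),
                levels = lev),
  pred = factor(c("spam", "spam", "ham", "ham", "ham", "spam", "ham", "spam"),
                levels = lev)
)

perf <- summary_sketch(resample, lev = lev)
print(perf)
```

The result is a named numeric vector with one entry per measure, which matches the "vector of performance estimates" described above.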
Method clone()
The objects of this class are cloneable with this method.
Usage
UseProbability$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
See Also
Voting Strategy template.
Description
Abstract class used to define new SingleVoting and
CombinedVoting schemes.
Methods
Public methods
Method new()
Abstract method used to initialize the object arguments during runtime.
Usage
VotingStrategy$new()
Method getVotingSchemes()
The function returns the voting schemes that will participate in the voting strategy.
Usage
VotingStrategy$getVotingSchemes()
Returns
A vector of objects inheriting from the VotingStrategy class.
Method getMetrics()
The function is used to get the metric that will be used during the voting strategy.
Usage
VotingStrategy$getMetrics()
Returns
A character vector.
Method execute()
Abstract function used to implement the operation of the voting schemes.
Usage
VotingStrategy$execute(predictions, ...)
Arguments
predictions: A ClusterPredictions object containing the predictions achieved for each cluster.
...: Further arguments passed down to the execute function.
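Because execute() is abstract, its behaviour depends on the concrete voting scheme. As one possible illustration, the base R sketch below combines per-cluster class predictions by simple majority. It is not taken from D2MCS: `majority_vote()` is hypothetical, and a plain character matrix stands in for the ClusterPredictions object.

```r
# Made-up class predictions: one column per cluster, one row per instance.
cluster.predictions <- cbind(
  cluster1 = c("spam", "ham",  "spam", "ham"),
  cluster2 = c("spam", "spam", "ham",  "ham"),
  cluster3 = c("ham",  "spam", "spam", "ham")
)

# majority_vote() is a hypothetical helper, not part of D2MCS.
majority_vote <- function(predictions) {
  apply(predictions, 1, function(votes) {
    counts <- table(votes)
    # which.max() returns the first maximum, so ties are resolved in
    # favour of the alphabetically first class label.
    names(counts)[which.max(counts)]
  })
}

final <- majority_vote(cluster.predictions)
print(final)
```

An odd number of clusters avoids ties in the binary case. With an even number, a real scheme would need an explicit tie-breaking rule.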
Method getName()
The function returns the name of the voting scheme.
Usage
VotingStrategy$getName()
Returns
A character vector of size 1.
Method clone()
The objects of this class are cloneable with this method.
Usage
VotingStrategy$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.