final_point_estimate = "average"S3 class,
which makes internal code cleaner and facilitates simpler addition of
new predictiveness measures.extract_sampled_split_predictions is a vector, not a list.
This facilitates proper use in the new version of the package.truncate = FALSE in
vimp_cimeasure_avg_value (computes the average value and efficient
influence function) and updates to vim,
cv_vim, and sp_vim.method and family for weighted EIF
estimation within outer functions (vim,
cv_vim, sp_vim) rather than the
measure* functions. This allows compatibility for binary
outcomes.sp_vim that are necessary to compute
the test statisticsparallel argument to be specified for calls to
CV.SuperLearner but not for calls to
SuperLearnerZ in coarsened-data
settings; allow case-insensitive specification of covariate
names/positions when creating ZV defaults to 5 if no cross-fitting folds are specified
externallycross_fitted_f1 and
cross_fitted_f2 in cv_vimcross_fitted_f1 and
cross_fitted_f2 in cv_vimcv_vim handles an odd number of outer folds
being passed with pre-computed regression function estimates. Now, you
can use an odd number of folds (e.g., 5) to estimate the full and
reduced regression functions and still obtain cross-validated variable
importance estimates.vrc01 data as an exported objectvrc01 dataC to not be specified in
make_foldsNone
* Updated `measure_auc` to hew more closely to `ROCR` and `cvAUC`, using computational tricks to speed up weighted AUC and EIF computation.
* Added argument `cross_fitted_se` to `cv_vim` and `sp_vim`; this logical option allows the standard error to be estimated using cross-fitting. This can improve performance in cases where flexible algorithms are used to estimate the full and reduced regressions. A usage sketch follows below.
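A hedged sketch of the new option (the simulated data and learner library are illustrative assumptions, not from the changelog):

```r
# Cross-validated VIM for the first covariate, with a cross-fitted
# standard error (data and learners are illustrative assumptions)
library(vimp)
set.seed(1234)
n <- 500
x <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
y <- 1 + 0.5 * x$x1 + rnorm(n)
est <- cv_vim(Y = y, X = x, indx = 1, type = "r_squared",
              run_regression = TRUE, SL.library = c("SL.glm", "SL.mean"),
              V = 5, cross_fitted_se = TRUE)
```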
* Allow bootstrap-based standard error estimation in `vim` and `cv_vim`; currently, this option is only available for non-sample-split calls (i.e., with `sample_splitting = FALSE`).
* Variable importance point estimates from `vim` are based on the entire dataset, while the full and reduced predictiveness (`predictiveness_full` and `predictiveness_reduced`, along with the corresponding confidence intervals) are evaluated using separate portions of the data for the full and reduced regressions.
* Added argument `sample_splitting` to `vim`, `cv_vim`, and `sp_vim`; if `FALSE`, sample splitting is not used to estimate predictiveness. Note that we recommend using the default, `TRUE`, in all cases, since inference using `sample_splitting = FALSE` will be invalid for variables with truly null variable importance. A usage sketch follows below.
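A minimal sketch of the new argument (the simulated data and learner library are assumptions; `sample_splitting = TRUE` is the default):

```r
# Sketch only: turning sample splitting off. Per the note above, inference
# with sample_splitting = FALSE is invalid for truly null VIMs.
library(vimp)
set.seed(1234)
n <- 500
x <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
y <- 1 + 0.5 * x$x1 + rnorm(n)
est_no_ss <- vim(Y = y, X = x, indx = 2, type = "r_squared",
                 run_regression = TRUE, SL.library = c("SL.glm", "SL.mean"),
                 sample_splitting = FALSE)
```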
* Updated the procedure used when `sample_splitting = TRUE` to match more closely with theoretical results (and improve power!). In this case, we first split the data into \(2K\) cross-fitting folds, and split these folds equally into two sample-splitting folds. For the nuisance regression using all covariates, for each \(k \in \{1, \ldots, K\}\) we set aside the data in sample-splitting fold 1 and cross-fitting fold \(k\) [this comprises \(1 / (2K)\) of the data]. We train on the remaining observations [comprising \((2K-1)/(2K)\) of the data] and test on the originally withheld data. We repeat for the nuisance regression using the reduced set of covariates, but withhold the data in sample-splitting fold 2. This update affects both `cv_vim` and `sp_vim`. If `sample_splitting = FALSE`, then we use standard cross-fitting. A sketch of this fold structure follows below.
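A toy sketch of the fold structure described above (illustrative only; this is not `vimp`'s internal code):

```r
set.seed(20)
n <- 100
K <- 5
# assign each observation to one of the 2K cross-fitting folds
cf_fold <- sample(rep(seq_len(2 * K), length.out = n))
# split the 2K cross-fitting folds equally between two sample-splitting folds
ss_fold_of_cf_fold <- sample(rep(c(1, 2), each = K))
folds_ss1 <- which(ss_fold_of_cf_fold == 1)
folds_ss2 <- which(ss_fold_of_cf_fold == 2)
# full-covariate regression at round k: test on sample-splitting fold 1's
# k-th cross-fitting fold [1 / (2K) of the data], train on the remainder
# [(2K - 1) / (2K) of the data]
k <- 1
test_full <- which(cf_fold == folds_ss1[k])
train_full <- which(cf_fold != folds_ss1[k])
# the reduced-covariate regression withholds sample-splitting fold 2 instead
test_redu <- which(cf_fold == folds_ss2[k])
train_redu <- which(cf_fold != folds_ss2[k])
```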
cross-fitting.>= in computing the numerator of AUC with
inverse probability weightsroxygen2 documentation for wrappers
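A hypothetical sketch of the computation (this is not the package's implementation; the function and variable names are illustrative):

```r
# Inverse probability weighted AUC whose numerator compares case and
# control scores using >=, mirroring the fix described above
ipw_auc_sketch <- function(fitted, y, ipc_weights) {
  case <- y == 1
  ctrl <- y == 0
  pair_wts <- outer(ipc_weights[case], ipc_weights[ctrl])
  sum(pair_wts * outer(fitted[case], fitted[ctrl], `>=`)) / sum(pair_wts)
}
ipw_auc_sketch(fitted = c(0.9, 0.2, 0.7, 0.4), y = c(1, 0, 1, 0),
               ipc_weights = rep(1, 4))
```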
* Updated the `roxygen2` documentation for the wrappers (`vimp_*`) to inherit parameters and details from `cv_vim` (this reduces the potential for documentation mismatches).
* Guess the `family` if it isn't specified: use `stats::binomial()` if there are only two unique outcome values, otherwise use `stats::gaussian()` (a sketch of this rule follows below).
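A sketch of the guessing rule described above (`guess_family` is a hypothetical helper name, not a `vimp` function):

```r
# If the outcome takes exactly two unique values, use binomial; else gaussian
guess_family <- function(y) {
  if (length(unique(y)) == 2) stats::binomial() else stats::gaussian()
}
guess_family(c(0, 1, 1, 0))  # binomial
guess_family(rnorm(10))      # gaussian
```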
* Updates related to the `cvAUC` package.
* Added argument `ipc_est_type` (available in `vim`, `cv_vim`, and `sp_vim`, in the corresponding wrapper functions for each VIM, and in the corresponding internal estimation functions).
* Updated the tests in `testthat/` to use `glm` rather than `xgboost` (increases speed).
* Use `glm` rather than `xgboost` or `ranger` (increases speed, even though the regression is now misspecified for the truth).
* Removed `forcats` from the vignette.
* Updated `measure_accuracy` and `measure_auc` for project-wide consistency.
* Updated the tests in `testthat/` to not explicitly load `xgboost`.
* Use `stats::qlogis` and `stats::plogis` rather than bespoke functions.
* Updated the vignette "Introduction to `vimp`"; the examples now use `run_regression = TRUE` for simplicity, and `vimp` will handle the rest.
* Added argument `verbose` to `sp_vim`; if `TRUE`, messages are printed throughout fitting that display progress, and `verbose` is passed to `SuperLearner`.
* Renamed `cv_predictiveness_point_est` and `predictiveness_point_est` to `est_predictiveness_cv` and `est_predictiveness`, respectively.
* Removed `cv_predictiveness_update`, `cv_vimp_point_est`, `cv_vimp_update`, `predictiveness_update`, `vimp_point_est`, and `vimp_update`; this functionality is now in `est_predictiveness_cv` and `est_predictiveness` (for the `*update*` functions) or directly in `vim` or `cv_vim` (for the `*vimp*` functions).
* Removed `predictiveness_se` and `predictiveness_ci` (this functionality is now in `vimp_se` and `vimp_ci`, respectively).
* Renamed the `weights` argument to `ipc_weights`, clarifying that these weights are meant to be used as inverse probability of coarsening (e.g., censoring) weights.
* Added functions `sp_vim`, `sample_subsets`, `spvim_ics`, and `spvim_se`; these allow computation of the Shapley Population Variable Importance Measure (SPVIM). A usage sketch follows below.
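A hedged sketch of `sp_vim` usage (the simulated data and learner library are assumptions, not from the changelog):

```r
# Estimating SPVIM values for all features (data/learners are illustrative)
library(vimp)
set.seed(4747)
n <- 500
x <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
y <- 1 + 0.5 * x$x1 + rnorm(n)
est <- sp_vim(Y = y, X = x, V = 5, type = "r_squared",
              SL.library = c("SL.glm", "SL.mean"))
```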
* Removed `sp_vim` and the helper functions `run_sl`, `sample_subsets`, `spvim_ics`, and `spvim_se`; these will be added in a future release.
* Removed `cv_vim_nodonsker`, since `cv_vim` supersedes this function.
* Added `sp_vim` and the helper functions `run_sl`, `sample_subsets`, `spvim_ics`, and `spvim_se`; these functions allow computation of the Shapley Population Variable Importance Measure (SPVIM).
* `cv_vim` and `vim` now use an outer layer of sample splitting for hypothesis testing.
* Added the wrapper functions `vimp_auc`, `vimp_accuracy`, `vimp_deviance`, and `vimp_rsquared`.
* `vimp_regression` is now deprecated; use `vimp_anova` instead.
* Added `vim`; each variable importance function is now a wrapper function around `vim` with the `type` argument filled in (a sketch of this pattern follows below).
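A simplified sketch of the wrapper pattern (not the package's exact code; the `*_sketch` names are hypothetical):

```r
library(vimp)
# Each vimp_* wrapper calls vim() with the type argument filled in
vimp_auc_sketch <- function(...) vim(..., type = "auc")
vimp_rsquared_sketch <- function(...) vim(..., type = "r_squared")
```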
* `cv_vim_nodonsker` is now deprecated; use `cv_vim` instead.
* Updates related to `vimp_anova`.
* Addressed a `gam` package update by switching the library to `SL.xgboost`, `SL.step`, and `SL.mean`.
* Addressed the `gam` package update in the unit tests.
* `cv_vim` and `cv_vim_nodonsker` now return the cross-validation folds used within the function.
* Allow a `family` to be specified for the top-level `SuperLearner` if `run_regression = TRUE`; in all cases, the second-stage `SuperLearner` uses a `gaussian` family.
* If the first-stage `SuperLearner` selects `SL.mean` as the best-fitting algorithm, the second-stage regression is now run using the original outcome, rather than the first-stage fitted values.
* Added `cv_vim_nodonsker`, which computes the cross-validated naive estimator and the update on the same, single validation fold. This does not allow for relaxation of the Donsker class conditions.
* Added `two_validation_set_cv`, which sets up folds for V-fold cross-validation with two validation sets per fold.
* Updated `cv_vim`: now, the cross-validated naive estimator is computed on a first validation set, while the update for the corrected estimator is computed using the second validation set (both created by `two_validation_set_cv`); this allows for relaxation of the Donsker class conditions necessary for asymptotic convergence of the corrected estimator, while ensuring that the initial cross-validated naive estimator is not biased high (due to a higher \(R^2\) on the training data). A toy sketch of the fold setup follows below.
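A toy sketch of V-fold cross-validation with two validation sets per fold (illustrative only; this is not `two_validation_set_cv` itself):

```r
set.seed(1)
n <- 90
V <- 3
fold <- sample(rep(seq_len(V), length.out = n))
# within each fold, split the held-out data into two validation sets
val_set <- ave(seq_len(n), fold,
               FUN = function(i) sample(rep(1:2, length.out = length(i))))
# for fold v: validation set 1 is used for the naive estimator and
# validation set 2 for the update, per the description above
v <- 1
val1 <- which(fold == v & val_set == 1)
val2 <- which(fold == v & val_set == 2)
train <- which(fold != v)
```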
* Updated `cv_vim`: now, the cross-validated naive estimator is computed on the training data for each fold, while the update for the corrected cross-validated estimator is computed using the test data; this allows for relaxation of the Donsker class conditions necessary for asymptotic convergence of the corrected estimator.
* Removed `vim`; it has been replaced with individual-parameter functions.
* Added `vimp_regression` to match the Python package.
* `cv_vim` can now compute regression estimators.
* Added `vimp_ci`, `vimp_se`, `vimp_update`, and `onestep_based_estimator`.
* Bug fixes, etc.