Document Type

Honors Project On-Campus Access Only

Abstract

Because training error tends to underestimate the true test error, a reliable test error estimator is needed to evaluate and select predictive learning models. Our research builds on previous results comparing the single bootstrap and k-fold cross-validation, varying a wider range of parameters governing the causal structure of the data, the learning models, and the test error estimators. Using data simulated from a causal graph, we compared cross-validation and bootstrap estimates with the true test error for LASSO and random forest models across these parameter settings. We found that the bootstrap underestimates the test error for both models, while k-fold cross-validation underestimates the test error for LASSO and performs well for random forests.
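
The kind of comparison described above can be illustrated with a minimal sketch. It is not the project's code: it assumes a hypothetical one-predictor linear data generator in place of the causal graph, swaps in ordinary least squares for LASSO and random forests, and reads "single bootstrap" as fitting on each resample and scoring on the original sample. All function names, the sample sizes, and the seed are illustrative assumptions.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (stand-in for LASSO / random forest)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0
    return my - b * mx, b

def mse(model, xs, ys):
    """Mean squared prediction error of a fitted line on (xs, ys)."""
    a, b = model
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

def simulate(n, rng):
    """Hypothetical generator: y = 2x + noise (assumed, not the source's causal graph)."""
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0, 1) for x in xs]
    return xs, ys

def kfold_cv_error(xs, ys, k, rng):
    """k-fold CV: each point is predicted by a model that never saw it."""
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    total = 0.0
    for f in range(k):
        fold = idx[f::k]
        held = set(fold)
        train = [i for i in idx if i not in held]
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        total += mse(model, [xs[i] for i in fold], [ys[i] for i in fold]) * len(fold)
    return total / len(xs)

def bootstrap_error(xs, ys, n_boot, rng):
    """Simple bootstrap (assumed variant): fit on a resample, score on the original sample."""
    n = len(xs)
    errs = []
    for _ in range(n_boot):
        s = [rng.randrange(n) for _ in range(n)]
        model = fit_line([xs[i] for i in s], [ys[i] for i in s])
        errs.append(mse(model, xs, ys))
    return sum(errs) / n_boot

def compare(n=50, k=5, n_boot=50, n_test=5000, seed=0):
    """Return training error, both estimates, and a large-sample proxy for the true test error."""
    rng = random.Random(seed)
    xs, ys = simulate(n, rng)
    model = fit_line(xs, ys)
    test_xs, test_ys = simulate(n_test, rng)  # fresh data approximates the true test error
    return {
        "train": mse(model, xs, ys),
        "cv": kfold_cv_error(xs, ys, k, rng),
        "bootstrap": bootstrap_error(xs, ys, n_boot, rng),
        "test": mse(model, test_xs, test_ys),
    }

if __name__ == "__main__":
    for name, value in compare().items():
        print(f"{name:>9}: {value:.3f}")
```

In this variant the bootstrap scores each model on a sample that overlaps its own training data, which is one plausible reason such an estimate can run optimistic. A single run of a toy model like this says nothing about the LASSO or random forest results reported above.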

© Copyright is owned by author of this document