machine learning - Why does RSquared increase with # of folds in k-fold cross validation?
I'm tuning a model using k-fold cross-validation, and I noticed that the Rsquared accuracy appears to improve with the number of folds, e.g. I get a higher Rsquared value when using 30 folds compared to using 10 folds.
Two questions I'm hoping to get some insight on:
- Why does this occur?
- Is there any reason to believe that the Rsquared for k=10 is a better estimate of model accuracy than the one for k=30? Or are both unrelated to the future error rate I can expect on an unseen test set?
Here's a simple example of the effect I'm referring to:
    ############### k = 10 #####################
    > data(iris)
    > train_control <- trainControl(method="repeatedcv", number=10, repeats=3)
    > train(Sepal.Length~., data=iris, trControl=train_control, method="rf", metric="Rsquared")
    Random Forest

    150 samples
      4 predictor

    No pre-processing
    Resampling: Cross-Validated (10 fold, repeated 3 times)
    Summary of sample sizes: 137, 135, 134, 134, 135, 136, ...
    Resampling results across tuning parameters:

      mtry  RMSE       Rsquared   RMSE SD     Rsquared SD
      2     0.3381065  0.8404534  0.07692415  0.07583768
      3     0.3247406  0.8502577  0.07311807  0.07326181
      5     0.3228651  0.8517740  0.07213958  0.07315720

    ############### k = 30 #####################
    > data(iris)
    > train_control <- trainControl(method="repeatedcv", number=30, repeats=3)
    > train(Sepal.Length~., data=iris, trControl=train_control, method="rf", metric="Rsquared")
    Random Forest

    150 samples
      4 predictor

    No pre-processing
    Resampling: Cross-Validated (30 fold, repeated 3 times)
    Summary of sample sizes: 143, 145, 146, 144, 145, 144, ...
    Resampling results across tuning parameters:

      mtry  RMSE       Rsquared   RMSE SD     Rsquared SD
      2     0.3238545  0.8580474  0.10327919  0.1352787
      3     0.3119541  0.8679321  0.09734168  0.1236307
      5     0.3109572  0.8717550  0.09727307  0.1123173
The larger the number of folds, the bigger each training set and the smaller each test set. You will typically observe the best result for LOO (leave-one-out, i.e. N-fold for N training samples) and the worst for k=2. There is no single answer to the generic question of how many folds to use; it depends solely on the dataset. Furthermore, if there is an underlying relation between the data points (for example, they come from a time series), it matters how the set is divided.
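To make the size argument concrete, here is a minimal base-R sketch (not part of the original answer). The first part computes the approximate training and test sizes per fold for the 150-row iris data at several values of k; the results line up with the "Summary of sample sizes" lines in the question. The second part shows one possible way to split time-ordered data so that training always precedes testing. The helper `rolling_splits` and its arguments `initial` and `horizon` are hypothetical names introduced for illustration only, not caret functions.

```r
# Part 1: approximate per-fold sizes when n rows are split into k folds.
n <- 150                      # rows in iris
k_values <- c(2, 10, 30, n)   # k = n corresponds to leave-one-out (LOO)

fold_sizes <- data.frame(
  k          = k_values,
  train_size = n - n / k_values,  # rows used to fit the model in each fold
  test_size  = n / k_values       # rows held out in each fold
)
print(fold_sizes)

# Part 2 (hypothetical helper): rolling-origin splits for time-ordered data.
# Each split trains on rows 1..end_train and tests on the next `horizon` rows,
# so no future observation is used to predict the past.
rolling_splits <- function(n, initial, horizon) {
  train_ends <- seq(initial, n - horizon, by = horizon)
  lapply(train_ends, function(end_train) {
    list(
      train = seq_len(end_train),
      test  = (end_train + 1):(end_train + horizon)
    )
  })
}

splits <- rolling_splits(n = 20, initial = 10, horizon = 5)
str(splits)
```

With k=30 each model is fit on about 145 of the 150 rows, versus about 135 with k=10, which is one plausible reason the resampled Rsquared comes out slightly higher. The test folds also shrink from about 15 rows to about 5, which fits the larger Rsquared SD reported for k=30.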