The boot package provides the function boot

The percentile method was the first attempt at deriving confidence intervals from bootstrap samples (Efron 1979) and has enjoyed huge popularity; however, one can show that the intervals are, in fact, incorrect. If the intervals are not symmetric (and Fig. 9.6 shows that this is quite often the case; the ability to define asymmetric intervals is one of the big advantages of bootstrapping methods), it can be shown that the percentile method uses the skewness of the distribution the wrong way around (Efron and Tibshirani 1993).

Fig. 9.6 Regression vector and 95% confidence intervals for the individual coefficients, for the PCR model of the gasoline data with four PCs. Confidence intervals are obtained with the bootstrap percentile method.

Better results are obtained by so-called studentized confidence intervals, in which the statistic of interest is given by

$$t_b = \frac{\hat{\theta}_b - \hat{\theta}}{\hat{\sigma}_b} \qquad (9.19)$$

where $\hat{\theta}_b$ is the estimate for the statistic of interest, obtained from the $b$th bootstrap sample, $\hat{\sigma}_b$ is the standard deviation of that estimate, and $\hat{\theta}$ is the estimate obtained from the complete original data set. In the example of regression, $\hat{\theta}$ corresponds to the regression coefficient at a certain wavelength. Often, no analytical expression exists for $\hat{\sigma}_b$, and it should be obtained by other means, e.g., crossvalidation, or an inner bootstrap loop. Using the notation $t^B_{\alpha/2}$ for an approximation of the $\alpha/2$th quantile of the distribution of $t_b$, the studentized confidence intervals are given by

$$\hat{\theta} - t^B_{1-\alpha/2} \le \theta \le \hat{\theta} - t^B_{\alpha/2} \qquad (9.20)$$
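A minimal sketch of a studentized interval for a very simple statistic, the mean of a skewed toy sample, may make the construction more concrete. The data, the variable names, and the analytical standard error sd(x)/sqrt(n) are assumptions chosen for illustration and are not part of the gasoline example. The sketch also multiplies the quantiles by the standard error of the full-data estimate to return to the original scale, as is usual in the bootstrap-t construction.

```r
set.seed(1)
x <- rexp(30)                        # hypothetical skewed sample
n <- length(x)
theta.hat <- mean(x)                 # estimate from the complete data
se.hat <- sd(x) / sqrt(n)            # its (analytical) standard error

B <- 999
t.b <- numeric(B)
for (b in seq_len(B)) {
  xb <- sample(x, n, replace = TRUE) # b-th bootstrap sample
  # studentized statistic: (theta_b - theta) / sigma_b
  t.b[b] <- (mean(xb) - theta.hat) / (sd(xb) / sqrt(n))
}

alpha <- 0.05
q <- quantile(t.b, c(alpha / 2, 1 - alpha / 2), names = FALSE)

# The upper quantile defines the lower limit and vice versa
ci.stud <- c(lower = theta.hat - q[2] * se.hat,
             upper = theta.hat - q[1] * se.hat)
ci.stud
```

For a statistic without an analytical standard error, the line computing sd(xb)/sqrt(n) would have to be replaced by crossvalidation or an inner bootstrap loop, which multiplies the computational cost accordingly.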

Several other ways of estimating confidence intervals exist, most notably the bias-corrected and accelerated (BCα) interval (Efron and Tibshirani 1993; Davison and Hinkley 1997).

The boot package provides the function boot.ci, which calculates several confidence interval estimates in one go. Again, first the bootstrap sampling is done and the statistics of interest are calculated:

> gas.pcr.bootCI <-
+   boot(gasoline,
+        function(x, ind)
+          c(coef(pcr(octane ~ ., data = x,
+                     ncomp = npc, subset = ind))),
+        R = 999)

Fig. 9.7 Bootstrap plot for the regression coefficient at 1206 nm; in all bootstrap samples the coefficient is much smaller than zero.

Here we use R = 999 to conform to the setup of the boot package: the actual sample is seen as the 1000th element of the set. The regression coefficients are stored in the gas.pcr.bootCI object, which is of class "boot", in the element named t:

> dim(gas.pcr.bootCI$t)
[1] 999 401

Plots of individual estimates can be made through the index argument:

> smallest <- which.min(gas.pcr.bootCI$t0)
> plot(gas.pcr.bootCI, index = smallest)

From the plot, shown in Fig. 9.7, one can see the distribution of the values for this coefficient in all bootstrap samples; the corresponding confidence interval will definitely not contain zero. The dashed line indicates the estimate based on the full data; these estimates are stored in the list element t0.

Confidence intervals for individual coefficients can be obtained from the gas.pcr.bootCI object as follows:

> boot.ci(gas.pcr.bootCI, index = smallest, type = c("perc", "bca"))
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 999 bootstrap replicates

CALL :
boot.ci(boot.out = gas.pcr.bootCI, type = c("perc", "bca"), index = smallest)

Intervals :
Level     Percentile            BCa
95%   (-5.909, -5.157 )   (-6.239, -5.538 )
Calculations and Intervals on Original Scale
Warning : BCa Intervals used Extreme Quantiles
Some BCa intervals may be unstable
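The printed output is convenient for inspection, but the interval limits can also be extracted programmatically. The following sketch shows this on a toy problem that runs without the gasoline data: the slope of a simple linear regression on the built-in mtcars data stands in for a PCR coefficient. The object names toy.boot, toy.ci, and limits are hypothetical, and the use of mtcars is an assumption made purely for illustration.

```r
library(boot)  # recommended package, part of standard R installations

set.seed(7)
# Statistic function: receives the data and the resampled row indices
toy.boot <- boot(mtcars,
                 function(x, ind) coef(lm(mpg ~ wt, data = x[ind, ])),
                 R = 999)

dim(toy.boot$t)   # one row per bootstrap replicate, one column per coefficient
toy.boot$t0       # estimates from the full data

# Percentile interval for the second coefficient (the slope)
toy.ci <- boot.ci(toy.boot, index = 2, type = "perc")

# $percent holds: level, two order-statistic positions, lower and upper limit
limits <- toy.ci$percent[4:5]
limits
```

Only the percentile interval is requested here to keep the sketch short; other interval types are stored in correspondingly named elements of the returned object.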

The warning messages arise because in the extreme tails of the bootstrap distribution it is very difficult to make precise estimates; in such cases one really needs more bootstrap samples to obtain somewhat reliable estimates. Nevertheless, one can see that the intervals agree reasonably well; the BCα intervals are slightly shifted downward compared to the percentile intervals. For this coefficient, in absolute value the largest of the set, neither contains zero, as expected. In total, the percentile intervals show 318 cases where zero is not in the 95% confidence interval; the BCα intervals lead to 325 such cases.
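Counts of this kind can be obtained by computing an interval for every column of the matrix of bootstrap estimates and checking whether it excludes zero. The sketch below illustrates the idea for percentile intervals on a small simulated matrix, boot.t, that stands in for gas.pcr.bootCI$t; the matrix, its four columns, and all names are assumptions for illustration. Note that quantile() may interpolate slightly differently from boot.ci, so the limits need not match its output exactly.

```r
set.seed(42)
R <- 999
# Hypothetical bootstrap estimates for four coefficients (one column each)
boot.t <- cbind(rnorm(R, mean = -5.5, sd = 0.2),  # clearly negative
                rnorm(R, mean =  0.0, sd = 1.0),  # centred on zero
                rnorm(R, mean =  3.0, sd = 0.5),  # clearly positive
                rnorm(R, mean =  0.1, sd = 1.0))  # close to zero

# 95% percentile limits: row 1 holds the lower, row 2 the upper limits
ci <- apply(boot.t, 2, quantile, probs = c(0.025, 0.975))

# A coefficient is "significant" if its interval lies entirely on one side of zero
excludes.zero <- ci[1, ] > 0 | ci[2, ] < 0
sum(excludes.zero)
```

For BCα intervals the same count would require one boot.ci call per coefficient, since that function handles a single index at a time.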

It is interesting to repeat this exercise using a really large number of principal components, say twenty (remember, the gasoline data set only contains sixty samples). We would expect much more variation in the coefficients, since the model is more flexible and can adapt to changes in the training data much more easily. More variation means wider confidence intervals, and fewer "significant" cases, where zero is not included in the CI. Indeed, using twenty PCs leads to only 71 significant cases for the percentile intervals, and 115 for BCα (and an increased number of warning messages from the boot.ci function as well).
