A shortcut function for cross-validating the classifier is also provided.
The software is best suited to analyzing very high-dimensional data, for example diagnosing cancer from gene expression data.
The original real-valued colon data in R format: colon.rda. The binary colon data in R format: colon.bin.rda. The data cover 62 patients (split 40 vs. 22 between the two classes) and 2000 genes. Either file can be loaded into the R workspace with the "load" function:
> load("colon.bin.rda")
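To check what "load" actually restored, it helps to know that it returns the names of the restored objects. The sketch below illustrates this with a small stand-in object saved to a temporary file. The name demo.bin and its contents are made up for illustration and are not part of the colon data.

```r
# Save a small hypothetical binary matrix to a temporary .rda file.
tmp <- tempfile(fileext = ".rda")
demo.bin <- matrix(rbinom(20, 1, 0.5), nrow = 4)
save(demo.bin, file = tmp)
rm(demo.bin)

# load() restores the saved objects and returns their names.
loaded <- load(tmp)
print(loaded)         # names of the objects now in the workspace
print(dim(demo.bin))  # rows and columns of the restored matrix
```

The same pattern applied to colon.bin.rda shows which objects that file provides before they are passed to cv.bayes.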
Test how well the above method performs using leave-one-out cross-validation:
> cv.bayes(colon.bin, TRUE, 62, 4, 0.4, 30, 0.8, 5, 30, TRUE, 40)
Results:
The output of the above R command is shown in cv-colon-result. The error rate of
this analysis is 0.0967742, i.e. 6 out of 62 cases were misclassified. This is the
lowest error rate for the colon data when compared to the results collected by
Prof. Tibshirani. Only 4 of the 2000 features were selected in each iteration of
the cross-validation. The method is also fast: the 62-fold cross-validation took
103 seconds in total, including the time for feature selection. Finally, the
method is conceptually simple.
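
The leave-one-out procedure described above, with feature selection repeated inside every fold, can be sketched as follows. This is only an illustration under stated assumptions, not the implementation behind cv.bayes: the data are simulated, the classifier is a plain naive Bayes stand-in for binary features, and the helper names select_features, predict_nb and loocv_error are hypothetical.

```r
# Simulated stand-in data (NOT the colon data): 62 cases, 50 binary features.
set.seed(1)
n <- 62; p <- 50; k <- 4
y <- rep(c(0, 1), times = c(40, 22))
x <- matrix(rbinom(n * p, 1, 0.5), nrow = n)
# Make the first 4 features agree with the label about 85% of the time.
x[, 1:4] <- ifelse(matrix(runif(n * 4), n) < 0.85, y, 1 - y)

# Rank features by the absolute difference of class-wise means; keep the top k.
select_features <- function(x, y, k) {
  score <- abs(colMeans(x[y == 1, , drop = FALSE]) -
               colMeans(x[y == 0, , drop = FALSE]))
  order(score, decreasing = TRUE)[seq_len(k)]
}

# Naive Bayes for binary features with add-one smoothing (stand-in classifier).
predict_nb <- function(x_train, y_train, x_new) {
  log_post <- sapply(c(0, 1), function(cl) {
    rows <- x_train[y_train == cl, , drop = FALSE]
    theta <- (colSums(rows) + 1) / (nrow(rows) + 2)
    log(mean(y_train == cl)) +
      sum(x_new * log(theta) + (1 - x_new) * log(1 - theta))
  })
  c(0, 1)[which.max(log_post)]
}

# Leave-one-out: hold out case i, select features and fit on the remaining
# n - 1 cases only, then predict the held-out case.
loocv_error <- function(x, y, k) {
  n <- nrow(x)
  pred <- numeric(n)
  for (i in seq_len(n)) {
    keep <- select_features(x[-i, , drop = FALSE], y[-i], k)
    pred[i] <- predict_nb(x[-i, keep, drop = FALSE], y[-i], x[i, keep])
  }
  mean(pred != y)  # fraction of misclassified cases
}

err <- loocv_error(x, y, k)
print(err)

# The reported rate is the same kind of fraction: 6 errors out of 62 cases.
print(round(6 / 62, 7))
```

Selecting the features inside the loop, using only the 61 training cases of each fold, keeps the held-out case from influencing which features are chosen. It is also why the reported timing includes feature selection.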