Abstract

Method execution in the SummarizedBenchmark framework is handled by the buildBench() function. Occasionally, errors occur when running several methods on a new data set. This vignette describes the approach to error handling implemented in buildBench(), along with the option to disable it. SummarizedBenchmark package version: 2.3.4

Introduction

When running a large benchmark study, it is not uncommon for a single method or a small subset of methods to fail during execution. This may be the result of misspecified parameters, an underlying bug in the software, or any number of other reasons. By default, errors thrown by methods which fail during buildBench() or updateBench() (see Feature: Iterative Benchmarking for details on updateBench()) are caught and handled in a user-friendly way. As long as at least one method executes without any errors, a SummarizedBenchmark object is returned as usual, with the assay columns of failed methods set to NA. Additionally, the corresponding error messages are stored in the metadata of the object for reference.
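The general idea of catching errors per method, filling failed columns with NA, and keeping the messages for later can be sketched in base R. This is only an illustration of the pattern, not the implementation used inside buildBench(); the methods `good` and `bad` and all variable names are hypothetical.

```r
# Hypothetical stand-ins for benchmark methods; not part of SummarizedBenchmark.
methods <- list(
    good = function() c(1, 2, 3),
    bad  = function() stop("misspecified parameter")
)

# Run each method, turning any error into a condition object instead of halting.
runs <- lapply(methods, function(f) tryCatch(f(), error = function(e) e))
failed <- vapply(runs, inherits, logical(1), what = "error")

# Failed methods get an all-NA column of the same length as a successful one.
n <- length(runs[[which(!failed)[1]]])
results <- sapply(names(runs), function(m) {
    if (failed[[m]]) rep(NA_real_, n) else runs[[m]]
})

# Error messages are kept separately, keyed by method name.
messages <- lapply(runs[failed], conditionMessage)
```

Here `results` is a matrix with one column per method, and `messages` plays the role that the object metadata plays in a SummarizedBenchmark object.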

Simple Case Study

library(SummarizedBenchmark)
library(magrittr)

As an example, consider a case where we benchmark two simple methods. The first, slowMethod, draws 5 random normal samples after waiting 5 seconds; the second, fastMethod, draws 5 random normal samples immediately. Each method is then passed through two post-processing functions: keepSlow and makeSlower for slowMethod, and keepFast and makeSlower for fastMethod. This results in three partially overlapping assays: keepSlow, keepFast and makeSlower. With this example, we also demonstrate how mismatched assays are handled across methods.

bdslow <- BenchDesign(data = tdat) %>%
    addMethod("slowMethod", function() { Sys.sleep(5); rnorm(5) },
              post = list(keepSlow = identity,
                          makeSlower = function(x) { Sys.sleep(5); x })) %>%
    addMethod("fastMethod", function() { rnorm(5) },
              post = list(keepFast = identity,
                          makeSlower = function(x) { Sys.sleep(5); x }))

We run these methods with parallel = TRUE and specify a timeout limit of 1 second for the BPPARAM (here a SerialParam). Since slowMethod sleeps for 5 seconds, it will exceed the limit and fail. fastMethod itself completes in time, but will fail during the makeSlower post-processing function, which also sleeps for 5 seconds.

bpp <- SerialParam()
bptimeout(bpp) <- 1

sbep <- buildBench(bdslow, parallel = TRUE, BPPARAM = bpp)
## !! error caught in buildBench !!
## !! error in main function of method: 'slowMethod'
## !!  original message: 
## !!  reached elapsed time limit
## !! error caught in buildBench !!
## !! error in method: 'fastMethod', post: 'makeSlower'
## !!  original message: 
## !!  reached elapsed time limit

Notice that during the execution process, errors caught by buildBench() are printed to the console along with the name of the failed method and post-processing function when appropriate.

We can verify that a valid SummarizedBenchmark object is still returned with the remaining results.

sbep
## class: SummarizedBenchmark 
## dim: 5 2 
## metadata(1): sessions
## assays(3): keepFast keepSlow makeSlower
## rownames: NULL
## rowData names(3): keepFast keepSlow makeSlower
## colnames(2): slowMethod fastMethod
## colData names(4): func.pkg func.pkg.vers func.pkg.manual
##   session.idx

We can also check the values of the assays.

sapply(assayNames(sbep), assay, x = sbep, simplify = FALSE)
## $keepFast
##      slowMethod  fastMethod
## [1,]         NA  0.34710757
## [2,]         NA  0.89652630
## [3,]         NA  0.53201779
## [4,]         NA  0.19891399
## [5,]         NA -0.09981322
## 
## $keepSlow
##      slowMethod fastMethod
## [1,]         NA         NA
## [2,]         NA         NA
## [3,]         NA         NA
## [4,]         NA         NA
## [5,]         NA         NA
## 
## $makeSlower
##      slowMethod fastMethod
## [1,]         NA         NA
## [2,]         NA         NA
## [3,]         NA         NA
## [4,]         NA         NA
## [5,]         NA         NA

Notice that most columns contain only NA values. These columns correspond both to methods which returned errors and to methods missing a post-processing function, e.g. no keepSlow function was defined for fastMethod. The NA values alone cannot be used to distinguish between these two sources, but the source is documented in the sessions list of the SummarizedBenchmark metadata. The sessions object is a list containing information for all previous sessions; here we are only interested in the current, first session. (For more details on why multiple sessions may be run, see the Feature: Iterative Benchmarking vignette.)

names(metadata(sbep)$sessions[[1]])
## [1] "methods"     "results"     "parameters"  "sessionInfo"

In sessions, there is a "results" entry which includes a summary of the results for each combination of method and post-processing function (assay). The entries of results can take one of three values: "success", "missing", or an error message of class buildbench-error. The easiest way to view these results is by passing them to the base R function simplify2array().

simplify2array(metadata(sbep)$sessions[[1]]$results)
##            slowMethod                   fastMethod                  
## keepSlow   "reached elapsed time limit" "reached elapsed time limit"
## makeSlower "reached elapsed time limit" "missing"                   
## keepFast   "missing"                    "success"

In the returned table, columns correspond to methods, and rows correspond to assays. We clearly see that several of the method and assay combinations failed due to exceeding the specified time limit. If we check one of these entries more closely, we see that it is indeed a buildbench-error object, and that the error occurred ("origin") during the "main" function.

metadata(sbep)$sessions[[1]]$results$slowMethod$keepSlow
## [1] "reached elapsed time limit"
## attr(,"class")
## [1] "buildbench-error"
## attr(,"origin")
## [1] "main"
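Because error entries carry the buildbench-error class, the results list can also be collapsed programmatically into a simple status table. The sketch below uses a small mock list in place of metadata(sbep)$sessions[[1]]$results, assuming the same nesting (a list of methods, each a list of per-assay entries); the method and assay names are hypothetical.

```r
# Mock results list; method and assay names here are hypothetical.
err <- structure("reached elapsed time limit",
                 class = "buildbench-error", origin = "main")
res <- list(
    methodA = list(assay1 = err, assay2 = "missing"),
    methodB = list(assay1 = "success", assay2 = "missing")
)

# Collapse each entry to one of "success", "missing", or "error".
status <- sapply(res, function(method) {
    vapply(method, function(x) {
        if (inherits(x, "buildbench-error")) "error" else as.character(x)
    }, character(1))
})

# Methods with at least one caught error.
failedMethods <- colnames(status)[colSums(status == "error") > 0]
```

Checking the class with inherits() rather than comparing message strings keeps the classification independent of the wording of any particular error.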

Disable Handling

If this error handling is not wanted, and the user would like the benchmark experiment to terminate when an error is thrown, then the optional parameter catchErrors = FALSE can be specified to either buildBench() or updateBench(). Generally, this is advised against, as the outputs computed for all non-failing methods will also be lost. As a result, the entire benchmarking experiment will need to be re-executed.
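As a minimal sketch of this option, the following runs a deliberately failing method with catchErrors = FALSE and wraps the call in tryCatch() so that the propagated error can be inspected. The toy data frame, the failMethod method, and the variable names are hypothetical and chosen only for illustration.

```r
library(SummarizedBenchmark)
library(magrittr)

# Hypothetical toy data and a method that always fails.
toy <- data.frame(x = 1:5)
bdfail <- BenchDesign(data = toy) %>%
    addMethod("failMethod", function() { stop("deliberate failure") })

# With catchErrors = FALSE the error is not handled by buildBench(),
# so we catch it ourselves and keep only its message.
out <- tryCatch(buildBench(bdfail, catchErrors = FALSE),
                error = function(e) conditionMessage(e))
```

No SummarizedBenchmark object is returned in this case; `out` holds only the message of the error that stopped the run.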
