
Bayesian Modeling and Inference for One-Shot Experiments

Pages 55-64 | Received 25 Nov 2022, Accepted 29 May 2023, Published online: 24 Jul 2023

Abstract

In one-shot experiments, units are subjected to varying levels of stimulus and their binary response (go/no-go) is recorded. Experimental data is used to estimate the “sensitivity function”, which characterizes the probability of a “go” for a given level of stimulus. We review the current GLM approaches to modeling and inference, and identify some deficiencies. To address these, we propose a novel Bayesian approach using an adjustable number of cubic splines, with physically-plausible smoothness, monotonicity, and tail constraints introduced through the prior distribution on the coefficients. Our approach runs “out of the box,” and in roughly the same time as the GLM approaches. We illustrate with two contrasting datasets, and show that our more flexible Bayesian approach gives different inferences to the GLM approaches for both the sensitivity function and its inverse.

1 Introduction

A detonator is given an energy stimulus, and the probability that the detonator fires (a “go”) is a function of the amount of energy. This is an example of a “single event go/no-go” test, also known as a “one-shot” test or a “sensitivity” test. We will use “one-shot” test, and “experiment” for a set of tests at specified values of the stimulus. “One-shot” and “single event” indicate that a unit can only be tested once, regardless of the outcome, because after the stimulus the unit is no longer factory-fresh. For explosive devices, there is also the cost of disposing of a “no-go” unit, which still has the potential to “go”. Thus, one-shot experiments tend to be expensive, and there is a preference not to do tests at low stimulus values.

There is a large literature on one-shot experiments; see, for example, Langlie (Citation1963), Wetherill (Citation1963), Young and Easterling (Citation1994), Neyer (Citation1994), Dror and Steinberg (Citation2008), and Wu and Tian (Citation2014). Although much of the literature concerns explosive devices, we prefer to write “stimulus” and “go” rather than, say, “energy” and “fire,” to reflect that there are also plenty of other applications. In ecotoxicology, for example, the stimulus is the concentration of the chemical, and “go” is the survival of an organism, or some other endpoint such as growth or reproduction (see, e.g., Hickey and Hart Citation2013, sec. 14.2.1).

Call the probability of a “go,” as a function of the stimulus, the “sensitivity function.” Broadly, the purpose of one-shot experiments is to characterize the sensitivity function. Sometimes the focus is the center of the sensitivity function. For example, in ecotoxicology one quantity of interest is the EC50, the concentration at which 50% of the organisms survive. But sometimes the focus is deep in the tails. For example, evaluation of the No-Fire Threshold (NFT) of a detonator requires an uncertainty assessment of the inverse sensitivity function at p = 0.001, which is a stimulus far lower than those used in the tests.

Section 2 reviews current approaches, which fall into three overlapping categories. We show that these approaches have deficiencies, and we claim that a Bayesian approach can address these, but at a potentially higher cost, both in terms of code complexity and compute time. Section 3 presents our novel Bayesian approach. Section 4 is a short discussion about uncertainty assessment. Section 5 illustrates with two contrasting datasets, and Section 6 concludes with a brief summary and an outline of current research directions. The Markov chain Monte Carlo (MCMC) sampler for the Bayesian posterior distribution is described in the supplementary material.

2 Review

Let x ↦ p(x) be the sensitivity function of a specified type of unit, where x is the stimulus, and p(x) is the probability of “go”. In an ideal world the sensitivity function would be a step-function, with a transition at some critical stimulus, but in reality it is an increasing function, where the difference between p(x) and 0 or 1 represents the influence of many uncontrolled factors. These include miscalibration of the stimulus, the profile of the delivered stimulus (which may ramp up over a short time-interval), manufacturing variations in the unit, exposure to contaminants during emplacement, environmental conditions such as temperature and humidity, “shake, rattle and roll,” and so on.

We will use a parametric model for the sensitivity function. Interest in one-shot experiments often focuses on the inverse of the sensitivity function, which Wetherill (Citation1963) in an early review refers to as the “quantal response function,” q(·) := p⁻¹(·), which we shorten to “quantal function.” The quantal function comes for free with a parametric model for the sensitivity function, and therefore this section and the next focus on the sensitivity function.

2.1 Current GLM-based Approaches

Consider this parametric model for the sensitivity function: (1a) p(x) = Φ(x; μ, σ²), where Φ(·) is the Gaussian distribution function with mean and variance arguments. Another way to express this model is (1b) p(x) = Φ((x − μ)/σ; 0, 1) = Φ_S(α0 + α1x), where Φ_S(·) is the standard Gaussian distribution function, and α0 := −μ/σ, α1 := 1/σ. When attached to a Binomial likelihood function, this is a Binomial Generalized Linear Model (GLM) with a Probit link function; see (6). This is the standard model for one-shot experiments, with either a Probit or a Logit link function (see, e.g., Teller et al. Citation2016).

Unfortunately, (1b) has the feature that p(x) > 0 for x ≤ 0, which is nonphysical. This would be inappropriate for any analysis of the sensitivity function at low values of the stimulus, and hard to defend in the presence of alternatives. The obvious alternative to (1) is (2a) p(x) = Φ_S(α0 + α1 log(x)), for which lim_{x↓0} p(x) = 0. Reversing the argument in (1), (2b) p(x) = Φ(log(x); μ, σ²) = L(x; μ, σ²), where “L” is the distribution function of the Lognormal distribution. So the problem with (1) can be fixed simply by using log(x) instead of x, and the result is equivalent to using a Lognormal link function in x, and fitting over (μ, σ).
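The equivalence between (2a) with a Probit link and the Lognormal distribution function in (2b) is easy to verify numerically. A minimal sketch in Python (the parameter values are illustrative, not taken from any dataset):

```python
import numpy as np
from scipy.stats import norm, lognorm

# Illustrative parameter values only
mu, sigma = 2.0, 0.5
a0, a1 = -mu / sigma, 1.0 / sigma            # alpha_0 = -mu/sigma, alpha_1 = 1/sigma

x = np.linspace(0.1, 30.0, 200)
p_glm_log = norm.cdf(a0 + a1 * np.log(x))                 # (2a): Probit GLM in log(x)
p_lognorm = lognorm.cdf(x, s=sigma, scale=np.exp(mu))     # (2b): Lognormal CDF

assert np.allclose(p_glm_log, p_lognorm)     # the two forms agree pointwise
```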

So, to summarize, there are three approaches for parametric models of the sensitivity function, which overlap:

  1. GLM-lin: Binomial GLM in x, (1b);

  2. GLM-log: Binomial GLM in log(x), (2a);

  3. Distribution approach: the link function is the quantile function for a nonnegative random variable, like (2b).

The GLM approaches can use a standard GLM algorithm such as glm in the R statistical programming environment (R Core Team Citation2020). The Distribution approach requires a numerical optimization to fit the distribution parameters by Maximum Likelihood, unless it happens to coincide with the GLM-log approach, as the Lognormal does under a Probit link function. Apart from the Lognormal, other obvious choices of two-parameter distribution would be the Gamma, Weibull, and Gompertz.
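For the Distribution approach, the numerical optimization is only a few lines in any scientific computing environment. A sketch in Python with a Weibull sensitivity function; the toy dataset and starting values are ours, purely for illustration:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import weibull_min

# Synthetic one-shot data: stimulus x_i, number of tests n_i, number of "go"s y_i
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
n = np.array([5, 5, 5, 5, 5])
y = np.array([0, 1, 3, 4, 5])

def neg_log_lik(theta):
    shape, scale = np.exp(theta)             # log-parameterization keeps both positive
    p = weibull_min.cdf(x, shape, scale=scale)
    p = np.clip(p, 1e-12, 1 - 1e-12)         # guard the log terms
    return -np.sum(y * np.log(p) + (n - y) * np.log(1 - p))

fit = minimize(neg_log_lik, x0=np.log([1.0, 3.0]), method="Nelder-Mead")
shape_hat, scale_hat = np.exp(fit.x)
```

The Weibull CDF is increasing in x and satisfies p(x) = 0 for x ≤ 0, which is the point of the Distribution approach.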

The second and third approaches respect the condition p(x) = 0 for x ≤ 0. For completeness, note that this condition can be generalized to p(x) = 0 for x ≤ x0 simply by replacing x with x − x0.

Finally, it is important to recognize that there is one situation where GLM-lin and GLM-log will give effectively the same fit. Let xmid be some central value for the test stimuli, and write x = xmid + δ. If |δ| ≪ xmid, then (3) α0 + α1 log(x) = α0 + α1 log{xmid(1 + δ/xmid)} ≈ α0 + α1 log(xmid) + α1(δ/xmid) = α0′ + α1′x, where α0′ := α0 + α1{log(xmid) − 1} and α1′ := α1/xmid. It is useful to operationalize this property for the whole dataset, albeit heuristically.

Let xu be the largest “no-go” stimulus, and xl be the smallest “go” stimulus, and let xmid be their midpoint. The interval [xl, xu] is the “overlap,” assuming, as is usual in a designed experiment, that xl < xu, so that the overlap is the interval of stimulus values containing both “no-go’s” and “go’s”. A fitted GLM will tend to set p(x) ≈ 0 for x < xl, and p(x) ≈ 1 for x > xu, these being the maximum likelihood values of a fully flexible model: crudely, the overlap is where the fitting occurs. The approximation in (3) relies on log(1 + v) ≈ v, which holds to two significant figures for |v| < 1/8. Therefore, the twin conditions (xmid − xl)/xmid < 1/8 and (xu − xmid)/xmid < 1/8 place the overlap inside the region where the approximation in (3) applies. Equivalently, (4) (xu − xl)/(xu + xl) < 1/8, which we call the “overlap condition,” with the term on the left being the “overlap ratio.” If the overlap condition holds, then the fitted sensitivity functions in GLM-lin and GLM-log will be effectively the same.

The overlap condition often holds in practice, which probably explains why GLM-lin persists, despite being nonphysical: it is “effectively physical.” However, being nonphysical matters whenever very small values of the stimulus are of interest, as will be shown in Section 5. Since it costs nothing to replace x with log(x) in a GLM, we strongly deprecate the use of GLM-lin in practice. After all, if the overlap condition holds then the fitted GLM-lin and GLM-log are effectively the same, and if the overlap condition does not hold then the fitted GLM-lin is potentially nonphysical.
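The overlap condition is cheap to check before choosing between the GLM approaches. A sketch of the computation in (4); the helper name and the toy data are ours:

```python
import numpy as np

def overlap_ratio(x, go):
    """Overlap ratio (x_u - x_l)/(x_u + x_l), per (4), where x_u is the
    largest "no-go" stimulus and x_l is the smallest "go" stimulus."""
    x, go = np.asarray(x, float), np.asarray(go, bool)
    x_u = x[~go].max()      # largest "no-go"
    x_l = x[go].min()       # smallest "go"
    return (x_u - x_l) / (x_u + x_l)

# Hypothetical experiment in which the overlap condition holds
x  = [18, 19, 20, 21, 22, 23]
go = [0,  0,  1,  0,  1,  1]
r = overlap_ratio(x, go)    # (21 - 20)/(21 + 20), well below 1/8
```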

2.2 Limitations in these Approaches

These different approaches highlight an important distinction when it comes to generalizations. We do not believe that two parameters are sufficient to capture the complexity of the sensitivity function, over the full range of the stimulus. However, uncontrolled factors blur the sensitivity function in experiments, and affect the unit in usage. What is estimated from an experiment is that part of the sensitivity function which emerges from the blur, and this might be quite low-dimensional, especially in small experiments. But to insist that it is two-dimensional is very prescriptive. In general it would be better to allow the number of parameters to be larger and adjustable. This in turn requires approaches that can be generalized to allow for an adjustable number of parameters.

The GLM approaches can easily be generalized to more than two parameters simply by including more regressors, for example (5) p(x) = Φ_S(α0 + α1 log(x) + α2{log(x)}²) for GLM-log. However, the resulting estimated sensitivity function is not necessarily increasing. This could be enforced with a careful choice of regressors and sign constraints on the coefficients, but this would no longer be a standard GLM, and would require more complicated methods with less-well-understood properties (Pya and Wood Citation2015).

The Distribution approach will ensure that the sensitivity function is increasing, but the pool of distribution functions with an adjustable number of parameters is poorly-defined. For example, there are three-parameter generalizations of the Gamma and Weibull distributions, but no obvious further extensions. Of course we could confect such distributions, but they would be largely arbitrary, and intractable.

So both approaches are deficient, if we want a parametric sensitivity function which is both increasing and has an adjustable number of parameters.

These approaches have another deficiency too, which is uncertainty assessment. Formal characterizations of the sensitivity function are expressed with a specified level of confidence. For example, the NATO definition of the No-Fire Threshold (NFT) is the value xL for which (xL, ∞) is a 95% confidence interval for q(0.001), known as a lower one-sided 95% confidence interval for q(0.001); see (STANAG 4560 Citation2016, p. 4). The NFT requires confidence intervals for stimuli which are far lower than those used in the experiment, which are typically around p = 0.5.

Unfortunately, there are no practical algorithms for producing accurate confidence intervals at very low values of the stimulus. Thus, the coverage of practical algorithms which produce a lower one-sided nominal 95% confidence interval for q(0.001) cannot be guaranteed to be at least 95%. The accuracy of practical algorithms such as the delta method (Casella and Berger Citation2002, p. 240), likelihood ratios (Neyer Citation1992), or the bootstrap (Efron and Hastie Citation2016, chap. 10 and 11) is only guaranteed under regularity conditions and as the number of tests goes to infinity. Even where the regularity conditions hold, convergence in the tails of the quantal function is likely to be slow. A typical one-shot experiment might involve tens of tests, which seems far too low to trust an asymptotic result in the tail of the quantal function. So a value xL can be computed, for the NFT, but the true coverage of the algorithm which produces xL is poorly known.

This pessimistic assessment is supported by the empirical study of Young and Easterling (Citation1994), who find that both point estimates and standard asymptotic confidence intervals perform poorly for q(0.99) and q(0.999), for sample sizes up to n = 50.

The need for accurate uncertainty assessment in the tails of the sensitivity function favors a Bayesian approach to estimation and prediction. This will produce an effectively exact credible interval for the sensitivity function at any stimulus. Steinberg, Dror, and Villa (Citation2014) also advocate a Bayesian parametric approach, for a variety of reasons including uncertainty assessment. It is a moot point whether a user such as a regulator would prefer an approximate confidence interval or an exact credible interval; we are content to provide an approach for the latter, to widen the pool of options.

A Bayesian approach can also meet the other needs discussed above: (i) p(x) = 0 for x ≤ 0, (ii) p(·) increasing, and (iii) p(·) parameterized by an adjustable number of parameters. In fact, as we show in Section 3.2, a Bayesian approach can generalize the condition that p(x) = 0 for x ≤ 0. The GLM-log and Distribution approaches have p(x) > 0 for x > 0. That is, they “lift off” at x = 0. The Bayesian approach allows liftoff to happen at some unknown value which may be substantially greater than zero. The additional requirement to specify a prior distribution for the parameters should not diminish the appeal of a Bayesian approach, because a flexible model with a flexible prior distribution is far less prescriptive than the two-parameter models that are currently used.

The issue with a Bayesian approach is that it can be time-consuming if a sampling approach like Markov chain Monte Carlo (MCMC) is required. Overall, there is no clear winner, but we believe the arguments above favor a Bayesian approach if the technical and computational cost can be managed, or if the application justifies a high cost. The next section presents a Bayesian approach, to widen the set of options available to the analyst.

Finally, if we were required to provide just one justification for favoring a Bayesian approach over the GLM approaches, it would be the flexibility of the Bayesian approach to depart from symmetry. GLM-lin is symmetric by construction, since the Gaussian distribution is symmetric. Then GLM-log is often nearly symmetric, because, as shown in Section 2.1, the predictions of GLM-log and GLM-lin will be effectively the same when the overlap condition holds. But there is seldom anything in the physics, chemistry, biology, engineering, or other aspects of the application, to justify this symmetry. The same argument applies to the Distribution approach, which also imposes a very tight relationship between the lower and the upper half of the sensitivity function.

3 Bayesian Approach

This section presents our novel Bayesian approach for analyzing datasets from one-shot experiments. We are not the first authors to propose a Bayesian approach, per se. For example, Dror and Steinberg (Citation2008), Steinberg, Dror, and Villa (Citation2014), and Teller et al. (Citation2016) pursue an approach to experimental design in which the GLM-lin model with a Probit link function is treated in a Bayesian fashion with a prior distribution on (α0,α1), rather than estimated through the usual GLM machinery. This is advantageous in experimental design, where expert judgement can provide an informative prior distribution. The critical feature of our approach is its functional flexibility, as summarized in the final paragraph of the previous section. We show below that this imposes a high level of complexity on the parameter space, which requires a Bayesian inference to manage the uncertainty assessment. So when we write “Bayesian approach,” it is this flexibility that is the crucial feature, not simply the presence of a prior distribution on the parameters.

Furthermore, there are other approaches to Bayesian isotonic regression, such as Neelon and Dunson (Citation2004). Our approach is different from these approaches because of the need to respect physically-plausible features of sensitivity functions. Additionally, we are sensitive to two user needs. First, to provide defensible assessments of uncertainty in the tails of the sensitivity function and the quantal function, despite possibly a small number of tests. Second, not to substantially increase user effort over the current GLM approaches.

3.1 The Model

Start with the GLM-lin approach described in the previous section. The statistical model is (6) Yi ∼ Bin(ni, p(xi)), i = 1, …, m, with g(p(x)) = α0 + α1x, where “Bin” is a Binomial distribution with size and probability arguments, and g(·) is a link function, like the quantile function of the standard Gaussian distribution (Probit). We have chosen, without loss of generality, to group the tests within m stimulus bins, taken to be closed on the left. In this case, ni is the number of tests which fall within the ith stimulus bin, Yi is the number of “go’s”, and xi is the midpoint of the bin.

Denote the zero-point of the stimulus as x0: the value at which it is practically impossible for a “go” to occur, possibly the physical lower limit of the stimulus values. Usually this would be x0=0, but we write x0 rather than 0 for clarity and generality.

In our model we require the link function g(·) to satisfy g⁻¹(v) = 0 for all v ≤ 0, that is, to have a hard lower bound, and we replace α0 + α1x with a more flexible function, but still smooth, continuous, and increasing: (7) h(x; α) := ∑_{j=1}^{k} αj bj(x) = αᵀb(x), where the bj(·) are cubic B-spline basis functions on an evenly-spaced set of knot locations (see, e.g., Wood Citation2017, sec. 5.3.3). The use of splines requires specified lower and upper bounds for x, denoted x̲ and x̄. These bounds delimit the range of stimulus values of interest, where xi ∈ [x̲, x̄] for each i, and x0 ≤ x̲. The condition (8) α1 ≤ α2 ≤ ⋯ ≤ αk is sufficient for h(·) in (7) to be increasing; see (Wood Citation2017, sec. 5.3.6) or (17). The sufficiency of this condition holds whether or not the knots are evenly-spaced, and it would be an option to use an uneven knot spacing in [x̲, x̄]. However, it is simpler to have evenly-spaced knots which are implicitly defined by the range [x̲, x̄] and the number k.

For later reference, cubic B-splines have k + 4 knots in total, here denoted x(j), for which x̲ = x(4) and x̄ = x(k+1), so that the knot spacing is (9) Δ := (x̄ − x̲)/(k − 3).

The jth cubic spline bj(·) is zero for x ≤ x(j) and for x ≥ x(j+4), and strictly positive and symmetric between x(j) and x(j+4). Therefore, for any x, either three or four basis functions are strictly positive: four in general, but three at the knots, including x̲ and x̄. See (Wood Citation2017, sec. 5.3.3) for a summary, or de Boor (Citation2001) for full details. Figure 1 shows the basis functions on [0,1] with k = 5.

Fig. 1 Cubic spline basis functions on [0,1] with k = 5. The 9 knots are shown as crosses on the horizontal axis; the knot spacing is Δ = 0.5. The five basis functions on [0,1] are shown as solid lines, while their continuations outside [0,1] are shown as dashed lines. At most values of x ∈ [0,1] there are four nonzero basis functions, but at the knots, including the two endpoints, there are only three.
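The properties above (k + 4 knots, basis functions summing to one, three or four strictly positive terms at any x, and the monotonicity of h under (8)) can be checked directly. A sketch in Python using scipy's B-spline design matrix; the coefficient values are arbitrary:

```python
import numpy as np
from scipy.interpolate import BSpline

k = 5                                        # number of cubic basis functions
lo, hi = 0.0, 1.0                            # the range of interest
delta = (hi - lo) / (k - 3)                  # knot spacing, per (9)
knots = lo + delta * np.arange(-3, k + 1)    # k + 4 evenly spaced knots

x = np.linspace(lo, hi, 101)
B = BSpline.design_matrix(x, knots, 3).toarray()   # row i is b(x_i)^T

assert B.shape == (101, k)
assert np.allclose(B.sum(axis=1), 1.0)             # basis sums to 1 on [lo, hi]
nz = (B > 1e-12).sum(axis=1)
assert set(nz) <= {3, 4}                           # 3 or 4 strictly positive terms

alpha = np.array([-1.0, -0.5, 0.0, 0.4, 1.2])      # ordered as in (8)
h = B @ alpha
assert np.all(np.diff(h) > 0)                      # so h is increasing
```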

Finally in this section, we suggest default choices for [x̲, x̄], k, and g(·). The range [x̲, x̄] is usually determined by the application. Well-known experimental design approaches, like Langlie (Citation1963), take such a range as their starting-point. But where the user is reluctant to provide a range, a “pretty” range is easy to construct based on the test stimulus values, possibly extending down to x0.

For the number of basis functions we reason as follows. Current practice is satisfied with two parameters: α0 and α1 in (6). As discussed in Section 2, this is too restrictive. On the other hand, we do not want to make k huge, because there is only a limited amount of information in a one-shot experiment. We have experimented with using a variety of k values and selecting one using an information criterion (WAIC, see Gelman et al. Citation2014, sec. 7.2), but the results were not compelling across the wide range of experimental datasets encountered in practice. Therefore, our starting-point is k = 5, which is modestly larger than current practice.

However, if x0 is not too far below x̲, then it seems sensible to set x̲ ← x0 so that the model “sees” x0, and increase k to keep the density of knots about the same. Specifically, compute Δ from (9) with k = 5. If x0 is closer to x̲ than x̲ is to x̄, then set x̲ ← x0, and increase k to keep the knot spacing about the same: (10) k ← ⌈(x̄ − x̲)/Δ⌉ + 3, where ⌈v⌉ is the smallest integer not smaller than v. The user may want to select a smaller or larger k, depending on the number of tests in the experimental dataset, and also examine sensitivity to k for quantities of interest, such as low quantals.
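The recipe in (10) can be written as a small helper; the function name is ours, and the numbers below echo a range like the one used in Section 5.1:

```python
import math

def extend_range(x_lo, x_hi, x0, k=5):
    """If x0 is closer to x_lo than x_lo is to x_hi, extend the spline
    range down to x0 and increase k per (10), keeping the knot spacing
    Delta from (9) roughly constant."""
    delta = (x_hi - x_lo) / (k - 3)              # (9), with the default k
    if (x_lo - x0) < (x_hi - x_lo):
        k = math.ceil((x_hi - x0) / delta) + 3   # (10)
        x_lo = x0
    return x_lo, k

print(extend_range(10, 30, 0))    # x0 close enough: extend to 0, k becomes 6
print(extend_range(10, 30, -50))  # x0 far below: keep (10, 5) unchanged
```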

For the link function we reason as follows. Current practice is satisfied with a symmetric sigmoidal link function (Probit). Therefore, we will also use a symmetric sigmoidal link function, although the resulting sensitivity function will not be symmetric, because h(x; α) is not an affine function of x. We use the quantile function of a Beta(2,2) distribution, (11) g⁻¹(v) := Pr(V ≤ v), V ∼ Beta(2,2), which satisfies our requirement that g⁻¹(v) = 0 for v ≤ 0. This choice implies that g⁻¹(v) = 1 for v ≥ 1; that is, sufficiently high stimulus values are certain to cause a “go,” although these values may be well above x̄, the top of the range of interest. This particular Beta was chosen because of its width, but the result will not be sensitive to the choice because the spline coefficients can adjust.
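The hard bounds of this link are easy to confirm, since g⁻¹ is just the Beta(2,2) distribution function:

```python
from scipy.stats import beta

def g_inv(v):
    """Inverse link g^{-1}(v) = Pr(V <= v) for V ~ Beta(2, 2), per (11)."""
    return beta.cdf(v, 2, 2)

assert g_inv(-0.3) == 0.0              # g^{-1}(v) = 0 for v <= 0
assert g_inv(1.7) == 1.0               # g^{-1}(v) = 1 for v >= 1
assert abs(g_inv(0.5) - 0.5) < 1e-12   # symmetric sigmoid, median 1/2
```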

3.2 Prior Distribution

This section explains how to specify a prior distribution for the basis coefficients α, which adjusts automatically to the modeling choices of the previous section.

Putting aside all shape constraints on the sensitivity function, the natural prior distribution for α is a multivariate Gaussian distribution of the general form (12) α | σ ∼ Nk(ᾱ1, σ²W).

Both ᾱ and W can be given simple values. For ᾱ, note that E(h(x; α) | σ) = (ᾱ1)ᵀb(x) = ᾱ, because 1ᵀb(x) = 1 for all x ∈ [x̲, x̄] (de Boor Citation2001, p. 89). Therefore, ᾱ is the vertical offset, and the natural choice is (13) ᾱ ← g(0.5).

For the link function in (11), ᾱ = 0.5.

For W we reason as follows. As outlined above, cubic B-splines are localized. Therefore, consecutive smoothness in the components of α promotes smoothness in the sensitivity function. This feature is exploited in spline-based smoothing; for example, Eilers and Marx (Citation1996) proposed a penalty on α which amounted to a random walk prior (Wood Citation2017, sec. 5.3.3). However, the monotonic constraint (8) already imposes a high degree of smoothness on α, and we have found that it is unnecessary to add more. Therefore, we make the simplest choice, W ← I, the identity matrix.

Next, we need a prior distribution for σ. Letting σ̄ := E(σ), we tentatively propose that σ̄ satisfies sd(h(xmid) | σ̄) = g(0.75) − g(0.25) for some representative value xmid ∈ [x̲, x̄]. In other words, one standard deviation covers 50% of the range. This would imply (14) σ̄ ← {g(0.75) − g(0.25)} / ‖b(xmid)‖₂ after substituting W = I, where ‖·‖₂ is the Euclidean norm. However, the denominator can vary according to where xmid lies relative to the knots, whose location is determined by [x̲, x̄] and k. It is largest when xmid lies at a knot, and smallest when xmid lies equidistant between knots. So we define xmid to be a stimulus value equidistant between knots. For the link function in (11) with k = 5 basis functions, σ̄ = 0.512.
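The value σ̄ = 0.512 can be reproduced numerically: take the interquartile range of the Beta(2,2) link and divide by ‖b(xmid)‖₂ at a point midway between two knots. A sketch, with the knot construction as in Section 3.1:

```python
import numpy as np
from scipy.interpolate import BSpline
from scipy.stats import beta

k, lo, hi = 5, 0.0, 1.0
delta = (hi - lo) / (k - 3)                  # knot spacing, per (9)
knots = lo + delta * np.arange(-3, k + 1)    # k + 4 evenly spaced knots

x_mid = lo + delta / 2                       # equidistant between two knots
b = BSpline.design_matrix([x_mid], knots, 3).toarray().ravel()

iqr = beta.ppf(0.75, 2, 2) - beta.ppf(0.25, 2, 2)   # g(0.75) - g(0.25)
sigma_bar = iqr / np.linalg.norm(b)          # (14) with W = I
print(round(sigma_bar, 3))                   # -> 0.512
```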

We choose an Exponential prior distribution for σ, so that σ̄ is the only argument required. The Exponential distribution is the Shannon entropy maximizing distribution on the positive real line, subject to an expectation constraint. As will be seen below, this large σ̄ and Exponential distribution, in combination with the other choices, generates a wide and fairly neutral set of prior samples for the sensitivity function.

Fig. 2 Dror dataset, 100 samples from the prior and the posterior sensitivity functions.

Finally, we come back to the shape constraints, that p(·) is increasing, and that p(x) = 0 for x ≤ x0. The prior distribution on (α, σ) defines a stochastic process for p(·), the sensitivity function. In this stochastic process every sample path is a function which maps a stimulus value x into [0,1].

First, consider the case where x0 = x̲. The sample paths do not all pass through (x0, 0), since there will be values of α for which v = αᵀb(x0) > 0, and these sample paths pass through (x0, g⁻¹(v)), where g⁻¹(v) > 0. As the link function satisfies g⁻¹(v) = 0 for v ≤ 0, the conjunction (15) x0 = x̲: α1 ≤ ⋯ ≤ αk and αᵀb(x0) ≤ 0 is sufficient for p(·) increasing and p(x) = 0 for x ≤ x0. The second condition ensures that p(x0) = 0, and the first condition ensures that p(·) is increasing, which then implies that p(x) = 0 for all x ≤ x0. This conjunction does not rule out the possibility that p(x) = 0 for x > x0. This is in contrast with GLM-log, discussed in Section 2, where p(x) > 0 for all x > x0.

Now consider the case where x0 < x̲. This situation is more complicated because x0 lies outside the domain of the basis functions. In this case, we adapt the second condition in (15) by linearly extrapolating from x̲ to x0, which gives (16) αᵀb(x̲) + (x0 − x̲) h′(x̲; α) ≤ 0, which generalizes (15). de Boor (Citation2001, p. 116, eq. 12’) states the expression for the derivative of h(·): (17) h′(x; α) = 3 ∑_{j=1}^{k+1} {(α_j − α_{j−1}) / (x(j+3) − x(j))} qj(x), with α_0 = α_{k+1} = 0, where x(j) are the knot locations and qj(·) are quadratic splines (unfortunately the expression in Wood Citation2017, p. 209, is wrong). It is straightforward to check that (17) implies h′(x̲; α) = (α3 − α1)/(2Δ).

Furthermore, b(x̲) is only nonzero in its first three components (see Figure 1). Therefore, (16) can be written as (18a) R := {α ∈ ℝᵏ : α1 ≤ ⋯ ≤ αk and αᵀb̃ ≤ 0}, where b̃ = (b̃1, b̃2, b̃3, 0, …, 0), and (18b) b̃1 := b1(x̲) − c, b̃2 := b2(x̲), b̃3 := b3(x̲) + c, where c := (x0 − x̲)/(2Δ).

Constraining α | σ to R gives the final model: (19) Yi | α ∼ Bin(ni, p(xi; α)), i = 1, …, m; g(p(x; α)) = αᵀb(x); α | σ ∼ TNk(ᾱ1, σ²I, R); σ ∼ Exp(1/σ̄), where TNk(μ, Σ, R) is the k-variate Normal distribution with mean μ and variance Σ, restricted to the set R, and ᾱ and σ̄ are specified in (13) and (14), respectively. The Supplementary Material describes our MCMC approach for targeting the posterior distribution of α.
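To make (19) concrete, here is a crude prior sampler for the case x0 = x̲: draw σ, draw α, and accept only when α lies in R per (15). Rejection sampling is our simplification for illustration; the paper's MCMC sampler is described in the supplementary material.

```python
import numpy as np
from scipy.interpolate import BSpline
from scipy.stats import beta

rng = np.random.default_rng(0)
k, lo, hi = 5, 0.0, 1.0
knots = lo + (hi - lo) / (k - 3) * np.arange(-3, k + 1)
alpha_bar, sigma_bar = 0.5, 0.512        # per (13) and (14)
b_lo = BSpline.design_matrix([lo], knots, 3).toarray().ravel()

# Rejection sampling from the truncated prior (crude; the paper uses MCMC)
while True:
    sigma = rng.exponential(sigma_bar)
    alpha = rng.normal(alpha_bar, sigma, size=k)        # N_k(alpha_bar 1, sigma^2 I)
    if np.all(np.diff(alpha) >= 0) and alpha @ b_lo <= 0:   # alpha in R, x0 = x_lo
        break

# One prior sample path of the sensitivity function
x = np.linspace(lo, hi, 50)
p = beta.cdf(BSpline.design_matrix(x, knots, 3).toarray() @ alpha, 2, 2)
assert np.all(np.diff(p) >= -1e-12) and p[0] == 0.0     # increasing, zero at x_lo
```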

As a reviewer has noted, the quantification of σ̄ is made before the restriction of α to R. The concern is that this restriction will filter out many sample paths for the sensitivity function, and therefore σ̄ as quantified might be too low to generate a wide variety of sample paths in the prior distribution. However, the prior distribution for σ is Exponential, and so many of the σ values in the prior will be larger than σ̄. Moreover, we inspect samples from the prior distribution, to ensure that they are not overly prescriptive; see Figure 2.

4 Uncertainty Assessment

As described in Section 2, some applications require specific uncertainty assessments, such as a lower one-sided 95% confidence interval for q(0.001), for the no-fire threshold (NFT). Under a Bayesian approach, these can be straightforwardly extracted from the output of the MCMC sampler. Each sample is a posterior draw of α, each α implies a sensitivity function, each sensitivity function can be inverted to a quantal function, and each quantal function yields a value for q(0.001). The posterior distribution of q(0.001) is then summarized as an interval. The lower one-sided 95% credible interval would start at the 5th percentile of the resulting sample.
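For instance, with posterior draws of q(0.001) in hand, the NFT-style bound is a single percentile. The numbers below are synthetic stand-ins for MCMC output:

```python
import numpy as np

rng = np.random.default_rng(1)
# Stand-in for MCMC output: posterior draws of q(0.001) (hypothetical values)
q_draws = rng.lognormal(mean=np.log(8.0), sigma=0.2, size=10_000)

# Lower one-sided 95% credible interval for q(0.001) is (x_L, infinity)
x_L = np.percentile(q_draws, 5)
assert np.mean(q_draws > x_L) > 0.94     # ~95% of posterior mass lies above x_L
```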

Things are not so straightforward for the GLM approaches, bearing in mind that confidence intervals for nonlinear models must be based on some form of approximation: the delta method is the default for GLM algorithms, like glm in R. However, the value q(0.001) is itself a nonlinear function of the parameters (α0, α1): (20) q(0.001) = exp{(g(0.001) − α0)/α1} for GLM-log, and so uncertainty about q(0.001) is doubly hard to assess accurately.

For the GLM approaches we suggest using a simulation-based approach based on treating (α0,α1) as Gaussian, using the asymptotic approximation to the covariance based on inverting the observed information matrix. This approach is commonly used in practice, and has a “quasi-Bayesian” justification (Wood Citation2017, sec. 2.2, 6.10, Appendix A.4). It is a unified approach which can be applied to approximating pointwise (vertical) confidence intervals for both the sensitivity function and the quantal function. For example, ten thousand samples of (α0,α1) would be generated from the fitted GLM-log model, q(0.001) would be computed in each case using (20), and the lower one-sided 95% credible interval would start at the 5th percentile of the resulting sample. For the sensitivity function, the simulation-based approach is the same as using the delta method to construct a confidence interval on the link scale, and then transforming by the inverse link function, which is quicker and does not incur Monte Carlo error.
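A sketch of the simulation-based approach for GLM-log under a Probit link; the fitted coefficients and covariance matrix here are made up (in R they would come from coef(fit) and vcov(fit)):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)

# Hypothetical Probit-scale fit: (alpha_0, alpha_1) and its covariance
coef = np.array([-12.0, 4.0])
vcov = np.array([[4.0, -1.3],
                 [-1.3, 0.45]])

draws = rng.multivariate_normal(coef, vcov, size=10_000)
q = np.exp((norm.ppf(0.001) - draws[:, 0]) / draws[:, 1])   # (20), per draw
x_L = np.percentile(q, 5)      # start of the lower one-sided 95% interval
```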

5 Illustrations

This section illustrates the performance of the different approaches, using two contrasting experimental datasets, both given below. The three approaches are GLM-lin and GLM-log, presented in Section 2.1, and the Bayesian approach presented in Section 3. To summarize their similarities and differences:

  1. GLM-lin does not respect the physical constraint p(x) = 0 for x ≤ x0, where in both illustrations x0 = 0. GLM-log and the Bayesian approach both respect this constraint.

  2. GLM-lin has a symmetric sensitivity function, and GLM-log has an effectively symmetric sensitivity function when the overlap condition holds, as it will do in many applications. The Bayesian approach is not necessarily symmetric.

  3. More generally, the flexibility of the GLM approaches is restricted by having only two parameters, while the Bayesian approach has more: we use five and seven below.

  4. The GLM approaches require approximations to compute pointwise confidence intervals for the sensitivity and quantal functions; therefore, their actual coverage will differ from their nominal coverage. The Bayesian approach provides effectively exact coverage in its pointwise credible intervals.

The calculations in this section were all carried out using the R package oneshot. This is available in the Supplementary Material, which also contains more details about the MCMC results below. The run-time of the entire set of calculations was just a few seconds on a standard laptop, using 10,000 samples in each approach, plus a spin-up of one thousand samples in the Bayesian approach.

5.1 The Dror Dataset

The first illustration uses the small detonator dataset from Figure 5, Panel (b) of Dror and Steinberg (Citation2008), with 20 tests; hereafter “Dror”. This dataset is given in Table 1. The overlap ratio is 0.016, and therefore the overlap condition in (4) holds, so that there is no practical difference between GLM-lin and GLM-log. We will only fit the latter, but note the implication, that the GLM approaches are necessarily symmetric for this dataset.

Table 1 The Dror dataset, hand-digitized from Figure 5, Panel (b) of Dror and Steinberg (Citation2008), where the stimulus is voltage.

As stated in Dror and Steinberg (Citation2008, sec. 4), the objectives of the experiment “were to verify the manufacturer’s statement that the explosives will not detonate at 12 V (being a safe voltage) and will detonate (“all fire”) at 25 V.” Using these two values as indicative of lower and upper limits, there is a case for replacing the lower limit with x0 = 0 and increasing the number of knots from the default value of k = 5, as described around (10). For this illustration, though, we will stick with [x̲,x̄]=[10,30] and k = 5.
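The spline setup just mentioned (a knot range and a knot count k) can be illustrated with a generic clamped cubic B-spline basis, evaluated by the Cox-de Boor recursion. This Python sketch is only indicative: the paper's basis, knots, and constraints are defined in Section 3 and may differ. The basis functions are nonnegative and sum to one on the knot range, which is what makes monotonicity and tail constraints convenient to impose through the coefficients.

```python
def bspline_basis(x, knots, degree=3):
    """All clamped B-spline basis functions of the given degree at x.

    `knots` lists the knots including both endpoints; the endpoints are
    repeated so the basis is clamped. Uses the Cox-de Boor recursion."""
    t = [knots[0]] * degree + list(knots) + [knots[-1]] * degree
    n = len(t) - degree - 1  # number of basis functions

    def B(i, p):
        if p == 0:
            return 1.0 if t[i] <= x < t[i + 1] else 0.0
        val = 0.0
        if t[i + p] != t[i]:       # skip degenerate (zero-width) terms
            val += (x - t[i]) / (t[i + p] - t[i]) * B(i, p - 1)
        if t[i + p + 1] != t[i + 1]:
            val += (t[i + p + 1] - x) / (t[i + p + 1] - t[i + 1]) * B(i + 1, p - 1)
        return val

    return [B(i, degree) for i in range(n)]

knots = [10, 15, 20, 25, 30]              # k = 5 equally spaced knots on [10, 30]
basis_at_20 = bspline_basis(20.0, knots)  # 7 cubic basis functions at x = 20
```

With nonnegative basis functions summing to one, a nondecreasing coefficient vector yields a nondecreasing spline, which is one standard way to encode monotonicity in the prior.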

Figure 2 shows samples from the prior and posterior sensitivity functions for the Dror dataset. The samples from the prior show that there is a wide variety of sensitivity functions in the prior distribution, so that the choice of prior described in Section 3.2 is not overly prescriptive. Comparing the two sets of samples shows that conditioning on the dataset has substantially reduced uncertainty about the sensitivity function, but these 20 tests are evidently not able to eliminate uncertainty in the left-hand tail, despite the monotonicity constraint.

Figure 3 shows the sensitivity function for the GLM and Bayesian approaches, along with pointwise 95% prediction intervals. When presented with these two estimates, a subject-matter expert or regulator might agree that they are clearly different, but that neither is obviously wrong. To return to the question posed by Dror and Steinberg (Citation2008) above, it would be prudent to conclude from Figure 3 that more testing is required, for both the lower limit of 12 V and the upper limit of 25 V.

Fig. 3 Dror dataset, sensitivity function under the GLM and Bayesian approaches. The solid line is the fitted function, and the shaded polygon shows pointwise 95% prediction intervals. The dataset is shown as dots: at p = 0 for “no-go”, and p=1 for “go”. The overlap condition in (4) holds, and therefore the two GLM approaches are effectively identical; only GLM-log is shown.

Fig. 3 Dror dataset, sensitivity function under the GLM and Bayesian approaches. The solid line is the fitted function, and the shaded polygon shows pointwise 95% prediction intervals. The dataset is shown as dots: at p = 0 for “no-go”, and p=1 for “go”. The overlap condition in (4) holds, and therefore the two GLM approaches are effectively identical; only GLM-log is shown.

5.2 The ESD Dataset

The second illustration uses a large detonator dataset from an experiment carried out at the Atomic Weapons Establishment (AWE), at Aldermaston in the United Kingdom; hereafter “ESD,” as it was used as part of an electrostatic discharge safety assessment. This dataset is given in Table 2. It is very different from the Dror dataset, both in its size (79 tests) and in its overlap ratio, which is 0.34, so that the overlap condition in (4) does NOT hold. Therefore, the two GLM approaches may give different results, and GLM-log may not be symmetric. To investigate the low quantals, we use [x̲,x̄]=(0,300) and k = 7.

Table 2 The ESD dataset, from an experiment carried out at the Atomic Weapons Establishment (AWE), Aldermaston.

Figures 4 and 5 show the prior and posterior samples, and the estimated sensitivity functions, respectively. The effect of the constraint set R in (19) is apparent in the left-hand panel of Figure 4, where every prior sensitivity function satisfies p(0)=0. With this much larger dataset there is greater similarity between the three fits, although also some differences. In Figure 5 the GLM-lin prediction interval has not gone to zero at x = 0, which will have a catastrophic effect on the low-probability quantals (see below). Comparing GLM-log and the Bayesian approach, the fits are similar around p = 0.5, but the Bayesian approach is more uncertain in the tails.

Fig. 4 ESD dataset, 100 samples from the prior and the posterior sensitivity functions.

Fig. 4 ESD dataset, 100 samples from the prior and the posterior sensitivity functions.

Fig. 5 ESD dataset, sensitivity function under the GLM and Bayesian approaches. The solid line is the fitted function, and the shaded polygon shows pointwise 95% prediction intervals.

Fig. 5 ESD dataset, sensitivity function under the GLM and Bayesian approaches. The solid line is the fitted function, and the shaded polygon shows pointwise 95% prediction intervals.

Figure 6 shows the fitted sensitivity functions along with the same functions rotated around their midpoints, revealing asymmetry in the fitted sensitivity functions. Due to the shape of the Lognormal distribution, which has a long right-hand tail, the only asymmetry that GLM-log can manifest is a sensitivity function which is steeper taking off from p = 0 than it is arriving at p = 1. The Bayesian approach, which has no such constraint, finds that this asymmetry is present in the ESD dataset, and GLM-log has found it too. GLM-log appears to be more asymmetric than the Bayesian approach, but this comparison is tenuous, because the asymmetry of GLM-log is determined entirely by the fitted values of μ and σ, so the asymmetry may be a side-effect of the parametric form rather than a feature of the data.
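The rotation diagnostic used in Figure 6 is straightforward to reproduce for any fitted curve: rotate p about the midpoint (q(0.5), 0.5) to obtain x ↦ 1 − p(2q(0.5) − x), and measure the gap between the two curves. The Python sketch below uses made-up sensitivity functions, not the fitted ones: a logistic in x, which is exactly symmetric, and a logistic in log x, which shows the steep-takeoff asymmetry discussed above.

```python
import math

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

def max_rotation_gap(p, lo, hi):
    """Largest gap between p and its rotation about (q(0.5), 0.5).

    p must be increasing on [lo, hi] and defined at the reflected
    points 2*q(0.5) - x; the gap is zero iff p is symmetric."""
    a, b = lo, hi
    while b - a > 1e-12:          # bisect for the median stimulus q(0.5)
        m = 0.5 * (a + b)
        if p(m) < 0.5:
            a = m
        else:
            b = m
    q50 = 0.5 * (a + b)
    grid = [lo + (hi - lo) * j / 200.0 for j in range(201)]
    return max(abs(p(x) - (1.0 - p(2.0 * q50 - x))) for x in grid)

p_sym = lambda x: sigmoid(x - 5.0)                    # symmetric logistic in x
p_skew = lambda x: sigmoid(4.0 * (math.log(max(x, 1e-12)) - math.log(5.0)))

gap_sym = max_rotation_gap(p_sym, 0.0, 10.0)    # essentially zero
gap_skew = max_rotation_gap(p_skew, 0.5, 50.0)  # clearly positive
```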

Fig. 6 ESD dataset, rotating the fitted sensitivity functions around their midpoints, (q(0.5),0.5). (a) GLM-log, (b) Bayesian. If the fit is symmetric, the solid line and the dashed line will coincide.

Fig. 6 ESD dataset, rotating the fitted sensitivity functions around their midpoints, (q(0.5),0.5). (a) GLM-log, (b) Bayesian. If the fit is symmetric, the solid line and the dashed line will coincide.

Figure 7 shows some low quantals for the three approaches, using two-tailed 95% prediction intervals. GLM-lin has given a nonsensical outcome for the p = 0.001 and p = 0.01 quantals, owing to not respecting the constraint p(x) = 0 for x ≤ 0. The quantals for GLM-log and the Bayesian approach are roughly comparable, although the prediction intervals for the latter are wider. This could be due to differences between the two models, or it could be coverage error in GLM-log.
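The Bayesian quantal intervals in Figure 7 come directly from the MCMC output: each posterior draw defines a sensitivity curve, each curve is inverted at the target probability by bisection, and the empirical quantiles of the inverted values give the pointwise credible interval. The Python sketch below simulates the posterior draws (a logit-in-log-x family with made-up parameter values) rather than using the real MCMC sample:

```python
import math, random

def invert(p, target, lo, hi):
    """Stimulus q with p(q) = target, by bisection (p increasing on [lo, hi])."""
    a, b = lo, hi
    while b - a > 1e-9:
        m = 0.5 * (a + b)
        if p(m) < target:
            a = m
        else:
            b = m
    return 0.5 * (a + b)

random.seed(1)
# Stand-in for MCMC output: each draw is (a, b) for p(x) = sigmoid(a + b*log x)
draws = [(random.gauss(-6.0, 0.5), random.gauss(4.0, 0.3)) for _ in range(2000)]

def curve(a, b):
    return lambda x: 1.0 / (1.0 + math.exp(-(a + b * math.log(x))))

# Invert every posterior curve at the p = 0.001 quantal
q001 = sorted(invert(curve(a, b), 0.001, 1e-6, 100.0) for a, b in draws)

# Pointwise 95% credible interval: empirical 2.5% and 97.5% quantiles
n = len(q001)
lo95, hi95 = q001[int(0.025 * n)], q001[int(0.975 * n)]
```

Because the interval is read directly off the sample, its coverage is exact up to Monte Carlo error, with no large-sample approximation of the kind the GLM approaches rely on.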

Fig. 7 ESD dataset, low quantals. The dot is the fitted value and the intervals are two-tailed 95%. (a) GLM-lin, (b) GLM-log, (c) Bayesian.

Fig. 7 ESD dataset, low quantals. The dot is the fitted value and the intervals are two-tailed 95%. (a) GLM-lin, (b) GLM-log, (c) Bayesian.

6 Conclusion

Current practice for analyzing data from one-shot experiments, for example for detonator safety and reliability assessment, is based on a Binomial Generalized Linear Model (GLM). We show that this approach has several deficiencies which can be addressed within a Bayesian approach in which the prior distribution represents the physical constraints of the application. We develop an MCMC sampling scheme which allows the Bayesian approach to run “out of the box,” in about the same time as the current GLM-based approaches. We show that our Bayesian approach gives different inferences from those of the GLM-based approaches.

Along the way, we explain why GLM-log should always be favored over GLM-lin (Section 2.1), and give a simple condition for when they will provide effectively identical fits (the overlap condition, (4)). When this condition holds, GLM-log will have an effectively symmetric sensitivity function, a restriction which is unlikely to be justified by the underlying application. When this condition does not hold, GLM-log can only manifest one type of asymmetry, in which the sensitivity function is steeper taking off from p = 0 than it is arriving at p = 1. Our Bayesian approach can manifest both types of asymmetry, although in Section 5.2 we find that the direction of asymmetry in the ESD dataset is the same as that of GLM-log.

Finally, we would like to highlight two directions of ongoing research. First, we are planning a detonator experiment with several hundred tests, designed to investigate asymmetry in the sensitivity function. With such a large number of tests we ought to be able to use a large number of basis functions in the Bayesian approach, and—we hope—see whatever asymmetry exists, in high resolution. This will inform future inference for one-shot experiments to test detonators.

Second, we are examining experimental design to target a particular tail of the quantal function, such as the lower tail to assess the No-Fire Threshold (NFT). Experimental design for one-shot experiments has been well-studied (see, e.g., Wu and Tian Citation2014), but usually with either a GLM approach, or with a nonparametric approach. The nonparametric approach is flexible and effective around p=0.5, but not practical for very low quantals. The rigidity of the two-parameter GLM approach has the unfortunate feature that tests at medium and high stimulus values will be informative for the sensitivity at very low stimulus values, a viewpoint which is rejected by subject matter experts. We have found that the Bayesian spline model combined with minimizing the expected Shannon entropy of the quantity of interest (such as the NFT) is effective at identifying stimulus values for further tests. The expected Shannon entropy can be estimated directly from the MCMC sample.
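As a rough illustration of the last point, the Shannon entropy of a quantity of interest can be estimated from an MCMC sample with a simple histogram plug-in estimator. This Python sketch is generic, not the paper's design criterion; it shows only that a more diffuse sample has higher estimated entropy, which is the quantity a design would aim to reduce:

```python
import math, random

def entropy_from_sample(samples, bins=30):
    """Histogram plug-in estimate of the differential Shannon entropy."""
    lo, hi = min(samples), max(samples)
    width = (hi - lo) / bins
    counts = [0] * bins
    for s in samples:
        j = min(int((s - lo) / width), bins - 1)  # clamp the right edge
        counts[j] += 1
    n = float(len(samples))
    # H ~= -sum_j f_j * log(f_j / width), with f_j the relative frequency
    return -sum(c / n * math.log(c / (n * width)) for c in counts if c)

random.seed(2)
tight = [random.gauss(0.0, 1.0) for _ in range(20000)]  # concentrated "posterior"
loose = [random.gauss(0.0, 3.0) for _ in range(20000)]  # diffuse "posterior"

h_tight = entropy_from_sample(tight)
h_loose = entropy_from_sample(loose)  # larger: entropy grows with spread
```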

Supplementary Materials

The MCMC sampler is described in the Supplementary Material, along with diagnostics for the ESD dataset. The code and datasets are also available, including the oneshot package for R.


Acknowledgments

We would like to thank the following for their very helpful comments on previous versions of this article: Rod Drake, David Steinberg, Simon Wood, Andrew Zammit Mangion, and Matt Stapleton. The comments of the Editor, an Associate Editor, and three reviewers substantially improved the paper’s focus and clarity.

Disclosure Statement

The authors report there are no competing interests to declare.

References

  • Casella, G., and Berger, R. L. (2002), Statistical Inference (2nd ed.), Pacific Grove, CA: Duxbury.
  • de Boor, C. (2001), A Practical Guide to Splines (revised ed.), New York: Springer.
  • Dror, H. A., and Steinberg, D. M. (2008), “Sequential Experimental Designs for Generalized Linear Models,” Journal of the American Statistical Association, 103, 288–297. DOI: 10.1198/016214507000001346.
  • Efron, B., and Hastie, T. (2016), Computer Age Statistical Inference, New York, NY: Cambridge University Press. Available at https://hastie.su.domains/CASI/.
  • Eilers, P. H. C., and Marx, B. D. (1996), “Flexible Smoothing with B-splines and Penalties,” Statistical Science, 11, 89–102. With discussion and rejoinder, 102–121. DOI: 10.1214/ss/1038425655.
  • Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., and Rubin, D. B. (2014), Bayesian Data Analysis (3rd ed.), Boca Raton, FL: Chapman and Hall/CRC. Available at http://www.stat.columbia.edu/∼gelman/book/.
  • Hickey, G., and Hart, A. (2013), “Statistical Aspects of Risk Characterisation in Ecotoxicology,” in Risk and Uncertainty Assessment for Natural Hazards, eds. J. C. Rougier, R. S. J. Sparks, and L. J. Hill, Cambridge, UK: Cambridge University Press.
  • Langlie, H. J. (1963), “A Reliability Test Method for ‘One-Shot’ Items,” Technical Report ADP014612, Aeronutronic Division, Ford Motor Company. Available at https://apps.dtic.mil/sti/citations/ADP014612.
  • Neelon, B., and Dunson, D. B. (2004), “Bayesian Isotonic Regression and Trend Analysis,” Biometrics, 60, 398–406. DOI: 10.1111/j.0006-341X.2004.00184.x.
  • Neyer, B. T. (1992), “An Analysis of Sensitivity Tests,” Technical Report, Mound, Miamisburg, OH, US. Available at https://www.osti.gov/biblio/5654343, OSTI identifier 5654343.
  • Neyer, B. T. (1994), “A D-optimality-based Sensitivity Test,” Technometrics, 36, 61–70.
  • Pya, N., and Wood, S. N. (2015), “Shape Constrained Additive Models,” Statistics and Computing, 25, 543–559. DOI: 10.1007/s11222-013-9448-7.
  • R Core Team. (2020), R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. Available at https://www.R-project.org/.
  • STANAG 4560. (2016), Electro-Explosive Devices, Assessment and Test Methods for Characterization (3rd ed.). Full reference number NS0/1418(2016)SGA/4560, 21 November 2016.
  • Steinberg, D. M., Dror, H., and Villa, L. (2014), “Discussion of “Three-phase Optimal Design of Sensitivity Experiments” by Wu and Tian,” Journal of Statistical Planning and Inference, 149, 26–29. DOI: 10.1016/j.jspi.2013.12.012.
  • Teller, A., Steinberg, D. M., Teper, L., Rozenblum, R., Mendel, L., and Jaeger, M. (2016), “New Methods for Sensitivity Tests of Explosive Devices,” in 13th International Conference on Probabilistic Safety Assessment and Management (PSAM 13), Seoul, Korea.
  • Wetherill, G. B. (1963), “Sequential Estimation of Quantal Response Curves,” Journal of the Royal Statistical Society, Series B, 25, 1–48. DOI: 10.1111/j.2517-6161.1963.tb00481.x.
  • Wood, S. N. (2017), Generalized Additive Models: An Introduction with R (2nd ed.), Boca Raton, FL: CRC Press.
  • Wu, C. F. J., and Tian, Y. (2014), “Three-Phase Optimal Design of Sensitivity Experiments,” Journal of Statistical Planning and Inference, 149, 1–15. With comments and rejoinder, pp. 16–32. DOI: 10.1016/j.jspi.2013.10.007.
  • Young, L. J., and Easterling, R. G. (1994), “Estimation of Extreme Quantiles based on Sensitivity Tests: A Comparative Study,” Technometrics, 36, 48–60. DOI: 10.1080/00401706.1994.10485400.