Adaptive sequential design for phase II single-arm oncology trials: an expansion of Simon’s design


ABSTRACT

Single-arm phase II trials are very common in oncology. A fixed sample trial may lack sufficient power if the true efficacy is less than the assumed one, and adaptive designs have been proposed in the literature to address this. We propose an adaptive sequential design based on Simon’s design, the most used fixed sample design for single-arm phase II oncology trials. A prominent feature of Simon’s design is that it minimizes the sample size when there is no clinically meaningful efficacy. We identify Simon’s design as a special group sequential design, so established methods for sample size re-estimation (SSR) can be readily applied to it. Simulations show that simply adding SSR to Simon’s design may still not provide the desired power. We therefore propose some expansions to Simon’s design. The expanded design with SSR can provide even more power.


1. Introduction

In phase II single-arm trials, the primary endpoint is often a binary response rate $p$ (e.g., radiological response, tumor response rate, or objective response rate (ORR); U.S. Food and Drug Administration, 2018). The null hypothesis is that the response rate does not exceed some $p_{0}$, a response rate indicating low activity with insufficient clinical benefit. The alternative hypothesis is that the response rate is larger than $p_{0}$. Clinical considerations may suggest some $p_{low}$ (larger than $p_{0}$) such that any response rate no less than $p_{low}$ is clinically meaningful. The response rate $p_{low}$ depends on the cancer type and the available treatment options. For some recalcitrant cancers in patients with specific genes and limited treatment options, even incremental efficacy (e.g., $p_{low} < 0.2$) can be clinically meaningful. With over 3500 citations, Simon’s two-stage design (Simon 1989) is the most used design for single-arm phase II oncology trials. Simon’s designs are fixed sample designs. A fixed sample trial designed to provide $1 - β$ power for $p = p_{low}$ will also provide at least $1 - β$ power if the true response rate is larger. That is, there is a fixed sample size that provides adequate power for a range of possible response rates. However, such a trial assumes the lowest clinically meaningful response rate and is the largest trial of all possibilities. Its large sample size is unnecessary if the true response rate is higher than $p_{low}$. Because oncology trials are usually very expensive and time consuming, trials with potentially unnecessarily large sample sizes are often undesirable. An ideal sample size is “just about right”, but determining it requires precise knowledge of the true response rate, which is often unknown.
Usually, a target response rate $p_{1} > p_{low}$ is assumed for power calculation, and this assumption may not be accurate for various reasons. The trial will be underpowered if the true response rate $p$ is smaller than the assumed $p_{1}$. Similar concerns about underpowered designs are also common in placebo-controlled studies and have motivated extensive research on sample size re-estimation (SSR). Such methods have been widely used (e.g., U.S. Food and Drug Administration (CDER and CBER) 2019) to provide flexibility in trial design. They fall mainly into two categories: combination tests/conditional error functions (e.g., Bauer and Kohne 1994; Proschan and Hunsberger 1995), and adaptive sequential designs (ASD) (e.g., Cui et al. 1999; Müller and Schäfer 2001; Jennison and Turnbull 2003, 2006; Chen et al. 2004; Gao et al. 2008). Conditional error function-based methods for SSR have been proposed for phase II single-arm studies (e.g., Englert and Kieser 2012; Englert and S 2013; Kunzmann and Kieser 2016). In this article, we propose an ASD that can be used for single-arm trials. We show that Simon’s design is a special kind of group sequential design, and thus can be naturally combined with SSR methods from ASD (Gao et al. 2008, 2013). However, we also show through simulations that simply adding SSR to Simon’s design may not sufficiently increase the power. For this reason, we first augment and expand Simon’s design, following Simon’s design considerations throughout. We then combine the expansions with SSR, which provides more power than Simon’s design combined with SSR. The method is an ASD and retains the basic structure and considerations of Simon’s design. Our design is mathematically similar to the sequential design of Chang et al. (1987), but our constraint on sample size is more closely aligned with that of Simon (1989). Like other two-stage designs (Simon 1989; Jung et al. 2004; Lin and Shih 2004; Englert and Kieser 2012; Englert and S 2013), our method includes a first stage in which a futility analysis is conducted, and the trial is terminated if the efficacy does not reach a clinically meaningful threshold. If the trial is not stopped for futility, further patients are enrolled to assess efficacy in the second stage. Our method also includes an option for a three-stage design, which is statistically more efficient. One feature of ASD-based SSR (used in our proposal) is that exact inference (Gao et al. 2013) is available, including a median unbiased estimate of efficacy and a two-sided exact confidence interval. Further, through simulations, we show that even with an adaptive design, satisfactory operating characteristics (OC) (e.g., power and sample size) are not automatic. We provide several design options for users to explore, compare, and choose from. We recommend that simulations be conducted to investigate the OCs of the designs under consideration, so that designs with desirable OCs can be selected. The proposed method and simulations are supported by the software Design and Analysis for Clinical Trials (DACT) at https://www.innovatiostat.com/softeware.html, which is free for academic researchers. Computing codes are provided in the online supporting material and are also available upon request.

2. Simon’s design and challenges in phase II oncology trials

2.1. Notations

Simon’s design is a two-stage design with four parameters $r_{1}, n_{1}, r_{2}, n_{2}$. In this article we propose several two-stage designs (the hybrid and the midpoint designs) and a three-stage design, each of which involves similar parameters. We introduce some notation to distinguish the designs and their associated parameters.

2.2. Simon’s design

Let $p$, $p_{0}$ and $p_{1}$ be as discussed above. The sample size is calculated assuming that the response rate is $p_{1}$. There are two stages in Simon’s design. In stage 1, $n_{1}$ subjects are enrolled. If no more than $r_{1}$ responses are observed, the trial is stopped for futility. Otherwise, the trial proceeds to the second stage and enrolls to a total of $n_{2}$ subjects. The null hypothesis is rejected if more than $r_{2}$ responses are observed at the end of the trial. Let $B (r, p, n)$ denote the cumulative binomial distribution function, and $b (r, p, n)$ the binomial probability mass function. Let $Δ n_{12} = n_{2} - n_{1}$. Let $f (r_{1}, n_{1}, r_{2}, n_{2}, p) = B (r_{1}, p, n_{1}) + \sum_{x = r_{1} + 1}^{\min [n_{1}, r_{2}]} b (x, p, n_{1}) B (r_{2} - x, p, Δ n_{12})$. This is the probability of not rejecting the null hypothesis when the response rate is $p$ (Simon 1989). The type I error is $1 - f (r_{1}, n_{1}, r_{2}, n_{2}, p_{0})$. The type II error and power are $f (r_{1}, n_{1}, r_{2}, n_{2}, p)$ and $1 - f (r_{1}, n_{1}, r_{2}, n_{2}, p)$, respectively. $PET (p) = B (r_{1}, p, n_{1})$ is the probability of early termination. The expected sample size is $EN (p) = n_{1} + (1 - PET (p)) Δ n_{12}$. A Simon’s design is a quadruplet ($n_{1}$, $r_{1}$, $n_{2}$, $r_{2}$) that, for given $p_{0}, p, α, β$, satisfies three considerations/constraints: i) the one-sided type I error is controlled at a target level $α$; ii) the type II error is controlled at $β$; iii) the sample size is minimized when the drug has insufficient activity. We refer to these as Simon’s considerations. All methods for sample size calculation for clinical trials share the first two constraints/considerations. The third one aims to select, among all possible quadruplets, the smallest $n_{2}$ for the minimax design, or the smallest $EN (p_{0})$ for the optimal design (Simon 1989). This is a distinctive feature of Simon’s design, and likely contributed to its huge popularity. Let $S (p_{0}, p, α, β)$ denote the quadruplet ($n_{1}$, $r_{1}$, $n_{2}$, $r_{2}$) under $p_{0}, p, α, β$.
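The quantities defined above can be evaluated directly from the binomial distribution. The following is a minimal sketch rather than the authors' code: the function names are ours, and the example quadruplet ($r_{1} / n_{1} = 1/10$, $r_{2} / n_{2} = 5/29$ for $p_{0} = 0.10$, $p_{1} = 0.30$, $α = 0.05$, $β = 0.20$) is a commonly tabulated optimal design used here only as an illustration.

```python
from math import comb

def b(x, p, n):
    """Binomial probability mass function b(x, p, n)."""
    if x < 0 or x > n:
        return 0.0
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def B(r, p, n):
    """Cumulative binomial distribution B(r, p, n) = P(X <= r)."""
    if r < 0:
        return 0.0
    return sum(b(x, p, n) for x in range(0, min(r, n) + 1))

def f(r1, n1, r2, n2, p):
    """Probability of NOT rejecting H0 in the two-stage design."""
    dn12 = n2 - n1
    total = B(r1, p, n1)  # early futility stop
    for x in range(r1 + 1, min(n1, r2) + 1):
        # continue with x stage-1 responses, then end with at most r2 in total
        total += b(x, p, n1) * B(r2 - x, p, dn12)
    return total

def operating_characteristics(r1, n1, r2, n2, p0, p1):
    """Type I error, power, PET(p0) and EN(p0) for a given quadruplet."""
    pet0 = B(r1, p0, n1)
    return {
        "type_I_error": 1 - f(r1, n1, r2, n2, p0),
        "power": 1 - f(r1, n1, r2, n2, p1),
        "PET_p0": pet0,
        "EN_p0": n1 + (1 - pet0) * (n2 - n1),
    }

# Illustrative quadruplet (assumed example, see the lead-in).
oc = operating_characteristics(r1=1, n1=10, r2=5, n2=29, p0=0.10, p1=0.30)
```

For this quadruplet, $PET (p_{0}) = 0.9^{10} + 10 \times 0.1 \times 0.9^{9} \approx 0.736$ and $EN (p_{0}) \approx 15.0$, which can be checked by hand against the formulas above.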

2.3. Group sequential design and Simon’s design

Let $X_{i} \sim B (p)$, $i = 1, \dots, n$ be a random sample from a binary (Bernoulli) distribution with $P (X_{i} = 1) = p$ and $P (X_{i} = 0) = 1 - p$. The parameter of interest is $θ = p - p_{0}$. Let $\hat{p} = \bar{X} = \frac{1}{n} \sum_{i = 1}^{n} X_{i}$. Let $s.e.(\hat{p}) = \sqrt{\hat{p} (1 - \hat{p}) / n}$ be the standard error of $\hat{p}$, and let $\hat{t} = [s.e.(\hat{p})]^{- 2}$ be the estimated Fisher’s information time. Then $Z (\hat{t}) = (\hat{p} - p_{0}) \sqrt{\hat{t}}$ is approximately $N (θ \sqrt{\hat{t}}, 1)$ distributed and is the Wald statistic from the observations $(X_{1}, \dots, X_{n})$.

Let the Fisher’s information times be $t_{i} = \frac{n_{i}}{p (1 - p)}$, $i = 1, 2$, where $n_{i}$ is the sample size at the $i$-th stage analysis. Let $s_{i} = t_{i} / T$ be the information fractions, where $T = t_{K}$ is the total information time for the trial (here $K = 2$). Let $d_{1} = (\frac{r_{1}}{n_{1}} - p_{0}) \sqrt{\frac{{(n_{1})}^{3}}{r_{1} (n_{1} - r_{1})}}$, $c_{1} = + \infty$, $c_{2} = (\frac{r_{2}}{n_{2}} - p_{0}) \sqrt{\frac{{(n_{2})}^{3}}{r_{2} (n_{2} - r_{2})}}$. Let $x_{i}$ be the number of observed responses at $n_{i}$, $i = 1, 2$. In Simon’s design the trial is terminated for futility at $n_{1}$ if $(x_{1} \leq r_{1})$ is observed, and the null hypothesis of $p \leq p_{0}$ is rejected if $(x_{2} \geq r_{2})$ is observed at $n_{2}$. Let ${\hat{t}}_{i} = [s.e.(\hat{p})]^{- 2} = n_{i} {[\frac{x_{i}}{n_{i}} (1 - \frac{x_{i}}{n_{i}})]}^{- 1} = \frac{{(n_{i})}^{3}}{x_{i} (n_{i} - x_{i})}$ be the estimated Fisher’s information time at $n_{i}$. Let $Z (t_{i}) = (\frac{x_{i}}{n_{i}} - p_{0}) \times \sqrt{{\hat{t}}_{i}}$. Then $(x_{1} \leq r_{1})$ is the same event as $(Z (t_{1}) \leq d_{1})$, and $(x_{2} \geq r_{2})$ is the same event as $(Z (t_{2}) \geq c_{2})$. That is, Simon’s design is a group sequential design (GSD) in which the trial is terminated for futility if $(Z (t_{1}) \leq d_{1})$ is observed, and the null hypothesis is rejected if $(Z (t_{2}) \geq c_{2})$ is observed. GSDs are commonly designed by deriving critical boundaries, such as the O’Brien-Fleming boundary (O’Brien and Fleming 1979), the Pocock boundary (Pocock 1977), or $α$-spending boundaries (Lan and DeMets 1983), at planned information fractions to control the type I error, while the boundaries $d_{1}$ and $c_{2}$ in Simon’s design are derived from ($n_{1}$, $r_{1}$, $n_{2}$, $r_{2}$) and $p_{0}$. The boundaries $d_{1}$ and $c_{2}$ properly control the type I error because ($n_{1}$, $r_{1}$, $n_{2}$, $r_{2}$) does.
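The mapping from count cutoffs to Wald-scale boundaries can be sketched as follows. This is an illustrative sketch under our own naming (`wald_z`, `simon_boundaries` are hypothetical helpers); the quadruplet and $p_{0}$ are example values, and the statistic is only defined for $0 < x < n$ because the estimated information is infinite at $x = 0$ or $x = n$.

```python
from math import sqrt

def wald_z(x, n, p0):
    """Z = (x/n - p0) * sqrt(n^3 / (x (n - x))), defined for 0 < x < n."""
    if not 0 < x < n:
        raise ValueError("estimated information is undefined when x is 0 or n")
    return (x / n - p0) * sqrt(n**3 / (x * (n - x)))

def simon_boundaries(r1, n1, r2, n2, p0):
    """Futility boundary d1 and final critical boundary c2 on the Z scale."""
    d1 = wald_z(r1, n1, p0)
    c2 = wald_z(r2, n2, p0)
    return d1, c2

# Example values (assumed for illustration only).
r1, n1, r2, n2, p0 = 1, 10, 5, 29, 0.10
d1, c2 = simon_boundaries(r1, n1, r2, n2, p0)

# Z is increasing in x, so the count events and the Z events coincide
# for every interior count.
stage1_ok = all((x <= r1) == (wald_z(x, n1, p0) <= d1) for x in range(1, n1))
stage2_ok = all((x >= r2) == (wald_z(x, n2, p0) >= c2) for x in range(1, n2))
```

The equivalence holds because $(q - p_{0}) / \sqrt{q (1 - q)}$ is strictly increasing in $q = x / n$ on $(0, 1)$.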

Pocock’s boundaries satisfy $c_{1} = c_{2}$ , and O’Brien-Fleming boundaries satisfy $c_{1} \sqrt{t_{1}} = c_{2} \sqrt{t_{2}}$ . The motivation for Simon’s design is different from both Pocock’s and O’Brien-Fleming boundaries. Hence, the boundaries are naturally different, with $c_{1} = + \infty$ in Simon’s design.

2.4. Challenges in phase II oncology trials and the need for SSR

A fixed sample design, including Simon’s design, provides adequate power only when the true response rate is no less than the assumed response rate $p_{1}$. The power will be less than $1 - β$ if the true response rate is less than $p_{1}$. In practice, a range of response rates can be clinically beneficial. Besides $p_{low}$, prior clinical evidence, or pathological analysis and mode of action, may suggest that some optimistic response rate $p_{high}$ is possible. Any assumed response rate between $p_{low}$ and $p_{high}$ is reasonable. A flexible design targets some response rate $p_{1}$ between $p_{low}$ and $p_{high}$ with a sample size smaller than $n_{2}^{Simon} (p_{0}, p_{low})$, but allows the sample size to increase to one similar to $n_{2}^{Simon} (p_{0}, \hat{p})$ if the observed response rate at the interim analysis is $\hat{p}$, with $p_{low} < \hat{p} < p_{1}$. We have identified that Simon’s design is a special type of group sequential design. Hence, it is natural to combine available ASD methods with Simon’s design.

3. Adaptive sequential design with binding futility termination boundary

An ASD is the combination of a group sequential design (GSD) with adaptive features such as sample size modification. The first step in an ASD is to select a GSD. Let $θ$ be the efficacy parameter, with larger values of $θ$ indicating better efficacy. Let the null hypothesis $H_{0}$ be that $θ \leq 0$, and the one-sided alternative hypothesis $H_{a}$ be that $θ > 0$. In a GSD, $K$ analyses (interim and final) are conducted at information times $0 < t_{1} < \dots < t_{K}$. Let $Z_{i}$ be the Wald statistic at the $i$-th analysis. Critical boundaries $c_{1}, \dots, c_{K}$ and futility boundaries $d_{1}, \dots, d_{K - 1}$ may be selected, such that the null hypothesis $H_{0}$ is rejected if $Z_{i} \geq c_{i}$ is observed at any $1 \leq i \leq K$, and the trial is declared futile and terminated if $Z_{i} < d_{i}$ is observed for any $1 \leq i \leq K - 1$. The $c_{i}$’s and $d_{i}$’s are chosen such that the one-sided type I error is controlled at level $α$. The futility boundaries $d_{1}, \dots, d_{K - 1}$ can be chosen to be either binding or non-binding. If the Wald statistic can be constructed for single-arm trials, then the usual GSD can be applied to single-arm trials in a completely parallel manner.

3.1. Applying sample size re-estimation to Simon’s design

3.1.1. The SSR algorithm

An SSR procedure can be used to increase the sample size when the observed effect size is smaller than the assumed one. The SSR procedure in a single-arm oncology trial can be completely parallel to that in an ASD for two-arm comparisons (e.g., Cui et al. 1999; Müller and Schäfer 2001; Gao et al. 2008). Let $\hat{p}$ be the MLE of $p$ at the interim analysis with a sample size of $n_{1}$. Then the conditional power of rejecting the null hypothesis at the end of the trial with planned sample size $n_{2}$ (Gao et al. 2008) is $cp_{n_{2}} = P (Z (t_{2}) \geq c_{2} | Z (t_{1}) = z_{1}) = Φ ((p_{1} - p_{0}) \sqrt{t_{2} - t_{1}} - \frac{c_{2} \sqrt{t_{2}} - Z (t_{1}) \sqrt{t_{1}}}{\sqrt{t_{2} - t_{1}}})$. The estimated conditional power is ${\hat{cp}}_{n_{2}} = Φ ((\hat{p} - p_{0}) \sqrt{{\hat{t}}_{2} - {\hat{t}}_{1}} - \frac{c_{2} \sqrt{{\hat{t}}_{2}} - Z ({\hat{t}}_{1}) \sqrt{{\hat{t}}_{1}}}{\sqrt{{\hat{t}}_{2} - {\hat{t}}_{1}}})$, where ${\hat{t}}_{2} = {\hat{t}}_{1} \times n_{2} / n_{1}$. If ${\hat{cp}}_{n_{2}} < 1 - β$, then the sample size may be increased to $n_{new}$ such that ${\hat{cp}}_{n_{new}} \geq 1 - β$, where ${\hat{cp}}_{n_{new}} = Φ ((\hat{p} - p_{0}) \sqrt{{\hat{t}}_{new, (1 - β)} - {\hat{t}}_{1}} - \frac{c_{2} \sqrt{{\hat{t}}_{2}} - Z ({\hat{t}}_{1}) \sqrt{{\hat{t}}_{1}}}{\sqrt{{\hat{t}}_{2} - {\hat{t}}_{1}}})$. ${\hat{t}}_{new, (1 - β)}$ can be solved (Gao et al. 2008) as:

${\hat{t}}_{new, (1 - β)} \geq \frac{1}{{(\hat{p} - p_{0})}^{2}} {\left\{Φ^{- 1} (1 - β) + \frac{c_{2} \sqrt{{\hat{t}}_{2}} - Z ({\hat{t}}_{1}) \sqrt{{\hat{t}}_{1}}}{\sqrt{{\hat{t}}_{2} - {\hat{t}}_{1}}}\right\}}^{2} + {\hat{t}}_{1}$ (1)

Let $n_{new, (1 - β)} = \hat{p} (1 - \hat{p}) \times {\hat{t}}_{new, (1 - β)}$. Let $N_{\max}$ be a pre-determined maximum feasible sample size. Then the new sample size may be chosen as $n_{new} = \min (N_{\max}, n_{new, (1 - β)})$. Let ${\hat{t}}_{new} = {\hat{t}}_{2} \times n_{new} / n_{2}$. The final critical boundary at $n_{new}$ will be (see Gao et al. 2008) $c_{new} = e_{new} / \sqrt{{\hat{t}}_{new}}$, where

$e_{new} = \frac{\sqrt{{\hat{t}}_{new} - {\hat{t}}_{1}}}{\sqrt{{\hat{t}}_{2} - {\hat{t}}_{1}}} (c_{2} \sqrt{{\hat{t}}_{2}} - Z ({\hat{t}}_{1}) \sqrt{{\hat{t}}_{1}}) + Z ({\hat{t}}_{1}) \sqrt{{\hat{t}}_{1}}$ (2)

Let $x_{new}$ be the number of responses observed at the final analysis. Let $r_{new}$ be the smallest integer such that

$(\frac{r_{new}}{n_{new}} - p_{0}) \sqrt{\frac{{(n_{new})}^{3}}{r_{new} (n_{new} - r_{new})}} \geq c_{new}$. (3)

Then the null hypothesis is rejected if $x_{new} \geq r_{new}$ .
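The steps of the SSR algorithm, formulas (1) to (3), can be sketched in code as follows. This is a minimal illustration and not the DACT implementation: the function and variable names are ours, the required sample size is rounded up to an integer and never reduced below $n_{2}$ (both are our assumptions), the sketch requires $p_{0} < \hat{p} < 1$ at the interim analysis, and the numerical inputs are hypothetical rather than a calibrated design.

```python
from math import sqrt, ceil
from statistics import NormalDist

_norm = NormalDist()  # standard normal: cdf = Phi, inv_cdf = Phi^{-1}

def wald_z(x, n, p0):
    """Z = (x/n - p0) * sqrt(n^3 / (x (n - x))), for 0 < x < n."""
    return (x / n - p0) * sqrt(n**3 / (x * (n - x)))

def ssr_two_stage(x1, n1, r2, n2, p0, beta, n_max):
    """Conditional power, re-estimated sample size and new cutoff."""
    p_hat = x1 / n1
    if not p0 < p_hat < 1:
        raise ValueError("sketch assumes p0 < p_hat < 1 at the interim analysis")
    t1 = n1 / (p_hat * (1 - p_hat))           # estimated information at n1
    t2 = t1 * n2 / n1                         # projected information at n2
    z1 = (p_hat - p0) * sqrt(t1)              # interim Wald statistic
    c2 = wald_z(r2, n2, p0)                   # planned final boundary
    gap = c2 * sqrt(t2) - z1 * sqrt(t1)
    a = gap / sqrt(t2 - t1)
    cp = _norm.cdf((p_hat - p0) * sqrt(t2 - t1) - a)
    n_new = n2
    if cp < 1 - beta:
        # formula (1), then conversion from information time to sample size
        t_req = ((_norm.inv_cdf(1 - beta) + a) / (p_hat - p0)) ** 2 + t1
        n_req = ceil(p_hat * (1 - p_hat) * t_req)   # rounding up is our assumption
        n_new = max(n2, min(n_max, n_req))
    t_new = t2 * n_new / n2
    e_new = sqrt(t_new - t1) / sqrt(t2 - t1) * gap + z1 * sqrt(t1)  # formula (2)
    c_new = e_new / sqrt(t_new)
    # formula (3): smallest integer cutoff whose Z value reaches c_new
    r_new = next((r for r in range(1, n_new) if wald_z(r, n_new, p0) >= c_new), None)
    return {"cp": cp, "n_new": n_new, "c_new": c_new, "r_new": r_new}

# Hypothetical inputs: 11 responses in 40 patients, planned n2 = 80, r2 = 24.
res = ssr_two_stage(x1=11, n1=40, r2=24, n2=80, p0=0.20, beta=0.10, n_max=200)
```

With these hypothetical inputs the estimated conditional power at the planned $n_{2}$ is well below $1 - β$, and the sample size required by formula (1) exceeds $N_{\max}$, so $n_{new}$ is capped at $N_{\max}$.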

3.1.2. SSR using continuity correction

Due to the discreteness of the binary distribution, the above procedure for calculating $c_{new}$ may not exactly control the type I error, and a continuity correction (e.g., Feller 1945; Devore 1995) may be needed. Let $r_{i}^{cor} = r_{i} + 0.5$, $i = 1, 2$ be the continuity corrections for $r_{i}$. Let the continuity corrected binding futility boundary be $d_{1}^{cor} = (\frac{r_{1}^{cor}}{n_{1}} - p_{0}) \sqrt{\frac{{(n_{1})}^{3}}{r_{1}^{cor} (n_{1} - r_{1}^{cor})}}$, and let $r_{2}^{cor}$ correspond to the critical boundary $c_{2}^{cor} = (\frac{r_{2}^{cor}}{n_{2}} - p_{0}) \sqrt{\frac{{(n_{2})}^{3}}{r_{2}^{cor} (n_{2} - r_{2}^{cor})}}$. Then $c_{2}^{cor}$ (instead of $c_{2}$) can be used to calculate $c_{new}^{cor}$ in the SSR, and $r_{new}^{cor}$ can be obtained in the same way as $r_{new}$. Figure 1 demonstrates how to conduct an interim analysis using DACT. The DACT output includes $n_{new}$, $r_{new}$ and $r_{new}^{cor}$.

Figure 1. Interim analysis using DACT.

The interim analysis shown in Figure 1 can accommodate both the two-stage and the three-stage design (see section 3.2.4 on expansions). The sample size re-estimation is conducted at $n_{1}$ for a two-stage design and at $n_{2}$ for a three-stage design. Therefore, for a two-stage design, $n_{inter} = n_{1}$, $x_{inter} = x_{1}$, $n_{final} = n_{2}$, and $r_{final} = r_{2}$. For a three-stage design, $n_{inter} = n_{2}$, $x_{inter} = x_{2}$, $n_{final} = n_{3}$, and $r_{final} = r_{3}$. For the input in Figure 1, $r_{new} \neq r_{new}^{cor}$. However, if $x_{inter}$ is changed to 7, then $r_{new} = r_{new}^{cor} = 15$.

The corrections $r_{i}^{cor}$, $i = 1, 2$ change the calculations of $d_{1}^{cor}$ and $c_{2}^{cor}$, which superficially changes the sequential design, but they do not actually change the original two-stage design, because it is discrete. The continuity correction only affects the calculation of $c_{new}^{cor}$ in the SSR. $c_{new}^{cor}$ is calculated using formulas (1) and (2), with $c_{2}$ replaced by $c_{2}^{cor}$. Then $r_{new}^{cor}$ is identified using formula (3), with $n_{new}$ and $c_{new}$ replaced by $n_{new}^{cor}$ and $c_{new}^{cor}$, respectively. So $r_{1}^{cor}$ is not actually used in this calculation and its definition is not strictly necessary; we keep it because omitting it may invite questions that would themselves need explanation. Per formulas (1) and (2), $c_{new}^{cor}$ is affected by ${\hat{t}}_{1}$, ${\hat{t}}_{2}$ (which are determined by $n_{1}$, $n_{2}$), $Z ({\hat{t}}_{1})$ (which is affected by the number of observed responses), and $r_{2}^{cor}$. None of these parameters is tied to the type of the original design (i.e., optimal, minimax, or an expanded design [see section 3.2] such as the average, the hybrid, the midpoint, or the three-stage design).
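As a small illustration of the continuity-corrected boundaries, the sketch below evaluates $d_{1}^{cor}$ and $c_{2}^{cor}$ from the definitions above. The helper names and the design values are hypothetical; the corrected $c_{2}^{cor}$ is the quantity that would take the place of $c_{2}$ in formulas (1) and (2).

```python
from math import sqrt

def wald_z(x, n, p0):
    """Z = (x/n - p0) * sqrt(n^3 / (x (n - x))); x may be a half-integer."""
    return (x / n - p0) * sqrt(n**3 / (x * (n - x)))

def corrected_boundaries(r1, n1, r2, n2, p0):
    """Continuity-corrected futility and critical boundaries."""
    r1_cor, r2_cor = r1 + 0.5, r2 + 0.5
    d1_cor = wald_z(r1_cor, n1, p0)
    c2_cor = wald_z(r2_cor, n2, p0)
    return d1_cor, c2_cor

# Hypothetical design values, for illustration only.
r1, n1, r2, n2, p0 = 8, 40, 24, 80, 0.20
d1, c2 = wald_z(r1, n1, p0), wald_z(r2, n2, p0)
d1_cor, c2_cor = corrected_boundaries(r1, n1, r2, n2, p0)
```

Because the statistic is increasing in the response count, each corrected boundary is slightly larger than its uncorrected counterpart, which makes the re-estimated final test slightly more conservative.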

The use of $c_{2}^{cor}$ may not always be necessary. The necessity may be checked with simulations. If the simulated type I error using $c_{2}$ is less than $α$, the continuity correction is not necessary. The simulations can be conducted using the DACT software, as illustrated in Figures 2 and 3.

Figure 2. Necessity of continuity correction in a two-stage design.

Figure 3. Necessity of continuity correction in a three-stage design.

In Figure 2, the type I error without continuity correction is <0.025, hence there is no need for the continuity correction.

In Figure 3, the type I error without continuity correction is >0.025, while the type I error with continuity correction is <0.025, hence the continuity correction is necessary.

3.2. Expanding Simon’s design

3.2.1. Motivation for expanding Simon’s design

Combining an SSR procedure with Simon’s design, as discussed in the previous section, adds flexibility to Simon’s design and increases the power when the true response rate is lower than the assumed one. However, simulations show that simply adding the SSR procedure to $S (p_{0}, p_{1}, α, β)$ may still fall short of providing the desired power when $p$ is close to $p_{low}$. One reason is that $n_{1}^{Simon} (p_{0}, p_{1})$ may be too small and the related $PET (p_{low})$ too large. To design a trial that provides power of about $1 - β$ for all $p$ between $p_{low}$ and $p_{1}$, changes must be made to Simon’s design. We seek to expand Simon’s design such that the expanded design is flexible and satisfies Simon’s considerations for all $p$ between $p_{low}$ and $p_{1}$. Let $EN^{Simon} (p_{0}, p_{low})$, $EN^{Simon} (p_{0}, p_{1})$, $n_{2}^{Simon} (p_{0}, p_{1})$ and $n_{2}^{Simon} (p_{0}, p_{low})$ be defined as in Table 1. Because both $p_{1}$ and $p_{low}$ are among the possible response rates, the trial designers could choose either $S (p_{0}, p_{low}, α, β)$ or $S (p_{0}, p_{1}, α, β)$, and both are reasonable designs. Hence $EN^{Simon} (p_{0}, p_{1})$, $EN^{Simon} (p_{0}, p_{low})$, $n_{2}^{Simon} (p_{0}, p_{1})$ and $n_{2}^{Simon} (p_{0}, p_{low})$ are all acceptable under Simon’s third consideration and can be regarded as benchmarks: the $EN (p_{0})$ or $n_{2}$ of a new design is permissible if it is no larger than $EN^{Simon} (p_{0}, p_{low})$ or $n_{2}^{Simon} (p_{0}, p_{low})$, respectively. An expansion is therefore permissible if its $EN (p_{0})$ or $n_{2}$ is permissible. Trial designs with smaller $EN (p_{0})$ or $n_{2}$ (i.e., closer to $EN^{Simon} (p_{0}, p_{1})$ or $n_{2}^{Simon} (p_{0}, p_{1})$, respectively) are preferable. Our aim is to identify designs whose $EN (p_{0})$ and $n_{2}$ are permissible and as close to $EN^{Simon} (p_{0}, p_{1})$ or $n_{2}^{Simon} (p_{0}, p_{1})$ as possible.

Let $PET (p_{low})$ be the probability of early futility termination under $p = p_{low}$ for a design. The power of a design combined with SSR cannot exceed $1 - PET (p)$, whether the SSR method is based on a combination test/conditional error function (e.g., Englert and Kieser 2012; Englert and S 2013) or is an ASD method (such as the one proposed in this article). To achieve a power of $1 - β$ when $p$ is close to $p_{low}$, the design must satisfy $PET (p_{low}) < β$. For $S (p_{0}, p_{low}, α, β)$, $PET (p_{low}) = B (r_{1}, p_{low}, n_{1}) < β$ is automatically satisfied. But for $S (p_{0}, p_{1}, α, β)$, it is possible that $PET (p_{low}) = B (r_{1}, p_{low}, n_{1}^{Simon} (p_{0}, p_{1})) > β$. When this happens, simply applying SSR to $S (p_{0}, p_{1}, α, β)$ will not provide the desired power when $p$ is close to $p_{low}$. Applying SSR to $S (p_{0}, p_{low}, α, β)$ is not necessary, since that design already has enough power for all response rates no less than $p_{low}$. A suitable design would likely have a sample size between those of $S (p_{0}, p_{1}, α, β)$ and $S (p_{0}, p_{low}, α, β)$.

We propose several expansion options, which are different compromises between $S (p_{0}, p_{1}, α, β)$ and $S (p_{0}, p_{low}, α, β)$. Each has a planned sample size smaller than that of $S (p_{0}, p_{low}, α, β)$ but, with SSR, can provide more power than $S (p_{0}, p_{1}, α, β)$ combined with SSR. Each of the expansions satisfies Simon’s considerations as well as $PET (p_{low}) < β$. The last condition means that the sample sizes need to be larger than that of $S (p_{0}, p_{1}, α, β)$. The expansions include a hybrid design, a midpoint design, and a three-stage design. However, before these expansions, we propose an average design, which is an augmentation of Simon’s optimal and minimax designs rather than an expansion. In Simon’s design, $r_{1}$ is a futility cutoff and will be denoted as $r_{1, f}$, while $r_{2}$ is an efficacy cutoff and will be denoted as $r_{2, e}$. In our expansions, we denote futility cutoffs $r_{i}$ as $r_{i, f}$ and efficacy cutoffs $r_{i}$ as $r_{i, e}$.
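The role of the constraint $PET (p_{low}) < β$ can be illustrated numerically. The sketch below is ours, not the authors' code: it takes a first stage tuned for $p_{1} = 0.30$ ($r_{1} / n_{1} = 1/10$, an example value) and evaluates it at a hypothetical $p_{low} = 0.20$ with $β = 0.20$.

```python
from math import comb

def B(r, p, n):
    """Cumulative binomial distribution P(X <= r) for X ~ Bin(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(0, min(r, n) + 1))

def pet(r1, n1, p):
    """Probability of early futility termination at response rate p."""
    return B(r1, p, n1)

def power_ceiling_with_ssr(r1, n1, p):
    """Upper bound 1 - PET(p) on the power of any SSR applied after stage 1."""
    return 1 - pet(r1, n1, p)

# Example first stage and a hypothetical p_low (illustration only).
beta = 0.20
pet_low = pet(r1=1, n1=10, p=0.20)
ceiling = power_ceiling_with_ssr(r1=1, n1=10, p=0.20)
violates = pet_low >= beta
```

Here $PET (0.20) = 0.8^{10} + 10 \times 0.2 \times 0.8^{9} \approx 0.376$, so no sample size increase after stage 1 can lift the power at $p = 0.20$ above about $0.62$.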

Table 1. Notations.


3.2.2. The average design – an augmentation

Jung et al. (2004) noted that “the two Simon’s designs may result in highly divergent sample size requirements”. For example, if $p_{0} = 0.2$, $p_{1} = 0.33$, $α = 0.025$, $β = 0.1$, then $n_{1}^{Simon:minimax} (p_{0}, p_{1}) = 100$, $n_{2}^{Simon:minimax} (p_{0}, p_{1}) = 119$ for the minimax design and $n_{1}^{Simon:opt} (p_{0}, p_{1}) = 50$, $n_{2}^{Simon:opt} (p_{0}, p_{1}) = 137$ for the optimal design. Such divergence could present challenges for trial designers, and also suggests that some other designs may be reasonable choices. Jung et al. (2004) proposed the admissible designs. We propose an additional design option, the average design, to balance the desire to minimize $EN (p_{0})$ (the optimal design) and the desire to minimize $n_{2}$ (the minimax design). The optimal and the minimax designs are derived first. Let $n_{1, avg}$, $r_{1, avg}$ be the smallest integers larger than the average of the $n_{1}$’s and $r_{1}$’s from the optimal and minimax designs, respectively. Then search for $(n_{2}, r_{2})$’s that satisfy $f (r_{1, avg}, n_{1, avg}, r_{2}, n_{2}, p_{1}) \leq β$ and $1 - f (r_{1, avg}, n_{1, avg}, r_{2}, n_{2}, p_{0}) \leq α$. $(Δ n_{12}, r_{2})$ can be chosen such that $Δ n_{12}$ is the smallest integer satisfying the requirements on type I and type II errors. Design examples are given in . The $n_{2}$ and $EN (p_{0})$ for the average design are smaller than $n_{2}^{Simon:opt} (p_{0}, p_{1})$ and $EN^{Simon:minimax} (p_{0})$, respectively. Jung’s admissible designs aim to minimize a weighted average of $n_{2}$ (with weight $q$) and $EN (p_{0})$ (with weight $1 - q$), while the average design balances the desire for the smallest $n_{2}$ and the smallest $EN (p_{0})$. Hence, the motivation of the average design is similar to that of Jung’s admissible design with $q = 0.5$. However, they are obtained through different search processes, and thus are not expected to be exactly the same. The average design offers an additional option for trial designers when the minimax design and the optimal design are very different. Three admissible designs from Jung et al. (2004) are presented in Table 2 for comparison with the average design.
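The search for the average design described above can be sketched as follows. This is an illustrative sketch, not the authors' search code: the function names are ours, the averages are rounded up with `ceil` (our reading of the rounding rule), and the input quadruplets are commonly tabulated optimal and minimax designs for $p_{0} = 0.10$, $p_{1} = 0.30$, $α = 0.05$, $β = 0.20$, used only as an example.

```python
from math import comb, ceil

def b(x, p, n):
    """Binomial probability mass function."""
    return comb(n, x) * p**x * (1 - p) ** (n - x) if 0 <= x <= n else 0.0

def B(r, p, n):
    """Cumulative binomial distribution P(X <= r)."""
    return sum(b(x, p, n) for x in range(0, min(r, n) + 1)) if r >= 0 else 0.0

def f(r1, n1, r2, n2, p):
    """Probability of not rejecting H0 in the two-stage design."""
    return B(r1, p, n1) + sum(
        b(x, p, n1) * B(r2 - x, p, n2 - n1) for x in range(r1 + 1, min(n1, r2) + 1)
    )

def average_design(opt, minimax, p0, p1, alpha, beta, max_n2=200):
    """opt and minimax are (r1, n1, r2, n2); returns the averaged quadruplet."""
    r1_avg = ceil((opt[0] + minimax[0]) / 2)   # rounding rule is an assumption
    n1_avg = ceil((opt[1] + minimax[1]) / 2)
    for n2 in range(n1_avg + 1, max_n2 + 1):   # smallest delta n12 first
        for r2 in range(r1_avg, n2):
            type_1 = 1 - f(r1_avg, n1_avg, r2, n2, p0)
            type_2 = f(r1_avg, n1_avg, r2, n2, p1)
            if type_1 <= alpha and type_2 <= beta:
                return r1_avg, n1_avg, r2, n2
    return None  # no design found within max_n2

# Example inputs (assumed tabulated designs, see the lead-in).
design = average_design(opt=(1, 10, 5, 29), minimax=(1, 15, 5, 25),
                        p0=0.10, p1=0.30, alpha=0.05, beta=0.20)
```

The search stops at the first $n_{2}$ for which some $r_{2}$ meets both error constraints, which corresponds to choosing the smallest $Δ n_{12}$.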

Table 2. Comparing the average design and Jung’s admissible design.


In Table 2, in the case of $(p_{0}, p_{1}, α, β) = (0.1, 0.3, 0.05, 0.15)$, the average design is different from Jung’s admissible design. In the case of $(p_{0}, p_{1}, α, β) = (0.05, 0.25, 0.05, 0.1)$, there are two Jung’s admissible designs, and one of them is the same as the average design.

3.2.3. The modified Simon’s design

Because of the necessity of achieving $PET (p_{low}) < β$, it is natural to consider modifying Simon’s design by adding this constraint to Simon’s considerations in the search for the quadruplet $r_{1}, n_{1}, r_{2}, n_{2}$ of the two-stage design, such that the following are satisfied: i) the type I error satisfies $1 - f (r_{1}, n_{1}, r_{2}, n_{2}, p_{0}) < α$; ii) the power under $p = p_{1}$ satisfies $1 - f (r_{1}, n_{1}, r_{2}, n_{2}, p_{1}) \geq 1 - β$; iii) $PET (p_{low}) = B (r_{1}, p_{low}, n_{1}) < β$. Similar to Simon’s design, among all the quadruplets that satisfy these constraints, the one with the smallest $n_{2}$ is selected as the minimax design, and the one with the minimum $EN (p_{0})$ is the optimal design. However, our investigation shows that the requirement $PET (p_{low}) = B (r_{1}, p_{low}, n_{1}) < β$ frequently leads only to a reduction of $r_{1}$, and the difference between the modified design and the original Simon’s design is often minor. Consequently, the modified design combined with SSR may not meaningfully improve on the power of the original Simon’s design $S (p_{0}, p_{1}, α, β)$ combined with SSR if the true response rate is close to $p_{low}$. Hence, the modified design will not be discussed further.

3.2.4. The expansions

Our investigation further shows that, besides the requirement that $PET (p_{low}) < β$, $n_{1}$ (at which the SSR is performed) can have a significant impact on the power of a design combined with SSR. For a two-stage design plus SSR to have a substantial power increase over Simon’s original design $S (p_{0}, p_{1}, α, β)$ plus SSR, $n_{1}$ needs to be larger than $n_{1}^{Simon} (p_{0}, p_{1})$. We explore several possibilities for $n_{1}$, each of which satisfies $n_{1}^{Simon} (p_{0}, p_{1}) < n_{1} \leq n_{1}^{Simon} (p_{0}, p_{low})$.

3.2.4.1. Two-stage designs

The hybrid design

The design $S (p_{0}, p_{low}, α, β)$ automatically satisfies the condition $PET (p_{low}) = B (r_{1}^{Simon} (p_{0}, p_{low}), p_{low}, n_{1}^{Simon} (p_{0}, p_{low})) < β$ . Hence, $(r_{1}^{Simon} (p_{0}, p_{low}), n_{1}^{Simon} (p_{0}, p_{low}))$ is an obvious candidate for $(r_{1}, n_{1})$ in a two-stage design expansion. The next step is to search for $n_{2}$ and $r_{2}$ that satisfy the type I error constraint and the power constraint under the alternative hypothesis of $p > p_{0}$ . This design aims to address both possibilities, $p = p_{1}$ and $p = p_{low}$ , and it takes its stage 1 parameters from $S (p_{0}, p_{low}, α, β)$ (i.e., $(r_{1}, n_{1}) = (r_{1}^{Simon} (p_{0}, p_{low}), n_{1}^{Simon} (p_{0}, p_{low}))$ ). Therefore, it is named the hybrid design. With this design, $(n_{1}^{hybrid}, r_{1, f}^{hybrid}) = (n_{1}^{Simon} (p_{0}, p_{low}), r_{1}^{Simon} (p_{0}, p_{low}))$ , so it has the same $PET (p_{0})$ and $PET (p_{low})$ as $S (p_{0}, p_{low}, α, β)$ . $n_{2} = n_{2}^{hybrid}$ will be smaller than $n_{2}^{Simon} (p_{0}, p_{low})$ (so $n_{2}$ is permissible and $Δ n_{12} < n_{2}^{Simon} (p_{0}, p_{low}) - n_{1}^{Simon} (p_{0}, p_{low})$ ). Hence, $EN (p_{0}) = EN^{hybrid} (p_{0}) = n_{1} + (1 - PET (p_{0})) Δ n_{12} < EN^{Simon} (p_{0}, p_{low})$ , and $EN (p_{0})$ will be permissible as well. The process of identifying $n_{1}$ , $r_{1}$ , $n_{2}$ and $r_{2}$ is detailed in the online supplemental material.

Let $n_{α, β, p_{1}} \approx {(\frac{z_{α} + z_{β}}{p_{1} - p_{0}})}^{2} p_{1} (1 - p_{1})$ . With this sample size and a response rate of $p_{1}$ , a fixed sample size trial will have about $1 - β$ power and type I error $α$ . The difference between $p_{low}$ and $p_{1}$ could be large enough that $n_{1} = n_{1}^{hybrid} \geq n_{α, β, p_{1}}$ . For example, suppose that $p_{0} = 0.2$ , $p_{low} = 0.33$ , and $p_{1} = 0.4$ . Then $n_{1}^{Simon : minimax} (p_{low})$ from $S (p_{0}, p_{low}, α, β)$ will be 100, which is greater than $n_{α, β, p_{1}} = 64$ . In such a situation, it is reasonable to add a hypothesis test at $n_{1}$ , such that the null hypothesis is rejected if the number of responses at $n_{1}$ exceeds some efficacy boundary $r_{1, e}$ . Details are provided in the online supplemental material.

There will be many candidates for $n_{2}$ , and it is natural to select the smallest $n_{2}$ . However, as discussed, it is possible that $n_{1} = n_{1}^{hybrid} \geq n_{α, β, p_{1}}$ , in which case selecting the smallest $n_{2}$ could result in $Δ n_{12} = 1$ . A small $Δ n_{12}$ means that the final analysis will happen soon after the stage 1 interim analysis, which may not be practical. Therefore, the design includes an option that lets the user set a minimum value for $Δ n_{12}$ , taking the enrollment into consideration. The selection of $n_{2}$ can be optimized by evaluating the operating characteristics of hybrid + SSR with simulations in DACT.
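The fixed sample size $n_{α, β, p_{1}}$ quoted in the example can be checked numerically. The following is a minimal sketch, assuming a one-sided $α = 0.025$ and $β = 0.10$ ; these two values are assumptions made here because they reproduce the quoted 64, and the function name `fixed_sample_size` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def fixed_sample_size(p0: float, p1: float, alpha: float, beta: float) -> float:
    """Normal-approximation sample size n_{alpha, beta, p1} from the text."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # upper alpha quantile
    z_beta = NormalDist().inv_cdf(1 - beta)    # upper beta quantile
    return ((z_alpha + z_beta) / (p1 - p0)) ** 2 * p1 * (1 - p1)


# alpha and beta below are assumed values, not taken from this section.
n_raw = fixed_sample_size(p0=0.2, p1=0.4, alpha=0.025, beta=0.10)
n_fixed = ceil(n_raw)  # round up to a whole number of subjects
print(n_raw, n_fixed)
```

Under these assumed error rates the approximation rounds up to 64, well below the stage 1 size of 100, which is why an efficacy test at $n_{1}$ is worth considering.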

The midpoint design

Since $EN^{hybrid} (p_{0}) < EN^{Simon} (p_{0}, p_{low})$ , it will be permissible. However, because of the choices for $n_{1}^{hybrid}$ and $r_{1, f}^{hybrid}$ , $EN^{hybrid} (p_{0})$ could be very close to $EN^{Simon} (p_{0}, p_{low})$ and much larger than $EN^{Simon} (p_{0}, p_{1})$ . It could be desirable to reduce $EN^{hybrid} (p_{0})$ , and the midpoint design, which is also a two-stage design, does so. In this design, $S (p_{0}, p_{1}, α, β)$ and $S (p_{0}, p_{low}, α, β)$ are derived first, then $n_{1} = n_{1}^{midpoint} = ρ \times n_{1}^{Simon} (p_{low}) + (1 - ρ) \times n_{1}^{Simon} (p_{1})$ , where $0 < ρ < 1$ is determined by the user and can be optimized with simulations (provided in the DACT software). Then $r_{1}$ , $n_{2}$ and $r_{2}$ are searched. The details are provided in the online supplemental material. Similar to the hybrid design, the midpoint design may include an efficacy threshold $r_{1, e}$ and a minimum for $Δ n_{12}$ .
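The quantities that drive the comparison between these two-stage designs, $PET (p) = B (r_{1}, p, n_{1})$ , $EN (p) = n_{1} + (1 - PET (p)) Δ n_{12}$ and the weighted $n_{1}^{midpoint}$ , are simple to compute. The sketch below is illustrative only: the function names are hypothetical, rounding $n_{1}^{midpoint}$ up to an integer is an assumption (the article does not state a rounding rule), and the numerical example uses a commonly tabulated Simon optimal design for $p_{0} = 0.2$ , $p_{1} = 0.4$ with one-sided $α = 0.05$ and $β = 0.2$ rather than any design from this article.

```python
from math import ceil, comb


def binom_cdf(r: int, n: int, p: float) -> float:
    """B(r, p, n) = P(X <= r) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(r + 1))


def pet_and_en(n1: int, r1: int, n2: int, p: float) -> tuple[float, float]:
    """Return (PET(p), EN(p)) for a two-stage design with a futility stop at n1.

    n2 is the total sample size, so delta_n12 = n2 - n1.
    """
    pet = binom_cdf(r1, n1, p)
    return pet, n1 + (1 - pet) * (n2 - n1)


def midpoint_n1(n1_low: int, n1_p1: int, rho: float) -> int:
    """Weighted stage 1 size; rounding up is an assumption of this sketch."""
    return ceil(rho * n1_low + (1 - rho) * n1_p1)


# Illustration: Simon optimal design r1/n1 = 3/13, r/n = 12/43 (not from this article).
pet0, en0 = pet_and_en(n1=13, r1=3, n2=43, p=0.2)
print(round(pet0, 3), round(en0, 2))

# Hypothetical stage 1 sizes, only to show the weighting with rho = 0.5.
print(midpoint_n1(n1_low=40, n1_p1=13, rho=0.5))
```

Because $EN (p_{0})$ grows with $n_{1}$ when $PET (p_{0})$ is held roughly constant, moving $n_{1}$ toward $n_{1}^{Simon} (p_{1})$ through a smaller $ρ$ is what lowers the expected sample size under the null.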

3.2.4.2. Three stage design

A three-stage design, which includes two interim futility analyses, is more effective at reducing $E N^{hybrid} (p_{0})$ than the mid-point design (see and for simulation results).

No interim hypothesis testing

In this design, $n_{1}$ subjects are enrolled in stage 1. If no more than $r_{1} = r_{1, f}$ responses are observed, the trial is stopped for futility. Otherwise, the trial proceeds to the second stage and enrolls to a total of $n_{2}$ subjects. If no more than $r_{2} = r_{2, f}$ responses are observed, the trial is stopped for futility. Otherwise, the trial proceeds to the third stage and enrolls to a total of $n_{3}$ subjects. The null hypothesis of $p \leq p_{0}$ is rejected if more than $r_{3} = r_{3, e}$ responses are observed at the end of stage 3. The details for searching $r_{1, f}, n_{1}, r_{2, f}, n_{2}, r_{3}, n_{3}$ are provided in the online supplemental material.

With interim hypothesis testing

Let $n_{1} = n_{1}^{3stage}$ and $n_{2} = n_{2}^{3stage}$ be chosen as above. Similar to the situation in the hybrid design, it is possible that $n_{2} = n_{2}^{3stage} \geq n_{α, β, p_{1}}$ . A hypothesis test can then be added at $n_{2}$ , such that the null hypothesis is rejected and the trial is terminated if the number of responses exceeds some threshold $r_{2, e}$ . The details for searching $r_{1, f}, n_{1}, r_{2, f}, r_{2, e}, n_{2}, r_{3}, n_{3}$ are provided in the online supplemental material.
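For a fixed (non-adaptive) multi-stage design, the probability of rejecting the null hypothesis and the expected sample size can be computed exactly by tracking the distribution of the cumulative response count across stages. The sketch below covers both variants of the three-stage design, with or without an interim efficacy test. It is not the search procedure from the supplemental material; the function name `multistage_oc` and the three-stage boundaries in the example are hypothetical values chosen only to exercise the code.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def multistage_oc(p, totals, futility, efficacy):
    """Exact (rejection probability, expected sample size) at response rate p.

    totals[i]   : cumulative sample size at analysis i
    futility[i] : stop for futility if cumulative responses <= futility[i];
                  at the final analysis, H0 is rejected iff responses > futility[i]
    efficacy[i] : at an interim analysis, reject H0 and stop if responses >
                  efficacy[i]; None means no efficacy test at that analysis
    """
    dist = {0: 1.0}  # probability of still running with x responses so far
    prev_n, reject, exp_n = 0, 0.0, 0.0
    last = len(totals) - 1
    for i, n in enumerate(totals):
        m = n - prev_n  # subjects added in this stage
        new = {}
        for x, w in dist.items():
            for k in range(m + 1):
                new[x + k] = new.get(x + k, 0.0) + w * binom_pmf(k, m, p)
        dist = {}
        for x, w in new.items():
            if i == last:
                exp_n += w * n
                if x > futility[i]:
                    reject += w
            elif x <= futility[i]:
                exp_n += w * n  # stopped for futility
            elif efficacy[i] is not None and x > efficacy[i]:
                reject += w     # stopped early for efficacy
                exp_n += w * n
            else:
                dist[x] = w     # continue to the next stage
        prev_n = n
    return reject, exp_n


# Hypothetical three-stage boundaries, for illustration only.
totals, fut = [20, 40, 60], [4, 9, 17]
print(multistage_oc(0.2, totals, fut, [None, None, None]))  # futility only
print(multistage_oc(0.4, totals, fut, [None, 15, None]))    # efficacy test at n2
```

Evaluating the function at $p_{0}$ gives the type I error and $EN (p_{0})$ , and evaluating it at $p_{1}$ or $p_{low}$ gives the power, so candidate boundaries can be screened against the design constraints before any SSR simulation is run.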

3.3. Inference

After the trial, with or without an SSR, the median unbiased point estimate and two-sided exact confidence interval for $θ = Δp = p - p_{0}$ can be obtained following Gao, Liu, and Mehta (Gao et al. Citation2013). The estimates for $p$ are then obtained by adding $p_{0}$ to those for $Δp$ . The inference can be conducted using DACT. Figure 4 is an example of a final analysis for an adaptive two-stage design in which the sample size was changed after the interim analysis.

Figure 4. Conducting final analysis using DACT.

In Figure 4, $p_{0} = 0.2$ , $n_{1} = 100$ , $r_{1, e} = 31$ , $n_{2} = 105$ , $r_{2} = 29$ . The number of observed responses at the interim analysis (at $n_{1}$ ) is $x_{1} = 28$ . Because the trial continued after the interim analysis, the final analysis does not need the value of $r_{1}$ (only $r_{1, e}$ is needed). After the interim analysis, the sample size was changed to $n_{new} = 115$ . The final observed number of responses is $x_{new} = 34$ . We note that the final analysis only involves $x_{new}$ , but not the adjusted $r_{new}$ or $r_{new}^{cor}$ . We assume that simulations had been conducted at the trial design stage (e.g. and ) to determine whether continuity correction is necessary. If continuity correction was not necessary, then the output without continuity correction is used as the final inference, with an estimated response rate of $\hat{p} = 0.29152$ , a 95% confidence interval of (0.206993, 0.375902), and a p-value of 0.0169. Otherwise, the inference with continuity correction is used, with an estimated response rate of $\hat{p} = 0.291644$ , a 95% confidence interval of (0.206728, 0.375954), and a p-value of 0.0166.

4. Design examples and simulations

4.1. Design examples

Suppose that a single-arm phase II trial is being designed with $p_{0} = 0.2$ , $p_{low} = 0.33$ , $p_{1} = 0.4$ . There are several possible designs: i) $S (p_{0}, p_{low}, α, β)$ : Simon’s design with $p_{1} = p_{low} = 0.33$ ; ii) $S (p_{0}, p_{1}, α, β)$ : Simon’s design with $p_{1} = 0.4$ ; iii) the hybrid design and the midpoint design ( $ρ = 0.5$ ) with $p_{low} = 0.33$ , $p_{1} = 0.4$ , where the minimum value for $Δ n_{12}$ is set to 5 for the purpose of discussion; iv) the three-stage design. $(α_{1}, α_{2}) = (0.001, 0.024)$ is used for discussion. Investigators can use any $(α_{1}, α_{2})$ , as long as $α_{1} + α_{2} \leq α$ . Note that the search for $r_{1, e}$ and $r_{2, e}$ is such that $α_{1}, α_{2}$ are associated with $r_{1, e}$ and $r_{2, e}$ as actual probabilities of $α$ -spending, not nominal $α$ -spending (the sum of nominal $α$ -spending will be greater than $α$ ). So the condition $α_{1} + α_{2} \leq α$ is not a conservative requirement. In practice, a larger $α_{1}$ may be selected if the investigators believe (e.g., based on literature and/or clinical evidence) that the response rate is more likely to be close to $p_{1}$ ; a smaller $α_{1}$ may be selected otherwise. The impact of the $α$ -spending can be investigated with simulations (with SSR) to optimize power over the range of response rates of interest, $(p_{low}, p_{1})$ . Tables 3 and 4 present design examples. $err$ indicates type I error, and $PW$ denotes power.

Table 3. Simon’s two-stage design with augmentation (fixed sample).

Display Table

Table 4. The expanded designs (fixed sample).

Display Table

4.2. Simulations

If the true response rate is $p_{low} = 0.33$ , then both Simon’s design $S (p_{0}, p_{1}, α, β)$ with $p_{1} = 0.4$ and all of the expanded designs with $p_{1} = 0.4$ and $p_{low} = 0.33$ will have power less than $1 - β$ . We conducted simulations to check whether combining each design with SSR can bring the power close to $1 - β$ . We use a hypothetical maximum sample size of 140, which is close to $n_{2}^{Simon : opt} (p_{0}, p_{low}) = 137$ . The critical boundaries without continuity correction are used when the type I error is adequately controlled without the correction. Otherwise, the critical boundaries with continuity correction are used.

Simulations were conducted to investigate the operating characteristics of each design with SSR and the results are presented in . The algorithm for the SSR is described in section 3.1.1. The results include: type I error ( $p_{0} = 0.2$ ) with and without continuity correction ( $err_{CC}$ and $err$ ), power under $p_{low} = 0.33$ and $p_{1} = 0.4$ with and without continuity correction ( $PW_{CC}$ and $PW$ ), expected sample size under both $p_{0}$ and $p_{low}$ ( $EN (p_{0})$ and $EN (p_{low})$ ), as well as the probability of early termination under $p_{low}$ ( $PET (p_{low})$ ). All the simulations are supported by the DACT software. DACT provides all the outputs summarized in .
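To make the reported quantities concrete, the sketch below estimates the rejection rate, the probability of early termination and the expected sample size of a fixed two-stage design by Monte Carlo simulation. It deliberately omits the SSR step of section 3.1.1 and the continuity correction, and it is not DACT's algorithm. The function name `simulate_two_stage`, the seed and the number of replicates are assumptions, and the design used (futility boundary 3 of 13, final boundary 12 of 43) is a commonly tabulated Simon optimal design for $p_{0} = 0.2$ , $p_{1} = 0.4$ at one-sided $α = 0.05$ , $β = 0.2$ , used purely for illustration.

```python
import random


def simulate_two_stage(p, n1, r1, n2, r2, n_sims=20000, seed=1):
    """Monte Carlo estimates of (rejection rate, PET, EN) at response rate p.

    The trial stops for futility if stage 1 responses <= r1; otherwise it
    enrolls to n2 subjects in total and rejects H0 if total responses > r2.
    No sample size re-estimation is performed in this sketch.
    """
    rng = random.Random(seed)
    rejects = stops = total_n = 0
    for _ in range(n_sims):
        x1 = sum(rng.random() < p for _ in range(n1))
        if x1 <= r1:
            stops += 1
            total_n += n1
            continue
        x2 = x1 + sum(rng.random() < p for _ in range(n2 - n1))
        total_n += n2
        rejects += x2 > r2
    return rejects / n_sims, stops / n_sims, total_n / n_sims


# err, PET(p0) and EN(p0) under the null; PW under the alternative.
err, pet0, en0 = simulate_two_stage(0.2, n1=13, r1=3, n2=43, r2=12)
pw, pet1, en1 = simulate_two_stage(0.4, n1=13, r1=3, n2=43, r2=12)
print(err, pet0, en0)
print(pw, pet1, en1)
```

An SSR simulation would add one step after the interim analysis, namely recomputing the final sample size and critical value from the observed $x_{1}$ , subject to the maximum sample size, before simulating the remaining subjects.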

Table 5. Simulation on SSR with Simon’s two-stage design.

Display Table

Table 6. Simulation SSR with the expanded two-stage designs.

Display Table

All type I errors without continuity correction ( $err$ ) were less than 0.025. Hence, continuity correction was not necessary for the scenarios in .

All type I errors without continuity correction ( $err$ ) for the hybrid design exceeded 0.025, while the continuity-corrected type I errors ( $err_{CC}$ ) were less than 0.025. Hence, continuity correction was necessary for the hybrid design in the scenarios in .

All type I errors without continuity correction ( $err$ ) for the midpoint design were less than 0.025. Hence, continuity correction was not necessary for the midpoint design in the scenarios in .

The type I error without continuity correction ( $err$ ) for the optimal design exceeded 0.025, while the continuity-corrected type I error ( $err_{CC}$ ) for the optimal design was less than 0.025. Hence, continuity correction was necessary for the optimal design in the scenarios in .

The type I error without continuity correction ( $err$ ) for the minimax and average designs were less than 0.025. Hence, continuity correction was not necessary for the minimax and average designs in the scenarios in . , and show that the three-stage design has the smallest $EN (p_{0})$ .

Table 7. Simulation SSR with the expanded three-stage designs.

Display Table

If the true response rate is $p = p_{low} = 0.33$ , the power of $S (p_{0}, p_{1}, α, β)$ + SSR () was only 0.727 for the optimal design, and even smaller, at 0.668, for the minimax design + SSR. This confirms that simply adding SSR to $S (p_{0}, p_{1}, α, β)$ could be insufficient to provide desirable power for all $p$ between $p_{low}$ and $p_{1}$ . Larger sample sizes $n_{1}$ and $n_{2}$ would be necessary to achieve larger power.

The purpose of introducing the expansions was that the expansions in combination with SSR will have more power than $S (p_{0}, p_{1}, α, β)$ combined with SSR. It is noted that:

i) The sample sizes ( $n_{1}$ and $n_{2}$ ) for all the expanded designs were larger than those for $S (p_{0}, p_{1}, α, β)$ and were permissible. An interim analysis at a larger $n_{1}$ has more information time and smaller variation, so its results are more accurate. Hence, the SSR with the expanded designs was more reliable than the SSR with $S (p_{0}, p_{1}, α, β)$ .

ii) Although the sample size of the three-stage designs was larger than that of $S (p_{0}, p_{1}, α, β)$ , the simulated $EN (p_{0})$ for the three-stage design combined with SSR () was very close to that of $S (p_{0}, p_{1}, α, β)$ combined with SSR (); i.e., the three-stage design combined with SSR performs well with respect to Simon’s considerations.

iii) $PET (p_{low}) > 0.17 > β$ for Simon’s design $S (p_{0}, p_{1}, α, β)$ , while $PET (p_{low}) < β$ for all the expanded designs.

iv) The simulated power does not exceed $1 - PET (p_{low})$ for any design.
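The bound on power by $1 - PET (p_{low})$ follows directly from the stopping rule, since a trial stopped for futility at stage 1 cannot reject the null hypothesis. A short derivation, writing $X_{1}$ for the number of responses at $n_{1}$ (notation introduced here for illustration):

```latex
\begin{align*}
\mathrm{Power}(p_{low})
  &= \Pr(\text{reject } H_{0} \mid p_{low}) \\
  &= \Pr(X_{1} > r_{1,f},\ \text{reject } H_{0} \mid p_{low}) \\
  &\leq \Pr(X_{1} > r_{1,f} \mid p_{low}) \\
  &= 1 - PET(p_{low}).
\end{align*}
```

With $PET (p_{low}) > 0.17$ for $S (p_{0}, p_{1}, α, β)$ , the power under $p_{low}$ stays below 0.83 however much the sample size is increased after the interim analysis, which is why the expansions require $PET (p_{low}) < β$ .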

5. Discussions

Lin and Shih (Citation2004) facilitate hypothesis tests for two possible response rates, an optimistic and a skeptical target rate. From this perspective, the design of Lin and Shih (Citation2004) is more flexible than either Simon’s design (Simon Citation1989) or that of Jung et al. (Citation2004), since both of them conduct hypothesis testing on only one target response rate. However, in general, SSR (either conditional error function based, such as that by Englert and Kieser (Citation2012) and Englert and Kieser (Citation2013), or the adaptive sequential methods in our proposal) allows for a range of possible response rates and is thus more flexible than Lin and Shih (Citation2004).

Simon’s designs only use a binomial endpoint, which may be considered to be a limitation. However, a response rate (such as ORR) may be more intuitive than other endpoints. Such intuitiveness may be the reason for the popularity of Simon’s design. On the other hand, expanding Simon’s design using other endpoints (e.g., progression-free survival) could offer more options for trial investigators conducting phase II oncology trials. However, that is beyond the scope of this article.

We propose several expansions for Simon’s design. They are intended to be used together with SSR, such that the planned sample size is sufficient to provide power when the response rate is $p_{1}$ , with the flexibility to increase the sample size if the observed response rate is closer to $p_{low}$ . The simulations show that all of the expanded designs combined with SSR have more desirable power than Simon’s design combined with SSR, as intended. The three-stage design is statistically more efficient than the two-stage expansions because it has similar simulated power but a smaller $EN (p_{0})$ than the other expansions. Three-stage designs have been previously proposed (Chen Citation1997; Ensign et al. Citation1994). It is recognized that a three-stage design is operationally more complicated than a two-stage design. We note that most clinical trials involve at most one interim analysis. However, when feasible, statistical optimality could be preferable despite the added operational complexity. For example, some single-arm phase II oncology trials do conduct multiple interim analyses (e.g., the Zuma-1 trial protocol (Citation2015a) and the Zuma-2 trial protocol (Citation2015b)). If a three-stage design is not feasible, either the hybrid or the midpoint design may be used.

In practice, the operating characteristics (OCs) of any of the proposed designs can be influenced by several factors, such as $p_{0}$ , $p_{1}$ , $p_{low}$ , $p_{high}$ , $N_{\max}$ , $\min (Δ n_{12})$ and $(α_{1}, α_{2})$ (for two-stage expanded designs), and $\min (Δ n_{23})$ and $(α_{1}, α_{2}, α_{3})$ (for three-stage expanded designs). For the midpoint design, the OCs are also affected by the choice of $ρ$ . The OCs of each design under consideration should be thoroughly investigated with simulations. Designs that optimize the considerations of budget, patient enrollment, and feasibility can then be selected. All these simulations can be conducted with the DACT software. In general, we recommend the three-stage design because of its statistical efficiency.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Supplemental data

Supplemental data for this article can be accessed online at https://doi.org/10.1080/10543406.2024.2341673.

Additional information

Funding

The author(s) reported there is no funding associated with the work featured in this article.

References

Bauer, P., and K. Köhne. 1994. Evaluation of experiments with adaptive interim analyses. Biometrics 50 (4):1029–1041. doi:10.2307/2533441.
Chang, M. N., T. M. Therneau, H. S. Wieand, and S. S. Cha. 1987. Designs for group sequential phase II clinical trials. Biometrics 43 (4):865–874. doi:10.2307/2531540.
Chen, T. T. 1997. Optimal three-stage designs for phase II cancer clinical trials. Statistics in Medicine 16 (23):2701–2711. doi:10.1002/(sici)1097-0258(19971215)16:23<2701:aid-sim704>3.0.co;2-1. PMID: 9421870.
Chen, Y. H. J., D. L. DeMets, and K. K. G. Lan. 2004. Increasing the sample size when the unblinded interim result is promising. Statistics in Medicine 23 (7):1023–1038. doi:10.1002/sim.1688.
Cui, L., H. M. Hung, and S. J. Wang. 1999. Modification of sample size in group sequential clinical trials. Biometrics 55 (3):853–857. doi:10.1111/j.0006-341X.1999.00853.x.
Devore, J. L. 1995. Probability and statistics for engineering and the sciences. 4th ed. USA: Duxbury Press.
Englert, S., and M. Kieser. 2012. Improving the flexibility and efficiency of phase II designs for oncology trials. Biometrics 68 (3):886–892. doi:10.1111/j.1541-0420.2011.01720.x.
Englert, S., and M. Kieser. 2013. Optimal adaptive two-stage designs for phase II cancer clinical trials. Biometrical Journal 55 (6):955–968. doi:10.1002/bimj.201200220.
Ensign, L. G., E. A. Gehan, D. S. Kamen, and P. F. Thall. 1994. An optimal three-stage design for phase II clinical trials. Statistics in Medicine 13 (17):1727–1736. doi:10.1002/sim.4780131704.
Feller, W. 1945. On the normal approximation to the binomial distribution. The Annals of Mathematical Statistics 16 (4):319–329. doi:10.1214/aoms/1177731058.
Gao, P., L. Liu, and C. Mehta. 2013. Exact inference for adaptive group sequential designs. Statistics in Medicine 32 (23):3991–4005. doi:10.1002/sim.5847.
Gao, P., J. H. Ware, and C. Mehta. 2008. Sample size re-estimation for adaptive sequential designs. Journal of Biopharmaceutical Statistics 18 (6):1184–1196. doi:10.1080/10543400802369053.
Jennison, C., and B. Turnbull. 2003. Mid-course sample size modification in clinical trials based on the observed treatment effect. Statistics in Medicine 22 (6):971–993. doi:10.1002/sim.1457.
Jennison, C., and B. Turnbull. 2006. Adaptive and non-adaptive group sequential tests. Biometrika 93 (1):1–21. doi:10.1093/biomet/93.1.1.
Jung, S.-H., T. Lee, K. M. Kim, and S. L. George. 2004. Admissible two-stage designs for phase II cancer clinical trials. Statistics in Medicine 23 (4):561–569. doi:10.1002/sim.1600.
Kunzmann, K., and M. Kieser. 2016. Optimal adaptive two-stage designs for single-arm trial with binary endpoint. arXiv:1605.00249 [stat.AP].
Lan, K. K. G., and D. L. DeMets. 1983. Discrete sequential boundaries for clinical trials. Biometrika 70 (3):659–663. doi:10.2307/2336502.
Lin, Y., and W. J. Shih. 2004. Adaptive two-stage designs for single-arm phase IIA cancer clinical trials. Biometrics 60 (2):482–490. doi:10.1111/j.0006-341X.2004.00193.x.
Müller, H. H., and H. Schäfer. 2001. Adaptive group sequential designs for clinical trials: Combining the advantages of adaptive and of classical group sequential approaches. Biometrics 57 (3):886–891.
O’Brien, P. C., and T. R. Fleming. 1979. A multiple testing procedure for clinical trials. Biometrics 35 (3):549–556. doi:10.2307/2530245.
Pocock, S. J. 1977. Group sequential methods in the design and analysis of clinical trials. Biometrika 64 (2):191–199. doi:10.1093/biomet/64.2.191.
Proschan, M. A., and S. A. Hunsberger. 1995. Designed extension of studies based on conditional power. Biometrics 51 (4):1315–1324. doi:10.2307/2533262.
Simon, R. 1989. Optimal two-stage designs for phase II clinical trials. Controlled Clinical Trials 10 (1):1–10. doi:10.1016/0197-2456(89)90015-9.
U.S. Food and Drug Administration, Oncology Center of Excellence, Center for Drug Evaluation and Research (CDER), and Center for Biologics Evaluation and Research (CBER). 2018. Clinical trial endpoints for the approval of cancer drugs and biologics: Guidance for industry.
U.S. Food and Drug Administration (CDER and CBER). 2019. Adaptive design clinical trials for drugs and biologics: Guidance for industry.
Zuma-1 trial protocol. 2015a. https://www.nejm.org/doi/suppl/10.1056/NEJMoa1707447/suppl_file/nejmoa1707447_protocol.pdf.
Zuma-2 trial protocol. 2015b. https://www.nejm.org/doi/suppl/10.1056/NEJMoa1914347/suppl_file/nejmoa1914347_protocol.pdf.
