
Log-rank and stratified log-rank tests

Pages 309-317 | Received 03 Feb 2023, Accepted 20 Sep 2023, Published online: 05 Oct 2023

Abstract

In randomized clinical trials with right-censored time-to-event outcomes, the popular log-rank test without adjusting for baseline covariates is asymptotically valid for treatment effect under simple randomization of treatments but is too conservative under covariate-adaptive randomization. The stratified log-rank test, which adjusts baseline covariates in the test procedure by stratification, is asymptotically valid regardless of what treatment randomization is applied. In the literature, however, under simple randomization there is no affirmative conclusion about whether the stratified log-rank test is asymptotically more powerful than the unstratified log-rank test. In this article we show when the stratified and unstratified log-rank tests aim for the same null hypothesis and that, under simple randomization, the stratified log-rank test is asymptotically more powerful than the unstratified log-rank test in the region of alternative hypothesis that is specified by a Cox proportional hazards model. We also provide some discussion about why we do not have an affirmative conclusion in general.

1. Introduction

The log-rank test (Mantel, 1966) and the stratified log-rank test (Peto et al., 1976) are the two longstanding and most popular nonparametric tests for treatment effect in randomized clinical trials with two treatment arms and right-censored time-to-event outcomes. The stratified version of the log-rank test is motivated by efficiency gain: baseline prognostic factors (covariates), which are measured prior to treatment assignment and thus not affected by treatment, are adjusted for through stratification.

Adjusting for baseline covariates has been widely advocated to improve the efficiency of tests and other analyses, in the following two respects. (i) In the design stage, covariate-adaptive randomization can be used to enforce the balance of treatment assignments across baseline prognostic factors, which results in more efficient tests (EMA, 2015). More details about covariate-adaptive randomization are given in Section 2. (ii) In the analysis stage, ‘incorporating prognostic baseline factors in the primary statistical analysis of clinical trial data can result in a more efficient use of data to demonstrate and quantify the effects of treatment’ (FDA, 2021), ‘under approximately the same minimal statistical assumptions that would be needed for unadjusted’ (EMA, 2015; FDA, 2021; ICH E9, 1998).

If the log-rank test is regarded as the ‘unadjusted test’, then the stratified log-rank test qualifies as an adjusted test under the same minimal assumptions, because it is still a nonparametric test that does not rely on any model. Tests using the Cox proportional hazards model as a working model also qualify (DiRienzo & Lagakos, 2002; Kong & Slud, 1997; Lin & Wei, 1989), but the resulting tests can be less efficient than the unadjusted log-rank test when the working model is wrong (Kong & Slud, 1997). In this paper we focus on the stratified and unstratified log-rank tests.

Although the stratified log-rank test uses information from baseline prognostic factors and is thus expected to be more efficient, no affirmative conclusion is available about whether it is asymptotically more efficient than the unstratified log-rank test under simple randomization, in which patients are assigned to treatments completely at random. Another issue is that the stratified log-rank test actually tests a null hypothesis stronger than that of the log-rank test and, hence, a prerequisite for their comparison is to investigate when the two null hypotheses are the same.

The purpose of this paper is to establish some affirmative conclusions about the stratified and unstratified log-rank tests, in terms of the null hypothesis, asymptotic validity of the tests, and Pitman's asymptotic relative efficiency. The research is important because these two longstanding tests are widely used in applications without guidance on which one should be used.

Section 2 describes the data, design, and log-rank test statistics. Section 3 introduces hypotheses, assumptions and the concept of validity for log-rank tests. Some theoretical results for the stratified and unstratified log-rank tests are given in Section 4, where we show that, under simple randomization, the stratified log-rank test is asymptotically more powerful in the region of alternative hypotheses specified by a Cox proportional hazards model. Section 5 contains conclusions and the Appendix provides technical proofs.

2. Data, design and test statistics

For a patient from the population under investigation, let $T_j$ and $C_j$ be the potential lifetime and right-censoring time, respectively, under treatment $j=0$ or 1, and let $W$ be the vector of all baseline covariates and other time-varying covariates, observed or unobserved. Suppose that a random sample of $n$ patients is obtained from the population with independent $(T_{i0},C_{i0},T_{i1},C_{i1},W_i)$, $i=1,\ldots,n$, identically distributed as $(T_0,C_0,T_1,C_1,W)$. For each patient, only one of the two treatments is assigned and received.

Let $I_i$ be a binary treatment indicator for patient $i$ and $0<\pi<1$ be the pre-specified treatment assignment proportion for treatment 1. Consider the design, i.e. the generation of the $I_i$'s for $n$ sequentially arriving patients. Simple randomization assigns patients to treatments completely at random with $P(I_i=1)=\pi$ for all $i$, which may yield treatment proportions that deviate substantially from the target $\pi$ across levels of some baseline prognostic factors. Because of this, covariate-adaptive randomization using $Z$, a sub-vector of $W$ containing observed baseline prognostic factors with finitely many joint levels, is widely applied. When patient $i$ with baseline $Z_i=z$ arrives, a treatment is assigned using a mechanism that depends on all previously assigned treatments for patients in the same stratum $z$. For example, the most popular covariate-adaptive randomization scheme, the stratified permuted block design (Zelen, 1974), randomly assigns sequentially arriving patients with $Z_i=z$ in blocks of size $B$, each having $B\pi$ patients in treatment 1, where $B$ is chosen so that $B\pi$ is an integer and the last block is allowed to be incomplete. Another popular covariate-adaptive randomization scheme is Pocock-Simon's minimization (Pocock & Simon, 1975; Taves, 1974). Other schemes can be found in two reviews, Schulz and Grimes (2002) and Shao (2021). As evidence of its popularity, covariate-adaptive randomization was used in more than 500 clinical trials between 1989 and 2008 (Taves, 2010) and in 237 of nearly 300 trials published in two years, 2009 and 2014 (Ciolino et al., 2019). All commonly used covariate-adaptive randomization schemes satisfy the following mild condition (Antognini & Zagoraiou, 2015).

(D1) Given $\{Z_1,\ldots,Z_n\}$, $\{I_1,\ldots,I_n\}$ and $\{T_{11},C_{11},T_{10},C_{10},W_1,\ldots,T_{n1},C_{n1},T_{n0},C_{n0},W_n\}$ are conditionally independent; $E(I_i\mid Z_1,\ldots,Z_n)=\pi$ for all $i$; and for every level $z$ of $Z$, $n_{z1}/n_z\to\pi$ in probability as $n\to\infty$, where $n_z$ is the number of patients with $Z_i=z$ and $n_{z1}$ is the number of patients with $Z_i=z$ and $I_i=1$.

Most commonly used covariate-adaptive randomization schemes except Pocock-Simon's minimization also satisfy the next condition.

(D2) Conditional on $Z_1,\ldots,Z_n$, the vector whose $z$th component is $\sqrt{n}\,(n_{z1}/n_z-\pi)$, with $z$ ranging over all levels of $Z$, converges in distribution to $N(0,\Omega)$, where $\Omega$ is the diagonal matrix whose $z$th diagonal entry is $\nu/P(Z=z)$ and $\nu\le\pi(1-\pi)$ is a known constant depending on the randomization scheme.

Although simple randomization is not counted as covariate-adaptive randomization, it satisfies (D1) and (D2) with $\nu=\pi(1-\pi)$.
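
To make the two designs concrete, the following minimal Python sketch contrasts simple randomization with a stratified permuted block design. The function names, the four-level stratum variable and the seed are illustrative assumptions and are not taken from the paper; the block design follows the verbal description above, with an incomplete last block allowed.

```python
import random
from collections import defaultdict


def simple_randomization(n, pi, rng):
    """Assign each of n patients to treatment 1 independently with probability pi."""
    return [1 if rng.random() < pi else 0 for _ in range(n)]


def stratified_permuted_block(strata, block_size, pi, rng):
    """Assign sequentially arriving patients within each stratum in shuffled
    blocks of size block_size, each containing block_size * pi ones."""
    n_treated = round(block_size * pi)
    if abs(n_treated - block_size * pi) > 1e-9:
        raise ValueError("block_size * pi must be an integer")
    pending = defaultdict(list)  # unused slots of the current block, per stratum
    assignments = []
    for z in strata:  # patients arrive one at a time
        if not pending[z]:  # open a new block for this stratum
            block = [1] * n_treated + [0] * (block_size - n_treated)
            rng.shuffle(block)
            pending[z] = block
        assignments.append(pending[z].pop())
    return assignments


def max_imbalance(assignments, strata):
    """Largest |n_z1 - n_z0| over the strata."""
    diff = defaultdict(int)
    for a, z in zip(assignments, strata):
        diff[z] += 1 if a == 1 else -1
    return max(abs(d) for d in diff.values())


rng = random.Random(2023)
strata = [rng.randrange(4) for _ in range(400)]  # hypothetical Z with 4 levels
simple = simple_randomization(len(strata), 0.5, rng)
blocked = stratified_permuted_block(strata, block_size=4, pi=0.5, rng=rng)
print("simple:", max_imbalance(simple, strata))
print("blocked:", max_imbalance(blocked, strata))
```

With block size 4 and $\pi=0.5$, the within-stratum imbalance of the block design can never exceed 2, whereas simple randomization has no such bound.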

After $I_i$ is assigned, the observed outcome from patient $i$ is $\min(T_i,C_i)$ with $T_i=I_iT_{i1}+(1-I_i)T_{i0}$ and $C_i=I_iC_{i1}+(1-I_i)C_{i0}$, together with an indicator of $T_i\le C_i$.

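The log-rank test statistic is
$$L=\sqrt{n}\,\hat U_L/\hat\sigma_L,\qquad \hat U_L=\frac{1}{n}\sum_{i=1}^n\int_0^\tau\left\{I_i-\frac{\bar Y_1(t)}{\bar Y(t)}\right\}dN_i(t),\qquad \hat\sigma_L^2=\frac{1}{n}\sum_{i=1}^n\int_0^\tau\frac{\bar Y_1(t)\bar Y_0(t)}{\bar Y(t)^2}\,dN_i(t),\tag{1}$$
where $\bar Y(t)=\sum_{i=1}^nY_i(t)/n$, $Y_i(t)=I_iY_{i1}(t)+(1-I_i)Y_{i0}(t)$, $Y_{ij}(t)$ is the indicator of the event $\min(T_{ij},C_{ij})\ge t$, $\bar Y_1(t)=\sum_{i=1}^nI_iY_i(t)/n$, $\bar Y_0(t)=\sum_{i=1}^n(1-I_i)Y_i(t)/n$, $N_i(t)=I_iN_{i1}(t)+(1-I_i)N_{i0}(t)$, $N_{ij}(t)$ is the indicator of the event $T_{ij}\le\min(t,C_{ij})$, and the upper limit $\tau$ in the integral is a point satisfying $P(\min(T_{ij},C_{ij})\ge\tau)>0$ for $j=0,1$.

The stratified log-rank test statistic is a weighted average of the stratum-specific log-rank test statistics with strata constructed using $Z$,
$$SL=\sqrt{n}\,\hat U_{SL}/\hat\sigma_{SL},\qquad \hat U_{SL}=\frac{1}{n}\sum_z\sum_{i:Z_i=z}\int_0^\tau\left\{I_i-\frac{\bar Y_{z1}(t)}{\bar Y_z(t)}\right\}dN_i(t),\qquad \hat\sigma_{SL}^2=\frac{1}{n}\sum_z\sum_{i:Z_i=z}\int_0^\tau\frac{\bar Y_{z1}(t)\bar Y_{z0}(t)}{\bar Y_z(t)^2}\,dN_i(t),\tag{2}$$
where $\bar Y_{z1}(t)=\sum_{i:Z_i=z}I_iY_i(t)/n$, $\bar Y_{z0}(t)=\sum_{i:Z_i=z}(1-I_i)Y_i(t)/n$, and $\bar Y_z(t)=\bar Y_{z1}(t)+\bar Y_{z0}(t)$.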

It is clear that, in terms of test statistics, the stratified $SL$ in (2) utilizes $Z$ values whereas the unstratified $L$ in (1) is unadjusted. Under covariate-adaptive randomization, $L$ is not completely unadjusted since it uses $Z$-information through the assignments $I_i$, although it does not adjust for covariate-adaptive randomization in a correct way. On the other hand, the stratified $SL$ uses $Z$-information in both the design and analysis stages.

We consider stratification with all levels of $Z$. In applications, additional covariates may be used to form strata, and the conclusions in what follows remain the same. However, it is not a good idea to use fewer levels of $Z$ for stratification, because this may result in a test that is not asymptotically valid.
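
The statistics in (1) and (2) can be computed directly from the observed data. The sketch below is a plain-Python illustration, not the authors' code: the function names and the toy data are assumptions, and the variance uses the form $\bar Y_1\bar Y_0/\bar Y^2$ exactly as written in (1) and (2). Because the factors of $n$ cancel, each statistic reduces to a sum of score terms over observed events divided by the square root of the corresponding sum of variance terms.

```python
import math
from statistics import NormalDist


def logrank_terms(time, event, treat):
    """Sums over observed events of I_i - Y1/Y and of Y1*Y0/Y^2, as in (1)."""
    score, var = 0.0, 0.0
    for t_i, d_i, a_i in zip(time, event, treat):
        if not d_i:
            continue  # censored observations contribute no dN_i(t)
        at_risk = [a for t, a in zip(time, treat) if t >= t_i]
        y = len(at_risk)   # number at risk at t_i
        y1 = sum(at_risk)  # number at risk in treatment 1
        score += a_i - y1 / y
        var += y1 * (y - y1) / y ** 2
    return score, var


def logrank(time, event, treat):
    """Unstratified statistic L; raises ZeroDivisionError if the variance sum is 0."""
    score, var = logrank_terms(time, event, treat)
    return score / math.sqrt(var)


def stratified_logrank(time, event, treat, stratum):
    """Stratified statistic SL: pool the stratum-specific score and variance sums."""
    score, var = 0.0, 0.0
    for z in set(stratum):
        idx = [i for i, s in enumerate(stratum) if s == z]
        s_z, v_z = logrank_terms([time[i] for i in idx],
                                 [event[i] for i in idx],
                                 [treat[i] for i in idx])
        score += s_z
        var += v_z
    return score / math.sqrt(var)


# Toy data (hypothetical): four patients, all with observed events.
time, event, treat = [1, 2, 3, 4], [1, 1, 1, 1], [1, 0, 1, 0]
z_crit = NormalDist().inv_cdf(1 - 0.05 / 2)  # z_{alpha/2} for alpha = 0.05
print(logrank(time, event, treat), abs(logrank(time, event, treat)) > z_crit)
```

The quadratic-time at-risk computation keeps the sketch short; a production implementation would sort the data once instead.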

3. Null hypothesis, assumption and validity

Throughout, $\alpha\in(0,1)$ denotes a given significance level and $z_{\alpha/2}$ is the $(1-\alpha/2)$th quantile of the standard normal distribution. When $|L|>z_{\alpha/2}$, the log-rank test rejects the following null hypothesis $H_0$ of no treatment effect,
$$H_0:\ \lambda_1(t)=\lambda_0(t)\quad\text{for all } t,\tag{3}$$
where $\lambda_j(t)$ is the unconditional hazard function of $T_j$, $j=0,1$. $H_0$ in (3) is a commonly adopted null hypothesis of no treatment effect unconditional on covariates.

The log-rank test is nonparametric. Its validity requires non-informative censoring (DiRienzo & Lagakos, 2002; Kong & Slud, 1997), i.e.,

(C) $C_j$ is independent of $T_j$ given $j$.

Under simple randomization, it is well known (Kalbfleisch & Prentice, 2011) that the log-rank test is asymptotically valid in the sense that
$$\lim_{n\to\infty}P(|L|>z_{\alpha/2})\le\alpha\tag{4}$$
with equality holding for at least one population $P$ under $H_0$.

Unlike simple randomization, covariate-adaptive randomization generates a dependent sequence of treatment assignments, which may render conventional methods developed under simple randomization, such as the log-rank test, invalid under covariate-adaptive randomization (EMA, 2015; FDA, 2021). It is shown in Ye and Shao (2020) that, under covariate-adaptive randomization with $\nu$ in (D2) strictly smaller than $\pi(1-\pi)$, the log-rank test is asymptotically conservative in the sense that
$$\lim_{n\to\infty}P(|L|>z_{\alpha/2})\le\alpha_0<\alpha\tag{5}$$
for all $P$ under $H_0$.

The stratified log-rank statistic $SL$ in (2) actually tests the null hypothesis
$$\tilde H_0:\ \lambda_1(t\mid z)=\lambda_0(t\mid z)\quad\text{for all } t \text{ and } z,\tag{6}$$
where $\lambda_j(t\mid z)$ is the hazard function of $T_j$ conditional on $Z=z$, $j=0,1$. Note that $\tilde H_0$ in (6) holds if and only if the hazard functions are the same in every stratum $z$ and, thus, is stronger than $H_0$ in (3).

The validity of the stratified log-rank test requires the following assumption on censoring:

(CZ) $C_j$ is independent of $T_j$ given $j$ and $Z$.

Conditions (C) and (CZ) are not comparable, although both are implied by the condition that $C_j$ is independent of $(T_j,Z)$ given $j$, a reasonable condition for non-informative censoring.

Under simple randomization and covariate-adaptive randomization satisfying (D1) in Section 2, (4) holds with $L$ replaced by $SL$ and $H_0$ replaced by $\tilde H_0$ (Ye & Shao, 2020), provided that all levels of $Z$ are used in stratification.

Since $\tilde H_0$ is stronger than $H_0$, the stratified and unstratified log-rank tests are not directly comparable. Thus, a prerequisite for comparing the efficiency of the two log-rank tests is $\tilde H_0=H_0$. Is there a scenario under which $\tilde H_0=H_0$? Consider the following transformation model assumption.

(TR) There is an increasing function $h$ such that $h(P(T_0\ge t\mid V))=\theta+h(P(T_1\ge t\mid V))$ for all $(t,V)$ and a constant $\theta$, where $V$ is a vector of covariates, $Z\subset V\subset W$, and both $h$ and $\theta$ can be unknown.

Assumption (TR) is discussed in Cheng et al. (1995) and includes many commonly used semiparametric models as special cases, for example, the Cox proportional hazards model (see formula (7) in Section 4). It is a mild assumption since $h$ is unknown and we only need to know that it exists.

The proof of the following result is in the Appendix.

Theorem 3.1

Under (TR), $\tilde H_0$ in (6) is the same as $H_0$ in (3).

4. Comparison of two log-rank tests

When $\tilde H_0=H_0$, is the stratified log-rank test $SL$ more efficient than the unstratified log-rank test $L$ under simple randomization, when both tests are asymptotically valid? Intuitively this sounds correct since $L$ does not adjust for covariates.

Unfortunately, there is no result on this in the literature. In this section we try to fill this gap to some extent and explain why the two log-rank tests are not comparable in terms of efficiency. This is important because both stratified and unstratified log-rank tests are used a lot in applications.

To this end, we first state the following result (whose proof is given in the Appendix) on the asymptotic distributions of the stratified and unstratified log-rank tests under local alternatives. Define
$$O_{ij}=\int_0^\tau\{1-\mu(t)\}^j\{\mu(t)\}^{1-j}\{dN_{ij}(t)-Y_{ij}(t)p(t)\,dt\},\quad j=0,1,$$
$$O_{zij}=\int_0^\tau\{1-\mu_z(t)\}^j\{\mu_z(t)\}^{1-j}\{dN_{ij}(t)-Y_{ij}(t)p_z(t)\,dt\},\quad j=0,1,$$
where $\mu(t)=E(I_i\mid Y_i(t)=1)$, $\mu_z(t)=E(I_i\mid Y_i(t)=1,Z_i=z)$, $p(t)\,dt=E\{dN_i(t)\}/E\{Y_i(t)\}$, and $p_z(t)\,dt=E\{dN_i(t)\mid Z_i=z\}/E\{Y_i(t)\mid Z_i=z\}$. Also, we use $O_j$ to denote $O_{ij}$ for any $i$ and $O_{zj}$ to denote $O_{zij}$ for any $i$ and $z$. Note that, under the null hypothesis $H_0$, $E(O_j)=0$ for $j=0,1$, and under the null hypothesis $\tilde H_0$, $E(O_{zj}\mid Z=z)=0$ for all $z$ and $j=0,1$.

Theorem 4.1

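  (a) Assume (CZ) and (D1). Under the local alternative hypothesis that $E(O_{zj}\mid Z=z)=c_{zj}n^{-1/2}$ with $c_{zj}$'s not depending on $n$ and that $\lambda_1(t\mid z)/\lambda_0(t\mid z)$ is bounded and $\to1$ for every $t$ and $z$, $SL\xrightarrow{d}N(\delta_{SL}/\sigma_{SL},\,1)$, where $\xrightarrow{d}$ denotes convergence in distribution as $n\to\infty$, $\delta_{SL}=\sum_zP(Z=z)\{\pi c_{z1}-(1-\pi)c_{z0}\}$, $\sigma_{SL}^2=\sum_zP(Z=z)\{\pi\,\mathrm{var}_{\tilde H_0}(O_{z1}\mid Z=z)+(1-\pi)\,\mathrm{var}_{\tilde H_0}(O_{z0}\mid Z=z)\}$, and $\mathrm{var}_{\tilde H_0}$ denotes variance under $\tilde H_0$.

  (b) Assume (C), (D1), and (D2). Under the local alternative hypothesis that $E(O_j)=c_jn^{-1/2}$ with $c_j$'s not depending on $n$ and that $\lambda_1(t)/\lambda_0(t)$ is bounded and $\to1$ for every $t$, $L\xrightarrow{d}N(\delta_L/\sigma_L,\,\sigma_L^2(\nu)/\sigma_L^2)$, where $\delta_L=\pi c_1-(1-\pi)c_0$, $\sigma_L^2=\pi\,\mathrm{var}_{H_0}(O_1)+(1-\pi)\,\mathrm{var}_{H_0}(O_0)$, $\sigma_L^2(\nu)=\sigma_L^2-\{\pi(1-\pi)-\nu\}\,\mathrm{var}_{H_0}\{E_{H_0}(O_1\mid Z)+E_{H_0}(O_0\mid Z)\}$ for $\nu$ given in (D2), and $E_{H_0}$ and $\mathrm{var}_{H_0}$ denote expectation and variance under $H_0$, respectively.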

Because the local alternative hypotheses specified in (a) and (b) of Theorem 4.1 do not follow any model, $\delta_{SL}^2$ and $\delta_L^2$ can differ arbitrarily and, thus, $SL$ and $L$ may not be comparable in terms of asymptotic efficiency. In other words, without any model the space of alternative hypotheses is too large for comparing the efficiency of $SL$ and $L$. A semiparametric model that narrows down the space of alternative hypotheses may yield affirmative results on the efficiency comparison. We derive a result under the Cox proportional hazards model to highlight this.

Suppose that the true hazard follows a Cox proportional hazards model,
$$\lambda_j(t\mid V)=\lambda(t)\exp(\theta j+\eta^\top V),\quad j=0,1,\tag{7}$$
where $Z\subset V\subset W$, $\lambda_j(t\mid V)$ is the hazard conditional on covariate $V$, $\theta$ is an unknown parameter, $\eta$ is an unknown parameter vector, and $\lambda(t)$ is an unspecified function. Under model (7), (TR) holds with $h(s)=\log(-\log(s))$ and $\tilde H_0=H_0:\theta=0$.
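
As a brief check of the last statement, the following worked step uses only model (7) and the cumulative baseline hazard $\Lambda(t)=\int_0^t\lambda(s)\,ds$ (the notation of Appendix A.3); it is a sketch of the standard calculation rather than part of the formal proofs.

```latex
S_j(t \mid V) = P(T_j \ge t \mid V)
             = \exp\{-\Lambda(t)\exp(\theta j + \eta^\top V)\}, \qquad j = 0, 1,
\\[4pt]
\log\{-\log S_1(t \mid V)\}
   = \theta + \eta^\top V + \log \Lambda(t)
   = \theta + \log\{-\log S_0(t \mid V)\}.
```

The two conditional survival functions therefore differ by a constant shift on the $\log(-\log)$ scale, which is the form required by (TR); with the orientation used in the statement of (TR), the constant there equals $-\theta$ in the parametrization of (7). The shift vanishes exactly when $\theta=0$.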

Corollary 4.1

Assume that model (7) holds, $C_j$ is independent of $(T_j,Z)$ given $j$, and $P(C_1\ge t\mid V)=P(C_0\ge t\mid V)$ for all $t$. Then, under simple randomization, the stratified log-rank test $SL$ is always more efficient than the unstratified log-rank test $L$ in terms of Pitman's asymptotic relative efficiency.

The proof is given in the Appendix. A key to the proof is that the local alternative hypotheses in (a) and (b) of Theorem 4.1 can be unified into $\theta=c/\sqrt{n}$ with the help of model (7).

As both log-rank tests are nonparametric and do not need model (7), what does Corollary 4.1 tell us? It says that, under simple randomization, the stratified log-rank test $SL$ is more efficient in the region of alternative hypotheses specified by model (7), although we cannot claim that $SL$ is more efficient in the entire alternative hypothesis space.
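
To illustrate what the efficiency statement means numerically, recall from the proof in Appendix A.3 that, under $\theta=c/\sqrt{n}$, the two statistics are asymptotically $N(c\theta_L/\sigma_L,1)$ and $N(c\theta_{SL}/\sigma_L,1)$. The Python sketch below evaluates the resulting asymptotic power of a two-sided level-$\alpha$ test and the Pitman efficiency $\theta_{SL}^2/\theta_L^2$. The numerical values of $\theta_L$, $\theta_{SL}$, $\sigma_L$ and $c$ are hypothetical placeholders chosen only so that $0\le\theta_L\le\theta_{SL}$; they are not results from the paper.

```python
from statistics import NormalDist


def local_power(mean_shift, alpha=0.05):
    """Asymptotic power of the two-sided test |T| > z_{alpha/2} when T ~ N(mean_shift, 1)."""
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2)
    return nd.cdf(mean_shift - z) + nd.cdf(-mean_shift - z)


def pitman_are(theta_sl, theta_l):
    """Pitman's asymptotic relative efficiency of SL with respect to L."""
    return (theta_sl / theta_l) ** 2


# Hypothetical values, for illustration only.
theta_l, theta_sl, sigma_l, c = 0.8, 1.0, 1.0, 2.0
print("power of L :", local_power(c * theta_l / sigma_l))
print("power of SL:", local_power(c * theta_sl / sigma_l))
print("ARE        :", pitman_are(theta_sl, theta_l))
```

An efficiency above 1 can be read as the factor by which the sample size of the unstratified test would have to grow to match the local power of the stratified test.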

We now turn to covariate-adaptive randomization, under which the unstratified log-rank test $L$ is not valid but conservative, as discussed in Section 3. On the other hand, by Theorem 4.1(a), the stratified log-rank test $SL$ is valid for testing $\tilde H_0$ regardless of which covariate-adaptive randomization is applied. Therefore, the stratified log-rank test is a clear winner when covariate-adaptive randomization is applied.

Another way to adjust for covariates used in randomization is the modified (unstratified) log-rank test $RL=\hat\sigma_LL/\hat\sigma_L(\nu)$ proposed by Ye and Shao (2020), where $\hat\sigma_L(\nu)$ is a consistent estimator of $\sigma_L(\nu)$ (see §3.2 of Ye and Shao, 2020). $RL$ removes the conservativeness of $L$ and is valid for testing $H_0$ in (3) under covariate-adaptive randomization.
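
The size of the conservativeness can be read off Theorem 4.1(b): under $H_0$ (all $c_j=0$), $L$ is asymptotically $N(0,r)$ with $r=\sigma_L^2(\nu)/\sigma_L^2\le1$, so the limiting rejection probability is $2\{1-\Phi(z_{\alpha/2}/\sqrt{r})\}$. The Python sketch below evaluates this expression and applies the rescaling that defines $RL$. The function names and the example variance ratios are illustrative assumptions; estimating $\hat\sigma_L(\nu)$ itself is described in Ye and Shao (2020) and is not reproduced here.

```python
import math
from statistics import NormalDist


def asymptotic_size_unstratified(variance_ratio, alpha=0.05):
    """Limiting P(|L| > z_{alpha/2}) under H0 when L is N(0, variance_ratio),
    where variance_ratio = sigma_L^2(nu) / sigma_L^2 lies in (0, 1]."""
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2)
    return 2 * (1 - nd.cdf(z / math.sqrt(variance_ratio)))


def modified_statistic(l_stat, sigma_hat_l, sigma_hat_l_nu):
    """RL = sigma_hat_L * L / sigma_hat_L(nu); the two estimates are taken as given."""
    return sigma_hat_l * l_stat / sigma_hat_l_nu


for r in (1.0, 0.8, 0.5):  # hypothetical variance ratios
    print(r, round(asymptotic_size_unstratified(r), 4))
```

A ratio of 1 (simple randomization) recovers the nominal level, while smaller ratios give sizes below $\alpha$; dividing $L$ by the estimated $\sqrt{r}$ is what restores the nominal level for $RL$.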

Even if model (7) holds, $SL$ and $RL$ are not comparable in terms of asymptotic efficiency. We provide two simulation examples here to demonstrate that, compared with $RL$, $SL$ is more efficient in one scenario but less efficient in another. The simulation setting is model (7) with $\lambda(t)=12^{-1}\log2$ for all $t$ and $\eta^\top V=1.5Z_1+0.5Z_2^2$, where $Z_1$ is binary with $P(Z_1=1)=0.5$, $Z_2\sim N(0,1)$, and $Z_1$ and $Z_2$ are independent. $Z_1$ and discretized $Z_2$ with 4 equal-probability categories are used for stratified permuted block randomization with block size 4. In scenario 1, censoring is independent of treatment and $(Z_1,Z_2)$ and is uniformly distributed on $(10,40)$. In scenario 2, censoring is independent of treatment and $Z_2$ but depends on $Z_1$: conditional on $Z_1$, the censoring time is distributed as 10 plus an exponential random variable with mean $2Z_1$. The power curves over $\theta$ with $\alpha=0.05$ and $n=500$, based on 2000 simulations, are given in Figure 1. Note that $SL$ is more powerful than $RL$ under scenario 1 but less powerful under scenario 2. Both $SL$ and $RL$ are more powerful than the conservative $L$ in either case.

Figure 1. Power curves based on n = 500 and 2000 simulations.
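
For readers who wish to experiment with the simulation setting, the following Python sketch generates one trial under scenario 1. It is an illustration under stated assumptions, not the authors' simulation code: the linear predictor $1.5Z_1+0.5Z_2^2$ and the baseline hazard $\log(2)/12$ are taken from the description above as reconstructed from the text, $\pi=0.5$ is assumed because the allocation proportion is not stated, and the function name and seed are hypothetical.

```python
import math
import random
from statistics import NormalDist


def simulate_trial(n, theta, rng):
    """Return a list of (observed time, event indicator, treatment, stratum)."""
    base_hazard = math.log(2) / 12  # constant baseline hazard
    cuts = [NormalDist().inv_cdf(q) for q in (0.25, 0.5, 0.75)]  # quartiles of N(0,1)
    pending = {}  # unused slots of the current block, per stratum
    rows = []
    for _ in range(n):
        z1 = 1 if rng.random() < 0.5 else 0
        z2 = rng.gauss(0.0, 1.0)
        stratum = (z1, sum(z2 > c for c in cuts))  # Z1 crossed with discretized Z2
        if not pending.get(stratum):  # start a new permuted block of size 4
            block = [0, 0, 1, 1]      # assumes pi = 0.5
            rng.shuffle(block)
            pending[stratum] = block
        treat = pending[stratum].pop()
        # Exponential survival time with hazard lambda * exp(theta*I + eta'V).
        rate = base_hazard * math.exp(theta * treat + 1.5 * z1 + 0.5 * z2 ** 2)
        t = rng.expovariate(rate)
        c = rng.uniform(10, 40)       # scenario 1 censoring
        rows.append((min(t, c), int(t <= c), treat, stratum))
    return rows


trial = simulate_trial(n=500, theta=0.3, rng=random.Random(0))
print(sum(r[1] for r in trial), "observed events out of", len(trial))
```

Scenario 2 is omitted because the exact form of its censoring distribution cannot be recovered unambiguously from the text.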

The reason why the stratified SL and the modified unstratified RL are not comparable in asymptotic efficiency is that the two tests adopt different approaches in utilizing baseline covariates: the former adjusts baseline covariates by stratification, whereas the latter utilizes baseline covariates by modifying the unstratified L whose performance is affected by covariate-adaptive randomization.

5. Conclusion and discussion

  1. Under some semiparametric models for survival time such as the transformation model (TR) described in Section 3, the null hypotheses of stratified and unstratified log-rank tests are the same.

  2. Under simple randomization of treatment assignments, the stratified log-rank test is asymptotically more efficient than the unstratified log-rank test in terms of Pitman's relative efficiency in the region of alternative hypotheses specified by the Cox proportional hazards model given by (7). It is of interest to derive more affirmative results using assumptions/models other than the Cox model to narrow down the space of alternative hypotheses.

  3. Under covariate-adaptive randomization of treatment assignments, the unstratified log-rank test is not asymptotically valid but conservative, whereas the stratified log-rank test is asymptotically valid as long as the covariates used in randomization are all included in stratification. Thus, the stratified log-rank test is a clear winner. A modified unstratified log-rank test removes conservativeness and is valid, but its relative efficiency compared with the stratified log-rank test has no definite conclusion, because the two tests apply different approaches in utilizing covariates.

  4. Because the region specified by the Cox model is quite large and the stratified log-rank test is a clear winner under covariate-adaptive randomization, we recommend the stratified log-rank test over the unstratified log-rank test.

Acknowledgements

We would like to thank two anonymous referees and an associate editor for helpful comments and suggestions.

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

  • Antognini, A. B., & Zagoraiou, M. (2015). On the almost sure convergence of adaptive allocation procedures. Bernoulli, 21(2), 881–908.
  • Cheng, S., Wei, L., & Ying, Z. (1995). Analysis of transformation models with censored data. Biometrika, 82(4), 835–845. https://doi.org/10.1093/biomet/82.4.835
  • Ciolino, J. D., Palac, H. L., Yang, A., Vaca, M., & Belli, H. M. (2019). Ideal vs. real: A systematic review on handling covariates in randomized controlled trials. BMC Medical Research Methodology, 19(1), 136. https://doi.org/10.1186/s12874-019-0787-8
  • DiRienzo, A. G., & Lagakos, S. W. (2002). Effects of model misspecification on tests of no randomized treatment effect arising from Cox's proportional hazards model. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 63(4), 745–757. https://doi.org/10.1111/1467-9868.00310
  • EMA (2015). Guideline on adjustment for baseline covariates in clinical trials. Committee for Medicinal Products for Human Use, European Medicines Agency (EMA).
  • FDA (2021). Adjusting for covariates in randomized clinical trials for drugs and biological products. Draft Guidance for Industry. Center for Drug Evaluation and Research and Center for Biologics Evaluation and Research, Food and Drug Administration (FDA), U.S. Department of Health and Human Services. May 2021.
  • ICH E9 (1998). Statistical principles for clinical trials E9. International Council for Harmonisation (ICH).
  • Kalbfleisch, J. D., & Prentice, R. L. (2011). The statistical analysis of failure time data. Wiley.
  • Kong, F. H., & Slud, E. (1997). Robust covariate-adjusted logrank tests. Biometrika, 84(4), 847–862. https://doi.org/10.1093/biomet/84.4.847
  • Lin, D. Y., & Wei, L. J. (1989). The robust inference for the Cox proportional hazards model. Journal of the American Statistical Association, 84(408), 1074–1078. https://doi.org/10.1080/01621459.1989.10478874
  • Mantel, N. (1966). Evaluation of survival data and two new rank order statistics arising in its consideration. Cancer Chemotherapy Reports, 50(3), 163–170.
  • Peto, R., Pike, M. C., Armitage, P., Breslow, N. E., Cox, D. R., Howard, S. V., Mantel, N., McPherson, K., Peto, J., & Smith, P. G. (1976). Design and analysis of randomized clinical trials requiring prolonged observation of each patient. I. Introduction and design. British Journal of Cancer, 34(6), 585–612. https://doi.org/10.1038/bjc.1976.220
  • Pocock, S. J., & Simon, R. (1975). Sequential treatment assignment with balancing for prognostic factors in the controlled clinical trial. Biometrics, 31(1), 103–115. https://doi.org/10.2307/2529712
  • Schulz, K. F., & Grimes, D. A. (2002). Generation of allocation sequences in randomised trials: Chance, not choice. The Lancet, 359(9305), 515–519. https://doi.org/10.1016/S0140-6736(02)07683-3
  • Shao, J. (2021). Inference for covariate-adaptive randomization: Aspects of methodology and theory (with discussions). Statistical Theory and Related Fields, 5(3), 172–186. https://doi.org/10.1080/24754269.2021.1871873
  • Taves, D. R. (1974). Minimization: A new method of assigning patients to treatment and control groups. Clinical Pharmacology and Therapeutics, 15(5), 443–453. https://doi.org/10.1002/cpt.1974.15.issue-5
  • Taves, D. R. (2010). The use of minimization in clinical trials. Contemporary Clinical Trials, 31(2), 180–184. https://doi.org/10.1016/j.cct.2009.12.005
  • Ye, T., & Shao, J. (2020). Robust tests for treatment effect in survival analysis under covariate-adaptive randomization. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 82(5), 1301–1323. https://doi.org/10.1111/rssb.12392
  • Ye, T., Shao, J., Yi, Y., & Zhao, Q. (2022). Toward better practice of covariate adjustment in analyzing randomized clinical trials. Journal of the American Statistical Association.
  • Zelen, M. (1974). The randomization and stratification of patients to clinical trials. Journal of Chronic Diseases, 27(7-8), 365–375. https://doi.org/10.1016/0021-9681(74)90015-0

Appendix

A.1. Proof of Theorem 3.1

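It is clear that $\tilde H_0$ in (6) implies $H_0$ in (3). Thus, it suffices to show that, under (TR), $\lambda_1(t)=\lambda_0(t)$ for all $t$ implies $\lambda_1(t\mid Z)=\lambda_0(t\mid Z)$ for all $(t,Z)$. Define $S_j(t\mid V)=P(T_j\ge t\mid V)$ and $S_j(t)=P(T_j\ge t)$. If $\lambda_1(t)=\lambda_0(t)$ for all $t$, then $S_0(t)=S_1(t)$ for all $t$. Condition (TR) implies that
$$S_0(t\mid V)=h^{-1}(\theta+h(S_1(t\mid V)))\quad\text{for all }(t,V).$$
Then
$$S_1(t)=S_0(t)=E\{S_0(t\mid V)\}=E\{h^{-1}(\theta+h(S_1(t\mid V)))\}\quad\text{for all }t,$$
i.e.,
$$E\{h^{-1}(\theta+h(S_1(t\mid V)))-S_1(t\mid V)\}=0\quad\text{for all }t.$$
Since $h^{-1}(\theta+h(S_1(t\mid V)))\ge S_1(t\mid V)$ or $\le S_1(t\mid V)$ depending on whether $\theta\ge0$ or $\le0$,
$$h^{-1}(\theta+h(S_1(t\mid V)))=S_1(t\mid V)\quad\text{for all }(t,V).$$
This implies that $\theta=0$ and, thus, $S_0(t\mid V)=S_1(t\mid V)$ for all $(t,V)$, which together with $Z\subset V$ implies that $S_0(t\mid Z)=S_1(t\mid Z)$ and hence $\lambda_0(t\mid Z)=\lambda_1(t\mid Z)$ for all $(t,Z)$.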

A.2. Proof of Theorem 4.1

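We prove (a) only, since the proof of (b) is similar. We first show that, under the null hypothesis $\tilde H_0$ or the alternative hypothesis,
$$\sqrt{n}\left(\hat U_{SL}-\sum_z\frac{n_{z1}\theta_{z1}-n_{z0}\theta_{z0}}{n}\right)\xrightarrow{d}N(0,\tilde\sigma_{SL}^2),\tag{A1}$$
where $n_{zj}$ is the number of patients with treatment $j$ in stratum $z$, $\theta_{zj}=E(O_{zj}\mid Z=z)$, $j=0,1$, and
$$\tilde\sigma_{SL}^2=\sum_zP(Z=z)\{\pi\,\mathrm{var}(O_{z1}\mid Z=z)+(1-\pi)\,\mathrm{var}(O_{z0}\mid Z=z)\}.$$
Following the argument in the Appendix of Lin and Wei (1989), we obtain that, under either the null or the alternative hypothesis, the left-hand side of (A1) is equal to
$$\frac{1}{\sqrt{n}}\sum_z\sum_{i:Z_i=z}\{I_i(O_{zi1}-\theta_{z1})-(1-I_i)(O_{zi0}-\theta_{z0})\}+o_p(1),\tag{A2}$$
where $o_p(1)$ denotes a quantity converging to 0 in probability as $n\to\infty$. Define $\mathcal{I}=\{I_1,\ldots,I_n\}$ and $\mathcal{Z}=\{Z_1,\ldots,Z_n\}$. Similar to the proof of Theorem 2 in Ye et al. (2022), Lindeberg's Central Limit Theorem justifies that, conditional on $\mathcal{I}$ and $\mathcal{Z}$, the random vector
$$\left(\frac{1}{\sqrt{n}}\sum_z\sum_{i:Z_i=z}I_i(O_{zi1}-\theta_{z1}),\ \frac{1}{\sqrt{n}}\sum_z\sum_{i:Z_i=z}(1-I_i)(O_{zi0}-\theta_{z0})\right)$$
converges in distribution to a 2-dimensional normal distribution with mean 0. Let $M$ be the quantity in (A2) excluding $o_p(1)$, which is the difference of the two components of the previous random vector. Consequently,
$$\{\mathrm{var}(M\mid\mathcal{I},\mathcal{Z})\}^{-1/2}M\mid\mathcal{I},\mathcal{Z}\xrightarrow{d}N(0,1).$$
Under (D1),
$$\begin{aligned}
\mathrm{var}(M\mid\mathcal{I},\mathcal{Z})
&=\frac{1}{n}\sum_z\Big\{\sum_{i:Z_i=z,I_i=1}\mathrm{var}(O_{zi1}\mid\mathcal{Z})+\sum_{i:Z_i=z,I_i=0}\mathrm{var}(O_{zi0}\mid\mathcal{Z})\Big\}\\
&=\frac{1}{n}\sum_z\Big\{\sum_{i:Z_i=z,I_i=1}\mathrm{var}(O_{zi1}\mid Z_i=z)+\sum_{i:Z_i=z,I_i=0}\mathrm{var}(O_{zi0}\mid Z_i=z)\Big\}\\
&=\frac{1}{n}\sum_z\{n_{z1}\,\mathrm{var}(O_{z1}\mid Z=z)+n_{z0}\,\mathrm{var}(O_{z0}\mid Z=z)\}\\
&=\sum_z\frac{n_z}{n}\Big\{\frac{n_{z1}}{n_z}\mathrm{var}(O_{z1}\mid Z=z)+\frac{n_{z0}}{n_z}\mathrm{var}(O_{z0}\mid Z=z)\Big\}\\
&=\sum_zP(Z=z)\{\pi\,\mathrm{var}(O_{z1}\mid Z=z)+(1-\pi)\,\mathrm{var}(O_{z0}\mid Z=z)\}+o_p(1)\\
&=\tilde\sigma_{SL}^2+o_p(1).
\end{aligned}$$
Then $\{\mathrm{var}(M\mid\mathcal{I},\mathcal{Z})\}^{-1/2}M\xrightarrow{d}N(0,1)$ unconditionally. Thus, by Slutsky's theorem, (A1) holds.

Next, under the local alternative specified in part (a), $\tilde\sigma_{SL}^2\to\sigma_{SL}^2$ and
$$\sqrt{n}\left(\sum_z\frac{n_{z1}\theta_{z1}-n_{z0}\theta_{z0}}{n}\right)=\sum_zP(Z=z)\{\pi c_{z1}-(1-\pi)c_{z0}\}+o_p(1)=\delta_{SL}+o_p(1).$$
Hence, by (A1) and Slutsky's theorem,
$$\sqrt{n}\,\hat U_{SL}\xrightarrow{d}N(\delta_{SL},\sigma_{SL}^2).$$
It remains to show that $\hat\sigma_{SL}^2-\sigma_{SL}^2=o_p(1)$ under the specified local alternative. By Lemma 3 of Ye and Shao (2020), within any stratum $z$, $\bar Y_{z1}(t)\bar Y_{z0}(t)/\bar Y_z(t)^2=\mu_z(t)\{1-\mu_z(t)\}+o_p(1)$. By the identity
$$E\{dN_i(t)\}=\pi E\{Y_{i1}(t)\lambda_1(t)\}\,dt+(1-\pi)E\{Y_{i0}(t)\lambda_0(t)\}\,dt$$
from Kalbfleisch and Prentice (2011) and the form of $\hat\sigma_{SL}^2$, we obtain that, under the specified local alternative,
$$\begin{aligned}
\hat\sigma_{SL}^2
&=\sum_z\frac{n_z}{n}\int_0^\tau\mu_z(t)\{1-\mu_z(t)\}E\{dN_i(t)\}+o_p(1)\\
&=\sum_z\frac{n_z}{n}\int_0^\tau\mu_z(t)\{1-\mu_z(t)\}\big[\pi E\{Y_{i1}(t)\lambda_1(t)\}+(1-\pi)E\{Y_{i0}(t)\lambda_0(t)\}\big]dt+o_p(1)\\
&=\sum_zP(Z=z)\{\pi\,\mathrm{var}_{\tilde H_0}(O_{z1}\mid Z=z)+(1-\pi)\,\mathrm{var}_{\tilde H_0}(O_{z0}\mid Z=z)\}+o_p(1)\\
&=\sigma_{SL}^2+o_p(1).
\end{aligned}$$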

A.3. Proof of Corollary 4.1

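A direct calculation shows that
$$\sigma_L^2=\sigma_{SL}^2=\int_0^\tau E_{H_0}\{Y_i(t)\exp(\eta^\top V_i)\}\,v(t)\,d\Lambda(t),$$
where $\sigma_L^2$ and $\sigma_{SL}^2$ are given in Theorem 4.1, $\Lambda(t)=\int_0^t\lambda(s)\,ds$, $v(t)=\mathrm{var}(I_i\mid Y_i(t)=1)$, and $E_{H_0}$ denotes expectation under $H_0:\theta=0$.

Under the local alternative hypothesis $\theta=c/\sqrt{n}$ with a fixed constant $c\ne0$, by Theorem 4.1, $L\xrightarrow{d}N(c\theta_L/\sigma_L,1)$, where
$$\theta_L=\sigma_L^2-\int_0^\tau\left(E_{H_0}\{Y_i(t)\exp(2\eta^\top V_i)\}-\frac{[E_{H_0}\{Y_i(t)\exp(\eta^\top V_i)\}]^2}{E_{H_0}\{Y_i(t)\}}\right)v(t)\Lambda(t)\,d\Lambda(t),$$
and $SL\xrightarrow{d}N(c\theta_{SL}/\sigma_L,1)$, where
$$\theta_{SL}=\sigma_L^2-\int_0^\tau\left(E_{H_0}\{Y_i(t)\exp(2\eta^\top V_i)\}-E_{H_0}\!\left[\frac{[E_{H_0}\{Y_i(t)\exp(\eta^\top V_i)\mid Z_i\}]^2}{E_{H_0}\{Y_i(t)\mid Z_i\}}\right]\right)v(t)\Lambda(t)\,d\Lambda(t).$$
Pitman's asymptotic relative efficiency of $SL$ with respect to $L$ is $\theta_{SL}^2/\theta_L^2$.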

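Applying Jensen's inequality $\varphi\{E(M)\}\le E\{\varphi(M)\}$ with the convex function $\varphi(t_1,t_2)=t_1^2/t_2$ and $M=(E_{H_0}\{Y_i(t)\exp(\eta^\top V_i)\mid Z_i\},\ E_{H_0}\{Y_i(t)\mid Z_i\})$, we obtain that $\theta_L\le\theta_{SL}$. To reach the conclusion $\theta_{SL}^2/\theta_L^2\ge1$, it remains to show that $\theta_L\ge0$.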
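
The Jensen step can be checked numerically for a discrete $Z$. In the Python sketch below, the stratum probabilities and the pairs $(a_z,b_z)$, standing for $E_{H_0}\{Y_i(t)\exp(\eta^\top V_i)\mid Z_i=z\}$ and $E_{H_0}\{Y_i(t)\mid Z_i=z\}$ at a fixed $t$, are made-up numbers used purely for illustration.

```python
def phi(t1, t2):
    """The convex function phi(t1, t2) = t1^2 / t2 on t2 > 0."""
    return t1 ** 2 / t2


def jensen_sides(probs, a, b):
    """Return (phi(E M), E phi(M)) for M = (a_z, b_z) with P(Z = z) = probs[z]."""
    mean_a = sum(p * x for p, x in zip(probs, a))
    mean_b = sum(p * y for p, y in zip(probs, b))
    lhs = phi(mean_a, mean_b)                                    # unstratified term
    rhs = sum(p * phi(x, y) for p, x, y in zip(probs, a, b))     # stratified term
    return lhs, rhs


# Hypothetical three-stratum example.
lhs, rhs = jensen_sides([0.5, 0.3, 0.2], [0.9, 1.4, 2.0], [0.8, 0.7, 0.5])
print(lhs, rhs, lhs <= rhs)
```

The left value corresponds to the term entering $\theta_L$ and the right value to the term entering $\theta_{SL}$; the two coincide when the ratio $a_z/b_z$ does not vary with $z$.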

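The condition $P(C_1\ge t\mid V)=P(C_0\ge t\mid V)$ for all $t$ implies that $v(t)=\pi(1-\pi)$ and, hence,
$$\begin{aligned}
\theta_L={}&\pi(1-\pi)\int_0^\tau E_{H_0}\{Y_i(t)\exp(\eta^\top V_i)\}\,d\Lambda(t)-\pi(1-\pi)\int_0^\tau E_{H_0}\{Y_i(t)\exp(2\eta^\top V_i)\}\Lambda(t)\,d\Lambda(t)\\
&+\pi(1-\pi)\int_0^\tau\frac{[E_{H_0}\{Y_i(t)\exp(\eta^\top V_i)\}]^2}{E_{H_0}\{Y_i(t)\}}\Lambda(t)\,d\Lambda(t).
\end{aligned}$$
Since the last term is non-negative, it suffices to show
$$\int_0^\tau\big[E_{H_0}\{Y_i(t)\exp(\eta^\top V_i)\}-E_{H_0}\{Y_i(t)\exp(2\eta^\top V_i)\}\Lambda(t)\big]\,d\Lambda(t)\ge0.\tag{A3}$$
Note that
$$\begin{aligned}
E\{N_{i1}(\tau)\}&=\int_0^\tau E\{dN_{i1}(t)\}=\int_0^\tau E\{Y_{i1}(t)\exp(\theta+\eta^\top V_i)\}\,d\Lambda(t)\\
&=\int_0^\tau E_V\big[\exp\{-\Lambda(t)\exp(\theta+\eta^\top V_i)\}P(C_1\ge t\mid V_i)\exp(\theta+\eta^\top V_i)\big]\,d\Lambda(t),
\end{aligned}$$
where $E_V$ is the expectation with respect to the covariate $V_i$ and does not depend on $\theta$. Taking the derivative with respect to $\theta$, we obtain that
$$\begin{aligned}
\frac{\partial E\{N_{i1}(\tau)\}}{\partial\theta}=\int_0^\tau E_V\big[&\exp\{-\Lambda(t)\exp(\theta+\eta^\top V_i)\}P(C_1\ge t\mid V_i)\exp(\theta+\eta^\top V_i)\\
&-\exp\{-\Lambda(t)\exp(\theta+\eta^\top V_i)\}P(C_1\ge t\mid V_i)\exp(2\theta+2\eta^\top V_i)\Lambda(t)\big]\,d\Lambda(t).
\end{aligned}$$
Then,
$$\begin{aligned}
\frac{\partial E\{N_{i1}(\tau)\}}{\partial\theta}\bigg|_{\theta=0}
&=\int_0^\tau E_V\big[\exp\{-\Lambda(t)\exp(\eta^\top V_i)\}P(C_1\ge t\mid V_i)\exp(\eta^\top V_i)\\
&\qquad\qquad-\exp\{-\Lambda(t)\exp(\eta^\top V_i)\}P(C_1\ge t\mid V_i)\exp(2\eta^\top V_i)\Lambda(t)\big]\,d\Lambda(t)\\
&=\int_0^\tau\big[E_{H_0}\{Y_{i1}(t)\exp(\eta^\top V_i)\}-E_{H_0}\{Y_{i1}(t)\exp(2\eta^\top V_i)\Lambda(t)\}\big]\,d\Lambda(t),
\end{aligned}$$
which is the same as the left-hand side of (A3). As $E\{N_{i1}(\tau)\}$ is the probability of having an observed failure before time $\tau$, it is a non-decreasing function of $\theta$. This implies that (A3) holds.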