Abstract
In a Data-Generating Experiment (DGE), the data, Y, is often obtained from a Black-Box and is approximated with a learning machine/sampler, f(θ, X); X is random, f is known. When X has unknown cdf, nonidentifiability of θ cannot be confirmed and may limit the predictive accuracy of the learned model, f(θ̂, X), with θ̂ an estimate of θ.
Using properties of the Expected p-value for the Kolmogorov-Smirnov test, the Empirical Discrimination Index (EDI) and the Proportion of p-Values Index (PPVI) are introduced: (i) to confirm, almost surely, discrimination of θ from θ*, θ* ≠ θ; (ii) to confirm with EDI-graphics identifiability of θ, by repeating (i) for θ* in a fine sieve of the parameter space; and (iii) to compare EDI-graphics and PPVIs of DGEs, and to select the DGE with the greater parameter discrimination and the smaller number of θ* violating identifiability of θ.
Among the applications, EDI and PPVI explain why the g-estimate in Tukey’s g-and-h model is better than that for the g-and-k model, unless the sample size is extremely large; EDI-graphics indicate that Normal learning machines have better parameter discrimination than Sigmoid learning machines, and that their parameters are nonidentifiable. Supplementary materials for this article are available online.
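To make the sieve-based discrimination check concrete, the following is a minimal sketch, not the paper's implementation (the paper's supplementary R-functions are the reference). It simulates Tukey g-and-h data at a hypothetical "true" parameter, compares it via two-sample Kolmogorov-Smirnov p-values against samples generated at each θ* in a coarse sieve over g, and reports the proportion of p-values above a conventional 0.05 cutoff as a PPVI-like summary; the parameter values, the cutoff, and this definition of the proportion are illustrative assumptions, not the paper's exact definitions of EDI or PPVI.

```python
import math
import random

def gh_sample(g, h, n, rng):
    # Tukey g-and-h draw: Y = ((exp(g*Z) - 1)/g) * exp(h*Z^2/2), Z ~ N(0,1)
    out = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        core = (math.exp(g * z) - 1.0) / g if g != 0 else z
        out.append(core * math.exp(h * z * z / 2.0))
    return out

def ks_stat(x, y):
    # two-sample Kolmogorov-Smirnov statistic sup |F_x - F_y|
    x, y = sorted(x), sorted(y)
    i = j = 0
    d = 0.0
    n, m = len(x), len(y)
    while i < n and j < m:
        if x[i] <= y[j]:
            i += 1
        else:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d

def ks_pvalue(d, n, m):
    # asymptotic p-value from the Kolmogorov distribution series
    lam = d * math.sqrt(n * m / (n + m))
    s = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, 101))
    return max(0.0, min(1.0, s))

rng = random.Random(7)
n = 500
g0, h0 = 0.5, 0.1                         # hypothetical "true" (g, h)
base = gh_sample(g0, h0, n, rng)          # data playing the role of the DGE output
sieve = [0.1 * k for k in range(1, 11)]   # coarse sieve of candidate g-values
pvals = [ks_pvalue(ks_stat(base, gh_sample(g, h0, n, rng)), n, n) for g in sieve]
# PPVI-like summary: proportion of sieve points the KS test fails to discriminate
ppvi = sum(p > 0.05 for p in pvals) / len(pvals)
```

Plotting the p-values against the sieve gives an EDI-graphic-style picture: a sharp dip to near-zero p-values away from g0 indicates good discrimination, while additional regions of large p-values flag candidate θ* violating identifiability.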
Supplementary Materials
Proofs and R-functions used in Examples 4.1–4.6.
Acknowledgments
Many thanks are due to Professor Faming Liang and Professor Galin Jones, Editors, who have handled, respectively, the original submission and the revisions. Thanks are due to the referees, for their comments that improved the presentation of the paper, and to Mr. Yongzhen Feng, Tsinghua University, for the suggestions to improve readability.
Disclosure Statement
The authors report there are no competing interests to declare.
Notes
1 Without preliminary estimation, the sieve has countably infinite elements.
2 Since one DGE is studied.