Semi-parametric Methods for Survival
and Longitudinal Data
(26 Feb - 24 Apr 2005)
~ Abstracts ~
A semi-parametric duration model
with heterogeneity that does not need to be estimated
Jerry A. Hausman, Massachusetts Institute of Technology
This paper presents a new estimator for the mixed
proportional hazard model that allows for a nonparametric
baseline hazard and time-varying regressors. In particular,
this paper allows for discrete measurement of the durations, as often happens in practice. The integrated baseline hazard and all parameters are estimated at the regular rate √N, where N is the number of individuals. However, no parametric form of the heterogeneity distribution is required, so the specification is fully semi-parametric.
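For orientation, the mixed proportional hazard model referred to here is commonly written (in a standard textbook form, not necessarily the paper's exact parametrization) as

    \lambda(t \mid x(t), v) = v \, \lambda_0(t) \exp\{x(t)'\beta\},

where \lambda_0 is the nonparametric baseline hazard, x(t) the possibly time-varying regressors, and v the unobserved heterogeneity whose distribution is left unspecified.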
Bayesian methods for survival and
longitudinal data
Ming-Hui Chen, University of Connecticut
Survival analysis arises in many fields of study such as
medicine, biology, engineering, public health, epidemiology,
and economics. This talk aims to provide a comprehensive
treatment of Bayesian survival analysis. There will be four
2-hour sessions.
Session 1. Bayesian Semiparametric Survival Models
In this session, we will discuss Bayesian models based on
prior processes for the baseline hazard and cumulative
hazard, construction of the likelihood function, prior
elicitation, and computational algorithms for sampling from
the posterior distribution. We will discuss gamma processes,
beta processes, and correlated gamma processes. Several
examples and case studies will be presented to illustrate
the various models.
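As an illustration of the prior processes listed above, a gamma process prior on the cumulative baseline hazard H_0 is typically specified (in the standard notation of this literature, not necessarily that of the sessions) as

    H_0 \sim \mathcal{GP}(c_0 H^*, c_0),

meaning that the increments H_0(t_2) - H_0(t_1) are independent Gamma(c_0\{H^*(t_2) - H^*(t_1)\}, c_0) variables (shape, rate), where H^* is a prior guess at the cumulative hazard and c_0 a precision parameter.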
Session 2. Bayesian Cure Rate Models
Survival models incorporating a cure fraction, often
referred to as cure rate models, are becoming increasingly
popular in analyzing data from cancer clinical trials. The
cure rate model has been used for modeling time-to-event
data for various types of cancers, including breast cancer,
non-Hodgkin's lymphoma, leukemia, prostate cancer, melanoma, and head and neck cancer, diseases for which a significant proportion of patients are "cured." In this
session, we will give a comprehensive Bayesian treatment of
the cure rate model and discuss its applications in cancer.
We will also present a multivariate extension of the cure
rate model, and discuss its computational implementation in
detail. In addition, we will discuss informative prior
elicitation for this model based on the power prior, and
discuss its properties. Several case studies will be
presented to illustrate this model.
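A widely used form of the cure rate (promotion time) model, stated here only for orientation, is

    S_{pop}(t) = \exp\{-\theta F(t)\},

where F is a proper distribution function for latent promotion times and \theta > 0; the cure fraction is S_{pop}(\infty) = \exp(-\theta), and covariates typically enter through \theta = \exp(x'\beta).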
Session 3. Joint Models for Longitudinal and Survival
Data
In this session, we will discuss joint models for
longitudinal and survival data. Joint models for survival
and longitudinal data have recently become quite popular in
cancer and AIDS clinical trials, where a longitudinal
biologic marker such as CD4 count or immune response to a
vaccine can be an important predictor of survival. Often in
clinical trials where the primary endpoint is time to an
event, patients are also monitored longitudinally with
respect to one or more biologic endpoints throughout the
follow-up period. This may be done by taking immunologic or
virologic measures in the case of infectious diseases or
perhaps with a questionnaire assessing the quality of life
after receiving a particular treatment. Often these
longitudinal measures are incomplete or may be prone to
measurement error. These measurements are also important
because they may be predictive of survival. Therefore
methods which can model both the longitudinal and the
survival components jointly are becoming increasingly
essential in most cancer and AIDS clinical trials. In this
part of the session, we will give a detailed development of
joint models for longitudinal and survival data, and discuss
Bayesian techniques for fitting such models. Examples from
cancer vaccine trials will be presented.
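One common shared-parameter formulation of such joint models, given here only as a sketch, links the two components as

    X_i(t) = X_i^*(t) + \epsilon_i(t)   (observed marker = true trajectory + measurement error),
    \lambda_i(t) = \lambda_0(t) \exp\{\gamma X_i^*(t) + \beta' Z_i\},

so that the error-free marker trajectory X_i^*(t) enters the hazard of the event time.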
Session 4. Bayesian Model Assessment in Survival Analysis
The scope of Bayesian model comparison is quite broad,
and can be investigated via Bayes factors, model diagnostics
and goodness of fit measures. In many situations, one may
want to compare several models which are not nested. Such
comparisons are common in survival analysis, since, for
example, we may want to compare a fully parametric model
versus a semiparametric model, or a cure rate model versus a
Cox model, and so forth. In this session, we will discuss
several methods for Bayesian model comparison, including
Bayes factors and posterior model probabilities, the
Bayesian Information Criterion (BIC), the Conditional
Predictive Ordinate (CPO), and the L measure. Detailed
examples using real data are presented, and computational
implementation is examined.
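For reference, the Conditional Predictive Ordinate for subject i is the leave-one-out predictive density

    CPO_i = f(y_i \mid y_{(-i)}) = \int f(y_i \mid \theta)\, \pi(\theta \mid y_{(-i)})\, d\theta,

and models with a larger log pseudo-marginal likelihood \sum_i \log CPO_i are preferred.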
Aging and degradation models
for survival and longitudinal data: state of the art
Mikhail Nikouline, Université Bordeaux 2
We consider models describing the dependence of the lifetime distribution on time-dependent explanatory variables. Such models are used in survival analysis and reliability theory to study the reliability of bio-technical systems. We shall consider a general approach for
construction of efficient statistical models to study aging
and degradation problems in different areas such as
oncology, demography, biostatistics, survival analysis,
etc. We shall discuss the problems of statistical modeling
and of choice of design in clinical trials to obtain the
statistical estimators of the main survival characteristics.
Nonparametric maximum likelihood
inference for change-point transformation models under right
censoring
Michael R. Kosorok, University of Wisconsin-Madison
We consider linear transformation models applied to right
censored survival data with a change-point in the regression
coefficient based on a covariate threshold. We establish
consistency and weak convergence of the nonparametric
maximum likelihood estimators. The change-point parameter is
shown to be n-consistent, while the remaining parameters are shown to be √n-consistent. For the special
case of the Cox model, our change-point model is similar to
the model considered by Pons (2003, Annals of Statistics),
except that we allow a change in the intercept after the
change-point. Our contribution goes beyond Pons (2003) in
three important ways. First, we extend to general
transformation models. This results in a significant
increase in complexity since estimation of the baseline
hazard can no longer be avoided through use of the
partial-profile likelihood. Second, we study inference for
the model parameters. We show that the procedure is adaptive
in the sense that the non-threshold parameters are estimable
with the same precision as if the true threshold value were
known. Third, we develop a hypothesis test for the existence
of a change point. This is quite challenging since some of
the model parameters are no longer identifiable under the
null hypothesis of no change-point, and the known results
for testing when identifiability is lost under the null do
not apply. This research is joint work with my student Rui
Song.
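In the Cox special case, the change-point model can be sketched (in one common notation, which may differ from the authors') as

    \lambda(t \mid Z, Y) = \lambda_0(t) \exp\{\beta' Z + (\alpha + \eta' Z)\, 1\{Y > \zeta\}\},

where \zeta is the covariate threshold; the intercept jump \alpha is the change in intercept after the change-point noted above, \zeta is estimated at rate n, and the remaining parameters at rate \sqrt{n}.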
Mixed effects models and
longitudinal data analysis
Jiming Jiang, University of California at Davis
Over the past decade there has been an explosion of
developments in mixed effects models and their applications.
This lecture series concentrates on two major classes of
mixed effects models, linear mixed models and generalized
linear mixed models, with the intention of offering an
up-to-date account of the theory and methods in inference
about these models as well as their application in the
analysis of longitudinal data.
Lecture 1: Linear mixed models
The first lecture is devoted to linear mixed models. We
classify linear mixed models as Gaussian (linear) mixed
models and non-Gaussian linear mixed models. There have been
extensive studies in estimation in Gaussian mixed models as
well as tests and confidence intervals. On the other hand,
the literature on non-Gaussian linear mixed models is much
less extensive, partially because of the difficulties in
inference about these models. Yet, non-Gaussian linear mixed
models are important because, in practice, one can never be
sure that normality holds. This lecture offers a systematic
approach to inference about non-Gaussian linear mixed
models. In particular, we have included recently developed
methods such as partially observed information, the jackknife in the context of longitudinal mixed models, goodness-of-fit tests, prediction intervals and mixed model selection. These are, of course, in addition to traditional methods such as
maximum likelihood and restricted maximum likelihood.
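The models in this lecture have the familiar form

    y = X\beta + Z\alpha + \epsilon,

with fixed effects \beta, random effects \alpha and errors \epsilon; the Gaussian case assumes normality of \alpha and \epsilon, while the non-Gaussian case leaves their distributions unspecified apart from moment conditions.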
Lecture 2: Generalized linear mixed models
The next lecture deals with generalized linear mixed
models. These models may be regarded as extensions of
Gaussian mixed models, and are useful in situations where
responses are both correlated and discrete or categorical. A
special case of generalized linear mixed models, a mixed
logistic model, was first introduced by McCullagh and Nelder
(1989) for the infamous salamander mating problem. Since
then the models have received considerable attention, and
various methods of inference have been developed. A major
issue regarding generalized linear mixed models has been the
computation of the maximum likelihood estimator. It is known that the likelihood function under these models may involve high-dimensional integrals which cannot be evaluated analytically. Therefore, the maximum likelihood estimator is
difficult to compute. We classify the methods of inference
as likelihood-based and non-likelihood-based methods. The
likelihood-based methods focus on developing methods of
computation for the maximum likelihood. The non-likelihood
based approaches try to avoid the computational difficulty.
These include approximate inference and generalized
estimating equation approach. Some thoughts about future
research as well as open problems will also be discussed.
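The computational difficulty mentioned above comes from the marginal likelihood of a generalized linear mixed model, which integrates over the random effects:

    L(\beta, \theta) = \int \Big\{\prod_{i} f(y_i \mid \alpha, \beta)\Big\} f(\alpha \mid \theta)\, d\alpha,

an integral whose dimension equals the number of random effects and which generally has no closed form (the salamander data, with crossed random effects, are a classic example).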
Lecture 3: Longitudinal data analysis I
One of the applications of mixed effects models is the
analysis of longitudinal data. We begin with the Laird and
Ware (1982) model which is a direct application of linear
mixed models. In many longitudinal data problems, the main interest is the mean function of the responses. When the mean function is linear, a well-known
method of estimating the regression coefficients involved is
weighted least squares, or WLS. We introduce a recently
developed method known as iterative WLS, which results in
asymptotically efficient estimators without assuming a
parametric model for the variance-covariance structure of
the data. An extension of WLS is the generalized estimating
equations (GEE), which also applies to cases such as binary
responses and counts.
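For reference, the GEE estimator solves estimating equations of the form

    \sum_{i=1}^n D_i' V_i^{-1} \{y_i - \mu_i(\beta)\} = 0,

where \mu_i(\beta) is the mean of subject i's responses, D_i = \partial \mu_i / \partial \beta', and V_i is a working covariance matrix; the iterative WLS mentioned above, roughly speaking, alternates between estimating the covariance of the data and re-solving the weighted least squares problem.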
Lecture 4: Longitudinal data analysis II
This lecture discusses semiparametric and nonparametric
approaches in the analysis of longitudinal data. In
particular, we discuss the so-called varying coefficient
model, in which the mean function of responses is modelled
by polynomial splines. Other topics include extension of the
iterative WLS introduced in Lecture 3 to semiparametric
models, and models with informative dropouts and missing
covariates.
Local polynomial regression
analysis of longitudinal data
Kani Chen, Hong Kong University of Science and Technology
We propose a simple and effective local polynomial regression smoother for curve estimation based on longitudinal or clustered data. The method is based on a conservative utilization of the within-cluster dependence,
which leads to minimax efficient estimation with a slight
modification.
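As a reminder of the building block involved, a local linear smoother for the mean curve m(t) based on clustered observations (t_{ij}, y_{ij}) solves, at each t,

    \min_{a,b} \sum_{i=1}^n \sum_{j=1}^{n_i} K_h(t_{ij} - t)\, \{y_{ij} - a - b(t_{ij} - t)\}^2,

with \hat m(t) = \hat a; loosely speaking, the proposal modifies this working-independence criterion so that the within-cluster dependence is exploited rather than ignored.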
Nonparametric estimation of the
cumulative incidence function for multiple events data
Weijing Wang, National Chiao Tung University, Taiwan
Data with multiple endpoints are commonly seen in medical
studies. This talk focuses on nonparametric estimation of
the cumulative incidence function for a particular type of
failure. Two different data structures are considered. One
is the conventional setting of competing risks data and the
other is related to the setup of cure models. For each data
structure, we demonstrate that the cumulative incidence
function can be estimated via different approaches,
including the methods of imputation and
inverse-probability-weighting and the nonparametric MLE.
Under each setting, we show that these approaches are
equivalent. We also demonstrate that the complement of the
Kaplan-Meier estimator is still a valid approach to
estimating the cumulative incidence function if it is
applied to the correct data structure. The effect of
sufficient follow-up on the estimation of the long-term
incidence rate, which involves the tail information, is also
discussed.
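For the competing risks setting, the cumulative incidence function for cause k and its standard nonparametric estimator, stated here for reference, are

    F_k(t) = \Pr(T \le t, \text{cause} = k) = \int_0^t S(u^-)\, d\Lambda_k(u),
    \hat F_k(t) = \sum_{t_i \le t} \hat S(t_i^-)\, d_{ki} / n_i,

where \hat S is the Kaplan-Meier estimator of overall survival, d_{ki} the number of cause-k failures at t_i, and n_i the number at risk.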
Goodness-of-fit tests for a
varying-coefficients model in longitudinal studies
Lixing Zhu, The University of Hong Kong
Varying-coefficient longitudinal regression models are
important tools for making inferences in longitudinal
analysis. The main objective in these analyses is to
evaluate the change of the mean response over time and the
effects of the explanatory variables of interest on the mean
response. In this article, we construct a residual-marked-process-based test for a varying-coefficient longitudinal model. Two approaches are recommended for determining critical values: an innovation process approach and a nonparametric Monte Carlo test (NMCT) approximation. The former uses a martingale transformation to obtain an innovation process and to define a distribution-free test under a composite null model; the latter simulates the null distribution of the test by Monte Carlo. The NMCT
approximation is very easy to implement and overcomes the
difficulty that the consistency of the bootstrap
approximation is unclear. Applications of the proposed
approaches are demonstrated through a simulation and an
example in epidemiology.
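In generic form (the details depend on the fitted model and are only sketched here), the residual-marked empirical process is

    R_n(x) = n^{-1/2} \sum_{i=1}^n \hat e_i\, 1\{X_i \le x\},

where \hat e_i are model residuals; under the null model R_n converges weakly to a Gaussian process, and the martingale (innovation) transformation renders the limit distribution-free.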
Lg penalty models: computation and applications
Wenjiang Fu, Texas A&M University
During the past decade, there has been growing interest in Lg penalty models, such as the ridge (L2) penalty and the Lasso (L1) penalty (Tibshirani, 1996). These penalty models not only provide techniques to improve prediction for regression models suffering from collinearity, but also offer means for variable selection. In this series of four lectures, I will present different aspects of Lg penalty models, including algorithms, asymptotics, selection of the tuning parameter, and extensions to longitudinal data. I will also introduce some of the most recent developments, including the Bayesian approach and the fused Lasso.
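Throughout, the penalized criteria have the generic form (writing the penalty exponent generically as q, which covers the two named special cases)

    \hat\beta = \arg\min_\beta \; \|y - X\beta\|^2 + \lambda \sum_{j=1}^p |\beta_j|^q,

with q = 2 giving the ridge estimator, q = 1 the Lasso, and \lambda \ge 0 the tuning parameter whose selection is the subject of Lecture 2.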
Lecture 1. Lg penalty: variable selection and computation for linear models.
I will introduce the Lg penalty with linear models, present the variable selection property of the Lasso, and provide efficient algorithms for Lg penalty estimators. While there are different algorithms for the Lasso estimator and its variance, I will compare two of them: the combined quadratic programming method versus the shooting method for the estimator itself, and the modified ridge-type variance versus the Lasso-type variance. I will demonstrate the above algorithms and the shrinkage effect with real data sets in biomedical research.
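The shooting method is a coordinate-wise soft-thresholding algorithm for the Lasso. Below is a minimal numpy sketch of that idea; the function name, the starting value and the simple convergence rule are illustrative choices, not details taken from the lectures.

    import numpy as np

    def lasso_shooting(X, y, lam, n_iter=200, tol=1e-6):
        """Coordinate-wise ('shooting') algorithm for
        min_b ||y - X b||^2 + lam * sum_j |b_j|."""
        n, p = X.shape
        beta = np.linalg.lstsq(X, y, rcond=None)[0]   # OLS starting value
        col_ss = (X ** 2).sum(axis=0)                 # per-column sums of squares
        for _ in range(n_iter):
            beta_old = beta.copy()
            for j in range(p):
                partial = y - X @ beta + X[:, j] * beta[j]   # residual without coordinate j
                rho = X[:, j] @ partial
                # soft-threshold update for coordinate j
                beta[j] = np.sign(rho) * max(abs(rho) - lam / 2.0, 0.0) / col_ss[j]
            if np.max(np.abs(beta - beta_old)) < tol:
                break
        return beta

For lam = 0 the update reduces to coordinate-wise least squares; as lam grows, more coefficients are set exactly to zero, which is the variable selection property referred to above.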
Lecture 2. Selection of tuning parameter and asymptotics.
Since the Lg penalty yields a broad range of estimators, including linear, piecewise-linear and nonlinear estimators, the generalized cross-validation (GCV) method in Tibshirani (1996) may not work properly for all of them in selecting the tuning parameter. I will provide a novel method, the nonlinear GCV, for the selection of the tuning parameter for linear models. I will also provide asymptotic results for Lg penalty models.
Lecture 3. Extension to non-Gaussian response and
longitudinal studies.
I will motivate the extension of the Lg penalty to generalized linear models and generalized estimating equations (GEE) in longitudinal studies, and provide theory for the extension to the penalized GEE model. Selection of the tuning parameter is particularly challenging for the penalized GEE model due to the lack of a joint likelihood. I will introduce the quasi-GCV method for selecting tuning parameters in the penalized GEE model. This quasi-GCV method is a natural extension of the above nonlinear GCV and possesses similar properties.
Lecture 4. Recent developments in Lg penalty models and related topics.
Recent developments in bioinformatics studies, especially microarray data analysis, have stimulated growing interest in the variable selection properties of penalty models, particularly the Lasso. The small-n-large-p problem, i.e., small sample size and a large number of genes (independent variables), makes the Bayesian approach attractive. Although penalty models have a Bayesian interpretation, direct conversion of the Lg penalty into priors may be technically difficult and computationally inefficient. I will present some of the most recent developments in Bayesian variable selection with priors based on penalty models: a new family of prior distributions for Bayesian variable selection that includes the Laplacian prior and the Gaussian prior as special cases. This new family of prior
distributions possesses attractive properties, such as
sparseness and efficient computation, especially for high
throughput data in bioinformatics. I will also briefly
mention some other penalty models, such as the fused Lasso,
time permitting.
Lifetime expectancy regression
Ying Qing Chen, Fred Hutchinson Cancer Research Center
In statistical analysis of lifetimes, residual life and
its characteristics, such as the mean residual life
function, have been understudied, although they can be of
substantial scientific interest in medical research, for example in treatment efficacy assessment or cost-effectiveness analysis. In this talk, we will discuss
the challenges in regression analysis of residual life with
censored lifetimes. In particular, we will use the so-called
proportional mean residual life model to demonstrate how to
handle these challenges and draw appropriate inferences.
Extensions of the residual life regression will also be discussed.
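The proportional mean residual life model referred to here is usually written as

    m(t \mid Z) = E(T - t \mid T > t, Z) = m_0(t) \exp(\beta' Z),

with an unspecified baseline mean residual life function m_0; censoring is what makes estimation of \beta and m_0 challenging.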
Analyzing recurrent event data
using nonparametric and semiparametric models
Mei-Cheng Wang, Johns Hopkins University
Recurrent events serve as important measurements for
evaluating disease progression, health deterioration, or
insurance plans in studies of different fields. The
intensity function of a recurrent event process is the occurrence probability conditional on the event history. In contrast with the conditional interpretation of the intensity function, the rate function is defined as the occurrence probability unconditional on the event history.
In this talk the 'shape' and 'size' parameters of the rate
function are introduced to characterize, model and analyze
recurrent event data. Particular interest will focus on latent variable models that allow for informative censoring in two different situations: (1) informative censoring as a nuisance; (2) informative censoring generated mainly or partly by a failure event, where joint modeling of the recurrent event process and the failure time is of interest.
Nonparametric and semiparametric methods will be constructed
via the estimation of the shape and size parameters in
one-sample and regression models. If time allows, I will
also briefly discuss related topics such as bivariate
recurrent event processes and recurrent longitudinal data.
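One common way to make the 'shape' and 'size' decomposition concrete (stated here as a sketch, not necessarily the exact formulation of the talk) is to write the subject-specific rate as

    \lambda_i(t) = Z_i\, \lambda_0(t), \qquad \int_0^\tau \lambda_0(t)\, dt = 1,

so that \lambda_0 (the shape) describes how events are spread over the follow-up window [0, \tau], the nonnegative latent variable Z_i (the size) governs the expected number of events, and informative censoring can be accommodated by letting Z_i be correlated with the censoring or failure time.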
Varying coefficient GARCH versus
local constant modelling
Vladimir Spokoiny, Weierstrass Institute for Applied
Analysis and Stochastics
In this talk we compare the performance of varying coefficient GARCH models and simpler models with local constant volatility. The results indicate that many
stylized facts of financial time series like long range
dependence, cointegration etc. can be explained by changes
of model parameters. Next we apply the procedure to some exchange rate datasets and show that the simpler local constant approach typically delivers better results as far as short-term-ahead forecasting of Value-at-Risk is concerned.
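For orientation, a varying coefficient GARCH(1,1) specification and its local constant competitor can be sketched as

    \sigma_t^2 = \omega(t) + \alpha(t)\, r_{t-1}^2 + \beta(t)\, \sigma_{t-1}^2
    \quad\text{versus}\quad
    \sigma_t^2 \approx \theta \ \text{(constant over a local time interval)},

where r_t denotes the return at time t; the comparison is between estimating smoothly varying coefficients and simply fitting a constant volatility level locally in time.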
An Introduction to R: software
for statistical modelling and computing
Petra Kuhnert, CSIRO, Australia
This three-day course will provide an overview of R, software for statistical modelling and computing. The course
will provide an elementary introduction to the software,
introduce participants to the statistical modelling and
graphical capabilities of R as well as provide an overview
of two advanced topics: neural networks and classification
and regression trees.
Semiparametric models in
survival analysis and quantile regression
Probal Chaudhuri, Indian Statistical Institute
Many of the popular regression models used in survival
analysis, including Cox's proportional hazards model, can be
viewed as semiparametric models having some intrinsic
monotonicity properties. One is interested in estimating and
drawing inference about a finite dimensional Euclidean
parameter in that model in the presence of an infinite
dimensional nuisance parameter. These survival analysis
models are special cases of the monotone single index model used in econometrics. The use of average derivative quantile regression techniques for parameter estimation in such
models will be discussed. In addition to regression models
with univariate response and a single index, we will also
discuss possible extensions of the methodology for
multivariate response and multiple index models.
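In generic form, the monotone single index structure alluded to above is

    Y = G(\beta' X, \epsilon),

with G monotone in the index \beta'X (for the Cox model, the conditional survival function is monotone in the index); since a conditional quantile of Y given X then depends on X only through \beta'X, its average derivative is proportional to \beta, which is what average derivative quantile regression exploits.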
Mark-specific hazard function
modeling with application to HIV vaccine efficacy trials
Ian W. McKeague, Florida State University and Columbia
University
Methods for analyzing survival data with discrete causes
of failure are well developed. In many applications,
however, a continuous mark variable is observed at
uncensored failure times, which amounts to a unique cause of
failure for each individual, so it is necessary to borrow
strength from neighboring observations of the mark. This
talk discusses some new non- and semi-parametric models for
mark-specific hazard functions. We describe 1) a test of
whether a mark-specific relative risk function depends on
the mark, and 2) inference for a mark-specific proportional
hazards model. An application to data from an HIV vaccine
efficacy trial is presented. The efficacy of an HIV vaccine to prevent infection is likely to depend on the genetic variation of the exposing virus, and it is of interest to
model such dependence in terms of the divergence of
infecting HIV viruses in trial participants from the HIV
strain that is contained in the vaccine. We discuss the
importance of accounting for such viral divergence to assess
vaccine efficacy. The talk is based on joint work with Peter
Gilbert and Yanqing Sun.
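The mark-specific hazard and its proportional hazards version can be sketched as

    \lambda(t, v \mid z) = \lim_{dt, dv \downarrow 0} \Pr\{T \in [t, t+dt), V \in [v, v+dv) \mid T \ge t, z\} / (dt\, dv)
                         = \lambda_0(t, v) \exp\{\beta(v)' z\},

where V is the continuous mark (here, the divergence of the infecting virus from the vaccine strain); the test in 1) asks whether the relative risk derived from \beta(v) actually varies with v.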
Nonparametric estimation of
homothetic and homothetically separable functions
Oliver Linton, London School of Economics and Political
Science
For vectors x and w, let r(x,w) be a function that can be
nonparametrically estimated consistently and asymptotically
normally. We provide consistent, asymptotically normal
estimators for the functions g and h, where r(x,w)=h[g(x),w],
g is linearly homogeneous and h is monotonic in g. This
framework encompasses homothetic and homothetically
separable functions. Such models reduce the curse of
dimensionality, provide a natural generalization of linear
index models, and are widely used in utility, production,
and cost function applications. Extensions to related
functional forms include a generalized partly linear model
with unknown link function and endogenous regressors. We
provide simulation evidence on the small sample performance
of our estimator, and we apply our method to a Chinese
production dataset.
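For reference, linear homogeneity of g means

    g(\lambda x) = \lambda\, g(x) \quad \text{for all } \lambda > 0,

so that the structure r(x, w) = h\{g(x), w\}, with h monotonic in its first argument, covers homothetic functions (no w) and homothetically separable functions.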
Longitudinal Growth Charts
Xuming He, University of Illinois at Urbana-Champaign and
National Science Foundation
Growth charts are often more informative when they are
customized per subject, taking into account the prior
measurements and possibly other covariates of the subject.
We study a global semiparametric quantile regression model
that has the ability to estimate conditional quantiles
without the usual distributional assumptions. The model can
be estimated from longitudinal reference data with irregular
measurement times and with some level of robustness against
outliers, and is also flexible for including covariate
information. We propose a rank score test for large sample
inference on covariates, and develop a new model assessment
tool for longitudinal growth data. Our research indicates
that the global model has the potential to be a very useful
tool in conditional growth chart analysis. (This talk is
based on joint work with Ying Wei at Columbia University.)
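As background, conditional quantiles in such models are estimated by minimizing the usual check-function criterion

    \hat\beta(\tau) = \arg\min_\beta \sum_i \rho_\tau(y_i - x_i'\beta), \qquad \rho_\tau(u) = u\{\tau - 1(u < 0)\},

so that no distributional assumption beyond the form of the \tau-th conditional quantile is needed; in the growth chart setting, x_i would collect terms in age, prior measurements and other covariates (the exact specification being that of the authors' global model).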
Estimation of density for
arbitrarily censored and truncated data
Catherine Huber, Université René Descartes - Paris 5
B. W. Turnbull, in his paper "The empirical distribution function with arbitrarily grouped, censored and truncated data" (JRSS B, 38, 1976, pp. 290-295), proposed a general method for nonparametric maximum likelihood estimation of the distribution function in the presence of missing and incomplete data due to grouping, censoring and truncation. His method has since been used by many authors.
But, to our knowledge, the consistency of the resulting
estimate was never proved. With Valentin Solev, while he was
recently visiting us in Paris, we proved the consistency of
Turnbull's NPMLE under appropriate regularity conditions on
the involved censoring, truncation and survival
distributions.
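In Turnbull's framework the NPMLE places mass s_j on a finite set of disjoint 'innermost' intervals; with censoring set A_i and truncation set B_i for subject i, it maximizes (roughly, in the notation of that literature)

    L(s) = \prod_{i=1}^n \frac{\sum_j \alpha_{ij} s_j}{\sum_j \beta_{ij} s_j}, \qquad s_j \ge 0, \ \sum_j s_j = 1,

where \alpha_{ij} and \beta_{ij} indicate whether interval j lies in A_i and B_i respectively; Turnbull's self-consistency (EM) algorithm solves this, and the consistency result described above concerns the resulting estimator.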
A model selection test for
bivariate failure-time data
Xiaohong Chen, New York University
In this paper, we address two important issues in
survival model selection for censored data generated by the
Archimedean copula family: the method of estimating the parametric copulas and data reuse. We demonstrate that for
model selection, estimators of the parametric copulas based
on minimizing the selection criterion function may be
preferred to other estimators. To handle the issue of data
reuse, we put model selection in the context of hypothesis
testing and propose a simple test for model selection from a
finite number of parametric copulas. Results from a
simulation study and two empirical applications provide strong support for our theoretical findings.
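For reference, an Archimedean copula has the form

    C_\theta(u, v) = \varphi_\theta^{-1}\{\varphi_\theta(u) + \varphi_\theta(v)\},

with generator \varphi_\theta (the Clayton and Gumbel families are familiar examples); the joint survival function of the two failure times is modeled by applying C_\theta to the marginal survival functions, and the test selects among competing parametric choices of the generator.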
Diagnostic plots and corrective
adjustments for the proportional hazards regression model
Debasis Sengupta, Indian Statistical Institute
There are several diagnostic plots for the proportional
hazards regression model for survival data. The focus of
this talk is strictly on those plots which diagnose the
validity of the proportional hazards assumption,
irrespective of the validity of other related assumptions
(such as additivity/linearity of the covariate effects). If
the proportional hazards assumption does not hold, then it
may still be possible to use a modified version of this
model after suitable adjustment. The set of plots available
in the literature and some new plots are examined from this
point of view. We consider two specific types of violation:
(a) a covariate contributing as a scale factor of the
failure time (rather than a scale factor of the hazard rate); (b) a regression coefficient being time-dependent. It is
assumed that the effects of all the covariates except one
are proportional on the hazard rate. Simple and intuitive
methods of adjusting for the non-proportional effect of a
single covariate are then explored and studied via
simulations.
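In symbols, the two departures (a) and (b) above are, roughly,

    (a) \ T = T_0 \exp(-\beta z) \quad \text{(the covariate rescales the failure time, as in an accelerated failure time model)},
    (b) \ \lambda(t \mid z) = \lambda_0(t) \exp\{\beta(t) z\} \quad \text{(a time-dependent regression coefficient)},

in contrast to the proportional hazards form \lambda(t \mid z) = \lambda_0(t) \exp(\beta z).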
The procedures considered here can be organized to form a
gateway to the proportional hazards model. The entry could
be (i) automatic (if the plots are good), (ii) conditional
(if the plots are not good but satisfactory remedial
measures are available) or (iii) denied. Other diagnostics
for the Proportional Hazards model come into the picture
only after automatic or conditional entry has been gained.
The main contribution of this work is in the area of
`conditional' entry.
Identification and estimation of
truncation regression models
Songnian Chen, Hong Kong University of Science and
Technology
In this paper we consider nonparametric identification
and estimation of truncated regression models in the
cross-sectional and panel data settings. We first present
various identification results. Our estimators are based on
minimizing some Cramér-von Mises-type distances adjusted for
truncation, using one set of identification results. For the
cross-sectional case, our estimation procedures overcome
certain drawbacks associated with the existing estimators.
Furthermore, our estimation procedures can be extended to
the panel data model with fixed effects.
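A baseline version of the cross-sectional model, stated generically, is

    y = x'\beta + \epsilon, \quad \text{observed only when } y \text{ exceeds a known truncation point},

with neither y nor x observed for truncated units; this is what makes identification and estimation delicate, and the panel version adds individual fixed effects.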
An extended semiparametric
transformation model with non-susceptibility and
heteroscedasticity
Chen-Hsin Chen, Institute of Statistical Science,
Academia Sinica, Taipei
Semiparametric mixture regression models have been
proposed to formulate the probability of susceptibility by a
logistic regression and the time to event of the susceptible
by Cox's proportional hazards regression. Recently a more
general class of semiparametric transformation cure models
(Lu and Ying, 2004) was presented to include the
proportional hazards cure model and the proportional odds
cure model as special cases. On the other hand, the
heteroscedastic hazards regression model (Hsieh, 2001)
sharing a similar representation of transformation models
was developed to tackle the crossing in survival curves
without non-susceptibility. We hence propose a
semiparametric heteroscedastic transformation cure model to
deal with these two issues simultaneously. Given a specific
form of the transformation function, our approach has
finite-sample optimality, while this optimality is not
attainable in the reduced homoscedastic case of Lu and Ying.
We obtain asymptotic properties of the estimators and a
closed form for the asymptotic variance-covariance matrix. Simulation studies and a real data analysis will also be discussed. The talk is based on joint work with Chyong-Mei
Chen.
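The semiparametric mixture cure model described in the first sentence has the generic form

    S_{pop}(t \mid x, z) = \pi(x)\, S(t \mid z) + 1 - \pi(x), \qquad \pi(x) = \frac{\exp(x'\gamma)}{1 + \exp(x'\gamma)},

where \pi(x) is the probability of being susceptible and S(t \mid z) is the survival function of the susceptible, specified here through a (possibly heteroscedastic) transformation or proportional hazards model.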
First hitting models and
threshold regressions
Mei-Ling Ting Lee, Harvard University
The first-hitting time (FHT) model has proved to be
useful as an alternative model for time-to-event and
survival data. On the basis of the FHT model, we introduce
the threshold regression (TR) methodology. The threshold
regression model has an underlying latent stochastic process
representing a subject’s latent health state. This health
status process fluctuates randomly over time until its level
reaches a critical threshold, thus defining the outcome of
interest. The time to reach the primary endpoint or failure
(death, disease onset, etc.) is the time when the latent
health status process first crosses a failure threshold
level. The effectiveness of threshold regression lies in how
initial health status, hazards and the progression of
disease are modeled, while taking account of covariates and
competing outcomes. The threshold regression model does not
require the proportional hazards assumption and hence offers
a rich potential for applications.
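A common concrete version of the FHT/TR setup, given here only as an illustration, takes the latent health status to be a Wiener process:

    Y(t) = y_0 + \mu t + \sigma W(t), \qquad T = \inf\{t : Y(t) \le 0\},

so that T follows an inverse Gaussian distribution when the drift \mu is negative; threshold regression then links the initial level y_0 and the drift \mu to covariates through regression (link) functions, which is how covariate effects enter without a proportional hazards assumption.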
In a recent application to environmental research, we
consider a retrospective longitudinal study of more than
50,000 US railroad workers tracked from 1959 to 1996. The
initial investigation was focused on lung cancer death
because of a suspected link to diesel exhaust exposure.
Based on an intuitive concept that a lung cancer mortality
event occurs when the cumulative environmental diesel
exposure of a subject first hits a threshold value, the
threshold regression is found to be effective in providing
insights into the process of disease progression. The
threshold regression model also allows the survival pattern
for each period of exposure to be observed. We show that TR
is useful in a competing risks context, which is encountered
here because three causes of death are under consideration.
We introduce a modified Kaplan-Meier plot that provides new
insights into the health effects of diesel exhaust exposure.
Sample size and power of
randomized clinical trials
Feifang Hu, University of Virginia
Randomized designs are often used in clinical trials. In
the literature, the power and sample size are usually
obtained by ignoring the randomness of the allocation in
randomized designs. However, when using a randomized design,
the power is a random variable for a fixed sample size n. In
this talk, we focus on the power function (random) and the
sample size of two-arm (drug versus control) randomized
clinical trials. We first give an example where a target
power cannot be achieved with high probability when the
requisite sample size (based on the formula in the
literature) is used. Then we obtain the power function for
any given sample size and study the properties of this power
function. Based on the power function, a formula of sample
size is derived for randomized designs. This formula is
applied to several randomization procedures. We also discuss
our finding that response adaptive designs can be used to
reduce the requisite sample size. Some simulation studies
are reported.
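A small simulation makes the point that power is random under a randomized design. The sketch below is illustrative only: complete randomization and the normal-approximation power formula for a two-sample z-test are my assumptions, not details from the talk.

    import numpy as np
    from scipy.stats import norm

    def conditional_power(n, delta, sigma, alpha=0.05, n_sim=10000, seed=None):
        """Distribution of conditional power over random allocations when n
        subjects are completely randomized to two arms (two-sample z-test)."""
        rng = np.random.default_rng(seed)
        n_a = rng.binomial(n, 0.5, size=n_sim)       # patients on the treatment arm
        n_a = np.clip(n_a, 1, n - 1)                 # guard against empty arms
        n_b = n - n_a
        se = sigma * np.sqrt(1.0 / n_a + 1.0 / n_b)  # SE of the mean difference
        z = norm.ppf(1 - alpha / 2)
        return norm.cdf(delta / se - z)              # power given the realized allocation

    powers = conditional_power(n=100, delta=0.5, sigma=1.0)
    print((powers < 0.80).mean())   # chance that realized power falls below a target

Since the standard error is smallest under exactly equal allocation, the realized power never exceeds, and usually falls below, the power computed for a balanced design, which is the phenomenon analysed in the talk.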
Estimating features of a
distribution from binomial data
Daniel L. McFadden, University of California at Berkeley
A statistical problem that arises in several fields is
that of estimating the features of an unknown distribution,
which may be conditioned on covariates, using a sample of
binomial observations on whether draws from this
distribution exceed threshold levels set by experimental
design. Applications include bioassay and destructive
duration analysis. The empirical application we consider is
referendum contingent valuation in resource economics, where
one is interested in features of the distribution of values
(willingness to pay) placed by consumers on a public good
such as endangered species. Sample consumers are asked
whether they favor a referendum that would provide the good
at a cost specified by experimental design. This paper
provides estimators for moments and quantiles of the unknown
distribution in this problem under both nonparametric and
semiparametric specifications.
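The identifying idea can be stated simply: each response reveals only whether the latent value W exceeds the threshold t set by the design, so the design traces out the survivor function, from which moments and quantiles follow; for a nonnegative W, for instance,

    \Pr(\text{yes} \mid t) = \Pr(W \ge t) = 1 - F(t), \qquad E(W) = \int_0^\infty \{1 - F(t)\}\, dt,

so that estimated response probabilities at the design points can be smoothed into estimates of the features of F.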
An old-new family of multivariate
distributions for left truncated and right censored data
Shulamith Gross, National Science Foundation
Catherine Huber-Carol of Université Paris V and I carried out this work in the spring of 2000 while I was at the Biostatistics Department of Université Paris V. The family
is a semi-parametric family which generalizes the Cox model
in one dimension. It does not require new software when used
to model left truncated and/or right censored data. It includes both purely discrete and purely continuous distributions. Its main advantage over copula-based
multivariate distributions is in its multidimensional
modeling of the covariance.
Measles metapopulation dynamics: a
gravity model for epidemiological coupling and dynamics
Xia Yingcun, National University of Singapore
Infectious diseases provide a particularly clear
illustration of the spatio-temporal underpinnings of
consumer-resource dynamics. The paradigm is provided by
extremely contagious, acute, immunizing childhood
infections. Partially synchronized, unstable oscillations
are punctuated by local extinctions. This, in turn, can
result in spatial differentiation in the timing of
epidemics, and -- depending on the nature of spatial
contagion -- may result in travelling waves. Measles is one of the few systems documented well enough to reveal all of
these properties and how they are affected by spatio-temporal
variations in population structure and demography. Based on
a gravity coupling model and a time series
susceptible-infected-recovered (TSIR) model for local
dynamics, we propose a metapopulation model for regional
measles dynamics. The model can capture all the major spatio-temporal
properties in pre-vaccination epidemics of measles in
England and Wales.
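For readers unfamiliar with the TSIR framework, a generic and simplified version of the local dynamics and the gravity coupling (the exact parametrization used in the talk may differ) is

    E(I_{j,t+1} \mid \text{past}) = \beta_t\, S_{j,t}\, (I_{j,t} + \iota_{j,t})^{\alpha}, \qquad S_{j,t+1} = S_{j,t} + B_{j,t} - I_{j,t+1},
    \iota_{j,t} \propto N_j^{\tau_1} \sum_{k \ne j} I_{k,t}^{\tau_2} / d_{jk}^{\rho},

where I, S and B are the local numbers of infecteds, susceptibles and births, \iota_{j,t} is the imported infection pressure on town j, N_j its population size, and d_{jk} the distance between towns j and k.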
Nonparametric methods for
inference in the presence of instrumental variables
Joel L. Horowitz, Northwestern University
We suggest two nonparametric approaches, based on kernel
methods and orthogonal series, respectively, to estimating
regression functions in the presence of instrumental
variables. For the first time in this class of problems we
derive optimal convergence rates, and show that they are
attained by particular estimators. In the presence of
instrumental variables the relation that identifies the
regression function also defines an ill-posed inverse
problem, the “difficulty” of which depends on eigenvalues of
a certain integral operator which is determined by the joint
density of endogenous and instrumental variables. We
delineate the role played by problem difficulty in
determining both the optimal convergence rate and the
appropriate choice of smoothing parameter.
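Concretely, with Y = g(X) + U and E(U \mid W) = 0 for an instrument W, the regression function g solves the integral equation

    E(Y \mid W = w) = \int g(x)\, f_{X \mid W}(x \mid w)\, dx,

a Fredholm equation of the first kind; the decay rate of the eigenvalues of the associated integral operator measures the degree of ill-posedness and drives the optimal convergence rates discussed above.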
Semiparametric and nonparametric
estimation and testing
Joel L. Horowitz, Northwestern University
These lectures will present methods for estimating
semiparametric single-index models, estimating nonparametric
additive models with and without a link function, and
testing a parametric model against a nonparametric
alternative. Single-index and additive models are important
ways to achieve dimension reduction in nonparametric
estimation. The lectures will cover theory and applications.
Lecture 1: Semiparametric single-index models
Lecture 2: Nonparametric additive models
Lecture 3: Nonparametric additive models with a link
function
Lecture 4: Testing a parametric model against a
nonparametric alternative
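For reference, the three model classes covered are of the form

    E(Y \mid X) = G(\beta' X) \quad \text{(single-index)},
    E(Y \mid X) = \mu + \sum_{j} f_j(X_j) \quad \text{(additive)},
    E(Y \mid X) = F\Big\{\mu + \sum_{j} f_j(X_j)\Big\} \quad \text{(additive with link } F\text{)},

where G and the f_j (and, in the third case, possibly F) are estimated nonparametrically; each attains one-dimensional nonparametric rates, which is the dimension-reduction point made above.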
Bayesian inference and computation
for the Cox regression model with missing covariates
Ming-Hui Chen, University of Connecticut
Missing covariate data in the Cox model is a
fundamentally important practical problem in biomedical
research. In this talk, I will present necessary and
sufficient conditions for posterior propriety of the
regression coefficients, β, in Cox's partial likelihood, which can be obtained through a gamma process prior for the cumulative baseline hazard and a uniform improper prior for β. The main focus of my talk will
be on how to carry out a very challenging Bayesian
computation that arises from this interesting problem. The
novel Bayesian computational scheme we have developed is
based on the introduction of several latent variables and
the use of the collapsed Gibbs technique of Liu (1994). A
real dataset is presented to illustrate the proposed
methodology. This is joint work with Joseph G. Ibrahim and
Qi-Man Shao.
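For reference, the partial likelihood in question is

    L_p(\beta) = \prod_{i:\, \delta_i = 1} \frac{\exp(x_i'\beta)}{\sum_{j \in R(t_i)} \exp(x_j'\beta)},

where R(t_i) is the risk set at failure time t_i; the propriety question is whether, with missing covariates and a uniform improper prior, the resulting posterior for β is proper, and the conditions presented answer this.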
Joint analysis of a longitudinal latent variable and a survival process with application in
quality of life
Mounir Mesbah, Université Pierre et Marie Curie
A multivariate mixed Rasch model is used to analyse binary responses to questionnaires assessed at several visits. The responses in our model are correlated in two ways: first, at a given visit, the binary responses of a single individual are correlated; second, because they are repeated over the visits, they also become correlated over time. It is, however, well known that a full likelihood analysis for such mixed models
is hampered by the need for numerical integrations. To overcome such integration problems, a generalized estimating equations approach is used, following useful approximations. Fixed effects parameters and variance components are estimated consistently by asymptotically normal statistics. The usefulness of the method is shown using simulations and is illustrated with a complex real dataset from the quality of life field, where missing responses and death can occur during follow-up.
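For orientation, the Rasch model underlying the analysis specifies, for item j answered by subject i at a given visit,

    \Pr(Y_{ij} = 1 \mid \theta_i) = \frac{\exp(\theta_i - b_j)}{1 + \exp(\theta_i - b_j)},

where \theta_i is the subject's latent trait (here a latent quality-of-life level treated as a random effect and repeated over visits) and b_j is the item difficulty; integrating over the random \theta_i is what makes the full likelihood analysis burdensome and motivates the GEE-type approach.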
Spline confidence band and
hypothesis testing of leaf area index trend in East Africa
Lijian Yang, National University of Singapore
Asymptotically exact and conservative confidence bands are obtained for a nonparametric regression function, based on
piecewise constant and piecewise linear polynomial spline
estimation, respectively. Compared to the pointwise
nonparametric confidence interval of Huang (2003), the
confidence bands are inflated only by a factor of {log(n)}^{1/2}, similar to the Nadaraya-Watson confidence bands of Härdle (1989) and the local polynomial bands of Xia (1998) and Claeskens and Van Keilegom (2003). Simulation experiments have provided strong evidence corroborating the asymptotic theory.
Testing against the linear spline confidence band, the
commonly used trigonometric trend is rejected with highly
significant evidence for the Leaf Area Index of Aquatic
Agriculture land, based on the remote sensing data collected
from East Africa.
Small sample issues in microarray
studies: sample size and error rate estimation
Wenjiang Fu, Texas A&M University
Microarray technology has gained increasing popularity.
It provides great opportunities to screen thousands of genes
simultaneously through a small number of samples but also
poses great challenges, such as sample size determination,
misclassification error rate estimation with small samples,
and gene selection, due to the special data structure of
small sample size and high dimensionality.
In this presentation, I will address two aspects of this
general small sample problem. The first topic is the
determination of sample size, where conventional sample size
calculations may not apply. I will introduce a novel
sequential approach, which allows a sample size large enough to make sound decisions yet small enough to keep the studies affordable. The second topic is the
estimation of misclassification error, where currently available methods, such as cross-validation, leave-one-out
bootstrap, .632 bootstrap (Efron 1983) and .632+ bootstrap (Efron
and Tibshirani 1997), suffer from large variability or high
bias. I will propose a novel bootstrap cross-validation (BCV)
method of estimating misclassification error with small
samples. I will demonstrate the above methods through Monte
Carlo simulations and applications to microarray data,
although our methods also apply to other types of data, such
as clinical diagnosis in medical research.
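A minimal sketch of the bootstrap cross-validation idea as I read it from the abstract: refit the classifier on bootstrap resamples and average a cross-validated error over resamples. The classifier, the leave-one-out scheme inside each resample, and all names below are illustrative assumptions, not details of the proposed method.

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.model_selection import LeaveOneOut

    def bcv_error(X, y, n_boot=50, seed=None):
        """Average leave-one-out misclassification error over
        bootstrap resamples of the original sample."""
        rng = np.random.default_rng(seed)
        n = len(y)
        errors = []
        for _ in range(n_boot):
            idx = rng.integers(0, n, size=n)          # bootstrap resample
            Xb, yb = X[idx], y[idx]
            miss = 0
            for train, test in LeaveOneOut().split(Xb):
                clf = KNeighborsClassifier(n_neighbors=3).fit(Xb[train], yb[train])
                miss += int(clf.predict(Xb[test])[0] != yb[test][0])
            errors.append(miss / n)
        return float(np.mean(errors))

Averaging over resamples is intended to reduce the variability that plagues ordinary cross-validation with very small samples.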
This is joint work with Raymond Carroll, Edward
Dougherty, Bani Mallick and Suojin Wang.