Semi-parametric Methods for Survival
and Longitudinal Data
(26 Feb - 24 Apr 2005)

~ Abstracts ~

A semi-parametric duration model with heterogeneity that does not need to be estimated
Jerry A. Hausman, Massachusetts Institute of Technology

This paper presents a new estimator for the mixed proportional hazard model that allows for a nonparametric baseline hazard and time-varying regressors. In particular, this paper allows for discrete measurement of the durations, as often happens in practice. The integrated baseline hazard and all parameters are estimated at the regular rate √N, where N is the number of individuals. However, no parametric form of the heterogeneity distribution is required, so the specification is fully semi-parametric.
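For orientation, a common way of writing the mixed proportional hazard model with multiplicative unobserved heterogeneity (generic notation, not necessarily the paper's own) is

```latex
\lambda(t \mid x(t), v) \;=\; v\,\lambda_0(t)\,\exp\{x(t)'\beta\},
\qquad v \sim G \ \text{(heterogeneity distribution, left unspecified)},
```

with λ0 the nonparametric baseline hazard and x(t) the time-varying regressors; the estimator targets β and the integrated baseline hazard Λ0(t) = ∫0^t λ0(s) ds without ever estimating G.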


Bayesian methods for survival and longitudinal data
Ming-Hui Chen, University of Connecticut

Survival analysis arises in many fields of study such as medicine, biology, engineering, public health, epidemiology, and economics. This talk aims to provide a comprehensive treatment of Bayesian survival analysis. There will be four 2-hour sessions.

Session 1. Bayesian Semiparametric Survival Models

In this session, we will discuss Bayesian models based on prior processes for the baseline hazard and cumulative hazard, construction of the likelihood function, prior elicitation, and computational algorithms for sampling from the posterior distribution. We will discuss gamma processes, beta processes, and correlated gamma processes. Several examples and case studies will be presented to illustrate the various models.

Session 2. Bayesian Cure Rate Models

Survival models incorporating a cure fraction, often referred to as cure rate models, are becoming increasingly popular in analyzing data from cancer clinical trials. The cure rate model has been used for modeling time-to-event data for various types of cancers, including breast cancer, non-Hodgkin's lymphoma, leukemia, prostate cancer, melanoma, and head and neck cancer, for which a significant proportion of patients are "cured." In this session, we will give a comprehensive Bayesian treatment of the cure rate model and discuss its applications in cancer. We will also present a multivariate extension of the cure rate model, and discuss its computational implementation in detail. In addition, we will discuss informative prior elicitation for this model based on the power prior, and discuss its properties. Several case studies will be presented to illustrate this model.
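Two standard ways of building a cure fraction into a survival model, written here in generic notation for orientation (the session's exact formulations may differ), are

```latex
\text{Mixture cure:}\quad
S_{\mathrm{pop}}(t) \;=\; \pi \;+\; (1-\pi)\,S_u(t),
\qquad \pi = \text{cure fraction},
\\[4pt]
\text{Promotion-time cure:}\quad
S_{\mathrm{pop}}(t) \;=\; \exp\{-\theta F(t)\},
\qquad P(\text{cured}) = \exp(-\theta),
```

where S_u is the survival function of the susceptible subjects and F is a proper distribution function; in either case the population survival function does not go to zero, reflecting the cured proportion.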

Session 3. Joint Models for Longitudinal and Survival Data

Joint models for longitudinal and survival data have recently become quite popular in cancer and AIDS clinical trials, where a longitudinal biologic marker such as CD4 count or immune response to a vaccine can be an important predictor of survival. Often in clinical trials where the primary endpoint is time to an event, patients are also monitored longitudinally with respect to one or more biologic endpoints throughout the follow-up period. This may be done by taking immunologic or virologic measures in the case of infectious diseases, or perhaps with a questionnaire assessing quality of life after receiving a particular treatment. These longitudinal measures are often incomplete or prone to measurement error, yet they may be predictive of survival. Therefore, methods which model the longitudinal and survival components jointly are becoming increasingly essential in cancer and AIDS clinical trials. In this session, we will give a detailed development of joint models for longitudinal and survival data, and discuss Bayesian techniques for fitting such models. Examples from cancer vaccine trials will be presented.
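A generic shared-parameter formulation of such joint models (a sketch, not necessarily the exact specification used in the session) is

```latex
Y_i(t_{ij}) \;=\; m_i(t_{ij}) + \varepsilon_{ij}, \qquad
m_i(t) \;=\; x_i(t)'\beta + z_i(t)'b_i, \qquad b_i \sim N(0, \Sigma),
\\[4pt]
h_i(t) \;=\; h_0(t)\,\exp\{\gamma\, m_i(t) + w_i'\alpha\},
```

so the subject-specific marker trajectory m_i(t) (e.g., CD4 count or immune response) enters the hazard directly, which is what links the longitudinal and survival submodels and propagates measurement error correctly.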

Session 4. Bayesian Model Assessment in Survival Analysis

The scope of Bayesian model comparison is quite broad, and can be investigated via Bayes factors, model diagnostics and goodness-of-fit measures. In many situations, one may want to compare several models which are not nested. Such comparisons are common in survival analysis since, for example, we may want to compare a fully parametric model versus a semiparametric model, or a cure rate model versus a Cox model, and so forth. In this session, we will discuss several methods for Bayesian model comparison, including Bayes factors and posterior model probabilities, the Bayesian Information Criterion (BIC), the Conditional Predictive Ordinate (CPO), and the L measure. Detailed examples using real data will be presented, and computational implementation will be examined.
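As an example of one of these criteria, the Conditional Predictive Ordinate for observation i and its usual summary, the log pseudo-marginal likelihood, are

```latex
\mathrm{CPO}_i \;=\; f\bigl(y_i \mid D_{(-i)}\bigr)
\;=\; \int f(y_i \mid \theta)\,\pi\bigl(\theta \mid D_{(-i)}\bigr)\,d\theta,
\qquad
\mathrm{LPML} \;=\; \sum_{i=1}^{n} \log \mathrm{CPO}_i,
```

and CPO_i can be estimated from posterior draws θ^(1), ..., θ^(M) via the harmonic-mean identity CPO_i ≈ { M^{-1} Σ_m 1/f(y_i | θ^(m)) }^{-1}, so no refitting with observation i deleted is required.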


Aging and degradation models for survival and longitudinal data: state of the art
Mikhail Nikouline, Université Bordeaux 2

We consider here models describing the dependence of the lifetime distribution on time-dependent explanatory variables. Such models are used in survival analysis and reliability theory to study the reliability of bio-technical systems. We shall consider a general approach for constructing efficient statistical models to study aging and degradation problems in different areas such as oncology, demography, biostatistics, survival analysis, etc. We shall discuss problems of statistical modeling and of the choice of design in clinical trials needed to obtain statistical estimators of the main survival characteristics.


Nonparametric maximum likelihood inference for change-point transformation models under right censoring
Michael R. Kosorok, University of Wisconsin-Madison

We consider linear transformation models applied to right censored survival data with a change-point in the regression coefficient based on a covariate threshold. We establish consistency and weak convergence of the nonparametric maximum likelihood estimators. The change-point parameter is shown to be n-consistent, while the remaining parameters are shown to be square-root-of-n consistent. For the special case of the Cox model, our change-point model is similar to the model considered by Pons (2003, Annals of Statistics), except that we allow a change in the intercept after the change-point. Our contribution goes beyond Pons (2003) in three important ways. First, we extend to general transformation models. This results in a significant increase in complexity since estimation of the baseline hazard can no longer be avoided through use of the partial-profile likelihood. Second, we study inference for the model parameters. We show that the procedure is adaptive in the sense that the non-threshold parameters are estimable with the same precision as if the true threshold value were known. Third, we develop a hypothesis test for the existence of a change point. This is quite challenging since some of the model parameters are no longer identifiable under the null hypothesis of no change-point, and the known results for testing when identifiability is lost under the null do not apply. This research is joint work with my student Rui Song.
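For the Cox special case, a change-point model with jumps in both the intercept and the slope at a covariate threshold can be sketched as (a generic rendering of the description above, not the authors' exact notation)

```latex
\lambda(t \mid z, y) \;=\; \lambda_0(t)\,
\exp\bigl\{\beta' z + (\alpha + \eta' z)\,\mathbf{1}\{y > \zeta\}\bigr\},
```

where ζ is the change-point in the threshold covariate y and α is the intercept jump. The null hypothesis of no change-point corresponds to α = 0 and η = 0, under which ζ drops out of the model, which is exactly why standard testing theory does not apply.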


Mixed effects models and longitudinal data analysis
Jiming Jiang, University of California at Davis

Over the past decade there has been an explosion of developments in mixed effects models and their applications. This lecture series concentrates on two major classes of mixed effects models, linear mixed models and generalized linear mixed models, with the intention of offering an up-to-date account of the theory and methods in inference about these models as well as their application in the analysis of longitudinal data.

Lecture 1: Linear mixed models

The first lecture is devoted to linear mixed models. We classify linear mixed models as Gaussian (linear) mixed models and non-Gaussian linear mixed models. There have been extensive studies of estimation in Gaussian mixed models, as well as of tests and confidence intervals. On the other hand, the literature on non-Gaussian linear mixed models is much less extensive, partially because of the difficulties of inference in these models. Yet non-Gaussian linear mixed models are important because, in practice, one can never be sure that normality holds. This lecture offers a systematic approach to inference about non-Gaussian linear mixed models. In particular, we include recently developed methods such as partially observed information, the jackknife in the context of longitudinal mixed models, goodness-of-fit tests, prediction intervals and mixed model selection. These are, of course, in addition to traditional methods such as maximum likelihood and restricted maximum likelihood.

Lecture 2: Generalized linear mixed models

The next lecture deals with generalized linear mixed models. These models may be regarded as extensions of Gaussian mixed models, and are useful in situations where responses are both correlated and discrete or categorical. A special case of generalized linear mixed models, the mixed logistic model, was first introduced by McCullagh and Nelder (1989) for the infamous salamander mating problem. Since then the models have received considerable attention, and various methods of inference have been developed. A major issue with generalized linear mixed models has been the computation of the maximum likelihood estimator. The likelihood function under these models may involve high-dimensional integrals which cannot be evaluated analytically, so the maximum likelihood estimator is difficult to compute. We classify the methods of inference as likelihood-based and non-likelihood-based. The likelihood-based methods focus on developing computational methods for maximum likelihood. The non-likelihood-based approaches try to avoid the computational difficulty; these include approximate inference and the generalized estimating equations approach. Some thoughts about future research as well as open problems will also be discussed.
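To make the integration issue concrete, here is a minimal sketch (not code from the lectures) that approximates the marginal likelihood of a random-intercept logistic model by Gauss-Hermite quadrature and maximizes it numerically; the data are simulated and all names are illustrative.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss
from scipy.optimize import minimize
from scipy.special import expit

# Random-intercept logistic GLMM:
#   logit P(y_ij = 1 | b_i) = beta0 + beta1 * x_ij + b_i,   b_i ~ N(0, sigma^2).
# The marginal likelihood integrates b_i out; here via Gauss-Hermite quadrature.
rng = np.random.default_rng(0)
n_groups, n_per = 50, 6
x = rng.normal(size=(n_groups, n_per))
b = rng.normal(scale=1.0, size=n_groups)
y = rng.binomial(1, expit(-0.5 + 1.0 * x + b[:, None]))

nodes, weights = hermgauss(20)            # nodes/weights for integrals vs exp(-u^2)

def neg_loglik(theta):
    beta0, beta1, log_sigma = theta
    sigma = np.exp(log_sigma)
    ll = 0.0
    for i in range(n_groups):
        eta = beta0 + beta1 * x[i]                     # (n_per,)
        bvals = np.sqrt(2.0) * sigma * nodes           # change of variable b = sqrt(2)*sigma*u
        p = expit(eta[None, :] + bvals[:, None])       # (Q, n_per)
        contrib = np.prod(np.where(y[i] == 1, p, 1 - p), axis=1)
        ll += np.log(np.sum(weights * contrib) / np.sqrt(np.pi))
    return -ll

fit = minimize(neg_loglik, x0=np.zeros(3), method="Nelder-Mead")
print(fit.x)   # estimates of beta0, beta1, log(sigma)
```

With a single scalar random effect the quadrature is cheap; the difficulty the lecture refers to arises when the random-effect vector is high-dimensional and such product-rule quadrature becomes infeasible.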

Lecture 3: Longitudinal data analysis I

One of the applications of mixed effects models is the analysis of longitudinal data. We begin with the Laird and Ware (1982) model, which is a direct application of linear mixed models. In many longitudinal data problems the main interest is the mean function of the responses. When the mean function is linear, a well-known method of estimating the regression coefficients is weighted least squares, or WLS. We introduce a recently developed method known as iterative WLS, which yields asymptotically efficient estimators without assuming a parametric model for the variance-covariance structure of the data. An extension of WLS is generalized estimating equations (GEE), which also applies to cases such as binary responses and counts.

Lecture 4: Longitudinal data analysis II

This lecture discusses semiparametric and nonparametric approaches to the analysis of longitudinal data. In particular, we discuss the so-called varying coefficient model, in which the mean function of the responses is modelled by polynomial splines. Other topics include the extension of the iterative WLS introduced in Lecture 3 to semiparametric models, and models with informative dropouts and missing covariates.


Local polynomial regression analysis of longitudinal data
Kani Chen, Hong Kong University of Science and Technology

We propose a simple and effective local polynomial regression smoother for curve estimation based on longitudinal or clustered data. The method is based on a conservative utilization of the within-cluster dependence, which leads to minimax efficient estimation with a slight modification.
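A bare-bones local linear smoother of the kind referred to here, fitted under working independence (a sketch on simulated clustered data; the proposed estimator's within-cluster weighting is not reproduced):

```python
import numpy as np

def local_linear(x0, x, y, h):
    """Local linear estimate of E[y | x = x0] with a Gaussian kernel,
    ignoring within-cluster correlation (working independence)."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)          # kernel weights
    X = np.column_stack([np.ones_like(x), x - x0])  # local design matrix
    W = np.diag(w)
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    return beta[0]                                  # intercept = fitted value at x0

# hypothetical longitudinal data: 30 subjects, 5 visits each
rng = np.random.default_rng(1)
t = np.tile(np.linspace(0, 1, 5), 30)
subj = np.repeat(rng.normal(scale=0.3, size=30), 5)   # subject-level effects
y = np.sin(2 * np.pi * t) + subj + rng.normal(scale=0.2, size=t.size)

grid = np.linspace(0, 1, 21)
curve = [local_linear(g, t, y, h=0.1) for g in grid]  # estimated mean curve
```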


Nonparametric estimation of the cumulative incidence function for multiple events data
Weijing Wang, National Chiao Tung University, Taiwan

Data with multiple endpoints are commonly seen in medical studies. This talk focuses on nonparametric estimation of the cumulative incidence function for a particular type of failure. Two different data structures are considered. One is the conventional setting of competing risks data and the other is related to the setup of cure models. For each data structure, we demonstrate that the cumulative incidence function can be estimated via different approaches, including the methods of imputation and inverse-probability-weighting and the nonparametric MLE. Under each setting, we show that these approaches are equivalent. We also demonstrate that the complement of the Kaplan-Meier estimator is still a valid approach to estimating the cumulative incidence function if it is applied to the correct data structure. The effect of sufficient follow-up on the estimation of the long-term incidence rate, which involves the tail information, is also discussed.
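For the conventional competing-risks setting, the nonparametric cumulative incidence estimator can be computed directly; the following plain-numpy sketch (simulated data, illustrative names) implements the standard formula F_k(t) = sum over event times t_i <= t of S(t_i-) d_k(t_i)/n(t_i):

```python
import numpy as np

def cuminc(time, status, cause=1):
    """Nonparametric cumulative incidence for `cause` under competing risks.
    status: 0 = censored, k >= 1 = failure from cause k."""
    time = np.asarray(time, float)
    status = np.asarray(status, int)
    tgrid, cif = [], []
    surv, F = 1.0, 0.0                       # overall KM survival, CIF
    for t in np.unique(time[status > 0]):
        at_risk = np.sum(time >= t)
        d_all = np.sum((time == t) & (status > 0))
        d_k = np.sum((time == t) & (status == cause))
        F += surv * d_k / at_risk            # S(t-) * dN_k(t) / Y(t)
        surv *= 1.0 - d_all / at_risk        # overall Kaplan-Meier update
        tgrid.append(t)
        cif.append(F)
    return np.array(tgrid), np.array(cif)

# hypothetical data: two competing causes plus independent censoring
rng = np.random.default_rng(2)
t1 = rng.exponential(1.0, 200)
t2 = rng.exponential(2.0, 200)
c = rng.uniform(0, 3, 200)
time = np.minimum(np.minimum(t1, t2), c)
status = np.where(c <= np.minimum(t1, t2), 0, np.where(t1 <= t2, 1, 2))
tt, F1 = cuminc(time, status, cause=1)
```

Note that 1 minus the all-cause Kaplan-Meier estimator equals the sum of the cause-specific cumulative incidences, which is the sense in which the complement of the Kaplan-Meier estimator remains valid when applied to the appropriate data structure.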


Goodness-of-fit tests for a varying-coefficients model in longitudinal studies
Lixing Zhu, The University of Hong Kong

Varying-coefficient longitudinal regression models are important tools for making inferences in longitudinal analysis. The main objective in these analyses is to evaluate the change of the mean response over time and the effects of the explanatory variables of interest on the mean response. In this article, we construct a residual-marked-process-based test for a varying-coefficient longitudinal model. Two approaches are recommended for determining critical values: an innovation process approach and a Nonparametric Monte Carlo Test (NMCT) approximation. The former uses a martingale transformation to obtain an innovation process and to define a distribution-free test under a composite null model, while the latter simulates the null distribution of the test by Monte Carlo. The NMCT approximation is very easy to implement and overcomes the difficulty that the consistency of the bootstrap approximation is unclear. Applications of the proposed approaches are demonstrated through a simulation and an example in epidemiology.


Lg penalty models: computation and applications
Wenjiang Fu, Texas A & M University

During the past decade, there has been growing interest in Lg penalty models, such as the ridge (L2) penalty and the Lasso (L1) penalty (Tibshirani 1996). These penalty models not only provide techniques to improve prediction for regression models suffering from collinearity, but also offer a means of variable selection. In this series of four lectures, I will present different aspects of Lg penalty models, including algorithms, asymptotics, selection of the tuning parameter, and extension to longitudinal data. I will also introduce some of the most recent developments, including the Bayesian approach, the fused Lasso, etc.

Lecture 1. Lg penalty: variable selection and computation for linear models.

I will introduce the Lg penalty for linear models, present the variable selection property of the Lasso, and provide efficient algorithms for Lg penalty estimators. Since there are different algorithms for the Lasso estimator and its variance, I will compare two of them: the combined quadratic programming method versus the shooting method for the estimator itself, and the modified ridge-type variance versus the Lasso-type variance. I will demonstrate the above algorithms and the shrinkage effect with real data sets in biomedical research.
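A minimal version of the shooting (coordinate-descent) algorithm mentioned above, written for the standard Lasso objective on simulated data (an illustrative sketch, not the lecture's implementation):

```python
import numpy as np

def soft_threshold(z, g):
    return np.sign(z) * np.maximum(np.abs(z) - g, 0.0)

def lasso_shooting(X, y, lam, n_iter=200):
    """Coordinate-descent ('shooting') algorithm minimizing
    (1/2)||y - X b||^2 + lam * ||b||_1.  Columns of X are assumed
    standardized and y centered (intercept handled by centering)."""
    n, p = X.shape
    b = np.zeros(p)
    col_ss = np.sum(X ** 2, axis=0)
    for _ in range(n_iter):
        for j in range(p):
            r_j = y - X @ b + X[:, j] * b[j]          # partial residual for coordinate j
            b[j] = soft_threshold(X[:, j] @ r_j, lam) / col_ss[j]
    return b

# hypothetical example with a sparse true coefficient vector
rng = np.random.default_rng(3)
X = rng.normal(size=(100, 10))
X = (X - X.mean(0)) / X.std(0)
beta = np.array([3.0, -2.0, 0, 0, 1.5, 0, 0, 0, 0, 0])
y = X @ beta + rng.normal(size=100)
y = y - y.mean()
print(lasso_shooting(X, y, lam=20.0).round(2))   # zeros for the inactive coefficients
```

The soft-thresholding step is what produces exact zeros, i.e., the variable-selection property of the Lasso.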

Lecture 2. Selection of tuning parameter and asymptotics.

Since the Lg penalty yields a broad range of estimators, including linear, piecewise linear and nonlinear estimators, the generalized cross-validation (GCV) method in Tibshirani (1996) may not work properly for all of them when selecting the tuning parameter. I will provide a novel method, the nonlinear GCV, for selecting the tuning parameter in linear models. I will also provide asymptotic results for Lg penalty models.
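For reference, the generalized cross-validation criterion referred to has the familiar form

```latex
\mathrm{GCV}(\lambda) \;=\;
\frac{\tfrac{1}{n}\,\|y - \hat{y}_\lambda\|^{2}}
     {\bigl(1 - \mathrm{df}(\lambda)/n\bigr)^{2}},
```

where df(λ) is the effective number of parameters of the penalized fit; one reason it can fail for nonlinear Lg estimators is that the effective degrees of freedom are no longer simply the trace of a fixed hat matrix.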

Lecture 3. Extension to non-Gaussian response and longitudinal studies.

I will motivate the extension of the Lg penalty to generalized linear models and to generalized estimating equations (GEE) in longitudinal studies, and provide theory for the extension to the penalized GEE model. Selection of the tuning parameter is particularly challenging for the penalized GEE model due to the lack of a joint likelihood. I will introduce the quasi-GCV method for selecting tuning parameters in the penalized GEE model. This quasi-GCV method is a natural extension of the above nonlinear GCV and possesses similar properties.

Lecture 4. Recent development in Lg penalty models and related topics.

Recent developments in bioinformatics, especially microarray data analysis, have stimulated growing interest in the variable selection properties of penalty models, particularly the Lasso. The small-n-large-p problem, i.e., small sample size and a large number of genes (independent variables), makes the Bayesian approach attractive. Although penalty models have a Bayesian interpretation, direct conversion of the Lg penalty into priors may be technically difficult and computationally inefficient. I will present some of the most recent developments in Bayesian variable selection with priors based on penalty models: a new family of prior distributions for Bayesian variable selection, which includes the Laplacian prior and the Gaussian prior as special cases. This new family of prior distributions possesses attractive properties, such as sparseness and efficient computation, especially for high-throughput data in bioinformatics. Time permitting, I will also briefly mention some other penalty models, such as the fused Lasso.


Lifetime expectancy regression
Ying Qing Chen, Fred Hutchinson Cancer Research Center

In the statistical analysis of lifetimes, residual life and its characteristics, such as the mean residual life function, have been understudied, although they can be of substantial scientific interest in medical research, for example in treatment efficacy assessment or cost-effectiveness analysis. In this talk, we will discuss the challenges in regression analysis of residual life with censored lifetimes. In particular, we will use the so-called proportional mean residual life model to demonstrate how to handle these challenges and draw appropriate inferences. Extensions of residual life regression will also be discussed.
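The proportional mean residual life model referred to specifies, for the mean residual life function m(t | Z) = E(T - t | T ≥ t, Z),

```latex
m(t \mid Z) \;=\; m_0(t)\,\exp\{\beta' Z\},
```

so covariates act multiplicatively on a baseline mean residual life m_0(t) rather than on the hazard; handling right censoring under this specification is part of what makes inference for β non-standard.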


Analyzing recurrent event data using nonparametric and semiparametric models
Mei-Cheng Wang, Johns Hopkins University

Recurrent events serve as important measurements for evaluating disease progression, health deterioration, or insurance plans in studies in different fields. The intensity function of a recurrent event process is the occurrence probability conditional on the event history. In contrast with the conditional interpretation of the intensity function, the rate function is defined as the occurrence probability unconditional on the event history. In this talk the 'shape' and 'size' parameters of the rate function are introduced to characterize, model and analyze recurrent event data. Particular interest will focus on latent variable models which allow for informative censoring in two different situations: 1) informative censoring as a nuisance, and 2) informative censoring generated mainly or partly by a failure event, where joint modeling of the recurrent event process and the failure time is of interest. Nonparametric and semiparametric methods will be constructed via the estimation of the shape and size parameters in one-sample and regression models. If time allows, I will also briefly discuss related topics such as bivariate recurrent event processes and recurrent longitudinal data.
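The distinction drawn above can be written compactly: with N(t) the counting process of recurrent events and H_{t-} the event history,

```latex
\lambda(t)\,dt \;=\; E\{dN(t) \mid H_{t-}\}
\quad\text{(intensity: conditional on history)},
\qquad
\rho(t)\,dt \;=\; E\{dN(t)\}
\quad\text{(rate: unconditional)},
```

and one plausible reading of the shape/size terminology takes the "size" to be the cumulative rate Λ(τ) = ∫0^τ ρ(u) du over the study period and the "shape" to be the normalized rate ρ(t)/Λ(τ); this is offered only as an orientation, not as the talk's exact definitions.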


Varying coefficient GARCH versus local constant modelling
Vladimir Spokoiny, Weierstrass Institute for Applied Analysis and Stochastics

In this talk we compare the performance of varying coefficient GARCH models with that of simpler models with locally constant volatility. The results indicate that many stylized facts of financial time series, such as long-range dependence, cointegration, etc., can be explained by changes in model parameters. We then apply the procedure to some exchange rate datasets and show that the simpler local constant approach typically delivers better results as far as short-term-ahead forecasting of Value-at-Risk is concerned.
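For concreteness, the two competing descriptions of the conditional variance σ_t² of returns r_t can be sketched as

```latex
\text{GARCH}(1,1):\quad
\sigma_t^2 \;=\; \omega + \alpha\, r_{t-1}^2 + \beta\, \sigma_{t-1}^2,
\qquad\text{vs.}\qquad
\text{local constant:}\quad
\sigma_t^2 \;\approx\; \theta_I \ \text{ for } t \in I,
```

where θ_I is treated as constant over a data-driven interval of homogeneity I, and in the varying-coefficient version the GARCH parameters themselves are allowed to change over time. (The exact specifications used in the talk may differ; this is only an orientation.)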


An Introduction to R: software for statistical modelling and computing
Petra Kuhnert, CSIRO, Australia

This three-day course will provide an overview of R, software for statistical modelling and computing. The course will provide an elementary introduction to the software, introduce participants to the statistical modelling and graphical capabilities of R, and provide an overview of two advanced topics: neural networks, and classification and regression trees.


Semiparametric models in survival analysis and quantile regression
Probal Chaudhuri, Indian Statistical Institute

Many of the popular regression models used in survival analysis, including Cox's proportional hazards model, can be viewed as semiparametric models having some intrinsic monotonicity properties. One is interested in estimating and drawing inference about a finite dimensional Euclidean parameter in such a model in the presence of an infinite dimensional nuisance parameter. These survival analysis models are special cases of the monotone single index model used in econometrics. The use of average derivative quantile regression techniques for parameter estimation in such models will be discussed. In addition to regression models with univariate response and a single index, we will also discuss possible extensions of the methodology to multivariate response and multiple index models.


Mark-specific hazard function modeling with application to HIV vaccine efficacy trials
Ian W. McKeague, Florida State University and Columbia University

Methods for analyzing survival data with discrete causes of failure are well developed. In many applications, however, a continuous mark variable is observed at uncensored failure times, which amounts to a unique cause of failure for each individual, so it is necessary to borrow strength from neighboring observations of the mark. This talk discusses some new non- and semi-parametric models for mark-specific hazard functions. We describe 1) a test of whether a mark-specific relative risk function depends on the mark, and 2) inference for a mark-specific proportional hazards model. An application to data from an HIV vaccine efficacy trial is presented. The efficacy of an HIV vaccine to prevent infection is likely to depend on the genetic variation of the exposing virus, and it is of interest to model such dependence in terms of the divergence of infecting HIV viruses in trial participants from the HIV strain that is contained in the vaccine. We discuss the importance of accounting for such viral divergence to assess vaccine efficacy. The talk is based on joint work with Peter Gilbert and Yanqing Sun.
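A mark-specific proportional hazards model of the kind described can be written, in generic notation, as

```latex
\lambda(t, v \mid z) \;=\; \lambda_0(t, v)\,\exp\{\beta(v)' z\},
```

where v is the continuous mark (e.g., the divergence of the infecting strain from the vaccine strain), λ_0(t, v) is a mark-specific baseline, and the test in 1) asks whether the relative-risk coefficient β(v) actually varies with v.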


Nonparametric estimation of homothetic and homothetically separable functions
Oliver Linton, London School of Economics and Political Science

For vectors x and w, let r(x,w) be a function that can be nonparametrically estimated consistently and asymptotically normally. We provide consistent, asymptotically normal estimators for the functions g and h, where r(x,w)=h[g(x),w], g is linearly homogeneous and h is monotonic in g. This framework encompasses homothetic and homothetically separable functions. Such models reduce the curse of dimensionality, provide a natural generalization of linear index models, and are widely used in utility, production, and cost function applications. Extensions to related functional forms include a generalized partly linear model with unknown link function and endogenous regressors. We provide simulation evidence on the small sample performance of our estimator, and we apply our method to a Chinese production dataset.


Longitudinal Growth Charts
Xuming He, University of Illinois at Urbana-Champaign and National Science Foundation

Growth charts are often more informative when they are customized per subject, taking into account the prior measurements and possibly other covariates of the subject. We study a global semiparametric quantile regression model that has the ability to estimate conditional quantiles without the usual distributional assumptions. The model can be estimated from longitudinal reference data with irregular measurement times and with some level of robustness against outliers, and is also flexible for including covariate information. We propose a rank score test for large sample inference on covariates, and develop a new model assessment tool for longitudinal growth data. Our research indicates that the global model has the potential to be a very useful tool in conditional growth chart analysis. (This talk is based on joint work with Ying Wei at Columbia University.)
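The core estimation device behind such conditional growth charts is quantile regression: for a chosen quantile level τ, the conditional quantile of the response is modelled as a linear (here semiparametric) function of covariates and estimated by minimizing the check loss,

```latex
Q_{Y}(\tau \mid X) \;=\; X'\beta(\tau), \qquad
\hat{\beta}(\tau) \;=\; \arg\min_{b} \sum_{i,j}
\rho_\tau\bigl(Y_{ij} - X_{ij}'b\bigr), \qquad
\rho_\tau(u) = u\,\{\tau - \mathbf{1}(u < 0)\},
```

which requires no distributional assumption on the response; the global semiparametric model in the talk elaborates this basic form to accommodate irregular longitudinal measurement times and prior measurements of the same subject.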


Estimation of density for arbitrarily censored and truncated data
Catherine Huber, Université René Descartes - Paris 5

In his paper entitled "The empirical distribution function with arbitrarily grouped, censored and truncated data" (JRSS B, 38, 1976, pp. 290-295), B. W. Turnbull proposed a general method for nonparametric maximum likelihood estimation of the distribution function in the presence of missing and incomplete data due to grouping, censoring and truncation. His method has since been used by many authors. But, to our knowledge, the consistency of the resulting estimate had never been proved. With Valentin Solev, while he was recently visiting us in Paris, we proved the consistency of Turnbull's NPMLE under appropriate regularity conditions on the censoring, truncation and survival distributions.
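The NPMLE in question is usually computed by a self-consistency (EM) iteration; the following simplified sketch (simulated grouped/interval-censored data, illustrative names) shows the iteration for observations known only to lie in an interval, ignoring truncation and using a user-supplied support grid rather than Turnbull's innermost intervals:

```python
import numpy as np

def self_consistency(L, R, support, n_iter=500, tol=1e-8):
    """Self-consistency iteration for the NPMLE of a distribution when
    observation i is only known to lie in (L_i, R_i]."""
    L, R = np.asarray(L, float), np.asarray(R, float)
    A = (support[None, :] > L[:, None]) & (support[None, :] <= R[:, None])
    p = np.full(len(support), 1.0 / len(support))
    for _ in range(n_iter):
        denom = A @ p                                   # prob. of each observed interval
        p_new = (A * p).T @ (1.0 / denom) / len(L)      # E-step + M-step in one line
        if np.max(np.abs(p_new - p)) < tol:
            return p_new
        p = p_new
    return p

# hypothetical data: lifetimes observed only through a fixed inspection grid
rng = np.random.default_rng(4)
T = rng.weibull(1.5, 200) * 2.0
grid = np.array([0.5, 1.0, 1.5, 2.0, 3.0, np.inf])
R = np.array([grid[np.searchsorted(grid, t)] for t in T])
L = np.array([0.0 if r == grid[0] else grid[np.searchsorted(grid, r) - 1] for r in R])
support = np.array([0.25, 0.75, 1.25, 1.75, 2.5, 4.0])   # one candidate point per interval
p = self_consistency(L, R, support)                      # estimated probability masses
```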


A model selection test for bivariate failure-time data
Xiaohong Chen, New York University

In this paper, we address two important issues in survival model selection for censored data generated by the Archimedean copula family: the method of estimating the parametric copulas, and data reuse. We demonstrate that, for model selection, estimators of the parametric copulas based on minimizing the selection criterion function may be preferred to other estimators. To handle the issue of data reuse, we put model selection in the context of hypothesis testing and propose a simple test for model selection from a finite number of parametric copulas. Results from a simulation study and two empirical applications provide strong support for our theoretical findings.


Diagnostic plots and corrective adjustments for the proportional hazards regression model
Debasis Sengupta, Indian Statistical Institute

There are several diagnostic plots for the proportional hazards regression model for survival data. The focus of this talk is strictly on those plots which diagnose the validity of the proportional hazards assumption, irrespective of the validity of other related assumptions (such as additivity/linearity of the covariate effects). If the proportional hazards assumption does not hold, it may still be possible to use a modified version of the model after suitable adjustment. The set of plots available in the literature, together with some new plots, is examined from this point of view. We consider two specific types of violation: (a) a covariate contributing as a scale factor of the failure time (rather than a scale factor of the hazard rate), and (b) a regression coefficient being time-dependent. It is assumed that the effects of all the covariates except one are proportional on the hazard rate. Simple and intuitive methods of adjusting for the non-proportional effect of a single covariate are then explored and studied via simulations.

The procedures considered here can be organized to form a gateway to the proportional hazards model. The entry could be (i) automatic (if the plots are good), (ii) conditional (if the plots are not good but satisfactory remedial measures are available) or (iii) denied. Other diagnostics for the Proportional Hazards model come into the picture only after automatic or conditional entry has been gained.

The main contribution of this work is in the area of 'conditional' entry.
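As a reminder of the simplest plot in this family, the log-minus-log plot checks proportional hazards for a single categorical covariate by plotting log{-log S(t)} per group: under proportionality the curves differ only by a vertical shift. The sketch below uses simulated data and a hand-rolled Kaplan-Meier estimate (it is not one of the new plots proposed in the talk):

```python
import numpy as np
import matplotlib.pyplot as plt

def kaplan_meier(time, event):
    """Plain Kaplan-Meier estimate: returns event times and survival values."""
    time, event = np.asarray(time, float), np.asarray(event, int)
    ts = np.unique(time[event == 1])
    S, surv = 1.0, []
    for t in ts:
        at_risk = np.sum(time >= t)
        d = np.sum((time == t) & (event == 1))
        S *= 1.0 - d / at_risk
        surv.append(S)
    return ts, np.array(surv)

# hypothetical two-group data with proportional hazards (hazard ratio 0.5)
rng = np.random.default_rng(5)
t0 = rng.exponential(1.0, 150)
t1 = rng.exponential(2.0, 150)
c0 = rng.uniform(0, 3, 150)
c1 = rng.uniform(0, 3, 150)
for tt, cc, lab in [(t0, c0, "group 0"), (t1, c1, "group 1")]:
    obs, ev = np.minimum(tt, cc), (tt <= cc).astype(int)
    ts, S = kaplan_meier(obs, ev)
    keep = (S > 0) & (S < 1)
    plt.step(np.log(ts[keep]), np.log(-np.log(S[keep])), where="post", label=lab)
plt.xlabel("log t")
plt.ylabel("log(-log S(t))")
plt.legend()
plt.show()
```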


Identification and estimation of truncation regression models
Songnian Chen, Hong Kong University of Science and Technology

In this paper we consider nonparametric identification and estimation of truncated regression models in the cross-sectional and panel data settings. We first present various identification results. Our estimators are based on minimizing certain Cramér-von Mises-type distances adjusted for truncation, using one set of identification results. For the cross-sectional case, our estimation procedures overcome certain drawbacks associated with existing estimators. Furthermore, our estimation procedures can be extended to the panel data model with fixed effects.


An extended semiparametric transformation model with non-susceptibility and heteroscedasticity
Chen-Hsin Chen, Institute of Statistical Science, Academia Sinica, Taipei

Semiparametric mixture regression models have been proposed to formulate the probability of susceptibility by a logistic regression and the time to event of the susceptible by Cox's proportional hazards regression. Recently a more general class of semiparametric transformation cure models (Lu and Ying, 2004) was presented, which includes the proportional hazards cure model and the proportional odds cure model as special cases. On the other hand, the heteroscedastic hazards regression model (Hsieh, 2001), which shares a similar representation with transformation models, was developed to tackle crossing survival curves without non-susceptibility. We hence propose a semiparametric heteroscedastic transformation cure model to deal with these two issues simultaneously. Given a specific form of the transformation function, our approach has finite-sample optimality, while this optimality is not attainable in the reduced homoscedastic case of Lu and Ying. We obtain asymptotic properties of the estimators and a closed form for the asymptotic variance-covariance matrix. Simulation studies and a real data analysis will also be discussed. The talk is based on joint work with Chyong-Mei Chen.


First hitting models and threshold regressions
Mei-Ling Ting Lee, Harvard University

The first-hitting time (FHT) model has proved to be useful as an alternative model for time-to-event and survival data. On the basis of the FHT model, we introduce the threshold regression (TR) methodology. The threshold regression model has an underlying latent stochastic process representing a subject’s latent health state. This health status process fluctuates randomly over time until its level reaches a critical threshold, thus defining the outcome of interest. The time to reach the primary endpoint or failure (death, disease onset, etc.) is the time when the latent health status process first crosses a failure threshold level. The effectiveness of threshold regression lies in how initial health status, hazards and the progression of disease are modeled, while taking account of covariates and competing outcomes. The threshold regression model does not require the proportional hazards assumption and hence offers a rich potential for applications.
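In the most common Wiener-process version of the FHT model (used here only to fix ideas), the latent health status and the failure time are

```latex
X(t) \;=\; x_0 + \mu t + \sigma W(t), \qquad
T \;=\; \inf\{t \ge 0 : X(t) \le 0\},
```

so that for negative drift μ the hitting time T has an inverse Gaussian distribution, while for positive drift P(T = ∞) = 1 - exp(-2μ x_0/σ²) > 0, giving a built-in fraction of subjects who never reach the failure threshold. Threshold regression then links the initial status x_0 and drift μ (and possibly σ) to covariates, which is why no proportional hazards assumption is needed.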

In a recent application to environmental research, we consider a retrospective longitudinal study of more than 50,000 US railroad workers tracked from 1959 to 1996. The initial investigation focused on lung cancer death because of a suspected link to diesel exhaust exposure. Based on the intuitive concept that a lung cancer mortality event occurs when the cumulative environmental diesel exposure of a subject first hits a threshold value, threshold regression is found to be effective in providing insights into the process of disease progression. The threshold regression model also allows the survival pattern for each period of exposure to be observed. We show that TR is useful in a competing risks context, which is encountered here because three causes of death are under consideration. We introduce a modified Kaplan-Meier plot that provides new insights into the health effects of diesel exhaust exposure.


Sample size and power of randomized clinical trials
Feifang Hu, University of Virginia

Randomized designs are often used in clinical trials. In the literature, the power and sample size are usually obtained by ignoring the randomness of the allocation in randomized designs. However, when a randomized design is used, the power is a random variable for a fixed sample size n. In this talk, we focus on the (random) power function and the sample size of two-arm (drug versus control) randomized clinical trials. We first give an example where a target power cannot be achieved with high probability when the requisite sample size (based on the formula in the literature) is used. Then we obtain the power function for any given sample size and study the properties of this power function. Based on the power function, a formula for the sample size of randomized designs is derived. This formula is applied to several randomization procedures. We also discuss our finding that response-adaptive designs can be used to reduce the requisite sample size. Some simulation studies are reported.
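The point that power is random under a randomized design is easy to illustrate: condition on the realized allocation (N_A, N_B) produced by complete randomization and compute the usual two-sample z-test power for each realization. The sketch below (illustrative effect size and names, not the talk's example) shows that the nominal 80% power is not attained with certainty even at the textbook sample size:

```python
import numpy as np
from scipy.stats import norm

delta, sigma, alpha = 0.5, 1.0, 0.05      # hypothetical effect size and SD
z_a = norm.ppf(1 - alpha / 2)

def cond_power(n_A, n_B):
    """Power of the two-sample z-test given a realized allocation (n_A, n_B)."""
    se = sigma * np.sqrt(1.0 / n_A + 1.0 / n_B)
    return norm.cdf(delta / se - z_a)

# textbook per-arm sample size for 80% power under a deterministic 1:1 split
n_arm = int(np.ceil(2 * (z_a + norm.ppf(0.8)) ** 2 * sigma ** 2 / delta ** 2))
n = 2 * n_arm
print("power under exact balance:", round(cond_power(n_arm, n_arm), 3))

# complete randomization: N_A ~ Binomial(n, 1/2), so power is a random variable
rng = np.random.default_rng(6)
n_A = rng.binomial(n, 0.5, size=100_000)
pw = cond_power(n_A, n - n_A)
print("P(conditional power >= 0.80):", np.mean(pw >= 0.80).round(3))
```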


Estimating features of a distribution from binomial data
Daniel L. McFadden, University of California at Berkeley

A statistical problem that arises in several fields is that of estimating the features of an unknown distribution, which may be conditioned on covariates, using a sample of binomial observations on whether draws from this distribution exceed threshold levels set by experimental design. Applications include bioassay and destructive duration analysis. The empirical application we consider is referendum contingent valuation in resource economics, where one is interested in features of the distribution of values (willingness to pay) placed by consumers on a public good such as endangered species. Sample consumers are asked whether they favor a referendum that would provide the good at a cost specified by experimental design. This paper provides estimators for moments and quantiles of the unknown distribution in this problem under both nonparametric and semiparametric specifications.


An old-new family of multivariate distributions for left truncated and right censored data
Shulamith Gross, National Science Foundation

Catherine Huber-Carol of Université Paris V and I carried out this work in the Spring of 2000 while I was at the Biostatistics Department of Université Paris V. The family is a semi-parametric family which generalizes the Cox model in one dimension. It does not require new software when used to model left truncated and/or right censored data. It includes both purely discrete and purely continuous distributions. Its main advantage over copula-based multivariate distributions is its multidimensional modeling of the covariance.


Measles metapopulation dynamics: a gravity model for epidemiological coupling and dynamics
Xia Yingcun, National University of Singapore

Infectious diseases provide a particularly clear illustration of the spatio-temporal underpinnings of consumer-resource dynamics. The paradigm is provided by extremely contagious, acute, immunizing childhood infections. Partially synchronized, unstable oscillations are punctuated by local extinctions. This, in turn, can result in spatial differentiation in the timing of epidemics and, depending on the nature of spatial contagion, may result in travelling waves. Measles is one of the few systems documented well enough to reveal all of these properties and how they are affected by spatio-temporal variations in population structure and demography. Based on a gravity coupling model and a time series susceptible-infected-recovered (TSIR) model for local dynamics, we propose a metapopulation model for regional measles dynamics. The model can capture all the major spatio-temporal properties of pre-vaccination measles epidemics in England and Wales.


Nonparametric methods for inference in the presence of instrumental variables
Joel L. Horowitz, Northwestern University

We suggest two nonparametric approaches, based on kernel methods and orthogonal series, respectively, to estimating regression functions in the presence of instrumental variables. For the first time in this class of problems we derive optimal convergence rates, and show that they are attained by particular estimators. In the presence of instrumental variables the relation that identifies the regression function also defines an ill-posed inverse problem, the “difficulty” of which depends on eigenvalues of a certain integral operator which is determined by the joint density of endogenous and instrumental variables. We delineate the role played by problem difficulty in determining both the optimal convergence rate and the appropriate choice of smoothing parameter.
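The ill-posed inverse problem referred to arises because, with Y = g(X) + U and E(U | W) = 0 for instrument W, the identifying relation is an integral equation of the first kind,

```latex
E(Y \mid W = w) \;=\; \int g(x)\, f_{X \mid W}(x \mid w)\, dx,
```

so recovering g requires inverting the conditional-expectation operator; the eigenvalues of this operator, determined by the joint density of X and W, decay to zero and govern both the attainable convergence rate and the amount of regularization (smoothing) needed.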


Semiparametric and nonparametric estimation and testing
Joel L. Horowitz, Northwestern University

These lectures will present methods for estimating semiparametric single-index models, estimating nonparametric additive models with and without a link function, and testing a parametric model against a nonparametric alternative. Single-index and additive models are important ways to achieve dimension reduction in nonparametric estimation. The lectures will cover theory and applications.

Lecture 1: Semiparametric single-index models
Lecture 2: Nonparametric additive models
Lecture 3: Nonparametric additive models with a link function
Lecture 4: Testing a parametric model against a nonparametric alternative


Bayesian inference and computation for the Cox regression model with missing covariates
Ming-Hui Chen, University of Connecticut

Missing covariate data in the Cox model is a fundamentally important practical problem in biomedical research. In this talk, I will present necessary and sufficient conditions for posterior propriety of the regression coefficients β in Cox's partial likelihood, which can be obtained through a gamma process prior for the cumulative baseline hazard and a uniform improper prior for β. The main focus of my talk will be on how to carry out the very challenging Bayesian computation that arises from this interesting problem. The novel Bayesian computational scheme we have developed is based on the introduction of several latent variables and the use of the collapsed Gibbs technique of Liu (1994). A real dataset is presented to illustrate the proposed methodology. This is joint work with Joseph G. Ibrahim and Qi-Man Shao.


Joint analysis of a longitudinal latent variable and a survival process with application to quality of life
Mounir Mesbah, Université Pierre Et Marie Curie

A multivariate mixed Rasch model is used to analyse binary responses from questionnaires administered at several visits. The responses in our model are correlated in two ways: first, at a given visit, the binary responses of a single individual are correlated; second, because the responses are repeated over visits, they are also correlated over time. It is, however, well known that a full likelihood analysis for such mixed models is hampered by the need for numerical integration. To overcome such integration problems, a generalized estimating equations approach is used, based on useful approximations. Fixed effects parameters and variance components are estimated consistently by asymptotically normal statistics. The usefulness of the method is shown using simulations and is illustrated with complex real data from the quality-of-life field, where missing responses and deaths can occur during follow-up.
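In the standard Rasch form underlying this kind of model (written generically; the talk's multivariate longitudinal version adds further random effects and correlation across visits), the probability that subject i endorses item j at visit t is

```latex
P\bigl(Y_{ijt} = 1 \mid \theta_{it}\bigr) \;=\;
\frac{\exp(\theta_{it} - \delta_j)}{1 + \exp(\theta_{it} - \delta_j)},
```

where θ_{it} is the subject's latent trait (e.g., quality of life) at visit t and δ_j is the item difficulty; marginalizing over the random θ's is what produces the intractable integrals mentioned above.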


Spline confidence band and hypothesis testing of leaf area index trend in East Africa
Lijian Yang, National University of Singapore

Asymptotically exact and conservative confidence bands are obtained for a nonparametric regression function, based on piecewise constant and piecewise linear polynomial spline estimation, respectively. Compared to the pointwise nonparametric confidence interval of Huang (2003), the confidence bands are inflated only by a factor of {log(n)}^{1/2}, similar to the Nadaraya-Watson confidence bands of Härdle (1989) and the local polynomial bands of Xia (1998) and Claeskens and Van Keilegom (2003). Simulation experiments provide strong evidence that corroborates the asymptotic theory. Testing against the linear spline confidence band, the commonly used trigonometric trend is rejected with highly significant evidence for the Leaf Area Index of Aquatic Agriculture land, based on remote sensing data collected from East Africa.


Small sample issues in microarray studies: sample size and error rate estimation
Wenjiang Fu, Texas A & M University

Microarray technology has gained increasing popularity. It provides great opportunities to screen thousands of genes simultaneously through a small number of samples but also poses great challenges, such as sample size determination, misclassification error rate estimation with small samples, and gene selection, due to the special data structure of small sample size and high dimensionality.

In this presentation, I will address two aspects of this general small sample problem. The first topic is the determination of sample size, where conventional sample size calculations may not apply. I will introduce a novel sequential approach, which allows a sample size large enough to make sound decisions and yet small enough to keep the studies affordable. The second topic is the estimation of misclassification error, where currently available methods, such as cross-validation, the leave-one-out bootstrap, the .632 bootstrap (Efron 1983) and the .632+ bootstrap (Efron and Tibshirani 1997), suffer from large variability or high bias. I will propose a novel bootstrap cross-validation (BCV) method for estimating misclassification error with small samples. I will demonstrate the above methods through Monte Carlo simulations and applications to microarray data, although our methods also apply to other types of data, such as clinical diagnosis in medical research.
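A rough sketch of the bootstrap cross-validation idea as described above (simulated two-class data; the classifier, fold number and all names are illustrative choices, not the authors'):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(7)
n, p = 30, 50                                  # small n, large p
X = rng.normal(size=(n, p))
y = np.repeat([0, 1], n // 2)
X[y == 1, :3] += 1.0                           # weak signal in the first 3 features

def bcv_error(X, y, B=100, k=5):
    """Average k-fold CV misclassification error over B bootstrap resamples."""
    errs = []
    for _ in range(B):
        idx = rng.integers(0, len(y), len(y))            # bootstrap resample
        if np.min(np.bincount(y[idx], minlength=2)) < k:
            continue                                     # keep folds stratifiable
        clf = KNeighborsClassifier(n_neighbors=3)
        acc = cross_val_score(clf, X[idx], y[idx], cv=k)  # CV within the resample
        errs.append(1.0 - acc.mean())
    return float(np.mean(errs))

print("BCV misclassification estimate:", round(bcv_error(X, y), 3))
```

Averaging cross-validation over bootstrap resamples is intended to reduce the variability that plain cross-validation suffers from with such small samples.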

This is joint work with Raymond Carroll, Edward Dougherty, Bani Mallick and Suojin Wang.
