Research Topic #4

Development of New Statistical Methods

Research Topic 4, Aim 1: Development of Correction Factors to Enable Fairer Cross-Site Comparisons

Data from the sites will be highly diverse, with different laboratory assays, different databases, different sampling intervals (e.g., every 3 months or every 6 months), and varying quality and validity of clinical assessment. These differences will be carefully characterized and, when possible, the potential error will be estimated. In past multi-center observational studies, correction factors have had to be built into analytic strategies to make one site's CD4 or viral load data comparable to those of another site that might be using a different assay. More modern laboratory assays have reduced, but not eliminated, the need for cross-site correction factors when comparing data. We will investigate laboratory and clinical measurement issues across sites, and even across clinicians within sites, and evaluate the need for data and/or analytic adjustments before data can be compared fairly across sites.

The nonparametric approach of Huang, Jie, Brunelle, and Rocco will also be considered for adjusting lab variables for between-laboratory differences. This approach allows a nonlinear transformation of an entire distribution of lab values and does not assume that a simple lab-specific correction factor is appropriate.
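To illustrate what transforming an entire distribution means, as opposed to applying a single correction factor, the following is a minimal sketch of empirical quantile mapping in Python. It is a generic illustration and is not the procedure of the cited authors. The function names (quantile, interp, build_quantile_map), the grid size, and the idea of having reference values from both labs are assumptions made for this sketch.

```python
from bisect import bisect_right


def quantile(sorted_vals, p):
    """Linearly interpolated quantile of already-sorted values, p in [0, 1]."""
    if len(sorted_vals) == 1:
        return sorted_vals[0]
    pos = p * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def interp(x, xs, ys):
    """Piecewise-linear interpolation of (xs, ys), clamped at both ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x)  # guarantees xs[i - 1] <= x < xs[i]
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def build_quantile_map(source_vals, target_vals, n_grid=101):
    """Return a function mapping a source-lab value to the target-lab scale.

    Assumption: both labs measured comparable populations, so equal
    quantiles of the two distributions correspond to each other.
    """
    src = sorted(source_vals)
    tgt = sorted(target_vals)
    probs = [k / (n_grid - 1) for k in range(n_grid)]
    src_q = [quantile(src, p) for p in probs]
    tgt_q = [quantile(tgt, p) for p in probs]
    return lambda x: interp(x, src_q, tgt_q)
```

Given reference values from two labs, build_quantile_map(lab_a, lab_b) returns a function that places a lab A value at the same quantile of the lab B distribution. The mapping can be nonlinear, and values outside the observed range are clamped to the endpoints.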

Research Topic 4, Aim 2: Development of Causal Inference Methods

We envision many interesting and challenging causal questions arising from these data. For example, one question of interest concerns optimizing the effectiveness of ART procedures. One site may use different treatment procedures (time of initiation, regimens, doses, etc.) than another. It is difficult to estimate the effect of the different treatment procedures on health outcomes because one cannot separate the effect of the treatment procedures from that of the specific site. In other words, certain characteristics of the sites, rather than the procedures, may be causing any observed difference in outcomes. Ideally, one would randomize treatment procedures, but this is not feasible across all the sites proposed in this grant. (One possible direction for statistical methods research would be to study methods and experimental designs for combining observational and randomized trial data.)

In the absence of randomization, the standard approach is to control for as many variables as possible that might explain differences in health outcomes between sites. The idea is that, if there are no unmeasured confounders, one can make a fair causal comparison of the different procedures on particular health outcomes. This is the general analysis plan described throughout this grant application. However, there is no way to know whether one has adjusted for all relevant variables.

An alternative approach would be to perform sensitivity analyses. These analyses assume a given amount of bias due to unobserved variables, carry out the analysis under that assumption, and repeat it over a biologically plausible range of bias values. This will be coupled with aggressive adjustment for confounding through a propensity model that uses all available baseline covariates.
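The repeated-analysis idea can be sketched as follows. This minimal Python example assumes a simple additive bias on the effect scale (for instance, a log hazard ratio) with an unchanged standard error, which is only one of several possible bias models. The function names and all numeric values are hypothetical.

```python
def sensitivity_analysis(estimate, std_error, bias_values, z=1.96):
    """Recompute a bias-adjusted estimate and interval for each assumed bias.

    Assumption for this sketch: unmeasured confounding shifts the estimate
    additively by `bias`, and the standard error is unaffected.
    """
    rows = []
    for bias in bias_values:
        adjusted = estimate - bias
        lower = adjusted - z * std_error
        upper = adjusted + z * std_error
        rows.append({
            "bias": bias,
            "adjusted": adjusted,
            "lower": lower,
            "upper": upper,
            # True when the interval lies entirely on one side of zero.
            "excludes_null": lower > 0 or upper < 0,
        })
    return rows


def tipping_point(rows):
    """Return the first assumed bias whose interval includes the null, else None."""
    for row in rows:
        if not row["excludes_null"]:
            return row["bias"]
    return None


if __name__ == "__main__":
    # Hypothetical log hazard ratio and standard error, for illustration only.
    results = sensitivity_analysis(-0.40, 0.08, [0.0, -0.1, -0.2, -0.3])
    for row in results:
        print(row)
    print("tipping point:", tipping_point(results))
```

The tipping point indicates how large the assumed bias must be before the conclusion changes; whether a bias of that size is biologically plausible is a subject-matter judgment that the code cannot make.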

Research Topic 4, Aim 3: Development of Phylogenetic Analysis Methods

Proper sampling is essential to characterize the molecular epidemiology of HIV. However, sampling frames (complete lists of HIV-positive individuals) are difficult to identify, so most studies use convenience samples, which could result in biased estimates of the distribution of HIV. Shepherd et al. described a stratified cluster sampling design for studying the molecular epidemiology of HIV in Honduras. Their approach was to divide the population into geographical and/or social strata; then, within each stratum, define clusters as groups, locations, or facilities where HIV-positive individuals may be found; next, randomly select clusters within strata; and finally, randomly select individuals within the selected clusters. This approach has the advantage that inference is less subject to bias, yet cost is still kept fairly low.
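The selection steps described above can be sketched in Python as follows. This is a minimal illustration, assuming simple random sampling at both stages and a frame laid out as stratum, then cluster, then a list of individual IDs. The function name, the frame layout, and the weight formula for this equal-probability design are assumptions of the sketch and are not taken from the cited study.

```python
import random


def stratified_cluster_sample(frame, clusters_per_stratum, people_per_cluster, seed=0):
    """Two-stage sample: clusters within strata, then individuals within clusters.

    `frame` maps stratum -> cluster -> list of individual IDs (hypothetical layout).
    Each returned record carries its inverse-probability selection weight.
    """
    rng = random.Random(seed)
    sample = []
    for stratum, clusters in frame.items():
        names = sorted(clusters)
        n_clusters = min(clusters_per_stratum, len(names))
        # Stage 1: simple random sample of clusters within the stratum.
        for name in rng.sample(names, n_clusters):
            members = clusters[name]
            if not members:
                continue  # nothing to sample from an empty cluster
            n_people = min(people_per_cluster, len(members))
            # Weight = 1 / (P(cluster selected) * P(individual selected | cluster)).
            weight = (len(names) / n_clusters) * (len(members) / n_people)
            # Stage 2: simple random sample of individuals within the cluster.
            for person in rng.sample(members, n_people):
                sample.append({
                    "stratum": stratum,
                    "cluster": name,
                    "id": person,
                    "weight": weight,
                })
    return sample
```

Carrying the selection weight with each sampled individual is what later allows design-based estimates; here the weights in a stratum sum to an estimate of that stratum's population size.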

Estimates of proportions and variances under such multistage sampling plans can be computed using standard statistical software (SAS, Stata, SUDAAN, and R). However, it is often of interest to use these samples to construct regional phylogenetic trees. Confidence levels for phylogenetic trees are typically obtained using bootstrap techniques [Felsenstein, 1985; Efron et al., 1996]. However, we are unaware of a method for constructing phylogenetic trees and confidence levels for an evolutionary sequence from data obtained through multistage sampling designs.

We propose extending nonparametric bootstrap techniques to obtain confidence levels for evolutionary sequences when data come from multistage sampling designs. This will require carefully incorporating selection weights in resampling procedures.
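As a rough illustration of resampling that respects a multistage design, the sketch below resamples clusters within strata and then individuals within clusters, carrying each individual's selection weight into the statistic. It is not the proposed method, which remains to be developed. The statistic used here, a weighted subtype proportion, merely stands in for a tree-based quantity such as clade support, and the data layout and function names are hypothetical.

```python
import random


def weighted_proportion(records, label):
    """Weighted share of records whose value equals `label`."""
    total = sum(weight for _, weight in records)
    hits = sum(weight for value, weight in records if value == label)
    return hits / total


def multistage_bootstrap(sample, statistic, n_boot=200, seed=0):
    """Bootstrap replicates of `statistic` under a two-stage resampling scheme.

    `sample` maps stratum -> cluster -> list of (value, weight) pairs
    (hypothetical layout; clusters are assumed to be non-empty).
    """
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_boot):
        records = []
        for clusters in sample.values():
            names = list(clusters)
            # Stage 1: resample clusters with replacement within the stratum.
            for name in rng.choices(names, k=len(names)):
                members = clusters[name]
                # Stage 2: resample individuals with replacement within the cluster.
                records.extend(rng.choices(members, k=len(members)))
        replicates.append(statistic(records))
    return replicates


def percentile_interval(values, alpha=0.05):
    """Simple percentile interval from bootstrap replicates."""
    ordered = sorted(values)
    lo = ordered[int((alpha / 2) * (len(ordered) - 1))]
    hi = ordered[int((1 - alpha / 2) * (len(ordered) - 1))]
    return lo, hi
```

In a phylogenetic application, the statistic would instead rebuild a tree from each replicate and record whether a clade of interest appears, which is considerably more involved than this stand-in.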

The data management and telecommunications approach and technologies that will be employed within CCASAnet were developed and tested for HIV multi-center clinical studies conducted over the past ten years by the HIV research centers of the University of California, San Diego (UCSD). Dr. Masys, the Principal Investigator, served as the leader of the data management groups of the UCSD HIV Neurobehavioral Research Center (HNRC), AntiViral Research Center (AVRC), and UCSD Center for AIDS Research (CFAR) while serving as Director of Biomedical Informatics within the Dean’s Office, UCSD School of Medicine.
