Journal of Clinical Epidemiology 57 (2004) 1223–1231

Methods to assess intended effects of drug treatment in observational studies are reviewed

Olaf H. Edwin P. Bruce M. Diederik E. Sean D. Bruno H.Ch. Hubert G.M. A. de

a Department of Pharmacoepidemiology and Pharmacotherapy, Utrecht Institute of Pharmaceutical Sciences (UIPS), Utrecht University, Sorbonnelaan 16, 3584 CA Utrecht, the Netherlands
b Centre for Biostatistics, Utrecht University, Utrecht, the Netherlands
c Cardiovascular Health Research Unit, Medicine, Health Services, and Epidemiology, University of Washington, Seattle, WA, USA
d Julius Centre for Health Sciences and Primary Care, Utrecht Medical Centre (UMC), Utrecht, the Netherlands
e Departments of Pharmacy and Health Services, University of Washington, Seattle, WA, USA
f Department of Epidemiology and Biostatistics, Erasmus University Rotterdam, Rotterdam, the Netherlands

Accepted 30 March 2004

Background and objective: To review methods that seek to adjust for confounding in observational studies when assessing intended drug effects.
Methods: We reviewed the statistical, economic, and medical literature on the development, comparison, and use of methods adjusting for confounding.
Results: In addition to standard statistical techniques of (logistic) regression and Cox proportional hazards regression, alternative
methods have been proposed to adjust for confounding in observational studies. A first group of methods focuses on the main problem of nonrandomization by balancing treatment groups on observed covariates: selection, matching, stratification, multivariate confounder score, and propensity score methods, of which the latter can be combined with stratification or various matching methods. Another group of methods looks for variables to be used like randomization in order to adjust also for unobserved covariates: instrumental variable methods, two-stage least squares, and the grouped-treatment approach. Identifying these variables is difficult, however, and assumptions are strong.
Sensitivity analyses are useful tools in assessing the robustness and plausibility of the estimated treatment effects to variations in assumptions about unmeasured confounders.
Conclusion: In most studies regression-like techniques are routinely used for adjustment for confounding, although alternative methods
are available. More complete empirical evaluations comparing these methods in different situations are needed.
© 2004 Elsevier Inc. All rights reserved.
Keywords: Review; Confounding; Observational studies; Treatment effectiveness; Intended drug effects; Statistical methods

* Fax: +31 30 253 9166.
0895-4356/04/$ – see front matter
O.H. Klungel et al. / Journal of Clinical Epidemiology 57 (2004) 1223–1231

1. Introduction

In the evaluation of intended effects of drug therapies, well-conducted randomized controlled trials (RCTs) have been widely accepted as the scientific standard. The key component of RCTs is the randomization procedure, which allows us to focus on only the outcome variable or variables in the different treatment groups in assessing an unbiased treatment effect. Because adequate randomization will assure that treatment groups will differ on all known and unknown prognostic factors only by chance, probability theory can easily be used in making inferences about the treatment effect in the population under study (confidence intervals, significance). Proper randomization should remove all kinds of potential selection bias, such as physician preference for giving the new treatment to selected patients or patient preference for one of the treatments in the trial. Randomization does not assure equality on all prognostic factors in the treatment groups, especially with small sample sizes, but it assures that confidence intervals and P-values are valid by using probability theory.

There are settings where a randomized comparison of treatments may not be feasible due to ethical, economic, or other constraints. Also, RCTs usually exclude particular groups of patients (because of age, other drug usage, or noncompliance); are mostly conducted under strict, protocol-driven conditions; and are generally of shorter duration than the period that drugs are used in clinical practice. Thus, RCTs typically provide evidence of what can be achieved with treatments under the controlled conditions in selected groups of patients for a defined period of treatment.

The main alternatives are observational studies. Their validity for assessing intended effects of therapies has long been debated and remains controversial. The recent example of the potential cardiovascular risk reducing effects of hormone replacement therapy (HRT) illustrates this controversy. Most observational studies indicated that HRT reduces the risk of cardiovascular disease, whereas RCTs demonstrated that HRT increases cardiovascular risk. The main criticism of observational studies is the absence of a randomized assignment of treatments, with the result that uncontrolled confounding by unknown, unmeasured, or inadequately measured covariates may provide an alternative explanation for the treatment effect.

Along with these criticisms, many different methods have been proposed in the literature to assess treatment effects in observational studies. With all these methods, the main objective is to deal with the potential bias caused by the nonrandomized assignment of treatments, a problem also known as confounding.

Here we review existing methods that seek to achieve valid and feasible assessment of treatment effects in observational studies.

2. Design for observational studies

A first group of methods of dealing with potential bias following from nonrandomized observational studies is to narrow the treatment and/or control group in order to create more comparable groups on one or more measured characteristics. This can be done by selection of subjects or by choosing a specific study design. These methods can also be seen as only a first step in removing bias, in which case further reduction of bias has to be attained by means of data-analytical techniques.

2.1. Historical controls

Before the introduction and acceptance of the RCT as the gold standard for assessing the intended effect of treatments, it was common to compare the outcome of treated patients with the outcome of historical controls (patients previously untreated or otherwise treated). An example of this method can be found in Kalra et al. The authors assessed the rates of stroke and bleeding in patients with atrial fibrillation receiving warfarin anticoagulation therapy in general medical clinics and compared these with the rates of stroke and bleeding among similar patients with atrial fibrillation who received warfarin in an RCT.

Using historical controls as a comparison group is in general a problematic approach, because the factor time can play an important role. Changes of the characteristics of a general population or subgroup over time are not uncommon. Furthermore, there may exist differences in population definitions between different research settings.

2.2. Candidates for treatment

If current treatment guidelines exist, the comparison between the treated and the untreated group can be improved by choosing for the untreated group only those subjects who are candidates for the treatment under study according to these guidelines. As a preliminary selection, this method was used in a cohort study to estimate the effect of drug treatment of hypertension on the incidence of stroke in the general population by selecting candidates on the basis of their blood pressure and the presence of other cardiovascular risk factors. The selection of a cohort of candidates for treatment can also be conducted by a panel of physicians after presenting them with the clinical characteristics of the patients in the study.

2.3. Comparing treatments for the same indication

When different classes of drugs, prescribed for the same indication, have to be studied, at least some similarity in prognostic factors between treatment groups occurs naturally. This strategy was used in two case–control studies to compare the effects of different antihypertensive drug therapies on the risks of myocardial infarction and ischemic stroke. Only patients who used antihypertensive drugs for the indication hypertension were included in these studies (and some subgroups that had other indications, such as angina, for drugs that can be used to treat high blood pressure were also removed).

2.4. Case–crossover and case–time–control design

The use of matched case–control (case–referent) studies when the occurrence of a disease is rather rare is a well-known research design in epidemiology. This type of design can also be adopted when a strong treatment effect is suspected or when a cohort is available from which the subjects are selected (nested case–control study). Variations of this design have been proposed to control for confounding due to differences between exposed and unexposed patients. One such variant is the case–crossover study, in which event periods are compared with control periods within cases of patients who experienced an event. This study design may avoid bias resulting from differences between exposed and nonexposed patients, but variations in the underlying disease state within individuals could still confound the association between treatment and outcome. An extension of this design is the case–time–control design, which also takes into account changes of exposure levels over time. With this design and with certain assumptions, confounding due to time trends in exposure can be removed, but variations in the severity of disease over time within individuals, although probably correlated with exposure levels, cannot be controlled. In a study comparing the effect of high and moderate β-agonist use on the risk of fatal or near-fatal asthma attacks, the odds ratio (OR) from a case–time–control analysis controlling for time trends in exposure turned out to be much lower (OR = 1.2, 95% confidence interval [CI] = 0.5–3.0) than in a conventional case–control analysis (OR = 3.1, 95% CI = 1.8–5.4).

Advantages of these designs, in which each subject is its own control, are the considerably reduced intersubject variability and the exclusion of alternative explanations from possible confounders. These methods are, on the other hand, of limited use, because for only some treatments can the outcome be measured at both the control period and the event period while excluding possible carryover effects.

3. Data-analytical techniques

Another group of bias reducing methods are the data-analytical techniques, which can be divided into model-based techniques (regression-like methods) and methods without underlying model assumptions (stratification and matching).

3.1. Stratification and matching

Intuitive and simple methods to improve the comparison between treatment groups in assessing treatment effects are the techniques of stratification (subclassification) and matching on certain covariates as a data-analytical technique. The limitations and advantages of these methods are in general the same. Advantages are (i) clear interpretation and communication of results, (ii) direct warning when treatment groups do not adequately overlap on used covariates, and (iii) no assumptions about the relation between outcome and covariates (e.g., linearity). The main limitation of these techniques is that in general only one or two covariates or rough strata or categories are possible. More covariates will easily result in many empty strata in case of stratification and many mismatches in case of matching. Another disadvantage is that continuous variables have to be classified, using (mostly) arbitrary criteria.

These techniques can easily be combined with methods like propensity scores and the multivariate confounder score, as will be discussed below, using the advantages of clear interpretation and absence of assumptions about functional relationships.

3.2. Asymmetric stratification

A method found in the literature that is worth mentioning is asymmetric stratification. Compared to cross-stratification of more covariates, in this method each stratum of the first covariate is subdivided by the covariate that has the highest correlation with the outcome within that stratum. For instance, men are subdivided on the existence of diabetes mellitus because of the strongest relationship with the risk of a stroke, and women are subdivided by the history of a previous cardiovascular disease. By pooling all treatment effects in the strata in the usual way, a corrected treatment effect can be calculated. Although by this method more covariates can be handled than with normal stratification, most of them will be only partly used. We are unaware of any medical study in which this method has been used.

3.3. Common multivariable statistical techniques

Compared to selection, restriction, stratification, or matching, more advanced multivariable statistical techniques have been developed to reduce bias due to differences in prognosis between treatment groups in observational studies. By assessing a model with outcome as the dependent variable and type of treatment as the independent variable of interest, many prognostic factors can be added to the analysis to adjust the treatment effect for these confounders. Well known and frequently used methods are multivariable linear regression, logistic regression, and Cox proportional hazards regression (survival analysis). The main advantage over the earlier mentioned techniques is that more prognostic variables, quantitative and qualitative, can be used for adjustment, due to a model that is imposed on the data. In these models, too, the number of subjects or the number of events puts a restriction on the number of covariates; a ratio of 10–15 subjects or events per independent variable is mentioned in the literature.

An important disadvantage of these techniques when used for adjusting a treatment effect for confounding is the danger of extrapolations when the overlap on covariates between treatment groups is too limited. While matching or stratification gives a warning or breaks down, regression analysis will still compute coefficients. Mainly when two or more covariates are used, a check on adequate overlap of the joint distributions of the covariates will seldom be performed. The use of a functional form of the relationship between outcome and covariates is an advantage for dealing with more covariates, but has its drawback, mainly when treatment groups have different covariate distributions. In that case, the results are heavily dependent on the chosen relationship (e.g., linearity).

3.4. Propensity score adjustment

An alternative way of dealing with confounding caused by nonrandomized assignment of treatments in cohort studies is the use of propensity scores, a method developed by Rosenbaum and Rubin. D'Agostino found that "the propensity score for an individual, defined as the conditional probability of being treated given the individual's covariates, can be used to balance the covariates in observational studies, and thus reduce bias." In other words, by this method a collection of covariates is replaced by a single covariate, being a function of the original ones. For an individual $i$ ($i = 1, \ldots, n$) with vector $x_i$ of observed covariates,
the propensity score is the probability $e(x_i)$ of being treated ($Z_i = 1$) versus not being treated ($Z_i = 0$):

$$e(x_i) = \Pr(Z_i = 1 \mid X_i = x_i)$$

where it is assumed that the $Z_i$ are independent, given the $X$'s.

By using logistic regression analysis, for instance, for every subject a probability (propensity score) is estimated that this subject would have been treated, on the basis of the measured covariates. Subjects in treatment and control groups with (nearly) equal propensity scores will tend to have the same distributions of the covariates used and can be considered similar. Once a propensity score has been computed, this score can be used in three different ways to adjust for the uncontrolled assignment of treatments: (i) as a matching variable, (ii) as a stratification variable, and (iii) as a continuous variable in a regression model (covariance adjustment). Examples of these methods can be found in two studies of the effect of early statin treatment on the short-term risk of death.

The most preferred methods are stratification and matching, because with only one variable (the propensity score) the disadvantages noted in section 3.1 disappear and the clear interpretation and absence of model-based adjustments remain as the main advantages. When the propensity score is classified into quintiles or deciles, a stratified analysis on these strata is the simplest to adopt. Within these classes, most of the bias due to the measured confounders disappears. Matching, on the other hand, can be much more laborious because of the continuous scale of the propensity score. Various matching methods have been proposed. In all these methods, an important role is given to the distance matrix, of which the cells are most often defined as simply the difference in propensity score between treated and untreated patients. A distinction can be made between pair-matching (one treated to one untreated patient) and matching with multiple controls (two, three, or four). The latter method should be used when the number of untreated patients is much greater than the number of treated patients; an additional gain in bias reduction can be reached when a variable number per pair, instead of a fixed number, is used. Another distinction can be made between greedy methods and optimal methods. A greedy method selects at random a treated patient and looks for an untreated patient with the smallest distance to form a pair. In subsequent steps, all other patients are considered for which a match can be made within a defined maximum distance. An optimal method, on the other hand, takes the whole distance matrix into account to look for the smallest total distance between all possible pairs. An optimal method combined with a variable number of controls should be the preferred method.

The method of propensity scores was evaluated in a simulation study, and it was found that the bias due to omitted confounders was of similar magnitude as for regression adjustment. The bias due to misspecification of the propensity score model was, however, smaller than the bias due to misspecification of the multivariable regression model. Therefore, propensity score adjustment is less sensitive to assumptions about the functional form of the association of a particular covariate with the outcome (e.g., linear or quadratic). Recently, the propensity score method was compared to logistic regression in a simulation study with a low number of events and multiple confounders. With respect to the sensitivity to model misspecification (robustness) and empirical power, the authors found the propensity score method to be superior overall. With respect to the empirical coverage probability, bias, and precision, they found the propensity score method to be superior only when the number of events per confounder was low (say, 7 or less). When there were more events per confounder, logistic regression performed better on the criteria of bias and coverage probability.

3.5. Multivariate confounder score

The multivariate confounder score was suggested by Miettinen as a method to adjust for confounding in case–control studies. Although Miettinen did not specifically propose this method to adjust for confounding in studies of intended effects of treatment, the multivariate confounder score is very similar to the propensity score, except that the propensity score is not conditional on the outcome of interest, whereas the multivariate confounder score is conditional on not being a case.

The multivariate confounder score has been evaluated for validity. Theoretically and in simulation studies, this score was found to exaggerate significance, compared to the propensity score. The point estimates in these simulations were, however, similar for the propensity score and the multivariate confounder score.

3.6. Instrumental variables

A technique widely used in econometrics, but not yet generally applied in medical research, is the use of instrumental variables (IV). This method can be used for the estimation of treatment effects (the effect of treatment on the treated) in observational studies as an alternative to making causal inferences in RCTs. In short, an instrumental variable is an observable factor associated with the actual treatment but not directly affecting outcome. Unlike standard regression models, two equations are needed to capture these relationships:

$$D_i = \alpha_0 + \alpha_1 Z_i + \nu_i$$
$$Y_i = \beta_0 + \beta_1 D_i + \varepsilon_i$$

where $Y_i$ is outcome, $D_i$ is treatment, $Z_i$ is the instrumental variable or assignment, and $\alpha_1 \neq 0$. Both treatment $D$ and assignment $Z$ can be either continuous or dichotomous. In case of a dichotomous $D$, the first equation can be written as $D_i^* = \alpha_0 + \alpha_1 Z_i + \nu_i$, where $D_i^*$ is a latent index ($D_i^* > 0 \rightarrow D_i = 1$; otherwise $D_i = 0$).
By the first equation it is explicitly expressed that it is unknown how treatments are assigned (at least we know it was not random) and that we would like to explain why one subject is treated and the other is not by a variable $Z$. Substituting the first equation into the second gives:

$$Y_i = (\beta_0 + \beta_1\alpha_0) + \beta_1\alpha_1 Z_i + (\beta_1\nu_i + \varepsilon_i)$$

The slope $\beta_1\alpha_1$ can be estimated by least squares regression and is, when $Z$ is dichotomous, the difference in outcome between $Z = 0$ and $Z = 1$ (i.e., the intention-to-treat estimator). In order to estimate the direct treatment effect $\beta_1$ of treatment $D$ on outcome $Y$, this estimator $\beta_1\alpha_1$ must be divided by $\alpha_1$, the effect of $Z$ on $D$ from the first equation. As an illustration, it can be seen that in case of a perfect instrument (e.g., random assignment), a perfect relationship exists between $Z$ and $D$ and the parameter $\alpha_1 = 1$, in which case the intention-to-treat estimator and the instrumental variable estimator coincide. By using two equations to describe the problem, the implicit but important assumption is made that $Z$ has no effect on outcome $Y$ other than through its effect on treatment $D$ ($\operatorname{cov}[Z_i, \varepsilon_i] = 0$). Other assumptions are that $\alpha_1 \neq 0$ and that there is no subject $i$ "who does the opposite of its assignment". This is illustrated in the following example.

One of the earliest examples of the use of instrumental variables (simultaneous equations) in medical research was in the study of Permutt and Hebel, where the effect of smoking on birth weight was studied. The treatment consisted of encouraging pregnant women to stop smoking. The difference in mean birth weight between the treatment groups, the intention-to-treat estimator ($\beta_1\alpha_1$), was found to be 92 g, whereas the difference in mean cigarettes smoked per day was −6.4. This leads to an estimated effect $\beta_1$ of $92/(-6.4) \approx -14.4$, that is, an increase of roughly 15 g in birth weight for every cigarette per day smoked less. The assumption that the encouragement to stop smoking ($Z$) does not affect birth weight ($Y$) other than through smoking behavior seems plausible. Also the assumption that there is no woman who did not stop smoking because she was encouraged to stop is probably fulfilled.

Another example of the use of an instrumental variable can be found in the study of McClellan et al., where the effect of cardiac catheterization on mortality was assessed. The difference in distance between the patient's home and the nearest hospital that performed cardiac catheterizations and the nearest hospital that did not perform this procedure was used as an instrumental variable. Patients with a relatively small difference in distance to both types of hospitals (<2.5 miles) did not differ from patients with a larger difference in distance to both types of hospitals (≥2.5 miles) with regard to observed characteristics such as age, gender, and comorbidity; however, patients who lived relatively closer to a hospital that performed cardiac catheterizations more often received this treatment (26%) compared to patients who lived farther away (20%). Thus, the differential distance affected the probability of receiving cardiac catheterization, whereas it could reasonably be assumed that differential distance did not directly affect mortality.

As stated above, the main limitation of instrumental variables estimation is that it is based on the assumption that the instrumental variable only affects outcome by being a predictor for the treatment assignment and is no direct predictor for the outcome (exclusion restriction). This assumption is difficult to fulfill; more important, it is practically untestable. Another limitation is that the treatment effect may not be generalizable to the population of patients whose treatment status was not determined by the instrumental variable. This problem is similar to that seen with RCTs, where estimated treatment effects may not be generalizable to a broader population. Finally, when variation in the likelihood of receiving a particular therapy is small between groups of patients based on an instrumental variable, differences in outcome due to this differential use of the treatment may be very small and, hence, difficult to assess.

3.7. Simultaneous equations and two-stage least squares

The method just described as instrumental variables is in fact a simple example of the more general methods of simultaneous equations estimation, widely used in economics and econometrics. When there are only two simultaneous equations and regression analysis is used, this method is also known as two-stage least squares (TSLS). In the first stage, treatment $D$ is explained by one or more variables that do not directly influence the outcome variable $Y$. In the second stage, this outcome is explained by the predicted probability of receiving a particular treatment, which is adjusted for measured and unmeasured covariates. An example of this method is its use to assess the effects of parental drinking on the behavioral health of children. Parental drinking (the treatment) is not randomized, is probably associated with unmeasured factors (e.g., parental skills), and is estimated in the first stage by exogenous or instrumental variables that explain and constrain parents' drinking behavior (e.g., price, number of relatives drinking).

Because the method of simultaneous equations and two-stage least squares covers the technique of instrumental variables, the same assumptions and limitations can be mentioned here. We have chosen to elaborate the instrumental variables approach, because in the medical literature these types of methods are better known under that name.

3.8. Ecologic studies and grouped-treatment effects

Ample warning can be found in the literature against the use of ecologic studies to describe relationships on the individual level (the ecologic fallacy); a correlation found at the aggregated level (e.g., hospital) cannot be interpreted as a correlation at the patient level. Wen and Kramer, however, proposed the use of ecologic studies as a method to deal with confounding at the individual level when intended treatment effects have to be estimated. In situations where considerable variation in the utilization of treatments exists across geographic areas independent of the severity of disease but mainly driven by practice style, the "relative immunity from confounding by indication may outweigh the 'ecologic fallacy'" by performing an ecologic study. Of course, such ecologic studies have low statistical power because of the reduced number of experimental units and tell us little about the individuals in the compared groups. Moreover, Naylor argues that the limitations of the proposed technique in order to remove confounding by indication are too severe to consider an aggregated analysis as a serious alternative when individual level data are available.

An alternative method described in the literature is known as the grouped-treatment approach. Keeping the analysis at the individual level, the individual treatment variable is replaced by an ecological or grouped-treatment variable, indicating the percentage of treated persons at the aggregated level. With this method the relative immunity from confounding by indication of an aggregated analysis is combined with the advantage of correcting for variation at the individual level. In fact this method is covered by the method of two-stage least squares, where in the first stage more variables are allowed to assess the probability of receiving the treatment. This method faces the same assumptions as the instrumental variables approach discussed earlier. Most important is the assumption that unmeasured variables do not produce an association between prognosis and the grouped-treatment variable, which in practice will be hard to satisfy.

4. Validations and sensitivity analyses

Horwitz et al. proposed to validate observational studies by constructing a cohort of subjects in clinical practice that is restricted by the inclusion criteria of RCTs. Similarity in estimated treatment effects from the observational studies and the RCTs would provide empirical evidence for the validity of the observational method. Although this may be correct in specific situations, it does not provide evidence for the validity of observational methods for the evaluation of treatments in general.

To answer the question whether observational studies produce similar estimates of treatment effects compared to randomized studies, several authors have compared the results of randomized and nonrandomized studies for a number of conditions, sometimes based on meta-analyses. In general, these reviews have concluded that the direction of treatment effects assessed in nonrandomized studies is often, but not always, similar to the direction of the treatment effects in randomized studies, but that differences between nonrandomized and randomized studies in the estimated magnitude of treatment effect are very common. Trials may under- or overestimate the actual treatment effect, and the same is true for nonrandomized comparison of treatments. Therefore, these comparisons should not be interpreted as true validations.

A sensitivity analysis can be a valuable tool in assessing the possible influence of an unmeasured confounder. This method was probably first used by Cornfield et al. when they attacked Fisher's hypothesis that the apparent association between smoking and lung cancer could be explained by an unmeasured genetic confounder related to both smoking and lung cancer. The problem of nonrandomized assignment to treatments in observational studies can be thought of as a problem of unmeasured confounding factors. Instead of stating that an unmeasured confounder can explain the treatment effect found, sensitivity analyses try to find a lower bound for the magnitude of association between that confounder and the treatment variable. Lin et al. developed a general approach for assessing the sensitivity of the treatment effect to the confounding effects of unmeasured confounders after adjusting for measured covariates, assuming that the true treatment effect can be represented in a regression model. The plausibility of the estimated treatment effects will increase if the estimated treatment effects are insensitive over a wide range of plausible assumptions about these unmeasured confounders.

5. Summary and discussion

Although randomized clinical trials remain the gold standard in the assessment of intended effects of drugs, observational studies may provide important information on effectiveness under everyday circumstances and in subgroups not previously studied in RCTs. The main defect in these studies is the incomparability of groups, giving a possible alternative explanation for any treatment effect found. Thus, focus in such studies is directed toward adjustment for confounding effects of covariates.

Along with standard methods of appropriate selection of reference groups, stratification, and matching, we discussed multivariable statistical methods such as (logistic) regression and Cox proportional hazards regression to correct for confounding. In these models, the covariates, added to a model with 'treatment' as the only explanation, give alternative explanations for the variation in outcome, resulting in a corrected treatment effect. In fact, the main problem of balancing the treatment and control groups according to some covariates has been avoided. A method that more directly attacks the problem of imbalance between treatment and control group is the method of propensity scores. By trying to explain this imbalance with measured covariates, a score is computed which can be used as a single variable to match both groups. Alternatively, this score can be used as a stratification variable or as a single covariate in a regression model.

In all these techniques, an important limitation is that adjustment can only be achieved for measured covariates, implying possible measurement error on these covariates (e.g., the severity of a past disease) and possible omission of other important, unmeasured covariates. A method not limited by these shortcomings is a technique known as instrumental variables. In this approach, the focus is on finding a variable (the instrument) that is related to the allocation of treatments, but is related to outcome only because of its relation to treatment. This technique can achieve the same effect as randomization in bypassing the usual way in which physicians allocate treatment according to prognosis, but its rather strong assumptions limit its use in practice. Related techniques are two-stage least squares and the grouped-treatment approach, sharing the same limitations. All these methods are summarized in […].

Given the limitations of observational studies, the evidence in assessing intended drug effects from observational studies will be in general less convincing than from well conducted RCTs. The same of course is true when RCTs are not well conducted (e.g., lacking double blinding or exclusions after randomization). This means that due to differences in quality, size, or other characteristics, disagreement among RCTs is not uncommon. In general we subscribe to the view that observational studies including appropriate adjustments are less suited to assess new intended drug effects (unless the expected effect is very large), but can certainly be valuable for assessing the long-term beneficial […] drugs are advised to be taken lifelong. Another purpose of observational studies is to investigate the causes of interindividual variability in drug response. Most causes of variability in drug response are unknown. Observational studies can also be used to assess the intended effects of drugs in patients that were excluded from RCTs (e.g., very young patients, or patients with different comorbidities and polypharmacy), or in patients that were studied in RCTs but who might still respond differently (e.g., because of genetic differences).

Comparison between the presented methods to assess adjusted treatment effects in observational studies is mainly based on theoretical considerations, although some empirical evidence is available. A more complete empirical evaluation that compares the different adjustment methods with respect to the estimated treatment effects under several conditions will be needed to assess the validity of the different methods. Preference for one method or the other can be expressed in terms of bias, precision, power, and coverage probability of the methods, whereas the different conditions can be defined by means of, for instance, the severity of the disease, the number of covariates, the strength of association between covariates and outcome, the association among the covariates, and the amount of overlap between the groups.
effects of drugs already proven effective in short-term RCTs.
These empirical evaluations can be performed with existing For instance, the RCTs of acetylsalicylic acid that demon- databases or computer simulations. Given the lack of empiri- strated the beneficial effects in the secondary prevention of cal evaluations for comparisons of the different methods and coronary heart disease were of limited duration, but these the importance of the assessment of treatment effects in Table 1Strengths and limitations of methods to assess treatment effects in nonrandomized, observational studies Design approaches Historical controls • Easy to identify comparison group • Treatment effect often biased Candidates for treatment • Useful for preliminary selection • Difficult to identify not treated candidates Treatments for the same indication • Similarity of prognostic factors • Only useful for diseases treated with • Only effectiveness of one drug compared to another Case–crossover and case–time–control • Reduced variability by intersubject • Only useful to assess time-limited effects • Possible crossover effects Stratification and (weighted) matching • Clear interpretation / no assumptions • Only a few covariates or rough categories • Clarity of incomparability on used • More covariates than with normal • Still limited number of covariates Common statistical techniques: • More covariates than matching or • Focus is not on balancing groups regression, logistic regression, • Adequate overlap between groups survival analysis • Easy to perform difficult to assess Propensity scores • Many covariates possible • Performs better with only a few number of events per confounder Multivariate confounder score • Less insensitive to • Exaggerates significance • Immune to confounding by indication • Loss of power by reduced number of units • Loss of information at the individual level Instrumental variables (IV), • Large differences per area are needed • Difficult to identify instrumental variable(s) two-stage least squares; • Strong assumption that IV 
is unrelated with factors directly affecting outcome O.H. Klungel et al. / Journal of Clinical Epidemiology 57 (2004) 1223–1231 observational studies, more effort should be directed toward [22] Klungel OH, Heckbert SR, Longstreth WT Jr, Furberg CD, Kaplan RC, these evaluations.
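The computer simulations suggested in the discussion can be illustrated with a minimal sketch. It assumes a single unmeasured confounder, a binary instrument that affects outcome only through treatment, and a constant treatment effect; these choices, the function names (`simulate`, `naive_estimate`, `two_stage_least_squares`), and all parameter values are hypothetical and are not taken from any of the studies reviewed. The sketch compares a naive treated-versus-untreated contrast with a two-stage least squares estimate.

```python
import numpy as np


def simulate(n=50_000, true_effect=2.0, seed=0):
    """Simulate treatment allocation that depends on an unmeasured confounder."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)          # unmeasured confounder (e.g., disease severity)
    z = rng.integers(0, 2, size=n)  # hypothetical binary instrument
    # Treatment depends on the instrument and on the unmeasured confounder.
    t = (1.0 * z + u + rng.normal(size=n) > 0.5).astype(float)
    # Outcome depends on treatment and confounder, but not directly on z.
    y = true_effect * t + 1.0 * u + rng.normal(size=n)
    return z, t, y


def naive_estimate(t, y):
    """Unadjusted difference in mean outcome, treated minus untreated."""
    return y[t == 1].mean() - y[t == 0].mean()


def two_stage_least_squares(z, t, y):
    """Two-stage least squares with one instrument and no covariates."""
    # Stage 1: regress treatment on the instrument and keep the fitted values.
    x1 = np.column_stack([np.ones(len(z)), z])
    coef1, *_ = np.linalg.lstsq(x1, t, rcond=None)
    t_hat = x1 @ coef1
    # Stage 2: regress outcome on the fitted treatment values.
    x2 = np.column_stack([np.ones(len(t_hat)), t_hat])
    coef2, *_ = np.linalg.lstsq(x2, y, rcond=None)
    return coef2[1]


z, t, y = simulate()
print(f"true effect:    {2.0:.2f}")
print(f"naive estimate: {naive_estimate(t, y):.2f}")
print(f"2SLS estimate:  {two_stage_least_squares(z, t, y):.2f}")
```

In this setup the naive contrast is biased because the confounder drives both treatment and outcome, whereas the two-stage estimate uses only the variation in treatment induced by the instrument. Repeating the simulation over many seeds and conditions would give the bias, precision, and coverage comparisons described in the text; the result depends entirely on the instrument assumptions holding.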
[1] Friedman LM, Furberg CD, DeMets DL. Fundamentals of clinical trials. St Louis: Mosby-Year Book; 1996.
[2] Chalmers I. Why transition from alternation to randomisation in clinical trials was made [Letter]. BMJ 1999;319:1372.
[3] Schulz KF, Grimes DA. Allocation concealment in randomised trials: defending against deciphering. Lancet 2002;359:614–8.
[4] Urbach P. The value of randomization and control in clinical trials. Stat Med 1993;12:1421–31; discussion 1433–41.
[5] Feinstein AR. Current problems and future challenges in randomized clinical trials. Circulation 1984;70:767–74.
[6] Gurwitz JH, Col NF, Avorn J. The exclusion of the elderly and women from clinical trials in acute myocardial infarction. JAMA 1992;268:
[7] Wieringa NF, de Graeff PA, van der Werf GT, Vos R. Cardiovascular drugs: discrepancies in demographics between pre- and post-registration use. Eur J Clin Pharmacol 1999;55:537–44.
[8] MacMahon S, Collins R. Reliable assessment of the effects of treatment on mortality and major morbidity, II: observational studies.
[9] McKee M, Britton A, Black N, McPherson K, Sanderson C, Bain C. Methods in health services research. Interpreting the evidence: choosing between randomised and non-randomised studies. BMJ 1999;319:
[10] Concato J, Shah N, Horwitz RI. Randomized, controlled trials, observational studies, and the hierarchy of research designs. N Engl J Med
[11] Grodstein F, Clarkson TB, Manson JE. Understanding the divergent data on postmenopausal hormone therapy. N Engl J Med 2003;348:
[12] Beral V, Banks E, Reeves G. Evidence from randomised trials on the long-term effects of hormone replacement therapy. Lancet 2002;
[13] Messerli FH. Case–control study, meta-analysis, and bouillabaisse: putting the calcium antagonist scare into context [Editorial]. Ann Intern Med 1995;123:888–9.
[14] Grobbee DE, Hoes AW. Confounding and indication for treatment in evaluation of drug treatment for hypertension. BMJ 1997;315:1151–4.
[15] Rosenbaum PR. Observational studies. 2nd edition. New York: Springer; 2002.
[16] Sacks H, Chalmers TC, Smith H Jr. Randomized versus historical controls for clinical trials. Am J Med 1982;72:233–40.
[17] Kalra L, Yu G, Perez I, Lakhani A, Donaldson N. Prospective cohort study to determine if trial efficacy of anticoagulation for stroke prevention in atrial fibrillation translates into clinical effectiveness. BMJ
[18] Ioannidis JP, Polycarpou A, Ntais C, Pavlidis N. Randomised trials comparing chemotherapy regimens for advanced non-small cell lung cancer: biases and evolution over time. Eur J Cancer 2003;39:
[19] Klungel OH, Stricker BH, Breteler MM, Seidell JC, Psaty BM, de Boer A. Is drug treatment of hypertension in clinical practice as effective as in randomized controlled trials with regard to the reduction of the incidence of stroke? Epidemiology 2001;12:339–44.
[20] Johnston SC. Identifying confounding by indication through blinded prospective review. Am J Epidemiol 2001;154:276–84.
[21] Psaty BM, Heckbert SR, Koepsell TD, Siscovick DS, Raghunathan TE, Weiss NS, Rosendaal FR, Lemaitre RN, Smith NL, Wahl PW. The risk of myocardial infarction associated with antihypertensive drug therapies. JAMA 1995;274:620–5.
[22] Klungel OH, Heckbert SR, Longstreth WT Jr, Furberg CD, Kaplan RC, Smith NL, Lemaitre RN, Leufkens HG, de Boer A, Psaty BM. Antihypertensive drug therapies and the risk of ischemic stroke. Arch Intern Med 2001;161:37–43.
[23] Abi-Said D, Annegers JF, Combs-Cantrell D, Suki R, Frankowski RF, Willmore LJ. A case–control evaluation of treatment efficacy: the example of magnesium sulfate prophylaxis against eclampsia in patients with preeclampsia. J Clin Epidemiol 1997;50:419–23.
[24] Concato J, Peduzzi P, Kamina A, Horwitz RI. A nested case–control study of the effectiveness of screening for prostate cancer: research design. J Clin Epidemiol 2001;54:558–64.
[25] Maclure M. The case–crossover design: a method for studying transient effects on the risk of acute events. Am J Epidemiol 1991;133:
[26] Greenland S. Confounding and exposure trends in case–crossover and case–time–control designs. Epidemiology 1996;7:231–9.
[27] Suissa S. The case–time–control design. Epidemiology 1995;6:
[28] Suissa S. The case–time–control design: further assumptions and conditions. Epidemiology 1998;9:441–5.
[29] Cochran WG. The effectiveness of adjustment by subclassification in removing bias in observational studies. Biometrics 1968;24:295–313.
[30] Rubin DB. Estimating causal effects from large data sets using propensity scores. Ann Intern Med 1997;127:757–63.
[31] Cook EF, Goldman L. Asymmetric stratification: an outline for an efficient method for controlling confounding in cohort studies. Am J Epidemiol 1988;127:626–39.
[32] Psaty BM, Koepsell TD, Lin D, Weiss NS, Siscovick DS, Rosendaal FR, Pahor M, Furberg CD. Assessment and control for confounding by indication in observational studies. J Am Geriatr Soc 1999;47:749–54.
[33] Peduzzi P, Concato J, Feinstein AR, Holford TR. Importance of events per independent variable in proportional hazards regression analysis. II. Accuracy and precision of regression estimates. J Clin Epidemiol
[34] Peduzzi P, Concato J, Kemper E, Holford TR, Feinstein AR. A simulation study of the number of events per variable in logistic regression analysis. J Clin Epidemiol 1996;49:1373–9.
[35] Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika 1983;70:41–55.
[36] D'Agostino RB Jr. Tutorial in biostatistics: propensity score methods for bias reduction in the comparison of a treatment to a non-randomized control group. Stat Med 1998;17:2265–81.
[37] Stenestrand U, Wallentin L. Early statin treatment following acute myocardial infarction and 1-year survival. JAMA 2001;285:430–6.
[38] Aronow HD, Topol EJ, Roe MT, Houghtaling PL, Wolski KE, Lincoff AM, Harrington RA, Califf RM, Ohman EM, Kleiman NS, Keltai M, Wilcox RG, Vahanian A, Armstrong PW, Lauer MS. Effect of lipid-lowering therapy on early mortality after acute coronary syndromes: an observational study. Lancet 2001;357:1063–8.
[39] Rosenbaum PR, Rubin DB. Constructing a control group using multivariate matched sampling methods that incorporate the propensity score. Am Stat 1985;39:33–8.
[40] Ming K, Rosenbaum PR. Substantial gains in bias reduction from matching with a variable number of controls. Biometrics 2000;56:
[41] Drake C. Effects of misspecification of the propensity score on estimators of treatment effect. Biometrics 1993;49:1231–6.
[42] Cepeda MS, Boston R, Farrar JT, Strom BL. Comparison of logistic regression versus propensity score when the number of events is low and there are multiple confounders. Am J Epidemiol 2003;158:280–7.
[43] Miettinen OS. Stratification by a multivariate confounder score. Am J
[44] Pike MC, Anderson J, Day N. Some insights into Miettinen's multivariate confounder score approach to case–control study analysis. Epidemiol Community Health 1979;33:104–6.
[45] Newhouse JP, McClellan M. Econometrics in outcomes research: the use of instrumental variables. Annu Rev Public Health 1998;19:17–34.
[46] Angrist JD, Imbens GW, Rubin DB. Identification of causal effects using instrumental variables. J Am Stat Assoc 1996;91:444–55.
[47] Permutt T, Hebel JR. Simultaneous-equation estimation in a clinical trial of the effect of smoking on birth weight. Biometrics 1989;45:
[48] McClellan M, McNeil BJ, Newhouse JP. Does more intensive treatment of acute myocardial infarction in the elderly reduce mortality? Analysis using instrumental variables. JAMA 1994;272:859–66.
[49] Angrist JD, Imbens GW. Two-stage least squares estimation of average causal effects in models with variable treatment intensity. J Am Stat
[50] Snow Jones A, Miller DJ, Salkever DS. Parental use of alcohol and children's behavioural health: a household production analysis. Health
[51] Wen SW, Kramer MS. Uses of ecologic studies in the assessment of intended treatment effects. J Clin Epidemiol 1999;52:7–12.
[52] Naylor CD. Ecological analysis of intended treatment effects: caveat emptor. J Clin Epidemiol 1999;52:1–5.
[53] Johnston SC, Henneman T, McCulloch CE, van der Laan M. Modeling treatment effects on binary outcomes with grouped-treatment variables and individual covariates. Am J Epidemiol 2002;156:753–60.
[54] Horwitz RI, Viscoli CM, Clemens JD, Sadock RT. Developing improved observational methods for evaluating therapeutic effectiveness. Am J Med 1990;89:630–8.
[55] Hlatky MA, Califf RM, Harrell FE Jr, Lee KL, Mark DB, Pryor DB. Comparison of predictions based on observational data with the results of randomized controlled clinical trials of coronary artery bypass surgery. J Am Coll Cardiol 1988;11:237–45.
[56] Benson K, Hartz AJ. A comparison of observational studies and randomized, controlled trials. N Engl J Med 2000;342:1878–86.
[57] Ioannidis JP, Haidich AB, Pappa M, Pantazis N, Kokori SI, Tektonidou MG, Contopoulos-Ioannidis DG, Lau J. Comparison of evidence of treatment effects in randomized and nonrandomized studies. JAMA
[58] Kunz R, Oxman AD. The unpredictability paradox: review of empirical comparisons of randomised and non-randomised clinical trials.
[59] Cornfield J, Haenszel W, Hammond EC, Lilienfeld AM, Shimkin MB, Wynder EL. Smoking and lung cancer: recent evidence and a discussion of some questions. J Natl Cancer Inst 1959;22:173–203.
[60] Fisher RA. Lung cancer and cigarettes? Nature 1958;182:108.
[61] Lin DY, Psaty BM, Kronmal RA. Assessing the sensitivity of regression results to unmeasured confounders in observational studies. Bio-
[62] LeLorier J, Gregoire G, Benhaddad A, Lapierre J, Derderian F. Discrepancies between meta-analyses and subsequent large randomized, controlled trials. N Engl J Med 1997;337:536–42.
[63] Schulz KF, Chalmers I, Hayes RJ, Altman DG. Empirical evidence of bias: dimensions of methodological quality associated with estimates of treatment effects in controlled trials. JAMA 1995;273:408–12.


Biol Trace Elem Res
DOI 10.1007/s12011-013-9732-6
Biomonitoring with Honeybees of Heavy Metals and Pesticides in Nature Reserves of the Marche Region (Italy)
Sara Ruschioni & Paola Riolo & Roxana Luisa Minuz & Mariassunta Stefano & Maddalena Cannella & Claudio Porrini & Nunzio Isidoro
Received: 29 April 2013 / Accepted: 6 June 2013
© Springer Science+Business Media New York 2013

Promega notes 100: novel biosensors to monitor cellular events in live cells

LIVE-CELL BIOSENSOR Novel Biosensors to Monitor Cellular Events in Live Cells Review of Fan, F. et al. (2008) Novel genetically encoded biosensors using firefly luciferase. ACS Chem. Biol. 3, 346–51. Neal Cosby, Promega Corporation entists targeted the hinge region of the luciferase mol- Drug discovery and life science researchers desire to