Coping with missing values in clinical trials

When it comes to clinical trial statistical analysis, missing values are a major challenge that we need to address. Have they ever been a problem for you? If so, or if you just want to learn more about them, keep on reading!
By Mercedes Ovejero Bruna (Senior Statistician/Data Scientist) and Iratxe Herráez Sánchez-Mariscal (Junior Statistician)
Biostatistics and Data Management Unit at Sermes CRO

What are missing values?

Missing values are gaps in the data that occur when one or more values of a patient’s variables are not recorded or are unavailable. Although undesirable, missing data are common in clinical trials, despite all the efforts made to avoid them. What is more, they can have a significant effect on the conclusions drawn from the data, as they reduce the power of the study and, in some cases, may introduce substantial bias (Dziura et al., 2013).

Some of the most common causes of missing values are (Mack et al., 2018):

Unanswered questions: this happens when the Case Report Form (CRF) is completed without providing a value for one or more elements. It is the most frequent cause of missing values, especially when questionnaires are used to evaluate any of the clinical trial variables.
Left truncation: a form of selection bias that arises when events of interest happened before a patient was recruited; typically, that recruitment is anticipated.
Events occurring during the clinical trial: this covers circumstances such as loss to follow-up, patient dropout, patient death, etc. If a patient leaves the study before its conclusion, no information will be available from that moment on.

From the data analysis perspective, there are three categories of missing values (Allison, 2001; Mirzaei et al., 2022; Rubin, 1987):

Missing Completely At Random (MCAR): in this category, the probability of encountering a missing value is not related to any observed or unobserved variable. This type of missing value is therefore the result of a purely random process. For instance, the probability of missing data is the same for individuals in different treatment groups and for those with different disease severity or response to the treatment. MCAR data do not bias the analysis, yet they do cause a loss of precision and power. Having said that, MCAR data are uncommon in clinical trials.
Missing At Random (MAR): these cases occur when the probability of missing data is related to the observed variables, but not to the unobserved values themselves. This category may be confusing, so let us look at an example. If we use a quality-of-life questionnaire and observe that item number 5 is left unanswered more frequently by older patients than by younger ones, the missing-value mechanism is associated with age, a variable other than quality of life. These missing values may originate from characteristics of the participants or from other factors such as study design, consequences of treatment, etc. This is why it is necessary to design the study carefully and anticipate possible outcomes to minimize the risk of missing values.
Missing Not At Random (MNAR): in this category, missing values are related to the values of the variable itself, even after controlling for other variables. For instance, patients with more serious symptoms are less likely to answer a questionnaire about their symptoms than patients with milder symptoms. In this case, the severity of the patient’s symptoms directly influences the probability of having missing values.
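
The difference between the three mechanisms can be illustrated with a small simulation. The following is only a sketch under invented assumptions: the symptom score, its relationship with age and the missingness probabilities are all hypothetical, and the function names `simulate` and `complete_case_bias` are ours. It shows that the mean of the observed scores is roughly unbiased under MCAR, biased under MAR unless we look within levels of the observed variable (age), and biased under MNAR even within those levels.

```python
import random
import statistics

def simulate(mechanism, n=20000, seed=1):
    """Simulate (age, score, is_observed) records under one missingness mechanism."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        age = rng.uniform(20, 80)
        score = 0.5 * age + rng.gauss(0, 5)  # hypothetical symptom score
        if mechanism == "MCAR":
            p_missing = 0.3                         # unrelated to any variable
        elif mechanism == "MAR":
            p_missing = 0.6 if age > 50 else 0.1    # depends on observed age only
        elif mechanism == "MNAR":
            p_missing = 0.6 if score > 25 else 0.1  # depends on the score itself
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        records.append((age, score, rng.random() >= p_missing))
    return records

def complete_case_bias(records):
    """Mean of the observed scores minus the mean of all scores."""
    all_scores = [score for _, score, _ in records]
    seen = [score for _, score, observed in records if observed]
    return statistics.mean(seen) - statistics.mean(all_scores)

for mechanism in ("MCAR", "MAR", "MNAR"):
    records = simulate(mechanism)
    older = [r for r in records if r[0] > 50]  # stratum defined by observed age
    print(mechanism,
          "overall bias:", round(complete_case_bias(records), 2),
          "bias among older patients:", round(complete_case_bias(older), 2))
```

In a real trial the unobserved scores are of course not available, so this kind of bias cannot be computed directly; the simulation only makes the definitions concrete.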

Sometimes it is not easy to identify the type of missing values. However, there are guidelines that can help to identify whether there is a pattern in the missing values, or whether certain variables are related to a greater probability of missing values. For example, variables that have missing values can be visualized, and the relationship between the appearance of missing values and a certain pattern in the study variables can be analysed. This makes it possible to detect situations in which MAR and MNAR are involved. For this visual inspection, there are R packages such as VIM (Kowarik and Templ, 2016) and naniar (Tierney et al., 2021) that make it straightforward to understand the pattern of missing values. There are also omnibus statistical tests to assess whether missing data are MCAR, such as the one implemented in the MissMech package in R (Jamshidian et al., 2014).
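
As a rough illustration of this kind of inspection, the sketch below tabulates the proportion of missing answers by group and applies a two-sided two-proportion z-test. It is not what VIM, naniar or MissMech do internally; it is a simplified stand-in written in plain Python, and the data, the column names (`age_group`, `item5`) and the helper functions are hypothetical.

```python
import math

def missing_rate_by_group(rows, value_key, group_key):
    """Proportion of missing (None) values of `value_key` within each group."""
    counts = {}
    for row in rows:
        n_missing, n_total = counts.get(row[group_key], (0, 0))
        counts[row[group_key]] = (n_missing + (row[value_key] is None), n_total + 1)
    return {group: m / t for group, (m, t) in counts.items()}

def two_proportion_z_test(m1, n1, m2, n2):
    """Two-sided z-test for equal missingness proportions in two groups."""
    pooled = (m1 + m2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (m1 / n1 - m2 / n2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value

# Hypothetical questionnaire data: item 5 is missing for 12 of 40 older
# patients and for 2 of 40 younger patients.
rows = (
    [{"age_group": "older", "item5": None}] * 12
    + [{"age_group": "older", "item5": 3}] * 28
    + [{"age_group": "younger", "item5": None}] * 2
    + [{"age_group": "younger", "item5": 4}] * 38
)

rates = missing_rate_by_group(rows, "item5", "age_group")
z, p_value = two_proportion_z_test(12, 40, 2, 40)
print(rates, round(z, 2), round(p_value, 4))
```

A clear difference in missingness between groups of an observed variable argues against MCAR. It cannot, by itself, distinguish MAR from MNAR, because that distinction depends on values that were never observed.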

Strategies for coping with missing values

The treatment of missing values is therefore of utmost importance, since failing to consider missing values and their mechanism during the analysis can lead to misleading conclusions (Kang, 2013). That is why there are different strategies for coping with missing values (Jakobsen et al., 2017):

Use only data from participants who have completed the whole trial with no missing data.
Use all available data.
Impute values (either by single or multiple imputation) for missing data and conduct the analysis as if all the data had been available from the start.
Develop an analysis that includes a model for missing data processing.
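
The first three strategies can be sketched on a toy data set. Everything here is hypothetical (the variables `sbp` and `hr`, the values and the helper names), and single mean imputation is used only because it is the simplest imputation to write down, not because it is recommended. The sketch also shows a well-known side effect: filling gaps with the mean leaves the mean unchanged but artificially shrinks the variability.

```python
import statistics

# Hypothetical records: two variables per patient, None marks a missing value.
patients = [
    {"sbp": 120, "hr": 70},
    {"sbp": 135, "hr": None},
    {"sbp": None, "hr": 82},
    {"sbp": 150, "hr": 90},
    {"sbp": 128, "hr": 76},
]

def complete_cases(rows):
    """Strategy 1: keep only patients with no missing value at all."""
    return [row for row in rows if all(v is not None for v in row.values())]

def available_values(rows, key):
    """Strategy 2: use every observed value of one variable."""
    return [row[key] for row in rows if row[key] is not None]

def mean_impute(rows, key):
    """Strategy 3 (single imputation): replace gaps with the observed mean."""
    fill = statistics.mean(available_values(rows, key))
    return [row[key] if row[key] is not None else fill for row in rows]

observed_sbp = available_values(patients, "sbp")
imputed_sbp = mean_impute(patients, "sbp")

print("complete cases:", len(complete_cases(patients)))
print("available sbp values:", len(observed_sbp))
print("sd observed vs. imputed:",
      round(statistics.stdev(observed_sbp), 2),
      round(statistics.stdev(imputed_sbp), 2))
```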

However, not every method is suitable on every occasion: the choice depends on the type of missing value involved and on how much data is missing. Those factors will guide the methodology to be applied. The figure below indicates the methods recommended for each type of missing value as well as practices that are not appropriate (Dziura et al., 2013; Fielding et al., 2008).

If you are working with R, packages like mice (van Buuren and Groothuis-Oudshoorn, 2011) and Amelia (Honaker et al., 2011) are two examples of versatile tools for multiple imputation.
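
When multiple imputation is used, the analysis is run on each imputed data set and the results are then combined, usually with the pooling rules described by Rubin (1987). The sketch below is an illustration in plain Python rather than a call into mice or Amelia; the function name `pool_rubin` and the numbers are hypothetical. The pooled estimate is the average of the per-data-set estimates, and the total variance adds the average within-imputation variance to the between-imputation variance inflated by (1 + 1/m).

```python
import statistics

def pool_rubin(estimates, variances):
    """Combine m point estimates and their variances with Rubin's rules.

    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two estimates and matching variances")
    pooled = statistics.mean(estimates)
    within = statistics.mean(variances)        # average sampling variance
    between = statistics.variance(estimates)   # sample variance, divisor m - 1
    total = within + (1 + 1 / m) * between
    return pooled, total

# Hypothetical treatment effects estimated on five imputed data sets.
effects = [1.0, 1.2, 0.8, 1.1, 0.9]
effect_variances = [0.04, 0.04, 0.04, 0.04, 0.04]

pooled_effect, total_variance = pool_rubin(effects, effect_variances)
print(round(pooled_effect, 3), round(total_variance, 3))
```

The between-imputation term is what single imputation ignores: it carries the extra uncertainty that comes from not knowing the missing values.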

The role of sensitivity analysis

As we have seen so far, missing values in clinical trials are unintentional, but unfortunately unavoidable. When missing values are encountered, the analysis becomes more complex, because every statistical analysis makes assumptions about the distribution of the unobserved values that cannot be verified. If an incorrect assumption is made, the estimated treatment effect and its standard error may be biased, resulting in misleading inferences. Since the actual values of these data cannot be known, it is necessary to evaluate the impact of the chosen approach through a sensitivity analysis (EMA, 2010).

Sensitivity analysis can be defined as a set of analyses in which the data are handled differently from the primary analysis. It can show how assumptions different from those made in the primary analysis influence the results obtained (Jakobsen et al., 2017). Sensitivity analyses must be specified in either the Clinical Trial Protocol or the Statistical Analysis Plan before the study takes place; in no case should they be stipulated afterwards (Mack et al., 2018).
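
One common way to vary the assumptions is a delta adjustment: missing outcomes are first imputed as if they behaved like the observed ones, and the imputed values in one arm are then shifted by a range of values, delta, to see how far the assumption has to move before the conclusion changes. The sketch below is a deliberately simplified illustration with hypothetical data and a hypothetical helper, `delta_adjusted_effect`. It fills gaps with the arm mean for brevity, whereas a real analysis would typically rely on multiple imputation.

```python
import statistics

def delta_adjusted_effect(treatment, control, delta):
    """Treatment-minus-control mean difference after delta-adjusted imputation.

    Missing outcomes (None) are filled with their arm's observed mean; the
    imputed values in the treatment arm are then shifted by `delta`.
    """
    def fill(values, shift):
        observed_mean = statistics.mean(v for v in values if v is not None)
        return [v if v is not None else observed_mean + shift for v in values]

    return statistics.mean(fill(treatment, delta)) - statistics.mean(fill(control, 0.0))

# Hypothetical outcome data; None marks a patient with a missing outcome.
treatment_arm = [8.0, 9.0, 10.0, None, None]
control_arm = [5.0, 6.0, 7.0, 6.0, None]

for delta in (0.0, -2.0, -5.0, -8.0):
    effect = delta_adjusted_effect(treatment_arm, control_arm, delta)
    print("delta =", delta, "estimated effect =", round(effect, 2))
```

If the estimated effect only disappears for values of delta that are clinically implausible, the primary conclusion can be considered robust to that departure from the original assumption.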

In conclusion…

The strategy to avoid missing values entails considering all the development phases of a clinical trial, from study design to final data analysis, implementing methods that minimize the risk of missing data, and having action plans that allow their detection and treatment (Pugh et al., 2022). This is why the Sermes team works, directly or indirectly, to reduce the impact of missing values so that studies can be properly carried out. Tasks such as the design of the patient monitoring plan, the calculation of the sample size and the design of the CRF are critical to achieving this objective.

From a data analysis point of view, there is no single universal method for dealing with missing values that yields the same results as an analysis of complete data. The best strategy starts by studying the assumptions and causes behind the missing values and inspecting them to uncover underlying mechanisms that help explain where and why missing values occur.

In clinical trials, it can usually be assumed that missing data belong to the MAR or MNAR categories. This implies the need for methodology adapted to these cases, and for discarding traditional practices that have proven unreliable and more likely to produce biased conclusions. Finally, conducting sensitivity analyses is highly recommended to evaluate potential biases in the results (Cro et al., 2020).

References

Allison, P. D. (2001). Missing Data. Sage Publications.

Cro, S., Morris, T. P., Kenward, M. G., & Carpenter, J. R. (2020). Sensitivity analysis for clinical trials with missing continuous outcome data using controlled multiple imputation: a practical guide. Statistics in medicine, 39(21), 2815-2842.

Dziura, J. D., Post, L. A., Zhao, Q., Fu, Z., & Peduzzi, P. (2013). Strategies for dealing with missing data in clinical trials: from design to analysis. The Yale journal of biology and medicine, 86(3), 343.

European Medicines Agency (EMA) (2010). Committee for Medicinal Products for Human Use. Guideline on Missing Data in Confirmatory Clinical Trials. Available at: https://www.ema.europa.eu/en/missing-data-confirmatory-clinical-trials.

Fielding, S., Fayers, P. M., McDonald, A., McPherson, G., & Campbell, M. K. (2008). Simple imputation methods were inadequate for missing not at random (MNAR) quality of life data. Health and Quality of Life Outcomes, 6(1), 1-9.

Honaker, J., King, G., & Blackwell, M. (2011). Amelia II: A Program for Missing Data. Journal of Statistical Software, 45(7), 1-47. URL https://www.jstatsoft.org/v45/i07/.

Jakobsen, J. C., Gluud, C., Wetterslev, J., & Winkel, P. (2017). When and how should multiple imputation be used for handling missing data in randomised clinical trials–a practical guide with flowcharts. BMC medical research methodology, 17(1), 1-10.

Jamshidian, M., Jalal, S., & Jansen, C. (2014). MissMech: An R package for testing homoscedasticity, multivariate normality, and missing completely at random (MCAR). Journal of Statistical Software, 56, 1-31.

Kang, H. (2013). The prevention and handling of the missing data. Korean journal of anesthesiology, 64(5), 402-406. https://doi.org/10.4097/kjae.2013.64.5.402.

Kowarik, A. & Templ, M. (2016). Imputation with the R Package VIM. Journal of Statistical Software, 74(7), 1-16. doi:10.18637/jss.v074.i07.

Mack, C., Su, Z., & Westreich, D. (2018). Managing Missing Patient Data in Patient Registries. White Paper, addendum to Registries for Evaluating Patient Outcomes: A User’s Guide, Third Edition. (Prepared by L&M Policy Research, LLC, under Contract No. 290-2014-00004-C.) AHRQ Publication No. 17(18)-EHC015-EF. Rockville, MD: Agency for Healthcare Research and Quality; February 2018. www.effectivehealthcare.ahrq.gov. https://doi.org/10.23970/AHRQREGISTRIESMISSDATA.

Pugh, S. L., Brown, P. D., & Enserro, D. (2022). Missing repeated measures data in clinical trials. Neuro-Oncology Practice, 9(1), 35-42.

Rubin, D. (1987). Multiple Imputation for Nonresponse in Surveys. John Wiley & Sons, LTD.

Tierney, N., Cook, D., McBain, M., & Fay, C. (2021). naniar: Data Structures, Summaries, and Visualisations for Missing Data. R package version 0.6.1. https://CRAN.R-project.org/package=naniar.

van Buuren, S., & Groothuis-Oudshoorn, K. (2011). mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, 45(3), 1-67. doi:10.18637/jss.v045.i03.