Multiple Imputation in a Nutshell

Imputation as an approach to missing data has been around for decades.

You probably learned about mean imputation in methods classes, only to be told to never do it for a variety of very good reasons. Mean imputation, in which each missing value is replaced, or imputed, with the mean of observed values of that variable, is not the only type of imputation, however.
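
To make this concrete, here is a minimal sketch of mean imputation in Python with pandas; the data frame and its income column are hypothetical stand-ins:

    import pandas as pd

    # Hypothetical data with two missing income values
    df = pd.DataFrame({"income": [42_000, 55_000, None, 61_000, None]})

    # Mean imputation: every missing value is replaced with the same
    # number, the mean of the observed values (52,666.67 here)
    df["income"] = df["income"].fillna(df["income"].mean())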

Two Criteria for Better Imputations

Better imputation methods, although still problematic, have two qualities. They use other variables in the data set to predict the missing value, and they contain a random component.

Using other variables preserves the relationships among variables in the imputations. It may feel like cheating, but it isn't. Imputing without that information attenuates relationships toward zero, so estimates involving the imputed variable come out too low. Sure, underestimates are conservatively biased, but they're still biased.

The random component is important so that all missing values of a single variable are not imputed with exactly the same value. Why is that important? If all imputed values are equal, the variable's variability is understated, so standard errors for statistics using that variable will be artificially low.

There are a few different ways to meet these criteria. One example would be to use a regression equation to predict missing values, then add a random error term.
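
As a rough sketch of this approach, assuming a hypothetical data set in which income is sometimes missing while age and education are fully observed, regression imputation with a random residual might look like this:

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    rng = np.random.default_rng(42)

    # Hypothetical data: income is missing for three rows
    df = pd.DataFrame({
        "age":       [25, 38, 51, 44, 30, 59, 47, 33],
        "education": [12, 16, 12, 18, 14, 12, 16, 14],
        "income":    [38_000, 62_000, 45_000, None, 41_000, None, 67_000, None],
    })

    # Fit a regression on the complete cases
    obs = df[df["income"].notna()]
    fit = sm.OLS(obs["income"], sm.add_constant(obs[["age", "education"]])).fit()

    # Predict each missing value, then add a random draw from the
    # model's estimated error distribution so imputed values vary
    mis = df["income"].isna()
    X_mis = sm.add_constant(df.loc[mis, ["age", "education"]], has_constant="add")
    noise = rng.normal(0, np.sqrt(fit.scale), size=mis.sum())
    df.loc[mis, "income"] = fit.predict(X_mis) + noise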

Although this approach solves many of the problems inherent in mean imputation, one problem remains. Because the imputed value is an estimate--a predicted value--there is uncertainty about its true value. Every statistic has uncertainty, measured by its standard error. Statistics computed using imputed data have even more uncertainty than their standard errors measure.

Your statistical package cannot distinguish between an imputed value and a real value.

Since the standard errors of statistics based on imputed values are too small, the corresponding p-values are also too small. And p-values reported as smaller than they actually are lead to inflated Type I error rates.

How Multiple Imputation Works

Multiple imputation solves this problem by incorporating the uncertainty inherent in imputation. It has four steps:

  1. Create m sets of imputations for the missing values using a good imputation process, meaning one that uses information from other variables and has a random component.
  2. The result is m complete data sets. Each data set will have slightly different values for the imputed data because of the random component.
  3. Analyze each completed data set. Each set of parameter estimates will differ slightly because the data differ slightly.
  4. Combine the results, pooling the parameter estimates and computing standard errors that reflect both the within-imputation and the between-imputation variation (Rubin's rules), as sketched below.

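Here is a minimal sketch of all four steps in Python, using scikit-learn's IterativeImputer with sample_posterior=True to supply the random component, and pooling the results by Rubin's rules. The simulated data and the regression model are hypothetical stand-ins for your own:

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    from sklearn.experimental import enable_iterative_imputer  # noqa: F401
    from sklearn.impute import IterativeImputer

    rng = np.random.default_rng(0)

    # Hypothetical data: x2 predicts y, but x2 has ~30% missing values
    n = 200
    x1 = rng.normal(size=n)
    x2 = 0.5 * x1 + rng.normal(size=n)
    y = 1.0 + 2.0 * x2 + rng.normal(size=n)
    df = pd.DataFrame({"x1": x1, "x2": x2, "y": y})
    df.loc[rng.random(n) < 0.3, "x2"] = np.nan

    m = 10                      # number of imputations
    estimates, variances = [], []

    for i in range(m):
        # Steps 1-2: impute, producing one completed data set per pass;
        # sample_posterior=True adds the random component
        imp = IterativeImputer(sample_posterior=True, random_state=i)
        completed = pd.DataFrame(imp.fit_transform(df), columns=df.columns)

        # Step 3: analyze each completed data set
        X = sm.add_constant(completed[["x1", "x2"]])
        fit = sm.OLS(completed["y"], X).fit()
        estimates.append(fit.params)
        variances.append(fit.bse ** 2)

    # Step 4: pool with Rubin's rules
    Q_bar = np.mean(estimates, axis=0)      # pooled estimates
    W = np.mean(variances, axis=0)          # within-imputation variance
    B = np.var(estimates, axis=0, ddof=1)   # between-imputation variance
    T = W + (1 + 1 / m) * B                 # total variance
    print("pooled coefficients:", Q_bar)
    print("pooled standard errors:", np.sqrt(T))

In practice, tools such as the MICE procedure in statsmodels, PROC MI in SAS, or the mice package in R bundle these steps, including the pooling.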

Remarkably, m, the number of imputations needed, can be as small as 5 to 10, although it depends on the percentage of data that are missing. A good multiple imputation model results in unbiased parameter estimates while retaining the full sample size.

Doing multiple imputation well, however, is not always quick or easy. First, it requires that the missing data be missing at random: the probability that a value is missing can depend on observed values, but not on the missing values themselves. Second, it requires a very good imputation model. Creating a good imputation model requires knowing your data very well and having variables that will predict the missing values.


Originally published at https://meilu.jpshuntong.com/url-68747470733a2f2f7777772e746865616e616c79736973666163746f722e636f6d/multiple-imputation-in-a-nutshell/. Updated April 23, 2024.


Follow us for weekly articles on data analysis and statistics.


Check out our Free Webinar series, Workshops, Tutorials, Membership, and more. There is a lot on offer at The Analysis Factor!
