

You put a lot of work into preparing and cleaning your data. Running the model is the moment of excitement. You look at your tables and interpret the results. But first you remember that one or more variables had a few outliers.

In our upcoming Linear Models in Stata workshop, we will explore ways to find observations that influence the model. This is done in Stata via post-estimation commands. As the name implies, all post-estimation commands are run after running the model (regression, logit, mixed, etc.).

One widely used post-estimation command for linear regression is predict. It is very important for detecting outliers and determining their impact on your model. There are 17 options for using this command. Why so many? One reason is that no one measure can tell you everything you need to know about your outliers. Some options are useful for identifying outcome outliers, while others identify predictor outliers. A third group of options is useful for identifying influential observations (since not all outliers are influential). An observation is considered influential if excluding it alters the coefficients of the model.

Studentized residuals are a way to find outliers on the outcome variable. Values far from 0 and from the rest of the residuals indicate outliers on Y.

Leverage is a measure of outliers on predictor variables. It measures the distance between a case's X value and the mean of X. As with the residuals, values far from 0 and from the rest of the leverage values indicate outliers on X.

Cook's distance is a measure of influence: how much each observation affects the predicted values. It incorporates both outcome (residual) and predictor (leverage) information in its calculation, but more importantly it tells you how much a case affects the model.

The graph below incorporates measures of influence, outcome outliers, and predictor outliers for a data set of 20 observations with one predictor variable.
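To make the three measures concrete, here is a minimal Stata sketch of how studentized residuals, leverage, and Cook's distance might be generated with predict after a regression. It is an illustration under stated assumptions, not the analysis from this article: it uses Stata's built-in auto data rather than the 20-observation data set, and the new variable names (rstu, lev, cook) are arbitrary choices.

```stata
* Illustration only: Stata's built-in auto data, not the article's data set
sysuse auto, clear

* Fit a linear regression with a single predictor
regress price weight

* Studentized residuals: flag outliers on the outcome (Y)
predict rstu, rstudent

* Leverage: flag outliers on the predictor (X)
predict lev, leverage

* Cook's distance: flag influential observations
predict cook, cooksd

* Sort so the largest Cook's distance comes first, then list the top five cases
gsort -cook
list make price weight rstu lev cook in 1/5
```

Listing the three measures side by side shows why no single one is enough: a case can have a large residual or high leverage without a large Cook's distance, which is the sense in which not all outliers are influential.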
