Outliers are values at the extreme ends of a dataset. Some outliers represent true values from natural variation in the population. Other outliers may result from incorrect data entry, equipment malfunctions, or other measurement errors.

An outlier isn’t always a form of dirty or incorrect data, so you have to be careful with them in data cleansing. What you should do with an outlier depends on its most likely cause.

True outliers should always be retained in your dataset because they simply represent natural variation in your sample.

Example: True outlier
You measure 100-meter running times for a representative sample of 560 college students. Your data are normally distributed with a couple of outliers on either end. Most values are centered around the middle, as expected. But these extreme values also represent natural variation, because a variable like running time is influenced by many other factors.

True outliers are also present in variables with skewed distributions, where many data points are spread far from the mean in one direction. It’s important to select appropriate statistical tests or measures when you have a skewed distribution or many outliers.

Outliers that don’t represent true values can come from any of the error sources listed above.

Example: Other outliers
You repeat your running time measurements for a new sample. For one of the participants, you accidentally start the timer midway through their sprint. You record this timing as their running time. This data point is a big outlier in your dataset because it’s much lower than all of the other times.

This type of outlier is problematic because it’s inaccurate and can distort your research results.

Example: Distortion of results due to outliers
You calculate the average running time for all participants using your data.
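To make the distortion example concrete, here is a minimal sketch that compares the mean and the median of some running times before and after adding one mistimed sprint. The times are invented for illustration and are not data from the example. The median is used here as one common robust alternative to the mean.

```python
from statistics import mean, median

# Hypothetical 100-meter times in seconds; invented for illustration,
# not data from the example above.
times = [13.1, 13.5, 13.8, 14.0, 14.2, 14.5, 14.9, 15.3]

# Simulate the timing error: the timer started midway through one sprint,
# so that runner's recorded time is far too low.
times_with_error = times + [6.4]

print(f"mean without error:   {mean(times):.2f} s")
print(f"mean with error:      {mean(times_with_error):.2f} s")
print(f"median without error: {median(times):.2f} s")
print(f"median with error:    {median(times_with_error):.2f} s")
```

With these made-up numbers, the single mistimed run pulls the mean down by about 0.86 s but moves the median by only 0.1 s. This is one reason a robust measure like the median can be the more appropriate choice when outliers are present.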
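One common way to flag candidate outliers is the interquartile range (IQR) rule. Under this rule, a value is flagged if it falls below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR. The sketch below makes a few assumptions. The 1.5 multiplier is a widespread convention, not a fixed rule. The helper name `iqr_outliers` is hypothetical. The quartiles come from Python's `statistics.quantiles`, and other quartile methods can shift the fences slightly. The data are invented for illustration.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return (lower_fence, upper_fence, flagged) using the IQR rule.

    k=1.5 is the conventional multiplier (an assumption, adjustable).
    Quartiles come from statistics.quantiles with its default
    'exclusive' method; other quartile methods move the fences a little.
    """
    q1, _, q3 = quantiles(values, n=4)  # three cut points: Q1, median, Q3
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    flagged = [v for v in values if v < lower or v > upper]
    return lower, upper, flagged

# Hypothetical running times in seconds (invented, not from the example),
# including one impossibly fast time like the mistimed sprint.
times = [13.1, 13.5, 13.8, 14.0, 14.2, 14.5, 14.9, 15.3, 6.4]

lower, upper, flagged = iqr_outliers(times)
print(f"fences: {lower:.2f} to {upper:.2f} s")
print("flagged:", flagged)
```

Here only the 6.4 s reading falls outside the fences. The rule only flags candidates. Whether to keep a flagged value still depends on its most likely cause: true outliers stay, while errors like the mistimed sprint are the ones to correct or remove.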