
Quartile: Difference between revisions

→‎Outliers: Removed an unnecessary sentence and noted that the Excel function QUARTILE is a legacy function for Excel 2007 or earlier.
 
* The second quartile (''Q''<sub>2</sub>) is the [[median]] of a data set; thus 50% of the data lies below this point.
* The third quartile (''Q''<sub>3</sub>) is the 75th percentile. It is known as the ''upper'' quartile, as 75% of the data lies below this point.<ref name=":0">{{Cite book |author=Dekking, Michel <!--1946– --> |url=https://archive.org/details/modernintroducti0000unse_h6a1 |title=A modern introduction to probability and statistics: understanding why and how |date=2005 |publisher=Springer |isbn=978-1-85233-896-1 |location=London |pages=[https://archive.org/details/modernintroducti0000unse_h6a1/page/236/ 236-238] |oclc=262680588 |url-access=limited}}</ref>
Along with the minimum and maximum of the data (which are also quartiles), the three quartiles described above provide a [[five-number summary]] of the data. This summary is important in statistics because it provides information about both the [[Mean (Statistics)|center]] and the [[Statistical dispersion|spread]] of the data. Knowing the lower and upper quartiles shows how large the spread is and whether the dataset is [[Skewness|skewed]] toward one side. Since quartiles divide the number of data points evenly, the [[Range (statistics)|range]] is generally not the same between adjacent quartiles (i.e. usually (''Q''<sub>3</sub> - ''Q''<sub>2</sub>) ≠ (''Q''<sub>2</sub> - ''Q''<sub>1</sub>)). The [[interquartile range]] (IQR) is defined as the difference between the 75th and 25th percentiles, or ''Q''<sub>3</sub> - ''Q''<sub>1</sub>. While the maximum and minimum also show the spread of the data, the upper and lower quartiles can provide more detailed information on the location of specific data points, the presence of [[outlier]]s in the data, and the difference in spread between the middle 50% of the data and the outer data points.<ref>{{Cite web |url=https://magoosh.com/statistics/quartiles-used-statistics/ |archive-url=https://web.archive.org/web/20191210060305/https://magoosh.com/statistics/quartiles-used-statistics/ |archive-date=2019-12-10 |url-status=deviated |title=How are Quartiles Used in Statistics? |last=Knoch |first=Jessica |date=February 23, 2018 |website=[[Magoosh]] |access-date=February 24, 2023}}{{cbignore}}</ref>
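As a minimal sketch of the five-number summary and the IQR, the following Python assumes Python 3.8 or later and uses the standard library's <code>statistics.quantiles</code> with <code>method="inclusive"</code> (linear interpolation between order statistics). That is only one of several quartile conventions, and the function names <code>five_number_summary</code> and <code>iqr</code> are illustrative. The sample is chosen to be right-skewed, so the gap between ''Q''<sub>3</sub> and ''Q''<sub>2</sub> differs from the gap between ''Q''<sub>2</sub> and ''Q''<sub>1</sub>.

```python
from statistics import quantiles


def five_number_summary(data):
    """Return (minimum, Q1, Q2, Q3, maximum) of a data set.

    Assumption: quartiles use Python's "inclusive" interpolation rule;
    other conventions can give different values of Q1 and Q3.
    """
    ordered = sorted(data)
    if len(ordered) < 2:
        raise ValueError("need at least two data points")
    q1, q2, q3 = quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, q2, q3, ordered[-1]


def iqr(data):
    """Interquartile range Q3 - Q1 under the same quartile convention."""
    _, q1, _, q3, _ = five_number_summary(data)
    return q3 - q1


# A right-skewed sample: Q3 - Q2 (7.0) is much larger than Q2 - Q1 (1.0).
sample = [1, 2, 3, 10, 20]
print(five_number_summary(sample))  # (1, 2.0, 3.0, 10.0, 20)
print(iqr(sample))                  # 8.0
```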
 
== Definitions ==
==== Method 1 ====
 
# Use the [[median]] to divide the ordered data set into two halves. The median becomes the second quartile.
#* If there is an odd number of data points in the original ordered data set, '''do not include''' the median (the central value in the ordered list) in either half.
#* If there is an even number of data points in the original ordered data set, split this data set exactly in half.
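The splitting rule of Method 1 can be sketched in Python as follows. The steps that come after the split are not shown here; the sketch assumes, as is usual for this method, that the lower and upper quartiles are the medians of the lower and upper halves. The function name <code>quartiles_method1</code> is hypothetical.

```python
from statistics import median


def quartiles_method1(data):
    """Return (Q1, Q2, Q3) using Method 1: the median is excluded from both halves.

    Assumption: Q1 and Q3 are the medians of the lower and upper halves.
    """
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two data points")
    half = n // 2
    lower = xs[:half]            # for odd n, stops just before the median
    upper = xs[half + n % 2:]    # for odd n, starts just after the median
    return median(lower), median(xs), median(upper)


print(quartiles_method1([7, 15, 36, 39, 40, 41]))  # (15, 37.5, 40)
```

For an odd count such as 1, 2, 3, 4, 5, the median 3 is left out, so the halves are 1, 2 and 4, 5, giving quartiles of 1.5 and 4.5.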
==== Method 2 ====
 
# Use the median to divide the ordered data set into two halves. The median becomes the second quartile.
#* If there is an odd number of data points in the original ordered data set, '''include''' the median (the central value in the ordered list) in both halves.
#* If there is an even number of data points in the original ordered data set, split this data set exactly in half.
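A corresponding sketch of Method 2 differs only in where the halves start and end for an odd count. As before, it assumes that the lower and upper quartiles are the medians of the two halves, and the function name <code>quartiles_method2</code> is hypothetical.

```python
from statistics import median


def quartiles_method2(data):
    """Return (Q1, Q2, Q3) using Method 2: for odd n the median is in both halves.

    Assumption: Q1 and Q3 are the medians of the lower and upper halves.
    """
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two data points")
    half = n // 2
    lower = xs[:half + n % 2]    # for odd n, ends with the median
    upper = xs[half:]            # for odd n, starts with the median
    return median(lower), median(xs), median(upper)


print(quartiles_method2([1, 2, 3, 4, 5]))  # (2, 3, 4)
```

On the same five values, excluding the median instead (Method 1) would give halves 1, 2 and 4, 5 and hence quartiles of 1.5 and 4.5. For an even count the two rules produce identical halves.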
==== Method 3 ====
 
# Use the median to divide the ordered data set into two halves. The median becomes the second quartile.
## If there is an odd number of data points, go to the next step.
## If there is an even number of data points, Method 3 starts off the same as Method 1 or Method 2 above, and you can choose whether or not to include the median as a new data point. If you include the median as a new data point, proceed to step 2 or 3 below, because you now have an odd number of data points. If you do not include the median as a new data point, continue with Method 1 or 2 as started.
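Only the even-count branch of step 1 is sketched below: optionally inserting the median as a new data point so that the count becomes odd. Steps 2 and 3 of Method 3 are not reproduced here, so the sketch stops after this preprocessing. The function name <code>method3_even_case</code> and its <code>add_median</code> flag are hypothetical.

```python
from bisect import insort
from statistics import median


def method3_even_case(data, add_median):
    """Sketch of the even-count branch of Method 3, step 1.

    If the count is even and add_median is True, the median is inserted as a
    new data point (the count becomes odd, so step 2 or 3 would apply next).
    Otherwise the sorted data is returned unchanged, and Method 1 or 2 would
    be continued instead.
    """
    xs = sorted(data)
    if len(xs) % 2 == 0 and add_median:
        insort(xs, median(xs))  # insert while keeping the list sorted
    return xs


print(method3_even_case([7, 15, 36, 39, 40, 41], add_median=True))
# [7, 15, 36, 37.5, 39, 40, 41]
```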
 
==== Example 2 ====
Ordered Data Set (of an even number of data points): 7, 15, '''36, 39''', 40, 41.
 
The bold numbers (36, 39) are used to calculate the median as their average. As there is an even number of data points, the first three methods all give the same results. (Method 3 is executed by starting with Method 1 and not choosing the median as a new data point.)
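The agreement between the "exclude the median" and "include the median" splitting rules on this even-sized data set can be checked directly. The helper <code>split_halves</code> below is hypothetical, and it assumes the lower and upper quartiles are the medians of the two halves.

```python
from statistics import median


def split_halves(xs, include_median):
    """Split sorted data in half; for odd n, include or exclude the median."""
    n = len(xs)
    half = n // 2
    if n % 2 == 1 and include_median:
        return xs[:half + 1], xs[half:]   # Method 2 rule for odd n
    return xs[:half], xs[half + n % 2:]   # Method 1 rule (same split for even n)


data = [7, 15, 36, 39, 40, 41]  # Example 2, already ordered
for include in (False, True):
    lower, upper = split_halves(data, include)
    print(median(lower), median(data), median(upper))
# prints "15 37.5 40" on both lines
```

Because the count is even, both branches produce the halves 7, 15, 36 and 39, 40, 41, so ''Q''<sub>1</sub> = 15, ''Q''<sub>2</sub> = 37.5 and ''Q''<sub>3</sub> = 40 either way.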
There are methods by which to check for [[outliers]] in the discipline of statistics and statistical analysis. Outliers could result from a shift in the location (mean) or in the scale (variability) of the process of interest.<ref>{{Cite journal|last=Walfish|first=Steven|date=November 2006|title=A Review of Statistical Outlier Method|url=http://www.statisticaloutsourcingservices.com/|journal=Pharmaceutical Technology}}</ref> Outliers could also be evidence of a sample population that has a non-normal distribution or of a contaminated population data set. Consequently, in keeping with the basic idea of [[descriptive statistics]], an [[outlier]] should be explained by further analysis of its cause or origin. When extreme observations occur, which is not infrequent, the typical values must be analyzed. The [[Interquartile Range]] (IQR), defined as the difference between the upper and lower quartiles (<math display="inline">Q_3 - Q_1 </math>), may be used to characterize the data when there may be extremities that skew the data; the [[interquartile range]] is a relatively [[robust statistic]] (also sometimes called "resistant") compared to the [[Range (statistics)|range]] and [[standard deviation]]. There is also a mathematical method to check for outliers by determining "fences": upper and lower limits outside of which values are flagged as potential outliers.
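The robustness of the IQR relative to the range and the standard deviation can be illustrated by replacing one value with an extreme one. This is a sketch assuming Python 3.8 or later; the quartiles use the standard library's <code>statistics.quantiles</code> with <code>method="inclusive"</code>, and the function name <code>spread_measures</code> and the sample values are illustrative.

```python
from statistics import pstdev, quantiles


def spread_measures(data):
    """Return the range, population standard deviation and IQR of a data set.

    Assumption: quartiles use Python's "inclusive" interpolation rule.
    """
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    return {"range": max(data) - min(data), "stdev": pstdev(data), "iqr": q3 - q1}


clean = [10, 11, 12, 13, 14, 15, 16, 17, 18]
contaminated = clean[:-1] + [180]  # one extreme value replaces the maximum

for name, values in (("clean", clean), ("contaminated", contaminated)):
    print(name, spread_measures(values))
```

Here the IQR stays at 4.0 in both cases, while the range grows from 8 to 170 and the standard deviation increases sharply.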
 
After determining the first (lower) and third (upper) quartiles (<math display="inline">Q_1</math> and <math display="inline">Q_3</math> respectively) and the interquartile range (<math display="inline">\textrm{IQR} = Q_3 - Q_1 </math>) as outlined above, the fences are calculated using the following formula:
 
: <math>\text{Lower fence} = Q_1 - (1.5 \times \mathrm{IQR}) </math>
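A minimal sketch of the fence computation in Python follows. Only the lower fence formula appears above; the sketch assumes the usual symmetric upper fence <math display="inline">Q_3 + 1.5 \times \mathrm{IQR}</math>. It takes ''Q''<sub>1</sub> and ''Q''<sub>3</sub> as inputs so that any quartile method can supply them. The names <code>tukey_fences</code> and <code>flag_outliers</code> and the multiplier parameter <code>k</code> are hypothetical.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower fence, upper fence) for quartiles q1 <= q3.

    Lower fence = Q1 - k * IQR, with k = 1.5 as in the formula above.
    Upper fence = Q3 + k * IQR is assumed here as the symmetric counterpart.
    """
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values, q1, q3, k=1.5):
    """Return the values lying strictly outside the fences."""
    lower, upper = tukey_fences(q1, q3, k)
    return [x for x in values if x < lower or x > upper]


# Fences built from Q1 = 15 and Q3 = 40 (the quartiles of Example 2 by Methods 1 to 3),
# then used to check a few new observations.
print(tukey_fences(15, 40))                # (-22.5, 77.5)
print(flag_outliers([5, 50, 100], 15, 40))  # [100]
```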
* [http://www.hackmath.net/en/calculator/quartile-q1-q2-q3-calculation Quartiles calculator] – simple quartiles calculator
* [http://www.vias.org/tmdatanaleng/cc_quartile.html Quartiles] – An example how to calculate it
* [https://quartilecalculator.net/ Quartiles Calculator] – online quartile and interquartile range calculator
 
[[Category:Summary statistics]]