Quartile: Difference between revisions

→‎Example 2: Added how the Method 3 is executed in the Example 2.
 
(10 intermediate revisions by 5 users not shown)
Line 6:
* The second quartile (''Q''<sub>2</sub>) is the [[median]] of a data set; thus 50% of the data lies below this point.
* The third quartile (''Q''<sub>3</sub>) is the 75th percentile, where the lowest 75% of the data lies below this point. It is known as the ''upper'' quartile.<ref name=":0">{{Cite book |author=Dekking, Michel <!--1946– --> |url=https://archive.org/details/modernintroducti0000unse_h6a1 |title=A modern introduction to probability and statistics: understanding why and how |date=2005 |publisher=Springer |isbn=978-1-85233-896-1 |location=London |pages=[https://archive.org/details/modernintroducti0000unse_h6a1/page/236/ 236-238] |oclc=262680588 |url-access=limited}}</ref>
Along with the minimum and maximum of the data (which are also quartiles), the three quartiles described above provide a [[five-number summary]] of the data. This summary is important in statistics because it provides information about both the [[Mean (Statistics)|center]] and the [[Statistical dispersion|spread]] of the data. Knowing the lower and upper quartile provides information on how big the spread is and if the dataset is [[Skewness|skewed]] toward one side. Since quartiles divide the number of data points evenly, the [[Range (statistics)|range]] is generally not the same between adjacent quartiles (i.e. usually (''Q''<sub>3</sub> - ''Q''<sub>2</sub>) ≠ (''Q''<sub>2</sub> - ''Q''<sub>1</sub>)). [[Interquartile range]] (IQR) is defined as the difference between the 75th and 25th percentiles or ''Q''<sub>3</sub> - ''Q''<sub>1</sub>. While the maximum and minimum also show the spread of the data, the upper and lower quartiles can provide more detailed information on the location of specific data points, the presence of [[outlier]]s in the data, and the difference in spread between the middle 50% of the data and the outer data points.<ref>{{Cite web |url=https://magoosh.com/statistics/quartiles-used-statistics/ |archive-url=https://web.archive.org/web/20191210060305/https://magoosh.com/statistics/quartiles-used-statistics/ |archive-date=2019-12-10 |url-status=deviated |title=How are Quartiles Used in Statistics? |last=Knoch |first=Jessica |date=February 23, 2018 |website=[[Magoosh]] |access-date=February 24, 2023}}{{cbignore}}</ref>
 
== Definitions ==
Line 49:
==== Method 1 ====
 
# Use the [[median]] to divide the ordered data set into two halves. The median becomes the second quartile.
#* If there are an odd number of data points in the original ordered data set, '''do not include''' the median (the central value in the ordered list) in either half.
#* If there are an even number of data points in the original ordered data set, split this data set exactly in half.
Line 57:
==== Method 2 ====
 
# Use the median to divide the ordered data set into two halves. The median becomes the second quartile.
#* If there are an odd number of data points in the original ordered data set, '''include''' the median (the central value in the ordered list) in both halves.
#* If there are an even number of data points in the original ordered data set, split this data set exactly in half.
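
As a rough illustration, the splitting rule of Methods 1 and 2 can be sketched in Python. The function name <code>quartiles_by_halves</code> and its <code>include_median</code> flag are hypothetical, chosen only for this sketch. The sketch assumes that the lower and upper quartiles are then taken as the medians of the lower and upper halves, and the 11-point data set is only an illustration.

```python
from statistics import median


def quartiles_by_halves(data, include_median):
    """Return (Q1, Q2, Q3) by splitting the ordered data at the median.

    include_median=False corresponds to Method 1, True to Method 2.
    The flag only matters when the number of data points is odd.
    Needs at least two data points so that both halves are non-empty.
    """
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    if n % 2 == 1 and include_median:
        # Odd count, Method 2: the central value joins both halves.
        lower, upper = ordered[:half + 1], ordered[half:]
    elif n % 2 == 1:
        # Odd count, Method 1: the central value joins neither half.
        lower, upper = ordered[:half], ordered[half + 1:]
    else:
        # Even count: split exactly in half (same for both methods).
        lower, upper = ordered[:half], ordered[half:]
    return median(lower), median(ordered), median(upper)


sample = [6, 7, 15, 36, 39, 40, 41, 42, 43, 47, 49]  # illustrative data
print(quartiles_by_halves(sample, include_median=False))  # (15, 40, 43)
print(quartiles_by_halves(sample, include_median=True))   # (25.5, 40, 42.5)
```

With an odd number of data points the two methods can disagree, as the two printed results show; with an even number they coincide.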
Line 65:
==== Method 3 ====
 
# Use the median to divide the ordered data set into two halves. The median becomes the second quartile.
## If there is an odd number of data points, then go to the next step.
## If there is an even number of data points, then Method 3 starts off the same as Method 1 or Method 2 above, and you can choose whether to include the median as a new data point. If you include the median as a new data point, proceed to step 2 or 3 below, because you now have an odd number of data points. If you do not include the median as a new data point, continue with Method 1 or 2 as started.
Line 112:
 
==== Example 2 ====
Ordered Data Set (of an even number of data points): 7, 15, '''36, 39''', 40, 41.
 
The bold numbers (36, 39) are used to calculate the median as their average. As there are an even number of data points, the first three methods all give the same results. (Method 3 is executed such that the median is not chosen as a new data point, and Method 1 is started.)
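
The calculation for this data set can be checked with a short Python sketch. It assumes, as in Methods 1 and 2, that the lower and upper quartiles are the medians of the two halves; the variable names are illustrative only.

```python
from statistics import median

data = [7, 15, 36, 39, 40, 41]  # ordered data set from Example 2

half = len(data) // 2            # even count, so the split is exact
lower, upper = data[:half], data[half:]

q1 = median(lower)   # median of 7, 15, 36
q2 = median(data)    # average of the two central values, 36 and 39
q3 = median(upper)   # median of 39, 40, 41

print(q1, q2, q3)  # 15 37.5 40
```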
Line 148:
<math>F_X(x) = P(X \leq x)</math>.<ref name=":0" />
 
The [[Cumulative distribution function|CDF]] gives the probability that the random variable <math>X</math> is less than or equal to the value <math>x</math>. Therefore, the first quartile is the value of <math>x</math> when <math>F_X(x) = 0.25</math>, the second quartile is <math>x</math> when <math>F_X(x) = 0.5</math>, and the third quartile is <math>x</math> when <math>F_X(x) = 0.75</math>.<ref>{{Cite web|url=https://math.bme.hu/~nandori/Virtual_lab/stat/dist/CDF.pdf|title=6. Distribution and Quantile Functions|website=math.bme.hu}}</ref> The values of <math>x</math> can be found with the [[quantile function]] <math>Q(p)</math> where <math>p = 0.25</math> for the first quartile, <math>p = 0.5</math> for the second quartile, and <math>p = 0.75</math> for the third quartile. The quantile function is the inverse of the cumulative distribution function if the cumulative distribution function is strictly [[Monotonic function|monotonically increasing]], because the [[Bijection|one-to-one correspondence]] between the input and output of the cumulative distribution function then holds.
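
As an illustration, the quartiles of a continuous distribution can be evaluated with the quantile function in Python's standard library. The standard normal distribution is only an assumed example here, chosen because its CDF is strictly increasing; <code>NormalDist.inv_cdf</code> plays the role of <math>Q(p)</math>.

```python
from statistics import NormalDist

dist = NormalDist(mu=0.0, sigma=1.0)  # assumed example distribution

for p in (0.25, 0.5, 0.75):
    x = dist.inv_cdf(p)  # quantile function Q(p)
    # Feeding Q(p) back into the CDF recovers p (up to rounding).
    print(p, round(x, 4), round(dist.cdf(x), 4))
```

For this symmetric distribution the first and third quartiles lie at equal distances on either side of the median.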
 
== Outliers ==
There are methods by which to check for [[outliers]] in the discipline of statistics and statistical analysis. Outliers could result from a shift in the location (mean) or in the scale (variability) of the process of interest.<ref>{{Cite journal|last=Walfish|first=Steven|date=November 2006|title=A Review of Statistical Outlier Method|url=http://www.statisticaloutsourcingservices.com/|journal=Pharmaceutical Technology}}</ref> Outliers could also be evidence of a sample population that has a non-normal distribution or of a contaminated population data set. Consequently, as is the basic idea of [[descriptive statistics]], when encountering an [[outlier]], we have to explain this value by further analysis of the cause or origin of the outlier. In cases of extreme observations, which are not an infrequent occurrence, the typical values must be analyzed. The [[Interquartile Range]] (IQR), defined as the difference between the upper and lower quartiles (<math display="inline">Q_3 - Q_1 </math>), may be used to characterize the data when there may be extremities that skew the data; the [[interquartile range]] is a relatively [[robust statistic]] (a property also sometimes called "resistance") compared to the [[Range (statistics)|range]] and [[standard deviation]]. There is also a mathematical method to check for outliers by determining "fences", upper and lower limits from which to check for outliers.
 
After determining the first (lower) and third (upper) quartiles (<math display="inline">Q_1</math> and <math display="inline">Q_3</math> respectively) and the interquartile range (<math display="inline">\textrm{IQR} = Q_3 - Q_1 </math>) as outlined above, the fences are calculated using the following formulas:
 
: <math>\text{Lower fence} = Q_1 - (1.5 \times \mathrm{IQR}) \, </math>
: <math>\text{Upper fence} = Q_3 + (1.5 \times \mathrm{IQR}), \,</math>[[File:Boxplot outliers example.jpg|thumb|Boxplot Diagram with Outliers]]
 
where ''Q''<sub>1</sub> and ''Q''<sub>3</sub> are the first and third quartiles, respectively. The lower fence is the "lower limit" and the upper fence is the "upper limit" of the data; any data point below the lower fence or above the upper fence can be considered an outlier. The fences provide a guideline by which to define an [[outlier]], which may be defined in other ways. The fences define a "range" outside which an outlier exists; a way to picture this is a boundary of a fence, outside which are "outsiders" as opposed to outliers. It is common for the lower and upper fences along with the outliers to be represented by a [[Box plot|boxplot]]. For the boxplot shown on the right, only the vertical heights correspond to the visualized data set, while the horizontal width of the box is irrelevant. Outliers located outside the fences in a boxplot can be marked with any choice of symbol, such as an "x" or "o". The fences are sometimes also referred to as "whiskers", while the entire plot visual is called a "box-and-whisker" plot.
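
The fence calculation can be sketched in Python as follows. The function name <code>fences</code>, its parameter <code>k</code>, and the data set are hypothetical; the quartiles are assumed here to be the medians of the two halves of an even-sized ordered data set, and other quartile methods may give different fences.

```python
from statistics import median


def fences(q1, q3, k=1.5):
    """Return (lower fence, upper fence) for the given quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Hypothetical data with one unusually large value.
data = sorted([6, 7, 15, 36, 39, 40, 41, 42, 43, 47, 49, 120])

half = len(data) // 2                 # even count: split exactly in half
q1, q3 = median(data[:half]), median(data[half:])

lower, upper = fences(q1, q3)
outliers = [x for x in data if x < lower or x > upper]

print(q1, q3)        # 25.5 45.0
print(lower, upper)  # -3.75 74.25
print(outliers)      # [120]
```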
 
When spotting an outlier in the data set by calculating the interquartile ranges and boxplot features, it might be easy to mistakenly view it as evidence that the population is non-normal or that the sample is contaminated. However, this method should not take the place of a [[hypothesis test]] for determining normality of the population. The significance of the outliers varies depending on the sample size. If the sample is small, then it is more probable to get interquartile ranges that are unrepresentatively small, leading to narrower fences. Therefore, it would be more likely to find data that are marked as outliers.<ref>{{Cite journal|last=Dawson|first=Robert|date=July 1, 2011|title=How Significant is a Boxplot Outlier?|journal=Journal of Statistics Education|volume=19|issue=2|doi=10.1080/10691898.2011.11889610|doi-access=free}}</ref>
 
== Computer software for quartiles ==
Line 194:
|}
 
=== Excel: ===
The Excel function ''QUARTILE(array, quart)'' provides the desired quartile value for a given array of data, using Method 3 from above. In the ''QUARTILE'' function (a legacy function from Excel 2007 or earlier, giving the same output as the function ''QUARTILE.INC''), array is the dataset of numbers that is being analyzed and quart is any of the following 5 values depending on which quartile is being calculated.<ref>{{Cite web|url=https://exceljet.net/excel-functions/excel-quartile-function|title=How to use the Excel QUARTILE function {{!}} Exceljet|website=exceljet.net|access-date=December 11, 2019}}</ref>
 
{| class="wikitable"
|+
Line 217 ⟶ 216:
|Maximum value
|}
 
=== MATLAB: ===
In order to calculate quartiles in MATLAB, the function ''quantile''(''A'',''p'') can be used, where ''A'' is the vector of data being analyzed and ''p'' is the percentage that relates to the quartiles as stated below.<ref>{{Cite web|url=https://www.mathworks.com/help/stats/quantile.html|title=Quantiles of a data set – MATLAB quantile|website=www.mathworks.com|access-date=December 11, 2019}}</ref>
{| class="wikitable"
|+
Line 258 ⟶ 257:
* [http://www.hackmath.net/en/calculator/quartile-q1-q2-q3-calculation Quartiles calculator] – simple quartiles calculator
* [http://www.vias.org/tmdatanaleng/cc_quartile.html Quartiles] – An example of how to calculate it
* [https://quartilecalculator.net/ Quartiles Calculator] – online quartile and interquartile range calculator
 
[[Category:Summary statistics]]