→Method 4: the pth quantile is not q(p) but q(p/4). The notation for 'integer part' was incorrect. |
TatheAtharv (talk | contribs) |
(41 intermediate revisions by 25 users not shown) | |||
Line 2:
{{Use mdy dates|date=May 2020}}
In [[statistics]],
* The first quartile (''Q''<sub>1</sub>) is defined as the
* The second quartile (''Q''<sub>2</sub>) is the [[median]] of a data set; thus 50% of the data lies below this point.
* The third quartile (''Q''<sub>3</sub>) is the
Along with the minimum and maximum of the data (which are also quartiles), the three quartiles described above provide a [[five-number summary]] of the data. This summary is important in statistics because it provides information about both the [[Mean (Statistics)|center]] and the [[Statistical dispersion|spread]] of the data. Knowing the lower and upper quartile provides information on how big the spread is and if the dataset is [[Skewness|skewed]] toward one side. Since quartiles divide the number of data points evenly, the [[Range (statistics)|range]] is generally not the same between adjacent quartiles (i.e.
== Definitions ==
Line 19:
! ''Q''<sub>1</sub>
|{{plainlist|style=font-weight:bold|
*
*
* 25th [[percentile]]
}}
|
|-
! ''Q''<sub>2</sub>
|{{plainlist|style=font-weight:bold|
*
* [[
* 50th percentile
}}
|
|-
! ''Q''<sub>3</sub>
|{{plainlist|style=font-weight:bold|
*
*
* 75th percentile
}}
|
|}
Line 45:
=== Discrete distributions ===
For discrete distributions, there is no universal agreement on selecting the quartile values.<ref>{{cite journal |title=Sample quantiles in statistical packages|journal=American Statistician |date=November 1996 |volume=50 |issue=4 |pages=361–365 |first1=Rob J |last1=Hyndman |author1-link=Rob J. Hyndman |first2=Yanan |last2=Fan |url=http://robjhyndman.com/papers/quantiles/ |doi=10.2307/2684934|jstor=2684934}}</ref>
==== Method 1 ====
# Use the [[median]] to divide the ordered data set into two
#* If there
#* If there
# The lower quartile value is the median of the lower half of the data. The upper quartile value is the median of the upper half of the data.
This rule is employed by the [[TI-83]] calculator [[boxplot]] and "1-Var Stats" functions.
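Method 1 can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function name <code>quartiles_method1</code> is hypothetical, and the sketch assumes that the median is excluded from both halves when the number of data points is odd.

```python
from statistics import median

def quartiles_method1(data):
    """Return (Q1, Q2, Q3) using Method 1 (hypothetical helper name).

    Assumption: with an odd number of points the median is excluded
    from both halves; with an even number the data is split in half.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    # Skip the central value when n is odd.
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return median(lower), median(xs), median(upper)

print(quartiles_method1([6, 7, 15, 36, 39, 40, 41, 42, 43, 47, 49]))
# prints (15, 40, 43)
```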
Line 57:
==== Method 2 ====
# Use the
#* If there are an odd number of data points in the original ordered data set, '''include''' the median (the central value in the ordered list) in both halves.
#* If there are an even number of data points in the original ordered data set, split this data set exactly in half.
# The lower quartile value is the median of the lower half of the data. The upper quartile value is the median of the upper half of the data.
The values found by this method are also known as "[[John Tukey|Tukey]]'s hinges";<ref>{{Cite book|isbn=978-0-201-07616-5|title=Exploratory Data Analysis|last1=Tukey|first1=John Wilder|author-link=John Tukey|date=1977|url-access=registration|url=https://archive.org/details/exploratorydataa00tuke_0}}</ref> see also [[midhinge]].
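Method 2 can be sketched in Python as follows. The function name <code>quartiles_method2</code> is hypothetical and the code is only an illustration of the steps listed above.

```python
from statistics import median

def quartiles_method2(data):
    """Return (Q1, Q2, Q3) using Method 2, i.e. Tukey's hinges
    (hypothetical helper name)."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    # With an odd number of points the median belongs to both halves.
    lower = xs[:half + 1] if n % 2 else xs[:half]
    upper = xs[half:]
    return median(lower), median(xs), median(upper)

print(quartiles_method2([6, 7, 15, 36, 39, 40, 41, 42, 43, 47, 49]))
# prints (25.5, 40, 42.5)
```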
==== Method 3 ====
# Use the median to divide the ordered data set into two halves. The median becomes the second quartile.
## If there is an odd number of data points, go to the next step.
## If there is an even number of data points, Method 3 starts off the same as Method 1 or Method 2 above, and you can choose whether or not to include the median as a new data point. If you include the median as a new data point, proceed to step 2 or 3 of Method 3, because you now have an odd number of data points.
# If there are (4''n''+1) data points, then the lower quartile is 25% of the ''n''th data value plus 75% of the (''n''+1)th data value; the upper quartile is 75% of the (3''n''+1)th data point plus 25% of the (3''n''+2)th data point.
# If there are (4''n''+3) data points, then the lower quartile is 75% of the (''n''+1)th data value plus 25% of the (''n''+2)th data value; the upper quartile is 25% of the (3''n''+2)th data point plus 75% of the (3''n''+3)th data point.
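The weighted averages in steps 2 and 3 can be sketched in Python as follows. The function name <code>quartiles_method3</code> is hypothetical. For an even number of data points the sketch assumes that the median is not added as a new data point, so it falls back to Method 1; rejecting a single data point is also a choice made for this sketch.

```python
from statistics import median

def quartiles_method3(data):
    """Return (Q1, Q2, Q3) using Method 3 (hypothetical helper name)."""
    xs = sorted(data)
    size = len(xs)
    n, r = divmod(size, 4)

    def x(i):
        # 1-based access, matching the "nth data value" wording.
        return xs[i - 1]

    if r == 1:  # 4n+1 data points
        if n == 0:
            raise ValueError("need at least 5 points in the 4n+1 case")
        q1 = 0.25 * x(n) + 0.75 * x(n + 1)
        q3 = 0.75 * x(3 * n + 1) + 0.25 * x(3 * n + 2)
    elif r == 3:  # 4n+3 data points
        q1 = 0.75 * x(n + 1) + 0.25 * x(n + 2)
        q3 = 0.25 * x(3 * n + 2) + 0.75 * x(3 * n + 3)
    else:
        # Assumption: even count, median not added, so proceed as Method 1.
        half = size // 2
        q1, q3 = median(xs[:half]), median(xs[half:])
    return q1, median(xs), q3

print(quartiles_method3([6, 7, 15, 36, 39, 40, 41, 42, 43, 47, 49]))
# prints (20.25, 40, 42.75)
```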
==== Method 4 ====
If we have an ordered dataset <math>x_1, x_2, ..., x_n</math>, then we can interpolate between data points to find the <math>p</math>th empirical [[quantile]] if <math>x_i</math> is in the <math>i/(n+1)</math> quantile. If we denote the integer part of a number <math>a</math> by <math>\lfloor a \rfloor</math>, then the empirical quantile function is given by,
<math>q(p/4) = x_{k} + \alpha(x_{k+1} - x_{k})</math>,
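The interpolation can be sketched in Python as follows. The definitions of <math>k</math> and <math>\alpha</math> are not shown in this excerpt, so the sketch assumes <math>k = \lfloor p(n+1)/4 \rfloor</math> and <math>\alpha = p(n+1)/4 - k</math>, which is consistent with <math>x_i</math> lying at the <math>i/(n+1)</math> quantile. The function name <code>quartile_method4</code> and the clamping to the smallest or largest value at the ends are likewise assumptions made for illustration.

```python
import math

def quartile_method4(data, p):
    """Return the p-th quartile (p = 1, 2, 3) by interpolation.

    Assumptions (not stated in the excerpt): k = floor(p(n+1)/4),
    alpha = p(n+1)/4 - k, and positions outside 1..n are clamped.
    """
    xs = sorted(data)
    n = len(xs)
    if n == 0:
        raise ValueError("data must not be empty")
    pos = p * (n + 1) / 4
    k = math.floor(pos)
    alpha = pos - k
    if k < 1:
        return xs[0]
    if k >= n:
        return xs[-1]
    # xs is 0-based, so x_k is xs[k - 1] and x_{k+1} is xs[k].
    return xs[k - 1] + alpha * (xs[k] - xs[k - 1])

data = [7, 15, 36, 39, 40, 41]
print([quartile_method4(data, p) for p in (1, 2, 3)])
# prints [13.0, 37.5, 40.25]
```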
Line 79 ⟶ 81:
==== Example 1 ====
Ordered Data Set (of an odd number of data points): 6, 7, 15, 36, 39, '''40''', 41, 42, 43, 47, 49.
The bold number (40) is the median, which splits the data set into two halves with an equal number of data points.
{| class="wikitable"
|-
Line 108 ⟶ 112:
==== Example 2 ====
Ordered Data Set (of an even number of data points): 7, 15, '''36, 39''', 40, 41.
The bold numbers (36, 39) are used to calculate the median as their average. As there is an even number of data points, the first three methods all give the same results. (Method 3 is carried out without adding the median as a new data point, so it proceeds as Method 1.)
{| class="wikitable"
|-
Line 140 ⟶ 144:
=== Continuous probability distributions ===
[[File:NormalCDFQuartile3.svg|thumb|Quartiles on a cumulative distribution function of a normal distribution]]
If we define a [[continuous probability distribution]] as <math>P(X)</math> where <math>X</math> is a [[Real number|real valued]] [[random variable]], its [[cumulative distribution function]] (CDF) is given by
<math>F_X(x) = P(X \leq x)</math>.<ref name=":0" />
The [[Cumulative distribution function|CDF]] gives the probability that the random variable <math>X</math> is less than or equal to the value <math>x</math>. Therefore, the first quartile is the value of <math>x</math> when <math>F_X(x) = 0.25</math>, the second quartile is <math>x</math> when <math>F_X(x) = 0.5</math>, and the third quartile is <math>x</math> when <math>F_X(x) = 0.75</math>.<ref>{{Cite web|url=https://math.bme.hu/~nandori/Virtual_lab/stat/dist/CDF.pdf|title=6. Distribution and Quantile Functions|website=math.bme.hu}}</ref> The values of <math>x</math> can be found with the [[quantile function]] <math>Q(p)</math> where <math>p = 0.25
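As an illustration, the quartiles of a normal distribution can be obtained by evaluating its quantile function (the inverse of the CDF) at 0.25, 0.5 and 0.75. The sketch below uses <code>NormalDist.inv_cdf</code> from Python's standard <code>statistics</code> module; the helper name <code>distribution_quartiles</code> and the choice of a standard normal distribution are assumptions made for the example.

```python
from statistics import NormalDist

def distribution_quartiles(dist):
    """Return (Q1, Q2, Q3) of a distribution exposing inv_cdf
    (hypothetical helper name)."""
    return tuple(dist.inv_cdf(p) for p in (0.25, 0.5, 0.75))

standard_normal = NormalDist(mu=0.0, sigma=1.0)
q1, q2, q3 = distribution_quartiles(standard_normal)
print(q1, q2, q3)

# Check: the CDF evaluated at each quartile returns the probability.
print(standard_normal.cdf(q1), standard_normal.cdf(q3))
```

For the standard normal distribution the first and third quartiles are approximately −0.674 and 0.674, and the second quartile is 0.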
== Outliers ==
There are methods for checking for [[outliers]] in statistics and statistical analysis. Outliers could result from a shift in the location (mean) or in the scale (variability) of the process of interest.<ref>{{Cite journal|last=Walfish|first=Steven|date=November 2006|title=A Review of Statistical Outlier Method|url=http://www.statisticaloutsourcingservices.com/|journal=Pharmaceutical Technology}}</ref> Outliers could also
After determining the first (lower) and third (upper) quartiles (<math display="inline">Q_1</math> and <math display="inline">Q_3</math> respectively) and the interquartile range (<math display="inline">\textrm{IQR} = Q_3 - Q_1 </math>) as outlined above, the fences are calculated using the following formulas:
: <math>\text{Lower fence} = Q_1 - (1.5 \times \textrm{IQR})</math>
: <math>\text{Upper fence} = Q_3 + (1.5 \times \textrm{IQR})</math>
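The fence calculation can be sketched in Python as follows. The function names <code>fences</code> and <code>find_outliers</code> are hypothetical, the multiplier is assumed to be the conventional 1.5, and the quartiles are passed in so that any of the methods above can supply them.

```python
def fences(q1, q3, k=1.5):
    """Return (lower fence, upper fence) for quartiles q1 and q3.

    Assumption: k = 1.5 is the conventional multiplier of the IQR.
    """
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(data, q1, q3, k=1.5):
    """Return the values lying strictly outside the fences."""
    lower, upper = fences(q1, q3, k)
    return [x for x in data if x < lower or x > upper]

# Quartiles of Example 1 under Method 1: Q1 = 15, Q3 = 43.
data = [6, 7, 15, 36, 39, 40, 41, 42, 43, 47, 49]
print(fences(15, 43))                        # prints (-27.0, 85.0)
print(find_outliers(data + [120], 15, 43))   # prints [120]
```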
When spotting an outlier in the data set by calculating the interquartile ranges and boxplot features, it might be
== Computer software for quartiles ==
Line 190 ⟶ 194:
|}
=== Excel ===
The Excel function ''QUARTILE(array, quart)'' provides the desired quartile value for a given array of data, using Method 3 from above. In the ''QUARTILE'' function, ''array'' is the dataset of numbers being analyzed and ''quart'' is one of the following 5 values, depending on which quartile is being calculated.<ref>{{Cite web|url=https://exceljet.net/excel-functions/excel-quartile-function|title=How to use the Excel QUARTILE function {{!}} Exceljet|website=exceljet.net|access-date=December 11, 2019}}</ref>
{| class="wikitable"
|+
Line 213 ⟶ 216:
|Maximum value
|}
=== MATLAB ===
In order to calculate quartiles in Matlab, the function ''quantile''(''A'',''p
{| class="wikitable"
|+
Line 254 ⟶ 257:
* [http://www.hackmath.net/en/calculator/quartile-q1-q2-q3-calculation Quartiles calculator] – simple quartiles calculator
* [http://www.vias.org/tmdatanaleng/cc_quartile.html Quartiles] – an example of how to calculate quartiles
* [https://quartilecalculator.net/ Quartiles Calculator] – online quartile and interquartile range calculator
[[Category:Summary statistics]]