Python statistics

Python's built-in statistics module provides functions for calculating mathematical statistics of numeric data.

mean()

• Return the arithmetic mean ("average") of data.
``````
from statistics import mean

print(mean([1, 2, 3, 4, 4]))
# output: 2.8
print(mean([-1.0, 2.5, 3.25, 5.75]))
# output: 2.625
``````

fmean()

• Convert data to floats and compute the arithmetic mean.
• This runs faster than the mean() function and it always returns a float.
• If the input dataset is empty, it raises a StatisticsError.
``````
from statistics import fmean

print(fmean([3.5, 4.0, 5.25]))
# output: 4.25
``````
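As a rough illustration of the points above, the sketch below compares the return types of mean() and fmean() and shows the error raised for empty input. The Fraction inputs are an illustrative choice and do not come from the original example.

```python
from fractions import Fraction
from statistics import StatisticsError, fmean, mean

# Illustrative data: exact rational numbers.
data = [Fraction(1, 2), Fraction(3, 4)]

exact = mean(data)    # mean() keeps the input type, giving a Fraction
approx = fmean(data)  # fmean() converts to float before averaging

print(type(exact).__name__, exact)
print(type(approx).__name__, approx)

empty_error = None
try:
    fmean([])  # an empty dataset is rejected
except StatisticsError as exc:
    empty_error = exc
    print("empty input raised StatisticsError:", exc)
```

If exact results for Fraction or Decimal inputs matter, mean() is the safer choice. If speed matters and a float is acceptable, fmean() is likely the better fit.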

geometric_mean()

• Convert data to floats and compute the geometric mean.
• Raises a StatisticsError if the input dataset is empty, if it contains a zero, or if it contains a negative value.
``````
from statistics import geometric_mean

print(geometric_mean([54, 24, 36]))
# output: 36.000000000000014
``````
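To make the definition concrete, the following sketch checks geometric_mean() against two textbook formulations: the n-th root of the product, and the exponential of the mean logarithm. It is a verification aid under those standard definitions, not a description of how the library computes the result internally. It assumes Python 3.8 or later for math.prod.

```python
from math import exp, isclose, log, prod
from statistics import geometric_mean

data = [54, 24, 36]
n = len(data)

# Definition: the n-th root of the product of the n values.
root_of_product = prod(data) ** (1 / n)

# Equivalent form: the exponential of the mean of the logarithms.
exp_of_mean_log = exp(sum(log(v) for v in data) / n)

result = geometric_mean(data)
print(isclose(result, root_of_product))
print(isclose(result, exp_of_mean_log))
```

Because these are floating-point calculations, the result can differ from the mathematically exact value (36 here) in the last few digits, which is why the comparison uses isclose() rather than ==.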

harmonic_mean()

• Return the harmonic mean of data.
• It is suited to averaging ratios or rates, such as speeds.
``````
from statistics import harmonic_mean

print(harmonic_mean([40, 60]))
# output: 48.0
``````
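The sketch below shows why the harmonic mean is the appropriate average for rates. It uses a hypothetical trip in which the same distance is driven at 40 km/h and then at 60 km/h. The leg distance of 120 km is an arbitrary illustrative value.

```python
from math import isclose
from statistics import harmonic_mean

# Hypothetical trip: two legs of equal length at different speeds.
leg_distance = 120  # km per leg (illustrative value)
speeds = [40, 60]   # km/h

total_distance = leg_distance * len(speeds)
total_time = sum(leg_distance / s for s in speeds)  # 3 h + 2 h
average_speed = total_distance / total_time

print(average_speed)
print(isclose(average_speed, harmonic_mean(speeds)))
```

The arithmetic mean of the two speeds would be 50 km/h, which overstates the true average because more time is spent on the slower leg.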

median()

• Return the median (middle value) of numeric data.
``````
from statistics import median

print(median([1, 3, 5]))
# output: 3
print(median([1, 3, 5, 7]))
# output: 4.0
``````

median_low()

• Return the low median of numeric data. When the number of data points is even, the smaller of the two middle values is returned.
``````
from statistics import median_low

print(median_low([1, 3, 5]))
# output: 3
print(median_low([1, 3, 5, 7]))
# output: 3
``````

median_high()

• Return the high median of data. When the number of data points is even, the larger of the two middle values is returned.
``````
from statistics import median_high

print(median_high([1, 3, 5]))
# output: 3
print(median_high([1, 3, 5, 7]))
# output: 5
``````
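The three median functions differ only when the number of data points is even. The following sketch picks the two middle values from the sorted data by index and compares them with the library results. The unsorted input list is an illustrative choice.

```python
from statistics import median, median_high, median_low

data = [7, 1, 5, 3]     # even number of values, unsorted on purpose
ordered = sorted(data)  # [1, 3, 5, 7]
n = len(ordered)

low = ordered[(n - 1) // 2]  # lower of the two middle values
high = ordered[n // 2]       # upper of the two middle values

print(low == median_low(data))
print(high == median_high(data))
print((low + high) / 2 == median(data))
```

For an odd number of values both index expressions point at the same element, so all three functions agree. The low and high medians are always members of the data set, which can be useful for discrete data where an interpolated value would be meaningless.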

median_grouped()

• Return the median of grouped continuous data, calculated as the 50th percentile using interpolation.
``````
from statistics import median_grouped

print(median_grouped([1, 2, 2, 3, 4, 4, 4, 4, 4, 5]))
# output: 3.7
print(median_grouped([52, 52, 53, 54]))
# output: 52.5
``````
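The results above can be re-derived by hand with the usual grouped-median interpolation formula, L + interval * (n/2 - cf) / f. Here L is the lower limit of the median class, cf is the number of values below that class, and f is the number of values inside it. The helper grouped_median_sketch is a hypothetical name, and the code is an illustrative re-derivation that assumes each value is the midpoint of a class of width 1. It is not the library's implementation.

```python
from math import isclose
from statistics import median_grouped


def grouped_median_sketch(data, interval=1):
    """Illustrative re-derivation of the grouped median (hypothetical helper)."""
    ordered = sorted(data)
    n = len(ordered)
    x = ordered[n // 2]                       # midpoint of the median class
    lower = x - interval / 2                  # lower limit L of that class
    below = sum(1 for v in ordered if v < x)  # cumulative frequency cf
    inside = ordered.count(x)                 # frequency f of the class
    return lower + interval * (n / 2 - below) / inside


first = [1, 2, 2, 3, 4, 4, 4, 4, 4, 5]
second = [52, 52, 53, 54]

print(isclose(grouped_median_sketch(first), median_grouped(first)))
print(isclose(grouped_median_sketch(second), median_grouped(second)))
```

For the first data set the median class is 3.5 to 4.5, with four values below it and five inside it, so the estimate is 3.5 + (5 - 4) / 5 = 3.7.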

mode()

• Return the most common data point from discrete or nominal data.
``````
from statistics import mode

print(mode([1, 2, 2, 3, 4, 4, 4, 4, 4, 5]))
# output: 4
print(mode([52, 52, 53, 54]))
# output: 52
``````

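multimode()

• Return a list of the most frequently occurring values, in the order they were first encountered in the data.
``````
from statistics import multimode

print(multimode("aabbbbbbbbcc"))
# output: ['b']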
print(multimode('aabbbbccddddeeffffgg'))
# output: ['b', 'd', 'f']
``````
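The difference between mode() and multimode() shows up when several values are tied for most common. The sketch below uses an illustrative list of colour names and assumes the behaviour of Python 3.8 and later, where mode() returns the first tied value encountered instead of raising an error.

```python
from statistics import mode, multimode

# Illustrative data: "red" and "blue" both occur twice.
tied = ["red", "blue", "blue", "red", "green"]

single = mode(tied)          # the first of the tied values encountered
every = multimode(tied)      # all tied values, in order of first appearance
nothing = multimode([])      # empty input gives an empty list

print(single)
print(every)
print(nothing)
```

If it matters whether the data has a unique mode, checking len(multimode(data)) == 1 is one possible test.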

quantiles()

• Divide data into n continuous intervals with equal probability.
• Returns a list of n - 1 cut points separating the intervals.
``````
from statistics import quantiles

data = [
105, 129, 87, 86, 111, 111, 89, 81, 108, 92, 110,
100, 75, 105, 103, 109, 76, 119, 99, 91, 103, 129,
106, 101, 84, 111, 74, 87, 86, 103, 103, 106, 86,
111, 75, 87, 102, 121, 111, 88, 89, 101, 106, 95,
103, 107, 101, 81, 109, 104]

print([round(q, 1) for q in quantiles(data, n=10)])
# output: [81.0, 86.2, 89.0, 99.4, 102.5, 103.6, 106.0, 109.8, 111.0]
``````
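Setting n=4 gives quartiles, and the method parameter controls whether the data is treated as a sample (the default, "exclusive") or as the whole population ("inclusive"). The small data set below is an illustrative choice that keeps the cut points easy to verify by hand.

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Default method="exclusive": the data is treated as a sample that may
# not include the most extreme values of the population.
quartiles_exclusive = quantiles(data, n=4)

# method="inclusive": the data is treated as the full population, so the
# minimum and maximum act as the 0th and 100th percentiles.
quartiles_inclusive = quantiles(data, n=4, method="inclusive")

print(quartiles_exclusive)  # [2.5, 5.0, 7.5]
print(quartiles_inclusive)  # [3.0, 5.0, 7.0]
```

Both calls return n - 1 = 3 cut points. The middle one is the median in either case, while the outer cut points sit further apart under the exclusive method.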

pstdev()

• Return the population standard deviation (the square root of the population variance).
``````
from statistics import pstdev

print(pstdev([1.5, 2.5, 2.5, 2.75, 3.25, 4.75]))
# output: 0.986893273527251
``````

pvariance()

• Return the population variance of data.
``````
from statistics import pvariance

data = [0.0, 0.25, 0.25, 1.25, 1.5, 1.75, 2.75, 3.25]
print(pvariance(data))
# output: 1.25
``````

stdev()

• Return the sample standard deviation (the square root of the sample variance).
``````
from statistics import stdev

print(stdev([1.5, 2.5, 2.5, 2.75, 3.25, 4.75]))
# output: 1.0810874155219827
``````

variance()

• Return the sample variance of data.
``````
from statistics import variance

data = [2.75, 1.75, 1.25, 0.25, 0.5, 1.25, 3.5]
print(variance(data))
# output: 1.3720238095238095
``````
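The population and sample functions differ only in the divisor: the population versions divide the sum of squared deviations by n, and the sample versions divide by n - 1. The sketch below recomputes all four measures by hand from the textbook formulas and compares them with the library results. It is not the library's implementation.

```python
from math import isclose, sqrt
from statistics import mean, pstdev, pvariance, stdev, variance

data = [2.75, 1.75, 1.25, 0.25, 0.5, 1.25, 3.5]
n = len(data)
m = mean(data)

# Sum of squared deviations from the mean, shared by both formulas.
squared_deviations = sum((v - m) ** 2 for v in data)

population_var = squared_deviations / n    # divide by n
sample_var = squared_deviations / (n - 1)  # divide by n - 1

print(isclose(population_var, pvariance(data)))
print(isclose(sample_var, variance(data)))
print(isclose(sqrt(population_var), pstdev(data)))
print(isclose(sqrt(sample_var), stdev(data)))
```

As a rule of thumb, use the population functions when the data covers every member of the group of interest, and the sample functions when the data is a subset used to estimate the spread of a larger group.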

covariance()

• Return the sample covariance of two inputs x and y.
• Covariance is a measure of the joint variability of two inputs.
``````
from statistics import covariance

x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [1, 2, 3, 1, 2, 3, 1, 2, 3]
print(covariance(x, y))
# output: 0.75
``````

correlation()

• Return Pearson's correlation coefficient for two inputs.
• Pearson's correlation coefficient r takes values between -1 and +1.
• It measures the strength and direction of the linear relationship: +1 means a very strong positive linear relationship, -1 means a very strong negative linear relationship, and 0 means no linear relationship.
``````
from statistics import correlation

x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [9, 8, 7, 6, 5, 4, 3, 2, 1]
print(correlation(x, x))
# output: 1.0
print(correlation(x, y))
# output: -1.0
``````

linear_regression()

• Return the slope and intercept of simple linear regression parameters estimated using ordinary least squares. Simple linear regression describes the relationship between an independent variable x and a dependent variable y in terms of a linear function:

y = slope * x + intercept + noise

• Here slope and intercept are the regression parameters to be estimated, and noise represents the variability of the data not explained by the linear regression (it equals the difference between the predicted and actual values of the dependent variable).
``````
from statistics import NormalDist, linear_regression

x = [1, 2, 3, 4, 5]
noise = NormalDist().samples(5, seed=42)
y = [3 * x[i] + 2 + noise[i] for i in range(5)]
print(linear_regression(x, y))
# output (exact digits may vary across Python versions): LinearRegression(slope=3.0907891417020465, intercept=1.756849704861633)
``````