Anova¶
Note: This notebook will use the same veryants.csv files
ANOVA tests the null hypothesis that all groups have the same population mean
(eg., the true average price of a sale is the same at every location of VeryAnts).
In [3]:
from scipy.stats import f_oneway
import pandas as pd
In [4]:
veryants = pd.read_csv('veryants.csv')
In [6]:
# store the data
a = veryants.Sale[veryants.Store == 'A']
b = veryants.Sale[veryants.Store == 'B']
c = veryants.Sale[veryants.Store == 'C']
In [6]:
fstat, pval = f_oneway(a,b,c)
pval
Out[6]:
If the p-value is below our significance threshold, we can conclude that at least one pair of our groups earned significantly different scores on average however, we won’t know which pair until we investigate further!¶
Tukey’s Range Test¶
Now, we want to find out which pair of stores are different
In [7]:
from statsmodels.stats.multicomp import pairwise_tukeyhsd
In [10]:
tukey_results = pairwise_tukeyhsd(veryants.Sale, veryants.Store, 0.05)
print(tukey_results)
Conclusion:¶
This result is different from our previous experiment using Multiple test
(The Store A vs C Sales Comparison)
P-values less than 0.05 are significant.
True = Reject the Null and use the alternative hypothesis (significant)
False = Accept the null (not significant)
A vs B
TRUE are significantly diffirent
A vs C
FALSE are notsignificantly diffirent
B vs C
FALSE are not significantly diffirent
In [13]:
import seaborn as sns
import matplotlib.pyplot as plt
sns.boxplot(data=veryants, x='Store', y='Sale')
plt.show()