# ANOVA

Note: This notebook uses the same `veryants.csv` file.

ANOVA tests the null hypothesis that `all groups have the same population mean` (e.g., the true average price of a sale is the same at every VeryAnts location).
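To make the null hypothesis concrete, here is a minimal sketch of the F statistic that a one-way ANOVA is built on: the variation between group means divided by the variation within groups. The helper `one_way_f` and the toy numbers are hypothetical and only illustrate the calculation; they are not taken from `veryants.csv`.

```python
from statistics import mean

def one_way_f(*groups):
    """Return the one-way ANOVA F statistic for two or more groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)       # number of groups
    n = len(all_values)   # total number of observations

    # Between-group sum of squares: how far each group mean is from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: how far each value is from its own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    # Ratio of mean squares, with k - 1 and n - k degrees of freedom
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical toy data, not real sales
toy_a = [1, 2, 3]
toy_b = [2, 3, 4]
toy_c = [5, 6, 7]

f_stat = one_way_f(toy_a, toy_b, toy_c)
print(f_stat)  # about 13 for these toy numbers
```

A large F value means the group means are spread out more than the noise inside each group would explain, which is evidence against the null hypothesis.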

In [3]:

```
from scipy.stats import f_oneway
import pandas as pd
```

In [4]:

```
veryants = pd.read_csv('veryants.csv')
```

In [6]:

```
# split the Sale column into one series per store
a = veryants.Sale[veryants.Store == 'A']
b = veryants.Sale[veryants.Store == 'B']
c = veryants.Sale[veryants.Store == 'C']
```

In [6]:

```
fstat, pval = f_oneway(a,b,c)
pval
```

Out[6]:

##### If the p-value is below our significance threshold, we can conclude that at least one pair of our stores has significantly different sales on average. However, we won’t know which pair until we investigate further!
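The decision rule above can be sketched as follows. The threshold `alpha = 0.05` and the toy lists are assumptions for illustration only; with the real data you would pass the per-store `Sale` series instead.

```python
from scipy.stats import f_oneway

# Hypothetical sale amounts for three stores (NOT taken from veryants.csv)
toy_a = [1, 2, 3]
toy_b = [2, 3, 4]
toy_c = [5, 6, 7]

alpha = 0.05  # significance threshold, chosen before running the test
fstat, pval = f_oneway(toy_a, toy_b, toy_c)

if pval < alpha:
    decision = "reject the null: at least one store mean differs"
else:
    decision = "fail to reject the null: no evidence that the means differ"

print(f"F = {fstat:.2f}, p = {pval:.4f} -> {decision}")
```

Note that rejecting the null here only says that some difference exists; it does not identify which stores differ.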

# Tukey’s Range Test

Now we want to find out which pairs of stores differ.

In [7]:

```
from statsmodels.stats.multicomp import pairwise_tukeyhsd
```

In [10]:

```
tukey_results = pairwise_tukeyhsd(veryants.Sale, veryants.Store, alpha=0.05)
print(tukey_results)
```

### Conclusion:

This result differs from our previous experiment using `Multiple test` (the Store A vs C sales comparison).

Adjusted p-values below 0.05 are significant. The `reject` column of the table reads as follows:

True = Reject the null hypothesis (the difference is significant)

False = Fail to reject the null hypothesis (the difference is not significant)

`A vs B`

TRUE: significantly different

`A vs C`

FALSE: not significantly different

`B vs C`

FALSE: not significantly different
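Instead of reading the printed table by eye, the significant pairs can also be pulled out in code. The sketch below assumes the result object exposes `groupsunique` and `reject`, and that the pairs are ordered like `itertools.combinations` of the sorted group labels. The data is hypothetical and unrelated to the `veryants.csv` results above.

```python
from itertools import combinations

import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical data (NOT from veryants.csv): store B is clearly higher than A and C
sales = np.array([10, 11, 12, 10, 11,
                  20, 21, 19, 20, 22,
                  10.5, 11.5, 10, 12, 11])
stores = np.array(['A'] * 5 + ['B'] * 5 + ['C'] * 5)

toy_results = pairwise_tukeyhsd(sales, stores, alpha=0.05)

# Pair each group combination with its reject flag
pairs = list(combinations(toy_results.groupsunique, 2))
significant_pairs = [
    (str(g1), str(g2))
    for (g1, g2), rejected in zip(pairs, toy_results.reject)
    if rejected
]
print(significant_pairs)
```

With the real notebook data, the same loop over `tukey_results` would list only the pairs marked True in the table.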

In [13]:

```
import seaborn as sns
import matplotlib.pyplot as plt
sns.boxplot(data=veryants, x='Store', y='Sale')
plt.show()
```