Two T-Test

Two-Sample T-Test

Use a two-sample t-test to check for an association between a binary (two-category) categorical variable and a quantitative variable.

Suppose that a company is considering a new color scheme for its website. They think that visitors will spend more time on the site if it is brightly colored. To test this theory, the company shows the old version of the website to 50 site visitors and the new version to another 50, and finds that, on average, visitors spent 2 minutes longer on the new version than on the old. Will this be true of future visitors as well? Or could this have happened by random chance among the 100 people in this sample?

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import ttest_ind

data = pd.read_csv('version_time.csv')
data.head()
Out[1]:
time_minutes version
0 11.92 new
1 12.90 old
2 13.76 old
3 15.68 old
4 16.28 old
In [6]:
# separate out times for the two versions
old = data.time_minutes[data.version=='old']
new = data.time_minutes[data.version=='new']
In [9]:
# Check whether the two standard deviations are roughly equal
# (rule of thumb used here: a ratio between 0.9 and 1.1 is close enough)
ratio = np.std(old) / np.std(new)
ratio
Out[9]:
0.9121502510423959
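
The ratio check above matters because ttest_ind assumes equal variances by default. As a minimal sketch of how the check could drive that choice, the example below uses synthetic stand-in data (the names old_demo, new_demo and the generated values are assumptions, not the contents of version_time.csv) and falls back to Welch's t-test via the equal_var parameter when the ratio falls outside the 0.9 to 1.1 range.

```python
import numpy as np
from scipy.stats import ttest_ind

# Synthetic stand-in data (hypothetical; NOT the values in version_time.csv)
rng = np.random.default_rng(0)
old_demo = rng.normal(loc=12.0, scale=3.0, size=50)
new_demo = rng.normal(loc=14.0, scale=3.2, size=50)

# Ratio of the sample standard deviations (ddof=1 gives the sample estimate)
ratio_demo = np.std(old_demo, ddof=1) / np.std(new_demo, ddof=1)

# Use the pooled-variance test when the ratio is close to 1,
# otherwise switch to Welch's t-test (equal_var=False)
equal_var = bool(0.9 <= ratio_demo <= 1.1)
tstat_demo, pval_demo = ttest_ind(old_demo, new_demo, equal_var=equal_var)

print(f"ratio={ratio_demo:.3f}, equal_var={equal_var}, p-value={pval_demo:.4f}")
```

With equal sample sizes the two variants usually give similar p-values, so the choice mostly matters when group sizes or spreads differ noticeably.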
In [6]:
# run the two-sample t-test
tstat, pval = ttest_ind(old, new)
pval
Out[6]:
0.0020408264429904

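Because the p-value (about 0.002) is less than 0.05, we reject the null hypothesis that the two versions have the same mean time and conclude that the difference is statistically significant.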
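
To make the p-value less of a black box, here is a rough sketch of the calculation that ttest_ind performs with its default settings (pooled variance, two-sided). The arrays a and b are synthetic placeholders, not the website data, and the variable names are illustrative only.

```python
import numpy as np
from scipy import stats

# Synthetic placeholder samples (hypothetical; not the version_time.csv data)
rng = np.random.default_rng(1)
a = rng.normal(12.0, 3.0, 50)
b = rng.normal(14.0, 3.0, 50)

n_a, n_b = len(a), len(b)
df = n_a + n_b - 2  # degrees of freedom for the pooled test

# Pooled variance: weighted average of the two sample variances
sp2 = ((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / df

# t statistic: difference in means divided by its standard error
t_manual = (a.mean() - b.mean()) / np.sqrt(sp2 * (1 / n_a + 1 / n_b))

# Two-sided p-value from the t distribution's survival function
p_manual = 2 * stats.t.sf(abs(t_manual), df)

# Compare with SciPy's result (default equal_var=True, two-sided)
t_scipy, p_scipy = stats.ttest_ind(a, b)
print(np.isclose(t_manual, t_scipy), np.isclose(p_manual, p_scipy))
```

The manual and SciPy values should agree up to floating-point error, which shows that the test is simply comparing the difference in means to the variability expected from sampling.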

In [7]:
#plot overlapping histograms
plt.hist(old, alpha=.8, label='old')
plt.hist(new, alpha=.8, label='new')
plt.legend()
plt.show()
In [5]:
import seaborn as sns
sns.boxplot(x = 'version', y = 'time_minutes', data= data)
Out[5]:
<AxesSubplot:xlabel='version', ylabel='time_minutes'>
