Two-Sample T-Test

A two-sample t-test checks for an association between a binary (two-category) categorical variable and a quantitative variable.

Suppose that a company is considering a new color scheme for its website. The company thinks that visitors will spend more time on the site if it is brightly colored. To test this theory, it shows the old version of the website to 50 visitors and the new version to another 50, and finds that, on average, visitors spent 2 minutes longer on the new version than on the old. Will this be true of future visitors as well? Or could this have happened by random chance among the 100 people in this sample?
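To make the "random chance" question concrete, here is a minimal simulation sketch. It does not use the post's data: the visit times are synthetic, and the mean of 15 minutes and standard deviation of 4 minutes are assumed values chosen only for illustration. Both groups are drawn from the same distribution, so every "significant" result is a false alarm.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

n_simulations = 2000
n_per_group = 50  # matches the 50 visitors per version in the example

# Hypothetical visit times: both groups share the same mean (15 minutes)
# and standard deviation (4 minutes), so any gap between their sample
# means is due to chance alone.
group_a = rng.normal(loc=15, scale=4, size=(n_simulations, n_per_group))
group_b = rng.normal(loc=15, scale=4, size=(n_simulations, n_per_group))

# One t-test per simulated experiment (one experiment per row)
pvals = ttest_ind(group_a, group_b, axis=1).pvalue

false_positive_rate = np.mean(pvals < 0.05)
print(f"Share of chance-only experiments with p < 0.05: {false_positive_rate:.3f}")
```

The printed share should come out close to 0.05. That is what the 0.05 cutoff means: when there is no real difference, roughly 1 experiment in 20 will still look significant.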

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import ttest_ind

data = pd.read_csv('version_time.csv')
data.head()

# Separate out the times for the two versions
old = data.time_minutes[data.version=='old']
new = data.time_minutes[data.version=='new']

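# Check whether the two standard deviations are roughly equal
# (ttest_ind assumes equal variances by default)
# Rule of thumb: a ratio between 0.9 and 1.1 is close enough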
ratio = np.std(old) / np.std(new)
ratio

0.9121502510423959
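Here the ratio falls inside the 0.9 to 1.1 range, so the default test is reasonable. If the ratio fell outside that range, one option would be Welch's t-test, which `ttest_ind` runs when `equal_var=False` is passed. The sketch below illustrates that fallback on synthetic data; `old_sim` and `new_sim` are hypothetical stand-ins for `old` and `new`, generated with deliberately unequal spread.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)  # fixed seed so the sketch is repeatable

# Hypothetical stand-ins for `old` and `new`; the means and standard
# deviations are assumed values, not taken from version_time.csv.
old_sim = rng.normal(loc=15, scale=2, size=50)
new_sim = rng.normal(loc=17, scale=5, size=50)

ratio_sim = np.std(old_sim) / np.std(new_sim)

if 0.9 <= ratio_sim <= 1.1:
    # Spreads are similar: standard (pooled-variance) t-test
    tstat_sim, pval_sim = ttest_ind(old_sim, new_sim)
else:
    # Spreads differ: Welch's t-test does not assume equal variances
    tstat_sim, pval_sim = ttest_ind(old_sim, new_sim, equal_var=False)

print(ratio_sim, pval_sim)
```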

# Run the two-sample t-test
tstat, pval = ttest_ind(old, new)
pval

0.0020408264429904

The p-value is less than 0.05, so we can conclude that the difference in time spent between the two versions is statistically significant.
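To show what `ttest_ind` computes under its default equal-variance setting, the sketch below rebuilds the pooled t statistic and two-sided p-value by hand and compares them with SciPy's result. The data are synthetic: `old_sim` and `new_sim` are hypothetical stand-ins for `old` and `new`, with assumed means and standard deviation.

```python
import numpy as np
from scipy.stats import t, ttest_ind

rng = np.random.default_rng(2)  # fixed seed so the sketch is repeatable

# Hypothetical stand-ins for `old` and `new` (assumed values)
old_sim = rng.normal(loc=15, scale=4, size=50)
new_sim = rng.normal(loc=17, scale=4, size=50)

n1, n2 = len(old_sim), len(new_sim)

# Sample variances (ddof=1), combined into one pooled variance
var1 = np.var(old_sim, ddof=1)
var2 = np.var(new_sim, ddof=1)
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)

# Standard error of the difference in means, then the t statistic
std_err = np.sqrt(pooled_var * (1 / n1 + 1 / n2))
t_manual = (old_sim.mean() - new_sim.mean()) / std_err

# Two-sided p-value from the t distribution with n1 + n2 - 2 degrees of freedom
dof = n1 + n2 - 2
p_manual = 2 * t.sf(abs(t_manual), dof)

t_scipy, p_scipy = ttest_ind(old_sim, new_sim)
print(t_manual, t_scipy)
print(p_manual, p_scipy)
```

The hand-computed values should agree with SciPy's up to floating-point error.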

# Plot overlapping histograms of time spent on each version
plt.hist(old, alpha=.8, label='old')
plt.hist(new, alpha=.8, label='new')
plt.legend()
plt.show()

# Side-by-side box plots of time spent on each version
import seaborn as sns
sns.boxplot(x='version', y='time_minutes', data=data)
plt.show()

Output of data.head():

	time_minutes	version
0	11.92	new
1	12.90	old
2	13.76	old
3	15.68	old
4	16.28	old
