Familiar (Blood Transfusion Company)¶
Welcome to Familiar, a startup in the new market of blood transfusion!
Blood transfusion is the process of transferring blood products into a person’s circulation intravenously. Transfusions are used for various medical conditions to replace lost components of the blood.
Part I¶
Familiar's best package
- The first thing we want to know is whether Familiar’s most basic package, the Vein Pack, actually has a significant impact on the subscribers. It would be a marketing goldmine if we can show that subscribers to the Vein Pack live longer than other people.
Part II¶
Life span
- Compare the lifespan data between the different packages (Vein and Artery).
Part III¶
Side Effect
- Analyze the side effects of the different packages (Vein and Artery).
Loading Data¶
import pandas as pd
import numpy as np
from scipy.stats import ttest_1samp, ttest_ind, chi2_contingency
import matplotlib.pyplot as plt
import seaborn as sns
Data Inspection¶
lifespans = pd.read_csv('familiar_lifespan.csv')
print(lifespans.shape)
lifespans.sample(5)
# Check for outliers with a box plot of lifespan per pack
plt.figure(figsize=(4,3))
sns.boxplot(x='pack', y='lifespan', data=lifespans)
# Check the value counts for outliers
# Only one observation, under the Artery Pack, has a lifespan of 68; it is replaced with the Artery mean below.
lifespans.value_counts().sort_values(ascending=False).head(3)
# Calculating group means
lifespans.groupby('pack').mean()
# Replace lifespans of 69 or lower (the single Artery outlier) with the Artery mean from the groupby above
lifespans['lifespan'] = lifespans['lifespan'].where(lifespans['lifespan'] > 69.0, 74.873662)
# Checking our NEW dataframe
plt.figure(figsize=(4,3))
sns.boxplot(x='pack', y='lifespan', data=lifespans)
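The replacement above hard-codes the Artery mean. As a minimal sketch of an alternative, the outlier can be replaced with a mean computed from the data itself. The toy DataFrame below is made up for illustration and only mirrors the `pack` and `lifespan` column names; the names `toy`, `threshold`, `is_outlier`, `group_means` and `cleaned` are hypothetical. Unlike the cell above, this version leaves the outlier out when computing the mean, which is a design choice rather than the notebook's method.

```python
import pandas as pd

# Made-up toy data; the real values come from familiar_lifespan.csv
toy = pd.DataFrame({
    'pack': ['vein', 'vein', 'vein', 'artery', 'artery', 'artery', 'artery'],
    'lifespan': [76.0, 75.0, 77.0, 74.0, 75.0, 76.0, 68.0],
})

threshold = 69.0  # same cutoff as in the cell above

# Flag rows at or below the cutoff
is_outlier = toy['lifespan'] <= threshold

# Mean lifespan of each pack, computed from the non-outlier rows only
group_means = toy.loc[~is_outlier].groupby('pack')['lifespan'].mean()

# Replace each flagged value with the mean of its own pack
cleaned = toy.copy()
cleaned.loc[is_outlier, 'lifespan'] = cleaned.loc[is_outlier, 'pack'].map(group_means)
print(cleaned)
```

Because the mean is looked up per pack, an outlier in the Vein group would be replaced with the Vein mean rather than the Artery mean.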
I. Familiar’s best package¶
We’d like to find out if the average lifespan of subscribers to Familiar’s best seller, the Vein Pack, is significantly different from the average life expectancy of 73 years.
Hypothesis Testing (One sample T-test)¶
Comparing a sample average to a hypothetical population average
Null:
The average lifespan of a Vein Pack subscriber is 73 years.
Alternative:
The average lifespan of a Vein Pack subscriber is NOT 73 years.
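# Select the Vein Pack lifespans (assumes the pack column labels these rows 'vein')
vein_pack_lifespans = lifespans.lifespan[lifespans.pack == 'vein']
tstat, pval = ttest_1samp(vein_pack_lifespans, 73)
print('Pvalue: ' + str('{:.10f}'.format(pval)))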
plt.figure(figsize=(5,3.5))
sns.histplot(vein_pack_lifespans, kde= True)
# plt.hist(np.array(vein_pack_lifespans.lifespan))
plt.axvline(73, color = 'g', label='Expected Mean', linestyle ='--')
plt.axvline(vein_pack_lifespans.mean(), color = 'r', label='Observed Mean',linestyle ='--')
plt.legend(loc=0)
plt.show()
Conclusion:¶
Reject the null hypothesis: the average lifespan of Vein Pack subscribers is significantly different from 73 years, and the observed mean is higher.
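The decision rule behind this conclusion can be sketched as follows. The sample here is synthetic (drawn from a normal distribution with made-up parameters) and only stands in for the real Vein Pack data; the 0.05 significance level and the names `sample`, `alpha`, `direction` and `decision` are assumptions for illustration.

```python
import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(0)
# Synthetic stand-in for the Vein Pack lifespans (not the real data)
sample = rng.normal(loc=76.0, scale=4.0, size=20)

alpha = 0.05  # assumed significance level
tstat, pval = ttest_1samp(sample, 73)

if pval < alpha:
    # A two-sided test only says "different"; the sign of the
    # difference comes from comparing the sample mean to 73
    direction = 'above' if sample.mean() > 73 else 'below'
    decision = 'reject the null; the sample mean is ' + direction + ' 73'
else:
    decision = 'fail to reject the null'

print('t = {:.3f}, p = {:.4f}: {}'.format(tstat, pval, decision))
```

The test itself is two-sided, so the direction of the effect is read from the sample mean rather than from the p-value.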
II. Life span¶
Pumping Life Into The Company
We’d like to find out if the average lifespan of a Vein Pack subscriber is significantly different from the average life expectancy for the Artery Pack.
In order to differentiate Familiar’s different product lines, we’d like to compare this lifespan data between our different packages. Our next step up from the Vein Pack is the Artery Pack.
Hypothesis Testing¶
Two sample T-test¶
For an association between a binary (two-category) categorical variable and a quantitative variable.
Null:
The average lifespan of a Vein Pack subscriber is equal to the average lifespan of an Artery Pack subscriber.
Alternative:
The average lifespan of a Vein Pack subscriber is NOT equal to the average lifespan of an Artery Pack subscriber.
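# Select the Artery Pack lifespans (assumes the pack column labels these rows 'artery')
artery_pack_lifespans = lifespans.lifespan[lifespans.pack == 'artery']
# Check whether the two standard deviations are roughly equal
# A ratio between 0.9 and 1.1 is treated as close enough here
# The observed ratio was judged acceptable
ratio = np.std(vein_pack_lifespans) / np.std(artery_pack_lifespans)
ratio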
tstat, pval = ttest_ind(vein_pack_lifespans, artery_pack_lifespans)
print('P-value: ' + str(pval))
plt.figure(figsize=(6,4))
plt.hist(vein_pack_lifespans, alpha=.5, label='Vein Pack', density=True)
plt.hist(artery_pack_lifespans, alpha=.5, label='Artery Pack', density=True)
plt.title('Vein vs. Artery lifespan', fontsize=20)
plt.xlabel('Lifespan in years', fontsize=15)
# density=True normalizes the histograms, so the y-axis shows density, not counts
plt.ylabel('Density', fontsize=15)
plt.axvline(np.mean(vein_pack_lifespans), color='b', label='Vein Pack Mean', linestyle='--')
plt.axvline(np.mean(artery_pack_lifespans), color='orange', label='Artery Pack Mean', linestyle='--')
plt.legend(fontsize='x-small')
plt.show()
Conclusion:¶
Our p-value is 0.09164, which is larger than 0.05, so I fail to reject the null hypothesis. I conclude that the average lifespan of Vein Pack subscribers is not significantly different from that of Artery Pack subscribers, although the Vein Pack lifespans are slightly higher on average.
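The standard deviation check and the two-sample test can be tied together in one step. The sketch below uses synthetic groups with made-up parameters, not the Familiar data. It reuses the 0.9 to 1.1 ratio rule from this notebook and, as an assumption on my part, falls back to Welch's t-test (`equal_var=False` in `scipy.stats.ttest_ind`) when the ratio falls outside that range.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
# Synthetic stand-ins for the two packs (not the real data)
group_a = rng.normal(loc=76.0, scale=4.0, size=20)
group_b = rng.normal(loc=74.0, scale=4.0, size=20)

# Ratio of sample standard deviations
ratio = np.std(group_a, ddof=1) / np.std(group_b, ddof=1)

# Rule of thumb from the notebook: 0.9 to 1.1 counts as roughly equal
equal_var = bool(0.9 <= ratio <= 1.1)

# equal_var=True runs the standard pooled t-test,
# equal_var=False runs Welch's t-test
tstat, pval = ttest_ind(group_a, group_b, equal_var=equal_var)

test_name = 'pooled t-test' if equal_var else "Welch's t-test"
print('ratio = {:.3f}, {}: p = {:.4f}'.format(ratio, test_name, pval))
```

Welch's test does not assume equal variances, so it is a reasonable default whenever the ratio check is borderline.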
III. Side Effects:¶
A Familiar Problem
Familiar wants to be able to advise potential subscribers about possible side effects of these packs and whether they differ for the Vein vs. the Artery pack.
iron = pd.read_csv('familiar_iron.csv')
# Data Checking
iron.iron.unique()
# Data Checking
iron.dtypes
# I want to convert the iron variable to ordinal categorical type
iron.iron = pd.Categorical(iron.iron,['low','normal','high'], ordered=True)
iron.dtypes
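To illustrate what the ordered categorical type adds, here is a small sketch on made-up values rather than the `familiar_iron.csv` data. The names `levels` and `ordered` are hypothetical.

```python
import pandas as pd

# Made-up iron levels in arbitrary order
levels = ['normal', 'low', 'high', 'low']

# Same conversion as above: categories listed from lowest to highest
ordered = pd.Series(
    pd.Categorical(levels, categories=['low', 'normal', 'high'], ordered=True)
)

# Sorting follows the category order instead of alphabetical order
print(ordered.sort_values().tolist())

# Comparisons against a category are allowed for ordered categoricals
print((ordered > 'low').tolist())

# min and max respect the declared order
print(ordered.max())
```

Without `ordered=True`, sorting would still use the listed category order, but comparisons such as `ordered > 'low'` would raise a `TypeError`.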
Checking the association between the pack that a subscriber gets (Vein vs. Artery) and their iron level with a Chi-Square test. The null hypothesis is that pack and iron level are not associated.
iron.head(2)
Contingency_table = pd.crosstab(iron.pack, iron.iron)
Contingency_table
chi2, pval, dof, expected = chi2_contingency(Contingency_table)
print('P-value: ' + str('{:.30f}'.format(pval)))
Conclusion¶
The p-value is very low, so I reject the null hypothesis. There is a significant association between the pack a subscriber takes (Vein vs. Artery) and their iron level.
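For reference, the pieces returned by `chi2_contingency` can be inspected on a small table. The counts below are invented for illustration and are not the Familiar data; the variable `table` is hypothetical. The check on expected counts (all at least 5) is a common rule of thumb that I am adding as an assumption, not something the analysis above performs.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

# Invented counts: rows are packs, columns are iron levels
table = pd.DataFrame(
    [[30, 50, 20],
     [20, 50, 30]],
    index=['artery', 'vein'],
    columns=['low', 'normal', 'high'],
)

chi2, pval, dof, expected = chi2_contingency(table)

# Degrees of freedom: (rows - 1) * (columns - 1)
print('dof:', dof)

# Expected counts under the null hypothesis of no association
print(pd.DataFrame(expected, index=table.index, columns=table.columns))

# Rule of thumb: the test is considered reliable when all expected counts are >= 5
print('all expected >= 5:', bool((expected >= 5).all()))

print('chi2 = {:.3f}, p = {:.4f}'.format(chi2, pval))
```

Comparing the observed table with `expected` shows which cells drive the statistic: the further the observed counts sit from the expected ones, the larger the chi-square value.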