# FetchMaker

FetchMaker is the hottest new tech startup. FetchMaker's mission is to match up prospective dog owners with their perfect pet.

FetchMaker estimates (based on historical data for all dogs) that 8% of dogs in their system are rescues.

### Part I

`Rescued Whippets`

- Test whether the share of rescues among whippets (a dog breed) is consistent with the 8% rescue rate for all dogs.

### Part II

`Dogs Size`

- Mid-Sized Dog Weights. Is there a significant difference in the average weights of these three dog breeds?

### Part III

`Dogs Colors`

- Color differences between poodles and shihtzus.

`Details`:

- `weight:` an integer representing how heavy a dog is in pounds
- `tail_length:` a float representing tail length in inches
- `age:` in years
- `color:` a string such as “brown” or “grey”
- `is_rescue:` a boolean, 0 or 1

```
import numpy as np
import pandas as pd
from scipy.stats import binom_test, ttest_ind, f_oneway, chi2_contingency
import matplotlib.pyplot as plt
import seaborn as sns
from statsmodels.stats.multicomp import pairwise_tukeyhsd
```

```
dogs = pd.read_csv('dog_data.csv')
dogs.head(2)
```

# I. Rescued whippets

FetchMaker estimates that 8% of dogs are rescues (based on historical data for all dogs).

- How many whippets are there in total?
- How many whippets are rescued?

```
print('Total dogs :' + str(len(dogs)))
print('Total rescued dogs :' + str(np.sum(dogs['is_rescue']==1)))
# Setting Variables
whippets = dogs[dogs['breed'] == 'whippet']
rescue_whippet = np.sum(whippets['is_rescue'] == 1)
print('Total whippets: ' + str(len(whippets)))
print('Total rescue whippets: ' + str(rescue_whippet))
```

## Hypothesis Test Binomial

A binomial test is used for binary categorical data to compare a sample frequency to an expected population-level probability.

In the records, 6% of all whippets are rescues. Let’s test how likely a result like this would be if the true rescue rate among whippets were 8%.

`Null:` 8% of whippets are rescues (6% and 8% are not significantly different).

`Alternative:` more or less than 8% of whippets are rescues.

## Using binom_test function

```
# binom_test runs an exact binomial test: it compares the observed number of
# rescued whippets with a binomial distribution where p = 0.08
pval = binom_test(rescue_whippet, len(whippets), .08)
print(pval)
```

The p-value is very high, so we fail to reject the null hypothesis: the observed share of rescued whippets (6%) is not significantly different from 8%.
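In recent SciPy releases `binom_test` has been deprecated and then removed in favor of `binomtest`, so the call above may fail on a newer installation. The sketch below shows the equivalent test with the newer function. It assumes the counts quoted in the text (6 rescued whippets out of 100) instead of reading `dog_data.csv`, and the variable names are illustrative.

```python
from scipy.stats import binomtest

# Counts taken from the text: 6 rescued whippets out of 100 (assumed, not read from the CSV)
n_whippets = 100
n_rescued = 6

# Exact two-sided binomial test against the 8% rescue rate
result = binomtest(n_rescued, n=n_whippets, p=0.08, alternative='two-sided')
print(result.pvalue)
```

`binomtest` returns a result object rather than a bare number, so the p-value is read from its `pvalue` attribute.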

## Manual analysis

We create a simulation to test our hypothesis.

```
# Simulate samples of 100 whippets (the total number of whippets) under the null hypothesis.
# np.random.choice picks 'y' (rescue) with probability 0.08 and 'n' (non-rescue) otherwise.
# Repeat this 300 times and store the number of rescues from each sample in a list.
null_outcomes = []
for i in range(300):
    simulated_whippets = np.random.choice(['y', 'n'], size=100, p=[0.08, 1-.08])
    num_rescued = np.sum(simulated_whippets == 'y')
    null_outcomes.append(num_rescued)
# Show the first 10 results
null_outcomes[0:10]
```

```
plt.figure(figsize=(7,5.5))
plt.hist(null_outcomes)
plt.axvline(rescue_whippet , color = 'r', linestyle='--', label ='observed rescue whippets')
plt.axvline((len(whippets) * .08), color = 'g', linestyle='--', label ='expected rescue whippets')
# Pass scalar percentiles (not one-element lists) so axvline receives a single x position
plt.axvline(np.percentile(null_outcomes, 2.5), color = 'purple', linestyle='--', label ='2.5th percentile')
plt.axvline(np.percentile(null_outcomes, 97.5), color = 'purple', linestyle='-', label ='97.5th percentile')
plt.legend(fontsize = 'small')
plt.show()
```

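### Confidence Interval (95%)

Under the null hypothesis, about 95% of simulated samples should contain between 3.0 and 14.0 rescued whippets (the exact bounds vary slightly between runs of the simulation).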

```
# Cut 2.5% from each tail of the null distribution to keep the central 95%
np.percentile(null_outcomes, [2.5,97.5])
```

If 8% of whippets are rescues, about 95% of simulated samples contain between roughly 3 and 13 rescued whippets, and the observed count is 6. Because 6 falls inside that range, we fail to reject the null hypothesis at the 5% significance level: the share of rescued whippets is not significantly different from 8%.
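The simulated range can be cross-checked against the exact binomial distribution, which does not change from run to run. The following is a minimal sketch using `scipy.stats.binom.interval`; it assumes the sample size of 100 and the observed count of 6 quoted in the text, and the variable names are illustrative.

```python
from scipy.stats import binom

n_whippets = 100   # sample size quoted in the text (assumed)
null_rate = 0.08   # rescue rate under the null hypothesis
observed = 6       # observed rescued whippets quoted in the text (assumed)

# Central 95% range of the number of rescues under the null hypothesis
low, high = binom.interval(0.95, n_whippets, null_rate)
print(low, high)

# True when the observed count lies inside the central 95% range
print(low <= observed <= high)
```

Because the binomial distribution is discrete, the central range usually covers slightly more than 95% of the probability.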

### P-value (two sided)

```
# Convert to an array so we can use element-wise comparisons
null_outcomes = np.array(null_outcomes)
# The expected value is 8 and the observed count is 6, a difference of 2.
# For a two-sided test, count outcomes at least that far from 8 in either direction: <= 6 or >= 10.
p_value_twoside = np.sum((null_outcomes <= 6) | (null_outcomes >= 10))/len(null_outcomes)
p_value_twoside
```

The p-value from our manual simulation is close to the one returned by the binom_test function. At the 5% significance level we fail to reject the null hypothesis that 8% of whippets are rescues.
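The same two-sided p-value can be estimated without hard-coding the cutoffs 6 and 10. The sketch below is one possible variant, not the notebook's own method: it draws the counts directly with NumPy's `Generator.binomial`, uses more trials than the 300 above to reduce noise, and assumes the sample size of 100 and the observed count of 6 quoted in the text. The seed and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed so the sketch is reproducible

n_whippets = 100   # sample size quoted in the text (assumed)
null_rate = 0.08   # rescue rate under the null hypothesis
observed = 6       # observed rescued whippets quoted in the text (assumed)
n_sims = 100_000   # more trials than the 300 used above

# Each draw is the number of rescues in one simulated sample of 100 whippets
null_counts = rng.binomial(n=n_whippets, p=null_rate, size=n_sims)

# Two-sided p-value: share of simulated counts at least as far from the
# expected value as the observed count (small tolerance for float rounding)
expected = n_whippets * null_rate
distance = abs(observed - expected)
p_value_sim = np.mean(np.abs(null_counts - expected) >= distance - 1e-9)
print(p_value_sim)
```

Deriving the cutoffs from `observed` and `expected` keeps the calculation valid if the data or the null rate change.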

### Visualizing our Null Outcomes over 300 loop trials

```
plt.figure(figsize=(15,7))
plt.plot(range(300), null_outcomes, label='simulated rescued whippets over 300 trials')
plt.axhline(len(whippets) * 0.08, color = 'r', linestyle = '--', label='8% (expected)')
plt.axhline(rescue_whippet, color = 'purple', linestyle='--', label ='observed rescued whippets')
plt.xlabel("Trial number")
plt.ylabel('Number of rescued whippets')
plt.legend()
```

# II. Dog Sizes

Mid-sized dog weights (whippets, terriers, and pitbulls)

Is there a significant difference in the average weights of these three dog breeds?

```
# Subset to just whippets, terriers, and pitbulls
dogs_wtp = dogs[dogs.breed.isin(['whippet', 'terrier', 'pitbull'])]
dogs_wtp.head(2)
```

### Checking the weight distributions

The weights of all three breeds look approximately normally distributed. There is a single outlier in the terrier breed, but the distribution is not skewed heavily enough to distort our analysis.

```
# Setting variables
pitbull_weight = dogs_wtp.weight[dogs_wtp['breed'] == 'pitbull']
terrier_weight = dogs_wtp.weight[dogs_wtp['breed'] == 'terrier']
whippet_weight = dogs_wtp.weight[dogs_wtp['breed'] == 'whippet']
```

```
plt.figure(figsize=(6,5))
plt.hist(pitbull_weight, label='Pitbull', density=True, alpha=.5 )
plt.hist(terrier_weight, label='Terrier', density=True, alpha=.5)
plt.hist(whippet_weight, label='Whippet', density=True, alpha=.5)
plt.legend()
```
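The visual check can be complemented with a few summary statistics per breed. The sketch below is only an illustration: it runs on synthetic weights (the means and spread are made up, not taken from `dog_data.csv`), and `demo` is a hypothetical stand-in for `dogs_wtp`.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical stand-in for dogs_wtp: synthetic weights, NOT the real data
demo = pd.DataFrame({
    'breed': np.repeat(['pitbull', 'terrier', 'whippet'], 100),
    'weight': np.concatenate([
        rng.normal(50, 8, 100),
        rng.normal(45, 8, 100),
        rng.normal(40, 8, 100),
    ]).round(),
})

# Count, mean, standard deviation, and skewness of weight for each breed
summary = demo.groupby('breed')['weight'].agg(['count', 'mean', 'std', 'skew'])
print(summary)

# Ratio of the largest to the smallest standard deviation across breeds
std_ratio = summary['std'].max() / summary['std'].min()
print(std_ratio)
```

Skewness near zero and similar standard deviations across breeds are consistent with what the histograms suggest. With the real data, replace `demo` with `dogs_wtp`.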