This analysis examines whether new hires are paid a higher salary than existing staff, so that the business can act to keep existing staff engaged. There are just under 1,500 staff, and the fields used are: employee ID, department, salary, whether the employee is a new hire, and job level.

See a snippet of the data we are using below:

```
## # A tibble: 6 x 5
##   employee_id department   salary new_hire job_level
##         <int> <chr>         <dbl> <chr>    <chr>
## 1           1 Sales       103264. No       Salaried
## 2           2 Engineering  80709. No       Hourly
## 3           4 Engineering  60737. Yes      Hourly
## 4           5 Engineering  99116. Yes      Salaried
## 5           7 Engineering  51022. No       Hourly
## 6           8 Engineering  98400. No       Salaried
```

Looking at the salary column, the minimum salary is £43,820, the maximum is £164,073, and the mean is £74,142.

`summary(pay$salary)`

```
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   43820   59378   70425   74142   84809  164073
```

The average salary is £73,425 for an existing member of staff and £76,074 for a new hire, which at first glance suggests that new hires are receiving a higher salary than existing staff.

```
## # A tibble: 2 x 2
##   new_hire avg_salary
##   <chr>         <dbl>
## 1 No           73425.
## 2 Yes          76074.
```
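
The table above can be produced with a grouped summary. The following is a minimal sketch rather than the original code: `pay_toy` and its salaries are made-up values standing in for the real `pay` table, and the use of `dplyr` is an assumption.

```r
library(dplyr)

# Hypothetical toy data standing in for the real `pay` table
pay_toy <- tibble(
  employee_id = 1:6,
  salary      = c(60000, 70000, 80000, 65000, 75000, 85000),
  new_hire    = c("No", "No", "No", "Yes", "Yes", "Yes")
)

# Average salary for existing staff ("No") and new hires ("Yes")
avg_by_hire <- pay_toy %>%
  group_by(new_hire) %>%
  summarise(avg_salary = mean(salary))

avg_by_hire
```

Running the same pipeline on the real data would replace `pay_toy` with `pay`.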

To test this statistically, we use a t-test to see whether the averages of the two groups are *reliably* different from each other, i.e., whether the difference between new hires and existing staff would hold not just in our sample but in a different group with the same characteristics. The t-test compares the difference between the groups with the variation within each group.

```
## estimate estimate1 estimate2 statistic p.value parameter conf.low
## 1 -2649.672 73424.6 76074.28 -2.343708 0.01937799 685.1554 -4869.424
## conf.high method alternative
## 1 -429.9199 Welch Two Sample t-test two.sided
```
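
Output in this shape comes from passing a `t.test()` result to `broom::tidy()`. The sketch below shows the pattern on simulated data; `pay_toy`, the group sizes, and the means and standard deviation are all assumptions chosen for illustration, so its numbers will not match the results above.

```r
library(dplyr)
library(broom)

# Hypothetical toy data: salaries are simulated, not taken from the real `pay` table
set.seed(42)
pay_toy <- tibble(
  new_hire = rep(c("No", "Yes"), times = c(40, 20)),
  salary   = c(rnorm(40, mean = 73000, sd = 15000),
               rnorm(20, mean = 76000, sd = 15000))
)

# t.test() defaults to Welch's test, which does not assume equal variances
result <- t.test(salary ~ new_hire, data = pay_toy) %>%
  tidy()

# estimate1 and estimate2 are the group means for "No" and "Yes"
result$p.value
```

In the tidied result, `estimate` is the difference `estimate1 - estimate2`, which is why it is negative when new hires earn more on average.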

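Our p-value is 0.019. This means that if there were truly no difference in average salary between new hires and existing staff, we would see a gap at least this large in only about 1.9% of samples. Because this is below the conventional 0.05 threshold, the difference is statistically significant.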

However, the two groups may differ in more than hire status, so we should consider job level when analysing salary. From the visualisation below, we can see that new hires include a lower proportion of hourly staff than existing staff.
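
The job-level mix within each group can also be tabulated directly. This is a rough sketch, assuming `dplyr`; `pay_toy` is invented for illustration and does not reflect the real proportions.

```r
library(dplyr)

# Hypothetical toy data standing in for the real `pay` table
pay_toy <- tibble(
  new_hire  = c("No", "No", "No", "No", "Yes", "Yes", "Yes", "Yes"),
  job_level = c("Hourly", "Hourly", "Hourly", "Salaried",
                "Hourly", "Hourly", "Salaried", "Salaried")
)

# Share of each job level within existing staff and within new hires
level_mix <- pay_toy %>%
  count(new_hire, job_level) %>%
  group_by(new_hire) %>%
  mutate(proportion = n / sum(n)) %>%
  ungroup()

level_mix
```

In this toy example, hourly staff make up 75% of existing staff but only 50% of new hires, the same kind of imbalance described above.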

When we compare average salary within each job level, new hires and existing staff look more similar.
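
A sketch of that comparison, again on made-up data (`pay_toy` and its salaries are assumptions): grouping by both `job_level` and `new_hire` shows how an overall gap can shrink once job level is held fixed.

```r
library(dplyr)

# Hypothetical toy data: new hires are more often salaried, as in the text
pay_toy <- tibble(
  salary    = c(50000, 54000, 90000, 52000, 92000, 96000),
  new_hire  = c("No", "No", "No", "Yes", "Yes", "Yes"),
  job_level = c("Hourly", "Hourly", "Salaried", "Hourly", "Salaried", "Salaried")
)

# Average salary for each combination of job level and hire status
avg_by_level <- pay_toy %>%
  group_by(job_level, new_hire) %>%
  summarise(avg_salary = mean(salary), .groups = "drop")

avg_by_level
```

Here the overall averages differ by more than £15,000, yet hourly staff average £52,000 in both groups and salaried staff differ by only £4,000, because the overall gap is driven mostly by the mix of job levels.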

More new hires are recruited as hourly staff than at other job levels, so we use a t-test to check whether there is a significant salary difference between new hires and existing staff at the hourly level. The p-value comes back at 0.08: if there were truly no difference, a gap at least this large would appear in about 8% of samples. This is above the conventional 0.05 threshold, so the average salary difference between new and existing hourly staff is not statistically significant, and on this evidence pay at this level appears fair.

```
## # A tibble: 1,039 x 5
##    employee_id department  salary new_hire job_level
##          <int> <chr>        <dbl> <chr>    <chr>
##  1           2 Engineering 80709. No       Hourly
##  2           4 Engineering 60737. Yes      Hourly
##  3           7 Engineering 51022. No       Hourly
##  4          10 Engineering 57106. Yes      Hourly
##  5          11 Engineering 55065. No       Hourly
##  6          12 Engineering 77158. No       Hourly
##  7          13 Engineering 48365. No       Hourly
##  8          14 Engineering 60945. Yes      Hourly
##  9          15 Engineering 59161. No       Hourly
## 10          16 Engineering 79324. Yes      Hourly
## # ... with 1,029 more rows
```
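
The filtered table above contains only hourly employees. A minimal sketch of such a filter, assuming `dplyr`, is shown below; `pay_toy` and `pay_filter_toy` are hypothetical names used so the example runs on its own.

```r
library(dplyr)

# Hypothetical toy rows in the same shape as the real `pay` table
pay_toy <- tibble(
  employee_id = c(1, 2, 4, 5),
  job_level   = c("Salaried", "Hourly", "Hourly", "Salaried"),
  new_hire    = c("No", "No", "Yes", "Yes")
)

# Keep only hourly employees
pay_filter_toy <- pay_toy %>%
  filter(job_level == "Hourly")

pay_filter_toy
```

Applied to the real data, the same `filter()` call on `pay` would give the `pay_filter` table used in the test that follows.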

```
# Test the difference in pay
library(dplyr)  # provides the %>% pipe
library(broom)  # provides tidy()

t.test(salary ~ new_hire, data = pay_filter) %>%
  tidy()
```

```
## estimate estimate1 estimate2 statistic p.value parameter conf.low
## 1 -1106.967 63965.71 65072.68 -1.750387 0.08066517 499.7005 -2349.483
## conf.high method alternative
## 1 135.5483 Welch Two Sample t-test two.sided
```

```
# Is the result significant?
significant <- FALSE
```
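
Rather than hard-coding the answer, the flag can be derived from the p-value itself. In this sketch the p-value is copied from the output above, and the 0.05 threshold `alpha` is an assumed convention, not something stated in the original analysis.

```r
# p-value reported by the t-test on hourly staff above
p_value <- 0.08066517

# Assumed significance threshold (the conventional 5% level)
alpha <- 0.05

# The result is significant only if the p-value falls below the threshold
significant <- p_value < alpha
significant
```

With these values `significant` is `FALSE`, matching the conclusion for hourly staff.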