Fair Pay Analysis

This analysis looks at whether new staff are paid a higher salary than existing staff, so that the business can make sure existing staff remain engaged. The data cover just under 1,500 employees, and the fields used are: employee ID, department, salary, whether the employee is a new hire, and job level.

See a snippet of the data we are using below:

## # A tibble: 6 x 5
##   employee_id department   salary new_hire job_level
##         <int> <chr>         <dbl> <chr>    <chr>    
## 1           1 Sales       103264. No       Salaried 
## 2           2 Engineering  80709. No       Hourly   
## 3           4 Engineering  60737. Yes      Hourly   
## 4           5 Engineering  99116. Yes      Salaried 
## 5           7 Engineering  51022. No       Hourly   
## 6           8 Engineering  98400. No       Salaried

Summarising the salary column shows a minimum of £43,820, a maximum of £164,073 and a mean of £74,142:

summary(pay$salary)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   43820   59378   70425   74142   84809  164073

The average salary is £73,425 for existing staff and £76,074 for new hires, so at first glance new staff appear to be paid more than existing staff.

## # A tibble: 2 x 2
##   new_hire avg_salary
##   <chr>         <dbl>
## 1 No           73425.
## 2 Yes          76074.
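The code behind this table is not shown. As a minimal, runnable sketch of how per-group averages like these can be computed, the block below rebuilds only the six-row snippet shown earlier (salaries as printed, i.e. rounded) and averages salary by new_hire with base R's aggregate(). With the full pay data, dplyr's `pay %>% group_by(new_hire) %>% summarise(avg_salary = mean(salary))` would be an equivalent assumption. Because the snippet has only six rows, its averages will not match the table above.

```r
# Six-row snippet of the data shown earlier (salaries as printed, i.e. rounded)
snippet <- data.frame(
  employee_id = c(1, 2, 4, 5, 7, 8),
  department  = c("Sales", rep("Engineering", 5)),
  salary      = c(103264, 80709, 60737, 99116, 51022, 98400),
  new_hire    = c("No", "No", "Yes", "Yes", "No", "No"),
  job_level   = c("Salaried", "Hourly", "Hourly", "Salaried", "Hourly", "Salaried")
)

# Mean salary for each value of new_hire (groups come out in sorted order)
avg_by_hire <- aggregate(salary ~ new_hire, data = snippet, FUN = mean)
names(avg_by_hire)[2] <- "avg_salary"
avg_by_hire
```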

To test this statistically, we use a t-test to check whether the averages of the two groups are reliably different, i.e. whether the difference between new hires and existing staff is likely to hold beyond our sample rather than being down to chance. The t-test compares the difference between the group means with the variation within each group.

##    estimate estimate1 estimate2 statistic    p.value parameter  conf.low
## 1 -2649.672   73424.6  76074.28 -2.343708 0.01937799  685.1554 -4869.424
##   conf.high                  method alternative
## 1 -429.9199 Welch Two Sample t-test   two.sided
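The output above comes from Welch's two-sample t-test, which is what R's t.test() runs by default. It divides the difference between the two group means by a standard error built from each group's own variance, which is the "between versus within" comparison described above. The sketch below computes that statistic and its Welch-Satterthwaite degrees of freedom by hand and checks them against t.test(). The salaries are simulated placeholders, not the pay data, and the helper names welch_t and welch_df are hypothetical.

```r
# Welch's t statistic: gap between group means divided by its standard error
welch_t <- function(x, y) {
  se <- sqrt(var(x) / length(x) + var(y) / length(y))  # within-group noise
  (mean(x) - mean(y)) / se                              # between-group gap / noise
}

# Welch-Satterthwaite approximation to the degrees of freedom
welch_df <- function(x, y) {
  vx <- var(x) / length(x)
  vy <- var(y) / length(y)
  (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
}

# Hypothetical simulated salaries, NOT the pay data
set.seed(42)
existing  <- rnorm(50, mean = 73000, sd = 15000)
new_staff <- rnorm(40, mean = 76000, sd = 15000)

res <- t.test(existing, new_staff)  # var.equal = FALSE by default, i.e. Welch
c(manual_t  = welch_t(existing, new_staff),  t_test_t  = unname(res$statistic))
c(manual_df = welch_df(existing, new_staff), t_test_df = unname(res$parameter))
```

With the formula interface, R orders the groups alphabetically (No before Yes), so a negative statistic, as in the output above, means existing staff earn less than new hires on average.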

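Our p-value is 0.019. This means that if there were truly no difference in average salary between new hires and existing staff, there would be only about a 1.9% chance of seeing a difference at least as large as the one observed (about £2,650). Because this is below the conventional 5% threshold, the difference is statistically significant: on average, new hires are paid more than existing staff.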
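A two-sided p-value is the probability, under the assumption of no true difference, of a t statistic at least as far from zero as the one observed. As a check, the sketch below recomputes the p-values from the t statistics and degrees of freedom printed in this analysis (the all-staff test above and the hourly-only test reported later) using R's pt(). It then compares them with 0.05, the conventional threshold assumed here. The helper name p_two_sided is hypothetical.

```r
# Two-sided p-value from a t statistic and its degrees of freedom
p_two_sided <- function(t, df) 2 * pt(-abs(t), df)

p_all    <- p_two_sided(-2.343708, 685.1554)  # all staff (printed output)
p_hourly <- p_two_sided(-1.750387, 499.7005)  # hourly staff only (printed output)

alpha <- 0.05  # conventional significance level (assumption)
round(c(all = p_all, hourly = p_hourly), 4)
c(all = p_all < alpha, hourly = p_hourly < alpha)  # TRUE means significant
```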

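However, the difference between the two groups may partly reflect job level, so we should take it into account when analysing salary. From the visualisation below, we can see that a lower proportion of new hires than of existing staff are hourly. Because hourly staff are paid less on average than salaried staff, this difference in mix alone would push up the average salary of new hires.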
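To see why the mix of job levels matters, the toy example below uses hypothetical numbers, not the pay data. New and existing staff are paid exactly the same within each job level, but new hires have a lower share of hourly staff (50% against 80%). The overall averages still differ, purely because of the different mix.

```r
# Hypothetical data: identical pay within each job level, different job-level mix
new_hire  <- rep(c("No", "Yes"), each = 10)
job_level <- c(rep("Hourly", 8), rep("Salaried", 2),  # existing staff: 80% hourly
               rep("Hourly", 5), rep("Salaried", 5))  # new hires: 50% hourly
toy <- data.frame(new_hire, job_level,
                  salary = ifelse(job_level == "Hourly", 60000, 100000))

# Share of each job level within each group (each row sums to 1)
mix <- prop.table(table(toy$new_hire, toy$job_level), margin = 1)
mix

# Overall averages differ: 68,000 for existing staff vs 80,000 for new hires
overall <- aggregate(salary ~ new_hire, data = toy, FUN = mean)
overall

# ...yet within each job level the two groups are paid the same
by_level <- aggregate(salary ~ job_level + new_hire, data = toy, FUN = mean)
by_level
```

Comparing salaries within each job level, as the analysis does next, removes this composition effect.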

When we compare average salaries within each job level, new hires and existing staff look much more similar.

Most staff are hourly (1,039 of just under 1,500), so to check whether there is a significant pay difference between new and existing hourly staff, we filter the data to hourly staff only and run the same t-test. The p-value is 0.08: if there were no true difference, a gap at least as large as the observed £1,107 would occur by chance about 8% of the time. As this is above the 5% threshold, the difference is not statistically significant. For hourly staff, we therefore have no evidence that new hires are paid more than existing staff, and on this basis their pay appears fair.

## # A tibble: 1,039 x 5
##    employee_id department  salary new_hire job_level
##          <int> <chr>        <dbl> <chr>    <chr>    
##  1           2 Engineering 80709. No       Hourly   
##  2           4 Engineering 60737. Yes      Hourly   
##  3           7 Engineering 51022. No       Hourly   
##  4          10 Engineering 57106. Yes      Hourly   
##  5          11 Engineering 55065. No       Hourly   
##  6          12 Engineering 77158. No       Hourly   
##  7          13 Engineering 48365. No       Hourly   
##  8          14 Engineering 60945. Yes      Hourly   
##  9          15 Engineering 59161. No       Hourly   
## 10          16 Engineering 79324. Yes      Hourly   
## # ... with 1,029 more rows
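# Test the difference in pay between new and existing hourly staff
library(dplyr)  # provides the %>% pipe
library(broom)  # tidy() turns the t-test result into a one-row data frame
t.test(salary ~ new_hire, data = pay_filter) %>%
  tidy()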
##    estimate estimate1 estimate2 statistic    p.value parameter  conf.low
## 1 -1106.967  63965.71  65072.68 -1.750387 0.08066517  499.7005 -2349.483
##   conf.high                  method alternative
## 1  135.5483 Welch Two Sample t-test   two.sided
# Is the result significant at the 5% level? No: p = 0.081 > 0.05
significant <- FALSE
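The confidence interval in the hourly-staff output points the same way as the p-value. As a check, the sketch below rebuilds the 95% interval from the printed estimate, t statistic and degrees of freedom, taking the standard error as estimate / t and the critical value from qt(). The interval runs from about -£2,349 to £136. Because it includes zero, it agrees with p ≈ 0.08 being above 0.05.

```r
# Printed results of the hourly-staff Welch t-test
estimate  <- -1106.967  # mean(existing) - mean(new hires)
statistic <- -1.750387  # Welch t statistic
dof       <- 499.7005   # Welch-Satterthwaite degrees of freedom

se     <- estimate / statistic        # standard error of the difference
t_crit <- qt(0.975, dof)              # two-sided 95% critical value
ci     <- estimate + c(-1, 1) * t_crit * se
round(ci, 1)

# A 95% interval containing 0 matches p > 0.05 for the same two-sided test
ci[1] < 0 && ci[2] > 0
```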