Explanatory Methods II

class: center, middle, inverse, title-slide

# Explanatory Methods II
## Binomial Logistic Regression and Ordinal Regression
### <b>rstudio::</b>conf(2022)

---

class: left, middle, rstudio-logo, bigfont

## Aim of this module

&#9989; Learn two more explanatory modeling methods
  - Review binomial logistic regression
  - Review ordinal logistic regression
  
---
class: left, middle, rstudio-logo

# Binomial logistic regression modeling

---
class: left, middle, rstudio-logo

## Binary outcome variable

<table class=" lightable-minimal" style='font-family: "Trebuchet MS", verdana, sans-serif; margin-left: auto; margin-right: auto;'>
 <thead>
  <tr>
   <th style="text-align:left;"> Outcome </th>
   <th style="text-align:left;"> Model </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Continuous scale (eg money, height, weight) </td>
   <td style="text-align:left;"> Linear regression </td>
  </tr>
  <tr>
   <td style="text-align:left;background-color: lightblue !important;"> Binary scale (Yes/No) </td>
   <td style="text-align:left;background-color: lightblue !important;"> Binomial logistic regression </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Nominal category scale (eg A, B, C) </td>
   <td style="text-align:left;"> Multinomial logistic regression </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Ordinal category scale (eg Low, Medium, High) </td>
   <td style="text-align:left;"> Ordinal logistic regression </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Time dependent binary scale </td>
   <td style="text-align:left;"> Survival/proportional hazard regression </td>
  </tr>
</tbody>
</table>

---
class: left, middle, rstudio-logo

## Context:  the logistic function

In the 1830s and 1840s, the Belgian mathematician Pierre François Verhulst proposed a function for modeling population growth, as follows:

$$
f(x) = \frac{L}{1 + e^{-k(x - x_0)}}
$$
where `\(L\)` is the limit of the population (known as the 'carrying capacity'), `\(k\)` is the steepness of the curve and `\(x_0\)` is the midpoint of `\(x\)` (in population terms the midpoint of time).  With `\(k = 1\)` and `\(x_0 = 0\)` this looks like:

<img src="3-binomial_and_ordinal_regression_files/figure-html/unnamed-chunk-2-1.png" style="display: block; margin: auto;" />
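
As a minimal sketch, the function above can be written directly in R. The helper name `logistic()` and the plotting range are assumptions made for illustration; with `L = 1`, `k = 1` and `x0 = 0` it coincides with base R's `plogis()`.

```r
# Verhulst's logistic function; L, k and x0 follow the notation in the formula
logistic <- function(x, L = 1, k = 1, x0 = 0) {
  L / (1 + exp(-k * (x - x0)))
}

# evaluate over an illustrative range and draw the S-shaped curve
x <- seq(-6, 6, by = 0.1)
plot(x, logistic(x), type = "l", ylab = "f(x)")

# with the default arguments this is the standard logistic function
all.equal(logistic(x), plogis(x))
```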

---
class: left, middle, rstudio-logo

## Modeling binary outcomes

Imagine we are studying an outcome event `\(y\)` which can either occur or not occur, e.g. 'Hired' or 'Not Hired'.  We label `\(y = 1\)` if it does occur for a given observation, and `\(y = 0\)` otherwise.  `\(y\)` is called a *binary* or *dichotomous* outcome.

Now imagine we want to relate `\(y\)` to a set of input variables `\(X\)`.   In order to study this using some method similar to our linear model, we need a sensible scale for `\(y\)`, knowing that it cannot be less than zero or greater than 1.

One natural way forward is to consider the *probability* of y occurring:  `\(P(y = 1)\)`

---
class: left, middle, rstudio-logo

## Probability distribution for random variables

Let's assume we have a single input variable `\(x\)` and that `\(y\)` is more likely to occur as `\(x\)` increases.  We sample the data for increasing values of `\(x\)` and calculate the mean probability that `\(y = 1\)`.  Over large enough samples, we expect to see something like a cumulative normal probability distribution.

<img src="3-binomial_and_ordinal_regression_files/figure-html/unnamed-chunk-3-1.png" style="display: block; margin: auto;" />

---
class: left, middle, rstudio-logo

## Similarity with logistic distribution

Notice the similar S (sigmoid) shape of the cumulative normal and logistic distributions.  This means we could take our logistic function with a carrying capacity of 1 (maximum probability) as an approximation of a normal distribution.  This turns out to have benefits in interpretation.

<img src="3-binomial_and_ordinal_regression_files/figure-html/unnamed-chunk-4-1.png" style="display: block; margin: auto;" />
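
To get a feel for how close the two curves are, here is a rough numerical comparison. The scaling constant 1.702 is a conventional choice for matching the logistic curve to the standard normal and is an assumption of this sketch, not something taken from the plot above.

```r
# compare the cumulative normal with a rescaled logistic curve
x <- seq(-5, 5, by = 0.01)
normal_cdf   <- pnorm(x)
logistic_cdf <- plogis(1.702 * x)  # 1.702 is an assumed, conventional scaling constant

# largest vertical gap between the two S-shaped curves over this range
max_gap <- max(abs(normal_cdf - logistic_cdf))
max_gap
```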

---

class: left, middle, rstudio-logo

## What happens if we use a logistic function?

$$
P(y = 1) = \frac{1}{1 + e^{-k(x - x_0)}} = \frac{1}{1 + e^{-(\beta_0 + \beta_1x)}}
$$
where `\(\beta_0 = -kx_0\)` and `\(\beta_1 = k\)`.

Meanwhile

$$
P(y = 0) =  1 - P(y = 1) = \frac{e^{-(\beta_0 + \beta_1x)}}{1 + e^{-(\beta_0 + \beta_1x)}}
$$
So, if we divide the two:

$$
\frac{P(y = 1)}{P(y = 0)} = \frac{1}{e^{-(\beta_0 + \beta_1x)}} = e^{\beta_0 + \beta_1x}
$$
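
The algebra above can be checked numerically. The values of `beta0` and `beta1` below are hypothetical and chosen only for illustration.

```r
# hypothetical coefficients, not estimated from any data
beta0 <- -1.5
beta1 <- 0.8
x <- c(-2, 0, 1, 3)

# P(y = 1) from the logistic function, and P(y = 0) as its complement
p1 <- 1 / (1 + exp(-(beta0 + beta1 * x)))
p0 <- 1 - p1

# dividing the two should give e^(beta0 + beta1 * x)
odds <- p1 / p0
all.equal(odds, exp(beta0 + beta1 * x))
```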

---

class: left, middle, rstudio-logo

## The odds of y

Now, if we take natural logarithms of our last equation, we get:

$$
\mathrm{ln}\left(\frac{P(y = 1)}{P(y = 0)}\right) = \beta_0 + \beta_1x
$$
So we have a linear model in the *log odds* of `\(y\)`.

Since a *transformation* of our outcome is linear in our input variable, we can create a model known as a *generalized linear model*.  As we will see, the coefficients of a model like this can be interpreted very intuitively to explain the impact of input variables on the likelihood of `\(y\)` occurring.
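
One way to see the linear model in the log odds at work is to simulate data with known coefficients and check that `glm()` recovers them. This is only a sketch: the sample size, seed and "true" coefficients are all assumptions made for illustration.

```r
set.seed(42)

# simulate a binary outcome whose log odds are linear in x
n <- 5000
x <- rnorm(n)
true_beta0 <- -1  # hypothetical intercept
true_beta1 <- 2   # hypothetical slope
y <- rbinom(n, size = 1, prob = plogis(true_beta0 + true_beta1 * x))

# fit a binomial generalized linear model (logit link is the default)
sim_model <- glm(y ~ x, family = "binomial")

# estimates should land close to the values used in the simulation
coef(sim_model)
```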

---

class: left, middle, rstudio-logo

## The speed dating dataset

This dataset is from an experiment run by Columbia University students in New York, where they collected information from speed dating sessions.  Note: this experiment only included participants who identified as Male or Female and partners were formed only as Male-Female pairings.

```r
# get data
url <- "https://peopleanalytics-regression-book.org/data/speed_dating.csv"
speed_dating <- read.csv(url)

# select only some elements for this training
speed_dating <- speed_dating[ ,c("gender", "goal", "dec", "attr", "intel", "prob")] 
head(speed_dating)
```

```
##   gender goal dec attr intel prob
## 1      0    2   1    6     7    6
## 2      0    2   1    7     7    5
## 3      0    2   1    5     9   NA
## 4      0    2   1    7     8    6
## 5      0    2   1    5     7    6
## 6      0    2   0    4     7    5
```

---

class: left, middle, rstudio-logo

## Data fields for `speed_dating`

The individuals were given a series of surveys to complete as part of the experiment.
* `dec` is the decision on whether the individual wanted to meet that specific partner again after the speed date and is our outcome for this example.  
* `attr`, `intel`, and `prob` are ratings out of ten on physical attractiveness, intelligence and the individual's belief that the partner also liked them. 
* `gender` is the gender of the individual: female is 0 and male is 1
* `goal` is a categorical variable with a code for different goals of the individual in attending (eg 'seemed like a fun night out', or 'looking for a serious relationship').

In this example we are going to look purely at the speed date level and not consider the fact that several speed dates may involve the same individual.  We have 8,378 rows of data.

We are aiming to model the decision `dec` against the ratings `attr`, `intel` and `prob`.

---

class: left, middle, rstudio-logo

## Get to know the dataset

It is important to spend a little bit of time exploring your data to understand anything that you should consider when building your model.  Are there a lot of NA values?  Are variables too highly correlated?

```r
colSums(is.na(speed_dating))
```

```
## gender   goal    dec   attr  intel   prob 
##      0     79      0    202    296    309
```

<img src="3-binomial_and_ordinal_regression_files/figure-html/unnamed-chunk-7-1.png" style="display: block; margin: auto;" />
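
The two questions above can be answered with a few lines of base R. To keep this sketch self-contained it uses only the first six rows printed earlier (the data frame name `first_rows` is an assumption); on the full data you would pass the rating columns of `speed_dating` instead.

```r
# the first six rows of the rating columns, as printed by head(speed_dating)
first_rows <- data.frame(
  attr  = c(6, 7, 5, 7, 5, 4),
  intel = c(7, 7, 9, 8, 7, 7),
  prob  = c(6, 5, NA, 6, 6, 5)
)

# count missing values per column
na_counts <- colSums(is.na(first_rows))
na_counts

# pairwise correlations, ignoring missing values pair by pair
rating_cor <- cor(first_rows, use = "pairwise.complete.obs")
round(rating_cor, 2)
```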

---

class: left, middle, rstudio-logo

## Running the model on men only

```r
# run a binomial generalized linear model on the male decision makers
model_m <- glm(dec ~ attr + intel + prob, 
               data = speed_dating[speed_dating$gender == 1, ],
               family = "binomial")

# view a summary of the coefficients of the model
(coefficients_m <- summary(model_m)$coefficients |> as.data.frame())
```

```
##               Estimate Std. Error    z value      Pr(>|z|)
## (Intercept) -6.2537585 0.26346670 -23.736428 1.517359e-124
## attr         0.7658699 0.02990211  25.612572 1.105090e-144
## intel       -0.0688468 0.03020599  -2.279243  2.265262e-02
## prob         0.3269378 0.02112880  15.473558  5.233269e-54
```

Recalling the meaning of `Pr(>|z|)` - the p-value - we can determine that all three ratings play a significant role in the decision outcome of a date.

---

class: left, middle, rstudio-logo

## Interpreting the coefficients

Our coefficients indicate the linear impact on the log odds of a positive decision.  A negative coefficient decreases the log odds and a positive coefficient increases the log odds.  In this context we can see that physical attractiveness and sense of reciprocation both have a positive effect on likelihood of a positive decision when the decision maker is male, but intelligence has a negative impact.

We can easily extend the manipulations from a few slides back to get a formula for the odds of an event in terms of the coefficients `\(\beta_0, \beta_1, ..., \beta_p\)`:

$$
`\begin{align*}
\frac{P(y = 1)}{P(y = 0)} &= e^{\beta_0 + \beta_1x_1 + ... + \beta_px_p} \\
&= e^{\beta_0}(e^{\beta_1})^{x_1}...(e^{\beta_p})^{x_p}
\end{align*}`
$$

* `\(e^{\beta_0}\)` is the odds of the event when all input variables are zero
* `\(e^{\beta_i}\)` is the multiplier of the odds associated with a one unit increase in `\(x_i\)` (for example, an extra point rating in physical attractiveness), assuming all else equal - because of the multiplicative effect, we call this the *odds ratio* for `\(x_i\)`.
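
The multiplicative effect can be verified with the estimates printed for the men's model. This is a sketch: the helper `odds()` and the ratings plugged into it are hypothetical.

```r
# coefficient estimates copied from the printed summary of model_m
beta <- c(intercept = -6.2537585, attr = 0.7658699,
          intel = -0.0688468, prob = 0.3269378)

# odds of a positive decision for a given set of ratings
odds <- function(attr, intel, prob) {
  exp(beta[["intercept"]] + beta[["attr"]] * attr +
        beta[["intel"]] * intel + beta[["prob"]] * prob)
}

# one extra point of attractiveness, all else equal (hypothetical ratings)
baseline <- odds(attr = 5, intel = 4, prob = 7)
one_more <- odds(attr = 6, intel = 4, prob = 7)

# the ratio of the two odds equals e^beta for attr
one_more / baseline
exp(beta[["attr"]])
```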

---

class: left, middle, rstudio-logo

## Calculating and interpreting the odds ratios

Odds ratios are simply the exponentials of the coefficient estimates.

```r
# add a column to our coefficients with odds ratios
coefficients_m$odds_ratio <- exp(coefficients_m[ ,"Estimate"])
coefficients_m
```

```
##               Estimate Std. Error    z value      Pr(>|z|)  odds_ratio
## (Intercept) -6.2537585 0.26346670 -23.736428 1.517359e-124 0.001923212
## attr         0.7658699 0.02990211  25.612572 1.105090e-144 2.150864692
## intel       -0.0688468 0.03020599  -2.279243  2.265262e-02 0.933469675
## prob         0.3269378 0.02112880  15.473558  5.233269e-54 1.386715208
```

We interpret each of our odds ratios as follows (assuming all else equal):

* An extra point in physical attractiveness increases the odds by 115%
* An extra point in intelligence *decreases* the odds by 7%
* An extra point in perceived reciprocation of interest increases the odds by 39%

If you are concerned about the precision of these statements, you can also get the 95% confidence intervals for the odds ratios by using `exp(confint(model_m))`
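
The percentages above come from subtracting 1 from each odds ratio. The sketch below reproduces them from the printed estimates and also computes approximate Wald intervals from the printed standard errors. Note the assumption: `confint()` on a `glm` uses profile likelihood, so its intervals can differ slightly from these Wald ones (`confint.default()` gives the Wald version on the log odds scale).

```r
# estimates and standard errors copied from the printed summary of model_m
estimate  <- c(attr = 0.7658699, intel = -0.0688468, prob = 0.3269378)
std_error <- c(attr = 0.02990211, intel = 0.03020599, prob = 0.02112880)

# percentage change in odds per one point increase in each rating
odds_ratio <- exp(estimate)
pct_change <- (odds_ratio - 1) * 100
round(pct_change)  # 115, -7, 39

# approximate 95% Wald intervals for the odds ratios
z <- qnorm(0.975)
wald_ci <- cbind(
  lower = exp(estimate - z * std_error),
  upper = exp(estimate + z * std_error)
)
wald_ci
```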

---

class: left, middle, rstudio-logo

## Warning:  odds &#x2260; probability

Increases in odds have a diminishing effect on probability as the original probability increases.  So it is important to know the difference between the two.  Here is a graph showing the impact of a 10% increase in odds on the probability of an event, depending on the original probability.

<img src="3-binomial_and_ordinal_regression_files/figure-html/unnamed-chunk-10-1.png" style="display: block; margin: auto;" />
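
A small calculation makes the point concrete. The helper name `prob_after_odds_increase()` and the starting probabilities are assumptions for illustration; the sketch reports the relative change in probability after a 10% increase in odds.

```r
# convert a probability to odds, scale the odds, convert back
prob_after_odds_increase <- function(p, increase = 0.10) {
  odds <- p / (1 - p)
  new_odds <- odds * (1 + increase)
  new_odds / (1 + new_odds)
}

# illustrative starting probabilities
p <- c(0.1, 0.5, 0.9)
new_p <- prob_after_odds_increase(p)

# relative change in probability shrinks as the starting probability rises
relative_change <- new_p / p - 1
round(100 * relative_change, 1)
```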

---

class: left, middle, rstudio-logo

## Are dynamics different with women?

```r
# run a binomial generalized linear model on the female decision makers
model_f <- glm(dec ~ attr + intel + prob, 
               data = speed_dating[speed_dating$gender == 0, ],
               family = "binomial")

# view a summary of the coefficients of the model, incl odds_ratios
coefficients_f <- summary(model_f)$coefficients |> as.data.frame()
coefficients_f$odds_ratio <- exp(coefficients_f[ ,"Estimate"])
coefficients_f
```

```
##                Estimate Std. Error    z value      Pr(>|z|)  odds_ratio
## (Intercept) -5.84737044 0.25260132 -23.148614 1.501241e-118 0.002887482
## attr         0.55142831 0.02506430  22.000546 2.845351e-107 1.735730416
## intel        0.08757465 0.02880993   3.039738  2.367839e-03 1.091523740
## prob         0.23149158 0.01979541  11.694207  1.364556e-31 1.260478710
```

Try interpreting these yourself.

---

class: left, middle, rstudio-logo

## Predicting using binomial logistic regression models

Binomial logistic regression models play an important role in predictive analytics.  New data fed into the fitted logistic function can predict the probability of a positive outcome.

```r
new_dates <- data.frame(
  attr = c(1, 5, 9),
  intel = c(2, 4, 8),
  prob = c(5, 7, 9)
)

predict(model_m, new_dates, type = "response")
```

```
##          1          2          3 
## 0.01814777 0.39861688 0.95394355
```

In classification learning, the data is split into a training and test set, the model is fitted using a training set, a probability 'cutoff' is used to determine positive or negative classes (usually 0.5), and then the predictive accuracy is determined by testing on the test set.
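
To make explicit what `predict(..., type = "response")` does, the sketch below rebuilds the three probabilities by hand from the printed coefficients of the men's model and then applies a 0.5 cutoff. The object names are assumptions; small differences from the printed output come only from rounding of the coefficients.

```r
# coefficient estimates copied from the printed summary of model_m
beta <- c(-6.2537585, 0.7658699, -0.0688468, 0.3269378)

new_dates <- data.frame(
  attr = c(1, 5, 9),
  intel = c(2, 4, 8),
  prob = c(5, 7, 9)
)

# linear predictor (log odds): intercept column plus the three ratings
design <- cbind(1, as.matrix(new_dates))
log_odds <- as.vector(design %*% beta)

# the logistic function turns log odds into probabilities
probability <- plogis(log_odds)
round(probability, 4)

# classify with a 0.5 cutoff
predicted_class <- as.integer(probability >= 0.5)
predicted_class
```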

---

class: left, middle, rstudio-logo

## Assessing the fit of a binomial logistic regression model

Previously we looked at the fit of a linear regression model and determined a metric called `\(R^2\)`, which determined how much of the variance of `\(y\)` was explained by our model.  This is not so straightforward in binomial logistic regression, and is in fact still the subject of intense research.

Numerous variants of measures called *pseudo*- `\(R^2\)` exist to try to approximate something similar to an `\(R^2\)`.  The `DescTools` package provides easy access to these measures.  All have different definitions and should be handled carefully. Here are four of them.

```r
library(DescTools)
DescTools::PseudoR2(model_m, 
                    which = c("McFadden", "CoxSnell", "Nagelkerke", "Tjur"))
```

```
##   McFadden   CoxSnell Nagelkerke       Tjur 
##  0.2742679  0.3161661  0.4216450  0.3342360
```

The *Akaike Information Criterion* (AIC) is also valuable in directly comparing two competing models, with a lower AIC suggesting less information loss from the model.

```r
AIC(model_m)
```

```
## [1] 4019.39
```
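
For intuition, McFadden's pseudo-`\(R^2\)` and the AIC can both be computed by hand from log-likelihoods. This sketch uses simulated data so that it runs on its own; the seed, sample size and coefficients are assumptions.

```r
set.seed(1)

# simulated binary outcome with one input variable
n <- 1000
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * x))

fit  <- glm(y ~ x, family = "binomial")  # model of interest
null <- glm(y ~ 1, family = "binomial")  # intercept-only model

# McFadden: 1 minus the ratio of the two log-likelihoods
mcfadden <- 1 - as.numeric(logLik(fit)) / as.numeric(logLik(null))
mcfadden

# AIC: -2 * log-likelihood + 2 * number of estimated parameters
aic_by_hand <- -2 * as.numeric(logLik(fit)) + 2 * length(coef(fit))
all.equal(aic_by_hand, AIC(fit))
```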

---

class: left, middle, rstudio-logo

## Exercise - Binomial regression

Go to our [RStudio Cloud workspace](https://rstudio.cloud/spaces/230780/join?access_code=7cXJKFU1KUuuZGLwBVQpLG3dIxPUD3jak3ZQmESh) and work on **Assignment 03A - Binomial_regression**.

---

class: left, middle, rstudio-logo

# Ordinal logistic regression modeling

---

class: left, middle, rstudio-logo

## Ordinal data

Ordinal data is a type of categorical data that is both discrete and ordered, e.g. survey results on a Likert scale from 1 to 3, where 1 is 'Low', 2 is 'Middle', and 3 is 'High'.

<table class=" lightable-minimal" style='font-family: "Trebuchet MS", verdana, sans-serif; margin-left: auto; margin-right: auto;'>
 <thead>
  <tr>
   <th style="text-align:left;"> Outcome </th>
   <th style="text-align:left;"> Model </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Continuous scale (eg money, height, weight) </td>
   <td style="text-align:left;"> Linear regression </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Binary scale (Yes/No) </td>
   <td style="text-align:left;"> Binomial logistic regression </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Nominal category scale (eg A, B, C) </td>
   <td style="text-align:left;"> Multinomial logistic regression </td>
  </tr>
  <tr>
   <td style="text-align:left;background-color: lightblue !important;"> Ordinal category scale (eg Low, Medium, High) </td>
   <td style="text-align:left;background-color: lightblue !important;"> Ordinal logistic regression </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Time dependent binary scale </td>
   <td style="text-align:left;"> Survival/proportional hazard regression </td>
  </tr>
</tbody>
</table>

---

class: left, middle, rstudio-logo

## Modeling Ordinal Data

We can't analyze ordinal data ('Low', 'Middle', 'High') as continuous data because we don't know that the distance between 'Low' and 'Middle' is the same as the distance between 'Middle' and 'High'.  It also isn't appropriate to model this purely as categorical data, since we would lose the information in the ordering.  Instead, we can partition the outcomes so that we find the probability that the outcome is less than or equal to each cutoff point.  For our example, we want to consider P(Low) and P(Low or Middle), which amounts to 2 binomial logistic regression models:

* Low vs Middle/High
* Low/Middle vs High
  
This will provide us with a different intercept relevant to each partition, but 1 coefficient for each independent variable that is consistent across the entire model.
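
The partitioning can be sketched with a small made-up vector of ratings (the data and object names here are hypothetical). Each partition turns the ordered outcome into a binary one.

```r
# a hypothetical ordered outcome
rating <- ordered(
  c("Low", "Middle", "High", "Middle", "Low", "High", "High"),
  levels = c("Low", "Middle", "High")
)

# binary indicators for each cutoff: outcome at or below the cutoff
low_vs_rest     <- as.integer(rating <= "Low")
low_mid_vs_high <- as.integer(rating <= "Middle")

# observed cumulative proportions P(Low) and P(Low or Middle)
c(P_low = mean(low_vs_rest), P_low_or_middle = mean(low_mid_vs_high))
```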

---

class: left, middle, rstudio-logo

## Proportional Odds Assumption

The proportional odds assumption states that no input variable has a disproportionate effect on a specific level of the outcome variable.  This means that the slope of the logistic function is approximately the same for all category cutoffs.

<img src="3-binomial_and_ordinal_regression_files/figure-html/unnamed-chunk-16-1.png" style="display: block; margin: auto;" />
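
Under the assumption, the cumulative curves differ only by a horizontal shift, so on the log odds scale the gap between them is constant. A brief sketch with a hypothetical common slope and two hypothetical cutoffs:

```r
# hypothetical common slope and category cutoffs
slope   <- 0.8
cutoffs <- c(-1, 1.5)
x <- seq(-4, 4, by = 0.5)

# cumulative probabilities for the two cutoffs, sharing the same slope
p_le_1 <- plogis(cutoffs[1] - slope * x)
p_le_2 <- plogis(cutoffs[2] - slope * x)

# the gap in log odds is the same for every value of x
gap <- qlogis(p_le_2) - qlogis(p_le_1)
range(gap)
```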

---

class: left, middle, rstudio-logo

## Example using 'managers' dataset

This dataset is a fictionalized dataset on the performance and other characteristics of a group of managers in a large company. For our example, we will pull a small subset of variables to use in a model.

```r
# get data
url <- "https://peopleanalytics-regression-book.org/data/managers.csv"
managers <- read.csv(url) |>
  # select a subset of the fields
  dplyr::select(performance_group, test_score, group_size, yrs_employed, concern_flag)

head(managers)
```

```
##   performance_group test_score group_size yrs_employed concern_flag
## 1            Bottom        205         10          4.6            N
## 2            Middle        227         14          5.3            N
## 3            Bottom        227         10          5.2            N
## 4            Middle        273         19          4.9            N
## 5            Bottom        227         17          4.9            Y
## 6            Middle        159         10          4.3            N
```

---

class: left, middle, rstudio-logo

## Data fields for `managers`

* `performance_group` is each manager's most recent performance group (Bottom, Middle, Top)
* `yrs_employed` is the number of years employed by the company
* `test_score` is the score on a test given to all managers
* `concern_flag` indicates whether or not a complaint has been filed against the manager
* `group_size` is the number of employees in the group they are responsible for

---

class: left, middle, rstudio-logo

## Structure of the data

What do you notice?  Should we make any adjustments?

```r
str(managers)
```

```
## 'data.frame':	571 obs. of  5 variables:
##  $ performance_group: chr  "Bottom" "Middle" "Bottom" "Middle" ...
##  $ test_score       : int  205 227 227 273 227 159 250 326 152 326 ...
##  $ group_size       : int  10 14 10 19 17 10 13 13 10 11 ...
##  $ yrs_employed     : num  4.6 5.3 5.2 4.9 4.9 4.3 4.8 5 4.4 4.5 ...
##  $ concern_flag     : chr  "N" "N" "N" "N" ...
```

---

class: left, middle, rstudio-logo

## Convert to factor datatypes

```r
# performance is ordered factor
managers$performance_group <- ordered(managers$performance_group,
                                      levels = c("Bottom","Middle","Top"))
# a Y/N flag is a binary factor
managers$concern_flag <- as.factor(managers$concern_flag)

str(managers)
```

```
## 'data.frame':	571 obs. of  5 variables:
##  $ performance_group: Ord.factor w/ 3 levels "Bottom"<"Middle"<..: 1 2 1 2 1 2 2 2 2 2 ...
##  $ test_score       : int  205 227 227 273 227 159 250 326 152 326 ...
##  $ group_size       : int  10 14 10 19 17 10 13 13 10 11 ...
##  $ yrs_employed     : num  4.6 5.3 5.2 4.9 4.9 4.3 4.8 5 4.4 4.5 ...
##  $ concern_flag     : Factor w/ 2 levels "N","Y": 1 1 1 1 2 1 1 1 1 1 ...
```

---

class: left, middle, rstudio-logo

## Proportional odds logistic regression model

Now that our data is structured properly, we can run a proportional odds logistic regression model using the `polr()` function in the `MASS` package.

```r
library(MASS)
manager_mod <- polr(performance_group ~ test_score + group_size + yrs_employed + concern_flag, data = managers)
summary(manager_mod)
```

```
## Call:
## polr(formula = performance_group ~ test_score + group_size + 
##     yrs_employed + concern_flag, data = managers)
## 
## Coefficients:
##                   Value Std. Error t value
## test_score     0.004298   0.001138   3.778
## group_size     0.091593   0.031367   2.920
## yrs_employed  -0.882717   0.179961  -4.905
## concern_flagY -0.317585   0.300453  -1.057
## 
## Intercepts:
##               Value   Std. Error t value
## Bottom|Middle -3.3227  0.8817    -3.7685
## Middle|Top     0.2155  0.8659     0.2488
## 
## Residual Deviance: 929.1206 
## AIC: 941.1206
```
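
The two intercepts and the shared coefficients combine to give a probability for each performance group. The sketch below does this by hand from the printed values, relying on the `polr()` parameterization in which the log odds of being at or below a cutoff equal the intercept minus the linear predictor. The manager profile is hypothetical.

```r
# values copied from the printed summary of manager_mod
coefs <- c(test_score = 0.004298, group_size = 0.091593,
           yrs_employed = -0.882717, concern_flagY = -0.317585)
zeta  <- c("Bottom|Middle" = -3.3227, "Middle|Top" = 0.2155)

# a hypothetical manager with no concern flag
manager <- c(test_score = 227, group_size = 14,
             yrs_employed = 5.3, concern_flagY = 0)

# linear predictor, then cumulative probabilities P(group <= cutoff)
eta <- sum(coefs * manager)
cumulative <- plogis(zeta - eta)

# differences of cumulative probabilities give the three group probabilities
probs <- c(
  Bottom = cumulative[[1]],
  Middle = cumulative[[2]] - cumulative[[1]],
  Top    = 1 - cumulative[[2]]
)
round(probs, 3)
```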

---

class: left, middle, rstudio-logo

## Understanding Coefficients (1/2)

We interpret our coefficients similarly to how we interpret in a binary model.

```r
# get coefficients (it's in matrix form)
coefficients <- summary(manager_mod)$coefficients
# calculate p-values
p_value <- (1 - pnorm(abs(coefficients[ ,"t value"]), 0, 1))*2
# bind back to coefficients
coefficients <- cbind(coefficients, p_value)
# exponentiate coefficients to find odds ratios
odds_ratio <- exp(coefficients[ ,"Value"])
# combine with coefficient and p_value
(coefficients <- cbind(coefficients[ ,c("Value", "p_value")],odds_ratio))
```

```
##                      Value      p_value odds_ratio
## test_score     0.004298353 1.582800e-04 1.00430760
## group_size     0.091593276 3.500009e-03 1.09591899
## yrs_employed  -0.882717288 9.340653e-07 0.41365736
## concern_flagY -0.317585209 2.905024e-01 0.72790465
## Bottom|Middle -3.322720528 1.642038e-04 0.03605461
## Middle|Top     0.215476282 8.034798e-01 1.24045256
```
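
The p-value calculation above can also be written using the upper tail of the normal distribution directly. This sketch uses the t values printed in the model summary; the upper-tail form is offered as an alternative because it avoids rounding to zero for very large statistics.

```r
# t values copied from the printed summary of manager_mod
t_value <- c(test_score = 3.778, group_size = 2.920,
             yrs_employed = -4.905, concern_flagY = -1.057)

# two-sided p-values: the form used above and the upper-tail form
p_original   <- (1 - pnorm(abs(t_value), 0, 1)) * 2
p_upper_tail <- 2 * pnorm(abs(t_value), lower.tail = FALSE)
all.equal(p_original, p_upper_tail)

# for a very large statistic the first form rounds to exactly zero
(1 - pnorm(10)) * 2
2 * pnorm(10, lower.tail = FALSE)
```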

---

class: left, middle, rstudio-logo

## Understanding Coefficients (2/2)

We interpret our coefficients similarly to how we interpret in a binary model.

```
##                      Value      p_value odds_ratio
## test_score     0.004298353 1.582800e-04 1.00430760
## group_size     0.091593276 3.500009e-03 1.09591899
## yrs_employed  -0.882717288 9.340653e-07 0.41365736
## concern_flagY -0.317585209 2.905024e-01 0.72790465
## Bottom|Middle -3.322720528 1.642038e-04 0.03605461
## Middle|Top     0.215476282 8.034798e-01 1.24045256
```

- Each additional point earned on the manager test **increases** the odds of being in a higher performance group by 0.4%.
- Each additional person in your group **increases** the odds of being in a higher performance group by 10%.
- Each additional year of employment **decreases** the odds of being in a higher performance group by 59%.
  
---

class: left, middle, rstudio-logo

## Diagnostics
We can use pseudo-`\(R^2\)` measures, just as with our binary model, to assess model fit.

```r
DescTools::PseudoR2(
  manager_mod, 
  which = c("McFadden", "CoxSnell", "Nagelkerke", "AIC")
)
```

```
##     McFadden     CoxSnell   Nagelkerke          AIC 
##   0.05462018   0.08972806   0.10927156 941.12062947
```

There is also the Lipsitz goodness of fit test, and others.

```r
generalhoslem::lipsitz.test(manager_mod)
```

```
## 
## 	Lipsitz goodness of fit test for ordinal response models
## 
## data:  formula:  performance_group ~ test_score + group_size + yrs_employed + formula:      concern_flag
## LR statistic = 7.5988, df = 9, p-value = 0.575
```

---

class: left, middle, rstudio-logo

## Exercise - Ordinal regression model

For this exercise, you will run your own ordinal logistic regression models, interpret the coefficients, and assess the fit.

Go to our [RStudio Cloud workspace](https://rstudio.cloud/spaces/230780/join?access_code=7cXJKFU1KUuuZGLwBVQpLG3dIxPUD3jak3ZQmESh) and work on **Assignment 03B - Ordinal_regression**.

Please work on  **Exercises 1-3**.

---

class: left, middle, rstudio-logo

## Verifying the proportional odds assumption

Make two binomial models...

```r
# convert performance group into 2 binary variables
# (the levels are "Bottom", "Middle" and "Top", so the top group is matched with "Top")
managers$middlehigh <- ifelse(managers$performance_group  == "Bottom", 0, 1)
managers$high <- ifelse(managers$performance_group  == "Top", 1, 0)

# make 2 binomial models for these new binary outcomes
middlehigh_mod <- glm(
  middlehigh ~ test_score + group_size + yrs_employed + concern_flag,
  data = managers,
  family = "binomial"
)

high_mod <- glm(
  high ~ test_score + group_size + yrs_employed + concern_flag,
  data = managers,
  family = "binomial"
)
```

---

class: left, middle, rstudio-logo

## Verifying the proportional odds assumption

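...and compare the model results.  Using our best judgement, "small" differences between the coefficients mean we can feel confident that the proportional odds assumption has been met.  Note that the `high` estimates printed below are all effectively zero, with an intercept near -26.6: this is what a model fitted to an all-zero outcome looks like, which happens when the label being matched is not one of the levels of `performance_group` (Bottom, Middle, Top).  The `high` outcome needs to be built by matching "Top", and the comparison rerun, before the differences can be judged.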

```r
(coefficient_comparison <- data.frame(
  middlehigh = summary(middlehigh_mod)$coefficients[ , "Estimate"],
  high= summary(high_mod)$coefficients[ ,"Estimate"],
  diff = summary(high_mod)$coefficients[ ,"Estimate"] - 
    summary(middlehigh_mod)$coefficients[ , "Estimate"]
))
```

```
##                middlehigh          high         diff
## (Intercept)    3.30025975 -2.656607e+01 -29.86632828
## test_score     0.00415042 -1.109849e-16  -0.00415042
## group_size     0.07844190 -5.306566e-15  -0.07844190
## yrs_employed  -0.84044257  8.571284e-15   0.84044257
## concern_flagY -0.31164157 -2.545811e-14   0.31164157
```
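
For reference, here is what the comparison looks like when the assumption holds by construction. This is a sketch on simulated data only: the seed, sample size, slope and cutoffs are assumptions, and the object names are hypothetical.

```r
set.seed(2022)

# simulate an ordinal outcome from a latent logistic variable with slope 1
n <- 5000
x <- rnorm(n)
latent <- 1 * x + rlogis(n)
group <- cut(latent, breaks = c(-Inf, -1, 1, Inf),
             labels = c("Bottom", "Middle", "Top"), ordered_result = TRUE)

# the two binary partitions of the ordinal outcome
above_bottom <- as.integer(group != "Bottom")
top          <- as.integer(group == "Top")

# slopes from the two binomial models should be close to each other
slope_above_bottom <- coef(glm(above_bottom ~ x, family = "binomial"))[["x"]]
slope_top          <- coef(glm(top ~ x, family = "binomial"))[["x"]]
c(above_bottom = slope_above_bottom, top = slope_top,
  diff = slope_top - slope_above_bottom)
```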

---

class: left, middle, rstudio-logo

## Verifying proportional odds assumption, method 2

Another option is to use the Brant-Wald test, which compares the proportional odds model to an approximated generalized ordinal logistic regression model using a chi-squared test. A low p-value can indicate that the coefficient does not satisfy the proportional odds assumption.

```r
library(brant)
brant::brant(manager_mod)
```

```
## -------------------------------------------- 
## Test for	X2	df	probability 
## -------------------------------------------- 
## Omnibus		0.56	4	0.97
## test_score	0.09	1	0.77
## group_size	0.39	1	0.53
## yrs_employed	0.04	1	0.84
## concern_flagY	0.01	1	0.91
## -------------------------------------------- 
## 
## H0: Parallel Regression Assumption holds
```

---

class: left, middle, rstudio-logo

## Exercise - verifying proportional odds assumption

For this exercise, you will verify that the proportional odds assumption holds.

Go to our [RStudio Cloud workspace](https://rstudio.cloud/spaces/230780/join?access_code=7cXJKFU1KUuuZGLwBVQpLG3dIxPUD3jak3ZQmESh) and work on **Assignment 03B - Ordinal_regression**.

Please work on  **Exercise 4**.

---

class: left, middle, rstudio-logo

# &#9749; Let's have a break! &#128524;