Frequentist and Bayesian analyses of Right contex Effects on adaptation of stop consonants (in English)

class: center, middle, inverse, title-slide

# Frequentist and Bayesian analyses of Right contex Effects on adaptation of stop consonants (in English)
### Joselyn Rodriguez
### Rutgers University
### 2021-04-29

---

<!-- to dos

make sure no output is seen for things it shouldnt be seen
make fig size for bayes diagnostic smaller
-->

# Jumping right in: The study
 
- Given previous research finding that when an ambiguous segment is encountered, listeners do not immediately try to disambiguate it to determine a lexical item, but maintain uncertainty until disambiguating information is encountered, we were interested in 
  - (1) what information is available 
  - (2) whether this finer-grained information is available for updating/learning
--

- Moving on...

---

# Jumping right in: The study

- In order to test whether or not this semantic information that is available after encountering an ambiguous item is available to use in updating of representational information, we used a **perceptual learning** paradigm.
  - Norris et al. (2003), Eisner & McQueen (2005),

### Perceptual learning paradigms: the gist

Pre-test --> Recalibration of some sort --> Post-test (yay! learning hopefully)

---

# Methods

- for this study, the methods were similar to other perceptual learning paradigms but instead of having a recalibration take place through lexical disambiguation, we used semantic disambiguation.

- ex:

"After the tent in the | campgrounds collapsed, we went to a hotel."

- the critical sentences were taken from a previous study examining semantic right context effects on sentence comprehension and were re-recorded by one female speaker of American English (not me)
  - Connine et al. (1991)
  
- **importantly**, participants were separated into two biasing conditions:
  - /t/-biasing and /d/-biasing
  - in these conditions, the ambiguous region of the vot continuuum was disambiguated through either /t/ or /d/-biasing semantic contexts

---

# Methods
#### Pre-test

- stimuli for the pre-test consisted of a VOT continuum ranging from 15-85ms at 5ms intervals and were presented as "the tent" and "the dent" with only the VOT of the /t/ and /d/ differing between the two
- this lasted about 11 minutes 
--

#### Exposure 
- sentences mentioned above. Each were "short lag" sentences such that the disambiguating information was provided within 3-5 syllables of encountering the ambiguous word
- participants were asked to respond whether they heard "tent" or "dent" used in the sentence
  - this was both as an attention check as well as a way of comparing these findings to previous studies
    - i.e., Connine et al (1991)
--

#### Pos-test
- identical to the pre-test

---

# Visualize then Analyze
## First, let's take a look at our raw data to get an idea of what it looks like

```r
test %>% 
  select(-workerId) %>% 
  head(n=10)
```

```
## # A tibble: 10 x 10
##    condition stimulus response sender sender_id   vot block_id block resp_t
##    <fct>     <chr>    <chr>    <chr>  <chr>     <dbl>    <dbl> <fct>  <dbl>
##  1 tent-bia… continu… dent     Sounds 6_0_0_1   -27.5        6 pre        0
##  2 tent-bia… continu… dent     Sounds 6_0_1_1   -12.5        6 pre        0
##  3 tent-bia… continu… tent     Sounds 6_0_2_1    -2.5        6 pre        1
##  4 tent-bia… continu… tent     Sounds 6_0_3_1    12.5        6 pre        1
##  5 tent-bia… continu… tent     Sounds 6_0_4_1     7.5        6 pre        1
##  6 tent-bia… continu… tent     Sounds 6_0_5_1    -7.5        6 pre        1
##  7 tent-bia… continu… tent     Sounds 6_0_6_1    17.5        6 pre        1
##  8 tent-bia… continu… tent     Sounds 6_0_7_1     2.5        6 pre        1
##  9 tent-bia… continu… tent     Sounds 6_0_8_1   -37.5        6 pre        1
## 10 tent-bia… continu… tent     Sounds 6_0_9_1    32.5        6 pre        1
## # … with 1 more variable: trial <dbl>
```

---

---
# Analysis (the important part)
## Bayesian vs. Frequentist analysis of the test data

- utilized a multi-level Bayesian model (using brms) and a Frequentist model using (lme4)

- we are utilizing a multi-level logistic regression because...
  - the data is looking at the proportion of /t/-responses (for pre/post tests)
  
- multi-level because...
  - we're interested in group-level effects since we want to look at within-participant and across-condition

---
# Analysis (the important part)
## Bayesian vs. Frequentist analysis of the test data

- For each model, we'll walk through:
    - defining the model
    - model comparison
    - results
    - diagnostics

---

# Analysis
## Test data - Frequentist approach:

- in order to compare the pre- and post-test, I used lme4 to run the following model:

``` 
glmer(resp_t ~ block * condition * vot + 
          (block | workerId) + 
          (1 | stimulus), 
          data = test, family = "binomial"(link = "logit") 
```
  
- taking this apart:
    - glmer: a generali**zed** multi-level regression model 
    - outcome variable: proportion of /t/ responses
    - all categorical effects were dummy coded (although I want to change this to simple effect coding)
    - fixed effects: block (pre vs. post), condition (tent or dent-biasing), and vot
    - random effects: workerId, item, block
      - specifically, random intercepts by item and subject and random slopes for blocks across subjects
      
---
# Analysis
## Test data - Frequentist approach:

- **Note** Model comparisons were completed using (planned) stepwise anova comparisons between models incorporating more complex random effects
  - including the additional random slope led to a slighly better model

```
                   npar    AIC    BIC  logLik deviance   Chisq Df Pr(>Chisq)    
freq_analysis_1       9 6092.2 6158.5 -3037.1   6074.2                          
freq_analysis_2      10 5607.7 5681.4 -2793.8   5587.7 486.462  1  < 2.2e-16 ***
freq_analysis_full   12 5599.8 5688.3 -2787.9   5575.8  11.863  2   0.002655 ** 
```

---

# Analysis
## Test data - Frequentist approach:

Let's take a look at the output
- things to consider:
    - the vot was centered around the mean
    - the intercept is pre-test and "tent"-biasing
- "significant" effects:
  - interaction
  - vot (expected)
  - intercept (mean: 0.7267025 prob)
    - the probability of /t/ response in the pre-test and tent-biasing condition given a vot of 50

---

```
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: resp_t ~ block * condition * vot + (block | workerId) + (1 |  
##     stimulus)
##    Data: test
## Control: glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e+05))
## 
##      AIC      BIC   logLik deviance df.resid 
##   5599.8   5688.3  -2787.9   5575.8    11764 
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -21.2909  -0.2106   0.1029   0.1994  11.5049 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr 
##  workerId (Intercept) 0.7355   0.8576        
##           blockpost   0.1984   0.4454   -0.57
##  stimulus (Intercept) 1.0324   1.0161        
## Number of obs: 11776, groups:  workerId, 46; stimulus, 16
## 
## Fixed effects:
##                                      Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                          0.977956   0.319327   3.063  0.00219 ** 
## blockpost                           -0.079052   0.140760  -0.562  0.57438    
## conditiondent-biasing                0.112033   0.278827   0.402  0.68783    
## vot                                  0.130904   0.012040  10.873  < 2e-16 ***
## blockpost:conditiondent-biasing     -0.585382   0.203515  -2.876  0.00402 ** 
## blockpost:vot                       -0.004491   0.006573  -0.683  0.49440    
## conditiondent-biasing:vot            0.009660   0.007288   1.326  0.18499    
## blockpost:conditiondent-biasing:vot -0.014036   0.009413  -1.491  0.13592    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) blckps cndtn- vot    blck:- blckp: cndt-:
## blockpost   -0.337                                          
## cndtndnt-bs -0.409  0.382                                   
## vot          0.043 -0.092 -0.044                            
## blckpst:cn-  0.228 -0.687 -0.576  0.058                     
## blockpst:vt -0.075  0.305  0.077 -0.287 -0.197              
## cndtndnt-b: -0.062  0.138  0.190 -0.247 -0.253  0.438       
## blckpst:c-:  0.041 -0.197 -0.144  0.184  0.274 -0.653 -0.754
```

---
#### Diagnostics

- unfortunately, the residuals don't look particularly normal though
- (none of the models' residuals looked normal)

---
# Analysis
## Test data - Bayesian approach

- alright, this is very similar, but we're going to make a few adjustments. 
- setting priors:
  - given that it's on a logistic scale [-inf, inf] (but basically [-4,4]) we can set the intercept prior as a `\(\mathcal{N}\)`(0,1) and population level priors as a `\(\mathcal{N}\)`(0, 0.5)
  - (bernoulli is just a special case of binomial dist w/two outcomes)

```
brm(resp_t ~ block * condition * vot + 
                                  (block | workerId) +
                                  (1 | stimulus),
                                  data = test, family = "bernoulli"(link = "logit"), 
                                  prior = c(prior("normal(0,0.5)", class = "b"), 
                                            prior("normal(0,1", class = "Intercept")), 
                                  save_pars = save_pars(all = T)) 
```

---
# Analysis
## Test data - Bayesian approach:

- **Note** Model comparisons were run using waic and comparison of elpdf_diff finding the full model to be slightly better (lower WAIC)

```
elpd_diff                     se_diff
bayes_analysis_full  0.0       0.0   
bayes_analysis_2    -8.5       4.6
```

---

```
##  Family: bernoulli 
##   Links: mu = logit 
## Formula: resp_t ~ block * condition * vot + (block | workerId) + (1 | stimulus) 
##    Data: test (Number of observations: 11776) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Group-Level Effects: 
## ~stimulus (Number of levels: 16) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     1.18      0.25     0.80     1.78 1.00      968     1461
## 
## ~workerId (Number of levels: 46) 
##                          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## sd(Intercept)                0.88      0.11     0.68     1.14 1.00     1076
## sd(blockpost)                0.46      0.12     0.23     0.70 1.00     1066
## cor(Intercept,blockpost)    -0.52      0.18    -0.81    -0.09 1.00     2570
##                          Tail_ESS
## sd(Intercept)                1913
## sd(blockpost)                1336
## cor(Intercept,blockpost)     2641
## 
## Population-Level Effects: 
##                                     Estimate Est.Error l-95% CI u-95% CI Rhat
## Intercept                               0.94      0.35     0.20     1.61 1.00
## blockpost                              -0.11      0.13    -0.38     0.15 1.00
## conditiondentMbiasing                   0.04      0.25    -0.46     0.52 1.00
## vot                                     0.13      0.01     0.10     0.16 1.00
## blockpost:conditiondentMbiasing        -0.51      0.18    -0.86    -0.15 1.00
## blockpost:vot                          -0.00      0.01    -0.02     0.01 1.00
## conditiondentMbiasing:vot               0.01      0.01    -0.01     0.02 1.00
## blockpost:conditiondentMbiasing:vot    -0.01      0.01    -0.03     0.01 1.00
##                                     Bulk_ESS Tail_ESS
## Intercept                                874     1088
## blockpost                               1833     2193
## conditiondentMbiasing                   1083     1703
## vot                                     1068     1062
## blockpost:conditiondentMbiasing         2094     2137
## blockpost:vot                           3004     2730
## conditiondentMbiasing:vot               2815     2886
## blockpost:conditiondentMbiasing:vot     2712     2762
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
```

---
# Analysis
## Test data - Bayesian approach

.pull-left[
- let's take a closer look at our posterior estimates

]
---

---

- focusing in on the parameters outside of ROPE

<img src="final_presentation_files/figure-html/unnamed-chunk-3-1.png" width="720" />
---

#### diagnostics

- posterior predictive checks

.pull-left[
<img src="final_presentation_files/figure-html/unnamed-chunk-4-1.png" width="504" />
]

.pull-right[
<img src="final_presentation_files/figure-html/unnamed-chunk-5-1.png" width="504" />
]

---
# Statistical Takeaways

- in the end, the results from both the bayesian and frequentist models were the same:
  - the strongest predictor was the interaction term between condition and block which was expected

.pull-left[
```
Population-Level Effects: 
                                Estimate 
Intercept                            0.92      
blockpost                           -0.11     
conditiondentMbiasing                0.05      
vot                                  0.13      
blockpost:conditiondentMbiasing     -0.50      
blockpost:vot                       -0.00     
conditiondentMbiasing:vot            0.01      
blockpost:conditiondentMbiasing:vot -0.01   
```
]

.pull-right[
```
Fixed effects:
                                   Estimate   
(Intercept)                          0.9779  
blockpost                           -0.0790  
conditiondent-biasing                0.1120       
vot                                  0.1309   
blockpost:conditiondent-biasing     -0.5853   
blockpost:vot                       -0.0044      
conditiondent-biasing:vot            0.0096      
blockpost:conditiondent-biasing:vot -0.0140   
```
]

- although running the frequentist model did take a bit more computational power than others because it included random slopes for the blocks by subject
  - if it hadn't been able to converge, then the bayesian model would've really been necessary

---

# Experimental Takeaways

- it looks like there was some effect of learning between the pre and post tests
- but, keep in mind that it really doesn't look like the ambiguous region was actually ambiguous and was therefore already biasing people towards /t/ responses before even completing the exposure phase
  - because of this, there is an effect in the post-condition but only for those in the /d/-biasing condition really

---
# References

Connine, C. M., Blasko, D. G., & Hall, M. (1991). Effects of subsequent sentence context in auditory word recognition: Temporal and linguistic constrainst. Journal of Memory and Language, 30(2), 234-250.

Eisner, F., & McQueen, J. M. (2005). The specificity of perceptual learning in speech
processing. Perception & Psychophysics, 67(2), 224–238.

Kraljic, T., & Samuel, A. G. (2005). Perceptual learning for speech: Is there a return to
normal? Cognitive Psychology, 51(2), 141–178.

Norris, D., McQueen, J. M., & Cutler, A. (2003). Perceptual learning in speech. Cognitive
Psychology, 47(2), 204–238.