Contrastable

This post gives an overview of an R package I’ve written. The contrastable package provides a tidy approach to contrast coding for regression analyses.
R pkg
Published

July 13, 2022

I’ve been working on a package called contrastable on and off for the past year or so. The package’s goal is to provide a tidy approach to setting factor contrasts for regression analysis. While this can be done with repeated contrasts<- calls, that workflow is tedious when working with multiple factors and especially error-prone when manually specifying contrast matrices. In the latter case, the user needs to specify the correct fractions in the correct order with the correct signs, which is a lot to keep track of. These issues quickly become apparent when the number of factor levels is greater than 2. In this post I will:

  • show where to find and install the package
  • give a brief overview of contrast coding and why the naming of contrast schemes matters
  • walk through the typical approach to setting contrasts and how easily a small typo can distort model coefficients
  • introduce the tidy approach the contrastable package provides, along with its helper functions and available contrast schemes
  • point to other packages and resources for contrast coding

Where to find contrastable

You can find the contrastable package on my GitHub at this repo: tsostarics/contrastable. You can install it with either of the following:

devtools::install_github('tsostarics/contrastable')
remotes::install_github('tsostarics/contrastable')

Contrasts overview

Contrast coding refers to assigning numeric values to levels of a categorical variable for use in regression analyses. Depending on the numbers used, different comparisons can be made between the group means of a variable. These comparisons can correspond to particular null hypotheses that a researcher might have, and particular combinations of numbers can encode high-level questions like “Are there differences between levels when compared to a common reference level?” or “Does each level differ from the levels that came before it?” Critically, the contrasts used don’t impact the model fit but do impact the coefficient estimates that are used to make inferences about the data. For example, you might conclude that there’s an overall effect of some factor when in reality the effect (shown by the coefficient estimate) is an effect that only holds for one particular group!

Consider an example where you have two groups of listeners for whom English is their native (L1) or non-native (L2) language. You might be interested in whether reading times are slower or faster in two different syntactic conditions, such as active vs passive constructions. Two possible research questions might be whether there’s a main effect of syntax on reading times on the one hand, or whether there’s a simple effect such that reading times in the passive construction are only slower for L2 speakers.1 These are similar but different research questions, and more importantly, the interpretation of one coefficient depends on how the other variables are coded. Many researchers realize (or are at least starting to) that the default2 “0/1 contrasts” (aka treatment or dummy coding) will only give them the simple effect of structure, but if what you’re interested in is the main effect, then your statistics have not yet answered your question! To rectify this, researchers will opt for “the +.5/-.5 contrasts” to obtain main effects.
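
To make the distinction concrete, here’s a minimal sketch with simulated data (the variable names, effect sizes, and noise are made up purely for illustration, not taken from any real study) showing how the coding of the group factor changes what the syntax coefficient estimates:

Code
set.seed(1)
sim_df <- expand.grid(group  = factor(c("L1", "L2")),
                      syntax = factor(c("active", "passive")))
sim_df <- sim_df[rep(1:4, each = 50), ]
# Simulate a slowdown for passives that only holds for the L2 group
sim_df$rt <- 400 + 50 * (sim_df$syntax == "passive" & sim_df$group == "L2") +
  rnorm(nrow(sim_df), sd = 20)

# Default 0/1 (treatment) coding: the syntax coefficient is the simple effect
# of syntax for the reference group (L1), which is near 0 in this simulation
coef(lm(rt ~ group * syntax, data = sim_df))

# +.5/-.5 coding on both factors: the syntax coefficient is now the main
# effect, i.e., the syntax effect averaged over the L1 and L2 groups (~25 here)
contrasts(sim_df$group)  <- c(-.5, .5)
contrasts(sim_df$syntax) <- c(-.5, .5)
coef(lm(rt ~ group * syntax, data = sim_df))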

Warning: Naming inconsistency

The name for this contrast scheme is not consistent, especially in the 2-level case where the values are +.5/-.5. I’ve seen it called sum coding, simple coding, effects coding, scaled sum coding, helmert coding, difference coding, contrast coding, sum-to-zero coding, and +.5/-.5 coding. See Brehm and Alday (2022) for concerns about transparent descriptions of contrast coding schemes.

While researchers may differ on what they call +.5/-.5, for 2 levels the result is nonetheless the same. But a researcher using “helmert coding” and a researcher using “scaled sum coding” for a factor with 3 or more levels will be using very different contrast matrices, and will thus be addressing very different research questions about the data. Let’s use some functions from contrastable to look at how these contrast matrices differ.

library(contrastable)
helmert_code(2) |> MASS::fractions()
     [,1]
[1,]  1/2
[2,] -1/2
scaled_sum_code(2) |> MASS::fractions()
  [,1]
1  1/2
2 -1/2
helmert_code(3) |> MASS::fractions()
     [,1] [,2]
[1,]  2/3    0
[2,] -1/3  1/2
[3,] -1/3 -1/2
scaled_sum_code(3) |> MASS::fractions()
  [,1] [,2]
1  2/3 -1/3
2 -1/3  2/3
3 -1/3 -1/3
helmert_code(5) |> MASS::fractions()
     [,1] [,2] [,3] [,4]
[1,]  4/5    0    0    0
[2,] -1/5  3/4    0    0
[3,] -1/5 -1/4  2/3    0
[4,] -1/5 -1/4 -1/3  1/2
[5,] -1/5 -1/4 -1/3 -1/2
scaled_sum_code(5) |> MASS::fractions()
  [,1] [,2] [,3] [,4]
1  4/5 -1/5 -1/5 -1/5
2 -1/5  4/5 -1/5 -1/5
3 -1/5 -1/5  4/5 -1/5
4 -1/5 -1/5 -1/5  4/5
5 -1/5 -1/5 -1/5 -1/5

Note that I use the term scaled sum coding for the “pairwise comparisons with main effects” contrast scheme. I opt for this term for three reasons.

  • First, I see sum coding used more frequently in statistics and econometrics to refer to +1/-1; this is also what contr.sum in R returns.
  • Second, the salient part of going from sum coding to scaled sum coding, especially in the 2-level case, is that there’s some kind of division or scaling operation involved; I frequently see people use contr.sum(2)/2, although importantly contr.sum(3)/3 does not yield the expected result.
  • Third, “simple” coding is counterintuitive to me since we’re trying to avoid “simple effects;” “effects coding” and “contrast coding” are largely meaningless as all coding schemes will encode some kind of effect, and setting any contrast matrix is an instance of contrast coding.

So, for the researcher trying to remember “I need to use those contrasts where they’re divided to get the main effects,” it seems easier (to me) to reach for a tool with scaled in the name that is clearly distinguished from sum coding.3
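
To make the second point above concrete, here’s a quick check (contrastable was loaded above) that dividing contr.sum by the number of levels only reproduces scaled sum coding in the 2-level case:

Code
# Identical for 2 levels...
MASS::fractions(contr.sum(2) / 2)
scaled_sum_code(2) |> MASS::fractions()

# ...but not for 3 levels: contr.sum(3)/3 does not reproduce the
# 2/3 and -1/3 pattern shown above
MASS::fractions(contr.sum(3) / 3)
scaled_sum_code(3) |> MASS::fractions()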

Typical approach to contrast coding

Typically when I see people in Linguistics set contrasts, they do something like the following, using the palmerpenguins dataset as an example.

Code
library(dplyr)          # Data wrangling
library(purrr)          # Mapping functions
library(palmerpenguins) # Dataset
penguins_with_contrasts <- penguins

# Default treatment/dummy coding for a 2 and 3 level factor
contrasts(penguins_with_contrasts$sex)
       male
female    0
male      1
Code
contrasts(penguins_with_contrasts$species)
          Chinstrap Gentoo
Adelie            0      0
Chinstrap         1      0
Gentoo            0      1
Code
# Easy enough for 2 levels, -contr.sum(2)/2 is also used a lot
contrasts(penguins_with_contrasts$sex) <- c(-.5, .5) 

# Not so fun for three levels!
contrasts(penguins_with_contrasts$species) <- matrix(c(-1/3, 2/3, -1/3,
                                                       -1/3, -1/3, 2/3),
                                                     nrow = 3)

The chance of making a mistake increases as more and more categorical variables are included. Catching these mistakes can be very difficult, in part because this workflow erases the level labels in the regression output. This means you have to keep track of which comparisons the 1 and 2 in the coefficient names correspond to.

Note: Column/Comparison/Coefficient names

While the dimnames argument can be used to set the labels, anecdotally I rarely see people use this in their analyses when perusing code on the OSF. Winter (2019, 127) notes that “Using the ‘1’ after the predictor name is a notational convention for representing the slopes of sum-coded predictors in R” but this is slightly incorrect; in the absence of dimnames being set, R will use the numeric indices of the contrast matrix’s columns (no matter what the scheme is).
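
As a quick aside, here’s a sketch of that dimnames route (my own example, using a copy of the data frame so the models below are unaffected; the comparison labels are my own choice):

Code
penguins_labeled <- penguins_with_contrasts
contrasts(penguins_labeled$species) <- matrix(
  c(-1/3,  2/3, -1/3,
    -1/3, -1/3,  2/3),
  nrow = 3,
  dimnames = list(levels(penguins_labeled$species),
                  c("Chinstrap-Adelie", "Gentoo-Adelie")))

# Coefficients now show up as speciesChinstrap-Adelie and speciesGentoo-Adelie
# instead of species1 and species2
coef(lm(bill_length_mm ~ species, data = penguins_labeled))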

Below, the two sets of coefficients represent pairwise comparisons to the Adelie baseline, but the intercepts differ due to how the contrasts are set, with the first using treatment coding and the second using scaled sum coding. I’ll start with a case that only considers the categorical variable, but will include an additional continuous independent variable later on.

Code
# Compare the default treatment coding with the penguins dataset
# with the contrasts we specified in penguins_with_contrasts
treatment_coefs <- coef(lm(bill_length_mm ~ species,
                           data = penguins))    
scaledsum_coefs <- coef(lm(bill_length_mm ~ species, 
                           data = penguins_with_contrasts)) 

# I'm using list() to print and caption results side by side, purely aesthetic
list("(Default) Treatment Coding" = treatment_coefs,
     "(Manual) Scaled Sum Coding" = scaledsum_coefs)
$`(Default) Treatment Coding`
     (Intercept) speciesChinstrap    speciesGentoo 
       38.791391        10.042433         8.713487 

$`(Manual) Scaled Sum Coding`
(Intercept)    species1    species2 
  45.043364   10.042433    8.713487 

The model coefficients for the scaled sum coding show the same pairwise comparisons as the model using treatment coding, but the intercepts differ. We can check what they correspond to manually:

Code
group_means <- 
  penguins |>
  dplyr::group_by(species) |> 
  dplyr::summarize(mean_length = mean(bill_length_mm, na.rm = TRUE)) |> 
  purrr::pluck('mean_length') |> 
  `names<-`(c('Adelie', 'Chinstrap', 'Gentoo'))

list("Group means"= group_means,
     "Grand mean" = mean(group_means))
$`Group means`
   Adelie Chinstrap    Gentoo 
 38.79139  48.83382  47.50488 

$`Grand mean`
[1] 45.04336

So the intercept for the treatment coded model is the mean of the Adelie group, while the intercept for the scaled sum coded model is the grand mean, or the mean of the group means. But typing in the scaled sum contrast matrix was a bit obnoxious with all the -1/3s we had to type. If we had made a slight mistake while typing the matrix out, what would have happened to our model? Would our coefficients reflect the averages and differences we were expecting? As an example, let’s see what happens when we change a 2/3 to 1/3:

Code
# What if we accidentally typed 1/3 instead of 2/3?
contrasts(penguins_with_contrasts$species) <- matrix(c(-1/3, 1/3, -1/3,
                                                       -1/3, -1/3, 2/3),
                                                     nrow = 3)
mistake_coefs <- coef(lm(bill_length_mm ~ species, 
                         data = penguins_with_contrasts))

list("(Current) Mistaken Scaled Sum Coding:" = mistake_coefs,
     "(Previous) Correct Scaled Sum Coding:" = scaledsum_coefs)
$`(Current) Mistaken Scaled Sum Coding:`
(Intercept)    species1    species2 
  46.717103   15.063649    8.713487 

$`(Previous) Correct Scaled Sum Coding:`
(Intercept)    species1    species2 
  45.043364   10.042433    8.713487 

Here we can see that the intercept and the value for species1 have increased in magnitude. In particular, the new reported effect of species1 is much larger than it previously was. If we stopped at this point, we would conclude that the difference in bill length between the Chinstrap and Adelie groups is a whopping 15mm (remember we originally calculated it to be about 10). If we were only interested in whether there was a significant positive or negative difference, we’d still reach the same conclusion, but any claims about the magnitude of the effect would be misguided. This problem opens up a related question though: what does this new inflated-in-magnitude coefficient estimate represent?

Diagnosing our mistake

To check what these numbers correspond to, we have to check the hypothesis matrix that corresponds to our contrast matrix. The process of obtaining the hypothesis matrix has been referred to as finding the generalized inverse of the contrast matrix (see Schad et al. 2020 for details).

Code
matrix(c(1, 1, 1,         # Add a column of 1s for the intercept
         -1/3, 1/3, -1/3,
         -1/3, -1/3, 2/3),
       nrow = 3,
       dimnames = list(NULL, c('Intercept', 'species1', 'species2'))) |> 
  t() |> 
  solve() |> 
  MASS::fractions() # This function just shows numbers as fractions
     Intercept species1 species2
[1,]  1/6      -3/2       -1    
[2,]  1/2       3/2        0    
[3,]  1/3         0        1    

Here the intercept is represented by the weighted sum of each group mean, where the weights are shown in the intercept column. In most cases, the intercept should reflect the grand mean, or the mean of the group means, and so would usually have equal weights (i.e., 1/3 here) for the levels. In this case, we see the fractional weights are not the same. We can verify this by calculating the weighted mean ourselves:

Code
list("Grand Mean" = mean(group_means),
     "Weighted mean" = weighted.mean(group_means, c(1/6, 1/2, 1/3)))
$`Grand Mean`
[1] 45.04336

$`Weighted mean`
[1] 46.7171

Similarly, the coefficient for species1 shows the difference between the group means of levels 2 and 1 (i.e., mean of Chinstrap - mean of Adelie), but scaled by a factor of 3/2. Crucially, if our goal is to evaluate the difference between the means of these two levels, then our mistake in coding the contrast matrix will give us a larger estimate (~15 vs 10). Consider a similar setup where the true difference was close to 0 but the inflated estimate came out around 5; if we were relying on null hypothesis testing, it’s possible we’d report a significant effect when really we shouldn’t have.

Code
list("Mistaken Scaled Sum Coding" = mistake_coefs,
     "Correct Scaled Sum Coding" = scaledsum_coefs,
     "Computed Chinstrap-Adelie Difference with 3/2 scaling" = 
       (3/2 * group_means[['Chinstrap']]) - (3/2 * group_means[['Adelie']]),
     "Actual Chinstrap-Adelie Difference" = 
       group_means[['Chinstrap']] - group_means[['Adelie']])
$`Mistaken Scaled Sum Coding`
(Intercept)    species1    species2 
  46.717103   15.063649    8.713487 

$`Correct Scaled Sum Coding`
(Intercept)    species1    species2 
  45.043364   10.042433    8.713487 

$`Computed Chinstrap-Adelie Difference with 3/2 scaling`
[1] 15.06365

$`Actual Chinstrap-Adelie Difference`
[1] 10.04243

The point being: we made an honest mistake of typing 1/3 instead of 2/3, but this had ramifications for the coefficients in our model output that we use to make inferences. In practice, because the contrasts<- calls ran without any errors, we would likely assume that what we did was correct.

Tidy approach to contrasts

Here I’ll show a different approach using the contrastable package. This package takes a tidy approach that takes care of the overhead of labels and reference levels involved in using common contrast coding schemes. Specifically, the package provides a series of functions that use a special formula interface in which each operator is assigned a specific meaning. The left hand side of the formula is the factor column whose contrasts you want to change. The right hand side consists of (at minimum) a function to generate contrast matrices, such as contr.treatment or treatment_code. Additional operators provide extra optional functionality:

  • + x: Set reference level to level x
  • * x: Set intercept to be the mean of x
  • - 3:4: For polynomial contrasts only, drop trends 3 and 4
  • | c("A-B", "A-C"): Set the comparison labels to A-B and A-C (must be the last operator if used)

Recall that in many cases researchers want pairwise comparisons while retaining main effects, and so the choice of reference level for the comparisons is very important. By default, R uses the first level alphabetically as the reference level, but sometimes we want to change this ourselves. Here’s an example where we set the sex and species factors to the two contrast schemes we set manually before. The set_contrasts function will show a message if it detects additional factor variables in the dataframe that the user did not provide contrasts for.

Code
# library(contrastable) was loaded earlier
penguins_df <- 
  penguins |> 
  set_contrasts(sex ~ scaled_sum_code + "male", # Set reference level with +
                species ~ scaled_sum_code + 'Adelie') 
Expect contr.treatment or contr.poly for unset factors: island
Code
contrasts(penguins_df$species) |> MASS::fractions()
          Chinstrap Gentoo
Adelie    -1/3      -1/3  
Chinstrap  2/3      -1/3  
Gentoo    -1/3       2/3  
Code
contrasts(penguins_df$sex) |> MASS::fractions()
       female
female  1/2  
male   -1/2  

penguins_df now has its contrasts set, and we can run our model as usual. Note that we didn’t have to type out any matrices ourselves, yet we got exactly the contrasts we needed.

Code
coef(lm(bill_length_mm ~ species + bill_depth_mm, data = penguins_df))
     (Intercept) speciesChinstrap    speciesGentoo    bill_depth_mm 
       20.997115         9.938955        13.403279         1.394011 

If we wanted to change the labels to better reflect the comparisons being made, we could do that in the formula too with the | operator.

Code
penguins_df <- 
  penguins_df |> 
  set_contrasts(species ~ scaled_sum_code + 'Adelie' | 
                  c('Chinstrap-Ad', 'Gentoo-Ad'))

coef(lm(bill_length_mm ~ species, data = penguins_df))
        (Intercept) speciesChinstrap-Ad    speciesGentoo-Ad 
          45.043364           10.042433            8.713487 

Additional functions

Typically when I use this package in my analyses, the set_contrasts function is all I really need, but there are other functions that follow the same syntax and provide other information. To avoid retyping things, I’ll usually keep the contrasts in a list assigned to a separate variable and pass that list to the different functions.

The glimpse_contrasts function can show information about the factors in a dataset along with the contrast schemes that have been assigned to each factor.

Code
my_contrasts <- 
  list(
    sex ~ scaled_sum_code + 'female',
    species ~ helmert_code
  )

glimpse_contrasts(penguins_df, my_contrasts) |> gt::gt()
factor  n_levels level_names               scheme          reference intercept    orthogonal centered dropped_trends explicitly_set
sex     2        female, male              scaled_sum_code female    grand mean   NA         TRUE     NA             TRUE
species 3        Adelie, Chinstrap, Gentoo helmert_code    Gentoo    grand mean   TRUE       TRUE     NA             TRUE
island  3        Biscoe, Dream, Torgersen  contr.treatment Biscoe    mean(Biscoe) FALSE      FALSE    NA             FALSE

The enlist_contrasts function does the same thing as set_contrasts, but returns a list of contrast matrices that can be used in the contrasts argument of some model-fitting functions.4 It also provides an easy way to show the contrast matrices in an appendix or supplementary material.

Code
enlist_contrasts(penguins_df, my_contrasts) |> purrr::map(MASS::fractions)
Expect contr.treatment or contr.poly for unset factors: island
$sex
       male
female -1/2
male    1/2

$species
          >Adelie >Chinstrap
Adelie     2/3       0      
Chinstrap -1/3     1/2      
Gentoo    -1/3    -1/2      
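
As an example of the first use (a sketch of my own, reusing the list from above), the returned matrices can be passed to lm()’s contrasts argument, subset to the factors that actually appear in the model formula:

Code
my_contrast_matrices <- enlist_contrasts(penguins_df, my_contrasts)

# Pass just the species matrix, since species is the only factor in this model
coef(lm(bill_length_mm ~ species, data = penguins_df,
        contrasts = my_contrast_matrices["species"]))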

Available contrast schemes

Here are the different contrast functions this package currently provides.

Code
# = contr.treatment
treatment_code(5) |> MASS::fractions()
  2 3 4 5
1 0 0 0 0
2 1 0 0 0
3 0 1 0 0
4 0 0 1 0
5 0 0 0 1
Code
# = contr.sum
sum_code(5) |> MASS::fractions()
  [,1] [,2] [,3] [,4]
1  1    0    0    0  
2  0    1    0    0  
3  0    0    1    0  
4  0    0    0    1  
5 -1   -1   -1   -1  
Code
# NOT = contr.sum (see the naming note above); pairwise comparisons with a grand mean intercept
scaled_sum_code(5) |> MASS::fractions()
  [,1] [,2] [,3] [,4]
1  4/5 -1/5 -1/5 -1/5
2 -1/5  4/5 -1/5 -1/5
3 -1/5 -1/5  4/5 -1/5
4 -1/5 -1/5 -1/5  4/5
5 -1/5 -1/5 -1/5 -1/5
Code
# NOT = contr.helmert, which is unscaled
helmert_code(5) |> MASS::fractions()
     [,1] [,2] [,3] [,4]
[1,]  4/5    0    0    0
[2,] -1/5  3/4    0    0
[3,] -1/5 -1/4  2/3    0
[4,] -1/5 -1/4 -1/3  1/2
[5,] -1/5 -1/4 -1/3 -1/2
Code
reverse_helmert_code(5) |> MASS::fractions()
     [,1] [,2] [,3] [,4]
[1,] -1/2 -1/3 -1/4 -1/5
[2,]  1/2 -1/3 -1/4 -1/5
[3,]    0  2/3 -1/4 -1/5
[4,]    0    0  3/4 -1/5
[5,]    0    0    0  4/5
Code
forward_difference_code(5) |> MASS::fractions()
     [,1] [,2] [,3] [,4]
[1,]  4/5  3/5  2/5  1/5
[2,] -1/5  3/5  2/5  1/5
[3,] -1/5 -2/5  2/5  1/5
[4,] -1/5 -2/5 -3/5  1/5
[5,] -1/5 -2/5 -3/5 -4/5
Code
backward_difference_code(5) |> MASS::fractions()
     [,1] [,2] [,3] [,4]
[1,] -4/5 -3/5 -2/5 -1/5
[2,]  1/5 -3/5 -2/5 -1/5
[3,]  1/5  2/5 -2/5 -1/5
[4,]  1/5  2/5  3/5 -1/5
[5,]  1/5  2/5  3/5  4/5
Code
# = contr.poly, poly(1:n, degree = n-1, raw = FALSE)
orth_polynomial_code(5) |> MASS::fractions()
     .L                .Q                .C                ^4               
[1,]          -265/419          929/1738  -2026009/6406803        4018/33617
[2,]  -2026009/6406803         -809/3027           191/302    -246481/515552
[3,]                 0       -6263/11717                 0         1042/1453
[4,] 12484830/39480499         -809/3027          -265/419    -246481/515552
[5,]           191/302          929/1738 10458821/33073696        4018/33617
Code
# = poly(1:n, degree = n-1, raw = TRUE)
raw_polynomial_code(5) |> MASS::fractions()
     1   2   3   4  
[1,]   1   1   1   1
[2,]   2   4   8  16
[3,]   3   9  27  81
[4,]   4  16  64 256
[5,]   5  25 125 625

Other packages and resources

This is not the first package made for contrast coding, though to my knowledge it is the first to take a “tidy” approach to it.

The hypr package (Rabe et al. 2020) takes a different approach, where the focus is on considering the hypothesis matrix and declaring specifically which comparisons you want to make, then the package can provide a corresponding matrix. I like hypr a lot actually, but I find it a bit tedious when I know what the contrast matrix should look like but I have to type out the comparisons; still better than matrix calls though.

The emmeans package (Lenth 2022) is extremely useful for making pairwise comparisons, but is capable of a lot more as well. You can see its vignette on contrasts here.

The multcomp package (Hothorn, Bretz, and Westfall 2008) is useful for simultaneous inference, extending typical workflows for multiple comparisons.

I haven’t used the contrast package (O’Callaghan 2021) very much, but judging from its vignette here it seems like it extends the rms package’s contrast function (Harrell Jr 2022). It seems useful for calculating different comparisons after a model is run, but its usage isn’t very transparent to me at first glance.

While not a package, this page from UCLA pops up a lot when people discuss contrast coding. It’s very useful, and I used it as a starting point for implementing different contrast functions. However, I will note that I don’t follow its naming conventions.

Code
sessionInfo()
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22621)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] palmerpenguins_0.1.1 purrr_1.0.0          dplyr_1.1.0         
[4] contrastable_0.1.0  

loaded via a namespace (and not attached):
 [1] pillar_1.8.1      compiler_4.2.2    tools_4.2.2       digest_0.6.31    
 [5] jsonlite_1.8.4    evaluate_0.20     lifecycle_1.0.3   tibble_3.1.8     
 [9] gtable_0.3.1      pkgconfig_2.0.3   rlang_1.0.6       cli_3.5.0        
[13] rstudioapi_0.14   yaml_2.3.7        xfun_0.37         fastmap_1.1.0    
[17] withr_2.5.0       stringr_1.5.0     knitr_1.42        sass_0.4.5       
[21] generics_0.1.3    vctrs_0.5.2       htmlwidgets_1.6.1 grid_4.2.2       
[25] tidyselect_1.2.0  glue_1.6.2        R6_2.5.1          fansi_1.0.3      
[29] rmarkdown_2.20    ggplot2_3.4.0     magrittr_2.0.3    scales_1.2.1     
[33] htmltools_0.5.4   MASS_7.3-58.1     gt_0.8.0          colorspace_2.0-3 
[37] utf8_1.2.2        stringi_1.7.8     munsell_0.5.0     crayon_1.5.2     

References

Brehm, Laurel, and Phillip M. Alday. 2022. “Contrast Coding Choices in a Decade of Mixed Models.” Journal of Memory and Language 125: 104334. https://doi.org/10.1016/j.jml.2022.104334.
Harrell Jr, Frank E. 2022. rms: Regression Modeling Strategies. https://CRAN.R-project.org/package=rms.
Hothorn, Torsten, Frank Bretz, and Peter Westfall. 2008. “Simultaneous Inference in General Parametric Models.” Biometrical Journal 50 (3): 346–63. https://cran.r-project.org/package=multcomp.
Lenth, Russell V. 2022. emmeans: Estimated Marginal Means, aka Least-Squares Means. https://CRAN.R-project.org/package=emmeans.
O’Callaghan, Alan. 2021. contrast: A Collection of Contrast Methods. https://CRAN.R-project.org/package=contrast.
Rabe, Maximilian M., Shravan Vasishth, Sven Hohenstein, Reinhold Kliegl, and Daniel J. Schad. 2020. “hypr: An R Package for Hypothesis-Driven Contrast Coding.” Journal of Open Source Software 5 (48): 2134. https://doi.org/10.21105/joss.02134.
Schad, Daniel J., Shravan Vasishth, Sven Hohenstein, and Reinhold Kliegl. 2020. “How to Capitalize on a Priori Contrasts in Linear (Mixed) Models: A Tutorial.” Journal of Memory and Language 110: 104038. https://doi.org/10.1016/j.jml.2019.104038.
Winter, Bodo. 2019. Statistics for Linguists: An Introduction Using R. Routledge.

Footnotes

  1. One might also see main effects referred to as marginal effects and simple effects referred to as conditional effects. For the latter, the idea is that, for example, the “effect of syntax” is the effect of syntax given the reference level of the other categorical predictor. The main effect of syntax would be the average of the syntax effect for the L1 group and the syntax effect for the L2 group.↩︎

  2. You can view your default contrasts using options('contrasts'), where the unordered default is typically contr.treatment and the ordered default is typically contr.poly. I do not recommend changing these defaults because even though you may save some time for yourself, if you forget to explicitly mention that you changed the defaults then your analyses won’t be reproducible by the vast majority of others who have not changed the defaults.↩︎

  3. If it were really up to me, I would call it “centered pairwise coding” since you get pairwise comparisons with the intercept centered on the grand mean, but it doesn’t quite roll off the tongue and it’s a bit wordy to type out.↩︎

  4. Not all modeling functions use the contrasts argument, brms::brm is one example. In these cases you must set the contrast matrix to the factor, hence set_contrasts is typically more useful for the actual modeling part while enlist_contrasts is useful for showing matrices.↩︎

Citation

BibTeX citation:
@online{sostarics2022,
  author = {Thomas Sostarics},
  title = {Contrastable},
  date = {2022-07-13},
  url = {https://tsostaricsblog.netlify.app/posts/contrastable/},
  langid = {en}
}
For attribution, please cite this work as:
Thomas Sostarics. 2022. “Contrastable.” July 13, 2022. https://tsostaricsblog.netlify.app/posts/contrastable/.