Summary: Oct. 19th, 2017 Discussion

Nicole Mideo presented “Letters of recommendation: data on gender bias”. Here is her summary:

We learned about implicit bias in a previous BREWS discussion, and it’s clear that reference letters are an important part of progressing up any career ladder. For this discussion, then, we asked whether implicit bias could affect the quality of letters being written for males versus females.

We began with a discussion of the Trix & Psenka (2003) paper, which qualitatively compared reference letters written for male and female applicants for faculty positions at US medical schools. Letters for female applicants were shorter, included more grindstone words (e.g., “dependable”) and fewer repeated standout words (e.g., “excellent”), and contained more ‘doubt raisers’ and gender terms.

Given BREWS’s emphasis on data, I collected my own, running a set of reference letters written by me and other professors in EEB through an online implicit bias detector. The estimated bias in the letters I have written ranged from quite heavily female-biased language to quite heavily male-biased language, and the same was true for the full dataset. With the help of John Stinchcombe, I analysed bias in these letters using mixed effects models, with a random effect of professor (anonymised) and fixed effects of the subject’s gender, the subject’s career stage, the professor’s gender, and all interactions (sketched below). Two surprises emerged. (1) There was no significant effect of the gender of the person the letter was written about. (2) There was a significant effect of that person’s stage: letters for undergrads whom the letter writer knew only from lecture courses were more female-biased than letters for undergrads who had worked in the lab, while letters for grad students, postdocs, and faculty were further towards the male-biased end of the spectrum.
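
The actual analysis used mixed effects models (likely fitted in dedicated statistical software); purely as an illustration of the model structure described above, here is a minimal Python sketch using statsmodels, in which the data file and column names (bias, subject_gender, stage, writer_gender, professor) are hypothetical placeholders rather than our real variables.

```python
# Hypothetical sketch, not the analysis we actually ran: one bias score per letter,
# a random intercept for the (anonymised) letter writer, and fixed effects of
# subject gender, career stage, and writer gender, plus all interactions.
import pandas as pd
import statsmodels.formula.api as smf

letters = pd.read_csv("letter_bias_scores.csv")  # one row per letter; placeholder file name

model = smf.mixedlm(
    "bias ~ subject_gender * stage * writer_gender",  # fixed effects and all interactions
    data=letters,
    groups=letters["professor"],                      # random effect of professor
)
result = model.fit()
print(result.summary())
```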

These results got me digging into the guts of the algorithm. Standout words, ability words (e.g., “intelligent”), and research words (e.g., “data”) are all categorised as male-biased, while grindstone words and teaching words (e.g., “course”) are categorised as female-biased. The algorithm counts the instances of these types of words in a letter, takes the difference between the numbers of male- and female-biased words, and divides that difference by the total number of “gendered” words to get an estimate of bias (a rough sketch of this calculation appears below). So, the stage results make a lot of sense: if a letter writer knows a student only from a lecture course, there are likely to be a lot of teaching words, which the algorithm reads as “female-biased language”.
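
To make that counting scheme concrete, here is a rough, illustrative sketch of such a score; the word lists below are small invented fragments rather than any tool’s actual lists, and the sign convention (positive for male-biased, negative for female-biased) is an assumption for the example.

```python
# Illustrative sketch of the word-counting bias score described above.
# These word lists are invented examples, not the online tool's real lists.
import string

MALE_BIASED = {"excellent", "outstanding", "intelligent", "brilliant", "research", "data"}
FEMALE_BIASED = {"dependable", "hardworking", "careful", "teaching", "course", "classroom"}

def bias_score(letter_text: str) -> float:
    """Difference between male- and female-biased word counts, divided by total gendered words."""
    words = [w.strip(string.punctuation) for w in letter_text.lower().split()]
    male = sum(w in MALE_BIASED for w in words)
    female = sum(w in FEMALE_BIASED for w in words)
    total = male + female
    return 0.0 if total == 0 else (male - female) / total

# A letter dominated by teaching words scores towards the female-biased end.
print(bias_score("She was an excellent student in my course, dependable in the classroom."))
```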

The algorithms cite a Schmader et al. (2007) study as their inspiration; that study statistically compared letters written for male and female applicants for faculty positions in chemistry and biochemistry. The results showed a significant reduction in standout adjectives in letters for females, but all other differences were non-significant or marginal: for example, letters for females did not contain significantly fewer research or ability words, nor did they contain significantly more teaching or grindstone words. Despite this, the online algorithms still assume a gender bias in all of those categories of words. This seems problematic, and we discussed rerunning the analysis with only the standout adjectives treated as ‘gendered’.

Finally, we discussed what evidence exists that the words categorised as female-biased actually result in weaker letters, which seems to be the underlying operating assumption; the Trix & Psenka study, however, only looked at letters for successful applicants! We felt that the specific job could alter the value of word categories (e.g., teaching words will be very valuable for applicants to teachers’ college) and that different fields will have different ‘cultures’ of letter writing.

Overall, we agreed that assessing implicit bias in reference letter writing requires more data and rigorous analysis.