This Week’s Good Reads: Staff Scientists, Gender Bias, Open Access, and Peer Review’s Repeat Referees

1) One of the most popular posts on this blog was the one where I wrote about how little we know about ecology career paths after the PhD and suggested a population biology model for studying them. The problem is far from limited to ecology: most post-PhD careers are neither understood nor tracked. Science Careers is now tackling this question with a new opt-in survey: Help Solve the Mystery of the Disappearing PhD’s, Science Careers, Beryl Lieff Benderly.

2) In last week’s good reads, I mentioned an article with good data graphics showing the post-doc pile-up and asking readers what to do about it. The results of the poll are in: Wanted: Staff-scientist positions for postdocs, Nature, Kendall Powell.

3) A new study about women in science concluded that female applicants for tenure-track jobs are preferred 2 to 1. Needless to say, it has sparked some controversy. (Read Nature News’s summary of the study: Leading Scientists Favour Women in Tenure-Track Hiring Test, Nature, Boer Deng). For a thorough criticism of the study, check out: The Myth About Women in Science? Bias in the Study of Gender Inequality in STEM, The Other Sociologist, Zuleyka Zevallos.

4) I recently blogged about one of Paige Jarreau’s results in her doctoral study of science blogging practices. Now, her full dissertation is up and ready for reading. In it, she reviews the literature on science blogging and then unpacks her results on why science bloggers blog, how they choose what to write about, who they are, whose blogs are read the most, and who is paid versus not. Check it out: All the Science That Is Fit to Blog: An Analysis of Science Blogging Practices, Dissertation, Paige Brown Jarreau.

5) NSF is requiring that the research it funds be open access, much like NIH: US Agencies Fall in Line on Public Access, Science, Jocelyn Kaiser.

6) FYI: Rejected mss often get the same referees when resubmitted to a different journal, Dynamic Ecology, Jeremy Fox.

7) New Opportunities at the Interface of Ecology and Statistics, Methods in Ecology and Evolution, David Warton. (Full article paywalled) The new issue of Methods in Ecology and Evolution came out this week, and focuses on statistical methods ecologists should be aware of. Many of the papers focus on novel methods for understanding species distributions, especially to deal with common limitations in ecological data.

8) Cool study and video on hummingbird flight in high winds: Putting Hummingbirds to the Test, Smithsonian, Erin Blakemore.

And in other news:


3 Comments

  1. [The comment below is a response to Dr. Zuleyka Zevallos’s critique of the PNAS study on STEM faculty hiring bias by Wendy Williams and Stephen Ceci.]

    Zuleyka, thank you for your engaging and well researched perspective. On Twitter, you mentioned that you were interested in my take on the study’s methods. So here are my thoughts.

    I’ll respond to your methodological critiques point-by-point in the same order as you: (a) self-selection bias is a concern, (b) raters likely suspected study’s purpose, and (c) study did not simulate the real world. Have I missed anything? If so, let me know. Then I’ll also discuss the rigor of the peer review process.

    As a forewarning to readers, the first half of this comment may come across as a boring methods discussion. However, the second half talks a little bit about the relevant players in this story and how the story has unfolded over time. Hence, the second half may interest a broader readership than the first. Nevertheless, let’s dig into the methods.


    You note how emails were sent out to 2,090 professors in the first three of five experiments, of which 711 provided data yielding a response rate of 34%. You also note a control experiment involving psychology professors that aimed to assess self-selection bias.

    You critique this control experiment because, “including psychology as a control is not a true reflection of gender bias in broader STEM fields.” Would that experiment have been better if it incorporated other STEM fields? Sure.

    But there are other data that also speak to this issue. Analyses reported in the Supporting Information found that respondents and nonrespondents were similar “in terms of their gender, rank, and discipline.” And that finding held true across all four sampled STEM fields, not just psychology.

    The authors note this type of analysis “has often been the only validation check researchers have utilized in experimental email surveys.” And in many studies such analyses aren’t done at all. Hence, the control experiment with psychology was their attempt to improve on prior methodological approaches and was only one part of their strategy for assessing self-selection bias.


    You noted that, for faculty raters, “it is very easy to see from their study design that the researchers were examining gender bias in hiring.” I agree this might be a potential concern.

    But they did have data addressing that issue. As noted in the Supporting Information, “when a subset of 30 respondents was asked to guess the hypothesis of the study, none suspected it was related to applicant gender.” Many of those surveyed did think the study was about hiring biases for “analytic powerhouses” or “socially-skilled colleagues.” But not about gender biases, specifically. In fact, these descriptors were added to mask the true purpose of the study. And importantly, the gendered descriptors were counter-balanced.

    The fifth experiment also addresses this concern by presenting raters with only one applicant. This methodological feature meant that raters couldn’t compare different applicants and then infer that the study was about gender bias. A female preference was still found even in this setup that more closely matched the earlier 2012 PNAS study.


    You note scientists hire based on CVs, not short narratives. Do the results extend to evaluation of CVs?

    There’s some evidence they do. From Experiment 4.

    In that experiment, 35 engineering professors favored women by 3-to-1.

    Could the evidence for CV evaluation be strengthened? Absolutely. With the right resources (time; money), any empirical evidence can be strengthened. That experiment with CVs could have sampled more faculty or other fields of study. But let’s also consider that this study had 5 experiments involving 873 participants, which took three years for data collection.

    Now let’s contrast the resources invested in the widely reported 2012 PNAS study. That study had 1 experiment involving 127 participants, which took two months for data collection. In other words, this current PNAS study invested more resources than the earlier one by almost 7:1 for number of participants and over 18:1 for time collecting data. The current PNAS study also replicated its findings across five experiments, whereas the earlier study had no replication experiment.
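    For readers who want to check the arithmetic, here is a minimal sketch that recomputes the ratios from the figures quoted in this comment. It assumes "three years" means 36 months, which is an approximation on my part; the variable names are illustrative only.

```python
# Figures quoted in this comment. "months" for the current study is an
# assumption: "three years" is read as 36 months.
current_study = {"participants": 873, "months": 36}
earlier_study = {"participants": 127, "months": 2}

# Ratio of participants and of data-collection time, current vs. earlier study.
participant_ratio = current_study["participants"] / earlier_study["participants"]
time_ratio = current_study["months"] / earlier_study["months"]

# Response rate for the first three experiments: 711 of 2,090 emailed professors.
response_rate = 711 / 2090

print(f"participants: {participant_ratio:.2f} to 1")
print(f"time collecting data: {time_ratio:.1f} to 1")
print(f"response rate: {response_rate:.1%}")
```

    Under that 36-month assumption the time ratio comes out at 18 to 1, so the "over 18:1" above implies data collection ran slightly longer than three years.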

    My point is this: the available data show that the results for narrative summaries extend to CVs. Evidence for the CV results could be strengthened, but that involves substantial time and effort. Perhaps the results don’t extend to evaluation of CVs in, say, biology. But we have no particular reason to suspect that.

    You raise a valuable point, though, that we should be cautious about generalizing from studies of hypothetical scenarios to real-world outcomes. So what do the real-world data show?

    Scientists prefer *actual* female tenure-track applicants too. As I’ve noted elsewhere, “the proportion of women among tenure-track applicants increased substantially as jobseekers advanced through the process from applying to receiving job offers.”

    This real-world preference for female applicants may come as a surprise to some. You wouldn’t learn about these real-world data by reading the introduction or discussion sections of the 2012 PNAS study, for instance.

    That paper’s introduction section does acknowledge a scholarly debate about gender bias. But it doesn’t discuss the data that surround the debate. The discussion section makes one very brief reference to correlational data, but is silent beyond that.

    Feeling somewhat unsatisfied with the lack of discussion, I was eager to hear what those authors had to say about those real-world data in more depth. So I talked with that study’s lead author, Corinne Moss-Racusin, in person after her talk at a social psychology conference in 2013.

    She acknowledged knowing about those real-world data, but quickly dismissed them as correlational. She had a fair point. Correlational data can be ambiguous. These ambiguous interpretations are discussed at length in the Supporting Information for the most recent PNAS paper.

    Unfortunately, however, I’ve found that dismissing evidence simply because it’s “correlational” can stunt productive discussion. In one instance, an academic journal declined to even send a manuscript of mine out for peer review “due to the strictly correlational nature of the data.” No specific concerns were mentioned, other than the study being merely “correlational.”

    Moss-Racusin’s most recent paper on gender bias pretends that a scholarly debate doesn’t even exist. Her most recent paper cites an earlier paper by Ceci and Williams, but only to say that “among other factors (Ceci & Williams, 2011), gender bias may play a role in constraining women’s STEM opportunities.”

    Failing to acknowledge this debate prevents newcomers to this conversation from learning about the real-world, “correlational” data. All data points should be discussed, including both the earlier and new PNAS studies on gender bias. The real-world data, no doubt, have ambiguity attached to them. But they deserve discussion nevertheless.


    Peer review is a cornerstone of producing valid science. But was the peer review process rigorous in this case? I have some knowledge on that.

    I’ve talked at some length with two of the seven anonymous peer reviewers for this study. Both of them are extremely well respected scholars in my field (psychology), but had very different takes on the study and its methods.

    One reviewer embraced the study, while the other said to reject it. This is common in peer review. The reviewer recommending rejection echoed your concern that raters might guess the purpose of the study if they saw two men and one woman as applicants.

    You know what Williams and Ceci did to address that concern? They did another study.

    Enter data, stage Experiment 5.

    That experiment more closely resembled the earlier 2012 PNAS paper and still found similar results by presenting only one applicant to each rater. These new data definitely did help assuage the critical reviewer’s concerns.

    That reviewer still has a few other concerns. For instance, the reviewer noted the importance of “true” audit studies, like Shelley Correll’s excellent work on motherhood discrimination. However, a “true” audit study might be impossible for the tenure-track hiring context because of the small size of academia.

    The PNAS study was notable for having seven reviewers because the norm is two. The earlier 2012 PNAS study had two reviewers. I’ve reviewed for PNAS myself (not on a gender bias study). The journal published that study with only myself and one other scholar as the peer reviewers. The journal’s website even notes that having two reviewers is common at PNAS.

    So having seven reviewers is extremely uncommon. My guess is that the journal’s editorial board knew that the results would be controversial and therefore made heroic efforts to protect the reputation of the journal. PNAS has come under fire from multiple scientists who repeatedly criticize the journal for letting studies simply “slip by” and get published because of an old boys’ network.

    The editorial board probably knew that would be a concern for this current study, regardless of the study’s actual methodological strengths. This suspicion is further supported by some other facts about the study’s review process.

    External statisticians evaluated the data analyses, for instance. This is not common. Quoting from the Supporting Information, “an independent statistician requested these raw data through a third party associated with the peer review process in order to replicate the results. His analyses did in fact replicate these findings using R rather than the SAS we used.”

    Now, I embrace methodological scrutiny in the peer review process. Frankly, I’m disappointed when I get peer reviews back and all I get is “methods were great.” I want people to critique my work! Critique helps improve it. But the scrutiny given to this study seems extreme, especially considering all the authors did to address the concerns, such as collecting data for a fifth experiment.

    I plan on independently analyzing the data myself, but I trust the integrity of the analyses based on the information that I’ve read so far.


    Bloggers have brought up valid methodological concerns about the new PNAS paper. I am impressed with the time and effort put into producing detailed posts such as yours. However, my overall assessment is that these methodological concerns are not persuasive in the grand scheme. But other scholars may disagree.

    So that’s my take on the methods. I welcome your thoughts in response. I doubt this current study will end debate about sex bias in science. Nor should it. We still have a lot to learn about what contexts might undermine women.

    But the current study’s diverse methods and robust results indicate that hiring STEM faculty is likely not one of those contexts.

    Disclaimer: Ceci was the editor of a study I recently published in Frontiers in Psychology. I have been in email conversation with Williams and Ceci, but did not send them a draft of this comment before posting. I was not asked by them to write this comment.

