Paper Summary: This paper investigates gender stereotypes in a large set of language models, in English and in several Slavic languages. Unlike earlier work, which investigates gender stereotypes in, say, occupations (e.g. "a nurse" is translated as a female nurse), this paper sets out to examine root stereotypes in the language model (e.g. "Women are beautiful", "Men are leaders"). To do this, the authors curate a set of stereotypes with the help of sociologists and produce parallel data: gender-ambiguous sentences in English paired with gendered translations in several Slavic languages. They introduce a set of metrics to measure the degree of gender stereotyping. In their experiments, they measure gender stereotypes in MLMs and generative models for English, and also in translations of English segments. They also offer some advice on how to extend this work to additional languages and directions.

Summary Of Strengths: The experiments cover a wide variety of modeling tasks, and the authors design a unique way to measure gender stereotypes in MLMs as well as in generative models.

Summary Of Weaknesses: While the authors highlight stereotypical assumptions about gender, the effect of these stereotypes on downstream tasks isn't really discussed. For instance, is translation quality worse for gendered translations in these models? Is coreference resolution performance correlated with stereotypical gender assumptions?