Male-female character ratio in books 4:1, says USC research

New Delhi: Male characters appear in books four times as often as female ones, according to a new artificial intelligence (AI)-aided study that examined more than 3,000 English-language works published from 1800 to 1950.

Researchers at the USC Viterbi School of Engineering used artificial intelligence to scan books ranging from science fiction and adventure to mystery and romance, across short stories, poetry, and novels.

The study outlined several methods for defining female prevalence in literature. The researchers used named entity recognition, a prominent natural language processing method, to extract gender-specific characters.

The brain behind the research

"One of the ways we define this is through looking at how many female pronouns are in a book compared to male pronouns," said Mayank Kejriwal, a research lead at USC's Information Sciences Institute. Another technique is to quantify how many of a book's main characters are female.
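The pronoun-based measure Kejriwal describes can be illustrated with a minimal sketch. The pronoun lists, the function names, and the sample sentence below are assumptions made for illustration only; the article does not give the study's exact lists or code.

```python
import re
from collections import Counter

# Assumed pronoun lists; the study's own lists are not given in this article.
FEMALE_PRONOUNS = {"she", "her", "hers", "herself"}
MALE_PRONOUNS = {"he", "him", "his", "himself"}


def pronoun_counts(text):
    """Return (female, male) pronoun counts for a piece of text."""
    # Lowercase the text and split it into alphabetic words.
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    female = sum(counts[p] for p in FEMALE_PRONOUNS)
    male = sum(counts[p] for p in MALE_PRONOUNS)
    return female, male


def female_share(text):
    """Share of gendered pronouns that are female, or None if there are none."""
    female, male = pronoun_counts(text)
    total = female + male
    return female / total if total else None


# Hypothetical sample text, not taken from the study's corpus.
sample = "He said his horse was tired. She gave him her own and walked home herself."
print(pronoun_counts(sample), female_share(sample))
```

A simple word count like this treats every "her" or "his" alike and cannot tell which character a pronoun refers to, which is one reason the study reports more than one measure of prevalence.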

How the research was conducted

In the study, the differences between male and female characters' prevalence were determined using three robust measures of prevalence, on a corpus of copyright-expired literary texts from the Project Gutenberg English-language corpus.

Using computationally replicable methodologies relying on modern natural language processing tools, the study found that female character prevalence is significantly lower than male character prevalence, although the difference declines (while still being significant) when controlling for the gender of the author.

It was also found that male character ratios have not varied much over time in the sample.

Akarsh Nagaraj, a co-author of the study, said the methods and findings gave the researchers a greater understanding of biases in society and their implications.

There was a 4:1 ratio of male versus female main characters, the study found.

More negative terms associated with female characters, according to study

More negative terms, such as 'weak' and 'stupid', were also used in connection with female characters, compared to 'strong' and 'power' for male characters, according to the study.

On average, there are 32 (unique) male and eight female characters per male-authored book compared to 38 male and 21 female characters in female-authored books.
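As a quick check on what these averages imply, the male-to-female character ratio for each author group can be worked out directly. This is simple arithmetic on the figures quoted above, not the study's own calculation, and the variable names are illustrative.

```python
# Average (male, female) character counts per book, as quoted in the article.
averages = {
    "male-authored": (32, 8),
    "female-authored": (38, 21),
}

# Male characters per female character, rounded to two decimal places.
ratios = {
    group: round(male / female, 2)
    for group, (male, female) in averages.items()
}

for group, ratio in ratios.items():
    print(f"{group}: about {ratio} male characters per female character")
```

On these figures, male-authored books come out at 4 male characters per female character, while female-authored books come out at roughly 1.8.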

The female character prevalence did not change much over the years from 1800 to 1950, the study said.

The authors say research has continued to shed light on the extent and significance of gender disparity in social, cultural, and economic spheres. More recently, they note, computational tools from the natural language processing literature have been proposed for measuring such disparity using relatively extensive datasets and empirically rigorous methodologies.

They said the study contributes to this line of research by examining gender disparity, at scale, in copyright-expired literary texts published in the pre-modern period (defined in this work as the period ranging from the mid-19th through the mid-20th century).

Challenges in the study

According to them, one of the challenges in using such tools is to ensure quality control, and by extension, trustworthy statistical analysis.

Another challenge is using materials and methods that are publicly available and have been established for some time, both to ensure that they can be used and vetted in the future and to add confidence to the methodology itself, they added.