The Representation of Gender in NIF's Data Holdings (repost)
- Click the link to go to the full PDF document, which includes at the end an interactive chart.
The following is the full document, complete with images.
The Neuroscience Information Framework (NIF) deeply indexes 150 individual databases, meaning that it exposes the data held in those databases and data sets. These range from large aggregators of data, such as the model organism databases (for example, Mouse Genome Informatics, MGI), to small data sets such as the 1000 Functional Connectomes. These holdings break down to approximately 350 million individual data records, most of which are tagged and aligned to some extent to a structured vocabulary.
In the first pass, the term male was searched on its own. Of the 350 million total records in NIF’s holdings, 159 million records mention the term male, as do 5.9 million articles: http://neuinfo.org/nif/nifgwt.html?query=male
Searching for female on its own shows that 127 million records and 5.9 million articles mention the term female: http://neuinfo.org/nif/nifgwt.html?query=female
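The two search links above differ only in the value of the query parameter. As a minimal sketch, assuming the search page accepts any single term in that parameter, the link for a given term could be built as follows. The function name nif_query_url is hypothetical and is not part of NIF.

```python
from urllib.parse import urlencode

# Base address taken from the two search links quoted in this post.
NIF_SEARCH_URL = "http://neuinfo.org/nif/nifgwt.html"


def nif_query_url(term):
    """Return the search link for one term (hypothetical helper, not a NIF API)."""
    # urlencode escapes the term so that spaces and punctuation are URL-safe.
    return f"{NIF_SEARCH_URL}?{urlencode({'query': term})}"


for term in ("male", "female"):
    print(nif_query_url(term))
```

This only reproduces the links; it does not fetch or parse the result counts, which were read from the search pages themselves.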
In order to break down some of these findings, we can examine both the literature and the data results and compare the prevalence of male versus female. All raw data and numbers are available in the attached appendix.
In most cases, the data labeled with either male or female are indeed data gathered from animals of that particular gender. However, this set also contains gene names and phenotypic descriptions that include the term male, such as “gene function required for the development of male germ cells” or the male-specific lethal gene. Currently we cannot easily exclude gene names from the search results, so the following data should be interpreted with some caution: not all of the results are specific to an organism that is male or an organism that is female.
Below are pie charts that visually represent the data collected for queries performed on NIF data records and literature (Figure 1). The results of these queries are separated into records and papers that returned male (only male with no mention of female in the paper or data record, blue color), female (only female with no mention of male, red color), and both male and female (green color). These charts suggest that data records in which a gender is recorded deal exclusively with males approximately 55% of the time, while the literature deals with males and females together approximately 45% of the time, favoring males alone slightly over females alone. For the full set of numbers corresponding to these charts, please see Figure 8 or the attached Excel spreadsheet.
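The three-way split used in these charts can be sketched as simple arithmetic. The post does not state exactly how the exclusive counts were obtained, so the sketch below assumes three counts are available: results mentioning male, results mentioning female, and results mentioning both. The function name and the example numbers are purely illustrative and are not NIF figures.

```python
def exclusive_shares(male_hits, female_hits, both_hits):
    """Split two overlapping counts into male-only, female-only and both.

    Assumes both_hits counts results that mention male AND female, so it is
    already included in male_hits and in female_hits.
    """
    if both_hits > min(male_hits, female_hits):
        raise ValueError("overlap cannot exceed either individual count")
    male_only = male_hits - both_hits
    female_only = female_hits - both_hits
    total = male_only + female_only + both_hits
    if total == 0:
        raise ValueError("no results to split")
    return {
        "male_only": male_only / total,
        "female_only": female_only / total,
        "both": both_hits / total,
    }


# Made-up counts for illustration only (not taken from NIF).
print(exclusive_shares(male_hits=60, female_hits=40, both_hits=20))
```

Each pie chart would then correspond to one such set of three shares, computed separately for data records and for literature.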
In Figures 2 and 3, we have extracted only the records and publications that deal with humans or animals, and a somewhat different picture emerges. Human data records follow the trend described above, with a slight male bias in the data records and literature. In the animal data, the bias toward studying males is quite strong. It appears that about half of the papers in the animal literature study males exclusively and the other half study either females or both males and females. Among individual data records, about one-sixth pertain exclusively to males, one-sixteenth pertain exclusively to females, and more than three-fourths pertain to both males and females. This implies that, in most cases, data records in animals cannot be reliably traced back to a particular gender.
Further analysis of the animal literature and data records, as shown in Figures 4 and 5, reveals that mouse researchers generally do not keep track of gender and rat researchers largely study males.
Breaking down the literature more granularly allows us to generate the following graphs (Figures 6 and 7). These graphs were generated by searching gopubmed.org. It is worth noting that, before open access literature, we had almost no data about the gender of subjects being studied.
Figure 6: Published papers from 1998-2012 that contain the keyword male
Figure 7: Published papers from 1998-2012 that contain the keyword female
Figure 8: Heat map of directed search queries into NIF’s data holdings broken down by source and query. Each box contains the number of results for each search executed in NIF; these numbers were used to construct the figures above. For a fully interactive heat map please see the attached Excel sheet. Clicking on the column names should lead to a general search of the NIF data; each row header is labeled with a database name and links to a descriptive page about that database. Clicking on all green cells should execute a search against a specific database.
(Go to the PDF to access a live, clickable heat map.)