Neglected 95% Dashboard

Waffle plot of journal paper percentages, by continent (each square = 1% of data)

Table of journal paper percentages, by continent

1 records

	Papers	North America	Europe	Asia	Oceania	Latin America	Africa	Missing*
1	13881	74.15	15.61	6.53	2.87	0.19	0.64	27.15

Representativity of First Authors in Psychology

A large proportion of first authors in psychology are located in North America or Europe, mostly in the US (Thalmayer et al., 2021, Arnett, 2008). In this dashboard, I present some aggregated data by continent, country, year, and journal (for first authors only).

* Percentages are calculated after excluding missing values. The Missing column shows the real percentage of missing values.

The data from this report include information about publications from one of six psychology journals (Developmental Psychology, Journal of Personality and Social Psychology, Journal of Abnormal Psychology, Journal of Family Psychology, Health Psychology, and Journal of Educational Psychology) for years 2008 to 2023. They include information about the articles (e.g., title, abstract) as well as on the authors, such as university of affiliation. I have obtained these data from PubMed using the PubMed API through the easyPubMed package. I have determined the country of the first author of each paper based on the affiliation address by matching the university name with a world university names database obtained from GitHub.

* Percentages are calculated after excluding missing values. The Missing column shows the real percentage of missing values.

Some of the papers were missing address information; in many cases, the PubMed API provided only the department and no university. It was not possible to identify the country in these cases (one would need to look at the actual papers one by one to make manual corrections). Furthermore, some university names from the data did not match the university name database obtained from GitHub. In some cases, I have brought manual corrections to university names in an attempt to reduce the number of missing values.

* Percentages are calculated after excluding missing values. The Missing column shows the real percentage of missing values.

Possible future steps include: (a) obtaining a better, more current university name database (that includes country of university), (b) making manual corrections for other research institutes not included in the university database, (c) host DT tables on a server to speed up the website and allow the inclusion of a DT table for exploring the raw data, and (d) find a way to use country flags for the countries-by-journal figure.

* Percentages are calculated after excluding missing values. The Missing column shows the real percentage of missing values.

Scatter plot of journal paper percentages, by continent and year

Table of journal paper percentages, by continent

Show entries

Search:

	Year	Papers	North America	Europe	Asia	Oceania	Latin America	Africa	Missing*
1	2023	295	55.25	21.69	14.58	3.39	1.36	3.73	30.91
2	2022	260	43.08	33.85	10.77	4.62	0.77	6.92	57.65
3	2021	544	56.07	26.65	9.56	3.31	0.74	3.68	35.85
4	2020	283	58.66	22.97	7.07	3.18	0	8.13	55.57
5	2019	418	79.19	11.72	6.46	2.39	0	0.24	47.42
6	2018	522	75.48	15.33	5.56	3.45	0	0.19	26.69
7	2017	480	74.58	17.08	4.79	3.54	0	0	32.68
8	2016	452	81.42	11.73	3.1	3.76	0	0	35.34
9	2015	365	74.52	17.81	4.11	3.29	0	0.27	37.5
10	2014	478	70.5	22.38	4.81	1.88	0.21	0.21	28.66

Showing 1 to 10 of 37 entries

Previous1 2 3 4Next

* Percentages are calculated after excluding missing values. The Missing column shows the real percentage of missing values.

Scatter plot of journal paper percentages, by continent and year

Table of journal paper percentages, by continent

Show entries

Search:

	Year	Papers	North America	Europe	Asia	Oceania	Latin America	Africa	Missing*
1	2023	295	55.25	21.69	14.58	3.39	1.36	3.73	30.91
2	2022	260	43.08	33.85	10.77	4.62	0.77	6.92	57.65
3	2021	544	56.07	26.65	9.56	3.31	0.74	3.68	35.85
4	2020	283	58.66	22.97	7.07	3.18	0	8.13	55.57
5	2019	418	79.19	11.72	6.46	2.39	0	0.24	47.42
6	2018	522	75.48	15.33	5.56	3.45	0	0.19	26.69
7	2017	480	74.58	17.08	4.79	3.54	0	0	32.68
8	2016	452	81.42	11.73	3.1	3.76	0	0	35.34
9	2015	365	74.52	17.81	4.11	3.29	0	0.27	37.5
10	2014	478	70.5	22.38	4.81	1.88	0.21	0.21	28.66

Showing 1 to 10 of 37 entries

Previous1 2 3 4Next

* Percentages are calculated after excluding missing values. The Missing column shows the real percentage of missing values.

Waffle plot of journal paper percentages, by continent and journal (each square = 1% of data)

Table of journal paper percentages, by continent and journal

Show entries

Search:

	Journal	Papers	North America	Europe	Asia	Oceania	Latin America	Africa	Missing*
1	Developmental Psychology	3079	68.56	19.78	7.79	2.99	0.26	0.62	20.19
2	Health Psychology	2722	76.89	15.32	3.89	3.23	0.18	0.48	19.56
3	Journal of Abnormal Psychology	2240	78.66	12.59	4.64	3.53	0.13	0.45	49.64
4	Journal of Educational Psychology	122	90.16	4.1	4.1	1.64	0	0	50.81
5	Journal of Family Psychology	1643	77.24	10.65	7.79	3.16	0.18	0.97	25.42
6	Journal of Personality and Social Psychology	4075	72.34	16.66	7.95	2.11	0.17	0.76	17.06

Showing 1 to 6 of 6 entries

Previous1Next

* Percentages are calculated after excluding missing values. The Missing column shows the real percentage of missing values.

Waffle plot of journal paper percentages, by country (each flag = 1% of data)

Table of journal paper percentages, by country

Show entries

Search:

	Country	Country Code	Papers	Percentage
1	Missing*		5152	27.04
2	United States	US	9398	67.7
3	Canada	CA	894	6.44
4	United Kingdom	GB	602	4.34
5	Netherlands	NL	442	3.18
6	Germany	DE	337	2.43
7	Australia	AU	303	2.18
8	Israel	IL	178	1.28
9	Switzerland	CH	145	1.04
10	United Arab Emirates	AE	144	1.04

Showing 1 to 10 of 87 entries

Previous1 2 3 4 5…9Next

* Percentages are calculated after excluding missing values. The Missing row shows the real percentage of missing values.

Scatter plot of journal paper percentages, by country and year

Table of journal paper percentages, by country and year

Show entries

Search:

	Year	Country	Papers	Percentage
1	2023	United States	142	48.14
2	2023	Missing*	132	30.91
3	2023	Canada	21	7.12
4	2023	United Kingdom	13	4.41
5	2023	Switzerland	11	3.73
6	2023	China	10	3.39
7	2023	Italy	8	2.71
8	2023	Australia	7	2.37
9	2023	Germany	6	2.03
10	2023	Hong Kong SAR China	6	2.03

Showing 1 to 10 of 985 entries

Previous1 2 3 4 5…99Next

* Percentages are calculated after excluding missing values. The Missing column shows the real percentage of missing values.

Waffle plot of journal paper percentages, by continent and journal (each square = 1% of data)

Table of journal paper percentages, by country and journal

Show entries

Search:

	Journal	Country	Papers	Percentage
1	Journal of Personality and Social Psychology	United States	2637	64.62
2	Journal of Personality and Social Psychology	Missing*	832	16.93
3	Journal of Personality and Social Psychology	Canada	311	7.62
4	Journal of Personality and Social Psychology	Netherlands	167	4.09
5	Journal of Personality and Social Psychology	Germany	161	3.95
6	Journal of Personality and Social Psychology	United Kingdom	117	2.87
7	Journal of Personality and Social Psychology	United Arab Emirates	90	2.21
8	Journal of Personality and Social Psychology	Israel	68	1.67
9	Journal of Personality and Social Psychology	Australia	57	1.4
10	Journal of Personality and Social Psychology	Philippines	56	1.37

Showing 1 to 10 of 297 entries

Previous1 2 3 4 5…30Next

* Percentages are calculated after excluding missing values. The Missing column shows the real percentage of missing values.

Figure 1, Proportion of American First Authors 1987-Today

This table allows investigating why the country/university could not be identified

Show entries

Search:

	journal	year	university_old	department	address	lastname	firstname	month	day	jabbrv	title	doi	pmid	abstract	date
1	Health Psychology	2017			École de psychologie, Université Laval.	Jarrin	Denise C	06	20	Health Psychol	Nocturnal heart rate variability in patients treated with cognitive-behavioral therapy for insomnia.	https://doi.org/10.1037/hea0000347	27054300	Insomnia and reduced heart rate variability (HRV) increase the risk of cardiovascular disease and its precursors; thus, it is important to evaluate whether treatment for insomnia provides cardiovascular safeguards. The present study aimed to evaluate potential cardiovascular benefits of cognitive behavioral therapy for insomnia (CBT-I). The present study included 65 patients treated for chronic insomnia (M = 51.8 years, SD = 10.0; 66.2% female) at a university hospital. Patients received CBT-I o	2017-06-20
2	Journal of Family Psychology	2022	107 Family Science Center II	107 Family Science Center II	107 Family Science Center II	O'Neal	Catherine Walker	01	31	J Fam Psychol	Latent profiles of postdeployment reintegration among service members and their partners.	https://doi.org/10.1037/fam0000894	34138591	Consistent with the emotional cycle of deployment, postdeployment reintegration is often a time of highs and lows as service members (SMs) and their families adjust to their new normal. However, few studies have considered the nuances of reintegration, specifically the various patterns of personal and family reintegration experiences that may exist. The present study uses latent profile analysis to identify unique reintegration patterns along four dimensions (i.e., positive personal, negative pe	2022-01-31
3	Developmental Psychology	2001			72073,1211@compuserve,com	Harris	J R	02	22	Dev Psychol	Socialization, personality development, and the child's environments: comment on Vandell (2000).	https://doi.org/10.1037//0012-1649.36.6.711	11081695	Although many socialization agents influence children's behavior (D. L. Vandell, 2000), the evidence (e.g., from intervention studies) indicates that each exerts its influence only within its own domain. Context effects and genetic effects are among the confounding factors that make it impossible, given current data, to reject the null hypothesis of zero long-term effects of parenting on child outcomes. Problems with the prevailing view of development cannot be solved by invoking within-home env	2001-02-22
4	Journal of Family Psychology	2019			Affirmations Psychological Services.	Hernandez-Kane	Krystal M	09	13	J Fam Psychol	Sex through a sacred lens: Longitudinal effects of sanctification of marital sexuality.	https://doi.org/10.1037/fam0000392	29723002	Research attending to the role of religion and spirituality in enhancing sexuality in marriage is virtually absent. In response to this scarcity, this longitudinal study examined the sanctification of marital sexuality among newly married, heterosexual individuals (N = 67; married 4-18 months at Time 1). Greater sanctification of marital sexuality early in the marriage predicted more frequent sexual intercourse, sexual satisfaction, and marital satisfaction 1 year later for individual respondent	2019-09-13
5	Health Psychology	2016	Annenberg School for Communication.	Annenberg School for Communication.	Annenberg School for Communication.	Zhang	Jingwen	07	20	Health Psychol	Mediation and moderation of an efficacious theory-based abstinence-only intervention for African American adolescents.	https://doi.org/10.1037/hea0000244	26214076	This secondary data analysis sought to determine what mediated reductions in self-reported sexual initiation over the 24-month postintervention period in early adolescents who received "Promoting Health among Teens," a theory-based, abstinence-only intervention (Jemmott, Jemmott, & Fong, 2010). African American Grade 6 and 7 students at inner-city public middle schools were randomized to 1 of 5 interventions grounded in social-cognitive theory and the theory of reasoned action: 8-hr abstinen	2016-07-20

Showing 1 to 5 of 5,152 entries

Previous1 2 3 4 5…1,031Next

---
title: "Neglected 95% Dashboard"
author: "Rémi Thériault"
output:
  flexdashboard::flex_dashboard:
    orientation: columns
    vertical_layout: fill
    social: menu
    source_code: embed
    # theme: lumen
    storyboard: false
params:
  API_TOKEN_PUBMED: API_TOKEN_PUBMED
---

```{r setup, include=FALSE}
library(flexdashboard)
query_pubmed <- FALSE
```

```{r API_TOKEN_PUBMED, eval=query_pubmed}
API_TOKEN_PUBMED <- keyring::key_get("pubmed", "rempsyc")
# API_TOKEN_PUBMED <- Sys.getenv("API_TOKEN_PUBMED")
# API_TOKEN_PUBMED <- params$API_TOKEN_PUBMED

if (API_TOKEN_PUBMED == "") stop("API_TOKEN_PUBMED is an empty string. Terminating workflow.")

if (nchar(API_TOKEN_PUBMED) != 36) stop("API_TOKEN_PUBMED is not 36 characters-long. Terminating workflow.")

```

```{r packages}
# Load packages
library(extrafont)
library(easyPubMed)
library(dplyr)
library(maps)
library(purrr)
library(stringr)
library(stringi)
library(DT)
library(fuzzyjoin)
library(countrycode)
library(Ecfun)
library(tidyr)
library(waffle)
library(plotly)
library(ggplot2)
library(ggflags)
library(knitr)
library(lubridate)
library(xts)
library(tibble)
library(dygraphs)
library(rempsyc)
library(RColorBrewer)
library(tools)
library(pubmedDashboard)

```

```{r batch_pubmed_download, results='hide', eval=query_pubmed}
# Download data
d.fls <- batch_pubmed_download(
  pubmed_query_string = paste(
    "Developmental Psychology [Journal]",
    "OR Journal of Personality and Social Psychology [Journal]",
    "OR Journal of Abnormal Psychology [Journal]",
    "OR Journal of Family Psychology [Journal]",
    "OR Health Psychology [Journal]",
    "OR Journal of Educational Psychology [Journal]",
    "AND ('2020/01/01'[Date - Publication] : '3000/12/31'[Date - Publication])" 
  ),
  dest_file_prefix = "data/easyPubMed_data_",
  api_key = API_TOKEN_PUBMED,
  batch_size = 5000)

```

```{r all_articles_to_df, eval=query_pubmed}
source("easypubmed2.R")

# Convert XLM data to a data frame of first authors
# articles.df <- table_articles_byAuth(d.fls, included_authors = "first")
all_articles_to_df <- function(d.fls){
  y <- lapply(seq_along(d.fls), function(x) {
    list.articles <- articles_to_list(d.fls[x])
    list.articles.df <- lapply(list.articles, article_to_df2)
    articles.df <- do.call(rbind, list.articles.df)
  })
  do.call(rbind, y)
}

articles.df <- all_articles_to_df(d.fls)

articles.df <- articles.df %>% 
  filter(!duplicated(pmid))

```

```{r address_split, message=FALSE, warning=FALSE, eval=query_pubmed}
# Split address in university and department
articles.df$address <- gsub(".", " ", articles.df$address, fixed = TRUE)
addr.split <- str_split(articles.df$address, ",")

split_address <- function(addr.split, string) {
  ind1 <- purrr::map(addr.split, ~which(grepl(string, .x)))
  ind1 <- imap(addr.split, \(x, idx) pluck(ind1[[idx]])[1])
  ind1 <- imap(addr.split, \(x, idx) pluck(x, ind1[[idx]]))
  ind1 <- trimws(as.character(ind1))
  ind1 <- replace(ind1, ind1 == "NULL", NA)
}

string.dep <- "Department|Departamento|Departament|Departement|Faculty|Center|School|Unit|Institute|Institut|Centre|Division|Unidad"
dep <- split_address(addr.split, string.dep)

string.uni <- "University"
uni <- split_address(addr.split, string.uni)

string.uni <- "University|Université|Universite|Universitat|Universität|Universiteit|College|School|Institute|Institut|Center|Centre|CEMIC, CONICET|CNRS|INSEAD"
uni2 <- split_address(addr.split, string.uni)
uni3 <- ifelse(!is.na(uni), uni, uni2)

articles.df <- articles.df %>%
  mutate(department = dep,
         university = uni3)

articles.df <- articles.df %>%
  select(journal, year, university, department, address, lastname, firstname,
         month, day, jabbrv, title, doi, pmid, abstract)
```

```{r correct_universities, eval=query_pubmed}
# Correct a few university names manually
articles.df <- articles.df %>%
  mutate(university = case_when(
    university == "Stony Brook University" ~ "University of Colorado at Colorado Springs",
    university == "Technion-Israel Institute of Technology" ~ "Technion - Israel Institute of Technology",
    university == "University of Montreal" ~ "Université de Montréal",
    university == "Philipps-University of Marburg" ~ "Phillips-Universität Marburg",
    university == "University of Wisconsin-Madison" ~ "University of Wisconsin - Madison",
    TRUE ~ university
  ))

```

```{r countries_data, eval=query_pubmed}
# Download university + country data
countries <- read.csv(
  "https://raw.githubusercontent.com/endSly/world-universities-csv/master/world-universities.csv",
  header = FALSE)
names(countries) <- c("country_code", "university", "website")
countries <- countries[1:2]

# Correct/add a few university countries manually
countries <- countries %>% 
  mutate(
    country_code = replace(country_code, 
                           university == "University of the Netherlands Antilles, Curacao", "CW"),
    country_code = replace(country_code, 
                           university == "University of Sint Eustatius School of Medicine", "SX"),
    country_code = replace(country_code, 
                           university == "St.James's School of Medicine, Bonaire" |
                             university == "American University of the Caribbean, Sint Maarten" |
                             university == "International University School of Medicine (IUSOM)", "BQ")) %>% 
  add_row(country_code = "AR", university = "CEMIC, CONICET") %>%
  add_row(country_code = "AU", university = "Institute for Positive Psychology and Education") %>%
  add_row(country_code = "AU", university = "Melbourne School of Psychological Sciences") %>%
  add_row(country_code = "AT", university = "University of Vienna") %>%
  add_row(country_code = "BE", university = "University of Liege") %>%
  add_row(country_code = "BR", university = "Institute D'Or for Research and Teaching") %>%
  add_row(country_code = "CA", university = "Montreal Behavioural Medicine Centre (MBMC)") %>%
  add_row(country_code = "CA", university = "University of Quebec") %>%
  add_row(country_code = "CA", university = "University of Montreal") %>%
  add_row(country_code = "CA", university = "Universite Laval") %>%
  add_row(country_code = "CH", university = "Jacobs Center for Productive Youth Development") %>%
  add_row(country_code = "CH", university = "University of Zurich") %>%
  add_row(country_code = "DE", university = "University of Tubingen") %>%
  add_row(country_code = "DE", university = "Max Planck Institute for Human Development") %>%
  add_row(country_code = "DE", university = "Max Planck Institute") %>%
  add_row(country_code = "DE", university = "Humboldt University") %>%
  add_row(country_code = "DE", university = "Heidelberg University") %>%
  add_row(country_code = "DE", university = "University of Cologne") %>%
  add_row(country_code = "DE", university = "University of Bonn") %>%
  add_row(country_code = "DE", university = "University of Münster") %>%
  add_row(country_code = "DE", university = "University of Tübingen") %>%
  add_row(country_code = "DE", university = "University of Göttingen") %>%
  add_row(country_code = "DE", university = "Heidelberg University") %>%
  add_row(country_code = "DE", university = "Leipzig University") %>%
  add_row(country_code = "FR", university = "CNRS") %>%
  add_row(country_code = "FR", university = "INSEAD") %>% 
  add_row(country_code = "GB", university = "Manchester Centre for Health Psychology") %>%
  add_row(country_code = "GB", university = "York St John University") %>%
  add_row(country_code = "GB", university = "Moray House School of Education and Sport") %>%
  add_row(country_code = "GB", university = "Cambridge ") %>%
  add_row(country_code = "KR", university = "SKK Graduate School of Business") %>%
  add_row(country_code = "KR", university = "Sungkyunkwan University") %>%
  add_row(country_code = "NL", university = "Vrije Universiteit Amsterdam") %>%
  add_row(country_code = "SG", university = "Yale-NUS College") %>%
  add_row(country_code = "SG", university = "Lee Kong Chian School of Business") %>%
  add_row(country_code = "US", university = "University of Colorado") %>%
  add_row(country_code = "US", university = "Stony Brook University") %>%
  add_row(country_code = "US", university = "Rutgers Business School-Newark and New Brunswick") %>%
  add_row(country_code = "US", university = "Columbia Business School") %>%
  add_row(country_code = "US", university = "Office of Population Research") %>%
  add_row(country_code = "US", university = "School of Family Life") %>%
  add_row(country_code = "US", university = "Tepper School of Business") %>%
  add_row(country_code = "US", university = "Stephen M. Ross School of Business") %>%
  add_row(country_code = "US", university = "Fuqua School of Business") %>%
  add_row(country_code = "US", university = "Jones Graduate School of Business") %>%
  add_row(country_code = "US", university = "Questrom School of Business") %>%
  add_row(country_code = "US", university = "Booth School of Business") %>%
  add_row(country_code = "US", university = "Annenberg School for Communication and Journalism") %>%
  add_row(country_code = "US", university = 
            "National Center for Posttraumatic Stress Disorder at VA Boston Healthcare System") %>%
  add_row(country_code = "US", university = "Cincinnati Children's Hospital Medical Center") %>%
  add_row(country_code = "US", university = "University of Wisconsin") %>%
  add_row(country_code = "US", university = "Stanford") %>%
  add_row(country_code = "VN", university = "SWPS University of Social Sciences and Humanities")

# Custom function for two-sided vlookup
partial_vlookup <- function(pattern, lookup_vector) {
  out <- map_chr(pattern, \(x) {
    out <- grep(x, lookup_vector, value = TRUE, fixed = TRUE)[1]
    if (is.na(out)) {
      out <- rgrep(lookup_vector, x, value = TRUE, fixed = TRUE)[1]
    }
    out
    })
  out
}

# Custom function for two-sided vlookup
partial_vlookup <- function(pattern, lookup_vector) {
  out <- map_chr(pattern, \(x) {
    out <- grep(x, lookup_vector, value = TRUE, fixed = TRUE)[1]
    if (is.na(out)) {
      out <- rgrep(lookup_vector, x, value = TRUE, fixed = TRUE)[1]
    }
    out
    })
  out
}

```

```{r partial_vlookup, eval=query_pubmed}
# Match universities and countries
articles.df2 <- articles.df %>%
  mutate(
    university_old = university,
    university = partial_vlookup(university, countries$university),
    university = ifelse(is.na(university), partial_vlookup(address, countries$university), university),
    .after = university)

articles.df3 <- articles.df2 %>%
  left_join(countries, by = "university", multiple = "first") %>% 
  relocate(country_code, .after = year)

```

```{r country_code_conversion, eval=query_pubmed}
list.countries <- c(unique(countryname_dict$country.name.en), "USA", "Korea", "UK", "Scotland")
cd <- get_dictionary("us_states")
data(world.cities)

get_country <- function(address, list.countries) {
  # Get country from countrycode
  country <- countrycode(address, "country.name", "country.name", warn = TRUE)
  if (is.na(country)) { # Get country from countrycode list of countries
    country <- rgrep(toTitleCase(tolower(list.countries)), 
                     toTitleCase(tolower(address)), 
                     value = TRUE, fixed = TRUE)[1]
    country <- case_match(country,
                          "Scotland" ~ "UK",
                          .default = country)
    if (is.na(country)) { # Get country from countrycode list of US states
      state <- rgrep(toTitleCase(tolower(cd$state.name)), 
                     toTitleCase(tolower(address)), 
                     value = TRUE, fixed = TRUE)[1]
      country <- ifelse(state %in% cd$state.name, "USA", NA)
      if (is.na(country)) { # Get country from list of US states (abbreviations)
        state <- rgrep(cd$state.abb, address, value = TRUE, fixed = TRUE)[1]
        country <- ifelse(state %in% cd$state.abb, "USA", NA)
        if (is.na(country)) { # Get country from city name
          state <- rgrep(world.cities$name, address, value = TRUE, fixed = TRUE)[1]
          country <- ifelse(is.na(state), state, world.cities[world.cities$name == state, "country.etc"])
        }
        }
      }
    if (!is.na(country)) {
      country <- countrycode(country, "country.name", "country.name", warn = TRUE)
    }
    }
  country
}

# Get full name country, continent, and region
articles.df4 <- articles.df3 %>% 
  rowwise() %>% 
  mutate(country = countrycode(country_code, "genc2c", "country.name"),
         country = ifelse(is.na(country), 
                          get_country(address, list.countries), 
                          country),
         country_code = ifelse(is.na(country_code), 
                               countrycode(country, "country.name", "genc2c"), 
                               country_code),
         region = countrycode(country_code, "genc2c", "un.regionsub.name"),
         continent = countrycode(country_code, "genc2c", "continent"),
         continent = case_when(continent == "Americas" ~ region,
                               TRUE ~ continent),
         doi = paste0("https://doi.org/", doi),
         .after = country_code) %>% 
  mutate(date = paste(year, month, day, sep = "-"),
         date = as_date(date)) %>% 
  ungroup()

saveRDS(articles.df4, "data/articles.df4.rds")
```

# Continent

## Column 1 {data-width=2150}

### Waffle plot of journal paper percentages, by continent (each square = 1% of data) {data-height=600}

```{r recover_existing_data, eval=!query_pubmed}
articles.df4 <- readRDS("data/articles.df4.rds")
```

```{r get_historic_data}
articles_2010_2019 <- readRDS("data/articles_2010_2019.rds")
articles_2000_2009 <- readRDS("data/articles_2000_2009.rds")
articles_1990_1999 <- readRDS("data/articles_1990_1999.rds")
articles_1980_1989 <- readRDS("data/articles_1980_1989.rds")
articles_1970_1979 <- readRDS("data/articles_1970_1979.rds")
articles_1960_1969 <- readRDS("data/articles_1960_1969.rds")
articles_1950_1959 <- readRDS("data/articles_1950_1959.rds")
articles_1940_1949 <- readRDS("data/articles_1940_1949.rds")

articles.df4_original <- articles.df4 %>% 
  bind_rows(articles_2010_2019, articles_2000_2009, articles_1990_1999,
            articles_1980_1989, articles_1970_1979, articles_1960_1969,
            articles_1950_1959, articles_1940_1949) %>% 
  #extract_duplicates("pmid")
  best_duplicate("pmid")

# We filter for year 1987 because there are almost no publications before that
# And remove european journal of health of health psychology because we did not 
# include it within the search
articles.df4 <- articles.df4_original %>%
  filter(year >= 1987,
         journal != "European journal of health psychology")

```

```{r continent_waffle_overall}
continent.order <- c("Northern America", "Europe", "Asia", "Oceania", "Latin America and the Caribbean", "Africa")
continent.order.short <- c("North America", "Europe", "Asia", "Oceania", "Latin America", "Africa")

articles.df4 <- articles.df4 %>% 
  mutate(continent = factor(continent, levels = continent.order),
         journal = gsub(":.*", "", journal),
         journal = toTitleCase(journal),
         journal = trimws(journal))

saveRDS(articles.df4, "data/fulldata.rds")

df.continent <- articles.df4 %>% 
  mutate(missing = sum(is.na(continent))/n()) %>% 
  filter(!is.na(continent)) %>% 
  summarize(Papers = n(),
            `North America` = sum(continent == "Northern America")/n(),
            Europe = sum(continent == "Europe")/n(),
            Asia = sum(continent == "Asia")/n(),
            Oceania = sum(continent == "Oceania")/n(),
            `Latin America` = sum(continent == "Latin America and the Caribbean")/n(),
            Africa = sum(continent == "Africa")/n(),
            Missing = first(missing),
            ) %>% 
  mutate(across(`North America`:Missing, ~ .x * 100)) # %>% 
  # mutate(across(`North America`:Missing, ~ round(.x, 2))) %>% 
  # rename_with(str_to_title) %>%
  # rename(`Missing*` = Missing) %>% 
  # datatable(#extensions = 'Responsive',
  #           options = list(searching = FALSE, paging = FALSE),
  #           caption = "Journal paper percentages, by continent")

# data.waffle <- set_names(as.numeric(t(df.continent[2:7])), names(df.continent[2:7]))

data.waffle <- articles.df4 %>% 
  mutate(missing = sum(is.na(continent))/n()) %>% 
  filter(!is.na(continent)) %>% 
  group_by(continent) %>% 
  add_count(name = "Papers") %>% 
  ungroup() %>% 
  mutate(nrow = n()) %>%
  count(continent, nrow, sort = TRUE, name = "Papers") %>% 
  mutate(continent = case_match(
    continent,
    continent.order[1] ~ continent.order.short[1],
    continent.order[5] ~ continent.order.short[5],
    continent ~ continent),
    Percentage = Papers / nrow * 100) %>% 
  select(-c(nrow, Papers)) %>% 
  rename_with(str_to_title, .cols = 1)

waffle(data.waffle, legend_pos = "right") +
  theme(legend.text = element_text(size = 15)) #+ # rows = 5, 
  #coord_cartesian(ylim=c(0,2))#%>% 
  #ggplotly()

```

### Table of journal paper percentages, by continent {data-height=200}

```{r, continent_table}
table1 <- df.continent  %>%
  mutate(across(`North America`:Missing, ~ round(.x, 2))) %>%
  rename_with(str_to_title) %>%
  rename(`Missing*` = Missing)

table1 %>%
  datatable(#extensions = 'Responsive',
            options = list(searching = FALSE, paging = FALSE),
            caption = "Journal paper percentages, by continent")

# saveRDS(table1, "DT_tables_host/data/table1.rds")
```

## Column 2 {.tabset .tabset-fade}

### Context

**Representativity of First Authors in Psychology**

A large proportion of first authors in psychology are located in North America or Europe, mostly in the US ([Thalmayer et al., 2021](https://psycnet.apa.org/doi/10.1037/amp0000622), [Arnett, 2008](https://doi.org/10.1037/0003-066x.63.7.602)). In this dashboard, I present some aggregated data by continent, country, year, and journal (for first authors only).

> \* Percentages are calculated after excluding missing values. The *Missing* column shows the real percentage of missing values.

### Method & Data

The data from this report include information about publications from one of six psychology journals (*Developmental Psychology*, *Journal of Personality and Social Psychology*, *Journal of Abnormal Psychology*, *Journal of Family Psychology*, *Health Psychology*, and *Journal of Educational Psychology*) for years 2008 to 2023. They include information about the articles (e.g., title, abstract) as well as on the authors, such as university of affiliation. I have obtained these data from PubMed using the PubMed API through the `easyPubMed` package. I have determined the country of the first author of each paper based on the affiliation address by matching the university name with a world university names database obtained from GitHub.

> \* Percentages are calculated after excluding missing values. The *Missing* column shows the real percentage of missing values.

### Missing data

Some of the papers were missing address information; in many cases, the PubMed API provided only the department and no university. It was not possible to identify the country in these cases (one would need to look at the actual papers one by one to make manual corrections). Furthermore, some university names from the data did not match the university name database obtained from GitHub. In some cases, I have brought manual corrections to university names in an attempt to reduce the number of missing values.

> \* Percentages are calculated after excluding missing values. The *Missing* column shows the real percentage of missing values.

### Next Steps

Possible future steps include: (a) obtaining a better, more current university name database (that includes country of university), (b) making manual corrections for other research institutes not included in the university database, (c) host DT tables on a server to speed up the website and allow the inclusion of a DT table for exploring the raw data, and (d) find a way to use country flags for the countries-by-journal figure.

> \* Percentages are calculated after excluding missing values. The *Missing* column shows the real percentage of missing values.

# Continent, by Year (lm)

## Column 1 {data-width=800}

### Scatter plot of journal paper percentages, by continent and year {data-height=600}

```{r, continent_scatter_overall}
df.continent2 <- articles.df4 %>% 
  mutate(missing = sum(is.na(continent))/n()) %>% 
  filter(!is.na(continent)) %>% 
  group_by(year) %>% 
  summarize(`North America` = sum(continent == "Northern America")/n(),
            Europe = sum(continent == "Europe")/n(),
            Asia = sum(continent == "Asia")/n(),
            Oceania = sum(continent == "Oceania")/n(),
            `Latin America` = sum(continent == "Latin America and the Caribbean")/n(),
            Africa = sum(continent == "Africa")/n(),
            ) %>% 
  mutate(across(2:6, ~ .x * 100)) %>% 
  arrange(year)

df.continent3 <- df.continent2 %>%
  pivot_longer(-year, names_to = "continent", values_to = "papers_percentage") %>%
  mutate(year = as.numeric(year), continent = factor(continent, levels = continent.order.short),
         papers_percentage = round(papers_percentage))

colors <- suppressWarnings(RColorBrewer::brewer.pal(length(unique(df.continent3$continent)), "Set2"))

nice_scatter(df.continent3,
             predictor = "year", 
             response = "papers_percentage", 
             group = "continent", 
             colours = colors,
             method = "lm",
             groups.order = "decreasing",
             ytitle = "% of All Papers") %>% 
  ggplotly(tooltip = c("x", "y"))

```

## Column 2

### Table of journal paper percentages, by continent {data-height=200}

```{r, continent_table_journal_year}
continent.paper.missing <- articles.df4 %>% 
  group_by(year) %>% 
  summarize(Missing = sum(is.na(continent))/n()) %>% 
  pull(Missing)

table2 <- articles.df4 %>% 
  mutate(missing = sum(is.na(continent))/n()) %>% 
  filter(!is.na(continent)) %>% 
  group_by(year) %>% 
  summarize(Papers = n(),
            `North America` = sum(continent == "Northern America")/n(),
            Europe = sum(continent == "Europe")/n(),
            Asia = sum(continent == "Asia")/n(),
            Oceania = sum(continent == "Oceania")/n(),
            `Latin America` = sum(continent == "Latin America and the Caribbean")/n(),
            Africa = sum(continent == "Africa")/n(),
            `Missing*` = first(missing),
            ) %>% 
  mutate(`Missing*` = continent.paper.missing, # [-1]
         across(`North America`:`Missing*`, ~ round(.x * 100, 2))) %>% 
  arrange(desc(year)) %>% 
  rename_with(str_to_title) %>% 
  datatable(caption = "Journal paper percentages, by continent and year")

table2

# saveRDS(table2, "data/table2.rds")
```

> \* Percentages are calculated after excluding missing values. The *Missing* column shows the real percentage of missing values.

# Continent, by Year (loess)

## Column 1 {data-width=800}

### Scatter plot of journal paper percentages, by continent and year {data-height=600}

```{r, continent_scatter_overall_loess}
df.continent2 <- articles.df4 %>% 
  mutate(missing = sum(is.na(continent))/n()) %>% 
  filter(!is.na(continent)) %>% 
  group_by(year) %>% 
  summarize(`North America` = sum(continent == "Northern America")/n(),
            Europe = sum(continent == "Europe")/n(),
            Asia = sum(continent == "Asia")/n(),
            Oceania = sum(continent == "Oceania")/n(),
            `Latin America` = sum(continent == "Latin America and the Caribbean")/n(),
            Africa = sum(continent == "Africa")/n(),
            ) %>% 
  mutate(across(2:6, ~ .x * 100)) %>% 
  arrange(year)

df.continent3 <- df.continent2 %>%
  pivot_longer(-year, names_to = "continent", values_to = "papers_percentage") %>%
  mutate(year = as.numeric(year), continent = factor(continent, levels = continent.order.short),
         papers_percentage = round(papers_percentage))

colors <- suppressWarnings(RColorBrewer::brewer.pal(length(unique(df.continent3$continent)), "Set2"))

nice_scatter(df.continent3,
             predictor = "year", 
             response = "papers_percentage", 
             group = "continent", 
             colours = colors,
             method = "loess",
             groups.order = "decreasing",
             ytitle = "% of All Papers") %>% 
  ggplotly(tooltip = c("x", "y"))

```

## Column 2

### Table of journal paper percentages, by continent {data-height=200}

```{r, continent_table_journal_year_loess}
continent.paper.missing <- articles.df4 %>% 
  group_by(year) %>% 
  summarize(Missing = sum(is.na(continent))/n()) %>% 
  pull(Missing)

table3 <- articles.df4 %>% 
  mutate(missing = sum(is.na(continent))/n()) %>% 
  filter(!is.na(continent)) %>% 
  group_by(year) %>% 
  summarize(Papers = n(),
            `North America` = sum(continent == "Northern America")/n(),
            Europe = sum(continent == "Europe")/n(),
            Asia = sum(continent == "Asia")/n(),
            Oceania = sum(continent == "Oceania")/n(),
            `Latin America` = sum(continent == "Latin America and the Caribbean")/n(),
            Africa = sum(continent == "Africa")/n(),
            `Missing*` = first(missing),
            ) %>% 
  mutate(`Missing*` = continent.paper.missing, # [-1]
         across(`North America`:`Missing*`, ~ round(.x * 100, 2))) %>% 
  arrange(desc(year)) %>% 
  rename_with(str_to_title) %>% 
  datatable(caption = "Journal paper percentages, by continent and year")

table3

# saveRDS(table2, "data/table3.rds")
```

> \* Percentages are calculated after excluding missing values. The *Missing* column shows the real percentage of missing values.

# Continent, by Journal

## Column 1 {data-width=700}

### Waffle plot of journal paper percentages, by continent and journal (each square = 1% of data) {data-height=600}

```{r continent_table_journal_figure}
df.continent5 <- articles.df4 %>% 
  mutate(missing = sum(is.na(continent))/n()) %>% 
  filter(!is.na(continent)) %>% 
  group_by(journal) %>% 
  summarize(`North America` = sum(continent == "Northern America")/n(),
            Europe = sum(continent == "Europe")/n(),
            Asia = sum(continent == "Asia")/n(),
            Oceania = sum(continent == "Oceania")/n(),
            `Latin America` = sum(continent == "Latin America and the Caribbean")/n(),
            Africa = sum(continent == "Africa")/n(),
            ) %>% 
  mutate(across(2:6, ~ .x * 100)) %>% 
  arrange(journal)

df.continent6 <- df.continent5 %>%
  pivot_longer(-journal, names_to = "continent", values_to = "number") %>%
  mutate(continent = factor(continent, levels = continent.order.short)) 
  
df.continent6 %>%
  ggplot(aes(fill = continent, values = number)) +
  geom_waffle(color = "white", size = 0.8, na.rm = TRUE) +
  facet_wrap(~journal) +
  scale_y_continuous(expand=c(0,0)) +
  coord_equal() + 
  theme_minimal(base_size=6) +
  theme(panel.grid = element_blank(), 
        axis.text = element_blank(), 
        plot.title=element_text(hjust=0.5),
        legend.title = element_blank(),
        legend.text = element_text(size = 5)) +
   ggplot2::scale_fill_manual(values = colors)

```

## Column 2

### Table of journal paper percentages, by continent and journal {data-height=200}

```{r continent_table_journal}
journal.paper.missing <- articles.df4 %>% 
  group_by(journal) %>% 
  summarize(Missing = sum(is.na(continent))/n()) %>% 
  pull(Missing)

articles.df4 %>% 
  mutate(missing = sum(is.na(continent))/n()) %>% 
  filter(!is.na(continent)) %>% 
  group_by(journal) %>% 
  summarize(Papers = n(),
            `North America` = sum(continent == "Northern America")/n(),
            Europe = sum(continent == "Europe")/n(),
            Asia = sum(continent == "Asia")/n(),
            Oceania = sum(continent == "Oceania")/n(),
            `Latin America` = sum(continent == "Latin America and the Caribbean")/n(),
            Africa = sum(continent == "Africa")/n(),
            `Missing*` = first(missing),
            ) %>% 
  mutate(`Missing*` = journal.paper.missing,
         across(`North America`:`Missing*`, ~ round(.x * 100, 2))) %>% 
  rename_with(str_to_title) %>% 
  datatable(caption = "Journal paper percentages, by continent and journal")

```

> \* Percentages are calculated after excluding missing values. The *Missing* column shows the real percentage of missing values.

# Country

## Column 1 {data-width=800}

### Waffle plot of journal paper percentages, by country (each flag = 1% of data)

```{r country_table_overall, fig.width=4.5, fig.height=4.5}
df.country <- articles.df4 %>% 
  mutate(missing = sum(is.na(country_code))/n()) %>% 
  filter(!is.na(country_code)) %>% 
  mutate(nrow = n()) %>% 
  count(country, country_code, nrow, sort = TRUE, name = "Papers") %>% 
  mutate(Percentage = Papers / nrow) %>% 
  select(-nrow) %>% 
  add_row(country = "Missing*",
          Papers = sum(is.na(articles.df4$country)),
          Percentage = sum(is.na(articles.df4$country)) / nrow(articles.df4),
          .before = 1) %>% 
  rename_with(str_to_title)

df.country3 <- articles.df4 %>% 
  mutate(missing = sum(is.na(country))/n()) %>% 
  filter(!is.na(country)) %>% 
  group_by(country) %>% 
  add_count(name = "Papers") %>% 
  mutate(Percentage = Papers / nrow(.),
         country = case_when(
           Percentage < 0.02 ~ "Other",
           TRUE ~ country)
         ) %>%
  ungroup() %>% 
  mutate(nrow = n()) %>%
  count(country, nrow, sort = TRUE, name = "Papers") %>% 
  mutate(Percentage = Papers / nrow * 100) %>% 
  select(-c(nrow, Papers)) %>% 
  rename_with(str_to_title, .cols = 1)

colours.country <- colorRampPalette(colors)(length(df.country3$Country))

# waffle(df.country3, legend_pos = "right", colors = colours.country) +
#   theme(legend.text = element_text(size = 15))

waffle_flags <- function(in_map_var, len_x = NA, na_flag = "ac"){
  in_map_var <- data.frame(country = in_map_var)
  my_prop <- in_map_var %>% 
    count(country, sort = TRUE) %>% 
    mutate(n2 = round(n / nrow(in_map_var) * 100))
  in_map_var <- lapply(seq_len(nrow(my_prop)), \(x) {
    rep(my_prop$country[x], my_prop$n2[x])}) %>% 
    unlist
  # work out grid dimensions  
  var_count <- length(in_map_var)
  if(is.na(len_x)){
    x_count <- ceiling(sqrt(var_count))
  } else {
    x_count <- len_x
  }
  y_count <- ceiling(var_count / x_count)
  #y_count <- 10
  grid_count <- x_count * y_count
  df <- 
    data.frame(x = rep(1:y_count, each = x_count),
               y = rep(1:x_count, y_count),
               country = c(in_map_var, rep(na_flag, grid_count - var_count))
    )
  country_4legend <- unique(df$country)[unique(df$country) != na_flag]
  p <- 
    ggplot(df, aes(x, y, country = country)) + 
    geom_flag(size = 8.5) +
    scale_country(breaks = country_4legend) +
    theme_void() +
    coord_equal() +
    theme(legend.position = "right")
  if(grid_count > var_count){
    p <- 
      p +
      geom_point(data = df[var_count:grid_count, ], aes(x, y), colour = "white", size = 10)
  }
  return(p)
}

my_prop <- df.country3 %>% 
  mutate(Country = countrycode(Country, "country.name", "genc2c"),
         Country = tolower(Country)) %>% 
  filter(!is.na(Country))
in_map_var <- lapply(seq_len(nrow(my_prop)), \(x) {
  rep(my_prop$Country[x], my_prop$Percentage[x])}) %>% 
  unlist
waffle_flags(in_map_var)

```

## Column 2

### Table of journal paper percentages, by country {data-height=200}

```{r country_table_journal}
df.country %>% 
  mutate(Percentage = round(Percentage * 100, 2)) %>% 
  rename(`Country Code` = Country_code) %>% 
  datatable(caption = "Journal paper percentages, by country")

```

> \* Percentages are calculated after excluding missing values. The *Missing* row shows the real percentage of missing values.

# Country, by Year

## Column 1 {data-width=1000}

### Scatter plot of journal paper percentages, by country and year

```{r, country_series_year}
df.country.year.papers <- articles.df4 %>% 
  filter(!is.na(country)) %>% 
  count(year, name = "Papers") %>% 
  arrange(desc(year), desc(Papers))

get_year_papers <- function(year) {
  df.country.year.papers[which(
    df.country.year.papers$year == year), "Papers"]
}

df.country.year.missing <- articles.df4 %>% 
  filter(is.na(country)) %>% 
  group_by(year) %>% 
  count(year, name = "Papers") %>% 
  arrange(desc(year), desc(Papers)) %>% 
  left_join(by = "year", articles.df4 %>%
              group_by(year) %>% 
              count(year, name = "all_papers") %>% 
              arrange(desc(year))) %>% 
  mutate(percentage = round(Papers / all_papers * 100, 2),
         country = "Missing*") %>% 
  select(-all_papers)

df.country.year <- articles.df4 %>% 
  group_by(year, country) %>% 
  filter(!is.na(continent)) %>% 
  count(name = "Papers") %>% 
  mutate(percentage = as.numeric(round(Papers / get_year_papers(year) * 100, 2)))

df.country.year2 <- df.country.year %>% 
  ungroup() %>% 
  add_row(df.country.year.missing) %>% 
  arrange(desc(year), desc(Papers))
  
getPalette = colorRampPalette(brewer.pal(8, "Set2"))
colours.country2 <- getPalette(length(unique(df.country.year$country)))
df.country.year %>%
  mutate(year = as.numeric(year),
         country = as.factor(country)) %>%
  nice_scatter(
             predictor = "year",
             response = "percentage",
             group = "country",
             colours = colours.country2,
             method = "lm",
             groups.order = "decreasing",
             ytitle = "% of All Papers") %>%
  ggplotly(tooltip = c("x", "y"))

# Include flags on scatter plot
# Note: doesn't work with ggplotly it seems
# library(ggflags)
# df.country.year %>%
#   mutate(year = as.numeric(year),
#          country = countrycode(country, "country.name", "genc2c"),
#          country = tolower(country),
#          country = as.factor(country)) %>%
#   nice_scatter(
#              predictor = "year",
#              response = "percentage",
#              group = "country",
#              colours = colors,
#              method = "lm",
#              groups.order = "decreasing",
#              ytitle = "% of All Papers") + 
#   geom_flag(aes(country = country)) +#%>%
#   scale_country(aes(country = country)) #%>%
  #ggplotly(tooltip = c("x", "y"))

# Time series dygraph
# q <- df.country.year %>%
#   ungroup() %>%
#   select(year, country, percentage) %>%
#   mutate(year = as.Date(year, "%Y")) %>%
#   pivot_wider(names_from = country, values_from = percentage) %>%
#   as.xts
# 
# dygraph(q) %>%
#   dyRangeSelector() %>%
#   dyUnzoom() %>%
#   dyCrosshair(direction = "vertical") %>% 
#   dyOptions(strokeWidth = 3)

```

## Column 2

### Table of journal paper percentages, by country and year {data-height=200}

```{r, country_table_year}
df.country.year2 %>% 
  rename_with(str_to_title) %>% 
  datatable(caption = "Journal paper percentages, by country and year")

```

> \* Percentages are calculated after excluding missing values. The *Missing* column shows the real percentage of missing values.

# Country, by Journal

## Column 1 {data-width=800}

### Waffle plot of journal paper percentages, by continent and journal (each square = 1% of data) {data-height=600}

```{r country_table_journal_figure}
# df.country.journal.missing <- articles.df4 %>% 
#   filter(!is.na(country)) %>% 
#   count(journal, country, name = "Papers") %>% 
#   arrange(desc(journal), desc(Papers))
# 
# get_journal_papers <- function(journal, country) {
#   df.country.journal.missing[which(
#     df.country.journal.missing$journal == journal &
#       df.country.journal.missing$country == country), "Papers"]
# }

df.country.journal.missing2 <- articles.df4 %>% 
  filter(!is.na(country)) %>% 
  count(journal, name = "Papers") %>% 
  arrange(desc(journal), desc(Papers))

get_journal_papers2 <- function(journal) {
  df.country.journal.missing2[which(
    df.country.journal.missing2$journal == journal), "Papers"]
}

df.country.journal <- articles.df4 %>% 
  filter(!is.na(country)) %>% 
  group_by(journal, country) %>% 
  add_count(name = "Papers") %>% 
  mutate(percentage = Papers / nrow(.),
         country = case_when(
           percentage < 0.01 ~ "Other",
           TRUE ~ country)
         ) %>%
  ungroup() %>% 
  count(journal, country, sort = TRUE, name = "Papers") %>% 
  rowwise %>% 
  mutate(percentage = as.numeric(round(Papers / get_journal_papers2(journal) * 100, 2)),
         country = as.factor(country)) %>% 
  arrange(desc(journal), desc(Papers))

colours.country2 <- colorRampPalette(colors)(length(unique(df.country.journal$country)))

df.country.journal %>%
  ggplot(aes(fill = country, values = percentage)) +
  geom_waffle(color = "white", size = 0.8, na.rm = TRUE) +
  facet_wrap(~journal) +
  scale_y_continuous(expand=c(0,0)) +
  coord_equal() + 
  theme_minimal(base_size=6) +
  theme(panel.grid = element_blank(), 
        axis.text = element_blank(), 
        plot.title=element_text(hjust=0.5),
        legend.title = element_blank(),
        legend.text = element_text(size = 5)) +
   ggplot2::scale_fill_manual(values = colours.country2)

```

## Column 2

### Table of journal paper percentages, by country and journal {data-height=200}

```{r country_table_journal2}
df.country.journal2 <- articles.df4 %>% 
  group_by(journal, country) %>% 
  filter(!is.na(country)) %>% 
  count(name = "Papers") %>% 
  mutate(percentage = as.numeric(round(Papers / get_journal_papers2(journal) * 100, 2))) %>% 
  arrange(desc(journal), desc(Papers))

# Note: there is no missing values for variable "journal"
# any(is.na(articles.df4$journal))

df.country.journal.missing3 <- articles.df4 %>% 
  filter(is.na(country)) %>% 
  group_by(journal) %>% 
  count(journal, name = "Papers") %>% 
  arrange(desc(journal), desc(Papers)) %>% 
  left_join(by = "journal", articles.df4 %>%
              group_by(journal) %>% 
              count(journal, name = "all_papers") %>% 
              arrange(desc(journal))) %>% 
  mutate(percentage = as.numeric(round(Papers / all_papers * 100, 2)),
         country = "Missing*") %>% 
  select(-all_papers)

df.country.journal2 %>% 
  ungroup() %>% 
  add_row(df.country.journal.missing3) %>% 
  arrange(desc(journal), desc(Papers)) %>% 
  rename_with(str_to_title) %>% 
  datatable(caption = "Journal paper percentages, by country and journal")

```

> \* Percentages are calculated after excluding missing values. The *Missing* column shows the real percentage of missing values.


# Figure 1

## Column 1

### Figure 1, Proportion of American First Authors 1987-Today

```{r, fig1}
df.journal.year <- articles.df4 %>% 
  filter(!is.na(country)) %>% 
  group_by(year, journal) %>% 
  count

journal.order <- c("DP", "JPSP", "JAP", "JFP", "HP", "JEP")

df.us.journal.year <- articles.df4 %>% 
  filter(!is.na(country),
         country_code == "US") %>% 
  group_by(year, journal) %>% 
  count(name = "n_us") %>% 
  full_join(df.journal.year, by = join_by(year, journal)) %>% 
  mutate(percentage_american = round(n_us / n * 100, 2),
         year = as.numeric(year),
         journal = tolower(journal),
         journal = case_match(
           journal,
           "developmental psychology" ~ "DP",
           "journal of personality and social psychology" ~ "JPSP",
           "journal of abnormal psychology" ~ "JAP",
           "journal of family psychology" ~ "JFP",
           "health psychology" ~ "HP", 
           "journal of educational psychology" ~ "JEP"),
         journal = factor(journal, levels = journal.order)) %>% 
  ungroup

df.us.journal.year_temp <- df.us.journal.year %>% 
  group_by(year) %>% 
  summarize(journal = "Total",
            n_us = sum(n_us),
            n = sum(n),
            percentage_american = round(n_us / n * 100, 2))

XData <- rbind(df.us.journal.year, df.us.journal.year_temp)

# fig1 <- nice_scatter(XData,
#              predictor = "year", 
#              response = "percentage_american", 
#              group = "journal",
#              method = "lm",
#              ytitle = "% American First Authors")

fig1 <- nice_scatter(XData,
             predictor = "year",
             response = "percentage_american",
             group = "journal",
             ymin = 30,
             ymax = 100,
             yby = 10,
             #colours = rep("black", 6),
             #has.points = FALSE,
             has.linetype = TRUE,
             has.shape = TRUE,
             has.line = FALSE,
             alpha = 1,
             method = "lm",
             ytitle = "% American First Authors") +
  geom_line() +
  geom_point(fill = "black", aes(size = journal)) +
  scale_colour_manual(values = rep("black", 7), name = "") +
  scale_shape_manual(values = c(18, 17, 17, 23, 19, 20, 15), name = "") +
  scale_size_manual(values = c(2, 2, 4, 4, 4, 2, 4), name = "") +
  ggplot2::labs(fill = "", linetype = "") +
  theme(legend.title=element_blank()) +
  theme(panel.grid.major.y = element_line(colour = "black", linewidth = 0.5))

fig1 <- ggplotly(fig1, tooltip = c("x", "y"))

for (i in 1:length(fig1$x$data)){
    if (!is.null(fig1$x$data[[i]]$name)){
        fig1$x$data[[i]]$name =  gsub("\\(","",str_split(fig1$x$data[[i]]$name,",")[[1]][1])
    }
}
fig1

```
  
# Missing Data

## Column 1 {data-width=700}

### This table allows investigating why the country/university could not be identified

```{r missing_universities, warning=FALSE}
articles.df4 %>% 
  filter(is.na(country)) %>% 
  select(-c(country_code:region)) %>% 
  arrange(address) %>% 
  mutate(doi = paste0("<a href='", doi,"' target='_blank'>", doi, "</a>")) %>%
  datatable(
    extensions = 'Responsive',
    options = list(iDisplayLength = 5),
    caption = "Journal paper metadata",
    escape = FALSE
    )
```

## Column 2