ISSS608 Visual Analytics Take-home Exercise 3
Distill is a publication format for scientific and technical writing, native to the web.
Learn more about using Distill for R Markdown at https://rstudio.github.io/distill.
This take-home exercise aims to reveal the economic of the city of Engagement, Ohio USA by using appropriate static and interactive statistical graphics methods.
The main question to be answered is displayed as below:
- Over time, are businesses growing or shrinking?
- How are people changing jobs?
- Are standards of living improving or declining over time?
Besides, the 2nd of the 3 questions will be addressed with the consideration of financial status of Engagement’s businesses and residents and application of visual analytic techniques.
- How does the financial health of the residents change over the
period covered by the dataset?
- How do wages compare to the overall cost of living in Engagement?
- Are there groups that appear to exhibit similar patterns?
packages = c('ggiraph', 'plotly',
'DT', 'patchwork',
'gganimate', 'tidyverse',
'readxl', 'gifski', 'gapminder',
'treemap', 'treemapify',
'rPackedBar','lubridate','ViSiElse','zoo')
for (p in packages){
if(!require(p, character.only = T)){
install.packages(p)
}
library(p,character.only = T)
}
The code chunk below import Participants.csv and
FinancialJournal.csv from the data folder into R by using read_csv()
of readr and
save it as tibble dataframes respectively called participants
and finjournal.
finjournal <- read_csv('Data/FinancialJournal.csv')%>%
mutate(yearmonth = as.yearmon(timestamp))
participants <- read_csv('Data/Participants.csv')
According to Investopedia,the
measurement financial health includes the amount of
savings as well as the spending on fixed or
non-discretionary expenses.
In this case, a new table called personal_monthly_savings is established by summing up the financial amounts group by participantId and year-month.
In addition, another table called personal_categorized_consumption is built by summing up the financial amounts group by participantId and category.
grp2 <- c('participantId','category')
personal_categorized_consumption <- finjournal[, c('participantId','category', 'amount')]%>%
group_by(across(all_of(grp2))) %>%
summarize(cashflow = sum(amount))
personal_categorized_consumption1 <- merge(personal_categorized_consumption,participants,by="participantId")
From the interactive barchart below we can observe that the average savings per month for all residents mainly varies from $2000 to $3000, except for March 2022, when the figure exceed $4000. There is no significant increase or decrease over 15 months, while the average savings for the recent 3 month are steadily decreasing, pointing out a potential threat towards their financial health.
tooltip <- function(y, ymax, accuracy = .01) {
mean <- scales::number(y, accuracy = accuracy)
sem <- scales::number(ymax - y, accuracy = accuracy)
paste("Mean monthly savings:", mean, "+/-", sem)
}
gg_point <- ggplot(data=personal_monthly_savings,
aes(x = yearmonth),
) +
stat_summary(aes(y = monthly_savings,
tooltip = after_stat(
tooltip(y, ymax))),
fun.data = "mean_se",
geom = GeomInteractiveCol,
fill = "light blue"
) +
stat_summary(aes(y = monthly_savings),
fun.data = mean_se,
geom = "errorbar", width = 0.05, size = 0.1
)
girafe(ggobj = gg_point,
width_svg = 10,
height_svg = 10*0.618)
Looking at the monthly savings for different education level, it is apparent that residents with higher education level (bachelor & graduate degree) tend to show significantly higher savings than the other groups. The overtime patter across all 4 groups are almost the same, while as the education level become higher, the average figures increase as well.
tooltip <- function(y, ymax, accuracy = .01) {
mean <- scales::number(y, accuracy = accuracy)
sem <- scales::number(ymax - y, accuracy = accuracy)
paste("Mean monthly savings:", mean, "+/-", sem)
}
gg_point <- ggplot(data=personal_monthly_savings1,
aes(x = yearmonth),
) +
stat_summary(aes(y = monthly_savings,
tooltip = after_stat(
tooltip(y, ymax))),
fun.data = "mean_se",
geom = GeomInteractiveCol,
fill = "light blue"
) +
stat_summary(aes(y = monthly_savings),
fun.data = mean_se,
geom = "errorbar", width = 0.05, size = 0.1
)+
facet_wrap(~ educationLevel,nrow = 2)
girafe(ggobj = gg_point,
width_svg = 10,
height_svg = 10*0.618)
From the interactive bar-chart below, is is observed that there is hardly differences in allocation of expenditure across different education level. Cost for shelter takes up the largest proportion, followed by recreation, food and education. Since shelter, as a fixed/non-discretionary expense accounts for the most, there is concern of their financial health, especially for the residents with lower education level, for their lower wages.
tooltip <- function(y, ymax, accuracy = .01) {
mean <- scales::number(y, accuracy = accuracy)
sem <- scales::number(ymax - y, accuracy = accuracy)
paste("Mean consumption:", mean, "+/-", sem)
}
gg_point <- ggplot(data=personal_categorized_consumption1,
aes(x = category),
) +
stat_summary(aes(y = cashflow,fill = category,
tooltip = after_stat(
tooltip(y, ymax))),
fun.data = "mean_se",
geom = GeomInteractiveCol,
#fill = "light blue"
) +
stat_summary(aes(y = cashflow),
fun.data = mean_se,
geom = "errorbar", width = 0.6, size = 0.3
)+
facet_wrap(~ educationLevel,nrow = 2)
girafe(ggobj = gg_point,
width_svg = 10,
height_svg = 10*0.618)
The graph below illustrates the expenses for residents having kids and without kids. From the graph it is noticed that residents without kids don’t have any educational expenditure.And those who have kids tend to have a higher monthly cost for shelter.
personal_categorized_consumption2 <- merge(personal_categorized_consumption %>%filter(category != 'Wage'),participants,by="participantId")
tooltip <- function(y, ymax, accuracy = .01) {
mean <- scales::number(y, accuracy = accuracy)
sem <- scales::number(ymax - y, accuracy = accuracy)
paste("Mean consumption:", mean, "+/-", sem)
}
gg_point <- ggplot(data=personal_categorized_consumption2,
aes(x = category),
) +
stat_summary(aes(y = cashflow,fill = category,
tooltip = after_stat(
tooltip(y, ymax))),
fun.data = "mean_se",
geom = GeomInteractiveCol,
#fill = "light blue"
) +
stat_summary(aes(y = cashflow),
fun.data = mean_se,
geom = "errorbar", width = 0.6, size = 0.3
)+
facet_wrap(~ haveKids,nrow = 2)
girafe(ggobj = gg_point,
width_svg = 10,
height_svg = 10*0.618)