Since I began my little reading project, I’ve wanted to create a helpful visualization to track the pace of my reading. I started with a chart plugin but was quickly disappointed by what free WordPress plugins have to offer: basically a bar chart by month. Boo.
So I turned to R and, after looking at some visualizations, decided to make a dumbbell chart. I quickly discovered that I don’t know how to use ggplot properly: I struggled to coerce the data into a shape ggplot would accept, and to put the points and lines together the way I wanted.
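For anyone stuck at the same spot: one common way to get data into a shape ggplot is happy with for a dumbbell is to go from one row per book (start and end in separate columns) to one row per endpoint, so that a single line geom can join each pair. This is a minimal sketch, assuming tidyr’s `pivot_longer()` and a couple of made-up rows; it isn’t what the final code below does, which keeps the wide shape and draws each bar with `geom_segment()`.

```r
library(ggplot2)
library(tidyr)

# Made-up rows (wide shape): one book per row, start and end in separate columns
wide <- data.frame(
  book  = c("Book A", "Book B"),
  start = as.Date(c("2023-01-02", "2023-01-20")),
  end   = as.Date(c("2023-01-18", "2023-02-03"))
)

# Long shape: one row per (book, endpoint), so each book has two points to join
long <- pivot_longer(wide, cols = c(start, end),
                     names_to = "endpoint", values_to = "date")

# group = book makes geom_line() draw one bar per book between its two points
p <- ggplot(long, aes(x = date, y = book)) +
  geom_line(aes(group = book), color = "grey50") +
  geom_point(aes(color = endpoint), size = 3)
```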
That’s when I turned to ChatGPT, because at least it would give me big blocks of code to work with rather than leaving me to do it all alone. I was right to keep my expectations modest: the first few attempts all yielded errors. In R, you have to load a library before you can use its functions, and most of the advanced stuff requires several additional libraries, but the code ChatGPT generated didn’t account for these dependencies.
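Since missing libraries were the most common failure, a small guard at the top of a script can at least report what isn’t installed before anything else runs. A rough sketch: `missing_packages()` is a helper name I made up, and the package list is only an example.

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed
missing_packages <- function(pkgs) {
  pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
}

needed <- c("ggplot2", "readr", "dplyr", "scales")
missing <- missing_packages(needed)

if (length(missing) > 0) {
  # Print the install command instead of installing anything automatically
  message("Install first: install.packages(c(",
          paste0('"', missing, '"', collapse = ", "), "))")
} else {
  # Everything is installed, so attach all of it
  invisible(lapply(needed, library, character.only = TRUE))
}
```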
I went back to ChatGPT and told it the errors I was getting, but its responses seemed to veer away from the desired outcome rather than get closer. This is where I had to apply my own knowledge to investigate what I was actually getting back. I had to fix the errors manually (by loading missing libraries or coercing variables into a new format), then give the fixed code back to it and ask for further changes. The bot got pretty close, but I still needed at least a basic knowledge of what the code should look like in order to troubleshoot and deliver a final plot. For example, at one point I asked it to flip the coordinates, which it did in two different ways at once (so the plot wasn’t flipped at all). This was hard to explain back to the bot, and it was a lot easier to remove the second transposition myself and pretend it had got it right.
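Here’s a minimal sketch of how a double flip cancels out, using hypothetical reading spans. Putting the books on the y-axis and calling `coord_flip()` are each a valid way to get horizontal bars; doing both transposes the plot twice and lands you back where you started.

```r
library(ggplot2)

# Hypothetical reading spans, as day numbers
spans <- data.frame(book = c("Book A", "Book B"), start = c(1, 19), end = c(17, 33))

# Flip 1: swap the aesthetics so books sit on the y-axis (horizontal bars)
p_swap <- ggplot(spans, aes(y = book)) +
  geom_segment(aes(x = start, xend = end, yend = book))

# Flip 2: keep books on the x-axis and transpose the plot with coord_flip()
p_coord <- ggplot(spans, aes(x = book)) +
  geom_segment(aes(y = start, yend = end, xend = book)) +
  coord_flip()

# Both at once: coord_flip() transposes the already-swapped plot back,
# so the books end up on the horizontal axis again (vertical bars)
p_both <- p_swap + coord_flip()
```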
So far, this seems to be the story for a lot of people using it to code. ChatGPT is able to give you the skeleton, but you still have to know what you’re doing in order to make it work for what you want to do.
So I’m not scared, at least not yet. This looks like an amazing resource for code kiddies like me who just want to do something fun without a deep dive, but it can’t replace a working knowledge of the syntax of the language you’re using, or knowing how to ask the right question.
The Final Code Collaboration
```r
# Load the necessary libraries
library(ggplot2)
library(tidyverse)
library(lubridate)
library(scales)

# Get the data
data <- read_csv("Downloads/Reading Schedule - Transform.csv")

# Make sure the data is in good shape to work with:
# dates as Date objects, plus day numbers counted from January 1 for the x-axis
data$start <- as.Date(data$start)
data$end <- as.Date(data$end)
data$start_numeric <- as.numeric(data$start - as.Date("2023-01-01"))
data$end_numeric <- as.numeric(data$end - as.Date("2023-01-01"))
data$difference <- as.numeric(data$end - data$start)  # days spent on each book

# Reorder the books based on their start date, and scale word counts into bubble sizes
data <- data %>%
  arrange(start) %>%
  mutate(book = factor(book, levels = unique(book))) %>%
  mutate(word_size = words / 25000)

# Create a dumbbell chart using ggplot2
ggplot(data, aes(x = end_numeric, y = reorder(book, -start_numeric))) +
  # End point: color marks a female author, shape marks a Canadian author
  geom_point(aes(color = female, shape = canadian), size = 2.5) +
  # Start point: grey bubble sized by word count (set outside aes(), so it has no legend)
  geom_point(aes(x = start_numeric), color = "grey", size = data$word_size) +
  # The bar joining each book's end point back to its start point
  geom_segment(aes(xend = start_numeric, yend = reorder(book, -start_numeric))) +
  # Number of days spent on the book, printed just above the middle of its bar
  geom_text(aes(label = difference, x = (start_numeric + end_numeric) / 2), vjust = -0.1) +
  # x-axis stays in day numbers, with a break every 10 days
  scale_x_continuous(limits = c(min(data$start_numeric) - 1, max(data$end_numeric) + 1),
                     breaks = seq(min(data$start_numeric), max(data$end_numeric), by = 10)) +
  scale_color_manual(values = c("blue", "red")) +
  scale_shape_manual(values = c(15, 17)) +
  labs(y = "Book", x = "Days in the Year", color = "Female", shape = "Canadian")
```
The Resulting Graph
What’s Next
This is still not exactly what I want, and my ignorance shows: I haven’t added the bubble size (which is proportional to the number of words in the book) to the legend, and the Days in the Year axis would ideally show actual dates. But I’m happy with this for now. It definitely tells a more interesting story than a bar chart.
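One possible route to both fixes, sketched on hypothetical rows rather than my actual data: mapping `size` inside `aes()` (instead of setting it from `data$word_size` outside) should give the bubbles a legend, and a small labelling function can turn the day numbers back into calendar dates while keeping the numeric axis. The `day_labels()` helper is made up for this sketch, and its 2023-01-01 origin matches the one used in the code above.

```r
library(ggplot2)
library(scales)

# Hypothetical rows; day numbers count from 2023-01-01
books <- data.frame(
  book = c("Book A", "Book B", "Book C"),
  start_numeric = c(1, 19, 35),
  end_numeric = c(17, 33, 51),
  words = c(90000, 60000, 120000)
)

# Hypothetical helper: turn day numbers back into labels like "Jan 18"
day_labels <- function(x, fmt = "%b %d") {
  format(as.Date("2023-01-01") + x, fmt)
}

p <- ggplot(books, aes(y = reorder(book, -start_numeric))) +
  geom_segment(aes(x = start_numeric, xend = end_numeric,
                   yend = reorder(book, -start_numeric))) +
  # size mapped inside aes(), so the bubbles get their own legend
  geom_point(aes(x = start_numeric, size = words), color = "grey") +
  geom_point(aes(x = end_numeric), size = 2.5) +
  # Numeric axis, but each break is printed as a calendar date
  scale_x_continuous(labels = day_labels) +
  scale_size(name = "Words", labels = label_comma()) +
  labs(y = "Book", x = "Date")
```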