Draft

Using fuzzyjoin to work with NCES data

Demonstrating how fuzzyjoin can help you with messy data through an example using school names within the Virgin Islands
R
Data Wrangling
Author
Affiliation
Published

January 17, 2025

Battling first child amnesia

I am a father of two sons; one 4.5 years old, and the other is but a few months. This may seem weird, but even though I went through everything with my first son… I have complete amnesia about what was normal, what napping schedules were like, and such-like at this age.

Fortunately, we used a baby tracker, which allowed me to export a csv. What a golden opportunity for some data visualization!

Let’s just load up the library and take a look…

Code in R
library(tidyverse)


child <- read_csv("https://github.com/drjohnrussell/drjohnrussell.github.io/raw/refs/heads/master/posts/2025-01-30-plotting-sleep-intervals/data/baby.csv")

head(child)
# A tibble: 6 × 3
  Baby       Time              `Duration (min)`
  <chr>      <chr>                        <dbl>
1 First-Born 10/23/20 4:30 AM                60
2 First-Born 10/23/20 6:00 AM                55
3 First-Born 10/23/20 8:31 AM                39
4 First-Born 10/23/20 11:13 AM               28
5 First-Born 10/23/20 3:49 PM                14
6 First-Born 10/23/20 7:16 PM               298

|> mutate(starttime=mdy_hm(Time), endtime=starttime+minutes(Duration (min))) ## separate the ones that go over multiple days

Viggo <- read_csv(“Viggo_sleep.csv”) |> mutate(starttime=mdy_hm(Time), endtime=starttime+minutes(Duration (min)))

Elginmultiple <- Elgin |> filter(day(starttime)!=day(endtime))

Viggomultiple <- Viggo |> filter(day(starttime)!=day(endtime))

Elgin1 <- Elginmultiple |> mutate(endtime=make_datetime(year(starttime),month(starttime),day(starttime),hour=23,min=59,sec=0)) Elgin2 <- Elginmultiple |> mutate(starttime=make_datetime(year(endtime),month(endtime),day(endtime),hour=0,min=0))

Viggo1 <- Viggomultiple |> mutate(endtime=make_datetime(year(starttime),month(starttime),day(starttime),hour=23,min=59,sec=0)) Viggo2 <- Viggomultiple |> mutate(starttime=make_datetime(year(endtime),month(endtime),day(endtime),hour=0,min=0))

birth <- mdy(“05-30-20”) birthViggo <- mdy(“07-28-24”)

Elgin <- Elgin |> filter(day(starttime)==day(endtime)) |> bind_rows(Elgin1,Elgin2)

Elgin2 <- Elgin |> mutate(daysold=floor(difftime(starttime,birth,units=“weeks”)), dayweek=wday(starttime,label=TRUE,abbr=FALSE,week_start=7), starthour=hm(paste0(hour(starttime),“:”,minute(starttime))), endhour=hm(paste0(hour(endtime),“:”,minute(endtime))))

Viggo <- Viggo |> filter(day(starttime)==day(endtime)) |> bind_rows(Viggo1,Viggo2)

Viggo2 <- Viggo |> mutate(daysold=floor(difftime(starttime,birthViggo,units=“weeks”)), dayweek=wday(starttime,label=TRUE,abbr=FALSE,week_start=7), starthour=hm(paste0(hour(starttime),“:”,minute(starttime))), endhour=hm(paste0(hour(endtime),“:”,minute(endtime))))

nursing <- read_csv(“Elgin Russell_nursing.csv”) |> mutate(Time=mdy_hm(Time)) |> select(Time)

milk <- read_csv(“Elgin Russell_expressed.csv”) |> mutate(Time=mdy_hm(Time)) |> select(Time) |> bind_rows(nursing)

Elginmilk <- milk |> mutate(daysold=floor(difftime(Time,birth,units=“weeks”)), dayweek=wday(Time,label=TRUE,abbr=FALSE,week_start=7), hour=hm(paste0(hour(Time),“:”,minute(Time)))) |> filter(daysold>20, daysold<30)

BREAKS <- c(0:12)*7200

Elgin2 |> filter(daysold>20, daysold<30) |> ggplot() + geom_rect(mapping=aes(xmin=as.numeric(dayweek)-.45, xmax=as.numeric(dayweek)+.45, ymin=starthour, ymax=endhour),fill=“blue”, alpha=.5) + geom_rect(data=Viggo2 |> filter(daysold>20), mapping=aes(xmin=as.numeric(dayweek)-.45, xmax=as.numeric(dayweek)+.45, ymin=starthour, ymax=endhour),fill=“green”, alpha=.5) + facet_wrap(~daysold) + geom_point(data=Elginmilk, mapping=aes(x=as.numeric(dayweek), y=hour),color=“pink”) + scale_y_time(breaks=BREAKS) + scale_x_continuous(breaks=seq_along(levels(Elgin2\(dayweek)), labels=levels(Elgin2\)dayweek)) + labs(x=““,y=”“)

round(difftime(today(),mdy(“07-28-24”),units=“weeks”),0)

ggsave(“Elgin.svg”,width=11,height=18,units=“in”)

Citation

BibTeX citation:
@online{russell2025,
  author = {Russell, John},
  title = {Using Fuzzyjoin to Work with {NCES} Data},
  date = {2025-01-17},
  url = {https://drjohnrussell.github.io/posts/2025-01-17-fuzzyjoin-in-action/},
  langid = {en}
}
For attribution, please cite this work as:
Russell, John. 2025. “Using Fuzzyjoin to Work with NCES Data.” January 17, 2025. https://drjohnrussell.github.io/posts/2025-01-17-fuzzyjoin-in-action/.