Data Quest: Motorway Services UK

Author
Published

November 22, 2024

I’m working on a few ideas about Motorway Service Stations in the UK, or more specifically the mainland of Great Britain (England, Scotland and Wales). However, I was surprised to discover there weren’t any well-structured datasets. There is an incredible website available at www.motorwayservices.info - I thoroughly recommend a visit.

For very sensible reasons they don’t allow data to be scraped (we can’t use {rvest}), so I’ve manually downloaded all 107 web pages for the motorway service stations and have them in this folder:

Code
library("cjhRutils")
library("tidyverse")
list.files(quarto_here("service-stations/"), ".html") %>% head()
[1] "Abington Services M74 - Motorway Services Information.html"          
[2] "Annandale Water Services A74(M) - Motorway Services Information.html"
[3] "Baldock Services A1(M) - Motorway Services Information.html"         
[4] "Beaconsfield Services M40 - Motorway Services Information.html"      
[5] "Birch Services M62 - Motorway Services Information.html"             
[6] "Birchanger Green Services M11 - Motorway Services Information.html"  
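One detail worth knowing about the call above: the second argument to list.files() is a regular expression rather than a glob, so the unescaped dot in ".html" matches any character. Here is a minimal sketch of the difference, using made-up file names rather than files from the folder above:

```r
# Hypothetical file names, purely for illustration
files <- c("Abington Services M74.html", "notes-xhtml-draft.txt")

# "." matches any single character, so "xhtml" in the second name also matches
loose <- grepl(".html", files)

# Escaping the dot and anchoring to the end matches only real .html files
strict <- grepl("[.]html$", files)

print(data.frame(files, loose, strict))
```

With a folder that only contains the downloaded pages both patterns return the same files, so this only matters if other files end up alongside them.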

Now we can use {rvest} to read these HTML files - let’s target the info I want:

Code
library("tidyverse")
library("rvest")
example_abington <- read_html(quarto_here("service-stations/Abington Services M74 - Motorway Services Information.html")) %>% 
  html_nodes(".infotext") %>% 
  html_text() %>% 
  tibble(
    info = .
  ) %>% 
  separate_wider_delim(info, ":", names = c("property", "value")) %>% 
  mutate(value = str_trim(value))

example_abington
# A tibble: 18 × 2
   property                       value                                         
   <chr>                          <chr>                                         
 1 Motorway                       "M74"                                         
 2 Where                          "at J13"                                      
 3 County                         "South Lanarkshire"                           
 4 Postcode                       "ML12 6RG"                                    
 5 Type                           "Single site, used by traffic in both directi…
 6 Operator                       "Welcome Break"                               
 7 Contact Phone                  "01864 502637"                                
 8 Eat-In Food                    "Starbucks, Papa John's, Burger King, Dunkin'…
 9 Takeaway Food / General        "Retail Shop"                                 
10 Other Non-Food Shops           "WH Smith"                                    
11 Picnic Area                    "yes"                                         
12 Cash Machines in main building "Yes (transaction charge applies)"            
13 Parking Charges                "Cars free for the first 2 hours then £5 for …
14 Other Facilities/Information   "GameZone, Tourist Information, BT Openzone"  
15 Motel                          "Days Inn Hotel Abington (Glasgow)"           
16 Fuel Brand                     "Shell"                                       
17 LPG available                  "Yes"                                         
18 Cash Machines at fuel station  "Yes (transaction charge applies)"            

Okay! That’s enough processing to turn into a function I can use to read in all of the data:

Code
read_motorway_services_info <- function(file_path){
  name_service_station <- str_remove(basename(file_path), " - Motorway Services Information.html")
  
  read_html(file_path) %>% 
  html_nodes(".infotext") %>% 
  html_text() %>% 
  tibble(
    info = .
  ) %>%
  separate_wider_delim(info, ":", names = c("property", "value"), too_many = "merge") %>%
  mutate(value = str_trim(value)) %>%
  mutate(service_station = name_service_station) %>% 
  identity()
}

quarto_here("service-stations/Baldock Services A1(M) - Motorway Services Information.html") %>% 
  read_motorway_services_info()
# A tibble: 19 × 3
   property                       value                          service_station
   <chr>                          <chr>                          <chr>          
 1 Motorway                       A1(M)                          Baldock Servic…
 2 Where                          at J10 and from A507           Baldock Servic…
 3 County                         Hertfordshire                  Baldock Servic…
 4 Postcode                       SG7 5TR                        Baldock Servic…
 5 Type                           Single site, used by traffic … Baldock Servic…
 6 Operator                       Extra MSA                      Baldock Servic…
 7 Contact Phone                  01494 678876                   Baldock Servic…
 8 Eat-In Food                    KFC, Le Petit Four, McDonalds… Baldock Servic…
 9 Takeaway Food / General        M&S Simply Food, WH Smith (wi… Baldock Servic…
10 Picnic Area                    yes                            Baldock Servic…
11 Children's Playground          Yes                            Baldock Servic…
12 Cash Machines in main building Yes (transaction charge appli… Baldock Servic…
13 Parking Charges                First two hours free for all … Baldock Servic…
14 Other Facilities/Information   Fast Food & Bakeries and Conv… Baldock Servic…
15 Motel                          Days Inn Stevenage North       Baldock Servic…
16 Fuel Brand                     Shell                          Baldock Servic…
17 LPG available                  Yes                            Baldock Servic…
18 Cash Machines at fuel station  Yes (free)                     Baldock Servic…
19 Other Facilities/Information   Costa Express & Deli2Go avail… Baldock Servic…
Code
data_raw_services <- list.files(quarto_here("service-stations/"), "[.]html", full.names = TRUE) %>% 
  map_dfr(~read_motorway_services_info(.x))
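The function above adds too_many = "merge" to the split, which matters whenever a value contains its own colon. Here is a small sketch of that behaviour on made-up rows (the parking text is invented for illustration, not taken from the site):

```r
library(tidyr)
library(tibble)

# Made-up rows: the second value contains its own colon
toy_info <- tibble(info = c("Motorway: M74",
                            "Parking Charges: Cars: free for 2 hours"))

# With the default too_many = "error" the second row would raise an error,
# because it splits into three pieces but only two names are supplied.
# "merge" keeps everything after the first delimiter in the last column.
toy_split <- separate_wider_delim(toy_info, info, delim = ":",
                                  names = c("property", "value"),
                                  too_many = "merge")
toy_split$value <- trimws(toy_split$value)
toy_split
```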

Northbound / Southbound and Eastbound / Westbound

Some service stations come in pairs (dual-site service areas or twin sites) that are split by the motorway and yet still have the same name. For instance, Rownhams Services has a McDonald’s when accessed westbound but not eastbound. If you look at a map of the services, it appears that they’re not connected (that’s an overhead sign, not a footbridge!).

But they are! There’s a subway connecting them, which is apparently difficult to discover. Thankfully, our data source www.motorwayservices.info knows they’re connected but does suggest it’s a footbridge.

Code
data_raw_services %>% 
  filter(service_station == "Rownhams Services M27") %>% 
  filter(property == "Type") %>% 
  pull(value)
[1] "Separate facilities for each carriageway, but linked by a pedestrian footbridge"

We need a way to identify these stations. It turns out the “Eat-In Food” property is our friend and identifies the 6 twin-site stations:

Code
vec_eat_in_pairs <- data_raw_services %>% 
  filter(property == "Eat-In Food",
         str_detect(value, "Northbound|Eastbound")) %>% 
  pull(service_station)
vec_eat_in_pairs
[1] "Northampton Services M1" "Rownhams Services M27"  
[3] "Sandbach Services M6"    "Strensham Services M5"  
[5] "Tibshelf Services M1"    "Watford Gap Services M1"

Where can we eat

The Eat-In variable is the most complicated, interesting and ripe for visualisation. So let’s treat it separately. First we’ll identify our twin-site restaurants:

Code
data_raw_eat_in <- data_raw_services %>% 
  filter(property == "Eat-In Food") %>% 
  mutate(directional = str_detect(value,
                                "Northbound|Eastbound"))

data_raw_directional_eat <- data_raw_eat_in %>% 
  filter(directional == TRUE) %>% 
  mutate(direction = case_when(
    str_detect(value, "Northbound") ~ "Northbound|Southbound",
    str_detect(value, "Eastbound") ~ "Eastbound|Westbound"
  )) %>% 
  separate_longer_delim(direction,
                        delim = "|") %>% 
  mutate(value = case_when(
    direction == "Northbound" ~ str_extract(value,
                                  "(?<=Northbound: ).*(?=Southbound)"),
    direction == "Southbound" ~ str_extract(value, "(?<=Southbound).*"),
    direction == "Eastbound" ~ str_extract(value,
                                  "(?<=Eastbound: ).*(?=Westbound)"),
    direction == "Westbound" ~ str_extract(value,
                                  "(?<=Westbound: ).*")
  )) 

data_raw_directional_eat
# A tibble: 12 × 5
   property    value                       service_station directional direction
   <chr>       <chr>                       <chr>           <lgl>       <chr>    
 1 Eat-In Food "Costa, Restbite, The Burg… Northampton Se… TRUE        Northbou…
 2 Eat-In Food " Costa, Hot Food Co., McD… Northampton Se… TRUE        Southbou…
 3 Eat-In Food "Costa, Restbite. "         Rownhams Servi… TRUE        Eastbound
 4 Eat-In Food "Costa, Restbite, McDonald… Rownhams Servi… TRUE        Westbound
 5 Eat-In Food "Costa and Restbite, "      Sandbach Servi… TRUE        Northbou…
 6 Eat-In Food ": Costa, McDonald's, Hot … Sandbach Servi… TRUE        Southbou…
 7 Eat-In Food "Soho Coffee Company, Hot … Strensham Serv… TRUE        Northbou…
 8 Eat-In Food ": Costa, Hot Food Co., Mc… Strensham Serv… TRUE        Southbou…
 9 Eat-In Food "Costa, Restbite, McDonald… Tibshelf Servi… TRUE        Northbou…
10 Eat-In Food ": Costa, Restbite, McDona… Tibshelf Servi… TRUE        Southbou…
11 Eat-In Food "Costa, Fresh Food Cafe, M… Watford Gap Se… TRUE        Northbou…
12 Eat-In Food ": Costa, Restbite, The Bu… Watford Gap Se… TRUE        Southbou…
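The extraction above relies on regex lookarounds. As a minimal sketch of how they behave, here is the same idea applied to a made-up value in the "Northbound: ... Southbound: ..." shape (the retailers listed are illustrative only):

```r
library(stringr)

# Made-up value in the same shape as the twin-site "Eat-In Food" entries
toy_value <- "Northbound: Costa, Restbite. Southbound: Costa, McDonald's"

# (?<=...) is a lookbehind and (?=...) a lookahead: both must match,
# but neither is included in the extracted text
north <- str_extract(toy_value, "(?<=Northbound: ).*(?=Southbound)")
south <- str_extract(toy_value, "(?<=Southbound: ).*")

str_trim(c(north, south))
```

The northbound piece keeps the space that sat before "Southbound", which is why a trim is still needed afterwards.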

Frustratingly, Strensham Services has an extra little bit of data about Subway being in the Northbound Forecourt. That’ll need manual removal. But other than that I think we end up with fairly well structured data for the eat-in component that we can begin to clean up.

Code
data_raw_directional_eat <- data_raw_directional_eat %>% 
  mutate(value = str_remove(value, ":"),
         value = str_remove(value, " Northbound.*"),
         value = str_trim(value))

data_raw_directionless_eat <- data_raw_eat_in %>% 
  filter(directional == FALSE) %>% 
  mutate(value = str_remove(value, ":|;"),
         value = str_remove(value, "[(]?Westbound[)]?"), # unescaped parentheses would only form a regex group
         value = str_trim(value),
         direction = "Directionless")

data_clean_eat_in <- data_raw_directionless_eat %>% 
  bind_rows(data_raw_directional_eat) %>% 
  select(-directional)

There are lots of alternative spellings in the data, so here’s a case_when() to grab them all. At some point in the future it would be interesting to see if edit distances could help, but for now let’s concentrate on getting a useful dataset.

Code
fn_fix_value_columns <- function(data){
  data %>% 
    mutate(value = case_when(
      str_detect(tolower(value), "arlo") ~ "Arlo's", 
      str_detect(tolower(value), "^bk$") ~ "Burger King",
      str_detect(tolower(value), "cotton") ~ "Cotton Traders", 
      str_detect(tolower(value), "chozen") ~ "Chozen Noodles", 
      str_detect(tolower(value), "cornwall") ~ "West Cornwall Pasty Company", 
      str_detect(tolower(value), "costa") ~ "Costa", 
      str_detect(tolower(value), "eat & drink co") ~ "Eat & Drink Co", 
      str_detect(tolower(value), "edc") ~ "Eat & Drink Co", 
      str_detect(tolower(value), "fone") ~ "FoneBiz", 
      str_detect(tolower(value), "full house") ~ "Full House", 
      str_detect(tolower(value), "greg") ~ "Greggs", 
      str_detect(tolower(value), "harry") ~ "Harry Ramsden's", 
      str_detect(tolower(value), "hot food co") ~ "Hot Food Co",
      str_detect(tolower(value), "krispy") ~ "Krispy Kreme", 
      str_detect(tolower(value), "le petit") ~ "Le Petit Four", 
      str_detect(tolower(value), "lucky coin") ~ "Lucky Coin", 
      str_detect(tolower(value), "m&s") ~ "M&S", 
      str_detect(tolower(value), "marks") ~ "M&S", 
      str_detect(tolower(value), "mcdona") ~ "McDonald's", 
      str_detect(tolower(value), "papa john") ~ "Papa John's", 
      str_detect(tolower(value), "pizza hut") ~ "Pizza Hut", 
      str_detect(tolower(value), "quicksilver") ~ "Quicksilver", 
      str_detect(tolower(value), "regus") ~ "Regus Business Lounge", 
      str_detect(tolower(value), "restbite") ~ "Restbite", 
      str_detect(tolower(value), "soho") ~ "SOHO Coffee Co", 
      str_detect(tolower(value), "spar") ~ "SPAR", 
      str_detect(tolower(value), "starbucks") ~ "Starbucks", 
      str_detect(tolower(value), "the burger") ~ "The Burger Company", 
      str_detect(tolower(value), "top gift") ~ "Top Gift", 
      str_detect(tolower(value), "tourist information") ~ "Tourist Information", 
      str_detect(tolower(value), "upper") ~ "Upper Crust", 
      str_detect(tolower(value), "whs") ~ "WHSmiths", 
      str_detect(tolower(value), "wild") ~ "Wild Bean Cafe", 
      tolower(value) %in% tolower(c("WH Smith", "WHSMiths", "Whsmith","W H Smiths", "W.H.Smiths", "W H Smith", "WH Smiths", "Wh Smith", "WH smith")) ~ "WHSmiths", 
      value == "Buger King" ~ "Burger King", 
      value == "M & S Simply food" ~ "M&S",
      TRUE ~ value
    ))
}

data_long_eat_in <- data_clean_eat_in %>% 
  separate_longer_delim(value,
                        delim = ",") %>% 
  mutate(value = str_trim(value)) %>% 
  filter(value != "") %>% 
  fn_fix_value_columns() %>% 
  select(retailer = value,
          service_station,
         direction)
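For the edit-distance idea floated above, base R’s adist() could be a starting point. This is only a sketch of the approach, not what the post does: the short canonical list is an assumption chosen for illustration, and a real version would need a distance cut-off so that unrelated names aren’t forced onto a match.

```r
# A few spellings handled by the case_when() above, plus a short canonical list
variants  <- c("WH Smith", "WHSmiths", "W H Smiths", "Buger King", "Burger King")
canonical <- c("WHSmiths", "Burger King", "Costa")

# adist() returns a matrix of edit distances:
# rows are the variants, columns are the canonical names
distances <- adist(tolower(variants), tolower(canonical))

# Pick the closest canonical name for each variant
nearest <- canonical[apply(distances, 1, which.min)]
data.frame(variants, nearest, distance = apply(distances, 1, min))
```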

Now… I’m a little unsure about what to do with the “Takeaway Food / General” property as it also contains information about where we can get food but for the 6 twin stations the direction isn’t provided. Let’s deal with the directionless other retailers now:

Code
data_long_other_shops_directionless <- data_raw_services %>% 
  filter(!service_station %in% vec_eat_in_pairs) %>% 
  filter(property %in% c("Takeaway Food / General", "Other Non-Food Shops")) %>% 
  select(value, service_station) %>% 
  filter(value != "01823680370") %>% 
  separate_longer_delim(value,
                        delim = ",") %>% 
  mutate(value = str_trim(value, side = "both")) %>% 
  fn_fix_value_columns() %>% 
  mutate(value = str_remove(value, "[(].*y[)]"),
         value = str_trim(value)) %>% 
  reframe(retailer = value,
          service_station = service_station,
         direction = "Directionless")

## There's one bad record
data_long_other_shops_directionless <- tibble(
  retailer = c("Gamezone", "WHSmiths", "Waitrose"),
  service_station = "Newport Pagnell Services M1",
  direction = "Directionless"
) %>%
  bind_rows(filter(
    data_long_other_shops_directionless,!str_detect(retailer, "24hr Gamezone WHSmith & Waitrose")
  ))

And now I’ll expand out the twin stations:

Code
## Expand out the twins
data_long_other_shops_w_direction <- data_raw_services %>% 
  filter(service_station %in% vec_eat_in_pairs) %>% 
  filter(property %in% c("Other Non-Food Shops", "Takeaway Food / General")) %>% 
  select(value, service_station) %>% 
  mutate(direction = case_when(
    service_station == "Rownhams Services M27" ~ "Eastbound;Westbound",
    TRUE ~ "Northbound;Southbound"
  )) %>% 
  separate_longer_delim(direction,
                        delim = ";") %>% 
  fn_fix_value_columns() %>% 
  rename(retailer = value)

It’s time to combine everything together into a list of retailers which I’ll export into Excel and quickly categorise.

Code
data_long_retailers <- bind_rows(data_long_eat_in, data_long_other_shops_w_direction, data_long_other_shops_directionless)

data_long_retailers %>% 
  distinct(retailer) %>% 
  arrange(retailer) %>% 
  write_csv(quarto_here("retailer_types.csv"))

Let’s impose these categorisations:

  • is_food_retailer:
    • Do we KNOW it sells some food items?
  • is_restaurant:
    • Do we KNOW we can order food to eat in?
  • is_takeaway:
    • Do we KNOW we can order food to take away?
  • is_prepared_food_only:
    • Do we KNOW that there is no hot/fresh food, e.g. Tesco?
  • is_coffee_shop:
    • Do we KNOW you’d nip there for a coffee and it’ll be good? Controversially, McDonald’s isn’t included.
Code
library("readxl")
data_type_of_retailer <- read_excel(quarto_here("retailer_types.xlsx"))

data_services_retailers <- data_long_retailers %>% 
  left_join(data_type_of_retailer) %>% 
  mutate(across(starts_with("is"), ~ case_when(
    .x == "Y" ~ TRUE,
    .x == "N" ~ FALSE,
    TRUE ~ NA
    )))

data_services_retailers
# A tibble: 650 × 8
   retailer service_station direction is_food_retailer is_restaurant is_takeaway
   <chr>    <chr>           <chr>     <lgl>            <lgl>         <lgl>      
 1 Starbuc… Abington Servi… Directio… TRUE             NA            TRUE       
 2 Papa Jo… Abington Servi… Directio… TRUE             TRUE          TRUE       
 3 Burger … Abington Servi… Directio… TRUE             TRUE          TRUE       
 4 Dunkin'… Abington Servi… Directio… TRUE             FALSE         TRUE       
 5 Harry R… Abington Servi… Directio… TRUE             TRUE          TRUE       
 6 Costa    Annandale Wate… Directio… TRUE             FALSE         TRUE       
 7 Restbite Annandale Wate… Directio… TRUE             NA            NA         
 8 The Bur… Annandale Wate… Directio… TRUE             TRUE          TRUE       
 9 KFC      Baldock Servic… Directio… TRUE             TRUE          TRUE       
10 Le Peti… Baldock Servic… Directio… TRUE             FALSE         TRUE       
# ℹ 640 more rows
# ℹ 2 more variables: is_prepared_food_only <lgl>, is_coffee_shop <lgl>
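Because the categories were typed by hand in Excel, one optional sanity check is to confirm that every retailer found a match in the lookup, since left_join() quietly fills the gaps with NA. Here is a sketch with toy stand-ins for the two tables (the names toy_retailers and toy_types are hypothetical):

```r
library(dplyr)
library(tibble)

# Toy stand-ins for data_long_retailers and the hand-made lookup
toy_retailers <- tibble(retailer = c("Costa", "Greggs", "Upper Crust"))
toy_types <- tibble(retailer = c("Costa", "Greggs"),
                    is_coffee_shop = c("Y", "N"))

# Rows returned here have no match in the lookup and would get NA after left_join()
toy_unmatched <- anti_join(toy_retailers, toy_types, by = "retailer")
toy_unmatched
```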

Non-food information

The non-food information is so much easier to deal with. Because I want to create an {sf} object and potentially support exporting as ESRI shapefiles, let’s make sure our column names have a maximum of 10 characters.

Code
data_wide_services <- data_raw_services %>% 
  filter(property %in% c("Motorway",
                         "Where",
                         "Postcode",
                         "Type",
                         "Operator",
                         "Parking Charges",
                         "LPG available",
                         "Electric Charge Point")) %>% 
  mutate(property = case_when(
    property == "LPG available" ~ "has_lpg",
    property == "Electric Charge Point" ~ "has_electric_charge",
    TRUE ~ property
  )) %>% 
  mutate(value = str_replace_all(value, "ï��[0-9]{1,}", "£"),
         value = str_remove_all(value, "Â")) %>% 
  pivot_wider(names_from = property,
              values_from = value) %>% 
  janitor::clean_names() %>% 
  mutate(is_single_site = str_detect(type, "Single site"),
         is_twin_station = str_detect(type, "Separate facilities"),
         has_walkway_between_twins = case_when(
           is_twin_station == TRUE & str_detect(type, "linked") ~ TRUE,
           is_twin_station == TRUE & str_detect(type, "no link") ~ FALSE,
           TRUE ~ NA),
         is_ireland = str_detect(service_station, "Ireland")) %>% 
  reframe(
    name = service_station,
    motorway,
    where,
    postcode,
    type,
    operator,
    is_ireland,
    p_charges = parking_charges,
    has_charge = has_electric_charge,
    is_single = is_single_site,
    is_twin = is_twin_station,
    has_walk = has_walkway_between_twins
  )
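The 10-character limit comes from the .dbf attribute table that sits alongside a shapefile. A quick way to check the renaming in the final reframe() is to count characters. This sketch just lists the relevant column names by hand:

```r
# Column names before and after the renaming in reframe() above
names_before <- c("service_station", "parking_charges", "has_electric_charge",
                  "is_single_site", "is_twin_station", "has_walkway_between_twins")
names_after <- c("name", "p_charges", "has_charge",
                 "is_single", "is_twin", "has_walk")

# Any name longer than 10 characters would be truncated on export
too_long_before <- names_before[nchar(names_before) > 10]
too_long_after  <- names_after[nchar(names_after) > 10]

too_long_before
too_long_after
```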

Some services, like Gloucester Services M5, appear as two distinct rows (one per direction) but still have is_single == FALSE. Let’s identify these services and mark them in the dataset with a pair_name column.

Code
data_paired_services <- data_wide_services %>% 
  filter(is_single == FALSE) %>% 
  filter(str_detect(name, "North|South|East|West")) %>% 
  select(name) %>% 
  separate_wider_delim(name, delim = "Services",
                       names = c("name", "direction")) %>% 
  mutate(across(everything(), ~str_trim(.))) %>% 
  separate_wider_delim(direction,
                       delim = " ",
                       names = c("direction",
                                 "motorway"),
                       too_few = "align_end") %>% 
  add_count(name, motorway) %>% 
  filter(n > 1) %>% 
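  # Build pair_name first: reframe() evaluates in order, so a later expression would see the rewritten name
  reframe(pair_name = paste(name, motorway),
          name = paste(name, "Services", direction, motorway))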


data_services_info <- data_wide_services %>% 
  left_join(data_paired_services) %>% 
  mutate(is_pair = !is.na(pair_name))

data_services_info
# A tibble: 106 × 14
   name   motorway where postcode type  operator is_ireland p_charges has_charge
   <chr>  <chr>    <chr> <chr>    <chr> <chr>    <lgl>      <chr>     <chr>     
 1 Abing… M74      at J… ML12 6RG Sing… Welcome… FALSE      "Cars fr… <NA>      
 2 Annan… A74(M)   at J… DG11 1HD Sing… RoadChef FALSE      "Parking… Yes (More…
 3 Baldo… A1(M)    at J… SG7 5TR  Sing… Extra M… FALSE      "First t… <NA>      
 4 Beaco… M40      at J… HP9 2SE  Sing… Extra M… FALSE       <NA>     <NA>      
 5 Birch… M62      betw… OL10 2HQ Sepa… Moto     FALSE      "Car - £… <NA>      
 6 Birch… M11      at J… CM23 5QZ Sing… Welcome… FALSE      "Parking… <NA>      
 7 Black… M65      at J… BB3 0AT  Sing… Extra M… FALSE      "Parking… <NA>      
 8 Blyth… A1(M)    at J… S81 8HG  Sing… Moto     FALSE      "Free fo… Yes (More…
 9 Bothw… M74      betw… G71 8BG  Faci… RoadChef FALSE      "Parking… Yes (More…
10 Bridg… M5       at J… TA6 6TS  Sing… Moto     FALSE      "Cars - … <NA>      
# ℹ 96 more rows
# ℹ 5 more variables: is_single <lgl>, is_twin <lgl>, has_walk <lgl>,
#   pair_name <chr>, is_pair <lgl>

Getting the locations of the services…

I’ve gone and got the coords from Google Maps and stored them in an Excel file (because I’m not perfect). Here’s a very quick interactive {leaflet} map showing where they are:

Code
library("sf")
library("leaflet")
data_raw_service_locations <- read_excel(quarto_here("services-locations.xlsx"))

data_clean_long_lat <- data_raw_service_locations %>% 
  separate_wider_delim(google_pin,
                       delim = ",", 
                       names = c("lat", "long")) %>% 
  mutate(long = as.numeric(long),
         lat = as.numeric(lat)) %>% 
  select(name, long, lat)


data_sf_service_locs <- data_clean_long_lat %>% 
  full_join(data_services_info) %>% 
  st_as_sf(coords = c("long", "lat"),
           crs = 4326)
  
data_sf_service_locs %>% 
  filter(is_ireland == FALSE) %>% 
  leaflet() %>% 
  addProviderTiles(providers$OpenStreetMap) %>% 
  addCircleMarkers()
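One thing to watch in the step above: Google Maps copies a pin as "lat, long", while st_as_sf() takes coords as x then y, i.e. longitude first. Here is a base R sketch of the split plus a loose bounding-box check that would catch a swapped pair. The two pins are rounded, illustrative values and the box limits are an assumption:

```r
# Rounded, illustrative pins in Google Maps' "lat, long" order
google_pin <- c("55.5, -3.70", "51.9, 0.192")

parts <- strsplit(google_pin, ",")
lat  <- as.numeric(trimws(vapply(parts, `[`, character(1), 1)))
long <- as.numeric(trimws(vapply(parts, `[`, character(1), 2)))

# Loose bounding box for Great Britain: a swapped pair would fail this check
in_gb <- lat > 49 & lat < 61 & long > -9 & long < 2
in_gb
```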

Operators

I want to make sure we’re not over counting operators due to pair sites! So let’s make some explicit counts.

  • n_named_sites_all: How many uniquely named sites are there across Great Britain? Gloucester Northbound Services M5 and Gloucester Southbound Services M5 are distinctly named, but the Birch Services M62 is listed once despite being a twinned site.

  • n_named_sites_mainland: same as above but discounting services in Ireland

  • n_named_sites_ireland: only counts uniquely named services in Ireland

  • n_single_sites: How many sites are accessible by traffic in both directions?

  • n_twins: How many sites are twinned, with a location on each side of the motorway, with or without a walkway between them?

  • n_pairs: How many sites have paired names, e.g. Gloucester Northbound Services M5 and Gloucester Southbound Services M5?

Code
data_process_ops_single <- data_services_info %>% 
  select(name, operator, is_single) %>% 
  count(operator, is_single) %>% 
  filter(is_single == TRUE) %>% 
  reframe(operator,
          n_single_sites = n)

data_process_ops_twins <- data_services_info %>% 
  count(operator, is_twin) %>% 
  filter(is_twin == TRUE) %>% 
  reframe(operator,
          n_twins = n)

data_process_ops_pair <- data_services_info %>% 
  select(name, operator, is_pair) %>% 
  count(operator, is_pair) %>% 
  filter(is_pair == TRUE) %>% 
  reframe(operator,
          n_pairs = n)

data_process_ops_simple_all <- data_services_info %>% 
  count(operator, name = "n_named_sites_all") 

data_process_ops_simple_ireland <- data_services_info %>% 
  filter(str_detect(name, "Ireland")) %>% 
  count(operator, name = "n_named_sites_ireland") 


data_operators <- data_process_ops_simple_all %>% 
  left_join(data_process_ops_simple_ireland) %>% 
  left_join(data_process_ops_single) %>% 
  left_join(data_process_ops_twins) %>% 
  left_join(data_process_ops_pair) %>% 
  mutate(across(everything(), ~replace_na(.x, 0))) %>% 
  mutate(n_named_sites_mainland = n_named_sites_all - n_named_sites_ireland) %>% 
  select(
    operator,
    n_named_sites_all,
         n_named_sites_mainland,
         n_named_sites_ireland,
         everything())

data_operators
# A tibble: 9 × 7
  operator      n_named_sites_all n_named_sites_mainland n_named_sites_ireland
  <chr>                     <int>                  <int>                 <int>
1 Applegreen                    3                      0                     3
2 BP Connect                    1                      1                     0
3 Euro Garages                  2                      2                     0
4 Extra MSA                     7                      7                     0
5 Moto                         39                     39                     0
6 RoadChef                     20                     20                     0
7 Stop 24                       1                      1                     0
8 Welcome Break                27                     27                     0
9 Westmorland                   6                      6                     0
# ℹ 3 more variables: n_single_sites <int>, n_twins <int>, n_pairs <int>
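As an alternative to building one small table per count and joining them, the same kind of summary could be sketched in a single summarise(), since summing a logical column counts its TRUE values. This is not the code used in the post, and the toy rows below use made-up site names:

```r
library(dplyr)
library(tibble)

# Toy rows with made-up site names; columns mirror data_services_info
toy_services <- tibble(
  name      = c("A Services M1", "B Services M6", "C Services M74", "D Services Ireland"),
  operator  = c("Moto", "Moto", "RoadChef", "Applegreen"),
  is_single = c(TRUE, FALSE, TRUE, TRUE),
  is_twin   = c(FALSE, TRUE, FALSE, FALSE),
  is_pair   = c(FALSE, FALSE, FALSE, FALSE)
)

toy_operators <- toy_services %>%
  group_by(operator) %>%
  summarise(
    n_named_sites_all     = n(),
    n_named_sites_ireland = sum(grepl("Ireland", name)),
    n_single_sites        = sum(is_single, na.rm = TRUE),
    n_twins               = sum(is_twin, na.rm = TRUE),
    n_pairs               = sum(is_pair, na.rm = TRUE)
  ) %>%
  mutate(n_named_sites_mainland = n_named_sites_all - n_named_sites_ireland)

toy_operators
```

Operators with no matching rows come out as 0 directly, so no replace_na() step is needed in this version.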

Exporting all that good data

I’d really love this dataset to become a Tidy Tuesday dataset! So while writing this post I’ve created a fork of the repo. If my eventual pull request gets accepted we’d be able to pull the data from the official TidyTuesday repo, but until then it’s available as follows:

Code
head(read_csv("https://raw.githubusercontent.com/charliejhadley/tidytuesday/refs/heads/Motorway-Services-UK/data/curated/motorway-services-uk/data_service_locations.csv"))
# A tibble: 6 × 15
  name    long   lat motorway where postcode type  operator p_charges has_charge
  <chr>  <dbl> <dbl> <chr>    <chr> <chr>    <chr> <chr>    <chr>     <chr>     
1 Abin… -3.70   55.5 M74      at J… ML12 6RG Sing… Welcome… "Cars fr… <NA>      
2 Anna… -3.41   55.2 A74(M)   at J… DG11 1HD Sing… RoadChef "Parking… Yes (More…
3 Bald… -0.205  52.0 A1(M)    at J… SG7 5TR  Sing… Extra M… "First t… <NA>      
4 Beac… -0.630  51.6 M40      at J… HP9 2SE  Sing… Extra M…  <NA>     <NA>      
5 Birc… -2.23   53.6 M62      betw… OL10 2HQ Sepa… Moto     "Car - £… <NA>      
6 Birc…  0.192  51.9 M11      at J… CM23 5QZ Sing… Welcome… "Parking… <NA>      
# ℹ 5 more variables: is_single <lgl>, is_twin <lgl>, has_walk <lgl>,
#   pair_name <chr>, is_pair <lgl>

Let’s make a map

I’ve obtained some high quality shapefiles for the UK from the ONS which I’m going to immediately start throwing information away from.

  • There aren’t any true service stations in Northern Ireland, so we’ll include only England, Scotland and Wales

  • There are only service stations on the mainland! So let’s discount any polygon with an area smaller than 1E10m^2

Code
data_sf_uk <- read_sf(quarto_here("Countries_December_2023_Boundaries_UK_BFC_-5189344684762562119/"))

data_sf_gb_mainland <- data_sf_uk %>% 
  filter(CTRY23NM != "Northern Ireland") %>% 
  st_cast("POLYGON") %>% 
  mutate(area = as.numeric(st_area(geometry))) %>% 
  filter(area >= 1E10) %>%
  # st_union() %>%
  st_as_sf()

data_sf_gb_mainland
Simple feature collection with 3 features and 9 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 134112.4 ymin: 11429.67 xmax: 655653.8 ymax: 976859.9
Projected CRS: OSGB36 / British National Grid
# A tibble: 3 × 10
  CTRY23CD  CTRY23NM CTRY23NMW  BNG_E  BNG_N  LONG   LAT GlobalID               
* <chr>     <chr>    <chr>      <dbl>  <dbl> <dbl> <dbl> <chr>                  
1 E92000001 England  Lloegr    394883 370883 -2.08  53.2 ea73ad5d-1f4e-4f07-8e0…
2 S92000003 Scotland Yr Alban  277744 700060 -3.97  56.2 f2267107-2e4a-442e-bc8…
3 W92000004 Wales    Cymru     263405 242881 -3.99  52.1 d818bd0d-8e08-446f-889…
# ℹ 2 more variables: geometry <POLYGON [m]>, area <dbl>
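To get a feel for the 1E10 threshold used in the filter above: the shapefile is in British National Grid, so st_area() reports square metres, and 1E10 m^2 is 10,000 km^2. Here is a small arithmetic sketch; the areas are approximate, rounded figures included only to show which side of the threshold typical polygons fall on:

```r
# The threshold used in the filter above, in square metres
threshold_m2 <- 1e10
threshold_km2 <- threshold_m2 / 1e6   # 1 km^2 = 1,000,000 m^2

# Approximate areas in km^2, rounded and for illustration only
approx_area_km2 <- c(isle_of_wight = 380, anglesey = 714, mainland_wales = 20000)

# Islands fall well below the threshold, the mainland polygon stays
kept <- approx_area_km2 >= threshold_km2
kept
```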

It takes a fair amount of time to plot, so I can use {rmapshaper} to simplify the borders, which look okay:

Code
library("rmapshaper")

data_sf_simpler_mainland <- ms_simplify(data_sf_gb_mainland, keep = 0.0005)

ggplot() +
  geom_sf(data = data_sf_simpler_mainland) +
  geom_sf(data = filter(data_sf_service_locs, is_ireland == FALSE )) +
  coord_sf(crs = 4326,
           ylim = c(50, 59))

Let’s build towards an okay looking chart:

Code
library("patchwork")
library("ggtext")

data_plot_services <- data_sf_service_locs %>% 
  filter(is_ireland == FALSE) %>% 
  left_join(select(data_operators, operator, n_named_sites_all)) %>% 
  mutate(operator = fct_reorder(operator, n_named_sites_all))

colour_motorway_blue <- "#3070B5"

gg_services_roadless <- ggplot() +
  geom_sf(data = st_transform(data_sf_simpler_mainland, crs = 4326),
          fill = colour_motorway_blue,
          colour = "white",
          linewidth = 0.8) +
  geom_sf(data = st_transform(data_plot_services, crs = 4326),
          aes(fill = operator),
          pch = 21,
          size = 3.5,
          colour = "white") +
  geom_richtext(aes(x = -9,
                    y = 54,
                    label = "Tiredness can kill<br>Take a break"),
                family = "Transport",
                fill = "transparent",
                label.color = NA,
                colour = "white"
  ) +
  scale_fill_brewer(palette = "Dark2") +
  guides(fill = guide_legend(
    # override.aes = list(size = 8), 
                             title = "", reverse = TRUE)
         ) +
  scale_x_continuous(labels = scales::label_number(accuracy = 0.01)) +
  scale_y_continuous(labels = scales::label_number(accuracy = 0.01)) +
  coord_sf(crs = 4326,
           ylim = c(50, 59),
           xlim = c(-12, 1.76)) + 
  # theme_classic(base_family = "Transport") +
  theme_void(base_family = "Transport") +
  theme(legend.text = element_text(colour = "white"),
        # legend.spacing.y = unit(2.0, "cm"),
        legend.background = element_rect(fill = colour_motorway_blue, colour = "transparent"),
        plot.background = element_rect(fill = colour_motorway_blue),
        panel.background = element_blank()
  )

gg_services_roadless

Making it look like a motorway sign

There is a simply beautiful design guide for UK traffic signs that goes into all of the detail, for instance:

At some point it could be fun to take all of this and convert it into a {ggplot2} theme - but that’s a lot of work. I want to focus on getting that nice round white border on my chart. That’s more difficult than I originally thought; there are two pathways:

  • Fiddle around with grobs thanks to Claus Wilke’s great Stack Overflow answer on adding round corners to the panel border.

  • Shove a rounded rectangle onto the chart through the geom_rrect() function from {ggchicklet}… which is much easier:

Code
library("ggchicklet") # remotes::install_github("hrbrmstr/ggchicklet")
gg_services_roadless +
  geom_rrect(aes(xmin = -12, xmax = 1.76, ymin = 50, ymax = 59),
             fill = "transparent",
             colour = "white",
             r = unit(0.1, 'npc'))

Now let’s rebuild the chart and set the sizing to work well on export :)

Code
lims_x <- list(min = -12.9, max = 1.86)
lims_y <- list(min = 50.2, max = 58.5)

gg_services_roadless <- ggplot() +
  geom_rrect(aes(xmin = lims_x$min - 0.8, 
                 xmax = lims_x$max + 0.8, 
                 ymin = lims_y$min - 0.65, 
                 ymax = lims_y$max + 0.65),
             fill = colour_motorway_blue,
             colour = "white",
             size = 15,
             r = unit(0.1, 'npc')) +
  geom_sf(data = st_transform(data_sf_simpler_mainland, crs = 4326),
          fill = colour_motorway_blue,
          colour = "white",
          linewidth = 0.8) +
  geom_sf(data = st_transform(data_plot_services, crs = 4326),
          aes(fill = operator),
          pch = 21,
          size = 3.5,
          colour = "white") +
  geom_richtext(aes(x = -8.5,
                    y = 53.7,
                    label = "Tiredness can kill<br>Take a break"), 
                size = 20,
                family = "Transport",
                fill = "transparent",
                label.color = NA,
                colour = "white"
                ) +
  scale_fill_brewer(palette = "Dark2") +
  guides(fill = guide_legend(override.aes = list(size = 8), title = "", reverse = TRUE)) +
  scale_x_continuous(labels = scales::label_number(accuracy = 0.01), expand = expansion(add = 1)) +
  scale_y_continuous(labels = scales::label_number(accuracy = 0.01), expand = expansion(add = c(1, 1))) +
  coord_sf(crs = 4326,
           ylim = as.numeric(lims_y),
           xlim = as.numeric(lims_x)) + 
  # theme_classic(base_family = "Transport") +
  theme_void(base_family = "Transport") +
  theme(legend.position=c(.85,.75),
        legend.text = element_text(colour = "white", size = 20),
        legend.spacing.y = unit(2.0, "cm"),
    legend.key.size = unit(1.7, "cm"),
    legend.background = element_rect(fill = "transparent", colour = "transparent"),
    plot.background = element_rect(fill = "grey90", colour = "transparent"),
        panel.background = element_blank(),
        plot.margin = margin(t = 1, r = 0, b = 1, l = 0)
    )

gg_services_roadless

Code
ggsave(quarto_here("gg_services_roadless_simple.png"),
       gg_services_roadless,
       width = 2 * 7.2,
       height = 2 * 7.5,
       bg = "grey90")
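For reference, ggsave() measures width and height in inches and uses 300 dpi unless told otherwise, so the pixel size of the exported PNG can be worked out directly:

```r
# Width and height as passed to ggsave() above, in inches
width_in  <- 2 * 7.2
height_in <- 2 * 7.5
dpi <- 300  # ggsave()'s default when dpi is not supplied

size_px <- c(width_px = width_in * dpi, height_px = height_in * dpi)
size_px
```

That image is large, which is why the text and legend sizes in the chart are set so high.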

Citation

BibTeX citation:
@online{hadley2024,
  author = {Hadley, Charlie},
  title = {Data {Quest:} {Motorway} {Services} {UK}},
  date = {2024-11-22},
  url = {https://visibledata.co.uk/posts/2024-11-13_motorway-service-stations-info/},
  langid = {en}
}
For attribution, please cite this work as:
Hadley, Charlie. 2024. “Data Quest: Motorway Services UK.” November 22, 2024. https://visibledata.co.uk/posts/2024-11-13_motorway-service-stations-info/.