Writing verbatim SQL in pipe chains using {dbplyr}

Author

Charlie Hadley

Published

June 15, 2022

I’ve been trying to use the {dbplyr} package when working with SQL and R. It provides a really nice interface for writing tidyverse code against a database connection. I feel that {dbplyr} gives a generally nice interface than using {DBI} which is the approach used in db.rstudio.com, but it’s under documented.

Here’s a comparison of code using {DBI} and {dbplyr}

library(tidyverse)
library(fivethirtyeight)
library(dbplyr)

bechdel_tbl <- memdb_frame(bechdel, .name = "bechdel")

bechdel_tbl %>% 
  filter(year < 2000) %>% 
  group_by(clean_test) %>% 
  summarise(mean_budget_2013 = mean(budget_2013))
# Source:   SQL [5 x 2]
# Database: sqlite 3.38.5 [:memory:]
  clean_test mean_budget_2013
  <chr>                 <dbl>
1 dubious           59202525.
2 men               45896513.
3 notalk            57963237.
4 nowomen           51891839.
5 ok                44990138.
library(tidyverse)
library(fivethirtyeight)
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")
copy_to(con, bechdel, "bechdel")

tbl(con, "bechdel") %>% 
  filter(year < 2000) %>% 
  group_by(clean_test) %>% 
  summarise(mean_budget_2013 = mean(budget_2013))
# Source:   SQL [5 x 2]
# Database: sqlite 3.38.5 [:memory:]
  clean_test mean_budget_2013
  <chr>                 <dbl>
1 dubious           59202525.
2 men               45896513.
3 notalk            57963237.
4 nowomen           51891839.
5 ok                44990138.

Now, that’s great for when we want to write new queries. But I wanted to go in the other direction. There’s lots of existing SQL code that I don’t want to re-write into R code, so it would be nice to write verbatim SQL within the pipe syntax. I couldn’t figure it out from the documentation so asked the question on Twitter, and as is often the case Garrick Aden-Buie gave me the solution.

tweetrmd::tweet_embed("https://twitter.com/grrrck/status/1537082896124616705")

Let’s write that into a code chunk:

bechdel_tbl %>% 
  remote_con() %>% 
  tbl(sql("SELECT `clean_test`, AVG(`budget_2013`) AS `mean_budget_2013`
FROM `bechdel`
WHERE (`year` < 2000.0)
GROUP BY `clean_test`"))
# Source:   SQL [5 x 2]
# Database: sqlite 3.38.5 [:memory:]
  clean_test mean_budget_2013
  <chr>                 <dbl>
1 dubious           59202525.
2 men               45896513.
3 notalk            57963237.
4 nowomen           51891839.
5 ok                44990138.

Note that we’re still connected to the database here. If we want to return a local tibble we need to use collect() which means our entire workflow is as follows:

bechdel_tbl <- memdb_frame(bechdel, .name = "bechdel")

bechdel_tbl %>% 
  remote_con() %>% 
  tbl(sql("SELECT `clean_test`, AVG(`budget_2013`) AS `mean_budget_2013`
FROM `bechdel`
WHERE (`year` < 2000.0)
GROUP BY `clean_test`")) %>% 
  collect()
# A tibble: 5 × 2
  clean_test mean_budget_2013
  <chr>                 <dbl>
1 dubious           59202525.
2 men               45896513.
3 notalk            57963237.
4 nowomen           51891839.
5 ok                44990138.

Reuse

Citation

BibTeX citation:
@online{hadley2022,
  author = {Charlie Hadley},
  title = {Writing Verbatim {SQL} in Pipe Chains Using \{Dbplyr\}},
  date = {2022-06-15},
  url = {https://visibledata.co.uk/posts/2022-06-15_verbatim-sql-in-dbplyr-pipe-chains},
  langid = {en}
}
For attribution, please cite this work as:
Charlie Hadley. 2022. “Writing Verbatim SQL in Pipe Chains Using {Dbplyr}.” June 15, 2022. https://visibledata.co.uk/posts/2022-06-15_verbatim-sql-in-dbplyr-pipe-chains.