Skip to content

Latest commit

 

History

History
213 lines (173 loc) · 7.23 KB

README.md

File metadata and controls

213 lines (173 loc) · 7.23 KB

squr-hex

squr [/'skju:əɹ/]: Structured Query Utility for R

This is a substantially re-written fork of Stefan Milton Bache's squr package.

The squr (pronounced "skewer") package provides a set of tools for managing structured query language (SQL) files in R projects. It promotes the separation of logic and query scripts and eases the process of reading, parameterizing, and composing SQL queries.

The primary changes in this fork are: the removal of several features that are unnecessary for my own purposes (transactions, INSERT statements, ignore blocks, composing SQL snippets with +) although I may add them back at a later date. Additionally, I've made significant internal changes to handle the (specific to me) situation of needing to run queries with very long IN clauses with more than 1000 elements. This includes new parameter specification methods that use the helper functions IN() and LIKE(). Finally there is a minimal system for documenting the associated SQL files.

Who is squr for?

squr uses DBI::sqlInterpolate and DBI::dbQuoteString to parse values, but also allows for arbitrary string manipulation of SQL queries via sq_replace. Interpolating values is not as secure as using each database's parameterized query functionality.

Because of this, squr is aimed at users for whom SQL injection attacks are a fairly low concern. This may include situations like,

  • you interact with a local database on your machine with limited or no internet connectivity
  • SQL parameter input values are always generated by you, not unknown remote user input
  • you write code against a database for which you only have SELECT privileges
  • the code you write with squr is not being run by anyone other than you (e.g. not in web apps like Shiny)

The farther away from these sorts of situations you are, the less likely it is that squr is a secure choice. However, whenever the values originate from user input (e.g. in a Shiny/web application, or web services, etc), approprite precautions should still be taken (type checking, whitelisting, etc.)

OWASP has some good guidelines on how to prevent SQL injection in your applications.

Example

In the following example the sql/query.sql file is read, parameterized, and sent. There is no database connectivivty in squr, this is left fully flexible and out of scope.

# Simple example of a query function wrapper. This part varies depending
# on database, drivers, etc, but needs only setup once.
rodbc <- function(query){
  ch <- RODBC::odbcDriverConnect("<connectionstring>")
  on.exit(RODBC::odbcClose(ch))
  RODBC::sqlQuery(ch, query)
}

# Example of reading file, setting a parameter value, and sending the query,
# using the `sq_*` ("skew") functions.
result <- 
  sq_file("sql/query") %>% 
  sq_set(Param = value) %>%
  sq_send(.with = rodbc)

The corresponding query.sql file could look something like:

SELECT *
  FROM TheTable
 WHERE Param = @Param

Note that many arguments in squr are prefixed with a dot; this is to avoid name clash with values in ... arguments.

Arguments are not actually bound to parameters in sq_set, they are merely recorded in the object. Parameters are bound just before being sent to the database in sq_send.

Note:

  • the .sql extension can be omitted,
  • when called from within a package, the path is relative to the installation folder.

There is also sq_text, which is a way to use inline SQL. Both sq_file and sq_text produces S3 objects which are lists with an additional class sq to enable a few methods (e.g. print).

Parameterizing IN Clauses

Suppose we have the following SQL:

SELECT *
  FROM table
 WHERE column IN @column

and we'd like to be able to run this query with various collections of values inserted for @column. In squr we can do the following:

result <- sq_text(.query = "select * from table where column in @column") %>%
  sq_set(column = IN(1:10)) %>%
  sq_send(.with = rodbc)

Wrapping the vector of values to be bound with IN() ensures that they are properly formatted and wrapped in parens. This will work for only a single value, converting IN(1) into column IN (1). Also, if the vector of values passed is longer than 1000 squr will split it into chunks of size at most 1000 and then send multiple queries for each chunk, rbinding the results.

Splitting IN clauses like this only works for one IN clause per SQL query.

If our query looked like this:

SELECT *
  FROM table
 WHERE column LIKE @column

we can pass in values for @column like this:

result <- sq_text(.query = "select * from table where column like @column") %>%
  sq_set(column = LIKE("foo",side = "l",wildcard = "%")) %>%
  sq_send(.with = rodbc)

The resulting bound query would look like this:

SELECT *
  FROM table
 WHERE column LIKE '%foo'

Replacing SQL Text Inline

There is also a function sq_replace that is a wrapper for gsub that allows you to edit SQL text, but only prior to setting parameter values with sq_set.

Dynamic Table and Column Names

Since values are appropriately quoted when they are bound, the default replacements will not work for dynamically specifying column and table names. However, you can use sq_value explicitly (this is the function used internally to prepare a value for SQL):

-- The SQL file
SELECT [Date]
     , [CustomerId]
     , [CustomerName]
     , @Feature
  FROM Customers
 WHERE Date BETWEEN @DateFrom AND @DateTo
# R
result <- 
  sq_file("customers") %>% 
  sq_set(DateFrom = Sys.Date() - 10, 
         DateTo   = Sys.Date(), 
         Feature  = sq_value("Turnover", quote = "[")) %>% 
  sq_send(.with = rodbc)

Documenting SQL Queries

squr allows for limited documentation within your .sql files, using a very simple syntax along the lines of roxygen2. All commented out lines at the beginning of a .sql file are assumed to be documentation. An example illustrating the format is:

-- Test Query
--
-- Pulls `column1` and `column1` from `my_table` filtering on a particular value for
-- `condition`.
--
-- @param foo a value for condition, e.g. "bar"
-- @param bar a value for another condition, this time with a
-- longer description across multiple lines.
-- @functions my_fun, your_fun, lots_of_fun,
-- more_fun, final_fun
-- @scripts script1.R, script2.R
select
  column1,
  column2
from
  my_table
where
  condition = @foo
  another_condition = @bar

The first line, Test Query is the query title. The next section forms the description, with a blank commented line above and below.

Only three tags are recognized:

  • @param a single word denoting the name of the paramter, a space and then a description of the parameter
  • @functions a comma separated list of functions that the SQL query is used in, e.g. for when squr is used within another package
  • @scripts a comma separated list of .R scripts that the SQL query is used in, e.g. for when squr is used in an RStudio project.

Other tags are ignored.

See also

A similar (but different) project for Clojure (with ports for some other languages) by @krisajenkins is Yesql.