diff --git a/.Rbuildignore b/.Rbuildignore
index ac1b2b1..c26ed3b 100644
--- a/.Rbuildignore
+++ b/.Rbuildignore
@@ -3,4 +3,5 @@
^\.httr-oauth$
^\.travis\.yml$
^CONTRIBUTING\.md$
-^cran-comments\.md$
\ No newline at end of file
+^cran-comments\.md$
+^revdep
\ No newline at end of file
diff --git a/DESCRIPTION b/DESCRIPTION
index dbb278b..d2266da 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -1,14 +1,14 @@
Package: bigQueryR
Title: Interface with Google BigQuery with Shiny Compatibility
-Version: 0.1.0.9000
+Version: 0.2.0
Authors@R: c(person("Mark", "Edmondson",email = "r@sunholo.com",
role = c("aut", "cre")),
person("Hadley", "Wickham", , "hadley@rstudio.com", role = "ctb")
)
-Description: Interface with Google BigQuery,
+Description: Interface with 'Google BigQuery',
+    see <https://cloud.google.com/bigquery/> for more information.
diff --git a/README.md b/README.md
--- a/README.md
+++ b/README.md
 This is a package for interacting with BigQuery from within R.
+You may instead want to use bigrquery, which is more developed, with integration with dplyr
 etc. Some functions from bigrquery
 are used in this package.
 This package is here as it uses googleAuthR as a backend, so it has Shiny support and compatibility with other googleAuthR-dependent packages.
+It also has support for data extracts to Google Cloud Storage, meaning you can download data and make the download URL available to a user via their Google email. If you run a normal query with over 100,000 results, it hangs and errors, so use the asynchronous functions described below instead.
+An example of a BigQuery Shiny app running OAuth2 is the BigQuery Visualiser.
+
+Authentication is as used in the other googleAuthR
 libraries:
+```r
 library(bigQueryR)
+
+## this will open your browser
+## authenticate with an email that has access to the BigQuery project you need
+bqr_auth()
+
+## authenticate under a new user
+bqr_auth(new_user = TRUE)
+```
 If you are authenticating under several APIs via googleAuthR,
 then use gar_auth()
 instead, with the appropriate scopes set, as sketched below.
 You can also use service-to-service JSON files and multi-user authentication under Shiny; see the googleAuthR
 readme for details.
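+A minimal sketch of combined-scope authentication, assuming googleAuthR's gar_auth() and gar_auth_service(); the scope URLs and file name here are illustrative, so set whichever your APIs require:
+```r
+library(googleAuthR)
+
+## set the combined scopes before authenticating
+## (these scope URLs are illustrative; use whichever your APIs need)
+options(googleAuthR.scopes.selected = c(
+  "https://www.googleapis.com/auth/bigquery",
+  "https://www.googleapis.com/auth/analytics.readonly"
+))
+
+gar_auth()
+
+## alternatively, service-to-service authentication with a JSON key file
+## (the file name is a placeholder)
+# gar_auth_service("service-account-key.json")
+```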
 Various functions for listing what is in your BigQuery account:
+```r
+library(bigQueryR)
+
+## this will open your browser
+## authenticate with an email that has access to the BigQuery project you need
+bqr_auth()
+
+## get projects
+projects <- bqr_list_projects()
+
+my_project <- projects[1]
+
+## for the first project, get datasets
+datasets <- bqr_list_datasets(my_project)
+
+my_dataset <- datasets[1]
+
+## list tables
+tables <- bqr_list_tables(my_project, my_dataset)
+
+my_table <- tables[1]
+
+## get metadata for the table
+meta_table <- bqr_table_meta(my_project, my_dataset, my_table)
+```
 You can pass in queries that return under ~100,000 rows using this command:
+```r
+bqr_query("big-query-r", "samples",
+          "SELECT COUNT(repository.url) FROM [publicdata:samples.github_nested]")
+```
 More than that and the API starts to hang, and you are limited by your download bandwidth.
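+If a query may approach that limit, you can cap the rows returned; a minimal sketch, assuming bqr_query()'s maxResults argument:
+```r
+## maxResults caps how many rows the query returns
+smaller <- bqr_query("big-query-r", "samples",
+                     "SELECT repository.url FROM [publicdata:samples.github_nested]",
+                     maxResults = 1000)
+```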
+For bigger queries, asynchronous queries save the results to another BigQuery table. You can check the progress of the job via bqr_get_job().
+```r
 library(bigQueryR)
+
+## auth with a project that has at least BigQuery and Google Cloud Storage scope
+bqr_auth()
+
+## make a big query
+job <- bqr_query_asynch("your_project",
+                        "your_dataset",
+                        "SELECT * FROM blah LIMIT 9999999",
+                        destinationTableId = "bigResultTable")
+
+## poll the job to check its status
+## it is done when job$status$state == "DONE"
+bqr_get_job("your_project", job$jobReference$jobId)
+
+## once done, the query results are in "bigResultTable"
+```
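+If you want to block until the job finishes, a minimal polling sketch using the same bqr_get_job() call (the five-second wait is an arbitrary choice):
+```r
+job_id <- job$jobReference$jobId
+job <- bqr_get_job("your_project", job_id)
+
+## poll every 5 seconds until BigQuery reports the job as done
+while (job$status$state != "DONE") {
+  Sys.sleep(5)
+  job <- bqr_get_job("your_project", job_id)
+}
+```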
 You may now want to download this data. For large datasets, this is best done by extracting the BigQuery result to Google Cloud Storage, then downloading the data from there.
+You can create a bucket in Google Cloud Storage via the web interface (see https://cloud.google.com/storage/docs/cloud-console), or from R via library(googleCloudStorageR), as sketched below.
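+A minimal sketch of creating the bucket from R, assuming googleCloudStorageR's gcs_create_bucket() with placeholder bucket and project names:
+```r
+library(googleCloudStorageR)
+
+## create a bucket to hold the extracts (names are placeholders)
+gcs_create_bucket("your_cloud_storage_bucket_name",
+                  projectId = "your_project")
+```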
+Once the bucket is created, you can extract your data via the below:
+```r
+## create the data extract from BigQuery to Cloud Storage
+job_extract <- bqr_extract_data("your_project",
+                                "your_dataset",
+                                "bigResultTable",
+                                "your_cloud_storage_bucket_name")
+
+## poll the extract job to check its status
+## it is done when job_extract$status$state == "DONE"
+bqr_get_job("your_project", job_extract$jobReference$jobId)
+
+## to download via a URL without logging in to the Google Cloud Storage interface:
+## use an email that is Google-account enabled
+## requires scopes:
+##   https://www.googleapis.com/auth/devstorage.full_control
+##   https://www.googleapis.com/auth/cloud-platform
+## set via options("bigQueryR.scopes") and re-authenticate if needed
+
+download_url <- bqr_grant_extract_access(job_extract, "your@email.com")
+
+## download_url may hold multiple URLs if the data is > 1GB
+download_url
+> [1] "https://storage.cloud.google.com/big-query-r-extracts/extract-20160311112410-000000000000.csv"
+> [2] "https://storage.cloud.google.com/big-query-r-extracts/extract-20160311112410-000000000001.csv"
+> [3] "https://storage.cloud.google.com/big-query-r-extracts/extract-20160311112410-000000000002.csv"
+```
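+To pull an extract straight back into R instead of via the browser, a sketch assuming googleCloudStorageR's gcs_get_object() and the bucket/object names shown above:
+```r
+library(googleCloudStorageR)
+
+## download and parse the CSV extract into a data.frame
+csv_extract <- gcs_get_object("extract-20160311112410-000000000000.csv",
+                              bucket = "big-query-r-extracts")
+```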