This directory contains community contributed user-defined functions
to extend BigQuery for more specialized usage patterns. Each UDF within this
directory will be automatically synchronized to the bqutil
project within the
fn
dataset for reference in queries.
For example, if you'd like to reference the int
function within your query,
you can reference it like the following:
SELECT bqutil.fn.int(1.684)
- azimuth_to_geog_point
- bignumber_add
- bignumber_avg
- bignumber_div
- bignumber_mul
- bignumber_sub
- bignumber_sum
- chisquare_cdf
- corr_pvalue
- csv_to_struct
- cw_instr4
- cw_initcap
- cw_otranslate
- cw_stringify_interval
- cw_regex_mode
- cw_regexp_substr_4
- cw_regexp_substr_generic
- cw_regexp_substr_5
- cw_regexp_substr_6
- cw_map_create
- cw_map_get
- cw_regexp_instr_2
- cw_regexp_instr_3
- cw_regexp_instr_4
- cw_regexp_instr_generic
- cw_regexp_instr_6
- cw_regexp_replace_generic
- cw_regexp_replace_4
- cw_regexp_replace_5
- cw_regexp_replace_6
- cw_regexp_instr_5
- cw_array_min
- cw_array_median
- cw_array_max
- cw_array_distinct
- cw_next_day
- cw_td_nvp
- cw_convert_base
- cw_from_base
- cw_to_base
- cw_array_overlap
- cw_array_compact
- cw_runtime_parse_interval_seconds
- cw_url_encode
- cw_url_decode
- cw_url_extract_host
- cw_url_extract_protocol
- cw_url_extract_path
- cw_url_extract_port
- cw_url_extract_authority
- cw_url_extract_query
- cw_url_extract_file
- cw_url_extract_fragment
- cw_url_extract_parameter
- cw_regexp_extract
- cw_regexp_extract_n
- cw_regexp_extract_all
- cw_regexp_extract_all_n
- cw_json_array_contains_str
- cw_json_array_contains_num
- cw_json_array_contains_bool
- cw_json_array_get
- cw_json_array_length
- cw_substring_index
- cw_editdistance
- cw_round_half_even
- cw_round_half_even_bignumeric
- cw_getbit
- cw_setbit
- cw_lower_case_ascii_only
- cw_substrb
- cw_twograms
- cw_threegrams
- cw_nvp2json1
- cw_nvp2json3
- cw_nvp2json4
- cw_strtok
- cw_regexp_split
- cw_csvld
- cw_json_enumerate_array
- cw_ts_pattern_match
- cw_error_number
- cw_error_severity
- cw_error_state
- cw_find_in_list
- cw_map_parse
- cw_comparable_format_varchar_t
- cw_comparable_format_varchar
- cw_comparable_format_bigint_t
- cw_comparable_format_bigint
- cw_months_between
- cw_period_intersection
- cw_period_ldiff
- cw_period_rdiff
- cw_ts_overlap_buckets
- day_occurrence_of_month
- degrees
- find_in_set
- freq_table
- from_binary
- from_hex
- get_array_value
- getbit
- get_value
- int
- jaccard
- json_extract_keys
- json_extract_values
- json_typeof
- knots_to_mph
- kruskal_wallis
- last_day
- levenshtein
- linear_interpolate
- linear_regression
- mannwhitneyu
- median
- meter_to_miles
- miles_to_meters
- mph_to_knots
- nautical_miles_conversion
- nlp_compromise_number
- nlp_compromise_people
- normal_cdf
- percentage_change
- percentage_difference
- pi
- pvalue
- p_fisherexact
- radians
- random_int
- random_string
- random_value
- sure_cond
- sure_like
- sure_nonnull
- sure_range
- sure_values
- to_binary
- to_hex
- translate
- ts_gen_keyed_timestamps
- ts_linear_interpolate
- ts_session_group
- ts_slide
- ts_tumble
- t_test
- typeof
- url_decode
- url_encode
- url_keys
- url_param
- url_parse
- url_trim_query
- week_of_month
- y4md_to_date
- zeronorm
Takes an input latitude, longitude, azimuth, and distance (in miles) and returns the corresponding latitude and longitude as a BigQuery GEOGRAPHY point.
SELECT bqutil.fn.azimuth_to_geog_point(30.2672, 97.7431, 312.9, 1066.6);
POINT(81.4417483906444 39.9606210457152)
Safely allows mathematical addition on numbers of any magnitude. Returns the result as a string.
SELECT bqutil.fn.bignumber_add(
'99999999999999999999999999999999999999999999999999999999999999999999', '2348592348793428978934278932746531725371625376152367153761536715376')
"102348592348793428978934278932746531725371625376152367153761536715375"
Safely allows calculating the average of numbers of any magnitude. Returns the result as a string.
SELECT bqutil.fn.bignumber_avg(
'99999999999999999999999999999999999999999999999999999999999999999999', '33333333333333333333333333333333333333333333333333333333333333333333', '66666666666666666666666666666666666666666666666666666666666666666666')
"66666666666666666666666666666666666666666666666666666666666666666666"
Safely allows mathematical division on numbers of any magnitude. Returns the result as a string.
SELECT bqutil.fn.bignumber_div(
'99999999999999999999999999999999999999999999999999999999999999999999', '33333333333333333333333333333333333333333333333333333333333333333333')
"3"
Safely allows mathematical multiplication on numbers of any magnitude. Returns the result as a string.
SELECT bqutil.fn.bignumber_mul(
'99999999999999999999999999999999999999999999999999999999999999999999', '893427328732842662772591830391462182598436547786876876876')
"89342732873284266277259183039146218259843654778687687687599999999999106572671267157337227408169608537817401563452213123123124"
Safely allows mathematical subtraction on numbers of any magnitude. Returns the result as a string.
SELECT bqutil.fn.bignumber_sub(
'99999999999999999999999999999999999999999999999999999999999999999999', '893427328732842662772591830391462182598436547786876876876')
"99999999999106572671267157337227408169608537817401563452213123123123"
Safely allows calculating the total sum of numbers of any magnitude. Returns the result as a string.
SELECT bqutil.fn.bignumber_sum(
'99999999999999999999999999999999999999999999999999999999999999999999', '893427328732842662772591830391462182598436547786876876876', '123456789123456789123456789123456789123456789123456789123456789123456789')
"123556789123457682550785521966119561715287180585639387560004576000333664"
Take a list of comma separated key-value pairs and creates a struct. Input: strList: string that has map in the format a:b,c:d.... Output: struct for the above map.
WITH test_cases AS (
SELECT NULL as s
UNION ALL
SELECT '' as s
UNION ALL
SELECT ',' as s
UNION ALL
SELECT ':' as s
UNION ALL
SELECT 'a:b' as s
UNION ALL
SELECT 'a:b,c:d' as s
UNION ALL
SELECT 'a:b' as s
)
SELECT key, value from test_cases as t, UNNEST(bqutil.fn.csv_to_struct(t.s)) s;
results:
key | value |
---|---|
a | b |
a | b |
c | d |
a | b |
Takes an input source string, search string within source, position and number of occurrence. It returns index number of last occurrence staring position from position in source.
SELECT bqutil.fn.cw_instr4('TestStr123456Str', 'Str', 1, 2);
14
Takes an input string and returns input string with first letter capital.
SELECT bqutil.fn.cw_initcap('teststr');
SELECT bqutil.fn.cw_initcap('test str');
Teststr
Test Str
Takes input source string with key and value. It returns source string with replacement of key with value.
SELECT bqutil.fn.cw_otranslate('Thin and Thick', 'Thk', 'Sp');
Spin and Spic
Formats the interval as 'day hour:minute:second
SELECT bqutil.fn.cw_stringify_interval(86100);
+0000 23:55:00
Retrieve mode.
SELECT bqutil.fn.cw_regex_mode('i');
SELECT bqutil.fn.cw_regex_mode('m');
SELECT bqutil.fn.cw_regex_mode('n);
ig
mg
sg
Takes input haystack string, needle string, position and occurence. It returns needle from the starting position if present with number of occurence time in haystack.
SELECT bqutil.fn.cw_regexp_substr_4('TestStr123456', 'Test', 1, 1);
SELECT bqutil.fn.cw_regexp_substr_4('TestStr123456Test', 'Test', 1, 2);
SELECT bqutil.fn.cw_regexp_substr_4('TestStr123456Test', 'Test', 1, 3);
Test
Test
null
Generic regex based substring function.
SELECT bqutil.fn.cw_regexp_substr_generic('TestStr123456', 'Test', 1, 1, 'g', 0);
Test
Takes input haystack string, needle string, position, occurence and mode. It returns needle from the starting position if present with number of occurence time in haystack. Mode can be g for global search, i for case insensetive search and m for multiline search.
SELECT bqutil.fn.cw_regexp_substr_5('TestStr123456', 'Test', 1, 1, 'g');
SELECT bqutil.fn.cw_regexp_substr_5('TestStr123456Test', 'test', 1, 2, 'i');
SELECT bqutil.fn.cw_regexp_substr_5('TestStr123456\nTest', 'Test', 1, 2, 'm');
Test
Test
Test
Takes input haystack string, needle string, position, occurence and mode. It returns needle from the starting position if present with number of occurence time in haystack. Mode can be g for global search, i for case insensetive search and m for multiline search.
SELECT bqutil.fn.cw_regexp_substr_6('TestStr123456', 'Test', 1, 1, 'g', 0);
Test
Given an array of keys and values, creates an array of struct containing matched <key,value> from each array. Number of elements in each array should be equal otherwise remaining values will be ignored.
SELECT bqutil.fn.cw_map_create([1, 2, 3], ['A', 'B', 'C']);
results:
Row | f0_.key | f0_.value |
---|---|---|
1 | 1 | A |
2 | B | |
3 | C |
Given an array of struct and needle, searches an array to find struct whose key-field matches needle, then it returns the value-field in the given struct.
SELECT bqutil.fn.cw_map_get([STRUCT(1 as key, "ABC" as value)], 1);
ABC
Takes input haystack string with needle string. Returns starting index of needle.
SELECT bqutil.fn.cw_regexp_instr_2('TestStr123456', 'Str');
SELECT bqutil.fn.cw_regexp_instr_2('TestStr123456', '90');
5
0
Takes input haystack string, needle string and starting positin from where search will start. Returns starting index of needle.
SELECT bqutil.fn.cw_regexp_instr_3('TestStr123456', 'Str', 0);
SELECT bqutil.fn.cw_regexp_instr_3('TestStr123456', 'Str', 6);
5
0
Takes input haystack string, needle string, starting positin from where search will start and number of occurance. Returns starting index of last needle.
SELECT bqutil.fn.cw_regexp_instr_4('TestStr123456', 'Str', 1, 1);
SELECT bqutil.fn.cw_regexp_instr_4('TestStr123456Str', 'Str', 1, 2);
SELECT bqutil.fn.cw_regexp_instr_4('TestStr123456Str', 'Str', 1, 3);
5
14
0
cw_regexp_instr_generic(haystack STRING, regexp STRING, p INT64, o INT64, returnopt INT64, mode STRING)
Takes input haystack string, needle string, starting positin from where search will start, number of occurance, returnopt number and mode. Returns end index +1 of last needle. Mode can be g for global search, i for case insensetive search and m for multiline search.
SELECT bqutil.fn.cw_regexp_instr_generic('TestStr123456', 'Str', 1, 1, 1, 'g');
8
Takes input haystack string, needle string, starting positin from where search will start, number of occurance, returnopt number and mode. Returns end index +1 of last needle. Mode can be g for global search, i for case insensetive search and m for multiline search.
SELECT bqutil.fn.cw_regexp_instr_6('TestStr123456', 'Str', 1, 1, 1, 'g');
8
cw_regexp_replace_generic(haystack STRING, regexp STRING, replacement STRING, offset INT64, occurrence INT64, mode STRING)
Generic regexp_replace, which is the 6-args version with regexp_mode already decoded
SELECT bqutil.fn.cw_regexp_replace_generic('TestStr123456', 'Str', '$:#>', 1, 1, 'i');
Test$:#>123456
Takes input haystack string, regular expression, replacement string and starting offset. It returns new string with replacement string matches accordingly regular expression.
SELECT bqutil.fn.cw_regexp_replace_4('TestStr123456', 'Str', 'Cad$', 1);
TestCad$123456
cw_regexp_replace_5(haystack STRING, regexp STRING, replacement STRING, offset INT64, occurrence INT64)
Takes input haystack string, regular expression, replacement string, starting offset and number of occurence which we want to replace. It returns new string with replacement string matches accordingly regular expression.
SELECT bqutil.fn.cw_regexp_replace_5('TestStr123456', 'Str', 'Cad$', 1, 1);
SELECT bqutil.fn.cw_regexp_replace_5('TestStr123456Str', 'Str', 'Cad$', 1, 2);
SELECT bqutil.fn.cw_regexp_replace_5('TestStr123456Str', 'Str', 'Cad$', 1, 1);
TestCad$123456
TestStr123456Cad$
TestCad$123456Str
cw_regexp_replace_6(haystack STRING, regexp STRING, replacement STRING, p INT64, o INT64, mode STRING)
Takes input haystack string, regular expression, replacement string, starting offset, number of occurence which we want to replace and mode. It returns new string with replacement string matches accordingly regular expression. Mode can be g for global search, i for case insensetive search and m for multiline search.
SELECT bqutil.fn.cw_regexp_replace_6('TestStr123456', 'Str', '$:#>', 1, 1, 'i');
Test$:#>123456
Takes input haystack string, needle string, starting positin from where search will start, number of occurance and returnopt number. Returns end index +1 of last needle. Mode can be g for global search, i for case insensetive search and m for multiline search.
SELECT bqutil.fn.cw_regexp_instr_5('TestStr123456', '123', 1, 1, 1);
11
Returns minimum of array.
SELECT bqutil.fn.cw_array_min([1, 2, 3, 4, 5]);
1
Returns median of array.
SELECT bqutil.fn.cw_array_median([1, 2, 3, 4, 5, 6]);
3.5
Returns maximum of array.
SELECT bqutil.fn.cw_array_max([1, 2, 3, 4, 5, 6]);
6
Returns distinct array.
SELECT bqutil.fn.cw_array_distinct([1, 2, 3, 4, 4, 5, 5]);
[1, 2, 3, 4, 5]
Returns the date of the first weekday (second arugment) that is later than the date specified by the first argument.
SELECT bqutil.fn.cw_next_day('2022-09-21', 'we');
2022-09-28
Extract a value from a key-value separated string
SELECT bqutil.fn.cw_td_nvp('entree:orange chicken#entree2:honey salmon', 'entree', '#', ':', 1);
orange chicken
Convert string from given base to another base
SELECT bqutil.fn.cw_convert_base('001101011', 2, 10);
SELECT bqutil.fn.cw_convert_base('A', 16, 2);
107
1010
Convert string from given base to decimal
SELECT bqutil.fn.cw_from_base('001101011', 2);
SELECT bqutil.fn.cw_from_base('A', 16);
107
10
Convert string from decimal to given base
SELECT bqutil.fn.cw_to_base(5, 2);
SELECT bqutil.fn.cw_to_base(10, 16);
101
a
Returns true if arrays are overlapped otherwise false.
SELECT bqutil.fn.cw_array_overlap([1, 2, 3], [4, 5, 6]);
SELECT bqutil.fn.cw_array_overlap([1, 2, 3], [2, 3, 4]);
false
true
Returns a compacted array with null values removed
SELECT bqutil.fn.cw_array_compact([1, 2, 3, null, 5]);
[1, 2, 3, 5]
Kludge for interval translation - for now day->sec only
SELECT bqutil.fn.cw_runtime_parse_interval_seconds(1 DAY);
86400
URL encode a string
SELECT bqutil.fn.cw_url_encode("?");
SELECT bqutil.fn.cw_url_encode("/");
%3F
%2F
URL decode a string
SELECT bqutil.fn.cw_url_decode("%3F");
SELECT bqutil.fn.cw_url_decode("%2F");
?
/
Extract the host from a url, return "" (empty string) if no host is found.
SELECT bqutil.fn.cw_url_extract_host('https://google.com');
google.com
Extract the protocol from a url, return "" (empty string) if no protocol is found.
SELECT bqutil.fn.cw_url_extract_protocol('https://google.com/test?key=val');
https
Extract the path from a url, returns "" (empty string) if no path is found.
SELECT bqutil.fn.cw_url_extract_path('https://www.test.com/collections-in-java#collectionmethods');
/collections-in-java
Extract the port from a url, returns null if no port is found.
SELECT bqutil.fn.cw_url_extract_port('https://localhost:8080/test?key=val');
8080
Extract the authority from a url, returns "" (empty string) if no authority is found.
SELECT bqutil.fn.cw_url_extract_authority('https://localhost:8080/test?key=val');
localhost:8080
Extract the query from a url, returns "" (empty string) if no query is found.
SELECT bqutil.fn.cw_url_extract_query('https://localhost:8080/test?key=val');
key=val
Extract the file from a url, returns "" (empty string) string if no file is found.
SELECT bqutil.fn.cw_url_extract_file('https://www.test.com/collections-in-java#collectionmethods');
/collections-in-java
Extract the fragment from a url, returns "" (empty string) if no fragment is found.
SELECT bqutil.fn.cw_url_extract_fragment('https://www.test.com/collections-in-java#collectionmethods');
collectionmethods
Extract the value of a query param from a url, returns null if the parameter isn't found.
SELECT bqutil.fn.cw_url_extract_parameter('https://www.test.com/collections-in-java&key=val#collectionmethods', 'key');
val
Returns the first substring matched by the regular expression regexp in str.
SELECT bqutil.fn.cw_regexp_extract('TestStr123456#?%&', 'Str');
Str
Finds the first occurrence of the regular expression regexp in str and returns the capturing group number groupn.
SELECT bqutil.fn.cw_regexp_extract_n('TestStr123456', 'Str', 0);
Str
Returns the substring(s) matched by the regular expression regexp in str.
SELECT bqutil.fn.cw_regexp_extract_all('TestStr123456', 'Str.*');
[Str123456]
Finds all occurrences of the regular expression regexp in str and returns the capturing group number groupn.
SELECT bqutil.fn.cw_regexp_extract_all_n('TestStr123456Str789', 'Str.*', 0);
Str123456Str789
Determine if value exists in json (a string containing a JSON array).
SELECT bqutil.fn.cw_json_array_contains_str('["name", "test", "valid"]', 'test');
true
Same as cw_json_array_contains_str(STRING, STRING) UDF but with needle = number.
SELECT bqutil.fn.cw_json_array_contains_num('[1, 2, 3, "valid"]', 1.0);
SELECT bqutil.fn.cw_json_array_contains_num('[1, 2, 3, "valid"]', 5.0);
true
false
Same as cw_json_array_contains_str(STRING, STRING) UDF but with needle = boolean
SELECT bqutil.fn.cw_json_array_contains_bool('[1, 2, 3, "valid", true]', true);
SELECT bqutil.fn.cw_json_array_contains_bool('[1, 2, 3, "valid", true]', false);
true
false
Returns the element at the specified index into the json_array. The index is zero-based
SELECT bqutil.fn.cw_json_array_get('[{"name": "test"}, {"name": "test1"}]', 1.0);
test1
Returns the array length of json (a string containing a JSON array)
SELECT bqutil.fn.cw_json_array_length('[{"name": "test"}, {"name": "test1"}]');
2
Takes input string, seperater string and index number. It returns index element.
SELECT bqutil.fn.cw_substring_index('TestStr123456,Test123', ',', 1);
TestStr123456
Similar to teradata's editdistance without weightages
SELECT bqutil.fn.cw_editdistance('Jim D. Swain', 'Jim D. Swain');
SELECT bqutil.fn.cw_editdistance('Jim D. Swain', 'John Smith');
0
9
Round half even number
SELECT bqutil.fn.cw_round_half_even(10, 10);
10
Round half even bignumeric number
SELECT bqutil.fn.cw_round_half_even_bignumeric(10, 10);
10
Get bit on given inex.
SELECT bqutil.fn.cw_getbit(11, 100);
SELECT bqutil.fn.cw_getbit(11, 3);
0
1
Set bit and return new bits
SELECT bqutil.fn.cw_setbit(1001, 2);
1005
Lowercases only ASCII characters within a given string.
SELECT bqutil.fn.cw_lower_case_ascii_only('TestStr123456#');
teststr123456#
Treats the multibyte character string as a string of octets (bytes).
SELECT bqutil.fn.cw_substrb('TestStr123', 0, 3);
Te
Takes input string with space. Space delimiter words will repeat two times and generate array.
SELECT bqutil.fn.cw_twograms('Test Str 123456 789');
["Test Str", "Str 123456", "123456 789"]
Takes input string with space. Space delimiter words will repeat three times and generate array.
SELECT bqutil.fn.cw_threegrams('Test 1234 str abc');
["Test 1234 str", "1234 str abc"]
Convert an input string of name-value pairs to a JSON object.
SELECT bqutil.fn.cw_nvp2json1('name=google&occupation=engineer&hair=color');
{"name":"google","occupation":"engineer","hair":"color"}
Convert an input string of name-value pairs to a JSON object. name_delim is delimiter for keys. val_delim is delimiter for key-value.
SELECT bqutil.fn.cw_nvp2json3('name=google&occupation=engineer&hair=color', '&', '=');
{"name":"google","occupation":"engineer","hair":"color"}
Convert an input string of name-value pairs to a JSON object. name_delim is delimiter for keys. val_delim is delimiter for key-value. ignore_char is to ignore and removed from output json.
SELECT bqutil.fn.cw_nvp2json4('name=google#1&occupation=engineer#2&hair=color#3', '&', '=', '#');
{"name":"google1","occupation":"engineer2","hair":"color3"}
Takes input string and delimiter. It generates pair from string tokenizer.
SELECT bqutil.fn.cw_strtok('Test#1', '#');
([STRUCT(CAST(1 AS INT64) AS tokennumber, "Test" AS token),
STRUCT(CAST(2 AS INT64) AS tokennumber, "1" AS token)])
Takes input string, delimiter and flags. It generates pair from string tokenizer. Flags works like Regex mode of javascript.
SELECT bqutil.fn.cw_regexp_split('Test#1', '#', 'i');
([STRUCT(CAST(1 AS INT64) AS tokennumber, "Test" AS token),
STRUCT(CAST(2 AS INT64) AS tokennumber, "1" AS token)])
Generates CSV array.
SELECT bqutil.fn.cw_csvld('Test#123', '#', '"', 2);
["Test", "123"]
Takes input JSON array and flatten it.
SELECT bqutil.fn.cw_json_enumerate_array('[{"name":"Cameron"}, {"name":"John"}]');
results:
Row | f0_.ordinal | f0_.jsonvalue |
---|---|---|
1 | 1 | {"name":"Cameron"} |
2 | {"name":"John"} |
ts_pattern_match is function that returns range of matched pattern in given UID, SID (user session)
SELECT bqutil.fn.cw_ts_pattern_match(['abc', 'abc'], ['abc']);
results:
Row | f0_.pattern_id | f0_.start | f0_.stop |
---|---|---|---|
1 | 1 | 1 | 1 |
2 | 2 | 2 |
Convert BQ generated error string to a number appropriate for other DBs
SELECT bqutil.fn.cw_error_number('Error Message');
1
Convert BQ generated error string to a number appropriate for other DBs
SELECT bqutil.fn.cw_error_severity('Error Message');
1
Convert BQ generated error string to a number appropriate for other DBs
SELECT bqutil.fn.cw_error_state('Error Message');
1
Find index of element in set.
SELECT bqutil.fn.cw_find_in_list("1", "[Test,1,2]");
2
String to map convert.
SELECT bqutil.fn.cw_map_parse("a=1 b=42", " ", "=");
([STRUCT("a" AS key, "1" AS value),
STRUCT("b" AS key, "42" AS value)])
Use hex to work around the separator problem (e.g. if separator = '-' then ['-', ''] and ['', '-'] both produce '--')
SELECT bqutil.fn.cw_comparable_format_varchar_t("2");
32
Use hex to work around the separator problem (e.g. if separator = '-' then ['-', ''] and ['', '-'] both produce '--')
SELECT bqutil.fn.cw_comparable_format_varchar(["2", "8"]);
32 38
Lexicographically '+' comes before '-' so we replace p(lus) and m(inus) and subtract LONG_MIN on negative values
SELECT bqutil.fn.cw_comparable_format_bigint_t(2);
p 2
Lexicographically '+' comes before '-' so we replace p(lus) and m(inus) and subtract LONG_MIN on negative values
SELECT bqutil.fn.cw_comparable_format_bigint([2, 8]);
p 2 p 8
Merges two periods together if they overlap and returns unique id for each merged bucket. Coalesces meeting periods as well (not just overlapping periods) if includeMeets is true.
SELECT bqutil.fn.cw_ts_overlap_buckets(false, [STRUCT(TIMESTAMP("2008-12-25"), TIMESTAMP("2008-12-31")), STRUCT(TIMESTAMP("2008-12-26"), TIMESTAMP("2008-12-30"))]);
results:
Row | f0_.bucketNo | f0_.st | f0_.et |
---|---|---|---|
1 | 1 | 2008-12-25 00:00:00 UTC | 2008-12-31 00:00:00 UTC |
Similar to Teradata and Netezza's months_between function
SELECT bqutil.fn.months_between(DATETIME '2005-03-01 10:34:56', DATETIME '2005-02-28 11:22:33');
0.12795698924731182795698924731182795699
cw_period_intersection(p1 STRUCT<lower TIMESTAMP, upper TIMESTAMP>, p2 STRUCT<lower TIMESTAMP, upper TIMESTAMP>)
SELECT bqutil.fn.cw_period_intersection(
STRUCT(TIMESTAMP '2001-11-12 00:00:00' AS lower, TIMESTAMP '2001-11-14 00:00:00' AS upper),
STRUCT(TIMESTAMP '2001-11-13 00:00:00' AS lower, TIMESTAMP '2001-11-15 00:00:00' AS upper))
STRUCT(TIMESTAMP '2001-11-13 00:00:00' AS lower, TIMESTAMP '2001-11-14 00:00:00' AS upper)
cw_period_ldiff(p1 STRUCT<lower TIMESTAMP, upper TIMESTAMP>, p2 STRUCT<lower TIMESTAMP, upper TIMESTAMP>)
SELECT bqutil.fn.cw_period_ldiff(
STRUCT(TIMESTAMP '2001-11-12 00:00:00' AS lower, TIMESTAMP '2001-11-14 00:00:00' AS upper),
STRUCT(TIMESTAMP '2001-11-13 00:00:00' AS lower, TIMESTAMP '2001-11-15 00:00:00' AS upper))
STRUCT(TIMESTAMP '2001-11-12 00:00:00' AS lower, TIMESTAMP '2001-11-13 00:00:00' AS upper)
cw_period_rdiff(p1 STRUCT<lower TIMESTAMP, upper TIMESTAMP>, p2 STRUCT<lower TIMESTAMP, upper TIMESTAMP>)
SELECT bqutil.fn.cw_period_rdiff(
STRUCT(TIMESTAMP '2001-11-13 00:00:00' AS lower, TIMESTAMP '2001-11-15 00:00:00' AS upper),
STRUCT(TIMESTAMP '2001-11-12 00:00:00' AS lower, TIMESTAMP '2001-11-14 00:00:00' AS upper))
STRUCT(TIMESTAMP '2001-11-14 00:00:00' AS lower, TIMESTAMP '2001-11-15 00:00:00' AS upper)
Returns the nth occurrence of the weekday in the month for the specified date. The result is an INTEGER value between 1 and 5.
SELECT
bqutil.fn.day_occurrence_of_month(DATE '2020-07-01'),
bqutil.fn.day_occurrence_of_month(DATE '2020-07-08');
1 2
Convert radians values into degrees.
SELECT bqutil.fn.degrees(3.141592653589793) is_this_pi
180.0
Returns the first occurance of str in strList where strList is a comma-delimited string. Returns null if either argument is null. Returns 0 if the first argument contains any commas. For example, find_in_set('ab', 'abc,b,ab,c,def') returns 3. Input: str: string to search for. strList: string in which to search. Output: Position of str in strList
WITH test_cases AS (
SELECT 'ab' as str, 'abc,b,ab,c,def' as strList
UNION ALL
SELECT 'ab' as str, 'mobile,tablet,mobile/tablet,phone,text' as strList
UNION ALL
SELECT 'mobile' as str, 'mobile,tablet,mobile/tablet,phone,text' as strList
UNION ALL
SELECT 'mobile,' as str, 'mobile,tablet,mobile/tablet,phone,text' as strList
)
SELECT bqutil.fn.find_in_set(str, strList) from test_cases
results:
f0_ |
---|
3 |
NULL |
1 |
0 |
Construct a frequency table (histogram) of an array of elements. Frequency table is represented as an array of STRUCT(value, freq)
SELECT bqutil.fn.freq_table([1,2,1,3,1,5,1000,5]) ft
results:
Row | ft.value | ft.freq |
---|---|---|
1 | 1 | 3 |
2 | 1 | |
3 | 1 | |
5 | 2 | |
1000 | 1 |
Returns a number in decimal form from its binary representation.
SELECT
bqutil.fn.to_binary(x) AS binary,
bqutil.fn.from_binary(bqutil.fn.to_binary(x)) AS x
FROM
UNNEST([1, 123456, 9876543210, -1001]) AS x;
results:
binary | x |
---|---|
0000000000000000000000000000000000000000000000000000000000000001 | 1 |
0000000000000000000000000000000000000000000000011110001001000000 | 123456 |
0000000000000000000000000000001001001100101100000001011011101010 | 9876543210 |
1111111111111111111111111111111111111111111111111111110000010111 | -1001 |
Returns a number in decimal form from its hexadecimal representation.
SELECT
bqutil.fn.to_hex(x) AS hex,
bqutil.fn.from_hex(bqutil.fn.to_hex(x)) AS x
FROM
UNNEST([1, 123456, 9876543210, -1001]) AS x;
results:
hex | x |
---|---|
0000000000000001 | 1 |
000000000001e240 | 123456 |
000000024cb016ea | 9876543210 |
fffffffffffffc17 | -1001 |
Given a key and a map, returns the ARRAY type value. This is same as get_value except it returns an ARRAY type. This can be used when the map has multiple values for a given key.
WITH test AS (
SELECT ARRAY(
SELECT STRUCT('a' AS key, 'aaa' AS value) AS s
UNION ALL
SELECT STRUCT('b' AS key, 'bbb' AS value) AS s
UNION ALL
SELECT STRUCT('a' AS key, 'AAA' AS value) AS s
UNION ALL
SELECT STRUCT('c' AS key, 'ccc' AS value) AS s
) AS a
)
SELECT bqutil.fn.get_array_value('b', a), bqutil.fn.get_array_value('a', a), bqutil.fn.get_array_value('c', a) from test;
results:
f0_ | f1_ | f2_ |
---|---|---|
["bbb"] | ["aaa","AAA"] | ["ccc"] |
Given an INTEGER value, returns the value of a bit at a specified position. The position of the bit starts from 0.
SELECT bqutil.fn.getbit(23, 2), bqutil.fn.getbit(23, 3), bqutil.fn.getbit(null, 1)
1 0 NULL
Given a key and a list of key-value maps in the form [{'key': 'a', 'value': 'aaa'}], returns the SCALAR type value.
WITH test AS (
SELECT ARRAY(
SELECT STRUCT('a' AS key, 'aaa' AS value) AS s
UNION ALL
SELECT STRUCT('b' AS key, 'bbb' AS value) AS s
UNION ALL
SELECT STRUCT('c' AS key, 'ccc' AS value) AS s
) AS a
)
SELECT bqutil.fn.get_value('b', a), bqutil.fn.get_value('a', a), bqutil.fn.get_value('c', a) from test;
results:
f0_ | f1_ | f2_ |
---|---|---|
bbb | aaa | ccc |
Convience wrapper which can be used to convert values to integers in place of
the native CAST(x AS INT64)
.
SELECT bqutil.fn.int(1) int1
, bqutil.fn.int(2.5) int2
, bqutil.fn.int('7') int3
, bqutil.fn.int('7.8') int4
1, 2, 7, 7
Note that CAST(x AS INT64) rounds the number, while this function truncates it. In many cases, that's the behavior users expect.
Accepts two string and returns the distance using Jaccard algorithm.
SELECT
bqutil.fn.jaccard('thanks', 'thaanks'),
bqutil.fn.jaccard('thanks', 'thanxs'),
bqutil.fn.jaccard('bad demo', 'abd demo'),
bqutil.fn.jaccard('edge case', 'no match'),
bqutil.fn.jaccard('Special. Character?', 'special character'),
bqutil.fn.jaccard('', ''),
1, 0.71, 1.0, 0.25, 0.67, 0.0
Returns all keys in the input JSON as an array of string Returns NULL if invalid JSON string is passed,
SELECT bqutil.fn.json_extract_keys(
'{"foo" : "cat", "bar": "dog", "hat": "rat"}'
) AS keys_array
foo
bar
hat
Returns all values in the input JSON as an array of string Returns NULL if invalid JSON string is passed,
SELECT bqutil.fn.json_extract_values(
'{"foo" : "cat", "bar": "dog", "hat": "rat"}'
) AS keys_array
cat
dog
rat
Returns the type of JSON value. It emulates json_typeof
of PostgreSQL.
SELECT
bqutil.fn.json_typeof('{"foo": "bar"}'),
bqutil.fn.json_typeof(TO_JSON_STRING(("foo", "bar"))),
bqutil.fn.json_typeof(TO_JSON_STRING([1,2,3])),
bqutil.fn.json_typeof(TO_JSON_STRING("test")),
bqutil.fn.json_typeof(TO_JSON_STRING(123)),
bqutil.fn.json_typeof(TO_JSON_STRING(TRUE)),
bqutil.fn.json_typeof(TO_JSON_STRING(FALSE)),
bqutil.fn.json_typeof(TO_JSON_STRING(NULL)),
object, array, string, number, boolean, boolean, null
Converts knots to miles per hour
SELECT bqutil.fn.knots_to_mph(37.7);
43.384406
Returns an integer number indicating the degree of similarity between two strings (0=identical, 1=single character difference, etc.)
SELECT
source,
target,
bqutil.fn.levenshtein(source, target) distance,
FROM UNNEST([
STRUCT('analyze' AS source, 'analyse' AS target),
STRUCT('opossum', 'possum'),
STRUCT('potatoe', 'potatoe'),
STRUCT('while', 'whilst'),
STRUCT('aluminum', 'alumininium'),
STRUCT('Connecticut', 'CT')
]);
Row | source | target | distance |
---|---|---|---|
1 | analyze | analyse | 1 |
2 | opossum | possum | 1 |
3 | potatoe | potatoe | 0 |
4 | while | whilst | 2 |
5 | aluminum | alumininium | 3 |
6 | Connecticut | CT | 10 |
This function is based on the Levenshtein distance algorithm which determines the minimum number of single-character edits (insertions, deletions or substitutions) required to change one source string into another target one.
Interpolate the current positions value from the preceding and folllowing coordinates
SELECT
bqutil.fn.linear_interpolate(2, STRUCT(0 AS x, 0.0 AS y), STRUCT(10 AS x, 10.0 AS y)),
bqutil.fn.linear_interpolate(2, STRUCT(0 AS x, 0.0 AS y), STRUCT(20 AS x, 10.0 AS y))
results:
f0_ | f1_ |
---|---|
2.0 | 1.0 |
Get the median of an array of numbers.
SELECT bqutil.fn.median([1,1,1,2,3,4,5,100,1000]) median_1
, bqutil.fn.median([1,2,3]) median_2
, bqutil.fn.median([1,2,3,4]) median_3
3.0, 2.0, 2.5
Converts meters to miles
SELECT bqutil.fn.meters_to_miles(5000.0);
3.1068559611866697
Converts miles to meters
SELECT bqutil.fn.miles_to_meters(2.73);
4393.50912
Converts miles per hour to knots
SELECT bqutil.fn.mph_to_knots(75.5);
65.607674794487224
Converts nautical miles to miles
SELECT bqutil.fn.nautical_miles_conversion(1.12);
1.2888736
Parse numbers from text.
SELECT bqutil.fn.nlp_compromise_number('one hundred fifty seven')
, bqutil.fn.nlp_compromise_number('three point 5')
, bqutil.fn.nlp_compromise_number('2 hundred')
, bqutil.fn.nlp_compromise_number('minus 8')
, bqutil.fn.nlp_compromise_number('5 million 3 hundred 25 point zero 1')
157, 3.5, 200, -8, 5000325.01
Extract names out of text.
SELECT bqutil.fn.nlp_compromise_people(
"hello, I'm Felipe Hoffa and I work with Elliott Brossard - who thinks Jordan Tigani will like this post?"
) names
["felipe hoffa", "elliott brossard", "jordan tigani"]
Calculate the percentage change (increase/decrease) between two numbers.
SELECT bqutil.fn.percentage_change(0.2, 0.4)
, bqutil.fn.percentage_change(5, 15)
, bqutil.fn.percentage_change(100, 50)
, bqutil.fn.percentage_change(-20, -45)
results:
f0_ | f1_ | f2_ | f3_ |
---|---|---|---|
1.0 | 2.0 | -0.5 | -1.125 |
Calculate the percentage difference between two numbers.
SELECT bqutil.fn.percentage_difference(0.2, 0.8)
, bqutil.fn.percentage_difference(4.0, 12.0)
, bqutil.fn.percentage_difference(100, 200)
, bqutil.fn.percentage_difference(1.0, 1000000000)
results:
f0_ | f1_ | f2_ | f3_ |
---|---|---|---|
1.2 | 1.0 | 0.6667 | 2.0 |
Returns the value of pi.
SELECT bqutil.fn.pi() this_is_pi
3.141592653589793
Convert degree values into radian.
SELECT bqutil.fn.radians(180) is_this_pi
3.141592653589793
Generate random integers between the min and max values.
SELECT bqutil.fn.random_int(0,10) randint, COUNT(*) c
FROM UNNEST(GENERATE_ARRAY(1,1000))
GROUP BY 1
ORDER BY 1
Returns a random value from an array.
SELECT
bqutil.fn.random_value(['tino', 'jordan', 'julie', 'elliott', 'felipe']),
bqutil.fn.random_value(['tino', 'jordan', 'julie', 'elliott', 'felipe']),
bqutil.fn.random_value(['tino', 'jordan', 'julie', 'elliott', 'felipe'])
'tino', 'julie', 'jordan'
If cond
is FALSE
the function cause error.
SELECT
`bqutil.fn.sure_cond`(x, x > 0)
FROM UNNEST([1, 2, 3, 4]) as x
If argument value
is matched by like_pattern
, the function returns value
as-is.
Otherwise it causes error.
SELECT
`bqutil.fn.sure_like`("[some_pattern]", "[%]") = "hoge";
If non-NULL argument is passed, the function returns input value
as-is.
However if NULL value is passed, it causes error.
SELECT
bqutil.fn.sure_nonnull(1),
bqutil.fn.sure_nonnull("string"),
bqutil.fn.sure_nonnull([1, 2, 3]),
Returns true if value is between lower_bound and upper_bound, inclusive.
SELECT
bqutil.fn.sure_range(1, 1, 10) == 1,
bqutil.fn.sure_range("b", "a", "b") == "b",
If argument value
is in acceptable_value_array
or NULL, the function returns input value
as-is.
Otherwise it causes error.
SELECT
`bqutil.fn.sure_values`("hoge", ["hoge", "fuga"]) = "hoge",
`bqutil.fn.sure_values`( NULL, ["hoge", "fuga"]) is NULL
Returns a binary representation of a number.
SELECT
x,
bqutil.fn.to_binary(x) AS binary
FROM
UNNEST([1, 123456, 9876543210, -1001]) AS x;
results:
x | binary |
---|---|
1 | 0000000000000000000000000000000000000000000000000000000000000001 |
123456 | 0000000000000000000000000000000000000000000000011110001001000000 |
9876543210 | 0000000000000000000000000000001001001100101100000001011011101010 |
-1001 | 1111111111111111111111111111111111111111111111111111110000010111 |
Returns a hexadecimal representation of a number.
SELECT
x,
bqutil.fn.to_hex(x) AS hex
FROM
UNNEST([1, 123456, 9876543210, -1001]) AS x;
results:
x | hex |
---|---|
1 | 0000000000000001 |
123456 | 000000000001e240 |
9876543210 | 000000024cb016ea |
-1001 | fffffffffffffc17 |
Returns a random string of specified length. Individual characters are chosen uniformly at random from the following pool of characters: 0-9, a-z, A-Z.
SELECT
bqutil.fn.random_string(5),
bqutil.fn.random_string(7),
bqutil.fn.random_string(10)
'mb3AP' 'aQG5XYB' '0D5WFVQuq6'
For a given expression, replaces all occurrences of specified characters with specified substitutes. Existing characters are mapped to replacement characters by their positions in the characters_to_replace
and characters_to_substitute
arguments. If more characters are specified in the characters_to_replace
argument than in the characters_to_substitute
argument, the extra characters from the characters_to_replace
argument are omitted in the return value.
SELECT bqutil.fn.translate('mint tea', 'inea', 'osin')
most tin
Generate a timestamp array associated with each key
SELECT *
FROM
UNNEST(bqutil.fn.ts_gen_keyed_timestamps(['abc', 'def'], 60, TIMESTAMP '2020-01-01 00:30:00', TIMESTAMP '2020-01-01 00:31:00))
series_key | tumble_val |
---|---|
abc | 2020-01-01 00:30:00 UTC |
def | 2020-01-01 00:30:00 UTC |
abc | 2020-01-01 00:31:00 UTC |
def | 2020-01-01 00:31:00 UTC |
ts_linear_interpolate(pos TIMESTAMP, prev STRUCT<x TIMESTAMP, y FLOAT64>, next STRUCT<x TIMESTAMP, y FLOAT64>)
Interpolate the positions value using timestamp seconds as the x-axis
select bqutil.fn.ts_linear_interpolate(
TIMESTAMP '2020-01-01 00:30:00',
STRUCT(TIMESTAMP '2020-01-01 00:29:00' AS x, 1.0 AS y),
STRUCT(TIMESTAMP '2020-01-01 00:31:00' AS x, 3.0 AS y)
)
f0_ |
---|
2.0 |
Function to compare two timestamp as being within the same session window. A timestamp in the same session window as its previous timestamp will evaluate as NULL, otherwise the current row's timestamp is returned. The "LAST_VALUE(ts IGNORE NULLS)" window function can then be used to stamp all rows with the starting timestamp for the session window.
--5 minute (300 seconds) session window
WITH ticks AS (
SELECT 'abc' as key, 1.0 AS price, CAST('2020-01-01 01:04:59 UTC' AS TIMESTAMP) AS ts
UNION ALL
SELECT 'abc', 2.0, CAST('2020-01-01 01:05:00 UTC' AS TIMESTAMP)
UNION ALL
SELECT 'abc', 3.0, CAST('2020-01-01 01:05:01 UTC' AS TIMESTAMP)
UNION ALL
SELECT 'abc', 4.0, CAST('2020-01-01 01:09:01 UTC' AS TIMESTAMP)
UNION ALL
SELECT 'abc', 5.0, CAST('2020-01-01 01:24:01 UTC' AS TIMESTAMP)
)
SELECT
* EXCEPT(session_group),
LAST_VALUE(session_group IGNORE NULLS)
OVER (PARTITION BY key ORDER BY ts ASC) AS session_group
FROM (
SELECT
*,
bqutil.fn.ts_session_group(
ts,
LAG(ts) OVER (PARTITION BY key ORDER BY ts ASC),
300
) AS session_group
FROM ticks
)
key | price | ts | sesssion_group |
---|---|---|---|
abc | 1.0 | 2020-01-01 01:04:59 UTC | 2020-01-01 01:04:59 UTC |
abc | 2.0 | 2020-01-01 01:05:00 UTC | 2020-01-01 01:04:59 UTC |
abc | 3.0 | 2020-01-01 01:05:01 UTC | 2020-01-01 01:04:59 UTC |
abc | 4.0 | 2020-01-01 01:09:01 UTC | 2020-01-01 01:04:59 UTC |
abc | 5.0 | 2020-01-01 01:24:01 UTC | 2020-01-01 01:24:01 UTC |
Calculate the sliding windows the ts parameter belongs to.
-- show a 15 minute window every 5 minutes and a 15 minute window every 10 minutes
WITH ticks AS (
SELECT 1.0 AS price, CAST('2020-01-01 01:04:59 UTC' AS TIMESTAMP) AS ts
UNION ALL
SELECT 2.0, CAST('2020-01-01 01:05:00 UTC' AS TIMESTAMP)
UNION ALL
SELECT 3.0, CAST('2020-01-01 01:05:01 UTC' AS TIMESTAMP)
)
SELECT
price,
ts,
bqutil.fn.ts_slide(ts, 300, 900) as _5_15,
bqutil.fn.ts_slide(ts, 600, 900) as _10_15,
FROM ticks
price | ts | _5_15.window_start | _5_15.window_end | _5_15.window_start | _5_15.window_end |
---|---|---|---|---|---|
1.0 | 2020-01-01 01:04:59 UTC | 2020-01-01 00:50:00 UTC | 2020-01-01 01:05:00 UTC | 2020-01-01 00:50:00 UTC | 2020-01-01 01:05:00 UTC |
2020-01-01 00:55:00 UTC | 2020-01-01 01:10:00 UTC | 2020-01-01 01:00:00 UTC | 2020-01-01 01:15:00 UTC | ||
2020-01-01 01:00:00 UTC | 2020-01-01 01:15:00 UTC | ||||
2.0 | 2020-01-01 01:05:00 UTC | 2020-01-01 00:55:00 UTC | 2020-01-01 01:10:00 UTC | 2020-01-01 01:00:00 UTC | 2020-01-01 01:15:00 UTC |
2020-01-01 01:00:00 UTC | 2020-01-01 01:15:00 UTC | ||||
2020-01-01 01:05:00 UTC | 2020-01-01 01:20:00 UTC | ||||
3.0 | 2020-01-01 01:05:01 UTC | 2020-01-01 00:55:00 UTC | 2020-01-01 01:10:00 UTC | 2020-01-01 01:00:00 UTC | 2020-01-01 01:15:00 UTC |
2020-01-01 01:00:00 UTC | 2020-01-01 01:15:00 UTC | ||||
2020-01-01 01:05:00 UTC | 2020-01-01 01:20:00 UTC |
Calculate the tumbling window the input_ts belongs in
SELECT
fn.ts_tumble(TIMESTAMP '2020-01-01 00:17:30', 900) AS min_15,
fn.ts_tumble(TIMESTAMP '2020-01-01 00:17:30', 600) AS min_10,
fn.ts_tumble(TIMESTAMP '2020-01-01 00:17:30', 60) As min_1
min_15 | min_10 | |
---|---|---|
2020-01-01 00:15:00 UTC | 2020-01-01 00:10:00 UTC | 2020-01-01 00:17:00 UTC |
Return the type of input or 'UNKNOWN' if input is unknown typed value.
SELECT
bqutil.fn.typeof(""),
bqutil.fn.typeof(b""),
bqutil.fn.typeof(1.0),
bqutil.fn.typeof(STRUCT()),
STRING, BINARY, FLOAT64, STRUCT
Return decoded string of inputs "text" in "method" function.
SELECT NULL as method, bqutil.fn.url_decode("https%3A%2F%2Fexample.com%2F%3Fid%3D%E3%81%82%E3%81%84%E3%81%86%E3%81%88%E3%81%8A", NULL) as value
UNION ALL SELECT "decodeURIComponent" as method, bqutil.fn.url_encode("https%3A%2F%2Fexample.com%2F%3Fid%3D%E3%81%82%E3%81%84%E3%81%86%E3%81%88%E3%81%8A", "decodeURIComponent") as value
UNION ALL SELECT "decodeURI" as method, bqutil.fn.url_decode("https://example.com/?id=%E3%81%82%E3%81%84%E3%81%86%E3%81%88%E3%81%8A", "decodeURI") as value
UNION ALL SELECT "unescape" as method, bqutil.fn.url_decode("https%3A//example.com/%3Fid%3D%u3042%u3044%u3046%u3048%u304A", "unescape") as value
method | value |
---|---|
NULL | https://example.com/?id=あいうえお |
decodeURIComponent | https://example.com/?id=あいうえお |
decodeURI | https://example.com/?id=あいうえお |
unescape | https://example.com/?id=あいうえお |
Return encoded string of inputs "text" in "method" function.
SELECT NULL as method, bqutil.fn.url_encode("https://example.com/?id=あいうえお", NULL) as value
UNION ALL SELECT "encodeURIComponent" as method, bqutil.fn.url_encode("https://example.com/?id=あいうえお", "encodeURIComponent") as value
UNION ALL SELECT "encodeURI" as method, bqutil.fn.url_encode("https://example.com/?id=あいうえお", "encodeURI") as value
UNION ALL SELECT "escape" as method, bqutil.fn.url_encode("https://example.com/?id=あいうえお", "escape") as value
method | value |
---|---|
NULL | https%3A%2F%2Fexample.com%2F%3Fid%3D%E3%81%82%E3%81%84%E3%81%86%E3%81%88%E3%81%8A |
encodeURIComponent | https%3A%2F%2Fexample.com%2F%3Fid%3D%E3%81%82%E3%81%84%E3%81%86%E3%81%88%E3%81%8A |
encodeURI | https://example.com/?id=%E3%81%82%E3%81%84%E3%81%86%E3%81%88%E3%81%8A |
escape | https%3A//example.com/%3Fid%3D%u3042%u3044%u3046%u3048%u304A |
Get an array of url param keys.
SELECT bqutil.fn.url_keys(
'https://www.google.com/search?q=bigquery+udf&client=chrome')
["q", "client"]
Get the value of a url param key.
SELECT bqutil.fn.url_param(
'https://www.google.com/search?q=bigquery+udf&client=chrome', 'client')
"chrome"
Returns the specified part from the URL. Valid values for partToExtract include HOST, PATH, QUERY, REF, PROTOCOL For example, url_parse('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'HOST') returns 'facebook.com'.
WITH urls AS (
SELECT 'http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1' as url
UNION ALL
SELECT 'rpc://facebook.com/' as url
)
SELECT bqutil.fn.url_parse(url, 'HOST'), bqutil.fn.url_parse(url, 'PATH'), bqutil.fn.url_parse(url, 'QUERY'), bqutil.fn.url_parse(url, 'REF'), bqutil.fn.url_parse(url, 'PROTOCOL') from urls
results:
f0_ | f1_ | f2_ | f3_ | f4_ |
---|---|---|---|---|
facebook.com | path1/p.php | k1=v1&k2=v2#Ref1 | Ref1 | http |
facebook.com | NULL | NULL | NULL | rpc |
Returns a URL with specified keys removed from the URL's query component. The keys to be removed are provided as an ARRAY input argument.
SELECT bqutil.fn.url_trim_query(
"https://www.example.com/index.html?goods_id=G1002&utm_id=ads&gclid=abc123",
["utm_id", "gclid"]
)
UNION ALL SELECT bqutil.fn.url_trim_query(
"https://www.example.com/index.html?goods_id=G1002&utm_id=ads&gclid=abc123",
["utm_id", "gclid", "goods_id"]
)
results:
f0_ |
---|
https://www.example.com/index.html?goods_id=G1002 |
https://www.example.com/index.html |
Returns the number of weeks from the beginning of the month to the specified date. The result is an INTEGER value between 1 and 5, representing the nth occurrence of the week in the month. The value 0 means the partial week.
SELECT
bqutil.fn.week_of_month(DATE '2020-07-01'),
bqutil.fn.week_of_month(DATE '2020-07-08');
0 1
Convert a STRING formatted as a YYYYMMDD to a DATE
SELECT bqutil.fn.y4md_to_date('20201220')
"2020-12-20"
Normalize a variable so that it has zero mean and unit variance.
with r AS (
SELECT 10 AS x
UNION ALL SELECT 20
UNION ALL SELECT 30
UNION ALL SELECT 40
UNION ALL SELECT 50
),
stats AS (
SELECT AVG(x) AS meanx, STDDEV(x) AS stddevx
FROM r
)
SELECT x, bqutil.fn.zeronorm(x, meanx, stddevx) AS zeronorm
FROM r, stats;
returns:
Row | x | zeronorm |
---|---|---|
1 | 10 | -12.649110640673518 |
2 | 20 | -6.324555320336759 |
3 | 30 | 0.0 |
4 | 40 | 6.324555320336759 |
5 | 50 | 12.649110640673518 |
This section details the subset of community contributed user-defined functions
that extend BigQuery and enable more specialized Statistical Analysis usage patterns.
Each UDF detailed below will be automatically synchronized to the fn
dataset
within the bqutil
project for reference in your queries.
For example, if you'd like to reference the int
function within your query,
you can reference it like the following:
SELECT bqutil.fn.int(1.684)
The returns the p value of the computed correlation coefficient based on the t-distribution. Input: r: correlation value. n: number of samples. Output: The p value of the correlation coefficient.
WITH test_cases AS (
SELECT 0.9 AS r, 25 n
UNION ALL
SELECT -0.5, 40
UNION ALL
SELECT 1.0, 50
UNION ALL
SELECT -1.0, 50
)
SELECT bqutil.fn.corr_pvalue(r,n) AS p
FROM test_cases
results:
p |
---|
1.443229117741041E-9 |
0.0010423414457657223 |
0.0 |
0.0 |
Takes an array of struct where each struct (point) represents a measurement, with a group label and a measurement value
The Kruskal–Wallis test by ranks, Kruskal–Wallis H test (named after William Kruskal and W. Allen Wallis), or one-way ANOVA on ranks is a non-parametric method for testing whether samples originate from the same distribution. It is used for comparing two or more independent samples of equal or different sample sizes. It extends the Mann–Whitney U test, which is used for comparing only two groups. The parametric equivalent of the Kruskal–Wallis test is the one-way analysis of variance (ANOVA).
- Input: array: struct <factor STRING, val FLOAT64>
- Output: struct<H FLOAT64, p-value FLOAT64, DOF FLOAT64>
DECLARE data ARRAY<STRUCT<factor STRING, val FLOAT64>>;
set data = [
('a',1.0),
('b',2.0),
('c',2.3),
('a',1.4),
('b',2.2),
('c',5.5),
('a',1.0),
('b',2.3),
('c',2.3),
('a',1.1),
('b',7.2),
('c',2.8)
];
SELECT `bqutil.fn.kruskal_wallis`(data) AS results;
results:
results.H | results.p | results.DoF |
---|---|---|
3.4230769 | 0.1805877 | 2 |
Takes an array of STRUCT X, Y and returns a, b, r where Y = a*X + b, and r is the "goodness of fit measure.
The Linear Regression, is a linear approach to modelling the relationship between a scalar response and one or more explanatory variables (also known as dependent and independent variables).
- Input: array: struct <X FLOAT64, Y FLOAT64>
- Output: struct<a FLOAT64,b FLOAT64, r FLOAT64>
DECLARE data ARRAY<STRUCT<X STRING, Y FLOAT64>>;
set data = [ (5.1,2.5), (5.0,2.0), (5.7,2.6), (6.0,2.2), (5.8,2.6), (5.5,2.3), (6.1,2.8), (5.5,2.5), (6.4,3.2), (5.6,3.0)];
SELECT `bqutils.fn.linear_regression`(data) AS results;
results:
results.a | results.b | results.r |
---|---|---|
-0.4353361094588436 | 0.5300416418798544 | 0.632366563565354 |
Takes H and dof and returns p probability value.
The chisquare_cdf is NULL Hypothesis probability of the Kruskal-Wallis (KW) test. This is obtained to be the CDF of the chisquare with the H value and the Degrees of Freedom (dof) of the KW problem.
- Input: H FLOAT64, dof FLOAT64
- Output: p FLOAT64
SELECT `bqutils.fn.chisquare_cdf`(.3,2) AS results;
results:
results |
---|
0.8607079764250578 |
Computes the p value of the Fisher exact test (https://en.wikipedia.org/wiki/Fisher%27s_exact_test), implemented in JavaScript.
- Input: a,b,c,d : values of 2x2 contingency table ([ [ a, b ] ;[ c , d ] ] (type FLOAT64).
- Output: The p value of the test (type: FLOAT64)
Example
WITH mydata as (
SELECT
90.0 as a,
27.0 as b,
17.0 as c,
50.0 as d
)
SELECT
`bqutils.fn.p_fisherexact`(a,b,c,d) as pvalue
FROM
mydata
Output:
pvalue |
---|
8.046828829103659E-12 |
Computes the U statistics and the p value of the Mann–Whitney U test (https://en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U_test). This test is also called the Mann–Whitney–Wilcoxon (MWW), Wilcoxon rank-sum test, or Wilcoxon–Mann–Whitney test
- Input: x,y :arrays of samples, both should be one-dimensional (type: ARRAY ), alt: defines the alternative hypothesis, the following options are available: 'two-sided', 'less', and 'greater'.
- Output: structure of the type struct<U FLOAT64, p FLOAT64> where U is the statistic and p is the p value of the test.
Example
WITH mydata AS (
SELECT
[2, 4, 6, 2, 3, 7, 5, 1.] AS x,
[8, 10, 11, 14, 20, 18, 19, 9. ] AS y
)
SELECT `bqutils.fn.mannwhitneyu`(y, x, 'two-sided') AS test
FROM mydata
Output:
test.U | test.p |
---|---|
0.0 | 9.391056991171487E-4 |
Runs the Student's T-test. Well known test to compare populations. Example taken from here: Sample
Sample Query:
DECLARE pop1 ARRAY<FLOAT64>;
DECLARE pop2 ARRAY<FLOAT64>;
SET pop1 = [13.3,6.0,20.0,8.0,14.0,19.0,18.0,25.0,16.0,24.0,15.0,1.0,15.0];
SET pop2 = [22.0,16.0,21.7,21.0,30.0,26.0,12.0,23.2,28.0,23.0] ;
SELECT `bqutils.fn.t_test`(pop1, pop2) AS actual_result_rows;
Results:
Row | actual_result_rows.t_value | actual_result_rows.dof |
---|---|---|
1 | 2.8957935572829476 | 21 |
Returns the value of x in the cdf of the Normal distribution with parameters mean and std (standard deviation).
Sample Query:
SELECT `bqutils.fn.normal_cdf`(1.1, 1.7, 2.0) as normal_cdf;
Results:
Row | normal_cdf |
---|---|
1 | 0.3820885778110474 |