search_finna #2
ake123 committed Nov 15, 2024
1 parent 1db34cb commit 2c377e7
Showing 6 changed files with 97 additions and 130 deletions.
DESCRIPTION (2 changes: 1 addition & 1 deletion)
@@ -7,7 +7,7 @@ Authors@R: c(
 person("Leo", "Lahti", , "[email protected]", role = "aut",
 comment = c(ORCID = "0000-0001-5537-637X"))
 )
-Description: R package to retrieve data from Finna API (one paragraph).
+Description: R package to retrieve metadata from Finna API.
 License: BSD_2_clause + file LICENSE
 Encoding: UTF-8
 Imports:
R/search_finna.R (31 changes: 21 additions & 10 deletions)
@@ -40,11 +40,12 @@ search_finna <- function(query = NULL,#lookfor
 lng = "fi",
 prettyPrint = FALSE) {

-# Handle empty search queries
-# if (query == "" || is.null(query)) {
-# warning("Error: Empty search query provided.")
-# return(NULL)
-# }
+# Warn the user if the default limit is used
+if (missing(limit) || limit == 100) {
+warning("Default limit of 100 records is being used. Specify 'limit' argument for more records.")
+}
+# Start the timer
+start_time <- Sys.time()

 # Define the base URL for the search API
 base_url <- "https://api.finna.fi/v1/search"
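The new check fires when `limit` is omitted and also when a caller passes `limit = 100` explicitly, because `missing()` only reports whether the argument was supplied. A minimal base-R sketch of that behaviour, using a hypothetical stand-in `check_limit()` (not part of finna) with the same condition, and a hypothetical `warns()` helper to observe it:

```r
# Hypothetical stand-in for the check added to search_finna(); not part of finna.
check_limit <- function(limit = 100) {
  # missing() is TRUE only when the caller did not supply `limit` at all;
  # an explicit `limit = 100` is caught by the second comparison instead.
  if (missing(limit) || limit == 100) {
    warning("Default limit of 100 records is being used. Specify 'limit' argument for more records.")
  }
  invisible(limit)
}

# Hypothetical helper: TRUE if evaluating `expr` signals a warning, else FALSE.
warns <- function(expr) {
  tryCatch({
    force(expr)
    FALSE
  }, warning = function(w) TRUE)
}

warns(check_limit())             # TRUE: default used
warns(check_limit(limit = 100))  # TRUE: explicit 100 is treated the same way
warns(check_limit(limit = 500))  # FALSE
warns(check_limit(limit = Inf))  # FALSE
```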
@@ -94,11 +95,13 @@ search_finna <- function(query = NULL,#lookfor
 return(NULL)
 }
 )
-#print(response)
-#result <- content(response, as = "text", encoding = "UTF-8")
-#print(result)
-#json_result <- fromJSON(result)
-#print(json_result)
+if (!is.null(response) && httr::status_code(response) == 429) {
+# Handle rate limit (429 Too Many Requests)
+warning(sprintf("Rate limit hit (429). Retrying in %d seconds...", retry_delay))
+Sys.sleep(retry_delay)
+response <- NULL
+attempt <- attempt + 1
+}

 # Process the response based on the status code
 if (httr::status_code(response) == 200) {
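In this hunk, a 429 response sets `response <- NULL`, and control then continues directly to `if (httr::status_code(response) == 200)` with `response` still NULL. A retry loop can instead jump straight to the next attempt with `next`. A minimal base-R sketch under stated assumptions: `request_with_retry()` and `make_fake_request()` are hypothetical, and responses are modelled as plain lists with a `status` field rather than httr objects:

```r
# Hypothetical retry helper, not the finna implementation.
# `request_fn` returns list(status = <int>, body = ...) or NULL on a network error.
request_with_retry <- function(request_fn, max_attempts = 3, retry_delay = 1) {
  attempt <- 1
  while (attempt <= max_attempts) {
    response <- request_fn()
    if (!is.null(response) && response$status == 429) {
      warning(sprintf("Rate limit hit (429). Retrying in %d seconds...", retry_delay))
      Sys.sleep(retry_delay)
      attempt <- attempt + 1
      next  # re-issue the request instead of inspecting a NULL response
    }
    if (is.null(response)) {
      return(NULL)  # give up on network errors
    }
    if (response$status == 200) {
      return(response$body)
    }
    stop(sprintf("Request failed with status %d", response$status))
  }
  warning("Maximum retry attempts reached.")
  NULL
}

# Hypothetical mock: returns the given status codes in order, one per call.
make_fake_request <- function(statuses) {
  i <- 0
  function() {
    i <<- i + 1
    list(status = statuses[[i]], body = paste("call", i))
  }
}

# Rate-limited once, then successful: the second call's body is returned.
result <- suppressWarnings(
  request_with_retry(make_fake_request(c(429L, 200L)), retry_delay = 0)
)
result  # "call 2"
```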
@@ -191,6 +194,9 @@ search_finna <- function(query = NULL,#lookfor
 return(NULL)
 }
 }
+# End the timer
+end_time <- Sys.time()
+time_taken <- as.numeric(difftime(end_time, start_time, units = "secs"))

 # Convert the list of extracted data into a tibble for easy analysis
 tibble_results <- tibble::as_tibble(do.call(rbind, lapply(all_data, function(x) unlist(x, recursive = FALSE))))
@@ -200,6 +206,11 @@ search_finna <- function(query = NULL,#lookfor
 #cat("Data retrieved from Finna API (https://www.finna.fi) - metadata licensed under CC0.\n")
 #return(tibble_results)
 attr(tibble_results, "result_count") <- result_count
+attr(tibble_results, "time_taken_seconds") <- time_taken
+
+message(sprintf("Total results found: %d", result_count))
+message(sprintf("Data fetching completed in %.2f seconds.", time_taken))
+
 return(tibble_results)

 }
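After these hunks the returned tibble carries two attributes, `result_count` and `time_taken_seconds`. A base-R sketch of the same timing-and-attribute pattern on a plain data frame, and of how a caller reads the values back with `attr()`; the `fake_results` data are made up for illustration:

```r
# Illustrative stand-in data; not real Finna results.
start_time <- Sys.time()
fake_results <- data.frame(
  title = c("Title A", "Title B"),
  year = c(1900L, 1901L),
  stringsAsFactors = FALSE
)
end_time <- Sys.time()

# Same conversion as in search_finna(): elapsed time as plain seconds.
time_taken <- as.numeric(difftime(end_time, start_time, units = "secs"))

attr(fake_results, "result_count") <- 2L
attr(fake_results, "time_taken_seconds") <- time_taken

message(sprintf("Total results found: %d", attr(fake_results, "result_count")))
message(sprintf("Data fetching completed in %.2f seconds.", time_taken))

# Callers read the metadata back with attr():
attr(fake_results, "result_count")        # 2
attr(fake_results, "time_taken_seconds")  # elapsed seconds (non-negative)
```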
README.md (10 changes: 6 additions & 4 deletions)
@@ -7,7 +7,7 @@
 [![codecov](https://codecov.io/gh/rOpenGov/finna/branch/devel/graph/badge.svg)](https://app.codecov.io/gh/rOpenGov/finna)
 [![codefactor](https://www.codefactor.io/repository/github/rOpenGov/finna/badge)](https://www.codefactor.io/repository/github/rOpenGov/finna)

-The goal of finna is to retrieve data from Finna API
+The goal of finna is to retrieve metadata from Finna API

 ## Installation instructions
 The devel version of finna can be installed from GitHub as follows:
@@ -26,6 +26,8 @@ remotes::install_github("rOpenGov/finna")
 ## Example
 The basic functionality of finna can be explored as follows:

+**N.B** In the search_finna() default limit of 100 records is being used. Specify 'limit' argument for more records.
+
 ``` r
 # Load the package
 library(finna)
@@ -90,9 +92,9 @@ This package was developed using the following resources:

 This package is in no way officially related to or endorsed by Finna.

-When using data retrieved from Finna database in your work, please
-indicate that the data source is Finna. If your re-use involves some
+When using metadata retrieved from Finna database in your work, please
+indicate that the metadata source is Finna. If your re-use involves some
 kind of modification to data or text, please state this clearly to the
 end user. See Finna policy on [copyright and free re-use of
-data](https://www.finna.fi/Content/terms?lng=en-gb) for more
+metadata](https://www.finna.fi/Content/terms?lng=en-gb) for more
 detailed information and certain exceptions.
vignettes/articles/Fennica.Rmd (76 changes: 9 additions & 67 deletions)
@@ -11,7 +11,9 @@ vignette: >

 To search Fennica data in Finna

-```{r}
+**N.B** In the search_finna() default limit of 100 records is being used. Specify 'limit' argument for more records.
+
+```{r message = FALSE, warning = FALSE}
 library(finna)
 fennica <- search_finna("*",filters=c('collection:"FEN"'))
 print(fennica)
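The vignette chunks now set knitr's `message = FALSE, warning = FALSE` options, which keep the default-limit warning and the new progress messages out of the rendered article. In an interactive session, wrapping a call in `suppressMessages()` and `suppressWarnings()` has a comparable effect. A sketch with a hypothetical stand-in `noisy_search()` that signals the same kinds of conditions (it does not contact Finna):

```r
# Hypothetical stand-in that signals conditions like the updated search_finna().
noisy_search <- function() {
  warning("Default limit of 100 records is being used. Specify 'limit' argument for more records.")
  message("Total results found: 2")
  data.frame(id = c("a", "b"), stringsAsFactors = FALSE)
}

# Both the warning and the message are muffled; the return value is unchanged.
quiet <- suppressMessages(suppressWarnings(noisy_search()))
nrow(quiet)  # 2
```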
@@ -27,15 +29,15 @@ as `search_finna("*",filters=c('collection:"FEN"'), limit = Inf)`
 search the whole data and it total search of counts in the the interval between
 some years for example between the years 1809-1917 as follows:

-```{r}
+```{r message = FALSE, warning = FALSE}
 library(finna)
 fennica <- search_finna("*",filters = c('collection:"FEN"', 'search_daterange_mv:"[1809 TO 1918]"'))
 print(fennica)
 ```

 we can check the whole data count

-```{r}
+```{r message = FALSE, warning = FALSE}
 library(finna)
 fennica <- search_finna("*",filters = c('collection:"FEN"', 'search_daterange_mv:"[1809 TO 1918]"'))
 result_count <- attr(fennica, "result_count")
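The filters in these chunks are strings of the form `field:"value"`, with the year range written as `[start TO end]`. A small base-R sketch of two hypothetical helpers (`collection_filter()` and `daterange_filter()` are not finna functions) that build exactly the strings used above, which can make the year bounds easier to change:

```r
# Hypothetical helpers that reproduce the filter strings used in this vignette.
collection_filter <- function(code) {
  sprintf('collection:"%s"', code)
}

daterange_filter <- function(from, to) {
  stopifnot(from <= to)
  sprintf('search_daterange_mv:"[%d TO %d]"', as.integer(from), as.integer(to))
}

filters <- c(collection_filter("FEN"), daterange_filter(1809, 1918))
filters
# The vector could then be passed as search_finna("*", filters = filters)
```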
@@ -45,14 +47,14 @@ print(result_count)
 ## Visualization for fennica

 We can use any of the functions provided to visualize the data
-```{r}
+```{r message = FALSE, warning = FALSE}
 library(finna)
 fennica <- search_finna("*",filters = c('collection:"FEN"', 'search_daterange_mv:"[1809 TO 1918]"'))
 refined_data <- refine_metadata(fennica)
 visualize_author_distribution(refined_data)
 ```

-```{r}
+```{r message = FALSE, warning = FALSE}
 fennica <- search_finna("*",filters = c('collection:"FEN"', 'search_daterange_mv:"[1809 TO 1918]"'))
 refined_data <- refine_metadata(fennica)
 visualize_word_cloud(refined_data, "Title")
@@ -67,76 +69,16 @@ Here's how you can modify the `search_finna` function to query these fields:

 - You can use the `type = "Author"` option to specifically search for records by author.

-```{r}
+```{r message = FALSE, warning = FALSE}
 library(finna)
 record <-search_finna(query = "Jean Sibelius", type = "Author")
 record
 ```

 Alternatively, you can apply filters to search for authors using the `filters` parameter:

-```{r}
+```{r message = FALSE, warning = FALSE}
 record <- search_finna(query = "Jean Sibelius", filters = c('author:"Jean Sibelius"'))
 record
 ```

-### 2. **Search for Publication Information:**
-
-If you want to search for publication information such as the publication date or publisher, you can use `type = "Title"` or `type = "AllFields"` and then apply filters:
-
-- For specific years, you can use the `search_daterange_mv` filter:
-
-```r
-search_finna(query = "Sibelius", filters = c('search_daterange_mv:"[2000 TO 2020]"'))
-```
-
-- To search by publisher, you can add a filter for the publisher name:
-
-```r
-search_finna(query = "Sibelius", filters = c('publisher:"Ondine"'))
-```
-
-### 3. **Search by Call Numbers:**
-
-Call numbers are used to classify items in libraries. To search by call number, you can add a filter for `callnumber-search` or `callnumber`:
-
-```r
-search_finna(query = "Sibelius", filters = c('callnumber-search:"78.54"'))
-```
-
-This will return results where the call number is `78.54` (which is typically used for orchestral music).
-
-### Example Using Multiple Filters:
-You can combine these search types and filters to make more complex queries. For instance, to search for works by **Jean Sibelius** published between **2000 and 2020** with the call number **78.54**:
-
-```r
-search_finna(
-query = "Sibelius",
-filters = c('author:"Jean Sibelius"', 'search_daterange_mv:"[2000 TO 2020]"', 'callnumber-search:"78.54"')
-)
-```
-
-### Code Overview:
-
-```r
-# Author search example
-search_finna(query = "Jean Sibelius", type = "Author")
-
-# Search for works by author with publication date range
-search_finna(query = "Jean Sibelius", filters = c('search_daterange_mv:"[2000 TO 2020]"'))
-
-# Search for works by call number
-search_finna(query = "Sibelius", filters = c('callnumber-search:"78.54"'))
-
-# Combine author, publication date, and call number filters
-search_finna(
-query = "Sibelius",
-filters = c('author:"Jean Sibelius"', 'search_daterange_mv:"[2000 TO 2020]"', 'callnumber-search:"78.54"')
-)
-```
-
-### Notes:
-- **Filters**: The filters need to match the exact field names used in Finna's API. You can find these field names in the API documentation or by looking at the response from the API.
-- **Call Number Search**: Ensure that the call numbers are correctly formatted according to the library's classification system (e.g., YKL in Finland).
-
-This way, you can extract specific metadata like authors, publication years, and call numbers using the `search_finna` function.
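The remaining author example passes a filter such as `author:"Jean Sibelius"`. If a value itself contains a double quote, the quoted filter would break; escaping it with a backslash is an assumption here, based on the Solr-style syntax these filters resemble, and is not stated anywhere in this diff. A sketch of a hypothetical helper `field_filter()` (not part of finna):

```r
# Hypothetical helper, not part of finna. Backslash-escaping of `"` and `\`
# inside the quoted value is an ASSUMPTION based on Solr-style quoting.
field_filter <- function(field, value) {
  escaped <- gsub('(["\\\\])', "\\\\\\1", value, perl = TRUE)
  sprintf('%s:"%s"', field, escaped)
}

field_filter("author", "Jean Sibelius")
# The result could be passed as search_finna(query = "Jean Sibelius",
#   filters = c(field_filter("author", "Jean Sibelius")))
```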
vignettes/articles/refinemetadata.Rmd (28 changes: 15 additions & 13 deletions)
@@ -18,7 +18,7 @@ The purpose of the ` refine_metadata()` function is to:
 This refinement process makes the metadata more consistent and user-friendly, reducing potential issues in subsequent analysis or reporting.


-```{r}
+```{r message = FALSE, warning = FALSE}
 library(finna)
 sibelius_data <- search_finna("sibelius")
 refined_data <- refine_metadata(sibelius_data)
@@ -51,7 +51,7 @@ print(integrated_data)

 ### **Analyze using ` analyze_metadata()` Function**

-```{r}
+```{r message = FALSE, warning = FALSE}
 sibelius_data <- search_finna("sibelius")
 refined_data <- refine_metadata(sibelius_data)
 analysis_results <- analyze_metadata(refined_data)
@@ -60,15 +60,15 @@ print(analysis_results)

 ### 1. **Applying the `visualize_year_distribution()` Function**

-```{r}
+```{r message = FALSE, warning = FALSE}
 sibelius_data <- search_finna("sibelius")
 refined_data <- refine_metadata(sibelius_data)
 analysis_results <- analyze_metadata(refined_data)
 visualize_year_distribution(analysis_results$year_distribution)
 ```

 ### 1.1 Line plot of yearly distribution
-```{r}
+```{r message = FALSE, warning = FALSE}
 library(finna)
 sibelius_data <- search_finna("sibelius")
 refined_data <- refine_metadata(sibelius_data)
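`visualize_year_distribution()` is given `analysis_results$year_distribution`. What that element contains is not shown in this diff; assuming a year distribution is a count of records per year, the tally can be sketched in base R with `table()` on a year column (the `years` vector below is made up):

```r
# Made-up publication years; in practice these would come from refined metadata.
years <- c(1905L, 1899L, 1905L, NA, 1911L, 1905L)

# Count records per year; table() drops NA values by default.
year_counts <- table(years)
year_df <- data.frame(
  Year = as.integer(names(year_counts)),
  Count = as.integer(year_counts)
)
year_df
# counts: 1899 -> 1, 1905 -> 3, 1911 -> 1
```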
@@ -79,7 +79,7 @@ visualize_year_distribution_line(refined_data)

 This function will visualize the top 20 titles from your dataset.

-```{r}
+```{r message = FALSE, warning = FALSE}
 # Assuming you have a tibble with Finna metadata called `refined_data`
 top_20_titles_plot <- visualize_top_20_titles(refined_data)
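`visualize_top_20_titles()` plots the 20 most common titles. Assuming "most common" means highest frequency (the implementation is not part of this diff), the ranking step can be sketched in base R with `sort(table(...))`; `top_n_titles()` is a hypothetical helper, shown here with `n = 2` on a short made-up vector:

```r
# Hypothetical helper: the n most frequent titles as a named integer vector.
top_n_titles <- function(titles, n = 20) {
  counts <- sort(table(titles), decreasing = TRUE)
  counts <- counts[seq_len(min(n, length(counts)))]
  setNames(as.integer(counts), names(counts))
}

titles <- c("Finlandia", "Valse triste", "Finlandia", "Kullervo", "Finlandia", "Valse triste")
top <- top_n_titles(titles, n = 2)
top  # Finlandia 3, Valse triste 2
```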
@@ -88,7 +88,7 @@ print(top_20_titles_plot)
 ```

 ### 2.1 Visualize Heatmap of Titles by Year
-```{r warning=FALSE}
+```{r message = FALSE, warning = FALSE}
 library(finna)
 sibelius_data <- search_finna("sibelius")
 refined_data <- refine_metadata(sibelius_data)
@@ -99,15 +99,17 @@ visualize_title_year_heatmap(refined_data)

 This function visualizes the distribution of the records by format.

-```{r}
+```{r message = FALSE, warning = FALSE}
 # Plot the format distribution
 format_distribution_plot <- visualize_format_distribution(refined_data)
 # To display the plot
 print(format_distribution_plot)
 ```

 ### 3.1 Visualize Format Distribution as Pie Chart
-```{r}
+
+```{r message = FALSE, warning = FALSE}
 library(finna)
 sibelius_data <- search_finna("sibelius")
 refined_data <- refine_metadata(sibelius_data)
@@ -118,7 +120,7 @@ visualize_format_distribution_pie(refined_data)

 This function shows the distribution of the records by library.

-```{r}
+```{r message = FALSE, warning = FALSE}
 # Plot the library distribution
 library_distribution_plot <- visualize_library_distribution(refined_data)
@@ -129,7 +131,7 @@ print(library_distribution_plot)

 This function shows the distribution of the records by library.

-```{r}
+```{r message = FALSE, warning = FALSE}
 library(finna)
 sibelius_data <- search_finna("sibelius")
 refined_data <- refine_metadata(sibelius_data)
@@ -141,7 +143,7 @@ visualize_format_library_correlation(refined_data)

 This function visualizes the distribution of the records by author.

-```{r}
+```{r message = FALSE, warning = FALSE}
 # Plot the author distribution
 author_distribution_plot <- visualize_author_distribution(refined_data)
@@ -153,7 +155,7 @@ print(author_distribution_plot)

 This function visualizes the distribution of the records by subject.

-```{r}
+```{r message = FALSE, warning = FALSE}
 # Plot the subject distribution
 subject_distribution_plot <- visualize_subject_distribution(refined_data)
@@ -164,7 +166,7 @@ print(subject_distribution_plot)

 This function visualizes the distribution of the records by subject.

-```{r}
+```{r message = FALSE, warning = FALSE}
 music_data <- search_finna("music")
 refined_data <- refine_metadata(music_data)
 visualize_word_cloud(refined_data, "Title")
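`visualize_word_cloud(refined_data, "Title")` draws a cloud from the Title column. How finna tokenises titles is not shown in this diff; a common first step behind a word cloud is a word-frequency table, sketched here in base R by lower-casing, splitting on runs of non-letter characters, and counting. `word_frequencies()` is hypothetical, and whether `[:alpha:]` matches letters such as ä and ö depends on the session locale:

```r
# Hypothetical helper: word frequencies from a character vector of titles.
word_frequencies <- function(texts) {
  words <- unlist(strsplit(tolower(texts), "[^[:alpha:]]+"))
  words <- words[nzchar(words)]  # drop empty strings left by leading separators
  counts <- sort(table(words), decreasing = TRUE)
  setNames(as.integer(counts), names(counts))
}

titles <- c("Symphony No. 5", "Symphony No. 2", "Finlandia")
freq <- word_frequencies(titles)
freq[["symphony"]]  # 2
```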