iteratively constructing a json data frame makes problems on 'later' mutation #48

Open
behrica opened this issue Oct 25, 2015 · 3 comments

@behrica

behrica commented Oct 25, 2015

I construct a json data frame in a loop, like this:

papers = data.frame()

for (....) {
    current_papers <- content %>% as.tbl_json %>%
        enter_object("items") %>% gather_array() %>%
        spread_values(title = jstring("title"),
                      snippet = jstring("snippet"),
                      link = jstring("link"),
                      displayLink = jstring("displayLink")) %>%
        enter_object("pagemap", "metatags") %>% gather_array() %>%
        spread_values(creationdate = jstring(creationdate_field),
                      moddate = jstring(moddate_field))

    papers <- rbind(papers, current_papers)
}

This seems to work fine (the data frame looks good),
but calling "mutate" from dplyr on it, like this:

papers <-
    papers %>%
    mutate(clickLink=paste0('=HYPERLINK("',link,'","link")'))

gives a very strange error message:

Error in `$<-.data.frame`(`*tmp*`, "..JSON", value = list(list(author = "Microsoft Office User",  : 
  replacement has 9 rows, data has 40

Converting it to a plain data frame first does work:

  papers %>% data.frame() %>%
    mutate(clickLink=paste0('=HYPERLINK("',link,'","link")'))

Is this a bug in tidyjson, or am I doing something wrong?

@vats-div
Contributor

I think tidyjson does not support rbind. What I mean is that if we bind two tbl_json objects, we lose the structure of the JSON object. You can probably see this if you type attr(papers, 'JSON'): you'll only see the first JSON object. When you call data.frame on it first, mutate runs on a plain data.frame object and does not need to do any JSON object manipulation.

@behrica
Author

behrica commented Oct 27, 2015

I thought about this.
So in my loop I should probably convert to a normal data frame before doing rbind.
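
A minimal sketch of that workaround, assuming the CRAN version of tidyjson discussed in this thread plus dplyr. The `pages` vector and its example.org links are hypothetical stand-ins for the real responses, and only two of the original fields are spread to keep it short:

```r
library(tidyjson)
library(dplyr)

# Hypothetical stand-in for the JSON fetched on each loop iteration
# (not taken from the original post).
pages <- c(
  '{"items": [{"title": "A", "link": "http://example.org/a"},
              {"title": "B", "link": "http://example.org/b"}]}',
  '{"items": [{"title": "C", "link": "http://example.org/c"}]}'
)

papers <- data.frame()

for (content in pages) {
  current_papers <- content %>% as.tbl_json %>%
    enter_object("items") %>% gather_array() %>%
    spread_values(title = jstring("title"),
                  link = jstring("link")) %>%
    # Convert to a plain data frame here, so rbind never sees a tbl_json.
    data.frame(stringsAsFactors = FALSE)

  papers <- rbind(papers, current_papers)
}

# mutate now runs on an ordinary data frame.
papers <- papers %>%
  mutate(clickLink = paste0('=HYPERLINK("', link, '","link")'))
```

The conversion happens once per iteration, after all JSON parsing for that chunk is done, so the accumulated `papers` object is never a tbl_json.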

Is there a way tidyjson could fail when rbind is called on it?

It seems that the tidyjson object is a data.frame (so it should support all data.frame operations), but it does not.

Maybe the documentation could mention that it is a good idea to convert to a data frame after having finished the JSON parsing.

Maybe even better:
Could you add a specific method,
"toDataFrame" or similar, which does the conversion and removes the specific index columns (which you never care about after having finished the JSON handling)?

@colearendt

Support for bind_rows has been added to the development version here. Use devtools::install_github('jeremystan/tidyjson') to explore - I find this version superior to the CRAN version.

Further, tbl_df can be used to discard the JSON components of the tbl_json object:

parsed <- my_json %>% ... ## parse the JSON
more_munging <- parsed %>% tbl_df %>% bind_rows... ## Other manipulation
