Improve `DataFrame` Users Guide #11324
```diff
@@ -626,6 +626,12 @@ doc_comment::doctest!(
     user_guide_configs
 );
 
+#[cfg(doctest)]
+doc_comment::doctest!(
```
This runs the examples in the guide as doctests (as part of `cargo test --doc`).
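For readers unfamiliar with `doc_comment::doctest!`: it includes a markdown file as documentation so that rustdoc runs the file's code blocks as doctests. A roughly equivalent pattern using only the standard library is sketched here as an assumption, not as what DataFusion does; the struct name and the relative path are hypothetical placeholders. Doctests are only collected for library targets, so the `main` function exists just to let the sketch compile standalone.

```rust
// Sketch of a std-only alternative to `doc_comment::doctest!`.
// The path and the struct name are hypothetical placeholders.
// Under `cargo test --doc`, rustdoc sets `cfg(doctest)`, reads the markdown
// file as this item's documentation, and runs its Rust code blocks as tests.
#[cfg(doctest)]
#[doc = include_str!("../docs/source/user-guide/dataframe.md")]
pub struct UserGuideDataFrameDoctests;

/// Returns true only when compiled by rustdoc for doctest collection.
fn compiled_for_doctests() -> bool {
    cfg!(doctest)
}

fn main() {
    // In an ordinary build the item above is removed by `#[cfg(doctest)]`,
    // so the markdown file is never read and does not need to exist.
    println!("compiled for doctests: {}", compiled_for_doctests());
}
```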
```markdown
## DataFrame Transformations

These methods create a new DataFrame after applying a transformation to the logical plan that the DataFrame represents.
```
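The excerpt describes transformations as methods that build a new DataFrame by transforming the logical plan it represents. The toy model here is a sketch of that idea and is not the DataFusion API: `Frame`, `Plan`, `filter_ge`, and `eval` are hypothetical names. Each transformation consumes the frame and wraps its plan in a new node, and nothing is computed until the `collect` action runs. DataFusion's own transformations, such as `DataFrame::filter` and `DataFrame::limit`, likewise take `self` and return a new `DataFrame`, wrapped in a `Result`.

```rust
// Toy model (NOT the DataFusion API): a frame only records a logical plan.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan(Vec<i64>),
    Filter { input: Box<Plan>, min: i64 },
    Limit { input: Box<Plan>, n: usize },
}

#[derive(Debug, Clone)]
struct Frame {
    plan: Plan,
}

impl Frame {
    fn scan(values: Vec<i64>) -> Self {
        Frame { plan: Plan::Scan(values) }
    }

    // Transformation: returns a new Frame with a Filter node on top of the old plan.
    fn filter_ge(self, min: i64) -> Self {
        Frame { plan: Plan::Filter { input: Box::new(self.plan), min } }
    }

    // Transformation: returns a new Frame with a Limit node on top of the old plan.
    fn limit(self, n: usize) -> Self {
        Frame { plan: Plan::Limit { input: Box::new(self.plan), n } }
    }

    // Action: only here is the plan actually evaluated.
    fn collect(&self) -> Vec<i64> {
        eval(&self.plan)
    }
}

// Recursively evaluates a plan, innermost node first.
fn eval(plan: &Plan) -> Vec<i64> {
    match plan {
        Plan::Scan(values) => values.clone(),
        Plan::Filter { input, min } => eval(input).into_iter().filter(|v| v >= min).collect(),
        Plan::Limit { input, n } => eval(input).into_iter().take(*n).collect(),
    }
}

fn main() {
    let df = Frame::scan(vec![5, 1, 8, 3, 9]).filter_ge(3).limit(2);
    // Prints the nested plan; no rows have been computed yet.
    println!("{:?}", df.plan);
    // Running the action evaluates the plan: prints [5, 8]
    println!("{:?}", df.collect());
}
```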
These tables duplicate what is in the API docs. I think it is better to send people there (and invest in keeping the API docs up to date, with examples).
The only thing that is lost is a summary table that breaks the functions down into Transformations, Actions, and other.
If reviewers feel this content is valuable, I can move the tables to the API docs.
Reviewing this sparks a lot of broader thoughts for me. First of all, I'm not sure we need the distinction between "user guide" and "library user guide" when it comes to data frames. The only way you can use a data frame is as a library, isn't it? I'm unsure why I should be reading one section rather than the other.

Second, I think you lose a lot of context by removing the table. But I think the important part is simply to note that there are transformations, methods that execute the frame, and administrative methods. I might further break down the methods that execute the frame into those that return a new frame in some way and those that write to a data sink. That is, I'm not sure it's necessary to list every method in each of these categories, but it is helpful to identify the categories. That being said, I think a table, perhaps more granular, with links to the API documentation for each method and possibly even links to the SQL equivalent where appropriate, would be a good long-term goal. Is there some tooling / macros we could build to support this in a sustainable way?

Also, is it the case that I can only create a data frame via SessionContext? The word "typically" in the introduction suggests there are other ways of doing it. I wonder if it would be better to be more precise and just enumerate the different ways you can create a data frame. I think it's something like: read from a file, read from a table (which really covers a lot of possibilities), or execute SQL statements.

So, to make this actionable within the scope of this PR, perhaps reduce the tables to more of a summary? But I'm also curious to hear from others.

Finally, not for this PR, I wonder if SessionContext warrants its own section. As with DataFrame, I think it would benefit from a discussion of the different categories of things it can be used for.
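One possible answer to the tooling question, sketched under stated assumptions: keep a single list of method names and categories, and generate the summary table from it with a declarative macro so the list and the table cannot drift apart. The `method_table!` macro, the `Category` enum, and `markdown_table` are hypothetical; the method names are real `DataFrame` methods, but the list is illustrative rather than exhaustive, and the categories follow the split discussed in this thread (transformations, methods that execute the plan, methods that write to a sink, and other).

```rust
// Hypothetical categories, following the split discussed in the review.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Category {
    Transformation,
    Execute,
    WriteToSink,
    Other,
}

// Turns `name => Category` pairs into a static slice of (name, category).
macro_rules! method_table {
    ($($method:ident => $cat:ident),* $(,)?) => {
        &[$((stringify!($method), Category::$cat)),*]
    };
}

// Illustrative subset of `DataFrame` methods; not an exhaustive list.
const METHODS: &[(&str, Category)] = method_table![
    select => Transformation,
    filter => Transformation,
    aggregate => Transformation,
    sort => Transformation,
    limit => Transformation,
    join => Transformation,
    collect => Execute,
    show => Execute,
    count => Execute,
    write_csv => WriteToSink,
    write_parquet => WriteToSink,
    schema => Other,
];

// Renders the list as a markdown table with one row per method.
fn markdown_table(methods: &[(&str, Category)]) -> String {
    let mut out = String::from("| Method | Category |\n| --- | --- |\n");
    for (name, cat) in methods {
        out.push_str(&format!("| `{}` | {:?} |\n", name, cat));
    }
    out
}

fn main() {
    print!("{}", markdown_table(METHODS));
}
```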
Relatedly, from poking around the documentation and methods, it's becoming clear to me that there is a great deal of flexibility in mixing and matching SQL and data frames if you want to, but I'm not sure that's coming across in the guides. When I have time I can try drafting something to see how it might fit.
lgtm, thanks @alamb
Co-authored-by: Oleks V <[email protected]>
Thanks @efredine and @comphead. I have not forgotten about @efredine's feedback in #11324 (comment). I filed #11388 to track it.
* Improve `DataFrame` Users Guide
* typo
* Update docs/source/user-guide/dataframe.md

Co-authored-by: Oleks V <[email protected]>

---------

Co-authored-by: Oleks V <[email protected]>
Which issue does this PR close?
Part of #3058
Rationale for this change
While responding to comments from @efredine on #11290, I noticed some other ways the `DataFrame` docs could be improved. Specifically this page: https://datafusion.apache.org/user-guide/dataframe.html
Among other things, the examples are incomplete (and they are not run in CI), and the documentation of methods is also incomplete.
What changes are included in this PR?
Are these changes tested?
The examples are now tested as part of CI.
I also built the docs locally, and I think they look better.
Are there any user-facing changes?