Implement a parser for our Matlab functions' documentation #19

Open
gaspardcereza opened this issue Jun 25, 2020 · 14 comments · May be fixed by #20

Comments

@gaspardcereza
Member

Context

We are currently trying to define how to document our functions within the Matlab code. We also want to gather all these descriptions on our website.

Problem

What looks good in the Matlab code isn't always visually satisfying once on the website (and vice-versa).

Solution

Parse the Matlab code to gather all the useful information (description, syntax, inputs, outputs...) into structures or class objects that will then be used to generate the website (and displayed as we want).
Playing with the parser will also help us define the best convention for documenting our functions (related to this issue).

Feel free to comment if you have ideas, advice or anything...

@gaspardcereza gaspardcereza self-assigned this Jun 25, 2020
@jcohenadad
Member

closing because duplicate of #17

@gaspardcereza
Member Author

Originally posted by @jcohenadad in #17 (comment)

Now, regarding the "other" issue, i.e., parsing the header to change the output format for writing the .md file. Following on #17 (comment), i think it will take more than regexprep to address the issue. For example, one thing that needs to be done is to go from this:

% INPUTS
%
%  bla (str)
%    blablabla
%
%  bla2 (float)
%    blablablablabla

to this:

| name | type | description |
| --- | --- | --- |
| bla | str | blablabla |
…

it would be nice to implement that feature in HelpDocMd.
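
A minimal sketch of that conversion, written as a plain script. It assumes every input is declared on a `name (type)` line and that the lines which follow are its description; the variable names (`header`, `md`, ...) are illustrative and not part of HelpDocMd.

```matlab
% Hypothetical header fragment, following the "name (type)" layout above.
header = { ...
    '% INPUTS', ...
    '%', ...
    '%  bla (str)', ...
    '%   blablabla', ...
    '%', ...
    '%  bla2 (float)', ...
    '%    blablablablabla'};

% Strip the leading comment marker and the surrounding whitespace.
txt = strtrim(regexprep(header, '^\s*%', ''));

names = {}; types = {}; descs = {};
current = 0;                          % index of the input being read
for k = 2:numel(txt)                  % line 1 is the 'INPUTS' heading
    tok = regexp(txt{k}, '^(\w+)\s*\((\w+)\)$', 'tokens', 'once');
    if ~isempty(tok)                  % a "name (type)" line starts a new entry
        current = current + 1;
        names{current} = tok{1};
        types{current} = tok{2};
        descs{current} = '';
    elseif current > 0 && ~isempty(txt{k})
        % any other non-empty line extends the current description
        descs{current} = strtrim([descs{current} ' ' txt{k}]);
    end
end

% Assemble the markdown table, one row per documented input.
md = {'| name | type | description |', '| --- | --- | --- |'};
for k = 1:current
    md{end+1} = sprintf('| %s | %s | %s |', names{k}, types{k}, descs{k});
end
disp(strjoin(md, sprintf('\n')))
```

Because each line is trimmed before matching, the uneven indentation of the two description lines does not matter here; a type containing spaces or punctuation would need a looser pattern than `\w+`.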

@gaspardcereza
Member Author

I implemented a parser that reads a Matlab function and returns all the information contained in its documentation section as a structure. I started from test.m as a template for the documentation (though I think the final convention won't look exactly like this).

Here's what the Matlab documentation looks like:

%TEST Computes output1 and output2 from arg1 and arg2.
%
% SYNTAX
%   output1 = test(arg1, arg2)
%   [output1, output2] = test(arg1, arg2)
%
% DESCRIPTION
%   Computes output1 as the sum of arg1 and arg2 and output2 as the
%   difference between arg1 and arg2.
%
% INPUTS
%   arg1
%     Scalar. This line is very long because i wanted to test if the input
%     description would be correctly parsed if it is longer than one line.
%
%   arg2
%     Scalar
%
% OUTPUTS
%   output1
%     Sum of arg1 and arg2  
%
%   output2
%     Difference between arg1 and arg2
%
% NOTES
% That function is destined to test the parsing of the function
% documentation (done by parse_doc.m).

And the returned structure contains the fields:

|.summary (string)
|
|.syntax (array of strings of size 1*nSyntaxes)
|
|.description (string)
|
|.inputs (struct)
|           \______ .names (array of strings of size 1*nInputs)
|           \______ .description (array of strings of size 1*nInputs)
|
|.outputs (struct)
|           \______ .names (array of strings of size 1*nOutputs)
|           \______ .description (array of strings of size 1*nOutputs)
|
|.notes (string)

Though I'm still not satisfied with the parser, for 2 main reasons:

  • The way it is implemented right now requires strict adherence to the template (e.g. no "forgotten" spaces),
  • All the fields (SYNTAX, DESCRIPTION, etc...) must be provided.

I think there are many other flaws in the code, but it might be a good start, and I found some useful Matlab parsing features along the way.
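
One possible way to relax both constraints, sketched here as a plain script and not taken from parse_doc.m: trim every line before looking at it, and give every known section an empty default so that a missing one is not an error. The list `known` and the struct layout are assumptions made for this illustration.

```matlab
% Shortened, hypothetical header with irregular spacing and no NOTES section.
header = { ...
    '%TEST Computes output1 and output2 from arg1 and arg2.', ...
    '%', ...
    '%SYNTAX', ...
    '%   output1 = test(arg1, arg2)', ...
    '%      [output1, output2] = test(arg1, arg2)', ...
    '%', ...
    '%   DESCRIPTION  ', ...
    '% Computes the sum and the difference of arg1 and arg2.'};

known = {'SYNTAX', 'DESCRIPTION', 'INPUTS', 'OUTPUTS', 'NOTES'};

% Remove the comment marker, then trim, so indentation no longer matters.
txt = strtrim(regexprep(header, '^\s*%', ''));

% The summary is the first line without the leading upper-case function name.
doc = struct('summary', regexprep(txt{1}, '^[A-Z_0-9]+\s+', ''));
for k = 1:numel(known)
    doc.(lower(known{k})) = {};       % every section defaults to empty
end

section = '';
for k = 2:numel(txt)
    if any(strcmp(txt{k}, known))     % a line that is exactly a section name
        section = lower(txt{k});
    elseif ~isempty(section) && ~isempty(txt{k})
        doc.(section) = [doc.(section), txt(k)];   % append the line
    end
end
disp(doc)
```

The trade-off is that a description line consisting only of a word such as NOTES would be mistaken for a heading, and the nested names/description layout of the inputs and outputs would still need a second pass.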

The next step is to find a way to reorganize that structure as a .md file.
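
As a rough idea of what that step could look like, the script below turns a hand-written structure with the field names listed above into markdown lines. The page layout (a `# test` title, a Syntax section, an Inputs table) is an assumption for illustration only, not a decided format.

```matlab
% Hypothetical parser output, using the field names listed above.
doc.summary = 'Computes output1 and output2 from arg1 and arg2.';
doc.syntax = {'output1 = test(arg1, arg2)', ...
              '[output1, output2] = test(arg1, arg2)'};
doc.inputs.names = {'arg1', 'arg2'};
doc.inputs.description = {'Scalar.', 'Scalar'};

% Title, summary, then each syntax as an indented (code) line.
md = {'# test', '', doc.summary, '', '## Syntax', ''};
for k = 1:numel(doc.syntax)
    md{end+1} = sprintf('    %s', doc.syntax{k});
end

% Inputs as a two-column table.
md = [md, {'', '## Inputs', '', '| name | description |', '| --- | --- |'}];
for k = 1:numel(doc.inputs.names)
    md{end+1} = sprintf('| %s | %s |', ...
        doc.inputs.names{k}, doc.inputs.description{k});
end

mdText = strjoin(md, sprintf('\n'));
disp(mdText)
```

Keeping the rendering separate from the parsing means the website layout can change without touching the header convention in the Matlab files.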

@rtopfer
Contributor

rtopfer commented Jun 26, 2020

Looks promising!

Indeed, ensuring adherence to the template could be a bit tricky... Ideally Matlab would just do all of this for us 👎 :(

Alternatively, we could consider functionalizing the writing of the header itself: i.e. rather than manually copying/editing the template, the author actually declares the sections/content as string variables to be passed to a function—or class methods—which then checks everything required is indeed there, formats everything as desired, and returns the text and/or copies it to the clipboard (clipboard( 'copy', txt )), e.g.

docHeader = HeaderHelper() ; % a custom class
docHeader.add_section( 'syntax', ["this = my_fun( a, b )" ; "[this,that] =my_fun(a,b,c)"] );
docHeader.add_section( 'description', "Blah blah blah" );
docHeader.copy_txt() ; % copies to clipboard

Might be sufficiently weird that everyone would be put off from using it... but maybe not—manually formatting the text in the editor is actually kind of a hassle. (just an idea)
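
To make the idea more concrete without writing the class, here is a script-level sketch of what such a helper might do internally: check that the required sections are present, then format them as a header. `HeaderHelper` itself is not implemented here, and the list of required sections is an assumption.

```matlab
% Hypothetical section content, declared as plain variables.
funName = 'my_fun';
summary = 'Does this and that.';
sections = { ...
    'SYNTAX',      {'this = my_fun(a, b)', '[this, that] = my_fun(a, b, c)'}; ...
    'DESCRIPTION', {'Blah blah blah'}};

% Check that everything required is there before formatting.
required = {'SYNTAX', 'DESCRIPTION'};
absent = setdiff(required, sections(:, 1));
if ~isempty(absent)
    error('Missing section(s): %s', strjoin(absent(:)', ', '));
end

% First line: upper-case function name followed by the summary.
txt = {sprintf('%%%s %s', upper(funName), summary)};
for k = 1:size(sections, 1)
    txt{end+1} = '%';
    txt{end+1} = sprintf('%% %s', sections{k, 1});
    body = sections{k, 2};
    for j = 1:numel(body)
        txt{end+1} = sprintf('%%   %s', body{j});
    end
end

headerTxt = strjoin(txt, sprintf('\n'));
disp(headerTxt)   % clipboard('copy', headerTxt) would copy it instead
```

The output is ordinary comment text, so once pasted at the top of the function file it is what `help` displays.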

@jcohenadad
Member

terrific work @gaspardcereza! i suggest

  • you create a proper unit test for HelpDocMd with your function (what you are doing with test.m is effectively a test that could go into CI)
  • you open a PR so we can centralize the feedback in the PR (more convenient, because the PR will be associated with your working branch)
  • you add a blank line after each category heading, i.e.:
% INPUTS
%
%   arg1 (bla)
%    blablabla

@jcohenadad
Member

> Looks promising!
>
> Indeed, ensuring adherence to the template could be a bit tricky... Ideally Matlab would just do all of this for us 👎 :(
>
> Alternatively, we could consider functionalizing the writing of the header itself: i.e. rather than manually copying/editing the template, the author actually declares the sections/content as string variables to be passed to a function—or class methods—which then checks everything required is indeed there, formats everything as desired, and returns the text and/or copies it to the clipboard (clipboard( 'copy', txt )), e.g.
>
> docHeader = HeaderHelper() ; % a custom class
> docHeader.add_section( 'syntax', ["this = my_fun( a, b )" ; "[this,that] =my_fun(a,b,c)"] );
> docHeader.add_section( 'description', "Blah blah blah" );
> docHeader.copy_txt() ; % copies to clipboard
>
> Might be sufficiently weird that everyone would be put off from using it... but maybe not—manually formatting the text in the editor is actually kind of a hassle. (just an idea)

i love the idea. My biggest concern would be that the 'help' inside the command window would be lost then, right?

@jcohenadad
Member

jcohenadad commented Jun 26, 2020

> Indeed, ensuring adherence to the template could be a bit tricky.

#17 would address (part of) it, right?

@rtopfer
Contributor

rtopfer commented Jun 26, 2020

> My biggest concern would be that the 'help' inside the command window would be lost then, right?

I was thinking this would just be a utility to help fill the template — i.e. the user still pastes the output into their file once they're done.

#17 would, indeed, be an alternative (at least for filling out the basic stuff, like argument names, types, defaults—a human still needs to fill in the description). But I see this as being a lot more work—a utility I wish Matlab just supplied, since for now pretty much all you can get in terms of info is nargin, nargout :/

@jcohenadad
Member

> > My biggest concern would be that the 'help' inside the command window would be lost then, right?
>
> I was thinking this would just be a utility to help fill the template — i.e. the user still pastes the output into their file once they're done.

ah! i misunderstood. I thought you were suggesting replacing the header with this code 😅. Hmm, to be honest i think we can trust people to do a decent job of copy/pasting/replacing from the template. Also, the suggested utility would only be used when creating a function, whereas the code is often updated afterwards (arguments added/removed/modified), and the utility could not be used for those updates.

> #17 would, indeed, be an alternative (at least for filling out the basic stuff, like argument names, types, defaults—a human still needs to fill in the description). But I see this as being a lot more work—a utility I wish Matlab just supplied, since for now pretty much all you can get in terms of info is nargin, nargout :/

actually, i was thinking of using #17 not for filling in the basic stuff but during CI, i.e. the match between the arguments (code) and what's in the docstrings would be continuously verified by our CI. That would keep the codebase healthy and prevent mismatches.
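
A minimal sketch of such a check, as a plain script. It assumes the signature is on the first line and has a parenthesised argument list, and that documented inputs are the lines indented by exactly three spaces after `%` (as in test.m); it is not what #17 implements.

```matlab
% Hypothetical function source: arg3 is in the signature but not documented.
src = { ...
    'function [output1, output2] = test(arg1, arg2, arg3)', ...
    '%TEST Computes output1 and output2 from arg1 and arg2.', ...
    '%', ...
    '% INPUTS', ...
    '%   arg1', ...
    '%     Scalar.', ...
    '%', ...
    '%   arg2', ...
    '%     Scalar', ...
    'output1 = arg1 + arg2;'};

% Argument names from the signature: the text between the parentheses.
sig = regexp(src{1}, '\(([^)]*)\)', 'tokens', 'once');
codeArgs = strtrim(strsplit(sig{1}, ','));

% Documented names: a single word indented by exactly three spaces.
docArgs = regexp(src, '^%   (\w+)\s*$', 'tokens', 'once');
docArgs = docArgs(~cellfun(@isempty, docArgs));
docArgs = cellfun(@(t) t{1}, docArgs, 'UniformOutput', false);

undocumented = setdiff(codeArgs, docArgs);   % in the code, not in the doc
stale = setdiff(docArgs, codeArgs);          % in the doc, not in the code

fprintf('Undocumented: %s\n', strjoin(undocumented, ', '));
fprintf('Stale: %s\n', strjoin(stale, ', '));
```

In CI, the two `fprintf` calls would be replaced by an assertion that both lists are empty. A full header would also need the check restricted to the INPUTS section, since output names use the same indentation.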

@rtopfer
Contributor

rtopfer commented Jun 26, 2020

i figure if we could already validate the doc, we'd already be in a position to have matlab generate it (minus descriptions) 🙏
having to write the code and then re-write it in the header seems to me to defeat the purpose of having a computer 🙄

@gaspardcereza
Member Author

> • you create a proper unit test for HelpDocMd with your function (what you are doing with test.m is effectively a test that could go into CI)

Is there a specific folder where we usually put the unit tests?

@gaspardcereza gaspardcereza added the enhancement New feature or request label Jun 26, 2020
@jcohenadad
Member

jcohenadad commented Jun 26, 2020

> > • you create a proper unit test for HelpDocMd with your function (what you are doing with test.m is effectively a test that could go into CI)
>
> Is there a specific folder where we usually put the unit tests?

for inspiration: https://github.com/shimming-toolbox/shimming-toolbox/tree/master/tests

@jcohenadad
Member

jcohenadad commented Jun 26, 2020

> i figure if we could already validate the doc, we'd already be in a position to have matlab generate it (minus descriptions) 🙏
> having to write the code and then re-write it in the header seems to me to defeat the purpose of having a computer 🙄

funny: i recall a similar conversation ~1 year ago, where positions were inverted and i was the one pushing for not manually filling info in the headers, because people make mistakes and it should be done automatically. One year later, after digging a bit into MathWorks capabilities, i turned (very) pessimistic about the whole thing. I would suggest waiting before doing this auto-fill because:

  • it will take a bit more coding to auto-fill than to do the validation;
  • my argument about updating docs still holds (i.e. once the header is there and someone wants to add/modify a function, they need to do it manually);
  • getting into the habit of doing it manually might actually reduce the number of mistakes for the previous point;
  • introducing an autodoc tool comes with: documenting it, making the whole dev team aware of it (including each time there is a new addition to the team), answering questions about how to use it, etc. So i anticipate unnecessary time wasted there.

@rtopfer
Contributor

rtopfer commented Jun 26, 2020

To be sure, as MATLAB isn't strongly-typed, allows variable inputs/outputs (e.g. varargin, varargout), default input assignments (values and types) could be contingent and occur anywhere in the code (or in separate files), etc., having a thorough auto-doc feature that works in all but the simplest cases might be impossible—or a deep learning project for an ambitious intern! 😋

3 participants