Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Creating a parser to transform the Matlab documentation into a structure #20

Open
wants to merge 9 commits into
base: master
Choose a base branch
from
88 changes: 88 additions & 0 deletions parser/parse_doc.m
Original file line number Diff line number Diff line change
@@ -0,0 +1,88 @@
function docStruct = parse_doc(functionPath)
%PARSE_DOC Generates a structure from the input function's documentation.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's something that was agreed previously, but is it necessary to have a struct? Isn't it a better idea to have a dictionary (map) so you can input any information needed in the header like examples, etc. and make a test at the end of the parse to check that minimal sections are present?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to go with a struct mainly because it was something I knew how to use. 😉 I'm not familiar with dictionaries but if you think that might be a better solution I'd be glad to discuss about it ! Is this what you are referring to ?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's exactly what I'm thinking about. I think maps or set depending on what we want could be more versatile.

%
% SYNTAX
%
% docStruct = parse_doc(functionPath)
%
% DESCRIPTION
%
% Parses the function corresponding to the input path and fetches the
% information corresponding to the function's documentation before
% reorganising it as a structure
%
% INPUTS
%
% functionPath
% Character array corresponding to the path of the function.
%
% OUTPUTS
%
% docStruct
% Structure containing the different parts of the function's
% documentation as its fields (summary, description, inputs, outputs,
% and notes).
%
% NOTES
%
% It requires a total respect of the template (e.g no "forgotten" spaces).
% All the fields (SYNTAX, DESCRIPTION, etc...) must be provided in the
% parsed function.

%% Read the function and keep only the description section
functionTxt = fopen(functionPath); % Open the function file
gaspardcereza marked this conversation as resolved.
Show resolved Hide resolved

fgetl(functionTxt); % Skip the first line (function...)
% Initialize a cell that will receive the lines in the description
functionDoc = [];
gaspardcereza marked this conversation as resolved.
Show resolved Hide resolved

textLine = fgetl(functionTxt);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is duplicated with the one at line 55. could it be possible to fgetl at the begining of the while loop so you have it only once.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The condition for my while loop is length(textLine) >= 1 so I need to call it before entering the loop...

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I just really don't like duplicated code. You could also test in an if statement with a break command:
while 1
textLine = fgetl(functionTxt);
if length(textLine) >= 1
break;
end
...
end

nLine = 1;
headersPos = []; % Store the positions of the headers
while length(textLine) >= 1 % Parsing will stop at first empty line
if erase(textLine, '%') ~= 0 % Delete the empty commented lines

functionDoc = [functionDoc string(erase(textLine, '%'))];

if sum(isstrprop(strip(erase(textLine, '%')), 'upper')) ==...
length(strip(erase(textLine, '%'))) % Check uppercase

headersPos = [headersPos nLine]; % Store the header line

end
nLine = nLine+1;
end
textLine = fgetl(functionTxt); % Reads the next line
end

fclose(functionTxt); % Close the function file
headersPos = [headersPos length(functionDoc)+1]; % Position of the last line
%% Fetch the function's summary
docStruct.summary = functionDoc(1); % Store the summary of the function

%% Fetch the other sections

for nSection = 1:length(headersPos)-1 % Number of sections
header = strip(functionDoc(headersPos(nSection))); % Section name
sectionStart = headersPos(nSection)+1; % First line after header
sectionEnd = headersPos(nSection+1)-1; % Last line before next header

switch header
case 'SYNTAX'
docStruct.syntax = erase(functionDoc(sectionStart:sectionEnd),' ');
Copy link

@gab-berest gab-berest Jul 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code is really whitespace dependent and the problem is that there is a lot of characters that are refered as whitespaces (tab, space, null, ...). When someone will use a different number of spaces, everything will break. A good way of reducing this problem and making the documentation more flexible is adding a symbol in front of specific information ('->' in front of inputs, '<- in front of outputs', '@' in front of notes, '_' in front of types). This way you know that everything that is after a certain symbol is a certian type until you reach the next symbol.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes you're right but is it really easier to make sure everyone adds the right symbol in the right place or that they just respect the number of white spaces ? That's something I've been asking myself when I was coding but I don't have a clear preference for any of these options so I'm open to discussion.
Also, the way the code is implemented right now, there's no need for a symbol in description and notes because the code will just take everything that's in these sections and store it as a paragraph. I also don't see a genuine utility in having different symbols for the inputs and outputs as they are in different sections and thus can't be mixed up. But it might however be more visual to have these different arrows.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with the whitespaces is actually the implementation (tabs vs spaces vs null characters, etc.). There's a lot of ways to add space in a text editor. However with a symbol it's easy to know which one.
The question of someone putting the right character is not relevant since it is the same for every way we find (number of spaces, etc.).
you are right for the description and notes, but I'm talking more about the input/outputs.

case 'DESCRIPTION'
docStruct.description = strjoin(strip(functionDoc(sectionStart:sectionEnd)),' ');
case 'INPUTS'
section = functionDoc(sectionStart:sectionEnd);
docStruct.inputs.names = strip(section(cellfun('isempty', strfind(section,' '))));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try to be carreful with embeding functions in other functions. A lot of parenthesis can make the code less readable. sometimes, it is better to separate in multiple lines.

docStruct.inputs.description = split(strjoin(replace(strip(section(2:end)),docStruct.inputs.names(:),'|||')),'|||')';
case 'OUTPUTS'
section = functionDoc(sectionStart:sectionEnd);
gaspardcereza marked this conversation as resolved.
Show resolved Hide resolved
docStruct.outputs.names = strip(section(cellfun('isempty', strfind(section,' '))));
docStruct.outputs.description = split(strjoin(replace(strip(section(2:end)),docStruct.outputs.names(:),'|||')),'|||')';
case 'NOTES'
docStruct.notes = strjoin(strip(functionDoc(sectionStart:end)),' ');
otherwise
error('Unknown section name in the function documentation')
end
end
19 changes: 19 additions & 0 deletions tests/test_parse_doc.m
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
function test_parse_doc

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be for me other tests. This test coverage is really minimal. When we merge, we should have the certitude that every way to use the function or not use the function is tested and working (specially for parsing functions)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just wanted to draft something here to understand how unit testing works (never did it before). But I totally agree that a maximum of things have to be covered during the test. 👍

%TEST_PARSE_DOC Is the unit test corresponding to parse_doc.m
%
% DESCRIPTION
%
% That function makes sure that parse_doc is working properly and returns
% the expected structure for the documentation.
cd('../parser')
Copy link

@gab-berest gab-berest Jul 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There will be an error if the function is run from another directory than the one tested (and in Matlab, it is possible to run a script or function from a different directory). Maybe you should check with a pwd function end extrapolate the correct way to get to the place you want. You could also check this link to know more about finding an item location in Matlab:
https://www.mathworks.com/help/matlab/ref/which.html
This way your code could be more robust.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should consider removing this line entirely and make the code work whether this is called from anywhere.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that's something I why trying to do but I could only find the path to the function using which if I was already in the right folder (not really useful in our case...). One solution would be to add helpDocMd/parser/ to the Matlab path but It would also require to know the full path toward it. @po09i do you know how this is usually done with unit testing ?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The which command in Matlab does this. It returns the absolute path of the file you asked. See the link I added in my previous comment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be a startup.m script like the one in shimming-toolbox which adds all the source files to the Matlab path. This script would be launched when running the tests. So my point of view is you should assume all of helpDocMd is on the Matlab path. If it is not yet created in this repo, I think you should create it.

docStruct = parse_doc('parse_doc.m');

assert(isstruct(docStruct)) % Check if docStruct is indeed a structure

% Make sure every input/output name matches a description
if docStruct.inputs.names ~= ''
assert(length(docStruct.inputs.names) == length(docStruct.inputs.description));
gab-berest marked this conversation as resolved.
Show resolved Hide resolved
end
if docStruct.outputs.names ~= ''
assert(length(docStruct.outputs.names) == length(docStruct.outputs.description));
end