Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added replacement facility for page content. #104

Open
wants to merge 1 commit into
base: develop
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
63 changes: 63 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -293,6 +293,69 @@ folderA/
```
</details>

## Replacements

By using either command line replacement pairs or specifying a file with json defintions of replacements an arbitrary regexp ("pattern") can be detected and replaced with another string, including expanding captured groups in the pattern.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would probably recommend doing this in YAML instead of JSON. Most other configuration files in the documentation-adjacent space tend to be YAML, and JSON is a valid subset of YAML anyway.


The replacement phase is taking place just before upsert, so all other textual manipulations are done by that time.

Replacements happen in a deterministic sequence. There are ample opportunities to get unexpected (but logically consistent) results by inadvertently result of a previous replacement.
<details>
<summary>Format of json file</summary>

```json
{
"environment": [
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name is not particularly intuitive. I would maybe use something like dynamic_replacement_modules

{
"import": "<module name>",
"path": "<source file>"
}
],
"replacements":[
{
"name": "<name - optional>",
"pattern": "<regexp>",
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"pattern": "<regexp>",
"regex": "<regex>",

"new_value": "<string with optional group expansions>"
"evaluate": <true|false - optional>
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not particularly intuitive. I would suggest using two different fields instead of making the behaviour dependent on this flag (e.g. replacement_text and replacement_function).

},
]
}
```
</details>

### Advanced replacements

The `environment` block is optional and used for very dynamic replacements. By specifying a python source file, it will be dynamically imported at run time. The `new_value` field can then specify a `<module>.<func>` that returns a string value. As an example, the following adds a replacement of "TODAY" to an iso-formatted datetime.

```json
{
"environment": [
{
"import": "funcs",
"path": "funcs.py"
}
],
"replacements":[
{
"name": "Todays date",
"pattern": "TODAY",
"new_value": "funcs.today"
"evaluate": true
},
]
}
```

Funcs.py
```python
import datetime

def today(term):
return datetime.datetime.now().isoformat()
```

The parameter `term` is a Match object as per using [re.subn](https://docs.python.org/3/library/re.html#re.subn).

## Terminal output format

By default, `md2cf` produces rich output with animated progress bars that are meant for human consumption. If the output is redirected to a file, the progress bars will not be displayed and only the final result will be written to the file. Error messages are always printed to standard error.
Expand Down
19 changes: 19 additions & 0 deletions md2cf/__main__.py
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@
minimal_output_console,
)
from md2cf.document import Page
from md2cf.replacements import create_replacements
from md2cf.tui import Md2cfTUI
from md2cf.upsert import upsert_attachment, upsert_page

Expand Down Expand Up @@ -258,6 +259,20 @@ def get_parser():
help="number of retry attempts if any API call fails",
)

parser.add_argument(
"--replace",
nargs="+",
action="append",
dest="replacements",
help="Specify replacements on the form <from>=<to>. Can be repeated many times",
)

parser.add_argument(
"--replacements",
dest="replacementfile",
help="Filename with replacement definition in json format",
)
Comment on lines +270 to +274
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
parser.add_argument(
"--replacements",
dest="replacementfile",
help="Filename with replacement definition in json format",
)
parser.add_argument(
"--replacement-file",
dest="replacement_file",
help="Filename with replacement definition in YAML format",
)


return parser


Expand Down Expand Up @@ -299,6 +314,8 @@ def main():
console.quiet = True
json_output_console.quiet = False

replacements = create_replacements(args.replacements, args.replacementfile)

confluence = api.MinimalConfluence(
host=args.host,
username=args.username,
Expand Down Expand Up @@ -398,6 +415,8 @@ def main():
for page in pages_to_upload:
pre_process_page(page, args, postface_markup, preface_markup, space_info)
tui.start_item_task(page.original_title)
for replacement in replacements:
page = replacement.replace(page)
upsert_page_result = None
try:
tui.set_item_progress_label(page.original_title, "Upserting")
Expand Down
70 changes: 70 additions & 0 deletions md2cf/replacements.py
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are no tests at all for this :(

Original file line number Diff line number Diff line change
@@ -0,0 +1,70 @@
import importlib.util
import json
import re
from typing import List

from md2cf.console_output import console


class Replacement:
def __init__(
self, name: str, pattern: str, new_value: str, evaluate: bool = False
) -> None:
self.name = name
self.pattern = pattern
self.new_value = new_value
self.evaluate = evaluate

def replace(self, page):
console.print(f"Performing replacement '{self.name}'")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't add console logging inside library classes/functions, please. This is better handled in __main__.py, but I'd drop it altogether since it breaks the TUI paradigm of the rest of the tool.

if self.evaluate:
new_value = eval(self.new_value)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is implemented with an eval, the field can actually specify any valid python code at all, which can be a security issue. Granted, once you're importing a custom module all security concerns go out the window anyway, but I'd still prefer something a bit smarter, like only allowing module.function here, parsing it, and using getattr to get it from the module.

else:
new_value = self.new_value
page.body, count = re.subn(f"({self.pattern})", new_value, page.body)
console.print(f">> {count} replacements made")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

return page

def __repr__(self) -> str:
return self.name


def create_replacements(replacements, replacementfile: str) -> List[Replacement]:
result = []
commandline_replacements = (
[item for sublist in replacements for item in sublist] if replacements else []
)

# Create Replacement objects for the commandline replacements
for i, r in enumerate(commandline_replacements):
result.append(Replacement(f"CLI replacement {i}", *r.split("=", 1)))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can go wrong pretty easily if somebody forgets the =: it will just result in a non-obvious exception about the number of parameters when calling the constructor. I'd love to have this function receive data that's already well-structured. For example, you could parse the command line arguments in __main.py__ (with error handling if the format is wrong) and give create_replacement a well-formed list of lists, e.g.

[["pattern", "replacement], ["another pattern", "another replacement"]]


# Opt out if no file specified
if not replacementfile:
return result

file_replacements = json.load(open(replacementfile))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
file_replacements = json.load(open(replacementfile))
with open(replacement_file) as fp:
file_replacements = json.load(fp)

# Do we need to load any modules?
for env in file_replacements.get("environment", []):
if env.get("import"):
spec = importlib.util.spec_from_file_location(
env["import"], env.get("path")
)
globals()[env["import"]] = importlib.util.module_from_spec(spec)
spec.loader.exec_module(globals()[env["import"]])

# Get the replacement definitions
for i, r in enumerate(file_replacements["replacements"]):
new_value = r["new_value"]
if isinstance(new_value, list):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This behaviour is not documented anywhere. Is it really necessary? What's a use case for this vs. a string with newlines?

new_value = "\n".join(new_value)
result.append(
Replacement(
r.get("name", f"File replacement {i}"),
r["pattern"],
new_value,
r.get("evaluate", False),
)
)

return result