check_parent performs a recursive search, but it shouldn't ? #2

roadelou · 2020-06-24T19:31:03Z

The tool moves and rearranges files from only the current directory, yet the check_parent function will look for the .videolog file in the current working directory and all all of its sub-directories recursively. This is what os.walk does in src/video_logging.py :

for root, dirs, files in os.walk("./"):
        if ".videolog" in files:
            raise SudoException()

I do not think that there is an equivalent to os.listdir but for files in python. One way to circumvent this is to take advantage of the fact that os.walk returns a generator object (and not a list). This means that instead of getting all the files recursively at once, os.walk (very probably) gets those in the current working directory, then those in the first sub-directory, etc...

Hence, one way to only get the files in the current working directory without going through every sub-directory would be something like :

for root, dirs, files in os.walk("./"):
        if root == '.':
            # Do your stuff
            break
        # else continue

Since the top directory is likely the first on to be walked through, this shouldn't be too inefficient.

That should also solve the issue mentioned in src/data.json in the sudo section :

[B]y default, the tool checks if the current directory contains the 'video-logging' folder in its arborescence by looking for the '.videolog' file. This process is usually quite fast but it may take a while when dealing with big directories.

On most modern file systems, I think that file look-ups in a given directory should be of almost constant time (correct me if I am wrong). Hence this issue is probably linked with the os.walk thing.

theodumont · 2020-07-17T12:15:20Z

Yes, I did want to check the current working directory and each of its subdirectories, and not only the cwd. Unless I didn't get your point? 🧐

Actually, as the folder command does not perform a recursive sort of the files and only moves the files that are in the current working directory, sorting a directory that contains .videolog could have 2 different consequences:

if cwd is video-logging directly, cli.py and src/ would be put in separated folders and thus one could no longer use the tool after the folder command;
if cwd contains the video-logging folder, then the whole folder will be moved and the tool will still be functional, but in another location on the computer.

So in a way, it may be useless to check all of the subdirectories, but I chose to do this so that the project folder doesn't move.
But I'm thinking about dropping the recursive check and only reading the content of the current directory; some people may want to sort their repos folder, which may contain this one.
I was thinking about using

os.path.isfile(".videolog")

for this.

Anyway, what do you think about this method, i.e. having an empty file that is only used for checking if we are in a specific directory? Another idea - that could be worse - would be to keep in memory the path in which the script was executed, and to compare it with the cwd each time.

roadelou · 2020-07-26T15:49:33Z

Yes, we are talking about the same thing 😄

I feel that my initial issue lacked some important information and might as such have been confusing 💫, allow me to correct that 😉

In the current implementation, you have chosen to trade usability (the tool takes time to perform the os.walk out of the box) for safety (the tool will not break itself out of the box) ⚖️

But, as you might have noticed, not many tools do that❗

Python lets you delete the python interpreter 🐍
Windows lets you delete registry keys 😉

In fact, almost no tool will check whether what you are doing makes sense, so long as it can perform what you requested:

You can open and edit a video with a text editor 📝 There is little point in doing so (from a human perspective) but for as far as the editor is concerned this is perfectly valid manipulation ✔️
Windows will let you move ➡️ its internal libraries (even if that will break the OS on reboot) because it can.

There is no deep philosophical reason for that, it is a limitation inherited from earlier days of computer science (see the C programming language) 😆 This is sometimes (rarely in fact) referred to as a best effort policy, i.e. the tool will try to perform what the human requested to the fullest of its abilities.

This is opposed to dynamically deterministic tools that twill always produce a reliable result, but might restrain themselves. A python interpreter 🐍 that would only execute a script if it terminates would be an example of this: it cannot execute all scripts, but if it does run one then we have a reliable guarantee that the execution will eventually end.

If you are interested in that sort of things I would advise you to investigate functional programming with languages such as Haskell or Rust.

Historically, the reason why most tools didn't do that sort of verification is because it would have been to complex (or impossible). But the whole programming environment evolved to account for this weakness 💡, and in particular the tools 🔧 (applications, utilities) are never stored along user data 📖 in the computer.

For instance on Windows, tools 🔧 can be found in C:\Program Files\ or C:\Program Files(x64)\ (for system-wide installs) or C:\Users\<username>\AppData\<some more things> (for user-specific tools). On the other hand, user data 📖 would be found in C:\Users\<username>\Documents for instance.

Thus, my issue with this trade off is that you are giving up on usability (slower on large directories out of the box) to account for an edge case that should never happen. Plus it is kind of surprising when you read it 😄

With that being said, there is at least one notable exception to this: when you are debugging the tool, you will almost always do this next to its source code. Thus the problem you are tackling is a real issue for development, and I guess you wrote it because you had some misfortune in the first place 😉

But as such I believe this verification could be opt-in, and not opt-out as it is now. So instead of asking the tool to remove this safeguard 👮 with "sudo", you could instead add a debug flag that would raise it. That is just an example of how to do that, probably you will find a more suitable way.

About how the verification should be performed, I think that all the ways you have mentioned and tried are valid, although I like the one where you dynamically find the path of the executable ❤️ more, because it should have no false-positive ✔️

theodumont added the discussion Discussion about a feature label Jul 18, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

check_parent performs a recursive search, but it shouldn't ? #2

check_parent performs a recursive search, but it shouldn't ? #2

roadelou commented Jun 24, 2020

theodumont commented Jul 17, 2020

roadelou commented Jul 26, 2020

check_parent performs a recursive search, but it shouldn't ? #2

check_parent performs a recursive search, but it shouldn't ? #2

Comments

roadelou commented Jun 24, 2020

theodumont commented Jul 17, 2020

roadelou commented Jul 26, 2020