Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Designated backup directory revised #84

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

LiberalArtist
Copy link
Contributor

This reorganizes the commits from #83

Modifies path-utils:generate-autosave-name and path-utils:generate-backup-name to respond to preferences stored under 'path-utils:autosave-dir and 'path-utils:backup-dir, respectively, which can be used to designate a directory for saving the corresponding automatically-generated files, rather than saving them in the same directory as the originals.

This functionality is inspired by Emacs' ability to be configured to do likewise with backup and autosave files.

@rfindler
Copy link
Member

I think that you can just change the other PR instead of making a new one when you have new commits. Or did you want to start fresh?

@LiberalArtist
Copy link
Contributor Author

LiberalArtist commented Nov 14, 2017

I was trying to split the minimal changes into a different commit than the broader reformatting, which involved rewriting the history. (Maybe there is a more elegant way to do this with git/GitHub, but making a fresh branch was the simplest I could figure out.)

I am giving thought to the Windows MAX_PATH restrictions. (This is complicated by the fact that I don't have easy access right now to a Windows machine for testing.) I'm leaning towards hashing the path as you suggested, but the downside is that it makes it unclear to humans which backup/autosave file refers to which original, so, at a minimum, it seems like we should only fall back on hashing for excessively-long paths on Windows. It might also be an argument in favor of writing some kind of table showing that mapping, but doing that would seem to have more far-reaching implications for concurrency etc. than I was hoping for … From the article you linked to, it sounds like forming the Windows paths with \\?\ might be an alternative, but that would probably need some experimentation on a Windows machine to make sure it works as expected.

As far as handling the case that the user-specified directory no longer exists, it seems like the place to do that would be editor:backup-autosave-mixin. I guess the question is what the mixin should do in that case. The easy thing seems to be falling back to the default location, but I could also see an argument for displaying a warning to the user, or … Maybe the answer is to fall back for now, and think about eventually providing an overridable hook for clients of the framework who want different behavior.

@LiberalArtist
Copy link
Contributor Author

Actually, as I read editor:backup-autosave-mixin more closely, it looks like it always calls path-utils:generate-autosave-name or path-utils:generate-backup-name immediately before attempting to save the file, so I guess the thing to do is just have those functions double-check that a directory from the preference still exists. (That's how I originally envisioned this working.)

@LiberalArtist
Copy link
Contributor Author

Reading more from the Reference on Windows paths, it looks like Racket already handles avoiding MAX_PATH by switching to the \\?\ form:

Outside of Racket, except for \\?\ paths, pathnames are typically limited to 259 characters. Racket internally converts pathnames to \\?\ form as needed to avoid this limit. The operating system cannot access files through \\?\ paths that are longer than 32,000 characters or so.

I think that means the only concern we have to deal with is whether the result of encode-as-path-element on Windows exceeds the limit for individual elements of \\?\ paths: according to the source you provided, that is "the value returned in the lpMaximumComponentLength parameter of the GetVolumeInformation function (this value is commonly 255 characters)".

It does seem plausible that the result of encode-as-path-element could exceed 255 characters, and I think hashing is probably the easiest solution if it does. Is there a good way to check the lpMaximumComponentLength from Racket? For now I will assume 255 characters.

@LiberalArtist LiberalArtist force-pushed the designated-backup-directory-revised branch from fd22755 to 2c3f613 Compare November 15, 2017 01:03
@rfindler
Copy link
Member

re rewriting history: yes, I think that's better than creating a new pull request (you're effectively deleting the old commits but keeping the conversation when you do that in this context, whereas creating a new pull request deletes the old commits and the old conversation (or, at least, loses the connection between them)).

re windows paths under racket with \\?\: fantastic! I didn't realize that.

re finding the max path length programmatically: I don't know how to access that; I guess the FFI would be involved. I think that a note in the source with a url pointer is probably better than trying to figure out how to make the ffi work in this case.

re windows-specificity: I think that's unwise. We don't have enough of the core developers using windows so adding more conditionals like (if (equal? (system-type) 'windows) ...) is asking for trouble. Lets do the same thing on all platforms.

Instead of hashing, one thing that could work would be to just delete the characters in the middle. If that seems bad, then maybe we should add a bit more robustness to the encoding would already help. What about using the first byte to specify what the separator is. It can be ! if it isn't used in any of the components and then it can cycle through some nice set of bytes if it is used?

As for how to hash, an S-box hash seems like a good way to go; easy to implement and it can use up the full 255 bytes (or maybe safer to just stick with 200). Here's some code that probably works. Maybe.

#lang racket
(provide
 (contract-out
  [sbox-hash
   (-> bytes? exact-integer?)]))

(define maximum-hash-code (expt 256 200))

;; one particular (randomly-generated) permutation of the bytes
(define sbox
  #(141 154 101 111 48 245 199 117 223 97 135 124 209 157 180 28 166 241 6 90
        178 168 202 197 87 177 238 203 156 237 25 193 40 77 249 206 62 98 21
        198 2 169 226 39 80 14 134 155 56 91 112 110 54 182 212 145 59 26 37
        247 60 12 23 251 78 185 1 119 132 170 84 172 235 254 215 3 189 93 136
        224 233 118 150 234 68 217 236 86 126 186 83 75 50 148 61 95 24 138 192
        100 231 140 72 175 43 153 115 161 42 71 11 102 191 109 18 137 81 47 5
        165 219 195 106 10 125 146 174 99 74 194 227 210 69 36 19 46 103 66 44
        196 253 171 92 151 49 22 123 16 167 242 158 133 208 221 243 41 122 4 45
        183 52 213 220 246 248 143 114 82 20 244 228 105 163 53 15 33 13 179 51
        160 129 32 9 130 17 184 7 144 222 96 190 67 149 152 38 94 116 214 240
        232 27 85 104 229 35 128 250 127 164 252 64 121 239 211 88 139 142 70
        31 108 55 79 89 200 107 173 176 8 230 218 159 76 131 207 147 0 73 187
        30 162 65 63 216 120 204 57 113 201 188 205 181 34 58 225 29))

(define (sbox-hash bs)
  (modulo
   (for/fold ([hash 0])
             ([b (in-bytes bs)])
     (+ (* hash 3)
        (vector-ref sbox b)))
   maximum-hash-code))

As for the toc, I realized that there already is one: https://github.com/racket/gui/blob/master/gui-lib/framework/private/autosave.rkt#L35 . I see that it isn't thread safe, tho. Maybe that's better left to another day?

@LiberalArtist LiberalArtist force-pushed the designated-backup-directory-revised branch from 012eb39 to 2051d76 Compare November 15, 2017 02:48
@LiberalArtist
Copy link
Contributor Author

I like the idea of eliding some bytes from the middle: in most cases the resulting name would probably still be decipherable to the user (certainly more so than a hash). Since this solution doesn't have that major drawback, I agree that avoiding Windows-specific code seems to make sense.

I had noticed that toc as well, though it only covers autosave files, not backups. I agree it makes sense to address that as a separate issue, especially since these generated names should be more-or-less human-readable.

@rfindler
Copy link
Member

My experience with DrRacket is that the backup files aren't very useful. Maybe we should try to make them useful?

@LiberalArtist
Copy link
Contributor Author

I've added tests to confirm that path-utils:generate-autosave-name and path-utils:generate-backup-name fall back as expected when the directory from the relevant preference is deleted.

I was unsure whether I should also add tests for excessively-long path elements, since that would require propagating the magic number 255 into another file. A similar question is what, if anything, the docs should say about this behavior. My initial inclination would be to describe how path elements that are "potentially excessively long" are transformed while leaving unspecified how we determine what is "potentially excessively long".

(For the docs overall, I've adapted what the DrRacket docs say about backup and autosave files, though I need to make a few more updates to reflect subsequent changes.)

@LiberalArtist
Copy link
Contributor Author

Do you think we should check that the current user has write permission in the same places as directory-exists?? I can see an argument for not checking and letting the attempt to save error: I think I would personally prefer to see an error and be able to go fix the permissions than to have the autosave/backup behavior silently change.

@rfindler
Copy link
Member

Yes, it is a good idea to test long filenames (effective tests are, in general, tuned to the specific implementation; it is nice to have tests that aren't, but I don't think we can stop there as that would mean there are untested code paths).

I think that the docs should explain what the function does concretely. If you think that this number will change then the right approach is to use the phrase "excessively long" but then add a note that "currently excessively long currently is 255". I do not think that this is likely to change ever, so I think just saying 255 is fine (but also mentioning where this limit comes from in the docs).

I think checking that the directory exists and that there is write permission together is the right thing. (Filesystems are racy, tho, so checking it is only about handling common situations well; the operation might still fail right after the check succeeds in rare situations.)

@LiberalArtist
Copy link
Contributor Author

About checking write permission: this is my ignorance about the preference system, but, if the predicate function given to preferences:set-default reports that a value from the user's preferences file is not valid, can that cause the user-specified value to be overwritten? (I don't see anything in the docs that says that happens, but it seems imaginable that it could, and I don't see the docs explicitly saying that it won't happen.)

It is probably an edge case, but I can immagine using the same preference file while intentionally running programs as different users/with different permissions, so I think not currently having write permission for the directory specified by the preference should cause the generate- functions to fall back without overwriting the saved preference, since something else (or some future run) may have the necessary permission. That could be accomplished by only checking that there is write permission in make-getter/ensure-exists (in framework/private/path-utils.rkt).

Of course, if the predicate given to preferences:set-default never causes the value in the file to be overwritten, than this is just a lot of spilled electrons over a moot point.

@rfindler
Copy link
Member

The framework's preference library writes to the file only when preference:set is called. When the file is read and a value doesn't satisfy the preference, it is just ignored and the default is used.

But the preference file is intended to be used by only a single user. It seems like lots of things will go wrong if multiple users use the same preferences file.

@LiberalArtist LiberalArtist force-pushed the designated-backup-directory-revised branch from 6d080b0 to 360a680 Compare November 22, 2017 05:20
@LiberalArtist
Copy link
Contributor Author

Ok, I have dealt with permissions, added tests for both that and long file names, and updated the docs accordingly. I've also edited the history back down to two commits.

@LiberalArtist
Copy link
Contributor Author

LiberalArtist commented Nov 22, 2017

While running tests, I found two more todos:

  1. Currently, my tests appear to overwrite the user's values for these preferences. I had thought I had avoided that by using preferences:low-level-put-preferences and preferences:low-level-get-preference, but I seem to need to do something more.
  2. The contract for path-utils:generate-autosave-name specifies that it accepts path-for-some-system? values, but the new implementation doesn't work on values with the non-native convention type when there is a non-false preference value.
    It isn't clear to me that path-utils:generate-autosave-name should really accept path-for-some-system? values. The old implementation seems to handle them mostly accidentally, and the results aren't especially useful. For example, on Mac OS, the old implementation breaks when you try (path-utils:generate-autosave-name (bytes->path-element #"b/ad" 'windows)).
    I can add code to handle this if there's a good reason to accept path-for-some-system? values (which could potentially include that someone somewhere might have relied on the documentation and now be using the function for something unexpected), but it seemed worth contemplation.

@LiberalArtist
Copy link
Contributor Author

On the first item, I think what I was seeing was a byproduct of working at the REPL in the same file where I'd run the tests: I think preferences:get was consulting a cache that didn't react to the change in preferences:low-level-get-preference after leaving the parameterize, but I don't think the file was actually overwritten.

Running this program:

#lang racket

(require framework)

(define pref-key
  'a-fake-preference)

(preferences:set-default pref-key 'default (λ (v) #t))

(define pref
  (preferences:get/set pref-key))

(define (do-work)
  (let ([the-prefs-table (make-hash)])
    (parameterize ([preferences:low-level-put-preferences
                    (λ (syms vals)
                      (for ([sym (in-list syms)]
                            [val (in-list vals)])
                        (hash-set! the-prefs-table sym val)))]
                   [preferences:low-level-get-preference
                    (λ (sym [fail void])
                      (hash-ref the-prefs-table sym fail))])
      (preferences:set pref-key 'do-work)
      the-prefs-table)))

(pref)
(do-work)
(pref)

produces the same output every time:

'default
'#hash((plt:framework-pref:a-fake-preference . do-work))
'do-work

and, if set a custom value interactively (e.g. (pref 'custom-value)), that value is preserved and returned consistently by the first call to pref (the one before do-work) every time.

So I think this was a false alarm on my part.

@LiberalArtist LiberalArtist force-pushed the designated-backup-directory-revised branch from 360a680 to 561ccdc Compare November 23, 2017 19:31
@LiberalArtist
Copy link
Contributor Author

I've written code to make path-utils:generate-autosave-name always do what the current implementation does for foreign path-for-some-system? values (ignore everything but the final path element; convert that to a path?) and also handle cases that break the current implementation, like (path-utils:generate-autosave-name (bytes->path-element #"b/ad" 'windows)) on Mac OS or Unix. I'm just not sure if what the current implementation does is actually useful, or if anyone intentionally calls path-utils:generate-autosave-name with values that wouldn't satisfy (or/c #f path-string?).

On the other hand, it's easy enough to support and removes even the question of backwards-compatibility.

@LiberalArtist
Copy link
Contributor Author

I've added a commit to make path-utils:generate-autosave-name work with foreign path-for-some-system? values.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants