Make CherryMusic ignore hidden files #520

Open
5 tasks
tilboerner opened this issue Mar 5, 2015 · 6 comments

Comments

@tilboerner
Collaborator

Proposal spawned by #518 (in which CherryMusic scans a .git directory):

Hidden files in the basedir should be completely ignored by CherryMusic. CherryMusic's API should behave as if they didn't exist.

For our purposes, a file is "hidden" if its name starts with a dot (.). It's not worth accommodating Windows here.

We should make sure that hidden files are:

  • not scanned,
  • not in file database,
  • not in browse results,
  • not in search results,
  • not served.

Some of these cases are already handled this way, but CM should be consistent here.
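A minimal sketch of what "not scanned" could look like under the dot-prefix definition above. The names is_hidden and walk_visible are hypothetical and not part of CherryMusic's code; the sketch only assumes the standard os.walk behaviour of honouring in-place changes to its directory list.

```python
import os


def is_hidden(name):
    """Return True if a file or directory name starts with a dot."""
    return name.startswith(".")


def walk_visible(basedir):
    """Yield paths of non-hidden files below basedir, skipping hidden directories."""
    for dirpath, dirnames, filenames in os.walk(basedir):
        # Pruning dirnames in place stops os.walk from descending
        # into hidden directories such as .git.
        dirnames[:] = [d for d in dirnames if not is_hidden(d)]
        for filename in filenames:
            if not is_hidden(filename):
                yield os.path.join(dirpath, filename)
```

The same is_hidden check could be reused for browse, search and serving, so that all five cases in the list agree on one definition.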

@devsnd, @6arms1leg: Interested to hear your comments. Any reasons why we shouldn't do this? Do we need a whitelist? If so, can we expect that list to remain small and manageable?

@devsnd
Owner

devsnd commented Mar 6, 2015

I see no reason for a whitelist. Just skip/hide hidden files everywhere. And I don't think we need to remove them retroactively: this would happen automatically when rescanning the files anyway, as long as we make sure the filedb does not index them.


@tilboerner changed the title from "Make CherryMusic ignore hidden files?" to "Make CherryMusic ignore hidden files" on Mar 7, 2015
@6arms1leg
Collaborator

No objections here, either. 👍 Also, I don't see the need for a whitelist.

@tilboerner
Collaborator Author

I looked at some example data. Filtering dot-anything is a bad idea. In fact, I think we need a smarter filter, or a blacklist, or we don't do this at all.

Will post some examples when back @ keyboard.

@tilboerner
Collaborator Author

Alright, the vast majority of "true" hidden names are short and contain only alphabetical characters after the initial ., with the exception of maybe ONE more of these: ._-. We'll never have absolute certainty, but given the following examples I came across, we should be fine using

[.][a-zA-Z]+

as a filter.

Interestingly, all files I found starting with a . were good to be ignored; only directories were problematic. I wouldn't consider that a rule, though.


Here are some example "dotfile" (and directory) names. Some of the name-y bits are altered to protect the privacy of my data source. 😸 Non-alphabetic characters are the same as in the actual name.

Example directory things that SHOULD NOT be filtered out:

  • ... Damning Stinkwell Up Integration The C***
  • ...To Be Fitted (1971)
  • .Decompulse_Glycolipids_Dressionist_Stylize_Biopoly
  • .Gormants.Actuation.1994
  • .And.Now.The.Introids.Empty.1991
  • .O.Prebinding

Example file things that are pure metadata and SHOULD be filtered out:

  • ._01 Her Swatchtowering Are Deportorial The Stockman.mp3
  • ._07 Tautly The Ammonial.mp3
  • ._2-08 Strumpet You Running Bagpipes.mp3
  • .08 - Synkaryocytic Equability.mp3.tenebriating7c (caught by isplayable)

Here are some clear names we want to filter:

  • Directories
    • .AppleDouble
    • .FBCLockFolder
    • .mediaartlocal
    • .git
  • Files (caught by isplayable):
    • .DS_Store
    • .Parent
    • .date
    • .message
    • .mp3genre
    • .ioFTPD
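The proposed filter can be checked against some of the names listed above. This is only a sketch: it assumes the pattern has to match the whole name (hence fullmatch), which the proposal does not spell out, and looks_hidden is a hypothetical helper name.

```python
import re

# Assumption: the filter must match the entire name, not just a prefix.
HIDDEN_NAME = re.compile(r"[.][a-zA-Z]+")


def looks_hidden(name):
    """Return True if name is a dot followed only by ASCII letters."""
    return HIDDEN_NAME.fullmatch(name) is not None


# Directory names from the examples above.
keep = ["...To Be Fitted (1971)", ".Gormants.Actuation.1994", ".O.Prebinding"]
drop = [".AppleDouble", ".FBCLockFolder", ".mediaartlocal", ".git"]

for name in keep + drop:
    print(name, looks_hidden(name))
```

Under that assumption, the directories in keep are not matched and those in drop are. Names containing digits or underscores, such as .DS_Store, .mp3genre and the ._ files, are not matched by this pattern either, so they would have to be caught some other way (several are marked "caught by isplayable" above).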

@devsnd
Owner

devsnd commented Mar 14, 2015

Alright, I'd propose to use a regex blacklist. We can compile the regexes on server startup, so there shouldn't be any noticeable performance difference. This list of filters might do the trick:

# Raw strings keep the backslashes intact for the regex engine.
[
    r'\.AppleDouble',
    r'\.FBCLockFolder',
    r'\.mediaartlocal',
    r'\.git',
    r'\.DS_Store',
    r'\.Parent',
    r'\.date',
    r'\.message',
    r'\.mp3genre',
    r'\.ioFTPD',
    r'\._.*',
]

But of course there are more to come. Regex might already be overkill but I'd rather be safe than sorry.
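A rough sketch of how such a blacklist could be compiled once and then queried. BLACKLIST, _COMPILED and is_blacklisted are hypothetical names, and matching the whole name with fullmatch is an assumption, not something decided in this thread.

```python
import re

# Patterns copied from the list above; raw strings keep the backslashes intact.
BLACKLIST = [
    r'\.AppleDouble',
    r'\.FBCLockFolder',
    r'\.mediaartlocal',
    r'\.git',
    r'\.DS_Store',
    r'\.Parent',
    r'\.date',
    r'\.message',
    r'\.mp3genre',
    r'\.ioFTPD',
    r'\._.*',
]

# Compiled once at import time, standing in for "on server startup".
_COMPILED = [re.compile(pattern) for pattern in BLACKLIST]


def is_blacklisted(name):
    """Return True if the whole name matches any blacklist pattern."""
    return any(rx.fullmatch(name) for rx in _COMPILED)
```

With fullmatch, a name like .gitignore is not caught by the .git entry; whether that is desirable is a design choice the list itself does not settle.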

@tilboerner
Collaborator Author

Yeah, let's treat the strings as regexes from the start.

Would it be a bad idea to concat them all into a single expression like ^(expr1)|(expr2)|...$? For example, if the list grows a bit longer?
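One thing to watch for when concatenating: in Python's re syntax, | has the lowest precedence, so in ^(expr1)|(expr2)|...$ the ^ only applies to the first alternative and the $ only to the last. A sketch of a combined expression that anchors all alternatives, using a hypothetical combine helper and a shortened pattern list for illustration:

```python
import re


def combine(patterns):
    """Join patterns into one regex that must match the entire name."""
    # The outer (?:...) makes ^ and $ apply to every alternative;
    # the inner groups keep each pattern's own | from leaking out.
    joined = "|".join("(?:%s)" % p for p in patterns)
    return re.compile("^(?:" + joined + ")$")


combined = combine([r'\.git', r'\.DS_Store', r'\._.*'])

# The literal form from the question, for comparison.
naive = re.compile(r'^(\.git)|(\.DS_Store)|(\._.*)$')

print(bool(combined.match('.gitignore')))  # False: whole name must match
print(bool(naive.match('.gitignore')))     # True: ^(\.git) matches the prefix
```

Whether one combined expression is actually faster than a short list of compiled patterns would need measuring; for a list of this size either approach is probably fine.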

I'm wondering about this little fella: \._.*. What's he up to? He's a leading space away from the .Decompulse_Glycolipids_Dressionist pattern, which we want to allow, so I don't trust him very much. But I'm willing to try coexisting with him.

By the way, have you heard there's a new album by the_underscores, that well-known band of indie python coders? It's titled .___ (pronounced "attribute whitespace"). ;) But yes, I agree, they are quite silly and can't expect to be indexed by anyone.
