
Feature Request: ability to view/get/copy/move files on the filesystem itself when the store is filesystem #1006

Open
electrofloat opened this issue Feb 28, 2023 · 7 comments


@electrofloat

As I understand it, when the store is filesystem, sharry stores the files in chunks.

So when someone uploads a file, the admin of sharry has to use the web interface to download/use that file.

It would be nice to have the ability to just see/use uploaded files as-is on the filesystem itself, or at least a built-in way to access them this way.

@eikek

eikek commented Feb 28, 2023

I'm not exactly sure what you mean. It is correct that sharry stores files in chunks when uploading via the webui. This makes it possible to upload large files more reliably and also allows pausing/continuing uploads. Files can be accessed using the api directly - no need to use the webui. It allows querying and downloading files. For example, using curl or similar, a script could be written to download all files from a share (independent of the file backend). Another possibility is to concatenate the chunks on the filesystem directly and use the api to look up their metadata (name, mimetype, etc.).
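
For example, a minimal sketch of such a download script (the share endpoint matches the one used later in this thread; the per-file download route and the response field names here are assumptions - the actual routes are in the OpenAPI docs shipped with sharry):

```bash
#!/usr/bin/env bash
# Sketch: download all files of a share via the REST api instead of the webui.
# SHARRY_AUTH and SHARE_ID must be set; ".files[].id"/".filename" and the
# per-file download route are assumptions - verify against the OpenAPI docs.
set -euo pipefail

BASE="http://localhost:9090/api/v2"

curl -s -H "Sharry-Auth: $SHARRY_AUTH" "$BASE/sec/share/$SHARE_ID" \
  | jq -r '.files[] | "\(.id)\t\(.filename)"' \
  | while IFS=$'\t' read -r id name; do
      # hypothetical download route for a single file of the share
      curl -s -H "Sharry-Auth: $SHARRY_AUTH" \
        -o "$name" "$BASE/sec/share/$SHARE_ID/file/$id"
    done
```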

@electrofloat

So, what I'd like is that if someone uploads a file, I can just access that file directly on the filesystem, without using curl to "download" something that is already on the filesystem itself.
Concatenation is a possible solution, but it would need some helper tool to do it easily.

@eikek

eikek commented Feb 28, 2023

Ah, I see. I'm afraid this is not a good fit for sharry. You can upload a file without chunking, and sharry will store it as a single file (but not via the provided webui). The metadata will always be in the database, though.

One of the goals is to have uploads of large files work well. That was one of the biggest problems for me :), and it's the reason for the chunking. Implementing this would mean combining the chunks after uploading, which means using double the disk space and much longer processing. That is not really feasible for large files, in my opinion. I don't see a lot of trouble in running a cat command for larger uploads if needed. For many smaller files, going through the local tcp/ip stack is a negligible cost to me (agreed, it's not perfect for this specific case). With the latter approach, the script/tool could be used in many other ways (not only on the host).

@mpdcampbell

> So, what I'd like is that if someone uploads a file, I can just access that file directly on the filesystem, without using curl to "download" something that is already on the filesystem itself. Concatenation is a possible solution, but it would need some helper tool to do it easily.

If you want a helper tool, this script recombines the chunks with brute-force concatenation and assigns file extensions. You do lose the original filenames, though.

https://github.com/mpdcampbell/sharry-chunk-combiner

@eikek

eikek commented Mar 1, 2023

> https://github.com/mpdcampbell/sharry-chunk-combiner

Oh nice! It would indeed be good to have some tooling to at least ease this kind of task - great that one already exists!

@electrofloat

> I'm not exactly sure what you mean. It is correct that sharry stores files in chunks when uploading via the webui. This makes it possible to upload large files more reliably and also allows pausing/continuing uploads. Files can be accessed using the api directly - no need to use the webui. It allows querying and downloading files. For example, using curl or similar, a script could be written to download all files from a share (independent of the file backend). Another possibility is to concatenate the chunks on the filesystem directly and use the api to look up their metadata (name, mimetype, etc.).

I looked at the API, but it seems there is no way to get the directory a specific file resides in, so I could run cat on it.
The only info I could find about a file is with this: curl -H'Sharry-Auth: 1577132299230-NkV4dUYyZVlwdmQtaEZQeUVrNGpBZHktYUdpeXl5RFJrM3EtbkhrQUU2a2pWaVUvZWlrZS9mYWxzZQ==-$2a$10$jFhOEGYktHb8yiLF5mhHjO-CvDL2MniUH+RQv8dTSWPwhSUeIw=' http://localhost:9090/api/v2/sec/share/8P7GzxmLGjF-F7K4kXAhe8j-6AhLrQuFJGb-i54TTRZ3xn8 | jq
which does not contain info on directories.
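
(For what it's worth, the file ids can at least be extracted from that response; the `.files[].id` field name here is an assumption based on the response shape, so inspect the JSON first:)

```bash
# pull the file ids out of the share response (field name assumed)
curl -s -H "Sharry-Auth: $TOKEN" \
  "http://localhost:9090/api/v2/sec/share/$SHARE_ID" | jq -r '.files[].id'
```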

So while the sharry-chunk-combiner seems great for getting all the files, there is no simple way to get just one or two files.

Also... is there a real reason to store the files chunked on the filesystem? IMHO, what happens at the transport level (chunking) should be completely unrelated to how the files are stored. The files themselves can be accessed at any position even if each is stored as one big file.
Sure, there are reasons to store a file in more than one piece, but that reason should depend only on - in this case - the filesystem itself (e.g. it cannot store files larger than 4 GB).

I've looked at other solutions, but it seems none of them support this great feature of Sharry: handing out just a random URL so people can upload files there without logging in (at least not as simply as just running a binary and being done with it; a lot of them need a whole bunch of PHP nonsense on the server itself to work).
It would be really nice to be able to handle the files in the filesystem store as-is (or maybe create a new store, filesystem-nonchunked, so it remains backward compatible).
(And I also wish there were a feature where I could just copy a file on the filesystem into an already-shared folder in Sharry, and it would appear on the web and be downloadable, but I guess that is way out of scope for this otherwise great software.)

@eikek

eikek commented Mar 1, 2023

@electrofloat thanks for your kind words, and you have valid points, of course. Maybe I can provide some explanation for the current state: Sharry was created mainly out of my need for reliable uploads of large files. This is done by creating chunks of the file at the client and sending them to the server. The server may receive these chunks in any order - therefore the chunks must be stored as they arrive, because it could get the last chunk first, for example. (It is not HTTP chunked transfer encoding that is meant here; if interested, tus is used.) This makes it easy to implement pausing and continuing after a dropped connection, etc. It is also quite simple to implement this way (and less work/complexity is always good :)). And as you said, due to filesystem limitations, it could be required to split large files anyway.
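
For illustration, a minimal tus-style exchange (the endpoint URL and upload id below are hypothetical placeholders; the point is that every chunk carries an explicit offset, so the server can persist it immediately without reassembling anything first):

```bash
# create an upload resource (tus "creation" extension); TUS_URL is hypothetical
curl -i -X POST "$TUS_URL" \
  -H "Tus-Resumable: 1.0.0" \
  -H "Upload-Length: $(stat -c%s big.iso)"
# the response's Location header names the new upload resource

# send one chunk; Upload-Offset states where the bytes belong, which is
# what makes pausing/continuing (and storing chunks as-is) straightforward
curl -i -X PATCH "$TUS_URL/<upload-id>" \
  -H "Tus-Resumable: 1.0.0" \
  -H "Upload-Offset: 0" \
  -H "Content-Type: application/offset+octet-stream" \
  --data-binary @chunk0
```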

When getting the file back, it doesn't really matter how sharry stores it - at least for me :-). It simply streams all chunks back (or a desired range). Sharry itself doesn't know about file systems, and this is what I want to preserve: it is not filesystem based. Therefore the api can't return anything related to a filesystem, because sharry has no idea about it. The filesystem "backend" is just one way to store these chunks and is more of an "uninteresting" detail to the app. It would be possible to create another "file backend" that concatenates all chunks after all of them have been received, but at that point the only thing known is the id. All other data about the file is in the database. I don't think this adds a lot of value - you would end up with files like 6QcXQ9qeSQb-VYF2p9M23XT-bMPRFkmtsPK-ejjJUns3Ymk without an extension etc., which is not really useful. But if someone wants to do it, I would be fine with it as a new implementation that doesn't affect existing stuff.

So I'm afraid your last wish is not very feasible for this app. It could perhaps be done via some directory-watcher mechanics, but I'm not very keen on this. However, I think focusing so strongly on the filesystem is not needed. It is possible to create a CLI that one could use with the server. When using localhost, the data goes through the local tcp stack but doesn't leave the machine. It is more costly for sure, but does the job as well, I think. The upside is much more flexibility - but I know that you don't need this :/.

If you want to interpret the filesystem structure, I can tell you how it currently works: the response from the endpoint you found contains file ids. These ids map to directories, where the first two characters form a subdirectory, inside which the full id names another directory, and that directory finally contains the file's chunks. You could use the response to look up the corresponding files, concatenate chunks, rename them, etc.
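
A sketch of that lookup as a shell script (the store's base directory and the chunk files sorting into upload order are assumptions here; verify both against your installation):

```bash
#!/usr/bin/env bash
# Resolve a sharry file id to its on-disk chunk directory and concatenate.
# Layout per the comment above: <base>/<first two chars of id>/<full id>/
set -euo pipefail

BASE="/var/sharry/files"   # assumed root of the filesystem store
ID="$1"                    # a file id taken from the share api response
DIR="$BASE/${ID:0:2}/$ID"

# version sort (-V) so e.g. chunk_2 comes before chunk_10; the naming of
# the chunk files inside the directory is an assumption
find "$DIR" -maxdepth 1 -type f | sort -V | xargs cat > "$ID.bin"
echo "wrote $ID.bin"
```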

Hope that clarifies things a little - sorry it's not better news.
