Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Request streaming with Zinc #1196

Merged
merged 42 commits into from
Dec 25, 2023
Merged

Conversation

theseion
Copy link
Member

@theseion theseion commented May 20, 2020

This PR adds request streaming with Zinc to Seaside.
The use case for this are primarily multipart/form-data file uploads. Usually, a request is read into memory completely before being processed, which means the contents of an entire file will end up in memory, regardless of the size of the file. This may kill the image in the worst case or lead to swapping / thrashing on the host machine. With request streaming it is possible to read the request in chunks and write these chunks to disk, so that no more than the chunk size has ever have to be allocated to process the entire file.

How it works

I've extended Zinc-Seaside with an option ZnServer>>streamUploads: to enable the feature. When request streaming is enabled, multipart/form-data requests will be read in chunks. If the request contains files they will be streamed to temporary files on disk.

There are two new important classes in Zinc-Seaside: ZnRingBuffer and ZnStreamingMultiPartFormDataEntity. These are the classes that implement the parsing logic.

The temporary files are represented by WAExternalFile which I moved to Seaside-Core. Since we already have WAFile I've created a hierarchy for file classes to make the different implementations polymorphic as far as possible.

I've also extended Grease with two methods for creating and writing temporary files (see SeasideSt/Grease#103).

Caveats

This should all be part of Zinc in my opinion but Zinc has a lot of changes that aren't yet part of the stable image. Those changes will make some of the things simpler but they also make some things harder, e.g. it will no longer be possible to use ZnRingBuffer because the underlying buffered streams will pass the buffer to primitives (can't work since ZnRingBuffer is a wrapper`).
I'm hoping that we can contribute this to Zinc in the future but for now I think it's best to have all the code in one place.

Additional stuff

This PR also includes overrides of Zinc methods that don't respect custom encoding (uses ZnCharacterEncoder utf8 instead of ZnCharacterEncoder default, which falls back to using ZnDefaultCharacterEncoder value, which in turn can be set with #defaultEncoder: on ZnServer`).

@theseion
Copy link
Member Author

theseion commented May 21, 2020

As we discussed in SeasideSt/Grease#103 I've moved #newTemporaryFileReference to the new Seaside-Zinc-Pharo package together with all of the stuff for request streaming.

I've also renamed Zinc-Seaside to Seaside-Zinc, but we can revert that if you don't like it.

Seaside-Seaside-Pharo-Tests contains ZnRingBufferTest which I had forgotten to move to Seaside before.

@theseion
Copy link
Member Author

I've reverted the renaming of Zinc-Seaside and opened a separated issue to do it for 3.6: #1197

@jbrichau
Copy link
Member

@theseion Some of the changes to the core of the Zinc-Seaside package are breaking things in GemStone. I can take a look if I can 'fix' them but it will probably be only by the end of the week.

Another option is create a specific subclass for the adapter and use that one in Pharo?

@theseion
Copy link
Member Author

Actually, nothing should break, that's why we put everything into a Pharo specific package. I'll look into it, maybe it's just a problem with the baseline.

@jbrichau
Copy link
Member

I did not investigate into detail, but there are still changes to the Zinc-Seaside package itself. My guess is that these break the Zinc adapter in GS. I did not try, so I cannot tell for sure.

@jbrichau
Copy link
Member

jbrichau commented Jul 3, 2020

hey @theseion
I suspect that if you merge in the changes from the master branch, some tests might clear up.

If not, I can take a look as it's Gemstone tests that are failing. But it would be better to have the last version first merged back into this branch as I have been making extensive changes for Gemstone tests on master.

@theseion
Copy link
Member Author

theseion commented Jun 4, 2022

@jbrichau I've picked this one up again (it's been a while...). It requires the changes from SeasideSt/Grease#138.

@theseion
Copy link
Member Author

theseion commented Jun 4, 2022

Set up for quick testing:

  1. Add ZnZincStreamingServerAdaptor as adaptor in the Seaside panel
  2. Create a new component with the following content:
    renderContentOn: html
        html form
            multipart;
            with: [
                html fileUpload callback: [ :file | self halt ].
                html submitButton ] 
  3. Open a browser pointed at the component and submit a file
  4. You'll now see an open debugger in the image. The file argument holds a WAExternalFile that references a file in the temp location, which holds the contents of the uploaded file. This file was created by reading the file in chunks instead of using the default method of reading the file into memory.

@theseion
Copy link
Member Author

theseion commented Jun 4, 2022

  • The failure in Pharo 10 is an unexpected pass. That should be fixed in another PR.
  • The failures in 6.1 seem strange. Not sure whether they are my fault.
  • Not sure about the GemStone failures. In GemStone I can see MNU's because of Pharo specific messages. Does Gemstone use Zinc too?
  • The failures in Squeak don't seem to be related, but I didn't check. Do you expect the Squeak tests to pass?

@jbrichau
Copy link
Member

jbrichau commented Jun 6, 2022

I removed the expected failure from the master branch.
I'll take a look at the Gemstone failures and see what I can do.

Squeak will be left unmaintained if nobody takes care of it. I don't have the time to dedicate to it.

@jbrichau
Copy link
Member

jbrichau commented Jun 6, 2022

Some things would need to be cleared up before we can merge this one into master:

  • There is now a conflict with the Seaside-ExternalFileUploads package, which also defines the class WAExternalFile.
  • Some functionality in Seaside-Core is Pharo specific (e.g. fileReference)
  • The Zinc-Pharo package includes the methods that are now in Grease (since merging File utilities for streaming uploads Grease#138)

I'm willing to help out getting things sorted. If you want, merge it into a branch on SeasideSt and we can work on it collaboratively? That would also help me getting the GemStone implementation merged in (if I can fix it)

@theseion
Copy link
Member Author

theseion commented Jun 6, 2022

You already have write access to this branch, I don't think I need to move it to SeasideSt. Or do you need it on SeasideSt for Gemstone?

I will rebase on top of master and start addressing the things you've listed. I'll start with WAExternalFile.

Thanks for the help!

@theseion
Copy link
Member Author

theseion commented Jun 6, 2022

Squeak will be left unmaintained if nobody takes care of it. I don't have the time to dedicate to it.

Fair. @tcj might be able to help with Squeak?

@jbrichau
Copy link
Member

jbrichau commented Jun 6, 2022

You already have write access to this branch
ah ok, I will commit there then!

@tcj
Copy link

tcj commented Jun 7, 2022

Fair. @tcj might be able to help with Squeak?

I will help as best I can. Does it help for me to run smalltalkCI locally? (Running it for the first time locally now.)

I understand this is about getting the tests checks to pass for Squeak. The tests checks must be failing for Squeak on every commit, and not just this one, correct? (Of course, the main functionality of this commit does not apply to Squeak since Squeak does not have Zinc.)

The failure behind WANoDuplicateUuidsTest looks like a dependency problem in the BaselineOfSeaside3... would you recommend I file a new issue regarding this and dig in there? I may have tried to dig into this issue before, actually: packages requiring Pharo somehow getting into the spec for Squeak, via the Tests groups... if I recall.

GRError: 'Seaside-Tests-Slime' depends on unknown package 'Seaside-Pharo-Slime'

@theseion
Copy link
Member Author

theseion commented Jun 8, 2022

Thanks for offering @tcj.

I will help as best I can. Does it help for me to run smalltalkCI locally? (Running it for the first time locally now.)

That certainly helps, yes.

I understand this is about getting the tests checks to pass for Squeak. The tests checks must be failing for Squeak on every commit, and not just this one, correct? (Of course, the main functionality of this commit does not apply to Squeak since Squeak does not have Zinc.)

Correct. There are a couple of tests that have been failing in Squeak for a while.

The failure behind WANoDuplicateUuidsTest looks like a dependency problem in the BaselineOfSeaside3... would you recommend I file a new issue regarding this and dig in there? I may have tried to dig into this issue before, actually: packages requiring Pharo somehow getting into the spec for Squeak, via the Tests groups... if I recall.

That would be a great start, yes!

@theseion
Copy link
Member Author

Thanks for working on this! I'll get back to this as soon as possible.

@jbrichau
Copy link
Member

jbrichau commented Jul 1, 2022

@theseion no problem. I just did a bit of integration work and the streaming seems to work fine. I still need to take a look at the gemstone issues that cause the builds to break.

@jbrichau
Copy link
Member

I think I'm almost there. The last commits show I need to catch some sleep before finishing this off.
There should only be one failing test in GemStone that I will need to investigate.

@jbrichau
Copy link
Member

@theseion Almost there! GemStone 3.7 works: https://github.com/SeasideSt/Seaside/actions/runs/7244577602
The use of FileReference should be changed to a Grease abstraction for the older GemStone versions, but that should not be a problem. I will also have to do some manual tests but it seems we will be able to merge this one after all this time.

@theseion
Copy link
Member Author

Wow! Thanks!

@jbrichau
Copy link
Member

@theseion All tests are green now. Looks good to me. What do you think?

jbrichau
jbrichau previously approved these changes Dec 23, 2023
@theseion
Copy link
Member Author

LGTM. I'm the author of the PR, so I can't review.

@theseion theseion merged commit bce2e80 into SeasideSt:master Dec 25, 2023
10 of 14 checks passed
@theseion theseion deleted the streaming-uploads branch December 25, 2023 11:08
@theseion
Copy link
Member Author

Yay! 🎉

jbrichau added a commit that referenced this pull request Jan 2, 2024
@jbrichau
Copy link
Member

jbrichau commented Jan 2, 2024

@theseion I noticed that the Zinc adaptor was doing streaming file uploads when the option was set to false. This was due to boolean mistake that was fixed in 6003bc7.

However, I then got the failing test testEncodingFunctionalTest where a field submitted in multipart form was always converted to a WAFile. I fixed it in 2734bfd basically by reverting that code to how it was before this PR. Giving you this heads up to check if this change was inadvertent or had a specific purpose.

Oh, and Happy New Year! ;-)

@theseion
Copy link
Member Author

theseion commented Jan 3, 2024

I don't recall the reason for doing it. As far as I recall, it is possible for a file input to be submitted with a file but with an empty file name. In that case, the part should be decoded as a file anyway, which is why I added the #or: check for binary encoding. My reasoning was that if the encoding is binary then it is reasonable to assume that we're dealing with a file.

Looking at the code again now I think we could just make the following change:

"..."
part fileName
    ifNil: [ self codec url decode: part fieldValueString ]
    ifNotNil: [ self convertMultipartFileField: part ] ].
"..."

There's another issue with this code though, as it's now possible to send an unlimited amount of data with multipart/form-data, which would consume all the memory. It would be make sense, IMO, to use streaming for all data above a certain length (e.g., content length > 8MB), and for all chunked transfers (since we can't know the length of those).

@jbrichau
Copy link
Member

Addressed the above in cb9f147

@astares
Copy link
Contributor

astares commented Mar 20, 2024

@theseion Reintegration into standard Zinc would be nice. Did you already speak to @svenvc?

@theseion
Copy link
Member Author

No, I haven't. I agree, this should probably be part of Zinc. Currently, the implementation relies on features of Seaside (like temporary files), but that shouldn't be a huge issue. I will not have the time to contribute it myself, but I'd be around to advise, of course.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants