Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

URLs not correctly parsed for ^1.0.0 with express #82

Closed
konnextv opened this issue Sep 13, 2020 · 12 comments
Closed

URLs not correctly parsed for ^1.0.0 with express #82

konnextv opened this issue Sep 13, 2020 · 12 comments

Comments

@konnextv
Copy link

Just a short heads-up: Using the current sapper setup URLs containing spaces (%20) stopped working in conjunction with express when I updated to 1.0.0. Sirv gives me a 404 if I try to access any such URL (I don't know if this extends to other special characters, too).

For repro, I am using [email protected].

I am sorry if this is a problem with express but I did some testing and I think chore(sirv): bump “@polka/url” for better decoding introduced the problem. (Might as well be an issue with @polka/url 🤔 )

@lukeed
Copy link
Owner

lukeed commented Sep 13, 2020

Hey, this is intentional – see #11, #20, #21

Your /foo%20bar.txt is looking for a /foo bar.txt file, which of course looks strange with spaces, but applies with any special characters too. For example, #11 presents the /fünke.txt scenario. When that request gets sent over the wire, it's received as /f%C3%BCnke.txt and so has to be decoded.

It actually gets trickier than that too, because the same character can be encoded in different ways depending on the HTTP client (see #11 (comment)), which means that decoding has to happen at the server in order to normalize the request path.

Hopefully that helps~! Closing, but please feel free to reply if I misread or got something wrong.
Thanks!

@lukeed lukeed closed this as completed Sep 13, 2020
@konnextv
Copy link
Author

Just to clarify:
You would expect Express to handle the URL parsing and sirv expects req.path to be /foo bar.txt instead of /foo%20bar.txt, correct?

@lukeed
Copy link
Owner

lukeed commented Sep 13, 2020

If coming from Express, yes, since req.path would be defined. Otherwise sirv takes care of itself via parser(req, true) – that true is a decode flag.

@konnextv
Copy link
Author

Understood, thank you.
I will open an issue over at sapper though since it took me a long time to figure this out because I didn't expect it to be an error with the current template used with express.

Cheers!

@konnextv
Copy link
Author

Or would it be possible to introduce a sirv flag that forces decoding even if req.path is defined already?

Because I just noticed that I cannot manipulate req.path myself (readonly) to circumvent this...

@lukeed
Copy link
Owner

lukeed commented Sep 13, 2020

There won't be a config, sorry. Customization is through the req.path value – you've given it what it's supposed to use, and decoding the same value twice can be problematic. (Decoding is also expensive perf-wise.)

This isn't a Sapper issue either.
Express purposefully doesn't decode incoming paths because it's expensive and can lead to mismatch. They'd rather say that '/fu%CC%88nke.txt' should 404 and '/fu%u0308nke.txt' should 200 (or vice versa) even though they both would match /fünke.txt file if decoded. They do, however, decode req.params values (only).

It's only because you've named your file(s) in a non-standard way that you've run into this problem.
Definitely recommend replacing spaces with hyphens or underscores in your file names. But if you don't/can't, here's how you can do it:

express()
  .use((req, res, next) => {
    req.path = decodeURIComponent(req.path);
    next();
  })
  // ...
  .use(
    compression({ threshold: 0 }),
    sirv('static', { dev }),
    sapper.middleware()
   )

@konnextv
Copy link
Author

konnextv commented Sep 13, 2020

Thanks for your advice, I understand.

But what you suggested does not work since req.path is readonly.

I got it to work by wrapping the sirv call in another middleware though:

express()
  .use(
    compression({ threshold: 0 }),
    (req, res, next) => {
      sirv("static", { dev }).call(
        null,
        { ...req, path: decodeURIComponent(req.path) },
        res,
        next
      );
    },
    sapper.middleware()
  )

So just one last question? Do you see any problem with this? To my knowledge the req object is preserved while still being able to change path only for further processing with sirv.

If not, I will suggest this approach for sapper users who want to use express.

@lukeed
Copy link
Owner

lukeed commented Sep 13, 2020

Yes, that's fine. It's not pretty, but IMO it's what you should be forced to do to accommodate your non-standard file names.

My above example can be changed to this (I forgot req.path is a getter)

express()
  .use((req, res, next) => {
    req.url = decodeURIComponent(req.url);
    next();
  })
  // ...

@konnextv
Copy link
Author

Info for others dealing with this: The above code does not work either, only my snippet helped in my case so far.

@lukeed
Copy link
Owner

lukeed commented Sep 13, 2020

Ran locally, this works with the static/foo bar.txt example presented in linked issue:

const express = require('express');
const sirv = require('sirv');

express()
	.use((req, res, next) => {
		req.url = decodeURIComponent(req.url);

		if (req._parsedUrl) {
			var nxt = decodeURIComponent(req._parsedUrl.pathname);
			req._parsedUrl.path = req._parsedUrl.pathname = nxt;
			req._parsedUrl._raw = req.url;
		}

		next();
	})
	.use(
		sirv('static', { dev: true })
	)
	.listen(3000)

Have to trick parseurl to not only work with the new req.url (which is fine, acceptable) but you have to modify its internal state/cache to send back the value you want.

FWIW, Express calls parseurl 6 times before the first bit of user code runs.

@benmccann
Copy link
Contributor

benmccann commented Aug 13, 2021

I do think this is a bug in sirv. The req.path should never be decoded. The fact that Polka is decoding it is a bug in Polka (lukeed/polka#142). sirv is matching Polka's incorrect behavior rather than Express's correct behavior. Soon sirv still stop working with Polka as well once lukeed/polka#172 is merged

@lukeed
Copy link
Owner

lukeed commented Aug 13, 2021

Solved by the Polka PR and a7bd672

The req._decoded check I just added should have always been in there, since it was sirv's way of preventing duplicate decodeURIComponent calls... but that was only true when it receives a request from a polka@next app, since Polka was previously writing the decoded value to req.path

Now that the latest polka@next (and Express) doesn't decode automatically anymore, req.path isn't trustworthy on its own. It needs req._decoded to be there too in order to trust it.

This combo-check is backwards compatible for polka@next users who don't upgrade and will unblock Express users for the first time, who have always had a "raw" req.path value set.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants