HTTP URIs and Servlet API methods
The processing of HTTP URIs in a Servlet container is dependent on the order in which processing occurs. The following sections are presented in processing order.
From RFC 7230 we have:
http-URI = "http:" "//" authority path-abempty [ "?" query ] [ "#" fragment ]
https-URI = "https:" "//" authority path-abempty [ "?" query ] [ "#" fragment ]
with each of those elements defined by RFC 3986.
Per RFC 7230, section 5.1, any `[ "#" fragment ]` is ignored as it is for client-side processing only.
`getQueryString()` returns null for no query string and the empty string for an empty query string.
The return value of `getRequestURI()` is the URI at this point in the processing, i.e. any query string and fragment have been removed but the URI is otherwise unchanged.
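The two steps above (drop the fragment, then split off the query string) can be sketched as follows. This is a minimal illustration, not container code: the `RequestTarget` class and its field names are hypothetical, and it assumes the input is the raw request target with no further decoding applied.

```java
/** Hypothetical helper that splits a raw request target as described above. */
final class RequestTarget {
    /** The value getRequestURI() would return: query and fragment removed. */
    final String requestUri;
    /** The value getQueryString() would return: null when there is no '?'. */
    final String queryString;

    private RequestTarget(String requestUri, String queryString) {
        this.requestUri = requestUri;
        this.queryString = queryString;
    }

    static RequestTarget parse(String target) {
        // The fragment is for client-side processing only, so drop it first.
        int hash = target.indexOf('#');
        String withoutFragment = hash >= 0 ? target.substring(0, hash) : target;

        // No '?' means no query string (null); a trailing '?' means an empty one ("").
        int question = withoutFragment.indexOf('?');
        if (question < 0) {
            return new RequestTarget(withoutFragment, null);
        }
        return new RequestTarget(
                withoutFragment.substring(0, question),
                withoutFragment.substring(question + 1));
    }
}
```

Note that the path itself is left untouched, so any path parameters are still present in `requestUri`.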
Neither RFC 7230 nor RFC 2616 makes any mention of path parameters. There is a reference to them in RFC 3986, section 3.3 but no formal definition. RFC 2396 has a slightly more formal definition of
segment = *pchar *( ";" param )
It also states:
Each path segment may include a sequence of parameters, indicated by the semicolon ";" character. The parameters are not significant to the parsing of relative references.
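To make the RFC 2396 grammar concrete, here is a small sketch that splits a single path segment into its name and its parameters. The `SegmentParser` class is hypothetical, and the sketch assumes the segment has already been separated on `/` and has not been percent-decoded.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for the RFC 2396 rule: segment = *pchar *( ";" param ). */
final class SegmentParser {

    /** Returns the part of the segment before the first ';'. */
    static String name(String segment) {
        int semi = segment.indexOf(';');
        return semi < 0 ? segment : segment.substring(0, semi);
    }

    /** Returns every ';'-delimited parameter, in order; a parameter may be empty. */
    static List<String> params(String segment) {
        List<String> result = new ArrayList<>();
        int semi = segment.indexOf(';');
        while (semi >= 0) {
            int next = segment.indexOf(';', semi + 1);
            result.add(next < 0
                    ? segment.substring(semi + 1)
                    : segment.substring(semi + 1, next));
            semi = next;
        }
        return result;
    }
}
```

For example, the segment `context;c=d` has the name `context` and the single parameter `c=d`, while `..;` has the name `..` and one empty parameter.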
Early versions of the Servlet spec referenced RFC 2396. Up to and including the current specification, URL rewriting using a path parameter is explicitly defined in the specification document as the lowest common denominator of session tracking.
There have been various security vulnerabilities reported related to path parameter handling, often path traversal attacks using some form of `/..;/`, where different Servlet container and reverse proxy combinations handle this differently, resulting in unexpected behaviour.
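The mismatch can be illustrated with a deliberately simplified model. Everything here is an assumption made for illustration: the `TraversalDemo` class, the `/public/` access rule, and the idea that the proxy normalizes without removing path parameters while the container removes them first. Real proxies and containers differ in the details.

```java
import java.net.URI;

/** Hypothetical model of a proxy and a container disagreeing about "/..;/". */
final class TraversalDemo {

    /** Resolves "." and ".." segments; "..;" is not a dot-segment, so it is left alone. */
    static String normalize(String path) {
        return URI.create(path).normalize().getPath();
    }

    /** Removes ";..." from every segment. */
    static String stripPathParameters(String path) {
        return path.replaceAll(";[^/]*", "");
    }

    /** Proxy model: treats segments as opaque, then applies a prefix rule. */
    static boolean proxyAllows(String path) {
        return normalize(path).startsWith("/public/");
    }

    /** Container model: strips path parameters, then normalizes. */
    static String containerSees(String path) {
        return normalize(stripPathParameters(path));
    }
}
```

In this model the proxy allows `/public/..;/admin` because it still starts with `/public/`, while the container resolves the same request to `/admin`.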
We need to explicitly define path parameter handling in the context of a Servlet container so that users of the API have a consistent experience and implementors of reverse proxies targeting Servlet containers are able to implement those reverse proxies with a clear understanding of how the container will behave.
Given the Servlet specification's original reliance on RFC 2396 and the text from that RFC regarding lack of significance with relative references, I would like to propose the following:
- parse the URI to extract any session ID passed as a path parameter
- ignore all other path parameters
- if there is demand, and I don't think there is, we could implement issue #67 but I am currently leaning towards WONTFIX for that issue.
This means that path parameters would appear in `getRequestURI()` but not in `getContextPath()`, `getServletPath()` or `getPathInfo()`.
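A rough sketch of this proposal, under stated assumptions: the `StripAllOption` class and its method names are hypothetical, the session parameter is assumed to be named `jsessionid`, and the input is the value `getRequestURI()` would return (query string and fragment already removed, no percent-decoding).

```java
/** Hypothetical sketch of the "strip out path parameters" proposal. */
final class StripAllOption {

    private static final String SESSION_PARAM = "jsessionid=";

    /** Returns the session ID carried as a path parameter, or null if there is none. */
    static String sessionId(String requestUri) {
        for (String segment : requestUri.split("/")) {
            String[] parts = segment.split(";");
            // parts[0] is the segment name; the rest are its parameters.
            for (int i = 1; i < parts.length; i++) {
                if (parts[i].startsWith(SESSION_PARAM)) {
                    return parts[i].substring(SESSION_PARAM.length());
                }
            }
        }
        return null;
    }

    /** Returns the path used for context and servlet mapping: all parameters removed. */
    static String mappingPath(String requestUri) {
        return requestUri.replaceAll(";[^/]*", "");
    }
}
```

The request URI itself is never modified here, which is why the parameters would remain visible through `getRequestURI()` only.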
An alternative would be:
- Parse the URI to extract (and remove from the URI) any session ID passed as a path parameter.
- Leave all other path parameters as is. Context paths and Servlet paths that included path parameters would not match (unless the context path or servlet mapping included the path parameter), resulting in 404s. Path parameters in the pathInfo would be included in the call to getPathInfo() and the app would need to parse them if required.
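The alternative can be sketched in the same way. Again, the `RetainOption` class is hypothetical, the session parameter is assumed to be named `jsessionid`, and the context match is a simplified prefix check rather than the full mapping rules.

```java
/** Hypothetical sketch of the "retain path parameters" alternative. */
final class RetainOption {

    /** Removes only the jsessionid parameter from the path used for mapping. */
    static String removeSessionId(String path) {
        return path.replaceAll(";jsessionid=[^;/]*", "");
    }

    /** Simplified context match: any parameter left on the context segment prevents a match. */
    static boolean matchesContext(String path, String contextPath) {
        return path.equals(contextPath) || path.startsWith(contextPath + "/");
    }
}
```

With a context path of `/context`, the path `/context;c=d/servlet/path` does not match (a 404 in this model), whereas `/context/servlet/path;c=d` matches and the `;c=d` would be left for the application to parse from the path info.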
Option | Strip out path parameters | Retain path parameters |
---|---|---|
jsessionid | Only appears in `getRequestURI()` | Only appears in `getRequestURI()` |
Application path parameters | Parse from `getRequestURI()` | Parse `getRequestURI()` or `getContextPath()`/`getServletPath()`/`getPathInfo()` as appropriate |
Security concerns | Potential problems with reverse proxies with segments like `/..;/` | No issues with reverse proxies as HTTP considers the segment (including path parameters) to be opaque |
RFCs | Not consistent with current RFCs for URI and HTTP | Consistent with current RFCs for URI and HTTP |
RFCs | Because 3986 says any reserved character can be used to delimit a path parameter, removing all parameters could be tricky | Not a concern as nothing needs to be removed |
Backwards compatibility | Would break any app that was parsing path parameters from anywhere other than `getRequestURI()` | Would break any app that used path parameters but expected them not to be present in `getContextPath()`, `getServletPath()` or `getPathInfo()` |
%nn decoding | Simplifies as only %2f needs careful handling in path | %nn encoding of any reserved character needs careful handling |
URI | `getRequestURI()` | `getContextPath()` | `getServletPath()` | `getPathInfo()` | `getQueryString()` | Notes |
---|---|---|---|---|---|---|
"/context/servlet/path?a=b" | "/context/servlet/path" | "/context" | "/servlet" | "/path" | "a=b" | A simple case |
"/context/servlet/path?a=b#fragment" | "/context/servlet/path" | "/context" | "/servlet" | "/path" | "a=b" | Fragments are ignored |
"/context/servlet/path" | "/context/servlet/path" | "/context" | "/servlet" | "/path" | null | No query string |
"/context/servlet/path?" | "/context/servlet/path" | "/context" | "/servlet" | "/path" | "" | Empty query string |
"/context;c=d/servlet/path?a=b" | "/context;c=d/servlet/path" | "/context" | "/servlet" | "/path" | "a=b" | Assumes option 1 for path parameters (removed once the return value for `getRequestURI()` has been determined) |
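The rows above can be reproduced with a short end-to-end sketch. It assumes option 1 (all path parameters stripped before mapping), a context deployed at `/context` and a servlet reached under `/servlet`; the `RequestPaths` class and its `resolve` method are hypothetical and much simpler than real container mapping.

```java
/** Hypothetical end-to-end sketch of the table above, assuming option 1. */
final class RequestPaths {
    final String requestUri;
    final String contextPath;
    final String servletPath;
    final String pathInfo;
    final String queryString;

    private RequestPaths(String requestUri, String contextPath, String servletPath,
                         String pathInfo, String queryString) {
        this.requestUri = requestUri;
        this.contextPath = contextPath;
        this.servletPath = servletPath;
        this.pathInfo = pathInfo;
        this.queryString = queryString;
    }

    static RequestPaths resolve(String target, String contextPath, String servletPath) {
        // 1. The fragment is ignored.
        int hash = target.indexOf('#');
        String withoutFragment = hash >= 0 ? target.substring(0, hash) : target;

        // 2. Split off the query string: null if absent, "" if empty.
        int question = withoutFragment.indexOf('?');
        String requestUri = question >= 0
                ? withoutFragment.substring(0, question) : withoutFragment;
        String queryString = question >= 0
                ? withoutFragment.substring(question + 1) : null;

        // 3. Only after getRequestURI() is fixed are path parameters removed.
        String stripped = requestUri.replaceAll(";[^/]*", "");

        // 4. Simplified mapping: context path, then servlet path, remainder is path info.
        String prefix = contextPath + servletPath;
        if (!stripped.equals(prefix) && !stripped.startsWith(prefix + "/")) {
            throw new IllegalArgumentException("No mapping for " + stripped);
        }
        String rest = stripped.substring(prefix.length());
        return new RequestPaths(requestUri, contextPath, servletPath,
                rest.isEmpty() ? null : rest, queryString);
    }
}
```

Calling `RequestPaths.resolve("/context;c=d/servlet/path?a=b", "/context", "/servlet")` keeps `;c=d` in `requestUri` while the context path, servlet path and path info are free of it, matching the last row.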