Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lib/rest: process HTML entities within XML #5848

Merged
merged 1 commit into from
Dec 1, 2021
Merged

Conversation

CodingKoopa
Copy link
Contributor

@CodingKoopa CodingKoopa commented Dec 1, 2021

What is the purpose of this change?

The MEGA SDK - and, by extension, MEGAcmd - escapes many HTML entities in the XML it serves as the WebDAV server. This is not conformant XML, and I have already submitted a PR to fix this. That having been said, I am submitting a fix for rclone too for a couple of reasons:

  • MEGAcmd development seems to be rather slow, and it could be some time until it's updated with the latest SDK - that is, if a fix is merged into the SDK. Moreover, the mega backend currently has an issue (Mega.nz Session Cache #4695) that makes it difficult to use, making this WebDAV functionality useful.
  • The MEGA SDK might not be the only WebDAV application exhibiting this behavior. In these cases, it should be preferable for rclone to process the entities rather than throwing an error.

For reference, this is what currently happens when using a problematic filename with MEGAcmd's WebDAV server, and rclone:

2020/11/16 12:50:00 ERROR : IO error: couldn't list files: XML syntax error on line 253: invalid character entity &add;

Was the change discussed in an issue or in the forum before?

Most directly, this issue is reported here and in #4885. The "rather strict" XML parser is also discussed in this comment.

Checklist

  • I have read the contribution guidelines.
  • I have added tests for all changes in this PR if appropriate.
  • I have added documentation for the changes if appropriate.
  • All commit messages are in house style.
  • I'm done, this Pull Request is ready for review :-)

Thanks!

MEGAcmd currently includes escaped HTML4 entites in its XML messages.
This behavior deviates from the XML standard, but currently it prevents
rclone from being able to use the remote.
@ncw
Copy link
Member

ncw commented Dec 1, 2021

What do you think the backwards compatibility problems with this patch might be given that the jottacloud and sugarsync backends use the XML parser in lib/rest as well as the webdav backend. Does this need to be an option?

@CodingKoopa
Copy link
Contributor Author

CodingKoopa commented Dec 1, 2021

Great question! I did notice that those backends use this function too, and indeed was concerned about breaking them. I didn't address it initially because I deemed the act of processing HTML entities to be a "non-destructive" action, particularly as the alternative is the parser throwing an error. If you think it would be wiser to restrict this to webdav, I could make it an option. After all, there's no reason why those two services should start acting out of specification, compared to the larger amount of WebDAV software.

@ncw
Copy link
Member

ncw commented Dec 1, 2021

Here are the docs: https://pkg.go.dev/encoding/xml#Decoder

	// Strict defaults to true, enforcing the requirements
	// of the XML specification.
	// If set to false, the parser allows input containing common
	// mistakes:
	//	* If an element is missing an end tag, the parser invents
	//	  end tags as necessary to keep the return values from Token
	//	  properly balanced.
	//	* In attribute values and character data, unknown or malformed
	//	  character entities (sequences beginning with &) are left alone.

The first fix seems OK and unlikely to cause a problem. The second fix - means that the parse would have thrown an errror rather than leave the undecodable entities according to the source code, so that is OK too.

The second part of your patch is decoder.Entity = xml.HTMLEntity and after a look at the source code of the XML library, I don't think this will cause a problem either since those entities would have thrown an error before.

So I think, on further reflection, this patch is safe and won't break anything.

We'll see soon enough in the integration tests and if it does cause a problem we'll need to make it conditional.

So I'll merge this now - thank you :-)

@ncw ncw merged commit 0681a5c into rclone:master Dec 1, 2021
@ncw
Copy link
Member

ncw commented Dec 1, 2021

PS since you are interested in using mega with webdav, would you like to write some docs in the webdav backend for it? Or maybe we could make a webdav provider for it? Possibly rclone could even start/stop the mega process.

@CodingKoopa
Copy link
Contributor Author

Sure, I can think about what I could do ^^

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants