[1.11] Add XPathSelector::quote
#575
Adds a method to quote strings in XPath, similar to PDO::quote(). Quoting strings in XPath is surprisingly difficult in the edge case where the string contains both double quotes and single quotes. Got the algorithm from https://stackoverflow.com/a/1352556/1067003
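To make the edge case concrete, here is a minimal sketch, assuming the DOM extension is available. The sample document and the hand-written expression are illustrative only and are not taken from the PR. XPath 1.0 string literals have no escape character, so a value containing both quote styles has to be assembled with concat() from pieces that each avoid their own delimiter.

```php
<?php
// Illustrative document; the text node contains both quote styles.
$doc = new DOMDocument();
$doc->loadXML('<root><span>It\'s "quoted"</span></root>');
$xp = new DOMXPath($doc);

// Target text: It's "quoted"
// Wrapping the whole value in either quote style would end the literal early,
// so it is split into a double-quoted piece and a single-quoted piece.
$expr = '//span[text()=concat("It\'s ", \'"quoted"\')]';

$nodes = $xp->query($expr);
echo $nodes->length, "\n"; // prints 1
```

The quote() method proposed here is meant to generate that concat() expression automatically.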
Replaced the algorithm with one of my own design, which creates smaller, more readable XPath expressions (compare the "quoted" and "quoted2" values in the output below). Here is my test code:
<?php
class C
{
    /**
     * Quotes a string for use in an XPath expression.
     *
     * Example: new XPathSelector("//span[contains(text()," . XPathSelector::quote($string) . ")]")
     *
     * @param string $string
     *
     * @return string
     */
    public static function quote2(string $string): string
    {
        if (false === \strpos($string, '"')) {
            return '"' . $string . '"';
        }
        if (false === \strpos($string, '\'')) {
            return '\'' . $string . '\'';
        }
        // if the string contains both single and double quotes, split it into
        // the longest possible chunks that contain only one kind of quote,
        // wrap each chunk in the other kind, and concatenate them, e.g.:
        //    concat("'foo",'"bar')
        $sb = [];
        while (\strlen($string) > 0) {
            $bytesUntilSingleQuote = \strcspn($string, '\'');
            $bytesUntilDoubleQuote = \strcspn($string, '"');
            // whichever quote appears later is absent from the chunk, so it can delimit it
            $quoteMethod = ($bytesUntilSingleQuote > $bytesUntilDoubleQuote) ? "'" : '"';
            $bytesUntilQuote = max($bytesUntilSingleQuote, $bytesUntilDoubleQuote);
            $sb[] = $quoteMethod . \substr($string, 0, $bytesUntilQuote) . $quoteMethod;
            $string = \substr($string, $bytesUntilQuote);
        }
        $sb = \implode(',', $sb);
        return 'concat(' . $sb . ')';
    }
    /**
     * Quotes a string for use in an XPath expression.
     *
     * Example: new XPathSelector("//span[contains(text()," . XPathSelector::quote($string) . ")]")
     *
     * @author  Robert Rossney ( https://stackoverflow.com/users/19403/robert-rossney )
     *
     * @param string $string
     *
     * @return string
     */
    public static function quote(string $string): string
    {
        if (false === \strpos($string, '"')) {
            return '"' . $string . '"';
        }
        if (false === \strpos($string, '\'')) {
            return '\'' . $string . '\'';
        }
        // if the string contains both single and double quotes, construct an
        // expression that concatenates all non-double-quote substrings with
        // the quotes, e.g.:
        //    concat("'foo'", '"', "bar")
        $sb = 'concat(';
        $substrings = \explode('"', $string);
        for ($i = 0; $i < \count($substrings); ++$i) {
            $needComma = ($i > 0);
            if ('' !== $substrings[$i]) {
                if ($i > 0) {
                    $sb .= ', ';
                }
                $sb .= '"' . $substrings[$i] . '"';
                $needComma = true;
            }
            if ($i < (\count($substrings) - 1)) {
                if ($needComma) {
                    $sb .= ', ';
                }
                $sb .= "'\"'";
            }
        }
        $sb .= ')';
        return $sb;
    }
}
$tests = array(
    "foo",             // no quotes
    "\"foo",           // double quotes only
    "'foo",            // single quotes only
    "'foo\"bar",       // both; double quotes in mid-string
    "'foo\"bar\"baz",  // multiple double quotes in mid-string
    "'foo\"",          // string ends with double quotes
    "'foo\"\"",        // string ends with run of double quotes
    "\"'foo",          // string begins with double quotes
    "\"\"'foo",        // string begins with run of double quotes
    "'foo\"\"bar"      // run of double quotes in mid-string
);
foreach ($tests as $test) {
    $quoted = C::quote($test);
    $quoted2 = C::quote2($test);
    var_dump([
        'test' => $test,
        'quoted' => $quoted,
        'quoted2' => $quoted2,
        // 'eval' => eval("return $quoted;"),
    ]);
}
Running it outputs:
$ php wut.php
array(3) {
["test"]=>
string(3) "foo"
["quoted"]=>
string(5) ""foo""
["quoted2"]=>
string(5) ""foo""
}
array(3) {
["test"]=>
string(4) ""foo"
["quoted"]=>
string(6) "'"foo'"
["quoted2"]=>
string(6) "'"foo'"
}
array(3) {
["test"]=>
string(4) "'foo"
["quoted"]=>
string(6) ""'foo""
["quoted2"]=>
string(6) ""'foo""
}
array(3) {
["test"]=>
string(8) "'foo"bar"
["quoted"]=>
string(26) "concat("'foo", '"', "bar")"
["quoted2"]=>
string(21) "concat("'foo",'"bar')"
}
array(3) {
["test"]=>
string(12) "'foo"bar"baz"
["quoted"]=>
string(38) "concat("'foo", '"', "bar", '"', "baz")"
["quoted2"]=>
string(25) "concat("'foo",'"bar"baz')"
}
array(3) {
["test"]=>
string(5) "'foo""
["quoted"]=>
string(19) "concat("'foo", '"')"
["quoted2"]=>
string(18) "concat("'foo",'"')"
}
array(3) {
["test"]=>
string(6) "'foo"""
["quoted"]=>
string(24) "concat("'foo", '"', '"')"
["quoted2"]=>
string(19) "concat("'foo",'""')"
}
array(3) {
["test"]=>
string(5) ""'foo"
["quoted"]=>
string(19) "concat('"', "'foo")"
["quoted2"]=>
string(18) "concat('"',"'foo")"
}
array(3) {
["test"]=>
string(6) """'foo"
["quoted"]=>
string(24) "concat('"', '"', "'foo")"
["quoted2"]=>
string(19) "concat('""',"'foo")"
}
array(3) {
["test"]=>
string(9) "'foo""bar"
["quoted"]=>
string(31) "concat("'foo", '"', '"', "bar")"
["quoted2"]=>
string(22) "concat("'foo",'""bar')"
}
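The commented-out 'eval' line suggests wanting to confirm that each quoted expression evaluates back to the original string. One way to check that, sketched here under the assumption that the DOM extension is available, is to let an XPath engine evaluate string(<quoted>) and compare the result with the input. xpathQuote() is a free-function copy of quote2() from the test code above; xpathRoundTrip() is a hypothetical helper name.

```php
<?php
// Free-function copy of quote2() from the test code above.
function xpathQuote(string $string): string
{
    if (false === strpos($string, '"')) {
        return '"' . $string . '"';
    }
    if (false === strpos($string, "'")) {
        return "'" . $string . "'";
    }
    $parts = [];
    while ('' !== $string) {
        $untilSingle = strcspn($string, "'");
        $untilDouble = strcspn($string, '"');
        // The quote that appears later is absent from the chunk, so it can delimit it.
        $delimiter = ($untilSingle > $untilDouble) ? "'" : '"';
        $length = max($untilSingle, $untilDouble);
        $parts[] = $delimiter . substr($string, 0, $length) . $delimiter;
        $string = substr($string, $length);
    }
    return 'concat(' . implode(',', $parts) . ')';
}

// Hypothetical helper: evaluate the quoted literal and return the string it denotes.
function xpathRoundTrip(string $input): string
{
    $doc = new DOMDocument();
    $doc->loadXML('<root/>'); // any well-formed document works; only the literal is evaluated
    $xp = new DOMXPath($doc);
    return $xp->evaluate('string(' . xpathQuote($input) . ')');
}

// Each comparison is expected to be true if the quoting is lossless.
foreach (["foo", "\"foo", "'foo", "'foo\"bar", "'foo\"\"bar"] as $test) {
    var_dump(xpathRoundTrip($test) === $test);
}
```

Comparing against the evaluated value rather than the literal text means the check stays valid even if the exact shape of the concat() expression changes later.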
Can you add those tests to a unit test?
XPathSelector::quote
Should that be a dedicated XPathTests file, or should it join the existing XPath tests in https://github.com/chrome-php/chrome/blob/1.11/tests/PageTest.php? If a dedicated file, should it go directly in the tests folder or in a subfolder?
By the way, the main reason I rewrote it was that the original algorithm was very difficult to read and follow, and I needed the test cases to really believe that it worked. The new algorithm is much easier to read, to the point where I don't even need test cases to feel confident about it. (Nevertheless, this method is affected by an encoding bug fixed in #576.)
I was going to suggest creating a new file in the The test in the
A test ensures that the method will behave the same way for those cases even if someone changes it in the future, intentionally or not. It's also way easier to develop a feature by running tests than by running a standalone PHP file and checking the output of a var_dump manually. I should have asked for better tests when this class was introduced in the first place. It may have avoided bugs like the one in #576.
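As a rough illustration of what such a test could pin down, here is a table-driven sketch in plain PHP. The expected values are copied from the "quoted2" column of the var_dump output above; xpathQuote() is a copy of quote2(), and xpathQuoteCases() is a hypothetical name that would map naturally onto a PHPUnit data provider in the project's test suite.

```php
<?php
// Copy of quote2() from the test code above.
function xpathQuote(string $string): string
{
    if (false === strpos($string, '"')) {
        return '"' . $string . '"';
    }
    if (false === strpos($string, "'")) {
        return "'" . $string . "'";
    }
    $parts = [];
    while ('' !== $string) {
        $untilSingle = strcspn($string, "'");
        $untilDouble = strcspn($string, '"');
        $delimiter = ($untilSingle > $untilDouble) ? "'" : '"';
        $length = max($untilSingle, $untilDouble);
        $parts[] = $delimiter . substr($string, 0, $length) . $delimiter;
        $string = substr($string, $length);
    }
    return 'concat(' . implode(',', $parts) . ')';
}

// Hypothetical data provider: label => [input, expected XPath literal].
function xpathQuoteCases(): array
{
    return [
        'no quotes'                 => ['foo', '"foo"'],
        'double quotes only'        => ['"foo', '\'"foo\''],
        'single quotes only'        => ['\'foo', '"\'foo"'],
        'multiple double quotes'    => ['\'foo"bar"baz', 'concat("\'foo",\'"bar"baz\')'],
        'leading run of double quotes' => ['""\'foo', 'concat(\'""\',"\'foo")'],
    ];
}

$failures = [];
foreach (xpathQuoteCases() as $label => [$input, $expected]) {
    $actual = xpathQuote($input);
    if ($actual !== $expected) {
        $failures[] = "$label: expected $expected, got $actual";
    }
}
echo [] === $failures ? "all cases pass\n" : implode("\n", $failures) . "\n";
```

Each labelled case fails independently, which is the property a data provider gives over inspecting one large var_dump by eye.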
method to quote strings in XPath, similar to PDO::quote() / mysqli::real_escape_string sample usage: $xp->query("//span[contains(text()," . $xp->quote($string) . ")]") the algorithm is derived from Robert Rossney's research into XPath quoting published at https://stackoverflow.com/a/1352556/1067003 (but using an improved implementation I wrote myself, originally for chrome-php/chrome#575 )
Made a PHP built-in version, DOMXPath::quote. If php/php-src#13456 is accepted, there is no need for one in XPathSelector (or the implementation can just be a wrapper for the core one).
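The wrapper idea could look roughly like the sketch below. It assumes the built-in is exposed as a static DOMXPath::quote() method; quoteForXPath() is a hypothetical name, and the fallback is a deliberately simple split-on-double-quote variant rather than the implementation from this PR, so its output may differ textually from the built-in while denoting the same string.

```php
<?php
// Hypothetical wrapper: prefer the core implementation when it exists.
function quoteForXPath(string $string): string
{
    if (method_exists(DOMXPath::class, 'quote')) {
        return DOMXPath::quote($string);
    }
    if (false === strpos($string, '"')) {
        return '"' . $string . '"';
    }
    if (false === strpos($string, "'")) {
        return "'" . $string . "'";
    }
    // Both quote styles present: double-quote every piece between the double
    // quotes and join the pieces with a single-quoted '"' literal.
    // Empty pieces become "" literals, which concat() accepts.
    return 'concat("' . implode('",\'"\',"', explode('"', $string)) . '")';
}

// Usage: evaluate the quoted literal to confirm it denotes the original string.
$doc = new DOMDocument();
$doc->loadXML('<root/>');
$xp = new DOMXPath($doc);
$input = "a'b\"c";
var_dump($xp->evaluate('string(' . quoteForXPath($input) . ')') === $input);
```

Checking for the method at call time keeps one code path for callers while letting newer PHP versions use the core implementation.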
Method to quote strings in XPath, similar to PDO::quote() / mysqli::real_escape_string. Sample usage: $xp->query("//span[contains(text()," . $xp->quote($string) . ")]") The algorithm is derived from Robert Rossney's research into XPath quoting published at https://stackoverflow.com/a/1352556/1067003 But using an improved implementation I wrote myself, originally for chrome-php/chrome#575
Native support has been accepted upstream and will be part of the PHP 8.4.0 release, so there should be no need for chrome-php to have it, I guess. Sample usage: new XPathSelector("//span[contains(text()," . DOMXPath::quote($string) . ")]") Closing.