Commit

Merge pull request #65 from SL-Gundam/Correct_Ukrainian_Regex
The Russian quoteHeadersRegex should not check for "wrote:"
willdurand authored Feb 1, 2018
2 parents 95d269a + 90a2725 commit 00adf16
Showing 3 changed files with 43 additions and 1 deletion.
2 changes: 1 addition & 1 deletion src/EmailReplyParser/Parser/EmailParser.php
@@ -35,7 +35,7 @@ class EmailParser
'/^\s*(Le(?:(?!^>*\s*Le\b|\bécrit:).){0,1000}écrit :)$/ms', // Le DATE, NAME <EMAIL> a écrit :
'/^\s*(El(?:(?!^>*\s*El\b|\bescribió:).){0,1000}escribió:)$/ms', // El DATE, NAME <EMAIL> escribió:
'/^\s*(Il(?:(?!^>*\s*Il\b|\bscritto:).){0,1000}scritto:)$/ms', // Il DATE, NAME <EMAIL> ha scritto:
-'/^[\S\s]+ (написа(л|ла|в)+|wrote)+:/msu', // Everything before написал:
+'/^[\S\s]+ (написа(л|ла|в)+)+:$/msu', // Everything before написал: not ending on wrote:
'/^\s*(Op\s.+?schreef.+:)$/ms', // Il DATE, schreef NAME <EMAIL>:
'/^\s*((W\sdniu|Dnia)\s.+?(pisze|napisał(\(a\))?):)$/msu', // W dniu DATE, NAME <EMAIL> pisze|napisał:
'/^\s*(Den\s.+\sskrev\s.+:)$/m', // Den DATE skrev NAME <EMAIL>:
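
The effect of this one-line change can be checked in isolation. The sketch below copies the old and new patterns from the diff and runs them against two made-up inputs; the helper `matchesQuoteHeader`, both sample strings, and the address `user@example.com` are illustrative assumptions and are not part of the library or its fixtures.

```php
<?php
// Hypothetical helper (not part of EmailReplyParser): true when the pattern matches.
function matchesQuoteHeader(string $pattern, string $text): bool
{
    return preg_match($pattern, $text) === 1;
}

// Patterns copied verbatim from the diff.
$old = '/^[\S\s]+ (написа(л|ла|в)+|wrote)+:/msu';
$new = '/^[\S\s]+ (написа(л|ла|в)+)+:$/msu';

// Made-up reply whose own text happens to contain "wrote:" and has no quoted part.
$plainReply = "Thanks for the update.\nSomething wrote:\nMore of my own reply text.";

// Made-up reply followed by a Ukrainian quote header (address is a placeholder).
$quotedReply = "My reply.\n\n29 грудня 2017 р. о 19:09 <user@example.com> написав:\n\n> quoted text";

// The old pattern treats "Something wrote:" as a quote header; the new one does not.
var_dump(matchesQuoteHeader($old, $plainReply));
var_dump(matchesQuoteHeader($new, $plainReply));

// The new pattern still recognises the Ukrainian header line.
var_dump(matchesQuoteHeader($new, $quotedReply));
```

Because `[\S\s]+` is greedy and the old alternation accepted `wrote`, any line of the reply ending in " wrote:" was enough to trigger the rule; dropping that alternative and anchoring with `$` limits the rule to lines that end in the Cyrillic verb forms.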
15 changes: 15 additions & 0 deletions tests/EmailReplyParser/Tests/Parser/EmailParserTest.php
@@ -162,6 +162,21 @@ public function testEmailUkrainian()
$email = $this->parser->parse($this->getFixtures('email_23.txt'));
$fragments = $email->getFragments();
$this->assertEquals(static::COMMON_FIRST_FRAGMENT, trim($fragments[0]));

// Find flaw in original Ukrainian regex "/^[\S\s]+ (написа(л|ла|в)+|wrote)+:/msu"
$email = $this->parser->parse($this->getFixtures('email_23_1.txt'));
$fragments = $email->getFragments();
$this->assertEquals(<<<EMAIL
Fusce bibendum, quam hendrerit sagittis tempor, dui turpis tempus erat, pharetra sodales ante sem sit amet metus.
Nulla malesuada, orci non vulputate lobortis, massa felis pharetra ex, convallis consectetur ex libero eget ante.
Nam vel turpis posuere, rhoncus ligula in, venenatis orci. Duis interdum venenatis ex a rutrum.
Something wrote:
Duis ut libero eu lectus consequat consequat ut vel lorem. Vestibulum convallis lectus urna,
et mollis ligula rutrum quis. Fusce sed odio id arcu varius aliquet nec nec nibh.
EMAIL
, (string) $fragments[0]);
}

public function testEmailSignatureWithEqual()
27 changes: 27 additions & 0 deletions tests/Fixtures/email_23_1.txt
@@ -0,0 +1,27 @@
Fusce bibendum, quam hendrerit sagittis tempor, dui turpis tempus erat, pharetra sodales ante sem sit amet metus.
Nulla malesuada, orci non vulputate lobortis, massa felis pharetra ex, convallis consectetur ex libero eget ante.
Nam vel turpis posuere, rhoncus ligula in, venenatis orci. Duis interdum venenatis ex a rutrum.

Something wrote:
Duis ut libero eu lectus consequat consequat ut vel lorem. Vestibulum convallis lectus urna,
et mollis ligula rutrum quis. Fusce sed odio id arcu varius aliquet nec nec nibh.

29 грудня 2017 р. о 19:09 <[email protected]> написав:

> Ticket answered by ahbladeroot
>
> Type:
> URL:
> ------------------------
> Він був добре освіченою людиною з безліччю літературних і художніх
> інтересів, і будинок в Кул-Парку містив велику бібліотеку та велику
> колекцію творів мистецтва. Мав також будинок у Лондоні, де подружжя
> проводило значну частину часу, даючи щотижневі [салони](
> https://uk.wikipedia.org/wiki/%D0%A1%D0%B0%D0%BB%D0%BE%D0%BD).
>
> --
> http://
> Reliable solutions for online projects - dedicated servers, CDN, domains
> and more...
>
