Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improvements #1

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

adrienbrault
Copy link

@adrienbrault adrienbrault commented Feb 3, 2023

You should also add the package to https://packagist.org

@adrienbrault adrienbrault marked this pull request as ready for review February 3, 2023 15:25
Copy link

@cosmastech cosmastech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome! Thanks for doing it.

I have added some suggestions to make it a bit more extensible and add in some PHP features from 8.0+.

I'm of course not the package maintainer, just some things I saw that might be helpful.


public static function create(): self
{
$raw_chars = file_get_contents(dirname(__FILE__) . "/../data/characters.json");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to have the filepaths passed as parameters?

public static function create(): self
{
$raw_chars = file_get_contents(dirname(__FILE__) . "/../data/characters.json");
$byte_encoder = json_decode($raw_chars, true);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be convenient to have JSON_THROW_ON_ERROR as the fourth parameter

"minimum-stability": "stable",
"require": {}
"require": {
"php": ">7.4"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular reason to go with such an old PHP version? 7.4 is past its EOL. I would suggest just going with 8.0 here (security patches stopping in November of this year), if not 8.1 or 8.2

"minimum-stability": "stable",
"require": {}
"require": {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should ext-mbstring also be added?

}
return -1;
}
use CodeRevolutionPlugins\GPT3Encoder\Encoder;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole file would serve better as an example in a README


public function encode(string $text): array
{
$bpe_tokens = array();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
$bpe_tokens = array();
$bpe_tokens = [];

preg_match_all("#'s|'t|'re|'ve|'m|'ll|'d| ?\p{L}+| ?\p{N}+| ?[^\s\p{L}\p{N}]+|\s+(?!\S)|\s+#u", $text, $matches);
if(!isset($matches[0]) || count($matches[0]) == 0)
{
error_log('Failed to match string: ' . $text);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to throw a custom RuntimeException here and let the caller determine how to handle it.

return $bpe_tokens;
}

$cache = array();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
$cache = array();
$cache = [];

Comment on lines +88 to +89
$new_tokens = array();
$chars = array();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
$new_tokens = array();
$chars = array();
$new_tokens = $chars = [];


private function split($str, $len = 1)
{
$arr = [];

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should have consistent formatting.

@adrienbrault
Copy link
Author

This is awesome! Thanks for doing it.

I have added some suggestions to make it a bit more extensible and add in some PHP features from 8.0+.

I'm of course not the package maintainer, just some things I saw that might be helpful.

I am not planning on working on this MR anymore.

I am using https://github.com/Gioni06/GPT3Tokenizer now

@cosmastech
Copy link

This is awesome! Thanks for doing it.
I have added some suggestions to make it a bit more extensible and add in some PHP features from 8.0+.
I'm of course not the package maintainer, just some things I saw that might be helpful.

I am not planning on working on this MR anymore.

I am using https://github.com/Gioni06/GPT3Tokenizer now

Thanks for the tip! 👍

DrDub added a commit to DrDub/openai-cookbook that referenced this pull request May 24, 2023
The PHP alternative is discontinued, its author is using the project to which this patch points to, which is also available in the PHP package manager.

See: CodeRevolutionPlugins/GPT-3-Encoder-PHP#1 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants