Add integration with huggingface dataset API #118

Open
SebieF opened this issue Nov 4, 2024 · 0 comments
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

Collaborator

SebieF commented Nov 4, 2024

Many users would find it convenient to simply reference a dataset on Hugging Face instead of manually providing the sequence and label FASTA files.

For example, the configuration could look like this:

```yaml
sequence_file: null
hf_dataset:
  path: proteinea/fluorescence
  name: null  # Optional: selects a specific dataset configuration
  sequence_column: primary
  target_column: log_fluorescence
protocol: sequence_to_value  # log_fluorescence is a continuous (regression) target
embeddings_file: per_protein_embeddings.h5
model_choice: FNN
optimizer_choice: adam
learning_rate: 0.001
dropout_rate: 0.25
loss_choice: mean_squared_error
```
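A minimal sketch of how the proposed `hf_dataset` section might be consumed, assuming Python and the Hugging Face `datasets` library. The names `HFDatasetConfig`, `parse_hf_config`, `rows_to_fasta` and `load_fasta_from_hub` are hypothetical and not part of biotrainer. The output format (FASTA headers carrying `TARGET=` and `SET=` attributes) is an assumption modeled on biotrainer's annotated sequence files, so check it against the current input-format documentation before relying on it.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List, Optional


@dataclass
class HFDatasetConfig:
    """Hypothetical mirror of the proposed `hf_dataset` config section."""
    path: str
    sequence_column: str
    target_column: str
    name: Optional[str] = None  # optional dataset configuration name


def parse_hf_config(config: Dict[str, Any]) -> Optional[HFDatasetConfig]:
    """Return the hf_dataset section, or None if it is absent.

    Rejects configs that set both sequence_file and hf_dataset.
    """
    hf = config.get("hf_dataset")
    if hf is None:
        return None
    if config.get("sequence_file") is not None:
        raise ValueError("Specify either sequence_file or hf_dataset, not both")
    missing = [k for k in ("path", "sequence_column", "target_column") if not hf.get(k)]
    if missing:
        raise ValueError(f"hf_dataset is missing required keys: {missing}")
    return HFDatasetConfig(
        path=hf["path"],
        name=hf.get("name"),
        sequence_column=hf["sequence_column"],
        target_column=hf["target_column"],
    )


def rows_to_fasta(
    rows: Iterable[Dict[str, Any]],
    cfg: HFDatasetConfig,
    split: str,
    id_prefix: Optional[str] = None,
) -> str:
    """Convert dataset rows into FASTA records with (assumed) TARGET/SET header attributes."""
    prefix = id_prefix or split  # prefix keeps sequence ids unique across splits
    records: List[str] = []
    for i, row in enumerate(rows):
        seq = row[cfg.sequence_column]
        target = row[cfg.target_column]
        records.append(f">{prefix}_{i} TARGET={target} SET={split}\n{seq}")
    return "\n".join(records) + "\n" if records else ""


def load_fasta_from_hub(cfg: HFDatasetConfig, split_map: Dict[str, str]) -> str:
    """Download the dataset and convert each mapped split (requires network access).

    split_map maps Hub split names (e.g. "train") to SET values (e.g. "train", "val", "test").
    """
    # Imported lazily so the pure helpers above work without `datasets` installed.
    from datasets import load_dataset

    dataset_dict = load_dataset(cfg.path, name=cfg.name)
    return "".join(
        rows_to_fasta(dataset_dict[hub_split], cfg, set_name, id_prefix=hub_split)
        for hub_split, set_name in split_map.items()
    )
```

Converting to FASTA up front would leave the rest of the training pipeline unchanged. The Hub split names passed in `split_map` (for example `{"train": "train", "validation": "val", "test": "test"}`) are an assumption and differ between datasets.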

Further resources:
