Update documentation for labels, keyboard shortcuts, deleting rows. #1030

Merged
merged 2 commits into from
Jan 5, 2024
Binary file modified docs/_static/dataset/dataset_add_label_all_button.png
Binary file modified docs/_static/dataset/dataset_add_label_all_short.png
Binary file modified docs/_static/dataset/dataset_add_label_row.png
Binary file modified docs/_static/dataset/dataset_add_label_row_tagged.png
Binary file modified docs/_static/dataset/dataset_add_label_tag.png
Binary file added docs/_static/dataset/dataset_view_trash.png
12 changes: 12 additions & 0 deletions docs/datasets/dataset_configure.md
@@ -45,6 +45,18 @@ items.
You can choose which embedding to use as the default for the current dataset across all users. This
embedding will be used to perform semantic and concept search.

### Tags

For projects with many datasets, tags can be used to organize datasets into groups, which are
displayed on the left-hand side.

### Keyboard shortcuts for fast labeling

The settings modal also lets you configure keyboard shortcuts that toggle labels on a row, enabling
fast labeling.

<img src="../_static/dataset/dataset_settings_keyboard_shortcuts.png"></img>

## From Python

You can provide [`DatasetSettings`](#lilac.DatasetSettings) when you create a new dataset via
157 changes: 157 additions & 0 deletions docs/datasets/dataset_delete_rows.md
@@ -0,0 +1,157 @@
# Deleting rows

## From the UI

Individual rows can be deleted from the UI, and deleted rows can be seen in the trash.

Deleted rows are not removed from disk, but they are excluded from row counts, searches, filters,
and signal computation.

### Single row

To delete a single row from the UI, click the trash icon next to the row, or use the "Delete" or
"Backspace" keys on your keyboard (only in single-item view).

<img src="../_static/dataset/dataset_delete_single_row.png"></img>

Once confirmed, the row will be removed from the total count.

### Multiple rows

You can also delete every row that matches the current search or filter by using the bulk-delete
icon.

In the example below, we've searched for "golden" and we will delete all 910 rows that contain this
keyword.

<img src="../_static/dataset/dataset_delete_multi_row.png"></img>

### Viewing trashed rows and restoring them

When rows are deleted in Lilac, they are not deleted from disk, so we can always view the trash and
restore them if we made a mistake.

First, let's open the schema to see how many rows we've deleted.

<img src="../_static/dataset/dataset_view_deleted_count.png"></img>

Clicking the eye icon will let us view the 910 deleted rows.

<img src="../_static/dataset/dataset_view_trash.png"></img>

Individual rows can be restored by clicking the "back" arrow. The entire trash can be restored by
clicking the "back" arrow in the schema.

## From Python

First, get the IMDB dataset:

```python
import lilac as ll

ll.set_project_dir('~/my_project')

dataset = ll.get_dataset('local', 'imdb')
```

### Deleting individual rows

We can delete individual rows from Python by using the [](#Dataset.delete_rows) method.

First, let's select the first row and delete it. We need to explicitly ask for `ll.ROWID`, which
uniquely identifies a row; we'll use it to delete the row.

```python
first_row = list(dataset.select_rows(['*', ll.ROWID], limit=1))

row_id = first_row[0][ll.ROWID]
print(row_id)
```

Output:

```
0003076800f1471f8f4c8a1b2deda742
```

Let's delete this row.

```python
dataset.delete_rows(row_ids=[row_id])
```

Note that deleting a row only adds a special `__deleted__` label to it. Row counts and the results
of `select_rows` automatically exclude rows that carry this label.
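To make this soft-delete behaviour concrete, here is a minimal in-memory sketch. The class `SoftDeleteTable` and its methods are hypothetical illustrations, not Lilac's API or implementation; they only mirror the behaviour described above: deleting attaches a `__deleted__` label, counts skip labeled rows, and an `include_deleted` flag brings them back into view.

```python
import datetime
from typing import Any, Iterator

DELETED = '__deleted__'  # label name as it appears in Lilac's row output


class SoftDeleteTable:
    """Hypothetical in-memory model of soft deletion (not Lilac's implementation)."""

    def __init__(self, rows: dict[str, dict[str, Any]]) -> None:
        # Rows are keyed by row ID; deleting never removes an entry.
        self.rows = rows

    def delete_rows(self, row_ids: list[str]) -> None:
        # Deleting only attaches a label; the row data stays in place.
        now = datetime.datetime.now()
        for row_id in row_ids:
            self.rows[row_id][DELETED] = {'label': 'true', 'created': now}

    def select_rows(self, include_deleted: bool = False) -> Iterator[dict[str, Any]]:
        # Labeled rows are hidden unless the caller explicitly asks for them.
        for row_id, row in self.rows.items():
            if include_deleted or DELETED not in row:
                yield {'__rowid__': row_id, **row}

    def count(self) -> int:
        # Counts come from the filtered view, so deleted rows drop out.
        return sum(1 for _ in self.select_rows())


table = SoftDeleteTable({'a': {'text': 'first'}, 'b': {'text': 'second'}})
table.delete_rows(['a'])
print(table.count())  # 1
print(len(list(table.select_rows(include_deleted=True))))  # 2
```

Because the data is never dropped, undoing a deletion in this model is just a matter of removing the label again.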

We can pass `include_deleted=True` to `select_rows` to include deleted rows in the results. Let's
view the deleted row.

```python
first_row = list(dataset.select_rows(['*', ll.ROWID], limit=1, include_deleted=True))
print(first_row[0])
```

Output:

```py
{
  '__rowid__': '0003076800f1471f8f4c8a1b2deda742',
  'text': 'If you want to truly experience the magic (?) of Don Dohler, then check out "Alien Factor" or maybe "Fiend"...',
  'label': 'neg',
  '__hfsplit__': 'test',
  '__deleted__': {
    'label': 'true',
    'created': datetime.datetime(2023, 9, 20, 10, 16, 15, 545277)
  }
}
```
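Because the deletion is stored as an ordinary label, code that consumes rows selected with `include_deleted=True` can tell deleted rows apart by looking at the `__deleted__` entry. A small sketch, assuming the row dictionaries have exactly the shape printed above; the helper name `deletion_time` is hypothetical and not part of Lilac:

```python
import datetime
from typing import Any, Optional


def deletion_time(row: dict[str, Any]) -> Optional[datetime.datetime]:
    """Return when a row was deleted, or None if it is not deleted.

    Hypothetical helper. Assumes the row shape printed above: a '__deleted__'
    entry holding {'label': 'true', 'created': datetime}.
    """
    deleted = row.get('__deleted__')
    if not deleted or deleted.get('label') != 'true':
        return None
    return deleted.get('created')


row = {
    '__rowid__': '0003076800f1471f8f4c8a1b2deda742',
    'label': 'neg',
    '__deleted__': {
        'label': 'true',
        'created': datetime.datetime(2023, 9, 20, 10, 16, 15, 545277),
    },
}
print(deletion_time(row))  # 2023-09-20 10:16:15.545277
print(deletion_time({'__rowid__': 'abc'}))  # None
```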

### Deleting a selection of rows

Deleting individual rows can be cumbersome in Python, so Lilac provides a way to delete multiple
rows at once through a query selection.

There are two additional arguments to [](#Dataset.delete_rows) which mirror the two arguments in
[](#Dataset.select_rows). See [Querying a Dataset](./dataset_query.md) for more details.

- `searches`: A list of searches; rows that match them are deleted.
- `filters`: A list of filters; rows that match them are deleted.

We can use these to delete all results that match the searches and filters.

Let's delete all rows whose text is shorter than 1,000 characters, using the enriched
`text_statistics.num_characters` field.

```python
dataset.delete_rows(
    filters=[
        (('text', 'text_statistics', 'num_characters'), 'less', 1000)
    ]
)
```
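Each filter above is a tuple of `(path, op, value)`, where the path walks into the nested, enriched row. The sketch below shows one plausible way such a tuple selects rows. `get_path`, `matches`, and the `OPS` table are hypothetical and cover only a few comparison operators; combining filters with a logical AND and the simplified nested row layout are assumptions for illustration, not Lilac's query engine or storage format.

```python
import operator
from typing import Any, Callable

# Hypothetical operator table; only a few comparison ops are sketched here.
OPS: dict[str, Callable[[Any, Any], Any]] = {
    'equals': operator.eq,
    'less': operator.lt,
    'less_equal': operator.le,
    'greater': operator.gt,
    'greater_equal': operator.ge,
}


def get_path(row: dict[str, Any], path: tuple[str, ...]) -> Any:
    """Walk a nested dict along `path`; return None if any key is missing."""
    value: Any = row
    for key in path:
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def matches(row: dict[str, Any], filters: list[tuple[tuple[str, ...], str, Any]]) -> bool:
    """A row matches when every (path, op, value) filter holds (assumed AND)."""
    for path, op, target in filters:
        value = get_path(row, path)
        if value is None or not OPS[op](value, target):
            return False
    return True


# Simplified, assumed row layout: enrichments nested under the 'text' field.
rows = {
    'r1': {'text': {'text_statistics': {'num_characters': 640}}},
    'r2': {'text': {'text_statistics': {'num_characters': 2400}}},
}
filters = [(('text', 'text_statistics', 'num_characters'), 'less', 1000)]
to_delete = [row_id for row_id, row in rows.items() if matches(row, filters)]
print(to_delete)  # ['r1']
```

Rows that lack the enriched field never match here, which is why the signal has to be computed before filtering on it.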

### Restoring deleted rows.
Collaborator: remove trailing period


Just like [](#Dataset.delete_rows), there is a parallel [](#Dataset.restore_rows) with the same
arguments.

We could restore all rows with:

```python
dataset.restore_rows()
```

Or we could restore a single row:

```python
dataset.restore_rows(row_ids=['0003076800f1471f8f4c8a1b2deda742'])
```

Or we could restore only the rows that match a filter:

```python
dataset.restore_rows(
    filters=[
        (('text', 'text_statistics', 'num_characters'), 'less', 1000)
    ]
)
```
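Since deletion is just a label, restoring can be pictured as removing that label from the selected rows, where passing no selection means every row. A minimal in-memory sketch under that assumption: the `restore_rows` function below is a hypothetical stand-in that supports `row_ids` plus a plain Python predicate in place of Lilac's `searches` and `filters`; it is not Lilac's implementation.

```python
from typing import Any, Callable, Optional

DELETED = '__deleted__'


def restore_rows(
    rows: dict[str, dict[str, Any]],
    row_ids: Optional[list[str]] = None,
    predicate: Optional[Callable[[dict[str, Any]], bool]] = None,
) -> int:
    """Remove the deletion label from selected rows; return how many were restored.

    Hypothetical sketch. With neither argument, every deleted row is restored.
    """
    selected = set(row_ids) if row_ids is not None else None
    restored = 0
    for row_id, row in rows.items():
        if selected is not None and row_id not in selected:
            continue
        if predicate is not None and not predicate(row):
            continue
        # Only rows that actually carried the label count as restored.
        if row.pop(DELETED, None) is not None:
            restored += 1
    return restored


rows = {
    'a': {'num_characters': 640, DELETED: {'label': 'true'}},
    'b': {'num_characters': 2400, DELETED: {'label': 'true'}},
    'c': {'num_characters': 300},
}
print(restore_rows(rows, predicate=lambda r: r['num_characters'] < 1000))  # 1
print(restore_rows(rows))  # 1
```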
16 changes: 10 additions & 6 deletions docs/datasets/dataset_labels.md
@@ -20,6 +20,13 @@ Once committing the tag, we can see that the row has a new label:

<img src="../_static/dataset/dataset_add_label_row_tagged.png"></img>

#### Keyboard shortcuts

In the [dataset settings](../datasets/dataset_configure.md) menu, you can also configure keyboard
shortcuts to toggle labels on individual rows.

This allows you to use the arrow keys to move through rows and label them quickly with custom
shortcuts.
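The toggle behaviour can be pictured as a key-to-label map applied to the row currently in view, with arrow keys moving between rows. This is a hypothetical sketch of the interaction model only: the key names, the `LabelingSession` class, and the label names are made up for illustration and are not Lilac's UI code.

```python
from dataclasses import dataclass, field


@dataclass
class LabelingSession:
    """Hypothetical model of shortcut-driven labeling (not Lilac's UI code)."""

    shortcuts: dict[str, str]  # key -> label name, as configured in settings
    num_rows: int
    current: int = 0
    labels: dict[int, set[str]] = field(default_factory=dict)

    def press(self, key: str) -> None:
        if key == 'ArrowRight':
            self.current = min(self.current + 1, self.num_rows - 1)
        elif key == 'ArrowLeft':
            self.current = max(self.current - 1, 0)
        elif key in self.shortcuts:
            # Toggle: add the label if absent, remove it if already present.
            row_labels = self.labels.setdefault(self.current, set())
            row_labels ^= {self.shortcuts[key]}


session = LabelingSession(shortcuts={'g': 'good', 'b': 'bad'}, num_rows=3)
for key in ['g', 'ArrowRight', 'b', 'b', 'ArrowRight', 'g']:
    session.press(key)
print(session.labels)  # {0: {'good'}, 1: set(), 2: {'good'}}
```

Pressing the same shortcut twice on a row cancels itself out, which is what makes a single key per label enough.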

### Label multiple rows

Labeling individual rows can be time-consuming, so Lilac provides a "Label all" feature, which
@@ -32,16 +39,13 @@ and click on the first histogram:

We can see that we're in the cut of the dataset with 50,335 rows, about half of our dataset.

<img src="../_static/dataset/dataset_add_label_label_all_button.png"></img>
<img src="../_static/dataset/dataset_add_label_all_button.png"></img>

Now, we can click "Label all", attach a label, and all 50,335 rows will be labeled.
Now, we can click the button with multiple tags, attach a label, and all 50,335 rows will be
labeled.

<img width=480 src="../_static/dataset/dataset_add_label_all_short.png"></img>

Once we click the label, the results in view will have that label:

<img src="../_static/dataset/dataset_add_label_short_labels.png"></img>

### Filtering rows with the label

To find all rows with a given label, we can use the search box to filter by the label.
1 change: 1 addition & 0 deletions docs/index.rst
@@ -36,6 +36,7 @@
datasets/dataset_load.md
datasets/dataset_explore.md
datasets/dataset_configure.md
datasets/dataset_delete_rows.md
datasets/dataset_edit.md
datasets/dataset_labels.md
datasets/dataset_concepts.md