Fix last minute bugs (#595)
- Compute concept scores for all datasets
- Show the max score in the concept preview
- Pin the "ML" tag and the "OpenOrca" dataset so they are listed first.

https://huggingface.co/spaces/lilacai/lilac

Meta tags fixed:
https://lilacml.com/
https://metatags.io/?url=https%3A%2F%2Flilacml.com%2F
nsthorat authored Aug 24, 2023
1 parent 0c88092 commit 910e9e2
Showing 21 changed files with 131 additions and 35 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -4,7 +4,7 @@
[![Downloads](https://static.pepy.tech/badge/lilac/month)](https://pepy.tech/project/lilac)
[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![Twitter](https://img.shields.io/twitter/follow/lilac_ai)](https://twitter.com/lilac_ai)
[![](https://dcbadge.vercel.app/api/server/YpGxQMyk?compact=true&style=flat)](https://discord.gg/YpGxQMyk)
[![](https://dcbadge.vercel.app/api/server/jNzw9mC8pp?compact=true&style=flat)](https://discord.gg/jNzw9mC8pp)

> **NEW: Try the [Lilac hosted demo with pre-loaded datasets](https://lilacai-lilac.hf.space/)**
Binary file added docs/_static/logo.png
Binary file added docs/_static/logo_wide.png
2 changes: 1 addition & 1 deletion docs/blog/introducing-lilac.md
@@ -1,5 +1,5 @@
```{tip}
Try the Lilac hosted **[demo on HuggingFace](https://huggingface.co/spaces/lilacai/lilac)** or find
Try the Lilac hosted **[demo on HuggingFace](https://lilacai-lilac.hf.space/)** or find
us on GitHub: **[github.com/lilacai/lilac](https://github.com/lilacai/lilac)**
```

2 changes: 1 addition & 1 deletion docs/concepts/concept_metrics.md
@@ -1,7 +1,7 @@
# Concept metrics

```{tip}
[Try Lilac concepts on HuggingFace](https://huggingface.co/spaces/lilacai/lilac)
[Try Lilac concepts on HuggingFace](https://lilacai-lilac.hf.space/)
```

We can quantify the quality of a concept using an [F1 score](https://en.wikipedia.org/wiki/F-score)
2 changes: 1 addition & 1 deletion docs/concepts/concept_tuning.md
@@ -1,7 +1,7 @@
# Tuning a concept

```{tip}
[Try Lilac concepts on HuggingFace](https://huggingface.co/spaces/lilacai/lilac)
[Try Lilac concepts on HuggingFace](https://lilacai-lilac.hf.space/)
```

Often times, after creating a concept or using an off-the-shelf-concept, the concept needs to be
2 changes: 1 addition & 1 deletion docs/concepts/concept_use.md
@@ -17,7 +17,7 @@ for details on understanding the quality of a concept with an embedding.
## From the UI

```{tip}
[Try Lilac concepts on HuggingFace](https://huggingface.co/spaces/lilacai/lilac)
[Try Lilac concepts on HuggingFace](https://lilacai-lilac.hf.space/)
```

To use a concept from the UI, click the concept from the Navigation panel, which will open the
2 changes: 1 addition & 1 deletion docs/concepts/concepts.md
@@ -1,7 +1,7 @@
# Concepts

```{tip}
[Try Lilac concepts on HuggingFace](https://huggingface.co/spaces/lilacai/lilac)
[Try Lilac concepts on HuggingFace](https://lilacai-lilac.hf.space/)
```

## What is a concept?
2 changes: 1 addition & 1 deletion docs/datasets/dataset_export.md
@@ -1,7 +1,7 @@
# Export data

```{tip}
[Download enrichments for popular datasets on HuggingFace](https://huggingface.co/spaces/lilacai/lilac)
[Download enrichments for popular datasets on HuggingFace](https://lilacai-lilac.hf.space/)
```

Once we've computed signals and concepts over a dataset, it can be very useful to download the
4 changes: 2 additions & 2 deletions docs/huggingface/huggingface_spaces.md
@@ -1,7 +1,7 @@
# Duplicate the HuggingFace demo

Lilac hosts a [HuggingFace spaces demo](https://huggingface.co/spaces/lilacai/lilac) so you can try
Lilac before installing it.
Lilac hosts a [HuggingFace spaces demo](https://lilacai-lilac.hf.space/) so you can try Lilac before
installing it.

Thanks to HuggingFace, this space can be duplicated and customized with your own data. You can
decide to make your duplicated private for use with private or sensitive data.
7 changes: 7 additions & 0 deletions docs/index.rst
@@ -1,5 +1,12 @@
🌸 Lilac
=================
.. meta::
   :description: Analyze, structure and clean unstructured data with AI
   :keywords: datasets, AI, machine learning, unstructured, lilac

.. raw:: html

   <meta property="og:image" content="https://lilacml.com/_static/logo_wide.png" />

.. include:: welcome.md
   :parser: myst_parser.sphinx_
2 changes: 1 addition & 1 deletion docs/signals/signals.md
@@ -1,7 +1,7 @@
# Signals

```{tip}
[Try Lilac signals on HuggingFace](https://huggingface.co/spaces/lilacai/lilac)
[Try Lilac signals on HuggingFace](https://lilacai-lilac.hf.space/)
```

There are two types of [](#Signal) base classes based on the input:
9 changes: 5 additions & 4 deletions docs/welcome.md
@@ -1,21 +1,22 @@
```{tip}
Try the Lilac hosted **[demo on HuggingFace](https://huggingface.co/spaces/lilacai/lilac)** or find us on GitHub: **[github.com/lilacai/lilac](https://github.com/lilacai/lilac)**
Try the Lilac hosted **[demo on HuggingFace](https://lilacai-lilac.hf.space/)** or find us on GitHub: **[github.com/lilacai/lilac](https://github.com/lilacai/lilac)**
```

[![GitHub Repo stars](https://img.shields.io/github/stars/lilacai/lilac?logo=github&label=lilacai%2Flilac)](https://github.com/lilacai/lilac)
[![Downloads](https://static.pepy.tech/badge/lilac/month)](https://pepy.tech/project/lilac)
[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![Twitter](https://img.shields.io/twitter/follow/lilac_ai)](https://twitter.com/lilac_ai)
[![](https://dcbadge.vercel.app/api/server/YpGxQMyk?compact=true&style=flat)](https://discord.gg/YpGxQMyk)
[![](https://dcbadge.vercel.app/api/server/jNzw9mC8pp?compact=true&style=flat)](https://discord.gg/jNzw9mC8pp)

## 👋 Welcome

[Lilac](http://lilacml.com) is an open-source product that helps you **analyze**, **structure**, and
**clean** unstructured data with AI.

See the [Installation](./getting_started/installation.md) and
[Quick Start](./getting_started/quickstart.md) guides to get started. Read the
[Announcement Blog](./blog/introducing-lilac.md) for more details.
[Quick Start](./getting_started/quickstart.md) guides to get started.

Read the [Announcement Blog](./blog/introducing-lilac.md) for more details.

<video loop muted autoplay controls src="https://github-production-user-asset-6210df.s3.amazonaws.com/2294279/260771834-cb1378f8-92c1-4f2a-9524-ce5ddd8e0c53.mp4"></video>

2 changes: 2 additions & 0 deletions lilac/signals/text_statistics.py
@@ -9,6 +9,7 @@

SPACY_LANG_MODEL = 'en_core_web_sm'
SPACY_BATCH_SIZE = 128
SPACY_MAX_LENGTH = 2_000_000

NUM_CHARS = 'num_characters'
READABILITY = 'readability'
@@ -55,6 +56,7 @@ def setup(self) -> None:
disable=[
'parser', 'tagger', 'ner', 'lemmatizer', 'textcat', 'custom', 'tok2vec', 'attribute_ruler'
])
self._lang.max_length = SPACY_MAX_LENGTH

@override
def compute(self, data: Iterable[RichData]) -> Iterable[Optional[Item]]:
32 changes: 32 additions & 0 deletions lilac_hf_space.yml
@@ -203,6 +203,38 @@ signals:
  - signal_name: text_statistics
  - signal_name: near_dup
  - signal_name: lang_detection
  - signal_name: concept_score
    namespace: lilac
    concept_name: legal-termination
    embedding: gte-small
  - signal_name: concept_score
    namespace: lilac
    concept_name: negative-sentiment
    embedding: gte-small
  - signal_name: concept_score
    namespace: lilac
    concept_name: non-english
    embedding: gte-small
  - signal_name: concept_score
    namespace: lilac
    concept_name: positive-sentiment
    embedding: gte-small
  - signal_name: concept_score
    namespace: lilac
    concept_name: profanity
    embedding: gte-small
  - signal_name: concept_score
    namespace: lilac
    concept_name: question
    embedding: gte-small
  - signal_name: concept_score
    namespace: lilac
    concept_name: source-code
    embedding: gte-small
  - signal_name: concept_score
    namespace: lilac
    concept_name: toxicity
    embedding: gte-small

concept_model_cache_embeddings:
  - gte-small
26 changes: 17 additions & 9 deletions web/blueprint/src/lib/components/HuggingFaceSpaceWelcome.svelte
@@ -27,17 +27,24 @@
<div class="mx-32 flex w-full flex-col items-center gap-y-6 px-8">
<div class="mt-8 w-full text-center">
<h2>Welcome to Lilac</h2>
<div class="mt-2 text-base text-gray-700">
analyze, structure and clean unstructured data with AI
</div>
<div class="mt-2 text-base text-gray-700">analyze, structure and clean data with AI</div>
<div class="mt-2 text-sm text-gray-700">
<a href="https://lilacml.com">visit our website</a>
<a target="_blank" href="https://lilacml.com">visit our website</a>
</div>
<div class="duplicate mt-6 flex flex-row items-center justify-center gap-x-4 text-gray-700">
<Button
href={`https://huggingface.co/spaces/${huggingFaceSpaceId}?duplicate=true`}
kind="tertiary">duplicate</Button
<div
use:hoverTooltip={{
text:
'Duplicate the HuggingFace space to manage your own instance. ' +
'You must be logged into HuggingFace for the duplicate modal to appear.'
}}
>
<Button
href={`https://huggingface.co/spaces/${huggingFaceSpaceId}?duplicate=true`}
target="_blank"
kind="tertiary">duplicate</Button
>
</div>
<a
class="-ml-2"
use:hoverTooltip={{
@@ -58,8 +65,9 @@
title={`Browse the ${tryDataset.name} dataset`}
>
<p class="text-sm">
Try the Lilac dataset viewer on the the pre-loaded <a href={tryDataset.originalLink}
>{tryDataset.displayName}</a
Try the Lilac dataset viewer on the the pre-loaded <a
target="_blank"
href={tryDataset.originalLink}>{tryDataset.displayName}</a
> dataset.
</p>
</WelcomeBanner>
2 changes: 1 addition & 1 deletion web/blueprint/src/lib/components/WelcomeBanner.svelte
@@ -7,7 +7,7 @@
export let backgroundColorClass: string;
</script>

<div class="welcome-item mt-8 w-full rounded shadow-md">
<div class="my-4 w-full rounded shadow-md">
<div
class={`flex cursor-pointer flex-row justify-between ${backgroundColorClass} rounded-t px-4 py-4 font-semibold`}
on:click={() => goto(link)}
16 changes: 16 additions & 0 deletions web/blueprint/src/lib/components/concepts/ConceptPreview.svelte
@@ -14,6 +14,7 @@
import {Button, Select, SelectItem, SkeletonText, TextArea} from 'carbon-components-svelte';
import {onMount} from 'svelte';
import StringSpanHighlight from '../datasetView/StringSpanHighlight.svelte';
import {colorFromScore} from '../datasetView/colors';
import type {SpanValueInfo} from '../datasetView/spanHighlight';
export let concept: Concept;
@@ -82,6 +83,15 @@
valuePaths = spanValuePaths.valuePaths;
}
}
let maxConceptScore: number;
// Compute the max score and show it separately.
$: if ($conceptScore?.data != null) {
maxConceptScore = 0;
for (const row of $conceptScore.data[0]) {
const score = row['score'] as number;
maxConceptScore = Math.max(score, maxConceptScore);
}
}
</script>

<div class="flex flex-col gap-x-8">
@@ -111,6 +121,12 @@
{#if conceptScore && $conceptScore?.isFetching}
<SkeletonText />
{:else if previewResultItem != null && previewText != null}
<div class="my-2">
<span class="font-medium">Max score:</span>
<span class="rounded p-0.5" style:background-color={colorFromScore(maxConceptScore)}
>{maxConceptScore.toFixed(3)}</span
>
</div>
<StringSpanHighlight
text={previewText}
row={previewResultItem}
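The new `maxConceptScore` block in `ConceptPreview.svelte` scans the span rows of the first preview result and keeps the largest `score`, starting from 0. A minimal standalone sketch of the same reduction, assuming each row carries a numeric, non-negative `score` (the `ScoredSpan` type and `maxSpanScore` function are hypothetical names, not part of Lilac):

```typescript
// Hypothetical shape of one scored span row; Lilac's real row types differ.
interface ScoredSpan {
  score: number;
}

// Returns the largest span score, or 0 when there are no spans.
// Starting from 0 mirrors the preview code and assumes scores are non-negative.
function maxSpanScore(rows: ScoredSpan[]): number {
  let max = 0;
  for (const row of rows) {
    max = Math.max(max, row.score);
  }
  return max;
}

const rows: ScoredSpan[] = [{score: 0.12}, {score: 0.87}, {score: 0.45}];
console.log(maxSpanScore(rows).toFixed(3)); // prints 0.870
```

Because the running maximum starts at 0, an empty result renders as `0.000` rather than `-Infinity`, which keeps `toFixed(3)` and the color lookup well-defined.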
15 changes: 15 additions & 0 deletions web/blueprint/src/lib/components/schemaView/SchemaField.svelte
@@ -125,9 +125,24 @@
checked={isVisible}
on:change={() => {
if (!isVisible) {
// For signals, when the root of a signal is checked, enable all the children.
if (field.signal != null) {
const children = childFields(field);
children.forEach(f => {
datasetViewStore.addSelectedColumn(f.path);
});
}
datasetViewStore.addSelectedColumn(path);
// Repeated fields are collapsed. When clicked, we need to also make them visible.
if (field.repeated_field != null) {
datasetViewStore.addSelectedColumn([...path, PATH_WILDCARD]);
}
} else {
datasetViewStore.removeSelectedColumn(path);
// Repeated fields are collapsed. When clicked, we need to also make them visible.
if (field.repeated_field != null) {
datasetViewStore.removeSelectedColumn([...path, PATH_WILDCARD]);
}
}
}}
/>
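The updated checkbox handler in `SchemaField.svelte` cascades selection: checking a signal's root field also selects every child field, and toggling a repeated field also toggles its `[...path, PATH_WILDCARD]` path. A simplified sketch of that logic, assuming a plain `Set` of dot-joined paths in place of `datasetViewStore`, `'*'` as the wildcard segment, and a hypothetical recursive `childFields` over a minimal `Field` shape (none of these are Lilac's actual types or helpers):

```typescript
// Hypothetical, simplified field shape; Lilac's real schema types differ.
interface Field {
  path: string[];
  signal?: {signal_name: string};
  fields?: Record<string, Field>;
  repeated_field?: Field;
}

// Assumed wildcard segment meaning "every element of a repeated field".
const PATH_WILDCARD = '*';

// Hypothetical helper: every descendant of `field`, depth first.
function childFields(field: Field): Field[] {
  const children: Field[] = [
    ...Object.values(field.fields ?? {}),
    ...(field.repeated_field != null ? [field.repeated_field] : [])
  ];
  return children.flatMap(child => [child, ...childFields(child)]);
}

// Stand-in for datasetViewStore: selected paths serialized with '.'.
const selected = new Set<string>();
const key = (path: string[]) => path.join('.');

function toggleField(field: Field, isVisible: boolean): void {
  const path = field.path;
  if (!isVisible) {
    // Checking a signal root also selects all of its children.
    if (field.signal != null) {
      for (const child of childFields(field)) selected.add(key(child.path));
    }
    selected.add(key(path));
    // Repeated fields are collapsed, so also select the wildcard path.
    if (field.repeated_field != null) selected.add(key([...path, PATH_WILDCARD]));
  } else {
    selected.delete(key(path));
    if (field.repeated_field != null) selected.delete(key([...path, PATH_WILDCARD]));
  }
}

// Made-up signal field with one repeated child.
const signalRoot: Field = {
  path: ['text', 'pii'],
  signal: {signal_name: 'pii'},
  fields: {
    emails: {path: ['text', 'pii', 'emails'], repeated_field: {path: ['text', 'pii', 'emails', '*']}}
  }
};
toggleField(signalRoot, false);
console.log([...selected].sort()); // [ 'text.pii', 'text.pii.emails', 'text.pii.emails.*' ]
```

As in the diff, unchecking removes only the field's own path (plus its wildcard path when repeated); previously selected children stay selected.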
21 changes: 17 additions & 4 deletions web/blueprint/src/lib/view_utils.ts
@@ -167,9 +167,10 @@ export function isPathVisible(
return true;
}

if (selectedColumns[path] != null)
if (selectedColumns[path] != null) {
// If a user has explicitly selected a column, return the value of the selection.
return selectedColumns[path];
}

const pathArray = deserializePath(path);

@@ -328,21 +329,32 @@ export function getTaggedDatasets(
tagDatasets[tag][dataset.namespace].push(dataset);
}
}
const tagSortPriorities = ['machine-learning'];
const sortedTags = Object.keys(tagDatasets).sort(
(a, b) => tagSortPriorities.indexOf(b) - tagSortPriorities.indexOf(a) || a.localeCompare(b)
);

const namespaceSortPriorities = ['lilac'];
// TODO(nsthorat): Don't hard-code this. Let's make this a config.
const pinnedDatasets = ['OpenOrca-100k'];

// Sort each tag by namespace and then dataset name.
const taggedDatasetGroups: NavigationTagGroup[] = [];
for (const tag of Object.keys(tagDatasets).sort()) {
for (const tag of sortedTags) {
const sortedNamespaceDatasets: NavigationGroupItem[] = Object.keys(tagDatasets[tag])
.sort(
(a, b) =>
namespaceSortPriorities.indexOf(a) - namespaceSortPriorities.indexOf(b) ||
namespaceSortPriorities.indexOf(b) - namespaceSortPriorities.indexOf(a) ||
a.localeCompare(b)
)
.map(namespace => ({
group: namespace,
items: tagDatasets[tag][namespace]
.sort((a, b) => a.dataset_name.localeCompare(b.dataset_name))
.sort(
(a, b) =>
pinnedDatasets.indexOf(b.dataset_name) - pinnedDatasets.indexOf(a.dataset_name) ||
a.dataset_name.localeCompare(b.dataset_name)
)
.map(d => ({
name: d.dataset_name,
link: datasetLink(d.namespace, d.dataset_name),
@@ -378,6 +390,7 @@ export function getTaggedConcepts(
tagConcepts[tag][concept.namespace].push(concept);
}
}

const namespaceSortPriorities = ['lilac'];

// Sort each tag by namespace and then dataset name.
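The tag, namespace, and pinned-dataset sorts in `getTaggedDatasets` all use the comparator `priorities.indexOf(b) - priorities.indexOf(a) || a.localeCompare(b)`. Since `indexOf` returns `-1` for anything not in the list, listed entries get the larger index and sort first; ties fall back to alphabetical order. The previous namespace comparator subtracted in the opposite order, which by the same arithmetic placed `lilac` after every other namespace. A small sketch comparing the two, where `priorityFirst`, `priorityLast`, and the namespace names are made up for illustration:

```typescript
type Comparator = (a: string, b: string) => number;

// Pattern used in this commit: listed names first, then alphabetical.
function priorityFirst(priorities: string[]): Comparator {
  return (a, b) => priorities.indexOf(b) - priorities.indexOf(a) || a.localeCompare(b);
}

// Previous namespace comparator: unlisted names (index -1) sort ahead instead.
function priorityLast(priorities: string[]): Comparator {
  return (a, b) => priorities.indexOf(a) - priorities.indexOf(b) || a.localeCompare(b);
}

const namespaces = ['local', 'lilac', 'acme'];
console.log([...namespaces].sort(priorityFirst(['lilac']))); // [ 'lilac', 'acme', 'local' ]
console.log([...namespaces].sort(priorityLast(['lilac']))); // [ 'acme', 'local', 'lilac' ]
```

One side effect worth knowing: with two or more entries, such as `['a', 'b']`, this comparator ranks `'b'` ahead of `'a'`, because the later entry has the larger index. The priority lists in this commit have one entry each, so the order they produce is unaffected.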
16 changes: 9 additions & 7 deletions web/blueprint/src/routes/+page.svelte
@@ -11,11 +11,13 @@
</script>

<Page>
{#if $authInfo.isFetching}
<SkeletonText />
{:else if huggingFaceSpaceId != null && !canCreateDataset}
<HuggingFaceSpaceWelcome />
{:else}
<GettingStarted />
{/if}
<div class="flex h-full w-full gap-y-4 overflow-y-scroll p-4">
{#if $authInfo.isFetching}
<SkeletonText />
{:else if huggingFaceSpaceId != null && !canCreateDataset}
<HuggingFaceSpaceWelcome />
{:else}
<GettingStarted />
{/if}
</div>
</Page>
