Commit

chore: update version to 0.1.3 and enhance documentation in demo.py
titusz committed Aug 20, 2024
1 parent e189928 commit 741fee9
Showing 4 changed files with 29 additions and 14 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,7 @@
# Changelog

## [0.1.3] - Unreleased

## [0.1.2] - 2024-08-19
- Encode granular features with base64
- Refactor result format to generic ISCC data model
33 changes: 23 additions & 10 deletions iscc_sct/demo.py
@@ -443,9 +443,8 @@ def reset_all():
)

with gr.Row(variant="panel"):
-with gr.Column(variant="panel"):
-gr.Markdown(
-"""
+gr.Markdown(
+"""
## Understanding ISCC Semantic Text-Codes
### What is an ISCC Semantic Text-Code?
@@ -476,7 +475,11 @@ def reset_all():
The similarity shown is calculated by comparing the ISCC codes, not the original texts. This
allows for efficient and privacy-preserving comparisons, as only the codes need to be shared
or stored.
+"""
+)
+
+gr.Markdown(
+"""
### Why is this useful?
- **Content creators**: Find similar content across languages.
- **Researchers**: Quickly compare documents or find related texts in different languages.
@@ -490,20 +493,30 @@ def reset_all():
The "Explore Details & Advanced Options" section provides additional tools and information:
1. **ISCC Bit-Length**: Adjust the precision of the ISCC code. Higher values provide more detailed
-comparisons but may be more sensitive to minor differences.
+comparisons but may be more sensitive to minor differences.
2. **Max Tokens**: Set the maximum number of tokens per chunk. This affects how the text is split
-for processing.
+for processing.
3. **Chunked Text**: View how each input text is divided into chunks for processing. Each chunk is
-color-coded and labeled with its size and simprint (a similarity preserving fingerprint).
+color-coded and labeled with its size and simprint (a similarity preserving fingerprint).
4. **Granular Matches**: See a detailed comparison of individual chunks between Text A and Text B.
-This table shows which specific parts of the texts are most similar, along with their approximate
-cosine similarity (scaled -100% to +100%).
+This table shows which specific parts of the texts are most similar (above 80%), along with their
+approximate cosine similarity (scaled -100% to +100%).
For more information about the **ISCC** see:
- https://github.com/iscc
- https://iscc.codes
- https://iscc.io
- [ISO 24138:2024](https://www.iso.org/standard/77899.html)
"""
)

)
with gr.Row():
gr.Markdown(
f"iscc-sct v{sct.__version__} | Source Code: https://github.com/iscc/iscc-sct",
elem_classes="footer",
)

if __name__ == "__main__": # pragma: no cover
demo.launch()
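
The demo text in this diff says that similarity is computed from the codes rather than the original texts, and that granular matches report an approximate cosine similarity scaled from -100% to +100%. As a minimal sketch of how such a score can be derived from two equal-length binary codes, the example below maps their Hamming distance linearly onto that range. The function names, the linear mapping, and the sample byte values are assumptions made for illustration; they are not taken from this commit and may differ from what iscc-sct actually does.

```python
def hamming_distance(a: bytes, b: bytes) -> int:
    """Count the bits that differ between two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def approx_cosine_percent(a: bytes, b: bytes) -> int:
    """Map Hamming distance linearly onto [-100, 100] (assumed mapping)."""
    total_bits = len(a) * 8
    distance = hamming_distance(a, b)
    return round((1 - 2 * distance / total_bits) * 100)


# Hypothetical 64-bit codes as raw bytes (not real ISCC strings).
code_a = bytes.fromhex("ff00ff00ff00ff00")
code_b = bytes.fromhex("ff00ff00ff00ffff")  # differs from code_a in 8 bits
code_c = bytes(x ^ 0xFF for x in code_a)    # every bit flipped

print(approx_cosine_percent(code_a, code_a))  # identical codes
print(approx_cosine_percent(code_a, code_b))  # mostly similar codes
print(approx_cosine_percent(code_a, code_c))  # fully inverted codes
```

Only the codes are needed for this comparison, which is what makes the privacy-preserving workflow described in the demo text possible.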
6 changes: 3 additions & 3 deletions poetry.lock

Some generated files are not rendered by default.

2 changes: 1 addition & 1 deletion tests/test_iscc_sct.py
@@ -31,7 +31,7 @@


def test_version():
-assert sct.__version__ == "0.1.2"
+assert sct.__version__ == "0.1.3"


def test_code_text_semantic_default():
