Commit

init
AlexisVLRT committed Dec 12, 2023
1 parent b870907 commit 10432a0
Showing 22 changed files with 48 additions and 332 deletions.
43 changes: 0 additions & 43 deletions .pre-commit-config.yaml

This file was deleted.

34 changes: 0 additions & 34 deletions Makefile

This file was deleted.

18 changes: 0 additions & 18 deletions bin/install_with_conda.sh

This file was deleted.

20 changes: 0 additions & 20 deletions bin/install_with_venv.sh

This file was deleted.

4 changes: 0 additions & 4 deletions config/config.py

This file was deleted.

2 changes: 0 additions & 2 deletions config/config.toml

This file was deleted.

1 change: 0 additions & 1 deletion docs/code.md

This file was deleted.

3 changes: 0 additions & 3 deletions docs/index.md

This file was deleted.

42 changes: 0 additions & 42 deletions mkdocs.yaml

This file was deleted.

2 changes: 0 additions & 2 deletions notebooks/private/.gitignore

This file was deleted.

79 changes: 0 additions & 79 deletions notebooks/template.ipynb

This file was deleted.

65 changes: 0 additions & 65 deletions pyproject.toml

This file was deleted.

10 changes: 0 additions & 10 deletions requirements-developer.txt

This file was deleted.

11 changes: 8 additions & 3 deletions requirements.txt
@@ -1,3 +1,8 @@
--e .
-pandas==1.5.3
-numpy==1.24.2
+gcsfs
+s3fs
+adlfs
+universal_pathlib
+chromadb
+langchain
+langchainhub
+gpt4all
18 changes: 18 additions & 0 deletions sandbox_alexis/main.py
@@ -0,0 +1,18 @@
from langchain.vectorstores.chroma import Chroma
from storage_backend import get_storage_root_path, StorageBackend
from langchain.embeddings import GPT4AllEmbeddings

from text_splitter import load_and_split_document

data = """One of the most important things I didn't understand about the world when I was a child is the degree to which the returns for performance are superlinear.
Teachers and coaches implicitly told us the returns were linear. "You get out," I heard a thousand times, "what you put in." They meant well, but this is rarely true. If your product is only half as good as your competitor's, you don't get half as many customers. You get no customers, and you go out of business.It's obviously true that the returns for performance are superlinear in business. Some think this is a flaw of capitalism, and that if we changed the rules it would stop being true. But superlinear returns for performance are a feature of the world, not an artifact of rules we've invented. We see the same pattern in fame, power, military victories, knowledge, and even benefit to humanity. In all of these, the rich get richer. [1]You can't understand the world without understanding the concept of superlinear returns. And if you're ambitious you definitely should, because this will be the wave you surf on.
It may seem as if there are a lot of different situations with superlinear returns, but as far as I can tell they reduce to two fundamental causes: exponential growth and thresholds.The most obvious case of superlinear returns is when you're working on something that grows exponentially. For example, growing bacterial cultures. When they grow at all, they grow exponentially. But they're tricky to grow. Which means the difference in outcome between someone who's adept at it and someone who's not is very great.Startups can also grow exponentially, and we see the same pattern there. Some manage to achieve high growth rates. Most don't. And as a result you get qualitatively different outcomes: the companies with high growth rates tend to become immensely valuable, while the ones with lower growth rates may not even survive.Y Combinator encourages founders to focus on growth rate rather than absolute numbers. It prevents them from being discouraged early on, when the absolute numbers are still low. It also helps them decide what to focus on: you can use growth rate as a compass to tell you how to evolve the company. But the main advantage is that by focusing on growth rate you tend to get something that grows exponentially.YC doesn't explicitly tell founders that with growth rate "you get out what you put in," but it's not far from the truth. And if growth rate were proportional to performance, then the reward for performance p over time t would be proportional to pt.
Even after decades of thinking about this, I find that sentence startling."""

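# Split the sample text into chunks, then embed and store them in Chroma.
split_documents = load_and_split_document(text=data)
root_path = get_storage_root_path("", StorageBackend.GCS)
vector_store = Chroma(persist_directory=str(root_path / "chromadb"), embedding_function=GPT4AllEmbeddings())
# add_documents returns the list of IDs assigned to the stored documents.
document_ids = vector_store.add_documents(split_documents)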
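The `text_splitter` module imported by `sandbox_alexis/main.py` is not among the files shown in this commit, so what `load_and_split_document` actually does is not visible here. As a rough illustration of the kind of chunking such a helper might perform, the stdlib-only sketch below cuts text into overlapping character windows. The names `split_text`, `chunk_size`, and `overlap` are hypothetical and are not taken from the repository.

```python
def split_text(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Hypothetical helper: split text into overlapping character windows.

    This is an assumption about what a splitter could look like, not the
    repository's load_and_split_document.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    # Each window starts `step` characters after the previous one, so
    # consecutive chunks share `overlap` characters.
    step = chunk_size - overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


chunks = split_text("abcdefghij", chunk_size=4, overlap=1)
print(chunks)
```

A real splitter would probably prefer to break on sentence or paragraph boundaries; fixed character windows are used here only to keep the sketch short.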
15 changes: 15 additions & 0 deletions sandbox_alexis/storage_backend.py
@@ -0,0 +1,15 @@
from upath import UPath as Path
from enum import Enum


class StorageBackend(Enum):
    LOCAL = "local"
    MEMORY = "memory"
    GCS = "gcs"
    S3 = "s3"
    AZURE = "az"


def get_storage_root_path(bucket_name, storage_backend: StorageBackend):
    root_path = Path(f"{storage_backend.value}://{bucket_name}")
    return root_path
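To make the mapping from backend to root URI concrete, here is a stdlib-only sketch that mirrors the f-string in `get_storage_root_path` but returns a plain string instead of a `UPath`, so it runs without `universal_pathlib` installed. `build_root_uri` is a hypothetical name used only for illustration.

```python
from enum import Enum


class StorageBackend(Enum):
    LOCAL = "local"
    MEMORY = "memory"
    GCS = "gcs"
    S3 = "s3"
    AZURE = "az"


def build_root_uri(bucket_name: str, storage_backend: StorageBackend) -> str:
    """Hypothetical helper: same URI format as get_storage_root_path, as a str."""
    return f"{storage_backend.value}://{bucket_name}"


# Print the root URI each backend would produce for one bucket name.
for backend in StorageBackend:
    print(build_root_uri("sample_bucket", backend))
```

Note that `sandbox_alexis/main.py` passes an empty bucket name, which with this format yields the bare scheme `gcs://`.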
7 changes: 7 additions & 0 deletions sandbox_alexis/vector_store.py
@@ -0,0 +1,7 @@
import chromadb
from storage_backend import StorageBackend, get_storage_root_path


root_path = get_storage_root_path("sample_bucket", StorageBackend.GCS)
# PersistentClient expects the storage location as a string path.
client = chromadb.PersistentClient(path=str(root_path / "chromadb"))
collection = client.get_or_create_collection("embeddings")
Empty file removed secrets/.gitkeep
Empty file.
Empty file removed tests/data/.gitkeep
Empty file.
Empty file removed tests/integration_tests/.gitkeep
Empty file.
Empty file removed tests/unit_tests/.gitkeep
Empty file.