advanced usage + examples
StanChan03 committed Dec 13, 2024
1 parent c074c4b commit e260e45
Showing 4 changed files with 153 additions and 11 deletions.
54 changes: 54 additions & 0 deletions docs/configurations.rst
@@ -0,0 +1,54 @@
Setting Configurations
=======================

The Settings module is a central configuration system for managing application-wide settings.
It provides consistent, thread-safe access to configurations and allows settings to be adjusted
dynamically or temporarily overridden within specific contexts. Most of the examples in this
documentation use the settings to configure the LM.

Using the Settings module
--------------------------
.. code-block:: python

    import lotus
    from lotus.models import LM

    # Create the language model and register it as the default LM
    lm = LM(model="gpt-4o-mini")
    lotus.settings.configure(lm=lm)

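The introduction above describes settings that are thread-safe and can be temporarily overridden. As a rough illustration of that idea, here is a minimal, self-contained sketch. The class ``SketchSettings`` and its ``context`` method are hypothetical names invented for this example. They are not the LOTUS implementation and may not match its API.

```python
import threading
from contextlib import contextmanager


class SketchSettings:
    """Hypothetical stand-in for a settings object; not the LOTUS implementation."""

    # Two defaults copied from the parameter list in this document.
    _DEFAULTS = {"enable_cache": False, "cascade_IS_weight": 0.5}

    def __init__(self):
        self._lock = threading.Lock()
        self._values = dict(self._DEFAULTS)

    def configure(self, **kwargs):
        # Reject unknown names so that typos fail loudly instead of being ignored.
        with self._lock:
            for key in kwargs:
                if key not in self._values:
                    raise ValueError(f"Invalid setting: {key}")
            self._values.update(kwargs)

    def get(self, key):
        with self._lock:
            return self._values[key]

    @contextmanager
    def context(self, **kwargs):
        # Remember the current values, apply the overrides, then restore them.
        previous = {key: self.get(key) for key in kwargs if key in self._values}
        self.configure(**kwargs)
        try:
            yield self
        finally:
            self.configure(**previous)


settings = SketchSettings()
with settings.context(enable_cache=True):
    inside = settings.get("enable_cache")   # the override is active here
outside = settings.get("enable_cache")      # the default is restored here
```

The lock guards every read and write, and the ``try``/``finally`` restores the previous values even if the body of the ``with`` block raises.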
Configurable Parameters
--------------------------
1. enable_cache:
    * Description: Enables or disables caching mechanisms
    * Default: False

.. code-block:: python

    lotus.settings.configure(enable_cache=True)

2. cascade_IS_weight:
    * Description: Specifies the weight for importance sampling (IS) in cascade operators
    * Default: 0.5

.. code-block:: python

    lotus.settings.configure(cascade_IS_weight=0.8)

3. cascade_num_calibration_quantiles:
    * Description: Number of quantiles used for calibrating sem_filter
    * Default: 50

.. code-block:: python

    lotus.settings.configure(cascade_num_calibration_quantiles=100)

4. min_join_cascade_size:
    * Description: Minimum size of a join cascade required to trigger additional processing
    * Default: 100

.. code-block:: python

    lotus.settings.configure(min_join_cascade_size=200)

5. cascade_IS_max_sample_range:
    * Description: Maximum range for sampling during cascade IS operations
    * Default: 250

.. code-block:: python

    lotus.settings.configure(cascade_IS_max_sample_range=500)

6. cascade_IS_random_seed:
    * Description: Seed value for randomization in cascade IS. Use None for non-deterministic behavior
    * Default: None

.. code-block:: python

    lotus.settings.configure(cascade_IS_random_seed=42)

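To illustrate why a fixed seed such as ``cascade_IS_random_seed=42`` gives reproducible behavior while ``None`` does not, the following sketch uses only the Python standard library. The helper ``sample_indices`` is a hypothetical name, and the code does not reproduce how LOTUS performs importance sampling internally. It only shows the general effect of seeding a random number generator.

```python
import random


def sample_indices(population_size, sample_size, seed=None):
    """Draw distinct indices; hypothetical helper, not part of LOTUS."""
    # A private Random instance avoids touching the global random state.
    # With seed=None, Python seeds the generator from system entropy.
    rng = random.Random(seed)
    return rng.sample(range(population_size), sample_size)


first = sample_indices(250, 5, seed=42)
second = sample_indices(250, 5, seed=42)
print(first == second)  # True: the same seed yields the same sample
```

Calling ``sample_indices(250, 5)`` without a seed will generally return a different sample on each run, which is the non-deterministic behavior described for ``None``.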
60 changes: 60 additions & 0 deletions docs/prompt_strategies.rst
@@ -0,0 +1,60 @@
Prompt Strategies
===================

In addition to calling the semantic operators, advanced prompt strategies can be used to
improve the quality of the output. Two prompt strategies that can be used are Chain of Thought (CoT) and
Demonstrations.

Chain of Thought + Demonstrations:
----------------------------------
Chain of Thought reasoning refers to structuring prompts in a way that guides the model through a step-by-step process
to arrive at a final answer. By breaking down complex tasks into intermediate steps, CoT tends to produce more accurate and
logical output. Demonstrations are worked examples passed alongside the instruction to show the model the expected answers.

Here is a simple example of using chain of thought with the Semantic Filter operator:

.. code-block:: python

    import pandas as pd

    import lotus
    from lotus.models import LM

    lm = LM(model="gpt-4o-mini")
    lotus.settings.configure(lm=lm)

    # Rows to be filtered
    data = {
        "Course Name": [
            "Probability and Random Processes",
            "Optimization Methods in Engineering",
            "Digital Design and Integrated Circuits",
            "Computer Security",
        ]
    }
    df = pd.DataFrame(data)
    user_instruction = "{Course Name} requires a lot of math"

    # Demonstrations: each example pairs an answer with the reasoning behind it
    example_data = {
        "Course Name": ["Machine Learning", "Reaction Mechanisms", "Nordic History"],
        "Answer": [True, True, False],
        "Reasoning": [
            "Machine Learning requires a solid understanding of linear algebra and calculus",
            "Reaction Mechanisms requires Ordinary Differential Equations to solve reactor design problems",
            "Nordic History has no math involved",
        ],
    }
    examples = pd.DataFrame(example_data)

    # Pass the demonstrations and select the chain-of-thought strategy
    df = df.sem_filter(user_instruction, examples=examples, strategy="cot")
    print(df)

When calling the Semantic Filter operator, we pass in an example DataFrame as well as the CoT strategy. Together they act as a guide
for how the model should reason about and respond to the given instruction. For instance, in the examples DataFrame:

* "Machine Learning" has an answer of True, with the reasoning that it requires a solid understanding of linear algebra and calculus.
* "Reaction Mechanisms" also has an answer of True, justified by its reliance on ordinary differential equations for solving reactor design problems.
* "Nordic History" has an answer of False, as it does not involve any mathematical concepts.
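To make the role of the demonstrations more concrete, the sketch below shows one way the examples and their reasoning could be laid out as a few-shot chain-of-thought prompt. The helper names ``fill`` and ``build_cot_prompt`` and the prompt layout are assumptions made for illustration. The prompt that LOTUS actually sends to the model may look quite different.

```python
def fill(instruction, row):
    """Replace each {Column} placeholder with the value from this row."""
    for column, value in row.items():
        instruction = instruction.replace("{" + column + "}", str(value))
    return instruction


def build_cot_prompt(instruction, examples, row):
    """Assemble a few-shot chain-of-thought prompt (hypothetical layout)."""
    parts = []
    for example in examples:
        # Each demonstration shows the claim, the reasoning, then the answer.
        parts.append(
            f"Claim: {fill(instruction, example)}\n"
            f"Reasoning: {example['Reasoning']}\n"
            f"Answer: {example['Answer']}"
        )
    # The new row ends at "Reasoning:" so the model reasons before answering.
    parts.append(f"Claim: {fill(instruction, row)}\nReasoning:")
    return "\n\n".join(parts)


instruction = "{Course Name} requires a lot of math"
examples = [
    {
        "Course Name": "Machine Learning",
        "Answer": True,
        "Reasoning": "Machine Learning requires linear algebra and calculus",
    },
    {
        "Course Name": "Nordic History",
        "Answer": False,
        "Reasoning": "Nordic History has no math involved",
    },
]
prompt = build_cot_prompt(instruction, examples, {"Course Name": "Computer Security"})
print(prompt)
```

Ending the prompt at ``Reasoning:`` is a common way to encourage the model to write out its intermediate steps before committing to a final answer.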

Using the CoT strategy produces output like the following, where the explanation_filter column holds the model's reasoning for each row that passed the filter:

+---+----------------------------------------+------------------------------------------------------------+
|   | Course Name                            | explanation_filter                                         |
+---+----------------------------------------+------------------------------------------------------------+
| 0 | Probability and Random Processes       | Probability and Random Processes is heavily based on...    |
| 1 | Optimization Methods in Engineering    | Optimization Methods in Engineering typically involves...  |
| 2 | Digital Design and Integrated Circuits | Digital Design and Integrated Circuits typically covers... |
+---+----------------------------------------+------------------------------------------------------------+
49 changes: 39 additions & 10 deletions docs/sem_agg.rst
@@ -12,24 +12,53 @@ Examples
  import pandas as pd
  import lotus
  from lotus.models import LM
  lm = LM(model="gpt-4o-mini")
  lotus.settings.configure(lm=lm)
  data = {
-     "Course Name": [
-         "Probability and Random Processes",
-         "Optimization Methods in Engineering",
-         "Digital Design and Integrated Circuits",
-         "Computer Security",
-         "Cooking",
-         "Food Sciences",
+     "ArticleTitle": [
+         "Advancements in Quantum Computing",
+         "Climate Change and Renewable Energy",
+         "The Rise of Artificial Intelligence",
+         "A Journey into Deep Space Exploration"
+     ],
+     "ArticleContent": [
+         """Quantum computing harnesses the properties of quantum mechanics
+ to perform computations at speeds unimaginable with classical machines.
+ As research and development progress, emerging quantum algorithms show
+ great promise in solving previously intractable problems.""",
+         """Global temperatures continue to rise, and societies worldwide
+ are turning to renewable resources like solar and wind power to mitigate
+ climate change. The shift to green technology is expected to reshape
+ economies and significantly reduce carbon footprints.""",
+         """Artificial Intelligence (AI) has grown rapidly, integrating
+ into various industries. Machine learning models now enable systems to
+ learn from massive datasets, improving efficiency and uncovering hidden
+ patterns. However, ethical concerns about privacy and bias must be addressed.""",
+         """Deep space exploration aims to understand the cosmos beyond
+ our solar system. Recent missions focus on distant exoplanets, black holes,
+ and interstellar objects. Advancements in propulsion and life support systems
+ may one day enable human travel to far-off celestial bodies."""
      ]
  }
  df = pd.DataFrame(data)
- df = df.sem_agg("Summarize all {Course Name}")
- print(df)
+ df = df.sem_agg("Provide a concise summary of all {ArticleContent} in a single paragraph, highlighting the key technological progress and its implications for the future.")
+ print(df._output[0])
Output
"Recent technological advancements are reshaping various fields and have significant implications for the future.
Quantum computing is emerging as a powerful tool capable of solving complex problems at unprecedented speeds, while the
global shift towards renewable energy sources like solar and wind power aims to combat climate change and transform economies.
In the realm of Artificial Intelligence, rapid growth and integration into industries are enhancing efficiency and revealing
hidden data patterns, though ethical concerns regarding privacy and bias persist. Additionally, deep space exploration is
advancing with missions targeting exoplanets and black holes, potentially paving the way for human travel beyond our solar
system through improved propulsion and life support technologies."
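The instruction passed to ``sem_agg`` names a column in curly braces, and the aggregation reduces many rows to a single output. As a rough sketch of that many-to-one idea, the code below extracts the placeholder columns and gathers every row into one prompt. The helpers ``columns_in`` and ``build_agg_prompt`` and the prompt layout are hypothetical, and LOTUS may batch or combine rows differently.

```python
import re


def columns_in(instruction):
    """Return the column names referenced as {Column} in the instruction."""
    return re.findall(r"\{([^{}]+)\}", instruction)


def build_agg_prompt(instruction, rows):
    """Combine every row into a single prompt (hypothetical layout)."""
    columns = columns_in(instruction)
    documents = []
    for number, row in enumerate(rows, start=1):
        # Only the columns named in the instruction are included.
        fields = "\n".join(f"{column}: {row[column]}" for column in columns)
        documents.append(f"Document {number}:\n{fields}")
    return "\n\n".join(documents) + "\n\nInstruction: " + instruction


rows = [
    {"ArticleTitle": "Advancements in Quantum Computing",
     "ArticleContent": "Quantum computing harnesses quantum mechanics."},
    {"ArticleTitle": "The Rise of Artificial Intelligence",
     "ArticleContent": "Artificial Intelligence (AI) has grown rapidly."},
]
agg_prompt = build_agg_prompt("Summarize all {ArticleContent}", rows)
print(agg_prompt)
```

Because only ``{ArticleContent}`` appears in the instruction, the ``ArticleTitle`` column is left out of the prompt in this sketch.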

1 change: 0 additions & 1 deletion docs/sem_partition.rst
@@ -32,4 +32,3 @@ Example
out = df.sem_agg("Summarize all {Course Name}")._output[0]
print(out)
Output
