advanced usage + examples

lotus-data · Dec 13, 2024 · e260e45
1 parent c074c4b
commit e260e45
Showing 4 changed files with 153 additions and 11 deletions.
diff --git a/docs/configurations.rst b/docs/configurations.rst
@@ -0,0 +1,54 @@
+Setting Configurations
+=======================
+
+The Settings module is a central configuration system for managing application-wide settings. 
+It ensures consistent and thread-safe access to configurations, allowing settings to be dynamically 
+adjusted and temporarily overridden within specific contexts. In most of the examples so far, we have
+used the settings to configure our LM.
+
+Using the Settings module
+--------------------------
+.. code-block:: python
+    import lotus
+    from lotus.models import LM
+
+    lm = LM(model="gpt-4o-mini")
+    lotus.settings.configure(lm=lm)
+
+Configurable Parameters
+--------------------------
+1. enable_cache: 
+    * Description: Enables or disables caching mechanisms
+    * Default: False
+.. code-block:: python
+    lotus.settings.configure(enable_cache=True)
+
+2. cascade_IS_weight: 
+    * Description: Specifies the weight for importance sampling (IS) in cascade operators
+    * Default: 0.5
+.. code-block:: python
+    lotus.settings.configure(cascade_IS_weight=0.8)
+
+3. cascade_num_calibration_quantiles:
+    * Description: Number of quantiles used for calibrating sem_filter
+    * Default: 50
+.. code-block:: python
+    lotus.settings.configure(cascade_num_calibration_quantiles=100)
+
+4. min_join_cascade_size:
+    * Description: Minimum size of a join cascade to trigger additional processing
+    * Default: 100
+.. code-block:: python 
+    lotus.settings.configure(min_join_cascade_size=200)
+
+5. cascade_IS_max_sample_range:
+    * Description: Maximum range for sampling during cascade IS operations
+    * Default: 250
+.. code-block:: python
+    lotus.settings.configure(cascade_IS_max_sample_range=500)
+
+6. cascade_IS_random_seed:
+    * Description: Seed value for randomization in cascade IS. Use None for non-deterministic behavior
+    * Default: None
+.. code-block:: python
+    lotus.settings.configure(cascade_IS_random_seed=42)
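The new configurations page describes settings that can be "dynamically adjusted and temporarily overridden within specific contexts" but only shows `configure`. The following is a minimal sketch of that idea in plain Python, not the lotus implementation: `SketchSettings`, `get`, and `override` are hypothetical names introduced for illustration, and only the setting names and defaults are taken from the page.

```python
import threading
from contextlib import contextmanager


class SketchSettings:
    """Hypothetical stand-in for a settings object; not the lotus implementation."""

    def __init__(self, **defaults):
        # A re-entrant lock guards every read and write of the values.
        self._lock = threading.RLock()
        self._values = dict(defaults)

    def configure(self, **overrides):
        # Reject unknown names so typos fail loudly instead of being ignored.
        with self._lock:
            unknown = set(overrides) - set(self._values)
            if unknown:
                raise KeyError(f"unknown setting(s): {sorted(unknown)}")
            self._values.update(overrides)

    def get(self, name):
        with self._lock:
            return self._values[name]

    @contextmanager
    def override(self, **overrides):
        # Remember the current values, apply the overrides, and restore the
        # previous values on exit, even if the body raises.
        # Note: while active, an override is visible to every thread that
        # shares this object.
        with self._lock:
            previous = {name: self._values[name] for name in overrides}
        self.configure(**overrides)
        try:
            yield self
        finally:
            self.configure(**previous)


# Defaults copied from the parameter list in docs/configurations.rst.
settings = SketchSettings(
    enable_cache=False,
    cascade_IS_weight=0.5,
    cascade_IS_random_seed=None,
)

settings.configure(enable_cache=True)

with settings.override(cascade_IS_random_seed=42):
    seeded = settings.get("cascade_IS_random_seed")

restored = settings.get("cascade_IS_random_seed")
print(seeded, restored)
```

Restoring in a `finally` block is the design choice that makes the override temporary: the earlier value comes back whether the body finishes normally or raises.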
diff --git a/docs/prompt_strategies.rst b/docs/prompt_strategies.rst
@@ -0,0 +1,60 @@
+Prompt Strategies
+===================
+
+In addition to calling the semantic operators, advanced prompt strategies can be used to help
+obtain or improve the desired output. Two prompt strategies that can be used are Chain of Thought (CoT) and
+Demonstrations.
+
+Chain of Thought + Demonstrations:
+----------------------------------
+Chain of Thought reasoning refers to structuring prompts in a way that guides the model through a step-by-step process
+to arrive at a final answer. By breaking down complex tasks into intermediate steps, CoT can lead to more accurate and
+logical output.
+
+Here is a simple example of using chain of thought with the Semantic Filter operator:
+.. code-block:: python
+    import pandas as pd
+
+    import lotus
+    from lotus.models import LM
+
+    lm = LM(model="gpt-4o-mini")
+
+    lotus.settings.configure(lm=lm)
+    data = {
+        "Course Name": [
+            "Probability and Random Processes",
+            "Optimization Methods in Engineering",
+            "Digital Design and Integrated Circuits",
+            "Computer Security",
+        ]
+    }
+    df = pd.DataFrame(data)
+    user_instruction = "{Course Name} requires a lot of math"
+
+    example_data = {
+        "Course Name": ["Machine Learning", "Reaction Mechanisms", "Nordic History"], 
+        "Answer": [True, True, False],
+        "Reasoning": ["Machine Learning requires a solid understanding of linear algebra and calculus",
+                      "Reaction Mechanisms requires Ordinary Differential Equations to solve reactor design problems",
+                      "Nordic History has no math involved"]
+    }
+    examples = pd.DataFrame(example_data)
+
+    df = df.sem_filter(user_instruction, examples=examples, strategy="cot")
+    print(df)
+
+When calling the Semantic Filter operator, we pass in an example DataFrame as well as the CoT strategy, which acts as a guide 
+for how the model should reason and respond to the given instructions. For instance, in the examples DataFrame:
+* "Machine Learning" has an answer of True, with reasoning that it requires a solid understanding of linear algebra and calculus.
+* "Reaction Mechanisms" also has an answer of True, justified by its reliance on ordinary differential equations for solving reactor design problems.
+* "Nordic History" has an answer of False, as it does not involve any mathematical concepts.
+
+Using the CoT strategy produces output like the following:
++---+----------------------------------------+-------------------------------------------------------------------+
+|   |           Course Name                  |                    explanation_filter                             |
++---+----------------------------------------+-------------------------------------------------------------------+
+| 0 | Probability and Random Processes       | Probability and Random Processes is heavily based on...           |
+| 1 | Optimization Methods in Engineering    | Optimization Methods in Engineering typically involves...         |
+| 2 | Digital Design and Integrated Circuits | Digital Design and Integrated Circuits typically covers...        |
++---+----------------------------------------+-------------------------------------------------------------------+
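The prompt strategies page says the examples DataFrame "acts as a guide for how the model should reason and respond". One way to picture that is to lay each demonstration out as a claim, its reasoning, and its answer. The sketch below does this with plain Python; the function names and the `Claim` / `Reasoning` / `Answer` layout are assumptions made for illustration, and lotus may format its prompts differently.

```python
import re


def referenced_columns(instruction):
    """Return the column names written as {Column Name} in an instruction."""
    return re.findall(r"\{([^{}]+)\}", instruction)


def render_cot_demonstrations(instruction, examples):
    """Lay out each example row as a claim, its reasoning, and its answer.

    Illustrative layout only; this is not how lotus is known to build prompts.
    """
    blocks = []
    for row in examples:
        claim = instruction
        # Substitute each {Column Name} placeholder with the row's value.
        for column in referenced_columns(instruction):
            claim = claim.replace("{" + column + "}", str(row[column]))
        blocks.append(
            f"Claim: {claim}\nReasoning: {row['Reasoning']}\nAnswer: {row['Answer']}"
        )
    return "\n\n".join(blocks)


# Same rows as the examples DataFrame in the doc, as a list of dicts
# (the shape returned by examples.to_dict("records") in pandas).
examples = [
    {
        "Course Name": "Machine Learning",
        "Answer": True,
        "Reasoning": "Machine Learning requires a solid understanding of linear algebra and calculus",
    },
    {
        "Course Name": "Reaction Mechanisms",
        "Answer": True,
        "Reasoning": "Reaction Mechanisms requires Ordinary Differential Equations to solve reactor design problems",
    },
    {
        "Course Name": "Nordic History",
        "Answer": False,
        "Reasoning": "Nordic History has no math involved",
    },
]

prompt = render_cot_demonstrations("{Course Name} requires a lot of math", examples)
print(prompt)
```

Placing the reasoning before the answer in each demonstration mirrors the step-by-step order that Chain of Thought asks the model to follow.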
diff --git a/docs/sem_agg.rst b/docs/sem_agg.rst
@@ -12,24 +12,53 @@ Examples
     import pandas as pd
 
     import lotus
+
     from lotus.models import LM
 
     lm = LM(model="gpt-4o-mini")
-
     lotus.settings.configure(lm=lm)
+
     data = {
-        "Course Name": [
-            "Probability and Random Processes",
-            "Optimization Methods in Engineering",
-            "Digital Design and Integrated Circuits",
-            "Computer Security",
-            "Cooking",
-            "Food Sciences",
+        "ArticleTitle": [
+            "Advancements in Quantum Computing",
+            "Climate Change and Renewable Energy",
+            "The Rise of Artificial Intelligence",
+            "A Journey into Deep Space Exploration"
+        ],
+        "ArticleContent": [
+            """Quantum computing harnesses the properties of quantum mechanics 
+            to perform computations at speeds unimaginable with classical machines. 
+            As research and development progress, emerging quantum algorithms show 
+            great promise in solving previously intractable problems.""",
+            
+            """Global temperatures continue to rise, and societies worldwide 
+            are turning to renewable resources like solar and wind power to mitigate 
+            climate change. The shift to green technology is expected to reshape 
+            economies and significantly reduce carbon footprints.""",
+            
+            """Artificial Intelligence (AI) has grown rapidly, integrating 
+            into various industries. Machine learning models now enable systems to 
+            learn from massive datasets, improving efficiency and uncovering hidden 
+            patterns. However, ethical concerns about privacy and bias must be addressed.""",
+            
+            """Deep space exploration aims to understand the cosmos beyond 
+            our solar system. Recent missions focus on distant exoplanets, black holes, 
+            and interstellar objects. Advancements in propulsion and life support systems 
+            may one day enable human travel to far-off celestial bodies."""
         ]
     }
+
     df = pd.DataFrame(data)
-    df = df.sem_agg("Summarize all {Course Name}")
-    print(df)
+
+    df = df.sem_agg("Provide a concise summary of all {ArticleContent} in a single paragraph, highlighting the key technological progress and its implications for the future.")
+    print(df._output[0])
 
 Output
+"Recent technological advancements are reshaping various fields and have significant implications for the future. 
+Quantum computing is emerging as a powerful tool capable of solving complex problems at unprecedented speeds, while the 
+global shift towards renewable energy sources like solar and wind power aims to combat climate change and transform economies. 
+In the realm of Artificial Intelligence, rapid growth and integration into industries are enhancing efficiency and revealing 
+hidden data patterns, though ethical concerns regarding privacy and bias persist. Additionally, deep space exploration is 
+advancing with missions targeting exoplanets and black holes, potentially paving the way for human travel beyond our solar 
+system through improved propulsion and life support technologies."
 
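The multi-line `ArticleContent` strings in the sem_agg example keep their line breaks and source indentation inside the string values. If that extra whitespace is unwanted in the text passed to `sem_agg`, one option is to collapse it first. This is a sketch under that assumption, not something the commit requires; `collapse_whitespace` is a hypothetical helper name.

```python
def collapse_whitespace(text):
    """Replace every run of whitespace (spaces, tabs, newlines) with one space."""
    return " ".join(text.split())


# First two lines of the quantum computing article from the example,
# including the indentation that the triple-quoted string preserves.
raw = """Quantum computing harnesses the properties of quantum mechanics
            to perform computations at speeds unimaginable with classical machines."""

clean = collapse_whitespace(raw)
print(clean)
```

With pandas, the same helper could be applied to the whole column via `df["ArticleContent"].map(collapse_whitespace)` before calling `sem_agg`.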
diff --git a/docs/sem_partition.rst b/docs/sem_partition.rst
@@ -32,4 +32,3 @@ Example
     out = df.sem_agg("Summarize all {Course Name}")._output[0]
     print(out)
 
-Output