diff --git a/docs/configurations.rst b/docs/configurations.rst
new file mode 100644
index 00000000..48b1b80b
--- /dev/null
+++ b/docs/configurations.rst
@@ -0,0 +1,54 @@
+Setting Configurations
+=======================
+
+The Settings module is a central configuration system for managing application-wide settings.
+It ensures consistent and thread-safe access to configurations, allowing settings to be dynamically
+adjusted and temporarily overridden within specific contexts. Most examples so far have
+used the settings to configure the LM.
+
+Using the Settings module
+--------------------------
+.. code-block:: python
+    import lotus
+    from lotus.models import LM
+
+    lm = LM(model="gpt-4o-mini")
+    lotus.settings.configure(lm=lm)
+
+Configurable Parameters
+--------------------------
+1. enable_cache:
+    * Description: Enables or disables caching mechanisms
+    * Default: False
+.. code-block:: python
+    lotus.settings.configure(enable_cache=True)
+
+2. cascade_IS_weight:
+    * Description: Specifies the weight for importance sampling (IS) in cascade operators
+    * Default: 0.5
+.. code-block:: python
+    lotus.settings.configure(cascade_IS_weight=0.8)
+
+3. cascade_num_calibration_quantiles:
+    * Description: Number of quantiles used for calibrating sem_filter
+    * Default: 50
+.. code-block:: python
+    lotus.settings.configure(cascade_num_calibration_quantiles=100)
+
+4. min_join_cascade_size:
+    * Description: Minimum size of a join cascade to trigger additional processing
+    * Default: 100
+.. code-block:: python
+    lotus.settings.configure(min_join_cascade_size=200)
+
+5. cascade_IS_max_sample_range:
+    * Description: Maximum range for sampling during cascade IS operations
+    * Default: 250
+.. code-block:: python
+    lotus.settings.configure(cascade_IS_max_sample_range=500)
+
+6. cascade_IS_random_seed:
+    * Description: Seed value for randomization in cascade IS. Use None for non-deterministic behavior.
+    * Default: None
+.. code-block:: python
+    lotus.settings.configure(cascade_IS_random_seed=42)
\ No newline at end of file
diff --git a/docs/prompt_strategies.rst b/docs/prompt_strategies.rst
new file mode 100644
index 00000000..619d20ce
--- /dev/null
+++ b/docs/prompt_strategies.rst
@@ -0,0 +1,60 @@
+Prompt Strategies
+===================
+
+In addition to calling the semantic operators, advanced prompt strategies can be used to
+improve the quality of the output. Two prompt strategies that can be used are Chain of Thought (CoT) and
+Demonstrations.
+
+Chain of Thought + Demonstrations
+----------------------------------
+Chain of Thought reasoning refers to structuring prompts in a way that guides the model through a step-by-step process
+to arrive at a final answer. By breaking down complex tasks into intermediate steps, CoT tends to produce more accurate and
+logical output.
+
+Here is a simple example of using chain of thought with the Semantic Filter operator:
+.. code-block:: python
+    import pandas as pd
+
+    import lotus
+    from lotus.models import LM
+
+    lm = LM(model="gpt-4o-mini")
+
+    lotus.settings.configure(lm=lm)
+    data = {
+        "Course Name": [
+            "Probability and Random Processes",
+            "Optimization Methods in Engineering",
+            "Digital Design and Integrated Circuits",
+            "Computer Security",
+        ]
+    }
+    df = pd.DataFrame(data)
+    user_instruction = "{Course Name} requires a lot of math"
+
+    example_data = {
+        "Course Name": ["Machine Learning", "Reaction Mechanisms", "Nordic History"],
+        "Answer": [True, True, False],
+        "Reasoning": ["Machine Learning requires a solid understanding of linear algebra and calculus",
+            "Reaction Mechanisms requires ordinary differential equations to solve reactor design problems",
+            "Nordic History has no math involved"]
+    }
+    examples = pd.DataFrame(example_data)
+
+    df = df.sem_filter(user_instruction, examples=examples, strategy="cot")
+    print(df)
+
+When calling the Semantic Filter operator, we pass in the CoT strategy along with an examples DataFrame, which acts as a guide
+for how the model should reason and respond to the given instructions. For instance, in the examples DataFrame:
+* "Machine Learning" has an answer of True, with reasoning that it requires a solid understanding of linear algebra and calculus.
+* "Reaction Mechanisms" also has an answer of True, justified by its reliance on ordinary differential equations for solving reactor design problems.
+* "Nordic History" has an answer of False, as it does not involve any mathematical concepts.
+
+Using the CoT strategy produces output like the following:
++---+----------------------------------------+------------------------------------------------------------+
+|   | Course Name                            | explanation_filter                                         |
++---+----------------------------------------+------------------------------------------------------------+
+| 0 | Probability and Random Processes       | Probability and Random Processes is heavily based on...    |
+| 1 | Optimization Methods in Engineering    | Optimization Methods in Engineering typically involves...  |
+| 2 | Digital Design and Integrated Circuits | Digital Design and Integrated Circuits typically covers... |
++---+----------------------------------------+------------------------------------------------------------+
\ No newline at end of file
diff --git a/docs/sem_agg.rst b/docs/sem_agg.rst
index a8956c93..f4c8ac1d 100644
--- a/docs/sem_agg.rst
+++ b/docs/sem_agg.rst
@@ -12,24 +12,53 @@ Examples
     import pandas as pd
 
     import lotus
+    from lotus.models import LM
 
     lm = LM(model="gpt-4o-mini")
-
     lotus.settings.configure(lm=lm)
+
     data = {
-        "Course Name": [
-            "Probability and Random Processes",
-            "Optimization Methods in Engineering",
-            "Digital Design and Integrated Circuits",
-            "Computer Security",
-            "Cooking",
-            "Food Sciences",
+        "ArticleTitle": [
+            "Advancements in Quantum Computing",
+            "Climate Change and Renewable Energy",
+            "The Rise of Artificial Intelligence",
+            "A Journey into Deep Space Exploration"
+        ],
+        "ArticleContent": [
+            """Quantum computing harnesses the properties of quantum mechanics
+            to perform computations at speeds unimaginable with classical machines.
+            As research and development progress, emerging quantum algorithms show
+            great promise in solving previously intractable problems.""",
+
+            """Global temperatures continue to rise, and societies worldwide
+            are turning to renewable resources like solar and wind power to mitigate
+            climate change. The shift to green technology is expected to reshape
+            economies and significantly reduce carbon footprints.""",
+
+            """Artificial Intelligence (AI) has grown rapidly, integrating
+            into various industries. Machine learning models now enable systems to
+            learn from massive datasets, improving efficiency and uncovering hidden
+            patterns. However, ethical concerns about privacy and bias must be addressed.""",
+
+            """Deep space exploration aims to understand the cosmos beyond
+            our solar system. Recent missions focus on distant exoplanets, black holes,
+            and interstellar objects. Advancements in propulsion and life support systems
+            may one day enable human travel to far-off celestial bodies."""
         ]
     }
+
     df = pd.DataFrame(data)
-    df = df.sem_agg("Summarize all {Course Name}")
-    print(df)
+
+    df = df.sem_agg("Provide a concise summary of all {ArticleContent} in a single paragraph, highlighting the key technological progress and its implications for the future.")
+    print(df._output[0])
 
 Output
 
 
+"Recent technological advancements are reshaping various fields and have significant implications for the future.
+Quantum computing is emerging as a powerful tool capable of solving complex problems at unprecedented speeds, while the
+global shift towards renewable energy sources like solar and wind power aims to combat climate change and transform economies.
+In the realm of Artificial Intelligence, rapid growth and integration into industries are enhancing efficiency and revealing
+hidden data patterns, though ethical concerns regarding privacy and bias persist. Additionally, deep space exploration is
+advancing with missions targeting exoplanets and black holes, potentially paving the way for human travel beyond our solar
+system through improved propulsion and life support technologies."
diff --git a/docs/sem_partition.rst b/docs/sem_partition.rst
index 8e496704..ec9faeb2 100644
--- a/docs/sem_partition.rst
+++ b/docs/sem_partition.rst
@@ -32,4 +32,3 @@ Example
     out = df.sem_agg("Summarize all {Course Name}")._output[0]
     print(out)
 
-Output
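
The Settings description in docs/configurations.rst mentions thread-safe access and settings that can be temporarily overridden within a specific context. The sketch below shows one way such behavior could be built with the standard library. It is an illustration only and does not reproduce the real `lotus.settings` implementation: `SketchSettings`, its `context` method, and `_DEFAULTS` are hypothetical names invented here, and only the parameter names and default values are taken from the patch.

```python
import contextvars
import threading
from contextlib import contextmanager

# Defaults copied from the parameter list in docs/configurations.rst.
_DEFAULTS = {
    "enable_cache": False,
    "cascade_IS_weight": 0.5,
    "cascade_num_calibration_quantiles": 50,
    "min_join_cascade_size": 100,
    "cascade_IS_max_sample_range": 250,
    "cascade_IS_random_seed": None,
}


class SketchSettings:
    """Hypothetical stand-in; NOT the real lotus.settings implementation."""

    def __init__(self):
        self._lock = threading.Lock()  # guards the shared base values
        self._base = dict(_DEFAULTS)
        # Temporary overrides live in a ContextVar, so they stay local
        # to the context that set them.
        self._overrides = contextvars.ContextVar("sketch_overrides", default={})

    def _check(self, names):
        unknown = set(names) - set(_DEFAULTS)
        if unknown:
            raise ValueError(f"unknown setting(s): {sorted(unknown)}")

    def configure(self, **kwargs):
        """Permanently change one or more settings."""
        self._check(kwargs)
        with self._lock:
            self._base.update(kwargs)

    @contextmanager
    def context(self, **kwargs):
        """Temporarily override settings inside a `with` block."""
        self._check(kwargs)
        token = self._overrides.set({**self._overrides.get(), **kwargs})
        try:
            yield self
        finally:
            self._overrides.reset(token)  # restore the previous overrides

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails, i.e. for setting names.
        if name.startswith("_"):
            raise AttributeError(name)
        overrides = self._overrides.get()
        if name in overrides:
            return overrides[name]
        with self._lock:
            if name in self._base:
                return self._base[name]
        raise AttributeError(name)


settings = SketchSettings()
settings.configure(enable_cache=True)

with settings.context(cascade_IS_weight=0.8):
    inside = settings.cascade_IS_weight   # override is active here
outside = settings.cascade_IS_weight      # default is restored here
print(inside, outside)  # prints: 0.8 0.5
```

Keeping overrides in a `ContextVar` rather than mutating the shared dictionary means a temporary override never leaks into the permanent configuration, and the lock keeps concurrent `configure` calls from interleaving with reads.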