Docs: add learn section (#238)
lyie28 authored May 17, 2024
1 parent bc7e880 commit 465c8fc
Showing 6 changed files with 129 additions and 5 deletions.
5 changes: 5 additions & 0 deletions .github/lavague-dic.txt
@@ -79,4 +79,9 @@ userdata
environ
getenv
debian
Yann
LeCun
LeCun's
webpage
WaveHunter
viewport
Binary file added docs/assets/research.png
2 changes: 1 addition & 1 deletion docs/docs/get-started/quick-tour-notebook/quick-tour.ipynb
@@ -117,7 +117,7 @@
"source": [
"# World model\n",
"\n",
"Next, we will initialize our `WorldModel`, providing it with a file containing examples of global objectives for actions to be taken on this website being broken down into a chain of thoughts, with the next instruction to be passed to the `ActionEngine`."
"Next, we will initialize our `WorldModel`, providing it with examples of global objectives for actions and the desired thought process and reasoning we wish it to replicate to generate the next instruction that needs to be passed to the `ActionEngine`."
]
},
{
63 changes: 60 additions & 3 deletions docs/docs/get-started/quick-tour.md
@@ -25,7 +25,7 @@ pip install lavague

## Action Engine

**An agent is made up of two components: an `ActionEngine` and a `WorldModel`.**
**An agent is made up of two components: an `Action Engine` and a `World Model`.**

Let's start by initializing an `ActionEngine`, which is responsible for generating automation code for text instructions and executing them.

@@ -39,15 +39,15 @@ action_engine = ActionEngine(selenium_driver)

## World Model

Next, we will initialize our `WorldModel`, providing it with examples of global objectives for actions on this website being broken down into a chain of thoughts and then the next instruction that needs to be passed to the `ActionEngine`.
Next, we will initialize our `World Model`, providing it with examples of global objectives together with the thought process and reasoning we want it to replicate when generating the next instruction to pass to the `ActionEngine`.

```python
from lavague.core import WorldModel

world_model = WorldModel.from_hub("hf_example")
```

## WebAgent Demo
## Web Agent demo

We can now use these two elements to initialize a `WebAgent` and start playing with it!

@@ -63,3 +63,60 @@ agent.run("Go on the quicktour of PEFT")
```

![qt_output](../../assets/demo_agent_hf.gif)

## World Model examples file

When we initialized the World Model, we saw that we must provide a file containing examples. This file shows the World Model the desired thought process and reasoning we wish for it to replicate to generate the next instruction.

We initialized the World Model with an example file from our 'hub', an open-source folder in our GitHub repo that you can find (and contribute to) [here](https://github.com/lavague-ai/LaVague/tree/main/examples/knowledge)!

This was done with the `WorldModel.from_hub()` method, passing it the name of the file we wanted to download, "hf_example" (without the `.txt` extension).

!!! note "World Model initialization options"

As well as pulling an example file from our GitHub repo with the `from_hub()` method, you can:

- Specify a path to a local file containing examples by using the `WorldModel.from_local()` method.
- Provide examples directly as a string with the `WorldModel()` default constructor.

Let's take a look at one of the examples included in that file:

```
Objective: Ask the AI model 'Command R plus' 'What is love'
Thought:
- I am on the Hugging Face website.
- Hugging Face is a company that hosts AI models, and allows users to interact with models on them through the chat.
- Therefore, to answer the objective of asking the AI model 'Command R Plus' 'What is love', we need first to find the model page.
- Given the current screenshot, the fastest way to find the model page seems to be to use the search bar.
Instruction: Type 'Command R plus' on the search bar with placeholder "Search ..." and click on the first result
```
These examples are inserted into our full World Model default prompt:

??? note "Default World Model prompt in full"

You are an AI system specialized in high level reasoning. Your goal is to generate instructions for other specialized AIs to perform web actions to reach objectives given by humans.
Your inputs are an objective in natural language, as well as a screenshot of the current page of the browser.
Your output are a list of thoughts in bullet points detailing your reasoning, followed by your conclusion on what the next step should be in the form of an instruction.
You can assume the instruction is used by another AI to generate the action code to select the element to be interacted with and perform the action demanded by the human.

The instruction should be detailed as possible and only contain the next step.
Do not make assumptions about elements you do not see.
If the objective is already achieved in the screenshot, provide the instruction 'STOP'.

Here are previous examples:
${examples}

Objective: ${objective}
Thought:

Providing the `World Model` with examples helps it learn to generate instructions by demonstrating the desired thought process and structure for completing tasks.
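
The `${examples}` and `${objective}` placeholders in the default prompt happen to match the syntax of Python's standard `string.Template`, so the substitution can be sketched as below. This is only an illustration: `PROMPT_TAIL` and `build_prompt` are hypothetical names, the prompt is abridged, and LaVague's own substitution mechanism may differ.

```python
from string import Template

# Shortened stand-in for the default prompt: only the part that carries the
# two placeholders is reproduced here.
PROMPT_TAIL = Template(
    "Here are previous examples:\n"
    "${examples}\n"
    "\n"
    "Objective: ${objective}\n"
    "Thought:\n"
)


def build_prompt(examples: str, objective: str) -> str:
    """Fill both placeholders of the template (hypothetical helper)."""
    # substitute() raises KeyError if a placeholder is left without a value.
    return PROMPT_TAIL.substitute(examples=examples.strip(), objective=objective)


# Abridged version of the example shown earlier in this section.
examples = """
Objective: Ask the AI model 'Command R plus' 'What is love'
Thought:
- I am on the Hugging Face website.
Instruction: Type 'Command R plus' on the search bar with placeholder "Search ..." and click on the first result
"""

prompt = build_prompt(examples, "Go on the quicktour of PEFT")
print(prompt)
```

The prompt ends on an open `Thought:` line, so the model continues from there with its own reasoning and instruction, in the same shape as the examples.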

!!! tip "Contribute to our Knowledge Hub"

You can contribute example files for a website of your choice by creating text files with examples of `objectives`, `thoughts` and `instructions` and submitting them as a `PR` to our GitHub repo.

See the [contribution section of the docs](../contributing/contributing.md) for more information.
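
Before submitting an example file, it may help to check that each example has all three parts. The sketch below is not part of LaVague: `Example`, `parse_examples` and `find_problems` are hypothetical helpers, and the parser assumes that every example starts with an `Objective:` line, lists its thoughts as `- ` bullets and ends with an `Instruction:` line, as in the example shown above.

```python
from dataclasses import dataclass, field


@dataclass
class Example:
    objective: str
    thoughts: list[str] = field(default_factory=list)
    instruction: str = ""


def parse_examples(text: str) -> list[Example]:
    """Split the text of an examples file into Example records."""
    examples: list[Example] = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Objective:"):
            # Assumption: an 'Objective:' line opens a new example.
            examples.append(Example(objective=line[len("Objective:"):].strip()))
        elif not examples:
            continue  # ignore anything before the first objective
        elif line.startswith("- "):
            examples[-1].thoughts.append(line[2:].strip())
        elif line.startswith("Instruction:"):
            examples[-1].instruction = line[len("Instruction:"):].strip()
    return examples


def find_problems(examples: list[Example]) -> list[str]:
    """Return a human-readable message for every incomplete example."""
    problems = []
    for i, ex in enumerate(examples, start=1):
        if not ex.objective:
            problems.append(f"example {i}: empty objective")
        if not ex.thoughts:
            problems.append(f"example {i}: no thoughts")
        if not ex.instruction:
            problems.append(f"example {i}: no instruction")
    return problems


SAMPLE = """
Objective: Ask the AI model 'Command R plus' 'What is love'
Thought:
- I am on the Hugging Face website.
- Hugging Face is a company that hosts AI models, and allows users to interact with models on them through the chat.
- Therefore, to answer the objective of asking the AI model 'Command R Plus' 'What is love', we need first to find the model page.
- Given the current screenshot, the fastest way to find the model page seems to be to use the search bar.
Instruction: Type 'Command R plus' on the search bar with placeholder "Search ..." and click on the first result
"""

parsed = parse_examples(SAMPLE)
print(find_problems(parsed))
```

An empty list from `find_problems` means every example in the file has an objective, at least one thought and an instruction.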

## Learn

To learn more about the LaVague architecture and workflow, head to the [learn section in our docs](../learn/architecture.md)!
59 changes: 59 additions & 0 deletions docs/docs/learn/architecture.md
@@ -0,0 +1,59 @@
# LaVague Architecture & Workflow

## Definitions

Let's first define some of the key elements in our LaVague Agent architecture:

- `Objective`: The objective is the global goal that the user wants the Web Agent to achieve. For example: `"Log into my account and change my username to The WaveHunter."`
- `Instruction`: An instruction is a smaller step needed to move towards achieving the user's objective. For example: `"Locate the username input field and enter the text 'user123'."`
- `World Model`: The World Model analyzes the user's objective and the current state of a webpage to generate the next instruction needed to eventually achieve the objective.
- `Action Engine`: The Action Engine receives this instruction and generates the automation code required to perform this action.
- `Driver`: The webdriver both executes the action code generated by the Action Engine and provides the World Model with **perception** through screenshots and the HTML source code of the current state of the webpage.

## Workflow

All the elements previously described interact in the following workflow:

![LaVague Workflow](../../assets/architecture.png)

1. The user's global objective is handled by the World Model. It considers this objective along with the state of the webpage, through screenshots and HTML code, and generates the next step, i.e. a text instruction, needed to achieve this objective.

2. This instruction is sent to the Action Engine, which then generates the automation code needed to perform this step and executes it.

3. The World Model then receives new text and image data, i.e. a new screenshot and the updated source code, reflecting the updated state of the web page. With this information, it is able to generate the next instruction needed to achieve the objective.

4. This process repeats until the objective is achieved!
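
The loop described in these four steps can be sketched with stand-in components. Everything below is hypothetical and only mirrors the roles defined above: `FakeDriver`, `FakeWorldModel`, `FakeActionEngine` and `run_agent` are not LaVague classes or functions, and a real World Model would call an LLM rather than follow a fixed plan. The `'STOP'` instruction is taken from the default World Model prompt.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDriver:
    """Stands in for the webdriver: holds page state and 'executes' code."""
    page: str = "login page with empty username and password fields"
    executed: list[str] = field(default_factory=list)

    def observe(self) -> str:
        # A real driver would return a screenshot and the HTML source.
        return self.page

    def execute(self, code: str) -> None:
        self.executed.append(code)
        self.page = f"page after running: {code}"


class FakeWorldModel:
    """Replays a fixed plan instead of reasoning over the observation."""

    def __init__(self, plan: list[str]):
        self._plan = list(plan)

    def next_instruction(self, objective: str, observation: str) -> str:
        return self._plan.pop(0) if self._plan else "STOP"


class FakeActionEngine:
    """Turns an instruction into a placeholder string of 'automation code'."""

    def generate_code(self, instruction: str, observation: str) -> str:
        return f"# code for: {instruction}"


def run_agent(objective, world_model, action_engine, driver, max_steps=10):
    """Alternate between instruction generation and execution until 'STOP'."""
    instructions = []
    for _ in range(max_steps):
        observation = driver.observe()                                      # perception
        instruction = world_model.next_instruction(objective, observation)  # step 1 / 3
        if instruction == "STOP":                                           # objective reached
            break
        code = action_engine.generate_code(instruction, observation)        # step 2
        driver.execute(code)                                                # page state changes
        instructions.append(instruction)
    return instructions


plan = [
    "Locate the username input field and enter the text 'user123'.",
    "Locate the password input field and enter the text 'password456'.",
]
driver = FakeDriver()
history = run_agent(
    "Log into my account and change my username to The WaveHunter.",
    FakeWorldModel(plan),
    FakeActionEngine(),
    driver,
)
print(history)
```

The `max_steps` cap is a design choice of this sketch: it keeps the loop from running forever if the World Model never returns `'STOP'`.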


## Example workflow

To make this workflow clear, let's consider an example:

1. The World Model is given the following objective: `"Log into my account and change my username to The WaveHunter."`

The driver provides the World Model with the initial state of the webpage: the login page is loaded with empty username and password fields.

The World Model might then generate the following first instruction: `"Locate the username input field and enter the text 'user123'."`

2. The Action Engine receives the instruction, and generates the following automation code, which is then executed:

```python
from selenium.webdriver.common.by import By

# `driver` is the Selenium WebDriver instance already controlled by the agent
driver.find_element(By.ID, 'username').send_keys('user123')
```

3. The webpage state is updated (username is now entered in the field). A new screenshot and HTML source are captured.

The World Model receives the updated state and generates the next instruction: `"Locate the password input field and enter the text 'password456'."`

This process repeats until the final objective is achieved.

## Current research

Several elements of our approach and architecture are inspired by Yann LeCun's research paper: [A Path Towards Autonomous Machine Intelligence](https://openreview.net/pdf?id=BZ5a1r-kVsf).

The paper proposes an architecture for AI that consists of multiple modules, each responsible for different functions such as perception, world modeling, memory, and action generation. These modules work together to enable the AI to perceive its environment, react to it, reason about it, plan actions, and execute them, much like the human brain leverages and combines different cognitive processes.

<img src="https://raw.githubusercontent.com/lavague-ai/LaVague/under-the-hood/docs/assets/research.png" alt="Yann LeCun's proposed architecture for AI" width="80%">
5 changes: 4 additions & 1 deletion mkdocs.yml
@@ -1,7 +1,7 @@
# Website Info
site_name: LaVague
site_url: https://docs.lavague.ai/en/latest/
site_author: Mithril Security
site_author: LaVague

# Repository
repo_name: lavague-ai/LaVague
@@ -95,8 +95,11 @@ nav:
- ⚡ Getting Started:
- Installation: 'docs/get-started/installation.md'
- Quick tour: 'docs/get-started/quick-tour.md'
# - Under the hood: 'docs/get-started/under-the-hood.md'
# - Customization: 'docs/get-started/customization.md'

- 📚 Learn:
- Architecture: 'docs/learn/architecture.md'
# - 🚀 Action Engine:
# - 🤝 Integrations:
# - Overview: 'docs/integrations/home.md'