
Implement DQN for Acrobot and MountainCar with enhanced exploration #64

Merged 1 commit into main on Nov 26, 2024

Conversation

@leonvanbokhorst (Owner) commented on Nov 26, 2024

  • Implement enhanced exploration strategy with momentum-based decisions
  • Add position and velocity-based reward structure
  • Increase success reward and add progressive rewards for near-goal states
  • Track success positions and epsilon history
  • Achieve 5/5 successful episodes in play mode with consistent performance

Summary by Sourcery

Implement Double DQN architectures for Acrobot-v1 and MountainCar-v0 environments, introducing enhanced exploration strategies and reward structures.

New Features:

  • Implement a Double DQN architecture for the Acrobot-v1 environment with experience replay and continuous state space handling.
  • Introduce a MountainCar-v0 DQN implementation featuring Double DQN architecture, prioritized experience replay, and dueling networks.

Enhancements:

  • Enhance the exploration strategy in the MountainCar-v0 environment with momentum-based decisions and position-based strategies.
  • Add a position- and velocity-based reward structure to the MountainCar-v0 environment, including an increased success reward and progressive rewards for near-goal states (a minimal sketch follows below).
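For illustration only: a minimal sketch of what a momentum-based exploration rule and a position/velocity reward bonus could look like. It assumes the standard MountainCar-v0 observation (position, velocity) and action indices 0 = push left, 1 = no push, 2 = push right; the constants and function names are illustrative, not the values used in this PR.

import random

GOAL_POSITION = 0.5       # MountainCar-v0 goal position on the x-axis
NEAR_GOAL_POSITION = 0.4  # illustrative threshold for the "near-goal" bonus

def exploratory_action(position: float, velocity: float) -> int:
    """Momentum-based random action: usually push in the direction of motion."""
    if random.random() < 0.8:            # mostly reinforce the current momentum
        return 2 if velocity > 0 else 0
    return random.choice([0, 1, 2])      # occasionally act fully at random

def shaped_reward(position: float, velocity: float, env_reward: float) -> float:
    """Position- and velocity-based shaping on top of the env's per-step reward."""
    reward = env_reward
    reward += abs(velocity) * 10.0                        # reward building momentum
    if position >= GOAL_POSITION:
        reward += 100.0                                   # large success bonus
    elif position >= NEAR_GOAL_POSITION:
        reward += (position - NEAR_GOAL_POSITION) * 50.0  # progressive near-goal bonus
    return reward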

@sourcery-ai bot (Contributor) commented on Nov 26, 2024

Reviewer's Guide by Sourcery

This PR implements two reinforcement learning agents for the Acrobot and MountainCar environments using Deep Q-Networks (DQN). The implementation includes several advanced features, such as Double DQN, experience replay, and enhanced exploration strategies. Both agents use curriculum learning and reward shaping to improve training efficiency.
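As a rough illustration of the Double DQN idea mentioned above (not the PR's exact code): the policy network selects the next action and the target network evaluates it, which reduces the overestimation bias of vanilla DQN.

import torch

def double_dqn_targets(policy_net, target_net, rewards, next_states, dones, gamma=0.99):
    """Double DQN targets: policy_net picks the next action, target_net scores it."""
    with torch.no_grad():
        # Action selection by the online/policy network
        next_actions = policy_net(next_states).argmax(dim=1, keepdim=True)
        # Action evaluation by the slowly updated target network
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
        # No bootstrapping on terminal transitions
        return rewards + gamma * next_q * (1.0 - dones)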

Sequence diagram for AcrobotAI training process

sequenceDiagram
    actor Trainer
    participant AcrobotAI
    participant Environment
    participant ReplayMemory
    participant DQN
    Trainer->>AcrobotAI: train(num_episodes)
    loop for each episode
        AcrobotAI->>Environment: reset()
        AcrobotAI->>DQN: select_action(state)
        DQN-->>AcrobotAI: action
        AcrobotAI->>Environment: step(action)
        Environment-->>AcrobotAI: next_state, reward, done
        AcrobotAI->>ReplayMemory: push(Experience)
        AcrobotAI->>DQN: optimize_model(experiences)
        DQN-->>AcrobotAI: update policy
    end
    AcrobotAI-->>Trainer: training complete

Sequence diagram for MountainCarAI training process

sequenceDiagram
    actor Trainer
    participant MountainCarAI
    participant Environment
    participant ReplayMemory
    participant DQN
    Trainer->>MountainCarAI: train(num_episodes)
    loop for each episode
        MountainCarAI->>Environment: reset()
        MountainCarAI->>DQN: select_action(state)
        DQN-->>MountainCarAI: action
        MountainCarAI->>Environment: step(action)
        Environment-->>MountainCarAI: next_state, reward, done
        MountainCarAI->>ReplayMemory: push(Experience)
        MountainCarAI->>DQN: optimize_model()
        DQN-->>MountainCarAI: update policy
    end
    MountainCarAI-->>Trainer: training complete

Class diagram for AcrobotAI and MountainCarAI

classDiagram
    class DQN {
        +Linear fc1
        +Linear fc2
        +Linear fc3
        +forward(Tensor x) Tensor
    }
    class ReplayMemory {
        +deque memory
        +push(Experience experience)
        +sample(int batch_size) List~Experience~
        +__len__() int
    }
    class AcrobotAI {
        +DQN policy_net
        +DQN target_net
        +ReplayMemory memory
        +select_action(Tensor state) Tensor
        +optimize_model(List~Experience~ experiences)
        +train(int num_episodes) bool
        +play(int episodes)
    }
    class MountainCarAI {
        +DQN policy_net
        +DQN target_net
        +ReplayMemory memory
        +select_action(Tensor state) int
        +optimize_model()
        +train(int num_episodes) bool
        +play(int episodes)
    }
    class Experience {
        +state
        +action
        +reward
        +next_state
        +done
    }
    AcrobotAI o-- DQN
    AcrobotAI o-- ReplayMemory
    MountainCarAI o-- DQN
    MountainCarAI o-- ReplayMemory
    ReplayMemory o-- Experience

File-Level Changes

gym/acrobot.py: Implemented Acrobot DQN agent with curriculum learning and momentum-based exploration
  • Created a DQN architecture with a 3-layer neural network
  • Implemented an experience replay buffer for training stability
  • Added curriculum-based learning with progressive difficulty phases
  • Designed a momentum-based exploration strategy with smart random actions
  • Implemented a progressive reward structure based on height and stability

gym/mountaincar.py: Implemented MountainCar DQN agent with prioritized experience replay and position-based rewards
  • Created a simplified DQN architecture optimized for MountainCar
  • Implemented prioritized experience replay memory (a minimal sketch follows below)
  • Added an enhanced exploration strategy based on velocity and position
  • Designed a position- and velocity-based reward structure
  • Added success position and epsilon history tracking
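The buffer in gym/mountaincar.py is not reproduced here; the following is a minimal proportional prioritized replay sketch (no sum-tree, hyperparameters made up) to show the general idea.

import random
from collections import namedtuple

Experience = namedtuple("Experience", ["state", "action", "reward", "next_state", "done"])

class PrioritizedReplayMemory:
    """Minimal proportional prioritized replay buffer (illustrative only)."""

    def __init__(self, capacity: int = 10_000, alpha: float = 0.6):
        self.capacity = capacity
        self.alpha = alpha          # how strongly priorities skew sampling
        self.memory = []
        self.priorities = []

    def push(self, experience) -> None:
        # New experiences get the current max priority so they are sampled at least once
        max_prio = max(self.priorities, default=1.0)
        if len(self.memory) >= self.capacity:
            self.memory.pop(0)
            self.priorities.pop(0)
        self.memory.append(experience)
        self.priorities.append(max_prio)

    def sample(self, batch_size: int):
        weights = [p ** self.alpha for p in self.priorities]
        return random.choices(self.memory, weights=weights, k=batch_size)

    def __len__(self) -> int:
        return len(self.memory)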


sourcery-ai bot changed the title from "@sourcery-ai" to "Implement DQN for Acrobot and MountainCar with enhanced exploration" on Nov 26, 2024
leonvanbokhorst self-assigned this on Nov 26, 2024
leonvanbokhorst added the documentation and enhancement labels on Nov 26, 2024
leonvanbokhorst added this to the Phase 1 milestone on Nov 26, 2024
leonvanbokhorst merged commit ca62d33 into main on Nov 26, 2024
1 check failed
@sourcery-ai bot (Contributor) left a comment


Hey @leonvanbokhorst - I've reviewed your changes and they look great!

Here's what I looked at during the review
  • 🟡 General issues: 1 issue found
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟡 Complexity: 1 issue found
  • 🟢 Documentation: all looks good


cos_theta2, sin_theta2 = state[2].item(), state[3].item()
tip_height = -cos_theta1 - cos_theta2

# More exploration when stuck

issue (bug_risk): stuck_counter is used before initialization

Initialize stuck_counter in __init__ to avoid potential runtime errors.
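A minimal sketch of the suggested fix (the surrounding attributes are assumed; only stuck_counter is the point here):

class AcrobotAI:
    def __init__(self) -> None:
        # ... networks, optimizer, replay memory set up here ...
        # Initialize the counter so the reward logic can safely
        # increment or reset it from the very first episode.
        self.stuck_counter = 0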

done = terminated or truncated
steps += 1

# Calculate height

issue (complexity): Consider extracting reward calculation logic into a dedicated method to improve code organization.

The reward calculation logic can be simplified by extracting it into a dedicated method. This would improve readability while maintaining all functionality. Here's a suggested refactor:

def calculate_reward(self, tip_height: float, target_height: float, prev_max_height: float) -> tuple[float, bool]:
    reward = 0
    episode_success = False

    # Base curriculum reward
    if tip_height > target_height:
        reward += 10.0

    # Progressive rewards
    reward += (tip_height - target_height) * 5.0

    # Previous max height bonus
    if tip_height > prev_max_height:
        reward += 5.0
        self.stuck_counter = 0
    else:
        self.stuck_counter += 1

    # Success reward
    if tip_height > SUCCESS_HEIGHT:
        episode_success = True
        reward += 100.0

    return reward - 0.1, episode_success  # Include time penalty

Then simplify the training loop:

while not done:
    action = self.select_action(state)
    next_state, _, terminated, truncated, _ = self.env.step(action.item())
    done = terminated or truncated
    steps += 1

    tip_height = -(next_state[0] + next_state[2])  # Simplified height calc
    max_height = max(max_height, tip_height)

    reward, episode_success = self.calculate_reward(
        tip_height, target_height, prev_max_height
    )

    self.memory.push(Experience(state, action, reward, next_state, done))
    # ... rest of training loop

This separates reward calculation concerns from the main training flow while keeping all the existing logic intact.

torch.nn.utils.clip_grad_value_(self.policy_net.parameters(), 100)
self.optimizer.step()

def train(self, num_episodes: int):

issue (code-quality): We've found these issues:


Explanation
The quality score for this function is below the quality threshold of 25%.
This score is a combination of the method length, cognitive complexity and working memory.

How can you solve this?

It might be worth refactoring this function to make it shorter and more readable.

  • Reduce the function length by extracting pieces of functionality out into
    their own functions. This is the most important thing you can do - ideally a
    function should be less than 10 lines.
  • Reduce nesting, perhaps by introducing guard clauses to return early.
  • Ensure that variables are tightly scoped, so that code using related concepts
    sits together within the function rather than being scattered.

ai = AcrobotAI()

print("🏃‍♂️ Training the agent...")
solved = ai.train(num_episodes=1000)

issue (code-quality): Use named expression to simplify assignment and conditional (use-named-expression)
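For reference, the named-expression form Sourcery is suggesting would look roughly like this, assuming the assignment feeds a following if solved: check (the play call here is illustrative):

ai = AcrobotAI()

print("🏃‍♂️ Training the agent...")
if solved := ai.train(num_episodes=1000):
    ai.play(episodes=5)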


position, velocity = state

if random.random() > epsilon:

issue (code-quality): Merge else clause's nested if statement into elif (merge-else-if-into-elif)
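A generic illustration of that refactor (the branch bodies and names are illustrative, not the PR's exact logic):

import random

def choose_action(best_action: int, velocity: float, epsilon: float) -> int:
    # Before: the exploration branch was written as `else:` with a nested `if`;
    # merging the nested if into `elif` removes one level of nesting.
    if random.random() > epsilon:
        return best_action
    elif velocity > 0:
        return 2  # push right
    else:
        return 0  # push left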

torch.nn.utils.clip_grad_value_(self.policy_net.parameters(), 100)
self.optimizer.step()

def train(self, num_episodes: int):

issue (code-quality): We've found these issues:


Explanation
The quality score for this function is below the quality threshold of 25%.
This score is a combination of the method length, cognitive complexity and working memory.

How can you solve this?

It might be worth refactoring this function to make it shorter and more readable.

  • Reduce the function length by extracting pieces of functionality out into
    their own functions. This is the most important thing you can do - ideally a
    function should be less than 10 lines.
  • Reduce nesting, perhaps by introducing guard clauses to return early.
  • Ensure that variables are tightly scoped, so that code using related concepts
    sits together within the function rather than being scattered.

ai = MountainCarAI()

print("\n🏃‍♂️ Training the agent...")
solved = ai.train(num_episodes=1000) # Increased max episodes

issue (code-quality): Use named expression to simplify assignment and conditional (use-named-expression)
