Merge pull request #1057 from ashis2004/77
optimized the trading strategy based on sentiment analysis added
invigorzz313 authored Aug 9, 2024
2 parents 54fad7c + 5aef055 commit 5f87b72
Showing 3 changed files with 1,832 additions and 0 deletions.
@@ -0,0 +1,316 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"source": [
"Optimizing a trading strategy based on sentiment analysis"
],
"metadata": {
"id": "S_I_QeRRg52m"
}
},
{
"cell_type": "markdown",
"source": [
"Loading the Dataset"
],
"metadata": {
"id": "x14oHjR0g9Dk"
}
},
{
"cell_type": "code",
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"\n",
"# Load data\n",
"data_sentiment = pd.read_csv('sentiment_trading_data.csv')\n",
"print(data_sentiment.head())\n",
"\n",
"# Convert to NumPy array (excluding the 'Date' column)\n",
"data_sentiment = data_sentiment.drop(columns=['Date']).to_numpy()\n"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "vHerMIPte3d2",
"outputId": "6fd74df2-d792-4ef5-ac99-1e170e3c33dc"
},
"execution_count": 3,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
" Date Stock_Price Sentiment\n",
"0 2020-01-01 87.454012 0.302466\n",
"1 2020-01-02 145.071431 -0.786814\n",
"2 2020-01-03 123.199394 0.315691\n",
"3 2020-01-04 109.865848 0.998827\n",
"4 2020-01-05 65.601864 -0.903576\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"Define the fitness function"
],
"metadata": {
"id": "ObuEsypzhAGI"
}
},
{
"cell_type": "code",
"source": [
"def fitness_function(individual, data):\n",
"    stock_prices = data[:, 0]\n",
"    sentiments = data[:, 1]\n",
"\n",
"    capital = 100000  # Starting capital\n",
"    position = 0  # Shares currently held (0 means no stock held)\n",
"\n",
"    for i in range(len(data)):\n",
"        if sentiments[i] > individual[0]:  # Buy signal: sentiment above the buy threshold\n",
"            shares = capital // stock_prices[i]  # Buy as many whole shares as capital allows\n",
"            position += shares\n",
"            capital -= shares * stock_prices[i]  # Deduct only the cost of the new shares\n",
"        elif sentiments[i] < individual[1]:  # Sell signal: sentiment below the sell threshold\n",
"            capital += position * stock_prices[i]  # Sell all shares\n",
"            position = 0  # Reset position\n",
"\n",
"    # Portfolio value: cash plus any shares still held, valued at the last price\n",
"    return capital + position * stock_prices[-1]\n",
"\n"
],
"metadata": {
"id": "M5oMXowze6Od"
},
"execution_count": 4,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Initialize Population"
],
"metadata": {
"id": "PH2klZcvhEct"
}
},
{
"cell_type": "code",
"source": [
"def initialize_population(pop_size):\n",
"    population = []\n",
"    for _ in range(pop_size):\n",
"        buy_threshold = np.random.uniform(-1, 1)  # Random buy threshold\n",
"        sell_threshold = np.random.uniform(-1, 1)  # Random sell threshold\n",
"        individual = [buy_threshold, sell_threshold]\n",
"        population.append(individual)\n",
"    return population\n"
],
"metadata": {
"id": "uYNfoXAxe7-m"
},
"execution_count": 5,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Perform Selection"
],
"metadata": {
"id": "M9Wr94fnhHaZ"
}
},
{
"cell_type": "code",
"source": [
"def selection(population, fitness_scores, num_parents):\n",
"    # Keep the num_parents individuals with the highest fitness\n",
"    parents = [population[idx] for idx in np.argsort(fitness_scores)[-num_parents:]]\n",
"    return parents\n"
],
"metadata": {
"id": "BIzNDPbue9fc"
},
"execution_count": 6,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Perform crossover"
],
"metadata": {
"id": "sTsXdkBRhJ4v"
}
},
{
"cell_type": "code",
"source": [
"def crossover(parents, offspring_size):\n",
"    offspring = []\n",
"    for _ in range(offspring_size):\n",
"        parent1 = parents[np.random.randint(len(parents))]\n",
"        parent2 = parents[np.random.randint(len(parents))]\n",
"        crossover_point = np.random.randint(1, len(parent1))\n",
"        child = parent1[:crossover_point] + parent2[crossover_point:]\n",
"        offspring.append(child)\n",
"    return offspring\n"
],
"metadata": {
"id": "xOLySvsHe-5E"
},
"execution_count": 7,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Perform Mutation"
],
"metadata": {
"id": "W_35vtRthMog"
}
},
{
"cell_type": "code",
"source": [
"def mutation(offspring, mutation_rate):\n",
"    for individual in offspring:\n",
"        if np.random.rand() < mutation_rate:\n",
"            mutation_point = np.random.randint(len(individual))\n",
"            individual[mutation_point] = np.random.uniform(-1, 1)  # Mutate with new random threshold\n",
"    return offspring\n"
],
"metadata": {
"id": "oQV3Xgt2fAyk"
},
"execution_count": 8,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Perform Genetic algorithm"
],
"metadata": {
"id": "ch0Uzd7OhOyc"
}
},
{
"cell_type": "code",
"source": [
"def genetic_algorithm(data, num_generations, pop_size, num_parents, mutation_rate):\n",
"    population = initialize_population(pop_size)\n",
"\n",
"    for generation in range(num_generations):\n",
"        fitness_scores = [fitness_function(individual, data) for individual in population]\n",
"\n",
"        # Record the best individual before the population is replaced, so the\n",
"        # index from fitness_scores still refers to the population it scored\n",
"        best_idx = int(np.argmax(fitness_scores))\n",
"        best_individual = population[best_idx]\n",
"        best_fitness = fitness_scores[best_idx]\n",
"\n",
"        parents = selection(population, fitness_scores, num_parents)\n",
"        offspring_size = pop_size - len(parents)\n",
"        offspring = crossover(parents, offspring_size)\n",
"        offspring = mutation(offspring, mutation_rate)\n",
"        population = parents + offspring\n",
"\n",
"        print(f\"Generation {generation}: Best Fitness = {best_fitness}\")\n",
"\n",
"    return best_individual\n",
"\n",
"# Run the genetic algorithm\n",
"num_generations = 50\n",
"pop_size = 100\n",
"num_parents = 20\n",
"mutation_rate = 0.01\n",
"\n",
"best_params = genetic_algorithm(data_sentiment, num_generations, pop_size, num_parents, mutation_rate)\n",
"print(f\"Best Trading Strategy: Buy Threshold = {best_params[0]}, Sell Threshold = {best_params[1]}\")\n"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "hBM-j180fCPe",
"outputId": "fec15d5d-e03e-477d-eff7-4c87591f2edc"
},
"execution_count": 9,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Generation 0: Best Fitness = 129785.02657707607\n",
"Generation 1: Best Fitness = 176212273.2138567\n",
"Generation 2: Best Fitness = 3.900617142553841e+32\n",
"Generation 3: Best Fitness = 3.900617142553841e+32\n",
"Generation 4: Best Fitness = 3.900617142553841e+32\n",
"Generation 5: Best Fitness = 3.900617142553841e+32\n",
"Generation 6: Best Fitness = 3.900617142553841e+32\n",
"Generation 7: Best Fitness = 3.900617142553841e+32\n",
"Generation 8: Best Fitness = 3.900617142553841e+32\n",
"Generation 9: Best Fitness = 3.900617142553841e+32\n",
"Generation 10: Best Fitness = 3.900617142553841e+32\n",
"Generation 11: Best Fitness = 3.900617142553841e+32\n",
"Generation 12: Best Fitness = 3.900617142553841e+32\n",
"Generation 13: Best Fitness = 3.900617142553841e+32\n",
"Generation 14: Best Fitness = 3.900617142553841e+32\n",
"Generation 15: Best Fitness = 3.900617142553841e+32\n",
"Generation 16: Best Fitness = 3.900617142553841e+32\n",
"Generation 17: Best Fitness = 3.900617142553841e+32\n",
"Generation 18: Best Fitness = 3.900617142553841e+32\n",
"Generation 19: Best Fitness = 3.900617142553841e+32\n",
"Generation 20: Best Fitness = 3.900617142553841e+32\n",
"Generation 21: Best Fitness = 3.900617142553841e+32\n",
"Generation 22: Best Fitness = 3.900617142553841e+32\n",
"Generation 23: Best Fitness = 3.900617142553841e+32\n",
"Generation 24: Best Fitness = 3.900617142553841e+32\n",
"Generation 25: Best Fitness = 3.900617142553841e+32\n",
"Generation 26: Best Fitness = 3.900617142553841e+32\n",
"Generation 27: Best Fitness = 3.900617142553841e+32\n",
"Generation 28: Best Fitness = 3.900617142553841e+32\n",
"Generation 29: Best Fitness = 3.900617142553841e+32\n",
"Generation 30: Best Fitness = 3.900617142553841e+32\n",
"Generation 31: Best Fitness = 3.900617142553841e+32\n",
"Generation 32: Best Fitness = 3.900617142553841e+32\n",
"Generation 33: Best Fitness = 3.900617142553841e+32\n",
"Generation 34: Best Fitness = 3.900617142553841e+32\n",
"Generation 35: Best Fitness = 3.900617142553841e+32\n",
"Generation 36: Best Fitness = 3.900617142553841e+32\n",
"Generation 37: Best Fitness = 3.900617142553841e+32\n",
"Generation 38: Best Fitness = 3.900617142553841e+32\n",
"Generation 39: Best Fitness = 3.900617142553841e+32\n",
"Generation 40: Best Fitness = 3.900617142553841e+32\n",
"Generation 41: Best Fitness = 3.900617142553841e+32\n",
"Generation 42: Best Fitness = 3.900617142553841e+32\n",
"Generation 43: Best Fitness = 3.900617142553841e+32\n",
"Generation 44: Best Fitness = 3.900617142553841e+32\n",
"Generation 45: Best Fitness = 3.900617142553841e+32\n",
"Generation 46: Best Fitness = 3.900617142553841e+32\n",
"Generation 47: Best Fitness = 3.900617142553841e+32\n",
"Generation 48: Best Fitness = 3.900617142553841e+32\n",
"Generation 49: Best Fitness = 3.900617142553841e+32\n",
"Best Trading Strategy: Buy Threshold = -0.7733929905450716, Sell Threshold = -0.9756864570452264\n"
]
}
]
}
]
}
@@ -0,0 +1,54 @@
# Sentiment Analysis Driven Trading Strategy Optimization

This project aims to optimize a trading strategy based on stock prices and sentiment data using a genetic algorithm. The goal is to find the best buy and sell thresholds that maximize the portfolio value over a given period.

## Dataset

The dataset used in this project is `sentiment_trading_data.csv`, which contains the following columns:
- `Date`: The date of the observation.
- `Stock_Price`: The stock price on the given date.
- `Sentiment`: The sentiment score for the given date, ranging from -1 to 1.

## Genetic Algorithm Implementation

### Step 1: Load the Data

First, load the `sentiment_trading_data.csv` dataset.

```python
import pandas as pd
import numpy as np

# Load data
data_sentiment = pd.read_csv('sentiment_trading_data.csv')
print(data_sentiment.head())

# Convert to NumPy array (excluding the 'Date' column)
data_sentiment = data_sentiment.drop(columns=['Date']).to_numpy()
```

### Step 2: Define the Fitness Function
The fitness function evaluates how well a given trading strategy (individual) performs based on stock prices and sentiment.
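As a minimal sketch of such a fitness function, assume an individual is a `[buy_threshold, sell_threshold]` pair and `data` is a NumPy array with the price in column 0 and the sentiment in column 1. The `starting_capital` parameter, the rule of deducting only the cost of newly bought shares, and the valuation of leftover shares at the last price are assumptions made for illustration.

```python
import numpy as np

def fitness_function(individual, data, starting_capital=100_000.0):
    """Simulate the threshold strategy and return the final portfolio value."""
    buy_threshold, sell_threshold = individual
    capital = starting_capital
    position = 0.0  # shares currently held

    for price, sentiment in data:
        if sentiment > buy_threshold:
            # Buy as many whole shares as the remaining capital allows
            shares = capital // price
            position += shares
            capital -= shares * price
        elif sentiment < sell_threshold:
            # Sell everything at the current price
            capital += position * price
            position = 0.0

    # Assumption: shares still held are valued at the last observed price
    return capital + position * data[-1, 0]

# Toy data: buy at 10 on positive sentiment, sell at 20 on negative sentiment
toy = np.array([[10.0, 0.9], [20.0, -0.9]])
print(fitness_function([0.5, -0.5], toy))  # 200000.0
```

With thresholds that are never crossed, the function simply returns the starting capital, which makes a convenient sanity check.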

### Step 3: Initialize the Population
Generate an initial population of random trading strategies.
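One way to sketch this step is to draw both thresholds uniformly from the sentiment range of -1 to 1. The optional `rng` argument is an assumption added here so that runs can be made reproducible.

```python
import numpy as np

def initialize_population(pop_size, rng=None):
    """Return pop_size random [buy_threshold, sell_threshold] pairs."""
    rng = np.random.default_rng() if rng is None else rng
    return [
        [float(rng.uniform(-1, 1)), float(rng.uniform(-1, 1))]
        for _ in range(pop_size)
    ]

# A seeded generator gives the same population on every run
population = initialize_population(5, rng=np.random.default_rng(0))
for individual in population:
    print(individual)
```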

### Step 4: Selection
Select the best-performing individuals to be parents for the next generation.
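A simple form of this is truncation selection: sort by fitness and keep the top `num_parents` individuals. The sketch below uses made-up fitness values purely for illustration.

```python
import numpy as np

def selection(population, fitness_scores, num_parents):
    """Keep the num_parents individuals with the highest fitness."""
    # argsort is ascending, so the last num_parents indices are the best
    best_indices = np.argsort(fitness_scores)[-num_parents:]
    return [population[i] for i in best_indices]

population = [[0.1, -0.1], [0.5, -0.5], [0.9, -0.9], [0.3, -0.3]]
fitness_scores = [100.0, 400.0, 200.0, 300.0]  # hypothetical values
parents = selection(population, fitness_scores, num_parents=2)
print(parents)  # [[0.3, -0.3], [0.5, -0.5]]
```

Note that the parents come back in ascending order of fitness, so the best individual is the last element.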

### Step 5: Crossover
Create offspring by combining parts of two parents.
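A single-point crossover sketch is shown below. Because each individual has only two genes, the crossover point is always 1, so a child takes its buy threshold from the first parent and its sell threshold from the second. The `rng` argument is an assumption for reproducibility.

```python
import numpy as np

def crossover(parents, offspring_size, rng=None):
    """Build offspring_size children by single-point crossover."""
    rng = np.random.default_rng() if rng is None else rng
    offspring = []
    for _ in range(offspring_size):
        parent1 = parents[rng.integers(len(parents))]
        parent2 = parents[rng.integers(len(parents))]
        # Point is drawn from [1, len(parent1)), so both parents contribute
        point = rng.integers(1, len(parent1))
        offspring.append(parent1[:point] + parent2[point:])
    return offspring

parents = [[0.2, -0.4], [0.6, -0.8]]
children = crossover(parents, offspring_size=3, rng=np.random.default_rng(1))
print(children)
```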

### Step 6: Mutation
Randomly mutate some individuals to maintain genetic diversity.
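A possible sketch: with probability `mutation_rate`, replace one randomly chosen threshold of an individual with a fresh value from -1 to 1. This version works on copies rather than modifying the input lists, which is a design choice made here and not a requirement.

```python
import numpy as np

def mutation(offspring, mutation_rate, rng=None):
    """Return a mutated copy of offspring; inputs are left untouched."""
    rng = np.random.default_rng() if rng is None else rng
    mutated = [list(individual) for individual in offspring]
    for individual in mutated:
        if rng.random() < mutation_rate:
            # Overwrite one gene with a new random threshold
            point = rng.integers(len(individual))
            individual[point] = float(rng.uniform(-1, 1))
    return mutated

offspring = [[0.2, -0.4], [0.6, -0.8]]
unchanged = mutation(offspring, mutation_rate=0.0)
all_mutated = mutation(offspring, mutation_rate=1.0, rng=np.random.default_rng(2))
print(unchanged, all_mutated)
```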

### Step 7: Run the Genetic Algorithm
Execute the genetic algorithm with the defined functions.
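The full loop can be sketched in one self-contained block. The helper name `evaluate`, the `seed` parameter, and the synthetic `toy_data` are assumptions for illustration; the toy data stands in for `sentiment_trading_data.csv` and its results say nothing about the real dataset.

```python
import numpy as np

def evaluate(individual, data, starting_capital=100_000.0):
    """Final portfolio value of a [buy_threshold, sell_threshold] strategy."""
    capital, position = starting_capital, 0.0
    for price, sentiment in data:
        if sentiment > individual[0]:
            shares = capital // price
            position += shares
            capital -= shares * price
        elif sentiment < individual[1]:
            capital += position * price
            position = 0.0
    return capital + position * data[-1, 0]

def genetic_algorithm(data, num_generations, pop_size, num_parents,
                      mutation_rate, seed=0):
    rng = np.random.default_rng(seed)
    population = [[float(rng.uniform(-1, 1)), float(rng.uniform(-1, 1))]
                  for _ in range(pop_size)]
    best_individual, best_fitness = None, -np.inf

    for _ in range(num_generations):
        scores = [evaluate(ind, data) for ind in population]

        # Track the best individual while scores and population are aligned
        top = int(np.argmax(scores))
        if scores[top] > best_fitness:
            best_individual, best_fitness = list(population[top]), scores[top]

        # Truncation selection: keep the highest-scoring individuals
        parents = [population[i] for i in np.argsort(scores)[-num_parents:]]

        offspring = []
        for _ in range(pop_size - num_parents):
            p1 = parents[rng.integers(num_parents)]
            p2 = parents[rng.integers(num_parents)]
            child = [p1[0], p2[1]]  # single-point crossover on two genes
            if rng.random() < mutation_rate:
                child[rng.integers(2)] = float(rng.uniform(-1, 1))
            offspring.append(child)
        population = parents + offspring

    return best_individual, best_fitness

# Synthetic stand-in data: 200 days of random prices and sentiment scores
data_rng = np.random.default_rng(42)
toy_data = np.column_stack([data_rng.uniform(50, 150, 200),
                            data_rng.uniform(-1, 1, 200)])
best_params, best_value = genetic_algorithm(toy_data, 10, 30, 6, 0.05)
print(f"Buy Threshold = {best_params[0]}, Sell Threshold = {best_params[1]}")
print(f"Final portfolio value = {best_value}")
```

Recording the best individual before the population is replaced keeps the reported thresholds matched to the fitness score that selected them.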

## Results

The genetic algorithm was run for 50 generations with a population size of 100. The best trading strategy found is:

- Buy Threshold: -0.7733929905450716
- Sell Threshold: -0.9756864570452264

## Contributor

Ashish Kumar Patel