Unrealized PnL Trading Environment
Let,
- l(ti) be the amount of long currency,
- s(ti) be the amount of short currency, and
- p(ti) be the price of the currency

at time instant ti. The following assumptions are made:
- the agent starts with 0 initial amount
- due to the short duration of episodes (the maximum time range allowed is 10 minutes), the agent can borrow any amount of money at any timestep at a 0% interest rate, with a promise to settle at the end of the episode
- future rewards are not discounted
-
When trading at time instant ti, the agent is rewarded for its portfolio status between ti and ti+1, since the portfolio is held unchanged over this entire duration. At any timestamp, the reward given to the agent is the actual value of its portfolio. It is defined by,
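A minimal sketch of the reward in LaTeX, assuming it is the change in portfolio value over the interval the position is held; this form is an assumption, not the repository's verbatim formula, chosen so that the telescoping property noted below holds:

    r(t_i) = \big( l(t_i) - s(t_i) \big) \, \big( p(t_{i+1}) - p(t_i) \big)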
- non-zero intermediate rewards allow the agent to converge to a trading strategy in fewer iterations than the realized PnL reward function
- however, frequent intermediate rewards are often noisy and tend to destabilize the learning process
Given that future rewards are not discounted, this reward possesses the property that the sum of all intermediate rewards over an episode equals the single realized PnL reward given at the end of the episode. A policy that is optimal under this reward function is therefore also optimal under the realized PnL reward.
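Under the reconstructed reward above (again, an assumed form), this property follows because the undiscounted episode return is a sum of per-step position gains:

    \sum_{i=0}^{T-1} r(t_i) = \sum_{i=0}^{T-1} \big( l(t_i) - s(t_i) \big) \, \big( p(t_{i+1}) - p(t_i) \big)

which is exactly the net profit realized by settling all long and short positions at the end of the episode.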
import gym
import gym_cryptotrading  # importing the package registers its environments with gym

env = gym.make('UnRealizedPnLEnv-v0')  # environment that uses the unrealized PnL reward
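A minimal interaction loop might look like the following. This is a sketch assuming the environment follows the classic gym API (reset returning a state, step returning a 4-tuple) and exposes a standard action_space; random actions stand in for a real trading policy.

import gym
import gym_cryptotrading

env = gym.make('UnRealizedPnLEnv-v0')
state = env.reset()                     # begin a new episode (at most 10 minutes long)
done = False
episode_return = 0.0
while not done:
    action = env.action_space.sample()  # placeholder policy: act randomly
    state, reward, done, info = env.step(action)
    episode_return += reward            # undiscounted sum of intermediate rewards
print(episode_return)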