
Weighted Unrealized PnL Trading Environment

Kartikay Garg edited this page May 2, 2018 · 4 revisions


Let,

  • l(ti) be the amount of long currency,

  • s(ti) be the amount of short currency and

  • p(ti) be the price of the currency

at time instant ti. The following assumptions are made:

  • the agent starts with zero initial capital

  • due to the short duration of episodes (the maximum allowed time range is 10 minutes):

    • the agent can borrow any amount of money at any timestep at a 0% interest rate, with a promise to settle at the end of the episode

    • future rewards are not discounted

When trading at time instant ti, the agent is rewarded for its portfolio status between ti and ti+1, since the portfolio is held unchanged over this entire duration.
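As a sketch of the per-step quantity being rewarded, the unrealized PnL of a portfolio l(ti), s(ti) over the interval [ti, ti+1] can be written as follows. The helper below is illustrative only, not part of the package's API:

```python
def unrealized_pnl(long_amt, short_amt, price_now, price_next):
    # Hypothetical helper (not part of gym_cryptotrading's API):
    # a long position gains when the price rises,
    # a short position gains when the price falls.
    return (long_amt * (price_next - price_now)
            + short_amt * (price_now - price_next))
```

For example, holding 2 units long through a price move from 100 to 101 yields an unrealized PnL of 2.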

Reward Function

    r̄_i = ( Σ_{j=0}^{k} w_j · r_{i−j} ) / ( Σ_{j=0}^{k} w_j )

where r_i is the unrealized PnL reward at time instant t_i, the weights w_j decay exponentially at a suitable rate ω, and k is the number of lag terms in the exponentially weighted average.

  • lag reward terms with negative indices are assumed to be 0

  • it balances the two extremes of the realized PnL and unrealized PnL reward functions

  • it serves dual objectives in guiding policy learning:

    1. it provides the agent with intermediate rewards, facilitating fast learning of the trading strategy
    2. the weighted average over past rewards tends to reduce the noise in these rather frequent rewards

Note

Given that future rewards are not discounted, normalizing the weights gives this reward the same property (in the limit of a large horizon) as unrealized PnL: its sum over the episode equals the realized PnL reward at the end of the episode. This guarantees convergence to the optimal policy.
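The weighted reward can be sketched in a few lines. The exact exponential kernel used by the environment is an assumption here (w_j = exp(−ω·j)); the zero-padding of negative-index lag terms follows the convention stated above:

```python
import math

def weighted_reward(past_rewards, decay_rate=1e-2, lag=5):
    """Normalized exponentially weighted average of the last `lag`
    unrealized PnL rewards, with the most recent reward weighted highest.

    The kernel w_j = exp(-decay_rate * j) is an assumed form of the
    exponential decay; missing (negative-index) lag terms count as 0.
    """
    weights = [math.exp(-decay_rate * j) for j in range(lag)]  # j = 0 is most recent
    window = list(reversed(past_rewards))[:lag]                # most recent first
    window += [0.0] * (lag - len(window))                      # zero-pad negative indices
    return sum(w * r for w, r in zip(weights, window)) / sum(weights)
```

Because the weights are normalized, a constant stream of rewards is left unchanged by the averaging, which is what makes the episode-level sums of the weighted and unrealized rewards agree for large horizons.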

Usage

This environment is characterized by two extra parameters,

  • ω, the decay rate of the exponentially weighted rewards

    • defaults to 1e-2
  • k, the number of lag terms to be considered in the weighted average

    • defaults to horizon

These can be set using env.env.set_params(history_length, horizon, unit, **kwargs) by passing the keyword arguments decay_rate and lag.

```python
import gym
import gym_cryptotrading

env = gym.make('WeightedPnLEnv-v0')
# decay_rate and lag are shown at their documented defaults
env.env.set_params(history_length, horizon, unit, decay_rate=1e-2, lag=horizon)
```