Using the PARL reinforcement learning framework with PyTorch to implement SeqGAN (Chinese poem generation)
Original paper: SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient
This project uses SeqGAN to generate Chinese poems, built on Baidu's reinforcement learning framework PARL (with PyTorch).
GitHub link for Baidu's reinforcement learning framework PARL:
This is how PARL abstracts RL as model-algorithm-agent:
This is how I put SeqGAN into the PARL framework:
The generator is the actor/agent: generator.step gives the "actions" (which word to choose), and generator.sample (the Monte Carlo search in SeqGAN) gives the "states" (whole-sequence samples) for each episode. Here, an episode ends when the whole sequence has been generated.
The discriminator and rollout are the critic/environment: they take samples/embeddings and output rewards.
The rewards (loss) are used to train the critic/environment (discriminator) and the actor/agent (generator).
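As a rough illustration of how a rollout turns discriminator scores into per-step rewards, here is a minimal pure-Python sketch. It is not the code from this repository and does not use PARL or PyTorch: toy_generator_step, toy_discriminator, rollout_rewards, VOCAB_SIZE and SEQ_LEN are all hypothetical stand-ins. It assumes the usual SeqGAN idea that each prefix is completed several times and the discriminator scores are averaged.

```python
import random

VOCAB_SIZE = 5  # hypothetical toy vocabulary size
SEQ_LEN = 4     # hypothetical fixed sequence length


def toy_generator_step(prefix, rng):
    """Stand-in for generator.step: choose the next token (the "action")."""
    return rng.randrange(VOCAB_SIZE)


def toy_discriminator(sequence):
    """Stand-in for the discriminator: a score in [0, 1] for a full sequence."""
    # Toy rule only: the fraction of even-valued tokens.
    return sum(1 for tok in sequence if tok % 2 == 0) / len(sequence)


def rollout_rewards(sequence, num_rollouts, rng):
    """Estimate one reward per position by completing each prefix several times."""
    rewards = []
    for t in range(1, len(sequence) + 1):
        prefix = list(sequence[:t])
        if t == len(sequence):
            # The sequence is already complete, so score it directly.
            rewards.append(toy_discriminator(prefix))
            continue
        total = 0.0
        for _ in range(num_rollouts):
            completed = list(prefix)
            while len(completed) < len(sequence):
                completed.append(toy_generator_step(completed, rng))
            total += toy_discriminator(completed)
        rewards.append(total / num_rollouts)
    return rewards


rng = random.Random(0)
sample = []
while len(sample) < SEQ_LEN:
    sample.append(toy_generator_step(sample, rng))
rewards = rollout_rewards(sample, num_rollouts=8, rng=rng)
```

The last reward is the discriminator score of the finished sequence; earlier rewards are averages over random completions, so they get noisier as num_rollouts gets smaller.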
All PARL-related code is used in train_generator_PG in the main function.
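For readers new to policy gradients, the sketch below shows the kind of loss a policy-gradient generator update usually minimizes. It is an assumption-laden illustration, not the body of train_generator_PG and not a PARL API: policy_gradient_loss is a hypothetical helper, written in pure Python, and averaging over the sequence length is one common choice among several.

```python
import math


def policy_gradient_loss(action_probs, rewards):
    """REINFORCE-style loss: -(1/T) * sum_t log(p_t) * reward_t.

    action_probs: probability the generator assigned to each chosen word.
    rewards: per-step rewards, e.g. from the discriminator via rollouts.
    """
    if len(action_probs) != len(rewards):
        raise ValueError("action_probs and rewards must have the same length")
    if not rewards:
        raise ValueError("at least one step is required")
    total = 0.0
    for prob, reward in zip(action_probs, rewards):
        total += math.log(prob) * reward
    return -total / len(rewards)


# Hypothetical numbers: three chosen words and their rewards.
loss = policy_gradient_loss([0.5, 0.25, 0.8], [0.9, 0.2, 0.6])
```

Minimizing this loss raises the probability of words that received high rewards and has little effect on words whose reward was close to zero.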
- train using poems as the corpus *-Done
- use the PARL framework *-Done
- use built-in functions in PARL to substitute some functions *-ing
- increase training stability *-ing (the gen loss in experiment-log is not used; ignore it)
Most of the code is borrowed from and, but merged into the PARL framework for a better understanding of the RL process in SeqGAN.