RND (Random Network Distillation) with Proximal Policy Optimization (PPO) Tensorflow
This post documents my implementation of the Random Network Distillation (RND) with Proximal Policy Optimization (PPO) algorithm. (continuous version)
This site contains information/documentation for my codes in my Github
repositories & my notes from self-learning various topics.
Mainly contains stuff related to
machine learning,
market microstructure,
CI/CD web development
& stuff with not so trivial details that I will most probably forget
over time.
Checkout my Github.
This post documents my implementation of the Random Network Distillation (RND) with Proximal Policy Optimization (PPO) algorithm. (continuous version)
This post documents my implementation of the Distributed Proximal Policy Optimization (Distributed PPO or DPPO) algorithm. (Distributed continuous version)
This post documents my implementation of the A3C (Asynchronous Advantage Actor Critic) algorithm (Distributed discrete version).
This post documents my implementation of the A3C (Asynchronous Advantage Actor Critic) algorithm. (multi-threaded continuous version)
This post documents my implementation of the A3C (Asynchronous Advantage Actor Critic) algorithm (discrete). (multi-threaded discrete version)
This post demonstrates how to accumulate gradients with Tensorflow.
This post demonstrates a simple usage example of distributed Tensorflow with Python multiprocessing package.
This post documents my implementation of the N-step Q-values estimation algorithm.
This post demonstrates how to use the Python’s multiprocessing package to achieve parallel data generation.
This post provides a simple usage examples for common Numpy array manipulation.