Abstract
We propose an associative learning model that uses reward-modulated spike-time-dependent plasticity in a reinforcement learning paradigm. The learning task is to associate a stimulus pair, known as the predictor-choice pair, with a target response. Our model uses a generic neural network architecture and makes minimal assumptions about the network dynamics. We demonstrate that stimulus-stimulus-response association can be implemented in a stochastic way within a noisy setting. The network has rich dynamics resulting from its recurrent connectivity and background activity. The algorithm can learn temporal sequence detection and solve the temporal XOR problem.