The main problem is that, as in many other fields, DNNs can be hard to train. Here, one problem is the correlation of the input data: if you think about a video game (they actually use those to test their algorithms), you can imagine that screenshots taken one step after another are highly correlated, because the game evolves "continuously". For NNs that can be a problem: doing many iterations of gradient descent on similar, correlated inputs may lead the network to overfit them and/or get stuck in a local minimum. This is why they use experience replay: they store a series of "snapshots" of the game in a buffer, then draw random samples from it to train on some steps later. In this way, the training data is much less correlated. They also notice that during training the Q values (predicted by the NN) change the ongoing policy, which can make the agent prefer only a small set of actions and so store data that is correlated for the same reasons as before. This is why they delay training and update Q only periodically: it ensures that the agent can explore the game and train on shuffled, largely uncorrelated data.
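To make the two ideas concrete, here is a minimal sketch in plain Python, not the authors' actual implementation. The names `ReplayBuffer`, `sync_target`, `capacity` and `sync_every` are all hypothetical, the "network" is just a dictionary of weights, and the gradient step is a stand-in; the point is only to show random sampling from a buffer and a periodically refreshed copy of the Q parameters.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest transition when full.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement: consecutive game steps
        # are unlikely to end up next to each other in a batch.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def sync_target(step, online_params, target_params, sync_every):
    """Copy the online weights into the target copy every `sync_every` steps."""
    if step % sync_every == 0:
        target_params.update(online_params)
        return True
    return False


random.seed(0)
buffer = ReplayBuffer(capacity=1000)
online = {"w": 0.0}    # assumption: stands in for the trained Q network
target = dict(online)  # assumption: frozen copy used to compute Q targets

for step in range(1, 201):
    # Placeholder transition; a real agent would store frames and rewards.
    buffer.push(state=step, action=0, reward=0.0, next_state=step + 1, done=False)
    if len(buffer) >= 32:
        batch = buffer.sample(32)  # shuffled minibatch, not the last 32 steps
        online["w"] += 0.01        # stand-in for one gradient descent step
    sync_target(step, online, target, sync_every=50)
```

Between two syncs the `target` copy stays fixed while `online` keeps changing, which is the "update Q periodically" part; the buffer handles the "shuffle and train later" part.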