1.Delayed, sparse reward(feedback), Long-term planning Hierarchical Deep Reinforcement Learning, Sub-goal, SAMDP, optoins, Thompson sampling, Boltzman exploration, Improving Exploration 2.Partial observability, Imperfect-Information Memory, Nash equi…