The key mechanism of transformer-based models is cross-attention, which implicitly forms a graph over tokens and acts as a diffusion operator, propagating information through that graph for question answering that requires some reasoning ove…
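To make the graph-and-diffusion reading of attention concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the method of any particular model: the function names (`attention_weights`, `propagate`), the toy token vectors, and the use of the same vectors as both queries and keys are all hypothetical choices made for brevity. The softmax-normalized score matrix is treated as a weighted adjacency matrix over tokens, and applying it repeatedly to a signal spreads that signal across the graph.

```python
import math

def softmax(row):
    # Numerically stable softmax over one row of scores.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(queries, keys):
    # Scaled dot-product scores, normalized per query.
    # Entry [i][j] can be read as the edge weight from token i to token j.
    scale = math.sqrt(len(keys[0]))
    scores = [
        [sum(q * k for q, k in zip(qi, kj)) / scale for kj in keys]
        for qi in queries
    ]
    return [softmax(r) for r in scores]

def propagate(weights, values, steps=1):
    # Each step replaces every token's value with a weighted average of
    # its neighbours' values, i.e. one diffusion step on the token graph.
    dim = len(values[0])
    for _ in range(steps):
        values = [
            [sum(w * v[d] for w, v in zip(row, values)) for d in range(dim)]
            for row in weights
        ]
    return values

def spread(values):
    # Gap between the largest and smallest value in the first channel.
    column = [v[0] for v in values]
    return max(column) - min(column)

# Toy example (assumed data): three 2-d token vectors used as queries and keys.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
A = attention_weights(tokens, tokens)

# A signal that starts on the first token only.
signal = [[1.0], [0.0], [0.0]]
after_one = propagate(A, signal, steps=1)
after_many = propagate(A, signal, steps=20)
```

Because every row of `A` sums to one and all entries are positive, each step is an averaging operation: `spread(after_one)` is smaller than `spread(signal)`, and after many steps the values are nearly equal across tokens. Real models interleave learned projections, residual connections, and multiple heads, so this only captures the propagation intuition.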
Unsupervised Learning of Video Representations using LSTMs. Note: these are learning notes on a new LSTM architecture used for unsupervised learning of video representations. (For more topics related to unsupervised learning, you can refer to: Learnin…
Unsupervised Learning of Visual Representations using Videos. Note: this is a learning note on Prof. Gupta's novel work published at ICCV 2015. It's really exciting to see how unsupervised learning methods can contribute to learning visual representatio…