What I Read: Transformer Networks to Answer Questions About Images

https://medium.com/dataseries/microsoft-uses-transformer-networks-to-answer-questions-about-images-with-minimum-training-f978c018bb72

Microsoft Uses Transformer Networks to Answer Questions About Images With Minimum Training
Unified VLP can understand concepts about scenic images by using pretrained models.
Jesus Rodriguez
Jan 12

“Can we build deep learning models that unifies machine capabilities to perform well on both vision-language generation tasks and understanding tasks?”