Transformer models can also perform tasks on several modalities combined, such as table question answering, optical character recognition, information extraction from scanned documents, video classification, and visual question answering.
🤗 Transformers provides thousands of pretrained models to perform tasks on different modalities such as text, vision, and audio. State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow