[Articles] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Introducing BERT, bidirectional pre-training with Transformers.
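
The core pre-training idea is the masked language model: a fraction of input tokens is hidden and the model must recover them from context on both sides. Below is a minimal, hypothetical sketch of that input corruption step (the paper's 80/10/10 replacement details are omitted; `mask_tokens` and the toy sentence are illustrative only).

```python
import random

def mask_tokens(tokens, mask_token="[MASK]", mask_prob=0.15, seed=0):
    """Sketch of BERT-style masked-LM corruption: hide ~15% of tokens;
    the model is trained to predict the originals from both directions."""
    rng = random.Random(seed)
    masked, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(mask_token)
            targets.append(tok)      # token the model must predict
        else:
            masked.append(tok)
            targets.append(None)     # no prediction at this position
    return masked, targets

print(mask_tokens("the cat sat on the mat".split()))
```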

[Articles] Attention is All You Need

The foundational paper for modern NLP models, replacing recurrence (RNNs) with a self-attention mechanism.
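
At the heart of the paper is scaled dot-product attention, softmax(QK^T / sqrt(d_k))V. Here is a minimal NumPy sketch of that formula; the function name and toy shapes are illustrative, not from the paper.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, the attention function from the paper."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                           # query-key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ V                                        # weighted sum of values

# toy self-attention: 4 tokens, dimension 8, with Q = K = V = x
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
print(scaled_dot_product_attention(x, x, x).shape)            # (4, 8)
```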