Top AI Research Papers

Last updated: 21 Feb 2026

Foundational papers

Attention Is All You Need (2017)

Authors: Vaswani et al.

Takeaway: Introduces the transformer architecture, replacing recurrence and convolution with multi-head self-attention so models can capture long-range dependencies efficiently. Forms the basis of most modern large language models.
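
Sketch: the core operation behind the self-attention mentioned above is scaled dot-product attention, softmax(QK^T / sqrt(d_k))V. The following is a minimal single-head version in plain Python, assuming toy 2-dimensional token vectors and no learned projection matrices (both are simplifications for illustration, not the paper's full multi-head layer).

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Each output is a weighted average of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs

# Toy self-attention: queries, keys, and values all come from the same tokens.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = attention(tokens, tokens, tokens)
```

Every token attends to every other token in one step, which is why distance in the sequence does not limit what a position can read from.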

Further reading: Illustrated Transformer, Distill attention explainer
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (2018)

Authors: Devlin et al.

Takeaway: Shows how masked language modeling and next-sentence prediction on large corpora create powerful bidirectional text representations that can be fine-tuned for many downstream NLP tasks.
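
Sketch: the masked language modeling objective corrupts a fraction of input tokens and asks the model to recover them. The paper selects 15% of positions and, of those, replaces 80% with [MASK], 10% with a random token, and leaves 10% unchanged. The function below is a simplified illustration that selects each token independently with probability 0.15; the function name, the tiny vocabulary, and the per-token sampling are assumptions of this sketch.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, vocab, rng, select_prob=0.15):
    """Return (corrupted_tokens, labels); a label is None where no prediction is needed."""
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < select_prob:
            labels.append(tok)  # the model must predict the original token here
            roll = rng.random()
            if roll < 0.8:
                corrupted.append(MASK)               # 80%: replace with the mask token
            elif roll < 0.9:
                corrupted.append(rng.choice(vocab))  # 10%: replace with a random token
            else:
                corrupted.append(tok)                # 10%: keep the token unchanged
        else:
            corrupted.append(tok)
            labels.append(None)
    return corrupted, labels

corrupted, labels = mask_tokens("the cat sat on the mat".split(), ["dog", "ran"], random.Random(0))
```

Because the model cannot tell which unmasked tokens were swapped or kept, it has to build a contextual representation of every position.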

Further reading: Google AI blog, Illustrated BERT
Language Models are Few-Shot Learners (2020)

Authors: Brown et al. (OpenAI)

Takeaway: Demonstrates that a large autoregressive transformer (GPT-3, 175B parameters) can perform many tasks from a natural language prompt and a handful of examples, with no gradient updates, and that few-shot performance improves as models scale.

Further reading: OpenAI overview, Visual guide to GPT-3
Deep Residual Learning for Image Recognition (2015)

Authors: He et al.

Takeaway: Introduces residual (skip) connections, enabling very deep networks to train effectively by learning residual functions. ResNets become the standard backbone for many computer vision models.
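
Sketch: a residual block computes y = x + F(x), so the layers only have to learn the residual F. The toy below uses plain lists and hypothetical stand-in functions for F rather than real convolutional layers.

```python
def residual_block(x, residual_fn):
    """Return x + F(x), where F is the learned residual function."""
    fx = residual_fn(x)
    return [xi + fi for xi, fi in zip(x, fx)]

def zero_fn(x):
    # Hypothetical stand-in for a block whose weights are all zero.
    return [0.0 for _ in x]

x = [1.0, -2.0, 0.5]
y = x
for _ in range(50):
    # Stacking 50 blocks that output zero leaves the input unchanged.
    y = residual_block(y, zero_fn)
```

When F outputs zero the block is the identity, so extra depth can in principle do no harm, and the skip path gives gradients a direct route to earlier layers.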

Further reading: ResNet walkthrough, Practical ResNet guide
Denoising Diffusion Probabilistic Models (2020)

Authors: Ho et al.

Takeaway: Shows that diffusion probabilistic models, trained to reverse a gradual noising process, can generate high-fidelity images by iteratively denoising samples that start as pure noise, inspiring modern diffusion-based generators.
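
Sketch: the forward (noising) half of a diffusion model has a closed form, x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, where abar_t is the running product of (1 - beta_s). The code below uses the linear schedule reported in the paper (T = 1000, beta from 1e-4 to 0.02); the function names and the two-element toy "image" are assumptions of this sketch, and the learned denoising network is not shown.

```python
import math
import random

def linear_betas(T=1000, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T."""
    step = (beta_end - beta_start) / (T - 1)
    return [beta_start + i * step for i in range(T)]

def alpha_bars(betas):
    """Cumulative products of (1 - beta_t)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def q_sample(x0, t, abar, rng):
    """Sample x_t from q(x_t | x_0) in closed form; also return the noise used."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    a = math.sqrt(abar[t])
    s = math.sqrt(1.0 - abar[t])
    return [a * x + s * e for x, e in zip(x0, eps)], eps

abar = alpha_bars(linear_betas())
x_t, eps = q_sample([1.0, -1.0], 500, abar, random.Random(0))
```

Training then amounts to asking a network to predict eps from x_t and t; by the final step abar is close to zero, so x_T is almost pure Gaussian noise.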

Further reading: Diffusion models overview, Annotated diffusion implementation
Playing Atari with Deep Reinforcement Learning (2013)

Authors: Mnih et al. (DeepMind)

Takeaway: Combines convolutional networks with Q-learning to learn control policies directly from pixels. On the seven Atari 2600 games tested, it outperforms previous approaches on six and surpasses a human expert on three, kickstarting deep reinforcement learning.
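
Sketch: the Q-learning part of the method trains toward the target r + gamma * max_a' Q(s', a'). The paper fits a convolutional network to this target using experience replay; the snippet below is only a tabular stand-in with scalar values, and the function names and default hyperparameters are assumptions of this sketch.

```python
def td_target(reward, next_q_values, done, gamma=0.99):
    """Q-learning target: r if the episode ended, else r + gamma * max_a' Q(s', a')."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)

def q_update(q_value, target, lr=0.1):
    """Move Q(s, a) a step toward the target (stand-in for one gradient step)."""
    return q_value + lr * (target - q_value)

target = td_target(reward=1.0, next_q_values=[0.5, 2.0], done=False, gamma=0.9)
new_q = q_update(q_value=0.0, target=target, lr=0.5)
```

In the deep version, the squared difference between the network's Q(s, a) and this target is the loss that gets minimized.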

Further reading: DeepMind blog, RL overview
Auto-Encoding Variational Bayes (2013)

Authors: Kingma & Welling

Takeaway: Introduces variational autoencoders (VAEs), combining neural networks with variational inference to learn latent variable generative models, including the reparameterization trick and ELBO objective.
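
Sketch: the reparameterization trick writes a latent sample as z = mu + sigma * eps with eps drawn from N(0, 1), so gradients can flow through mu and sigma. The ELBO is then a reconstruction term minus the KL divergence between the encoder's Gaussian and the standard normal prior, which has a closed form. The code assumes a diagonal Gaussian encoder parameterized by mean and log-variance; the function names are illustrative.

```python
import math
import random

def reparameterize(mu, log_var, rng):
    """z = mu + sigma * eps with eps ~ N(0, 1); randomness is isolated in eps."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, sigma^2) || N(0, 1) ), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))

z = reparameterize([0.0, 1.0], [0.0, -2.0], random.Random(0))
kl = kl_to_standard_normal([0.0, 1.0], [0.0, -2.0])
```

The KL term is zero exactly when the encoder outputs the prior (mu = 0, log-variance = 0) and grows as the posterior moves away from it.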

Further reading: VAE follow-up tutorial, VAE explainer
Adam: A Method for Stochastic Optimization (2014)

Authors: Kingma & Ba

Takeaway: Proposes the Adam optimizer, which adapts per-parameter learning rates from first and second moment estimates of gradients, becoming a default choice for many deep learning applications.
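
Sketch: Adam keeps exponential moving averages of the gradient (first moment) and squared gradient (second moment), corrects their startup bias, and scales each step by the ratio of the two. The scalar version below minimizes a toy quadratic; the objective, step count, and the learning rate of 0.01 are assumptions chosen for the demo (the paper's suggested default learning rate is 0.001).

```python
import math

def adam_minimize(grad_fn, theta, steps=2000, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """Run Adam on a single scalar parameter and return the final value."""
    m, v = 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        m = beta1 * m + (1 - beta1) * g        # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g    # second-moment estimate
        m_hat = m / (1 - beta1 ** t)           # bias correction
        v_hat = v / (1 - beta2 ** t)
        theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta

# Gradient of f(x) = (x - 3)^2, whose minimum is at x = 3.
theta = adam_minimize(lambda x: 2.0 * (x - 3.0), theta=0.0)
```

Note that the very first step has magnitude close to lr regardless of how large the gradient is, which is part of why Adam is less sensitive to gradient scale than plain SGD.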

Further reading: Optimizing gradient descent, Distill on optimization
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift (2015)

Authors: Ioffe & Szegedy

Takeaway: Introduces batch normalization to stabilize and speed up training by normalizing intermediate activations, enabling higher learning rates and acting as a regularizer in deep networks.
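
Sketch: at training time, batch normalization subtracts the mini-batch mean, divides by the mini-batch standard deviation, then applies a learned scale (gamma) and shift (beta). The function below handles a single feature over a small batch; it omits the running statistics used at inference, and the default values are assumptions of this sketch.

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature across a mini-batch, then scale by gamma and shift by beta."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n  # biased variance over the batch
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]

normalized = batch_norm([1.0, 2.0, 3.0, 4.0])
```

With gamma = 1 and beta = 0 the output has approximately zero mean and unit variance, whatever the scale of the incoming activations.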

Further reading: Batch norm tutorial, Intuition and practice
A Simple Framework for Contrastive Learning of Visual Representations (SimCLR) (2020)

Authors: Chen et al.

Takeaway: Shows that strong data augmentation, large batch sizes, and a contrastive loss can yield self-supervised visual representations competitive with supervised pre-training.
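
Sketch: the contrastive loss used here (NT-Xent) treats two augmented views of the same image as a positive pair and every other embedding in the batch as a negative. The plain-Python version below assumes embeddings are ordered so that indices 2k and 2k+1 are views of the same image; that layout, the function names, and the temperature of 0.5 are assumptions of this sketch.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def nt_xent(embeddings, temperature=0.5):
    """Mean NT-Xent loss; embeddings[2k] and embeddings[2k+1] form a positive pair."""
    n = len(embeddings)
    total = 0.0
    for i in range(n):
        j = i ^ 1  # index of the positive partner (0<->1, 2<->3, ...)
        denom = sum(math.exp(cosine(embeddings[i], embeddings[k]) / temperature)
                    for k in range(n) if k != i)
        pos = math.exp(cosine(embeddings[i], embeddings[j]) / temperature)
        total += -math.log(pos / denom)
    return total / n

aligned = nt_xent([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
mismatched = nt_xent([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
```

The loss is lower when paired views land close together and unrelated images land far apart; larger batches supply more negatives in the denominator.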

Further reading: Contrastive learning overview, Illustrated SimCLR

Applied to networking and security

Kitsune: An Ensemble of Autoencoders for Online Network Intrusion Detection (2018)

Authors: Mirsky et al.

Takeaway: Proposes a lightweight online anomaly-based IDS using an ensemble of autoencoders to model normal traffic and detect a wide range of attacks without labeled data.

Further reading: NDSS 2018 summary, Reference implementation
Deep Packet: A Novel Approach for Encrypted Traffic Classification Using Deep Learning (2017)

Authors: Lotfollahi et al.

Takeaway: Shows that deep networks can classify encrypted traffic directly from raw packet bytes, highlighting both the power and privacy implications of deep learning-based traffic classification.

Further reading: Article overview, Code and dataset
RouteNet: Leveraging Graph Neural Networks for Network Modeling and Optimization (2019)

Authors: Rusek et al.

Takeaway: Uses graph neural networks to learn performance models of communication networks, predicting per-path delay and loss for different routing schemes and enabling ML-driven traffic engineering.

Further reading: DeepMind blog, RouteNet repository
AuTO: Scaling Deep Reinforcement Learning for Datacenter-Scale Automatic Traffic Optimization (2018)

Authors: Chen et al.

Takeaway: Applies deep reinforcement learning to datacenter traffic optimization, splitting decisions between a central agent for long flows and fast local handling of short flows so that decision latency keeps up with datacenter traffic. Reports that the learned policies outperform hand-tuned heuristics in testbed experiments.

Further reading: Paper PDF, SIGCOMM commentary
Machine Learning for Networking: Workflow, Advances and Opportunities (2018)

Authors: Wang et al.

Takeaway: Surveys how ML is applied across network management tasks (traffic prediction, routing, anomaly detection) and lays out a practical workflow from data collection to deployment.

Further reading: ArXiv preprint, APNIC blog
Deep Learning for Cyber Security: A Survey (2019)

Authors: Yuan et al.

Takeaway: Comprehensive survey of deep learning for malware detection, intrusion detection, spam filtering, and more, including discussion of adversarial examples and data challenges.

Further reading: Cybersecurity & DL, Adversarial examples talk
A Survey of Network Traffic Classification Using Machine Learning (2013)

Authors: Zhang et al.

Takeaway: Reviews ML techniques for classifying network traffic by application and behavior, comparing flow-based features, algorithms, and evaluation challenges.

Further reading: Related ACM article, APNIC blog
Data Mining Approaches for Intrusion Detection (1998)

Authors: Lee & Stolfo

Takeaway: Early work applying data mining and machine learning to intrusion detection, establishing many foundational ideas in feature-based and anomaly-based IDS.

Further reading: IDS survey, NIST intrusion detection guide
LSTM-based Intrusion Detection System for In-Vehicle CAN Bus Communications (2016)

Authors: Cho & Shin

Takeaway: Uses recurrent neural networks to model normal sequences of CAN bus messages and detect deviations as potential intrusions, showcasing ML for automotive/embedded security.

Further reading: Black Hat talk, Blog explanation
Hunting for Malicious TLS Flows: Machine Learning for Encrypted Malware Traffic (2016)

Authors: Anderson & McGrew (Cisco)

Takeaway: Shows that statistical features of TLS connections combined with ML can detect malware communications even when payloads are encrypted, influencing modern encrypted traffic analytics.

Further reading: Cisco blog, USENIX Security talk

Reading list

Deep Learning (2015)

Authors: LeCun, Bengio & Hinton

Takeaway: High-level overview of deep learning principles, architectures, and historical context across vision, speech, and language.

Further reading: Deep Learning book, Talk recording
Hidden Technical Debt in Machine Learning Systems (2015)

Authors: Sculley et al.

Takeaway: Argues that most complexity and risk in ML systems live in data dependencies and glue code rather than models, introducing a vocabulary for ML technical debt.

Further reading: Google research page, Follow-up discussion
Concrete Problems in AI Safety (2016)

Authors: Amodei et al.

Takeaway: Frames AI safety as a set of practical research problems (avoiding negative side effects, reward hacking, scalable oversight, safe exploration, and robustness to distributional shift) and suggests concrete experiments for each.

Further reading: OpenAI overview, Problem profile
Deep Neural Networks for YouTube Recommendations (2016)

Authors: Covington et al.

Takeaway: Describes YouTube's large-scale two-stage recommendation architecture (candidate generation + ranking) and how deep learning shapes industrial recommender systems.

Further reading: Google AI blog, Technical walkthrough
Wide & Deep Learning for Recommender Systems (2016)

Authors: Cheng et al. (Google)

Takeaway: Proposes the wide-and-deep architecture combining memorization (feature crosses) with generalization (deep nets), influential for tabular and recommendation models.

Further reading: Google AI blog, Implementation guide
CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning (2017)

Authors: Rajpurkar et al.

Takeaway: Applies deep CNNs to chest X-rays and reaches radiologist-level performance on pneumonia detection, showing both promise and caveats of clinical AI.

Further reading: Stanford AI blog, Clinical follow-up
U-Net: Convolutional Networks for Biomedical Image Segmentation (2015)

Authors: Ronneberger et al.

Takeaway: Introduces U-Net, an encoder–decoder CNN with skip connections tailored for medical image segmentation that becomes a de facto standard for segmentation tasks.

Further reading: Project page, Architecture overview
A Survey on Deep Learning in Medical Image Analysis (2017)

Authors: Litjens et al.

Takeaway: Survey of deep learning across radiology, pathology, and other imaging modalities, mapping tasks, architectures, and open challenges in medical imaging AI.

Further reading: Journal version, NVIDIA blog
End to End Learning for Self-Driving Cars (2016)

Authors: Bojarski et al. (NVIDIA)

Takeaway: Trains a CNN to map front-facing camera images directly to steering commands, illustrating the appeal and brittleness of end-to-end control for autonomous driving.

Further reading: NVIDIA dev blog, Demo video
End-to-End Training of Deep Visuomotor Policies (2016)

Authors: Levine et al.

Takeaway: Uses guided policy search to train deep networks that map images directly to robot motor torques, bridging perception and control for robotic manipulation.

Further reading: BAIR blog, Talk/demo
A Comprehensive Survey on Graph Neural Networks (2020)

Authors: Wu et al.

Takeaway: Surveys GNN architectures, training methods, and applications across recommendation, chemistry, and traffic forecasting, providing a starting point for graph-based deep learning.

Further reading: GNN introduction, PyG tutorials
Scaling Laws for Neural Language Models (2020)

Authors: Kaplan et al.

Takeaway: Empirically shows that loss scales as a power-law with model size, dataset size, and compute, giving a quantitative framework for planning LLM training budgets.
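
Sketch: a power law y = a * x^(-b) is a straight line in log-log space, so its exponent can be estimated with ordinary least squares on the logs. The data points below are synthetic and the constants (a = 10, b = 0.08) are made up for illustration; they are not values from the paper.

```python
import math

def fit_power_law(xs, ys):
    """Least-squares fit of log y = log a - b * log x; returns (a, b) for y = a * x**(-b)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(xs)
    mx, my = sum(lx) / n, sum(ly) / n
    slope = (sum((u - mx) * (v - my) for u, v in zip(lx, ly))
             / sum((u - mx) ** 2 for u in lx))
    return math.exp(my - slope * mx), -slope

# Synthetic points that follow loss = 10 * N**(-0.08) exactly.
sizes = [1e6, 1e7, 1e8, 1e9]
losses = [10.0 * n ** -0.08 for n in sizes]
a, b = fit_power_law(sizes, losses)
```

Once a and b are fitted on small runs, the same formula extrapolates the expected loss at a larger model size, which is how such fits feed into budget planning.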

Further reading: OpenAI article, Blog explainer
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (2022)

Authors: Wei et al.

Takeaway: Shows that prompting LLMs to generate intermediate reasoning steps improves performance on arithmetic, commonsense, and symbolic tasks, underscoring the power of prompt design.

Further reading: Google AI blog, Talk recording
Training Language Models to Follow Instructions with Human Feedback (2022)

Authors: Ouyang et al. (OpenAI)

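Takeaway: Fine-tunes language models on human demonstrations and then with reinforcement learning from human feedback (RLHF) to align them with user intent. Human raters preferred outputs from a 1.3B-parameter InstructGPT model over those of the 175B-parameter GPT-3, showing that alignment can improve helpfulness without increasing model size.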

Further reading: OpenAI article, Alignment discussion
TFX: A TensorFlow-Based Production-Scale Machine Learning Platform (2017)

Authors: Baylor et al.

Takeaway: Describes Google's end-to-end platform for deploying, monitoring, and maintaining ML pipelines at scale, with patterns for data validation, serving, and continual training.

Further reading: Google AI blog, TFX docs
The ML Test Score: A Rubric for ML Production Readiness (2017)

Authors: Breck et al.

Takeaway: Proposes a checklist-style rubric for evaluating ML production readiness across data, model, infrastructure, and monitoring, useful for MLOps reviews.

Further reading: Google research page, Rules of ML
Neural Machine Translation by Jointly Learning to Align and Translate (2014)

Authors: Bahdanau, Cho & Bengio

Takeaway: Introduces attention mechanisms in sequence-to-sequence models for machine translation, a key conceptual bridge from RNNs to transformers.

Further reading: Augmented RNNs, Visualizing seq2seq with attention
Listen, Attend and Spell (2015)

Authors: Chan et al.

Takeaway: Applies attention-based encoder–decoder models to end-to-end speech recognition, replacing traditional ASR pipelines with sequence-to-sequence models.

Further reading: DeepMind blog, CTC vs seq2seq
Deep Reinforcement Learning: An Overview (2017)

Authors: Li

Takeaway: Tutorial-style overview of deep reinforcement learning, covering value-based, policy-based, and actor–critic methods with clear conceptual framing.

Further reading: RL overview, Spinning Up in Deep RL
Deep Learning for Finance: Deep Portfolios (2017)

Authors: Heaton, Polson & Witte

Takeaway: Explores using deep learning to model asset returns and construct portfolios, framing portfolio selection as a supervised learning problem and discussing opportunities and pitfalls.

Further reading: Related work, QuantStart article
