Top AI Research Papers
Last updated: 21 Feb 2026
Foundational papers
-
Attention Is All You Need (2017)
Authors: Vaswani et al.
Takeaway: Introduces the transformer architecture, replacing recurrence and convolution with multi-head self-attention so models can capture long-range dependencies efficiently. Forms the basis of most modern large language models.
Further reading: Illustrated Transformer, Distill attention explainer
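Sketch: the core operation behind self-attention can be illustrated in a few lines of plain Python. This is a minimal single-head version under simplifying assumptions: it omits the learned query/key/value projections and the multiple heads the paper uses, and the function names and toy token vectors are illustrative only.

```python
import math

def softmax(row):
    # Subtract the max before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(queries, keys, values):
    """Return one output vector per query: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
    return outputs

# Toy self-attention: queries, keys, and values are the same three token vectors.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = scaled_dot_product_attention(tokens, tokens, tokens)
```

Every token attends to every other token in a single step, which is how the architecture relates distant positions without recurrence.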
-
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (2018)
Authors: Devlin et al.
Takeaway: Shows how masked language modeling and next-sentence prediction on large corpora create powerful bidirectional text representations that can be fine-tuned for many downstream NLP tasks.
Further reading: Google AI blog, Illustrated BERT
-
Language Models are Few-Shot Learners (2020)
Authors: Brown et al. (OpenAI)
Takeaway: Demonstrates that a large autoregressive transformer (GPT-3) can perform many tasks from a natural language prompt containing only a few examples, with no gradient updates, and that few-shot performance improves as models scale.
Further reading: OpenAI overview, Visual guide to GPT-3
-
Deep Residual Learning for Image Recognition (2015)
Authors: He et al.
Takeaway: Introduces residual (skip) connections, enabling very deep networks to train effectively by learning residual functions. ResNets become the standard backbone for many computer vision models.
Further reading: ResNet walkthrough, Practical ResNet guide
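Sketch: a residual block computes F(x) and adds the input back, so the layers only need to learn a correction to the identity. The version below is a simplified illustration that uses small dense layers instead of the convolutions and batch normalization in the paper; all names and weights are hypothetical.

```python
def relu(x):
    return [max(0.0, v) for v in x]

def linear(x, weights, bias):
    # weights is a list of rows; each row produces one output unit.
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def residual_block(x, w1, b1, w2, b2):
    """Compute relu(x + F(x)), where F is two linear layers with a ReLU between."""
    hidden = relu(linear(x, w1, b1))
    residual = linear(hidden, w2, b2)
    # The skip connection: add the block's input to its output.
    return relu([xi + ri for xi, ri in zip(x, residual)])

# With all-zero weights F(x) = 0, so the block passes non-negative inputs through unchanged.
zeros_w = [[0.0, 0.0], [0.0, 0.0]]
zeros_b = [0.0, 0.0]
x = [1.5, 2.0]
y = residual_block(x, zeros_w, zeros_b, zeros_w, zeros_b)
```

The zero-weight case shows why depth is cheap here: a block that has learned nothing still behaves like the identity instead of degrading the signal.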
-
Denoising Diffusion Probabilistic Models (2020)
Authors: Ho et al.
Takeaway: Shows that diffusion probabilistic models, which learn to reverse a gradual noising process and generate samples by iteratively denoising from pure noise, can achieve high-fidelity image generation, inspiring modern diffusion-based generators.
Further reading: Diffusion models overview, Annotated diffusion implementation
-
Playing Atari with Deep Reinforcement Learning (2013)
Authors: Mnih et al. (DeepMind)
Takeaway: Combines convolutional networks with Q-learning to learn control policies directly from pixels, outperforming previous approaches on six of the seven Atari games tested and surpassing a human expert on three, and kickstarting deep reinforcement learning.
Further reading: DeepMind blog, RL overview
-
Auto-Encoding Variational Bayes (2013)
Authors: Kingma & Welling
Takeaway: Introduces variational autoencoders (VAEs), combining neural networks with variational inference to learn latent variable generative models, including the reparameterization trick and ELBO objective.
Further reading: VAE follow-up tutorial, VAE explainer
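Sketch: the reparameterization trick and the KL term of the ELBO for a diagonal Gaussian posterior and a standard normal prior can be written out directly. This is an illustrative fragment only (no encoder, decoder, or training loop); the function names and example values are assumptions.

```python
import math
import random

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1).

    The randomness lives in eps, so z is a deterministic function of
    (mu, log_var) and gradients can flow through both.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))

rng = random.Random(0)
mu, log_var = [0.5, -1.0], [0.0, -2.0]
z = reparameterize(mu, log_var, rng)
kl = kl_to_standard_normal(mu, log_var)
```

The ELBO is the expected reconstruction log-likelihood minus this KL term; the reconstruction part depends on the decoder and is left out here.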
-
Adam: A Method for Stochastic Optimization (2014)
Authors: Kingma & Ba
Takeaway: Proposes the Adam optimizer, which adapts per-parameter learning rates from first and second moment estimates of gradients, becoming a default choice for many deep learning applications.
Further reading: Optimizing gradient descent, Distill on optimization
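Sketch: the Adam update keeps exponential moving averages of the gradient and its square, corrects their startup bias, and scales each parameter's step by the ratio. The code below follows that description on a hypothetical two-parameter quadratic; the objective and the step size of 0.01 are assumptions chosen so the toy problem converges quickly (the paper suggests 0.001 as a default).

```python
import math

def adam_minimize(grad_fn, theta, steps=2000, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    m = [0.0] * len(theta)  # first-moment (mean) estimate per parameter
    v = [0.0] * len(theta)  # second-moment (uncentered variance) estimate per parameter
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        for i in range(len(theta)):
            m[i] = beta1 * m[i] + (1 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1 - beta2) * g[i] ** 2
            # Bias correction: both averages start at zero and are too small early on.
            m_hat = m[i] / (1 - beta1 ** t)
            v_hat = v[i] / (1 - beta2 ** t)
            theta[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta

def grad(theta):
    # Gradient of the toy objective (a - 3)^2 + 10 * (b + 1)^2, minimized at (3, -1).
    return [2 * (theta[0] - 3.0), 20 * (theta[1] + 1.0)]

solution = adam_minimize(grad, [0.0, 0.0])
```

The two coordinates have gradients that differ by a factor of ten, yet their early steps are about the same size, which is the per-parameter adaptation the takeaway refers to.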
-
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift (2015)
Authors: Ioffe & Szegedy
Takeaway: Introduces batch normalization to stabilize and speed up training by normalizing intermediate activations, enabling higher learning rates and acting as a regularizer in deep networks.
Further reading: Batch norm tutorial, Intuition and practice
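Sketch: in training mode, batch normalization standardizes each feature over the mini-batch and then applies a learned scale and shift. The fragment below illustrates that forward pass only; it leaves out the running statistics used at inference time, and the names and sample activations are hypothetical.

```python
import math

def batch_norm(batch, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch, then scale by gamma and shift by beta."""
    n = len(batch)
    num_features = len(batch[0])
    means = [sum(row[j] for row in batch) / n for j in range(num_features)]
    variances = [sum((row[j] - means[j]) ** 2 for row in batch) / n
                 for j in range(num_features)]
    return [
        [gamma[j] * (row[j] - means[j]) / math.sqrt(variances[j] + eps) + beta[j]
         for j in range(num_features)]
        for row in batch
    ]

# Two features on very different scales end up with matching statistics.
activations = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
normalized = batch_norm(activations, gamma=[1.0, 1.0], beta=[0.0, 0.0])
```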
-
A Simple Framework for Contrastive Learning of Visual Representations (SimCLR) (2020)
Authors: Chen et al.
Takeaway: Shows that strong data augmentation, large batch sizes, and a contrastive loss can yield self-supervised visual representations competitive with supervised pre-training.
Further reading: Contrastive learning overview, Illustrated SimCLR
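Sketch: the contrastive objective pulls two augmented views of the same image together and pushes all other views in the batch away. The version below is a plain-Python illustration of a temperature-scaled cross-entropy over cosine similarities; the pairing convention (views 2k and 2k+1 belong together), the temperature of 0.5, and the toy embeddings are assumptions.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def nt_xent(embeddings, temperature=0.5):
    """Average contrastive loss; embeddings[2k] and embeddings[2k+1] are views of one image."""
    n = len(embeddings)
    total = 0.0
    for i in range(n):
        j = i + 1 if i % 2 == 0 else i - 1  # index of the positive partner
        # Similarities to every other view; the positive is one of them.
        logits = [cosine(embeddings[i], embeddings[k]) / temperature
                  for k in range(n) if k != i]
        positive = cosine(embeddings[i], embeddings[j]) / temperature
        total += -positive + math.log(sum(math.exp(l) for l in logits))
    return total / n

aligned = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
mismatched = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
loss_aligned = nt_xent(aligned)
loss_mismatched = nt_xent(mismatched)
```

The loss is lower when paired views share an embedding than when they do not, and a larger batch supplies more negatives to each positive pair.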
Applied to networking and security
-
Kitsune: An Ensemble of Autoencoders for Online Network Intrusion Detection (2018)
Authors: Mirsky et al.
Takeaway: Proposes a lightweight online anomaly-based IDS using an ensemble of autoencoders to model normal traffic and detect a wide range of attacks without labeled data.
Further reading: NDSS 2018 summary, Reference implementation
-
Deep Packet: A Novel Approach for Encrypted Traffic Classification Using Deep Learning (2017)
Authors: Lotfollahi et al.
Takeaway: Shows that deep networks can classify encrypted traffic directly from raw packet bytes, highlighting both the power and privacy implications of deep learning-based traffic classification.
Further reading: Article overview, Code and dataset
-
RouteNet: Leveraging Graph Neural Networks for Network Modeling and Optimization (2019)
Authors: Rusek et al.
Takeaway: Uses graph neural networks to learn performance models of communication networks, predicting per-path delay and loss for different routing schemes and enabling ML-driven traffic engineering.
Further reading: DeepMind blog, RouteNet repository
-
AuTO: Scaling Deep Reinforcement Learning for Datacenter-Scale Automatic Traffic Optimization (2018)
Authors: Chen et al.
Takeaway: Applies deep reinforcement learning to datacenter traffic optimization, showing RL policies can outperform hand-tuned heuristics on a datacenter testbed.
Further reading: Paper PDF, SIGCOMM commentary
-
Machine Learning for Networking: Workflow, Advances and Opportunities (2018)
Authors: Wang et al.
Takeaway: Surveys how ML is applied across network management tasks (traffic prediction, routing, anomaly detection) and lays out a practical workflow from data collection to deployment.
Further reading: ArXiv preprint, APNIC blog
-
Deep Learning for Cyber Security: A Survey (2019)
Authors: Yuan et al.
Takeaway: Comprehensive survey of deep learning for malware detection, intrusion detection, spam filtering, and more, including discussion of adversarial examples and data challenges.
Further reading: Cybersecurity & DL, Adversarial examples talk
-
A Survey of Network Traffic Classification Using Machine Learning (2013)
Authors: Zhang et al.
Takeaway: Reviews ML techniques for classifying network traffic by application and behavior, comparing flow-based features, algorithms, and evaluation challenges.
Further reading: Related ACM article, APNIC blog
-
Data Mining Approaches for Intrusion Detection (1998)
Authors: Lee & Stolfo
Takeaway: Early work applying data mining and machine learning to intrusion detection, establishing many foundational ideas in feature-based and anomaly-based IDS.
Further reading: IDS survey, NIST intrusion detection guide
-
LSTM-based Intrusion Detection System for In-Vehicle CAN Bus Communications (2016)
Authors: Cho & Shin
Takeaway: Uses recurrent neural networks to model normal sequences of CAN bus messages and detect deviations as potential intrusions, showcasing ML for automotive/embedded security.
Further reading: Black Hat talk, Blog explanation
-
Hunting for Malicious TLS Flows: Machine Learning for Encrypted Malware Traffic (2016)
Authors: Anderson & McGrew (Cisco)
Takeaway: Shows that statistical features of TLS connections combined with ML can detect malware communications even when payloads are encrypted, influencing modern encrypted traffic analytics.
Further reading: Cisco blog, USENIX Security talk
Reading list
-
Deep Learning (2015)
Authors: LeCun, Bengio & Hinton
Takeaway: High-level overview of deep learning principles, architectures, and historical context across vision, speech, and language.
Further reading: Deep Learning book, Talk recording
-
Hidden Technical Debt in Machine Learning Systems (2015)
Authors: Sculley et al.
Takeaway: Argues that most complexity and risk in ML systems live in data dependencies and glue code rather than models, introducing a vocabulary for ML technical debt.
Further reading: Google research page, Follow-up discussion
-
Concrete Problems in AI Safety (2016)
Authors: Amodei et al.
Takeaway: Frames AI safety as a set of practical engineering problems (negative side effects, reward hacking, scalable oversight, safe exploration, distributional shift) and suggests concrete research directions and experiments for each.
Further reading: OpenAI overview, Problem profile
-
Deep Neural Networks for YouTube Recommendations (2016)
Authors: Covington et al.
Takeaway: Describes YouTube's large-scale two-stage recommendation architecture (candidate generation + ranking) and how deep learning shapes industrial recommender systems.
Further reading: Google AI blog, Technical walkthrough
-
Wide & Deep Learning for Recommender Systems (2016)
Authors: Cheng et al. (Google)
Takeaway: Proposes the wide-and-deep architecture combining memorization (feature crosses) with generalization (deep nets), influential for tabular and recommendation models.
Further reading: Google AI blog, Implementation guide
-
CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning (2017)
Authors: Rajpurkar et al.
Takeaway: Applies deep CNNs to chest X-rays and reaches radiologist-level performance on pneumonia detection, showing both promise and caveats of clinical AI.
Further reading: Stanford AI blog, Clinical follow-up
-
U-Net: Convolutional Networks for Biomedical Image Segmentation (2015)
Authors: Ronneberger et al.
Takeaway: Introduces U-Net, an encoder–decoder CNN with skip connections tailored for medical image segmentation that becomes a de facto standard for segmentation tasks.
Further reading: Project page, Architecture overview
-
A Survey on Deep Learning in Medical Image Analysis (2017)
Authors: Litjens et al.
Takeaway: Survey of deep learning across radiology, pathology, and other imaging modalities, mapping tasks, architectures, and open challenges in medical imaging AI.
Further reading: Journal version, NVIDIA blog
-
End to End Learning for Self-Driving Cars (2016)
Authors: Bojarski et al. (NVIDIA)
Takeaway: Trains a CNN to map front-facing camera images directly to steering commands, illustrating the appeal and brittleness of end-to-end control for autonomous driving.
Further reading: NVIDIA dev blog, Demo video
-
End-to-End Training of Deep Visuomotor Policies (2016)
Authors: Levine et al.
Takeaway: Uses guided policy search to train deep networks that map images directly to robot motor torques, bridging perception and control for robotic manipulation.
-
A Comprehensive Survey on Graph Neural Networks (2020)
Authors: Wu et al.
Takeaway: Surveys GNN architectures, training methods, and applications across recommendation, chemistry, and traffic forecasting, providing a starting point for graph-based deep learning.
Further reading: GNN introduction, PyG tutorials
-
Scaling Laws for Neural Language Models (2020)
Authors: Kaplan et al.
Takeaway: Empirically shows that test loss follows a power law in model size, dataset size, and training compute, giving a quantitative framework for planning LLM training budgets.
Further reading: OpenAI article, Blog explainer
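Sketch: a power law loss = c * size^(-alpha) is a straight line in log-log space, so its exponent can be estimated with ordinary least squares on the logarithms. The example below uses synthetic data generated from a made-up law (alpha = 0.1, c = 20); these numbers are placeholders, not results from the paper.

```python
import math

def fit_power_law(sizes, losses):
    """Fit loss = c * size**(-alpha) by least squares on (log size, log loss)."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(l) for l in losses]
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))
    intercept = y_mean - slope * x_mean
    return -slope, math.exp(intercept)  # (alpha, c)

# Synthetic measurements from a hypothetical law, for illustration only.
sizes = [1e6, 1e7, 1e8, 1e9]
losses = [20.0 * s ** -0.1 for s in sizes]
alpha, c = fit_power_law(sizes, losses)

# Extrapolate the fitted law to a model ten times larger than any measured.
predicted_loss = c * 1e10 ** -alpha
```

Extrapolating a fit from small runs to a larger budget is the planning use the takeaway describes; real measurements are noisy and the fit only holds within the regime where the power law applies.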
-
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (2022)
Authors: Wei et al.
Takeaway: Shows that prompting LLMs to generate intermediate reasoning steps improves performance on arithmetic, commonsense, and symbolic tasks, underscoring the power of prompt design.
Further reading: Google AI blog, Talk recording
-
Training Language Models to Follow Instructions with Human Feedback (2022)
Authors: Ouyang et al. (OpenAI)
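Takeaway: Fine-tunes language models on human demonstrations and preference rankings (RLHF) to align them with user intent, showing that human raters prefer outputs from a 1.3B-parameter InstructGPT model over those of the 175B-parameter GPT-3, so alignment can improve helpfulness and safety without increasing size.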
Further reading: OpenAI article, Alignment discussion
-
TFX: A TensorFlow-Based Production-Scale Machine Learning Platform (2017)
Authors: Baylor et al.
Takeaway: Describes Google's end-to-end platform for deploying, monitoring, and maintaining ML pipelines at scale, with patterns for data validation, serving, and continual training.
Further reading: Google AI blog, TFX docs
-
The ML Test Score: A Rubric for ML Production Readiness (2017)
Authors: Breck et al.
Takeaway: Proposes a checklist-style rubric for evaluating ML production readiness across data, model, infrastructure, and monitoring, useful for MLOps reviews.
Further reading: Google research page, Rules of ML
-
Neural Machine Translation by Jointly Learning to Align and Translate (2014)
Authors: Bahdanau, Cho & Bengio
Takeaway: Introduces attention mechanisms in sequence-to-sequence models for machine translation, a key conceptual bridge from RNNs to transformers.
Further reading: Augmented RNNs, Visualizing seq2seq with attention
-
Listen, Attend and Spell (2015)
Authors: Chan et al.
Takeaway: Applies attention-based encoder–decoder models to end-to-end speech recognition, replacing traditional ASR pipelines with sequence-to-sequence models.
Further reading: DeepMind blog, CTC vs seq2seq
-
Deep Reinforcement Learning: An Overview (2017)
Authors: Li
Takeaway: Tutorial-style overview of deep reinforcement learning, covering value-based, policy-based, and actor–critic methods with clear conceptual framing.
Further reading: RL overview, Spinning Up in Deep RL
-
Deep Learning for Finance: Deep Portfolios (2017)
Authors: Heaton, Polson & Witte
Takeaway: Explores using deep learning to model asset returns and construct portfolios, framing portfolio selection as a supervised learning problem and discussing opportunities and pitfalls.
Further reading: Related work, QuantStart article