Top AI Research Papers

Last updated: 21 Feb 2026

Foundational papers

  1. Attention Is All You Need (2017)

    Authors: Vaswani et al.

    Takeaway: Introduces the transformer architecture, replacing recurrence and convolution with multi-head self-attention so models can capture long-range dependencies efficiently. Forms the basis of most modern large language models.

    Further reading: Illustrated Transformer, Distill attention explainer
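
    Sketch: A minimal single-head version of the scaled dot-product self-attention described in the takeaway, in plain Python. It leaves out the learned query/key/value projections, masking, and multiple heads; the function names and the toy `tokens` input are illustrative assumptions, not the authors' code.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values):
    """Single-head scaled dot-product attention over lists of vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys
        ]
        weights = softmax(scores)
        # Each output is the attention-weighted average of the value vectors.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values))
             for j in range(len(values[0]))]
        )
    return outputs


# Toy input: three 2-dimensional token vectors attending to each other.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = self_attention(tokens, tokens, tokens)
```

    Every position attends to every other position in a single step, which is why distance in the sequence does not limit what the model can relate.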

  2. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (2018)

    Authors: Devlin et al.

    Takeaway: Shows how masked language modeling and next-sentence prediction on large corpora create powerful bidirectional text representations that can be fine-tuned for many downstream NLP tasks.

    Further reading: Google AI blog, Illustrated BERT
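
    Sketch: One way the masked language modeling input could be prepared, assuming the 15% selection rate and 80/10/10 replacement split reported in the paper. Selecting each token independently is a simplification, and `mask_tokens`, the toy vocabulary, and the seed are hypothetical names chosen for illustration.

```python
import random


def mask_tokens(tokens, vocab, mask_prob=0.15, seed=0):
    """Return (model_inputs, labels) for masked language modeling.

    labels[i] holds the original token where a prediction is required,
    and None everywhere else.
    """
    rng = random.Random(seed)
    inputs, labels = [], []
    for token in tokens:
        if rng.random() < mask_prob:
            labels.append(token)  # the model must recover this token
            roll = rng.random()
            if roll < 0.8:
                inputs.append("[MASK]")           # 80%: mask placeholder
            elif roll < 0.9:
                inputs.append(rng.choice(vocab))  # 10%: random token
            else:
                inputs.append(token)              # 10%: left unchanged
        else:
            labels.append(None)
            inputs.append(token)
    return inputs, labels


vocab = ["the", "cat", "sat", "on", "mat", "dog"]
sentence = ["the", "cat", "sat", "on", "the", "mat"] * 5
masked_inputs, targets = mask_tokens(sentence, vocab)
```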

  3. Language Models are Few-Shot Learners (2020)

    Authors: Brown et al. (OpenAI)

    Takeaway: Demonstrates that a large autoregressive transformer (GPT-3) can perform many tasks from a task description and a few in-context examples in the prompt, with no gradient updates, and that few-shot performance improves with model scale.

    Further reading: OpenAI overview, Visual guide to GPT-3

  4. Deep Residual Learning for Image Recognition (2015)

    Authors: He et al.

    Takeaway: Introduces residual (skip) connections, enabling very deep networks to train effectively by learning residual functions. ResNets become the standard backbone for many computer vision models.

    Further reading: ResNet walkthrough, Practical ResNet guide
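
    Sketch: A residual block reduced to its core idea, output = activation(x + F(x)). The paper builds F from convolutions and batch normalization; the two small dense layers here are a stand-in assumption so the example stays self-contained, and all names are illustrative.

```python
def relu(values):
    return [max(0.0, v) for v in values]


def linear(values, weights, bias):
    """Dense layer: weights is a list of rows, one row per output unit."""
    return [
        sum(w * v for w, v in zip(row, values)) + b
        for row, b in zip(weights, bias)
    ]


def residual_block(x, w1, b1, w2, b2):
    """Compute relu(x + F(x)), where F is two dense layers with a ReLU between."""
    hidden = relu(linear(x, w1, b1))
    residual = linear(hidden, w2, b2)
    # Skip connection: the input is added back to the learned residual.
    return relu([xi + ri for xi, ri in zip(x, residual)])


zeros_2x2 = [[0.0, 0.0], [0.0, 0.0]]
zeros_2 = [0.0, 0.0]
# With all-zero weights the block passes non-negative inputs through unchanged.
passthrough = residual_block([1.0, 2.0], zeros_2x2, zeros_2, zeros_2x2, zeros_2)
```

    Because the identity mapping corresponds to F(x) = 0, adding more blocks does not force the network to relearn what earlier layers already compute.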

  5. Denoising Diffusion Probabilistic Models (2020)

    Authors: Ho et al.

    Takeaway: Shows that diffusion probabilistic models, which learn to reverse a gradual noising process and generate samples by iteratively denoising from pure noise, can achieve high-fidelity image generation, inspiring modern diffusion-based generators.

    Further reading: Diffusion models overview, Annotated diffusion implementation
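
    Sketch: The forward (noising) half of a diffusion model, which the trained network learns to reverse. It uses the closed form x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise. The linear schedule endpoints and step count are assumptions for illustration, and the learned denoiser itself is not shown.

```python
import math
import random


def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Noise variances that grow linearly across the diffusion steps."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def noise_sample(x0, t, alpha_bars, rng):
    """Draw x_t directly from x_0 without simulating intermediate steps."""
    signal = math.sqrt(alpha_bars[t])
    noise = math.sqrt(1.0 - alpha_bars[t])
    return [signal * x + noise * rng.gauss(0.0, 1.0) for x in x0]


alpha_bars = cumulative_alphas(linear_beta_schedule())
rng = random.Random(0)
slightly_noisy = noise_sample([1.0, -1.0], 0, alpha_bars, rng)
almost_pure_noise = noise_sample([1.0, -1.0], 999, alpha_bars, rng)
```

    By the last step almost none of the original signal remains, so generation can start from Gaussian noise and apply the learned reverse steps.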

  6. Playing Atari with Deep Reinforcement Learning (2013)

    Authors: Mnih et al. (DeepMind)

    Takeaway: Combines convolutional networks with Q-learning to learn control policies directly from pixels, outperforming prior approaches on six of the seven Atari 2600 games tested and surpassing a human expert on three, and kickstarting deep reinforcement learning.

    Further reading: DeepMind blog, RL overview
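
    Sketch: The Q-learning update that the paper scales up, shown in tabular form. The paper replaces the table with a convolutional network and adds experience replay; the state names, rewards, and hyperparameters below are hypothetical.

```python
def q_learning_update(q, state, action, reward, next_state, done,
                      alpha=0.1, gamma=0.99):
    """Move Q(state, action) toward reward + gamma * max_a' Q(next_state, a')."""
    best_next = 0.0 if done else max(q[next_state].values())
    target = reward + gamma * best_next
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


# Toy Q-table with two states and two actions each.
q_table = {
    "s0": {"left": 0.0, "right": 0.0},
    "s1": {"left": 1.0, "right": 2.0},
}
updated = q_learning_update(
    q_table, "s0", "right", reward=1.0, next_state="s1", done=False,
    alpha=0.5, gamma=0.9,
)
```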

  7. Auto-Encoding Variational Bayes (2013)

    Authors: Kingma & Welling

    Takeaway: Introduces variational autoencoders (VAEs), which combine neural networks with variational inference to learn latent-variable generative models by maximizing the evidence lower bound (ELBO), made differentiable through the reparameterization trick.

    Further reading: VAE follow-up tutorial, VAE explainer
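
    Sketch: The two pieces named in the takeaway, assuming a diagonal Gaussian posterior and a standard normal prior. The encoder and decoder networks are omitted, so `reconstruction_nll` is a value the caller is assumed to supply; function names are illustrative.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), so gradients can
    flow through mu and log_var instead of through the sampling step."""
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mu, log_var)
    ]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, sigma^2) || N(0, I) ) for a diagonal Gaussian."""
    return -0.5 * sum(
        1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var)
    )


def negative_elbo(reconstruction_nll, mu, log_var):
    """Training loss: reconstruction term plus KL regularizer."""
    return reconstruction_nll + kl_to_standard_normal(mu, log_var)


rng = random.Random(0)
z = reparameterize([0.0, 1.0], [0.0, -2.0], rng)
loss = negative_elbo(3.0, [0.0, 1.0], [0.0, -2.0])
```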

  8. Adam: A Method for Stochastic Optimization (2014)

    Authors: Kingma & Ba

    Takeaway: Proposes the Adam optimizer, which adapts per-parameter learning rates from first and second moment estimates of gradients, becoming a default choice for many deep learning applications.

    Further reading: Optimizing gradient descent, Distill on optimization
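
    Sketch: A single-parameter Adam step with bias correction, using the default hyperparameters suggested in the paper. The quadratic objective and the variable names in the usage loop are assumptions for illustration.

```python
import math


def adam_step(param, grad, m, v, t,
              lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Return (new_param, new_m, new_v) after one Adam update at step t >= 1."""
    m = beta1 * m + (1.0 - beta1) * grad          # first moment (mean)
    v = beta2 * v + (1.0 - beta2) * grad * grad   # second moment (uncentered)
    m_hat = m / (1.0 - beta1 ** t)                # correct the zero-init bias
    v_hat = v / (1.0 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Minimize f(x) = x^2 for a few steps, starting from x = 1.
x, m, v = 1.0, 0.0, 0.0
for t in range(1, 101):
    gradient = 2.0 * x
    x, m, v = adam_step(x, gradient, m, v, t)
```

    Dividing by the root of the second moment makes the step size roughly independent of the gradient's scale, which is what "per-parameter learning rates" refers to.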

  9. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift (2015)

    Authors: Ioffe & Szegedy

    Takeaway: Introduces batch normalization, which stabilizes and speeds up training by normalizing intermediate activations with per-mini-batch statistics and then applying a learned scale and shift, enabling higher learning rates and acting as a regularizer in deep networks.

    Further reading: Batch norm tutorial, Intuition and practice
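
    Sketch: The training-time batch normalization transform for a small batch of feature vectors. Running averages for inference are left out, and `batch_norm` with its toy inputs is an illustrative assumption rather than a library API.

```python
import math


def batch_norm(batch, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch, then scale and shift.

    batch: list of rows, one row per example.
    gamma, beta: learned per-feature scale and shift.
    """
    n = len(batch)
    num_features = len(batch[0])
    out = [[0.0] * num_features for _ in batch]
    for j in range(num_features):
        column = [row[j] for row in batch]
        mean = sum(column) / n
        var = sum((x - mean) ** 2 for x in column) / n  # biased variance
        for i, x in enumerate(column):
            normalized = (x - mean) / math.sqrt(var + eps)
            out[i][j] = gamma[j] * normalized + beta[j]
    return out


# Two features on very different scales end up in the same range.
activations = [[1.0, 10.0], [3.0, 30.0]]
normalized = batch_norm(activations, gamma=[1.0, 1.0], beta=[0.0, 0.0])
```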

  10. A Simple Framework for Contrastive Learning of Visual Representations (SimCLR) (2020)

    Authors: Chen et al.

    Takeaway: Shows that strong data augmentation, large batch sizes, and a contrastive loss can yield self-supervised visual representations competitive with supervised pre-training.

    Further reading: Contrastive learning overview, Illustrated SimCLR
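
    Sketch: The normalized temperature-scaled cross-entropy (NT-Xent) contrastive loss in plain Python. `view_a[i]` and `view_b[i]` are assumed to be embeddings of two augmentations of the same image; the temperature value and the function names are placeholders, and the encoder and augmentation pipeline are not shown.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nt_xent(view_a, view_b, temperature=0.5):
    """Average contrastive loss over all 2N embeddings."""
    z = view_a + view_b
    n = len(view_a)
    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)  # index of the other view of the same image
        # Similarities to every other embedding; the anchor itself is excluded.
        logits = {
            k: cosine(z[i], z[k]) / temperature
            for k in range(2 * n) if k != i
        }
        log_denominator = math.log(sum(math.exp(v) for v in logits.values()))
        total += log_denominator - logits[positive]
    return total / (2 * n)


anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = nt_xent(anchors, [[1.0, 0.0], [0.0, 1.0]])
mismatched_loss = nt_xent(anchors, [[0.0, 1.0], [1.0, 0.0]])
```

    Every other embedding in the batch serves as a negative, which is one reason larger batches help this objective.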

Applied to networking and security

  1. Kitsune: An Ensemble of Autoencoders for Online Network Intrusion Detection (2018)

    Authors: Mirsky et al.

    Takeaway: Proposes a lightweight online anomaly-based IDS using an ensemble of autoencoders to model normal traffic and detect a wide range of attacks without labeled data.

    Further reading: NDSS 2018 summary, Reference implementation

  2. Deep Packet: A Novel Approach for Encrypted Traffic Classification Using Deep Learning (2017)

    Authors: Lotfollahi et al.

    Takeaway: Shows that deep networks can classify encrypted traffic directly from raw packet bytes, highlighting both the power and privacy implications of deep learning-based traffic classification.

    Further reading: Article overview, Code and dataset

  3. RouteNet: Leveraging Graph Neural Networks for Network Modeling and Optimization (2019)

    Authors: Rusek et al.

    Takeaway: Uses graph neural networks to learn performance models of communication networks, predicting per-path delay and loss for different routing schemes and enabling ML-driven traffic engineering.

    Further reading: DeepMind blog, RouteNet repository

  4. AuTO: Scaling Deep Reinforcement Learning for Datacenter-Scale Autonomous Traffic Engineering (2018)

    Authors: Chen et al.

    Takeaway: Applies deep reinforcement learning to datacenter traffic engineering, showing RL policies can outperform hand-tuned heuristics in large-scale, production-like environments.

    Further reading: Paper PDF, SIGCOMM commentary

  5. Machine Learning for Networking: Workflow, Advances and Opportunities (2018)

    Authors: Wang et al.

    Takeaway: Surveys how ML is applied across network management tasks (traffic prediction, routing, anomaly detection) and lays out a practical workflow from data collection to deployment.

    Further reading: ArXiv preprint, APNIC blog

  6. Deep Learning for Cyber Security: A Survey (2019)

    Authors: Yuan et al.

    Takeaway: Comprehensive survey of deep learning for malware detection, intrusion detection, spam filtering, and more, including discussion of adversarial examples and data challenges.

    Further reading: Cybersecurity & DL, Adversarial examples talk

  7. A Survey of Network Traffic Classification Using Machine Learning (2013)

    Authors: Zhang et al.

    Takeaway: Reviews ML techniques for classifying network traffic by application and behavior, comparing flow-based features, algorithms, and evaluation challenges.

    Further reading: Related ACM article, APNIC blog

  8. Data Mining Approaches for Intrusion Detection (1998)

    Authors: Lee & Stolfo

    Takeaway: Early work applying data mining and machine learning to intrusion detection, establishing many foundational ideas in feature-based and anomaly-based IDS.

    Further reading: IDS survey, NIST intrusion detection guide

  9. LSTM-based Intrusion Detection System for In-Vehicle CAN Bus Communications (2016)

    Authors: Cho & Shin

    Takeaway: Uses recurrent neural networks to model normal sequences of CAN bus messages and detect deviations as potential intrusions, showcasing ML for automotive/embedded security.

    Further reading: Black Hat talk, Blog explanation

  10. Hunting for Malicious TLS Flows: Machine Learning for Encrypted Malware Traffic (2016)

    Authors: Anderson & McGrew (Cisco)

    Takeaway: Shows that statistical features of TLS connections combined with ML can detect malware communications even when payloads are encrypted, influencing modern encrypted traffic analytics.

    Further reading: Cisco blog, USENIX Security talk

Reading list

  1. Deep Learning (2015)

    Authors: LeCun, Bengio & Hinton

    Takeaway: High-level overview of deep learning principles, architectures, and historical context across vision, speech, and language.

    Further reading: Deep Learning book, Talk recording

  2. Hidden Technical Debt in Machine Learning Systems (2015)

    Authors: Sculley et al.

    Takeaway: Argues that most complexity and risk in ML systems live in data dependencies and glue code rather than models, introducing a vocabulary for ML technical debt.

    Further reading: Google research page, Follow-up discussion

  3. Concrete Problems in AI Safety (2016)

    Authors: Amodei et al.

    Takeaway: Frames AI safety as a set of practical engineering problems (negative side effects, reward hacking, scalable oversight, safe exploration, distributional shift) and suggests concrete experiments for each.

    Further reading: OpenAI overview, Problem profile

  4. Deep Neural Networks for YouTube Recommendations (2016)

    Authors: Covington et al.

    Takeaway: Describes YouTube's large-scale two-stage recommendation architecture (candidate generation + ranking) and how deep learning shapes industrial recommender systems.

    Further reading: Google AI blog, Technical walkthrough

  5. Wide & Deep Learning for Recommender Systems (2016)

    Authors: Cheng et al. (Google)

    Takeaway: Proposes the wide-and-deep architecture combining memorization (feature crosses) with generalization (deep nets), influential for tabular and recommendation models.

    Further reading: Google AI blog, Implementation guide

  6. CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning (2017)

    Authors: Rajpurkar et al.

    Takeaway: Applies deep CNNs to chest X-rays and reaches radiologist-level performance on pneumonia detection, showing both promise and caveats of clinical AI.

    Further reading: Stanford AI blog, Clinical follow-up

  7. U-Net: Convolutional Networks for Biomedical Image Segmentation (2015)

    Authors: Ronneberger et al.

    Takeaway: Introduces U-Net, an encoder–decoder CNN with skip connections tailored for medical image segmentation that becomes a de facto standard for segmentation tasks.

    Further reading: Project page, Architecture overview

  8. A Survey on Deep Learning in Medical Image Analysis (2017)

    Authors: Litjens et al.

    Takeaway: Survey of deep learning across radiology, pathology, and other imaging modalities, mapping tasks, architectures, and open challenges in medical imaging AI.

    Further reading: Journal version, NVIDIA blog

  9. End to End Learning for Self-Driving Cars (2016)

    Authors: Bojarski et al. (NVIDIA)

    Takeaway: Trains a CNN to map front-facing camera images directly to steering commands, illustrating the appeal and brittleness of end-to-end control for autonomous driving.

    Further reading: NVIDIA dev blog, Demo video

  10. End-to-End Training of Deep Visuomotor Policies (2016)

    Authors: Levine et al.

    Takeaway: Uses guided policy search to train deep networks that map images directly to robot motor torques, bridging perception and control for robotic manipulation.

    Further reading: BAIR blog, Talk/demo

  11. A Comprehensive Survey on Graph Neural Networks (2020)

    Authors: Wu et al.

    Takeaway: Surveys GNN architectures, training methods, and applications across recommendation, chemistry, and traffic forecasting, providing a starting point for graph-based deep learning.

    Further reading: GNN introduction, PyG tutorials

  12. Scaling Laws for Neural Language Models (2020)

    Authors: Kaplan et al.

    Takeaway: Empirically shows that language-model loss falls as a power law in model size, dataset size, and training compute, giving a quantitative framework for planning LLM training budgets.

    Further reading: OpenAI article, Blog explainer
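
    Sketch: What a power-law relationship between loss and parameter count implies for planning. The functional form L(N) = (N_c / N)^alpha follows the takeaway; the constants `n_c` and `alpha` below are illustrative placeholders, not values to rely on.

```python
def power_law_loss(n_params, n_c=8.8e13, alpha=0.076):
    """Predicted loss for a model with n_params parameters.

    n_c and alpha are placeholder constants chosen for illustration.
    """
    return (n_c / n_params) ** alpha


small = power_law_loss(1e8)
large = power_law_loss(1e9)
# Under a power law, every 10x increase in size multiplies the loss by the
# same factor, 10 ** -alpha, regardless of the starting size.
improvement_factor = large / small
```

    The constant ratio is the practical point: returns from scaling are predictable but diminishing in absolute terms.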

  13. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (2022)

    Authors: Wei et al.

    Takeaway: Shows that prompting LLMs to generate intermediate reasoning steps improves performance on arithmetic, commonsense, and symbolic tasks, underscoring the power of prompt design.

    Further reading: Google AI blog, Talk recording

  14. Training Language Models to Follow Instructions with Human Feedback (2022)

    Authors: Ouyang et al. (OpenAI)

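    Takeaway: Fine-tunes GPT-3 on human demonstrations and then with reinforcement learning from human feedback (RLHF) to align it with user intent, showing that outputs from a 1.3B-parameter aligned model can be preferred by human raters over those of the 175B-parameter GPT-3.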

    Further reading: OpenAI article, Alignment discussion

  15. TFX: A TensorFlow-Based Production-Scale Machine Learning Platform (2017)

    Authors: Baylor et al.

    Takeaway: Describes Google's end-to-end platform for deploying, monitoring, and maintaining ML pipelines at scale, with patterns for data validation, serving, and continual training.

    Further reading: Google AI blog, TFX docs

  16. The ML Test Score: A Rubric for ML Production Readiness (2017)

    Authors: Breck et al.

    Takeaway: Proposes a checklist-style rubric for evaluating ML production readiness across data, model, infrastructure, and monitoring, useful for MLOps reviews.

    Further reading: Google research page, Rules of ML

  17. Neural Machine Translation by Jointly Learning to Align and Translate (2014)

    Authors: Bahdanau, Cho & Bengio

    Takeaway: Introduces attention mechanisms in sequence-to-sequence models for machine translation, a key conceptual bridge from RNNs to transformers.

    Further reading: Augmented RNNs, Visualizing seq2seq with attention
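
    Sketch: Additive attention of the kind the paper introduces, where each encoder state h_j is scored against the decoder state s as v . tanh(W_dec s + W_enc h_j). The tiny weight matrices and the names `w_dec`, `w_enc`, and `v` are hypothetical stand-ins for learned parameters.

```python
import math


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def additive_attention(decoder_state, encoder_states, w_dec, w_enc, v):
    """Return (attention_weights, context_vector)."""
    projected_state = matvec(w_dec, decoder_state)
    scores = []
    for h in encoder_states:
        hidden = [
            math.tanh(a + b)
            for a, b in zip(projected_state, matvec(w_enc, h))
        ]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    # Softmax over source positions gives the soft alignment.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: alignment-weighted sum of encoder states.
    context = [
        sum(w * h[j] for w, h in zip(weights, encoder_states))
        for j in range(len(encoder_states[0]))
    ]
    return weights, context


encoder_states = [[1.0, 0.0], [0.0, 1.0]]
alignment, context = additive_attention(
    decoder_state=[1.0],
    encoder_states=encoder_states,
    w_dec=[[0.5]],
    w_enc=[[1.0, -1.0]],
    v=[2.0],
)
```

    The decoder recomputes these weights at every output step, so it can look at different source words for different target words.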

  18. Listen, Attend and Spell (2015)

    Authors: Chan et al.

    Takeaway: Applies attention-based encoder–decoder models to end-to-end speech recognition, replacing traditional ASR pipelines with sequence-to-sequence models.

    Further reading: DeepMind blog, CTC vs seq2seq

  19. Deep Reinforcement Learning: An Overview (2017)

    Authors: Li

    Takeaway: Tutorial-style overview of deep reinforcement learning, covering value-based, policy-based, and actor–critic methods with clear conceptual framing.

    Further reading: RL overview, Spinning Up in Deep RL

  20. Deep Learning for Finance: Deep Portfolios (2017)

    Authors: Heaton, Polson & Witte

    Takeaway: Explores using deep learning to model asset returns and construct portfolios, framing portfolio selection as a supervised learning problem and discussing opportunities and pitfalls.

    Further reading: Related work, QuantStart article