LLM Learning Journey

LLM Evaluation

A guide to evaluating large language models (LLMs): the metrics, benchmarks, and methodologies used to measure their performance.

Contents

Evaluation Methods & Metrics - Metrics (BLEU, ROUGE, perplexity), benchmarks (GLUE, MMLU, BIG-bench), and evaluation strategies

Learn how to properly evaluate LLM performance across different tasks and capabilities.

2025-10-26

Copyright © 2025 Jackie Yin | Shared for educational purposes