Benchmark Information
Tests a model's ability to identify the most relevant subject or topic category based on a set of keywords. This benchmark evaluates semantic understanding, domain knowledge, and the ability to recognize conceptual relationships between terms and their broader subject areas.
Category: Classification
Visibility: PUBLIC
Created:
Input Tokens: 2258
Est. Output Tokens: 920
System Prompt
You are an expert at identifying the subject or topic that best matches a given set of keywords. For each question, you will be presented with a list of keywords and multiple topic options. You must select the single most relevant topic by responding with ONLY the letter (A, B, C, or D) of the correct option. Do not include any explanation, just the letter.
Validation Rules
Correct Topic Letter
Exact Match Flex
JSON Path: $.answer
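The validation rule above can be sketched in a few lines. This is only an illustration under stated assumptions: the page does not define "Exact Match Flex", so the sketch assumes it means a comparison that ignores case and surrounding whitespace, and it assumes the model's reply reaches the validator as a JSON object with an `answer` field (the target of `$.answer`). The function names `extract_answer` and `flex_match` are hypothetical.

```python
import json


def extract_answer(raw_output: str):
    """Return the value at $.answer, or None if it cannot be read.

    Assumption: the reply is a JSON object with a top-level "answer" key.
    """
    try:
        parsed = json.loads(raw_output)
    except json.JSONDecodeError:
        return None
    if isinstance(parsed, dict):
        return parsed.get("answer")
    return None


def flex_match(actual, expected: str) -> bool:
    """Compare two answers, ignoring case and surrounding whitespace.

    Assumption: this is what "Exact Match Flex" means; the source does not say.
    """
    if not isinstance(actual, str):
        return False
    return actual.strip().casefold() == expected.strip().casefold()
```

For example, `flex_match(extract_answer('{"answer": " b "}'), "B")` passes under these assumptions, while a bare `B` reply that is not valid JSON yields `None` and fails.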
Benchmark Steps
# | User Prompt | Correct Topic Letter |
---|---|---|
1 | Keywords: photosynthesis, chlorophyll, stomata, carbon dioxide, sunlight Which topic is most relevant? A) Astronomy B) Plant Biology C) Chemistry D) Geography | B |
2 | Keywords: blockchain, cryptocurrency, decentralized, mining, wallet, ledger Which topic is most relevant? A) Geology B) Banking C) Digital Currency Technology D) Data Storage | C |
3 | Keywords: electoral college, swing states, primary, caucus, incumbent Which topic is most relevant? A) American Politics B) Education System C) Sports Competition D) Geography | A |
4 | Keywords: aperture, ISO, shutter speed, bokeh, focal length, RAW Which topic is most relevant? A) Optics B) Photography C) Film Production D) Computer Graphics | B |
5 | Keywords: supply chain, inventory, logistics, warehousing, distribution, procurement Which topic is most relevant? A) Transportation B) Real Estate C) Operations Management D) Retail Sales | C |
6 | Keywords: genome, CRISPR, nucleotide, mutation, sequencing, allele Which topic is most relevant? A) Computer Programming B) Genetics C) Medicine D) Evolution | B |
7 | Keywords: impressionism, palette, canvas, brushstroke, exhibition, curator Which topic is most relevant? A) Interior Design B) Art History C) Museum Management D) Fine Arts | D |
8 | Keywords: algorithm, runtime complexity, recursion, data structure, optimization Which topic is most relevant? A) Mathematics B) Computer Science C) Engineering D) Statistics | B |
9 | Keywords: inflation, monetary policy, interest rates, GDP, fiscal deficit Which topic is most relevant? A) Government Administration B) Banking C) Macroeconomics D) International Trade | C |
10 | Keywords: fermentation, yeast, kneading, proofing, gluten, sourdough starter Which topic is most relevant? A) Chemistry B) Bread Making C) Brewing D) Food Science | B |
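As a rough illustration of how the Accuracy figure for a run over these ten steps might be computed, the sketch below scores a list of predicted letters against the answer key from the table. The `EXPECTED` list is copied from the table; the `accuracy` function and its normalization (trimming whitespace, uppercasing) are assumptions and not the platform's documented scoring logic.

```python
# Answer key for steps 1-10, copied from the table above.
EXPECTED = ["B", "C", "A", "B", "C", "B", "D", "B", "C", "B"]


def accuracy(predictions, expected=EXPECTED):
    """Return the fraction of predictions that match the answer key.

    Assumption: a prediction counts as correct if it equals the expected
    letter after trimming whitespace and uppercasing.
    """
    if len(predictions) != len(expected):
        raise ValueError("one prediction is required per benchmark step")
    correct = sum(
        pred.strip().upper() == exp
        for pred, exp in zip(predictions, expected)
    )
    return correct / len(expected)
```

A run that answers "B" on step 7 instead of "D" and gets every other step right would score 9 of 10 under this sketch.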
Model | Reasoning Effort | Executions | Accuracy | Cost | Duration |
---|---|---|---|---|---|