Benchmark Details

Reasoning (Baseline)
Benchmark Information

Tests complex reasoning abilities across logic puzzles, pattern recognition, deductive inference, mathematical reasoning, and abstract problem-solving. Each problem requires multiple steps of analysis and synthesis to reach the correct solution.

Category: Reasoning
Visibility: PUBLIC
Max Completion Tokens: 15000
Max Retries: 1
Input Tokens: 9626
Est. Output Tokens: 4617

System Prompt
You are a logical reasoning expert. For each problem, carefully analyze all given information, work through the logic step by step, and provide only the final answer in the exact format requested. Do not show your working or provide explanations unless specifically asked.
Validation Rules
Precise Logical Solution: Semantic Equivalence (AI Check), JSON Path $.answer
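The platform's actual checker is not shown on this page. As a minimal sketch of what the rule above describes, assume the model's response is a JSON object with a top-level `answer` field. The helper names `extract_answer`, `normalize`, and `is_match` are hypothetical, and the normalized string comparison is only a crude stand-in for the AI-based semantic equivalence check.

```python
import json


def extract_answer(response_text: str) -> str:
    """Read the value at JSON path $.answer (assumes a top-level JSON object)."""
    return str(json.loads(response_text)["answer"])


def normalize(text: str) -> str:
    """Lowercase, trim, and collapse whitespace.

    Assumption: this simple normalization replaces the AI check, which
    presumably accepts a wider range of equivalent phrasings.
    """
    return " ".join(text.lower().split())


def is_match(response_text: str, expected: str) -> bool:
    """Compare the extracted answer with the expected solution."""
    return normalize(extract_answer(response_text)) == normalize(expected)
```

For example, `is_match('{"answer": "Green"}', "green")` returns `True`, while a response whose `answer` is `"blue"` does not match.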
Benchmark Steps
Each step is listed as: #, User Prompt, Precise Logical Solution
1
Five houses in a row are each painted different colors. The green house is immediately to the right of the ivory house. The red house is in the middle. The first house is yellow. What color is the fifth house? Answer with just the color name.
green
2
If all Bloops are Razzies and all Razzies are Lazzies, and some Lazzies are Dazzies, can we conclude that some Bloops are Dazzies? Answer only 'yes' or 'no'.
no
3
A sequence follows the pattern: 2, 6, 12, 20, 30, ?. What is the next number?
42
4
In a certain code, MOUSE is written as PRXVH. Using the same code, how would CHAIR be written?
FKDLU
5
Four friends — Anna, Bob, Carol, and David — are to sit in a single row of four seats numbered 1 (leftmost) to 4 (rightmost). Their seating must satisfy all of the following rules:

Bob sits immediately to the left of Carol.

Anna sits somewhere to the right of Carol (not necessarily next to her).

David is not seated at either end of the row.

In left-to-right order, who occupies the four seats?
Write the names separated by commas, e.g. “Name1, Name2, Name3, Name4”.
Bob, Carol, David, Anna
6
If it takes 5 machines 5 minutes to make 5 widgets, how many minutes would it take 100 machines to make 100 widgets?
5
7
Three boxes contain apples. Box A has twice as many as Box B. Box C has 5 more than Box A. If Box B has 10 apples, how many apples are there in total?
55
8
In a tournament, every player plays every other player exactly once. If there were 45 games total, how many players were there?
10
9
A clock shows 3:15. What is the angle between the hour and minute hands in degrees?
7.5
10
If SEND + MORE = MONEY in cryptarithmetic where each letter represents a unique digit, what digit does Y represent?
2
11
A frog is at the bottom of a 30-foot well. Each day it climbs 3 feet, but each night it slides back 2 feet. On which day will it escape the well?
28
12
Six people shake hands with each other exactly once. How many handshakes occur?
15
13
What is the next letter in the sequence: O, T, T, F, F, S, S, E, ?
N
14
A bat and ball cost $1.10 together. The bat costs $1 more than the ball. How much does the ball cost in cents?
5
15
If 2^x = 8^(x-2), what is the value of x?
3
16
A drawer contains 12 red socks and 8 blue socks. In complete darkness, what is the minimum number of socks you must take out to guarantee a matching pair?
3
17
If today is Wednesday, what day of the week will it be 100 days from now?
Friday
18
In a family, there are 2 parents and some children. The average age of the whole family is 20. The average age of the parents is 38. If there are 3 children, what is their average age?
8
19
A cube is painted red on all faces and then cut into 27 smaller cubes of equal size. How many of the smaller cubes have exactly two red faces?
12
20
If 6 cats can catch 6 mice in 6 minutes, how many cats are needed to catch 60 mice in 60 minutes?
6
21
What is the sum of all integers from 1 to 100?
5050
22
A train 100 meters long traveling at 60 km/hr takes 9 seconds to cross a bridge. What is the length of the bridge in meters?
50
23
If A is B's sister, and B is C's brother, and C is D's sister, what is the minimum number of people in this family? (A, B, C, and D may be the same person)
2
24
Two cyclists start from the same point and ride in opposite directions. One cycles at 15 mph, the other at 20 mph. How far apart are they after 2 hours?
70
25
If all integers from 1 to 1000 inclusive are written down, how many times does the digit 7 appear?
300
26
A rope ladder hangs over the side of a ship. The rungs are 1 foot apart, and the ladder is 12 feet long. The tide rises at 6 inches per hour. After 6 hours, how many rungs will be underwater?
0
27
In base 7, what is 25 + 36?
64
28
A rectangular garden has a perimeter of 100 feet. If the length is 10 feet more than the width, what is the area in square feet?
600
29
If you rearrange the letters 'CIFAIPC', you get the name of what type of water mass? Answer with one word.
Ocean
30
Three light-switches outside a closed room control three bulbs inside.
You may flip the switches as you wish, then enter the room once.
Inside, you may look at and touch the bulbs but may not return to the switches.
How many bulbs can you definitively match to their switches?
3
31
A man tells you: “The product of my three daughters’ ages is 36. The sum of their ages equals my house number, which you can see is 13. Also, my oldest daughter has a cat.”
List their ages from youngest to oldest, separated by commas.
2, 2, 9
32
If 1/2 of 5 is 3, what is 1/3 of 10?
4
33
A lily pad doubles in size every day. If it takes 48 days to cover the entire pond, on which day did it cover half the pond?
47
34
What is the smallest positive integer that is divisible by all integers from 1 to 10?
2520
35
In a race, you overtake the person in second place. What position are you in now?
second
36
A book has 300 pages. Each page is numbered. How many times does the digit 2 appear in the page numbers?
160
37
If the day before yesterday was Thursday, what day will it be the day after tomorrow?
Monday
38
A car travels from A to B at 40 mph and returns at 60 mph. What is the average speed for the round trip in mph?
48
39
How many ways can you make change for a dollar using quarters, dimes, nickels, and pennies?
242
40
If you write down all numbers from 1 to 100, how many times do you write the digit 9?
20
41
A clock strikes once at 1 o'clock, twice at 2 o'clock, and so on. How many times will it strike in a 12-hour period?
78
42
Three people check into a hotel room that costs $30. They each pay $10. Later, the manager realizes the room should only cost $25, so he gives $5 to the bellboy to return. The bellboy keeps $2 and gives $1 back to each person. Now each person paid $9 (totaling $27) and the bellboy kept $2. That's $29. Where is the missing dollar? Answer: 'no missing dollar' or state where it is.
no missing dollar
43
What is the 100th digit after the decimal point in the decimal expansion of 1/7?
8
44
A farmer has chickens and cows. There are 30 heads and 74 legs in total. How many chickens are there?
23
45
If February has 29 days, and March 1st is a Tuesday, what day of the week was January 1st?
Friday
46
In how many ways can 8 identical balls be distributed among 3 distinct boxes if each box must contain at least one ball?
21
47
A 3x3 magic square has all rows, columns, and diagonals sum to 15. If the center cell is 5 and the bottom right cell is 8, what number must be in the top-left corner?
2
48
Two trains start 100 miles apart and travel toward each other at 25 mph and 15 mph respectively. A bird flies back and forth between them at 60 mph until they meet. How far does the bird fly?
150
49
If x² - 5x + 6 = 0 and y² - 5y + 6 = 0, where x ≠ y, what is the value of x + y?
5
50
Three boxes are labeled 'Apples', 'Oranges', and 'Mixed'. All three labels are incorrect. You can pick one fruit from one box to examine it. What is the minimum number of fruits you need to pick to correctly relabel all three boxes?
1
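Several of the expected solutions above can be re-derived by brute force. The following is a minimal sketch, not part of the benchmark itself, that recomputes the answers for steps 5, 25, 36, 39, 40, and 43. All function names are hypothetical.

```python
from itertools import permutations


def seating_solutions() -> list[str]:
    """Step 5: enumerate all seatings and keep those that satisfy every rule."""
    names = ("Anna", "Bob", "Carol", "David")
    found = []
    for order in permutations(names):
        seat = {name: i for i, name in enumerate(order)}  # 0 = leftmost
        if (
            seat["Carol"] - seat["Bob"] == 1      # Bob immediately left of Carol
            and seat["Anna"] > seat["Carol"]      # Anna somewhere right of Carol
            and seat["David"] not in (0, 3)       # David not at either end
        ):
            found.append(", ".join(order))
    return found


def count_digit(digit: int, low: int, high: int) -> int:
    """Steps 25, 36, 40: count occurrences of a digit in low..high inclusive."""
    return sum(str(n).count(str(digit)) for n in range(low, high + 1))


def change_ways(total: int, coins: tuple[int, ...]) -> int:
    """Step 39: count coin combinations (order ignored) by dynamic programming."""
    ways = [1] + [0] * total
    for coin in coins:
        for amount in range(coin, total + 1):
            ways[amount] += ways[amount - coin]
    return ways[total]


def decimal_digit(numerator: int, denominator: int, position: int) -> int:
    """Step 43: digit at the given position after the decimal point, by long division."""
    remainder = numerator % denominator
    digit = 0
    for _ in range(position):
        remainder *= 10
        digit = remainder // denominator
        remainder %= denominator
    return digit


print(seating_solutions())               # ['Bob, Carol, David, Anna']
print(count_digit(7, 1, 1000))           # 300
print(count_digit(2, 1, 300))            # 160
print(count_digit(9, 1, 100))            # 20
print(change_ways(100, (1, 5, 10, 25)))  # 242
print(decimal_digit(1, 7, 100))          # 8
```

Each printed value agrees with the Precise Logical Solution listed for the corresponding step.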