gpt-5.4-mini scaffold-assembly-v1

Provider: openai · Run at 2026-06-09 · Submission 4c42a6c5

Build Points 21 / 2,461 available
CaSS 10.7%
CBS 3.6%
Coverage 34%
Cost $10.47
Tokens (in/out) 1.81M / 71.7k
Diagnostic row, not ranked. Legacy task-set provenance; 29 of 37 task results.

Tier breakdown

The 3-dimension diagnostic matrix for this submission. A tier shows no CBS until the structural prover returns a definitive verdict there.

Tier     BOM pass  Scope pass  Buildability  CaSS   CBS    Coverage  n
trivial  75.0%     0.0%        0.0%          0.0%   0.0%   75%       4
easy     100.0%    25.0%       25.0%         25.0%  25.0%  100%      4
medium   25.0%     25.0%       0.0%          25.0%  0.0%   25%       4
hard     50.0%     0.0%        0.0%          0.0%   -      0%        4
stretch  25.0%     0.0%        0.0%          0.0%   -      0%        4
expert   75.0%     25.0%       0.0%          25.0%  0.0%   25%       4
master   20.0%     0.0%        0.0%          0.0%   0.0%   20%       5
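The headline figures can be cross-checked against the tier rows. The page does not state its aggregation rule, so the sketch below rests on two assumptions: CaSS and CBS are unweighted means over the seven tiers (a tier with no CBS counting as 0), and Coverage is the share of all 29 tasks whose Build column holds a definitive verdict rather than "incon.". The tuple layout and the name `macro_mean` are illustrative only.

```python
# Tier rows transcribed from the tier breakdown:
# (tier, CaSS %, CBS % or None, tasks with a definitive Build verdict, n).
# None marks a tier that shows no CBS.
TIERS = [
    ("trivial", 0.0, 0.0, 3, 4),
    ("easy", 25.0, 25.0, 4, 4),
    ("medium", 25.0, 0.0, 1, 4),
    ("hard", 0.0, None, 0, 4),
    ("stretch", 0.0, None, 0, 4),
    ("expert", 25.0, 0.0, 1, 4),
    ("master", 0.0, 0.0, 1, 5),
]


def macro_mean(values):
    """Unweighted mean across tiers; a missing value counts as 0 (assumption)."""
    return sum(v or 0.0 for v in values) / len(values)


cass = macro_mean([t[1] for t in TIERS])
cbs = macro_mean([t[2] for t in TIERS])
# Coverage is task-weighted here: definitive verdicts over all tasks (assumption).
coverage = 100.0 * sum(t[3] for t in TIERS) / sum(t[4] for t in TIERS)

print(f"CaSS {cass:.1f}%  CBS {cbs:.1f}%  Coverage {coverage:.0f}%")
# prints: CaSS 10.7%  CBS 3.6%  Coverage 34%
```

Under these assumptions the tier rows reproduce the headline 10.7%, 3.6% and 34%. An unweighted mean of the per-tier Coverage values would give 35% instead, which suggests Coverage is counted per task.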

Per-task results

All 29 tasks attempted by this baseline.

#     Slug                  Tier     CaSS  BOM  Scope  Build   Ref sim  Iter  Tokens  Duration  Status  3D
0001  flat-plate-5x7        trivial  0     0    0      incon.  0.000    3     5.2k    12.0s     DNC
0002  flat-plate-5x5        trivial  0     1    0      0       0.000    7     15.4k   15.6s             3D
0003  staggered-column-2x2  trivial  0     1    0      0       0.000    4     14.1k   10.2s             3D
0004  hollow-frame-8x8      trivial  0     1    0      0       0.000    8     45.4k   23.9s             3D
0005  small-table           easy     0     1    0      0       0.000    9     50.4k   17.2s             3D
0006  step-pyramid          easy     1     1    1      1       0.400    9     27.6k   14.8s     pass    3D
0007  simple-gate           easy     0     1    0      0       0.000    9     58.0k   24.0s             3D
0008  picture-frame         easy     0     1    0      0       0.000    12    151.5k  36.7s             3D
0009  minifig-chair         medium   1     1    1      0       0.000    6     40.7k   18.3s     pass    3D
0010  bookshelf-3-shelf     medium   0     0    0      incon.  0.000    6     69.6k   21.1s     DNC
0011  simple-bridge         medium   0     0    0      incon.  0.000    7     36.6k   22.8s     DNC
0012  square-lantern        medium   0     0    0      incon.  0.000    10    110.7k  33.8s     DNC
0013  farm-tractor          hard     0     1    0      incon.  0.000    9     41.3k   43.6s             3D
0014  rolling-cart          hard     0     1    0      incon.  0.000    7     50.1k   27.2s             3D
0015  drawbridge            hard     0     0    0      incon.  0.000    8     74.8k   29.4s     DNC
0016  slatted-bench         hard     0     0    0      incon.  0.000    5     42.2k   34.0s     DNC
0017  small-house           stretch  0     0    0      incon.  0.000    11    122.8k  50.1s     DNC
0018  small-lighthouse      stretch  0     0    0      incon.  0.000    14    83.9k   54.3s     DNC
0019  garden-shed           stretch  0     1    0      incon.  0.000    9     236.5k  37.5s             3D
0020  fountain-with-pool    stretch  0     0    0      incon.  0.000    10    190.3k  57.7s     DNC
0021  grand-bridge          expert   1     1    1      incon.  0.000    6     30.8k   35.4s     pass    3D
0022  manor-house           expert   0     0    0      incon.  0.000    5     45.6k   46.5s     DNC
0023  clock-tower           expert   0     1    0      0       0.017    12    184.1k  47.3s             3D
0024  grand-fountain        expert   0     1    0      incon.  0.000    4     22.9k   13.7s             3D
0025  mosaic-smiley         master   0     0    0      incon.  0.000    6     25.1k   38.4s     DNC
0026  voxel-tree            master   0     1    0      0       0.000    6     34.3k   14.8s             3D
0027  grand-mosaic-sunset   master   0     0    0      incon.  0.000    5     29.0k   24.3s     DNC
0028  mega-voxel-rocket     master   0     0    0      incon.  0.000    2     3.8k    3.7s      DNC
0029  colossal-mosaic-grid  master   0     0    0      incon.  0.000    6     36.4k   18.6s     DNC
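Each per-task row is whitespace-delimited: the first eleven fields are always present, and the Status and 3D cells appear only when set. A minimal parsing sketch follows, using the four medium-tier rows as input. The field names and the `parse_row` helper are hypothetical, and reading "incon." as "no definitive verdict" is an interpretation of the table, not a documented rule.

```python
ROWS = """\
0009 minifig-chair medium 1 1 1 0 0.000 6 40.7k 18.3s pass 3D
0010 bookshelf-3-shelf medium 0 0 0 incon. 0.000 6 69.6k 21.1s DNC
0011 simple-bridge medium 0 0 0 incon. 0.000 7 36.6k 22.8s DNC
0012 square-lantern medium 0 0 0 incon. 0.000 10 110.7k 33.8s DNC
"""


def parse_row(line):
    """Split one per-task row into typed fields; trailing cells are optional flags."""
    f = line.split()
    return {
        "id": f[0],
        "slug": f[1],
        "tier": f[2],
        "cass": int(f[3]),
        "bom": int(f[4]),
        "scope": int(f[5]),
        # "incon." is kept as None so it is never mistaken for a failing 0.
        "build": None if f[6] == "incon." else int(f[6]),
        "ref_sim": float(f[7]),
        "iter": int(f[8]),
        "tokens": f[9],
        "duration_s": float(f[10].rstrip("s")),
        "flags": f[11:],  # any of "pass", "DNC", "3D"
    }


rows = [parse_row(line) for line in ROWS.splitlines()]
n = len(rows)
bom_pass = 100 * sum(r["bom"] for r in rows) / n
scope_pass = 100 * sum(r["scope"] for r in rows) / n
coverage = 100 * sum(r["build"] is not None for r in rows) / n

print(f"BOM pass {bom_pass:.1f}%  Scope pass {scope_pass:.1f}%  Coverage {coverage:.0f}%")
# prints: BOM pass 25.0%  Scope pass 25.0%  Coverage 25%
```

These values agree with the medium row of the tier breakdown, which is consistent with the tier figures being simple per-tier tallies of the per-task columns.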

Submitted BOMs

The final BOM the model emitted on each task it attempted.

#0001 flat-plate-5x7 DNC

adapter error: OpenAI 400: { "error": { "message": "Invalid prompt: your prompt was flagged as potentially violating our usage policy. Please try again with a different prompt: https://platform.openai.com/docs/guides/reasoning#advice-on-prompting", "type": "invalid_request_error", "param": null, "code": "invalid_prompt" } }
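The adapter error above is a status prefix followed by a JSON body, so its fields can be pulled out mechanically. The sketch below is one way to do that with the standard library. The `parse_adapter_error` helper and its return shape are hypothetical, and the assumption that the body always starts at the first `{` is drawn from this single example.

```python
import json
import re

# The error line exactly as it appears for task 0001.
RAW = (
    'adapter error: OpenAI 400: { "error": { "message": "Invalid prompt: your prompt was '
    'flagged as potentially violating our usage policy. Please try again with a different '
    'prompt: https://platform.openai.com/docs/guides/reasoning#advice-on-prompting", '
    '"type": "invalid_request_error", "param": null, "code": "invalid_prompt" } }'
)


def parse_adapter_error(raw):
    """Extract the HTTP status and the structured error fields from one error line."""
    status = re.search(r"\b(\d{3}):", raw)
    # Assumption: everything from the first "{" onward is the JSON body.
    payload = json.loads(raw[raw.index("{"):])
    err = payload["error"]
    return {
        "http_status": int(status.group(1)) if status else None,
        "type": err["type"],
        "code": err["code"],
        "param": err["param"],
    }


info = parse_adapter_error(RAW)
print(info)
```

Here the request was rejected before any model output was produced (`code` is `invalid_prompt`), which differs from the other DNC rows, where the model answered but its assembly failed validation.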

#0002 flat-plate-5x5 trivial CaSS 0

red 5x5 plate
part_num color_id qty
78329 19 5
total pieces 5

#0003 staggered-column-2x2 trivial CaSS 0

2x2x13 column
part_num color_id qty
3003 1 13
total pieces 13

#0004 hollow-frame-8x8 trivial CaSS 0

8x8 frame
part_num color_id qty
3034 1 2
3795 1 2
3020 1 2
3022 1 4
3023 1 4
total pieces 14
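Each submitted BOM is a small three-column table followed by a "total pieces" line, so the total can be checked against the quantities. A minimal sketch, using the hollow-frame-8x8 BOM above as input; the `parse_bom` helper is hypothetical. Part numbers are kept as strings because some entries on this page carry suffixes (for example `4276a` and `3829c01`).

```python
BOM_TEXT = """\
part_num color_id qty
3034 1 2
3795 1 2
3020 1 2
3022 1 4
3023 1 4
total pieces 14
"""


def parse_bom(text):
    """Return (parts, declared_total) from a header line, part rows and a total line."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    parts = []
    for line in lines[1:-1]:
        row = dict(zip(header, line.split()))
        row["color_id"] = int(row["color_id"])
        row["qty"] = int(row["qty"])
        parts.append(row)  # part_num stays a string
    declared_total = int(lines[-1].split()[-1])
    return parts, declared_total


parts, declared_total = parse_bom(BOM_TEXT)
computed_total = sum(p["qty"] for p in parts)
print(computed_total, declared_total, computed_total == declared_total)
# prints: 14 14 True
```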

#0005 small-table easy CaSS 0

Small Table
part_num color_id qty
3001 72 4
3022 0 1
total pieces 5

#0006 step-pyramid easy CaSS 1

Step Pyramid
part_num color_id qty
3958 19 1
3031 19 1
3022 19 1
total pieces 3

#0007 simple-gate easy CaSS 0

Simple Gate
part_num color_id qty
3005 1 8
3004 1 2
3010 1 1
total pieces 11

#0008 picture-frame easy CaSS 0

standing picture frame
part_num color_id qty
3010 8 8
3009 8 2
3032 1 1
4276a 8 2
3020 1 1
total pieces 14

#0009 minifig-chair medium CaSS 1

Minifigure chair
part_num color_id qty
3022 86 4
3003 86 8
3005 86 4
total pieces 16

#0010 bookshelf-3-shelf DNC

model produced a non-conformant assembly (assembly instance counts must exactly match final_bom.parts quantities)
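The rule quoted in this error (assembly instance counts must exactly match `final_bom.parts` quantities) can be illustrated with a counting check. This is a sketch under stated assumptions, not the benchmark's validator: the record shapes are hypothetical, parts are matched on the pair `(part_num, color_id)`, and the instance lists are made up for illustration. Only the BOM values come from this page (task 0005, small-table).

```python
from collections import Counter


def bom_mismatches(instances, bom_parts):
    """Map each (part_num, color_id) whose counts differ to (placed, declared)."""
    placed = Counter((i["part_num"], i["color_id"]) for i in instances)
    declared = Counter()
    for p in bom_parts:
        declared[(p["part_num"], p["color_id"])] += p["qty"]
    keys = set(placed) | set(declared)
    # Counter returns 0 for absent keys, so extra and missing parts both show up.
    return {k: (placed[k], declared[k]) for k in keys if placed[k] != declared[k]}


# BOM as listed for task 0005.
bom = [
    {"part_num": "3001", "color_id": 72, "qty": 4},
    {"part_num": "3022", "color_id": 0, "qty": 1},
]

# Hypothetical assemblies: one places every declared part, one omits a 3001.
complete = [{"part_num": "3001", "color_id": 72}] * 4 + [{"part_num": "3022", "color_id": 0}]
short = complete[1:]

print(bom_mismatches(complete, bom))  # prints: {}
print(bom_mismatches(short, bom))     # prints: {('3001', 72): (3, 4)}
```

An empty result means the assembly conforms under this reading of the rule; any entry would correspond to the non-conformant outcome reported for the DNC tasks.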

#0011 simple-bridge DNC

model produced a non-conformant assembly (assembly instance counts must exactly match final_bom.parts quantities)

#0012 square-lantern DNC

model produced a non-conformant assembly (assembly instance counts must exactly match final_bom.parts quantities)

#0013 farm-tractor hard CaSS 0

green farm tractor
part_num color_id qty
4488 2 4
69912 0 2
70695 0 2
4079 71 1
2741 0 1
3829c01 0 1
30640 2 1
total pieces 12

#0014 rolling-cart hard CaSS 0

open cart
part_num color_id qty
3020 71 2
3004 71 2
38339 0 4
24314c01 0 4
3661 0 1
61252 71 1
total pieces 14

#0015 drawbridge DNC

model produced a non-conformant assembly (assembly instance counts must exactly match final_bom.parts quantities)

#0016 slatted-bench DNC

model produced a non-conformant assembly (assembly instance counts must exactly match final_bom.parts quantities)

#0017 small-house DNC

model produced a non-conformant assembly (assembly instance counts must exactly match final_bom.parts quantities)

#0018 small-lighthouse DNC

model produced a non-conformant assembly (assembly instance counts must exactly match final_bom.parts quantities)

#0019 garden-shed stretch CaSS 0

garden shed
part_num color_id qty
41539 80 1
3034 80 12
821 5 1
3082 12 2
79736 25 1
41770 85 1
41769 85 1
99781 19 2
total pieces 21

#0020 fountain-with-pool DNC

model produced a non-conformant assembly (assembly instance counts must exactly match final_bom.parts quantities)

#0021 grand-bridge expert CaSS 1

Grand Bonded Bridge
part_num color_id qty
3030 0 2
3832 0 4
3062a 0 48
3633 0 4
total pieces 58

#0022 manor-house DNC

model produced a non-conformant assembly (assembly instance counts must exactly match final_bom.parts quantities)

#0023 clock-tower expert CaSS 0

Clock Tower
part_num color_id qty
41539 86 8
3031 86 1
92947 86 18
3005 47 4
3004 86 4
3942a 86 1
total pieces 36

#0024 grand-fountain expert CaSS 0

grand fountain
part_num color_id qty
3062a 71 6
92947 71 5
4032a 71 1
total pieces 12

#0025 mosaic-smiley DNC

model produced a non-conformant assembly (assembly instance counts must exactly match final_bom.parts quantities)

#0026 voxel-tree master CaSS 0

voxel_tree
part_num color_id qty
3005 28 3
3005 0 6
3024 28 1
3024 0 2
total pieces 12

#0027 grand-mosaic-sunset DNC

model produced a non-conformant assembly (assembly instance counts must exactly match final_bom.parts quantities)

#0028 mega-voxel-rocket DNC

no BOM after nudge

#0029 colossal-mosaic-grid DNC

model produced a non-conformant assembly (assembly instance counts must exactly match final_bom.parts quantities)

Run metadata

Submission ID: 4c42a6c5-d1b7-499c-bf50-0f595c9b297f
Protocol: scaffold-assembly-v1
Protocol hash: sha256:394f2d859fafd42d0ac144c5f251e68ea2e7f4c06da48bf733e1c2505055ee9c
Run at: 2026-06-09T17:12:45.812Z
Total duration: 13m47s
Adapter health: ok
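The total duration can be compared with the per-task Duration column. The sketch below parses the `13m47s` form into seconds and sums the 29 per-task values transcribed from the results table. The `parse_duration` helper is hypothetical, and its support for an hours component is an assumption, since only the minutes-and-seconds form appears on this page.

```python
import re


def parse_duration(text):
    """Convert strings such as '13m47s' to whole seconds."""
    m = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?", text)
    if not text or m is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in m.groups())
    return hours * 3600 + minutes * 60 + seconds


# Per-task durations in seconds, tasks 0001 through 0029 in order.
TASK_SECONDS = [
    12.0, 15.6, 10.2, 23.9, 17.2, 14.8, 24.0, 36.7,
    18.3, 21.1, 22.8, 33.8, 43.6, 27.2, 29.4, 34.0,
    50.1, 54.3, 37.5, 57.7, 35.4, 46.5, 47.3, 13.7,
    38.4, 14.8, 24.3, 3.7, 18.6,
]

total = parse_duration("13m47s")
task_sum = sum(TASK_SECONDS)
print(total, round(task_sum, 1))
# prints: 827 826.9
```

The two figures agree to within rounding, which suggests the reported total is the sum of per-task durations rather than wall-clock time including overhead.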