Class AgenticaCallBenchmark<Model>

LLM function calling selection benchmark.

AgenticaCallBenchmark is a class for benchmarking the function calling capability of an LLM (Large Language Model). It uses both the selector and caller agents, and tests whether the expected IAgenticaOperation operations are properly selected and called for the given IAgenticaCallBenchmarkScenario scenarios.
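To make the pass/fail idea concrete, here is a minimal sketch under stated assumptions. The names HypotheticalScenario, isScenarioSatisfied and successRate are hypothetical and are not part of Agentica. The real IAgenticaCallBenchmarkScenario and IAgenticaOperation types carry more information, and Agentica's actual matching rules may differ from the simple "every expected operation was called" rule assumed here. Because the benchmark repeats each scenario, the sketch also computes a success rate over several trials.

```typescript
// Hypothetical, simplified stand-in for a benchmark scenario.
// The real Agentica scenario type is richer than this.
interface HypotheticalScenario {
  name: string;
  text: string; // user prompt that would be sent to the agent
  expected: string[]; // names of operations expected to be called
}

// Assumed rule (illustration only): a trial passes when every expected
// operation name appears at least once among the called operations.
function isScenarioSatisfied(
  scenario: HypotheticalScenario,
  calledOperations: string[],
): boolean {
  const called = new Set(calledOperations);
  return scenario.expected.every((name) => called.has(name));
}

// Fraction of repeated trials that satisfied the scenario (0 if no trials).
function successRate(
  scenario: HypotheticalScenario,
  trials: string[][],
): number {
  if (trials.length === 0) return 0;
  const passed = trials.filter((t) => isScenarioSatisfied(scenario, t)).length;
  return passed / trials.length;
}

const scenario: HypotheticalScenario = {
  name: "order",
  text: "Find a laptop and add it to my cart",
  expected: ["searchProducts", "addToCart"],
};

console.log(isScenarioSatisfied(scenario, ["searchProducts", "addToCart"])); // true
console.log(isScenarioSatisfied(scenario, ["searchProducts"])); // false
```

Repeating each scenario and reporting a rate, rather than a single pass/fail, is what makes a full call benchmark expensive. Every trial is a complete selector-plus-caller run.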

Note that AgenticaCallBenchmark consumes a lot of time and LLM tokens, because it runs the whole Agentica process with many repetitions. If you don't need such a heavy benchmark, consider using AgenticaSelectBenchmark instead. In my experience, Agentica does not fail at function calling itself, so the function selection benchmark is much more economical.

Author

Samchon

Type Parameters

Model extends ILlmSchema.Model

Index

Constructors

constructor

Methods

execute
report

Constructors

constructor

new AgenticaCallBenchmark<Model extends ILlmSchema.Model>(
props: AgenticaCallBenchmark.IProps<Model>,
): AgenticaCallBenchmark<Model>
Initializer Constructor.
Type Parameters
- Model extends ILlmSchema.Model
Parameters
- props: AgenticaCallBenchmark.IProps<Model>
  Properties of the function calling benchmark
Returns AgenticaCallBenchmark<Model>
- Defined in AgenticaCallBenchmark.ts:52

Methods

execute

execute(
listener?: (event: IAgenticaCallBenchmarkEvent<Model>) => void,
): Promise<IAgenticaCallBenchmarkResult<Model>>
Execute the benchmark.

Executes the LLM function calling benchmark and returns its result.

If you want to follow the progress of the benchmark, pass a callback function as the listener argument. The callback is called whenever a benchmark event occurs.

After the benchmark has finished, you can also generate a Markdown report by calling the report method.
Parameters
- Optional listener: (event: IAgenticaCallBenchmarkEvent<Model>) => void
  Callback function listening to the benchmark events
Returns Promise<IAgenticaCallBenchmarkResult<Model>>
Results of the function calling benchmark
- Defined in AgenticaCallBenchmark.ts:79
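The execute-then-report flow can be sketched against the two documented signatures alone. In this sketch, CallBenchmarkLike is a hypothetical structural type: its Event and Result parameters stand in for IAgenticaCallBenchmarkEvent<Model> and IAgenticaCallBenchmarkResult<Model>. FakeBenchmark is a hypothetical fake that lets the flow run without an LLM, and it is not how the real class behaves internally.

```typescript
// Hypothetical structural type mirroring the documented execute()/report()
// signatures; Event and Result stand in for the real Agentica types.
interface CallBenchmarkLike<Event, Result> {
  execute(listener?: (event: Event) => void): Promise<Result>;
  report(): Record<string, string>;
}

// Runs the benchmark while counting events, then builds the Markdown report.
async function runAndReport<Event, Result>(
  benchmark: CallBenchmarkLike<Event, Result>,
): Promise<{ result: Result; eventCount: number; files: Record<string, string> }> {
  let eventCount = 0;
  const result = await benchmark.execute((_event) => {
    // A real listener might print progress here instead of just counting.
    eventCount += 1;
  });
  // The report is requested only after execute() has resolved.
  const files = benchmark.report();
  return { result, eventCount, files };
}

// Hypothetical fake used only to exercise runAndReport without an LLM.
class FakeBenchmark implements CallBenchmarkLike<string, number> {
  async execute(listener?: (event: string) => void): Promise<number> {
    for (const event of ["start", "complete"]) listener?.(event);
    return 42;
  }
  report(): Record<string, string> {
    return { "./README.md": "# Benchmark report\n" };
  }
}

runAndReport(new FakeBenchmark()).then(({ result, eventCount, files }) => {
  console.log(result, eventCount, Object.keys(files));
});
```

Typing against a small structural interface keeps the progress-handling code testable with a fake. The real benchmark is expensive to run.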

report

report(): Record<string, string>
Report the benchmark result as markdown files.

Reports the benchmark result produced by AgenticaCallBenchmark as Markdown files, returning a dictionary of those files. Each key is a file name, and each value is the Markdown content of that file.

For reference, the Markdown files are organized as follows:
- ./README.md
- ./scenario-1/README.md
- ./scenario-1/1.success.md
- ./scenario-1/2.failure.md
- ./scenario-1/3.error.md
Returns Record<string, string>
Dictionary of markdown files.
- Defined in AgenticaCallBenchmark.ts:139
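Because report returns a plain Record<string, string>, persisting it is ordinary file I/O. The sketch below uses Node's fs, os and path modules. writeReportFiles is a hypothetical helper, not part of Agentica. It creates nested directories such as ./scenario-1/ and refuses keys that would resolve outside the output directory. The sample dictionary follows the layout listed above, with placeholder contents.

```typescript
import * as fs from "fs";
import * as os from "os";
import * as path from "path";

// Hypothetical helper: writes each (file name -> Markdown) entry under outDir,
// creating parent directories as needed. Returns the absolute paths written.
function writeReportFiles(files: Record<string, string>, outDir: string): string[] {
  const root = path.resolve(outDir);
  const written: string[] = [];
  for (const [name, content] of Object.entries(files)) {
    const target = path.resolve(root, name);
    // Guard against keys such as "../x.md" that would escape outDir.
    const relative = path.relative(root, target);
    if (relative === ".." || relative.startsWith(".." + path.sep) || path.isAbsolute(relative)) {
      throw new Error(`Refusing to write outside ${outDir}: ${name}`);
    }
    fs.mkdirSync(path.dirname(target), { recursive: true });
    fs.writeFileSync(target, content, "utf8");
    written.push(target);
  }
  return written;
}

// Placeholder dictionary shaped like the documented report layout.
const sample: Record<string, string> = {
  "./README.md": "# Summary\n",
  "./scenario-1/README.md": "# scenario-1\n",
  "./scenario-1/1.success.md": "success\n",
};

const outDir = fs.mkdtempSync(path.join(os.tmpdir(), "agentica-report-"));
console.log(writeReportFiles(sample, outDir).length); // 3
```

In real use, the benchmark's own report output would be passed instead of sample, after execute has resolved.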
