Initializer Constructor.
Properties of the function calling benchmark
Execute the benchmark.
Executes the LLM function calling benchmark and returns its result.
To watch the progress of the benchmark, pass a callback function as the listener argument. The callback function is called whenever a benchmark event occurs.
You can also publish a markdown report by calling the report function after the benchmark has been executed.
Optional
listener: (event: IAgenticaCallBenchmarkEvent<Model>) => void
Callback function listening to the benchmark events
Results of the function calling benchmark
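As a rough illustration of the listener described above, the following sketch builds a callback that counts events and logs one progress line per event. The event shape here (`ISketchBenchmarkEvent` with `type` and `scenario` fields) is a simplified, hypothetical stand-in and not the real IAgenticaCallBenchmarkEvent<Model> interface; the three event kinds are only an assumption, and `createProgressTracker` is an illustrative helper, not part of the library.

```typescript
// Hypothetical, simplified stand-in for IAgenticaCallBenchmarkEvent<Model>.
// The real interface is richer; only these two fields are assumed here.
type SketchEventType = "success" | "failure" | "error";

interface ISketchBenchmarkEvent {
  type: SketchEventType;
  scenario: string; // assumed: name of the scenario that produced the event
}

interface IProgressTracker {
  listener: (event: ISketchBenchmarkEvent) => void;
  counts: Record<SketchEventType, number>;
}

// Builds a listener that tallies events per kind and logs one line per event.
function createProgressTracker(
  log: (line: string) => void = console.log,
): IProgressTracker {
  const counts: Record<SketchEventType, number> = {
    success: 0,
    failure: 0,
    error: 0,
  };
  return {
    counts,
    listener: (event) => {
      counts[event.type] += 1;
      const total = counts.success + counts.failure + counts.error;
      log(`[${total}] ${event.scenario}: ${event.type}`);
    },
  };
}
```

With the real class, a callback of this kind would be passed as the listener argument when executing the benchmark, with the stand-in event type replaced by IAgenticaCallBenchmarkEvent<Model>.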
Report the benchmark result as markdown files.
Reports the result of the benchmark executed by AgenticaCallBenchmark as markdown files, and returns a dictionary object of those files. Each key of the dictionary is a file name, and each value is the markdown content.
For reference, the markdown files are organized as follows:
./README.md
./scenario-1/README.md
./scenario-1/1.success.md
./scenario-1/2.failure.md
./scenario-1/3.error.md
Dictionary of markdown files.
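Since the returned dictionary maps file names to markdown content, one plausible way to use it is to write each entry to disk under an output directory. The sketch below assumes the dictionary is a plain `Record<string, string>` whose keys are relative paths like those listed above; `writeReportFiles` is a hypothetical helper, not part of the library.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Writes every entry of a { fileName: markdownContent } dictionary under `root`,
// creating intermediate directories (e.g. ./scenario-1/) as needed.
// Returns the list of file paths that were written.
function writeReportFiles(
  root: string,
  files: Record<string, string>,
): string[] {
  const written: string[] = [];
  for (const [name, content] of Object.entries(files)) {
    const location = path.join(root, name);
    fs.mkdirSync(path.dirname(location), { recursive: true });
    fs.writeFileSync(location, content, "utf8");
    written.push(location);
  }
  return written;
}
```

The helper is synchronous only to keep the sketch short; an asynchronous variant based on `fs.promises` would work the same way.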
LLM function calling selection benchmark.
AgenticaCallBenchmark is a class for benchmarking the LLM (Large Language Model) function calling part. It utilizes both selector and caller agents and tests whether the expected IAgenticaOperation operations are properly selected and called from the given IAgenticaCallBenchmarkScenario scenarios.
Note that AgenticaCallBenchmark consumes a lot of time and LLM token cost because it runs the whole process of the Agentica class with many repetitions. If you don't want such a heavy benchmark, consider using AgenticaSelectBenchmark instead. In my experience, Agentica does not fail at function calling, so the function selection benchmark is much more economical.
Author
Samchon