Summary
When you create an AI agent in Astra, the system doesn’t just build it and leave you to figure out the rest. It automatically generates test cases to help you evaluate and improve your agent from the start.
This article explains how automated testing, prompt optimization, and the evaluation dashboard work together to help you launch a reliable, high-performing agent with less manual effort.
Instructions
How automated agent evaluation works
As soon as your agent is created, Astra automatically generates test cases from its current instructions. This removes the need for manual test setup and reduces the time spent on trial and error.
When you open the Evaluation page, you’ll see these generated test cases ready to run.
These test cases establish a performance baseline. They help you understand how your agent responds across different conversation types, edge cases, and risk scenarios. These scenarios range from standard queries to complex problems.
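If it helps to picture what a generated test case contains, the sketch below shows one possible shape. The field names and sample values are assumptions for illustration only; they are not Astra’s internal representation.

    # Illustrative only: one possible shape for an auto-generated test case.
    # Field names and sample values are assumptions, not Astra's actual schema.
    from dataclasses import dataclass

    @dataclass
    class TestCase:
        question: str         # the user message sent to the agent
        category: str         # e.g. "standard query", "implied problem", "off-topic question"
        expected_answer: str  # the response the agent should give

    baseline = [
        TestCase(
            question="Where is my order?",
            category="standard query",
            expected_answer="Ask for the order number, then share the tracking status.",
        ),
        TestCase(
            question="What's the weather like today?",
            category="off-topic question",
            expected_answer="Politely explain that only account and order questions are supported.",
        ),
    ]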
How to run the evaluation and view results
You can select specific test cases from the list to run the evaluation, or click Run all to evaluate all available test cases at once.
The system shows you how the agent performs in real time as the evaluation runs. This helps you quickly identify unclear, incomplete, or conflicting instructions.
You will see the overall evaluation results, including:
Efficiency score – How well the AI agent handled the questions.
Accuracy – How correct the AI agent’s responses were.
Latency – How quickly the AI agent responded.
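As a rough illustration of what these three figures summarize (the exact formulas Astra uses are not documented in this article), the overall numbers can be thought of as aggregates over the per-case results:

    # Illustrative only: how the overall figures might be derived from per-case results.
    # The sample numbers and formulas are assumptions, not Astra's documented behaviour.
    results = [
        {"efficiency": 0.92, "passed": True,  "latency_ms": 830},
        {"efficiency": 0.61, "passed": False, "latency_ms": 1210},
        {"efficiency": 0.88, "passed": True,  "latency_ms": 940},
    ]

    efficiency_score = sum(r["efficiency"] for r in results) / len(results)  # average per-case score
    accuracy = sum(r["passed"] for r in results) / len(results)              # share of cases that passed
    latency_ms = sum(r["latency_ms"] for r in results) / len(results)        # average response time

    print(f"Efficiency {efficiency_score:.2f}, accuracy {accuracy:.0%}, latency {latency_ms:.0f} ms")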
You can also view the following details in the evaluation summary:
Question – The test question used to evaluate the AI agent.
Category – The type of question, such as standard query, implied problem, or off-topic question.
Expected answer – The response that the AI agent is expected to provide.
AI response – The actual response generated by the AI agent.
Metrics – The efficiency score for the response.
Status – Whether the AI agent passed or failed the evaluation.
Notes – Additional information about the test case, such as whether the question was auto-generated, uploaded via CSV, or added manually.
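As the Notes field suggests, test cases can also be uploaded via CSV. The snippet below is a minimal sketch of preparing such a file with Python’s csv module; the column names are assumptions, so check the portal’s CSV template for the headers it actually expects.

    # Hypothetical sketch of preparing a test-case CSV for upload.
    # Column names are assumptions; use the portal's CSV template for the real headers.
    import csv

    rows = [
        {
            "question": "Where is my order?",
            "category": "standard query",
            "expected_answer": "Ask for the order number, then share the tracking status.",
        },
    ]

    with open("test_cases.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["question", "category", "expected_answer"])
        writer.writeheader()
        writer.writerows(rows)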
You can click on any individual test case to view detailed results, including:
Evaluation Summary – A breakdown of how the agent responded and why it passed or failed.
Expected Behaviour – The correct or ideal response the agent should have provided.
Each result includes a detailed explanation of how the agent interpreted the user’s input and whether it responded correctly. This makes it easier to spot gaps and refine your instructions with precision.
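To make the comparison concrete, a single detailed result pairs the expected behaviour with the agent’s actual reply and a verdict. The structure below is an assumption for illustration, not Astra’s actual schema:

    # Hypothetical shape of one detailed result; field names and values are assumptions.
    detailed_result = {
        "question": "Where is my order?",
        "expected_behaviour": "Ask for the order number, then share the tracking status.",
        "ai_response": "Happy to help! Could you share your order number so I can check the status?",
        "status": "passed",
        "evaluation_summary": "The agent asked for the missing order number before answering, as expected.",
    }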
What you need to do:
Review the results for each test case.
Compare the agent’s response with the evaluation summary.
Check whether the response matches the expected behaviour.
How to view the AI analysis and recommendations
Click Analyse results to review the evaluation outcome. The system may take a few moments to process the analysis.
After the analysis is complete, click View recommendation to see AI-powered optimization suggestions.
The system summarizes the main issues and provides practical recommendations to improve the agent’s accuracy and reliability.
Review the high-priority suggestions carefully. These may include adding clear rules for certain requests or defining step-by-step instructions for complex tasks such as order tracking.
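For example, a high-priority suggestion for order tracking might propose adding a step-by-step rule along these lines (illustrative wording only, not output generated by Astra):

    When a customer asks about an order:
    1. Ask for the order number if it has not been provided.
    2. Look up the current tracking status.
    3. Share the status and the expected delivery date.
    4. Offer to escalate to a human agent if the order is delayed.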
How to update the agent instructions
After reviewing the suggestions, click Update instructions to automatically apply the improvements. This feature streamlines the process of refining the agent using insights from real-world testing and analysis. The update takes a few moments, and the revised instructions are then displayed.
Review and apply changes
The Review updated instructions screen will display the proposed changes. Here, you can see how the new rules and guidelines are added to the agent’s existing instructions.
After optimization:
The portal highlights the proposed updates.
You can clearly see what has been modified and why.
Review the suggested improvements carefully.
Scroll through the updated instructions to ensure they meet your requirements.
Confirm that the AI suggestions align with your brand voice and business processes.
Click Accept to finalize the update.
Run the evaluation once again
After saving the changes, click Run all again to re-evaluate the agent using the same test cases. The system will run the evaluation with the updated instructions. When the test completes, you should see that all scenarios pass.
This process shows how the analysis and recommendation features help you iteratively improve and optimize your AI agent’s performance.