Modelmetry Changelog
This page documents the changes to Modelmetry. We are constantly improving the platform, and we want to keep you updated on what's new.
December 14, 2024
- feature
Add RAG documents to User messages in the payload editor
You can now add RAG documents to User messages in the payload editor. This gives more context to evaluators that assess the groundedness of an assistant response.
December 12, 2024
- improvement
Autocomplete Hints in Automation Rule Editor
Small quality-of-life improvement: we added autocomplete hints in the automation rule editor. This helps you build your automation rules quickly and avoid typos.
December 1, 2024
- feature
Dashboards with analytics
A huge amount of work went into creating analytics dashboards so you can view, at a glance, key metrics, the status of your guardrail checks, and the performance of your evaluators. This is the first iteration of our analytics and we will be adding more widgets and features in the future.
A dashboard allows you to add specific widgets which are data visualisation components. You can add widgets to your dashboard by clicking on the "Add widget" button.
November 18, 2024
- improvement
Clickable hints in grading editor
On top of autocomplete, you will now see pills of the available findings for you to use in the expression editor. You can simply click the pill to insert it in the editor. This will help you to quickly build your expression and avoid typos.
November 11, 2024
- feature
New flexible grading system
Previously, evaluators decided whether a payload passed or failed the evaluation. This is no longer the case. Instead, an evaluator now returns one or more findings (metrics, booleans, labels) and you decide when these pass or fail using an assessment expression. This new system allows for more flexibility and control over the evaluation process. You can now define multiple findings and use them to create more complex evaluation rules.
You can keep things very simple with, for example:
findings.score > 1
Or do complex rules such as:
(findings.score > 1 and "clean" in findings.tags) || (findings.score == 100)
Thanks to this, you can now show different messages depending on the specific outcome or rule that passes or fails, which was a highly requested feature!
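As a rough illustration of the two expressions above, the following Python sketch mirrors their logic over a plain dictionary of findings. It is not Modelmetry's expression engine; the `findings` dictionary, its values, and the function names are assumptions made for the example.

```python
# Hypothetical findings returned by one evaluator; the keys mirror the
# `score` and `tags` names used in the expressions above.
findings = {"score": 42, "tags": ["clean", "short"]}


def simple_rule(f: dict) -> bool:
    # Mirrors: findings.score > 1
    return f["score"] > 1


def complex_rule(f: dict) -> bool:
    # Mirrors: (findings.score > 1 and "clean" in findings.tags) || (findings.score == 100)
    return (f["score"] > 1 and "clean" in f["tags"]) or f["score"] == 100


print(simple_rule(findings))   # True
print(complex_rule(findings))  # True
```

Because each rule is evaluated separately, a different message can be tied to whichever rule passes or fails.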
November 1, 2024
- improvement
Evaluate a payload by config (JS SDK)
Until now, you could only perform evaluations via guardrail checks or traces with automations to trigger evaluations.
This is no longer the case. You can now run an evaluation by providing its details:
- The evaluator's id (e.g., modelmetry.boolean-llm-as-judge.v1)
- The evaluator's configuration
- The payload to evaluate
- Any additional data, such as whether to persist the results (e.g., the metrics) and any secrets needed to run the evaluation (e.g., API keys for LLM providers).
This will allow you to evaluate payloads directly from your codebase, without having to create a guardrail check or a trace.
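The four pieces of information listed above can be pictured as a single request object. The Python sketch below only illustrates that shape; it is not the SDK's API, and the class name, field names, configuration key, and payload layout are all assumptions.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict


@dataclass
class EvaluationRequest:
    """Hypothetical container for the details listed above."""

    evaluator_id: str                      # e.g. "modelmetry.boolean-llm-as-judge.v1"
    config: Dict[str, Any]                 # the evaluator's configuration
    payload: Dict[str, Any]                # the payload to evaluate
    persist: bool = False                  # whether to persist results such as metrics
    secrets: Dict[str, str] = field(default_factory=dict)  # e.g. LLM provider API keys


request = EvaluationRequest(
    evaluator_id="modelmetry.boolean-llm-as-judge.v1",
    config={"question": "Is the answer polite?"},                 # made-up configuration key
    payload={"input": "Hi", "output": "Hello! How can I help?"},  # made-up payload shape
    persist=True,
    secrets={"openai": "sk-..."},                                 # placeholder, not a real key
)

# Plain dictionary form, e.g. for serialising as JSON.
body = asdict(request)
print(sorted(body))
```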
October 19, 2024
- feature
Copy and paste snippets to use guardrails in your code
You can now view TypeScript and Python code snippets for each guardrail. This will help you to quickly integrate guardrails in your codebase. Simply click the Code button in your list of guardrails.
October 7, 2024
- improvement
Collapsible sidebar for more
The sidebar is now collapsible on desktop and other large screens to allow more screen real estate for the main content. You can collapse the sidebar by clicking on the "Sidebar" button in the breadcrumbs area. If you close your browser or navigate away, the sidebar will remain collapsed.
September 18, 2024
- feature
Test evaluators in app
We know how difficult it can be to fine-tune an evaluator's configuration. Therefore, we have added a testing screen directly in the app. You can now test your evaluator by providing a payload and configuring the evaluation instance's parameters.
September 14, 2024
- improvement
Attach secrets to evaluators
Once you create a secret, you can attach it to an evaluator instance. This will allow you to use the secret when calling an external third-party API.
Go to an evaluator instance, and in the Secrets area, select the secret(s) you want to attach. Whilst you can attach multiple secrets to an instance, you can only attach one secret per provider (e.g., one OpenAI secret and one Google Cloud secret).
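The one-secret-per-provider rule can be sketched as a small validation step. This is only an illustration of the constraint, not how the platform enforces it; the function name, the (name, provider) tuple shape, and the provider identifiers are assumptions.

```python
from collections import Counter


def check_one_secret_per_provider(secrets):
    """Raise ValueError if two attached secrets share a provider.

    `secrets` is a list of (secret_name, provider) pairs; this shape is an
    assumption made for the example.
    """
    counts = Counter(provider for _, provider in secrets)
    duplicated = sorted(p for p, n in counts.items() if n > 1)
    if duplicated:
        raise ValueError("more than one secret attached for: " + ", ".join(duplicated))


# Allowed: one secret each for two different providers.
check_one_secret_per_provider([("prod-openai", "openai"), ("gcp-sa", "google-cloud")])

# Rejected: two secrets for the same provider.
try:
    check_one_secret_per_provider([("prod-openai", "openai"), ("dev-openai", "openai")])
except ValueError as err:
    print(err)  # more than one secret attached for: openai
```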
September 11, 2024
- feature
Vault and secrets
The new Vault is where you can manage your secrets. You can create secrets, update them, and delete them.
You can use the Vault to store sensitive information, such as API keys, tokens, and Google Cloud service accounts, for defined providers. We cannot show you a secret's value, but we do store a small preview so you can tell which secret it is.
Specific secrets can be attached to evaluator instances so they can be used when calling external third-party APIs (e.g., OpenAI API, Google Cloud, Azure, Groq).
Your secrets are stored encrypted. All secrets are encrypted using AES-256 and they are only decrypted when used for a third-party API call.
September 1, 2024
- feature
View completion span payload
You can now view beautifully formatted payloads for completion spans.
For example, you can see the chat thread, system prompt, and the LLM output in a beautiful interface. We also added tabs to view completion options (e.g., temperature, model, tools, provider) as well as context and RAG.
August 25, 2024
- feature
Explore trace spans
When you are viewing a trace, you can now click on a span to view its details. Explore each span's payload, findings, metrics, logs, and events.
August 17, 2024
- feature
View details of a guardrail check
Click on a guardrail check in the table to view its details, including the check's outcome, metrics, and a breakdown of the different evaluations run for this check.
August 4, 2024
- improvement
View billing usage
You can now view your billing usage. This will help you to understand how much you have used our various features and how much you have remaining.
This is available to billing viewers in Settings, then Billing.
July 25, 2024
- feature
Change project or company logo
Tenant administrators can now change the project or company logo. This will help you to customise the look and feel of your dashboard. Simply go to the Settings page, then General, and click on the Change your logo button.
July 22, 2024
- feature
Users can update their display picture
All signed-in users can now update their display picture. Simply go to the Account page and click on the Change your avatar button.
July 21, 2024
- feature
Users can now (finally) update their password
All users can now update their password. Simply go to the Account page and click on the Change your password button. If you have not signed in recently, you will be required to sign out and sign in again before you can change your password (for security reasons).
July 12, 2024
- feature
Custom Role-Based Access Control
Administrators can now create, update, and remove custom roles. Until today, you could only use one of the three standard roles: member, administrator, and owner.
With custom roles, you can hand-pick which actions you want to explicitly allow or deny, or leave unset (i.e., implicitly deny).
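The allow, deny, and unset states described above can be sketched as a lookup in which only an explicit allow grants access. This is a minimal illustration for a single role, not the platform's implementation; the `Effect` enum, the `is_allowed` helper, and the action names are invented for the example.

```python
from enum import Enum


class Effect(Enum):
    ALLOW = "allow"
    DENY = "deny"


def is_allowed(role_rules, action):
    """Return True only when the role explicitly allows the action.

    An action that is missing from `role_rules` is unset, which is treated
    as an implicit deny.
    """
    return role_rules.get(action) is Effect.ALLOW


# Hypothetical custom role; the action names are made up.
reviewer = {
    "guardrails.read": Effect.ALLOW,
    "guardrails.delete": Effect.DENY,
}

print(is_allowed(reviewer, "guardrails.read"))    # True
print(is_allowed(reviewer, "guardrails.delete"))  # False (explicit deny)
print(is_allowed(reviewer, "secrets.read"))       # False (unset, implicit deny)
```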
June 30, 2024
- improvement
View evaluator metrics
When you are looking to create a new evaluator instance, you can now view the evaluator's metrics. This will help you to understand the evaluator's capabilities and limitations.
June 16, 2024
- improvement
Quick search for findings charts
You can now quickly search for metrics charts by their name. This will help you to quickly find the chart you are interested in.
June 7, 2024
- improvement
Filter guardrail calls
You can now filter guardrail calls by their date, guardrail, outcome, and metrics (numeric findings). This will help you to quickly find the guardrail calls you are interested in.
June 5, 2024
- improvement
Filter evaluators by category (+ text search)
When you are looking to create a new evaluator instance, you can now browse our list of evaluators using categories or search them by text.
The text search checks the evaluator's name, description, and metrics to decide whether it is relevant.
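A search over those three fields could look roughly like the sketch below. It assumes case-insensitive substring matching, which the changelog does not specify, and the `matches` helper and catalogue entries are made up for illustration.

```python
def matches(evaluator, query):
    """Case-insensitive substring match over name, description, and metric names."""
    needle = query.strip().lower()
    haystack = [evaluator["name"], evaluator["description"], *evaluator["metrics"]]
    return any(needle in text.lower() for text in haystack)


# Made-up catalogue entries for illustration only.
evaluators = [
    {"name": "Toxicity", "description": "Flags abusive language", "metrics": ["toxicity_score"]},
    {"name": "Groundedness", "description": "Checks answers against context", "metrics": ["grounded"]},
]

print([e["name"] for e in evaluators if matches(e, "score")])  # ['Toxicity']
```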
June 3, 2024
- improvement
Filter API keys by status
On the API keys page in Settings, you can now quickly filter the table by API key status. You can view them all, or filter by enabled, disabled, and revoked statuses.
June 1, 2024
- improvement
Update display name and email
You can now update your display name and email address. This will help you to keep your profile up-to-date. We will add further profile controls in due time.