Modelmetry Changelog

This page documents the changes to Modelmetry. We are constantly improving the platform, and we want to keep you updated on what's new.

February 5, 2025
- improvement
  30% Faster guardrail checks
  Guardrail checks now run 30% faster than before. A guardrail generally contains multiple evaluations, which run in parallel. We have optimised how we dispatch these evaluations and await their completion, which speeds up the entire guardrail check.
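  As a rough illustration of the dispatch-and-await pattern described above, here is a minimal Python sketch. It is not Modelmetry's implementation: `run_evaluation`, `run_guardrail_check`, and the evaluation names are hypothetical, and the sleep only simulates work.

```python
import asyncio


async def run_evaluation(name: str, delay: float) -> tuple[str, bool]:
    """Hypothetical stand-in for one evaluation: sleeps to simulate work, then passes."""
    await asyncio.sleep(delay)
    return name, True


async def run_guardrail_check(evaluations: dict[str, float]) -> dict[str, bool]:
    """Dispatch every evaluation at once, then wait for all of them to finish."""
    results = await asyncio.gather(
        *(run_evaluation(name, delay) for name, delay in evaluations.items())
    )
    return dict(results)


def check(evaluations: dict[str, float]) -> dict[str, bool]:
    """Synchronous entry point for the sketch."""
    return asyncio.run(run_guardrail_check(evaluations))
```

  Because the evaluations run concurrently, the whole check takes roughly as long as its slowest evaluation rather than the sum of all of them.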
December 14, 2024
- feature
  Add RAG documents to User messages in the payload editor
  You can now add RAG documents to User messages in the payload editor. This gives more context to evaluators that need to assess the groundedness of an assistant response.
December 12, 2024
- improvement
  Autocomplete Hints in Automation Rule Editor
  Small quality of life improvement: we added autocomplete hints in the automation rule editor. This will help you to quickly build your automation rules and avoid typos.
December 1, 2024
- feature
  Dashboards with analytics
  A huge amount of work went into creating analytics dashboards so you can view, at a glance, key metrics, the status of your guardrail checks, and the performance of your evaluators. This is the first iteration of our analytics and we will be adding more widgets and features in the future.
  A dashboard allows you to add specific widgets which are data visualisation components. You can add widgets to your dashboard by clicking on the "Add widget" button.
November 18, 2024
- improvement
  Clickable hints in grading editor
  On top of autocomplete, you will now see pills of the available findings for you to use in the expression editor. You can simply click the pill to insert it in the editor. This will help you to quickly build your expression and avoid typos.
November 11, 2024
- feature
  New flexible grading system
  Previously, evaluators decided whether a payload passed or failed the evaluation. This is no longer the case. Instead, an evaluator now returns one or many findings (metrics, booleans, labels) and you decide when these pass or fail using an assessment expression. This gives you more flexibility and control over the evaluation process: you can define multiple findings and combine them into more complex evaluation rules.
  You can keep things very simple with, for example:
```
findings.score > 1
```
  Or do complex rules such as:
```
(findings.score > 1 and "clean" in findings.tags) || (findings.score == 100)
```
  Thanks to this, you can now show different messages depending on which specific outcome or rule passed or failed, which was a highly requested feature!
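  To make the idea concrete, the sketch below mirrors the complex rule above in plain Python and returns a different message per rule. It only illustrates the logic: the `grade` function, the messages, and the shape of `findings` are assumptions, and it is not the expression engine the grading editor uses.

```python
def grade(findings: dict) -> tuple[bool, str]:
    """Return (passed, message) for a set of findings.

    Hypothetical mirror of:
    (findings.score > 1 and "clean" in findings.tags) || (findings.score == 100)
    """
    score = findings.get("score", 0)
    tags = findings.get("tags", [])

    if score == 100:
        return True, "perfect score"
    if score > 1 and "clean" in tags:
        return True, "score above threshold and tagged clean"
    return False, "no passing rule matched"
```

  For example, `grade({"score": 5, "tags": ["clean"]})` passes through the second rule, while the same score without the `clean` tag fails.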
November 1, 2024
- improvement
  Evaluate a payload by config (JS SDK)
  Until now, you could only perform evaluations via guardrail checks or traces with automations to trigger evaluations.
  This is no longer the case. You can now run an evaluation by providing its details:
  - The evaluator's id (e.g., modelmetry.boolean-llm-as-judge.v1)
  - The evaluator's configuration
  - The payload to evaluate
  - Any additional data, such as whether to persist the results (e.g., the metrics) and any secrets needed to run the evaluation (e.g., API keys for LLM providers).
  This will allow you to evaluate payloads directly from your codebase, without having to create a guardrail check or a trace.
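  The four inputs listed above could be modelled roughly as follows. This is a sketch in Python for illustration only, not the JS SDK's actual API: the `EvaluationRequest` class, its field names, and the config, payload, and secret shapes are all hypothetical. Only the evaluator id comes from the example above.

```python
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class EvaluationRequest:
    """Hypothetical container for the inputs of an evaluation by config."""

    evaluator_id: str                # the evaluator's id
    config: dict[str, Any]           # the evaluator's configuration
    payload: dict[str, Any]          # the payload to evaluate
    persist: bool = False            # whether to persist the results (e.g., the metrics)
    secrets: dict[str, str] = field(default_factory=dict)  # e.g., API keys for LLM providers


request = EvaluationRequest(
    evaluator_id="modelmetry.boolean-llm-as-judge.v1",
    config={"question": "Is the answer polite?"},      # assumed config shape
    payload={"output": "Happy to help with that."},    # assumed payload shape
    persist=True,
    secrets={"openai": "<your-api-key>"},              # placeholder, not a real key
)

# A plain dict is convenient if the request has to be serialised.
request_body = asdict(request)
```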
October 19, 2024
- feature
  Copy and paste snippets to use guardrails in your code
  You can now view TypeScript and Python code snippets for each guardrail. This will help you to quickly integrate guardrails in your codebase. Simply click the Code button in your list of guardrails.
October 7, 2024
- improvement
  Collapsible sidebar for more
  The sidebar is now collapsible on desktop and other large screens to allow more screen real estate for the main content. You can collapse the sidebar by clicking on the "Sidebar" button in the breadcrumbs area. If you close your browser or navigate away, the sidebar will remain collapsed.
September 18, 2024
- feature
  Test evaluators in app
  We know how difficult it can be to fine-tune an evaluator's configuration. Therefore, we have added a testing screen directly in the app. You can now test your evaluator by providing a payload and configuring the evaluation instance's parameters.
September 14, 2024
- improvement
  Attach secrets to evaluators
  Once you create a secret, you can attach it to an evaluator instance. This will allow you to use the secret when calling an external third-party API.
  Go to an evaluator instance, and in the Secrets area, select the secret(s) you want to attach. Whilst you can attach multiple secrets to an instance, you can only attach one secret per provider (e.g., one OpenAI secret and one Google Cloud secret).
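  The one-secret-per-provider rule can be pictured with the short Python sketch below. The `attach_secret` helper, the provider keys, and the secret names are hypothetical and only illustrate the constraint, not how the app enforces it.

```python
def attach_secret(attached: dict[str, str], provider: str, secret_name: str) -> dict[str, str]:
    """Return a new provider-to-secret mapping with the secret attached.

    Raises ValueError if the provider already has a secret attached.
    """
    if provider in attached:
        raise ValueError(f"a secret is already attached for provider {provider!r}")
    return {**attached, provider: secret_name}


secrets = attach_secret({}, "openai", "openai-prod")
secrets = attach_secret(secrets, "google-cloud", "gcp-service-account")
```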
September 11, 2024
- feature
  Vault and secrets
  The new Vault is where you can manage your secrets. You can create secrets, update them, and delete them.
  You can use the Vault to store sensitive information, such as API keys, tokens, and Google Cloud service accounts, for defined providers. We cannot show you a secret's value, but we do store a small preview to help you recognise which secret it is.
  Secrets can be attached to evaluator instances so they can be used when calling external third-party APIs (e.g., OpenAI API, Google Cloud, Azure, Groq).
  Your secrets are stored encrypted. All secrets are encrypted using AES-256 and they are only decrypted when used for a third-party API call.
September 1, 2024
- feature
  View completion span payload
  You can now view beautifully formatted payloads for completion spans.
  For example, you can see the chat thread, system prompt, and the LLM output in a beautiful interface. We also added tabs to view completion options (e.g., temperature, model, tools, provider) as well as context and RAG.
August 25, 2024
- feature
  Explore trace spans
  When you are viewing a trace, you can now click on a span to view its details. Explore each span's payload, findings, metrics, logs, and events.
August 17, 2024
- feature
  View details of a guardrail check
  Click on a guardrail check in the table to view its details, including the check's outcome, metrics, and a breakdown of the different evaluations run for this check.
August 4, 2024
- improvement
  View billing usage
  You can now view your billing usage. This will help you to understand how much you have used our various features and how much you have remaining.
  This is available to billing viewers in Settings, then Billing.
July 25, 2024
- feature
  Change project or company logo
  Tenant administrators can now change the project or company logo. This will help you to customise the look and feel of your dashboard. Simply go to the Settings page, then General and click on the Change your logo button.
July 22, 2024
- feature
  Users can update their display picture
  All signed in users can now update their display picture. Simply go to the Account page and click on the Change your avatar button.
July 21, 2024
- feature
  Users can now (finally) update their password
  All users can now update their password. Simply go to the Account page and click on the Change your password button.
  If you have not signed in recently, you will be required to sign out and sign in again to be allowed a password change (for security reasons).
July 12, 2024
- feature
  Custom Role-Based Access Control
  Administrators can now create, update, and remove custom roles. Until today, you could only use one of the three standard roles: member, administrator, and owner.
  With custom roles, you can hand-pick which actions you want to explicitly allow or deny, or leave unset (i.e., implicitly deny).
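  The allow, deny, and unset states can be sketched as below. This is a minimal Python illustration of the implicit-deny rule described above; the `Effect` enum, the `is_allowed` helper, and the action names are hypothetical.

```python
from enum import Enum


class Effect(Enum):
    ALLOW = "allow"
    DENY = "deny"


def is_allowed(role: dict[str, Effect], action: str) -> bool:
    """An action is permitted only when explicitly allowed.

    Explicit DENY and unset (missing) actions both result in a denial.
    """
    return role.get(action) is Effect.ALLOW


# Hypothetical custom role: one action allowed, one denied, everything else unset.
auditor = {
    "guardrails.read": Effect.ALLOW,
    "guardrails.delete": Effect.DENY,
}
```

  With this role, `is_allowed(auditor, "guardrails.read")` is true, while both the explicitly denied `guardrails.delete` and an unset action such as `secrets.read` are refused.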
June 30, 2024
- improvement
  View evaluator metrics
  When you are looking to create a new evaluator instance, you can now view the evaluator's metrics. This will help you to understand the evaluator's capabilities and limitations.
June 16, 2024
- improvement
  Quick search for findings charts
  You can now quickly search for metrics charts by their name. This will help you to quickly find the chart you are interested in.
June 7, 2024
- improvement
  Filter guardrail calls
  You can now filter guardrail calls by their date, guardrail, outcome, and metrics (numeric findings). This will help you to quickly find the guardrail calls you are interested in.
June 5, 2024
- improvement
  Filter evaluators by category (+ text search)
  When you are looking to create a new evaluator instance, you can now browse our list of evaluators using categories or search them by text.
  The text search will look at the evaluator's name, description, and metrics, when checking whether it is relevant or not.
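  A search over those three fields might look like the sketch below. It assumes a simple case-insensitive substring match, which may differ from how the app actually ranks or matches results; the `matches` helper and the sample evaluator are hypothetical.

```python
def matches(evaluator: dict, query: str) -> bool:
    """Case-insensitive substring match over name, description, and metric names."""
    needle = query.strip().lower()
    haystack = [evaluator["name"], evaluator["description"], *evaluator["metrics"]]
    return any(needle in text.lower() for text in haystack)


# Hypothetical evaluator entry used only to exercise the search.
sample = {
    "name": "Toxicity detector",
    "description": "Flags abusive or hateful language.",
    "metrics": ["toxicity-score", "severity"],
}
```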
June 3, 2024
- improvement
  Filter API keys by status
  On the API keys page in Settings, you can now quickly filter the table by API key status. You can view all keys, or filter by enabled, disabled, and revoked statuses.
June 1, 2024
- improvement
  Update display name and email
  You can now update your display name and email address. This will help you to keep your profile up-to-date. We will add further profile controls in due time.
