Modelmetry Changelog

This page documents the changes to Modelmetry. We are constantly improving the platform, and we want to keep you updated on what's new.

  • December 14, 2024

    • feature

      Add RAG documents to User messages in the payload editor

      You can now add RAG documents to User messages in the payload editor. This gives more context to evaluators that assess the groundedness of an assistant response.

  • December 12, 2024

    • improvement

      Autocomplete Hints in Automation Rule Editor

      A small quality-of-life improvement: we added autocomplete hints to the automation rule editor. These help you build automation rules quickly and avoid typos.

  • December 1, 2024

    • feature

      Dashboards with analytics

      A huge amount of work went into creating analytics dashboards so you can view, at a glance, key metrics, the status of your guardrail checks, and the performance of your evaluators. This is the first iteration of our analytics and we will be adding more widgets and features in the future.

      A dashboard allows you to add specific widgets which are data visualisation components. You can add widgets to your dashboard by clicking on the "Add widget" button.

  • November 18, 2024

    • improvement

      Clickable hints in grading editor

      On top of autocomplete, you will now see pills of the available findings for you to use in the expression editor. You can simply click the pill to insert it in the editor. This will help you to quickly build your expression and avoid typos.

  • November 11, 2024

    • feature

      New flexible grading system

      Previously, evaluators decided whether a payload passed or failed the evaluation. This is no longer the case. Instead, an evaluator now returns one or more findings (metrics, booleans, labels) and you decide when these pass or fail using an assessment expression. This new system gives you more flexibility and control over the evaluation process: you can combine multiple findings to create more complex evaluation rules.

      You can keep things very simple with, for example:

      findings.score > 1

      Or do complex rules such as:

      (findings.score > 1 and "clean" in findings.tags) || (findings.score == 100)

      Thanks to this, you can now show different messages based on the specific outcome or rule that passed or failed, which was a highly requested feature!
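
      To make the logic of the two example expressions concrete, here is a minimal sketch that mirrors them in plain TypeScript. It does not use Modelmetry's expression engine: the Findings type, the function names, and the outcome messages are all hypothetical and exist only for illustration.

```typescript
// Hypothetical shape of the findings returned by one evaluator (an assumption).
type Findings = {
  score: number;
  tags: string[];
};

// Mirrors the simple expression: findings.score > 1
function simpleRule(findings: Findings): boolean {
  return findings.score > 1;
}

// Mirrors: (findings.score > 1 and "clean" in findings.tags) || (findings.score == 100)
function complexRule(findings: Findings): boolean {
  const isClean = findings.tags.indexOf("clean") !== -1;
  return (findings.score > 1 && isClean) || findings.score === 100;
}

// Picks a message depending on which part of the rule matched.
// The message texts are illustrative only.
function outcomeMessage(findings: Findings): string {
  if (findings.score === 100) {
    return "passed: perfect score";
  }
  if (findings.score > 1 && findings.tags.indexOf("clean") !== -1) {
    return "passed: clean with sufficient score";
  }
  return "failed: score too low or not tagged clean";
}

// Example: a payload tagged "clean" with a score of 5 passes both rules.
const exampleFindings: Findings = { score: 5, tags: ["clean"] };
const exampleVerdict: boolean = simpleRule(exampleFindings) && complexRule(exampleFindings);
```

      In the product you write the expression itself in the grading editor; the sketch only shows how the two rules behave for a given set of findings.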

  • November 1, 2024

    • improvement

      Evaluate a payload by config (JS SDK)

      Until now, you could only perform evaluations via guardrail checks or traces with automations to trigger evaluations.

      This is no longer the case. You can now run an evaluation by providing its details:

      • The evaluator's id (e.g., modelmetry.boolean-llm-as-judge.v1)
      • The evaluator's configuration
      • The payload to evaluate
      • Any additional data, such as whether to persist the results (e.g., the metrics) and any secrets needed to run the evaluation (e.g., API keys for LLM providers).

      This will allow you to evaluate payloads directly from your codebase, without having to create a guardrail check or a trace.
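
      As a rough illustration of the four pieces of information listed above, the sketch below assembles them into a single request object. This is not the JS SDK's actual API: the EvaluationRequest interface, the buildEvaluationRequest helper, and every field name are assumptions made for this example, as are the sample config and payload contents.

```typescript
// Hypothetical request shape. Field names are assumptions, not the SDK's API.
interface EvaluationRequest {
  evaluatorId: string;                     // e.g. "modelmetry.boolean-llm-as-judge.v1"
  config: { [key: string]: unknown };      // the evaluator's configuration
  payload: { [key: string]: unknown };     // the payload to evaluate
  persist: boolean;                        // whether to persist results such as metrics
  secrets: { [provider: string]: string }; // e.g. API keys for LLM providers
}

// Hypothetical helper that validates the evaluator id and fills in defaults.
function buildEvaluationRequest(
  evaluatorId: string,
  config: { [key: string]: unknown },
  payload: { [key: string]: unknown },
  options?: { persist?: boolean; secrets?: { [provider: string]: string } }
): EvaluationRequest {
  if (evaluatorId.trim() === "") {
    throw new Error("evaluatorId is required");
  }
  return {
    evaluatorId: evaluatorId,
    config: config,
    payload: payload,
    // Default to not persisting results and to no secrets.
    persist: options !== undefined && options.persist === true,
    secrets: options !== undefined && options.secrets !== undefined ? options.secrets : {}
  };
}

// Example values below are placeholders, not real configuration keys.
const exampleRequest: EvaluationRequest = buildEvaluationRequest(
  "modelmetry.boolean-llm-as-judge.v1",
  { instructions: "Is the answer polite?" },
  { output: "Happy to help!" },
  { persist: false, secrets: { openai: "YOUR_API_KEY" } }
);
```

      Refer to the SDK documentation for the real method and field names; the sketch only shows which inputs an evaluation by config needs.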

  • October 19, 2024

    • feature

      Copy and paste snippets to use guardrails in your code

      You can now view TypeScript and Python code snippets for each guardrail. This will help you quickly integrate guardrails into your codebase. Simply click the Code button in your list of guardrails.

  • October 7, 2024

    • improvement

      Collapsible sidebar for more screen space

      The sidebar is now collapsible on desktop and other large screens to allow more screen real estate for the main content. You can collapse the sidebar by clicking on the "Sidebar" button in the breadcrumbs area. If you close your browser or navigate away, the sidebar will remain collapsed.

  • September 18, 2024

    • feature

      Test evaluators in app

      We know how difficult it can be to fine-tune an evaluator's configuration. Therefore, we have added a testing screen directly in the app. You can now test your evaluator by providing a payload and configuring the evaluation instance's parameters.

  • September 14, 2024

    • improvement

      Attach secrets to evaluators

      Once you create a secret, you can attach it to an evaluator instance. This will allow you to use the secret when calling an external third-party API.

      Go to an evaluator instance, and in the Secrets area, select the secret(s) you want to attach. Whilst you can attach multiple secrets to an instance, you can only attach one secret per provider (e.g., one OpenAI secret and one Google Cloud secret).
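
      The one-secret-per-provider rule can be pictured as a simple check over the selected secrets. The sketch below is only an illustration of that rule: the Secret type, the provider identifiers, and the duplicateProviders function are hypothetical and are not part of Modelmetry.

```typescript
// Hypothetical secret reference: an id plus the provider it belongs to.
type Secret = { id: string; provider: string };

// Returns each provider that appears more than once in the selection.
// An empty result means the selection respects the one-per-provider rule.
function duplicateProviders(secrets: Secret[]): string[] {
  const seen: string[] = [];
  const duplicates: string[] = [];
  secrets.forEach(function (secret) {
    if (seen.indexOf(secret.provider) === -1) {
      seen.push(secret.provider);
    } else if (duplicates.indexOf(secret.provider) === -1) {
      duplicates.push(secret.provider);
    }
  });
  return duplicates;
}

// One OpenAI secret and one Google Cloud secret: a valid selection.
const validSelection: Secret[] = [
  { id: "sec_1", provider: "openai" },
  { id: "sec_2", provider: "google-cloud" }
];
const selectionIsValid: boolean = duplicateProviders(validSelection).length === 0;
```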

  • September 11, 2024

    • feature

      Vault and secrets

      The new Vault is where you can manage your secrets. You can create secrets, update them, and delete them.

      You can use the Vault to store sensitive information, such as API keys, tokens, and Google Cloud service accounts, for defined providers. We cannot show you a secret's value, but we do store a small preview to help you identify the secret.

      Specific secrets can be attached to evaluator instances so they can be used when calling external third-party APIs (e.g., OpenAI API, Google Cloud, Azure, Groq).

      Your secrets are stored encrypted using AES-256 and are only decrypted when used for a third-party API call.

  • September 1, 2024

    • feature

      View completion span payload

      You can now view beautifully formatted payloads for completion spans.

      For example, you can see the chat thread, system prompt, and the LLM output in a beautiful interface. We also added tabs to view completion options (e.g., temperature, model, tools, provider) as well as context and RAG.

  • August 25, 2024

    • feature

      Explore trace spans

      When you are viewing a trace, you can now click on a span to view its details. Explore each span's payload, findings, metrics, logs, and events.

  • August 17, 2024

    • feature

      View details of a guardrail check

      Click on a guardrail check in the table to view its details, including the check's outcome, metrics, and a breakdown of the different evaluations run for this check.

  • August 4, 2024

    • improvement

      View billing usage

      You can now view your billing usage. This will help you to understand how much you have used our various features and how much you have remaining.

      This is available to billing viewers in Settings, then Billing.

  • July 25, 2024

    • feature

      Change project or company logo

      Tenant administrators can now change the project or company logo. This will help you to customise the look and feel of your dashboard. Simply go to the Settings page, then General and click on the Change your logo button.

  • July 22, 2024

    • feature

      Users can update their display picture

      All signed in users can now update their display picture. Simply go to the Account page and click on the Change your avatar button.

  • July 21, 2024

    • feature

      Users can now (finally) update their password

      All users can now update their password. Simply go to the Account page and click on the Change your password button.

      If you have not signed in recently, you will be required to sign out and sign in again to be allowed a password change (for security reasons).

  • July 12, 2024

    • feature

      Custom Role-Based Access Control

      Administrators can now create, update, and remove custom roles. Until today, you could only use one of the three standard roles: member, administrator, and owner.

      With custom roles, you can hand-pick which actions you want to explicitly allow or deny, or leave unset (i.e., implicitly deny).
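
      The three states described above (explicit allow, explicit deny, and unset) can be sketched as a small lookup in which only an explicit allow grants access. The types, the isAllowed function, and the action names below are hypothetical and only illustrate the implicit-deny behaviour for a single role.

```typescript
// An action is either explicitly allowed, explicitly denied, or left unset.
type Decision = "allow" | "deny";

// Hypothetical role definition: action name -> decision (missing key = unset).
type RolePermissions = { [action: string]: Decision | undefined };

// Only an explicit "allow" grants access.
// Both an explicit "deny" and an unset action result in a denial.
function isAllowed(role: RolePermissions, action: string): boolean {
  return role[action] === "allow";
}

// Illustrative role: the action names are made up for this example.
const auditorRole: RolePermissions = {
  "secrets.read": "allow",
  "secrets.delete": "deny"
};

const canRead: boolean = isAllowed(auditorRole, "secrets.read");     // explicit allow
const canDelete: boolean = isAllowed(auditorRole, "secrets.delete"); // explicit deny
const canCreate: boolean = isAllowed(auditorRole, "secrets.create"); // unset, implicit deny
```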

  • June 30, 2024

    • improvement

      View evaluator metrics

      When you are looking to create a new evaluator instance, you can now view the evaluator's metrics. This will help you to understand the evaluator's capabilities and limitations.

  • June 16, 2024

    • improvement

      Quick search for findings charts

      You can now quickly search for metrics charts by their name. This will help you to quickly find the chart you are interested in.

  • June 7, 2024

    • improvement

      Filter guardrail calls

      You can now filter guardrail calls by their date, guardrail, outcome, and metrics (numeric findings). This will help you to quickly find the guardrail calls you are interested in.

  • June 5, 2024

    • improvement

      Filter evaluators by category (+ text search)

      When you are looking to create a new evaluator instance, you can now browse our list of evaluators using categories or search them by text.

      The text search will look at the evaluator's name, description, and metrics, when checking whether it is relevant or not.

  • June 3, 2024

    • improvement

      Filter API keys by status

      On the API keys page in Settings, you can now quickly filter the table by API key status. You can view all keys, or filter by enabled, disabled, and revoked statuses.

  • June 1, 2024

    • improvement

      Update display name and email

      You can now update your display name and email address. This will help you to keep your profile up-to-date. We will add further profile controls in due time.