
Models

Configure and use LLM providers including OpenAI, Anthropic, Google, and Mistral



Mastra provides a unified interface for working with LLMs across multiple providers.

Topics

Models Overview


Supported Providers

  • OpenAI (GPT-4, GPT-3.5)
  • Anthropic (Claude)
  • Google (Gemini)
  • Mistral
  • Azure OpenAI
  • And more...

Basic Configuration

import { Mastra } from '@mastra/core';
import { openai, anthropic } from '@mastra/models';

export const mastra = new Mastra({
  models: {
    gpt4: openai('gpt-4'),
    claude: anthropic('claude-3-opus'),
  },
});

Using Models

const agent = mastra.getAgent('myAgent', {
  model: mastra.models.gpt4,
});

Model Selection

Choose models based on:

  • Task complexity - Simple tasks can use smaller models
  • Latency requirements - Smaller models are faster
  • Cost - Balance performance and cost
  • Capabilities - Some models excel at specific tasks
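
The trade-offs above can be sketched as a simple selection helper. This is purely illustrative, not part of the Mastra API; the model names are examples, and real selection logic would reflect your own cost and latency budgets:

function pickModel(task: {
  complexity: 'simple' | 'complex';
  latencySensitive: boolean;
}): string {
  if (task.complexity === 'complex') {
    // Hard tasks get the most capable model, accepting higher cost.
    return 'gpt-4o';
  }
  // Simple tasks: prefer a faster, cheaper model when latency matters.
  return task.latencySensitive ? 'gpt-3.5-turbo' : 'gpt-4-turbo';
}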

Embeddings

Text embedding models for RAG and similarity search.

Supported Embedders

  • OpenAI (text-embedding-3-small, text-embedding-3-large)
  • Cohere
  • Hugging Face
  • Azure OpenAI

Configuration

import { embed } from '@mastra/models';

const embedder = embed.openai({
  model: 'text-embedding-3-small',
  dimensions: 1536,
});

Generating Embeddings

const embeddings = await embedder.embed({
  texts: ['Hello world', 'How are you?'],
});

console.log(embeddings[0]); // [0.123, -0.456, ...]
console.log(embeddings[1]); // [0.789, -0.012, ...]
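
Vectors like these drive similarity search: two texts are "close" when the cosine of the angle between their embeddings is near 1. A minimal cosine-similarity helper (not provided by Mastra, shown for illustration):

// Cosine similarity between two equal-length embedding vectors.
// Returns a value in [-1, 1]; higher means more similar.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}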

Batch Processing

const texts = await loadDocuments();
const batches = chunkArray(texts, 100);

for (const batch of batches) {
  const embeddings = await embedder.embed({ texts: batch });
  await store.insert(embeddings);
}
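
chunkArray in the loop above is not part of Mastra; a minimal implementation might look like this:

// Split an array into consecutive chunks of at most `size` elements.
function chunkArray<T>(items: T[], size: number): T[][] {
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

Batching keeps each request under provider payload limits and lets you parallelize or retry failed batches independently.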

Dimension Sizes

| Model | Dimensions |
|-------|------------|
| text-embedding-3-small | 1536 (or 512, 256) |
| text-embedding-3-large | 3072 (or 1024, 256) |
| text-embedding-ada-002 | 1536 |

Model Providers

Configure connections to LLM providers.

Overview

Mastra supports multiple LLM providers. Each provider has its own configuration requirements.

OpenAI Provider

Configure OpenAI models.

Installation

npm install openai

Configuration

import { openai } from '@mastra/models';

const gpt4 = openai('gpt-4', {
  apiKey: process.env.OPENAI_API_KEY,
});

Models

| Model | Description | Context |
|-------|-------------|---------|
| gpt-4o | Most capable, fast | 128k |
| gpt-4-turbo | Fast, cost-effective | 128k |
| gpt-4 | High intelligence | 8k |
| gpt-3.5-turbo | Fast, affordable | 16k |

Usage

const response = await gpt4.generate({
  prompt: 'Hello!',
  maxTokens: 100,
});

Streaming

const stream = await gpt4.stream({
  prompt: 'Tell me a story',
});

for await (const chunk of stream) {
  process.stdout.write(chunk.text);
}

Anthropic Provider

Configure Anthropic Claude models.

Installation

npm install @anthropic-ai/sdk

Configuration

import { anthropic } from '@mastra/models';

const claude = anthropic('claude-3-opus', {
  apiKey: process.env.ANTHROPIC_API_KEY,
});

Models

| Model | Description | Context |
|-------|-------------|---------|
| claude-opus-4 | Highest intelligence | 200k |
| claude-sonnet-4 | Balanced | 200k |
| claude-3-opus | High capability | 200k |
| claude-3-sonnet | Balanced | 200k |
| claude-3-haiku | Fast | 200k |

Usage

const response = await claude.generate({
  prompt: 'Hello!',
  maxTokens: 100,
});

System Prompts

Claude models respond well to detailed system prompts:

const response = await claude.generate({
  system: 'You are a helpful assistant.',
  prompt: 'Hello!',
  maxTokens: 100,
});

Google Provider

Configure Google Gemini models.

Installation

npm install @google/generative-ai

Configuration

import { google } from '@mastra/models';

const gemini = google('gemini-pro', {
  apiKey: process.env.GOOGLE_API_KEY,
});

Models

| Model | Description | Context |
|-------|-------------|---------|
| gemini-2.5-pro | Most capable | 1M |
| gemini-2.0-flash | Fast | 1M |
| gemini-1.5-pro | High capability | 1M |
| gemini-1.5-flash | Fast | 1M |

Usage

const response = await gemini.generate({
  prompt: 'Hello!',
  maxTokens: 100,
});

Multimodal

const response = await gemini.generate({
  prompt: 'What is in this image?',
  images: [imageBuffer],
});

Mistral Provider

Configure Mistral AI models.

Installation

npm install @mistralai/mistralai

Configuration

import { mistral } from '@mastra/models';

const mistralModel = mistral('mistral-large', {
  apiKey: process.env.MISTRAL_API_KEY,
});

Models

| Model | Description |
|-------|-------------|
| mistral-large | Most capable |
| mistral-medium | Balanced |
| mistral-small | Fast |

Usage

const response = await mistralModel.generate({
  prompt: 'Hello!',
  maxTokens: 100,
});

Azure OpenAI Provider

Configure Azure OpenAI models.

Installation

npm install @azure/openai

Configuration

import { azure } from '@mastra/models';

const gpt4 = azure('gpt-4', {
  endpoint: process.env.AZURE_OPENAI_ENDPOINT,
  apiKey: process.env.AZURE_OPENAI_API_KEY,
  apiVersion: '2024-02-01',
});

Deployment

Azure OpenAI addresses models through named deployments rather than model names alone, so pass the deployment name explicitly:

const gpt4 = azure('gpt-4', {
  endpoint: process.env.AZURE_OPENAI_ENDPOINT,
  deploymentName: 'gpt-4',
  apiKey: process.env.AZURE_OPENAI_API_KEY,
});

Usage

const response = await gpt4.generate({
  prompt: 'Hello!',
  maxTokens: 100,
});

Streaming

const stream = await gpt4.stream({
  prompt: 'Tell me a story',
});

for await (const chunk of stream) {
  process.stdout.write(chunk.text);
}

Model Gateways

Model gateways provide routing, load balancing, and fallback capabilities.

Overview

Gateways let you use multiple models with automatic failover and load balancing.

OpenAI Gateway

Route requests to multiple OpenAI models with fallback.

Configuration

import { createGateway } from '@mastra/models';

const openaiGateway = createGateway({
  provider: 'openai',
  models: ['gpt-4', 'gpt-3.5-turbo'],
  strategy: 'fallback', // or 'load-balance'
});

Fallback Strategy

If a request to one model fails, the gateway retries with the next model in the list:

const gateway = createGateway({
  provider: 'openai',
  models: ['gpt-4', 'gpt-3.5-turbo'],
  strategy: 'fallback',
});
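
Conceptually, fallback routing behaves like the loop below. This is a sketch of the strategy, not the gateway's actual implementation; the `generate` callback stands in for a real provider call:

// Try each model in order; return the first successful result,
// or rethrow the last error if every model fails.
type Generate = (model: string, prompt: string) => Promise<string>;

async function generateWithFallback(
  models: string[],
  prompt: string,
  generate: Generate,
): Promise<string> {
  let lastError: unknown;
  for (const model of models) {
    try {
      return await generate(model, prompt);
    } catch (err) {
      lastError = err; // remember the failure and try the next model
    }
  }
  throw lastError;
}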

Load Balancing

Distribute requests across models:

const gateway = createGateway({
  provider: 'openai',
  models: ['gpt-4', 'gpt-3.5-turbo'],
  strategy: 'load-balance',
  weights: [0.2, 0.8], // 20% to gpt-4, 80% to gpt-3.5
});
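
Weighted load balancing amounts to weighted random selection. A sketch of the idea (illustrative; the gateway's internal algorithm may differ), with the random draw injectable so the behavior is testable:

// Pick an index according to `weights`; `r` is a uniform random
// number in [0, 1), injectable for deterministic testing.
function pickWeighted(weights: number[], r: number = Math.random()): number {
  const total = weights.reduce((sum, w) => sum + w, 0);
  let threshold = r * total;
  for (let i = 0; i < weights.length; i++) {
    threshold -= weights[i];
    if (threshold < 0) return i;
  }
  return weights.length - 1; // guard against floating-point rounding
}

With weights [0.2, 0.8], roughly 20% of draws land on index 0 and 80% on index 1, matching the configuration above.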

Usage

const response = await gateway.generate({
  prompt: 'Hello',
});

Anthropic Gateway

Route requests to multiple Anthropic models with fallback.

Configuration

import { createGateway } from '@mastra/models';

const anthropicGateway = createGateway({
  provider: 'anthropic',
  models: ['claude-3-opus', 'claude-3-sonnet'],
  strategy: 'fallback',
});

Fallback Strategy

const gateway = createGateway({
  provider: 'anthropic',
  models: ['claude-3-opus', 'claude-3-sonnet', 'claude-3-haiku'],
  strategy: 'fallback',
});

Usage

const response = await gateway.generate({
  prompt: 'Explain quantum computing',
  maxTokens: 1024,
});