# @azure/ai-voicelive (JavaScript/TypeScript)
Real-time voice AI SDK for building bidirectional voice assistants with Azure AI in Node.js and browser environments.
## Installation
```bash
npm install @azure/ai-voicelive @azure/identity
# TypeScript users
npm install @types/node
```
**Current Version**: 1.0.0-beta.3
**Supported Environments**:
- Node.js LTS versions (20+)
- Modern browsers (Chrome, Firefox, Safari, Edge)
## Environment Variables
```bash
AZURE_VOICELIVE_ENDPOINT=https://<resource>.cognitiveservices.azure.com
# Optional: API key if not using Entra ID
AZURE_VOICELIVE_API_KEY=<your-api-key>
# Optional: Logging
AZURE_LOG_LEVEL=info
```
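Which credential to construct follows directly from these variables; a small sketch of that decision (the helper name is mine, not part of the SDK):

```typescript
// Decide the auth path from the environment variables above:
// prefer Entra ID unless an API key is explicitly provided.
// authModeFromEnv is an illustrative helper, not an SDK function.
type AuthMode = "entra-id" | "api-key";

function authModeFromEnv(env: Record<string, string | undefined>): AuthMode {
  return env.AZURE_VOICELIVE_API_KEY ? "api-key" : "entra-id";
}
```

With no key set, this falls back to Entra ID, matching the recommendation in the next section.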
## Authentication
### Microsoft Entra ID (Recommended)
```typescript
import { DefaultAzureCredential } from "@azure/identity";
import { VoiceLiveClient } from "@azure/ai-voicelive";
const credential = new DefaultAzureCredential();
const endpoint = "https://your-resource.cognitiveservices.azure.com";
const client = new VoiceLiveClient(endpoint, credential);
```
### API Key
```typescript
import { AzureKeyCredential } from "@azure/core-auth";
import { VoiceLiveClient } from "@azure/ai-voicelive";
const endpoint = "https://your-resource.cognitiveservices.azure.com";
const credential = new AzureKeyCredential("your-api-key");
const client = new VoiceLiveClient(endpoint, credential);
```
## Client Hierarchy
```
VoiceLiveClient
└── VoiceLiveSession (WebSocket connection)
    ├── updateSession()        → Configure session options
    ├── subscribe()            → Event handlers (Azure SDK pattern)
    ├── sendAudio()            → Stream audio input
    ├── addConversationItem()  → Add messages/function outputs
    └── sendEvent()            → Send raw protocol events
```
## Quick Start
```typescript
import { DefaultAzureCredential } from "@azure/identity";
import { VoiceLiveClient } from "@azure/ai-voicelive";
const credential = new DefaultAzureCredential();
const endpoint = process.env.AZURE_VOICELIVE_ENDPOINT!;
// Create client and start session
const client = new VoiceLiveClient(endpoint, credential);
const session = await client.startSession("gpt-4o-mini-realtime-preview");
// Configure session
await session.updateSession({
modalities: ["text", "audio"],
instructions: "You are a helpful AI assistant. Respond naturally.",
voice: {
type: "azure-standard",
name: "en-US-AvaNeural",
},
turnDetection: {
type: "server_vad",
threshold: 0.5,
prefixPaddingMs: 300,
silenceDurationMs: 500,
},
inputAudioFormat: "pcm16",
outputAudioFormat: "pcm16",
});
// Subscribe to events
const subscription = session.subscribe({
onResponseAudioDelta: async (event, context) => {
// Handle streaming audio output (playAudioChunk is your app's playback helper)
const audioData = event.delta;
playAudioChunk(audioData);
},
onResponseTextDelta: async (event, context) => {
// Handle streaming text
process.stdout.write(event.delta);
},
onConversationItemInputAudioTranscriptionCompleted: async (event, context) => {
console.log("User said:", event.transcript);
},
});
// Send audio from microphone
function sendAudioChunk(audioBuffer: ArrayBuffer) {
session.sendAudio(audioBuffer);
}
```
## Session Configuration
```typescript
await session.updateSession({
// Modalities
modalities: ["audio", "text"],
// System instructions
instructions: "You are a customer service representative.",
// Voice selection
voice: {
type: "azure-standard", // or "azure-custom", "openai"
name: "en-US-AvaNeural",
},
// Turn detection (VAD)
turnDetection: {
type: "server_vad", // or "azure_semantic_vad"
threshold: 0.5,
prefixPaddingMs: 300,
silenceDurationMs: 500,
},
// Audio formats
inputAudioFormat: "pcm16",
outputAudioFormat: "pcm16",
// Tools (function calling)
tools: [
{
type: "function",
name: "get_weather",
description: "Get current weather",
parameters: {
type: "object",
properties: {
location: { type: "string" }
},
required: ["location"]
}
}
],
toolChoice: "auto",
});
```
## Event Handling (Azure SDK Pattern)
The SDK uses a subscription-based event handling pattern:
```typescript
const subscription = session.subscribe({
// Connection lifecycle
onConnected: async (args, context) => {
console.log("Connected:", args.connectionId);
},
onDisconnected: async (args, context) => {
console.log("Disconnected:", args.code, args.reason);
},
onError: async (args, context) => {
console.error("Error:", args.error.message);
},
// Session events
onSessionCreated: async (event, context) => {
console.log("Session created:", context.sessionId);
},
onSessionUpdated: async (event, context) => {
console.log("Session updated");
},
// Audio input events (VAD)
onInputAudioBufferSpeechStarted: async (event, context) => {
console.log("Speech started at:", event.audioStartMs);
},
onInputAudioBufferSpeechStopped: async (event, context) => {
console.log("Speech stopped at:", event.audioEndMs);
},
// Transcription events
onConversationItemInputAudioTranscriptionCompleted: async (event, context) => {
console.log("User said:", event.transcript);
},
onConversationItemInputAudioTranscriptionDelta: async (event, context) => {
process.stdout.write(event.delta);
},
// Response events
onResponseCreated: async (event, context) => {
console.log("Response started");
},
onResponseDone: async (event, context) => {
console.log("Response complete");
},
// Streaming text
onResponseTextDelta: async (event, context) => {
process.stdout.write(event.delta);
},
onResponseTextDone: async (event, context) => {
console.log("\n--- Text complete ---");
},
// Streaming audio
onResponseAudioDelta: async (event, context) => {
const audioData = event.delta;
playAudioChunk(audioData);
},
onResponseAudioDone: async (event, context) => {
console.log("Audio complete");
},
// Audio transcript (what assistant said)
onResponseAudioTranscriptDelta: async (event, context) => {
process.stdout.write(event.delta);
},
// Function calling
onResponseFunctionCallArgumentsDone: async (event, context) => {
if (event.name === "get_weather") {
const args = JSON.parse(event.arguments);
const result = await getWeather(args.location);
await session.addConversationItem({
type: "function_call_output",
callId: event.callId,
output: JSON.stringify(result),
});
await session.sendEvent({ type: "response.create" });
}
},
// Catch-all for debugging
onServerEvent: async (event, context) => {
console.log("Event:", event.type);
},
});
// Clean up when done
await subscription.close();
```
## Function Calling
```typescript
// Define tools in session config
await session.updateSession({
modalities: ["audio", "text"],
instructions: "Help users with weather information.",
tools: [
{
type: "function",
name: "get_weather",
description: "Get current weather for a location",
parameters: {
type: "object",
properties: {
location: {
type: "string",
description: "City and state or country",
},
},
required: ["location"],
},
},
],
toolChoice: "auto",
});
// Handle function calls
const subscription = session.subscribe({
onResponseFunctionCallArgumentsDone: async (event, context) => {
if (event.name === "get_weather") {
const args = JSON.parse(event.arguments);
const weatherData = await fetchWeather(args.location);
// Send function result
await session.addConversationItem({
type: "function_call_output",
callId: event.callId,
output: JSON.stringify(weatherData),
});
// Trigger response generation
await session.sendEvent({ type: "response.create" });
}
},
});
```
## Voice Options
| Voice Type | Config | Example |
|------------|--------|---------|
| Azure Standard | `{ type: "azure-standard", name: "..." }` | `"en-US-AvaNeural"` |
| Azure Custom | `{ type: "azure-custom", name: "...", endpointId: "..." }` | Custom voice endpoint |
| Azure Personal | `{ type: "azure-personal", speakerProfileId: "..." }` | Personal voice clone |
| OpenAI | `{ type: "openai", name: "..." }` | `"alloy"`, `"echo"`, `"shimmer"` |
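As a quick sketch, the four shapes from the table as literal config objects (the specific names, endpoint IDs, and profile IDs are placeholders):

```typescript
// Voice config shapes from the table above. The name/ID values are
// placeholders; substitute your own voice names and resource IDs.
const azureStandard = { type: "azure-standard", name: "en-US-AvaNeural" };

const azureCustom = {
  type: "azure-custom",
  name: "my-custom-voice",     // placeholder custom voice name
  endpointId: "<endpoint-id>", // placeholder custom voice endpoint
};

const azurePersonal = {
  type: "azure-personal",
  speakerProfileId: "<profile-id>", // placeholder personal voice profile
};

const openaiVoice = { type: "openai", name: "alloy" };
```

Any of these can be passed as the `voice` field of `updateSession()`.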
## Supported Models
| Model | Description | Use Case |
|-------|-------------|----------|
| `gpt-4o-realtime-preview` | GPT-4o with real-time audio | High-quality conversational AI |
| `gpt-4o-mini-realtime-preview` | Lightweight GPT-4o | Fast, efficient interactions |
| `phi4-mm-realtime` | Phi multimodal | Cost-effective applications |
## Turn Detection Options
```typescript
// Server VAD (default)
turnDetection: {
type: "server_vad",
threshold: 0.5,
prefixPaddingMs: 300,
silenceDurationMs: 500,
}
// Azure Semantic VAD (smarter detection)
turnDetection: {
type: "azure_semantic_vad",
}
// Azure Semantic VAD (English optimized)
turnDetection: {
type: "azure_semantic_vad_en",
}
// Azure Semantic VAD (Multilingual)
turnDetection: {
type: "azure_semantic_vad_multilingual",
}
```
## Audio Formats
| Format | Sample Rate | Use Case |
|--------|-------------|----------|
| `pcm16` | 24kHz | Default, high quality |
| `pcm16-8000hz` | 8kHz | Telephony |
| `pcm16-16000hz` | 16kHz | Voice assistants |
| `g711_ulaw` | 8kHz | Telephony (US) |
| `g711_alaw` | 8kHz | Telephony (EU) |
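Choosing a format per scenario can be captured in a tiny helper; the helper and scenario names are illustrative, only the format strings come from the table:

```typescript
// Map a deployment scenario to an audio format string from the table.
// Scenario names and the helper itself are illustrative, not SDK API.
type Scenario = "default" | "telephony-us" | "telephony-eu" | "voice-assistant";

function audioFormatFor(scenario: Scenario): string {
  switch (scenario) {
    case "telephony-us":
      return "g711_ulaw"; // 8 kHz mu-law
    case "telephony-eu":
      return "g711_alaw"; // 8 kHz A-law
    case "voice-assistant":
      return "pcm16-16000hz"; // 16 kHz PCM
    default:
      return "pcm16"; // 24 kHz PCM, highest quality
  }
}
```

The result goes into `inputAudioFormat`/`outputAudioFormat` in the session config.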
## Key Types Reference
| Type | Purpose |
|------|---------|
| `VoiceLiveClient` | Main client for creating sessions |
| `VoiceLiveSession` | Active WebSocket session |
| `VoiceLiveSessionHandlers` | Event handler interface |
| `VoiceLiveSubscription` | Active event subscription |
| `ConnectionContext` | Context for connection events |
| `SessionContext` | Context for session events |
| `ServerEventUnion` | Union of all server events |
## Error Handling
```typescript
import {
VoiceLiveError,
VoiceLiveConnectionError,
VoiceLiveAuthenticationError,
VoiceLiveProtocolError,
} from "@azure/ai-voicelive";
const subscription = session.subscribe({
onError: async (args, context) => {
const { error } = args;
if (error instanceof VoiceLiveConnectionError) {
console.error("Connection error:", error.message);
} else if (error instanceof VoiceLiveAuthenticationError) {
console.error("Auth error:", error.message);
} else if (error instanceof VoiceLiveProtocolError) {
console.error("Protocol error:", error.message);
}
},
onServerError: async (event, context) => {
console.error("Server error:", event.error?.message);
},
});
```
## Logging
```typescript
import { setLogLevel } from "@azure/logger";
// Enable informational logging
setLogLevel("info");
// Or via environment variable
// AZURE_LOG_LEVEL=info
```
## Browser Usage
```typescript
// Browser requires bundler (Vite, webpack, etc.)
import { VoiceLiveClient } from "@azure/ai-voicelive";
import { InteractiveBrowserCredential } from "@azure/identity";
// Use browser-compatible credential
const credential = new InteractiveBrowserCredential({
clientId: "your-client-id",
tenantId: "your-tenant-id",
});
const client = new VoiceLiveClient(endpoint, credential);
// Request microphone access
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const audioContext = new AudioContext({ sampleRate: 24000 });
// Process audio and send to session
// ... (see samples for full implementation)
```
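One detail the snippet above glosses over is sample conversion: `getUserMedia` audio arrives from the Web Audio API as Float32 samples in [-1, 1], while `pcm16` expects 16-bit integers. A minimal conversion sketch (the function name is mine, not part of the SDK):

```typescript
// Convert Float32 samples in [-1, 1] (as produced by the Web Audio API)
// to 16-bit PCM suitable for session.sendAudio().
function floatTo16BitPcm(samples: Float32Array): ArrayBuffer {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    // Clamp, then scale asymmetrically so both -1 and +1 stay in range.
    const s = Math.max(-1, Math.min(1, samples[i]));
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return out.buffer;
}
```

Wire this between your `AudioContext` processing node and `session.sendAudio()`.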
## Best Practices
1. **Always use `DefaultAzureCredential`**: never hardcode API keys
2. **Set both modalities**: include `["text", "audio"]` for voice assistants
3. **Use Azure Semantic VAD**: better turn detection than basic server VAD
4. **Handle all error types**: connection, auth, and protocol errors
5. **Clean up subscriptions**: call `subscription.close()` when done
6. **Use an appropriate audio format**: PCM16 at 24 kHz for best quality
## Reference Links
| Resource | URL |
|----------|-----|
| npm Package | https://www.npmjs.com/package/@azure/ai-voicelive |
| GitHub Source | https://github.com/Azure/azure-sdk-for-js/tree/main/sdk/ai/ai-voicelive |
| Samples | https://github.com/Azure/azure-sdk-for-js/tree/main/sdk/ai/ai-voicelive/samples |
| API Reference | https://learn.microsoft.com/javascript/api/@azure/ai-voicelive |