Video AI Indexing

Video AI indexing enables your AI agent to search and answer questions about video content. Videos are automatically analyzed, transcribed, and indexed for semantic search with timestamp context.

How It Works

Video Upload → Processing Pipeline → AI Knowledge Base
     │                │                     │
     │                ▼                     │
     │         ┌─────────────┐             │
     │         │ Segmentation│             │
     │         │ (60s chunks)│             │
     │         └─────────────┘             │
     │                │                     │
     │                ▼                     │
     │         ┌──────────────┐            │
     │         │ Transcription│            │
     │         │ (Audio→Text) │            │
     │         └──────────────┘            │
     │                │                     │
     │                ▼                     │
     │         ┌─────────────┐             │
     │         │ AI Analysis │             │
     │         │ (Content)   │             │
     │         └─────────────┘             │
     │                │                     │
     │                ▼                     │
     └────────► Embeddings ────────────────┘
  1. Segmentation - Video split into time-based chunks (default: 60 seconds)
  2. Transcription - Audio converted to text for each segment
  3. AI Analysis - Visual and audio content analyzed
  4. Embedding - Segments embedded for semantic search
  5. Indexing - Content added to knowledge base with timestamps
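
The segmentation step above can be sketched as a pure function. This is an illustrative sketch of the chunking math only, not the platform's implementation:

```typescript
// Sketch: time-based chunking, defaulting to 60-second segments.
interface Segment {
  index: number;
  start: number; // seconds
  end: number;   // seconds
}

function segmentVideo(duration: number, segmentDuration = 60): Segment[] {
  const segments: Segment[] = [];
  for (let start = 0; start < duration; start += segmentDuration) {
    segments.push({
      index: segments.length,
      start,
      // Final segment is clipped to the video's duration
      end: Math.min(start + segmentDuration, duration),
    });
  }
  return segments;
}

// A 150-second video yields three chunks: 0-60, 60-120, 120-150
const segs = segmentVideo(150);
```

Each resulting segment is then transcribed, analyzed, and embedded with its start/end timestamps attached.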

Enabling Video AI

From Media Library

  1. Go to the CMS → Media tab
  2. Upload a video or select an existing one
  3. In the file details panel, find AI Indexing
  4. Toggle Index for AI to ON

Configuration Options

| Option | Description | Default |
|--------|-------------|---------|
| Segment Duration | Length of each chunk in seconds | 60 |
| Detect Techniques | Enable domain-specific recognition | On |
| Extract Audio | Transcribe speech to text | On |
| Digestion Instructions | Custom context for AI analysis | None |
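
These options correspond to the `analysisConfig` object used in the API Integration section. As a sketch, assuming the field names shown there, the defaults look like this:

```typescript
// Hypothetical type for the analysis config; field names match the
// SDK's analysisConfig, but the exact type shape is an assumption.
interface AnalysisConfig {
  segmentDuration: number;         // seconds per chunk
  detectTechniques: boolean;       // domain-specific recognition
  extractAudio: boolean;           // speech-to-text transcription
  digestionInstructions?: string;  // custom context for AI analysis
}

const defaults: AnalysisConfig = {
  segmentDuration: 60,
  detectTechniques: true,
  extractAudio: true,
  // digestionInstructions omitted: no default
};
```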

Starting Processing

  1. Configure your options
  2. Click Start Processing
  3. Monitor progress in the status indicator

Processing Status

Status Indicators

| Status | Icon | Description |
|--------|------|-------------|
| Not Indexed | ○ | Video not processed |
| Queued | ◔ | Waiting to start |
| Uploading | ◑ | Sent to processor |
| Processing | ◕ | Analyzing content |
| Completed | ● | Ready for AI search |
| Failed | ⊗ | Error occurred |

Progress Tracking

While processing:

  • Progress bar - Overall completion (0-100%)
  • Stage - Current processing step
  • Estimated time - Approximate time remaining

Processing Stages

  1. Queued - Job created, waiting in queue
  2. Uploading - Video sent to processing service
  3. Segmenting - Breaking video into chunks
  4. Transcribing - Converting audio to text
  5. Analyzing - AI analyzing each segment
  6. Embedding - Creating vector embeddings
  7. Complete - Indexed and searchable

Configuration Details

Segment Duration

Controls how the video is chunked:

| Duration | Best For |
|----------|----------|
| 30 seconds | Fast-paced content, frequent topic changes |
| 60 seconds | General purpose (recommended) |
| 120 seconds | Lectures, long-form discussions |
| 300 seconds | Minimal segmentation |

Shorter segments yield more precise timestamps but require more processing time.

Detect Techniques

When enabled, AI identifies domain-specific elements:

  • Educational - Topics, concepts, examples
  • Technical - Code, diagrams, demonstrations
  • Product - Features, comparisons, use cases

Techniques appear in the processing results.

Extract Audio

When enabled:

  • Speech is transcribed to text
  • Transcription indexed for search
  • AI can quote or reference speech

When disabled:

  • Only visual content is analyzed
  • Faster processing
  • Use for music, silent content, etc.

Digestion Instructions

Provide context to improve AI analysis:

Example instructions:

"This is a cooking tutorial. Focus on ingredients,
techniques, and timing. Note any safety warnings."

"This is a software demo. Identify features shown,
keyboard shortcuts mentioned, and tips shared."

"This is an interview. Track who is speaking and
summarize key points made by each person."

Good instructions help AI:

  • Focus on relevant details
  • Use appropriate terminology
  • Extract the most useful information

Querying Video Content

Automatic Integration

Once indexed, video content is automatically searched when users ask questions:

User: "What did the tutorial cover about error handling?"

Agent: "In the tutorial, error handling is covered starting at
       timestamp 12:30. The key points were:
       1. Always wrap API calls in try-catch blocks
       2. Log errors with context for debugging
       3. Show user-friendly messages to customers"

Timestamp References

AI can reference specific moments:

  • "At 5:30 in the video..."
  • "The section starting at 15:00 discusses..."
  • "Between 10:00-12:00, you can see..."
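
Timestamps are exchanged with the API in seconds, so a small helper is useful for rendering them in the m:ss form shown above (the function name is illustrative):

```typescript
// Convert a timestamp in seconds to the "m:ss" display form,
// e.g. 330 → "5:30", 55 → "0:55".
function formatTimestamp(seconds: number): string {
  const m = Math.floor(seconds / 60);
  const s = Math.floor(seconds % 60);
  return `${m}:${s.toString().padStart(2, "0")}`;
}
```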

Video Context in Chat

When users are watching a video, provide the video context in your chat request:

const response = await fetch('/api/sdk/agent/chat', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'X-Fig1-API-Key': 'your-api-key'
  },
  body: JSON.stringify({
    message: "What technique is being shown right now?",
    context: {
      video: {
        videoId: 'video-id',           // ID from video processing
        currentTimestamp: 330,          // Playback position in seconds (5:30)
        title: 'React Hooks Tutorial',  // Optional: video title for context
        duration: 1800                  // Optional: total duration (30 min)
      }
    }
  })
});

The AI agent will automatically:

  1. Prioritize video content near the current timestamp
  2. Reference specific timestamps in responses
  3. Include play_video actions to suggest jumping to relevant moments

Full Integration Example

Here's a complete React integration for a video training app:

import { useState, useRef } from 'react';

function VideoTrainingChat({ exercise }) {
  const videoRef = useRef<HTMLVideoElement>(null);
  const [sessionId, setSessionId] = useState<string | null>(null);

  const sendMessage = async (message: string) => {
    const video = videoRef.current;

    const response = await fetch('/api/chat', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        message,
        sessionId,
        context: {
          // Link to the CMS content being viewed
          page: `/exercises/${exercise.slug}`,
          contentIds: [exercise._id],

          // Video playback context
          video: {
            videoId: exercise.videoId,
            currentTimestamp: video?.currentTime || 0,
            title: exercise.title,
            duration: video?.duration || 0
          },

          // Optional: user subscription for access control
          user: {
            isAuthenticated: true,
            subscriptions: ['premium']
          }
        }
      })
    });

    const data = await response.json();

    if (data.success) {
      setSessionId(data.data.sessionId);

      // Handle video navigation actions
      if (data.data.actions) {
        for (const action of data.data.actions) {
          if (action.type === 'play_video' && action.payload.timestamp != null && video) {
            // HTMLVideoElement has no seekTo(); seek by setting currentTime
            video.currentTime = action.payload.timestamp;
          }
        }
      }

      return data.data.message;
    }
  };

  return (
    <div>
      <video ref={videoRef} src={exercise.videoUrl} />
      {/* Chat UI component */}
    </div>
  );
}

Key Context Fields

| Field | Type | Description |
|-------|------|-------------|
| video.videoId | string | Required. Must match the ID from video processing |
| video.currentTimestamp | number | Current playback position in seconds |
| video.title | string | Video title for AI context |
| video.duration | number | Total duration for progress context |
| contentIds | string[] | CMS content IDs (exercise, lesson, etc.) |
| page | string | Current page/route identifier |
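
The table maps naturally onto a TypeScript type for safer client code. This shape is inferred from the fields above, not an official SDK export:

```typescript
// Sketch of the chat context payload; optionality is inferred
// from the table (only video.videoId is marked required).
interface VideoContext {
  videoId: string;            // must match the ID from video processing
  currentTimestamp?: number;  // playback position in seconds
  title?: string;
  duration?: number;
}

interface ChatContext {
  video?: VideoContext;
  contentIds?: string[];  // CMS content IDs (exercise, lesson, etc.)
  page?: string;          // current page/route identifier
}

// Hypothetical example payload
const ctx: ChatContext = {
  video: { videoId: "vid_123", currentTimestamp: 330 },
  contentIds: ["exercise-id"],
  page: "/exercises/example-slug",
};
```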

Example Conversation

User watching at 0:40:

"What technique is being shown here?"

AI Response:

At 0:40, the instructor is demonstrating hand placement for the rear naked choke.
Key points:
- Wrap the choking arm around the neck
- Align bicep against one side of the throat
- Keep elbow directly under the chin

Would you like me to explain the finishing squeeze shown at 0:55?

With action:

{
  "actions": [{
    "type": "play_video",
    "payload": { "videoId": "vid_123", "timestamp": 55 },
    "label": "Jump to finishing technique"
  }]
}

Managing Indexed Videos

Viewing Results

After processing completes:

  1. Select the video in Media Library
  2. View the AI Index section
  3. See:
    • Total segments created
    • Techniques detected
    • Transcript length
    • Processing date

Re-indexing

To re-process a video:

  1. Select the video
  2. Click Re-index
  3. Optionally change settings
  4. Start processing

Previous index is replaced when complete.

Removing from Index

To stop AI from using a video:

  1. Select the video
  2. Toggle Index for AI to OFF
  3. Confirm removal

Video file is preserved; only the AI index is removed.

API Integration

Manage video processing via SDK:

// Start video processing
const job = await fig1.knowledge.processVideo({
  videoUrl: 'https://cdn.fig1.ai/videos/tutorial.mp4',
  fileName: 'tutorial.mp4',
  duration: 3600,
  analysisConfig: {
    segmentDuration: 60,
    detectTechniques: true,
    extractAudio: true,
    digestionInstructions: 'This is a coding tutorial...'
  }
});

console.log('Job started:', job.jobId);

// Check processing status
const status = await fig1.knowledge.getVideoJobStatus(job.jobId);
console.log(`Progress: ${status.progress}%`);

// List all video jobs
const jobs = await fig1.knowledge.listVideoJobs({
  status: 'processing'
});

// Wait for completion
const result = await fig1.knowledge.waitForVideoProcessing(job.jobId, {
  pollInterval: 5000,
  onProgress: (s) => console.log(`${s.progress}% - ${s.stage}`)
});

if (result.status === 'completed') {
  console.log(`Indexed ${result.result.totalSegments} segments`);
}

See the Knowledge API Reference for complete documentation.

Processing Limits

| Tier | Max Duration | Max Size | Concurrent Jobs |
|------|--------------|----------|-----------------|
| Starter | 60 minutes | 500 MB | 2 |
| Pro | 180 minutes | 2 GB | 5 |
| Enterprise | Unlimited | Custom | Custom |

Processing time varies by video length and complexity.

Supported Formats

| Format | Extension | Notes |
|--------|-----------|-------|
| MP4 | .mp4 | Recommended |
| WebM | .webm | Full support |
| MOV | .mov | QuickTime |
| AVI | .avi | Basic support |

For best results, use MP4 with H.264 video and AAC audio.

Best Practices

Choose Videos Wisely

Index videos that users will actually ask questions about:

Good candidates:

  • Training and tutorials
  • Product demonstrations
  • Educational lectures
  • How-to guides
  • Webinar recordings

Poor candidates:

  • Background music
  • Generic b-roll footage
  • Very short clips
  • Content already covered in text

Optimize Segment Duration

| Content Type | Recommended Duration |
|--------------|----------------------|
| Fast tutorials | 30-45 seconds |
| General content | 60 seconds |
| Lectures | 90-120 seconds |
| Slow discussions | 120+ seconds |

Write Good Instructions

Good:
"Technical tutorial about React hooks. Focus on code
examples, common mistakes, and best practices mentioned."

Bad:
"This is a video"

Handle Long Videos

For very long videos (1+ hours):

  1. Consider breaking into chapters
  2. Use longer segment durations
  3. Expect longer processing times
  4. Index during off-peak hours

Troubleshooting

Processing Failed

Common causes:

  • Unsupported format - Convert to MP4
  • Corrupted file - Re-upload the video
  • Too long - Check tier limits
  • No audio track - Disable audio extraction

Poor Search Results

Improve results by:

  • Adding digestion instructions
  • Using shorter segments
  • Ensuring audio is clear
  • Checking transcript quality

Slow Processing

Processing time depends on:

  • Video length
  • Resolution
  • Audio complexity
  • Server load

Long videos may take 10-30 minutes.

Timestamps Inaccurate

Timestamp precision depends on segment duration:

  • 60s segments = ±30s accuracy
  • 30s segments = ±15s accuracy

Use shorter segments for time-sensitive content.

Cost Considerations

Video processing uses more resources than text:

  • Processing costs scale with video length
  • Storage required for embeddings
  • Query costs slightly higher

Consider indexing only high-value videos.

Next Steps