Video AI Indexing

Video AI indexing enables your AI agent to search and answer questions about video content. Videos are automatically analyzed, transcribed, and indexed for semantic search with timestamp context.

How It Works

Video Upload → Processing Pipeline → AI Knowledge Base
     │                │                     │
     │                ▼                     │
     │         ┌─────────────┐             │
     │         │ Segmentation│             │
     │         │ (60s chunks)│             │
     │         └─────────────┘             │
     │                │                     │
     │                ▼                     │
     │         ┌──────────────┐            │
     │         │ Transcription│            │
     │         │ (Audio→Text) │            │
     │         └──────────────┘            │
     │                │                     │
     │                ▼                     │
     │         ┌─────────────┐             │
     │         │ AI Analysis │             │
     │         │ (Content)   │             │
     │         └─────────────┘             │
     │                │                     │
     │                ▼                     │
     └────────► Embeddings ────────────────┘
  1. Segmentation - Video split into time-based chunks (default: 60 seconds)
  2. Transcription - Audio converted to text for each segment
  3. AI Analysis - Visual and audio content analyzed
  4. Embedding - Segments embedded for semantic search
  5. Indexing - Content added to knowledge base with timestamps
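
The segmentation step above can be sketched as a pure function. This is an illustrative sketch of the chunking math only, not the platform's implementation:

```typescript
// Sketch: time-based chunking, defaulting to 60-second segments.
interface Segment {
  index: number;
  start: number; // seconds
  end: number;   // seconds
}

function segmentVideo(duration: number, segmentDuration = 60): Segment[] {
  const segments: Segment[] = [];
  for (let start = 0; start < duration; start += segmentDuration) {
    segments.push({
      index: segments.length,
      start,
      // Final segment is clipped to the video's duration
      end: Math.min(start + segmentDuration, duration),
    });
  }
  return segments;
}

// A 150-second video yields three chunks: 0-60, 60-120, 120-150
const segs = segmentVideo(150);
```

Each resulting segment is then transcribed, analyzed, and embedded with its start/end timestamps attached.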

Enabling Video AI

From Media Library

  1. Go to the CMS → Media tab
  2. Upload a video or select an existing one
  3. In the file details panel, find AI Indexing
  4. Toggle Index for AI to ON

Configuration Options

| Option | Description | Default |
|--------|-------------|---------|
| Segment Duration | Length of each chunk in seconds | 60 |
| Detect Techniques | Enable domain-specific recognition | On |
| Extract Audio | Transcribe speech to text | On |
| Digestion Instructions | Custom context for AI analysis | None |
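
These options correspond to the `analysisConfig` object used in the API Integration section. As a sketch, assuming the field names shown there, the defaults look like this:

```typescript
// Hypothetical type for the analysis config; field names match the
// SDK's analysisConfig, but the exact type shape is an assumption.
interface AnalysisConfig {
  segmentDuration: number;         // seconds per chunk
  detectTechniques: boolean;       // domain-specific recognition
  extractAudio: boolean;           // speech-to-text transcription
  digestionInstructions?: string;  // custom context for AI analysis
}

const defaults: AnalysisConfig = {
  segmentDuration: 60,
  detectTechniques: true,
  extractAudio: true,
  // digestionInstructions omitted: no default
};
```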

Starting Processing

  1. Configure your options
  2. Click Start Processing
  3. Monitor progress in the status indicator

Processing Status

Status Indicators

| Status | Icon | Description |
|--------|------|-------------|
| Not Indexed | ○ | Video not processed |
| Queued | ◔ | Waiting to start |
| Uploading | ◑ | Sent to processor |
| Processing | ◕ | Analyzing content |
| Completed | ● | Ready for AI search |
| Failed | ⊗ | Error occurred |

Progress Tracking

While processing:

  • Progress bar - Overall completion (0-100%)
  • Stage - Current processing step
  • Estimated time - Approximate time remaining

Processing Stages

  1. Queued - Job created, waiting in queue
  2. Uploading - Video sent to processing service
  3. Segmenting - Breaking video into chunks
  4. Transcribing - Converting audio to text
  5. Analyzing - AI analyzing each segment
  6. Embedding - Creating vector embeddings
  7. Complete - Indexed and searchable

Configuration Details

Segment Duration

Controls how the video is chunked:

| Duration | Best For |
|----------|----------|
| 30 seconds | Fast-paced content, frequent topic changes |
| 60 seconds | General purpose (recommended) |
| 120 seconds | Lectures, long-form discussions |
| 300 seconds | Minimal segmentation |

Shorter segments yield more precise timestamps but require more processing time.

Detect Techniques

When enabled, AI identifies domain-specific elements:

  • Educational - Topics, concepts, examples
  • Technical - Code, diagrams, demonstrations
  • Product - Features, comparisons, use cases

Techniques appear in the processing results.

Extract Audio

When enabled:

  • Speech is transcribed to text
  • Transcription indexed for search
  • AI can quote or reference speech

When disabled:

  • Only visual content is analyzed
  • Faster processing
  • Use for music, silent content, etc.

Digestion Instructions

Provide context to improve AI analysis:

Example instructions:

"This is a cooking tutorial. Focus on ingredients,
techniques, and timing. Note any safety warnings."

"This is a software demo. Identify features shown,
keyboard shortcuts mentioned, and tips shared."

"This is an interview. Track who is speaking and
summarize key points made by each person."

Good instructions help AI:

  • Focus on relevant details
  • Use appropriate terminology
  • Extract the most useful information

Querying Video Content

Automatic Integration

Once indexed, video content is automatically searched when users ask questions:

User: "What did the tutorial cover about error handling?"

Agent: "In the tutorial, error handling is covered starting at
       timestamp 12:30. The key points were:
       1. Always wrap API calls in try-catch blocks
       2. Log errors with context for debugging
       3. Show user-friendly messages to customers"

Timestamp References

AI can reference specific moments:

  • "At 5:30 in the video..."
  • "The section starting at 15:00 discusses..."
  • "Between 10:00-12:00, you can see..."
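
Timestamps are exchanged with the API in seconds, so a small helper is useful for rendering them in the m:ss form shown above (the function name is illustrative):

```typescript
// Convert a timestamp in seconds to the "m:ss" display form,
// e.g. 330 → "5:30", 55 → "0:55".
function formatTimestamp(seconds: number): string {
  const m = Math.floor(seconds / 60);
  const s = Math.floor(seconds % 60);
  return `${m}:${s.toString().padStart(2, "0")}`;
}
```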

Video Context in Chat

When users are watching a video, provide the video context in your chat request:

const response = await fetch('/api/sdk/agent/chat', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'X-Fig1-API-Key': 'your-api-key'
  },
  body: JSON.stringify({
    message: "What technique is being shown right now?",
    context: {
      video: {
        videoId: 'video-id',           // ID from video processing
        currentTimestamp: 330,          // Playback position in seconds (5:30)
        title: 'React Hooks Tutorial',  // Optional: video title for context
        duration: 1800                  // Optional: total duration (30 min)
      }
    }
  })
});

The AI agent will automatically:

  1. Prioritize video content near the current timestamp
  2. Reference specific timestamps in responses
  3. Include play_video actions to suggest jumping to relevant moments

Full Integration Example

Here's a complete React integration for a video training app:

import { useState, useRef } from 'react';

function VideoTrainingChat({ exercise }) {
  const videoRef = useRef<HTMLVideoElement>(null);
  const [sessionId, setSessionId] = useState<string | null>(null);

  const sendMessage = async (message: string) => {
    const video = videoRef.current;

    const response = await fetch('/api/chat', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        message,
        sessionId,
        context: {
          // Link to the CMS content being viewed
          page: `/exercises/${exercise.slug}`,
          contentIds: [exercise._id],

          // Video playback context
          video: {
            videoId: exercise.videoId,
            currentTimestamp: video?.currentTime || 0,
            title: exercise.title,
            duration: video?.duration || 0
          },

          // Optional: user subscription for access control
          user: {
            isAuthenticated: true,
            subscriptions: ['premium']
          }
        }
      })
    });

    const data = await response.json();

    if (data.success) {
      setSessionId(data.data.sessionId);

      // Handle video navigation actions
      if (data.data.actions) {
        for (const action of data.data.actions) {
          if (action.type === 'play_video' && action.payload.timestamp != null && video) {
            // HTMLVideoElement has no seekTo(); seek by setting currentTime
            video.currentTime = action.payload.timestamp;
          }
        }
      }

      return data.data.message;
    }
  };

  return (
    <div>
      <video ref={videoRef} src={exercise.videoUrl} />
      {/* Chat UI component */}
    </div>
  );
}

Key Context Fields

| Field | Type | Description |
|-------|------|-------------|
| video.videoId | string | Required. Must match the ID from video processing |
| video.currentTimestamp | number | Current playback position in seconds |
| video.title | string | Video title for AI context |
| video.duration | number | Total duration for progress context |
| contentIds | string[] | CMS content IDs (exercise, lesson, etc.) |
| page | string | Current page/route identifier |
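
The table maps naturally onto a TypeScript type for safer client code. This shape is inferred from the fields above, not an official SDK export:

```typescript
// Sketch of the chat context payload; optionality is inferred
// from the table (only video.videoId is marked required).
interface VideoContext {
  videoId: string;            // must match the ID from video processing
  currentTimestamp?: number;  // playback position in seconds
  title?: string;
  duration?: number;
}

interface ChatContext {
  video?: VideoContext;
  contentIds?: string[];  // CMS content IDs (exercise, lesson, etc.)
  page?: string;          // current page/route identifier
}

// Hypothetical example payload
const ctx: ChatContext = {
  video: { videoId: "vid_123", currentTimestamp: 330 },
  contentIds: ["exercise-id"],
  page: "/exercises/example-slug",
};
```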

Example Conversation

User watching at 0:40:

"What technique is being shown here?"

AI Response:

At 0:40, the instructor is demonstrating hand placement for the rear naked choke.
Key points:
- Wrap the choking arm around the neck
- Align bicep against one side of the throat
- Keep elbow directly under the chin

Would you like me to explain the finishing squeeze shown at 0:55?

With action:

{
  "actions": [{
    "type": "play_video",
    "payload": { "videoId": "vid_123", "timestamp": 55 },
    "label": "Jump to finishing technique"
  }]
}

Managing Indexed Videos

Viewing Results

After processing completes:

  1. Select the video in Media Library
  2. View the AI Index section
  3. See:
    • Total segments created
    • Techniques detected
    • Transcript length
    • Processing date

Re-indexing

To re-process a video:

  1. Select the video
  2. Click Re-index
  3. Optionally change settings
  4. Start processing

Previous index is replaced when complete.

Removing from Index

To stop AI from using a video:

  1. Select the video
  2. Toggle Index for AI to OFF
  3. Confirm removal

Video file is preserved; only the AI index is removed.

API Integration

Manage video processing via SDK:

// Start video processing
const job = await fig1.knowledge.processVideo({
  videoUrl: 'https://cdn.fig1.ai/videos/tutorial.mp4',
  fileName: 'tutorial.mp4',
  duration: 3600,
  analysisConfig: {
    segmentDuration: 60,
    detectTechniques: true,
    extractAudio: true,
    digestionInstructions: 'This is a coding tutorial...'
  }
});

console.log('Job started:', job.jobId);

// Check processing status
const status = await fig1.knowledge.getVideoJobStatus(job.jobId);
console.log(`Progress: ${status.progress}%`);

// List all video jobs
const jobs = await fig1.knowledge.listVideoJobs({
  status: 'processing'
});

// Wait for completion
const result = await fig1.knowledge.waitForVideoProcessing(job.jobId, {
  pollInterval: 5000,
  onProgress: (s) => console.log(`${s.progress}% - ${s.stage}`)
});

if (result.status === 'completed') {
  console.log(`Indexed ${result.result.totalSegments} segments`);
}

See the Knowledge API Reference for complete documentation.

Processing Limits

| Tier | Max Duration | Max Size | Concurrent Jobs |
|------|--------------|----------|-----------------|
| Starter | 60 minutes | 500 MB | 2 |
| Pro | 180 minutes | 2 GB | 5 |
| Enterprise | Unlimited | Custom | Custom |

Processing time varies by video length and complexity.

Supported Formats

| Format | Extension | Notes |
|--------|-----------|-------|
| MP4 | .mp4 | Recommended |
| WebM | .webm | Full support |
| MOV | .mov | QuickTime |
| AVI | .avi | Basic support |

For best results, use MP4 with H.264 video and AAC audio.

Best Practices

Choose Videos Wisely

Index videos that users will actually ask questions about:

Good candidates:

  • Training and tutorials
  • Product demonstrations
  • Educational lectures
  • How-to guides
  • Webinar recordings

Poor candidates:

  • Background music
  • Generic b-roll footage
  • Very short clips
  • Content already covered in text

Optimize Segment Duration

| Content Type | Recommended Duration |
|--------------|----------------------|
| Fast tutorials | 30-45 seconds |
| General content | 60 seconds |
| Lectures | 90-120 seconds |
| Slow discussions | 120+ seconds |

Write Good Instructions

Good:
"Technical tutorial about React hooks. Focus on code
examples, common mistakes, and best practices mentioned."

Bad:
"This is a video"

Handle Long Videos

For very long videos (1+ hours):

  1. Consider breaking into chapters
  2. Use longer segment durations
  3. Expect longer processing times
  4. Index during off-peak hours

Troubleshooting

Processing Failed

Common causes:

  • Unsupported format - Convert to MP4
  • Corrupted file - Re-upload the video
  • Too long - Check tier limits
  • No audio track - Disable audio extraction

Poor Search Results

Improve results by:

  • Adding digestion instructions
  • Using shorter segments
  • Ensuring audio is clear
  • Checking transcript quality

Slow Processing

Processing time depends on:

  • Video length
  • Resolution
  • Audio complexity
  • Server load

Long videos may take 10-30 minutes.

Timestamps Inaccurate

Timestamp precision depends on segment duration:

  • 60s segments = ±30s accuracy
  • 30s segments = ±15s accuracy

Use shorter segments for time-sensitive content.

Cost Considerations

Video processing uses more resources than text:

  • Processing costs scale with video length
  • Storage required for embeddings
  • Query costs slightly higher

Consider indexing only high-value videos.

Next Steps