Free to use — no credit card required

Turn Any Video into a
Searchable Knowledge Base

Paste a YouTube link or upload a file. AI transcribes the audio, extracts on-screen text, indexes everything, and lets you semantically search and ask questions about any video's content.

Start Free See How It Works
50+
Languages Supported
1536
Embedding Dimensions
Features
Everything You Need
From raw video to instant answers in minutes.
Core Feature

Semantic Search

Find any moment in a video by describing what you're looking for. No need to remember exact keywords.

  • Search by meaning, not just keywords
  • Ranked results with similarity scores
  • Timestamps for instant video navigation
  • Works across any language
OpenAI text-embedding-3-small — 1536 dimensions
04:32 94.7%
The attention mechanism allows the model to focus on different parts of the input sequence when producing each element of the output...
12:18 89.2%
Self-attention computes three vectors for each token: query, key, and value. The dot product of query and key determines...
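Under the hood, ranked results like the ones above come from cosine similarity between a query embedding and the stored chunk embeddings. A minimal sketch of that ranking step, with tiny 2-D placeholder vectors standing in for the 1536-dimensional text-embedding-3-small vectors (the `search` function and tuple layout are illustrative, not FrameX's actual API):

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def search(query_vec, indexed_chunks, top_k=3):
    # indexed_chunks: list of (timestamp, text, embedding) tuples.
    scored = [
        (cosine_similarity(query_vec, emb), ts, text)
        for ts, text, emb in indexed_chunks
    ]
    scored.sort(reverse=True)  # highest similarity first
    return scored[:top_k]
```

Because the ranking is by meaning in embedding space rather than keyword overlap, a query like "how does the model decide what to focus on" still surfaces the attention-mechanism chunk.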
AI-Powered

Intelligent Q&A

Ask any question about the video and get accurate, context-aware answers with source citations.

  • Answers grounded in actual video content
  • Cites relevant transcript sections
  • Conversational follow-up questions
  • Powered by GPT-4o-mini for fast responses
RAG pipeline — top-3 context retrieval
What are the key differences between RNN and Transformer?
Based on the video, there are 3 key differences: 1) Transformers process all tokens in parallel while RNNs are sequential, 2) Transformers use self-attention instead of recurrence, 3) Transformers scale better to long sequences due to O(1) path length...
Why does parallel processing matter?
The video explains that parallel processing enables significantly faster training on GPUs. While RNNs must process tokens one-by-one, Transformers compute all positions simultaneously...
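The "grounded in actual video content" part of the RAG pipeline comes down to how the prompt is assembled: the top-3 retrieved transcript chunks are placed in front of the question before anything reaches the model. A sketch of that assembly step (the exact prompt wording and function name are illustrative; in production the resulting prompt would be sent to GPT-4o-mini):

```python
def build_rag_prompt(question, chunks):
    """Assemble a grounded prompt from the top retrieved transcript chunks.

    chunks: list of (timestamp, text) pairs, already ranked by similarity.
    """
    context = "\n".join(f"[{ts}] {text}" for ts, text in chunks)
    return (
        "Answer the question using only the transcript excerpts below. "
        "Cite timestamps in your answer.\n\n"
        f"Transcript excerpts:\n{context}\n\n"
        f"Question: {question}"
    )
```

Keeping the timestamps inside the context is what lets the answer cite specific transcript sections rather than vague sources.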
Automatic

Speech + Visual Text Extraction

High-quality transcription with Whisper plus OCR extraction of on-screen text from slides, code, and diagrams.

  • 50+ languages auto-detected
  • Precise word-level timestamps
  • OCR for slides, code, diagrams on screen
  • Upload files or paste YouTube URLs
Whisper + Tesseract OCR — free & offline
00:00:12 Welcome to today's lecture on the Transformer architecture and its impact on modern AI.
00:00:28 The paper "Attention Is All You Need" was published in 2017 by Vaswani et al.
00:00:45 It proposed replacing recurrence entirely with a mechanism called self-attention.
00:01:03 This was revolutionary because it allowed for much greater parallelization during training.
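Whisper reports segment start times as float seconds; rendering them as the HH:MM:SS stamps shown above takes only a small helper like this (the helper name is illustrative):

```python
def format_timestamp(seconds):
    # Render a float second offset as HH:MM:SS, truncating sub-second detail.
    total = int(seconds)
    h, rem = divmod(total, 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"
```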
How It Works
Three Simple Steps
From any video to knowledge base in minutes.
1 Step One

Add a Video

Paste a YouTube URL or drag & drop a video/audio file. FrameX handles downloading, audio extraction, and everything else automatically.

FrameX
Process
Valid YouTube URL detected
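A "valid YouTube URL detected" check like the one above can be sketched with a regex over the common watch and short-link forms. This pattern is illustrative and deliberately not exhaustive (it ignores playlist, shorts, and embed variants), and is an assumption about FrameX's internals, not its actual code:

```python
import re

# Matches youtube.com/watch?v=... and youtu.be/... links; illustrative only.
_YOUTUBE_RE = re.compile(
    r"^(?:https?://)?(?:www\.)?"
    r"(?:youtube\.com/watch\?v=|youtu\.be/)"
    r"([A-Za-z0-9_-]{11})"
)

def extract_video_id(url):
    """Return the 11-character video ID, or None if the URL is not YouTube."""
    m = _YOUTUBE_RE.match(url)
    return m.group(1) if m else None
```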
2 Step Two

AI Does the Heavy Lifting

FrameX downloads the video, transcribes audio with Whisper, extracts on-screen text via OCR, chunks the content, and generates semantic embeddings. Fully automatic.

Processing...
Downloading video
Extracting audio
Transcribing with Whisper
Extracting visual text (OCR)
Chunking content
Generating embeddings
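The chunking step above groups transcript segments into embedding-sized pieces, typically with some overlap so a thought isn't cut in half at a chunk boundary. A minimal sketch under assumed parameters (`max_chars` and segment-level overlap are illustrative choices, not FrameX's documented settings):

```python
def chunk_transcript(segments, max_chars=500, overlap=1):
    """Group (timestamp, text) segments into chunks of roughly max_chars.

    overlap: number of trailing segments repeated at the start of the
    next chunk so context carries across chunk boundaries.
    """
    chunks, current, size = [], [], 0
    for seg in segments:
        current.append(seg)
        size += len(seg[1])
        if size >= max_chars:
            chunks.append(current)
            current = current[-overlap:] if overlap else []
            size = sum(len(text) for _, text in current)
    if current:
        chunks.append(current)
    return chunks
```

Each resulting chunk keeps its first segment's timestamp, which is what later powers the jump-to-moment links in search results.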
3 Step Three

Search & Ask Anything

Semantically search any moment in the video or ask questions and get AI-powered answers with citations from the transcript.

FrameX — Results
Search
Q&A
04:32 94.7%
The attention mechanism allows the model to focus on different parts of the input sequence when producing each output element...
12:18 89.2%
Self-attention computes query, key, and value vectors for each token. The dot product determines how much attention each token pays...
18:45 85.6%
Multi-head attention runs several attention functions in parallel, allowing the model to jointly attend to information from different subspaces...
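The result rows above pair an MM:SS timestamp with the similarity score shown as a percentage. A small formatting sketch (the function name is illustrative; it assumes scores arrive as cosine similarities in the 0-1 range):

```python
def format_result(seconds, score):
    # MM:SS timestamp plus similarity as a percentage, e.g. "04:32 94.7%".
    m, s = divmod(int(seconds), 60)
    return f"{m:02d}:{s:02d} {score * 100:.1f}%"
```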

Ready to Try?

Free to use — process up to 3 videos, no credit card required.

Get Started Now