AI video comprehension for Claude. Paste a YouTube or Instagram URL, get full multimodal analysis — transcript, visual context, chart data, entity extraction.
Connect once. Then paste any supported URL into Claude and ask for transcription or analysis.
Supports YouTube, Instagram Reels, Vimeo, Twitter/X, TikTok, and direct MP4 URLs. Up to 2-hour runtime.
Whisper transcription in parallel with ffmpeg keyframe extraction, Tesseract OCR, and Claude Vision on each scene.
Timestamped transcript, visual assets with extracted chart data, entities mapped, contradictions flagged in batch mode.
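As a rough illustration of the pipeline shape described above, the sketch below runs a transcription branch and a keyframe branch concurrently, then analyzes each extracted frame. All function names and return shapes here are hypothetical placeholders, not Contendeo's actual code; the real steps (Whisper, ffmpeg, Tesseract, Claude Vision) are replaced with stubs so the example stays self-contained.

```python
from concurrent.futures import ThreadPoolExecutor


def transcribe(video_path: str) -> list[dict]:
    """Hypothetical stand-in for the Whisper transcription step."""
    return [{"start": 0.0, "end": 0.0, "text": f"<transcript of {video_path}>"}]


def extract_keyframes(video_path: str) -> list[str]:
    """Hypothetical stand-in for ffmpeg keyframe extraction."""
    return [f"{video_path}.frame0001.png"]


def analyze_frame(frame_path: str) -> dict:
    """Hypothetical stand-in for OCR plus a vision pass on one keyframe."""
    return {"frame": frame_path, "ocr_text": "", "description": ""}


def run_pipeline(video_path: str) -> dict:
    # Start the audio and visual branches together so neither waits on the other.
    with ThreadPoolExecutor(max_workers=2) as pool:
        transcript_future = pool.submit(transcribe, video_path)
        frames_future = pool.submit(extract_keyframes, video_path)
        transcript = transcript_future.result()
        frames = frames_future.result()
    # Per-frame analysis needs the keyframes, so it runs after extraction.
    visuals = [analyze_frame(frame) for frame in frames]
    return {"transcript": transcript, "visual_assets": visuals}
```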
Each tool is optimized for a different shape of video comprehension. Credits are refunded on failure, and cache hits are free.
TRANSCRIPTION ONLY (no visual analysis). Best for interviews, podcasts, and any content where text is enough. For vision, OCR, charts, or keyframe analysis, use deep_analyze instead.
PRIMARY tool for video understanding. Full multimodal pipeline: transcript + vision + OCR + entity extraction. Returns summary, key claims, visual assets, and charts as structured data.
Analyze a specific segment by timestamp range. Default is full multimodal (3 credits — transcript + vision + OCR on the range). Pass mode='quick' for transcript-only (1 credit). Pay the flat rate regardless of full video length.
Up to 10 videos at once with cross-video synthesis — common themes, entity overlap, contradicting claims. Default is mode='quick' (1 credit each) for bulk transcription; pass mode='deep' for multimodal across all (5 credits each). 10% off on 5+ videos.
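The credit figures quoted for segment and batch analysis can be worked through with a small calculator. This is a sketch built only from the numbers stated above; the function names, the "full" label for the default segment mode, and the use of exact fractions are assumptions, and how a fractional total is rounded is not stated.

```python
from fractions import Fraction

# Per-video prices as quoted in the tool descriptions.
# "full" is a hypothetical label for the default multimodal segment mode.
SEGMENT_CREDITS = {"full": 3, "quick": 1}
BATCH_CREDITS = {"quick": 1, "deep": 5}
BATCH_LIMIT = 10              # "Up to 10 videos at once"
DISCOUNT_THRESHOLD = 5        # "10% off on 5+ videos"
DISCOUNT = Fraction(1, 10)


def segment_cost(mode: str = "full") -> int:
    """Flat rate for one timestamp range, independent of full video length."""
    return SEGMENT_CREDITS[mode]


def batch_cost(num_videos: int, mode: str = "quick") -> Fraction:
    """Total credits for a batch, applying the 10% discount at 5+ videos."""
    if not 1 <= num_videos <= BATCH_LIMIT:
        raise ValueError(f"batch size must be between 1 and {BATCH_LIMIT}")
    total = Fraction(BATCH_CREDITS[mode] * num_videos)
    if num_videos >= DISCOUNT_THRESHOLD:
        total *= 1 - DISCOUNT
    # Rounding of fractional totals is not documented, so the exact value is returned.
    return total
```

Under these assumptions, a deep batch of 10 videos comes to 45 credits (50 less 10%), while a deep batch of 4 videos costs the undiscounted 20.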
Pay only for processing. Cache hits are free, failures are refunded automatically, unused credits roll over on paid plans.
Contendeo is a remote MCP connector. Add it once in Claude's connector settings and use it in any conversation.
https://contendeo.app/mcp/
YouTube, Instagram Reels, Vimeo, Twitter/X, TikTok, and direct MP4 / WebM URLs. yt-dlp covers 1,000+ sites in practice; the listed platforms are the ones we explicitly test and harden against bot detection.
Contendeo doesn't just transcribe: it extracts visual context from keyframes, runs OCR on charts and dashboards, identifies entities (tokens, protocols, people), and synthesizes everything into structured intelligence. A pure transcript misses anything shown on-screen but never spoken aloud.
Videos are downloaded, processed in memory, and deleted as soon as processing finishes. We never store video content. Only the analysis result is cached for 7 days, keyed by a hash of the URL, so repeat requests hit the cache instantly and cost zero credits.
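A URL-keyed result cache with a 7-day lifetime can be sketched as follows. This is illustrative only: the choice of SHA-256, hashing the raw URL without normalization, the in-memory dictionary, and the class and method names are all assumptions, since the actual hash and storage backend are not documented.

```python
import hashlib
import time

CACHE_TTL_SECONDS = 7 * 24 * 60 * 60  # analysis results are kept for 7 days


def cache_key(url: str) -> str:
    # Assumption: SHA-256 over the raw URL string, with no normalization.
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


class ResultCache:
    """Stores only analysis results, never video content."""

    def __init__(self):
        self._entries = {}  # key -> (expires_at, result)

    def put(self, url, result, now=None):
        now = time.time() if now is None else now
        self._entries[cache_key(url)] = (now + CACHE_TTL_SECONDS, result)

    def get(self, url, now=None):
        """Return the cached result, or None on a miss or an expired entry."""
        now = time.time() if now is None else now
        key = cache_key(url)
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, result = entry
        if now >= expires_at:
            del self._entries[key]
            return None
        return result
```

A caller would check `get` first and only run (and charge for) the pipeline on a miss.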
Yes. Contendeo ships with specialized vision prompts for financial charts, order books, protocol diagrams, and token metrics. Set focus to "crypto" on deep_analyze for optimized extraction — exact prices, support/resistance levels, and TVL numbers are pulled verbatim from the frame.
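In normal use Claude issues the tool call for you, but for illustration, an MCP `tools/call` request for deep_analyze with the crypto focus might look like the sketch below. The `focus` parameter and the tool name come from the text above; the `url` argument name and the helper function are assumptions, so check the connector's published tool schema for the real field names.

```python
import json


def build_tool_call(request_id, video_url, focus=None):
    """Build a JSON-RPC 2.0 request in the shape MCP uses for tool calls."""
    arguments = {"url": video_url}  # "url" is an assumed argument name
    if focus is not None:
        arguments["focus"] = focus
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "deep_analyze", "arguments": arguments},
    }


request = build_tool_call(1, "https://example.com/video.mp4", focus="crypto")
print(json.dumps(request, indent=2))
```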
Credits are automatically refunded on any failure: download error, API timeout, invalid URL, anything. Cache hits never charge credits in the first place. Every deduction and refund is logged with a full audit trail.
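The deduct-then-refund behavior can be sketched with a small ledger that records every balance change. The class, method names, and log fields are hypothetical and only illustrate the pattern; they are not Contendeo's implementation.

```python
class CreditLedger:
    def __init__(self, balance: int):
        self.balance = balance
        self.audit_log: list[dict] = []

    def _record(self, kind: str, amount: int, reason: str) -> None:
        # Every balance change leaves an entry in the audit trail.
        self.audit_log.append(
            {"kind": kind, "amount": amount, "reason": reason, "balance": self.balance}
        )

    def deduct(self, amount: int, reason: str) -> None:
        if amount > self.balance:
            raise ValueError("insufficient credits")
        self.balance -= amount
        self._record("deduct", amount, reason)

    def refund(self, amount: int, reason: str) -> None:
        self.balance += amount
        self._record("refund", amount, reason)


def run_with_refund(ledger: CreditLedger, cost: int, job):
    """Charge up front, then refund the full cost if the job raises."""
    ledger.deduct(cost, "job started")
    try:
        return job()
    except Exception as exc:
        ledger.refund(cost, f"job failed: {exc}")
        raise
```

A cache hit would skip `run_with_refund` entirely, so no deduction is ever recorded for it.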