How browsers extract audio from video
The Web Audio API and the browser's built-in media pipeline can demultiplex (demux) a video container, separating the audio stream from the video stream, and then either re-encode the audio or pass it through unchanged. For an MP4 file containing AAC audio and H.264 video, the browser reads the audio track and decodes it to raw PCM samples. The PCM is then written out as WAV, which stores PCM directly, or re-encoded to a compressed format such as MP3 or M4A. All of this happens inside your browser tab; no video data is uploaded.
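For WAV, the simplest target because it stores PCM behind a 44-byte header, the decode-then-write path can be sketched in TypeScript as below. It assumes the browser's `decodeAudioData` can read the container, which current Chromium and Firefox builds generally do for MP4 files with an AAC track, though support varies by browser and codec. The names `encodeWav` and `extractAudioToWav` and the three small interfaces are hypothetical; the interfaces only describe the parts of `AudioContext`, `AudioBuffer`, and `File` the sketch touches, so it can also be exercised outside a browser. MP3 or M4A output would replace `encodeWav` with a real encoder (WebCodecs `AudioEncoder` where the codec is supported, or a WebAssembly encoder), which is not shown.

```typescript
// Hypothetical helpers: a sketch of decode -> WAV, not any tool's actual implementation.
// These structural types cover only what the sketch uses; a real AudioContext,
// AudioBuffer, and File/Blob satisfy them.
interface DecodedAudio {
  numberOfChannels: number;
  sampleRate: number;
  getChannelData(channel: number): Float32Array;
}
interface AudioDecoderLike {
  decodeAudioData(data: ArrayBuffer): Promise<DecodedAudio>;
}
interface BinarySource {
  arrayBuffer(): Promise<ArrayBuffer>;
}

// Wrap planar float PCM (nominally -1..1) in a 16-bit PCM WAV (RIFF) container.
function encodeWav(channels: Float32Array[], sampleRate: number): ArrayBuffer {
  const numChannels = channels.length;
  const numFrames = numChannels > 0 ? channels[0].length : 0;
  const bytesPerSample = 2;
  const blockAlign = numChannels * bytesPerSample;
  const dataSize = numFrames * blockAlign;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);

  const writeAscii = (offset: number, text: string) => {
    for (let i = 0; i < text.length; i++) view.setUint8(offset + i, text.charCodeAt(i));
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // size of everything after this field
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size for plain PCM
  view.setUint16(20, 1, true); // format tag 1 = integer PCM
  view.setUint16(22, numChannels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * blockAlign, true); // byte rate
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  // Interleave channels frame by frame, clamping to [-1, 1] and scaling to int16.
  let offset = 44;
  for (let frame = 0; frame < numFrames; frame++) {
    for (let ch = 0; ch < numChannels; ch++) {
      const s = Math.max(-1, Math.min(1, channels[ch][frame]));
      view.setInt16(offset, s < 0 ? s * 0x8000 : s * 0x7fff, true);
      offset += bytesPerSample;
    }
  }
  return buffer;
}

// Let the browser demux and decode the audio track, then write the PCM as WAV.
async function extractAudioToWav(file: BinarySource, ctx: AudioDecoderLike): Promise<ArrayBuffer> {
  const encoded = await file.arrayBuffer(); // note: the whole file is read into memory
  const decoded = await ctx.decodeAudioData(encoded);
  const channels: Float32Array[] = [];
  for (let ch = 0; ch < decoded.numberOfChannels; ch++) {
    channels.push(decoded.getChannelData(ch));
  }
  return encodeWav(channels, decoded.sampleRate);
}
```

In a page you would call `extractAudioToWav(file, new AudioContext())` and wrap the result in a `Blob` with type `audio/wav` for download. Two caveats: `decodeAudioData` resamples to the context's sample rate, so constructing the context with `new AudioContext({ sampleRate: 48000 })` (set to the source's rate, if known) avoids an extra resampling step, and it needs the entire file in memory at once.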
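For large files (1 GB+ videos), the browser reads the file in chunks rather than loading it into memory all at once. Processing time scales with the video's duration rather than its file size: most of a video file's bytes are video frames, which the extractor can skip over instead of decoding, so the work is dominated by decoding and re-encoding the audio track, whose length depends only on duration. Extracting the audio from a 2-hour video takes roughly 30–60 seconds, depending on the target format and encoding settings.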
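To make the chunked-reading idea concrete, here is a sketch that walks a `File` (or any `Blob`) with `slice()` so only one chunk needs to be held at a time, plus a one-line estimate of decoded PCM size. The function names, the 8 MiB default chunk size, and the 4-byte sample size (Web Audio's `AudioBuffer` stores 32-bit floats) are assumptions for illustration, not any tool's actual settings.

```typescript
// Minimal shape of a File/Blob; a real Blob satisfies this interface.
interface SliceableBlob {
  size: number;
  slice(start: number, end: number): { arrayBuffer(): Promise<ArrayBuffer> };
}

// Compute [start, end) byte ranges that cover a file of `size` bytes.
function chunkRanges(size: number, chunkSize: number): Array<[number, number]> {
  if (chunkSize <= 0) throw new RangeError("chunkSize must be positive");
  const ranges: Array<[number, number]> = [];
  for (let start = 0; start < size; start += chunkSize) {
    ranges.push([start, Math.min(start + chunkSize, size)]);
  }
  return ranges;
}

// Read the file one slice at a time. Only the current chunk is held here;
// memory stays bounded as long as the consumer does not retain old chunks.
async function* readInChunks(
  blob: SliceableBlob,
  chunkSize = 8 * 1024 * 1024,
): AsyncGenerator<Uint8Array> {
  for (const [start, end] of chunkRanges(blob.size, chunkSize)) {
    const buf = await blob.slice(start, end).arrayBuffer();
    yield new Uint8Array(buf);
  }
}

// Bytes of decoded PCM for a given duration: depends on duration, sample rate,
// and channel count, not on how large the video file is.
function decodedPcmBytes(
  durationSec: number,
  sampleRate: number,
  channels: number,
  bytesPerSample = 4,
): number {
  return durationSec * sampleRate * channels * bytesPerSample;
}
```

For a 2-hour stereo track at 44.1 kHz, `decodedPcmBytes(7200, 44100, 2)` gives 2,540,160,000 bytes (about 2.5 GB) of float samples whether the video file is 1 GB or 10 GB; that sample count, not the video bitrate, is what decoding and encoding have to work through. Feeding these chunks to a decoder still requires a demuxer that can parse the container incrementally, since `decodeAudioData` on its own expects the complete file.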
Output format comparison: MP3 vs. M4A vs. WAV vs. OGG
| Format | Approx. size for 1 hr of audio | Quality | Best for |
|---|---|---|---|
| MP3 (128 kbps) | ~58 MB | Good; artifacts can be audible on high-frequency content | Podcasts, speech, broad compatibility |
| MP3 (320 kbps) | ~144 MB | Excellent; near-transparent for most listeners | Music, archiving with compression |
| M4A / AAC (128 kbps) | ~58 MB | Generally better than MP3 at the same bitrate (more efficient codec) | Apple devices, streaming platforms |
| WAV (16-bit PCM, 44.1 kHz stereo) | ~635 MB | Lossless; an exact copy of the decoded audio | Editing, archiving, professional use |
| OGG Vorbis (128 kbps) | ~58 MB | Comparable to AAC; open format | Web audio, open-source projects |
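The size column follows directly from bitrate and duration. A sketch of the arithmetic, assuming constant-bitrate encoding (variable-bitrate files land around these figures rather than exactly on them), a canonical 44-byte WAV header, and, for WAV, 16-bit stereo at 44.1 kHz by default; the function names are illustrative:

```typescript
// Constant-bitrate stream: bits per second x seconds / 8 bits per byte.
function cbrSizeBytes(bitrateKbps: number, durationSec: number): number {
  return (bitrateKbps * 1000 * durationSec) / 8;
}

// Canonical PCM WAV: 44-byte header plus raw interleaved samples.
function wavSizeBytes(
  durationSec: number,
  sampleRate = 44100,
  channels = 2,
  bitsPerSample = 16,
): number {
  return 44 + durationSec * sampleRate * channels * (bitsPerSample / 8);
}

// Decimal megabytes (1 MB = 1,000,000 bytes).
const toMB = (bytes: number): number => bytes / 1e6;
```

`cbrSizeBytes(128, 3600)` is 57,600,000 bytes (about 58 MB), `cbrSizeBytes(320, 3600)` is 144,000,000 bytes, and `wavSizeBytes(3600)` is 635,040,044 bytes; at 48 kHz the same hour of WAV grows to about 691 MB. Container overhead adds a little to the compressed formats.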
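Note: if your source video already has lossy audio (AAC, MP3), re-encoding it to another lossy format introduces generation loss: each lossy encode degrades quality slightly. For archiving, decode to WAV once and keep it as your master, then encode each target format from that WAV rather than from another lossy copy, so every output adds only one lossy generation on top of the source. If the source track is already AAC and M4A is the target, a pass-through copy with no decode avoids the extra generation entirely.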
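One way to see why the WAV master helps is to count lossy generations along a conversion chain. The sketch below is a hypothetical model, not part of any tool: it treats AAC, MP3, and Vorbis as lossy, PCM (WAV) as lossless, and a step that keeps the same codec as a stream copy that adds no generation.

```typescript
type Codec = "aac" | "mp3" | "vorbis" | "pcm";

// Codecs that discard information when encoding (PCM does not).
const LOSSY: ReadonlySet<Codec> = new Set<Codec>(["aac", "mp3", "vorbis"]);

// Count the lossy generations in the final output of a chain of conversions.
// chain[0] is the source codec; each later entry is one conversion step.
// Same-codec steps are modeled as stream copies (no decode, no added loss).
function lossyGenerations(chain: Codec[]): number {
  if (chain.length === 0) return 0;
  let generations = LOSSY.has(chain[0]) ? 1 : 0; // the source's own encode
  for (let i = 1; i < chain.length; i++) {
    const reencoded = chain[i] !== chain[i - 1];
    if (reencoded && LOSSY.has(chain[i])) generations++;
  }
  return generations;
}
```

Chaining AAC → MP3 → Vorbis yields 3 generations, while AAC → PCM → MP3 and AAC → PCM → Vorbis each yield 2, and a straight AAC → AAC copy stays at 1.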
