yt-dlp/yt_dlp
pukkandan be6202f12b
Subtitle extraction from streaming media manifests #247
Authored by fstirlitz
Modified from: https://github.com/ytdl-org/youtube-dl/pull/6144

Closes: #73
Fixes:
https://github.com/ytdl-org/youtube-dl/issues/6106
https://github.com/ytdl-org/youtube-dl/issues/14977
https://github.com/ytdl-org/youtube-dl/issues/21438
https://github.com/ytdl-org/youtube-dl/issues/23609
https://github.com/ytdl-org/youtube-dl/issues/28132

Might also fix (untested):
https://github.com/ytdl-org/youtube-dl/issues/15424
https://github.com/ytdl-org/youtube-dl/issues/18267
https://github.com/ytdl-org/youtube-dl/issues/23899
https://github.com/ytdl-org/youtube-dl/issues/24375
https://github.com/ytdl-org/youtube-dl/issues/24595
https://github.com/ytdl-org/youtube-dl/issues/27899

Related:
https://github.com/ytdl-org/youtube-dl/issues/22379
https://github.com/ytdl-org/youtube-dl/pull/24517
https://github.com/ytdl-org/youtube-dl/pull/24886
https://github.com/ytdl-org/youtube-dl/pull/27215

Notes:
* The functions `extractor.common._extract_..._formats` are still kept for compatibility
* Only some extractors have currently been moved to using `_extract_..._formats_and_subtitles`
* Direct subtitle manifests (without a master) are not supported and are wrongly identified as containing video formats
* AES support is untested
* The fragmented TTML subtitles extracted from DASH/ISM are valid, but are unsupported by `ffmpeg` and most video players
    * Their XML fragments can be dumped using `ffmpeg -i in.mp4 -f data -map 0 -c copy out.ttml`.
        Once the unnecessary headers are stripped from the output, it becomes a valid, self-contained TTML file
    * The TTML subtitles downloaded from DASH manifests can also be opened directly with <https://github.com/SubtitleEdit>
* Fragmented WebVTT files extracted from DASH/ISM are also unsupported by most tools
    * Unlike the TTML files, the fragments of these cannot be dumped using `ffmpeg`
    * The WebVTT subtitles extracted from DASH can be parsed by <https://github.com/gpac/gpac>
    * The validity of those extracted from ISM is untested
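
The first two notes describe a compatibility pattern that may be easier to follow as code. The following is a minimal sketch of that pattern, not the actual yt-dlp implementation. The class `SketchExtractor`, the method names `_extract_demo_formats` and `_extract_demo_formats_and_subtitles`, and the manifest layout are all hypothetical, chosen only for illustration.

```python
import warnings


class SketchExtractor:
    """Hypothetical stand-in for an extractor base class (not yt-dlp's real one)."""

    def _extract_demo_formats_and_subtitles(self, manifest):
        # Newer-style helper: returns both video formats and subtitle tracks.
        # `manifest` is an assumed, already-parsed list of dict entries.
        formats = [entry for entry in manifest if entry["type"] == "video"]
        subtitles = {}
        for entry in manifest:
            if entry["type"] == "subtitles":
                # Group subtitle tracks by language code.
                subtitles.setdefault(entry["lang"], []).append(
                    {"url": entry["url"], "ext": entry["ext"]})
        return formats, subtitles

    def _extract_demo_formats(self, manifest):
        # Legacy-style helper kept for compatibility: it delegates to the
        # newer helper and returns only the formats, dropping the subtitles.
        formats, subtitles = self._extract_demo_formats_and_subtitles(manifest)
        if subtitles:
            warnings.warn("ignoring subtitle tracks found in the manifest")
        return formats


# Example usage with made-up manifest entries.
manifest = [
    {"type": "video", "url": "https://example.invalid/video.m3u8"},
    {"type": "subtitles", "lang": "en",
     "url": "https://example.invalid/en.vtt", "ext": "vtt"},
]
extractor = SketchExtractor()
formats, subtitles = extractor._extract_demo_formats_and_subtitles(manifest)
```

Under this design, an extractor that still calls the older helper keeps working unchanged, while one that has been migrated receives the subtitle tracks as a second return value.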
3 years ago
downloader [downloader/ism] Support muxing TTML subtitles 3 years ago
extractor Subtitle extraction from streaming media manifests #247 3 years ago
postprocessor [MetadataFromField] Improve regex and add tests 3 years ago
YoutubeDL.py Fix case sensitivity of format selector 3 years ago
__init__.py Add option `--skip-playlist-after-errors` 3 years ago
__main__.py Completely change project name to yt-dlp (#85) 3 years ago
aes.py Completely change project name to yt-dlp (#85) 3 years ago
cache.py Completely change project name to yt-dlp (#85) 3 years ago
compat.py [downloader/hls] Assemble single-file WebVTT subtitles from HLS segments 3 years ago
jsinterp.py Completely change project name to yt-dlp (#85) 3 years ago
options.py [documentation] Fix typos 3 years ago
socks.py Completely change project name to yt-dlp (#85) 3 years ago
swfinterp.py Completely change project name to yt-dlp (#85) 3 years ago
update.py [update] Fix updater removing the executable bit on some UNIX distros 3 years ago
utils.py [utils] Improve bug_report_message 3 years ago
version.py [version] update :ci skip all 3 years ago
webvtt.py [downloader/hls] Remove duplicate cues using a sliding window of candidates 3 years ago