390 Commits (40eec6b15cd3135b24cb42fde5ccf62e9a1f0807)

Author SHA1 Message Date
Yen Chi Hsuan 0b26ba3fc8 [extractor/common] Allow passing more parameters to _search_json_ld 9 years ago
Sergey M․ 4ca2a3cf3c [extractor/common] Add initial support for JSON-LD metadata extraction into info_dict 9 years ago
Jakub Wilk dfb1b1468c Fix typos
Closes #8200.
9 years ago
Sergey M 3f3343cd3e Merge pull request #8061 from dstftw/introduce-chapter-and-series-fields
Introduce chapter and series fields
9 years ago
Sergey M․ 27bfd4e526 [extractor/common] Introduce number fields for chapters and series 9 years ago
Philipp Hagemeister 32f9036447 [ccc] Add language information to formats 9 years ago
Sergey M․ 7109903e61 [extractor/common] Document chapter and series fields 9 years ago
Sergey M․ 7e5edcfd33 Simplify formats accumulation for f4m/m3u8/smil formats
Now all _extract_*_formats routines return a list
9 years ago
remitamine 39d60b715a Merge pull request #7769 from remitamine/sort
[common] lower (m3u8,rtmp,rtsp) format preference only if required program is not available
9 years ago
remitamine d497a201ca [common] use specific variable for protocol preference in _sort_formats 9 years ago
remitamine 8d29e47f54 [common] simplify the use of _extract_m3u8_formats and _extract_f4m_formats 9 years ago
Sergey M․ 9b9c5355e4 Rename error_to_str to error_to_compat_str 9 years ago
Sergey M․ 7f8b271465 Properly convert errors to strings 9 years ago
Sergey M․ dd85e4d707 [extractor/common] Properly decode error string on python 2 (Closes #1354, closes #3957, closes #4037, closes #6449) 9 years ago
Sergey M․ 62d231c004 [extractor/common] Clarify duration can be float 9 years ago
Sergey M․ 5c2266df4b Switch codebase to use sanitized_Request instead of
compat_urllib_request.Request

[downloader/dash] Use sanitized_Request

[downloader/http] Use sanitized_Request

[atresplayer] Use sanitized_Request

[bambuser] Use sanitized_Request

[bliptv] Use sanitized_Request

[brightcove] Use sanitized_Request

[cbs] Use sanitized_Request

[ceskatelevize] Use sanitized_Request

[collegerama] Use sanitized_Request

[extractor/common] Use sanitized_Request

[crunchyroll] Use sanitized_Request

[dailymotion] Use sanitized_Request

[dcn] Use sanitized_Request

[dramafever] Use sanitized_Request

[dumpert] Use sanitized_Request

[eitb] Use sanitized_Request

[escapist] Use sanitized_Request

[everyonesmixtape] Use sanitized_Request

[extremetube] Use sanitized_Request

[facebook] Use sanitized_Request

[fc2] Use sanitized_Request

[flickr] Use sanitized_Request

[4tube] Use sanitized_Request

[gdcvault] Use sanitized_Request

[extractor/generic] Use sanitized_Request

[hearthisat] Use sanitized_Request

[hotnewhiphop] Use sanitized_Request

[hypem] Use sanitized_Request

[iprima] Use sanitized_Request

[ivi] Use sanitized_Request

[keezmovies] Use sanitized_Request

[letv] Use sanitized_Request

[lynda] Use sanitized_Request

[metacafe] Use sanitized_Request

[minhateca] Use sanitized_Request

[miomio] Use sanitized_Request

[moevideo] Use sanitized_Request

[mofosex] Use sanitized_Request

[moniker] Use sanitized_Request

[mooshare] Use sanitized_Request

[movieclips] Use sanitized_Request

[mtv] Use sanitized_Request

[myvideo] Use sanitized_Request

[neteasemusic] Use sanitized_Request

[nfb] Use sanitized_Request

[niconico] Use sanitized_Request

[noco] Use sanitized_Request

[nosvideo] Use sanitized_Request

[novamov] Use sanitized_Request

[nowness] Use sanitized_Request

[nuvid] Use sanitized_Request

[played] Use sanitized_Request

[pluralsight] Use sanitized_Request

[pornhub] Use sanitized_Request

[pornotube] Use sanitized_Request

[primesharetv] Use sanitized_Request

[promptfile] Use sanitized_Request

[qqmusic] Use sanitized_Request

[rtve] Use sanitized_Request

[safari] Use sanitized_Request

[sandia] Use sanitized_Request

[shared] Use sanitized_Request

[sharesix] Use sanitized_Request

[sina] Use sanitized_Request

[smotri] Use sanitized_Request

[sohu] Use sanitized_Request

[spankwire] Use sanitized_Request

[sportdeutschland] Use sanitized_Request

[streamcloud] Use sanitized_Request

[streamcz] Use sanitized_Request

[tapely] Use sanitized_Request

[tube8] Use sanitized_Request

[tubitv] Use sanitized_Request

[twitch] Use sanitized_Request

[twitter] Use sanitized_Request

[udemy] Use sanitized_Request

[vbox7] Use sanitized_Request

[veoh] Use sanitized_Request

[vessel] Use sanitized_Request

[vevo] Use sanitized_Request

[viddler] Use sanitized_Request

[videomega] Use sanitized_Request

[viewster] Use sanitized_Request

[viki] Use sanitized_Request

[vk] Use sanitized_Request

[vodlocker] Use sanitized_Request

[voicerepublic] Use sanitized_Request

[wistia] Use sanitized_Request

[xfileshare] Use sanitized_Request

[xtube] Use sanitized_Request

[xvideos] Use sanitized_Request

[yandexmusic] Use sanitized_Request

[youku] Use sanitized_Request

[youporn] Use sanitized_Request

[youtube] Use sanitized_Request

[patreon] Use sanitized_Request

[extractor/common] Remove unused import

[nfb] PEP 8
9 years ago
Sergey M․ 019839faaa [extractor/common] Use baseURL from f4m manifest for recursive manifest extraction 9 years ago
Sergey M 30eecc6a04 Merge pull request #7296 from jaimeMF/xml_attrib_unicode
Use a wrapper around xml.etree.ElementTree.fromstring in python 2.x (…
9 years ago
Sergey M․ dbd82a1d4f [extractor/common] Fix m3u8 extraction on failure 9 years ago
Sergey M․ dc519b5421 [extractor/common] Make ie_key and IE_NAME return unicode string 9 years ago
Jaime Marquínez Ferrándiz 36e6f62cd0 Use a wrapper around xml.etree.ElementTree.fromstring in python 2.x (#7178)
Attributes aren't unicode objects, so they couldn't be directly used in info_dict fields (for example '--write-description' doesn't work with bytes).
9 years ago
remitamine 3711304510 [extractor/common] get the redirected m3u8_url in _extract_m3u8_formats 9 years ago
Jaime Marquínez Ferrándiz 865d1fbafc [extractor/common] Remove unused import 9 years ago
Sergey M․ 943a1e24b8 [extractor/common] Use more generic URLError in _is_valid_url 9 years ago
Sergey M․ 02835c6bf4 [extractor/common] Document repost_count 9 years ago
Sergey M․ 448ef1f31c [extractor/common] Allow angle brackets in attributes in _og_regexes (#7215) 9 years ago
Sergey M․ 7a6d76a64d [extractor/common] Require closing quote in _og_regexes (Closes #7174)
E.g. do not match `property='og:video:type'` when `og:video` is requested.
9 years ago
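The closing-quote requirement in 7a6d76a64d can be illustrated with a small sketch. `og_property_re` is a hypothetical helper, not the actual `_og_regexes` code, and the sample tags are made up; it only shows why anchoring on the closing quote keeps a request for `og:video` from matching `og:video:type`.

```python
import re


def og_property_re(prop):
    # Hypothetical helper: match property="og:<prop>" with the same quote
    # character on both sides, so a longer property name cannot match.
    return re.compile(r'property=(["\'])og:%s\1' % re.escape(prop))


video_re = og_property_re('video')

# Made-up sample tags for illustration.
exact = "<meta property='og:video' content='http://example.com/v.mp4'>"
longer = "<meta property='og:video:type' content='video/mp4'>"

matches_exact = video_re.search(exact) is not None
matches_longer = video_re.search(longer) is not None
```

Without the `\1` backreference, the pattern would stop after `og:video` and accept the longer property as well.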
Sergey M․ 4180a3d8b7 [extractor/common] Allow quoteless content attribute in og regexes (Closes #7115) 9 years ago
Yen Chi Hsuan 57935b2564 [extractor/common] Allow HTML5 unquoted attribute values
Fixes #7108

HTML5 allows unquoted attribute values. See the "Unquoted attribute value
syntax" section [1] for more information

[1] http://www.w3.org/TR/html5/syntax.html
9 years ago
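As a rough illustration of what accepting HTML5 unquoted attribute values involves, the sketch below matches a `content` attribute in double-quoted, single-quoted, or unquoted form. `CONTENT_RE` and `content_value` are hypothetical names and this is not the pattern used in the extractor; the unquoted branch simply excludes whitespace and the characters the HTML5 syntax disallows there.

```python
import re

# Sketch only: accepts double-quoted, single-quoted, or unquoted values.
# The unquoted branch excludes whitespace and the characters " ' = < > `
CONTENT_RE = re.compile(
    r'''\bcontent\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'=<>`]+))''')


def content_value(tag):
    """Return the value of the content attribute, or None if absent."""
    m = CONTENT_RE.search(tag)
    if m is None:
        return None
    # Exactly one of the three groups participates in a match.
    return next(g for g in m.groups() if g is not None)
```

For example, `content_value('<meta property=og:title content=Hello>')` yields the unquoted value, while quoted values may still contain spaces.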
Sergey M․ 4bba371644 [YoutubeDL] Autocalculate ext for subtitles when missing 9 years ago
Sergey M․ e5851b963a [extractor/common] Make f4m extraction for SMIL non fatal 9 years ago
Sergey M․ 4de6131090 [extractor/common] Add fatal to _extract_f4m_formats 9 years ago
Sergey M․ 3a1341a7bc [extractor/common] Make m3u8 extraction for SMIL non fatal 9 years ago
Sergey M․ c78e48177c [extractor/common] Check validity of direct URLs 9 years ago
Sergey M․ 647eab4541 [extractor/common] Extract upload date from SMIL 9 years ago
Sergey M․ 1e5bcdec02 [extractor/common] Extract images from SMIL 9 years ago
Sergey M․ e7d8e98a9f [extractor/common] Allow float bitrates 9 years ago
Sergey M․ 8aab976bbd [extractor/common] Document release_date field 9 years ago
Sergey M․ c430802e32 [extractor/common] Add raise_geo_restricted 9 years ago
Sergey M․ 586f1cc532 [extractor/common] Skip html comment tags (Closes #6822) 9 years ago
Sergey M․ 73eb13dfc7 [extractor/common] Case insensitive inputs extraction 9 years ago
Sergey M․ be0e5dbd83 [extractor/common] Extract submit inputs 9 years ago
Sergey M․ 43e7d3c945 [extractor/common] Add raise_login_required 9 years ago
Jaime Marquínez Ferrándiz 8c97f81943 [common] Follow convention of using 'cls' in classmethods 9 years ago
Yen Chi Hsuan f738dd7b7c [common] Remove debugging codes 9 years ago
Yen Chi Hsuan 912e0b7e46 [common] Add _merge_subtitles() 9 years ago
Yen Chi Hsuan 03bc7237ad [common] _parse_smil_subtitles: accept `lang` as the subtitle language 9 years ago
Sergey M․ 5cdefc4625 [extractor/common] Add more subtitle mime types for guess when ext is missing 9 years ago
Sergey M․ ce00af8767 [extractor/common] Add default subtitles lang 9 years ago
Yen Chi Hsuan f877c6ae5a [theplatform] Use InfoExtractor._parse_smil_formats() 9 years ago
Sergey M․ e64b756943 [extractor/common] Interactive TFA code input 9 years ago
Sergey M․ 201ea3ee8e [extractor/common] Improve _hidden_inputs 9 years ago
Sergey M․ 8b9848ac56 [extractor/common] Expand meta regex 9 years ago
Sergey M․ 942acef594 [extractor/common] Extract _parse_xspf 9 years ago
Sergey M․ 98044462b1 [extractor/common] Use playlist id as default title 9 years ago
Sergey M․ e0b9d78fab [extractor/common] Clarify playlists can have description field 9 years ago
Sergey M․ 8d6765cf48 [extractor/generic] Add generic support for xspf playlist extraction 9 years ago
Sergey M. d5d7bdaeb5 Merge pull request #6428 from dstftw/improve-generic-smil-support
Improve generic SMIL support
9 years ago
Sergey M․ 5b0c40da24 [extractor/common] Expand meta regex 9 years ago
Sergey M․ 17712eeb19 [extractor/common] Extract namespace parse routine 9 years ago
Sergey M․ 41c3a5a7be [extractor/common] Fix python 3 9 years ago
Sergey M․ a107193e4b [extractor/common] Extract f4m and m3u8 formats, subtitles and info 9 years ago
remitamine 799207e838 [viewster] extract the api auth token
Closes #6406.
9 years ago
Sergey M․ 864f24bd2c [extractor/common] Add _meta_regex and clarify tags field 9 years ago
Purdea Andrei 5316bf7487 Documented tags as a possible dict key 9 years ago
Sergey M․ 10952eb2cf [extractor/common] Consistent URL spelling 9 years ago
Jaime Marquínez Ferrándiz 297a564bee [youtube] Extract end_time 9 years ago
Jaime Marquínez Ferrándiz 7c80519cbf [youtube] Extract start_time
From the 't=*' in the url.
Currently youtube-dl doesn't use the value, but it was requested for the mpv plugin.
9 years ago
Sergey M․ 74fe23ec35 [extractor/common] Style 9 years ago
Yen Chi Hsuan a38436e889 [extractor/common] Add 'transform_source' parameter to _extract_f4m_formats() 9 years ago
Sergey M․ 31c746e5dc [extractor/common] Keep going if some media_url is missing 9 years ago
Sergey M․ 70f0f5a8ca [extractor/common] Recursively extract child f4m manifests 9 years ago
Sergey M․ cc357c4db8 [extractor/common] Properly handle full URLs 9 years ago
Sergey M․ 97f4aecfc1 [extractor/common] Handle malformed f4m manifests 9 years ago
Sergey M․ cf61d96df0 [extractor/common] Add _form_hidden_inputs 9 years ago
Sergey M․ f8da79f828 [extractor/common] Improve _form_hidden_inputs and rename to _hidden_inputs 9 years ago
Sergey M․ 27713812a0 [extractor/common] Add method for extracting form hidden input fields as dict 9 years ago
Yen Chi Hsuan 13af92fdc4 [common] Add 'fatal' to _extract_m3u8_formats 9 years ago
Sergey M․ 5414623791 [extractor/common] Remove superfluous line 9 years ago
Sergey M․ c342041fba [extractor/common] Use NO_DEFAULT from utils 9 years ago
Yen Chi Hsuan 621ed9f5f4 [common] Add note and errnote field for _extract_m3u8_formats 10 years ago
Sergey M․ baa43cbaf0 [extractor/common] Relax valid url check verbosity 10 years ago
Yen Chi Hsuan c1c924abfe [utils,common] Merge format_srt_time and _subtitles_timecode
format_srt_time uses a comma as the delimiter between seconds and
milliseconds while _subtitles_timecode uses a dot. All .srt examples I
found on the Internet use a comma, so I use a comma in the merged
version. See http://matroska.org/technical/specs/subtitles/srt.html and
http://devel.aegisub.org/wiki/SubtitleFormats/SRT
10 years ago
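A minimal sketch of an SRT-style timecode formatter with the comma delimiter described in c1c924abfe. `srt_timecode` is a hypothetical name chosen for this example, not the merged function from the commit, and rounding to the nearest millisecond is an assumption.

```python
def srt_timecode(seconds):
    """Format a number of seconds as HH:MM:SS,mmm (comma before milliseconds)."""
    # Assumption: round to the nearest millisecond before splitting.
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3600 * 1000)
    minutes, rem = divmod(rem, 60 * 1000)
    secs, ms = divmod(rem, 1000)
    return '%02d:%02d:%02d,%03d' % (hours, minutes, secs, ms)
```

Working in integer milliseconds avoids a value such as 59.9996 seconds being rendered with a seconds field of 60.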
Yen Chi Hsuan 05d5392cda [common] Ignore subtitles in m3u8 10 years ago
Sergey M․ 74f728249f [extractor/common] Fallback to empty string for (yet) missing `format_id` in `_sort_formats` (Closes #5624) 10 years ago
Jaime Marquínez Ferrándiz 2ddcd88129 Remove code that was only used by the Grooveshark extractor 10 years ago
zouhair cf0649f8b7 Typo: twice "the the" to "the" 10 years ago
Sergey M․ 3ded7bac16 [extractor/common] Add ability to specify custom field preference for `_sort_formats` 10 years ago
Jaime Marquínez Ferrándiz 08f2a92c9c InfoExtractor._search_regex: Suggest updating when the regex is not found (suggested in #5442)
Reuse the same message from ExtractorError
10 years ago
Yen Chi Hsuan c9a779695d [extractor/common] Add the encoding parameter
The QQMusic info extractor needs a forced encoding to work correctly.
10 years ago
Sergey M․ 830d53bfae [utils] Add `video_title` for `url_result` 10 years ago
Sergey M․ e21a55abcc [extractor/common] Remove f4m section
It's now provided by `f4m_id`
10 years ago
Sergey M․ 4a34f69ea6 [extractor/common] Add subtitles timecode formatter 10 years ago
Sergey M․ f207019ce5 [extractor/common] Remove 'm3u8' from quality selection URL 10 years ago
Sergey M․ 8dc9d361c2 [extractor/common] Fix format_id when `last_media` is None and always include `m3u8_id` if present
The rationale behind `m3u8_id` was to resolve duplicates when processing several m3u8 playlists within the same media that give equal resulting `format_id`'s,
e.g. `youtube-dl http://www.rts.ch/play/tv/passe-moi-les-jumelles/video/la-fee-des-bois-mustang-les-chemins-du-vent?id=3854925 -F`
10 years ago
Philipp Hagemeister a0bb7c5593 [extractor/common] Improve m3u format IDs (#5143) 10 years ago
Sergey M․ 2f0f6578c3 [extractor/common] Assume non HTTP(S) URLs valid 10 years ago
Philipp Hagemeister 72a406e7aa [extractor/common] Pass in video_id (#5057) 10 years ago
Antti Ajanki 6f4ba54079 [extractor/common] Extract HTTP (possibly f4m) URLs from a .smil file 10 years ago
Antti Ajanki 637570326b [extractor/common] Extract the first of a seq of videos in a .smil file 10 years ago
Jaime Marquínez Ferrándiz bfc993cc91 Merge branch 'subtitles-rework'
(Closes PR #4964)
10 years ago
Sergey M․ 9fe6ef7ab2 [extractor/common] Fix preference for m3u8 quality selection URL 10 years ago
Philipp Hagemeister 8fb3ac3649 PEP8: W503 10 years ago
Philipp Hagemeister 77b2986b5b [extractor/common] Recognize Indian censorship (#5021) 10 years ago
Jaime Marquínez Ferrándiz 9868ea4936 [extractor/common] Simplify subtitles handling methods
Initially I was going to use a single method for handling both subtitles and automatic captions, that's why I used the 'list_subtitles' and the 'subtitles' variables.
10 years ago
Philipp Hagemeister fa15607773 PEP8 fixes 10 years ago
Jaime Marquínez Ferrándiz 4cd95bcbc3 [twitch:stream] Prefer the 'source' format (fixes #4972) 10 years ago
Sergey M․ 4069766c52 [extractor/common] Test URLs with GET 10 years ago
Jaime Marquínez Ferrándiz 360e1ca5cc [youtube] Convert to new subtitles system
The automatic captions are stored in the 'automatic_captions' field, which is used if no normal subtitles are found for a specific language.
10 years ago
Jaime Marquínez Ferrándiz c84dd8a90d [YoutubeDL] store the subtitles to download in the 'requested_subtitles' field
We need to keep the original subtitles information, so that the '--load-info' option can be used to list or select the subtitles again.
We'll also be able to have a separate field for storing the automatic captions info.
10 years ago
Jaime Marquínez Ferrándiz a504ced097 Improve subtitles support
For each language the extractor builds a list with the available formats sorted (like for video formats), then YoutubeDL selects one of them using the '--sub-format' option which now allows giving the format preferences (for example 'ass/srt/best').
For each format the 'url' field can be set so that we only download the contents if needed, or if the contents needs to be processed (like in crunchyroll) the 'data' field can be used.

The reasons for this change are:
* We weren't checking that the format given with '--sub-format' was available, checking it in each extractor would be repetitive.
* It allows to easily support giving a format preference.
* The subtitles were automatically downloaded in the extractor, but I think that if you use for example the '--dump-json' option you want to finish as fast as possible.

Currently only the ted extractor has been updated, but the old system still works.
10 years ago
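The format-preference selection described in a504ced097 could look roughly like the sketch below. `select_subtitle_format` is a hypothetical function, not the YoutubeDL implementation; it assumes each language's list is ordered from worst to best (as for video formats) and that each entry carries an `ext` key.

```python
def select_subtitle_format(available, preference='best'):
    """Pick one subtitle format from `available` using a preference
    string such as 'ass/srt/best'.

    Assumptions of this sketch: `available` is a list of dicts with an
    'ext' key, ordered worst to best. Returns None if nothing matches.
    """
    for wanted in preference.split('/'):
        if wanted == 'best':
            # 'best' means the last (highest-ranked) entry, if any.
            return available[-1] if available else None
        for fmt in available:
            if fmt['ext'] == wanted:
                return fmt
    return None


# Made-up example data; the URLs are placeholders.
available = [
    {'ext': 'vtt', 'url': 'http://example.com/sub.vtt'},
    {'ext': 'srt', 'url': 'http://example.com/sub.srt'},
    {'ext': 'ttml', 'url': 'http://example.com/sub.ttml'},
]
chosen = select_subtitle_format(available, 'ass/srt/best')
```

Doing this once in a central place is what removes the need for each extractor to check that the requested format exists.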
Philipp Hagemeister 03cd72b007 [extractor/common] Move up filesize
filesize and tbr should correlate, so it doesn't make sense to treat them differently.
10 years ago
Jaime Marquínez Ferrándiz 6ca7732d5e [extractor/common] Fix link to external documentation 10 years ago
Jaime Marquínez Ferrándiz 2d30521ab9 [youtube] Extract average rating (closes #2362) 10 years ago
Philipp Hagemeister 9650885be9 [escapist] Filter video differently (Fixes #4919) 10 years ago
Philipp Hagemeister 7e5db8c930 [options] Add --no-color 10 years ago
Philipp Hagemeister 3a5bcd0326 [extractor/common] Wrap extractor errors (Fixes #1194)
For now, we just wrap some common errors. More may follow. We do not want to catch actual programming errors in the extractors, such as 1 // 0.
10 years ago
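The idea in 3a5bcd0326 of wrapping only selected errors can be sketched as follows. The `ExtractorError` class here is a stand-in defined for the example, `extract_wrapped` is a hypothetical function, and the choice of `KeyError` as the wrapped error is an assumption for illustration; the commit message itself does not list which errors are wrapped.

```python
class ExtractorError(Exception):
    """Stand-in for the project's extractor error type (assumption)."""

    def __init__(self, msg, cause=None):
        super().__init__(msg)
        self.cause = cause


def extract_wrapped(real_extract, url):
    """Call an extractor function, wrapping a chosen set of errors."""
    try:
        return real_extract(url)
    except ExtractorError:
        # Already the right type: pass through unchanged.
        raise
    except KeyError as e:
        # Assumed example of a "common" error worth wrapping.
        raise ExtractorError('An extractor error has occurred.', cause=e)
    # Anything else (e.g. ZeroDivisionError) propagates as a real bug.


def missing_key(url):
    return {}['formats']


def programming_error(url):
    return 1 // 0
```

With this sketch, `extract_wrapped(missing_key, url)` raises `ExtractorError`, while `extract_wrapped(programming_error, url)` still raises `ZeroDivisionError`.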
Naglis Jonaitis 69319969de [extractor/common] Add new helper method _family_friendly_search 10 years ago
Philipp Hagemeister 1e1896f2de [extractor/common] Correct sort order.
We should look at height and width before ext_preference.
10 years ago
Sergey M․ 3900eec27c [extractor/common] Fix 2.0 manifest extraction (Closes #4830) 10 years ago
Sergey M․ 60ca389c64 [extractor/common] Prefix f4m/m3u8 entries with identifier 10 years ago
Philipp Hagemeister 9bb8e0a3f9 [wsj] Add new extractor (Fixes #4854) 10 years ago
Philipp Hagemeister 1a6373ef39 [sort_formats] Prefer bitrate over video size
720p @ 1000KB/s looks way better than 1080p @ 500KB/s
10 years ago
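The ordering argued for in 1a6373ef39 amounts to comparing bitrate before frame size. The sketch below is not the `_sort_formats` implementation; `sort_key` is a hypothetical helper, the sample data is made up, and treating missing values as lowest is an assumption.

```python
def sort_key(fmt):
    # Bitrate first, then height; missing values sort lowest
    # (an assumption of this sketch).
    return (fmt.get('tbr') or -1, fmt.get('height') or -1)


# Made-up formats mirroring the example in the commit message.
formats = [
    {'format_id': '1080p', 'height': 1080, 'tbr': 500},
    {'format_id': '720p', 'height': 720, 'tbr': 1000},
]
formats.sort(key=sort_key)  # worst first, best last
best = formats[-1]
```

Here the 720p entry ends up last, i.e. preferred, because its bitrate is higher even though its frame is smaller.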
Philipp Hagemeister 995029a142 [nerdist] Add new extractor (Fixes #4851) 10 years ago
Philipp Hagemeister b04b885271 [extractor/common] Document all protocol values 10 years ago
Sergey M․ 96a53167fa [common] Generalize URLs' HTTP errors pre-testing 10 years ago
Philipp Hagemeister 3dee7826e7 [rtl2] PEP8, simplify, make rtmp tests run (#470) 10 years ago
Philipp Hagemeister cfb56d1af3 Add --list-thumbnails 10 years ago
Jaime Marquínez Ferrándiz e1554a407d [extractors] Use http_headers for setting the User-Agent and the Referer 10 years ago
Philipp Hagemeister 121c09c7be Merge remote-tracking branch 'Dineshs91/f4m-2.0' 10 years ago
Philipp Hagemeister 6271f1cad9 [youtube|ffmpeg] Automatically correct video with non-square pixels (Fixes #4674) 10 years ago
Philipp Hagemeister ff21a8e0ee Merge remote-tracking branch 'Tithen-Firion/master' 10 years ago
Philipp Hagemeister dd622d7c4e [netzkino] Add new extractor (Fixes #4669) 10 years ago
Philipp Hagemeister bec2248141 [InfoExtractor/common] Correct and test meta tag matching 10 years ago
Philipp Hagemeister 0590062925 Respect age_limit when listing extractors (Fixes #4653) 10 years ago
Philipp Hagemeister e65566a9cc [youtube] Correct handling when DASH manifest is not necessary to find all formats 10 years ago
Sergey M․ 6c6f1408f2 [extractor/common] Allow multiline content tags 10 years ago
Jaime Marquínez Ferrándiz 5d3808524d [extractor/common] Update docstring: replace FileDownloader with YoutubeDL 10 years ago
Philipp Hagemeister bf94e38d3d Merge remote-tracking branch 'Tithen-Firion/hsw-update' 10 years ago
Philipp Hagemeister f5e43bc695 [vine] Provide alt_title (Fixes #4448) 10 years ago
Sergey M․ e89a2aabed [extractor/common] Add generic SMIL formats extraction routine 10 years ago
Philipp Hagemeister f58766ce5c [extractor/common] Document ie_key in url results 10 years ago
Sergey M․ acf5cbfe93 [extractor/common] Add description to playlist_result 10 years ago
Philipp Hagemeister b82f815f37 Allow iterators for playlist result entries 10 years ago
Tithen-Firion ebb6419960 [common] Split _download_json
Add ability for extractor to use _parse_json
10 years ago
Tithen-Firion 995ad69c54 [common] Add new parameters for _download_webpage 10 years ago
Philipp Hagemeister 810fb84d5e pep8 and minor beautification all around 10 years ago
Jaime Marquínez Ferrándiz 42939b6129 [youtube] Use a cookie for setting the language
This way, we don't have to do an additional request
10 years ago
Philipp Hagemeister 4e262a8838 [generic] Detect direct video links (Fixes #4149, #4313) 10 years ago
Jouke Waleson 9e1a5b8455 PEP8: applied even more rules 10 years ago