Commit Graph

611 Commits (3089bc748c0fe72a0361bce3f5e2fbab25175236)

Author SHA1 Message Date
renalid a942d6cb48
[utils,franceinter] Add french months' names and fix extraction
Update of the "FranceInter" radio extractor : webpages HTML structure
had changed, the extractor didn't work. So I updated this extractor to
get the mp3 URL and all details.
8 years ago
Sergey M․ c2b2c7e138
[utils] Add quicktime to mimetype2ext 8 years ago
Sergey M․ 6562d34a8c
[utils] Improve mimetype2ext 8 years ago
Remita Amine 073ac1225f [utils] add ac-3 to the list of audio codecs in parse_codecs 8 years ago
Yen Chi Hsuan 70852b47ca
[utils] Recognize units with full names in parse_filename
Reference: https://en.wikipedia.org/wiki/Template:Quantities_of_bytes
8 years ago
Yen Chi Hsuan e4659b4547
[utils] Correct octal/hexadecimal number detection in js_to_json 8 years ago
Sergey M․ 13585d7682
[utils] Recognize lowercase units in parse_filesize 8 years ago
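The two parse_filesize commits (full unit names, lowercase units) amount to a case-insensitive unit lookup. The following is a minimal Python sketch, not the table from youtube-dl's utils.py. The table `_UNITS` is hypothetical. Because case is folded, an ambiguous spelling such as "Mb" is read as megabytes here.

```python
import re

# Hypothetical unit table: decimal (SI) and binary (IEC) multipliers,
# keyed by lowercase spelling so that lookups ignore case.
_UNITS = {
    'b': 1, 'bytes': 1,
    'kb': 1000, 'kilobytes': 1000, 'kib': 1024, 'kibibytes': 1024,
    'mb': 1000 ** 2, 'megabytes': 1000 ** 2, 'mib': 1024 ** 2, 'mebibytes': 1024 ** 2,
    'gb': 1000 ** 3, 'gigabytes': 1000 ** 3, 'gib': 1024 ** 3, 'gibibytes': 1024 ** 3,
}


def parse_filesize(s):
    """Return the size in bytes described by s (e.g. '1.5 MiB'), or None."""
    if s is None:
        return None
    m = re.match(r'^\s*(\d+(?:[.,]\d+)?)\s*([A-Za-z]+)\s*$', s)
    if not m:
        return None
    multiplier = _UNITS.get(m.group(2).lower())
    if multiplier is None:
        return None
    # Accept a decimal comma as well as a decimal point.
    number = float(m.group(1).replace(',', '.'))
    return int(number * multiplier)
```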
Remita Amine 98e698f1ff [external/curl] respect more downloader options and display progress 8 years ago
Yen Chi Hsuan 81c13222c6
[utils] Recognize more formats in unified_timestamp
Used in CtsNews
8 years ago
Sergey M․ a8795327ca
[utils] Add support for TV Parental Guidelines ratings in parse_age_limit 8 years ago
Yen Chi Hsuan d3f8e038fe
[utils] Add decode_png for openload (#9706) 8 years ago
Yen Chi Hsuan 7dc2a74e0a
[utils] Fix unified_timestamp for formats parsed by parsedate_tz() 8 years ago
Sergey M․ f164b97123
[utils] Add another f4m mimetype to mimetype2ext 8 years ago
Remita Amine e910fe2fe4 [brightcove] skip ism manifests 8 years ago
Yen Chi Hsuan 0b68de3cc1 Merge pull request #8876 from remitamine/html5_media
[extractor/common] add helper method to extract html5 media entries
8 years ago
Yen Chi Hsuan 84c237fb8a
[utils] Add get_element_by_class
For #9950
8 years ago
Remita Amine b4173f1551 [utils] add mimetypes to determine manifest ext(m3u8, f4m, mpd) 8 years ago
Remita Amine 81953d1ae5 [kaltura] add support videos stored on custom kaltura servers(closes #5557) 8 years ago
Sergey M․ 95cf60e826
[utils] Add PUTRequest 8 years ago
Aleksandar Topuzovic 6b03e1e25d
[HRTi] Implement extractor for Croatian Radiotelevision 8 years ago
remitamine 4f3c5e0627 [utils] add helper function for parsing codecs 8 years ago
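A codecs parsing helper typically splits a string such as "avc1.64001F, mp4a.40.2" into a video and an audio codec. Here is a sketch of that idea, assuming short, hypothetical prefix lists `VIDEO_CODECS` and `AUDIO_CODECS` that are not youtube-dl's own lists. The sketch includes ac-3, following the later commit that added ac-3 to parse_codecs.

```python
# Hypothetical, abbreviated lists of codec identifiers (the part before the first dot).
VIDEO_CODECS = ('avc1', 'avc3', 'hev1', 'hvc1', 'vp8', 'vp9', 'av01')
AUDIO_CODECS = ('mp4a', 'opus', 'vorbis', 'mp3', 'ac-3', 'ec-3', 'flac')


def parse_codecs(codecs_str):
    """Split a codecs string such as 'avc1.64001F, mp4a.40.2' into vcodec/acodec."""
    if not codecs_str:
        return {}
    vcodec = acodec = None
    for full_codec in codecs_str.split(','):
        full_codec = full_codec.strip()
        base = full_codec.split('.')[0]
        if base in VIDEO_CODECS and vcodec is None:
            vcodec = full_codec
        elif base in AUDIO_CODECS and acodec is None:
            acodec = full_codec
    if vcodec is None and acodec is None:
        # Nothing recognised: report no information rather than guessing.
        return {}
    # 'none' marks a stream type that is absent from the format.
    return {'vcodec': vcodec or 'none', 'acodec': acodec or 'none'}
```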
Yen Chi Hsuan 1143535d76
[utils] Add urshift()
Used in IqiyiIE and LeIE
8 years ago
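urshift() is needed because JavaScript's unsigned right shift `>>>` differs from Python's `>>` for negative numbers. A minimal sketch follows, assuming `val` fits in a signed 32-bit integer and `0 <= n < 32`.

```python
def urshift(val, n):
    """Emulate JavaScript's val >>> n for a signed 32-bit val."""
    # Reinterpret a negative value as its unsigned 32-bit two's-complement form.
    return val >> n if val >= 0 else (val + 0x100000000) >> n
```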
Sergey M․ b72b44318c
[utils] Add strip_or_none 8 years ago
Sergey M․ 46f59e89ea
[utils] Add unified_timestamp 8 years ago
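unified_timestamp can be sketched as follows: try a list of known date layouts, then fall back to RFC 2822 parsing through `email.utils.parsedate_tz` (see the later commit about formats parsed by parsedate_tz()). The format list `DATE_FORMATS` is a hypothetical, much shorter stand-in for the real one.

```python
import calendar
import datetime
import email.utils

# Hypothetical list of accepted layouts; a real list would be much longer.
DATE_FORMATS = (
    '%Y-%m-%d %H:%M:%S',
    '%Y/%m/%d %H:%M',
    '%d.%m.%Y %H:%M',
)


def unified_timestamp(date_str):
    """Return a UTC Unix timestamp for date_str, or None if no format matches."""
    if date_str is None:
        return None
    date_str = date_str.strip()
    for fmt in DATE_FORMATS:
        try:
            dt = datetime.datetime.strptime(date_str, fmt)
        except ValueError:
            continue
        # Naive datetimes are interpreted as UTC.
        return calendar.timegm(dt.timetuple())
    # Fall back to RFC 2822 dates; mktime_tz applies the UTC offset when present.
    parsed = email.utils.parsedate_tz(date_str)
    if parsed is not None:
        return email.utils.mktime_tz(parsed)
    return None
```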
remitamine e154c65128 [downloader/hls] Add support for AES-128 encrypted segments in hlsnative downloader 8 years ago
Yen Chi Hsuan 47212f7bcb
[utils] Don't transform numbers not starting with a zero
Fix test_Viidea and maybe others
8 years ago
Sergey M․ 329ca3bef6
[utils] Add try_get
To reduce boilerplate when accessing JSON
8 years ago
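The boilerplate that try_get removes is the usual try/except around nested JSON lookups. A sketch of such a helper follows. The optional `expected_type` check is an assumption about the intended behaviour, not a quote of the commit.

```python
def try_get(src, getter, expected_type=None):
    """Apply getter to src; return None on lookup errors or a type mismatch."""
    try:
        v = getter(src)
    except (AttributeError, KeyError, TypeError, IndexError):
        return None
    if expected_type is None or isinstance(v, expected_type):
        return v
    return None
```

Usage: `try_get(data, lambda x: x['a']['b'][1])` replaces a nested chain of `.get()` calls and index checks.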
Paul Henning 15d106787e [utils] Change Firefox 44 to 47
See commit title.
8 years ago
Yen Chi Hsuan 55b2f099c0
[utils] Decode HTML5 entities
Used in test_Vporn_1. Also related to #9270
8 years ago
Yen Chi Hsuan 6c33d24b46
[utils] Add audio/mpeg to mimetype2ext()
Used in WDR live radios (#6147)
8 years ago
bzc6p c88270271e Added sanitization support for Hungarian letters Ő and Ű 8 years ago
Yen Chi Hsuan 9a4aec8b7e [utils] Use bytes-like objects as header values on Python 2 8 years ago
Yen Chi Hsuan 0ea590076f [utils] Always decode Location header
escape_url is broken for bytes-like objects
8 years ago
Yen Chi Hsuan 293c255688
[utils] Remove debugging codes 8 years ago
Yen Chi Hsuan 5950cb1d6d
[utils] Support a new form of date
Found in dw.com (#9475)
8 years ago
Sergey M․ c6b9cf05e1
[utils] Do not fail on unknown date formats in unified_strdate 8 years ago
Sergey M․ 46bc9b7d7c
[utils] Allow None in remove_{start,end} 8 years ago
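Allowing None in remove_{start,end} means a missing value passes straight through instead of raising AttributeError. The following is a minimal sketch of that behaviour, not the project's code.

```python
def remove_start(s, start):
    # None (or a string without the prefix) is returned unchanged.
    if s is not None and start and s.startswith(start):
        return s[len(start):]
    return s


def remove_end(s, end):
    # The `end` truthiness check avoids s[:-0], which would return ''.
    if s is not None and end and s.endswith(end):
        return s[:-len(end)]
    return s
```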
Yen Chi Hsuan cdd94c2eae
[utils] Check for None values in SOCKS proxy
Originally reported at
https://github.com/rg3/youtube-dl/pull/9287#issuecomment-219617864
8 years ago
Yen Chi Hsuan 79298173c5
[utils] Fix getheader in urlhandle_detect_ext
Fixes #7049, related to #9440
8 years ago
Sergey M․ cda6d47aad
[utils] Simplify integer conversion in js_to_json 8 years ago
Sergey M․ 89ac4a19e6
[utils] Process non-base 10 integers in js_to_json 8 years ago
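The js_to_json integer commits (hex/octal detection, leaving numbers that do not start with a zero alone) can be illustrated with a token rewriter. In this sketch, `js_int_to_json` and `js_ints_to_json` are hypothetical helpers. Unlike a real converter, the sketch does not skip string literals.

```python
import json
import re

# (pattern, base) pairs for the two legacy JavaScript integer notations.
_INTEGER_PATTERNS = (
    (r'^0[xX][0-9a-fA-F]+$', 16),  # hexadecimal, e.g. 0x1F
    (r'^0[0-7]+$', 8),             # legacy octal, e.g. 017
)


def js_int_to_json(token):
    """Rewrite one JavaScript integer literal as a decimal JSON number."""
    for pattern, base in _INTEGER_PATTERNS:
        if re.match(pattern, token):
            return str(int(token, base))
    # Numbers that do not start with a zero (and a bare 0) stay as they are.
    return token


def js_ints_to_json(code):
    # Simplification: numbers inside string literals would also be rewritten.
    return re.sub(r'\b(?:0[xX][0-9a-fA-F]+|0[0-7]+|\d+)\b',
                  lambda m: js_int_to_json(m.group(0)), code)
```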
felix bd1e484448
[utils] js_to_json: various improvements
now JS object literals like { /* " */ 0: ",]\xaa<\/p>", } will be correctly converted to JSON.
8 years ago
Yen Chi Hsuan 7581bfc958
[utils] Unquote credentials passed to SOCKS proxies
Fixes #9450
8 years ago
Yen Chi Hsuan 778a1ccca7
[utils] Add Œ and œ found in French to ACCENT_CHARS
Fixes #9463
8 years ago
Yen Chi Hsuan 702ccf2dc0
[compat] Rename shlex_quote and remove unused subprocess_check_output 8 years ago
Yen Chi Hsuan edaa23f822
[compat] Rename struct_(un)pack to compat_struct_(un)pack 8 years ago
Yen Chi Hsuan d5ae6bb501
[utils] Add rationale for register_socks_protocols 8 years ago
Yen Chi Hsuan 51fb4995a5
[utils] Register SOCKS protocols in urllib and support SOCKS4A 8 years ago
Yen Chi Hsuan 71aff18809
[socks] Support SOCKS proxies 8 years ago
Yen Chi Hsuan dab0daeeb0
[utils,compat] Move struct_pack and struct_unpack to compat.py 8 years ago
Sergey M․ abc97b5eda
[utils] Allow empty attribute values in get_element_by_attribute (Closes #9415) 8 years ago
Adam Thalhammer c587cbb793 improved performance by extracting accented chars to top level 8 years ago
Adam Thalhammer 79a2e94e79 Instead of replacing accented characters with an underscore when sanitizing file names in restricted mode, replace them with their non-accented equivalents fixes #9347 8 years ago
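The accented-character commits (module-level table; Œ/œ; Hungarian Ő and Ű) describe a lookup from accented letters to ASCII equivalents, with "_" used only as a last resort. A sketch follows. `ACCENT_CHARS` here is a hypothetical subset of the table, and `restrict_char`/`sanitize_restricted` are illustrative names that handle only non-ASCII characters.

```python
# Hypothetical subset of an accent table, defined once at module level;
# ligatures map to two letters.
ACCENT_CHARS = {
    'À': 'A', 'Á': 'A', 'Â': 'A', 'Ä': 'A', 'Æ': 'AE', 'Ç': 'C', 'È': 'E', 'É': 'E',
    'Œ': 'OE', 'Ő': 'O', 'Ű': 'U',
    'à': 'a', 'á': 'a', 'â': 'a', 'ä': 'a', 'æ': 'ae', 'ç': 'c', 'è': 'e', 'é': 'e',
    'œ': 'oe', 'ő': 'o', 'ű': 'u',
}


def restrict_char(char):
    """Map one character for an ASCII-only (restricted) filename."""
    if char in ACCENT_CHARS:
        return ACCENT_CHARS[char]
    if ord(char) > 127:
        return '_'  # no known equivalent
    return char


def sanitize_restricted(s):
    return ''.join(restrict_char(c) for c in s)
```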
Sergey M․ eb9ee19422
[utils] Allow None mimetypes in mimetype2ext 8 years ago
Sergey M b6c0d4f431 Merge pull request #9110 from remitamine/parse_duration
[utils] improve parse_duration to handle more formats
8 years ago
remitamine acaff49575 [utils] improve parse_duration to handle more formats 8 years ago
Yen Chi Hsuan cacd996662 [utils] Don't touch URLs if not necessary
Fix test_Generic_15 (Google redirect)
8 years ago
Jaime Marquínez Ferrándiz 5bf28d7864 [utils] dfxp2srt: add additional namespace
Used by the ZDF subtitles (#9081).
8 years ago
Sergey M․ 15d260ebaa [utils] Use update_Request in http_request 8 years ago
Sergey M․ ed0291d153 [utils] Add update_Request 8 years ago
Sergey M․ 17bcc626bf [utils] Extract sanitize_url routine 8 years ago
Sergey M․ 15707c7e02 [compat] Add compat_urllib_parse_urlencode and eliminate encode_dict
encode_dict functionality has been improved and moved directly into compat_urllib_parse_urlencode
All occurrences of compat_urllib_parse.urlencode throughout the codebase have been replaced by compat_urllib_parse_urlencode

Closes #8974
8 years ago
Yen Chi Hsuan 622d19160b [utils] Clarify Python versions affected by buggy struct module 8 years ago
Yen Chi Hsuan efbed08dc2 [utils] Encode hostnames before passing to urllib
With IDN (Internationalized Domain Name) and a proxy, non-ascii URLs
are passed down to urllib/urllib2, causing UnicodeEncodeError

Fixes #8890
8 years ago
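Following the reasoning in the IDN commit, a URL becomes safe for urllib when two conditions hold: the host name is punycode-encoded and the non-ASCII path, query and fragment characters are percent-encoded. A sketch of that idea follows, built on Python 3's urllib.parse. The constant `_SAFE` is an assumed set of delimiters to keep, and this is not youtube-dl's escape_url.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left untouched by quote(): URL delimiters and existing %-escapes.
_SAFE = "/%;:@&=+$,!~*'()"


def escape_url(url):
    """Return an ASCII-only URL that urllib can send without UnicodeEncodeError."""
    parts = urlsplit(url)
    return urlunsplit((
        parts.scheme,
        # IDN host names become their punycode ("xn--") form.
        parts.netloc.encode('idna').decode('ascii'),
        quote(parts.path, safe=_SAFE),
        quote(parts.query, safe=_SAFE + '?'),
        quote(parts.fragment, safe=_SAFE + '?#'),
    ))
```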
Jaime Marquínez Ferrándiz 782b1b5bd1 [utils] lookup_unit_table: Match word boundary instead of end of string 8 years ago
Jaime Marquínez Ferrándiz 09fc33198a utils: lookup_unit_table: Use a stricter regex
In parse_count multiple units start with the same letter, so it would match different units depending on the order they were sorted when iterating over them.
8 years ago
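The ordering problem described in that commit message disappears once each unit must end at a word boundary, so that "k" cannot match the first letter of "kk". The following sketch uses a hypothetical unit table passed in by the caller. Note that `\b` assumes units end in a word character.

```python
import re


def lookup_unit_table(unit_table, s):
    """Parse '<number><unit>' at the start of s using unit_table multipliers."""
    units_re = '|'.join(re.escape(u) for u in unit_table)
    # The trailing \b forces backtracking to a longer unit ('kk') when a
    # shorter one ('k') is followed by more letters, whatever the dict order.
    m = re.match(
        r'(?P<num>[0-9]+(?:[,.][0-9]*)?)\s*(?P<unit>%s)\b' % units_re, s)
    if not m:
        return None
    num = float(m.group('num').replace(',', '.'))
    return int(num * unit_table[m.group('unit')])
```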
Sergey M․ 810c10baa1 [utils] Use compat_xpath 8 years ago
Sergey M․ c5229f3926 [utils] PEP 8 8 years ago
remitamine 83548824c2 Merge pull request #8092 from bpfoley/twitter-thumbnail
[utils] Add extract_attributes for extracting html tag attributes
8 years ago
Sergey M․ 2f7ae819ac [utils] PEP 8 8 years ago
Sergey M․ fb47597b09 [bbc] Generalize unit table lookup and add parse_count 8 years ago
Yen Chi Hsuan 25cb05bda9 [utils] Remove codec2ext
This function was originally used for determining file extensions of DASH
formats. Now in DASH, ext is determined by mime_type. See #8766 for more
information.
8 years ago
Yen Chi Hsuan 6d210f2090 [utils] Add more codecs to codec2ext
BBC uses avc3. Here's an example (thanks to @remitamine for this example)

http://rdmedia.bbc.co.uk/dash/ondemand/bbb/2/client_manifest-common_init.mpd

See also https://trac.ffmpeg.org/ticket/5217
8 years ago
Yen Chi Hsuan 19a17d4623 [utils] Add codec2ext 8 years ago
Jaime Marquínez Ferrándiz 3233a68fbb [utils] update_url_query: Encode the strings in the query dict
The test case with {'test': '第二行тест'} was failing on python 2 (the non-ascii characters were replaced with '?').
8 years ago
remitamine 1255733945 Merge pull request #8739 from remitamine/update_url_params
[utils] add update_url_query function to create or update query string params
8 years ago
remitamine 38f9ef31dc [utils] add update_url_query function 8 years ago
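update_url_query creates or replaces query-string parameters. A sketch of the idea in Python 3 follows, where `urlencode` percent-encodes non-ASCII values as UTF-8 (the Python 2 problem noted in the encoding fix does not arise). This is an illustration only, not the project's implementation.

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse


def update_url_query(url, query):
    """Return url with the keys in `query` added to or replaced in its query string."""
    if not query:
        return url
    parsed = urlparse(url)
    # parse_qs yields lists; keep_blank_values preserves parameters like 'x='.
    qs = parse_qs(parsed.query, keep_blank_values=True)
    qs.update(query)
    # doseq=True expands the lists from parse_qs; plain strings stay single values.
    return urlunparse(parsed._replace(query=urlencode(qs, doseq=True)))
```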
Yen Chi Hsuan 0cae023b24 Merge branch 'jython-support'
Closes #8302
8 years ago
Yen Chi Hsuan 8ee239e921 [utils] Jython support - handle filenames correctly
Now test:youtube downloads
8 years ago
Brian Foley 8bb56eeeea [utils] Add extract_attributes for extracting html tag attributes
This is much more robust than just using regexps, and handles all
the common scenarios, such as empty/no values, repeated attributes,
entity decoding, mixed case names, and the different possible value
quoting schemes.
8 years ago
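An HTML parser already covers the cases listed in that commit message: empty or missing values, repeated attributes, entity decoding, mixed-case names and all quoting styles. The following is a sketch built on the standard library's `html.parser.HTMLParser`. `_AttrParser` is a hypothetical helper name.

```python
from html.parser import HTMLParser


class _AttrParser(HTMLParser):
    """Collects the attributes of the start tag it is fed."""

    def __init__(self):
        super().__init__()
        self.attrs = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases names, decodes entities and handles every
        # quoting style; valueless attributes arrive as None, and later
        # duplicates overwrite earlier ones in the dict.
        self.attrs = dict(attrs)


def extract_attributes(html_element):
    parser = _AttrParser()
    parser.feed(html_element)
    parser.close()
    return parser.attrs
```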
remitamine e07237f640 [utils] remove check for val from find_xpath_attr 8 years ago
Yen Chi Hsuan 5eb6bdced4 [utils] Multiple changes to base_n()
1. Renamed to encode_base_n()
2. Allow tables longer than 62 characters
3. Raise ValueError instead of AssertionError for invalid input data
4. Return the first character in the table instead of '0' for number 0
5. Add tests
8 years ago
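The five encode_base_n changes listed in that commit can be expressed in one small function. This is a sketch written from the list, not the code from utils.py.

```python
def encode_base_n(num, n, table=None):
    """Encode a non-negative integer in base n using the given digit table."""
    FULL_TABLE = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
    if not table:
        table = FULL_TABLE[:n]
    # Invalid input raises ValueError rather than failing an assert.
    if n > len(table):
        raise ValueError('base %d exceeds table length %d' % (n, len(table)))
    if num < 0:
        raise ValueError('negative numbers are not supported')
    if num == 0:
        return table[0]  # first table character, not necessarily '0'
    ret = ''
    while num:
        ret = table[num % n] + ret
        num //= n
    return ret
```

Custom tables may be longer than 62 characters, since only `n <= len(table)` is required.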
Yen Chi Hsuan 680079be39 [utils] Relaxing regex in decode_packed_codes for vidzi 8 years ago
Yen Chi Hsuan f52354a889 [utils] Move codes for handling eval() from iqiyi.py 8 years ago
Yen Chi Hsuan 59f898b7a7 [utils] Merge base_n functions 8 years ago
Yen Chi Hsuan 481888294d [utils] Add base36 for use in Vidzi 8 years ago
Yen Chi Hsuan 81bdc8fdf6 [utils] Move base62 to utils 8 years ago
Sergey M․ f160785c5c [utils] Remove AM/PM from unified_strdate patterns 8 years ago
Yen Chi Hsuan b95dc034ca [utils] Implement cache for OnDemandPagedList 8 years ago
remitamine cafcf657a4 add more subtitles mime types to mimetype2ext and fix the platform subtitle extraction 8 years ago
Yen Chi Hsuan c1c05c67ea [utils] Jython support - disable setproctitle() until ctypes is complete 8 years ago
Yen Chi Hsuan 399a76e67b [utils] Jython support: tolerate missing fcntl module 8 years ago
Jaime Marquínez Ferrándiz 765ac263db [utils] mimetype2ext: return 'm4a' for 'audio/mp4' (fixes #8620)
The youtube extractor was using 'mp4' for them, therefore filters like 'bestaudio[ext=m4a]' stopped working (94278f7202 broke it).
8 years ago
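mimetype2ext needs a few full-type special cases (such as audio/mp4 mapping to m4a, as this fix requires) before falling back to the subtype. In the following sketch, the entries are an illustrative subset drawn from types mentioned in this log (audio/mpeg, quicktime, m3u8/f4m/mpd manifests); they are not the project's full table.

```python
def mimetype2ext(mt):
    """Map a MIME type to a file extension (illustrative subset only)."""
    if mt is None:
        return None
    # Full types that need a special case, e.g. audio/mp4 -> m4a rather than mp4.
    ext = {
        'audio/mp4': 'm4a',
        'audio/mpeg': 'mp3',
    }.get(mt)
    if ext is not None:
        return ext
    # Otherwise look at the subtype, dropping parameters such as '; codecs=...'.
    subtype = mt.rpartition('/')[2].split(';')[0].strip().lower()
    return {
        'x-mpegurl': 'm3u8',
        'vnd.apple.mpegurl': 'm3u8',
        'dash+xml': 'mpd',
        'f4m+xml': 'f4m',
        'quicktime': 'mov',
    }.get(subtype, subtype)
```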
Yen Chi Hsuan 5bc880b988 [utils] Add OHDave's RSA encryption function 8 years ago
Sergey M․ 611c1dd96e [refactor] Single quotes consistency 8 years ago
Sergey M․ d800609c62 [refactor] Do not specify redundant None as second argument in dict.get() 8 years ago
Sergey M․ 9c7b38981c [utils] Bump Firefox version in User-Agent
Old version number causes Youtube not to serve some formats in ytplayer.config
8 years ago
Sergey M․ 8411229bd5 [utils] Allow dot in strip_jsonp 8 years ago
Sergey M․ 86296ad2cd [utils] Add ability to control skipping false values in dict_get 8 years ago
Sergey M․ cbecc9b903 [utils] Add dict_get convenience method 8 years ago
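The dict_get commits (convenience method, then optional skipping of false values) suggest a helper that returns the first usable value among several candidate keys. The following is a sketch of that behaviour, with the `skip_false_values` switch modelled on the later commit.

```python
def dict_get(d, key_or_keys, default=None, skip_false_values=True):
    """Return the first present, non-None (and optionally truthy) value."""
    if isinstance(key_or_keys, (list, tuple)):
        for key in key_or_keys:
            if key not in d or d[key] is None:
                continue
            if skip_false_values and not d[key]:
                continue
            return d[key]
        return default
    return d.get(key_or_keys, default)
```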
Jaime Marquínez Ferrándiz 87de7069b9 [utils] dfxp2srt: make TTMLPElementParser inherit from object
For consistency between python 2 and 3.
8 years ago
remitamine 2b14cb566f [utils] fix dfxp2srt text extraction(fixes #8055) 8 years ago
Yen Chi Hsuan a0d8d704df [utils] Reorder items in mimetype2ext alphabetically 8 years ago
Yen Chi Hsuan f6861ec96f [utils] Add more items to mimetype2ext (#8293)
These are used in Youtube formats
8 years ago
remitamine 6ec6cb4e95 Revert "fix typos"
This reverts commit 36a0e46c39.
8 years ago
remitamine 36a0e46c39 fix typos 8 years ago
Jakub Wilk dfb1b1468c Fix typos
Closes #8200.
8 years ago
Sergey M․ a7aaa39863 [utils] Extract known extensions for reuse 8 years ago
Yen Chi Hsuan c047270c02 [utils] Remove Content-encoding from headers after decompression
With cn_verification_proxy, our http_response() is called twice, one from
PerRequestProxyHandler.proxy_open() and another from normal
YoutubeDL.urlopen(). As a result, for proxies honoring Accept-Encoding, the
following bug occurs:

$ youtube-dl -vs --cn-verification-proxy https://secure.uku.im:993 "test:letv"
[debug] System config: []
[debug] User config: []
[debug] Command-line args: ['-vs', '--cn-verification-proxy', 'https://secure.uku.im:993', 'test:letv']
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2015.12.23
[debug] Git HEAD: 97f18fa
[debug] Python version 3.5.1 - Linux-4.3.3-1-ARCH-x86_64-with-arch-Arch-Linux
[debug] exe versions: ffmpeg 2.8.4, ffprobe 2.8.4, rtmpdump 2.4
[debug] Proxy map: {}
[TestURL] Test URL: http://www.letv.com/ptv/vplay/22005890.html
[Letv] 22005890: Downloading webpage
[Letv] 22005890: Downloading playJson data
ERROR: Unable to download JSON metadata: Not a gzipped file (b'{"') (caused by OSError('Not a gzipped file (b\'{"\')',)); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
  File "/home/yen/Executables/Multimedia/youtube-dl/youtube_dl/extractor/common.py", line 330, in _request_webpage
    return self._downloader.urlopen(url_or_request)
  File "/home/yen/Executables/Multimedia/youtube-dl/youtube_dl/YoutubeDL.py", line 1886, in urlopen
    return self._opener.open(req, timeout=self._socket_timeout)
  File "/usr/lib/python3.5/urllib/request.py", line 471, in open
    response = meth(req, response)
  File "/home/yen/Executables/Multimedia/youtube-dl/youtube_dl/utils.py", line 773, in http_response
    raise original_ioerror
  File "/home/yen/Executables/Multimedia/youtube-dl/youtube_dl/utils.py", line 761, in http_response
    uncompressed = io.BytesIO(gz.read())
  File "/usr/lib/python3.5/gzip.py", line 274, in read
    return self._buffer.read(size)
  File "/usr/lib/python3.5/gzip.py", line 461, in read
    if not self._read_gzip_header():
  File "/usr/lib/python3.5/gzip.py", line 409, in _read_gzip_header
    raise OSError('Not a gzipped file (%r)' % magic)
9 years ago
Sergey M․ 9b9c5355e4 Rename error_to_str to error_to_compat_str 9 years ago
Sergey M․ 8e60dc7526 [utils] Add encode_compat_str 9 years ago
Sergey M․ fdae235858 [utils] Add error_to_str 9 years ago
Yen Chi Hsuan db2fe38b55 [utils] Support alternative timestamp format in TTML
Fixes #7608
9 years ago
Yen Chi Hsuan d631d5f9f2 [utils] Fix TTML conversion
Tolerate invalid timestamps (closes #7909)
9 years ago
Sergey M․ 31b2051e21 [utils] Add remove_quotes 9 years ago
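A remove_quotes helper strips one matching pair of surrounding quotes. The following is a minimal sketch, assuming both single and double quotes count.

```python
def remove_quotes(s):
    """Strip one pair of matching surrounding quotes, if present."""
    if s is None or len(s) < 2:
        return s
    for quote in ('"', "'"):
        if s[0] == quote and s[-1] == quote:
            return s[1:-1]
    return s
```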
Yen Chi Hsuan 992fc9d6e1 [utils] Refactor handle_youtubedl_headers for future extension 9 years ago
Yen Chi Hsuan 0424ec307b [utils] Correct docstring of YoutubeDLHandler 9 years ago
Yen Chi Hsuan 87f0e62d94 [utils] Separate codes for handling Youtubedl-* headers 9 years ago
Sergey M․ 67dda51722 Rename compat_urllib_request_Request to sanitized_Request and move to utils 9 years ago
Sergey M․ 9cb9a5df77 [utils] Check ext with trailing slash against the list of known extensions 9 years ago
Sergey M․ 3e12bc583a [utils] Improve determine_ext (Closes #7593) 9 years ago
Sergey M․ 7e1f5447e7 [utils] Improve encode_dict 9 years ago
Sergey M․ 7a3f0c00ad [utils] Style 9 years ago
Sergey M․ 7aefc49c40 [utils] Skip invalid/non HTML entities (Closes #7518) 9 years ago
Jaime Marquínez Ferrándiz 6a75040278 [utils] unified_strdate: Return None if the date format can't be recognized (fixes #7340)
This issue was introduced in ae12bc3ebb, which made it return the string 'None'.
9 years ago
Sergey M․ c90d16cf36 [utils:sanitize_path] Disallow trailing whitespace in path segment (Closes #7332) 9 years ago
Sergey M 30eecc6a04 Merge pull request #7296 from jaimeMF/xml_attrib_unicode
Use a wrapper around xml.etree.ElementTree.fromstring in python 2.x (…
9 years ago
Sergey M․ ae12bc3ebb [utils] Make unified_strdate always return unicode string 9 years ago
Sergey M․ 578c074575 [utils] Support list of xpath in xpath_element 9 years ago
Sergey M․ 52c3a6e49d [utils] Improve parse_iso8601 9 years ago
Jaime Marquínez Ferrándiz f78546272c [compat] compat_etree_fromstring: also decode the text attribute
Deletes parse_xml from utils, because it also does it.
9 years ago
Jaime Marquínez Ferrándiz 36e6f62cd0 Use a wrapper around xml.etree.ElementTree.fromstring in python 2.x (#7178)
Attributes aren't unicode objects, so they couldn't be directly used in info_dict fields (for example '--write-description' doesn't work with bytes).
9 years ago
Sergey M․ d01949dc89 [utils:js_to_json] Fix bad escape in double quoted strings 9 years ago
Yen Chi Hsuan 1e399778ee [letv] Fix extraction
Using data URIs for passing the decrypted M3U8 manifest, which is
supported by ffmpeg only.
9 years ago
Sergey M․ af98f8ff37 [utils] Return default on fail in int_or_none 9 years ago
Sergey M․ caf80631f0 [utils] Do not fail in float_or_none on non-numeric data 9 years ago
Sergey M․ 1812afb7b3 [utils] Do not fail in int_or_none on non-numeric data (Closes #7175) 9 years ago
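The int_or_none/float_or_none commits make the helpers return a default instead of raising on non-numeric input. The following sketch shows that behaviour. The `scale`/`invscale` parameters are a simplified assumption, and extra options of the real helpers are omitted.

```python
def int_or_none(v, scale=1, default=None, invscale=1):
    """int(v) * invscale // scale, or default for None, '' or non-numeric v."""
    if v is None or v == '':
        return default
    try:
        return int(v) * invscale // scale
    except (ValueError, TypeError):
        return default


def float_or_none(v, scale=1, invscale=1, default=None):
    """float(v) * invscale / scale, or default for None or non-numeric v."""
    if v is None:
        return default
    try:
        return float(v) * invscale / scale
    except (ValueError, TypeError):
        return default
```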
Sergey M․ 5a1a2e9454 [utils] Fix kwargs on old python 2 (Closes #6905) 9 years ago
Sergey M․ e28034c5ac [utils] Comment out cookie processing until Travis results and more testing are in 9 years ago
Sergey M․ 266e466ee4 [utils] Simplify cookie processor 9 years ago
Sergey M․ 1639282434 [utils] Add encode_dict 9 years ago
Sergey M․ ad72917274 [utils] Add issue URL in comment for #6457 9 years ago
Sergey M․ a6420bf50c [utils] Add cookie processor for cookie correction (Closes #6769) 9 years ago
Sergey M․ 66e289bab4 [utils] Generalize cli option converters 9 years ago
Sergey M․ 8e636da499 [utils] Improve xpath_text 9 years ago
Sergey M․ 5d2354f177 [utils] Relax attribute key assert 9 years ago
Sergey M․ a41fb80ce1 [utils] Add xpath_element and xpath_attr 9 years ago
Sergey M․ e5e78797e6 [utils] Strict HTTP responses (Closes #6727) 9 years ago
Sergey M․ 5a4d9ddb21 [utils] Percent-encode redirect URL of Location header (Closes #6457) 9 years ago
Sergey M․ 51f267d9d4 [YoutubeDL:utils] Move percent encode non-ASCII URLs workaround to http_request and simplify (Closes #6457) 9 years ago