317 Commits

Author SHA1 Message Date
Sergey M․ 53cd37bac5 [utils] Improve strip_or_none 2019-05-24 00:03:01 +07:00
Jakub Wilk fd35d8cdfd [utils] Transliterate "þ" as "th" (#20897)
Despite visual similarity "þ" is unrelated to "p".
It is normally transliterated as "th":

    $ echo þ-Þ | iconv -t ASCII//TRANSLIT
    th-TH
2019-05-11 01:42:31 +07:00
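
The transliteration described in this commit can be pictured as a small lookup table. This is a minimal sketch, not the project's code: `TRANSLIT` and `transliterate` are hypothetical names, and the table holds only the two letters mentioned in the commit message.

```python
# Hypothetical table: only the two letters named in the commit message.
TRANSLIT = {
    'þ': 'th',  # lowercase thorn
    'Þ': 'TH',  # uppercase thorn
}


def transliterate(text):
    """Replace each character found in TRANSLIT; leave all others unchanged."""
    return ''.join(TRANSLIT.get(ch, ch) for ch in text)


print(transliterate('þ-Þ'))  # th-TH
```
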
Sergey M․ 5e1271c56d [utils] Improve int_or_none and float_or_none (#20403) 2019-03-23 01:08:54 +07:00
Sergey M․ 0dc41787af [utils] Introduce parse_bitrate 2019-03-17 09:07:47 +07:00
Sergey M․ fad4ceb534 [utils] Fix urljoin for paths with non-http(s) schemes 2019-01-20 20:22:19 +07:00
Sergey M․ 25d110be30 [utils] Properly recognize AV1 codec (closes #17506) 2018-09-10 02:37:22 +07:00
Sergey M․ af03000ad5 [utils] Introduce url_or_none 2018-07-21 18:03:58 +07:00
Sergey M․ e9c671d5e8 [utils] Allow JSONP with empty func name (closes #17028) 2018-07-21 12:30:18 +07:00
Enes 85750f8972 [openload] Improve ext extraction 2018-06-02 00:16:22 +07:00
Remita Amine 3bb3ff38a1 [test_utils] add tests for b836118724 2018-05-23 12:20:05 +01:00
Sergey M․ 6cc622327f [utils] Introduce merge_dicts 2018-04-28 02:47:17 +07:00
Sergey M․ 1cc47c6674 [utils] Fix match_str for boolean meta fields 2018-04-24 23:54:49 +07:00
Philipp Hagemeister f226880c6d [tennistv] Add support for tennistv.com 2018-03-14 09:55:21 +01:00
Sergey M․ b871d7e954 [utils] Add parse_resolution 2018-03-02 23:39:04 +07:00
Sergey M․ befa4708fd [utils] Fixup some common URL's typos in sanitize_url (closes #15649) 2018-02-19 22:50:23 +07:00
Sergey M․ c707b1d828 [test_utils] Add tests for malformed JSON handling in js_to_json 2018-01-20 23:00:09 +07:00
Mike Fährmann c384d537f8 [util] Improve scientific notation handling in js_to_json (closes #14789) 2018-01-20 22:54:21 +07:00
Sergey M․ b555ae9bf1 [utils] Add another date format pattern (#14999) 2017-12-16 21:56:16 +07:00
Sergey M․ 056653bbb1 [utils] Add support for zero years and months in parse_duration 2017-10-29 07:04:48 +07:00
Yen Chi Hsuan 3869028ffb [utils] Use bytes-like objects in dfxp2srt
This fixes handling of non-UTF8 TTML subtitles

Closes #14191
2017-09-16 12:18:38 +08:00
Yen Chi Hsuan 95f3f7c20a [utils] Fix unescapeHTML for malformed strings like "&a&quot;" (#13935) 2017-08-19 21:40:53 +08:00
Sergey M․ 5b232f46dc [utils] Skip missing params in cli_bool_option (closes #13865) 2017-08-09 22:28:19 +07:00
Sergey M․ dee2ff1d81 [test_utils] Fix tests under Windows 2017-07-06 00:25:37 +07:00
Yen Chi Hsuan 609ff8ca19 [utils] Support attributes with no values in get_elements_by_attribute() 2017-07-05 23:27:12 +08:00
Sergey M․ b4a3d461e4 [utils] Handle HTMLParseError in extract_attributes (closes #13349) 2017-06-12 01:52:24 +07:00
Sergey M․ 2ae2ffda5e [utils] Improve unified_timestamp 2017-06-11 21:27:22 +07:00
Yen Chi Hsuan 5552c9eb0f [utils] Recognize more patterns in strip_jsonp()
Used in Youku Show pages
2017-05-26 21:58:18 +08:00
Yen Chi Hsuan 0c26548601 [cda] Implement birthday verification (closes #12789) 2017-05-04 16:26:17 +08:00
Sergey M․ deef31955b [utils] Improve unified_timestamp
Seen at http://zaq1.pl/video/xev0e
2017-04-30 21:45:53 +07:00
Tithen-Firion 9222d94510 [test_utils] Add one more clean_html test 2017-04-28 18:05:14 +02:00
Remita Amine 5b995f713b [utils] add support for ttml styles 2017-04-19 14:38:40 +01:00
Sergey M․ a426ef6d78 [test_utils] Do not use dash in env variables' names 2017-03-26 03:22:48 +07:00
Sergey M․ 41c5e60dd5 [test_utils] Fix expand_path tests 2017-03-26 03:07:56 +07:00
Sergey M․ 51098426b8 [utils] Introduce expand_path 2017-03-26 02:30:10 +07:00
Sergey M․ 4b5de77bdb [utils] Process bytestrings in urljoin (closes #12369) 2017-03-06 03:57:46 +07:00
Yen Chi Hsuan f48409c7ac [utils] Add pkcs1pad
Used in daisuki.net (#4738)
2017-02-28 22:10:31 +08:00
Thomas Christlieb 2af12ad9d2 Introduce get_elements_by_class and get_elements_by_attribute utility functions 2017-02-11 17:16:54 +08:00
Sergey M․ 4195096ea8 [utils] Improve comments processing in js_to_json (closes #11947) 2017-02-03 03:04:33 +07:00
Michal Čihař b3ee552e4b [utils] Handle single-line comments in js_to_json 2017-02-03 03:04:33 +07:00
Sergey M․ 15846398ca [utils] Improve parse_duration 2017-01-26 23:23:08 +07:00
Sergey M․ cb655f34fb [utils] Add more date formats 2017-01-12 22:39:45 +07:00
Remita Amine 7fe1592073 [common] fix dash codec information for mixed videos and fragment url construction (#11490) 2016-12-20 12:35:03 +01:00
Sergey M․ b0c65c677f [utils] Improve urljoin 2016-12-17 18:49:55 +07:00
Sergey M․ e34c33614d [utils] Add convenience urljoin 2016-12-13 02:23:49 +07:00
Yen Chi Hsuan 582be35847 Update coding style after pycodestyle 2.1.0
In pycodestyle 2.1.0, E305 was introduced, which requires two blank
lines after top level declarations, too.

See https://github.com/PyCQA/pycodestyle/issues/400

See also #10689; thanks @stepshal for first mentioning this issue and
initial patches
2016-11-17 19:45:42 +08:00
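
For readers unfamiliar with E305, the layout it asks for looks roughly like this. The names `helper` and `RESULT` are placeholders chosen for illustration and do not come from the repository.

```python
def helper():
    return 42


# E305 expects the two blank lines above: they separate the end of a
# top-level function or class body from the module-level code that follows.
RESULT = helper()
```
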
Sergey M․ 02dc0a36b7 [utils] Introduce base_url 2016-11-02 02:30:18 +07:00
Sergey M․ c6eed6b8c0 [utils] Lower priority for rare date formats and add tests 2016-09-29 23:52:29 +07:00
Sergey M․ 3e4185c396 [utils] Use native french month names 2016-09-14 23:59:38 +07:00
Sergey M․ f6717dec8a [utils] Improve month_by_name and add tests 2016-09-14 23:59:38 +07:00
Sergey M․ 6562d34a8c [utils] Improve mimetype2ext 2016-09-02 22:57:48 +07:00
Yen Chi Hsuan 70852b47ca [utils] Recognize units with full names in parse_filename
Reference: https://en.wikipedia.org/wiki/Template:Quantities_of_bytes
2016-08-20 00:17:26 +08:00
Yen Chi Hsuan e4659b4547 [utils] Correct octal/hexadecimal number detection in js_to_json 2016-08-19 20:37:17 +08:00
Sergey M․ 13585d7682 [utils] Recognize lowercase units in parse_filesize 2016-08-18 23:32:00 +07:00
Remita Amine 5f2c2b7936 [test_utils] add test for option with not str value 2016-08-13 09:54:12 +01:00
Sergey M․ a8795327ca [utils] Add support for TV Parental Guidelines ratings in parse_age_limit 2016-08-07 20:45:18 +07:00
Yen Chi Hsuan 7dc2a74e0a [utils] Fix unified_timestamp for formats parsed by parsedate_tz() 2016-08-05 11:41:55 +08:00
Yen Chi Hsuan 0b68de3cc1 Merge pull request #8876 from remitamine/html5_media
[extractor/common] add helper method to extract html5 media entries
2016-07-10 23:40:45 +08:00
Yen Chi Hsuan 84c237fb8a [utils] Add get_element_by_class
For #9950
2016-07-06 20:02:52 +08:00
Remita Amine dfaa86b75e [test_utils] add test for smuggling a smuggled url 2016-07-04 21:36:32 +01:00
remitamine 4f3c5e0627 [utils] add helper function for parsing codecs 2016-06-26 14:03:58 +01:00
Yen Chi Hsuan 1143535d76 [utils] Add urshift()
Used in IqiyiIE and LeIE
2016-06-26 15:16:49 +08:00
Sergey M․ 46f59e89ea [utils] Add unified_timestamp 2016-06-25 23:19:18 +07:00
Yen Chi Hsuan 47212f7bcb [utils] Don't transform numbers not starting with a zero
Fix test_Viidea and maybe others
2016-06-16 11:00:54 +08:00
Yen Chi Hsuan 55b2f099c0 [utils] Decode HTML5 entities
Used in test_Vporn_1. Also related to #9270
2016-06-10 15:11:55 +08:00
bzc6p b96f007eeb Added sanitization support for Hungarian letters Ő and Ű 2016-06-02 11:39:32 +02:00
Sergey M․ 46bc9b7d7c [utils] Allow None in remove_{start,end} 2016-05-19 04:31:30 +06:00
Sergey M․ 364cf465dd [test_utils] PEP 8 2016-05-14 20:46:33 +06:00
Sergey M․ 89ac4a19e6 [utils] Process non-base 10 integers in js_to_json 2016-05-14 20:39:58 +06:00
felix bd1e484448 [utils] js_to_json: various improvements
now JS object literals like { /* " */ 0: ",]\xaa<\/p>", } will be correctly converted to JSON.
2016-05-14 20:12:39 +06:00
Yen Chi Hsuan 778a1ccca7 [utils] Add Œ and œ found in French to ACCENT_CHARS
Fixes #9463
2016-05-12 19:48:48 +08:00
Yen Chi Hsuan dab0daeeb0 [utils,compat] Move struct_pack and struct_unpack to compat.py 2016-05-10 14:51:38 +08:00
Adam Thalhammer 31c4448f6e Instead of replacing accented characters with an underscore when sanitizing file names in restricted mode, replace them with their non-accented equivalents fixes #9347 2016-05-02 13:25:12 +10:00
Adam Thalhammer 79a2e94e79 Instead of replacing accented characters with an underscore when sanitizing file names in restricted mode, replace them with their non-accented equivalents fixes #9347 2016-05-02 13:21:39 +10:00
Sergey M b6c0d4f431 Merge pull request #9110 from remitamine/parse_duration
[utils] improve parse_duration to handle more formats
2016-04-21 22:53:16 +07:00
remitamine acaff49575 [utils] improve parse_duration to handle more formats 2016-04-21 16:34:54 +01:00
Jaime Marquínez Ferrándiz eb9c3edd5e [test/utils] Add test for date_from_str 2016-04-09 22:40:05 +02:00
Yen Chi Hsuan 81f36eba88 [test/test_utils] Update for escape_url change (again) 2016-03-23 23:23:26 +08:00
Yen Chi Hsuan 2d60465e44 [test/test_utils] Update for escape_url change 2016-03-23 23:20:28 +08:00
Jaime Marquínez Ferrándiz 782b1b5bd1 [utils] lookup_unit_table: Match word boundary instead of end of string 2016-03-19 11:44:49 +01:00
Sergey M․ c5229f3926 [utils] PEP 8 2016-03-16 21:50:04 +06:00
remitamine 83548824c2 Merge pull request #8092 from bpfoley/twitter-thumbnail
[utils] Add extract_attributes for extracting html tag attributes
2016-03-16 13:16:27 +01:00
Sergey M․ fb47597b09 [bbc] Generalize unit table lookup and add parse_count 2016-03-13 16:27:20 +06:00
remitamine 3201a67f61 [test/test_utils] add more tests for update_url_query 2016-03-03 19:18:57 +01:00
remitamine fb640d0a3d [test/test_utils] add tests for update_url_query 2016-03-03 18:40:05 +01:00
Brian Foley 8bb56eeeea [utils] Add extract_attributes for extracting html tag attributes
This is much more robust than just using regexps, and handles all
the common scenarios, such as empty/no values, repeated attributes,
entity decoding, mixed case names, and the different possible value
quoting schemes.
2016-03-03 10:11:37 +00:00
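
The parser-based approach described in this commit can be sketched with Python 3's standard `html.parser` module. This is an illustration under stated assumptions, not the code from the commit: `_AttrCollector` and `extract_attributes_sketch` are hypothetical names, and the sketch assumes the input is a single opening tag.

```python
from html.parser import HTMLParser


class _AttrCollector(HTMLParser):
    """Hypothetical helper: remembers the attributes of the last start tag."""

    def __init__(self):
        super().__init__()
        self.attrs = {}

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased,
        # values arrive entity-decoded, and a bare attribute has value None.
        # dict() keeps the last value when an attribute is repeated.
        self.attrs = dict(attrs)


def extract_attributes_sketch(html_element):
    """Return the attributes of one HTML tag as a dict (illustrative only)."""
    parser = _AttrCollector()
    parser.feed(html_element)
    parser.close()
    return parser.attrs


print(extract_attributes_sketch('<e X="a &amp; b" y=2 z>'))
```

Delegating to a real parser is what lets one code path cover quoting variants, entity decoding, and valueless attributes, which a single regular expression handles poorly.
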
Yen Chi Hsuan 5eb6bdced4 [utils] Multiple changes to base_n()
1. Renamed to encode_base_n()
2. Allow tables longer than 62 characters
3. Raise ValueError instead of AssertionError for invalid input data
4. Return the first character in the table instead of '0' for number 0
5. Add tests
2016-02-27 03:22:52 +08:00
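
Points 2 to 4 of this commit describe behaviour that a short sketch can make concrete. The function below is an illustration, not the project's implementation: the name `encode_base_n_sketch`, the `(num, n, table=None)` signature, the default table, and the rejection of negative numbers are all assumptions made for the example.

```python
DEFAULT_TABLE = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'


def encode_base_n_sketch(num, n, table=None):
    """Encode a non-negative integer in base n using the given digit table."""
    if not table:
        table = DEFAULT_TABLE[:n]
    # Point 3: invalid input raises ValueError rather than AssertionError.
    if n > len(table):
        raise ValueError('base %d exceeds table length %d' % (n, len(table)))
    if num < 0:
        # Assumption of this sketch: negative numbers are rejected.
        raise ValueError('negative numbers are not supported')
    # Point 4: zero maps to the first character of the table, not to '0'.
    if num == 0:
        return table[0]
    digits = ''
    while num:
        digits = table[num % n] + digits
        num //= n
    return digits


print(encode_base_n_sketch(80, 30))  # 2k
```

Because the table is an ordinary string, point 2 follows naturally: a caller can pass a table of any length, as long as it has at least `n` characters.
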
Sergey M․ f160785c5c [utils] Remove AM/PM from unified_strdate patterns 2016-02-25 00:52:49 +06:00
Yen Chi Hsuan 5bc880b988 [utils] Add OHDave's RSA encryption function 2016-02-20 19:54:58 +08:00
Sergey M․ 8411229bd5 [utils] Allow dot in strip_jsonp 2016-02-07 19:47:09 +06:00
Sergey M․ 86296ad2cd [utils] Add ability to control skipping false values in dict_get 2016-02-07 08:13:04 +06:00
Sergey M․ cbecc9b903 [utils] Add dict_get convenience method 2016-02-07 06:12:53 +06:00
Sergey M․ 6b77d52b1f [test_utils] Add tests for encode_compat_str 2015-12-20 07:07:14 +06:00
Yen Chi Hsuan db2fe38b55 [utils] Support alternative timestamp format in TTML
Fixes #7608
2015-12-19 19:29:51 +08:00
Yen Chi Hsuan d631d5f9f2 [utils] Fix TTML conversion
Tolerate invalid timestamps (closes #7909)
2015-12-19 18:21:42 +08:00
Sergey M․ 31b2051e21 [utils] Add remove_quotes 2015-12-14 21:30:58 +06:00
Sergey M․ 9cb9a5df77 [utils] Check ext with trailing slash against the list of known extensions 2015-11-22 17:27:13 +06:00
Sergey M․ 5035536e3f [test_utils] Add tests for determine_ext 2015-11-22 06:33:52 +06:00
Sergey M․ 7aefc49c40 [utils] Skip invalid/non HTML entities (Closes #7518) 2015-11-16 20:20:16 +06:00
Jaime Marquínez Ferrándiz 6a75040278 [utils] unified_strdate: Return None if the date format can't be recognized (fixes #7340)
This issue was introduced with ae12bc3ebb, it returned 'None'.
2015-11-02 14:08:38 +01:00
Sergey M 30eecc6a04 Merge pull request #7296 from jaimeMF/xml_attrib_unicode
Use a wrapper around xml.etree.ElementTree.fromstring in python 2.x (…
2015-10-31 18:15:21 +00:00
Sergey M․ 578c074575 [utils] Support list of xpath in xpath_element 2015-10-31 22:39:44 +06:00
Sergey M․ 52c3a6e49d [utils] Improve parse_iso8601 2015-10-28 21:40:22 +06:00
Jaime Marquínez Ferrándiz 36e6f62cd0 Use a wrapper around xml.etree.ElementTree.fromstring in python 2.x (#7178)
Attributes aren't unicode objects, so they couldn't be directly used in info_dict fields (for example '--write-description' doesn't work with bytes).
2015-10-25 20:13:16 +01:00
Sergey M․ d01949dc89 [utils:js_to_json] Fix bad escape in double quoted strings 2015-10-20 23:09:51 +06:00
Sergey M․ f71264490c [test_utils] Add tests for cli option converters 2015-09-05 03:07:19 +06:00
Sergey M․ 87f70ab39d [test_utils] Add more tests for xpath 2015-09-05 00:36:16 +06:00
Sergey M․ ee114368ad [utils] Make value optional for find_xpath_attr
This allows selecting particular attributes by name but without specifying the value and similar to xpath syntax `[@attrib]`
2015-08-01 20:22:13 +06:00
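
The optional-value lookup described here maps directly onto the predicates that `xml.etree.ElementTree` supports. The following is a rough sketch rather than the committed code: `find_xpath_attr_sketch` and its parameter names are hypothetical, and values containing quote characters are not handled.

```python
import xml.etree.ElementTree as ET


def find_xpath_attr_sketch(node, xpath, key, val=None):
    """Find the first element matching xpath that carries attribute key.

    With val=None any value is accepted, mirroring the [@attrib] syntax.
    """
    if val is None:
        expr = '%s[@%s]' % (xpath, key)
    else:
        expr = "%s[@%s='%s']" % (xpath, key, val)
    return node.find(expr)


doc = ET.fromstring('<root><n/><n x="a"/><n y="c"/><n y="d"/></root>')
print(find_xpath_attr_sketch(doc, 'n', 'y').attrib)       # {'y': 'c'}
print(find_xpath_attr_sketch(doc, 'n', 'y', 'd').attrib)  # {'y': 'd'}
```
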
Yen Chi Hsuan 9c29bc69f7 [utils] Improve parse_duration
Now dots are parsed. For example '87 Min.'
2015-07-22 23:15:22 +08:00
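
To show what accepting a trailing dot means in practice, here is a deliberately narrow sketch that handles only minute-style inputs such as '87 Min.'. It is not the project's `parse_duration`: the name `parse_minutes_sketch` and the regular expression are assumptions made for this example.

```python
import re

# Assumed pattern: a number, optional whitespace, then "min", "mins",
# "minute" or "minutes", with an optional trailing dot after the short forms.
_MINUTES_RE = re.compile(r'(?i)(\d+(?:\.\d+)?)\s*(?:mins?\.?|minutes?)$')


def parse_minutes_sketch(text):
    """Return the duration in seconds, or None if the text does not match."""
    match = _MINUTES_RE.match(text.strip())
    if match is None:
        return None
    return float(match.group(1)) * 60


print(parse_minutes_sketch('87 Min.'))  # 5220.0
```
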
Yen Chi Hsuan 1b0427e6c4 [utils] Support TTML without default namespace
In a strict sense such TTML is invalid, but Yahoo uses it.
2015-05-19 00:45:01 +08:00
Yen Chi Hsuan 7dff03636a [utils] Support 'dur' field in TTML 2015-05-12 12:47:37 +08:00
Yen Chi Hsuan d39e0f05db [utils] Remove sanitize_url_path_consecutive_slashes()
This function is used only in SohuIE, which is updated to use a new
extraction logic.
2015-05-09 17:37:39 +08:00
Yen Chi Hsuan 0fe2ff78e6 [NBC] Enhance embedURL extraction (closes #2549) 2015-05-04 21:55:04 +08:00
Sergey M․ b3ed15b760 [utils] Add replace_extension 2015-05-02 23:23:06 +06:00
Sergey M․ a4bcaad773 [test_utils] Add tests for prepend_extension 2015-05-02 23:10:48 +06:00
Yen Chi Hsuan bf6427d2fb [ffmpeg] Add dfxp (TTML) subtitles support (#3432, #5146) 2015-04-25 23:18:27 +08:00
Yen Chi Hsuan 0a1603634b [utils] Remove url_infer_protocol 2015-04-08 21:39:34 +08:00
Yen Chi Hsuan 418c5cc3fc [udn] Add new extractor 2015-04-08 17:26:51 +08:00
Sergey M․ 8cf70de428 [test_utils] Add test for unified_strdate 2015-04-04 19:11:01 +06:00
Sergey M․ ba9e68f402 [utils] Drop trailing comma before closing brace 2015-04-04 17:48:55 +06:00
Naglis Jonaitis 91757b0f37 [utils] Escape all HTML entities written in hexadecimal form 2015-03-26 17:15:27 +02:00
Jaime Marquínez Ferrándiz 5379a2d40d [test/utils] Test xpath_text 2015-03-21 14:12:43 +01:00
Sergey M․ 92a4793b3c [utils] Place sanitize url function near other sanitizing functions 2015-03-17 21:34:22 +06:00
Sergey M․ dc03a42537 Merge branch 'sohu_fix' of https://github.com/yan12125/youtube-dl into yan12125-sohu_fix 2015-03-17 21:18:36 +06:00
Sergey M․ 2ebfeacabc [utils] Keep dot and dotdot unmodified (Closes #5171) 2015-03-10 00:50:11 +06:00
Sergey M․ f18ef2d144 [utils] Disallow trailing dot in sanitize_path for a path part 2015-03-08 22:08:48 +06:00
Sergey M․ a2aaf4dbc6 [utils] Add sanitize_path 2015-03-08 20:55:22 +06:00
Yen Chi Hsuan 55969016e9 [utils] Add a function to sanitize consecutive slashes in URLs 2015-03-06 12:43:49 +08:00
Philipp Hagemeister a7440261c5 [utils] Strip leading dots
Fixes #2865, closes #5087
2015-03-02 19:07:19 +01:00
Philipp Hagemeister 3e675fabe0 [airmozilla] Be more tolerant when nonessential items are missing (#5030) 2015-02-26 01:25:00 +01:00
Philipp Hagemeister 5a42414b9c [utils] Prevent hyphen at beginning of filename (Fixes #5035) 2015-02-24 11:38:01 +01:00
Philipp Hagemeister d305dd73a3 [utils] Fix js_to_json
Previously, the runtime could be atrocious for longer inputs.
2015-02-18 23:59:51 +01:00
Philipp Hagemeister 347de4931c [YoutubeDL] Add generic video filtering (Fixes #4916)
This functionality is intended to eventually encompass the current format filtering.
2015-02-10 03:32:24 +01:00
Philipp Hagemeister 9bb8e0a3f9 [wsj] Add new extractor (Fixes #4854) 2015-02-03 10:58:28 +01:00
Philipp Hagemeister 8f4b58d70e [ntvde] Add new extractor (Fixes #4850) 2015-02-02 21:48:54 +01:00
Philipp Hagemeister cfb56d1af3 Add --list-thumbnails 2015-01-25 02:43:19 +01:00
Philipp Hagemeister 61ca9a80b3 [generic] Add support for BOMs (Fixes #4753) 2015-01-23 01:21:30 +01:00
Naglis Jonaitis a69801e2c6 [utils] Add additional format to unified_strdate 2015-01-14 00:16:34 +02:00
Sergey M․ a5fb718c50 [test_utils] Add more tests for parse_duration 2015-01-12 21:39:58 +06:00
Philipp Hagemeister 2aeb06d6dc [utils] Improve colon handling (Fixes #4683) 2015-01-11 17:40:45 +01:00
Philipp Hagemeister 0590062925 Respect age_limit when listing extractors (Fixes #4653) 2015-01-07 07:20:20 +01:00
Philipp Hagemeister cae97f6521 Improve and test ffmpeg version detection 2014-12-14 21:59:59 +01:00
Philipp Hagemeister 42bdd9d051 [cinchcast] Add new extractor (Fixes #4428) 2014-12-12 02:57:36 +01:00
Philipp Hagemeister 47d7c64274 [test_utils] Make test more realistic (#4377) 2014-12-06 12:36:23 +01:00
Philipp Hagemeister 5f9b83944d [ffmpeg] Improve version check and call it from hls (Fixes #4377) 2014-12-06 12:14:26 +01:00
Philipp Hagemeister e8df5cee12 [minhateca] Fix duration parsing 2014-12-04 17:35:40 +01:00
Philipp Hagemeister 4349c07dd7 [minhateca] Add extractor (Fixes #4094) 2014-12-04 17:02:05 +01:00
Philipp Hagemeister e075a44afb [tests] Remove useless u prefixes 2014-11-26 13:07:32 +01:00
Philipp Hagemeister be64b5b098 [xminus] Simplify and extend (#4302) 2014-11-25 09:54:54 +01:00
Jouke Waleson 8bcc875676 PEP8: more applied 2014-11-23 21:20:46 +01:00
Jouke Waleson 5f6a1245ff PEP8 applied 2014-11-23 20:41:03 +01:00