Improve URL previews by not including the content of media tags in the generated description. (#12887)

This commit is contained in:
reivilibre 2022-05-26 16:07:27 +01:00 committed by GitHub
parent 9385cd0633
commit 317248d42c
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
2 changed files with 10 additions and 1 deletions

1
changelog.d/12887.misc Normal file
View File

@ -0,0 +1 @@
Improve URL previews by not including the content of media tags in the generated description.

View File

@ -246,7 +246,9 @@ def parse_html_description(tree: "etree.Element") -> Optional[str]:
Grabs any text nodes which are inside the <body/> tag, unless they are within
an HTML5 semantic markup tag (<header/>, <nav/>, <aside/>, <footer/>), or
if they are within a <script/> or <style/> tag.
if they are within a <script/>, <svg/> or <style/> tag, or if they are within
a tag whose content is usually only shown to old browsers
(<iframe/>, <video/>, <canvas/>, <picture/>).
This is a very very very coarse approximation to a plain text render of the page.
@ -268,6 +270,12 @@ def parse_html_description(tree: "etree.Element") -> Optional[str]:
"script",
"noscript",
"style",
"svg",
"iframe",
"video",
"canvas",
"img",
"picture",
etree.Comment,
)