Richard van der Hoff
be31adb036
Fix logcontext leak in media repo
...
Make FileResponder.write_to_consumer uphold the logcontext contract
2018-05-02 16:14:50 +01:00
Richard van der Hoff
dbf6f28d64
Merge pull request #3155 from NotAFile/py3-bytes-1
...
more bytes strings
2018-04-30 00:38:21 +01:00
Richard van der Hoff
aab2e4da60
Merge pull request #3140 from matrix-org/rav/use_run_in_background
...
Use run_in_background in preference to preserve_fn
2018-04-30 00:34:28 +01:00
Richard van der Hoff
9e2601f830
Merge pull request #3108 from NotAFile/py3-six-urlparse
...
Use six.moves.urlparse
2018-04-30 00:33:05 +01:00
Adrian Tschira
e9143b6593
more bytes strings
...
Signed-off-by: Adrian Tschira <nota@notafile.com>
2018-04-29 00:13:57 +02:00
Richard van der Hoff
fc149b4eeb
Merge remote-tracking branch 'origin/develop' into rav/use_run_in_background
2018-04-27 14:31:23 +01:00
Richard van der Hoff
2a13af23bc
Use run_in_background in preference to preserve_fn
...
While I was going through uses of preserve_fn for other PRs, I converted places
which only use the wrapped function once to use run_in_background, to avoid
creating the function object.
2018-04-27 12:55:51 +01:00
Richard van der Hoff
9255a6cb17
Improve exception handling for background processes
...
There were a bunch of places where we fire off a process to happen in the
background, but don't have any exception handling on it - instead relying on
the unhandled error being logged when the relevant deferred gets
garbage-collected.
This is unsatisfactory for a number of reasons:
- logging on garbage collection is best-effort and may happen some time after
the error, if at all
- it can be hard to figure out where the error actually happened.
- it is logged as a scary CRITICAL error which (a) I always forget to grep for
and (b) it's not really CRITICAL if a background process we don't care about
fails.
So this is an attempt to add exception handling to everything we fire off into
the background.
2018-04-27 11:07:40 +01:00
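A minimal sketch of the idea in 9255a6cb17: attach an error handler at the point where the background work is started, so failures are logged promptly at ERROR rather than on garbage collection. It uses asyncio instead of Twisted so that it is self-contained; `run_in_background_logged` and the logger name are hypothetical and are not Synapse's API.

```python
import asyncio
import logging

logger = logging.getLogger("background")


def run_in_background_logged(coro, desc):
    """Schedule `coro` on the running loop and log any failure right away.

    Hypothetical helper for illustration; must be called from a coroutine.
    """
    task = asyncio.get_running_loop().create_task(coro)

    def _log_failure(finished):
        if finished.cancelled():
            return
        # Retrieving the exception here also marks it as handled.
        exc = finished.exception()
        if exc is not None:
            logger.error(
                "Error in background process %s", desc, exc_info=exc
            )

    task.add_done_callback(_log_failure)
    return task
```

The log line names the process that failed, which addresses the "hard to figure out where the error actually happened" point in the commit message.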
Adrian Tschira
2a3c33ff03
Use six.moves.urlparse
...
The imports were shuffled around a bunch in py3
Signed-off-by: Adrian Tschira <nota@notafile.com>
2018-04-15 21:22:43 +02:00
Adrian Tschira
4f40d058cc
Replace old-style raise with six.reraise
...
The old style raise is invalid syntax in python3. As noted in the docs,
this adds one more frame in the traceback, but I think this is
acceptable:
    <ipython-input-7-bcc5cba3de3f> in <module>()
         16 except:
         17     pass
    ---> 18 six.reraise(*x)

    /usr/lib/python3.6/site-packages/six.py in reraise(tp, value, tb)
        691             if value.__traceback__ is not tb:
        692                 raise value.with_traceback(tb)
    --> 693             raise value
        694         finally:
        695             value = None

    <ipython-input-7-bcc5cba3de3f> in <module>()
          9
         10 try:
    ---> 11     x()
         12 except:
         13     x = sys.exc_info()
Also note that this uses six, which is not formally a dependency yet,
but is included indirectly since most packages depend on it.
Signed-off-by: Adrian Tschira <nota@notafile.com>
2018-04-06 23:06:24 +02:00
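To illustrate the extra traceback frame mentioned in 4f40d058cc, here is a stdlib-only sketch. The local `reraise` function is a stand-in modelled on the six source lines quoted in the traceback above, not six itself; `fail` and `capture_and_reraise` are hypothetical names.

```python
import sys


def reraise(tp, value, tb=None):
    """Stand-in for six.reraise on Python 3 (illustrative only)."""
    try:
        if value is None:
            value = tp()
        if value.__traceback__ is not tb:
            raise value.with_traceback(tb)
        raise value
    finally:
        # Break reference cycles between the frame and the traceback.
        value = None
        tb = None


def fail():
    raise ValueError("boom")


def capture_and_reraise():
    try:
        fail()
    except Exception:
        exc_info = sys.exc_info()
    # The Python 2 spelling was: raise exc_info[0], exc_info[1], exc_info[2]
    reraise(*exc_info)
```

Calling `capture_and_reraise()` raises the original `ValueError`; its traceback contains both the `fail` frame and one additional `reraise` frame.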
Erik Johnston
fa72803490
Merge branch 'master' of github.com:matrix-org/synapse into develop
2018-03-19 11:41:01 +00:00
Erik Johnston
926ba76e23
Replace ujson with simplejson
2018-03-15 23:43:31 +00:00
Erik Johnston
92c52df702
Make store_file use store_into_file
2018-02-14 17:55:18 +00:00
Erik Johnston
5fa571a91b
Tell storage providers about new file so they can upload
2018-02-07 13:35:08 +00:00
Erik Johnston
1f881e0746
Merge pull request #2791 from matrix-org/erikj/media_storage_refactor
...
Ensure media is in local cache before thumbnailing
2018-02-05 11:28:52 +00:00
Richard van der Hoff
d5352cbba8
Handle url_previews with no content-type
...
avoid failing with an exception if the remote server doesn't give us a
Content-Type header.
Also, clean up the exception handling a bit.
2018-02-02 00:53:46 +00:00
Matthew Hodgson
ab9f844aaf
Add federation_domain_whitelist option (#2820)
...
Add federation_domain_whitelist
Gives a way to restrict which domains your HS is allowed to federate with.
Useful mainly for gracefully preventing a private but internet-connected HS from trying to federate with the wider public Matrix network.
2018-01-22 19:11:18 +01:00
Richard van der Hoff
b0d9e633ee
Merge pull request #2814 from matrix-org/rav/fix_urlcache_thumbs
...
Use the right path for url_preview thumbnails
2018-01-19 18:57:15 +00:00
Richard van der Hoff
ad7ec63d08
Use the right path for url_preview thumbnails
...
This was introduced by #2627: we were overwriting the original media for url
previews with the thumbnails :/
(fixes https://github.com/vector-im/riot-web/issues/6012, hopefully)
2018-01-19 18:29:39 +00:00
Erik Johnston
cd871a3057
Fix storage provider bug introduced when renamed to store_local
2018-01-18 18:37:59 +00:00
Erik Johnston
8ff6726c0d
Merge pull request #2812 from matrix-org/erikj/media_storage_provider_config
...
Make storage providers configurable
2018-01-18 18:33:57 +00:00
Erik Johnston
3fe2bae857
Missing staticmethod
2018-01-18 17:11:45 +00:00
Erik Johnston
aae77da73f
Fixup comments
2018-01-18 17:11:29 +00:00
Erik Johnston
9a89dae8c5
Fix typo in thumbnail resource causing access times to be incorrect
2018-01-18 15:06:24 +00:00
Erik Johnston
0af5dc63a8
Make storage providers more configurable
2018-01-18 14:07:21 +00:00
Erik Johnston
2cf6a7bc20
Use better file consumer
2018-01-18 12:00:46 +00:00
Erik Johnston
4a53f3a3e8
Ensure media is in local cache before thumbnailing
2018-01-18 12:00:46 +00:00
Erik Johnston
300edc2348
Update last access time when thumbnails are viewed
2018-01-17 10:24:43 +00:00
Erik Johnston
05f98a2224
Keep track of last access time for local media
2018-01-17 10:24:43 +00:00
Erik Johnston
d728c47142
Add docstring
2018-01-17 10:06:14 +00:00
Erik Johnston
d863f68cab
Use local vars
2018-01-16 16:24:15 +00:00
Erik Johnston
6368e5c0ab
Change _generate_thumbnails to take media_type
2018-01-16 16:17:38 +00:00
Erik Johnston
0a90d9ede4
Move setting of file_id up to caller
2018-01-16 16:03:05 +00:00
Erik Johnston
5dfc83704b
Fix typo
2018-01-16 14:32:56 +00:00
Erik Johnston
307f88dfb6
Fix up log lines
2018-01-16 13:53:52 +00:00
Erik Johnston
9795b9ebb1
Correctly use server_name/file_id when generating/fetching remote thumbnails
2018-01-16 12:02:06 +00:00
Erik Johnston
c5b589f2e8
Log when we respond with 404
2018-01-16 12:01:40 +00:00
Erik Johnston
a4c5e4a645
Fix thumbnailing remote files
2018-01-16 11:37:50 +00:00
Erik Johnston
1159abbdd2
Merge pull request #2767 from matrix-org/erikj/media_storage_refactor
...
Refactor MediaRepository to separate out storage
2018-01-16 10:23:50 +00:00
Richard van der Hoff
21bf87a146
Reinstate media download on thumbnail request
...
We need to actually download the remote media when we get a request for a
thumbnail.
2018-01-12 15:38:06 +00:00
Erik Johnston
694f1c1b18
Fix up comments
2018-01-12 15:02:46 +00:00
Erik Johnston
e21370ba54
Correctly reraise exception
2018-01-12 14:44:02 +00:00
Erik Johnston
85a4d78213
Make Responder a context manager
2018-01-12 13:32:03 +00:00
Erik Johnston
dcc8eded41
Add missing class var
2018-01-12 13:16:27 +00:00
Erik Johnston
1e4edd1717
Remove unnecessary condition
2018-01-12 11:28:32 +00:00
Erik Johnston
c6c009603c
Remove unused variables
2018-01-12 11:24:05 +00:00
Erik Johnston
4d88958cf6
Make class var local
2018-01-12 11:23:54 +00:00
Erik Johnston
227c491510
Comments
2018-01-12 11:22:41 +00:00
Erik Johnston
8f03aa9f61
Add StorageProvider concept
2018-01-09 16:16:12 +00:00
Erik Johnston
2442e9876c
Make PreviewUrlResource use MediaStorage
2018-01-09 16:15:07 +00:00
Erik Johnston
9d30a7691c
Make ThumbnailResource use MediaStorage
2018-01-09 16:15:07 +00:00
Erik Johnston
9e20840e02
Use MediaStorage for remote media
2018-01-09 16:15:07 +00:00
Erik Johnston
dd3092c3a3
Use MediaStorage for local files
2018-01-09 16:15:07 +00:00
Erik Johnston
ada470bccb
Add MediaStorage class
2018-01-09 16:15:07 +00:00
Erik Johnston
1ee787912b
Add some helper classes
2018-01-09 16:15:07 +00:00
Erik Johnston
47ca5eb882
Split out add_file_headers
2018-01-09 16:15:07 +00:00
Erik Johnston
b6c9deffda
Remove dead TODO
2018-01-09 15:53:23 +00:00
Erik Johnston
b30cd5b107
Remove dead code related to default thumbnails
2018-01-09 14:38:33 +00:00
Richard van der Hoff
5a4da5bf78
Merge pull request #2697 from matrix-org/rav/fix_urlcache_index_error
...
Fix error on sqlite 3.7
2017-11-27 12:25:48 +00:00
Richard van der Hoff
8132a6b7ac
Fix OPTIONS on preview_url
...
Fixes #2706
2017-11-23 17:52:31 +00:00
Richard van der Hoff
2908f955d1
Check database in has_completed_background_updates
...
so that the right thing happens on workers.
2017-11-22 18:02:15 +00:00
Richard van der Hoff
7098b65cb8
Fix error on sqlite 3.7
...
Create the url_cache index on local_media_repository as a background update, so
that we can detect whether we are on sqlite or not and create a partial or
complete index accordingly.
To avoid running the cleanup job before we have built the index, add a bailout
which will defer the cleanup if the bg updates are still running.
Fixes https://github.com/matrix-org/synapse/issues/2572.
2017-11-21 11:14:17 +00:00
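A rough sketch of the index choice described in 7098b65cb8: build a complete index on sqlite and a partial one elsewhere. The index name, the `created_ts` column and both function names are assumptions made for this example; only the table name and `url_cache` come from the commit message.

```python
import sqlite3


def url_cache_index_sql(is_sqlite):
    """Return the CREATE INDEX statement for the current database engine."""
    # Index name and indexed column are assumed for illustration.
    sql = (
        "CREATE INDEX local_media_repository_url_idx "
        "ON local_media_repository (created_ts)"
    )
    if not is_sqlite:
        # Partial index: only rows that belong to the URL cache.
        sql += " WHERE url_cache IS NOT NULL"
    return sql


def build_index(conn):
    """Create the index, detecting sqlite from the connection type."""
    is_sqlite = isinstance(conn, sqlite3.Connection)
    conn.execute(url_cache_index_sql(is_sqlite))
```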
Richard van der Hoff
5d15abb120
Bit more logging
2017-11-10 16:58:04 +00:00
Richard van der Hoff
46790f50cf
Cache failures in url_preview handler
...
Reshuffle the caching logic in the url_preview handler so that failures are
cached (and to generally simplify things and fix the logcontext leaks).
2017-11-10 16:50:50 +00:00
Maxime Vaillancourt
5287e57c86
Ignore noscript tags when generating URL previews
2017-10-25 20:44:34 -04:00
Richard van der Hoff
eaaabc6c4f
replace 'except:' with 'except Exception:'
...
what could possibly go wrong
2017-10-23 15:52:32 +01:00
Richard van der Hoff
d03cfc4258
Fix a logcontext leak in the media repo
2017-10-23 14:34:27 +01:00
Erik Johnston
bd5718d0ad
Fix typo in thumbnail generation
2017-10-19 10:27:18 +01:00
Krombel
a6245478c8
fix thumbnailing (#2548)
...
in commit 0e28281a
the code for thumbnailing got refactored and the
renaming of these variables was not done correctly.
Signed-Off-by: Matthias Kesler <krombel@krombel.de>
2017-10-17 12:45:33 +02:00
Erik Johnston
1b6b0b1e66
Add try/finally block to close t_byte_source
2017-10-13 15:34:08 +01:00
Erik Johnston
6b725cf56a
Remove old comment
2017-10-13 15:23:41 +01:00
Erik Johnston
2b24416e90
Don't reuse source but instead copy from primary media store to backup
2017-10-13 14:11:34 +01:00
Erik Johnston
b92a8e6e4a
PEP8
2017-10-13 13:58:57 +01:00
Erik Johnston
31aa7bd8d1
Move type into key
2017-10-13 13:47:38 +01:00
Erik Johnston
ad1911bbf4
Comment
2017-10-13 13:47:05 +01:00
Erik Johnston
c021c39cbd
Remove spurious addition
2017-10-13 13:46:53 +01:00
Erik Johnston
1f43d22397
Don't needlessly rename variable
2017-10-13 11:42:07 +01:00
Erik Johnston
a675bd08bd
Add paths back in...
2017-10-13 11:41:06 +01:00
Erik Johnston
4d7e1dde70
Remove unnecessary diff
2017-10-13 11:36:32 +01:00
Erik Johnston
ae5d18617a
Make things be absolute paths again
2017-10-13 11:35:44 +01:00
Erik Johnston
9732ec6797
s/write_to_file/write_to_file_and_backup/
2017-10-13 11:34:41 +01:00
Erik Johnston
0e28281a02
Fix up
2017-10-13 11:33:49 +01:00
Erik Johnston
505371414f
Fix up thumbnailing function
2017-10-13 11:23:53 +01:00
Erik Johnston
e3428d26ca
Fix typo
2017-10-13 10:39:59 +01:00
Erik Johnston
35332298ef
Fix up comments
2017-10-13 10:39:32 +01:00
Erik Johnston
64db043a71
Move makedirs to thread
2017-10-13 10:25:01 +01:00
Erik Johnston
b60859d6cc
Use make_deferred_yieldable
2017-10-13 10:24:19 +01:00
Erik Johnston
d76621a47b
Fix comments
2017-10-12 18:16:25 +01:00
Erik Johnston
4ae85ae121
Don't close prematurely..
2017-10-12 17:57:31 +01:00
Erik Johnston
cc505b4b5e
getvalue closes buffer
2017-10-12 17:52:30 +01:00
Erik Johnston
1259a76047
Get len before close
2017-10-12 17:39:23 +01:00
Erik Johnston
802ca12d05
Don't close file prematurely
2017-10-12 17:37:21 +01:00
Erik Johnston
e283b555b1
Copy everything to backup
2017-10-12 17:31:24 +01:00
Erik Johnston
b77a13812c
Typo
2017-10-12 15:32:32 +01:00
Erik Johnston
6dfde6d485
Remove dead code
2017-10-12 15:30:26 +01:00
Erik Johnston
c8eeef6947
Fix typos
2017-10-12 15:28:24 +01:00
Erik Johnston
67cb89fbdf
Fix typo
2017-10-12 15:23:41 +01:00
Erik Johnston
bf4fb1fb40
Basic implementation of backup media store
2017-10-12 15:20:59 +01:00
Erik Johnston
d5694ac5fa
Only log if we've removed media
2017-09-28 16:08:08 +01:00
Erik Johnston
7cc483aa0e
Clear up expired url cache every 10s
2017-09-28 13:56:53 +01:00
Erik Johnston
e1e7d76cf1
Actually assign result to variable
2017-09-28 13:55:29 +01:00
Erik Johnston
5f501ec7e2
Fix typo in url cache expiry timer
2017-09-28 12:59:01 +01:00
Erik Johnston
ace8079086
Support new and old style media id formats
2017-09-28 12:52:51 +01:00
Erik Johnston
ae79764fe5
Change expires column to expires_ts
2017-09-28 12:37:53 +01:00
Erik Johnston
9ccb4226ba
Delete expired url cache data
2017-09-28 12:18:06 +01:00
Erik Johnston
7fe8ed1787
Store URL cache preview downloads separately
...
This makes it easier to clear old media out at a later date
2017-06-23 11:14:11 +01:00
Erik Johnston
b8b936a6ea
Add API to quarantine media
2017-06-19 17:39:21 +01:00
Erik Johnston
48d2949416
Throw exception when not retrying when downloading media
2017-06-13 10:23:14 +01:00
Matthew Hodgson
836d5c44b6
actually trim oversize og:description meta
2017-05-22 21:14:20 +01:00
Erik Johnston
d12ae7fd1c
Don't log exceptions for NotRetryingDestination
2017-05-15 15:42:18 +01:00
Richard van der Hoff
1d09586599
Address review comments
...
- don't blindly proxy all HTTPRequestExceptions
- log unexpected exceptions at error
- avoid `isinstance`
- improve docs on `from_http_response_exception`
2017-03-14 14:15:37 +00:00
Richard van der Hoff
170ccc9de5
Fix routing loop when fetching remote media
...
When we proxy a media request to a remote server, add a query-param, which will
tell the remote server to 404 if it doesn't recognise the server_name.
This should fix a routing loop where the server keeps forwarding back to
itself.
Also improves the error handling on remote media fetches, so that we don't
always return a rather obscure 502.
2017-03-13 16:30:36 +00:00
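The loop-breaking idea in 170ccc9de5 can be sketched as follows. The parameter name `allow_remote`, the request path and both function names are assumptions for this example; the commit message only says that a query-param is added.

```python
from urllib.parse import urlencode


def remote_download_path(server_name, media_id):
    """Path and query to use when proxying a media request to a remote server."""
    # "allow_remote" is an assumed parameter name for this sketch.
    query = urlencode({"allow_remote": "false"})
    return "/_matrix/media/v1/download/%s/%s?%s" % (server_name, media_id, query)


def decide(own_server_name, requested_server_name, query_args):
    """Decide how a server should treat an incoming media request."""
    if requested_server_name == own_server_name:
        return "serve_local"
    if query_args.get("allow_remote", "true") == "false":
        # The caller is itself a proxy: refuse to forward again and 404.
        return "not_found"
    return "fetch_remote"
```

A server that does not recognise the server_name then answers with a 404 instead of forwarding the request back to where it came from.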
Jurek
aea5461488
Fix dynamic thumbnails aspect
2017-02-24 22:43:27 +01:00
Mark Haines
32019c9897
Log which files we saved attachments to in the media_repository
2017-01-10 14:19:50 +00:00
Erik Johnston
f7085ac84f
Name linearizers for better logs
2017-01-09 17:17:10 +00:00
Marcin Bachry
24c16fc349
Fix crash in url preview when html tag has no text
...
Signed-off-by: Marcin Bachry <hegel666@gmail.com>
2016-12-14 22:38:18 +01:00
Johannes Löthberg
32c8b5507c
preview_url_resource: Ellipsis must be in unicode string
...
Signed-off-by: Johannes Löthberg <johannes@kyriasis.com>
2016-12-01 13:12:13 +01:00
Mark Haines
b1c27975d0
Set CORS headers on responses from the media repo
2016-11-02 11:29:25 +00:00
Erik Johnston
d51b8a1674
Add quotes and be explicit about script-src
2016-09-05 17:35:01 +01:00
Erik Johnston
662b031a30
Allow PDF to be rendered from media repo
2016-09-05 17:25:26 +01:00
Erik Johnston
0af9e1a637
Set `Content-Security-Policy` on media repo
...
This is to inform browsers that they should sandbox the returned
media. This is particularly crucial for javascript/HTML files.
2016-08-17 16:27:39 +01:00
Erik Johnston
f90b3d83a3
Add None check to _iterate_over_text
2016-08-17 15:17:17 +01:00
Erik Johnston
109a560905
Flake8
2016-08-16 14:57:21 +01:00
Erik Johnston
48b5829aea
Fix up preview URL API. Add tests.
...
This includes:
- Splitting out methods of a class into stand alone functions, to make
them easier to test.
- Adding unit tests to split out functions, testing HTML -> preview.
- Handle the fact that elements in lxml may have tail text.
2016-08-16 14:53:24 +01:00
Erik Johnston
5bcccfde6c
Don't include html comments in description
2016-08-05 14:45:11 +01:00
Erik Johnston
b5525c76d1
Typo
2016-08-04 16:10:08 +01:00
Erik Johnston
e97648c4e2
Test summarization
2016-08-04 16:09:09 +01:00
Erik Johnston
58c9653c6b
Don't infer paragraphs from newlines
2016-08-02 18:50:24 +01:00
Erik Johnston
6b58ade2f0
Comment on why we clone
2016-08-02 18:41:22 +01:00
Erik Johnston
9e66c58ceb
Spelling.
2016-08-02 18:37:31 +01:00
Erik Johnston
f83f5fbce8
Make it actually compile
2016-08-02 18:32:42 +01:00
Erik Johnston
aecaec3e10
Change the way we summarize URLs
...
Using XPath is slow on some machines (for unknown reasons), so use a
different approach to get a list of text nodes.
Try to generate a summary that respects paragraph and then word
boundaries, adding ellipses when appropriate.
2016-08-02 18:25:53 +01:00
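One possible reading of the summarization strategy in aecaec3e10, as a sketch: take whole paragraphs first, then fall back to a word boundary and add an ellipsis when the text is still too long. The function name and the `min_size`/`max_size` defaults are assumptions, not values taken from the commit.

```python
def summarize(paragraphs, min_size=200, max_size=500):
    """Build a description from text paragraphs (illustrative sketch)."""
    description = ""
    for paragraph in paragraphs:
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        # Prefer paragraph boundaries: add whole paragraphs at a time.
        description = (
            description + "\n\n" + paragraph if description else paragraph
        )
        if len(description) >= min_size:
            break

    if len(description) <= max_size:
        return description

    # Too long: cut at max_size, then back off to the previous word boundary.
    cut = description[:max_size]
    cut_mid_word = not description[max_size].isspace() and not cut[-1].isspace()
    if cut_mid_word:
        parts = cut.rsplit(None, 1)
        if len(parts) == 2:
            cut = parts[0]
    return cut.rstrip() + "…"
```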
Erik Johnston
f52cb4cd78
Remove race
2016-06-29 15:24:50 +01:00
Erik Johnston
a70688445d
Implement purge_media_cache admin API
2016-06-29 14:57:59 +01:00
Erik Johnston
314b146b2e
Track approximate last access time for remote media
2016-06-29 11:41:20 +01:00
Erik Johnston
09a17f965c
Line lengths
2016-06-15 16:58:12 +01:00
Erik Johnston
1e9026e484
Handle floats as img widths
2016-06-15 16:58:05 +01:00
Erik Johnston
a60169ea09
Handle og props with no content
2016-06-15 16:57:48 +01:00
Erik Johnston
eba4ff1bcb
502 on /thumbnail when can't contact remote server
2016-06-09 11:29:43 +01:00
Mark Haines
eb79110beb
Clean up the blacklist/whitelist handling.
...
Always set the config key with an empty list, even if a list isn't specified.
This means that the codepaths are the same for both the empty list and
for a missing key. Since the behaviour is the same for both cases this
makes the code somewhat easier to reason about.
2016-05-16 13:03:59 +01:00
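The normalisation described in eb79110beb might look like the sketch below: a missing key and an empty list end up on the same code path. The config key names and function names are hypothetical.

```python
def read_ip_range_lists(config):
    """Always return both lists, even when the config omits them."""
    # Key names are hypothetical; "or []" also covers an explicit null value.
    return {
        "blacklist": list(config.get("ip_range_blacklist") or []),
        "whitelist": list(config.get("ip_range_whitelist") or []),
    }


def is_blocked(entry, lists):
    """No special-casing needed: both lists are guaranteed to exist."""
    if entry in lists["whitelist"]:
        return False
    return entry in lists["blacklist"]
```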
Mark Haines
8d7ad44331
Report per request metrics for all of the things using request_handler
2016-04-28 10:57:49 +01:00
Erik Johnston
e8884e5e9c
Add self.media_repo to PreviewUrlResource
2016-04-19 14:51:34 +01:00
Erik Johnston
a7001c311b
_make_dirs was moved to MediaRepository
2016-04-19 14:49:31 +01:00
Erik Johnston
9181e2f4c7
Add store to PreviewUrlResource
2016-04-19 14:48:24 +01:00
Erik Johnston
fb76a81ff7
Reorder imports
2016-04-19 14:45:05 +01:00
Erik Johnston
0c93df89b6
Move MediaRepository to media_repository module
2016-04-19 11:31:43 +01:00
Erik Johnston
43f0941e8f
Split out BaseMediaResource into MediaRepository
...
This is so that a single MediaRepository can be shared across all
resources, rather than having a "copy" per resource.
In particular this allows us to guard against both the thumbnail and
download resource triggering a download of remote content at the same
time.
2016-04-19 11:24:59 +01:00
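A sketch of why a single shared MediaRepository (43f0941e8f) helps: with one object, a per-media lock can stop the thumbnail and download paths from fetching the same remote content twice. This uses asyncio rather than Twisted, and the class body, method name and `fetch` callback are assumptions for illustration.

```python
import asyncio
from collections import defaultdict


class MediaRepository:
    """Shared by all resources, so one lock per media item covers every caller."""

    def __init__(self, fetch):
        self._fetch = fetch  # hypothetical coroutine: (server_name, media_id) -> bytes
        self._locks = defaultdict(asyncio.Lock)
        self._cache = {}

    async def get_remote_media(self, server_name, media_id):
        key = (server_name, media_id)
        async with self._locks[key]:
            # A second caller waits on the lock, then finds the cached result.
            if key not in self._cache:
                self._cache[key] = await self._fetch(server_name, media_id)
            return self._cache[key]
```

With a separate copy per resource, each copy would hold its own locks and the guard would not work across resources.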
Matthew Hodgson
aaabbd3e9e
explicitly pass in the charset from Content-Type to lxml to fix cyrillic woes better
2016-04-15 14:32:25 +01:00
Matthew Hodgson
84f9cac4d0
fix cyrillic URL previews by hardcoding all page decoding to UTF-8 for now, rather than relying on lxml's heuristics which seem to get it wrong
2016-04-15 13:20:08 +01:00
Matthew Hodgson
f78b479118
fix urlparse import thinko breaking tiny URLs
2016-04-14 15:23:55 +01:00