The new parser no longer uses the browser DOM to validate
that a cosmetic filter is valid or not, this is now done
through a JS library, CSSTree.
This means filter list authors will have to be more careful
to ensure that a cosmetic filter is really valid, as there is
no more guarantee that a cosmetic filter which works for a
given browser/version will still work properly on another
browser, or different version of the same browser.
This change has become necessary because of many reasons,
one of them being the flakiness of the previous parser as
exposed by many issues lately:
- https://github.com/uBlockOrigin/uBlock-issues/issues/2262
- https://github.com/uBlockOrigin/uBlock-issues/issues/2228
The new parser introduces breaking changes, there was no way
to do otherwise. Some current procedural cosmetic filters will
be shown as invalid with this change. This occurs because the
CSSTree library gets confused with some syntax which was
previously allowed by the previous parser because it was more
permissive.
Mainly the issue is with the arguments passed to some procedural
cosmetic filters, and these issues can be solved as follow:
Use quotes around the argument. You can use either single or
double-quotes, whichever is most convenient. If your argument
contains a single quote, use double-quotes, and vice versa.
Additionally, try to escape a quote inside an argument using
backslash. THis may work, but if not, use quotes around the
argument.
When the parser encounter quotes around an argument, it will
discard them before trying to process the argument, same with
escaped quotes inside the argument. Examples:
Breakage:
...##^script:has-text(toscr')
Fix:
...##^script:has-text(toscr\')
Breakage:
...##:xpath(//*[contains(text(),"VPN")]):upward(2)
Fix:
...##:xpath('//*[contains(text(),"VPN")]'):upward(2)
There are not many filters which break in the default set of
filter lists, so this should be workable for default lists.
Unfortunately those fixes will break the filter for previous
versions of uBO since these to not deal with quoted argument.
In such case, it may be necessary to keep the previous filter,
which will be discarded as broken on newer version of uBO.
THis was a necessary change as the old parser was becoming
more and more flaky after being constantly patched for new
cases arising, The new parser should be far more robust and
stay robist through expanding procedural cosmetic filter
syntax.
Additionally, in the MV3 version, filters are pre-compiled
using a Nodejs script, i.e. outside the browser, so validating
cosmetic filters using a live DOM no longer made sense.
This new parser will have to be tested throughly before stable
release.
These functions were renamed in 2018, before the WebAssembly 1.0 spec
was finalized. wabt 1.0.25 dropped support for pre-1.0 names and the
sources fail to compile with errors like:
```
$ wat2wasm lz4-block-codec.wat
lz4-block-codec.wat:71:5: error: unexpected token get_local, expected ).
get_local $ilen
^^^^^^^^^
lz4-block-codec.wat:78:5: error: unexpected token get_local.
get_local $ilen
^^^^^^^^^
```
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/2002
The code was testing only the LSB of a 32-bit integer to detect
whether the current rule was a wildcard (`*`), while it had to
compare against the whole 32-bit integer.
The breakage occurred when the LSB of an offset to the character
buffer happened to match the ASCII code of `*` (42, 0x2A).
(An offset is used when a label is longer than 4 characters)
Too many changes to list here, essentially there is now a
user interface setting to enable/disable dark theme, and
I've rearranged a bit the Settings pane as a result and
also altered other visuals in various places.
There are places which I know have not been thoroughly
tested (i.e. logger inspector).
Will fine-tune as per feedback.
Issues with the classic popup panel will not be addressed,
and if feedback is that it has become unusuable, it will be
outright removed.
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/1664
The changes are enough to fulfill the related issue.
A new platform has been added in order to allow for building
a NodeJS package. From the root of the project:
./tools/make-nodejs
This will create new uBlock0.nodejs directory in the
./dist/build directory, which is a valid NodeJS package.
From the root of the package, you can try:
node test
This will instantiate a static network filtering engine,
populated by easylist and easyprivacy, which can be used
to match network requests by filling the appropriate
filtering context object.
The test.js file contains code which is typical example
of usage of the package.
Limitations: the NodeJS package can't execute the WASM
versions of the code since the WASM module requires the
use of fetch(), which is not available in NodeJS.
This is a first pass at modularizing the codebase, and
while at it a number of opportunistic small rewrites
have also been made.
This commit requires the minimum supported version for
Chromium and Firefox be raised to 61 and 60 respectively.
Regex-based static network filters are those most likely to
cause performance degradation, and as such the best guard
against undue performance degradation caused by regex-based
filters is the ability to extract valid and good tokens
from regex patterns.
This commit introduces a complete regex parser so that the
static network filtering engine can now safely extract
tokens regardless of the complexity of the regex pattern.
The regex parser is a library imported from:
https://github.com/foo123/RegexAnalyzer
The syntax highlighter adds an underline to regex-based
filters as a visual aid to filter authors so as to avoid
mistakenly creating regex-based filters. This commit
further colors the underline as a warning when a regex-based
filter is found to be untokenizable.
Filter list authors are invited to spot these untokenizable
regex-based filters in their lists to verify that no
mistake were made for those filters, causing them to be
untokenizabke. For example, what appears to be a mistake:
/^https?:\/\/.*\/sw.js?.[a-zA-Z0-9%]{50,}/
Though the mistake is minor, the regex-based filter above
is untokenizable as a result, and become tokenizable when
the `.` is properly escaped:
/^https?:\/\/.*\/sw\.js?.[a-zA-Z0-9%]{50,}/
Filter list authors can use this search expression in the
asset viewer to find instances of regex-based filters:
/^(@@)?\/[^\n]+\/(\$|$)/
Before this commit, CodeMirror's add-on for search occurrences
was limited to find at most 1000 first occurrences, because of
performance considerations.
This commit removes this low limit by having the search
occurrences done in a dedicated worker. The limit is now
time-based, and highly unlikely to ever be hit under normal
condition.
With this change, all search occurrences are gathered,
and as a result:
- All occurrences are reported in the scrollbar instead of
just the 1,000 first
- The total count of all occurrences is now reported, instead
of capping at "1000+".
- The current occurrence rank at the cursor or selection
position is now reported -- this was not possible to report
this before.
The number of occurrences is line-based, it's not useful to
report finer-grained occurences in uBO.
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/1134
CodeMirror's code folding reference:
- https://codemirror.net/doc/manual.html#addon_foldcode
This commit adds support for code-folding to the filter
list editor/viewer.
The following blocks of code are foldable by clicking the
corresponding marker in the gutter:
- !#if/#endif blocks
- !#include blocks
Addtionally, the following changes:
- The `!#include` line is now preserved when importing a
sublist
- The `!#if` directives will be syntax-colored according
to whether they evaluate to true or false on the current
platform
- Double-clicking on a foldable line in the gutter will
select the content of the foldable block
- Minor visual improvement to matching brackets
This commit adds CodeMirror's auto-completion capability
to the _My filters_ pane.
Currently, auto-completion is available for scriptlet
tokens: pressing ctrl-space while the text cursor is
positioned where a scriptlet token should appear will
cause auto-completion to kick-in. In case of ambiguity,
CodeMirror's widget to pick a specific scriptlet will
appear.
Rename `l` property to `len`, to avoid ambiguity as
`l` could mean _left_ or _length_. Typically `l` is
to be used for _left_ (whereas `r` is to be used for
_right_).
Additionally, add CodeMirror's bracket-matching and
bracket auto-closing to _My filters_ pane and and
bracket-matching to asset viewer page.
The Promise chain was not properly designed for WASM module
loading. This became apparent when removing WASM modules
from Opera build[1].
The problem was that errors thrown by fetch() -- used to
load WASM modules -- were not properly handled.
[1] Opera refuses updating uBO if there are unrecognized file
types in the package, and `.wasm`/`.wat` files are not
recognized by Opera uploader.