New cosmetic filter parser using CSSTree library

The new parser no longer uses the browser DOM to validate
that a cosmetic filter is valid or not, this is now done
through a JS library, CSSTree.

This means filter list authors will have to be more careful
to ensure that a cosmetic filter is really valid, as there is
no more guarantee that a cosmetic filter which works for a
given browser/version will still work properly on another
browser, or different version of the same browser.

This change has become necessary because of many reasons,
one of them being the flakiness of the previous parser as
exposed by many issues lately:

- https://github.com/uBlockOrigin/uBlock-issues/issues/2262
- https://github.com/uBlockOrigin/uBlock-issues/issues/2228

The new parser introduces breaking changes, there was no way
to do otherwise. Some current procedural cosmetic filters will
be shown as invalid with this change. This occurs because the
CSSTree library gets confused with some syntax which was
previously allowed by the previous parser because it was more
permissive.

Mainly the issue is with the arguments passed to some procedural
cosmetic filters, and these issues can be solved as follow:

Use quotes around the argument. You can use either single or
double-quotes, whichever is most convenient. If your argument
contains a single quote, use double-quotes, and vice versa.

Additionally, try to escape a quote inside an argument using
backslash. THis may work, but if not, use quotes around the
argument.

When the parser encounter quotes around an argument, it will
discard them before trying to process the argument, same with
escaped quotes inside the argument. Examples:

Breakage:

    ...##^script:has-text(toscr')

Fix:

    ...##^script:has-text(toscr\')

Breakage:

    ...##:xpath(//*[contains(text(),"VPN")]):upward(2)

Fix:

    ...##:xpath('//*[contains(text(),"VPN")]'):upward(2)

There are not many filters which break in the default set of
filter lists, so this should be workable for default lists.

Unfortunately those fixes will break the filter for previous
versions of uBO since these to not deal with quoted argument.
In such case, it may be necessary to keep the previous filter,
which will be discarded as broken on newer version of uBO.

THis was a necessary change as the old parser was becoming
more and more flaky after being constantly patched for new
cases arising, The new parser should be far more robust and
stay robist through expanding procedural cosmetic filter
syntax.

Additionally, in the MV3 version, filters are pre-compiled
using a Nodejs script, i.e. outside the browser, so validating
cosmetic filters using a live DOM no longer made sense.

This new parser will have to be tested throughly before stable
release.
This commit is contained in:
Raymond Hill 2022-09-23 16:03:13 -04:00
parent fe21ce5002
commit a71b71e4c8
No known key found for this signature in database
GPG Key ID: 25E1490B761470C2
14 changed files with 558 additions and 552 deletions

View File

@ -22,6 +22,8 @@
// For background page or non-background pages
/* global browser */
'use strict';
/******************************************************************************/
@ -89,6 +91,9 @@ vAPI.webextFlavor = {
soup.add('chromium')
.add('user_stylesheet');
flavor.major = parseInt(match[1], 10) || 0;
if ( flavor.major >= 105 ) {
soup.add('native_css_has');
}
}
// Don't starve potential listeners

View File

@ -55,7 +55,7 @@ const outputDir = commandLineArgs.get('output') || '.';
const cacheDir = `${outputDir}/../mv3-data`;
const rulesetDir = `${outputDir}/rulesets`;
const scriptletDir = `${rulesetDir}/js`;
const env = [ 'chromium', 'ubol' ];
const env = [ 'chromium', 'ubol', 'native_css_has' ];
/******************************************************************************/

View File

@ -37,6 +37,7 @@
<div class="li"><span><a href="https://github.com/foo123/RegexAnalyzer" target="_blank">Regular Expression Analyzer</a> by <a href="https://github.com/foo123">Nikos M.</a></span></div>
<div class="li"><span><a href="https://github.com/hsluv/hsluv" target="_blank">HSLuv - Human-friendly HSL</a> by <a href="https://github.com/boronine">Alexei Boronine</a></span></div>
<div class="li"><span><a href="https://searchfox.org/mozilla-central/rev/d317e93d9a59c9e4c06ada85fbff9f6a1ceaaad1/browser/extensions/webcompat/shims/google-ima.js" target="_blank">google-ima.js</a> by <a href="https://www.mozilla.org/">Mozilla</a></span></div>
<div class="li"><span><a href="https://github.com/csstree/csstree" target="_blank">CSSTree</a> by <a href="https://github.com/lahmatiy">Roman Dvornov</a></span></div>
</div>
<div class="li"><span data-i18n="aboutCDNs"></span></div>
<div class="liul">

View File

@ -95,19 +95,6 @@ class PSelectorMatchesCSSTask extends PSelectorTask {
}
}
}
class PSelectorMatchesCSSAfterTask extends PSelectorMatchesCSSTask {
constructor(task) {
super(task);
this.pseudo = '::after';
}
}
class PSelectorMatchesCSSBeforeTask extends PSelectorMatchesCSSTask {
constructor(task) {
super(task);
this.pseudo = '::before';
}
}
class PSelectorMatchesMediaTask extends PSelectorTask {
constructor(task) {
@ -247,6 +234,20 @@ class PSelectorSpathTask extends PSelectorTask {
output.push(node);
}
}
// Helper method for other operators.
static qsa(node, selector) {
const parent = node.parentElement;
if ( parent === null ) { return []; }
let pos = 1;
for (;;) {
node = node.previousElementSibling;
if ( node === null ) { break; }
pos += 1;
}
return parent.querySelectorAll(
`:scope > :nth-child(${pos})${selector}`
);
}
}
class PSelectorUpwardTask extends PSelectorTask {
@ -339,23 +340,20 @@ class PSelector {
constructor(o) {
if ( PSelector.prototype.operatorToTaskMap === undefined ) {
PSelector.prototype.operatorToTaskMap = new Map([
[ ':has', PSelectorIfTask ],
[ ':has-text', PSelectorHasTextTask ],
[ ':if', PSelectorIfTask ],
[ ':if-not', PSelectorIfNotTask ],
[ ':matches-css', PSelectorMatchesCSSTask ],
[ ':matches-css-after', PSelectorMatchesCSSAfterTask ],
[ ':matches-css-before', PSelectorMatchesCSSBeforeTask ],
[ ':matches-media', PSelectorMatchesMediaTask ],
[ ':matches-path', PSelectorMatchesPathTask ],
[ ':min-text-length', PSelectorMinTextLengthTask ],
[ ':not', PSelectorIfNotTask ],
[ ':nth-ancestor', PSelectorUpwardTask ],
[ ':others', PSelectorOthersTask ],
[ ':spath', PSelectorSpathTask ],
[ ':upward', PSelectorUpwardTask ],
[ ':watch-attr', PSelectorWatchAttrs ],
[ ':xpath', PSelectorXpathTask ],
[ 'has', PSelectorIfTask ],
[ 'has-text', PSelectorHasTextTask ],
[ 'if', PSelectorIfTask ],
[ 'if-not', PSelectorIfNotTask ],
[ 'matches-css', PSelectorMatchesCSSTask ],
[ 'matches-media', PSelectorMatchesMediaTask ],
[ 'matches-path', PSelectorMatchesPathTask ],
[ 'min-text-length', PSelectorMinTextLengthTask ],
[ 'not', PSelectorIfNotTask ],
[ 'others', PSelectorOthersTask ],
[ 'spath', PSelectorSpathTask ],
[ 'upward', PSelectorUpwardTask ],
[ 'watch-attr', PSelectorWatchAttrs ],
[ 'xpath', PSelectorXpathTask ],
]);
}
this.raw = o.raw;
@ -374,7 +372,12 @@ class PSelector {
prime(input) {
const root = input || document;
if ( this.selector === '' ) { return [ root ]; }
return Array.from(root.querySelectorAll(this.selector));
let selector = this.selector;
if ( input !== document && /^ [>+~]/.test(this.selector) ) {
return Array.from(PSelectorSpathTask.qsa(input, this.selector));
}
const elems = root.querySelectorAll(selector);
return Array.from(elems);
}
exec(input) {
let nodes = this.prime(input);
@ -453,7 +456,7 @@ class ProceduralFilterer {
let style, styleToken;
if ( selector.action === undefined ) {
style = vAPI.hideStyle;
} else if ( selector.action[0] === ':style' ) {
} else if ( selector.action[0] === 'style' ) {
style = selector.action[1];
}
if ( style !== undefined ) {

View File

@ -429,7 +429,7 @@ FilterContainer.prototype.compileGenericHideSelector = function(
// https://github.com/uBlockOrigin/uBlock-issues/issues/131
// Support generic procedural filters as per advanced settings.
// TODO: prevent double compilation.
if ( compiled !== raw ) {
if ( compiled.charCodeAt(0) === 0x7B /* '{' */ ) {
if ( µb.hiddenSettings.allowGenericProceduralFilters === true ) {
return this.compileSpecificSelector(parser, '', false, writer);
}
@ -830,12 +830,12 @@ FilterContainer.prototype.cssRuleFromProcedural = function(json) {
let mq;
if ( tasks !== undefined ) {
if ( tasks.length > 1 ) { return; }
if ( tasks[0][0] !== ':matches-media' ) { return; }
if ( tasks[0][0] !== 'matches-media' ) { return; }
mq = tasks[0][1];
}
let style;
if ( Array.isArray(action) ) {
if ( action[0] !== ':style' ) { return; }
if ( action[0] !== 'style' ) { return; }
style = action[1];
}
if ( mq === undefined && style === undefined ) { return; }

View File

@ -834,7 +834,10 @@ const startPicker = function() {
$id('candidateFilters').addEventListener('click', onCandidateClicked);
$stor('#resultsetDepth input').addEventListener('input', onDepthChanged);
$stor('#resultsetSpecificity input').addEventListener('input', onSpecificityChanged);
staticFilteringParser = new StaticFilteringParser({ interactive: true });
staticFilteringParser = new StaticFilteringParser({
interactive: true,
nativeCssHas: vAPI.webextFlavor.env.includes('native_css_has'),
});
};
/******************************************************************************/

View File

@ -1721,7 +1721,9 @@ const getURLFilteringData = function(details) {
};
const compileTemporaryException = function(filter) {
const parser = new StaticFilteringParser();
const parser = new StaticFilteringParser({
nativeCssHas: vAPI.webextFlavor.env.includes('native_css_has'),
});
parser.analyze(filter);
if ( parser.shouldDiscard() ) { return; }
return staticExtFilteringEngine.compileTemporary(parser);

View File

@ -134,7 +134,9 @@ const fromNetFilter = async function(rawFilter) {
if ( typeof rawFilter !== 'string' || rawFilter === '' ) { return; }
const writer = new CompiledListWriter();
const parser = new StaticFilteringParser();
const parser = new StaticFilteringParser({
nativeCssHas: vAPI.webextFlavor.env.includes('native_css_has'),
});
parser.setMaxTokenLength(staticNetFilteringEngine.MAX_TOKEN_LENGTH);
parser.analyze(rawFilter);

View File

@ -97,6 +97,7 @@ function addExtendedToDNR(context, parser) {
if ( bad ) { continue; }
if ( hn.endsWith('.*') ) { continue; }
const { compiled, exception } = parser.result;
if ( typeof compiled !== 'string' ) { continue; }
if ( compiled.startsWith('{') ) { continue; }
if ( exception ) { continue; }
let details = context.cosmeticFilters.get(compiled);
@ -126,14 +127,14 @@ function addExtendedToDNR(context, parser) {
/******************************************************************************/
function addToDNR(context, list) {
const env = context.env || [];
const writer = new CompiledListWriter();
const lineIter = new LineIterator(
StaticFilteringParser.utils.preparser.prune(
list.text,
context.env || []
)
StaticFilteringParser.utils.preparser.prune(list.text, env)
);
const parser = new StaticFilteringParser();
const parser = new StaticFilteringParser({
nativeCssHas: env.includes('native_css_has'),
});
const compiler = staticNetFilteringEngine.createCompiler(parser);
writer.properties.set('name', list.name);
@ -180,10 +181,9 @@ function addToDNR(context, list) {
/******************************************************************************/
async function dnrRulesetFromRawLists(lists, options = {}) {
const context = {};
const context = Object.assign({}, options);
staticNetFilteringEngine.dnrFromCompiled('begin', context);
context.extensionPaths = new Map(options.extensionPaths || []);
context.env = options.env;
context.extensionPaths = new Map(context.extensionPaths || []);
const toLoad = [];
const toDNR = (context, list) => addToDNR(context, list);
for ( const list of lists ) {

File diff suppressed because it is too large Load Diff

View File

@ -19,7 +19,7 @@
Home: https://github.com/gorhill/uBlock
*/
/* globals WebAssembly */
/* globals browser, WebAssembly */
'use strict';
@ -973,7 +973,10 @@ self.addEventListener('hiddenSettingsChanged', ( ) => {
// Useful references:
// https://adblockplus.org/en/filter-cheatsheet
// https://adblockplus.org/en/filters
const parser = new StaticFilteringParser({ expertMode });
const parser = new StaticFilteringParser({
expertMode,
nativeCssHas: vAPI.webextFlavor.env.includes('native_css_has'),
});
const compiler = staticNetFilteringEngine.createCompiler(parser);
const lineIter = new LineIterator(
parser.utils.preparser.prune(rawText, vAPI.webextFlavor.env)

19
src/lib/csstree/LICENSE Normal file
View File

@ -0,0 +1,19 @@
Copyright (C) 2016-2022 by Roman Dvornov
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.

File diff suppressed because one or more lines are too long

View File

@ -23,6 +23,7 @@ cp src/js/uri-utils.js $DES/js
cp src/js/url-net-filtering.js $DES/js
mkdir -p $DES/lib
cp -R src/lib/csstree $DES/lib/
cp -R src/lib/punycode.js $DES/lib/
cp -R src/lib/regexanalyzer $DES/lib/
cp -R src/lib/publicsuffixlist $DES/lib/