Fix sanitizer config - multiple rules (#11133)

In #9888, it was reported that my earlier pull request #9075 didn't quite function as expected. I was quite hopeful the `ValuesWithShadow()` worked as expected (and, I thought my testing showed it did) but I guess not. @zeripath proposed an alternative syntax which I like:

```ini
[markup.sanitizer.1]
ELEMENT=a
ALLOW_ATTR=target
REGEXP=something
[markup.sanitizer.2]
ELEMENT=a
ALLOW_ATTR=target
REGEXP=something
```

This was quite easy to adopt into the existing code. I've done so in a semi-backwards-compatible manner:

 - The value from `.Value()` is used for each element.
 - We parse `[markup.sanitizer]` and all `[markup.sanitizer.*]` sections and add them as rules.

This means that existing configs will load one rule (not all rules). It also means people can use string identifiers (`[markup.sanitiser.KaTeX]`) if they prefer, instead of numbered ones.

Co-authored-by: Andrew Thornton <art27@cantab.net>
Co-authored-by: guillep2k <18600385+guillep2k@users.noreply.github.com>
This commit is contained in:
Alexander Scheel 2020-04-29 07:34:59 -04:00 committed by GitHub
parent 6b6f20b6d4
commit 1bf9e44bda
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
4 changed files with 39 additions and 40 deletions

View File

@ -976,8 +976,10 @@ SHOW_FOOTER_VERSION = true
; Show template execution time in the footer ; Show template execution time in the footer
SHOW_FOOTER_TEMPLATE_LOAD_TIME = true SHOW_FOOTER_TEMPLATE_LOAD_TIME = true
[markup.sanitizer] [markup.sanitizer.1]
; The following keys can be used multiple times to define sanitation policy rules. ; The following keys can appear once to define a sanitation policy rule.
; This section can appear multiple times by adding a unique alphanumeric suffix to define multiple rules.
; e.g., [markup.sanitizer.1] -> [markup.sanitizer.2] -> [markup.sanitizer.TeX]
;ELEMENT = span ;ELEMENT = span
;ALLOW_ATTR = class ;ALLOW_ATTR = class
;REGEXP = ^(info|warning|error)$ ;REGEXP = ^(info|warning|error)$

View File

@ -658,7 +658,7 @@ Two special environment variables are passed to the render command:
Gitea supports customizing the sanitization policy for rendered HTML. The example below will support KaTeX output from pandoc. Gitea supports customizing the sanitization policy for rendered HTML. The example below will support KaTeX output from pandoc.
```ini ```ini
[markup.sanitizer] [markup.sanitizer.TeX]
; Pandoc renders TeX segments as <span>s with the "math" class, optionally ; Pandoc renders TeX segments as <span>s with the "math" class, optionally
; with "inline" or "display" classes depending on context. ; with "inline" or "display" classes depending on context.
ELEMENT = span ELEMENT = span
@ -670,7 +670,7 @@ REGEXP = ^\s*((math(\s+|$)|inline(\s+|$)|display(\s+|$)))+
- `ALLOW_ATTR`: The attribute this policy allows. Must be non-empty. - `ALLOW_ATTR`: The attribute this policy allows. Must be non-empty.
- `REGEXP`: A regex to match the contents of the attribute against. Must be present but may be empty for unconditional whitelisting of this attribute. - `REGEXP`: A regex to match the contents of the attribute against. Must be present but may be empty for unconditional whitelisting of this attribute.
You may redefine `ELEMENT`, `ALLOW_ATTR`, and `REGEXP` multiple times; each time all three are defined is a single policy entry. Multiple sanitisation rules can be defined by adding unique subsections, e.g. `[markup.sanitizer.TeX-2]`.
## Time (`time`) ## Time (`time`)

View File

@ -73,7 +73,7 @@ IS_INPUT_FILE = false
If your external markup relies on additional classes and attributes on the generated HTML elements, you might need to enable custom sanitizer policies. Gitea uses the [`bluemonday`](https://godoc.org/github.com/microcosm-cc/bluemonday) package as our HTML sanitizier. The example below will support [KaTeX](https://katex.org/) output from [`pandoc`](https://pandoc.org/). If your external markup relies on additional classes and attributes on the generated HTML elements, you might need to enable custom sanitizer policies. Gitea uses the [`bluemonday`](https://godoc.org/github.com/microcosm-cc/bluemonday) package as our HTML sanitizier. The example below will support [KaTeX](https://katex.org/) output from [`pandoc`](https://pandoc.org/).
```ini ```ini
[markup.sanitizer] [markup.sanitizer.TeX]
; Pandoc renders TeX segments as <span>s with the "math" class, optionally ; Pandoc renders TeX segments as <span>s with the "math" class, optionally
; with "inline" or "display" classes depending on context. ; with "inline" or "display" classes depending on context.
ELEMENT = span ELEMENT = span
@ -86,6 +86,11 @@ FILE_EXTENSIONS = .md,.markdown
RENDER_COMMAND = pandoc -f markdown -t html --katex RENDER_COMMAND = pandoc -f markdown -t html --katex
``` ```
You may redefine `ELEMENT`, `ALLOW_ATTR`, and `REGEXP` multiple times; each time all three are defined is a single policy entry. All three must be defined, but `REGEXP` may be blank to allow unconditional whitelisting of that attribute. You must define `ELEMENT`, `ALLOW_ATTR`, and `REGEXP` in each section.
To define multiple entries, add a unique alphanumeric suffix (e.g., `[markup.sanitizer.1]` and `[markup.sanitizer.something]`).
Once your configuration changes have been made, restart Gitea to have changes take effect. Once your configuration changes have been made, restart Gitea to have changes take effect.
**Note**: Prior to Gitea 1.12 there was a single `markup.sanitiser` section with keys that were redefined for multiple rules, however,
there were significant problems with this method of configuration necessitating configuration through multiple sections.

View File

@ -44,7 +44,7 @@ func newMarkup() {
continue continue
} }
if name == "sanitizer" { if name == "sanitizer" || strings.HasPrefix(name, "sanitizer.") {
newMarkupSanitizer(name, sec) newMarkupSanitizer(name, sec)
} else { } else {
newMarkupRenderer(name, sec) newMarkupRenderer(name, sec)
@ -67,44 +67,36 @@ func newMarkupSanitizer(name string, sec *ini.Section) {
return return
} }
elements := sec.Key("ELEMENT").ValueWithShadows() elements := sec.Key("ELEMENT").Value()
allowAttrs := sec.Key("ALLOW_ATTR").ValueWithShadows() allowAttrs := sec.Key("ALLOW_ATTR").Value()
regexps := sec.Key("REGEXP").ValueWithShadows() regexpStr := sec.Key("REGEXP").Value()
if len(elements) != len(allowAttrs) || if regexpStr == "" {
len(elements) != len(regexps) { rule := MarkupSanitizerRule{
log.Error("All three keys in markup.%s (ELEMENT, ALLOW_ATTR, REGEXP) must be defined the same number of times! Got %d, %d, and %d respectively.", name, len(elements), len(allowAttrs), len(regexps)) Element: elements,
AllowAttr: allowAttrs,
Regexp: nil,
}
ExternalSanitizerRules = append(ExternalSanitizerRules, rule)
return return
} }
ExternalSanitizerRules = make([]MarkupSanitizerRule, 0, len(elements)) // Validate when parsing the config that this is a valid regular
// expression. Then we can use regexp.MustCompile(...) later.
for index, pattern := range regexps { compiled, err := regexp.Compile(regexpStr)
if pattern == "" { if err != nil {
rule := MarkupSanitizerRule{ log.Error("In module.%s: REGEXP (%s) at definition %d failed to compile: %v", regexpStr, name, err)
Element: elements[index], return
AllowAttr: allowAttrs[index],
Regexp: nil,
}
ExternalSanitizerRules = append(ExternalSanitizerRules, rule)
continue
}
// Validate when parsing the config that this is a valid regular
// expression. Then we can use regexp.MustCompile(...) later.
compiled, err := regexp.Compile(pattern)
if err != nil {
log.Error("In module.%s: REGEXP at definition %d failed to compile: %v", name, index+1, err)
continue
}
rule := MarkupSanitizerRule{
Element: elements[index],
AllowAttr: allowAttrs[index],
Regexp: compiled,
}
ExternalSanitizerRules = append(ExternalSanitizerRules, rule)
} }
rule := MarkupSanitizerRule{
Element: elements,
AllowAttr: allowAttrs,
Regexp: compiled,
}
ExternalSanitizerRules = append(ExternalSanitizerRules, rule)
} }
func newMarkupRenderer(name string, sec *ini.Section) { func newMarkupRenderer(name string, sec *ini.Section) {