Involved Source Files
Package bluemonday provides a way of describing a whitelist of HTML elements
and attributes as a policy, and for that policy to be applied to untrusted
strings from users that may contain markup. All elements and attributes not on
the whitelist will be stripped.
The default bluemonday.UGCPolicy().Sanitize() turns this:
Hello <STYLE>.XSS{background-image:url("javascript:alert('XSS')");}</STYLE><A CLASS=XSS></A>World
Into the more harmless:
Hello World
And it turns this:
<a href="javascript:alert('XSS1')" onmouseover="alert('XSS2')">XSS<a>
Into this:
XSS
Whilst still allowing this:
<a href="http://www.google.com/">
<img src="https://ssl.gstatic.com/accounts/ui/logo_2x.png"/>
</a>
To pass through mostly unaltered (it gained a rel="nofollow"):
<a href="http://www.google.com/" rel="nofollow">
<img src="https://ssl.gstatic.com/accounts/ui/logo_2x.png"/>
</a>
The primary purpose of bluemonday is to take potentially unsafe user generated
content (from things like Markdown, HTML WYSIWYG tools, etc) and make it safe
for you to put on your website.
It protects sites against XSS (http://en.wikipedia.org/wiki/Cross-site_scripting)
and other malicious content that a user interface may deliver. There are many
vectors for an XSS attack (https://www.owasp.org/index.php/XSS_Filter_Evasion_Cheat_Sheet)
and the safest thing to do is to sanitize user input against a known safe list
of HTML elements and attributes.
Note: You should always run bluemonday after any other processing.
If you use blackfriday (https://github.com/russross/blackfriday) or
Pandoc (http://johnmacfarlane.net/pandoc/) then bluemonday should be run after
these steps. This ensures that no insecure HTML is introduced later in your
process.
bluemonday is heavily inspired by both the OWASP Java HTML Sanitizer
(https://code.google.com/p/owasp-java-html-sanitizer/) and the HTML Purifier
(http://htmlpurifier.org/).
We ship two default policies, one is bluemonday.StrictPolicy() and can be
thought of as equivalent to stripping all HTML elements and their attributes as
it has nothing on its whitelist.
The other is bluemonday.UGCPolicy() and allows a broad selection of HTML
elements and attributes that are safe for user generated content. Note that
this policy does not whitelist iframes, object, embed, styles, script, etc.
The essence of building a policy is to determine which HTML elements and
attributes are considered safe for your scenario. OWASP provides an XSS
prevention cheat sheet (https://www.google.com/search?q=xss+prevention+cheat+sheet)
to help explain the risks, but essentially:
1. Avoid whitelisting anything other than plain HTML elements
2. Avoid whitelisting `script`, `style`, `iframe`, `object`, `embed`, `base`
elements
3. Avoid whitelisting anything other than plain HTML elements with simple
values that you can match to a regexp
helpers.go
policies.go
policy.go
sanitize.go
Code Examples
package main

import (
	"fmt"
	"regexp"

	"github.com/microcosm-cc/bluemonday"
)

func main() {
	// Create a new policy
	p := bluemonday.NewPolicy()

	// Add elements to a policy without attributes
	p.AllowElements("b", "strong")

	// Add elements by virtue of adding an attribute
	p.AllowAttrs("nowrap").OnElements("td", "th")

	// Attributes can either be added to all elements
	p.AllowAttrs("dir").Globally()

	// Or attributes can be added to specific elements
	p.AllowAttrs("value").OnElements("li")

	// It is ALWAYS recommended that an attribute be made to match a pattern
	// XSS in HTML attributes is a very easy attack vector
	// \p{L} matches unicode letters, \p{N} matches unicode numbers
	p.AllowAttrs("title").Matching(regexp.MustCompile(`[\p{L}\p{N}\s\-_',:\[\]!\./\\\(\)&]*`)).Globally()

	// You can stop at any time and call .Sanitize()
	// Assumes that string htmlIn was passed in from a HTTP POST and contains
	// untrusted user generated content
	htmlIn := `untrusted user generated content`
	fmt.Println(p.Sanitize(htmlIn))

	// And you can take any existing policy and extend it
	p = bluemonday.UGCPolicy()
	p.AllowElements("fieldset", "select", "option")

	// Links are complex beasts and one of the biggest attack vectors for
	// malicious content so we have included features specifically to help here.

	// This is not recommended:
	p = bluemonday.NewPolicy()
	p.AllowAttrs("href").Matching(regexp.MustCompile(`(?i)mailto|https?`)).OnElements("a")

	// The regexp is insufficient in this case to have prevented a malformed
	// value doing something unexpected.

	// This will ensure that URLs are not considered invalid by Go's net/url
	// package.
	p.RequireParseableURLs(true)

	// If you have enabled parseable URLs then the following option will allow
	// relative URLs. By default this is disabled and will prevent all local and
	// schema relative URLs (i.e. `href="//www.google.com"` is schema relative).
	p.AllowRelativeURLs(true)

	// If you have enabled parseable URLs then you can whitelist the schemes
	// that are permitted. Bear in mind that allowing relative URLs in the above
	// option allows for blank schemes.
	p.AllowURLSchemes("mailto", "http", "https")

	// Regardless of whether you have enabled parseable URLs, you can force all
	// URLs to have a rel="nofollow" attribute. This will be added if it does
	// not exist.
	// This applies to "a", "area" and "link" elements that have a "href" attribute.
	p.RequireNoFollowOnLinks(true)

	// We provide a convenience function that applies all of the above, but you
	// will still need to whitelist the linkable elements:
	p = bluemonday.NewPolicy()
	p.AllowStandardURLs()
	p.AllowAttrs("cite").OnElements("blockquote")
	p.AllowAttrs("href").OnElements("a", "area")
	p.AllowAttrs("src").OnElements("img")

	// Policy Building Helpers

	// If you've got this far and you're bored already, we also bundle some
	// other convenience functions
	p = bluemonday.NewPolicy()
	p.AllowStandardAttributes()
	p.AllowImages()
	p.AllowLists()
	p.AllowTables()
}
package main

import (
	"fmt"

	"github.com/microcosm-cc/bluemonday"
)

func main() {
	// NewPolicy is a blank policy and we need to explicitly whitelist anything
	// that we wish to allow through
	p := bluemonday.NewPolicy()

	// We ensure any URLs are parseable and have rel="nofollow" where applicable
	p.AllowStandardURLs()

	// AllowStandardURLs already ensures that the href will be valid, and so we
	// can skip the .Matching()
	p.AllowAttrs("href").OnElements("a")

	// We allow paragraphs too
	p.AllowElements("p")

	html := p.Sanitize(
		`<p><a onblur="alert(secret)" href="http://www.google.com">Google</a></p>`,
	)

	fmt.Println(html)
}
package main

import (
	"github.com/microcosm-cc/bluemonday"
)

func main() {
	p := bluemonday.NewPolicy()

	// Allow the 'title' attribute on every HTML element that has been
	// whitelisted
	p.AllowAttrs("title").Matching(bluemonday.Paragraph).Globally()

	// Allow the 'abbr' attribute on only the 'td' and 'th' elements.
	p.AllowAttrs("abbr").Matching(bluemonday.Paragraph).OnElements("td", "th")

	// Allow the 'colspan' and 'rowspan' attributes, matching a positive integer
	// pattern, on only the 'td' and 'th' elements.
	p.AllowAttrs("colspan", "rowspan").Matching(
		bluemonday.Integer,
	).OnElements("td", "th")
}
package main

import (
	"github.com/microcosm-cc/bluemonday"
)

func main() {
	p := bluemonday.NewPolicy()

	// Allow styling elements without attributes
	p.AllowElements("br", "div", "hr", "p", "span")
}
package main

import (
	"fmt"

	"github.com/microcosm-cc/bluemonday"
)

func main() {
	// UGCPolicy is a convenience policy for user generated content.
	p := bluemonday.UGCPolicy()

	// string in, string out
	html := p.Sanitize(`<a onblur="alert(secret)" href="http://www.google.com">Google</a>`)

	fmt.Println(html)
}
package main

import (
	"fmt"

	"github.com/microcosm-cc/bluemonday"
)

func main() {
	// UGCPolicy is a convenience policy for user generated content.
	p := bluemonday.UGCPolicy()

	// []byte in, []byte out
	b := []byte(`<a onblur="alert(secret)" href="http://www.google.com">Google</a>`)
	b = p.SanitizeBytes(b)

	fmt.Println(string(b))
}
package main

import (
	"fmt"
	"strings"

	"github.com/microcosm-cc/bluemonday"
)

func main() {
	// UGCPolicy is a convenience policy for user generated content.
	p := bluemonday.UGCPolicy()

	// io.Reader in, bytes.Buffer out
	r := strings.NewReader(`<a onblur="alert(secret)" href="http://www.google.com">Google</a>`)
	buf := p.SanitizeReader(r)

	fmt.Println(buf.String())
}
package main

import (
	"fmt"

	"github.com/microcosm-cc/bluemonday"
)

func main() {
	// StrictPolicy is equivalent to NewPolicy and as nothing else is declared
	// we are stripping all elements (and their attributes)
	p := bluemonday.StrictPolicy()

	html := p.Sanitize(
		`Goodbye <a href="http://www.google.com/">Cruel</a> World`,
	)

	fmt.Println(html)
}
package main

import (
	"fmt"

	"github.com/microcosm-cc/bluemonday"
)

func main() {
	// UGCPolicy is a convenience policy for user generated content.
	p := bluemonday.UGCPolicy()

	html := p.Sanitize(
		`<a onblur="alert(secret)" href="http://www.google.com">Google</a>`,
	)

	fmt.Println(html)
}
Package-Level Type Names (total 4, in which 1 is exported)
Policy encapsulates the whitelist of HTML elements and attributes that will
be applied to the sanitised HTML.
You should use bluemonday.NewPolicy() to create a blank policy as the
unexported fields contain maps that need to be initialized.
If true then we add spaces when stripping tags, specifically the closing
tag is replaced by a space character.
When true add target="_blank" to fully qualified links
Will add for href="http://foo"
Will skip for href="/foo" or href="foo"
When true, allow data attributes.
When true, u, _ := url.Parse("url"); !u.IsAbs() is permitted
If urlPolicy is nil, all URLs with matching schema are allowed.
Otherwise, only the URLs with matching schema and urlPolicy(url)
returning true are allowed.
map[htmlElementName]map[htmlAttributeName]attrPolicy
map[htmlAttributeName]attrPolicy
Declares whether the maps have been initialized, used as a cheap check to
ensure that those using Policy{} directly won't cause nil pointer
exceptions
When true, add rel="nofollow" to HTML anchors
When true, add rel="nofollow" to HTML anchors
Will add for href="http://foo"
Will skip for href="/foo" or href="foo"
When true, URLs must be parseable by "net/url" url.Parse()
If an element has had all attributes removed as a result of a policy
being applied, then the element would be removed from the output.
However some elements are valid and have strong layout meaning without
any attributes, i.e. <table>. To prevent those being removed we maintain
a list of elements that are allowed to have no attributes and that will
be maintained in the output HTML.
setOfElementsToSkipContent map[string]struct{}
AddSpaceWhenStrippingTag states whether to add a single space " " when
removing tags that are not whitelisted by the policy.
This is useful if you expect to strip tags in dense markup and may lose the
value of whitespace.
For example: "<p>Hello</p><p>World</p>" would be sanitized to "HelloWorld"
with the default value of false, but you may wish to sanitize this to
" Hello World " by setting AddSpaceWhenStrippingTag to true as this would
retain the intent of the text.
AddTargetBlankToFullyQualifiedLinks will result in all <a> tags that point
to a non-local destination (i.e. starts with a protocol and has a host)
having a target="_blank" added to them if one does not already exist
Note: This requires p.RequireParseableURLs(true) and will enable it.
AllowAttrs takes a range of HTML attribute names and returns an
attribute policy builder that allows you to specify the pattern and scope of
the whitelisted attribute.
The attribute policy is only added to the core policy when either Globally()
or OnElements(...) is called.
AllowDataAttributes whitelists all data attributes. We can't specify the name
of each attribute exactly as they are customized.
NOTE: These values are not sanitized and applications that evaluate or process
them without checking and verification of the input may be at risk if this option
is enabled. This is a 'caveat emptor' option and the person enabling this option
needs to fully understand the potential impact with regards to whatever application
will be consuming the sanitized HTML afterwards, i.e. if you know you put a link in a
data attribute and use that to automatically load some new window then you're giving
the author of a HTML fragment the means to open a malicious destination automatically.
Use with care!
AllowDataURIImages permits the use of inline images defined in RFC2397
http://tools.ietf.org/html/rfc2397
http://en.wikipedia.org/wiki/Data_URI_scheme
Images must have a mimetype matching:
image/gif
image/jpeg
image/png
image/webp
NOTE: There is a potential security risk to allowing data URIs and you should
only permit them on content you already trust.
http://palizine.plynt.com/issues/2010Oct/bypass-xss-filters/
https://capec.mitre.org/data/definitions/244.html
AllowElements will append HTML elements to the whitelist without applying an
attribute policy to those elements (the elements are permitted
sans-attributes)
AllowElementsContent marks the HTML elements whose content should be
retained after removing the tag.
AllowImages enables the img element and some popular attributes. It will also
ensure that URL values are parseable. This helper does not enable data URI
images, for that you should also use the AllowDataURIImages() helper.
AllowLists will enable ordered and unordered lists, as well as definition
lists
AllowNoAttrs says that attributes on an element are optional.
The attribute policy is only added to the core policy when OnElements(...)
is called.
AllowRelativeURLs enables RequireParseableURLs and then permits URLs that
are parseable, have no schema information and url.IsAbs() returns false
This permits local URLs
AllowStandardAttributes will enable "id", "title" and the language specific
attributes "dir" and "lang" on all elements that are whitelisted
AllowStandardURLs is a convenience function that will enable rel="nofollow"
on "a", "area" and "link" (if you have allowed those elements) and will
ensure that the URL values are parseable and either relative or belong to the
"mailto", "http", or "https" schemes
AllowStyling presently enables the class attribute globally.
Note: When bluemonday ships a CSS parser and we can safely sanitise that,
this will also allow sanitized styling of elements via the style attribute.
AllowTables will enable a rich set of elements and attributes to describe
HTML tables
AllowURLSchemeWithCustomPolicy will append URL schemes with
a custom URL policy to the whitelist.
Only the URLs with matching schema and urlPolicy(url)
returning true will be allowed.
AllowURLSchemes will append URL schemes to the whitelist
Example: p.AllowURLSchemes("mailto", "http", "https")
RequireNoFollowOnFullyQualifiedLinks will result in all <a> tags that point
to a non-local destination (i.e. starts with a protocol and has a host)
having a rel="nofollow" added to them if one does not already exist
Note: This requires p.RequireParseableURLs(true) and will enable it.
RequireNoFollowOnLinks will result in all <a> tags having a rel="nofollow"
added to them if one does not already exist
Note: This requires p.RequireParseableURLs(true) and will enable it.
RequireParseableURLs will result in all URLs requiring that they be parseable
by "net/url" url.Parse()
This applies to:
- a.href
- area.href
- blockquote.cite
- img.src
- link.href
- script.src
Sanitize takes a string that contains a HTML fragment or document and applies
the given policy whitelist.
It returns a HTML string that has been sanitized by the policy or an empty
string if an error has occurred (most likely as a consequence of extremely
malformed input)
SanitizeBytes takes a []byte that contains a HTML fragment or document and applies
the given policy whitelist.
It returns a []byte containing the HTML that has been sanitized by the policy
or an empty []byte if an error has occurred (most likely as a consequence of
extremely malformed input)
SanitizeReader takes an io.Reader that contains a HTML fragment or document
and applies the given policy whitelist.
It returns a bytes.Buffer containing the HTML that has been sanitized by the
policy. Errors during sanitization will merely return an empty result.
SkipElementsContent adds the HTML elements whose tags should be removed
together with their content.
addDefaultElementsWithoutAttrs adds the HTML elements that we know are valid
without any attributes to an internal map.
i.e. we know that <table> is valid, but <bdo> isn't valid as the "dir" attr
is mandatory
addDefaultSkipElementContent adds the HTML elements that we should skip
rendering the character content of, if the element itself is not allowed.
This is all character data that the end user would not normally see.
i.e. if we exclude a <script> tag then we shouldn't render the JavaScript or
anything else until we encounter the closing </script> tag.
(*T) allowNoAttrs(elementName string) bool
init initializes the maps if this has not been done already
Performs the actual sanitization process.
sanitizeAttrs takes a set of element attribute policies and the global
attribute policies and applies them to the []html.Attribute returning a set
of html.Attributes that match the policies
(*T) validURL(rawurl string) (string, bool)
func NewPolicy() *Policy
func StrictPolicy() *Policy
func StripTagsPolicy() *Policy
func UGCPolicy() *Policy
func (*Policy).AddSpaceWhenStrippingTag(allow bool) *Policy
func (*Policy).AddTargetBlankToFullyQualifiedLinks(require bool) *Policy
func (*Policy).AllowElements(names ...string) *Policy
func (*Policy).AllowElementsContent(names ...string) *Policy
func (*Policy).AllowRelativeURLs(require bool) *Policy
func (*Policy).AllowURLSchemes(schemes ...string) *Policy
func (*Policy).AllowURLSchemeWithCustomPolicy(scheme string, urlPolicy func(url *url.URL) (allowUrl bool)) *Policy
func (*Policy).RequireNoFollowOnFullyQualifiedLinks(require bool) *Policy
func (*Policy).RequireNoFollowOnLinks(require bool) *Policy
func (*Policy).RequireParseableURLs(require bool) *Policy
func (*Policy).SkipElementsContent(names ...string) *Policy
allowEmpty bool; attrNames []string; p *Policy; regexp *regexp.Regexp
AllowNoAttrs says that attributes on an element are optional.
The attribute policy is only added to the core policy when OnElements(...)
is called.
Globally will bind an attribute policy to all HTML elements and return the
updated policy
Matching allows a regular expression to be applied to a nascent attribute
policy, and returns the attribute policy. Calling this more than once will
replace the existing regexp.
OnElements will bind an attribute policy to a given range of HTML elements
and return the updated policy
func (*Policy).AllowAttrs(attrNames ...string) *attrPolicyBuilder
func (*Policy).AllowNoAttrs() *attrPolicyBuilder
Package-Level Functions (total 6, in which 4 are exported)
NewPolicy returns a blank policy with nothing whitelisted or permitted. This
is the recommended way to start building a policy and you should now use
AllowAttrs() and/or AllowElements() to construct the whitelist of HTML
elements and attributes.
StrictPolicy returns an empty policy, which will effectively strip all HTML
elements and their attributes from a document.
StripTagsPolicy is DEPRECATED. Use StrictPolicy instead.
UGCPolicy returns a policy aimed at user generated content that is a result
of HTML WYSIWYG tools and Markdown conversions.
This is expected to be a fairly rich document where as much markup as
possible should be retained. Markdown permits raw HTML so we are basically
providing a policy to sanitise HTML5 documents safely but with the
least intrusion on the formatting expectations of the user.
Package-Level Variables (total 15, in which 11 are exported)
CellAlign handles the `align` attribute
https://developer.mozilla.org/en-US/docs/Web/HTML/Element/td#attr-align
CellVerticalAlign handles the `valign` attribute
https://developer.mozilla.org/en-US/docs/Web/HTML/Element/td#attr-valign
Direction handles the `dir` attribute
https://developer.mozilla.org/en-US/docs/Web/HTML/Element/bdo#attr-dir
ImageAlign handles the `align` attribute on the `img` tag
http://www.w3.org/MarkUp/Test/Img/imgtest.html
Integer describes whole positive integers (including 0) used in places
like td.colspan
https://developer.mozilla.org/en-US/docs/Web/HTML/Element/td#attr-colspan
ISO8601 according to the W3 group is only a subset of the ISO8601
standard: http://www.w3.org/TR/NOTE-datetime
Used in places like time.datetime
https://developer.mozilla.org/en-US/docs/Web/HTML/Element/time#attr-datetime
Matches patterns:
Year:
YYYY (eg 1997)
Year and month:
YYYY-MM (eg 1997-07)
Complete date:
YYYY-MM-DD (eg 1997-07-16)
Complete date plus hours and minutes:
YYYY-MM-DDThh:mmTZD (eg 1997-07-16T19:20+01:00)
Complete date plus hours, minutes and seconds:
YYYY-MM-DDThh:mm:ssTZD (eg 1997-07-16T19:20:30+01:00)
Complete date plus hours, minutes, seconds and a decimal fraction of a
second
YYYY-MM-DDThh:mm:ss.sTZD (eg 1997-07-16T19:20:30.45+01:00)
ListType encapsulates the common value as well as the latest spec
values for lists
https://developer.mozilla.org/en-US/docs/Web/HTML/Element/ol#attr-type
Number is a double value used on HTML5 meter and progress elements
http://www.whatwg.org/specs/web-apps/current-work/multipage/the-button-element.html#the-meter-element
NumberOrPercent is used predominantly as units of measurement in width
and height attributes
https://developer.mozilla.org/en-US/docs/Web/HTML/Element/img#attr-height
Paragraph of text in an attribute such as *.'title', img.alt, etc
https://developer.mozilla.org/en-US/docs/Web/HTML/Global_attributes#attr-title
Note that we are not allowing chars that could close tags like '>'
SpaceSeparatedTokens is used in places like `a.rel` and the common attribute
`class` which both contain space delimited lists of data tokens
http://www.w3.org/TR/html-markup/datatypes.html#common.data.tokens-def
Regexp: \p{L} matches unicode letters, \p{N} matches unicode numbers
dataURIImagePrefix is used by AllowDataURIImages to define the acceptable
prefix of data URIs that contain common web image formats.
This is not exported as it's not useful by itself, and only has value
within the AllowDataURIImages func
The pages are generated with Golds v0.3.2-preview. (GOOS=darwin GOARCH=amd64)
Golds is a Go 101 project developed by Tapir Liu.