package html
Import Path
golang.org/x/net/html (on go.dev)
Dependency Relation
imports 9 packages, and is imported by 4 packages
Involved Source Files
const.go
doc.go
Package html implements an HTML5-compliant tokenizer and parser.
Tokenization is done by creating a Tokenizer for an io.Reader r. It is the
caller's responsibility to ensure that r provides UTF-8 encoded HTML.
    z := html.NewTokenizer(r)
Given a Tokenizer z, the HTML is tokenized by repeatedly calling z.Next(),
which parses the next token and returns its type, or an error:
    for {
        tt := z.Next()
        if tt == html.ErrorToken {
            // ...
            return ...
        }
        // Process the current token.
    }
There are two APIs for retrieving the current token. The high-level API is to
call Token; the low-level API is to call Text or TagName / TagAttr. Both APIs
allow optionally calling Raw after Next but before Token, Text, TagName, or
TagAttr. In EBNF notation, the valid call sequence per token is:
    Next {Raw} [ Token | Text | TagName {TagAttr} ]
Token returns an independent data structure that completely describes a token.
Entities (such as "&lt;") are unescaped, tag names and attribute keys are
lower-cased, and attributes are collected into a []Attribute. For example:
    for {
        if z.Next() == html.ErrorToken {
            // Returning io.EOF indicates success.
            return z.Err()
        }
        emitToken(z.Token())
    }
The low-level API performs fewer allocations and copies, but the contents of
the []byte values returned by Text, TagName and TagAttr may change on the next
call to Next. For example, to extract an HTML page's anchor text:
    depth := 0
    for {
        tt := z.Next()
        switch tt {
        case html.ErrorToken:
            return z.Err()
        case html.TextToken:
            if depth > 0 {
                // emitBytes should copy the []byte it receives,
                // if it doesn't process it immediately.
                emitBytes(z.Text())
            }
        case html.StartTagToken, html.EndTagToken:
            tn, _ := z.TagName()
            if len(tn) == 1 && tn[0] == 'a' {
                if tt == html.StartTagToken {
                    depth++
                } else {
                    depth--
                }
            }
        }
    }
Parsing is done by calling Parse with an io.Reader, which returns the root of
the parse tree (the document element) as a *Node. It is the caller's
responsibility to ensure that the Reader provides UTF-8 encoded HTML. For
example, to process each anchor node in depth-first order:
    doc, err := html.Parse(r)
    if err != nil {
        // ...
    }
    var f func(*html.Node)
    f = func(n *html.Node) {
        if n.Type == html.ElementNode && n.Data == "a" {
            // Do something with n...
        }
        for c := n.FirstChild; c != nil; c = c.NextSibling {
            f(c)
        }
    }
    f(doc)
The relevant specifications include:
https://html.spec.whatwg.org/multipage/syntax.html and
https://html.spec.whatwg.org/multipage/syntax.html#tokenization
doctype.go
entity.go
escape.go
foreign.go
node.go
parse.go
render.go
token.go
Code Examples
package main

import (
    "fmt"
    "log"
    "strings"

    "golang.org/x/net/html"
)

func main() {
    s := `<p>Links:</p><ul><li><a href="foo">Foo</a><li><a href="/bar/baz">BarBaz</a></ul>`
    doc, err := html.Parse(strings.NewReader(s))
    if err != nil {
        log.Fatal(err)
    }
    var f func(*html.Node)
    f = func(n *html.Node) {
        if n.Type == html.ElementNode && n.Data == "a" {
            for _, a := range n.Attr {
                if a.Key == "href" {
                    fmt.Println(a.Val)
                    break
                }
            }
        }
        for c := n.FirstChild; c != nil; c = c.NextSibling {
            f(c)
        }
    }
    f(doc)
}
Package-Level Type Names (total 14, in which 7 are exported)
An Attribute is an attribute namespace-key-value triple. Namespace is
non-empty for foreign attributes like xlink, Key is alphabetic (and hence
does not contain escapable characters like '&', '<' or '>'), and Val is
unescaped (it looks like "a<b" rather than "a&lt;b").
Namespace is only used by the parser, not the tokenizer.
Key string
Namespace string
Val string
func github.com/microcosm-cc/bluemonday.(*Policy).sanitizeAttrs(elementName string, attrs []Attribute, aps map[string]bluemonday.attrPolicy) []Attribute
func adjustAttributeNames(aa []Attribute, nameMap map[string]string)
func adjustForeignAttributes(aa []Attribute)
A Node consists of a NodeType and some Data (tag name for element nodes,
content for text) and is part of a tree of Nodes. Element nodes may also
have a Namespace and contain a slice of Attributes. Data is unescaped, so
that it looks like "a<b" rather than "a&lt;b". For element nodes, DataAtom
is the atom for Data, or zero if Data is not a known tag name.
An empty Namespace implies a "http://www.w3.org/1999/xhtml" namespace.
Similarly, "math" is short for "http://www.w3.org/1998/Math/MathML", and
"svg" is short for "http://www.w3.org/2000/svg".
Attr []Attribute
Data string
DataAtom atom.Atom
FirstChild *Node
LastChild *Node
Namespace string
NextSibling *Node
Parent *Node
PrevSibling *Node
Type NodeType
AppendChild adds a node c as a child of n.
It will panic if c already has a parent or siblings.
InsertBefore inserts newChild as a child of n, immediately before oldChild
in the sequence of n's children. oldChild may be nil, in which case newChild
is appended to the end of n's children.
It will panic if newChild already has a parent or siblings.
RemoveChild removes a node c that is a child of n. Afterwards, c will have
no parent and no siblings.
It will panic if c's parent is not n.
clone returns a new node with the same type, data and attributes.
The clone has no parent, no siblings and no children.
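For illustration, a minimal sketch (not from the package documentation) that
builds and edits a small tree with these methods; the elements and text are
arbitrary:

    package main

    import (
        "os"

        "golang.org/x/net/html"
        "golang.org/x/net/html/atom"
    )

    func main() {
        // Build <ul><li>one</li></ul> by hand.
        ul := &html.Node{Type: html.ElementNode, DataAtom: atom.Ul, Data: "ul"}
        li := &html.Node{Type: html.ElementNode, DataAtom: atom.Li, Data: "li"}
        li.AppendChild(&html.Node{Type: html.TextNode, Data: "one"})
        ul.AppendChild(li)

        // Insert a second item before the first, then remove the original.
        li2 := &html.Node{Type: html.ElementNode, DataAtom: atom.Li, Data: "li"}
        li2.AppendChild(&html.Node{Type: html.TextNode, Data: "zero"})
        ul.InsertBefore(li2, li)
        ul.RemoveChild(li)

        // Prints <ul><li>zero</li></ul>.
        html.Render(os.Stdout, ul)
    }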
func Parse(r io.Reader) (*Node, error)
func ParseFragment(r io.Reader, context *Node) ([]*Node, error)
func ParseFragmentWithOptions(r io.Reader, context *Node, opts ...ParseOption) ([]*Node, error)
func ParseWithOptions(r io.Reader, opts ...ParseOption) (*Node, error)
func github.com/andybalholm/cascadia.Filter(nodes []*Node, m cascadia.Matcher) (result []*Node)
func github.com/andybalholm/cascadia.Query(n *Node, m cascadia.Matcher) *Node
func github.com/andybalholm/cascadia.QueryAll(n *Node, m cascadia.Matcher) []*Node
func github.com/andybalholm/cascadia.Selector.Filter(nodes []*Node) (result []*Node)
func github.com/andybalholm/cascadia.Selector.MatchAll(n *Node) []*Node
func github.com/andybalholm/cascadia.Selector.MatchFirst(n *Node) *Node
func parseDoctype(s string) (n *Node, quirks bool)
func (*Node).clone() *Node
func golang.org/x/pkgsite/internal/testing/htmlcheck.allMatching(n *Node, sel cascadia.Sel) []*Node
func github.com/andybalholm/cascadia.queryInto(n *Node, m cascadia.Matcher, storage []*Node) []*Node
func github.com/andybalholm/cascadia.Selector.matchAllInto(n *Node, storage []*Node) []*Node
func ParseFragment(r io.Reader, context *Node) ([]*Node, error)
func ParseFragmentWithOptions(r io.Reader, context *Node, opts ...ParseOption) ([]*Node, error)
func Render(w io.Writer, n *Node) error
func (*Node).AppendChild(c *Node)
func (*Node).InsertBefore(newChild, oldChild *Node)
func (*Node).RemoveChild(c *Node)
func github.com/andybalholm/cascadia.Filter(nodes []*Node, m cascadia.Matcher) (result []*Node)
func github.com/andybalholm/cascadia.Query(n *Node, m cascadia.Matcher) *Node
func github.com/andybalholm/cascadia.QueryAll(n *Node, m cascadia.Matcher) []*Node
func github.com/andybalholm/cascadia.Matcher.Match(n *Node) bool
func github.com/andybalholm/cascadia.Sel.Match(n *Node) bool
func github.com/andybalholm/cascadia.Selector.Filter(nodes []*Node) (result []*Node)
func github.com/andybalholm/cascadia.Selector.Match(n *Node) bool
func github.com/andybalholm/cascadia.Selector.MatchAll(n *Node) []*Node
func github.com/andybalholm/cascadia.Selector.MatchFirst(n *Node) *Node
func github.com/andybalholm/cascadia.SelectorGroup.Match(n *Node) bool
func copyAttributes(dst *Node, src Token)
func htmlIntegrationPoint(n *Node) bool
func isSpecialElement(element *Node) bool
func mathMLTextIntegrationPoint(n *Node) bool
func render(w writer, n *Node) error
func render1(w writer, n *Node) error
func reparentChildren(dst, src *Node)
func golang.org/x/pkgsite/internal/frontend.walkHTML(n *Node, info *source.Info, readme *internal.Readme) bool
func golang.org/x/pkgsite/internal/testing/htmlcheck.allMatching(n *Node, sel cascadia.Sel) []*Node
func golang.org/x/pkgsite/internal/testing/htmlcheck.check(n *Node, Checkers []htmlcheck.Checker) error
func golang.org/x/pkgsite/internal/testing/htmlcheck.dump(n *Node, depth int)
func golang.org/x/pkgsite/internal/testing/htmlcheck.nodeText(n *Node, b *strings.Builder)
func github.com/andybalholm/cascadia.attributeDashMatch(key, val string, n *Node) bool
func github.com/andybalholm/cascadia.attributeNotEqualMatch(key, val string, n *Node) bool
func github.com/andybalholm/cascadia.attributePrefixMatch(key, val string, n *Node) bool
func github.com/andybalholm/cascadia.attributeRegexMatch(key string, rx *regexp.Regexp, n *Node) bool
func github.com/andybalholm/cascadia.attributeSubstringMatch(key, val string, n *Node) bool
func github.com/andybalholm/cascadia.attributeSuffixMatch(key, val string, n *Node) bool
func github.com/andybalholm/cascadia.childMatch(a, d cascadia.Matcher, n *Node) bool
func github.com/andybalholm/cascadia.descendantMatch(a, d cascadia.Matcher, n *Node) bool
func github.com/andybalholm/cascadia.hasChildMatch(n *Node, a cascadia.Matcher) bool
func github.com/andybalholm/cascadia.hasDescendantMatch(n *Node, a cascadia.Matcher) bool
func github.com/andybalholm/cascadia.matchAttribute(n *Node, key string, f func(string) bool) bool
func github.com/andybalholm/cascadia.nodeOwnText(n *Node) string
func github.com/andybalholm/cascadia.nodeText(n *Node) string
func github.com/andybalholm/cascadia.nthChildMatch(a, b int, last, ofType bool, n *Node) bool
func github.com/andybalholm/cascadia.queryInto(n *Node, m cascadia.Matcher, storage []*Node) []*Node
func github.com/andybalholm/cascadia.siblingMatch(s1, s2 cascadia.Matcher, adjacent bool, n *Node) bool
func github.com/andybalholm/cascadia.simpleNthChildMatch(b int, ofType bool, n *Node) bool
func github.com/andybalholm/cascadia.simpleNthLastChildMatch(b int, ofType bool, n *Node) bool
func github.com/andybalholm/cascadia.writeNodeText(n *Node, b *bytes.Buffer)
func github.com/andybalholm/cascadia.Selector.matchAllInto(n *Node, storage []*Node) []*Node
var scopeMarker
A NodeType is the type of a Node.
const CommentNode
const DoctypeNode
const DocumentNode
const ElementNode
const ErrorNode
const RawNode
const TextNode
const scopeMarkerNode
ParseOption configures a parser.
func ParseOptionEnableScripting(enable bool) ParseOption
func ParseFragmentWithOptions(r io.Reader, context *Node, opts ...ParseOption) ([]*Node, error)
func ParseWithOptions(r io.Reader, opts ...ParseOption) (*Node, error)
A Token consists of a TokenType and some Data (tag name for start and end
tags, content for text, comments and doctypes). A tag Token may also contain
a slice of Attributes. Data is unescaped for all Tokens (it looks like "a<b"
rather than "a&lt;b"). For tag Tokens, DataAtom is the atom for Data, or
zero if Data is not a known tag name.
Attr []Attribute
Data string
DataAtom atom.Atom
Type TokenType
String returns a string representation of the Token.
tagString returns a string representation of a tag Token's Data and Attr.
T : expvar.Var
T : fmt.Stringer
T : context.stringer
T : runtime.stringer
func (*Tokenizer).Token() Token
func copyAttributes(dst *Node, src Token)
A Tokenizer returns a stream of HTML Tokens.
allowCDATA is whether CDATA sections are allowed in the current context.
attr [][2]span
buf []byte
convertNUL is whether NUL bytes in the current token's data should
be converted into \ufffd replacement characters.
buf[data.start:data.end] holds the raw bytes of the current token's data:
a text token's text, a tag token's tag name, etc.
err is the first error encountered during tokenization. It is possible
for tt != Error && err != nil to hold: this means that Next returned a
valid token but the subsequent Next call will return an error token.
For example, if the HTML text input was just "plain", then the first
Next call would set z.err to io.EOF but return a TextToken, and all
subsequent Next calls would return an ErrorToken.
err is never reset. Once it becomes non-nil, it stays non-nil.
maxBuf limits the data buffered in buf. A value of 0 means unlimited.
nAttrReturned int
pendingAttr is the attribute key and value currently being tokenized.
When complete, pendingAttr is pushed onto attr. nAttrReturned is
incremented on each call to TagAttr.
r is the source of the HTML text.
buf[raw.start:raw.end] holds the raw bytes of the current token.
buf[raw.end:] is buffered input that will yield future tokens.
rawTag is the "script" in "</script>" that closes the next token. If
non-empty, the subsequent call to Next will return a raw or RCDATA text
token: one that treats "<p>" as text instead of an element.
rawTag's contents are lower-cased.
readErr is the error returned by the io.Reader r. It is separate from
err because it is valid for an io.Reader to return (n int, err1 error)
such that n > 0 && err1 != nil, and callers should always process the
n > 0 bytes before considering the error err1.
textIsRaw is whether the current text token's data is not escaped.
tt is the TokenType of the current token.
AllowCDATA sets whether or not the tokenizer recognizes <![CDATA[foo]]> as
the text "foo". The default value is false, which means to recognize it as
a bogus comment "<!-- [CDATA[foo]] -->" instead.
Strictly speaking, an HTML5 compliant tokenizer should allow CDATA if and
only if tokenizing foreign content, such as MathML and SVG. However,
tracking foreign-contentness is difficult to do purely in the tokenizer,
as opposed to the parser, due to HTML integration points: an <svg> element
can contain a <foreignObject> that is foreign-to-SVG but not foreign-to-HTML.
For strict compliance with the HTML5 tokenization algorithm, it is the
responsibility of the user of a tokenizer to call AllowCDATA as appropriate.
In practice, if using the tokenizer without caring whether MathML or SVG
CDATA is text or comments, such as tokenizing HTML to find all the anchor
text, it is acceptable to ignore this responsibility.
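For illustration, a minimal sketch (the input string is illustrative) of
enabling CDATA when the caller knows it is tokenizing foreign content:

    package main

    import (
        "fmt"
        "strings"

        "golang.org/x/net/html"
    )

    func main() {
        z := html.NewTokenizer(strings.NewReader(`<svg><![CDATA[a < b]]></svg>`))
        // The caller knows the CDATA section is inside <svg>, so it opts in.
        z.AllowCDATA(true)
        for {
            tt := z.Next()
            if tt == html.ErrorToken {
                return
            }
            if tt == html.TextToken {
                fmt.Printf("text: %q\n", z.Text()) // text: "a < b"
            }
        }
    }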
Buffered returns a slice containing data buffered but not yet tokenized.
Err returns the error associated with the most recent ErrorToken token.
This is typically io.EOF, meaning the end of tokenization.
Next scans the next token and returns its type.
NextIsNotRawText instructs the tokenizer that the next token should not be
considered as 'raw text'. Some elements, such as script and title elements,
normally require the next token after the opening tag to be 'raw text' that
has no child elements. For example, tokenizing "<title>a<b>c</b>d</title>"
yields a start tag token for "<title>", a text token for "a<b>c</b>d", and
an end tag token for "</title>". There are no distinct start tag or end tag
tokens for the "<b>" and "</b>".
This tokenizer implementation will generally look for raw text at the right
times. Strictly speaking, an HTML5 compliant tokenizer should not look for
raw text if in foreign content: <title> generally needs raw text, but a
<title> inside an <svg> does not. Another example is that a <textarea>
generally needs raw text, but a <textarea> is not allowed as an immediate
child of a <select>; in normal parsing, a <textarea> implies </select>, but
one cannot close the implicit element when parsing a <select>'s InnerHTML.
Similarly to AllowCDATA, tracking the correct moment to override
raw-text-ness is difficult to do purely in the tokenizer, as opposed to the parser.
For strict compliance with the HTML5 tokenization algorithm, it is the
responsibility of the user of a tokenizer to call NextIsNotRawText as
appropriate. In practice, like AllowCDATA, it is acceptable to ignore this
responsibility for basic usage.
Note that this 'raw text' concept is different from the one offered by the
Tokenizer.Raw method.
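A sketch of one way a caller might track foreign content and apply this
method; the input and the depth-counting scheme are illustrative:

    package main

    import (
        "fmt"
        "strings"

        "golang.org/x/net/html"
    )

    func main() {
        z := html.NewTokenizer(strings.NewReader(`<svg><title>a<b>c</b></title></svg>`))
        svgDepth := 0 // crude tracking of whether we are inside <svg>
        for {
            tt := z.Next()
            if tt == html.ErrorToken {
                return
            }
            if tt == html.StartTagToken || tt == html.EndTagToken {
                name, _ := z.TagName()
                switch {
                case string(name) == "svg" && tt == html.StartTagToken:
                    svgDepth++
                case string(name) == "svg" && tt == html.EndTagToken:
                    svgDepth--
                case string(name) == "title" && tt == html.StartTagToken && svgDepth > 0:
                    // An SVG <title> may contain child elements, so the
                    // following "a<b>c</b>" should not be raw text.
                    z.NextIsNotRawText()
                }
                fmt.Println(tt, string(name))
            }
        }
    }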
Raw returns the unmodified text of the current token. Calling Next, Token,
Text, TagName or TagAttr may change the contents of the returned slice.
The token stream's raw bytes partition the byte stream (up until an
ErrorToken). There are no overlaps or gaps between two consecutive tokens'
raw bytes. One implication is that the byte offset of the current token is
the sum of the lengths of all previous tokens' raw bytes.
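For example, a sketch that uses this property to report each token's byte
offset; the input is illustrative:

    package main

    import (
        "fmt"
        "strings"

        "golang.org/x/net/html"
    )

    func main() {
        z := html.NewTokenizer(strings.NewReader(`<p>Hi<br>there</p>`))
        offset := 0
        for {
            tt := z.Next()
            if tt == html.ErrorToken {
                return
            }
            raw := z.Raw()
            fmt.Printf("offset %3d: %v %q\n", offset, tt, raw)
            offset += len(raw) // raw bytes partition the input, so offsets accumulate
        }
    }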
SetMaxBuf sets a limit on the amount of data buffered during tokenization.
A value of 0 means unlimited.
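A sketch of bounding memory use when tokenizing untrusted input; the 64 KiB
limit here is arbitrary:

    package main

    import (
        "fmt"
        "strings"

        "golang.org/x/net/html"
    )

    func main() {
        z := html.NewTokenizer(strings.NewReader(strings.Repeat("a", 1<<20)))
        z.SetMaxBuf(64 << 10)
        for {
            if z.Next() == html.ErrorToken {
                if z.Err() == html.ErrBufferExceeded {
                    fmt.Println("token exceeded the buffer limit")
                }
                return
            }
        }
    }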
TagAttr returns the lower-cased key and unescaped value of the next unparsed
attribute for the current tag token and whether there are more attributes.
The contents of the returned slices may change on the next call to Next.
TagName returns the lower-cased name of a tag token (the `img` out of
`<IMG SRC="foo">`) and whether the tag has attributes.
The contents of the returned slice may change on the next call to Next.
Text returns the unescaped text of a text, comment or doctype token. The
contents of the returned slice may change on the next call to Next.
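A sketch of walking a tag's attributes with the low-level API; the returned
slices are consumed immediately, before the next call to Next, and
string(name) copies the bytes it keeps:

    package main

    import (
        "fmt"
        "strings"

        "golang.org/x/net/html"
    )

    func main() {
        z := html.NewTokenizer(strings.NewReader(`<a HREF="/x" Title="t">x</a>`))
        for {
            tt := z.Next()
            if tt == html.ErrorToken {
                return
            }
            if tt != html.StartTagToken {
                continue
            }
            name, hasAttr := z.TagName()
            fmt.Println("tag:", string(name))
            for hasAttr {
                var key, val []byte
                key, val, hasAttr = z.TagAttr()
                fmt.Printf("  %s=%q\n", key, val) // keys come back lower-cased
            }
        }
    }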
Token returns the current Token. The result's Data and Attr values remain
valid after subsequent Next calls.
readByte returns the next byte from the input stream, doing a buffered read
from z.r into z.buf if necessary. z.buf[z.raw.start:z.raw.end] remains a contiguous byte
slice that holds all the bytes read so far for the current token.
It sets z.err if the underlying reader returns an error.
Pre-condition: z.err == nil.
readCDATA attempts to read a CDATA section and returns true if
successful. The opening "<!" has already been consumed.
readComment reads the next comment token starting with "<!--". The opening
"<!--" has already been consumed.
readDoctype attempts to read a doctype declaration and returns true if
successful. The opening "<!" has already been consumed.
readMarkupDeclaration reads the next token starting with "<!". It might be
a "<!--comment-->", a "<!DOCTYPE foo>", a "<![CDATA[section]]>" or
"<!a bogus comment". The opening "<!" has already been consumed.
readRawEndTag attempts to read a tag like "</foo>", where "foo" is z.rawTag.
If it succeeds, it backs up the input position to reconsume the tag and
returns true. Otherwise it returns false. The opening "</" has already been
consumed.
readRawOrRCDATA reads until the next "</foo>", where "foo" is z.rawTag and
is typically something like "script" or "textarea".
readScript reads until the next </script> tag, following the byzantine
rules for escaping/hiding the closing tag.
readStartTag reads the next start tag token. The opening "<a" has already
been consumed, where 'a' means anything in [A-Za-z].
readTag reads the next tag token and its attributes. If saveAttr, those
attributes are saved in z.attr, otherwise z.attr is set to an empty slice.
The opening "<a" or "</a" has already been consumed, where 'a' means anything
in [A-Za-z].
readTagAttrKey sets z.pendingAttr[0] to the "k" in "<div k=v>".
Precondition: z.err == nil.
readTagAttrVal sets z.pendingAttr[1] to the "v" in "<div k=v>".
readTagName sets z.data to the "div" in "<div k=v>". The reader (z.raw.end)
is positioned such that the first byte of the tag name (the "d" in "<div")
has already been consumed.
readUntilCloseAngle reads until the next ">".
skipWhiteSpace skips past any white space.
startTagIn returns whether the start tag in z.buf[z.data.start:z.data.end]
case-insensitively matches any element of ss.
func NewTokenizer(r io.Reader) *Tokenizer
func NewTokenizerFragment(r io.Reader, contextTag string) *Tokenizer
A TokenType is the type of a Token.
String returns a string representation of the TokenType.
T : expvar.Var
T : fmt.Stringer
T : context.stringer
T : runtime.stringer
func (*Tokenizer).Next() TokenType
func (*Tokenizer).readMarkupDeclaration() TokenType
func (*Tokenizer).readStartTag() TokenType
const CommentToken
const DoctypeToken
const EndTagToken
const ErrorToken
const SelfClosingTagToken
const StartTagToken
const TextToken
Package-Level Functions (total 50, in which 10 are exported)
EscapeString escapes special characters like "<" to become "&lt;". It
escapes only five such characters: <, >, &, ' and ".
UnescapeString(EscapeString(s)) == s always holds, but the converse isn't
always true.
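A quick illustration of both functions:

    package main

    import (
        "fmt"

        "golang.org/x/net/html"
    )

    func main() {
        fmt.Println(html.EscapeString(`a<b & "c"`)) // a&lt;b &amp; &#34;c&#34;
        s := `a<b`
        fmt.Println(html.UnescapeString(html.EscapeString(s)) == s) // true
    }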
NewTokenizer returns a new HTML Tokenizer for the given Reader.
The input is assumed to be UTF-8 encoded.
NewTokenizerFragment returns a new HTML Tokenizer for the given Reader, for
tokenizing an existing element's InnerHTML fragment. contextTag is that
element's tag, such as "div" or "iframe".
For example, how the InnerHTML "a<b" is tokenized depends on whether it is
for a <p> tag or a <script> tag.
The input is assumed to be UTF-8 encoded.
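A sketch contrasting the two contexts mentioned above; the input is
illustrative:

    package main

    import (
        "fmt"
        "strings"

        "golang.org/x/net/html"
    )

    func main() {
        for _, ctx := range []string{"p", "script"} {
            z := html.NewTokenizerFragment(strings.NewReader("a<b"), ctx)
            z.Next()
            // In a <p> context, "a" is text and "<b" begins a tag; in a
            // <script> context, the entire input is raw text.
            fmt.Printf("%s context: first token %q\n", ctx, z.Raw())
        }
    }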
Parse returns the parse tree for the HTML from the given Reader.
It implements the HTML5 parsing algorithm
(https://html.spec.whatwg.org/multipage/syntax.html#tree-construction),
which is very complicated. The resultant tree can contain implicitly created
nodes that have no explicit <tag> listed in r's data, and nodes' parents can
differ from the nesting implied by a naive processing of start and end
<tag>s. Conversely, explicit <tag>s in r's data can be silently dropped,
with no corresponding node in the resulting tree.
The input is assumed to be UTF-8 encoded.
ParseFragment parses a fragment of HTML and returns the nodes that were
found. If the fragment is the InnerHTML for an existing element, pass that
element in context.
It has the same intricacies as Parse.
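A minimal sketch of parsing an InnerHTML fragment in the context of a <body>
element; the fragment is illustrative:

    package main

    import (
        "fmt"
        "log"
        "strings"

        "golang.org/x/net/html"
        "golang.org/x/net/html/atom"
    )

    func main() {
        // The context node stands in for the element whose InnerHTML this is.
        ctx := &html.Node{
            Type:     html.ElementNode,
            DataAtom: atom.Body,
            Data:     "body",
        }
        nodes, err := html.ParseFragment(strings.NewReader(`<p>Hi</p><p>there</p>`), ctx)
        if err != nil {
            log.Fatal(err)
        }
        for _, n := range nodes {
            fmt.Println(n.Data) // p, p
        }
    }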
ParseFragmentWithOptions is like ParseFragment, with options.
ParseOptionEnableScripting configures the scripting flag.
https://html.spec.whatwg.org/multipage/webappapis.html#enabling-and-disabling-scripting
By default, scripting is enabled.
ParseWithOptions is like Parse, with options.
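For example, a sketch of parsing as a scripting-disabled user agent, which
changes how <noscript> content is handled:

    package main

    import (
        "log"
        "os"
        "strings"

        "golang.org/x/net/html"
    )

    func main() {
        // With scripting disabled, the contents of <noscript> are parsed as
        // elements rather than as raw text.
        doc, err := html.ParseWithOptions(
            strings.NewReader(`<noscript><p>no JS</p></noscript>`),
            html.ParseOptionEnableScripting(false),
        )
        if err != nil {
            log.Fatal(err)
        }
        html.Render(os.Stdout, doc)
    }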
Render renders the parse tree n to the given writer.
Rendering is done on a 'best effort' basis: calling Parse on the output of
Render will always result in something similar to the original tree, but it
is not necessarily an exact clone unless the original tree was 'well-formed'.
'Well-formed' is not easily specified; the HTML5 specification is
complicated.
Calling Parse on arbitrary input typically results in a 'well-formed' parse
tree. However, it is possible for Parse to yield a 'badly-formed' parse tree.
For example, in a 'well-formed' parse tree, no <a> element is a child of
another <a> element: parsing "<a><a>" results in two sibling elements.
Similarly, in a 'well-formed' parse tree, no <a> element is a child of a
<table> element: parsing "<p><table><a>" results in a <p> with two sibling
children; the <a> is reparented to the <table>'s parent. However, calling
Parse on "<a><table><a>" does not return an error, but the result has an <a>
element with an <a> child, and is therefore not 'well-formed'.
Programmatically constructed trees are typically also 'well-formed', but it
is possible to construct a tree that looks innocuous but, when rendered and
re-parsed, results in a different tree. A simple example is that a solitary
text node would become a tree containing <html>, <head> and <body> elements.
Another example is that the programmatic equivalent of "a<head>b</head>c"
becomes "<html><head><head/><body>abc</body></html>".
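A minimal round-trip sketch:

    package main

    import (
        "log"
        "os"
        "strings"

        "golang.org/x/net/html"
    )

    func main() {
        doc, err := html.Parse(strings.NewReader(`<p>Hello`))
        if err != nil {
            log.Fatal(err)
        }
        // The parser supplies the implied elements, so this prints
        // <html><head></head><body><p>Hello</p></body></html>
        if err := html.Render(os.Stdout, doc); err != nil {
            log.Fatal(err)
        }
    }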
UnescapeString unescapes entities like "&lt;" to become "<". It unescapes a
larger range of entities than EscapeString escapes. For example, "&aacute;"
unescapes to "á", as does "&#225;" and "&#xE1;".
UnescapeString(EscapeString(s)) == s always holds, but the converse isn't
always true.
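For instance:

    package main

    import (
        "fmt"

        "golang.org/x/net/html"
    )

    func main() {
        // Named, decimal and hexadecimal references decode to the same rune.
        fmt.Println(html.UnescapeString("&aacute; &#225; &#xE1;")) // á á á
    }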
Package-Level Variables (total 16, in which 1 are exported)
ErrBufferExceeded means that the buffering limit was exceeded.
Package-Level Constants (total 26, in which 14 are exported)
const CommentNode NodeType = 4
A CommentToken looks like <!--x-->.
const DoctypeNode NodeType = 5
A DoctypeToken looks like <!DOCTYPE x>.
const DocumentNode NodeType = 2
const ElementNode NodeType = 3
An EndTagToken looks like </a>.
const ErrorNode NodeType = 0
ErrorToken means that an error occurred during tokenization.
RawNode nodes are not returned by the parser, but can be part of the
Node tree passed to func Render to insert raw HTML (without escaping).
If so, this package makes no guarantee that the rendered HTML is secure
(from e.g. Cross Site Scripting attacks) or well-formed.
A SelfClosingTagToken tag looks like <br/>.
A StartTagToken looks like <a>.
const TextNode NodeType = 1
TextToken means a text node.