A Builder allows constructing a Tag from individual components.
Its main user is Compose in the top-level language package.
Tag Tag
extensions []string
// the x extension
variants []string
AddExt adds extension e to the tag. e must be a valid extension as returned
by Tag.Extension. If the extension already exists, it will be discarded,
except for a -u extension, where non-existing key-type pairs will be added.
AddVariant adds any number of variants.
ClearExtensions removes any extensions previously added, including those
copied from a Tag in SetTag.
ClearVariants removes any variants previously added, including those
copied from a Tag in SetTag.
Make returns a new Tag from the current settings.
SetExt sets the extension e to the tag. e must be a valid extension as
returned by Tag.Extension. If the extension already exists, it will be
overwritten, except for a -u extension, where the individual key-type pairs
will be set.
SetTag copies all the settings from a given Tag. Any previously set values
are discarded.
func golang.org/x/text/language.update(b *Builder, part ...interface{}) (err error)
CompactCoreInfo is a compact integer with the three core tags encoded.
Tag generates a tag from c.
func GetCompactCore(t Tag) (cci CompactCoreInfo, ok bool)
( T) Canonicalize() (Language, AliasType)
ISO3 returns the ISO 639-3 language code.
IsPrivateUse reports whether this language code is reserved for private use.
String returns the BCP 47 representation of the langID.
Use b as variable name, instead of id, to ensure the variable
used is consistent with that of Base in which this type is embedded.
StringToBuf writes the string to b and returns the number of bytes
written. cap(b) must be >= 3.
SuppressScript returns the script marked as SuppressScript in the IANA
language tag repository, or 0 if there is no such script.
T : expvar.Var
T : fmt.Stringer
T : context.stringer
T : runtime.stringer
func BaseLanguages() []Language
func MustParseBase(s string) Language
func ParseBase(s string) (Language, error)
func Language.Canonicalize() (Language, AliasType)
func Tag.Raw() (b Language, s Script, r Region)
func getLangID(s []byte) (Language, error)
func getLangISO2(s []byte) (Language, error)
func getLangISO3(s []byte) (Language, error)
func normLang(id Language) (Language, AliasType)
func golang.org/x/text/language.makeHaveTag(tag Tag, index int) (language.haveTag, Language)
func golang.org/x/text/language.(*Tag).lang() Language
func normLang(id Language) (Language, AliasType)
func (*Tag).setUndefinedLang(id Language)
func golang.org/x/text/language.altScript(l Language, s Script) Script
func golang.org/x/text/language.isExactEquivalent(l Language) bool
func golang.org/x/text/language.isParadigmLocale(lang Language, r Region) bool
func golang.org/x/text/language.regionGroupDist(a, b Region, script Script, lang Language) (dist uint8, same bool)
Canonicalize returns the region or a possible replacement if the region is
deprecated. It will not return a replacement for deprecated regions that
are split into multiple regions.
Contains returns whether Region c is contained by Region r. It returns true
if c == r.
ISO3 returns the 3-letter ISO code of r.
Note that not all regions have a 3-letter ISO code.
In such cases this method returns "ZZZ".
IsCountry returns whether this region is a country or autonomous area. This
includes non-standard definitions from CLDR.
IsGroup returns whether this region defines a collection of regions. This
includes non-standard definitions from CLDR.
IsPrivateUse reports whether r has the ISO 3166 User-assigned status. This
may include private-use tags that are assigned by CLDR and used in this
implementation. So IsPrivateUse and IsCountry can be simultaneously true.
M49 returns the UN M.49 encoding of r, or 0 if this encoding
is not defined for r.
String returns the BCP 47 representation for the region.
It returns "ZZ" for an unspecified region.
TLD returns the country code top-level domain (ccTLD). UK is returned for GB.
In all other cases it returns either the region itself or an error.
This method may return an error for a region for which there exists a
canonical form with a ccTLD. To get that ccTLD canonicalize r first. The
region will already be canonicalized if it was obtained from a Tag that was
obtained using any of the default methods.
( T) typ() byte
T : expvar.Var
T : fmt.Stringer
T : context.stringer
T : runtime.stringer
func EncodeM49(r int) (Region, error)
func MustParseRegion(s string) Region
func ParseRegion(s string) (Region, error)
func Region.Canonicalize() Region
func Region.TLD() (Region, error)
func Tag.Raw() (b Language, s Script, r Region)
func getRegionID(s []byte) (Region, error)
func getRegionISO2(s []byte) (Region, error)
func getRegionISO3(s []byte) (Region, error)
func getRegionM49(n int) (Region, error)
func normRegion(r Region) Region
func golang.org/x/text/language.(*Tag).region() Region
func Region.Contains(c Region) bool
func normRegion(r Region) Region
func (*Tag).setUndefinedRegion(id Region)
func golang.org/x/text/language.isParadigmLocale(lang Language, r Region) bool
func golang.org/x/text/language.regionGroupDist(a, b Region, script Script, lang Language) (dist uint8, same bool)
Tag represents a BCP 47 language tag. It is used to specify an instance of a
specific language or locale. All language tag values are guaranteed to be
well-formed. The zero value of Tag is Und.
LangID Language
RegionID Region
TODO: we will soon run out of positions for ScriptID. Idea: instead of
storing lang, region, and ScriptID codes, store only the compact index and
have a lookup table from this code to its expansion. This would greatly speed
up table lookups and common variant cases.
This will also immediately free up 3 extra bytes. Also, the pVariant
field can now be moved to the lookup table, as the compact index uniquely
determines the offset of a possible variant.
// offset of first extension, includes preceding '-'
// offset in str, includes preceding '-'
str is the string representation of the Tag. It will only be used if the
tag has variants or extensions.
Extension returns the extension of type x for tag t. It will return
false for ok if t does not have the requested extension. The returned
extension will be invalid in this case.
Extensions returns all extensions of t.
HasExtensions reports whether t has extensions.
HasString reports whether this tag defines more than just the raw
components.
HasVariants reports whether t has variants.
IsPrivateUse reports whether the Tag consists solely of a private use
tag.
IsRoot returns true if t is equal to language "und".
MarshalText implements encoding.TextMarshaler.
Maximize returns a new tag with missing tags filled in.
Parent returns the CLDR parent of t. In CLDR, missing fields in data for a
specific language are substituted with fields from the parent language.
The parent for a language may change for newer versions of CLDR.
Raw returns the raw base language, script and region, without making an
attempt to infer their values.
TODO: consider removing
RemakeString is used to update t.str in case lang, script or region changed.
It is assumed that pExt and pVariant still point to the start of the
respective parts.
SetTypeForKey returns a new Tag with the key set to type, where key and type
are of the allowed values defined for the Unicode locale extension ('u') in
https://www.unicode.org/reports/tr35/#Unicode_Language_and_Locale_Identifiers.
An empty value removes an existing pair with the same key.
String returns the canonical string representation of the language tag.
TypeForKey returns the type associated with the given key, where key and type
are of the allowed values defined for the Unicode locale extension ('u') in
https://www.unicode.org/reports/tr35/#Unicode_Language_and_Locale_Identifiers.
TypeForKey will traverse the inheritance chain to get the correct value.
UnmarshalText implements encoding.TextUnmarshaler.
VariantOrPrivateUseTags returns variants or private use tags.
Variants returns the part of the tag holding all variants or the empty string
if there are no variants defined.
addLikelySubtags sets subtags to their most likely value, given the locale.
In most cases this means setting fields for unknown values, but in some
cases it may alter a value. It returns an ErrMissingLikelyTagsData error
if the given locale cannot be expanded.
equalTags compares language, script and region subtags only.
findKeyAndType returns the start and end position for the type corresponding
to key or the point at which to insert the key-value pair if the type
wasn't found. The hasExt return value reports whether an -u extension was present.
Note: the extensions are typically very small and are likely to contain
only one key-type pair.
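The key-type lookup described above can be pictured with a small sketch. This is not the package's findKeyAndType: the helper name typeForKey, its signature, and the rule that any 2-letter token starts a new key are assumptions made for illustration only.

```go
package main

import (
	"fmt"
	"strings"
)

// typeForKey is a hypothetical helper. It scans a -u extension such as
// "u-ca-gregory-nu-latn" and returns the type tokens that follow the
// given 2-letter key, up to the next 2-letter key.
func typeForKey(ext, key string) (typ string, ok bool) {
	tokens := strings.Split(ext, "-")
	if tokens[0] != "u" {
		return "", false
	}
	for i := 1; i < len(tokens); i++ {
		if len(tokens[i]) != 2 || tokens[i] != key {
			continue
		}
		// Collect everything up to the next 2-letter token.
		j := i + 1
		for j < len(tokens) && len(tokens[j]) != 2 {
			j++
		}
		return strings.Join(tokens[i+1:j], "-"), true
	}
	return "", false
}

func main() {
	typ, ok := typeForKey("u-ca-gregory-nu-latn", "nu")
	fmt.Println(typ, ok)
}
```

A linear scan is a reasonable fit here because, as noted above, the extension usually holds only one key-type pair.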
genCoreBytes writes a string for the base languages, script and region tags
to the given buffer and returns the number of bytes written. It will never
write more than maxCoreSize bytes.
minimize removes the region or script subtags from t such that
t.addLikelySubtags() == t.minimize().addLikelySubtags().
(*T) setTagsFrom(id Tag)
(*T) setUndefinedLang(id Language)
(*T) setUndefinedRegion(id Region)
(*T) setUndefinedScript(id Script)
T : encoding.TextMarshaler
*T : encoding.TextUnmarshaler
T : expvar.Var
T : fmt.Stringer
T : golang.org/x/text/internal/language/compact.fullTag
T : context.stringer
*T : github.com/go-git/gcfg.textUnmarshaler
T : runtime.stringer
func Make(s string) Tag
func MustParse(s string) Tag
func Parse(s string) (t Tag, err error)
func (*Builder).Make() Tag
func CompactCoreInfo.Tag() Tag
func Tag.Maximize() (Tag, error)
func Tag.Parent() Tag
func Tag.SetTypeForKey(key, value string) (Tag, error)
func golang.org/x/text/internal/language/compact.ID.Tag() Tag
func golang.org/x/text/internal/language/compact.Tag.Tag() Tag
func addTags(t Tag) (Tag, error)
func grandfathered(s [11]byte) (t Tag, ok bool)
func minimizeTags(t Tag) (Tag, error)
func parse(scan *scanner, s string) (t Tag, err error)
func parseTag(scan *scanner) (t Tag, end int)
func Tag.addLikelySubtags() (Tag, error)
func Tag.minimize() (Tag, error)
func golang.org/x/text/language.canonicalize(c language.CanonType, t Tag) (Tag, bool)
func golang.org/x/text/language.(*Tag).tag() Tag
func GetCompactCore(t Tag) (cci CompactCoreInfo, ok bool)
func (*Builder).SetTag(t Tag)
func golang.org/x/text/internal/language/compact.FromTag(t Tag) (id compact.ID, exact bool)
func golang.org/x/text/internal/language/compact.Make(t Tag) (tag compact.Tag)
func addTags(t Tag) (Tag, error)
func minimizeTags(t Tag) (Tag, error)
func parseVariants(scan *scanner, end int, t Tag) int
func specializeRegion(t *Tag) bool
func Tag.equalTags(a Tag) bool
func (*Tag).setTagsFrom(id Tag)
func golang.org/x/text/internal/language/compact.getCoreIndex(t Tag) (id compact.ID, ok bool)
func golang.org/x/text/language.canonicalize(c language.CanonType, t Tag) (Tag, bool)
func golang.org/x/text/language.equalsRest(a, b Tag) bool
func golang.org/x/text/language.makeHaveTag(tag Tag, index int) (language.haveTag, Language)
func golang.org/x/text/language.makeTag(t Tag) (tag language.Tag)
var Und
var golang.org/x/text/internal/language/compact.root
var golang.org/x/text/language.root
ValueError is returned by any of the parsing functions when the
input is well-formed but the respective subtag is not recognized
as a valid value.
v [8]byte
Error implements the error interface.
Subtag returns the subtag for which the error occurred.
( T) tag() []byte
T : golang.org/x/text/language.ValueError
T : error
func NewValueError(tag []byte) ValueError
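As a rough illustration of an error type that stores its subtag in a fixed [8]byte field, here is a minimal sketch. The names valueError and newValueError, the truncation to 8 bytes, and the message text are assumptions for this example rather than the package's actual code.

```go
package main

import "fmt"

// valueError is a hypothetical stand-in for ValueError. Keeping the
// subtag in a fixed-size array means the error value carries its own
// copy of the bytes and does not alias the parser's buffer.
type valueError struct {
	v [8]byte
}

// newValueError copies at most 8 bytes of tag into the error.
func newValueError(tag []byte) valueError {
	var e valueError
	copy(e.v[:], tag)
	return e
}

// Subtag returns the stored bytes up to the first zero byte.
func (e valueError) Subtag() string {
	n := 0
	for n < len(e.v) && e.v[n] != 0 {
		n++
	}
	return string(e.v[:n])
}

// Error implements the error interface.
func (e valueError) Error() string {
	return fmt.Sprintf("subtag %q is well-formed but unknown", e.Subtag())
}

func main() {
	var err error = newValueError([]byte("xyz"))
	fmt.Println(err)
}
```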
Variant represents a registered variant of a language as defined by BCP 47.
ID uint8
str string
String returns the string representation of the variant.
T : expvar.Var
T : fmt.Stringer
T : context.stringer
T : runtime.stringer
func ParseVariant(s string) (Variant, error)
scanner is used to scan BCP 47 tokens, which are separated by _ or -.
b []byte
bytes [32]byte
done bool
// end position of the current token
err error
// next point for scan
// start position of the current token
token []byte
acceptMinSize parses multiple tokens of the given size or greater.
It returns the end position of the last token consumed.
deleteRange removes the given range from s.b before the current token.
gobble removes the current token from the input.
Caller must call scan after calling gobble.
(*T) init()
replace replaces the current token with repl.
resizeRange shrinks or grows the array at position oldStart such that
a new string of size newSize can fit between oldStart and oldEnd.
Sets the scan point to after the resized range.
scan parses the next token of a BCP 47 string. Tokens that are larger
than 8 characters or include non-alphanumeric characters result in an error
and are gobbled and removed from the output.
It returns the end position of the last token consumed.
(*T) setError(e error)
restToLower converts the string between start and end to lower case.
func makeScanner(b []byte) scanner
func makeScannerString(s string) scanner
func parse(scan *scanner, s string) (t Tag, err error)
func parseExtension(scan *scanner) int
func parseExtensions(scan *scanner) int
func parseTag(scan *scanner) (t Tag, end int)
func parseVariants(scan *scanner, end int, t Tag) int
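The token rules given for scan (separators _ and -, tokens longer than 8 characters or with non-alphanumeric characters rejected and removed) can be sketched as follows. The function tokens is hypothetical and much simpler than the real scanner: it works on a string rather than editing a byte buffer in place, and it silently skips empty tokens.

```go
package main

import (
	"fmt"
	"strings"
)

// isAlphaNum reports whether s consists only of ASCII letters and digits.
func isAlphaNum(s string) bool {
	for i := 0; i < len(s); i++ {
		c := s[i]
		if !('a' <= c && c <= 'z' || 'A' <= c && c <= 'Z' || '0' <= c && c <= '9') {
			return false
		}
	}
	return true
}

// tokens splits s on '-' and '_' and drops every token that is longer
// than 8 bytes or contains a non-alphanumeric character. It returns the
// surviving tokens and the number of dropped ones.
func tokens(s string) (kept []string, dropped int) {
	isSep := func(r rune) bool { return r == '-' || r == '_' }
	for _, f := range strings.FieldsFunc(s, isSep) {
		if len(f) > 8 || !isAlphaNum(f) {
			dropped++
			continue
		}
		kept = append(kept, f)
	}
	return kept, dropped
}

func main() {
	kept, dropped := tokens("en_US-u-co-phonebk")
	fmt.Println(kept, dropped)
}
```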
Package-Level Functions (total 46, in which 15 are exported)
BaseLanguages returns the list of all supported base languages. It generates
the list by traversing the internal structures.
EncodeM49 returns the Region for the given UN M.49 code.
It returns an error if r is not a valid code.
GetCompactCore generates a uint32 value that is guaranteed to be unique for
different language, region, and script values.
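One way to obtain a uint32 that is unique per (language, script, region) triple is to give each component its own bit field. The sketch below assumes 12 bits for the region and 8 bits for the script, with the language in the remaining high bits; these widths and the function names are assumptions for illustration, not a statement of the package's layout.

```go
package main

import "fmt"

// Field widths assumed for this sketch only.
const (
	regionBits = 12
	scriptBits = 8
)

// pack places each ID in its own bit field, so two different triples
// can never produce the same value as long as every ID fits its field.
func pack(lang, script, region uint32) uint32 {
	return lang<<(scriptBits+regionBits) | script<<regionBits | region
}

// unpack reverses pack.
func unpack(c uint32) (lang, script, region uint32) {
	lang = c >> (scriptBits + regionBits)
	script = (c >> regionBits) & (1<<scriptBits - 1)
	region = c & (1<<regionBits - 1)
	return lang, script, region
}

func main() {
	c := pack(313, 90, 273)
	fmt.Println(unpack(c))
}
```

A packing like this only works when every ID fits in its field, which would explain why the function also returns an ok result.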
Make is a convenience wrapper for Parse that omits the error.
In case of an error, a sensible default is returned.
MustParse is like Parse, but panics if the given BCP 47 tag cannot be parsed.
It simplifies safe initialization of Tag values.
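The Must pattern mentioned here is commonly used for package-level variables, where there is no place to handle an error. The sketch below uses a toy parse function and hypothetical names (parse, mustParse, errSyntax) in place of the real Parse and MustParse.

```go
package main

import (
	"errors"
	"fmt"
)

var errSyntax = errors.New("tag is not well-formed")

// parse is a toy parser that accepts only 2- or 3-letter lowercase codes.
func parse(s string) (string, error) {
	if len(s) < 2 || len(s) > 3 {
		return "", errSyntax
	}
	for i := 0; i < len(s); i++ {
		if s[i] < 'a' || s[i] > 'z' {
			return "", errSyntax
		}
	}
	return s, nil
}

// mustParse wraps parse and panics on error, so a bad literal is caught
// as soon as the program starts.
func mustParse(s string) string {
	t, err := parse(s)
	if err != nil {
		panic(err)
	}
	return t
}

// english is initialized at package load time without error handling.
var english = mustParse("en")

func main() {
	fmt.Println(english)
}
```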
MustParseBase is like ParseBase, but panics if the given base cannot be parsed.
It simplifies safe initialization of Base values.
MustParseRegion is like ParseRegion, but panics if the given region cannot be
parsed. It simplifies safe initialization of Region values.
MustParseScript is like ParseScript, but panics if the given script cannot be
parsed. It simplifies safe initialization of Script values.
NewValueError creates a new ValueError.
Parse parses the given BCP 47 string and returns a valid Tag. If parsing
failed it returns an error and any part of the tag that could be parsed.
If parsing succeeded but an unknown value was found, it returns
ValueError. The Tag returned in this case is just stripped of the unknown
value. All other values are preserved. It accepts tags in the BCP 47 format
and extensions to this standard defined in
https://www.unicode.org/reports/tr35/#Unicode_Language_and_Locale_Identifiers.
ParseBase parses a 2- or 3-letter ISO 639 code.
It returns a ValueError if s is a well-formed but unknown language identifier
or another error if another error occurred.
ParseExtension parses s as an extension and returns it on success.
ParseRegion parses a 2- or 3-letter ISO 3166-1 or a UN M.49 code.
It returns a ValueError if s is a well-formed but unknown region identifier
or another error if another error occurred.
ParseScript parses a 4-letter ISO 15924 code.
It returns a ValueError if s is a well-formed but unknown script identifier
or another error if another error occurred.
ParseVariant parses and returns a Variant. An error is returned if s is not
a valid variant.
minimizeTags mimics the behavior of the ICU 51 C implementation.
nextExtension finds the next extension within the string, searching
for the -<char>- pattern from position p.
In the vast majority of cases, language tags will have at most
one extension and extensions tend to be small.
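The -<char>- search can be sketched with a simple linear scan. The function nextExt below is a hypothetical illustration, not the package's nextExtension; returning len(s) when nothing is found is an assumption of this sketch.

```go
package main

import "fmt"

// nextExt returns the position of the first '-' at or after p that
// starts a -<char>- pattern, or len(s) if there is none.
func nextExt(s string, p int) int {
	for ; p+2 < len(s); p++ {
		if s[p] == '-' && s[p+2] == '-' {
			return p
		}
	}
	return len(s)
}

func main() {
	s := "de-DE-u-co-phonebk-t-en"
	first := nextExt(s, 0)
	second := nextExt(s, first+1)
	fmt.Println(first, second)
}
```

Because tags are short and rarely carry more than one extension, a plain scan like this is likely to be sufficient.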
mapLang returns the mapped langID of id according to mapping m.
normRegion returns a region if r is deprecated or 0 otherwise.
TODO: consider supporting BYS (-> BLR), CSK (-> 200 or CZ), PHI (-> PHL) and AFI (-> DJ).
TODO: consider mapping split up regions to new most populous one (like CLDR).
parseExtension parses a single extension and returns the position of
the extension end.
parseExtensions parses and normalizes the extensions in the buffer.
It returns the last position of scan.b that is part of any extension.
It also trims scan.b to remove excess parts accordingly.
parseTag parses language, script, region and variants.
It returns a Tag and the end position in the input that was parsed.
parseVariants scans tokens as long as each token is a valid variant string.
Duplicate variants are removed.
Package-Level Variables (total 34, in which 6 are exported)
AliasMap maps langIDs to their suggested replacements.
Size: 704 bytes, 176 elements
Size: 176 bytes, 176 elements
ErrDuplicateKey is returned when a tag contains the same key twice with
different values in the -u section.
ErrMissingLikelyTagsData indicates no information was available
to compute likely values of missing tags.
ErrSyntax is returned by any of the parsing functions when the
input is not well-formed, according to BCP 47.
TODO: return the position at which the syntax error occurred?
Und is the root language.
altLangIndex is used to convert indexes in altLangISO3 to langIDs.
Size: 12 bytes, 6 elements
altRegionIDs holds a list of regionIDs the positions of which match those
of the 3-letter ISO codes in altRegionISO3.
Size: 22 bytes, 11 elements
fromM49 contains entries to map UN.M49 codes to regions. See m49Index for details.
Size: 666 bytes, 333 elements
grandfatheredMap holds a mapping from legacy and grandfathered tags to
their base language or index to more elaborate tag.
langNoIndex is a bit vector of all 3-letter language codes that are not used as an index
in lookup tables. The language ids for these language codes are derived directly
from the letters and are not consecutive.
Size: 2197 bytes, 2197 elements
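The stated size appears consistent with one bit per possible 3-letter code: 26*26*26 = 17576 bits, which is exactly 2197 bytes. A minimal sketch of such a bit vector follows; the type name bitset and the base-26 ordering of codes are assumptions made for this example.

```go
package main

import "fmt"

// bitset has one bit for every possible 3-letter lowercase code.
type bitset [26 * 26 * 26 / 8]byte

// index maps a 3-letter lowercase code to its bit position, treating
// the letters as base-26 digits (an assumption of this sketch).
func index(code string) (int, bool) {
	if len(code) != 3 {
		return 0, false
	}
	n := 0
	for i := 0; i < 3; i++ {
		c := code[i]
		if c < 'a' || c > 'z' {
			return 0, false
		}
		n = n*26 + int(c-'a')
	}
	return n, true
}

// set marks code as present; invalid codes are ignored.
func (b *bitset) set(code string) {
	if i, ok := index(code); ok {
		b[i/8] |= byte(1) << (i % 8)
	}
}

// has reports whether code has been marked.
func (b *bitset) has(code string) bool {
	i, ok := index(code)
	return ok && b[i/8]&(byte(1)<<(i%8)) != 0
}

func main() {
	var b bitset
	b.set("zzz")
	fmt.Println(len(b), b.has("zzz"), b.has("aaa"))
}
```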
likelyLang is a lookup table, indexed by langID, for the most likely
scripts and regions given incomplete information. If more entries exist for a
given language, region and script are the index and size respectively
of the list in likelyLangList.
Size: 5320 bytes, 1330 elements
likelyLangList holds lists info associated with likelyLang.
Size: 388 bytes, 97 elements
likelyRegion is a lookup table, indexed by regionID, for the most likely
languages and scripts given incomplete information. If more entries exist
for a given regionID, lang and script are the index and size respectively
of the list in likelyRegionList.
TODO: exclude containers and user-definable regions from the list.
Size: 1432 bytes, 358 elements
Size: 198 bytes, 33 elements
likelyRegionList holds lists info associated with likelyRegion.
Size: 372 bytes, 93 elements
likelyScript is a lookup table, indexed by scriptID, for the most likely
languages and regions given a script.
Size: 1012 bytes, 253 elements
m49 maps regionIDs to UN.M49 codes. The first isoRegionOffset entries are
codes indicating collections of regions.
Size: 716 bytes, 358 elements
m49Index gives indexes into fromM49 based on the three most significant bits
of a 10-bit UN.M49 code. To search for a UN.M49 code in fromM49, search in
fromM49[m49Index[msb3(code)]:m49Index[msb3(code)+1]]
for an entry where the first 7 bits match the 7 lsb of the UN.M49 code.
The region code is stored in the 9 lsb of the indexed value.
Size: 18 bytes, 9 elements
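The bucketed search described for m49Index and fromM49 can be sketched with toy data. The functions entry, build and lookup are hypothetical, and the region IDs used are made up; only the bit layout (7 code bits above 9 region bits, buckets chosen by the 3 most significant bits) follows the description above.

```go
package main

import (
	"fmt"
	"sort"
)

// entry packs the 7 least significant bits of a 10-bit code above a
// 9-bit region ID.
func entry(code, region int) uint16 {
	return uint16(code&0x7f)<<9 | uint16(region&0x1ff)
}

// build creates toy versions of both tables from code-to-region pairs.
// All codes are assumed to be in the range [0, 1024).
func build(pairs map[int]int) (from []uint16, index [9]int) {
	codes := make([]int, 0, len(pairs))
	for c := range pairs {
		codes = append(codes, c)
	}
	sort.Ints(codes)
	for _, c := range codes {
		from = append(from, entry(c, pairs[c]))
		index[(c>>7)+1]++ // count the entries in each bucket
	}
	for i := 1; i < len(index); i++ {
		index[i] += index[i-1] // prefix sums give the bucket starts
	}
	return from, index
}

// lookup searches only the bucket selected by the 3 most significant bits.
func lookup(from []uint16, index [9]int, code int) (region int, ok bool) {
	if code < 0 || code >= 1<<10 {
		return 0, false
	}
	msb := code >> 7
	for _, e := range from[index[msb]:index[msb+1]] {
		if int(e>>9) == code&0x7f {
			return int(e & 0x1ff), true
		}
	}
	return 0, false
}

func main() {
	from, index := build(map[int]int{1: 7, 150: 9, 276: 50, 840: 100})
	fmt.Println(lookup(from, index, 840))
}
```

Nine index entries delimit eight buckets, which matches the element count given for m49Index.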
Size: 414 bytes, 5 elements
Size: 264 bytes, 33 elements
regionInclusion maps region identifiers to sets of regions in regionInclusionBits,
where each set holds all groupings that are directly connected in a region
containment graph.
Size: 358 bytes, 358 elements
regionInclusionBits is an array of bit vectors where every vector represents
a set of region groupings. These sets are used to compute the distance
between two regions for the purpose of language matching.
Size: 584 bytes, 73 elements
regionInclusionNext marks, for each entry in regionInclusionBits, the set of
all groups that are reachable from the groups set in the respective entry.
Size: 73 bytes, 73 elements
Size: 80 bytes, 20 elements
regionTypes defines the status of a region for various standards.
Size: 358 bytes, 358 elements
suppressScript is an index from langID to the dominant script for that language,
if it exists. If a script is given, it should be suppressed from the language tag.
Size: 1330 bytes, 1330 elements
Size: 1995 bytes
Package-Level Constants (total 148, in which 8 are exported)
altLangISO3 holds an alphabetically sorted list of 3-letter language code alternatives
to 2-letter language codes that cannot be derived using the method described above.
Each 3-letter code is followed by its 1-byte langID.
altRegionISO3 holds a list of 3-letter region codes that cannot be
mapped to 2-letter codes using the default algorithm. This is a short list.
isoRegionOffset needs to be added to the index of regionISO to obtain the regionID
for 2-letter ISO codes. (The first isoRegionOffset regionIDs are reserved for
the UN.M49 codes used for groups.)
lang holds an alphabetically sorted list of ISO-639 language identifiers.
All entries are 4 bytes. The index of the identifier (divided by 4) is the language tag.
For 2-byte language identifiers, the two successive bytes have the following meaning:
- if the first letter of the 2- and 3-letter ISO codes are the same:
the second and third letter of the 3-letter ISO code.
- otherwise: a 0 followed by an index into altLangISO3, right-shifted by 2 bits.
For 3-byte language identifiers the 4th byte is 0.
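The 4-byte entry layout can be illustrated with a toy table. The constant toyLang and the functions code and iso3 are hypothetical; the sketch covers only the "same first letter" and "3-byte identifier" cases and leaves the altLangISO3 case unresolved.

```go
package main

import "fmt"

// toyLang is a tiny table in the described layout: sorted 4-byte entries
// whose position divided by 4 serves as the ID.
const toyLang = "deeu" + "enng" + "frra" + "haw\x00"

// code returns the 2- or 3-letter identifier stored for id.
func code(id int) string {
	e := toyLang[id*4 : id*4+4]
	if e[3] == 0 {
		return e[:3] // 3-byte identifier, padded with a zero byte
	}
	return e[:2]
}

// iso3 derives the 3-letter code for id. It reports false for entries
// that would need the alternative table, which this sketch omits.
func iso3(id int) (string, bool) {
	e := toyLang[id*4 : id*4+4]
	switch {
	case e[3] == 0:
		return e[:3], true
	case e[2] == 0:
		return "", false
	default:
		// First letter of the 2-letter code plus the two stored letters.
		return e[:1] + e[2:], true
	}
}

func main() {
	for id := 0; id < len(toyLang)/4; id++ {
		s, _ := iso3(id)
		fmt.Println(id, code(id), s)
	}
}
```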
regionISO holds a list of alphabetically sorted 2-letter ISO region codes.
Each 2-letter code is followed by two bytes with the following meaning:
- [A-Z]{2}: the first letter of the 2-letter code plus these two
letters form the 3-letter ISO code.
- 0, n: index into altRegionISO3.
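The two cases above can be sketched with a toy table. The constants toyRegionISO and toyAltISO3 and the function iso3 are hypothetical, and treating n as a byte offset into the alternative list is an assumption of this sketch.

```go
package main

import "fmt"

// Toy tables in the described layout. KM and RS are examples of regions
// whose 3-letter codes (COM, SRB) start with a different letter.
const (
	toyRegionISO = "DEEU" + "FRRA" + "KM\x00\x00" + "RS\x00\x03" + "USSA"
	toyAltISO3   = "COM" + "SRB"
)

// iso3 returns the 3-letter code for the i-th entry of toyRegionISO.
func iso3(i int) string {
	e := toyRegionISO[i*4 : i*4+4]
	if e[2] == 0 {
		// 0, n: n is taken as a byte offset into the alternative list.
		n := int(e[3])
		return toyAltISO3[n : n+3]
	}
	// First letter of the 2-letter code plus the two stored letters.
	return e[:1] + e[2:]
}

func main() {
	for i := 0; i < len(toyRegionISO)/4; i++ {
		fmt.Println(toyRegionISO[i*4:i*4+2], iso3(i))
	}
}
```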
script is an alphabetically sorted list of ISO 15924 codes. The index
of the script in the string, divided by 4, is the internal scriptID.
variantNumSpecialized is the number of specialized variants in variants.
The pages are generated with Golds v0.3.2-preview. (GOOS=darwin GOARCH=amd64)