Package org.apache.sis.util
Class Characters.Filter
- Object
  - Character.Subset
    - Characters.Filter
- Enclosing class: Characters
public static class Characters.Filter extends Character.Subset
Subsets of Unicode characters identified by their general category. The categories are identified by constants defined in the Character class, like LOWERCASE_LETTER, UPPERCASE_LETTER, DECIMAL_DIGIT_NUMBER and SPACE_SEPARATOR.

An instance of this class can be obtained from an enumeration of character types using the forTypes(byte...) method, or using one of the constants predefined in this class. Then, Unicode characters can be tested for inclusion in the subset by calling the contains(int) method.

Relationship with international standards

ISO 19162:2015 §B.5.2 recommends ignoring spaces, case and the following characters when comparing two identified object names: “_” (underscore), “-” (minus sign), “/” (solidus), “(” (left parenthesis) and “)” (right parenthesis). The same specification also limits the set of valid characters in a name to the following (§6.3.1):

A-Z a-z 0-9 _ [ ] ( ) { } < = > . , : ; + - (space) % & ' " * ^ / \ ? | °

Note: SIS does not enforce this restriction in its programmatic API, but may perform some character substitutions at Well Known Text (WKT) formatting time.

If we take only the characters in the above list which are valid in a Unicode identifier and remove the characters that ISO 19162 recommends ignoring, the only characters left are letters and digits.

- Since: 0.3
- See Also: Character.Subset, Character.getType(int), WKT 2 specification §B.5
Defined in the sis-utility module
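The comparison rule described in the "Relationship with international standards" section can be sketched with the JDK alone. The class below is a hypothetical illustration, not SIS code: NameComparison and its two methods are names invented here, Character.isLetterOrDigit(int) stands in for Characters.Filter.LETTERS_AND_DIGITS.contains(int), and lower-casing each code point is a simplifying assumption for "ignore case".

```java
/**
 * Hypothetical helper (not part of SIS) that compares two names while
 * ignoring case and every character that is not a letter or a digit.
 */
final class NameComparison {
    private NameComparison() {
    }

    /** Keeps only the letters and digits of the given name, converted to lower case. */
    static String significantCharacters(String name) {
        StringBuilder buffer = new StringBuilder(name.length());
        for (int i = 0; i < name.length();) {
            int c = name.codePointAt(i);
            i += Character.charCount(c);
            // Stand-in for Characters.Filter.LETTERS_AND_DIGITS.contains(c).
            if (Character.isLetterOrDigit(c)) {
                buffer.appendCodePoint(Character.toLowerCase(c));
            }
        }
        return buffer.toString();
    }

    /** Returns true if the two names differ only by case or by ignored characters. */
    static boolean equalsIgnoringSeparators(String name1, String name2) {
        return significantCharacters(name1).equals(significantCharacters(name2));
    }
}
```

With this sketch, "WGS 84" and "wgs_84" compare as equal because spaces, underscores and case are dropped before the comparison, while "WGS 84" and "WGS 72" remain different.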
Field Summary
Fields
- static Characters.Filter LETTERS_AND_DIGITS: The subset of all characters for which Character.isLetterOrDigit(int) returns true.
- static Characters.Filter UNICODE_IDENTIFIER: The subset of all characters for which Character.isUnicodeIdentifierPart(int) returns true, excluding ignorable characters.
Method Summary
Methods
- boolean contains(int codePoint): Returns true if this subset contains the given Unicode character.
- boolean containsType(int type): Returns true if this subset contains the characters of the given type.
- static Characters.Filter forTypes(byte... types): Returns a subset representing the union of all Unicode characters of the given types.
Methods inherited from class Character.Subset: equals, hashCode, toString
Field Detail
LETTERS_AND_DIGITS
public static final Characters.Filter LETTERS_AND_DIGITS
The subset of all characters for which Character.isLetterOrDigit(int) returns true. This subset includes the following general categories: Character.LOWERCASE_LETTER, UPPERCASE_LETTER, TITLECASE_LETTER, MODIFIER_LETTER, OTHER_LETTER and DECIMAL_DIGIT_NUMBER.
SIS uses this filter when comparing two identified object names. See the Relationship with international standards section in this class javadoc for more information.
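The correspondence between the six listed categories and Character.isLetterOrDigit(int) can be checked against the JDK. The class below is a hypothetical sketch that does not use SIS (LetterOrDigitCategories and its methods are names invented here); it scans the whole Unicode range and counts disagreements. The JDK documents isLetter(int) and isDigit(int) in terms of these same categories, so no mismatch is expected.

```java
/** Hypothetical check (not SIS code) of the category list against the JDK predicate. */
final class LetterOrDigitCategories {
    private LetterOrDigitCategories() {
    }

    /** Tests membership in the six general categories listed for LETTERS_AND_DIGITS. */
    static boolean isListedCategory(int codePoint) {
        switch (Character.getType(codePoint)) {
            case Character.LOWERCASE_LETTER:
            case Character.UPPERCASE_LETTER:
            case Character.TITLECASE_LETTER:
            case Character.MODIFIER_LETTER:
            case Character.OTHER_LETTER:
            case Character.DECIMAL_DIGIT_NUMBER:
                return true;
            default:
                return false;
        }
    }

    /** Counts the code points where the category list and isLetterOrDigit(int) disagree. */
    static int countMismatches() {
        int count = 0;
        for (int c = Character.MIN_CODE_POINT; c <= Character.MAX_CODE_POINT; c++) {
            if (isListedCategory(c) != Character.isLetterOrDigit(c)) {
                count++;
            }
        }
        return count;
    }
}
```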
UNICODE_IDENTIFIER
public static final Characters.Filter UNICODE_IDENTIFIER
The subset of all characters for which Character.isUnicodeIdentifierPart(int) returns true, excluding ignorable characters. This subset includes all the LETTERS_AND_DIGITS categories with the addition of the following ones: Character.LETTER_NUMBER, CONNECTOR_PUNCTUATION, NON_SPACING_MARK and COMBINING_SPACING_MARK.
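To make the difference with LETTERS_AND_DIGITS concrete, the following sketch tests a code point against the ten categories named above using only Character.getType(int). It is a hypothetical stand-in written for this page, not the SIS implementation, and it is not claimed to match Character.isUnicodeIdentifierPart(int) for every code point on every JDK version.

```java
/** Hypothetical category test (not SIS code) for the UNICODE_IDENTIFIER categories. */
final class IdentifierCategories {
    private IdentifierCategories() {
    }

    /** Returns true if the general category of the code point is one of the ten listed. */
    static boolean isIdentifierCategory(int codePoint) {
        switch (Character.getType(codePoint)) {
            // Categories shared with LETTERS_AND_DIGITS.
            case Character.LOWERCASE_LETTER:
            case Character.UPPERCASE_LETTER:
            case Character.TITLECASE_LETTER:
            case Character.MODIFIER_LETTER:
            case Character.OTHER_LETTER:
            case Character.DECIMAL_DIGIT_NUMBER:
            // Categories added by UNICODE_IDENTIFIER.
            case Character.LETTER_NUMBER:
            case Character.CONNECTOR_PUNCTUATION:
            case Character.NON_SPACING_MARK:
            case Character.COMBINING_SPACING_MARK:
                return true;
            default:
                return false;
        }
    }
}
```

For example, "_" (CONNECTOR_PUNCTUATION) and U+2163 ROMAN NUMERAL FOUR (LETTER_NUMBER) are accepted by this test although neither is a letter or a digit, while the soft hyphen U+00AD (category FORMAT, an ignorable character) is rejected.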
Method Detail
contains
public boolean contains(int codePoint)
Returns true if this subset contains the given Unicode character.
- Parameters: codePoint - the Unicode character, as a code point value.
- Returns: true if this subset contains the given character.
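Because the argument is a code point rather than a char, callers scanning a string should advance by code point so that supplementary characters are tested as a whole. The sketch below contrasts the two ways of iterating. It is a hypothetical example that uses Character.isLetter as the membership test instead of a Characters.Filter instance, and CodePointScan is a name invented here.

```java
/** Hypothetical example (not SIS code) of char-based versus code-point-based scanning. */
final class CodePointScan {
    private CodePointScan() {
    }

    /** Counts letters by code point: a surrogate pair is tested as one character. */
    static int countLettersByCodePoint(String text) {
        int count = 0;
        for (int i = 0; i < text.length();) {
            int codePoint = text.codePointAt(i);
            if (Character.isLetter(codePoint)) {
                count++;
            }
            i += Character.charCount(codePoint);
        }
        return count;
    }

    /** Counts letters by char: each half of a surrogate pair is tested alone and rejected. */
    static int countLettersByChar(String text) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            if (Character.isLetter(text.charAt(i))) {
                count++;
            }
        }
        return count;
    }
}
```

For a string made of "a" followed by U+1D49C (MATHEMATICAL SCRIPT CAPITAL A, an uppercase letter outside the Basic Multilingual Plane), the first method counts two letters and the second counts only one.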
containsType
public final boolean containsType(int type)
Returns true if this subset contains the characters of the given type. The given type shall be one of the Character constants like LOWERCASE_LETTER, UPPERCASE_LETTER, DECIMAL_DIGIT_NUMBER or SPACE_SEPARATOR.
- Parameters: type - one of the Character constants.
- Returns: true if this subset contains the characters of the given type.
- See Also: Character.getType(int)
forTypes
public static Characters.Filter forTypes(byte... types)
Returns a subset representing the union of all Unicode characters of the given types.
- Parameters: types - the character types, as Character constants.
- Returns: the subset of Unicode characters of the given type.
- See Also: Character.LOWERCASE_LETTER, Character.UPPERCASE_LETTER, Character.DECIMAL_DIGIT_NUMBER, Character.SPACE_SEPARATOR
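One plausible way to represent such a union is a bit mask with one bit per general category. The class below is a minimal sketch under that assumption. CategoryMask is a hypothetical name, the code depends only on the JDK, and nothing here is taken from the SIS source. Its three methods mirror the signatures documented on this page so that the relationship between forTypes, containsType and contains is visible.

```java
/**
 * Hypothetical stand-in (not the SIS implementation) for a subset of
 * Unicode characters defined by a union of general categories.
 */
final class CategoryMask {
    /** One bit per general category value, as returned by Character.getType(int). */
    private final long types;

    private CategoryMask(long types) {
        this.types = types;
    }

    /** Builds the union of the given Character general category constants. */
    static CategoryMask forTypes(byte... types) {
        long mask = 0;
        for (byte type : types) {
            if (type < 0 || type >= Long.SIZE) {
                throw new IllegalArgumentException("Unsupported character type: " + type);
            }
            mask |= 1L << type;
        }
        return new CategoryMask(mask);
    }

    /** Returns true if the union includes the given general category. */
    boolean containsType(int type) {
        return type >= 0 && type < Long.SIZE && (types & (1L << type)) != 0;
    }

    /** Returns true if the general category of the code point is part of the union. */
    boolean contains(int codePoint) {
        return containsType(Character.getType(codePoint));
    }
}
```

A mask built with CategoryMask.forTypes(Character.LOWERCASE_LETTER, Character.DECIMAL_DIGIT_NUMBER) then accepts 'a' and '7' and rejects 'A'. With SIS on the classpath, the corresponding calls would be Characters.Filter.forTypes(...) followed by contains(int).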