Package com.composum.sling.core.util
Class UrlCodec
- java.lang.Object
  - com.composum.sling.core.util.UrlCodec
public class UrlCodec extends Object
Codecs for the various URL parts. Unlike URLCodec, this is focused on Strings and thus the decoder can leave unknown characters untouched: "ä%C3%A4" is decoded to "ää" instead of "?ä" as URLCodec.decode(String) would do.
-
-
Field Summary
Fields (Modifier and Type, Field, Description)
- protected String admissibleCharacters
- static UrlCodec AUTHORITY
  Codec for the authority of a URL.
- protected Charset charset
- protected Pattern charsToEncodeRegex
  Matches one or more characters not in the admissibleCharacters.
- static UrlCodec FRAGMENT
  Codec for the fragment part of a URL.
- protected static String HEXDIGITS
- protected static String INVALID_CHARACTER_MARKER
  "\ufffd" is inserted whenever something could not be decoded, or sometimes when it's encoded - see encode(String).
- protected String invalidCharacterMarkerForEncoding
- static UrlCodec OPAQUE
  Codec for opaque URLs that are not parsed.
- protected static String PART_URL_SAFECHARS
  The characters which can always appear in any URL without being encoded: the "unreserved" chars. Unfortunately there are different recommendations about encoding $!*'(), so we exclude them.
- protected static Pattern PAT_ENCODED_CHARACTERS
  Matches one or several percent encoded bytes.
- protected static Pattern PAT_INVALID_ENCODED_CHARACTER
  Matches a percent sign followed by something that's not a hexadecimally encoded byte.
- static UrlCodec PATH
  Codec for the path part of a URL.
- static UrlCodec QUERYPART
  Codec for the query part of a URL.
- static UrlCodec URLSAFE
  Codec quoting everything other than the chars which are safe in every part of the URL.
- protected Pattern validationRegex
  Matches an arbitrarily long sequence of admissible chars and percent encodings.
-
Method Summary
Methods (Modifier and Type, Method, Description)
- protected String charsToEncode(String admissibleCharacters)
  Hook to calculate the set of characters to encode from the admissibleCharacters.
- protected void checkResult(@NotNull String encoded, boolean doThrow, CharBuffer out, CoderResult result)
- @Nullable String decode(@Nullable String encoded)
  Decodes percent encoded characters in the string, never throwing exceptions: if an undecodable character is encountered it's replaced with the replacement character "\ufffd".
- protected @Nullable String decode(@Nullable String encoded, boolean doThrow)
- protected String decodePreprocess(String encoded)
  Hook to preprocess something about to be decoded.
- @Nullable String decodeValidated(@Nullable String encoded)
  Decodes percent encoded characters in the string but throws an IllegalArgumentException if the input string is invalid: if it contains an unencoded quoting character % (recognizable because it is not followed by a 2 digit hexadecimal number) or if it does not encode a character in the charset.
- @Nullable String encode(@Nullable String encoded)
  Encodes all characters which are not admissible as percent-encodings with respect to the given charset.
- protected @Nullable String encode(@Nullable String encoded, boolean doThrow)
- protected void encodePostprocess(StringBuffer out)
  Hook for finalizing encoding.
- @Nullable String encodeValidated(@Nullable String encoded)
  Encodes all characters which are not admissible as percent-encodings with respect to the given charset.
- protected String getInvalidCharacterMarkerForEncoding()
- boolean isValid(@Nullable String encoded)
  Verifies that the given String is encoded: all characters are admissible and % is always followed by a hexadecimal number.
- String toString()
- protected byte unhex(char c)
- protected void writePercentEncoded(ByteBuffer bytes, StringBuffer out)
-
-
-
Field Detail
-
PART_URL_SAFECHARS
protected static final String PART_URL_SAFECHARS
The characters which can always appear in any URL without being encoded: the "unreserved" chars. Unfortunately there are different recommendations about encoding $!*'(), so we exclude them. Possibly we could include the "extra" chars !*'(), . We exclude ~ since it was declared unsafe in
- See Also:
- Constant Field Values
-
URLSAFE
public static final UrlCodec URLSAFE
Codec quoting everything other than the chars which are safe in every part of the URL.
-
QUERYPART
public static final UrlCodec QUERYPART
Codec for the query part of a URL.
-
FRAGMENT
public static final UrlCodec FRAGMENT
Codec for the fragment part of a URL.
-
OPAQUE
public static final UrlCodec OPAQUE
Codec for opaque URLs that are not parsed. Contains all unreserved, reserved and extra characters.
-
PAT_ENCODED_CHARACTERS
protected static final Pattern PAT_ENCODED_CHARACTERS
Matches one or several percent encoded bytes.
-
PAT_INVALID_ENCODED_CHARACTER
protected static final Pattern PAT_INVALID_ENCODED_CHARACTER
Matches a percent sign followed by something that's not a hexadecimally encoded byte.
-
INVALID_CHARACTER_MARKER
protected static final String INVALID_CHARACTER_MARKER
"\ufffd" is inserted whenever something could not be decoded, or sometimes when it's encoded - see encode(String).
- See Also:
- Constant Field Values
-
HEXDIGITS
protected static final String HEXDIGITS
- See Also:
- Constant Field Values
-
charset
protected final Charset charset
-
admissibleCharacters
protected final String admissibleCharacters
-
charsToEncodeRegex
protected final Pattern charsToEncodeRegex
Matches one or more characters not in the admissibleCharacters.
-
validationRegex
protected final Pattern validationRegex
Matches an arbitrarily long sequence of admissible chars and percent encodings.
-
invalidCharacterMarkerForEncoding
protected transient String invalidCharacterMarkerForEncoding
-
Constructor Detail
-
UrlCodec
public UrlCodec(@NotNull String admissibleCharacters, @NotNull Charset charset) throws IllegalArgumentException, PatternSyntaxException
Initializes the Codec with a range of admissible characters.
- Parameters:
admissibleCharacters - all characters that remain untouched when encoding, can contain ranges like a-z in simple regex character classes. (Thus, - has to be the first or last character if it needs to be included.) Obviously, the quoting character '%' always has to be admissible.
charset - the charset needed for the decoder.
- Throws:
IllegalArgumentException - if the admissibleCharacters don't contain '%'
PatternSyntaxException - if the admissibleCharacters are not a well formed character class
-
-
Method Detail
-
charsToEncode
protected String charsToEncode(String admissibleCharacters)
Hook to calculate the set of characters to encode from the admissibleCharacters
-
encode
@Nullable public String encode(@Nullable String encoded)
Encodes all characters which are not admissible as percent-encodings with respect to the given charset. Characters that are not in the charset are silently encoded as a replacement character: either "\ufffd" or '?' if one of these is admissible, or otherwise the encoding of "\ufffd" for the charset (which might be an encoded '?').
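The encoding rule described above can be illustrated with plain JDK classes. The following is a minimal sketch, not the UrlCodec implementation: the class name PercentEncodeSketch, the admissible set and the use of upper case hex digits are assumptions made for illustration, and the choice between "\ufffd" and '?' as a marker is not reproduced. It relies on String.getBytes(Charset), which substitutes the charset's replacement bytes for characters the charset cannot represent.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PercentEncodeSketch {
    // Hypothetical admissible set, written as the inside of a regex character class.
    private static final String ADMISSIBLE = "a-zA-Z0-9._-";
    // Matches one or more characters that are not admissible.
    private static final Pattern TO_ENCODE = Pattern.compile("[^" + ADMISSIBLE + "]+");
    private static final String HEX = "0123456789ABCDEF";

    static String encode(String text, Charset charset) {
        if (text == null) {
            return null;
        }
        StringBuilder out = new StringBuilder();
        Matcher m = TO_ENCODE.matcher(text);
        int last = 0;
        while (m.find()) {
            // Admissible characters are copied unchanged.
            out.append(text, last, m.start());
            // getBytes replaces unmappable characters with the charset's replacement bytes.
            for (byte b : m.group().getBytes(charset)) {
                out.append('%').append(HEX.charAt((b >> 4) & 0xF)).append(HEX.charAt(b & 0xF));
            }
            last = m.end();
        }
        out.append(text, last, text.length());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("a b", StandardCharsets.UTF_8));         // prints a%20b
        System.out.println(encode("\u20ac", StandardCharsets.ISO_8859_1)); // prints %3F
    }
}
```

The second call shows the "encoded '?'" case: the euro sign has no ISO-8859-1 representation, so the replacement byte 0x3F is percent encoded.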
-
encodeValidated
@Nullable public String encodeValidated(@Nullable String encoded) throws IllegalArgumentException
Encodes all characters which are not admissible as percent-encodings with respect to the given charset. If characters are not in the charset, an IllegalArgumentException is thrown.
- Throws:
IllegalArgumentException - if a character cannot be encoded
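One way to detect characters that are not in the charset is a CharsetEncoder configured to report errors instead of replacing them. This is a hedged sketch of that JDK technique only; the class StrictEncodeSketch and the helper toBytesOrThrow are hypothetical names and are not part of UrlCodec.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class StrictEncodeSketch {
    /** Converts text to bytes, refusing characters the charset cannot represent. */
    static byte[] toBytesOrThrow(String text, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            ByteBuffer buffer = encoder.encode(CharBuffer.wrap(text));
            byte[] bytes = new byte[buffer.remaining()];
            buffer.get(bytes);
            return bytes;
        } catch (CharacterCodingException e) {
            // Translate the checked coding error into the exception type named in the text.
            throw new IllegalArgumentException("Cannot encode in " + charset + ": " + text, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toBytesOrThrow("\u00e4", StandardCharsets.ISO_8859_1).length); // prints 1
        try {
            toBytesOrThrow("\u20ac", StandardCharsets.ISO_8859_1);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The resulting bytes would then be percent encoded exactly as in the lenient case; only the error handling differs.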
-
decode
@Nullable public String decode(@Nullable String encoded)
Decodes percent encoded characters in the string, never throwing exceptions: if an undecodable character is encountered it's replaced with the replacement character "\ufffd". The only exception we make here is that a % sign without a hexadecimal number is passed through unchanged, so that this can be used to preventively decode strings that might be encoded. This is not 100% safe, though, since there might be something looking like a % encoded character: e.g. "an%effect" will be decoded to "an�fect".
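The lenient behaviour described above can be sketched with the JDK alone. This is an illustration under assumptions, not the UrlCodec code: the class name LenientPercentDecodeSketch is hypothetical and UTF-8 is fixed as the charset. Only runs of percent encoded bytes are decoded; everything else, including a % that is not followed by two hex digits, is copied through.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LenientPercentDecodeSketch {
    // One or more consecutive percent encoded bytes, e.g. "%C3%A4".
    private static final Pattern ENCODED_RUN = Pattern.compile("(?:%[0-9a-fA-F]{2})+");

    static String decode(String encoded) {
        if (encoded == null) {
            return null;
        }
        StringBuilder out = new StringBuilder();
        Matcher m = ENCODED_RUN.matcher(encoded);
        int last = 0;
        while (m.find()) {
            // Text between encoded runs is left untouched, non-ASCII characters included.
            out.append(encoded, last, m.start());
            String run = m.group();
            byte[] bytes = new byte[run.length() / 3];
            for (int i = 0; i < bytes.length; i++) {
                bytes[i] = (byte) Integer.parseInt(run.substring(3 * i + 1, 3 * i + 3), 16);
            }
            // new String(byte[], Charset) replaces malformed input with U+FFFD for UTF-8.
            out.append(new String(bytes, StandardCharsets.UTF_8));
            last = m.end();
        }
        out.append(encoded, last, encoded.length());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("\u00e4%C3%A4").equals("\u00e4\u00e4")); // prints true
        System.out.println(decode("an%effect").equals("an\ufffdfect"));    // prints true
        System.out.println(decode("100% sure"));                           // prints 100% sure
    }
}
```

The second call reproduces the caveat from the text: "%ef" looks like an encoded byte, but 0xEF alone is not valid UTF-8, so it becomes the replacement character.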
-
encode
@Nullable protected String encode(@Nullable String encoded, boolean doThrow)
-
encodePostprocess
protected void encodePostprocess(StringBuffer out)
Hook for finalizing encoding
-
writePercentEncoded
protected void writePercentEncoded(ByteBuffer bytes, StringBuffer out)
-
getInvalidCharacterMarkerForEncoding
protected String getInvalidCharacterMarkerForEncoding()
-
decodeValidated
@Nullable public String decodeValidated(@Nullable String encoded) throws IllegalArgumentException
Decodes percent encoded characters in the string but throws an IllegalArgumentException if the input string is invalid, that is, if it contains an unencoded quoting character % (recognizable because it is not followed by a 2 digit hexadecimal number) or if a percent encoded sequence does not encode a character in the charset.
- Throws:
IllegalArgumentException - if encoded is not a validly encoded String
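The two failure conditions named above can be checked as follows. This is a sketch under assumptions rather than the library's code: StrictPercentDecodeSketch and decodeStrict are hypothetical names, and the error messages are invented. A negative lookahead finds a % that is not followed by two hex digits, and a CharsetDecoder set to REPORT rejects byte sequences that are not valid in the charset.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class StrictPercentDecodeSketch {
    private static final Pattern ENCODED_RUN = Pattern.compile("(?:%[0-9a-fA-F]{2})+");
    // A percent sign that is not followed by two hexadecimal digits.
    private static final Pattern INVALID_PERCENT = Pattern.compile("%(?![0-9a-fA-F]{2})");

    static String decodeStrict(String encoded, Charset charset) {
        if (encoded == null) {
            return null;
        }
        if (INVALID_PERCENT.matcher(encoded).find()) {
            throw new IllegalArgumentException("Unencoded % in: " + encoded);
        }
        StringBuilder out = new StringBuilder();
        Matcher m = ENCODED_RUN.matcher(encoded);
        int last = 0;
        while (m.find()) {
            out.append(encoded, last, m.start());
            String run = m.group();
            byte[] bytes = new byte[run.length() / 3];
            for (int i = 0; i < bytes.length; i++) {
                bytes[i] = (byte) Integer.parseInt(run.substring(3 * i + 1, 3 * i + 3), 16);
            }
            try {
                // REPORT makes the decoder throw instead of inserting a replacement character.
                out.append(charset.newDecoder()
                        .onMalformedInput(CodingErrorAction.REPORT)
                        .onUnmappableCharacter(CodingErrorAction.REPORT)
                        .decode(ByteBuffer.wrap(bytes)));
            } catch (CharacterCodingException e) {
                throw new IllegalArgumentException("Undecodable bytes " + run + " in: " + encoded, e);
            }
            last = m.end();
        }
        out.append(encoded, last, encoded.length());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decodeStrict("a%20b", StandardCharsets.UTF_8)); // prints a b
        try {
            decodeStrict("an%effect", StandardCharsets.UTF_8);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```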
-
decode
@Nullable protected String decode(@Nullable String encoded, boolean doThrow) throws IllegalArgumentException
- Throws:
IllegalArgumentException
-
decodePreprocess
protected String decodePreprocess(String encoded)
Hook to preprocess something about to be decoded.
-
checkResult
protected void checkResult(@NotNull String encoded, boolean doThrow, CharBuffer out, CoderResult result) throws IllegalArgumentException
- Throws:
IllegalArgumentException
-
unhex
protected byte unhex(char c)
-
isValid
public boolean isValid(@Nullable String encoded)
Verifies that the given String is encoded: all characters are admissible and % is always followed by a hexadecimal number.
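A check of this kind can be written as a single regular expression that accepts any sequence of admissible characters and percent encodings. The sketch below is an illustration, not the actual validationRegex: the class name, the admissible set and the treatment of null as invalid are all assumptions.

```java
import java.util.regex.Pattern;

final class EncodedStringValidationSketch {
    // Hypothetical admissible set, written as the inside of a regex character class.
    private static final String ADMISSIBLE = "a-zA-Z0-9._-";
    // Any number of admissible characters or two-digit percent encodings.
    private static final Pattern VALID =
            Pattern.compile("(?:[" + ADMISSIBLE + "]|%[0-9a-fA-F]{2})*");

    static boolean isValid(String encoded) {
        // Assumption of this sketch: null counts as invalid.
        return encoded != null && VALID.matcher(encoded).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("a%20b")); // prints true
        System.out.println(isValid("a b"));   // prints false
        System.out.println(isValid("100%"));  // prints false
    }
}
```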
-
-
-