Package com.composum.sling.core.util
Class UrlCodec
java.lang.Object
  com.composum.sling.core.util.UrlCodec
public class UrlCodec extends Object
Codecs for the various URL parts. Unlike URLCodec, this is focused on Strings, and thus the decoder can leave unknown characters untouched: "ä%C3%A4" is decoded to "ää" instead of "?ä" as URLCodec.decode(String) would do.
-
Field Summary
Modifier and Type | Field | Description
protected String | admissibleCharacters |
static UrlCodec | AUTHORITY | Codec for the authority of an URL.
protected Charset | charset |
protected Pattern | charsToEncodeRegex | Matches one or more characters not in the admissibleCharacters.
static UrlCodec | FRAGMENT | Codec for the fragment part of an URL.
protected static String | HEXDIGITS |
protected static String | INVALID_CHARACTER_MARKER | "\ufffd" is inserted whenever something could not be decoded, or sometimes when it's encoded; see encode(String).
protected String | invalidCharacterMarkerForEncoding |
static UrlCodec | OPAQUE | Codec for opaque URLs that are not parsed.
protected static String | PART_URL_SAFECHARS | The characters which can always appear in any URL without being encoded: the "unreserved" chars. Unfortunately there are different recommendations about encoding $!*'(), so we exclude them.
protected static Pattern | PAT_ENCODED_CHARACTERS | Matches one or several percent encoded bytes.
protected static Pattern | PAT_INVALID_ENCODED_CHARACTER | Matches a percent sign followed by something that's not a hexadecimally encoded byte.
static UrlCodec | PATH | Codec for the path part of an URL.
static UrlCodec | QUERYPART | Codec for the query part of an URL.
static UrlCodec | URLSAFE | Codec quoting everything other than the chars which are safe in every part of the URL.
protected Pattern | validationRegex | Matches an arbitrarily long sequence of admissible chars and percent encodings.
-
Method Summary
Modifier and Type | Method | Description
protected String | charsToEncode(String admissibleCharacters) | Hook to calculate the set of characters to encode from the admissibleCharacters.
protected void | checkResult(@NotNull String encoded, boolean doThrow, CharBuffer out, CoderResult result) |
@Nullable String | decode(@Nullable String encoded) | Decodes percent-encoded characters in the string, never throwing exceptions: if an undecodable character is encountered, it is replaced with the replacement character "\ufffd".
protected @Nullable String | decode(@Nullable String encoded, boolean doThrow) |
protected String | decodePreprocess(String encoded) | Hook to preprocess something about to be decoded.
@Nullable String | decodeValidated(@Nullable String encoded) | Decodes percent-encoded characters in the string but throws an IllegalArgumentException if the input string is invalid: if it contains an unencoded quoting character %, recognizable because it is not followed by a 2-digit hexadecimal number, or if it does not encode a character in the charset.
@Nullable String | encode(@Nullable String encoded) | Encodes all characters which are not admissible to percent-encodings wrt. the given charset.
protected @Nullable String | encode(@Nullable String encoded, boolean doThrow) |
protected void | encodePostprocess(StringBuffer out) | Hook for finalizing encoding.
@Nullable String | encodeValidated(@Nullable String encoded) | Encodes all characters which are not admissible to percent-encodings wrt. the given charset.
protected String | getInvalidCharacterMarkerForEncoding() |
boolean | isValid(@Nullable String encoded) | Verifies that the given String is encoded: all characters are admissible and % is always followed by a hexadecimal number.
String | toString() |
protected byte | unhex(char c) |
protected void | writePercentEncoded(ByteBuffer bytes, StringBuffer out) |
-
Field Detail
-
PART_URL_SAFECHARS
protected static final String PART_URL_SAFECHARS
The characters which can always appear in any URL without being encoded: the "unreserved" chars. Unfortunately there are different recommendations about encoding $!*'(), so we exclude them. Possibly we could include the "extra" chars !*'(), . We exclude ~ since it was declared unsafe in
- See Also:
- Constant Field Values
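As a sketch only: if the constant is used as a regex character class (as the constructor's admissibleCharacters parameter is), the description corresponds to the RFC 3986 "unreserved" set (ALPHA, DIGIT, "-", ".", "_", "~") with ~ removed. The class UrlSafeCharsSketch and its SAFE_CHARS value are hypothetical reconstructions from the text above, not the library's actual constant value (which is listed under Constant Field Values).

```java
import java.util.regex.Pattern;

/** Hypothetical reconstruction of the "always URL-safe" character set described above. */
final class UrlSafeCharsSketch {
    // RFC 3986 unreserved chars minus '~'. The '-' is placed last so that the
    // character class reads it literally instead of as a range operator.
    static final String SAFE_CHARS = "a-zA-Z0-9._-";

    private static final Pattern ONLY_SAFE = Pattern.compile("[" + SAFE_CHARS + "]*");

    /** Returns true if every character of s may appear unencoded in any URL part. */
    static boolean isUrlSafe(String s) {
        return ONLY_SAFE.matcher(s).matches();
    }
}
```

With this set, "Abc-9_x.y" passes, while "~home", "a$b" and "a b" would all need encoding.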
-
URLSAFE
public static final UrlCodec URLSAFE
Codec quoting everything other than the chars which are safe in every part of the URL.
-
QUERYPART
public static final UrlCodec QUERYPART
Codec for the query part of an URL.
-
FRAGMENT
public static final UrlCodec FRAGMENT
Codec for the fragment part of an URL.
-
OPAQUE
public static final UrlCodec OPAQUE
Codec for opaque URLs that are not parsed. Contains all unreserved, reserved and extra characters.
-
PAT_ENCODED_CHARACTERS
protected static final Pattern PAT_ENCODED_CHARACTERS
Matches one or several percent encoded bytes.
-
PAT_INVALID_ENCODED_CHARACTER
protected static final Pattern PAT_INVALID_ENCODED_CHARACTER
Matches a percent sign followed by something that's not a hexadecimally encoded byte.
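The exact regular expressions behind these two constants are not shown in this documentation; the following sketch merely assumes patterns that match the descriptions. The first collapses a run of %XX triples (a multi-byte UTF-8 character spans several) into a single match, and the second uses a negative lookahead to find a % that does not start a hex pair. PercentPatternsSketch and its members are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PercentPatternsSketch {
    // One or several consecutive percent-encoded bytes, e.g. "%C3%A4" as one match.
    static final Pattern ENCODED_CHARACTERS = Pattern.compile("(?:%[0-9a-fA-F]{2})+");

    // A '%' that is not followed by two hexadecimal digits.
    static final Pattern INVALID_ENCODED_CHARACTER = Pattern.compile("%(?![0-9a-fA-F]{2})");

    /** Returns the first run of percent-encoded bytes in s, or null if there is none. */
    static String firstEncodedRun(String s) {
        Matcher m = ENCODED_CHARACTERS.matcher(s);
        return m.find() ? m.group() : null;
    }

    /** Returns true if s contains a stray '%' that does not start a %XX triple. */
    static boolean hasInvalidPercent(String s) {
        return INVALID_ENCODED_CHARACTER.matcher(s).find();
    }
}
```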
-
INVALID_CHARACTER_MARKER
protected static final String INVALID_CHARACTER_MARKER
"\ufffd" is inserted whenever something could not be decoded, or sometimes when it's encoded; see encode(String).
- See Also:
- Constant Field Values
-
HEXDIGITS
protected static final String HEXDIGITS
- See Also:
- Constant Field Values
-
charset
protected final Charset charset
-
admissibleCharacters
protected final String admissibleCharacters
-
charsToEncodeRegex
protected final Pattern charsToEncodeRegex
Matches one or more characters not in the admissibleCharacters.
-
validationRegex
protected final Pattern validationRegex
Matches an arbitrarily long sequence of admissible chars and percent encodings.
-
invalidCharacterMarkerForEncoding
protected transient String invalidCharacterMarkerForEncoding
-
Constructor Detail
-
UrlCodec
public UrlCodec(@NotNull String admissibleCharacters, @NotNull Charset charset) throws IllegalArgumentException, PatternSyntaxException
Initializes the codec with a range of admissible characters.
- Parameters:
admissibleCharacters - all characters that remain untouched when encoding; can contain ranges like a-z as in simple regex character classes. (Thus, - has to be the first or last character if it needs to be included.) Obviously, the quoting character '%' always has to be admissible.
charset - the charset needed for the decoder.
- Throws:
IllegalArgumentException - if the admissibleCharacters don't contain '%'
PatternSyntaxException - if the admissibleCharacters are not a well-formed character class
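A minimal sketch of how a constructor along these lines could turn admissibleCharacters into the "characters to encode" pattern, assuming the string is embedded verbatim into a regex character class, as the parameter description suggests. AdmissibleCharsSketch is a hypothetical name, not part of UrlCodec.

```java
import java.util.regex.Pattern;

final class AdmissibleCharsSketch {
    final Pattern charsToEncode; // one or more characters outside the admissible set

    AdmissibleCharsSketch(String admissibleCharacters) {
        if (admissibleCharacters.indexOf('%') < 0) {
            throw new IllegalArgumentException("The quoting character '%' must be admissible");
        }
        // Negating the class yields runs of characters that must be encoded.
        // Pattern.compile throws PatternSyntaxException for a malformed class body.
        charsToEncode = Pattern.compile("[^" + admissibleCharacters + "]+");
    }
}
```

Because the string lands inside [^...], a - in the middle would be read as a range, which is why the documentation asks for it to be first or last. Note that PatternSyntaxException is itself a subclass of IllegalArgumentException.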
-
Method Detail
-
charsToEncode
protected String charsToEncode(String admissibleCharacters)
Hook to calculate the set of characters to encode from the admissibleCharacters.
-
encode
public @Nullable String encode(@Nullable String encoded)
Encodes all characters which are not admissible to percent-encodings wrt. the given charset. If characters are not in the charset, they will silently be encoded as a replacement character, which is either "\ufffd" or '?' if one of these is admissible, or the encoding of "\ufffd" for the charset (which might be an encoded '?').
-
encodeValidated
public @Nullable String encodeValidated(@Nullable String encoded) throws IllegalArgumentException
Encodes all characters which are not admissible to percent-encodings wrt. the given charset. If characters are not in the charset, we will throw an IllegalArgumentException.
- Throws:
IllegalArgumentException - if a character cannot be encoded
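The following is a simplified sketch of the percent-encoding described for encode and encodeValidated, not UrlCodec's actual implementation. It assumes admissibleCharacters is a regex character class, encodes each run of non-admissible characters with the charset, and writes every byte as an uppercase %XX triple. With doThrow set it rejects unencodable input like encodeValidated; otherwise it falls back to the encoder's default replacement bytes, which is simpler than the "\ufffd"/'?' selection described for encode. PercentEncoderSketch is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PercentEncoderSketch {
    private final Pattern charsToEncode;
    private final Charset charset;

    PercentEncoderSketch(String admissibleCharacters, Charset charset) {
        this.charsToEncode = Pattern.compile("[^" + admissibleCharacters + "]+");
        this.charset = charset;
    }

    String encode(String s, boolean doThrow) {
        if (s == null) return null;
        // REPORT makes encode() throw on unencodable input; REPLACE substitutes replacement bytes.
        CodingErrorAction action = doThrow ? CodingErrorAction.REPORT : CodingErrorAction.REPLACE;
        CharsetEncoder encoder = charset.newEncoder()
                .onMalformedInput(action)
                .onUnmappableCharacter(action);
        StringBuilder out = new StringBuilder();
        Matcher m = charsToEncode.matcher(s);
        int last = 0;
        while (m.find()) {
            out.append(s, last, m.start()); // admissible characters are copied unchanged
            ByteBuffer bytes;
            try {
                bytes = encoder.encode(CharBuffer.wrap(m.group()));
            } catch (CharacterCodingException e) {
                throw new IllegalArgumentException("Cannot encode: " + m.group(), e);
            }
            while (bytes.hasRemaining()) {
                out.append(String.format("%%%02X", bytes.get() & 0xFF));
            }
            last = m.end();
        }
        return out.append(s, last, s.length()).toString();
    }
}
```

The sketch uses StringBuilder rather than the StringBuffer seen in the protected hook signatures, since no synchronization is needed for a local buffer. Since '%' itself is admissible, a literal % in the input passes through unencoded.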
-
decode
public @Nullable String decode(@Nullable String encoded)
Decodes percent-encoded characters in the string, never throwing exceptions: if an undecodable character is encountered, it is replaced with the replacement character "\ufffd". The only exception we make here is that a % sign without a hexadecimal number is passed through unchanged, so that this can be used to preventively decode strings that might be encoded. This is not 100% safe, though, since there might be something that merely looks like a % encoded character: e.g. "an%effect" will be decoded to "an�fect".
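A minimal sketch of this lenient decoding, under the assumption that it works on runs of %XX triples and leaves all other characters alone; it is not UrlCodec's actual code. It relies on new String(byte[], Charset), which always replaces malformed input with the charset's default replacement ("\ufffd" for decoding), so a % without two hex digits is simply never matched and survives. LenientDecoderSketch is a hypothetical name.

```java
import java.nio.charset.Charset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LenientDecoderSketch {
    // Runs of %XX triples; a '%' not followed by two hex digits never matches and is kept.
    private static final Pattern ENCODED = Pattern.compile("(?:%[0-9a-fA-F]{2})+");

    static String decode(String encoded, Charset charset) {
        if (encoded == null) return null;
        StringBuilder out = new StringBuilder();
        Matcher m = ENCODED.matcher(encoded);
        int last = 0;
        while (m.find()) {
            out.append(encoded, last, m.start()); // unencoded chars such as 'ä' pass through
            String run = m.group();
            byte[] bytes = new byte[run.length() / 3];
            for (int i = 0; i < bytes.length; i++) {
                bytes[i] = (byte) Integer.parseInt(run.substring(3 * i + 1, 3 * i + 3), 16);
            }
            // Undecodable byte sequences become "\ufffd" instead of raising an exception.
            out.append(new String(bytes, charset));
            last = m.end();
        }
        return out.append(encoded, last, encoded.length()).toString();
    }
}
```

With UTF-8, "ä%C3%A4" yields "ää", "100%" stays unchanged, and "an%effect" shows the false positive mentioned above: %ef is read as a lone 0xEF byte and becomes "\ufffd".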
-
encode
protected @Nullable String encode(@Nullable String encoded, boolean doThrow)
-
encodePostprocess
protected void encodePostprocess(StringBuffer out)
Hook for finalizing encoding.
-
writePercentEncoded
protected void writePercentEncoded(ByteBuffer bytes, StringBuffer out)
-
getInvalidCharacterMarkerForEncoding
protected String getInvalidCharacterMarkerForEncoding()
-
decodeValidated
public @Nullable String decodeValidated(@Nullable String encoded) throws IllegalArgumentException
Decodes percent-encoded characters in the string but throws an IllegalArgumentException if the input string is invalid: if it contains an unencoded quoting character %, recognizable because it is not followed by a 2-digit hexadecimal number, or if it does not encode a character in the charset.
- Throws:
IllegalArgumentException - if encoded is not a validly encoded String
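For the validating variant, a sketch under similar assumptions could reject a stray % up front and decode each run of %XX triples with a CharsetDecoder set to CodingErrorAction.REPORT, converting the resulting CharacterCodingException into an IllegalArgumentException. StrictDecoderSketch is a hypothetical name; UrlCodec's own decode(String, boolean) and checkResult hooks may be organized differently.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class StrictDecoderSketch {
    private static final Pattern INVALID = Pattern.compile("%(?![0-9a-fA-F]{2})");
    private static final Pattern ENCODED = Pattern.compile("(?:%[0-9a-fA-F]{2})+");

    static String decodeValidated(String encoded, Charset charset) {
        if (encoded == null) return null;
        if (INVALID.matcher(encoded).find()) {
            throw new IllegalArgumentException("Unencoded '%' in: " + encoded);
        }
        // REPORT: malformed or unmappable bytes raise CharacterCodingException.
        CharsetDecoder decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        StringBuilder out = new StringBuilder();
        Matcher m = ENCODED.matcher(encoded);
        int last = 0;
        while (m.find()) {
            out.append(encoded, last, m.start());
            String run = m.group();
            byte[] bytes = new byte[run.length() / 3];
            for (int i = 0; i < bytes.length; i++) {
                bytes[i] = (byte) Integer.parseInt(run.substring(3 * i + 1, 3 * i + 3), 16);
            }
            try {
                out.append(decoder.decode(ByteBuffer.wrap(bytes)));
            } catch (CharacterCodingException e) {
                throw new IllegalArgumentException("Not a valid " + charset + " sequence: " + run, e);
            }
            last = m.end();
        }
        return out.append(encoded, last, encoded.length()).toString();
    }
}
```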
-
decode
protected @Nullable String decode(@Nullable String encoded, boolean doThrow) throws IllegalArgumentException
- Throws:
IllegalArgumentException
-
decodePreprocess
protected String decodePreprocess(String encoded)
Hook to preprocess something about to be decoded.
-
checkResult
protected void checkResult(@NotNull String encoded, boolean doThrow, CharBuffer out, CoderResult result) throws IllegalArgumentException
- Throws:
IllegalArgumentException
-
unhex
protected byte unhex(char c)
-
isValid
public boolean isValid(@Nullable String encoded)
Verifies that the given String is encoded: all characters are admissible and % is always followed by a hexadecimal number.
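A sketch of this check as a single regular-expression match, assuming a validation pattern in the spirit of the validationRegex field: every position is either an admissible character other than %, or a complete %XX triple. It checks only the two conditions named in the description and, for simplicity, assumes a non-null argument. ValiditySketch is a hypothetical name.

```java
import java.util.regex.Pattern;

final class ValiditySketch {
    private final Pattern validation;

    ValiditySketch(String admissibleCharacters) {
        // Either an admissible char that is not '%', or '%' followed by two hex digits; repeated.
        validation = Pattern.compile(
                "(?:(?!%)[" + admissibleCharacters + "]|%[0-9a-fA-F]{2})*");
    }

    /** Assumes a non-null argument. */
    boolean isValid(String encoded) {
        return validation.matcher(encoded).matches();
    }
}
```

The lookahead (?!%) keeps a bare % from being accepted merely because it is admissible for encoding purposes.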