Class UrlCodec

  • public class UrlCodec
    extends Object
    Codecs for the various URL parts. Unlike URLCodec, this class is focused on Strings, so the decoder can leave unknown characters untouched: "ä%C3%A4" is decoded to "ää" instead of "?ä", as URLCodec.decode(String) would do.
    • Field Detail

      • PART_URL_SAFECHARS

        protected static final String PART_URL_SAFECHARS
        The characters which can always appear in any URL without being encoded: the "unreserved" chars. Unfortunately there are different recommendations about encoding $!*'(), so we exclude them. Possibly we could include the "extra" chars !*'(), . We exclude ~ since it was declared unsafe in
        See Also:
        Constant Field Values
      • URLSAFE

        public static final UrlCodec URLSAFE
        Codec quoting everything other than the chars which are safe in every part of the URL.
      • AUTHORITY

        public static final UrlCodec AUTHORITY
        Codec for the authority of a URL.
      • OPAQUE

        public static final UrlCodec OPAQUE
        Codec for opaque URLs that are not parsed. Contains all unreserved, reserved and extra characters.
      • PAT_ENCODED_CHARACTERS

        protected static final Pattern PAT_ENCODED_CHARACTERS
        Matches one or several percent encoded bytes.
      • PAT_INVALID_ENCODED_CHARACTER

        protected static final Pattern PAT_INVALID_ENCODED_CHARACTER
        Matches a percent sign followed by something that's not a hexadecimally encoded byte.
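
        The regular expressions behind these two constants are not shown here. As a minimal sketch, assuming a percent-encoded byte is a % followed by exactly two hexadecimal digits, the two patterns could look as follows. The class name PercentPatternSketch and its members are hypothetical and not part of UrlCodec.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PercentPatternSketch {
    // Assumed regexes; the real PAT_* constants may be defined differently.
    static final Pattern ENCODED_RUN = Pattern.compile("(?:%[0-9a-fA-F]{2})+");
    static final Pattern INVALID_PERCENT = Pattern.compile("%(?![0-9a-fA-F]{2})");

    /** Returns the first run of percent-encoded bytes in the input, or null if there is none. */
    static String firstEncodedRun(String s) {
        Matcher m = ENCODED_RUN.matcher(s);
        return m.find() ? m.group() : null;
    }

    /** Returns true if some % is not followed by two hexadecimal digits. */
    static boolean hasInvalidPercent(String s) {
        return INVALID_PERCENT.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(firstEncodedRun("a%C3%A4b"));  // %C3%A4
        System.out.println(hasInvalidPercent("100%"));    // true
        System.out.println(hasInvalidPercent("a%20b"));   // false
    }
}
```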

      • INVALID_CHARACTER_MARKER

        protected static final String INVALID_CHARACTER_MARKER
        "\ufffd" is inserted whenever something could not be decoded, or sometimes when it's encoded - see encode(String).
        See Also:
        Constant Field Values
      • charset

        protected final Charset charset
      • admissibleCharacters

        protected final String admissibleCharacters
      • validationRegex

        protected final Pattern validationRegex
        Matches an arbitrarily long sequence of admissible chars and percent encodings.
      • invalidCharacterMarkerForEncoding

        protected transient String invalidCharacterMarkerForEncoding
    • Constructor Detail

      • UrlCodec

        public UrlCodec(@NotNull String admissibleCharacters,
                        @NotNull Charset charset)
                 throws IllegalArgumentException,
                        PatternSyntaxException
        Initializes the Codec with a range of admissible characters.
        admissibleCharacters - all characters that remain untouched when encoding; can contain ranges like a-z, as in simple regex character classes. (Thus, - has to be the first or last character if it needs to be included.) Obviously, the quoting character '%' always has to be admissible.
        charset - the charset needed for the decoder.
        IllegalArgumentException - if the admissibleCharacters don't contain '%'
        PatternSyntaxException - if the admissibleCharacters are not a well-formed character class
    • Method Detail

      • charsToEncode

        protected String charsToEncode(String admissibleCharacters)
        Hook to calculate the set of characters to encode from the admissibleCharacters.
      • encode

        public @Nullable String encode(@Nullable String encoded)
        Encodes all characters which are not admissible as percent-encodings with respect to the given charset. Characters that are not in the charset are silently encoded as a replacement character: either "\ufffd" or '?' if one of these is admissible, or otherwise the encoding of "\ufffd" for the charset (which might be an encoded '?').
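
        The general technique can be illustrated with a simplified, standalone sketch. It is not the UrlCodec implementation: the class PercentEncodeSketch is hypothetical, the admissible characters are given as a plain list rather than a character class with ranges, and unmappable characters simply become whatever replacement bytes String.getBytes produces for the charset instead of the marker logic described above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class PercentEncodeSketch {
    /**
     * Percent-encodes every code point that does not occur in admissible.
     * Simplification: admissible is a plain list of characters, not a regex character class.
     */
    static String encode(String input, String admissible, Charset charset) {
        if (input == null) {
            return null;
        }
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            int codePoint = input.codePointAt(i);
            int length = Character.charCount(codePoint);
            if (admissible.indexOf(codePoint) >= 0) {
                out.appendCodePoint(codePoint);
            } else {
                // getBytes substitutes the charset's replacement bytes for unmappable characters.
                for (byte b : input.substring(i, i + length).getBytes(charset)) {
                    out.append('%').append(String.format("%02X", b & 0xFF));
                }
            }
            i += length;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String safe = "abcdefghijklmnopqrstuvwxyz/";
        System.out.println(encode("a b/\u00e4", safe, StandardCharsets.UTF_8));   // a%20b/%C3%A4
        System.out.println(encode("\u20ac", safe, StandardCharsets.ISO_8859_1));  // %3F
    }
}
```

        In the second call the euro sign does not exist in ISO-8859-1, so it ends up as an encoded '?'.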
      • decode

        public @Nullable String decode(@Nullable String encoded)
        Decodes percent-encoded characters in the string, never throwing exceptions: if an undecodable character is encountered, it is replaced with the replacement character "\ufffd". The only exception we make is that a % sign not followed by a hexadecimal number is passed through unchanged, so that this method can be used to preventively decode strings that might be encoded. That is not 100% safe, though, since the string might contain something that merely looks like a percent-encoded character: e.g. "an%effect" will be decoded to "an\ufffdfect".
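
        The lenient behaviour described here can be sketched independently of UrlCodec. The following is an illustration under assumptions, not the actual implementation: LenientDecodeSketch and its helpers are hypothetical names, and undecodable byte sequences are replaced by relying on new String(bytes, charset), which yields "\ufffd" for malformed UTF-8 input.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class LenientDecodeSketch {
    /** Decodes %XX runs with the charset; everything else, including a stray %, is copied unchanged. */
    static String decode(String encoded, Charset charset) {
        if (encoded == null) {
            return null;
        }
        StringBuilder out = new StringBuilder();
        ByteArrayOutputStream run = new ByteArrayOutputStream();
        int i = 0;
        while (i < encoded.length()) {
            if (isEncodedByte(encoded, i)) {
                run.write(hex(encoded.charAt(i + 1)) * 16 + hex(encoded.charAt(i + 2)));
                i += 3;
            } else {
                flush(run, out, charset);
                out.append(encoded.charAt(i));
                i++;
            }
        }
        flush(run, out, charset);
        return out.toString();
    }

    private static boolean isEncodedByte(String s, int i) {
        return s.charAt(i) == '%' && i + 2 < s.length()
                && hex(s.charAt(i + 1)) >= 0 && hex(s.charAt(i + 2)) >= 0;
    }

    /** Returns the value of an ASCII hex digit, or -1 if c is not one. */
    private static int hex(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    /** Decodes the collected bytes; malformed input turns into the charset's replacement character. */
    private static void flush(ByteArrayOutputStream run, StringBuilder out, Charset charset) {
        if (run.size() > 0) {
            out.append(new String(run.toByteArray(), charset));
            run.reset();
        }
    }

    public static void main(String[] args) {
        System.out.println(decode("100%", StandardCharsets.UTF_8));  // 100%
        System.out.println("an\ufffdfect".equals(decode("an%effect", StandardCharsets.UTF_8)));  // true
    }
}
```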
      • encode

        protected @Nullable String encode(@Nullable String encoded,
                                          boolean doThrow)
      • encodePostprocess

        protected void encodePostprocess(StringBuffer out)
        Hook for finalizing encoding.
      • getInvalidCharacterMarkerForEncoding

        protected String getInvalidCharacterMarkerForEncoding()
        To mark characters that could not be encoded properly, we use "\ufffd" or ? if one of these is admissible; otherwise we use the encoded "\ufffd" if it belongs to the charset, or the encoded ? if it does not.
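
        The selection order described above could be sketched as follows. This is an illustration only: EncodingMarkerSketch and markerFor are hypothetical names, the admissible characters are treated as a plain list, and "belongs to the charset" is interpreted as CharsetEncoder.canEncode returning true.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingMarkerSketch {
    /** Picks the marker for unencodable characters, following the order given in the description. */
    static String markerFor(String admissible, Charset charset) {
        if (admissible.indexOf('\ufffd') >= 0) {
            return "\ufffd";
        }
        if (admissible.indexOf('?') >= 0) {
            return "?";
        }
        // Neither marker may appear literally, so percent-encode whichever one the charset supports.
        String raw = charset.newEncoder().canEncode('\ufffd') ? "\ufffd" : "?";
        StringBuilder out = new StringBuilder();
        for (byte b : raw.getBytes(charset)) {
            out.append('%').append(String.format("%02X", b & 0xFF));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(markerFor("abc", StandardCharsets.UTF_8));       // %EF%BF%BD
        System.out.println(markerFor("abc", StandardCharsets.ISO_8859_1));  // %3F
        System.out.println(markerFor("abc?", StandardCharsets.UTF_8));      // ?
    }
}
```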
      • decodeValidated

        public @Nullable String decodeValidated(@Nullable String encoded)
                                         throws IllegalArgumentException
        Decodes percent-encoded characters in the string, but throws an IllegalArgumentException if the input string is invalid: that is, if it contains an unencoded quoting character % (recognizable because it is not followed by a two-digit hexadecimal number), or if a percent-encoded sequence does not encode a character in the charset.
        IllegalArgumentException - if encoded is not a validly encoded String
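
        A strict variant of percent-decoding can be sketched with a CharsetDecoder configured to report errors instead of replacing them. This is an assumption about one way to obtain the described behaviour, not the UrlCodec source: StrictDecodeSketch is a hypothetical class, and it checks only the two conditions named above, not whether the remaining characters are admissible.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class StrictDecodeSketch {
    /** Decodes %XX runs and throws IllegalArgumentException on a stray % or undecodable bytes. */
    static String decodeValidated(String encoded, Charset charset) {
        if (encoded == null) {
            return null;
        }
        StringBuilder out = new StringBuilder();
        ByteArrayOutputStream run = new ByteArrayOutputStream();
        int i = 0;
        while (i < encoded.length()) {
            char c = encoded.charAt(i);
            if (c == '%') {
                boolean complete = i + 2 < encoded.length();
                int hi = complete ? hex(encoded.charAt(i + 1)) : -1;
                int lo = complete ? hex(encoded.charAt(i + 2)) : -1;
                if (hi < 0 || lo < 0) {
                    throw new IllegalArgumentException("Unencoded % at index " + i);
                }
                run.write(hi * 16 + lo);
                i += 3;
            } else {
                flush(run, out, charset);
                out.append(c);
                i++;
            }
        }
        flush(run, out, charset);
        return out.toString();
    }

    /** Returns the value of an ASCII hex digit, or -1 if c is not one. */
    private static int hex(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    /** Decodes the collected bytes, reporting malformed or unmappable input as an exception. */
    private static void flush(ByteArrayOutputStream run, StringBuilder out, Charset charset) {
        if (run.size() == 0) {
            return;
        }
        try {
            out.append(charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(run.toByteArray())));
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("Bytes are not valid in " + charset, e);
        }
        run.reset();
    }

    public static void main(String[] args) {
        System.out.println(decodeValidated("a%20b", StandardCharsets.UTF_8));  // a b
        try {
            decodeValidated("an%effect", StandardCharsets.UTF_8);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected");  // rejected
        }
    }
}
```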
      • decodePreprocess

        protected String decodePreprocess(String encoded)
        Hook to preprocess something about to be decoded.
      • unhex

        protected byte unhex(char c)
      • isValid

        public boolean isValid(@Nullable String encoded)
        Verifies that the given String is validly encoded: all characters are admissible and every % is followed by a two-digit hexadecimal number.
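
        Such a check can be expressed as a single regular expression built from the admissible characters, in the spirit of the validationRegex field. The sketch below is an assumption about how that might look: ValidationSketch is a hypothetical class, the exact regex used by UrlCodec is not shown in this documentation, and treating null as invalid is a choice made only for this example.

```java
import java.util.regex.Pattern;

final class ValidationSketch {
    /** Builds a pattern for strings consisting only of admissible characters and %XX escapes. */
    static Pattern validationPattern(String admissibleCharacters) {
        // admissibleCharacters is inserted into a character class, so ranges like a-z work.
        return Pattern.compile("(?:[" + admissibleCharacters + "]|%[0-9a-fA-F]{2})*");
    }

    /** Assumption for this sketch: null counts as invalid. */
    static boolean isValid(String encoded, String admissibleCharacters) {
        return encoded != null && validationPattern(admissibleCharacters).matcher(encoded).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("a%20b", "a-z"));  // true
        System.out.println(isValid("a b", "a-z"));    // false
        System.out.println(isValid("a%2", "a-z"));    // false
        System.out.println(isValid("a-b", "a-z-"));   // true
    }
}
```

        The last call shows why - has to be the first or last character of the admissible characters when it should be matched literally.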