Class SecurityUtils
java.lang.Object
eu.righettod.SecurityUtils
Provides various utility methods that apply processing from a security perspective.
These code snippets:
- Can be used as a foundation for customizing the validation to the application context.
- Were implemented so that validations are easy to add or remove depending on the usage context.
- Were centralized in one class so that they can be improved over time and so that missing cases and bugs are easier to identify.
GitHub repository.
Source code of the class.
-
Method Summary
Modifier and Type, Method, Description
- static boolean applyJWTExtraValidation(com.auth0.jwt.interfaces.DecodedJWT token, TokenType expectedTokenType, List<String> revokedTokenJTIList)
- static String applyURLDecoding(String encodedData, int decodingRoundThreshold)
  Perform sequential URL decoding operations against URL-encoded data until the data is no longer URL encoded or the specified threshold is reached.
- static void clearPDFMetadata(org.apache.pdfbox.pdmodel.PDDocument document)
  Remove as much metadata as possible from the provided PDF document object.
- static byte[] computeHashNoProneToAbuseOnParts
  Compute a SHA256 hash from an input composed of a collection of strings.
  This method builds the source string in a way that prevents abuse targeting the different parts composing it.
- static UUID computeUUIDv7
  Compute a UUID version 7 without using any external dependency.
  Below is my personal point of view, and perhaps I'm totally wrong!
- static Map<String,Object> ensureSerializedObjectIntegrity(ProcessingModeType processingModeType, String input, byte[] secret)
  Provide a way to add an integrity marker (HMAC) to an object serialized using the Java native (binary) serialization system.
  The goal is to provide a temporary workaround to try to prevent deserialization attacks and give time to move to a text-based serialization approach.
- extractAllPDFLinks(String pdfFilePath)
  Extract all URL links from a provided PDF file.
  This can be used to validate a PDF against the links it contains.
- static Map<SensitiveInformationType, Set<String>> extractAllSensitiveInformation(String content)
  Extract all sensitive information from a provided string.
  This can be used to identify any sensitive information in a message expected to be written to a log and then replace every sensitive value with an obfuscated one.
  For the Luxembourg national identification number, this method focuses on detecting identifiers for a natural person (people) and not a legal entity (company).
  I delegated the validation of the IBAN to a dedicated library (iban4j) to avoid "reinventing the wheel" and introducing buggy validation myself.
- static String identifyMimeType(byte[] content)
  Identify the mime type of the content specified (array of bytes).
  Note that it cannot be fully trusted (see the referenced tweet '1595824709186519041'), so additional validations are required.
- static boolean isEmailAddress(String addr)
  Apply a collection of validations on a string expected to be an email address: Is a valid email address, from a parser perspective, following the RFCs on email addresses. Is not using "Encoded-word" format. Is not using comment format. Is not using "Punycode" format. Is not using UUCP style addresses. Is not using address literals. Is not using source routes. Is not using the "percent hack".
  This is based on the research work of Gareth Heyes (PortSwigger) listed in the references.
  Note: The notion of "valid" here is to be understood from the perspective of secure usage of the data.
- static boolean isExcelCSVSafe(String csvFilePath)
  Apply a collection of validations on a provided Excel CSV file (a file expected to be opened in Microsoft Excel): Real CSV file. Does not contain any payload related to CSV injection. If Apache Commons CSV does not find any record, the file is considered NOT safe (prevents potential bypasses).
  Note: The record delimiter used is the , (comma) character.
- static boolean isGZIPCompressedDataSafe(byte[] compressedBytes, long maxCountOfDecompressedBytesAllowed)
  Apply a collection of validations on a provided byte array representing GZIP compressed data: Is valid GZIP compressed data. The number of bytes once decompressed is under the specified limit.
  Note: The value Integer.MAX_VALUE - 8 was chosen because, during my tests on Java 25 (64-bit JDK on Windows 11 Pro), it was possible to decompress that amount of data with the default JVM settings without causing an Out Of Memory error.
- static boolean isImageSafe(String imageFilePath, List<String> imageAllowedMimeTypes)
  Apply a collection of validations on a provided image file: Real image file. Its mime type is in the list of allowed mime types. Its metadata fields do not contain any characters related to malicious payloads.
  Important note: This implementation is prone to bypass using the "raw insertion" method documented in the blog post from the Synacktiv team.
- static boolean isJSONSafe(String json, int maxItemsByArraysCount, int maxDeepnessAllowed)
  Apply a collection of validations on a provided JSON string: Real JSON structure. Nesting depth of objects or arrays is below the specified limit. Number of items in any array is below the specified limit.
  Note: I decided to use a parsing approach based only on string processing to prevent any StackOverflow or OutOfMemory error that could be abused.
  I used the following assumptions: The character { identifies the beginning of an object. The character } identifies the end of an object. The character [ identifies the beginning of an array. The character ] identifies the end of an array. The character " identifies the delimiter of a string. The character sequence \" identifies the escaping of a double quote.
- static boolean isPathSafe(String path)
  Apply a collection of validations on a string expected to be a system file/folder path: Does not contain any path traversal payload. The canonical path is equal to the absolute path.
- static boolean isPDFSafe
  Apply a collection of validations on a provided PDF file: Real PDF file. No attachments. No JavaScript code. No links using an action of type URI/Launch/RemoteGoTo/ImportData. No XFA forms, in order to prevent exposure to XXE/SSRF like CVE-2025-54988.
- static boolean isPSD2StetSafeCertificateURL(String certificateUrl)
  The PSD2 STET specification requires the use of HTTP Signature.
- static boolean isPublicIPAddress
  Apply a collection of validations on a string expected to be a public IP address: Is a valid IPv4 or IPv6 address. Is public from an Internet perspective.
  Note: I often see this validation missing on values read from HTTP request headers like "X-Forwarded-For" or "Forwarded".
- static boolean isRegexSafe(String regex, String data, Optional<Integer> maximumRunningTimeInSeconds)
  Apply validations on a regular expression to ensure that it is not prone to the ReDoS attack.
- static boolean isRelativeURL(String targetUrl)
  Validate that the URL provided is really a relative URL.
- static boolean isWeakPINCode(String pinCode)
  Apply a collection of validations to verify whether a provided PIN code is considered weak (easy to guess) or not.
  This method assumes that the format of the PIN code is [0-9]{6,}.
  Rules used to consider a PIN code as weak: Length is less than 6 positions. Contains only the same digit, or only a sequence of zeros. Contains a sequence of consecutive incrementing or decrementing digits.
- static boolean isWord972003DocumentSafe(String wordFilePath)
  Apply a collection of validations on a provided Word 97-2003 (binary format) document file: Real Microsoft Word 97-2003 document file. No VBA macro. No embedded objects.
- static boolean isXMLHaveCommentsOrXSLProcessingInstructions(String xmlFilePath)
  Identify whether an XML file contains any XML comments or XSL processing instructions.
  Stream-reader-based parsing is used to support large XML trees.
- static boolean isXMLOnlyUseAllowedXSDorDTD(String xmlFilePath, List<String> allowedSystemIdentifiers)
  Ensure that an XML file only uses DTD/XSD references (called System Identifiers) present in the allowed list provided.
  The code is based on the validation implemented in OpenJDK 21 by the class java.util.prefs.XmlSupport, in the method loadPrefsDoc().
  The method also ensures that no Public Identifier is used, to prevent potential bypasses of the validations.
- static boolean isXMLSafe
  Ensure that an XML file does not contain any External Entity, DTD or XInclude instructions.
- static boolean isXSDSafe
  Ensure that an XSD file does not contain any include/import/redefine instruction (prevent exposure to SSRF).
- static boolean isZIPSafe(String zipFilePath, int maxLevelDeepness, boolean rejectArchiveFile)
  Apply a collection of validations on a provided ZIP file: Real ZIP file. Contains fewer nested levels than the specified depth threshold. Does not contain any Zip-Slip entry path.
- static byte[] sanitizeFile(String inputFilePath, InputFileType inputFileType)
  Rewrite the input file to remove any embedded file that is not embedded using a method supported by the official format of the file.
  Example: a file can be embedded by appending it to the end of the source file; see the reference provided for details.
- static String sanitizeLogMessage(String message, int maxMessageLength)
  Process a string intended to be written to a log in order to remove, as much as possible, information that can lead to exposure to a log injection vulnerability.
  Log injection is also called log forging.
  The following information is removed: Characters: Carriage Return (CR), Line Feed (LF) and Tabulation (TAB). Leading and trailing spaces. Any HTML tags.
  A parameter is also used to limit the maximum length of the sanitized message.
-
Method Details
-
isWeakPINCode
Apply a collection of validations to verify whether a provided PIN code is considered weak (easy to guess) or not.
This method assumes that the format of the PIN code is [0-9]{6,}.
Rules used to consider a PIN code as weak:
- Length is less than 6 positions.
- Contains only the same digit, or only a sequence of zeros.
- Contains a sequence of consecutive incrementing or decrementing digits.
- Parameters:
pinCode - PIN code to verify.
- Returns:
- True only if the PIN code is considered weak.
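The three rules above can be sketched with plain JDK code. This is a minimal illustration and not the class's actual implementation: the names WeakPinSketch and isWeak, and the choice of 3 consecutive digits as the minimum run length, are assumptions made for the example.

```java
class WeakPinSketch {

    // Returns true when the PIN is considered weak under the three rules listed above.
    static boolean isWeak(String pin) {
        // Rule 1: anything that is not made of at least 6 digits is treated as weak.
        if (pin == null || !pin.matches("[0-9]{6,}")) {
            return true;
        }
        // Rule 2: a single repeated digit (this also covers a sequence of zeros).
        if (pin.chars().distinct().count() == 1) {
            return true;
        }
        // Rule 3: a run of consecutive incrementing or decrementing digits.
        // Assumption: a run of 3 digits (e.g. "123" or "987") is enough to flag the PIN.
        int run = 3;
        for (int i = 0; i + run <= pin.length(); i++) {
            boolean ascending = true;
            boolean descending = true;
            for (int j = i + 1; j < i + run; j++) {
                int diff = pin.charAt(j) - pin.charAt(j - 1);
                ascending &= diff == 1;
                descending &= diff == -1;
            }
            if (ascending || descending) {
                return true;
            }
        }
        return false;
    }
}
```

With these assumptions, "123456", "000000" and "654987" are flagged as weak, while "482915" is not.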
-
isWord972003DocumentSafe
Apply a collection of validations on a provided Word 97-2003 (binary format) document file:
- Real Microsoft Word 97-2003 document file.
- No VBA macro.
- No embedded objects.
- Parameters:
wordFilePath - Filename of the Word document file to check.
- Returns:
- True only if the file passes all validations.
- See Also:
-
isXMLSafe
-
extractAllPDFLinks
Extract all URL links from a provided PDF file.
This can be used to validate a PDF against the links it contains.
- Parameters:
pdfFilePath - Filename of the PDF file to process.
- Returns:
- A List of URL objects that is empty if no link is found.
- Throws:
Exception - If any error occurs during the processing of the PDF file.
- See Also:
-
isPDFSafe
Apply a collection of validations on a provided PDF file:
- Real PDF file.
- No attachments.
- No JavaScript code.
- No links using an action of type URI/Launch/RemoteGoTo/ImportData.
- No XFA forms, in order to prevent exposure to XXE/SSRF like CVE-2025-54988.
- Parameters:
pdfFilePath - Filename of the PDF file to check.
- Returns:
- True only if the file passes all validations.
- See Also:
-
clearPDFMetadata
Remove as much metadata as possible from the provided PDF document object.
- Parameters:
document - PDFBox PDF document object from which metadata must be removed.
- See Also:
-
isRelativeURL
Validate that the URL provided is really a relative URL.
- Parameters:
targetUrl - URL to validate.
- Returns:
- True only if the URL passes all validations.
- See Also:
-
isZIPSafe
public static boolean isZIPSafe(String zipFilePath, int maxLevelDeepness, boolean rejectArchiveFile)
Apply a collection of validations on a provided ZIP file:
- Real ZIP file.
- Contains fewer nested levels than the specified depth threshold.
- Does not contain any Zip-Slip entry path.
- Parameters:
zipFilePath - Filename of the ZIP file to check.
maxLevelDeepness - Depth threshold above which a ZIP archive will be rejected.
rejectArchiveFile - Flag specifying whether the presence of any archive entry causes the rejection of the ZIP file.
- Returns:
- True only if the file passes all validations.
- See Also:
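A Zip-Slip entry path is one that, once resolved against the extraction directory, lands outside of it. The sketch below shows one common way to detect this with java.nio.file. ZipSlipSketch, escapesBaseDirectory and the base directory name are hypothetical, and this is not necessarily how isZIPSafe performs the check.

```java
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;

class ZipSlipSketch {

    // Returns true when the entry name would be written outside the extraction directory.
    static boolean escapesBaseDirectory(String entryName) {
        // Hypothetical extraction directory, made absolute and normalized once.
        Path base = Paths.get("extract-root").toAbsolutePath().normalize();
        try {
            // resolve() appends a relative name to the base; an absolute name replaces the base.
            // normalize() then collapses any "." and ".." segments.
            Path resolved = base.resolve(entryName).normalize();
            return !resolved.startsWith(base);
        } catch (InvalidPathException e) {
            // Names that cannot be converted to a path are treated as unsafe.
            return true;
        }
    }
}
```

An entry named "docs/../readme.txt" stays inside the base directory, whereas "../../etc/passwd" escapes it.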
-
identifyMimeType
Identify the mime type of the content specified (array of bytes).
Note that it cannot be fully trusted (see the referenced tweet '1595824709186519041'), so additional validations are required.
- Parameters:
content - The content as an array of bytes.
- Returns:
- The mime type in lower case, or null if it cannot be identified.
- See Also:
-
isPublicIPAddress
Apply a collection of validations on a string expected to be a public IP address:
- Is a valid IPv4 or IPv6 address.
- Is public from an Internet perspective.
Note: I often see this validation missing on values read from HTTP request headers like "X-Forwarded-For" or "Forwarded".
Note for IPv6: I relied on the documentation I found, so it is really experimental!
- Parameters:
ip - String expected to be a valid IP address.
- Returns:
- True only if the string passes all validations.
- See Also:
-
computeHashNoProneToAbuseOnParts
Compute a SHA256 hash from an input composed of a collection of strings.
This method builds the source string in a way that prevents abuse targeting the different parts composing it.
Example of possible abuse when no precautions are applied during the hash calculation logic:
The hash of SHA256("Hello", "My", "World!!!") will be equal to the hash of SHA256("Hell", "oMyW", "orld!!!").
This method ensures that the two hashes above are different.
Note: The character | is used as the separator between parts, so a part is not allowed to contain this character.
- Parameters:
parts - Ordered list of strings used to build the input string on which the hash is computed. No null value is accepted in the collection.
- Returns:
- The hash, as an array of bytes, to allow the caller to convert it to the final representation wanted (HEX, Base64, etc.). If the collection passed is null or empty, the method returns null.
- Throws:
Exception - If any exception occurs.
- See Also:
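The abuse described above, and the effect of the | separator, can be reproduced with the JDK alone. The sketch below is illustrative only: PartsHashSketch, naiveHash and separatedHash are hypothetical names, and rejecting a part containing | with an IllegalArgumentException is an assumption about how the restriction might be enforced.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

class PartsHashSketch {

    // Naive approach: parts are concatenated directly, so part boundaries are lost.
    static byte[] naiveHash(List<String> parts) {
        return sha256(String.join("", parts));
    }

    // Separator approach: parts are joined with '|', which no part may contain.
    static byte[] separatedHash(List<String> parts) {
        if (parts == null || parts.isEmpty()) {
            return null;
        }
        for (String part : parts) {
            if (part == null || part.contains("|")) {
                throw new IllegalArgumentException("Parts must be non-null and must not contain '|'");
            }
        }
        return sha256(String.join("|", parts));
    }

    private static byte[] sha256(String source) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(source.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

With the naive approach both inputs collapse to the same source string "HelloMyWorld!!!", while the separator approach hashes "Hello|My|World!!!" and "Hell|oMyW|orld!!!", which differ.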
-
isXMLOnlyUseAllowedXSDorDTD
public static boolean isXMLOnlyUseAllowedXSDorDTD(String xmlFilePath, List<String> allowedSystemIdentifiers)
Ensure that an XML file only uses DTD/XSD references (called System Identifiers) present in the allowed list provided.
The code is based on the validation implemented in OpenJDK 21 by the class java.util.prefs.XmlSupport, in the method loadPrefsDoc().
The method also ensures that no Public Identifier is used, to prevent potential bypasses of the validations.
- Parameters:
xmlFilePath - Filename of the XML file to check.
allowedSystemIdentifiers - List of URLs allowed as System Identifiers for any XSD/DTD reference.
- Returns:
- True only if the file passes all validations.
- See Also:
-
isExcelCSVSafe
Apply a collection of validations on a provided Excel CSV file (a file expected to be opened in Microsoft Excel):
- Real CSV file.
- Does not contain any payload related to CSV injection.
Note: The record delimiter used is the , (comma) character. See the Apache Commons CSV reference provided for EXCEL.
- Parameters:
csvFilePath - Filename of the CSV file to check.
- Returns:
- True only if the file passes all validations.
- See Also:
-
ensureSerializedObjectIntegrity
public static Map<String,Object> ensureSerializedObjectIntegrity(ProcessingModeType processingModeType, String input, byte[] secret) throws Exception
Provide a way to add an integrity marker (HMAC) to an object serialized using the Java native (binary) serialization system.
The goal is to provide a temporary workaround to try to prevent deserialization attacks and give time to move to a text-based serialization approach.
- Parameters:
processingModeType - Defines the processing mode, i.e. protect or validate (ProcessingModeType).
input - When the processing mode is "protect", the expected input (string) is a Java serialized object encoded in Base64; otherwise (processing mode is "validate"), the expected input is the output of this method when the "protect" mode was used.
secret - Secret used to compute the SHA256 HMAC.
- Returns:
- A map with the following keys:
- PROCESSING_MODE: Processing mode used to compute the result.
- STATUS: A boolean indicating whether the processing was successful.
- RESULT: Always contains a string representing the protected serialized object in the format
[SERIALIZED_OBJECT_BASE64_ENCODED]:[SERIALIZED_OBJECT_HMAC_BASE64_ENCODED].
- Throws:
Exception - If any exception occurs.
- See Also:
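The protect/validate round trip can be sketched with javax.crypto.Mac. This is only an illustration of the [SERIALIZED_OBJECT_BASE64_ENCODED]:[SERIALIZED_OBJECT_HMAC_BASE64_ENCODED] format: the names below are hypothetical, the result is returned directly instead of in a map, and computing the HMAC over the Base64 text (rather than over the decoded bytes) is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

class SerializedObjectHmacSketch {

    // "protect" mode: append the Base64-encoded HMAC to the Base64-encoded serialized object.
    static String protect(String serializedObjectBase64, byte[] secret) {
        byte[] hmac = hmacSha256(serializedObjectBase64, secret);
        return serializedObjectBase64 + ":" + Base64.getEncoder().encodeToString(hmac);
    }

    // "validate" mode: recompute the HMAC and compare it with the one provided.
    static boolean validate(String protectedValue, byte[] secret) {
        // ':' is not part of the standard Base64 alphabet, so it is a safe delimiter.
        String[] parts = protectedValue.split(":", -1);
        if (parts.length != 2) {
            return false;
        }
        byte[] provided;
        try {
            provided = Base64.getDecoder().decode(parts[1]);
        } catch (IllegalArgumentException e) {
            return false;
        }
        // MessageDigest.isEqual is used for the comparison instead of Arrays.equals.
        return MessageDigest.isEqual(hmacSha256(parts[0], secret), provided);
    }

    private static byte[] hmacSha256(String data, byte[] secret) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            return mac.doFinal(data.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The important point is that validation happens before any call to ObjectInputStream, so tampered data is rejected without being deserialized.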
-
isJSONSafe
Apply a collection of validations on a provided JSON string:
- Real JSON structure.
- Nesting depth of objects or arrays is below the specified limit.
- Number of items in any array is below the specified limit.
Note: I decided to use a parsing approach based only on string processing to prevent any StackOverflow or OutOfMemory error that could be abused.
I used the following assumptions:
- The character { identifies the beginning of an object.
- The character } identifies the end of an object.
- The character [ identifies the beginning of an array.
- The character ] identifies the end of an array.
- The character " identifies the delimiter of a string.
- The character sequence \" identifies the escaping of a double quote.
- Parameters:
json - String containing the JSON data to validate.
maxItemsByArraysCount - Maximum number of items allowed in an array.
maxDeepnessAllowed - Maximum number of nested objects or arrays allowed.
- Returns:
- True only if the string passes all validations.
- See Also:
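The character-level assumptions listed above are enough to track nesting depth and array sizes in a single pass, without recursion and without building a tree. The sketch below illustrates that idea only: JsonLimitsSketch and withinLimits are hypothetical names, limits are treated as inclusive (an assumption), and the code checks bracket balance but does not verify that the input is otherwise valid JSON.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class JsonLimitsSketch {

    static boolean withinLimits(String json, int maxItemsByArray, int maxDepth) {
        // One entry per open container: {-1} for an object, {itemCount} for an array.
        Deque<int[]> open = new ArrayDeque<>();
        boolean inString = false;
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (inString) {
                if (c == '\\') {
                    i++; // skip the escaped character, e.g. the quote in \"
                } else if (c == '"') {
                    inString = false;
                }
                continue;
            }
            if (Character.isWhitespace(c)) {
                continue;
            }
            int[] top = open.peek();
            boolean inArray = top != null && top[0] >= 0;
            if (c == ']' || c == '}') {
                // The closing character must match the kind of container currently open.
                if (top == null || inArray != (c == ']')) {
                    return false;
                }
                open.pop();
                continue;
            }
            if (inArray) {
                if (c == ',') {
                    top[0]++;       // a comma announces one more item
                } else if (top[0] == 0) {
                    top[0] = 1;     // first token seen in this array
                }
                if (top[0] > maxItemsByArray) {
                    return false;
                }
            }
            if (c == '{' || c == '[') {
                open.push(new int[] {c == '[' ? 0 : -1});
                if (open.size() > maxDepth) {
                    return false;
                }
            } else if (c == '"') {
                inString = true;
            }
        }
        return open.isEmpty() && !inString;
    }
}
```

Because brackets inside string values are skipped, a value such as "[[[[" does not count toward the depth.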
-
isImageSafe
Apply a collection of validations on a provided image file:
- Real image file.
- Its mime type is in the list of allowed mime types.
- Its metadata fields do not contain any characters related to malicious payloads.
Important note: This implementation is prone to bypass using the "raw insertion" method documented in the blog post from the Synacktiv team. To handle such a case, it is recommended to resize the image to remove any non-image-related content, see here for an example.
- Parameters:
imageFilePath - Filename of the image file to check.
imageAllowedMimeTypes - List of image mime types allowed.
- Returns:
- True only if the file passes all validations.
- See Also:
-
sanitizeFile
public static byte[] sanitizeFile(String inputFilePath, InputFileType inputFileType) throws Exception
Rewrite the input file to remove any embedded file that is not embedded using a method supported by the official format of the file.
Example: a file can be embedded by appending it to the end of the source file; see the reference provided for details.
- Parameters:
inputFilePath - Filename of the file to clean up.
inputFileType - Type of the file provided.
- Returns:
- An array of bytes containing the cleaned file.
- Throws:
IllegalArgumentException - If an invalid parameter is passed.
Exception - If any technical error occurs during the cleaning process.
- See Also:
-
isEmailAddress
Apply a collection of validations on a string expected to be an email address:
- Is a valid email address, from a parser perspective, following the RFCs on email addresses.
- Is not using "Encoded-word" format.
- Is not using comment format.
- Is not using "Punycode" format.
- Is not using UUCP style addresses.
- Is not using address literals.
- Is not using source routes.
- Is not using the "percent hack".
This is based on the research work of Gareth Heyes (PortSwigger) listed in the references.
Note: The notion of "valid" here is to be understood from the perspective of secure usage of the data.
- Parameters:
addr - String expected to be a valid email address.
- Returns:
- True only if the string passes all validations.
- See Also:
-
isPSD2StetSafeCertificateURL
The PSD2 STET specification requires the use of HTTP Signature.
See section 3.5.1.2 of the document "Documentation Framework" version 1.6.3.
The problem is that, by design, the HTTP Signature specification is prone to blind SSRF.
URL example taken from the STET specification: https://path.to/myQsealCertificate_714f8154ec259ac40b8a9786c9908488b2582b68b17e865fede4636d726b709f
The objective of this code is to try to decrease the "exploitability/interest" of this SSRF for an attacker.
- Parameters:
certificateUrl - URL pointing to a Qualified Certificate (QSealC) encoded in PEM format and respecting the ETSI/TS119495 technical specification.
- Returns:
- True only if the URL points to a Qualified Certificate in PEM format.
- See Also:
-
applyURLDecoding
public static String applyURLDecoding(String encodedData, int decodingRoundThreshold) throws SecurityException
Perform sequential URL decoding operations against URL-encoded data until the data is no longer URL encoded or the specified threshold is reached.
- Parameters:
encodedData - URL-encoded data.
decodingRoundThreshold - Threshold above which decoding will fail.
- Returns:
- The decoded data.
- Throws:
SecurityException - If the threshold is reached.
- See Also:
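The decoding loop can be sketched with java.net.URLDecoder. This is an illustration under assumptions, not the method's actual code: UrlDecodingSketch and decodeFully are hypothetical names, "not URL encoded anymore" is approximated by "decoding no longer changes the value", and only rounds that change the data count toward the threshold.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

class UrlDecodingSketch {

    static String decodeFully(String encodedData, int maxRounds) {
        String current = encodedData;
        int rounds = 0;
        while (true) {
            // URLDecoder also turns '+' into a space and throws IllegalArgumentException
            // on a malformed percent sequence such as "100%".
            String decoded = URLDecoder.decode(current, StandardCharsets.UTF_8);
            if (decoded.equals(current)) {
                return current; // stable: nothing left to decode
            }
            rounds++;
            if (rounds > maxRounds) {
                throw new SecurityException("More than " + maxRounds + " decoding rounds were needed");
            }
            current = decoded;
        }
    }
}
```

For example, the double-encoded value "%253Cscript%253E" needs two rounds to become "<script>", so it is returned with a threshold of 2 and rejected with a threshold of 1.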
-
isPathSafe
Apply a collection of validations on a string expected to be a system file/folder path:
- Does not contain any path traversal payload.
- The canonical path is equal to the absolute path.
- Parameters:
path - String expected to be a valid system file/folder path.
- Returns:
- True only if the string passes all validations.
- See Also:
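The second check above relies on the fact that java.io.File.getAbsolutePath() keeps segments such as ".." while getCanonicalPath() resolves them (along with symbolic links). A minimal sketch of that comparison follows; PathCheckSketch and isSafe are hypothetical names, and the real method may apply additional checks.

```java
import java.io.File;
import java.io.IOException;

class PathCheckSketch {

    static boolean isSafe(String path) {
        if (path == null || path.isBlank()) {
            return false;
        }
        try {
            File file = new File(path);
            // Any ".." segment or symbolic link makes the two representations differ.
            return file.getCanonicalPath().equals(file.getAbsolutePath());
        } catch (IOException e) {
            // The canonical form could not be computed: treat the path as unsafe.
            return false;
        }
    }
}
```

Note that this comparison also rejects legitimate paths that go through a symbolic link, which is a deliberate trade-off of the approach.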
-
isXMLHaveCommentsOrXSLProcessingInstructions
Identify whether an XML file contains any XML comments or XSL processing instructions.
Stream-reader-based parsing is used to support large XML trees.
- Parameters:
xmlFilePath - Filename of the XML file to check.
- Returns:
- True only if XML comments or XSL processing instructions are identified.
- See Also:
-
applyJWTExtraValidation
public static boolean applyJWTExtraValidation(com.auth0.jwt.interfaces.DecodedJWT token, TokenType expectedTokenType, List<String> revokedTokenJTIList)
Perform a set of additional validations against a JWT token.
- Parameters:
token - JWT token whose signature was already validated and on which a set of additional validations will be applied.
expectedTokenType - The expected type of token, using the enumeration provided.
revokedTokenJTIList - A list of token identifiers (JTI claim) referring to tokens that were revoked, against which the JTI claim of the token will be compared.
- Returns:
- True only if the token passes all the validations.
- See Also:
-
isRegexSafe
public static boolean isRegexSafe(String regex, String data, Optional<Integer> maximumRunningTimeInSeconds)
Apply validations on a regular expression to ensure that it is not prone to the ReDoS attack.
If your technology is supported by regexploit, then use it instead of this method!
Indeed, the Doyensec team has done intensive and amazing work on this topic and created this effective tool.
- Parameters:
regex - String expected to be a valid regular expression (regex).
data - Test data on which the regular expression is executed for the test.
maximumRunningTimeInSeconds - Optional parameter specifying the number of seconds above which a regex execution time is considered unsafe (defaults to 4 seconds when not specified).
- Returns:
- True only if the string passes all validations.
- See Also:
-
computeUUIDv7
Compute a UUID version 7 without using any external dependency.
Below is my personal point of view, and perhaps I'm totally wrong!
Why such a method?
- Java versions up to and including 21 do not natively support the generation of a UUID version 7.
- Importing a library just to generate such a value is overkill for me.
- The libraries that I have found that generate this version of UUID are not provided by entities commonly used in the Java world, such as the Spring framework provider.
Full credit for this implementation goes to the authors and contributors of the UUIDv7 project.
Below are the Java libraries that I have found but whose provider I do not trust enough to use them directly:
- Returns:
- A UUID object representing the UUID v7.
- See Also:
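For illustration, a UUID version 7 can be assembled from the JDK alone by following the RFC 9562 bit layout: a 48-bit Unix timestamp in milliseconds, the 4-bit version, 12 random bits, the 2-bit variant, then 62 random bits. The sketch below is not the class's actual code (which credits the UUIDv7 project); UuidV7Sketch and generate are hypothetical names.

```java
import java.security.SecureRandom;
import java.util.UUID;

class UuidV7Sketch {

    private static final SecureRandom RANDOM = new SecureRandom();

    static UUID generate() {
        long millis = System.currentTimeMillis();
        byte[] random = new byte[10];
        RANDOM.nextBytes(random);

        // Most significant 64 bits: 48-bit timestamp | version 7 | 12 random bits.
        long randA = ((random[0] & 0x0FL) << 8) | (random[1] & 0xFFL);
        long msb = (millis << 16) | 0x7000L | randA;

        // Least significant 64 bits: variant "10" | 62 random bits.
        long lsb = 0L;
        for (int i = 2; i < 10; i++) {
            lsb = (lsb << 8) | (random[i] & 0xFFL);
        }
        lsb = (lsb & 0x3FFFFFFFFFFFFFFFL) | 0x8000000000000000L;

        return new UUID(msb, lsb);
    }
}
```

Because the timestamp occupies the leading bits, values generated at different milliseconds sort in creation order, which is the main reason to prefer version 7 over version 4 for database keys.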
-
isXSDSafe
-
extractAllSensitiveInformation
public static Map<SensitiveInformationType, Set<String>> extractAllSensitiveInformation(String content) throws Exception
Extract all sensitive information from a provided string.
This can be used to identify any sensitive information in a message expected to be written to a log and then replace every sensitive value with an obfuscated one.
For the Luxembourg national identification number, this method focuses on detecting identifiers for a natural person (people) and not a legal entity (company).
I delegated the validation of the IBAN to a dedicated library (iban4j) to avoid "reinventing the wheel" and introducing buggy validation myself. I used iban4j over the IBANValidator class from the Apache Commons Validator library because iban4j performs a full validation against the official IBAN specification, so it reduces the risk of false positives by ensuring that a detected IBAN is a real IBAN.
The same reasoning applies to the validation of the bank card PAN, which uses the class CreditCardValidator from the Apache Commons Validator library.
- Parameters:
content - String in which sensitive information must be searched.
- Returns:
- A map with the collection of identified sensitive information grouped by sensitive information type. If nothing is found, the map is empty. A type of sensitive information is only present if at least one item was found. A set is used to avoid storing duplicate occurrences of the same sensitive information.
- Throws:
Exception - If any error occurs during the processing.
- See Also:
-
isGZIPCompressedDataSafe
public static boolean isGZIPCompressedDataSafe(byte[] compressedBytes, long maxCountOfDecompressedBytesAllowed)
Apply a collection of validations on a provided byte array representing GZIP compressed data:
- Is valid GZIP compressed data.
- The number of bytes once decompressed is under the specified limit.
Note: The value Integer.MAX_VALUE - 8 was chosen because, during my tests on Java 25 (64-bit JDK on Windows 11 Pro), it was possible to decompress that amount of data with the default JVM settings without causing an Out Of Memory error.
- Parameters:
compressedBytes - Array of bytes containing the GZIP compressed data to check.
maxCountOfDecompressedBytesAllowed - Maximum number of decompressed bytes allowed. Defaults to 10 MB if the specified value is less than 1 or greater than Integer.MAX_VALUE - 8.
- Returns:
- True only if the data passes all validations.
- See Also:
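The decompressed-size check can be done without ever holding the decompressed data in memory, by streaming through java.util.zip.GZIPInputStream and counting bytes. The sketch below is an illustration only: GzipLimitSketch, isWithinLimit and the gzip helper are hypothetical names, the limit is treated as inclusive, and 10 MB is taken as 10 * 1024 * 1024 bytes (both assumptions).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipLimitSketch {

    static boolean isWithinLimit(byte[] compressedBytes, long maxDecompressedBytes) {
        if (compressedBytes == null) {
            return false;
        }
        // Fall back to 10 MB when the requested limit is out of the accepted range.
        long limit = (maxDecompressedBytes < 1 || maxDecompressedBytes > Integer.MAX_VALUE - 8)
                ? 10L * 1024 * 1024
                : maxDecompressedBytes;
        long total = 0;
        byte[] buffer = new byte[8192];
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressedBytes))) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                total += read;
                if (total > limit) {
                    return false; // stop early: the data expands beyond the limit
                }
            }
            return true;
        } catch (IOException e) {
            return false; // not valid GZIP data
        }
    }

    // Helper used only to build test data for the sketch.
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

Stopping as soon as the counter exceeds the limit is what protects against decompression bombs, since a few kilobytes of GZIP data can expand to gigabytes.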
-
sanitizeLogMessage
Process a string intended to be written to a log in order to remove, as much as possible, information that can lead to exposure to a log injection vulnerability.
Log injection is also called log forging.
The following information is removed:
- Characters: Carriage Return (CR), Line Feed (LF) and Tabulation (TAB).
- Leading and trailing spaces.
- Any HTML tags.
A parameter is also used to limit the maximum length of the sanitized message. To remove any HTML tags, the OWASP project Java HTML Sanitizer is leveraged.
I delegated this removal to a dedicated library to avoid missing edge cases as well as potential bypasses.
- Parameters:
message - The original string message intended to be written to a log.
maxMessageLength - The maximum number of characters after which the sanitized message must be truncated. If less than 1, it defaults to 500.
- Returns:
- The cleaned string message.
- See Also:
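The character removal, trimming and truncation steps can be sketched with the JDK alone. The HTML tag removal is deliberately left out of this sketch, because the documented method delegates it to the OWASP Java HTML Sanitizer; LogSanitizerSketch and sanitize are hypothetical names, and the order of the steps is an assumption.

```java
class LogSanitizerSketch {

    static String sanitize(String message, int maxMessageLength) {
        // Default to 500 characters when the requested maximum is below 1.
        int limit = maxMessageLength < 1 ? 500 : maxMessageLength;
        if (message == null) {
            return "";
        }
        // Remove CR, LF and TAB so that one message cannot forge additional log lines,
        // then drop leading and trailing spaces.
        String cleaned = message.replaceAll("[\\r\\n\\t]", "").trim();
        // Truncate to the maximum length.
        return cleaned.length() > limit ? cleaned.substring(0, limit) : cleaned;
    }
}
```

For example, the input "  login ok\r\nINFO forged entry\t " becomes "login okINFO forged entry": the forged second line is folded back into the first one.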
-