# Data Type

Data Type defines the kind of information that GuardWare detects, monitors, and protects during scans and policy enforcement. Each data type represents a rule, pattern, or phrase used to identify sensitive content within files, such as credit card numbers, passport numbers, national IDs, employee records, financial data, and other confidential information. Data types are the core detection rules that determine what content is flagged as sensitive in your environment.

GuardWare includes a built-in library of predefined data types covering common sensitive information categories such as:

* Payment Card Industry (PCI) data (credit card numbers, CVV codes, cardholder names, transaction data)
* Personally Identifiable Information (PII) (social security numbers, driver's licence numbers, passport numbers, names, email addresses, phone numbers, residential addresses)

You can use these predefined data types as-is, or create custom data types for information specific to your organisation, such as employee ID number formats, internal project code patterns, proprietary document naming conventions, or confidential business terminology.

## Data Type Components

Each data type consists of:

* a **Name** and **Description** that identify what it detects
* a **Data Identifier** that defines how GuardWare recognises this type of content (sensitive words, regular expressions, or filename patterns)
* **Context Parameters** that control how much surrounding text is captured and how many matches to show
* a **Classification** that determines the sensitivity level
* assigned **Data Owners** who receive alerts
* automatically managed **subtypes** that refine the detection to reduce false positives
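To make the component list concrete, the sketch below models a data type as a plain Python record. It is purely illustrative: the class name, field names, and the `hr-team` owner are assumptions made for this example and do not correspond to any GuardWare API or export format.

```python
from dataclasses import dataclass, field
from enum import Enum


class IdentifierKind(Enum):
    """The three Data Identifier methods described on this page."""
    SENSITIVE_WORDS = "sensitive_words"
    REGEX = "regex"
    FILENAME = "filename"


@dataclass
class DataType:
    """Hypothetical in-memory model of a data type (not a GuardWare schema)."""
    name: str
    identifier_kind: IdentifierKind
    identifier_value: str          # phrase, regex, or filename pattern
    classification: str            # e.g. "Confidential"
    description: str = ""          # optional, as in the UI
    data_owners: list[str] = field(default_factory=list)


# Example record for a custom regex-based data type.
employee_ids = DataType(
    name="Employee ID Numbers",
    identifier_kind=IdentifierKind.REGEX,
    identifier_value=r"EMP-\d{4}-\d{4}",
    classification="Confidential",
    data_owners=["hr-team"],       # hypothetical owner name
)
print(employee_ids)
```

Subtypes are left out of the model on purpose, since GuardWare manages them automatically from the parent data type.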

## Create a Data Type

If the predefined data types in GuardWare's library don't cover the sensitive information unique to your organisation, you can create custom data types.

1. Navigate to **DATA GOVERNANCE** > **Data Type** and click **+Data Type**.
2. In **Data Type Name**, enter a clear, descriptive name for this data type. For example: "Employee ID Numbers", "Project Code", "Confidential Contract Terms".

3. In **Description** (optional), add details about what this data type represents and why it's sensitive. This helps other administrators understand the purpose of this data type.
4. The **Data Identifier** determines how GuardWare recognises this type of sensitive content. Choose one of three methods from the drop-down menu:

{% hint style="success" icon="sparkles" %}
Click the tabs below to view the relevant content, or use the links provided here to navigate to the desired section.

* [**Sensitive Words**](#sensitive-words)
* [**Regular Expressions (regex)**](#regular-expressions-regex)
* [**Filename Expressions**](#filename-expressions)
{% endhint %}

{% tabs %}
{% tab title="Sensitive Words" %}
Use this method when sensitive content can be identified by the presence of specific words or phrases. This is useful for detecting proprietary terminology, confidential project names, or classification markings.

GuardWare offers three conditions for how sensitive words must appear:

1. **All Phrases Condition:** Select this when ALL specified phrases must appear together in a document for it to be flagged as sensitive. Type each phrase in the input field and press **Enter** to add it to the list. Repeat for each phrase that must be present.

   **Example:** A document is only considered sensitive if it contains ALL of these phrases: "Project Alpha", "Q4 2025", "Confidential Revenue".
2. **At Least (n) Phrases Condition:** Select this when a minimum number of phrases must be present for the content to be flagged. Add each phrase and press **Enter** to include it. After adding all phrases, specify the minimum number that must appear.

   **Example:** Flag documents containing at least 3 of these 5 terms: "merger", "acquisition", "due diligence", "confidential", "NDA".
3. **None of the Phrases Condition:** Select this to specify phrases that must NOT appear. If any of these phrases are found, the content will NOT be considered sensitive (even if other conditions are met). Enter each exclusionary phrase and press **Enter**.

   **Example:** Don't flag documents as sensitive if they contain "public announcement" or "press release", even if they contain other sensitive terms.
4. **Context Parameters for Sensitive Words:**
   1. **Context Length:** Defines how many words before and after the detected sensitive word should be captured in the results. This helps you review the surrounding text to determine if the detection is genuinely sensitive or a false positive.

      Select a number between 1 and 20 from the drop-down (typically, 3-6 words provide good context).

      **Example:** If Context Length is set to 3 and the sensitive phrase is "Employee ID 12345", the result might show: `... is assigned to Employee ID 12345 for the upcoming...`
   2. **Number of Hits:** Specifies how many occurrences of the sensitive word must be present before the document is flagged in results. This reduces noise from documents that only mention sensitive terms once in passing.

      Enter the number of times the sensitive content must appear (between 1 and 100).

      **Example:** If set to 5, clicking **View Result** will only show instances where the sensitive data appears 5 or more times in the document.
{% endtab %}

{% tab title="Regular Expressions (regex)" %}
Regular expressions (regex) are pattern-matching rules ideal for detecting structured data. Use this method for sensitive data that follows specific patterns or formats such as employee IDs (e.g., EMP-2024-0001), product codes (e.g., PROD-ABC-12345), custom reference numbers, or any structured identifier unique to your organisation.

To create a regular expression, enter your pattern in the **Regular Expression** field using standard regex syntax. Common patterns include:

* Employee ID (EMP followed by year and number): `EMP-\d{4}-\d{4}`. This matches EMP-2024-0001, EMP-2025-0234, etc.
* Australian Business Number (ABN, 11 digits): `\d{2}\s\d{3}\s\d{3}\s\d{3}`. This matches 51 824 753 556.
* Product Code (PROD-3 letters-5 digits): `PROD-[A-Z]{3}-\d{5}`. This matches PROD-ABC-12345, PROD-XYZ-99999.

In the **Test Text** field, enter sample text that should match your pattern. Always test your regex with multiple examples to ensure it captures what you intend without generating false positives.

Click **+Validate** to check if your regex correctly identifies the pattern. DISCOVER will notify you of matches, confirming your pattern works as intended.
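Before pasting a pattern into the **Regular Expression** field, it can help to try it offline. The sketch below runs the three example patterns from this section against sample text using Python's `re` module. It assumes the field accepts syntax compatible with Python's regex flavour for these simple constructs (`\d`, `\s`, `{n}`, character classes), which this page does not confirm. The `PATTERNS` dictionary and `find_matches` helper are names invented for illustration.

```python
import re

# Example patterns taken from this section.
PATTERNS = {
    "employee_id": r"EMP-\d{4}-\d{4}",
    "abn": r"\d{2}\s\d{3}\s\d{3}\s\d{3}",
    "product_code": r"PROD-[A-Z]{3}-\d{5}",
}


def find_matches(pattern: str, text: str) -> list[str]:
    """Return every non-overlapping match of pattern in text."""
    return re.findall(pattern, text)


# Sample text with values that should match and near misses that should not.
sample = (
    "Badge EMP-2024-0001 was issued for PROD-ABC-12345. "
    "Supplier ABN: 51 824 753 556. Ignore emp-2024-0002 and PROD-AB-123."
)

for name, pattern in PATTERNS.items():
    print(name, find_matches(pattern, sample))
```

Note that `EMP-\d{4}-\d{4}` on its own also matches inside a longer string such as `TEMP-2024-0001`. That is the kind of false positive the **Space Before/After** option is meant to prevent.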

1. **Additional Options:**

   **Space Before/After:** Enable these options if you want DISCOVER to match the pattern only when it has a space before and/or after it. This reduces false positives by ensuring the match is a complete word or code and not part of a larger string.

   **Example:** If searching for "EMP-1234", enabling **Space Before/After** prevents a match within "TEMP-1234-SAMPLE".

   1. **Checksum (Luhn):** Enable this for patterns like credit card numbers that use the Luhn algorithm for validation. When enabled, DISCOVER verifies that detected numbers pass the Luhn checksum test, reducing false positives.
   2. **Turkish ID (T.C. Kimlik No.):** Enable this for patterns that target Turkish national ID numbers (T.C. Kimlik No.), which carry their own check digits. When enabled, DISCOVER verifies that detected numbers pass the T.C. Kimlik No. validation, reducing false positives.
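For readers who want to see what the two checksum options verify, here is a minimal sketch of both checks in Python. DISCOVER's internal implementation is not documented on this page, so these functions follow the commonly published definitions of the Luhn algorithm and the T.C. Kimlik No. check digits. The function names are hypothetical.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in number pass the Luhn checksum.

    Non-digit characters (spaces, hyphens) are ignored; that is a choice
    made for this sketch, not documented product behaviour.
    """
    digits = [int(ch) for ch in number if ch in "0123456789"]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def tc_kimlik_valid(number: str) -> bool:
    """Return True if number satisfies the T.C. Kimlik No. check-digit rules."""
    if len(number) != 11 or number[0] == "0":
        return False
    if any(ch not in "0123456789" for ch in number):
        return False
    d = [int(ch) for ch in number]
    odd_sum = sum(d[0:9:2])    # digits 1, 3, 5, 7, 9
    even_sum = sum(d[1:8:2])   # digits 2, 4, 6, 8
    tenth = (odd_sum * 7 - even_sum) % 10
    eleventh = sum(d[:10]) % 10
    return d[9] == tenth and d[10] == eleventh


print(luhn_valid("4111 1111 1111 1111"))
print(tc_kimlik_valid("10000000146"))
```

A regex alone only checks the shape of a number. A checksum additionally rejects most random digit strings that happen to have the right length, which is why these options reduce false positives.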

2. **Context Parameters for Regular Expressions:**
   1. **Masking:** Determines how much of the detected sensitive data is concealed in reports and dashboards. This protects the actual sensitive content while still showing that it was found. Select **Hide rule from Rule Violation screen** and choose a level from the drop-down:

      * **None:** Complete data is visible (use with caution).
      * **1/4 Mask:** 25% of the data is hidden. Example: "EMP-2024-0001" becomes "EMP-2024-00\*\*".
      * **1/2 Mask:** 50% of the data is hidden. Example: "EMP-2024-0001" becomes "EMP-20\*\*-\*\*\*\*".
      * **3/4 Mask:** 75% of the data is hidden. Example: "EMP-2024-0001" becomes "EMP-\*\*\*\*-\*\*\*\*".
{% endtab %}

{% tab title="Filename Expressions" %}
Use this method to identify sensitive data based on file naming patterns rather than file contents, such as files starting with `Confidential_`, `HR_`, or `Financial_Report`, files in specific directories with standard names, or document types where the filename itself indicates sensitivity. This is useful when your organisation uses specific naming conventions for confidential documents.

In the **Add Expression** field, type or select a filename pattern using wildcards. Click the add button to add the expression to the list. You can add multiple filename expressions.

**Common Filename Patterns:**


| Pattern | Description |
| --- | --- |
| `*.*` | Matches all files. |
| `*.docx` | Matches all Microsoft Word documents. |
| `Tender*.xlsx` | Matches all Excel files starting with "Tender". |
| `\\192.168.1.1\folder\*.*` | Scans all files within the specified shared folder path. |
| `*Confidential*.pdf` | Matches any PDF with "Confidential" in the filename: Report_Confidential_2024.pdf, Confidential_Contract.pdf, etc. |
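To get a feel for how wildcard patterns behave before adding them, you can try them against sample filenames. The sketch below uses Python's standard `fnmatch` module, whose `*` and `?` wildcards follow the same general meaning as the wildcards used in this tab. Whether DISCOVER matches case-sensitively, or treats edge cases the same way as `fnmatch`, is not stated on this page, so treat the results as an approximation. The `matches_any` helper is a name invented for illustration.

```python
from fnmatch import fnmatchcase


def matches_any(filename: str, patterns: list[str]) -> bool:
    """Return True if filename matches at least one wildcard pattern.

    fnmatchcase compares case-sensitively on every platform; the real
    product may differ.
    """
    return any(fnmatchcase(filename, pattern) for pattern in patterns)


# Patterns of the kind used on this page.
patterns = ["*.docx", "Tender*.xlsx", "HR_????.xlsx"]

for name in ["Report_2024.docx", "Tender_Q3.xlsx", "HR_2024.xlsx",
             "HR_20245.xlsx", "notes.txt"]:
    print(name, matches_any(name, patterns))
```

Here `HR_20245.xlsx` is not matched because `????` requires exactly four characters between `HR_` and `.xlsx`.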

1. **Return File Content for Violated Rules:**\
   Select **On** if you want DISCOVER to show sensitive information inside the selected wildcard file. Otherwise, select **Off**.

2. **Wildcard Reference:**

   * `*` (asterisk): Matches any number of any characters.
   * `?` (question mark): Matches exactly one character.

   **Examples:** `Report_202?.docx` matches Report\_2024.docx and Report\_2025.docx, but not Report\_20245.docx, because `?` stands for exactly one character. `HR_????.xlsx` matches HR\_2024.xlsx and HR\_ABCD.xlsx (any 4 characters after HR\_).
{% endtab %}
{% endtabs %}

#### Completing the Data Type Configuration

After selecting and configuring your Data Identifier (Sensitive Words, Regular Expressions, or Filename Expressions), you must complete the data type setup:

5. From the **Classification** drop-down, select which sensitivity level this data type belongs to (e.g., Public, Internal, Confidential, Secret). This determines how GuardWare prioritises and handles files containing this data type.
6. From the **Data Owner(s)** drop-down, select one or more people who should be notified when this data type is discovered. You can assign multiple data owners to a single data type; all assigned owners will receive alerts. If you don't see the data owner you need, you must first create them in the [Add Data Owner](#add-data-owner) section.

   **Include Subtypes (Optional):**\
   Subtypes let you refine the data type with additional criteria. GuardWare creates these subtypes automatically from the values and parameters you define in the parent data type.

   ![](/files/634126322e8e506b7f82620490c72d7a05369963)

   GuardWare offers two subtype conditions:

   * **Sub-Type If Present:** Select existing data subtypes from the drop-down and press Enter to add each one. When using this condition, reports will ONLY be generated if ALL selected subtypes are found in the file. This creates a very strict detection rule.

     **Example:** Only flag a document if it contains the "Credit Card", "Expiry Date", and "CVV" data subtypes.
   * **If One or More Subtype:** Select existing data subtypes and press Enter to add each one. Reports will be generated if ANY ONE of the selected subtypes is found in the file. This is a more permissive rule.

     **Example:** Flag a document if it contains any of the "Visa Card", "MasterCard", or "American Express" data subtypes.

   You do not create data subtypes separately. GuardWare creates them automatically when you create the parent data type, and updates them automatically when you edit that data type later.
7. Click **Save** to create the data type. The data type is now active and will be used in future scans.

## Data Subtype

Data Subtypes are more specific detection rules that refine parent data types. They help narrow scan results and minimise false positives by adding another layer to data type detection.

GuardWare creates data subtypes automatically from the values and parameters configured in the parent data type. You cannot create or edit data subtypes directly.

### Update Data Subtype

You do not make changes to data subtypes directly. Create or update the parent data type instead.

For example, if you create a custom data type called **Credit Cards** and add **American Express** to it, GuardWare automatically creates the **American Express** data subtype. If you later edit the same data type and add **MasterCard** to it, GuardWare automatically adds the matching subtype and keeps it in sync with the parent data type.

1. Navigate to **DATA GOVERNANCE** > **Data Type**.
2. Create a new data type or edit an existing one.
3. Add or update the values, identifiers, or parameters in the parent data type.
4. Click **Save**.

## Assign Data Classification

The Assign Classification feature allows you to link one or more existing data types to a classification. This is useful when you've created multiple data types and want to bulk-assign them to the same sensitivity level.

1. Navigate to **DATA GOVERNANCE** > **Data Type**.
2. In the data types list, check the boxes next to one or more data types that should belong to the same classification.

3. Click **Assign Data Classification**.
4. From the drop-down menu, select the classification you want to assign to all selected data types and click **Assign**.