Automatically anonymize PDF documents - GDPR compliant

Step-by-step instructions for automated anonymization of PDF documents

The GDPR (General Data Protection Regulation) often requires personal data to be anonymized before documents can be shared. With Automatic PDF Processor, you can fully automate this process.

Typical use cases

  • Anonymization of customer data for training purposes
  • Removal of names and addresses from contracts
  • Redaction of account numbers and IBANs
  • Replacement of employee numbers with pseudonyms
  • Anonymization of medical records

Step 1: Create a new profile

Create a new profile with a descriptive name such as "Anonymize documents". Set up the monitored folder where documents to be anonymized will be placed.

[Screenshot: Create profile for anonymization]

Step 2: Create extraction rules for data to be anonymized

To anonymize data, you first need to define which areas of the document contain the data to be replaced. Go to the "Data Extraction" tab and create one rule for each area to be anonymized:

  • Name: Define an area containing the name
  • Address: Define an area for the address
  • IBAN: Define an area for the account number

The position can be determined via a keyword (e.g., "Name:") or absolute coordinates.

When extracting data using a keyword, it is often better to set the data position to "Area of the found location" and then use "Extend data area" to shift and enlarge that area so that, for example, the text to the right of the keyword is fully captured. This usually gives more accurate positioning.
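The idea of shifting and enlarging the keyword's area can be illustrated with a little rectangle arithmetic. The following is only a conceptual sketch in Python, not the program's implementation: the `Rect` type, the `extend_area` helper, the coordinate convention (origin at the upper left) and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in page units; (left, top) is the upper-left corner."""
    left: float
    top: float
    width: float
    height: float


def extend_area(found, shift_x=0.0, shift_y=0.0, grow_width=0.0, grow_height=0.0):
    """Shift the keyword's rectangle, then enlarge it to the right and downward."""
    return Rect(
        left=found.left + shift_x,
        top=found.top + shift_y,
        width=found.width + grow_width,
        height=found.height + grow_height,
    )


# Hypothetical bounding box of the keyword "Name:" on the page.
keyword_box = Rect(left=50.0, top=120.0, width=40.0, height=12.0)

# Move the area past the keyword and widen it to cover the value to its right.
data_area = extend_area(keyword_box, shift_x=keyword_box.width, grow_width=160.0)
print(data_area)
```

Because the data area is derived from where the keyword was actually found, it follows the keyword if the layout shifts slightly from one document to the next, which absolute coordinates cannot do.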

[Screenshot: Extraction rules for anonymization]

Step 3: Activate the "Replace Content" task

Go to the task view and select the "Replace Content" task. This task replaces the text captured by the extraction rules with another value (redaction with text replacement).

[Screenshot: Replace Content task]

Step 4: Configure replacement rules

For each extraction rule, you can specify what value the found text should be replaced with. The following replacement sources are available:

Source           | Description                            | Example
Fixed text       | Always the same replacement text       | "[ANONYMIZED]" or "XXXXX"
Random number    | Random number with configurable digits | "98234567"
Sequential       | Sequential number                      | "PERSON-00001", "PERSON-00002"
Random from list | Random value from a text file          | Random name from name list
CSV mapping      | Value from CSV file based on key       | Pseudonym based on original ID
Date/Time        | Current date or time                   | "2024-01-01"
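To make the difference between these sources concrete, here is a minimal Python sketch of how such replacement values could be generated. It is an illustration only and does not reflect how Automatic PDF Processor works internally; all function names, the default prefix and the padding width are assumptions chosen to match the examples in the table.

```python
import random
from datetime import date
from itertools import count


def fixed_text(text="[ANONYMIZED]"):
    """Always return the same replacement text."""
    return lambda: text


def random_number(digits=8, rng=None):
    """Return a random number string with exactly `digits` digits."""
    rng = rng or random.Random()
    low, high = 10 ** (digits - 1), 10 ** digits - 1
    return lambda: str(rng.randint(low, high))


def sequential(prefix="PERSON-", width=5):
    """Return a zero-padded running number, starting at 1."""
    counter = count(1)
    return lambda: f"{prefix}{next(counter):0{width}d}"


def random_from_list(values, rng=None):
    """Pick a random entry, e.g. from the lines of a name list."""
    rng = rng or random.Random()
    return lambda: rng.choice(values)


def current_date():
    """Return today's date in ISO format (YYYY-MM-DD)."""
    return lambda: date.today().isoformat()


next_person = sequential()
print(next_person(), next_person())  # PERSON-00001 PERSON-00002
```

Note that only a deterministic source such as a mapping gives the same replacement for the same original value; random and sequential sources produce a new value on every call.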

Step 5: Set application scope

For each replacement rule, you can set the application scope:

  • Single occurrence: Replace only the occurrence found by the rule
  • All pages at same position: Same position on all pages (e.g., headers/footers)
  • All occurrences in document: Replace every match in the entire document
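The three scopes differ in which text spans end up being replaced. The Python sketch below shows one possible reading of them; the `Span` type, the matching by exact position and exact text, and the sample data are all assumptions made for illustration and are not taken from the program.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    SINGLE = "Single occurrence"
    SAME_POSITION = "All pages at same position"
    ALL_OCCURRENCES = "All occurrences in document"


@dataclass(frozen=True)
class Span:
    page: int
    position: tuple  # (left, top) of the text on its page
    text: str


def spans_to_replace(found, spans, scope):
    """Select the spans affected by a rule whose match is `found`."""
    if scope is Scope.SINGLE:
        return [found]
    if scope is Scope.SAME_POSITION:
        # Same place on every page, whatever text is there (e.g. a header).
        return [s for s in spans if s.position == found.position]
    # Every span with the same text, wherever it appears.
    return [s for s in spans if s.text == found.text]


spans = [
    Span(1, (50, 20), "John Smith"),   # header on page 1 (found by the rule)
    Span(2, (50, 20), "J. Smith"),     # header on page 2, different wording
    Span(2, (80, 300), "John Smith"),  # mention in the body text
    Span(2, (50, 400), "Invoice"),
]
found = spans[0]
for scope in Scope:
    print(scope.value, "->", len(spans_to_replace(found, spans, scope)), "span(s)")
```

In this sample, the position-based scope catches the differently worded header on page 2, while the text-based scope catches the mention in the body text instead.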

Example: Replace names with pseudonyms

To consistently replace names with pseudonyms, you can use CSV mapping:

  1. Create a CSV file with the mapping Original → Pseudonym
  2. Set up a DynamicQueryList in the program options
  3. Select "CSV mapping" as the replacement source
  4. Select the appropriate list

This way, "John Smith" is always replaced with "Person A", while "Jane Doe" is always replaced with "Person B".
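What such a mapping does can be sketched in a few lines of Python. The column names, the semicolon delimiter and the fallback for unknown names are assumptions for this illustration; the format that the program's list actually expects may differ.

```python
import csv
import io

# Hypothetical mapping file; column names and delimiter are assumptions.
MAPPING_CSV = """original;pseudonym
John Smith;Person A
Jane Doe;Person B
"""


def load_mapping(csv_text):
    """Read the mapping Original -> Pseudonym into a dictionary."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=";")
    return {row["original"]: row["pseudonym"] for row in reader}


def pseudonymize(text, mapping, fallback="[ANONYMIZED]"):
    """Look up the extracted text; unknown values get a fixed fallback (assumed)."""
    return mapping.get(text.strip(), fallback)


mapping = load_mapping(MAPPING_CSV)
print(pseudonymize("John Smith", mapping))  # Person A
print(pseudonymize("Jane Doe", mapping))    # Person B
```

Since the lookup is deterministic, the same name receives the same pseudonym in every document. Anyone who holds the mapping file can reverse the pseudonyms, so the file should be protected as carefully as the originals.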


Step 6: Set destination

Specify where the anonymized documents should be saved. It is recommended to use a separate folder for the anonymized versions:

D:\Documents\Anonymized\<TodaysYear4>\<TodaysMonth>
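The placeholders in this path are filled in with the current date when a document is saved. As a rough Python illustration of the substitution (the `expand_destination` helper is hypothetical, and the two-digit, zero-padded month is an assumption rather than documented behavior):

```python
from datetime import date


def expand_destination(template, today):
    """Replace the date placeholders in a destination path template."""
    values = {
        "<TodaysYear4>": f"{today.year:04d}",
        "<TodaysMonth>": f"{today.month:02d}",  # zero-padding is an assumption
    }
    for placeholder, value in values.items():
        template = template.replace(placeholder, value)
    return template


template = r"D:\Documents\Anonymized\<TodaysYear4>\<TodaysMonth>"
print(expand_destination(template, date(2024, 1, 15)))
```

Under these assumptions, a document processed on 15 January 2024 would be saved to D:\Documents\Anonymized\2024\01, so the anonymized files are sorted into one folder per month.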

Result

After configuration, every document placed in the monitored folder is automatically:

  • Analyzed to locate the defined data areas
  • Anonymized by replacing the personal data with the configured replacement values
  • Saved as an anonymized version in the destination folder

The replacement is irreversible: the original data cannot be reconstructed from the anonymized version.


Tips & notes

  • Keep originals: Make sure the original documents are archived separately in case they are needed later.
  • Test: Test the anonymization with some sample documents first to ensure all relevant areas are captured.
  • Consistency: Use CSV mappings when the same data in different documents should be consistently replaced with the same pseudonyms.
  • Images: Note that this function only replaces text content. Information contained in images is not modified.
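For the test run, it can help to check the output automatically for values that should no longer appear. The following Python sketch assumes that the text of an anonymized PDF has already been extracted with a tool of your choice (that step is not shown); the `find_leaks` helper and the sample values are hypothetical.

```python
def find_leaks(extracted_text, sensitive_terms):
    """Return the sensitive terms that still occur in the text.

    Matching ignores case and collapses whitespace, so a name that is
    split across a line break is still detected.
    """
    haystack = " ".join(extracted_text.split()).casefold()
    return [
        term for term in sensitive_terms
        if " ".join(term.split()).casefold() in haystack
    ]


# Hypothetical text extracted from an anonymized test document.
output_text = "Contract between Person A and ACME Ltd.\nIBAN: [ANONYMIZED]"
terms = ["John Smith", "DE89 3704 0044 0532 0130 00"]

leaks = find_leaks(output_text, terms)
print(leaks)  # [] means none of the listed values were found
```

A check like this only covers text content. Values that appear inside images, such as scanned pages, are not detected and still need a visual review.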
