Tutorial - Anonymize documents automatically

Automatic PDF Processor - automatically process PDF files

The complete solution for automated processing of PDF documents

Automatically anonymize PDF documents - GDPR compliant

Step-by-step instructions for automated anonymization of PDF documents

The GDPR (General Data Protection Regulation) often requires personal data to be anonymized before documents can be shared. With Automatic PDF Processor, you can fully automate this process.

Typical use cases

Anonymization of customer data for training purposes
Removal of names and addresses from contracts
Redaction of account numbers and IBANs
Replacement of employee numbers with pseudonyms
Anonymization of medical records

Step 1: Create a new profile

Create a new profile with a descriptive name such as "Anonymize documents". Set up the monitored folder where documents to be anonymized will be placed.

Step 2: Create extraction rules for data to be anonymized

To anonymize data, you first need to define which areas in the document contain the data to be replaced. Go to the "Data Extraction" tab and create rules for each area to be anonymized:

Name: Define an area containing the name
Address: Define an area for the address
IBAN: Define an area for the account number

The position can be determined via a keyword (e.g., "Name:") or absolute coordinates.

When extracting data using a keyword, it is often better to set the data position to "Area of the found location" and then use "Extend data area" to shift and enlarge that area so that, for example, the text to the right of the keyword is fully captured. This usually gives more accurate positioning.
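
The idea of shifting and enlarging the found keyword area can be illustrated with a little rectangle arithmetic. This is only a conceptual sketch and not the program's implementation: the `Rect` type, the `extend_area` function, its parameters, and the coordinates are hypothetical, and a top-left page origin is assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in page units (assumption: origin at top-left)."""
    left: float
    top: float
    right: float
    bottom: float


def extend_area(found, *, shift_x=0.0, shift_y=0.0, grow_right=0.0, grow_bottom=0.0):
    """Shift the found keyword area, then enlarge it to the right and downward."""
    return Rect(
        left=found.left + shift_x,
        top=found.top + shift_y,
        right=found.right + shift_x + grow_right,
        bottom=found.bottom + shift_y + grow_bottom,
    )


# Hypothetical location of the keyword "Name:" on the page.
keyword_box = Rect(left=50, top=100, right=90, bottom=112)

# Move the area just past the keyword and widen it so that the value
# printed to the right of "Name:" falls inside the data area.
data_area = extend_area(keyword_box, shift_x=40, grow_right=160)
print(data_area)
```

Starting from the keyword's own rectangle means the data area follows the keyword if its position varies slightly from document to document, which absolute coordinates cannot do.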

Step 3: Activate the "Replace Content" task

Go to the task view and select the "Replace Content" task. This task allows you to replace the text defined in the extraction rules with another value (redaction with text replacement).

Step 4: Configure replacement rules

For each extraction rule, you can specify what value the found text should be replaced with. The following replacement sources are available:

Source	Description	Example
Fixed text	Always the same replacement text	"[ANONYMIZED]" or "XXXXX"
Random number	Random number with a configurable number of digits	"98234567"
Sequential	Sequential number	"PERSON-00001", "PERSON-00002"
Random from list	Random value from a text file	Random name from name list
CSV mapping	Value from CSV file based on key	Pseudonym based on original ID
Date/Time	Current date or time	"2024-01-01"
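
To make the differences between these sources concrete, the following sketch models several of them as small value generators. It is an illustration only: the function names, default values, and formats are assumptions chosen to match the examples in the table, not the program's internals. CSV mapping is left out here because it depends on the original value.

```python
import random
from datetime import date
from itertools import count


def fixed_text(text="[ANONYMIZED]"):
    """Always return the same replacement text."""
    return lambda: text


def random_number(digits=8, rng=random):
    """Return a random number with exactly `digits` digits, as a string."""
    low = 10 ** (digits - 1)
    high = 10 ** digits - 1
    return lambda: str(rng.randint(low, high))


def sequential(prefix="PERSON-", width=5, start=1):
    """Return PERSON-00001, PERSON-00002, ... on successive calls."""
    counter = count(start)
    return lambda: f"{prefix}{next(counter):0{width}d}"


def random_from_list(values, rng=random):
    """Pick a random entry, e.g. from the lines of a name list."""
    return lambda: rng.choice(values)


def current_date(fmt="%Y-%m-%d"):
    """Return today's date in the given format."""
    return lambda: date.today().strftime(fmt)


next_person = sequential()
print(next_person(), next_person())
print(random_number(8)())
print(random_from_list(["Alex Example", "Sam Sample"])())
```

Note that only the fixed text and a key-based mapping give the same replacement for the same input every time; random and sequential sources produce a new value on each use.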

Step 5: Set application scope

For each replacement rule, you can set the application scope:

Single occurrence: Replace only the occurrence found by the rule
All pages at same position: Same position on all pages (e.g., headers/footers)
All occurrences in document: Replace every match in the entire document
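
The three scopes differ only in how they decide which pieces of text count as a match. The sketch below shows that decision on a deliberately simplified document model (a list of pages, each a list of positioned text items). The `Scope`, `TextItem`, and `apply_replacement` names are hypothetical and do not reflect how the program stores PDF content.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    SINGLE = "single occurrence"
    SAME_POSITION = "all pages at same position"
    ALL_OCCURRENCES = "all occurrences in document"


@dataclass
class TextItem:
    position: tuple  # (x, y) on the page
    text: str


def apply_replacement(pages, hit_page, hit_index, replacement, scope):
    """Replace text in place, starting from the occurrence the rule found."""
    hit = pages[hit_page][hit_index]
    original, position = hit.text, hit.position
    for page_no, page in enumerate(pages):
        for index, item in enumerate(page):
            if scope is Scope.SINGLE:
                match = (page_no, index) == (hit_page, hit_index)
            elif scope is Scope.SAME_POSITION:
                match = item.position == position
            else:  # Scope.ALL_OCCURRENCES
                match = item.text == original
            if match:
                item.text = replacement


def sample_pages():
    """Two pages: the name is in the header of page 1 and in the body of page 2."""
    return [
        [TextItem((50, 20), "John Smith"), TextItem((50, 700), "Page 1")],
        [TextItem((50, 20), "Contract"), TextItem((50, 300), "John Smith")],
    ]


for scope in Scope:
    pages = sample_pages()
    apply_replacement(pages, 0, 0, "[ANONYMIZED]", scope)
    print(scope.value, [[item.text for item in page] for page in pages])
```

In this sample, "all pages at same position" also overwrites the unrelated header on page 2, while "all occurrences in document" catches the name in the body of page 2. That is why position-based scope suits fixed layouts such as headers and footers, and text-based scope suits values that can appear anywhere.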

Example: Replace names with pseudonyms

To consistently replace names with pseudonyms, you can use CSV mapping:

Create a CSV file with the mapping Original → Pseudonym
Set up a DynamicQueryList in the program options
Select "CSV mapping" as the replacement source
Select the appropriate list

This way, "John Smith" is always replaced with "Person A", while "Jane Doe" is always replaced with "Person B".
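
The principle behind such a mapping is a simple key lookup, sketched below. The column names `original` and `pseudonym`, the semicolon delimiter, and the fallback text are assumptions made for illustration; the layout that a DynamicQueryList actually expects may differ.

```python
import csv
import io

# Hypothetical mapping file content; in practice this would be read from disk.
CSV_TEXT = """original;pseudonym
John Smith;Person A
Jane Doe;Person B
"""


def load_mapping(handle, delimiter=";"):
    """Read the Original -> Pseudonym pairs into a dictionary."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    return {row["original"]: row["pseudonym"] for row in reader}


def pseudonymize(value, mapping, fallback="[ANONYMIZED]"):
    """Look up the pseudonym; use a fixed text for values not in the mapping."""
    return mapping.get(value.strip(), fallback)


mapping = load_mapping(io.StringIO(CSV_TEXT))
print(pseudonymize("John Smith", mapping))
print(pseudonymize("Jane Doe", mapping))
print(pseudonymize("Unknown Person", mapping))
```

Because the lookup depends only on the original value, the same person receives the same pseudonym in every document processed with the same mapping file.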

Step 6: Set destination

Specify where the anonymized documents should be saved. It is recommended to use a separate folder for the anonymized versions:

D:\Documents\Anonymized\<TodaysYear4>\<TodaysMonth>
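
As a rough illustration of how such placeholders resolve to a dated folder, the sketch below expands the two placeholders used above. It assumes that `<TodaysYear4>` stands for the four-digit year and `<TodaysMonth>` for a zero-padded two-digit month; the exact output format of the program's placeholders is not stated here, and the `expand_destination` function is hypothetical.

```python
from datetime import date


def expand_destination(template, today=None):
    """Replace the date placeholders in a destination path template."""
    today = today or date.today()
    values = {
        "<TodaysYear4>": f"{today.year:04d}",
        "<TodaysMonth>": f"{today.month:02d}",  # assumption: zero-padded month
    }
    for placeholder, value in values.items():
        template = template.replace(placeholder, value)
    return template


template = r"D:\Documents\Anonymized\<TodaysYear4>\<TodaysMonth>"
print(expand_destination(template, date(2024, 3, 5)))
```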

Result

After configuration, every document placed in the monitored folder is processed automatically:

The document is analyzed to find the defined data areas
Personal data is replaced with the configured replacement values
An anonymized version is saved in the destination folder

The replacement is irreversible - the original data cannot be reconstructed from the anonymized version.

Tips & notes

Keep originals: Make sure the original documents are archived separately in case they are needed later.
Test: Test the anonymization with some sample documents first to ensure all relevant areas are captured.
Consistency: Use CSV mappings when the same data in different documents should be consistently replaced with the same pseudonyms.
Images: Note that this function only replaces text content. Information contained in images is not modified.
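
For the test run, a simple automated check can complement a visual review. The sketch below scans text that has already been extracted from an anonymized sample document (with any PDF text extraction tool of your choice) for known original values and for IBAN-shaped strings. The `find_leftovers` function and the pattern are hypothetical helpers, the IBAN pattern is deliberately rough, and the IBAN shown is a commonly used example number.

```python
import re

# Rough IBAN shape: country code, two check digits, then groups of four
# characters with optional spaces. This is a heuristic, not a validator.
IBAN_PATTERN = re.compile(
    r"\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]{4}){3,7}(?: ?[A-Z0-9]{1,3})?\b"
)


def find_leftovers(text, known_values):
    """Return known original values and IBAN-like strings still present in text."""
    lowered = text.lower()
    leftovers = [value for value in known_values if value.lower() in lowered]
    leftovers += [match.group(0) for match in IBAN_PATTERN.finditer(text)]
    return leftovers


anonymized_text = "Name: Person A\nIBAN: [ANONYMIZED]"
untouched_text = "Name: John Smith\nIBAN: DE89 3704 0044 0532 0130 00"

print(find_leftovers(anonymized_text, ["John Smith"]))
print(find_leftovers(untouched_text, ["John Smith"]))
```

An empty result only shows that the text layer is clean. As noted above, information contained in images is not covered by a text check like this.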
