70.2 Text Extraction

70.2.1 Overview

Text extraction extends the placeholder logic: it lets you read specific values out of emails or attachments, e.g. an invoice number from the subject, a booking code from the body, or a contract partner from an attached TXT or CSV file.

In contrast to the fixed placeholders (see chapter 70.1), text extraction rules are configurable: for each rule, you define which part of the email is searched, which boundaries (from / to) delimit the range, and which additional constraint (regex or number of characters) is applied.

70.2.2 Direct regex in subject or body

The simplest variant is direct regex extraction, which needs no separate rule definition. In any input field you can write:

<BeginOfSubjectRegex>INV-\d{4}-\d{3}<EndOfRegex>$1

Mechanics: The program applies the regex to the subject. The first match (or its capture groups) is stored in the back-references $1, $2, … The <BeginOf…>…<EndOfRegex> block itself is removed from the result, so you must place the back-reference ($1) separately at the position where the found value should appear.

Without brackets in the pattern: $1 contains the full match.
With brackets in the pattern: $1 contains the first capture group, $2 the second, and so on.

Analogously, there is <BeginOfBodyRegex>...<EndOfRegex> for the body.
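To make these mechanics concrete, here is a minimal Python sketch that emulates one subject block. It is an illustration under assumptions, not the program's implementation: Python's `re` module stands in for the product's regex engine, the function name `expand_subject_regex` is hypothetical, and an unresolved back-reference is assumed to become an empty string.

```python
import re

# Finds one <BeginOfSubjectRegex>...<EndOfRegex> block in a template.
BLOCK = re.compile(r"<BeginOfSubjectRegex>(.*?)<EndOfRegex>")


def expand_subject_regex(template: str, subject: str) -> str:
    """Hypothetical emulation of the direct-regex placeholder for the subject."""
    block = BLOCK.search(template)
    if block is None:
        return template
    # Only the first match in the subject is used.
    found = re.search(block.group(1), subject)
    # The block itself is removed; only the back-references stay in the text.
    rest = template[:block.start()] + template[block.end():]
    refs: dict[str, str] = {}
    if found is not None:
        if found.re.groups == 0:
            refs["1"] = found.group(0)  # no brackets: $1 is the full match
        else:
            for i in range(1, found.re.groups + 1):
                refs[str(i)] = found.group(i) or ""  # $1, $2, ... = capture groups
    # Unresolved back-references become empty strings (assumption).
    return re.sub(r"\$(\d+)", lambda ref: refs.get(ref.group(1), ""), rest)


print(expand_subject_regex(r"<BeginOfSubjectRegex>INV-(\d{4})-(\d{3})<EndOfRegex>$1-$2",
                           "Invoice INV-2026-456 Mueller Ltd"))  # 2026-456
```

The body variant works the same way; only the searched text changes.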

Examples:

Subject	Placeholder in path	Result
`Invoice INV-2026-456 Mueller Ltd`	`<BeginOfSubjectRegex>INV-\d{4}-\d{3}<EndOfRegex>$1`	`INV-2026-456`
`Invoice INV-2026-456 Mueller Ltd`	`<BeginOfSubjectRegex>INV-(\d{4})-(\d{3})<EndOfRegex>$1-$2`	`2026-456`
`Order Number 78901`	`<BeginOfSubjectRegex>Number (\d+)<EndOfRegex>$1`	`78901`

Multiple matches: If the pattern occurs more than once in the searched text, only the first match is used; further matches are ignored. If the value you need is not the first match, tighten the pattern (e.g. with a more specific prefix or word boundaries \b).
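The effect of tightening can be tried out with Python's `re` module, used here only as a stand-in for the product's regex engine (word boundaries \b are assumed to behave the same way there). The subject line is a made-up example.

```python
import re

subject = "Ref 12345, Order 67890"

# Loose pattern: two numbers match, but only the first one (12345) is used.
loose = re.search(r"\d+", subject).group(0)

# Tighter pattern: a specific prefix plus word boundaries selects the order number.
tight = re.search(r"\bOrder (\d+)\b", subject).group(1)

print(loose, tight)  # 12345 67890
```

In a path, the tightened version would read <BeginOfSubjectRegex>\bOrder (\d+)\b<EndOfRegex>$1.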

Multiple regex blocks in the same path: When several <BeginOf…>…<EndOfRegex> blocks are used in the same input field, $1 is ambiguous. Use the numbered back-references $R1G1, $R2G1, … instead: $R1G1 refers to group 1 of the first block, $R2G1 to group 1 of the second block.
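The following sketch emulates how two blocks and their numbered back-references could be resolved. The function name `expand_blocks`, the sample subject and body, and the handling of patterns without brackets ($RnG1 taken as the full match, by analogy with $1) are assumptions for illustration; Python's `re` stands in for the product's regex engine.

```python
import re

# Matches <BeginOfSubjectRegex>...<EndOfRegex> and <BeginOfBodyRegex>...<EndOfRegex>.
BLOCK = re.compile(r"<BeginOf(Subject|Body)Regex>(.*?)<EndOfRegex>")


def expand_blocks(template: str, subject: str, body: str) -> str:
    """Hypothetical resolver for $R<block>G<group> back-references."""
    sources = {"Subject": subject, "Body": body}
    refs: dict[str, str] = {}
    for n, block in enumerate(BLOCK.finditer(template), start=1):
        found = re.search(block.group(2), sources[block.group(1)])
        if found is None:
            continue
        if found.re.groups == 0:
            groups = [found.group(0)]  # assumed: no brackets -> $RnG1 is the full match
        else:
            groups = [found.group(i) or "" for i in range(1, found.re.groups + 1)]
        for g, value in enumerate(groups, start=1):
            refs[f"R{n}G{g}"] = value
    text = BLOCK.sub("", template)  # all blocks are removed from the result
    return re.sub(r"\$(R\d+G\d+)", lambda ref: refs.get(ref.group(1), ""), text)


template = (r"<BeginOfSubjectRegex>INV-(\d{4})-\d{3}<EndOfRegex>"
            r"<BeginOfBodyRegex>Customer (\w+)<EndOfRegex>$R2G1_$R1G1")
print(expand_blocks(template, "Invoice INV-2026-456", "Customer Mueller ordered"))  # Mueller_2026
```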

70.2.3 Text Extraction Rules

For more complex extractions (e.g. multi-step range narrowing, extraction from attachments, encoding control), use text extraction rules. They are defined in the profile editor under Text Extraction.

For each rule you configure:

Field	Description
Name	Unique identifier (for the placeholder reference)
Source	Message body or Attachment (with file filter)
Encoding	ANSI, UTF-8, Unicode, or explicit code page (for attachments with a special format)
Range from	Search string or regex from which extraction begins
Range to	Search string or regex at which extraction ends
Constraint	First X characters, Last X characters, or Regex on the extracted range
Value conversion	Optional lookup table that further maps the extracted value (e.g. code -> plain text)
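If you want to reason about rule definitions outside the editor, the fields above map naturally onto a small record type. The following Python dataclass is a hypothetical model, not the program's storage format: the class, enum, and field names, the default encoding and file filter, and the pass-through of values missing from the lookup table are all assumptions.

```python
import re
from dataclasses import dataclass, field
from enum import Enum


class Source(Enum):
    MESSAGE_BODY = "body"
    ATTACHMENT = "attachment"


class ConstraintKind(Enum):
    NONE = "none"
    FIRST_CHARS = "first"
    LAST_CHARS = "last"
    REGEX = "regex"


@dataclass
class TextExtractionRule:
    """Hypothetical in-memory model of the rule fields listed above."""
    name: str
    source: Source
    encoding: str = "UTF-8"
    file_filter: str = "*.txt"     # only relevant for attachment sources
    range_from: str = ""
    range_to: str = ""
    constraint: ConstraintKind = ConstraintKind.NONE
    constraint_arg: str = ""       # character count or regex pattern
    lookup: dict[str, str] = field(default_factory=dict)

    def apply_constraint(self, raw: str) -> str:
        """Apply the constraint to the raw match."""
        if self.constraint is ConstraintKind.FIRST_CHARS:
            return raw[: int(self.constraint_arg)]
        if self.constraint is ConstraintKind.LAST_CHARS:
            n = int(self.constraint_arg)
            return raw[-n:] if n > 0 else ""
        if self.constraint is ConstraintKind.REGEX:
            m = re.search(self.constraint_arg, raw)
            return m.group(0) if m else ""
        return raw

    def convert(self, value: str) -> str:
        """Value conversion; unknown values pass through unchanged (assumption)."""
        return self.lookup.get(value, value)


rule = TextExtractionRule(name="Status", source=Source.MESSAGE_BODY,
                          constraint=ConstraintKind.FIRST_CHARS, constraint_arg="3",
                          lookup={"ABC": "Approved"})
print(rule.convert(rule.apply_constraint("ABC-2026")))  # Approved
```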

70.2.4 Using the Rule as a Placeholder

You reference a configured rule as a placeholder:

Placeholder	Effect
`<MRuleId:5(InvoiceNumber)>`	Applies the rule with ID 5 (display name “InvoiceNumber”) to the message body
`<FRuleId:7(BookingCode)>`	Applies the rule with ID 7 (display name “BookingCode”) to the matching attachment

MRuleId stands for Message-Rule (message body), FRuleId for File-Rule (file attachment). The ID is the unique key of the rule; the bracketed suffix is only a readable display name and is ignored during processing.
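Because only the ID identifies the rule, a tool that scans templates for rule placeholders can drop the bracketed display name. A minimal Python sketch, assuming the exact syntax shown in the table; the function name and the returned tuple format are hypothetical, and the display name is treated as optional only in this sketch.

```python
import re

# <MRuleId:5(InvoiceNumber)> or <FRuleId:7(BookingCode)>
PLACEHOLDER = re.compile(r"<([MF])RuleId:(\d+)(?:\(([^)]*)\))?>")
TARGETS = {"M": "message body", "F": "file attachment"}


def find_rule_references(template: str) -> list[tuple[str, int]]:
    """Return (target, rule ID) pairs; the display name is deliberately ignored."""
    return [(TARGETS[m.group(1)], int(m.group(2))) for m in PLACEHOLDER.finditer(template)]


print(find_rule_references("<MRuleId:5(InvoiceNumber)>_<FRuleId:7(BookingCode)>"))
# [('message body', 5), ('file attachment', 7)]
```

Renaming the display name (e.g. <MRuleId:5(AnyOtherName)>) still refers to rule 5.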

You select a rule from the placeholder menu, where all defined rules are listed under “Text Extraction”.

70.2.5 Range Narrowing

The two-step range narrowing (from + to) is the core logic of every rule. A rule is evaluated as follows:

1. Range from: The search string marks the start position. Everything before it, including the search string itself, is ignored.
2. Range to: The search string marks the end position. Everything from it onward is ignored.
3. The text in between is the raw match.
4. The constraint is applied to the raw match (e.g. first 20 characters).
5. Optional: the value is converted through a lookup table.

Example email body:

Dear Sir or Madam,
We hereby send you Invoice Number INV-2026-456
with a total amount of 1,234.56 EUR.
Best regards

Rule:
- Range from: Number
- Range to: with
- Constraint: none

Result: INV-2026-456
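The from/to logic of this example can be reproduced in a few lines of Python. This sketch handles plain search strings only (no regex boundaries), and the function name is hypothetical. Two behaviors are assumptions: a missing boundary yields an empty result, and surrounding whitespace is trimmed, which the documented result INV-2026-456 suggests.

```python
def narrow_range(text: str, range_from: str, range_to: str) -> str:
    """Return the raw match between the first `range_from` and the following `range_to`."""
    start = text.find(range_from)
    if start < 0:
        return ""  # assumption: no start boundary, no result
    start += len(range_from)  # the search string itself is not part of the match
    end = text.find(range_to, start)
    if end < 0:
        return ""  # assumption: no end boundary, no result
    return text[start:end]


BODY = (
    "Dear Sir or Madam,\n"
    "We hereby send you Invoice Number INV-2026-456\n"
    "with a total amount of 1,234.56 EUR.\n"
    "Best regards\n"
)

raw = narrow_range(BODY, "Number", "with")  # ' INV-2026-456\n'
print(raw.strip())  # INV-2026-456 (trimming assumed)
```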

70.2.6 Encoding and Attachment Sources

For file-based extraction (source: attachment), the program reads the attachment with the configured encoding:

Encoding	When to use
ANSI	Classic Windows text files
UTF-8	Modern text files, JSON, XML
Unicode	UTF-16 Little-Endian (typical Windows email bodies)
Code page	Explicit code page (e.g. 1252, 850) for legacy formats

Text extraction only works for plain-text attachments (e.g. TXT, CSV, XML, JSON, HTML). Binary formats are not supported.
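The encoding setting decides how the attachment's bytes become text. The sketch below maps the four options onto Python codec names; the mapping is an assumption (in particular, "ANSI" is taken to mean Windows code page 1252, which only holds on Western European systems), and the function names are hypothetical.

```python
def codec_for(option: str) -> str:
    """Map an encoding option from the rule editor to a Python codec name (assumed mapping)."""
    if option == "ANSI":
        return "cp1252"     # assumption: Western European Windows system code page
    if option == "UTF-8":
        return "utf-8-sig"  # also accepts files that start with a byte order mark
    if option == "Unicode":
        return "utf-16-le"  # UTF-16 Little-Endian, as in the table
    if option.isdigit():
        return f"cp{option}"  # explicit code page, e.g. "1252" or "850"
    raise ValueError(f"unknown encoding option: {option}")


def read_text_attachment(data: bytes, option: str) -> str:
    """Decode a plain-text attachment and drop a leading BOM character if present."""
    return data.decode(codec_for(option)).removeprefix("\ufeff")


print(read_text_attachment("Müller".encode("cp850"), "850"))  # Müller
```

A wrong setting usually does not fail loudly: reading a UTF-8 file as ANSI, for example, turns "Müller" into "MÃ¼ller".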

70.2.7 Use case

Invoice number from subject

Email subject: “Invoice INV-2026-456 from May 7.”
Path pattern: D:\Incoming-Invoices\<EmailYear4>\<BeginOfSubjectRegex>INV-\d{4}-\d{3}<EndOfRegex>$1.pdf
Processing: The regex finds INV-2026-456 and stores this value in $1; the <BeginOf…>…<EndOfRegex> block is removed from the path, and $1 is replaced by the found value.
Final result: D:\Incoming-Invoices\2026\INV-2026-456.pdf
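The whole use case can be traced with a small Python sketch. It resolves only the two placeholders that appear in this path pattern. The function name, the handling of <EmailYear4> via a date argument, and the example received date are assumptions; Python's `re` stands in for the product's regex engine.

```python
import re
from datetime import date


def resolve_path(template: str, subject: str, received: date) -> str:
    """Hypothetical resolver for <EmailYear4> and one subject regex block."""
    # <EmailYear4>: four-digit year of the email (taken from `received` here).
    path = template.replace("<EmailYear4>", f"{received.year:04d}")
    block = re.search(r"<BeginOfSubjectRegex>(.*?)<EndOfRegex>", path)
    if block:
        found = re.search(block.group(1), subject)
        if found is None:
            value = ""
        elif found.re.groups == 0:
            value = found.group(0)  # no brackets: $1 is the full match
        else:
            value = found.group(1) or ""
        # Remove the block, then fill the back-reference left in its place.
        path = path[:block.start()] + path[block.end():]
        path = path.replace("$1", value)
    return path


TEMPLATE = r"D:\Incoming-Invoices\<EmailYear4>\<BeginOfSubjectRegex>INV-\d{4}-\d{3}<EndOfRegex>$1.pdf"
print(resolve_path(TEMPLATE, "Invoice INV-2026-456 from May 7.", date(2026, 5, 7)))
# D:\Incoming-Invoices\2026\INV-2026-456.pdf
```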

70.2.8 Tips

Value conversion through a lookup table is powerful: you can turn an extracted code directly into readable plain text (see chapter 70.3).
Test new rules on sample emails in the profile editor; the preview shows the result immediately.

70.2.9 Related how-tos

How to extract text with regex - step-by-step instructions for the direct-regex variant and text extraction rules, including a beginner-friendly introduction to regex building blocks

Placeholders Using Lookup Tables
