20 Save Attachments

Task: Save Attachments

20.1 Description

The Save Attachments task extracts embedded files from a PDF document and saves them as separate files. PDF attachments can be any file type, such as Excel spreadsheets, Word documents, images, or additional PDFs.

Typical Use Cases

  • E-Invoice: Extract XML data from ZUGFeRD/Factur-X invoices
  • Document Archiving: Archive attached source files separately
  • Data Processing: Extract embedded tables for further processing
  • Backup: Save all attachments of a PDF file

20.2 General Settings

Enabled

Enable this option so the task is executed for matching PDF files. Disabled tasks are skipped.


20.3 Attachment Filter

Attachment Name Contains

Enter text that must be contained in the attachment name. Only attachments whose name contains this text are extracted.

Examples:

  • factur-x: Only ZUGFeRD XML files
  • .xlsx: Only Excel files
  • (empty): All attachments

Attachment Name Does Not Contain

Enter text that must not be contained in the attachment name. Attachments with this text in the name are excluded.

Examples:

  • thumbnail: Exclude preview images
  • .tmp: Exclude temporary files

Combined Filtering

You can combine both filters:

  • Contains: .xml
  • Does Not Contain: metadata

Result: All XML files except metadata files are extracted.
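
The combined filter can be pictured as two substring checks applied to each attachment name. The following is a minimal sketch, not the product's implementation: the function name `attachment_matches` is hypothetical, and case-insensitive matching is an assumption, since this chapter does not state how upper and lower case are handled.

```python
def attachment_matches(name: str, contains: str = "", not_contains: str = "") -> bool:
    """Return True if the attachment name passes both filters.

    Assumption: plain substring matching, ignoring case. An empty filter
    value means "no restriction", as described for the (empty) example.
    """
    lowered = name.lower()
    if contains and contains.lower() not in lowered:
        return False
    if not_contains and not_contains.lower() in lowered:
        return False
    return True


names = ["factur-x.xml", "metadata.xml", "Table.xlsx"]
selected = [n for n in names if attachment_matches(n, ".xml", "metadata")]
print(selected)  # ['factur-x.xml']
```

With both filters left empty, every name passes, which corresponds to extracting all attachments.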


20.4 Storage Location

Directory

Specify the target directory for extracted attachments. You can:

  • Enter a fixed path (e.g., D:\Attachments)
  • Select the folder via Browse…
  • Use placeholders for dynamic folder paths

Examples with Placeholders:

Input                                           Result
D:\Attachments\<TodaysYear4>\<TodaysMonth>      D:\Attachments\2024\12
D:\Customers\<RuleId:1(Customer)>\Attachments   D:\Customers\Sample Company Inc\Attachments

Note: It’s recommended to use a separate folder for each processing step to ensure clear separation.
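
The date placeholders in the first example above can be thought of as simple text substitution. The sketch below only illustrates the idea: `expand_date_placeholders` is a hypothetical helper, the zero-padded two-digit month is an assumption, and rule-based placeholders such as <RuleId:…> are deliberately left untouched.

```python
from datetime import date


def expand_date_placeholders(template: str, today: date) -> str:
    """Replace <TodaysYear4> and <TodaysMonth> in a directory template.

    Assumption: the month is written with two digits (01 to 12).
    Any other placeholder is returned unchanged.
    """
    return (
        template
        .replace("<TodaysYear4>", f"{today.year:04d}")
        .replace("<TodaysMonth>", f"{today.month:02d}")
    )


result = expand_date_placeholders(
    r"D:\Attachments\<TodaysYear4>\<TodaysMonth>", date(2024, 12, 5)
)
print(result)  # D:\Attachments\2024\12
```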

Filename

The attachment filename is preserved by default. However, you can set a custom name:

  • Leave field empty (original attachment name is used)
  • Enter a fixed name
  • Use placeholders for dynamic names

Note: When multiple attachments exist and you use a fixed name, files are handled according to the selected name collision option.

Name Collisions

Choose what should happen if a file with the target name already exists:

Option                 Description
Overwrite              Existing file is replaced
Append number          Adds a number: Attachment.pdf, Attachment(1).pdf
Append date            Adds processing date
Append date and time   Adds date and time
Cancel operation       Attachment is not saved
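
The naming pattern shown for Append number (Attachment.pdf, Attachment(1).pdf) can be sketched as a loop that counts up until a free name is found. This is an illustration under assumptions, not the product's code: `next_free_name` is a hypothetical helper, and it assumes that counting starts at 1 and that the number is placed directly before the file extension.

```python
import tempfile
from pathlib import Path


def next_free_name(directory: Path, filename: str) -> Path:
    """Return a path inside `directory` that does not exist yet.

    Assumption: the first collision yields name(1).ext, the next name(2).ext.
    """
    candidate = directory / filename
    stem, suffix = candidate.stem, candidate.suffix
    counter = 1
    while candidate.exists():
        candidate = directory / f"{stem}({counter}){suffix}"
        counter += 1
    return candidate


# Save three attachments that all use the same fixed name.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    saved = []
    for _ in range(3):
        target = next_free_name(folder, "Attachment.pdf")
        target.write_bytes(b"")  # stand-in for the extracted attachment
        saved.append(target.name)

print(saved)  # ['Attachment.pdf', 'Attachment(1).pdf', 'Attachment(2).pdf']
```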

20.5 File Date

Adjust Creation and Modification Date

Optionally, you can change the file date of extracted attachments:

Option                               Description
Do not change                        File automatically receives the current date
Creation date of original file       Uses the creation date of the PDF file
Modification date of original file   Uses the modification date of the PDF file
PDF creation date                    Uses the creation date from the PDF metadata
Extracted date                       A date obtained with an extraction rule
Current date                         Sets today’s date
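
To make the "Modification date of original file" option concrete, the sketch below copies the modification time of a source PDF onto an extracted file using Python's standard `os.utime`. It only illustrates the concept: `copy_modification_time` and the file names are hypothetical, and the creation date is not handled because it cannot be set portably from Python.

```python
import os
import tempfile
from pathlib import Path


def copy_modification_time(source_pdf: Path, extracted_file: Path) -> None:
    """Give the extracted file the same access and modification time as the PDF."""
    source_stat = source_pdf.stat()
    os.utime(extracted_file, ns=(source_stat.st_atime_ns, source_stat.st_mtime_ns))


with tempfile.TemporaryDirectory() as tmp:
    pdf = Path(tmp) / "Invoice.pdf"
    xml = Path(tmp) / "factur-x.xml"
    pdf.write_bytes(b"")
    xml.write_bytes(b"")
    os.utime(pdf, (1_700_000_000, 1_700_000_000))  # pretend the PDF is older
    copy_modification_time(pdf, xml)
    same_mtime = xml.stat().st_mtime_ns == pdf.stat().st_mtime_ns

print(same_mtime)
```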

20.6 Afterwards

Call External Program

After saving each attachment, an external program can be started automatically.

Program: Path to executable file

Parameters: Command line parameters. Available placeholders:

  • <PathIncludingFilename>: Full path of attachment
  • <ParentDirectory>: Path of parent folder
  • <Filename>: Filename of attachment

Example: Automatically open extracted Excel file:

  • Program: cmd.exe
  • Parameters: /c start "" "<PathIncludingFilename>"
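
How the three placeholders relate to a saved attachment can be shown with a small substitution sketch. `fill_parameters` is a hypothetical helper and the path D:\Attachments\Table.xlsx is an invented example. The sketch only builds the parameter string and does not start any program.

```python
from pathlib import PureWindowsPath


def fill_parameters(parameters: str, attachment_path: str) -> str:
    """Substitute the documented placeholders for one saved attachment."""
    path = PureWindowsPath(attachment_path)
    replacements = {
        "<PathIncludingFilename>": str(path),    # full path of the attachment
        "<ParentDirectory>": str(path.parent),   # folder containing the attachment
        "<Filename>": path.name,                 # filename only
    }
    for placeholder, value in replacements.items():
        parameters = parameters.replace(placeholder, value)
    return parameters


command_line = fill_parameters(
    '/c start "" "<PathIncludingFilename>"', r"D:\Attachments\Table.xlsx"
)
print(command_line)  # /c start "" "D:\Attachments\Table.xlsx"
```

Keeping the double quotes around the placeholder, as in the example above, keeps paths that contain spaces intact as a single argument.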


20.7 Example: Extract ZUGFeRD XML

Initial Situation

You receive electronic invoices in ZUGFeRD format. These contain an embedded XML file with structured invoice data that you want to extract for your accounting software.

Configuration

  1. Enabled: Yes
  2. Attachment name contains: factur-x or zugferd
  3. Attachment name does not contain: (empty)
  4. Directory: D:\ZUGFeRD\XML
  5. Filename: <RuleId:1(InvoiceNo)>.xml
  6. On name collision: Append number

Result

PDF File                                      Extracted Attachment
Invoice_2024001.pdf (contains factur-x.xml)   D:\ZUGFeRD\XML\2024001.xml

20.8 Example: Extract All Attachments from Document Collection

Initial Situation

You receive PDF documents with various embedded files (images, tables, additional PDFs) that should all be extracted.

Configuration

  1. Enabled: Yes
  2. Attachment name contains: (empty - all attachments)
  3. Attachment name does not contain: (empty)
  4. Directory: D:\Extracted\<FileName>
  5. Filename: (empty - keep original names)
  6. On name collision: Append number

Result

For each PDF, a subfolder with the PDF name is created containing all extracted attachments:

D:\Extracted\
├── Report2024\
│   ├── Table.xlsx
│   ├── Chart.png
│   └── SourceData.csv
└── Presentation\
    ├── Logo.png
    └── Notes.docx

20.9 Tips and Notes

No Attachments Present

If a PDF contains no attachments, the task is skipped without error and no files are extracted.

Check Attachments

To check if a PDF contains attachments:

  1. Open the PDF in a PDF viewer
  2. Look for a paperclip symbol or an attachments section
  3. Alternatively, use the “Attachment count” filter in the profile settings

Filtering with Regular Expressions

The “Attachment name contains” and “Attachment name does not contain” fields support regular expressions:

  • <BeginOfRegex>.*\.xml$<EndOfRegex>: All files with .xml extension
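
To see which names the pattern between the markers selects, it can be tried against a few sample names. The sketch uses Python's `re` module purely as an illustration. The regular expression engine used by the product is not named in this chapter, so details such as case sensitivity may differ. The sample names are invented.

```python
import re

# The expression between <BeginOfRegex> and <EndOfRegex> in the example above.
pattern = re.compile(r".*\.xml$")

names = ["factur-x.xml", "factur-x.xml.bak", "Table.xlsx", "xml-notes.txt"]
matching = [n for n in names if pattern.search(n)]
print(matching)  # ['factur-x.xml']
```

The `$` anchor is what distinguishes this from a plain "contains .xml" filter: a name such as factur-x.xml.bak contains the text .xml but does not end with it.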

Combine with Other Tasks

Typical combinations:

  1. Save Attachments + Copy File: Archive invoice and extract XML
  2. Save Attachments + Send Email: Send XML to accounting
  3. Save Attachments + Rename File: Rename PDF based on extracted data

ZUGFeRD/Factur-X Standard

For ZUGFeRD/Factur-X invoices, the embedded XML file is typically named:

  • factur-x.xml (Factur-X)
  • zugferd-invoice.xml (ZUGFeRD 1.0)
  • xrechnung.xml (XRechnung)

File Types

PDF attachments can have any file type. The task extracts files unchanged. The file extension is preserved.