Extract data from structured and unstructured emails and documents.
Automatically find and extract data that is locked in emails, attachments, images, documents and archives.
Documents contain data.
Sphereon offers functions that achieve exceptional results in extracting data from digital documents by combining different technologies and multiple engines for:
- Optical Character Recognition (OCR)
- Intelligent Character Recognition (ICR) for handwriting
- Barcode and QR-code Recognition
- Optical Mark Recognition (OMR)
Data Extraction
Sphereon offers functions to capture data both from structured documents, such as forms and questionnaires, and from more difficult unstructured documents, such as e-mail, correspondence, requests, complaints, invoices and orders, where the data is not in a fixed position but can be anywhere in the document.
Sphereon offers several functions for data extraction:
Format extraction
Regular Expressions can be used to find values in a document by their format.
For example, a valid Visa card number: ^4[0-9]{12}(?:[0-9]{3})?$
All Visa card numbers start with a 4. New cards have 16 digits. Old cards have 13.
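As a minimal sketch of format extraction, the pattern above can be applied with Python's standard `re` module. The function name `is_visa_format` and the stripping of spaces and dashes are assumptions made for illustration, not part of Sphereon's API. The check covers the format only and does not prove that the card exists.

```python
import re

# Pattern quoted from the text: a leading 4, then 12 digits, then optionally 3 more.
VISA_PATTERN = re.compile(r"^4[0-9]{12}(?:[0-9]{3})?$")


def is_visa_format(candidate: str) -> bool:
    """Return True if the candidate has the Visa format (13 or 16 digits, starting with 4)."""
    # Assumption: spaces and dashes are cosmetic separators and may be removed first.
    digits = re.sub(r"[ -]", "", candidate)
    return VISA_PATTERN.fullmatch(digits) is not None
```

Called with a 16-digit string that starts with 4, the function returns True. A 14-digit string or a number starting with 5 is rejected.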
Key-Value extraction
Regular Expressions can also be used to specify a key, for instance ‘Invoice Number’, and the value that needs to be extracted. Sphereon can also evaluate the position of the value relative to the key, such as ‘Right of’ or ‘Below’, and score the results accordingly.
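The idea of scoring a candidate value by its position relative to the key could be sketched as follows. Everything in this example is hypothetical: the `Token` structure, the pixel thresholds, the score values and the patterns are illustrative assumptions, not Sphereon's actual scoring model.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Token:
    text: str
    x: float  # left edge of the token on the page
    y: float  # top edge of the token (increasing downward)


# Hypothetical patterns for the key and for plausible values.
KEY_PATTERN = re.compile(r"invoice\s*(number|no\.?)", re.IGNORECASE)
VALUE_PATTERN = re.compile(r"^[A-Z0-9-]{4,}$")


def score_position(key: Token, value: Token) -> float:
    """Score a candidate by where it sits relative to the key (assumed weights)."""
    dx = value.x - key.x
    dy = value.y - key.y
    if abs(dy) < 5 and dx > 0:        # 'Right of': same line, further right
        return 1.0
    if abs(dx) < 20 and 0 < dy < 40:  # 'Below': roughly the same column, next line
        return 0.8
    return 0.1                        # anywhere else on the page


def extract_value(tokens: List[Token]) -> Optional[Tuple[str, float]]:
    """Return the best-scoring (value, score) pair, or None if nothing matches."""
    keys = [t for t in tokens if KEY_PATTERN.search(t.text)]
    candidates = [t for t in tokens if VALUE_PATTERN.match(t.text)]
    best: Optional[Tuple[str, float]] = None
    for key in keys:
        for cand in candidates:
            if cand is key:
                continue
            score = score_position(key, cand)
            if best is None or score > best[1]:
                best = (cand.text, score)
    return best
```

In this sketch a value on the same line to the right of the key outranks one directly below it, and both outrank a matching value elsewhere on the page.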
Knowledge Base assisted extraction
Using our Knowledge Base system for extraction improves the extraction results over time. It can also weigh other dimensions when deciding on the best possible result, such as formats, historical results and relationships with other data.
Data Validation
To confirm the recognized data or to increase confidence in it, several functions are available for validating the captured data:
Format checks
Regular Expressions enable checks on the format of the data, from simple to complex: for example, from simply checking that a value is numeric to checking the validity of an IBAN code. They can also be used to split a value into multiple values or to substitute data.
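A complex format check such as IBAN validation can be sketched as a regular expression for the overall shape, followed by the standard mod-97 checksum. The checksum needs arithmetic that a regular expression alone cannot express. The function name `is_valid_iban` is an assumption for illustration, and country-specific lengths are deliberately not checked here.

```python
import re

# Overall shape: 2-letter country code, 2 check digits, 11 to 30 alphanumeric characters.
IBAN_SHAPE = re.compile(r"^[A-Z]{2}[0-9]{2}[A-Z0-9]{11,30}$")


def is_valid_iban(raw: str) -> bool:
    """Check the IBAN shape and its mod-97 checksum (no per-country length check)."""
    iban = re.sub(r"\s+", "", raw).upper()
    if not IBAN_SHAPE.fullmatch(iban):
        return False
    # Move the country code and check digits to the end.
    rearranged = iban[4:] + iban[:4]
    # Replace letters by numbers (A=10 ... Z=35); digits stay as they are.
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1
```

A single mistyped digit changes the remainder, so the check catches many OCR misreads as well as typing errors.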
Database lookups
One of the most powerful checks is a validation of a value against the known data in a trusted database. This also enables the retrieval of data from a database and adding those as additional data to a document.
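A database lookup of this kind might look like the sketch below, which uses Python's built-in `sqlite3` module with an in-memory database. The `policies` table, its columns and the sample row are invented for illustration. A real deployment would query its own trusted source.

```python
import sqlite3
from typing import Dict, Optional


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with one hypothetical policy record."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE policies (policy_number TEXT PRIMARY KEY, last_name TEXT, city TEXT)"
    )
    conn.execute("INSERT INTO policies VALUES (?, ?, ?)", ("P-1001", "Jansen", "Utrecht"))
    conn.commit()
    return conn


def lookup_policy(conn: sqlite3.Connection, policy_number: str) -> Optional[Dict[str, str]]:
    """Validate a captured policy number and return the extra data stored for it."""
    row = conn.execute(
        "SELECT last_name, city FROM policies WHERE policy_number = ?",
        (policy_number,),
    ).fetchone()
    if row is None:
        return None  # unknown value: validation fails
    return {"last_name": row[0], "city": row[1]}
```

A hit both confirms the captured value and supplies additional fields that can be attached to the document. A miss signals that the value needs further checking.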
Validation rules, Business Rules
Logical validation checks can be performed by using validation rules, also known as business rules, as in a Business Rules Management System (BRMS). Examples are checks like Net Amount + VAT Amount = Gross Amount, or whether a Last Name found in the document is the same as the Last Name retrieved from a database using the Policy Number.
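The two example rules can be written down as small functions. This is a sketch only: the function names, the rounding tolerance of 0.01 and the case-insensitive name comparison are assumptions, and a BRMS would normally express such rules in its own rule language.

```python
from decimal import Decimal


def amounts_consistent(net: str, vat: str, gross: str,
                       tolerance: Decimal = Decimal("0.01")) -> bool:
    """Rule: Net Amount + VAT Amount = Gross Amount, within an assumed rounding tolerance."""
    # Decimal avoids the rounding surprises of binary floating point for money.
    return abs(Decimal(net) + Decimal(vat) - Decimal(gross)) <= tolerance


def last_names_match(extracted: str, reference: str) -> bool:
    """Rule: the extracted Last Name equals the one retrieved from the database."""
    # Assumption: surrounding whitespace and letter case are not significant.
    return extracted.strip().casefold() == reference.strip().casefold()
```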
User validation
When automatic checks are not possible, do not give enough confidence, give conflicting results or fail altogether, documents and data can be checked manually by users. They can even be checked “blind” by multiple users.
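One way to combine “blind” entries from several users is to accept a value only when enough of them independently agree. The sketch below assumes a simple agreement threshold. The function name and the threshold are illustrative, not a description of how Sphereon resolves such entries.

```python
from collections import Counter
from typing import List, Optional


def resolve_blind_entries(entries: List[str], min_agreement: int = 2) -> Optional[str]:
    """Return the value that at least `min_agreement` users entered, else None."""
    # Ignore empty entries and surrounding whitespace.
    normalized = [e.strip() for e in entries if e.strip()]
    if not normalized:
        return None
    value, count = Counter(normalized).most_common(1)[0]
    return value if count >= min_agreement else None
```

When the users disagree, the function returns None, which can be used to route the document to a further review step.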
Knowledge bases
Data can be processed into Knowledge Bases during validation. Combining the results of the different checks with user input “teaches” the system and makes it “smarter”.