Document to Text

The Document to Text step lets you extract the text content from a document file. We use deterministic AI models to parse content from a variety of file types, including PDFs, Word documents, and more.

Options

Name	Type	Description
File	Document File	The document you want to extract text from.

Outputs

Name	Type	Description
File Contents	Plain Text	The text extracted from the document.

Tips

If the document has structured elements, such as tables or multiple side-by-side elements, the extracted text may be parsed incorrectly. We do our best to correct these issues, but if they occur consistently, we recommend cleaning up the output with a Generate Text step.

Supported File Types

OCR supports only the following MIME types:

application/pdf
image/gif
image/tiff
image/jpeg
image/png
image/bmp
image/webp

Excel spreadsheets and Word documents are parsed using third-party libraries. Any other file whose MIME type is not in the list above is converted to plain text.
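
To check in advance how a file is likely to be handled, the routing described above can be sketched in Python. This is only an illustration, not the step's actual implementation: the function name `choose_parser`, the return labels, and the specific Word and Excel MIME types are assumptions, since this page does not list them.

```python
# MIME types this page lists as supported for OCR.
OCR_MIME_TYPES = frozenset({
    "application/pdf",
    "image/gif",
    "image/tiff",
    "image/jpeg",
    "image/png",
    "image/bmp",
    "image/webp",
})

# Assumption: common Word and Excel MIME types. The page only says that
# "Excel spreadsheets and Word documents" use third-party libraries.
OFFICE_MIME_TYPES = frozenset({
    "application/msword",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "application/vnd.ms-excel",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
})


def choose_parser(mime_type: str) -> str:
    """Return a hypothetical label for how a file of this MIME type is handled."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    normalized = mime_type.split(";", 1)[0].strip().lower()
    if normalized in OCR_MIME_TYPES:
        return "ocr"
    if normalized in OFFICE_MIME_TYPES:
        return "office-library"
    # Everything else is converted to plain text.
    return "plaintext"


if __name__ == "__main__":
    for example in ("application/pdf", "text/csv"):
        print(example, "->", choose_parser(example))
```

A check like this can be useful before uploading a batch of files, for example to flag files that will fall through to plain-text conversion.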
