File Tools
Document to Text
The Document to Text step extracts the text content from a document file. We use deterministic AI models to parse content from a variety of file types, including PDFs and Word documents.
Options
Name | Type | Description |
---|---|---|
File | Document File | The document you want to extract text from. |
Outputs
Name | Type | Description |
---|---|---|
File Contents | Plain Text | The text extracted from the document. |
Tips
- If the document has structured elements, such as tables or multiple side-by-side elements, the extracted text may be parsed incorrectly. We try our best to remediate these issues, but if this happens consistently, we recommend cleaning up the output with a Generate Text step.
Supported File Types
The only MIME types supported for OCR are:
- application/pdf
- image/gif
- image/tiff
- image/jpeg
- image/png
- image/bmp
- image/webp
Excel spreadsheets and Word documents are parsed using third-party libraries. Any other file whose MIME type is not in the list above is converted to plain text.
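The routing described above can be pictured as a simple lookup on the file's MIME type. The following is only an illustrative sketch, not the step's actual implementation: the function name `choose_extraction_path`, the returned labels, and the Excel and Word MIME types are assumptions made for the example. Only the OCR list comes from this page.

```python
# MIME types this page lists as supported for OCR.
OCR_MIME_TYPES = {
    "application/pdf",
    "image/gif",
    "image/tiff",
    "image/jpeg",
    "image/png",
    "image/bmp",
    "image/webp",
}

# Assumption: the standard MIME types for Excel and Word files.
# This page does not say which ones the step actually recognizes.
OFFICE_MIME_TYPES = {
    "application/vnd.ms-excel",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    "application/msword",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
}


def choose_extraction_path(mime_type: str) -> str:
    """Return a hypothetical label for how a file would be handled."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    base_type = mime_type.split(";")[0].strip().lower()
    if base_type in OCR_MIME_TYPES:
        return "ocr"
    if base_type in OFFICE_MIME_TYPES:
        return "library"
    # Everything else is treated as plain text.
    return "plaintext"


if __name__ == "__main__":
    for example in ("application/pdf", "application/msword", "text/csv"):
        print(example, "->", choose_extraction_path(example))
```

A check like this can be useful before a workflow runs, for example to warn that a CSV file will be read as plain text rather than sent through OCR.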