# Document to Text

The Document to Text step lets you extract the text content from a document file. We use deterministic AI models to parse content from a variety of file types, including PDFs, Word documents, and more.

![Document to Text step](https://mintlify.s3-us-west-1.amazonaws.com/respell-docs-v2/_images/steps-reference/document-to-text.png)

## Options

| Name | Type          | Description                                 |
| ---- | ------------- | ------------------------------------------- |
| File | Document File | The document you want to extract text from. |

## Outputs

| Name          | Type       | Description                           |
| ------------- | ---------- | ------------------------------------- |
| File Contents | Plain Text | The text extracted from the document. |

## Tips

* If the document has structured elements, such as tables or multiple side-by-side elements, the extracted text may be parsed incorrectly. We try our best to remediate these issues, but if this happens consistently, we recommend cleaning up the output with a Generate Text step.

## Supported File Types

The only MIME types supported for OCR are:

* application/pdf
* image/gif
* image/tiff
* image/jpeg
* image/png
* image/bmp
* image/webp

Excel spreadsheets and Word documents are parsed using third-party libraries. Any other file whose MIME type is not in the list above is converted to plain text.
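To anticipate how a given file will be handled, the routing described above can be sketched roughly as follows. This is only an illustration, not Respell's implementation: the `extraction_route` helper, the route names, and the Office MIME types are assumptions (this page does not list the exact MIME types used for Excel and Word files).

```python
# MIME types this page lists as supported for OCR.
OCR_MIME_TYPES = {
    "application/pdf",
    "image/gif",
    "image/tiff",
    "image/jpeg",
    "image/png",
    "image/bmp",
    "image/webp",
}

# Assumption: the standard MIME types for Word and Excel files.
# The page only says these formats are parsed with third-party libraries.
OFFICE_MIME_TYPES = {
    "application/msword",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "application/vnd.ms-excel",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
}


def extraction_route(mime_type: str) -> str:
    """Hypothetical helper: return 'ocr', 'library', or 'plaintext'."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    normalized = mime_type.split(";", 1)[0].strip().lower()
    if normalized in OCR_MIME_TYPES:
        return "ocr"
    if normalized in OFFICE_MIME_TYPES:
        return "library"
    # Everything else is converted to plain text.
    return "plaintext"


if __name__ == "__main__":
    for example in ("application/pdf", "application/msword", "text/csv"):
        print(example, "->", extraction_route(example))
```

A check like this can be useful before the step runs, for example to flag files that will fall through to the plain-text conversion and may need cleanup afterward.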
