Read website is a simple step that extracts the text content from a provided URL. It can read most public website pages, but pages that require authentication or that sit behind sophisticated bot protection (e.g. CAPTCHAs) may not be readable.

The text content returned by this step may not be formatted for readability, and it includes all text on the page (even hidden text), so it is often quite verbose. In most cases, you’ll want to process this data with a Generate Text, Extract Fields, or similar step directly afterwards.
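To illustrate why the output tends to be verbose, here is a minimal Python sketch of naive text extraction. It is not Respell's implementation; `TextCollector`, `extract_text`, and `SAMPLE_HTML` are hypothetical names, and skipping `<script>` and `<style>` content is an assumption made for this example. Because the parser has no notion of CSS visibility, navigation text and hidden elements end up in the result alongside the main content.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects every text node, with no awareness of CSS visibility."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        # Assumption for this sketch: script and style bodies are not page text.
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html: str) -> str:
    """Return all collected text nodes, one per line."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


# Hypothetical page: the hidden <div> is invisible in a browser,
# but its text is still part of the markup.
SAMPLE_HTML = """
<html><body>
  <nav>Home | Pricing | Docs</nav>
  <h1>Acme Widgets</h1>
  <p>Our widgets ship worldwide.</p>
  <div style="display:none">Cookie banner text</div>
</body></html>
"""

if __name__ == "__main__":
    print(extract_text(SAMPLE_HTML))
```

Running this on the sample prints the navigation bar and the hidden cookie banner text along with the heading and paragraph, which is the kind of noise a follow-up Generate Text or Extract Fields step can filter out.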

Options

Name | Type | Description
URL | URL | The URL of the website page you want to extract text content from.

Outputs

Name | Type | Description
Website Contents | Plain Text | The full text content of the website page.

Tips

  • If you receive an error about not being able to access the website page, the site may be blocking the request. In most cases there is little we can do, but feel free to reach out to the Respell team to ask whether a workaround is available.
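
Before contacting support, it can help to check how the page responds to a plain HTTP request from your own machine. The Python sketch below uses only the standard library; `describe_status` and `check_url` are hypothetical helper names, and the result is only a rough hint, since the way Respell fetches pages is not documented here and a site may treat requests from different networks differently.

```python
import urllib.error
import urllib.request


def describe_status(code: int) -> str:
    """Map an HTTP status code to a rough explanation."""
    if 200 <= code < 300:
        return "reachable"
    if code == 401:
        return "requires authentication"
    if code == 403:
        return "access refused (possibly bot protection)"
    if code == 429:
        return "rate limited"
    return f"unexpected status {code}"


def check_url(url: str, timeout: float = 10.0) -> str:
    """Fetch the URL once and describe the outcome."""
    # A browser-like User-Agent is an assumption; some sites reject the default one.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return describe_status(response.status)
    except urllib.error.HTTPError as err:
        # HTTPError must be caught before URLError, which is its parent class.
        return describe_status(err.code)
    except urllib.error.URLError as err:
        return f"could not connect: {err.reason}"
```

Calling `check_url` with the page's URL returns a short description. A result such as "requires authentication" or "access refused" suggests the page falls into the categories this step may not be able to read.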