Some websites are partially or entirely rendered on the client, that is, in your web browser. If you search the initial HTML for elements that are only created by client-side JavaScript, you won’t find them.

One solution is to use a headless browser: a web browser that runs in the background without a visible window, fetches the page, renders it, and then lets you search the final document.

Headless browsers aren’t a good fit for Val Town because of the resources they require to run. However, services like Browserless provide APIs for interacting with a hosted headless browser, such as their /scrape API.

1. Sign up for Browserless and grab your API Key

Copy your API Key from https://cloud.browserless.io/account/ and save it as a Val Town secret named browserlessKey.

2. Make an API call to the /scrape API

Check the documentation for the /scrape API and form your request.

For example, here’s how to scrape the introductory paragraph of OpenAI’s Wikipedia page.

https://www.val.town/embed/vtdocs.browserlessScrapeExample
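
For readers who can’t load the embedded val, the request can be sketched roughly as follows. This is a minimal sketch, not the val’s actual source: the endpoint URL, the `token` query parameter, the `{ url, elements: [{ selector }] }` body, and the `data[0].results[0].text` response shape are assumptions based on the /scrape API and should be checked against the Browserless documentation. The names `buildScrapeRequest`, `firstText`, and `scrapeText` are hypothetical helpers introduced for illustration.

```typescript
// Assumed endpoint; check the Browserless documentation for the current one.
const SCRAPE_ENDPOINT = "https://chrome.browserless.io/scrape";

// Assumed (partial) shape of a /scrape response: one entry per requested selector.
interface ScrapeResponse {
  data: { selector: string; results: { text: string }[] }[];
}

// Build the URL and fetch options for a /scrape call. No network access happens here.
function buildScrapeRequest(token: string, url: string, selector: string) {
  return {
    endpoint: `${SCRAPE_ENDPOINT}?token=${encodeURIComponent(token)}`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ url, elements: [{ selector }] }),
    },
  };
}

// Return the text of the first match for the first selector, or null if nothing matched.
function firstText(response: ScrapeResponse): string | null {
  return response.data[0]?.results[0]?.text ?? null;
}

// Send the request and extract the text of the first matching element.
async function scrapeText(
  token: string,
  url: string,
  selector: string,
): Promise<string | null> {
  const { endpoint, init } = buildScrapeRequest(token, url, selector);
  const res = await fetch(endpoint, init);
  if (!res.ok) {
    throw new Error(`Browserless responded with HTTP ${res.status}`);
  }
  return firstText((await res.json()) as ScrapeResponse);
}
```

Under these assumptions, calling `scrapeText(key, "https://en.wikipedia.org/wiki/OpenAI", "p:nth-of-type(2)")`, where `key` holds the value of the browserlessKey secret, should resolve to the text of the second paragraph on the page.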

Browserless also offers APIs for taking screenshots and generating PDFs of websites.

3. Alternatively, use Puppeteer and a browser running on Browserless

You can use the Puppeteer library to connect to a browser instance running on Browserless.

Once you’ve navigated to a page, you can run arbitrary JavaScript with page.evaluate, for example to get the text of a paragraph.

https://www.val.town/embed/vtdocs.browserlessPuppeteerExample
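
The flow in the embedded val (connect, open a page, navigate, evaluate, close) can be sketched as below. This is an illustrative sketch rather than the val’s source. To stay self-contained it does not import Puppeteer; instead it takes a `connect` function as a parameter and declares small structural types (`PageLike`, `BrowserLike`, `ConnectFn`, all hypothetical) covering only the Puppeteer methods it uses. The WebSocket endpoint `wss://chrome.browserless.io?token=...` is an assumption to confirm in the Browserless documentation.

```typescript
// Minimal structural types covering only the Puppeteer methods used below.
interface PageLike {
  goto(url: string): Promise<unknown>;
  evaluate(expression: string): Promise<unknown>;
}
interface BrowserLike {
  newPage(): Promise<PageLike>;
  close(): Promise<void>;
}
type ConnectFn = (options: { browserWSEndpoint: string }) => Promise<BrowserLike>;

// Assumed WebSocket endpoint; confirm it in the Browserless documentation.
function browserlessWsEndpoint(token: string): string {
  return `wss://chrome.browserless.io?token=${encodeURIComponent(token)}`;
}

// Build the JavaScript expression that runs inside the remote page.
// JSON.stringify quotes the selector safely inside the expression.
function innerTextExpression(selector: string): string {
  return `document.querySelector(${JSON.stringify(selector)})?.innerText ?? null`;
}

// Connect to the hosted browser, load the page, and read one element's text.
async function getInnerText(
  connect: ConnectFn,
  token: string,
  url: string,
  selector: string,
): Promise<string | null> {
  const browser = await connect({ browserWSEndpoint: browserlessWsEndpoint(token) });
  try {
    const page = await browser.newPage();
    await page.goto(url);
    const text = await page.evaluate(innerTextExpression(selector));
    return typeof text === "string" ? text : null;
  } finally {
    // Always release the remote browser, even if navigation or evaluation fails.
    await browser.close();
  }
}
```

In a val, `connect` would be the connect method of whichever Puppeteer build you import; passing it in also makes the function easy to exercise with a fake browser.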