Ensure you have installed Web Transpose Lite:

pip install "webtranspose[lite]"

Crawl a website in just a few lines.

import webtranspose as webt

crawl = webt.Crawl(
    "https://www.example.com",
    max_pages=100,
    render_js=True,
)
# crawl() is a coroutine, so run it in an async context
# (e.g. a notebook, or wrapped in asyncio.run()).
await crawl.crawl()

Webᵀ Crawl also downloads PDFs and other documents it encounters while crawling.

Saving and Re-accessing Crawl

Creating a Crawl assigns it a unique crawl.crawl_id, which you can use later to retrieve the same crawl.

crawl_id = crawl.crawl_id

crawl = webt.get_crawl(crawl_id)

Crawl Parameters

url
string
required

Base URL to crawl

max_pages
int
default: 100

Maximum number of pages to crawl. (Can be updated later.)

render_js
bool
default: false

Render JavaScript on the website.

allowed_urls
string[]

A list of allowed URL patterns.

Example: ["https://webtranspose/*"]

banned_urls
string[]

A list of banned URL patterns.

Example: ["https://webtranspose/blogs/*"]
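The library's exact matching rules aren't spelled out here, but glob-style patterns like the examples above can be illustrated with Python's standard fnmatch module. The is_allowed helper below is a hypothetical sketch of the filtering semantics, not part of webtranspose:

```python
from fnmatch import fnmatch

def is_allowed(url, allowed_urls=None, banned_urls=None):
    """Illustrative glob-style filter: a URL passes if it matches no
    banned pattern and, when an allow-list is given, matches at least
    one allowed pattern."""
    if banned_urls and any(fnmatch(url, p) for p in banned_urls):
        return False
    if allowed_urls:
        return any(fnmatch(url, p) for p in allowed_urls)
    return True

# Matches the banned pattern, so it is filtered out.
print(is_allowed("https://webtranspose/blogs/post-1",
                 banned_urls=["https://webtranspose/blogs/*"]))  # False
```

Note that a banned pattern takes precedence in this sketch: a URL matching both lists is excluded.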

verbose
bool

Extra logging to help with debugging.

n_workers
int
default: 1

Number of CPU processes to use when running locally.
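n_workers only applies to local crawls. One common way to pick a value is to use most of the machine's cores while leaving one free for the main process; this is an illustration, not a library default:

```python
import os

# os.cpu_count() can return None on some platforms, so fall back to 1.
cpu_count = os.cpu_count() or 1

# Leave one core for the main process, but never go below 1 worker.
n_workers = max(1, cpu_count - 1)
```

You would then pass this as `Crawl(..., n_workers=n_workers)`.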