Ensure you have installed Web Transpose Lite:

pip install webtranspose[lite]
Crawl a website in just 3 lines.
import webtranspose as webt

crawl = webt.Crawl(
  "https://www.example.com",
  max_pages=100,
  render_js=True,
)
await crawl.crawl()
Webᵀ Crawl also handles downloading PDFs and other documents it encounters while crawling.
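Because crawl.crawl() is a coroutine, `await` only works inside an async context. The pattern for driving it from a plain script can be sketched as follows (run_crawl here is a hypothetical stand-in for the real call, so the snippet stays self-contained):

```python
import asyncio

# Hypothetical stand-in for the real coroutine. In an actual script you
# would do: crawl = webt.Crawl(...); await crawl.crawl() inside this function.
async def run_crawl():
    await asyncio.sleep(0)  # placeholder for the actual crawl work
    return "done"

# From synchronous code, drive the coroutine with asyncio.run():
result = asyncio.run(run_crawl())
```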

Saving and Re-accessing a Crawl

Creating a Crawl assigns it a unique crawl.crawl_id, which can be used later to retrieve the same crawl.
crawl_id = crawl.crawl_id

crawl = webt.get_crawl(crawl_id)

Crawl Parameters

url
string
required
Base URL to crawl
max_pages
int
default:100
Maximum number of pages to crawl. (Can be updated later.)
render_js
bool
default:false
Render JavaScript on the website.
allowed_urls
array
A list of allowed URL patterns. Example: ["https://webtranspose/*"]
banned_urls
array
A list of banned URL patterns. Example: ["https://webtranspose/blogs/*"]
verbose
bool
Extra logging to help with debugging.
n_workers
int
default:1
Number of worker processes to use when running locally.
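The allowed_urls and banned_urls examples above use glob-style wildcards. The library's exact matching rules are internal, but the apparent semantics can be sketched with Python's standard fnmatch module (is_allowed is a hypothetical helper, not part of the library):

```python
from fnmatch import fnmatch

def is_allowed(url: str, allowed: list[str], banned: list[str]) -> bool:
    # A URL matching any banned pattern is rejected, even if it also
    # matches an allowed pattern.
    if any(fnmatch(url, pattern) for pattern in banned):
        return False
    # Otherwise it must match at least one allowed pattern.
    return any(fnmatch(url, pattern) for pattern in allowed)
```

For example, with allowed=["https://webtranspose/*"] and banned=["https://webtranspose/blogs/*"], a docs page passes while a blog post is skipped.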