Webᵀ Crawl
Crawl with Webᵀ Lite
Crawl a website locally using Webᵀ Lite’s built-in crawler.
Ensure you have installed Web Transpose Lite:

```bash
pip install "webtranspose[lite]"
```
Crawl a website in just three lines of Python. Webᵀ Crawl also handles downloading PDFs and other documents encountered during the crawl.
Saving and Re-accessing a Crawl
Creating a crawl assigns it a unique identifier, available as `crawl.crawl_id`, which can be used later to retrieve the same crawl.
Crawl Parameters
url (string, required)
Base URL to crawl.

max_pages (int, default: 100)
Maximum number of pages to crawl. (Can be updated later.)

render_js (bool, default: false)
Render JavaScript on the website.

allowed_urls (list of strings)
A list of allowed URL paths.
Example: ["https://webtranspose/*"]

banned_urls (list of strings)
A list of banned URL paths.
Example: ["https://webtranspose/blogs/*"]

verbose (bool)
Extra logging to help with debugging.

n_workers (int, default: 1)
Number of CPU processors to use when running locally.