Ensure you have installed Web Transpose Lite:
```shell
pip install "webtranspose[lite]"
```
Webᵀ Crawl also handles crawling and downloading PDFs and other documents.
## Saving and Re-accessing Crawls
Creating a Crawl generates a unique `crawl.crawl_id` that can be used later to retrieve the same crawl.
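A minimal sketch of this workflow, assuming the package exposes a `Crawl` class and a `get_crawl` helper (the exact constructor signature and helper name are assumptions; check the package's API reference):

```python
import webtranspose as webt

# Hypothetical usage sketch: create a crawl for a site.
crawl = webt.Crawl(
    "https://www.example.com",  # base URL to crawl
    max_pages=100,              # can be updated later
)

# The crawl_id is what you persist to re-access this crawl later.
crawl_id = crawl.crawl_id

# In a later session, retrieve the same crawl by its ID
# (assumed helper name for illustration).
same_crawl = webt.get_crawl(crawl_id)
```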
## Crawl Parameters
- Base URL to crawl.
- Maximum number of pages to crawl. (Can be updated later.)
- Render JavaScript on the website.
- A list of allowed URL paths. Example: `["https://webtranspose/*"]`
- A list of banned URL paths. Example: `["https://webtranspose/blogs/*"]`
- Extra logging to help with debugging.
- Number of CPU processors to use when running locally.
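The allowed and banned lists above use glob-style `*` wildcards. The sketch below illustrates the matching semantics with Python's standard `fnmatch` module; `is_crawlable` is a hypothetical helper for illustration, and the library's internal matching logic may differ:

```python
from fnmatch import fnmatch

def is_crawlable(url: str, allowed: list[str], banned: list[str]) -> bool:
    """Hypothetical helper: a URL is crawlable if it matches at least
    one allowed pattern and no banned pattern. Banned patterns win."""
    if any(fnmatch(url, pattern) for pattern in banned):
        return False
    return any(fnmatch(url, pattern) for pattern in allowed)

allowed = ["https://webtranspose/*"]
banned = ["https://webtranspose/blogs/*"]

is_crawlable("https://webtranspose/docs", allowed, banned)        # matches allowed only
is_crawlable("https://webtranspose/blogs/post", allowed, banned)  # matches a banned pattern
```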