Ensure you have installed Web Transpose Lite:

pip install "webtranspose[lite]"

Crawl a website in just a few lines.

import webtranspose as webt

crawl = webt.Crawl(
    "https://www.example.com",
    max_pages=100,
    render_js=True,
)
# crawl() is a coroutine, so run it in an async context
# (e.g. a notebook, or wrapped in asyncio.run()).
await crawl.crawl()

Webᵀ Crawl also downloads PDFs and other documents it encounters while crawling.

Saving and Re-accessing Crawl

Creating a Crawl assigns it a unique crawl.crawl_id, which you can use later to retrieve the same crawl.

crawl_id = crawl.crawl_id

crawl = webt.get_crawl(crawl_id)

Crawl Parameters

url
string
required

Base URL to crawl

max_pages
int
default: 100

Maximum number of pages to crawl. (Can be updated later.)

render_js
bool
default: false

Render JavaScript on the website.

allowed_urls
string[]

A list of allowed URL patterns.

Example: ["https://webtranspose/*"]

banned_urls
string[]

A list of banned URL patterns.

Example: ["https://webtranspose/blogs/*"]
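The library's exact matching rules aren't spelled out here, but glob-style patterns like the examples above can be illustrated with Python's standard fnmatch module. The is_allowed helper below is a hypothetical sketch of the filtering semantics, not part of webtranspose:

```python
from fnmatch import fnmatch

def is_allowed(url, allowed_urls=None, banned_urls=None):
    """Illustrative glob-style filter: a URL passes if it matches no
    banned pattern and, when an allow-list is given, matches at least
    one allowed pattern."""
    if banned_urls and any(fnmatch(url, p) for p in banned_urls):
        return False
    if allowed_urls:
        return any(fnmatch(url, p) for p in allowed_urls)
    return True

# Matches the banned pattern, so it is filtered out.
print(is_allowed("https://webtranspose/blogs/post-1",
                 banned_urls=["https://webtranspose/blogs/*"]))  # False
```

Note that a banned pattern takes precedence in this sketch: a URL matching both lists is excluded.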

verbose
bool

Extra logging to help with debugging.

n_workers
int
default: 1

Number of CPU processes to use when running locally.
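n_workers only applies to local crawls. One common way to pick a value is to use most of the machine's cores while leaving one free for the main process; this is an illustration, not a library default:

```python
import os

# os.cpu_count() can return None on some platforms, so fall back to 1.
cpu_count = os.cpu_count() or 1

# Leave one core for the main process, but never go below 1 worker.
n_workers = max(1, cpu_count - 1)
```

You would then pass this as `Crawl(..., n_workers=n_workers)`.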