Core Concepts
Understand the building blocks of Parser Run: projects, schemas, jobs, and data.
Projects
A project is a scraping configuration targeting a specific data goal. Each project has its own database, schema, and schedule.
Example: Price Tracker Project
{
id: "nike-price-tracker",
name: "Nike Price Tracker",
description: "Track Nike shoe prices across retailers",
urls: ["https://nike.com/shoes", "https://zappos.com/nike"],
schedule: "daily",
status: "active"
}Projects can be created via conversation or the API.
Schemas
A schema defines what data to extract from each page. Parser Run auto-detects schemas using LLM analysis.
| Type | Description | Example |
|---|---|---|
| text | String values | Product name, brand |
| number | Decimal numbers | Price, rating |
| integer | Whole numbers | Review count, stock qty |
| boolean | True/false | In stock, on sale |
| date | ISO date string | Release date |
| array | List of values | Tags, sizes, colors |
Jobs
A job is a single execution of a project's scraping pipeline. Jobs progress through stages:
- 1. discovery— Find all pages matching the target patterns
- 2. acquisition— Fetch HTML from each page
- 3. parsing— Extract data according to schema
- 4. complete— Data stored and ready to query
Records
Records are the extracted data rows stored in your project database.
Example: Product Record
{
id: "rec_abc123",
projectId: "nike-price-tracker",
jobId: "job_xyz789",
url: "https://nike.com/shoes/air-max-90",
data: {
product_name: "Nike Air Max 90",
price: 129.99,
brand: "Nike",
rating: 4.7,
review_count: 2341,
in_stock: true
},
extractedAt: "2025-01-15T10:30:00Z"
}HTML Archives
Every page fetch is archived for debugging and replay. This enables:
- • Debugging — See exactly what the scraper saw
- • Re-extraction — Update extraction logic without re-fetching
- • Audit trail — Prove data provenance
- • A/B testing — Compare extraction algorithms
Alerts & Webhooks
Configure notifications when your data changes:
Alert Configuration
{
type: "price_drop",
condition: {
field: "price",
operator: "decreased_by",
value: 10, // percent
},
notify: ["email", "webhook"],
webhookUrl: "https://your-app.com/hooks/price-alert"
}Next Steps
- • Query data with the tRPC API
- • Deploy your own instance with self-hosting