DIY Viewer
A DIY tutorial viewer that extracts project guides from sites like Instructables
The DIY Viewer demonstrates how to extract complex, multi-step tutorial content using Refyne's async crawler API via the official refyne-sdk. It pulls tutorials from DIY sites like Instructables, including materials lists, step-by-step instructions, and measurement conversions.
Try it live at diyviewer-demo.refyne.uk
This demo uses the async crawler pattern - it starts a job, then polls for results. This is ideal for longer extractions where you want to show progress to users. See the Recipe App for a simpler synchronous approach.
Features
- Tutorial Extraction: Paste any DIY tutorial URL and extract structured data
- Smart Filtering: Automatically skips non-actionable steps (intros, conclusions, promotional content)
- Glossary Generation: Creates definitions for technical terms beginners might not understand
- Measurement Conversions: Automatically converts between metric and imperial units
- Materials vs Tools: Separates consumables from reusable equipment
- Materials Checklist: Track what you have for your project
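
The measurement conversions are produced by the extraction itself, so an application may want to sanity-check them before display. The following is a minimal sketch of such a check, not part of the demo or of refyne-sdk: the names `Measurement`, `toCentimeters`, and `isConsistent` are hypothetical, and only a handful of length units are covered.

```typescript
// Hypothetical sanity check for extracted measurement conversions.
// Nothing here comes from refyne-sdk or the demo source.

interface Measurement {
  original: string;
  metric: string;
  imperial: string;
}

// Centimeters per unit, for a small set of length units only.
const CM_PER_UNIT: Record<string, number> = {
  mm: 0.1,
  cm: 1,
  m: 100,
  in: 2.54,
  inch: 2.54,
  inches: 2.54,
  ft: 30.48,
  foot: 30.48,
  feet: 30.48,
};

// Parse strings such as "36 inches" or "91.4 cm" into centimeters.
// Returns null for anything it does not recognise (fractions, volumes, weights).
function toCentimeters(text: string): number | null {
  const match = /^\s*(\d+(?:\.\d+)?)\s*([a-zA-Z]+)\.?\s*$/.exec(text);
  if (!match) return null;
  const factor = CM_PER_UNIT[match[2].toLowerCase()];
  return factor === undefined ? null : Number(match[1]) * factor;
}

// True when the metric and imperial values describe the same length within a
// relative tolerance. Returns false when either value cannot be parsed.
function isConsistent(m: Measurement, tolerance = 0.01): boolean {
  const metric = toCentimeters(m.metric);
  const imperial = toCentimeters(m.imperial);
  if (metric === null || imperial === null) return false;
  return Math.abs(metric - imperial) <= tolerance * Math.max(metric, imperial);
}
```

For example, `isConsistent({ original: '36 inches', metric: '91.4 cm', imperial: '36 in' })` returns `true`, because 36 in is 91.44 cm and the difference is well within the 1% tolerance.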
Schema Definition
The schema instructs Refyne to intelligently extract and organize tutorial content:
// DIY Tutorial extraction schema for Refyne
const TUTORIAL_SCHEMA = `
name: DIYTutorial
description: |
  Extracts tutorial information from DIY sites like Instructables.
  IMPORTANT INSTRUCTIONS:
  1. Only include steps that contain actual actionable instructions.
  2. Skip and exclude any steps that are:
     - Introduction or overview steps (put this content in the overview field)
     - Conclusion, summary, or "final thoughts" steps
     - Steps asking users to subscribe, follow, or vote
     - Steps promoting other content or products
     - Steps with only images and no real instructions
     - "Supplies" or "Materials" steps (extract these into materials/tools fields)
  3. Renumber the remaining steps sequentially starting from 1.
  4. Create a glossary of technical/specialized terms that may be unfamiliar to beginners.
  5. Separate materials (consumables) from tools (reusable equipment).
  6. For any measurements, provide both metric and imperial conversions.
fields:
  - name: title
    type: string
    description: The title of the tutorial/project
    required: true
  - name: overview
    type: string
    description: A descriptive summary of what this tutorial covers, what will be built, and the overall scope. Include any introductory content here rather than as a step.
    required: true
  - name: image_url
    type: string
    description: |
      URL of the main project/tutorial image showing the FINISHED PROJECT or the project being built.
      This should be the hero/featured image that represents what the tutorial creates.
      IMPORTANT - DO NOT use:
      - Author photos or profile pictures (headshots of people)
      - Avatar images or small circular profile images
      - Advertisement or promotional images unrelated to the project
      - Social media icons or logos
      Look for the largest, most prominent image showing the actual project/build result.
      NOTE: Images are in the YAML frontmatter at the top of the content (between --- markers).
      The frontmatter has an 'images:' section mapping placeholders (IMG_001, IMG_002, etc.) to URLs.
      Find the most appropriate image URL from the frontmatter 'images' section.
  - name: author
    type: string
    description: Name of the tutorial author or content creator
  - name: author_url
    type: string
    description: URL to the author's profile page or website (if available)
  - name: difficulty
    type: string
    description: Difficulty level (e.g., "Beginner", "Intermediate", "Advanced")
  - name: estimated_time
    type: string
    description: Estimated time to complete the project (e.g., "2-3 hours", "Weekend project")
  - name: glossary
    type: array
    description: |
      Technical terms, jargon, or specialized vocabulary used in this tutorial that beginners might not understand.
      Examples: "joist" (horizontal structural beam), "OSB" (oriented strand board), "miter cut" (angled cut), etc.
      Include any industry-specific terms, material names, tool names, or techniques that need explanation.
    items:
      type: object
      properties:
        term:
          type: string
          description: The technical term or jargon
          required: true
        definition:
          type: string
          description: Clear, beginner-friendly explanation of what this term means
          required: true
        context:
          type: string
          description: How this term is used in this specific project (optional)
  - name: materials
    type: array
    required: true
    description: |
      Consumable materials needed for the project (things that get used up or become part of the finished project).
      Examples: lumber, screws, paint, glue, sandpaper, etc.
    items:
      type: object
      properties:
        name:
          type: string
          description: Name of the material with specific type/grade if mentioned
          required: true
        quantity:
          type: string
          description: Amount needed (e.g., "4 boards", "1 gallon", "50 pieces")
        notes:
          type: string
          description: Additional details like dimensions, alternatives, or specifications
        measurement:
          type: object
          description: If the material has a size measurement, provide conversions
          properties:
            original:
              type: string
              description: The measurement as written in the source
              required: true
            metric:
              type: string
              description: Metric equivalent (mm, cm, m, ml, L, g, kg)
              required: true
            imperial:
              type: string
              description: Imperial equivalent (in, ft, oz, lb, gal)
              required: true
  - name: tools
    type: array
    required: true
    description: |
      Reusable tools and equipment needed for the project.
      Examples: drill, saw, hammer, measuring tape, clamps, safety glasses, etc.
    items:
      type: object
      properties:
        name:
          type: string
          description: Name of the tool
          required: true
        notes:
          type: string
          description: Specific type, size, or features needed (e.g., "cordless", "with Phillips bit")
        required:
          type: boolean
          description: Whether this tool is essential (true) or optional/nice-to-have (false)
          required: true
  - name: steps
    type: array
    required: true
    description: Step-by-step instructions for completing the project. Only include steps with actual actionable instructions.
    items:
      type: object
      properties:
        step_number:
          type: integer
          description: The step number in sequence (renumber sequentially after filtering)
          required: true
        title:
          type: string
          description: Title or heading of this step
          required: true
        instructions:
          type: string
          description: Detailed instructions for completing this step
          required: true
        tips:
          type: string
          description: Any tips, warnings, or helpful hints for this step
        image_urls:
          type: array
          description: |
            Extract ALL image URLs for this step.
            IMPORTANT: Images are NOT in standard markdown format. Instead:
            1. In the body, images appear as placeholders like {{IMG_001}}, {{IMG_002}}, etc.
            2. At the TOP of the content is a YAML frontmatter section (between --- markers)
            3. The frontmatter has an 'images:' section that maps each placeholder to its URL
            Example frontmatter:
            ---
            images:
              IMG_001:
                url: "https://content.instructables.com/abc.jpg"
              IMG_002:
                url: "https://content.instructables.com/def.jpg"
            ---
            To extract images for a step:
            1. Find which {{IMG_XXX}} placeholders appear in that step's content
            2. Look up each placeholder in the frontmatter 'images' section
            3. Extract the 'url' value for each placeholder
            4. Return all URLs as an array
            Include every image URL found in this step - do not skip any images.
          items:
            type: string
        measurements:
          type: array
          description: Any measurements mentioned in this step with conversions
          items:
            type: object
            properties:
              original:
                type: string
                description: The measurement as written
                required: true
              metric:
                type: string
                description: Metric equivalent
                required: true
              imperial:
                type: string
                description: Imperial equivalent
                required: true
        skill_references:
          type: array
          description: |
            When this step mentions a technique or skill that beginners might not know,
            provide a reference to help them learn. Examples:
            - "threading a needle" for sewing projects
            - "using a miter saw safely" for woodworking
            - "soldering components" for electronics
            Only include skills that are actually performed in this step.
          items:
            type: object
            properties:
              skill_name:
                type: string
                description: The technique or skill mentioned (e.g., "threading a needle")
                required: true
              difficulty:
                type: string
                description: Skill difficulty - "beginner", "intermediate", or "advanced"
                required: true
              description:
                type: string
                description: Brief explanation of what this skill involves
                required: true
              search_query:
                type: string
                description: A search query to find tutorials (e.g., "how to thread a needle for beginners tutorial")
                required: true
        safety_warnings:
          type: array
          description: |
            Important safety information for this step. Include warnings about:
            - Power tool usage and safety
            - Chemical or fume hazards
            - Sharp objects or cutting tools
            - Electrical safety
            - Heat or fire risks
            - Required protective equipment
          items:
            type: object
            properties:
              warning:
                type: string
                description: The safety warning text
                required: true
              severity:
                type: string
                description: '"caution" (minor risk), "warning" (moderate risk), or "danger" (serious risk)'
                required: true
              ppe_required:
                type: array
                description: Required protective equipment (e.g., "safety glasses", "gloves", "hearing protection")
                items:
                  type: string
  - name: tags
    type: array
    description: Project categories or tags
    items:
      type: string
`;

API Integration
Install the official TypeScript SDK:
npm install refyne-sdk

Starting an Async Extraction Job
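
The functions in this section and the next return `CrawlJobStartResponse` and `CrawlJobStatusResponse`, which this page does not define. The sketch below infers plausible shapes from the fields the demo code reads and writes; the real definitions in the demo source may differ. The `JobStatus` union and the `isTerminal` helper are hypothetical additions for illustration.

```typescript
// Assumed shapes, inferred from how the demo code uses them.
// These are not copied from the demo source or from refyne-sdk.

// Status values that the demo's polling code checks for.
type JobStatus = 'pending' | 'crawling' | 'processing' | 'completed' | 'failed';

interface CrawlJobStartResponse {
  success: boolean;
  jobId?: string; // present when the job was created
  error?: string; // present when success is false
}

interface CrawlJobStatusResponse {
  success: boolean;
  status?: string; // usually a JobStatus; unknown values are passed through
  progress?: number;
  queuePosition?: number; // only set for pending jobs with a queue position
  data?: unknown; // transformed tutorial data once the job has completed
  error?: string;
}

// Hypothetical helper: a job is finished once it has completed or failed.
// Every other status means the caller should keep polling.
function isTerminal(status: string | undefined): boolean {
  return status === 'completed' || status === 'failed';
}
```

Keeping `status` as a plain string mirrors the demo's behaviour of returning unrecognised statuses unchanged rather than rejecting them.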
The crawl() method starts an async job and returns immediately with a job ID for polling:
/**
 * Start an async extraction (crawl) job.
 * Returns a job_id that can be polled for status.
 */
export async function startExtraction(
  url: string,
  apiUrl: string,
  apiKey: string,
  referer?: string
): Promise<CrawlJobStartResponse> {
  try {
    const client = new Refyne({
      apiKey,
      baseUrl: apiUrl,
      timeout: EXTRACTION_TIMEOUT_MS,
      referer,
    });
    const response = await client.crawl({
      url,
      schema: TUTORIAL_SCHEMA,
      capture_debug: true,
      fetch_mode: 'auto', // Auto-detect JS-heavy sites and use browser rendering when needed
    });
    // The job ID may be returned as either job_id or id
    const jobId = (response as any).job_id || (response as any).id;
    if (!jobId) {
      return {
        success: false,
        error: 'No job ID returned from API',
      };
    }
    return {
      success: true,
      jobId,
    };
  } catch (error) {
    console.error('[startExtraction] Error:', error);
    return {
      success: false,
      error: error instanceof Error ? error.message : 'Failed to start extraction',
    };
  }
}

Polling for Results
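
The polling function in this section passes the raw result through `transformExtractedData`, which this page does not show. The following is a hypothetical sketch of what a normalizer of that kind might do: default missing arrays to empty ones and renumber steps sequentially, as instruction 3 in the schema requests. The name `normalizeTutorial` and the reduced `Step` and `Tutorial` shapes are assumptions made for illustration, not the demo's actual code.

```typescript
// Hypothetical normalizer for extracted tutorial data (illustrative only).

interface Step {
  step_number: number;
  title: string;
  instructions: string;
}

// A reduced subset of the schema's fields, enough to show the idea.
interface Tutorial {
  title: string;
  overview: string;
  materials: unknown[];
  tools: unknown[];
  steps: Step[];
  tags: string[];
}

// Return the value if it is an array, otherwise an empty array.
function asArray<T>(value: unknown): T[] {
  return Array.isArray(value) ? (value as T[]) : [];
}

function normalizeTutorial(raw: unknown): Tutorial {
  const obj = (raw && typeof raw === 'object' ? raw : {}) as Record<string, unknown>;

  // Drop steps without instruction text, then renumber the rest from 1.
  const steps: Step[] = asArray<Partial<Step>>(obj.steps)
    .filter((s) => typeof s?.instructions === 'string' && s.instructions.trim() !== '')
    .map((s, index) => ({
      step_number: index + 1,
      title: typeof s.title === 'string' ? s.title : `Step ${index + 1}`,
      instructions: (s.instructions as string).trim(),
    }));

  return {
    title: typeof obj.title === 'string' ? obj.title : 'Untitled',
    overview: typeof obj.overview === 'string' ? obj.overview : '',
    materials: asArray<unknown>(obj.materials),
    tools: asArray<unknown>(obj.tools),
    steps,
    tags: asArray<string>(obj.tags),
  };
}
```

Renumbering on the client is a defensive measure: the schema already asks for sequential numbering, but the display code then does not depend on the extraction having followed that instruction.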
Poll the job status until extraction is complete:
/**
 * Get the status of an async extraction job.
 * When complete, returns the extracted tutorial data.
 */
export async function getJobStatus(
  jobId: string,
  apiUrl: string,
  apiKey: string,
  referer?: string
): Promise<CrawlJobStatusResponse> {
  try {
    const client = new Refyne({
      apiKey,
      baseUrl: apiUrl,
      timeout: 30000, // Shorter timeout for status checks
      referer,
    });
    const job = await client.jobs.get(jobId);
    const status = (job as any).status;
    // Job still in progress
    if (status === 'pending' || status === 'crawling' || status === 'processing') {
      const response: CrawlJobStatusResponse = {
        success: true,
        status,
        progress: (job as any).progress || 0,
      };
      // Include queue position for pending jobs
      if (status === 'pending' && (job as any).queue_position > 0) {
        response.queuePosition = (job as any).queue_position;
      }
      return response;
    }
    // Job failed
    if (status === 'failed') {
      return {
        success: false,
        status,
        error: (job as any).error || 'Extraction failed',
      };
    }
    // Job completed - get results
    if (status === 'completed') {
      const results = await client.jobs.getResults(jobId);
      const extracted = (results as any).data || (results as any).results?.[0]?.data || results;
      return {
        success: true,
        status,
        data: transformExtractedData(extracted),
      };
    }
    // Unknown status
    return {
      success: true,
      status,
    };
  } catch (error) {
    console.error('[getJobStatus] Error:', error);
    return {
      success: false,
      error: error instanceof Error ? error.message : 'Failed to get job status',
    };
  }
}

Client-Side Polling Loop
The frontend polls the API and shows progress to users:
async function pollForResults(jobId: string) {
  const maxAttempts = 300; // 300 attempts at 1-second intervals: 5 minutes max
  let attempts = 0;
  const interval = setInterval(async () => {
    attempts++;
    if (attempts > maxAttempts) {
      clearInterval(interval);
      showError('Extraction timed out');
      return;
    }
    try {
      const response = await fetch(`/api/poll/${jobId}`);
      const result = await response.json();
      if (result.status === 'completed') {
        clearInterval(interval);
        showResults(result.data);
      } else if (result.status === 'failed') {
        clearInterval(interval);
        showError(result.error);
      }
      // Otherwise keep polling...
    } catch (error) {
      // Network or JSON parsing error: log it and retry on the next tick
      console.error('[pollForResults] Error:', error);
    }
  }, 1000);
}

Response Example
When extraction completes, you get structured data with glossary terms and measurement conversions:
{
  "title": "Build a Simple Bookshelf",
  "overview": "Learn how to build a sturdy wooden bookshelf using basic tools...",
  "image_url": "https://example.com/bookshelf.jpg",
  "author": "DIY Workshop",
  "author_url": "https://example.com/author",
  "difficulty": "Beginner",
  "estimated_time": "4-6 hours",
  "glossary": [
    {
      "term": "miter cut",
      "definition": "An angled cut across the width of a board, typically at 45 degrees",
      "context": "Used for joining corners of the shelf frame"
    }
  ],
  "materials": [
    {
      "name": "Pine boards",
      "quantity": "4",
      "notes": "1x10 lumber",
      "measurement": {
        "original": "6 feet",
        "metric": "183 cm",
        "imperial": "6 ft"
      }
    }
  ],
  "tools": [
    {
      "name": "Circular saw",
      "notes": "Or hand saw",
      "required": true
    },
    {
      "name": "Clamps",
      "notes": "For holding pieces while drilling",
      "required": false
    }
  ],
  "steps": [
    {
      "step_number": 1,
      "title": "Cut the Side Panels",
      "instructions": "Measure and mark your pine boards at 36 inches...",
      "tips": "Clamp a straight edge as a guide for cleaner cuts",
      "image_urls": ["https://example.com/step1.jpg"],
      "measurements": [
        {
          "original": "36 inches",
          "metric": "91.4 cm",
          "imperial": "36 in"
        }
      ]
    }
  ],
  "tags": ["woodworking", "furniture", "beginner"]
}

When to Use Async vs Sync
| Async Crawler (this demo) | Sync Extract (Recipe App) |
|---|---|
| Start job, poll for results | Single API call |
| Best for complex/long pages | Best for quick pages |
| Shows progress to users | Blocks until complete |
| Better timeout handling | Simpler error handling |
Use the async crawler pattern when extracting from complex pages with lots of content, or when you want to show extraction progress to users.
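
One way to approach the timeout handling mentioned in the table is to poll sequentially: wait for each status check to finish before scheduling the next one, and stop at a fixed deadline. The sketch below is not the demo's implementation. `pollUntilDone`, `checkStatus`, and the option names are hypothetical; the status values are the ones used elsewhere on this page.

```typescript
// Hypothetical sequential polling helper (illustrative, not from the demo source).

interface PollResult {
  status?: string;
  data?: unknown;
  error?: string;
}

interface PollOptions {
  intervalMs?: number; // delay between status checks
  timeoutMs?: number; // overall deadline for the whole job
  onProgress?: (result: PollResult) => void; // called for each non-terminal result
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Calls checkStatus repeatedly until the job completes, fails, or the deadline passes.
// Each check finishes before the next one starts, so requests never overlap.
async function pollUntilDone(
  checkStatus: () => Promise<PollResult>,
  { intervalMs = 1000, timeoutMs = 300_000, onProgress }: PollOptions = {}
): Promise<PollResult> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const result = await checkStatus();
    if (result.status === 'completed' || result.status === 'failed') {
      return result;
    }
    onProgress?.(result);
    await sleep(intervalMs);
  }
  return { status: 'failed', error: 'Extraction timed out' };
}
```

In a browser, `checkStatus` could wrap the demo's polling endpoint, for example `() => fetch(`/api/poll/${jobId}`).then((r) => r.json())`. Injecting the status function keeps the loop independent of the transport, which also makes it straightforward to exercise with a stub.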
Tech Stack
- Astro - Static site generator with SSR support
- Cloudflare Pages - Hosting with edge functions
- Cloudflare D1 - SQLite database at the edge
- Tailwind CSS - Styling
Source Code
The complete source code is available in the refyne-demos repository.
Key files:
- diyviewer/src/lib/refyne.ts - Refyne SDK integration and schema
- diyviewer/src/lib/db.ts - Database operations
- diyviewer/src/pages/add.astro - Tutorial extraction page
- diyviewer/src/pages/tutorial/[id].astro - Tutorial detail page
- diyviewer/src/pages/checklist.astro - Materials checklist
Try It Yourself
- Visit diyviewer-demo.refyne.uk
- Click "Add Project"
- Paste a tutorial URL from Instructables or Make:
- Watch the extraction progress
- Save to your project list and track materials with the checklist