DIY Viewer

A DIY tutorial viewer that extracts project guides from sites such as Instructables.

The DIY Viewer demonstrates how to extract complex, multi-step tutorial content using Refyne's async crawler API via the official refyne-sdk. It pulls tutorials from DIY sites like Instructables, including materials lists, step-by-step instructions, and measurement conversions.

This demo uses the async crawler pattern: it starts a job, then polls for results. This suits longer extractions where you want to show progress to users. See the Recipe App for a simpler synchronous approach.

Features

  • Tutorial Extraction: Paste any DIY tutorial URL and extract structured data
  • Smart Filtering: Automatically skips non-actionable steps (intros, conclusions, promotional content)
  • Glossary Generation: Creates definitions for technical terms beginners might not understand
  • Measurement Conversions: Automatically converts between metric and imperial units
  • Materials vs Tools: Separates consumables from reusable equipment
  • Materials Checklist: Track what you have for your project

Schema Definition

The schema instructs Refyne to intelligently extract and organize tutorial content:

// DIY Tutorial extraction schema for Refyne
const TUTORIAL_SCHEMA = `
name: DIYTutorial
description: |
  Extracts tutorial information from DIY sites like Instructables.

  IMPORTANT INSTRUCTIONS:
  1. Only include steps that contain actual actionable instructions.
  2. Skip and exclude any steps that are:
     - Introduction or overview steps (put this content in the overview field)
     - Conclusion, summary, or "final thoughts" steps
     - Steps asking users to subscribe, follow, or vote
     - Steps promoting other content or products
     - Steps with only images and no real instructions
     - "Supplies" or "Materials" steps (extract these into materials/tools fields)
  3. Renumber the remaining steps sequentially starting from 1.
  4. Create a glossary of technical/specialized terms that may be unfamiliar to beginners.
  5. Separate materials (consumables) from tools (reusable equipment).
  6. For any measurements, provide both metric and imperial conversions.

fields:
  - name: title
    type: string
    description: The title of the tutorial/project
    required: true

  - name: overview
    type: string
    description: A descriptive summary of what this tutorial covers, what will be built, and the overall scope. Include any introductory content here rather than as a step.
    required: true

  - name: image_url
    type: string
    description: |
      URL of the main project/tutorial image showing the FINISHED PROJECT or the project being built.
      This should be the hero/featured image that represents what the tutorial creates.

      IMPORTANT - DO NOT use:
      - Author photos or profile pictures (headshots of people)
      - Avatar images or small circular profile images
      - Advertisement or promotional images unrelated to the project
      - Social media icons or logos

      Look for the largest, most prominent image showing the actual project/build result.

      NOTE: Images are in the YAML frontmatter at the top of the content (between --- markers).
      The frontmatter has an 'images:' section mapping placeholders (IMG_001, IMG_002, etc.) to URLs.
      Find the most appropriate image URL from the frontmatter 'images' section.

  - name: author
    type: string
    description: Name of the tutorial author or content creator

  - name: author_url
    type: string
    description: URL to the author's profile page or website (if available)

  - name: difficulty
    type: string
    description: Difficulty level (e.g., "Beginner", "Intermediate", "Advanced")

  - name: estimated_time
    type: string
    description: Estimated time to complete the project (e.g., "2-3 hours", "Weekend project")

  - name: glossary
    type: array
    description: |
      Technical terms, jargon, or specialized vocabulary used in this tutorial that beginners might not understand.
      Examples: "joist" (horizontal structural beam), "OSB" (oriented strand board), "miter cut" (angled cut), etc.
      Include any industry-specific terms, material names, tool names, or techniques that need explanation.
    items:
      type: object
      properties:
        term:
          type: string
          description: The technical term or jargon
          required: true
        definition:
          type: string
          description: Clear, beginner-friendly explanation of what this term means
          required: true
        context:
          type: string
          description: How this term is used in this specific project (optional)

  - name: materials
    type: array
    required: true
    description: |
      Consumable materials needed for the project (things that get used up or become part of the finished project).
      Examples: lumber, screws, paint, glue, sandpaper, etc.
    items:
      type: object
      properties:
        name:
          type: string
          description: Name of the material with specific type/grade if mentioned
          required: true
        quantity:
          type: string
          description: Amount needed (e.g., "4 boards", "1 gallon", "50 pieces")
        notes:
          type: string
          description: Additional details like dimensions, alternatives, or specifications
        measurement:
          type: object
          description: If the material has a size measurement, provide conversions
          properties:
            original:
              type: string
              description: The measurement as written in the source
              required: true
            metric:
              type: string
              description: Metric equivalent (mm, cm, m, ml, L, g, kg)
              required: true
            imperial:
              type: string
              description: Imperial equivalent (in, ft, oz, lb, gal)
              required: true

  - name: tools
    type: array
    required: true
    description: |
      Reusable tools and equipment needed for the project.
      Examples: drill, saw, hammer, measuring tape, clamps, safety glasses, etc.
    items:
      type: object
      properties:
        name:
          type: string
          description: Name of the tool
          required: true
        notes:
          type: string
          description: Specific type, size, or features needed (e.g., "cordless", "with Phillips bit")
        required:
          type: boolean
          description: Whether this tool is essential (true) or optional/nice-to-have (false)
          required: true

  - name: steps
    type: array
    required: true
    description: Step-by-step instructions for completing the project. Only include steps with actual actionable instructions.
    items:
      type: object
      properties:
        step_number:
          type: integer
          description: The step number in sequence (renumber sequentially after filtering)
          required: true
        title:
          type: string
          description: Title or heading of this step
          required: true
        instructions:
          type: string
          description: Detailed instructions for completing this step
          required: true
        tips:
          type: string
          description: Any tips, warnings, or helpful hints for this step
        image_urls:
          type: array
          description: |
            Extract ALL image URLs for this step.

            IMPORTANT: Images are NOT in standard markdown format. Instead:
            1. In the body, images appear as placeholders like {{IMG_001}}, {{IMG_002}}, etc.
            2. At the TOP of the content is a YAML frontmatter section (between --- markers)
            3. The frontmatter has an 'images:' section that maps each placeholder to its URL

            Example frontmatter:
            ---
            images:
              IMG_001:
                url: "https://content.instructables.com/abc.jpg"
              IMG_002:
                url: "https://content.instructables.com/def.jpg"
            ---

            To extract images for a step:
            1. Find which {{IMG_XXX}} placeholders appear in that step's content
            2. Look up each placeholder in the frontmatter 'images' section
            3. Extract the 'url' value for each placeholder
            4. Return all URLs as an array

            Include every image URL found in this step - do not skip any images.
          items:
            type: string
        measurements:
          type: array
          description: Any measurements mentioned in this step with conversions
          items:
            type: object
            properties:
              original:
                type: string
                description: The measurement as written
                required: true
              metric:
                type: string
                description: Metric equivalent
                required: true
              imperial:
                type: string
                description: Imperial equivalent
                required: true
        skill_references:
          type: array
          description: |
            When this step mentions a technique or skill that beginners might not know,
            provide a reference to help them learn. Examples:
            - "threading a needle" for sewing projects
            - "using a miter saw safely" for woodworking
            - "soldering components" for electronics
            Only include skills that are actually performed in this step.
          items:
            type: object
            properties:
              skill_name:
                type: string
                description: The technique or skill mentioned (e.g., "threading a needle")
                required: true
              difficulty:
                type: string
                description: Skill difficulty - "beginner", "intermediate", or "advanced"
                required: true
              description:
                type: string
                description: Brief explanation of what this skill involves
                required: true
              search_query:
                type: string
                description: A search query to find tutorials (e.g., "how to thread a needle for beginners tutorial")
                required: true
        safety_warnings:
          type: array
          description: |
            Important safety information for this step. Include warnings about:
            - Power tool usage and safety
            - Chemical or fume hazards
            - Sharp objects or cutting tools
            - Electrical safety
            - Heat or fire risks
            - Required protective equipment
          items:
            type: object
            properties:
              warning:
                type: string
                description: The safety warning text
                required: true
              severity:
                type: string
                description: '"caution" (minor risk), "warning" (moderate risk), or "danger" (serious risk)'
                required: true
              ppe_required:
                type: array
                description: Required protective equipment (e.g., "safety glasses", "gloves", "hearing protection")
                items:
                  type: string

  - name: tags
    type: array
    description: Project categories or tags
    items:
      type: string
`;
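On the consuming side, it can help to mirror the schema in TypeScript types so the rest of the app does not work with untyped JSON. The following is a minimal sketch, not the demo's actual code: the interface names (Measurement, Material, Tool, Step, Tutorial) and the isTutorial guard are hypothetical, only fields marked required: true in the schema are treated as mandatory, and several optional fields (glossary, skill_references, safety_warnings, and others) are left out for brevity.

```typescript
// Hypothetical types mirroring part of the schema above; names are illustrative only.
interface Measurement {
  original: string;
  metric: string;
  imperial: string;
}

interface Material {
  name: string;
  quantity?: string;
  notes?: string;
  measurement?: Measurement;
}

interface Tool {
  name: string;
  notes?: string;
  required: boolean;
}

interface Step {
  step_number: number;
  title: string;
  instructions: string;
  tips?: string;
  image_urls?: string[];
  measurements?: Measurement[];
}

interface Tutorial {
  title: string;
  overview: string;
  materials: Material[];
  tools: Tool[];
  steps: Step[];
  tags?: string[];
}

// Runtime guard that checks only the top-level required fields.
// It does not validate the contents of the arrays.
function isTutorial(value: unknown): value is Tutorial {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.title === 'string' &&
    typeof v.overview === 'string' &&
    Array.isArray(v.materials) &&
    Array.isArray(v.tools) &&
    Array.isArray(v.steps)
  );
}
```

A guard like this is a cheap first check before rendering; a fuller validator would also inspect each material, tool, and step.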

API Integration

Install the official TypeScript SDK:

npm install refyne-sdk

Starting an Async Extraction Job

The crawl() method starts an async job and returns immediately with a job ID for polling:

/**
 * Start an async extraction (crawl) job.
 * Returns a job_id that can be polled for status.
 */
export async function startExtraction(
  url: string,
  apiUrl: string,
  apiKey: string,
  referer?: string
): Promise<CrawlJobStartResponse> {
  try {
    const client = new Refyne({
      apiKey,
      baseUrl: apiUrl,
      timeout: EXTRACTION_TIMEOUT_MS,
      referer,
    });

    const response = await client.crawl({
      url,
      schema: TUTORIAL_SCHEMA,
      capture_debug: true,
      fetch_mode: 'auto', // Auto-detect JS-heavy sites and use browser rendering when needed
    });

    const jobId = (response as any).job_id || (response as any).id;

    if (!jobId) {
      return {
        success: false,
        error: 'No job ID returned from API',
      };
    }

    return {
      success: true,
      jobId: jobId,
    };
  } catch (error) {
    console.error('[startExtraction] Error:', error);
    return {
      success: false,
      error: error instanceof Error ? error.message : 'Failed to start extraction',
    };
  }
}
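The as any casts above work, but they switch off type checking for the response. If you would rather avoid any, one option is a small helper that narrows an unknown value. This is only a sketch: extractJobId is a hypothetical helper, the two field names it checks (job_id and id) are simply the ones the code above already looks for, and it assumes job IDs are non-empty strings.

```typescript
// Hypothetical helper: returns the job ID from a crawl response, or null.
// Checks the same two fields as startExtraction above (job_id first, then id).
// Assumption: job IDs are non-empty strings.
function extractJobId(response: unknown): string | null {
  if (typeof response !== 'object' || response === null) return null;
  const record = response as Record<string, unknown>;
  for (const key of ['job_id', 'id']) {
    const value = record[key];
    if (typeof value === 'string' && value.length > 0) return value;
  }
  return null;
}
```

With this in place, the lookup in startExtraction could become const jobId = extractJobId(response), and the existing !jobId check would still handle the missing-ID case.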

Polling for Results

Poll the job status until extraction is complete:

/**
 * Get the status of an async extraction job.
 * When complete, returns the extracted tutorial data.
 */
export async function getJobStatus(
  jobId: string,
  apiUrl: string,
  apiKey: string,
  referer?: string
): Promise<CrawlJobStatusResponse> {
  try {
    const client = new Refyne({
      apiKey,
      baseUrl: apiUrl,
      timeout: 30000, // Shorter timeout for status checks
      referer,
    });

    const job = await client.jobs.get(jobId);
    const status = (job as any).status;

    // Job still in progress
    if (status === 'pending' || status === 'crawling' || status === 'processing') {
      const response: CrawlJobStatusResponse = {
        success: true,
        status,
        progress: (job as any).progress || 0,
      };
      // Include queue position for pending jobs
      if (status === 'pending' && (job as any).queue_position > 0) {
        response.queuePosition = (job as any).queue_position;
      }
      return response;
    }

    // Job failed
    if (status === 'failed') {
      return {
        success: false,
        status,
        error: (job as any).error || 'Extraction failed',
      };
    }

    // Job completed - get results
    if (status === 'completed') {
      const results = await client.jobs.getResults(jobId);
      const extracted = (results as any).data || (results as any).results?.[0]?.data || results;

      return {
        success: true,
        status,
        data: transformExtractedData(extracted),
      };
    }

    // Unknown status
    return {
      success: true,
      status,
    };
  } catch (error) {
    console.error('[getJobStatus] Error:', error);
    return {
      success: false,
      error: error instanceof Error ? error.message : 'Failed to get job status',
    };
  }
}
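The transformExtractedData helper called above is not shown in this walkthrough. As a rough idea of what such a step might involve, the sketch below defaults missing arrays to empty lists and renumbers steps sequentially, in line with instruction 3 of the schema. The function name normalizeTutorial and all of its types are assumptions made for illustration, not the demo's implementation.

```typescript
interface RawStep {
  step_number?: number;
  title?: string;
  instructions?: string;
}

interface NormalizedStep {
  step_number: number;
  title: string;
  instructions: string;
}

interface NormalizedTutorial {
  title: string;
  steps: NormalizedStep[];
  tags: string[];
}

// Hypothetical normalizer: tolerates missing fields and renumbers steps from 1.
function normalizeTutorial(raw: {
  title?: string;
  steps?: RawStep[];
  tags?: string[];
}): NormalizedTutorial {
  const steps = (raw.steps ?? [])
    // Drop steps with empty instructions, since the schema asks for actionable steps only
    .filter((s) => typeof s.instructions === 'string' && s.instructions.trim() !== '')
    // Keep the original order but assign fresh sequential numbers
    .map((s, index) => ({
      step_number: index + 1,
      title: s.title ?? `Step ${index + 1}`,
      instructions: (s.instructions ?? '').trim(),
    }));

  return {
    title: raw.title ?? 'Untitled tutorial',
    steps,
    tags: raw.tags ?? [],
  };
}
```

Normalizing once on the server means the pages that render a tutorial can assume arrays exist and step numbers are contiguous.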

Client-Side Polling Loop

The frontend polls the API and shows progress to users:

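async function pollForResults(jobId: string) {
  const maxAttempts = 300; // 300 ticks at 1 second each = 5 minutes max
  let attempts = 0;
  let inFlight = false; // Guards against overlapping requests when a poll takes over 1 second

  const interval = setInterval(async () => {
    attempts++;
    if (attempts > maxAttempts) {
      clearInterval(interval);
      showError('Extraction timed out');
      return;
    }
    if (inFlight) return; // Previous request has not finished yet

    inFlight = true;
    try {
      const response = await fetch(`/api/poll/${jobId}`);
      const result = await response.json();

      if (result.status === 'completed') {
        clearInterval(interval);
        showResults(result.data);
      } else if (result.status === 'failed') {
        clearInterval(interval);
        showError(result.error);
      }
      // Otherwise keep polling...
    } catch (error) {
      // Network or JSON parsing error: stop polling instead of leaving an unhandled rejection
      clearInterval(interval);
      showError(error instanceof Error ? error.message : 'Polling failed');
    } finally {
      inFlight = false;
    }
  }, 1000);
}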
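A fixed one-second interval is easy to follow, but a promise-based loop with a growing delay is a common alternative: requests never overlap, and long jobs generate fewer status checks. The sketch below is one way to write it. The names pollWithBackoff, PollResult, and PollOptions, along with the injected check and sleep functions, are hypothetical and not part of the demo or the SDK.

```typescript
interface PollResult<T> {
  status: string;
  data?: T;
  error?: string;
}

interface PollOptions {
  initialDelayMs?: number;
  maxDelayMs?: number;
  timeoutMs?: number;
  // Injectable so tests can skip real waiting
  sleep?: (ms: number) => Promise<void>;
}

// Hypothetical helper: calls `check` until the job completes, fails, or the
// total scheduled wait exceeds `timeoutMs`. The delay grows 1.5x per attempt,
// capped at `maxDelayMs`.
async function pollWithBackoff<T>(
  check: () => Promise<PollResult<T>>,
  options: PollOptions = {}
): Promise<T | undefined> {
  const {
    initialDelayMs = 1000,
    maxDelayMs = 10000,
    timeoutMs = 5 * 60 * 1000,
    sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
  } = options;

  let delay = initialDelayMs;
  let waited = 0;

  while (waited <= timeoutMs) {
    const result = await check();
    if (result.status === 'completed') return result.data;
    if (result.status === 'failed') {
      throw new Error(result.error ?? 'Extraction failed');
    }

    // Any other status (pending, crawling, processing) means keep waiting
    await sleep(delay);
    waited += delay;
    delay = Math.min(delay * 1.5, maxDelayMs);
  }

  throw new Error('Extraction timed out');
}
```

In the browser, check could simply wrap the same endpoint used above, for example () => fetch(`/api/poll/${jobId}`).then((r) => r.json()), with showResults and showError called from the surrounding try/catch.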

Response Example

When extraction completes, you get structured data with glossary terms and measurement conversions:

{
  "title": "Build a Simple Bookshelf",
  "overview": "Learn how to build a sturdy wooden bookshelf using basic tools...",
  "image_url": "https://example.com/bookshelf.jpg",
  "author": "DIY Workshop",
  "author_url": "https://example.com/author",
  "difficulty": "Beginner",
  "estimated_time": "4-6 hours",
  "glossary": [
    {
      "term": "miter cut",
      "definition": "An angled cut across the width of a board, typically at 45 degrees",
      "context": "Used for joining corners of the shelf frame"
    }
  ],
  "materials": [
    {
      "name": "Pine boards",
      "quantity": "4",
      "notes": "1x10 lumber",
      "measurement": {
        "original": "6 feet",
        "metric": "183 cm",
        "imperial": "6 ft"
      }
    }
  ],
  "tools": [
    {
      "name": "Circular saw",
      "notes": "Or hand saw",
      "required": true
    },
    {
      "name": "Clamps",
      "notes": "For holding pieces while drilling",
      "required": false
    }
  ],
  "steps": [
    {
      "step_number": 1,
      "title": "Cut the Side Panels",
      "instructions": "Measure and mark your pine boards at 36 inches...",
      "tips": "Clamp a straight edge as a guide for cleaner cuts",
      "image_urls": ["https://example.com/step1.jpg"],
      "measurements": [
        {
          "original": "36 inches",
          "metric": "91.4 cm",
          "imperial": "36 in"
        }
      ]
    }
  ],
  "tags": ["woodworking", "furniture", "beginner"]
}
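Once the data is in this shape, features such as the materials checklist reduce to simple transformations. As an illustration only, the sketch below turns the materials array into checklist rows whose labels show both unit systems. ChecklistItem, toChecklist, and the label format are assumptions; the demo's own checklist code lives in the repository.

```typescript
interface Measurement {
  original: string;
  metric: string;
  imperial: string;
}

interface Material {
  name: string;
  quantity?: string;
  notes?: string;
  measurement?: Measurement;
}

// Hypothetical checklist row: a display label plus a "have it" flag
interface ChecklistItem {
  label: string;
  have: boolean;
}

// Build one unchecked checklist row per material.
function toChecklist(materials: Material[]): ChecklistItem[] {
  return materials.map((m) => {
    const parts = [m.quantity ? `${m.quantity} x ${m.name}` : m.name];
    if (m.measurement) {
      parts.push(`(${m.measurement.metric} / ${m.measurement.imperial})`);
    }
    return { label: parts.join(' '), have: false };
  });
}

// Usage with the material from the response example above
const items = toChecklist([
  {
    name: 'Pine boards',
    quantity: '4',
    notes: '1x10 lumber',
    measurement: { original: '6 feet', metric: '183 cm', imperial: '6 ft' },
  },
]);
console.log(items[0].label); // "4 x Pine boards (183 cm / 6 ft)"
```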

When to Use Async vs Sync

Async Crawler (this demo)    | Sync Extract (Recipe App)
-----------------------------+--------------------------
Start job, poll for results  | Single API call
Best for complex/long pages  | Best for quick pages
Shows progress to users      | Blocks until complete
Better timeout handling      | Simpler error handling

Use the async crawler pattern when extracting from complex pages with lots of content, or when you want to show extraction progress to users.

Tech Stack

Source Code

The complete source code is available in the refyne-demos repository.

Key files:

  • diyviewer/src/lib/refyne.ts - Refyne SDK integration and schema
  • diyviewer/src/lib/db.ts - Database operations
  • diyviewer/src/pages/add.astro - Tutorial extraction page
  • diyviewer/src/pages/tutorial/[id].astro - Tutorial detail page
  • diyviewer/src/pages/checklist.astro - Materials checklist

Try It Yourself

  1. Visit diyviewer-demo.refyne.uk
  2. Click "Add Project"
  3. Paste a tutorial URL from Instructables or Make:
  4. Watch the extraction progress
  5. Save to your project list and track materials with the checklist