
Scrape Task

Feature Overview

The Web Scrape Task is a workflow component that automatically extracts content from websites and PDF documents. Use it to pull information from external sources directly into your GTM Engine workflows instead of copying and pasting it by hand.

The Scrape Task can extract content from:

  • Website URLs
  • Hosted PDF documents
  • Search result pages

Benefits to the User

  • Eliminate Manual Research: Automatically pull relevant information from websites and documents without switching between tabs or applications
  • Enhanced Context: Add valuable external information to your sales workflows for better decision-making
  • Time Savings: Reduce the administrative burden of gathering information, giving you back more time for actual selling
  • Improved Research Capabilities: Easily incorporate competitive intelligence, industry news, or product information into your workflows
  • Seamless Integration: Use scraped content as input for other workflow tasks or CRM updates

Accessing the Feature

The Web Scrape Task is located in the Workflows section of GTM Engine:

  1. Navigate to the Workflows section in the GTM Engine main navigation
  2. Create a new workflow or edit an existing one
  3. In the workflow editor, look for the Web Scrape Task option in the task selection menu
  4. Click to add the Web Scrape Task to your workflow

Step-by-Step Usage Guide

Basic Setup

  1. Add the Scrape Task to your workflow by selecting it from the task options
  2. Enter the URL you wish to scrape in the designated field:
    • For websites: Enter the complete URL (e.g., https://www.example.com/page)
    • For PDFs: Enter the URL where the PDF is hosted (e.g., https://www.example.com/document.pdf)
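
If you assemble URLs outside GTM Engine before pasting them into the task (for example, from a spreadsheet of prospects), a quick pre-check can catch incomplete entries. The following is a minimal sketch, not part of GTM Engine: the function name classify_scrape_url is hypothetical, and the PDF check assumes the URL path ends in ".pdf", which is not true of every hosted PDF.

```python
from urllib.parse import urlparse


def classify_scrape_url(url: str) -> str:
    """Return 'pdf' or 'website' for a complete http(s) URL.

    Raises ValueError when the scheme or host is missing, since the
    task expects a complete URL such as https://www.example.com/page.
    """
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Not a complete URL: {url!r}")
    # Assumption: a hosted PDF is recognised by its ".pdf" path suffix.
    if parsed.path.lower().endswith(".pdf"):
        return "pdf"
    return "website"


if __name__ == "__main__":
    for candidate in (
        "https://www.example.com/page",
        "https://www.example.com/document.pdf",
    ):
        print(candidate, "->", classify_scrape_url(candidate))
```

An entry such as www.example.com/page, with no https:// prefix, raises ValueError under this check.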

Configuring Advanced Options

Click the Advanced Options dropdown to access additional configuration settings:

  1. Render JavaScript: Toggle this option ON if the target website requires JavaScript to display content properly
  2. Block Ads: Toggle this option ON to prevent ads from being included in the scraped content
  3. Wait for Selector: Enter how long, in milliseconds, the task should wait for specific elements to load before scraping
  4. Master Timeout: Set a maximum time (in milliseconds) for the entire scraping operation to complete before timing out
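
To reason about how these four settings fit together, it can help to write them down as a small data structure. This is an illustrative sketch only: the options are set in the GTM Engine workflow editor, and the class name, field names, default values, and the rule that the wait must be shorter than the master timeout are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScrapeOptions:
    """Hypothetical mirror of the Advanced Options panel."""

    render_javascript: bool = False   # Render JavaScript toggle
    block_ads: bool = False           # Block Ads toggle
    wait_for_selector_ms: int = 0     # Wait for Selector, in milliseconds
    master_timeout_ms: int = 30_000   # Master Timeout; default is assumed

    def validate(self) -> None:
        """Raise ValueError if the timing values cannot work together."""
        if self.wait_for_selector_ms < 0:
            raise ValueError("wait_for_selector_ms must not be negative")
        if self.master_timeout_ms <= 0:
            raise ValueError("master_timeout_ms must be positive")
        # Assumption: the wait counts against the overall time budget,
        # so it has to be shorter than the master timeout.
        if self.wait_for_selector_ms >= self.master_timeout_ms:
            raise ValueError("wait must be shorter than the master timeout")


if __name__ == "__main__":
    options = ScrapeOptions(
        render_javascript=True,
        wait_for_selector_ms=2_000,
        master_timeout_ms=15_000,
    )
    options.validate()
    print(options)
```

Under that assumption, a 20,000 ms wait paired with a 15,000 ms master timeout would be rejected, because the operation would time out before the wait finished.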

Running the Task

  1. Save your workflow configuration
  2. Execute the workflow either manually or based on your configured triggers
  3. The scraped content will be available as output from this task, which can be used in subsequent workflow steps

Tips and Best Practices

  • Test Before Relying: Always test your scrape task with a sample URL before incorporating it into critical workflows
  • Be Specific: Target specific pages rather than general websites to get more relevant content
  • Respect Terms of Service: Ensure you have permission to scrape content from the websites you're targeting
  • Optimize Timeouts: Adjust the wait times to match the complexity of the website; larger sites may need longer timeouts
  • Use with Search Tasks: Combine the Scrape Task with search operations to automatically gather information on prospects or competitors
  • Process the Output: Consider using text processing tasks after scraping to extract only the most relevant information
  • Monitor Performance: Regularly check that your scrape tasks are working as expected, as website structures can change
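
The "Process the Output" tip can be pictured with a simple keyword filter. This is a rough sketch of the idea in standalone Python, not a GTM Engine task: the function name keep_relevant_paragraphs is hypothetical, and it assumes the scraped content is plain text with paragraphs separated by blank lines.

```python
def keep_relevant_paragraphs(text: str, keywords: list[str]) -> list[str]:
    """Return the paragraphs that mention at least one keyword.

    Matching is case-insensitive and uses plain substring search.
    """
    wanted = [keyword.lower() for keyword in keywords]
    paragraphs = [part.strip() for part in text.split("\n\n")]
    return [
        paragraph
        for paragraph in paragraphs
        if paragraph and any(word in paragraph.lower() for word in wanted)
    ]


if __name__ == "__main__":
    scraped = (
        "Acme announced new pricing tiers this quarter.\n\n"
        "Subscribe to our newsletter.\n\n"
        "The Enterprise plan now includes priority support."
    )
    for paragraph in keep_relevant_paragraphs(scraped, ["pricing", "plan"]):
        print(paragraph)
```

Substring matching is deliberately crude; it keeps the example short, at the cost of occasionally matching a keyword inside a longer word.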

Note: When scraping websites, be aware that some sites may have measures in place to prevent automated scraping. Always ensure your use complies with the website's terms of service and relevant regulations.

By leveraging the Web Scrape Task, you can automate information gathering that would otherwise require manual effort, allowing you to focus on high-value selling activities while GTM Engine handles the data collection process.

