
Alphalens Bot

The Alphalens Bot is a respectful web crawling service that indexes companies and their offerings to power more effective business discovery. Our bot operates with the user-agent identifier alphalens-bot and follows web standards and best practices for automated content retrieval.

Our Mission

We're building a comprehensive index of companies that captures the full scope of their products and services, not just surface-level descriptions. Traditional business indexes often reduce complex offerings to single-line summaries, making it nearly impossible for potential customers to discover relevant solutions through semantic search.
The result? Your next customer might be searching for exactly what you offer and still fail to find you because your business is inadequately indexed.

How We Index Your Website

Our crawling process follows these steps (a sketch of the flow appears after the list):

Compliance Check: We first examine your robots.txt file and sitemap to ensure we have permission to access your content. We also follow crawl delays.

Content Retrieval: We fetch HTML content from approved pages without attempting to interact with forms, buttons, or bypass security measures like CAPTCHAs.

Respectful Crawling: We honor rate limits and crawling restrictions specified in your site's configuration.
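Below is a minimal sketch of this flow, using Python's standard library for illustration. The function name, URL handling, and simple loop are a simplification for this page, not a description of our production crawler.

import time
import urllib.request
import urllib.robotparser

USER_AGENT = "alphalens-bot"

def crawl_site(site_root, paths):
    # 1. Compliance check: read robots.txt before touching any page.
    robots = urllib.robotparser.RobotFileParser()
    robots.set_url(site_root.rstrip("/") + "/robots.txt")
    robots.read()
    delay = robots.crawl_delay(USER_AGENT) or 0

    pages = {}
    for path in paths:
        url = site_root.rstrip("/") + path
        # Skip anything robots.txt disallows for our user-agent.
        if not robots.can_fetch(USER_AGENT, url):
            continue
        # 2. Content retrieval: a plain GET; no form submission, no scripting.
        request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
        with urllib.request.urlopen(request) as response:
            pages[url] = response.read()
        # 3. Respectful crawling: wait out any declared crawl delay.
        time.sleep(delay)
    return pages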

Technical Details

User-Agent

All requests from our service use the user-agent: alphalens-bot
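If you filter traffic by user-agent in your own logs or server rules, requests from our service can be identified by that identifier in the standard request header, for example:

User-Agent: alphalens-bot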

Robots.txt Compliance

The Alphalens Bot strictly adheres to robots.txt directives. Before crawling any website, we check for and respect the permissions specified in your robots.txt file. If a Crawl-delay is present in your robots.txt file, we pause for that amount of time between requests.
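For example, the following robots.txt entry (using the widely supported, though non-standard, Crawl-delay extension) tells our bot to wait ten seconds between requests:

User-agent: alphalens-bot
Crawl-delay: 10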

URL Discovery Process

With Sitemap: We begin by reading your sitemap and visiting listed URLs while respecting any priority weights you've assigned.

Without Sitemap: We start from your homepage and use a breadth-first approach to discover additional pages (sketched after this list).

Access Control: We automatically skip any URLs that are blocked by your robots.txt file.
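For the no-sitemap case, here is a simplified breadth-first sketch, again in Python. The can_fetch and fetch_html callables are placeholders for the robots.txt check and HTTP fetch shown earlier, and the regex-based link extraction is deliberately naive compared with real HTML parsing.

from collections import deque
from urllib.parse import urljoin, urlparse
import re

def discover_urls(homepage, can_fetch, fetch_html, max_pages=100):
    # Breadth-first traversal starting at the homepage, staying on the same host.
    host = urlparse(homepage).netloc
    queue = deque([homepage])
    seen = {homepage}
    discovered = []

    while queue and len(discovered) < max_pages:
        url = queue.popleft()
        if not can_fetch(url):  # Access control: skip robots.txt-blocked URLs.
            continue
        discovered.append(url)
        html = fetch_html(url)
        # Collect same-host links and queue any we have not seen yet.
        for href in re.findall(r'href="([^"]+)"', html):
            link = urljoin(url, href)
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return discovered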

Public IP Addresses

You can find the public IP addresses our bot uses here.

Managing Bot Access

Blocking the Alphalens Bot

To prevent our bot from accessing your website, update your robots.txt file with the following directive:

User-agent: alphalens-bot
Disallow: /

Ensure your robots.txt file is accessible at the standard location (e.g., example.com/robots.txt).
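If you want to confirm the rule behaves as expected, one quick check (not an official tool of ours) is Python's built-in robots.txt parser:

import urllib.robotparser

robots = urllib.robotparser.RobotFileParser()
robots.set_url("https://example.com/robots.txt")  # Replace with your own domain.
robots.read()

# Prints False once the Disallow rule above is in place.
print(robots.can_fetch("alphalens-bot", "https://example.com/any-page"))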

Partial Access Control

You can also restrict access to specific sections of your site:

User-agent: alphalens-bot
Disallow: /private/
Disallow: /admin/

Contact Information

For questions about our crawling practices or to request specific accommodations, please contact our team at contact@alphalensbot.com.