Alphalens Bot
The Alphalens Bot is a respectful web crawling service that indexes companies and their offerings to power more effective business discovery. Our bot operates with the user-agent identifier alphalens-bot and follows web standards and best practices for automated content retrieval.
We're building a comprehensive index of companies that captures the full scope of their products and services, not just surface-level descriptions. Traditional business indexes often reduce complex offerings to single-line summaries, making it nearly impossible for potential customers to discover relevant solutions through semantic search. The result? Your next customer might be searching for exactly what you offer but failing to find you due to inadequate indexing of your business.
Our crawling process follows these steps:
Compliance Check: We first examine your robots.txt file and sitemap to ensure we have permission to access your content. We also follow crawl delays.
Content Retrieval: We fetch HTML content from approved pages. We do not interact with forms or buttons, and we do not attempt to bypass security measures such as CAPTCHAs.
Respectful Crawling: We honor rate limits and crawling restrictions specified in your site's configuration.
All requests from our service use the user-agent: alphalens-bot
The Alphalens Bot strictly adheres to robots.txt directives. Before crawling any website, we check for and respect the permissions specified in your robots.txt file. If your robots.txt file sets a crawl delay, we pause for that amount of time between requests.
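As an illustration of the behavior described above, here is a minimal sketch of a robots.txt check with a crawl delay, written with Python's standard urllib.robotparser module. It is not the Alphalens Bot's actual implementation; the function name plan_requests and the example URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

USER_AGENT = "alphalens-bot"


def plan_requests(robots_txt: str, urls: list[str]) -> tuple[list[str], float]:
    """Return the URLs the bot may fetch and the pause (seconds) between requests.

    Hypothetical helper for illustration only.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    # Keep only URLs that robots.txt permits for this user-agent.
    allowed = [url for url in urls if parser.can_fetch(USER_AGENT, url)]

    # crawl_delay() returns None when no Crawl-delay applies to this agent.
    delay = parser.crawl_delay(USER_AGENT)
    return allowed, float(delay) if delay is not None else 0.0


if __name__ == "__main__":
    robots = "User-agent: alphalens-bot\nCrawl-delay: 5\nDisallow: /private/\n"
    candidates = ["https://example.com/", "https://example.com/private/report"]
    allowed, delay = plan_requests(robots, candidates)
    print(allowed, delay)
```

A crawler built this way would sleep for the returned delay between consecutive requests to the same site.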
With Sitemap: We begin by reading your sitemap and visiting listed URLs while respecting any priority weights you've assigned.
Without Sitemap: We start from your homepage and use a breadth-first approach to discover additional pages.
Access Control: We automatically skip any URLs that are blocked by your robots.txt file.
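The three cases above can be sketched roughly as follows. This is an assumption-laden illustration, not our production code: it interprets "respecting priority weights" as visiting higher-priority sitemap entries first, and the names urls_from_sitemap, discover_breadth_first, get_links, and is_allowed are hypothetical. Link extraction and the robots.txt check are passed in as functions so the sketch runs without network access.

```python
import xml.etree.ElementTree as ET
from collections import deque
from typing import Callable, Iterable

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
DEFAULT_PRIORITY = 0.5  # default defined by the sitemap protocol


def urls_from_sitemap(sitemap_xml: str) -> list[str]:
    """Return sitemap URLs, highest <priority> first."""
    root = ET.fromstring(sitemap_xml)
    entries = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        raw = url.findtext("sm:priority", default="", namespaces=SITEMAP_NS).strip()
        priority = float(raw) if raw else DEFAULT_PRIORITY
        if loc:
            entries.append((priority, loc))
    # The sort is stable, so entries with equal priority keep sitemap order.
    entries.sort(key=lambda entry: entry[0], reverse=True)
    return [loc for _, loc in entries]


def discover_breadth_first(
    homepage: str,
    get_links: Callable[[str], Iterable[str]],
    is_allowed: Callable[[str], bool],
    max_pages: int = 100,
) -> list[str]:
    """Visit pages level by level from the homepage, skipping blocked URLs."""
    seen = {homepage}
    queue = deque([homepage])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if not is_allowed(url):
            continue  # blocked by robots.txt: never fetched, never expanded
        visited.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

Using a queue rather than a stack is what makes the traversal breadth-first: pages one click from the homepage are visited before pages two clicks away.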
You can find the public IP addresses our bot uses here.
To prevent our bot from accessing your website, update your robots.txt file with the following directive:
User-agent: alphalens-bot
Disallow: /
Ensure your robots.txt file is accessible at the standard location (e.g., example.com/robots.txt).
You can also restrict access to specific sections of your site:
User-agent: alphalens-bot
Disallow: /private/
Disallow: /admin/
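If you want to sanity-check rules like these before publishing them, one option is to evaluate them locally with Python's standard urllib.robotparser module. The sketch below is only a suggestion; the helper name blocked_paths and the sample paths are hypothetical, and the standard-library parser may differ in edge cases from any particular crawler's matching.

```python
from urllib.robotparser import RobotFileParser


def blocked_paths(robots_txt: str, paths: list[str], agent: str = "alphalens-bot") -> list[str]:
    """Return the paths that the given robots.txt text disallows for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [path for path in paths if not parser.can_fetch(agent, path)]


if __name__ == "__main__":
    rules = "User-agent: alphalens-bot\nDisallow: /private/\nDisallow: /admin/\n"
    sample = ["/", "/products", "/private/report", "/admin/"]
    print(blocked_paths(rules, sample))
```

Running the check with a different agent name shows that rules scoped to alphalens-bot leave other crawlers unaffected.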
For questions about our crawling practices or to request specific accommodations, please contact our team at contact@alphalensbot.com.