Introduction

Welcome to the streeteasy-scraper documentation. This project collects data from StreetEasy.com and related sources, orchestrating the scraping process and storing the results in a PostgreSQL database. Key features include:

  • Address Seeding: Initialize your database with a list of target addresses from external sources.
  • Flexible Scraping Methods: This project offers two distinct methods for scraping:
    • Synchronous: A simpler, multi-threaded approach for direct requests.
    • Asynchronous: A more robust and scalable workflow leveraging BrightData’s callbacks for high-volume or longer-running jobs.
  • Webhook Integration: A dedicated service to handle callbacks from BrightData for efficient asynchronous job processing (see the sketch after this list).
  • Dockerized Deployment: Easily deploy and manage all project components using Docker and Docker Compose.
  • Cloudflared Tunneling: Securely expose the webhook to the public internet for asynchronous workflows.
  • Structured Data Output: Scraped data is stored in a PostgreSQL database for easy access and analysis.
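
The webhook service is worth a closer look. As a rough illustration (the endpoint path, payload handling, and the persistence step here are assumptions, not the project's actual API), a BrightData callback receiver in FastAPI can be as small as:

```python
# Minimal sketch of a BrightData callback receiver. The endpoint path,
# payload shape, and the persistence step are illustrative assumptions.
from fastapi import FastAPI, Request

app = FastAPI()

@app.post("/webhook")
async def receive_callback(request: Request):
    payload = await request.json()  # results pushed by BrightData when a job finishes
    records = payload if isinstance(payload, list) else [payload]
    # The real service would validate these records and write them to PostgreSQL.
    return {"status": "accepted", "received": len(records)}
```

Because BrightData pushes results to this endpoint when a job completes, the scraper itself never has to poll for long-running jobs.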

Build StreetEasy datasets for analysis. Choose the method that fits your needs: sync for simplicity, async for scale and reliability with BrightData.
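
To make the trade-off concrete, here is a minimal sketch of the synchronous path: a thread pool issuing direct requests through a proxy. The proxy URL, credentials, target URLs, and the fetch_listing helper are placeholders for illustration only, not the project's actual code.

```python
# Illustrative only: a thread pool issuing direct requests, which is roughly
# what the synchronous method does. Proxy and URLs are placeholders.
from concurrent.futures import ThreadPoolExecutor

import requests

PROXIES = {"https": "http://USER:PASS@proxy.example.com:22225"}  # placeholder Web Unlocker proxy

def fetch_listing(url: str) -> str:
    """Fetch one listing page through the proxy and return its HTML."""
    response = requests.get(url, proxies=PROXIES, timeout=30)
    response.raise_for_status()
    return response.text

urls = ["https://streeteasy.com/building/example-address"]  # seeded addresses would supply these
with ThreadPoolExecutor(max_workers=5) as pool:
    pages = list(pool.map(fetch_listing, urls))
```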

The streeteasy-scraper is built using a modern Python stack and leverages powerful infrastructure tools:

  • Python: The primary programming language.
  • uv: A fast Python package installer and resolver.
  • FastAPI: For building the asynchronous webhook service.
  • SQLAlchemy: ORM for interacting with the database (see the sketch after this list).
  • PostgreSQL: The chosen database for storing addresses and scraped data.
  • Docker & Docker Compose: For containerization and orchestration.
  • Cloudflared: To expose the webhook publicly via secure tunnels.
  • BrightData WebUnlocker: A third-party service used for web scraping.
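
As a rough sketch of how SQLAlchemy and PostgreSQL fit together here (the Address model, table name, and connection string below are illustrative assumptions, not the project's actual schema):

```python
# Illustrative SQLAlchemy model and session usage; the project's real schema,
# table names, and connection settings are not reproduced here.
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Address(Base):
    __tablename__ = "addresses"  # hypothetical table name
    id = Column(Integer, primary_key=True)
    street = Column(String, nullable=False)
    unit = Column(String)

# Placeholder connection string; a real deployment would read this from the environment.
engine = create_engine("postgresql+psycopg2://user:password@localhost:5432/streeteasy")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Address(street="123 Example Ave", unit="4B"))
    session.commit()
```

In a Dockerized deployment, the connection string would typically come from environment variables set in Docker Compose rather than being hard-coded.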

This documentation starts with a quickstart for the synchronous profile. We’ll then cover the system’s design and the asynchronous setup in detail. Let’s begin with installation.