StreetEasy Scraper

Get started self-hosting your own StreetEasy Scraper!

This project provides a powerful and adaptable framework for gathering real estate data from StreetEasy. Whether for analysis, research, or building custom applications, the StreetEasy Scraper simplifies the data collection process with features like:

BrightData Integration

Leverage BrightData’s Web Unlocker to handle complex website defenses like CAPTCHAs at scale.

Flexible Scraping Methods

Choose the method that fits your needs: Synchronous for immediate results with a simpler setup, or Asynchronous for scale and higher success rates.

Concurrent Sync Processing

The synchronous profile leverages multithreading to process multiple addresses in parallel for faster results at smaller scale.

Efficient Async Workflow

An asynchronous workflow with a FastAPI webhook and background processing handles BrightData callbacks for scalable and reliable data collection.

Secure Tunneling

Use Cloudflared tunnels to securely expose your webhook to the public internet without opening inbound firewall ports.

Enhanced Security

Restrict webhook access to Cloudflare IP ranges and encrypt traffic with SSL/TLS to protect your webhook endpoint.

Robust Database

Store and manage your addresses and scraped data reliably using a PostgreSQL database.

Simplified Deployment

Easily deploy and manage all project services using Docker Compose with distinct profiles for synchronous and asynchronous modes.

The StreetEasy Scraper operates by:

  1. Providing Addresses: Either seed the database with addresses from nyc.gov using the built-in tool, or add your own list of addresses to the database.
  2. Scraping Data: Run either the synchronous or asynchronous scraper profile to collect data from StreetEasy for the provided addresses, using BrightData Web Unlocker.
  3. Storing Results: Scraped data is saved directly into the PostgreSQL database, associated with the original address.
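
The three steps above can be sketched as a small pipeline. This is a minimal illustration, not the project's actual code: `fetch_listing` is a hypothetical stand-in for a BrightData Web Unlocker request, the table and column names are assumptions, and an in-memory SQLite database stands in for PostgreSQL so the example runs without any services.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def fetch_listing(address: str) -> dict:
    """Hypothetical stand-in for a Web Unlocker request; returns fake data."""
    return {"address": address, "html_length": len(address) * 100}


def run_sync_scrape(conn: sqlite3.Connection, max_workers: int = 4) -> int:
    # Step 1: read the addresses that were seeded or added by hand.
    rows = conn.execute("SELECT id, address FROM addresses ORDER BY id").fetchall()
    ids = [row[0] for row in rows]
    addresses = [row[1] for row in rows]

    # Step 2: scrape in parallel; map() yields results in input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(fetch_listing, addresses))

    # Step 3: store each result against the address it came from.
    conn.executemany(
        "INSERT INTO listings (address_id, html_length) VALUES (?, ?)",
        [(address_id, result["html_length"]) for address_id, result in zip(ids, results)],
    )
    conn.commit()
    return len(results)


# Assumed schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE addresses (id INTEGER PRIMARY KEY, address TEXT)")
conn.execute(
    "CREATE TABLE listings (address_id INTEGER REFERENCES addresses(id), html_length INTEGER)"
)
conn.executemany(
    "INSERT INTO addresses (address) VALUES (?)",
    [("1 Main St",), ("22 Broadway",)],
)
stored = run_sync_scrape(conn)
```

Only the main thread touches the database connection here; the worker threads perform the network-style work and hand their results back, which keeps the storage step simple.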

The asynchronous workflow adds two components: a dedicated webhook that receives callbacks from BrightData when a job is ready, and a background process that fetches and processes the results.
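
The split between the webhook and the background process can be illustrated with a framework-free sketch. It uses only the Python standard library rather than FastAPI, and the `job_id` payload field, `fetch_result`, and the in-memory `processed` dictionary are all hypothetical names chosen for the example, not BrightData's callback format or the project's schema.

```python
import queue
import threading

jobs: queue.Queue = queue.Queue()  # callbacks waiting to be processed
processed: dict = {}               # stand-in for the results table


def fetch_result(job_id: str) -> str:
    """Hypothetical stand-in for downloading a finished job's response."""
    return f"<html>result for {job_id}</html>"


def handle_callback(payload: dict) -> dict:
    """What a webhook route would do: enqueue the job and acknowledge at once."""
    jobs.put(payload["job_id"])
    return {"status": "accepted"}


def worker() -> None:
    """Background loop: fetch and store each result off the request path."""
    while True:
        job_id = jobs.get()
        if job_id is None:  # sentinel value asks the worker to stop
            jobs.task_done()
            break
        processed[job_id] = fetch_result(job_id)
        jobs.task_done()


thread = threading.Thread(target=worker, daemon=True)
thread.start()

ack = handle_callback({"job_id": "job-1"})
handle_callback({"job_id": "job-2"})

jobs.join()     # wait until both callbacks have been processed
jobs.put(None)  # then shut the worker down
thread.join()
```

The point of the design is that the callback handler returns immediately, so slow fetching or parsing never delays the acknowledgement sent back to the caller.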

Our documentation will guide you through every step, from initial setup to running scraping jobs and accessing your data.