Configuration

Configuration is managed through environment variables loaded from a .env file in the project’s root directory.

In the root directory of the streeteasy-scraper project, create a new file named .env. You can copy the contents of the .env.example file as a starting point:

cp .env.example .env

Next, edit the .env file and fill in the required values.

Required Environment Variables for Sync Profile

Configure these variables in your .env file:

Connection details for the local PostgreSQL database:

POSTGRES_USER=user
POSTGRES_PASSWORD=password
POSTGRES_DB=user
POSTGRES_HOST=db
POSTGRES_PORT=5432
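
To illustrate how these five values fit together, here is a minimal sketch that assembles them into a PostgreSQL connection URL. The helper name build_postgres_url is hypothetical and not part of the project, which may build its connection differently.

```python
from urllib.parse import quote


def build_postgres_url(env):
    """Assemble a postgresql:// URL from the POSTGRES_* variables (hypothetical helper)."""
    # Percent-encode the parts that may contain reserved characters such as "@" or "/".
    user = quote(env["POSTGRES_USER"], safe="")
    password = quote(env["POSTGRES_PASSWORD"], safe="")
    database = quote(env["POSTGRES_DB"], safe="")
    host = env["POSTGRES_HOST"]
    port = int(env["POSTGRES_PORT"])  # raises ValueError if the port is not numeric
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"


# Same values as the example block above.
example_env = {
    "POSTGRES_USER": "user",
    "POSTGRES_PASSWORD": "password",
    "POSTGRES_DB": "user",
    "POSTGRES_HOST": "db",
    "POSTGRES_PORT": "5432",
}
print(build_postgres_url(example_env))
# postgresql://user:password@db:5432/user
```

In a running process, passing os.environ instead of example_env would read the values loaded from the .env file.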

Credentials for the synchronous BrightData Web Unlocker service.

  • BRIGHTDATA_USERNAME: Your BrightData Web Unlocker username.
  • BRIGHTDATA_PASSWORD: Your BrightData Web Unlocker password.
  • BRIGHTDATA_HOST: BrightData proxy hostname (e.g., brd.superproxy.io).

BRIGHTDATA_USERNAME=your_brightdata_username
BRIGHTDATA_PASSWORD=your_brightdata_password
BRIGHTDATA_HOST=brd.superproxy.io
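
As a rough sketch of how proxy credentials like these are typically combined, the snippet below builds a single proxy URL from the three values. The helper build_proxy_url is hypothetical, and the port is an assumption: this page does not define one, so it is passed in explicitly and the 8080 in the example is only a placeholder.

```python
from urllib.parse import quote


def build_proxy_url(env, port):
    """Combine the BRIGHTDATA_* credentials into a proxy URL (hypothetical helper).

    `port` is an explicit argument because this page does not define one.
    """
    # Percent-encode credentials so reserved characters cannot break the URL.
    username = quote(env["BRIGHTDATA_USERNAME"], safe="")
    password = quote(env["BRIGHTDATA_PASSWORD"], safe="")
    host = env["BRIGHTDATA_HOST"]
    return f"http://{username}:{password}@{host}:{port}"


example_env = {
    "BRIGHTDATA_USERNAME": "your_brightdata_username",
    "BRIGHTDATA_PASSWORD": "your_brightdata_password",
    "BRIGHTDATA_HOST": "brd.superproxy.io",
}
# 8080 is a placeholder port, not a documented value.
proxy_url = build_proxy_url(example_env, 8080)
```

How the project actually passes these credentials to its HTTP client is not described here, so treat this only as an illustration of the relationship between the three variables.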

Async/Shared Variables (Required by Config)

The project’s configuration loader requires these variables even though they are used mainly by the asynchronous profile. Supply the BrightData values from your account; for the Cloudflare token, a placeholder is sufficient when you only run the synchronous profile.

  • BRIGHTDATA_API_TOKEN: Your BrightData API token.
  • BRIGHTDATA_CUSTOMER_ID: Your BrightData customer ID.
  • BRIGHTDATA_ZONE: Your BrightData Web Unlocker zone name.
  • CLOUDFLARE_TUNNEL_TOKEN: Your Cloudflare Tunnel token. Use dummy_token for the sync quickstart.

BRIGHTDATA_API_TOKEN=your_brightdata_api_token
BRIGHTDATA_CUSTOMER_ID=your_brightdata_customer_id
BRIGHTDATA_ZONE=your_brightdata_zone
# Placeholder for sync quickstart
CLOUDFLARE_TUNNEL_TOKEN=dummy_token
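
Because every variable listed so far must be present for the configuration to load, it can help to check for gaps before starting anything. The following is a minimal sketch of such a check; find_missing and REQUIRED_VARIABLES are hypothetical names, and the project's own loader may validate differently.

```python
# Variable names taken from the sections above.
REQUIRED_VARIABLES = (
    "POSTGRES_USER",
    "POSTGRES_PASSWORD",
    "POSTGRES_DB",
    "POSTGRES_HOST",
    "POSTGRES_PORT",
    "BRIGHTDATA_USERNAME",
    "BRIGHTDATA_PASSWORD",
    "BRIGHTDATA_HOST",
    "BRIGHTDATA_API_TOKEN",
    "BRIGHTDATA_CUSTOMER_ID",
    "BRIGHTDATA_ZONE",
    "CLOUDFLARE_TUNNEL_TOKEN",
)


def find_missing(env):
    """Return the required variable names that are absent or blank in `env`."""
    return [name for name in REQUIRED_VARIABLES if not env.get(name, "").strip()]


# Example: everything set except the zone, which is left empty.
sample_env = {name: "placeholder" for name in REQUIRED_VARIABLES}
sample_env["BRIGHTDATA_ZONE"] = ""
print(find_missing(sample_env))
# ['BRIGHTDATA_ZONE']
```

Calling find_missing(os.environ) in a process that has loaded the .env file would report which entries still need values.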

Borough filter applied to the addresses from the nyc.gov list. The default is MANHATTAN.

  • BOROUGHS_TO_KEEP: Comma-separated borough names (e.g., MANHATTAN,BROOKLYN).

BOROUGHS_TO_KEEP=MANHATTAN
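
To make the comma-separated format concrete, here is a small sketch of how such a value could be parsed, falling back to MANHATTAN when nothing is set. The function parse_boroughs is hypothetical, and trimming whitespace and upper-casing the names are assumptions rather than documented behaviour of the project.

```python
def parse_boroughs(raw, default="MANHATTAN"):
    """Split a comma-separated BOROUGHS_TO_KEEP value into a list of names."""
    # Assumption: surrounding whitespace is ignored and names are upper-cased.
    names = [part.strip().upper() for part in (raw or "").split(",")]
    # Drop empty entries left by stray or trailing commas.
    names = [name for name in names if name]
    return names or [default]


print(parse_boroughs("MANHATTAN,BROOKLYN"))
# ['MANHATTAN', 'BROOKLYN']
print(parse_boroughs(None))
# ['MANHATTAN']
```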

Once you have filled in the required environment variables in your .env file, you are ready to run the synchronous scraping profile using Docker Compose.