Configuration
Configuration is managed through environment variables loaded from a .env file in the project’s root directory.
Creating the .env File
In the root directory of the streeteasy-scraper project, create a new file named .env. You can copy the contents of the .env.example file as a starting point:
    cp .env.example .env
Now, you will edit the .env file to fill in the required values.
Required Environment Variables for Sync Profile
Configure these variables in your .env file:
Database Configuration
Connection details for the local PostgreSQL database:
    POSTGRES_USER=user
    POSTGRES_PASSWORD=password
    POSTGRES_DB=user
    POSTGRES_HOST=db
    POSTGRES_PORT=5432
BrightData Web Unlocker Credentials
Credentials for the synchronous BrightData Web Unlocker service.
BRIGHTDATA_USERNAME: Your BrightData Web Unlocker username.
BRIGHTDATA_PASSWORD: Your BrightData Web Unlocker password.
BRIGHTDATA_HOST: BrightData proxy hostname (e.g., brd.superproxy.io).
    BRIGHTDATA_USERNAME=your_brightdata_username
    BRIGHTDATA_PASSWORD=your_brightdata_password
    BRIGHTDATA_HOST=brd.superproxy.io
Async/Shared Variables (Required by Config)
These variables are required by the project’s configuration loading but are primarily used for the asynchronous profile. Provide values from your BrightData account or use a placeholder for Cloudflare.
BRIGHTDATA_API_TOKEN: Your BrightData API Token.
BRIGHTDATA_CUSTOMER_ID: Your BrightData Customer ID.
BRIGHTDATA_ZONE: Your BrightData Web Unlocker zone name.
CLOUDFLARE_TUNNEL_TOKEN: Your Cloudflare Tunnel Token. Use dummy_token for sync quickstart.
    BRIGHTDATA_API_TOKEN=your_brightdata_api_token
    BRIGHTDATA_CUSTOMER_ID=your_brightdata_customer_id
    BRIGHTDATA_ZONE=your_brightdata_zone
    CLOUDFLARE_TUNNEL_TOKEN=dummy_token # Placeholder for sync quickstart
BOROUGHS_TO_KEEP (Optional)
Filter addresses from the nyc.gov list. Default is MANHATTAN.
BOROUGHS_TO_KEEP: Comma-separated borough names (e.g., MANHATTAN,BROOKLYN).
    BOROUGHS_TO_KEEP=MANHATTAN
Once you have filled in the required environment variables in your .env file, you are ready to run the synchronous scraping profile using Docker Compose.
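Before starting the containers, it can help to confirm that nothing was left blank. The following is a minimal Python sketch of such a check, not part of the project itself: the function names (parse_env, missing_keys, parse_boroughs) are hypothetical, the parser handles only simple KEY=VALUE lines (no quoting, and anything after " #" is treated as a comment), and the borough fallback simply mirrors the MANHATTAN default described above. The project's own configuration loader may behave differently.

```python
# Variables named on this page as required by the configuration loader.
REQUIRED_KEYS = [
    "POSTGRES_USER", "POSTGRES_PASSWORD", "POSTGRES_DB",
    "POSTGRES_HOST", "POSTGRES_PORT",
    "BRIGHTDATA_USERNAME", "BRIGHTDATA_PASSWORD", "BRIGHTDATA_HOST",
    "BRIGHTDATA_API_TOKEN", "BRIGHTDATA_CUSTOMER_ID", "BRIGHTDATA_ZONE",
    "CLOUDFLARE_TUNNEL_TOKEN",
]


def parse_env(text):
    """Parse simple KEY=VALUE lines into a dict (simplified, no quoting rules)."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, full-line comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Assumption: a " #" inside the value starts a trailing comment.
        value = value.split(" #", 1)[0].strip()
        values[key.strip()] = value
    return values


def missing_keys(values):
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


def parse_boroughs(values):
    """Split BOROUGHS_TO_KEEP on commas, falling back to MANHATTAN when unset."""
    raw = values.get("BOROUGHS_TO_KEEP") or "MANHATTAN"
    return [name.strip() for name in raw.split(",") if name.strip()]


# Demonstration on an inline sample rather than a real file.
SAMPLE = """\
POSTGRES_USER=user
POSTGRES_PASSWORD=password
CLOUDFLARE_TUNNEL_TOKEN=dummy_token # Placeholder for sync quickstart
BOROUGHS_TO_KEEP=MANHATTAN,BROOKLYN
"""

sample_env = parse_env(SAMPLE)
print("Missing:", missing_keys(sample_env))
print("Boroughs:", parse_boroughs(sample_env))
```

To check a real file, the sample string could be replaced with the file's contents, for example `parse_env(open(".env").read())` run from the project root.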