Configuration
Configuration is managed through environment variables loaded from a .env
file in the project’s root directory.
Creating the .env File
In the root directory of the streeteasy-scraper project, create a new file named .env. You can copy the contents of the .env.example file as a starting point:
cp .env.example .env
Next, edit the .env file and fill in the required values.
Required Environment Variables for Sync Profile
Configure these variables in your .env file:
Database Configuration
Connection details for the local PostgreSQL database:
POSTGRES_USER=user
POSTGRES_PASSWORD=password
POSTGRES_DB=user
POSTGRES_HOST=db
POSTGRES_PORT=5432
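To show how the five values above fit together, here is a minimal sketch that assembles them into a PostgreSQL connection URL. The helper name `build_postgres_url` is hypothetical, and the project itself may connect in a different way; this only illustrates what each variable contributes.

```python
import os
from urllib.parse import quote


def build_postgres_url(env=os.environ):
    """Assemble a PostgreSQL connection URL from the POSTGRES_* variables.

    Hypothetical helper for illustration; not taken from the project.
    """
    # Percent-encode credentials so characters like "@" or "/" stay valid.
    user = quote(env["POSTGRES_USER"], safe="")
    password = quote(env["POSTGRES_PASSWORD"], safe="")
    host = env["POSTGRES_HOST"]
    port = int(env["POSTGRES_PORT"])  # raises ValueError on a non-numeric port
    database = quote(env["POSTGRES_DB"], safe="")
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"


# Example values copied from the block above.
example_env = {
    "POSTGRES_USER": "user",
    "POSTGRES_PASSWORD": "password",
    "POSTGRES_DB": "user",
    "POSTGRES_HOST": "db",
    "POSTGRES_PORT": "5432",
}
print(build_postgres_url(example_env))
# prints: postgresql://user:password@db:5432/user
```

A missing variable raises `KeyError` here, which makes an incomplete .env file visible immediately.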
BrightData Web Unlocker Credentials
Credentials for the synchronous BrightData Web Unlocker service.
BRIGHTDATA_USERNAME: Your BrightData Web Unlocker username.
BRIGHTDATA_PASSWORD: Your BrightData Web Unlocker password.
BRIGHTDATA_HOST: BrightData proxy hostname (e.g., brd.superproxy.io).
BRIGHTDATA_USERNAME=your_brightdata_username
BRIGHTDATA_PASSWORD=your_brightdata_password
BRIGHTDATA_HOST=brd.superproxy.io
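Proxy credentials like these are commonly combined into a single proxy URL. The sketch below shows one way that could look. It rests on several assumptions: the helper name `build_proxy_url` is hypothetical, the `http://` scheme is assumed, and the proxy port is not given in this section, so it is a required argument and the `8080` used below is a placeholder only, not a BrightData value.

```python
from urllib.parse import quote


def build_proxy_url(env, port):
    """Combine the BRIGHTDATA_* credentials into a proxy URL.

    Hypothetical helper; the scheme and the port are assumptions,
    since neither is specified in this section.
    """
    # Percent-encode credentials so special characters do not break the URL.
    username = quote(env["BRIGHTDATA_USERNAME"], safe="")
    password = quote(env["BRIGHTDATA_PASSWORD"], safe="")
    host = env["BRIGHTDATA_HOST"]
    return f"http://{username}:{password}@{host}:{port}"


example_env = {
    "BRIGHTDATA_USERNAME": "your_brightdata_username",
    "BRIGHTDATA_PASSWORD": "your_brightdata_password",
    "BRIGHTDATA_HOST": "brd.superproxy.io",
}

# 8080 is a placeholder port; use the port shown in your BrightData account.
proxy_url = build_proxy_url(example_env, port=8080)

# Mapping in the shape many Python HTTP clients accept for proxies.
proxies = {"http": proxy_url, "https": proxy_url}
print(proxies["https"])
```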
Async/Shared Variables (Required by Config)
These variables are required by the project’s configuration loading but are primarily used for the asynchronous profile. Take the BrightData values from your BrightData account; for the Cloudflare token, a placeholder is enough when running the sync profile.
BRIGHTDATA_API_TOKEN: Your BrightData API Token.
BRIGHTDATA_CUSTOMER_ID: Your BrightData Customer ID.
BRIGHTDATA_ZONE: Your BrightData Web Unlocker zone name.
CLOUDFLARE_TUNNEL_TOKEN: Your Cloudflare Tunnel Token. Use dummy_token for the sync quickstart.
BRIGHTDATA_API_TOKEN=your_brightdata_api_token
BRIGHTDATA_CUSTOMER_ID=your_brightdata_customer_id
BRIGHTDATA_ZONE=your_brightdata_zone
CLOUDFLARE_TUNNEL_TOKEN=dummy_token # Placeholder for sync quickstart
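Because configuration loading requires all of these variables even for the sync profile, it can help to check for gaps before starting. The following is a rough sketch, assuming the variables are already present in the process environment; `REQUIRED_VARS` and `missing_vars` are hypothetical names, and the project's own loader may validate differently.

```python
import os

# Variable names taken from this page; the tuple itself is illustrative.
REQUIRED_VARS = (
    "POSTGRES_USER",
    "POSTGRES_PASSWORD",
    "POSTGRES_DB",
    "POSTGRES_HOST",
    "POSTGRES_PORT",
    "BRIGHTDATA_USERNAME",
    "BRIGHTDATA_PASSWORD",
    "BRIGHTDATA_HOST",
    "BRIGHTDATA_API_TOKEN",
    "BRIGHTDATA_CUSTOMER_ID",
    "BRIGHTDATA_ZONE",
    "CLOUDFLARE_TUNNEL_TOKEN",
)


def missing_vars(env=os.environ):
    """Return the required variable names that are unset or blank."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        print("Missing variables: " + ", ".join(missing))
    else:
        print("All required variables are set.")
```

Treating a blank value as missing is a design choice in this sketch: an empty `KEY=` line in .env is usually a forgotten value rather than an intentional one.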
BOROUGHS_TO_KEEP (Optional)
Filters addresses from the nyc.gov list by borough. The default is MANHATTAN.
BOROUGHS_TO_KEEP: Comma-separated borough names (e.g., MANHATTAN,BROOKLYN).
BOROUGHS_TO_KEEP=MANHATTAN
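A comma-separated value like this is typically split into a list before it is used as a filter. Here is a minimal sketch of that parsing with the documented default of MANHATTAN. The function name `parse_boroughs` is hypothetical, and trimming whitespace and upper-casing each name are assumptions rather than documented behavior of the project.

```python
DEFAULT_BOROUGHS = ["MANHATTAN"]


def parse_boroughs(raw=None):
    """Split a BOROUGHS_TO_KEEP value into a list of borough names.

    Hypothetical helper: whitespace trimming and upper-casing are
    assumptions made for this sketch.
    """
    if raw is None:
        return list(DEFAULT_BOROUGHS)
    names = [part.strip().upper() for part in raw.split(",") if part.strip()]
    # Fall back to the default when the value is empty or only separators.
    return names or list(DEFAULT_BOROUGHS)


print(parse_boroughs("MANHATTAN,BROOKLYN"))
# prints: ['MANHATTAN', 'BROOKLYN']
print(parse_boroughs(None))
# prints: ['MANHATTAN']
```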
Once you have filled in the required environment variables in your .env
file, you are ready to run the synchronous scraping profile using Docker Compose.