Hi,
We are trying to scrape company numbers from director IDs with the API, but we keep getting the errors below after it has been running for a while:
ConnectionResetError: [Errno 54] Connection reset by peer
ProtocolError: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))
ConnectionError: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))
We created a status code dictionary in the code so that we could monitor any error status codes we get. From the last attempt, we found no evidence that this is a rate-limit problem (we have a 5-minute sleep after every 600 requests, which hopefully addresses this).
We’ve included our code below. Any advice would be great!
Our code:
import requests
import pandas as pd
import json
import time
from tqdm import tqdm
from datetime import datetime

# dir_ids_0, base_url and api_key are defined elsewhere (not shown)
request_number = 0
co_list_a = []
dir_num_error = {}

for item in tqdm(dir_ids_0):
    # Pause for 5 minutes after every 600 requests
    if request_number > 599:
        print("sleeping")
        time.sleep(300)
        request_number = 0

    response = requests.get(f"{base_url}{item}", auth=(api_key, ''))
    request_number = request_number + 1

    # Record the status code, time and request count for each director ID
    dir_num_error[item] = {}
    dir_num_error[item]['status_code'] = response.status_code
    dir_num_error[item]['timestamp'] = str(datetime.now())
    dir_num_error[item]['request_number'] = request_number

    # Rate limited: wait 5 minutes, then move on to the next ID
    if response.status_code == 429:
        print("429_sleeping")
        time.sleep(300)
        request_number = 0
        continue

    # Skip any other non-200 response
    if response.status_code != 200:
        continue

    json_search_result = response.text
    data = json.JSONDecoder().decode(json_search_result)
    # 'entry' avoids reusing the outer loop variable 'item'
    for entry in data['items']:
        co_list_a.append(entry['links']['company'])
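
One thing we are considering, but have not yet tested against the API, is wrapping the request in a retry with exponential backoff, so that a single reset connection does not stop the whole run. The sketch below is only an assumption about what might help: call_with_retry is a hypothetical helper of our own (not part of requests), and the attempt count and delays are arbitrary.

```python
import time


def call_with_retry(func, exceptions, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call func() and retry with exponential backoff if it raises one of `exceptions`.

    Hypothetical helper: `exceptions` is a tuple of exception classes to retry on,
    and `sleep` is injectable so the delays can be checked without waiting.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == max_attempts:
                raise  # give up after the final attempt
            # Wait base_delay, then 2x, 4x, ... before trying again
            sleep(base_delay * 2 ** (attempt - 1))
```

In our loop, the request line would then become something like response = call_with_retry(lambda: requests.get(f"{base_url}{item}", auth=(api_key, '')), (requests.exceptions.ConnectionError,)). We do not know whether retrying addresses the underlying cause of the resets, so advice on that is still welcome.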