I've solved the problem now. Posting the details in case they help others.
I thought I was being rate limited, so I timed how long each connection took to open:
import time
import requests

# API_key is assumed to be defined earlier (your streaming API key).
all_urls = [
    'https://stream.companieshouse.gov.uk/companies',
    'https://stream.companieshouse.gov.uk/filings',
    'https://stream.companieshouse.gov.uk/insolvency-cases',
    'https://stream.companieshouse.gov.uk/charges',
    'https://stream.companieshouse.gov.uk/officer-appointments',
    'https://stream.companieshouse.gov.uk/persons-with-significant-control',
]
all_resps = []
start_time = time.time()
for url in all_urls:
    headers = {'Authorization': API_key}
    # stream=True returns once the headers arrive, without downloading the body
    resp = requests.get(url, headers=headers, stream=True)
    all_resps.append(resp)
    # Cumulative time since the loop started, printed after each connection
    print("Time since start: ", time.time() - start_time)
This was the output:
Time since start: 1.6528594493865967
Time since start: 24.2913715839386
Time since start: 55.80907225608826
Time since start: 87.31857132911682
Time since start: 87.40170788764954
Time since start: 87.46614027023315
So the first connection opens in under 2 seconds, the next three take roughly 23 to 32 seconds each, and the last two take almost no time.
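Since the printed figures are cumulative, the time for each individual connection is the difference between consecutive values. A quick sketch of that arithmetic, using the numbers from the output above (the variable names are just for illustration):

```python
# Cumulative timings copied from the output above, in URL order.
cumulative = [
    1.6528594493865967,
    24.2913715839386,
    55.80907225608826,
    87.31857132911682,
    87.40170788764954,
    87.46614027023315,
]

# Subtract each reading from the next one; the first is measured from zero.
per_connection = [
    later - earlier
    for earlier, later in zip([0.0] + cumulative, cumulative)
]

for seconds in per_connection:
    print(f"{seconds:.2f}")
```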
Checking the response codes shows the problem:
for resp in all_resps:
    print(resp.status_code)
Results:
200
200
200
200
503
404
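In hindsight, recording the status code next to the timing for each connection would have shown the problem in one pass. A minimal sketch of how that might look: `time_connections`, `failed_streams` and the `open_stream` callable are names made up here for illustration, not part of any API.

```python
import time


def time_connections(urls, open_stream):
    """Open each URL in turn and return (url, status_code, seconds) tuples.

    open_stream is any callable that takes a URL and returns an object
    with a status_code attribute, for example (assuming requests and the
    same headers as in the script above):
        lambda url: requests.get(url, headers=headers, stream=True, timeout=60)
    """
    results = []
    for url in urls:
        started = time.perf_counter()
        resp = open_stream(url)
        elapsed = time.perf_counter() - started
        results.append((url, resp.status_code, elapsed))
    return results


def failed_streams(results):
    """Return the (url, status_code) pairs whose status is not 200."""
    return [(url, status) for url, status, _ in results if status != 200]
```

Printing `failed_streams(time_connections(all_urls, open_stream))` would then list only the URLs that did not return 200, so a 503 or 404 stands out straight away instead of looking like a fast connection.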
Total facepalm moment. From reading the forum, the last two streams (officer-appointments and persons-with-significant-control) appear not to have been completed yet.
While I totally get that developing a stream takes time, deleting documentation doesn’t really require much effort, and there shouldn’t be documentation up for streams that just don’t work right now.