Welcome - this is a good error report, so it's pretty clear what’s going on. You’re nearly there. The issue is just that the first port of call in your last request is still a Companies House server, so you will need to pass the API key there:
url = 'https://frontend-doc-api.company-information.service.gov.uk/document/iTf5l1sphFi4eBM-ndd7WGZclS11-L4FJdVSx7SN3xE/content'
response = requests.get(url, headers={'Accept': 'application/pdf'})
If you don’t pass in the API key then you’ll see the error you reported (e.g. 401), the same as calling any other part of Companies House without the API key.
However, what this request returns if you do send the key is an HTTP 302 redirect (to Amazon AWS). Many others had the problem that the tool they were using was “following” this as expected but continuing to send the Companies House API key as HTTP Basic Authentication, which then causes Amazon’s servers to complain (or did at that point).
I don’t know Python but there’s an answer to “how to check if a request redirects to a new URL” here:
https://www.adamsmith.haus/python/answers/how-to-check-if-a-request-redirects-to-a-new-url-in-python
So, unless you use a different library, it looks like you call your last endpoint with the API key, then look through the list of responses. Find the first redirect and then just request its target URL without passing in the API key. (You may not even need to do this. I don’t know, but Python - like curl as described below - may be able to follow the redirect(s) without passing the API key every time. I’d check what the Python libraries will do.)
You can see / test this with curl. Your last request is effectively sending:
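A minimal sketch of that two-step approach, assuming the requests library from your snippet. The names fetch_document and http_get are hypothetical (http_get only exists so the logic can be exercised without touching the network), and MY_API_KEY is a placeholder. The API key goes in as the Basic Authentication username with an empty password, the same as curl -u MY_API_KEY: does.

```python
from urllib.parse import urljoin

REDIRECT_CODES = (301, 302, 303, 307, 308)


def fetch_document(url, api_key, http_get=None):
    """Fetch a document, sending the API key to the first host only.

    http_get is a hypothetical hook: any callable with the same
    signature as requests.get. It defaults to requests.get.
    """
    if http_get is None:
        import requests  # third-party: pip install requests
        http_get = requests.get

    # Step 1: call Companies House with the API key as the Basic Auth
    # username (empty password) and do NOT follow the redirect.
    first = http_get(
        url,
        auth=(api_key, ""),
        headers={"Accept": "application/pdf"},
        allow_redirects=False,
    )
    if first.status_code not in REDIRECT_CODES:
        # Not a redirect: raise on errors such as 401, else return the body.
        first.raise_for_status()
        return first.content

    # Step 2: request the redirect target with no credentials at all.
    location = urljoin(url, first.headers["Location"])
    second = http_get(location)
    second.raise_for_status()
    return second.content
```

Usage would be something like pdf = fetch_document(url, "MY_API_KEY") followed by writing pdf to a file opened in "wb" mode.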
curl -I "https://frontend-doc-api.company-information.service.gov.uk/document/iTf5l1sphFi4eBM-ndd7WGZclS11-L4FJdVSx7SN3xE/content"
Not surprisingly Companies House won’t let you do that!
HTTP/1.1 404 Not Found
Content-Length: 19
Content-Type: text/plain; charset=utf-8
Date: Thu, 14 Apr 2022 13:12:15 GMT
Server: nginx/1.18.0
X-Content-Type-Options: nosniff
Connection: keep-alive
If you add the API key to that you’ll get a 302. Use curl -v instead of -I, as otherwise you don’t see the redirect. (Some of the following output is snipped, both within lines and as complete lines, for clarity - marked “…”.)
curl -v -u MY_API_KEY: "https://frontend-doc-api.company-information.service.gov.uk/document/iTf5l1sphFi4eBM-ndd7WGZclS11-L4FJdVSx7SN3xE/content"
< HTTP/1.1 302 Found
< Date: Thu, 14 Apr 2022 13:11:27 GMT
< Location: https://s3.eu-west-2.amazonaws.com/document-api-images-live.ch.gov.uk/docs/iTf5l1sphFi4eBM-ndd7WGZclS11-L4FJdVSx7SN3xE/application-pdf?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=...
Aside - with curl you can of course tell it to follow the redirects (the -L flag). This actually allows me to get the file with my version of curl, i.e. it only passes the authorization to the first host (Companies House):
(some of the following snipped both within lines and complete lines for clarity - marked “…”)
curl -v -L -u MY_API_KEY: "https://frontend-doc-api.company-information.service.gov.uk/document/iTf5l1sphFi4eBM-ndd7WGZclS11-L4FJdVSx7SN3xE/content" > download.pdf
...
GET /document/iTf5l1sphFi4eBM-ndd7WGZclS11-L4FJdVSx7SN3xE/content HTTP/1.1
Host: frontend-doc-api.company-information.service.gov.uk
Authorization: Basic {ENCODED API KEY HERE}
…
< HTTP/1.1 302 Found
< Date: Thu, 14 Apr 2022 13:23:37 GMT
< Location: https://s3.eu-west-2.amazonaws.com/document-api-images-live.ch.gov.uk/docs/iTf5l1sphFi4eBM-ndd7WGZclS11-L4FJdVSx7SN3xE/application-pdf?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=…
Issue another request to this URL: 'https://s3.eu-west-2.amazonaws.com/document-api-images-live.ch.gov.uk/docs/iTf5l1sphFi4eBM-ndd7WGZclS11-L4FJdVSx7SN3xE/application-pdf?X-Amz-Algorithm=A…
GET /document-api-images-live.ch.gov.uk/docs/iTf5l1sphFi4eBM-ndd7WGZclS11-L4FJdVSx7SN3xE/application-pdf?X-Amz-Algorithm=A
Host: s3.eu-west-2.amazonaws.com
< HTTP/1.1 200 OK
< Content-Type: application/pdf
< Server: AmazonS3
< Content-Length: 183106
That should get this working for you.
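If you would rather let Python follow the redirect itself, as curl -L does, the requests documentation says Authorization headers are removed when a redirect goes off-host, so a plain requests.get with auth= may simply work. I'd still verify it with your installed version. A sketch under that assumption: summarise_redirects and download_following_redirects are hypothetical names, while response.history, response.url and response.request.headers are standard attributes of a requests response.

```python
def summarise_redirects(response):
    """Return the (status, URL) hops of a requests response, plus whether
    the final request still carried an Authorization header."""
    hops = [(r.status_code, r.url) for r in response.history]
    hops.append((response.status_code, response.url))
    auth_sent_to_final_host = "Authorization" in response.request.headers
    return hops, auth_sent_to_final_host


def download_following_redirects(url, api_key, path):
    """Let requests follow the 302 and save the PDF to path."""
    import requests  # third-party; imported here so the helper above has no dependencies

    response = requests.get(
        url,
        auth=(api_key, ""),  # API key as Basic Auth username, empty password
        headers={"Accept": "application/pdf"},
    )
    hops, auth_leaked = summarise_redirects(response)
    for status, hop_url in hops:
        print(status, hop_url)
    if auth_leaked:
        print("Warning: Authorization header was sent to the final host")
    response.raise_for_status()
    with open(path, "wb") as f:
        f.write(response.content)
```

If the warning prints, or Amazon rejects the request, fall back to requesting the Location URL yourself without the key.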