Hi everyone! I'm trying to access company financials and I'm having a difficult time with the content type of a filing. I'm looking at company number 00197009. I was able to find this filing in the bulk data in HTML format; however, through the API I'm only getting the PDF version. Can anyone help me? My Python code is below.
import requests
import json
import requests.packages.urllib3

# Silence urllib3 warnings
requests.packages.urllib3.disable_warnings()

# Look up the filing history item to find its document metadata link
url = "https://api.companieshouse.gov.uk/company/00197009/filing-history/MzE5Mzc0OTc3MGFkaXF6a2N4"
resp = requests.get(url, auth=('API_KEY', ''))
resp_json = resp.json()
print(resp_json["links"]["document_metadata"])

# Request the document content, trying to ask for HTML (this still returns the PDF)
url_2 = "https://document-api.companieshouse.gov.uk/document/T53BLYf734zxeBWyvna131JtREqLsBgclFME-v6rxI8/content"
resp_2 = requests.get(url_2, auth=('API_KEY', ''), headers={"content-type": "application/html"})
print(resp_2.headers)
print(resp_2.url)
I was able to fix this myself with the following code. I had to change the "content-type" header to "Accept", and "application/html" to "application/xhtml+xml". Accept is the header that tells the server which format you want back; Content-Type only describes the body you are sending.
import requests
import json
import requests.packages.urllib3

# Silence urllib3 warnings
requests.packages.urllib3.disable_warnings()

# Look up the filing history item to find its document metadata link
url = "https://api.companieshouse.gov.uk/company/00197009/filing-history/MzE5Mzc0OTc3MGFkaXF6a2N4"
resp = requests.get(url, auth=('API_KEY', ''))
resp_json = resp.json()
print(resp_json["links"]["document_metadata"])

# Request the document content, asking for the XHTML version via the Accept header
url_2 = "https://document-api.companieshouse.gov.uk/document/T53BLYf734zxeBWyvna131JtREqLsBgclFME-v6rxI8/content"
resp_2 = requests.get(url_2, auth=('API_KEY', ''), headers={"Accept": "application/xhtml+xml"})
print(resp_2.headers)
print(resp_2.url)
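For anyone landing here later, here is a rough sketch of how you might check which formats a filing is available in before asking for one, since not every document has an XHTML version. It assumes the document metadata JSON contains a "resources" object keyed by MIME type; I have not verified that against every filing, so treat it as an assumption. The helper names pick_content_type and fetch_document, and the preference order, are made up for illustration and are not part of the Companies House API.

```python
# Preference order is an assumption for illustration: try XHTML first, then PDF.
PREFERRED_TYPES = ("application/xhtml+xml", "application/pdf")


def pick_content_type(metadata, preferred=PREFERRED_TYPES):
    """Return the first preferred MIME type listed in the metadata, or None."""
    # Assumption: the metadata JSON has a "resources" object keyed by MIME type.
    available = metadata.get("resources", {})
    for mime in preferred:
        if mime in available:
            return mime
    return None


def fetch_document(metadata_url, api_key):
    """Download a filing in the best available format (hypothetical helper)."""
    # Imported here so the selection helper above has no third-party dependency.
    import requests

    auth = (api_key, "")
    # metadata_url is the value of links.document_metadata from the filing history item.
    metadata = requests.get(metadata_url, auth=auth).json()
    mime = pick_content_type(metadata)
    if mime is None:
        raise ValueError("None of the preferred formats is available for this document")
    # The Accept header tells the server which representation to return.
    resp = requests.get(
        metadata_url.rstrip("/") + "/content",
        auth=auth,
        headers={"Accept": mime},
    )
    resp.raise_for_status()
    return mime, resp.content
```

You would call it along the lines of fetch_document(resp_json["links"]["document_metadata"], "API_KEY") and then write the returned bytes to a file with an extension that matches the returned MIME type.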
I was facing the exact same problem. You’re a life saver.