Help with error getting docs from API

I have been using the API to search for companies, obtain company profiles, and fetch filing history without any problems, but when I try to download the documents associated with the filings, I consistently get the following error message: “Exception: Resource not found: [https://api.companieshouse.gov.uk/document/document_id]”

The document IDs appear to be valid, and the corresponding documents are available on the website in both PDF and iXBRL formats. This issue occurs for all companies I have attempted to access.

Here is an example company number and document ID I have tried:

  • Company number: 00446417, Document ID: yFROnrh17RpxPH0MM3ccdb2PPnoudhMUm_CUe6svRPM

I have authenticated my requests with my API key and can access other endpoints without issue, so I am unsure why I cannot download these documents.

I would appreciate any guidance.

Welcome! Note that Companies House have split their API into separate parts. Retrieving document metadata or document content is done through a separate Document API with its own subdomain, so you need to make requests of the form:

https://document-api.company-information.service.gov.uk/document/{document_id}

So not https://api.company-information.service.gov.uk/… (and not the api.companieshouse.gov.uk host shown in your error message).
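As a minimal Python sketch of that difference (standard library only; the helper names document_metadata_url and get_json are my own, not part of the Companies House API), the metadata request goes to the document-api host, with the API key sent as the Basic-auth username and an empty password, which is what curl -u KEY: does:

```python
import base64
import json
import urllib.parse
import urllib.request

# Document API host, separate from https://api.company-information.service.gov.uk
DOCUMENT_API_BASE = "https://document-api.company-information.service.gov.uk"


def document_metadata_url(document_id: str) -> str:
    """Build the metadata URL for a document ID (hypothetical helper)."""
    return f"{DOCUMENT_API_BASE}/document/{urllib.parse.quote(document_id, safe='')}"


def get_json(url: str, api_key: str) -> dict:
    """GET a JSON resource using the API key as Basic-auth username, like `curl -u KEY:`."""
    token = base64.b64encode(f"{api_key}:".encode()).decode()
    req = urllib.request.Request(
        url,
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For example, get_json(document_metadata_url("yFROnrh17RpxPH0MM3ccdb2PPnoudhMUm_CUe6svRPM"), api_key) should return the document metadata rather than a "Resource not found" error, assuming the key is valid.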

I believe their intent is that you simply follow the links: request the filing history list (or a single filing), and each filing history item will contain a links member with a document_metadata member giving the URL to request. For example, for your company and document I can request this via curl (lots of output snipped as "..."):

curl -u MY_API_KEY: "https://api.company-information.service.gov.uk/company/00446417/filing-history"
{
    "items": [
        {
            "action_date": "2021-12-31",
            "category": "accounts",
            "date": "2022-10-21",
            "description": "accounts-with-accounts-type-full",
            ...
            "links": {
                "self": "/company/00446417/filing-history/MzM1NjExMzUwOWFkaXF6a2N4",
                "document_metadata": "https://frontend-doc-api.company-information.service.gov.uk/document/yFROnrh17RpxPH0MM3ccdb2PPnoudhMUm_CUe6svRPM"
            },
            ...
        },
        ...
    ],
    ...
}
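Following those links can be sketched in Python as below. It assumes the filing history response has already been parsed into a dict (e.g. with json.load) and that the document ID is the last path segment of the document_metadata URL; the function names are hypothetical. Since the filing history returned the frontend-doc-api host, the sketch can also rebuild the URL on the documented document-api host.

```python
from urllib.parse import urlparse

DOCUMENT_API_BASE = "https://document-api.company-information.service.gov.uk"


def document_metadata_links(filing_history: dict) -> list:
    """Collect links.document_metadata from each filing history item."""
    urls = []
    for item in filing_history.get("items", []):
        url = item.get("links", {}).get("document_metadata")
        if url:  # skip items without a document link
            urls.append(url)
    return urls


def document_id_from_link(metadata_url: str) -> str:
    """Assumes the document ID is the last path segment of the metadata URL."""
    return urlparse(metadata_url).path.rstrip("/").rsplit("/", 1)[-1]


def on_document_api(metadata_url: str) -> str:
    """Rewrite a metadata link onto the documented document-api host."""
    return f"{DOCUMENT_API_BASE}/document/{document_id_from_link(metadata_url)}"
```

Just requesting the document_metadata URL as returned is the simplest option; rewriting the host only matters if you prefer to stick to the documented one.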

So you then request:

https://frontend-doc-api.company-information.service.gov.uk/document/yFROnrh17RpxPH0MM3ccdb2PPnoudhMUm_CUe6svRPM"

(Or - per their documentation - https://document-api.company-information.service.gov.uk/document/yFROnrh17RpxPH0MM3ccdb2PPnoudhMUm_CUe6svRPM" - both work)

That gives you the document metadata (not the content):

{
    ...
    "links": {
        "self": "https://document-api.company-information.service.gov.uk/document/yFROnrh17RpxPH0MM3ccdb2PPnoudhMUm_CUe6svRPM",
        "document": "https://document-api.company-information.service.gov.uk/document/yFROnrh17RpxPH0MM3ccdb2PPnoudhMUm_CUe6svRPM/content"
    },
    "resources": {
        "application/pdf": {
            "content_length": 712506
        },
        "application/xhtml+xml": {
            "content_length": 1413148
        }
    }
}
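Pulling the two useful parts out of that metadata (the content URL in links.document and the formats listed under resources) might look like this; the helper names are hypothetical and the metadata is assumed to be already parsed into a dict.

```python
from typing import List, Optional, Sequence, Tuple


def content_url_and_formats(metadata: dict) -> Tuple[str, List[str]]:
    """Return links.document and the MIME types listed under resources."""
    content_url = metadata["links"]["document"]
    # Each key of "resources" is a MIME type the content is available in
    formats = sorted(metadata.get("resources", {}))
    return content_url, formats


def pick_format(formats: Sequence[str],
                preferred: Sequence[str] = ("application/pdf",)) -> Optional[str]:
    """Choose the first preferred MIME type on offer, else any available one."""
    for mime in preferred:
        if mime in formats:
            return mime
    return formats[0] if formats else None
```

With the metadata above, the formats would be application/pdf and application/xhtml+xml (presumably the iXBRL version mentioned in the question).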

You then request the URL in the links.document member to get the actual content. This step is also somewhat involved, or at least it seems to cause many people problems. Several posts on threads here aim to explain the process, e.g. mine here and the posts linked in my response:
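As I understand it (worth checking against the Document API reference), the content request carries an Accept header naming one of the formats under resources, and the API answers with a redirect to a separate, pre-signed storage URL that should be fetched without your Companies House Authorization header. Python's urllib may copy request headers, including Authorization, onto a followed redirect, so this minimal sketch turns automatic redirects off and follows the Location header by hand. NoRedirect, basic_auth_header and download_document are hypothetical names, not an official client.

```python
import base64
import urllib.error
import urllib.parse
import urllib.request

REDIRECT_CODES = (301, 302, 303, 307, 308)


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib following redirects so a 3xx surfaces as an HTTPError."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def basic_auth_header(api_key: str) -> str:
    """API key as Basic-auth username with an empty password, like `curl -u KEY:`."""
    return "Basic " + base64.b64encode(f"{api_key}:".encode()).decode()


def download_document(content_url: str, api_key: str,
                      mime_type: str = "application/pdf") -> bytes:
    """Fetch document content in the requested format (a sketch)."""
    opener = urllib.request.build_opener(NoRedirect)
    req = urllib.request.Request(
        content_url,
        headers={"Authorization": basic_auth_header(api_key), "Accept": mime_type},
    )
    try:
        with opener.open(req) as resp:
            return resp.read()  # content served directly, no redirect
    except urllib.error.HTTPError as err:
        if err.code not in REDIRECT_CODES:
            raise
        location = urllib.parse.urljoin(content_url, err.headers["Location"])
        err.close()
    # Follow the redirect WITHOUT the Companies House Authorization header
    with urllib.request.urlopen(urllib.request.Request(location)) as resp:
        return resp.read()
```

For example, download_document(content_url, api_key, "application/xhtml+xml") would request the XHTML resource instead of the PDF; write the returned bytes to a file with the matching extension.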

You can see the details in the Companies House documentation at:

https://developer-specs.company-information.service.gov.uk/document-api/reference

Hope this helps.


Hey @voracityemail

Small point: should there be a " at the end of the second-step requests (…svRPM")? Not sure if this is a typo.

More generally, thanks for your help on this topic! I’ve read a number of related posts. You have commented on most of them and, where you have, your contribution has made things that bit clearer!

Ah yes, thanks, that’s a typo (from copying and pasting from the command line). It doesn’t look like I can edit the post, so for reference the correct curl command is (leaving out the quotes works for me in a bash shell):

curl -u YOURAPIKEY: https://document-api.company-information.service.gov.uk/document/yFROnrh17RpxPH0MM3ccdb2PPnoudhMUm_CUe6svRPM
