Help with Document API & Fetch a Document

Hi there,

I was wondering if anyone has had any luck with the “Fetch a Document” within the Document API (https://developer.companieshouse.gov.uk/document/docs/document/id/content/fetchDocument.html). The documentation is not very helpful:

Authorization This header parameter contains the token_type and the access_token. See example

Does anyone know what the token_type and access_token are?

On the explore API section it asks for “id” (which I know), “Authorization [sic]” (which I don’t know) and “Accept” (which I know).

Any help would be greatly appreciated.

Paul

Paul,

Assuming you have identified the document you want from the filing history response, the steps are:

  1. Set your mandatory HTTP request headers.

    Accept: application/pdf
    Authorization: Basic your_encoded_key_goes_here

  2. Invoke https://document-api.companieshouse.gov.uk/document/{transaction_id}/content, where transaction_id relates to the document of interest in the filing history response. E.g.

    https://document-api.companieshouse.gov.uk/document/n1EjP_MALLs8xZp5hs86iHcYDli0TE-n6t4HUDeZuq8/content

  3. Retrieve the location header from the HTTP response. The headers you get back will include:

    Content-Type: text/plain; charset=utf-8
    Date: Fri, 04 Sep 2015 15:18:36 GMT
    Location: https://s3-eu-west-1.amazonaws.com/document-api-images-prod/docs/n1EjP_MALLs8xZp5hs86iHcYDli0TE-n6t4HUDeZuq8/application-pdf?AWSAccessKeyId=blahblahblahblah_plus_other_attributes
    Server: nginx/1.6.2
    X-Ratelimit-Limit: 600
    X-Ratelimit-Remaining: 599
    X-Ratelimit-Reset: 1441380216
    Content-Length: 0
    Connection: keep-alive

    Note the response body is empty if all goes well!

  4. Connect to the https location and download your document.
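The four steps above can be sketched in Python using only the standard library. This is a rough sketch rather than a tested client: the function names are made up for illustration, and it assumes the API key is sent as the Basic auth username with a blank password. It reads the Location header itself instead of relying on automatic redirects, mirroring steps 3 and 4.

```python
import base64
import http.client
import urllib.request
from urllib.parse import urlsplit


def request_headers(api_key, accept="application/pdf"):
    """Step 1: build the mandatory headers (key as username, blank password)."""
    token = base64.b64encode(f"{api_key}:".encode("ascii")).decode("ascii")
    return {"Accept": accept, "Authorization": f"Basic {token}"}


def fetch_document(content_url, api_key, accept="application/pdf"):
    """Steps 2 to 4: call the /content URL, read Location, download the file."""
    parts = urlsplit(content_url)
    target = parts.path + (f"?{parts.query}" if parts.query else "")
    conn = http.client.HTTPSConnection(parts.netloc, timeout=30)
    try:
        # http.client does not follow redirects, so the Location header
        # of the first response can be inspected directly.
        conn.request("GET", target, headers=request_headers(api_key, accept))
        response = conn.getresponse()
        status = response.status
        location = response.getheader("Location")
        response.read()  # the body is expected to be empty
    finally:
        conn.close()
    if location is None:
        raise RuntimeError(f"No Location header returned (HTTP {status})")
    # Step 4: fetch the document itself, without the API Authorization header.
    with urllib.request.urlopen(location, timeout=30) as download:
        return download.read()


# Hypothetical usage (needs a real API key and network access):
# pdf_bytes = fetch_document(
#     "https://document-api.companieshouse.gov.uk/document/"
#     "n1EjP_MALLs8xZp5hs86iHcYDli0TE-n6t4HUDeZuq8/content",
#     "your_api_key_goes_here",
# )
```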

I’ve avoided technical details but have successfully implemented this in Java.

I hope this helps.

Andrew


A note for step (2):

A client should follow the links.document_metadata URL of the filing history to get to a document, and should not have to construct URLs on the fly. Should we change the ID encoding, clients that construct their own URLs will break.

So, in the short term, take the links.document_metadata URL from the filing history, and append /content onto the end. This will return the default PDF, but if a PDF is not available, you’ll have to deal with the failure. Calling the document_metadata endpoint first allows you to query which document types are available before you request a particular form (possible types will be PDF, XBRL, iXBRL (coming soon), plus others in the future).
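As a small illustration of the approach above, here is a sketch in Python. The helper names are hypothetical, and the shape of the metadata response (a "resources" object keyed by MIME type) is an assumption that should be checked against a real response.

```python
def content_url(document_metadata_url):
    """Append /content to the links.document_metadata URL from filing history."""
    return document_metadata_url.rstrip("/") + "/content"


def available_types(metadata):
    """List the document types offered for a document.

    Assumes the metadata JSON contains a "resources" object whose keys
    are MIME types; returns an empty list if that key is missing.
    """
    return sorted(metadata.get("resources", {}))


# Example with the document URL used earlier in this thread
metadata_url = (
    "https://document-api.companieshouse.gov.uk/document/"
    "n1EjP_MALLs8xZp5hs86iHcYDli0TE-n6t4HUDeZuq8"
)
print(content_url(metadata_url))

# Made-up metadata, only to show the assumed shape
sample_metadata = {"resources": {"application/pdf": {}, "application/xhtml+xml": {}}}
print(available_types(sample_metadata))
```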

If a direct link to the default document is required, then we will look at adding this to the filing history links: {} sub-document. Client-side (re)manipulation of URLs is not desirable.

Also see the following discussion:

Thanks for the clarification. I’ve learnt something new today :+1:

Thanks for the information. Quick question: what do you mean by “your_encoded_key_goes_here”? Do you mean our API key, or does it need to be encoded in some way?

The API is authenticated using BASIC authentication, where the username is your API key, and the password is blank:
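In other words, the “encoded key” is the Base64 encoding of the API key followed by a colon (the blank password). A minimal sketch in Python, where the function name and the example key are made up for illustration:

```python
import base64


def basic_auth_value(api_key):
    """Return the Authorization header value for a key used as the username."""
    credentials = f"{api_key}:"  # username is the API key, password is blank
    encoded = base64.b64encode(credentials.encode("utf-8")).decode("ascii")
    return "Basic " + encoded


# "my_api_key" is a placeholder, not a real key
print(basic_auth_value("my_api_key"))  # Basic bXlfYXBpX2tleTo=
```

Most HTTP libraries will do this encoding for you if you pass the key as the username and an empty string as the password.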

See also:

This is described on the Developer Hub page, with an example:

Hi,

I’ve been trying to connect to the get document API in Java, but it fails with a 500 Internal Server Error. Surprisingly, I can connect to the search companies and Filing History APIs successfully. Please see the analysis below.

  1. Search companies SUCCESS

    HttpClient client = new HttpClient();
    GetMethod pm = new GetMethod("https://api.companieshouse.gov.uk/search/companies?q=MICROSOFT");
    pm.addRequestHeader("Authorization", "*******");

  2. Filing History SUCCESS

    HttpClient client = new HttpClient();
    GetMethod pm = new GetMethod("https://api.companieshouse.gov.uk/company/01624297/filing-history");
    pm.addRequestHeader("Authorization", "*******");

  3. Document API FAILED (500 Internal Server Error)

    HttpClient client = new HttpClient();
    GetMethod pm = new GetMethod("http://document-api.companieshouse.gov.uk/document/MzIyNzQ2OTQ2OWFkaXF6a2N4/content");
    pm.addRequestHeader("Content-Type", "application/json");
    pm.addRequestHeader("Accept", "application/pdf");
    pm.addRequestHeader("Authorization", "************");

Response Headers

Content-Type:text/plain; charset=utf-8
Date:Wed, 13 Mar 2019 14:14:08 GMT
Server:nginx/1.12.1
Content-Length:0
Connection:keep-alive

Can someone help me with this?

Thanks,
Anup

My Python implementation of downloading the most recent accounts data where the type is a PDF document:

import chwrapper
import requests

COMPANIES_HOUSE_API_KEY = "your_api_key_goes_here"  # placeholder, replace with your own key

# Filing descriptions that identify the accounts filings of interest
ACCOUNTS_DESCRIPTIONS = {
    'accounts-with-accounts-type-group',
    'accounts-with-accounts-type-total-exemption-full',
}

# List of company numbers
company_numbers = ['']
search_client = chwrapper.Search(access_token=COMPANIES_HOUSE_API_KEY)
for company_id in company_numbers:
    response = search_client.filing_history(company_id)
    resp_json = response.json()
    # First filing in the response whose description matches an accounts type
    first_accounts_dict = next(
        (item for item in resp_json['items'] if item['description'] in ACCOUNTS_DESCRIPTIONS),
        None,
    )
    if first_accounts_dict is None:
        continue  # no matching accounts filing for this company
    first_xbrl_trans_id = first_accounts_dict['transaction_id']
    url_2 = f"https://beta.companieshouse.gov.uk/company/{company_id}/filing-history/{first_xbrl_trans_id}/document"
    # Pass the key itself (not the literal variable name) as the Basic auth username
    response_2 = requests.get(url_2, auth=(COMPANIES_HOUSE_API_KEY, ''), headers={"Accept": "application/pdf"})
    with open(f'{company_id}.pdf', "wb") as pdf_file:
        pdf_file.write(response_2.content)

According to this request: https://api.companieshouse.gov.uk/company/00002065
Where should I get this number (00002065) from?

Topic solved. Unfortunately, the collected document, response.pdf.txt (19.2 KB), has too little data (delete .txt to see the PDF file).

Is it possible to collect a document with company information / officers list through the API?

I found information about the Document API on the website (https://developer.companieshouse.gov.uk/document/docs/); however, the document I received doesn’t contain sufficient information (please refer to the ‘response.pdf’ file attached above).

I would be very grateful for any advice, and for an answer on whether it is possible to receive other documents through the API.

Is it possible to collect a document with company information / officers list through the API?

In general that’s not a feature of the API - for reasons given below.

The way to get the information you want is either:

  • via the API (e.g. request Company Profile and Officers List + anything else you want).
  • via a “data product” e.g. one of the data dumps from Companies House.

If you’re interested in ongoing changes you might want to look into the streaming API instead.
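The API route mentioned above (Company Profile plus Officers List) can be sketched roughly as follows. The base URL and helper names are assumptions for illustration; SC327000 is the example company discussed in this thread.

```python
import base64
import json
import urllib.request

# Assumed base URL for the public data API
API_BASE = "https://api.company-information.service.gov.uk"


def company_profile_url(company_number):
    """URL of the Company Profile resource."""
    return f"{API_BASE}/company/{company_number}"


def officers_url(company_number):
    """URL of the Officers List resource."""
    return f"{API_BASE}/company/{company_number}/officers"


def get_json(url, api_key):
    """GET a resource with Basic auth (key as username, blank password)."""
    token = base64.b64encode(f"{api_key}:".encode("ascii")).decode("ascii")
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


# Hypothetical usage (needs a real API key and network access):
# profile = get_json(company_profile_url("SC327000"), "your_api_key_goes_here")
# officers = get_json(officers_url("SC327000"), "your_api_key_goes_here")
```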

Your example is a confirmation statement (this one I think - “Confirmation statement made on 30 August 2019 with no updates”). It would seem to be correct as it is. Confirmation statements replaced “Annual Returns”. These were more like what you seem to be expecting e.g. they summarised information about a company at that point and were required each year. Here’s the link to the last one from your example company:
https://beta.companieshouse.gov.uk/company/SC327000/filing-history/MzEzMjQ1NjQzN2FkaXF6a2N4/document?format=pdf&download=0

For your example company look back in time and find the first example “with updates” - e.g. “Confirmation statement made on 30 August 2018 with updates”. You might expect to find a note of any updates during the reporting period. However rather unhelpfully it says “all information…either has been delivered or is being delivered”. So you won’t even see updates over the period gathered in one document. You need to use the API.

The system now works like this (caveat - this is just my understanding and I’m neither a lawyer nor a member of Companies House):

  • The Companies House dataset - e.g. what you see via the API / the CH Beta website - is the main “record”.
  • Companies must inform Companies House if there are certain changes to their status.
  • This is normally done on separate filings through the reporting period.
  • If there are changes during the reporting period the company submits a confirmation statement “with updates”. (There are a couple of changes which can be reported on the confirmation statement I think).
  • If there are no changes, the company submits a confirmation statement like the one you found.

The overall issue here is that “ease of access to the data via the API” is not the same as “it’s easy to answer (some question) about a company / officer”. For many questions you need to understand something of the law as it relates to companies, reporting requirements and the role of Companies House. Given that this is a free service it may not be so surprising that the onus is on users to work out some of this information for themselves, including dealing with issues in the data itself…

Hello,

I’ve been trying unsuccessfully to get the document metadata in order to pull up various accounts.

I can successfully run other get requests in python with API authorization.
I can get the filing history for a company and find in there the relevant “transaction_id” and also links.document_metadata (the metadata link, I assume).

However, I am not able to access the metadata link!

I have tried this with the same formatted get request in python using the following urls:
https://api.company-information.service.gov.uk/document/MzQyMTE1NzUxMGFkaXF6a2N4/content
blank data returned, but no error. This is the url shown in the documentation.

or even
https://document-api.companieshouse.gov.uk/document/MzQyMTE1NzUxMGFkaXF6a2N4/content
gives error: {‘error’: ‘Invalid document ID’, ‘type’: ‘ch:service’}

Yet if I put the transaction_id into a link for an online search on the Companies House website, it shows the PDF without a problem (I can’t paste a third URL in here as a new user, but it’s on the main search page and starts find-and-update.company-information.service.gov.uk).

Any advice welcome to get the metadata.

I would ideally like to get to the XBRL data, rather than PDFs.

Thank you!

See this article I put together on accessing XBRL documents through the document API: Filings Document API | CH Guide.
I believe your issue may be that you are using the transaction ID of the filing in the URL for accessing metadata or content. As you can see in the following example, the ID in the filing history link is different from the ID in the metadata/content link:

"links": {
    "self": "/company/14350376/filing-history/MzM5MzQwMzczMGFkaXF6a2N4",
    "document_metadata": "https://frontend-doc-api.company-information.service.gov.uk/document/6SiXtBV4PIRQ0ybL6I7EbPBkgiOEucMrIaz0i6IoKDM"
}
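To make the difference concrete, here is a small Python sketch that takes the metadata link from a filing history item instead of building a URL from the transaction ID. The helper name is hypothetical, and the "transaction_id" value in the sample is assumed to match the last part of the "self" link.

```python
def document_metadata_url(filing_item):
    """Return links.document_metadata for a filing history item, or None."""
    return filing_item.get("links", {}).get("document_metadata")


# Sample item based on the example above ("transaction_id" is assumed)
item = {
    "transaction_id": "MzM5MzQwMzczMGFkaXF6a2N4",
    "links": {
        "self": "/company/14350376/filing-history/MzM5MzQwMzczMGFkaXF6a2N4",
        "document_metadata": (
            "https://frontend-doc-api.company-information.service.gov.uk/"
            "document/6SiXtBV4PIRQ0ybL6I7EbPBkgiOEucMrIaz0i6IoKDM"
        ),
    },
}

url = document_metadata_url(item)
print(url)
# The transaction ID does not appear in the document URL
print(item["transaction_id"] in url)
```

Filings with no document attached may have no document_metadata link at all, which is why the helper returns None rather than raising.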

This seems to be the issue, Brian. Many thanks.

I can now pull the metadata from the filing URLs.

I can get a table of the metadata with links. It seems a bit strange that I can’t yet find a company with XBRL data, only PDFs. I thought XBRL was a mandatory requirement for filing with Companies House?

Tony.

I think you may be able to find example filings with XBRL data by reference to the Companies House Accounts bulk data set here? I’m not 100% sure on that though!

The document metadata should show you if / when a filing is available in XBRL. You can get that from the AWS servers by setting the appropriate mime type (I think it’s usually "application/xhtml+xml") in the http “Accept” header that you send when making the document request. There is some info on that and an example filing with XBRL data in the following post:
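A rough sketch of that check in Python: pick the “Accept” value from whatever the metadata says is available, preferring XBRL over PDF. The "resources" key and the sample metadata are assumptions for illustration, and the function name is made up.

```python
# Preferred MIME types, most wanted first
PREFERRED_TYPES = ("application/xhtml+xml", "application/pdf")


def choose_accept(metadata, preferred=PREFERRED_TYPES):
    """Return the first preferred MIME type listed in the metadata, or None.

    Assumes the metadata JSON has a "resources" object keyed by MIME type.
    """
    resources = metadata.get("resources", {})
    for mime_type in preferred:
        if mime_type in resources:
            return mime_type
    return None


# Made-up metadata examples
pdf_only = {"resources": {"application/pdf": {}}}
with_xbrl = {"resources": {"application/pdf": {}, "application/xhtml+xml": {}}}

print(choose_accept(pdf_only))
print(choose_accept(with_xbrl))
```

The returned value would then be sent as the “Accept” header on the /content request.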

Another example here:

Hope this helps.

Thank you voracityemail, I will check the data set.

Yes I thought I saw one item listed with application/xhtml+xml…but I couldn’t find it again to try and view!

Tony.