Help with Document API & Fetch a Document

itdev · September 4, 2015, 8:05am

Hi there,

I was wondering if anyone has had any luck with the “Fetch a Document” within the Document API (https://developer.companieshouse.gov.uk/document/docs/document/id/content/fetchDocument.html). The documentation is not very helpful:

Authorization This header parameter contains the token_type and the access_token. See example

Does anyone know what the token_type and access_token are?

On the explore API section it asked for “id” (Which I know), “Authorization [sic]” (Which I don’t know) and Accept (Which I know)

Any help would be greatly appreciated.

Paul

andrew_murphy · September 4, 2015, 3:50pm

Paul,

Assuming you have identified the document you want from the filing history response, the steps are:

1. Set your mandatory HTTP request headers.

Accept: application/pdf
Authorization: Basic your_encoded_key_goes_here

2. Invoke https://document-api.companieshouse.gov.uk/document/{transaction_id}/content where transaction_id relates to the document of interest in the filing history response. E.g.

https://document-api.companieshouse.gov.uk/document/n1EjP_MALLs8xZp5hs86iHcYDli0TE-n6t4HUDeZuq8/content

3. Retrieve the Location header from the HTTP response. The headers you get back will include:

Content-Type: text/plain; charset=utf-8
Date: Fri, 04 Sep 2015 15:18:36 GMT
Location: https://s3-eu-west-1.amazonaws.com/document-api-images-prod/docs/n1EjP_MALLs8xZp5hs86iHcYDli0TE-n6t4HUDeZuq8/application-pdf?AWSAccessKeyId=blahblahblahblah_plus_other_attributes
Server: nginx/1.6.2
X-Ratelimit-Limit: 600
X-Ratelimit-Remaining: 599
X-Ratelimit-Reset: 1441380216
Content-Length: 0
Connection: keep-alive

Note the response body is empty if all goes well!

4. Connect to the https location and download your document.

I’ve avoided technical details but have successfully implemented this in Java.
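
For anyone wanting a concrete starting point, the request/redirect/download flow described above can be sketched in Python with only the standard library. This is a minimal sketch, not the Java implementation mentioned: the names NoRedirect, find_document_location and download are made up for illustration, auth_header is the full Authorization value ("Basic your_encoded_key_goes_here"), and it assumes the content endpoint answers with a redirect-style response carrying a Location header.

```python
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so the Location header can be read."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then reports the 3xx response as an HTTPError


def find_document_location(content_url, auth_header, accept="application/pdf", opener=None):
    """Call the /content URL and return the Location header of the response."""
    opener = opener or urllib.request.build_opener(NoRedirect)
    request = urllib.request.Request(
        content_url,
        headers={"Accept": accept, "Authorization": auth_header},
    )
    try:
        response = opener.open(request)
    except urllib.error.HTTPError as err:
        # With NoRedirect installed, a 3xx answer arrives here.
        location = err.headers.get("Location") if 300 <= err.code < 400 else None
        if location is None:
            raise  # a genuine error (401, 404, 500, ...)
        return location
    return response.headers.get("Location")


def download(location):
    """Fetch the document bytes from the returned location.

    Assumption: the location is a pre-signed link, so no API key is sent here.
    """
    with urllib.request.urlopen(location) as response:
        return response.read()
```

Typical use would be location = find_document_location(url, auth_header) followed by open("document.pdf", "wb").write(download(location)). The redirect is handled by hand so that the Authorization header is not forwarded to the storage host.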

I hope this helps.

Andrew

csmith · September 7, 2015, 3:59pm

A note for step (2):

A client should follow the links.document_metadata URL of the filing history to get to a document, and should not have to construct URLs on the fly. Should we change the ID encoding, clients that construct their own URLs will break.

So, in the short term, take the links.document_metadata URL from the filing history, and append /content onto the end. This will return the default PDF, but if a PDF is not available, you’ll have to deal with the failure. Calling the document_metadata endpoint first allows you to query which document types are available before you request a particular form (possible types will be PDF, XBRL, iXBRL (coming soon), plus others in the future).

If a direct link to the default document is required, then we will look to adding this to the filing history links: {} sub-document. Client-side (re)manipulation of URLs is not desirable.
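
The short-term approach above (take links.document_metadata and append /content) might look like the following in Python. This is only a sketch: the function name content_url and the sample item are illustrative, and it assumes a filing history item is a dict with an optional "links" object, as in the filing history JSON.

```python
def content_url(filing_item):
    """Return the /content URL for a filing history item, or None if it has no document."""
    metadata_url = filing_item.get("links", {}).get("document_metadata")
    if not metadata_url:
        return None  # some filings may have no document attached
    # Use the URL exactly as returned; only the "/content" suffix is appended.
    return metadata_url.rstrip("/") + "/content"


# Illustrative item; the document ID is the one from the example earlier in this thread.
item = {
    "links": {
        "document_metadata": "https://document-api.companieshouse.gov.uk/document/n1EjP_MALLs8xZp5hs86iHcYDli0TE-n6t4HUDeZuq8"
    }
}
print(content_url(item))
```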

Also see the following discussion:

andrew_murphy · September 7, 2015, 11:22pm

Thanks for the clarification. I’ve learnt something new today.

itdev · September 8, 2015, 8:35am

Thanks for the information. Quick question: what do you mean by “your_encoded_key_goes_here”? Do you mean our API key, or does it need to be encoded in some way?

csmith · September 8, 2015, 12:13pm

The API is authenticated using BASIC authentication, where the username is your API key, and the password is blank:

See also:

This is described on the Developer Hub page, with an example:
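
To make the encoding concrete, here is a minimal Python sketch (not the Developer Hub example itself) of how a Basic Authorization value is built when the username is the API key and the password is blank. The function name basic_auth_header and the key "my_api_key" are placeholders.

```python
import base64


def basic_auth_header(api_key):
    """Build the Authorization header value for HTTP Basic auth with a blank password."""
    credentials = f"{api_key}:"  # "username:password" with an empty password
    encoded = base64.b64encode(credentials.encode("utf-8")).decode("ascii")
    return f"Basic {encoded}"


header_value = basic_auth_header("my_api_key")  # placeholder key
print(header_value)
```

The trailing colon matters: it is the separator between the username and the empty password. Most HTTP libraries will do this encoding for you when given the key as the username and an empty string as the password.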

anup_a · March 13, 2019, 2:18pm

Hi,

I’ve been trying to connect to the get document API in Java and it is failing with a 500 Internal Server Error. Surprisingly, I could connect to the Search Companies and Filing History APIs successfully. Please see the analysis below.

1. Search companies SUCCESS

HttpClient client = new HttpClient();
GetMethod pm = new GetMethod("https://api.companieshouse.gov.uk/search/companies?q=MICROSOFT");
pm.addRequestHeader("Authorization", "*******");

2. Filing History SUCCESS

HttpClient client = new HttpClient();
GetMethod pm = new GetMethod("https://api.companieshouse.gov.uk/company/01624297/filing-history");
pm.addRequestHeader("Authorization", "*******");

3. Document API FAILED (500 Internal Server Error)

HttpClient client = new HttpClient();
GetMethod pm = new GetMethod("http://document-api.companieshouse.gov.uk/document/MzIyNzQ2OTQ2OWFkaXF6a2N4/content");
pm.addRequestHeader("Content-Type", "application/json");
pm.addRequestHeader("Accept", "application/pdf");
pm.addRequestHeader("Authorization", "************");

Response Headers

Content-Type:text/plain; charset=utf-8
Date:Wed, 13 Mar 2019 14:14:08 GMT
Server:nginx/1.12.1
Content-Length:0
Connection:keep-alive

Can someone help me with this?

Thanks,
Anup

jack_lewis · November 20, 2019, 4:09pm

My Python implementation of downloading the most recent accounts filing where the document is a PDF:

import chwrapper
import requests

# COMPANIES_HOUSE_API_KEY must be set to your API key before running.

# List of company numbers to process
company_numbers = ['']

# Filing descriptions that count as accounts here
ACCOUNTS_DESCRIPTIONS = (
    'accounts-with-accounts-type-group',
    'accounts-with-accounts-type-total-exemption-full',
)

search_client = chwrapper.Search(access_token=COMPANIES_HOUSE_API_KEY)
for company_id in company_numbers:
    response = search_client.filing_history(company_id)
    resp_json = response.json()
    # First item in the filing history whose description is an accounts type
    first_accounts_dict = next((item for item in resp_json['items'] if item['description'] in ACCOUNTS_DESCRIPTIONS), None)
    if first_accounts_dict is None:
        continue  # no matching accounts filing for this company
    transaction_id = first_accounts_dict['transaction_id']
    url_2 = f"https://beta.companieshouse.gov.uk/company/{company_id}/filing-history/{transaction_id}/document"
    # Pass the key itself (not the literal string) as the Basic auth username
    response_2 = requests.get(url_2, auth=(COMPANIES_HOUSE_API_KEY, ''), headers={"Accept": "application/pdf"})
    with open(f'{company_id}.pdf', "wb") as pdf_file:
        pdf_file.write(response_2.content)

miroslaw_storoniak · January 29, 2020, 7:49pm

Regarding this request: https://api.companieshouse.gov.uk/company/00002065
Where should I get this number (00002065) from?

miroslaw_storoniak · February 4, 2020, 2:46pm

Topic solved. Unfortunately, the collected document, response.pdf.txt (19.2 KB), has too little data (delete .txt to see the PDF file).

miroslaw_storoniak · February 4, 2020, 2:47pm

Is it possible to collect a document with company information / officers list through the API?

I found information about the Document API on the website (https://developer.companieshouse.gov.uk/document/docs/); however, the document I received doesn’t have sufficient information (please refer to the ‘response.pdf’ file attached above).

I would be very grateful for any advice, and for an answer on whether it is possible to receive other documents through the API.

voracityemail · February 6, 2020, 4:28pm

Is it possible to collect a document with company information / officers list through the API?

In general that’s not a feature of the API - for reasons given below.

The way to get the information you want is either:

via the API (e.g. request Company Profile and Officers List + anything else you want).
via a “data product” e.g. one of the data dumps from Companies House.

If you’re interested in ongoing changes you might want to look into the streaming API instead.
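
As a rough illustration of the first option (requesting the Company Profile and Officers List via the API), here is a minimal Python sketch using only the standard library. The helper names company_urls and get_json are made up, the base URL is the api.company-information.service.gov.uk host, and the /officers path is taken from the public API rather than from this thread, so treat both as assumptions to check against the current documentation.

```python
import base64
import json
import urllib.request

BASE_URL = "https://api.company-information.service.gov.uk"  # assumed current API host


def company_urls(company_number):
    """Return the profile and officers URLs for a company number."""
    return {
        "profile": f"{BASE_URL}/company/{company_number}",
        "officers": f"{BASE_URL}/company/{company_number}/officers",
    }


def get_json(url, api_key):
    """GET a URL with HTTP Basic auth (API key as username, blank password)."""
    token = base64.b64encode(f"{api_key}:".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


urls = company_urls("SC327000")  # the example company discussed in this thread
print(urls["profile"])
print(urls["officers"])
```

Calling get_json(urls["profile"], api_key) and get_json(urls["officers"], api_key) would then return the structured data that a filed PDF such as a confirmation statement does not contain.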

Your example is a confirmation statement (this one I think - “Confirmation statement made on 30 August 2019 with no updates”). It would seem to be correct as it is. Confirmation statements replaced “Annual Returns”. These were more like what you seem to be expecting e.g. they summarised information about a company at that point and were required each year. Here’s the link to the last one from your example company:
https://beta.companieshouse.gov.uk/company/SC327000/filing-history/MzEzMjQ1NjQzN2FkaXF6a2N4/document?format=pdf&download=0

For your example company look back in time and find the first example “with updates” - e.g. “Confirmation statement made on 30 August 2018 with updates”. You might expect to find a note of any updates during the reporting period. However rather unhelpfully it says “all information…either has been delivered or is being delivered”. So you won’t even see updates over the period gathered in one document. You need to use the API.

The system now works like this (caveat - this is just my understanding and I’m neither a lawyer nor a member of Companies House):

The Companies House dataset - e.g. what you see via the API / the CH Beta website - is the main “record”.
Companies must inform Companies House if there are certain changes to their status.
This is normally done on separate filings through the reporting period.
If there are changes during the reporting period the company submits a confirmation statement “with updates”. (There are a couple of changes which can be reported on the confirmation statement I think).
If there are no changes, the company submits a confirmation statement like the one you found.

The overall issue here is that “ease of access to the data via the API” is not the same as “it’s easy to answer (some question) about a company / officer”. For many questions you need to understand something of the law as it relates to companies, reporting requirements and the role of Companies House. Given that this is a free service it may not be so surprising that the onus is on users to work out some of this information for themselves, including dealing with issues in the data itself…

tonyjolly21 · May 22, 2024, 11:29am

Hello,

I’ve been trying unsuccessfully to get the document metadata in order to pull up various accounts.

I can successfully run other GET requests in Python with API authorization.
I can get the filing history for a company and find in there the relevant “transaction_id” and also the links.document_metadata (the metadata link, I assume).

However, I am not able to access the metadata link!

I have tried this with the same format of GET request in Python, using the following URLs:
https://api.company-information.service.gov.uk/document/MzQyMTE1NzUxMGFkaXF6a2N4/content
Blank data returned, but no error. This is the URL shown in the documentation.

or even
https://document-api.companieshouse.gov.uk/document/MzQyMTE1NzUxMGFkaXF6a2N4/content
gives error: {'error': 'Invalid document ID', 'type': 'ch:service'}

Yet if I put the transaction_id into a link for an online search at the Companies House website, it shows the PDF without a problem (I can’t paste a third URL in here as a new user, but it’s on the main search page and starts find-and-update.company-information.service.gov.uk)

Any advice welcome to get the metadata.

I would ideally like to get to the XBRL data, rather than PDFs.

Thank you!

ebrian101 · May 22, 2024, 2:44pm

See this article I put together on accessing XBRL documents through the document API: Filings Document API | CH Guide.
I believe your issue may be attempting to use the transaction ID of the filing in the URL for accessing metadata or content. As you can see in the following example, there is a different ID for the filing history endpoint and the metadata/content:

"links": {
    "self": "/company/14350376/filing-history/MzM5MzQwMzczMGFkaXF6a2N4",
    "document_metadata": "https://frontend-doc-api.company-information.service.gov.uk/document/6SiXtBV4PIRQ0ybL6I7EbPBkgiOEucMrIaz0i6IoKDM"
}
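
To see the difference in code, here is a small Python sketch that pulls both IDs out of a links object like the one above. The function name ids_from_links is made up, and splitting on the last "/" is only for comparison; for real requests the document_metadata URL should be used as returned.

```python
def ids_from_links(links):
    """Return (transaction_id, document_id) taken from a filing's links object."""
    transaction_id = links["self"].rstrip("/").rsplit("/", 1)[-1]
    metadata_url = links.get("document_metadata")
    # Not every filing necessarily has a document attached.
    document_id = metadata_url.rstrip("/").rsplit("/", 1)[-1] if metadata_url else None
    return transaction_id, document_id


links = {
    "self": "/company/14350376/filing-history/MzM5MzQwMzczMGFkaXF6a2N4",
    "document_metadata": "https://frontend-doc-api.company-information.service.gov.uk/document/6SiXtBV4PIRQ0ybL6I7EbPBkgiOEucMrIaz0i6IoKDM",
}
transaction_id, document_id = ids_from_links(links)
print(transaction_id != document_id)
```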

tonyjolly21 · May 22, 2024, 4:42pm

This seems to be the issue Brian. Many thanks.

I can now pull the metadata from the filing URLs.

I can get a table of the metadata with links. It seems a bit strange that I can’t yet find a company with XBRL data, only PDFs. I thought XBRL was a mandatory requirement for filing with Companies House?

Tony.

voracityemail · May 22, 2024, 5:12pm

I think you may be able to find example filings with XBRL data by reference to the Companies House Accounts bulk data set here? I’m not 100% sure on that though!

The document metadata should show you if / when a filing is available in XBRL. You can get that from the AWS servers by setting the appropriate MIME type (I think it’s usually "application/xhtml+xml") in the HTTP “Accept” header that you send when making the document request. There is some info on that and an example filing with XBRL data in the following post:

Another example here:

Hope this helps.
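
As a sketch of checking the metadata before choosing an Accept header, the Python below picks the first preferred MIME type that the metadata lists. It assumes the metadata JSON has a "resources" object keyed by MIME type; the function name pick_content_type and the sample metadata values are illustrative only.

```python
def pick_content_type(metadata, preferred=("application/xhtml+xml", "application/pdf")):
    """Return the first preferred MIME type listed in the metadata, or None."""
    available = metadata.get("resources", {})  # assumed shape: {mime_type: {...}}
    for mime_type in preferred:
        if mime_type in available:
            return mime_type
    return None


# Illustrative metadata; the content_length values are made up.
sample_metadata = {
    "resources": {
        "application/pdf": {"content_length": 12345},
        "application/xhtml+xml": {"content_length": 67890},
    }
}
accept = pick_content_type(sample_metadata)
headers = {"Accept": accept} if accept else {}
print(headers)
```

If only "application/pdf" is listed, the same call falls back to the PDF, and a filing with neither type returns None so the caller can skip it.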

tonyjolly21 · May 22, 2024, 5:29pm

Thank you voracityemail, I will check the data set.

Yes I thought I saw one item listed with application/xhtml+xml…but I couldn’t find it again to try and view!

Tony.