API for company statements

Hi, firstly thanks for a useful forum, it has come in handy countless times!

I’m looking to get financial statements based on the company number. From the documentation I think this is doable, but it asks for either a transaction_id or a document_id. Looking these up would be time consuming, and I don’t know where to find them. Has anyone undertaken a similar task and called the API for the latest statements in HTML format, similar to the bulk downloads?

There are a few posts already, but not ones where I can follow the thread… any help appreciated!

Thanks, Darren

Welcome. There are actually a lot of posts showing how to do this. First question - when you say “financial statements” what exactly is it you want?

  • If you want annual accounts (or other financial filing) for particular companies (by Company Number): the way to do this is to request which of these are available using the Filing History API - you can filter by category too e.g. accounts, capital etc. The response from this endpoint is where you’ll have seen the transaction ID or document ID. The way to do this is not to focus on “how do I find those?” but just follow the links you get back. These include the link to the Document Metadata endpoint for the ones you want so you can request that. That in turn gives you the link to the Content endpoint to download the actual files. The process has an unusual step in that it redirects you away from Companies House and onto an AWS (Amazon) server but that’s covered in my posts below. Secondary question: are you happy with just PDFs (images) of these or do you want this (where available - often not) in XBRL / iXBRL format? Again the Document Metadata will list what formats are available.

  • If alternatively you want financials for ALL companies (or you want the numbers specifically and are prepared to filter some large datasets) you could try the bulk accounts data - see monthly data or the historic archive data.

  • If you want to get updated when a company files a financial statement rather than searching for existing data then you’ll want the Streaming API - the structures of the data sets are the same or very similar to the main REST API but the way you get this data is quite different.
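For the first option, the "follow the links" steps can be sketched roughly as below. This is only a minimal sketch, not tested against the live API: it builds the Filing History URL for a company number and pulls the Document Metadata links out of a response that has already been fetched and parsed as JSON. The field names (`items`, `date`, `links.document_metadata`) and the `category` / `items_per_page` query parameters are my assumptions about the API, and the function names are made up for illustration.

```python
from urllib.parse import urlencode

API_BASE = "https://api.company-information.service.gov.uk"


def filing_history_url(company_number, category="accounts", items_per_page=25):
    """Build the Filing History URL, filtered by category (assumed parameter names)."""
    query = urlencode({"category": category, "items_per_page": items_per_page})
    return f"{API_BASE}/company/{company_number}/filing-history?{query}"


def document_metadata_links(filing_history):
    """Return (date, link) pairs for every filing that has a document link.

    `filing_history` is the parsed JSON body of a Filing History response.
    Filings with no image available are skipped.
    """
    pairs = []
    for item in filing_history.get("items", []):
        link = item.get("links", {}).get("document_metadata")
        if link:
            pairs.append((item.get("date"), link))
    return pairs


def latest_metadata_link(filing_history):
    """Pick the Document Metadata link of the most recently dated filing, if any."""
    pairs = document_metadata_links(filing_history)
    if not pairs:
        return None
    # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
    return max(pairs, key=lambda pair: pair[0] or "")[1]
```

You would then request the returned link (with your API key) to get the Document Metadata, and follow the link in that response to the Content endpoint.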

(UPDATE: I have now added the direct links to Companies House documentation on the different endpoints as I’m no longer getting a 504 Gateway Timeout error from that site.)

It sounds like you want the first option - to download financial documents (filings) for a given company. In which case I suggest you follow my posts in the following threads. Don’t worry about what system is being used to access these e.g. Postman etc. The steps are the same.

If it’s specifically XBRL data you want, there is some info in the thread below. It’s the same process - the only difference is to supply an HTTP header (Accept) with the appropriate MIME type (I believe this is “application/xml” for both, but possibly “application/xhtml+xml” for iXBRL) when finally requesting the data.
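To make the Accept header point concrete, here is a small sketch in Python (standard library only) that prepares the final content request without sending it. The function name and the example URL are hypothetical. The key-as-username, blank-password Basic auth mirrors the `curl -u OUR_API_KEY:` style of call. How your HTTP client treats the Authorization header when it follows the redirect to AWS is not covered here.

```python
import base64
import urllib.request


def content_request(content_url, api_key, accept="application/pdf"):
    """Prepare (but do not send) a request for a document's content.

    `content_url` is the link taken from the Document Metadata response.
    `accept` is the MIME type you want back, e.g. "application/xhtml+xml".
    """
    # HTTP Basic auth: API key as the username, empty password.
    token = base64.b64encode(f"{api_key}:".encode("ascii")).decode("ascii")
    request = urllib.request.Request(content_url)
    request.add_header("Authorization", f"Basic {token}")
    request.add_header("Accept", accept)
    return request


# Hypothetical usage: the URL below is a placeholder, not a real document.
req = content_request(
    "https://example.invalid/document/abc/content",
    "OUR_API_KEY",
    accept="application/xhtml+xml",
)
```

Passing `req` to `urllib.request.urlopen` would then make the call.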

Thanks Voracityemail, appreciate it!

You are indeed correct: it’s individual financial statements (accounts) based on company number. I’ve made a little progress and have got to the stage where I have doc IDs from the links. I pass these to an HTTP GET request to the CH site, but I’m getting an odd SSL: Certificate_verify_failed error and now a 407 Proxy Authentication Required error.

I’ve been reading through the content, but the links to where some of the useful info would have been have expired. I’ll keep trying. It feels like I need to consider further authentication - any suggestions or further links?

Thanks again for taking time to respond, Darren

PS: can you only get PDFs or XBRL / iXBRL (no HTML)?

Haven’t come across that one. I suspect that’s how your own system / network / proxy / tool is interacting with things but don’t know. Someone might be able to help if you posted what tool / language / environment you were using, what calls to the API you were making and details of the response. Don’t post your own API key though! e.g.

Using curl from the command line on our production server and our ‘live’ key we request to get the company profile as follows: curl -u OUR_API_KEY: 'https://api.company-information.service.gov.uk/company/NF004299' and get no body. Looking at the header we send / response we get back with curl -v -u OUR_API_KEY: 'https://api.company-information.service.gov.uk/company/NF004299' we get a 407 …

(I always recommend playing around with something really simple and interactive like curl to eliminate basic issues like incorrect API key / proxy issues / details of the language or tool you’re wanting to use etc.)
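If you are working in Python rather than curl, a comparable sanity check might look like the sketch below. It prepares the same company profile request (key as username, blank password) and reports which proxies Python has picked up from the environment. A 407 is sent by a proxy rather than by the API itself, so that is worth checking first. The function names are mine, for illustration only.

```python
import base64
import urllib.request


def basic_auth_header(api_key):
    """Equivalent of `curl -u API_KEY:` (key as username, empty password)."""
    token = base64.b64encode(f"{api_key}:".encode("ascii")).decode("ascii")
    return f"Basic {token}"


def build_profile_request(api_key, company_number):
    """Prepare (but do not send) the company profile request."""
    url = f"https://api.company-information.service.gov.uk/company/{company_number}"
    request = urllib.request.Request(url)
    request.add_header("Authorization", basic_auth_header(api_key))
    return request


def detected_proxies():
    """Proxies Python would use by default, as a scheme -> URL mapping."""
    return urllib.request.getproxies()


if __name__ == "__main__":
    # Print the proxy settings only; never print or post the API key itself.
    print("Proxies in use:", detected_proxies())
```

If that shows a corporate proxy, the 407 (and quite possibly the certificate error) is more likely something to raise with whoever runs your network than an API key problem.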

Initially, as with the XML Gateway and WebCHeck, I think all filings were in PDF format only **. There are occasional older filings where no image is available at all. However, if there is something available, I think there’s always a PDF version. Some financial filings may also be available in XBRL, as mentioned. The documentation here:

https://developer-specs.company-information.service.gov.uk/document-api/resources/documentmetadata?v=latest

… says "Available content types are application/pdf, application/json, application/xml, application/xhtml+xml and text/csv"

We’ve not yet encountered a CSV but haven’t specifically been looking for these!
I’m not aware that there’s any way of knowing in advance what will be available from the API - you get what you find. You might be able to work out where these will be available by reference to the bulk accounts data (links in my first post above), but I don’t know and that’s not something we’ve ever looked at.
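Since you only find out what exists once you have the Document Metadata, one approach is to read the list of formats from that response and pick the best one on offer. A rough sketch follows, assuming the metadata JSON has a `resources` object keyed by MIME type (that is my understanding of the response, so check it against what you actually get back). The preference order is just an example. On the HTML question: iXBRL is XHTML with XBRL tags embedded, so “application/xhtml+xml” is the closest thing to HTML here.

```python
# Example preference order: tagged data first, PDF as the fallback.
PREFERENCE = ("application/xhtml+xml", "application/xml", "application/pdf")


def available_content_types(metadata):
    """List the MIME types offered for one document (assumed `resources` key)."""
    return sorted(metadata.get("resources", {}))


def choose_content_type(metadata, preference=PREFERENCE):
    """Return the first preferred MIME type that is available, else None."""
    available = metadata.get("resources", {})
    for content_type in preference:
        if content_type in available:
            return content_type
    return None
```

The chosen value is what you would send in the Accept header when requesting the content.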

** Note that the PDFs all appear to be raster images. It’s possible that newer ones (which appear to be computer-generated rather than scanned paper copies) have text data within them. However the last time I checked (years ago) they seemed to be raster. So you’d likely have to OCR if you wanted to scrape text.