I haven’t come across that one. I suspect it’s down to how your own system / network / proxy / tool is interacting with the API, but I don’t know. Someone might be able to help if you posted what tool / language / environment you were using, what calls to the API you were making and details of the response. Don’t post your own API key though! e.g.
Using curl from the command line on our production server and our ‘live’ key we request to get the company profile as follows: curl -u OUR_API_KEY: 'https://api.company-information.service.gov.uk/company/NF004299' and get no body. Looking at the header we send / response we get back with curl -v -u OUR_API_KEY: 'https://api.company-information.service.gov.uk/company/NF004299' we get a 407 …
(I always recommend playing around with something really simple and interactive like curl to eliminate basic issues like incorrect API key / proxy issues / details of the language or tool you’re wanting to use etc.)
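If you then move from curl to a language, it can help to reproduce exactly what curl sends and to read the status code before anything else. A rough Python sketch of that, not an official client: curl -u OUR_API_KEY: sends HTTP Basic auth with the key as the username and an empty password, and 407 is the standard "Proxy Authentication Required" status. The function names and the wording of the hints are my own invention.

```python
import base64
from http import HTTPStatus


def basic_auth_header(api_key: str) -> dict[str, str]:
    """Build the header that `curl -u API_KEY:` sends (key as username, empty password)."""
    token = base64.b64encode(f"{api_key}:".encode("ascii")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def likely_cause(status: int) -> str:
    """Return a rough, hypothetical hint for a status code; not an exhaustive diagnosis."""
    if status == HTTPStatus.PROXY_AUTHENTICATION_REQUIRED:  # 407
        return "a proxy between you and the API is asking for credentials"
    if status == HTTPStatus.UNAUTHORIZED:  # 401
        return "the server did not accept the credentials you sent"
    if status == HTTPStatus.NOT_FOUND:  # 404
        return "the server has nothing at that URL"
    try:
        # Fall back to the standard reason phrase for any other known code.
        return HTTPStatus(status).phrase
    except ValueError:
        return "unrecognised status code"


if __name__ == "__main__":
    # Placeholder key only: never post or commit a real one.
    print(basic_auth_header("OUR_API_KEY"))
    print(likely_cause(407))
```

A 407 points at something on your side of the connection (a proxy), so changing the key is unlikely to help.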
Initially, as with the XML Gateway and WebCheck, I think all filings were in PDF format only **. There are occasional older filings where no image is available at all. However, if there is something, I think there’s always a PDF version. Some financial filings may also be available in XBRL, as mentioned. The documentation here:
https://developer-specs.company-information.service.gov.uk/document-api/resources/documentmetadata?v=latest
… says "Available content types are application/pdf, application/json, application/xml, application/xhtml+xml and text/csv"
We’ve not yet encountered a CSV but haven’t specifically been looking for these!
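Once you have the metadata for a particular document, choosing a format can be sketched roughly as below. This assumes the metadata JSON has a top-level "resources" object keyed by content type; check that against the spec linked above, as I haven’t verified every case. The function name, the preference order and the sample values are made up for illustration.

```python
from typing import Optional

# Hypothetical preference order: structured formats first, PDF as the fallback.
PREFERRED = (
    "application/xhtml+xml",
    "application/xml",
    "application/json",
    "text/csv",
    "application/pdf",
)


def pick_content_type(metadata: dict, preferred: tuple = PREFERRED) -> Optional[str]:
    """Return the first preferred content type listed in the metadata, or None."""
    # Assumption: available formats are the keys of a "resources" mapping.
    available = metadata.get("resources") or {}
    for content_type in preferred:
        if content_type in available:
            return content_type
    return None


if __name__ == "__main__":
    # Made-up sample, not a real API response.
    sample = {
        "resources": {
            "application/pdf": {"content_length": 12345},
            "application/xhtml+xml": {"content_length": 6789},
        }
    }
    print(pick_content_type(sample))
```

The value returned is what you would then send in the Accept header when requesting the document content.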
I’m not aware that there’s any way of knowing in advance what will be available from the API - you get what you find. You might be able to work out where these will be available by reference to the bulk accounts data (links in my first post above), but I don’t know, and that’s not something we’ve ever looked at.
** Note that the PDFs all appear to be raster images. It’s possible that newer ones (which appear to be computer-generated rather than scanned paper copies) have text data within them. However the last time I checked (years ago) they seemed to be raster. So you’d likely have to OCR if you wanted to scrape text.