Using curl to download document returns 404 error

seanbradley · July 17, 2017, 6:08pm

curl -I -u****************************************************: https://document-api.companieshouse.gov.uk/document/d-r3ceb--OvuiuCDmQjj0RLW2_ZvY75zT4B1tB40cq8/content

HTTP/1.1 404 Not Found
Content-Length: 19
Content-Type: text/plain; charset=utf-8
Date: Mon, 17 Jul 2017 17:16:42 GMT
Server: nginx/1.8.0
Connection: keep-alive

I’ve tried many documents, and they all return a 404 error.

voracityemail · July 26, 2017, 11:38am

AFAIK cURL sends an HTTP HEAD request when you use the -I or --head flag. This works for requests to the main API (try “curl -I -u{your auth}: https://api.companieshouse.gov.uk/company/10869207/”; I think that’s the company, “TBC ENVIRONMENT SOLUTIONS LTD”, that you wanted to access in your example).

I suspect that HEAD requests aren’t supported by the documents API (https://document-api.companieshouse.gov.uk/), which would explain the 404. I don’t know whether that was a deliberate CH decision or it just didn’t get done!

There is, however, an --include (-i) flag for cURL, which sends a normal GET request but also prints the response headers:

curl -i -u{my-CH-ID}: https://document-api.companieshouse.gov.uk/document/d-r3ceb--OvuiuCDmQjj0RLW2_ZvY75zT4B1tB40cq8

…just worked for me.
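If you only want the headers from that -i output, something like the following might do. It is only a sketch: headers_only is a made-up helper name, and CH_API_KEY is an assumed environment variable holding your API key. It keeps everything up to the first blank line, which is where the header block ends.

```shell
#!/bin/sh
# Hypothetical helper: read `curl -i` output on stdin and print only
# the header block (everything before the first blank line).
headers_only() {
  # Strip the trailing CR from each line, stop at the first empty line.
  awk '{ sub(/\r$/, "") } /^$/ { exit } { print }'
}

# Assumed usage (CH_API_KEY is an assumed variable, not part of the API):
#   curl -s -i -u "${CH_API_KEY}:" "$url" | headers_only
```

Because awk stops reading once the headers are through, curl may exit with a write error on a large body; that should be harmless here, since the body is being thrown away anyway.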

You could collect the headers from the normal GET response as above, or via whichever HTTP library or functions are appropriate for you. I use PHP, so I use the cURL library for PHP, which lets you collect the header info separately while fetching the main response.
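On the command line, the same idea (headers collected separately from the body) can be sketched with curl’s -D (--dump-header) and -o (--output) options. The helper names, the file arguments and the CH_API_KEY variable below are my own assumptions for illustration, not anything defined by Companies House.

```shell
#!/bin/sh
# Hypothetical helpers; CH_API_KEY is an assumed environment variable.

# Fetch a URL with a normal GET, writing headers and body to separate files.
fetch_document() {
  url=$1; header_file=$2; body_file=$3
  # -s: quiet, -D: dump response headers to a file, -o: write body to a file.
  # The trailing colon after the key means an empty basic-auth password.
  curl -s -u "${CH_API_KEY}:" -D "$header_file" -o "$body_file" "$url"
}

# Print the status code of the last response recorded in a header dump.
status_from_headers() {
  awk '/^HTTP\// { code = $2 } END { print code }' "$1"
}

# Print the value of a named header (case-insensitive name match).
# Prints one line per match if the header occurs more than once.
header_value() {
  awk -v name="$2" '
    {
      sub(/\r$/, "")                 # drop the trailing CR
      i = index($0, ":")             # split on the first colon only
      if (i > 0 && tolower(substr($0, 1, i - 1)) == tolower(name)) {
        v = substr($0, i + 1)
        sub(/^[ \t]+/, "", v)        # trim leading whitespace
        print v
      }
    }' "$1"
}

# Assumed usage:
#   fetch_document "$url" headers.txt body.out
#   status_from_headers headers.txt
#   header_value headers.txt Content-Type
```

Splitting on the first colon only matters for headers such as Date, whose values contain colons themselves.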

The documents are certainly accessible (I just downloaded the document above via our code, for example); you just can’t get the headers on their own with cURL -I.

Hope that helps.