500 error when getting document

Using Google Apps, I can query a company's filing history and get the document metadata, but when I then try to fetch the document I get "Exception: Request failed for https://document-api.companieshouse.gov.uk returned code 500".

The document is available if I go through the Companies House website, and I have tried this with several different companies and filings.

It fails on the UrlFetchApp.fetch call:

function getDocumentContents() {
  var documentUrl = "https://document-api.companieshouse.gov.uk/document/HOaGqrNBOC3Amh7dF-H3qDnMg3FPit0CBRroJrrUAMs/content";
  var response = UrlFetchApp.fetch(documentUrl, {
    headers: {
      Authorization: "MYKEY",    // API key placeholder, as posted
      Accept: "application/pdf"  // ask for the PDF rendition
    }
  });

  // The call above throws on a 500, so this line is never reached
  var blob = response.getBlob();
  return blob;
}
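One way to see more than the bare "returned code 500" exception is to pass muteHttpExceptions: true, so that UrlFetchApp.fetch returns the error response instead of throwing, and then log its status and body. This is a sketch for Google Apps Script (UrlFetchApp, Utilities and Logger are the built-in services); it assumes HTTP Basic auth with the API key as the username and an empty password, and apiKey/documentUrl are placeholders you supply.

```javascript
// Sketch for Google Apps Script: documentUrl and apiKey are placeholders.
function inspectDocumentResponse(documentUrl, apiKey) {
  var response = UrlFetchApp.fetch(documentUrl, {
    headers: {
      // HTTP Basic auth: API key as username, empty password
      Authorization: "Basic " + Utilities.base64Encode(apiKey + ":"),
      Accept: "application/pdf"
    },
    // Return 4xx/5xx responses instead of throwing, so they can be inspected
    muteHttpExceptions: true
  });

  var code = response.getResponseCode();
  Logger.log("Status: " + code);
  if (code >= 400) {
    // Error bodies are normally short text or JSON, so log them as text
    Logger.log("Body: " + response.getContentText());
  }
  return code;
}
```

Run it from the Apps Script editor with your own key and check the execution log; the error body may say more than the exception text does.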

Presumably this is in Google Apps Script?

I don't know the answer, but there are some quirks to fetching the document data:

a) When you request the content, you'll get at least one redirect. According to the UrlFetchApp documentation, the default is to follow redirects. However, it's possible that this is causing you problems because …

b) You are currently redirected to AWS, where the documents are hosted. AWS has its own authorization scheme, so if the GScript passes the Companies House authentication header on to AWS, that will cause issues.

c) … so you may have to tell GScript NOT to follow redirects, read the redirect response to find out where it points, and then make a separate request to that URL without the Companies House HTTP Basic Authorization header.
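Steps a) to c) could be sketched in Apps Script roughly as follows. This is not a tested recipe: it assumes the content URL answers with a 3xx whose Location header points at the AWS-hosted file, and that this URL needs no Companies House credentials. fetchDocumentBlob and findHeader are hypothetical helper names.

```javascript
// Sketch only. Assumption: the content URL returns a 3xx redirect to the
// AWS-hosted file, and that target URL needs no Companies House credentials.

// Look up a response header regardless of the case the server used.
function findHeader(headers, name) {
  var wanted = name.toLowerCase();
  for (var key in headers) {
    if (key.toLowerCase() === wanted) return headers[key];
  }
  return null;
}

function fetchDocumentBlob(contentUrl, apiKey) {
  // Step 1: ask Companies House for the content, but stop at the redirect.
  var first = UrlFetchApp.fetch(contentUrl, {
    headers: {
      Authorization: "Basic " + Utilities.base64Encode(apiKey + ":"),
      Accept: "application/pdf"
    },
    followRedirects: false,  // top-level fetch option, not an HTTP header
    muteHttpExceptions: true // 4xx/5xx come back as a response instead of an exception
  });

  var code = first.getResponseCode();
  if (code === 200) return first.getBlob(); // no redirect this time
  if (code < 300 || code >= 400) {
    throw new Error("Unexpected HTTP " + code + ": " + first.getContentText());
  }

  var location = findHeader(first.getHeaders(), "Location");
  if (!location) throw new Error("Redirect response had no Location header");

  // Step 2: fetch the redirect target WITHOUT the Companies House Authorization header.
  var second = UrlFetchApp.fetch(location, { muteHttpExceptions: true });
  if (second.getResponseCode() !== 200) {
    throw new Error("Download failed with HTTP " + second.getResponseCode());
  }
  return second.getBlob();
}
```

The key design choice is that the second request is built from scratch, so nothing from the first request's headers can leak through to AWS.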

You do appear to be right to request application/pdf, since most documents are available as PDF, but it's always a good idea to request the document metadata first (if you're not already doing so), check which formats are available, and use whichever content link it provides.
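Checking the metadata first might look something like this. It assumes the parsed metadata JSON has a "resources" object keyed by MIME type and a "links.document" content URL, which matches responses we've seen but should be checked against the Document API reference; chooseDocumentFormat is a hypothetical helper, not part of any library.

```javascript
// Hypothetical helper. Assumption: metadata.resources is keyed by MIME type
// (e.g. "application/pdf") and metadata.links.document is the content URL.
function chooseDocumentFormat(metadata, preferredTypes) {
  var available = Object.keys(metadata.resources || {});
  for (var i = 0; i < preferredTypes.length; i++) {
    if (available.indexOf(preferredTypes[i]) !== -1) {
      // Use the first preferred type that this document actually offers
      return { contentType: preferredTypes[i], url: metadata.links.document };
    }
  }
  return null; // none of the preferred formats is listed
}
```

For example, with the metadata parsed via JSON.parse(response.getContentText()), chooseDocumentFormat(metadata, ["application/pdf"]) gives the content link plus the Accept value to send, or null if no PDF is listed.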

For more detail see e.g. my reply posted in this thread (they also got a 500):

Hope this helps.

Thanks for taking the time to reply.

I have set "followRedirects: false" in my headers, so it should not be redirecting, yet I am still getting a 500 error.
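In UrlFetchApp, followRedirects is a key of the params object passed to fetch, alongside headers, not an HTTP header. If it sits literally inside headers, it is most likely just sent to the server as a header while redirects are still followed. A sketch of the placement, where buildDocumentFetchParams is a hypothetical name and the key is a placeholder:

```javascript
// Sketch: builds the params object for UrlFetchApp.fetch(url, params).
function buildDocumentFetchParams(apiKey) {
  return {
    headers: {
      // Only real HTTP headers belong in here
      Authorization: "Basic " + Utilities.base64Encode(apiKey + ":"),
      Accept: "application/pdf"
    },
    followRedirects: false,   // option for UrlFetchApp itself, NOT an entry in headers
    muteHttpExceptions: true  // 4xx/5xx come back as a response instead of an exception
  };
}
```

With these params, a working request should come back with a 3xx status rather than the document itself.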

Does anyone have a document (url) they know works through their script I could test with?

Did you try my example in that thread? (It's old, but it probably still works.)

I would definitely try to step through this manually using e.g. curl, so you can see exactly what URL and authentication data you are supplying and exactly what responses you get back. Once you can do this manually, it's a matter of working out what's happening differently in your language or library.

Also, it looks like there's a way of examining what Google Apps is actually sending, which may help. See the last reply to this Stack Overflow question: "google apps script - Examine raw HTTP request from UrlFetchApp.fetch".
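UrlFetchApp.getRequest(url, params) returns the request that fetch would make without actually sending it, so its output can be compared line by line with what curl -v shows. A sketch, where logOutgoingRequest is a hypothetical name and the key is a placeholder:

```javascript
// Sketch: log what UrlFetchApp WOULD send, without sending it.
function logOutgoingRequest(contentUrl, apiKey) {
  var request = UrlFetchApp.getRequest(contentUrl, {
    headers: {
      Authorization: "Basic " + Utilities.base64Encode(apiKey + ":"),
      Accept: "application/pdf"
    },
    followRedirects: false
  });
  // The returned map includes at least url, method, contentType, payload and headers.
  // Don't post this output publicly: the Authorization value is just your key in base64.
  Logger.log(JSON.stringify(request, null, 2));
  return request;
}
```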

Hope this helps.

Hi @voracityemail, sorry for the delay in replying; I have been diligently working through your example. I did not know curl, so I had to learn it, and I think that was both good and bad.

On the plus side, I have now managed to get the file I need with a curl command, so I now need to translate that into Google Apps. For example, there is a redirect on my "curl https://document-api.companieshouse.gov.uk/document/jANs6mWaWVOvb0Dd292JxysyB_l-clxHyQ-6uNHuH0A/content -u MY_KEY: -k -v" request: with '-k -v' the redirect shows, but my Google Apps script, which is supposed to report redirects, is not showing it.
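For comparison, curl does not follow redirects unless -L is given, so the redirect and its Location header appear in the -v output. UrlFetchApp follows redirects by default, so if a script leaves followRedirects at its default it only ever sees the final response, which may be why no redirect is reported. A rough Apps Script equivalent of the curl command above might look like this; -u MY_KEY: corresponds to HTTP Basic auth with an empty password, -k (skip TLS certificate checks) is deliberately not reproduced, and describeRedirect is a hypothetical name.

```javascript
// Rough equivalent of: curl <content URL> -u MY_KEY: -v   (no -L, no -k)
function describeRedirect(contentUrl, apiKey) {
  var response = UrlFetchApp.fetch(contentUrl, {
    // -u MY_KEY: -> Basic auth, key as username, empty password
    headers: { Authorization: "Basic " + Utilities.base64Encode(apiKey + ":") },
    followRedirects: false,   // like curl without -L: stop at the first response
    muteHttpExceptions: true  // 4xx/5xx come back as a response instead of an exception
  });

  // Find the Location header whatever case the server used
  var headers = response.getHeaders();
  var location = null;
  for (var name in headers) {
    if (name.toLowerCase() === "location") location = headers[name];
  }

  var summary = { status: response.getResponseCode(), location: location };
  Logger.log(JSON.stringify(summary));
  return summary;
}
```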

The redirect also gave me some challenges, but I have overcome them in curl; I just need to translate that into GA.

When I have it all working in a GA script, I am happy to share it here if anyone needs it.