Authorization failed

dhavalicfai · June 5, 2021, 1:17pm

Do we need two different API keys for authentication? Basic authentication works, but I get a 401 error when I try to fetch a document. I am not able to get the document ID through the company history; the other details are available.

voracityemail · June 6, 2021, 3:00pm

Welcome to the forum.

Do we need two different API keys for authentication?

No. If you can definitely get the Company profile information and / or Filing history information then it would seem your API key / authentication is working. For more information about how to find document information and download documents see the following:

From your previous post:

… it looks like you’re using a particular tool (Blue Prism - which I don’t have any knowledge of) to access the API. You’ll need to work out exactly how to use that to make these requests. The Companies House API is a fairly simple REST API, however, so it should not be too difficult.

I got basic authentication successfully …

That’s good - many people seem to struggle with the http Basic authentication. As always I recommend checking this via something really simple like using curl from the command line. If you do that then you can see exactly what you’re requesting e.g. what http data is being sent - and what data you get in any response(s). It also makes it simple to post that information to this forum so people can see what the problem is and help you!

Problems with http basic?

https://developer.company-information.service.gov.uk/authentication
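If curl isn't an option, the same check can be done in a few lines of Python. This is only a minimal sketch, assuming the usual Companies House convention that the API key is sent as the http Basic username with an empty password. The function name basic_auth_header is mine, not part of any API, and MY_API_KEY_HERE is a placeholder.

```python
import base64


def basic_auth_header(api_key: str) -> dict:
    """Build an http Basic Authorization header.

    Assumption: the API key is the username and the password is empty,
    which is why a colon is appended before base64 encoding.
    """
    token = base64.b64encode(f"{api_key}:".encode("ascii")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# MY_API_KEY_HERE is a placeholder, not a real key.
print(basic_auth_header("MY_API_KEY_HERE"))
```

Printing the header lets you compare it with what your own tool actually sends.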

dhavalicfai · June 7, 2021, 11:32am

Hi,

The link provided in the answer is not working. I have attached a screenshot of the link.

voracityemail · June 7, 2021, 12:08pm

Yes, Companies House have changed where their documentation is sited since that post. (Google and the forum search are also available to you, as well as just asking questions!)

I’ve updated this section of the post you mention so it now points to the new locations. You can see it below:

1. Get the filing history.
2. For a given filing in the list, locate the “links” object, “document_metadata” member to get the metadata link.
3. Request the metadata link to get the document metadata object.
4. Select your chosen data format (pdf, xml etc.), set the appropriate “Accept” http header, add “/content” on the end of the metadata link and request this to get the actual file.
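The link-handling part of these steps can be sketched in Python as below. It is a rough illustration, not official client code. I am assuming the filing history response carries its filings in an "items" list, and the sample data and EXAMPLE_ID are made up. The helper names metadata_link and content_link are hypothetical.

```python
from typing import Optional


def metadata_link(filing: dict) -> Optional[str]:
    """Return the links.document_metadata value of one filing, if present."""
    return filing.get("links", {}).get("document_metadata")


def content_link(metadata_url: str) -> str:
    """Append /content to a metadata link to get the link for the file itself."""
    return metadata_url.rstrip("/") + "/content"


# Hypothetical, heavily trimmed filing history response (illustration only).
sample_history = {
    "items": [
        {"links": {"document_metadata": "https://document-api.company-information.service.gov.uk/document/EXAMPLE_ID"}},
        {"links": {}},  # a filing without a document link is skipped
    ]
}

for filing in sample_history["items"]:
    url = metadata_link(filing)
    if url is not None:
        print("metadata:", url)
        print("content: ", content_link(url))
```

The requests themselves still need the http Basic header and, for the content link, an “Accept” header.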

For reference - the main Companies House documentation area is now:
https://developer-specs.company-information.service.gov.uk/

The main REST API documentation is at:
https://developer-specs.company-information.service.gov.uk/companies-house-public-data-api/reference

The document API is at:
https://developer-specs.company-information.service.gov.uk/document-api/reference

dhavalicfai · June 7, 2021, 12:28pm

I use this link to get the document metadata:
https://document-api.companieshouse.gov.uk/document/bMscXpz0zF_an0GAKSQTrtXwZ9dwMVQaj-dfB3g1n_w/content

I get a 401 error.
I want to ask one thing:
is bMscXpz0zF_an0GAKSQTrtXwZ9dwMVQaj-dfB3g1n_w the document_id?

voracityemail · June 8, 2021, 9:41am

Thanks for the information - that’s a simple one to fix. The current document metadata link is of the following form:

GET https://document-api.company-information.service.gov.uk/document/{document_id}

(the old link https://document-api.companieshouse.gov.uk/document/{document_id} works also)

…but you say you are requesting:

https://document-api.companieshouse.gov.uk/document/{document_id}/content

That is the link for the actual document content, not the metadata.

Using curl (again - I recommend this as it’s really simple to play about with) I just tried this and get the following:

curl -uMY_API_KEY_HERE "https://document-api.companieshouse.gov.uk/document/bMscXpz0zF_an0GAKSQTrtXwZ9dwMVQaj-dfB3g1n_w"
{
  "company_number": "00966425",
  "barcode": "AZ80ZJXW",
  "significant_date": null,
  "significant_date_type": "",
  "category": "miscellaneous",
  "pages": 168,
  "filename": "",
  "created_at": "2015-03-19T22:12:01.324329702Z",
  "etag": "",
  "links": {
    "self": "https://document-api.companieshouse.gov.uk/document/bMscXpz0zF_an0GAKSQTrtXwZ9dwMVQaj-dfB3g1n_w",
    "document": "https://document-api.companieshouse.gov.uk/document/bMscXpz0zF_an0GAKSQTrtXwZ9dwMVQaj-dfB3g1n_w/content"
  },
  "resources": {
    "application/pdf": { "content_length": 6592335 }
  }
}
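The two useful parts of that metadata are “links” / “document” (where the file lives) and “resources” (which formats you can ask for). Here is a small Python sketch that reads them from a trimmed copy of the response above. The helper names available_types and choose_accept are my own, and the preferred-format list is just an example.

```python
import json
from typing import Optional, Sequence

# Trimmed copy of the metadata response shown above.
metadata = json.loads("""
{
  "links": {
    "self": "https://document-api.companieshouse.gov.uk/document/bMscXpz0zF_an0GAKSQTrtXwZ9dwMVQaj-dfB3g1n_w",
    "document": "https://document-api.companieshouse.gov.uk/document/bMscXpz0zF_an0GAKSQTrtXwZ9dwMVQaj-dfB3g1n_w/content"
  },
  "resources": {
    "application/pdf": {"content_length": 6592335}
  }
}
""")


def available_types(meta: dict) -> list:
    """List the content types named as keys of the "resources" object."""
    return sorted(meta.get("resources", {}))


def choose_accept(meta: dict, preferred: Sequence[str] = ("application/pdf",)) -> Optional[str]:
    """Return the first preferred type the document offers, or None."""
    offered = meta.get("resources", {})
    for content_type in preferred:
        if content_type in offered:
            return content_type
    return None


print("content link:", metadata["links"]["document"])
print("available:   ", available_types(metadata))
print("Accept:      ", choose_accept(metadata))
```

Whatever choose_accept returns is the value to put in the “Accept” header when requesting the content link.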

You might as well use the current recommended URI - it gives the same result:
https://document-api.company-information.service.gov.uk/document/bMscXpz0zF_an0GAKSQTrtXwZ9dwMVQaj-dfB3g1n_w

Now if you ask for the content (file data) itself, you’ll first get a redirect from Companies House. This is simplest to see with curl again:

curl -v -uMY_API_KEY_HERE "https://document-api.company-information.service.gov.uk/document/bMscXpz0zF_an0GAKSQTrtXwZ9dwMVQaj-dfB3g1n_w/content"

* About to connect() to document-api.company-information.service.gov.uk port 443 (#0)
…(I’m removing some of the output lines here)

> GET /document/bMscXpz0zF_an0GAKSQTrtXwZ9dwMVQaj-dfB3g1n_w/content HTTP/1.1
… (skipping some more lines until the response, which is:)
< HTTP/1.1 302 Found
< Date: Tue, 08 Jun 2021 08:51:41 GMT
< Location: https://s3.eu-west-2.amazonaws.com/document-api-images-live.ch.gov.uk/docs/bMscXpz0zF_an0GAKSQTrtXwZ9dwMVQaj-dfB3g1n_w/application-pdf?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=…&X-Amz-Date=20210608T085141Z&X-Amz-Expires=60&X-Amz-Security-Token=…

I’ve cut out a lot of the url location here but it has this general form.

You can test this manually with curl to see what’s going on. You can use the command below to see if you can download the document. You probably want to redirect output to a file here too! The -L flag requests that curl follow redirects and the -H flag allows you to pass in a http header line - in this case to say you want the PDF file type. (You’d find what types are available by examining the document metadata):

curl -L -H "Accept: application/pdf" -uMY_API_KEY_HERE "https://document-api.company-information.service.gov.uk/document/bMscXpz0zF_an0GAKSQTrtXwZ9dwMVQaj-dfB3g1n_w/content"

…or you can do this in two parts:

1. Request the document and receive the redirect header with link to Amazon - but don’t follow it.
2. Request this link to get the content. Do not supply your API key / http Basic details though when you do so. You may also need to send the curl -L flag in case Amazon redirect internally on their site. I can’t remember if they do that.

Why do you get a 401? This might be happening because your system is getting the redirect and following it but still sending the Companies House http basic authentication header to the Amazon url above. This is not correct because:

1. You’re not on the Companies House domain any more, so you shouldn’t be sending credentials meant for them to another site.
2. Amazon provides authentication / authorization in their url, so by sending the http Authorization header as well you’re effectively sending two sets of authorization details. That confuses the Amazon servers.

This is explained in the posts I linked to.
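For anyone who wants the "two parts" approach outside curl, here is a minimal Python sketch using only the standard library. It is an illustration under assumptions, not tested against the live service: the host list is simply taken from the URLs quoted in this thread, and the names NoRedirect, should_send_api_key, headers_for and fetch_document are mine. The point is only that the API key goes to Companies House and never to the redirect target.

```python
import base64
import urllib.error
import urllib.request
from urllib.parse import urlsplit

# Hosts taken from the URLs quoted in this thread.
COMPANIES_HOUSE_HOSTS = {
    "document-api.company-information.service.gov.uk",
    "document-api.companieshouse.gov.uk",
}
REDIRECT_CODES = {301, 302, 303, 307, 308}


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib following redirects, so a 3xx surfaces as an HTTPError."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def should_send_api_key(url: str) -> bool:
    """Only send http Basic credentials to Companies House hosts."""
    return urlsplit(url).hostname in COMPANIES_HOUSE_HOSTS


def headers_for(url: str, api_key: str, accept: str) -> dict:
    """Build request headers, adding Authorization only where appropriate."""
    headers = {"Accept": accept}
    if should_send_api_key(url):
        token = base64.b64encode(f"{api_key}:".encode("ascii")).decode("ascii")
        headers["Authorization"] = f"Basic {token}"
    return headers


def fetch_document(content_url: str, api_key: str, accept: str = "application/pdf") -> bytes:
    """Part 1: request the content link without following the redirect.
    Part 2: request the redirect target without the API key."""
    opener = urllib.request.build_opener(NoRedirect)
    first = urllib.request.Request(content_url, headers=headers_for(content_url, api_key, accept))
    try:
        with opener.open(first) as response:
            return response.read()  # served directly, no redirect happened
    except urllib.error.HTTPError as err:
        if err.code not in REDIRECT_CODES:
            raise
        location = err.headers.get("Location")
        if location is None:
            raise
    # The default opener follows any further redirects on the target site.
    second = urllib.request.Request(location, headers=headers_for(location, api_key, accept))
    with urllib.request.urlopen(second) as response:
        return response.read()
```

You would call fetch_document with the “/content” link and your key, then write the returned bytes to a file. The link in the redirect appears to be short-lived (note X-Amz-Expires=60 in the Location header above), so request it straight away.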

dhavalicfai · June 9, 2021, 11:17am

Thanks, it works now. I get the link and I can open it in a browser, but if I download it through .NET code:

Using wc As New System.Net.WebClient()
    wc.DownloadFile(Url, Path)
End Using

I get this error: Internal : Could not execute code stage because exception thrown by code stage: An exception occurred during a WebClient request.

One more question: can I get a list of the company numbers that filed a document today?

voracityemail · June 10, 2021, 1:42pm

I can’t help you debugging your .net code, but I can help with your last question:

can I get a list of the company numbers that filed a document today?

And the answer is - sort of. There’s no direct query for this. (There may be some 3rd party tools / lists that can do this for you but I don’t have any direct links). However you can do the following:

Consume data from the Companies House streaming API. There is a specific stream for filings. So whenever there is a filing you’d receive that information.
If you were interested in a (probably small) number of specific companies you can sign up for email updates (Follow) for them.
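If you go the streaming route, collecting company numbers from the filings stream might look roughly like the Python sketch below. Treat the event shape as an assumption on my part: I am assuming each event arrives as one line of JSON with a "resource_uri" of the form /company/{company_number}/filing-history/{id}, and that blank lines can appear between events. Check the streaming API documentation before relying on it. The sample lines and function names are made up, and the sketch only parses lines; connecting to the stream is left out.

```python
import json
import re
from typing import Iterable, Optional

# Assumed resource_uri shape: /company/{company_number}/filing-history/{id}
_COMPANY_IN_URI = re.compile(r"^/company/([^/]+)/filing-history/")


def company_number_from_event(line: str) -> Optional[str]:
    """Extract the company number from one stream line, or None if absent."""
    line = line.strip()
    if not line:
        return None  # blank line between events (assumed)
    event = json.loads(line)
    match = _COMPANY_IN_URI.match(event.get("resource_uri", ""))
    return match.group(1) if match else None


def companies_with_filings(lines: Iterable[str]) -> list:
    """Return the distinct company numbers seen, in order of first appearance."""
    seen = []
    for line in lines:
        number = company_number_from_event(line)
        if number is not None and number not in seen:
            seen.append(number)
    return seen


# Made-up sample lines, for illustration only.
sample_lines = [
    '{"resource_kind": "filing-history", "resource_uri": "/company/00000001/filing-history/EXAMPLE_A"}',
    "",
    '{"resource_kind": "filing-history", "resource_uri": "/company/00000002/filing-history/EXAMPLE_B"}',
    '{"resource_kind": "filing-history", "resource_uri": "/company/00000001/filing-history/EXAMPLE_C"}',
]
print(companies_with_filings(sample_lines))
```

Run against the live stream for a day, this would give you the companies that had filings processed that day.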

Good luck.