Issues retrieving application/xhtml+xml account data

maas_peter · April 8, 2020, 11:58am

I am executing the following sequence:

GET company/filing-history/transaction_id/

from the response, I extract links$document_metadata

GET metadata, Accept = “application/xhtml+xml”, Authorization = "Basic "+ auth_code

from the response I get the following fields:
links$document
resources$application/pdf
resources$application/xhtml+xml

Both resources have non-zero length.
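The metadata step above can be sketched in Python (the thread itself uses R's httr). This assumes the metadata JSON has roughly the shape {"links": {"document": ...}, "resources": {"<mime type>": {"content_length": N}}}, which matches the links$document and resources$... fields listed. The content_length key name and the function name are assumptions for illustration.

```python
import json
from typing import Dict, Tuple


def parse_document_metadata(raw_json: str) -> Tuple[str, Dict[str, int]]:
    """Return (document URL, {mime type: content length}) for non-empty resources.

    Assumed (hypothetical) shape:
    {"links": {"document": "..."}, "resources": {"<mime>": {"content_length": N}}}
    """
    meta = json.loads(raw_json)
    document_url = meta["links"]["document"]  # R equivalent: links$document
    available = {
        mime: info.get("content_length", 0)
        for mime, info in meta.get("resources", {}).items()
        if info.get("content_length", 0) > 0  # skip zero-length resources
    }
    return document_url, available
```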

I then try:
GET links$document Authorization = "Basic "+auth_code followlocation=FALSE
response status: 302 and no content

or

GET links$document followlocation=FALSE
response status: 401 and no content

or

GET links$document Authorization = "Basic "+auth_code followlocation=FALSE
response status: 400
Body says: one auth mechanism allowed; only the X-Amz-Algorithm query parameter, Signature query string parameter or the Authorization header should be specified.

Any suggestions as to what I am doing wrong?

voracityemail · April 8, 2020, 8:42pm

Short answer: you were almost correct with either your first attempt or your last one.

(By the way, I’m guessing the last one should read “followlocation=TRUE”?)

You can get this to work by following your first case and adding one extra step. In the first case you correctly retrieved the first response, an HTTP redirect (302). If you examine its HTTP headers you’ll find that the “Location” header points to (currently) Amazon AWS, which is where the resource is actually stored. Your code should then request that location from Amazon, and it must not send the Companies House HTTP “Authorization” header with that request. That header is only for Companies House, not Amazon.
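A minimal Python sketch of this two-step download, with the HTTP transport passed in as a function so the logic stays independent of any particular HTTP library. Response, fetch_document and http_get are hypothetical names; http_get is assumed to perform a GET without following redirects (the equivalent of followlocation=FALSE).

```python
from typing import Callable, Dict, Mapping, NamedTuple


class Response(NamedTuple):
    status: int
    headers: Mapping[str, str]
    body: bytes


# Hypothetical transport: performs a GET and does NOT follow redirects.
HttpGet = Callable[[str, Dict[str, str]], Response]


def header(resp: Response, name: str) -> str:
    """Case-insensitive header lookup ('location' vs 'Location')."""
    for key, value in resp.headers.items():
        if key.lower() == name.lower():
            return value
    raise KeyError(name)


def fetch_document(document_url: str, auth_header: str, mime_type: str,
                   http_get: HttpGet) -> bytes:
    # Step 1: ask Companies House (with its Authorization header) and stop at the redirect.
    first = http_get(document_url, {"Authorization": auth_header, "Accept": mime_type})
    if first.status != 302:
        raise RuntimeError(f"expected a 302 redirect, got {first.status}")
    location = header(first, "Location")
    # Step 2: fetch the storage URL with a fresh header set, so the
    # Companies House Authorization header is never sent to Amazon.
    second = http_get(location, {})
    if second.status != 200:
        raise RuntimeError(f"download failed with status {second.status}")
    return second.body
```

Building the second request from scratch is the design point: the Companies House credential cannot leak to the storage URL by accident.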

In the last case it seems your client followed the redirect as above but also sent the Companies House HTTP “Authorization” header to Amazon. That’s why you get the message from Amazon about “only one auth mechanism”. If you examine the link in the Location header you’ll see it already contains Amazon’s own authorization parameters in its query string, so you’re effectively sending two different “passwords”, hence the error.
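One way to see the conflict is to check whether a URL already carries its own credentials. A small sketch, assuming the redirect target is an S3-style pre-signed URL whose query string includes parameters such as X-Amz-Algorithm and X-Amz-Signature (or Signature in the older style), as the error message suggests. The helper names are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit

# Query parameters that indicate the URL is already signed (assumed S3-style).
PRESIGN_PARAMS = {"X-Amz-Algorithm", "X-Amz-Signature", "Signature"}


def carries_own_auth(url: str) -> bool:
    """True if the URL's query string contains pre-signing parameters."""
    query = parse_qs(urlsplit(url).query)
    return any(param in query for param in PRESIGN_PARAMS)


def headers_for(url: str, ch_auth_header: str) -> dict:
    """Omit the Companies House Authorization header for pre-signed URLs."""
    if carries_own_auth(url):
        return {}  # sending both would trigger "only one auth mechanism allowed"
    return {"Authorization": ch_auth_header}
```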

In the middle case you’re connecting to Companies House without an API key, so it returns an error (401 Unauthorized).
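For the 401 case, the request needs an Authorization header carrying the API key. The Companies House REST API uses HTTP Basic authentication with the API key as the username and an empty password. Assuming that is what auth_code encodes in the question, a sketch of building the header (the function name is hypothetical):

```python
import base64


def companies_house_auth_header(api_key: str) -> str:
    """Basic auth value: base64("<api_key>:"), i.e. the key as username, blank password."""
    token = base64.b64encode(f"{api_key}:".encode("utf-8")).decode("ascii")
    return "Basic " + token
```

Note the trailing colon: omitting it encodes a different credential string.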

For a more detailed walkthrough, see my posts in the thread below; I think it’s still current:

voracityemail · April 8, 2020, 8:49pm

A detail on getting a particular type of resource.

You should send the Accept = “application/xhtml+xml” header later, not on the metadata request. The GET metadata call just returns metadata about the document (as JSON, with MIME type application/json). You should choose an appropriate type from the ones you get back, e.g. the resources$application/pdf and resources$application/xhtml+xml entries you listed.

You should then send your Accept header (with one of those two types) on the GET document request.
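A sketch of where each Accept header goes, assuming the available types have already been read from resources into a mapping of MIME type to content length. The preference order and function names are illustrative assumptions, not part of the API.

```python
from typing import Dict, Iterable, Optional, Tuple


def choose_mime_type(available: Dict[str, int],
                     preferences: Iterable[str]) -> Optional[str]:
    """Return the first preferred MIME type offered with non-zero length."""
    for mime in preferences:
        if available.get(mime, 0) > 0:
            return mime
    return None


def plan_headers(auth_header: str,
                 doc_type: str) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Headers for (GET metadata, GET document); only the document request asks for doc_type."""
    metadata_headers = {"Authorization": auth_header, "Accept": "application/json"}
    document_headers = {"Authorization": auth_header, "Accept": doc_type}
    return metadata_headers, document_headers
```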

maas_peter · April 9, 2020, 7:05am

Hi voracityemail -

First, thanks for your reply; it was exactly what I needed to find the issue.

I am using the httr library in R.

RESP is the response to the query that generates the 302 response.

The Location header is tricky to find:
RESP$all_headers[[1]]$headers$location

Requesting this URL returns the XML document corresponding to the accounts and I am able to browse the data.

Many thanks,
Peter Maas
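For comparison, a rough Python standard-library equivalent of the R approach above: make the request without following redirects, then read the Location header. This is a sketch, not what httr does internally. get_redirect_location is a hypothetical helper, and it relies on urllib raising HTTPError for a 3xx response once redirect_request returns None.

```python
import urllib.error
import urllib.request


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Refuse to follow redirects, similar to followlocation=FALSE."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError carrying the 3xx response


def get_redirect_location(url: str, headers: dict) -> str:
    opener = urllib.request.build_opener(_NoRedirect)
    request = urllib.request.Request(url, headers=headers)
    try:
        with opener.open(request) as resp:
            raise RuntimeError(f"expected a redirect, got status {resp.status}")
    except urllib.error.HTTPError as err:
        if err.code not in (301, 302, 303, 307, 308):
            raise
        location = err.headers["Location"]  # lookup is case-insensitive
        err.close()
        return location
```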