Document API Implementation

deepti_penneru · July 25, 2018, 9:34am

We are implementing Document API functionality in our portal. The Get Document Metadata method returns:
“The remote server returned an error: (404) Not Found.”
Please let me know what's wrong here.

voracityemail · July 22, 2018, 9:15pm

(FYI You shouldn’t need to post your API key / authentication header here…)

It looks like you’re using a filing-history ID instead of a document ID.

If you found the ID by requesting the filing history list (or a filing history item), you should get an object containing a “links” object:

"links": {
    "document_metadata": "https://frontend-doc-api.companieshouse.gov.uk/document/-VZAbtNU3pwHHac-8_g_eEG6m_aVxu4OgHf-w813038",
    "self": "/company/NF004299/filing-history/MzExNTAyNDk5M2FkaXF6a2N4"
}

For the document metadata, use the “document_metadata” link (see note below). The “self” path is for that particular filing history item. You may have come across a “transaction_id” field, but again this is a reference to the filing history item, not the document ID.

(As per the Document API documentation, you can also request the same path on the document-api host, “https://document-api.companieshouse.gov.uk/document/-VZAbtNU3pwHHac-8_g_eEG6m_aVxu4OgHf-w813038”, and that should work fine.)
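If it helps, that host swap can be done in code. A minimal C# sketch, assuming you already have the links.document_metadata string from a filing history item; DocumentLinks, ToDocumentApiUrl and DocumentId are hypothetical names. It only replaces the host and keeps the /document/{id} path, whose last segment is the document ID.

```csharp
using System;

static class DocumentLinks
{
    // Host used in the Document API documentation.
    const string DocumentApiHost = "document-api.companieshouse.gov.uk";

    // Rewrites e.g. https://frontend-doc-api.companieshouse.gov.uk/document/{id}
    // to https://document-api.companieshouse.gov.uk/document/{id}.
    public static string ToDocumentApiUrl(string documentMetadataLink)
    {
        var builder = new UriBuilder(documentMetadataLink)
        {
            Scheme = Uri.UriSchemeHttps,
            Host = DocumentApiHost,
            Port = -1 // -1 means "default port for the scheme" (443 for https)
        };
        return builder.Uri.AbsoluteUri;
    }

    // The document ID is the last path segment of the link
    // (not the filing history transaction_id).
    public static string DocumentId(string documentMetadataLink)
    {
        string[] segments = new Uri(documentMetadataLink).Segments;
        return segments[segments.Length - 1].TrimEnd('/');
    }
}
```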

Was this what you were trying to look up?

The document ID in your request, document/MzE1MDgwNzM4NWFkaXF6a2N4, doesn’t seem to be valid. It looks more like a filing-history ID or transaction_id (e.g. MzExNTAyNDk5M2FkaXF6a2N4).

Full example: if I request the filing history for a company (a random example, restricted to one item for simplicity, using curl):

curl -uMY_API_KEY_HERE: "https://api.companieshouse.gov.uk/company/NF004299/filing-history?items_per_page=1"

Response is:
{
    "items_per_page": 1,
    "total_count": 5,
    "items": [
        {
            "links": {
                "document_metadata": "https://frontend-doc-api.companieshouse.gov.uk/document/-VZAbtNU3pwHHac-8_g_eEG6m_aVxu4OgHf-w813038",
                "self": "/company/NF004299/filing-history/MzExNTAyNDk5M2FkaXF6a2N4"
            },
            "transaction_id": "MzExNTAyNDk5M2FkaXF6a2N4",
            … some values snipped …
        }
    ],
    "start_index": 0,
    "filing_history_status": "filing-history-available"
}

For the document metadata, you need to request the URL in the links.document_metadata field.

If you wanted the filing history detail, you’d take the path in “links.self” and append it to “https://api.companieshouse.gov.uk”. (When I last checked, this didn’t give you any information beyond what you already get from the filing history list.)
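Reading those links out of a filing history response (such as the curl output above) can be sketched in C# with System.Text.Json's JsonDocument (.NET Core 3.0 or later), assuming the response shape shown above. FilingItem and FilingHistory.ReadItems are hypothetical names. An item without links.document_metadata is kept with a null link, since it is not assumed here that every filing has a document.

```csharp
using System.Collections.Generic;
using System.Text.Json;

// One row per filing history item (illustrative names).
class FilingItem
{
    public string TransactionId { get; set; }    // filing history reference, NOT a document ID
    public string DocumentMetadata { get; set; } // links.document_metadata, or null if absent
    public string SelfUrl { get; set; }          // API root + links.self
}

static class FilingHistory
{
    const string ApiRoot = "https://api.companieshouse.gov.uk";

    public static List<FilingItem> ReadItems(string filingHistoryJson)
    {
        var result = new List<FilingItem>();
        using var doc = JsonDocument.Parse(filingHistoryJson);
        foreach (JsonElement item in doc.RootElement.GetProperty("items").EnumerateArray())
        {
            var row = new FilingItem();
            if (item.TryGetProperty("transaction_id", out JsonElement id))
                row.TransactionId = id.GetString();
            if (item.TryGetProperty("links", out JsonElement links))
            {
                if (links.TryGetProperty("document_metadata", out JsonElement meta))
                    row.DocumentMetadata = meta.GetString();
                if (links.TryGetProperty("self", out JsonElement self))
                    row.SelfUrl = ApiRoot + self.GetString();
            }
            result.Add(row);
        }
        return result;
    }
}
```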

Further info: your request does return a 404, as you say, and the response body helps a little here. From your example I get back:

{ "error": "Invalid document ID", "type": "ch:service" }
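If you want to show that message in your portal rather than just the 404, the body can be read into a small class. A minimal C# sketch with System.Text.Json, assuming only the two fields shown above; ChError and ErrorBodies are hypothetical names.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

// Shape of the error body above; only these two fields are assumed.
class ChError
{
    [JsonPropertyName("error")]
    public string Error { get; set; }

    [JsonPropertyName("type")]
    public string Type { get; set; }
}

static class ErrorBodies
{
    // Returns null when the body is not JSON (e.g. an HTML error page).
    public static ChError TryParse(string body)
    {
        try
        {
            return JsonSerializer.Deserialize<ChError>(body);
        }
        catch (JsonException)
        {
            return null;
        }
    }
}
```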

deepti_penneru · July 26, 2018, 12:36pm

Thanks for the detailed response. Do we get the file location from the Get Document method response, so that we can download the document?
Please let me know the response parameters of the GetDocument method, which requests http://document-api.companieshouse.gov.uk/document/{id}/content

Thanks and Regards,
Deepti

deepti_penneru · July 26, 2018, 5:50pm

I am getting the Location header now, but when I access the amazonaws URL of the PDF document I get an access denied error.
Please help me out here.

Thanks and Regards,
Deepti

voracityemail · July 27, 2018, 9:09am

Good that you managed to answer your own question, i.e. how to download documents for a given company number:

1. Get the filing history.
2. For a given filing in the list, locate the “links” object and its “document_metadata” member to get the metadata link.
3. Request the metadata link to get the document metadata object.
4. Select your chosen data format (PDF, XML etc.), set the appropriate “Accept” HTTP header, add “/content” to the end of the metadata link, and request this to get the actual file.
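The last two steps, requesting the metadata and then /content, can be sketched in C# as plain HttpRequestMessage objects, assuming System.Net.Http. DocumentRequests and its methods are hypothetical names. The Basic header is the API key as the username with an empty password, which is what the curl -uMY_API_KEY_HERE: call earlier in this thread sends.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;

static class DocumentRequests
{
    // Companies House Basic auth: base64("<api key>:"), i.e. an empty password.
    public static AuthenticationHeaderValue BasicAuth(string apiKey) =>
        new AuthenticationHeaderValue("Basic",
            Convert.ToBase64String(Encoding.UTF8.GetBytes(apiKey + ":")));

    // Request the document metadata object as JSON.
    public static HttpRequestMessage Metadata(string metadataUrl, string apiKey)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, metadataUrl);
        request.Headers.Authorization = BasicAuth(apiKey);
        request.Headers.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
        return request;
    }

    // Ask .../content for one format, e.g. "application/pdf".
    public static HttpRequestMessage Content(string metadataUrl, string apiKey, string contentType)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, metadataUrl.TrimEnd('/') + "/content");
        request.Headers.Authorization = BasicAuth(apiKey);
        request.Headers.Accept.Add(new MediaTypeWithQualityHeaderValue(contentType));
        return request;
    }
}
```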

So you’re now at the last point in the list above.

amazonaws url of the pdf document got access denied error.

Most likely this is because you’re still sending the Companies House HTTP Basic authentication header to Amazon. You need to send this header with each request up to this point, but not with the last request, for the amazonaws URL. This is because:

- Amazon don’t use it (and you shouldn’t send your “passwords” to others). The documentation could be clearer here.
- Amazon do have an authorisation mechanism, but it is done using query string parameters (on the end of the URL). If you also send an HTTP Authorization header, it will confuse Amazon.
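A C# sketch of that last request, assuming System.Net.Http. Redirects are handled explicitly (the HttpClient is assumed to be built with new HttpClientHandler { AllowAutoRedirect = false }), so what gets sent to Amazon doesn't depend on how a given handler treats the Authorization header on automatic redirects. DocumentDownload, FollowLocation and SaveAsync are hypothetical names.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

static class DocumentDownload
{
    // Builds the request for the Location URL returned by .../content.
    // No Authorization header on purpose: the signed query string is
    // Amazon's authorisation, and the CH Basic header must not be sent.
    public static HttpRequestMessage FollowLocation(HttpResponseMessage contentResponse)
    {
        Uri location = contentResponse.Headers.Location
            ?? throw new InvalidOperationException("Response has no Location header");
        // Resolve a relative Location against the original request, if needed.
        if (!location.IsAbsoluteUri && contentResponse.RequestMessage?.RequestUri != null)
            location = new Uri(contentResponse.RequestMessage.RequestUri, location);
        return new HttpRequestMessage(HttpMethod.Get, location);
    }

    // contentRequest carries the CH Basic header; the follow-up request does not.
    public static async Task SaveAsync(HttpClient client, HttpRequestMessage contentRequest, string path)
    {
        using HttpResponseMessage first = await client.SendAsync(contentRequest);
        int status = (int)first.StatusCode;
        if (status < 300 || status >= 400)
            throw new HttpRequestException($"Expected a redirect from /content, got HTTP {status}");

        using HttpRequestMessage second = FollowLocation(first);
        using HttpResponseMessage file = await client.SendAsync(second);
        file.EnsureSuccessStatusCode();

        using FileStream output = File.Create(path);
        await file.Content.CopyToAsync(output);
    }
}
```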

See my response at:

For a general overview of downloading documents see my response at:

And do make use of the search facility on this forum; it’s helped me…

deepti_penneru · July 30, 2018, 1:44am

Hi,

I am trying to get the document types, using the Get Document Metadata call, to check which types are available. The documentation describes a resources.{content_type} parameter (https://developer.companieshouse.gov.uk/document/docs/document/id/documentMetaData-resource.html), but the response looks like the example below:

Fetch a document’s metadata
GET /document/gh438fghd09euthg829/metadata HTTP/1.1
Host: document-api.companieshouse.gov.uk
Accept: application/json
Authorization: Basic bXlfYXBpX2tleTo=

HTTP/1.1 200 OK
Access-Control-Allow-Origin: *
Connection: close
Content-Type: application/json; charset=utf-8

{
    "transaction_id": "",
    "company_number": "01234567",
    "barcode": "X1234567",
    "significant_date": "2012-02-29T00:00:00Z",
    "significant_date_type": "made-up-date",
    "category": "accounts",
    "created_at": "2014-07-17T15:18:35.604259447Z",
    "etag": "",
    "links": {
        "self": "/document/gh438fghd09euthg829",
        "document": "/document/gh438fghd09euthg829/content",
    }
    "resources" : {
        "application/pdf" : {
            "content_length" : 442176,
        },
        "application/xhtml+xml" : {
            "content_length" : 234122,
        }
    }
}

When I deserialise the object, the resources property comes back null. Could you please provide information on this?

Thanks and Regards,
Deepti

voracityemail · July 30, 2018, 9:13am

Two issues here.

Point 1
The example is not correct because the JSON response it lists is not valid JSON (try running it through a formatter / validator). So if you tried this, your deserialise tool will likely fail, and my JSON deserialiser certainly returns null if it can’t process the input. Companies House documentation is extensive but not always current (at the moment), and there are also some errors. The fetch document metadata documentation does indeed contain what you describe:

{
    "transaction_id": "",
    "company_number": "01234567",
    "barcode": "X1234567",
    "significant_date": "2012-02-29T00:00:00Z",
    "significant_date_type": "made-up-date",
    "category": "accounts",
    "created_at": "2014-07-17T15:18:35.604259447Z",
    "etag": "",
    "links": {
        "self": "/document/gh438fghd09euthg829",
        "document": "/document/gh438fghd09euthg829/content",   ← comma here incorrect
    }   ← comma needed here
    "resources" : {
        "application/pdf" : {
            "content_length" : 442176,   ← comma here incorrect
        },
        "application/xhtml+xml" : {
            "content_length" : 234122,   ← comma here incorrect
        }
    }
}

So it should be:

{
    "transaction_id": "",
    "company_number": "01234567",
    "barcode": "X1234567",
    "significant_date": "2012-02-29T00:00:00Z",
    "significant_date_type": "made-up-date",
    "category": "accounts",
    "created_at": "2014-07-17T15:18:35.604259447Z",
    "etag": "",
    "links": {
        "self": "/document/gh438fghd09euthg829",
        "document": "/document/gh438fghd09euthg829/content"
    },
    "resources": {
        "application/pdf": {
            "content_length": 442176
        },
        "application/xhtml+xml": {
            "content_length": 234122
        }
    }
}
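To check a JSON sample before blaming the deserialiser, a strict parser makes a quick validator. A minimal C# sketch with System.Text.Json (part of .NET Core 3.0 and later); JsonCheck.IsValidJson is a hypothetical name. JsonDocument.Parse rejects trailing commas and missing commas by default, which are exactly the faults in the documentation example.

```csharp
using System.Text.Json;

static class JsonCheck
{
    // True if the text parses as JSON under the default (strict) options.
    public static bool IsValidJson(string text)
    {
        try
        {
            using var doc = JsonDocument.Parse(text);
            return true;
        }
        catch (JsonException)
        {
            return false;
        }
    }
}
```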

One for Companies House to fix when they next update their documentation (@MArkWilliams).

Point 2
It wasn’t quite clear (to me) whether you had just tried the example or had tried actual data from the API. If you’ve tried this on actual data from CH and have an issue, the first task is for you to investigate:

- What data you’re getting (including any relevant HTTP status codes and headers).
- What you’re doing with it (any processing).
- That the tool you’re using to process the JSON is working and that you’re using it correctly.

For others to help, they’d probably want a note of that (or assurance that you’d checked).
If any of the following is the case, it might be an issue with the Companies House API (or their docs):

- You get an inappropriate response: you’re confident your request is correct and of what it should return, but you get back something unexpected, e.g. HTTP 404 Not Found or 500 errors (in most cases), HTML instead of JSON, or invalid JSON.
- You get a valid response, but it’s really different from what’s expected (e.g. a different object, main fields of the wrong type etc.).

In this case, it would help if you reported details, e.g. which call you were using and what data you received (but not your API key).

Sounds like you’re nearly there though!

deepti_penneru · July 30, 2018, 12:18pm

Thanks for the quick response. It looks like the null resources come from deserialising the JSON into a class object. I am getting a JSON response like the example above, but the resources object has keys such as “application/pdf”, for which I cannot create an appropriate property name in a C# class.
For example, I used http://json2csharp.com/ to generate the class properties from the JSON response you posted above:
Hope this information helps!

public class Links
{
    public string self { get; set; }
    public string document { get; set; }
}

public class ApplicationPdf
{
    public int content_length { get; set; }
}

public class ApplicationXhtmlXml
{
    public int content_length { get; set; }
}

public class Resources
{
    // json2csharp could not turn "application/pdf" or "application/xhtml+xml"
    // into C# identifiers, so these two lines do not compile as generated.
    public ApplicationPdf __invalid_name__application/pdf { get; set; }
    public ApplicationXhtmlXml __invalid_name__application/xhtml+xml { get; set; }
}

public class RootObject
{
    public string transaction_id { get; set; }
    public string company_number { get; set; }
    public string barcode { get; set; }
    public DateTime significant_date { get; set; }
    public string significant_date_type { get; set; }
    public string category { get; set; }
    public string created_at { get; set; }
    public string etag { get; set; }
    public Links links { get; set; }
    public Resources resources { get; set; }
}

voracityemail · July 30, 2018, 8:36pm

Yes, as it says on the C# class creator site you linked to, not all JSON object names are valid class / member names in C#. You will find the same in many languages:

Naming
The rules for identifiers in C# will not accommodate the sequences of characters that are allowed in JSON. For example, { “0” : “foobar” } is completely valid JSON, but a C# property with identifier “0” will not compile. Further, objects or properties in your JSON may have names that conflict with .NET framework types that may also lead to compiler errors.

If I recall correctly (you can check), CH error objects also have some variable mappings, but I think they’re less likely to include troublesome characters.

I don’t think I can help you further, as I’m not a user of .NET, C# etc. You’ve some implementation choices to make. Maybe now is the time to go back and consider another way to achieve the end result. Or, if you stick with what you’re using, you may be able to find a way to specify or change the mappings of whichever JSON-to-C# deserialiser you’re using, parse these objects yourself, intercept the JSON and re-map the “invalid” names, or … the choice is yours.
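One way to do the re-mapping, sketched with System.Text.Json and assuming the corrected JSON earlier in this thread: map field names with [JsonPropertyName], and declare resources as a Dictionary<string, ResourceInfo> so keys like "application/pdf" become dictionary keys instead of C# identifiers. The class names are hypothetical and only some fields are mapped. If you use Json.NET instead, its [JsonProperty("...")] attribute plays the same role as [JsonPropertyName].

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

class MetadataLinks
{
    [JsonPropertyName("self")]
    public string Self { get; set; }

    [JsonPropertyName("document")]
    public string Document { get; set; }
}

class ResourceInfo
{
    [JsonPropertyName("content_length")]
    public long ContentLength { get; set; }
}

class DocumentMetadata
{
    [JsonPropertyName("company_number")]
    public string CompanyNumber { get; set; }

    [JsonPropertyName("category")]
    public string Category { get; set; }

    [JsonPropertyName("significant_date")]
    public DateTime? SignificantDate { get; set; }

    [JsonPropertyName("links")]
    public MetadataLinks Links { get; set; }

    // Keys such as "application/pdf" become dictionary keys,
    // so they never need to be valid C# identifiers.
    [JsonPropertyName("resources")]
    public Dictionary<string, ResourceInfo> Resources { get; set; }
}

static class MetadataParser
{
    // JSON fields not declared above (barcode, etag, ...) are ignored by default.
    public static DocumentMetadata Parse(string json) =>
        JsonSerializer.Deserialize<DocumentMetadata>(json);
}
```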

deepti_penneru · July 30, 2018, 10:29pm

OK. Thanks for the reply. Which document types, other than “application/pdf”, can the JSON response return?

voracityemail · July 31, 2018, 9:02am

The documentation you should read (although it might be a good idea to start at the beginning to get an idea of the whole) is at:

https://developer.companieshouse.gov.uk/document/docs/document/id/documentMetaData-resource.html

See the resources.{content_type} parameter.
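Since each content type is a key of the resources object, the available types can be listed without a fixed class. A C# sketch with System.Text.Json's JsonDocument, assuming the metadata body is a JSON object; ContentTypes.Available is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Text.Json;

static class ContentTypes
{
    // Returns the keys of "resources" (e.g. "application/pdf") in document order,
    // or an empty list if the metadata has no resources member.
    public static List<string> Available(string metadataJson)
    {
        var types = new List<string>();
        using var doc = JsonDocument.Parse(metadataJson);
        if (doc.RootElement.TryGetProperty("resources", out JsonElement resources))
        {
            foreach (JsonProperty property in resources.EnumerateObject())
                types.Add(property.Name);
        }
        return types;
    }
}
```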

deepti_penneru · July 31, 2018, 3:31pm

Sorry for the trouble. I have now managed to get the content type from the JSON response in C#. Can you please provide a sample company name or number that has documents other than PDF? I would like to test other document types, but couldn’t find a company that has XML, CSV etc.

deepti_penneru · August 1, 2018, 11:28am

@voracityemail Please provide some sample company names that have non-PDF document types in the Get Document Metadata response.

Thanks and Regards,
Deepti

voracityemail · August 1, 2018, 11:55am

It’s a good question, but sorry, I’m not part of Companies House and don’t have lists of these. It would be nice if there were such lists of examples, but you might be able to find them by searching this forum, e.g. I’d look for XBRL or XML.

I did find one (which I’ve checked):
For company 00197009, the filing history item:

https://api.companieshouse.gov.uk/company/00197009/filing-history/MzE5Mzc0OTc3MGFkaXF6a2N4

If you request the metadata from this with:
https://document-api.companieshouse.gov.uk/document/T53BLYf734zxeBWyvna131JtREqLsBgclFME-v6rxI8

…you get:

{
    "company_number": "00197009",
    "links": {
        "self": "https://document-api.companieshouse.gov.uk/document/T53BLYf734zxeBWyvna131JtREqLsBgclFME-v6rxI8",
        "document": "https://document-api.companieshouse.gov.uk/document/T53BLYf734zxeBWyvna131JtREqLsBgclFME-v6rxI8/content"
    },
    "resources": {
        "application/pdf": { "content_length": 26343 },
        "application/xhtml+xml": { "content_length": 20364 }
    }
    … other parts omitted …
}
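With a list of available types like the one above, picking the value for the Accept header could look like this C# sketch. The preference order (XHTML first, then PDF) is only an assumption for illustration, so adjust it to what your portal needs; FormatChoice.Choose is a hypothetical name.

```csharp
using System.Collections.Generic;

static class FormatChoice
{
    // Hypothetical preference order; the first match wins.
    static readonly string[] Preferred = { "application/xhtml+xml", "application/pdf" };

    // Returns the preferred available type, else the first available one,
    // or null if the document lists no resources at all.
    public static string Choose(IList<string> available)
    {
        foreach (string type in Preferred)
            if (available.Contains(type)) return type;
        return available.Count > 0 ? available[0] : null;
    }
}
```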

Unfortunately I can’t help you with understanding or parsing whatever data this returns, as I don’t use it. I’m sure searching the forum will help if you need it…