401 Error: Accessing API through KNIME

Hi CH Devs (and community)

I have been trying to access the Companies House API through KNIME. I have created an app and tested a ‘get company’ query using the website tools, which worked fine. However, when I try to access the API through KNIME, I get a 401 error. Public APIs, such as the World Bank’s, return data fine.

Is this a known problem - or at least a similar issue to something others are having?

Many thanks,
Neil

An addendum: I’ve managed to reach the API with my key using the REST add-in for Firefox, so it’s definitely something about KNIME and the CH API not playing nicely together. KNIME is developed in Java; is that likely to be an issue?

Neil

More: posted on KNIME forum here:

(Since no-one replied) I don’t know or use KNIME, and I don’t use Java to access CH, but since you’ve got a 401 error I suspect the issue is that the user authentication isn’t being sent, the fields are being sent incorrectly, or the HTTP authentication header is not right. The link you posted above asks me for my KNIME login to read it, so I haven’t.

I don’t see why using Java would come into it, unless you’ve worked out that the Java compiler and/or runtime version has a bug? In which case tell Oracle too.

The only other comms. issue I’ve come across is the certificate chain not matching those supplied by CH - but that should give you a different error.

(From a very cursory look at the KNIME docs):
Presumably you’re using the IO → File handling → Remote → Connections → Remote connection node to get the data (e.g. as in https://www.knime.org/files/node-documentation/org.knime.base.filehandling.remote.connectioninformation.node.https.HTTPSConnectionInformationNodeFactory.html)? If so, that seems to allow authentication, but the docs don’t go into exactly how it does that, so I’d check the HTTP header that’s getting sent. It should be as per the API documentation, i.e. the API key CH supply as the username and a blank password.
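If you want to see exactly what KNIME is sending, one option is to point the node at a throwaway local server that prints the request headers. A minimal sketch using only Python’s standard library; the port, the names and the idea of echoing headers back as JSON are my own choices, nothing KNIME- or CH-specific:

```python
import http.server
import json

# Headers received by the server, newest last (handy when poking at it interactively).
captured_headers = []


class EchoHeadersHandler(http.server.BaseHTTPRequestHandler):
    """Print every request header and echo them back to the client as JSON."""

    def do_GET(self):
        headers = dict(self.headers.items())
        captured_headers.append(headers)
        print(f"{self.command} {self.path}")
        for name, value in headers.items():
            print(f"  {name}: {value}")
        body = json.dumps(headers, indent=2).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default access log; the headers are printed above


def serve(port=8000):
    """Listen on localhost until interrupted with Ctrl+C."""
    with http.server.HTTPServer(("127.0.0.1", port), EchoHeadersHandler) as server:
        server.serve_forever()
```

Run `serve()`, set the node’s URL to something like http://127.0.0.1:8000/probe, and any Authorization line it sends shows up in the console. It prints the header in clear, so don’t leave it running with your real key.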

(You might even be able to get KNIME to report on this, as it’s got an IO → File handling → Remote → URI → Extract URI info node.)

It looks (from my 10 minute scan) as if KNIME should be capable of communicating with CH. It even looks like there’s a node for an Amazon S3 Connection so you may be able to download document images / additional financial data should you wish.

And finally - and apologies if you gave detailed info in your link above: http://testthewebforward.org/docs/bugs.html

Good luck.
Chris

Hi Chris

Thanks for the response! I was starting to despair. The KNIME forum link should be public; it basically details what I’ve set out above, but with a bit more detail regarding KNIME. I will check the status of that.

I’m not using the remote connection node. KNIME has a set of dedicated REST nodes, which allow for a variety of authentication methods, including Basic. I’m using the GET node (description) and following the instructions in the CH developer guide. No dice with CH, but I can access the World Bank, TfL etc. with it. I currently suspect something to do with Base64 encoding, which has popped up on a couple of threads… but I tried doing that manually with the web encoder, and that failed with a 401 as well.
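For what it’s worth, one easy way to get a 401 with hand-encoded credentials is to encode the key on its own. Basic auth encodes `username:password`, and with CH the password is blank, so the string to encode is the key followed by a colon. Whether that’s what went wrong with the web encoder here is an assumption (the thread doesn’t say), but the difference is easy to see in a short Python sketch; `my-api-key` is a made-up placeholder:

```python
import base64


def basic_token(username: str, password: str = "") -> str:
    """Base64-encode 'username:password' the way HTTP Basic auth expects (RFC 7617)."""
    return base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")


api_key = "my-api-key"  # hypothetical placeholder, not a real CH key

# Encoding just the key drops the ':' separator, so the server sees no username/password pair.
without_colon = base64.b64encode(api_key.encode("utf-8")).decode("ascii")
# Encoding 'key:' gives the username plus an empty password, as CH expects.
with_colon = basic_token(api_key)

print("key only :", without_colon)
print("key + ':':", with_colon)
print("header   : Authorization: Basic " + with_colon)
```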

However, something you’ve said has intrigued me: the Amazon S3 connection. KNIME does have a native Amazon S3 connection (and Redshift and Athena ones), and financial data would be something I’m interested in. Do you know of any docs on this? I’ll start searching the CH dev site, but I’m still feeling my way here, so any and all advice gratefully received!

That would be a perfectly acceptable solution…

Neil

I shouldn’t be meddling here (don’t use KNIME) but since I started…

For some reason your link to the GET documentation above displays as a blank link (an “a” tag without any “href” in the HTML) so I can’t check that either.

I see that new REST nodes which came in with KNIME 3.2 do have a way of passing authentication as you say, along with custom http headers. Couldn’t find main docs on the REST node but found info at:
http://www.dataminingreporting.com/blog/the-new-rest-nodes-get-request

As long as you’re setting it to use Basic authentication with the credentials as per CH, it should be fine.

Only other suggestions from me (you’ve probably covered) are:

  • Check the username works with another tool e.g. use cURL - you’ve done this via browser.

  • Try the (deprecated by RFC 3986) approach of prepending the username and password to the URL. For example, I just requested the following URL (via cURL on the command line) and got the expected JSON response:

    https:// {my CH ID}:@api.companieshouse.gov.uk/search/companies?q=barclays

(Don’t have a space after “//” that’s just because otherwise this forum makes it a link!)

I wouldn’t rely on this instead of getting the normal authentication working for critical applications. Security info about this method in case you’re interested.

  • Check KNIME isn’t getting confused by the username/password fields of Basic authentication, e.g. there is no “password” for CH; check the output from KNIME if you can. According to e.g. Wikipedia or (in detail) RFC 7617, the username and password string ("{youruserid}:") is Base64-encoded and the Authorization header will appear as:

    Authorization: Basic {base64-encoded-username-password}
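To check what KNIME (or anything else) actually sent, you can decode a captured header back into its parts, and compare it with the header that the user:password@ URL form above implies. A small Python sketch; `decode_basic` and `header_from_url` are just illustrative names:

```python
import base64
import binascii
from urllib.parse import urlsplit


def decode_basic(header_value: str) -> tuple[str, str]:
    """Split a 'Basic <token>' Authorization value back into (username, password)."""
    scheme, _, token = header_value.strip().partition(" ")
    if scheme.lower() != "basic":
        raise ValueError(f"not Basic auth: {scheme!r}")
    try:
        decoded = base64.b64decode(token.strip(), validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError) as exc:
        raise ValueError("token is not valid Base64 text") from exc
    username, sep, password = decoded.partition(":")
    if not sep:
        raise ValueError("decoded token has no ':' separator")
    return username, password


def header_from_url(url: str) -> str:
    """Build the Basic header value implied by a https://user:password@host/... URL."""
    parts = urlsplit(url)
    userinfo = f"{parts.username or ''}:{parts.password or ''}"
    return "Basic " + base64.b64encode(userinfo.encode("utf-8")).decode("ascii")
```

For a correct CH header, `decode_basic(...)` should return `(your key, '')`. If the second element isn’t empty, or it raises because there’s no colon, the header isn’t what CH expects.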

I’ve used a couple of different ways of accessing the CH REST services (e.g. curl on command line, the PHP cURL extension etc.). I have experienced some issues but it turned out they were all at my end of the pipe.

Financial stuff: this is available now for companies that have filed it, see

It looks interesting but you’ll likely need some heavyweight financial analysis software to make sense of the data:

KNIME looks interesting - maybe I’ll check it out in my spare time…

A final link:

https://developer.mozilla.org/en-US/docs/Web/HTTP/Authentication

…good general info and notes at the bottom that the username:password@www.site… is blocked in several browsers etc.

Good luck and do share (to the main thread, not me) how you got it working.

Hello - answer on KNIME forum here: https://tech.knime.org/forum/knime-labs-general/cannot-connect-knime-to-companies-house-via-rest-api-get-401-error

The basic issue is that the ‘authentication’ tab of the KNIME node does not work with the CH API.

Instead, you need to create a custom request header in the node itself.

This should have a header title of ‘Authorization’ (note the US spelling).

It should have a header value of: Basic [your API key followed by a colon (:), encoded in Base64]

It should be a constant.
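For comparison, roughly the same request outside KNIME, as a sketch in Python’s standard library. The host and the /search/companies path come from the example URL earlier in this thread (the live host may differ now); the function names are hypothetical:

```python
import base64
import urllib.parse
import urllib.request

API_ROOT = "https://api.companieshouse.gov.uk"  # host from the example URL in this thread


def ch_authorization_value(api_key: str) -> str:
    """The constant value for the custom 'Authorization' header: 'Basic ' + Base64('<key>:')."""
    token = base64.b64encode(f"{api_key}:".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def company_search_request(api_key: str, query: str) -> urllib.request.Request:
    """Build (but do not send) a company search GET with the header set by hand."""
    url = f"{API_ROOT}/search/companies?{urllib.parse.urlencode({'q': query})}"
    return urllib.request.Request(
        url,
        headers={"Authorization": ch_authorization_value(api_key)},
        method="GET",
    )
```

`urllib.request.urlopen(company_search_request(key, "barclays"))` would then send it. As with the KNIME fix, the header is set explicitly rather than through any built-in authentication option.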

Example workflow attached to the KNIME forum post. I can confirm it returns values from a company search; I am doing more tests to confirm full capability.

Neil

Hi Chris

Thanks for all your help. I’ve had an answer on the KNIME forum which has solved the issue, and the API now works for me. Basically, ignore the node’s built-in authorisation and put in a custom header with the Basic prefix and the Base64-encoded API key (plus ‘:’). I’ve got it to return searches for companies; I just need to test more. But as stated, thank you, I really appreciate the time you’ve put in to help me.

You’ve also got me really interested in XBRL - I’m curious as to how it works and fits together. I take your point about a suitable program, but am I right in thinking that at its heart, it’s a particularly complex xml schema?

Reason I ask is that KNIME is capable of converting XML into a column-table format through a specific node, and you can specify the schema in an input node. It therefore seems suitable for pulling together data across several companies and turning it into an analytical format. If not, there are JSON parsers that can deal with complex JSON and turn it into usable data. So it strikes me that KNIME should be able to step into that role as well.
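As a rough illustration of the ‘XML into a table’ idea: in an XBRL instance each reported fact is an element carrying a contextRef attribute (plus unitRef/decimals for numbers), so a first pass can simply flatten those into rows. This is a simplified Python sketch under that assumption; it ignores the context and unit definitions themselves, scaling and sign attributes in inline XBRL, and XHTML entities that a real filing might contain:

```python
import xml.etree.ElementTree as ET


def xbrl_facts(document: bytes) -> list[dict]:
    """Flatten every XBRL fact (any element with a contextRef attribute) into a table row."""
    root = ET.fromstring(document)
    rows = []
    for el in root.iter():
        context = el.get("contextRef")
        if context is None:
            continue  # contexts, units, schema refs and layout markup are not facts
        if el.tag.startswith("{"):
            namespace, _, local = el.tag[1:].partition("}")
        else:
            namespace, local = "", el.tag
        rows.append({
            "namespace": namespace,
            # Inline XBRL (ix:nonFraction etc.) names the concept in a 'name' attribute;
            # in a plain XBRL instance the element name itself is the concept.
            "concept": el.get("name", local),
            "context": context,
            "unit": el.get("unitRef"),
            "decimals": el.get("decimals"),
            "value": "".join(el.itertext()).strip(),
        })
    return rows
```

Each dict is one row, so a list of them maps directly onto a column table (or a CSV) for comparing the same concept across companies.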

I’d urge you to look into KNIME. Think of it as a similar product to Alteryx, only open source and cross-platform, with all the strengths and weaknesses that entails. It does more than Alteryx, including more interesting and cutting-edge stuff, and can be used on Mac or Linux; however, the learning curve is steeper, and more hacking and programming skills are required to get it going. Alteryx is far more seamless, but is Windows-only and doesn’t go into some of the more advanced stuff that KNIME can do.

I like KNIME - I think it’s brilliant.

Thanks for all your help,
Neil

Good to hear, that should help anyone else using KNIME.

Maybe there should be a way of flagging threads here for a kind of FAQ; the article you mentioned at Getting invalid authorization should be on it if so!

I can’t help with XBRL. Yes, it’s XML so readers should be able to parse it. I haven’t troubled with it because the financial data are rather complex and translating that into simple categories is outside my remit!

If you’re interested, short examples can be found at https://beta.companieshouse.gov.uk/company/SC229764/filing-history; there are a couple of short filings there, e.g. https://beta.companieshouse.gov.uk/company/SC229764/filing-history/MzE1MDYxMzMwNWFkaXF6a2N4/document?format=xhtml&download=1

The text on Beta says “Download iXBRL” - the document itself is just XBRL (XML) with a processing instruction for a stylesheet which will transform to XHTML. This works quite neatly using an XSLT processor. FYI when I tried in browser (Firefox, current) I get an error that it can’t load the stylesheet.

Note that for downloads, as of now [2017-07], when you use the fetch a document request you’ll be sent an HTTP redirect response (302 here) which points to the document location on Amazon (S3). KNIME may handle this for you transparently (as happens if you make the request e.g. via an AJAX call); I’m just mentioning it because I had to explicitly catch the location returned and then request it from Amazon.
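For anyone doing the redirect step by hand, here’s a sketch of that two-request pattern with Python’s urllib: stop the client following the 302, read the Location header, then fetch that URL. The content URL you pass in is whatever the fetch a document request is in the CH docs (not reproduced here). Not forwarding the CH Authorization header to the second host is my assumption that the S3 link carries its own access, so check that against the docs:

```python
import base64
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Refuse to follow 3xx responses so the caller can read Location itself."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError carrying the 3xx response


def document_location(content_url: str, api_key: str) -> str:
    """Ask CH for a document and return the redirect target (e.g. an Amazon S3 URL)."""
    token = base64.b64encode(f"{api_key}:".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(content_url, headers={"Authorization": f"Basic {token}"})
    opener = urllib.request.build_opener(NoRedirect)  # replaces the default redirect handler
    try:
        with opener.open(request, timeout=30) as response:
            raise RuntimeError(f"expected a redirect, got HTTP {response.status}")
    except urllib.error.HTTPError as err:
        location = err.headers.get("Location")
        if err.code in (301, 302, 303, 307, 308) and location:
            return location
        raise


def fetch_document(content_url: str, api_key: str) -> bytes:
    """Two-step download: get the redirect from CH, then a plain GET to that location."""
    location = document_location(content_url, api_key)
    # The CH Authorization header is deliberately not sent to the second host.
    with urllib.request.urlopen(location, timeout=60) as response:
        return response.read()
```

Splitting out `document_location` makes it easy to log or inspect the Amazon URL before downloading, which is handy when debugging exactly this step.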