Locating Timepoints for data snapshots

Hi,

I’m getting to grips with the Streaming API. The documentation states that every snapshot comes with a timepoint (“A data snapshot also comes with a timepoint, which gives you the point at which the snapshot was taken.”) that can be used to continue the data from that point, but I cannot locate it. Where exactly can I find it?

Thank you!

Unfortunately those snapshots are not yet in place. It is still our intention to develop them and the current thinking is that we will form a project to do this next financial year, with Discovery potentially starting in April.

Hi Nathan
How did you get on with the streaming API? I don’t have any valid timepoints; all I get is a ‘416 requested range not satisfiable’.

Hi Jay,

The issue here is that the timepoint you are providing is probably out of range. I think they only cover roughly the last 5 days’ worth of data, so if you ask for anything outside this range you will not get a full response.

As a word of warning, the timepoints are simply a count of the documents pushed to the API. They are not, say, Unix timestamps, which initially confused me.

I recommend sending a default request without a timepoint first and looking at the response, just to get your head around it, then going from there.
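
For what it’s worth, that fallback could be sketched in Python roughly like this. It assumes the out-of-range case surfaces as an HTTP 416 error, as reported earlier in the thread; `open_with_fallback` and the `open_stream` callable are hypothetical names for illustration, not part of any Companies House library.

```python
import urllib.error
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def open_with_fallback(
    open_stream: Callable[[Optional[int]], T],
    timepoint: Optional[int],
) -> T:
    """Try the saved timepoint first; on HTTP 416, retry without one.

    open_stream is a hypothetical callable that opens the stream for a
    given timepoint (or None) and raises urllib.error.HTTPError on a
    non-2xx response, as urllib.request.urlopen does.
    """
    try:
        return open_stream(timepoint)
    except urllib.error.HTTPError as err:
        # 416 is taken here to mean "timepoint out of range"; any other
        # error is re-raised untouched.
        if err.code == 416 and timepoint is not None:
            return open_stream(None)
        raise
```

Falling back to no timepoint means any events between the stale timepoint and the new connection are not replayed, so this only suits cases where a gap is acceptable.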

Hi Nathan
Many thanks for the response. I’ve just sent a request without any parameters through Postman, but I suspect it will be too much data: https://stream.companieshouse.gov.uk/companies?

I’m very new to this and trying to get my head around it. Can you use a company number as a parameter on the streaming API?

Thanks
Jay

If you do not add (the optional) timepoint, then you’ll get a stream starting when you connected.
If you are looking to query specific company numbers, then the streaming API is not the one you require. And yes, there is a LOT of data, and it does not include all companies, just the companies that are making changes, filing accounts, etc. If you keep the data, then after about 18 months you’ll have a full dataset of all active companies.
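
As a small illustration of the optional timepoint, here is a Python sketch that builds the request URL with or without it. The base URL is the one quoted above; the query parameter name `timepoint` and the helper name `build_stream_url` are assumptions for the example.

```python
from typing import Optional
from urllib.parse import urlencode

# URL as quoted earlier in the thread.
COMPANIES_STREAM = "https://stream.companieshouse.gov.uk/companies"


def build_stream_url(
    base: str = COMPANIES_STREAM,
    timepoint: Optional[int] = None,
) -> str:
    """Return the stream URL, adding the timepoint only when one is given."""
    if timepoint is None:
        return base
    # Assumption: the parameter is called "timepoint".
    return f"{base}?{urlencode({'timepoint': timepoint})}"


print(build_stream_url())
print(build_stream_url(timepoint=123456))
```

There is deliberately no company-number argument, since the stream cannot be filtered that way.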

Hi Phil
Many thanks for the reply, this is very good information that I struggled to find documentation on.
Could you please give me the optional timepoint value, as I really don’t know where to find a usable timepoint?
Thank you
J

Hi Jay,

So sorry I didn’t reply. I went to and then closed the tab and forgot, sorry!

I see above you are using Postman. I must warn you that this won’t print the data out; Postman can’t seem to handle the stream…
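
A stream like this never finishes, so it needs a client that reads it a line at a time. Here is a rough Python sketch of that idea. It assumes the stream is newline-delimited JSON with occasional blank keep-alive lines, and that the streaming key is sent as the HTTP Basic username with an empty password. Neither is confirmed in this thread, and `open_stream` and `iter_events` are hypothetical names.

```python
import base64
import json
import urllib.request
from typing import Any, Dict, Iterable, Iterator


def open_stream(url: str, api_key: str):
    """Open the long-running HTTP response; iterating it yields raw lines."""
    # Assumption: key as Basic-auth username, empty password.
    token = base64.b64encode(f"{api_key}:".encode("ascii")).decode("ascii")
    request = urllib.request.Request(
        url, headers={"Authorization": f"Basic {token}"}
    )
    return urllib.request.urlopen(request)


def iter_events(lines: Iterable[bytes]) -> Iterator[Dict[str, Any]]:
    """Parse one JSON document per non-empty line, skipping blank lines."""
    for raw in lines:
        line = raw.strip()
        if not line:
            # Blank line: treated as a keep-alive, nothing to parse.
            continue
        yield json.loads(line)
```

With a real key, `for event in iter_events(open_stream(url, key)): print(event)` would print each event as it arrives rather than waiting for the response to end.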

I don’t have a valid timepoint to hand right now, but it is optional, so you could just attempt without…

If you’d like, I’d be happy to provide an early prototype that I made for the streaming API. It’s written in Node.js using TypeScript; would that be of any use to you? Let me know and I can put it on my GitHub and send you a link. It may help you get an idea of how the data looks.

Thanks,
Nathan

Hi Nathan
Yes, that would be brilliant, thank you. At this point it’s really to get a feel for the data’s potential.
It’s for a larger project I’m working on, trying to gather as much information as I can on companies’ performance against potential weird and wonderful causal factors.
So if you also know of any other off-the-beaten-track data sources, please let me know.
Thanks
J

Hi J,

Here is the link: GitHub - naathanbrown/StreamingAPI-Prototype

Please forgive the standard of the code and general documentation, this was a really simple prototype I made months ago before I wrote something more official (which I cannot share, unfortunately).

Assuming you are familiar with Node and TypeScript, this should be easy for you to set up and use; it simply prints the data to the console. It makes use of Node streams, which I recommend reading up on if you end up using something similar.

As for other data sources… not really? I will warn you, too, that every experience I have had with CH data, from the bulk data to the API to this streaming API, has been a total pain. So I wish you the best of luck. Any questions about the code, just give me a shout and I’ll answer as best I can. I hope it is of some use to you.

Oh, a final note: in the file with the function that calls the API with a fetch, you need to add a streaming API key. I’m pretty sure this is different to the regular document one too.

Good luck!
Nathan

Hi Nathan
No, I’m not familiar with Node and TypeScript at all, but I know the basics of JavaScript, if it’s at all similar.
Should it run OK in Visual Studio? And yes, I do have my own API keys.
If not, do you maybe have a screenshot of the typical output?
Sorry for being a pain, I’m very new to all this
Thanks
J

The timepoint is provided in each payload that you get from parsing the JSON response. Each stream has its own timepoints, and the idea is that you keep the last timepoint you saw before being disconnected (CH will disconnect each stream after 24 hours). When you reconnect, you can then add that timepoint to your request.
NOTE: When no timepoint is included, the stream will start from an undocumented timepoint.
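
A minimal Python sketch of keeping the last timepoint seen, so it is available when the connection drops. It assumes each parsed payload carries the timepoint under `event["event"]["timepoint"]`; that field path, and the names `extract_timepoint` and `TimepointTracker`, are assumptions for illustration and should be checked against a real payload.

```python
from typing import Any, Dict, Iterable, Iterator, Optional


def extract_timepoint(event: Dict[str, Any]) -> Optional[int]:
    """Return the payload's timepoint, or None if it is not present."""
    # Assumption: the timepoint lives under event["event"]["timepoint"].
    meta = event.get("event")
    if isinstance(meta, dict) and isinstance(meta.get("timepoint"), int):
        return meta["timepoint"]
    return None


class TimepointTracker:
    """Remembers the most recent timepoint while events pass through."""

    def __init__(self, start: Optional[int] = None) -> None:
        self.last = start

    def watch(self, events: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
        for event in events:
            timepoint = extract_timepoint(event)
            if timepoint is not None:
                self.last = timepoint
            yield event


tracker = TimepointTracker()
sample = [{"event": {"timepoint": 10}}, {"data": {}}, {"event": {"timepoint": 11}}]
for _ in tracker.watch(sample):
    pass
print(tracker.last)
```

After a disconnect, `tracker.last` is the value to send with the next request. Since each stream has its own timepoints, one tracker per stream would be needed, and the value should be written somewhere durable if the process itself may restart.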