How can I update my dataset with the streaming API?

Hi there,
I currently have a big dataset with 1,034,000 rows that contains the charges of the companies I have chosen.
The attributes are: ChargeCode, CompanyName, CompanyNumber, DeliveredOn, Has_insolvency_history, accounts_overdue, Persons entitled, Director1, Director2, Director3, Director4.
I have basically created this dataset with a combination of the bulk datasets available and the REST API by calling 3 endpoints (company, company/charges and officers).

I am very interested in updating my table automatically, so that I don’t have to update it manually every week or every month.
I don’t have experience with streaming APIs. Can anyone help me?

Thank you so much for your interest

Look at the getting started guide for the Streaming API:
There are links on the left to the specifications of the endpoints that you’ll need to subscribe to, i.e.
Streaming API: Basic company information stream
Streaming API: Charges stream
Streaming API: Officers stream

Thank you so much for your answer.
I have tried to understand how it works, but it’s still unclear to me.
Basically, I have a snapshot of the data I described previously, in CSV format.
How should I proceed to update this CSV? Should I download some software?
How can I write the information from the streaming API to a separate CSV, and which software should I use?

I am very proficient in Python and the libraries related to REST APIs, but it seems that Python is not appropriate for streaming APIs…
Do you have a video or anything else I can watch that would help me?

The first link I gave shows you how to connect to the streaming API endpoints. You need to connect to the endpoints and parse the JSON payloads then use that data to update your database, which you have preloaded with the downloaded CSV data.
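For anyone following along in Python, the "preload the CSV, then apply the payloads" idea can be sketched roughly as below. It is only an illustration: the column names come from the first post, while the payload fields (`resource_uri`, `data.charge_code`, `data.delivered_on`) and the URI layout are assumptions that should be checked against the Charges stream specification. `load_rows`, `apply_event` and `write_rows` are hypothetical helper names.

```python
import csv

# Assumption: a row is identified by company number plus charge code.
KEY_FIELDS = ("CompanyNumber", "ChargeCode")


def load_rows(csv_file):
    """Index the snapshot rows by (CompanyNumber, ChargeCode)."""
    return {tuple(row[k] for k in KEY_FIELDS): row for row in csv.DictReader(csv_file)}


def apply_event(rows, event):
    """Insert or update one row from a parsed charges event (field names are assumed)."""
    data = event.get("data", {})
    # Assumed URI shape: /company/<company_number>/charges/<charge_id>
    company_number = event["resource_uri"].split("/")[2]
    charge_code = data.get("charge_code", "")
    row = rows.setdefault((company_number, charge_code), {"CompanyNumber": company_number})
    row["ChargeCode"] = charge_code
    row["DeliveredOn"] = data.get("delivered_on", "")
    return row


def write_rows(rows, csv_file, fieldnames):
    """Write the table back out; columns missing from a row are left empty."""
    writer = csv.DictWriter(csv_file, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows.values())
```

Because the update is keyed, applying the same event twice leaves the table unchanged, which helps if events are ever re-delivered. With a million rows, a real database table with the same key would probably be more comfortable than rewriting a CSV.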

Thank you for your reply.
My question is: how can I get the data from the streaming API, and which software should I use?
I have used a Jupyter notebook for that, and it only shows a chunk that keeps running.
Does it mean that I have to keep the chunk running indefinitely?

Just write your code. You simply need to connect a socket then send a request with your authentication code then listen. This can be done in any programming language.
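In Python, "connect, authenticate, listen" might look roughly like the sketch below, using only the standard library. It assumes HTTP Basic authentication with the key as the username and an empty password, one JSON object per line, and blank lines acting as keep-alives; please check those points against the documentation. `parse_events`, `open_stream` and `listen` are made-up names.

```python
import base64
import json
import urllib.request


def parse_events(lines):
    """Yield one dict per non-empty line; blank lines are skipped as keep-alives."""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        yield json.loads(line)


def open_stream(url, api_key):
    """Open a long-lived HTTP connection with a Basic auth header (key as username, empty password)."""
    token = base64.b64encode(f"{api_key}:".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    return urllib.request.urlopen(request)


def listen(url, api_key, handle):
    """Call handle(event) for every event until the connection closes."""
    with open_stream(url, api_key) as response:
        # Iterating the response reads it line by line as data arrives.
        for event in parse_events(response):
            handle(event)
```

Calling `listen("https://stream.companieshouse.gov.uk/charges", api_key, print)` would then print each event as it arrives. The call blocks for as long as the connection stays open, which is the expected behaviour for a stream rather than a sign that something is stuck.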

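import json
import requests

api_key = "your-api-key"  # placeholder: the real key is not shown in this post
url_full2 = "https://stream.companieshouse.gov.uk/charges"
# Without stream=True, requests waits for the complete body before returning.
# A stream never ends, so this call keeps running and nothing below it executes.
response2 = requests.get(url_full2, auth=(api_key, ""))
json_search_result2 = response2.text
print(json.loads(json_search_result2))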

This is what I wrote. I can get an answer with the REST API, but only a running chunk with the streaming API…

Here’s a link to the authentication page (linked from the first link I provided): Streaming API: (company-information.service.gov.uk)
If you need help with your Python code, I am afraid I cannot help, as I use C#.

I have read the above link many times, but it doesn’t seem to work like the REST API…
Unless there have been no new charges in the last hour. Do you think this might be the reason?
If this is the case, should I leave my computer always turned on and running? That doesn’t make sense to me. I am sure there is some software that collects the information even when the computer is turned off.

Funny you say "unless there have been no new charges in the last hour". My logs of a live app show there was no data for the filing history API between 11am and 1pm today! Very strange, probably an outage of some sort (they also batch process at times). It has been showing activity since 1pm, though.

If you have C# code that has failed to connect, I can look over it; however, I do not think this is the place for a coding tutorial.

I don’t know how streaming APIs work. Do I have to run my code on a server

and wait until I receive changes?

You can run it anywhere you want, on a server or on your PC. Just remember that you only get data when you are connected. There is latitude in ‘rewinding’ the stream by using a timepoint, but I suppose that can wait till you get a connection going.
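As a rough illustration of the "rewinding" part, the sketch below builds the URL to reconnect to. It assumes the stream accepts a `timepoint` query parameter, which should be confirmed in the endpoint specification; `stream_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def stream_url(base_url, timepoint=None):
    """Return the stream URL, adding ?timepoint=... when resuming from a known position."""
    if timepoint is None:
        return base_url  # no rewind: start from the live position
    return f"{base_url}?{urlencode({'timepoint': timepoint})}"
```

The idea is to remember the last timepoint you processed and pass it in when you reconnect after being offline. Whether the event at that exact timepoint is sent again is worth checking, so it is safer if your updates can be applied twice without harm.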

That would be good to see. Do you have a timepoint I can use, maybe from before 11am today, so I can see something? Just to check what the response looks like.

You get your timepoint from the payloads, and you need to listen on the stream to get your payloads. You’ll have to parse each payload in order to extract its timepoint - but you need to connect first!
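In Python, pulling the timepoint out of a payload and keeping it for later could look something like this. It assumes each payload is a JSON object with an `event` object containing an integer `timepoint`; the function names and the idea of storing it in a small text file are only illustrative.

```python
import json
from pathlib import Path


def extract_timepoint(line):
    """Parse one JSON payload and return event.timepoint, or None if it is absent."""
    payload = json.loads(line)
    event = payload.get("event") or {}
    return event.get("timepoint")


def save_timepoint(path, timepoint):
    """Remember the last processed timepoint so a later run can resume from it."""
    Path(path).write_text(str(timepoint))


def load_timepoint(path):
    """Return the saved timepoint, or None on the first run."""
    file = Path(path)
    return int(file.read_text()) if file.exists() else None
```

Saving the timepoint only after the corresponding update has been written means a crash in between costs you a repeated event at worst, not a missed one.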

PS: Search the forum - it has a lot of discussions on timepoint use

We did have a delay processing updates this morning which would have resulted in ‘pauses’ in data going out onto the streams. That was all sorted and caught up by 16:00hrs today.
Apologies.
