Can anyone tell me the capacity we would need to hold the bulk data files for the streaming APIs:
Filing History

Thank you.

For what timeframe? The ‘stream’ API is never-ending. A bulk snapshot can be anything from 200MB to 1.2GB depending on the type.

An example of what is meant in the original question:
If we start streaming at midnight tonight, we’ll only get new information for filing histories. We’ll need all data for all companies from when records began before midnight tonight to save into our database to have a complete picture. How big would this be for filing history etc?

We’d think it could be a couple hundred gigs at least but would like to be more certain.


There is currently no snapshot data for the streaming APIs. The best available is the bulk data downloads, which include data up to the end of the month. You can check the sizes of the bulk data downloads on the relevant site.
PS: the zipped sizes DO NOT come to several hundred gigs as you thought. Unzipped is a different matter, but the compression ratio would still have to be massive for it to amount to that!
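
To put a rough number on "massive", here is a quick back-of-the-envelope sketch. It assumes the 200MB-1.2GB range quoted earlier in the thread refers to zipped sizes, and it takes "a couple hundred gigs" to mean 200 GB in decimal units. Both are assumptions for illustration, not figures from Companies House.

```python
def required_ratio(unzipped_mb, zipped_mb):
    """Compression ratio needed for a zipped file to expand to unzipped_mb."""
    return unzipped_mb / zipped_mb


# Assumption: "a couple hundred gigs" read as 200 GB (decimal units).
TARGET_MB = 200 * 1000

# Assumption: the 200MB-1.2GB range from the thread, treated as zipped sizes.
for zipped_mb in (200, 1200):
    ratio = required_ratio(TARGET_MB, zipped_mb)
    print(f"{zipped_mb} MB zipped would need about {ratio:.0f}:1")
```

Under those assumptions, even the largest file would need a ratio well over 100:1 to reach that size.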


Is there not an older bulk data download for filing history, though? A filing history stream that only has data from, for example, today onwards, with no historical data, is pretty limited, certainly if you want to create filing history documents for hundreds of companies (without having to break your rate limit via the API to do it).


Here’s the link to the CH bulk data products page: Companies House data products - GOV.UK (www.gov.uk)

But to answer your question, NO, there is no bulk data product for filing history. As mentioned, CH had, before the pandemic, plans to provide streaming API snapshots, but that has not happened yet (cannot comment as I am not a CH employee).

By listening to the filing history streaming API, you can update your company data database with the data in the payload (in conjunction with other streaming APIs, depending on the fields you are interested in).
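
As a rough illustration of that update loop, here is a minimal sketch that upserts streamed events into a local store. It assumes events arrive as one JSON object per line, with blank lines in between as keep-alives. The `resource_id` and `data` field names, the `apply_events` helper, and the sample payloads are assumptions for illustration, so check the actual payload before relying on them.

```python
import json


def apply_events(lines, store):
    """Upsert each streamed event into store; return how many were applied."""
    applied = 0
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank keep-alive lines
        event = json.loads(line)
        # Assumed field names: "resource_id" as the key, "data" as the record.
        store[event["resource_id"]] = event["data"]
        applied += 1
    return applied


# Made-up sample events standing in for lines read from the stream.
sample = [
    '{"resource_id": "abc", "data": {"type": "AA", "date": "2024-01-31"}}',
    "",
    '{"resource_id": "abc", "data": {"type": "AA", "date": "2024-02-01"}}',
    '{"resource_id": "xyz", "data": {"type": "CS01", "date": "2024-02-02"}}',
]
store = {}
count = apply_events(sample, store)
```

A plain dict stands in for the database here. A later event for the same key overwrites the earlier one, which is the behaviour you would want from an upsert in a real database.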

As for the timing, you’ll have to set up your database to be ready for the start of the month in order to maintain a complete database. I do not think there will be a workaround for this, even when the streaming API snapshots are finally available.

@phillip - what do you mean by ‘Streaming API snapshots’? Downloadable zip files of past streamed events?

: (company-information.service.gov.uk)

