Limits & Bulk Downloads

Hello there,

We need to get bulk filing history data for each of the 3 million live companies on the Companies House register. How would we go about receiving this? I can only think of two solutions for this:

1 - Automatically retrieve each company’s filing history via the API, which can be time-consuming and, due to rate limits, may not be possible.

2 - Receive a bulk data feed directly from Companies House.
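For a rough sense of how long option 1 would take, here is a back-of-the-envelope sketch. The rate limit used (600 requests per 5-minute window) is an assumption, not something stated in this thread, so substitute whatever limit applies to your own API key. It also assumes one request per company, which understates the total for companies whose filing history spans several pages.

```python
# Rough crawl-time estimate for fetching one resource per company.
COMPANIES = 3_000_000        # live companies, figure taken from the post above
REQUESTS_PER_WINDOW = 600    # ASSUMED rate limit; check the limit on your key
WINDOW_SECONDS = 300         # ASSUMED window length (5 minutes)


def crawl_days(companies, requests_per_company=1):
    """Days needed to crawl `companies` at the assumed steady request rate."""
    total_requests = companies * requests_per_company
    requests_per_second = REQUESTS_PER_WINDOW / WINDOW_SECONDS
    return total_requests / requests_per_second / 86_400  # seconds per day


print(f"{crawl_days(COMPANIES):.1f} days at one request per company")
```

Under those assumptions a single pass takes on the order of weeks, before any re-crawling to keep the data current.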

What options can you recommend?

Thanks

I’ve also been referred here by the Companies House customer service team to ask for an increase in my rate limit. Please advise how I can speak to someone from the Developer section.

Many thanks in advance

We are working on re-formulating our bulk data products. We are likely to produce bulk copies of the same resources the REST API returns, as that looks likely to meet a majority of use cases.

Versioning of course has a role to play here, so that is a consideration also.

It is early days on this though.

It would be helpful to understand your use-case for having the entire filing history held at your end, if you can share that. The more we understand, the better our new digital platform will be!

Hi, thank you for your reply. We own a company information website where company data is searchable and accessed by the public for free. At the moment we are having to charge £1 for each document due to previously signed agreements with third-party suppliers.

Our contract is coming to an end this year and so we are exploring the possibility of retrieving the entire database of documents directly from Companies House to allow our customers to download for free.

What options can you offer in the meantime?

Many thanks in advance

Okay. The searching aspect seems to be a common business model.

So, do you index the filing history data so that it is searchable in itself?

I wonder why you need to retrieve the entire database ahead of time instead of retrieving data (say, filing history) directly on demand from the CH API, specifying filters as necessary, and then similarly retrieve documents as and when the customer requires?
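As a minimal sketch of what an on-demand lookup could look like: the helper below only builds the request for one company's filing history, filtered by category. The base URL, the query parameter names (`category`, `items_per_page`, `start_index`) and the use of the API key as the HTTP Basic username with an empty password are assumptions drawn from the public API documentation rather than from this thread, and `filing_history_request` is a hypothetical helper name.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request

# ASSUMED base URL; confirm against the current API documentation.
BASE_URL = "https://api.company-information.service.gov.uk"


def filing_history_request(company_number, api_key, category=None,
                           items_per_page=25, start_index=0):
    """Build (but do not send) a filing-history request for one company."""
    params = {"items_per_page": items_per_page, "start_index": start_index}
    if category:
        params["category"] = category  # ASSUMED filter, e.g. "accounts"
    url = f"{BASE_URL}/company/{company_number}/filing-history?{urlencode(params)}"
    # ASSUMED auth scheme: API key as Basic username, empty password.
    token = base64.b64encode(f"{api_key}:".encode()).decode()
    return Request(url, headers={"Authorization": f"Basic {token}"})
```

Passing the returned object to `urllib.request.urlopen` and decoding the body as JSON would perform the actual call, so the request is made only when a customer opens that company's page.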

We’re building the new API to meet user need, and our own (new) website http://beta.companieshouse.gov.uk accesses company data only through this API. It has no more access to company data than you do, as we’re all using the same API :wink:.

Unless there is some unusual business process that we’re unaware of, or you are doing data analysis, there shouldn’t be a need for you to create your own data store and keep that up to date. Unlike previously, Companies House data is now free and highly available, and application designs could benefit by exploiting this fact.

Thoughts?

Hi,

Thank you for your reply. Our website is an online database providing information on UK companies along with other supplementary data such as trademarks, logos, patents, gazettes etc. We source data from multiple public sources and connect them together.

Due to the sheer number of visitors we receive on a daily basis, we don’t think it’s practical to retrieve API information in real time. At the moment, during busy times (1pm-3pm), we would have to query the API around 300 to 400 times every second, which I can only imagine would cause heavy server load. For this reason, I believe storing the data in our own internal database is more practical.
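One middle ground between a full local copy and a live API call per page view is a short-lived cache in front of the API, so that repeat views of a popular company do not each trigger a request. This is a minimal in-memory sketch, not a production design: `TTLCache` and its `fetch` callback are hypothetical names, and the one-hour lifetime is an arbitrary assumption.

```python
import time


class TTLCache:
    """Cache results of `fetch(key)` and reuse them for `ttl_seconds`."""

    def __init__(self, fetch, ttl_seconds=3600, clock=time.monotonic):
        self._fetch = fetch        # hypothetical callback that calls the API
        self._ttl = ttl_seconds    # ASSUMED lifetime; tune to data freshness
        self._clock = clock        # injectable so the cache can be tested
        self._store = {}           # key -> (time stored, value)

    def get(self, key):
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]          # still fresh: no API call needed
        value = self._fetch(key)   # missing or expired: fetch and remember
        self._store[key] = (now, value)
        return value
```

With this in place only the first view of a company within each hour reaches the API. Whether that brings peak traffic under the rate limit depends on how many distinct companies are viewed, which the thread does not say.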

What solutions can you offer for this?

Thanks

Hi,

First of all, sorry for posting in your thread, but as we have the same request, it seems better to put it here.
We are also interested in getting a bulk export for officers, filings, and other information, as is already done for accounts data and company data.
Our use case is quite different from displaying information about a searched company: we perform deep, cross-data analysis on all company information and provide the results to our customers.
The only way to achieve this is to get access to all the data, and scraping the API is definitely not the right way to do this.
We would unnecessarily overload your servers, and on our side it would take ages to get the data and be really difficult to keep it up to date.

Hi, just to add, that we too would be interested in a bulk download.
We want to perform deep analysis on the whole of the available company history, and that would not be appropriate to do through the rate-limited API.

Is there any advance on this facility being offered?
Thanks.

Hi, curious if anyone figured out a way to get a better bulk download? I’m trying to bring down the 2018 and 2017 filing history, and so far the only solution I see is to get the bulk company data and then call the API for each company, but that’s going to be a long process and may run into limits.
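If the per-company approach is the only one available, pacing the calls evenly is one way to stay under the limit rather than hitting it and backing off. The generator below is a minimal sketch: `throttled` is a hypothetical helper, and the default of 600 requests per 300 seconds is an assumed limit, not one confirmed in this thread.

```python
import time


def throttled(items, max_per_window=600, window_seconds=300,
              clock=time.monotonic, sleep=time.sleep):
    """Yield items evenly spaced so at most max_per_window pass per window."""
    interval = window_seconds / max_per_window  # seconds between items
    next_slot = clock()
    for item in items:
        wait = next_slot - clock()
        if wait > 0:
            sleep(wait)  # too early for the next slot, so pause
        # Schedule the following slot from whichever is later: the planned
        # slot or now (so slow processing never causes a catch-up burst).
        next_slot = max(next_slot, clock()) + interval
        yield item
```

A loop such as `for number in throttled(company_numbers): ...` would then make one API call per iteration, where `company_numbers` is whatever list of company numbers was extracted from the bulk company data file.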