We want to run a daily cron job that fetches any new companies every night from the previous day, but we also want to fetch the backlog.
What is the best way to paginate over all companies to get the full list of company numbers?
It sounds like you want to “get all the data” and keep it up to date. If that is correct, note that Companies House have repeatedly stated that the Public Data API is not intended for this purpose, and it’s rate-limited anyway, so I’m not certain you’d be able to cover all the data.
However, how would you find out what new companies have appeared during the day anyway? I would suggest the best fit for this is the Streaming API: you will be notified (with the basic company data, I think) whenever a company event happens, such as a new company being registered, so there is no need for further look-ups.
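To make the Streaming API suggestion concrete, here is a rough Python sketch of consuming the companies stream and picking out newly registered company numbers. Several things here are my assumptions, not confirmed in this thread, so check them against the Streaming API documentation: the endpoint URL, authentication via HTTP Basic with the stream key as username and an empty password, the `timepoint` query parameter for resuming, blank lines used as heartbeats, and the `data.company_number` and `data.date_of_creation` fields. The function names are mine.

```python
import base64
import json
import urllib.request
from datetime import date
from typing import Iterable, Iterator, Optional

# Assumed endpoint; verify against the Streaming API documentation.
STREAM_URL = "https://stream.companieshouse.gov.uk/companies"


def parse_events(lines: Iterable[bytes]) -> Iterator[dict]:
    """Decode newline-delimited JSON events, skipping blank heartbeat lines."""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # assumed keep-alive heartbeat, carries no event
        yield json.loads(line)


def new_company_numbers(events: Iterable[dict], since: date) -> Iterator[str]:
    """Yield each company number once if it was created on or after `since`.

    Assumes each event carries the company profile under "data" with
    "company_number" and "date_of_creation" (ISO date) fields.
    """
    seen = set()
    for event in events:
        data = event.get("data", {})
        number = data.get("company_number")
        created = data.get("date_of_creation")
        if not number or not created or number in seen:
            continue
        if date.fromisoformat(created) >= since:
            seen.add(number)
            yield number


def open_stream(stream_key: str, timepoint: Optional[int] = None):
    """Open the long-lived stream; the response is iterable line by line.

    Assumes Basic auth with the key as username and an empty password,
    and an optional `timepoint` parameter to resume from a known position.
    """
    url = STREAM_URL if timepoint is None else f"{STREAM_URL}?timepoint={timepoint}"
    token = base64.b64encode(f"{stream_key}:".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    return urllib.request.urlopen(request)
```

Usage would be something like `for number in new_company_numbers(parse_events(open_stream(key)), since=date.today()): ...`, run as a long-lived process instead of a nightly cron. Filtering on the creation date is just one possible way to tell a new registration from a change to an existing company; comparing against the company numbers you already hold would work too.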
On doing this see e.g.:
Here’s a somewhat similar question to yours with some information e.g. links