Best way to paginate over all search/companies to get all companyNumbers

We want to run a daily cron job that picks up the companies registered on the previous day, but we also want to fetch the backlog.

What is the best way to paginate over all companies to get the full list of company numbers?

It sounds like you want to “get all the data” and keep it up to date. If that is correct, note that Companies House have repeatedly stated that the Public Data API is not intended for this purpose, and it is rate-limited anyway, so I’m not certain you’d be able to cover all the data.

However, how would you find out which new companies have appeared during the day anyway? I would suggest the best fit for this is the Streaming API: you get notified (with the basic company data, I think) whenever a company event happens, such as a new company being registered. So there is no need for further look-ups.
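A minimal sketch of consuming the stream and picking out newly created companies. It assumes (please check against the Streaming API documentation) the company-profile stream at `https://stream.companieshouse.gov.uk/companies`, HTTP basic auth with the stream key as username and an empty password, newline-delimited JSON events carrying `data.company_number`, `data.date_of_creation` and `event.timepoint`, blank keep-alive lines, and a `timepoint` query parameter for resuming. The function names are hypothetical.

```python
import base64
import json
import urllib.request
from typing import Iterable, Iterator, Optional, Set, Tuple

# Assumed endpoint for the company-profile stream; verify against the docs.
STREAM_URL = "https://stream.companieshouse.gov.uk/companies"


def open_stream(stream_key: str, timepoint: Optional[int] = None) -> Iterator[str]:
    """Yield raw lines from the stream (hypothetical helper).

    Assumes basic auth with the stream key as username and an empty password,
    and that ?timepoint=N resumes from an earlier event.
    """
    url = STREAM_URL if timepoint is None else f"{STREAM_URL}?timepoint={timepoint}"
    token = base64.b64encode(f"{stream_key}:".encode()).decode()
    req = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(req) as resp:
        for raw in resp:  # HTTPResponse iterates line by line
            yield raw.decode("utf-8")


def new_company_numbers(
    lines: Iterable[str], since_date: str
) -> Iterator[Tuple[str, Optional[int]]]:
    """Yield (company_number, timepoint) for companies created on/after since_date.

    since_date is an ISO date ('YYYY-MM-DD'), so plain string comparison
    orders dates correctly. Each company number is reported only once.
    """
    seen: Set[str] = set()
    for line in lines:
        line = line.strip()
        if not line:  # assumed keep-alive (heartbeat) line
            continue
        event = json.loads(line)
        meta = event.get("event", {})
        if meta.get("type") == "deleted":
            continue
        data = event.get("data", {})
        number = data.get("company_number") or event.get("resource_id")
        created = data.get("date_of_creation")
        if number and created and created >= since_date and number not in seen:
            seen.add(number)
            yield number, meta.get("timepoint")
```

Usage would look like `for number, tp in new_company_numbers(open_stream(KEY), "2024-05-01"): ...`. Persisting the last timepoint you processed lets a nightly job (or a long-running consumer) pick up where it left off instead of re-reading events.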

  1. If you are only interested in new companies from now on, just use the Streaming API. See e.g. here:
  2. If you want all current companies, you can get the list from the Bulk Data files and then use the Streaming API to keep it up to date. @ebrian101 has an unofficial guide with some information and the links here: Bulk Data products from Companies House | CH Guide
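For the second option, a minimal sketch of loading the backlog of company numbers from the bulk data download and then merging in numbers reported later (e.g. by the Streaming API). It assumes the download is a zip containing one or more CSV files with a `CompanyNumber` column (column names are stripped in case the header contains stray spaces) and that the files are UTF-8; the function names are hypothetical.

```python
import csv
import io
import zipfile
from typing import Iterable, Set, TextIO


def company_numbers_from_bulk_csv(f: TextIO) -> Set[str]:
    """Collect the CompanyNumber column from one bulk-data CSV file."""
    reader = csv.reader(f)
    header = [h.strip() for h in next(reader)]  # tolerate spaces in column names
    idx = header.index("CompanyNumber")
    numbers: Set[str] = set()
    for row in reader:
        if len(row) > idx and row[idx].strip():
            numbers.add(row[idx].strip())
    return numbers


def company_numbers_from_zip(path) -> Set[str]:
    """Read every CSV inside a downloaded bulk-data zip (assumed UTF-8)."""
    numbers: Set[str] = set()
    with zipfile.ZipFile(path) as zf:
        for name in zf.namelist():
            if name.lower().endswith(".csv"):
                with zf.open(name) as raw:
                    text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                    numbers |= company_numbers_from_bulk_csv(text)
    return numbers


def merge_new(known: Set[str], incoming: Iterable[str]) -> Set[str]:
    """Add incoming numbers to known in place; return only the unseen ones."""
    added = set(incoming) - known
    known |= added
    return added
```

Keeping the known numbers as a set makes the nightly merge a cheap set difference, so re-reported companies are simply ignored.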

On doing this see e.g.:

Here’s a somewhat similar question to yours, with some useful information (e.g. links):