Need to page through advanced-search/companies, but it fails after start_index 10000

I’ve been trying to figure out a good way to page through all company numbers. The best option I can see is the Advanced Search API. However, it seems to throw an error once start_index reaches 10000. Is this expected behaviour, or is it a bug?

command

curl -u 801932d9-9bdc-4428-b413-61ef40f0e791: "https://api.company-information.service.gov.uk/advanced-search/companies?start_index=10000&size=1"

result

{"timestamp":"2023-09-22T22:08:47.822+00:00","status":500,"error":"Internal Server Error","path":"/advanced-search/companies"}

Page through all company numbers?
The API is not intended for that sort of use.
We do have Companies House bulk products that may be better suited to your needs.


Thanks for this - I’ve been wrestling with this for a little while (something saying that it will only let you look at the first 10,000 records on the web pages would have saved me a couple of days of assuming it was me doing something wrong)…

However I’m a little confused as to what the intention of this method actually is, then. As you can’t access a company after the 10,000th in the list of whatever filter you are selecting using this method, what’s the use case it is designed for?

I’ve grabbed the bulk download file and put the 5.5 million or so records of active companies (i.e. not dissolved ones) into a table, but that is correct as at 1/3/24, and I’ve been asked by my colleagues in research (I work in a university) if I can bring it up to today and keep it maintained. Looking for companies incorporated after 1/3/24 yields about 38,000 records…

So I’ll need to cycle through each date from the 1st to today, using the date as a filter to grab the smaller number of daily records (hopefully fewer than 10,000 each day), to build that full list.
Point being, the API will still deliver the same amount of data to me (so no ‘advantage’ to Companies House), but it just makes life harder and causes more API calls to be made.
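
For anyone wanting to try the same day-by-day workaround, here is a rough sketch of how it might look in Python. It is only an illustration under some assumptions: that the endpoint accepts `incorporated_from` and `incorporated_to` date filters alongside `start_index` and `size`, that results come back in an `items` list whose entries carry a `company_number`, and that the 10,000 cut-off behaves as observed in this thread. The helper names (`each_day`, `company_numbers_for_day`) and the `fetch_page` callable are hypothetical; `fetch_page` stands in for whatever authenticated HTTP call you already use.

```python
from datetime import date, timedelta
from typing import Callable, Iterator

# Limit observed in this thread; assumed here, not taken from official documentation.
MAX_START_INDEX = 10_000


def each_day(first: date, last: date) -> Iterator[date]:
    """Yield every date from first to last, inclusive."""
    day = first
    while day <= last:
        yield day
        day += timedelta(days=1)


def company_numbers_for_day(
    day: date,
    fetch_page: Callable[[dict], dict],
    size: int = 500,
) -> list[str]:
    """Collect company numbers incorporated on one day.

    fetch_page is a hypothetical caller-supplied function that sends the
    given query parameters to the search endpoint and returns parsed JSON.
    """
    numbers: list[str] = []
    start_index = 0
    while True:
        if start_index >= MAX_START_INDEX:
            # The day holds more records than can be paged; a narrower filter is needed.
            raise RuntimeError(f"{day} reached the {MAX_START_INDEX} paging limit")
        params = {
            "incorporated_from": day.isoformat(),  # assumed parameter name
            "incorporated_to": day.isoformat(),    # assumed parameter name
            "start_index": start_index,
            "size": size,
        }
        items = fetch_page(params).get("items", [])
        numbers.extend(item["company_number"] for item in items)
        if len(items) < size:
            # A short (or empty) page means this day is exhausted.
            return numbers
        start_index += size
```

You would then loop `for day in each_day(date(2024, 3, 1), date.today())` and append each day's numbers to the table. Passing the HTTP call in as `fetch_page` keeps the paging logic separate from authentication and rate-limit handling, which you will probably want to add yourself.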

Just my musings having encountered this API for the first time recently.
Thanks