What is the best way to return all the companies from a location?

I am using the advanced search API - Companies House Public Data API: Advanced search for a company

I use the size parameter to return 5,000 companies in a specific location. The documentation on the website mentions a range of 1 to 5,000 for it. I want to get back all the companies in a particular location, for example London. I think there are more than 5,000 companies incorporated in London. What is the best way to return all of them?

The API documentation indicates that the size and start_index parameters allow you to page through results in the same way as other endpoints in the Public Data API (other endpoints use e.g. items_per_page and start_index). I’ve not tried setting start_index above 5000 but that’s how it would work for the other endpoints. So in this case size=5000&start_index=0 for the first page, size=5000&start_index=5000 for the second page etc.
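As a rough sketch of that paging scheme (not tested against the live API, and I have not checked whether the server accepts large start_index values), the page URLs could be generated like this. The function name page_urls and the total of 12000 are just illustrative:

```python
from urllib.parse import urlencode

BASE_URL = "https://api.company-information.service.gov.uk/advanced-search/companies"


def page_urls(location, total, size=5000):
    """Yield one search URL per page, stepping start_index by `size`.

    `total` is the number of matches you expect to page through
    (an illustrative input here, not something the function looks up).
    """
    for start_index in range(0, total, size):
        query = urlencode(
            {"location": location, "size": size, "start_index": start_index}
        )
        yield f"{BASE_URL}?{query}"


# Illustrative only: 12,000 matches would need three pages of up to 5,000.
urls = list(page_urls("london", total=12000))
for url in urls:
    print(url)
```

Each URL would then be requested with your API key, as in any other call to the Public Data API.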

In the list of companies response you’ll get a hits member which has a count of matches. I’ve just tried this for London:

curl -u MY_API_KEY: "https://api.company-information.service.gov.uk/advanced-search/companies?location=london&size=3"

This returns (some data snipped):

{
    "etag": "9b37f850873c018da6b9fb6a81742e0e5df5c697",
    "top_hit": {
        ...
    },
    "items": [
        ...
    ],
    "kind": "search#advanced-search",
    "hits": 2440956
}
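To get a feel for the scale, here is a small sketch that reads the hits member from a response body like the one above and works out how many pages of 5000 that would be. The response_text string is a cut-down copy of the snipped response, not a full API reply:

```python
import json
import math

# Cut-down copy of the response shown above (items and top_hit omitted).
response_text = '{"kind": "search#advanced-search", "hits": 2440956}'

PAGE_SIZE = 5000

hits = json.loads(response_text)["hits"]
pages = math.ceil(hits / PAGE_SIZE)

print(f"{hits} matches would need {pages} pages of {PAGE_SIZE}")  # 489 pages
```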

Hi there, I am trying to get all the companies in Surrey. Here is my endpoint:

https://api.company-information.service.gov.uk/advanced-search/companies?location=surrey&size=5000&company_status=active&start_index=0

In the second API call I change start_index to 5000.

The hits value is 94733, which means there are about 94k companies in Surrey, right?

When I make the third API call with start_index set to 10000, I get an error:

{
    "timestamp": "2022-09-05T10:18:32.805+00:00",
    "status": 500,
    "error": "Internal Server Error",
    "path": "/advanced-search/companies"
}

Third API call: https://api.company-information.service.gov.uk/advanced-search/companies?location=surrey&size=5000&company_status=active&start_index=10000

How do I get the next 5,000 after 10,000? Shouldn’t the start index be 10000, according to your explanation above?

See this thread - like other searches Companies House Advanced Search seems to have a hard limit:

This is “by design”: Companies House have regularly said that the purpose of the API, and especially the search, is not to collect large quantities of data. They have various “bulk data products” for that, including one for company data, which would be the one to use here. Unfortunately it does not contain all the data you can get from the API, it’s only updated monthly (I think), and the format is different too. However, if you’re prepared to do some address parsing (e.g. town / postcode, in spotty data) then I think you could make this work. Companies House may just be doing this behind the scenes anyway; doing it yourself at least puts you in control.
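A minimal sketch of the "do it yourself" route, assuming the bulk company data file is a CSV with columns named CompanyName, CompanyNumber and RegAddress.PostTown. Those column names are my assumption about the file layout, so check them against the header of the file you actually download. The sample rows are invented for illustration:

```python
import csv
import io


def companies_in_town(csv_file, town):
    """Yield (company number, name) for rows whose post town matches `town`.

    Column names are assumed, not confirmed; header names and values are
    stripped because the data can contain stray spaces.
    """
    reader = csv.DictReader(csv_file, skipinitialspace=True)
    wanted = town.strip().upper()
    for raw_row in reader:
        row = {
            key.strip(): (value or "").strip()
            for key, value in raw_row.items()
            if key is not None
        }
        if row.get("RegAddress.PostTown", "").upper() == wanted:
            yield row["CompanyNumber"], row["CompanyName"]


# Invented sample rows standing in for the real bulk file.
SAMPLE = "\n".join(
    [
        "CompanyName, CompanyNumber,RegAddress.PostTown,RegAddress.PostCode",
        "EXAMPLE ONE LTD,00000001,GUILDFORD,GU1 1AA",
        "EXAMPLE TWO LTD,00000002,guildford ,GU2 7XX",
        "EXAMPLE THREE LTD,00000003,LEEDS,LS1 1AA",
    ]
)

matches = list(companies_in_town(io.StringIO(SAMPLE), "Guildford"))
print(matches)
```

For the real file you would pass an open file handle instead of the io.StringIO sample. An exact match like this will still miss misspelt or misplaced towns.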

A note of caution!

This type of search will only ever return part of the truth because the correct address element is frequently blank or the data has been positioned in an incorrect element. When present in the right element the data can be correct, or misspelt (SURRY, SUREY), or corrupted with punctuation (SUREY, SURRY.) or just plain wrong (LONDON is commonly used with non-London postcodes around the capital).
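To illustrate the kind of clean-up this implies, here is a small sketch that strips punctuation and then fuzzy-matches the value against a list of known county names using Python's difflib. The KNOWN_COUNTIES list and the 0.8 cutoff are illustrative choices, not anything Companies House provides. It can catch misspellings and stray punctuation, but it cannot fix values that are plain wrong or sitting in a different address element:

```python
import difflib
import re

# Illustrative list; a real run would use a full list of county names.
KNOWN_COUNTIES = ["SURREY", "KENT", "ESSEX"]


def normalise_county(raw, cutoff=0.8):
    """Return the closest known county name, or None if nothing is close."""
    # Upper-case, then drop everything except letters and spaces.
    cleaned = re.sub(r"[^A-Z ]", "", raw.upper()).strip()
    if not cleaned:
        return None
    close = difflib.get_close_matches(cleaned, KNOWN_COUNTIES, n=1, cutoff=cutoff)
    return close[0] if close else None


for value in ["SURRY", "SUREY", "Surrey.", "LONDON", ""]:
    print(repr(value), "->", normalise_county(value))
```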

Using only the postcode, mapped to the definitive ONS master data tables, will give the most accurate output, BUT not all postcodes are present and those that are can contain errors; the upside is that there are fewer issues overall.
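A sketch of the postcode approach, assuming you have loaded a postcode-to-county lookup from the ONS tables into a dictionary. The POSTCODE_LOOKUP below is a tiny made-up stand-in for that, and the regular expression is a simplified shape check rather than a full postcode validator:

```python
import re

# Simplified shape check for UK postcodes (not a complete validator).
POSTCODE_RE = re.compile(r"^[A-Z]{1,2}[0-9][A-Z0-9]? [0-9][A-Z]{2}$")

# Made-up stand-in for a lookup built from the ONS tables.
POSTCODE_LOOKUP = {"GU1 1AA": "Surrey"}


def normalise_postcode(raw):
    """Upper-case, fix spacing, and return None if the shape looks wrong."""
    compact = re.sub(r"\s+", "", raw.upper())
    candidate = f"{compact[:-3]} {compact[-3:]}"
    return candidate if POSTCODE_RE.match(candidate) else None


def county_for(raw_postcode, lookup):
    """Return the county for a postcode, or None if missing or malformed."""
    postcode = normalise_postcode(raw_postcode)
    return lookup.get(postcode) if postcode else None


print(county_for(" gu11aa ", POSTCODE_LOOKUP))
print(county_for("not a postcode", POSTCODE_LOOKUP))
```

Rows that come back as None are the blank, malformed or unmapped postcodes, and would need handling some other way.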

Our data is probably more precise than most commercial datasets ( http://statbooks.co.uk/ ) and is further improved daily but is still not perfect; further details are available from admin@statbooks.co.uk