Hi all,
I have been trying to use the API to extract company officers, but I have found that if the company has more than 100 officers, it will only return a max of 100 through the API, regardless of how many I have requested as the maximum. Does anyone know whether this is a problem at my end or with the API?
On an unrelated note, is it possible to download all of the Companies House data in one file? I found a way to download every company in a .csv file, but is the people data also available in a single location? That would be much more convenient for what I’m trying to do.
Cheers,
G
Which ones did you try? How were you doing this, e.g. what were you setting in the start_index parameter?
I found the following old example - OC305357 - from our collection still seems to work. This company has over a thousand officers total. Don’t think I’ve ever tested the lot but I just got 200 now and have got back more in the past. Notice the etag has changed between calls - so this represents the record set you’ve requested (not the entire list). Not that this system is fully functional yet if I recall…
Example shown was obtained using curl but the principle’s the same, e.g. just page through using parameters start_index and items_per_page (obviously I’ve snipped the results for this one!)
curl -u{Our API key here}: "https://api.companieshouse.gov.uk/company/OC305357/officers?start_index=0&items_per_page=100"
{
"total_results":1641,
"links":{"self":"/company/OC305357/officers"},
"active_count":677, "resigned_count":964,
"etag":"a1750b86b9c5e77c7e3725e991b620e199a48480",
"start_index":0,
"items": [ ... ]
}
Next call:
curl -u{Our API key here}: "https://api.companieshouse.gov.uk/company/OC305357/officers?start_index=100&items_per_page=100"
{
"total_results": 1641,
"start_index": 100,
"inactive_count": 0,
"items_per_page": 100,
"active_count": 677,
"etag": "55b890e63e2cc2a81b8cc4ba0adffbe7fdb47645",
"kind": "officer-list",
"resigned_count": 964,
"items":[ ... ]
}
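For anyone not using curl, here is a rough Python sketch of the same paging idea: keep requesting pages of 100, moving start_index along each time, until total_results items have been collected. The function names (fetch_page, all_officers) and the BASE_URL constant are made up for illustration, the host is an assumption you should adjust to whatever the API currently uses, and the API key is sent as the basic-auth username with an empty password, as in the curl -u form. Treat it as a starting point rather than tested production code.

```python
import base64
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.companieshouse.gov.uk"  # assumed host; adjust if it differs
PAGE_SIZE = 100  # the items_per_page cap discussed in this thread


def fetch_page(company_number, start_index, api_key, page_size=PAGE_SIZE):
    """Fetch one page of officers (hypothetical helper, not an official client)."""
    query = urllib.parse.urlencode(
        {"start_index": start_index, "items_per_page": page_size}
    )
    url = f"{BASE_URL}/company/{company_number}/officers?{query}"
    # Basic auth with the API key as username and an empty password.
    token = base64.b64encode(f"{api_key}:".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def all_officers(company_number, api_key, fetch=fetch_page):
    """Page through the officer list until total_results items are collected."""
    officers = []
    start_index = 0
    while True:
        page = fetch(company_number, start_index, api_key)
        items = page.get("items", [])
        officers.extend(items)
        start_index += len(items)
        # Stop once everything is collected, or if a page comes back empty.
        if not items or start_index >= page.get("total_results", 0):
            return officers
```

Calling all_officers("OC305357", "your-key") would then make one request per 100 officers. The fetch argument is only there so the loop can be tried out against canned data without hitting the API.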
Oh right - I think I see what I might have done wrong. I was just upping items_per_page to try and capture more search entries, but it seems that that can only take a max of 100. There’s no particular reason that the items_per_page need to be capped though is there?
I’ll have to just update my code to include start_index offsets to capture all the officers systematically. It does mean that I’ll have to make multiple calls to the API if a company has more than 100 officers, but it’s not the end of the world.
Cheers,
G
There’s no particular reason that the items_per_page need to be capped though is there?
100 - arbitrary limit as far as I know but they did confirm this:
https://forum.aws.chdev.org/t/data-capped-limit/551/2
Documentation on doing this is on the Officer list documentation page.
Unfortunately the documentation is sometimes a little scattered / incomplete and you need to read in conjunction with this forum - but then again you’re getting what you pay for. Some posts have collected info about sections of the API e.g. my post on a thread about getting “all the data”. Of note is that the search endpoints are more limited than specific company information e.g. you won’t be able to get above a certain arbitrary number of results. For specific companies you should be able to get all the data if you’re prepared to page through it as described and as long as you respect the rate limits.
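On respecting the rate limits while paging: one simple approach is to space requests out evenly. Below is a minimal sketch of that idea in Python. The Throttle class is made up for illustration, and the figure of 600 requests per 5 minutes in the usage note is an assumption on my part, so check the current developer guidelines for the real limit.

```python
import time


class Throttle:
    """Space calls evenly so that at most max_calls happen per period (seconds)."""

    def __init__(self, max_calls, period, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = period / max_calls
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = None  # earliest time the next call may go out

    def wait(self):
        """Block until the next call is allowed, then reserve the following slot."""
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.min_interval
```

Usage would be something like throttle = Throttle(600, 300) (assumed limit), then calling throttle.wait() before each API request. Even spacing is the crudest option; it never bursts, which is slower than strictly necessary but hard to get wrong.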
Chris