When using curl to search all, I only get 20 results per page. How can I get more per page and/or iterate through the pages? I can’t see any clear explanation of how to index through them.
Thanks in advance
Shaun
Documentation is at https://developer.companieshouse.gov.uk/api/docs/search/search.html
You need something like (e.g. for “lloyds”):
curl -u{APIkey} "https://api.companieshouse.gov.uk/search?q=lloyds&items_per_page=30&start_index=0"
You obviously need your own API key - which should end in a colon - instead of {APIkey} here.
Check the “items_per_page” field in the response to ensure you did get back 30 results (or count 'em!).
To get the next 30:
curl -u{APIkey} "https://api.companieshouse.gov.uk/search?q=lloyds&items_per_page=30&start_index=30"
You’ll also probably be interested in the “total_results” field.
(By the way - if you are just interested in companies or officers, you can limit the search to these by using /search/companies or /search/officers instead of /search).
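For anyone who would rather script this than type curl commands, here is a rough Python sketch of the same start_index / items_per_page stepping. It assumes the response JSON has an "items" list next to the "total_results" field mentioned above. fetch_json is a hypothetical callable you supply yourself (URL in, decoded JSON dict out), so nothing here is an official client.

```python
from urllib.parse import urlencode

BASE = "https://api.companieshouse.gov.uk"


def search_url(query, start_index=0, items_per_page=30, path="/search"):
    """Build a search URL like the curl examples above."""
    params = {"q": query, "items_per_page": items_per_page, "start_index": start_index}
    return f"{BASE}{path}?{urlencode(params)}"


def iter_items(fetch_json, query, items_per_page=30):
    """Yield every result, one page at a time.

    fetch_json is a hypothetical helper: it takes a URL and returns the
    decoded JSON response as a dict. The "items" key is an assumption.
    """
    start = 0
    while True:
        page = fetch_json(search_url(query, start, items_per_page))
        items = page.get("items", [])
        if not items:
            break
        yield from items
        # Advance by what actually came back, not by what was requested.
        start += len(items)
        if start >= page.get("total_results", 0):
            break
```

Advancing by the number of items actually returned means the loop still works if the server hands back fewer results per page than you asked for.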
The general syntax is as per a standard RESTful API:
curl -u{APIKEY} "{restURI}"
(Obviously see curl docs if you e.g. need to get the http header instead etc.)
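In case it helps to see what curl's -u option is doing: it sends HTTP Basic authentication, with the API key as the user name and an empty password (hence the trailing colon). A minimal Python sketch of the same thing follows; basic_auth_header and build_request are names made up for illustration.

```python
import base64
import urllib.request


def basic_auth_header(api_key):
    """Return the Authorization header value for 'key:' (empty password)."""
    token = base64.b64encode(f"{api_key}:".encode("ascii")).decode("ascii")
    return f"Basic {token}"


def build_request(url, api_key):
    """Build (but do not send) a request carrying the Basic auth header."""
    return urllib.request.Request(url, headers={"Authorization": basic_auth_header(api_key)})
```

Passing the result of build_request to urllib.request.urlopen would actually send it; that part is left out so the sketch can be tried without a real key.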
Where (to spell it out):
https://api.companieshouse.gov.uk/
http://document-api.companieshouse.gov.uk/
https://api.companieshouse.gov.uk/search
Many CH endpoints use start_index and items_per_page to step through potentially large data sets - a quick search on this forum will show you various ways to do this.
You’ll also find out about limits in some cases which are not currently documented in the main documentation.
Thanks for the responses guys, let me show you what I have done, because it still seems to return the same information:
curl -u myKey: https://api.companieshouse.gov.uk/search?q=searchTerm&items_per_page=20&start_index=0
Then using the number of items returned to define the start_index value I use the following:
curl -u myKey: https://api.companieshouse.gov.uk/search?q=searchTerm&items_per_page=20&start_index=20
I seem to get the same information?
Thanks in advance
Ah I got what I need now…
I used Python instead and urllib3 and then the start_index but the example shown above using curl did not work for me.
Thanks for the help both
Good you’ve got it working.
When using curl, did you enclose the https://… part in quotes? Your example doesn’t show any.
(If you did so then the info below won’t apply. I don’t know what’s wrong but for help post exactly which command you issued to curl [without your API key details obviously] and the response you received.)
That might be the reason why you got the same information using a different “start_index”. The Windows command line and many Linux shells treat an unquoted “&” as a special character that ends the command, so the command you gave above is split into 3 at each “&” character.
So if you run (using “tesco” as search term):
curl -u myKey: https://api.companieshouse.gov.uk/search?q=tesco&items_per_page=20&start_index=0
This is interpreted as 3 commands:
https://api.companieshouse.gov.uk/search?q=tesco
items_per_page=20
start_index=0
The first will happily give the first page of results (as if you’d set “start_index=0”). The other two pieces are handled by the shell itself and never reach curl.
When you then run:
curl -u myKey: https://api.companieshouse.gov.uk/search?q=tesco&items_per_page=20&start_index=20
…again the shell / command line will split this up and you’ll get the same command as you had before sent via curl:
https://api.companieshouse.gov.uk/search?q=tesco
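If you want to see the split for yourself without touching a real shell, Python's shlex module can tokenise a command line roughly the way a POSIX shell would. This is only an approximation of shell behaviour (it assumes Python 3.8 or later, and the Windows command line has its own rules), but it shows the effect of the quotes. shell_tokens is a made-up helper name.

```python
import shlex


def shell_tokens(command):
    """Tokenise a command line, treating characters like & as shell punctuation."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    return list(lexer)


URL = "https://api.companieshouse.gov.uk/search?q=tesco&items_per_page=20&start_index=20"

# Unquoted: the URL is cut at each "&", so curl only sees the part before it.
unquoted = shell_tokens(f"curl -u myKey: {URL}")

# Quoted: the whole URL stays together as a single argument.
quoted = shell_tokens(f'curl -u myKey: "{URL}"')

print(unquoted)
print(quoted)
```

In the unquoted case the last URL-like token stops at “q=tesco”, which matches what was described above.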
Good man @voracityemail that was exactly the issue, I tried again using double quotes as you suggested and then the indexing worked.
Thanks for that, it might come in handy again. I am new to curl; in the end I resorted to using Python’s urllib3, but it’s still nice to know for future reference.
Hi!
I am also using start_index and items_per_page to iterate through the pages. However, I get an error when I set the start index above 900 (it works with 900 but not with 901). Is there a limit, or am I doing something wrong? And how can I get the rest of the results?
Thanks
Search is tuned to ‘find’ a specific company name, it is not intended to be used to get all company names, we have bulk products for that.
If your search returns too many results, you need to make your search term more specific.
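For anyone scripting the paging, a small sketch of how you might avoid requesting pages the search will refuse. The cap of 1000 reachable results is an assumption based only on what is reported in this thread (start_index 900 works, 901 does not); it is not taken from the documentation, so treat max_results as a guess you may need to adjust. start_indexes is a made-up helper name.

```python
def start_indexes(total_results, items_per_page=100, max_results=1000):
    """List the start_index values worth requesting.

    max_results=1000 is an assumed cap inferred from this thread,
    not a documented limit.
    """
    reachable = min(total_results, max_results)
    return list(range(0, reachable, items_per_page))


# With 2500 matches and 100 per page, stop after start_index=900.
print(start_indexes(2500))
```

If total_results is above the cap, that is the signal to narrow the search term rather than keep paging.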
Thanks for the answer, Mark.
I have checked the bulk products and found the company data product. However, the dataset includes only “live companies” and not those which have been dissolved recently. Is there any other product where I can get the complete database, so I can narrow it down to the companies that I am interested in?
Regarding making the term more specific, how can I do it? When I include more than one term in the search, it returns even more results (results for each term separately, not only results which include both terms at the same time).
Thanks
Can you provide a company name that you are searching for that you cannot find?
So, for instance, this company “BITCOIN ALLIANCE LTD” is not in the dataset for “live companies” available for download, because it was dissolved in February.
Regarding the search terms, I try to make my search more specific but, for instance, if I write q=virtual+coin or q=%22+virtual+coin+%22 I get more results than using only “coin”.
Thanks
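As a side note on the encoding only: if you build the query in Python, urllib.parse.urlencode will turn spaces into “+” and double quotes into “%22” for you, which avoids stray characters such as the extra spaces inside the quotes in q=%22+virtual+coin+%22. This sketch says nothing about how the search treats a quoted phrase, which is not documented in this thread. company_search_path is a made-up helper name.

```python
from urllib.parse import urlencode


def company_search_path(term):
    """Return the path and query string for a company search on one term."""
    return "/search/companies?" + urlencode({"q": term})


print(company_search_path("virtual coin"))
print(company_search_path('"virtual coin"'))
```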
When I search the API for BITCOIN ALLIANCE LTD, it is the first one returned, an exact match.
GET /search/companies?q=BITCOIN+ALLIANCE+LTD
There is no company called virtual coin, so you will not find it.
Sorry, I didn’t explain well.
My aim is getting the companies that have been dissolved in the two last years. The type of companies that I am looking at are those related to bitcoins. There is no catalogue of which companies are related to that business, so I am searching in Companies House for those which have in the name (title) bitcoin, crypto, virtual coin… Not for specific companies.
I know that Bitcoin Alliance LTD is in the API, but it is not in the bulk data available to download here: Companies House. That is why I am using the API to get the companies that can be related to bitcoin, and later filter by the “dissolved” ones. But I cannot get more than 1000 results. I tried to be more specific with the terms of the search, but it is giving me even more results.
Is it possible to produce a csv with the companies dissolved in the last two years, as there is one for the live companies? Or, is it an alternative way of getting this information?
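For the “filter by dissolved later” step described above, a minimal Python sketch of filtering results that have already been fetched. It assumes each search item carries "company_status" and "date_of_cessation" (as an ISO date string); check the field names against the responses you actually get. The sample companies in the usage part are invented.

```python
from datetime import date


def years_before(day, years):
    """Return the same calendar day `years` earlier (29 Feb falls back to 28 Feb)."""
    try:
        return day.replace(year=day.year - years)
    except ValueError:
        return day.replace(year=day.year - years, day=28)


def recently_dissolved(items, today, years=2):
    """Keep items marked dissolved with a cessation date within the last `years`.

    The "company_status" and "date_of_cessation" keys are assumptions.
    """
    cutoff = years_before(today, years)
    kept = []
    for item in items:
        ceased = item.get("date_of_cessation")
        if item.get("company_status") != "dissolved" or not ceased:
            continue
        if date.fromisoformat(ceased) >= cutoff:
            kept.append(item)
    return kept


# Invented sample data, for illustration only.
sample = [
    {"title": "EXAMPLE ONE LTD", "company_status": "dissolved", "date_of_cessation": "2018-02-13"},
    {"title": "EXAMPLE TWO LTD", "company_status": "active"},
    {"title": "EXAMPLE THREE LTD", "company_status": "dissolved", "date_of_cessation": "2012-06-01"},
]
print([c["title"] for c in recently_dissolved(sample, today=date(2018, 6, 1))])
```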
The short answer is that the API search is not intended to do what you are trying to do.
There is no bulk product of dissolved companies either.
Okis,
Thanks for all the help!
There is another product called the DVD ROM product that has 20 years of dissolved companies data on it, but it is chargeable. Details can be found at: About our services - Companies House - GOV.UK
What is the maximum number of results that the CH API returns?
I’ll reply here so your main queries get a chance for Companies House / someone more knowledgeable to pick up.
maximum number of results that the CH API returns
items_per_page
but which still return a list e.g. company insolvency information, company exemptions, company registers.
“items_per_page=100” just returns 100 results. You could always request 500 and see what you get…
Looking at your questions in general you seem to be saying “could the API be changed so I can efficiently replicate the data and keep updated with changes on a notification basis?”
CH repeatedly state that this is not their remit when creating the API. Providing more granularity in the way of searching (nationality) may be down to your own implementation. However:
Aside from CH there are some services mentioned on the forum which may provide you with additional functionality - search around!
Edit: overview of bulk data products - see:
There’s also a DVD of ex-companies (for sale):
Found out about this in the following thread: