Searching using curl returns only 20 results per page

When I use curl to search, I only get 20 results per page. How can I get more per page and/or iterate through the pages? I can’t see any clear explanation of how to index through them.

Thanks in advance

Shaun

Documentation is at https://developer.companieshouse.gov.uk/api/docs/search/search.html

You need something like (e.g. for “lloyds”):

curl -u{APIkey} "https://api.companieshouse.gov.uk/search?q=lloyds&items_per_page=30&start_index=0"

You obviously need your own API key - which should end in a colon - instead of {APIkey} here.
Check the “items_per_page” field in the response to ensure you did get back 30 results (or count 'em!).
To get the next 30:

curl -u{APIkey} "https://api.companieshouse.gov.uk/search?q=lloyds&items_per_page=30&start_index=30"

You’ll also probably be interested in the “total_results” field.
(By the way - if you are just interested in companies or officers, you can limit the search to these by using /search/companies or /search/officers instead of /search).
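
If you would rather script the paging than edit the URL by hand, here is a minimal Python sketch of the same idea. It only builds the URLs (no request is sent); the function names and the total of 75 results are made up for illustration, and in practice you would read “total_results” from the first response.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.companieshouse.gov.uk/search"  # endpoint from the post above


def search_url(term, items_per_page=30, start_index=0):
    """Build one search URL; urlencode percent-encodes the term for us."""
    query = urlencode({
        "q": term,
        "items_per_page": items_per_page,
        "start_index": start_index,
    })
    return f"{BASE_URL}?{query}"


def page_urls(term, total_results, items_per_page=30):
    """Yield one URL per page until total_results items are covered."""
    for start in range(0, total_results, items_per_page):
        yield search_url(term, items_per_page, start)


# Hypothetical total of 75 results: start_index takes the values 0, 30 and 60.
for url in page_urls("lloyds", total_results=75):
    print(url)
```

Each printed URL can be passed to curl (in quotes) or fetched with whichever HTTP client you prefer.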

The general syntax is as per a standard RESTful API:

curl -u{APIKEY} "{restURI}"

(Obviously see curl docs if you e.g. need to get the http header instead etc.)
Where (to spell it out):

  • {APIKEY} is your API username followed by a colon.
    The format used by cURL is actually username:password but CH just give you a username and no password.
  • {restURI} is the URI for the appropriate endpoint. Note you’ll want to enclose this in quotes for the Windows command line / Unix shells.
    The first part is either:
    For main API - https://api.companieshouse.gov.uk/
    For Document API - http://document-api.companieshouse.gov.uk/
    So for search (all), you want https://api.companieshouse.gov.uk/search
  • Search takes the following query parameters:
    search?q={term}&items_per_page={ipp}&start_index={start}
  • {term} is the string you’re searching for (obviously, if that includes URI “special characters” like “?”, “&” etc. these need percent encoding)
  • {ipp} is the number of items you want back (note - this doesn’t guarantee you’ll get as many as this).
  • {start} is the item in your search results to start with (zero-based I believe).
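
To make the two fiddly parts of the list above concrete (the key followed by a colon, and percent encoding of the search term), here is a small Python sketch. It assumes the API accepts standard HTTP Basic authentication, which is what curl's -u option sends; “myKey” is a placeholder and the helper names are my own.

```python
import base64
from urllib.parse import quote_plus


def basic_auth_header(api_key):
    """HTTP Basic auth value for 'username:password' with an empty password."""
    credentials = f"{api_key}:".encode("ascii")  # trailing colon = empty password
    return "Basic " + base64.b64encode(credentials).decode("ascii")


def encode_term(term):
    """Percent-encode a search term so it is safe to place after 'q='."""
    return quote_plus(term)


print(basic_auth_header("myKey"))       # "myKey" is a placeholder, not a real key
print(encode_term("marks & spencer?"))  # "&" and "?" are escaped, spaces become "+"
```

The first value is what would go in an Authorization header if you build requests yourself rather than letting curl do it.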

Many CH endpoints use start_index and items_per_page to step through potentially large data sets. A quick search on this forum will show you various ways to do this.

You’ll also find out about limits in some cases which are not currently documented in the main documentation.

Thanks for the responses guys, let me show you what I have done, because it still seems to return the same information:

curl -u myKey: https://api.companieshouse.gov.uk/search?q=searchTerm&items_per_page=20&start_index=0

Then using the number of items returned to define the start_index value I use the following:

curl -u myKey: https://api.companieshouse.gov.uk/search?q=searchTerm&items_per_page=20&start_index=20

I seem to get the same information?

Thanks in advance

Ah I got what I need now…

I used Python with urllib3 instead, together with start_index, but the curl example shown above did not work for me.

Thanks for the help both


Good you’ve got it working.

When using curl, did you enclose the https://… part in quotes? Your example doesn’t show any.

(If you did so then the info below won’t apply. I don’t know what’s wrong but for help post exactly which command you issued to curl [without your API key details obviously] and the response you received.)

That might be the reason why you got the same information using a different “start_index”. The Windows command line and many Linux shells treat an unquoted “&” as a command separator, so the command you gave above is split into 3 at each “&” character.

So if you run (using “tesco” as search term):

curl -u myKey: https://api.companieshouse.gov.uk/search?q=tesco&items_per_page=20&start_index=0

This is interpreted as 3 commands:

https://api.companieshouse.gov.uk/search?q=tesco
items_per_page=20
start_index=0

The first will happily give the first page of results (as if you’d set “start_index=0”), and then the system tries to run the other two as separate commands.

When you run:

curl -u myKey: https://api.companieshouse.gov.uk/search?q=tesco&items_per_page=20&start_index=20

…again the shell / command line will split this up and you’ll get the same command as you had before sent via curl:

https://api.companieshouse.gov.uk/search?q=tesco
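
A rough way to see the effect without touching the API: the Python sketch below cuts the URL at the first “&” (which is roughly what curl is left with when the URL is unquoted) and lists the query parameters that survive. The helper name is my own, and shlex.quote only covers POSIX shells; Windows needs double quotes.

```python
import shlex
from urllib.parse import urlsplit, parse_qs

url = ("https://api.companieshouse.gov.uk/search"
       "?q=tesco&items_per_page=20&start_index=20")

# Unquoted, the shell ends the curl command at the first "&",
# so curl is only handed this much of the URL:
unquoted_url = url.split("&", 1)[0]


def query_params(u):
    """Return the query parameters contained in a URL."""
    return parse_qs(urlsplit(u).query)


print(query_params(unquoted_url))  # only q survives
print(query_params(url))           # all three parameters

# shlex.quote adds the quoting a POSIX shell needs (single quotes here).
print("curl -u myKey: " + shlex.quote(url))
```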

Good man @voracityemail, that was exactly the issue. I tried again using double quotes as you suggested and then the indexing worked.

Thanks for that, it might come in handy again. I am new to curl, and in the end I resorted to using Python’s urllib3, but it’s still nice to know for future reference.

Hi!
I am also using start_index and items_per_page to iterate through the pages. However, I get an error when I set the start index to 901 or above (it works with 900 but not with 901). Is there a limit, or am I doing something wrong? And how can I get the rest of the results?
Thanks

Search is tuned to ‘find’ a specific company name. It is not intended to be used to get all company names; we have bulk products for that.
If your search returns too many results, you need to make your search term more specific.

Thanks for the answer, Mark.
I have checked the bulk products and found the company data product. However, the dataset includes only “live companies” and not those which have been dissolved recently. Is there any other product where I can get the complete database, so that I can narrow it down to the companies that I am interested in?

Regarding making the term more specific, how can I do it? When I include more than one term in the search, it returns even more results (results for each term separately, not only results which include both terms at the same time).

Thanks

Can you provide a company name that you are searching for that you cannot find?

So, for instance, this company “BITCOIN ALLIANCE LTD” is not in the dataset for “live companies” available for download, because it was dissolved in February.

Regarding the search terms, I try to make my search more specific but, for instance, if I write q=virtual+coin or q=%22+virtual+coin+%22 I get more results than using only “coin”.
Thanks

When I search the API for BITCOIN ALLIANCE LTD, it is the first one returned, an exact match.
GET /search/companies?q=BITCOIN+ALLIANCE+LTD

There is no company called virtual coin, so you will not find it.

Sorry, I didn’t explain well.

My aim is to get the companies that have been dissolved in the last two years. The type of companies that I am looking at are those related to bitcoins. There is no catalogue of which companies are related to that business, so I am searching Companies House for those which have bitcoin, crypto, virtual coin… in the name (title), not for specific companies.

I know that Bitcoin Alliance LTD is in the API, but it is not in the bulk data available to download here: Companies House. That is why I am using the API to get the companies that can be related to bitcoin, and later filter by the “dissolved” ones. But I cannot get more than 1000 results. I tried to be more specific with the search terms, but it is giving me even more results.

Is it possible to produce a csv with the companies dissolved in the last two years, as there is one for the live companies? Or is there an alternative way of getting this information?
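
For reference, the “filter by dissolved” step I mean is roughly the Python sketch below. The items are made-up examples, and the field names “company_status” and “date_of_cessation” are my assumption about what each search result carries, so check them against a real response.

```python
from datetime import date

# Hypothetical items shaped like search results; the field names
# "company_status" and "date_of_cessation" are assumptions here.
sample_items = [
    {"title": "EXAMPLE COIN ONE LTD", "company_status": "active"},
    {"title": "EXAMPLE COIN TWO LTD", "company_status": "dissolved",
     "date_of_cessation": "2019-02-12"},
    {"title": "EXAMPLE COIN THREE LTD", "company_status": "dissolved",
     "date_of_cessation": "2012-06-30"},
]


def dissolved_since(items, cutoff):
    """Keep items marked dissolved with a cessation date on or after cutoff."""
    kept = []
    for item in items:
        ceased = item.get("date_of_cessation")
        if item.get("company_status") == "dissolved" and ceased:
            if date.fromisoformat(ceased) >= cutoff:
                kept.append(item)
    return kept


recent = dissolved_since(sample_items, cutoff=date(2018, 1, 1))
print([item["title"] for item in recent])
```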

The short answer is that the API search is not intended to do what you are trying to do.
There is no bulk product of dissolved companies either.

Okis,
Thanks for all the help!

There is another product called the DVD ROM product that has 20 years of dissolved companies data on it, but it is chargeable. Details can be found at: About our services - Companies House - GOV.UK

What is the maximum number of results that the CH API returns?

I’ll reply here so your main queries get a chance for Companies House / someone more knowledgeable to pick up.

maximum number of results that the CH API returns

  1. Queries without an items_per_page but which still return a list e.g. company insolvency information, company exemptions, company registers.
  • all the data as far as I’m aware (presumably never going to be very extensive).
  2. Assuming (for lists) by “maximum number of results” you mean “maximum number of items returned in one request”
  • I had thought CH deliberately didn’t make any promises here but I found this thread:
    Data capped limit
    … says “100” - and this is what I’d found experimentally e.g. anything above “items_per_page=100” just returns 100 results. You could always request 500 and see what you get…
  • if you don’t specify “items_per_page” the default seems to be 20.
  3. If you mean “is there a maximum number of results I can get (with multiple API calls)”:
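
Going only by what has been reported in this thread (100 items per request, and errors once start_index goes past 900, i.e. about 1000 results in total), a paging plan can be sketched in Python as below. Both limits are observations from forum posts rather than documented figures, so treat the constants and the function name as assumptions.

```python
MAX_PER_PAGE = 100      # cap reported in this thread, not an official figure
MAX_REACHABLE = 1000    # ceiling reported in this thread, also an assumption


def page_requests(total_results, items_per_page=100):
    """Return (start_index, items_per_page) pairs within the assumed limits."""
    per_page = min(items_per_page, MAX_PER_PAGE)
    reachable = min(total_results, MAX_REACHABLE)
    return [(start, per_page) for start in range(0, reachable, per_page)]


# A hypothetical search reporting 2500 results: requests stop at start_index 900.
plan = page_requests(total_results=2500, items_per_page=500)
print(len(plan), plan[0], plan[-1])
```

Anything beyond that ceiling would need a narrower search term or one of the bulk products.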

Looking at your questions in general you seem to be saying “could the API be changed so I can efficiently replicate the data and keep updated with changes on a notification basis?”

CH repeatedly state that this is not their remit when creating the API. Providing more granularity in the way of searching (nationality) may be down to your own implementation. However:

  1. They provide the company, PSC datasets and accounts (if that interests you) as bulk data. You can also sign up on the forum for officer appointments as bulk data. It seems there’s no bulk dissolved companies data / disqualified officers although some people who post here offer these services themselves. Caveats - the company bulk data set is only updated monthly, the format doesn’t match the API and it doesn’t contain all data that you get e.g. with Company Profile in the API.
  2. They have plans - trailed now for a couple of years, search the forum - for a “streaming API” which sounds like it would give you the required updates on changes.
  3. Probably not useful for your needs but you can sign up to follow companies via email updates when they make a filing.

Aside from CH there are some services mentioned on the forum which may provide you with additional functionality - search around!

Edit: overview of bulk data products - see:

There’s also a DVD of ex-companies (for sale):

Found out about this in the following thread: