API returns a total_count of '0' when there are definitely more

carl_jabbour · September 21, 2016, 2:26pm

Hey everyone, I’ve been trying to set up an application using the API but have run into some problems, and I’m hoping somebody can help. I am using Python to retrieve the data.

Basically, I have a text file containing a list of company numbers that I want the API to look up. Here is my code:

import requests

input_file = open('company.txt', 'r')
for x in input_file:
    url = 'https://api.companieshouse.gov.uk/company/{}/filing-history'.format(x)
    r = requests.get(url, auth=('AUTH', ''))
    data = r.json()
    print(data)

This cycles through all the entries in the file and, for every line, constructs a new URL which is then sent to the API. The problem is that the API only really processes the first one on the list, even though I know for a fact that each line is being cycled through individually. This is what happens:

https://api.companieshouse.gov.uk/company/06495921
/filing-history
{'filing_history_status': 'filing-history-available', 'total_count': 0, 'items_per_page': 25, 'start_index': 0, 'items': []}

https://api.companieshouse.gov.uk/company/03778604/filing-history
{'filing_history_status': 'filing-history-available', 'total_count': 137,

As you can see, one of the URLs gets sent off to the API but returns 0 results, even though I know this is not true, while the other returns 137. It’s very odd because it doesn’t matter what company numbers I put into the text file; at least one of them comes back blank.

I have tested each company individually and lots of results are returned; it’s only when I try to run them one after the other that I get this error.

Furthermore, if I run a script that makes two such queries one right after the other, with the URLs written out directly, the API returns the results exactly as required. This is the code:

import requests
import json

url = 'https://api.companieshouse.gov.uk/company/03778604/filing-history'

r = requests.get(url, auth=('AUTH', ''))

data = r.json()

print(data)

url = 'https://api.companieshouse.gov.uk/company/00235446/filing-history'

r = requests.get(url, auth=('AUTH', ''))

data = r.json()

print(data)

Do you think this is a problem on my side, or perhaps something elsewhere?

Any help is greatly appreciated.

carl_jabbour · September 21, 2016, 2:56pm

OK, I think I’ve figured out the cause. Basically I was being stupid: I didn’t realise that one of the API requests had a newline between the company number and the ‘/filing-history’, rendering the request invalid.

This is because when the code reads the text file, it also copies over the newline character ‘\n’ at the end of each line and adds that to the API request, hence the mix-up.
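
To illustrate, here is a minimal sketch of the effect. It assumes io.StringIO as a stand-in for company.txt, and a file whose last line has no trailing newline; the two company numbers are the ones from the output above.

```python
import io

# Stand-in for open('company.txt', 'r'). Iterating over a text file yields
# each line with its trailing newline still attached.
fake_file = io.StringIO("06495921\n03778604")

template = 'https://api.companieshouse.gov.uk/company/{}/filing-history'
urls = [template.format(line) for line in fake_file]

for url in urls:
    # repr() makes the otherwise invisible newline show up as \n
    print(repr(url))
```

Only the last line escapes the problem, because nothing follows it in the file.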

Looking for a solution online now, but it’s a very simple problem.

mfairhurst · September 21, 2016, 4:03pm

@carl_jabbour

My suspicion is that you have a carriage return/newline on the end of the company number in your company.txt file. You are then effectively calling the API with a company number that includes the CR or linefeed. As this company doesn’t exist, we should return a 404 “not found” status code, but we are instead returning the message you referenced in your post, with a total count of 0.

I have tested using the code below with and without the x = x.replace line and have replicated what you are experiencing.

#!/usr/bin/env python

import requests

input_file = open('company.txt', 'r')
for x in input_file:
    x = x.replace("\n", "")
    url = 'https://api.companieshouse.gov.uk/company/{}/filing-history'.format(x)
    print(url)
    r = requests.get(url, auth=('<<auth_key>>', ''))

    data = r.json()

    print(data)
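
A slightly more defensive variant, offered only as a sketch: str.strip() also removes a trailing carriage return (for example in files saved on Windows) and any surrounding spaces, and blank lines are skipped. The helper name build_urls is made up for this example and is not part of any library.

```python
URL_TEMPLATE = 'https://api.companieshouse.gov.uk/company/{}/filing-history'


def build_urls(lines):
    """Return one filing-history URL per non-blank line (hypothetical helper)."""
    urls = []
    for line in lines:
        number = line.strip()  # drops '\n', '\r\n' and surrounding spaces
        if not number:         # skip blank lines
            continue
        urls.append(URL_TEMPLATE.format(number))
    return urls


# Works on any iterable of lines, for example an open file:
#   with open('company.txt') as input_file:
#       urls = build_urls(input_file)
print(build_urls(["06495921\r\n", "\n", "03778604"]))
```

Each URL can then be passed to requests.get as in the loop above.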

Hope this helps

Thanks

@mfairhurst