Advanced Search for Company Name, Python and Fuzzy Match

Hi There
I noticed that when I use the advanced search by company name and take the (max) 20 suggested results, they come back in what looks like random order.

To find the result most relevant to my search, I scored each one with a fuzzy match and kept the best match; it worked pretty well. The code doesn't handle the limit of 600 calls per 5 minutes by itself (I split my input into batches instead, as that was more feasible for me).
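
To show the ranking idea on its own, here is a minimal sketch that needs only the standard library. It swaps thefuzz for difflib.SequenceMatcher, so the scores will not be identical to token_sort_ratio; the function names and the sample company names are made up for illustration.

```python
from difflib import SequenceMatcher
from typing import Optional, Sequence


def token_sort_score(a: str, b: str) -> float:
    """Similarity in [0, 100] after lowercasing and sorting the words.

    A stand-in for thefuzz's token_sort_ratio, built on difflib.
    """
    norm_a = " ".join(sorted(a.lower().split()))
    norm_b = " ".join(sorted(b.lower().split()))
    return 100 * SequenceMatcher(None, norm_a, norm_b).ratio()


def best_match(query: str, candidates: Sequence[str]) -> Optional[str]:
    """Return the candidate most similar to the query, or None if there are none."""
    if not candidates:
        return None
    return max(candidates, key=lambda name: token_sort_score(query, name))


# Made-up search results in arbitrary order
results = ["ACME HOLDINGS LIMITED", "ACME TRADING (UK) LIMITED", "TRADING ACME LIMITED"]
print(best_match("Acme Trading Limited", results))
```

Sorting the words first means word order does not affect the score, which is why I picked token_sort_ratio for company names.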

Any suggestions or opinions are welcome, I would be very grateful :)

On top of the typical libraries like pandas, requests and json, you need to install thefuzz and python-Levenshtein.
Also, I struggled to paste in the code properly, so apologies for that.

import requests
import pandas as pd
from thefuzz import fuzz

api_key = "your apikey"
cols = ["company_name", "company_number", "company_status", "registered_office_address.postal_code"]

br = pd.read_csv("/path to your file.csv")
# If using batches, run the code below manually for each batch
# (CH limitation is 600 calls per 5 minutes)
br = br.query("Batches==1")

best_matches = []  # one row (the best match) per searched name
for index, row in br.iterrows():
    qstr = row['AccountName']
    uniqueID = row['Unique_ID']
    url = "https://api.company-information.service.gov.uk/advanced-search/companies"
    # Pass the name as a parameter so requests URL-encodes spaces, "&", etc.
    params = {'company_name_includes': qstr, 'company_status': 'active'}
    response = requests.get(url, params=params, auth=(api_key, ''))
    statusCode = response.status_code
    if statusCode != 200:
        continue  # skip names where the search failed
    responseJson = response.json()
    df1 = pd.json_normalize(responseJson, record_path=['items'])
    df1 = df1.filter(cols)
    df1 = df1.assign(uniquePartyID=uniqueID, partyName=qstr, response_code=statusCode, fuzz=0)
    # Score every suggested company against the name we searched for
    for i, r in df1.iterrows():
        df1.at[i, 'fuzz'] = fuzz.token_sort_ratio(qstr, r['company_name'])
    # Keep only the highest-scoring suggestion
    best_matches.append(df1.sort_values(by=['fuzz'], ascending=False).head(1))

df3 = pd.concat(best_matches).reset_index(drop=True)

df3.to_csv('/path to your file.csv', index=False)
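
As an alternative to running batches by hand, the 600 calls per 5 minutes limit could be respected with a small throttle. This is only a sketch under my own assumptions: the class name and its parameters are hypothetical and not part of any Companies House client, and it only counts calls made by this one script.

```python
import time
from collections import deque


class SlidingWindowThrottle:
    """Block so that at most max_calls are made in any window of `period` seconds."""

    def __init__(self, max_calls=600, period=300.0, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock    # injectable so the class can be tested without waiting
        self.sleep = sleep
        self.calls = deque()  # timestamps of recent calls, oldest first

    def wait(self):
        now = self.clock()
        # Forget calls that have already left the window
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            # Window is full: sleep until the oldest call leaves it
            self.sleep(self.period - (now - self.calls[0]))
            now = self.clock()
            self.calls.popleft()
        self.calls.append(now)
```

The idea would be to create one `throttle = SlidingWindowThrottle()` before the loop and call `throttle.wait()` right before each `requests.get(...)`, so the whole file can be processed in one run.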
