Software for importing Company Director Appointments

Hello,

We (our research team) will use the open data from Companies House for academic purposes. To facilitate processing of the company director appointments and company profiles, I tried to develop an importer program. The intention is to store the records in a highly structured database with tables for company profiles, directors, and appointments. However, things didn’t really work out as intended.

Has anyone else struggled with the documentation and inconsistencies in the bulk snapshots and daily updates?

I would greatly appreciate a quick look at your code, to find out where I made a mistake!

Thank you.

Sincerely,
Timo Acden

Hi,

It might be easier for people to comment if you could say a little more about how your program works, and what issues you have come across.

documentation and inconsistencies in the bulk snapshots and daily updates?

As was said, it might be more helpful if you could give examples of:
a) what you wanted to achieve
b) what “didn’t really work as intended”
c) what documentation you were looking at.

Caveat - we don’t - yet - make heavy use of the bulk data but do use this for “investigation”.

Documentation - I’m only aware of (please post if you know more):

API
I don’t know whether anyone else wants to post their code, but if you Google about there are some libraries now for CH access (at least via the API) and plenty of sample code on the forum.

True. My apologies for being somewhat cryptic.

The issue that we are facing involves the daily update files for company profiles and company director appointments. Problems arise when determining whether a row in the update file is an update to an existing data entry, a correction, or a new data entry. The documentation does cover these cases, but for a reason we have not identified, the result of our processing does not match what the online CH application shows.

Do you have, perhaps, other documentation regarding these update files?

Hello,

Thank you for the links. I was indeed able to find plenty of documentation for the API. However, we have access to the bulk snapshots and update files, which are processed differently from data retrieved via the API.

However, the code that turns up on Google mainly relates to the API or to the monthly company profiles archive.

I saw your other post. I’ll let that get picked up by others, e.g. CH, but I think you should be able to map these all to something in the API constants at GitHub - companieshouse/api-enumerations

So:

  • APP_DATE_ORIGIN - not sure but the officer appointments data structure in the API differentiates between “appointed_before” and “appointed_on”, according to another field ( “is_pre_1992_appointment”).
  • APPOINTMENT_TYPE - should correspond directly to one of the officer_role constants (main constants file) at the API enumerations site.
  • Prefix for COMPANY_NUMBER - represented by company_type (and possibly company_subtype) in the API main constants file. Mostly this is straightforward, with two complications:
  1. The API constants classify a “company” by kind (e.g. private / public), as one of a range of constants. The company number prefix instead records which country (England / Wales vs. Scotland vs. Ireland - for which there are two prefixes) the company was incorporated in, and the England / Wales ones don’t have a specific prefix but just have digits for all 8 characters.
  2. For ICVC types there are various sub-options (as represented in the API constants).
    I’ve listed JSON below - with no guarantees! - which should be correct as of now and will map from prefix to CH constant as far as they go. Working out sub-types is up to you! For UK companies the “00” dummy prefix is what you need.
  • COMPANY_STATUS should map on to some combination of company_status / company_status_detail in the main API constants.

###JSON for company prefix → API constants###
The const field is one or more API company type constant values; type is either the text which this represents or, where there are several, something meaningful; text is the full text; and ctry is the text for the country with jurisdiction (CH have a constant for this too).
(CH limits the file types you can upload; this should be a .json file.)
companyPrefixToType_20181029.txt (8.8 KB)
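
To show how such a prefix lookup might be used, here is a minimal Python sketch. It is not the content of the attached file: the three entries are illustrative placeholders, the layout (one object per prefix) is an assumption, and the helper names company_prefix and lookup_company_type are made up for this example. Padding short all-digit numbers with leading zeros is also an assumption, in case an export has dropped them.

```python
import json

# Illustrative entries only, NOT copied from the attachment.
# Assumed layout: one object per prefix, carrying the const / type / text / ctry
# fields described above. The "const" lists here are deliberately incomplete.
SAMPLE_MAPPING_JSON = """
{
  "00": {"const": ["ltd", "plc"], "type": "UK company",
         "text": "Company with an all-digit number", "ctry": "England/Wales"},
  "SC": {"const": ["ltd", "plc"], "type": "UK company",
         "text": "Company registered in Scotland", "ctry": "Scotland"},
  "OC": {"const": ["llp"], "type": "llp",
         "text": "Limited liability partnership", "ctry": "England/Wales"}
}
"""

PREFIX_MAP = json.loads(SAMPLE_MAPPING_JSON)


def company_prefix(company_number: str) -> str:
    """Return the two-character prefix, or the dummy "00" for all-digit numbers."""
    # Assumption: numbers that lost leading zeros are padded back to 8 characters.
    number = company_number.strip().upper().zfill(8)
    head = number[:2]
    # Prefixes such as "SC" or "R0" contain a letter; plain numbers do not.
    return "00" if head.isdigit() else head


def lookup_company_type(company_number: str):
    """Return the mapping entry for a company number, or None if the prefix is unknown."""
    return PREFIX_MAP.get(company_prefix(company_number))


if __name__ == "__main__":
    for number in ("SC123456", "01234567", "OC301234", "ZZ000001"):
        print(number, company_prefix(number), lookup_company_type(number))
```

Returning None for an unknown prefix keeps unexpected rows visible, so they can be logged instead of being silently assigned a type.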

Examples of people using the API / bulk data (disclaimer - these are nothing to do with me and I’ve not run any of the code):

Mapping politicians to company ownership as well as running a lot of statistics looking at data integrity (using Jupyter Notebooks / python):

Build a graph of the network of relationships between officers. Written in R: