Date field data - non-ISO values e.g. "Unknown" / blank

Date fields blank, null or “Unknown”. Examples:

  • 00365818 - companies search has “date_of_cessation”: “Unknown”, company profile just doesn’t have this field. Company “status”: “dissolved”.
  • NF002764 - companies search: “date_of_cessation”: “Unknown”, company profile doesn’t have this field. “company_status”: “converted-closed”
  • SZ000001 - companies search has “date_of_creation”: null, company profile has “date_of_creation”: “”. Company “status”: “active”.

Related issues have been noted before - IIRC we’ve also found epoch-format values in date fields (sorry, no example recorded).

P.S. dates within “description_values”: { “description”} (“legacy” fields) in filing history - presumably these won’t be marked up and will stay as plain data? Example from company 00229606 (some fields removed for clarity):
{
  "description": "legacy",
  "links": {
    "self": "/company/00229606/filing-history/MDAxMjU4NjA0NGFkaXF6a2N4"
  },
  "description_values": {
    "description": "Return made up to 25/07/05; full list of members"
  },
  "type": "363s",
  "date": "2005-08-18",
  "category": "annual-return",
  "transaction_id": "MDAxMjU4NjA0NGFkaXF6a2N4"
}

As an external consumer and refiner of Companies House data I am far from clear what you are trying to achieve and, more importantly, what you expect their systems to supply you with.

Taking 00365818 as the example: this company was dissolved; the last accounts, which were of DORMANT type, covered the accounting period to 2 April 1994. Interpretation of the data suggests it was dissolved in the succeeding 2 or 3 years by Compulsory Strike-off, so it effectively moved to the Companies Closed Register a good 20 years ago.

If you were permitted to use our system you would not, as a user, be allowed to see the closed archive; I think that we are very fortunate to be able to see anything at all like this on the open, ACTIVE, public system.

Frank Murphy, CEO, StatBooks Ltd

This is an issue with the quality and consistency of data on a beta service that we’re testing. What someone is trying to achieve by reporting this issue seems fairly self-evident. Given that Companies House also consumes and refines its own API through the beta search site, raising awareness of these issues benefits them as well.

Taking 00365818 as an example…

The fact that this company has an Unknown date_of_cessation in the company search results means that the beta search site doesn’t know how to deal with it, and they end up with this:

BOWMER & KIRKLAND (PLANT) LIMITED
00365818 - Dissolved on
Stratton House, Piccadilly, London, W1X 6AS

Which is not ideal in terms of presentation and something I’d be surprised if they didn’t want to fix at some point.

It’s also something I wasn’t aware could happen, and potentially would have never been aware of if I hadn’t read this post, so I can go and check how the systems that consume and refine Companies House data, that I’m responsible for, deal with this.

So thanks for posting it!

The point I was trying to make here was: are events, and related data, from 20 years ago for a dissolved company actually relevant to anyone now?

I did not have an issue with the question per se but would have liked, and would still like, to know the rationale(s) that led to it!

This could be relevant in some sectors (legal being the one we’re most likely to deal with, but there may be others), but from my (developer) perspective it’s not about relevancy and availability, but the ability to create an interface that returns…

00365818 - Dissolution date unknown

instead of

00365818 - Dissolved on

or even just crashing, when it encounters something other than the expected data. In the documentation date_of_cessation/creation are both listed as optional date fields in search and profile. The fact that these fields can also be null, or the non-date string Unknown, is an issue whether or not the data itself is relevant.
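For what it’s worth, here is a rough Python sketch of the kind of defensive handling I mean. The function name `cessation_label` is made up for illustration, and it assumes the caller already knows the company is dissolved and is passing in one item from the search results as a dict. It is not how the beta site does it, just one way to avoid the dangling “Dissolved on”:

```python
from datetime import date


def cessation_label(item: dict) -> str:
    """Build a display line for one search result (hypothetical helper)."""
    number = item.get("company_number", "")
    raw = item.get("date_of_cessation")
    try:
        # Only a string that parses as an ISO date is accepted.
        ceased = date.fromisoformat(raw)
    except (TypeError, ValueError):
        # Covers a missing field, null, "" and strings such as "Unknown".
        return f"{number} - Dissolution date unknown"
    return f"{number} - Dissolved on {ceased.isoformat()}"


print(cessation_label({"company_number": "00365818",
                       "date_of_cessation": "Unknown"}))
```

The point of catching both exception types is that a missing or null value and a non-date string fail in different ways, and neither should take the page down.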

OK - I can see the potential relevance to the legal sector but probably in a more recent past than twenty years ago. I may also have to change my views on relevance as well; some around me would say that was a good thing!

I totally agree with you that if the data is available then the interface should actually return that data and not unknown or null. Perhaps the issue here will prove to be actual availability within what was loaded.

I even more totally agree that the interface must not crash on encountering the unexpected.

We are on the same side really.

It will be interesting to see a response on this topic from the Companies House team!

Thanks for replies - my perspective was as a developer wanting to share what I’d found as much as hoping for definitive answers. I’ve learned quite a bit from reading around here. I expect changes here and appreciate that such a large and long-lived dataset will have exceptions. It’s because CH do seem to be helpful and responsive I’m asking whether there are “known unknowns”.

By-the-by I do work for the legal side so “old data” can sometimes be relevant although the particular data is by way of example.

…for a more recent date example, this time in Unix epoch (milliseconds), filing history, company #03888792, again some fields removed:
{
  "associated_filings": [
    {
      "action_date": 1447113600000,
      "category": "capital",
      "date": "2015-11-10",
      "description": "statement-of-capital",
      "description_values": {
        "date": "2015-11-10"
      },
      "original_description": "10/11/15 Statement of Capital;GBP 1339546478.75",
      "type": "SH01"
    }
  ],
  "type": "AR01",
  "description": "annual-return-company-with-made-up-date-no-member-list",
  "date": "2015-11-10",
  "category": "annual-return",
  "barcode": "A4IG6XHU",
  "transaction_id": "MzEzNDg4NjczOGFkaXF6a2N4"
}

Yes, here we get the date correctly in description_values. But does this represent a bug, or will dates be “mostly ISO format, occasionally ‘Unknown’ or null, and rarely epoch”?
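Until there is an answer, a tolerant parser seems the safest bet. Below is a minimal Python sketch covering the three shapes seen so far. The name `parse_ch_date` is my own, and treating bare numbers (or digit-only strings of 11 or more characters) as epoch milliseconds in UTC is an assumption based on the examples in this thread, not on anything documented:

```python
from datetime import date, datetime, timezone
from typing import Optional, Union


def parse_ch_date(value: Union[str, int, float, None]) -> Optional[date]:
    """Return a date, or None when the value is missing or unrecognised."""
    if value is None or isinstance(value, bool):
        return None
    if isinstance(value, (int, float)):
        # Assumption: bare numbers are Unix epoch milliseconds (UTC).
        return datetime.fromtimestamp(value / 1000, tz=timezone.utc).date()
    text = str(value).strip()
    if text.isdigit() and len(text) >= 11:
        # Assumption: long digit-only strings are epoch milliseconds too.
        return datetime.fromtimestamp(int(text) / 1000, tz=timezone.utc).date()
    try:
        return datetime.strptime(text, "%Y-%m-%d").date()
    except ValueError:
        # "", "Unknown" and anything else strptime cannot read as a date.
        return None


print(parse_ch_date(1447113600000))   # the action_date value quoted above
print(parse_ch_date("Unknown"))
```

Returning None for everything unrecognised keeps the "is there a usable date?" decision in one place, so the display code only has to handle two cases.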

Anyway, over to Companies House team.

This is a good piece of drain cover lifting.

It will be most interesting to know what else you find and to hear from the Companies House team.

Keep on lifting those covers!!

I have just come across this problem when searching for ‘Axa’. Some of the records returned have date_of_cessation set to ‘Unknown’.
Since the date format used by Companies House is of the form “2016-12-13” and we want to display it as ‘13-Dec-2016’ this makes it very difficult for us to do the conversion. I’m not sure that even checking the first character is numeric would do the trick since one can apparently get dates in the format ‘1405327611000’.
Would checking that the first two characters are ‘18’, ‘19’ or ‘20’ do it? How far back do Companies House records go?
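Rather than testing the first couple of characters, one option is to match the whole YYYY-MM-DD shape and fall back to a placeholder for anything else. This is only a sketch in Python: `display_date`, the fallback text and the fixed month table are my own choices (the table avoids depending on the machine’s locale), and it deliberately does not try to convert epoch values:

```python
import re
from datetime import date

ISO_DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def display_date(value, fallback="Unknown"):
    """Turn "2016-12-13" into "13-Dec-2016"; anything else gives the fallback."""
    match = ISO_DATE.fullmatch(value) if isinstance(value, str) else None
    if match is None:
        return fallback
    year, month, day = (int(part) for part in match.groups())
    try:
        parsed = date(year, month, day)  # rejects e.g. month 13 or 30 February
    except ValueError:
        return fallback
    return f"{parsed.day:02d}-{MONTHS[parsed.month - 1]}-{parsed.year:04d}"


print(display_date("2016-12-13"))
print(display_date("Unknown"))
print(display_date("1405327611000"))
```

Matching the full pattern means it does not matter which century the record comes from, so the question of how far back the register goes stops being a parsing concern.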

CH seems to have gone a bit quiet on responding here on the documentation front or for data issues. I suspect this will be the case 'till the Swagger-compatible docs are available. However - they clearly do take note. And their documentation is quietly being updated over time.

As noted, I’ve found it’s wise to trap the following:

  • Values of “unknown” and “null” for non-text fields.
  • Values of “null” for objects e.g. address / registered_office_address
  • Integers possibly starting with 0 e.g. month and year in
    date_of_birth.
  • Dates as YYYY-MM-DD or Epoch format (and if you want to poke inside
    fields e.g. filing history good old DD/MM/YY put in by humans).
  • Undocumented constant values in lookup fields.
  • Unusual-looking markdown in e.g. filing history description constants “[ < link text > ]” for links.
  • UK company registration numbers missing leading / internal zeros (for corporate officers / PSCs).
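To illustrate the last item, here is a rough Python sketch for padding company numbers that have lost their zeros. It assumes the target is an 8-character number made of an optional prefix of up to two letters followed by digits (as in 00365818, SC012574 and NF002764 above). The function name `pad_company_number` is hypothetical, and anything that does not fit that shape is returned unchanged rather than guessed at:

```python
import re


def pad_company_number(raw: str) -> str:
    """Restore dropped zeros in a company number (hypothetical helper)."""
    text = raw.strip().upper()
    match = re.fullmatch(r"([A-Z]{0,2})(\d+)", text)
    if match is None:
        return text  # unrecognised shape: leave it alone
    prefix, digits = match.groups()
    width = 8 - len(prefix)  # assumption: numbers are 8 characters in total
    if len(digits) > width:
        return text  # too long to be padded: leave it alone
    return prefix + digits.zfill(width)


print(pad_company_number("365818"))
print(pad_company_number("SC12574"))
```

This only helps when a number is present but truncated; it cannot do anything for entries where the number is missing entirely.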

The company reg. number mentioned last may be missing entirely. This is a nuisance but we’re probably stuck with it - see Officer List return company number for corporates?. So to find company data on such officers we’re back to hoping given names match company names - see Company Name matching question

I think the policy is of “just recording what we get on the form”. (In a perfect world some of these might be validated / corrected where it’s “obvious” what is meant…some things e.g. dates, filings get put into formats…)

I wonder how far back the records go also. There are obviously old companies around. I’ve seen:

SC012574 - Incorporated on 22 February 1923
00289141 - Incorporated on 14 June 1934

I’m not sure the Shore Porters falls under CH remit…

I think the oldest is Marine and General https://beta.companieshouse.gov.uk/company/00000006 I only know that off the top of my head because it gets used in examples a lot!

Where (presumably) underlying data fields are missing / unknown this may affect several fields. An example from Companies search: NF003350 (admittedly this looks like a rather odd old entry).
“date_of_cessation” is “Unknown” and the description is incomplete also:

{
  "date_of_cessation": "Unknown",
  "company_number": "NF003350",
  "description": "NF003350 - Closed on " ...
}

By chance, I encountered another “Closed on …” company earlier today, also an NF company: NF002699

There’s an enumeration for a status of “closed” rather than “closed-on”, which does not require a date, which suggests to me that when the date is unknown it should have been given that status instead.

I’ve been a little frustrated with this stuff today. As we don’t have access to the underlying data or the logic that CH employs to consume the API on the Beta search site there ends up being a lot of trial and error and guesswork involved. Made even more frustrating when we don’t get a response to a lot of documentation/data issues raised here. Please keep posting them though, they’re helpful to me at least!

Yup, definitely sounds like a bug.

It’s been a slightly odd experience working with this system - lack of documentation is annoying but then people have been very helpful (sometimes even CH although not with a direct response).

Roundabouts and swings, and maybe sometime we’ll have it all in Swagger… Good luck!

Late response but are any of those with date_of_cessation set to ‘unknown’ AC / IC / BR companies? CH doesn’t hold info on those. However from data seen I’d then not expect any date_of_cessation field to be returned…

FYI I’ve noted the issues with missing dates in another post on the search resources: