Companies House PSC Daily JSON snapshot - Syntax Error?

Hello,

Since around 26 January, we have been unable to decode the JSON in the
daily Companies House PSC snapshot. Are you aware of any issues, or do
we need to sanitize the data before processing?

We have tried to decode the JSON (convert it into an array) using a few
different methods (PHP, etc.). What would be the best way for us to
approach this?

http://download.companieshouse.gov.uk/persons-with-significant-control-snapshot-2018-01-26.zip

Very Simple Example:

$contents = file_get_contents('persons-with-significant-control-snapshot-2018-03-18.txt');

$json = json_decode($contents);
if ($json === null) {
    echo "invalid";
}

Any suggestions would be welcome.

Thank you!

Steve

I don’t know why things would have changed. I have a file from before the 26th and have just tried the most recent file, and they look the same.

The files were not valid JSON as they stood, IIRC: each file is a series of lines, and each line contains the (valid) JSON for one entry. So (I’ve cropped some of the data with "…"):

{"company_number":"SL009363","data":{"etag":"4304b90e0392aac00d505011031f633f5d5d6832","kind":"persons-with-significant-control-statement", … ,"notified_on":"2017-08-12", … }}
{"company_number":"SL031371","data":{ … }}

So if you want to load them with your code, you could turn the whole file into a JSON array and then decode it, something like:

$contents = file_get_contents('persons-with-significant-control-snapshot-2018-03-18.txt');
// split on any line ending (\r\n, \r or \n), dropping empty lines
$lines = preg_split('/\R/', $contents, -1, PREG_SPLIT_NO_EMPTY);
$contents = '[' . implode(',', $lines) . ']'; // build a json array, no trailing ','

$json = json_decode($contents);
// …
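
An alternative, in case loading the whole file into memory at once is a problem: since each line is a complete JSON document, you could decode the lines one at a time instead of building one big array. This is only a rough sketch. The function name readPscRecords is made up for illustration, and it assumes every non-empty line is one JSON object, as in the sample above.

```php
<?php
// Hypothetical helper (not part of any Companies House tooling):
// yields one decoded record per non-empty line of the snapshot file.
function readPscRecords(string $path): Generator
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        while (($line = fgets($handle)) !== false) {
            $line = trim($line); // strips \r\n, \n and surrounding spaces
            if ($line === '') {
                continue; // skip blank lines
            }
            $record = json_decode($line, true); // true = associative array
            if ($record === null && json_last_error() !== JSON_ERROR_NONE) {
                throw new RuntimeException('Invalid JSON line: ' . json_last_error_msg());
            }
            yield $record;
        }
    } finally {
        fclose($handle);
    }
}

// Usage (assumes the file has been unzipped into the current directory):
// foreach (readPscRecords('persons-with-significant-control-snapshot-2018-03-18.txt') as $record) {
//     echo $record['company_number'], "\n";
// }
```

Checking json_last_error() rather than just comparing against null also tells you why a particular line failed to decode, which may help if some entries do turn out to be malformed.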