Matching persons from Prod195 to records in the /officers stream

We have used the officer bulk export to make a database of officers, and I would like to keep it up to date using the officer stream.

In the bulk product, officers are identified by a “person number” (eg 047170140001), but this is not present in the JSON objects sent in the streaming API. I’m guessing there is some mapping from “person number” to “resource_id” in the JSON.

Is that mapping documented somewhere?

There was this previous question, “Officer stream lacks person number”, where there was a suggestion of just using the Prod198 files … but it would be much simpler and more modern to be able to use the streaming API for this.

Cheers,
Steve

As far as I’m aware, the person_number field is now included in the streaming API responses so you can use this to match the records. The difficulty lies in tracking when a person number has been incremented/merged and finding the correct record to apply the update to. The ‘fields_changed’ property on the streaming API objects is never populated so it may be complicated to use the streaming API to update the table you’ve built. Perhaps @MArkWilliams might be able to share some insight?

Thanks for the reply. Yeah I just assumed there would be some way to match on the ID … because if we can’t handle that case of person number changing, it kinda contradicts the line on the streaming API documentation …

For this purpose, Companies House produces snapshot datasets that you can import into a database, and subsequently keep current with the streaming API.

Or am I using the wrong snapshot for this? I’m assuming we are not the only people to have bumped into this.

Totally keen for any insight @MArkWilliams

Yeah, I’ve been wondering about this for some time now so I was hoping that you might get a useful response but it seems like we might be out of luck. I’m not sure if there are any Companies House employees on this forum other than @MArkWilliams so unless someone here has a solution and sees this thread, the best thing to do may be to contact the support team. Please do update me if you come across any solutions as my team and I are in a similar position to you (at the minute, we’re working out some logic to determine which record to update when officer person records are merged and the person number is completely different).

The person_number field can be used to match data from the bulk file with data from the JSON API (REST and streaming).
This was not possible until 6 April 2024, when they added this field to API responses.

The first 8 digits stay the same when the last 4 get incremented. This allows matching across updates or changes (eg to address).
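As a rough illustration of that matching rule, here is a minimal Python sketch. It assumes person numbers are 12-digit strings like the 047170140001 example earlier in the thread, and that they are stored as strings so the leading zero survives. The function names are made up for this example and are not part of any Companies House API.

```python
from typing import Tuple


def split_person_number(person_number: str) -> Tuple[str, str]:
    """Split a person number into (first 8 digits, last 4 digits).

    Assumes a 12-digit string, as in the example from this thread.
    """
    if len(person_number) != 12 or not person_number.isdigit():
        raise ValueError(f"unexpected person number: {person_number!r}")
    return person_number[:8], person_number[8:]


def same_person(a: str, b: str) -> bool:
    """True when two person numbers share the same first 8 digits."""
    return split_person_number(a)[0] == split_person_number(b)[0]
```

Indexing your table on the first 8 digits would let an incremented suffix still find the existing row. It would not help with the merge case, where those 8 digits change as well.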

While it’s true that the first 8 digits of the person number almost always remain the same, there is a specific case where the person number can be changed entirely. This occurs when two person records are matched against each other and merged by Companies House. In this case the first 8 digits become completely different, and I’m not clear how we are supposed to determine which record to update, or whether this is an entirely new appointment. @ebrian101 do you have any thoughts on how to handle merged person records without being told what the old person number was?

Aha thanks for explaining that. No, I’ve not found a way of doing this via the streaming API.
The daily bulk update files for the officer data set on the FTP server do contain the old person number and the new person number, but this defeats the purpose of the streaming API.

Each row in the Prod198 file has both old person number and new person number.
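If you do end up reading those old/new pairs, applying them to a table keyed by person number could look something like the sketch below. It deliberately does not parse the Prod198 file itself, since the layout is not described in this thread. It assumes you have already extracted (old, new) pairs by some means, and that your officers live in a dict keyed by person number. The function name and the choice to keep an existing record on a merge are assumptions for illustration only.

```python
from typing import Dict


def apply_renumbering(officers: Dict[str, dict], old_number: str, new_number: str) -> bool:
    """Re-key one officer record from old_number to new_number.

    Returns True if a record was found under old_number, False otherwise.
    Assumption: if new_number already exists (e.g. two records merged),
    the existing record is kept and the old one is simply dropped.
    """
    record = officers.pop(old_number, None)
    if record is None:
        return False
    # setdefault only stores the moved record when new_number is not present yet
    officers.setdefault(new_number, record)
    return True
```

How to combine the two records on a merge depends on what you store, so treat the "keep the existing one" behaviour as a placeholder.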

Thanks for the comments folks,

I think for how we are using the data we might get away with not being able to handle that case. It is a little frustrating that we can’t use the streaming API the way it was intended. If all the bulk downloads were like the PSCs this would be pretty easy.