Suggestion: put the timepoint on the download pages

From what I gather, snapshots that could be used as a starting point for streaming updates have not been implemented yet. However, you do have pages where I can download something like a snapshot for a couple of kinds of data:

https://download.companieshouse.gov.uk/en_output.html

http://download.companieshouse.gov.uk/en_pscdata.html

I can’t see timepoints in the returned data. Could you put the timepoint on those pages so we can grab the initial data and the timepoint to use for retrieving changes after that?

The timepoint is sent with each event on the streaming APIs, so you don’t need a snapshot for that purpose. If you make a request to https://stream.companieshouse.gov.uk/charges, for example, you will get charge events, each containing its timepoint. If you don’t specify a timepoint in the request, the stream starts from the latest event by default. You can then keep track of the timepoints in your database.
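
As a rough illustration of keeping track of timepoints, here is a minimal Python sketch. It assumes each event arrives as one JSON object per line with the timepoint at event.timepoint, and that a stored value can be passed back as a timepoint query parameter; both are assumptions to check against the streaming API documentation. The function names are hypothetical, and authentication and the HTTP request itself are left out.

```python
import json
from urllib.parse import urlencode

# URL taken from the reply above.
STREAM_URL = "https://stream.companieshouse.gov.uk/charges"


def latest_timepoint(lines, last_seen=None):
    """Return the highest timepoint found in an iterable of JSON event lines."""
    for line in lines:
        line = line.strip()
        if not line:
            # Skip blank lines (for example, keep-alive padding).
            continue
        event = json.loads(line)
        # Assumed layout: {"event": {"timepoint": <int>, ...}, ...}
        timepoint = event["event"]["timepoint"]
        if last_seen is None or timepoint > last_seen:
            last_seen = timepoint
    return last_seen


def resume_url(last_seen):
    """Build the URL for the next request from the stored timepoint."""
    if last_seen is None:
        # No stored timepoint: let the stream start from the latest event.
        return STREAM_URL
    # Assumed query parameter name: "timepoint".
    return STREAM_URL + "?" + urlencode({"timepoint": last_seen})


# Usage with made-up sample lines standing in for a stream response.
sample_lines = [
    '{"resource_id": "a", "event": {"timepoint": 10}}',
    "",
    '{"resource_id": "b", "event": {"timepoint": 12}}',
]
stored = latest_timepoint(sample_lines)
next_url = resume_url(stored)
```

Whether a resumed stream begins at the given timepoint or just after it is not covered here, so it is safest to treat incoming events as upserts so that a repeated event does no harm.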

I was thinking more about starting with a snapshot, then augmenting it each day with a catch-up from the change stream. But I can avoid missing updates in the initial setup by populating from the stream first, then using the next start-of-month drop to fill in any records I haven’t already received from the change stream; that way I know the data will be complete.
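
A minimal Python sketch of that fill-in step, assuming records are held in a dict keyed by an identifier. The key name company_number and the function name are hypothetical, and the real column names in the bulk download may differ. The point is only that rows already received from the stream are never overwritten by the older snapshot.

```python
def fill_from_snapshot(store, snapshot_rows, key="company_number"):
    """Add snapshot rows only for keys the stream has not already supplied.

    store: dict mapping record id -> record, populated from the stream.
    snapshot_rows: iterable of dicts parsed from the monthly download.
    Returns the number of rows added.
    """
    added = 0
    for row in snapshot_rows:
        record_id = row[key]
        if record_id not in store:
            # The stream never sent this record, so take the snapshot copy.
            store[record_id] = row
            added += 1
        # Otherwise keep the streamed version, which is assumed to be newer.
    return added


# Usage with made-up records.
store = {"001": {"company_number": "001", "name": "From stream"}}
snapshot = [
    {"company_number": "001", "name": "From snapshot"},
    {"company_number": "002", "name": "Only in snapshot"},
]
added_count = fill_from_snapshot(store, snapshot)
```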