The simplest way may be to request the data in bulk from Companies House on this thread:
…otherwise I believe that yes, you'll need the API, but that is not designed for obtaining "bulk" data (e.g. just downloading their whole data set). I think it is intended for more limited, specific queries about particular companies, officers etc.
With the API you are limited to 600 requests per 5 minutes. One request can pull a list of all company officers for a company, so 1 million companies would take the best part of 6 days flat out (1,000,000 / 600 × 5 minutes ≈ 5.8 days). So it really depends on how often you pull down company data and how often you need a refresh.
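
If it helps, here is a rough sketch of that arithmetic in Python, so you can plug in your own numbers. It assumes exactly one request per company and that you run at the limit the whole time; the function names are made up for illustration and are not part of any Companies House library.

```python
import math

# Rate limit quoted above: 600 requests per 5 minutes.
LIMIT = 600
WINDOW_SECONDS = 5 * 60


def days_to_fetch(total_requests, limit=LIMIT, window_seconds=WINDOW_SECONDS):
    """Estimate days needed when running flat out at the rate limit.

    Assumes one request per company (hypothetical simplification).
    """
    windows = math.ceil(total_requests / limit)  # number of 5-minute windows
    return windows * window_seconds / 86_400     # 86,400 seconds per day


def seconds_between_requests(limit=LIMIT, window_seconds=WINDOW_SECONDS):
    """Even spacing between requests that stays within the limit."""
    return window_seconds / limit


print(round(days_to_fetch(1_000_000), 2))  # days for 1 million companies
print(seconds_between_requests())          # delay to sleep between calls
```

Spacing requests about half a second apart keeps you under the limit without having to track the 5-minute window yourself, though in practice retries and slow responses will push the total beyond this estimate.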