4. Data
The primary dataset for this study is Statistics New Zealand's prototype Longitudinal Business Database (LBD). The data were accessed in the Statistics New Zealand Data Laboratory under conditions designed to give effect to the security and confidentiality provisions of the Statistics Act 1975. The core of the LBD dataset is the Longitudinal Business Frame (LBF), which provides longitudinal information on all businesses in the Statistics New Zealand Business Frame since 1999, combined with information from the tax administration system. The LBF population includes all economically significant businesses.10
The LBF contains information at both the enterprise level and the plant level. At any point in time, an enterprise will contain one or more plants, and each plant will belong to only one enterprise. Plants are assigned a "permanent business number" (PBN) that identifies them longitudinally. The longitudinal links are established through the application of a number of continuity rules that allow PBNs to be linked even if they change enterprises or tax identifier (Seyb (2003), Statistics New Zealand (2006)). The LBF provides monthly snapshots of an enterprise's industry, institutional sector, business type, geographic location, and employee count.11 For PBNs, there is monthly information on industry, location, and employee count. We apply an enterprise's industry to all plants within the enterprise, which will lead to some imprecision in the estimation of and adjustment for industry productivity differentials.
The LBD is a research database that includes the LBF as well as a range of administrative and survey data that can be linked to the LBF. The primary unit of observation in the LBD is an enterprise observed in a particular year. The current study uses business demographic information from the LBF, linked with financial performance measures (from the Annual Enterprise Survey and various tax returns, including IR10s and the GST-sourced Business Activity Indicator (BAI) data) and with measures of labour input (working proprietor counts from IR10 forms, and employee counts for PBNs from PAYE (pay-as-you-earn income tax) returns, as included in the Linked Employer-Employee Dataset (LEED)).
Labour productivity is measured as current-price value added per worker.12 The primary source of value added is the Annual Enterprise Survey (AES), a postal sample survey supplemented with administrative data from tax sources. We use AES postal returns to provide annual value added for the financial year specified by each firm. This information is available for around 10% of enterprises; these are disproportionately large firms and account for around 50% of total employment in New Zealand.
Where AES information is not available, we use a proxy for value added, based on net sales as reported in GST returns, adjusted for changes in stocks. A measure of stock adjustment is taken from IR10 tax forms, and where this is unavailable, the change in stocks is imputed from the ratio of stock change to sales within each 3-digit ANZSIC industry.
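The value-added proxy can be sketched as below. This is a minimal illustration under stated assumptions, not the procedure run in the Data Laboratory: it assumes that "net sales" means GST-reported sales less GST-reported purchases, and that the industry imputation ratio is aggregate stock change divided by aggregate sales among firms that report a stock change on their IR10. The field names (`gst_sales`, `gst_purchases`, `ir10_stock_change`, `anzsic3`) and the figures are hypothetical.

```python
from collections import defaultdict


def industry_stock_ratios(firms):
    """Per 3-digit industry: total IR10 stock change / total GST sales,
    computed over firms that report a stock change (aggregate ratio assumed)."""
    stock = defaultdict(float)
    sales = defaultdict(float)
    for f in firms:
        if f["ir10_stock_change"] is not None:
            stock[f["anzsic3"]] += f["ir10_stock_change"]
            sales[f["anzsic3"]] += f["gst_sales"]
    return {k: stock[k] / sales[k] for k in sales if sales[k] != 0}


def proxy_value_added(firm, ratios):
    """Value-added proxy: GST net sales (sales less purchases, assumed) plus change in stocks."""
    net_sales = firm["gst_sales"] - firm["gst_purchases"]
    stock_change = firm["ir10_stock_change"]
    if stock_change is None:
        # No IR10 stock figure: impute from the industry stock-change-to-sales ratio
        stock_change = ratios.get(firm["anzsic3"], 0.0) * firm["gst_sales"]
    return net_sales + stock_change


# Hypothetical firms in one illustrative 3-digit industry
firms = [
    {"anzsic3": "C211", "gst_sales": 1000.0, "gst_purchases": 600.0, "ir10_stock_change": 50.0},
    {"anzsic3": "C211", "gst_sales": 1000.0, "gst_purchases": 700.0, "ir10_stock_change": None},
]
ratios = industry_stock_ratios(firms)  # {"C211": 0.05}
print([proxy_value_added(f, ratios) for f in firms])  # [450.0, 350.0]
```

The second firm has no IR10 stock figure, so its stock change is imputed as 5% of its sales, the ratio observed for the first firm in the same industry.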
The GST information comes from Statistics New Zealand's Business Activity Indicator data. In some cases, GST returns are provided for groups of enterprises, or at lower than monthly frequency. In the BAI data, GST return information is allocated to enterprises within groups, and if necessary across time to derive a monthly track. In the current study, we aggregate BAI-sourced value added to group level, to reduce possible measurement error in value added per worker estimates arising from the allocation of group returns to enterprises operating in different locations. Within each group, we deduct value added as measured in AES postal returns from the group's aggregate BAI value added, and allocate this residual value added to non-AES-reporting enterprises in proportion to enterprise labour input, which has the effect of masking some labour productivity differences across enterprises.13
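The group-level allocation can be sketched as follows, assuming each enterprise record carries a labour-input figure and an AES postal value added that is `None` for non-reporters. The function and field names are hypothetical, and labour input is assumed positive for non-AES enterprises. The example illustrates the masking effect: both non-AES enterprises end up with the same value added per worker.

```python
def allocate_group_value_added(group_bai_va, enterprises):
    """Allocate a group's BAI value added across its enterprises.

    AES reporters keep their postal-return value added; the residual
    (group BAI total minus AES value added) is shared among the remaining
    enterprises in proportion to their labour input.
    """
    aes_total = sum(e["aes_va"] for e in enterprises if e["aes_va"] is not None)
    residual = group_bai_va - aes_total
    non_aes = [e for e in enterprises if e["aes_va"] is None]
    non_aes_labour = sum(e["labour"] for e in non_aes)
    allocated = {e["id"]: e["aes_va"] for e in enterprises if e["aes_va"] is not None}
    for e in non_aes:
        allocated[e["id"]] = residual * e["labour"] / non_aes_labour
    return allocated


group = [
    {"id": "A", "labour": 5.0, "aes_va": 400.0},  # AES postal respondent
    {"id": "B", "labour": 2.0, "aes_va": None},
    {"id": "C", "labour": 4.0, "aes_va": None},
]
va = allocate_group_value_added(1000.0, group)
print(va)  # {'A': 400.0, 'B': 200.0, 'C': 400.0}
print({e["id"]: va[e["id"]] / e["labour"] for e in group})  # {'A': 80.0, 'B': 100.0, 'C': 100.0}
```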
A measure of monthly labour input is calculated for each PBN as the sum of rolling mean employment (RME) and a share of working proprietor input in the enterprises to which the PBN belongs. RME is the average number of employees on the PBN's monthly PAYE return in the 12 months of the enterprise's financial year, as recorded in the LEED data. PAYE information is not always provided at the PBN level, and in LEED, there is some allocation of PAYE information to PBNs as outlined in Seyb (2003). The annual number of working proprietors in each enterprise is available in the LEED data, based on tax return information. Labour input from working proprietors is allocated to the PBNs within each enterprise in proportion to the PBN's RME. Where an enterprise has only working proprietors, the working proprietor input is allocated equally across all component PBNs. There is a large number of PBNs in each year for which RME is zero. Labour productivity is undefined for these PBNs unless working proprietor information is also incorporated in labour input. Using working proprietor information increases the number of plants with usable labour productivity information by 80 to 100 percent, and increases labour input by 13 to 20 percent.14
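The labour-input calculation for the PBNs of a single enterprise can be sketched as below. It is a simplified illustration that assumes the 12 monthly PAYE employee counts for each PBN are already available; the function names are hypothetical, and the PAYE-to-PBN allocation described in Seyb (2003) is not modelled.

```python
def rolling_mean_employment(monthly_counts):
    """RME: average PAYE employee count over the 12 months of the financial year."""
    if len(monthly_counts) != 12:
        raise ValueError("expected 12 monthly employee counts")
    return sum(monthly_counts) / 12


def pbn_labour_input(pbn_counts, working_proprietors):
    """Labour input for each PBN of one enterprise: RME plus a share of working proprietors.

    pbn_counts: dict mapping PBN -> 12 monthly employee counts.
    working_proprietors: the enterprise's annual working-proprietor count.
    """
    rme = {pbn: rolling_mean_employment(c) for pbn, c in pbn_counts.items()}
    total_rme = sum(rme.values())
    labour = {}
    for pbn, r in rme.items():
        if total_rme > 0:
            share = r / total_rme  # proprietors allocated in proportion to RME
        else:
            share = 1 / len(rme)   # proprietors only: split equally across PBNs
        labour[pbn] = r + share * working_proprietors
    return labour


print(pbn_labour_input({"P1": [6] * 12, "P2": [2] * 12}, working_proprietors=2))
# {'P1': 7.5, 'P2': 2.5}
print(pbn_labour_input({"P1": [0] * 12, "P2": [0] * 12}, working_proprietors=3))
# {'P1': 1.5, 'P2': 1.5}
```

The second call shows why proprietor input matters: both PBNs have zero RME, so without working proprietors their labour productivity would be undefined.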
For each year from 1999/2000 to 2005/06 (referred to as 2000 to 2006 respectively for the remainder of the paper), we select plants that belong to an enterprise that: a) is always private-for-profit; b) is never a household or located overseas; c) has non-missing industry information; and d) is not in the 'Government Administration and Defence' industry.15
We exclude plants for which location (area unit, territorial authority, or regional council) information is missing, and plants in area units outside territorial authorities (islands and inlets). In order to maintain a consistent population that can support geographic tabulations and maps later in the paper while protecting confidentiality, some additional exclusions16 are applied. Finally, we drop observations where labour input is zero, and the roughly 0.5 percent of plant observations for which the absolute value of value added per worker exceeds $1m.
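The selection rules can be summarised as two filters, sketched below. The record layout is hypothetical: it assumes each enterprise record already summarises its history (for example, the set of institutional sectors it was ever observed in), and the confidentiality-driven exclusions mentioned above are not modelled.

```python
GOVT_ADMIN = "Government Administration and Defence"


def enterprise_eligible(ent):
    """Criteria (a)-(d), applied to a summary of the enterprise's history."""
    return (
        ent["sectors"] == {"private-for-profit"}  # (a) always private-for-profit
        and not ent["ever_household"]             # (b) never a household...
        and not ent["ever_overseas"]              # ...and never located overseas
        and ent["industry"] is not None           # (c) industry information present
        and ent["division"] != GOVT_ADMIN         # (d) excluded industry
    )


def plant_eligible(plant):
    """Plant-level exclusions (confidentiality-driven exclusions not modelled)."""
    location = ("area_unit", "territorial_authority", "regional_council")
    if any(plant[k] is None for k in location) or plant["au_outside_ta"]:
        return False
    if plant["labour"] <= 0:  # drop zero labour input
        return False
    # drop plants whose |value added per worker| exceeds $1m
    return abs(plant["value_added"] / plant["labour"]) <= 1_000_000


ent = {"sectors": {"private-for-profit"}, "ever_household": False,
       "ever_overseas": False, "industry": "C211", "division": "Manufacturing"}
plant = {"area_unit": "AU1", "territorial_authority": "TA1", "regional_council": "RC1",
         "au_outside_ta": False, "labour": 2.0, "value_added": 150_000.0}
print(enterprise_eligible(ent), plant_eligible(plant))  # True True
```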
4.1 Firm performance by location
The geographic location of economic activity is better captured by the location of PBNs than by the location of enterprises. However, value added is available only for enterprises. We therefore estimate firm performance by geographic location by allocating enterprise value added to PBNs in proportion to each PBN's labour input. This approach constrains value added per worker to be constant within enterprises, which reduces measured geographic differences in productivity. Where enterprise value added is obtained from BAI group returns, the averaging is more severe: labour productivity is constrained to be constant across all PBNs of the non-AES-reporting enterprises within a group.
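A small sketch of the enterprise-to-PBN allocation shows why it compresses geographic differences: whatever the location of each PBN, its allocated value added per worker equals the enterprise average. The function name and figures are hypothetical.

```python
def allocate_to_pbns(enterprise_va, pbn_labour):
    """Split enterprise value added across PBNs in proportion to labour input."""
    total = sum(pbn_labour.values())
    return {pbn: enterprise_va * lab / total for pbn, lab in pbn_labour.items()}


# Hypothetical enterprise with one PBN in each of two areas
labour = {"PBN_area1": 7.5, "PBN_area2": 1.5}
va = allocate_to_pbns(900.0, labour)
print(va)  # {'PBN_area1': 750.0, 'PBN_area2': 150.0}
print({p: va[p] / labour[p] for p in va})  # {'PBN_area1': 100.0, 'PBN_area2': 100.0}
```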
The allocation of enterprise or group value added to PBNs is complicated by the fact that, within a financial year, a PBN may belong to more than one enterprise, and an enterprise may belong to more than one group. Appendix B summarises the treatment of such cases.
To control for the impact of averaging, mean productivity by location is derived from an employment-weighted regression of labour productivity on a set of location share covariates. Average value added per worker (VAPW) within a group (s) is the employment-weighted average of location-specific value added per worker. Because we know the share of each group's employment that is in each location, we can statistically recover the underlying location-specific VAPW using the following regression specification:

$$VAPW_{st} = \sum_{j} \gamma_{j} \left( \frac{E_{sjt}}{E_{st}} \right) + \varepsilon_{st} \qquad (1)$$

The subscript s here refers to a set of plants over which value added per worker has been averaged, $E_{sjt}$ is the employment of s located in area j in year t, and $E_{st}$ is the total employment of s. The term in brackets therefore captures, for each area j, the proportion of employment in s that is located in area j. If each group operated in only one location, this would be equivalent to including a dummy variable for each location. The resulting estimated coefficients $\gamma_j$ are estimates of the underlying mean productivities in each location. The term $\varepsilon_{st}$ captures idiosyncratic productivity in the set of plants s beyond what can be explained by average productivities in the locations in which the plants operate. Equation (1) is estimated separately by year.
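The share regression in equation (1) can be sketched as a weighted least-squares problem. This is a minimal illustration with synthetic, noise-free data, assuming the weights are total employment of each set s; it is not the estimation code used in the paper. No intercept is included because the location shares in each row sum to one, so a constant would be collinear with them.

```python
import numpy as np


def location_productivity(shares, vapw, employment):
    """Employment-weighted least squares of VAPW on location employment shares.

    shares: (n_sets, n_locations) array; row s holds E_sj / E_s and sums to 1.
    vapw: (n_sets,) averaged value added per worker for each set s.
    employment: (n_sets,) total employment E_s, used as regression weights.
    Returns one coefficient per location (its estimated mean productivity).
    """
    w = np.sqrt(employment)  # WLS via rescaling rows by sqrt(weight)
    coef, *_ = np.linalg.lstsq(shares * w[:, None], vapw * w, rcond=None)
    return coef


# Synthetic example: three locations with known mean productivities
true = np.array([120.0, 90.0, 80.0])
shares = np.array([
    [1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0],
    [0.0, 0.25, 0.75],
    [0.0, 0.0, 1.0],
    [0.2, 0.3, 0.5],
])
vapw = shares @ true  # each set's VAPW is the share-weighted average of location means
employment = np.array([10.0, 40.0, 25.0, 5.0, 60.0])
est = location_productivity(shares, vapw, employment)
print(np.round(est, 6))
```

With real data the residual term would be non-zero and the recovered coefficients would be estimates rather than exact values; the synthetic case only confirms that the share covariates unwind the averaging.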
4.2 Adjusting for differences in industry composition
Some of the differences in productivity across areas may be due to the different mix of industries in different areas. To gauge the significance of such industry composition differences, we also calculate an alternative set of estimated locational premia based on a regression similar to equation (1), but including a set of 3-digit industry dummies:

$$VAPW_{st} = \sum_{j} \gamma_{j} \left( \frac{E_{sjt}}{E_{st}} \right) + \sum_{k} \delta_{k} D_{sk} + \varepsilon_{st} \qquad (2)$$

where $D_{sk}$ equals 1 if s is in 3-digit industry k and 0 otherwise. The equation again contains a full set of share covariates. The added industry intercepts ($\delta_k$) are estimated relative to national means and thus have zero mean, so the estimated $\gamma_j$ coefficients capture the level of local productivity that would be observed if each location had the national industry mix. The adjustment removes the influence of cross-industry differences, such as differences in capital or energy intensity, and identifies geographic productivity premia solely from within-industry variation across locations. Equation (2) is estimated separately by year.
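Because the location shares and the industry dummies each sum to one in every row, the industry-adjusted regression (equation 2) is not identified without a normalisation. One way to impose the zero-mean condition, sketched below, is to drop a reference industry, estimate, and then re-centre the industry intercepts on their national employment-weighted mean, shifting every location coefficient by the same amount so fitted values are unchanged. The employment weighting of the mean and the function name are assumptions for illustration, not details confirmed by the paper.

```python
import numpy as np


def industry_adjusted_location_productivity(shares, industry, vapw, employment):
    """Share regression with industry intercepts re-centred to a weighted zero mean.

    shares: (n, J) location employment shares; industry: (n,) integer codes 0..K-1.
    Returns (gamma, delta): location levels at the national industry mix, and
    industry intercepts with employment-weighted mean zero.
    """
    n_ind = int(industry.max()) + 1
    dummies = np.eye(n_ind)[industry][:, 1:]  # drop industry 0 to avoid collinearity
    X = np.hstack([shares, dummies])
    w = np.sqrt(employment)
    coef, *_ = np.linalg.lstsq(X * w[:, None], vapw * w, rcond=None)
    n_loc = shares.shape[1]
    gamma = coef[:n_loc]
    delta = np.concatenate([[0.0], coef[n_loc:]])
    # National employment share of each industry, used to centre the intercepts
    ind_weights = np.bincount(industry, weights=employment, minlength=n_ind) / employment.sum()
    mean_delta = ind_weights @ delta
    # Shares sum to 1, so adding mean_delta to every gamma offsets subtracting it from delta
    return gamma + mean_delta, delta - mean_delta


shares = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
industry = np.array([0, 0, 1, 1, 1])
vapw = shares @ np.array([100.0, 80.0]) + np.array([0.0, 20.0])[industry]
employment = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
gamma, delta = industry_adjusted_location_productivity(shares, industry, vapw, employment)
print(np.round(gamma, 6), np.round(delta, 6))
```

In this synthetic case industry 1 holds 80% of employment and is 20 units more productive, so each location's industry-mix-adjusted level is its industry-0 level plus 0.8 × 20 = 16.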
Having obtained estimates of average labour productivity within each area, we calculate the relative performance of Auckland or of areas within Auckland by dividing the area's average labour productivity ($\hat{\gamma}_j$) by labour productivity averaged over all areas outside the Auckland region.
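The relative-performance ratio can be sketched as below. The text does not say how the non-Auckland benchmark is averaged; this sketch assumes an employment-weighted mean of the estimated area productivities, and the area labels and figures are hypothetical.

```python
def relative_productivity(gamma, employment, auckland_areas):
    """Each area's estimated productivity divided by the non-Auckland benchmark.

    gamma: dict area -> estimated mean value added per worker.
    employment: dict area -> employment, used to weight the benchmark (assumed).
    auckland_areas: set of areas belonging to the Auckland region.
    """
    outside = [a for a in gamma if a not in auckland_areas]
    total_emp = sum(employment[a] for a in outside)
    benchmark = sum(gamma[a] * employment[a] for a in outside) / total_emp
    return {a: gamma[a] / benchmark for a in gamma}


gamma = {"AKL-1": 120.0, "AKL-2": 90.0, "OTHER-1": 100.0, "OTHER-2": 80.0}
employment = {"AKL-1": 50.0, "AKL-2": 30.0, "OTHER-1": 100.0, "OTHER-2": 100.0}
rel = relative_productivity(gamma, employment, auckland_areas={"AKL-1", "AKL-2"})
print({a: round(r, 3) for a, r in rel.items()})
# {'AKL-1': 1.333, 'AKL-2': 1.0, 'OTHER-1': 1.111, 'OTHER-2': 0.889}
```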
This approach can be extended to adjust for other plant-level differences that may influence productivity, although in the current paper, industry composition is the only adjustment made.