5. Key Caveats on the Data
The prior discussion ignores many issues of data quality. The largest of these issues are summarised in this section. While this section of the paper appears somewhat daunting, none of the caveats raised presents an insurmountable obstacle to the use of the data, provided appropriate cautions are attached to outputs. Where feasible, next steps are discussed for researchers wishing to remove these obstacles.
5.1 Longitudinal Firm Continuity
We would like the LBD firm to correspond to an economic definition of a firm, such as the combination of production factors within the span of control of a set of owners. In a cross-sectional sense the Business Frame enterprise satisfies this definition nicely. However, longitudinally, this relationship tends to be weaker because SNZ tracks the continuation of legal units, not firms. Eurostat sets out three criteria for measuring firm continuity – control, economic activity & location – and requires two of these to remain the same for a firm to be described as continuing (Eurostat 2003). However, new legal units may be created on the BF without any of these three continuance criteria being violated. There is scope within the future development of the LBD to repair the longitudinal continuity of firms. This work would improve outputs looking at firm entry and exit, possibly yielding insights into differences between greenfields and mergers & acquisition start-ups (see, eg, Baldwin & Gu 2006).
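The two-out-of-three continuity rule described above can be illustrated with a minimal sketch. The record layout, field names and example values are hypothetical and are not drawn from the BF or the LBD; the code only shows the logic of the rule, not how SNZ or Eurostat implement it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitSnapshot:
    """Hypothetical yearly record for a legal unit (field names are illustrative)."""
    controller: str  # who controls the unit
    activity: str    # main economic activity, eg an industry code
    location: str    # where the unit operates


def is_continuing(before: UnitSnapshot, after: UnitSnapshot) -> bool:
    """Return True when at least two of the three continuity factors are unchanged."""
    unchanged = sum([
        before.controller == after.controller,
        before.activity == after.activity,
        before.location == after.location,
    ])
    return unchanged >= 2


# Illustrative values only.
old = UnitSnapshot("owner_A", "manufacturing", "Auckland")
relocated = UnitSnapshot("owner_A", "manufacturing", "Hamilton")
sold_and_moved = UnitSnapshot("owner_B", "manufacturing", "Hamilton")
```

Under this sketch, a firm that only relocates is treated as continuing, while one that changes both owner and location is treated as a new firm, regardless of whether a new legal unit was registered.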
On a related issue, some caution is necessary in interpreting the productivity of entering and exiting firms, because the data cannot accurately identify the exact dates on which production starts and stops. For example, in Figures 6 and 7, the sharp pick-up (drop-off) in productivity of entrants (exiters) in their first (last) year is perhaps indicative of data issues that could influence the productivity calculation. One such issue is divergence between the timing of administrative tasks, such as GST registration, and the production of goods & services; another is the winding down of financial accounts at business exit. As suggested earlier, these issues are a likely cause of higher rates of missing tax data in years that firms enter and exit the population. Our approach of an inclusive economic activity-based population is also likely to exacerbate these issues by including firms that have very low output.
5.2 Employment Measurement
The estimation of entering and exiting firm labour productivity is further complicated by the fact that working proprietor data is most often only observed annually (ie, there are no part-year counts). The mean (median) entering firm has an RME of 1.64 (0) and working proprietor count of 0.95 (1). Thus the current assumption that working proprietors work the full year has a measurable impact on labour productivity estimates in the first year of activity (and, similarly, the last year). Put another way, if we assumed that working proprietors only worked half the year of start-up and/or exit, the estimated mean labour productivity of entrants (exiters) would exceed the incumbent labour productivity in the year of entry (exit). On the positive side, the inclusion of RME as the labour input accounts for mid-year start-ups/shutdowns, compared to prior studies, which have relied on BF annual snap-shot employment and thus have to assume a labour input pattern over the year for both employees and employers (eg, Maré & Timmins 2006, who carefully test their estimates using both the "full-year" and "half-year" assumptions).38
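The size of the working proprietor effect can be gauged with a rough calculation on the mean entrant figures quoted above. The sketch assumes that total labour input is RME plus the working proprietor count, and that value-added is held fixed; it works with mean inputs rather than the mean of firm-level ratios, so it indicates magnitude only.

```python
def labour_input(rme: float, wp_count: float, wp_year_share: float = 1.0) -> float:
    """Labour input as RME plus working proprietors scaled by the share of the year worked.

    The additive form and the wp_year_share parameter are assumptions for illustration.
    """
    return rme + wp_count * wp_year_share


MEAN_RME = 1.64  # mean entrant RME, as quoted in the text
MEAN_WP = 0.95   # mean entrant working proprietor count, as quoted in the text

full_year = labour_input(MEAN_RME, MEAN_WP)        # proprietors assumed to work all year
half_year = labour_input(MEAN_RME, MEAN_WP, 0.5)   # proprietors assumed to work half the year

# For fixed value-added, labour productivity scales with the inverse of labour input.
productivity_uplift = full_year / half_year
print(f"full-year input: {full_year:.3f}")
print(f"half-year input: {half_year:.3f}")
print(f"uplift factor:   {productivity_uplift:.3f}")
```

On these figures, moving from the full-year to the half-year assumption lowers measured labour input from 2.59 to about 2.12, raising measured labour productivity for the mean entrant by a little over a fifth.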
The fact that the employment data involves simple headcounts will also have a tendency to overestimate labour input (because of part-time workers).39 In the absence of detailed hours worked data, the most common approach to correcting for this issue has been to adjust counts by industry-level average hours worked sourced from either SNZ's Household Labour Force Survey or Quarterly Employment Survey. Such adjustment only improves the comparison across, rather than within, industries, and can usually only be done with confidence at the two- or three-digit industry level.
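A minimal sketch of the industry-level hours adjustment follows. The industry names, the average hours figures and the 40-hour benchmark are all hypothetical; in practice the averages would come from a source such as the Household Labour Force Survey or Quarterly Employment Survey.

```python
STANDARD_WEEKLY_HOURS = 40.0  # assumed full-time benchmark, for illustration only

# Hypothetical industry-average weekly hours per worker.
AVG_WEEKLY_HOURS = {"retail": 30.0, "manufacturing": 40.0}


def hours_adjusted_labour(headcount: float, industry: str) -> float:
    """Scale a headcount by the industry's average hours relative to the benchmark."""
    return headcount * AVG_WEEKLY_HOURS[industry] / STANDARD_WEEKLY_HOURS


# Two firms in the same industry: the adjustment rescales both by the same factor.
small_retailer = hours_adjusted_labour(4, "retail")
large_retailer = hours_adjusted_labour(8, "retail")

# Firms in different industries: the adjustment changes their relative labour input.
manufacturer = hours_adjusted_labour(4, "manufacturing")
```

Because every firm in an industry receives the same scaling factor, the ratio between the two retailers is unchanged, which is why the adjustment improves comparisons across industries but not within them.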
In addition, some thought should be given to whether at least some part-time employment proxy could be established in the data (see Maré & Hyslop 2006 for an example of how this has been done).40 A further issue arises for working proprietors in that some owners of firms will receive taxable income purely as a return on equity, without any labour input being supplied at all. Identifying this subset of owners is difficult.41
5.3 Deflators
As noted earlier, all results in this paper are presented in nominal terms. It is usual, in policy applications, to be primarily interested in real productivity growth. However, as with measuring hours worked, no input or output prices exist at a comprehensive firm level and it is usual to apply industry-level input & output producer price indices to improve the cross-industry comparison of firm performance.
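One way of applying industry-level indices is to deflate sales and purchases separately before taking the difference. The sketch below assumes value-added is simply sales less purchases, and uses made-up index values with a base of 100; it is not the method used in this paper, which reports nominal results only.

```python
def real_value_added(
    sales: float,
    purchases: float,
    output_index: float,
    input_index: float,
    base: float = 100.0,
) -> float:
    """Deflate sales by an output price index and purchases by an input price index.

    The index values and base are hypothetical; stock adjustments are ignored.
    """
    real_sales = sales * base / output_index
    real_purchases = purchases * base / input_index
    return real_sales - real_purchases


# Nominal value-added of 115 with output prices up 10% and input prices up 5%.
nominal_va = 220.0 - 105.0
real_va = real_value_added(220.0, 105.0, output_index=110.0, input_index=105.0)
```

Since all firms in an industry share the same indices, this adjustment again aids comparison across industries rather than capturing firm-specific price movements.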
5.4 Capital Data in BAI Sales and Purchases
The GST-based sales (and purchases) data is potentially contaminated by capital income (expenditure). As the BAI documentation notes:
"…GST sales variable includes other items such as: Sales of second-hand assets… [and] sales of businesses themselves. If they are sold as a going concern the sale is zero-rated. The amount of the sale will still appear in the GST sales variable.
…GST purchases variable also includes: Purchases of land, buildings, plant and machinery etc... [and] purchases of businesses themselves. If the business is sold as a going concern the amount of the sale is not record[ed] as a GST purchase." SNZ (2001)
In a particular year, this capital data could potentially swamp measurement of true firm value-added (productivity). BAI processes to address this issue are only targeted at removing large spikes in values that might affect firm confidentiality in reported outputs.
One area where this capital contamination appears to manifest is in the use of zero-rated sales as a proxy for export earnings (since exports do not attract GST). An earlier version of this paper (Fabling 2007a) used positive zero-rated sales as a proxy indicator of exporting behaviour and found implausibly high growth rates in labour productivity for exporting firms (relative to non-exporters). The figure in Appendix B replicates that earlier analysis for manufacturers in our population. As Fabling concluded:
"It appears from this data that entering exporters drive the difference in growth rates. However, a perhaps more realistic explanation of the high growth rates is that the [BAI-based] export indicator is a poor proxy for measuring true exporting behaviour…"42
Obviously capital contamination is most influential in analyses where the population is restricted to firms with zero-rated GST sales. Fortunately, the rate of entry into (exit out of) zero-rated sales is quite low, with some of that activity presumably related to true export activity. Overall the number of productivity observations that are affected may be small, though this is hard to estimate with certainty without a comprehensive exports measure (ie, including service exports). There is potential to further investigate the importance of the capital contamination issue using IR10 data on sales, gains/losses on disposals of fixed assets and book values of fixed assets, together with better export identification as discussed below (and, perhaps, data held on the LBF regarding transfers of plants between firms).
5.5 Exporter Identification
Where the BAI does have an advantage over Customs data is that it should capture trade in services – clearly an important subject for analysis. One potential way to identify service exporters (that does not rely on BAI data) might be to use IR4 foreign income data combined with BF trade in services (balance of payments) indicators and Customs data.
Though we have not discussed it in detail in this paper, we have a number of exporters in our dataset that, according to our ANZSIC classification, should not be in the business of substantially exporting goods, but are. These firms may be "head offices" (explaining their service-related ANZSIC) in enterprise groups that contain a firm that should, more appropriately, be associated with the exported good (eg, a manufacturing subsidiary). The other way in which group structures create problems is when groups restructure and the exports appear to shift between firms in the group. This issue has direct parallels with the discussion of false entry and exit above.
5.6 Missing Data
Section 2 of the paper noted a number of potential sources of bias arising from missing administrative data. As Table 5 demonstrates, there is currently a stark trade-off between the use of available stock adjustment data in value-added calculation, and retaining labour productivity observations (this trade-off does not apply to the MFP calculation as stock adjustments and depreciation costs are both sourced from IR10s). As noted earlier, previous New Zealand micro firm data analyses have relied on BAI data, and as such benefit from much lower rates of missing data. Linear interpolation is commonly used for imputation in such cases.
There is the possibility of using IR10 returns that are currently discarded because they fail edit checks. The potential here is not insignificant with 154,164 financial performance returns currently discarded & 60,949 financial position returns discarded in the LBD dataset. Further, for the stock adjustment, there is some potential to patch small holes in longitudinal data since IR10s include both opening and closing stocks. Unfortunately such data is not always consistent across returns (ie, opening stocks in one year do not always match closing stocks in the prior year).
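The idea of patching small holes from adjacent returns, and of checking consistency between returns, can be sketched as follows. The data layout (a mapping from year to opening and closing stocks) and the example values are hypothetical and do not reflect the actual IR10 file structure.

```python
from typing import Dict, List, Optional

StockReturn = Dict[str, float]  # {"opening": ..., "closing": ...}


def infer_stock_change(returns: Dict[int, StockReturn], year: int) -> Optional[float]:
    """Closing minus opening stocks for a year, using neighbouring returns if it is missing."""
    own = returns.get(year)
    if own is not None:
        return own["closing"] - own["opening"]
    prev, nxt = returns.get(year - 1), returns.get(year + 1)
    if prev is None or nxt is None:
        return None  # hole cannot be patched from adjacent years
    # Prior year's closing stands in for this year's opening, and
    # next year's opening stands in for this year's closing.
    return nxt["opening"] - prev["closing"]


def inconsistent_years(returns: Dict[int, StockReturn], tolerance: float = 0.0) -> List[int]:
    """Years whose opening stocks differ from the prior year's closing stocks."""
    return [
        year
        for year in sorted(returns)
        if year - 1 in returns
        and abs(returns[year]["opening"] - returns[year - 1]["closing"]) > tolerance
    ]


# Illustrative firm with no 2002 return and a mismatch between 2003 and 2004.
firm = {
    2001: {"opening": 10.0, "closing": 12.0},
    2003: {"opening": 15.0, "closing": 14.0},
    2004: {"opening": 13.0, "closing": 16.0},
}
patched_2002 = infer_stock_change(firm, 2002)
flagged = inconsistent_years(firm)
```

In this example the missing 2002 stock change is recovered from the 2001 closing and 2003 opening values, while 2004 is flagged because its opening stocks do not match the 2003 closing stocks. How such mismatches should be resolved is left open here.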
It may also make sense, for some applications, to use AES data in parallel with IR10. Beyond that it is probably sensible to impute stock changes, arguing that the adjustment is minor for most firms. It is another matter as to whether larger scale imputation of IR10 variables is desirable to extend the number of observations of, say, profitability or MFP. The method required would need to be very carefully thought through, given the high rate of imputed data that would eventuate in the dataset. Since the completion of this paper, SNZ have created imputed data for BAI & IR10 making use of a mix of linear interpolation, donor & historical imputation (SNZ 2007).