The ‘dirty dozen’ of freshwater science: detecting then reconciling hydrological data biases and errors

Sound water policy and management rest on sound hydrometeorological and ecological data. Conversely, unrepresentative, poorly collected, or erroneously archived data introduce uncertainty regarding the magnitude, rate, and direction of environmental change, in addition to undermining confidence in decision-making processes. Unfortunately, data biases and errors can enter the information flow at various stages, starting with site selection, instrumentation, sampling/measurement procedures, and postprocessing, and ending with archiving systems. Techniques such as visual inspection of raw data, graphical representation, comparison between sites, outlier and trend detection, and referral to metadata can all help uncover spurious data. Tell-tale signs of ambiguous and/or anomalous data are highlighted using 12 carefully chosen cases drawn mainly from hydrology ('the dirty dozen'). These include evidence of changes in site or local conditions (due to land management, river regulation, or urbanization); modifications to instrumentation or inconsistent observer behavior; mismatched or misrepresentative sampling in space and time; and treatment of missing values, postprocessing, and data storage errors. As well as raising awareness of pitfalls, recommendations are provided for uncovering lapses in data quality after the information has been gathered. It is noted that error detection and attribution are more problematic for very large data sets, where observation networks are automated, or when various information sources have been combined. In these cases, more holistic indicators of data integrity are needed that reflect the overall information life-cycle and application(s) of the hydrological data. WIREs Water 2017, 4:e1209. doi: 10.1002/wat2.1209

© 2017 Wiley

INTRODUCTION
High-quality hydrometeorological measurement contributes to high-quality policies and management of natural resources. Examples of data-sensitive (hydro-)decisions include: compliance monitoring for environmental regulation; water resource allocation between riparian states; planning, design and investment in long-lived water infrastructure; post-project evaluation; safety and performance reviews of critical infrastructure. All such activities rely on high-integrity data collection and archiving processes. Conversely, poor measurement and information management practices can seriously undermine confidence in data. 1 International bodies such as the World Meteorological Organisation provide detailed guidelines on best measurement practices, beginning with how to choose a site for a meteorological station, followed by protocols for site maintenance and instrument use. 2 Likewise, seminal texts such as Streamflow Measurement 3 and Hydrology in Practice 4 explain the strengths and weaknesses of different types of equipment for measuring water balance terms. These points of reference are intended to avoid erroneous practices before they occur; there is surprisingly little advice on how to discern lapses in sound practice after the information has been gathered. Of course, there are quality assurance systems to protect the veracity of data holdings in major collections such as the UK National River Flow Archive (NRFA). 5 But even these systems are fallible: erroneous entries can still slip through automated checking procedures when data values lie within plausible ranges.
This overview exposes some common data recording and handling errors, to explain how they might arise and be detected. We refer to our collection of 'rogue' data as The Dirty Dozen. This is in homage to the classic 1967 film by the same name in which a band of US Army convicts are brought together to achieve an honorable but near-impossible military objective. Similarly, by bringing together a portfolio of suspect data we are aiming for a positive outcome of raised awareness among researchers and practitioners. Although we draw our exhibits largely from observed data and personal experience, some of the same pitfalls might apply to modeled information. Likewise, while our case studies are mainly based on hydrological data the issues raised are relevant to related disciplines of ecology, meteorology, and water quality.
The order of our dirty dozen follows a typical information flow. We begin with examples of artificial influences on monitoring sites (#1-#4), then cover equipment changes (#5 and #6), quirks of sampling and observer bias (#7-#9), interpretation of outliers (#10 and #11), and techniques for infilling missing data (#12). We then add examples of errors that can occur at postprocessing and archiving stages, along with recommendations for detecting these kinds of erroneous values. Some supporting data are provided as Appendix S1, Supporting Information so readers can examine the same data for themselves. It is our intention that the dirty dozen(s) assembled in this paper will provide a basis for practical exercises and expose some of the tell-tale signs when things go wrong with hydrometric data.

EXHIBIT #1: CHANGING SITE LOCATION AND THE VALUE OF METADATA
Lengthy hydrometeorological records are essential for understanding climate variability and change, detecting emergent trends and contextualizing extreme weather events. To be fit for purpose, these data need to be homogeneous (i.e., collected in consistent ways and places) so that variability is only caused by changes in climate rather than by artificial influences such as station moves. Homogeneity may be tested by (1) identifying break-points in single series (absolute homogeneity) 6 ; or (2) comparing records from neighboring stations (relative homogeneity). 2 In both cases, metadata are invaluable for confirming detected breaks and for highlighting questionable parts of data that might elude statistical tests. The value of metadata increases with the age of the record because the earlier the data, the smaller the number of stations for implementing relative homogenization tests.
For example, absolute and relative homogenization methods were applied alongside metadata to build a quality assured, long-term rainfall network for the Island of Ireland. 7 One part of that record for Malin Head (MH) illustrates how station moves (and other factors) can influence trends identified in data and the importance of metadata in building confidence in adjusted series. This station was used in an earlier analysis of trends that claimed a large increase in annual rainfall totals. 8 However, metadata (Supplementary Information page #1) indicate: changes in the time and frequency of readings (throughout the record, but particularly after 1950 with the onset of hourly measurements); a move of the station from a cliff top at 230 ft (70 m) to a location at 20 ft (6 m) above sea level (in 1921); and the opening of a new station (same elevation) in 1955. Detected breaks in the annual rainfall series were consistent with the station relocation in 1921 and changes in the time of observations in the 1950s. This evidence was used to guide data homogenization, that is, correction for gauge under-catch during decades with less frequent measurement and more exposed site conditions. 7 A significant increasing trend is evident in the prehomogenized annual rainfall series (Figure 1(a)). However, posthomogenization, the gradient for the entire series (1890-2010) is only a quarter of that for the uncorrected record. Figure 1(b) shows a double mass plot, which compares the cumulative sums of annual rainfall for the corrected MH annual series with those for Derry (the nearest long-running neighboring station). The break-points and cumulative departure of the MH homogenized record from Derry (the 1:1 line) are smaller than those for the MH original record.
Recommendation: Use metadata to check the continuity of site location and environs; use techniques such as linear regression or Pettitt's test for breakpoints to expose trends and abrupt changes respectively that may be due to undocumented changes in site properties.
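The break-point check recommended above can be sketched in a few lines. The following is a minimal illustration of Pettitt's test, assuming a complete annual series with no missing values; the function name `pettitt_test` and the demonstration values are hypothetical and are not taken from the Malin Head record. The p-value uses the usual large-sample approximation and should be treated as indicative only.

```python
import numpy as np

def pettitt_test(series):
    """Return (break_index, K, approx_p) for a 1-D series with no missing values."""
    x = np.asarray(series, dtype=float)
    n = x.size
    # signs[i, j] = sgn(x[i] - x[j]) for every pair of observations
    signs = np.sign(x[:, None] - x[None, :])
    # U_t sums the signs between the first t values and the remaining n - t values
    u = np.array([signs[:t, t:].sum() for t in range(1, n)])
    k = float(np.abs(u).max())
    # Index of the first value in the later segment (0-based)
    break_index = int(np.abs(u).argmax()) + 1
    # Approximate two-sided significance of the most likely change point
    p = min(1.0, 2.0 * float(np.exp(-6.0 * k**2 / (n**3 + n**2))))
    return break_index, k, p

# Synthetic illustration only: a step change after the first 20 values
demo = [10.0, 11.0] * 10 + [20.0, 21.0] * 10
print(pettitt_test(demo))
```

A detected break is only a prompt for further checking: as the Malin Head case shows, the date should be compared with metadata before any adjustment is made.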

EXHIBIT #2: ARTIFICIAL INFLUENCE ON RECORDS (ARTERIAL DRAINAGE)
Agricultural productivity is greatly reduced where there is persistent waterlogging and flooding. In an effort to combat this problem, arterial drainage schemes involving channel deepening and widening may be undertaken to improve flow conveyance. Field drains might also be installed to drain the land. Newly dredged river channels have a greater capacity to receive additional water from previously waterlogged soils. While arterial drainage has economic advantages, it can introduce hydrological discontinuities to river flow records.
For example, a break point in the measured flows of the River Boyne in east Ireland was detected around the 1970s (Figure 2) (Appendix S1 page #2). Early studies linked this abrupt change in regime to increased precipitation caused by a shift in the North Atlantic Oscillation to a predominantly positive phase. 10 Subsequent research 9 attributes the change to an extensive arterial drainage scheme that took place over the period 1969-1986. Between the pre- and postdrainage periods, observed flow volumes increased by approximately 30%. Hydrological modeling was used to simulate flows in the Boyne catchment as if in a natural state (i.e., with no arterial drainage but with observed climate variability). The results showed that modeled and observed flows did not match after the change point, so increased precipitation does not fully account for the regime change. It was deduced that change in the Boyne must, therefore, be driven by a change within the catchment, most likely arterial drainage.
This case demonstrates how human modification to river channels and drainage properties can have a substantial impact on river flow. Such artificial changes can be misinterpreted as a natural consequence of, for example, an intensification of the hydrological cycle due to climate change. The process of setting up multiple hypotheses and systematically [...]

EXHIBIT #3: ARTIFICIAL INFLUENCE ON RECORDS (RESERVOIRS)
The construction and operation of reservoirs can substantially impact gauged river flows 12 (and other quantities such as water temperature 13), predominantly through the introduction of compensation flows, 14 the suppression of flood maxima and/or the timing and magnitude of releases. One such time series in the UK NRFA is the Shell Brook in southern England. This gauging station began recording river flows in 1971 and these early data reflect the natural flow regime. In 1978, Ardingly reservoir was constructed immediately upstream of the gauging station.
The post-1978 river flow record is clearly influenced by the reservoir, with sustained periods of similar flows and abrupt step changes (as in 2005; Figure 3(a)). These anomalous patterns predominantly impact the drier half of each water year (April to September), although there are some years in which all flow data are affected. The sustained periods of both low flows and moderate flows are particularly apparent when comparing the pre- and postreservoir flow duration curves (Figure 3(b)) and flow quantiles (Figure 3(c)). Adjustments to reservoir operations have also introduced substantial interannual variability.
In this case, simply plotting the data should highlight the impact on flows. However, reservoir influence can be more subtle, for example, the truncation of low flows in summer. Such effects are more difficult to detect, although a flow duration curve (e.g., Figure 3(b)) can help to highlight deviations from the expected distribution of river flows in a natural series. Where modeling approaches can generate naturalized river flow data, a range of ecologically relevant indicators can be calculated to summarize changes in seasonal flow regime, hydrological extremes and variability (e.g., Ref 15).
Recommendation: Plot hydrographs and search metadata to identify more obvious erroneous river flow data; plot flow duration curves and calculate flow quantiles to quantify the influence or to highlight more subtle impacts.
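A flow duration curve and a small set of flow quantiles are straightforward to compute once the record is split into periods. The sketch below is one possible implementation, assuming daily mean flows held in plain arrays; the function names, the Weibull plotting position, and the default quantile levels are illustrative choices rather than the procedure used for Figure 3.

```python
import numpy as np

def flow_duration_curve(flows):
    """Return (percent of time exceeded, flows sorted from high to low)."""
    q = np.asarray(flows, dtype=float)
    q = np.sort(q[~np.isnan(q)])[::-1]          # drop gaps, sort descending
    # Weibull plotting position: rank / (n + 1), expressed as a percentage
    exceedance = 100.0 * np.arange(1, q.size + 1) / (q.size + 1)
    return exceedance, q

def exceedance_flow(flows, percent_exceeded):
    """Flow exceeded for the given percentage of time (e.g. 95 gives Q95)."""
    q = np.asarray(flows, dtype=float)
    return float(np.nanpercentile(q, 100.0 - percent_exceeded))

def compare_periods(flows_before, flows_after, levels=(5, 50, 95)):
    """Pair each flow quantile before and after a suspected intervention."""
    return {
        f"Q{p}": (exceedance_flow(flows_before, p), exceedance_flow(flows_after, p))
        for p in levels
    }
```

For a record such as the one described above, `compare_periods` would be called with the flows before and after the documented construction date; large differences at Q95 with little change at Q5 would point to altered low flows rather than altered floods.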

EXHIBIT #4: ARTIFICIAL INFLUENCE ON RECORDS (URBAN HEAT ISLAND)
Near surface air temperatures are influenced by regional- and local-scale energy balances. In mid-latitudes, for example, summer anticyclones generally elevate air temperatures by synoptic-scale subsidence and by diabatic warming through amplified surface heat fluxes. The latter can be highly sensitive to spatial variations in the physical properties of the underlying land cover which modulate surface energy fluxes. 16 Nonhomogeneity can emerge in temperature records at fixed sampling locations if these site-specific properties change in time. This can be problematic for the interpretation of trends in long-term temperature records. For instance, without detailed interrogation, it can be difficult to separate the impact of global-scale anthropogenic warming on temperature records, from local processes driven by land cover modification (e.g., Ref 17). This attribution uncertainty extends to associated water balance terms such as evapotranspiration. 18
Urbanization is known to affect air temperature records, as the surface properties of cities modify energy fluxes in ways that strongly favor nocturnal warming. 19 Sampling locations experiencing urbanization over time may, therefore, contribute to a warm bias in the study of larger-scale temperature trends. 20 An assessment of data collected by the US Historical Climatology Network found much greater 20th Century warming for urban stations relative to their rural counterparts, particularly for minimum air temperatures. 21 Figure 4(a) demonstrates this tendency for two stations separated by only a few hundred kilometers, with the urban site experiencing more than twice the rate of rural warming. Where such localized heating effects are detected it may be desirable to exclude the sample location from the study. However, removal of the artificial warming signal is also possible, for example, by homogenization techniques 22 (see Exhibit #1), or via methods that explicitly identify and adjust urban records to yield trends consistent with rural neighbors. 23 Satellite observations of night lights (Figure 4(b)) can be used to independently discriminate between rural and urban sites. 21
Recommendation: Use independent indicators of the extent of urban development (such as maps of nocturnal light) to identify surface air temperature records that may be affected by urbanization; apply urban-rural pairing procedures to correct localized warming trends at urban sites.

EXHIBIT #5: CHANGING INSTRUMENTS
There are many ways of collecting river flow data. Fixed gauging stations such as weirs and flumes aim to stabilize the relationship (rating) between flow depth and volume to enable more accurate measurement of discharge. Ultrasonic gauges and electromagnetic gauges measure velocity using acoustic pulses and magnetic fields, respectively. Structures and equipment at a gauging site may be installed, then changed or upgraded in time. For example, a velocity-area station may be superseded by a weir, which may in turn have ultrasonic equipment installed if a stable relationship between water level and flow cannot be achieved. Weirs can alter the level of the river significantly, and may affect only certain aspects of the flow regime. For example, in the case of the Harper's Brook (Figure 5), only the annual maximum flows appear to be affected. Like all field equipment, electromagnetic gauges suffer from deterioration over time which introduces errors to the flow data (e.g., due to degrading insulation of detecting electrodes, or siltation of the weir cross-section). When a gauging station is being installed or modified, data are generally not recorded, leading to gaps in the time series (see Exhibit #12). Weirs and electromagnetic gauges require substantial building works and the disruption to the flow in this period is significant. However, where ultrasonic gauges are fitted, an overlap period may be used to calibrate the instruments. In the majority of cases, installation of a weir or alterations to it are accounted for by taking spot gaugings of river velocity and cross-sectional area, and altering the rating curve (which defines the relationship between the stage and the discharge). Despite this, testing can reveal step changes and/or false trends as a result of gauging alterations and/or gaps in records. 24
For example, metadata for Harper's Brook at Old Mill Bridge, central England show that the record began with a velocity-area station measuring the natural channel, until a compound Crump weir was built in 1965 (Figure 5). A simple linear regression fit to the annual maximum (AMAX) flow series reveals a substantial trend, whereby the AMAX values appear to increase by 0.5 m³/s per decade. Plotting mean values for the data before and after the installation of the compound Crump weir highlights the effect of the structure on the high flows in this river. The Pettitt statistical change point test also detects the year 1965. Even so, the increase in AMAX could still be partly explained by multi-decadal climate variability leading to a flood-rich period in the later portion of the record. 25
Recommendation: Use metadata to check the continuity of instrumentation at a site; use the Pettitt test to expose abrupt changes that may be due to undocumented changes in equipment at the site.

EXHIBIT #6: CHANGING DATUM
[...] 26 The zero elevation point is often located in the ground beneath the riverbed. Ideally, the datum should be fixed over time, such that there is a consistent reference point for the entire record. However, sometimes the datum is changed, for example, following degradation of the riverbed. Unfortunately, it is estimated that between a third and half of all US Geological Survey (USGS) stream gauges have had a change of datum or major change of location during their period of record (Kolva, personal communication).
Changing the datum alters the gauge height that is referenced for a given water surface elevation. For example, at the Comite River near Comite (Figure 6), the datum was lowered by 2 ft (0.6 m) on October 1, 1996 (note the imperial units that are routinely used in the United States). Hence, a stage of 2 ft in September 1996 is equivalent to a stage of 4 ft (1.2 m) in October 1996 (for the same water surface elevation). Such changes can be detected relatively easily in historical time series when a large datum shift is applied (Figure 6), but not necessarily when the change is small or gradual, for example, due to ground subsidence. Note also that switches of units such as between imperial (in Figures 6 and 7) and metric can be problematic too. Shifts in the stage-discharge relation may further be indicative of natural geomorphic processes at the site (e.g., changes in riverbed elevation or channel width due to accretion or erosion). 28 Information about changes in datum can usually be found in the USGS gauging station water-year summary report (see http://waterdata.usgs.gov/LA/nwis/wys_rpt/?site_no=07378000&agency_cd=USGS).
The issue of datum correction is particularly important for time series analyses of stage records 29 or of river channel geometry. 30 When computing changes in the frequency of flood events above flood stage, for example, if the measurements are not referenced to a fixed datum, a spurious trend in flood frequency could be inferred. Progressive changes in the datum may also contribute to instability in rating relationships used to estimate discharge from river stage. 31
Recommendation: Plot and visually inspect the stage-discharge relationship and stage time series before conducting any statistical analyses; note any abrupt shifts in stage that may reveal undocumented changes in datum.
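Once a datum change has been documented, stage readings can be placed on a single reference before any analysis. The sketch below uses the Comite River figures quoted above (datum lowered by 2 ft on October 1, 1996) purely as an arithmetic illustration; the function name and the record layout of (date, stage) pairs are assumptions.

```python
from datetime import date

def to_common_datum(records, change_date, datum_lowered_by):
    """Re-reference stages recorded before a datum change to the later datum.

    records: iterable of (date, stage) pairs, stage in the same unit as the shift.
    Lowering the datum raises the gauge height for the same water surface,
    so earlier readings are increased by the size of the shift.
    """
    return [
        (day, stage + datum_lowered_by if day < change_date else stage)
        for day, stage in records
    ]

# Stages for the same water surface elevation either side of the change
readings = [(date(1996, 9, 30), 2.0), (date(1996, 10, 1), 4.0)]
print(to_common_datum(readings, date(1996, 10, 1), 2.0))
```

This only handles a single, documented step. Small or gradual shifts, such as those caused by subsidence, still need the visual inspection recommended above.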

EXHIBIT #7: OBSERVER MEASUREMENT BIASES
Benford's Law (BL), also known as the first digit law, recognizes that in many collections of numbers, the leading digit is most often 1 (~30% of the time) and least often 9 (~5% of the time). Such differences in frequency are greater than would be expected to occur by chance. BL holds for a wide variety of socioeconomic and natural science data sets. 32 Knowledge of this law can be used as a diagnostic tool. For instance, departures from expected high frequencies of small leading digits are routinely used to pick up rounding errors or fabricated data (e.g., in tax returns).
BL can also be used to detect observer bias or suspect values in hydrometeorological data. 33 Some biases may be unintentional. For example, weather observers tend to favor daily precipitation totals that are divisible by 5 or 10. One evaluation of the US Cooperative Observer Program network found that 97% of stations with complete or near complete records exhibit this 5/10 bias. 34 Observers also tend to under-report the frequency of days with light precipitation, that is, daily totals at the lower limit of measurement, which in the United States is often close to 2.54 mm (or 0.1 inches). Both biases were linked to the precision and consistency of use of precipitation measuring sticks which have large, labeled tick marks every 0.10 inches, large, unlabeled tick marks every 0.05 inches, and small, unlabeled tick marks every 0.01 inches. 34 Both number bias and trace wet-day under-reporting skew the overall frequency distribution of precipitation amounts in ways that can affect estimation of extreme values.
Another bias occurs when manual weather observations are not made on a weekend or over a holiday period. Instead, any precipitation falling during the unobserved days is assigned to the first day of return to business, which is typically a Monday or Tuesday. Average precipitation totals on these days tend to be higher than those estimated for days on the weekend. Such under-reporting of rainfall on Sunday has been shown for meteorological stations in Australia, 35 the United Kingdom, 36 and the United States. 34
To illustrate these points, observer number preference and weekend under-reporting biases are assessed using daily precipitation data for Dushanbe, Tajikistan (Figure 7) (Appendix S1 page #7). At this site, observer(s) have a preference for 3.0 and 6.0 inch daily rainfall totals as evidenced by unexpectedly high frequencies of these amounts during the period 1958-1967. In fact, the value 3.0 occurs 14% more frequently than expected by BL. More striking is the lack of any values either side of the 3.0 and 6.0 inch amounts, which further raises doubt about the credibility of these entries. Mean intensities are notably higher on Mondays/Tuesdays than on Sundays, suggesting that some weekend rainfall has been carried over into weekday totals too.
Recommendation: Use histograms of daily precipitation amounts to reveal under-reporting of light rainfall and/or observer number bias; mean daily amounts plotted by day of week can expose unrecorded aggregation of multi-day precipitation.
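Both checks in this exhibit reduce to simple tabulations. The sketch below is a minimal illustration using only the Python standard library: it computes the first-digit frequencies expected under Benford's Law, P(d) = log10(1 + 1/d), the observed first-digit frequencies of wet-day amounts, and the mean amount by day of week. The function names and the record layout of (date, amount) pairs are assumptions, and no data from the Dushanbe record are reproduced here.

```python
import math
from collections import Counter
from datetime import date

def benford_expected():
    """Expected frequency of each leading digit 1-9 under Benford's Law."""
    return {d: math.log10(1.0 + 1.0 / d) for d in range(1, 10)}

def leading_digit(value):
    """First significant digit of a non-zero number (e.g. 0.05 gives 5)."""
    return int(f"{abs(value):.6e}"[0])

def first_digit_frequencies(amounts):
    """Observed leading-digit frequencies, ignoring dry days (zero totals)."""
    digits = [leading_digit(a) for a in amounts if a > 0]
    counts = Counter(digits)
    return {d: counts[d] / len(digits) for d in range(1, 10)} if digits else {}

def mean_by_weekday(records):
    """Mean amount for each weekday; index 0 is Monday, 6 is Sunday."""
    sums, counts = [0.0] * 7, [0] * 7
    for day, amount in records:
        sums[day.weekday()] += amount
        counts[day.weekday()] += 1
    return [s / c if c else float("nan") for s, c in zip(sums, counts)]

# Expected share of leading 1s and 9s under Benford's Law
expected = benford_expected()
print(round(expected[1], 3), round(expected[9], 3))
```

Comparing `first_digit_frequencies` with `benford_expected` digit by digit highlights over-represented values, while a Monday or Tuesday mean well above the Sunday mean from `mean_by_weekday` hints at accumulated weekend totals.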

EXHIBIT #8: SAMPLING BIAS IN TIME
Spot sampling is widely used to monitor environmental variables in a noncontinuous way, perhaps to save time and/or resources. Sampling may be fixed (systematic) or random (without any temporal or spatial structure) according to the purpose of the data collection. Ideally, the sampling frequency, time, and location are appropriate to the behavior of the variable(s) under surveillance. Slowly varying phenomena such as groundwater levels may be adequately sampled once per month at a handful of sites to represent behavior across an aquifer. Conversely, rapidly varying variables like suspended sediment concentrations (see Exhibits #10 and #11) have to be sampled at hourly or subhourly intervals to accurately estimate the amount of material transported. If the sampling frequency is not appropriate, biased estimates may arise. For example, it has been shown that the 98th percentile water temperature (used for compliance monitoring in the EU Water Framework Directive, WFD) can be 1 °C cooler if based on monthly values rather than the 'true' values from hourly sampling. 37 As well as the frequency, the time of sampling is also critical for variables like water and air temperatures which have strong diurnal and seasonal cycles. 38 Provided that samples are collected at fixed points in these cycles, repeat measurements are comparable with each other. Figure 8 provides an example where systematic spot sampling was not applied to water temperature monitoring at a site on the River Dove, UK (Appendix S1, page #8).
Although the water temperature measurements in Figure 8 were made by trained field staff, following standard procedures, with well-maintained equipment and at a fixed location, the time of day of taking the monthly samples was not consistent. Spot samples in the mid-1990s were taken at around 09:00 h, but this drifted to about 13:00 h by the 2010s. Given that afternoon water temperatures are typically higher than those in the morning, the change in sampling time alone has introduced a warming bias of ~1.1 °C over the course of the record. Even small discrepancies in water temperature are significant because they can lead to a misclassification of a river's health under the terms of the WFD, or exaggerate the pace of warming seen in UK freshwaters. 40
Recommendation: Plot the time of spot sampling to check for hidden biases in the collection of data, particularly for series with strong cyclical variations.
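Alongside a plot, a simple trend fit to the recorded sampling times gives a quick numerical check for drift. The sketch below assumes each spot sample carries a decimal year and a time of day in decimal hours; the function name is hypothetical and the example values are synthetic, chosen only to mimic a drift from about 09:00 h to 13:00 h.

```python
import numpy as np

def sampling_time_drift(decimal_years, sample_hours):
    """Return (drift in hours per year, total drift in hours over the record)."""
    years = np.asarray(decimal_years, dtype=float)
    hours = np.asarray(sample_hours, dtype=float)
    # Centre the years so the least-squares fit is numerically well behaved
    slope, _intercept = np.polyfit(years - years.mean(), hours, 1)
    total = slope * (years.max() - years.min())
    return float(slope), float(total)

# Synthetic illustration: sampling time drifting from 09:00 h to 13:00 h
years = np.arange(1995, 2016)
hours = 9.0 + 0.2 * (years - 1995)
print(sampling_time_drift(years, hours))
```

A non-zero drift does not by itself quantify the temperature bias; that requires knowledge of the diurnal cycle at the site, for example from a period of hourly logging.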

EXHIBIT #9: MISMATCHED SAMPLING IN SPACE AND TIME
Continuous river discharge records are often used to derive 'flow statistics' to match with other environmental indicators such as benthic invertebrate data. 41,42 High-resolution flow series may yield point discharge at a predetermined time and date through to daily, seasonal, or annual averages and long-term flow duration statistics (e.g., Q95, the flow that is exceeded 95% of the time). In contrast, most ecological series represent discrete sampling events, typically collected on a quasi-annual or seasonal basis (Figure 9). Hence, timing of eco-sampling may vary from one year to another with, for example, collection of an 'autumn' sample anywhere between 1 September and 30 November. When assessing potential influences of antecedent flow conditions on instream communities, it is clearly essential that discharge and ecological series overlap to ensure that the hydrological conditions experienced by instream communities are properly reflected. Two primary sources of error may still arise after quality assurance processes have been undertaken: (1) sites where discharge and ecological series were derived may not be co-located and (2) the sampling time-frame of discrete ecological series may miss potentially important hydrological events driving community structure and change. A third potential source of error may occur if discharge statistics drawn from the UK hydrological year (1 October-30 September) are matched with ecological samples that are collated on a seasonal basis (such as autumn, which spans 1 September to 30 November). One study examined 291 long-term (>20 years) paired river flow and autumn season macroinvertebrate community records (>10 years) for sites across England and Wales. 43 Screening of the series resulted in 208 (71%) of the sites being removed due to missing values or because sampling points were not coincident. Removal of some sites was necessary because of flow addition or loss associated with impoundment, abstraction, or confluences occurring between the gauge and biomonitoring points. A common source of error was due to missing hydrological events because of the mismatch between the hydrological year (October to September) and seasons used to analyze discrete macroinvertebrate samples (such as autumn being September to November).
Errors can arise when (1) an invertebrate sample is collected toward the end of a season with marked variability in river discharge that is not reflected in the seasonal average of the chosen flow metric (points #3 and #4 in Figure 9) or (2) discharge data from the period after the ecological survey are included in the seasonal average flow metric if the 'hydrological year' is not corrected to coincide with the ecological sampling window. Most ecohydrological statistics potentially omit some hydrological events due to the mismatch between the continuous hydrological and discrete ecological series. It is, therefore, probably not surprising that the most statistically significant models of river flow-ecology relationships have been developed for less hydrologically variable groundwater dominated systems as opposed to flashier surface runoff dominated systems.
Recommendation: Plot hydrological time-series alongside dates for discrete ecological samples to confirm that sampling periods are coincident; examine series for the presence of potentially significant discharge events prior to collection of ecological samples (even those falling in another season).
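One way to avoid both errors is to compute flow statistics over a window that ends the day before each ecological sample, rather than over a fixed season or hydrological year. The sketch below is an illustration under stated assumptions: daily flows are held in a dictionary keyed by date, the 90-day default window is an arbitrary choice, and the function name is hypothetical.

```python
from datetime import date, timedelta

def antecedent_flow_stats(daily_flow, sample_date, window_days=90):
    """Summarize flows in the window_days days before an ecological sample.

    daily_flow: dict mapping datetime.date to daily mean discharge.
    Only days strictly before sample_date are used, so no flow recorded
    after the survey can enter the statistic.
    """
    days = [sample_date - timedelta(days=k) for k in range(1, window_days + 1)]
    values = [daily_flow[d] for d in days if d in daily_flow]
    if not values:
        return None  # gauge and sample do not overlap in time
    return {
        "mean": sum(values) / len(values),
        "max": max(values),
        "coverage": len(values) / window_days,  # share of window with data
    }
```

A low `coverage` value flags samples whose antecedent window falls in a gap in the flow record, and a `max` far above the `mean` indicates a discharge event that a seasonal average would have smoothed away.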

EXHIBIT #10: SPURIOUS OR CURIOUS SPIKES
Modern instruments deployed in rivers can provide high-frequency (≤1 minute resolution) data, creating new opportunities for research but also requiring careful quality control. For example, Acoustic Doppler Velocimetry can record flow speeds at >100 Hz but it is widely acknowledged that time series require filtering to remove spurious values that are an inherent and unavoidable product of the technology. Standard protocols exist for identifying spikes and outliers, which usually involve removing data that fall outside upper and lower thresholds defined relative to the record mean. 44,45 Similarly, high-frequency turbidity records can be subject to considerable noise and other limitations, not least when calibrating turbidity and suspended sediment concentration (SSC) records. 46,47 Noise can be caused by electronic signal errors, but these tend to be small relative to mean values and normally within the error range of the device (Figure 10). Larger spikes in data are common and can be caused by dirty optics, particularly biofouling, which can be detected by sudden step changes or more gradual, but systematic, shifts in turbidity. Wipers on sensors can remove small contaminants but larger debris must be manually removed.
Large spikes can also be caused by biological activity. 48,49 For instance, Figure 10 shows spikes in turbidity due to the activity of Signal Crayfish (Pacifastacus leniusculus) in both laboratory and field settings, compared with controls where crayfish were excluded. In still-water with no crayfish, spikes are small, so most likely associated with electronic signal errors. In contrast, records with flowing water and crayfish are subject to much larger spikes, which reflect the impact of sediment disturbance by crayfish. Diurnal variations in spikiness are indicative of biological activity. One study reported that spikes are three times more likely and 20% higher when crayfish are active at night than during daylight. 50 Hence, high-resolution turbidity data from field deployment need careful assessment. Systematic changes, such as cumulative increases in turbidity or step-changes, are likely the result of sensor fouling and should be removed. However, remaining spikes exceeding sensor error terms are likely to be associated with biological activity or turbulent events, representing real phenomena.
Figure 10 (caption; the opening text for panel (a) was not recovered): [...] Note that spikes occur only when the crayfish is present, with gradually decreasing turbidity after crayfish removal. (b) 5-min resolution turbidity record for a tributary of the River Nene, UK, colonized by crayfish (black), which records a signal with more frequent spikes during night hours (labels are at midnight) and a strong diurnal structure in the mean turbidity. During this period other instruments confirmed that there were no changes in hydraulics capable of driving these turbidity fluctuations. It was concluded that individual spikes reflect fine sediment entrainment caused by foraging, burrowing or fighting events, which increase at night because crayfish are nocturnal. The diurnal pattern reflects the net effect of this enhanced night time activity on mean turbidity. A second turbidity sensor (red line), identical to that in the river, was deployed in an open-top aquarium filled with clean water and situated on the river bed adjacent to the first. The flat trace confirms that the signal from the river is not an instrument artifact driven by diurnal variations in light or temperature, which can affect the optical measurement of turbidity in some sensors. The small spikes that do occur fall within the manufacturer's stated error, are randomly distributed around the mean and do not show any temporal structure, which suggests that they reflect instrument noise.
Recommendation: Understand potential sources of data spikiness that are inherent in some measurement techniques but do not remove spikes and outliers uncritically; cross-check unexpectedly high data values against independent evidence and consider all potential causes of data excursions, as they may reveal something unexpected and important.
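A minimal sketch of this approach is given below: spikes are flagged (not deleted) using thresholds defined relative to the record mean, and the flagged readings are then tallied by night and day to look for diurnal structure. The function names, the multiplier `k`, the night-time hours, and the synthetic hourly record are illustrative assumptions, not the protocols of Refs 44,45 or the analysis of Ref 50.

```python
from statistics import mean, stdev


def flag_spikes(values, k=3.0):
    """Return indices of readings more than k standard deviations from the record mean."""
    m = mean(values)
    s = stdev(values)
    return [i for i, v in enumerate(values) if abs(v - m) > k * s]


def night_day_spike_counts(spike_indices, hours, night=(21, 5)):
    """Count flagged spikes at night versus by day.

    hours[i] is the hour of day (0-23) of reading i; night=(21, 5) treats
    21:00-04:59 as night (an assumed definition).
    """
    start, end = night
    night_n = sum(1 for i in spike_indices if hours[i] >= start or hours[i] < end)
    return night_n, len(spike_indices) - night_n


# Hypothetical hourly turbidity record over two days (arbitrary units).
turbidity = [5.0] * 48
for index, value in [(2, 40.0), (12, 42.0), (23, 45.0), (26, 50.0)]:
    turbidity[index] = value
hours = [i % 24 for i in range(48)]

flagged = flag_spikes(turbidity, k=2.0)
counts = night_day_spike_counts(flagged, hours)
print(flagged, counts)  # [2, 12, 23, 26] (3, 1)
```

Because the spikes themselves inflate the standard deviation, the choice of `k` matters: with `k=3.0` the smallest spike in this toy record is not flagged. Flagged readings are kept in the series so that they can be cross-checked against independent evidence before any decision to remove them.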

EXHIBIT #11: MEANINGLESS MEANS
Simple measures of central tendency, such as average annual river flow or mean winter monthly rainfall, are routinely used to characterize hydrometeorological data. Such metrics are meaningful when the properties in question exhibit relatively consistent variability (i.e., when there are slow variations, with few extreme departures from typical levels). This applies not only to mean values over any particular time interval, but also to the nature and extent of any variability over diurnal, monthly, seasonal, or annual scales.
Sometimes, however, time series do not exhibit gradual or at least consistent change; instead, there may be extreme and apparently unpredictable variability at multiple timescales. For example, SSC may exhibit abrupt spikes above background levels, rather than gradual shifts. These effects can be caused by episodic sediment supply from biological activity (e.g., Exhibit #10), bank collapse, flushing associated with rainstorms or meltwater release, or variable entrainment patterns on floodplains. This makes determining a representative value of SSC difficult, because actual values tend to be either very low background readings, or volatile quantities associated with transient events. In other words, the data are multimodally distributed. Figure 11 shows discharge and SSC time series for the proglacial river of the Finsterwalder Glacier in Svalbard, Arctic Norway. 51 Meltwater-fed systems such as these are useful exemplars of hydrological processes because they exhibit rapid change over relatively short timescales. In this case, it is evident that SSC values are dominated by two brief episodes (corresponding to flushing events), so the mean value is not representative of the bimodal distribution of concentrations. Furthermore, attempts to quantify variability around the putative mean SSC are unreliable without explicit reference to a specific timescale: the characteristic diurnal range in this particular example is much smaller than the seasonal range.
However, because of the need to statistically characterize the system, the question remains: what is a representative suspended sediment transport value for, in this case, the Finsterwalder proglacial river? This question is best addressed through temporal and spatial aggregation. Here, the time integral of Suspended Sediment Load (SSL, the product of SSC and discharge) has units of mass; it better quantifies the total transport for the duration of the time series, and forms a reliable basis for calculating sediment flux (in units of mass per unit area per unit time).
Recommendation: Aggregate time- and space-scales as much as possible when describing the 'average' condition of rapidly-changing variables with transient, extreme values when it is useful/important to define mean conditions.
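The aggregation can be illustrated with a short Python sketch that integrates SSL over time using the trapezoidal rule and then normalizes by area and duration. The function names, the five-point series, and the 10 km2 contributing area are hypothetical values chosen for illustration; they are not data from the Finsterwalder record in Figure 11.

```python
def total_sediment_load(times_s, discharge_m3s, ssc_kg_m3):
    """Integrate suspended sediment load (SSC x discharge) over time.

    times_s: observation times in seconds; discharge in m3/s; SSC in kg/m3.
    Returns the total mass transported (kg) using the trapezoidal rule.
    """
    ssl = [q * c for q, c in zip(discharge_m3s, ssc_kg_m3)]  # load in kg/s
    total = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        total += 0.5 * (ssl[i] + ssl[i - 1]) * dt
    return total


def sediment_flux(total_kg, area_km2, duration_s):
    """Mass per unit area per unit time (kg km-2 s-1)."""
    return total_kg / area_km2 / duration_s


# Hypothetical hourly series with one brief flushing event at the third reading.
times_s = [0, 3600, 7200, 10800, 14400]
discharge = [2.0, 2.0, 4.0, 2.0, 2.0]   # m3/s
ssc = [0.1, 0.1, 2.0, 0.1, 0.1]         # kg/m3

mean_ssc = sum(ssc) / len(ssc)          # 0.48, matched by no actual reading
total_kg = total_sediment_load(times_s, discharge, ssc)
flux = sediment_flux(total_kg, area_km2=10.0, duration_s=times_s[-1] - times_s[0])
print(mean_ssc, total_kg, flux)
```

In this toy series the mean SSC sits between the background level and the event peak and describes neither, whereas the integrated load attributes most of the transported mass to the single event.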

EXHIBIT #12: INFILLED DATA GAPS
Data may be missing for various reasons, including equipment malfunctions or loss during transmission and storage. Sometimes data are coded as missing because they are of insufficient accuracy, precision or reliability to be retained. Records may begin at different times, be discontinuous, or end before the present day. Individual variables may differ in their completeness even at the same site. Plotting data availability with time (Figure 12(a)) shows the extent of overlap between neighboring records that might be bridged to create a composite series (as in Ref 52). Information in the metadata may contain errors too. For example, header information held in the NRFA 41009 record for gauge C incorrectly reported no data between 1977 and 1998.
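A simple text version of such an availability plot, together with a list of the years shared by two records, might look like the sketch below. The gauge names, years, and values are invented for illustration and do not reproduce the records in Figure 12(a).

```python
def availability_row(record, first_year, last_year):
    """Text 'plot' of data availability: '#' where data exist, '.' where missing."""
    return "".join(
        "#" if record.get(year) is not None else "."
        for year in range(first_year, last_year + 1)
    )


def overlap_years(record_a, record_b):
    """Sorted list of years in which both records hold a non-missing value."""
    have_a = {y for y, v in record_a.items() if v is not None}
    have_b = {y for y, v in record_b.items() if v is not None}
    return sorted(have_a & have_b)


# Hypothetical annual records keyed by year; None marks a missing value.
gauge_b = {year: 1.0 for year in range(1980, 1990)}
gauge_c = {year: (None if 1983 <= year <= 1985 else 2.0) for year in range(1978, 1988)}

print("B", availability_row(gauge_b, 1978, 1989))  # B ..##########
print("C", availability_row(gauge_c, 1978, 1989))  # C #####...##..
shared = overlap_years(gauge_b, gauge_c)
print(shared)  # [1980, 1981, 1982, 1986, 1987]
```

Building the availability rows from the data values themselves, and not from header information, also gives an independent check on metadata such as reported start and end dates.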
Infilling data gaps may be necessary to create homogeneous hydrological series for assessment of long-term variability (see Exhibits #1 and #2), extreme events or continuous series for running models. However, any infilling by interpolation or extrapolation relies upon assumptions that can introduce artifacts and give an impression of false certainty. For instance, the parameters of a statistical distribution can be estimated from a sub-set of the observations as in Figure 12(b), but the observations do not exactly conform to the log-normal curve selected. Hence, using the log-normal for infilling would impose some of this assumed shape on the distribution. Critically, if gap filling is needed, beware of using the mean (of the rest of the record or neighboring stations) as this will suppress variability and underestimate extremes. There are three valid alternatives: (1) Time substitution involves taking information from other dates assuming stationarity of the observations. For example, with flow data from gauge C fitted to a log-normal distribution, the missing data for 1976-1982 can be resampled from the same distribution. Alternatively, using the relationship between the overlapping records of gauges B and C for the period 1982-2015, the missing block in gauge C for 1976-1982 could be estimated from gauge B. Other sources of data, such as newspaper archives or proxy records, can help to corroborate infilled extreme events. 53,54 (2) Space substitution involves taking information from equivalent sites. For flood frequency estimation, 'pooled' analysis is common practice. For example, this technique was used to create 1405 annual maxima flow values for the River Trent, UK using approximately 50-year records. 54,55 (3) Physical principles can be used to predict missing data. For instance, A and B flow toward C; this means that A and B are each hydrologically linked to C, and all three are likely meteorologically inter-related given their proximity (<10 km) (Figure 12(c)). Using rainfall-runoff models it would be possible to estimate missing values at gauges A, B, or C and any interrelationships between them. Missing records can then be infilled with synthetic river flow records or even reconstructed for times without river records using historical weather data. 56,57
Recommendation: Filled data gaps contain assumptions not observations, so beware the techniques used to create apparently complete records to avoid (re)interpreting those assumptions.
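The contrast between mean infilling and the time-substitution options can be sketched as follows. The helper names, the toy series, and the random seed are assumptions made for illustration; the fitting here uses simple moments of the log-transformed values and an ordinary least-squares line, which may differ from the methods behind Figure 12.

```python
import math
import random
from statistics import mean, stdev


def infill_mean(series):
    """Replace gaps (None) with the mean of the observed values."""
    m = mean([v for v in series if v is not None])
    return [m if v is None else v for v in series]


def infill_lognormal(series, rng):
    """Replace gaps with draws from a log-normal fitted to the observed values."""
    logs = [math.log(v) for v in series if v is not None]
    mu, sigma = mean(logs), stdev(logs)
    return [rng.lognormvariate(mu, sigma) if v is None else v for v in series]


def infill_from_neighbour(target, neighbour):
    """Fill gaps in target from a least-squares line fitted on overlapping pairs."""
    pairs = [(x, y) for x, y in zip(neighbour, target) if x is not None and y is not None]
    mx = mean([x for x, _ in pairs])
    my = mean([y for _, y in pairs])
    slope = sum((x - mx) * (y - my) for x, y in pairs) / sum((x - mx) ** 2 for x, _ in pairs)
    intercept = my - slope * mx
    return [
        intercept + slope * x if y is None and x is not None else y
        for x, y in zip(neighbour, target)
    ]


# Hypothetical flows with two gaps.
series = [1.0, None, 2.0, 4.0, None, 8.0]
observed = [v for v in series if v is not None]

mean_filled = infill_mean(series)
resampled = infill_lognormal(series, random.Random(0))
regressed = infill_from_neighbour([1.0, 2.0, None, 4.0, 5.0], [2.0, 4.0, 6.0, 8.0, 10.0])

print(stdev(observed), stdev(mean_filled))  # spread shrinks after mean infilling
print(regressed[2])
```

Mean infilling lowers the standard deviation of the completed series, which is the suppression of variability noted above. Regression from a neighbor without a noise term also understates variability in the filled values, so either way the infilled entries should be flagged as assumptions and kept distinguishable from observations.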

DIRTY DOZEN II AND III: POSTPROCESSING AND ARCHIVING ERRORS
Space limitations mean that we have only scratched the surface of the full range of biases and errors that can occur in a hydrological information flow, between site selection and eventual dissemination of data (Figure 13). Related disciplines, such as ecology and water quality, would be subject to many of the same uncertainties, such as concerns about instrument drift, fouling, or truncation settings, as well as about equipment maintenance, calibration, and routine updating of instrument logs/metadata to help interpret outliers in data. Table 1 lists other sources of uncertainty that may be encountered in field hydrometry. Here, a distinction is made between errors (E) that relate to problems with instrumentation or measurement practices and biases (B) that are due to changing catchment conditions (outside the control of the field technician). Table 2 gives examples of errors that can arise at the other end of the process at the point of archiving, with indications of how they might be detected. Ideally, instrument logs would be maintained and made available for open inspection. Such checks might be feasible for individual sites, instruments, or records but impracticable for very large data sets; it is simply too labor intensive to visually inspect all entries. Hence, these types of error can present hidden dangers to users of 'global' sets compiled from multiple networks, with varying standards of data collection, types of instrumentation, and quality assurance protocols (Figure 14).
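One archiving error of the kind listed in Table 2, regional variation in date format (dd/mm/yy versus mm/dd/yy), lends itself to an automated screen when visual inspection is impracticable. The sketch below is a hypothetical check, not a procedure from the source: it counts entries that can only be day-first, only be month-first, or are ambiguous, and reports a mixture when both unambiguous forms occur in the same file.

```python
def classify_date_order(date_strings, sep="/"):
    """Tally date strings by the field order they imply.

    A first field above 12 can only be a day; a second field above 12 can only
    be a day in month-first order. Entries where both fields are 12 or less
    are ambiguous, and entries where both exceed 12 are counted as invalid.
    """
    counts = {"day_first": 0, "month_first": 0, "ambiguous": 0, "invalid": 0}
    for text in date_strings:
        first, second = (int(part) for part in text.split(sep)[:2])
        if first > 12 and second > 12:
            counts["invalid"] += 1
        elif first > 12:
            counts["day_first"] += 1
        elif second > 12:
            counts["month_first"] += 1
        else:
            counts["ambiguous"] += 1
    counts["mixed"] = counts["day_first"] > 0 and counts["month_first"] > 0
    return counts


# Hypothetical date column merged from two networks.
dates = ["03/04/15", "25/04/15", "04/26/15", "11/12/15"]
report = classify_date_order(dates)
print(report)
```

A mixed result does not say which entries are wrong, only that the ambiguous ones cannot be trusted until the source of each record has been checked against its metadata.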
Although there is now a tendency toward increased automation of environmental monitoring and quality assurance, there is still high dependency on manual techniques, not least for instrument calibration or evaluation of unexpected results. As we have shown, suspiciously high or low values are not always wrong (see Exhibits #10 and #11). Moreover, the power to detect outliers and change-points depends on the choice of the statistical techniques deployed. 58 Such considerations underline the importance of metadata and other circumstantial evidence (not least local knowledge) for ratifying hydrometeorological data. 59 Once an issue is detected, the question then arises as to how to handle the error. Ideally, the archivist would set up processes to enable capture of user-community feedback. On the other hand, perhaps one of the conditions attached to the freedom of data access should be a responsibility on users to report errors.
We have focussed on individual records but the representativeness of the observing network of stations as a whole matters just as much (if not more). Benchmark networks such as the USGS Hydrologic Benchmark Network 60 [...] cost-benefits. Facilities such as the FRIEND European Water Archive, the UK Acid Waters Monitoring Network and the UK Environmental Change Network all provide a basis for tracking long-term environmental trends (e.g., Ref 24). However, benchmark networks are also critical points of reference for cross-validating data. Measures such as the Representative Catchment Index and Catchment Utility Index show the extent to which individual gauging station records are amenable to regionalization or comparable with other sites. 62 Indicators of hydrometric data quality, completeness, and provision also provide a basis for stabilizing 'fluctuating' networks and setting levels of service provision. 63 Increasingly, the case is being made for more holistic measures of data quality that reflect the overall information life-cycle and utility of the data to users (Figure 13), rather than a few conventional quality indicators (e.g., record completeness).
Table 2 (partial rows recovered from the extracted text): Values outside calibration/rating curve range: truncated peak values 12. Regional variations in date (e.g., dd/mm/yy or mm/dd/yy) and decimal (e.g., ',' or '.') formats: end of file error messages or data misfeeds when importing records.

CONCLUSIONS
Hydrological data biases and errors are a fact of life but early detection and attribution can help to minimize the risk of costly/poor/dangerous decisions later on. Indeed, future work might catalog instances where data errors and/or biases have directly changed a management decision or led to a different outcome. One notorious example from space-engineering is the burn up of the Mars Climate Orbiter because different units were used by the constructors (imperial) and modelers (metric) of the satellite's thrusters. Just as systems engineers have examined the causes of famous failures, 64 similar appraisals might be undertaken of the robustness of, for example, local flood protection schemes and national water policies to data biases and errors.
We have illustrated a range of techniques but the most dependable are: (1) visual inspection of raw data; (2) simple line, bar, and scatter charts to display changes over time or to compare data from neighboring sites; (3) basic outlier and trend diagnostics; and (4) reference to high-quality metadata to aid interpretation of unexpected values or abrupt changes in data. Above all, it is necessary to have a critical mind-set when interrogating any field data. Such precautions are not only valid for hydrologists, ecologists, and water quality specialists; they are just as essential for other environmental and social science disciplines.
During periods of austerity, conventional observing networks tend to be rationalized. With scarcer resources there is likely to be growing reliance on data gathered by automated systems, nonexperts ('citizen scientists') or via the amalgamation of disparate information sources ('big data'). As data sets grow in size and complexity, users may become even more distanced from the processes that produced them; the real danger is that such data are deployed uncritically or in good faith. Hence, the case for building data literate communities has never been stronger.