Modelling Animal Activity as Curves: An Approach Using Wavelet-Based Functional Data Analysis
1. Introduction
Temporal activity of animals is the result of behavioral responses to external (environmental) factors, such as availability of food resources and predation risk, and to the internal states of individuals, such as nutritional condition, aversion to risk and reproductive drive [1] [2] [3] [4]. The constraints imposed by external factors and internal states dictate that animal temporal activity in terrestrial, aquatic, and aerial environments is interrupted by pauses, turns, and changes in speed [5] [6] [7] [8] [9]. Temporal activity patterns are punctuated (by pauses and changes in speed), temporally autocorrelated, and localized in nature [3] [9] [10]. Therefore, ecological and behavioral inference on patterns and processes of animal temporal activity is based on data with inherent autocorrelation and localization properties [3] [11]. Furthermore, technological advances are making high-dimensional data on animal temporal activity increasingly available [10] [12] - [17].
Animal temporal activity is recorded as a series of time points, but one can argue that the discreteness of the data set is due to a technological limitation (data-acquisition capabilities) rather than to a truly discrete nature of the phenomenon itself. Thus, as opposed to the standard statistical cases in which the observations are numbers (scalars) or vectors [18], high-frequency data such as temporal activity records are better regarded as continuous curves (functional data). Wavelets are functions that can represent a signal simultaneously in time and across large and small scales [18]. Such a decomposition into time-scale space allows the identification of the dominant modes of variability and of how these modes vary with time [19]. Wavelets have become the method of choice for such data because wavelet transforms are particularly suited to handle data that are high dimensional, autocorrelated, and localized [3] [18] [20] [21]. Whereas wavelet transforms have recently been used to characterize the autocorrelation properties of animal temporal activity [3] [15] [22], wavelets have not yet been explored to model animal temporal activity as curves. In fact, whenever wavelets are used, their statistical analysis relies on tests designed for scalar or vector random variables rather than for functional data [23].
The versatility of wavelets is remarkable. They have been used for problems as far apart as resampling time series of surrogate data derived from random cascades on dyadic trees [24] and establishing a connection between discrete wavelet transforms and entanglement renormalization for quantum systems on the lattice [25]. Important applications of wavelets are also found in preserving motion discontinuities along the edges of weak textures and dealing with rotations in image sequences [26], the combined analysis of thermal and visible-light images of plants to detect disease early with high accuracy [27], and the early detection of melanomas from images of boundary irregularities of skin lesions [28]. Wavelets are also useful for high-speed detection of transient high-impedance faults and power-quality disturbances [29], for feature extraction, discriminant analysis, and classification rules, which are crucial issues in face recognition [30], and they underlie a full-fledged theory of multiresolution signal decomposition [31].
In this note, we use wavelets to model temporal activity data as curves in the context of functional data analysis (FDA [32] [33] [34] [35]). One may argue for treating these temporal activity data as vector-valued data. Even though this is possible, there are two main reasons why treating the data as curves is more appropriate in practice. First, the vector-based and the functional data paradigms interpret an increase in sample size in two opposing ways. The former understands new observations as an elongation of the time interval over which data are provided. For example, if the original time points are 0.1, 0.2, 0.3, 0.4, 0.5, a new data point will occur at some time later than 0.5, say 0.6. Functional data analysis, on the other hand, is based on an enhanced capacity for finer sampling. In the same example, instead of extending the original sampling range from [0.1, 0.5] to [0.1, 0.6], one captures more points in the same interval [0.1, 0.5], say 0.15. This increasing sampling capacity is sometimes referred to as micro dynamics, in contrast to the macro dynamics of vector-based analysis. Because animals may live in a limited space in nature, or may be confined to a laboratory setting, and have a finite lifetime, increasing the sample size by enlarging the space (or time) covered is less desirable than increasing the sampling rate (in time or space). Second, this refinement relates naturally to infinitesimal calculus, so functions are more appropriate than their finite-dimensional vector counterparts.
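The contrast between the two sampling paradigms can be made concrete with a small numerical sketch. The curve below is a hypothetical smooth stand-in for an activity record, and all function names are ours. The point is only that adding points inside the fixed interval [0.1, 0.5] (micro dynamics) steadily improves the approximation of a functional quantity, here the integral of the curve over that interval, whereas extending the grid past 0.5 (macro dynamics) adds information about a different stretch of time.

```python
import math


def expanding_grid(n, start=0.1, step=0.1):
    """Macro dynamics: keep the spacing fixed and add points past the end."""
    return [round(start + step * k, 10) for k in range(n)]


def infill_grid(n, a=0.1, b=0.5):
    """Micro dynamics: keep [a, b] fixed and sample it ever more finely."""
    h = (b - a) / (n - 1)
    return [a + h * k for k in range(n)]


def trapezoid(f, grid):
    """Trapezoidal approximation of the integral of f over the grid's span."""
    return sum((t1 - t0) * (f(t0) + f(t1)) / 2 for t0, t1 in zip(grid, grid[1:]))


def curve(t):
    """Hypothetical smooth curve standing in for an activity record."""
    return math.sin(2 * math.pi * t) ** 2


# Exact integral of sin^2(2*pi*t) over [0.1, 0.5].
EXACT = 0.2 + math.sin(0.4 * math.pi) / (8 * math.pi)

if __name__ == "__main__":
    print(expanding_grid(6))  # one extra point at 0.6, outside [0.1, 0.5]
    for n in (5, 9, 17, 33):  # infill: 0.15, 0.25, ... are added inside the interval
        err = abs(trapezoid(curve, infill_grid(n)) - EXACT)
        print(n, f"{err:.2e}")
```

Under infill sampling the error shrinks roughly fourfold each time the spacing is halved, which is the sense in which more data means a better view of the same curve.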
Wavelets are therefore used to model temporal activity data as curves in the context of FDA, which deals with functional responses, as when units are observed over time. FDA theory provides the means for testing hypotheses based on curves rather than on scalar and vector-valued data [18] [36]. In particular, we use functional analysis of variance (FANOVA), the analogue of ANOVA for functional data [37]. In FANOVA for two functional samples with a common covariance function, one tests the null hypothesis that the mean functions are equal against the alternative that they differ [37]. Our approach is illustrated with activity data obtained experimentally for a small neotropical marsupial, the gracile mouse opossum Gracilinanus microtarsus. We chose this species as a reference system because relevant ecological data are available, including demography and life history [38] - [43]. Survival rate for females remains fairly constant in the pre- and post-mating periods, whereas male survival rate decreases significantly in the post-mating period due to stress from aggressive interactions between males [41] [44]. Adult males (30 - 45 g) are much heavier than adult females (20 - 30 g [41] [42]). Body size has a positive, linear association with the area traversed by an individual in its normal activities of food gathering, mating and caring for the young [45] [46] [47]. As expected, the area traversed in nature is larger and more variable in males than in females [48]. Therefore, we expect activity patterns to differ between male and female gracile mouse opossums. Specifically, we test the null hypothesis of homogeneity of the mean activity curves of males and females. To the best of our knowledge, this is the first attempt to combine wavelet modelling of curves derived from activity data with functional data analysis in the framework of functional analysis of variance.
2. Theory
Ramsay et al. (2005) [36] provide a general framework for the analysis of functional data, which essentially relies on regularity conditions on the underlying curves. Wavelets are especially suited to providing regular estimates through multiscale shrinkage [18]. We refer to Kist and Pinheiro (2015) [20] for a detailed development of wavelet functional data analysis with dependent errors.
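Wavelets are elements of specially built bases of the space of square-integrable functions. This means the following. If $f$ is a function such that $\int f^2(t)\,dt$ is finite and $\{\phi_{0k}, \psi_{jk}\}$ is a wavelet basis for the space of square-integrable functions, with $\phi_{0k}(t) = \phi(t - k)$ and $\psi_{jk}(t) = 2^{j/2}\psi(2^j t - k)$, there are constants $c_{0k}$ and $d_{jk}$ such that $f$ can be written as:

$$f(t) = \sum_{k} c_{0k}\,\phi_{0k}(t) + \sum_{j \ge 0}\sum_{k} d_{jk}\,\psi_{jk}(t).$$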
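There are many different wavelet bases. Their main characteristic is that all elements of a wavelet basis have the same form, differing only in location and/or dilation. This self-similar structure is what equips wavelet analysis with a fast transform algorithm: the discrete wavelet transform of $n$ equally spaced observations can be computed by a pyramid of filtering steps in $O(n)$ operations.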
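A minimal sketch of the pyramid algorithm follows, using the Haar wavelet (Daubechies' order-1 basis) only because its two-tap filters fit in a few lines. Haar is not one of the bases used in this study: Symmlets 8, Coiflets 3 and Daubechies 6 run through the same cascade with longer filters and boundary handling. The function names are ours. Each pass halves the data, so the total work is n + n/2 + ... < 2n operations, and because the transform is orthonormal the sum of squared coefficients equals the sum of squared samples.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_dwt(x):
    """Full Haar pyramid transform of a signal whose length is a power of two.

    Returns (c0, details): c0 is the single coarsest approximation coefficient
    (as a list) and details[j] holds the detail coefficients at scale j,
    ordered from the coarsest scale (j = 0) to the finest.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    approx, details = list(x), []
    while len(approx) > 1:
        new_approx, detail = [], []
        for a, b in zip(approx[0::2], approx[1::2]):
            new_approx.append((a + b) / SQRT2)  # local average (scaling filter)
            detail.append((a - b) / SQRT2)      # local difference (wavelet filter)
        details.append(detail)
        approx = new_approx  # next pass works on half as many values
    details.reverse()  # coarsest scale first
    return approx, details


def haar_idwt(c0, details):
    """Invert haar_dwt, rebuilding the signal from coarse to fine."""
    approx = list(c0)
    for detail in details:
        nxt = []
        for a, d in zip(approx, detail):
            nxt.append((a + d) / SQRT2)
            nxt.append((a - d) / SQRT2)
        approx = nxt
    return approx


if __name__ == "__main__":
    x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
    c0, details = haar_dwt(x)
    print(c0, details)
    print(haar_idwt(c0, details))  # recovers x up to rounding
```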
In what follows, we give a general presentation of this procedure as needed for the animal temporal activity data set.
Each observation is composed of $n$ time-point evaluations of a function of interest $f_i$, $i = 1, \dots, I$, such that

$$dY_{il}(t) = f_i(t)\,dt + \sigma\, dW_{il}(t), \qquad (1)$$

where $\sigma$ is the diffusion parameter, $f_i$ is a deterministic unknown function and the $W_{il}$ are either independent standard Brownian motions (the independent and identically distributed case) or independent continuous-time autoregressive moving average (CTARMA) processes (the dependent case) [20], for $i = 1, \dots, I$, $l = 1, \dots, n_i$. We call $\mathbf{Y}_{i1}, \dots, \mathbf{Y}_{in_i}$ the $n$-dimensional observations in group $i$.
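To make model (1) concrete, the sketch below simulates one record on an equally spaced grid in the discretely sampled form $y_k = f(t_k) + \sigma e_k$, a common discretized analogue of (1). This discrete form, the test curve, and the names `simulate_record` and `ar1_coefficient` are our assumptions for illustration. For the dependent case we use the simplest CTARMA process, a CAR(1) (Ornstein-Uhlenbeck) process, which when sampled every $\Delta$ time units yields a stationary AR(1) sequence with coefficient $\varphi = e^{-a\Delta}$; setting $\varphi = 0$ recovers independent errors.

```python
import math
import random


def ar1_coefficient(a, delta):
    """AR(1) coefficient of a CAR(1) process with rate a > 0 sampled every delta."""
    return math.exp(-a * delta)


def simulate_record(f, n, sigma, phi=0.0, seed=None):
    """Simulate y_k = f(t_k) + sigma * e_k on t_k = k / n, k = 0, ..., n - 1.

    phi = 0 gives i.i.d. N(0, 1) errors (independent case); 0 < phi < 1 gives a
    stationary, unit-variance Gaussian AR(1) error sequence (dependent case).
    """
    if not 0.0 <= phi < 1.0:
        raise ValueError("phi must lie in [0, 1)")
    rng = random.Random(seed)
    innovation_sd = math.sqrt(1.0 - phi * phi)  # keeps Var(e_k) = 1 for every k
    e = rng.gauss(0.0, 1.0)                     # start in the stationary law
    record = []
    for k in range(n):
        if k > 0:
            e = phi * e + innovation_sd * rng.gauss(0.0, 1.0)
        record.append(f(k / n) + sigma * e)
    return record


if __name__ == "__main__":
    bump = lambda t: 3.0 * math.exp(-((t - 0.3) / 0.05) ** 2)  # hypothetical activity peak
    phi = ar1_coefficient(a=0.7, delta=1.0)                      # about 0.50
    y_ind = simulate_record(bump, 1024, sigma=0.5, seed=1)
    y_dep = simulate_record(bump, 1024, sigma=0.5, phi=phi, seed=1)
    print(len(y_ind), len(y_dep), round(phi, 3))
```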
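Suppose $f_i$ belongs to a convenient Besov space [20]. We write the nonlinear wavelet estimators as

$$\hat f_i(t) = \sum_{k} \hat c_{i0k}\,\phi_{0k}(t) + \sum_{j \ge 0}\sum_{k} \hat d^{*}_{ijk}\,\psi_{jk}(t). \qquad (2)$$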
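One can write the $L^2$ norm as $\|\hat f_i\|_2^2 = \sum_k \hat c_{i0k}^{\,2} + \sum_{j \ge 0}\sum_k \big(\hat d^{*}_{ijk}\big)^2$, where $\hat c_{i0k}$ is the estimated approximation coefficient for the $i$-th group, 0-th scale and $k$-th position, while $\hat d^{*}_{ijk}$ is the thresholded wavelet coefficient for the $i$-th group, $j$-th scale and $k$-th position. The hypotheses of interest are

$$H_0: f_1 = f_2 = \cdots = f_I \quad \text{versus} \quad H_1: f_i \neq f_{i'} \text{ for some } i \neq i'.$$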
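A minimal sketch of a nonlinear estimator of the form (2), and of the coefficient form of its squared norm, for a single curve is given below. Several choices are ours and are not the procedure of [20]: the Haar basis instead of Symmlets, Coiflets or Daubechies; soft thresholding with the universal threshold $\hat\sigma\sqrt{2\log n}$; $\hat\sigma$ taken as the median absolute finest-level coefficient divided by 0.6745; and independent errors. With an orthonormal discrete transform of $n$ samples, the sum of squared (thresholded) coefficients divided by $n$ equals the grid average of $\hat f^2$, a Riemann approximation of $\|\hat f\|_2^2$ on $[0, 1]$. Function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_dwt(x):
    """Haar pyramid: returns ([c0], details) with details ordered coarse to fine."""
    approx, details = list(x), []
    while len(approx) > 1:
        details.append([(a - b) / SQRT2 for a, b in zip(approx[0::2], approx[1::2])])
        approx = [(a + b) / SQRT2 for a, b in zip(approx[0::2], approx[1::2])]
    return approx, details[::-1]


def haar_idwt(c0, details):
    """Invert haar_dwt."""
    approx = list(c0)
    for level in details:
        approx = [v for a, d in zip(approx, level) for v in ((a + d) / SQRT2, (a - d) / SQRT2)]
    return approx


def soft(w, lam):
    """Soft thresholding: shrink w toward zero by lam; values below lam become zero."""
    return math.copysign(max(abs(w) - lam, 0.0), w)


def wavelet_estimate(y):
    """Soft-thresholded Haar estimate of f from n = 2**J equally spaced samples y.

    Returns (f_hat at the sample points, approximation of ||f_hat||_2^2 on [0, 1]).
    """
    n = len(y)
    if n < 2 or n & (n - 1):
        raise ValueError("length must be a power of two, at least 2")
    c0, details = haar_dwt(y)
    finest = sorted(abs(d) for d in details[-1])
    m = len(finest)
    med = finest[m // 2] if m % 2 else 0.5 * (finest[m // 2 - 1] + finest[m // 2])
    sigma_hat = med / 0.6745                        # robust noise scale
    lam = sigma_hat * math.sqrt(2.0 * math.log(n))  # universal threshold
    shrunk = [[soft(d, lam) for d in level] for level in details]
    f_hat = haar_idwt(c0, shrunk)
    sq_norm = (c0[0] ** 2 + sum(d * d for level in shrunk for d in level)) / n
    return f_hat, sq_norm
```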
The dual relationship between Besov spaces and wavelet bases leads to a natural change in the hypotheses being tested. Instead of testing $H_0$ vs $H_1$ directly, a slight formal change is made, as proposed in [49] for independent errors. A two-step procedure is developed by Kist and Pinheiro (2015) [20] for dependent errors. The test statistic is compared to the cut-off point, and the decision (reject $H_0$ or not) can then be made.
Although these tests and hypotheses are mathematically different from the hypotheses stated above, for all applied purposes they lead to the same interpretation. Whenever the empirical evidence leads to the rejection of $H_0$, one can conclude that the data provide statistical evidence that at least two of the functions $f_1, \dots, f_I$ are different. For instance, in our case $I = 2$, and $f_1$ and $f_2$ are the underlying functional behaviors of female and male specimens, respectively. Thus, rejecting $H_0$ means that the mean time-curves of female and male behavior are statistically different. On the other hand, whenever $H_0$ is not rejected, there is not enough empirical evidence that the groups have different underlying functions. Again, in our case this would be interpreted as the data not providing statistically significant evidence that male and female specimens differ in their temporal activity behavior.
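The logic of the two-group comparison can be illustrated with a generic permutation test on the squared $L^2$ distance between the two group mean curves. This is a substitute chosen for illustration, not the two-step wavelet test of Kist and Pinheiro [20] used in this study. It assumes the individual curves have already been estimated on a common grid and that they are exchangeable between groups under $H_0$. The function names are ours.

```python
import random


def mean_curve(curves):
    """Pointwise mean of equally long curves."""
    m = len(curves[0])
    return [sum(c[t] for c in curves) / len(curves) for t in range(m)]


def sq_l2_distance(u, v):
    """Grid approximation of the squared L2 distance between two curves on [0, 1]."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)


def permutation_fanova(group1, group2, n_perm=999, seed=0):
    """Return (observed statistic, Monte Carlo permutation p-value) for H0: f1 = f2."""
    rng = random.Random(seed)
    observed = sq_l2_distance(mean_curve(group1), mean_curve(group2))
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabeling of whole curves under H0
        stat = sq_l2_distance(mean_curve(pooled[:n1]), mean_curve(pooled[n1:]))
        exceed += stat >= observed
    return observed, (exceed + 1) / (n_perm + 1)
```

With 7 female and 6 male curves there are only 1716 distinct ways to split the 13 curves into groups of 7 and 6, so an exact version could enumerate all of them instead of sampling relabelings at random.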
3. Materials and Methods
The specimens of the gracile mouse opossum (G. microtarsus) were live-trapped in a savannah-like habitat in the city of Mogi-Guaçu in the state of São Paulo using Sherman live traps (7.5 × 9.0 × 23.5 cm) baited with banana and peanut butter. Captured individuals were marked with a numbered ear tag, and their sex and age were recorded. In the laboratory, gracile mouse opossums were housed individually in acrylic boxes (44 cm width × 33 cm length × 20 cm height) with ad libitum access to food (commercial cat and dog chow) and water, and kept under artificial conditions of light (12-h light/dark cycle, lights on at 7:00 A.M.) and temperature
.
To assess spontaneous locomotor activity of the gracile mouse opossum, we used an automated motor activity monitor (Acti-Track v2.7.10, PanLab, S.L. Instrument, Barcelona, Spain [50]). The apparatus consists of a transparent Perspex box (45 × 45 cm base, 35 cm height) connected to a photoelectric cell; locomotor activity is detected by light beam breaks. Thirty-two infrared beams, 16 on each of two perpendicular walls, were mounted 3 cm above the floor of the box frame and connected to an interface (LE 8811, LSI Letica Scientific Instruments, Barcelona, Spain), and data were sent to a computer. Locomotor activity was thus assessed as the rate of light beam breaks during the experiment [51]. Whenever the mouse opossum moved vertically or horizontally, the resulting rate of light beam breaks was recorded as the activity variable; hence, the higher the rate of light beam breaks, the higher the activity of the individual.
At the beginning of the experiment, each gracile mouse opossum was placed in the Perspex box and allowed to explore freely for 24 hours to habituate before testing. Testing in the actimeter was done in an isolated room between 18:00 and 06:00, which corresponds to the active period of gracile mouse opossums in the wild. After each experiment, the Perspex box was carefully cleaned with a 5% ethanol solution.
Permission for animal collection was provided by SISBIO (Sistema de Autorização e Informação em Biodiversidade). Animal housing and experimental procedures were approved by the Comissão de Ética no Uso de Animais, Universidade Estadual de Campinas.
Experimental settings such as those used in our study provide relevant information not only for the study of animal temporal activity patterns but also for several areas of ecology and behavior, including, for example, the association between social and sexual preferences and genetic variation at microsatellite loci [52], the modulation of vocalization by hormones [53], and the link between heritable neuroendocrine variation and male sexual behavior [54].
The data analysis was performed as follows. Twelve hours were selected from the continuously observed curves. Three families of wavelet bases were employed on the data: Symmlets, Coiflets and Daubechies. Preliminary analyses led to one smoothness parameter for each family: Symmlets 8, Coiflets 3 and Daubechies 6. These are different wavelet bases, which means that the aforementioned functions $\phi$ and $\psi$ differ for each of them. We could write $(\phi^{\mathrm{sym8}}, \psi^{\mathrm{sym8}})$, $(\phi^{\mathrm{coif3}}, \psi^{\mathrm{coif3}})$ and $(\phi^{\mathrm{db6}}, \psi^{\mathrm{db6}})$ for each case, respectively.
The data set comprised temporal activity curves from 12 hours of data acquisition, with data recorded every second, for 6 males and 7 females of G. microtarsus. This data set was analyzed using the proposed wavelet model with $I = 2$ groups: females and males. We then estimated $f_1$ (females) and $f_2$ (males).
4. Results
Figure 1 shows three estimators for each sex, based on three different wavelet bases: Symmlets 8, Coiflets 3, and Daubechies 6. The data are shown in gray. The respective estimators for the FANOVA model based on independent errors are shown in blue, and the estimators for the FANOVA model based on dependent errors are shown in red. Without loss of generality, time is transformed to $[0, 1]$. One can notice the differences between the observations $\mathbf{Y}_{il}$ and the estimates $\hat f_i$ (see Equations (1) and (2) above). Moreover, the proposed dependent estimators are much more regularized than the previously available wavelet estimators.
Numerical results did not differ much among the bases chosen (shown in Figure 1). However, the Coiflet estimates were, in general, visually coarser than those from the other two bases. Daubechies bases were of interest because of their theoretical and numerical properties, whereas Symmlets are the bases most closely related to Daubechies and are the most nearly symmetric. We should point out that, apart from the Haar wavelet, no orthonormal wavelet basis is both compactly supported and symmetric [21].
Figure 1. Temporal activity curves analyzed using the proposed wavelet model: three estimators for each sex, each based on a different wavelet basis (Symmlets 8, Coiflets 3, and Daubechies 6). Data are shown in gray; the respective estimators for the FANOVA model based on independent errors are shown in blue; and the estimators for the FANOVA model based on dependent errors are shown in red. (a) Regularized Wavelet Mean 12-hour temporal activity curve of female Gracilinanus microtarsus. The wavelet filter is the Symmlet 8, and the regularizing parameters are
and
. (b) Regularized Wavelet Mean 12-hour temporal activity curve of female G. microtarsus. The wavelet filter is the Coiflets 3, and the regularizing parameters are
and
. (c) Regularized Wavelet Mean 12-hour temporal activity curve of female G. microtarsus. The wavelet filter is the Daubechies 6, and the regularizing parameters are
and
. (d) Regularized Wavelet Mean 12-hour temporal activity curve of male G. microtarsus. The wavelet filter is the Symmlet 8, and the regularizing parameters are
and
. (e) Regularized Wavelet Mean 12-hour temporal activity curve of male G. microtarsus. The wavelet filter is the Coiflets 3, and the regularizing parameters are
and
. (f) Regularized Wavelet Mean 12-hour temporal activity curve of male G. microtarsus. The wavelet filter is the Daubechies 6, and the regularizing parameters are
and
.
Either the robust median absolute deviation (MAD) or the standard deviation may be employed to estimate the noise variability. The standard deviation is in general superior to MAD, since the latter yields less regular estimated curves [20]. In our analyses, the choice of the wavelet basis was largely unimportant: every basis led to the same inferential results. Each basis highlights or obscures some local characteristics of the estimated curves, but the test results are the same.
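The two noise-scale estimates mentioned above can be sketched on the finest-level wavelet coefficients, which for a smooth curve are dominated by noise. Using the finest Haar level and the simulated example are our assumptions for illustration; the sketch shows only the two estimators and does not reproduce the regularity comparison reported in [20]. The factor 0.6745 makes the median absolute deviation consistent for the standard deviation under Gaussian noise. Function names are ours.

```python
import math
import random
import statistics


def finest_haar_details(y):
    """Finest-scale Haar detail coefficients (y[2k] - y[2k+1]) / sqrt(2)."""
    return [(a - b) / math.sqrt(2.0) for a, b in zip(y[0::2], y[1::2])]


def noise_scale_mad(d):
    """Median absolute deviation, rescaled to estimate a Gaussian standard deviation."""
    center = statistics.median(d)
    return statistics.median([abs(v - center) for v in d]) / 0.6745


def noise_scale_sd(d):
    """Ordinary sample standard deviation."""
    return statistics.stdev(d)


if __name__ == "__main__":
    rng = random.Random(42)
    n, sigma = 4096, 0.5
    y = [math.sin(2 * math.pi * k / n) + rng.gauss(0.0, sigma) for k in range(n)]
    d = finest_haar_details(y)
    print(round(noise_scale_mad(d), 3), round(noise_scale_sd(d), 3))  # both near 0.5
```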
The data comprised a total of 43,200 observations for each specimen, that is, one observation every second for 12 hours of experiment. The curves were estimated for each sex, and several choices of wavelet basis and/or thresholding were employed. The individual autocorrelation estimates varied from 0.38 to 0.66, whilst the overall autocorrelation estimates for females and males were 0.46 and 0.55, respectively. The maximum difference between any two estimates for the same data set due to the choice of the wavelet basis was not greater than 0.02.
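For reference, the record length follows from 12 h × 3600 s/h = 43,200 one-second readings. The text does not specify how the autocorrelation estimates were computed; as an assumption for illustration, the sketch below uses the ordinary lag-1 sample autocorrelation and, if one further assumes a CAR(1) error process sampled every second, converts an estimate $\rho$ into the implied continuous-time rate $a = -\log(\rho)/\Delta$. Function names are ours.

```python
import math

SECONDS_PER_RECORD = 12 * 60 * 60  # 12 hours at one reading per second = 43,200


def lag1_autocorrelation(x):
    """Ordinary lag-1 sample autocorrelation of a sequence."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[k] - mean) * (x[k + 1] - mean) for k in range(n - 1))
    den = sum((v - mean) ** 2 for v in x)
    return num / den


def car1_rate(rho, delta=1.0):
    """Rate a of a CAR(1) process whose samples every delta have lag-1 correlation rho."""
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must lie in (0, 1)")
    return -math.log(rho) / delta


if __name__ == "__main__":
    for label, rho in (("females", 0.46), ("males", 0.55)):
        print(label, round(car1_rate(rho), 3))
```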
One should note that, as expected, the estimated curve with independent errors was much less regular than its dependent counterpart. This holds for each pair of curves with a fixed wavelet basis and thresholding procedure. Levels 6 - 9 were thresholded to estimate
, while levels 8 - 12 were thresholded in the final curve estimate for either the independent or the dependent model. Figure 1 displays the preliminary results for three families of bases, from which Coiflets give visually less appealing results.
The FANOVA test results using Daubechies 6 and Symmlets 8, with the aforementioned thresholding configuration, all favoured the inequality of $f_1$ and $f_2$. For db6, the test statistics were
and
against respective critical values of 10.54 and 5.8. For sym8, the test statistics were
and
against respective critical values of 10.67 and 5.78.
An extensive numerical study was performed varying the regularity of the Daubechies and Symmlets bases and the thresholding parameters. The general results were that:
1) Test results were robust with respect to the choice of the wavelet basis and thresholding parameters.
2) The chosen basis was the least relevant aspect of the procedure, in its inferential (and numerical) aspects as well as in the visual characteristics of the estimates.
3) The choice of the thresholded levels matters for the test results: the differences between males and females were statistically significant for every estimation procedure, but the values of the test statistics and the cut-off points changed considerably with the thresholded levels.
4) The choice of the thresholded levels influenced the visual characteristics of the estimated curves.
5) Standard deviation was superior to MAD.
5. Discussion
In this study, we used wavelets to model activity data as curves and functional data analysis, in particular functional analysis of variance, to test the hypothesis that the mean activity curves differed between males and females of the small marsupial G. microtarsus. FANOVA rejected the null hypothesis, showing that, despite the common laboratory environment, male and female G. microtarsus did differ in their temporal activity pattern. The statistical differences in the temporal activity curves may be attributable to endogenous factors associated with sex. In fact, male and female G. microtarsus differ in many aspects of their life history and ecology [55]. Male and female G. microtarsus differ most remarkably in their demographics. Survival rate for females is constant during the pre- and post-mating periods. On the other hand, survival rate in males decreases sharply in the post-mating period and is significantly lower than that of females [41]. The decrease in survival rate in males is probably explained by post-mating mortality associated with stress resulting from aggressive behavior and fighting between males during the mating period [41] [44]. This striking difference in life history is possibly associated with differences in activity between males and females. Other aspects of the ecology and life history of G. microtarsus may also play a role in the statistical difference between male and female curves revealed by FANOVA. Sexual dimorphism is very pronounced, with adult males (30 - 45 g) being heavier than adult females (20 - 30 g [41] [42]). The dimorphism in body size arises from males growing much faster than females, as inferred from a Gompertz growth model [48]. Body size has a positive, linear association with home range size, which is the area traversed by an individual in its normal activities of food gathering, mating and caring for the young [45] [46] [47].
In fact, observations in nature showed that home-range size in G. microtarsus was larger and more variable in males (0.14 ± 0.18 ha) than in females (0.12 ± 0.09 ha [48]).
Whereas wavelets have long been applied to problems in behavior and ecology (e.g. [9] [56]), the use of functional data analysis in this context is very promising. For example, functional principal components analysis (FPCA) was recently used to investigate the relationship between a prey species, the sandeel, and its predator, the black-legged kittiwake, in a dynamic marine ecosystem [57]. The authors demonstrated that FPCA was a useful tool to assess spatio-temporal patterns in natural ecosystems, and their study revealed fine-scale details of the interaction between environmental factors, prey behavior and predator foraging behavior. More recently, [58] developed a Bayesian model to study animal movement patterns at different temporal scales within the context of functional data analysis. They applied this model to estimate movement paths and associated movement descriptors of Canada lynx reintroduced to Colorado. In this application, B-splines were used, but the model was general enough to incorporate other basis functions such as Fourier series and wavelets. The approach of [58] seems very promising for revealing details of animal movements with important implications for population dynamics.
We believe that functional data analysis, applied in conjunction with wavelets to model curves derived from temporal activity data, has the potential to provide an important methodological breakthrough in the study of ecology and animal behavior. One final observation regarding the applicability of wavelet techniques such as those proposed here to animal temporal activity data concerns regular sampling. Some studies of important phenomena do not allow for equally spaced data records. Several options are available for adapting the proposed procedure to these cases; among them, padding [21], lifting-scheme wavelets, and other second-generation wavelets can be employed [59].
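One routine use of padding is to extend a record whose length is not a power of two, as the classical pyramid algorithm requires; a 43,200-point record analyzed that way, for example, would need extending to 65,536 = 2^16 points. The sketch below shows three common padding rules. The function names are ours, and which rule suits a given activity record is a modelling choice not addressed in this note; genuinely unequal spacing calls for the lifting and second-generation constructions cited above.

```python
def next_power_of_two(n):
    """Smallest power of two that is >= n (n >= 1)."""
    p = 1
    while p < n:
        p <<= 1
    return p


def pad_to_dyadic(x, mode="symmetric"):
    """Extend x to the next power-of-two length.

    mode="zero" appends zeros, "periodic" wraps around to the start, and
    "symmetric" mirrors the end of the record (x[-1], x[-2], ...). Since the
    next power of two is below 2n, the padding never exceeds the record length.
    """
    x = list(x)
    n = len(x)
    if n == 0:
        raise ValueError("cannot pad an empty record")
    extra = next_power_of_two(n) - n
    if mode == "zero":
        return x + [0.0] * extra
    if mode == "periodic":
        return x + x[:extra]
    if mode == "symmetric":
        return x + x[::-1][:extra]
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    record = [float(k % 7) for k in range(43200)]  # hypothetical 12-h, 1-Hz record
    padded = pad_to_dyadic(record)
    print(len(record), len(padded))
```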
Acknowledgements
We are very grateful to João Del Giudice Neto and Marcos Mecca Pinto of Mogi-Guaçu Biological Reserve for logistical support. We are indebted to Eduardo Guimarães Martins for critical comments that improved the quality of the manuscript.
Funding
Research supported by Fundação de Amparo à Pesquisa do Estado de São Paulo, Brazil (FAPESP) [Aluísio Pinheiro-13/00506-1 and Sérgio F. dos Reis-2005/51353-4]; Conselho Nacional de Desenvolvimento Científico e Tecnológico, Brazil (CNPq) [Aluísio Pinheiro-304512/2011-7, Sérgio F. dos Reis-303544/2011-2 and Barbara Henning-140773/2013-4]; and CAPES [Barbara Henning].