Title: | An R toolbox for analysing animal movement across space and time |
---|---|
Description: | An R toolbox for analysing animal movement across space and time. |
Authors: | Mikkel Roald-Arbøl [aut, cre] |
Maintainer: | Mikkel Roald-Arbøl <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.6.0 |
Built: | 2025-02-02 02:40:56 UTC |
Source: | https://github.com/roaldarbol/animovement |
Calculates and adds a centroid point to movement tracking data. The centroid represents the mean position of selected keypoints at each time point.
add_centroid( data, include_keypoints = NULL, exclude_keypoints = NULL, centroid_name = "centroid" )
data |
A data frame containing movement tracking data with the following required columns:
|
include_keypoints |
Optional character vector specifying which keypoints
to use for centroid calculation. If NULL (default), all keypoints are used
unless |
exclude_keypoints |
Optional character vector specifying which keypoints
to exclude from centroid calculation. If NULL (default), no keypoints are
excluded unless |
centroid_name |
Character string specifying the name for the centroid keypoint (default: "centroid") |
The function calculates the centroid as the mean x and y position of the selected keypoints at each time point for each individual. Keypoints can be selected either by specifying which ones to include (include_keypoints) or which ones to exclude (exclude_keypoints). The resulting centroid is added as a new keypoint to the data frame.
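The calculation described above can be sketched in base R. This is a minimal illustration rather than the package's implementation, and the column names (individual, time, keypoint, x, y, confidence) and the example values are assumptions made for the sketch.

```r
# Hypothetical tracking data: two keypoints for one individual at two time points
tracks <- data.frame(
  individual = "A",
  time       = c(1, 1, 2, 2),
  keypoint   = c("head", "abdomen", "head", "abdomen"),
  x          = c(0, 2, 1, 3),
  y          = c(0, 4, 1, 5),
  confidence = c(0.9, 0.8, 0.95, 0.85)
)

# Mean x and y per individual and time point
centroid <- aggregate(
  cbind(x, y) ~ individual + time,
  data = tracks,
  FUN = mean
)
centroid$keypoint   <- "centroid"
centroid$confidence <- NA_real_  # the centroid has no confidence of its own

# Append the centroid rows as a new keypoint
with_centroid <- rbind(tracks, centroid[, names(tracks)])
```

Restricting the rows of tracks to a subset of keypoints before aggregating would mirror the effect of include_keypoints or exclude_keypoints.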
A data frame with the same structure as the input, but with an additional keypoint representing the centroid. The centroid's confidence values are set to NA.
convert_nan_to_na() for NaN handling in the centroid calculation
## Not run:
# Add centroid using all keypoints
add_centroid(movement_data)

# Calculate centroid using only specific keypoints
add_centroid(movement_data, include_keypoints = c("head", "thorax", "abdomen"))

# Calculate centroid excluding certain keypoints
add_centroid(movement_data,
  exclude_keypoints = c("antenna_left", "antenna_right"),
  centroid_name = "body_centroid")
## End(Not run)
This function aligns two time series by shifting one series relative to the reference based on their cross-correlation. It first finds the optimal lag using find_lag, then applies the shift by padding with NA values as needed.
align_timeseries(signal, reference, max_lag = 5000, normalize = TRUE)
signal |
Time series to align (numeric vector) |
reference |
Reference time series to align against (numeric vector) |
max_lag |
Maximum lag to consider in both directions, in number of samples. If NULL, uses (length of series - 1) |
normalize |
Logical; if TRUE, z-score normalizes both series before computing cross-correlation (recommended for series with different scales) |
A numeric vector of the same length as the input signal, shifted to align with the reference series. NA values are used to pad the beginning or end depending on the direction of the shift.
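The idea of lag-based alignment can be sketched with stats::ccf() from base R. The helper align_sketch below is hypothetical and is not how find_lag or align_timeseries are implemented; it only illustrates picking the lag with the highest cross-correlation and padding with NA.

```r
# Illustrative helper (an assumption, not the package code): shift `signal`
# so that it lines up best with `reference`, padding with NA.
align_sketch <- function(signal, reference, max_lag = 20) {
  # ccf(x, y) estimates the correlation between x[t + k] and y[t] at lag k
  cc <- stats::ccf(signal, reference, lag.max = max_lag, plot = FALSE)
  best_lag <- cc$lag[which.max(cc$acf)]
  n <- length(signal)
  if (best_lag > 0) {
    # signal is delayed: drop its first samples and pad the end
    c(signal[(best_lag + 1):n], rep(NA, best_lag))
  } else if (best_lag < 0) {
    # signal is ahead: pad the start and drop its last samples
    k <- -best_lag
    c(rep(NA, k), signal[1:(n - k)])
  } else {
    signal
  }
}

set.seed(1)
reference <- rnorm(200)
signal <- c(rep(0, 5), reference[1:195])  # reference delayed by 5 samples
aligned <- align_sketch(signal, reference)
```

A noise-like series is used here because its cross-correlation has a single sharp peak; for smooth periodic signals the peak is broader and the estimated lag can be off by a sample or so.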
# Create two artificially shifted sine waves
t <- seq(0, 10, 0.1)
reference <- sin(t)
signal <- sin(t - 0.5) # Signal delayed by 0.5 units

# Align the delayed signal with the reference
aligned <- align_timeseries(signal, reference)

# Plot to verify alignment
plot(t, reference, type = "l", col = "black")
lines(t, aligned, col = "red", lty = 2)
Calculates kinematic measurements including translational and rotational motion from position data. The function computes velocities, accelerations, and angular measurements from x-y coordinate time series data.
calculate_kinematics(data, by = NULL)
data |
A data frame containing at minimum:
|
by |
Character vector specifying additional grouping variables (optional).
If the input data frame is already grouped, those groups will be preserved
and any additional groups specified in |
A data frame containing the original data plus calculated kinematics:
distance: Distance traveled between consecutive points
v_translation: Translational velocity
a_translation: Translational acceleration
direction: Movement direction in radians
rotation: Angular change between consecutive points
v_rotation: Angular velocity
a_rotation: Angular acceleration
Time points should be regularly sampled for accurate derivatives.
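As a rough guide to what these columns mean, the sketch below derives them with simple backward differences in base R. The differencing scheme, the placement of NA values, and the wrapping of rotation into the range -pi to pi are assumptions for illustration and may differ from what calculate_kinematics does internally.

```r
# Hypothetical track: two steps to the right, then two steps up
df <- data.frame(time = 0:4, x = c(0, 1, 2, 2, 2), y = c(0, 0, 0, 1, 2))

dt <- c(NA, diff(df$time))
dx <- c(NA, diff(df$x))
dy <- c(NA, diff(df$y))

# Translational motion
df$distance      <- sqrt(dx^2 + dy^2)
df$v_translation <- df$distance / dt
df$a_translation <- c(NA, diff(df$v_translation)) / dt

# Rotational motion: heading, then change in heading wrapped to [-pi, pi]
df$direction  <- atan2(dy, dx)
d_dir         <- c(NA, diff(df$direction))
df$rotation   <- atan2(sin(d_dir), cos(d_dir))
df$v_rotation <- df$rotation / dt
df$a_rotation <- c(NA, diff(df$v_rotation)) / dt
```

In this example the only non-zero rotation is the quarter turn (pi/2) where the track changes from moving right to moving up.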
# Basic usage with just x-y coordinates
df <- data.frame(
  time = 1:10,
  x = runif(10),
  y = runif(10)
)
calculate_kinematics(df)

# Using with grouping variables
df_grouped <- data.frame(
  time = rep(1:5, 2),
  x = runif(10),
  y = runif(10),
  individual = rep(c("A", "B"), each = 5)
)
calculate_kinematics(df_grouped, by = "individual")
Calculates the instantaneous speed from x, y coordinates and time data. Speed is computed as the absolute magnitude of velocity (change in position over time).
calculate_speed(x, y, time)
x |
Numeric vector of x coordinates |
y |
Numeric vector of y coordinates |
time |
Numeric vector of time values |
Numeric vector of speeds. The first value will be NA since speed requires two positions to calculate.
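The definition above amounts to displacement divided by elapsed time. A minimal base R sketch, with the hypothetical name speed_sketch and assuming a backward difference, looks like this:

```r
speed_sketch <- function(x, y, time) {
  # Distance between consecutive samples divided by the time between them;
  # the leading NA reflects that the first sample has no predecessor
  c(NA_real_, sqrt(diff(x)^2 + diff(y)^2) / diff(time))
}

speed_sketch(x = c(0, 3, 3), y = c(0, 4, 4), time = c(0, 1, 2))
# NA 5 0
```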
## Not run:
# Inside dplyr pipeline
data |>
  group_by(keypoint) |>
  mutate(speed = calculate_speed(x, y, time))
## End(Not run)
Calculate summary statistics for tracks
calculate_statistics( data, measures = "median_mad", straightness = c("A", "B", "C", "D") )
data |
A kinematics data frame |
measures |
Measures of central tendency and dispersion. Options are |
straightness |
Which method to use for calculating path straightness. Choose between "A" (default), "B", "C"... or a combination (e.g. c("A", "B")). See description for details about the different calculations. |
A data frame with kinematics calculated
This function generates histograms showing the distribution of confidence values for each keypoint in the dataset.
check_confidence(data)
data |
A data frame containing at least the columns |
Each keypoint in the dataset is assigned its own histogram, showing the frequency of different confidence values.
Confidence values are grouped and visualized using the subplot_confidence function.
The combined plots use patchwork for alignment and styling.
A patchwork object combining histograms for each keypoint, visualizing the confidence value distributions.
library(dplyr)
library(patchwork)

data <- dplyr::tibble(
  keypoint = rep(c("head", "arm", "leg", "torso"), each = 10),
  confidence = runif(40, min = 0, max = 1)
)

# Generate histograms of confidence distributions
check_confidence(data)
This function generates a plot showing the distribution of gap sizes (consecutive NA values) in the data, either aggregated or broken down by keypoints.
check_na_gapsize(data, limit = 10, include_total = TRUE, by_keypoint = TRUE)
data |
A data frame containing at least the columns |
limit |
An integer specifying the maximum gap size to include in the plot. Default is 10. |
include_total |
Logical. If |
by_keypoint |
Logical. If |
The plot highlights the most common gap sizes in the data, ordered by frequency.
Different colors represent the occurrence (indianred), total counts (steelblue), and border outlines (black).
The function uses patchwork to combine multiple plots when by_keypoint = TRUE.
A patchwork object combining one or more ggplots that visualize the occurrence of gap sizes (consecutive NAs) in the data.
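For readers who want the numbers behind such a plot, gap sizes can be counted with run-length encoding in base R. This is a small sketch of the concept and not necessarily how check_na_gapsize computes them.

```r
x <- c(NA, NA, 3, NA, 5, 6, NA, NA, NA, 10)

# Run-length encode the NA indicator; each run of TRUE is one gap
runs <- rle(is.na(x))
gap_sizes <- runs$lengths[runs$values]

# How often each gap size occurs
gap_counts <- table(gap_sizes)
```

Here gap_sizes holds 2, 1 and 3, so each gap size occurs once.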
library(dplyr)
library(ggplot2)
library(patchwork)

data <- dplyr::tibble(
  x = c(NA, NA, 3, NA, 5, 6, NA, NA, NA, 10),
  keypoint = factor(rep(c("head", "arm"), each = 5))
)

check_na_gapsize(data, limit = 5, include_total = TRUE, by_keypoint = TRUE)
This function generates a plot to visualize where missing values (NAs) occur in the data over time. It can display separate plots for each keypoint or a single aggregated plot for all keypoints.
check_na_timing(data, by_keypoint = TRUE)
data |
A data frame containing at least the columns |
by_keypoint |
Logical. If |
Missing values are highlighted using a red (indianred2) color, and non-missing values are shown in blue (steelblue).
The function uses the patchwork package to combine multiple plots when by_keypoint = TRUE.
A patchwork object combining one or more ggplots that show the timing of missing values (NA) in the data.
library(dplyr)
library(ggplot2)
library(patchwork)

data <- dplyr::tibble(
  x = c(1, 2, NA, 4, NA, 6),
  individual = rep("A", 6),
  keypoint = factor(rep(c("head", "arm"), each = 3))
)

check_na_timing(data, by_keypoint = TRUE)
This function generates visualizations of the distances from each keypoint to a calculated centroid in the data. By default, it produces histograms of the distance distributions, but it can also create confidence plots if specified.
check_pose(data, reference_keypoint, type = "histogram")
data |
A data frame containing at least the columns |
reference_keypoint |
The keypoint used as a reference to calculate the distance. |
type |
Character string specifying the type of plot to create. Options are:
|
The centroid is computed using the add_centroid function and distances are calculated with the calculate_distance_to_centroid function.
The function automatically excludes the centroid itself from the visualizations.
Histograms provide an overview of distance distributions, while confidence plots summarize variability with intervals.
A patchwork object combining plots for each keypoint, visualizing the distances to the centroid.
## Not run:
# Create sample data
data <- dplyr::tibble(
  keypoint = rep(c("head", "arm", "leg", "torso"), each = 10),
  x = rnorm(40, mean = 0, sd = 1),
  y = rnorm(40, mean = 0, sd = 1)
)

# Plot histogram of distances
check_pose(data, reference_keypoint = "head", type = "histogram")

# Plot confidence intervals
check_pose(data, reference_keypoint = "head", type = "confidence")
## End(Not run)
This function analyzes movement tracking data to identify periods of high and low activity by detecting stable periods in the movement data. It returns a binary classification where 1 indicates high activity and 0 indicates low activity.
classify_by_stability( speed, window_size = 30, min_stable_period = 30, tolerance = 0.1, refine_transitions = TRUE, min_low_state_duration = 0, min_high_state_duration = 0, search_window = 90, stability_window = 10, stability_threshold = 0.5, return_type = c("numeric", "factor") )
speed |
Numeric vector of speed or velocity measurements. If velocity is provided, absolute values will be used automatically |
window_size |
Number of measurements to consider when calculating variance (default: 30) |
min_stable_period |
Minimum length required for a stable period (default: 30) |
tolerance |
Tolerance for variance in stable periods (default: 0.1, must be between 0 and 1) |
refine_transitions |
Whether to refine state transitions using stability detection (default: TRUE) |
min_low_state_duration |
Minimum duration for low activity states; shorter periods are merged using majority context (default: 0, no merging) |
min_high_state_duration |
Minimum duration for high activity states; shorter periods are merged using majority context (default: 0, no merging) |
search_window |
How far to look for movement transitions when refining (default: 90) |
stability_window |
Window size for checking if movement has stabilized (default: 10) |
stability_threshold |
Maximum variance allowed in stable state (default: 0.5) |
return_type |
Should the function return "factor" ("high"/"low") or "numeric" (1/0) (default: "numeric") |
The classification process follows these key steps:
Stability Detection:
Identifies stable periods in the movement data
Uses the longest stable period to establish a baseline for low activity
State Classification:
Sets an activity threshold based on the baseline period
Classifies periods that deviate from baseline stability as high activity
Optional Refinement:
If refine_transitions = TRUE, examines transitions between states to find precise start/end points using stability detection
Short duration states can be filtered based on min_low_state_duration and min_high_state_duration parameters using a majority context approach
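The stability detection step can be pictured as a rolling-variance search. The sketch below is only one plausible reading of that step: the helper name longest_stable_run, the max_var parameter, and the use of a plain rolling variance are all assumptions, and the package's actual criteria (for example how tolerance is applied) may differ.

```r
# Hypothetical helper: find the longest stretch of low-variance windows
longest_stable_run <- function(speed, window_size = 5, max_var = 0.1) {
  n <- length(speed)
  if (n < window_size) return(NULL)
  # Variance of each full window starting at position i
  starts <- seq_len(n - window_size + 1)
  roll_var <- vapply(
    starts,
    function(i) stats::var(speed[i:(i + window_size - 1)]),
    numeric(1)
  )
  # A window counts as stable when its variance is at or below max_var
  stable <- !is.na(roll_var) & roll_var <= max_var
  runs <- rle(stable)
  if (!any(runs$values)) return(NULL)
  ends <- cumsum(runs$lengths)
  begins <- ends - runs$lengths + 1
  idx <- which(runs$values)
  best <- idx[which.max(runs$lengths[idx])]
  # First and last window start index of the longest stable run
  c(start = begins[best], end = ends[best])
}

speed <- c(rep(0.1, 20), 5, 9, 2, 7, rep(0.1, 8))
baseline <- longest_stable_run(speed)
```

The samples covered by the returned windows could then serve as the low-activity baseline from which an activity threshold is derived.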
Numeric vector of the same length as input:
1: High activity state
0: Low activity state
NA: Unable to classify (usually due to missing data)
Classifies numeric values into "high" and "low" categories based on a threshold, while enforcing minimum run lengths for both categories. Values exceeding the threshold are classified as "high", others as "low". Short runs that don't meet the minimum length requirement are reclassified into the opposite category.
classify_by_threshold( values, threshold, min_low_frames, min_high_frames, return_type = c("numeric", "factor") )
values |
Numeric vector to be classified |
threshold |
Numeric value used as classification boundary between "high" and "low" |
min_low_frames |
Minimum number of consecutive frames required for a "low" sequence |
min_high_frames |
Minimum number of consecutive frames required for a "high" sequence |
return_type |
Should the function return "factor" ("high"/"low") or "numeric" (1/0) (default: "numeric") |
The classification process occurs in two steps:
Initial classification based on threshold
Reclassification of sequences that don't meet minimum length requirements
The function first processes "low" sequences, then "high" sequences. This order can affect the final classification when there are competing minimum length requirements.
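The two steps can be sketched with run-length encoding in base R. The helper threshold_sketch is hypothetical and returns character labels; how the package treats runs adjacent to NA values is not stated in this reference, so the behaviour here (each NA splits runs and is left untouched) is an assumption.

```r
threshold_sketch <- function(values, threshold, min_low_frames, min_high_frames) {
  # Step 1: initial classification (NA stays NA)
  state <- ifelse(values > threshold, "high", "low")

  # Step 2: flip runs of `target` that are shorter than `min_len`
  flip_short_runs <- function(state, target, min_len) {
    runs <- rle(state)  # rle() treats every NA as its own run
    short <- !is.na(runs$values) & runs$values == target & runs$lengths < min_len
    runs$values[short] <- if (target == "low") "high" else "low"
    inverse.rle(runs)
  }

  # "low" runs are processed first, then "high" runs
  state <- flip_short_runs(state, "low", min_low_frames)
  flip_short_runs(state, "high", min_high_frames)
}

# The single low frame is too short and is absorbed into the high run
threshold_sketch(c(3, 1, 3, 3, 3), threshold = 2.5,
                 min_low_frames = 2, min_high_frames = 2)
```

Because the runs are recomputed after the "low" pass, a short "low" run that gets flipped can merge neighbouring "high" runs into one that now meets the minimum length, which is why the processing order matters.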
A vector of the same length as the input, with values classified as either high or low: numeric (1/0) by default, or a factor ("high"/"low") when return_type = "factor". NA values in input remain NA in output.
# Basic usage
values <- c(1, 1.5, 2.8, 3.2, 3.0, 2.9, 1.2, 1.1)
result <- classify_by_threshold(values, threshold = 2.5,
  min_low_frames = 2, min_high_frames = 3)

# Handling NAs
values_with_na <- c(1, NA, 3, 3.2, NA, 1.2)
result <- classify_by_threshold(values_with_na, threshold = 2.5,
  min_low_frames = 2, min_high_frames = 2)
Identifies periods of high activity in a time series by analyzing peaks and troughs, returning a logical vector marking these periods. The function handles special cases like adjacent peaks and the initial/final sequences.
classify_high_periods(x, peaks, troughs)
x |
numeric vector; the time series values |
peaks |
logical vector; same length as x, TRUE indicates peak positions |
troughs |
logical vector; same length as x, TRUE indicates trough positions |
The function performs the following steps:
Resolves adjacent peaks by keeping only the highest
Handles the initial sequence before the first trough
Handles the final sequence after the last event
Identifies regions between troughs containing exactly one peak
logical vector; TRUE indicates periods of high activity
## Not run:
x <- c(1, 3, 2, 1, 4, 2, 1)
peaks <- c(FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE)
troughs <- c(FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE)
classify_high_periods(x, peaks, troughs)
## End(Not run)
Identifies periods of low activity in a time series by analyzing peaks and troughs, returning a logical vector marking these periods. Low activity periods are defined as regions between consecutive troughs that contain no peaks.
classify_low_periods(peaks, troughs)
peaks |
logical vector; TRUE indicates peak positions |
troughs |
logical vector; same length as peaks, TRUE indicates trough positions |
The function performs the following steps:
Validates input lengths
Initializes all periods as potentially low activity (TRUE)
For each pair of consecutive troughs:
If no peaks exist between them, maintains TRUE for that period
If any peaks exist, marks that period as FALSE (not low activity)
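The steps above can be sketched as a short loop in base R. The helper low_periods_sketch is hypothetical; in particular, treating each span as inclusive of both troughs and leaving samples outside any trough pair as TRUE are assumptions, not documented behaviour.

```r
low_periods_sketch <- function(peaks, troughs) {
  # Validate input lengths
  stopifnot(length(peaks) == length(troughs))
  # Start with every sample marked as potentially low activity
  low <- rep(TRUE, length(peaks))
  trough_idx <- which(troughs)
  # Walk over each pair of consecutive troughs
  for (i in seq_len(max(length(trough_idx) - 1, 0))) {
    span <- trough_idx[i]:trough_idx[i + 1]
    # Any peak inside the span means it is not a low-activity period
    if (any(peaks[span])) low[span] <- FALSE
  }
  low
}

peaks   <- c(FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE)
troughs <- c(TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE)
low_periods_sketch(peaks, troughs)
```

With these inputs the span between the first two troughs contains a peak and is marked FALSE, while the samples after the second trough remain TRUE.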
logical vector; TRUE indicates periods of low activity
peaks <- c(FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE)
troughs <- c(FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE)
classify_low_periods(peaks, troughs)
This function applies a highpass Butterworth filter to a signal using forward-backward filtering (filtfilt) to achieve zero phase distortion. The Butterworth filter is maximally flat in the passband, making it ideal for many signal processing applications.
filter_highpass( x, cutoff_freq, sampling_rate, order = 4, na_action = c("linear", "spline", "stine", "locf", "value", "error"), keep_na = FALSE, ... )
x |
Numeric vector containing the signal to be filtered |
cutoff_freq |
Cutoff frequency in Hz. Frequencies above this value are passed, while frequencies below are attenuated. Should be between 0 and sampling_rate/2. |
sampling_rate |
Sampling rate of the signal in Hz. Must be at least twice the highest frequency component in the signal (Nyquist criterion). |
order |
Filter order (default = 4). Controls the steepness of frequency rolloff: - Higher orders give sharper cutoffs but may introduce more ringing - Lower orders give smoother transitions but less steep rolloff - Common values in practice are 2-8 - Values above 8 are rarely used due to numerical instability |
na_action |
Method to handle NA values before filtering. One of: - "linear": Linear interpolation (default) - "spline": Spline interpolation for smoother curves - "stine": Stineman interpolation preserving data shape - "locf": Last observation carried forward - "value": Replace with a constant value - "error": Raise an error if NAs are present |
keep_na |
Logical indicating whether to restore NAs to their original positions after filtering (default = FALSE) |
... |
Additional arguments passed to replace_na(). Common options include: - value: Numeric value for replacement when na_action = "value" - min_gap: Minimum gap size to interpolate/fill - max_gap: Maximum gap size to interpolate/fill |
The Butterworth filter response falls off at -6*order dB/octave. The cutoff frequency corresponds to the -3dB point of the filter's magnitude response.
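These two figures can be checked against the textbook magnitude response of a Butterworth highpass filter, |H(f)| = 1 / sqrt(1 + (fc / f)^(2 * order)). The helper below is a hypothetical illustration of that formula for the analog prototype and does not call the package.

```r
# Gain in dB of an ideal Butterworth highpass filter at frequency f
butter_highpass_gain_db <- function(f, cutoff_freq, order) {
  20 * log10(1 / sqrt(1 + (cutoff_freq / f)^(2 * order)))
}

# About -3 dB at the cutoff frequency itself
gain_at_cutoff <- butter_highpass_gain_db(f = 2, cutoff_freq = 2, order = 4)

# Deep in the stopband, halving the frequency (one octave down)
# costs about 6 * order = 24 dB for a 4th-order filter
octave_drop <- butter_highpass_gain_db(0.25, 2, 4) -
  butter_highpass_gain_db(0.5, 2, 4)
```

These values describe a single pass through the filter. Forward-backward filtering applies the magnitude response twice, so the effective attenuation in dB doubles unless an implementation compensates for it.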
Common Applications:
Removing baseline drift: Use low cutoff (0.1-1 Hz)
EMG analysis: Use moderate cutoff (10-20 Hz)
Motion artifact removal: Use application-specific cutoff
Parameter Selection Guidelines:
cutoff_freq: Choose based on the lowest frequency you want to preserve
order: Same guidelines as filter_lowpass
Common values by field:
ECG processing: order=2, cutoff=0.5 Hz
EEG analysis: order=4, cutoff=1 Hz
Mechanical vibrations: order=2, cutoff application-specific
Missing Value Handling: The function uses replace_na() internally for handling missing values. See ?replace_na for detailed information about each method and its parameters. NAs can optionally be restored to their original positions after filtering using keep_na = TRUE.
Numeric vector containing the filtered signal
Butterworth, S. (1930). On the Theory of Filter Amplifiers. Wireless Engineer, 7, 536-541.
replace_na for details on NA handling methods
filter_lowpass for low-pass filtering
butter for Butterworth filter design
filtfilt for zero-phase digital filtering
# Generate example signal with drift
t <- seq(0, 1, by = 0.001)
drift <- 0.5 * t # Linear drift
signal <- sin(2*pi*10*t) # 10 Hz signal
x <- signal + drift

# Add some NAs
x[sample(length(x), 10)] <- NA

# Basic filtering with linear interpolation for NAs
filtered <- filter_highpass(x, cutoff_freq = 2, sampling_rate = 1000)

# Using spline interpolation with max gap constraint
filtered <- filter_highpass(x, cutoff_freq = 2, sampling_rate = 1000,
  na_action = "spline", max_gap = 3)

# Replace NAs with zeros before filtering
filtered <- filter_highpass(x, cutoff_freq = 2, sampling_rate = 1000,
  na_action = "value", value = 0)

# Filter but keep NAs in their original positions
filtered <- filter_highpass(x, cutoff_freq = 2, sampling_rate = 1000,
  na_action = "linear", keep_na = TRUE)
This function implements a highpass filter using the Fast Fourier Transform (FFT). It provides a sharp frequency cutoff but may introduce ringing artifacts (Gibbs phenomenon).
filter_highpass_fft( x, cutoff_freq, sampling_rate, na_action = c("linear", "spline", "stine", "locf", "value", "error"), keep_na = FALSE, ... )
x |
Numeric vector containing the signal to be filtered |
cutoff_freq |
Cutoff frequency in Hz. Frequencies above this value are passed, while frequencies below are attenuated. Should be between 0 and sampling_rate/2. |
sampling_rate |
Sampling rate of the signal in Hz. Must be at least twice the highest frequency component in the signal (Nyquist criterion). |
na_action |
Method to handle NA values before filtering. One of: - "linear": Linear interpolation (default) - "spline": Spline interpolation for smoother curves - "stine": Stineman interpolation preserving data shape - "locf": Last observation carried forward - "value": Replace with a constant value - "error": Raise an error if NAs are present |
keep_na |
Logical indicating whether to restore NAs to their original positions after filtering (default = FALSE) |
... |
Additional arguments passed to replace_na(). Common options include: - value: Numeric value for replacement when na_action = "value" - min_gap: Minimum gap size to interpolate/fill - max_gap: Maximum gap size to interpolate/fill |
FFT-based filtering applies a hard cutoff in the frequency domain. This can be advantageous for:
Precise frequency selection
Batch processing of long signals
Cases where sharp frequency cutoffs are desired
Common Applications:
Removing baseline drift: Use low cutoff (0.1-1 Hz)
EMG analysis: Use moderate cutoff (10-20 Hz)
Motion artifact removal: Use application-specific cutoff
Limitations:
May introduce ringing artifacts
Assumes periodic signal (can cause edge effects)
Less suitable for real-time processing
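A hard frequency-domain cutoff of this kind can be sketched with stats::fft() from base R. The helper fft_highpass_sketch is hypothetical, assumes the input has no missing values, and is not the package's implementation.

```r
fft_highpass_sketch <- function(x, cutoff_freq, sampling_rate) {
  n <- length(x)
  spectrum <- stats::fft(x)
  # Frequency of each FFT bin, folded so that the upper half of the
  # spectrum maps onto the matching positive frequencies
  k <- 0:(n - 1)
  freqs <- pmin(k, n - k) * sampling_rate / n
  # Hard cutoff: zero every bin below the cutoff, including the DC term
  spectrum[freqs < cutoff_freq] <- 0
  # R's inverse FFT is unnormalized, hence the division by n
  Re(stats::fft(spectrum, inverse = TRUE)) / n
}

t <- (0:999) / 1000               # 1 s sampled at 1000 Hz
x <- sin(2 * pi * 10 * t) + 3     # 10 Hz signal plus a constant offset
filtered <- fft_highpass_sketch(x, cutoff_freq = 2, sampling_rate = 1000)
```

Because this example signal is exactly periodic over the window, the offset is removed cleanly. A non-periodic component such as a linear drift leaks across many bins, which is where the edge effects listed above come from.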
Missing Value Handling: The function uses replace_na() internally for handling missing values. See ?replace_na for detailed information about each method and its parameters. NAs can optionally be restored to their original positions after filtering using keep_na = TRUE.
Numeric vector containing the filtered signal
replace_na for details on NA handling methods
filter_lowpass_fft for FFT-based low-pass filtering
filter_highpass for Butterworth-based filtering
# Generate example signal with drift
t <- seq(0, 1, by = 0.001)
drift <- 0.5 * t # Linear drift
signal <- sin(2*pi*10*t) # 10 Hz signal
x <- signal + drift

# Add some NAs
x[sample(length(x), 10)] <- NA

# Basic filtering with linear interpolation for NAs
filtered <- filter_highpass_fft(x, cutoff_freq = 2, sampling_rate = 1000)

# Using spline interpolation with max gap constraint
filtered <- filter_highpass_fft(x, cutoff_freq = 2, sampling_rate = 1000,
  na_action = "spline", max_gap = 3)

# Replace NAs with zeros before filtering
filtered <- filter_highpass_fft(x, cutoff_freq = 2, sampling_rate = 1000,
  na_action = "value", value = 0)

# Filter but keep NAs in their original positions
filtered <- filter_highpass_fft(x, cutoff_freq = 2, sampling_rate = 1000,
  na_action = "linear", keep_na = TRUE)

# Compare with Butterworth filter
butter_filtered <- filter_highpass(x, 2, 1000)
Implements a Kalman filter for regularly sampled time series data with automatic parameter selection based on sampling rate. The filter handles missing values (NA) and provides noise reduction while preserving real signal changes.
filter_kalman( measurements, sampling_rate, base_Q = NULL, R = NULL, initial_state = NULL, initial_P = NULL )
measurements |
Numeric vector containing the measurements to be filtered. |
sampling_rate |
Numeric value specifying the sampling rate in Hz (frames per second). |
base_Q |
Optional. Process variance. If NULL, automatically calculated based on sampling_rate. Represents expected rate of change in the true state. |
R |
Optional. Measurement variance. If NULL, defaults to 0.1. Represents the noise level in your measurements. |
initial_state |
Optional. Initial state estimate. If NULL, uses first non-NA measurement. |
initial_P |
Optional. Initial state uncertainty. If NULL, calculated based on sampling_rate. |
The function implements a simple Kalman filter with a constant position model. When parameters are not explicitly provided, they are automatically configured based on the sampling rate:
base_Q scales inversely with sampling rate (base_Q ≈ 0.15/sampling_rate)
R defaults to 0.1 (assuming moderate measurement noise)
initial_P scales with sampling rate uncertainty
Missing values (NA) are handled by relying on the prediction step without measurement updates.
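A constant position Kalman filter of this kind fits in a few lines of base R. The helper kalman_sketch is a hypothetical illustration: the process variance follows the 0.15/sampling_rate rule quoted above, while initial_P = 1 is an arbitrary assumption because the package's formula for it is not given here.

```r
kalman_sketch <- function(measurements, Q, R = 0.1, initial_P = 1) {
  n <- length(measurements)
  state <- measurements[which(!is.na(measurements))[1]]  # first non-NA value
  P <- initial_P
  out <- numeric(n)
  for (i in seq_len(n)) {
    # Predict: the state is assumed constant, its uncertainty grows by Q
    P <- P + Q
    if (!is.na(measurements[i])) {
      # Update: blend prediction and measurement using the Kalman gain
      K <- P / (P + R)
      state <- state + K * (measurements[i] - state)
      P <- (1 - K) * P
    }
    # For NA measurements the prediction is carried forward unchanged
    out[i] <- state
  }
  out
}

measurements <- c(1, 1.1, NA, 0.9, 1.2, NA, 0.8, 1.1)
filtered <- kalman_sketch(measurements, Q = 0.15 / 60)
```

The gain K = P / (P + R) shows why the tuning advice works: a larger R or a smaller Q lowers the gain, so each new measurement moves the estimate less.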
A numeric vector of the same length as measurements containing the filtered values.
Parameter selection guidelines:
Increase R or decrease base_Q for smoother output
Decrease R or increase base_Q for more responsive output
For high-frequency data (>100 Hz), consider reducing base_Q
If you know your sensor's noise characteristics, set R to the square of the standard deviation
filter_kalman_irregular for handling irregularly sampled data
# Basic usage with 60 Hz data
measurements <- c(1, 1.1, NA, 0.9, 1.2, NA, 0.8, 1.1)
filtered <- filter_kalman(measurements, sampling_rate = 60)

# Custom parameters for more aggressive filtering
filtered_custom <- filter_kalman(measurements, sampling_rate = 60,
  base_Q = 0.001, R = 0.2)
Implements a Kalman filter for irregularly sampled time series data with optional resampling to regular intervals. Handles variable sampling rates, missing values, and automatically adjusts process variance based on time intervals.
filter_kalman_irregular( measurements, times, base_Q = NULL, R = NULL, initial_state = NULL, initial_P = NULL, resample = FALSE, resample_freq = NULL )
measurements |
Numeric vector containing the measurements to be filtered. |
times |
Numeric vector of timestamps corresponding to measurements. |
base_Q |
Optional. Base process variance per second. If NULL, automatically calculated. |
R |
Optional. Measurement variance. If NULL, defaults to 0.1. |
initial_state |
Optional. Initial state estimate. If NULL, uses first non-NA measurement. |
initial_P |
Optional. Initial state uncertainty. If NULL, calculated from median sampling rate. |
resample |
Logical. Whether to return regularly resampled data (default: FALSE). |
resample_freq |
Numeric. Desired sampling frequency in Hz for resampling (required if resample=TRUE). |
The function implements an adaptive Kalman filter that accounts for irregular sampling intervals. Process variance is scaled by the time difference between measurements, allowing proper uncertainty handling for variable sampling rates.
Key features:
Handles irregular sampling intervals
Scales process variance with time gaps
Optional resampling to regular intervals
Automatic parameter selection based on median sampling rate
Missing value (NA) handling
When resampling, the function uses linear interpolation and warns if the requested sampling frequency exceeds twice the median original sampling rate, since resampling far above the original rate adds no information and can introduce artifacts.
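The idea of scaling process variance by the time gap, followed by linear interpolation onto a regular grid, can be sketched in base R as follows. This is an illustrative random-walk filter under assumed parameter values, not the package's code; `kalman_irregular_sketch` and the 20 Hz grid are hypothetical choices.

```r
# Scalar Kalman filter whose process variance grows with the time gap (sketch)
kalman_irregular_sketch <- function(z, times, base_Q, R) {
  x_est <- z[which(!is.na(z))[1]]   # first non-NA measurement as initial state
  P <- R                            # initial uncertainty (assumed)
  dt <- c(0, diff(times))           # time since the previous sample
  out <- numeric(length(z))
  for (i in seq_along(z)) {
    P <- P + base_Q * dt[i]         # longer gap, more accumulated process variance
    if (!is.na(z[i])) {
      K <- P / (P + R)
      x_est <- x_est + K * (z[i] - x_est)
      P <- (1 - K) * P
    }
    out[i] <- x_est
  }
  out
}

measurements <- c(1, 1.1, NA, 0.9, 1.2, NA, 0.8, 1.1)
times <- c(0, 0.1, 0.3, 0.35, 0.5, 0.8, 0.81, 1.0)
filtered <- kalman_irregular_sketch(measurements, times, base_Q = 0.01, R = 0.1)

# Linear interpolation of the filtered values onto a regular 20 Hz grid
grid <- seq(min(times), max(times), by = 1 / 20)
resampled <- approx(times, filtered, xout = grid, rule = 2)$y
```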
If resample=FALSE: A numeric vector of filtered values corresponding to original timestamps.
If resample=TRUE: A list containing:
time: Vector of regular timestamps
values: Vector of filtered values at regular timestamps
original_time: Original irregular timestamps
original_values: Filtered values at original timestamps
Resampling considerations:
Avoid resampling above twice the median original sampling rate
Consider the physical meaning of your data when choosing resample_freq
Be cautious of creating artifacts through high-frequency resampling
Parameter selection guidelines:
base_Q controls the expected rate of change per second
R should reflect your measurement noise level
For slow-changing signals, reduce base_Q
For noisy measurements, increase R
filter_kalman for regularly sampled data
# Example with irregular sampling
measurements <- c(1, 1.1, NA, 0.9, 1.2, NA, 0.8, 1.1)
times <- c(0, 0.1, 0.3, 0.35, 0.5, 0.8, 0.81, 1.0)

# Basic filtering with irregular samples
filtered <- filter_kalman_irregular(measurements, times)

# Filtering with resampling to 50 Hz
filtered_resampled <- filter_kalman_irregular(measurements, times, resample = TRUE, resample_freq = 50)

# Plot results
plot(times, measurements, type = "p", col = "blue")
lines(filtered_resampled$time, filtered_resampled$values, col = "red")
This function applies a lowpass Butterworth filter to a signal using forward-backward filtering (filtfilt) to achieve zero phase distortion. The Butterworth filter is maximally flat in the passband, making it ideal for many signal processing applications.
filter_lowpass( x, cutoff_freq, sampling_rate, order = 4, na_action = c("linear", "spline", "stine", "locf", "value", "error"), keep_na = FALSE, ... )
x |
Numeric vector containing the signal to be filtered |
cutoff_freq |
Cutoff frequency in Hz. Frequencies below this value are passed, while frequencies above are attenuated. Should be between 0 and sampling_rate/2. |
sampling_rate |
Sampling rate of the signal in Hz. Must be at least twice the highest frequency component in the signal (Nyquist criterion). |
order |
Filter order (default = 4). Controls the steepness of frequency rolloff: - Higher orders give sharper cutoffs but may introduce more ringing - Lower orders give smoother transitions but less steep rolloff - Common values in practice are 2-8 - Values above 8 are rarely used due to numerical instability |
na_action |
Method to handle NA values before filtering. One of: - "linear": Linear interpolation (default) - "spline": Spline interpolation for smoother curves - "stine": Stineman interpolation preserving data shape - "locf": Last observation carried forward - "value": Replace with a constant value - "error": Raise an error if NAs are present |
keep_na |
Logical indicating whether to restore NAs to their original positions after filtering (default = FALSE) |
... |
Additional arguments passed to replace_na(). Common options include: - value: Numeric value for replacement when na_action = "value" - min_gap: Minimum gap size to interpolate/fill - max_gap: Maximum gap size to interpolate/fill |
The Butterworth filter response falls off at -6*order dB/octave. The cutoff frequency corresponds to the -3dB point of the filter's magnitude response.
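The -3 dB point and the roughly -6*order dB/octave rolloff can be checked numerically from the magnitude response of an ideal analog Butterworth prototype. The sketch below uses only base R; `butter_gain_db` is a hypothetical helper, and it describes a single filter pass (forward-backward filtering applies the magnitude response twice, so the effective response of a zero-phase filter will differ).

```r
# Gain in dB of an ideal analog Butterworth low-pass prototype:
# |H(f)|^2 = 1 / (1 + (f / fc)^(2 * order))
butter_gain_db <- function(f, cutoff_freq, order) {
  -10 * log10(1 + (f / cutoff_freq)^(2 * order))
}

cutoff <- 5
gain_at_cutoff <- butter_gain_db(cutoff, cutoff, order = 4)   # about -3 dB

# Drop across one octave well above the cutoff approaches -6 * order dB
octave_drop <- butter_gain_db(16 * cutoff, cutoff, order = 4) -
  butter_gain_db(8 * cutoff, cutoff, order = 4)               # about -24 dB
```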
Parameter Selection Guidelines:
cutoff_freq: Choose based on the frequency content you want to preserve
sampling_rate: Should match your data collection rate
order:
order=2: Gentle rolloff, minimal ringing (~12 dB/octave)
order=4: Standard choice, good balance (~24 dB/octave)
order=6: Steeper rolloff, some ringing (~36 dB/octave)
order=8: Very steep, may have significant ringing (~48 dB/octave)
Note: For very low cutoff frequencies (<0.001 of Nyquist), order is automatically reduced to 2 to maintain stability.
Common values by field:
Biomechanics: order=2 or 4
EEG/MEG: order=4 or 6
Audio processing: order=2 to 8
Mechanical vibrations: order=2 to 4
Missing Value Handling: The function uses replace_na() internally for handling missing values. See ?replace_na for detailed information about each method and its parameters. NAs can optionally be restored to their original positions after filtering using keep_na = TRUE.
Numeric vector containing the filtered signal
Butterworth, S. (1930). On the Theory of Filter Amplifiers. Wireless Engineer, 7, 536-541.
replace_na for details on NA handling methods
filter_highpass for high-pass filtering
butter for Butterworth filter design
filtfilt for zero-phase digital filtering
# Generate example signal: 2 Hz fundamental + 50 Hz noise
t <- seq(0, 1, by = 0.001)
x <- sin(2*pi*2*t) + 0.5*sin(2*pi*50*t)

# Add some NAs
x[sample(length(x), 10)] <- NA

# Basic filtering with linear interpolation for NAs
filtered <- filter_lowpass(x, cutoff_freq = 5, sampling_rate = 1000)

# Using spline interpolation with max gap constraint
filtered <- filter_lowpass(x, cutoff_freq = 5, sampling_rate = 1000, na_action = "spline", max_gap = 3)

# Replace NAs with zeros before filtering
filtered <- filter_lowpass(x, cutoff_freq = 5, sampling_rate = 1000, na_action = "value", value = 0)

# Filter but keep NAs in their original positions
filtered <- filter_lowpass(x, cutoff_freq = 5, sampling_rate = 1000, na_action = "linear", keep_na = TRUE)
This function implements a lowpass filter using the Fast Fourier Transform (FFT). It provides a sharp frequency cutoff but may introduce ringing artifacts (Gibbs phenomenon).
filter_lowpass_fft( x, cutoff_freq, sampling_rate, na_action = c("linear", "spline", "stine", "locf", "value", "error"), keep_na = FALSE, ... )
x |
Numeric vector containing the signal to be filtered |
cutoff_freq |
Cutoff frequency in Hz. Frequencies below this value are passed, while frequencies above are attenuated. Should be between 0 and sampling_rate/2. |
sampling_rate |
Sampling rate of the signal in Hz. Must be at least twice the highest frequency component in the signal (Nyquist criterion). |
na_action |
Method to handle NA values before filtering. One of: - "linear": Linear interpolation (default) - "spline": Spline interpolation for smoother curves - "stine": Stineman interpolation preserving data shape - "locf": Last observation carried forward - "value": Replace with a constant value - "error": Raise an error if NAs are present |
keep_na |
Logical indicating whether to restore NAs to their original positions after filtering (default = FALSE) |
... |
Additional arguments passed to replace_na(). Common options include: - value: Numeric value for replacement when na_action = "value" - min_gap: Minimum gap size to interpolate/fill - max_gap: Maximum gap size to interpolate/fill |
FFT-based filtering applies a hard cutoff in the frequency domain. This can be advantageous for:
Precise frequency selection
Batch processing of long signals
Cases where sharp frequency cutoffs are desired
Limitations:
May introduce ringing artifacts
Assumes periodic signal (can cause edge effects)
Less suitable for real-time processing
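A hard frequency-domain cutoff of the kind described above can be sketched with base R's `fft()`. This is an assumption-laden illustration (the helper `fft_lowpass_sketch` is hypothetical and ignores NA handling), not the package's implementation.

```r
# Zero every FFT bin above the cutoff, then transform back (sketch)
fft_lowpass_sketch <- function(x, cutoff_freq, sampling_rate) {
  n <- length(x)
  spec <- fft(x)
  k <- 0:(n - 1)
  # Frequency in Hz of each bin; bins in the upper half mirror negative frequencies
  freqs <- pmin(k, n - k) * sampling_rate / n
  spec[freqs > cutoff_freq] <- 0
  Re(fft(spec, inverse = TRUE)) / n   # R's inverse FFT is unnormalised
}

# One full second at 1000 Hz, so 2 Hz and 50 Hz fall exactly on FFT bins
t <- (0:999) / 1000
clean <- sin(2 * pi * 2 * t)
x <- clean + sin(2 * pi * 50 * t)
y <- fft_lowpass_sketch(x, cutoff_freq = 5, sampling_rate = 1000)
```

In this idealised case the signal is exactly periodic within the window, so the 50 Hz component is removed without ringing; signals that do not wrap around cleanly are where the edge effects listed above appear.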
Missing Value Handling: The function uses replace_na() internally for handling missing values. See ?replace_na for detailed information about each method and its parameters. NAs can optionally be restored to their original positions after filtering using keep_na = TRUE.
Numeric vector containing the filtered signal
replace_na for details on NA handling methods
filter_highpass_fft for FFT-based high-pass filtering
filter_lowpass for Butterworth-based filtering
# Generate example signal with mixed frequencies
t <- seq(0, 1, by = 0.001)
x <- sin(2*pi*2*t) + sin(2*pi*50*t)

# Add some NAs
x[sample(length(x), 10)] <- NA

# Basic filtering with linear interpolation for NAs
filtered <- filter_lowpass_fft(x, cutoff_freq = 5, sampling_rate = 1000)

# Using spline interpolation with max gap constraint
filtered <- filter_lowpass_fft(x, cutoff_freq = 5, sampling_rate = 1000, na_action = "spline", max_gap = 3)

# Replace NAs with zeros before filtering
filtered <- filter_lowpass_fft(x, cutoff_freq = 5, sampling_rate = 1000, na_action = "value", value = 0)

# Filter but keep NAs in their original positions
filtered <- filter_lowpass_fft(x, cutoff_freq = 5, sampling_rate = 1000, na_action = "linear", keep_na = TRUE)

# Compare with Butterworth filter
butter_filtered <- filter_lowpass(x, 5, 1000)
Applies smoothing filters to movement tracking data to reduce noise.
filter_movement( data, method = c("rollmedian", "rollmean", "kalman", "sgolay", "lowpass", "highpass", "lowpass_fft", "highpass_fft"), use_derivatives = FALSE, ... )
data |
A data frame containing movement tracking data with the following required columns:
|
method |
Character string specifying the smoothing method. Options:
|
use_derivatives |
Filter on the derivative values instead of coordinates (important for e.g. trackball or accelerometer data) |
... |
Additional arguments passed to the specific filter function |
This function is a wrapper that applies various filtering methods to x and y (and z if present) coordinates. Each filtering method has its own specific parameters - see the documentation of individual filter functions for details:
filter_kalman(): Kalman filter parameters
filter_sgolay(): Savitzky-Golay filter parameters
filter_lowpass(): Low-pass filter parameters
filter_highpass(): High-pass filter parameters
filter_lowpass_fft(): FFT-based low-pass filter parameters
filter_highpass_fft(): FFT-based high-pass filter parameters
filter_rollmean(): Rolling mean parameters (window_width, min_obs)
filter_rollmedian(): Rolling median parameters (window_width, min_obs)
A data frame with the same structure as the input, but with smoothed coordinates.
## Not run:
# Apply rolling median with window of 5
filter_movement(tracking_data, "rollmedian", window_width = 5, min_obs = 1)
## End(Not run)
This function replaces values in columns x, y, and confidence with NA if the confidence values are below a specified threshold.
filter_na_confidence(data, threshold = 0.6)
data |
A data frame containing the columns |
threshold |
A numeric value specifying the minimum confidence level to retain data. Default is 0.6. |
A data frame with the same structure as the input, but where x, y, and confidence values are replaced with NA if the confidence is below the threshold.
library(dplyr)
data <- dplyr::tibble(
  x = 1:5,
  y = 6:10,
  confidence = c(0.5, 0.7, 0.4, 0.8, 0.9)
)
filter_na_confidence(data, threshold = 0.6)
Filters out coordinates that fall outside a specified region of interest by setting them to NA. The ROI can be either rectangular (defined by min/max coordinates) or circular (defined by center and radius).
filter_na_roi( data, x_min = NULL, x_max = NULL, y_min = NULL, y_max = NULL, x_center = NULL, y_center = NULL, radius = NULL )
data |
A data frame containing 'x' and 'y' coordinates |
x_min |
Minimum x-coordinate for rectangular ROI |
x_max |
Maximum x-coordinate for rectangular ROI |
y_min |
Minimum y-coordinate for rectangular ROI |
y_max |
Maximum y-coordinate for rectangular ROI |
x_center |
x-coordinate of circle center for circular ROI |
y_center |
y-coordinate of circle center for circular ROI |
radius |
Radius of circular ROI |
A data frame with coordinates outside ROI set to NA
# Create sample data
sample_data <- expand.grid(
  x = seq(0, 100, by = 10),
  y = seq(0, 100, by = 10)
) |> as.data.frame()

# Rectangular ROI example
sample_data |> filter_na_roi(x_min = 20, x_max = 80, y_min = 20, y_max = 80)

# Circular ROI example
sample_data |> filter_na_roi(x_center = 50, y_center = 50, radius = 25)
This function filters out values in a dataset where the calculated speed exceeds a specified threshold. Values for x, y, and confidence are replaced with NA if their corresponding speed exceeds the threshold. Speed is calculated using the calculate_kinematics function.
filter_na_speed(data, threshold = "auto")
data |
A data frame containing the following required columns:
|
threshold |
A numeric value specifying the speed threshold, or "auto".
|
The speed is calculated using the calculate_kinematics function, which computes translational velocity (v_translation) and other kinematic parameters. When using threshold = "auto", the function calculates the threshold as the mean speed plus three standard deviations, which assumes normally distributed speeds.
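The "mean plus three standard deviations" rule can be illustrated in base R. The sketch below computes speed directly from consecutive positions, which is an assumption for illustration and may differ from how calculate_kinematics derives v_translation; the data are made up, with one deliberate tracking jump.

```r
# Hypothetical track: steady steps of 1 unit with a single jump of 50 units
steps <- rep(1, 20)
steps[10] <- 50
time <- 0:20
x <- c(0, cumsum(steps))
y <- rep(0, 21)

# Speed between consecutive samples (distance / elapsed time)
speed <- sqrt(diff(x)^2 + diff(y)^2) / diff(time)

# Automatic threshold: mean speed plus three standard deviations
threshold <- mean(speed) + 3 * sd(speed)
too_fast <- speed > threshold
which(too_fast)   # index of the step flagged as implausibly fast
```

With very short series the rule cannot flag anything, because a single outlier inflates the standard deviation too much; the example therefore uses 20 steps.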
A data frame with the same columns as the input data, but with values replaced by NA where the speed exceeds the threshold.
## Not run:
data <- dplyr::tibble(
  time = 1:5,
  x = c(1, 2, 4, 7, 11),
  y = c(1, 1, 2, 3, 5),
  confidence = c(0.8, 0.9, 0.7, 0.85, 0.6)
)

# Filter data by a speed threshold of 3
filter_na_speed(data, threshold = 3)

# Use automatic threshold
filter_na_speed(data, threshold = "auto")
## End(Not run)
Applies a rolling mean filter to a numeric vector using the roll package.
filter_rollmean(x, window_width = 5, min_obs = 1, ...)
x |
Numeric vector to filter |
window_width |
Integer specifying window size for rolling calculation |
min_obs |
Minimum number of non-NA values required (default: 1) |
... |
Additional parameters to be passed to |
Filtered numeric vector
Applies a rolling median filter to a numeric vector using the roll package.
filter_rollmedian(x, window_width = 5, min_obs = 1, ...)
x |
Numeric vector to filter |
window_width |
Integer specifying window size for rolling calculation |
min_obs |
Minimum number of non-NA values required (default: 1) |
... |
Additional parameters to be passed to |
Filtered numeric vector
This function applies a Savitzky-Golay filter to smooth movement data while preserving higher moments (peaks, valleys) better than moving average filters. The implementation uses zero-phase filtering to prevent temporal shifts in the data.
filter_sgolay( x, sampling_rate, window_size = ceiling(sampling_rate/10) * 2 + 1, order = 3, preserve_edges = FALSE, na_action = "linear", keep_na = FALSE, ... )
x |
Numeric vector containing the movement data to be filtered |
sampling_rate |
Sampling rate of the data in Hz. Must match your data collection rate (e.g., 60 for 60 FPS motion capture). |
window_size |
Window size in samples (must be odd). Controls the amount of smoothing. Larger windows give more smoothing but may over-attenuate genuine movement features. Default is ceiling(sampling_rate/10) * 2 + 1, which is always odd (e.g., 13 samples at 60 Hz). |
order |
Polynomial order (default = 3). Controls how well the filter preserves higher-order moments in the data: - order=2: Preserves position, velocity (good for smooth movements) - order=3: Also preserves acceleration (good for most movement data) - order=4: Also preserves jerk (good for quick movements) - order=5: Maximum preservation (may retain too much noise) |
preserve_edges |
Logical indicating whether to use progressively smaller windows at the beginning and end of the signal to reduce edge effects (default = FALSE). Note: This only affects the signal endpoints, not internal discontinuities. |
na_action |
Method to handle NA values before filtering. One of: - "linear": Linear interpolation (default) - "spline": Spline interpolation for smoother curves - "locf": Last observation carried forward - "value": Replace with a constant value - "error": Raise an error if NAs are present |
keep_na |
Logical indicating whether to restore NAs to their original positions after filtering (default = FALSE) |
... |
Additional arguments passed to replace_na() |
The Savitzky-Golay filter fits successive polynomials to sliding windows of the data. This approach preserves higher moments of the data better than simple moving averages or Butterworth filters, making it particularly suitable for movement data where preserving features like peaks and valleys is important.
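Fitting a polynomial to each window by least squares is equivalent to taking a fixed weighted average of the window. A minimal base-R sketch of how those smoothing weights arise is shown below; `sg_weights_sketch` is a hypothetical helper for illustration and is not how the package computes its filter.

```r
# Savitzky-Golay smoothing weights for the centre point of a window (sketch)
sg_weights_sketch <- function(window_size, order) {
  m <- (window_size - 1) / 2
  X <- outer(-m:m, 0:order, `^`)   # polynomial design matrix: 1, t, t^2, ...
  # The fitted value at t = 0 is the intercept, i.e. row 1 of (X'X)^-1 X'
  drop((solve(crossprod(X)) %*% t(X))[1, ])
}

w <- sg_weights_sketch(window_size = 5, order = 2)
w * 35   # the classic 5-point quadratic weights: -3 12 17 12 -3 (divided by 35)
```

The weights sum to one, so constant signals pass through unchanged, while the negative outer weights are what let the filter follow peaks more closely than a plain moving average.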
Edge Handling: When preserve_edges = TRUE, the function uses progressively smaller windows near the beginning and end of the signal to reduce endpoint distortion. This only affects the signal endpoints - it does not detect or handle internal discontinuities or sharp events within the data.
Parameter Selection Guidelines:
window_size:
For 60 FPS: 5-15 frames (83-250ms) for quick movements, 15-31 for slow movements
For 120 FPS: 7-21 frames (58-175ms) for quick movements, 21-51 for slow movements
For 500 FPS: 25-75 frames (50-150ms) for quick movements, 75-151 for slow movements
The default window_size = ceiling(sampling_rate/10) * 2 + 1 (roughly 0.2 s of data) works well for typical human movement.
order:
order=2: Smooth movements, position analysis
order=3: Most movement analysis (default)
order=4: Quick movements, sports analysis
order=5: Very quick movements, impact analysis
Note: order must be less than window_size
Common values by application:
Gait analysis (60 FPS): window_size=15, order=3
Sports biomechanics (120 FPS): window_size=21, order=4
Impact analysis (500 FPS): window_size=51, order=4
Posture analysis (60 FPS): window_size=31, order=2
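To relate these guidelines to the default in the usage signature, the following lines evaluate the default expression `ceiling(sampling_rate/10) * 2 + 1` for the three frame rates discussed above and convert the result to milliseconds. The helper name `default_window` is only for illustration.

```r
# Default window size as written in the filter_sgolay() usage signature
default_window <- function(sampling_rate) ceiling(sampling_rate / 10) * 2 + 1

rates <- c(60, 120, 500)
windows <- default_window(rates)          # 13, 25 and 101 samples
durations_ms <- 1000 * windows / rates    # window length in milliseconds
```

All three defaults span a little over 200 ms, which sits toward the "slow movements" end of the ranges listed above; a smaller explicit window_size may suit quick movements better.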
Numeric vector containing the filtered movement data
Savitzky, A., & Golay, M.J.E. (1964). Smoothing and Differentiation of Data by Simplified Least Squares Procedures. Analytical Chemistry, 36(8), 1627-1639.
filter_lowpass for frequency-based filtering
sgolayfilt for the base Savitzky-Golay implementation
replace_na for details on NA handling methods
# Generate example movement data: smooth motion + noise
t <- seq(0, 5, by = 1/60) # 60 FPS data
x <- sin(2*pi*0.5*t) + rnorm(length(t), 0, 0.1)

# Basic filtering with default parameters (60 FPS)
filtered <- filter_sgolay(x, sampling_rate = 60)

# Adjusting parameters for quick movements
filtered_quick <- filter_sgolay(x, sampling_rate = 60, window_size = 11, order = 4)

# High-speed camera data (500 FPS) with larger window
t_high <- seq(0, 5, by = 1/500) # 500 FPS data
x_high <- sin(2*pi*0.5*t_high) + rnorm(length(t_high), 0, 0.1)
filtered_high <- filter_sgolay(x_high, sampling_rate = 500, window_size = 51, order = 3)
This function calculates the optimal lag between two time series by finding the lag that maximizes their cross-correlation. It's particularly useful for synchronizing recordings from different sources, such as physiological and behavioral data.
find_lag(signal, reference, max_lag = 5000, normalize = TRUE)
signal |
Time series to align (numeric vector) |
reference |
Reference time series to align against (numeric vector) |
max_lag |
Maximum lag to consider in both directions, in number of samples. If NULL, uses (length of series - 1) |
normalize |
Logical; if TRUE, z-score normalizes both series before computing cross-correlation (recommended for series with different scales) |
Integer indicating the optimal lag. A positive value means the signal needs to be shifted forward in time to align with the reference. A negative value means the signal needs to be shifted backward.
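The idea of choosing the lag that maximises cross-correlation can be sketched in base R. The helper below, `best_lag_sketch`, is hypothetical and uses its own convention (lag k pairs signal[t + k] with reference[t], using Pearson correlation on the overlapping samples); the sign convention and normalisation of find_lag itself may differ.

```r
# Lag (in samples) that maximises correlation between two equal-length series
best_lag_sketch <- function(signal, reference, max_lag = 20) {
  n <- length(signal)
  lags <- -max_lag:max_lag
  cors <- vapply(lags, function(k) {
    idx <- seq_len(n)
    keep <- (idx + k >= 1) & (idx + k <= n)   # indices valid in both series
    cor(signal[idx[keep] + k], reference[idx[keep]])
  }, numeric(1))
  lags[which.max(cors)]
}

t <- seq(0, 10, 0.1)
reference <- sin(t)
signal <- sin(t - 0.5)   # delayed by 0.5 units, i.e. 5 samples
lag <- best_lag_sketch(signal, reference)
```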
align_timeseries for applying the computed lag
# Create two artificially shifted sine waves
t <- seq(0, 10, 0.1)
reference <- sin(t)
signal <- sin(t - 0.5) # Signal delayed by 0.5 units
lag <- find_lag(signal, reference)
print(lag) # Should be approximately 5 samples (0.5 units)
Identifies peaks (local maxima) in a numeric time series, with options to filter peaks based on height and prominence. The function handles missing values (NA) appropriately and is compatible with dplyr's mutate. Includes flexible handling of plateaus and adjustable window size for peak detection.
find_peaks( x, min_height = -Inf, min_prominence = 0, plateau_handling = c("strict", "middle", "first", "last", "all"), window_size = 3 )
x |
Numeric vector containing the time series data |
min_height |
Minimum height threshold for peaks (default: -Inf) |
min_prominence |
Minimum prominence threshold for peaks (default: 0) |
plateau_handling |
String specifying how to handle plateaus. One of:
|
window_size |
Integer specifying the size of the window to use for peak detection (default: 3). Must be odd and >= 3. Larger values detect peaks over wider ranges. |
The function uses a sliding window algorithm for peak detection (window size specified by window_size parameter), combined with a region-based prominence calculation method similar to that described in Palshikar (2009).
A logical vector of the same length as the input where:
TRUE indicates a confirmed peak
FALSE indicates a non-peak
NA indicates peak status could not be determined due to missing data
A point is considered a peak if it is the highest point within its window (default window_size of 3 compares each point with its immediate neighbors). The first and last (window_size-1)/2 points in the series cannot be peaks and are marked as NA. Larger window sizes will identify peaks that dominate over a wider range, typically resulting in fewer peaks being detected.
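The window rule above can be sketched in a few lines of base R. This is a simplified illustration that corresponds to strict comparison only and ignores prominence, height thresholds, and plateau handling; `is_window_peak` is a hypothetical name, not a package function.

```r
# Mark points that are strictly higher than every other point in their window
is_window_peak <- function(x, window_size = 3) {
  half <- (window_size - 1) / 2
  n <- length(x)
  out <- rep(NA, n)                      # edge points stay NA
  if (n < window_size) return(out)
  for (i in (half + 1):(n - half)) {
    window <- x[(i - half):(i + half)]
    if (anyNA(window)) next              # undetermined when data are missing
    others <- window[-(half + 1)]        # the window without its centre point
    out[i] <- all(x[i] > others)
  }
  out
}

x <- c(1, 3, 2, 6, 4, 5, 2)
peaks_w3 <- is_window_peak(x)                    # NA TRUE FALSE TRUE FALSE TRUE NA
peaks_w5 <- is_window_peak(x, window_size = 5)   # NA NA FALSE TRUE FALSE NA NA
```

Widening the window from 3 to 5 leaves only the peak at 6, which shows why larger windows typically yield fewer detected peaks.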
Prominence measures how much a peak stands out relative to its surrounding values. It is calculated as the height of the peak minus the height of the highest minimum between this peak and any higher peaks (or the end of the series if no higher peaks exist).
Plateaus (sequences of identical values) are handled according to the plateau_handling parameter:
strict: No points in a plateau are considered peaks (traditional behavior)
middle: For plateaus of odd length, the middle point is marked as a peak. For plateaus of even length, the two middle points are marked as peaks.
first: The first point of each plateau is marked as a peak
last: The last point of each plateau is marked as a peak
all: Every point in the plateau is marked as a peak
Note that in all cases, the plateau must still qualify as a peak relative to its surrounding window (i.e., higher than all other points in the window).
The function uses the following rules for handling NAs:
If a point is NA, it cannot be a peak (returns NA)
If any point in the window is NA, peak status cannot be determined (returns NA)
For prominence calculations, stretches of NAs are handled appropriately
A minimum of window_size points is required; shorter series return all NAs
The function is optimized for use with dplyr's mutate
For noisy data, consider using a larger window_size or smoothing the series before peak detection
Adjust min_height and min_prominence to filter out unwanted peaks
Choose plateau_handling based on your specific needs
Larger window_size values result in more stringent peak detection
Palshikar, G. (2009). Simple Algorithms for Peak Detection in Time-Series. Proc. 1st Int. Conf. Advanced Data Analysis, Business Analytics and Intelligence.
find_troughs for finding local minima
findpeaks in the pracma package for alternative peak detection methods
# Basic usage with default window size (3)
x <- c(1, 3, 2, 6, 4, 5, 2)
find_peaks(x)

# With larger window size
find_peaks(x, window_size = 5) # More stringent peak detection

# With minimum height
find_peaks(x, min_height = 4, window_size = 3)

# With plateau handling
x <- c(1, 3, 3, 3, 2, 4, 4, 1)
find_peaks(x, plateau_handling = "middle", window_size = 3) # Middle of plateaus
find_peaks(x, plateau_handling = "all", window_size = 5) # All plateau points

# With missing values
x <- c(1, 3, NA, 6, 4, NA, 2)
find_peaks(x)

# Usage with dplyr
library(dplyr)
tibble(
  time = 1:10,
  value = c(1, 3, 7, 4, 2, 6, 5, 8, 4, 2)
) %>%
  mutate(peaks = find_peaks(value, window_size = 3))
Identifies troughs (local minima) in a numeric time series, with options to filter troughs based on height and prominence. The function handles missing values (NA) appropriately and is compatible with dplyr's mutate. Includes flexible handling of plateaus and adjustable window size for trough detection.
find_troughs( x, max_height = Inf, min_prominence = 0, plateau_handling = c("strict", "middle", "first", "last", "all"), window_size = 3 )
x |
Numeric vector containing the time series data |
max_height |
Maximum height threshold for troughs (default: Inf) |
min_prominence |
Minimum prominence threshold for troughs (default: 0) |
plateau_handling |
String specifying how to handle plateaus. One of:
|
window_size |
Integer specifying the size of the window to use for trough detection (default: 3). Must be odd and >= 3. Larger values detect troughs over wider ranges. |
The function uses a sliding window algorithm for trough detection (window size specified by window_size parameter), combined with a region-based prominence calculation method similar to that described in Palshikar (2009).
A logical vector of the same length as the input where:
TRUE indicates a confirmed trough
FALSE indicates a non-trough
NA indicates trough status could not be determined due to missing data
A point is considered a trough if it is the lowest point within its window (default window_size of 3 compares each point with its immediate neighbors). The first and last (window_size-1)/2 points in the series cannot be troughs and are marked as NA. Larger window sizes will identify troughs that dominate over a wider range, typically resulting in fewer troughs being detected.
Prominence measures how much a trough stands out relative to its surrounding values. It is calculated as the height of the lowest maximum between this trough and any lower troughs (or the end of the series if no lower troughs exist) minus the height of the trough.
Plateaus (sequences of identical values) are handled according to the plateau_handling parameter:
strict: No points in a plateau are considered troughs (traditional behavior)
middle: For plateaus of odd length, the middle point is marked as a trough. For plateaus of even length, the two middle points are marked as troughs.
first: The first point of each plateau is marked as a trough
last: The last point of each plateau is marked as a trough
all: Every point in the plateau is marked as a trough
Note that in all cases, the plateau must still qualify as a trough relative to its surrounding window (i.e., lower than all other points in the window).
The function uses the following rules for handling NAs:
If a point is NA, it cannot be a trough (returns NA)
If any point in the window is NA, trough status cannot be determined (returns NA)
For prominence calculations, stretches of NAs are handled appropriately
A minimum of window_size points is required; shorter series return all NAs
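The window and NA rules can be illustrated with a simplified base R sketch. `sketch_troughs()` is a hypothetical stand-in, not `find_troughs()` itself: it applies only the strict plateau rule and ignores `max_height` and `min_prominence`.

```r
# Hypothetical, simplified sliding-window trough detector.
# Strict comparison only; no height or prominence filtering.
sketch_troughs <- function(x, window_size = 3) {
  stopifnot(window_size >= 3, window_size %% 2 == 1)
  n <- length(x)
  out <- rep(NA, n)                    # edges and undetermined points stay NA
  if (n < window_size) return(out)     # series too short: all NA
  half <- (window_size - 1) / 2
  for (i in (half + 1):(n - half)) {
    window <- x[(i - half):(i + half)]
    if (anyNA(window)) next            # any NA in the window: status unknown
    others <- window[-(half + 1)]      # all window points except the centre
    out[i] <- all(x[i] < others)       # strictly lowest point in its window
  }
  out
}

sketch_troughs(c(5, 3, 4, 1, 4, 2, 5))
# NA TRUE FALSE TRUE FALSE TRUE NA
```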
The function is optimized for use with dplyr's mutate
For noisy data, consider using a larger window_size or smoothing the series before trough detection
Adjust max_height and min_prominence to filter out unwanted troughs
Choose plateau_handling based on your specific needs
Larger window_size values result in more stringent trough detection
Palshikar, G. (2009). Simple Algorithms for Peak Detection in Time-Series. Proc. 1st Int. Conf. Advanced Data Analysis, Business Analytics and Intelligence.
find_peaks for finding local maxima
findpeaks in the pracma package for alternative extrema detection methods
# Basic usage with default window size (3)
x <- c(5, 3, 4, 1, 4, 2, 5)
find_troughs(x)
# With larger window size
find_troughs(x, window_size = 5) # More stringent trough detection
# With maximum height
find_troughs(x, max_height = 3, window_size = 3)
# With plateau handling
x <- c(5, 2, 2, 2, 3, 1, 1, 4)
find_troughs(x, plateau_handling = "middle", window_size = 3) # Middle of plateaus
find_troughs(x, plateau_handling = "all", window_size = 5) # All plateau points
# With missing values
x <- c(5, 3, NA, 1, 4, NA, 5)
find_troughs(x)
# Usage with dplyr (tibble() replaces the deprecated data_frame())
library(dplyr)
tibble(
  time = 1:10,
  value = c(5, 3, 1, 4, 2, 1, 3, 0, 4, 5)
) %>%
  mutate(troughs = find_troughs(value, window_size = 3))
Downloads example data for different animal tracking software and returns the path to the downloaded file. The function caches the data to avoid repeated downloads.
get_example_data(source, cache_dir = tempdir())
source |
Character string specifying the tracking software. Currently supported:
|
cache_dir |
Character string specifying the directory in which to cache the downloaded files. Defaults to a temporary directory using tempdir(). |
The function downloads example data from a GitHub repository and caches it locally. If the file already exists in the cache directory, it will use the cached version instead of downloading it again.
The data sources are hosted at: https://github.com/roaldarbol/movement-data
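The caching behaviour described above amounts to a "download only if missing" check. A minimal sketch in base R, where `cached_path()` and the `download` argument are hypothetical stand-ins (the placeholder writer replaces the real network call so the example runs offline):

```r
# Hypothetical sketch of a download-once cache. `download` stands in for the
# real network call; here it just writes a placeholder file.
cached_path <- function(filename, cache_dir = tempdir(),
                        download = function(dest) writeLines("placeholder", dest)) {
  dest <- file.path(cache_dir, filename)
  if (!file.exists(dest)) {
    download(dest)  # only reached when the file is not cached yet
  }
  dest
}

# Count how often the "download" is actually triggered
counter <- new.env()
counter$n <- 0
counting_download <- function(dest) {
  counter$n <- counter$n + 1
  writeLines("placeholder", dest)
}

cache_dir <- tempfile("cache_")
dir.create(cache_dir)
p1 <- cached_path("example.csv", cache_dir, counting_download)
p2 <- cached_path("example.csv", cache_dir, counting_download)
counter$n  # 1: the second call reuses the cached file
```

Because the default cache directory is a per-session temporary directory, a persistent `cache_dir` would be needed to avoid re-downloading across R sessions.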
Character string with the path to the downloaded file.
## Not run:
# Get path to DeepLabCut example data
path <- get_example_data("deeplabcut")
# Read the data using preferred method
data <- read_deeplabcut(path)
## End(Not run)
get_metadata(data)
data |
movement data frame |
the metadata associated with the movement data frame
Sometimes your sampling rate is too high; group_every allows you to down-sample by creating "bins" which can subsequently be summarised on. When using n, data needs to be regularly sampled; if there are gaps in time, the bin duration will differ. Works well with calculate_summary() for movement data.
group_every(data, seconds = NULL, n = NULL)
data |
Input data frame |
seconds |
Number of seconds to bin together |
n |
Number of observations to include in each bin/group |
Grouped data frame, with new "bin" variable.
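The two binning modes can be sketched in base R as follows. `bin_by_seconds()` and `bin_by_n()` are hypothetical helpers for illustration; the actual bin labels produced by group_every may differ.

```r
# Hypothetical sketch of the two binning modes.
# Time-based: every `seconds`-long interval becomes one bin.
bin_by_seconds <- function(time, seconds) floor(time / seconds)
# Count-based: every `n` consecutive observations become one bin.
bin_by_n <- function(n_obs, n) (seq_len(n_obs) - 1) %/% n

bin_by_seconds(c(0.5, 4.9, 5.0, 9.99, 10), seconds = 5)  # 0 0 1 1 2
bins <- bin_by_n(1000, n = 30)
length(unique(bins))  # 34 bins: 33 full bins of 30 and a final bin of 10
```

Count-based bins only correspond to a fixed duration when sampling is regular, which is why gaps in time change the bin duration when using n.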
## Group by every 5 seconds
df_time <- data.frame(
  time = seq(from = 0.02, to = 100, by = 1/30), # time at 30Hz, slightly offset
  y = rnorm(3000)) # random numbers
df_time |>
  group_every(seconds = 5) |> # group for every 5 seconds
  dplyr::summarise(time = min(time), # summarise for time and y
                   mean_y = mean(y)) |>
  dplyr::mutate(time = floor(time)) # floor to get the round second number
# Group every n observations
df <- data.frame(
  x = seq(1:1000),
  y = rnorm(1000))
df |>
  group_every(n = 30) |> # group every 30 observations together
  dplyr::summarise(mean_x = mean(x), mean_y = mean(y))
init_metadata(data)
data |
movement data frame |
data frame with metadata
Map from polar to Cartesian coordinates
map_to_cartesian(data)
data |
movement data frame with polar coordinates |
movement data frame with Cartesian coordinates
Map from Cartesian to polar coordinates
map_to_polar(data)
data |
movement data frame with Cartesian coordinates |
movement data frame with polar coordinates
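The underlying coordinate mapping is the standard polar/Cartesian conversion. A minimal base R sketch, where the helper names and the column names `rho` and `theta` are assumptions for illustration and may not match what the package uses:

```r
# Hypothetical helpers; column names rho/theta are assumptions.
to_polar <- function(x, y) {
  data.frame(rho = sqrt(x^2 + y^2),   # distance from the origin
             theta = atan2(y, x))     # angle in radians, in (-pi, pi]
}
to_cartesian <- function(rho, theta) {
  data.frame(x = rho * cos(theta), y = rho * sin(theta))
}

p <- to_polar(x = c(3, 0), y = c(4, 2))
p$rho                              # 5 2
back <- to_cartesian(p$rho, p$theta)  # recovers (3, 4) and (0, 2) up to rounding
```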
Creates a multi-panel visualization of keypoint position data over time. Each keypoint gets its own panel showing its x and/or y coordinates, with different colors distinguishing between x (orange) and y (blue) coordinates. Useful for visually inspecting movement patterns and identifying potential tracking issues.
plot_position_timeseries(data, reference_keypoint = NULL, dimension = "xy")
data |
A data frame containing tracked keypoint data with the following columns:
|
reference_keypoint |
Optional character string. If provided, all coordinates will be translated relative to this keypoint's position. Must match one of the keypoint levels in the data. |
dimension |
Character string specifying which coordinates to plot. Options are:
|
A ggplot object combining individual time series plots for each keypoint using patchwork. The plots are stacked vertically with shared axes and legends.
translate_coords() for the coordinate translation functionality used when reference_keypoint is specified.
## Not run:
# Plot all coordinates
plot_position_timeseries(movement_data)
# Plot coordinates relative to "head" keypoint
plot_position_timeseries(movement_data, reference_keypoint = "head")
# Plot only x coordinates
plot_position_timeseries(movement_data, dimension = "x")
## End(Not run)
Creates a multi-panel visualization of keypoint speed data over time. Each keypoint gets its own panel showing its speed, useful for analyzing movement intensity and identifying potential tracking issues.
plot_speed_timeseries(data, y_max = NULL)
data |
A data frame containing tracked keypoint data with the following columns:
|
y_max |
Optional numeric value specifying the maximum value for the y-axis. If NULL (default), the y-axis limit is automatically determined from the data. |
A ggplot object combining individual time series plots for each keypoint using patchwork. The plots are stacked vertically with shared axes and legends.
plot_position_timeseries() for plotting position data
calculate_speed() for the speed calculation
## Not run:
# Plot with automatic y-axis scaling
plot_speed_timeseries(movement_data)
# Plot with fixed maximum speed of 100
plot_speed_timeseries(movement_data, y_max = 100)
## End(Not run)
Read a data frame from AnimalTA
read_animalta(path, detailed = FALSE)
path |
Path to an AnimalTA data file |
detailed |
AnimalTA exports either raw (default) or detailed data files. Support for detailed data is currently limited. |
a movement dataframe
Chiara, V., & Kim, S.-Y. (2023). AnimalTA: A highly flexible and easy-to-use program for tracking and analysing animal movement in different environments. Methods in Ecology and Evolution, 14, 1699–1707. doi:10.1111/2041-210X.14115.
Read a Bonsai data frame
read_bonsai(path)
path |
Path to a Bonsai data file |
a movement dataframe
Read csv files from DeepLabCut (DLC). The function recognises whether it is a single- or multi-animal dataset.
read_deeplabcut(path, multianimal = NULL)
path |
Path to a DeepLabCut data file |
multianimal |
Logical. By default (NULL), whether a file is multi-animal is detected automatically. Set to TRUE or FALSE to override the detection. |
a movement dataframe
Read idtracker.ai data
read_idtracker(path, path_probabilities = NULL, version = 6)
path |
Path to an idtracker.ai data file |
path_probabilities |
Path to a csv file with probabilities. Only needed when reading csv files; h5 files already include the probabilities. |
version |
idtracker.ai version. Currently only v6 output is implemented |
a movement dataframe
Read csv files from LightningPose (LP).
read_lightningpose(path)
path |
Path to a LightningPose data file |
a movement dataframe
read_movement(data)
data |
A movement data frame |
a movement dataframe
Read SLEAP data
read_sleap(path)
path |
Path to a SLEAP analysis file in HDF5 (.h5) format |
a movement dataframe
Read trackball data from a variety of setups and configurations.
read_trackball( paths, setup = c("of_free", "of_fixed", "fictrac"), sampling_rate, col_time = "time", col_dx = "x", col_dy = "y", ball_calibration = NULL, ball_diameter = NULL, distance_scale = NULL, distance_unit = NULL, verbose = FALSE )
paths |
Two file paths, one for each sensor (although one is allowed for a fixed setup, |
setup |
Which type of experimental setup was used. Expects either |
sampling_rate |
Sampling rate (in Hz) determines the time window the function integrates over. A sampling rate of 60 Hz means that windows of 1/60 s are used. |
col_time |
Which column contains the information about time. Can be specified either by the column number (numeric) or the name of the column if it has one (character). Should either be a datetime (POSIXt) or seconds (numeric). |
col_dx |
Column name for x-axis values |
col_dy |
Column name for y-axis values |
ball_calibration |
When running an |
ball_diameter |
When running a |
distance_scale |
If using computer mice, you might be getting unit-less data out. However, computer mice have a factor called "dots-per-cm", which you can use to convert your estimates into centimeters. |
distance_unit |
Which unit should be used. If |
verbose |
If |
a movement dataframe
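To illustrate the dots-per-cm conversion mentioned for distance_scale, here is a small base R sketch. All values are hypothetical, and how read_trackball applies distance_scale internally is an assumption; the sketch only shows the unit arithmetic.

```r
# Hypothetical values: raw sensor counts per sample and a dots-per-cm factor
dx_counts <- c(120, 80, -40)
dots_per_cm <- 400

dx_cm <- dx_counts / dots_per_cm  # displacement per sample in centimeters
x_cm <- cumsum(dx_cm)             # position along x from accumulated displacement
x_cm  # 0.3 0.5 0.4
```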
read_treadmill(data)
data |
A treadmill data frame |
a movement dataframe
Reads and formats movement tracking data exported from TRex (Walter & Couzin, 2021). TRex is a software for tracking animal movement in videos, which exports coordinate data in CSV format. This function processes these files into a standardized movement data format.
read_trex(path)
path |
Character string specifying the path to a TRex CSV file. The file should contain columns for:
|
The function performs several processing steps:
Validates the input file format (must be CSV)
Reads the data using vroom for efficient processing
Cleans column names to a consistent format
Restructures the data from wide to long format
Initializes metadata fields required for movement data
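The wide-to-long restructuring step can be sketched in base R. The wide column names used here (`head_x`, `centroid_y`, and so on) are hypothetical and chosen only for illustration; real TRex exports may name their columns differently, and read_trex does more than this sketch.

```r
# Hypothetical wide table: one x/y column pair per keypoint
wide <- data.frame(
  time = c(0, 0.1),
  head_x = c(1, 2), head_y = c(3, 4),
  centroid_x = c(5, 6), centroid_y = c(7, 8)
)
keypoints <- c("head", "centroid")

# Stack one block of rows per keypoint to get the long format
long <- do.call(rbind, lapply(keypoints, function(k) {
  data.frame(
    time = wide$time,
    individual = factor(NA),                  # single individual, left as NA
    keypoint = factor(k, levels = keypoints),
    x = wide[[paste0(k, "_x")]],
    y = wide[[paste0(k, "_y")]],
    confidence = NA_real_                     # no confidence values available
  )
}))
nrow(long)  # 4: two time points for each of two keypoints
```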
A data frame containing movement data with the following columns:
time: Time values from the tracking
individual: Factor (set to NA, as TRex tracks one individual)
keypoint: Factor identifying tracked points (e.g., "head", "centroid")
x: x-coordinates in centimeters
y: y-coordinates in centimeters
confidence: Numeric confidence values (set to NA as TRex doesn't provide these)
Walter, T., & Couzin, I. D. (2021). TRex, a fast multi-animal tracking system with markerless identification, and 2D estimation of posture and visual fields. eLife, 10, e64000.
init_metadata() for details on metadata initialization
TRex software: https://trex.run
## Not run:
# Read a TRex CSV file
data <- read_trex("path/to/trex_export.csv")
## End(Not run)
A wrapper function that replaces missing values using various interpolation or filling methods.
replace_na(x, method = "linear", value = NULL, min_gap = 1, max_gap = Inf, ...)
x |
A vector containing numeric data with missing values (NAs) |
method |
Character string specifying the replacement method. One of "linear" (default), "spline", "stine", "locf", or "value".
|
value |
Numeric value for replacement when method = "value" |
min_gap |
Integer specifying minimum gap size to interpolate/fill. Gaps shorter than this will be left as NA. Default is 1 (handle all gaps). |
max_gap |
Integer or Inf specifying maximum gap size to interpolate/fill. Gaps longer than this will be left as NA. Default is Inf (no upper limit). |
... |
Additional parameters passed to the underlying interpolation functions |
A numeric vector with NA values replaced according to the specified method where gap length criteria are met.
replace_na_linear() for linear interpolation details
replace_na_spline() for spline interpolation details
replace_na_stine() for Stineman interpolation details
replace_na_locf() for last observation carried forward details
replace_na_value() for constant value replacement details
## Not run:
x <- c(1, NA, NA, 4, 5, NA, NA, NA, 9)
# Different methods
replace_na(x, method = "linear")
replace_na(x, method = "spline")
replace_na(x, method = "stine")
replace_na(x, method = "locf")
replace_na(x, method = "value", value = 0)
# With gap constraints
replace_na(x, method = "linear", min_gap = 2)
replace_na(x, method = "spline", max_gap = 2)
replace_na(x, method = "linear", min_gap = 2, max_gap = 3)
## End(Not run)
Replaces missing values using linear interpolation, with control over both minimum and maximum gap sizes to interpolate.
replace_na_linear(x, min_gap = 1, max_gap = Inf, ...)
x |
A vector containing numeric data with missing values (NAs) |
min_gap |
Integer specifying minimum gap size to interpolate. Gaps shorter than this will be left as NA. Default is 1 (interpolate all gaps). |
max_gap |
Integer or Inf specifying maximum gap size to interpolate. Gaps longer than this will be left as NA. Default is Inf (no upper limit). |
... |
Additional parameters passed to stats::approx |
The function applies both minimum and maximum gap criteria:
Gaps shorter than min_gap are left as NA
Gaps longer than max_gap are left as NA
Only gaps that meet both criteria are interpolated
If both parameters are specified, min_gap must be less than or equal to max_gap.
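The gap criteria can be sketched in base R with run-length encoding and stats::approx. `sketch_replace_na_linear()` is a hypothetical illustration, not the package's implementation; it assumes at least two non-missing values and leaves leading or trailing NAs untouched.

```r
# Hypothetical sketch: linear interpolation restricted by gap length.
sketch_replace_na_linear <- function(x, min_gap = 1, max_gap = Inf) {
  stopifnot(min_gap <= max_gap)
  runs <- rle(is.na(x))
  # For every element, the length of the run (gap or non-gap) it belongs to
  run_len <- rep(runs$lengths, runs$lengths)
  fill <- is.na(x) & run_len >= min_gap & run_len <= max_gap
  idx <- seq_along(x)
  ok <- !is.na(x)
  # Interpolate everywhere, then keep only values for qualifying gaps
  interp <- stats::approx(idx[ok], x[ok], xout = idx, rule = 1)$y
  x[fill] <- interp[fill]
  x
}

x <- c(1, NA, NA, 4, 5, NA, NA, NA, 9)
sketch_replace_na_linear(x)               # 1 2 3 4 5 6 7 8 9
sketch_replace_na_linear(x, max_gap = 2)  # 1 2 3 4 5 NA NA NA 9
sketch_replace_na_linear(x, min_gap = 3)  # 1 NA NA 4 5 6 7 8 9
```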
A numeric vector with NA values replaced by interpolated values where gap length criteria are met.
## Not run:
x <- c(1, NA, NA, 4, 5, NA, NA, NA, 9)
replace_na_linear(x) # interpolates all gaps
replace_na_linear(x, min_gap = 2) # only gaps >= 2
replace_na_linear(x, max_gap = 2) # only gaps <= 2
replace_na_linear(x, min_gap = 2, max_gap = 3) # gaps between 2 and 3
## End(Not run)
Replaces missing values by carrying forward the last observed value, with control over both minimum and maximum gap sizes to fill.
replace_na_locf(x, min_gap = 1, max_gap = Inf)
x |
A vector containing numeric data with missing values (NAs) |
min_gap |
Integer specifying minimum gap size to fill. Gaps shorter than this will be left as NA. Default is 1 (fill all gaps). |
max_gap |
Integer or Inf specifying maximum gap size to fill. Gaps longer than this will be left as NA. Default is Inf (no upper limit). |
The function applies both minimum and maximum gap criteria:
Gaps shorter than min_gap are left as NA
Gaps longer than max_gap are left as NA
Only gaps that meet both criteria are filled
If both parameters are specified, min_gap must be less than or equal to max_gap.
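A minimal base R sketch of last observation carried forward with the same gap criteria. `sketch_replace_na_locf()` is a hypothetical helper for illustration; it leaves leading NAs untouched because there is no earlier observation to carry forward.

```r
# Hypothetical sketch: LOCF restricted by gap length.
sketch_replace_na_locf <- function(x, min_gap = 1, max_gap = Inf) {
  stopifnot(min_gap <= max_gap)
  runs <- rle(is.na(x))
  run_len <- rep(runs$lengths, runs$lengths)
  fill <- is.na(x) & run_len >= min_gap & run_len <= max_gap
  # Index of the most recent non-missing value at each position (0 if none)
  last_obs <- cummax(ifelse(is.na(x), 0L, seq_along(x)))
  fill <- fill & last_obs > 0
  x[fill] <- x[last_obs[fill]]
  x
}

x <- c(1, NA, NA, 4, 5, NA, NA, NA, 9)
sketch_replace_na_locf(x)               # 1 1 1 4 5 5 5 5 9
sketch_replace_na_locf(x, min_gap = 3)  # 1 NA NA 4 5 5 5 5 9
```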
A numeric vector with NA values replaced by the last observed value where gap length criteria are met.
## Not run:
x <- c(1, NA, NA, 4, 5, NA, NA, NA, 9)
replace_na_locf(x) # fills all gaps
replace_na_locf(x, min_gap = 2) # only gaps >= 2
replace_na_locf(x, max_gap = 2) # only gaps <= 2
replace_na_locf(x, min_gap = 2, max_gap = 3) # gaps between 2 and 3
## End(Not run)
Replaces missing values using spline interpolation, with control over both minimum and maximum gap sizes to interpolate.
replace_na_spline(x, min_gap = 1, max_gap = Inf, ...)
x |
A vector containing numeric data with missing values (NAs) |
min_gap |
Integer specifying minimum gap size to interpolate. Gaps shorter than this will be left as NA. Default is 1 (interpolate all gaps). |
max_gap |
Integer or Inf specifying maximum gap size to interpolate. Gaps longer than this will be left as NA. Default is Inf (no upper limit). |
... |
Additional parameters passed to stats::spline |
The function applies both minimum and maximum gap criteria:
Gaps shorter than min_gap are left as NA
Gaps longer than max_gap are left as NA
Only gaps that meet both criteria are interpolated
If both parameters are specified, min_gap must be less than or equal to max_gap.
A numeric vector with NA values replaced by interpolated values where gap length criteria are met.
## Not run:
x <- c(1, NA, NA, 4, 5, NA, NA, NA, 9)
replace_na_spline(x) # interpolates all gaps
replace_na_spline(x, min_gap = 2) # only gaps >= 2
replace_na_spline(x, max_gap = 2) # only gaps <= 2
replace_na_spline(x, min_gap = 2, max_gap = 3) # gaps between 2 and 3
## End(Not run)
Replaces missing values using Stineman interpolation, with control over both minimum and maximum gap sizes to interpolate.
replace_na_stine(x, min_gap = 1, max_gap = Inf, ...)
x |
A vector containing numeric data with missing values (NAs) |
min_gap |
Integer specifying minimum gap size to interpolate. Gaps shorter than this will be left as NA. Default is 1 (interpolate all gaps). |
max_gap |
Integer or Inf specifying maximum gap size to interpolate. Gaps longer than this will be left as NA. Default is Inf (no upper limit). |
... |
Additional parameters passed to stinepack::stinterp |
The function applies both minimum and maximum gap criteria:
Gaps shorter than min_gap are left as NA
Gaps longer than max_gap are left as NA
Only gaps that meet both criteria are interpolated
If both parameters are specified, min_gap must be less than or equal to max_gap.
Stineman interpolation is particularly good at preserving the shape of the data and avoiding overshooting.
A numeric vector with NA values replaced by interpolated values where gap length criteria are met.
## Not run:
x <- c(1, NA, NA, 4, 5, NA, NA, NA, 9)
replace_na_stine(x) # interpolates all gaps
replace_na_stine(x, min_gap = 2) # only gaps >= 2
replace_na_stine(x, max_gap = 2) # only gaps <= 2
replace_na_stine(x, min_gap = 2, max_gap = 3) # gaps between 2 and 3
## End(Not run)
Replaces missing values with a specified constant value, with control over both minimum and maximum gap sizes to fill.
replace_na_value(x, value, min_gap = 1, max_gap = Inf)
x |
A vector containing numeric data with missing values (NAs) |
value |
Numeric value to use for replacement |
min_gap |
Integer specifying minimum gap size to fill. Gaps shorter than this will be left as NA. Default is 1 (fill all gaps). |
max_gap |
Integer or Inf specifying maximum gap size to fill. Gaps longer than this will be left as NA. Default is Inf (no upper limit). |
The function applies both minimum and maximum gap criteria:
Gaps shorter than min_gap are left as NA
Gaps longer than max_gap are left as NA
Only gaps that meet both criteria are filled
If both parameters are specified, min_gap must be less than or equal to max_gap.
A numeric vector with NA values replaced by the specified value where gap length criteria are met.
## Not run:
x <- c(1, NA, NA, 4, 5, NA, NA, NA, 9)
replace_na_value(x, value = 0) # fills all gaps with 0
replace_na_value(x, value = -1, min_gap = 2) # only gaps >= 2
replace_na_value(x, value = -999, max_gap = 2) # only gaps <= 2
replace_na_value(x, value = 0, min_gap = 2, max_gap = 3) # gaps between 2 and 3
## End(Not run)
Rotates coordinates in Cartesian space based on two alignment points. The rotation aligns these points either with the 0-degree axis (parallel) or makes them perpendicular to it. This is particularly useful for creating egocentric reference frames or standardizing orientation across multiple frames or individuals.
rotate_coords(data, alignment_points, align_perpendicular = FALSE)
data |
movement data frame with columns: time, individual, keypoint, x, y |
alignment_points |
character vector of length 2 specifying the keypoint names to use for alignment |
align_perpendicular |
logical; if TRUE, alignment_points will be rotated to be perpendicular to the 0-degree axis. If FALSE (default), alignment_points will be rotated to align with the 0-degree axis |
The function processes each individual separately and maintains their independence. For each time point, it:
Calculates the vector between the alignment points
Determines the current angle of this vector
Rotates all points to achieve the desired alignment
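The three steps above can be sketched for a single time point in base R. `rotate_to_axis()` is a hypothetical helper; it assumes rotation about the origin and works on plain coordinate vectors rather than a movement data frame.

```r
# Hypothetical sketch for one time point. `from` and `to` are the indices of
# the two alignment points within the x/y vectors.
rotate_to_axis <- function(x, y, from, to, align_perpendicular = FALSE) {
  # Steps 1 and 2: vector between the alignment points and its current angle
  angle <- atan2(y[to] - y[from], x[to] - x[from])
  # Step 3: rotate so the vector ends up at 0 (parallel) or pi/2 (perpendicular)
  target <- if (align_perpendicular) pi / 2 else 0
  rot <- target - angle
  data.frame(
    x = x * cos(rot) - y * sin(rot),
    y = x * sin(rot) + y * cos(rot)
  )
}

# Alignment vector from (0, 0) to (1, 1) sits at 45 degrees
aligned <- rotate_to_axis(x = c(0, 1, 0), y = c(0, 1, 2), from = 1, to = 2)
perp <- rotate_to_axis(x = c(0, 1, 0), y = c(0, 1, 2), from = 1, to = 2,
                       align_perpendicular = TRUE)
```

After the parallel alignment the second point lies on the x-axis at distance sqrt(2); after the perpendicular alignment it lies on the y-axis at the same distance.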
movement data frame with rotated coordinates
This function modifies time values in a dataset to match a new framerate and updates the corresponding metadata. It handles both integer and non-integer time values, ensuring time series start from zero when appropriate.
set_framerate(data, framerate, old_framerate = 1)
data |
A data frame or tibble containing the time series data |
framerate |
The new target framerate to convert to |
old_framerate |
The original framerate of the data (defaults to 1) |
The function calculates a scaling factor based on the ratio of old to new framerates. For integer time values, it ensures they start from zero. All time values are then scaled proportionally to maintain relative temporal relationships.
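One possible reading of this scaling, sketched in base R. `sketch_rescale_time()` is hypothetical, and both the direction of the ratio (old divided by new) and the zero-start rule for integer times are assumptions based on the description above, not confirmed behaviour of set_framerate.

```r
# Hypothetical sketch; the ratio direction is an assumption.
sketch_rescale_time <- function(time, framerate, old_framerate = 1) {
  # Integer times are treated as frame counts and shifted to start at zero
  if (is.integer(time)) time <- time - min(time)
  time * (old_framerate / framerate)
}

sketch_rescale_time(1:4, framerate = 60)                        # 0, 1/60, 2/60, 3/60
sketch_rescale_time(0:10, framerate = 60, old_framerate = 30)   # 0, 0.5, ..., 5
```

Under this reading, the default old_framerate of 1 turns frame numbers into seconds at the new framerate.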
A modified data frame with adjusted time values and updated metadata
data <- data.frame(time = 0:10, value = rnorm(11))
result <- set_framerate(data, framerate = 60, old_framerate = 30)
This function replaces any existing individual identifiers with a new specified identifier across all rows in the dataset. The data is first ungrouped to ensure consistent application of the new identifier.
set_individual(data, individual)
data |
A data frame or tibble containing the data to be modified |
individual |
The new identifier value to be assigned to all rows |
A modified data frame with the new individual identifier applied as a factor
data <- data.frame(time = 1:5, value = rnorm(5))
result <- set_individual(data, "subject_A")
set_start_datetime(data, start_datetime)
data |
movement data frame |
start_datetime |
Starting datetime, provided either as POSIXt or as a string that can be parsed by the anytime package. |
movement data frame with starting datetime in metadata
Adds a unique identifier (UUID) to the data frame's metadata
set_uuid(data, length = 20)
data |
movement data frame |
length |
length of identifier. (default: 20) |
data frame with the "uuid" metadata field filled out
Transforms Cartesian coordinates into an egocentric reference frame through a two-step process: translation followed by rotation. First translates all coordinates relative to a reference keypoint, then rotates the coordinate system based on specified alignment points.
transform_to_egocentric( data, to_keypoint, alignment_points, align_perpendicular = FALSE )
data |
movement data frame with columns: time, individual, keypoint, x, y |
to_keypoint |
character; keypoint to use as the new origin |
alignment_points |
character vector of length 2 specifying the keypoint names to use for alignment |
align_perpendicular |
logical; if TRUE, alignment_points will be rotated to be perpendicular to the 0-degree axis. If FALSE (default), alignment_points will be rotated to align with the 0-degree axis |
This function combines translation and rotation to create an egocentric reference frame. It:
Translates all coordinates relative to the specified keypoint (to_keypoint)
Rotates the coordinate system based on the alignment points
The translation makes the reference keypoint the new origin (0,0), while the rotation standardizes the orientation. This is particularly useful for:
Creating egocentric reference frames
Standardizing pose data across frames or individuals
Analyzing relative motion patterns
movement data frame in egocentric reference frame
## Not run:
# Transform coordinates to make nose the origin and align body axis
transformed_data <- transform_to_egocentric(
  data,
  to_keypoint = "nose",
  alignment_points = c("nose", "tail"),
  align_perpendicular = FALSE
)
# Transform to make nose origin and ears perpendicular to forward axis
transformed_data <- transform_to_egocentric(
  data,
  to_keypoint = "nose",
  alignment_points = c("ear_left", "ear_right"),
  align_perpendicular = TRUE
)
## End(Not run)
Translates coordinates in Cartesian space. Takes either a single point (to_x and to_y), a vector with the same length as the time dimension or a keypoint (to_keypoint), which can be used to transform the data into an egocentric reference frame.
translate_coords(data, to_x = 0, to_y = 0, to_z = NULL, to_keypoint = NULL)
data |
movement data frame with columns: time, individual, keypoint, x, y |
to_x |
x coordinates; either a single value or a time-length vector |
to_y |
y coordinates; either a single value or a time-length vector |
to_z |
z coordinates (only if 3D); either a single value or a time-length vector |
to_keypoint |
Name of a keypoint; all other coordinates become relative to this keypoint |
movement data frame with translated coordinates
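Translation relative to a keypoint can be sketched in base R on a small long-format table. This is a hypothetical illustration, not the package's implementation: it handles a single individual and 2D coordinates only, and the keypoint names are made up.

```r
# Hypothetical long-format data: two keypoints at two time points
df <- data.frame(
  time = rep(1:2, each = 2),
  keypoint = rep(c("nose", "tail"), times = 2),
  x = c(1, 3, 2, 5),
  y = c(1, 1, 2, 4)
)

# Look up the reference keypoint's position at each row's time point
ref <- df[df$keypoint == "nose", ]
i <- match(df$time, ref$time)

# Subtract it so the reference keypoint becomes the origin at every time point
df$x <- df$x - ref$x[i]
df$y <- df$y - ref$y[i]
df
```

After translation, every "nose" row sits at (0, 0) and the "tail" rows hold positions relative to the nose.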