My functional data work is broadly focused on methods for functional registration, regression, and dimension reduction. A subset of this work is summarized below.
I developed a method to register exponential family functional data, which was published in Biometrics in 2019. Our methods are implemented in the registr package, an R package on CRAN and GitHub you can download here.
If each curve represents an observation for one subject, then curve registration refers to warping the domain (often time) of a set of curves so that the main features of each curve are aligned across subjects. An example with simulated data is below.
Warping functions (center) are applied to the unregistered curves (left) to get registered curves (right). The exponential family part comes in because not all functional observations are Gaussian or continuous. Our approach to registration allows alignment of data that is discrete as well as continuous.
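The mechanics of warping can be sketched in a few lines of Python. This is only an illustration under strong assumptions, not the registr implementation: the template curve, the power-function warps, and the helper names (template, interp, peak_time) are all hypothetical, and the warping functions are treated as known, whereas the published method estimates them from the data and handles exponential family outcomes.

```python
import math

def template(t):
    """Shared underlying shape: a single bump that peaks at t = 0.5."""
    return math.exp(-((t - 0.5) ** 2) / (2 * 0.05 ** 2))

def interp(x, xs, ys):
    """Linear interpolation of the points (xs, ys) at x; xs must be increasing."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for j in range(1, len(xs)):
        if x <= xs[j]:
            w = (x - xs[j - 1]) / (xs[j] - xs[j - 1])
            return (1 - w) * ys[j - 1] + w * ys[j]

# Common time grid on [0, 1].
grid = [k / 200 for k in range(201)]

def peak_time(values):
    """Grid time at which a curve attains its maximum."""
    return grid[max(range(len(values)), key=values.__getitem__)]

# Hypothetical subject-specific warps h_i(s) = s ** a_i: monotone, with
# h_i(0) = 0 and h_i(1) = 1. Assumed known here purely for illustration.
powers = [0.6, 1.0, 1.7]

# Unregistered curves: the same template observed on each subject's own clock.
unregistered = [[template(s ** a) for s in grid] for a in powers]

# Registered curves: evaluate each observed curve at the inverse warp
# h_i^{-1}(t) = t ** (1 / a_i), interpolating between grid points.
registered = [
    [interp(t ** (1 / a), grid, y) for t in grid]
    for a, y in zip(powers, unregistered)
]

peaks_before = [peak_time(y) for y in unregistered]
peaks_after = [peak_time(y) for y in registered]
print("peak times before registration:", peaks_before)
print("peak times after registration: ", peaks_after)
```

Before warping, the three peaks fall at clearly different times; after warping they line up near the template's peak at 0.5, which is the alignment of main features that registration aims for.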
Below, our method is applied to accelerometer data in which each subject’s binary activity (active vs. not active) is recorded every minute over 24 hours. Periods of inactivity are colored in light blue and periods of activity are colored in dark blue. Applying our registration technique to the activity data pulls out patterns in physical activity.
Since 2019 I have been developing tools and methodology for the analysis of data from multiplex single cell imaging (MI), an emerging imaging technique that has revolutionized researchers’ ability to study tissue structure and function at a cellular level while preserving the original spatial context. Single cell refers to individual cell resolution; multiplex refers to the multiple types of proteins tagged in the tissue, which allows identification of nuanced cell subtypes, function, and tissue regions; and imaging indicates that biological spatial relationships in the tissue are preserved. An example image from a paper I worked on studying tumor-immune relationships in non-small cell lung carcinoma is below.
This image is a multichannel TIFF file, where each channel represents the signal intensity of a particular protein. The left panel is the composite image combining all 8 channels collected for this dataset. The next three panels are individual channels from the same image, showing, from left to right, the nucleus (DAPI) channel, the tumor (cytokeratin) channel, and the immune (CD8) channel.
Multiplex imaging data has a complex data acquisition, image processing, and analysis pipeline with unique challenges that can be addressed by statisticians. First, tissue is placed on a slide, labeled with multiple (multiplex) antibody markers, and imaged. The resulting images are then segmented to identify tissue compartments (e.g. tumor vs. stroma), cells, and nuclei. Marker intensities across samples must be normalized and batch corrected to account for non-biological variability. Then cells are phenotyped, or given biological label(s) based on their mean marker intensities. Finally, the datasets undergo compositional and spatial analysis, potentially in combination with patient-level outcomes such as survival time, disease subtype, or cancer stage. To address these challenges, I have developed methods and software for normalizing multiplex imaging data (1,2), for analyzing continuous marker expression using density-based variation analysis (3), for spatial analysis using scalar spatial summaries (4), and for spatial analysis using techniques drawn from functional data analysis (5, 6).
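To see why normalization has to come before phenotyping, consider the toy Python sketch below. Everything in it is an assumption made for illustration: the cell values, the slide names, the mean-division normalization, and both thresholds are hypothetical, and they are not the normalization methods developed in (1,2).

```python
from collections import defaultdict
from statistics import mean

# Hypothetical cell-level table: (slide id, mean CD8 intensity of one cell).
# slide_B is uniformly 2.5 times brighter than slide_A, mimicking a batch effect.
cells = [
    ("slide_A", 2.0), ("slide_A", 2.4), ("slide_A", 9.6), ("slide_A", 2.0),
    ("slide_B", 5.0), ("slide_B", 6.0), ("slide_B", 24.0), ("slide_B", 5.0),
]

# Naive phenotyping on raw intensities with one global threshold (assumed value).
RAW_THRESHOLD = 4.5
raw_labels = [value > RAW_THRESHOLD for _, value in cells]

# Simple normalization: divide each cell's intensity by its slide's mean.
by_slide = defaultdict(list)
for slide, value in cells:
    by_slide[slide].append(value)
slide_means = {slide: mean(values) for slide, values in by_slide.items()}
normalized = [value / slide_means[slide] for slide, value in cells]

# Phenotyping on the normalized scale (threshold again an assumed value).
NORM_THRESHOLD = 1.5
labels = [value > NORM_THRESHOLD for value in normalized]

print("CD8+ calls on raw intensities:       ", raw_labels)
print("CD8+ calls on normalized intensities:", labels)
```

On the raw scale every cell on the brighter slide is called CD8+, while on the normalized scale each slide contributes one CD8+ cell. Real data calls for more careful normalization and batch correction than mean division, but the same failure mode motivates it.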