The gating module comprises the most complex parts of our analysis workflow. It consists of two crucial processes:
- binning the GFP+ population into subpopulations
- split the MyHC+ and MyHC- population inside each GFP+ subpopulation
1) Binning the GFP+ population
The expression of GFP-labeled KRAS is not homogeneous within the GFP+ population. However, the magnitude of this expression may affect C2C12 cell differentiation. To analyze this effect, we need to increase the granularity of our analysis by splitting (or binning) the GFP+ population into smaller subpopulations.
We allow users to customize the number and size of these
subpopulations using a numeric inputs. Users can add a maximum of three
subpopulations (low, medium, high). We use the
gate_limits( )
function to fetch the users’s numeric inputs
and save them inside a list.
gate_limits <- reactive({
withProgress(message = "Calculating gate limits...", {
x <- list(low = if(!is.null(input$gfp_range_1)) list(input$gfp_range_1),
medium = if(!is.null(input$gfp_range_2)) list(input$gfp_range_2) else {list(NA)},
high = if(!is.null(input$gfp_range_3)) list(input$gfp_range_3) else {list(NA)})
x[!sapply(x, is.na)]})
})
The gates
variable saves the GFP-low, GFP-medium and
GFP-high rectangle gates created from these inputs.
gates <- reactive({
withProgress(message = "Creating gates...", {
# stop execution if gate_limits() is NULL
if (is.null(gate_limits())) return(NULL)
# if gate_limits() is not NULL, execute code below
filter_names <- c("GFP-low", "GFP-medium", "GFP-high")
y <- lapply(gate_limits(), function(x) {
names(x) <- r$ch_kras()
rectangleGate(x)})
for (i in seq_along(y)){
y[[i]]@filterId <- filter_names[[i]]
}
return(y)
})
})
Finally, we add the gates to the GatingSet and recompute the GatingSet to effectively split the GFP+ population into different subpopulations.
observe({
for (i in seq_along(gates())) {
gs_pop_add(r$gs, gates()[[i]], parent = "GFP+")
}
recompute(r$gs)
}) |> bindEvent(input$confirm_bins)
2) Splitting the MyHC+ and MyHC- population
In our differentiation assay, the MyHC protein is targeted using an eFluor660-labled antibody. This antibody emits fluorescence captured in the red channel of our flow cytometer. Moreover, expression of this protein correlated with cell differentiation. Thus, if cells differentiate, they express more MyHC and consequently higher fluorescence intensities are captured on our flow cytometer. A typical distribution of red fluorescence observed for our differentiation assay is illustrated below. The left peak (light blue) comprises cells that do not express high amounts of MyHC and are considered undifferentiated. We name this population MyHC-. The right peak (pale green) corresponds to cells expressing high amounts of MyHC and are considered differentiated. We name this population MyHC+.
library(ggplot2)
set.seed(2345)
data.frame(x = c(rnorm(10000, 10, 50),
rnorm(2500, 35000, 35000))) |>
ggplot(aes(x = x)) +
geom_histogram(bins = 200, fill = "red") +
annotate(geom = "rect", xmin = 1, xmax = 1000, ymin = 0, ymax = 270,
fill = "lightblue", alpha = 0.5) +
annotate(geom = "rect", xmin = 1000, xmax = 200000, ymin = 0, ymax = 150,
fill = "palegreen", alpha = 0.25) +
theme_bw() +
theme(axis.title = element_text(size = 15)) +
scale_x_log10() +
xlab("RED.R.HLin (RFU)")
While a clear cutoff between these two peaks can be distinguished, we
have consistently observed that the exact location of this cutoff
changes from sample to sample. Thus, manually gating the MyHC+ cells is
time-consuming. In FlowFate, we automated this process using
gate_flowclust_1d( )
, a data-driven gating function from
openCyto.
We wrapped the entire process into a custom function,
getData_splitPeak( )
. Conveniently, the
gate_flowclust_1d( )
not only determines a cutoff between
two populations but also immediately creates a gate from this cutoff.
Thus, the output from getData_splitPeak( )
is a list of
gates. Sometimes biological samples are of poor quality and the GFP+
population or its subpopulation only count very few cells. If the number
of cells is too low, the algorithm behind
gate_flowclust_1d( )
breaks which produces an error. To
stop the entire app from crashig, we use tryCatch()
. For
samples where gate_flowclust_1d( )
is unable to establish a
cutoff between the MyHC- and MyHC+ population, the resulting gate
evaluates to NULL. We remove these NULLs with another custom function,
remove_null_from_list( )
.
getData_splitPeak <- function(r, gs, bin, filter_name) {
## fetch the data we need. bin = name of the gate to extract data from, used to access "GFP-low", "GFP-medium" and "GFP-high" gates
x <- gs_pop_get_data(gs, bin) |> cytoset_to_flowSet()
## x is a flowSet. To each flowFrame, we apply the function tryCatch(). If no error, returns a gate that cuts between myosin peaks. If error, the respective gate evaluates to NULL
x <- fsApply(x, function(fr, filterId = filter_name) {
return(tryCatch(gate_flowclust_1d(fr, params = r$ch_myhc(), K = 2, cutpoint_method = "min_density", filterId = filterId), error = function(e) NULL))
})
## we don't want the gates that evaluate to NULL, so we remove them using the custom function remove_null_from_list()
return(remove_null_from_list(x))
}
remove_null_from_list <- function(data) {
## capture the indices of the the elements that are NULL, assign indices to test variable
test <- which(sapply(data, is.null))
## when no NULLs (test variable of size 0), return the input data
if (sum(test) == 0) {
return(data)
}
## when NULLs identified (test variable of size != 0), return the input data WITHOUT the positions that were NULL
else {
return(data[-test])
}
}
Note that everything getData_splitPeak( )
is supposed to
do is to (1) look at the fluorescent intensity distribution in the red
channel of every dataset and (2) find a threshold between the MyHC- and
MyHC+ peaks for for each dataset and create a gate from it. Remember
that we have previously split the GFP+ population into one, two or three
subpopulations, depending on the user’s choices. Thus, for each dataset
we also need to establish one, two or three MyHC+ gates. This is why we
set the bin
argument getData_splitPeak( )
to
either GFP-low
, GFP-medium
or
GFP-high
. Below, we save the MyHC+ gates for each of these
subpopulations (or bins) and for all the samples in three different
variables.
gfp_low_myo_high <- reactive({
req(r$gs)
withProgress(message = "Retrieving data from GFP-low bin...", {
if (any(str_detect(gs_get_pop_paths(r$gs), "GFP-low"))) { #make sure GFP-low exists!
getData_splitPeak(r = r, gs = r$gs, bin = "GFP-low", filter_name = "MyHC+")
}})
})
gfp_medium_myo_high <- reactive({
req(r$gs)
withProgress(message = "Retrieving data from GFP-low bin...", {
if (any(str_detect(gs_get_pop_paths(r$gs), "GFP-medium"))) { #make sure GFP-medium exists!
getData_splitPeak(r = r, gs = r$gs, bin = "GFP-medium", filter_name = "MyHC+")
}})
})
gfp_high_myo_high <- reactive({
req(r$gs)
withProgress(message = "Retrieving data from GFP-high bin...", {
if (any(str_detect(gs_get_pop_paths(r$gs), "GFP-high"))) { #make sure GFP-high exists!
getData_splitPeak(r = r, gs = r$gs, bin = "GFP-high", filter_name = "MyHC+")
}})
})
Once all these cutoffs have been determined and MyHC+ gates created,
the actual gating is achieved by adding the gates to the GatingSet and
recomputing the GatingSet. The function below first checks if one of the
gate lists (gfp_low_myo_high
,
gfp_medium_myo_high
, gfp_high_myo_high
) is
NULL. If not, the gate is added using the custom
add_gate( )
function.
observe({
if (!is.null(gfp_low_myo_high())) {add_gate(r = r, gs = r$gs, gate = gfp_low_myo_high(), parent = "GFP-low")}
if (!is.null(gfp_medium_myo_high())) {add_gate(r = r, gs = r$gs, gate = gfp_medium_myo_high(), parent = "GFP-medium")}
if (!is.null(gfp_high_myo_high())) {add_gate(r = r, gs = r$gs, gate = gfp_high_myo_high(), parent = "GFP-high")}
observer.state.split(TRUE)
updateTabsetPanel(inputId = "tabset-mainPanel", selected = "Plot")
}) |> bindEvent(input$split)
add_gate( )
adds a gate to the matching GatingSet sample
and recomputes the GatingSet.
add_gate <- function(r, gs, gate, parent) {
gate_names <- names(gate)
gatingSet_names <- sampleNames(gs)
for (i in seq_along(gatingSet_names)) {
if (gatingSet_names[i] %in% gate_names) {
x <- which(gatingSet_names[i] == gate_names) # gives the index of the name in gate_names that matches gatingSet_names
gs_pop_add(r$gs[[i]], gate = gate[[x]], parent = parent)
}
}
recompute(r$gs)
}
The data analysis is now complete. Below is a recap of what has been achieved:
- FCS files were read into R
- The GFP signal captured from control samples was used to establish a threshold defining a GFP+ population of transfected cells.
- The GFP+ population was split into distinct subpopulations based on a user-defined intensity interval. This allows us to investigate the effect of the extent of KRAS expression on C2C12 differentiation.
- Inside each GFP subpopulation, we gated the differentiated cells using a data-driven, peak-splitting algorithm.