The role of the task has received special attention in visual-cognition research because it can provide causal explanations of goal-directed eye-movement responses. The dependency between visual attention and task suggests that eye movements can be used to classify the task being performed. A recent study by Greene, Liu, and Wolfe (2012), however, fails to achieve accurate classification of visual tasks based on eye-movement features. In the present study, we hypothesize that tasks can be successfully classified when they differ with respect to the involvement of other cognitive domains, such as language processing. We extract the eye-movement features used by Greene et al. as well as additional features from the data of three different tasks: visual search, object naming, and scene description. First, we demonstrated that eye-movement responses make it possible to characterize the goals of these tasks. Then, we trained three different types of classifiers and predicted the task participants performed with an accuracy well above chance (a maximum of 88% for visual search). An analysis of the relative importance of featuresfor classification accuracy reveals that just one feature, i.e., initiation time, is sufficient for above-chance performance (a maximum of 79% accuracy in object naming). Crucially, this feature is independent of task duration, which differs systematically across the three tasks we investigated. Overall, the best task classification performance was obtained with a set of seven features that included both spatial information (e.g., entropy of attention allocation) and temporal components (e.g., total fixation on objects) of the eye-movement record. This result confirms the task-dependent allocation of visual attention and extends previous work by showing that task classification is possible when tasks differ in the cognitive processes involved (purely visual tasks such as search vs. communicative tasks such as scene description).
library(glmnet)
library(nnet)
library(kernlab)
library(e1071)
setwd("E:/R/replicability/jov2014/")
source("MyCenter.R")
source("F-score.R")
data = databack = read.table("EMstatistics.txt", header = TRUE)
## saving dataset twice, we need databack later
## show the names of the different features extracted
colnames(data)[4:ncol(data)]
## [1] "Initiation.Time" "Mean.Gaze"
## [3] "Nr..Fixation" "Mean.Saccade"
## [5] "Entropy" "Latency.Body"
## [7] "First.Fixation.Body" "Total.Fixation.Body"
## [9] "Mean.Gaze.Body" "Proportion.Fixation.Body"
## [11] "Latency.Object" "First.Fixation.Object"
## [13] "Total.Fixation.Object" "Mean.Gaze.Object"
## [15] "Proportion.Fixation.Object" "Latency.Face"
## [17] "First.Fixation.Face" "Total.Fixation.Face"
## [19] "Mean.Gaze.Face" "Proportion.Fixation.Face"
## [21] "Task" "Saliency"
## [23] "Turbulence" "Complexity"
## [25] "Clutter" "Feature.Congestion"
## [27] "Fixation.Area"
The full data set contains a total of 1,756 unique trials, which are divided across the three tasks as follows: search (580), description (600), naming (576). From the eye-movement data of each trial, we extract the seven features used by Greene et al. (2012): (a) number of fixations, (b) mean fixation duration, (c) mean saccade amplitude, and (d) percent of image covered by fixations assuming a 18 circle around the fixation position, proportion of dwell time on (e) faces, (f) bodies, and (g) objects. In addition to the Greene et al. (2012) feature set (GF), we extracted another set of 15 features (OF): (a)latency of first fixation; (b) first fixation duration; (c) mean fixation duration; (d) total gaze duration on faces, bodies, and objects (four features for three different regions, i.e., 12 features in total); (e) the initiation time, which is the time spent after scene onset before the first saccade is launched; (f) mean saliency at the fixation location; and (g) the entropy of the attentional landscape.
## take out only the features of Green,ea, 20 12
dvs = c(5, 6, 7, 13, 18, 23)
depM = data[, 24] ## the column where the task is
mat = data[, dvs]
mat = myCenter(mat)
## give the column names back
if ( is.matrix(mat) ) colnames(mat) = colnames(data)[dvs]
data = data.frame(mat, depM)
predictors = names(data) ## define the predictors
predictors = predictors[-length(predictors)]
We used the features described above to train three different types of classifiers: a multinomial log-linear neural networks model (nnet
) package, Venables & Ripley, 2002); a generalized linear model with penalized maximum likelihood (glmnet
) and a support vector machine (ksvm
) using the default setting for all three different models.
The classifiers were trained and tested using ten-fold cross-validation, in which the model is trained on 90% of the data and then tested in the remaining 10%; this process is repeated 10 times so that each fold functions as test data exactly once. This is a way to avoid overfitting the training data while still making maximal use of a small data set.
percent = 10
total = test.ind = seq(1,nrow(data), 1)
nr.t = round( length(total)*percent/100 ) ## nr. of testing instances
fold = 10 ## number of folds
train.folds = test.folds = list()
keep = vector()
for (k in 1:fold){ ## create the 10 folds for cross-validation
if (length(test.ind) >= nr.t) {
tt = sample(test.ind, nr.t) ## may not be possible to create
## 10 identical folds
keep = c(keep, tt)
## print ( c( length(tt), length(unique(tt) ) ) )
} else { ## last fold put all remaining indeces in
tt = test.ind
}
ts = setdiff(total, tt)
train.folds[[k]] = ts
test.folds[[k]] = tt
rm = vector()
for (r in tt){ rm = c(rm, which(test.ind == r)) }
test.ind = test.ind[-rm]
# print( length(test.ind) )
}
## initialize the mis-classification matrices, where instances
## of inaccurate classifications are stored
classes = unique(data$depM)
FglmMis = FmmMis = FsvmMis = matrix(0, nrow = length(classes),
ncol = length( classes ) )
colnames(FglmMis) = rownames(FglmMis) = colnames(FmmMis) = rownames(FmmMis) = colnames(FsvmMis) = rownames(FsvmMis) = classes
We measured the accuracy of the classifiers using F-score, which is the geometric mean of precision and recall and is defined as F-score = 2(PR)/(P + R). Precision (P) is the number of correctly classified instances over the total number of instances labeled as belonging to the class, defined as tp/(tp + fp). Here, tp is the number of true positives (i.e., instances of the class that were correctly identified as such), and fp is the number of false positives (i.e., instances that were labeled as members of the class even though they belong to a different class). Recall (R) is the number of correctly classified instances over the total number of instances in that class, defined as tp/(tp + fn), where fn is the number of false negatives, i.e., the number of instances that were labeled as nonmembers of the class even though they belong to the class.
k = 1
ans = vector() ## store the results in it
for (k in 1:fold){ ## for all the folds
print(k)
build = vector()
train.ml = train = data[train.folds[[k]],]
test = data[test.folds[[k]], ]
ind.var = which(names(test) == "depM") ## get the col index of depM
test.m = as.character(test[, ind.var])
predictors = names( train.ml[1:(ncol(train.ml)-1)] )
formula = as.formula ( paste("depM", "~", paste(predictors, collapse = "+"), sep = "") )
## Parameters are kept at standard values
## room for optimizing them
## run multinomial model (MM)
train.ml$depM = relevel(train.ml$depM, ref = "Prod")
mmodel = multinom(formula, train.ml, maxit = 150, Hess = FALSE)
mmtest = predict(mmodel, newdata = test, "probs")
## run support vector machine (SVM)
SVMmodel = ksvm(formula, data = train, kernel = "rbfdot")
SVMpredict = predict(SVMmodel, test)
## run Lasso regression (LASSO)
## LASSO does not cope with NAs, so, first find them in both training and testing
## if some are found, remove the row containing them.
y = train$depM
glmdata = as.matrix( train[,1:(ncol(train)-1)] )
glmtest = as.matrix( test[, 1:(ncol(test)-1)] )
ind = which(is.na(glmdata == TRUE), arr.ind = TRUE)
if(length(ind) > 0){ glmdata = glmdata[-ind[,1],]; y = y[-ind[,1]]}
ind = which(is.na(glmtest == TRUE), arr.ind = TRUE)
if(length(ind) > 0){ glmtest = glmtest[-ind[,1],]}
glmnetmodel = glmnet(glmdata, y, family = "multinomial", alpha = 0,
standardize = F)
predglmnet = predict(glmnetmodel, glmtest, s = 1, type = "class" )
## calculate F-score and percentage accuracy
classes = vector()
for (r in 1:nrow(mmtest)){
row.prob = which(mmtest[r,] == max(mmtest[r, ]) )
classes = c(classes, names(row.prob))}
resd = Fscore(classes, test.m)
fscr = resd[[1]][, c(1,4)]
Fmm = cbind(fscr, rep("MM", nrow(fscr)))
FmmMis = (FmmMis + resd[[3]])/2
resd = Fscore(SVMpredict, test.m)
fscr = resd[[1]][, c(1,4)]
Fsvm = cbind(fscr, rep("SVM", nrow(fscr)))
FsvmMis = (FsvmMis + resd[[3]])/2
resd = Fscore(predglmnet, test.m)
fscr = resd[[1]][, c(1,4)]
Fglmnet = cbind(fscr, rep("LASSO", nrow(fscr)))
FglmMis = (FglmMis + resd[[3]])/2
ans = rbind(ans,
rbind(Fmm, Fglmnet, Fsvm, deparse.level = 0),
deparse.level = 0)
}
## [1] 1
## # weights: 24 (14 variable)
## initial value 1733.610192
## iter 10 value 1162.991450
## iter 20 value 953.023085
## final value 953.022364
## converged
## [1] 2
## # weights: 24 (14 variable)
## initial value 1733.610192
## iter 10 value 1137.091917
## iter 20 value 948.257636
## final value 948.257418
## converged
## [1] 3
## # weights: 24 (14 variable)
## initial value 1733.610192
## iter 10 value 1139.579863
## iter 20 value 949.112855
## final value 949.112471
## converged
## [1] 4
## # weights: 24 (14 variable)
## initial value 1734.708804
## iter 10 value 1139.508123
## iter 20 value 954.992558
## final value 954.992484
## converged
## [1] 5
## # weights: 24 (14 variable)
## initial value 1733.610192
## iter 10 value 1142.935158
## iter 20 value 965.148661
## final value 965.148618
## converged
## [1] 6
## # weights: 24 (14 variable)
## initial value 1734.708804
## iter 10 value 1133.487055
## iter 20 value 934.207520
## iter 30 value 934.202819
## final value 934.200371
## converged
## [1] 7
## # weights: 24 (14 variable)
## initial value 1733.610192
## iter 10 value 1135.346320
## iter 20 value 938.919336
## final value 938.919193
## converged
## [1] 8
## # weights: 24 (14 variable)
## initial value 1733.610192
## iter 10 value 1144.234681
## iter 20 value 952.920489
## iter 30 value 952.916769
## final value 952.915288
## converged
## [1] 9
## # weights: 24 (14 variable)
## initial value 1733.610192
## iter 10 value 1132.412092
## iter 20 value 941.886780
## iter 30 value 941.884266
## final value 941.882995
## converged
## [1] 10
## # weights: 24 (14 variable)
## initial value 1738.004641
## iter 10 value 1135.677720
## iter 20 value 943.788200
## iter 30 value 943.785425
## final value 943.784765
## converged
colnames(ans) = c("task", "fscore", "classifier")
Show the overall mean accuracy for the different classifiers, and divided by task
ans = as.data.frame(ans)
ans$fscore = as.numeric(as.matrix(ans$fscore))
str(ans)
## 'data.frame': 90 obs. of 3 variables:
## $ task : Factor w/ 3 levels "Naming","Prod",..: 2 1 3 2 1 3 2 1 3 1 ...
## $ fscore : num 0.63 0.739 0.872 0.636 0.739 ...
## $ classifier: Factor w/ 3 levels "LASSO","MM","SVM": 2 2 2 1 1 1 3 3 3 2 ...
## overall
with(ans, tapply(fscore, list(classifier), mean))
## LASSO MM SVM
## 0.6689440 0.6842990 0.6917908
## divided by task
with(ans, tapply(fscore, list(classifier, task), mean))
## Naming Prod Search
## LASSO 0.6869928 0.5745407 0.7452985
## MM 0.6989330 0.5980980 0.7558660
## SVM 0.7082063 0.6025014 0.7646648
We demonstrate that task can be accurately classified using eye-movement features.
We test that our result is robust to self-termination (i.e., not using features related to the amount of viewing time), by training a SVM classifier on two features: initiation time and mean saccade amplitude. The first feature reflects the cuing aspect of the task, the second one the spatial span covered as the scene is explored.
## make the dataset smaller on the range to improve plotting
colnames(databack)
## [1] "subj" "trial"
## [3] "cond" "Initiation.Time"
## [5] "Mean.Gaze" "Nr..Fixation"
## [7] "Mean.Saccade" "Entropy"
## [9] "Latency.Body" "First.Fixation.Body"
## [11] "Total.Fixation.Body" "Mean.Gaze.Body"
## [13] "Proportion.Fixation.Body" "Latency.Object"
## [15] "First.Fixation.Object" "Total.Fixation.Object"
## [17] "Mean.Gaze.Object" "Proportion.Fixation.Object"
## [19] "Latency.Face" "First.Fixation.Face"
## [21] "Total.Fixation.Face" "Mean.Gaze.Face"
## [23] "Proportion.Fixation.Face" "Task"
## [25] "Saliency" "Turbulence"
## [27] "Complexity" "Clutter"
## [29] "Feature.Congestion" "Fixation.Area"
databack = subset(databack, databack$Initiation.Time <= 600 & databack$Mean.Saccade <= 500)
Initiation = databack$Initiation.Time
Saccade = databack$Mean.Saccade
Task = as.character(databack$Task)
Task[which(Task == "Prod")] = "Description"
## just rename Prod, with something clearer
Task = as.factor(Task)
DF = data.frame(Initiation, Saccade, Task)
model = svm(Task ~ Saccade + Initiation, data = DF)
alphacol = c("#00CD0050","#FF000050","#0000FF50") ## transparency
plot(model, DF, svSymbol = 8, dataSymbol = 16, grid = 200,
symbolPalette = c("green4", "red", "blue"),
pch = c(16,15,3), col = alphacol)