sched: Fix rq->nr_uninterruptible update race
authorPeter Zijlstra <a.p.zijlstra@chello.nl>
Wed, 25 Jan 2012 10:50:51 +0000 (11:50 +0100)
committerIngo Molnar <mingo@elte.hu>
Thu, 26 Jan 2012 18:38:09 +0000 (19:38 +0100)
commit4ca9b72b71f10147bd21969c1805f5b2c4ca7b7b
tree05d6072a28029ffca57ba6c7d30036698348b59a
parent87f71ae2dd7471c1b4c94100be1f218e91dc64c3
sched: Fix rq->nr_uninterruptible update race

KOSAKI Motohiro noticed the following race:

 > CPU0                    CPU1
 > --------------------------------------------------------
 > deactivate_task()
 >                         task->state = TASK_UNINTERRUPTIBLE;
 > activate_task()
 >    rq->nr_uninterruptible--;
 >
 >                         schedule()
 >                           deactivate_task()
 >                             rq->nr_uninterruptible++;
 >

Kosaki-San's scenario is possible when CPU0 runs
__sched_setscheduler() against CPU1's current @task.

__sched_setscheduler() does a dequeue/enqueue in order to move
the task to its new queue (position) to reflect the newly provided
scheduling parameters. However it should be completely invariant to
nr_uninterruptible accounting, sched_setscheduler() doesn't affect
readyness to run, merely policy on when to run.

So convert the inappropriate activate/deactivate_task usage to
enqueue/dequeue_task, which avoids the nr_uninterruptible accounting.

Also convert the two other sites: __migrate_task() and
normalize_task() that still use activate/deactivate_task. These sites
aren't really a problem since __migrate_task() will only be called on
non-running task (and therefore are immume to the described problem)
and normalize_task() isn't ever used on regular systems.

Also remove the comments from activate/deactivate_task since they're
misleading at best.

Reported-by: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1327486224.2614.45.camel@laptop
Signed-off-by: Ingo Molnar <mingo@elte.hu>
kernel/sched/core.c