git.alex.org.uk - linux-flexiantxendom0-3.2.10.git/log

ACPI: Fix D3hot v D3cold confusion

Before this patch, ACPI_STATE_D3 incorrectly referenced D3hot
in some places, but D3cold in other places.

After this patch, ACPI_STATE_D3 always means ACPI_STATE_D3_COLD;
and all references to D3hot use ACPI_STATE_D3_HOT.

ACPI's _PR3 method is used to enter both D3hot and D3cold states.
What distinguishes D3hot from D3cold is the presence _PR3
(Power Resources for D3hot) If these resources are all ON,
then the state is D3hot. If _PR3 is not present,
or all _PR0 resources for the devices are OFF,
then the state is D3cold.

This patch applies after Linux-3.4-rc1.
A future syntax cleanup may remove ACPI_STATE_D3
to emphasize that it always means ACPI_STATE_D3_COLD.

Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Reviewed-by: Aaron Lu <aaron.lu@amd.com>
Signed-off-by: Len Brown <len.brown@intel.com>

Merge branch 'd3' into release

Conflicts:
drivers/acpi/sleep.c

This was a text conflict between
a2ef5c4fd44ce3922435139393b89f2cce47f576
(ACPI: Move module parameter gts and bfs to sleep.c)

which added #include <linux/module.h>

and

b24e5098853653554baf6ec975b9e855f3d6e5c0
(ACPI, PCI: Move acpi_dev_run_wake() to ACPI core)

which added #include <linux/pm_runtime.h>

The resolution was to take them both.

Signed-off-by: Len Brown <len.brown@intel.com>

Merge branch 'apei' into release

Conflicts:
drivers/acpi/apei/apei-base.c

This was a conflict between

15afae604651d4e17652d2ffb56f5e36f991cfef
(CPI, APEI: Fix incorrect APEI register bit width check and usage)

and

653f4b538f66d37db560e0f56af08117136d29b7
(ACPICA: Expand OSL memory read/write interfaces to 64 bits)

The former changed a parameter in the call to acpi_os_read_memory64()
and the later replaced all calls to acpi_os_read_memory64()
with calls to acpi_os_read_memory().

Signed-off-by: Len Brown <len.brown@intel.com>

Merge branches 'acpica', 'bgrt', 'bz-11533', 'cpuidle', 'ec', 'hotplug', 'misc', 'red-hat-bz-727865', 'thermal', 'throttling', 'turbostat' and 'video' into release

Signed-off-by: Len Brown <len.brown@intel.com>

ACPI throttling: fix endian bug in acpi_read_throttling_status()

Using a u64 here creates an endian bug. We store a u32 number in the
top byte which is a larger number than intended on big endian systems.
There is no reason to use a 64 bit data type here, I guess it was just
an oversight.

I removed the initialization to zero as well. It's needed with a u64
but with a u32, the variable gets initialized properly inside the call
to acpi_os_read_port().

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Len Brown <len.brown@intel.com>

Disable MCP limit exceeded messages from Intel IPS driver

On a system on the thermal limit these are quite noisy and flood the logs.
Better would be a counter anyways. But given that we don't even have
anything for normal throttling this doesn't seem to be urgent either.

Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

ACPI video: Don't start video device until its associated input device has been allocated

Quoth Dmitry Torokhov:
In addition to bus notifier we do install device notifier explicitly
so it might fire up early. The easiest fox would be to move
acpi_video_bus_start_devices() after input_allocate_device() but
before input_register_device() - unregistered input devices can handle
input_event() calls just fine.

May fix crashes reported in:
https://bugzilla.kernel.org/show_bug.cgi?id=40672

Signed-off-by: Igor Murzov <e-mail@date.by>
Signed-off-by: Len Brown <len.brown@intel.com>

ACPI video: Harden video bus adding.

It is always better to check return values, so add some new checks and
correct existing ones.

v2: Be consistent and don't mix errors from -E* and AE_* namespaces.

Signed-off-by: Igor Murzov <e-mail@date.by>
Signed-off-by: Len Brown <len.brown@intel.com>

ACPI: Add support for exposing BGRT data

ACPI 5.0 adds the BGRT, a table that contains a pointer to the firmware
boot splash and associated metadata. This simple driver exposes it via
/sys/firmware/acpi in order to allow bootsplash applications to draw their
splash around the firmware image and reduce the number of jarring graphical
transitions during boot.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>

ACPI: export acpi_kobj

Drivers may wish to add entries to /sys/firmware/acpi, so export acpi_kobj
in order to let them do that.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>

ACPI: Fix logic for removing mappings in 'acpi_unmap'

Make sure the removal of mappings uses the same logic that put the
mappings in place.

Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>

CPER failed to handle generic error records with multiple sections

The function apei_estatus_print() and apei_estatus_check() forget to move ahead
the gdata pointer when dealing with multiple generic error data sections.

Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Signed-off-by: Len Brown <len.brown@intel.com>

ACPI: Clean redundant codes in scan.c

Clean the redundant codes of apci_bus_get_power_flags().

Signed-off-by: Alex He <alex.he@amd.com>
Signed-off-by: Len Brown <len.brown@intel.com>

ACPI: Fix unprotected smp_processor_id() in acpi_processor_cst_has_changed()

The acpi_processor_cst_has_changed() function is invoked from a
CPU_ONLINE or CPU_DEAD function, which might well execute on CPU 0
even though the CPU being hotplugged is some other CPU. In addition,
acpi_processor_cst_has_changed() invokes smp_processor_id() without
protection, resulting in splats when onlining CPUs.

This commit therefore changes the smp_processor_id() to pr->id, as is
used elsewhere in the code, for example, in acpi_processor_add().

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Tested-by: Yong Zhang <yong.zhang0@gmail.com>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Len Brown <len.brown@intel.com>

ACPI: consistently use should_use_kmap()

... so that acpi_unmap()'s behavior gets in sync with acpi_map()'s.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Len Brown <len.brown@intel.com>

PNPACPI: Fix device ref leaking in acpi_pnp_match

During testing pci root bus removal, found some root bus bridge is not freed.
If booting with pnpacpi=off, those hostbridge could be freed without problem.
It turns out that some devices reference are not released during acpi_pnp_match.
that match should not hold one device ref during every calling.
Add pu_device calling before returning.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Len Brown <len.brown@intel.com>

ACPI: Fix use-after-free in acpi_map_lsapic

When processor is being hot-added to the system, acpi_map_lsapic invokes
ACPI _MAT method to find APIC ID and flags, verifies that returned structure
is indeed ACPI's local APIC structure, and that flags contain MADT_ENABLED
bit. Then saves APIC ID, frees structure - and accesses structure when
computing arguments for acpi_register_lapic call. Which sometime leads
to acpi_register_lapic call being made with second argument zero, failing
to bring processor online with error 'Unable to map lapic to logical cpu
number'.

As lapic->lapic_flags & ACPI_MADT_ENABLED was already confirmed to be non-zero
few lines above, we can just pass unconditional ACPI_MADT_ENABLED to the
acpi_register_lapic.

Signed-off-by: Petr Vandrovec <petr@vmware.com>
Signed-off-by: Alok N Kataria <akataria@vmware.com>
Reviewed-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>

ACPI: processor_driver: add missing kfree

The function acpi_processor_add is stored in the ops.add field of a
acpi_driver structure.  This function is then called in
acpi_bus_driver_init.  On failure, this function clears the field
device->driver_data, but does not free its contents.  Thus the free has to
be done by the add function.  In acpi_processor_add, the corresponding
value is pr.  This value is currently freed on failure before storing it in
device->driver_data, but not after.  This free is added in the error
handling code at the end of the function.  The per_cpu variable
processors is also cleared so that it does not refer to a dangling pointer.

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Signed-off-by: Len Brown <len.brown@intel.com>

ACPI, APEI: Fix incorrect APEI register bit width check and usage

The current code incorrectly assumes that
(1) the APEI register bit width is always 8, 16, 32, or 64 and
(2) the APEI register bit width is always equal to the APEI
    register access width.

ERST serialization instructions entries such as:

[030h 0048   1]                       Action : 00 [Begin Write Operation]
[031h 0049   1]                  Instruction : 03 [Write Register Value]
[032h 0050   1]        Flags (decoded below) : 01
                      Preserve Register Bits : 1
[033h 0051   1]                     Reserved : 00

[034h 0052  12]              Register Region : [Generic Address Structure]
[034h 0052   1]                     Space ID : 00 [SystemMemory]
[035h 0053   1]                    Bit Width : 03
[036h 0054   1]                   Bit Offset : 00
[037h 0055   1]         Encoded Access Width : 03 [DWord Access:32]
[038h 0056   8]                      Address : 000000007F2D7038

[040h 0064   8]                        Value : 0000000000000001
[048h 0072   8]                         Mask : 0000000000000007

break this assumption by yielding:
  [Firmware Bug]: APEI: Invalid bit width in GAR [0x7f2d7038/3/0]

I have found no ACPI specification requirements corresponding
with the above assumptions.  There is even a good example in
the Serialization Instruction Entries section (ACPI 4.0 section
17.4,1.2, ACPI 4.0a section 2.5.1.2, ACPI 5.0 section 18.5.1.2)
that mentions a serialization instruction with a bit range of
[6:2] which is 5 bits wide, _not_ 8, 16, 32, or 64 bits wide.

Compile and boot tested with 3.3.0-rc7 on a IBM HX5.

Signed-off-by: Gary Hade <garyhade@us.ibm.com>
Signed-off-by: Len Brown <len.brown@intel.com>

Update documentation for parameter *notrigger* in einj.txt

Add description of parameter notrigger in the einj.txt.
One can utilize this new parameter to do some SRAR injection
test. Pay attention, the operation is highly depended on the
BIOS implementation. If no proper BIOS supports it, even if
enabling this parameter, expected result will not happen.

v2:
Update the documentation suggested by Tony

Suggested-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

ACPI, APEI, EINJ, new parameter to control trigger action

Some APEI firmware implementation will access injected address
specified in param1 to trigger the error when injecting memory
error, which means if one SRAR error is injected, the crash
always happens because it is executed in kernel context. This
new parameter can disable trigger action and control is taken
over by the user. In this way, an SRAR error can happen in user
context instead of crashing the system. This function is highly
depended on BIOS implementation so please ensure you know the
BIOS trigger procedure before you enable this switch.

v2:
notrigger should be created together with param1/param2

Tested-by: Tony Luck <tony.luck@lintel.com>
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

ACPI, APEI, EINJ, limit the range of einj_param

On the platforms with ACPI4.x support, parameter extension
is not always doable, which means only parameter extension
is enabled, einj_param can take effect.

v2->v1: stopping early in einj_get_parameter_address for einj_param

Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

ACPI, APEI, Fix ERST header length check

This fixes a trivial copy & paste error in ERST header length check.
It's just for future safety because sizeof(struct acpi_table_einj)
equals to sizeof(struct acpi_table_erst) with current ACPI5.0
specification. It applies to v3.3-rc6.

Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Acked-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

cpuidle: power_usage should be declared signed integer

power_usage is always assigned a negative value and should be declared
a signed integer

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@amd.com>
Signed-off-by: Len Brown <len.brown@intel.com>

idle, x86: Allow off-lined CPU to enter deeper C states

Currently when a CPU is off-lined it enters either MWAIT-based idle or,
if MWAIT is not desired or supported, HLT-based idle (which places the
processor in C1 state). This patch allows processors without MWAIT
support to stay in states deeper than C1.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@amd.com>
Signed-off-by: Len Brown <len.brown@intel.com>

ACPI: Add CPU hotplug support for processor device objects

acpi_processor_install_hotplug_notify() registers processor objects to
receive ACPI CPU hotplug event notifications. This patch additionally
registers processor device objects (ACPI0007) to receive the notifications
as well.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Reviewed-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Len Brown <len.brown@intel.com>

ACPI / PM: print physical addresses consistently with other parts of kernel

Print physical address info in a style consistent with the %pR style used
elsewhere in the kernel.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Len Brown <len.brown@intel.com>

ACPI: Evaluate thermal trip points before reading temperature

An HP laptop (Pavilion G4-1016tx) has the following code in _TMP:

       Store (\_SB.PCI0.LPCB.EC0.RTMP, Local0)
       If (LGreaterEqual (Local0, S4TP))
       {
           Store (One, HTS4)
       }

S4TP is initialised at 0 and not programmed further until either _HOT or
_CRT is called. If we evaluate _TMP before the trip points then HTS4 will
always be set, causing the firmware to generate a message on boot
complaining that the system shut down because of overheating. The simplest
solution is just to reverse the checking of trip points and _TMP in thermal
init.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>

ACPI, PCI: Move acpi_dev_run_wake() to ACPI core

acpi_dev_run_wake() is a generic function which can be used by
other subsystem too. Rename it to acpi_pm_device_run_wake, to be
consistent with acpi_pm_device_sleep_wake.

Then move it to ACPI core.

Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

cpuidle: remove unused 'governor_data' field

As far as I can see, this field is never used in the code.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Len Brown <len.brown@intel.com>

cpuidle: remove useless array definition in cpuidle_structure

All the modules name are ro-data, it is never copied to the array.

eg.

static struct cpuidle_driver intel_idle_driver = {
.name = "intel_idle",
.owner = THIS_MODULE,
};

It safe to assign the pointer of this ro-data to a const char *.
By this way we save 12 bytes.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Tested-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Signed-off-by: Len Brown <len.brown@intel.com>

cpuidle: use the driver's state_count as default

If the state_count is not initialized for the device use
the driver's state count as the default. That will prevent
to add it manually in the cpuidle driver initialization
routine and will save us from duplicate line of code.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Len Brown <len.brown@intel.com>

cpuidle: add a sysfs entry to disable specific C state for debug purpose.

Some C states of new CPU might be not good.  One reason is BIOS might
configure them incorrectly.  To help developers root cause it quickly, the
patch adds a new sysfs entry, so developers could disable specific C state
manually.

In addition, C state might have much impact on performance tuning, as it
takes much time to enter/exit C states, which might delay interrupt
processing.  With the new debug option, developers could check if a deep C
state could impact performance and how much impact it could cause.

Also add this option in Documentation/cpuidle/sysfs.txt.

[akpm@linux-foundation.org: check kstrtol return value]
Signed-off-by: ShuoX Liu <shuox.liu@intel.com>
Reviewed-by: Yanmin Zhang <yanmin_zhang@intel.com>
Reviewed-and-Tested-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>

ACPI: Add interface to register/unregister device to/from power resources

Devices may share same list of power resources in _PR0, for example

Device(Dev0)
{
Name (_PR0, Package (0x01)
{
P0PR,
P1PR
})
}

Device(Dev1)
{
Name (_PR0, Package (0x01)
{
P0PR,
P1PR
}
}

Assume Dev0 and Dev1 were runtime suspended.
Then Dev0 is resumed first and it goes into D0 state.
But Dev1 is left in D0_Uninitialised state.

This is wrong. In this case, Dev1 must be resumed too.

In order to hand this case, each power resource maintains a list of
devices which relies on it.

When power resource is ON, it will check if the devices on its list
can be resumed. The device can only be resumed when all the power
resouces of its _PR0 are ON.

Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

ACPI: Introduce ACPI D3_COLD state support

If a device has _PR3, it means the device supports D3_COLD.
Add the ability to validate and enter D3_COLD state in ACPI.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

ACPICA: Update to version 20120320

Version 20120320.

Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

ACPICA: Object repair code: Support to add Package wrappers

Repair a common problem with objects that are defined to return
a variable-length Package of sub-objects. If there is only one
sub-object, some BIOS code mistakenly simply declares the single
object instead of a Package with one sub-object. This function
attempts to repair this error by wrapping a Package object around
the original object, creating the correct and expected Package
with one sub-object.

Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

ACPI: Make ACPI interrupt threaded

Some ACPI interrupt actions may need to wait, and it's easiest to
have a thread context for this. So turn the ACPI interrupt
into a threaded interrupt.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

ACPI: ec: Do request_region outside WARN()

WARN() is not supposed to have side effects, so move the request_regions
outside.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

tools turbostat: harden against cpu online/offline

Sometimes users have turbostat running in interval mode
when they take processors offline/online.

Previously, turbostat would survive, but not gracefully.

Tighten up the error checking so turbostat notices
changesn sooner, and print just 1 line on change:

turbostat: re-initialized with num_cpus %d

Signed-off-by: Len Brown <len.brown@intel.com>

tools turbostat: reduce measurement overhead due to IPIs

turbostat uses /dev/cpu/*/msr interface to read MSRs.
For modern systems, it reads 10 MSR/CPU.  This can
be observed as 10 "Function Call Interrupts"
per CPU per sample added to /proc/interrupts.

This overhead is measurable on large idle systems,
and as Yoquan Song pointed out, it can even trick
cpuidle into thinking the system is busy.

Here turbostat re-schedules itself in-turn to each
CPU so that its MSR reads will always be local.
This replaces the 10 "Function Call Interrupts"
with a single "Rescheduling interrupt" per sample
per CPU.

On an idle 32-CPU system, this shifts some residency from
the shallow c1 state to the deeper c7 state:

# ./turbostat.old -s
   %c0  GHz  TSC    %c1    %c3    %c6    %c7   %pc2   %pc3   %pc6   %pc7
  0.27 1.29 2.29   0.95   0.02   0.00  98.77  20.23   0.00  77.41   0.00
  0.25 1.24 2.29   0.98   0.02   0.00  98.75  20.34   0.03  77.74   0.00
  0.27 1.22 2.29   0.54   0.00   0.00  99.18  20.64   0.00  77.70   0.00
  0.26 1.22 2.29   1.22   0.00   0.00  98.52  20.22   0.00  77.74   0.00
  0.26 1.38 2.29   0.78   0.02   0.00  98.95  20.51   0.05  77.56   0.00
^C
i# ./turbostat.new -s
   %c0  GHz  TSC    %c1    %c3    %c6    %c7   %pc2   %pc3   %pc6   %pc7
  0.27 1.20 2.29   0.24   0.01   0.00  99.49  20.58   0.00  78.20   0.00
  0.27 1.22 2.29   0.25   0.00   0.00  99.48  20.79   0.00  77.85   0.00
  0.27 1.20 2.29   0.25   0.02   0.00  99.46  20.71   0.03  77.89   0.00
  0.28 1.26 2.29   0.25   0.01   0.00  99.46  20.89   0.02  77.67   0.00
  0.27 1.20 2.29   0.24   0.01   0.00  99.48  20.65   0.00  78.04   0.00

cc: Youquan Song <youquan.song@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

tools turbostat: add summary option

turbostat -s
cuts down on the amount of output, per user request.

also treak some output whitespace and the man page.

Signed-off-by: Len Brown <len.brown@intel.com>

ACPI: Move module parameter gts and bfs to sleep.c

Move linux specific module parameter gts and bfs out of ACPICA core
code to sleep.c.

Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

ACPICA: Sleep/Wake interfaces: optionally execute _GTS and _BFS

Enhanced the sleep/wake interfaces to optionally execute the
_GTS method (Going To Sleep), and the _BFS method (Back From
Sleep). Windows apparently does not execute these methods, and
therefore these methods are often untested. It has been seen on
some systems where the execution of these methods causes errors
and also prevents the machine from entering S5. It is therefore
suggested that host operating systems do not execute these methods
by default. In the future, perhaps these methods can be optionally
executed based on the age of the system and/or what is the newest
version of Windows that the BIOS asks for via _OSI.

Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

ACPI: Do cpufreq clamping for throttling per package v2

On Intel CPUs the processor typically uses the highest frequency
set by any logical CPU. When the system overheats
Linux first forces the frequency to the lowest available one
to lower the temperature.

However this was done only per logical CPU, which means all
logical CPUs in a package would need to go through this before
the frequency is actually lowered.

Worse this delay actually prevents real throttling, because
the real throttle code only proceeds when the lowest frequency
is already reached.

So when a throttle event happens force the lowest frequency
for all CPUs in the package where it happened. The per CPU
state is now kept per package, not per logical CPU. An alternative
would be to do it per cpufreq unit, but since we want to bring
down the temperature of the complete chip it's better
to do it for all.

In principle it may even make sense to do it for all CPUs,
but I kept it on the package for now.

With this change the frequency is actually lowered, which
in terms also allows real throttling to proceed.

I also removed an unnecessary per cpu variable initialization.

v2: Fix package mapping

Cc: <stable@vger.kernel.org>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

ACPICA: Debugger: Add missing object info to namespace dump

Many namespace node types must have an attached object. For
these node types, print a message about a missing object during
a namespace dump.

Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

ACPICA: Change exception code for invalid pathname in acpi_evaluate_object

Change the returned exception code from AE_BAD_PARAMETER to the
more appropriate AE_BAD_PATHNAME, when the input pathname to
evaluate object is invalid.

Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

ACPICA: Clarify METHOD_NAME* defines for full-pathname cases

Changed the METHOD_NAME* defines that define a full pathname to
the method to METHOD_PATHNAME* in order to make it clear that
it is not a simple 4-character ACPI name. Used for the various
sleep/wake methods.

Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

ACPICA: Update to version 20120215

Version 20120215.

Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

ACPICA: Add table-driven dispatch for sleep/wake functions

Simplifies the code, especially the compile-time
ACPI_REDUCED_HARDWARE option.

Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

ACPICA: Split sleep/wake functions into two files

The functions for the original/legacy sleep/wake registers are in
hwsleep.c, and the functions for the new extended FADT V5 sleep
registers are in hwesleep.c

Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

ACPICA: Distill multiple sleep method functions to a single function

Adds acpi_hw_execute_sleep_method to handle the various sleep methods
such as _GTS, _BFS, _WAK, and _SST. Removes the specialized
functions previously used for each of these methods.

Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

ACPICA: Add acpi_os_physical_table_override interface

This interface allows the host to override a table via a
physical address, instead of the logical address required by
acpi_os_table_override. This simplifies the host implementation.
Initial implementation by Thomas Renninger. ACPICA implementation
creates a single function for table overrides that attempts both
a logical and a physical override.

Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

ACPICA: ACPI 5: Update debug output for new notify values

Add new notify values, add support for "hardware specific" notifies.

Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

ACPICA: Expand OSL memory read/write interfaces to 64 bits

This change expands acpi_os_read_memory and acpi_os_write_memory to a
full 64 bits. This allows 64 bit transfers via the acpi_read and
acpi_write interfaces. Note: The internal acpi_hw_read and acpi_hw_write
interfaces remain at 32 bits, because 64 bits is not needed to
access the standard ACPI registers.

Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

ACPICA: Support for custom ACPICA build for ACPI 5 reduced hardware

Add ACPI_REDUCED_HARDWARE flag that removes all hardware-related
code (about 10% code, 5% static data).

Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

ACPICA: Move ACPI timer prototypes to public acpixf file

These prototypes were incorrectly declared in achware.h.

Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

ACPICA: ACPI 5: Support for new FADT SleepStatus, SleepControl registers

Adds sleep and wake support for systems with these registers.
One new file, hwxfsleep.c

Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

ACPICA: Update _REV return value to 5

_REV returns the supported ACPI revision level, which is now 5.

Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

ARM: davinci: Fix for cpuidle consolidation changes

The recent cpuidle consolidation changes erroneously omitted one
critical line of code.

Signed-off-by: Robert Lee <rob.lee@linaro.org>
Tested-by: Sekhar Nori <nsekhar@ti.com>
Acked-by: Sekhar Nori <nsekhar@ti.com>
Signed-off-by: Len Brown <len.brown@intel.com>

thermal: Fix for setting the thermal zone mode to enable/disable

Basically without this patch changing the mode of thermal zone
is not possible as wrong string size is passed to strncmp.

Signed-off-by: Amit Daniel Kachhap <amit.kachhap@linaro.org>
Acked-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Len Brown <len.brown@intel.com>

thermal: spear13xx: checking for NULL instead of IS_ERR()

thermal_zone_device_register() never returns NULL, on error it returns and
ERR_PTR().

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Viresh Kumar <viresh.kumar@st.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@st.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>

thermal/spear_thermal: replace readl/writel with lighter _relaxed variants

readl/writel versions for ARM contain memory barrier instruction for
synchronizing DMA buffers. These are not required at least on this
module. So use lighter _relaxed variants.

Signed-off-by: Viresh Kumar <viresh.kumar@st.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>

thermal: add support for thermal sensor present on SPEAr13xx machines

ST's SPEAr13xx machines are based on CortexA9 ARM processors. These
machines contain a thermal sensor for junction temperature monitoring.

This patch adds support for this thermal sensor in existing thermal
framework.

[akpm@linux-foundation.org: little code cleanup]
[akpm@linux-foundation.org: print the pointer correctly]
[viresh.kumar@st.com: thermal/spear_thermal: add compilation dependency on PLAT_SPEAR]
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@st.com>
Signed-off-by: Viresh Kumar <viresh.kumar@st.com>
Signed-off-by: Viresh Kumar <viresh.kumar@st.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>

thermal_sys: convert printks to pr_<level>

Use the current logging style.

Remove PREFIX, add pr_fmt, convert the printks. All dmesg output now
prefixed with "thermal_sys: ".

Signed-off-by: Joe Perches <joe@perches.com>
Cc: Jesper Juhl <jj@chaosbits.net>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>

thermal_sys: kernel style cleanups

Just a few tidies to make it more like most kernel sources.

A couple of long lines still remain.

Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Jesper Juhl <jj@chaosbits.net>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>

thermal_sys: remove obfuscating used-once macros

These don't add any value as they are used only once and the surrounding
code uses similar variable.

Signed-off-by: Joe Perches <joe@perches.com>
Cc: Jesper Juhl <jj@chaosbits.net>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>

thermal_sys: remove unnecessary line continuations

Line continations are not necessary in function calls or statements.
Remove them.

Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Jesper Juhl <jj@chaosbits.net>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>

drivers/thermal/thermal_sys.c: fix build warning

With CONFIG_NET=n:

drivers/thermal/thermal_sys.c:63: warning: 'thermal_event_seqnum' defined but not used

Move 'thermal_event_seqnum' definition inside the '#ifdef CONFIG_NET'

[akpm@linux-foundation.org: make thermal_event_seqnum local to generate_netlink_event()]
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Acked-by: Guenter Roeck <guenter.roeck@ericsson.com>
Acked-by: Durgadoss R <durgadoss.r@intel.com>
Cc: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>

SH: shmobile: Consolidate time keeping and irq enable

Enable core cpuidle timekeeping and irq enabling and remove that
handling from this code.

Signed-off-by: Robert Lee <rob.lee@linaro.org>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Jean Pihet <j-pihet@ti.com>
Signed-off-by: Len Brown <len.brown@intel.com>

ARM: shmobile: Consolidate time keeping and irq enable

Enable core cpuidle timekeeping and irq enabling and remove that
handling from this code.

Signed-off-by: Robert Lee <rob.lee@linaro.org>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Jean Pihet <j-pihet@ti.com>
Signed-off-by: Len Brown <len.brown@intel.com>

ARM: omap: Consolidate OMAP4 time keeping and irq enable

Enable core cpuidle timekeeping and irq enabling and remove that
handling from this code.

Signed-off-by: Robert Lee <rob.lee@linaro.org>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Jean Pihet <j-pihet@ti.com>
Signed-off-by: Len Brown <len.brown@intel.com>

ARM: omap: Consolidate OMAP3 time keeping and irq enable

Use core cpuidle timekeeping and irqen wrapper and remove that
handling from this code.

Signed-off-by: Robert Lee <rob.lee@linaro.org>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Tested-by: Jean Pihet <j-pihet@ti.com>
Acked-by: Jean Pihet <j-pihet@ti.com>
Signed-off-by: Len Brown <len.brown@intel.com>

ARM: davinci: Consolidate time keeping and irq enable

Enable core cpuidle timekeeping and irq enabling and remove that
handling from this code.

Signed-off-by: Robert Lee <rob.lee@linaro.org>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Jean Pihet <j-pihet@ti.com>
Signed-off-by: Len Brown <len.brown@intel.com>

ARM: kirkwood: Consolidate time keeping and irq enable

Enable core cpuidle timekeeping and irq enabling and remove that
handling from this code.

Signed-off-by: Robert Lee <rob.lee@linaro.org>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Jean Pihet <j-pihet@ti.com>
Signed-off-by: Len Brown <len.brown@intel.com>

ARM: at91: Consolidate time keeping and irq enable

Enable core cpuidle timekeeping and irq enabling and remove that
handling from this code.

Signed-off-by: Robert Lee <rob.lee@linaro.org>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Jean Pihet <j-pihet@ti.com>
Signed-off-by: Len Brown <len.brown@intel.com>

cpuidle: Add common time keeping and irq enabling

Make necessary changes to implement time keeping and irq enabling
in the core cpuidle code. This will allow the removal of these
functionalities from various platform cpuidle implementations whose
timekeeping and irq enabling follows the form in this common code.

Signed-off-by: Robert Lee <rob.lee@linaro.org>
Tested-by: Jean Pihet <j-pihet@ti.com>
Tested-by: Amit Daniel <amit.kachhap@linaro.org>
Tested-by: Robert Lee <rob.lee@linaro.org>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Acked-by: Jean Pihet <j-pihet@ti.com>
Signed-off-by: Len Brown <len.brown@intel.com>

ACPICA: Fix regression in FADT revision checks

commit 64b3db22c04586997ab4be46dd5a5b99f8a2d390 (2.6.39),
"Remove use of unreliable FADT revision field" causes regression
for old P4 systems because now cst_control and other fields are
not reset to 0.

The effect is that acpi_processor_power_init will notice
cst_control != 0 and a write to CST_CNT register is performed
that should not happen. As result, the system oopses after the
"No _CST, giving up" message, sometimes in acpi_ns_internalize_name,
sometimes in acpi_ns_get_type, usually at random places. May be
during migration to CPU 1 in acpi_processor_get_throttling.

Every one of these settings help to avoid this problem:
- acpi=off
- processor.nocst=1
- maxcpus=1

The fix is to update acpi_gbl_FADT.header.length after
the original value is used to check for old revisions.

https://bugzilla.kernel.org/show_bug.cgi?id=42700
https://bugzilla.redhat.com/show_bug.cgi?id=727865

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

ACPI: ignore FADT reset-reg-sup flag

we check that the address is non-zero later anyway.

https://bugzilla.kernel.org/show_bug.cgi?id=11533

Signed-off-by: Len Brown <len.brown@intel.com>

Linux 3.3

Don't limit non-nested epoll paths

Commit 28d82dc1c4ed ("epoll: limit paths") that I did to limit the
number of possible wakeup paths in epoll is causing a few applications
to longer work (dovecot for one).

The original patch is really about limiting the amount of epoll nesting
(since epoll fds can be attached to other fds). Thus, we probably can
allow an unlimited number of paths of depth 1. My current patch limits
it at 1000. And enforce the limits on paths that have a greater depth.

This is captured in: https://bugzilla.redhat.com/show_bug.cgi?id=681578

Signed-off-by: Jason Baron <jbaron@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge git://git./linux/kernel/git/davem/net

Pull networking changes from David Miller:
"1) icmp6_dst_alloc() returns NULL instead of ERR_PTR() leading to
     crashes, particularly during shutdown.  Reported by Dave Jones and
     fixed by Eric Dumazet.

  2) hyperv and wimax/i2400m return NETDEV_TX_BUSY when they have
     already freed the SKB, which causes crashes as to the caller this
     means requeue the packet.  Fixes from Eric Dumazet.

  3) usbnet driver doesn't allocate the right amount of headroom on
     fresh RX SKBs, fix from Eric Dumazet.

  4) Fix regression in ip6_mc_find_dev_rcu(), as an RCU lookup it
     abolutely should not take a reference to 'dev', this leads to
     leaks.  Fix from RonQing Li.

  5) Fix netfilter ctnetlink race between delete and timeout expiration.
     From Pablo Neira Ayuso.

  6) Revert SFQ change which causes regressions, specifically queueing
     to tail can lead to unavoidable flow starvation.  From Eric
     Dumazet.

  7) Fix a memory leak and a crash on corrupt firmware files in bnx2x,
     from Michal Schmidt."

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  netfilter: ctnetlink: fix race between delete and timeout expiration
  ipv6: Don't dev_hold(dev) in ip6_mc_find_dev_rcu.
  wimax/i2400m: fix erroneous NETDEV_TX_BUSY use
  net/hyperv: fix erroneous NETDEV_TX_BUSY use
  net/usbnet: reserve headroom on rx skbs
  bnx2x: fix memory leak in bnx2x_init_firmware()
  bnx2x: fix a crash on corrupt firmware file
  sch_sfq: revert dont put new flow at the end of flows
  ipv6: fix icmp6_dst_alloc()

Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull perf fixes from Ingo Molnar.

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf tools, x86: Build perf on older user-space as well
  perf tools: Use scnprintf where applicable
  perf tools: Incorrect use of snprintf results in SEGV

netfilter: ctnetlink: fix race between delete and timeout expiration

Kerin Millar reported hardlockups while running `conntrackd -c'
in a busy firewall. That system (with several processors) was
acting as backup in a primary-backup setup.

After several tries, I found a race condition between the deletion
operation of ctnetlink and timeout expiration. This patch fixes
this problem.

Tested-by: Kerin Millar <kerframil@gmail.com>
Reported-by: Kerin Millar <kerframil@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Don't dev_hold(dev) in ip6_mc_find_dev_rcu.

ip6_mc_find_dev_rcu() is called with rcu_read_lock(), so don't
need to dev_hold().
With dev_hold(), not corresponding dev_put(), will lead to leak.

[ bug introduced in 96b52e61be1 (ipv6: mcast: RCU conversions) ]

Signed-off-by: RongQing.Li <roy.qing.li@gmail.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'akpm' (more patches from Andrew)

Merge some more email patches from Andrew Morton:
"A couple of nilfs fixes"

* emailed from Andrew Morton <akpm@linux-foundation.org>:
nilfs2: fix NULL pointer dereference in nilfs_load_super_block()
nilfs2: clamp ns_r_segments_percentage to [1, 99]

nilfs2: fix NULL pointer dereference in nilfs_load_super_block()

According to the report from Slicky Devil, nilfs caused kernel oops at
nilfs_load_super_block function during mount after he shrank the
partition without resizing the filesystem:

BUG: unable to handle kernel NULL pointer dereference at 00000048
IP: [<d0d7a08e>] nilfs_load_super_block+0x17e/0x280 [nilfs2]
*pde = 00000000
Oops: 0000 [#1] PREEMPT SMP
...
Call Trace:
  [<d0d7a87b>] init_nilfs+0x4b/0x2e0 [nilfs2]
  [<d0d6f707>] nilfs_mount+0x447/0x5b0 [nilfs2]
  [<c0226636>] mount_fs+0x36/0x180
  [<c023d961>] vfs_kern_mount+0x51/0xa0
  [<c023ddae>] do_kern_mount+0x3e/0xe0
  [<c023f189>] do_mount+0x169/0x700
  [<c023fa9b>] sys_mount+0x6b/0xa0
  [<c04abd1f>] sysenter_do_call+0x12/0x28
Code: 53 18 8b 43 20 89 4b 18 8b 4b 24 89 53 1c 89 43 24 89 4b 20 8b 43
20 c7 43 2c 00 00 00 00 23 75 e8 8b 50 68 89 53 28 8b 54 b3 20 <8b> 72
48 8b 7a 4c 8b 55 08 89 b3 84 00 00 00 89 bb 88 00 00 00
EIP: [<d0d7a08e>] nilfs_load_super_block+0x17e/0x280 [nilfs2] SS:ESP 0068:ca9bbdcc
CR2: 0000000000000048

This turned out due to a defect in an error path which runs if the
calculated location of the secondary super block was invalid.

This patch fixes it and eliminates the reported oops.

Reported-by: Slicky Devil <slicky.dvl@gmail.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Slicky Devil <slicky.dvl@gmail.com>
Cc: <stable@vger.kernel.org> [2.6.30+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

nilfs2: clamp ns_r_segments_percentage to [1, 99]

ns_r_segments_percentage is read from the disk. Bogus or malicious
value could cause integer overflow and malfunction due to meaningless
disk usage calculation. This patch reports error when mounting such
bogus volumes.

Signed-off-by: Haogang Chen <haogangchen@gmail.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'for-linus' of git://git./linux/kernel/git/jmorris/linux-security

Pull maintainer update from James Morris:
"Please pull this patch which adds Serge as maintainer of the
  capabilities code, as discussed on lwn and the lsm list.

  New capabilities must be signed off by the maintainer, and new uses of
  any capabilities should at be cc'd to the maintainer."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  MAINTAINERS: Add Serge as maintainer of capabilities

Merge tag 'for-linus' of git://linux-c6x.org/git/projects/linux-c6x-upstreaming

Pull c6x bugfix from Mark Salter:
"Remove dead code from entry.S which causes a build failure when using
a newer assembler (v2.22 complains about it, v2.20 ignores it)."

* tag 'for-linus' of git://linux-c6x.org/git/projects/linux-c6x-upstreaming:
C6X: remove dead code from entry.S

afs: Remote abort can cause BUG in rxrpc code

When writing files to afs I sometimes hit a BUG:

kernel BUG at fs/afs/rxrpc.c:179!

With a backtrace of:

afs_free_call
afs_make_call
afs_fs_store_data
afs_vnode_store_data
afs_write_back_from_locked_page
afs_writepages_region
afs_writepages

The cause is:

ASSERT(skb_queue_empty(&call->rx_queue));

Looking at a tcpdump of the session the abort happens because we
are exceeding our disk quota:

rx abort fs reply store-data error diskquota exceeded (32)

So the abort error is valid. We hit the BUG because we haven't
freed all the resources for the call.

By freeing any skbs in call->rx_queue before calling afs_free_call
we avoid hitting leaking memory and avoid hitting the BUG.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

afs: Read of file returns EBADMSG

A read of a large file on an afs mount failed:

# cat junk.file > /dev/null
cat: junk.file: Bad message

Looking at the trace, call->offset wrapped since it is only an
unsigned short. In afs_extract_data:

        _enter("{%u},{%zu},%d,,%zu", call->offset, len, last, count);
...

        if (call->offset < count) {
                if (last) {
                        _leave(" = -EBADMSG [%d < %zu]", call->offset, count);
                        return -EBADMSG;
                }

Which matches the trace:

[cat   ] ==> afs_extract_data({65132},{524},1,,65536)
[cat   ] <== afs_extract_data() = -EBADMSG [0 < 65536]

call->offset went from 65132 to 0. Fix this by making call->offset an
unsigned int.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

C6X: remove dead code from entry.S

The ENDPROC() on sys_fadvise64_c6x() in arch/c6x/kernel/entry.S is
outside of the conditional block with the matching ENTRY() macro. This
leads a newer (v2.22 vs. v2.20) assembler to complain:

/tmp/ccGZBaPT.s: Assembler messages:
/tmp/ccGZBaPT.s: Error: .size expression for sys_fadvise64_c6x does not evaluate to a constant

The conditional block became dead code when c6x switched to generic
unistd.h and should be removed along with the offending ENDPROC().

Signed-off-by: Mark Salter <msalter@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>

wimax/i2400m: fix erroneous NETDEV_TX_BUSY use

A driver start_xmit() method cannot free skb and return NETDEV_TX_BUSY,
since caller is going to reuse freed skb.

In fact netif_tx_stop_queue() / netif_stop_queue() is needed before
returning NETDEV_TX_BUSY or you can trigger a ksoftirqd fatal loop.

In case of memory allocation error, only safe way is to drop the packet
and return NETDEV_TX_OK

Also increments tx_dropped counter

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Inaky Perez-Gonzalez <inaky.perez-gonzalez@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/hyperv: fix erroneous NETDEV_TX_BUSY use

A driver start_xmit() method cannot free skb and return NETDEV_TX_BUSY,
since caller is going to reuse freed skb.

This is mostly a revert of commit bf769375c (staging: hv: fix the return
status of netvsc_start_xmit())

In fact netif_tx_stop_queue() / netif_stop_queue() is needed before
returning NETDEV_TX_BUSY or you can trigger a ksoftirqd fatal loop.

In case of memory allocation error, only safe way is to drop the packet
and return NETDEV_TX_OK

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/usbnet: reserve headroom on rx skbs

network drivers should reserve some headroom on incoming skbs so that we
dont need expensive reallocations, eg forwarding packets in tunnels.

This NET_SKB_PAD padding is done in various helpers, like
__netdev_alloc_skb_ip_align() in this patch, combining NET_SKB_PAD and
NET_IP_ALIGN magic.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Oliver Neukum <oneukum@suse.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Oliver Neukum <oneukum@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: fix memory leak in bnx2x_init_firmware()

When cycling the interface down and up, bnx2x_init_firmware() knows that
the firmware is already loaded, but nevertheless it allocates certain
arrays anew (init_data, init_ops, init_ops_offsets, iro_arr). The old
arrays are leaked.

Fix the leaks by returning early if the firmware was already loaded.
Because if the firmware is loaded, so are the arrays.

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Acked-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: fix a crash on corrupt firmware file

If the requested firmware is deemed corrupt and then released, reset the
pointer to NULL in order to avoid double-freeing it in
bnx2x_release_firmware() or dereferencing it in bnx2x_init_firmware().

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Acked-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sch_sfq: revert dont put new flow at the end of flows

This reverts commit d47a0ac7b6 (sch_sfq: dont put new flow at the end of
flows)

As Jesper found out, patch sounded great but has bad side effects.

In stress situation, pushing new flows in front of the queue can prevent
old flows doing any progress. Packets can stay in SFQ queue for
unlimited amount of time.

It's possible to add heuristics to limit this problem, but this would
add complexity outside of SFQ scope.

A more sensible answer to Dave Taht concerns (who reported the issued I
tried to solve in original commit) is probably to use a qdisc hierarchy
so that high prio packets dont enter a potentially crowded SFQ qdisc.

Reported-by: Jesper Dangaard Brouer <jdb@comx.dk>
Cc: Dave Taht <dave.taht@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: fix icmp6_dst_alloc()

commit 87a115783 ( ipv6: Move xfrm_lookup() call down into
icmp6_dst_alloc().) forgot to convert one error path, leading
to crashes in mld_sendpack()

Many thanks to Dave Jones for providing a very complete bug report.

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>