[go: nahoru, domu]

History log of /drivers/net/bonding/bond_main.c
Revision Date Author Comments
b8e4500f42fe4464a33a887579147050bed8fcef 18-Nov-2014 Nikolay Aleksandrov <nikolay@redhat.com> bonding: fix curr_active_slave/carrier with loadbalance arp monitoring

Since commit 6fde8f037e60 ("bonding: fix locking in
bond_loadbalance_arp_mon()") we can have a stale bond carrier state and
stale curr_active_slave when using arp monitoring in loadbalance modes. The
reason is that in bond_loadbalance_arp_mon() we can't have
do_failover == true but slave_state_changed == false, whenever do_failover
is true then slave_state_changed is also true. Then the following piece
from bond_loadbalance_arp_mon():
if (slave_state_changed) {
bond_slave_state_change(bond);
if (BOND_MODE(bond) == BOND_MODE_XOR)
bond_update_slave_arr(bond, NULL);
} else if (do_failover) {
block_netpoll_tx();
bond_select_active_slave(bond);
unblock_netpoll_tx();
}

will execute only the first branch, always and regardless of do_failover.
Since these two events aren't related in such way, we need to decouple and
consider them separately.

For example this issue could lead to the following result:
Bonding Mode: load balancing (round-robin)
*MII Status: down*
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0
ARP Polling Interval (ms): 100
ARP IP target/s (n.n.n.n form): 192.168.9.2

Slave Interface: ens12
*MII Status: up*
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 2
Permanent HW addr: 00:0f:53:01:42:2c
Slave queue ID: 0

Slave Interface: eth1
*MII Status: up*
Speed: Unknown
Duplex: Unknown
Link Failure Count: 70
Permanent HW addr: 52:54:00:2f:0f:8e
Slave queue ID: 0

Since some interfaces are up, then the status of the bond should also be
up, but it will never change unless something invokes bond_set_carrier()
(i.e. enslave, bond_select_active_slave etc). Now, if I force the
calling of bond_select_active_slave via for example changing
primary_reselect (it can change in any mode), then the MII status goes to
"up" because it calls bond_select_active_slave() which should've been done
from bond_loadbalance_arp_mon() itself.

CC: Veaceslav Falico <vfalico@gmail.com>
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: Ding Tianhong <dingtianhong@huawei.com>

Fixes: 6fde8f037e60 ("bonding: fix locking in bond_loadbalance_arp_mon()")
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Acked-by: Veaceslav Falico <vfalico@gmail.com>
Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Acked-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
0287587884b15041203b3a362d485e1ab1f24445 06-Oct-2014 Eric Dumazet <edumazet@google.com> net: better IFF_XMIT_DST_RELEASE support

Testing xmit_more support with netperf and connected UDP sockets,
I found strange dst refcount false sharing.

Current handling of IFF_XMIT_DST_RELEASE is not optimal.

Dropping dst in validate_xmit_skb() is certainly too late in case
packet was queued by cpu X but dequeued by cpu Y

The logical point to take care of drop/force is in __dev_queue_xmit()
before even taking qdisc lock.

As Julian Anastasov pointed out, need for skb_dst() might come from some
packet schedulers or classifiers.

This patch adds new helper to cleanly express needs of various drivers
or qdiscs/classifiers.

Drivers that need skb_dst() in their ndo_start_xmit() should call
following helper in their setup instead of the prior :

dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
->
netif_keep_dst(dev);

Instead of using a single bit, we use two bits, one being
eventually rebuilt in bonding/team drivers.

The other one, is permanent and blocks IFF_XMIT_DST_RELEASE being
rebuilt in bonding/team. Eventually, we could add something
smarter later.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
ee6377147409a00c071b2da853059a7d59979fbc 05-Oct-2014 Mahesh Bandewar <maheshb@google.com> bonding: Simplify the xmit function for modes that use xmit_hash

Earlier change to use usable slave array for TLB mode had an additional
performance advantage. So extending the same logic to all other modes
that use xmit-hash for slave selection (viz 802.3AD, and XOR modes).
Also consolidating this with the earlier TLB change.

The main idea is to build the usable slaves array in the control path
and use that array for slave selection during xmit operation.

Measured performance in a setup with a bond of 4x1G NICs with 200
instances of netperf for the modes involved (3ad, xor, tlb)
cmd: netperf -t TCP_RR -H <TargetHost> -l 60 -s 5

Mode TPS-Before TPS-After

802.3ad : 468,694 493,101
TLB (lb=0): 392,583 392,965
XOR : 475,696 484,517

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5f0c5f73e5efaee2928c4cabcf48b03f6ba99fc8 29-Sep-2014 Andy Gospodarek <gospo@cumulusnetworks.com> bonding: make global bonding stats more reliable

As the code stands today, bonding stats are based simply on the stats
from the member interfaces. If a member was to be removed from a bond,
the stats would instantly drop. This would be confusing to an admin
would would suddonly see interface stats drop while traffic is still
flowing.

In addition to preventing the stats drops mentioned above, new members
will now be added to the bond and only traffic received after the member
was added to the bond will be counted as part of bonding stats. Bonding
counters will also be updated when any slaves are dropped to make sure
the reported stats are reliable.

v2: Changes suggested by Nik to properly allocate/free stats memory.
v3: Properly destroy workqueue and fix netlink configuration path.
v4: Moved cached stats into bonding and slave structs as there does not
seem to be a complexity/performance benefit to using alloc'd memory vs
in-struct memory.

Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
37ab7ddf3f81cec9175f53f17c357bb0d27a343e 19-Sep-2014 dingtianhong <dingtianhong@huawei.com> bonding: remove the unnecessary notes for bond_xmit_broadcast()

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
a64d044e3907b717ae3d1e3711226064b42c83f4 19-Sep-2014 dingtianhong <dingtianhong@huawei.com> bonding: slight optimization for bond_xmit_roundrobin()

When the slave is the curr_active_slave, no need to check
whether the slave is active or not, it is always active.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
e0974585e74cc16446bc0690f0545b72aa2a3485 15-Sep-2014 Nikolay Aleksandrov <nikolay@redhat.com> bonding: consolidate ASSERT_RTNL()s and remove the unnecessary

Consolidate the calls to ASSERT_RTNL() before bond_select_active_slave()
inside bond_select_active_slave() itself and remove the ASSERT_RTNL()
from bond_hw_addr_swap() as it's not exported and its only caller -
bond_change_active_slave() already has an ASSERT_RTNL().

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
547942cace50e536dcda9ce8397792bc992291d6 15-Sep-2014 Nikolay Aleksandrov <nikolay@redhat.com> bonding: trivial: style and comment fixes

First adjust a couple of locking comments that were left inaccurate,
then adjust comments to use the netdev styling and remove extra new
lines where necessary and add a couple of new lines between declarations
and code. These are all trivial styling changes, no functional change.
Also removed a couple of outdated or obvious comments.
This patch is by no means a complete fix of all netdev style violations
but it gets the bonding closer.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9a72c2da690d78e93cff24b9f616412508678dd5 12-Sep-2014 Nikolay Aleksandrov <nikolay@redhat.com> bonding: fix div by zero while enslaving and transmitting

The problem is that the slave is first linked and slave_cnt is
incremented afterwards leading to a div by zero in the modes that use it
as a modulus. What happens is that in bond_start_xmit()
bond_has_slaves() is used to evaluate further transmission and it becomes
true after the slave is linked in, but when slave_cnt is used in the xmit
path it is still 0, so fetch it once and transmit based on that. Since
it is used only in round-robin and XOR modes, the fix is only for them.
Thanks to Eric Dumazet for pointing out the fault in my first try to fix
this.

Call trace (took it out of net-next kernel, but it's the same with net):
[46934.330038] divide error: 0000 [#1] SMP
[46934.330041] Modules linked in: bonding(O) 9p fscache
snd_hda_codec_generic crct10dif_pclmul
[46934.330041] bond0: Enslaving eth1 as an active interface with an up
link
[46934.330051] ppdev joydev crc32_pclmul crc32c_intel 9pnet_virtio
ghash_clmulni_intel snd_hda_intel 9pnet snd_hda_controller parport_pc
serio_raw pcspkr snd_hda_codec parport virtio_balloon virtio_console
snd_hwdep snd_pcm pvpanic i2c_piix4 snd_timer i2ccore snd soundcore
virtio_blk virtio_net virtio_pci virtio_ring virtio ata_generic
pata_acpi floppy [last unloaded: bonding]
[46934.330053] CPU: 1 PID: 3382 Comm: ping Tainted: G O
3.17.0-rc4+ #27
[46934.330053] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[46934.330054] task: ffff88005aebf2c0 ti: ffff88005b728000 task.ti:
ffff88005b728000
[46934.330059] RIP: 0010:[<ffffffffa0198c33>] [<ffffffffa0198c33>]
bond_start_xmit+0x1c3/0x450 [bonding]
[46934.330060] RSP: 0018:ffff88005b72b7f8 EFLAGS: 00010246
[46934.330060] RAX: 0000000000000679 RBX: ffff88004b077000 RCX:
000000000000002a
[46934.330061] RDX: 0000000000000000 RSI: ffff88004b3f0500 RDI:
ffff88004b077940
[46934.330061] RBP: ffff88005b72b830 R08: 00000000000000c0 R09:
ffff88004a83e000
[46934.330062] R10: 000000000000ffff R11: ffff88004b1f12c0 R12:
ffff88004b3f0500
[46934.330062] R13: ffff88004b3f0500 R14: 000000000000002a R15:
ffff88004b077940
[46934.330063] FS: 00007fbd91a4c740(0000) GS:ffff88005f080000(0000)
knlGS:0000000000000000
[46934.330064] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[46934.330064] CR2: 00007f803a8bb000 CR3: 000000004b2c9000 CR4:
00000000000406e0
[46934.330069] Stack:
[46934.330071] ffffffff811e6169 00000000e772fa05 ffff88004b077000
ffff88004b3f0500
[46934.330072] ffffffff81d17d18 000000000000002a 0000000000000000
ffff88005b72b8a0
[46934.330073] ffffffff81620108 ffffffff8161fe0e ffff88005b72b8c4
ffff88005b302000
[46934.330073] Call Trace:
[46934.330077] [<ffffffff811e6169>] ?
__kmalloc_node_track_caller+0x119/0x300
[46934.330084] [<ffffffff81620108>] dev_hard_start_xmit+0x188/0x410
[46934.330086] [<ffffffff8161fe0e>] ? harmonize_features+0x2e/0x90
[46934.330088] [<ffffffff81620b06>] __dev_queue_xmit+0x456/0x590
[46934.330089] [<ffffffff81620c50>] dev_queue_xmit+0x10/0x20
[46934.330090] [<ffffffff8168f022>] arp_xmit+0x22/0x60
[46934.330091] [<ffffffff8168f090>] arp_send.part.16+0x30/0x40
[46934.330092] [<ffffffff8168f1e5>] arp_solicit+0x115/0x2b0
[46934.330094] [<ffffffff8160b5d7>] ? copy_skb_header+0x17/0xa0
[46934.330096] [<ffffffff8162875a>] neigh_probe+0x4a/0x70
[46934.330097] [<ffffffff8162979c>] __neigh_event_send+0xac/0x230
[46934.330098] [<ffffffff8162a00b>] neigh_resolve_output+0x13b/0x220
[46934.330100] [<ffffffff8165f120>] ? ip_forward_options+0x1c0/0x1c0
[46934.330101] [<ffffffff81660478>] ip_finish_output+0x1f8/0x860
[46934.330102] [<ffffffff81661f08>] ip_output+0x58/0x90
[46934.330103] [<ffffffff81661602>] ? __ip_local_out+0xa2/0xb0
[46934.330104] [<ffffffff81661640>] ip_local_out_sk+0x30/0x40
[46934.330105] [<ffffffff81662a66>] ip_send_skb+0x16/0x50
[46934.330106] [<ffffffff81662ad3>] ip_push_pending_frames+0x33/0x40
[46934.330107] [<ffffffff8168854c>] raw_sendmsg+0x88c/0xa30
[46934.330110] [<ffffffff81612b31>] ? skb_recv_datagram+0x41/0x60
[46934.330111] [<ffffffff816875a9>] ? raw_recvmsg+0xa9/0x1f0
[46934.330113] [<ffffffff816978d4>] inet_sendmsg+0x74/0xc0
[46934.330114] [<ffffffff81697a9b>] ? inet_recvmsg+0x8b/0xb0
[46934.330115] bond0: Adding slave eth2
[46934.330116] [<ffffffff8160357c>] sock_sendmsg+0x9c/0xe0
[46934.330118] [<ffffffff81603248>] ?
move_addr_to_kernel.part.20+0x28/0x80
[46934.330121] [<ffffffff811b4477>] ? might_fault+0x47/0x50
[46934.330122] [<ffffffff816039b9>] ___sys_sendmsg+0x3a9/0x3c0
[46934.330125] [<ffffffff8144a14a>] ? n_tty_write+0x3aa/0x530
[46934.330127] [<ffffffff810d1ae4>] ? __wake_up+0x44/0x50
[46934.330129] [<ffffffff81242b38>] ? fsnotify+0x238/0x310
[46934.330130] [<ffffffff816048a1>] __sys_sendmsg+0x51/0x90
[46934.330131] [<ffffffff816048f2>] SyS_sendmsg+0x12/0x20
[46934.330134] [<ffffffff81738b29>] system_call_fastpath+0x16/0x1b
[46934.330144] Code: 48 8b 10 4c 89 ee 4c 89 ff e8 aa bc ff ff 31 c0 e9
1a ff ff ff 0f 1f 00 4c 89 ee 4c 89 ff e8 65 fb ff ff 31 d2 4c 89 ee 4c
89 ff <f7> b3 64 09 00 00 e8 02 bd ff ff 31 c0 e9 f2 fe ff ff 0f 1f 00
[46934.330146] RIP [<ffffffffa0198c33>] bond_start_xmit+0x1c3/0x450
[bonding]
[46934.330146] RSP <ffff88005b72b7f8>

CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
Fixes: 278b208375 ("bonding: initial RCU conversion")
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8c0bc550288d81e9ad8a2ed9136a72140b9ef507 11-Sep-2014 Nikolay Aleksandrov <nikolay@redhat.com> bonding: adjust locking comments

Now that locks have been removed, remove some unnecessary comments and
adjust others to reflect reality. Also add a comment to "mode_lock" to
describe its current users and give a brief summary why they need it.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
e470259fa1bd7ce5a375b16c5ec97cc0e83b058d 11-Sep-2014 Nikolay Aleksandrov <nikolay@redhat.com> bonding: 3ad: convert to bond->mode_lock

Now that we have bond->mode_lock, we can remove the state_machine_lock
and use it in its place. There're no fast paths requiring the per-port
spinlocks so it should be okay to consolidate them into mode_lock.
Also move it inside the unbinding function as we don't want to expose
mode_lock outside of the specific modes.

Suggested-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4bab16d7c97498e91564231b922d49f52efaf7d4 11-Sep-2014 Nikolay Aleksandrov <nikolay@redhat.com> bonding: alb: convert to bond->mode_lock

The ALB/TLB specific spinlocks are no longer necessary as we now have
bond->mode_lock for this purpose, so convert them and remove them from
struct alb_bond_info.
Also remove the unneeded lock/unlock functions and use spin_lock/unlock
directly.

Suggested-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
b743562819bd97cc7c282e870896bae8016b64b5 11-Sep-2014 Nikolay Aleksandrov <nikolay@redhat.com> bonding: convert curr_slave_lock to a spinlock and rename it

curr_slave_lock is now a misleading name, a much better name is
mode_lock as it'll be used for each mode's purposes and it's no longer
necessary to use a rwlock, a simple spinlock is enough.

Suggested-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1c72cfdc96e63bf975cab514c4ca4d8a661ba0e6 11-Sep-2014 Nikolay Aleksandrov <nikolay@redhat.com> bonding: clean curr_slave_lock use

Mostly all users of curr_slave_lock already have RTNL as we've discussed
previously so there's no point in using it, the one case where the lock
must stay is the 3ad code, in fact it's the only one.
It's okay to remove it from bond_do_fail_over_mac() as it's called with
RTNL and drops the curr_slave_lock anyway.
bond_change_active_slave() is one of the main places where
curr_slave_lock was used, it's okay to remove it as all callers use RTNL
these days before calling it, that's why we move the ASSERT_RTNL() in
the beginning to catch any potential offenders to this rule.
The RTNL argument actually applies to all of the places where
curr_slave_lock has been removed from in this patch.
Also remove the unnecessary bond_deref_active_protected() macro and use
rtnl_dereference() instead.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
37b7021d9d713964482faeb9e0935f458a1a4016 09-Sep-2014 Masanari Iida <standby24x7@gmail.com> net:bonding: Add missing space in bonding driver parameter description

This patch adds missing space between "interface" and "by"
in bonding module parameter description.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
87163ef9cda7617f8afdb549de191706641003c0 09-Sep-2014 Nikolay Aleksandrov <nikolay@redhat.com> bonding: remove last users of bond->lock and bond->lock itself

The usage of bond->lock in bond_main.c was completely unnecessary as it
didn't help to sync with anything, most of the spots already had RTNL.
Since there're no more users of bond->lock, remove it.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
059b47e8aaf997245bc531e980581de492315fe6 09-Sep-2014 Nikolay Aleksandrov <nikolay@redhat.com> bonding: convert primary_slave to use RCU

This is necessary mainly for two bonding call sites: procfs and
sysfs as it was dereferenced without any real protection.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
bdbc5f13036c13ba47dad5f99645556fc40381f0 09-Sep-2014 Nikolay Aleksandrov <nikolay@redhat.com> bonding: 3ad: use curr_slave_lock instead of bond->lock

In 3ad mode the only syncing needed by bond->lock is for the wq
and the recv handler, so change them to use curr_slave_lock.
There're no locking dependencies here as 3ad doesn't use
curr_slave_lock at all.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
a67eed571aa505f51a4d02cab765a9d4f6ef80c4 25-Jul-2014 Dan Carpenter <dan.carpenter@oracle.com> bonding: fix a memory leak in bond_arp_send_all()

This test is reversed so the memory is always leaked. It's better style
to remove the test anyway.

Fixes: 3e403a77779f ('bonding: make it possible to have unlimited nested upper vlans')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Veaceslav Falico <vfalico@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3e403a77779faf046862d91c36ef79fb4b12be9a 17-Jul-2014 Veaceslav Falico <vfalico@gmail.com> bonding: make it possible to have unlimited nested upper vlans

Currently we're limited by a constant level of vlan nestings, and fail to
find anything beyound that level (currently 2).

To fix this - remove the limit of nestings when going through device tree,
and when the end device is found - allocate the needed amount of vlan tags
and return them, instead of found/not found.

CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
23fa5c2caae08f919d906b1064b9fdc352b3024e 17-Jul-2014 Veaceslav Falico <vfalico@gmail.com> bonding: destroy proc directory only after all bonds are gone

Currently we might arrive to bond_net_exit() with some bonds left (that
were created while the module is unloading). We take care of that by
destroying sysfs (the last possibility to add new bonds) and then
destroying all the remaining bonds.

However, we destroy the /proc/net/bonding directory before destroying those
last bonds, and get a warning that we're trying to destroy a non-empty
proc directory (containing /proc/net/bonding/bondX).

Fix this by moving bond_destroy_proc_dir() after all the bonds are
destroyed, so that we're sure that no bonds exist.

CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14056e7930761b730e2b34ae716e151ba890f3bf 16-Jul-2014 Veaceslav Falico <vfalico@gmail.com> bonding: use rtnl_deref in bond_change_rx_flags()

As it's always called with RTNL held, via dev_set_allmulti/promiscuity.
Also, remove the wrong comment.

CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ce04d63502ca7ec30ef07336a0fd6f1165fd486b 17-Jul-2014 Jianhua Xie <jianhua.xie@freescale.com> bonding: enhance L2 hash helper with packet type

Current L2 hash helper calculates destination eth addr and
source ether addr as L2 hash factors. This patch is adding
packet type ID field into L2 hash factors. While one of
BOND_XMIT_POLICY_LAYER2 or BOND_XMIT_POLICY_{LAYER|ENCAP}23
is applied, for the 2nd level hash, enhanced hash method can
help to distribute different types of packets like IPv4/IPv6
packets to different slave devices.

CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: David S. Miller <davem@davemloft.net>
CC: Pan Jiafei <Jiafei.Pan@freescale.com>

Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jianhua Xie <jianhua.xie@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
f338532327bf1f93ff30764dbd2251c1e2fa765b 15-Jul-2014 Veaceslav Falico <vfalico@gmail.com> bonding: remove pr_fmt from bond_main.c

To maintain the same message structure as netdev_* functions print.

CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
76444f5052edf6683a735c3bc307e11c121654c5 15-Jul-2014 Veaceslav Falico <vfalico@gmail.com> bonding: convert bond_main.c to use netdev_printk instead of pr_

Converted only the parts where we've had a valid net_device, skipping the
init/deinit and options verification.

CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
f54424412b6b2f64cae4d7c39d981ca14ce0052c 15-Jul-2014 Veaceslav Falico <vfalico@gmail.com> bonding: permit enslaving interfaces without set_mac support

Currently we exit if the slave isn't the first slave, doesn't support mac
address setting and fail_over_mac isn't FOM_ACTIVE. It's wrong because we
only require ndo_set_mac_address in case bonding is in active-backup mode
and FOM isn't FOM_ACTIVE.

To fix this - only exit with an error if we're in a/b mode and have
fail_over_mac != FOM_ACTIVE.

Also, maintain current behaviour on the first slave (forcibly change fom to
FOM_ACTIVE) to not break anyone's configuration.

CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8574171833b24fda5101e1aa892a38c0d91d083e 15-Jul-2014 Eric Dumazet <edumazet@google.com> bonding: add proper __rcu annotation for current_arp_slave

Using __rcu annotation actually helps to spot all accesses to
bond->current_arp_slave are correctly protected, with LOCKDEP support.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4740d6382790077f22c606d03804f5d9f15b90d7 15-Jul-2014 Eric Dumazet <edumazet@google.com> bonding: add proper __rcu annotation for curr_active_slave

RCU was added to bonding in linux-3.12 but lacked proper sparse annotations.

Using __rcu annotation actually helps to spot all accesses to bond->curr_active_slave
are correctly protected, with LOCKDEP support.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Veaceslav Falico <vfalico@gmail.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
c835a677331495cf137a7f8a023463afd9f032f8 14-Jul-2014 Tom Gundersen <teg@jklm.no> net: set name_assign_type in alloc_netdev()

Extend alloc_netdev{,_mq{,s}}() to take name_assign_type as argument, and convert
all users to pass NET_NAME_UNKNOWN.

Coccinelle patch:

@@
expression sizeof_priv, name, setup, txqs, rxqs, count;
@@

(
-alloc_netdev_mqs(sizeof_priv, name, setup, txqs, rxqs)
+alloc_netdev_mqs(sizeof_priv, name, NET_NAME_UNKNOWN, setup, txqs, rxqs)
|
-alloc_netdev_mq(sizeof_priv, name, setup, count)
+alloc_netdev_mq(sizeof_priv, name, NET_NAME_UNKNOWN, setup, count)
|
-alloc_netdev(sizeof_priv, name, setup)
+alloc_netdev(sizeof_priv, name, NET_NAME_UNKNOWN, setup)
)

v9: move comments here from the wrong commit

Signed-off-by: Tom Gundersen <teg@jklm.no>
Reviewed-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
548d28bd0eac840d122b691279ce9f4ce6ecbfb6 13-Jul-2014 Nikolay Aleksandrov <nikolay@redhat.com> bonding: fix ad_select module param check

Obvious copy/paste error when I converted the ad_select to the new
option API. "lacp_rate" there should be "ad_select" so we can get the
proper value.

CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: David S. Miller <davem@davemloft.net>

Fixes: 9e5f5eebe765 ("bonding: convert ad_select to use the new option
API")
Reported-by: Karim Scheik <karim.scheik@prisma-solutions.at>
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
e721f87d806f2a959d6a530be18dabee6097aa79 02-Jul-2014 Jiri Pirko <jiri@resnulli.us> bonding: remove no longer relevant vlan warnings

These warnings are no longer relevant. Even when last slave is
removed, there is a valid address assigned to bond (random).
The correct functionality of vlans is ensured by maintaining unicast
list in vlan_sync_address().

Suggested-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Veaceslav Falico <vfalico@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
763e0ecd72fe90fdd73bb1aa1b72caf8381d2fff 27-Jun-2014 Jiri Pirko <jiri@resnulli.us> bonding: allow to add vlans on top of empty bond

This limitation maybe had some reason in the past, but now there is not
one -> removing this.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Veaceslav Falico <vfalico@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5a7baa78851b8e915480aa596de4bd2f13f31ffb 17-Jun-2014 Or Gerlitz <ogerlitz@mellanox.com> bonding: Advertize vxlan offload features when supported

When the underlying device supports TCP offloads for VXLAN/UDP
encapulated traffic, we need to reflect that through the hw_enc_features
field of the bonding net-device. This will cause the xmit path
in the core networking stack to provide bonding with encapsulated
GSO frames to offload into the HW etc.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14af9963ba1e5e8400c9de9267bdcab895109f6a 04-Jun-2014 Vlad Yasevich <vyasevic@redhat.com> bonding: Support macvlans on top of tlb/rlb mode bonds

To make TLB mode work, the patch allows learning packets
to be sent using mac addresses assigned to macvlan devices,
also taking into an account vlans that may be between the
bond and macvlan device.

To make RLB work, all we have to do is accept ARP packets
for addresses added to the bond dev->uc list. Since RLB
mode will take care to update the peers directly with
correct mac addresses, learning packets for these addresses
do not have be send to switch.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
c565b488c6a86fdff24f6d716defc3d07a474b44 04-Jun-2014 Vlad Yasevich <vyasevic@redhat.com> bonding: Turn on IFF_UNICAST_FLT on bond devices

Bonding devices manage the unicast filters of the underlying
interfaces, but do not turn on IFF_UNICAST_FLT flag. Thus
anytime a unicast address is added to the bond, the bond is
places in promiscuous mode.

Turn on IFF_UNICAST_FLT on the bond device so that the bond does
not go into promiscuous mode needlesly. If an underlying device
does not support unicast filtering, that device will automaticall
enter promiscuous mode already.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
dc73c41f4e0bdc3e43b6247d2b35e927739f1d9f 21-May-2014 Veaceslav Falico <vfalico@gmail.com> bonding: populate essential new_slave->bond/dev early

The new bond_free_slave() needs new_slave->bond to verify if additional
structures were allocated, so populate it early so that, in case of failure
in bond_enslave(), we would be able to get it.

Also populate the new_slave->dev field, as it's too one of the most needed
things to assign early.

CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@gmail.com>
Acked-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
a9b3ace44c7d4eb021a78a4d2e6bb812c34f086f 20-May-2014 Michal Kubeček <mkubecek@suse.cz> bonding: fix vlan_features computing

bond_compute_features() uses netdev_increment_features() to
combine vlan_features of slaves into vlan_features of the bond.
As netdev_increment_features() only adds most features and we
start with BOND_VLAN_FEATURES, we can end up with features none
of the slaves provided.

If there is at least one slave, initialize vlan_features only
with the flags in NETIF_F_ALL_FOR_ALL. Right now there is none
in BOND_VLAN_FEATURES but stating it explicitely will make the
code more future proof.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
44a4085538c844e79d6ee6bcf46fabf7c57a9a38 16-May-2014 Vlad Yasevich <vyasevic@redhat.com> bonding: Fix stacked device detection in arp monitoring

Prior to commit fbd929f2dce460456807a51e18d623db3db9f077
bonding: support QinQ for bond arp interval

the arp monitoring code allowed for proper detection of devices
stacked on top of vlans. Since the above commit, the
code can still detect a device stacked on top of single
vlan, but not a device stacked on top of Q-in-Q configuration.
The search will only set the inner vlan tag if the route
device is the vlan device. However, this is not always the
case, as it is possible to extend the stacked configuration.

With this patch it is possible to provision devices on
top Q-in-Q vlan configuration that should be used as
a source of ARP monitoring information.

For example:
ip link add link bond0 vlan10 type vlan proto 802.1q id 10
ip link add link vlan10 vlan100 type vlan proto 802.1q id 100
ip link add link vlan100 type macvlan

Note: This patch limites the number of stacked VLANs to 2,
just like before. The original, however had another issue
in that if we had more then 2 levels of VLANs, we would end
up generating incorrectly tagged traffic. This is no longer
possible.

Fixes: fbd929f2dce460456807a51e18d623db3db9f077 (bonding: support QinQ for bond arp interval)
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@redhat.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: Ding Tianhong <dingtianhong@huawei.com>
CC: Patric McHardy <kaber@trash.net>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8557cd74ca8af9a71ae19d445e33d92bd50a6dc5 15-May-2014 Veaceslav Falico <vfalico@gmail.com> bonding: replace SLAVE_IS_OK() with bond_slave_can_tx()

They're verifying the same thing (except of IFF_UP, which is implied for
netif_running(), which is also a prerequisite).

CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
891ab54d6636b7aa0da48b5a5dd738af8b75cafe 15-May-2014 Veaceslav Falico <vfalico@gmail.com> bonding: rename {, bond_}slave_can_tx and clean it up

CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
b6adc610f183061bd607d965857870e618d229a6 15-May-2014 Veaceslav Falico <vfalico@gmail.com> bonding: convert IS_UP(slave->dev) to inline function

Also, remove the IFF_UP verification cause we can't be netif_running() with
being also IFF_UP.

CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2807a9feb2393648f4db114fdf3fa99860ff6a36 15-May-2014 Veaceslav Falico <vfalico@gmail.com> bonding: make IS_IP_TARGET_UNUSABLE_ADDRESS an inline function

Also, use standard IP primitives to check the address.

CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
01844098ecd9564cd5f903e3ff6c1ea96355772d 15-May-2014 Veaceslav Falico <vfalico@gmail.com> bonding: create a macro for bond mode and use it

CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ec0865a94991d1819d4f99866a2492af8df5c882 15-May-2014 Veaceslav Falico <vfalico@gmail.com> bonding: make USES_PRIMARY inline functions

Change the name a bit to better reflect its scope, and update some
comments. Two functions added - one which takes bond as a param and the
other which takes the mode.

CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
267bed777a5f8a8f5acd50a9134c7341fc46d822 15-May-2014 Veaceslav Falico <vfalico@gmail.com> bonding: make BOND_NO_USES_ARP an inline function

Also, change its name to better reflect its scope, and skip the "no"
part.

CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
d1e2e5cd4f8ed2ac7c43ce44feeb9ebc7d27cb4b 15-May-2014 Veaceslav Falico <vfalico@gmail.com> bonding: make TX_QUEUE_OVERRIDE() macro an inline function

Also, make it accept bonding as a parameter and change the name a bit to
better reflect its scope.

CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3fdddd859af235119bdfb09ccc886fe48b97fc72 12-May-2014 dingtianhong <dingtianhong@huawei.com> bonding: alloc the structure ad_info dynamically in per slave

The struct ad_slave_info is very huge, and only be used for 802.3ad mode,
so alloc the structure dynamically could save 356 Bits for every slave in
non 802.3ad mode.

Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Acked-by: Veaceslav Falico <vfalico@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
bedabf903d9b1a056ba58dac1c89760d5adca2a3 07-May-2014 dingtianhong <dingtianhong@huawei.com> bonding: simplify the slave_do_arp_validate_only()

The argument slave is not used for slave_do_arp_validate_only(), so no need
to keep it, make the function more simple.

Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
e9f0fb88493570200b8dc1cc02d3e676412d25bc 23-Apr-2014 Mahesh Bandewar <maheshb@google.com> bonding: Add tlb_dynamic_lb parameter for tlb mode

The aggresive load balancing causes packet re-ordering as active
flows are moved from a slave to another within the group. Sometime
this aggresive lb is not necessary if the preference is for less
re-ordering. This parameter if used with value "0" disables
this dynamic flow shuffling minimizing packet re-ordering. Of course
the side effect is that it has to live with the static load balancing
that the hashing distribution provides. This impact is less severe if
the correct xmit-hashing-policy is used for the tlb setup.

The default value of the parameter is set to "1" mimicing the earlier
behavior.

Ran the netperf test with 200 stream for 1 min between two hosts with
4x1G trunk (xmit-lb mode with xmit-policy L3+4) before and after these
changes. Following was the command used for those 200 instances -

netperf -t TCP_RR -l 60 -s 5 -H <host> -- -r81920,81920

Transactions per second:
Before change: 1,367.11
After change: 1,470.65

Change-Id: Ie3f75c77282cf602e83a6e833c6eb164e72a0990
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
f05b42eaa22cd7c6736d31316e6046c5127f8721 23-Apr-2014 Mahesh Bandewar <maheshb@google.com> bonding: Added bond_tlb_xmit() for tlb mode.

Re-organized the xmit function for the lb mode separating tlb xmit
from the alb mode. This will enable use of the hashing policies
like 802.3ad mode. Also extended use of xmit-hash-policy to tlb mode.

Now the tlb-mode defaults to BOND_XMIT_POLICY_LAYER2 if the xmit policy
module parameter is not set (just like 802.3ad, or Xor mode).

Change-Id: I140257403d272df75f477b380207338d0f04963e
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ee62e868139b96f73f3d01268ca1c39f7c6f4cd7 23-Apr-2014 Mahesh Bandewar <maheshb@google.com> bonding: Changed hashing function to just provide hash

Modified the hash function to return just hash separating from the
modulo operation that can be performed by the caller. This is to
make way for the tlb mode to use the same hashing policies that
are used in the 802.3ad and Xor mode.

Change-Id: I276609e87e0ca213c4d1b17b79c5e0b0f3d0dd6f
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
db29868653394937037d71dc3545768302dda643 09-Apr-2014 Thomas Richter <tmricht@linux.vnet.ibm.com> bonding: Remove debug_fs files when module init fails

Remove the bonding debug_fs entries when the
module initialization fails. The debug_fs
entries should be removed together with all other
already allocated resources.

Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: Jay Vosburgh <j.vosburgh@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7db8df02797e29cfa7d62a7e0b19f41e64b8433e 02-Apr-2014 zheng.li <zheng.x.li@oracle.com> bonding: Inactive slaves should keep inactive flag's value

bond_open is not setting the inactive flag correctly for some modes (alb and
tlb), resulting in error behavior if the bond has been administratively set
down and then back up. This effect should not occur when slaves are added while
the bond is up; it's something that only happens after a down/up bounce of the
bond.

For example, in bond tlb or alb mode, domu send some ARP request which go out
from dom0 bond's active slave, then the ARP broadcast request packets go back to
inactive slave from switch, because the inactive slave's inactive flag is zero,
kernel will receive the packets and pass them to bridge that cause dom0's bridge
map domu's MAC address to port of bond, bridge should map domu's MAC to port of
vif.

Signed-off-by: Zheng Li <zheng.x.li@oracle.com>
Signed-off-by: Jay Vosburgh <j.vosburgh@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
a8779ec1c5e60548b7b661a8d74a8cecf7775690 27-Mar-2014 Eric W. Biederman <ebiederm@xmission.com> netpoll: Remove gfp parameter from __netpoll_setup

The gfp parameter was added in:
commit 47be03a28cc6c80e3aa2b3e8ed6d960ff0c5c0af
Author: Amerigo Wang <amwang@redhat.com>
Date: Fri Aug 10 01:24:37 2012 +0000

netpoll: use GFP_ATOMIC in slave_enable_netpoll() and __netpoll_setup()

slave_enable_netpoll() and __netpoll_setup() may be called
with read_lock() held, so should use GFP_ATOMIC to allocate
memory. Eric suggested to pass gfp flags to __netpoll_setup().

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

The reason for the gfp parameter was removed in:
commit c4cdef9b7183159c23c7302aaf270d64c549f557
Author: dingtianhong <dingtianhong@huawei.com>
Date: Tue Jul 23 15:25:27 2013 +0800

bonding: don't call slave_xxx_netpoll under spinlocks

The slave_xxx_netpoll will call synchronize_rcu_bh(),
so the function may schedule and sleep, it should't be
called under spinlocks.

bond_netpoll_setup() and bond_netpoll_cleanup() are always
protected by rtnl lock, it is no need to take the read lock,
as the slave list couldn't be changed outside rtnl lock.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

Nothing else that calls __netpoll_setup or ndo_netpoll_setup
requires a gfp paramter, so remove the gfp parameter from both
of these functions making the code clearer.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4873ac3c8ed3b0285f18b81e501249c26284c2ca 25-Mar-2014 dingtianhong <dingtianhong@huawei.com> bonding: add net_ratelimt to avoid spam in arp interval

Remove the unnecessary log and add net_ratelimit to the others, in order to
avoid spam the log.

Cc: Joe Perches <joe@perches.com>
Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Veaceslav Falico <vfalico@redhat.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
fbd929f2dce460456807a51e18d623db3db9f077 25-Mar-2014 dingtianhong <dingtianhong@huawei.com> bonding: support QinQ for bond arp interval

The bond send arp request to indicate that the slave is active, and if the bond dev
is a vlan dev, it will set the vlan tag in skb to notice the vlan group, but the
bond could only send a skb with 802.1q proto, not support for QinQ.

So add outer tag for lower vlan tag and inner tag for upper vlan tag to support QinQ,
The new skb will be consist of two vlan tag just like this:

dst mac | src mac | outer vlan tag | inner vlan tag | data | .....

If We don't need QinQ, the inner vlan tag could be set to 0 and use outer vlan tag
as a normal vlan group.

Using "ip link" to configure the bond for QinQ and add test log:

ip link add link bond0 bond0.20 type vlan proto 802.1ad id 20
ip link add link bond0.20 bond0.20.200 type vlan proto 802.1q id 200

ifconfig bond0.20 11.11.20.36/24
ifconfig bond0.20.200 11.11.200.36/24

echo +11.11.200.37 > /sys/class/net/bond0/bonding/arp_ip_target

90:e2:ba:07:4a:5c (oui Unknown) > Broadcast, ethertype 802.1Q-QinQ (0x88a8),length 50: vlan 20, p 0,ethertype 802.1Q, vlan 200, p 0, ethertype ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 11.11.200.37 tell 11.11.200.36, length 28

90:e2:ba:06:f9:86 (oui Unknown) > 90:e2:ba:07:4a:5c (oui Unknown), ethertype 802.1Q-QinQ (0x88a8), length 50: vlan 20, p 0, ethertype 802.1Q, vlan 200, p 0, ethertype ARP, Ethernet (len 6), IPv4 (len 4), Reply 11.11.200.37 is-at 90:e2:ba:06:f9:86 (oui Unknown), length 28

v1->v2: remove the comment "TODO: QinQ?".

Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Veaceslav Falico <vfalico@redhat.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9152e26df20b9e5ffdbe0d5b96d0e9ff8b33ff31 25-Mar-2014 dingtianhong <dingtianhong@huawei.com> bonding: ratelimit pr_err() for bond xmit broadcast

It may spam if the system is out of the memory, add ratelimit for it.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
054bb8801038c93c42cb6cde75141aa396afd065 25-Mar-2014 dingtianhong <dingtianhong@huawei.com> bonding: slight optimization for bond xmit path

Add unlikely() micro to the unlikely conditions in the bond
xmit path for slight optimization.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2bb77ab42a6a40162a367b80394b96bb756ad5f1 11-Mar-2014 Eric W. Biederman <ebiederm@xmission.com> bonding: Call dev_kfree_skby_any instead of kfree_skb.

Replace kfree_skb with dev_kfree_skb_any in functions that can
be called in hard irq and other contexts.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
f3253339a47ff3690ce52e2acd95ec295f8521b3 05-Mar-2014 stephen hemminger <stephen@networkplumber.org> bonding: options handling cleanup

Make local functions static (ie. only used in bond_options.c)
Make bond options parsing tables constant.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
fca28094cd628e187520f412caa0fb687bfbc8d4 05-Mar-2014 stephen hemminger <stephen@networkplumber.org> bonding: remove dead code

These functions are defined but no longer used.
Compile tested only.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
285727600fa3714051cda1c21f20a8a3842f3dd8 28-Feb-2014 Veaceslav Falico <vfalico@redhat.com> bonding: send arp requests even if there's no route to them

Currently we're only sending arp requests if we have a route to the target
(and, thus, can find out the source ip address).

There are some use cases, however, where we don't want/need to set an ip
address (or set up a specific route) for bonding to use arp monitoring *for
traffic generation*. We can easily send arp probes (arp requests with src
ip == 0) to generate arp broadcast responses from the target ip and use
them for determining if the target is up.

This, obviously, won't work with arp validation - because we don't have the
ip address set and, thus, will filter out the responses. So in that case -
print a warning.

CC: François CACHEREUL <f.cachereul@alphalink.fr>
CC: Zhenjie Chen <zhchen@redhat.com>
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
09a89c219baf0f116387efc928e325cf23630f20 26-Feb-2014 Jiri Bohac <jbohac@suse.cz> bonding: disallow enslaving a bond to itself

Enslaving a bond to itself leads to an endless loop and hangs the kernel.

Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Tested-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ee6154e11eeccd4ae32c4881415dbd902a869592 26-Feb-2014 Nikolay Aleksandrov <nikolay@redhat.com> bonding: fix a div error caused by the slave release path

There's a bug in the slave release function which leads the transmit
functions which use the bond->slave_cnt to a div by 0 because we might
just have released our last slave and made slave_cnt == 0 but at the same
time we may have a transmitter after the check for an empty list which will
fetch it and use it in the slave id calculation.
Fix it by moving the slave_cnt after synchronize_rcu so if this was our
last slave any new transmitters will see an empty slave list which is
checked after rcu lock but before calling the mode transmit functions
which rely on bond->slave_cnt.

Fixes: 278b208375 ("bonding: initial RCU conversion")

CC: Veaceslav Falico <vfalico@redhat.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: David S. Miller <davem@davemloft.net>

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Acked-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
b0929915e0356acedf59504521c097ecada88b19 26-Feb-2014 dingtianhong <dingtianhong@huawei.com> bonding: Fix RTNL: assertion failed at net/core/rtnetlink.c for ab arp monitor

Veaceslav has reported and fix this problem by commit f2ebd477f141bc0
(bonding: restructure locking of bond_ab_arp_probe()). According Jay's
opinion, the current solution is not very well, because the notification
is to indicate that the interface has actually changed state in a meaningful
way, but these calls in the ab ARP monitor are internal settings of the flags
to allow the ARP monitor to search for a slave to become active when there are
no active slaves. The flag setting to active or backup is to permit the ARP
monitor's response logic to do the right thing when deciding if the test
slave (current_arp_slave) is up or not.

So the best way to fix the problem is that we should not send a notification
when the slave is in testing state, and check the state at the end of the
monitor, if the slave's state recover, avoid to send pointless notification
twice. And RTNL is really a big lock, hold it regardless the slave's state
changed or not when the current_active_slave is null will loss performance
(every 100ms), so we should hold it only when the slave's state changed and
need to notify.

I revert the old commit and add new modifications.

Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Veaceslav Falico <vfalico@redhat.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5e5b066535f0ee58e5de3a2db5fb56fa3cd7e3b1 26-Feb-2014 dingtianhong <dingtianhong@huawei.com> bonding: Fix RTNL: assertion failed at net/core/rtnetlink.c for 802.3ad mode

The problem was introduced by the commit 1d3ee88ae0d
(bonding: add netlink attributes to slave link dev).
The bond_set_active_slave() and bond_set_backup_slave()
will use rtmsg_ifinfo to send slave's states, so these
two functions should be called in RTNL.

In 802.3ad mode, acquiring RTNL for the __enable_port and
__disable_port cases is difficult, as those calls generally
already hold the state machine lock, and cannot unconditionally
call rtnl_lock because either they already hold RTNL (for calls
via bond_3ad_unbind_slave) or due to the potential for deadlock
with bond_3ad_adapter_speed_changed, bond_3ad_adapter_duplex_changed,
bond_3ad_link_change, or bond_3ad_update_lacp_rate. All four of
those are called with RTNL held, and acquire the state machine lock
second. The calling contexts for __enable_port and __disable_port
already hold the state machine lock, and may or may not need RTNL.

According to the Jay's opinion, I don't think it is a problem that
the slave don't send notify message synchronously when the status
changed, normally the state machine is running every 100 ms, send
the notify message at the end of the state machine if the slave's
state changed should be better.

I fix the problem through these steps:

1). add a new function bond_set_slave_state() which could change
the slave's state and call rtmsg_ifinfo() according to the input
parameters called notify.

2). Add a new slave parameter which called should_notify, if the slave's state
changed and don't notify yet, the parameter will be set to 1, and then if
the slave's state changed again, the param will be set to 0, it indicate that
the slave's state has been restored, no need to notify any one.

3). the __enable_port and __disable_port should not call rtmsg_ifinfo
in the state machine lock, any change in the state of slave could
set a flag in the slave, it will indicated that an rtmsg_ifinfo
should be called at the end of the state machine.

Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Veaceslav Falico <vfalico@redhat.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7a4ddcd92ea72fdc54ebf671f3d3fcc8ba3a1ea8 21-Feb-2014 dingtianhong <dingtianhong@huawei.com> bonding: remove no longer needed lock for bond_xxx_info_query()

The bond_xxx_info_query() was already in RTNL, so no need to use
bond lock to protect the bond slave list, so remove it.

Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Veaceslav Falico <vfalico@redhat.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
827418081ab177b82ff3032277fca32c3ecc5dc3 21-Feb-2014 dingtianhong <dingtianhong@huawei.com> bonding: netpoll: remove unwanted slave_dev_support_netpoll()

The __netpoll_setup() will check the slave's flag and ndo_poll_controller just
like the slave_dev_support_netpoll() does, and slave_dev_support_netpoll() was
not used by any place, so remove it.

Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Veaceslav Falico <vfalico@redhat.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
010d3c3989706d800ae72253773fa6537cc9f74c 20-Feb-2014 Veaceslav Falico <vfalico@redhat.com> bonding: fix bond_arp_rcv() race of curr_active_slave

bond->curr_active_slave can be changed between its deferences, even to
NULL, and thus we might panic.

We're always holding the rcu (rx_handler->bond_handle_frame()->bond_arp_rcv())
so fix this by rcu_dereferencing() it and using the saved.

Reported-by: Ding Tianhong <dingtianhong@huawei.com>
Fixes: aeea64a ("bonding: don't trust arp requests unless active slave really works")
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Acked-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2a7c183bc7ae798915c4bc58d3bf413fe466705b 18-Feb-2014 Joe Perches <joe@perches.com> bonding: More use of ether_addr_copy

It's smaller and faster for some architectures.

Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
49f17de72174ceb340a96996e14058e8f6ff951d 18-Feb-2014 Veaceslav Falico <vfalico@redhat.com> bonding: rename last_arp_rx to last_rx

To reflect the new meaning.

CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8e603460fa388c89f08e359366ec9398971cf49c 18-Feb-2014 Veaceslav Falico <vfalico@redhat.com> bonding: trivial: rename slave->jiffies to ->last_link_up

slave->jiffies is updated every time the slave becomes active, which, for
bonding, means that its link is 'up'.

CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
f8ff080dacecc5d0ce3579941b664bac3a4405c2 18-Feb-2014 Veaceslav Falico <vfalico@redhat.com> bonding: remove useless updating of slave->dev->last_rx

Now that all the logic is handled via last_arp_rx, we don't need to use
last_rx.

CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ff71529da40eee34458b1c3d62ca158a13e99b4d 18-Feb-2014 Veaceslav Falico <vfalico@redhat.com> bonding: use last_arp_rx in bond_loadbalance_arp_mon()

Now that last_arp_rx correctly show the last time we've received an ARP, we
can use it safely instead of slave->dev->last_rx.

CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
f2cb691a7735d7903398aa914b7e567536ea98e4 18-Feb-2014 Veaceslav Falico <vfalico@redhat.com> bonding: use the new options to correctly set last_arp_rx

Now that the options are in place - arp_validate can be set to receive all
the traffic or only arp packets to verify if the slave is up, when the
slave isn't validated.

CC: Rob Landley <rob@landley.net>
CC: "David S. Miller" <davem@davemloft.net>
CC: Nikolay Aleksandrov <nikolay@redhat.com>
CC: Ding Tianhong <dingtianhong@huawei.com>
CC: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3fe68df97c7f132495664358e0bfbfcd4ca7809c 18-Feb-2014 Veaceslav Falico <vfalico@redhat.com> bonding: always set recv_probe to bond_arp_rcv in arp monitor

Currently we only set bond_arp_rcv() if we're using arp_validate, however
this makes us skip updating last_arp_rx if we're not validating incoming
ARPs - thus, if arp_validate is off, last_arp_rx will never be updated.

Fix this by always setting up recv_probe = bond_arp_rcv, even if we're not
using arp_validate.

CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6db4a54593ce12423c68155b7b59b9fbd3e6519d 18-Feb-2014 Veaceslav Falico <vfalico@redhat.com> bonding: always update last_arp_rx on packet recieve

Currently we're updating the last_arp_rx only when we've validate the
packet, however afterwards we use it as 'ANY last packet received', but not
only validated ARPs.

Fix this by updating it in case of any packet received. It won't break the
arp_validation=0 because we, anyway, return the correct slave->dev->last_rx in
slave_last_rx().

CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
13ac34a8866e31b31db6237c73aa558aff84d765 18-Feb-2014 Veaceslav Falico <vfalico@redhat.com> bonding: permit using arp_validate with non-ab modes

Currently it's disabled because it's sometimes hard, in typical configs, to
make it work - because of the nature how the loadbalance modes work - as
it's hard to deliver valid arp replies to correct slaves by the switch.

However we still can use arp_validation in loadbalance with several other
configs, per example with arp_validate == 2 for backup with one broadcast
domain, without the switch(es) doing any balancing - this way we'd be (a
bit more) sure that the slave is up.

So, enable it to let users decide which one works/suits them best. Also
correct the mode limitation from BOND_OPT_ARP_VALIDATE.

CC: Nikolay Aleksandrov <nikolay@redhat.com>
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Acked-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3b7d636b50fb38e4e0f1113ba7bff8dcd19d140f 18-Feb-2014 Veaceslav Falico <vfalico@redhat.com> bonding: remove bond->lock from bond_arp_rcv

We're always called with rcu_read_lock() held (bond_arp_rcv() is only
called from bond_handle_frame(), which is rx_handler and always called
under rcu from __netif_receive_skb_core() ).

The slave active/passive and/or bonding params can change in-flight, however
we don't really care about that - we only modify the last time packet was
received, which is harmless.

CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Acked-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
99932d4fc03a13bb3e94938fe25458fabc8f2fc3 16-Feb-2014 Daniel Borkmann <dborkman@redhat.com> netdevice: add queue selection fallback handler for ndo_select_queue

Add a new argument for ndo_select_queue() callback that passes a
fallback handler. This gets invoked through netdev_pick_tx();
fallback handler is currently __netdev_pick_tx() as most drivers
invoke this function within their customized implementation in
case for skbs that don't need any special handling. This fallback
handler can then be replaced on other call-sites with different
queue selection methods (e.g. in packet sockets, pktgen etc).

This also has the nice side-effect that __netdev_pick_tx() is
then only invoked from netdev_pick_tx() and export of that
function to modules can be undone.

Suggested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ada0f8633c5b8dad640e1a2bcb95499ec187ac17 16-Feb-2014 Joe Perches <joe@perches.com> bonding: Convert memcpy(foo, bar, ETH_ALEN) to ether_addr_copy(foo, bar)

ether_addr_copy is smaller and faster for some architectures.

This relies on a stack frame being at least __aligned(2)
for one use of an Ethernet address on the stack.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
90194264ceffdff809e625f54767f6f8c292a28e 16-Feb-2014 Joe Perches <joe@perches.com> bonding: Neaten pr_<level>

Add missing terminating newlines.
Convert uses of pr_info to pr_cont in bond_check_params.
Standardize upper/lower case styles.
Typo fixes, remove unnecessary parentheses and periods.
Alignment neatening.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
91565ebbcc5aea69d4d6cb3832f52da03dbd44b6 16-Feb-2014 Joe Perches <joe@perches.com> bonding: Convert pr_warning to pr_warn, neatening

Use more current logging style.

Coalesce formats, realign arguments, drop unnecessary periods.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
805d157e8f4273eeafeeab53c3a5d295ac0d9208 12-Feb-2014 dingtianhong <dingtianhong@huawei.com> bonding: remove the redundant judgements for bond_set_mac_address()

The dev_set_mac_address() will check the dev->netdev_ops->ndo_set_mac_address,
so no need to check it in bond_set_mac_address().

Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Veaceslav Falico <vfalico@redhat.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
f80889a5b79cae0b84465a90c21b1273a03b7973 11-Feb-2014 dingtianhong <dingtianhong@huawei.com> bonding: Fix deadlock in bonding driver when using netpoll

The bonding driver take write locks and spin locks that are shared
by the tx path in enslave processing and notification processing,
If the netconsole is in use, the bonding can call printk which puts
us in the netpoll tx path, if the netconsole is attached to the bonding
driver, result in deadlock.

So add protection for these place, by checking the netpoll_block_tx
state, we can defer the sending of the netconsole frames until a later
time using the retransmit feature of netpoll_send_skb that is triggered
on the return code NETDEV_TX_BUSY.

Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Veaceslav Falico <vfalico@redhat.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6b8790b5006b5ca3ee1c039c3c909833d7958716 10-Feb-2014 dingtianhong <dingtianhong@huawei.com> bonding: remove unwanted bond lock for enslave processing

The bond enslave processing don't hold bond->lock anymore,
so release an unlocked rw lock will cause warning message,
remove the unwanted read_unlock(&bond->lock).

Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Veaceslav Falico <vfalico@redhat.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Acked-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
cc689aaa7abf33b2ccb02482e5e17885ea8903d1 25-Jan-2014 dingtianhong <dingtianhong@huawei.com> bonding: fail_over_mac should only affect AB mode in bond_set_mac_address()

The fail_over_mac could be set to active or follow in any time for all modes,
so if the fail_over_mac is not none and the current mode is not active-backup,
the bond_set_mac_address() could not change the master and slave's MAC address.

In bond_set_mac_address(), the fail_over_mac should only affect AB mode, so modify
to check the mode in addition to fail_over_mac when setting bond's MAC address.

Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Veaceslav Falico <vfalico@redhat.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
00503b6f702eaf23e7257d6287da72805d7d014c 25-Jan-2014 dingtianhong <dingtianhong@huawei.com> bonding: fail_over_mac should only affect AB mode at enslave and removal processing

According to bonding.txt, the fail_over_ma should only affect active-backup mode,
but I found that the fail_over_mac could be set to active or follow in all
modes, this will cause new slave could not be set to bond's MAC address at
enslave processing and restore its own MAC address at removal processing.

The correct way to fix the problem is that we should not add restrictions when
setting options, just need to modify the bond enslave and removal processing
to check the mode in addition to fail_over_mac when setting a slave's MAC during
enslavement. The change active slave processing already only calls the fail_over_mac
function when in active-backup mode.

Thanks for Jay's suggestion.

The patch also modify the pr_warning() to pr_warn().

Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Veaceslav Falico <vfalico@redhat.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6fde8f037e604e05df1529e4689041715d6d55d2 28-Jan-2014 Ding Tianhong <dingtianhong@huawei.com> bonding: fix locking in bond_loadbalance_arp_mon()

The commit 1d3ee88ae0d605629bf369
(bonding: add netlink attributes to slave link dev)
has add rtmsg_ifinfo() in bond_set_active_slave() and
bond_set_backup_slave(), so the two function need to
called in RTNL lock, but bond_loadbalance_arp_mon()
only calling these functions in RCU, warning message
will occurs.

fix this by add a new function bond_slave_state_change(),
which will reset the slave's state after slave link check,
so remove the bond_set_xxx_slave() from the cycle and only
record the slave_state_changed, this will call the new
function to set all slaves to new state in RTNL later.

Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Veaceslav Falico <vfalico@redhat.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
f2ebd477f141bc09b10fb8deb612a4d9b8999bba 27-Jan-2014 Veaceslav Falico <vfalico@redhat.com> bonding: restructure locking of bond_ab_arp_probe()

Currently we're calling it from under RCU context, however we're using some
functions that require rtnl to be held.

Fix this by restructuring the locking - don't call it under any locks,
aquire rcu_read_lock() if we're sending _only_ (i.e. we have the active
slave present), and use rtnl locking otherwise - if we need to modify
(in)active flags of a slave.

CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
98b90f26651f9d84cfbb0221c9a3d9863c5bea69 27-Jan-2014 Veaceslav Falico <vfalico@redhat.com> bonding: RCUify bond_ab_arp_probe

Currently bond_ab_arp_probe() is always called under rcu_read_lock(),
however to work with curr_active_slave we're still holding the
curr_slave_lock.

To remove that curr_slave_lock - rcu_dereference the bond's
curr_active_slave and use it further - so that we're sure the slave won't
go away, and we don't care if it will change in the meanwhile.

CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
f9399814927ad9bb995a6e109c2a5f9d8a848209 22-Jan-2014 Weilong Chen <chenweilong@huawei.com> bonding: Don't allow bond devices to change network namespaces.

Like bridge, bonding as netdevice doesn't cross netns boundaries.

Bonding ports and bonding itself live in same netns.

Signed-off-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3bad540ed8285fb53f6365420bba0320d8cd2066 22-Jan-2014 Jiri Pirko <jiri@resnulli.us> bonding: convert netlink to use slave data info api

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
d1fbd3ed9366904b58b1c0c30b22d51dc793de99 22-Jan-2014 Nikolay Aleksandrov <nikolay@redhat.com> bonding: convert active_slave to use the new option API

This patch adds the necessary changes so active_slave would use
the new bonding option API. Also some trivial/style fixes.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
388d3a6d4aa356b885bcd023c185060df9ea2484 22-Jan-2014 Nikolay Aleksandrov <nikolay@redhat.com> bonding: convert primary_reselect to use the new option API

This patch adds the necessary changes so primary_reselect would use
the new bonding option API.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
b98d9c66e1c3823c50a3cd5e8e59f12b97d7ba5d 22-Jan-2014 Nikolay Aleksandrov <nikolay@redhat.com> bonding: convert miimon to use the new option API

This patch adds the necessary changes so miimon would use
the new bonding option API. The "default" definition has been removed as
it was 0.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9e5f5eebe765b340af0318dba261e5de0f2aaf32 22-Jan-2014 Nikolay Aleksandrov <nikolay@redhat.com> bonding: convert ad_select to use the new option API

This patch adds the necessary changes so ad_select would use
the new bonding option API.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
d3131de76b1b1a4d95f145846bd61f96e72f0411 22-Jan-2014 Nikolay Aleksandrov <nikolay@redhat.com> bonding: convert lacp_rate to use the new option API

This patch adds the necessary changes so lacp_rate would use
the new bonding option API. Also some trivial/style error fixes.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7bdb04ed0dbf9f0e94110be43db4f8bb7df58de2 22-Jan-2014 Nikolay Aleksandrov <nikolay@redhat.com> bonding: convert arp_interval to use the new option API

This patch adds the necessary changes so arp_interval would use
the new bonding option API. The "default" definition has been removed as
it was 0.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1df6b6aa334c99b39f9366f4199b7f5e479a8899 22-Jan-2014 Nikolay Aleksandrov <nikolay@redhat.com> bonding: convert fail_over_mac to use the new option API

This patch adds the necessary changes so fail_over_mac would use
the new bonding option API. Also fixes a trivial copy/paste error in
bond_check_params where the wrong variable was used for the error msg.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
edf36b24c58dbbd5f2e708096537bf0a88ffa477 22-Jan-2014 Nikolay Aleksandrov <nikolay@redhat.com> bonding: convert arp_all_targets to use the new option API

This patch adds the necessary changes so arp_all_targets would use the
new bonding option API.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
162288810c9ebd2efb79ee6dc364e266044cac9e 22-Jan-2014 Nikolay Aleksandrov <nikolay@redhat.com> bonding: convert arp_validate to use the new option API

This patch adds the necessary changes so arp_validate would use the
new bonding option API. Also fix some trivial/style errors.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
a4b32ce7f891d507aa663bc78118ef267f0d6d4c 22-Jan-2014 Nikolay Aleksandrov <nikolay@redhat.com> bonding: convert xmit_hash_policy to use the new option API

This patch adds the necessary changes so xmit_hash_policy would use the
new bonding option API. Also fix some trivial/style errors.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
aa59d8517d1017e571b803ba6302c4b693b324ab 22-Jan-2014 Nikolay Aleksandrov <nikolay@redhat.com> bonding: convert packets_per_slave to use the new option API

This patch adds the necessary changes so packets_per_slave would use the
new bonding option API.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2b3798d5e1377ce6c67993bb271754c9c5ab4833 22-Jan-2014 Nikolay Aleksandrov <nikolay@redhat.com> bonding: convert mode setting to use the new option API

This patch makes the bond's mode setting use the new option API and
adds support for dependency printing which relies on having an entry for
the mode option in the bond_opts[] array.
Also add the ability to print the mode name when mode dependency fails
and fix some trivial/style errors.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
809fa972fd90ff27225294b17a027e908b2d7b7a 22-Jan-2014 Hannes Frederic Sowa <hannes@stressinduktion.org> reciprocal_divide: update/correction of the algorithm

Jakub Zawadzki noticed that some divisions by reciprocal_divide()
were not correct [1][2], which he could also show with BPF code
after divisions are transformed into reciprocal_value() for runtime
invariance which can be passed to reciprocal_divide() later on;
reverse in BPF dump ended up with a different, off-by-one K in
some situations.

This has been fixed by Eric Dumazet in commit aee636c4809fa5
("bpf: do not use reciprocal divide"). This follow-up patch
improves reciprocal_value() and reciprocal_divide() to work in
all cases by using Granlund and Montgomery method, so that also
future use is safe and without any non-obvious side-effects.
Known problems with the old implementation were that division by 1
always returned 0 and some off-by-ones when the dividend and divisor
where very large. This seemed to not be problematic with its
current users, as far as we can tell. Eric Dumazet checked for
the slab usage, we cannot surely say so in the case of flex_array.
Still, in order to fix that, we propose an extension from the
original implementation from commit 6a2d7a955d8d resp. [3][4],
by using the algorithm proposed in "Division by Invariant Integers
Using Multiplication" [5], Torbjörn Granlund and Peter L.
Montgomery, that is, pseudocode for q = n/d where q, n, d is in
u32 universe:

1) Initialization:

int l = ceil(log_2 d)
uword m' = floor((1<<32)*((1<<l)-d)/d)+1
int sh_1 = min(l,1)
int sh_2 = max(l-1,0)

2) For q = n/d, all uword:

uword t = (n*m')>>32
q = (t+((n-t)>>sh_1))>>sh_2

The assembler implementation from Agner Fog [6] also helped a lot
while implementing. We have tested the implementation on x86_64,
ppc64, i686, s390x; on x86_64/haswell we're still half the latency
compared to normal divide.

Joint work with Daniel Borkmann.

[1] http://www.wireshark.org/~darkjames/reciprocal-buggy.c
[2] http://www.wireshark.org/~darkjames/set-and-dump-filter-k-bug.c
[3] https://gmplib.org/~tege/division-paper.pdf
[4] http://homepage.cs.uiowa.edu/~jones/bcd/divide.html
[5] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.1.2556
[6] http://www.agner.org/optimize/asmlib.zip

Reported-by: Jakub Zawadzki <darkjames-ws@darkjames.pl>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Austin S Hemmelgarn <ahferroin7@gmail.com>
Cc: linux-kernel@vger.kernel.org
Cc: Jesse Gross <jesse@nicira.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Andy Gospodarek <andy@greyhouse.net>
Cc: Veaceslav Falico <vfalico@redhat.com>
Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Jakub Zawadzki <darkjames-ws@darkjames.pl>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
1d3ee88ae0d605629bf369ab0b868dae8ca62a48 17-Jan-2014 sfeldma@cumulusnetworks.com <sfeldma@cumulusnetworks.com> bonding: add netlink attributes to slave link dev

If link is IFF_SLAVE, extend link dev netlink attributes to include
slave attributes with new IFLA_SLAVE nest. Add netlink notification
(RTM_NEWLINK) when slave status changes from backup to active, or
visa-versa.

Adds new ndo_get_slave op to net_device_ops to fill skb with IFLA_SLAVE
attributes. Currently only used by bonding driver, but could be
used by other aggregating devices with slaves.

Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
07699f9a7c8d1002e07011d5aa382cd63241eea8 17-Jan-2014 sfeldma@cumulusnetworks.com <sfeldma@cumulusnetworks.com> bonding: add sysfs /slave dir for bond slave devices.

Add sub-directory under /sys/class/net/<interface>/slave with
read-only attributes for slave. Directory only appears when
<interface> is a slave.

$ tree /sys/class/net/eth2/slave/
/sys/class/net/eth2/slave/
├── ad_aggregator_id
├── link_failure_count
├── mii_status
├── perm_hwaddr
├── queue_id
└── state

$ cat /sys/class/net/eth2/slave/*
2
0
up
40:02:10:ef:06:01
0
active

Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3ec775b9fb950db19175ab7984a0d65fda1142b4 16-Jan-2014 Veaceslav Falico <vfalico@redhat.com> bonding: handle slave's name change with primary_slave logic

Currently, if a slave's name change, we just pass it by. However, if the
slave is a current primary_slave, then we end up with using a slave, whose
name != params.primary, for primary_slave. And vice-versa, if we don't have
a primary_slave but have params.primary set - we will not detected a new
primary_slave.

Fix this by catching the NETDEV_CHANGENAME event and setting primary_slave
accordingly. Also, if the primary_slave was changed, issue a reselection of
the active slave, cause the priorities have changed.

Reported-by: Ding Tianhong <dingtianhong@huawei.com>
CC: Ding Tianhong <dingtianhong@huawei.com>
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Acked-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
0917b9334b1cc130e91a53b2e3bbaae760fc50ee 15-Jan-2014 Ying Xue <ying.xue@windriver.com> bonding: use __dev_get_by_name instead of dev_get_by_name to find interface

The following call chain indicates that bond_do_ioctl() is protected
under rtnl_lock. If we use __dev_get_by_name() instead of
dev_get_by_name() to find interface handler in it, this would
help us avoid to change reference counter of interface once.

dev_ioctl()
rtnl_lock()
dev_ifsioc()
bond_do_ioctl()
rtnl_unlock()

Additionally we also change the coding style in bond_do_ioctl(),
letting it more readable for us.

Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
f663dd9aaf9ed124f25f0f8452edf238f087ad50 10-Jan-2014 Jason Wang <jasowang@redhat.com> net: core: explicitly select a txq before doing l2 forwarding

Currently, the tx queue were selected implicitly in ndo_dfwd_start_xmit(). The
will cause several issues:

- NETIF_F_LLTX were removed for macvlan, so txq lock were done for macvlan
instead of lower device which misses the necessary txq synchronization for
lower device such as txq stopping or frozen required by dev watchdog or
control path.
- dev_hard_start_xmit() was called with NULL txq which bypasses the net device
watchdog.
- dev_hard_start_xmit() does not check txq everywhere which will lead a crash
when tso is disabled for lower device.

Fix this by explicitly introducing a new param for .ndo_select_queue() for just
selecting queues in the case of l2 forwarding offload. netdev_pick_tx() was also
extended to accept this parameter and dev_queue_xmit_accel() was used to do l2
forwarding transmission.

With this fixes, NETIF_F_LLTX could be preserved for macvlan and there's no need
to check txq against NULL in dev_hard_start_xmit(). Also there's no need to keep
a dedicated ndo_dfwd_start_xmit() and we can just reuse the code of
dev_queue_xmit() to do the transmission.

In the future, it was also required for macvtap l2 forwarding support since it
provides a necessary synchronization method.

Cc: John Fastabend <john.r.fastabend@intel.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: e1000-devel@lists.sourceforge.net
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ec029fac3e96980fa8f6f81b8327787a9600dfaa 03-Jan-2014 sfeldma@cumulusnetworks.com <sfeldma@cumulusnetworks.com> bonding: add ad_select attribute netlink support

Add IFLA_BOND_AD_SELECT to allow get/set of bonding parameter
ad_select via netlink.

Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6da67d260873e157a8df28bc8b1b10d8e0cab099 30-Dec-2013 stephen hemminger <stephen@networkplumber.org> bonding: make more functions static

More functions in bonding that can be declared static because
they are only used in one file.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
844223abe9f88b0e0cb7fb1f1f8ab04fe29806bc 02-Jan-2014 dingtianhong <dingtianhong@huawei.com> bonding: use ether_addr_equal_64bits to instead of ether_addr_equal

The net_device.dev_addr have more than 2 bytes of additional data after
the mac addr, so it is safe to use the ether_addr_equal_64bits().

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
d316dedd4dae67e58976b695d3d0405a9db45246 02-Jan-2014 dingtianhong <dingtianhong@huawei.com> bonding: remove unwanted return value for bond_dev_queue_xmit()

The return value for bond_dev_queue_xmit() will not be used anymore,
so remove the return value.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3900f29021f0bc7fe9815aa32f1a993b7dfdd402 02-Jan-2014 dingtianhong <dingtianhong@huawei.com> bonding: slight optimizztion for bond_slave_override()

When the skb is xmit by the function bond_slave_override(),
it will have duplicate judgement for slave state, and I think it
will consumes a little performance, maybe it is negligible,
so I simplify the function and remove the unwanted judgement.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
834db4bcdf937b9f06122a633672ee5acd77085b 21-Dec-2013 dingtianhong <dingtianhong@huawei.com> bonding: ust micro BOND_NO_USE_ARP to simplify the mode check

The bond 3ad and TLB/ALB has the same check path, so combine them.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3a7129e52766f015f0d4035ac9c7c9408829b9a1 21-Dec-2013 dingtianhong <dingtianhong@huawei.com> bonding: add option lp_interval for loading module

The bond driver could set the lp_interval when loading module.

Suggested-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
da131ddbffae0d225f36e0651b8cf7014a576c0e 29-Dec-2013 stephen hemminger <stephen@networkplumber.org> bonding: make local function static

bond_xmit_slave_id is only used in main.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
f23691095b3c2c05b3ee4ab03cb8e90cfe245ea8 13-Dec-2013 dingtianhong <dingtianhong@huawei.com> bonding: rebuild the bond_resend_igmp_join_requests_delayed()

The bond_resend_igmp_join_requests_delayed() and
bond_resend_igmp_join_requests() should be integrated,
because the bond_resend_igmp_join_requests_delayed() did
nothing except bond_resend_igmp_join_requests().

The bond igmp_retrans could only be changed in bond_change_active_slave
and here, bond_change_active_slave will be called in RTNL and curr_slave_lock,
the bond_resend_igmp_join_requests already hold RTNL, so no need
to free RTNL and hold curr_slave_lock again, it may be a small optimization,
so move the igmp_retrans in RTNL and remove the curr_slave_lock.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
be79bd048abe9fb6aee049ea903d1e70b44d6480 13-Dec-2013 dingtianhong <dingtianhong@huawei.com> bonding: add RCU for bond_3ad_state_machine_handler()

The bond_3ad_state_machine_handler() use the bond lock to protect
the bond slave list and slave port together, but it is not enough,
the bond slave list was link and unlink in RTNL, not bond lock,
so I add RCU to protect the slave list from leaving.

The bond lock is still used here, because when the slave has been
removed from the list by the time the state machine runs, it appears
to be possible for both function to manupulate the same aggregator->lag_ports
by finding the aggregator via two different ports that are both members of
that aggregator (i.e., port A of the agg is being unbound, and port B
of the agg is runing its state machine).

If I remove the bond lock, there are nothing to mutex changes
to aggregator->lag_ports between bond_3ad_state_machine_handler and
bond_3ad_unbind_slave, So the bond lock is the simplest way to protect
aggregator->lag_ports.

There was a lot of function need RCU protect, I have two choice
to make the function in RCU-safe, (1) create new similar functions
and make the bond slave list in RCU. (2) modify the existed functions
and make them in read-side critical section, because the RCU
read-side critical sections may be nested.

I choose (2) because it is no need to create more similar functions.

The nots in the function is still too old, clean up the nots.

Suggested-by: Nikolay Aleksandrov <nikolay@redhat.com>
Suggested-by: Jay Vosburgh <fubar@us.ibm.com>
Suggested-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
c8517035445834350b8d498723b68f4e81286110 13-Dec-2013 dingtianhong <dingtianhong@huawei.com> bonding: remove unwanted lock for bond enslave and release

The bond_change_active_slave() and bond_select_active_slave()
do't need bond lock anymore, so remove the unwanted bond lock
for these two functions.

The bond_select_active_slave() will release and acquire
curr_slave_lock, so the curr_slave_lock need to protect
the function.

In bond enslave and bond release, the bond slave list is also
protected by RTNL, so bond lock is no need to exist, remove
the lock and clean the functions.

Suggested-by: Jay Vosburgh <fubar@us.ibm.com>
Suggested-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
eb9fa4b0199f62df3d174d32b4bd534df8ba4533 13-Dec-2013 dingtianhong <dingtianhong@huawei.com> bonding: rebuild the lock use for bond_activebackup_arp_mon()

The bond_activebackup_arp_mon() use the bond lock for read to
protect the slave list, it is no effect, and the RTNL is only
called for bond_ab_arp_commit() and peer notify, for the performance
better, use RCU to replace with the bond lock, to the bond slave
list need to called in RCU, add a new bond_first_slave_rcu()
to get the first slave in RCU protection.

In bond_ab_arp_probe(), the bond->current_arp_slave may changd
if bond release slave, just like:

bond_ab_arp_probe() bond_release()
cpu 0 cpu 1
...
if (bond->current_arp_slave...) ...
... bond->current_arp_slave = NULl
bond->current_arp_slave->dev->name ...

So the current_arp_slave need to dereference in the section.

Suggested-by: Nikolay Aleksandrov <nikolay@redhat.com>
Suggested-by: Jay Vosburgh <fubar@us.ibm.com>
Suggested-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2e52f4fe3655c7a2311070c6713f7feabc75486c 13-Dec-2013 dingtianhong <dingtianhong@huawei.com> bonding: rebuild the lock use for bond_loadbalance_arp_mon()

The bond_loadbalance_arp_mon() use the bond lock to protect the
bond slave list, it is no effect, so I could use RTNL or RCU to
replace it, considering the performance impact, the RCU is more
better here, so the bond lock replace with the RCU.

The bond_select_active_slave() need RTNL and curr_slave_lock
together, but there is no RTNL lock here, so add a rtnl_rtylock.

Suggested-by: Jay Vosburgh <fubar@us.ibm.com>
Suggested-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4cb4f97b7e361745281e843499ba58691112d2f8 13-Dec-2013 dingtianhong <dingtianhong@huawei.com> bonding: rebuild the lock use for bond_mii_monitor()

The bond_mii_monitor() still use bond lock to protect bond slave list,
it is no effect, I have 2 way to fix the problem, move the RTNL to the
top of the function, or add RCU to protect the bond slave list,
according to the Jay Vosburgh's opinion, 10 times one second is a
truely big performance loss if use RTNL to protect the whole monitor,
so I would take the advice and use RCU to protect the bond slave list.

The bond_has_slave() will not protect by anything, there will no things
happen if the slave list is be changed, unless the bond was free, but
it will not happened before the monitor, the bond will closed before
be freed.

The peers notify for the bond will calling curr_active_slave, so
derefence the slave to make sure we will accessing the same slave
if the curr_active_slave changed, as the rcu dereference need in
read-side critical sector and bond_change_active_slave() will call
it with no RCU hold, so add peer notify in rcu_read_lock which
will be nested in monitor.

Suggested-by: Jay Vosburgh <fubar@us.ibm.com>
Suggested-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
b2e7aceb00b2621431e220748f7158b131b89e7b 13-Dec-2013 dingtianhong <dingtianhong@huawei.com> bonding: remove the no effect lock for bond_select_active_slave()

The bond slave list was no longer protected by bond lock and only
protected by RTNL or RCU, so anywhere that use bond lock to protect
slave list is meaningless.

remove the release and acquire bond lock for bond_select_active_slave().

The curr_active_slave could only be changed in 3 place:

1. enslave slave.
2. release slave.
3. change_active_slave.

all above place were holding bond lock, RTNL and curr_slave_lock
together, it is tedious and meaningless, obviously bond lock is no
need here, but RTNL or curr_slave_lock is needed, so if you want
to access active slave, you have to choose one lock, RTNL or
curr_slave_lock, if RTNL is exist, no need to add curr_slave_lock,
otherwise curr_slave_lock is better, because of the performance.

there are several place calling bond_select_active_slave() and
bond_change_active_slave(), the next step I will clean these place
and remove the no effect lock.

there are some document changed together when update the function.

Suggested-by: Jay Vosburgh <fubar@us.ibm.com>
Suggested-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
36708b89e00951b004b4dd6c14181301e2e98354 10-Dec-2013 Paul E. McKenney <paulmck@linux.vnet.ibm.com> bonding: Use RCU_INIT_POINTER() for better overhead and for sparse

Although rcu_assign_pointer() can be used to assign a constant
NULL pointer, doing so gets you an unnecessary memory barrier and
in some circumstances, sparse warnings. This commit therefore
changes the rcu_assign_pointer() of NULL in __bond_release_one() to
RCU_INIT_POINTER().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
89015c18ff34a39e4679ddd9b058e74d7dde28e7 04-Dec-2013 dingtianhong <dingtianhong@huawei.com> bonding: add arp_ip_target checks when install the module

When I install the bonding with the wrong arp_ip_target,
just like arp_ip_target=500.500.500.500, the arp_ip_target
was transfored to 245.245.245.244 and stored in the ip
target success, it is uncorrect, so I add checks to avoid
adding wrong address.

The in4_pton() will set wrong ip address to 0.0.0.0 and
return 0, also use the micro IS_IP_TARGET_UNUSABLE_ADDRESS
to simplify the code.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
fe9d04afe9bee0ec37a9724937443b2c0e39ce4b 22-Nov-2013 dingtianhong <dingtianhong@huawei.com> bonding: disable arp and enable mii monitoring when bond change to no uses arp mode

Because the ARP monitoring is not support for 802.3ad, but I still
could change the mode to 802.3ad from ab mode while ARP monitoring
is running, it is incorrect.

So add a check for 802.3ad in bonding_store_mode to fix the problem,
and make a new macro BOND_NO_USES_ARP() to simplify the code.

v2: according to the Dan Williams's suggestion, bond mode is the most
important bond option, it should override any of the other sub-options.
So when the mode is changed, the conficting values should be cleared
or reset, otherwise the user has to duplicate more operations to modify
the logic. I disable the arp and enable mii monitoring when the bond mode
is changed to AB, TB and 8023AD if the arp interval is true.

v3: according to the Nik's suggestion, the default value of miimon should need
a name, there is several place to use it, and the bond_store_arp_interval()
could use micro BOND_NO_USES_ARP to make the code more simpify.

Suggested-by: Dan Williams <dcbw@redhat.com>
Suggested-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
73958329ea1fe0dc149b51e5d8703015f65a03e0 05-Nov-2013 Nikolay Aleksandrov <nikolay@redhat.com> bonding: extend round-robin mode with packets_per_slave

This patch aims to extend round-robin mode with a new option called
packets_per_slave which can have the following values and effects:
0 - choose a random slave
1 (default) - standard round-robin, 1 packet per slave
>1 - round-robin when >1 packets have been transmitted per slave
The allowed values are between 0 and 65535.
This patch also fixes the comment style in bond_xmit_roundrobin().

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Acked-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1f2cd845d3827412e82bf26dde0abca332ede402 28-Oct-2013 David S. Miller <davem@davemloft.net> Revert "Merge branch 'bonding_monitor_locking'"

This reverts commit 4d961a101e032b4bf223b279b4b35bc77576f5a8, reversing
changes made to a00f6fcc7d0c62a91768d9c4ccba4c7d64fbbce3.

Revert bond locking changes, they cause regressions and Veaceslav Falico
doesn't like how the commit messages were done at all.

Signed-off-by: David S. Miller <davem@davemloft.net>
80b9d236ec56ecc18da4a43bd79e8ec9ac5036ff 24-Oct-2013 dingtianhong <dingtianhong@huawei.com> bonding: remove bond read lock for bond_activebackup_arp_mon()

The bond slave list may change when the monitor is running, the slave list is no longer
protected by bond->lock, only protected by rtnl lock(), so we have 3 ways to modify it:
1.add bond_master_upper_dev_link() and bond_upper_dev_unlink() in bond->lock, but it is unsafe
to call call_netdevice_notifiers() in write lock.
2.remove unused bond->lock for monitor function, only use the existing rtnl lock().
3.use rcu_read_lock() to protect it, of course, it will transform bond_for_each_slave to
bond_for_each_slave_rcu() and performance is better, but in slow path, it is ignored.
so I remove the bond->lock and move the rtnl lock to protect the whole monitor function.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7f1bb571b753ac75b6548f0b7c932dfc0bb1f970 24-Oct-2013 dingtianhong <dingtianhong@huawei.com> bonding: remove bond read lock for bond_loadbalance_arp_mon()

The bond slave list may change when the monitor is running, the slave list is no longer
protected by bond->lock, only protected by rtnl lock(), so we have 3 ways to modify it:
1.add bond_master_upper_dev_link() and bond_upper_dev_unlink() in bond->lock, but it is unsafe
to call call_netdevice_notifiers() in write lock.
2.remove unused bond->lock for monitor function, only use the existing rtnl lock().
3.use rcu_read_lock() to protect it, of course, it will transform bond_for_each_slave to
bond_for_each_slave_rcu() and performance is better, but in slow path, it is ignored.
so I remove the bond->lock and add the rtnl lock to protect the whole monitor function.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6b6c526147bb00b5788a2f48463481dd30c29b71 24-Oct-2013 dingtianhong <dingtianhong@huawei.com> bonding: remove bond read lock for bond_mii_monitor()

The bond slave list may change when the monitor is running, the slave list is no longer
protected by bond->lock, only protected by rtnl lock(), so we have 3 ways to modify it:
1.add bond_master_upper_dev_link() and bond_upper_dev_unlink() in bond->lock, but it is unsafe
to call call_netdevice_notifiers() in write lock.
2.remove unused bond->lock for monitor function, only use the existing rtnl lock().
3.use rcu_read_lock() to protect it, of course, it will transform bond_for_each_slave to
bond_for_each_slave_rcu() and performance is better, but in slow path, it is ignored.
so I remove the bond->lock and move the rtnl lock to protect the whole monitor function.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7f29405403d7c17f539c099987972b862e7e5255 24-Oct-2013 Alexei Starovoitov <ast@plumgrid.com> net: fix rtnl notification in atomic context

commit 991fb3f74c "dev: always advertise rx_flags changes via netlink"
introduced rtnl notification from __dev_set_promiscuity(),
which can be called in atomic context.

Steps to reproduce:
ip tuntap add dev tap1 mode tap
ifconfig tap1 up
tcpdump -nei tap1 &
ip tuntap del dev tap1 mode tap

[ 271.627994] device tap1 left promiscuous mode
[ 271.639897] BUG: sleeping function called from invalid context at mm/slub.c:940
[ 271.664491] in_atomic(): 1, irqs_disabled(): 0, pid: 3394, name: ip
[ 271.677525] INFO: lockdep is turned off.
[ 271.690503] CPU: 0 PID: 3394 Comm: ip Tainted: G W 3.12.0-rc3+ #73
[ 271.703996] Hardware name: System manufacturer System Product Name/P8Z77 WS, BIOS 3007 07/26/2012
[ 271.731254] ffffffff81a58506 ffff8807f0d57a58 ffffffff817544e5 ffff88082fa0f428
[ 271.760261] ffff8808071f5f40 ffff8807f0d57a88 ffffffff8108bad1 ffffffff81110ff8
[ 271.790683] 0000000000000010 00000000000000d0 00000000000000d0 ffff8807f0d57af8
[ 271.822332] Call Trace:
[ 271.838234] [<ffffffff817544e5>] dump_stack+0x55/0x76
[ 271.854446] [<ffffffff8108bad1>] __might_sleep+0x181/0x240
[ 271.870836] [<ffffffff81110ff8>] ? rcu_irq_exit+0x68/0xb0
[ 271.887076] [<ffffffff811a80be>] kmem_cache_alloc_node+0x4e/0x2a0
[ 271.903368] [<ffffffff810b4ddc>] ? vprintk_emit+0x1dc/0x5a0
[ 271.919716] [<ffffffff81614d67>] ? __alloc_skb+0x57/0x2a0
[ 271.936088] [<ffffffff810b4de0>] ? vprintk_emit+0x1e0/0x5a0
[ 271.952504] [<ffffffff81614d67>] __alloc_skb+0x57/0x2a0
[ 271.968902] [<ffffffff8163a0b2>] rtmsg_ifinfo+0x52/0x100
[ 271.985302] [<ffffffff8162ac6d>] __dev_notify_flags+0xad/0xc0
[ 272.001642] [<ffffffff8162ad0c>] __dev_set_promiscuity+0x8c/0x1c0
[ 272.017917] [<ffffffff81731ea5>] ? packet_notifier+0x5/0x380
[ 272.033961] [<ffffffff8162b109>] dev_set_promiscuity+0x29/0x50
[ 272.049855] [<ffffffff8172e937>] packet_dev_mc+0x87/0xc0
[ 272.065494] [<ffffffff81732052>] packet_notifier+0x1b2/0x380
[ 272.080915] [<ffffffff81731ea5>] ? packet_notifier+0x5/0x380
[ 272.096009] [<ffffffff81761c66>] notifier_call_chain+0x66/0x150
[ 272.110803] [<ffffffff8108503e>] __raw_notifier_call_chain+0xe/0x10
[ 272.125468] [<ffffffff81085056>] raw_notifier_call_chain+0x16/0x20
[ 272.139984] [<ffffffff81620190>] call_netdevice_notifiers_info+0x40/0x70
[ 272.154523] [<ffffffff816201d6>] call_netdevice_notifiers+0x16/0x20
[ 272.168552] [<ffffffff816224c5>] rollback_registered_many+0x145/0x240
[ 272.182263] [<ffffffff81622641>] rollback_registered+0x31/0x40
[ 272.195369] [<ffffffff816229c8>] unregister_netdevice_queue+0x58/0x90
[ 272.208230] [<ffffffff81547ca0>] __tun_detach+0x140/0x340
[ 272.220686] [<ffffffff81547ed6>] tun_chr_close+0x36/0x60

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5378c2e6ea236de847a39bdb6f3aa83137120d26 21-Oct-2013 Veaceslav Falico <vfalico@redhat.com> bonding: move bond-specific init after enslave happens

As Jiri noted, currently we first do all bonding-specific initialization
(specifically - bond_select_active_slave(bond)) before we actually attach
the slave (so that it becomes visible through bond_for_each_slave() and
friends). This might result in bond_select_active_slave() not seeing the
first/new slave and, thus, not actually selecting an active slave.

Fix this by moving all the bond-related init part after we've actually
completely initialized and linked (via bond_master_upper_dev_link()) the
new slave.

Also, remove the bond_(de/a)ttach_slave(), it's useless to have functions
to ++/-- one int.

After this we have all the initialization of the new slave *before*
linking, and all the stuff that needs to be done on bonding *after* it. It
has also a bonus effect - we can remove the locking on the new slave init
completely, and only use it for bond_select_active_slave().

Reported-by: Jiri Pirko <jiri@resnulli.us>
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Acked-by: Ding Tianhong@huawei.com
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
080a06e1a9a5d2c55884d5fba8755d0af838cd5c 18-Oct-2013 Jiri Pirko <jiri@resnulli.us> bonding: remove bond_ioctl_change_active()

no longer needed since bond_option_active_slave_set() can be used
instead.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
0a2a78c4a95240e658272bd7cd7422a529e4eb4a 18-Oct-2013 Jiri Pirko <jiri@resnulli.us> bonding: push Netlink bits into separate file

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
32819dc1834866cb9547cb75f81af9edd58d33cd 02-Oct-2013 Nikolay Aleksandrov <nikolay@redhat.com> bonding: modify the old and add new xmit hash policies

This patch adds two new hash policy modes which use skb_flow_dissect:
3 - Encapsulated layer 2+3
4 - Encapsulated layer 3+4
There should be a good improvement for tunnel users in those modes.
It also changes the old hash functions to:
hash ^= (__force u32)flow.dst ^ (__force u32)flow.src;
hash ^= (hash >> 16);
hash ^= (hash >> 8);

Where hash will be initialized either to L2 hash, that is
SRCMAC[5] XOR DSTMAC[5], or to flow->ports which should be extracted
from the upper layer. Flow's dst and src are also extracted based on the
xmit policy either directly from the buffer or by using skb_flow_dissect,
but in both cases if the protocol is IPv6 then dst and src are obtained by
ipv6_addr_hash() on the real addresses. In case of a non-dissectable
packet, the algorithms fall back to L2 hashing.
The bond_set_mode_ops() function is now obsolete and thus deleted
because it was used only to set the proper hash policy. Also we trim a
pointer from struct bonding because we no longer need to keep the hash
function, now there's only a single hash function - bond_xmit_hash that
works based on bond->params.xmit_policy.

The hash function and skb_flow_dissect were suggested by Eric Dumazet.
The layer names were suggested by Andy Gospodarek, because I suck at
semantics.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
b32418705107265dfca5edfe2b547643e53a732e 28-Sep-2013 Veaceslav Falico <vfalico@redhat.com> bonding: RCUify bond_set_rx_mode()

Currently we rely on rtnl locking in bond_set_rx_mode(), however it's not
always the case:

RTNL: assertion failed at drivers/net/bonding/bond_main.c (3391)
...
[<ffffffff81651ca5>] dump_stack+0x54/0x74
[<ffffffffa029e717>] bond_set_rx_mode+0xc7/0xd0 [bonding]
[<ffffffff81553af7>] __dev_set_rx_mode+0x57/0xa0
[<ffffffff81557ff8>] __dev_mc_add+0x58/0x70
[<ffffffff81558020>] dev_mc_add+0x10/0x20
[<ffffffff8161e26e>] igmp6_group_added+0x18e/0x1d0
[<ffffffff81186f76>] ? kmem_cache_alloc_trace+0x236/0x260
[<ffffffff8161f80f>] ipv6_dev_mc_inc+0x29f/0x320
[<ffffffff8161f9e7>] ipv6_sock_mc_join+0x157/0x260
...

Fix this by using RCU primitives.

Reported-by: Joe Lawrence <joe.lawrence@stratus.com>
Tested-by: Joe Lawrence <joe.lawrence@stratus.com>
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5a0068deb611109c5ba77358be533f763f395ee4 27-Sep-2013 Neil Horman <nhorman@tuxdriver.com> bonding: Fix broken promiscuity reference counting issue

Recently grabbed this report:
https://bugzilla.redhat.com/show_bug.cgi?id=1005567

Of an issue in which the bonding driver, with an attached vlan encountered the
following errors when bond0 was taken down and back up:

dummy1: promiscuity touches roof, set promiscuity failed. promiscuity feature of
device might be broken.

The error occurs because, during __bond_release_one, if we release our last
slave, we take on a random mac address and issue a NETDEV_CHANGEADDR
notification. With an attached vlan, the vlan may see that the vlan and bond
mac address were in sync, but no longer are. This triggers a call to dev_uc_add
and dev_set_rx_mode, which enables IFF_PROMISC on the bond device. Then, when
we complete __bond_release_one, we use the current state of the bond flags to
determine if we should decrement the promiscuity of the releasing slave. But
since the bond changed promiscuity state during the release operation, we
incorrectly decrement the slave promisc count when it wasn't in promiscuous mode
to begin with, causing the above error

Fix is pretty simple, just cache the bonding flags at the start of the function
and use those when determining the need to set promiscuity.

This is also needed for the ALLMULTI flag

CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: Mark Wu <wudxw@linux.vnet.ibm.com>
CC: "David S. Miller" <davem@davemloft.net>
Reported-by: Mark Wu <wudxw@linux.vnet.ibm.com>

Signed-off-by: David S. Miller <davem@davemloft.net>
23c147e026bbb41dd26a2bda0404a95ea951072f 27-Sep-2013 Veaceslav Falico <vfalico@redhat.com> bonding: correctly verify for the first slave in bond_enslave

After commit 1f718f0f4f97145f4072d2d72dcf85069ca7226d ("bonding: populate
neighbour's private on enslave"), we've moved the actual 'linking' in the
end of the function - so that, once linked, the slave is ready to be used,
and is not still in the process of enslaving.

However, 802.3ad verified if it's the first slave by looking at the

if (bond_first_slave(bond) == new_slave)

which, because we've moved the linking to the end, became broken - on the
first slave bond_first_slave(bond) returns NULL.

Fix this by verifying if the prev_slave, that equals bond_last_slave(), is
actually populated - if it is - then it's not the first slave, and vice
versa.

CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5831d66e8097aedfa3bc35941cf265ada2352317 25-Sep-2013 Veaceslav Falico <vfalico@redhat.com> net: create sysfs symlinks for neighbour devices

Also, remove the same functionality from bonding - it will be already done
for any device that links to its lower/upper neighbour.

The links will be created for dev's kobject, and will look like
lower_eth0 for lower device eth0 and upper_bridge0 for upper device
bridge0.

CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: "David S. Miller" <davem@davemloft.net>
CC: Eric Dumazet <edumazet@google.com>
CC: Jiri Pirko <jiri@resnulli.us>
CC: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4fee991a4689ca64abea90814e0b3c6d30a3c62f 25-Sep-2013 Veaceslav Falico <vfalico@redhat.com> bonding: remove slave lists

And all the initialization.

CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
c8c23903f12a62708606b5cdba8cd8550cd6bdcd 25-Sep-2013 Veaceslav Falico <vfalico@redhat.com> bonding: remove bond_prev_slave()

We don't really need it, and it's really hard to RCUify the list->prev.

CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
0965a1f3f8757a2c20a16a83bc18279009d79a26 25-Sep-2013 Veaceslav Falico <vfalico@redhat.com> bonding: add bond_has_slaves() and use it

Currently we verify if we have slaves by checking if bond->slave_list is
empty. Create a define bond_has_slaves() and use it, a bit more readable
and easier to change in the future.

CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4087df87b868cef0dd19a9409b40fb9415503552 25-Sep-2013 Veaceslav Falico <vfalico@redhat.com> bonding: rework bond_ab_arp_probe() to use bond_for_each_slave()

Currently it uses the hard-to-rcuify bond_for_each_slave_from(), and also
it doesn't check every slave for disrepencies between the actual
IS_UP(slave) and the slave->link == BOND_LINK_UP, but only till we find the
next suitable slave.

Fix this by using bond_for_each_slave() and storing the first good slave in
*before till we find the current_arp_slave, after that we store the first good
slave in new_slave. If new_slave is empty - use the slave stored in before,
and if it's also empty - then we didn't find any suitable slave.

Also, in the meanwhile, check for each slave status.

CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
77140d2951432487d012dbcdcf124168eafc49ca 25-Sep-2013 Veaceslav Falico <vfalico@redhat.com> bonding: rework bond_find_best_slave() to use bond_for_each_slave()

bond_find_best_slave() does not have to be balanced - i.e. return the slave
that is *after* some other slave, but rather return the best slave that
suits, except of bond->primary_slave - in which case we just return it if
it's suitable.

After that we just look through all the slaves and return either first up
slave or the slave whose link came back earliest.

We also don't care about curr_active_slave lock cause we use it in
bond_should_change_active() only and there we take it right away - i.e. it
won't go away.

CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
544a028e65a9dadc13c3d12fb009b4bcd5338a9f 25-Sep-2013 Veaceslav Falico <vfalico@redhat.com> bonding: use bond_for_each_slave() in bond_uninit()

We're safe agains removal there, cause we use neighbours primitives.

CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9caff1e7b761c28018bf1858f6661439b4055f51 25-Sep-2013 Veaceslav Falico <vfalico@redhat.com> bonding: make bond_for_each_slave() use lower neighbour's private

It needs a list_head *iter, so add it wherever needed. Use both non-rcu and
rcu variants.

CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: Dimitris Michailidis <dm@chelsio.com>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
81f23b13ac985e9a3cfb889c690695a8932e02c2 25-Sep-2013 Veaceslav Falico <vfalico@redhat.com> bonding: remove bond_for_each_slave_continue_reverse()

We only use it in rollback scenarios and can easily use the standart
bond_for_each_dev() instead.

CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1f718f0f4f97145f4072d2d72dcf85069ca7226d 25-Sep-2013 Veaceslav Falico <vfalico@redhat.com> bonding: populate neighbour's private on enslave

Use the new provided function when attaching the lower slave to populate
its ->private with struct slave *new_slave. Also, move it to the end to
be able to 'find' it only after it was completely initialized, and
deinitialize in the first place on release.

CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2f268f129c2d1a05d297fe3ee34d393f862d2b22 25-Sep-2013 Veaceslav Falico <vfalico@redhat.com> net: add adj_list to save only neighbours

Currently, we distinguish neighbours (first-level linked devices) from
non-neighbours by the neighbour bool in the netdev_adjacent. This could be
quite time-consuming in case we would like to traverse *only* through
neighbours - cause we'd have to traverse through all devices and check for
this flag, and in a (quite common) scenario where we have lots of vlans on
top of bridge, which is on top of a bond - the bonding would have to go
through all those vlans to get its upper neighbour linked devices.

This situation is really unpleasant, cause there are already a lot of cases
when a device with slaves needs to go through them in hot path.

To fix this, introduce a new upper/lower device lists structure -
adj_list, which contains only the neighbours. It works always in
pair with the all_adj_list structure (renamed from upper/lower_dev_list),
i.e. both of them contain the same links, only that all_adj_list contains
also non-neighbour device links. It's really a small change visible,
currently, only for __netdev_adjacent_dev_insert/remove(), and doesn't
change the main linked logic at all.

Also, add some comments a fix a name collision in
netdev_for_each_upper_dev_rcu() and rework the naming by the following
rules:

netdev_(all_)(upper|lower)_*

If "all_" is present, then we work with the whole list of upper/lower
devices, otherwise - only with direct neighbours. Uninline functions - to
get better stack traces.

CC: "David S. Miller" <davem@davemloft.net>
CC: Eric Dumazet <edumazet@google.com>
CC: Jiri Pirko <jiri@resnulli.us>
CC: Alexander Duyck <alexander.h.duyck@intel.com>
CC: Cong Wang <amwang@redhat.com>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7eacd03810960823393521063734fc8188446bca 13-Sep-2013 Neil Horman <nhorman@tuxdriver.com> bonding: Make alb learning packet interval configurable

running bonding in ALB mode requires that learning packets be sent periodically,
so that the switch knows where to send responding traffic. However, depending
on switch configuration, there may not be any need to send traffic at the
default rate of 3 packets per second, which represents little more than wasted
data. Allow the ALB learning packet interval to be made configurable via sysfs

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Acked-by: Veaceslav Falico <vfalico@redhat.com>
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
5bb9e0b50d2188d8fac481742d9f801436e2c5ab 07-Sep-2013 nikolay@redhat.com <nikolay@redhat.com> bonding: fix bond_arp_rcv setting and arp validate desync state

We make bond_arp_rcv global so it can be used in bond_sysfs if the bond
interface is up and arp_interval is being changed to a positive value
and cleared otherwise as per Jay's suggestion.
This also fixes a problem where bond_arp_rcv was set even though
arp_validate was disabled while the bond was up by unsetting recv_probe
in bond_store_arp_validate and respectively setting it if enabled.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Acked-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
c48268611a3df84a9250d2fc34ad671cdae43440 02-Sep-2013 nikolay@redhat.com <nikolay@redhat.com> bonding: drop read_lock in bond_compute_features

bond_compute_features is always called with RTNL held, so we can safely
drop the read bond->lock.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9b7b165ac1adf5169f0ee03d107423ce7f5805d9 02-Sep-2013 nikolay@redhat.com <nikolay@redhat.com> bonding: drop read_lock in bond_fix_features

We're protected by RTNL so nothing can happen and we can safely drop the
read bond->lock.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ee8487c0e1aed52b534f9bf31d3934af4c50bf33 02-Sep-2013 nikolay@redhat.com <nikolay@redhat.com> bonding: trivial: remove outdated comment and braces

We don't have to release all slaves when closing the bond dev, so remove
the outdated comment and the braces around the left single statement.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6c8d23f78646b20bdbdad476d015b6594ac8ff1c 02-Sep-2013 nikolay@redhat.com <nikolay@redhat.com> bonding: simplify and fix peer notification

This patch aims to remove a use of the bond->lock for mutual exclusion
which will later allow easier migration to RCU of the users of this
functionality. We use RTNL as a synchronizing mechanism since it's
always held when send_peer_notif is set, and when it is decremented from
the notifier function. We can also drop some locking, and fix the
leakage of the send_peer_notif counter.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3e32582f7d5b6f364ddd31110bf7277d69fd4e91 28-Aug-2013 Veaceslav Falico <vfalico@redhat.com> bonding: pr_debug instead of pr_warn in bond_arp_send_all

They're simply annoying and will spam dmesg constantly if we hit them, so
convert to pr_debug so that we still can access them in case of debugging.

CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
e868b0c938d9cc0d7ed4bd77d5c37676e833ed95 28-Aug-2013 Veaceslav Falico <vfalico@redhat.com> bonding: remove vlan_list/current_alb_vlan

Currently there are no real users of vlan_list/current_alb_vlan, only the
helpers which maintain them, so remove them.

CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
a59d3d21ea7636d4cc7fb921104b9b4a59196839 28-Aug-2013 Veaceslav Falico <vfalico@redhat.com> bonding: use vlan_uses_dev() in __bond_release_one()

We always hold the rtnl_lock() in __bond_release_one(), so use
vlan_uses_dev() instead of bond_vlan_used().

CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
50223ce4be70367ca5d8135bfd4c976e148bc491 28-Aug-2013 Veaceslav Falico <vfalico@redhat.com> bonding: convert bond_has_this_ip() to use upper devices

Currently, bond_has_this_ip() is aware only of vlan upper devices, and thus
will return false if the address is associated with the upper bridge or any
other device, and thus will break the arp logic.

Fix this by using the upper device list. For every upper device we verify
if the address associated with it is our address, and if yes - return true.

CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
27bc11e63888c7cb0bd6d443e98775254cf7dbdd 28-Aug-2013 Veaceslav Falico <vfalico@redhat.com> bonding: make bond_arp_send_all use upper device list

Currently, bond_arp_send_all() is aware only of vlans, which breaks
configurations like bond <- bridge (or any other 'upper' device) with IP
(which is quite a common scenario for virt setups).

To fix this we convert the bond_arp_send_all() to first verify if the rt
device is the bond itself, and if not - to go through its list of upper
vlans and their respectiv upper devices (if the vlan's upper device matches
- tag the packet), if still not found - go through all of our upper list
devices to see if any of them match the route device for the target. If the
match is a vlan device - we also save its vlan_id and tag it in
bond_arp_send().

Also, clean the function a bit to be more readable.

CC: Vlad Yasevich <vyasevic@redhat.com>
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
b8e2fde466f7902fed4ad10bb60c1377b27cbfb7 23-Aug-2013 Wei Yongjun <yongjun_wei@trendmicro.com.cn> bonding: fix error return code in bond_enslave()

Fix to return a negative error code in the add bond vlan ids error
handling case instead of 0, as done elsewhere in this function.

Introduced by commit 1ff412ad7714f6952f76ffd77f0a7f2f563288a1.
(bonding: change the bond's vlan syncing functions with the standard ones)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
b20903f2a9ee5bd9ca80ad72867c8d137aaf4d62 06-Aug-2013 nikolay@redhat.com <nikolay@redhat.com> bonding: unwind on bond_add_vlan failure

In case of bond_add_vlan() failure currently we'll have the vlan's
refcnt bumped up in all slaves, but it will never go down because it
failed to get added to the bond, so properly unwind the added vlan if
bond_add_vlan fails.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Acked-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1ff412ad7714f6952f76ffd77f0a7f2f563288a1 06-Aug-2013 nikolay@redhat.com <nikolay@redhat.com> bonding: change the bond's vlan syncing functions with the standard ones

Now we have vlan_vids_add/del_by_dev() which serve the same purpose as
bond's bond_add/del_vlans_on_slave() with the good side effect of
reverting the changes if one of the additions fails.
There's only 1 change in the behaviour of enslave: if adding of the
vlans to the slave fails, we'll fail the enslaving because otherwise we
might delete some vlan that wasn't added by the bonding.
The only way this may happen is with ENOMEM currently, so we're in trouble
anyway.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Acked-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3b380877d58604686c2526c19154d656c25d2953 02-Aug-2013 Veaceslav Falico <vfalico@redhat.com> bonding: modify only neigh_parms owned by us

Otherwise, on neighbour creation, bond_neigh_init() will be called with a
foreign netdev.

Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7864a1adf7291993d74923fdd0a45459ce9da27e 05-Aug-2013 Veaceslav Falico <vfalico@redhat.com> bonding: remove locking from bond_set_rx_mode()

We're already protected by RTNL lock, so nothing can happen to bond/its
slaves, and thus the locking is useless here (both bond->lock and
bond->curr_active_slave).

Also, add ASSERT_RTNL() both to bond_set_rx_mode() and bond_hw_addr_swap()
to catch possible uses of it without RTNL locking.

This patch also saves us from a lockdep false-positive in
bond_set_rx_mode() vs bond_hw_addr_swap().

CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
e7f63f1dc4bd643d9249c653e60c530d4a438147 03-Aug-2013 Veaceslav Falico <vfalico@redhat.com> bonding: add bond_time_in_interval() and use it for time comparison

Currently we use a lot of time comparison math for arp_interval
comparisons, which are sometimes quite hard to read and understand.

All the time comparisons have one pattern:
(time - arp_interval_jiffies) <= jiffies <= (time + mod *
arp_interval_jiffies + arp_interval_jiffies/2)

Introduce a new helper - bond_time_in_interval(), which will do the math in
one place and, thus, will clean up the logical code. This helper introduces
a bit of overhead (by always calculating the jiffies from arp_interval),
however it's really not visible, considering that functions using it
usually run once in arp_interval milliseconds.

There are several lines slightly over 80 chars, however breaking them would
result in more hard-to-read code than several character after the 80 mark.

CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
def4460cdb171d64309c927907c18c4efcb0204b 03-Aug-2013 Veaceslav Falico <vfalico@redhat.com> bonding: call slave_last_rx() only once per slave

Simple cleanup to not call slave_last_rx() on every time function. It won't
give any measurable boost - but looks cleaner and easier to understand.

There are no time-consuming functions in between these calls, so it's safe
to call it in the beginning only once.

CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9918d5bf329d0dc5bb2d9d293bcb772bdb626e65 02-Aug-2013 Veaceslav Falico <vfalico@redhat.com> bonding: modify only neigh_parms owned by us

Otherwise, on neighbour creation, bond_neigh_init() will be called with a
foreign netdev.

Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
278b20837511776dc9d5f6ee1c7fabd5479838bb 01-Aug-2013 nikolay@redhat.com <nikolay@redhat.com> bonding: initial RCU conversion

This patch does the initial bonding conversion to RCU. After it the
following modes are protected by RCU alone: roundrobin, active-backup,
broadcast and xor. Modes ALB/TLB and 3ad still acquire bond->lock for
reading, and will be dealt with later. curr_active_slave needs to be
dereferenced via rcu in the converted modes because the only thing
protecting the slave after this patch is rcu_read_lock, so we need the
proper barrier for weakly ordered archs and to make sure we don't have
stale pointer. It's not tagged with __rcu yet because there's still work
to be done to remove the curr_slave_lock, so sparse will complain when
rcu_assign_pointer and rcu_dereference are used, but the alternative to use
rcu_dereference_protected would've created much bigger code churn which is
more difficult to test and review. That will be converted in time.

1. Active-backup mode
1.1 Perf recording while doing iperf -P 4
- old bonding: iperf spent 0.55% in bonding, system spent 0.29% CPU
in bonding
- new bonding: iperf spent 0.29% in bonding, system spent 0.15% CPU
in bonding
1.2. Bandwidth measurements
- old bonding: 16.1 gbps consistently
- new bonding: 17.5 gbps consistently

2. Round-robin mode
2.1 Perf recording while doing iperf -P 4
- old bonding: iperf spent 0.51% in bonding, system spent 0.24% CPU
in bonding
- new bonding: iperf spent 0.16% in bonding, system spent 0.11% CPU
in bonding
2.2 Bandwidth measurements
- old bonding: 8 gbps (variable due to packet reorderings)
- new bonding: 10 gbps (variable due to packet reorderings)

Of course the latency has improved in all converted modes, and moreover
while
doing enslave/release (since it doesn't affect tx anymore).

Also I've stress tested all modes doing enslave/release in a loop while
transmitting traffic.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15077228cab68e5e8c3cbf26a7f6ebacfac4c829 01-Aug-2013 Nikolay Aleksandrov <razor@BlackWall.org> bonding: factor out slave id tx code and simplify xmit paths

I factored out the tx xmit code which relies on slave id in
bond_xmit_slave_id. It is global because later it can be used also in
3ad mode xmit. Unnecessary obvious comments are removed. Active-backup
mode is simplified because bond_dev_queue_xmit always consumes the skb.
bond_xmit_xor becomes one line because of bond_xmit_slave_id.
bond_for_each_slave_from is not used in bond_xmit_slave_id because later
when RCU is used we can avoid important race condition by using standard
rculist routines.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
78a646ced88450754613573f7d1fa7cb0de14bb3 01-Aug-2013 Nikolay Aleksandrov <razor@BlackWall.org> bonding: simplify broadcast_xmit function

We don't need to start from the curr_active_slave as the frame will be
sent to all eligible slaves anyway, so we remove the unnecessary local
variables, checks and comments, and make it use the standard list API.
This has the nice side-effect that later when it's converted to RCU
a race condition will be avoided which could lead to double packet tx.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
71bc3b2dc5d02566afe1b23a7b72bfc8acdd04e1 01-Aug-2013 nikolay@redhat.com <nikolay@redhat.com> bonding: remove unnecessary read_locks of curr_slave_lock

In all the cases we already hold bond->lock for reading, so the slave
can't get away and the check != NULL is sufficient. curr_active_slave
can still change after the read_lock is unlocked prior to use of the
dereferenced value, so there's no need for it. It either contains a
valid slave which we use (and can't get away), or it is NULL which is
checked.
In some places the read_lock of curr_slave_lock was left because we need
it not to change while performing some action (e.g. syncing current
active slave's addresses, sending ARP requests through the active slave)
such cases will be dealt with individually while converting to RCU.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
dec1e90e8c7157a527faad95023d96dbc114fbac 01-Aug-2013 nikolay@redhat.com <nikolay@redhat.com> bonding: convert to list API and replace bond's custom list

This patch aims to remove struct bonding's first_slave and struct
slave's next and prev pointers, and replace them with the standard Linux
list API. The old macros are converted to list API as well and some new
primitives are available now. The checks if there're slaves that used
slave_cnt have been replaced by the list_empty macro.
Also a few small style fixes, changing longest -> shortest line in local
variable declarations, leaving an empty line before return and removing
unnecessary brackets.
This is the first step to gradual RCU conversion.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4beac0293fabb68125e1a9d2ce81d89343f8702d 01-Aug-2013 Nikolay Aleksandrov <nikolay@redhat.com> bonding: fix system hang due to fast igmp timer rescheduling

After commit 4aa5dee4d9 ("net: convert resend IGMP to notifier event")
we try to acquire rtnl in bond_resend_igmp_join_requests but it can be
scheduled with rtnl already held (e.g. when bond_change_active_slave is
called with rtnl) causing a loop of immediate reschedules + calls because
rtnl_trylock fails each time since it's being already held.
For me this issue leads to system hangs very easy:
modprobe bonding; ifconfig bond0 up; ifenslave bond0 eth0; rmmod
bonding;

The fix is to introduce a small (1 jiffy) delay which is enough for the
sections holding rtnl to finish without putting any strain on the system.
Also adjust the timer in bond_change_active_slave to be 1 jiffy, since
most of the time it's called with rtnl already held.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
dcfe8048de66c3468060c8a2ec2c04ae3725d002 27-Jul-2013 nikolay@redhat.com <nikolay@redhat.com> bonding: remove bond_resend_igmp_join_requests read_unlock leftover

After commit 4aa5dee4d9 ("net: convert resend IGMP to notifier event") we
have 1 read_unlock in bond_resend_igmp_join_requests which isn't paired
with a read_lock because it's removed by that commit.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
10eccb46b521359fa344f63459087df722f0776d 24-Jul-2013 stephen hemminger <stephen@networkplumber.org> bond: cleanup netpoll code

This started out with fixing a sparse warning, then I realized that
the wrapper function bond_netpoll_info could just be removed
by rolling it into the enable code.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
f52809483caceaf83bd2c7915a35ace3b6e7b0ef 24-Jul-2013 Wang Sheng-Hui <shhuiw@gmail.com> bonding: use pre-defined macro in bond_mode_name instead of magic number 0

We have BOND_MODE_ROUNDROBIN pre-defined as 0, and it's the lowest
mode number.
Use it to check the arg lower bound instead of magic number 0 in
bond_mode_name.

Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
b07ea07bd0fa63bfb74dbd803ac3bb9e14dc630b 23-Jul-2013 dingtianhong <dingtianhong@huawei.com> bonding: Fixed up a error "do not initialise statics to 0 or NULL" in bond_main.c

The error is found by the checkpatch.pl tools.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
c4cdef9b7183159c23c7302aaf270d64c549f557 23-Jul-2013 dingtianhong <dingtianhong@huawei.com> bonding: don't call slave_xxx_netpoll under spinlocks

The slave_xxx_netpoll will call synchronize_rcu_bh(),
so the function may schedule and sleep, it should't be
called under spinlocks.

bond_netpoll_setup() and bond_netpoll_cleanup() are always
protected by rtnl lock, it is no need to take the read lock,
as the slave list couldn't be changed outside rtnl lock.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
4aa5dee4d9997879adff858514844efab5a15a01 20-Jul-2013 Jiri Pirko <jiri@resnulli.us> net: convert resend IGMP to notifier event

Until now, bond_resend_igmp_join_requests() looks for vlans attached to
bonding device, bridge where bonding act as port manually. It does not
care of other scenarios, like stacked bonds or team device above. Make
this more generic and use netdev notifier to propagate the event to
upper devices and to actually call ip_mc_rejoin_groups().

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
008aebde9be37e7e1248332b1983976e354327ea 29-Jun-2013 Nikolay Aleksandrov <nikolay@redhat.com> bonding: combine pr_debugs in bond_set_dev_addr into one

Combine the multiple pr_debugs in bond_set_dev_addr into one pr_debug.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ae0d67505ca30c635f7763564622c9710913f293 26-Jun-2013 nikolay@redhat.com <nikolay@redhat.com> bonding: when cloning a MAC use NET_ADDR_STOLEN

A simple semantic change, when a slave's MAC is cloned by the bond
master then set addr_assign_type to NET_ADDR_STOLEN instead of
NET_ADDR_SET. Also use bond_set_dev_addr() in BOND_FOM_ACTIVE mode
to change the bond's MAC address because the assign_type has to be
set properly.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
97a1e6396b07581249506952a4c417dc6d2a4f9c 26-Jun-2013 nikolay@redhat.com <nikolay@redhat.com> bonding: remove unnecessary dev_addr_from_first member

In struct bonding there's a member called dev_addr_from_first which is
used to denote when the bond dev should clone the first slave's MAC
address but since we have netdev's addr_assign_type variable that is not
necessary. We clone the first slave's MAC each time we have a random MAC
set to the bond device. This has the nice side-effect of also fixing an
inconsistency - when the MAC address of the bond dev is set after its
creation, but prior to having slaves, it's not kept and the first slave's
MAC is cloned. The only way to keep the MAC was to create the bond device
with the MAC address set (e.g. through ip link). In all cases if the
bond device is left without any slaves - its MAC gets reset to a random
one as before.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8d2ada77f8a7f8f65fcbf71b23cbac54b64151a6 26-Jun-2013 nikolay@redhat.com <nikolay@redhat.com> bonding: remove unnecessary setup_by_slave member

We have a member called setup_by_slave in struct bonding to denote if the
bond dev has different type than ARPHRD_ETHER, but that is already denoted
in bond's netdev type variable if it was setup by the slave, so use that
instead of the member.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8599b52e14a1611dcb563289421bee76751f1d53 24-Jun-2013 Veaceslav Falico <vfalico@redhat.com> bonding: add an option to fail when any of arp_ip_target is inaccessible

Currently, we fail only when all of the ips in arp_ip_target are gone.
However, in some situations we might need to fail if even one host from
arp_ip_target becomes unavailable.

All situations, obviously, rely on the idea that we need *completely*
functional network, with all interfaces/addresses working correctly.

One real world example might be:
vlans on top on bond (hybrid port). If bond and vlans have ips assigned
and we have their peers monitored via arp_ip_target - in case of switch
misconfiguration (trunk/access port), slave driver malfunction or
tagged/untagged traffic dropped on the way - we will be able to switch
to another slave.

Though any other configuration needs that if we need to have access to all
arp_ip_targets.

This patch adds this possibility by adding a new parameter -
arp_all_targets (both as a module parameter and as a sysfs knob). It can be
set to:

0 or any (the default) - which works exactly as it's working now -
the slave is up if any of the arp_ip_targets are up.

1 or all - the slave is up if all of the arp_ip_targets are up.

This parameter can be changed on the fly (via sysfs), and requires the mode
to be active-backup and arp_validate to be enabled (it obeys the
arp_validate config on which slaves to validate).

Internally it's done through:

1) Add target_last_arp_rx[BOND_MAX_ARP_TARGETS] array to slave struct. It's
an array of jiffies, meaning that slave->target_last_arp_rx[i] is the
last time we've received arp from bond->params.arp_targets[i] on this
slave.

2) If we successfully validate an arp from bond->params.arp_targets[i] in
bond_validate_arp() - update the slave->target_last_arp_rx[i] with the
current jiffies value.

3) When getting slave's last_rx via slave_last_rx(), we return the oldest
time when we've received an arp from any address in
bond->params.arp_targets[].

If the value of arp_all_targets == 0 - we still work the same way as
before.

Also, update the documentation to reflect the new parameter.

v3->v4:
Kill the forgotten rtnl_unlock(), rephrase the documentation part to be
more clear, don't fail setting arp_all_targets if arp_validate is not set -
it has no effect anyway but can be easier to set up. Also, print a warning
if the last arp_ip_target is removed while the arp_interval is on, but not
the arp_validate.

v2->v3:
Use _bh spinlock, remove useless rtnl_lock() and use jiffies for new
arp_ip_target last arp, instead of slave_last_rx(). On bond_enslave(),
use the same initialization value for target_last_arp_rx[] as is used
for the default last_arp_rx, to avoid useless interface flaps.

Also, instead of failing to remove the last arp_ip_target just print a
warning - otherwise it might break existing scripts.

v1->v2:
Correctly handle adding/removing hosts in arp_ip_target - we need to
shift/initialize all slave's target_last_arp_rx. Also, don't fail module
loading on arp_all_targets misconfiguration, just disable it, and some
minor style fixes.

Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
aeea64ac717a920ea655b061e37b14fbc872f7db 24-Jun-2013 Veaceslav Falico <vfalico@redhat.com> bonding: don't trust arp requests unless active slave really works

Currently, if we receive any arp packet on a backup slave in active-backup
mode and arp_validate enabled, we suppose that it's an arp request, swap
source/target ip and try to validate it. This optimization gives us
virtually no downtime in the most common situation (active and backup
slaves are in the same broadcast domain and the active slave failed).

However, if we can't reach the arp_ip_target(s), we end up in an endless
loop of reselecting slaves, because we receive our arp requests, sent by
the active slave, and think that backup slaves are up, thus selecting them
as active and, again, sending arp requests, which fool our backup slaves.

Fix this by not validating the swapped arp packets if the current active
slave didn't receive any arp reply after it was selected as active. This
way we will only accept arp requests if we know that the current active
slave can actually reach arp_ip_target.

v3->v4:
Obey 80 lines and make checkpatch.pl happy, per Sergei's suggestion.

v1->v3:
No change.

Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2c14610210978512271dd6fe21d6f55b789d9a80 24-Jun-2013 Veaceslav Falico <vfalico@redhat.com> bonding: don't validate arp if we don't have to

Currently, we validate all the incoming arps if arp_validate not 0.
However, we don't have to validate backup slaves if arp_validate == active
and vice versa, so return early in bond_arp_rcv() in these cases.

It works correctly now because we verify arp_validate in slave_last_rx(),
however we're just doing useless work in bond_arp_rcv().

Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
0afee4e8b9fe4b5f58734b2f28e980dd58d3e3cb 24-Jun-2013 Veaceslav Falico <vfalico@redhat.com> bonding: don't add duplicate targets to arp_ip_target

Print a warning and skip them.

Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
87a7b84b588c2ddbde890890855aef18ec34174e 24-Jun-2013 Veaceslav Falico <vfalico@redhat.com> bonding: add helper function bond_get_targets_ip(targets, ip)

Add function bond_get_targets_ip(targets, ip) which searches through
targets array of ips (arp_targets) and returns the position of first
match. If ip == 0, returns the first free slot. On failure to find the
ip or free slot, return -1.

Use it to verify if the arp we've received is valid and in sysfs.

v1->v2:
Fix "[2/6] bonding: add helper function bond_get_targets_ip(targets, ip)",
per Nikolay's advice, to verify if source ip != 0.0.0.0, otherwise we might
update 'null' arp_ip_targets' last_rx. Also, address style.

Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
db4e9b2b98bac7adc7657ef94bb6d1a419a35571 20-Jun-2013 Nikolay Aleksandrov <nikolay@redhat.com> bonding: fix slave speed reporting in bond_miimon_commit

When we have BOND_LINK_UP the speed is reported unconditionally with %u
format although it can be SPEED_UNKNOWN (-1). After this patch it returns
0 in that case in an attempt to keep the existing scripts happy.
One line is intenionally left 81 chars because it gets ugly if broken.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Acked-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4f5474e7fd68988cb11373fc698bf10b35b49e31 12-Jun-2013 Nikolay Aleksandrov <nikolay@redhat.com> bonding: fix igmp_retrans type and two related races

First the type of igmp_retrans (which is the actual counter of
igmp_resend parameter) is changed to u8 to be able to store values up
to 255 (as per documentation). There are two races that were hidden
there and which are easy to trigger after the previous fix, the first is
between bond_resend_igmp_join_requests and bond_change_active_slave
where igmp_retrans is set and can be altered by the periodic. The second
race condition is between multiple running instances of the periodic
(upon execution it can be scheduled again for immediate execution which
can cause the counter to go < 0 which in the unsigned case leads to
unnecessary igmp retransmissions).
Since in bond_change_active_slave bond->lock is held for reading and
curr_slave_lock for writing, we use curr_slave_lock for mutual
exclusion. We can't drop them as there're cases where RTNL is not held
when bond_change_active_slave is called. RCU is unlocked in
bond_resend_igmp_join_requests before getting curr_slave_lock since we
don't need it there and it's pointless to delay.
The decrement is moved inside the "if" block because if we decrement
unconditionally there's still a possibility for a race condition although
it is much more difficult to hit (many changes have to happen in
a very short period in order to trigger) which in the case of 3 parallel
running instances of this function and igmp_retrans == 1
(with check bond->igmp_retrans-- > 1) is:
f1 passes, doesn't re-schedule, but decrements - igmp_retrans = 0
f2 then passes, doesn't re-schedule, but decrements - igmp_retrans = 255
f3 does the unnecessary retransmissions.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
b8fad459f9cc8417b74f71c6c229eef7412163d1 12-Jun-2013 Nikolay Aleksandrov <nikolay@redhat.com> bonding: reset master mac on first enslave failure

If the bond device is supposed to get the first slave's MAC address and
the first enslavement fails then we need to reset the master's MAC
otherwise it will stay the same as the failed slave device. We do it
after err_undo_flags since that is the first place where the MAC can be
changed and we check if it should've been the first slave and if the
bond's MAC was set to it because that err place is used by multiple
locations prior to changing the master's MAC address.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1b5acd292336da029535de010af568533df9b665 31-May-2013 Jay Vosburgh <fubar@us.ibm.com> bonding: disallow change of MAC if fail_over_mac enabled

Currently, if fail_over_mac is set to active, then attempts to
change the MAC of the bond itself silently fail. However, if fail_over_mac
is set to follow, changes are permitted.

Permitting the bond's MAC to change with fail_over_mac=follow
will disrupt the follow functionality, which normally controls the
assignment of MAC address to the bond and its slaves, and can cause
multiple ports to be assigned the same MAC address. which will interfere
with the functioning of the device (where the device here is a
virtualization-aware card for s390, qeth).

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
303d1cbf610eef31e039efc2e8da30cf94cf5ebc 31-May-2013 Jay Vosburgh <fubar@us.ibm.com> bonding: Convert hw addr handling to sync/unsync, support ucast addresses

This patch converts bonding to use the dev_uc/mc_sync and
dev_uc/mc_sync_multiple functions for updating the hardware addresses
of bonding slaves.

The existing functions to add or remove addresses are removed,
and their functionality is replaced with calls to dev_mc_sync or
dev_mc_sync_multiple, depending upon the bonding mode.

Calls to dev_uc_sync and dev_uc_sync_multiple are also added,
so that unicast addresses added to a bond will be properly synced with
its slaves.

Various functions are renamed to better reflect the new
situation, and relevant comments are updated.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Cc: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
351638e7deeed2ec8ce451b53d33921b3da68f83 28-May-2013 Jiri Pirko <jiri@resnulli.us> net: pass info struct via netdevice notifier

So far, only net_device * could be passed along with netdevice notifier
event. This patch provides a possibility to pass custom structure
able to provide info that event listener needs to know.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>

v2->v3: fix typo on simeth
shortened dev_getter
shortened notifier_info struct name
v1->v2: fix notifier_call parameter in call_netdevice_notifier()
Signed-off-by: David S. Miller <davem@davemloft.net>
5a5c5fd48e3bcd57572e9a7a4964ed8f38a20b87 18-May-2013 nikolay@redhat.com <nikolay@redhat.com> bonding: arp_ip_count and arp_targets can be wrong

When getting arp_ip_targets if we encounter a bad IP, arp_ip_count still
gets increased and all the targets after the wrong one will not be probed
if arp_interval is enabled after that (unless a new IP target is added
through sysfs) because of the zero entry, in this case reading
arp_ip_target through sysfs will show valid targets even if there's a
zero entry.
Example: 1.2.3.4,4.5.6.7,blah,5.6.7.8
When retrieving the list from arp_ip_target the output would be:
1.2.3.4,4.5.6.7,5.6.7.8
but there will be a 0 entry between 4.5.6.7 and 5.6.7.8. If arp_interval
is enabled after that 5.6.7.8 will never be checked because of that.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
acca2674a71816c5c9d0caa81fecd33b491fd68f 18-May-2013 nikolay@redhat.com <nikolay@redhat.com> bonding: replace %x with %pI4 for IPv4 addresses

There're few pr_debug() places that can provide the IPv4 address in
dotted decimal format instead which is more helpful.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
b0ce3508b25ea6fa10ae3ca254de1d695b521702 16-May-2013 Eric Dumazet <edumazet@google.com> bonding: allow TSO being set on bonding master

In some situations, we need to disable TSO on bonding slaves.

bonding device automatically unset TSO in bond_fix_features(), and
performance is not good because :

1) We consume more cpu cycles.

2) GSO segmentation has some bugs leading to out of order TCP packets
if this segmentation is done before virtual device. This particular
problem will be addressed in a separate patch.

This patch allows TSO being set/unset on the bonding master,
so that GSO segmentation is done after bonding layer.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Michał Mirosław <mirqus@gmail.com>
Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
c6cdcf6d82bc8f53e64ad59464e0114fe48e28bb 22-Apr-2013 nikolay@redhat.com <nikolay@redhat.com> bonding: fix locking in enslave failure path

In commit 3c5913b53fefc9d9e15a2d0f93042766658d9f3f ("bonding:
primary_slave & curr_active_slave are not cleaned on enslave failure")
I didn't account for the use of curr_active_slave without curr_slave_lock
and since there are such users, we should hold bond->lock for writing while
setting it to NULL (in the NULL case we don't need the curr_slave_lock).
Keeping the bond lock as to avoid the extra release/acquire cycle.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
d632ce989c811863a1ce9b1c53dae2cf06b35c49 18-Apr-2013 nikolay@redhat.com <nikolay@redhat.com> bonding: in bond_mc_swap() bond's mc addr list is walked without lock

Use netif_addr_lock_bh() to acquire the appropriate lock before walking.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
fc7a72ac86e2956dd405b0c604fea45a2702f567 18-Apr-2013 nikolay@redhat.com <nikolay@redhat.com> bonding: disable netpoll on enslave failure

slave_disable_netpoll() is not called upon enslave failure which would
lead to a memory leak. Call slave_disable_netpoll() after err_detach as
that's the first error path after enabling netpoll on that slave.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3c5913b53fefc9d9e15a2d0f93042766658d9f3f 18-Apr-2013 nikolay@redhat.com <nikolay@redhat.com> bonding: primary_slave & curr_active_slave are not cleaned on enslave failure

On enslave failure primary_slave can point to new_slave which is to be
freed, and the same applies to curr_active_slave. So check if this is
the case and clean up properly after err_detach because that's the first
error code path after they're set.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
a506e7b479e1215c230e4b87fedc246cf748537f 18-Apr-2013 nikolay@redhat.com <nikolay@redhat.com> bonding: vlans don't get deleted on enslave failure

The main problem is with vid refcount which only gets bumped up.
Delete the vlans after err_detach as that's the first error path
after the vlans are added.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
25e40305d4f4399bc8ecf9c9b7cf43493bb40bbd 18-Apr-2013 nikolay@redhat.com <nikolay@redhat.com> bonding: mc addresses don't get deleted on enslave failure

Add bond_mc_list_flush() after err_detach as that's the first error path
after the addresses are added. The main issue is the mc addresses' refcount
which only gets bumped up.

v2: update log message and don't move code unnecessarily

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
bb5b052f751b309b5181686741c724a66c5cb15a 16-Apr-2013 Andy Gospodarek <andy@greyhouse.net> bond: add support to read speed and duplex via ethtool

This patch adds support for the get_settings ethtool op to the bonding
driver. This was motivated by users who wanted to get the speed of the
bond and compare that against throughput to understand utilization.
The behavior before this patch was added was problematic when computing
line utilization after trying to get link-speed and throughput via SNMP.

Output from ethtool looks like this for a round-robin bond:

Settings for bond0:
Supported ports: [ ]
Supported link modes: Not reported
Supported pause frame use: No
Supports auto-negotiation: No
Advertised link modes: Not reported
Advertised pause frame use: No
Advertised auto-negotiation: No
Speed: 11000Mb/s
Duplex: Full
Port: Other
PHYAD: 0
Transceiver: internal
Auto-negotiation: off
MDI-X: Unknown
Link detected: yes

I tested this and verified it works as expected. A test was also done
on a version backported to an older kernel and it worked well there.

v2: Switch to using ethtool_cmd_speed_set to set speed, added check to
SLAVE_IS_OK for each slave in bond, dropped mode-specific calculations
as they were not needed, and set port type to 'Other.'

v3: Fix useless assignment and checkpatch warning.

Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Reviewed-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1fd9b1fc310314911f66d2f14a8e4f0ef37bf47b 19-Apr-2013 Patrick McHardy <kaber@trash.net> net: vlan: prepare for 802.1ad support

Make the encapsulation protocol value a property of VLAN devices and change
the device lookup functions to take the protocol value into account.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
80d5c3689b886308247da295a228a54df49a44f6 19-Apr-2013 Patrick McHardy <kaber@trash.net> net: vlan: prepare for 802.1ad VLAN filtering offload

Change the rx_{add,kill}_vid callbacks to take a protocol argument in
preparation of 802.1ad support. The protocol argument used so far is
always htons(ETH_P_8021Q).

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
f646968f8f7c624587de729115d802372b9063dd 19-Apr-2013 Patrick McHardy <kaber@trash.net> net: vlan: rename NETIF_F_HW_VLAN_* feature flags to NETIF_F_HW_VLAN_CTAG_*

Rename the hardware VLAN acceleration features to include "CTAG" to indicate
that they only support CTAGs. Follow up patches will introduce 802.1ad
server provider tagging (STAGs) and require the distinction for hardware not
supporting acclerating both.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
4394542ca4ec9f28c3c8405063d200b1e7c347d7 15-Apr-2013 Eric Dumazet <edumazet@google.com> bonding: fix l23 and l34 load balancing in forwarding path

Since commit 6b923cb7188d46 (bonding: support for IPv6 transmit hashing)
bonding doesn't properly hash traffic in forwarding setups.

Vitaly V. Bursov diagnosed that skb_network_header_len() returned 0 in
this case.

More generally, the transport header might not be in the skb head.

Use pskb_may_pull() & skb_header_pointer() to get it right, and use
proto_ports_offset() in bond_xmit_hash_policy_l34() to get support for
more protocols than TCP and UDP.

Reported-by: Vitaly V. Bursov <vitalyb@telenet.dn.ua>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Cc: John Eaglesham <linux@8192.net>
Tested-by: Vitaly V. Bursov <vitalyb@telenet.dn.ua>
Signed-off-by: David S. Miller <davem@davemloft.net>
b6a5a7b9a528a8b4c8bec940b607c5dd9102b8cc 11-Apr-2013 nikolay@redhat.com <nikolay@redhat.com> bonding: IFF_BONDING is not stripped on enslave failure

While enslaving a new device and after IFF_BONDING flag is set, in case
of failure it is not stripped from the device's priv_flags while
cleaning up, which could lead to other problems.
Cleaning at err_close because the flag is set after dev_open().

v2: no change

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6101391d4a381cc0c661d8765235b3cad7da09e5 11-Apr-2013 nikolay@redhat.com <nikolay@redhat.com> bonding: fix netdev event NULL pointer dereference

In commit 471cb5a33dcbd7c529684a2ac7ba4451414ee4a7 ("bonding: remove
usage of dev->master") a bug was introduced which causes a NULL pointer
dereference. If a bond device is in mode 6 (ALB) and a slave is added
it will dereference a NULL pointer in bond_slave_netdev_event().
This is because in bond_enslave we have bond_alb_init_slave() which
changes the MAC address of the slave and causes a NETDEV_CHANGEADDR.
Then we have in bond_slave_netdev_event():
struct slave *slave = bond_slave_get_rtnl(slave_dev);
struct bonding *bond = slave->bond;
bond_slave_get_rtnl() dereferences slave_dev->rx_handler_data which at
that time is NULL since netdev_rx_handler_register() is called later.

This is fixed by checking if slave is NULL before dereferencing it.

v2: Comment style changed.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
69b0216ac255f523556fa3d4ff030d857eaaa37f 06-Apr-2013 nikolay@redhat.com <nikolay@redhat.com> bonding: fix bonding_masters race condition in bond unloading

While the bonding module is unloading, it is considered that after
rtnl_link_unregister all bond devices are destroyed but since no
synchronization mechanism exists, a new bond device can be created
via bonding_masters before unregister_pernet_subsys which would
lead to multiple problems (e.g. NULL pointer dereference, wrong RIP,
list corruption).

This patch fixes the issue by removing any bond devices left in the
netns after bonding_masters is removed from sysfs.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Acked-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ffcdedb667b6db8ee31c7efa76a3ec59d9c3b0fc 06-Apr-2013 nikolay@redhat.com <nikolay@redhat.com> Revert "bonding: remove sysfs before removing devices"

This reverts commit 4de79c737b200492195ebc54a887075327e1ec1d.

This patch introduces a new bug which causes access to freed memory.
In bond_uninit: list_del(&bond->bond_list);
bond_list is linked in bond_net's dev_list which is freed by
unregister_pernet_subsys.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4de79c737b200492195ebc54a887075327e1ec1d 03-Apr-2013 Veaceslav Falico <vfalico@redhat.com> bonding: remove sysfs before removing devices

We have a race condition if we try to rmmod bonding and simultaneously add
a bond master through sysfs. In bonding_exit() we first remove the devices
(through rtnl_link_unregister() ) and only after that we remove the sysfs.
If we manage to add a device through sysfs after that the devices were
removed - we'll end up with that device/sysfs structure and with the module
unloaded.

Fix this by first removing the sysfs and only after that calling
rtnl_link_unregister().

Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
fcd99434fb5c137274d2e15dd2a6a7455f0f29ff 02-Apr-2013 Veaceslav Falico <vfalico@redhat.com> bonding: get netdev_rx_handler_unregister out of locks

Now that netdev_rx_handler_unregister contains synchronize_net(), we need
to call it outside of bond->lock, cause it might sleep. Also, remove the
already unneded synchronize_net().

Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ad999eee669d6a0439f5b9734e87eed50e776e32 26-Mar-2013 Veaceslav Falico <vfalico@redhat.com> bonding: cleanup unneeded rcu_read_lock()

bond_resend_igmp_join_requests_delayed() calls _resend_igmp_join_requests()
under rcu_read_lock(), while it gets its own rcu_read_lock() for the whole
function. Remove the lock from the _delayed function.

Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
876254ae2758d50dcb08c7bd00caf6a806571178 12-Mar-2013 Veaceslav Falico <vfalico@redhat.com> bonding: don't call update_speed_duplex() under spinlocks

bond_update_speed_duplex() might sleep while calling underlying slave's
routines. Move it out of atomic context in bond_enslave() and remove it
from bond_miimon_commit() - it was introduced by commit 546add79, however
when the slave interfaces go up/change state it's their responsibility to
fire NETDEV_UP/NETDEV_CHANGE events so that bonding can properly update
their speed.

I've tested it on all combinations of ifup/ifdown, autoneg/speed/duplex
changes, remote-controlled and local, on (not) MII-based cards. All changes
are visible.

Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
80028ea1c0afc24d4ddeb8dd2a9992fff03616ca 06-Mar-2013 Veaceslav Falico <vfalico@redhat.com> bonding: fire NETDEV_RELEASE event only on 0 slaves

Currently, if we set up netconsole over bonding and release a slave,
netconsole will stop logging on the whole bonding device. Change the
behavior to stop the netconsole only when the last slave is released.

Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6c8c4e4c24b9f6cee3d356a51e4a7f2af787a49b 25-Feb-2013 Jiri Pirko <jiri@resnulli.us> bond: check if slave count is 0 in case when deciding to take slave's mac

in bond_enslave(), check slave_cnt before actually using slave address.

introduced by:
commit 409cc1f8a41 (bond: have random dev address by default instead of zeroes)

Reported-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
b3f92b63c4c793142751d029ae0037fb8ab403a0 18-Feb-2013 Doug Goldstein <cardoe@cardoe.com> bonding: set sysfs device_type to 'bond'

Sets the sysfs device_type to 'bond' for udev. This allows udev rules to
be created for bond devices. This is similar to how other network
devices set their device_type.

Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
0896341a44bf04bf6149d9307fe4686006f3eee1 18-Feb-2013 nikolay@redhat.com <nikolay@redhat.com> bonding: fix bond_release_all inconsistencies

This patch fixes the following inconsistencies in bond_release_all:
- IFF_BONDING flag is not stripped from slaves
- MTU is not restored
- no netdev notifiers are sent
Instead of trying to keep bond_release and bond_release_all in sync
I think we can re-use bond_release as the environment for calling it
is correct (RTNL is held). I have been running tests for the past
week and they came out successful. The only way for bond_release to fail
is for the slave to be attached in a different bond or to not be a slave
but that cannot happen as RTNL is held and no slave manipulations can be
achieved.

V2: As suggested bond_release is renamed to __bond_release_one with a
new parameter "all" introduced so to avoid calling unnecessary code while
destroying a bond, and a wrapper for it called bond_release is created
because of ndo_del_link. bond_release_all() is removed.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2cde6acd49daca58b96f1fbc697492825511ad31 11-Feb-2013 Neil Horman <nhorman@tuxdriver.com> netpoll: Fix __netpoll_rcu_free so that it can hold the rtnl lock

__netpoll_rcu_free is used to free netpoll structures when the rtnl_lock is
already held. The mechanism is used to asynchronously call __netpoll_cleanup
outside of the holding of the rtnl_lock, so as to avoid deadlock.
Unfortunately, __netpoll_cleanup modifies pointers (dev->np), which means the
rtnl_lock must be held while calling it. Further, it cannot be held, because
rcu callbacks may be issued in softirq contexts, which cannot sleep.

Fix this by converting the rcu callback to a work queue that is guaranteed to
get scheduled in process context, so that we can hold the rtnl properly while
calling __netpoll_cleanup

Tested successfully by myself.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: Cong Wang <amwang@redhat.com>
CC: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
387ff91184fc03e4b8c5818e8ef2928434bcee6d 31-Jan-2013 Gao feng <gaofeng@cn.fujitsu.com> netns: bond: allow unprivileged users to control bond device

reduce the permission check of bond device's ioctl.
allow the userns root to control the bond device.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
409cc1f8a4149c26bbb8e5d3bacb36541ad371e2 30-Jan-2013 Jiri Pirko <jiri@resnulli.us> bond: have random dev address by default instead of zeroes

Makes more sense to have randomly generated address by default than to
have all zeroes. It also allows user to for example put the bond into
bridge without need to have any slaves in it.

Also note that this changes only behaviour of bonds with no slaves. Once
the first slave device is enslaved, its address will be used (no change
here).

Also, fix dev_assign_type values on the way.

Reported-by: Pavel Šimerda <psimerda@redhat.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7826d43f2db45c9305a6e0ba165650e1a203f517 06-Jan-2013 Jiri Pirko <jiri@resnulli.us> ethtool: fix drvinfo strings set in drivers

Use strlcpy where possible to ensure the string is \0 terminated.
Use always sizeof(string) instead of 32, ETHTOOL_BUSINFO_LEN
and custom defines.
Use snprintf instead of sprint.
Remove unnecessary inits of ->fw_version
Remove unnecessary inits of drvinfo struct.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
471cb5a33dcbd7c529684a2ac7ba4451414ee4a7 03-Jan-2013 Jiri Pirko <jiri@resnulli.us> bonding: remove usage of dev->master

Benefit from new upper dev list and free bonding from dev->master usage.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
cfb6f99dd9629ec7759b78cff51d9bf7eedf105a 14-Dec-2012 Konstantin Khlebnikov <khlebnikov@openvz.org> bonding: do not cancel works in bond_uninit()

Bonding initializes these works in bond_open() and cancels in bond_close(),
thus in bond_uninit() they are already canceled but may be unitialized yet.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Nikolay Aleksandrov <nikolay@redhat.com>
Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
c772dde343917b0961f75227cfb8c2b2f2b31d24 07-Dec-2012 Ben Hutchings <bhutchings@solarflare.com> bonding: Fix check for ethtool get_link operation support

Since commit 2c60db037034 ('net: provide a default dev->ethtool_ops')
all devices have a non-null ethtool_ops. Test only
dev->ethtool_ops->get_link in both places where we care.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
90fb6250c509cabd425b7ae4524053dba2e27e2c 29-Nov-2012 nikolay@redhat.com <nikolay@redhat.com> bonding: make arp_ip_target parameter checks consistent with sysfs

The module can be loaded with arp_ip_target="255.255.255.255" which makes
it impossible to remove as the function in sysfs checks for that value,
so we make the parameter checks consistent with sysfs.

v2: Fix formatting
v3: Make description text < 75 columns

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
fbb0c41b814d497c656fc7be9e35456f139cb2fb 29-Nov-2012 nikolay@redhat.com <nikolay@redhat.com> bonding: fix miimon and arp_interval delayed work race conditions

First I would give three observations which will be used later.
Observation 1: if (delayed_work_pending(wq)) cancel_delayed_work(wq)
This usage is wrong because the pending bit is cleared just before the
work's fn is executed and if the function re-arms itself we might end up
with the work still running. It's safe to call cancel_delayed_work_sync()
even if the work is not queued at all.
Observation 2: Use of INIT_DELAYED_WORK()
Work needs to be initialized only once prior to (de/en)queueing.
Observation 3: IFF_UP is set only after ndo_open is called

Related race conditions:
1. Race between bonding_store_miimon() and bonding_store_arp_interval()
Because of Obs.1 we can end up having both works enqueued.
2. Multiple races with INIT_DELAYED_WORK()
Since the works are not protected by anything between INIT_DELAYED_WORK()
and calls to (en/de)queue it is possible for races between the following
functions:
(races are also possible between the calls to INIT_DELAYED_WORK()
and workqueue code)
bonding_store_miimon() - bonding_store_arp_interval(), bond_close(),
bond_open(), enqueued functions
bonding_store_arp_interval() - bonding_store_miimon(), bond_close(),
bond_open(), enqueued functions
3. By Obs.1 we need to change bond_cancel_all()

Bugs 1 and 2 are fixed by moving all work initializations in bond_open
which by Obs. 2 and Obs. 3 and the fact that we make sure that all works
are cancelled in bond_close(), is guaranteed not to have any work
enqueued.
Also RTNL lock is now acquired in bonding_store_miimon/arp_interval so
they can't race with bond_close and bond_open. The opposing work is
cancelled only if the IFF_UP flag is set and it is cancelled
unconditionally. The opposing work is already cancelled if the interface
is down so no need to cancel it again. This way we don't need new
synchronizations for the bonding workqueue. These bugs (and fixes) are
tied together and belong in the same patch.
Note: I have left 1 line intentionally over 80 characters (84) because I
didn't like how it looks broken down. If you'd prefer it otherwise,
then simply break it.

v2: Make description text < 75 columns

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4e591b93d5edb3c126a7257c6e29d978a82656d9 22-Nov-2012 Michal Kubeček <mkubecek@suse.cz> bonding: in balance-rr mode, set curr_active_slave only if it is up

If all slaves of a balance-rr bond with ARP monitor are enslaved
with down link state, bond keeps down state even after slaves
go up.

This is caused by bond_enslave() setting curr_active_slave to
first slave not taking into account its link state. As
bond_loadbalance_arp_mon() uses curr_active_slave to identify
whether slave's down->up transition should update bond's link
state, bond stays down even if slaves are up (until first slave
goes from up to down at least once).

Before commit f31c7937 "bonding: start slaves with link down for
ARP monitor", this was masked by slaves always starting in UP
state with ARP monitor (and MII monitor not relying on
curr_active_slave being NULL if there is no slave up).

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
0e376bd0b791ac6ac6bdb051492df0769c840848 21-Nov-2012 Sarveshwar Bandi <sarveshwar.bandi@emulex.com> bonding: Bonding driver does not consider the gso_max_size/gso_max_segs setting of slave devices.

Patch sets the lowest gso_max_size and gso_max_segs values of the slave devices during enslave and detach.

Signed-off-by: Sarveshwar Bandi <sarveshwar.bandi@emulex.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
55462cf30ad9768fff6a6d36db21879146a39bdf 14-Oct-2012 Jiri Pirko <jiri@resnulli.us> vlan: fix bond/team enslave of vlan challenged slave/port

In vlan_uses_dev() check for number of vlan devs rather than existence
of vlan_info. The reason is that vlan id 0 is there without appropriate
vlan dev on it by default which prevented from enslaving vlan challenged
dev.

Reported-by: Jon Stanley <jstanley@rmrf.net>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
49ee49202b4ac4be95d05e4bf24a9ac8b54c5528 04-Oct-2012 Eric Dumazet <edumazet@google.com> bonding: set qdisc_tx_busylock to avoid LOCKDEP splat

If a qdisc is installed on a bonding device, its possible to get
following lockdep splat under stress :

=============================================
[ INFO: possible recursive locking detected ]
3.6.0+ #211 Not tainted
---------------------------------------------
ping/4876 is trying to acquire lock:
(dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+.-...}, at: [<ffffffff8157a191>] dev_queue_xmit+0xe1/0x830

but task is already holding lock:
(dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+.-...}, at: [<ffffffff8157a191>] dev_queue_xmit+0xe1/0x830

other info that might help us debug this:
Possible unsafe locking scenario:

CPU0
----
lock(dev->qdisc_tx_busylock ?: &qdisc_tx_busylock);
lock(dev->qdisc_tx_busylock ?: &qdisc_tx_busylock);

*** DEADLOCK ***

May be due to missing lock nesting notation

6 locks held by ping/4876:
#0: (sk_lock-AF_INET){+.+.+.}, at: [<ffffffff815e5030>] raw_sendmsg+0x600/0xc30
#1: (rcu_read_lock_bh){.+....}, at: [<ffffffff815ba4bd>] ip_finish_output+0x12d/0x870
#2: (rcu_read_lock_bh){.+....}, at: [<ffffffff8157a0b0>] dev_queue_xmit+0x0/0x830
#3: (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+.-...}, at: [<ffffffff8157a191>] dev_queue_xmit+0xe1/0x830
#4: (&bond->lock){++.?..}, at: [<ffffffffa02128c1>] bond_start_xmit+0x31/0x4b0 [bonding]
#5: (rcu_read_lock_bh){.+....}, at: [<ffffffff8157a0b0>] dev_queue_xmit+0x0/0x830

stack backtrace:
Pid: 4876, comm: ping Not tainted 3.6.0+ #211
Call Trace:
[<ffffffff810a0145>] __lock_acquire+0x715/0x1b80
[<ffffffff810a256b>] ? mark_held_locks+0x9b/0x100
[<ffffffff810a1bf2>] lock_acquire+0x92/0x1d0
[<ffffffff8157a191>] ? dev_queue_xmit+0xe1/0x830
[<ffffffff81726b7c>] _raw_spin_lock+0x3c/0x50
[<ffffffff8157a191>] ? dev_queue_xmit+0xe1/0x830
[<ffffffff8106264d>] ? rcu_read_lock_bh_held+0x5d/0x90
[<ffffffff8157a191>] dev_queue_xmit+0xe1/0x830
[<ffffffff8157a0b0>] ? netdev_pick_tx+0x570/0x570
[<ffffffffa0212a6a>] bond_start_xmit+0x1da/0x4b0 [bonding]
[<ffffffff815796d0>] dev_hard_start_xmit+0x240/0x6b0
[<ffffffff81597c6e>] sch_direct_xmit+0xfe/0x2a0
[<ffffffff8157a249>] dev_queue_xmit+0x199/0x830
[<ffffffff8157a0b0>] ? netdev_pick_tx+0x570/0x570
[<ffffffff815ba96f>] ip_finish_output+0x5df/0x870
[<ffffffff815ba4bd>] ? ip_finish_output+0x12d/0x870
[<ffffffff815bb964>] ip_output+0x54/0xf0
[<ffffffff815bad48>] ip_local_out+0x28/0x90
[<ffffffff815bc444>] ip_send_skb+0x14/0x50
[<ffffffff815bc4b2>] ip_push_pending_frames+0x32/0x40
[<ffffffff815e536a>] raw_sendmsg+0x93a/0xc30
[<ffffffff8128d570>] ? selinux_file_send_sigiotask+0x1f0/0x1f0
[<ffffffff8109ddb4>] ? __lock_is_held+0x54/0x80
[<ffffffff815f6730>] ? inet_recvmsg+0x220/0x220
[<ffffffff8109ddb4>] ? __lock_is_held+0x54/0x80
[<ffffffff815f6855>] inet_sendmsg+0x125/0x240
[<ffffffff815f6730>] ? inet_recvmsg+0x220/0x220
[<ffffffff8155cddb>] sock_sendmsg+0xab/0xe0
[<ffffffff810a1650>] ? lock_release_non_nested+0xa0/0x2e0
[<ffffffff810a1650>] ? lock_release_non_nested+0xa0/0x2e0
[<ffffffff8155d18c>] __sys_sendmsg+0x37c/0x390
[<ffffffff81195b2a>] ? fsnotify+0x2ca/0x7e0
[<ffffffff811958e8>] ? fsnotify+0x88/0x7e0
[<ffffffff81361f36>] ? put_ldisc+0x56/0xd0
[<ffffffff8116f98a>] ? fget_light+0x3da/0x510
[<ffffffff8155f6c4>] sys_sendmsg+0x44/0x80
[<ffffffff8172fc22>] system_call_fastpath+0x16/0x1b

Avoid this problem using a distinct lock_class_key for bonding
devices.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
da210f559019ba1cd4ebee2a28ad158bfb95bab2 30-Aug-2012 Jiri Bohac <jbohac@suse.cz> bonding: add some slack to arp monitoring time limits

Currently, all the time limits in the bonding ARP monitor are in
multiples of arp_interval -- the time interval at which the ARP
monitor is periodically scheduled.

With a fast network round-trip and a little scheduling latency
of the ARP monitor work, a limit of n*delta_in_ticks may
effectively mean (n-1)*delta_in_ticks.

This is fatal in case of n==1 (the link will stay down
forever) and makes the behaviour non-deterministic in all the
other cases.

Add a delta_in_ticks/2 time slack to all the time limits.

Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
6b923cb7188d46905f43fa84210c4c3e5f9cd8fb 21-Aug-2012 John Eaglesham <linux@8192.net> bonding: support for IPv6 transmit hashing

Currently the "bonding" driver does not support load balancing outgoing
traffic in LACP mode for IPv6 traffic. IPv4 (and TCP or UDP over IPv4)
are currently supported; this patch adds transmit hashing for IPv6 (and
TCP or UDP over IPv6), bringing IPv6 up to par with IPv4 support in the
bonding driver. In addition, bounds checking has been added to all
transmit hashing functions.

The algorithm chosen (xor'ing the bottom three quads of the source and
destination addresses together, then xor'ing each byte of that result into
the bottom byte, finally xor'ing with the last bytes of the MAC addresses)
was selected after testing almost 400,000 unique IPv6 addresses harvested
from server logs. This algorithm had the most even distribution for both
big- and little-endian architectures while still using few instructions. Its
behavior also attempts to closely match that of the IPv4 algorithm.

The IPv6 flow label was intentionally not included in the hash as it appears
to be unset in the vast majority of IPv6 traffic sampled, and the current
algorithm not using the flow label already offers a very even distribution.

Fragmented IPv6 packets are handled the same way as fragmented IPv4 packets,
ie, they are not balanced based on layer 4 information. Additionally,
IPv6 packets with intermediate headers are not balanced based on layer
4 information. In practice these intermediate headers are not common and
this should not cause any problems, and the alternative (a packet-parsing
loop and look-up table) seemed slow and complicated for little gain.

Tested-by: John Eaglesham <linux@8192.net>
Signed-off-by: John Eaglesham <linux@8192.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
e15c3c2294605f09f9b336b2f3b97086ab4b8145 10-Aug-2012 Amerigo Wang <amwang@redhat.com> netpoll: check netpoll tx status on the right device

Although this doesn't matter actually, because netpoll_tx_running()
doesn't use the parameter, the code will be more readable.

For team_dev_queue_xmit() we have to move it down to avoid
compile errors.

Cc: David Miller <davem@davemloft.net>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
38e6bc185d9544dfad1774b3f8902a0b061aea25 10-Aug-2012 Amerigo Wang <amwang@redhat.com> netpoll: make __netpoll_cleanup non-block

Like the previous patch, slave_disable_netpoll() and __netpoll_cleanup()
may be called with read_lock() held too, so we should make them
non-block, by moving the cleanup and kfree() to call_rcu_bh() callbacks.

Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
47be03a28cc6c80e3aa2b3e8ed6d960ff0c5c0af 10-Aug-2012 Amerigo Wang <amwang@redhat.com> netpoll: use GFP_ATOMIC in slave_enable_netpoll() and __netpoll_setup()

slave_enable_netpoll() and __netpoll_setup() may be called
with read_lock() held, so should use GFP_ATOMIC to allocate
memory. Eric suggested to pass gfp flags to __netpoll_setup().

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
b7bc2a5b5bd99b216c3e5fe68c7f45c684ab5745 10-Aug-2012 Amerigo Wang <amwang@redhat.com> net: remove netdev_bonding_change()

I don't see any benifits to use netdev_bonding_change() than
using call_netdevice_notifiers() directly.

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
df4ab5b3c295050da09153fa9760042e4de3ffff 20-Jul-2012 Jiri Pirko <jiri@resnulli.us> net: rename bond_queue_mapping to slave_dev_queue_mapping

As this is going to be used not only by bonding.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
d40156aa5ecbd51fed932ed4813df82b56e5ff4d 20-Jul-2012 Jiri Pirko <jiri@resnulli.us> rtnl: allow to specify different num for rx and tx queue count

Also cut out unused function parameters and possible err in return
value.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
b6fe83e9525a03b3141e5857eb7d8af219db94e5 17-Jul-2012 Eric Dumazet <edumazet@google.com> bonding: refine IFF_XMIT_DST_RELEASE capability

Some workloads greatly benefit of IFF_XMIT_DST_RELEASE capability
on output net device, avoiding dirtying dst refcount.

bonding currently disables IFF_XMIT_DST_RELEASE unconditionally.

If all slaves have the IFF_XMIT_DST_RELEASE bit set, then
bonding master can also have it in its priv_flags

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
30fdd8a082a00126a6feec994e43e8dc12f5bccb 17-Jul-2012 Jiri Pirko <jiri@resnulli.us> netpoll: move np->dev and np->dev_name init into __netpoll_setup()

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
a64d49c3dd504b685f9742a2f3dcb11fb8e4345f 09-Jul-2012 Eric W. Biederman <ebiederm@xmission.com> bonding: Manage /proc/net/bonding/ entries from the netdev events

It was recently reported that moving a bonding device between network
namespaces causes warnings from /proc. It turns out after the move we
were trying to add and to remove the /proc/net/bonding entries from the
wrong network namespace.

Move the bonding /proc registration code into the NETDEV_REGISTER and
NETDEV_UNREGISTER events where the proc registration and unregistration
will always happen at the right time.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
0450243096de90ff51c3a6c605410c5e28d79f8d 13-Jun-2012 Eric Dumazet <edumazet@google.com> bonding: drop_monitor aware

When packets are dropped in TX path, its better to use kfree_skb()
instead of dev_kfree_skb() to give proper drop_monitor events.

Also move the kfree_skb() call after read_unlock() in bond_alb_xmit()
and bond_xmit_activebackup()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
de063b7040dcd9fbc9a1847fa44f0af13e19d6de 11-Jun-2012 Eric Dumazet <edumazet@google.com> bonding: remove packet cloning in recv_probe()

Cloning all packets in input path have a significant cost.

Use skb_header_pointer()/skb_copy_bits() instead of pskb_may_pull() so
that recv_probe handlers (bond_3ad_lacpdu_recv / bond_arp_rcv /
rlb_arp_recv ) dont touch input skb.

bond_handle_frame() can avoid the skb_clone()/dev_kfree_skb()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Cc: Jiri Bohac <jbohac@suse.cz>
Cc: Nicolas de Pesloüan <nicolas.2p.debian@free.fr>
Cc: Maciej Żenczykowski <maze@google.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5ee31c6898ea5537fcea160999d60dc63bc0c305 12-Jun-2012 Eric Dumazet <edumazet@google.com> bonding: Fix corrupted queue_mapping

In the transmit path of the bonding driver, skb->cb is used to
stash the skb->queue_mapping so that the bonding device can set its
own queue mapping. This value becomes corrupted since the skb->cb is
also used in __dev_xmit_skb.

When transmitting through bonding driver, bond_select_queue is
called from dev_queue_xmit. In bond_select_queue the original
skb->queue_mapping is copied into skb->cb (via bond_queue_mapping)
and skb->queue_mapping is overwritten with the bond driver queue.

Subsequently in dev_queue_xmit, __dev_xmit_skb is called which writes
the packet length into skb->cb, thereby overwriting the stashed
queue mappping. In bond_dev_queue_xmit (called from hard_start_xmit),
the queue mapping for the skb is set to the stashed value which is now
the skb length and hence is an invalid queue for the slave device.

If we want to save skb->queue_mapping into skb->cb[], best place is to
add a field in struct qdisc_skb_cb, to make sure it wont conflict with
other layers (eg : Qdiscc, Infiniband...)

This patchs also makes sure (struct qdisc_skb_cb)->data is aligned on 8
bytes :

netem qdisc for example assumes it can store an u64 in it, without
misalignment penalty.

Note : we only have 20 bytes left in (struct qdisc_skb_cb)->data[].
The largest user is CHOKe and it fills it.

Based on a previous patch from Tom Herbert.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Tom Herbert <therbert@google.com>
Cc: John Fastabend <john.r.fastabend@intel.com>
Cc: Roland Dreier <roland@kernel.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2e42e4747ea72943c21551d8a206b51a9893b1e0 09-May-2012 Joe Perches <joe@perches.com> drivers/net: Convert compare_ether_addr to ether_addr_equal

Use the new bool function ether_addr_equal to add
some clarity and reduce the likelihood for misuse
of compare_ether_addr for sorting.

Done via cocci script:

$ cat compare_ether_addr.cocci
@@
expression a,b;
@@
- !compare_ether_addr(a, b)
+ ether_addr_equal(a, b)

@@
expression a,b;
@@
- compare_ether_addr(a, b)
+ !ether_addr_equal(a, b)

@@
expression a,b;
@@
- !ether_addr_equal(a, b) == 0
+ ether_addr_equal(a, b)

@@
expression a,b;
@@
- !ether_addr_equal(a, b) != 0
+ !ether_addr_equal(a, b)

@@
expression a,b;
@@
- ether_addr_equal(a, b) == 0
+ !ether_addr_equal(a, b)

@@
expression a,b;
@@
- ether_addr_equal(a, b) != 0
+ ether_addr_equal(a, b)

@@
expression a,b;
@@
- !!ether_addr_equal(a, b)
+ ether_addr_equal(a, b)

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
13a8e0c8cdb43982372bd6c65fb26839c8fd8ce9 09-May-2012 Jiri Bohac <jbohac@suse.cz> bonding: don't increase rx_dropped after processing LACPDUs

Since commit 3aba891d, bonding processes LACP frames (802.3ad
mode) with bond_handle_frame(). Currently a copy of the skb is
made and the original is left to be processed by other
rx_handlers and the rest of the network stack by returning
RX_HANDLER_ANOTHER. As there is no protocol handler for
PKT_TYPE_LACPDU, the frame is dropped and dev->rx_dropped
increased.

Fix this by making bond_handle_frame() return RX_HANDLER_CONSUMED
if bonding has processed the LACP frame.

Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
13b95fb714d7b7996e8d423d907beea17ad5c380 26-Apr-2012 Rick Jones <rick.jones2@hp.com> bonding: bond_update_speed_duplex() can return void since no callers check its return

As none of the callers of bond_update_speed_duplex (need to) check its
return value, there is little point in it returning anything.

Signed-off-by: Rick Jones <rick.jones2@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
f31c7937c2548bfa73da5074808c29617a932e29 17-Apr-2012 Michal Kubeček <mkubecek@suse.cz> bonding: start slaves with link down for ARP monitor

Initialize slave device link state as down if ARP monitor is
active and net_carrier_ok() returns zero. Also shift initial
value of its last_arp_tx so that it doesn't immediately cause
fake detection of "up" state.

When ARP monitoring is used, initializing the slave device with
up link state can cause ARP monitor to detect link failure
before the device is really up (with igb driver, this can take
more than two seconds).

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
64d683c5825003ffb3b127057a165e6bfc26691e 13-Apr-2012 David S. Miller <davem@davemloft.net> bonding: Fixup get_tx_queue() op second arg type.

I missed this when fixing up the warning in the previous commit.

Signed-off-by: David S. Miller <davem@davemloft.net>
efacb309b50073a79ae604949a31509cd8b507ab 10-Apr-2012 stephen hemminger <shemminger@vyatta.com> rtnetlink & bonding: change args got get_tx_queues

Change get_tx_queues, drop unsused arg/return value real_tx_queues,
and use return by value (with error) rather than call by reference.

Probably bonding should just change to LLTX and the whole get_tx_queues
API could disappear!

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5a4309746cd74734daa964acb02690c22b3c8911 05-Apr-2012 Veaceslav Falico <vfalico@redhat.com> bonding: properly unset current_arp_slave on slave link up

When a slave comes up, we're unsetting the current_arp_slave without
removing active flags from it, which can lead to situations where we have
more than one slave with active flags in active-backup mode.

To avoid this situation we must remove the active flags from a slave before
removing it as a current_arp_slave.

Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
234bcf8a499ee206145c7007d12d9706a254f790 04-Apr-2012 Shlomo Pongratz <shlomop@mellanox.com> net/bonding: correctly proxy slave neigh param setup ndo function

The current implemenation was buggy for slaves who use ndo_neigh_setup,
since the networking stack invokes the bonding device ndo entry (from
neigh_params_alloc) before any devices are enslaved, and the bonding
driver can't further delegate the call at that point in time. As a
result when bonding IPoIB devices, the neigh_cleanup hasn't been called.

Fix that by deferring the actual call into the slave ndo_neigh_setup
from the time the bonding neigh_setup is called.

Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2af73d4b2afe826d23e83f3747f850eefbd867ff 04-Apr-2012 Shlomo Pongratz <shlomop@mellanox.com> net/bonding: emit address change event also in bond_release

commit 7d26bb103c4 "bonding: emit event when bonding changes MAC" didn't
take care to emit the NETDEV_CHANGEADDR event in bond_release, where bonding
actually changes the mac address (to all zeroes). As a result the neighbours
aren't deleted by the core networking code (which does so upon getting that
event).

Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7d26bb103c4162003bfdf1d63aaa32b548ad0e9a 27-Mar-2012 Weiping Pan <wpan@redhat.com> bonding: emit event when bonding changes MAC

When a bonding device is configured with fail_over_mac=active,
we expect to see the MAC address of the new active slave as the source MAC
address after failover. But we see that the source MAC address is the MAC
address of previous active slave.

Emit NETDEV_CHANGEADDR event when bonding changes its MAC address, in order
to let arp_netdev_event flush neighbour cache and route cache.

How to reproduce this bug ?

-----------hostB----------------
hostA ----- switch ---|-- eth0--bond0(192.168.100.2/24)|
(192.168.100.1/24 \--|-- eth1-/ |
--------------------------------

1 on hostB,
modprobe bonding mode=1 miimon=500 fail_over_mac=active downdelay=1000
num_grat_arp=1
ifconfig bond0 192.168.100.2/24 up
ifenslave bond0 eth0
ifenslave bond0 eth1

then eth0 is the active slave, and MAC of bond0 is MAC of eth0.

2 on hostA, ping 192.168.100.2

3 on hostB,
tcpdump -i bond0 -p icmp -XXX
you will see bond0 uses MAC of eth0 as source MAC in icmp reply.

4 on hostB,
ifconfig eth0 down
tcpdump -i bond0 -p icmp -XXX (just keep it running in step 3)
you will see first bond0 uses MAC of eth1 as source MAC in icmp
reply, then it will use MAC of eth0 as source MAC.

Signed-off-by: Weiping Pan <wpan@redhat.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9ffc93f203c18a70623f21950f1dd473c9ec48cd 28-Mar-2012 David Howells <dhowells@redhat.com> Remove all #inclusions of asm/system.h

Remove all #inclusions of asm/system.h preparatory to splitting and killing
it. Performed with the following command:

perl -p -i -e 's!^#\s*include\s*<asm/system[.]h>.*\n!!' `grep -Irl '^#\s*include\s*<asm/system[.]h>' *`

Signed-off-by: David Howells <dhowells@redhat.com>
eaddcd76903c28e84bb452a35835babb0800a2c4 22-Mar-2012 Andy Gospodarek <andy@greyhouse.net> bonding: remove entries for master_ip and vlan_ip and query devices instead

The following patch aimed to resolve an issue where secondary, tertiary,
etc. addresses added to bond interfaces could overwrite the
bond->master_ip and vlan_ip values.

commit 917fbdb32f37e9a93b00bb12ee83532982982df3
Author: Henrik Saavedra Persson <henrik.e.persson@ericsson.com>
Date: Wed Nov 23 23:37:15 2011 +0000

bonding: only use primary address for ARP

That patch was good because it prevented bonds using ARP monitoring from
sending frames with an invalid source IP address. Unfortunately, it
didn't always work as expected.

When using an ioctl (like ifconfig does) to set the IP address and
netmask, 2 separate ioctls are actually called to set the IP and netmask
if the mask chosen doesn't match the standard mask for that class of
address. The first ioctl did not have a mask that matched the one in
the primary address and would still cause the device address to be
overwritten. The second ioctl that was called to set the mask would
then detect as secondary and ignored, but the damage was already done.

This was not an issue when using an application that used netlink
sockets as the setting of IP and netmask came down at once. The
inconsistent behavior between those two interfaces was something that
needed to be resolved.

While I was thinking about how I wanted to resolve this, Ralf Zeidler
came with a patch that resolved this on a RHEL kernel by keeping a full
shadow of the entries in dev->ifa_list for the bonding device and vlan
devices in the bonding driver. I didn't like the duplication of the
list as I want to see the 'bonding' struct and code shrink rather than
grow, but liked the general idea.

As the Subject indicates this patch drops the master_ip and vlan_ip
elements from the 'bonding' and 'vlan_entry' structs, respectively.
This can be done because a device's address-list is now traversed to
determine the optimal source IP address for ARP requests and for checks
to see if the bonding device has a particular IP address. This code
could have all be contained inside the bonding driver, but it made more
sense to me to EXPORT and call inet_confirm_addr since it did exactly
what was needed.

I tested this and a backported patch and everything works as expected.
Ralf also helped with verification of the backported patch.

Thanks to Ralf for all his help on this.

v2: Whitespace and organizational changes based on suggestions from Jay
Vosburgh and Dave Miller.

v3: Fixup incorrect usage of rcu_read_unlock based on Dave Miller's
suggestion.

Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
CC: Ralf Zeidler <ralf.zeidler@nsn.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1c3ac4289a0e4d60cbd4787b4a91de4a0c785df1 17-Mar-2012 Peter Pan(潘卫平) <panweiping3@gmail.com> bonding: send igmp report for its master

Liang Zheng(lzheng@redhat.com) found that in the following topo,
bonding does not send igmp report when we trigger a fail-over of bonding.

eth0--
|-- bond0 -- br0
eth1--

modprobe bonding mode=1 miimon=100 resend_igmp=10
ifconfig bond0 up
ifenslave bond0 eth0 eth1

brctl addbr br0
ifconfig br0 192.168.100.2/24 up
brctl addif br0 bond0

Add 192.168.100.2(br0) into a multicast group, like 224.10.10.10,
then trigger a fali-over in bonding.
You can see that parameter "resend_igmp" does not work.

The reason is that when we add br0 into a multicast group,
it does not propagate multicast knowledge down to its ports.

If we choose to propagate multicast knowledge down to all ports for bridge,
then we have to track every change that is done to bridge, and keep a backup
for all ports. It is hard to track, I think.

Instead I choose to modify bonding to send igmp report for its master.

Changelog:
V2: correct comments
V3: move this check into bond_resend_igmp_join_requests()
V4: only send igmp reports if bond is enslaved to a bridge

Signed-off-by: Weiping Pan <panweiping3@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
f7d9821a6a9c83450ac35e76d3709e32fd38b76f 31-Dec-2011 stephen hemminger <shemminger@vyatta.com> bonding: fix error handling if slave is busy (v2)

If slave device already has a receive handler registered, then the
error unwind of bonding device enslave function is broken.

The following will leave a pointer to freed memory in the slave
device list, causing a later kernel panic.
# modprobe dummy
# ip li add dummy0-1 link dummy0 type macvlan
# modprobe bonding
# echo +dummy0 >/sys/class/net/bond0/bonding/slaves

The fix is to detach the slave (which removes it from the list)
in the unwind path.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Reviewed-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
87002b03baabd2b8f6281ab6411ed88d24958de1 08-Dec-2011 Jiri Pirko <jpirko@redhat.com> net: introduce vlan_vid_[add/del] and use them instead of direct [add/kill]_vid ndo calls

This patch adds wrapper for ndo_vlan_rx_add_vid/ndo_vlan_rx_kill_vid
functions. Check for NETIF_F_HW_VLAN_FILTER feature is done in this
wrapper.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8e586137e6b63af1e881b328466ab5ffbe562510 09-Dec-2011 Jiri Pirko <jpirko@redhat.com> net: make vlan ndo_vlan_rx_[add/kill]_vid return error value

Let caller know the result of adding/removing vlan id to/from vlan
filter.

In some drivers I make those functions to just return 0. But in those
where there is able to see if hw setup went correctly, return value is
set appropriately.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
917fbdb32f37e9a93b00bb12ee83532982982df3 24-Nov-2011 Henrik Saavedra Persson <henrik.e.persson@ericsson.com> bonding: only use primary address for ARP

Only use the primary address of the bond device
for master_ip. This will prevent changing the ARP source
address in Active-Backup mode whenever a secondry address
is added to the bond device.

Signed-off-by: Henrik Saavedra Persson <henrik.e.persson@ericsson.com>
Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@drr.davemloft.net>
34324dc2bf27c1773045fea63cb11f7e2a6ad2b9 15-Nov-2011 Michał Mirosław <mirq-linux@rere.qmqm.pl> net: remove NETIF_F_NO_CSUM feature bit

Only distinct use is checking if NETIF_F_NOCACHE_COPY should be
enabled by default. The check heuristics is altered a bit here,
so it hits other people than before. The default shouldn't be
trusted for performance-critical cases anyway.

For all other uses NETIF_F_NO_CSUM is equivalent to NETIF_F_HW_CSUM.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
c8f44affb7244f2ac3e703cab13d55ede27621bb 15-Nov-2011 Michał Mirosław <mirq-linux@rere.qmqm.pl> net: introduce and use netdev_features_t for device features sets

v2: add couple missing conversions in drivers
split unexporting netdev_fix_features()
implemented %pNF
convert sock::sk_route_(no?)caps

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
589665f5a6008dbce1d0af2cb93e94a80bf78151 04-Nov-2011 Dan Carpenter <dan.carpenter@oracle.com> bonding: comparing a u8 with -1 is always false

slave->duplex is a u8 type so the in bond_info_show_slave() when we
check "if (slave->duplex == -1)", it's always false.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
98f41f694f46085fda475cdee8cc0b6d2c5e6f1f 31-Oct-2011 Weiping Pan <wpan@redhat.com> bonding:update speed/duplex for NETDEV_CHANGE

Zheng Liang(lzheng@redhat.com) found a bug that if we config bonding with
arp monitor, sometimes bonding driver cannot get the speed and duplex from
its slaves, it will assume them to be 100Mb/sec and Full, please see
/proc/net/bonding/bond0.
But there is no such problem when uses miimon.

(Take igb for example)
I find that the reason is that after dev_open() in bond_enslave(),
bond_update_speed_duplex() will call igb_get_settings()
, but in that function,
it runs ethtool_cmd_speed_set(ecmd, -1); ecmd->duplex = -1;
because igb get an error value of status.
So even dev_open() is called, but the device is not really ready to get its
settings.

Maybe it is safe for us to call igb_get_settings() only after
this message shows up, that is "igb: p4p1 NIC Link is Up 1000 Mbps Full Duplex,
Flow Control: RX".

So I prefer to update the speed and duplex for a slave when reseices
NETDEV_CHANGE/NETDEV_UP event.

Changelog
V2:
1 remove the "fake 100/Full" logic in bond_update_speed_duplex(),
set speed and duplex to -1 when it gets error value of speed and duplex.
2 delete the warning in bond_enslave() if bond_update_speed_duplex() returns
error.
3 make bond_info_show_slave() handle bad values of speed and duplex.

Signed-off-by: Weiping Pan <wpan@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
e6d265e8504ab4a3368b8645d318b344ee88b280 28-Oct-2011 Jay Vosburgh <fubar@us.ibm.com> bonding: eliminate bond_close race conditions

This patch resolves two sets of race conditions.

Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com> reported the
first, as follows:

The bond_close() calls cancel_delayed_work() to cancel delayed works.
It, however, cannot cancel works that were already queued in workqueue.
The bond_open() initializes work->data, and proccess_one_work() refers
get_work_cwq(work)->wq->flags. The get_work_cwq() returns NULL when
work->data has been initialized. Thus, a panic occurs.

He included a patch that converted the cancel_delayed_work calls
in bond_close to flush_delayed_work_sync, which eliminated the above
problem.

His patch is incorporated, at least in principle, into this
patch. In this patch, we use cancel_delayed_work_sync in place of
flush_delayed_work_sync, and also convert bond_uninit in addition to
bond_close.

This conversion to _sync, however, opens new races between
bond_close and three periodically executing workqueue functions:
bond_mii_monitor, bond_alb_monitor and bond_activebackup_arp_mon.

The race occurs because bond_close and bond_uninit are always
called with RTNL held, and these workqueue functions may acquire RTNL to
perform failover-related activities. If bond_close or bond_uninit is
waiting in cancel_delayed_work_sync, deadlock occurs.

These deadlocks are resolved by having the workqueue functions
acquire RTNL conditionally. If the rtnl_trylock() fails, the functions
reschedule and return immediately. For the cases that are attempting to
perform link failover, a delay of 1 is used; for the other cases, the
normal interval is used (as those activities are not as time critical).

Additionally, the bond_mii_monitor function now stores the delay
in a variable (mimicing the structure of activebackup_arp_mon).

Lastly, all of the above renders the kill_timers sentinel moot,
and therefore it has been removed.

Tested-by: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
59fdaca9a4497ada47328d7b4b406b98a6f1c1a6 24-Oct-2011 Maciej Żenczykowski <maze@google.com> net: make bonding slaves honour master's skb->priority

Signed-off-by: Maciej Żenczykowski <maze@google.com>
Acked-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4c22400ab64d434a00ecbe0c655a16956c902aa8 12-Oct-2011 Eric W. Biederman <ebiederm@xmission.com> bonding: Use a per netns implementation of /sys/class/net/bonding_masters.

This fixes a network namespace misfeature that bonding_masters looked at
current instead of the remembering the context where in which
/sys/class/net/bonding_masters was opened in to see which network
namespace to act upon.

This removes the need for sysfs to handle tagged directories with
untagged members allowing for a conceptually simpler sysfs
implementation.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
4d97480b1806e883eb1c7889d4e7a87e936e06d9 12-Oct-2011 Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com> bonding: use local function pointer of bond->recv_probe in bond_handle_frame

The bond->recv_probe is called in bond_handle_frame() when
a packet is received, but bond_close() sets it to NULL. So,
a panic occurs when both functions work in parallel.

Why this happen:
After null pointer check of bond->recv_probe, an sk_buff is
duplicated and bond->recv_probe is called in bond_handle_frame.
So, a panic occurs when bond_close() is called between the
check and call of bond->recv_probe.

Patch:
This patch uses a local function pointer of bond->recv_probe
in bond_handle_frame(). So, it can avoid the null pointer
dereference.

Signed-off-by: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com>
Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: WANG Cong <xiyou.wangcong@gmail.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
a0db2dad0935e798973bb79676e722b82f177206 23-Sep-2011 Andy Gospodarek <andy@greyhouse.net> bonding: properly stop queuing work when requested

During a test where a pair of bonding interfaces using ARP monitoring
were both brought up and torn down (with an rmmod) repeatedly, a panic
in the timer code was noticed. I tracked this down and determined that
any of the bonding functions that ran as workqueue handlers and requeued
more work might not properly exit when the module was removed.

There was a flag protected by the bond lock called kill_timers that is
set when the interface goes down or the module is removed, but many of
the functions that monitor link status now unlock the bond lock to take
rtnl first. There is a chance that another CPU running the rmmod could
get the lock and set kill_timers after the first check has passed.

This patch does not allow any function to queue work that will make
itself run unless kill_timers is not set. I also noticed while doing
this work that bond_resend_igmp_join_requests did not have a check for
kill_timers, so I added the needed call there as well.

Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Reported-by: Liang Zheng <lzheng@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4bc71cb983fd2844e603bf633df2bb53385182d2 03-Sep-2011 Jiri Pirko <jpirko@redhat.com> net: consolidate and fix ethtool_ops->get_settings calling

This patch does several things:
- introduces __ethtool_get_settings which is called from ethtool code and
from drivers as well. Put ASSERT_RTNL there.
- dev_ethtool_get_settings() is replaced by __ethtool_get_settings()
- changes calling in drivers so rtnl locking is respected. In
iboe_get_rate was previously ->get_settings() called unlocked. This
fixes it. Also prb_calc_retire_blk_tmo() in af_packet.c had the same
problem. Also fixed by calling __dev_get_by_index() instead of
dev_get_by_index() and holding rtnl_lock for both calls.
- introduces rtnl_lock in bnx2fc_vport_create() and fcoe_vport_create()
so bnx2fc_if_create() and fcoe_if_create() are called locked as they
are from other places.
- use __ethtool_get_settings() in bonding code

Signed-off-by: Jiri Pirko <jpirko@redhat.com>

v2->v3:
-removed dev_ethtool_get_settings()
-added ASSERT_RTNL into __ethtool_get_settings()
-prb_calc_retire_blk_tmo - use __dev_get_by_index() and lock
around it and __ethtool_get_settings() call
v1->v2:
add missing export_symbol
Reviewed-by: Ben Hutchings <bhutchings@solarflare.com> [except FCoE bits]
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
afc4b13df143122f99a0eb10bfefb216c2806de0 16-Aug-2011 Jiri Pirko <jpirko@redhat.com> net: remove use of ndo_set_multicast_list in drivers

replace it by ndo_set_rx_mode

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
d03462b999307ec5c186851ec9c5751bd5a675f7 16-Aug-2011 Jiri Pirko <jpirko@redhat.com> bonding: use ndo_change_rx_flags callback

Benefit from use of ndo_change_rx_flags in handling change of promisc
and allmulti. No need to store previous state locally.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ba3211ccd043fae3713793334d64d75bd0a1d029 15-Aug-2011 Peter Pan(潘卫平) <panweiping3@gmail.com> bonding:reset backup and inactive flag of slave

Eduard Sinelnikov (eduard.sinelnikov@gmail.com) found that if we change
bonding mode from active backup to round robin, some slaves are still keeping
"backup", and won't transmit packets.

As Jay Vosburgh(fubar@us.ibm.com) pointed out that we can work around that by
removing the bond_is_active_slave() check, because the "backup" flag is only
meaningful for active backup mode.

But if we just simply ignore the bond_is_active_slave() check,
the transmission will work fine, but we can't maintain the correct value of
"backup" flag for each slaves, though it is meaningless for other mode than
active backup.

I'd like to reset "backup" and "inactive" flag in bond_open,
thus we can keep the correct value of them.

As for bond_is_active_slave(), I'd like to prepare another patch to handle it.

V2:
Use C style comment.
Move read_lock(&bond->curr_slave_lock).
Replace restore with reset, for active backup mode, it means "restore",
but for other modes, it means "reset".

Signed-off-by: Weiping Pan <panweiping3@gmail.com>
Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
d5da4510a742bb85182dcab8a47f63baaebb5ec3 10-Aug-2011 Jiri Pirko <jpirko@redhat.com> bonding: implement get_tx_queues rtnk_link_op

If bonding device is created via rtnl, it is created with default number
of rx/tx queues. This patch implements callback in bonding so the
correct value (previously specified by bonding module param) is used.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
b2730f4f842b987c818023a8003e6426cf996985 27-Jul-2011 Andy Gospodarek <andy@greyhouse.net> bonding: reduce noise during init

On Tue, Jul 26, 2011 at 05:40:27PM -0700, Joe Perches wrote:
> On Tue, 2011-07-26 at 17:37 -0700, Jay Vosburgh wrote:
> > Joe Perches <joe@perches.com> wrote:
> > >I'd prefer you don't separate the format string
> > >into multiple pieces.
> > Why not? To me, it looks easier to read split into sections
> > that don't wrap lines.
>
> Harder to grep for a dmesg and the
> defect rate of these split formats is
> typically higher than single strings
> because of bad spacing between string
> segments.
>

I noticed that you took some time back in late 2009 to 'consolidate' the
split format-strings present in the bonding driver at the time and I've
decided I'm fine to leave them the way they are. The main point of my
patch was to change the output and I would like to get that included.
Here is my updated patch...

Subject: [PATCH net-next-2.6 v2] bonding: reduce noise during init

Many are using sysfs to configure bonding rather than module options, so
there is no need for bonding to throw this warning in normal cases.

Keep the message around when debugging is enabled as it might be useful
for someone desperate enough to enable debugging, but eliminate it
otherwise.

Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
550fd08c2cebad61c548def135f67aba284c6162 26-Jul-2011 Neil Horman <nhorman@tuxdriver.com> net: Audit drivers to identify those needing IFF_TX_SKB_SHARING cleared

After the last patch, We are left in a state in which only drivers calling
ether_setup have IFF_TX_SKB_SHARING set (we assume that drivers touching real
hardware call ether_setup for their net_devices and don't hold any state in
their skbs. There are a handful of drivers that violate this assumption of
course, and need to be fixed up. This patch identifies those drivers, and marks
them as not being able to support the safe transmission of skbs by clearning the
IFF_TX_SKB_SHARING flag in priv_flags

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Karsten Keil <isdn@linux-pingi.de>
CC: "David S. Miller" <davem@davemloft.net>
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: Patrick McHardy <kaber@trash.net>
CC: Krzysztof Halasa <khc@pm.waw.pl>
CC: "John W. Linville" <linville@tuxdriver.com>
CC: Greg Kroah-Hartman <gregkh@suse.de>
CC: Marcel Holtmann <marcel@holtmann.org>
CC: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
cc0e40700656b09d93b062ef6c818aa45429d09a 20-Jul-2011 Jiri Pirko <jpirko@redhat.com> bonding: do vlan cleanup

Now when all devices are cleaned up, bond can be cleaned up as well

- remove bond->vlgrp
- remove bond_vlan_rx_register
- substitute necessary occurences of vlan_group_get_device

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
62f2a3a48bdc99822a24356e667e52c30df287c9 13-Jul-2011 Michał Mirosław <mirq-linux@rere.qmqm.pl> net: remove NETIF_F_ALL_TX_OFFLOADS

There is no software fallback implemented for SCTP or FCoE checksumming,
and so it should not be passed on by software devices like bridge or bonding.

For VLAN devices, this is different. First, the driver for underlying device
should be prepared to get offloaded packets even when the feature is disabled
(especially if it advertises it in vlan_features). Second, devices under
VLANs do not get replaced without tearing down the VLAN first.

This fixes a mess I accidentally introduced while converting bonding to
ndo_fix_features.

NETIF_F_SOFT_FEATURES are removed from BOND_VLAN_FEATURES because they
are unused as of commit 712ae51afd.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
655f8919d549ad1872e24d826b6ce42530516d2e 22-Jun-2011 stephen hemminger <shemminger@vyatta.com> bonding: add min links parameter to 802.3ad

This adds support for a configuring the minimum number of links that
must be active before asserting carrier. It is similar to the Cisco
EtherChannel min-links feature. This allows setting the minimum number
of member ports that must be up (link-up state) before marking the
bond device as up (carrier on). This is useful for situations where
higher level services such as clustering want to ensure a minimum
number of low bandwidth links are active before switchover.

See:
http://bugzilla.vyatta.com/show_bug.cgi?id=7196

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
56f8a75c17abb854b5907f4a815dc4c3f186ba11 22-Jun-2011 Paul Gortmaker <paul.gortmaker@windriver.com> ip: introduce ip_is_fragment helper inline function

There are enough instances of this:

iph->frag_off & htons(IP_MF | IP_OFFSET)

that a helper function is probably warranted.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
cefa9993f161c1c2b6b91b7ea2e84a9bfbd43d2e 20-Jun-2011 WANG Cong <amwang@redhat.com> netpoll: copy dev name of slaves to struct netpoll

Otherwise we will not see the name of the slave dev in error
message:

[ 388.469446] (null): doesn't support polling, aborting.

Signed-off-by: WANG Cong <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
830a9c75514b477994fd3847f72654d3dbdfa5ca 10-Jun-2011 Jiri Bohac <jbohac@suse.cz> bonding: clean up bond_del_vlan()

1) the setting of NETIF_F_VLAN_CHALLENGED in bond_del_vlan() is
useless since commit b2a103e6 because bond_fix_features() now
sets NETIF_F_VLAN_CHALLENGED whenever the last slave is being
removed.

2) the code never triggers anyway as vlan_list is never empty
since ad1afb00.

Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
56d00c677de0a6285587af4f6c6f10aef3209f9f 08-Jun-2011 Peter Pan(潘卫平) <panweiping3@gmail.com> bonding:delete lacp_fast from ad_bond_info

These is also a bug, that if you modify lacp_rate via sysfs,
and add new slaves in bonding, new slaves won't use the latest lacp_rate,
since ad_bond_info->lacp_fast is initialized only once,
in bond_3ad_initialize().

Since both struct bond_params and ad_bond_info have lacp_fast,
they are duplicate and need extra synchronization.

bond_3ad_bind_slave() can use bond_params->lacp_fast to initialize port.
So we can just remove lacp_fast from struct ad_bond_info.

Signed-off-by: Weiping Pan <panweiping3@gmail.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
374eeb5a9d77ea719c5c46f4d70226623f4528ce 03-Jun-2011 Neil Horman <nhorman@tuxdriver.com> bonding: reset queue mapping prior to transmission to physical device (v5)

The bonding driver is multiqueue enabled, in which each queue represents a slave
to enable optional steering of output frames to given slaves against the default
output policy. However, it needs to reset the skb->queue_mapping prior to
queuing to the physical device or the physical slave (if it is multiqueue) could
wind up transmitting on an unintended tx queue

Change Notes:
v2) Based on first pass review, updated the patch to restore the origional queue
mapping that was found in bond_select_queue, rather than simply resetting to
zero. This preserves the value of queue_mapping when it was set on receive in
the forwarding case which is desireable.

v3) Fixed spelling an casting error in skb->cb

v4) fixed to store raw queue_mapping to avoid double decrement

v5) Eric D requested that ->cb access be wrapped in a macro.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6f92c66f7190b1677ea666249b72298723392115 01-Jun-2011 Jiri Pirko <jpirko@redhat.com> bonding: allow all slave speeds

No need to check for 10, 100, 1000, 10000 explicitly. Just make this
generic and check for invalid values only (similar check is in ethtool
userspace app). This enables correct speed handling for slave devices
with "nonstandard" speeds.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Reviewed-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
90e62474fd08e16ba5309886c801243b0eb782f3 25-May-2011 Andy Gospodarek <andy@greyhouse.net> bonding: cleanup module option descriptions

Weiping Pan noticed that the module option description for
xmit_hash_policy was incorrect and was nice enough to post a patch to
fix it. The text was correct, but created a line over 80 characters and
I would rather not add those. I realized I could take a few minutes and
clean up all the descriptions and things would look much better. This
is the result.

Based on patch from Weiping Pan <panweiping3@gmail.com>.

Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
CC: Weiping Pan <panweiping3@gmail.com>
Reviewed-by: Weiping Pan <panweiping3@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
94265cf5f731c7df29fdfde262ca3e6d51e6828c 25-May-2011 Flavio Leitner <fbl@redhat.com> bonding: documentation and code cleanup for resend_igmp

Improves the documentation about how IGMP resend parameter
works, fix two missing checks and coding style issues.

Signed-off-by: Flavio Leitner <fbl@redhat.com>
Acked-by: Rick Jones <rick.jones2@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9fe0617d9b6d21f700ee9e658e1c9fe3be2fb402 25-May-2011 Neil Horman <nhorman@tuxdriver.com> bonding: prevent deadlock on slave store with alb mode (v3)

This soft lockup was recently reported:

[root@dell-per715-01 ~]# echo +bond5 > /sys/class/net/bonding_masters
[root@dell-per715-01 ~]# echo +eth1 > /sys/class/net/bond5/bonding/slaves
bonding: bond5: doing slave updates when interface is down.
bonding bond5: master_dev is not up in bond_enslave
[root@dell-per715-01 ~]# echo -eth1 > /sys/class/net/bond5/bonding/slaves
bonding: bond5: doing slave updates when interface is down.

BUG: soft lockup - CPU#12 stuck for 60s! [bash:6444]
CPU 12:
Modules linked in: bonding autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc
be2d
Pid: 6444, comm: bash Not tainted 2.6.18-262.el5 #1
RIP: 0010:[<ffffffff80064bf0>] [<ffffffff80064bf0>]
.text.lock.spinlock+0x26/00
RSP: 0018:ffff810113167da8 EFLAGS: 00000286
RAX: ffff810113167fd8 RBX: ffff810123a47800 RCX: 0000000000ff1025
RDX: 0000000000000000 RSI: ffff810123a47800 RDI: ffff81021b57f6f8
RBP: ffff81021b57f500 R08: 0000000000000000 R09: 000000000000000c
R10: 00000000ffffffff R11: ffff81011d41c000 R12: ffff81021b57f000
R13: 0000000000000000 R14: 0000000000000282 R15: 0000000000000282
FS: 00002b3b41ef3f50(0000) GS:ffff810123b27940(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002b3b456dd000 CR3: 000000031fc60000 CR4: 00000000000006e0

Call Trace:
[<ffffffff80064af9>] _spin_lock_bh+0x9/0x14
[<ffffffff886937d7>] :bonding:tlb_clear_slave+0x22/0xa1
[<ffffffff8869423c>] :bonding:bond_alb_deinit_slave+0xba/0xf0
[<ffffffff8868dda6>] :bonding:bond_release+0x1b4/0x450
[<ffffffff8006457b>] __down_write_nested+0x12/0x92
[<ffffffff88696ae4>] :bonding:bonding_store_slaves+0x25c/0x2f7
[<ffffffff801106f7>] sysfs_write_file+0xb9/0xe8
[<ffffffff80016b87>] vfs_write+0xce/0x174
[<ffffffff80017450>] sys_write+0x45/0x6e
[<ffffffff8005d28d>] tracesys+0xd5/0xe0

It occurs because we are able to change the slave configuarion of a bond while
the bond interface is down. The bonding driver initializes some data structures
only after its ndo_open routine is called. Among them is the initalization of
the alb tx and rx hash locks. So if we add or remove a slave without first
opening the bond master device, we run the risk of trying to lock/unlock a
spinlock that has garbage for data in it, which results in our above softlock.

Note that sometimes this works, because in many cases an unlocked spinlock has
the raw_lock parameter initialized to zero (meaning that the kzalloc of the
net_device private data is equivalent to calling spin_lock_init), but thats not
true in all cases, and we aren't guaranteed that condition, so we need to pass
the relevant spinlocks through the spin_lock_init function.

Fix it by moving the spin_lock_init calls for the tx and rx hashtable locks to
the ndo_init path, so they are ready for use by the bond_store_slaves path.

Change notes:
v2) Based on conversation with Jay and Nicolas it seems that the ability to
enslave devices while the bond master is down should be safe to do. As such
this is an outlier bug, and so instead we'll just initalize the errant spinlocks
in the init path rather than the open path, solving the problem. We'll also
remove the warnings about the bond being down during enslave operations, since
it should be safe

v3) Fix spelling error

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: jtluka@redhat.com
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: nicolas.2p.debian@gmail.com
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
daf9209bb2c8b07ca025eac82e3d175534086c77 19-May-2011 Amerigo Wang <amwang@redhat.com> net: rename NETDEV_BONDING_DESLAVE to NETDEV_RELEASE

s/NETDEV_BONDING_DESLAVE/NETDEV_RELEASE/ as Andy suggested.

Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Cc: Neil Horman <nhorman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8d8fc29d02a33e4bd5f4fa47823c1fd386346093 19-May-2011 Amerigo Wang <amwang@redhat.com> netpoll: disable netpoll when enslave a device

V3: rename NETDEV_ENSLAVE to NETDEV_JOIN

Currently we do nothing when we enslave a net device which is running netconsole.
Neil pointed out that we may get weird results in such case, so let's disable
netpoll on the device being enslaved. I think it is too harsh to prevent
the device being ensalved if it is running netconsole.

By the way, this patch also removes the NETDEV_GOING_DOWN from netconsole
netdev notifier, because netpoll will check if the device is running or not
and we don't handle NETDEV_PRE_UP neither.

This patch is based on net-next-2.6.

Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Neil Horman <nhorman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
b2a103e6d0afa432dff66b36473c5a55b6b0376c 07-May-2011 Michał Mirosław <mirq-linux@rere.qmqm.pl> bonding: convert to ndo_fix_features

This should also fix updating of vlan_features and propagating changes to
VLAN devices on the bond.

Side effect: it allows user to force-disable some offloads on the bond
interface.

Note: NETIF_F_VLAN_CHALLENGED is managed by bond_fix_features() now.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
0693e88e6ccf615d9674548d8b924cdd9a1c976c 07-May-2011 Michał Mirosław <mirq-linux@rere.qmqm.pl> net: bonding: factor out rlock(bond->lock) in xmit path

Pull read_lock(&bond->lock) and BOND_IS_OK() to bond_start_xmit() from
mode-dependent xmit functions.

netif_running() is always true in hard_start_xmit.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
1c5cae815d19ffe02bdfda1260949ef2b1806171 30-Apr-2011 Jiri Pirko <jpirko@redhat.com> net: call dev_alloc_name from register_netdevice

Force dev_alloc_name() to be called from register_netdevice() by
dev_get_valid_name(). That allows to remove multiple explicit
dev_alloc_name() calls.

The possibility to call dev_alloc_name in advance remains.

This also fixes veth creation regresion caused by
84c49d8c3e4abefb0a41a77b25aa37ebe8d6b743

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ad246c992bea6d33c6421ba1f03e2b405792adf9 26-Apr-2011 Ben Hutchings <bhutchings@solarflare.com> ipv4, ipv6, bonding: Restore control over number of peer notifications

For backward compatibility, we should retain the module parameters and
sysfs attributes to control the number of peer notifications
(gratuitous ARPs and unsolicited NAs) sent after bonding failover.
Also, it is possible for failover to take place even though the new
active slave does not have link up, and in that case the peer
notification should be deferred until it does.

Change ipv4 and ipv6 so they do not automatically send peer
notifications on bonding failover.

Change the bonding driver to send separate NETDEV_NOTIFY_PEERS
notifications when the link is up, as many times as requested. Since
it does not directly control which protocols send notifications, make
num_grat_arp and num_unsol_na aliases for a single parameter. Bump
the bonding version number and update its documentation.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Acked-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3aba891dde3842d89ad022237b99c1ed308040b0 19-Apr-2011 Jiri Pirko <jpirko@redhat.com> bonding: move processing of recv handlers into handle_frame()

Since now when bonding uses rx_handler, all traffic going into bond
device goes thru bond_handle_frame. So there's no need to go back into
bonding code later via ptype handlers. This patch converts
original ptype handlers into "bonding receive probes". These functions
are called from bond_handle_frame and they are registered per-mode.

Note that vlan packets are also handled because they are always untagged
thanks to vlan_untag()

Note that this also allows arpmon for eth-bond-bridge-vlan topology.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7c89943236750537d26421d9bbb6f6575e2d1e1b 15-Apr-2011 Ben Hutchings <bhutchings@solarflare.com> bonding, ipv4, ipv6, vlan: Handle NETDEV_BONDING_FAILOVER like NETDEV_NOTIFY_PEERS

It is undesirable for the bonding driver to be poking into higher
level protocols, and notifiers provide a way to avoid that. This does
mean removing the ability to configure reptitition of gratuitous ARPs
and unsolicited NAs.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7d038eb6dc0e256dbcac88d52972c4ac55a78fc5 17-Apr-2011 David S. Miller <davem@davemloft.net> bonding: Fix set-but-unused variable.

The variable 'vlan_dev' is set but unused in
bond_send_gratuitous_arp(). Just kill it off.

Signed-off-by: David S. Miller <davem@davemloft.net>
5d30530efbb811f875786d788ae1c5d79547c3a4 13-Apr-2011 David Decotigny <decot@google.com> net-bonding: Adding support for throughputs larger than 65536 Mbps

This updates the bonding driver to support v2.6.27-rc3 enhancements
(b11f8d8c aka. "ethtool: Expand ethtool_cmd.speed to 32 bits") which
allow to encode the Mbps link speed on 32-bits (Max 4 Pbps) instead of
16 (Max 65536 Mbps).

This patch also attempts to compact struct slave by reordering its
fields.

Signed-off-by: David Decotigny <decot@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
65cce19c07756c2b2b51595c967dda93b0727027 13-Apr-2011 David Decotigny <decot@google.com> net-bonding: Fix minor/cosmetic type inconsistencies

The __get_link_speed() function returns a u16 value which was stored
in a u32 local variable. This patch uses the return value directly,
thus fixing that minor type consistency.

The 'duplex' field in struct slave being encoded on 8 bits, to be more
consistent we use a u8 integer (instead of u16) whenever we copy it to
local variables.

Signed-off-by: David Decotigny <decot@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
d30ee670f25ea8f265a2804e2a0a53804cac5185 13-Apr-2011 David Decotigny <decot@google.com> net-bonding: Fix minor sparse complaints

This gets rid of minor sparse complaints:
drivers/net/bonding/bond_main.c:4361:4: warning: do-while statement is not a compound statement
drivers/net/bonding/bond_main.c:243:12: warning: symbol 'bond_mode_name' was not declared. Should it be static?

Signed-off-by: David Decotigny <decot@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
c6e1a0d12ca7b4f22c58e55a16beacfb7d3d8462 05-Apr-2011 Tom Herbert <therbert@google.com> net: Allow no-cache copy from user on transmit

This patch uses __copy_from_user_nocache on transmit to bypass data
cache for a performance improvement. skb_add_data_nocache and
skb_copy_to_page_nocache can be called by sendmsg functions to use
this feature, initial support is in tcp_sendmsg. This functionality is
configurable per device using ethtool.

Presumably, this feature would only be useful when the driver does
not touch the data. The feature is turned on by default if a device
indicates that it does some form of checksum offload; it is off by
default for devices that do no checksum offload or indicate no checksum
is necessary. For the former case copy-checksum is probably done
anyway, in the latter case the device is likely loopback in which case
the no cache copy is probably not beneficial.

This patch was tested using 200 instances of netperf TCP_RR with
1400 byte request and one byte reply. Platform is 16 core AMD x86.

No-cache copy disabled:
672703 tps, 97.13% utilization
50/90/99% latency:244.31 484.205 1028.41

No-cache copy enabled:
702113 tps, 96.16% utilization,
50/90/99% latency 238.56 467.56 956.955

Using 14000 byte request and response sizes demonstrate the
effects more dramatically:

No-cache copy disabled:
79571 tps, 34.34 %utlization
50/90/95% latency 1584.46 2319.59 5001.76

No-cache copy enabled:
83856 tps, 34.81% utilization
50/90/95% latency 2508.42 2622.62 2735.88

Note especially the effect on latency tail (95th percentile).

This seems to provide a nice performance improvement and is
consistent in the tests I ran. Presumably, this would provide
the greatest benfits in the presence of an application workload
stressing the cache and a lot of transmit data happening.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
35d48903e9781975e823b359ee85c257c9ff5c1c 22-Mar-2011 Jiri Pirko <jpirko@redhat.com> bonding: fix rx_handler locking

This prevents possible race between bond_enslave and bond_handle_frame
as reported by Nicolas by moving rx_handler register/unregister.
slave->bond is added to hold pointer to master bonding sructure. That
way dev->master is no longer used in bond_handler_frame.
Also, this removes "BUG: scheduling while atomic" message

Reported-by: Nicolas de Pesloüan <nicolas.2p.debian@gmail.com>
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Tested-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
dadaa10b077133e5c03333131b82ecb13679af2b 19-Mar-2011 Nicolas de Pesloüan <nicolas.2p.debian@free.fr> bonding: fix a typo in a comment

Signed-off-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
ceda86a108671294052cbf51660097b6534672f5 14-Mar-2011 Andy Gospodarek <andy@greyhouse.net> bonding: enable netpoll without checking link status

Only slaves that are up should transmit netpoll frames, so there is no
need to check to see if a slave is up before enabling netpoll on it.
This resolves a reported failure on active-backup bonds where a slave
interface is down when netpoll was enabled.

Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Tested-by: WANG Cong <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8a4eb5734e8d1dc60a8c28576bbbdfdcc643626d 12-Mar-2011 Jiri Pirko <jpirko@redhat.com> net: introduce rx_handler results and logic around that

This patch allows rx_handlers to better signalize what to do next to
it's caller. That makes skb->deliver_no_wcard no longer needed.

kernel-doc for rx_handler_result is taken from Nicolas' patch.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Reviewed-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2d7011ca79f1a8792e04d131b8ea21db179ab917 16-Mar-2011 Jiri Pirko <jpirko@redhat.com> bonding: get rid of IFF_SLAVE_INACTIVE netdev->priv_flag

Since bond-related code was moved from net/core/dev.c into bonding,
IFF_SLAVE_INACTIVE is no longer needed. Replace is with flag "inactive"
stored in slave structure

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Reviewed-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
e30bc066ab67a4c8abcb972227ffe7c576f06a86 12-Mar-2011 Jiri Pirko <jpirko@redhat.com> bonding: wrap slave state work

transfers slave->state into slave->backup (that it's going to transfer
into bitfield. Introduce wrapper inlines to do the work with it.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Reviewed-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
0bd80dad57d82676ee484fb1f9aa4c5e8b5bc469 16-Mar-2011 Jiri Pirko <jpirko@redhat.com> net: get rid of multiple bond-related netdevice->priv_flags

Now when bond-related code is moved from net/core/dev.c into bonding
code, multiple priv_flags are not needed anymore. So let them rot.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Reviewed-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
f1c1775ac7e61950225925c949045406ffcb43de 12-Mar-2011 Jiri Pirko <jpirko@redhat.com> bonding: register slave pointer for rx_handler

Register slave pointer as rx_handler data. That would eventually prevent
need to loop over slave devices to find the right slave.

Use synchronize_net to ensure that bond_handle_frame does not get slave
structure freed when working with that.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Reviewed-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
e826eafa65c6f1f7c8db5a237556cebac57ebcc5 14-Mar-2011 Phil Oester <kernel@linuxace.com> bonding: Call netif_carrier_off after register_netdevice

Bringing up a bond interface with all network cables disconnected
does not properly set the interface as DOWN because the call to
netif_carrier_off occurs too early in bond_init. The call needs
to occur after register_netdevice has set dev->reg_state to
NETREG_REGISTERED, so that netif_carrier_off will trigger the
call to linkwatch_fire_event.

Signed-off-by: Phil Oester <kernel@linuxace.com>
Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
fd0e435b0fe85622f167b84432552885a4856ac8 14-Mar-2011 Phil Oester <kernel@linuxace.com> bonding: Incorrect TX queue offset

When packets come in from a device with >= 16 receive queues
headed out a bonding interface, syslog gets filled with this:

kernel: bond0 selects TX queue 16, but real number of TX queues is 16

because queue_mapping is offset by 1. Adjust return value
to account for the offset.

This is a revision of my earlier patch (which did not use the
skb_rx_queue_* helpers - thanks to Ben for the suggestion).
Andy submitted a similar patch which emits a pr_warning on
invalid queue selection, but I believe the log spew is
not useful. We can revisit that question in the future,
but in the interim I believe fixing the core problem is
worthwhile.

Signed-off-by: Phil Oester <kernel@linuxace.com>
Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
78fbfd8a653ca972afe479517a40661bfff6d8c3 12-Mar-2011 David S. Miller <davem@davemloft.net> ipv4: Create and use route lookup helpers.

The idea here is this minimizes the number of places one has to edit
in order to make changes to how flows are defined and used.

Signed-off-by: David S. Miller <davem@davemloft.net>
bd33acc3cc525972ac779067e98efb26516c5b94 06-Mar-2011 Amerigo Wang <amwang@redhat.com> bonding: move procfs code into bond_procfs.c

V2: Move #ifdef CONFIG_PROC_FS into bonding.h, as suggested by David.

bond_main.c is bloating, separate the procfs code out,
move them to bond_procfs.c

Signed-off-by: WANG Cong <amwang@redhat.com>
Reviewed-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
541ac7c9b30ee2ff84ad87f27e0bc069e143afb5 02-Mar-2011 Changli Gao <xiaosuo@gmail.com> bonding: COW before overwriting the destination MAC address

When there is a ptype handler holding a clone of this skb, whose
destination MAC addresse is overwritten, the owner of this handler may
get a corrupted packet.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
cca134fe784a01cf4eb9f0621b0104591b7ddfba 02-Mar-2011 Changli Gao <xiaosuo@gmail.com> bonding: remove the unused dummy functions when net poll controller isn't enabled

These two functions are only used when net poll controller is enabled.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
b23dd4fe42b455af5c6e20966b7d6959fa8352ea 02-Mar-2011 David S. Miller <davem@davemloft.net> ipv4: Make output route lookup return rtable directly.

Instead of on the stack.

Signed-off-by: David S. Miller <davem@davemloft.net>
5b2c4dd2ec12cf0e53b2bd2926f0fe2d1fbb4eda 23-Feb-2011 Jiri Pirko <jpirko@redhat.com> net: convert bonding to use rx_handler

This patch converts bonding to use rx_handler. Results in cleaner
__netif_receive_skb() with much less exceptions needed. Also
bond-specific work is moved into bond code.

Did performance test using pktgen and counting incoming packets by
iptables. No regression noted.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Reviewed-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
080e4130b1fb6a02e75149a1cccc8192e734713d 18-Feb-2011 Amerigo Wang <amwang@redhat.com> netpoll: remove IFF_IN_NETPOLL flag

V4: rebase to net-next-2.6

This patch removes the flag IFF_IN_NETPOLL, we don't need it any more since
we have netpoll_tx_running() now.

Signed-off-by: WANG Cong <amwang@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
8a8efa22f51b3c3f879d272914e3dbbc2041bf91 18-Feb-2011 Amerigo Wang <amwang@redhat.com> bonding: sync netpoll code with bridge

V4: rebase to net-next-2.6
V3: remove an useless #ifdef.

This patch unifies the netpoll code in bonding with netpoll code in bridge,
thanks to Herbert that code is much cleaner now.

Signed-off-by: WANG Cong <amwang@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
9232ecca3ecd2e32140118c8fdabd7f8fb9ef4d5 13-Feb-2011 Jiri Pirko <jpirko@redhat.com> bond: implement [add/del]_slave ops

allow enslaving/releasing using netlink interface

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1765a575334f1a232c1478accdee5c7d19f4b3e3 12-Feb-2011 Jiri Pirko <jpirko@redhat.com> net: make dev->master general

dev->master is now tightly connected to bonding driver. This patch makes
this pointer more general and ready to be used by others.

- netdev_set_master() - bond specifics moved to new function
netdev_set_bond_master()
- introduced netif_is_bond_slave() to check if device is a bonding slave

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
acd1130e8793fb150fb522da8ec51675839eb4b1 25-Jan-2011 Michał Mirosław <mirq-linux@rere.qmqm.pl> net: reduce and unify printk level in netdev_fix_features()

Reduce printk() levels to KERN_INFO in netdev_fix_features() as this will
be used by ethtool and might spam dmesg unnecessarily.

This converts the function to use netdev_info() instead of plain printk().

As a side effect, bonding and bridge devices will now log dropped features
on every slave device change.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
04ed3e741d0f133e02bed7fa5c98edba128f90e7 25-Jan-2011 Michał Mirosław <mirq-linux@rere.qmqm.pl> net: change netdev->features to u32

Quoting Ben Hutchings: we presumably won't be defining features that
can only be enabled on 64-bit architectures.

Occurences found by `grep -r` on net/, drivers/net, include/

[ Move features and vlan_features next to each other in
struct netdev, as per Eric Dumazet's suggestion -DaveM ]

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
b30532515f0a62bfe17207ab00883dd262497006 20-Jan-2011 Neil Horman <nhorman@tuxdriver.com> bonding: Ensure that we unshare skbs prior to calling pskb_may_pull

Recently reported oops:

kernel BUG at net/core/skbuff.c:813!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/devices/virtual/net/bond0/broadcast
CPU 8
Modules linked in: sit tunnel4 cpufreq_ondemand acpi_cpufreq freq_table bonding
ipv6 dm_mirror dm_region_hash dm_log cdc_ether usbnet mii serio_raw i2c_i801
i2c_core iTCO_wdt iTCO_vendor_support shpchp ioatdma i7core_edac edac_core bnx2
ixgbe dca mdio sg ext4 mbcache jbd2 sd_mod crc_t10dif mptsas mptscsih mptbase
scsi_transport_sas dm_mod [last unloaded: microcode]

Modules linked in: sit tunnel4 cpufreq_ondemand acpi_cpufreq freq_table bonding
ipv6 dm_mirror dm_region_hash dm_log cdc_ether usbnet mii serio_raw i2c_i801
i2c_core iTCO_wdt iTCO_vendor_support shpchp ioatdma i7core_edac edac_core bnx2
ixgbe dca mdio sg ext4 mbcache jbd2 sd_mod crc_t10dif mptsas mptscsih mptbase
scsi_transport_sas dm_mod [last unloaded: microcode]
Pid: 0, comm: swapper Not tainted 2.6.32-71.el6.x86_64 #1 BladeCenter HS22
-[7870AC1]-
RIP: 0010:[<ffffffff81405b16>] [<ffffffff81405b16>]
pskb_expand_head+0x36/0x1e0
RSP: 0018:ffff880028303b70 EFLAGS: 00010202
RAX: 0000000000000002 RBX: ffff880c6458ec80 RCX: 0000000000000020
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880c6458ec80
RBP: ffff880028303bc0 R08: ffffffff818a6180 R09: ffff880c6458ed64
R10: ffff880c622b36c0 R11: 0000000000000400 R12: 0000000000000000
R13: 0000000000000180 R14: ffff880c622b3000 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff880028300000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000038653452a4 CR3: 0000000001001000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffff8806649c2000, task ffff880c64f16ab0)
Stack:
ffff880028303bc0 ffffffff8104fff9 000000000000001c 0000000100000000
<0> ffff880000047d80 ffff880c6458ec80 000000000000001c ffff880c6223da00
<0> ffff880c622b3000 0000000000000000 ffff880028303c10 ffffffff81407f7a
Call Trace:
<IRQ>
[<ffffffff8104fff9>] ? __wake_up_common+0x59/0x90
[<ffffffff81407f7a>] __pskb_pull_tail+0x2aa/0x360
[<ffffffffa0244530>] bond_arp_rcv+0x2c0/0x2e0 [bonding]
[<ffffffff814a0857>] ? packet_rcv+0x377/0x440
[<ffffffff8140f21b>] netif_receive_skb+0x2db/0x670
[<ffffffff8140f788>] napi_skb_finish+0x58/0x70
[<ffffffff8140fc89>] napi_gro_receive+0x39/0x50
[<ffffffffa01286eb>] ixgbe_clean_rx_irq+0x35b/0x900 [ixgbe]
[<ffffffffa01290f6>] ixgbe_clean_rxtx_many+0x136/0x240 [ixgbe]
[<ffffffff8140fe53>] net_rx_action+0x103/0x210
[<ffffffff81073bd7>] __do_softirq+0xb7/0x1e0
[<ffffffff810d8740>] ? handle_IRQ_event+0x60/0x170
[<ffffffff810142cc>] call_softirq+0x1c/0x30
[<ffffffff81015f35>] do_softirq+0x65/0xa0
[<ffffffff810739d5>] irq_exit+0x85/0x90
[<ffffffff814cf915>] do_IRQ+0x75/0xf0
[<ffffffff81013ad3>] ret_from_intr+0x0/0x11
<EOI>
[<ffffffff8101bc01>] ? mwait_idle+0x71/0xd0
[<ffffffff814cd80a>] ? atomic_notifier_call_chain+0x1a/0x20
[<ffffffff81011e96>] cpu_idle+0xb6/0x110
[<ffffffff814c17c8>] start_secondary+0x1fc/0x23f

Resulted from bonding driver registering packet handlers via dev_add_pack and
then trying to call pskb_may_pull. If another packet handler (like for AF_PACKET
sockets) gets called first, the delivered skb will have a user count > 1, which
causes pskb_may_pull to BUG halt when it does its skb_shared check. Fix this by
calling skb_share_check prior to the may_pull call sites in the bonding driver
to clone the skb when needed. Tested by myself and the reported successfully.

Signed-off-by: Neil Horman
CC: Andy Gospodarek <andy@greyhouse.net>
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
ffa95ed50f9fb2d8faaa6bd73086a7056ea46a06 13-Dec-2010 Ben Hutchings <bhutchings@solarflare.com> bonding: Change active slave quietly when bond is down

bond_change_active_slave() may be called when a slave is added, even
if the bond has not been brought up yet. It may then attempt to send
packets, and further it may use mcast_work which is uninitialised
before the bond is brought up. Add the necessary checks for
netif_running(bond->dev).

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8387451e558853f7b513790c0070e3b6f0c135aa 13-Dec-2010 Ben Hutchings <bhutchings@solarflare.com> bonding/vlan: Remove redundant VLAN tag insertion logic

A bond may have a mixture of slave devices with and without hardware
VLAN tag insertion capability. Therefore it always claims this
capability and performs software VLAN tag insertion if the slave does
not.

Since commit 7b9c60903714bf0a19d746b228864bad3497284e, this has
also been done by dev_hard_start_xmit(). The result is that VLAN-
tagged skbs are now double-tagged when transmitted through slave
devices without hardware VLAN tag insertion!

Remove the now-redundant logic from bond_dev_queue_xmit().

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Reviewed-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
f073c7ca29a4a7e14060d9d3ddf09bfbb7cd9cc0 09-Dec-2010 Taku Izumi <izumi.taku@jp.fujitsu.com> bonding: add the debugfs facility to the bonding driver

This patch provides the debugfs facility to the bonding driver.
The "bonding" directory is created in the debugfs root and directories of
each bonding interface (like bond0, bond1...) are created in that.

# mount -t debugfs none /sys/kernel/debug

# ls /sys/kernel/debug/bonding
bond0 bond1

Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
fb4fa76a1fa59340154c42d998d700e1f8bf21e0 06-Dec-2010 Neil Horman <nhorman@tuxdriver.com> net: Convert netpoll blocking api in bonding driver to be a counter

A while back I made some changes to enable netpoll in the bonding driver. Among
them was a per-cpu flag that indicated we were in a path that held locks which
could cause the netpoll path to block in during tx, and as such the tx path
should queue the frame for later use. This appears to have given rise to a
regression. If one of those paths on which we hold the per-cpu flag yields the
cpu, its possible for us to come back on a different cpu, leading to us clearing
a different flag than we set. This results in odd netpoll drops, and BUG
backtraces appearing in the log, as we check to make sure that we only clear set
bits, and only set clear bits. I had though briefly about changing the
offending paths so that they wouldn't sleep, but looking at my origional work
more closely, it doesn't appear that a per-cpu flag is warranted. We alrady
gate the checking of this flag on IFF_IN_NETPOLL, so we don't hit this in the
normal tx case anyway. And practically speaking, the normal use case for
netpoll is to only have one client anyway, so we're not going to erroneously
queue netpoll frames when its actually safe to do so. As such, lets just
convert that per-cpu flag to an atomic counter. It fixes the rescheduling bugs,
is equivalent from a performance perspective and actually eliminates some code
in the process.

Tested by the reporter and myself, successfully

Reported-by: Liang Zheng <lzheng@redhat.com>
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: David S. Miller <davem@davemloft.net>
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
d13a2cb63d06fe2e3067c7d40f9a5946abd614c8 01-Dec-2010 David Strand <dpstrand@gmail.com> bonding: check for assigned mac before adopting the slaves mac address

Restore the check for an unassigned mac address before adopting the
first slaves as it's own. The change in behavior was introduced by:

commit c20811a79e671a6a1fe86a8c1afe04aca8a7f085
Author: Jiri Pirko <jpirko@redhat.com>

bonding: move dev_addr cpy to bond_enslave


Signed-off-by: David Strand <dpstrand@gmail.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
866f3b25a2eb60d7529c227a0ecd80c3aba443fd 18-Nov-2010 Eric Dumazet <eric.dumazet@gmail.com> bonding: IGMP handling cleanup

Instead of iterating in_dev->mc_list from bonding driver, its better
to call a helper function provided by igmp.c
Details of implementation (locking) are private to igmp code.

ip_mc_rejoin_group(struct ip_mc_list *im) becomes
ip_mc_rejoin_groups(struct in_device *in_dev);

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3006bc38895895f1a0352c2e17e1a503f35f7e2f 18-Nov-2010 Eric Dumazet <eric.dumazet@gmail.com> bonding: fix a race in IGMP handling

RCU conversion in IGMP code done in net-next-2.6 raised a race in
__bond_resend_igmp_join_requests().

It iterates in_dev->mc_list without appropriate protection (RTNL, or
read_lock on in_dev->mc_list_lock).

Another cpu might delete an entry while we use it and trigger a fault.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
e4a7b93bd5d84e1e79917d024d17d745d190fc9a 29-Oct-2010 Eric Dumazet <eric.dumazet@gmail.com> bonding: remove dev_base_lock use

bond_info_seq_start() uses a read_lock(&dev_base_lock) to make sure
device doesn’t disappear. Same goal can be achieved using RCU.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
a71fb88145a03678fef3796930993e390db68a15 27-Oct-2010 Jarek Poplawski <jarkao2@gmail.com> bonding: Fix lockdep warning after bond_vlan_rx_register()

Fix lockdep warning:
[ 52.991402] ======================================================
[ 52.991511] [ INFO: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected ]
[ 52.991569] 2.6.36-04573-g4b60626-dirty #65
[ 52.991622] ------------------------------------------------------
[ 52.991696] ip/4842 [HC0[0]:SC0[4]:HE1:SE0] is trying to acquire:
[ 52.991758] (&bond->lock){++++..}, at: [<efe4d300>] bond_set_multicast_list+0x60/0x2c0 [bonding]
[ 52.991966]
[ 52.991967] and this task is already holding:
[ 52.992008] (&bonding_netdev_addr_lock_key){+.....}, at: [<c04e5530>] dev_mc_sync+0x50/0xa0
[ 52.992008] which would create a new lock dependency:
[ 52.992008] (&bonding_netdev_addr_lock_key){+.....} -> (&bond->lock){++++..}
[ 52.992008]
[ 52.992008] but this new dependency connects a SOFTIRQ-irq-safe lock:
[ 52.992008] (&(&mc->mca_lock)->rlock){+.-...}
[ 52.992008] ... which became SOFTIRQ-irq-safe at:
[ 52.992008] [<c0272beb>] __lock_acquire+0x96b/0x1960
[ 52.992008] [<c027415e>] lock_acquire+0x7e/0xf0
[ 52.992008] [<c05f356d>] _raw_spin_lock_bh+0x3d/0x50
[ 52.992008] [<c0584e40>] mld_ifc_timer_expire+0xf0/0x280
[ 52.992008] [<c024cee6>] run_timer_softirq+0x146/0x310
[ 52.992008] [<c024591d>] __do_softirq+0xad/0x1c0
[ 52.992008]
[ 52.992008] to a SOFTIRQ-irq-unsafe lock:
[ 52.992008] (&bond->lock){++++..}
[ 52.992008] ... which became SOFTIRQ-irq-unsafe at:
[ 52.992008] ... [<c0272c3b>] __lock_acquire+0x9bb/0x1960
[ 52.992008] [<c027415e>] lock_acquire+0x7e/0xf0
[ 52.992008] [<c05f36b8>] _raw_write_lock+0x38/0x50
[ 52.992008] [<efe4cbe4>] bond_vlan_rx_register+0x24/0x70 [bonding]
[ 52.992008] [<c0598010>] register_vlan_dev+0xc0/0x280
[ 52.992008] [<c0599f3a>] vlan_newlink+0xaa/0xd0
[ 52.992008] [<c04ed4b4>] rtnl_newlink+0x404/0x490
[ 52.992008] [<c04ece35>] rtnetlink_rcv_msg+0x1e5/0x220
[ 52.992008] [<c050424e>] netlink_rcv_skb+0x8e/0xb0
[ 52.992008] [<c04ecbac>] rtnetlink_rcv+0x1c/0x30
[ 52.992008] [<c0503bfb>] netlink_unicast+0x24b/0x290
[ 52.992008] [<c0503e37>] netlink_sendmsg+0x1f7/0x310
[ 52.992008] [<c04cd41c>] sock_sendmsg+0xac/0xe0
[ 52.992008] [<c04ceb80>] sys_sendmsg+0x130/0x230
[ 52.992008] [<c04cf04e>] sys_socketcall+0xde/0x280
[ 52.992008] [<c0202d10>] sysenter_do_call+0x12/0x36
[ 52.992008]
[ 52.992008] other info that might help us debug this:
...
[ Full info at netdev: Wed, 27 Oct 2010 12:24:30 +0200
Subject: [BUG net-2.6 vlan/bonding] lockdep splats ]

Use BH variant of write_lock(&bond->lock) (as elsewhere in bond_main)
to prevent this dependency.

Fixes commit f35188faa0fbabefac476536994f4b6f3677380f [v2.6.36]

Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jay Vosburgh <fubar@us.ibm.com>
26d8ee75e08cfca8b65ade871d68c8cd96e4ea23 15-Oct-2010 stephen hemminger <shemminger@vyatta.com> bonding: make release_and_destroy static

Only used in main file.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
379b7383413d883ffc4db55914626ca303e6f7f5 15-Oct-2010 stephen hemminger <shemminger@vyatta.com> bonding: make bond_resend_igmp_join_requests_delayed static

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Flavio Leitner <fleitner@redhat.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9ff76c951c5194d44a7cdce51d807d67fc3ae514 19-Oct-2010 Neil Horman <nhorman@tuxdriver.com> netpoll: Remove netpoll blocking from uninit path

Some recent testing in netpoll with bonding showed this backtrace

------------[ cut here ]------------
kernel BUG at drivers/net/bonding/bonding.h:134!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:1d.2/usb7/devnum
CPU 0
Pid: 1876, comm: rmmod Not tainted 2.6.36-rc3+ #10 D26928/
RIP: 0010:[<ffffffffa0514ba4>] [<ffffffffa0514ba4>] bond_uninit+0x6f4/0x7a0
RSP: 0018:ffff88003b1b5d58 EFLAGS: 00010296
RAX: ffff88003b9b6200 RBX: ffff8800373e8e00 RCX: 00000000000f4240
RDX: 00000000ffffffff RSI: 0000000000000286 RDI: 0000000000000286
RBP: ffff88003b1b5dc8 R08: 0000000000000000 R09: 00000001af7de920
R10: 0000000000000000 R11: ffff880002495e98 R12: ffff880037922700
R13: ffff880038c31000 R14: ffff880037922730 R15: 0000000000000286
FS: 00007f90e6d72700(0000) GS:ffff880002400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000346f0d9ad0 CR3: 000000003b263000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process rmmod (pid: 1876, threadinfo ffff88003b1b4000, task ffff88003b36aa80)
Stack:
00000000ffffffff ffff88003b1b5d7a ffff8800379221e8 ffff880037922000
<0> ffff88003b1b5dc8 ffffffff813eb5fb ffff88003b1b5da8 0000000031b177a3
<0> ffff88003b1b5da8 ffff880037922000 ffff88003b1b5e48 ffff88003b1b5e48
Call Trace:
[<ffffffff813eb5fb>] ? rtmsg_ifinfo+0xcb/0xf0
[<ffffffff813daad8>] rollback_registered_many+0x168/0x280
[<ffffffff813dac09>] unregister_netdevice_many+0x19/0x80
[<ffffffff813e97b3>] __rtnl_kill_links+0x63/0x90
[<ffffffff813e980b>] __rtnl_link_unregister+0x2b/0x60
[<ffffffff813e9bde>] rtnl_link_unregister+0x1e/0x30
[<ffffffffa052124b>] bonding_exit+0x37/0x51 [bonding]
[<ffffffff81098b2e>] sys_delete_module+0x19e/0x270
[<ffffffff810bb2b2>] ? audit_syscall_entry+0x252/0x280
[<ffffffff8100b0b2>] system_call_fastpath+0x16/0x1b
RIP [<ffffffffa0514ba4>] bond_uninit+0x6f4/0x7a0 [bonding]
RSP <ffff88003b1b5d58>
---[ end trace 1395ad691cea24d1 ]---

It occurs because of my recent netpoll blocking patches, which I added to avoid
recursive deadlock in the bonding driver. It relies on some per cpu bits, but
the shutdown path forces some rescheduling as we cancel workqueues for the
driver and wait for some device refcounts. If after the forced reschedule, we
wind up on a different cpu we trigger the bughalt in unblock_netpoll_tx.

The fix is to remove the netpoll block/unblock calls from bond_release_all.
This is safe to do because bond_uninit, which is called via ndo_uninit in
rollback_registered_many, doesn't occur until we send a NETDEV_UNREGISTER event,
which triggers netconsole to remove us as a netpoll client, so we are guaranteed
not to recurse into our own tx path here.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reviewed-by: WANG Cong <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
45b0cb8abdbdd425934f6b02dbb3963dd89fef55 13-Oct-2010 Neil Horman <nhorman@tuxdriver.com> bonding: Re-enable netpoll over bonding

With the inclusion of previous fixup patches, netpoll over bonding apears to
work reliably with failover conditions. This reverts Gospos previous commit
c22d7ac844f1cb9c6a5fd20f89ebadc2feef891b, and allows access again to the netpoll
functionality in the bonding driver.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
e843fa50887582c867d8b7995f81fe9c1a076806 13-Oct-2010 Neil Horman <nhorman@tuxdriver.com> bonding: Fix deadlock in bonding driver resulting from internal locking when using netpoll

The monitoring paths in the bonding driver take write locks that are shared by
the tx path. If netconsole is in use, these paths can call printk which puts us
in the netpoll tx path, which, if netconsole is attached to the bonding driver,
result in deadlock (the xmit_lock guards are useless in netpoll_send_skb, as the
monitor paths in the bonding driver don't claim the xmit_lock, nor should they).
The solution is to use a per cpu flag internal to the driver to indicate when a
cpu is holding the lock in a path that might recusrse into the tx path for the
driver via netconsole. By checking this flag on transmit, we can defer the
sending of the netconsole frames until a later time using the retransmit feature
of netpoll_send_skb that is triggered on the return code NETDEV_TX_BUSY. I've
tested this and am able to transmit via netconsole while causing failover
conditions on the bond slave links.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
c2355e1ab910278a94d487b78590ee3c8eecd08a 13-Oct-2010 Neil Horman <nhorman@tuxdriver.com> bonding: Fix bonding drivers improper modification of netpoll structure

The bonding driver currently modifies the netpoll structure in its xmit path
while sending frames from netpoll. This is racy, as other cpus can access the
netpoll structure in parallel. Since the bonding driver points np->dev to a
slave device, other cpus can inadvertently attempt to send data directly to
slave devices, leading to improper locking with the bonding master, lost frames,
and deadlocks. This patch fixes that up.

This patch also removes the real_dev pointer from the netpoll structure as that
data is really only used by bonding in the poll_controller, and we can emulate
its behavior by check each slave for IS_UP.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
dd53df265b1ee7a1fbbc76bb62c3bec2383bbd44 30-Sep-2010 Krzysztof Oledzki <ole@ans.pl> bonding: add Speed/Duplex information to /proc/net/bonding/bond

Effect:
Slave Interface: eth5
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: XX:XX:XX:XX:XX:XX
Slave queue ID: 0

Signed-off-by: Krzysztof Piotr Oledzki <ole@ans.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
546add79468183f266c75c632c96e4b0029e0d96 06-Oct-2010 Krzysztof Piotr Oledzki <ole@ans.pl> bonding: reread information about speed and duplex when interface goes up

When an interface was enslaved when it was down, bonding thinks
it has speed -1 even after it goes up. This leads into selecting
a wrong active interface in active/backup mode on mixed 10G/1G or
1G/100M environment.

before:
bonding: bond0: link status definitely up for interface eth5, 100 Mbps full duplex.
bonding: bond0: link status definitely up for interface eth0, 100 Mbps full duplex.

after:
bonding: bond0: link status definitely up for interface eth5, 10000 Mbps full duplex.
bonding: bond0: link status definitely up for interface eth0, 1000 Mbps full duplex.

Signed-off-by: Krzysztof Piotr Oledzki <ole@ans.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
700c2a779e6d5a60e2ef4716e75ea7f41546602f 06-Oct-2010 Krzysztof Piotr Oledzki <ole@ans.pl> bonding: print information about speed and duplex seen by the driver

before:
bonding: bond0: link status definitely up for interface eth5
bonding: bond0: link status definitely up for interface eth0

after:
bonding: bond0: link status definitely up for interface eth5, 100 Mbps full duplex.
bonding: bond0: link status definitely up for interface eth0, 100 Mbps full duplex.


Signed-off-by: Krzysztof Piotr Oledzki <ole@ans.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
c2952c314b4fe61820ba8fd6c949eed636140d52 05-Oct-2010 Flavio Leitner <fleitner@redhat.com> bonding: add retransmit membership reports tunable

Allow sysadmins to configure the number of multicast
membership report sent on a link failure event.

Signed-off-by: Flavio Leitner <fleitner@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5a37e8ca8536c47871d46c82211f399adf06fd44 05-Oct-2010 Flavio Leitner <fleitner@redhat.com> bonding: rejoin multicast groups on VLANs

During a failover, the IGMP membership is sent to update
the switch restoring the traffic, but it misses groups added
to VLAN devices running on top of bonding devices.

This patch changes it to iterate over all VLAN devices
on top of it sending IGMP memberships too.

Signed-off-by: Flavio Leitner <fleitner@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
27e6f065df132b5270014d3285889b15185e9da9 05-Oct-2010 Neil Horman <nhorman@tuxdriver.com> bonding: fix WARN_ON when writing to bond_master sysfs file

Fix a WARN_ON failure in bond_masters sysfs file

Got a report of this warning recently

bonding: bond0 is being created...
------------[ cut here ]------------
WARNING: at fs/proc/generic.c:590 proc_register+0x14d/0x185()
Hardware name: ProLiant BL465c G1
proc_dir_entry 'bonding/bond0' already registered
Modules linked in: bonding ipv6 tg3 bnx2 shpchp amd64_edac_mod edac_core
ipmi_si
ipmi_msghandler serio_raw i2c_piix4 k8temp edac_mce_amd hpwdt microcode hpsa
cc
iss radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded:
scsi_wai
t_scan]
Pid: 935, comm: ifup-eth Not tainted 2.6.33.5-124.fc13.x86_64 #1
Call Trace:
[<ffffffff8104b54c>] warn_slowpath_common+0x77/0x8f
[<ffffffff8104b5b1>] warn_slowpath_fmt+0x3c/0x3e
[<ffffffff8114bf0b>] proc_register+0x14d/0x185
[<ffffffff8114c20c>] proc_create_data+0x87/0xa1
[<ffffffffa0211e9b>] bond_create_proc_entry+0x55/0x95 [bonding]
[<ffffffffa0215e5d>] bond_init+0x95/0xd0 [bonding]
[<ffffffff8138cd97>] register_netdevice+0xdd/0x29e
[<ffffffffa021240b>] bond_create+0x8e/0xb8 [bonding]
[<ffffffffa021c4be>] bonding_store_bonds+0xb3/0x1c1 [bonding]
[<ffffffff812aec85>] class_attr_store+0x27/0x29
[<ffffffff8115423d>] sysfs_write_file+0x10f/0x14b
[<ffffffff81101acf>] vfs_write+0xa9/0x106
[<ffffffff81101be2>] sys_write+0x45/0x69
[<ffffffff81009b02>] system_call_fastpath+0x16/0x1b
---[ end trace a677c3f7f8b16b1e ]---
bonding: Bond creation failed.

It happens because a user space writer to bond_master can try to
register an already existing bond interface name. Fix it by teaching
bond_create to check for the existance of devices with that name first
in cases where a non-NULL name parameter has been passed in

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
e6599c2ecf18002339fe81cde1fa83b37bf26290 17-Sep-2010 Eric Dumazet <eric.dumazet@gmail.com> bonding: enable gro by default

gro can be enabled by default on bonding devices.

Actual support depends on the lower devices.

One can still use ethtool to switch off GRO if needed.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
cb32f2a0d194212e4e750a8cdedcc610c9ca4876 02-Sep-2010 Jiri Bohac <jbohac@suse.cz> bonding: Fix jiffies overflow problems (again)

The time_before_eq()/time_after_eq() functions operate on unsigned
long and only work if the difference between the two compared values
is smaller than half the range of unsigned long (31 bits on i386).

Some of the variables (slave->jiffies, dev->trans_start, dev->last_rx)
used by bonding store a copy of jiffies and may not be updated for a
long time. With HZ=1000, time_before_eq()/time_after_eq() will start
giving bad results after ~25 days.

jiffies will never be before slave->jiffies, dev->trans_start,
dev->last_rx by more than possibly a couple ticks caused by preemption
of this code. This allows us to detect/prevent these overflows by
replacing time_before_eq()/time_after_eq() with time_in_range().

Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
03dc2f4c525afb9488edb687c2e1f7057d59b40e 21-Jul-2010 Jay Vosburgh <fubar@us.ibm.com> bonding: don't lock when copying/clearing VLAN list on slave

When copying VLAN information to or removing from a slave
during slave addition or removal, the bonding code currently holds
the bond->lock for write to prevent concurrent modification of the
vlan_list / vlgrp.

This is unnecessary, as all of these operations occur under
RTNL. Holding the bond->lock also caused might_sleep issues for
some drivers' ndo_vlan_* functions. This patch removes the extra
locking.

Problem reported by Michael Chan <mchan@broadcom.com>

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Cc: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
f35188faa0fbabefac476536994f4b6f3677380f 21-Jul-2010 Jay Vosburgh <fubar@us.ibm.com> bonding: change test for presence of VLANs

After commit ad1afb00393915a51c21b1ae8704562bf036855f
("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)")
it is now regular practice for a VLAN "add vid" for VLAN 0 to
arrive prior to any VLAN registration or creation of a vlan_group.

This patch updates the bonding code that tests for the presence
of VLANs configured above bonding. The new logic tests for bond->vlgrp
to determine if a registration has occured, instead of testing that
bonding's internal vlan_list is empty.

The old code would panic when vlan_list was not empty, but
vlgrp was still NULL (because only an "add vid" for VLAN 0 had occured).

Bonding still adds VLAN 0 to its internal list so that 802.1p
frames are handled correctly on transmit when non-VLAN accelerated
slaves are members of the bond. The test against bond->vlan_list
remains in bond_dev_queue_xmit for this reason.

Modification to the bond->vlgrp now occurs under lock (in
addition to RTNL), because not all inspections of it occur under RTNL.

Additionally, because 8021q will never issue a "kill vid" for
VLAN 0, there is now logic in bond_uninit to release any remaining
entries from vlan_list.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Cc: Pedro Garcia <pedro.netdev@dondevamos.com>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
90e1795b9b18ce47e95cd26028a9cfd0f4cc35ba 19-Jul-2010 Eric Dumazet <eric.dumazet@gmail.com> bonding: avoid a warning

drivers/net/bonding/bond_main.c:179:12: warning: ‘disable_netpoll’
defined but not used

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
28172739f0a276eb8d6ca917b3974c2edb036da3 07-Jul-2010 Eric Dumazet <eric.dumazet@gmail.com> net: fix 64 bit counters on 32 bit arches

There is a small possibility that a reader gets incorrect values on 32
bit arches. SNMP applications could catch incorrect counters when a
32bit high part is changed by another stats consumer/provider.

One way to solve this is to add a rtnl_link_stats64 param to all
ndo_get_stats64() methods, and also add such a parameter to
dev_get_stats().

Rule is that we are not allowed to use dev->stats64 as a temporary
storage for 64bit stats, but a caller provided area (usually on stack)

Old drivers (only providing get_stats() method) need no changes.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
c22d7ac844f1cb9c6a5fd20f89ebadc2feef891b 25-Jun-2010 Andy Gospodarek <andy@greyhouse.net> bonding: prevent netpoll over bonded interfaces

Support for netpoll over bonded interfaces was added here:

commit f6dc31a85cd46a959bdd987adad14c3b645e03c1
Author: WANG Cong <amwang@redhat.com>
Date: Thu May 6 00:48:51 2010 -0700

bonding: make bonding support netpoll

but it is bad enough that we should probably just disable netpoll over
bonding until some of the locking logic in the bonding driver is changed
or converted completely to RCU. Simple actions like changing the active
slave in active-backup mode will hang the box if a high enough printk
debugging level is enabled.

Keeping the old code around will be good for anyone that wants to work
on it (and for after the RCU conversion), so I propose this small patch
rather than ripping it all out.

Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
be1f3c2c027cc5ad735df6a45a542ed1db7ec48b 08-Jun-2010 Ben Hutchings <bhutchings@solarflare.com> net: Enable 64-bit net device statistics on 32-bit architectures

Use struct rtnl_link_stats64 as the statistics structure.

On 32-bit architectures, insert 32 bits of padding after/before each
field of struct net_device_stats to make its layout compatible with
struct rtnl_link_stats64. Add an anonymous union in net_device; move
stats into the union and add struct rtnl_link_stats64 stats64.

Add net_device_ops::ndo_get_stats64, implementations of which will
return a pointer to struct rtnl_link_stats64. Drivers that implement
this operation must not update the structure asynchronously.

Change dev_get_stats() to call ndo_get_stats64 if available, and to
return a pointer to struct rtnl_link_stats64. Change callers of
dev_get_stats() accordingly.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
d8d1f30b95a635dbd610dcc5eb641aca8f4768cf 11-Jun-2010 Changli Gao <xiaosuo@gmail.com> net-next: remove useless union keyword

remove useless union keyword in rtable, rt6_info and dn_route.

Since there is only one member in a union, the union keyword isn't useful.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
bb1d912323d5dd50e1079e389f4e964be14f0ae3 02-Jun-2010 Andy Gospodarek <andy@greyhouse.net> bonding: allow user-controlled output slave selection

v2: changed bonding module version, modified to apply on top of changes
from previous patch in series, and updated documentation to elaborate on
multiqueue awareness that now exists in bonding driver.

This patch give the user the ability to control the output slave for
round-robin and active-backup bonding. Similar functionality was
discussed in the past, but Jay Vosburgh indicated he would rather see a
feature like this added to existing modes rather than creating a
completely new mode. Jay's thoughts as well as Neil's input surrounding
some of the issues with the first implementation pushed us toward a
design that relied on the queue_mapping rather than skb marks.
Round-robin and active-backup modes were chosen as the first users of
this slave selection as they seemed like the most logical choices when
considering a multi-switch environment.

Round-robin mode works without any modification, but active-backup does
require inclusion of the first patch in this series and setting
the 'all_slaves_active' flag. This will allow reception of unicast traffic on
any of the backup interfaces.

This was tested with IPv4-based filters as well as VLAN-based filters
with good results.

More information as well as a configuration example is available in the
patch to Documentation/networking/bonding.txt.

Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ebd8e4977a87cb81d93c62a9bff0102a9713722f 02-Jun-2010 Andy Gospodarek <andy@greyhouse.net> bonding: add all_slaves_active parameter

v2: changed parameter name from 'keep_all' to 'all_slaves_active' and
skipped setting slaves to inactive rather than creating a new flag at
Jay's suggestion.

In an effort to suppress duplicate frames on certain bonding modes
(specifically the modes that do not require additional configuration on
the switch or switches connected to the host), code was added in the
generic receive patch in 2.6.16. The current behavior works quite well
for most users, but there are some times it would be nice to restore old
functionality and allow all frames to make their way up the stack.

This patch adds support for a new module option and sysfs file called
'all_slaves_active' that will restore pre-2.6.16 functionality if the
user desires. The default value is '0' and retains existing behavior,
but the user can set it to '1' and allow all frames up if desired.

Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5206e24c2c348d739c31ebed10a741a02bbde9e0 19-May-2010 Jiri Pirko <jpirko@redhat.com> bonding: remove unused original_flags struct slave member

This is stored but never restored. So remove this as it is useless.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
c20811a79e671a6a1fe86a8c1afe04aca8a7f085 19-May-2010 Jiri Pirko <jpirko@redhat.com> bonding: move dev_addr cpy to bond_enslave

Move the code that copies slave's mac address in case that's the first slave into
bond_enslave. Ifenslave app does this also but that's not a problem. This is
something that should be done in bond_enslave, and it shound not matter from
where is it called.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
b15ba0fbdc2e54c3885fed91c54aeef7fe474033 18-May-2010 Jiri Pirko <jpirko@redhat.com> bonding: move slave MTU handling from sysfs V2

V1->V2: corrected res/ret use

For some reason, MTU handling (storing, and restoring) is taking place in
bond_sysfs. The correct place for this code is in bond_enslave, bond_release.
So move it there.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
f6dc31a85cd46a959bdd987adad14c3b645e03c1 06-May-2010 WANG Cong <amwang@redhat.com> bonding: make bonding support netpoll

Based on Andy's work, but I modified a lot.

Similar to the patch for bridge, this patch does:

1) implement the 2 methods to support netpoll for bonding;

2) modify netpoll during forwarding packets via bonding;

3) disable netpoll support of bonding when a netpoll-unabled device
is added to bonding;

4) enable netpoll support when all underlying devices support netpoll.

Cc: Andy Gospodarek <gospo@redhat.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: WANG Cong <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22bedad3ce112d5ca1eaf043d4990fa2ed698c87 01-Apr-2010 Jiri Pirko <jpirko@redhat.com> net: convert multicast list to list_head

Converts the list and the core manipulating with it to be the same as uc_list.

+uses two functions for adding/removing mc address (normal and "global"
variant) instead of a function parameter.
+removes dev_mcast.c completely.
+exposes netdev_hw_addr_list_* macros along with __hw_addr_* functions for
manipulation with lists on a sandbox (used in bonding and 80211 drivers)

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
a748ee2426817a95b1f03012d8f339c45c722ae1 01-Apr-2010 Jiri Pirko <jpirko@redhat.com> net: move address list functions to a separate file

+little renaming of unicast functions to be smooth with multicast ones

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9e2e61fbf8ad016d24e4af0afff13505f3dd2a2a 31-Mar-2010 Amerigo Wang <amwang@redhat.com> bonding: fix potential deadlock in bond_uninit()

bond_uninit() is invoked with rtnl_lock held, when it does destroy_workqueue()
which will potentially flush all works in this workqueue, if we hold rtnl_lock
again in the work function, it will deadlock.

So move destroy_workqueue() to destructor where rtnl_lock is not held any more,
suggested by Eric.

Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Jiri Pirko <jpirko@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
00ae702847df5566ce9182e9c895185e2ad1c181 31-Mar-2010 Eric Dumazet <eric.dumazet@gmail.com> bonding: bond_xmit_roundrobin() fix

Commit a2fd940f (bonding: fix broken multicast with round-robin mode)
added a problem on litle endian machines.

drivers/net/bonding/bond_main.c:4159: warning: comparison is always
false due to limited range of data type

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
a2fd940f4cff74b932728bd6ca12848da21a0234 25-Mar-2010 Andy Gospodarek <andy@greyhouse.net> bonding: fix broken multicast with round-robin mode

Round-robin (mode 0) does nothing to ensure that any multicast traffic
originally destined for the host will continue to arrive at the host when
the link that sent the IGMP join or membership report goes down. One of
the benefits of absolute round-robin transmit.

Keeping track of subscribed multicast groups for each slave did not seem
like a good use of resources, so I decided to simply send on the
curr_active slave of the bond (typically the first enslaved device that
is up). This makes failover management simple as IGMP membership
reports only need to be sent when the curr_active_slave changes. I
tested this patch and it appears to work as expected.

Originally reported by Lon Hohberger <lhh@redhat.com>.

Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
CC: Lon Hohberger <lhh@redhat.com>
CC: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2381a55c88453d3f29fe62d235579a05fc20b7b3 24-Mar-2010 Frans Pop <elendil@planet.nl> net/various: remove trailing space in messages

Signed-off-by: Frans Pop <elendil@planet.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>
32a806c194ea112cfab00f558482dd97bee5e44e 19-Mar-2010 Jiri Pirko <jpirko@redhat.com> bonding: flush unicast and multicast lists when changing type

After the type change, addresses in unicast and multicast lists wouldn't make
sense, not to mention possible different lenghts. So flush both lists here.

Note "dev_addr_discard" will be very soon replaced by "dev_mc_flush" (once
mc_list conversion will be done).

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3ca5b4042ecae5e73c59de62e4ac0db31c10e0f8 10-Mar-2010 Jiri Pirko <jpirko@redhat.com> bonding: check return value of nofitier when changing type

This patch adds the possibility to refuse the bonding type change for
other subsystems (such as for example bridge, vlan, etc.)

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
93d9b7d7a85cfb4e1711d5226eba73586dd4919f 10-Mar-2010 Jiri Pirko <jpirko@redhat.com> net: rename notifier defines for netdev type change

Since generally there could be more netdevices changing type other
than bonding, making this event type name "bonding-unrelated"

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8d6184e4881b423522136aeb3ec1cbd9c35e8813 27-Feb-2010 Patrick McHardy <kaber@trash.net> bonding: fix device leak on error in bond_create()

When the register_netdevice() call fails, the newly allocated device is
not freed.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
35cfabdc5e9b99e732899db8f36c63a215e105bc 01-Feb-2010 Ajit Khaparde <ajitkhaparde@gmail.com> bonding: Remove net_device_stats from bonding struct

There is no need to maintain stats in the bonding structure.
Use the instance of net_device_stats in netdevice.

Signed-off-by: Ajit Khaparde <ajitk@serverengines.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
b473946a0853860e13b51c28add5524741117786 26-Jan-2010 stephen hemminger <shemminger@vyatta.com> bonding: bond_open error return value

The convention for API functions in kernel is to return errno value;
bond_open would return -1 if alb setup failed. The only reason that
could happen is if kmalloc() failed.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2c8c1e7297e19bdef3c178c3ea41d898a7716e3e 17-Jan-2010 Alexey Dobriyan <adobriyan@gmail.com> net: spread __net_init, __net_exit

__net_init/__net_exit are apparently not going away, so use them
to full extent.

In some cases __net_init was removed, because it was called from
__net_exit code.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1f3c8804acba841b5573b953f5560d2683d2db0d 14-Dec-2009 Andy Gospodarek <andy@greyhouse.net> bonding: allow arp_ip_targets on separate vlans to use arp validation

This allows a bond device to specify an arp_ip_target as a host that is
not on the same vlan as the base bond device and still use arp
validation. A configuration like this, now works:

BONDING_OPTS="mode=active-backup arp_interval=1000 arp_ip_target=10.0.100.1 arp_validate=3"

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bond0 qlen 1000
link/ether 00:13:21:be:33:e9 brd ff:ff:ff:ff:ff:ff
3: eth0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bond0 qlen 1000
link/ether 00:13:21:be:33:e9 brd ff:ff:ff:ff:ff:ff
8: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue
link/ether 00:13:21:be:33:e9 brd ff:ff:ff:ff:ff:ff
inet6 fe80::213:21ff:febe:33e9/64 scope link
valid_lft forever preferred_lft forever
9: bond0.100@bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue
link/ether 00:13:21:be:33:e9 brd ff:ff:ff:ff:ff:ff
inet 10.0.100.2/24 brd 10.0.100.255 scope global bond0.100
inet6 fe80::213:21ff:febe:33e9/64 scope link
valid_lft forever preferred_lft forever

Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth1
MII Status: up
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0
ARP Polling Interval (ms): 1000
ARP IP target/s (n.n.n.n form): 10.0.100.1

Slave Interface: eth1
MII Status: up
Link Failure Count: 1
Permanent HW addr: 00:40:05:30:ff:30

Slave Interface: eth0
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:13:21:be:33:e9

Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
a4aee5c808fc5bf6889c9012217841eb3fd91a6a 14-Dec-2009 Joe Perches <joe@perches.com> drivers/net/bonding/: : use pr_fmt

Add #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
Remove DRV_NAME from pr_<level>s
Consolidate long format strings
Remove some extra tab indents
Remove some unnecessary ()s from pr_<level>s arguments
Align pr_<level> arguments

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8e95a2026f3b43f7c3d676adaccd2de9532e8dcc 03-Dec-2009 Joe Perches <joe@perches.com> drivers/net: Move && and || to end of previous line

Only files where David Miller is the primary git-signer.
wireless, wimax, ixgbe, etc are not modified.

Compile tested x86 allyesconfig only
Not all files compiled (not x86 compatible)

Added a few > 80 column lines, which I ignored.
Existing checkpatch complaints ignored.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15449745e5d181ae214ceaf0880350bb4e63512a 29-Nov-2009 Eric W. Biederman <ebiederm@xmission.com> net: Simplify the bond drivers pernet operations.

Take advantage of the new pernet automatic storage management,
and stop using compatibility network namespace functions.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
f99189b186f3922ede4fa33c02f6edc735b8c981 17-Nov-2009 Eric Dumazet <eric.dumazet@gmail.com> netns: net_identifiers should be read_mostly

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6639104bd826e0b1388c69a6b7564fffc636c8a8 30-Oct-2009 Eric W. Biederman <ebiederm@xmission.com> bond: Get the rtnl_link_ops support correct

- Don't call rtnl_link_unregister if rtnl_link_register fails
- Set .priv_size so we aren't stomping on uninitialized memory
when we use netdev_priv, on bond devices created with
ip link add type bond.

Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ec87fd3b4e111e8bc367d247a963e27e5b86df26 29-Oct-2009 Eric W. Biederman <ebiederm@aristanetworks.com> bond: Add support for multiple network namespaces

Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
88ead977109da926a03068e277869ea8fedd170d 29-Oct-2009 Eric W. Biederman <ebiederm@aristanetworks.com> bond: Implement a basic set of rtnl link ops

This implements a basic set of rtnl link ops and takes advantage of
the fact that rtnl_link_unregister kills all of the surviving
devices to all us to kill bond_free_all. A module alias
is added so ip link add can pull in the bonding module.

Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
c67dfb299e05a132154b9bfaae4a83de478ffaa9 29-Oct-2009 Eric W. Biederman <ebiederm@aristanetworks.com> bond: Simplify bond device destruction

Manually inline the code from bond_deinit to bond_uninit. bond_uninit
is the only caller and it is short.

Move the call of bond_release_all from the netdev notifier into
bond_uninit. The call site is effectively the same and performing
the call explicitly allows all the paths for destroying a
bonding device to behave the same way.

Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
30c15ba9936a17d743f90eb3e2f6fa82acddc5f3 29-Oct-2009 Eric W. Biederman <ebiederm@aristanetworks.com> bond: Simplify bond_create.

Stop calling dev_get_by_name to see if the bond device already
exists. register_netdevice already does that.

Stop calling bond_deinit if register_netdevice fails as bond_uninit
is guaranteed to be called if bond_init succeeds.

Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6151b3d435feeeae7487032fcd5c8c7f281ba05c 29-Oct-2009 Eric W. Biederman <ebiederm@aristanetworks.com> bond: Simply bond sysfs group creation

This patch delegates the work of creating the sysfs groups
to the netdev layer and ultimately to the device layer. This
closes races between uevents.

Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
d9d5283228d0c752f199c901fff6e1405dc91bcb 29-Oct-2009 Jiri Bohac <jbohac@suse.cz> bonding: fix a race condition in calls to slave MII ioctls

In mii monitor mode, bond_check_dev_link() calls the the ioctl
handler of slave devices. It stores the ndo_do_ioctl function
pointer to a static (!) ioctl variable and later uses it to call the
handler with the IOCTL macro.

If another thread executes bond_check_dev_link() at the same time
(even with a different bond, which none of the locks prevent), a
race condition occurs. If the two racing slaves have different
drivers, this may result in one driver's ioctl handler being
called with a pointer to a net_device controlled with a different
driver, resulting in unpredictable breakage.

Unless I am overlooking something, the "static" must be a
copy'n'paste error (?).

Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
a361c83cb4d7c8fe013d82a2f124175a7f276f30 23-Oct-2009 Jasper Spaans <spaans@fox-it.com> bonding: Remove bond_dev from xmit_hash_policy call.

Now that the bonding device is no longer used in determining the device to
which to send packets, it can be dropped from the argument list of the various
xmit_hash_policy calls.

Signed-off-by: Jasper Spaans <spaans@fox-it.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
d3da68310a2cf934c2ea8a99a519d8b1ccca4c56 23-Oct-2009 Jasper Spaans <spaans@fox-it.com> bonding: Modify hash transmit policies to use the packet's source MAC address

Modify bonding hash transmit policies to use the psource MAC address of
the packet instead of the MAC address configured for the bonding device.

The old sitation conflicts with the documentation.

Signed-off-by: Jasper Spaans <spaans@fox-it.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
38fc0026da255aa328c3730a1c4d28b4e11e6a2b 13-Oct-2009 Nicolas de Pesloüan <nicolas.2p.debian@free.fr> bonding: change bond_create_proc_entry() to return void

The function bond_create_proc_entry is currently of type int.

Two versions of this function exist:

The one in the ifdef CONFIG_PROC_FS branch always return 0.
The one in the else branch (which is empty) return nothing.

When CONFIG_PROC_FS is undef, this cause the following warning:

drivers/net/bonding/bond_main.c: In function `bond_create_proc_entry':
drivers/net/bonding/bond_main.c:3393: warning: control reaches end of
non-void function

No caller of this function use the returned value.

So change the returned type from int to void and remove the
useless return 0; .

Signed-off-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr>
Reported-by: Rakib Mullick <rakib.mullick@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
49b4ad92d1a5bb9909deb3216ffec6f0febc7b71 07-Oct-2009 Nicolas de Pesloüan <nicolas.2p.debian@free.fr> bonding: remove useless assignment

The variable old_active is first set to bond->curr_active_slave.
Then, it is unconditionally set to new_active, without being used in between.

The first assignment, having no side effect, is useless.

Signed-off-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr>
Reviewed-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3c6aaa24613cbd56f853363e3ce00091a9d2eac8 07-Oct-2009 Nicolas de Pesloüan <nicolas.2p.debian@free.fr> bonding: fix a parameter name in error message

When parsing module parameters, bond_check_params() erroneously use
'xor_mode' as the name of a module parameter in an error message.

The right name for this parameter is 'xmit_hash_policy'.

Signed-off-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
a549952ad323d68daf5b50bf716db895479af84c 25-Sep-2009 Jiri Pirko <jpirko@redhat.com> bonding: introduce primary_reselect option

In some cases there is not desirable to switch back to primary interface when
it's link recovers and rather stay with currently active one. We need to avoid
packetloss as much as we can in some cases. This is solved by introducing
primary_reselect option. Note that enslaved primary slave is set as current
active no matter what.

Patch modified by Jay Vosburgh as follows: fixed bug in action
after change of option setting via sysfs, revised the documentation
update, and bumped the bonding version number.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
b9f602533e2f5c32a09a3a75904e5373cb6e6377 31-Aug-2009 Jiri Pirko <jpirko@redhat.com> bonding: make ab_arp select active slaves as other modes

When I was implementing primary_passive option (formely named primary_lazy) I've
run into troubles with ab_arp. This is the only mode which is not using
bond_select_active_slave() function to select active slave and instead it
selects it itself. This seems to be not the right behaviour and it would be
better to do it in bond_select_active_slave() for all cases. This patch makes
this happen. Please review.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
75c78500ddad74b229cd0691496b8549490496a2 15-Sep-2009 Moni Shoua <monis@voltaire.com> bonding: remap muticast addresses without using dev_close() and dev_open()

This patch fixes commit e36b9d16c6a6d0f59803b3ef04ff3c22c3844c10. The approach
there is to call dev_close()/dev_open() whenever the device type is changed in
order to remap the device IP multicast addresses to HW multicast addresses.
This approach suffers from 2 drawbacks:

*. It assumes tha the device is UP when calling dev_close(), or otherwise
dev_close() has no affect. It is worth to mention that initscripts (Redhat)
and sysconfig (Suse) doesn't act the same in this matter.
*. dev_close() has other side affects, like deleting entries from the routing
table, which might be unnecessary.

The fix here is to directly remap the IP multicast addresses to HW multicast
addresses for a bonding device that changes its type, and nothing else.

Reported-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Moni Shoua <monis@voltaire.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
424efe9caf6047ffbcd6b383ff4d2347254aabf1 31-Aug-2009 Stephen Hemminger <shemminger@vyatta.com> netdev: convert pseudo drivers to netdev_tx_t

These are all drivers that don't touch real hardware.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6c9888532bb540cb692f51f1d34fe9344eed5a0d 28-Aug-2009 Petri Gynther <pgynther@google.com> bonding: Have bond_check_dev_link examine netif_running

bonding: Have bond_check_dev_link examine netif_running

Some network devices do not call netif_carrier_off when they
are set administratively down. Have the bonding link check function
also inspect the netif_running state. Ignore netif_running if the
bond_check_dev_link function is called with "reporting" set, as in that
case it's inspecting the capabilities of the non-netif_carrier device
driver.

Signed-off-by: Petri Gynther <pgynther@google.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
f584130616dfae757b888b7ee472e7c824f59e6a 28-Aug-2009 Nicolas de Pesloüan <nicolas.2p.debian@free.fr> bonding: Fix useless test: int > INT_MAX

max_bonds is of type int and cannot be greater than INT_MAX.

Signed-off-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
89c76c62f191daa7ede3d1d0c510a5ccfbcae571 28-Aug-2009 Stephen Hemminger <shemminger@vyatta.com> bonding: use compare_ether_addr

Bonding can use compare_ether_addr() in bond_release.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
278339a42a1bcef1fb448d275056d519307e6025 28-Aug-2009 Jay Vosburgh <fubar@us.ibm.com> bonding: propogate vlan_features to bonding master

Propogate the vlan_features of the slave devices to the bonding
master device, using the same logic as for regular features.

Tested by Or Gerlitz <ogerlitz@voltaire.com>, who also removed
the debug logic from the original test patch.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
e5e2a8fd8358d1b3a2c51c3248edee72e4194703 13-Aug-2009 Jiri Pirko <jpirko@redhat.com> bonding: wipe out printk's

I did not introduce new lines over 80 chars. I even eliminated some of
them.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
e36b9d16c6a6d0f59803b3ef04ff3c22c3844c10 15-Jul-2009 Moni Shoua <monis@Voltaire.COM> bonding: clean muticast addresses when device changes type

Bonding device forbids slave device of different types under the same
master.

However, it is possible for a bonding master to change type during its
lifetime. This can be either from ARPHRD_ETHER to ARPHRD_INFINIBAND
or the other way arround. The change of type requires device level
multicast address cleanup because device level multicast addresses
depend on the device type.

The patch adds a call to dev_close() before the bonding master changes
type and dev_open() just after that.

In the example below I enslaved an IPoIB device (ib0) under
bond0. Since each bonding master starts as device of type ARPHRD_ETHER
by default, a change of type occurs when ib0 is enslaved.

This is how /proc/net/dev_mcast looks like without the patch

5 bond0 1 0 00ffffffff12601bffff000000000001ff96ca05
5 bond0 1 0 01005e000116
5 bond0 1 0 01005e7ffffd
5 bond0 1 0 01005e000001
5 bond0 1 0 333300000001
6 ib0 1 0 00ffffffff12601bffff000000000001ff96ca05
6 ib0 1 0 333300000001
6 ib0 1 0 01005e000001
6 ib0 1 0 01005e7ffffd
6 ib0 1 0 01005e000116
6 ib0 1 0 00ffffffff12401bffff00000000000000000001
6 ib0 1 0 00ffffffff12601bffff00000000000000000001

and this is how it looks like after the patch.

5 bond0 1 0 00ffffffff12601bffff000000000001ff96ca05
5 bond0 1 0 00ffffffff12601bffff00000000000000000001
5 bond0 1 0 00ffffffff12401bffff0000000000000ffffffd
5 bond0 1 0 00ffffffff12401bffff00000000000000000116
5 bond0 1 0 00ffffffff12401bffff00000000000000000001
6 ib0 1 0 00ffffffff12601bffff000000000001ff96ca05
6 ib0 1 0 00ffffffff12401bffff00000000000000000116
6 ib0 1 0 00ffffffff12401bffff0000000000000ffffffd
6 ib0 2 0 00ffffffff12401bffff00000000000000000001
6 ib0 2 0 00ffffffff12601bffff00000000000000000001

Signed-off-by: Moni Shoua <monis@voltaire.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ec634fe328182a1a098585bfc7b69e5042bdb08d 06-Jul-2009 Patrick McHardy <kaber@trash.net> net: convert remaining non-symbolic return values in ndo_start_xmit() functions

This patch converts the remaining occurences of raw return values to their
symbolic counterparts in ndo_start_xmit() functions that were missed by the
previous automatic conversion.

Additionally code that assumed the symbolic value of NETDEV_TX_OK to be zero
is changed to explicitly use NETDEV_TX_OK.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
181470fcf3f8ecc16625bc45a5f6f678e57bfb22 12-Jun-2009 Stephen Hemminger <shemminger@vyatta.com> bonding: initialization rework

Need to rework how bonding devices are initialized to make it more
amenable to creating bonding devices via netlink.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
373500db927706d1f60785aff40b9884f789b01a 12-Jun-2009 Stephen Hemminger <shemminger@vyatta.com> bonding: network device names are case sensative

The bonding device acts unlike all other Linux network device functions
in that it ignores case of device names. The developer must have come
from windows!

Cleanup the management of names and use standard routines where possible.
Flag places where bonding device still doesn't work right with network
namespaces.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3d632c3f28e69f0d6d44aa09c4df708d63a91a7c 12-Jun-2009 Stephen Hemminger <shemminger@vyatta.com> bonding: fix style issues

Resolve some of the complaints from checkpatch, and remove "magic emacs format"
comments, and useless MODULE_SUPPORTED_DEVICE(). But should not
change actual code.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9e71626c1c23ec69372c43c6fe66c1171032bf42 12-Jun-2009 Stephen Hemminger <shemminger@vyatta.com> bonding: fix destructor

It is not safe to use a network device destructor that is a function in
the module, since it can be called after module is unloaded if sysfs
handle is open.

When eventually using netlink, the device cleanup code needs to be done
via uninit function.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7e0838404541d2758bee089632690aabd82f3d5d 12-Jun-2009 Stephen Hemminger <shemminger@vyatta.com> bonding: remove bonding read/write semaphore

The whole read/write semaphore locking can be removed. It doesn't add any
protection that isn't already done by using the RTNL mutex properly.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
d93216051ae60995736518ca9ebb58a0e6ade212 12-Jun-2009 Stephen Hemminger <shemminger@vyatta.com> bonding: initialize before registration

Avoid a unnecessary carrier state transistion that happens when device
is registered.
Lockdep works better if initialization is done before registration as well.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
d2991f75354941a4cdf61ce7443d21804b978f89 12-Jun-2009 Stephen Hemminger <shemminger@vyatta.com> bonding: bond_create always called with default parameters

bond_create() is always called with same parameters so move the argument
down.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ae63e808f508c38fe65e23a1480c85d5bd00ecbd 27-May-2009 Jiri Pirko <jpirko@redhat.com> bonding: use bond_is_lb() when it's appropriate

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
93f154b594fe47e4a7e5358b309add449a046cd3 19-May-2009 Eric Dumazet <dada1@cosmosbay.com> net: release dst entry in dev_hard_start_xmit()

One point of contention in high network loads is the dst_release() performed
when a transmited skb is freed. This is because NIC tx completion calls
dev_kree_skb() long after original call to dev_queue_xmit(skb).

CPU cache is cold and the atomic op in dst_release() stalls. On SMP, this is
quite visible if one CPU is 100% handling softirqs for a network device,
since dst_clone() is done by other cpus, involving cache line ping pongs.

It seems right place to release dst is in dev_hard_start_xmit(), for most
devices but ones that are virtual, and some exceptions.

David Miller suggested to define a new device flag, set in alloc_netdev_mq()
(so that most devices set it at init time), and carefuly unset in devices
which dont want a NULL skb->dst in their ndo_start_xmit().

List of devices that must clear this flag is :

- loopback device, because it calls netif_rx() and quoting Patrick :
"ip_route_input() doesn't accept loopback addresses, so loopback packets
already need to have a dst_entry attached."
- appletalk/ipddp.c : needs skb->dst in its xmit function

- And all devices that call again dev_queue_xmit() from their xmit function
(as some classifiers need skb->dst) : bonding, vlan, macvlan, eql, ifb, hdlc_fr

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9d21493b4beb8f918ba248032fefa393074a5e2b 18-May-2009 Eric Dumazet <dada1@cosmosbay.com> net: tx scalability works : trans_start

struct net_device trans_start field is a hot spot on SMP and high performance
devices, particularly multi queues ones, because every transmitter dirties
it. Is main use is tx watchdog and bonding alive checks.

But as most devices dont use NETIF_F_LLTX, we have to lock
a netdev_queue before calling their ndo_start_xmit(). So it makes
sense to move trans_start from net_device to netdev_queue. Its update
will occur on a already present (and in exclusive state) cache line, for
free.

We can do this transition smoothly. An old driver continue to
update dev->trans_start, while an updated one updates txq->trans_start.

Further patches could also put tx_bytes/tx_packets counters in
netdev_queue to avoid dirtying dev->stats (vlan device comes to mind)

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9d34d1a20e8171be819a6c8c4de4eea6104d174e 08-May-2009 Florian Westphal <fw@strlen.de> bonding: fix panic if initialization fails

If module initialisation failed (e.g. because the bonding sysfs entry
cannot be created), kernel panics:
IP: [<ffffffff8024910a>] destroy_workqueue+0x2d/0x146
Call Trace:
[<ffffffff808268c4>] bond_destructor+0x28/0x78
[<ffffffff80b64471>] netdev_run_todo+0x231/0x25a
[<ffffffff80b6dbcd>] rtnl_unlock+0x9/0xb
[<ffffffff81567907>] bonding_init+0x83e/0x84a

Remove the calls to bond_work_cancel_all() and destroy_workqueue();
both are also called/scheduled via bond_free_all().

bond_destroy_sysfs is unecessary because the sysfs entry has
not been created in the error case.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
aee64faf236815e0f337408892c01b373cd340f3 05-May-2009 Jiri Pirko <jpirko@redhat.com> bonding: get rid of CONFIG_PROC_FS ifdefs

Remove CONFIG_PROC_FS ifdefs from the code by adding void functions.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>

drivers/net/bonding/bond_main.c | 30 ++++++++++++++++++++----------
1 files changed, 20 insertions(+), 10 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
1363d9b135270662852ed2e6629fb79a36de5400 02-May-2009 Jiri Pirko <jpirko@redhat.com> bonding: correct the cleanup in bond_create()

This patch makes the cleanup in bond_create nicer :) Also now the forgotten
free_netdev is called.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
689c96cca7ec3d2ba7fba00481810f99f1803c63 23-Apr-2009 Eric Dumazet <dada1@cosmosbay.com> bonding: bond_slave_info_query() fix

bond_slave_info_query() should keep a read lock while accessing slave info,
or risk accessing stale data and corruption.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
41f8910040639eb106b1a5b5301aab79ecde4940 24-Apr-2009 Jiri Pirko <jpirko@redhat.com> bonding: ignore updelay param when there is no active slave

Pointed out by Sean E. Millichamp.

Quote from Documentation/networking/bonding.txt:
"Note that when a bonding interface has no active links, the
driver will immediately reuse the first link that goes up, even if the
updelay parameter has been specified (the updelay is ignored in this
case). If there are slave interfaces waiting for the updelay timeout
to expire, the interface that first went into that state will be
immediately reused. This reduces down time of the network if the
value of updelay has been overestimated, and since this occurs only in
cases with no connectivity, there is no additional penalty for
ignoring the updelay."

This patch actually changes the behaviour in this way.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>

drivers/net/bonding/bond_main.c | 8 ++++++++
1 files changed, 8 insertions(+), 0 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
29112f4e248ca6941f2233f6ed96a7283a67cced 24-Apr-2009 Jiri Pirko <jpirko@redhat.com> bonding: use ethtool for link checking first

This patch only changes the order of interfaces to use for checking slave link
status in bond_check_dev_link() to priorize ethtool interface. Should safe some
troubles as ethtool seems to be more supported.

Jirka

Signed-off-by: Jiri Pirko <jpirko@redhat.com>

drivers/net/bonding/bond_main.c | 26 ++++++++++++--------------
1 files changed, 12 insertions(+), 14 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
5a31bec014449dc9ca994e4c1dbf2802b7ca458a 13-Apr-2009 Brian Haley <brian.haley@hp.com> Bonding: fix zero address hole bug in arp_ip_target list

Fix a zero address hole bug in the bonding arp_ip_target list
that was causing the bond to ignore ARP replies (bugz 13006).
Instead of just setting the array entry to zero, we now
copy any additional entries down one slot, putting the
zero entry at the end. With this change we can now have
all the loops that walk the array stop when they hit a zero
since there will be no addresses after it.

Changes are based in part on code fragment provided in kernel:
bugzilla 13006:

http://bugzilla.kernel.org/show_bug.cgi?id=13006

by Steve Howard <steve@astutenetworks.com>

Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
99b76233803beab302123d243eea9e41149804f3 25-Mar-2009 Alexey Dobriyan <adobriyan@gmail.com> proc 2/2: remove struct proc_dir_entry::owner

Setting ->owner as done currently (pde->owner = THIS_MODULE) is racy
as correctly noted at bug #12454. Someone can lookup entry with NULL
->owner, thus not pinning enything, and release it later resulting
in module refcount underflow.

We can keep ->owner and supply it at registration time like ->proc_fops
and ->data.

But this leaves ->owner as easy-manipulative field (just one C assignment)
and somebody will forget to unpin previous/pin current module when
switching ->owner. ->proc_fops is declared as "const" which should give
some thoughts.

->read_proc/->write_proc were just fixed to not require ->owner for
protection.

rmmod'ed directories will be empty and return "." and ".." -- no harm.
And directories with tricky enough readdir and lookup shouldn't be modular.
We definitely don't want such modular code.

Removing ->owner will also make PDE smaller.

So, let's nuke it.

Kudos to Jeff Layton for reminding about this, let's say, oversight.

http://bugzilla.kernel.org/show_bug.cgi?id=12454

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
5a29f7893fbe681f1334285be7e41e56f0de666c 26-Mar-2009 Jiri Pirko <jpirko@redhat.com> bonding: select current active slave when enslaving device for mode tlb and alb

I've hit an issue on my system when I've been using RealTek RTL8139D cards in
bonding interface in mode balancing-alb. When I enslave a card, the current
active slave (bond->curr_active_slave) is not set and the link is therefore
not functional.

----
# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008)

Bonding Mode: adaptive load balancing
Primary Slave: None
Currently Active Slave: None
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth1
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:1f:1f:01:2f:22
----

The thing that gets it right is when I unplug the cable and then I put it back
into the NIC. Then the current active slave is set to eth1 and link is working
just fine. Here is dmesg log with bonding DEBUG messages turned on:
----
ADDRCONF(NETDEV_UP): bond0: link is not ready
event_dev: bond0, event: 1
IFF_MASTER
event_dev: bond0, event: 8
IFF_MASTER
bond_ioctl: master=bond0, cmd=35216
slave_dev=cac5d800:
slave_dev->name=eth1:
eth1: ! NETIF_F_VLAN_CHALLENGED
event_dev: eth1, event: 8
eth1: link up, 100Mbps, full-duplex, lpa 0xC5E1
event_dev: eth1, event: 1
event_dev: eth1, event: 8
IFF_SLAVE
Initial state of slave_dev is BOND_LINK_UP
bonding: bond0: enslaving eth1 as an active interface with an up link.
ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
event_dev: bond0, event: 4
IFF_MASTER
bond0: no IPv6 routers present

<<<<cable unplug>>>>

eth1: link down
event_dev: eth1, event: 4
IFF_SLAVE
bonding: bond0: link status definitely down for interface eth1, disabling it
event_dev: bond0, event: 4
IFF_MASTER

<<<<cable plug>>>>

eth1: link up, 100Mbps, full-duplex, lpa 0xC5E1
event_dev: eth1, event: 4
IFF_SLAVE
bonding: bond0: link status definitely up for interface eth1.
bonding: bond0: making interface eth1 the new active one.
event_dev: eth1, event: 8
IFF_SLAVE
event_dev: eth1, event: 8
IFF_SLAVE
bonding: bond0: first active interface up!
event_dev: bond0, event: 4
IFF_MASTER
----

The current active slave is set by calling bond_select_active_slave() function
from bond_miimon_commit() function when the slave (eth1) link goes to state up.

I also tested this on other machine with Broadcom NetXtreme II BCM5708
1000Base-T NIC and there all works fine. The thing is that this adapter is down
and goes up after few seconds after it is enslaved.

This patch calls bond_select_active_slave() in bond_enslave() function for modes
alb and tlb and makes sure that the current active slave is set up properly even
when the slave state is already up. Tested on both systems, works fine.

Notice: The same problem can maybe also occrur in mode 8023AD but I'm unable to
test that.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17d04500e2528217de5fe967599f98ee84348a9c 19-Mar-2009 Jay Vosburgh <fubar@us.ibm.com> bonding: Fix updating of speed/duplex changes

This patch corrects an omission from the following commit:

commit f0c76d61779b153dbfb955db3f144c62d02173c2
Author: Jay Vosburgh <fubar@us.ibm.com>
Date: Wed Jul 2 18:21:58 2008 -0700

bonding: refactor mii monitor

The un-refactored code checked the link speed and duplex of
every slave on every pass; the refactored code did not do so.

The 802.3ad and balance-alb/tlb modes utilize the speed and
duplex information, and require it to be kept up to date. This patch
adds a notifier check to perform the appropriate updating when the slave
device speed changes.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
72e2240f181871675d3a979766330c91d48a1673 05-Mar-2009 Patrick McHardy <kaber@trash.net> bonding: Fix device passed into ->ndo_neigh_setup().

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
54b87323eddd9b7854249f05cfd183a0ac602ab6 14-Feb-2009 Hannes Eder <hannes@hanneseder.net> drivers/net/bonding: fix sparse warning: symbol shadows an earlier one

Impact: Rename function scope variable.

Fix this sparse warning:
drivers/net/bonding/bond_main.c:4704:13: warning: symbol 'mode' shadows an earlier one
drivers/net/bonding/bond_main.c:95:13: originally declared here

Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
1f78d9f94539b8806b81057e75025f2bac7d7ccc 14-Feb-2009 Hannes Eder <hannes@hanneseder.net> drivers/net/bonding: fix sparse warnings: context imbalance

Impact: Attribute functions with __acquires(...) and/or __releases(...).

Fix this sparse warnings:
drivers/net/bonding/bond_alb.c:1675:9: warning: context imbalance in 'bond_alb_handle_active_change' - unexpected unlock
drivers/net/bonding/bond_alb.c:1742:9: warning: context imbalance in 'bond_alb_set_mac_address' - unexpected unlock
drivers/net/bonding/bond_main.c:1025:17: warning: context imbalance in 'bond_do_fail_over_mac' - unexpected unlock
drivers/net/bonding/bond_main.c:3195:13: warning: context imbalance in 'bond_info_seq_start' - wrong count at exit
drivers/net/bonding/bond_main.c:3234:13: warning: context imbalance in 'bond_info_seq_stop' - unexpected unlock

Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
4101dec9ca64d40f0d673f0a40ba46ba2c60e117 14-Jan-2009 Jan Engelhardt <jengelh@medozas.de> net: constify VFTs

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
53a3294e26c49622daa14c1d8540500f568ded99 06-Jan-2009 Stephen Hemminger <shemminger@vyatta.com> bonding: use net_device_ops

Use the correct pointer in debug message.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
b06715b7a3db551dcf4706f05e8d2285a66fe05f 26-Dec-2008 Hannes Eder <hannes@hanneseder.net> drivers/net/bonding: fix sparse warnings: move decls to header file

Fix this sparse warnings:

drivers/net/bonding/bond_main.c:104:20: warning: symbol 'bonding_defaults' was not declared. Should it be static?
drivers/net/bonding/bond_main.c:204:22: warning: symbol 'ad_select_tbl' was not declared. Should it be static?
drivers/net/bonding/bond_sysfs.c:60:21: warning: symbol 'bonding_rwsem' was not declared. Should it be static?

Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
e97fd7c6d51d8bf32ce981b853d987cfc6bdfb7f 10-Dec-2008 Holger Eitzenberger <holger@eitzenberger.org> bonding: turn all bond_parm_tbls const

Turn all bond_parm_tbls const.

Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
325dcf7a907a43f8832b92ae1c672798b4e60ce2 10-Dec-2008 Holger Eitzenberger <holger@eitzenberger.org> bonding: make tbl argument to bond_parse_parm() const

bond_parse_parm() parses a parameter table for a particular value and
is therefore not modifying the table at all. Therefore make the 2nd
argument const, thus allowing to make the tables const later.

Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5a03cdb7f2d7ff88e50153d8c3b90a1d52dca435 10-Dec-2008 Holger Eitzenberger <holger@eitzenberger.org> bonding: use pr_debug instead of own macros

Use pr_debug() instead of own macros.

Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
77afc92b7915b6bb21584474a429a04603ac8963 10-Dec-2008 Holger Eitzenberger <holger@eitzenberger.org> bonding: use table for mode names

Use a small array in bond_mode_name() for the names, thus saving some
space:

before

text data bss dec hex filename
57736 9372 344 67452 1077c drivers/net/bonding/bonding.ko

after
text data bss dec hex filename
57441 9372 344 67157 10655 drivers/net/bonding/bonding.ko

Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
58402054264fa33b405d1abcbcd8e528507aac1a 10-Dec-2008 Holger Eitzenberger <holger@eitzenberger.org> bonding: add and use bond_is_lb()

Introduce and use bond_is_lb(), it is usefull to shorten the repetitive
check for either ALB or TLB mode.

Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
244ef9b9176c7c7a095f4738d353a3a60b88097d 04-Dec-2008 Wang Chen <wangchen@cn.fujitsu.com> bond: Kill directly reference of netdev->priv

Simply replace netdev->priv with netdev_priv().

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
008298231abbeb91bc7be9e8b078607b816d1a4a 21-Nov-2008 Stephen Hemminger <shemminger@vyatta.com> netdev: add more functions to netdevice ops

This patch moves neigh_setup and hard_start_xmit into the network device ops
structure. For bisection, fix all the previously converted drivers as well.
Bonding driver took the biggest hit on this.

Added a prefetch of the hard_start_xmit in the fast path to try and reduce
any impact this would have.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
eb7cc59a038b4e1914ae991d313f35904924759f 20-Nov-2008 Stephen Hemminger <shemminger@vyatta.com> bonding: convert to net_device_ops

Convert to net_device_ops table.
Note: for some operations move error checking into generic networking
layer (rather than looking at pointers in bonding).

A couple of gratituous style cleanups to get rid of extra {}

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
eeda3fd64f75bcbfaa70ce946513abaf3f23b8e0 20-Nov-2008 Stephen Hemminger <shemminger@vyatta.com> netdev: introduce dev_get_stats()

In order for the network device ops get_stats call to be immutable, the handling
of the default internal network device stats block has to be changed. Add a new
helper function which replaces the old use of internal_get_stats.

Note: change return code to make it clear that the caller should not
go changing the returned statistics.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
454d7c9b14e20fd1949e2686e9de4a2926e01476 13-Nov-2008 Wang Chen <wangchen@cn.fujitsu.com> netdevice: safe convert to netdev_priv() #part-1

We have some reasons to kill netdev->priv:
1. netdev->priv is equal to netdev_priv().
2. netdev_priv() wraps the calculation of netdev->priv's offset, obviously
netdev_priv() is more flexible than netdev->priv.
But we cann't kill netdev->priv, because so many drivers reference to it
directly.

This patch is a safe convert for netdev->priv to netdev_priv(netdev).
Since all of the netdev->priv is only for read.
But it is too big to be sent in one mail.
I split it to 4 parts and make every part smaller than 100,000 bytes,
which is max size allowed by vger.

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
fd989c83325cb34795bc4d4aa6b13c06f90eac99 05-Nov-2008 Jay Vosburgh <fubar@us.ibm.com> bonding: alternate agg selection policies for 802.3ad

This patch implements alternative aggregator selection policies
for 802.3ad. The existing policy, now termed "stable," selects the active
aggregator by greatest bandwidth, and only reselects a new aggregator
if the active aggregator is entirely disabled (no more ports or all ports
down).

This patch adds two new policies: bandwidth and count, selecting
the active aggregator by total bandwidth (like the stable policy) or by
the number of ports in the aggregator, respectively. These two policies
also differ from the stable policy in that they will reselect the active
aggregator when availability-related changes occur in the bond (e.g.,
link state change).

This permits "gang failover" within 802.3ad, allowing redundant
aggregators along parallel paths to always maintain the "best" aggregator
as the active aggregator (rather than having to wait for the active to
entirely fail).

This patch also updates the driver version to 3.5.0.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
305d552accae6afb859c493ebc7d98ca3371dae2 05-Nov-2008 Brian Haley <brian.haley@hp.com> bonding: send IPv6 neighbor advertisement on failover

This patch adds better IPv6 failover support for bonding devices,
especially when in active-backup mode and there are only IPv6 addresses
configured, as reported by Alex Sidorenko.

- Creates a new file, net/drivers/bonding/bond_ipv6.c, for the
IPv6-specific routines. Both regular bonds and VLANs over bonds
are supported.

- Adds a new tunable, num_unsol_na, to limit the number of unsolicited
IPv6 Neighbor Advertisements that are sent on a failover event.
Default is 1.

- Creates two new IPv6 neighbor discovery functions:

ndisc_build_skb()
ndisc_send_skb()

These were required to support VLANs since we have to be able to
add the VLAN id to the skb since ndisc_send_na() and friends
shouldn't be asked to do this. These two routines are basically
__ndisc_send() split into two pieces, in a slightly different order.

- Updates Documentation/networking/bonding.txt and bumps the rev of bond
support to 3.4.0.

On failover, this new code will generate one packet:

- An unsolicited IPv6 Neighbor Advertisement, which helps the switch
learn that the address has moved to the new slave.

Testing has shown that sending just the NA results in pretty good
behavior when in active-back mode, I saw no lost ping packets for example.

Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
6cf3f41e6c08bca6641a695449791c38a25f35ff 04-Nov-2008 Jay Vosburgh <fubar@us.ibm.com> bonding, net: Move last_rx update into bonding recv logic

The only user of the net_device->last_rx field is bonding.
This patch adds a conditional update of last_rx to the bonding special
logic in skb_bond_should_drop, causing last_rx to only be updated when
the ARP monitor is running.

This frees network device drivers from the necessity of
updating last_rx, which can have cache line thrash issues.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
63779436ab4ad0867bcea53bf853b0004d7b895d 31-Oct-2008 Harvey Harrison <harvey.harrison@gmail.com> drivers: replace NIPQUAD()

Using NIPQUAD() with NIPQUAD_FMT, %d.%d.%d.%d or %u.%u.%u.%u
can be replaced with %pI4

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
a434e43f3d844192bc23bd7b408bac979c40efe7 31-Oct-2008 Jay Vosburgh <fubar@us.ibm.com> bonding: Clean up resource leaks

This patch reworks the resource free logic performed at the time
a bonding device is released. This (a) closes two resource leaks, one
for workqueues and one for multicast lists, and (b) improves commonality
of code between the "destroy one" and "destroy all" paths by performing
final free activity via destructor instead of explicitly (and differently)
in each path.

"Sean E. Millichamp" <sean@bruenor.org> reported the workqueue
leak, and included a different patch.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
fba4acda35f3119328bcba28aacefae14245d2bb 31-Oct-2008 Jay Vosburgh <fubar@us.ibm.com> bonding: fix miimon failure counter

During the rework of the mii monitor for:

commit f0c76d61779b153dbfb955db3f144c62d02173c2
Author: Jay Vosburgh <fubar@us.ibm.com>
Date: Wed Jul 2 18:21:58 2008 -0700

bonding: refactor mii monitor

I left out the increment of the link failure counter. This
patch corrects that omission.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
8cf14e38372d84ea09ba45fb60b61f6e36c18546 30-Oct-2008 Harvey Harrison <harvey.harrison@gmail.com> net: easy removals of HIPQUAD using %pI4 format

As a bonus, removes some unnecessary byteswapping.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
e174961ca1a0b28f7abf0be47973ad57cb74e5f0 27-Oct-2008 Johannes Berg <johannes@sipsolutions.net> net: convert print_mac to %pM

This converts pretty much everything to print_mac. There were
a few things that had conflicts which I have just dropped for
now, no harm done.

I've built an allyesconfig with this and looked at the files
that weren't built very carefully, but it's a huge patch.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
b63365a2d60268a3988285d6c3c6003d7066f93a 23-Oct-2008 Herbert Xu <herbert@gondor.apana.org.au> net: Fix disjunct computation of netdev features

My change

commit e2a6b85247aacc52d6ba0d9b37a99b8d1a3e0d83
net: Enable TSO if supported by at least one device

didn't do what was intended because the netdev_compute_features
function was designed for conjunctions. So what happened was that
it would simply take the TSO status of the last constituent device.

This patch extends it to support both conjunctions and disjunctions
under the new name of netdev_increment_features.

It also adds a new function netdev_fix_features which does the
sanity checking that usually occurs upon registration. This ensures
that the computation doesn't result in an illegal combination
since this checking is absent when the change is initiated via
ethtool.

The two users of netdev_compute_features have been converted.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
fa53ebac42d3de04619c813f5f6628ca2a7ce97f 14-Sep-2008 Stephen Hemminger <shemminger@vyatta.com> bonding: add more ethtool support

This patch allows reporting the link, checksum, and feature settings
of bonded device by using generic hooks.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
f14c4e4e3651b76ae09082fa66cda37e10ac2b43 02-Sep-2008 Brian Haley <brian.haley@hp.com> bonding: change some __constant_htons() to htons()

Resending since I didn't see any responses from the first try.

Change __constant_htons() to htons() in the bonding driver, it should
only be used for initializers.

-Brian

Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
f0c76d61779b153dbfb955db3f144c62d02173c2 03-Jul-2008 Jay Vosburgh <fubar@us.ibm.com> bonding: refactor mii monitor

Refactor mii monitor. As with the previous ARP monitor refactor,
the motivation for this is to handle locking rationally (in this case,
removing conditional locking) and generally clean up the code.

This patch breaks up the monolithic mii monitor into two phases:
an inspection phase, followed by an optional commit phase. The commit phase
is the only portion that requires RTNL or makes changes to state, and is
only called when inspection finds something to change.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
cf508b1211dbe576778ff445ea1b4b0bcfa5c4ea 22-Jul-2008 David S. Miller <davem@davemloft.net> netdev: Handle ->addr_list_lock just like ->_xmit_lock for lockdep.

The new address list lock needs to handle the same device layering
issues that the _xmit_lock one does.

This integrates work done by Patrick McHardy.

Signed-off-by: David S. Miller <davem@davemloft.net>
e8a0464cc950972824e2e128028ae3db666ec1ed 17-Jul-2008 David S. Miller <davem@davemloft.net> netdev: Allocate multiple queues for TX.

alloc_netdev_mq() now allocates an array of netdev_queue
structures for TX, based upon the queue_count argument.

Furthermore, all accesses to the TX queues are now vectored
through the netdev_get_tx_queue() and netdev_for_each_tx_queue()
interfaces. This makes it easy to grep the tree for all
things that want to get to a TX queue of a net device.

Problem spots which are not really multiqueue aware yet, and
only work with one queue, can easily be spotted by grepping
for all netdev_get_tx_queue() calls that pass in a zero index.

Signed-off-by: David S. Miller <davem@davemloft.net>
b9e40857682ecfc5bcd0356a23ff409883ffb982 15-Jul-2008 David S. Miller <davem@davemloft.net> netdev: Do not use TX lock to protect address lists.

Now that we have a specific lock to protect the network
device unicast and multicast lists, remove extraneous
grabs of the TX lock in cases where the code only needs
address list protection.

Signed-off-by: David S. Miller <davem@davemloft.net>
e308a5d806c852f56590ffdd3834d0df0cbed8d7 15-Jul-2008 David S. Miller <davem@davemloft.net> netdev: Add netdev->addr_list_lock protection.

Add netif_addr_{lock,unlock}{,_bh}() helpers.

Use them to protect operations that operate on or read
the network device unicast and multicast address lists.

Also use them in cases where the code simply wants to
block calls into the driver's ->set_rx_mode() and
->set_multicast_list() methods.

Signed-off-by: David S. Miller <davem@davemloft.net>
7e1a1ac1fbaa88fe254400b7f30b775502932ad3 15-Jul-2008 Wang Chen <wangchen@cn.fujitsu.com> bonding: Check return of dev_set_promiscuity/allmulti

dev_set_promiscuity/allmulti might overflow.
Commit: "netdevice: Fix promiscuity and allmulti overflow" in net-next makes
dev_set_promiscuity/allmulti return error number if overflow happened.

In bond_alb and bond_main, we check all positive increment for promiscuity
and allmulti to get error return.
But there are still two problems left.
1. Some code path has no mechanism to signal errors upstream.
2. If there are multi slaves, it's hard to tell which slaves increment
promisc/allmulti successfully and which failed.
So I left these problems to be FIXME.
Fortunately, the overflow is very rare case.

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
c773e847ea8f6812804e40f52399c6921a00eab1 09-Jul-2008 David S. Miller <davem@davemloft.net> netdev: Move _xmit_lock and xmit_lock_owner into netdev_queue.

Accesses are mostly structured such that when there are multiple TX
queues the code transformations will be a little bit simpler.

Signed-off-by: David S. Miller <davem@davemloft.net>
b8a9787eddb0e4665f31dd1d64584732b2b5d051 14-Jun-2008 Jay Vosburgh <fubar@us.ibm.com> bonding: Allow setting max_bonds to zero

Permit bonding to function rationally if max_bonds is set to
zero. This will load the module, but create no master devices (which can
be created via sysfs).

Requires some change to bond_create_sysfs; currently, the
netdev sysfs directory is determined from the first bonding device created,
but this is no longer possible. Instead, an interface from net/core is
created to create and destroy files in net_class.

Based on a patch submitted by Phil Oester <kernel@linuxaces.com>.
Modified by Jay Vosburgh to fix the sysfs issue mentioned above and to
update the documentation.

Signed-off-by: Phil Oester <kernel@linuxace.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
b59f9f74c4c0a569398f08c34a877f1b7b457496 14-Jun-2008 Jay Vosburgh <fubar@us.ibm.com> bonding: Rework / fix multiple gratuitous ARP support

Support for sending multiple gratuitous ARPs during failovers
was added by commit:

commit 7893b2491a2d5f716540ac5643d78d37a7f6628b
Author: Moni Shoua <monis@voltaire.com>
Date: Sat May 17 21:10:12 2008 -0700

bonding: Send more than one gratuitous ARP when slave takes over

This change modifies that support to remove duplicated code,
add support for ARP monitor (the original only supported miimon), clear
the grat ARP counter in bond_close (lest a later "ifconfig up" immediately
start spewing ARPs), and add documentation for the module parameter.

Also updated driver version to 3.3.0.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
01f3109de49a889db8adf9116449727547ee497e 14-Jun-2008 Or Gerlitz <ogerlitz@voltaire.com> bonding: deliver netdev event for fail-over under the active-backup mode

under active-backup mode and when there's actual new_active slave,
have bond_change_active_slave() call the networking core to deliver
NETDEV_BONDING_FAILOVER event such that the fail-over can be notable
by code outside of the bonding driver such as the RDMA stack and
monitoring tools.

As the correct context of locking appropriate for notifier calls is RTNL
and nothing else, bond->curr_slave_lock and bond->lock are unlocked and
later locked again. This is ensured by the rest of the code to be safe
under backup-mode AND when new_active is not NULL.

Jay Vosburgh modified the original patch for formatting and fixed a
compiler error.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
709f8a45e8521f2f4229e5fdf3ded1fb77e2ca4e 14-Jun-2008 Or Gerlitz <ogerlitz@voltaire.com> bonding: bond_change_active_slave() cleanup under active-backup

simplified the code of bond_change_active_slave() such that under
active-backup mode there's one "if (new_active)" test and the rest
of the code only does extra checks on top of it. This removed an
unneeded "if (bond->send_grat_arp > 0)" check and avoid calling
bond_send_gratuitous_arp when there's no active slave.

Jay Vosburgh made minor coding style changes to the orignal patch.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
3915c1e8634a321d9680e5cd80a53053b642dc0c 18-May-2008 Jay Vosburgh <fubar@us.ibm.com> bonding: Add "follow" option to fail_over_mac

Add a "follow" selection for fail_over_mac. This option
causes the MAC address to move from slave to slave as the active
slave changes. This is in addition to the existing fail_over_mac option
that causes the bond's MAC address to change during failover.

This new option is useful for devices that cannot tolerate
multiple ports using the same MAC address simultaneously, either
because it confuses them or incurs a performance penalty (as is the
case with some LPAR-aware multiport devices). Because the MAC of the
bond itself does not change, the "follow" option is slightly more
reliable during failover and doesn't change the MAC of the bond during
operation.

This patch requires a previous ARP monitor change to properly
handle RTNL during failovers.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
b2220cad583c9b63e085476df448fa2aff5ea906 18-May-2008 Jay Vosburgh <fubar@us.ibm.com> bonding: refactor ARP active-backup monitor

Refactor ARP monitor for active-backup mode. The motivation for
this is to take care of locking issues in a clear manner (particularly to
correctly handle RTNL vs. the bonding locks). Currently, the a-b ARP
monitor does not hold RTNL at all, but future changes will require RTNL
during ARP monitor failovers.

Rather than using conditional locking, this patch instead breaks
up the ARP monitor into three discrete steps: inspection, commit changes,
and probe. The inspection phase marks slaves that require link state
changes. The commit phase is only called if inspection detects that
changes are needed, and is called with RTNL. Lastly, the probe phase
issues the ARP probes that the inspection phase uses to determine link
state.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
7893b2491a2d5f716540ac5643d78d37a7f6628b 18-May-2008 Moni Shoua <monis@voltaire.com> bonding: Send more than one gratuitous ARP when slave takes over

With IPoIB, reception of gratuitous ARP by neighboring hosts
is essential for a successful change of slaves in case of failure.
Otherwise, they won't learn about the HW address change and need
to wait a long time until the neighboring system gives up and sends
an ARP request to learn the new HW address. This patch decreases
the chance for a lost of a gratuitous ARP packet by sending it more
than once. The number retries is configurable and can be set with a
module param.

Signed-off-by: Moni Shoua <monis@voltaire.com>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
8047637c70e4451e2ac1c17ed9a91a2f753daae7 18-May-2008 Pavel Emelyanov <xemul@openvz.org> bonding: Remove unneeded list_empty checks.

Some places iterate over the checked list right after the check
itself, so even if the list is empty, the list_for_each_xxx
iterator will make everything right by himself.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
0883beca7f39ab0c6447af35080e5caaa07418e3 18-May-2008 Pavel Emelyanov <xemul@openvz.org> bonding: Relax unneeded _safe lists iterations.

Many places either do not modify the list under the list_for_each_xxx,
or break out of the loop as soon as the first element is removed.

Thus, this _safe iteration just occupies some unneeded .text space
and requires an additional variable.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
0dd646fe0549251e79d6fb03e6773bcc6ccea61f 18-May-2008 Pavel Emelyanov <xemul@openvz.org> bonding: Remove redundant argument from bond_create.

While we're fixing the bond_create, I hope it's OK to polish it
a bit after the fixes.

The third argument is NULL at the first caller and is ignored by
the second one, so remove it.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
4b8a9239ee708958ed72722a0e5e0cf34243ad26 18-May-2008 Jay Vosburgh <fubar@us.ibm.com> bonding: remove test for IP in ARP monitor

Remove bond_has_ip and all references to it. With this change,
the ARP monitor will always send ARP probes if the master is up and has
at least one slave. If the bond has an IP address, it is used in the
ARP probe; if not, the probes are sent with all zeros in the sender's
IP address (which is consistent with an RFC 2131 4.4.1 duplicate address
probe).

This is useful for cases when bonding itself is hidden underneath
a layer of virtual devices, e.g., with Xen.

Change suggested by Tsutomu Fujii <t-fujii@nb.jp.nec.com>, who
included a one-line patch that only affected active-backup mode.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
5ce0da8f0386b62345312ec8fed31303732f4220 18-May-2008 Jay Vosburgh <fubar@us.ibm.com> bonding: Use msecs_to_jiffies, eliminate panic

Convert bonding to use msecs_to_jiffies instead of doing the
math. For the ARP monitor, there was an underflow problem that could
result in an infinite loop. The miimon already had that worked around,
but this is cleaner.

Originally by Nicolas de Pesloüan <nicolas.2p.debian@free.fr>
Jay Vosburgh corrected a math error in the original; Nicolas' original
commit message is:

When setting arp_interval parameter to a very low value, delta_in_ticks
for next arp might become 0, causing an infinite loop.

See http://bugzilla.kernel.org/show_bug.cgi?id=10680

Same problem for miimon parameter already fixed, but fix might be
enhanced, by using msecs_to_jiffies() function.

Signed-off-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
569f0c4d909c7f73de634abcdc36344cb72de36a 03-May-2008 Jay Vosburgh <fubar@us.ibm.com> bonding: fix enslavement error unwinds

As part of:

commit c2edacf80e155ef54ae4774379d461b60896bc2e
Author: Jay Vosburgh <fubar@us.ibm.com>
Date: Mon Jul 9 10:42:47 2007 -0700

bonding / ipv6: no addrconf for slaves separately from master

two steps were rearranged in the enslavement process: netdev_set_master
is now before the call to dev_open to open the slave.

This patch updates the error cases and unwind process at the
end of bond_enslave to match the new order. Without this patch, it is
possible for the enslavement to fail, but leave the slave with IFF_SLAVE
set in its flags.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
ae68c39819ddf30549652962768a50edae5eec6f 03-May-2008 Pavel Emelyanov <xemul@openvz.org> bonding: Deadlock between bonding_store_bonds and bond_destroy_sysfs.

The sysfs layer has an internal protection, that ensures, that
all the process sitting inside ->sore/->show callback exits
before the appropriate entry is unregistered (the calltraces
are rather big, but I can provide them if required).

On the other hand, bonding takes rtnl_lock in
a) the bonding_store_bonds, i.e. in ->store callback,
b) module exit before calling the sysfs unregister routines.

Thus, the classical AB-BA deadlock may occur. To reproduce run
# while :; do modprobe bonding; rmmod bonding; done
and
# while :; do echo '+bond%d' > /sys/class/net/bonding_masters ; done
in parallel.

The fix is to move the bond_destroy_sysfs out of the rtnl_lock,
but _before_ bond_free_all to make sure no bonding devices exist
after module unload.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
822973ba79fd5a5b711270c2de7196c6b50c6687 03-May-2008 Pavel Emelyanov <xemul@openvz.org> bonding: Do not call free_netdev for already registered device.

If the call to bond_create_sysfs_entry in bond_create fails, the
proper rollback is to call unregister_netdevice, not free_netdev.
Otherwise - kernel BUG at net/core/dev.c:4057!

Checked with artificial failures injected into bond_create_sysfs_entry.

Pavel's original patch modified by Jay Vosburgh to move code around
for clarity (remove goto-hopping within the unwind block).

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
a95609cb0283a23e519e607ff9fc2a4aa77e2532 29-Apr-2008 Denis V. Lunev <den@openvz.org> netdev: use non-racy method for proc entries creation

Use proc_create()/proc_create_data() to make sure that ->proc_fops and ->data
be setup before gluing PDE to main tree.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Cc: Jeff Garzik <jgarzik@pobox.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
dc13b385999f163dc30c73d66f2ac6d67410528d 10-Apr-2008 Joe Perches <joe@perches.com> drivers/net/bonding/bond_main.c - remove unnecessary #define

bond_main.c already #includes <linux/seq_file.h>

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
92b41daa45a505268b11de9b7cbde2c13c0223b5 22-Mar-2008 Libor Pechacek <lpechacek@suse.cz> bonding: Fix sysfs attribute handling

For bonding interfaces any attempt to read the sysfs directory contents after
module removal results in an oops. The fix is to release sysfs attributes
for the interfaces upon module unload.

Signed-off-by: Libor Pechacek <lpechacek@suse.cz>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
966bc6f434df4a02108d01dda8cd52951fe853da 22-Mar-2008 Jay Vosburgh <fubar@us.ibm.com> bonding: fix two compiler warnings

Fix two compiler warnings that are new with recent versions of gcc
(apparently 4.2 and up). One is fixed by refactoring; this change was
supplied by Stephen Hemminger. The other was fixed by labelling the
variable as uninitialized_var() after confirming via inspection that it
cannot actually be used uninitialized.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
c346dca10840a874240c78efe3f39acf4312a1f2 25-Mar-2008 YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> [NET] NETNS: Omit net_device->nd_net without CONFIG_NET_NS.

Introduce per-net_device inlines: dev_net(), dev_net_set().
Without CONFIG_NET_NS, no namespace other than &init_net exists.
Let's explicitly define them to help compiler optimizations.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
988b705077d8f922408913f4f521ae073256d4a1 03-Mar-2008 Pavel Emelyanov <xemul@openvz.org> [ARP]: Introduce the arp_hdr_len helper.

There are some place, that calculate the ARP header length. These
calculations are correct, but
a) some operate with "magic" constants,
b) enlarge the code length (sometimes at the cost of coding style),
c) are not informative from the first glance.

The proposal is to introduce a helper, that includes all the good
sides of these calculations.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
6133fb1aa137b35a8fa91ec17977ebf6a41456ec 29-Feb-2008 Denis V. Lunev <den@openvz.org> [NETNS]: Disable inetaddr notifiers in namespaces other than initial.

ip_fib_init is kept enabled. It is already namespace-aware.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
21c9d8d73dd1a152c49b4e3176193a099849d4c9 30-Jan-2008 Jay Vosburgh <fubar@us.ibm.com> bonding: do not acquire rtnl in ARP monitor

The ARP monitor functions currently acquire RTNL when performing
failover operations, but do so incorrectly (out of order). This causes
various warnings from might_sleep.

The ARP monitor isn't supported for any of the bonding modes
that actually require RTNL, so it is safe to not hold RTNL when
failing over in the ARP monitor.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2439f9ebd45349246b0fec7c47e6d0e05b1357c7 30-Jan-2008 Andy Gospodarek <andy@greyhouse.net> bonding: fix race that causes invalid statistics

I've seen reports of invalid stats in /proc/net/dev for bonding
interfaces, and found it's a pretty easy problem to reproduce. Since
the current code zeros the bonding stats when a read is requested and a
pointer to that data is returned to the caller we cannot guarantee that
the caller has completely accessed the data before a successive call to
request the stats zeroes the stats again.

This patch creates a new stack variable to keep track of the updated
stats and copies the data from that variable into the bonding stats
structure. This ensures that the value for any of the bonding stats
should not incorrectly return zero for any of the bonding statistics.
This does use more stack space and require an extra memcpy, but it seems
like a fair trade-off for consistently correct bonding statistics.

Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Chris Snook <csnook@redhat.com>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
4fe4763cd8cacd81d892193efb48b99c99c15323 30-Jan-2008 Jay Vosburgh <fubar@us.ibm.com> bonding: fix NULL pointer deref in startup processing

Fix the "are we creating a duplicate" check to not compare
the name if the name is NULL (meaning that the system should select
a name). Bug reported by Benny Amorsen <benny+usenet@amorsen.dk>.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
80ee5ad23150f1f3fe8d35728e860850ccea44da 30-Jan-2008 Jay Vosburgh <fubar@us.ibm.com> bonding: fix set_multicast_list locking

This patch eliminates a problem (reported by lockdep) in the
bond_set_multicast_list function. It first reduces the locking on
bond->lock to a simple read_lock, and second, adds netif_tx locking
around the bonding mc_list manipulations that occur outside of the
set_multicast_list function.

The original problem was related to IPv6 addrconf activity.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
a42e534f1b6be7f2f68f83d29588c3f2736b4d25 30-Jan-2008 Jay Vosburgh <fubar@us.ibm.com> bonding: fix parameter parsing

My last fix (commit ece95f7fefe3afae19e641e1b3f5e64b00d5b948)
didn't handle one case correctly. This resolves that, and it will now
correctly parse parameters with arbitrary white space, and either text
names or mode values.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
f206351a50ea86250fabea96b9af8d8f8fc02603 23-Jan-2008 Denis V. Lunev <den@openvz.org> [NETNS]: Add namespace parameter to ip_route_output_key.

Needed to propagate it down to the ip_route_output_flow.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5655662dab4ef044be7efd155f2f5fef2e486545 18-Jan-2008 Jay Vosburgh <fubar@us.ibm.com> bonding: Don't hold lock when calling rtnl_unlock

Change bond_mii_monitor to not hold any locks when calling rtnl_unlock,
as rtnl_unlock can sleep (when acquring another mutex in netdev_run_todo).

Bug reported by Makito SHIOKAWA <mshiokawa@miraclelinux.com>, who
included a different patch.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
027ea0416c955778ceca7ef82e48a1dd6b4617c9 18-Jan-2008 Jay Vosburgh <fubar@us.ibm.com> bonding: fix lock ordering for rtnl and bonding_rwsem

Fix the handling of rtnl and the bonding_rwsem to always be acquired
in a consistent order (rtnl, then bonding_rwsem).

The existing code sometimes acquired them in this order, and sometimes
in the opposite order, which opens a window for deadlock between ifenslave
and sysfs.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
ece95f7fefe3afae19e641e1b3f5e64b00d5b948 18-Jan-2008 Jay Vosburgh <fubar@us.ibm.com> bonding: Fix up parameter parsing

A recent change to add an additional hash policy modified
bond_parse_parm, but it now does not correctly match parameters passed in
via sysfs.

Rewrote bond_parse_parm to handle (a) parameter matches that
are substrings of one another and (b) user input with whitespace (e.g.,
sysfs input often has a trailing newline).

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
3b96c858fcb27120fcba222366180c3293393ccf 18-Jan-2008 Jay Vosburgh <fubar@us.ibm.com> bonding: release slaves when master removed via sysfs

Add a call to bond_release_all in the bonding netdev event
handler for the master. This releases the slaves for the case of, e.g.,
"echo -bond0 > /sys/class/net/bonding_masters", which otherwise will spin
forever waiting for references to be released.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2543331d367c9fe54f4ba73300894bc21e0a08f4 18-Jan-2008 Jay Vosburgh <fubar@us.ibm.com> bonding: fix locking during alb failover and slave removal

alb_fasten_mac_swap (actually rlb_teach_disabled_mac_on_primary)
requries RTNL and no other locks. This could cause dev_set_promiscuity
and/or dev_set_mac_address to be called with improper locking.

Changed callers to hold only RTNL during calls to alb_fasten_mac_swap
or functions calling it. Updated header comments in affected functions to
reflect proper reality of locking requirements.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
fdaea7a93d097b066e76c7db6091228a84f87ec2 07-Dec-2007 Jay Vosburgh <fubar@us.ibm.com> bonding: Fix race at module unload

Fixes a race condition in module unload. Without this change,
workqueue events may fire while bonding data structures are partially
freed but before bond_close() is invoked by unregister_netdevice().

Update version to 3.2.3.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
6f6652be183c8c7cb99c646dd7494ab45e4833ba 07-Dec-2007 Jay Vosburgh <fubar@us.ibm.com> bonding: Add new layer2+3 hash for xor/802.3ad modes

Add new hash for balance-xor and 802.3ad modes. Originally
submitted by "Glenn Griffin" <ggriffin.kernel@gmail.com>; modified by
Jay Vosburgh to move setting of hash policy out of line, tweak the
documentation update and add version update to 3.2.2.

Glenn's original comment follows:

Included is a patch for a new xmit_hash_policy for the bonding driver
that selects slaves based on MAC and IP information. This is a middle
ground between what currently exists in the layer2 only policy and the
layer3+4 policy. This policy strives to be fully 802.3ad compliant by
transmitting every packet of any particular flow over the same link.
As documented the layer3+4 policy is not fully compliant for extreme
cases such as ip fragmentation, so this policy is a nice compromise
for environments that require full compliance but desire more than the
layer2 only policy.

Signed-off-by: "Glenn Griffin" <ggriffin.kernel@gmail.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
b63bb739a1d24f395c09f88ff43c54c736a60453 07-Dec-2007 David Sterba <dsterba@suse.cz> bonding: Fix time comparison

From: David Sterba <dsterba@suse.cz>

Use macros for comparing jiffies. Jiffies' wrap caused missed events and hangs.
Module reinsert was needed to make bonding work again.

Signed-off-by: David Sterba <dsterba@suse.cz>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
8cbdeec637c1ce87bf329c5c19a9964e36bdf9fb 14-Nov-2007 Jay Vosburgh <fubar@us.ibm.com> [BONDING]: Fix resource use after free

Fix bond_destroy and bond_free_all to not reference the struct
net_device after calling unregister_netdevice.

Bug and offending change reported by Moni Shoua <monis@voltaire.com>

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3a1521b7e5b6964c293bb8ed6773513f8f503de5 06-Nov-2007 Jay Vosburgh <fubar@us.ibm.com> bonding: don't validate address at device open

The standard validate_addr handler refuses to accept the all zeroes address
as valid. However, it's common historical practice for the bonding
master to be configured up prior to having any slaves, at which time the
master will have a MAC address of all zeroes.

Resolved by setting the dev->validate_addr to NULL. The master still can't
end up with an invalid address, as the set_mac_address function tests
for validity.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
a40745f5ef38f4542d120bd67c2c4a07702eb1da 24-Oct-2007 Adrian Bunk <bunk@kernel.org> bonding/bond_main.c: fix cut'n'paste error

This patch fixes a cut'n'paste error in
commit 1b76b31693d4a6088dec104ff6a6ead54081a3c2.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
c50b85d0fbca0a2017b8c0b1e2aeb650724c0a71 24-Oct-2007 Adrian Bunk <bunk@kernel.org> make bonding/bond_main.c:bond_deinit() static

bond_deinit() can now become static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
6603a6f25e4bca922a7dfbf0bf03072d98850176 18-Oct-2007 Jay Vosburgh <fubar@us.ibm.com> bonding: Convert more locks to _bh, acquire rtnl, for new locking

Convert more lock acquisitions to _bh flavor to avoid deadlock
with workqueue activity and add acquisition of RTNL in appropriate places.
Affects ALB mode, as well as core bonding functions and sysfs.

Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
059fe7a578fba5bbb0fdc0365bfcf6218fa25eb0 18-Oct-2007 Jay Vosburgh <fubar@us.ibm.com> bonding: Convert locks to _bh, rework alb locking for new locking

Convert locking-related activity to new & improved system.
Convert some lock acquisitions to _bh and rework parts of ALB mode, both
to avoid deadlocks with workqueue activity.

Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
0b0eef66419e9abe6fd62bc958ab7cd0a18f858e 18-Oct-2007 Jay Vosburgh <fubar@us.ibm.com> bonding: Convert miimon to new locking

Convert mii (link state) monitor to acquire correct locks for
failover events. In particular, failovers generally require RTNL at a low
level (when manipulating device MAC addresses, for example) and no other
locks. The high level monitor is responsible for acquiring a known set
of locks, RTNL, the bond->lock for read and the slave_lock for write, and
the low level failover processing can then release appropriate locks as
needed. This patch provides the high level portion.

As it is undesirable to acquire RTNL for every monitor pass (which
may occur as often as every 10 ms), the miimon has been converted to
do conditional locking. A first pass inspects all slaves to determine
if any action is required, and if so, a second pass (after acquring RTNL)
is done to perform any actions (doing a complete rescan, as the situation
may have changed when all locks were released).

Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
cf5f9044934658dd3ffc628a60cd37c70f8168b1 18-Oct-2007 Jay Vosburgh <fubar@us.ibm.com> bonding: Convert balance-rr transmit to new locking

Change locking in balance-rr transmit processing to use a free
running counter to determine which slave to transmit on. Instead, a
free-running counter is maintained, and modulo arithmetic used to select
a slave for transmit.

This removes lock operations from the TX path, and eliminates
a deadlock introduced by the conversion to work queues.

Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
1b76b31693d4a6088dec104ff6a6ead54081a3c2 18-Oct-2007 Jay Vosburgh <fubar@us.ibm.com> Convert bonding timers to workqueues

Convert bonding timers to workqueues. This converts the various
monitor functions to run in periodic work queues instead of timers. This
patch introduces the framework and convers the calls, but does not resolve
various locking issues, and does not stand alone.

Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
1284cd3a2b740d0118458d2ea470a1e5bc19b187 16-Oct-2007 Jay Vosburgh <fubar@us.ibm.com> bonding: two small fixes for IPoIB support

Two small fixes to IPoIB support for bonding:

1- copy header_ops from slave to bonding for IPoIB slaves
2- move release and destroy logic to UNREGISTER from GOING_DOWN
notifier to avoid double release

Set bonding to version 3.2.1.

Signed-off-by: Moni Shoua <monis at voltaire.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
dd957c57c52a3964b8446a3e868a08186274b628 10-Oct-2007 Jay Vosburgh <fubar@us.ibm.com> net/bonding: Optionally allow ethernet slaves to keep own MAC

Update the "don't change MAC of slaves" functionality added in
previous changes to be a generic option, rather than something tied to
IB devices, as it's occasionally useful for regular ethernet devices as
well.

Adds "fail_over_mac" option (which is automatically enabled for IB
slaves), applicable only to active-backup mode.

Includes documentation update.

Updates bonding driver version to 3.2.0.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
d90a162a4ee280201e84944a84f86d6728dc0c27 10-Oct-2007 Moni Shoua <monis@voltaire.com> net/bonding: Destroy bonding master when last slave is gone

When bonding enslaves non Ethernet devices it takes pointers to functions
in the module that owns the slaves. In this case it becomes unsafe
to keep the bonding master registered after last slave was unenslaved
because we don't know if the pointers are still valid. Destroying the bond when slave_cnt is zero
ensures that these functions be used anymore.

Signed-off-by: Moni Shoua <monis at voltaire.com>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
1053f62c24faa6d4ee6f5bfddeca847b84f67a95 10-Oct-2007 Moni Shoua <monis@voltaire.com> net/bonding: Delay sending of gratuitous ARP to avoid failure

Delay sending a gratuitous_arp when LINK_STATE_LINKWATCH_PENDING bit
in dev->state field is on. This improves the chances for the arp packet to
be transmitted.

Signed-off-by: Moni Shoua <monis at voltaire.com>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
3158bf7d414b69fdc0c715d0a4d82e12b74ef974 10-Oct-2007 Moni Shoua <monis@voltaire.com> net/bonding: Handlle wrong assumptions that slave is always an Ethernet device

bonding sometimes uses Ethernet constants (such as MTU and address length) which
are not good when it enslaves non Ethernet devices (such as InfiniBand).

Signed-off-by: Moni Shoua <monis at voltaire.com>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
6b1bf096508c870889c2be63c7757a04d72116fe 10-Oct-2007 Moni Shoua <monis@voltaire.com> net/bonding: Enable IP multicast for bonding IPoIB devices

Allow to enslave devices when the bonding device is not up. Over the discussion
held at the previous post this seemed to be the most clean way to go, where it
is not expected to cause instabilities.

Normally, the bonding driver is UP before any enslavement takes place.
Once a netdevice is UP, the network stack acts to have it join some multicast groups
(eg the all-hosts 224.0.0.1). Now, since ether_setup() have set the bonding device
type to be ARPHRD_ETHER and address len to be ETHER_ALEN, the net core code
computes a wrong multicast link address. This is b/c ip_eth_mc_map() is called
where for multicast joins taking place after the enslavement another ip_xxx_mc_map()
is called (eg ip_ib_mc_map() when the bond type is ARPHRD_INFINIBAND)

Signed-off-by: Moni Shoua <monis at voltaire.com>
Signed-off-by: Or Gerlitz <ogerlitz at voltaire.com>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2ab82852a2706b47c257ac87675ab8b06bc214dd 10-Oct-2007 Moni Shoua <monis@voltaire.com> net/bonding: Enable bonding to enslave netdevices not supporting set_mac_address()

This patch allows for enslaving netdevices which do not support
the set_mac_address() function. In that case the bond mac address is the one
of the active slave, where remote peers are notified on the mac address
(neighbour) change by Gratuitous ARP sent by bonding when fail-over occurs
(this is already done by the bonding code).

Signed-off-by: Moni Shoua <monis at voltaire.com>
Signed-off-by: Or Gerlitz <ogerlitz at voltaire.com>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
872254dd6b1f80cb95ee9e2e22980888533fc293 10-Oct-2007 Moni Shoua <monis@voltaire.com> net/bonding: Enable bonding to enslave non ARPHRD_ETHER

This patch changes some of the bond netdevice attributes and functions
to be that of the active slave for the case of the enslaved device not being
of ARPHRD_ETHER type. Basically it overrides those setting done by ether_setup(),
which are netdevice **type** dependent and hence might be not appropriate for
devices of other types. It also enforces mutual exclusion on bonding slaves
from dissimilar ether types, as was concluded over the v1 discussion.

IPoIB (see Documentation/infiniband/ipoib.txt) MAC address is made of a 3 bytes
IB QP (Queue Pair) number and 16 bytes IB port GID (Global ID) of the port this
IPoIB device is bounded to. The QP is a resource created by the IB HW and the
GID is an identifier burned into the HCA (i have omitted here some details which
are not important for the bonding RFC).

Signed-off-by: Moni Shoua <monis at voltaire.com>
Signed-off-by: Or Gerlitz <ogerlitz at voltaire.com>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
d3bb52b0948cf118131c951c5a34a2d4d0246171 23-Aug-2007 Al Viro <viro@zeniv.linux.org.uk> endianness annotations drivers/net/bonding/

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
0795af5729b18218767fab27c44b1384f72dc9ad 04-Oct-2007 Joe Perches <joe@perches.com> [NET]: Introduce and use print_mac() and DECLARE_MAC_BUF()

This is nicer than the MAC_FMT stuff.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
88d3aafdae5c5e1d2dd9489a5c8a24e29d335f2e 15-Sep-2007 Jeff Garzik <jeff@garzik.org> [ETHTOOL] Provide default behaviors for a few ethtool sub-ioctls

For the operations
get-tx-csum
get-sg
get-tso
get-ufo
the default ethtool_op_xxx behavior is fine for all drivers, so we
permit op==NULL to imply the default behavior.

This provides a more uniform behavior across all drivers, eliminating
ethtool(8) "ioctl not supported" errors on older drivers that had
not been updated for the latest sub-ioctls.

The ethtool_op_xxx() functions are left exported, in case anyone
wishes to call them directly from a driver-private implementation --
a not-uncommon case. Should an ethtool_op_xxx() helper remain unused
for a while, except by net/core/ethtool.c, we can un-export it at a
later date.

[ Resolved conflicts with set/get value ethtool patch... -DaveM ]

Signed-off-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
10d024c1b2fd58af8362670d7d6e5ae52fc33353 17-Sep-2007 Ralf Baechle <ralf@linux-mips.org> [NET]: Nuke SET_MODULE_OWNER macro.

It's been a useless no-op for long enough in 2.6 so I figured it's time to
remove it. The number of people that could object because they're
maintaining unified 2.4 and 2.6 drivers is probably rather small.

[ Handled drivers added by netdev tree and some missed IRDA cases... -DaveM ]

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
881d966b48b035ab3f3aeaae0f3d3f9b584f45b2 17-Sep-2007 Eric W. Biederman <ebiederm@xmission.com> [NET]: Make the device list and device lookups per namespace.

This patch makes most of the generic device layer network
namespace safe. This patch makes dev_base_head a
network namespace variable, and then it picks up
a few associated variables. The functions:
dev_getbyhwaddr
dev_getfirsthwbytype
dev_get_by_flags
dev_get_by_name
__dev_get_by_name
dev_get_by_index
__dev_get_by_index
dev_ioctl
dev_ethtool
dev_load
wireless_process_ioctl

were modified to take a network namespace argument, and
deal with it.

vlan_ioctl_set and brioctl_set were modified so their
hooks will receive a network namespace argument.

So basically anthing in the core of the network stack that was
affected to by the change of dev_base was modified to handle
multiple network namespaces. The rest of the network stack was
simply modified to explicitly use &init_net the initial network
namespace. This can be fixed when those components of the network
stack are modified to handle multiple network namespaces.

For now the ifindex generator is left global.

Fundametally ifindex numbers are per namespace, or else
we will have corner case problems with migration when
we get that far.

At the same time there are assumptions in the network stack
that the ifindex of a network device won't change. Making
the ifindex number global seems a good compromise until
the network stack can cope with ifindex changes when
you change namespaces, and the like.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
e9dc86534051b78e41e5b746cccc291b57a3a311 12-Sep-2007 Eric W. Biederman <ebiederm@xmission.com> [NET]: Make device event notification network namespace safe

Every user of the network device notifiers is either a protocol
stack or a pseudo device. If a protocol stack that does not have
support for multiple network namespaces receives an event for a
device that is not in the initial network namespace it quite possibly
can get confused and do the wrong thing.

To avoid problems until all of the protocol stacks are converted
this patch modifies all netdev event handlers to ignore events on
devices that are not in the initial network namespace.

As the rest of the code is made network namespace aware these
checks can be removed.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
e730c15519d09ea528b4d2f1103681fa5937c0e6 17-Sep-2007 Eric W. Biederman <ebiederm@xmission.com> [NET]: Make packet reception network namespace safe

This patch modifies every packet receive function
registered with dev_add_pack() to drop packets if they
are not from the initial network namespace.

This should ensure that the various network stacks do
not receive packets in a anything but the initial network
namespace until the code has been converted and is ready
for them.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
457c4cbc5a3dde259d2a1f15d5f9785290397267 12-Sep-2007 Eric W. Biederman <ebiederm@xmission.com> [NET]: Make /proc/net per network namespace

This patch makes /proc/net per network namespace. It modifies the global
variables proc_net and proc_net_stat to be per network namespace.
The proc_net file helpers are modified to take a network namespace argument,
and all of their callers are fixed to pass &init_net for that argument.
This ensures that all of the /proc/net files are only visible and
usable in the initial network namespace until the code behind them
has been updated to be handle multiple network namespaces.

Making /proc/net per namespace is necessary as at least some files
in /proc/net depend upon the set of network devices which is per
network namespace, and even more files in /proc/net have contents
that are relevant to a single network namespace.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7f353bf29e162459f2f1e2ca25e41011fae65241 11-Aug-2007 Herbert Xu <herbert@gondor.apana.org.au> [NET]: Share correct feature code between bridging and bonding

http://bugzilla.kernel.org/show_bug.cgi?id=8797 shows that the
bonding driver may produce bogus combinations of the checksum
flags and SG/TSO.

For example, if you bond devices with NETIF_F_HW_CSUM and
NETIF_F_IP_CSUM you'll end up with a bonding device that
has neither flag set. If both have TSO then this produces
an illegal combination.

The bridge device on the other hand has the correct code to
deal with this.

In fact, the same code can be used for both. So this patch
moves that logic into net/core/dev.c and uses it for both
bonding and bridging.

In the process I've made small adjustments such as only
setting GSO_ROBUST if at least one constituent device
supports it.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
61a44b9c4b20d40c41fd1b70a4ceb13b75ea79a4 31-Jul-2007 Matthew Wilcox <matthew@wil.cx> [NET]: ethtool ops are the only way

During the transition to the ethtool_ops way of doing things, we supported
calling the device's ->do_ioctl method to allow unconverted drivers to
continue working. Those days are long behind us, all in-tree drivers
use the ethtool_ops way, and so we no longer need to support this.

The bonding driver is the biggest beneficiary of this; it no longer
needs to call ioctl() as a fallback if ethtool_ops aren't supported.

Also put a proper copyright statement on ethtool.c.

Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: David S. Miller <davem@davemloft.net>
4ad072c984ebe329c99965ddd1e58b0bb24af12b 09-Jul-2007 Adrian Bunk <bunk@stusta.de> bonding/bond_main.c: make 2 functions static

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Chad Tindel <ctindel@users.sourceforge.net>
Cc: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
c2edacf80e155ef54ae4774379d461b60896bc2e 09-Jul-2007 Jay Vosburgh <fubar@us.ibm.com> bonding / ipv6: no addrconf for slaves separately from master

At present, when a device is enslaved to bonding, if ipv6 is
active then addrconf will be initated on the slave (because it is closed
then opened during the enslavement processing). This causes DAD and RS
packets to be sent from the slave. These packets in turn can confuse
switches that perform ipv6 snooping, causing them to incorrectly update
their forwarding tables (if, e.g., the slave being added is an inactve
backup that won't be used right away) and direct traffic away from the
active slave to a backup slave (where the incoming packets will be
dropped).

This patch alters the behavior so that addrconf will only run on
the master device itself. I believe this is logically correct, as it
prevents slaves from having an IPv6 identity independent from the
master. This is consistent with the IPv4 behavior for bonding.

This is accomplished by (a) having bonding set IFF_SLAVE sooner
in the enslavement processing than currently occurs (before open, not
after), and (b) having ipv6 addrconf ignore UP and CHANGE events on
slave devices.

The eql driver also uses the IFF_SLAVE flag. I inspected eql,
and I believe this change is reasonable for its usage of IFF_SLAVE, but
I did not test it.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
3201e656ce56ed02e9501906c18ffe16ae350a52 19-Jun-2007 Jay Vosburgh <fubar@us.ibm.com> bonding: Fix use after free in unregister path

The following patch (based on a patch from Stephen Hemminger
<shemminger@linux-foundation.org>) removes use after free conditions in
the unregister path for the bonding master. Without this patch, an
operation of the form "echo -bond0 > /sys/class/net/bonding_masters"
would trigger a NULL pointer dereference in sysfs. I was not able to
induce the failure with the non-sysfs code path, but for consistency I
updated that code as well.

I also did some testing of the bonding /proc file being open
while the bond is being deleted, and didn't see any problems there.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
59c51591a0ac7568824f541f57de967e88adaa07 09-May-2007 Michael Opdenacker <michael@free-electrons.com> Fix occurrences of "the the "

Signed-off-by: Michael Opdenacker <michael@free-electrons.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
5a1b5898ee9e0bf68a86609ecb9775457b1857a5 29-Apr-2007 Rusty Russell <rusty@rustcorp.com.au> [NET]: Remove NETIF_F_INTERNAL_STATS, default to internal stats.

Herbert Xu conviced me that a new flag was overkill; every driver
currently overrides get_stats, so we might as well make the internal
one the default. If someone did fail to set get_stats, they would now
get all 0 stats instead of "No statistics available".

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
c45d286e72dd72c0229dc9e2849743ba427fee84 28-Mar-2007 Rusty Russell <rusty@rustcorp.com.au> [NET]: Inline net_device_stats

Network drivers which keep stats allocate their own stats structure
then write a get_stats() function to return them. It would be nice if
this were done by default.

1) Add a new "stats" field to "struct net_device".
2) Add a new feature field to say "this driver uses the internal one"
3) Have a default "get_stats" which returns NULL if that feature not set.
4) Change callers to check result of get_stats call for NULL, not if
->get_stats is set.

This should not break backwards compatibility with older drivers, yet
allow modern drivers to shed some boilerplate code.

Lightly tested: works for a modified lguest network driver.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
d0a92be05ed4aea7d35c2b257e3f9173565fe4eb 13-Mar-2007 Arnaldo Carvalho de Melo <acme@redhat.com> [SK_BUFF]: Introduce arp_hdr(), remove skb->nh.arph

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
eddc9ec53be2ecdbf4efe0efd4a83052594f0ac0 21-Apr-2007 Arnaldo Carvalho de Melo <acme@redhat.com> [SK_BUFF]: Introduce ip_hdr(), remove skb->nh.iph

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
a816c7c712ff9f6770168b91facb9bfa9f0acd48 01-Mar-2007 Jay Vosburgh <fubar@us.ibm.com> bonding: Improve IGMP join processing

In active-backup mode, the current bonding code duplicates IGMP
traffic to all slaves, so that switches are up to date in case of a
failover from an active to a backup interface. If bonding then fails
back to the original active interface, it is likely that the "active
slave" switch's IGMP forwarding for the port will be out of date until
some event occurs to refresh the switch (e.g., a membership query).

This patch alters the behavior of bonding to no longer flood
IGMP to all ports, and to issue IGMP JOINs to the newly active port at
the time of a failover. This insures that switches are kept up to date
for all cases.

"GOELLESCH Niels" <niels.goellesch@eurocontrol.int> originally
reported this problem, and included a patch. His original patch was
modified by Jay Vosburgh to additionally remove the existing IGMP flood
behavior, use RCU, streamline code paths, fix trailing white space, and
adjust for style.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
e245cb71d490e5e516c0ca0688fad7de6c22943d 01-Mar-2007 Jay Vosburgh <fubar@us.ibm.com> bonding: only receive ARPs for us

The ARP validation code only needs ARPs for the bonding device.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
c4f283b1f275e5528c13c119e5cfc80cdba55d00 01-Mar-2007 Jay Vosburgh <fubar@us.ibm.com> bonding: fix double dev_add_pack

Bonding can erroneously register the same packet_type to receive
ARPs (for use by ARP validation): once at device open time, and once via
sysfs. Since sysfs can change the validate setting (and thus register
or unregister) at any time, a flag is needed to synchronize with device
open in order to avoid double registrations, and the simplest place is
within the packet_type structure itself. Double unregister is not an
issue.

Bug reported by Ulrich Oelmann <ulrich.oelmann@web.de>.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
5c15bdec5c38f4ccf73ef2585fc80a6164de9554 03-Mar-2007 Dan Aloni <da-x@monatomic.org> [VLAN]: Avoid a 4-order allocation.

This patch splits the vlan_group struct into a multi-allocated struct. On
x86_64, the size of the original struct is a little more than 32KB, causing
a 4-order allocation, which is prune to problems caused by buddy-system
external fragmentation conditions.

I couldn't just use vmalloc() because vfree() cannot be called in the
softirq context of the RCU callback.

Signed-off-by: Dan Aloni <da-x@monatomic.org>
Acked-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
cd354f1ae75e6466a7e31b727faede57a1f89ca5 14-Feb-2007 Tim Schmielau <tim@physik3.uni-rostock.de> [PATCH] remove many unneeded #includes of sched.h

After Al Viro (finally) succeeded in removing the sched.h #include in module.h
recently, it makes sense again to remove other superfluous sched.h includes.
There are quite a lot of files which include it but don't actually need
anything defined in there. Presumably these includes were once needed for
macros that used to live in sched.h, but moved to other header files in the
course of cleaning it up.

To ease the pain, this time I did not fiddle with any header files and only
removed #includes from .c-files, which tend to cause less trouble.

Compile tested against 2.6.20-rc2 and 2.6.20-rc2-mm2 (with offsets) on alpha,
arm, i386, ia64, mips, powerpc, and x86_64 with allnoconfig, defconfig,
allmodconfig, and allyesconfig as well as a few randconfigs on x86_64 and all
configs in arch/arm/configs on arm. I also checked that no new warnings were
introduced by the patch (actually, some warnings are removed that were emitted
by unnecessarily included header files).

Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
d54b1fdb1d9f82e375a299e22bd366aad52d4c34 12-Feb-2007 Arjan van de Ven <arjan@linux.intel.com> [PATCH] mark struct file_operations const 5

Many struct file_operations in the kernel can be "const". Marking them const
moves these to the .rodata section, which avoids false sharing with potential
dirty data. In addition it'll catch accidental writes at compile time to
these shared resources.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
243cb4e56061c3f4cb76312c5527840344d57c3b 06-Feb-2007 Joe Jin <lkmaillist@gmail.com> [BONDING]: Replace kmalloc() + memset() pairs with the appropriate kzalloc() calls

Replace kmalloc() + memset() pairs with the appropriate kzalloc() calls in
the bonding driver.

Signed-off-by: Joe Jin <lkmaillist@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
09c892797688312dc8a3c4d8b37dcb7207c1d48a 20-Jan-2007 Jay Vosburgh <fubar@us.ibm.com> bonding: fix error check in sysfs creation

The existing code did not correctly handle failures to create
the per-interface sysfs group for bonding.

Modified code to notice errors, and correctly unwind.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
e4b91c484611da385e34ff0f8bb2744ae2c735b7 20-Jan-2007 Jay Vosburgh <fubar@us.ibm.com> bonding: fix device name allocation error

The code to select names for the bonding interfaces was, for the
non-sysfs creation case, always using a hard-coded set of bond0, bond1,
etc, up to max_bonds. This caused conflicts for the second or
subsequent loads of the module.

Changed the code to obtain device names from dev_alloc_name().

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
4e1400796c93df5e7f92d766e4a4332d0c98795f 05-Dec-2006 Andy Gospodarek <andy@greyhouse.net> [PATCH] bonding: incorrect bonding state reported via ioctl

This is a small fix-up to finish out the work done by Jay Vosburgh to add
carrier-state support for bonding devices. The output in
/proc/net/bonding/bondX was correct, but when collecting the same info via
an iotcl it could still be incorrect.

Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
418e8f3d7ef4a30d4b5c84440641c9792a7f83f1 18-Nov-2006 Laurent Riffard <laurent.riffard@free.fr> [PATCH] bonding: fix an oops when slave device does not provide get_stats

Bonding driver unconditionnaly dereference get_stats function pointer
for each of its slave device. This patch
- adds a check for NULL dev->get_stats pointer in bond_get_stats
- prints a notice when the bonding device enslave a device without
get_stats function.

Signed-off-by: Laurent Riffard <laurent.riffard@free.fr>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
0daa2303028a63dbd1b2e38f10854f0f7bf1ef9a 09-Nov-2006 Peter Zijlstra <a.p.zijlstra@chello.nl> [PATCH] bonding: lockdep annotation

=============================================
[ INFO: possible recursive locking detected ]
2.6.17-1.2600.fc6 #1

Signed-off-by: Jeff Garzik <jeff@garzik.org>
a144ea4b7a13087081ab5402fa9ad0bcfd249e67 29-Sep-2006 Al Viro <viro@zeniv.linux.org.uk> [IPV4]: annotate struct in_ifaddr

ifa_local, ifa_address, ifa_mask, ifa_broadcast and ifa_anycast are
net-endian. Annotated them and variables that are inferred to be
net-endian.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
8a8e447b2aa1f9139d0bfc94a2a3426be9c8d40a 23-Sep-2006 Jay Vosburgh <fubar@us.ibm.com> [PATCH] bonding: Fix primary selection error at enslavement time

At enslavement time, the primary slave might not be activated if
there is already an active slave and the new slave is the primary.
Replaced complicated logic with a call to bond_select_active_slave(),
which does the right thing.

Fixes http://bugzilla.kernel.org/show_bug.cgi?id=6378

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
f5b2b966f032f22d3a289045a5afd4afa09f09c6 23-Sep-2006 Jay Vosburgh <fubar@us.ibm.com> [PATCH] bonding: Validate probe replies in ARP monitor

Add logic to check ARP request / reply packets used for ARP
monitor link integrity checking.

The current method simply examines the slave device to see if it
has sent and received traffic; this can be fooled by extraneous traffic.
For example, if multiple hosts running bonding are behind a common
switch, the probe traffic from the multiple instances of bonding will
update the tx/rx times on each other's slave devices.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
70298705bb29fb7982b85089adf17cd37b94baa7 23-Sep-2006 jamal <hadi@cyberus.ca> [PATCH] bonding: Don't release slaves when master is admin down

When a bonding netdevice is admin-ed down it loses the slaves
attributes (set via ifenslave). This is not consistent with other
behavior of netdevices (example a qdisc attached to a netdevice doesnt
disappear or an attached IP address etc).
The included patch fixes this. Ive tested by ifenslaving, downing the
bond, checking /proc and making sure it still has the slaves, up-ing the
bond and making sure things continue to work.

Jay/Bonding folks if you are ok with it, just ACK it or include it in
your tree etc. Otherwise we can discuss.

Acked-by: Jay Vosburgh <fubar@us.ibm.com>

Signed-off-by: Jeff Garzik <jeff@garzik.org>
0b680e753724d31a9c45f059d1aad29df54584a1 23-Sep-2006 Jay Vosburgh <fubar@us.ibm.com> [PATCH] bonding: Add priv_flag to avoid event mishandling

Add priv_flag to specifically identify bonding-involved devices. Needed
because IFF_MASTER is an unreliable identifier (vlan interfaces above bonding
will inherit IFF_MASTER). Misidentification of devices would cause
notifier events for other devices to be erroneously processed by bonding,
causing various havoc.

Bug discovered by Martin Papik <martin.papik@ipsec.info>; this patch is
modified from his original.

Signed-off-by: Martin Papik <martin.papik@ipsec.info>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
54ef313714070b397d3857289f0fd099b7643631 23-Sep-2006 Jay Vosburgh <fubar@us.ibm.com> [PATCH] bonding: Handle large hard_header_len

The bonding driver fails to adjust its hard_header_len when enslaving
interfaces. Whenever an interface with a hard_header_len greater than the
ETH_HLEN default is enslaved, the potential for an oops exists, and if the
oops happens while responding to an arp request, for example, the system
panics. GIANFAR devices may use an extended hard_header for VLAN or
hardware checksumming. Enslaving such a device and then transmitting over
it causes a kernel panic.

Patch modified from submitter's original, but submitter agreed with this
patch in private email.

Signed-off-by: Mark Huth <mhuth@mvista.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
65509645ae05886eccc81b8a453afea07f0eabb6 23-Sep-2006 Kenzo Iwami <k-iwami@cj.jp.nec.com> [PATCH] bonding: Format fix in seq_printf call

Though link_failure_count is type unsigned int, this value is outputted to
/proc/net/bonding/bondX file using "%d" instead of "%u".

The attached patch fixes this problem.

Signed-off-by: Kenzo Iwami <k-iwami@cj.jp.nec.com>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
94dbffd540eea601aecad07e2df5bfd8a46672f3 23-Sep-2006 Jay Vosburgh <fubar@us.ibm.com> [PATCH] bonding: Allow bonding to enslave a 10 Gig adapter

Allow channel bonding to enslave a 10 Gig adapter without errors.

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
7282d491ecaee9883233a0e27283c4c79486279a 13-Sep-2006 Jeff Garzik <jeff@garzik.org> drivers/net: const-ify ethtool_ops declarations

Signed-off-by: Jeff Garzik <jeff@garzik.org>
6ab3d5624e172c553004ecc862bfeac16d9d68b7 30-Jun-2006 Jörn Engel <joern@wohnheim.fh-wedel.de> Remove obsolete #include <linux/config.h>

Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
8648b3053bff39a7ee4c711d74268079c928a657 18-Jun-2006 Herbert Xu <herbert@gondor.apana.org.au> [NET]: Add NETIF_F_GEN_CSUM and NETIF_F_ALL_CSUM

The current stack treats NETIF_F_HW_CSUM and NETIF_F_NO_CSUM
identically so we test for them in quite a few places. For the sake
of brevity, I'm adding the macro NETIF_F_GEN_CSUM for these two. We
also test the disjunct of NETIF_F_IP_CSUM and the other two in various
places, for that purpose I've added NETIF_F_ALL_CSUM.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
932ff279a43ab7257942cddff2595acd541cc49b 09-Jun-2006 Herbert Xu <herbert@gondor.apana.org.au> [NET]: Add netif_tx_lock

Various drivers use xmit_lock internally to synchronise with their
transmission routines. They do so without setting xmit_lock_owner.
This is fine as long as netpoll is not in use.

With netpoll it is possible for deadlocks to occur if xmit_lock_owner
isn't set. This is because if a printk occurs while xmit_lock is held
and xmit_lock_owner is not set can cause netpoll to attempt to take
xmit_lock recursively.

While it is possible to resolve this by getting netpoll to use
trylock, it is suboptimal because netpoll's sole objective is to
maximise the chance of getting the printk out on the wire. So
delaying or dropping the message is to be avoided as much as possible.

So the only alternative is to always set xmit_lock_owner. The
following patch does this by introducing the netif_tx_lock family of
functions that take care of setting/unsetting xmit_lock_owner.

I renamed xmit_lock to _xmit_lock to indicate that it should not be
used directly. I didn't provide irq versions of the netif_tx_lock
functions since xmit_lock is meant to be a BH-disabling lock.

This is pretty much a straight text substitution except for a small
bug fix in winbond. It currently uses
netif_stop_queue/spin_unlock_wait to stop transmission. This is
unsafe as an IRQ can potentially wake up the queue. So it is safer to
use netif_tx_disable.

The hamradio bits used spin_lock_irq but it is unnecessary as
xmit_lock must never be taken in an IRQ handler.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
ff59c4563a8d1b39597aab4917959146c61f09b0 27-Mar-2006 Jay Vosburgh <fubar@us.ibm.com> [PATCH] bonding: support carrier state for master

Add support for the bonding master to specify its carrier state
based upon the state of the slaves. For 802.3ad, the bond is up if
there is an active, parterned aggregator. For other modes, the bond is
up if any slaves are up. Updates driver version to 3.0.3.

Based on a patch by jamal <hadi@cyberus.ca>.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
e041c683412d5bf44dc2b109053e3b837b71742d 27-Mar-2006 Alan Stern <stern@rowland.harvard.edu> [PATCH] Notifier chain update: API changes

The kernel's implementation of notifier chains is unsafe. There is no
protection against entries being added to or removed from a chain while the
chain is in use. The issues were discussed in this thread:

http://marc.theaimsgroup.com/?l=linux-kernel&m=113018709002036&w=2

We noticed that notifier chains in the kernel fall into two basic usage
classes:

"Blocking" chains are always called from a process context
and the callout routines are allowed to sleep;

"Atomic" chains can be called from an atomic context and
the callout routines are not allowed to sleep.

We decided to codify this distinction and make it part of the API. Therefore
this set of patches introduces three new, parallel APIs: one for blocking
notifiers, one for atomic notifiers, and one for "raw" notifiers (which is
really just the old API under a new name). New kinds of data structures are
used for the heads of the chains, and new routines are defined for
registration, unregistration, and calling a chain. The three APIs are
explained in include/linux/notifier.h and their implementation is in
kernel/sys.c.

With atomic and blocking chains, the implementation guarantees that the chain
links will not be corrupted and that chain callers will not get messed up by
entries being added or removed. For raw chains the implementation provides no
guarantees at all; users of this API must provide their own protections. (The
idea was that situations may come up where the assumptions of the atomic and
blocking APIs are not appropriate, so it should be possible for users to
handle these things in their own way.)

There are some limitations, which should not be too hard to live with. For
atomic/blocking chains, registration and unregistration must always be done in
a process context since the chain is protected by a mutex/rwsem. Also, a
callout routine for a non-raw chain must not try to register or unregister
entries on its own chain. (This did happen in a couple of places and the code
had to be changed to avoid it.)

Since atomic chains may be called from within an NMI handler, they cannot use
spinlocks for synchronization. Instead we use RCU. The overhead falls almost
entirely in the unregister routine, which is okay since unregistration is much
less frequent that calling a chain.

Here is the list of chains that we adjusted and their classifications. None
of them use the raw API, so for the moment it is only a placeholder.

ATOMIC CHAINS
-------------
arch/i386/kernel/traps.c: i386die_chain
arch/ia64/kernel/traps.c: ia64die_chain
arch/powerpc/kernel/traps.c: powerpc_die_chain
arch/sparc64/kernel/traps.c: sparc64die_chain
arch/x86_64/kernel/traps.c: die_chain
drivers/char/ipmi/ipmi_si_intf.c: xaction_notifier_list
kernel/panic.c: panic_notifier_list
kernel/profile.c: task_free_notifier
net/bluetooth/hci_core.c: hci_notifier
net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_chain
net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_expect_chain
net/ipv6/addrconf.c: inet6addr_chain
net/netfilter/nf_conntrack_core.c: nf_conntrack_chain
net/netfilter/nf_conntrack_core.c: nf_conntrack_expect_chain
net/netlink/af_netlink.c: netlink_chain

BLOCKING CHAINS
---------------
arch/powerpc/platforms/pseries/reconfig.c: pSeries_reconfig_chain
arch/s390/kernel/process.c: idle_chain
arch/x86_64/kernel/process.c idle_notifier
drivers/base/memory.c: memory_chain
drivers/cpufreq/cpufreq.c cpufreq_policy_notifier_list
drivers/cpufreq/cpufreq.c cpufreq_transition_notifier_list
drivers/macintosh/adb.c: adb_client_list
drivers/macintosh/via-pmu.c sleep_notifier_list
drivers/macintosh/via-pmu68k.c sleep_notifier_list
drivers/macintosh/windfarm_core.c wf_client_list
drivers/usb/core/notify.c usb_notifier_list
drivers/video/fbmem.c fb_notifier_list
kernel/cpu.c cpu_chain
kernel/module.c module_notify_list
kernel/profile.c munmap_notifier
kernel/profile.c task_exit_notifier
kernel/sys.c reboot_notifier_list
net/core/dev.c netdev_chain
net/decnet/dn_dev.c: dnaddr_chain
net/ipv4/devinet.c: inetaddr_chain

It's possible that some of these classifications are wrong. If they are,
please let us know or submit a patch to fix them. Note that any chain that
gets called very frequently should be atomic, because the rwsem read-locking
used for blocking chains is very likely to incur cache misses on SMP systems.
(However, if the chain's callout routines may sleep then the chain cannot be
atomic.)

The patch set was written by Alan Stern and Chandra Seetharaman, incorporating
material written by Keith Owens and suggestions from Paul McKenney and Andrew
Morton.

[jes@sgi.com: restructure the notifier chain initialization macros]
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
f71e130966ba429dbd24be08ddbcdf263df9a5ad 04-Mar-2006 Arjan van de Ven <arjan@infradead.org> Massive net driver const-ification.
8f903c708fcc2b579ebf16542bf6109bad593a1d 22-Feb-2006 Jay Vosburgh <fubar@us.ibm.com> [PATCH] bonding: suppress duplicate packets

Originally submitted by Kenzo Iwami; his original description is:

The current bonding driver receives duplicate packets when broadcast/
multicast packets are sent by other devices or packets are flooded by the
switch. In this patch, new flags are added in priv_flags of net_device
structure to let the bonding driver discard duplicate packets in
dev.c:skb_bond().

Modified by Jay Vosburgh to change a define name, update some
comments, rearrange the new skb_bond() for clarity, clear all bonding
priv_flags on slave release, and update the driver version.

Signed-off-by: Kenzo Iwami <k-iwami@cj.jp.nec.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
f5e2a7b22e7d7dfda8794906d0fddeaaa09bb944 08-Feb-2006 Jay Vosburgh <fubar@us.ibm.com> [PATCH] bonding: fix a locking bug in bond_release

bond_release returns EINVAL without releasing the bond lock if the
slave device is not being bonded by the bond. The following patch
ensures that the lock is released in this case.

Signed-off-by: Stephen J. Bevan <stephen@dino.dnsalias.com>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
a0de3adf8f4e5618c5bd62db08ed293042c8e454 31-Jan-2006 Jay Vosburgh <fubar@us.ibm.com> [PATCH] bonding: allow bond to use TSO if slaves support it

Add NETIF_F_TSO (NETIF_F_UFO) to BOND_INTERSECT_FEATURES so that it can
be used by a bonding device iff all its slave devices support TSO (UFO).

Signed-off-by: Arthur Kepner <akepner@sgi.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
6a986ce45d45b099ddf676c340267765e76db91e 20-Jan-2006 Eric Sesterhenn <snakebyte@gmx.de> [PATCH] bonding: fix ->get_settings error checking

Since get_settings() returns a signed int and it gets checked
for < 0 to catch an error, res should be a signed int too.

Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2e06cb5859fdaeba0529806eb1bf161ffd0db201 28-Nov-2005 Jeff Garzik <jgarzik@pobox.com> [bonding] Remove superfluous changelog.

No need to record this information in source code, its all in the git
repository, and kernel archives.
691b73b13220886aefacb7c7f7ace7f528bbf800 09-Nov-2005 Mitch Williams <mitch.a.williams@intel.com> [PATCH] bonding: comments and changelog

Bonding source files still have changelogs in the comments. This, then,
is an update to that changelog.

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
e944ef79184ff7f283e7bf79496d2873a0b0410b 09-Nov-2005 Mitch Williams <mitch.a.williams@intel.com> [PATCH] bonding: spelling and whitespace corrections

Minor spelling and whitespace corrections.

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
b76cdba9cdb29b091cacb4c11534ffb2eac02f64 09-Nov-2005 Mitch Williams <mitch.a.williams@intel.com> [PATCH] bonding: add sysfs functionality to bonding (large)

This large patch adds sysfs functionality to the channel bonding module.
Bonds can be added, removed, and reconfigured at runtime without having
to reload the module. Multiple bonds with different configurations are
easily configured, and ifenslave is no longer required to configure bonds.

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
4756b02f558cbbbef5ae278fd3bbed778458c124 09-Nov-2005 Mitch Williams <mitch.a.williams@intel.com> [PATCH] bonding: add ARP entries to /proc

Make the /proc files show which ARP targets are in use by each bond.

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
6b780567223524cac86c745aeac425521cf37490 09-Nov-2005 Mitch Williams <mitch.a.williams@intel.com> [PATCH] bonding: Allow ARP target table to have empty entries

With the sysfs interface, the user can remove entries from the ARP table
at runtime. The ARP monitor code now allows for empty entries in the
table.

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
3c535952d86df83f817595068c9fd2b3cfbd3a4d 09-Nov-2005 Mitch Williams <mitch.a.williams@intel.com> [PATCH] bonding: make bond_init not __init

The sysfs interface can create bonds at runtime, and __init code goes away
after module init.

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
dfe60397a62b1a5ebc7f05fd65463d3e29397677 09-Nov-2005 Mitch Williams <mitch.a.williams@intel.com> [PATCH] bonding: move bond creation into separate function

The sysfs interface can create bonds at runtime, so we need a separate
function to do this, instead of just doing it in the module init code.

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
a77b53258d76513c37e766dc0db1fc9db7c4ac1e 09-Nov-2005 Mitch Williams <mitch.a.williams@intel.com> [PATCH] bonding: make functions not static

The sysfs code needs access these functions, so make them
not static, and move the protos to the header file.

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
12479f9a823dc7d791f198af2d3e4efae418a65e 09-Nov-2005 Mitch Williams <mitch.a.williams@intel.com> [PATCH] bonding: expose some structs

The sysfs code needs to know what these structs look like, so make them
not static, and move the definition to the header.

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
0f418b2ac49e97b7b763e0473320a201eec15ed3 09-Nov-2005 Mitch Williams <mitch.a.williams@intel.com> [PATCH] bonding: get slave name from actual slave instead of param list

Take the primary slave name shown in /proc from the actual slave dev
instead of from the command-line parameter, which won't be present
if the bond is created via sysfs.

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
c61b75ad03f3a30ef247cac27406f030c10628b0 09-Nov-2005 Mitch Williams <mitch.a.williams@intel.com> [PATCH] bonding: Add transmit policy to /proc

Adds information about the recently-added transmit policy setting to each
bond's /proc file.

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2ac47660f9b4d0ea1a2ab9becba03c14ef5d9b99 09-Nov-2005 Mitch Williams <mitch.a.williams@intel.com> [PATCH] bonding: expand module param descriptions

Expand and correct the parameter descriptions shown by modinfo.

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
4e0952c74ee450ded86e8946ce58ea8dfd05b007 09-Nov-2005 Mitch Williams <mitch.a.williams@intel.com> [PATCH] bonding: add bond name to all error messages

Add the bond name to all error messages so we can tell which one is
complaining. Also reformats some error messages to be more consistent.

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
8e3babcd69ec0fde874838e276eb0b211c6a5647 05-Nov-2005 Jay Vosburgh <fubar@us.ibm.com> [PATCH] bonding: fix feature consolidation

This should resolve http://bugzilla.kernel.org/show_bug.cgi?id=5519

The current feature computation loses bits that it doesn't know about,
resulting in an inability to add VLANs and possibly other havoc.
Rewrote function to preserve bits it doesn't know about, remove an
unneeded state variable, and simplify the code.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
df49898a47061e82219c991dfbe9ac6ddf7a866b 19-Oct-2005 John W. Linville <linville@tuxdriver.com> [PATCH] bonding: cleanup comment for mode 1 IGMP xmit hack

Expand comment explaining MAC address selection for replicated IGMP
frames transmitted in bonding mode 1 (active-backup). Also, a small
whitespace cleanup.

Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
dd0fc66fb33cd610bc1a5db8a5e232d34879b4d7 07-Oct-2005 Al Viro <viro@ftp.linux.org.uk> [PATCH] gfp flags annotations - part 1

- added typedef unsigned int __nocast gfp_t;

- replaced __nocast uses for gfp flags with gfp_t - it gives exactly
the same warnings as far as sparse is concerned, doesn't change
generated code (from gcc point of view we replaced unsigned int with
typedef) and documents what's going on far better.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
de54f3907d2f5d8e25cfafe513811f146b250dee 05-Oct-2005 Randy Dunlap <rdunlap@xenotime.net> [BONDING]: fix sparse gfp nocast warnings

Fix implicit nocast warnings in bonding code:
drivers/net/bonding/bond_main.c:1302:49: warning: implicit cast to nocast type

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
075897ce3b1027fccb98f36dd1f18c07f5c374ef 28-Sep-2005 John W. Linville <linville@tuxdriver.com> [PATCH] bonding: replicate IGMP traffic in activebackup mode

Replicate IGMP frames across all slaves in activebackup mode. This
ensures fail-over is rapid for multicast traffic as well. Otherwise,
multicast traffic will be lost until the next IGMP membership report
poll timeout.

This is conceptually similar to the treatment of IGMP traffic in
bond_alb_xmit. In that case, IGMP traffic transmitted on any slave
is re-routed to the active slave in order to ensure that multicast
traffic continues to be directed to the active receiver.

Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
217df670d9a4da036d68b22500ac06128811d5c8 27-Sep-2005 Jay Vosburgh <fubar@us.ibm.com> [PATCH] fix bonding crash, remove old ABI support

David S. Miller <davem@davemloft.net> wrote:
>I think removing support for older ifenslave binaries is
>the least painful solution to this problem.

This patch removes backwards compatibility for old ifenslave
binaries (ifenslave prior to verison 1.0.0).

I did not similarly modify ifenslave itself; with sysfs on the
horizon, I don't see that as being worthwhile.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
e5ed639913eea3e4783a550291775ab78dd84966 03-Oct-2005 Herbert Xu <herbert@gondor.apana.org.au> [IPV4]: Replace __in_dev_get with __in_dev_get_rcu/rtnl

The following patch renames __in_dev_get() to __in_dev_get_rtnl() and
introduces __in_dev_get_rcu() to cover the second case.

1) RCU with refcnt should use in_dev_get().
2) RCU without refcnt should use __in_dev_get_rcu().
3) All others must hold RTNL and use __in_dev_get_rtnl().

There is one exception in net/ipv4/route.c which is in fact a pre-existing
race condition. I've marked it as such so that we remember to fix it.

This patch is based on suggestions and prior work by Suzanne Wood and
Paul McKenney.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
552709d5aee9145f325bf07348fb299e84b2e5b3 21-Sep-2005 nsxfreddy@gmail.com <nsxfreddy@gmail.com> [PATCH] bonding: Fix link monitor capability check (was skge: set mac address oops with bonding)

Fix bond_enslave link monitoring warning to check use_carrier status
and ethtool_ops in addition to do_ioctl. This version checks ethtool_ops
as well as do_ioctl, and also uses the per-bond params.use_carrier
instead of the global use_carrier.

Signed-off-by: Jason R. Martin <nsxfreddy@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
40abc27066c49b2c13c817154d438431b0303b96 18-Sep-2005 Florin Malita <fmalita@gmail.com> [BOND]: Fix bond_init() error path handling.

From: Florin Malita <fmalita@gmail.com>

bond_init() is not releasing rtnl_sem after register_netdevice() and before
calling unregister_netdevice() (from bond_free_all()) in the exception
path. As the device registration is not completed (dev->reg_state ==
NETREG_REGISTERING), the call to unregister_netdevice() triggers
BUG_ON(dev->reg_state != NETREG_REGISTERED).

Signed-off-by: Florin Malita <fmalita@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
ed4b9f8014db4f343e89b44b7c5ca355f439ce36 14-Sep-2005 Jay Vosburgh <fubar@us.ibm.com> [PATCH] bonding: plug reference count leak

Bonding leaks route structures when the ARP monitor is
configured to send probes over VLANs.

Originally reported by Ian Abel <ian.abel@mxtelecom.com>; his
original fix was modified by Jay Vosburgh to correct coding style and to
close a leak it missed.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
8531c5ffbca65f6df868637c26e6df6f88bff738 23-Aug-2005 Arthur Kepner <akepner@sgi.com> [PATCH] bonding: inherit zero-copy flags of slaves

This change allows a bonding device to inherit the "zero-copy"
features of its slave devices.

It was inspired by a couple of previous postings on this topic:
http://marc.theaimsgroup.com/?l=bonding-devel&m=111924607327794&w=2
http://marc.theaimsgroup.com/?l=bonding-devel&m=111925242706297&w=2
and it's largely a combination of the patches that appear in those
emails.

Signed-off-by: Arthur Kepner <akepner@sgi.com>
169a3e66637c667b43dab7c319ffd5c99804cad8 26-Jun-2005 Jay Vosburgh <fubar@us.ibm.com> bonding: xor/802.3ad improved slave hash

Add support for alternate slave selection algorithms to bonding
balance-xor and 802.3ad modes. Default mode (what we have now: xor of
MAC addresses) is "layer2", new choice is "layer3+4", using IP and port
information for hashing to select peer.

Originally submitted by Jason Gabler for balance-xor mode;
modified by Jay Vosburgh to additionally support 802.3ad mode. Jason's
original comment is as follows:

The attached patch to the Linux Etherchannel Bonding driver modifies the
driver's "balance-xor" mode as follows:

- alternate hashing policy support for mode 2
* Added kernel parameter "xmit_policy" to allow the specification
of different hashing policies for mode 2. The original mode 2
policy is the default, now found in xmit_hash_policy_layer2().
* Added xmit_hash_policy_layer34()

This patch was inspired by hashing policies implemented by Cisco,
Foundry and IBM, which are explained in
Foundry documentation found at:
http://www.foundrynet.com/services/documentation/sribcg/Trunking.html#112750

Signed-off-by: Jason Gabler <jygabler@lbl.gov>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
c3ade5cad07f4d67f2e16a28f3c73d9483a55e0e 26-Jun-2005 Jay Vosburgh <fubar@us.ibm.com> bonding: gratuitous ARP

Add support for generating gratuitous ARPs in bonding
active-backup mode when failovers occur. Includes support for VLAN
tagging the ARPs as needed.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
2f872f0401d4b470990864fbf99c19130f25ad4d 26-May-2005 Jay Vosburgh <fubar@us.ibm.com> [BONDING]: bonding using arp_ip_target may stay down with active path

Correcting the list traversal makes the problem go away.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 17-Apr-2005 Linus Torvalds <torvalds@ppc970.osdl.org> Linux-2.6.12-rc2

Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!