
History log of /drivers/md/bcache/btree.c
Revision Date Author Comments
2452cc89063a2a6890368f185c4b6d7d8802179e 12-Jul-2014 Slava Pestov <sp@daterainc.com> bcache: try to set b->parent properly

bcache_flash_dev.ktest would reliably crash with 8k and 16k bucket size
before; now it passes.

Change-Id: Ib542232235e39298c3a7548fe52b645cabb823d1
400ffaa2acd72274e2c7293a9724382383bebf3e 13-Jul-2014 Slava Pestov <sp@daterainc.com> bcache: fix use-after-free in btree_gc_coalesce()

If we goto out_nocoalesce after we free new_nodes[0], we end up freeing
new_nodes[0] again. This was generating a lockdep warning. The fix is
to set new_nodes[0] to NULL, since the out_nocoalesce path safely
ignores NULL entries in the new_nodes array.
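
A minimal sketch of the fix and the cleanup path it protects (abbreviated
from the surrounding gc code):

	btree_node_free(new_nodes[0]);
	rw_unlock(true, new_nodes[0]);
	new_nodes[0] = NULL;	/* so the error path can't free it again */

out_nocoalesce:
	for (i = 0; i < nodes; i++)
		if (!IS_ERR_OR_NULL(new_nodes[i])) {
			btree_node_free(new_nodes[i]);
			rw_unlock(true, new_nodes[i]);
		}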

This regression was introduced in 2d7f9531.

Change-Id: I76564d7257800583214376b4bacf236cda90c89c
913dc33fb2720fb5f979011664294137ddd8b13b 23-May-2014 Slava Pestov <sp@daterainc.com> bcache: fix crash in bcache_btree_node_alloc_fail tracepoint

'b' was NULL.

Change-Id: Icac0fd04afa2d23f213d96d51afd53374e6dd0c0
501d52a90cbe652b41336c206ff0e95799d5a9b5 19-May-2014 Kent Overstreet <kmo@daterainc.com> bcache: Allocate bounce buffers with GFP_NOWAIT

There's no point in blocking on these allocations, since our fallback paths will
probably go faster than blocking.
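
A minimal sketch of the pattern, with a hypothetical fallback helper:

	/* don't sleep; the fallback is probably faster than blocking: */
	void *bounce = kmalloc(size, GFP_NOWAIT);

	if (!bounce)
		return do_fallback(b);	/* hypothetical slow path */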

Change-Id: I733ca202c25cb36bde02607a0a60552229a4241c
bcf090e0040e30f8409e6a535a01e6473afb096f 19-May-2014 Kent Overstreet <kmo@daterainc.com> bcache: Make sure to pass GFP_WAIT to mempool_alloc()

This was very wrong - mempool_alloc() only guarantees success with GFP_WAIT.
bcache uses GFP_NOWAIT in various other places where we have a fallback;
circuits must've gotten crossed when writing this code or something.
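
In other words, a sketch of the two call patterns (GFP_NOIO implies
__GFP_WAIT on kernels of this era):

	/* may return NULL - only correct where a fallback exists: */
	ptr = mempool_alloc(pool, GFP_NOWAIT);

	/* __GFP_WAIT set: sleeps until an element is freed back to the
	 * pool, so this is guaranteed to succeed: */
	ptr = mempool_alloc(pool, GFP_NOIO);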

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
c5aa4a3157b55bdca18dd2a9d9f43314470b6d32 22-Apr-2014 Slava Pestov <sp@daterainc.com> bcache: wait for buckets when allocating new btree root

Tested:
- sometimes bcache_tier test would hang on startup with a failure
to allocate the btree root -- no longer seeing this

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
3a2fd9d5090b83aab85378a846fa10f39b0b5aa7 28-Feb-2014 Kent Overstreet <kmo@daterainc.com> bcache: Kill bucket->gc_gen

gc_gen was a temporary used to recalculate last_gc, but since we only need
bucket->last_gc when gc isn't running (gc_mark_valid = 1), we can just update
last_gc directly.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
2531d9ee61fa08a5a9ab8f002c50779888d232c7 18-Mar-2014 Kent Overstreet <kmo@daterainc.com> bcache: Kill unused freelist

This was originally added as an optimization that for various reasons isn't
needed anymore, but it does add a lot of nasty corner cases (and it was
responsible for some recently fixed bugs). Just get rid of it now.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
0a63b66db566cffdf90182eb6e66fdd4d0479e63 18-Mar-2014 Kent Overstreet <kmo@daterainc.com> bcache: Rework btree cache reserve handling

This changes the bucket allocation reserves to use _real_ reserves - separate
freelists - instead of watermarks, which if nothing else makes the current code
saner to reason about and is going to be important in the future when we add
support for multiple btrees.

It also adds btree_check_reserve(), which checks (and locks) the reserves for
both bucket allocation and memory allocation for btree nodes. The old code just
kinda sorta assumed that since (e.g. for btree node splits) it had the root
locked, no other threads could try to make use of the same reserve. That
technically should have been ok for memory allocation (we should always have a
memory allocation reserve, since the btree node cache is used as a reserve and
we preallocate it), but multiple btrees will mean that locking the root won't
be sufficient anymore, and for the bucket allocation reserve it was technically
possible for the old code to deadlock.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
56b30770b27d54d68ad51eccc6d888282b568cee 23-Jan-2014 Kent Overstreet <kmo@daterainc.com> bcache: Kill btree_io_wq

With the locking rework in the last patch, this shouldn't be needed anymore -
btree_node_write_work() only takes b->write_lock which is never held for very
long.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
2a285686c109816ba71a00b9278262cf02648258 05-Mar-2014 Kent Overstreet <kmo@daterainc.com> bcache: btree locking rework

Add a new lock, b->write_lock, which is required to actually modify - or write -
a btree node; this lock is only held for short durations.

This means we can write out a btree node without taking b->lock, which _is_ held
for long durations - solving a deadlock when btree_flush_write() (from the
journalling code) is called with a btree node locked.

Right now this just occurs in bch_btree_set_root(), but with an upcoming
journalling rework it's going to happen a lot more.

This also means b->lock is now more of a read/intent lock instead of a
read/write lock - but not completely, since it still blocks readers. It may be
turned into a real intent lock at some point in the future.
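
Sketched, the write-out path becomes a short critical section on the new
lock (roughly the shape of the deferred-write worker after this patch;
names approximate):

	mutex_lock(&b->write_lock);
	if (btree_node_dirty(b))
		__bch_btree_node_write(b, NULL);	/* short hold time */
	mutex_unlock(&b->write_lock);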

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
05335cff9f01555b769ac97b7bacc472b7ed047a 18-Mar-2014 Kent Overstreet <kmo@daterainc.com> bcache: Fix a race when freeing btree nodes

This isn't a bulletproof fix; btree_node_free() -> bch_bucket_free() puts the
bucket on the unused freelist, where it can be reused right away without any
ordering requirements. It would be better to wait on at least a journal write to
go down before reusing the bucket. bch_btree_set_root() does this, and inserting
into non leaf nodes is completely synchronous so we should be ok, but future
patches are just going to get rid of the unused freelist - it was needed in the
past for various reasons but shouldn't be anymore.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
4fe6a816707aace9e8e297b708411c5930537793 13-Mar-2014 Kent Overstreet <kmo@daterainc.com> bcache: Add a real GC_MARK_RECLAIMABLE

This means the garbage collection code can better check for data and metadata
pointers to the same buckets.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
3f5e0a34daed197aa55d0c6b466bb4cd03babb4f 23-Jan-2014 Kent Overstreet <kmo@daterainc.com> bcache: Kill dead cgroup code

This hasn't been used or even enabled in ages.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
487dded86ea065317aea121bec8f1816f2f235c9 17-Mar-2014 Kent Overstreet <kmo@daterainc.com> bcache: Fix another bug recovering from unclean shutdown

The on disk bucket gens are allowed to be out of date when we reuse buckets
that didn't have any live data in them. To deal with this, the initial gc has to
update the bucket gen when we find a pointer gen newer than the bucket's gen.

Unfortunately we weren't doing this for pointers in the journal that we're about
to replay.
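
A sketch of the update, using bcache's existing pointer/bucket helpers
(abbreviated from the fix):

	struct bucket *g = PTR_BUCKET(c, k, i);

	/* the pointer gen is newer, so the on-disk bucket gen was stale: */
	if (gen_after(PTR_GEN(k, i), g->gen))
		g->gen = PTR_GEN(k, i);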

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
0bd143fd800055b1db756693289bbebdb93f2a73 05-Mar-2014 Kent Overstreet <kmo@daterainc.com> bcache: Fix a bug recovering from unclean shutdown

The code to fix up incorrect bucket prios incorrectly did not skip btree node
freeing keys.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
3572324af0f4ef877545e5a17bd3e788551f166a 11-Jan-2014 Kent Overstreet <kmo@daterainc.com> bcache: Minor fixes from kbuild robot

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
947174476701fbc84ea8c7ec9664270f9d80b076 29-Jan-2014 Darrick J. Wong <darrick.wong@oracle.com> bcache: fix BUG_ON due to integer overflow with GC_SECTORS_USED

The BUG_ON at the end of __bch_btree_mark_key can be triggered due to
an integer overflow error:

BITMASK(GC_SECTORS_USED, struct bucket, gc_mark, 2, 13);
...
SET_GC_SECTORS_USED(g, min_t(unsigned,
			     GC_SECTORS_USED(g) + KEY_SIZE(k),
			     (1 << 14) - 1));
BUG_ON(!GC_SECTORS_USED(g));

In bcache.h, the SECTORS_USED bitfield is defined to be 13 bits wide.
While the SET_ code tries to ensure that the field doesn't overflow by
clamping it to (1<<14)-1 == 16383, this is incorrect because 16383
requires 14 bits. Therefore, if GC_SECTORS_USED() + KEY_SIZE() =
8192, the SET_ statement tries to store 8192 into a 13-bit field. In
a 13-bit field, 8192 becomes zero, thus triggering the BUG_ON.

Therefore, create a field width constant and a max value constant, and
use those to create the bitfield and check the inputs to
SET_GC_SECTORS_USED. Arguably the BITMASK() template ought to have
BUG_ON checks for too-large values, but that's a separate patch.
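
Reconstructed from the description above, the fix looks roughly like this
(constant names as the commit describes creating them):

	#define GC_SECTORS_USED_SIZE	13
	#define MAX_GC_SECTORS_USED	(~(~0ULL << GC_SECTORS_USED_SIZE))

	BITMASK(GC_SECTORS_USED, struct bucket, gc_mark, 2, GC_SECTORS_USED_SIZE);

	/* clamp to the real field maximum, not (1 << 14) - 1: */
	SET_GC_SECTORS_USED(g, min_t(unsigned,
				     GC_SECTORS_USED(g) + KEY_SIZE(k),
				     MAX_GC_SECTORS_USED));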

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
3b3e9e50dd951725130645660b526c4f367dcdee 07-Dec-2013 Kent Overstreet <kmo@daterainc.com> bcache: Don't return -EINTR when insert finished

We need to return -EINTR after a split because we invalidated iterators
(and freed the btree node) - but if we were finished inserting, we don't
want to redo the traversal.
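
Sketched, the check becomes roughly ('split' is an illustrative flag;
bch_keylist_empty() is the existing helper):

	/* a split invalidated our iterators (and freed the node), but
	 * only restart the traversal if keys remain to be inserted: */
	if (split && !bch_keylist_empty(insert_keys))
		return -EINTR;

	return 0;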

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
829a60b9055c319f3656a01eb8cb78b1b86232ef 12-Nov-2013 Kent Overstreet <kmo@daterainc.com> bcache: Move insert_fixup() to btree_keys_ops

Now handling overlapping extents/keys is a method that's specific to what the
btree node contains.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
89ebb4a28ba9efb5c9b18ba552e784021957b14a 12-Nov-2013 Kent Overstreet <kmo@daterainc.com> bcache: Convert sorting to btree_keys

More work to disentangle various code from struct btree

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
dc9d98d621bdce0552997200ce855659875a5c9f 18-Dec-2013 Kent Overstreet <kmo@daterainc.com> bcache: Convert debug code to btree_keys

More work to disentangle various code from struct btree

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
c052dd9a26f60bcf70c0c3fcc08e07abb60295cd 12-Nov-2013 Kent Overstreet <kmo@daterainc.com> bcache: Convert btree_iter to struct btree_keys

More work to disentangle bset.c from struct btree

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
59158fde429fb5d18064e2734b3dd5e6048affbd 12-Nov-2013 Kent Overstreet <kmo@daterainc.com> bcache: Add bch_btree_keys_u64s_remaining()

Helper function to explicitly check how much space is free in a btree node

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
a85e968e66a175c86d0410719ea84a5bd0f1d070 21-Dec-2013 Kent Overstreet <kmo@daterainc.com> bcache: Add struct btree_keys

Soon, bset.c won't need to depend on struct btree.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
65d45231b56efb3db51eb441e2c68f8252ecdd12 21-Dec-2013 Kent Overstreet <kmo@daterainc.com> bcache: Abstract out stuff needed for sorting

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
ee811287c9f241641899788cbfc9d70ed96ba3a5 18-Dec-2013 Kent Overstreet <kmo@daterainc.com> bcache: Rename/shuffle various code around

More work to disentangle bset.c from the rest of the code.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
67539e85289c14a76a1c4162613d14a5f05a0027 11-Sep-2013 Kent Overstreet <kmo@daterainc.com> bcache: Add struct bset_sort_state

More disentangling bset.c from the rest of the bcache code - soon, the
sorting routines won't have any dependencies on any outside structs.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
911c9610099f26e9e6ea3d1962ce24f53890b163 29-Jul-2013 Kent Overstreet <kmo@daterainc.com> bcache: Split out sort_extent_cmp()

Only use extent comparison for comparing extents, so we're not using
START_KEY() on other key types (i.e. btree pointers)

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
fafff81cead78157099df1ee10af16cc51893ddc 18-Dec-2013 Kent Overstreet <kmo@daterainc.com> bcache: Bkey indexing renaming

More refactoring:

node() -> bset_bkey_idx()
end() -> bset_bkey_last()

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
085d2a3dd4d65b7bce1dead987c647dbbc014281 12-Nov-2013 Kent Overstreet <kmo@daterainc.com> bcache: Make bch_keylist_realloc() take u64s, not nptrs

Getting away from KEY_PTRS and moving toward KEY_U64s - and getting rid of magic
2s

Also - split out the part that checks against journal entry size so as to avoid
a dependency on struct cache_set in bset.c

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
78b77bf8b20431f8ad8a4db7e3120103bd922337 18-Dec-2013 Kent Overstreet <kmo@daterainc.com> bcache: Btree verify code improvements

Used this fixed code to find and fix the bug fixed by
a4d885097b0ac0cd1337f171f2d4b83e946094d4.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
88b9f8c426f35e04738220c1bc05dd1ea1b513a3 18-Dec-2013 Kent Overstreet <kmo@daterainc.com> bcache: kill index()

That was a terrible name for a macro, add some better helpers to replace it.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
5f5837d2d650db25b9153b91535e67a96b265f58 17-Dec-2013 Kent Overstreet <kmo@daterainc.com> bcache: Do bkey_put() in btree_split() error path

This error path shouldn't have been hit in practice... and we've got reworked
reserve code coming soon so that it shouldn't _ever_ be hit... but if we've got
code for this error path it should be correct.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
78365411b344df35a198b119133e6515c2dcfb9f 17-Dec-2013 Kent Overstreet <kmo@daterainc.com> bcache: Rework allocator reserves

We need a reserve for allocating buckets for new btree nodes - and now that
we've got multiple btrees, it really needs to be per btree.

This reworks the reserves so we've got separate freelists for each reserve
instead of watermarks, which seems to make things a bit cleaner, and it adds
some code so that btree_split() can make sure the reserve is available before it
starts.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
cb7a583e6a6ace661a5890803e115d2292a293df 17-Dec-2013 Kent Overstreet <kmo@daterainc.com> bcache: kill closure locking usage

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
b0f32a56f27eb0df4124dbfc8eb6f09f423eed99 10-Dec-2013 Kent Overstreet <kmo@daterainc.com> bcache: Minor btree cache fix

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
bf0a628a95dba7f983b6047cea695fb066fb2512 27-Nov-2013 Nicholas Swenson <nks@daterainc.com> bcache: fix for gc and writeback race

Garbage collector needs to check keys in the writeback keybuf to
make sure it's not invalidating buckets to which the writeback
keys point.

Signed-off-by: Nicholas Swenson <nks@daterainc.com>
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
d24a6e1087030b6da286df9433add5fa2f21b83b 11-Nov-2013 Kent Overstreet <kmo@daterainc.com> bcache: Fix dirty_data accounting

Dirty data accounting wasn't quite right - firstly, we were adding the key we're
inserting after it could have merged with another dirty key already in the
btree, and secondly we could sometimes pass the wrong offset to
bcache_dev_sectors_dirty_add() for dirty data we were overwriting - which is
important when tracking dirty data by stripe.

NOTE FOR BACKPORTERS: For 3.10 (and 3.11?) there's other accounting fixes
necessary that got squashed in with other patches; the full patch against 3.10
is 408cc2f47eeac93a, available at:
git://evilpiepirate.org/~kent/linux-bcache.git bcache-3.10-writeback-fixes

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: linux-stable <stable@vger.kernel.org> # >= v3.10

diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 2a46036..4a12b2f 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -1817,7 +1817,8 @@ static bool fix_overlapping_extents(struct btree *b, struct bkey *insert,
 			if (KEY_START(k) > KEY_START(insert) + sectors_found)
 				goto check_failed;
 
-			if (KEY_PTRS(replace_key) != KEY_PTRS(k))
+			if (KEY_PTRS(k) != KEY_PTRS(replace_key) ||
+			    KEY_DIRTY(k) != KEY_DIRTY(replace_key))
 				goto check_failed;
 
 			/* skip past gen */
08239ca2a053dbc3b082916bdfbd88e5a9ad9267 28-Nov-2013 Wei Yongjun <yongjun_wei@trendmicro.com.cn> bcache: fix sparse non static symbol warning

Fixes the following sparse warning:

drivers/md/bcache/btree.c:2220:5: warning:
symbol 'btree_insert_fn' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
7988613b0e5b2638caf6cd493cc78e9595eba19c 24-Nov-2013 Kent Overstreet <kmo@daterainc.com> block: Convert bio_for_each_segment() to bvec_iter

More prep work for immutable biovecs - with immutable bvecs drivers
won't be able to use the biovec directly, they'll need to use helpers
that take into account bio->bi_iter.bi_bvec_done.

This updates callers for the new usage without changing the
implementation yet.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "Ed L. Cashin" <ecashin@coraid.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Paul Clements <Paul.Clements@steeleye.com>
Cc: Jim Paris <jim@jtan.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Yehuda Sadeh <yehuda@inktank.com>
Cc: Sage Weil <sage@inktank.com>
Cc: Alex Elder <elder@inktank.com>
Cc: ceph-devel@vger.kernel.org
Cc: Joshua Morris <josh.h.morris@us.ibm.com>
Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Neil Brown <neilb@suse.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: linux390@de.ibm.com
Cc: Nagalakshmi Nandigama <Nagalakshmi.Nandigama@lsi.com>
Cc: Sreekanth Reddy <Sreekanth.Reddy@lsi.com>
Cc: support@lsi.com
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Guo Chao <yan@linux.vnet.ibm.com>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Quoc-Son Anh <quoc-sonx.anh@intel.com>
Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Jan Kara <jack@suse.cz>
Cc: linux-m68k@lists.linux-m68k.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: drbd-user@lists.linbit.com
Cc: nbd-general@lists.sourceforge.net
Cc: cbe-oss-dev@lists.ozlabs.org
Cc: xen-devel@lists.xensource.com
Cc: virtualization@lists.linux-foundation.org
Cc: linux-raid@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: DL-MPTFusionLinux@lsi.com
Cc: linux-scsi@vger.kernel.org
Cc: devel@driverdev.osuosl.org
Cc: linux-fsdevel@vger.kernel.org
Cc: cluster-devel@redhat.com
Cc: linux-mm@kvack.org
Acked-by: Geoff Levand <geoff@infradead.org>
4f024f3797c43cb4b73cd2c50cec728842d0e49e 12-Oct-2013 Kent Overstreet <kmo@daterainc.com> block: Abstract out bvec iterator

Immutable biovecs are going to require an explicit iterator. To
implement immutable bvecs, a later patch is going to add a bi_bvec_done
member to this struct; for now, this patch effectively just renames
things.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "Ed L. Cashin" <ecashin@coraid.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Yehuda Sadeh <yehuda@inktank.com>
Cc: Sage Weil <sage@inktank.com>
Cc: Alex Elder <elder@inktank.com>
Cc: ceph-devel@vger.kernel.org
Cc: Joshua Morris <josh.h.morris@us.ibm.com>
Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Neil Brown <neilb@suse.de>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: dm-devel@redhat.com
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: linux390@de.ibm.com
Cc: Boaz Harrosh <bharrosh@panasas.com>
Cc: Benny Halevy <bhalevy@tonian.com>
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Chris Mason <chris.mason@fusionio.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Dave Kleikamp <shaggy@kernel.org>
Cc: Joern Engel <joern@logfs.org>
Cc: Prasad Joshi <prasadjoshi.linux@gmail.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Ben Myers <bpm@sgi.com>
Cc: xfs@oss.sgi.com
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Guo Chao <yan@linux.vnet.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Cc: "Roger Pau Monné" <roger.pau@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <Ian.Campbell@citrix.com>
Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Jerome Marchand <jmarchand@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Peng Tao <tao.peng@emc.com>
Cc: Andy Adamson <andros@netapp.com>
Cc: fanchaoting <fanchaoting@cn.fujitsu.com>
Cc: Jie Liu <jeff.liu@oracle.com>
Cc: Sunil Mushran <sunil.mushran@gmail.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Namjae Jeon <namjae.jeon@samsung.com>
Cc: Pankaj Kumar <pankaj.km@samsung.com>
Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
Cc: Mel Gorman <mgorman@suse.de>
48a915a87f0bd98c3d68d029acf223a2e5116f07 31-Oct-2013 Kent Overstreet <kmo@daterainc.com> bcache: Better full stripe scanning

The old scanning-by-stripe code burned too much CPU, this should be
better.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
17e21a9f248d3d330acdfb2405c23b8d84c9c23a 26-Jul-2013 Kent Overstreet <kmo@daterainc.com> bcache: Have btree_split() insert into parent directly

The flow control in btree_insert_node() was... fragile... before;
this'll use more stack (but since our btrees are never more than depth
1, that shouldn't matter) and it should be significantly clearer and
less fragile.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
65d22e911bfc4f46cda4751f1b1926b43c316c14 31-Jul-2013 Kent Overstreet <kmo@daterainc.com> bcache: Move spinlock into struct time_stats

Minor cleanup.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
50310164bcd789eb3690f45a9baf8a507bf93358 11-Sep-2013 Kent Overstreet <kmo@daterainc.com> bcache: Kill bch_next_recurse_key()

This dates from before the btree iterator, and now it's finally gone

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
bc9389eefe479b7b7b323c2729b61a7155d2d0ea 11-Sep-2013 Kent Overstreet <kmo@daterainc.com> bcache: Avoid deadlocking in garbage collection

Not a complete fix - we could still deadlock if btree_insert_node() has
to split...

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
a1f0358b2bf69be216cb6e4ea40fe7ae4d38b8a6 11-Sep-2013 Kent Overstreet <kmo@daterainc.com> bcache: Incremental gc

Big garbage collection rewrite; now, garbage collection uses the same
mechanisms as used elsewhere for inserting/updating btree node pointers,
instead of rewriting interior btree nodes in place.

This makes the code significantly cleaner and less fragile, and means we
can now make garbage collection incremental - it doesn't have to hold a
write lock on the root of the btree for the entire duration of garbage
collection.

This means that there's less of a latency hit for doing garbage
collection, which means we can gc more frequently (and do a better job
of reclaiming from the cache), and we can coalesce across more btree
nodes (improving our space efficiency).

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
8835c1234dd9a838993a2d5cb7572f57992ebbee 25-Jul-2013 Kent Overstreet <kmo@daterainc.com> bcache: Add make_btree_freeing_key()

Refactoring, prep work for incremental garbage collection.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
f269af5a078302712de8ee70d273eba2eb4485ca 24-Jul-2013 Kent Overstreet <kmo@daterainc.com> bcache: Add btree_node_write_sync()

More refactoring - mostly making the interfaces more explicit about what
we actually want to do.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
0eacac22034ca21c73fe49e800d0b938b2047250 02-Jul-2013 Kent Overstreet <kmo@daterainc.com> bcache: PRECEDING_KEY()

btree_insert_key() was open coding this, this is just refactoring.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
3a3b6a4e075188342b58d4b6560f5540af64cac0 25-Jul-2013 Kent Overstreet <kmo@daterainc.com> bcache: Don't bother with bucket refcount for btree node allocations

The bucket refcount (dropped with bkey_put()) is only needed to prevent
the newly allocated bucket from being garbage collected until we've
added a pointer to it somewhere. But for btree node allocations, the
fact that we have btree nodes locked is enough to guard against races
with garbage collection.

Eventually the per bucket refcount is going to be replaced with
something specific to bch_alloc_sectors().

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
280481d06c8a683d9aaa26125476222e76b733c5 25-Oct-2013 Kent Overstreet <kmo@daterainc.com> bcache: Debug code improvements

Couple changes:
* Consolidate bch_check_keys() and bch_check_key_order(), and move the
checks that only check_key_order() could do to bch_btree_iter_next().

* Get rid of CONFIG_BCACHE_EDEBUG - now, all that code is compiled in
when CONFIG_BCACHE_DEBUG is enabled, and there's now a sysfs file to
flip on the EDEBUG checks at runtime.

* Dropped an old not terribly useful check in rw_unlock(), and
refactored/improved some of the other debug code.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
81ab4190ac17df41686a37c97f701623276b652a 31-Oct-2013 Kent Overstreet <kmo@daterainc.com> bcache: Pull on disk data structures out into a separate header

Now, the on disk data structures are in a header that can be exported to
userspace - and having them all centralized is nice too.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
cc7b8819212f437fc82f0f9cdc24deb0fb5d775f 25-Jul-2013 Kent Overstreet <kmo@daterainc.com> bcache: Convert bch_btree_insert() to bch_btree_map_leaf_nodes()

Last of the btree_map() conversions. Main visible effect is
bch_btree_insert() is no longer taking a struct btree_op as an argument
anymore - there's no fancy state machine stuff going on, it's just a
normal function.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
6054c6d4da1940c7bf8870c6393773aa794f53d8 25-Jul-2013 Kent Overstreet <kmo@daterainc.com> bcache: Don't use op->insert_collision

When we convert bch_btree_insert() to bch_btree_map_leaf_nodes(), we
won't be passing struct btree_op to bch_btree_insert() anymore - so we
need a different way of returning whether there was a collision (really,
a replace collision).

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
1b207d80d5b986fb305bc899357435d319319513 11-Sep-2013 Kent Overstreet <kmo@daterainc.com> bcache: Kill op->replace

This is prep work for converting bch_btree_insert to
bch_btree_map_leaf_nodes() - we have to convert all its arguments to
actual arguments. Bunch of churn, but should be straightforward.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
faadf0c96547ec8277ad0abd6959f2ef48522f31 02-Nov-2013 Kent Overstreet <kmo@daterainc.com> bcache: Drop some closure stuff

With the recent bcache refactoring, some of the closure code isn't
needed anymore.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
b54d6934da7857f87b092df9b77dc1f42818ba94 25-Jul-2013 Kent Overstreet <kmo@daterainc.com> bcache: Kill op->cl

This isn't used for waiting asynchronously anymore - so this is a fairly
trivial refactoring.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
c18536a72ddd7fe30d63e6c1500b5c930ac14594 25-Jul-2013 Kent Overstreet <kmo@daterainc.com> bcache: Prune struct btree_op

Eventual goal is for struct btree_op to contain only what is necessary
for traversing the btree.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
2c1953e201a05ddfb1ea53f23d81a492c6513028 25-Jul-2013 Kent Overstreet <kmo@daterainc.com> bcache: Convert bch_btree_read_async() to bch_btree_map_keys()

This is a fairly straightforward conversion, mostly reshuffling -
op->lookup_done goes away, replaced by MAP_DONE/MAP_CONTINUE. And the
code for handling cache hits and misses wasn't really btree code, so it
gets moved to request.c.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
df8e89701fb02cba6e09c5f46f002778b5b52dd2 25-Jul-2013 Kent Overstreet <kmo@daterainc.com> bcache: Move some stuff to btree.c

With the new btree_map() functions, we don't need to export the stuff
needed for traversing the btree anymore.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
48dad8baf92fe8967d9e1358af1cfdda1d2d3298 11-Sep-2013 Kent Overstreet <kmo@daterainc.com> bcache: Add btree_map() functions

Lots of stuff has been open coding its own btree traversal - which is
generally pretty simple code, but there are a few subtleties.

This adds two new functions, bch_btree_map_nodes() and
bch_btree_map_keys(), which do the traversal for you. Everything that's
open coding btree traversal now (with the exception of garbage
collection) is slowly going to be converted to these two functions;
being able to write other code at a higher level of abstraction is a
big improvement w.r.t. overall code quality.
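
A hedged usage sketch (the callback and its body are illustrative;
MAP_CONTINUE/MAP_DONE are the return codes these functions use):

	static int dump_keys_fn(struct btree_op *op, struct btree *b,
				struct bkey *k)
	{
		/* visit k here; returning MAP_DONE would stop the walk */
		return MAP_CONTINUE;
	}

	/* walk every key in every leaf node, starting from the front: */
	ret = bch_btree_map_keys(&op, c, &ZERO_KEY, dump_keys_fn, 0);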

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
72a44517f3ca3725dc86081d105457df46448679 25-Oct-2013 Kent Overstreet <kmo@daterainc.com> bcache: Convert gc to a kthread

We needed a dedicated rescuer workqueue for gc anyways... and gc was
conceptually a dedicated thread, just one that wasn't running all the
time. Switch it to a dedicated thread to make the code a bit more
straightforward.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
35fcd848d72683141052aa9880542461577f2dbe 25-Jul-2013 Kent Overstreet <kmo@daterainc.com> bcache: Convert bucket_wait to wait_queue_head_t

At one point we did do fancy asynchronous waiting stuff with
bucket_wait, but that's all gone (and bucket_wait is used a lot less
than it used to be). So use the standard primitives.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
e8e1d4682c8cb06dbcb5ef7bb851bf9bcb889c84 25-Jul-2013 Kent Overstreet <kmo@daterainc.com> bcache: Convert try_wait to wait_queue_head_t

We never waited on c->try_wait asynchronously, so just use the standard
primitives.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
0b93207abb40d3c42bb83eba1e1e7edc1da77810 25-Jul-2013 Kent Overstreet <kmo@daterainc.com> bcache: Move keylist out of btree_op

Slowly working on pruning struct btree_op - the aim is for it to only
contain things that are actually necessary for traversing the btree.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
a34a8bfd4e6358c646928320d37b0425c0762f8a 25-Oct-2013 Kent Overstreet <kmo@daterainc.com> bcache: Refactor journalling flow control

Making things that don't need to be asynchronous less asynchronous - bch_journal()
only has to block when the journal or journal entry is full, which is
emphatically not a fast path. So make it a normal function that just
returns when it finishes, to make the code and control flow easier to
follow.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
c2f95ae2ebbe1ab61b1d4437f5923fdf720d4d4d 25-Jul-2013 Kent Overstreet <kmo@daterainc.com> bcache: Clean up keylist code

More random refactoring.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
4f3d40147b8d0ce7055e241e1d263e0aa2b2b46d 11-Sep-2013 Kent Overstreet <kmo@daterainc.com> bcache: Add explicit keylist arg to btree_insert()

Some refactoring - better to explicitly pass stuff around instead of
having it all in the "big bag of state", struct btree_op. Going to prune
struct btree_op quite a bit over time.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
e7c590eb63509c5d5f48a390d23aa25f4417ac96 11-Sep-2013 Kent Overstreet <kmo@daterainc.com> bcache: Convert btree_insert_check_key() to btree_insert_node()

This was the main point of all this refactoring - now,
btree_insert_check_key() won't fail just because the leaf node happened
to be full.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
403b6cdeb1a38d896ffcb1f99ddcfd4e343b5d69 25-Jul-2013 Kent Overstreet <kmo@daterainc.com> bcache: Insert multiple keys at a time

We'll often end up with a list of adjacent keys to insert -
because bch_data_insert() may have to fragment the data it writes.

Originally, to simplify things and avoid having to deal with corner
cases bch_btree_insert() would pass keys from this list one at a time to
btree_insert_recurse() - mainly because the list of keys might span leaf
nodes, so it was easier this way.

With the btree_insert_node() refactoring, it's now a lot easier to just
pass down the whole list and have btree_insert_recurse() iterate over
leaf nodes until it's done.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
26c949f8062cb9221a28b2228104f1cc5b265097 11-Sep-2013 Kent Overstreet <kmo@daterainc.com> bcache: Add btree_insert_node()

The flow of control in the old btree insertion code was rather -
backwards; we'd recurse down the btree (in btree_insert_recurse()), and
then if we needed to split the keys to be inserted into the parent node
would be effectively returned up to btree_insert_recurse(), which would
notice there was more work to do and finish the insertion.

The main problem with this was that the full logic for btree insertion
could only be used by calling btree_insert_recurse; if you'd gotten to a
btree leaf some other way and had a key to insert, and it turned out that
node needed to be split, you were SOL.

This inverts the flow of control so btree_insert_node() does _full_
btree insertion, including splitting - and takes a (leaf) btree node to
insert into as a parameter.

This means we can now _correctly_ handle cache misses - for cache
misses, we need to insert a fake "check" key into the btree when we
discover we have a cache miss - while we still have the btree locked.
Previously, if the btree node was full inserting a cache miss would just
fail.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
d6fd3b11cea82346837957feab25b0be48aa424c 25-Jul-2013 Kent Overstreet <kmo@daterainc.com> bcache: Explicitly track btree node's parent

This is prep work for the reworked btree insertion code.

The way we set b->parent is ugly and hacky... the problem is, when
btree_split() or garbage collection splits or rewrites a btree node, the
parent changes for all its (potentially already cached) children.

I may change this later and add some code to look through the btree node
cache and find all our cached child nodes and change the parent pointer
then...

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
1fa8455deb92e9ec7756df23030e73b2d28eeca7 11-Nov-2013 Kent Overstreet <kmo@daterainc.com> bcache: Fix dirty_data accounting

Dirty data accounting wasn't quite right - firstly, we were adding the key we're
inserting after it could have merged with another dirty key already in the
btree, and secondly we could sometimes pass the wrong offset to
bcache_dev_sectors_dirty_add() for dirty data we were overwriting - which is
important when tracking dirty data by stripe.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
a698e08c82dfb9771e0bac12c7337c706d729b6d 24-Sep-2013 Kent Overstreet <kmo@daterainc.com> bcache: Fix a shrinker deadlock

GFP_NOIO means we could be getting called recursively - mca_alloc() ->
mca_data_alloc() - definitely can't use mutex_lock(bucket_lock) then.
Whoops.
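
The fix is the classic trylock pattern for reclaim paths (roughly what the
patch does):

	/* !__GFP_WAIT means we may be called recursively from
	 * mca_alloc() with bucket_lock already held - never block then: */
	if (sc->gfp_mask & __GFP_WAIT)
		mutex_lock(&c->bucket_lock);
	else if (!mutex_trylock(&c->bucket_lock))
		return -1;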

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
61cbd250f867f98bb4738000afc6002d6f2b14bd 24-Sep-2013 Geert Uytterhoeven <geert@linux-m68k.org> bcache: Correct printf()-style format length modifier

Fix

drivers/md/bcache/btree.c: In function ‘bch_btree_node_read’:
drivers/md/bcache/btree.c:259: warning: format ‘%lu’ expects type ‘long unsigned int’, but argument 3 has type ‘size_t’
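
The portable fix for size_t is the 'z' length modifier (illustrative line,
not the exact statement from btree.c):

	pr_debug("%zu keys", nr_keys);	/* nr_keys is a size_t */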

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7dc19d5affd71370754a2c3d36b485810eaee7a1 28-Aug-2013 Dave Chinner <dchinner@redhat.com> drivers: convert shrinkers to new count/scan API

Convert the driver shrinkers to the new API. Most changes are compile
tested only because I either don't have the hardware or it's staging
stuff.

FWIW, the md and android code is pretty good, but the rest of it makes me
want to claw my eyes out. The amount of broken code I just encountered is
mind boggling. I've added comments explaining what is broken, but I fear
that some of the code would be best dealt with by being dragged behind the
bike shed, buried in mud up to its neck and then run over repeatedly
with a blunt lawn mower.

Special mention goes to the zcache/zcache2 drivers. They can't co-exist
in the build at the same time, they are under different menu options in
menuconfig, they only show up when you've got the right set of mm
subsystem options configured and so even compile testing is an exercise in
pulling teeth. And that doesn't even take into account the horrible,
broken code...

[glommer@openvz.org: fixes for i915, android lowmem, zcache, bcache]
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Glauber Costa <glommer@openvz.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Rientjes <rientjes@google.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
29ebf465b9050f241c4433a796a32e6c896a9dcd 12-Jul-2013 Kent Overstreet <kmo@daterainc.com> bcache: Fix GC_SECTORS_USED() calculation

Part of the job of garbage collection is to add up however many sectors
of live data it finds in each bucket, but that doesn't work very well if
it doesn't reset GC_SECTORS_USED() when it starts. Whoops.

This wouldn't have broken anything horribly, but allocation tries to
preferentially reclaim buckets that are mostly empty and that's not
gonna work with an incorrect GC_SECTORS_USED() value.
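
Sketched, the missing reset at gc start looks like (for_each_bucket and
SET_GC_SECTORS_USED are existing bcache helpers; surrounding code elided):

	for_each_bucket(g, ca)
		SET_GC_SECTORS_USED(g, 0);	/* re-counted as gc walks the btree */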

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
8e51e414a3c6d92ef2cc41720c67342a8e2c0bf7 07-Jun-2013 Kent Overstreet <koverstreet@google.com> bcache: Use standard utility code

Some of bcache's utility code has made it into the rest of the kernel,
so drop the bcache versions.

Bcache used to have a workaround for allocating from a bio set under
generic_make_request() (if you allocated more than once, the bios you
already allocated would get stuck on current->bio_list when you
submitted, and you'd risk deadlock) - bcache would mask out __GFP_WAIT
when allocating bios under generic_make_request() so that allocation
could fail and it could retry from workqueue. But bio_alloc_bioset() has
a workaround now, so we can drop this hack and the associated error
handling.
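
The dropped workaround looked roughly like this (sketch; the retry helper
is hypothetical):

	/* mask __GFP_WAIT under generic_make_request() so the allocation
	 * fails instead of deadlocking on current->bio_list... */
	bio = bio_alloc_bioset(gfp_mask & ~__GFP_WAIT, nr_iovecs, bs);
	if (!bio)
		return punt_to_workqueue(cl);	/* ...then retry from there */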

Signed-off-by: Kent Overstreet <koverstreet@google.com>
f3059a54610f6c516c0942d58b9435921768ce2d 16-May-2013 Kent Overstreet <koverstreet@google.com> bcache: Delete fuzz tester

This code has rotted and it hasn't been used in ages anyways.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
36c9ea9837c1cb21c778781495101eaff7e5eb56 03-Jun-2013 Kent Overstreet <koverstreet@google.com> bcache: Document shrinker reserve better

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
e49c7c374e7aacd1f04ecbc21d9dbbeeea4a77d6 27-Jun-2013 Kent Overstreet <koverstreet@google.com> bcache: FUA fixes

Journal writes need to be marked FUA, not just REQ_FLUSH. And btree node
writes have... weird ordering requirements.
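
For the journal side, the flag change amounts to roughly (sketch using the
block-layer flags of this era):

	/* flush preceding writes AND force this write itself to media: */
	bio->bi_rw |= REQ_WRITE | REQ_FLUSH | REQ_FUA;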

Signed-off-by: Kent Overstreet <koverstreet@google.com>
72c270612bd33192fa836ad0f2939af1ca218292 05-Jun-2013 Kent Overstreet <koverstreet@google.com> bcache: Write out full stripes

Now that we're tracking dirty data per stripe, we can add two
optimizations for raid5/6:

* If a stripe is already dirty, force writes to that stripe to
writeback mode - to help build up full stripes of dirty data

* When flushing dirty data, preferentially write out full stripes first
if there are any.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
279afbad4e54acbd61bf88a54a73af3bbfdeb5dd 05-Jun-2013 Kent Overstreet <koverstreet@google.com> bcache: Track dirty data by stripe

To make background writeback aware of raid5/6 stripes, we first need to
track the amount of dirty data within each stripe - we do this by
breaking up the existing sectors_dirty into per stripe atomic_ts
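
Sketched, the accounting becomes per stripe (field names follow the scheme
described; the real code also handles writes spanning stripe boundaries):

	unsigned stripe = offset / d->stripe_size;

	/* one counter per stripe instead of a single sectors_dirty: */
	atomic_add(nr_sectors, d->stripe_sectors_dirty + stripe);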

Signed-off-by: Kent Overstreet <koverstreet@google.com>
444fc0b6b167ed164e7436621a9d095e042644dd 12-May-2013 Kent Overstreet <koverstreet@google.com> bcache: Initialize sectors_dirty when attaching

Previously, dirty_data wouldn't get initialized until the first garbage
collection... which was a bit of a problem for background writeback (as
the PD controller keys off of it) and also confusing for users.

This is also prep work for making background writeback aware of raid5/6
stripes.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
85b1492ee113486d871de7676a61f506a43ca475 15-May-2013 Kent Overstreet <koverstreet@google.com> bcache: Rip out pkey()/pbtree()

Old gcc doesn't like the struct hack, and it is kind of ugly. So finish
off the work to convert pr_debug() statements to tracepoints, and delete
pkey()/pbtree().

Signed-off-by: Kent Overstreet <koverstreet@google.com>
c37511b863f36c1cc6e18440717fd4cc0e881b8a 27-Apr-2013 Kent Overstreet <koverstreet@google.com> bcache: Fix/revamp tracepoints

The tracepoints were reworked to be more sensible, and a null
pointer deref in one of the tracepoints was fixed.

Converted some of the pr_debug()s to tracepoints - this is partly a
performance optimization; it used to be that without DEBUG or
CONFIG_DYNAMIC_DEBUG, pr_debug() compiled to an empty macro, but at some
point it was changed to an empty inline function.

Some of the pr_debug() statements had rather expensive function calls as
part of the arguments, so this code was getting run unnecessarily even
on non debug kernels - in some fast paths, too.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
5794351146199b9ac67a5ab1beab82be8bfd7b5d 25-Apr-2013 Kent Overstreet <koverstreet@google.com> bcache: Refactor btree io

The most significant change is that btree reads are now done
synchronously, instead of asynchronously and doing the post read stuff
from a workqueue.

This was originally done because we can't block on IO under
generic_make_request(). But - we already have a mechanism to punt cache
lookups to workqueue if needed, so if we just use that we don't have to
deal with the complexity of doing things asynchronously.

The main benefit is this makes the locking situation saner; we can hold
our write lock on the btree node until we're finished reading it, and we
don't need that btree_node_read_done() flag anymore.

Also, for writes, btree_write() was broken out into btree_node_write()
and btree_leaf_dirty() - the old code with the boolean argument was dumb
and confusing.

The prio_blocked mechanism was improved a bit too: now the only counter
is in struct btree_write, and we don't mess with transferring a count from
struct btree anymore.

This required changing garbage collection to block prios at the start
and unblock when it finishes, which is cleaner than what it was doing
anyways (the old code had mostly the same effect, but was doing it in a
convoluted way)

And the btree iter btree_node_read_done() uses was converted to a real
mempool.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
119ba0f82839cd80eaef3e6991988f1403965d5b 25-Apr-2013 Kent Overstreet <koverstreet@google.com> bcache: Convert allocator thread to kthread

Using a workqueue when we just want a single thread is a bit silly.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
86b26b824cf5d15d4408b33d6d716104f249e8bd 01-May-2013 Kent Overstreet <koverstreet@google.com> bcache: Allocator cleanup/fixes

The main fix is that bch_allocator_thread() wasn't waiting on
garbage collection to finish (if invalidate_buckets had set
ca->invalidate_needs_gc); we need that to make sure the allocator
doesn't spin and potentially block gc from finishing.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
cd953ed0363b28e3dc503b735cc4079e9f5edba7 27-Mar-2013 Geert Uytterhoeven <geert@linux-m68k.org> bcache: Add missing #include <linux/prefetch.h>

m68k/allmodconfig:

drivers/md/bcache/bset.c: In function ‘bset_search_tree’:
drivers/md/bcache/bset.c:727: error: implicit declaration of function ‘prefetch’

drivers/md/bcache/btree.c: In function ‘bch_btree_node_get’:
drivers/md/bcache/btree.c:933: error: implicit declaration of function ‘prefetch’

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Kent Overstreet <koverstreet@google.com>
c19ed23a0b1848eca6b6f22c1ee233abe54d37f9 26-Mar-2013 Kent Overstreet <koverstreet@google.com> bcache: Sparse fixes

Signed-off-by: Kent Overstreet <koverstreet@google.com>
169ef1cf6171d35550fef85645b83b960e241cff 28-Mar-2013 Kent Overstreet <koverstreet@google.com> bcache: Don't export utility code, prefix with bch_

Signed-off-by: Kent Overstreet <koverstreet@google.com>
Cc: linux-bcache@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
b1a67b0f4c747ca10c96ebb24f04e2a74b3c298d 25-Mar-2013 Kent Overstreet <koverstreet@google.com> bcache: Style/checkpatch fixes

Took out some nested functions, and fixed some more checkpatch
complaints.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
Cc: linux-bcache@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
07e86ccb543bb1e748f32d6f0f18913d3f58d988 25-Mar-2013 Kent Overstreet <koverstreet@google.com> bcache: Build fixes from test robot

config: make ARCH=i386 allmodconfig

All error/warnings:

drivers/md/bcache/bset.c: In function 'bch_ptr_bad':
>> drivers/md/bcache/bset.c:164:2: warning: format '%li' expects argument of type 'long int', but argument 4 has type 'size_t' [-Wformat]
--
drivers/md/bcache/debug.c: In function 'bch_pbtree':
>> drivers/md/bcache/debug.c:86:4: warning: format '%li' expects argument of type 'long int', but argument 4 has type 'size_t' [-Wformat]
--
drivers/md/bcache/btree.c: In function 'bch_btree_read_done':
>> drivers/md/bcache/btree.c:245:8: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'size_t' [-Wformat]
--
drivers/md/bcache/closure.o: In function `closure_debug_init':
>> (.init.text+0x0): multiple definition of `init_module'
>> drivers/md/bcache/super.o:super.c:(.init.text+0x0): first defined here

Signed-off-by: Kent Overstreet <koverstreet@google.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: linux-bcache@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
cafe563591446cf80bfbc2fe3bc72a2e36cf1060 24-Mar-2013 Kent Overstreet <koverstreet@google.com> bcache: A block layer cache

Does writethrough and writeback caching, handles unclean shutdown, and
has a bunch of other nifty features motivated by real world usage.

See the wiki at http://bcache.evilpiepirate.org for more.

Signed-off-by: Kent Overstreet <koverstreet@google.com>