Compare commits
84 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
| | 21d8cc8e5a | | |
| | 5dcd9fb036 | | |
| | 0c2ce66c42 | | |
| | 899d510f98 | | |
| | 71b28462ad | | |
| | 25b230ac26 | | |
| | 7ec2cf5188 | | |
| | 3531efa5d5 | | |
| | aad3968aa3 | | |
| | 7341e2c04c | | |
| | b207fd3776 | | |
| | 5020de4a73 | | |
| | 8754293bb2 | | |
| | a596d6dd72 | | |
| | 2afc140a8f | | |
| | 4e5d0ed8d0 | | |
| | 46dde506be | | |
| | 475ed5727f | | |
| | 2f0a526576 | | |
| | ffbc1d7385 | | |
| | 5bc8353ad8 | | |
| | d6c9313b9f | | |
| | c47b927b26 | | |
| | 11348d6db2 | | |
| | 444e8a9362 | | |
| | c8a4341d85 | | |
| | abd79f526b | | |
| | 1b70668459 | | |
| | f4eb5f5ae4 | | |
| | 760d924220 | | |
| | fc0bfe8cf8 | | |
| | 106781ae4d | | |
| | dfa2ced70b | | |
| | 0c513f52de | | |
| | 560728aea3 | | |
| | d3653388d3 | | |
| | 83f31ca7cc | | |
| | 7b69ec27f8 | | |
| | 82da24ab69 | | |
| | 237fa82236 | | |
| | aa2d8d54f6 | | |
| | 596dc261cc | | |
| | 4b149e5d7e | | |
| | 8af4f55089 | | |
| | 50a82caf47 | | |
| | 830b4408ac | | |
| | fd19bf22ef | | |
| | 00571007ad | | |
| | 133df7fe40 | | |
| | e1436289dc | | |
| | 97ddb19dc5 | | |
| | 563cb4c1fa | | |
| | 3b6b18eb91 | | |
| | fcf7d77313 | | |
| | 27e8b3e403 | | |
| | 53fdb96878 | | |
| | d1a3194794 | | |
| | 95eed0bb99 | | |
| | 41e4f7a105 | | |
| | ff48224c38 | | |
| | f7af15fc88 | | |
| | 4289f0a1bf | | |
| | 57cf5c5693 | | |
| | 4620a3c2e5 | | |
| | 30cee0a04e | | |
| | 96c4d80978 | | |
| | a12b4b8927 | | |
| | 5eba9718d8 | | |
| | ad66a9457f | | |
| | eea5629e03 | | |
| | 7ec10f1dd8 | | |
| | c3f99ef868 | | |
| | 249b9bb0ca | | |
| | de5376d583 | | |
| | f28613804c | | |
| | 0ae9906cff | | |
| | 8bc7e8e6f9 | | |
| | 94541fdfe4 | | |
| | 9c0e333511 | | |
| | 81d884b9c5 | | |
| | 13e9ae7627 | | |
| | 829fbafa85 | | |
| | c644dab8ea | | |
| | 74f4280476 | | |
1
.hgtags
@@ -11,3 +11,4 @@ ef844093bec2ac38945fd04487dc3a051f4b9136 1.1.5
9046bcab74eba90a2cb05af28026ec4a74e4fb9c 1.2.1
50cb97d00c6238ebef64e290616e8cec9995687f 1.2.2
ef3ccde04cb28060319be900a2d31c88071933f6 1.3.0
945a898eb714bb8d46c088928d81b2135eefc18e 1.3.1
59
NEWS
@@ -1,3 +1,62 @@
WiredTiger release 1.3.2, 2012-10-03
------------------------------------

This is a bugfix and performance tuning release, primarily related to LSM
trees. The changes are as follows:

* Implement minor merges for LSM trees, prefer them to major merges.

* Update hazard references, so the active array grows as needed. Change
the default hazard_max to 1000.

* Abort transactions if the cache is so full that they cannot make
progress.

* Fix a bug where verify could crash if an empty checkpoint exists.

* Make the maximum number of chunks for merges configurable, rather than
deriving a value from the number of hazard references available.

* Switch to an atomic add to allocate transaction IDs. This fixes a subtle
race before where two threads could temporarily have the same ID in the
global state table. If one of the threads timed out and the other thread
committed its transaction with that ID, the commit would not become
visible immediately. This could lead to deadlock errors in workloads
that are logically conflict-free.

* Have auto-commit transactions retry deadlocks. This requires that we
keep the user's key and value in the cursor.

* Simplify the code handling updated records in variable-length
column-store reconciliation.

* Never wait for eviction when holding the schema lock. This avoids
deadlocks between opening a column store file and taking a checkpoint.

* Take care with the loop termination when walking files for eviction. We
were making one extra call into __wt_tree_walk, which would leave a leaf
page in the WT_REF_EVICT_WALK state, unable to be evicted. In some
workloads, including LSM loads, we could end up with many files all
consisting of a single leaf page, none of which could be evicted.

* Pause updates when the cache is full.

* In files marked as "out of cache", don't wait for eviction when reading a
page.

* Fix the record count calculation for minor merges. This was leading to
no Bloom filter being created for minor merges after running for some
time, leading to merges taking increasingly long to complete.

* Only sleep in the LSM checkpoint thread if no work is done.

* Add sanity check of cache size to LSM open.

[#338] Create fake checkpoints until an object is modified, so that a
checkpoint between the cursor create and the bulk load doesn't make
it impossible to do a bulk-load on the cursor.


WiredTiger release 1.3.1, 2012-09-25
------------------------------------
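The transaction-ID item in the release notes above replaces a read-then-write allocation with a single atomic add, so two threads can never hold the same ID. A minimal Python sketch of that idea follows. Python has no atomic add, so a `threading.Lock` stands in for it here; `TxnIdAllocator` and `allocate_from_threads` are hypothetical names for illustration and are not WiredTiger code.

```python
import threading


class TxnIdAllocator:
    """Hands out IDs with one indivisible read-and-increment step."""

    def __init__(self, first_id=1):
        self._next = first_id
        # Assumption: the lock models the hardware atomic add.
        self._lock = threading.Lock()

    def allocate(self):
        with self._lock:
            txn_id = self._next
            self._next += 1
            return txn_id


def allocate_from_threads(allocator, n_threads, per_thread):
    """Allocate IDs concurrently and return every ID that was handed out."""
    results = [[] for _ in range(n_threads)]

    def worker(slot):
        for _ in range(per_thread):
            results[slot].append(allocator.allocate())

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [txn_id for chunk in results for txn_id in chunk]
```

Because the read and the increment happen as one step, no ID is ever visible to two threads at once, which is the property the release note relies on.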
4
README
@@ -1,6 +1,6 @@
WiredTiger 1.3.1: (September 25, 2012)
WiredTiger 1.3.2: (October 3, 2012)

This is version 1.3.1 of WiredTiger.
This is version 1.3.2 of WiredTiger.

WiredTiger documentation can be found at:
2
RELEASE
@@ -1,6 +1,6 @@
WIREDTIGER_VERSION_MAJOR=1
WIREDTIGER_VERSION_MINOR=3
WIREDTIGER_VERSION_PATCH=1
WIREDTIGER_VERSION_PATCH=2
WIREDTIGER_VERSION="$WIREDTIGER_VERSION_MAJOR.$WIREDTIGER_VERSION_MINOR.$WIREDTIGER_VERSION_PATCH"

WIREDTIGER_RELEASE_DATE=`date "+%B %e, %Y"`
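The RELEASE file above builds the version string from three numeric fields. As a rough illustration only, the Python sketch below reads `KEY=VALUE` lines and joins the same three fields. `parse_release` and `version_string` are hypothetical helpers, and the sketch does not perform shell expansion or evaluate the `date` command.

```python
def parse_release(text):
    """Collect simple KEY=VALUE assignments from RELEASE-style text."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        fields[key] = value
    return fields


def version_string(fields):
    """Join MAJOR.MINOR.PATCH, mirroring how WIREDTIGER_VERSION is composed."""
    return ".".join(
        fields["WIREDTIGER_VERSION_" + part]
        for part in ("MAJOR", "MINOR", "PATCH")
    )


RELEASE_TEXT = """\
WIREDTIGER_VERSION_MAJOR=1
WIREDTIGER_VERSION_MINOR=3
WIREDTIGER_VERSION_PATCH=2
"""

print(version_string(parse_release(RELEASE_TEXT)))  # prints 1.3.2
```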
@@ -1,4 +1,4 @@
INCLUDES = -I$(top_builddir)
AM_CPPFLAGS = -I$(top_builddir)
LDADD = $(top_builddir)/libwiredtiger.la

noinst_PROGRAMS = wttest

@@ -31,7 +31,7 @@ wt_SOURCES =\
src/utilities/util_write.c

include_HEADERS= wiredtiger.h
INCLUDES = -I$(srcdir)/src/include
AM_CPPFLAGS = -I$(srcdir)/src/include

pkgconfigdir = $(libdir)/pkgconfig
pkgconfig_DATA = wiredtiger.pc

@@ -2,8 +2,8 @@ dnl build by dist/s_version

VERSION_MAJOR=1
VERSION_MINOR=3
VERSION_PATCH=1
VERSION_STRING='"WiredTiger 1.3.1: (September 25, 2012)"'
VERSION_PATCH=2
VERSION_STRING='"WiredTiger 1.3.2: (October 3, 2012)"'

AC_SUBST(VERSION_MAJOR)
AC_SUBST(VERSION_MINOR)

@@ -1,2 +1,2 @@
dnl WiredTiger product version for AC_INIT. Maintained by dist/s_version
1.3.1
1.3.2
37
dist/api_data.py
vendored
@@ -80,15 +80,18 @@ format_meta = column_meta + [
]

lsm_config = [
Config('lsm_chunk_size', '2MB', r'''
the maximum size of the in-memory chunk of an LSM tree''',
min='512K',max='500MB'),
Config('lsm_bloom_hash_count', '4', r'''
the number of hash values per item used for LSM bloom filters.''',
min='2',max='100'),
min='2', max='100'),
Config('lsm_bloom_bit_count', '8', r'''
the number of bits used per item for LSM bloom filters.''',
min='2',max='1000'),
min='2', max='1000'),
Config('lsm_chunk_size', '2MB', r'''
the maximum size of the in-memory chunk of an LSM tree''',
min='512K', max='500MB'),
Config('lsm_merge_max', '15', r'''
the maximum number of chunks to include in a merge operation''',
min='2', max='100'),
]

# Per-file configuration
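The `lsm_config` entries above pair each integer setting with a default and a `min`/`max` range. The sketch below shows, under stated assumptions, how such a range could be enforced. The numbers are copied from the hunk; `LSM_INT_LIMITS` and `check_lsm_setting` are hypothetical names, and `lsm_chunk_size` is left out because its values use size suffixes that this sketch does not parse.

```python
# (default, minimum, maximum), copied from the lsm_config entries in the hunk.
LSM_INT_LIMITS = {
    "lsm_bloom_bit_count": (8, 2, 1000),
    "lsm_bloom_hash_count": (4, 2, 100),
    "lsm_merge_max": (15, 2, 100),
}


def check_lsm_setting(name, value=None):
    """Return the value to use for `name`, or raise if it is out of range."""
    default, minimum, maximum = LSM_INT_LIMITS[name]
    if value is None:
        return default
    if not minimum <= value <= maximum:
        raise ValueError(
            "%s=%d is outside [%d, %d]" % (name, value, minimum, maximum))
    return value
```

For example, `check_lsm_setting("lsm_merge_max")` falls back to the default of 15, while a value of 1 is rejected because the minimum is 2.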
@@ -279,16 +282,16 @@ methods = {
number key; valid only for cursors with record number keys''',
type='boolean'),
Config('bulk', 'false', r'''
configure the cursor for bulk loads; bulk-load is a fast
load path for newly created objects and only newly
created objects may be bulk-loaded. Cursors configured
for bulk load only support the WT_CURSOR::insert and
WT_CURSOR::close methods''',
configure the cursor for bulk loads, a fast load path
that may only be used for newly created objects. Cursors
configured for bulk load only support the WT_CURSOR::insert
and WT_CURSOR::close methods''',
type='boolean'),
Config('checkpoint', '', r'''
the name of a checkpoint to open; the reserved checkpoint
name "WiredTigerCheckpoint" opens a cursor on the most recent
internal checkpoint taken for the object'''),
the name of a checkpoint to open (the reserved name
"WiredTigerCheckpoint" opens the most recent internal
checkpoint taken for the object). The cursor does not
support data modification'''),
Config('dump', '', r'''
configure the cursor for dump format inputs and outputs:
"hex" selects a simple hexadecimal format, "print"
@@ -303,6 +306,10 @@ methods = {
and WT_CURSOR::close methods. See @ref cursor_random for
details''',
type='boolean'),
Config('no_cache', 'false', r'''
do not cache pages from the underlying object. The cursor
does not support data modification''',
type='boolean', undoc=True),
Config('overwrite', 'false', r'''
change the behavior of the cursor's insert method to overwrite
previously existing values''',
@@ -409,8 +416,8 @@ methods = {
paths may need quoting, for example,
<code>extensions=("/path/to/ext.so"="entry=my_entry")</code>''',
type='list'),
Config('hazard_max', '30', r'''
number of simultaneous hazard references per session handle''',
Config('hazard_max', '1000', r'''
maximum number of simultaneous hazard references per session handle''',
min='15'),
Config('logging', 'false', r'''
enable logging''',
3
dist/config.py
vendored
@@ -94,6 +94,9 @@ for line in open(f, 'r'):
if name == lastname:
continue
lastname = name
if 'undoc' in c.flags:
continue

desc = textwrap.dedent(c.desc) + '.'
desc = desc.replace(',', '\\,')
default = '\\c ' + str(c.default) if c.default or gettype(c) == 'int' \
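The hunk above makes the documentation generator skip any entry carrying an `undoc` flag, which is how the new `no_cache` cursor option stays out of the published docs. The following is a self-contained sketch of that filtering, not the real script. The `Config` class is a hypothetical stand-in for the one in dist/api_data.py, and sorting the entries by name is an assumption made so that the duplicate-name check works.

```python
import textwrap


class Config:
    """Hypothetical stand-in for the Config class in dist/api_data.py."""

    def __init__(self, name, default, desc, **flags):
        self.name = name
        self.default = default
        self.desc = desc
        self.flags = flags


def documented_entries(configs):
    """Yield (name, description) for each entry that should be documented."""
    lastname = None
    for c in sorted(configs, key=lambda c: c.name):
        if c.name == lastname:       # same name seen already: skip it
            continue
        lastname = c.name
        if "undoc" in c.flags:       # flagged as undocumented: skip it
            continue
        desc = textwrap.dedent(c.desc) + "."
        desc = desc.replace(",", "\\,")   # escape commas, as in the hunk
        yield c.name, desc
```

Note that the test is `"undoc" in c.flags`, matching the hunk: the mere presence of the flag hides the entry, whatever its value.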
1
dist/s_define.list
vendored
@@ -13,6 +13,7 @@ SIZE_CHECK
TXN_API_CALL
TXN_API_CALL_NOCONF
TXN_API_END
TXNID_LE
WT_BARRIER
WT_BLOCK_DESC_SIZE
WT_DEBUG_BYTE
4
dist/s_string.ok
vendored
@@ -247,6 +247,7 @@ btcur
btdsk
btmem
btree
btrees
buf
builtin
bytelock
@@ -262,6 +263,8 @@ cd
cfg
cfkos
checkfrag
checkpointed
checkpointing
checksum
checksums
chk
@@ -688,6 +691,7 @@ uninstantiated
unix
unjams
unlinked
unmerged
unmodify
unpackv
unreferenced
1
dist/stat_data.py
vendored
@@ -40,6 +40,7 @@ connection_stats = [
Stat('txn_ancient', 'ancient transactions'),
Stat('txn_begin', 'transactions'),
Stat('txn_commit', 'transactions committed'),
Stat('txn_fail_cache', 'transaction failures due to cache overflow'),
Stat('txn_rollback', 'transactions rolled-back'),
]
@@ -1,4 +1,4 @@
INCLUDES = -I$(top_builddir) -I$(top_srcdir)/src/include
AM_CPPFLAGS = -I$(top_builddir) -I$(top_srcdir)/src/include

lib_LTLIBRARIES = reverse_collator.la
reverse_collator_la_LDFLAGS = -avoid-version -module

@@ -1,4 +1,4 @@
INCLUDES = -I$(top_builddir) -I$(top_srcdir)/src/include
AM_CPPFLAGS = -I$(top_builddir) -I$(top_srcdir)/src/include

lib_LTLIBRARIES = bzip2_compress.la
bzip2_compress_la_LDFLAGS = -avoid-version -module

@@ -1,4 +1,4 @@
INCLUDES = -I$(top_builddir) -I$(top_srcdir)/src/include
AM_CPPFLAGS = -I$(top_builddir) -I$(top_srcdir)/src/include

lib_LTLIBRARIES = nop_compress.la
nop_compress_la_LDFLAGS = -avoid-version -module

@@ -1,4 +1,4 @@
INCLUDES = -I$(top_builddir) -I$(top_srcdir)/src/include
AM_CPPFLAGS = -I$(top_builddir) -I$(top_srcdir)/src/include

lib_LTLIBRARIES = snappy_compress.la
snappy_compress_la_LDFLAGS = -avoid-version -module

@@ -1,4 +1,4 @@
INCLUDES = -I$(abs_top_builddir)
AM_CPPFLAGS = -I$(abs_top_builddir)

PYSRC = $(top_srcdir)/lang/python
if DEBUG
@@ -115,6 +115,9 @@ __verify_start_filesize(WT_SESSION_IMPL *session,
*/
file_size = 0;
WT_CKPT_FOREACH(ckptbase, ckpt) {
/* Skip empty checkpoints. */
if (ckpt->raw.size == 0)
continue;
WT_RET(__wt_block_buffer_to_ckpt(
session, block, ckpt->raw.data, ci));
if (ci->file_size > file_size)
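The verify fix above skips checkpoints whose raw buffer is empty before decoding them. A small Python sketch of the same guard follows. `max_checkpoint_file_size` and `decode_big_endian` are hypothetical names, and the 8-byte big-endian encoding is an assumption made purely for illustration; it is not WiredTiger's checkpoint format.

```python
def decode_big_endian(raw):
    """Illustrative decoder: treat the buffer as a big-endian file size."""
    if not raw:
        raise ValueError("cannot decode an empty checkpoint buffer")
    return int.from_bytes(raw, "big")


def max_checkpoint_file_size(raw_checkpoints, decode_file_size):
    """Return the largest file size recorded by any non-empty checkpoint."""
    file_size = 0
    for raw in raw_checkpoints:
        if len(raw) == 0:        # skip empty checkpoints
            continue
        size = decode_file_size(raw)
        if size > file_size:
            file_size = size
    return file_size
```

Without the length check, the decoder would be handed an empty buffer, which mirrors the crash described in the release notes.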
@@ -22,6 +22,10 @@ __wt_bulk_init(WT_CURSOR_BULK *cbulk)
session = (WT_SESSION_IMPL *)cbulk->cbt.iface.session;
btree = session->btree;

/*
* Bulk-load is only permitted on newly created files, not any empty
* file -- see the checkpoint code for a discussion.
*/
if (!btree->bulk_load_ok)
WT_RET_MSG(session, EINVAL,
"bulk-load is only possible for newly created trees");
@@ -403,6 +403,7 @@ __wt_btcur_next(WT_CURSOR_BTREE *cbt, int discard)
|
||||
LF_SET(WT_TREE_DISCARD);
|
||||
|
||||
__cursor_func_init(cbt, 0);
|
||||
__cursor_position_clear(cbt);
|
||||
|
||||
/*
|
||||
* If we aren't already iterating in the right direction, there's
|
||||
@@ -507,6 +508,7 @@ __wt_btcur_next_random(WT_CURSOR_BTREE *cbt)
|
||||
WT_BSTAT_INCR(session, cursor_read_next);
|
||||
|
||||
__cursor_func_init(cbt, 1);
|
||||
__cursor_position_clear(cbt);
|
||||
|
||||
/*
|
||||
* Only supports row-store: applications can trivially select a random
|
||||
|
||||
@@ -491,6 +491,7 @@ __wt_btcur_prev(WT_CURSOR_BTREE *cbt, int discard)
|
||||
LF_SET(WT_TREE_DISCARD);
|
||||
|
||||
__cursor_func_init(cbt, 0);
|
||||
__cursor_position_clear(cbt);
|
||||
|
||||
/*
|
||||
* If we aren't already iterating in the right direction, there's
|
||||
|
||||
@@ -112,6 +112,7 @@ __wt_btcur_reset(WT_CURSOR_BTREE *cbt)
|
||||
|
||||
__cursor_leave(cbt);
|
||||
__cursor_search_clear(cbt);
|
||||
__cursor_position_clear(cbt);
|
||||
|
||||
return (0);
|
||||
}
|
||||
|
||||
@@ -208,12 +208,15 @@ int
|
||||
__wt_debug_off(
|
||||
WT_SESSION_IMPL *session, uint32_t offset, uint32_t size, const char *ofile)
|
||||
{
|
||||
WT_BTREE *btree;
|
||||
WT_DECL_ITEM(buf);
|
||||
WT_DECL_RET;
|
||||
|
||||
btree = session->btree;
|
||||
|
||||
WT_RET(__wt_scr_alloc(session, size, &buf));
|
||||
WT_ERR(__wt_block_read_off(
|
||||
session, session->btree->block, buf, offset, size, 0));
|
||||
WT_ERR(__wt_block_read_off(session,
|
||||
btree->block, buf, offset, size, WT_BLOCK_CHECKSUM_NOT_SET));
|
||||
ret = __wt_debug_disk(session, buf->mem, ofile);
|
||||
err: __wt_scr_free(&buf);
|
||||
|
||||
|
||||
@@ -428,9 +428,13 @@ __evict_page(WT_SESSION_IMPL *session, WT_PAGE *page)
WT_RET(__wt_txn_init(session));

__wt_txn_get_evict_snapshot(session);
saved_txn.oldest_snap_min = txn->oldest_snap_min;
txn->isolation = TXN_ISO_READ_COMMITTED;
ret = __wt_rec_evict(session, page, 0);

/* Keep count of any failures. */
saved_txn.eviction_fails = txn->eviction_fails;

if (was_running) {
WT_ASSERT(session, txn->snapshot == NULL ||
txn->snapshot != saved_txn.snapshot);
@@ -720,7 +724,8 @@ __evict_walk(WT_SESSION_IMPL *session)
* get some pages from each underlying file. In practice, a realloc
* is rarely needed, so it is worth avoiding the LRU lock.
*/
elem = WT_EVICT_WALK_BASE + (conn->btqcnt * WT_EVICT_WALK_PER_TABLE);
elem = WT_EVICT_WALK_BASE +
(conn->open_btree_count * WT_EVICT_WALK_PER_TABLE);
if (elem > cache->evict_entries) {
__wt_spin_lock(session, &cache->evict_lock);
/* Save the offset of the eviction point. */
@@ -792,16 +797,22 @@ __evict_walk_file(WT_SESSION_IMPL *session, u_int *slotp)
end = cache->evict + cache->evict_entries;

/*
* Get the next WT_EVICT_WALK_PER_TABLE entries.
*
* We can't evict the page just returned to us, it marks our place in
* the tree. So, always stay one page ahead of the page being returned.
* Get some more eviction candidate pages.
*/
for (evict = start, restarts = 0;
evict < end && restarts <= 1 && ret == 0;
evict < end && ret == 0;
ret = __wt_tree_walk(session, &btree->evict_page, WT_TREE_EVICT)) {
if ((page = btree->evict_page) == NULL) {
++restarts;
/*
* Take care with terminating this loop.
*
* Don't make an extra call to __wt_tree_walk: that
* will leave a page in the WT_REF_EVICT_WALK state,
* unable to be evicted, which may prevent any work
* from being done.
*/
if (++restarts == 2)
break;
continue;
}
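The `__evict_walk_file` change above moves the restart test into the loop body, so the walk stops on the second end-of-tree marker instead of stepping once more. The Python model below is a sketch of that control flow only. `Walker`, `collect_candidates` and the `evictable` predicate are hypothetical, and the toy walk simply parks on `None` between passes where the real code walks a Btree.

```python
class Walker:
    """Toy cyclic walk: `current` is None between passes over the pages."""

    def __init__(self, pages):
        self.pages = list(pages)
        self.pos = -1        # index of `current`, or -1 between passes
        self.current = None
        self.calls = 0       # number of times step() has run

    def step(self):
        """Advance to the next page; park on None after the last one."""
        self.calls += 1
        self.pos += 1
        if self.pos >= len(self.pages):
            self.pos = -1
            self.current = None
        else:
            self.current = self.pages[self.pos]


def collect_candidates(walker, slots, evictable):
    """Fill up to `slots` candidates, stopping on the second restart."""
    picked = []
    restarts = 0
    while len(picked) < slots:
        page = walker.current
        if page is None:
            restarts += 1
            if restarts == 2:
                break        # stop here, without stepping onto another page
        elif evictable(page):
            picked.append(page)
        walker.step()
    return picked
```

With an empty tree the walker is stepped exactly once and is left parked on `None`. Under the old condition (`restarts <= 1` tested after the step), the loop would have taken one more step before exiting, which is the extra call the comment warns about.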
@@ -935,6 +946,7 @@ int
__wt_evict_lru_page(WT_SESSION_IMPL *session, int is_app)
{
WT_BTREE *btree, *saved_btree;
WT_DECL_RET;
WT_PAGE *page;

__evict_get_page(session, is_app, &btree, &page);
@@ -947,19 +959,14 @@ __wt_evict_lru_page(WT_SESSION_IMPL *session, int is_app)
saved_btree = session->btree;
WT_SET_BTREE_IN_SESSION(session, btree);

/*
* We don't care why eviction failed (maybe the page was dirty and
* we're out of disk space, or the page had an in-memory subtree
* already being evicted).
*/
(void)__evict_page(session, page);
ret = __evict_page(session, page);

(void)WT_ATOMIC_SUB(btree->lru_count, 1);

WT_CLEAR_BTREE_IN_SESSION(session);
session->btree = saved_btree;

return (0);
return (ret);
}

/*
@@ -7,7 +7,7 @@
|
||||
|
||||
#include "wt_internal.h"
|
||||
|
||||
static int __btree_conf(WT_SESSION_IMPL *);
|
||||
static int __btree_conf(WT_SESSION_IMPL *, const char *[]);
|
||||
static int __btree_get_last_recno(WT_SESSION_IMPL *);
|
||||
static int __btree_page_sizes(WT_SESSION_IMPL *, const char *);
|
||||
static int __btree_tree_open_empty(WT_SESSION_IMPL *, int);
|
||||
@@ -54,16 +54,11 @@ __wt_btree_open(WT_SESSION_IMPL *session,
|
||||
WT_CLEAR(dsk);
|
||||
|
||||
/* Initialize and configure the WT_BTREE structure. */
|
||||
WT_ERR(__btree_conf(session));
|
||||
WT_ERR(__btree_conf(session, cfg));
|
||||
|
||||
/*
|
||||
* Bulk-load is only permitted on newly created files, not any empty
|
||||
* file. The reason is because deleting a checkpoint requires writing
|
||||
* the file, and a fake checkpoint can't write the file. So, if you
|
||||
* have a named checkpoint in the file, then, because tree is empty,
|
||||
* you start bulk-loading it, then you enter another checkpoint with
|
||||
* the same name, you end up using a fake checkpoint to delete a real
|
||||
* checkpoint, and that's going to end in tears.
|
||||
* file -- see the checkpoint code for a discussion.
|
||||
*/
|
||||
created = addr == NULL || addr_size == 0;
|
||||
if (!created && F_ISSET(btree, WT_BTREE_BULK))
|
||||
@@ -72,11 +67,11 @@ __wt_btree_open(WT_SESSION_IMPL *session,
|
||||
|
||||
/* Handle salvage configuration. */
|
||||
forced_salvage = 0;
|
||||
if (F_ISSET(btree, WT_BTREE_SALVAGE)) {
|
||||
if (F_ISSET(btree, WT_BTREE_SALVAGE) && cfg != NULL) {
|
||||
ret = __wt_config_gets(session, cfg, "force", &cval);
|
||||
if (ret != 0 && ret != WT_NOTFOUND)
|
||||
WT_ERR(ret);
|
||||
if (cval.val != 0)
|
||||
if (ret == 0 && cval.val != 0)
|
||||
forced_salvage = 1;
|
||||
}
|
||||
|
||||
@@ -160,11 +155,12 @@ __wt_btree_close(WT_SESSION_IMPL *session)
|
||||
* Configure a WT_BTREE structure.
|
||||
*/
|
||||
static int
|
||||
__btree_conf(WT_SESSION_IMPL *session)
|
||||
__btree_conf(WT_SESSION_IMPL *session, const char *cfg[])
|
||||
{
|
||||
WT_BTREE *btree;
|
||||
WT_CONFIG_ITEM cval;
|
||||
WT_CONNECTION_IMPL *conn;
|
||||
WT_DECL_RET;
|
||||
WT_NAMED_COLLATOR *ncoll;
|
||||
uint32_t bitcnt;
|
||||
int fixed;
|
||||
@@ -188,8 +184,7 @@ __btree_conf(WT_SESSION_IMPL *session)
|
||||
|
||||
/* Row-store key comparison and key gap for prefix compression. */
|
||||
if (btree->type == BTREE_ROW) {
|
||||
WT_RET(__wt_config_getones(
|
||||
session, config, "collator", &cval));
|
||||
WT_RET(__wt_config_getones(session, config, "collator", &cval));
|
||||
if (cval.len > 0) {
|
||||
TAILQ_FOREACH(ncoll, &conn->collqh, q) {
|
||||
if (WT_STRING_MATCH(
|
||||
@@ -235,23 +230,32 @@ __btree_conf(WT_SESSION_IMPL *session)
|
||||
F_CLR(btree, WT_BTREE_NO_EVICTION);
|
||||
}
|
||||
|
||||
/* No-cache files are never evicted or cached. */
|
||||
if (cfg != NULL) {
|
||||
ret = __wt_config_gets(session, cfg, "no_cache", &cval);
|
||||
if (ret != 0 && ret != WT_NOTFOUND)
|
||||
WT_RET(ret);
|
||||
if (ret == 0 && cval.val != 0)
|
||||
F_SET(session->btree, WT_BTREE_NO_CACHE |
|
||||
WT_BTREE_NO_EVICTION | WT_BTREE_NO_HAZARD);
|
||||
}
|
||||
|
||||
/* Huffman encoding */
|
||||
WT_RET(__wt_btree_huffman_open(session, config));
|
||||
|
||||
/* Reconciliation configuration. */
|
||||
WT_RET(__wt_config_getones(
|
||||
session, btree->config, "dictionary", &cval));
|
||||
WT_RET(__wt_config_getones(session, config, "dictionary", &cval));
|
||||
btree->dictionary = (u_int)cval.val;
|
||||
|
||||
WT_RET(__wt_config_getones(
|
||||
session, btree->config, "internal_key_truncate", &cval));
|
||||
session, config, "internal_key_truncate", &cval));
|
||||
btree->internal_key_truncate = cval.val == 0 ? 0 : 1;
|
||||
|
||||
WT_RET(__wt_config_getones(
|
||||
session, btree->config, "prefix_compression", &cval));
|
||||
WT_RET(
|
||||
__wt_config_getones(session, config, "prefix_compression", &cval));
|
||||
btree->prefix_compression = cval.val == 0 ? 0 : 1;
|
||||
|
||||
WT_RET(__wt_config_getones(session, btree->config, "split_pct", &cval));
|
||||
WT_RET(__wt_config_getones(session, config, "split_pct", &cval));
|
||||
btree->split_pct = (u_int)cval.val;
|
||||
|
||||
WT_RET(__wt_stat_alloc_btree_stats(session, &btree->stats));
|
||||
@@ -476,7 +480,7 @@ __btree_get_last_recno(WT_SESSION_IMPL *session)
|
||||
return (WT_NOTFOUND);
|
||||
|
||||
btree->last_recno = __col_last_recno(page);
|
||||
__wt_page_release(session, page);
|
||||
__wt_stack_release(session, page);
|
||||
|
||||
return (0);
|
||||
}
|
||||
|
||||
@@ -42,9 +42,20 @@ __wt_page_in_func(
case WT_REF_DISK:
case WT_REF_DELETED:
/* The page isn't in memory, attempt to read it. */

/* Check if there is space in the cache. */
__wt_eviction_check(session, &read_lockout, wake);
wake = 0;
if (read_lockout)

/*
* If the cache is full, give up, but only if we are
* not holding the schema lock. The schema lock can
* block checkpoints, and thus eviction, so it is not
* safe to wait for eviction if we are holding it.
*/
if (read_lockout &&
!F_ISSET(session, WT_SESSION_SCHEMA_LOCKED) &&
!F_ISSET(session->btree, WT_BTREE_NO_CACHE))
break;

WT_RET(__wt_cache_read(session, parent, ref));
@@ -103,13 +114,13 @@ __wt_page_in_func(
WT_ILLEGAL_VALUE(session);
}

/*
* Find a page to evict -- if that fails, we don't care why,
* but we may need to wake the eviction server again if the
* cache is still full.
*/
if (__wt_evict_lru_page(session, 1) != 0)
/* Find a page to evict -- if the page is busy, keep trying. */
if ((ret = __wt_evict_lru_page(session, 1)) == EBUSY)
__wt_yield();
else if (ret == WT_NOTFOUND)
wake = 1;
else
WT_RET(ret);
}
}
@@ -148,7 +148,6 @@ int
|
||||
__wt_tree_walk(WT_SESSION_IMPL *session, WT_PAGE **pagep, uint32_t flags)
|
||||
{
|
||||
WT_BTREE *btree;
|
||||
WT_DECL_RET;
|
||||
WT_PAGE *page, *t;
|
||||
WT_REF *ref;
|
||||
uint32_t slot;
|
||||
@@ -184,30 +183,14 @@ __wt_tree_walk(WT_SESSION_IMPL *session, WT_PAGE **pagep, uint32_t flags)
|
||||
t = page->parent;
|
||||
slot = (uint32_t)(page->ref - t->u.intl.t);
|
||||
|
||||
/*
|
||||
* Swap our hazard reference for the hazard reference of our parent,
|
||||
* if it's not the root page (we could access it directly because we
|
||||
* know it's in memory, but we need a hazard reference). Don't leave
|
||||
* a hazard reference dangling on error.
|
||||
*
|
||||
* We're hazard-reference coupling up the tree and that's OK: first,
|
||||
* hazard references can't deadlock, so there's none of the usual
|
||||
* problems found when logically locking up a Btree; second, we don't
|
||||
* release our current hazard reference until we have our parent's
|
||||
* hazard reference. If the eviction thread tries to evict the active
|
||||
* page, that fails because of our hazard reference. If eviction tries
|
||||
* to evict our parent, that fails because the parent has a child page
|
||||
* that can't be discarded.
|
||||
*/
|
||||
/* If not the eviction thread, release the page's hazard reference. */
|
||||
if (eviction) {
|
||||
if (page->ref->state == WT_REF_EVICT_WALK)
|
||||
page->ref->state = WT_REF_MEM;
|
||||
} else {
|
||||
if (!WT_PAGE_IS_ROOT(t))
|
||||
ret = __wt_page_in(session, t, t->ref);
|
||||
} else
|
||||
__wt_page_release(session, page);
|
||||
WT_RET(ret);
|
||||
}
|
||||
|
||||
/* Switch to the parent. */
|
||||
page = t;
|
||||
|
||||
/*
|
||||
@@ -283,14 +266,7 @@ descend: for (;;) {
|
||||
break;
|
||||
}
|
||||
|
||||
/*
|
||||
* Swap hazard references at each level (but
|
||||
* don't leave a hazard reference dangling on
|
||||
* error).
|
||||
*/
|
||||
ret = __wt_page_in(session, page, ref);
|
||||
__wt_page_release(session, page);
|
||||
WT_RET(ret);
|
||||
WT_RET(__wt_page_in(session, page, ref));
|
||||
}
|
||||
|
||||
page = ref->page;
|
||||
|
||||
@@ -68,9 +68,8 @@ __wt_col_search(WT_SESSION_IMPL *session, WT_CURSOR_BTREE *cbt, int is_modify)
|
||||
ref = page->u.intl.t + (base - 1);
|
||||
}
|
||||
|
||||
/* Swap the parent page for the child page. */
|
||||
/* Move to the child page. */
|
||||
WT_ERR(__wt_page_in(session, page, ref));
|
||||
__wt_page_release(session, page);
|
||||
page = ref->page;
|
||||
}
|
||||
|
||||
@@ -159,6 +158,6 @@ past_end:
|
||||
F_SET(cbt, WT_CBT_MAX_RECORD);
|
||||
return (0);
|
||||
|
||||
err: __wt_page_release(session, page);
|
||||
err: __wt_stack_release(session, page);
|
||||
return (ret);
|
||||
}
|
||||
|
||||
@@ -238,8 +238,11 @@ __rec_review(WT_SESSION_IMPL *session,
{
WT_DECL_RET;
WT_PAGE_MODIFY *mod;
WT_TXN *txn;
uint32_t i;

txn = &session->txn;

/*
* Get exclusive access to the page if our caller doesn't have the tree
* locked down.
@@ -327,9 +330,17 @@ __rec_review(WT_SESSION_IMPL *session,
WT_VERBOSE_RET(session, evict,
"page %p written but not clean", page);

if (F_ISSET(txn, TXN_RUNNING) &&
++txn->eviction_fails >= 100) {
txn->eviction_fails = 0;
ret = WT_DEADLOCK;
WT_STAT_INCR(
S2C(session)->stats, txn_fail_cache);
}

/*
* If there is only a single cursor open, there are no
* consistency issues: try to bump our snapshot.
* If there aren't multiple cursors active, there
* are no consistency issues: try to bump our snapshot.
*/
if (session->ncursors <= 1) {
__wt_txn_read_last(session);
@@ -347,6 +358,8 @@ __rec_review(WT_SESSION_IMPL *session,
}
}
WT_RET(ret);

txn->eviction_fails = 0;
}

/*
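The `__rec_review` hunk above counts failed evictions per running transaction and, at the hundredth failure, resets the counter, bumps the `txn_fail_cache` statistic and fails the transaction with `WT_DEADLOCK`. A minimal Python model of that counter follows. `Txn` and `note_failed_eviction` are hypothetical names, and a plain dict stands in for the connection statistics.

```python
EVICTION_FAIL_LIMIT = 100   # threshold taken from the hunk above


class Txn:
    """Just the two fields the sketch needs."""

    def __init__(self, running):
        self.running = running
        self.eviction_fails = 0


def note_failed_eviction(txn, stats):
    """Return True when the transaction should fail with a deadlock error."""
    if not txn.running:
        return False             # counter is only touched while running
    txn.eviction_fails += 1
    if txn.eviction_fails < EVICTION_FAIL_LIMIT:
        return False
    txn.eviction_fails = 0
    stats["txn_fail_cache"] = stats.get("txn_fail_cache", 0) + 1
    return True
```

As in the C condition, the increment is short-circuited away when no transaction is running, so idle sessions never accumulate failures.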
@@ -1965,7 +1965,7 @@ __rec_col_var(WT_SESSION_IMPL *session,
|
||||
WT_INSERT *ins;
|
||||
WT_INSERT_HEAD *append;
|
||||
WT_ITEM *last;
|
||||
WT_UPDATE *next_upd, *upd;
|
||||
WT_UPDATE *upd;
|
||||
uint64_t n, nrepeat, repeat_count, rle, slvg_missing, src_recno;
|
||||
uint32_t i, size;
|
||||
int deleted, last_deleted, orig_deleted, update_no_copy;
|
||||
@@ -2016,21 +2016,13 @@ __rec_col_var(WT_SESSION_IMPL *session,
|
||||
WT_COL_FOREACH(page, cip, i) {
|
||||
ovfl_state = OVFL_IGNORE;
|
||||
if ((cell = WT_COL_PTR(page, cip)) == NULL) {
|
||||
ins = NULL;
|
||||
nrepeat = 1;
|
||||
ins = NULL;
|
||||
orig_deleted = 1;
|
||||
} else {
|
||||
__wt_cell_unpack(cell, unpack);
|
||||
nrepeat = __wt_cell_rle(unpack);
|
||||
|
||||
ins = WT_SKIP_FIRST(WT_COL_UPDATE(page, cip));
|
||||
while (ins != NULL) {
|
||||
WT_ERR(
|
||||
__rec_txn_read(session, r, ins->upd, &upd));
|
||||
if (upd != NULL)
|
||||
break;
|
||||
ins = WT_SKIP_NEXT(ins);
|
||||
}
|
||||
|
||||
/*
|
||||
* If the original value is "deleted", there's no value
|
||||
@@ -2090,19 +2082,13 @@ record_loop: /*
|
||||
*/
|
||||
for (n = 0;
|
||||
n < nrepeat; n += repeat_count, src_recno += repeat_count) {
|
||||
if (ins != NULL &&
|
||||
WT_INSERT_RECNO(ins) == src_recno) {
|
||||
upd = NULL;
|
||||
if (ins != NULL && WT_INSERT_RECNO(ins) == src_recno) {
|
||||
WT_ERR(
|
||||
__rec_txn_read(session, r, ins->upd, &upd));
|
||||
WT_ASSERT(session, upd != NULL);
|
||||
do {
|
||||
ins = WT_SKIP_NEXT(ins);
|
||||
if (ins == NULL)
|
||||
break;
|
||||
WT_ERR(__rec_txn_read(
|
||||
session, r, ins->upd, &next_upd));
|
||||
} while (next_upd == NULL);
|
||||
|
||||
ins = WT_SKIP_NEXT(ins);
|
||||
}
|
||||
if (upd != NULL) {
|
||||
update_no_copy = 1; /* No data copy */
|
||||
|
||||
repeat_count = 1;
|
||||
|
||||
@@ -269,7 +269,9 @@ err: __wt_session_serialize_wrapup(session, page, ret);
int
__wt_update_check(WT_SESSION_IMPL *session, WT_PAGE *page, WT_UPDATE *next)
{
WT_DECL_RET;
WT_TXN *txn;
int lockout, wake = 1;

/* Discard obsolete WT_UPDATE structures. */
if (next != NULL)
@@ -278,6 +280,22 @@ __wt_update_check(WT_SESSION_IMPL *session, WT_PAGE *page, WT_UPDATE *next)
/* Before allocating anything, make sure this update is permitted. */
WT_RET(__wt_txn_update_check(session, next));

/*
* Pause if the cache is full.
* This matches the logic in __wt_page_in_func.
*/
for (;;) {
__wt_eviction_check(session, &lockout, wake);
wake = 0;
if (!lockout ||
F_ISSET(session, WT_SESSION_SCHEMA_LOCKED))
break;
if ((ret = __wt_evict_lru_page(session, 1)) == EBUSY)
__wt_yield();
else
WT_RET_NOTFOUND_OK(ret);
}

/*
* Record the transaction ID for the first update to a page.
* We don't care if this races: there is a buffer built into the
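The loop added to `__wt_update_check` above pauses an update while the cache is full: it keeps trying to evict a page, yields when the chosen page is busy, tolerates "nothing found", and never waits while the schema lock is held. The Python sketch below models that loop with injected callbacks. `pause_while_cache_full` and its parameters are hypothetical, the `WT_NOTFOUND` value is a placeholder rather than the real constant, and the `wake` flag from the C code is left out.

```python
import errno

WT_NOTFOUND = -1   # placeholder value for this sketch only


def pause_while_cache_full(cache_is_full, evict_one, schema_locked=False,
                           yield_cpu=lambda: None):
    """Evict until the cache has room; return how many attempts were made."""
    attempts = 0
    while True:
        # Proceed if there is room, or if waiting would be unsafe.
        if not cache_is_full() or schema_locked:
            return attempts
        attempts += 1
        ret = evict_one()
        if ret == errno.EBUSY:
            yield_cpu()                      # page busy: back off and retry
        elif ret not in (0, WT_NOTFOUND):
            raise RuntimeError("eviction failed with code %d" % ret)
```

A caller would pass a predicate that reports cache pressure and a function that attempts one eviction; any return code other than success, busy or not-found is surfaced to the caller, matching the `WT_RET_NOTFOUND_OK` handling in the hunk.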
@@ -153,9 +153,8 @@ __wt_row_search(WT_SESSION_IMPL *session, WT_CURSOR_BTREE *cbt, int is_modify)
|
||||
if (cmp != 0)
|
||||
ref = page->u.intl.t + (base - 1);
|
||||
|
||||
/* Swap the parent page for the child page. */
|
||||
/* Move to the child page. */
|
||||
WT_ERR(__wt_page_in(session, page, ref));
|
||||
__wt_page_release(session, page);
|
||||
page = ref->page;
|
||||
}
|
||||
|
||||
@@ -243,7 +242,7 @@ __wt_row_search(WT_SESSION_IMPL *session, WT_CURSOR_BTREE *cbt, int is_modify)
|
||||
WT_ERR(__wt_search_insert(session, cbt, cbt->ins_head, srch_key));
|
||||
return (0);
|
||||
|
||||
err: __wt_page_release(session, page);
|
||||
err: __wt_stack_release(session, page);
|
||||
return (ret);
|
||||
}
|
||||
|
||||
@@ -270,7 +269,6 @@ __wt_row_random(WT_SESSION_IMPL *session, WT_CURSOR_BTREE *cbt)
/* Swap the parent page for the child page. */
WT_ERR(__wt_page_in(session, page, ref));
__wt_page_release(session, page);
page = ref->page;
}
@@ -311,6 +309,6 @@ __wt_row_random(WT_SESSION_IMPL *session, WT_CURSOR_BTREE *cbt)
return (0);

err: __wt_page_release(session, page);
err: __wt_stack_release(session, page);
return (ret);
}
@@ -113,8 +113,8 @@ __wt_confdfl_file_meta =
"huffman_value=,internal_item_max=0,internal_key_truncate=,"
"internal_page_max=2KB,key_format=u,key_gap=10,leaf_item_max=0,"
"leaf_page_max=1MB,lsm_bloom_bit_count=8,lsm_bloom_hash_count=4,"
"lsm_chunk_size=2MB,prefix_compression=,split_pct=75,type=btree,"
"value_format=u,version=(major=0,minor=0)";
"lsm_chunk_size=2MB,lsm_merge_max=15,prefix_compression=,split_pct=75"
",type=btree,value_format=u,version=(major=0,minor=0)";

WT_CONFIG_CHECK
__wt_confchk_file_meta[] = {
@@ -138,6 +138,7 @@ __wt_confchk_file_meta[] = {
{ "lsm_bloom_bit_count", "int", "min=2,max=1000" },
{ "lsm_bloom_hash_count", "int", "min=2,max=100" },
{ "lsm_chunk_size", "int", "min=512K,max=500MB" },
{ "lsm_merge_max", "int", "min=2,max=100" },
{ "prefix_compression", "boolean", NULL },
{ "split_pct", "int", "min=25,max=100" },
{ "type", "string", "choices=[\"btree\"]" },
@@ -213,8 +214,8 @@ __wt_confdfl_session_create =
"internal_key_truncate=,internal_page_max=2KB,key_format=u,"
"key_format=u,key_gap=10,leaf_item_max=0,leaf_page_max=1MB,"
"lsm_bloom_bit_count=8,lsm_bloom_hash_count=4,lsm_chunk_size=2MB,"
"prefix_compression=,split_pct=75,type=btree,value_format=u,"
"value_format=u";
"lsm_merge_max=15,prefix_compression=,split_pct=75,type=btree,"
"value_format=u,value_format=u";

WT_CONFIG_CHECK
__wt_confchk_session_create[] = {
@@ -242,6 +243,7 @@ __wt_confchk_session_create[] = {
{ "lsm_bloom_bit_count", "int", "min=2,max=1000" },
{ "lsm_bloom_hash_count", "int", "min=2,max=100" },
{ "lsm_chunk_size", "int", "min=512K,max=500MB" },
{ "lsm_merge_max", "int", "min=2,max=100" },
{ "prefix_compression", "boolean", NULL },
{ "split_pct", "int", "min=25,max=100" },
{ "type", "string", "choices=[\"btree\"]" },
@@ -280,8 +282,8 @@ __wt_confchk_session_log_printf[] = {
const char *
__wt_confdfl_session_open_cursor =
"append=0,bulk=0,checkpoint=,dump=,next_random=0,overwrite=0,raw=0,"
"statistics=0,statistics_clear=0,target=";
"append=0,bulk=0,checkpoint=,dump=,next_random=0,no_cache=0,"
"overwrite=0,raw=0,statistics=0,statistics_clear=0,target=";

WT_CONFIG_CHECK
__wt_confchk_session_open_cursor[] = {
@@ -290,6 +292,7 @@ __wt_confchk_session_open_cursor[] = {
{ "checkpoint", "string", NULL },
{ "dump", "string", "choices=[\"hex\",\"print\"]" },
{ "next_random", "boolean", NULL },
{ "no_cache", "boolean", NULL },
{ "overwrite", "boolean", NULL },
{ "raw", "boolean", NULL },
{ "statistics", "boolean", NULL },
@@ -381,7 +384,7 @@ const char *
__wt_confdfl_wiredtiger_open =
"buffer_alignment=-1,cache_size=100MB,create=0,direct_io=,"
"error_prefix=,eviction_target=80,eviction_trigger=95,extensions=,"
"hazard_max=30,logging=0,multiprocess=0,session_max=50,sync=,"
"hazard_max=1000,logging=0,multiprocess=0,session_max=50,sync=,"
"transactional=,use_environment_priv=0,verbose=";

WT_CONFIG_CHECK
@@ -845,7 +845,7 @@ wiredtiger_open(const char *home, WT_EVENT_HANDLER *event_handler,
WT_ERR(__conn_config_env(session, cfg));

WT_ERR(__wt_config_gets(session, cfg, "hazard_max", &cval));
conn->hazard_size = (uint32_t)cval.val;
conn->hazard_max = (uint32_t)cval.val;
WT_ERR(__wt_config_gets(session, cfg, "session_max", &cval));
conn->session_size = (uint32_t)cval.val + WT_NUM_INTERNAL_SESSIONS;
WT_ERR(__wt_config_gets(session, cfg, "sync", &cval));
@@ -93,15 +93,18 @@ __conn_btree_get(WT_SESSION_IMPL *session,
WT_ASSERT(session, F_ISSET(session, WT_SESSION_SCHEMA_LOCKED));

/* Increment the reference count if we already have the btree open. */
TAILQ_FOREACH(btree, &conn->btqh, q)
if (strcmp(name, btree->name) == 0 &&
((ckpt == NULL && btree->checkpoint == NULL) ||
(ckpt != NULL && btree->checkpoint != NULL &&
strcmp(ckpt, btree->checkpoint) == 0))) {
++btree->refcnt;
session->btree = btree;
return (__conn_btree_open_lock(session, flags));
if (!LF_ISSET(WT_BTREE_NO_CACHE)) {
TAILQ_FOREACH(btree, &conn->btqh, q) {
if (strcmp(name, btree->name) == 0 &&
((ckpt == NULL && btree->checkpoint == NULL) ||
(ckpt != NULL && btree->checkpoint != NULL &&
strcmp(ckpt, btree->checkpoint) == 0))) {
++btree->refcnt;
session->btree = btree;
return (__conn_btree_open_lock(session, flags));
}
}
}

/*
* Allocate the WT_BTREE structure, its lock, and set the name so we
@@ -118,10 +121,11 @@ __conn_btree_get(WT_SESSION_IMPL *session,
__wt_writelock(session, btree->rwlock);
F_SET(btree, WT_BTREE_EXCLUSIVE);

/* Add to the connection list. */
btree->refcnt = 1;
TAILQ_INSERT_TAIL(&conn->btqh, btree, q);
++conn->btqcnt;
if (!LF_ISSET(WT_BTREE_NO_CACHE)) {
/* Add to the connection list. */
btree->refcnt = 1;
TAILQ_INSERT_TAIL(&conn->btqh, btree, q);
}
}

if (ret == 0)
@@ -153,6 +157,9 @@ __wt_conn_btree_sync_and_close(WT_SESSION_IMPL *session)
if (!F_ISSET(btree, WT_BTREE_OPEN))
return (0);

if (btree->checkpoint == NULL)
--S2C(session)->open_btree_count;

/*
* Checkpoint to flush out the file's changes. This usually happens on
* session handle close (which means we're holding the handle lock, so
@@ -230,6 +237,12 @@ __conn_btree_open(WT_SESSION_IMPL *session,
WT_ERR(__wt_btree_open(session, addr->data, addr->size, cfg,
btree->checkpoint == NULL ? 0 : 1));
F_SET(btree, WT_BTREE_OPEN);
/*
* Checkpoint handles are read only, so eviction calculations
* based on the number of btrees are better to ignore them.
*/
if (btree->checkpoint == NULL)
++S2C(session)->open_btree_count;

/* Drop back to a readlock if that is all that was needed. */
if (!LF_ISSET(WT_BTREE_EXCLUSIVE)) {
@@ -483,11 +496,11 @@ err: WT_CLEAR_BTREE_IN_SESSION(session);
}

/*
* __conn_btree_discard --
* __wt_conn_btree_discard_single --
* Discard a single btree file handle structure.
*/
static int
__conn_btree_discard(WT_SESSION_IMPL *session, WT_BTREE *btree)
int
__wt_conn_btree_discard_single(WT_SESSION_IMPL *session, WT_BTREE *btree)
{
WT_DECL_RET;
@@ -535,8 +548,7 @@ restart:
continue;

TAILQ_REMOVE(&conn->btqh, btree, q);
--conn->btqcnt;
WT_TRET(__conn_btree_discard(session, btree));
WT_TRET(__wt_conn_btree_discard_single(session, btree));
goto restart;
}
@@ -552,8 +564,7 @@ restart:
/* Close the metadata file handle. */
while ((btree = TAILQ_FIRST(&conn->btqh)) != NULL) {
TAILQ_REMOVE(&conn->btqh, btree, q);
--conn->btqcnt;
WT_TRET(__conn_btree_discard(session, btree));
WT_TRET(__wt_conn_btree_discard_single(session, btree));
}

return (ret);
@@ -73,8 +73,9 @@ __wt_connection_destroy(WT_CONNECTION_IMPL *conn)
__wt_spin_destroy(session, &conn->api_lock);
__wt_spin_destroy(session, &conn->fh_lock);
__wt_spin_destroy(session, &conn->serial_lock);
__wt_spin_destroy(session, &conn->metadata_lock);
__wt_spin_destroy(session, &conn->schema_lock);
__wt_spin_destroy(session, &conn->serial_lock);

/* Free allocated memory. */
__wt_free(session, conn->home);
@@ -336,6 +336,17 @@ __wt_curfile_create(WT_SESSION_IMPL *session,
if (bulk)
WT_ERR(__wt_curbulk_init((WT_CURSOR_BULK *)cbt));

/*
* no_cache
* No cache cursors are read-only.
*/
WT_ERR(__wt_config_gets_defno(session, cfg, "no_cache", &cval));
if (cval.val != 0) {
cursor->insert = __wt_cursor_notsup;
cursor->update = __wt_cursor_notsup;
cursor->remove = __wt_cursor_notsup;
}

/*
* random_retrieval
* Random retrieval cursors only support next, reset and close.
@@ -368,20 +379,30 @@ __wt_curfile_open(WT_SESSION_IMPL *session, const char *uri,
{
WT_CONFIG_ITEM cval;
WT_DECL_RET;
int bulk;
uint32_t flags;

/*
* Bulk and no cache handles are exclusive and may not be used by more
* than a single thread.
* Additionally set the discard flag on no cache handles so they are
* destroyed on close.
*/
flags = 0;
WT_RET(__wt_config_gets_defno(session, cfg, "bulk", &cval));
bulk = (cval.val != 0);
if (cval.val != 0)
LF_SET(WT_BTREE_EXCLUSIVE | WT_BTREE_BULK);
WT_RET(__wt_config_gets_defno(session, cfg, "no_cache", &cval));
if (cval.val != 0)
LF_SET(WT_BTREE_EXCLUSIVE | WT_BTREE_NO_CACHE);

/* TODO: handle projections. */

/* Get the handle and lock it while the cursor is using it. */
if (WT_PREFIX_MATCH(uri, "colgroup:") || WT_PREFIX_MATCH(uri, "index:"))
WT_RET(__wt_schema_get_btree(session, uri, strlen(uri), cfg,
bulk ? WT_BTREE_BULK | WT_BTREE_EXCLUSIVE : 0));
WT_RET(__wt_schema_get_btree(
session, uri, strlen(uri), cfg, flags));
else if (WT_PREFIX_MATCH(uri, "file:"))
WT_RET(__wt_session_get_btree_ckpt(session, uri, cfg,
bulk ? WT_BTREE_BULK | WT_BTREE_EXCLUSIVE : 0));
WT_RET(__wt_session_get_btree_ckpt(session, uri, cfg, flags));
else
WT_RET(__wt_bad_object_type(session, uri));
@@ -170,8 +170,7 @@ __wt_cursor_get_keyv(WT_CURSOR *cursor, uint32_t flags, va_list ap)
*va_arg(ap, uint64_t *) = cursor->recno;
} else {
fmt = cursor->key_format;
if (LF_ISSET(
WT_CURSTD_DUMP_HEX | WT_CURSTD_DUMP_PRINT | WT_CURSTD_RAW))
if (LF_ISSET(WT_CURSOR_RAW_OK))
fmt = "u";
ret = __wt_struct_unpackv(
session, cursor->key.data, cursor->key.size, fmt, ap);
@@ -212,17 +211,14 @@ __wt_cursor_set_keyv(WT_CURSOR *cursor, uint32_t flags, va_list ap)
sz = sizeof(cursor->recno);
} else {
fmt = cursor->key_format;
if (LF_ISSET(
WT_CURSTD_DUMP_HEX | WT_CURSTD_DUMP_PRINT | WT_CURSTD_RAW))
fmt = "u";
if (strcmp(fmt, "S") == 0) {
str = va_arg(ap, const char *);
sz = strlen(str) + 1;
cursor->key.data = (void *)str;
} else if (strcmp(fmt, "u") == 0) {
if (LF_ISSET(WT_CURSOR_RAW_OK) || strcmp(fmt, "u") == 0) {
item = va_arg(ap, WT_ITEM *);
sz = item->size;
cursor->key.data = (void *)item->data;
} else if (strcmp(fmt, "S") == 0) {
str = va_arg(ap, const char *);
sz = strlen(str) + 1;
cursor->key.data = (void *)str;
} else {
buf = &cursor->key;
@@ -269,9 +265,7 @@ __wt_cursor_get_value(WT_CURSOR *cursor, ...)
WT_CURSOR_NEEDVALUE(cursor);

va_start(ap, cursor);
fmt = F_ISSET(cursor,
WT_CURSTD_DUMP_HEX | WT_CURSTD_DUMP_PRINT | WT_CURSTD_RAW) ?
"u" : cursor->value_format;
fmt = F_ISSET(cursor, WT_CURSOR_RAW_OK) ? "u" : cursor->value_format;
ret = __wt_struct_unpackv(session,
cursor->value.data, cursor->value.size, fmt, ap);
va_end(ap);
@@ -297,38 +291,42 @@ __wt_cursor_set_value(WT_CURSOR *cursor, ...)
CURSOR_API_CALL(cursor, session, set_value, NULL);

va_start(ap, cursor);
fmt = F_ISSET(cursor,
WT_CURSTD_DUMP_HEX | WT_CURSTD_DUMP_PRINT | WT_CURSTD_RAW) ?
"u" : cursor->value_format;
/* Fast path some common cases: single strings or byte arrays. */
fmt = F_ISSET(cursor, WT_CURSOR_RAW_OK) ? "u" : cursor->value_format;

/* Fast path some common cases: single strings, byte arrays and bits. */
if (strcmp(fmt, "S") == 0) {
str = va_arg(ap, const char *);
sz = strlen(str) + 1;
cursor->value.data = str;
} else if (strcmp(fmt, "u") == 0) {
} else if (F_ISSET(cursor, WT_CURSOR_RAW_OK) || strcmp(fmt, "u") == 0) {
item = va_arg(ap, WT_ITEM *);
sz = item->size;
cursor->value.data = item->data;
} else {
} else if (strcmp(fmt, "t") == 0 ||
(isdigit(fmt[0]) && strcmp(fmt + 1, "t") == 0)) {
sz = 1;
buf = &cursor->value;
ret = __wt_struct_sizev(session, &sz, cursor->value_format, ap);
WT_ERR(__wt_buf_initsize(session, buf, sz));
*(uint8_t *)buf->mem = (uint8_t)va_arg(ap, int);
} else {
WT_ERR(
__wt_struct_sizev(session, &sz, cursor->value_format, ap));
va_end(ap);
WT_ERR(ret);
va_start(ap, cursor);
if ((ret = __wt_buf_initsize(session, buf, sz)) != 0 ||
(ret = __wt_struct_packv(session, buf->mem, sz,
cursor->value_format, ap)) != 0) {
cursor->saved_err = ret;
F_CLR(cursor, WT_CURSTD_VALUE_SET);
goto err;
}
cursor->value.data = buf->mem;
buf = &cursor->value;
WT_ERR(__wt_buf_initsize(session, buf, sz));
WT_ERR(__wt_struct_packv(session, buf->mem, sz,
cursor->value_format, ap));
}
F_SET(cursor, WT_CURSTD_VALUE_SET);
cursor->value.size = WT_STORE_SIZE(sz);
va_end(ap);

err: API_END(session);
if (0) {
err: cursor->saved_err = ret;
F_CLR(cursor, WT_CURSTD_VALUE_SET);
}
va_end(ap);
API_END(session);
}

/*
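The bit-field fast path in `__wt_cursor_set_value` is meant to apply only when the value format is exactly `t`, or a single digit followed by `t`. The following is a minimal sketch of that format test as a standalone helper, assuming single-digit bit counts. `demo_is_bitfield_format` is a hypothetical name used only for illustration and is not a WiredTiger function.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * demo_is_bitfield_format --
 *	Hypothetical helper: return 1 if fmt is "t" or "<digit>t", else 0.
 */
static int
demo_is_bitfield_format(const char *fmt)
{
	if (strcmp(fmt, "t") == 0)
		return (1);

	/*
	 * strcmp returns 0 on a match, so its result has to be compared
	 * against 0; a bare strcmp() in a condition is true on a mismatch.
	 */
	return (isdigit((unsigned char)fmt[0]) && strcmp(fmt + 1, "t") == 0);
}
```

With this test, `"t"` and `"8t"` take the one-byte path, while formats such as `"5S"` or `"S"` fall through to the general packing code.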
@@ -77,8 +77,7 @@ __wt_curtable_get_value(WT_CURSOR *cursor, ...)
WT_CURSOR_NEEDVALUE(primary);

va_start(ap, cursor);
if (F_ISSET(cursor,
WT_CURSTD_DUMP_HEX | WT_CURSTD_DUMP_PRINT | WT_CURSTD_RAW)) {
if (F_ISSET(cursor, WT_CURSOR_RAW_OK)) {
ret = __wt_schema_project_merge(session,
ctable->cg_cursors, ctable->plan,
cursor->value_format, &cursor->value);
@@ -147,8 +146,7 @@ __wt_curtable_set_value(WT_CURSOR *cursor, ...)
CURSOR_API_CALL(cursor, session, set_value, NULL);

va_start(ap, cursor);
if (F_ISSET(cursor,
WT_CURSTD_DUMP_HEX | WT_CURSTD_DUMP_PRINT | WT_CURSTD_RAW)) {
if (F_ISSET(cursor, WT_CURSOR_RAW_OK)) {
item = va_arg(ap, WT_ITEM *);
cursor->value.data = item->data;
cursor->value.size = item->size;
@@ -101,8 +101,14 @@ To remove existing data using a cursor, use the WT_CURSOR::remove method:
@section cursor_error Cursor position after error

After any cursor handle method failure, the cursor's position is
undetermined. Applications that cannot re-position the cursor after
failure must duplicate the cursor before calling a cursor method that will
undetermined. For cursor operations that expect a key to be set before the
operation begins (including WT_CURSOR::search, WT_CURSOR::insert,
WT_CURSOR::update and WT_CURSOR::remove), the application's key and value
will not be cleared by an error.

Applications that cannot re-position the cursor after failure must
duplicate the cursor by calling WT_SESSION::open_cursor and passing the
cursor as the \c to_dup parameter before calling a cursor method that will
attempt to re-position the cursor.

*/
@@ -13,7 +13,7 @@ To ask questions or discuss issues related to using WiredTiger, visit our
View the documentation online:

- <a href="1.3.0/index.html"><b>WiredTiger 1.3.0 (current release)</b></a>
- <a href="1.3.2/index.html"><b>WiredTiger 1.3.2 (current release)</b></a>
- <a href="1.2.2/index.html"><b>WiredTiger 1.2.2</b></a>
- <a href="1.1.5/index.html"><b>WiredTiger 1.1.5</b></a>
@@ -72,6 +72,11 @@ updating the same value will fail with ::WT_DEADLOCK. Some applications
may benefit from application-level synchronization to avoid repeated
attempts to rollback and update the same value.

Operations in transactions may also fail with the ::WT_DEADLOCK error if
some resource cannot be allocated after repeated attempts. For example, if
the cache is not large enough to hold the updates required to satisfy
transactional readers, an operation may fail and return ::WT_DEADLOCK.

@section transaction_isolation Isolation levels

WiredTiger supports <code>read-uncommitted</code>,
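Because an operation can return ::WT_DEADLOCK for reasons outside the application's control (such as a full cache), callers typically roll back and retry a bounded number of times. The following is a minimal, library-independent sketch of such a loop. `DEMO_DEADLOCK`, `demo_retry_on_deadlock` and `demo_flaky_op` are hypothetical names standing in for the real error code and for an application's transactional operation; they are not part of the WiredTiger API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for ::WT_DEADLOCK; the real value is not assumed. */
#define	DEMO_DEADLOCK	(-1)

typedef int (*demo_op_fn)(void *arg);

/*
 * demo_retry_on_deadlock --
 *	Call op until it returns something other than DEMO_DEADLOCK or
 *	max_attempts calls have been made. Optionally report the call count.
 */
static int
demo_retry_on_deadlock(
    demo_op_fn op, void *arg, int max_attempts, int *attemptsp)
{
	int attempts, ret;

	ret = DEMO_DEADLOCK;
	for (attempts = 0; attempts < max_attempts;) {
		++attempts;
		if ((ret = op(arg)) != DEMO_DEADLOCK)
			break;
		/* A real application would roll back its transaction here. */
	}
	if (attemptsp != NULL)
		*attemptsp = attempts;
	return (ret);
}

/*
 * demo_flaky_op --
 *	Stub operation: fail with DEMO_DEADLOCK while *arg > 0, then succeed.
 */
static int
demo_flaky_op(void *arg)
{
	int *remaining = arg;

	if (*remaining > 0) {
		--*remaining;
		return (DEMO_DEADLOCK);
	}
	return (0);
}
```

Bounding the number of attempts keeps a persistently full cache from turning into an unbounded retry loop; once the limit is reached, the caller sees the deadlock error and can back off or report it.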
@@ -130,6 +130,7 @@ struct __wt_session_impl {
* easily call a function to clear memory up to, but not including, the
* hazard reference.
*/
uint32_t hazard_size; /* Count of used hazard references */
u_int nhazard;
#define WT_SESSION_CLEAR(s) memset(s, 0, WT_PTRDIFF(&(s)->hazard, s))
WT_HAZARD *hazard; /* Hazard reference array */
@@ -213,7 +214,7 @@ struct __wt_connection_impl {
/* Locked: library list */
TAILQ_HEAD(__wt_dlh_qh, __wt_dlh) dlhqh;

u_int btqcnt; /* Locked: btree count */
u_int open_btree_count; /* Locked: open writable btree count */
u_int next_file_id; /* Locked: file ID counter */

/*
@@ -235,7 +236,7 @@ struct __wt_connection_impl {
* WiredTiger allocates space for a fixed number of hazard references
* in each thread of control.
*/
uint32_t hazard_size; /* Hazard array size */
uint32_t hazard_max; /* Hazard array size */

WT_CACHE *cache; /* Page cache */
uint64_t cache_size;
@@ -329,13 +330,19 @@ struct __wt_connection_impl {
__wt_txn_read_first(session); \
} \
ret = __wt_txn_commit((s), NULL); \
} else \
} else { \
(void)__wt_txn_rollback((s), NULL); \
if (ret == 0 || ret == WT_DEADLOCK) { \
ret = 0; \
continue; \
} \
} \
} else if ((ret) != 0 && \
(ret) != WT_NOTFOUND && \
(ret) != WT_DUPLICATE_KEY) \
F_SET(&(s)->txn, TXN_ERROR); \
} while (0)
break; \
} while (1)

/*
* If a session or connection method is about to return WT_NOTFOUND (some
@@ -125,18 +125,20 @@ struct __wt_btree {
#define WT_BTREE_DISCARD 0x0002 /* Discard on release */
#define WT_BTREE_EXCLUSIVE 0x0004 /* Need exclusive access to handle */
#define WT_BTREE_LOCK_ONLY 0x0008 /* Handle is only needed for locking */
#define WT_BTREE_NO_EVICTION 0x0010 /* Disable eviction */
#define WT_BTREE_NO_HAZARD 0x0020 /* Disable hazard references */
#define WT_BTREE_OPEN 0x0040 /* Handle is open */
#define WT_BTREE_SALVAGE 0x0080 /* Handle is for salvage */
#define WT_BTREE_UPGRADE 0x0100 /* Handle is for upgrade */
#define WT_BTREE_VERIFY 0x0200 /* Handle is for verify */
#define WT_BTREE_NO_CACHE 0x0010 /* Disable caching */
#define WT_BTREE_NO_EVICTION 0x0020 /* Disable eviction */
#define WT_BTREE_NO_HAZARD 0x0040 /* Disable hazard references */
#define WT_BTREE_OPEN 0x0080 /* Handle is open */
#define WT_BTREE_SALVAGE 0x0100 /* Handle is for salvage */
#define WT_BTREE_UPGRADE 0x0200 /* Handle is for upgrade */
#define WT_BTREE_VERIFY 0x0400 /* Handle is for verify */
uint32_t flags;
};

/* Flags that make a btree handle special (not for normal use). */
#define WT_BTREE_SPECIAL_FLAGS \
(WT_BTREE_BULK | WT_BTREE_SALVAGE | WT_BTREE_UPGRADE | WT_BTREE_VERIFY)
(WT_BTREE_BULK | WT_BTREE_NO_CACHE | \
WT_BTREE_SALVAGE | WT_BTREE_UPGRADE | WT_BTREE_VERIFY)

/*
* WT_SALVAGE_COOKIE --
@@ -295,9 +295,43 @@ __wt_get_addr(
static inline void
__wt_page_release(WT_SESSION_IMPL *session, WT_PAGE *page)
{
/* We never acquired a hazard reference on the root page. */
if (page != NULL && !WT_PAGE_IS_ROOT(page))
__wt_hazard_clear(session, page);
WT_BTREE *btree;

btree = session->btree;

/*
* Fast-track pages we don't have and the root page, which sticks
* in memory, regardless.
*/
if (page == NULL || WT_PAGE_IS_ROOT(page))
return;

/* If this is a non cached page, discard it. */
if (F_ISSET(btree, WT_BTREE_NO_CACHE)) {
page->ref->page = NULL;
page->ref->state = WT_REF_DISK;
__wt_page_out(session, &page, 0);
return;
}

/* Discard our hazard reference. */
__wt_hazard_clear(session, page);
}

/*
* __wt_stack_release --
* Release references to a page stack.
*/
static inline void
__wt_stack_release(WT_SESSION_IMPL *session, WT_PAGE *page)
{
WT_PAGE *next;

while (page != NULL && !WT_PAGE_IS_ROOT(page)) {
next = page->parent;
__wt_page_release(session, page);
page = next;
}
}

/*
@@ -310,7 +344,7 @@ __wt_page_hazard_check(WT_SESSION_IMPL *session, WT_PAGE *page)
WT_CONNECTION_IMPL *conn;
WT_HAZARD *hp;
WT_SESSION_IMPL *s;
uint32_t i, session_cnt;
uint32_t i, hazard_size, session_cnt;

conn = S2C(session);
@@ -326,7 +360,8 @@ __wt_page_hazard_check(WT_SESSION_IMPL *session, WT_PAGE *page)
for (s = conn->sessions, i = 0; i < session_cnt; ++s, ++i) {
if (!s->active)
continue;
for (hp = s->hazard; hp < s->hazard + conn->hazard_size; ++hp)
WT_ORDERED_READ(hazard_size, s->hazard_size);
for (hp = s->hazard; hp < s->hazard + hazard_size; ++hp)
if (hp->page == page)
return (hp);
}
@@ -57,9 +57,13 @@ __wt_eviction_page_check(WT_SESSION_IMPL *session, WT_PAGE *page)
F_ISSET(session->btree, WT_BTREE_NO_EVICTION))
return (0);

/* Check the page's memory footprint. */
if ((int64_t)page->memory_footprint > conn->cache_size / 2 ||
page->memory_footprint > 20 * session->btree->maxleafpage)
/*
* Check the page's memory footprint - evict pages that take up more
* than their fair share of the cache. We define a fair share as
* approximately half the cache size per open writable btree handle.
*/
if ((int64_t)page->memory_footprint >
conn->cache_size / (2 * (conn->open_btree_count + 1)))
return (1);

/*
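The new fair-share test in `__wt_eviction_page_check` divides half the cache among the open writable btree handles (plus one, so the divisor is never zero). The following is a small sketch of the same arithmetic in isolation, assuming unsigned 64-bit sizes. `demo_over_fair_share` is a hypothetical helper name, not a WiredTiger function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * demo_over_fair_share --
 *	Hypothetical helper: return 1 if a page's in-memory footprint exceeds
 *	cache_size / (2 * (open_btree_count + 1)), else 0.
 */
static int
demo_over_fair_share(
    uint64_t page_footprint, uint64_t cache_size, unsigned int open_btree_count)
{
	uint64_t fair_share;

	/* The "+ 1" keeps the divisor non-zero when no btrees are open. */
	fair_share = cache_size / (2 * ((uint64_t)open_btree_count + 1));
	return (page_footprint > fair_share);
}
```

For a 100MB cache, the threshold is 50MB with no open writable btrees and 10MB with four, so the more handles are open, the sooner a single large page becomes an eviction candidate.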
@@ -197,3 +197,6 @@ struct __wt_cursor_table {
if (!F_ISSET(cursor, WT_CURSTD_VALUE_SET)) \
WT_ERR(__wt_cursor_kv_not_set(cursor, 0)); \
} while (0)

#define WT_CURSOR_RAW_OK \
WT_CURSTD_DUMP_HEX | WT_CURSTD_DUMP_PRINT | WT_CURSTD_RAW
@@ -16,6 +16,16 @@ __cursor_set_recno(WT_CURSOR_BTREE *cbt, uint64_t v)
cbt->iface.recno = cbt->recno = v;
}

/*
* __cursor_position_clear --
* Forget the current key and value in a cursor.
*/
static inline void
__cursor_position_clear(WT_CURSOR_BTREE *cbt)
{
F_CLR(&cbt->iface, WT_CURSTD_KEY_SET | WT_CURSTD_VALUE_SET);
}

/*
* __cursor_search_clear --
* Reset the cursor's state for a search.
@@ -56,14 +66,9 @@ __cursor_leave(WT_CURSOR_BTREE *cbt)
cursor = &cbt->iface;
session = (WT_SESSION_IMPL *)cursor->session;

/* Optionally release any page references we're holding. */
if (cbt->page != NULL) {
__wt_page_release(session, cbt->page);
cbt->page = NULL;
}

/* Reset the returned key/value state. */
F_CLR(cursor, WT_CURSTD_KEY_SET | WT_CURSTD_VALUE_SET);
/* Release any page references we're holding. */
__wt_stack_release(session, cbt->page);
cbt->page = NULL;

if (F_ISSET(cbt, WT_CBT_ACTIVE)) {
WT_ASSERT(session, session->ncursors > 0);
@@ -589,6 +589,8 @@ extern int __wt_conn_btree_apply_single(WT_SESSION_IMPL *session,
extern int __wt_conn_btree_close(WT_SESSION_IMPL *session, int locked);
extern int __wt_conn_btree_close_all(WT_SESSION_IMPL *session,
const char *name);
extern int __wt_conn_btree_discard_single(WT_SESSION_IMPL *session,
WT_BTREE *btree);
extern int __wt_conn_btree_discard(WT_CONNECTION_IMPL *conn);
extern int __wt_connection_init(WT_CONNECTION_IMPL *conn);
extern void __wt_connection_destroy(WT_CONNECTION_IMPL *conn);
@@ -670,7 +672,9 @@ extern int __wt_log_printf(WT_SESSION_IMPL *session,
2,
3)));
extern WT_LOGREC_DESC __wt_logdesc_debug;
extern int __wt_clsm_init_merge(WT_CURSOR *cursor, int nchunks);
extern int __wt_clsm_init_merge(WT_CURSOR *cursor,
int start_chunk,
int nchunks);
extern int __wt_clsm_open(WT_SESSION_IMPL *session,
const char *uri,
const char *cfg[],
@@ -679,10 +683,10 @@ extern int __wt_lsm_init(WT_CONNECTION *wt_conn, const char *config);
extern int __wt_lsm_cleanup(WT_CONNECTION *wt_conn);
extern int __wt_lsm_merge_update_tree(WT_SESSION_IMPL *session,
WT_LSM_TREE *lsm_tree,
int start_chunk,
int nchunks,
WT_LSM_CHUNK **chunkp);
extern int __wt_lsm_major_merge(WT_SESSION_IMPL *session,
WT_LSM_TREE *lsm_tree);
extern int __wt_lsm_merge(WT_SESSION_IMPL *session, WT_LSM_TREE *lsm_tree);
extern int __wt_lsm_meta_read(WT_SESSION_IMPL *session, WT_LSM_TREE *lsm_tree);
extern int __wt_lsm_meta_write(WT_SESSION_IMPL *session, WT_LSM_TREE *lsm_tree);
extern int __wt_lsm_tree_close_all(WT_SESSION_IMPL *session);
@@ -724,6 +728,7 @@ extern int __wt_lsm_tree_worker(WT_SESSION_IMPL *session,
const char *cfg[],
uint32_t open_flags);
extern void *__wt_lsm_worker(void *arg);
extern void *__wt_lsm_checkpoint_worker(void *arg);
extern int __wt_metadata_get(WT_SESSION *session,
const char *uri,
const char **valuep);
@@ -21,9 +21,10 @@ struct __wt_cursor_lsm {
#define WT_CLSM_ITERATE_NEXT 0x01 /* Forward iteration */
#define WT_CLSM_ITERATE_PREV 0x02 /* Backward iteration */
#define WT_CLSM_MERGE 0x04 /* Merge cursor, don't update. */
#define WT_CLSM_MULTIPLE 0x08 /* Multiple cursors have values for the
#define WT_CLSM_MINOR_MERGE 0x08 /* Minor merge, include tombstones. */
#define WT_CLSM_MULTIPLE 0x10 /* Multiple cursors have values for the
current key */
#define WT_CLSM_UPDATED 0x10 /* Cursor has done updates */
#define WT_CLSM_UPDATED 0x20 /* Cursor has done updates */
uint32_t flags;
};
@@ -51,17 +52,21 @@ struct __wt_lsm_tree {
uint32_t *memsizep;

/* Configuration parameters */
uint32_t threshold;
uint32_t bloom_bit_count;
uint32_t bloom_hash_count;
uint32_t chunk_size;
uint32_t merge_max;

WT_SESSION_IMPL *worker_session;/* Passed to thread_create */
pthread_t worker_tid; /* LSM worker thread */
WT_SESSION_IMPL *ckpt_session; /* For checkpoint worker */
pthread_t ckpt_tid; /* LSM checkpoint worker thread */

int nchunks; /* Number of active chunks */
int last; /* Last allocated ID. */
WT_LSM_CHUNK **chunk; /* Array of active LSM chunks */
size_t chunk_alloc; /* Space allocated for chunks */

WT_LSM_CHUNK **old_chunks; /* Array of old LSM chunks */
size_t old_alloc; /* Space allocated for old chunks */
int nold_chunks; /* Number of old chunks */
@@ -77,3 +82,12 @@ struct __wt_lsm_data_source {
WT_RWLOCK *rwlock;
};

struct __wt_lsm_worker_cookie {
WT_LSM_CHUNK **chunk_array;
size_t chunk_alloc;
int nchunks;
#define WT_LSM_WORKER_MERGE 0x01
#define WT_LSM_WORKER_CHECKPOINT 0x02
uint32_t flags;
};
@@ -52,6 +52,9 @@
#define WT_SKIP_MAXDEPTH 10
#define WT_SKIP_PROBABILITY (UINT32_MAX >> 2)

/* The number of hazard references that can be in use is grown dynamically. */
#define WT_HAZARD_INCR 10

/*
* Quiet compiler warnings about unused parameters.
*/
@@ -119,6 +119,7 @@ struct __wt_connection_stats {
WT_STATS memfree;
WT_STATS total_read_io;
WT_STATS total_write_io;
WT_STATS txn_fail_cache;
WT_STATS txn_begin;
WT_STATS txn_commit;
WT_STATS txn_rollback;
@@ -34,15 +34,17 @@ typedef uint32_t wt_txnid_t;
* remains in the system after 2 billion transactions it can no longer be
* compared with current transaction ID.
*/
#define TXNID_LT(t1, t2) \
(((t1) == (t2) || \
(t1) == WT_TXN_ABORTED || (t2) == WT_TXN_NONE) ? 0 : \
((t1) == WT_TXN_NONE || (t2) == WT_TXN_ABORTED) ? 1 : \
#define TXNID_LE(t1, t2) \
(((t1) == WT_TXN_ABORTED || (t2) == WT_TXN_NONE) ? 0 : \
((t1) == WT_TXN_NONE || (t2) == WT_TXN_ABORTED) ? 1 : \
(t2) - (t1) < (UINT32_MAX / 2))

#define TXNID_LT(t1, t2) \
((t1) != (t2) && TXNID_LE(t1, t2))

struct __wt_txn_state {
wt_txnid_t id;
wt_txnid_t snap_min;
volatile wt_txnid_t id;
volatile wt_txnid_t snap_min;
};

struct __wt_txn_global {
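The `TXNID_LE` and `TXNID_LT` macros compare 32-bit transaction IDs in a way that survives counter wraparound: the unsigned difference `t2 - t1` is small when `t2` is at or shortly after `t1`, even if the counter wrapped in between. The following is a sketch of the same comparison as functions. The `demo_` names and the two sentinel values are assumptions made for this illustration; the real `WT_TXN_NONE` and `WT_TXN_ABORTED` definitions are not shown in this hunk.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t demo_txnid_t;

/* Hypothetical sentinel values, chosen for this sketch only. */
#define	DEMO_TXN_NONE		((demo_txnid_t)0)
#define	DEMO_TXN_ABORTED	((demo_txnid_t)UINT32_MAX)

/*
 * demo_txnid_le --
 *	Return 1 if t1 is at or before t2, allowing for wraparound.
 */
static int
demo_txnid_le(demo_txnid_t t1, demo_txnid_t t2)
{
	if (t1 == DEMO_TXN_ABORTED || t2 == DEMO_TXN_NONE)
		return (0);
	if (t1 == DEMO_TXN_NONE || t2 == DEMO_TXN_ABORTED)
		return (1);
	/* Unsigned subtraction is modular, so a wrapped t2 still works. */
	return ((demo_txnid_t)(t2 - t1) < (UINT32_MAX / 2));
}

/*
 * demo_txnid_lt --
 *	Return 1 if t1 is strictly before t2.
 */
static int
demo_txnid_lt(demo_txnid_t t1, demo_txnid_t t2)
{
	return (t1 != t2 && demo_txnid_le(t1, t2));
}
```

Defining the strict comparison in terms of the non-strict one, as the diff does, keeps the sentinel handling in a single place.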
@@ -89,6 +91,12 @@ struct __wt_txn {
size_t modref_alloc;
u_int modref_count;

/*
* Count of unsuccessful eviction attempts, used to abort if the cache
* is full and no progress can be made.
*/
u_int eviction_fails;

#define TXN_AUTOCOMMIT 0x01
#define TXN_ERROR 0x02
#define TXN_RUNNING 0x04
@@ -133,9 +133,7 @@ __wt_txn_visible_all(WT_SESSION_IMPL *session, wt_txnid_t id)
WT_TXN *txn;

txn = &session->txn;
if (TXNID_LT(txn->oldest_snap_min, id))
return (0);
return (1);
return (TXNID_LT(id, txn->oldest_snap_min));
}

/*
@@ -522,15 +522,14 @@ struct __wt_session {
* @config{append, append the value as a new record\, creating a new
* record number key; valid only for cursors with record number keys.,a
* boolean flag; default \c false.}
* @config{bulk, configure the cursor for bulk loads; bulk-load is a
* fast load path for newly created objects and only newly created
* objects may be bulk-loaded. Cursors configured for bulk load only
* support the WT_CURSOR::insert and WT_CURSOR::close methods.,a boolean
* flag; default \c false.}
* @config{checkpoint, the name of a checkpoint to open; the reserved
* checkpoint name "WiredTigerCheckpoint" opens a cursor on the most
* recent internal checkpoint taken for the object.,a string; default
* empty.}
* @config{bulk, configure the cursor for bulk loads\, a fast load path
* that may only be used for newly created objects. Cursors configured
* for bulk load only support the WT_CURSOR::insert and WT_CURSOR::close
* methods.,a boolean flag; default \c false.}
* @config{checkpoint, the name of a checkpoint to open (the reserved
* name "WiredTigerCheckpoint" opens the most recent internal checkpoint
* taken for the object). The cursor does not support data
* modification.,a string; default empty.}
* @config{dump, configure the cursor for dump format inputs and
* outputs: "hex" selects a simple hexadecimal format\, "print" selects
* a format where only non-printing characters are hexadecimal encoded.
@@ -663,6 +662,8 @@ struct __wt_session {
|
||||
* for LSM bloom filters.,an integer between 2 and 100; default \c 4.}
|
||||
* @config{lsm_chunk_size, the maximum size of the in-memory chunk of an
|
||||
* LSM tree.,an integer between 512K and 500MB; default \c 2MB.}
|
||||
* @config{lsm_merge_max, the maximum number of chunks to include in a
|
||||
* merge operation.,an integer between 2 and 100; default \c 15.}
|
||||
* @config{prefix_compression, configure row-store format key prefix
|
||||
* compression.,a boolean flag; default \c true.}
|
||||
* @config{split_pct, the Btree page split size as a percentage of the
|
||||
@@ -1146,8 +1147,8 @@ struct __wt_connection {
|
||||
* may need quoting\, for example\,
|
||||
* <code>extensions=("/path/to/ext.so"="entry=my_entry")</code>.,a list of
|
||||
* strings; default empty.}
|
||||
* @config{hazard_max, number of simultaneous hazard references per session
|
||||
* handle.,an integer greater than or equal to 15; default \c 30.}
|
||||
* @config{hazard_max, maximum number of simultaneous hazard references per
|
||||
* session handle.,an integer greater than or equal to 15; default \c 1000.}
|
||||
* @config{logging, enable logging.,a boolean flag; default \c false.}
|
||||
* @config{multiprocess, permit sharing between processes (will automatically
|
||||
* start an RPC server for primary processes and use RPC for secondary
|
||||
@@ -1669,12 +1670,14 @@ extern int wiredtiger_extension_init(WT_SESSION *session,
|
||||
#define WT_STAT_total_read_io 18
|
||||
/*! total write I/Os */
|
||||
#define WT_STAT_total_write_io 19
|
||||
/*! transaction failures due to cache overflow */
|
||||
#define WT_STAT_txn_fail_cache 20
|
||||
/*! transactions */
|
||||
#define WT_STAT_txn_begin 20
|
||||
#define WT_STAT_txn_begin 21
|
||||
/*! transactions committed */
|
||||
#define WT_STAT_txn_commit 21
|
||||
#define WT_STAT_txn_commit 22
|
||||
/*! transactions rolled-back */
|
||||
#define WT_STAT_txn_rollback 22
|
||||
#define WT_STAT_txn_rollback 23
|
||||
|
||||
/*!
|
||||
* @}
|
||||
|
||||
@@ -135,6 +135,8 @@ struct __wt_lsm_data_source;
|
||||
typedef struct __wt_lsm_data_source WT_LSM_DATA_SOURCE;
|
||||
struct __wt_lsm_tree;
|
||||
typedef struct __wt_lsm_tree WT_LSM_TREE;
|
||||
struct __wt_lsm_worker_cookie;
|
||||
typedef struct __wt_lsm_worker_cookie WT_LSM_WORKER_COOKIE;
|
||||
struct __wt_named_collator;
|
||||
typedef struct __wt_named_collator WT_NAMED_COLLATOR;
|
||||
struct __wt_named_compressor;
|
||||
|
||||
@@ -33,7 +33,7 @@
|
||||
CURSOR_UPDATE_API_CALL(cursor, session, n, NULL); \
|
||||
WT_ERR(__clsm_enter(clsm))
|
||||
|
||||
static int __clsm_open_cursors(WT_CURSOR_LSM *);
|
||||
static int __clsm_open_cursors(WT_CURSOR_LSM *, int);
|
||||
static int __clsm_search(WT_CURSOR *);
|
||||
|
||||
static inline int
|
||||
@@ -41,7 +41,7 @@ __clsm_enter(WT_CURSOR_LSM *clsm)
|
||||
{
|
||||
if (!F_ISSET(clsm, WT_CLSM_MERGE) &&
|
||||
clsm->dsk_gen != clsm->lsm_tree->dsk_gen)
|
||||
WT_RET(__clsm_open_cursors(clsm));
|
||||
WT_RET(__clsm_open_cursors(clsm, 0));
|
||||
|
||||
return (0);
|
||||
}
|
||||
@@ -54,7 +54,7 @@ static WT_ITEM __lsm_tombstone = { "", 0, 0, NULL, 0 };
|
||||
|
||||
#define WT_LSM_NEEDVALUE(c) do { \
|
||||
WT_CURSOR_NEEDVALUE(c); \
|
||||
if (__clsm_deleted(&(c)->value)) \
|
||||
if (__clsm_deleted((WT_CURSOR_LSM *)(c), &(c)->value)) \
|
||||
WT_ERR(__wt_cursor_kv_not_set(cursor, 0)); \
|
||||
} while (0)
|
||||
|
||||
@@ -63,9 +63,9 @@ static WT_ITEM __lsm_tombstone = { "", 0, 0, NULL, 0 };
|
||||
* Check whether the current value is a tombstone.
|
||||
*/
|
||||
static inline int
|
||||
__clsm_deleted(WT_ITEM *item)
|
||||
__clsm_deleted(WT_CURSOR_LSM *clsm, WT_ITEM *item)
|
||||
{
|
||||
return (item->size == 0);
|
||||
return (!F_ISSET(clsm, WT_CLSM_MINOR_MERGE) && item->size == 0);
|
||||
}
|
||||
|
||||
/*
|
||||
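The change to `__clsm_deleted` above makes the tombstone test depend on the cursor: a zero-length value counts as deleted unless the cursor is running a minor merge, in which case tombstones must be carried through to the output chunk. A small sketch of that rule follows. The flag bit values and the `struct item` type are stand-ins for illustration, not the real `WT_CLSM_*` or `WT_ITEM` definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits; the real values are not shown in the diff. */
#define	CLSM_MERGE		0x01u
#define	CLSM_MINOR_MERGE	0x02u

/* Stand-in for WT_ITEM: only the fields this check needs. */
struct item {
	const void *data;
	size_t size;
};

/*
 * Return 1 if the value should be treated as a tombstone.  During a minor
 * merge the empty value is passed through as ordinary data instead.
 */
int
clsm_deleted(unsigned cursor_flags, const struct item *item)
{
	assert(item != NULL);
	return ((cursor_flags & CLSM_MINOR_MERGE) == 0 && item->size == 0);
}
```

A major merge leaves `CLSM_MINOR_MERGE` clear, so empty values are still recognized as deletions and can be dropped.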
@@ -106,7 +106,7 @@ __clsm_close_cursors(WT_CURSOR_LSM *clsm)
|
||||
* Open cursors for the current set of files.
|
||||
*/
|
||||
static int
|
||||
__clsm_open_cursors(WT_CURSOR_LSM *clsm)
|
||||
__clsm_open_cursors(WT_CURSOR_LSM *clsm, int start_chunk)
|
||||
{
|
||||
WT_CURSOR *c, **cp;
|
||||
WT_DECL_RET;
|
||||
@@ -115,6 +115,8 @@ __clsm_open_cursors(WT_CURSOR_LSM *clsm)
|
||||
WT_SESSION_IMPL *session;
|
||||
const char *ckpt_cfg[] = API_CONF_DEFAULTS(session, open_cursor,
|
||||
"checkpoint=WiredTigerCheckpoint");
|
||||
const char *merge_cfg[] = API_CONF_DEFAULTS(session, open_cursor,
|
||||
"checkpoint=WiredTigerCheckpoint,no_cache");
|
||||
int i, nchunks;
|
||||
|
||||
session = (WT_SESSION_IMPL *)clsm->iface.session;
|
||||
@@ -152,10 +154,11 @@ __clsm_open_cursors(WT_CURSOR_LSM *clsm)
|
||||
* Read from the checkpoint if the file has been written.
|
||||
* Once all cursors switch, the in-memory tree can be evicted.
|
||||
*/
|
||||
chunk = lsm_tree->chunk[i];
|
||||
chunk = lsm_tree->chunk[i + start_chunk];
|
||||
ret = __wt_curfile_open(session,
|
||||
chunk->uri, &clsm->iface,
|
||||
F_ISSET(chunk, WT_LSM_CHUNK_ONDISK) ? ckpt_cfg : NULL, cp);
|
||||
!F_ISSET(chunk, WT_LSM_CHUNK_ONDISK) ? NULL :
|
||||
(F_ISSET(clsm, WT_CLSM_MERGE) ? merge_cfg : ckpt_cfg), cp);
|
||||
|
||||
/*
|
||||
* XXX kludge: we may have an empty chunk where no checkpoint
|
||||
@@ -200,15 +203,17 @@ err: __wt_spin_unlock(session, &lsm_tree->lock);
|
||||
* Initialize an LSM cursor for a (major) merge.
|
||||
*/
|
||||
int
|
||||
__wt_clsm_init_merge(WT_CURSOR *cursor, int nchunks)
|
||||
__wt_clsm_init_merge(WT_CURSOR *cursor, int start_chunk, int nchunks)
|
||||
{
|
||||
WT_CURSOR_LSM *clsm;
|
||||
|
||||
clsm = (WT_CURSOR_LSM *)cursor;
|
||||
F_SET(clsm, WT_CLSM_MERGE);
|
||||
if (start_chunk != 0)
|
||||
F_SET(clsm, WT_CLSM_MINOR_MERGE);
|
||||
clsm->nchunks = nchunks;
|
||||
|
||||
return (__clsm_open_cursors(clsm));
|
||||
return (__clsm_open_cursors(clsm, start_chunk));
|
||||
}
|
||||
|
||||
/*
|
||||
@@ -255,7 +260,7 @@ __clsm_get_current(
|
||||
WT_RET(current->get_key(current, &c->key));
|
||||
WT_RET(current->get_value(current, &c->value));
|
||||
|
||||
if ((*deletedp = __clsm_deleted(&c->value)) == 0)
|
||||
if ((*deletedp = __clsm_deleted(clsm, &c->value)) == 0)
|
||||
F_SET(c, WT_CURSTD_KEY_SET | WT_CURSTD_VALUE_SET);
|
||||
else
|
||||
F_CLR(c, WT_CURSTD_KEY_SET | WT_CURSTD_VALUE_SET);
|
||||
@@ -517,7 +522,7 @@ __clsm_search(WT_CURSOR *cursor)
|
||||
WT_ERR(c->get_key(c, &cursor->key));
|
||||
WT_ERR(c->get_value(c, &cursor->value));
|
||||
clsm->current = c;
|
||||
if (__clsm_deleted(&cursor->value))
|
||||
if (__clsm_deleted(clsm, &cursor->value))
|
||||
ret = WT_NOTFOUND;
|
||||
goto done;
|
||||
} else if (ret != WT_NOTFOUND)
|
||||
@@ -573,7 +578,7 @@ __clsm_search_near(WT_CURSOR *cursor, int *exactp)
|
||||
goto err;
|
||||
|
||||
WT_ERR(c->get_value(c, &v));
|
||||
deleted = __clsm_deleted(&v);
|
||||
deleted = __clsm_deleted(clsm, &v);
|
||||
|
||||
if (cmp == 0 && !deleted) {
|
||||
clsm->current = c;
|
||||
@@ -588,13 +593,13 @@ __clsm_search_near(WT_CURSOR *cursor, int *exactp)
|
||||
while (deleted && (ret = c->next(c)) == 0) {
|
||||
cmp = 1;
|
||||
WT_ERR(c->get_value(c, &v));
|
||||
deleted = __clsm_deleted(&v);
|
||||
deleted = __clsm_deleted(clsm, &v);
|
||||
}
|
||||
WT_ERR_NOTFOUND_OK(ret);
|
||||
while (deleted && (ret = c->prev(c)) == 0) {
|
||||
cmp = -1;
|
||||
WT_ERR(c->get_value(c, &v));
|
||||
deleted = __clsm_deleted(&v);
|
||||
deleted = __clsm_deleted(clsm, &v);
|
||||
}
|
||||
WT_ERR_NOTFOUND_OK(ret);
|
||||
if (deleted)
|
||||
@@ -698,7 +703,7 @@ __clsm_put(
|
||||
clsm->current = primary;
|
||||
|
||||
if ((memsizep = lsm_tree->memsizep) != NULL &&
|
||||
*memsizep > lsm_tree->threshold) {
|
||||
*memsizep > lsm_tree->chunk_size) {
|
||||
/*
|
||||
* Close our cursors: if we are the only open cursor, this
|
||||
* means the btree handle is unlocked.
|
||||
|
||||
@@ -14,10 +14,10 @@
|
||||
*/
|
||||
int
|
||||
__wt_lsm_merge_update_tree(WT_SESSION_IMPL *session,
|
||||
WT_LSM_TREE *lsm_tree, int nchunks, WT_LSM_CHUNK **chunkp)
|
||||
WT_LSM_TREE *lsm_tree, int start_chunk, int nchunks, WT_LSM_CHUNK **chunkp)
|
||||
{
|
||||
WT_LSM_CHUNK *chunk;
|
||||
size_t chunk_sz;
|
||||
size_t chunk_sz, chunks_after_merge;
|
||||
int i, j;
|
||||
|
||||
/* Setup the array of obsolete chunks. */
|
||||
@@ -34,7 +34,9 @@ __wt_lsm_merge_update_tree(WT_SESSION_IMPL *session,
|
||||
/* Copy entries one at a time, so we can reuse gaps in the list. */
|
||||
for (i = j = 0; j < nchunks && i < lsm_tree->nold_chunks; i++) {
|
||||
if (lsm_tree->old_chunks[i] == NULL) {
|
||||
lsm_tree->old_chunks[i] = lsm_tree->chunk[j++];
|
||||
lsm_tree->old_chunks[i] =
|
||||
lsm_tree->chunk[start_chunk + j];
|
||||
++j;
|
||||
--lsm_tree->old_avail;
|
||||
}
|
||||
}
|
||||
@@ -42,13 +44,15 @@ __wt_lsm_merge_update_tree(WT_SESSION_IMPL *session,
|
||||
WT_ASSERT(session, j == nchunks);
|
||||
|
||||
/* Update the current chunk list. */
|
||||
memmove(lsm_tree->chunk + 1, lsm_tree->chunk + nchunks,
|
||||
(lsm_tree->nchunks - nchunks) * sizeof(*lsm_tree->chunk));
|
||||
chunks_after_merge = lsm_tree->nchunks - (nchunks + start_chunk);
|
||||
memmove(lsm_tree->chunk + start_chunk + 1,
|
||||
lsm_tree->chunk + start_chunk + nchunks,
|
||||
chunks_after_merge * sizeof(*lsm_tree->chunk));
|
||||
lsm_tree->nchunks -= nchunks - 1;
|
||||
memset(lsm_tree->chunk + lsm_tree->nchunks, 0,
|
||||
(nchunks - 1) * sizeof(*lsm_tree->chunk));
|
||||
WT_RET(__wt_calloc_def(session, 1, &chunk));
|
||||
lsm_tree->chunk[0] = chunk;
|
||||
lsm_tree->chunk[start_chunk] = chunk;
|
||||
lsm_tree->dsk_gen++;
|
||||
|
||||
*chunkp = chunk;
|
||||
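The array update in this hunk replaces `nchunks` consecutive entries starting at `start_chunk` with a single new entry, sliding the tail down and zeroing the vacated slots. The sketch below applies the same `memmove`/`memset` arithmetic to a plain `int` array so the index math can be checked in isolation. The function name and the use of integers in place of `WT_LSM_CHUNK *` are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Replace chunk[start .. start + n - 1] with the single value "merged".
 * Returns the new number of live entries.
 */
size_t
collapse_chunks(int *chunk, size_t total, size_t start, size_t n, int merged)
{
	size_t after;

	assert(n >= 1 && start + n <= total);

	/* Entries that follow the merged range and must be kept. */
	after = total - (start + n);

	/* Slide the tail down so it sits right after the merged slot. */
	memmove(chunk + start + 1, chunk + start + n, after * sizeof(*chunk));

	/* n entries became one, so the array shrinks by n - 1. */
	total -= n - 1;

	/* Clear the slots that are no longer in use. */
	memset(chunk + total, 0, (n - 1) * sizeof(*chunk));

	chunk[start] = merged;
	return (total);
}
```

For example, collapsing three entries at index 2 of `{10, 11, 12, 13, 14, 15}` into `99` leaves `{10, 11, 99, 15, 0, 0}` with four live entries.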
@@ -56,11 +60,11 @@ __wt_lsm_merge_update_tree(WT_SESSION_IMPL *session,
|
||||
}
|
||||
|
||||
/*
|
||||
* __wt_lsm_major_merge --
|
||||
* Merge a set of chunks of an LSM tree including the oldest.
|
||||
* __wt_lsm_merge --
|
||||
* Merge a set of chunks of an LSM tree.
|
||||
*/
|
||||
int
|
||||
__wt_lsm_major_merge(WT_SESSION_IMPL *session, WT_LSM_TREE *lsm_tree)
|
||||
__wt_lsm_merge(WT_SESSION_IMPL *session, WT_LSM_TREE *lsm_tree)
|
||||
{
|
||||
WT_BLOOM *bloom;
|
||||
WT_CURSOR *src, *dest;
|
||||
@@ -71,7 +75,7 @@ __wt_lsm_major_merge(WT_SESSION_IMPL *session, WT_LSM_TREE *lsm_tree)
|
||||
WT_SESSION *wt_session;
|
||||
const char *dest_uri;
|
||||
uint64_t insert_count, record_count;
|
||||
int dest_id, i, nchunks;
|
||||
int dest_id, i, nchunks, start_chunk;
|
||||
|
||||
src = dest = NULL;
|
||||
dest_uri = NULL;
|
||||
@@ -94,21 +98,46 @@ __wt_lsm_major_merge(WT_SESSION_IMPL *session, WT_LSM_TREE *lsm_tree)
|
||||
return (WT_NOTFOUND);
|
||||
|
||||
/*
|
||||
* We have a limited number of hazard references, and we want to bound
|
||||
* the amount of work in the merge.
|
||||
*
|
||||
* Use the lsm_tree lock to read the chunks (so no switches occur), but
|
||||
* avoid holding it while the merge is in progress: that may take a
|
||||
* long time.
|
||||
*/
|
||||
nchunks = WT_MIN((int)S2C(session)->hazard_size / 2, nchunks);
|
||||
__wt_spin_lock(session, &lsm_tree->lock);
|
||||
|
||||
/* Only include chunks that are on disk */
|
||||
while (nchunks > 1 &&
|
||||
(!F_ISSET(lsm_tree->chunk[nchunks - 1], WT_LSM_CHUNK_ONDISK) ||
|
||||
lsm_tree->chunk[nchunks - 1]->ncursor > 0))
|
||||
--nchunks;
|
||||
|
||||
/*
|
||||
* Look for a minor merge to do in preference to a major merge.
|
||||
*
|
||||
* The difference is whether the oldest chunk is involved: if it is, we
|
||||
* can discard tombstones, because there can be no older record to
|
||||
* mark deleted.
|
||||
*
|
||||
* We look at the Bloom URI to decide whether a chunk is the result of
|
||||
* an earlier merge. In a minor merge, we take as many chunks as we
|
||||
* can that have not yet been merged. If there are fewer than 2 "new"
|
||||
* chunks, fall back to a major merge.
|
||||
*/
|
||||
for (i = 0; i < nchunks; i++)
|
||||
if (lsm_tree->chunk[i]->bloom_uri == NULL)
|
||||
break;
|
||||
|
||||
if (i < nchunks - 2) {
|
||||
start_chunk = i;
|
||||
nchunks -= i;
|
||||
} else
|
||||
start_chunk = 0;
|
||||
|
||||
/* Respect the configured limit on the number of chunks to merge. */
|
||||
if (nchunks > (int)lsm_tree->merge_max)
|
||||
nchunks = (int)lsm_tree->merge_max;
|
||||
|
||||
for (record_count = 0, i = 0; i < nchunks; i++)
|
||||
record_count += lsm_tree->chunk[i]->count;
|
||||
record_count += lsm_tree->chunk[start_chunk + i]->count;
|
||||
__wt_spin_unlock(session, &lsm_tree->lock);
|
||||
|
||||
if (nchunks <= 1)
|
||||
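The selection logic added in this hunk can be read on its own: skip the leading chunks that already came out of a merge, start a minor merge at the first unmerged chunk if enough remain, otherwise fall back to a major merge from chunk 0, then cap the count at the configured maximum. The sketch below mirrors those steps. The `merged` array stands in for the `bloom_uri != NULL` test, and the `struct merge_range` type and function name are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

struct merge_range {
	int start;	/* index of the first chunk to merge */
	int nchunks;	/* number of chunks to merge */
};

/*
 * merged[i] is nonzero if chunk i is the result of an earlier merge
 * (the diff uses a non-NULL Bloom URI as that signal).
 */
struct merge_range
choose_merge(const int *merged, int nchunks, int merge_max)
{
	struct merge_range r;
	int i;

	assert(merged != NULL || nchunks == 0);

	/* Find the first chunk that has not been merged yet. */
	for (i = 0; i < nchunks; i++)
		if (!merged[i])
			break;

	if (i < nchunks - 2) {
		/* Minor merge: leave the already-merged prefix alone. */
		r.start = i;
		nchunks -= i;
	} else
		/* Major merge: include the oldest chunk. */
		r.start = 0;

	/* Respect the configured limit on chunks per merge. */
	if (nchunks > merge_max)
		nchunks = merge_max;
	r.nchunks = nchunks;
	return (r);
}
```

As written, the test `i < nchunks - 2` selects a minor merge only when at least three unmerged chunks remain; with exactly two, the code falls back to a major merge.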
@@ -118,7 +147,8 @@ __wt_lsm_major_merge(WT_SESSION_IMPL *session, WT_LSM_TREE *lsm_tree)
|
||||
dest_id = WT_ATOMIC_ADD(lsm_tree->last, 1);
|
||||
|
||||
WT_VERBOSE_RET(session, lsm,
|
||||
"Merging first %d chunks into %d\n", nchunks, dest_id);
|
||||
"Merging chunks %d-%d into %d (%" PRIu64 " records)\n",
|
||||
start_chunk, start_chunk + nchunks, dest_id, record_count);
|
||||
|
||||
if (record_count != 0) {
|
||||
WT_RET(__wt_scr_alloc(session, 0, &bbuf));
|
||||
@@ -140,7 +170,7 @@ __wt_lsm_major_merge(WT_SESSION_IMPL *session, WT_LSM_TREE *lsm_tree)
|
||||
WT_ERR(wt_session->open_cursor(
|
||||
wt_session, lsm_tree->name, NULL, NULL, &src));
|
||||
F_SET(src, WT_CURSTD_RAW);
|
||||
WT_ERR(__wt_clsm_init_merge(src, nchunks));
|
||||
WT_ERR(__wt_clsm_init_merge(src, start_chunk, nchunks));
|
||||
|
||||
WT_WITH_SCHEMA_LOCK(session, ret = __wt_lsm_tree_create_chunk(
|
||||
session, lsm_tree, dest_id, &dest_uri));
|
||||
@@ -174,7 +204,8 @@ __wt_lsm_major_merge(WT_SESSION_IMPL *session, WT_LSM_TREE *lsm_tree)
|
||||
WT_ERR(ret);
|
||||
|
||||
__wt_spin_lock(session, &lsm_tree->lock);
|
||||
ret = __wt_lsm_merge_update_tree(session, lsm_tree, nchunks, &chunk);
|
||||
ret = __wt_lsm_merge_update_tree(
|
||||
session, lsm_tree, start_chunk, nchunks, &chunk);
|
||||
|
||||
chunk->uri = dest_uri;
|
||||
dest_uri = NULL;
|
||||
|
||||
@@ -44,8 +44,10 @@ __wt_lsm_meta_read(WT_SESSION_IMPL *session, WT_LSM_TREE *lsm_tree)
|
||||
else if (WT_STRING_MATCH(
|
||||
"lsm_bloom_hash_count", ck.str, ck.len))
|
||||
lsm_tree->bloom_hash_count = (uint32_t)cv.val;
|
||||
else if (WT_STRING_MATCH("threshold", ck.str, ck.len))
|
||||
lsm_tree->threshold = (uint32_t)cv.val;
|
||||
else if (WT_STRING_MATCH("lsm_chunk_size", ck.str, ck.len))
|
||||
lsm_tree->chunk_size = (uint32_t)cv.val;
|
||||
else if (WT_STRING_MATCH("lsm_merge_max", ck.str, ck.len))
|
||||
lsm_tree->merge_max = (uint32_t)cv.val;
|
||||
else if (WT_STRING_MATCH("last", ck.str, ck.len))
|
||||
lsm_tree->last = (int)cv.val;
|
||||
else if (WT_STRING_MATCH("chunks", ck.str, ck.len)) {
|
||||
@@ -135,9 +137,11 @@ __wt_lsm_meta_write(WT_SESSION_IMPL *session, WT_LSM_TREE *lsm_tree)
|
||||
lsm_tree->file_config,
|
||||
lsm_tree->key_format, lsm_tree->value_format));
|
||||
WT_ERR(__wt_buf_catfmt(session, buf,
|
||||
",last=%" PRIu32 ",threshold=%" PRIu64
|
||||
",last=%" PRIu32
|
||||
",lsm_chunk_size=%" PRIu64 ",lsm_merge_max=%" PRIu32
|
||||
",lsm_bloom_bit_count=%" PRIu32 ",lsm_bloom_hash_count=%" PRIu32,
|
||||
lsm_tree->last, (uint64_t)lsm_tree->threshold,
|
||||
lsm_tree->last,
|
||||
(uint64_t)lsm_tree->chunk_size, lsm_tree->merge_max,
|
||||
lsm_tree->bloom_bit_count, lsm_tree->bloom_hash_count));
|
||||
WT_ERR(__wt_buf_catfmt(session, buf, ",chunks=["));
|
||||
for (i = 0; i < lsm_tree->nchunks; i++) {
|
||||
@@ -166,7 +170,10 @@ __wt_lsm_meta_write(WT_SESSION_IMPL *session, WT_LSM_TREE *lsm_tree)
|
||||
WT_ERR(__wt_buf_catfmt(session, buf, "\"%s\"", chunk->uri));
|
||||
}
|
||||
WT_ERR(__wt_buf_catfmt(session, buf, "]"));
|
||||
WT_ERR(__wt_metadata_update(session, lsm_tree->name, buf->data));
|
||||
__wt_spin_lock(session, &S2C(session)->metadata_lock);
|
||||
ret = __wt_metadata_update(session, lsm_tree->name, buf->data);
|
||||
__wt_spin_unlock(session, &S2C(session)->metadata_lock);
|
||||
WT_ERR(ret);
|
||||
|
||||
err: __wt_scr_free(&buf);
|
||||
return (ret);
|
||||
|
||||
@@ -7,6 +7,7 @@
|
||||
|
||||
#include "wt_internal.h"
|
||||
|
||||
static int __lsm_tree_open_check(WT_SESSION_IMPL *, WT_LSM_TREE *);
|
||||
static int __lsm_tree_open(WT_SESSION_IMPL *, const char *, WT_LSM_TREE **);
|
||||
|
||||
/*
|
||||
@@ -62,11 +63,13 @@ __lsm_tree_close(WT_SESSION_IMPL *session, WT_LSM_TREE *lsm_tree)
|
||||
if (F_ISSET(lsm_tree, WT_LSM_TREE_WORKING)) {
|
||||
F_CLR(lsm_tree, WT_LSM_TREE_WORKING);
|
||||
WT_TRET(__wt_thread_join(lsm_tree->worker_tid));
|
||||
WT_TRET(__wt_thread_join(lsm_tree->ckpt_tid));
|
||||
}
|
||||
|
||||
/*
|
||||
* Close the session and free its hazard array (necessary because
|
||||
* we set WT_SESSION_INTERNAL to simplify shutdown ordering.
|
||||
* Close the worker thread sessions and free their hazard arrays
|
||||
* (necessary because we set WT_SESSION_INTERNAL to simplify shutdown
|
||||
* ordering).
|
||||
*
|
||||
* Do this in the main thread to avoid deadlocks.
|
||||
*/
|
||||
@@ -83,6 +86,19 @@ __lsm_tree_close(WT_SESSION_IMPL *session, WT_LSM_TREE *lsm_tree)
|
||||
*/
|
||||
__wt_free(NULL, lsm_tree->worker_session->hazard);
|
||||
}
|
||||
if (lsm_tree->ckpt_session != NULL) {
|
||||
F_SET(lsm_tree->ckpt_session,
|
||||
F_ISSET(session, WT_SESSION_SCHEMA_LOCKED));
|
||||
|
||||
wt_session = &lsm_tree->ckpt_session->iface;
|
||||
WT_TRET(wt_session->close(wt_session, NULL));
|
||||
|
||||
/*
|
||||
* This is safe after the close because session handles are
|
||||
* not freed, but are managed by the connection.
|
||||
*/
|
||||
__wt_free(NULL, lsm_tree->ckpt_session->hazard);
|
||||
}
|
||||
|
||||
return (ret);
|
||||
}
|
||||
@@ -167,11 +183,17 @@ __lsm_tree_start_worker(WT_SESSION_IMPL *session, WT_LSM_TREE *lsm_tree)
|
||||
lsm_tree->worker_session = (WT_SESSION_IMPL *)wt_session;
|
||||
F_SET(lsm_tree->worker_session, WT_SESSION_INTERNAL);
|
||||
|
||||
WT_RET(wt_conn->open_session(wt_conn, NULL, NULL, &wt_session));
|
||||
lsm_tree->ckpt_session = (WT_SESSION_IMPL *)wt_session;
|
||||
F_SET(lsm_tree->ckpt_session, WT_SESSION_INTERNAL);
|
||||
|
||||
F_SET(lsm_tree, WT_LSM_TREE_WORKING);
|
||||
/* The new thread will rely on the WORKING value being visible. */
|
||||
WT_FULL_BARRIER();
|
||||
WT_RET(__wt_thread_create(
|
||||
&lsm_tree->worker_tid, __wt_lsm_worker, lsm_tree));
|
||||
WT_RET(__wt_thread_create(
|
||||
&lsm_tree->ckpt_tid, __wt_lsm_checkpoint_worker, lsm_tree));
|
||||
|
||||
return (0);
|
||||
}
|
||||
@@ -219,12 +241,14 @@ __wt_lsm_tree_create(WT_SESSION_IMPL *session,
|
||||
WT_ERR(__wt_strndup(session, cval.str, cval.len,
|
||||
&lsm_tree->value_format));
|
||||
|
||||
WT_ERR(__wt_config_gets(session, cfg, "lsm_chunk_size", &cval));
|
||||
lsm_tree->threshold = (uint32_t)cval.val;
|
||||
WT_ERR(__wt_config_gets(session, cfg, "lsm_bloom_bit_count", &cval));
|
||||
lsm_tree->bloom_bit_count = (uint32_t)cval.val;
|
||||
WT_ERR(__wt_config_gets(session, cfg, "lsm_bloom_hash_count", &cval));
|
||||
lsm_tree->bloom_hash_count = (uint32_t)cval.val;
|
||||
WT_ERR(__wt_config_gets(session, cfg, "lsm_chunk_size", &cval));
|
||||
lsm_tree->chunk_size = (uint32_t)cval.val;
|
||||
WT_ERR(__wt_config_gets(session, cfg, "lsm_merge_max", &cval));
|
||||
lsm_tree->merge_max = (uint32_t)cval.val;
|
||||
|
||||
WT_ERR(__wt_scr_alloc(session, 0, &buf));
|
||||
WT_ERR(__wt_buf_fmt(session, buf,
|
||||
@@ -238,8 +262,12 @@ __wt_lsm_tree_create(WT_SESSION_IMPL *session,
|
||||
__lsm_tree_discard(session, lsm_tree);
|
||||
lsm_tree = NULL;
|
||||
|
||||
/* Open our new tree and add it to the handle cache. */
|
||||
WT_ERR(__lsm_tree_open(session, uri, &lsm_tree));
|
||||
/*
|
||||
* Open our new tree and add it to the handle cache. Don't discard on
|
||||
* error: the returned handle is NULL on error, and the metadata tracking
|
||||
* macros handle cleaning up on failure.
|
||||
*/
|
||||
ret = __lsm_tree_open(session, uri, &lsm_tree);
|
||||
|
||||
if (0) {
|
||||
err: __lsm_tree_discard(session, lsm_tree);
|
||||
@@ -248,6 +276,34 @@ err: __lsm_tree_discard(session, lsm_tree);
|
||||
return (ret);
|
||||
}
|
||||
|
||||
/*
|
||||
* __lsm_tree_open_check --
|
||||
* Validate the configuration of an LSM tree.
|
||||
*/
|
||||
static int
|
||||
__lsm_tree_open_check(
|
||||
WT_SESSION_IMPL *session, WT_LSM_TREE *lsm_tree)
|
||||
{
|
||||
WT_CONFIG_ITEM cval;
|
||||
const char *cfg[] = API_CONF_DEFAULTS(
|
||||
session, create, lsm_tree->file_config);
|
||||
uint32_t maxleafpage;
|
||||
uint64_t req;
|
||||
|
||||
WT_RET(__wt_config_gets(
|
||||
session, cfg, "leaf_page_max", &cval));
|
||||
maxleafpage = (uint32_t)cval.val;
|
||||
|
||||
/* Three chunks, plus one page for each participant in a merge. */
|
||||
req = 3 * lsm_tree->chunk_size + (lsm_tree->merge_max * maxleafpage);
|
||||
if (S2C(session)->cache_size < req)
|
||||
WT_RET_MSG(session, EINVAL,
|
||||
"The LSM configuration requires a cache size of at least %"
|
||||
PRIu64 ". Configured size is %" PRIu64,
|
||||
req, S2C(session)->cache_size);
|
||||
return (0);
|
||||
}
|
||||
|
||||
/*
|
||||
* __lsm_tree_open --
|
||||
* Open an LSM tree structure.
|
||||
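The new `__lsm_tree_open_check` rejects configurations whose cache is smaller than three chunks plus one leaf page per merge participant. A worked version of that arithmetic is sketched below. The function names are hypothetical, the 1MB leaf page size is an illustrative value rather than a documented default, and the operands are widened to 64 bits before multiplying, which is a choice made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Minimum cache: three chunks, plus one leaf page per merge participant. */
uint64_t
lsm_min_cache(uint32_t chunk_size, uint32_t merge_max, uint32_t leaf_page_max)
{
	return (3 * (uint64_t)chunk_size +
	    (uint64_t)merge_max * leaf_page_max);
}

/* Return 1 if the configured cache is large enough, 0 otherwise. */
int
lsm_cache_ok(uint64_t cache_size,
    uint32_t chunk_size, uint32_t merge_max, uint32_t leaf_page_max)
{
	return (cache_size >=
	    lsm_min_cache(chunk_size, merge_max, leaf_page_max));
}

/* Usage: 2MB chunks, 15-way merges, and an assumed 1MB leaf page. */
void
lsm_cache_example(void)
{
	/* 3 * 2097152 + 15 * 1048576 = 22020096 bytes (21MB). */
	assert(lsm_min_cache(2u << 20, 15, 1u << 20) == 22020096);
}
```

With those inputs a 16MB cache would be rejected, while a 100MB cache would pass.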
@@ -273,6 +329,12 @@ __lsm_tree_open(
|
||||
lsm_tree->filename = lsm_tree->name + strlen("lsm:");
|
||||
WT_ERR(__wt_lsm_meta_read(session, lsm_tree));
|
||||
|
||||
/*
|
||||
* Sanity check the configuration. Do it now since this is the first
|
||||
* time we have the LSM tree configuration.
|
||||
*/
|
||||
WT_ERR(__lsm_tree_open_check(session, lsm_tree));
|
||||
|
||||
if (lsm_tree->nchunks == 0)
|
||||
WT_ERR(__wt_lsm_tree_switch(session, lsm_tree));
|
||||
|
||||
@@ -332,7 +394,7 @@ __wt_lsm_tree_switch(
|
||||
WT_VERBOSE_RET(session, lsm,
|
||||
"Tree switch to: %d because %d > %d", lsm_tree->last + 1,
|
||||
(lsm_tree->memsizep == NULL ? 0 : (int)*lsm_tree->memsizep),
|
||||
(int)lsm_tree->threshold);
|
||||
(int)lsm_tree->chunk_size);
|
||||
|
||||
lsm_tree->memsizep = NULL;
|
||||
|
||||
@@ -500,7 +562,7 @@ __wt_lsm_tree_truncate(
|
||||
|
||||
/* Mark all chunks old. */
|
||||
WT_ERR(__wt_lsm_merge_update_tree(
|
||||
session, lsm_tree, lsm_tree->nchunks, &chunk));
|
||||
session, lsm_tree, 0, lsm_tree->nchunks, &chunk));
|
||||
|
||||
/* Create the new chunk. */
|
||||
WT_ERR(__wt_lsm_tree_create_chunk(
|
||||
|
||||
@@ -7,8 +7,8 @@
|
||||
|
||||
#include "wt_internal.h"
|
||||
|
||||
static int
|
||||
__lsm_free_chunks(WT_SESSION_IMPL *session, WT_LSM_TREE *lsm_tree);
|
||||
static int __lsm_free_chunks(WT_SESSION_IMPL *, WT_LSM_TREE *);
|
||||
static int __lsm_copy_chunks(WT_LSM_TREE *, WT_LSM_WORKER_COOKIE *);
|
||||
|
||||
/*
|
||||
* __wt_lsm_worker --
|
||||
@@ -18,78 +18,20 @@ __lsm_free_chunks(WT_SESSION_IMPL *session, WT_LSM_TREE *lsm_tree);
|
||||
void *
|
||||
__wt_lsm_worker(void *arg)
|
||||
{
|
||||
WT_DECL_RET;
|
||||
WT_LSM_CHUNK *chunk, **chunk_array;
|
||||
WT_LSM_TREE *lsm_tree;
|
||||
WT_SESSION_IMPL *session;
|
||||
const char *cfg[] = API_CONF_DEFAULTS(session, checkpoint, NULL);
|
||||
size_t chunk_alloc;
|
||||
int i, nchunks, progress;
|
||||
int progress;
|
||||
|
||||
lsm_tree = arg;
|
||||
session = lsm_tree->worker_session;
|
||||
|
||||
chunk_array = NULL;
|
||||
chunk_alloc = 0;
|
||||
|
||||
while (F_ISSET(lsm_tree, WT_LSM_TREE_WORKING)) {
|
||||
progress = 0;
|
||||
|
||||
__wt_spin_lock(session, &lsm_tree->lock);
|
||||
if (!F_ISSET(lsm_tree, WT_LSM_TREE_WORKING)) {
|
||||
__wt_spin_unlock(session, &lsm_tree->lock);
|
||||
break;
|
||||
}
|
||||
/*
|
||||
* Take a copy of the current state of the LSM tree. Skip
|
||||
* the last chunk - since it is the active one and not relevant
|
||||
* to merge operations.
|
||||
*/
|
||||
for (nchunks = lsm_tree->nchunks - 1;
|
||||
nchunks > 0 && lsm_tree->chunk[nchunks - 1]->ncursor > 0;
|
||||
--nchunks)
|
||||
;
|
||||
if (chunk_alloc < lsm_tree->chunk_alloc)
|
||||
ret = __wt_realloc(session,
|
||||
&chunk_alloc, lsm_tree->chunk_alloc,
|
||||
&chunk_array);
|
||||
if (ret == 0 && nchunks > 0)
|
||||
memcpy(chunk_array, lsm_tree->chunk,
|
||||
nchunks * sizeof(*lsm_tree->chunk));
|
||||
__wt_spin_unlock(session, &lsm_tree->lock);
|
||||
WT_ERR(ret);
|
||||
|
||||
/*
|
||||
* Write checkpoints in all completed files, then find
|
||||
* something to merge.
|
||||
*/
|
||||
for (i = 0; i < nchunks; i++) {
|
||||
chunk = chunk_array[i];
|
||||
if (F_ISSET(chunk, WT_LSM_CHUNK_ONDISK) ||
|
||||
chunk->ncursor > 0)
|
||||
continue;
|
||||
|
||||
/* XXX durability: need to checkpoint the metadata? */
|
||||
/*
|
||||
* NOTE: we pass a non-NULL config, because otherwise
|
||||
* __wt_checkpoint thinks we're closing the file.
|
||||
*/
|
||||
WT_WITH_SCHEMA_LOCK(session, ret =
|
||||
__wt_schema_worker(session, chunk->uri,
|
||||
__wt_checkpoint, cfg, 0));
|
||||
if (ret == 0) {
|
||||
__wt_spin_lock(session, &lsm_tree->lock);
|
||||
F_SET(lsm_tree->chunk[i], WT_LSM_CHUNK_ONDISK);
|
||||
lsm_tree->dsk_gen++;
|
||||
__wt_spin_unlock(session, &lsm_tree->lock);
|
||||
progress = 1;
|
||||
}
|
||||
}
|
||||
|
||||
/* Clear any state from previous worker thread iterations. */
|
||||
session->btree = NULL;
|
||||
|
||||
if (nchunks > 0 && __wt_lsm_major_merge(session, lsm_tree) == 0)
|
||||
if (__wt_lsm_merge(session, lsm_tree) == 0)
|
||||
progress = 1;
|
||||
|
||||
/* Clear any state from previous worker thread iterations. */
|
||||
@@ -103,11 +45,120 @@ __wt_lsm_worker(void *arg)
|
||||
__wt_sleep(0, 10);
|
||||
}
|
||||
|
||||
err: __wt_free(session, chunk_array);
|
||||
return (NULL);
|
||||
}
|
||||
|
||||
/*
|
||||
* __wt_lsm_checkpoint_worker --
|
||||
* A worker thread for an LSM tree, responsible for checkpointing chunks
|
||||
* once they become read only.
|
||||
*/
|
||||
void *
|
||||
__wt_lsm_checkpoint_worker(void *arg)
|
||||
{
|
||||
WT_DECL_RET;
|
||||
WT_LSM_CHUNK *chunk;
|
||||
WT_LSM_TREE *lsm_tree;
|
||||
WT_LSM_WORKER_COOKIE cookie;
|
||||
WT_SESSION_IMPL *session;
|
||||
const char *cfg[] = { "name=,drop=", NULL };
|
||||
int i, j;
|
||||
|
||||
lsm_tree = arg;
|
||||
session = lsm_tree->ckpt_session;
|
||||
|
||||
memset(&cookie, 0, sizeof(cookie));
|
||||
F_SET(&cookie, WT_LSM_WORKER_CHECKPOINT);
|
||||
|
||||
while (F_ISSET(lsm_tree, WT_LSM_TREE_WORKING)) {
|
||||
WT_ERR(__lsm_copy_chunks(lsm_tree, &cookie));
|
||||
|
||||
/* Write checkpoints in all completed files. */
|
||||
for (i = 0, j = 0; i < cookie.nchunks; i++) {
|
||||
chunk = cookie.chunk_array[i];
|
||||
if (F_ISSET(chunk, WT_LSM_CHUNK_ONDISK))
|
||||
continue;
|
||||
|
||||
/*
|
||||
* NOTE: we pass a non-NULL config, because otherwise
|
||||
* __wt_checkpoint thinks we're closing the file.
|
||||
*/
|
||||
WT_WITH_SCHEMA_LOCK(session,
|
||||
ret = __wt_schema_worker(session, chunk->uri,
|
||||
__wt_checkpoint, cfg, 0));
|
||||
if (ret == 0) {
|
||||
++j;
|
||||
__wt_spin_lock(session, &lsm_tree->lock);
|
||||
F_SET(chunk, WT_LSM_CHUNK_ONDISK);
|
||||
lsm_tree->dsk_gen++;
|
||||
__wt_spin_unlock(session, &lsm_tree->lock);
|
||||
WT_VERBOSE_ERR(session, lsm,
|
||||
"LSM worker checkpointed %d.", i);
|
||||
}
|
||||
}
|
||||
if (j == 0)
|
||||
__wt_sleep(0, 10);
|
||||
}
|
||||
err: __wt_free(session, cookie.chunk_array);
|
||||
|
||||
return (NULL);
|
||||
}
|
||||
|
||||
/*
|
||||
* Take a copy of part of the LSM tree chunk array so that we can work on
|
||||
* the contents without holding the LSM tree handle lock long term.
|
||||
*/
|
||||
static int
|
||||
__lsm_copy_chunks(WT_LSM_TREE *lsm_tree, WT_LSM_WORKER_COOKIE *cookie)
|
||||
{
|
||||
WT_DECL_RET;
|
||||
WT_SESSION_IMPL *session;
|
||||
int nchunks;
|
||||
|
||||
/* Always return zero chunks on error. */
|
||||
cookie->nchunks = 0;
|
||||
|
||||
if (F_ISSET(cookie, WT_LSM_WORKER_CHECKPOINT))
|
||||
session = lsm_tree->ckpt_session;
|
||||
else
|
||||
session = lsm_tree->worker_session;
|
||||
|
||||
__wt_spin_lock(session, &lsm_tree->lock);
|
||||
if (!F_ISSET(lsm_tree, WT_LSM_TREE_WORKING)) {
|
||||
__wt_spin_unlock(session, &lsm_tree->lock);
|
||||
/* The actual error value is ignored. */
|
||||
return (WT_ERROR);
|
||||
}
|
||||
/*
|
||||
* Take a copy of the current state of the LSM tree. Skip
|
||||
* the last chunk - since it is the active one and not relevant
|
||||
* to merge operations.
|
||||
*/
|
||||
nchunks = lsm_tree->nchunks - 1;
|
||||
/* Checkpoint doesn't care if there are active cursors, merge does. */
|
||||
if (F_ISSET(cookie, WT_LSM_WORKER_MERGE)) {
|
||||
for (; nchunks > 0 && lsm_tree->chunk[nchunks - 1]->ncursor > 0;
|
||||
--nchunks)
|
||||
;
|
||||
}
|
||||
/*
|
||||
* If the tree array of active chunks is larger than our current buffer,
|
||||
* increase the size of our current buffer to match.
|
||||
*/
|
||||
if (cookie->chunk_alloc < lsm_tree->chunk_alloc)
|
||||
ret = __wt_realloc(session,
|
||||
&cookie->chunk_alloc, lsm_tree->chunk_alloc,
|
||||
&cookie->chunk_array);
|
||||
if (ret == 0 && nchunks > 0)
|
||||
memcpy(cookie->chunk_array, lsm_tree->chunk,
|
||||
nchunks * sizeof(*lsm_tree->chunk));
|
||||
__wt_spin_unlock(session, &lsm_tree->lock);
|
||||
|
||||
if (ret == 0)
|
||||
cookie->nchunks = nchunks;
|
||||
return (ret);
|
||||
}
|
||||
|
||||
static int
|
||||
__lsm_free_chunks(WT_SESSION_IMPL *session, WT_LSM_TREE *lsm_tree)
|
||||
{
|
||||
@@ -137,6 +188,10 @@ __lsm_free_chunks(WT_SESSION_IMPL *session, WT_LSM_TREE *lsm_tree)
|
||||
chunk->bloom_uri = NULL;
|
||||
} else if (ret != EBUSY)
|
||||
goto err;
|
||||
if (ret == EBUSY)
|
||||
WT_VERBOSE_ERR(session, lsm,
|
||||
"LSM worker bloom drop busy: %s.",
|
||||
chunk->bloom_uri);
|
||||
}
|
||||
if (chunk->uri != NULL) {
|
||||
WT_WITH_SCHEMA_LOCK(session, ret =
|
||||
|
||||
@@ -702,8 +702,15 @@ __wt_open_session(WT_CONNECTION_IMPL *conn, int internal,
|
||||
* first time we open this session.
|
||||
*/
|
||||
if (session_ret->hazard == NULL)
|
||||
WT_ERR(__wt_calloc(session, conn->hazard_size,
|
||||
WT_ERR(__wt_calloc(session, conn->hazard_max,
|
||||
sizeof(WT_HAZARD), &session_ret->hazard));
|
||||
/*
|
||||
* Set an initial size for the hazard array. It will be grown as
|
||||
* required up to hazard_max. The hazard_size is reset on close, since
|
||||
* __wt_hazard_close ensures the array is cleared - so it is safe to
|
||||
* reset the starting size on each open.
|
||||
*/
|
||||
session_ret->hazard_size = WT_HAZARD_INCR;
|
||||
|
||||
/*
|
||||
* Public sessions are automatically closed during WT_CONNECTION->close.
|
||||
|
||||
@@ -103,20 +103,29 @@ __wt_session_release_btree(WT_SESSION_IMPL *session)
|
||||
btree = session->btree;
|
||||
|
||||
/*
|
||||
* If we had special flags set, close the handle so that future access
|
||||
* can get a handle without special flags.
|
||||
* If we had no cache flag set, close and free the btree handle. It was
|
||||
* never added to the handle cache.
|
||||
*/
|
||||
if (F_ISSET(btree, WT_BTREE_DISCARD | WT_BTREE_SPECIAL_FLAGS)) {
|
||||
WT_ASSERT(session, F_ISSET(btree, WT_BTREE_EXCLUSIVE));
|
||||
F_CLR(btree, WT_BTREE_DISCARD);
|
||||
if (F_ISSET(btree, WT_BTREE_NO_CACHE))
|
||||
WT_RET(__wt_conn_btree_discard_single(session, btree));
|
||||
else {
|
||||
|
||||
WT_RET(__wt_conn_btree_sync_and_close(session));
|
||||
/*
|
||||
* If we had special flags set, close the handle so that future
|
||||
* access can get a handle without special flags.
|
||||
*/
|
||||
if (F_ISSET(btree, WT_BTREE_DISCARD | WT_BTREE_SPECIAL_FLAGS)) {
|
||||
WT_ASSERT(session, F_ISSET(btree, WT_BTREE_EXCLUSIVE));
|
||||
F_CLR(btree, WT_BTREE_DISCARD);
|
||||
|
||||
WT_RET(__wt_conn_btree_sync_and_close(session));
|
||||
}
|
||||
|
||||
if (F_ISSET(btree, WT_BTREE_EXCLUSIVE))
|
||||
F_CLR(btree, WT_BTREE_EXCLUSIVE);
|
||||
|
||||
__wt_rwunlock(session, btree->rwlock);
|
||||
}
|
||||
|
||||
if (F_ISSET(btree, WT_BTREE_EXCLUSIVE))
|
||||
F_CLR(btree, WT_BTREE_EXCLUSIVE);
|
||||
|
||||
__wt_rwunlock(session, btree->rwlock);
|
||||
session->btree = NULL;
|
||||
|
||||
return (ret);
|
||||
@@ -193,34 +202,41 @@ __wt_session_get_btree(WT_SESSION_IMPL *session,
|
||||
WT_DECL_RET;
|
||||
|
||||
btree = NULL;
|
||||
btree_session = NULL;
|
||||
|
||||
TAILQ_FOREACH(btree_session, &session->btrees, q) {
|
||||
btree = btree_session->btree;
|
||||
if (strcmp(uri, btree->name) != 0)
|
||||
continue;
|
||||
if ((checkpoint == NULL && btree->checkpoint == NULL) ||
|
||||
(checkpoint != NULL && btree->checkpoint != NULL &&
|
||||
strcmp(checkpoint, btree->checkpoint) == 0))
|
||||
break;
|
||||
}
|
||||
|
||||
if (btree_session == NULL)
|
||||
session->btree = NULL;
|
||||
else {
|
||||
session->btree = btree;
|
||||
|
||||
/*
|
||||
* Try and lock the file; if we succeed, our "exclusive" state
|
||||
* must match.
|
||||
*/
|
||||
if ((ret =
|
||||
__wt_session_lock_btree(session, flags)) != WT_NOTFOUND) {
|
||||
WT_ASSERT(session, ret != 0 ||
|
||||
LF_ISSET(WT_BTREE_EXCLUSIVE) ==
|
||||
F_ISSET(session->btree, WT_BTREE_EXCLUSIVE));
|
||||
return (ret);
|
||||
/*
|
||||
* If the no cache flag is set, we never use the handle cache to
|
||||
* store or retrieve the handle.
|
||||
*/
|
||||
if (!LF_ISSET(WT_BTREE_NO_CACHE)) {
|
||||
TAILQ_FOREACH(btree_session, &session->btrees, q) {
|
||||
btree = btree_session->btree;
|
||||
if (strcmp(uri, btree->name) != 0)
|
||||
continue;
|
||||
if ((checkpoint == NULL && btree->checkpoint == NULL) ||
|
||||
(checkpoint != NULL && btree->checkpoint != NULL &&
|
||||
strcmp(checkpoint, btree->checkpoint) == 0))
|
||||
break;
|
||||
}
|
||||
|
||||
if (btree_session == NULL)
|
||||
session->btree = NULL;
|
||||
else {
|
||||
session->btree = btree;
|
||||
|
||||
/*
|
||||
* Try and lock the file; if we succeed, our "exclusive"
|
||||
* state must match.
|
||||
*/
|
||||
if ((ret = __wt_session_lock_btree(
|
||||
session, flags)) != WT_NOTFOUND) {
|
||||
WT_ASSERT(session, ret != 0 ||
|
||||
LF_ISSET(WT_BTREE_EXCLUSIVE) == F_ISSET(
|
||||
session->btree, WT_BTREE_EXCLUSIVE));
|
||||
return (ret);
|
||||
}
|
||||
ret = 0;
|
||||
}
|
||||
ret = 0;
|
||||
}
|
||||
|
||||
/*
|
||||
@@ -231,7 +247,7 @@ __wt_session_get_btree(WT_SESSION_IMPL *session,
|
||||
ret = __wt_conn_btree_get(session, uri, checkpoint, cfg, flags));
|
||||
WT_RET(ret);
|
||||
|
||||
if (btree_session == NULL)
|
||||
if (btree_session == NULL && !LF_ISSET(WT_BTREE_NO_CACHE))
|
||||
WT_RET(__wt_session_add_btree(session, NULL));
|
||||
|
||||
WT_ASSERT(session, LF_ISSET(WT_BTREE_LOCK_ONLY) ||
|
||||
|
||||
@@ -23,11 +23,9 @@ __wt_hazard_set(WT_SESSION_IMPL *session, WT_REF *ref, int *busyp
|
||||
)
|
||||
{
|
||||
WT_BTREE *btree;
|
||||
WT_CONNECTION_IMPL *conn;
|
||||
WT_HAZARD *hp;
|
||||
|
||||
btree = session->btree;
|
||||
conn = S2C(session);
|
||||
*busyp = 0;
|
||||
|
||||
/* If a file can never be evicted, hazard references aren't required. */
|
||||
@@ -48,8 +46,16 @@ __wt_hazard_set(WT_SESSION_IMPL *session, WT_REF *ref, int *busyp
|
||||
* state to WT_REF_LOCKED, then flushes memory and checks the hazard
|
||||
* references).
|
||||
*/
|
||||
for (hp = session->hazard;
|
||||
hp < session->hazard + conn->hazard_size; ++hp) {
|
||||
for (hp = session->hazard; ; ++hp) {
|
||||
/* Expand the number of hazard references if available.*/
|
||||
if (hp >= session->hazard + session->hazard_size) {
|
||||
if (session->hazard_size >= S2C(session)->hazard_max)
|
||||
break;
|
||||
WT_PUBLISH(session->hazard_size,
|
||||
WT_MIN(session->hazard_size + WT_HAZARD_INCR,
|
||||
S2C(session)->hazard_max));
|
||||
}
|
||||
|
||||
if (hp->page != NULL)
|
||||
continue;
|
||||
|
||||
@@ -114,11 +120,9 @@ void
|
||||
__wt_hazard_clear(WT_SESSION_IMPL *session, WT_PAGE *page)
|
||||
{
|
||||
WT_BTREE *btree;
|
||||
WT_CONNECTION_IMPL *conn;
|
||||
WT_HAZARD *hp;
|
||||
|
||||
btree = session->btree;
|
||||
conn = S2C(session);
|
||||
|
||||
/* If a file can never be evicted, hazard references aren't required. */
|
||||
if (F_ISSET(btree, WT_BTREE_NO_HAZARD))
|
||||
@@ -132,7 +136,7 @@ __wt_hazard_clear(WT_SESSION_IMPL *session, WT_PAGE *page)
|
||||
|
||||
/* Clear the caller's hazard pointer. */
|
||||
for (hp = session->hazard;
|
||||
hp < session->hazard + conn->hazard_size; ++hp)
|
||||
hp < session->hazard + session->hazard_size; ++hp)
|
||||
if (hp->page == page) {
|
||||
/*
|
||||
* Check to see if the page has grown too big and force
|
||||
@@ -180,15 +184,12 @@ __wt_hazard_clear(WT_SESSION_IMPL *session, WT_PAGE *page)
|
||||
void
|
||||
__wt_hazard_close(WT_SESSION_IMPL *session)
|
||||
{
|
||||
WT_CONNECTION_IMPL *conn;
|
||||
WT_HAZARD *hp;
|
||||
int found;
|
||||
|
||||
conn = S2C(session);
|
||||
|
||||
/* Check for a set hazard reference and complain if we find one. */
|
||||
for (found = 0, hp = session->hazard;
|
||||
hp < session->hazard + conn->hazard_size; ++hp)
|
||||
hp < session->hazard + session->hazard_size; ++hp)
|
||||
if (hp->page != NULL) {
|
||||
__wt_errx(session,
|
||||
"session %p: hazard reference table not empty: "
|
||||
@@ -212,7 +213,7 @@ __wt_hazard_close(WT_SESSION_IMPL *session)
|
||||
* evicted.
|
||||
*/
|
||||
for (hp = session->hazard;
|
||||
hp < session->hazard + conn->hazard_size; ++hp)
|
||||
hp < session->hazard + session->hazard_size; ++hp)
|
||||
if (hp->page != NULL)
|
||||
__wt_hazard_clear(session, hp->page);
|
||||
|
||||
@@ -230,13 +231,10 @@ __wt_hazard_close(WT_SESSION_IMPL *session)
|
||||
static void
|
||||
__hazard_dump(WT_SESSION_IMPL *session)
|
||||
{
|
||||
WT_CONNECTION_IMPL *conn;
|
||||
WT_HAZARD *hp;
|
||||
|
||||
conn = S2C(session);
|
||||
|
||||
for (hp = session->hazard;
|
||||
hp < session->hazard + conn->hazard_size; ++hp)
|
||||
hp < session->hazard + session->hazard_size; ++hp)
|
||||
if (hp->page != NULL)
|
||||
__wt_errx(session,
|
||||
"session %p: hazard reference %p: %s, line %d",
|
||||
|
||||
@@ -35,13 +35,10 @@ __wt_session_dump_all(WT_SESSION_IMPL *session)
|
||||
void
|
||||
__wt_session_dump(WT_SESSION_IMPL *session)
|
||||
{
|
||||
WT_CONNECTION_IMPL *conn;
|
||||
WT_CURSOR *cursor;
|
||||
WT_HAZARD *hp;
|
||||
int first;
|
||||
|
||||
conn = S2C(session);
|
||||
|
||||
(void)__wt_msg(session, "session: %s%s%p",
|
||||
session->name == NULL ? "" : session->name,
|
||||
session->name == NULL ? "" : " ", session);
|
||||
@@ -55,7 +52,7 @@ __wt_session_dump(WT_SESSION_IMPL *session)
|
||||
|
||||
first = 0;
|
||||
for (hp = session->hazard;
|
||||
hp < session->hazard + conn->hazard_size; ++hp) {
|
||||
hp < session->hazard + session->hazard_size; ++hp) {
|
||||
if (hp->page == NULL)
|
||||
continue;
|
||||
if (++first == 1)
|
||||
|
||||
@@ -146,6 +146,8 @@ __wt_stat_alloc_connection_stats(WT_SESSION_IMPL *session, WT_CONNECTION_STATS *
|
||||
stats->txn_ancient.desc = "ancient transactions";
|
||||
stats->txn_begin.desc = "transactions";
|
||||
stats->txn_commit.desc = "transactions committed";
|
||||
stats->txn_fail_cache.desc =
|
||||
"transaction failures due to cache overflow";
|
||||
stats->txn_rollback.desc = "transactions rolled-back";
|
||||
|
||||
*statsp = stats;
|
||||
@@ -177,5 +179,6 @@ __wt_stat_clear_connection_stats(WT_STATS *stats_arg)
|
||||
stats->txn_ancient.v = 0;
|
||||
stats->txn_begin.v = 0;
|
||||
stats->txn_commit.v = 0;
|
||||
stats->txn_fail_cache.v = 0;
|
||||
stats->txn_rollback.v = 0;
|
||||
}
|
||||
|
||||
@@ -74,11 +74,10 @@ __wt_txn_get_snapshot(WT_SESSION_IMPL *session, wt_txnid_t max_id)
|
||||
conn = S2C(session);
|
||||
txn = &session->txn;
|
||||
txn_global = &conn->txn_global;
|
||||
oldest_snap_min = WT_TXN_ABORTED;
|
||||
|
||||
do {
|
||||
/* Take a copy of the current session ID. */
|
||||
current_id = txn_global->current;
|
||||
current_id = oldest_snap_min = txn_global->current;
|
||||
|
||||
/* Copy the array of concurrent transactions. */
|
||||
WT_ORDERED_READ(session_cnt, conn->session_cnt);
|
||||
@@ -93,6 +92,12 @@ __wt_txn_get_snapshot(WT_SESSION_IMPL *session, wt_txnid_t max_id)
|
||||
else if (max_id == WT_TXN_NONE || TXNID_LT(id, max_id))
|
||||
txn->snapshot[n++] = id;
|
||||
}
|
||||
|
||||
/*
|
||||
* Ensure the snapshot reads are scheduled before re-checking
|
||||
* the global current ID.
|
||||
*/
|
||||
WT_READ_BARRIER();
|
||||
} while (current_id != txn_global->current);
|
||||
|
||||
__txn_sort_snapshot(session, n,
|
||||
@@ -116,11 +121,10 @@ __wt_txn_get_evict_snapshot(WT_SESSION_IMPL *session)
|
||||
|
||||
conn = S2C(session);
|
||||
txn_global = &conn->txn_global;
|
||||
oldest_snap_min = WT_TXN_ABORTED;
|
||||
|
||||
do {
|
||||
/* Take a copy of the current session ID. */
|
||||
current_id = txn_global->current;
|
||||
current_id = oldest_snap_min = txn_global->current;
|
||||
|
||||
/* Walk the array of concurrent transactions. */
|
||||
WT_ORDERED_READ(session_cnt, conn->session_cnt);
|
||||
@@ -128,6 +132,12 @@ __wt_txn_get_evict_snapshot(WT_SESSION_IMPL *session)
|
||||
if ((id = s->snap_min) != WT_TXN_NONE &&
|
||||
TXNID_LT(id, oldest_snap_min))
|
||||
oldest_snap_min = id;
|
||||
|
||||
/*
|
||||
* Ensure the snapshot reads are scheduled before re-checking
|
||||
* the global current ID.
|
||||
*/
|
||||
WT_READ_BARRIER();
|
||||
} while (current_id != txn_global->current);
|
||||
|
||||
__txn_sort_snapshot(session, 0, oldest_snap_min, oldest_snap_min);
|
||||
@@ -169,8 +179,26 @@ __wt_txn_begin(WT_SESSION_IMPL *session, const char *cfg[])
|
||||
F_SET(txn, TXN_RUNNING);
|
||||
|
||||
do {
|
||||
/* Take a copy of the current session ID. */
|
||||
txn->id = txn_global->current;
|
||||
/*
|
||||
* Allocate a transaction ID.
|
||||
*
|
||||
* We use an atomic increment to ensure that we get a unique
|
||||
* ID, then publish that to the global state table.
|
||||
*
|
||||
* If two threads race to allocate an ID, only the latest ID
|
||||
* will proceed. The winning thread can be sure its snapshot
|
||||
* contains all of the earlier active IDs. Threads that race
|
||||
* race and get an earlier ID may not appear in the snapshot,
|
||||
* but they will loop and allocate a new ID before proceeding
|
||||
* to make any updates.
|
||||
*
|
||||
* This potentially wastes transaction IDs when threads race to
|
||||
* begin transactions, but that is the price we pay to keep
|
||||
* this path latch free.
|
||||
*/
|
||||
do {
|
||||
txn->id = WT_ATOMIC_ADD(txn_global->current, 1);
|
||||
} while (txn->id == WT_TXN_NONE || txn->id == WT_TXN_ABORTED);
|
||||
WT_PUBLISH(txn_state->id, txn->id);
|
||||
|
||||
/*
|
||||
@@ -200,8 +228,13 @@ __wt_txn_begin(WT_SESSION_IMPL *session, const char *cfg[])
|
||||
session, n, txn->id, oldest_snap_min);
|
||||
txn_state->snap_min = txn->snap_min;
|
||||
}
|
||||
} while (!WT_ATOMIC_CAS(txn_global->current, txn->id, txn->id + 1) ||
|
||||
txn->id == WT_TXN_NONE || txn->id == WT_TXN_ABORTED);
|
||||
|
||||
/*
|
||||
* Ensure the snapshot reads are scheduled before re-checking
|
||||
* the global current ID.
|
||||
*/
|
||||
WT_READ_BARRIER();
|
||||
} while (txn->id != txn_global->current);
|
||||
|
||||
return (0);
|
||||
}
|
||||
@@ -223,7 +256,8 @@ __wt_txn_release(WT_SESSION_IMPL *session)
|
||||
/* Clear the transaction's ID from the global table. */
|
||||
WT_ASSERT(session, txn_state->id != WT_TXN_NONE &&
|
||||
txn->id != WT_TXN_NONE);
|
||||
txn_state->id = txn_state->snap_min = WT_TXN_NONE;
|
||||
WT_PUBLISH(txn_state->id, WT_TXN_NONE);
|
||||
txn_state->snap_min = WT_TXN_NONE;
|
||||
|
||||
/* Reset the transaction state to not running. */
|
||||
txn->id = WT_TXN_NONE;
|
||||
|
||||
@@ -511,6 +511,25 @@ __wt_checkpoint(WT_SESSION_IMPL *session, const char *cfg[])
|
||||
"or verify operations");
|
||||
}
|
||||
|
||||
/*
|
||||
* If an object has never been used (in other words, if it could become
|
||||
* a bulk-loaded file), then we must fake the checkpoint. This is good
|
||||
* because we don't write physical checkpoint blocks for just-created
|
||||
* files, but it's not just a good idea. The reason is because deleting
|
||||
* a physical checkpoint requires writing the file, and fake checkpoints
|
||||
* can't write the file. If you (1) create a physical checkpoint for an
|
||||
* empty file which writes blocks, (2) start bulk-loading records into
|
||||
* the file, (3) during the bulk-load perform another checkpoint with
|
||||
* the same name; in order to keep from having two checkpoints with the
|
||||
* same name you would have to use the bulk-load's fake checkpoint to
|
||||
* delete a physical checkpoint, and that will end in tears.
|
||||
*/
|
||||
if (is_checkpoint)
|
||||
if (btree->bulk_load_ok) {
|
||||
track_ckpt = 0;
|
||||
goto fake;
|
||||
}
|
||||
|
||||
/*
|
||||
* Mark the root page dirty to ensure something gets written.
|
||||
*
|
||||
|
||||
@@ -1,4 +1,4 @@
|
||||
INCLUDES = -I$(top_builddir) -I$(top_srcdir)/src/include
|
||||
AM_CPPFLAGS = -I$(top_builddir) -I$(top_srcdir)/src/include
|
||||
|
||||
noinst_PROGRAMS = t
|
||||
t_SOURCES = test_bloom.c
|
||||
|
||||
@@ -1,4 +1,4 @@
|
||||
INCLUDES = -I$(top_builddir)
|
||||
AM_CPPFLAGS = -I$(top_builddir)
|
||||
|
||||
noinst_PROGRAMS = t
|
||||
t_LDADD = $(top_builddir)/libwiredtiger.la
|
||||
|
||||
@@ -17,7 +17,7 @@ obj_bulk(void)
|
||||
if ((ret = conn->open_session(conn, NULL, NULL, &session)) != 0)
|
||||
die("conn.session", ret);
|
||||
|
||||
if ((ret = session->create(session, uri, NULL)) != 0)
|
||||
if ((ret = session->create(session, uri, config)) != 0)
|
||||
if (ret != EEXIST && ret != EBUSY)
|
||||
die("session.create", ret);
|
||||
|
||||
@@ -44,7 +44,7 @@ obj_create(void)
|
||||
if ((ret = conn->open_session(conn, NULL, NULL, &session)) != 0)
|
||||
die("conn.session", ret);
|
||||
|
||||
if ((ret = session->create(session, uri, NULL)) != 0)
|
||||
if ((ret = session->create(session, uri, config)) != 0)
|
||||
if (ret != EEXIST && ret != EBUSY)
|
||||
die("session.create", ret);
|
||||
|
||||
|
||||
@@ -10,6 +10,7 @@
|
||||
WT_CONNECTION *conn; /* WiredTiger connection */
|
||||
u_int nops; /* Operations */
|
||||
const char *uri; /* Object */
|
||||
const char *config; /* Object config */
|
||||
|
||||
static char *progname; /* Program name */
|
||||
static FILE *logfp; /* Log file */
|
||||
@@ -28,8 +29,11 @@ main(int argc, char *argv[])
|
||||
u_int nthreads;
|
||||
int ch, cnt, runs;
|
||||
char *config_open;
|
||||
const char **objp;
|
||||
const char **confp, **objp;
|
||||
const char *objs[] = { "file:__wt", "table:__wt", "lsm:__wt", NULL };
|
||||
/* LSM needs configuration or it fails the minimum cache size check. */
|
||||
const char *configs[] = { NULL, NULL,
|
||||
"lsm_chunk_size=1m,lsm_merge_max=2,leaf_page_max=256k", NULL };
|
||||
|
||||
if ((progname = strrchr(argv[0], '/')) == NULL)
|
||||
progname = argv[0];
|
||||
@@ -78,8 +82,10 @@ main(int argc, char *argv[])
|
||||
for (cnt = 1; runs == 0 || cnt <= runs; ++cnt) {
|
||||
shutdown(); /* Clean up previous runs */
|
||||
|
||||
for (objp = objs; *objp != NULL; objp++) {
|
||||
for (objp = objs, confp = configs; *objp != NULL;
|
||||
objp++, confp++) {
|
||||
uri = *objp;
|
||||
config = *confp;
|
||||
printf("%5d: %u threads on %s\n", cnt, nthreads, uri);
|
||||
wt_startup(config_open);
|
||||
if (fop_start(nthreads))
|
||||
@@ -104,15 +110,16 @@ wt_startup(char *config_open)
|
||||
NULL
|
||||
};
|
||||
int ret;
|
||||
char config[128];
|
||||
char config_buf[128];
|
||||
|
||||
snprintf(config, sizeof(config),
|
||||
snprintf(config_buf, sizeof(config_buf),
|
||||
"create,error_prefix=\"%s\",cache_size=5MB%s%s",
|
||||
progname,
|
||||
config_open == NULL ? "" : ",",
|
||||
config_open == NULL ? "" : config_open);
|
||||
|
||||
if ((ret = wiredtiger_open(NULL, &event_handler, config, &conn)) != 0)
|
||||
if ((ret = wiredtiger_open(
|
||||
NULL, &event_handler, config_buf, &conn)) != 0)
|
||||
die("wiredtiger_open", ret);
|
||||
}
|
||||
|
||||
|
||||
@@ -26,6 +26,7 @@ extern WT_CONNECTION *conn; /* WiredTiger connection */
|
||||
extern u_int nops; /* Operations per thread */
|
||||
|
||||
extern const char *uri; /* Object */
|
||||
extern const char *config; /* Object config */
|
||||
|
||||
#if defined (__GNUC__)
|
||||
void die(const char *, int) __attribute__((noreturn));
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
BDB = $(top_builddir)/db
|
||||
INCLUDES = -I$(top_builddir) -I$(BDB)
|
||||
AM_CPPFLAGS = -I$(top_builddir) -I$(BDB)
|
||||
|
||||
noinst_PROGRAMS = t
|
||||
noinst_SCRIPTS = s_dumpcmp
|
||||
|
||||
@@ -75,14 +75,14 @@ wts_open(void)
|
||||
die(ret, "connection.open_session");
|
||||
|
||||
maxintlpage = 1U << g.c_intl_page_max;
|
||||
/* Make sure at least 2 internal page per thread can fix in cache. */
|
||||
/* Make sure at least 2 internal page per thread can fit in cache. */
|
||||
while (2 * g.c_threads * maxintlpage > g.c_cache << 20)
|
||||
maxintlpage >>= 1;
|
||||
maxintlitem = MMRAND(maxintlpage / 50, maxintlpage / 40);
|
||||
if (maxintlitem < 40)
|
||||
maxintlitem = 40;
|
||||
maxleafpage = 1U << g.c_leaf_page_max;
|
||||
/* Make sure at least one leaf page per thread can fix in cache. */
|
||||
/* Make sure at least one leaf page per thread can fit in cache. */
|
||||
while (g.c_threads * (maxintlpage + maxleafpage) > g.c_cache << 20)
|
||||
maxleafpage >>= 1;
|
||||
maxleafitem = MMRAND(maxleafpage / 50, maxleafpage / 40);
|
||||
|
||||
@@ -1,4 +1,4 @@
|
||||
INCLUDES = -I$(top_builddir) -I$(top_srcdir)/src/include
|
||||
AM_CPPFLAGS = -I$(top_builddir) -I$(top_srcdir)/src/include
|
||||
|
||||
noinst_PROGRAMS = t
|
||||
t_SOURCES = salvage.c
|
||||
|
||||
58
test/suite/test_bug003.py
Normal file
@@ -0,0 +1,58 @@
|
||||
#!/usr/bin/env python
|
||||
#
|
||||
# Public Domain 2008-2012 WiredTiger, Inc.
|
||||
#
|
||||
# This is free and unencumbered software released into the public domain.
|
||||
#
|
||||
# Anyone is free to copy, modify, publish, use, compile, sell, or
|
||||
# distribute this software, either in source code form or as a compiled
|
||||
# binary, for any purpose, commercial or non-commercial, and by any
|
||||
# means.
|
||||
#
|
||||
# In jurisdictions that recognize copyright laws, the author or authors
|
||||
# of this software dedicate any and all copyright interest in the
|
||||
# software to the public domain. We make this dedication for the benefit
|
||||
# of the public at large and to the detriment of our heirs and
|
||||
# successors. We intend this dedication to be an overt act of
|
||||
# relinquishment in perpetuity of all present and future rights to this
|
||||
# software under copyright law.
|
||||
#
|
||||
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
|
||||
# EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
|
||||
# MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
|
||||
# IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
|
||||
# OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
|
||||
# ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
|
||||
# OTHER DEALINGS IN THE SOFTWARE.
|
||||
#
|
||||
# test_bug003.py
|
||||
# Regression tests.
|
||||
|
||||
import wiredtiger, wttest
|
||||
from wtscenario import multiply_scenarios, number_scenarios
|
||||
|
||||
# Regression tests.
|
||||
class test_bug003(wttest.WiredTigerTestCase):
|
||||
types = [
|
||||
('file', dict(uri='file:data')),
|
||||
('table', dict(uri='table:data')),
|
||||
]
|
||||
ckpt = [
|
||||
('no', dict(name=0)),
|
||||
('yes', dict(name=1)),
|
||||
]
|
||||
|
||||
scenarios = number_scenarios(multiply_scenarios('.', types, ckpt))
|
||||
|
||||
# Confirm bulk-load isn't stopped by checkpoints.
|
||||
def test_bug003(self):
|
||||
self.session.create(self.uri, "key_format=S,value_format=S")
|
||||
if self.name == 1:
|
||||
self.session.checkpoint("name=ckpt")
|
||||
else:
|
||||
self.session.checkpoint()
|
||||
cursor = self.session.open_cursor(self.uri, None, "bulk")
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
wttest.run()
|
||||
@@ -48,7 +48,7 @@ class test_cursor03(TestCursorTracker):
|
||||
('col.val10k', dict(tablekind='col', keysize=None, valsize=[10, 10000], uri='table')),
|
||||
('row.keyval10k', dict(tablekind='row', keysize=[10,10000], valsize=[10, 10000], uri='table')),
|
||||
], [
|
||||
('count1000', dict(tablecount=1000,cache_size=20*1024*1024)),
|
||||
('count1000', dict(tablecount=1000,cache_size=25*1024*1024)),
|
||||
('count10000', dict(tablecount=10000, cache_size=64*1024*1024))
|
||||
])
|
||||
|
||||
|
||||
@@ -137,14 +137,6 @@ class test_cursor04(wttest.WiredTigerTestCase):
|
||||
cursor.set_key(self.genkey(self.nentries))
|
||||
self.assertEqual(cursor.search(), wiredtiger.WT_NOTFOUND)
|
||||
|
||||
# The key/value should be cleared on NOTFOUND
|
||||
keymsg = 'cursor.get_key: requires key be set: Invalid argument\n'
|
||||
valuemsg = 'cursor.get_value: requires value be set: Invalid argument\n'
|
||||
self.assertRaisesWithMessage(wiredtiger.WiredTigerError,
|
||||
cursor.get_key, keymsg)
|
||||
self.assertRaisesWithMessage(wiredtiger.WiredTigerError,
|
||||
cursor.get_value, valuemsg)
|
||||
|
||||
# 2. Calling search_near for a value beyond the end
|
||||
cursor.set_key(self.genkey(self.nentries))
|
||||
cmp = cursor.search_near()
|
||||
|
||||
69
test/suite/test_nocache.py
Normal file
@@ -0,0 +1,69 @@
|
||||
#!/usr/bin/env python
|
||||
#
|
||||
# Public Domain 2008-2012 WiredTiger, Inc.
|
||||
#
|
||||
# This is free and unencumbered software released into the public domain.
|
||||
#
|
||||
# Anyone is free to copy, modify, publish, use, compile, sell, or
|
||||
# distribute this software, either in source code form or as a compiled
|
||||
# binary, for any purpose, commercial or non-commercial, and by any
|
||||
# means.
|
||||
#
|
||||
# In jurisdictions that recognize copyright laws, the author or authors
|
||||
# of this software dedicate any and all copyright interest in the
|
||||
# software to the public domain. We make this dedication for the benefit
|
||||
# of the public at large and to the detriment of our heirs and
|
||||
# successors. We intend this dedication to be an overt act of
|
||||
# relinquishment in perpetuity of all present and future rights to this
|
||||
# software under copyright law.
|
||||
#
|
||||
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
|
||||
# EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
|
||||
# MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
|
||||
# IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
|
||||
# OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
|
||||
# ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
|
||||
# OTHER DEALINGS IN THE SOFTWARE.
|
||||
|
||||
import wiredtiger, wttest
|
||||
from helper import simple_populate, key_populate, value_populate
|
||||
|
||||
# Test no-cache flag.
|
||||
class test_no_cache(wttest.WiredTigerTestCase):
|
||||
name = 'no_cache'
|
||||
|
||||
scenarios = [
|
||||
('file', dict(type='file:')),
|
||||
('table', dict(type='table:'))
|
||||
]
|
||||
|
||||
# Create an object, and run an uncached cursor through it.
|
||||
def test_no_cache(self):
|
||||
uri = self.type + self.name
|
||||
simple_populate(self, uri, 'key_format=S,leaf_page_max=512', 10000)
|
||||
cursor = self.session.open_cursor(uri, None, "no_cache")
|
||||
i = 0
|
||||
for key,val in cursor:
|
||||
i += 1
|
||||
self.assertEqual(key, key_populate(cursor, i))
|
||||
self.assertEqual(val, value_populate(cursor, i))
|
||||
|
||||
# Create an object, and run an uncached cursor through part of it to
|
||||
# confirm that we release the full stack on an uncached cursor.
|
||||
def test_no_cache_partial(self):
|
||||
uri = self.type + self.name
|
||||
simple_populate(self, uri, 'key_format=S,leaf_page_max=512', 10000)
|
||||
cursor = self.session.open_cursor(uri, None, "no_cache")
|
||||
i = 0
|
||||
for key,val in cursor:
|
||||
i += 1
|
||||
if i > 2000:
|
||||
break;
|
||||
self.assertEqual(key, key_populate(cursor, i))
|
||||
self.assertEqual(val, value_populate(cursor, i))
|
||||
cursor.close()
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
wttest.run()
|
||||
|
||||
@@ -1,4 +1,4 @@
|
||||
INCLUDES = -I$(top_builddir)
|
||||
AM_CPPFLAGS = -I$(top_builddir)
|
||||
|
||||
noinst_PROGRAMS = t
|
||||
t_LDADD = $(top_builddir)/libwiredtiger.la
|
||||
|
||||