Compare commits


84 Commits
1.3.1 ... 1.3.2

Author SHA1 Message Date
Michael Cahill
21d8cc8e5a Cut release 1.3.2. 2012-10-03 22:00:22 +10:00
Keith Bostic
5dcd9fb036 Make __wt_debug_off() work again (need to ignore checksum, and it's no longer
a 0 checksum that's ignored).
2012-10-03 07:09:07 -04:00
Michael Cahill
0c2ce66c42 When checking the cache before an update, use the same logic as for reads: don't wait if holding the schema lock. 2012-10-03 18:23:45 +10:00
Michael Cahill
899d510f98 warning: 'INCLUDES' is the old name for 'AM_CPPFLAGS' 2012-10-03 18:18:43 +10:00
Michael Cahill
71b28462ad spelling 2012-10-03 18:18:41 +10:00
Michael Cahill
25b230ac26 unused #define 2012-10-03 18:18:39 +10:00
Alex Gorrod
7ec2cf5188 Bump the cache size in test_cursor03 so it passes the LSM sanity check. 2012-10-03 17:10:30 +10:00
Alex Gorrod
3531efa5d5 Add sanity check of cache size to LSM open. 2012-10-03 16:48:25 +10:00
Michael Cahill
aad3968aa3 cur_std.c:308:3: error: 'buf' may be used uninitialized in this function 2012-10-03 16:03:43 +10:00
Michael Cahill
7341e2c04c Merge pull request #347 from wiredtiger/cursor-setkey-opt
Save two strcmp calls in fast path cursor set_key. Tidy some flags.
2012-10-02 22:55:44 -07:00
Michael Cahill
b207fd3776 typo 2012-10-03 15:32:22 +10:00
Michael Cahill
5020de4a73 Add a fast path for setting fixed-length column store values. 2012-10-03 15:20:54 +10:00
agorrod
8754293bb2 Merge pull request #346 from wiredtiger/lsm-tuning
LSM tuning
2012-10-02 22:03:42 -07:00
Alex Gorrod
a596d6dd72 Save two strcmp calls in fast path cursor set_key. Tidy some flags. 2012-10-03 14:59:24 +10:00
Michael Cahill
2afc140a8f typo 2012-10-03 14:44:22 +10:00
Michael Cahill
4e5d0ed8d0 Only sleep in the LSM checkpoint thread if no work is done. 2012-10-03 14:37:42 +10:00
Michael Cahill
46dde506be Fix the record count calculation for minor merges.
This was leading to no Bloom filter being created for minor merges after running for some time, leading to merges taking increasingly long to complete.
2012-10-03 14:37:42 +10:00
Michael Cahill
475ed5727f Don't try to write checkpoints in the LSM merge worker thread. Just merge as soon as enough chunks are on disk. 2012-10-03 14:37:42 +10:00
Michael Cahill
2f0a526576 In files marked as "out of cache", don't wait for eviction when reading a page. 2012-10-03 14:37:42 +10:00
Michael Cahill
ffbc1d7385 Pause updates when the cache is full. 2012-10-03 14:37:42 +10:00
Michael Cahill
5bc8353ad8 Take care with the loop termination when walking files for eviction.
We were making one extra call into __wt_tree_walk, which would leave a leaf
page in the WT_REF_EVICT_WALK state, unable to be evicted.  In some workloads,
including LSM loads, we could end up with many files all consisting of a single
leaf page, none of which could be evicted.
2012-10-02 18:05:27 +10:00
Alex Gorrod
d6c9313b9f Fix a crash when opening an LSM tree with an invalid configuration. 2012-10-02 13:45:02 +10:00
Keith Bostic
c47b927b26 Create fake checkpoints until an object is modified, so that a checkpoint
between the cursor create and the bulk load doesn't make it impossible to
do a bulk-load on the cursor.  Closes #338.
2012-09-28 12:53:58 +00:00
Keith Bostic
11348d6db2 whitespace 2012-09-28 07:25:05 +00:00
Alex Gorrod
444e8a9362 Fix a bug in LSM - we were copying too many entries when doing minor
merges.
2012-09-28 17:40:25 +10:00
agorrod
c8a4341d85 Merge pull request #343 from wiredtiger/evict-fail
If no space can be allocated in the cache, abort transactions.
2012-09-28 00:17:44 -07:00
Michael Cahill
abd79f526b Add documentation describing that application-supplied keys and values are not
cleared by failures in operations that expect them as inputs.  Also note that
WT_DEADLOCK can be returned if a resource cannot be allocated (such as when the
cache cannot hold the updates required to satisfy all active readers).
2012-09-28 16:56:14 +10:00
Michael Cahill
1b70668459 Merge pull request #342 from wiredtiger/lsm-metadata-lock2
Have LSM trees use the metadata lock to protect their metadata.
2012-09-27 23:36:05 -07:00
Alex Gorrod
f4eb5f5ae4 Checkpoints required the schema lock, not the metadata lock. 2012-09-28 16:09:49 +10:00
Michael Cahill
760d924220 Calling get_key after a failed search is now permitted. 2012-09-28 15:42:33 +10:00
Michael Cahill
fc0bfe8cf8 Clear the cursor position in next, prev and reset, now that we no longer clear it on error. 2012-09-28 15:33:38 +10:00
Alex Gorrod
106781ae4d Take the metadata lock, not the schema lock, when updating the metadata. 2012-09-28 15:28:13 +10:00
Michael Cahill
dfa2ced70b Never wait for eviction when holding the schema lock.
This avoids deadlocks between opening a column store file and taking a
checkpoint.  The checkpoint blocks eviction before it starts, then waits for
the schema lock.  Opening a column store holds the schema lock and tries to
read the last page of the tree.  If the cache is full, neither can make
progress.

The same deadlock could conceivably happen if the metadata tree grew big enough
to have multiple leaf pages.
2012-09-28 15:01:02 +10:00
Alex Gorrod
0c513f52de Update LSM code to use the new metadata lock when appropriate. 2012-09-28 14:49:02 +10:00
Michael Cahill
560728aea3 Add a read barrier when setting up a snapshot before re-checking the current ID. 2012-09-28 14:09:52 +10:00
Michael Cahill
d3653388d3 Confirm that we can retry a deadlocked transaction with the same key/value pair
in the cursor: no new key or value will be set until the end of a successful operation.
2012-09-28 13:37:56 +10:00
Michael Cahill
83f31ca7cc Merge branch 'develop' into evict-fail 2012-09-28 13:33:57 +10:00
Alex Gorrod
7b69ec27f8 Fix a copy-paste bug in the lsm code. 2012-09-28 09:00:16 +10:00
Keith Bostic
82da24ab69 Simplify the code handling updated records in variable-length column-store
reconciliation.  I can't think of any reason to read-ahead through the
WT_INSERT list looking for visible update records, we can check for record
visibility when we reach the appropriate record.
2012-09-27 11:53:42 +00:00
Keith Bostic
237fa82236 typo 2012-09-27 10:11:44 +00:00
Keith Bostic
aa2d8d54f6 whitespace 2012-09-27 08:29:33 +00:00
Keith Bostic
596dc261cc Quiet some warnings. 2012-09-27 07:45:06 +00:00
Michael Cahill
4b149e5d7e Propagate eviction failures out to __wt_page_in_func:
it now decides whether to give up.

Have auto-commit transactions retry deadlocks.  This requires that we keep the user's key and value in the cursor.
2012-09-27 17:42:28 +10:00
Michael Cahill
8af4f55089 Add a comment explaining how transaction IDs are allocated. 2012-09-27 16:20:42 +10:00
Michael Cahill
50a82caf47 Switch to an atomic add to allocate transaction IDs.
This fixes a subtle race in which two threads could temporarily have the
same ID in the global state table.  If one of the threads timed out and the
other thread committed its transaction with that ID, the commit would not
become visible immediately.  This could lead to deadlock errors in workloads
that are logically conflict-free.
2012-09-27 15:30:12 +10:00
Michael Cahill
830b4408ac Fix an off-by-one error in the check for obsolete transaction IDs.
refs #310
2012-09-27 12:41:02 +10:00
Alex Gorrod
fd19bf22ef Fix the LSM checkpoint thread to work when a merge happens in parallel. 2012-09-27 11:41:07 +10:00
Keith Bostic
00571007ad Forgot to destroy the metadata spinlock. 2012-09-26 20:54:44 -04:00
Alex Gorrod
133df7fe40 Add an LSM_WORKER structure to save passing as many context parameters to
helper functions.
2012-09-27 09:30:19 +10:00
Michael Cahill
e1436289dc Merge pull request #341 from wiredtiger/lsm-checkpoint-thread
Move LSM checkpoints of new chunks into a separate thread.
2012-09-26 15:46:43 -07:00
Alex Gorrod
97ddb19dc5 Bug fix in LSM checkpoint thread - initialize nchunks. 2012-09-26 21:53:42 +10:00
Alex Gorrod
563cb4c1fa Tidy and bug fix LSM checkpoint thread implementation. 2012-09-26 21:27:17 +10:00
Alex Gorrod
3b6b18eb91 Merge branch 'develop' into lsm-checkpoint-thread 2012-09-26 20:46:35 +10:00
agorrod
fcf7d77313 Merge pull request #339 from wiredtiger/lsm-minor-merge
Implement minor merges for LSM trees, prefer them to major merges.
2012-09-26 03:37:19 -07:00
Keith Bostic
27e8b3e403 fix a couple of lint warnings. 2012-09-26 07:58:44 +00:00
Michael Cahill
53fdb96878 Merge branch 'develop' into lsm-minor-merge
Conflicts:
	src/lsm/lsm_merge.c
2012-09-26 17:30:08 +10:00
Michael Cahill
d1a3194794 Merge pull request #337 from wiredtiger/nocache
Add support for cursors that operate outside of cache and have LSM merges use them.
2012-09-26 00:16:27 -07:00
Michael Cahill
95eed0bb99 Make the maximum number of chunks for merges configurable, rather than deriving a value from the number of hazard references available. 2012-09-26 16:59:05 +10:00
Michael Cahill
41e4f7a105 Add a comment explaining the search for a minor merge. 2012-09-26 16:48:44 +10:00
Alex Gorrod
ff48224c38 Merge branch 'develop' into lsm-checkpoint-thread 2012-09-26 16:45:45 +10:00
Michael Cahill
f7af15fc88 Add a comment about keeping the count of eviction failures. 2012-09-26 01:53:38 -04:00
Michael Cahill
4289f0a1bf Maintain the count of eviction failures across the special transaction context for eviction. 2012-09-26 01:50:25 -04:00
Alex Gorrod
57cf5c5693 Fix a bug where verify could crash if an empty checkpoint exists.
Empty checkpoints can be created if an explicit checkpoint is run
while a bulk load is in progress.
2012-09-26 15:28:40 +10:00
agorrod
4620a3c2e5 Merge pull request #340 from wiredtiger/cache-full-deadlock
Abort transactions if the cache is so full that they cannot make progress
2012-09-25 22:05:22 -07:00
Alex Gorrod
30cee0a04e Update btree file count to only include open writable files. 2012-09-26 14:57:31 +10:00
Michael Cahill
96c4d80978 Take care with error returns when the cache is full:
* WT_NOTFOUND is expected if no pages have been queued to evict.
* If an error occurred in a lower-level operation but somehow wasn't
  propagated out, make sure we return WT_DEADLOCK.
2012-09-26 13:53:29 +10:00
Michael Cahill
a12b4b8927 Abort transactions if the cache is so full that they cannot make progress.
This allows WiredTiger to work in caches where the number of pages is small
enough that an old snapshot could block itself from evicting in order to read
new pages.
2012-09-26 11:54:36 +10:00
Alex Gorrod
5eba9718d8 Move hazard_size from the connection to the session. Each session can have a
different count.
Also a couple of bug fixes for growing hazard arrays.
2012-09-26 01:53:42 +00:00
Michael Cahill
ad66a9457f Update the documentation landing page for 1.3.1. 2012-09-26 10:49:11 +10:00
Michael Cahill
eea5629e03 spelling 2012-09-26 10:11:39 +10:00
Michael Cahill
7ec10f1dd8 Implement minor merges for LSM trees, prefer them to major merges. 2012-09-26 09:25:18 +10:00
Michael Cahill
c3f99ef868 Try to avoid test/format configurations that hang in eviction.
Also bump the oldest snap_min when an application thread evicts a page: we're
doing the work to calculate a newer value, we might as well use it.
2012-09-26 01:09:06 +10:00
Alex Gorrod
249b9bb0ca Line wrapping and spelling fixes. 2012-09-25 17:53:43 +10:00
Alex Gorrod
de5376d583 Update hazard references, so the active array grows as needed.
Bump default hazard_max to 1000.
2012-09-25 17:41:32 +10:00
Alex Gorrod
f28613804c Make no_cache configuration undocumented. Update calculation for finding
maximum cached page size.
2012-09-25 12:42:57 +10:00
Michael Cahill
0ae9906cff Added tag 1.3.1 for changeset 945a898eb714 2012-09-25 12:16:47 +10:00
Alex Gorrod
8bc7e8e6f9 Add no cache flag to LSM merge cursors. 2012-09-25 10:23:53 +10:00
Alex Gorrod
94541fdfe4 Merge branch 'develop' into nocache 2012-09-25 09:20:17 +10:00
Alex Gorrod
9c0e333511 Update handle cache to deal with no-cache cursors and btree handles.
Update LSM code to use no-cache handles for merges.
2012-09-24 17:38:58 +10:00
Alex Gorrod
81d884b9c5 Merge branch 'develop' into nocache 2012-09-24 11:53:15 +10:00
Alex Gorrod
13e9ae7627 Merge branch 'develop' into nocache. Manually merged. 2012-09-24 11:52:39 +10:00
Alex
829fbafa85 Implement part of LSM checkpoint worker thread implementation. 2012-09-24 00:44:33 +00:00
Alex Gorrod
c644dab8ea Merge branch 'develop' into nocache 2012-09-20 16:12:28 +10:00
Keith Bostic
74f4280476 Add partial support for no-cache files -- this works with two caveats:
first this code maintains a full stack of tree hazard references, and
so I removed the test against the maxleafpage size in the forced
eviction code to avoid running out of hazard reference sizes; second,
each no-cache cursor blocks all other cursor references to the object,
which isn't going to be OK for real use.
maintain a stack of hazard references
2012-09-17 07:53:01 +00:00
82 changed files with 1056 additions and 460 deletions


@@ -11,3 +11,4 @@ ef844093bec2ac38945fd04487dc3a051f4b9136 1.1.5
 9046bcab74eba90a2cb05af28026ec4a74e4fb9c 1.2.1
 50cb97d00c6238ebef64e290616e8cec9995687f 1.2.2
 ef3ccde04cb28060319be900a2d31c88071933f6 1.3.0
+945a898eb714bb8d46c088928d81b2135eefc18e 1.3.1

59
NEWS

@@ -1,3 +1,62 @@
+WiredTiger release 1.3.2, 2012-10-03
+------------------------------------
+This is a bugfix and performance tuning release, primarily related to LSM
+trees. The changes are as follows:
+* Implement minor merges for LSM trees, and prefer them to major merges.
+* Update hazard references, so the active array grows as needed. Change
+the default hazard_max to 1000.
+* Abort transactions if the cache is so full that they cannot make
+progress.
+* Fix a bug where verify could crash if an empty checkpoint exists.
+* Make the maximum number of chunks for merges configurable, rather than
+deriving a value from the number of hazard references available.
+* Switch to an atomic add to allocate transaction IDs. This fixes a subtle
+race in which two threads could temporarily have the same ID in the
+global state table. If one of the threads timed out and the other thread
+committed its transaction with that ID, the commit would not become
+visible immediately. This could lead to deadlock errors in workloads
+that are logically conflict-free.
+* Have auto-commit transactions retry deadlocks. This requires that we
+keep the user's key and value in the cursor.
+* Simplify the code handling updated records in variable-length
+column-store reconciliation.
+* Never wait for eviction when holding the schema lock. This avoids
+deadlocks between opening a column store file and taking a checkpoint.
+* Take care with the loop termination when walking files for eviction. We
+were making one extra call into __wt_tree_walk, which would leave a leaf
+page in the WT_REF_EVICT_WALK state, unable to be evicted. In some
+workloads, including LSM loads, we could end up with many files all
+consisting of a single leaf page, none of which could be evicted.
+* Pause updates when the cache is full.
+* In files marked as "out of cache", don't wait for eviction when reading a
+page.
+* Fix the record count calculation for minor merges. This was causing
+no Bloom filter to be created for minor merges after running for some
+time, so merges took increasingly long to complete.
+* Only sleep in the LSM checkpoint thread if no work is done.
+* Add a sanity check of the cache size to LSM open.
+* [#338] Create fake checkpoints until an object is modified, so that a
+checkpoint between the cursor create and the bulk load doesn't make
+it impossible to do a bulk-load on the cursor.
 WiredTiger release 1.3.1, 2012-09-25
 ------------------------------------

4
README

@@ -1,6 +1,6 @@
-WiredTiger 1.3.1: (September 25, 2012)
+WiredTiger 1.3.2: (October 3, 2012)
-This is version 1.3.1 of WiredTiger.
+This is version 1.3.2 of WiredTiger.
 WiredTiger documentation can be found at:


@@ -1,6 +1,6 @@
 WIREDTIGER_VERSION_MAJOR=1
 WIREDTIGER_VERSION_MINOR=3
-WIREDTIGER_VERSION_PATCH=1
+WIREDTIGER_VERSION_PATCH=2
 WIREDTIGER_VERSION="$WIREDTIGER_VERSION_MAJOR.$WIREDTIGER_VERSION_MINOR.$WIREDTIGER_VERSION_PATCH"
 WIREDTIGER_RELEASE_DATE=`date "+%B %e, %Y"`


@@ -1,4 +1,4 @@
-INCLUDES = -I$(top_builddir)
+AM_CPPFLAGS = -I$(top_builddir)
 LDADD = $(top_builddir)/libwiredtiger.la
 noinst_PROGRAMS = wttest


@@ -31,7 +31,7 @@ wt_SOURCES =\
 src/utilities/util_write.c
 include_HEADERS= wiredtiger.h
-INCLUDES = -I$(srcdir)/src/include
+AM_CPPFLAGS = -I$(srcdir)/src/include
 pkgconfigdir = $(libdir)/pkgconfig
 pkgconfig_DATA = wiredtiger.pc


@@ -2,8 +2,8 @@ dnl build by dist/s_version
 VERSION_MAJOR=1
 VERSION_MINOR=3
-VERSION_PATCH=1
-VERSION_STRING='"WiredTiger 1.3.1: (September 25, 2012)"'
+VERSION_PATCH=2
+VERSION_STRING='"WiredTiger 1.3.2: (October 3, 2012)"'
 AC_SUBST(VERSION_MAJOR)
 AC_SUBST(VERSION_MINOR)


@@ -1,2 +1,2 @@
 dnl WiredTiger product version for AC_INIT. Maintained by dist/s_version
-1.3.1
+1.3.2

37
dist/api_data.py

@@ -80,15 +80,18 @@ format_meta = column_meta + [
 ]
 lsm_config = [
-Config('lsm_chunk_size', '2MB', r'''
-the maximum size of the in-memory chunk of an LSM tree''',
-min='512K',max='500MB'),
 Config('lsm_bloom_hash_count', '4', r'''
 the number of hash values per item used for LSM bloom filters.''',
-min='2',max='100'),
+min='2', max='100'),
 Config('lsm_bloom_bit_count', '8', r'''
 the number of bits used per item for LSM bloom filters.''',
-min='2',max='1000'),
+min='2', max='1000'),
+Config('lsm_chunk_size', '2MB', r'''
+the maximum size of the in-memory chunk of an LSM tree''',
+min='512K', max='500MB'),
+Config('lsm_merge_max', '15', r'''
+the maximum number of chunks to include in a merge operation''',
+min='2', max='100'),
 ]
 # Per-file configuration
@@ -279,16 +282,16 @@ methods = {
 number key; valid only for cursors with record number keys''',
 type='boolean'),
 Config('bulk', 'false', r'''
-configure the cursor for bulk loads; bulk-load is a fast
-load path for newly created objects and only newly
-created objects may be bulk-loaded. Cursors configured
-for bulk load only support the WT_CURSOR::insert and
-WT_CURSOR::close methods''',
+configure the cursor for bulk loads, a fast load path
+that may only be used for newly created objects. Cursors
+configured for bulk load only support the WT_CURSOR::insert
+and WT_CURSOR::close methods''',
 type='boolean'),
 Config('checkpoint', '', r'''
-the name of a checkpoint to open; the reserved checkpoint
-name "WiredTigerCheckpoint" opens a cursor on the most recent
-internal checkpoint taken for the object'''),
+the name of a checkpoint to open (the reserved name
+"WiredTigerCheckpoint" opens the most recent internal
+checkpoint taken for the object). The cursor does not
+support data modification'''),
 Config('dump', '', r'''
 configure the cursor for dump format inputs and outputs:
 "hex" selects a simple hexadecimal format, "print"
@@ -303,6 +306,10 @@ methods = {
 and WT_CURSOR::close methods. See @ref cursor_random for
 details''',
 type='boolean'),
+Config('no_cache', 'false', r'''
+do not cache pages from the underlying object. The cursor
+does not support data modification''',
+type='boolean', undoc=True),
 Config('overwrite', 'false', r'''
 change the behavior of the cursor's insert method to overwrite
 previously existing values''',
@@ -409,8 +416,8 @@ methods = {
 paths may need quoting, for example,
 <code>extensions=("/path/to/ext.so"="entry=my_entry")</code>''',
 type='list'),
-Config('hazard_max', '30', r'''
-number of simultaneous hazard references per session handle''',
+Config('hazard_max', '1000', r'''
+maximum number of simultaneous hazard references per session handle''',
 min='15'),
 Config('logging', 'false', r'''
 enable logging''',

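Commit de5376d583 lets each session's array of active hazard references grow on demand, with `hazard_max` (now defaulting to 1000, minimum 15) as the ceiling. A minimal sketch of such a capped, growable array follows; `struct hazard_sketch`, the helper names, the initial size of 8 and the doubling policy are assumptions for illustration, not WiredTiger's actual growth strategy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define HAZARD_INITIAL	8	/* Hypothetical first allocation. */

/* Hypothetical per-session hazard reference set. */
struct hazard_sketch {
	void **refs;	/* Slots; NULL means free. */
	size_t size;	/* Slots currently allocated. */
	size_t max;	/* Configured hazard_max. */
};

/* Grow the array (doubling, capped at max); 0 on success, -1 on failure. */
static int
hazard_grow(struct hazard_sketch *hs)
{
	size_t i, newsize;
	void **p;

	assert(hs != NULL);
	if (hs->size >= hs->max)
		return (-1);
	newsize = hs->size == 0 ? HAZARD_INITIAL : hs->size * 2;
	if (newsize > hs->max)
		newsize = hs->max;
	if ((p = realloc(hs->refs, newsize * sizeof(*p))) == NULL)
		return (-1);
	for (i = hs->size; i < newsize; i++)	/* New slots start free. */
		p[i] = NULL;
	hs->refs = p;
	hs->size = newsize;
	return (0);
}

/* Record a hazard reference to page; -1 once hazard_max is exhausted. */
static int
hazard_set(struct hazard_sketch *hs, void *page)
{
	size_t i, slot;

	assert(hs != NULL && page != NULL);
	for (i = 0; i < hs->size; i++)	/* Reuse a free slot if there is one. */
		if (hs->refs[i] == NULL) {
			hs->refs[i] = page;
			return (0);
		}
	slot = hs->size;		/* First slot the grow will add. */
	if (hazard_grow(hs) != 0)
		return (-1);
	hs->refs[slot] = page;
	return (0);
}

/* Release a hazard reference; -1 if page was not held. */
static int
hazard_clear(struct hazard_sketch *hs, void *page)
{
	size_t i;

	assert(hs != NULL);
	for (i = 0; i < hs->size; i++)
		if (hs->refs[i] == page) {
			hs->refs[i] = NULL;
			return (0);
		}
	return (-1);
}
```

Growing lazily keeps the common case (few pages pinned at once) cheap, while the cap still bounds how many pages a single session can pin; the stack of hazard references kept by the no-cache tree walks is one reason the default ceiling was raised.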
3
dist/config.py

@@ -94,6 +94,9 @@ for line in open(f, 'r'):
 if name == lastname:
 continue
 lastname = name
+if 'undoc' in c.flags:
+continue
 desc = textwrap.dedent(c.desc) + '.'
 desc = desc.replace(',', '\\,')
 default = '\\c ' + str(c.default) if c.default or gettype(c) == 'int' \

1
dist/s_define.list

@@ -13,6 +13,7 @@ SIZE_CHECK
 TXN_API_CALL
 TXN_API_CALL_NOCONF
 TXN_API_END
+TXNID_LE
 WT_BARRIER
 WT_BLOCK_DESC_SIZE
 WT_DEBUG_BYTE

4
dist/s_string.ok

@@ -247,6 +247,7 @@ btcur
 btdsk
 btmem
 btree
+btrees
 buf
 builtin
 bytelock
@@ -262,6 +263,8 @@ cd
 cfg
 cfkos
 checkfrag
+checkpointed
+checkpointing
 checksum
 checksums
 chk
@@ -688,6 +691,7 @@ uninstantiated
 unix
 unjams
 unlinked
+unmerged
 unmodify
 unpackv
 unreferenced

1
dist/stat_data.py

@@ -40,6 +40,7 @@ connection_stats = [
 Stat('txn_ancient', 'ancient transactions'),
 Stat('txn_begin', 'transactions'),
 Stat('txn_commit', 'transactions committed'),
+Stat('txn_fail_cache', 'transaction failures due to cache overflow'),
 Stat('txn_rollback', 'transactions rolled-back'),
 ]


@@ -1,4 +1,4 @@
-INCLUDES = -I$(top_builddir) -I$(top_srcdir)/src/include
+AM_CPPFLAGS = -I$(top_builddir) -I$(top_srcdir)/src/include
 lib_LTLIBRARIES = reverse_collator.la
 reverse_collator_la_LDFLAGS = -avoid-version -module


@@ -1,4 +1,4 @@
-INCLUDES = -I$(top_builddir) -I$(top_srcdir)/src/include
+AM_CPPFLAGS = -I$(top_builddir) -I$(top_srcdir)/src/include
 lib_LTLIBRARIES = bzip2_compress.la
 bzip2_compress_la_LDFLAGS = -avoid-version -module


@@ -1,4 +1,4 @@
-INCLUDES = -I$(top_builddir) -I$(top_srcdir)/src/include
+AM_CPPFLAGS = -I$(top_builddir) -I$(top_srcdir)/src/include
 lib_LTLIBRARIES = nop_compress.la
 nop_compress_la_LDFLAGS = -avoid-version -module


@@ -1,4 +1,4 @@
-INCLUDES = -I$(top_builddir) -I$(top_srcdir)/src/include
+AM_CPPFLAGS = -I$(top_builddir) -I$(top_srcdir)/src/include
 lib_LTLIBRARIES = snappy_compress.la
 snappy_compress_la_LDFLAGS = -avoid-version -module


@@ -1,4 +1,4 @@
-INCLUDES = -I$(abs_top_builddir)
+AM_CPPFLAGS = -I$(abs_top_builddir)
 PYSRC = $(top_srcdir)/lang/python
 if DEBUG


@@ -115,6 +115,9 @@ __verify_start_filesize(WT_SESSION_IMPL *session,
 */
 file_size = 0;
 WT_CKPT_FOREACH(ckptbase, ckpt) {
+/* Skip empty checkpoints. */
+if (ckpt->raw.size == 0)
+continue;
 WT_RET(__wt_block_buffer_to_ckpt(
 session, block, ckpt->raw.data, ci));
 if (ci->file_size > file_size)

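The `__verify_start_filesize` change skips checkpoints whose raw cookie is empty; commit 57cf5c5693 notes that such checkpoints can be created when an explicit checkpoint runs while a bulk load is in progress. A self-contained sketch of the same loop follows; `struct ckpt_sketch` and `start_filesize` are hypothetical, and the file size is assumed to be already decoded rather than read through `__wt_block_buffer_to_ckpt`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, pre-decoded checkpoint entry. */
struct ckpt_sketch {
	size_t raw_size;	/* Cookie length; 0 for an empty checkpoint. */
	uint64_t file_size;	/* File size recorded by the checkpoint. */
};

/*
 * start_filesize --
 *	Return the largest file size recorded by any non-empty checkpoint.
 *	Empty checkpoints have no cookie to decode, so they are skipped
 *	instead of being read.
 */
static uint64_t
start_filesize(const struct ckpt_sketch *ckpts, size_t n)
{
	uint64_t file_size;
	size_t i;

	assert(ckpts != NULL || n == 0);
	file_size = 0;
	for (i = 0; i < n; i++) {
		if (ckpts[i].raw_size == 0)	/* Skip empty checkpoints. */
			continue;
		if (ckpts[i].file_size > file_size)
			file_size = ckpts[i].file_size;
	}
	return (file_size);
}
```

In the sketch the empty entry carries a deliberately large `file_size` to show that it is ignored; in the real code there is simply nothing valid to decode.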

@@ -22,6 +22,10 @@ __wt_bulk_init(WT_CURSOR_BULK *cbulk)
 session = (WT_SESSION_IMPL *)cbulk->cbt.iface.session;
 btree = session->btree;
+/*
+* Bulk-load is only permitted on newly created files, not any empty
+* file -- see the checkpoint code for a discussion.
+*/
 if (!btree->bulk_load_ok)
 WT_RET_MSG(session, EINVAL,
 "bulk-load is only possible for newly created trees");


@@ -403,6 +403,7 @@ __wt_btcur_next(WT_CURSOR_BTREE *cbt, int discard)
 LF_SET(WT_TREE_DISCARD);
 __cursor_func_init(cbt, 0);
+__cursor_position_clear(cbt);
 /*
 * If we aren't already iterating in the right direction, there's
@@ -507,6 +508,7 @@ __wt_btcur_next_random(WT_CURSOR_BTREE *cbt)
 WT_BSTAT_INCR(session, cursor_read_next);
 __cursor_func_init(cbt, 1);
+__cursor_position_clear(cbt);
 /*
 * Only supports row-store: applications can trivially select a random


@@ -491,6 +491,7 @@ __wt_btcur_prev(WT_CURSOR_BTREE *cbt, int discard)
 LF_SET(WT_TREE_DISCARD);
 __cursor_func_init(cbt, 0);
+__cursor_position_clear(cbt);
 /*
 * If we aren't already iterating in the right direction, there's


@@ -112,6 +112,7 @@ __wt_btcur_reset(WT_CURSOR_BTREE *cbt)
 __cursor_leave(cbt);
 __cursor_search_clear(cbt);
+__cursor_position_clear(cbt);
 return (0);
 }


@@ -208,12 +208,15 @@ int
 __wt_debug_off(
 WT_SESSION_IMPL *session, uint32_t offset, uint32_t size, const char *ofile)
 {
+WT_BTREE *btree;
 WT_DECL_ITEM(buf);
 WT_DECL_RET;
+btree = session->btree;
 WT_RET(__wt_scr_alloc(session, size, &buf));
-WT_ERR(__wt_block_read_off(
-session, session->btree->block, buf, offset, size, 0));
+WT_ERR(__wt_block_read_off(session,
+btree->block, buf, offset, size, WT_BLOCK_CHECKSUM_NOT_SET));
 ret = __wt_debug_disk(session, buf->mem, ofile);
 err: __wt_scr_free(&buf);


@@ -428,9 +428,13 @@ __evict_page(WT_SESSION_IMPL *session, WT_PAGE *page)
WT_RET(__wt_txn_init(session));
__wt_txn_get_evict_snapshot(session);
saved_txn.oldest_snap_min = txn->oldest_snap_min;
txn->isolation = TXN_ISO_READ_COMMITTED;
ret = __wt_rec_evict(session, page, 0);
/* Keep count of any failures. */
saved_txn.eviction_fails = txn->eviction_fails;
if (was_running) {
WT_ASSERT(session, txn->snapshot == NULL ||
txn->snapshot != saved_txn.snapshot);
@@ -720,7 +724,8 @@ __evict_walk(WT_SESSION_IMPL *session)
 * get some pages from each underlying file. In practice, a realloc
 * is rarely needed, so it is worth avoiding the LRU lock.
 */
-elem = WT_EVICT_WALK_BASE + (conn->btqcnt * WT_EVICT_WALK_PER_TABLE);
+elem = WT_EVICT_WALK_BASE +
+(conn->open_btree_count * WT_EVICT_WALK_PER_TABLE);
 if (elem > cache->evict_entries) {
 __wt_spin_lock(session, &cache->evict_lock);
 /* Save the offset of the eviction point. */
@@ -792,16 +797,22 @@ __evict_walk_file(WT_SESSION_IMPL *session, u_int *slotp)
 end = cache->evict + cache->evict_entries;
 /*
-* Get the next WT_EVICT_WALK_PER_TABLE entries.
-*
-* We can't evict the page just returned to us, it marks our place in
-* the tree. So, always stay one page ahead of the page being returned.
+* Get some more eviction candidate pages.
 */
 for (evict = start, restarts = 0;
-evict < end && restarts <= 1 && ret == 0;
+evict < end && ret == 0;
 ret = __wt_tree_walk(session, &btree->evict_page, WT_TREE_EVICT)) {
 if ((page = btree->evict_page) == NULL) {
-++restarts;
+/*
+* Take care with terminating this loop.
+*
+* Don't make an extra call to __wt_tree_walk: that
+* will leave a page in the WT_REF_EVICT_WALK state,
+* unable to be evicted, which may prevent any work
+* from being done.
+*/
+if (++restarts == 2)
+break;
 continue;
 }
@@ -935,6 +946,7 @@ int
 __wt_evict_lru_page(WT_SESSION_IMPL *session, int is_app)
 {
 WT_BTREE *btree, *saved_btree;
+WT_DECL_RET;
 WT_PAGE *page;
 __evict_get_page(session, is_app, &btree, &page);
@@ -947,19 +959,14 @@ __wt_evict_lru_page(WT_SESSION_IMPL *session, int is_app)
 saved_btree = session->btree;
 WT_SET_BTREE_IN_SESSION(session, btree);
-/*
-* We don't care why eviction failed (maybe the page was dirty and
-* we're out of disk space, or the page had an in-memory subtree
-* already being evicted).
-*/
-(void)__evict_page(session, page);
+ret = __evict_page(session, page);
 (void)WT_ATOMIC_SUB(btree->lru_count, 1);
 WT_CLEAR_BTREE_IN_SESSION(session);
 session->btree = saved_btree;
-return (0);
+return (ret);
 }
 /*


@@ -7,7 +7,7 @@
 #include "wt_internal.h"
-static int __btree_conf(WT_SESSION_IMPL *);
+static int __btree_conf(WT_SESSION_IMPL *, const char *[]);
 static int __btree_get_last_recno(WT_SESSION_IMPL *);
 static int __btree_page_sizes(WT_SESSION_IMPL *, const char *);
 static int __btree_tree_open_empty(WT_SESSION_IMPL *, int);
@@ -54,16 +54,11 @@ __wt_btree_open(WT_SESSION_IMPL *session,
 WT_CLEAR(dsk);
 /* Initialize and configure the WT_BTREE structure. */
-WT_ERR(__btree_conf(session));
+WT_ERR(__btree_conf(session, cfg));
 /*
 * Bulk-load is only permitted on newly created files, not any empty
-* file. The reason is because deleting a checkpoint requires writing
-* the file, and a fake checkpoint can't write the file. So, if you
-* have a named checkpoint in the file, then, because tree is empty,
-* you start bulk-loading it, then you enter another checkpoint with
-* the same name, you end up using a fake checkpoint to delete a real
-* checkpoint, and that's going to end in tears.
+* file -- see the checkpoint code for a discussion.
 */
 created = addr == NULL || addr_size == 0;
 if (!created && F_ISSET(btree, WT_BTREE_BULK))
@@ -72,11 +67,11 @@ __wt_btree_open(WT_SESSION_IMPL *session,
 /* Handle salvage configuration. */
 forced_salvage = 0;
-if (F_ISSET(btree, WT_BTREE_SALVAGE)) {
+if (F_ISSET(btree, WT_BTREE_SALVAGE) && cfg != NULL) {
 ret = __wt_config_gets(session, cfg, "force", &cval);
 if (ret != 0 && ret != WT_NOTFOUND)
 WT_ERR(ret);
-if (cval.val != 0)
+if (ret == 0 && cval.val != 0)
 forced_salvage = 1;
 }
@@ -160,11 +155,12 @@ __wt_btree_close(WT_SESSION_IMPL *session)
 * Configure a WT_BTREE structure.
 */
 static int
-__btree_conf(WT_SESSION_IMPL *session)
+__btree_conf(WT_SESSION_IMPL *session, const char *cfg[])
 {
 WT_BTREE *btree;
 WT_CONFIG_ITEM cval;
 WT_CONNECTION_IMPL *conn;
+WT_DECL_RET;
 WT_NAMED_COLLATOR *ncoll;
 uint32_t bitcnt;
 int fixed;
@@ -188,8 +184,7 @@ __btree_conf(WT_SESSION_IMPL *session)
 /* Row-store key comparison and key gap for prefix compression. */
 if (btree->type == BTREE_ROW) {
-WT_RET(__wt_config_getones(
-session, config, "collator", &cval));
+WT_RET(__wt_config_getones(session, config, "collator", &cval));
 if (cval.len > 0) {
 TAILQ_FOREACH(ncoll, &conn->collqh, q) {
 if (WT_STRING_MATCH(
@@ -235,23 +230,32 @@ __btree_conf(WT_SESSION_IMPL *session)
 F_CLR(btree, WT_BTREE_NO_EVICTION);
 }
+/* No-cache files are never evicted or cached. */
+if (cfg != NULL) {
+ret = __wt_config_gets(session, cfg, "no_cache", &cval);
+if (ret != 0 && ret != WT_NOTFOUND)
+WT_RET(ret);
+if (ret == 0 && cval.val != 0)
+F_SET(session->btree, WT_BTREE_NO_CACHE |
+WT_BTREE_NO_EVICTION | WT_BTREE_NO_HAZARD);
+}
 /* Huffman encoding */
 WT_RET(__wt_btree_huffman_open(session, config));
 /* Reconciliation configuration. */
-WT_RET(__wt_config_getones(
-session, btree->config, "dictionary", &cval));
+WT_RET(__wt_config_getones(session, config, "dictionary", &cval));
 btree->dictionary = (u_int)cval.val;
 WT_RET(__wt_config_getones(
-session, btree->config, "internal_key_truncate", &cval));
+session, config, "internal_key_truncate", &cval));
 btree->internal_key_truncate = cval.val == 0 ? 0 : 1;
-WT_RET(__wt_config_getones(
-session, btree->config, "prefix_compression", &cval));
+WT_RET(
+__wt_config_getones(session, config, "prefix_compression", &cval));
 btree->prefix_compression = cval.val == 0 ? 0 : 1;
-WT_RET(__wt_config_getones(session, btree->config, "split_pct", &cval));
+WT_RET(__wt_config_getones(session, config, "split_pct", &cval));
 btree->split_pct = (u_int)cval.val;
 WT_RET(__wt_stat_alloc_btree_stats(session, &btree->stats));
@@ -476,7 +480,7 @@ __btree_get_last_recno(WT_SESSION_IMPL *session)
 return (WT_NOTFOUND);
 btree->last_recno = __col_last_recno(page);
-__wt_page_release(session, page);
+__wt_stack_release(session, page);
 return (0);
 }


@@ -42,9 +42,20 @@ __wt_page_in_func(
case WT_REF_DISK:
case WT_REF_DELETED:
/* The page isn't in memory, attempt to read it. */
/* Check if there is space in the cache. */
__wt_eviction_check(session, &read_lockout, wake);
wake = 0;
if (read_lockout)
/*
* If the cache is full, give up, but only if we are
* not holding the schema lock. The schema lock can
* block checkpoints, and thus eviction, so it is not
* safe to wait for eviction if we are holding it.
*/
if (read_lockout &&
!F_ISSET(session, WT_SESSION_SCHEMA_LOCKED) &&
!F_ISSET(session->btree, WT_BTREE_NO_CACHE))
break;
WT_RET(__wt_cache_read(session, parent, ref));
@@ -103,13 +114,13 @@ __wt_page_in_func(
 WT_ILLEGAL_VALUE(session);
 }
-/*
-* Find a page to evict -- if that fails, we don't care why,
-* but we may need to wake the eviction server again if the
-* cache is still full.
-*/
-if (__wt_evict_lru_page(session, 1) != 0)
+/* Find a page to evict -- if the page is busy, keep trying. */
+if ((ret = __wt_evict_lru_page(session, 1)) == EBUSY)
+__wt_yield();
+else if (ret == WT_NOTFOUND)
 wake = 1;
+else
+WT_RET(ret);
 }
 }

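After this change, `__wt_page_in_func` backs off a full cache only when it is safe to wait, and it reacts to the result of `__wt_evict_lru_page` instead of ignoring it. The two decisions are restated below as pure functions. The enum names, the function names and `SKETCH_NOTFOUND` (a hypothetical stand-in for `WT_NOTFOUND`) are illustrative; the real function loops, reads pages and wakes the eviction server rather than returning an action.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define SKETCH_NOTFOUND	(-2)	/* Hypothetical stand-in for WT_NOTFOUND. */

enum read_action {
	READ_PAGE,		/* Go ahead and read the page into cache. */
	EVICT_FIRST		/* Help evict a page, then try again. */
};

enum evict_next {
	EVICT_RETRY,		/* A page was evicted: loop again. */
	EVICT_YIELD,		/* Page busy: yield the CPU, then retry. */
	EVICT_WAKE_SERVER,	/* Nothing queued: wake the eviction server. */
	EVICT_FAIL		/* Any other error goes back to the caller. */
};

/*
 * Back off a full cache only if the session does not hold the schema
 * lock (which can block checkpoints, and so eviction) and the file is
 * not a no-cache file.
 */
static enum read_action
page_read_action(bool read_lockout, bool schema_locked, bool no_cache)
{
	if (read_lockout && !schema_locked && !no_cache)
		return (EVICT_FIRST);
	return (READ_PAGE);
}

/* Map the return of an LRU eviction attempt to the next step. */
static enum evict_next
after_evict_attempt(int ret)
{
	if (ret == 0)
		return (EVICT_RETRY);
	if (ret == EBUSY)
		return (EVICT_YIELD);
	if (ret == SKETCH_NOTFOUND)
		return (EVICT_WAKE_SERVER);
	return (EVICT_FAIL);
}
```

Separating "is it safe to wait?" from "what did eviction report?" mirrors the two hunks above: the first avoids the schema-lock deadlock described in commit dfa2ced70b, the second stops silently discarding eviction errors such as the `WT_DEADLOCK` raised once a transaction has seen too many eviction failures.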

@@ -148,7 +148,6 @@ int
 __wt_tree_walk(WT_SESSION_IMPL *session, WT_PAGE **pagep, uint32_t flags)
 {
 WT_BTREE *btree;
-WT_DECL_RET;
 WT_PAGE *page, *t;
 WT_REF *ref;
 uint32_t slot;
@@ -184,30 +183,14 @@ __wt_tree_walk(WT_SESSION_IMPL *session, WT_PAGE **pagep, uint32_t flags)
 t = page->parent;
 slot = (uint32_t)(page->ref - t->u.intl.t);
-/*
-* Swap our hazard reference for the hazard reference of our parent,
-* if it's not the root page (we could access it directly because we
-* know it's in memory, but we need a hazard reference). Don't leave
-* a hazard reference dangling on error.
-*
-* We're hazard-reference coupling up the tree and that's OK: first,
-* hazard references can't deadlock, so there's none of the usual
-* problems found when logically locking up a Btree; second, we don't
-* release our current hazard reference until we have our parent's
-* hazard reference. If the eviction thread tries to evict the active
-* page, that fails because of our hazard reference. If eviction tries
-* to evict our parent, that fails because the parent has a child page
-* that can't be discarded.
-*/
+/* If not the eviction thread, release the page's hazard reference. */
 if (eviction) {
 if (page->ref->state == WT_REF_EVICT_WALK)
 page->ref->state = WT_REF_MEM;
-} else {
-if (!WT_PAGE_IS_ROOT(t))
-ret = __wt_page_in(session, t, t->ref);
+} else
 __wt_page_release(session, page);
-WT_RET(ret);
-}
 /* Switch to the parent. */
 page = t;
 /*
@@ -283,14 +266,7 @@ descend: for (;;) {
 break;
 }
-/*
-* Swap hazard references at each level (but
-* don't leave a hazard reference dangling on
-* error).
-*/
-ret = __wt_page_in(session, page, ref);
-__wt_page_release(session, page);
-WT_RET(ret);
+WT_RET(__wt_page_in(session, page, ref));
 }
 page = ref->page;

@@ -68,9 +68,8 @@ __wt_col_search(WT_SESSION_IMPL *session, WT_CURSOR_BTREE *cbt, int is_modify)
ref = page->u.intl.t + (base - 1);
}
/* Swap the parent page for the child page. */
/* Move to the child page. */
WT_ERR(__wt_page_in(session, page, ref));
__wt_page_release(session, page);
page = ref->page;
}
@@ -159,6 +158,6 @@ past_end:
F_SET(cbt, WT_CBT_MAX_RECORD);
return (0);
err: __wt_page_release(session, page);
err: __wt_stack_release(session, page);
return (ret);
}

@@ -238,8 +238,11 @@ __rec_review(WT_SESSION_IMPL *session,
{
WT_DECL_RET;
WT_PAGE_MODIFY *mod;
WT_TXN *txn;
uint32_t i;
txn = &session->txn;
/*
* Get exclusive access to the page if our caller doesn't have the tree
* locked down.
@@ -327,9 +330,17 @@ __rec_review(WT_SESSION_IMPL *session,
WT_VERBOSE_RET(session, evict,
"page %p written but not clean", page);
if (F_ISSET(txn, TXN_RUNNING) &&
++txn->eviction_fails >= 100) {
txn->eviction_fails = 0;
ret = WT_DEADLOCK;
WT_STAT_INCR(
S2C(session)->stats, txn_fail_cache);
}
/*
* If there is only a single cursor open, there are no
* consistency issues: try to bump our snapshot.
* If there aren't multiple cursors active, there
* are no consistency issues: try to bump our snapshot.
*/
if (session->ncursors <= 1) {
__wt_txn_read_last(session);
@@ -347,6 +358,8 @@ __rec_review(WT_SESSION_IMPL *session,
}
}
WT_RET(ret);
txn->eviction_fails = 0;
}
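When a running transaction keeps a dirty page from being evicted, the reconciliation code now counts the failures in `txn->eviction_fails`. Once the count reaches 100, it resets the counter, increments the `txn_fail_cache` statistic and fails with WT_DEADLOCK. The diff also resets the counter once the page review succeeds. The following is a standalone sketch of that bounded-failure counter. `SKETCH_DEADLOCK`, the struct and the function names are hypothetical, and the per-transaction `fail_cache_stat` field stands in for the connection-wide statistic.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_DEADLOCK			(-1)	/* hypothetical WT_DEADLOCK stand-in */
#define SKETCH_MAX_EVICTION_FAILS	100	/* the limit used in the diff */

/* Hypothetical transaction state; not WiredTiger's WT_TXN. */
struct sketch_txn {
	bool running;			/* models TXN_RUNNING */
	unsigned int eviction_fails;	/* failed evictions since last success */
	unsigned long fail_cache_stat;	/* models the txn_fail_cache statistic */
};

/*
 * Record a failed eviction. Only running transactions count failures;
 * after the limit is reached, reset the counter and report a deadlock so
 * the caller can roll back instead of waiting forever.
 */
static int
sketch_eviction_failed(struct sketch_txn *txn)
{
	if (txn->running &&
	    ++txn->eviction_fails >= SKETCH_MAX_EVICTION_FAILS) {
		txn->eviction_fails = 0;
		++txn->fail_cache_stat;
		return (SKETCH_DEADLOCK);
	}
	return (0);
}

/* A successful review means progress was made: start counting again. */
static void
sketch_eviction_succeeded(struct sketch_txn *txn)
{
	txn->eviction_fails = 0;
}
```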
/*

@@ -1965,7 +1965,7 @@ __rec_col_var(WT_SESSION_IMPL *session,
WT_INSERT *ins;
WT_INSERT_HEAD *append;
WT_ITEM *last;
WT_UPDATE *next_upd, *upd;
WT_UPDATE *upd;
uint64_t n, nrepeat, repeat_count, rle, slvg_missing, src_recno;
uint32_t i, size;
int deleted, last_deleted, orig_deleted, update_no_copy;
@@ -2016,21 +2016,13 @@ __rec_col_var(WT_SESSION_IMPL *session,
WT_COL_FOREACH(page, cip, i) {
ovfl_state = OVFL_IGNORE;
if ((cell = WT_COL_PTR(page, cip)) == NULL) {
ins = NULL;
nrepeat = 1;
ins = NULL;
orig_deleted = 1;
} else {
__wt_cell_unpack(cell, unpack);
nrepeat = __wt_cell_rle(unpack);
ins = WT_SKIP_FIRST(WT_COL_UPDATE(page, cip));
while (ins != NULL) {
WT_ERR(
__rec_txn_read(session, r, ins->upd, &upd));
if (upd != NULL)
break;
ins = WT_SKIP_NEXT(ins);
}
/*
* If the original value is "deleted", there's no value
@@ -2090,19 +2082,13 @@ record_loop: /*
*/
for (n = 0;
n < nrepeat; n += repeat_count, src_recno += repeat_count) {
if (ins != NULL &&
WT_INSERT_RECNO(ins) == src_recno) {
upd = NULL;
if (ins != NULL && WT_INSERT_RECNO(ins) == src_recno) {
WT_ERR(
__rec_txn_read(session, r, ins->upd, &upd));
WT_ASSERT(session, upd != NULL);
do {
ins = WT_SKIP_NEXT(ins);
if (ins == NULL)
break;
WT_ERR(__rec_txn_read(
session, r, ins->upd, &next_upd));
} while (next_upd == NULL);
ins = WT_SKIP_NEXT(ins);
}
if (upd != NULL) {
update_no_copy = 1; /* No data copy */
repeat_count = 1;

@@ -269,7 +269,9 @@ err: __wt_session_serialize_wrapup(session, page, ret);
int
__wt_update_check(WT_SESSION_IMPL *session, WT_PAGE *page, WT_UPDATE *next)
{
WT_DECL_RET;
WT_TXN *txn;
int lockout, wake = 1;
/* Discard obsolete WT_UPDATE structures. */
if (next != NULL)
@@ -278,6 +280,22 @@ __wt_update_check(WT_SESSION_IMPL *session, WT_PAGE *page, WT_UPDATE *next)
/* Before allocating anything, make sure this update is permitted. */
WT_RET(__wt_txn_update_check(session, next));
/*
* Pause if the cache is full.
* This matches the logic in __wt_page_in_func.
*/
for (;;) {
__wt_eviction_check(session, &lockout, wake);
wake = 0;
if (!lockout ||
F_ISSET(session, WT_SESSION_SCHEMA_LOCKED))
break;
if ((ret = __wt_evict_lru_page(session, 1)) == EBUSY)
__wt_yield();
else
WT_RET_NOTFOUND_OK(ret);
}
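The update path now pauses when the cache is full and follows the same rule as page reads. It never waits while holding the schema lock, because that lock can block checkpoints and therefore eviction. It yields and retries when the chosen page is busy (EBUSY), and it tolerates WT_NOTFOUND. The following is a minimal single-threaded sketch of that loop. The toy cache, the `sketch_*` names, `SKETCH_NOTFOUND` and the `max_tries` bound are hypothetical. The bound exists only so the example terminates on its own; the diff's loop has no bound and relies on the eviction server to make progress.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for WT_NOTFOUND; the value is arbitrary. */
#define SKETCH_NOTFOUND	(-31803)

/* A toy cache with hypothetical fields, not WiredTiger's WT_CACHE. */
struct sketch_cache {
	unsigned long bytes_inuse;	/* bytes currently cached */
	unsigned long bytes_max;	/* configured cache size */
	int busy_pages;			/* evictions that report EBUSY first */
	int evictable_pages;		/* pages that can be evicted */
	unsigned long page_bytes;	/* bytes freed per eviction */
};

struct sketch_session {
	bool schema_locked;		/* models WT_SESSION_SCHEMA_LOCKED */
	struct sketch_cache *cache;
	int yields;			/* times we yielded on EBUSY */
};

/* Models the lockout result of __wt_eviction_check. */
static bool
sketch_cache_full(const struct sketch_cache *c)
{
	return (c->bytes_inuse > c->bytes_max);
}

/* Models __wt_evict_lru_page: EBUSY, "nothing found" or success. */
static int
sketch_evict_lru_page(struct sketch_cache *c)
{
	if (c->busy_pages > 0) {
		--c->busy_pages;
		return (EBUSY);
	}
	if (c->evictable_pages == 0)
		return (SKETCH_NOTFOUND);
	--c->evictable_pages;
	c->bytes_inuse -= c->page_bytes;
	return (0);
}

/*
 * Wait for cache space before an update. Returns 0 when the caller may
 * proceed, or EAGAIN after max_tries attempts (a bound this sketch adds).
 */
static int
sketch_update_check(struct sketch_session *s, int max_tries)
{
	int ret, tries;

	for (tries = 0; tries < max_tries; ++tries) {
		/* Proceed if there is room, or if waiting could deadlock. */
		if (!sketch_cache_full(s->cache) || s->schema_locked)
			return (0);
		if ((ret = sketch_evict_lru_page(s->cache)) == EBUSY)
			++s->yields;	/* the diff calls __wt_yield() */
		else if (ret != 0 && ret != SKETCH_NOTFOUND)
			return (ret);	/* like WT_RET_NOTFOUND_OK */
	}
	return (EAGAIN);
}
```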
/*
* Record the transaction ID for the first update to a page.
* We don't care if this races: there is a buffer built into the

@@ -153,9 +153,8 @@ __wt_row_search(WT_SESSION_IMPL *session, WT_CURSOR_BTREE *cbt, int is_modify)
if (cmp != 0)
ref = page->u.intl.t + (base - 1);
/* Swap the parent page for the child page. */
/* Move to the child page. */
WT_ERR(__wt_page_in(session, page, ref));
__wt_page_release(session, page);
page = ref->page;
}
@@ -243,7 +242,7 @@ __wt_row_search(WT_SESSION_IMPL *session, WT_CURSOR_BTREE *cbt, int is_modify)
WT_ERR(__wt_search_insert(session, cbt, cbt->ins_head, srch_key));
return (0);
err: __wt_page_release(session, page);
err: __wt_stack_release(session, page);
return (ret);
}
@@ -270,7 +269,6 @@ __wt_row_random(WT_SESSION_IMPL *session, WT_CURSOR_BTREE *cbt)
/* Swap the parent page for the child page. */
WT_ERR(__wt_page_in(session, page, ref));
__wt_page_release(session, page);
page = ref->page;
}
@@ -311,6 +309,6 @@ __wt_row_random(WT_SESSION_IMPL *session, WT_CURSOR_BTREE *cbt)
return (0);
err: __wt_page_release(session, page);
err: __wt_stack_release(session, page);
return (ret);
}

@@ -113,8 +113,8 @@ __wt_confdfl_file_meta =
"huffman_value=,internal_item_max=0,internal_key_truncate=,"
"internal_page_max=2KB,key_format=u,key_gap=10,leaf_item_max=0,"
"leaf_page_max=1MB,lsm_bloom_bit_count=8,lsm_bloom_hash_count=4,"
"lsm_chunk_size=2MB,prefix_compression=,split_pct=75,type=btree,"
"value_format=u,version=(major=0,minor=0)";
"lsm_chunk_size=2MB,lsm_merge_max=15,prefix_compression=,split_pct=75"
",type=btree,value_format=u,version=(major=0,minor=0)";
WT_CONFIG_CHECK
__wt_confchk_file_meta[] = {
@@ -138,6 +138,7 @@ __wt_confchk_file_meta[] = {
{ "lsm_bloom_bit_count", "int", "min=2,max=1000" },
{ "lsm_bloom_hash_count", "int", "min=2,max=100" },
{ "lsm_chunk_size", "int", "min=512K,max=500MB" },
{ "lsm_merge_max", "int", "min=2,max=100" },
{ "prefix_compression", "boolean", NULL },
{ "split_pct", "int", "min=25,max=100" },
{ "type", "string", "choices=[\"btree\"]" },
@@ -213,8 +214,8 @@ __wt_confdfl_session_create =
"internal_key_truncate=,internal_page_max=2KB,key_format=u,"
"key_format=u,key_gap=10,leaf_item_max=0,leaf_page_max=1MB,"
"lsm_bloom_bit_count=8,lsm_bloom_hash_count=4,lsm_chunk_size=2MB,"
"prefix_compression=,split_pct=75,type=btree,value_format=u,"
"value_format=u";
"lsm_merge_max=15,prefix_compression=,split_pct=75,type=btree,"
"value_format=u,value_format=u";
WT_CONFIG_CHECK
__wt_confchk_session_create[] = {
@@ -242,6 +243,7 @@ __wt_confchk_session_create[] = {
{ "lsm_bloom_bit_count", "int", "min=2,max=1000" },
{ "lsm_bloom_hash_count", "int", "min=2,max=100" },
{ "lsm_chunk_size", "int", "min=512K,max=500MB" },
{ "lsm_merge_max", "int", "min=2,max=100" },
{ "prefix_compression", "boolean", NULL },
{ "split_pct", "int", "min=25,max=100" },
{ "type", "string", "choices=[\"btree\"]" },
@@ -280,8 +282,8 @@ __wt_confchk_session_log_printf[] = {
const char *
__wt_confdfl_session_open_cursor =
"append=0,bulk=0,checkpoint=,dump=,next_random=0,overwrite=0,raw=0,"
"statistics=0,statistics_clear=0,target=";
"append=0,bulk=0,checkpoint=,dump=,next_random=0,no_cache=0,"
"overwrite=0,raw=0,statistics=0,statistics_clear=0,target=";
WT_CONFIG_CHECK
__wt_confchk_session_open_cursor[] = {
@@ -290,6 +292,7 @@ __wt_confchk_session_open_cursor[] = {
{ "checkpoint", "string", NULL },
{ "dump", "string", "choices=[\"hex\",\"print\"]" },
{ "next_random", "boolean", NULL },
{ "no_cache", "boolean", NULL },
{ "overwrite", "boolean", NULL },
{ "raw", "boolean", NULL },
{ "statistics", "boolean", NULL },
@@ -381,7 +384,7 @@ const char *
__wt_confdfl_wiredtiger_open =
"buffer_alignment=-1,cache_size=100MB,create=0,direct_io=,"
"error_prefix=,eviction_target=80,eviction_trigger=95,extensions=,"
"hazard_max=30,logging=0,multiprocess=0,session_max=50,sync=,"
"hazard_max=1000,logging=0,multiprocess=0,session_max=50,sync=,"
"transactional=,use_environment_priv=0,verbose=";
WT_CONFIG_CHECK

@@ -845,7 +845,7 @@ wiredtiger_open(const char *home, WT_EVENT_HANDLER *event_handler,
WT_ERR(__conn_config_env(session, cfg));
WT_ERR(__wt_config_gets(session, cfg, "hazard_max", &cval));
conn->hazard_size = (uint32_t)cval.val;
conn->hazard_max = (uint32_t)cval.val;
WT_ERR(__wt_config_gets(session, cfg, "session_max", &cval));
conn->session_size = (uint32_t)cval.val + WT_NUM_INTERNAL_SESSIONS;
WT_ERR(__wt_config_gets(session, cfg, "sync", &cval));

@@ -93,15 +93,18 @@ __conn_btree_get(WT_SESSION_IMPL *session,
WT_ASSERT(session, F_ISSET(session, WT_SESSION_SCHEMA_LOCKED));
/* Increment the reference count if we already have the btree open. */
TAILQ_FOREACH(btree, &conn->btqh, q)
if (strcmp(name, btree->name) == 0 &&
((ckpt == NULL && btree->checkpoint == NULL) ||
(ckpt != NULL && btree->checkpoint != NULL &&
strcmp(ckpt, btree->checkpoint) == 0))) {
++btree->refcnt;
session->btree = btree;
return (__conn_btree_open_lock(session, flags));
if (!LF_ISSET(WT_BTREE_NO_CACHE)) {
TAILQ_FOREACH(btree, &conn->btqh, q) {
if (strcmp(name, btree->name) == 0 &&
((ckpt == NULL && btree->checkpoint == NULL) ||
(ckpt != NULL && btree->checkpoint != NULL &&
strcmp(ckpt, btree->checkpoint) == 0))) {
++btree->refcnt;
session->btree = btree;
return (__conn_btree_open_lock(session, flags));
}
}
}
/*
* Allocate the WT_BTREE structure, its lock, and set the name so we
@@ -118,10 +121,11 @@ __conn_btree_get(WT_SESSION_IMPL *session,
__wt_writelock(session, btree->rwlock);
F_SET(btree, WT_BTREE_EXCLUSIVE);
/* Add to the connection list. */
btree->refcnt = 1;
TAILQ_INSERT_TAIL(&conn->btqh, btree, q);
++conn->btqcnt;
if (!LF_ISSET(WT_BTREE_NO_CACHE)) {
/* Add to the connection list. */
btree->refcnt = 1;
TAILQ_INSERT_TAIL(&conn->btqh, btree, q);
}
}
if (ret == 0)
@@ -153,6 +157,9 @@ __wt_conn_btree_sync_and_close(WT_SESSION_IMPL *session)
if (!F_ISSET(btree, WT_BTREE_OPEN))
return (0);
if (btree->checkpoint == NULL)
--S2C(session)->open_btree_count;
/*
* Checkpoint to flush out the file's changes. This usually happens on
* session handle close (which means we're holding the handle lock, so
@@ -230,6 +237,12 @@ __conn_btree_open(WT_SESSION_IMPL *session,
WT_ERR(__wt_btree_open(session, addr->data, addr->size, cfg,
btree->checkpoint == NULL ? 0 : 1));
F_SET(btree, WT_BTREE_OPEN);
/*
* Checkpoint handles are read only, so eviction calculations
* based on the number of btrees are better to ignore them.
*/
if (btree->checkpoint == NULL)
++S2C(session)->open_btree_count;
/* Drop back to a readlock if that is all that was needed. */
if (!LF_ISSET(WT_BTREE_EXCLUSIVE)) {
@@ -483,11 +496,11 @@ err: WT_CLEAR_BTREE_IN_SESSION(session);
}
/*
* __conn_btree_discard --
* __wt_conn_btree_discard_single --
* Discard a single btree file handle structure.
*/
static int
__conn_btree_discard(WT_SESSION_IMPL *session, WT_BTREE *btree)
int
__wt_conn_btree_discard_single(WT_SESSION_IMPL *session, WT_BTREE *btree)
{
WT_DECL_RET;
@@ -535,8 +548,7 @@ restart:
continue;
TAILQ_REMOVE(&conn->btqh, btree, q);
--conn->btqcnt;
WT_TRET(__conn_btree_discard(session, btree));
WT_TRET(__wt_conn_btree_discard_single(session, btree));
goto restart;
}
@@ -552,8 +564,7 @@ restart:
/* Close the metadata file handle. */
while ((btree = TAILQ_FIRST(&conn->btqh)) != NULL) {
TAILQ_REMOVE(&conn->btqh, btree, q);
--conn->btqcnt;
WT_TRET(__conn_btree_discard(session, btree));
WT_TRET(__wt_conn_btree_discard_single(session, btree));
}
return (ret);

@@ -73,8 +73,9 @@ __wt_connection_destroy(WT_CONNECTION_IMPL *conn)
__wt_spin_destroy(session, &conn->api_lock);
__wt_spin_destroy(session, &conn->fh_lock);
__wt_spin_destroy(session, &conn->serial_lock);
__wt_spin_destroy(session, &conn->metadata_lock);
__wt_spin_destroy(session, &conn->schema_lock);
__wt_spin_destroy(session, &conn->serial_lock);
/* Free allocated memory. */
__wt_free(session, conn->home);

@@ -336,6 +336,17 @@ __wt_curfile_create(WT_SESSION_IMPL *session,
if (bulk)
WT_ERR(__wt_curbulk_init((WT_CURSOR_BULK *)cbt));
/*
* no_cache
* No cache cursors are read-only.
*/
WT_ERR(__wt_config_gets_defno(session, cfg, "no_cache", &cval));
if (cval.val != 0) {
cursor->insert = __wt_cursor_notsup;
cursor->update = __wt_cursor_notsup;
cursor->remove = __wt_cursor_notsup;
}
/*
* random_retrieval
* Random retrieval cursors only support next, reset and close.
@@ -368,20 +379,30 @@ __wt_curfile_open(WT_SESSION_IMPL *session, const char *uri,
{
WT_CONFIG_ITEM cval;
WT_DECL_RET;
int bulk;
uint32_t flags;
/*
* Bulk and no cache handles are exclusive and may not be used by more
* than a single thread.
* Additionally set the discard flag on no cache handles so they are
* destroyed on close.
*/
flags = 0;
WT_RET(__wt_config_gets_defno(session, cfg, "bulk", &cval));
bulk = (cval.val != 0);
if (cval.val != 0)
LF_SET(WT_BTREE_EXCLUSIVE | WT_BTREE_BULK);
WT_RET(__wt_config_gets_defno(session, cfg, "no_cache", &cval));
if (cval.val != 0)
LF_SET(WT_BTREE_EXCLUSIVE | WT_BTREE_NO_CACHE);
/* TODO: handle projections. */
/* Get the handle and lock it while the cursor is using it. */
if (WT_PREFIX_MATCH(uri, "colgroup:") || WT_PREFIX_MATCH(uri, "index:"))
WT_RET(__wt_schema_get_btree(session, uri, strlen(uri), cfg,
bulk ? WT_BTREE_BULK | WT_BTREE_EXCLUSIVE : 0));
WT_RET(__wt_schema_get_btree(
session, uri, strlen(uri), cfg, flags));
else if (WT_PREFIX_MATCH(uri, "file:"))
WT_RET(__wt_session_get_btree_ckpt(session, uri, cfg,
bulk ? WT_BTREE_BULK | WT_BTREE_EXCLUSIVE : 0));
WT_RET(__wt_session_get_btree_ckpt(session, uri, cfg, flags));
else
WT_RET(__wt_bad_object_type(session, uri));

@@ -170,8 +170,7 @@ __wt_cursor_get_keyv(WT_CURSOR *cursor, uint32_t flags, va_list ap)
*va_arg(ap, uint64_t *) = cursor->recno;
} else {
fmt = cursor->key_format;
if (LF_ISSET(
WT_CURSTD_DUMP_HEX | WT_CURSTD_DUMP_PRINT | WT_CURSTD_RAW))
if (LF_ISSET(WT_CURSOR_RAW_OK))
fmt = "u";
ret = __wt_struct_unpackv(
session, cursor->key.data, cursor->key.size, fmt, ap);
@@ -212,17 +211,14 @@ __wt_cursor_set_keyv(WT_CURSOR *cursor, uint32_t flags, va_list ap)
sz = sizeof(cursor->recno);
} else {
fmt = cursor->key_format;
if (LF_ISSET(
WT_CURSTD_DUMP_HEX | WT_CURSTD_DUMP_PRINT | WT_CURSTD_RAW))
fmt = "u";
if (strcmp(fmt, "S") == 0) {
str = va_arg(ap, const char *);
sz = strlen(str) + 1;
cursor->key.data = (void *)str;
} else if (strcmp(fmt, "u") == 0) {
if (LF_ISSET(WT_CURSOR_RAW_OK) || strcmp(fmt, "u") == 0) {
item = va_arg(ap, WT_ITEM *);
sz = item->size;
cursor->key.data = (void *)item->data;
} else if (strcmp(fmt, "S") == 0) {
str = va_arg(ap, const char *);
sz = strlen(str) + 1;
cursor->key.data = (void *)str;
} else {
buf = &cursor->key;
@@ -269,9 +265,7 @@ __wt_cursor_get_value(WT_CURSOR *cursor, ...)
WT_CURSOR_NEEDVALUE(cursor);
va_start(ap, cursor);
fmt = F_ISSET(cursor,
WT_CURSTD_DUMP_HEX | WT_CURSTD_DUMP_PRINT | WT_CURSTD_RAW) ?
"u" : cursor->value_format;
fmt = F_ISSET(cursor, WT_CURSOR_RAW_OK) ? "u" : cursor->value_format;
ret = __wt_struct_unpackv(session,
cursor->value.data, cursor->value.size, fmt, ap);
va_end(ap);
@@ -297,38 +291,42 @@ __wt_cursor_set_value(WT_CURSOR *cursor, ...)
CURSOR_API_CALL(cursor, session, set_value, NULL);
va_start(ap, cursor);
fmt = F_ISSET(cursor,
WT_CURSTD_DUMP_HEX | WT_CURSTD_DUMP_PRINT | WT_CURSTD_RAW) ?
"u" : cursor->value_format;
/* Fast path some common cases: single strings or byte arrays. */
fmt = F_ISSET(cursor, WT_CURSOR_RAW_OK) ? "u" : cursor->value_format;
/* Fast path some common cases: single strings, byte arrays and bits. */
if (strcmp(fmt, "S") == 0) {
str = va_arg(ap, const char *);
sz = strlen(str) + 1;
cursor->value.data = str;
} else if (strcmp(fmt, "u") == 0) {
} else if (F_ISSET(cursor, WT_CURSOR_RAW_OK) || strcmp(fmt, "u") == 0) {
item = va_arg(ap, WT_ITEM *);
sz = item->size;
cursor->value.data = item->data;
} else {
} else if (strcmp(fmt, "t") == 0 ||
(isdigit(fmt[0]) && strcmp(fmt + 1, "t"))) {
sz = 1;
buf = &cursor->value;
ret = __wt_struct_sizev(session, &sz, cursor->value_format, ap);
WT_ERR(__wt_buf_initsize(session, buf, sz));
*(uint8_t *)buf->mem = (uint8_t)va_arg(ap, int);
} else {
WT_ERR(
__wt_struct_sizev(session, &sz, cursor->value_format, ap));
va_end(ap);
WT_ERR(ret);
va_start(ap, cursor);
if ((ret = __wt_buf_initsize(session, buf, sz)) != 0 ||
(ret = __wt_struct_packv(session, buf->mem, sz,
cursor->value_format, ap)) != 0) {
cursor->saved_err = ret;
F_CLR(cursor, WT_CURSTD_VALUE_SET);
goto err;
}
cursor->value.data = buf->mem;
buf = &cursor->value;
WT_ERR(__wt_buf_initsize(session, buf, sz));
WT_ERR(__wt_struct_packv(session, buf->mem, sz,
cursor->value_format, ap));
}
F_SET(cursor, WT_CURSTD_VALUE_SET);
cursor->value.size = WT_STORE_SIZE(sz);
va_end(ap);
err: API_END(session);
if (0) {
err: cursor->saved_err = ret;
F_CLR(cursor, WT_CURSTD_VALUE_SET);
}
va_end(ap);
API_END(session);
}
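The new `set_value` fast path special-cases a value format that is a single bit field (`t`, or a width digit followed by `t`). In that case it sets the size to one byte and stores the value directly instead of calling the general sizing and packing routines. The following is a minimal sketch of that format test and one-byte store. It assumes, as in the diff, that only a single leading digit is accepted, and the `sketch_*` names are hypothetical. Because `strcmp` returns 0 when its arguments match, both comparisons test for 0.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Return true if fmt describes a single bit-field value: "t", or one
 * decimal digit giving the field width followed by "t" (such as "5t").
 */
static bool
sketch_is_bitfield_format(const char *fmt)
{
	if (strcmp(fmt, "t") == 0)
		return (true);
	/* strcmp returns 0 on a match, so compare against 0 here too. */
	return (isdigit((unsigned char)fmt[0]) && strcmp(fmt + 1, "t") == 0);
}

/* Store one bit-field value in a single byte, as the fast path does. */
static size_t
sketch_pack_bitfield(uint8_t *buf, int value)
{
	buf[0] = (uint8_t)value;
	return (1);
}
```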
/*

@@ -77,8 +77,7 @@ __wt_curtable_get_value(WT_CURSOR *cursor, ...)
WT_CURSOR_NEEDVALUE(primary);
va_start(ap, cursor);
if (F_ISSET(cursor,
WT_CURSTD_DUMP_HEX | WT_CURSTD_DUMP_PRINT | WT_CURSTD_RAW)) {
if (F_ISSET(cursor, WT_CURSOR_RAW_OK)) {
ret = __wt_schema_project_merge(session,
ctable->cg_cursors, ctable->plan,
cursor->value_format, &cursor->value);
@@ -147,8 +146,7 @@ __wt_curtable_set_value(WT_CURSOR *cursor, ...)
CURSOR_API_CALL(cursor, session, set_value, NULL);
va_start(ap, cursor);
if (F_ISSET(cursor,
WT_CURSTD_DUMP_HEX | WT_CURSTD_DUMP_PRINT | WT_CURSTD_RAW)) {
if (F_ISSET(cursor, WT_CURSOR_RAW_OK)) {
item = va_arg(ap, WT_ITEM *);
cursor->value.data = item->data;
cursor->value.size = item->size;

@@ -101,8 +101,14 @@ To remove existing data using a cursor, use the WT_CURSOR::remove method:
@section cursor_error Cursor position after error
After any cursor handle method failure, the cursor's position is
undetermined. Applications that cannot re-position the cursor after
failure must duplicate the cursor before calling a cursor method that will
undetermined. For cursor operations that expect a key to be set before the
operation begins (including WT_CURSOR::search, WT_CURSOR::insert,
WT_CURSOR::update and WT_CURSOR::remove), the application's key and value
will not be cleared by an error.
Applications that cannot re-position the cursor after failure must
duplicate the cursor by calling WT_SESSION::open_cursor and passing the
cursor as the \c to_dup parameter before calling a cursor method that will
attempt to re-position the cursor.
*/

@@ -13,7 +13,7 @@ To ask questions or discuss issues related to using WiredTiger, visit our
View the documentation online:
- <a href="1.3.0/index.html"><b>WiredTiger 1.3.0 (current release)</b></a>
- <a href="1.3.2/index.html"><b>WiredTiger 1.3.2 (current release)</b></a>
- <a href="1.2.2/index.html"><b>WiredTiger 1.2.2</b></a>
- <a href="1.1.5/index.html"><b>WiredTiger 1.1.5</b></a>

@@ -72,6 +72,11 @@ updating the same value will fail with ::WT_DEADLOCK. Some applications
may benefit from application-level synchronization to avoid repeated
attempts to rollback and update the same value.
Operations in transactions may also fail with the ::WT_DEADLOCK error if
some resource cannot be allocated after repeated attempts. For example, if
the cache is not large enough to hold the updates required to satisfy
transactional readers, an operation may fail and return ::WT_DEADLOCK.
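An application that sees ::WT_DEADLOCK typically rolls the transaction back and tries again. The following is a minimal sketch of a bounded retry loop, written without the WiredTiger library so it runs standalone. `SKETCH_DEADLOCK`, the `sketch_*` functions and the retry limit are hypothetical. The comments mark where WT_SESSION::begin_transaction, WT_SESSION::commit_transaction and WT_SESSION::rollback_transaction would be called.

```c
#include <assert.h>

#define SKETCH_DEADLOCK	(-1)	/* hypothetical stand-in for WT_DEADLOCK */

/* A transactional operation: returns 0, SKETCH_DEADLOCK or another error. */
typedef int (*sketch_txn_fn)(void *arg);

/*
 * Run op in a transaction, rolling back and retrying when it fails with
 * SKETCH_DEADLOCK, for at most max_attempts attempts.
 */
static int
sketch_run_with_retry(sketch_txn_fn op, void *arg, int max_attempts)
{
	int attempt, ret;

	ret = SKETCH_DEADLOCK;
	for (attempt = 0; attempt < max_attempts; ++attempt) {
		/* session->begin_transaction would be called here. */
		ret = op(arg);
		if (ret == 0)
			return (0);	/* session->commit_transaction here */
		/* session->rollback_transaction would be called here. */
		if (ret != SKETCH_DEADLOCK)
			return (ret);	/* other errors are not retried */
	}
	return (ret);
}

/* Test helper: fails with SKETCH_DEADLOCK a fixed number of times. */
struct sketch_flaky {
	int failures_left;
	int calls;
};

static int
sketch_flaky_op(void *arg)
{
	struct sketch_flaky *f = arg;

	++f->calls;
	if (f->failures_left > 0) {
		--f->failures_left;
		return (SKETCH_DEADLOCK);
	}
	return (0);
}
```

Bounding the retries matters for the cache-full case described above: if the cache is too small for the transaction's updates, retrying indefinitely cannot succeed.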
@section transaction_isolation Isolation levels
WiredTiger supports <code>read-uncommitted</code>,

@@ -130,6 +130,7 @@ struct __wt_session_impl {
* easily call a function to clear memory up to, but not including, the
* hazard reference.
*/
uint32_t hazard_size; /* Count of used hazard references */
u_int nhazard;
#define WT_SESSION_CLEAR(s) memset(s, 0, WT_PTRDIFF(&(s)->hazard, s))
WT_HAZARD *hazard; /* Hazard reference array */
@@ -213,7 +214,7 @@ struct __wt_connection_impl {
/* Locked: library list */
TAILQ_HEAD(__wt_dlh_qh, __wt_dlh) dlhqh;
u_int btqcnt; /* Locked: btree count */
u_int open_btree_count; /* Locked: open writable btree count */
u_int next_file_id; /* Locked: file ID counter */
/*
@@ -235,7 +236,7 @@ struct __wt_connection_impl {
* WiredTiger allocates space for a fixed number of hazard references
* in each thread of control.
*/
uint32_t hazard_size; /* Hazard array size */
uint32_t hazard_max; /* Hazard array size */
WT_CACHE *cache; /* Page cache */
uint64_t cache_size;
@@ -329,13 +330,19 @@ struct __wt_connection_impl {
__wt_txn_read_first(session); \
} \
ret = __wt_txn_commit((s), NULL); \
} else \
} else { \
(void)__wt_txn_rollback((s), NULL); \
if (ret == 0 || ret == WT_DEADLOCK) { \
ret = 0; \
continue; \
} \
} \
} else if ((ret) != 0 && \
(ret) != WT_NOTFOUND && \
(ret) != WT_DUPLICATE_KEY) \
F_SET(&(s)->txn, TXN_ERROR); \
} while (0)
break; \
} while (1)
/*
* If a session or connection method is about to return WT_NOTFOUND (some

@@ -125,18 +125,20 @@ struct __wt_btree {
#define WT_BTREE_DISCARD 0x0002 /* Discard on release */
#define WT_BTREE_EXCLUSIVE 0x0004 /* Need exclusive access to handle */
#define WT_BTREE_LOCK_ONLY 0x0008 /* Handle is only needed for locking */
#define WT_BTREE_NO_EVICTION 0x0010 /* Disable eviction */
#define WT_BTREE_NO_HAZARD 0x0020 /* Disable hazard references */
#define WT_BTREE_OPEN 0x0040 /* Handle is open */
#define WT_BTREE_SALVAGE 0x0080 /* Handle is for salvage */
#define WT_BTREE_UPGRADE 0x0100 /* Handle is for upgrade */
#define WT_BTREE_VERIFY 0x0200 /* Handle is for verify */
#define WT_BTREE_NO_CACHE 0x0010 /* Disable caching */
#define WT_BTREE_NO_EVICTION 0x0020 /* Disable eviction */
#define WT_BTREE_NO_HAZARD 0x0040 /* Disable hazard references */
#define WT_BTREE_OPEN 0x0080 /* Handle is open */
#define WT_BTREE_SALVAGE 0x0100 /* Handle is for salvage */
#define WT_BTREE_UPGRADE 0x0200 /* Handle is for upgrade */
#define WT_BTREE_VERIFY 0x0400 /* Handle is for verify */
uint32_t flags;
};
/* Flags that make a btree handle special (not for normal use). */
#define WT_BTREE_SPECIAL_FLAGS \
(WT_BTREE_BULK | WT_BTREE_SALVAGE | WT_BTREE_UPGRADE | WT_BTREE_VERIFY)
(WT_BTREE_BULK | WT_BTREE_NO_CACHE | \
WT_BTREE_SALVAGE | WT_BTREE_UPGRADE | WT_BTREE_VERIFY)
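Adding WT_BTREE_NO_CACHE at 0x0010 shifts every later WT_BTREE_* flag up one bit. Each flag must own a distinct bit for F_SET, F_CLR and F_ISSET to work on the shared `flags` word. The WT_CLSM_* flags are renumbered the same way when WT_CLSM_MINOR_MERGE is added. The following small sketch checks that property for the new WT_BTREE_* values. WT_BTREE_BULK is omitted because its value is not shown in this hunk, and `sketch_flags_are_distinct_bits` is a hypothetical helper, not part of WiredTiger.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* New WT_BTREE_* values from the diff (WT_BTREE_BULK is not shown). */
static const uint32_t sketch_btree_flags[] = {
	0x0002,	/* WT_BTREE_DISCARD */
	0x0004,	/* WT_BTREE_EXCLUSIVE */
	0x0008,	/* WT_BTREE_LOCK_ONLY */
	0x0010,	/* WT_BTREE_NO_CACHE */
	0x0020,	/* WT_BTREE_NO_EVICTION */
	0x0040,	/* WT_BTREE_NO_HAZARD */
	0x0080,	/* WT_BTREE_OPEN */
	0x0100,	/* WT_BTREE_SALVAGE */
	0x0200,	/* WT_BTREE_UPGRADE */
	0x0400	/* WT_BTREE_VERIFY */
};

/* True if every flag is a single bit and no two flags share a bit. */
static bool
sketch_flags_are_distinct_bits(const uint32_t *flags, size_t n)
{
	uint32_t seen = 0;
	size_t i;

	for (i = 0; i < n; ++i) {
		/* x & (x - 1) clears the lowest set bit; 0 means one bit. */
		if (flags[i] == 0 || (flags[i] & (flags[i] - 1)) != 0)
			return (false);
		if ((seen & flags[i]) != 0)
			return (false);	/* collides with an earlier flag */
		seen |= flags[i];
	}
	return (true);
}
```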
/*
* WT_SALVAGE_COOKIE --

@@ -295,9 +295,43 @@ __wt_get_addr(
static inline void
__wt_page_release(WT_SESSION_IMPL *session, WT_PAGE *page)
{
/* We never acquired a hazard reference on the root page. */
if (page != NULL && !WT_PAGE_IS_ROOT(page))
__wt_hazard_clear(session, page);
WT_BTREE *btree;
btree = session->btree;
/*
* Fast-track pages we don't have and the root page, which sticks
* in memory, regardless.
*/
if (page == NULL || WT_PAGE_IS_ROOT(page))
return;
/* If this is a non cached page, discard it. */
if (F_ISSET(btree, WT_BTREE_NO_CACHE)) {
page->ref->page = NULL;
page->ref->state = WT_REF_DISK;
__wt_page_out(session, &page, 0);
return;
}
/* Discard our hazard reference. */
__wt_hazard_clear(session, page);
}
/*
* __wt_stack_release --
* Release references to a page stack.
*/
static inline void
__wt_stack_release(WT_SESSION_IMPL *session, WT_PAGE *page)
{
WT_PAGE *next;
while (page != NULL && !WT_PAGE_IS_ROOT(page)) {
next = page->parent;
__wt_page_release(session, page);
page = next;
}
}
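With this change, the search functions appear to keep the parent's reference while descending, since the release that followed each `__wt_page_in` call is dropped. Their error paths therefore call `__wt_stack_release`, which releases every page from the current one up to, but not including, the root. The function reads `page->parent` before releasing the page. This ordering matters because, for a no-cache btree, `__wt_page_release` discards the page, after which its fields must not be touched. The following is a minimal sketch that uses a hypothetical page type and counts references instead of using real hazard references. It assumes each non-root page on the path holds exactly one reference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page type, not WiredTiger's WT_PAGE. */
struct sketch_page {
	struct sketch_page *parent;
	bool is_root;
	int hazard_refs;	/* references this session holds on the page */
};

/* Drop one reference; the root page is never referenced. */
static void
sketch_page_release(struct sketch_page *page)
{
	if (page == NULL || page->is_root)
		return;
	assert(page->hazard_refs > 0);
	--page->hazard_refs;
}

/* Walk from a page toward the root, releasing each page on the way up. */
static void
sketch_stack_release(struct sketch_page *page)
{
	struct sketch_page *next;

	while (page != NULL && !page->is_root) {
		next = page->parent;	/* read before the page is released */
		sketch_page_release(page);
		page = next;
	}
}
```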
/*
@@ -310,7 +344,7 @@ __wt_page_hazard_check(WT_SESSION_IMPL *session, WT_PAGE *page)
WT_CONNECTION_IMPL *conn;
WT_HAZARD *hp;
WT_SESSION_IMPL *s;
uint32_t i, session_cnt;
uint32_t i, hazard_size, session_cnt;
conn = S2C(session);
@@ -326,7 +360,8 @@ __wt_page_hazard_check(WT_SESSION_IMPL *session, WT_PAGE *page)
for (s = conn->sessions, i = 0; i < session_cnt; ++s, ++i) {
if (!s->active)
continue;
for (hp = s->hazard; hp < s->hazard + conn->hazard_size; ++hp)
WT_ORDERED_READ(hazard_size, s->hazard_size);
for (hp = s->hazard; hp < s->hazard + hazard_size; ++hp)
if (hp->page == page)
return (hp);
}
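Alongside this change, the default `hazard_max` rises from 30 to 1000. Each session also tracks how many hazard slots it uses (`hazard_size`), growing that count in steps of WT_HAZARD_INCR, so `__wt_page_hazard_check` only scans the slots a session has actually used. The following is a single-threaded sketch of that scheme. It assumes the slot array is allocated at `hazard_max` entries up front and that only the scanned prefix grows. The `sketch_*` names are hypothetical, and the sketch does not model the memory ordering that WT_ORDERED_READ provides when other threads read `hazard_size`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_HAZARD_INCR	10	/* mirrors WT_HAZARD_INCR */

/* Hypothetical per-session hazard state. */
struct sketch_hazard_session {
	const void **slots;	/* hazard_max entries, allocated up front */
	uint32_t hazard_max;	/* total slots available */
	uint32_t hazard_size;	/* prefix of slots that checkers scan */
};

/* Record a hazard reference, widening the scanned prefix if needed. */
static bool
sketch_hazard_set(struct sketch_hazard_session *s, const void *page)
{
	uint32_t i;

	for (;;) {
		for (i = 0; i < s->hazard_size; ++i)
			if (s->slots[i] == NULL) {
				s->slots[i] = page;
				return (true);
			}
		if (s->hazard_size >= s->hazard_max)
			return (false);	/* out of hazard references */
		s->hazard_size += SKETCH_HAZARD_INCR;
		if (s->hazard_size > s->hazard_max)
			s->hazard_size = s->hazard_max;
	}
}

/* Is page referenced? Only the first hazard_size slots are scanned. */
static bool
sketch_hazard_check(const struct sketch_hazard_session *s, const void *page)
{
	uint32_t i, size;

	size = s->hazard_size;	/* the diff reads this with WT_ORDERED_READ */
	for (i = 0; i < size; ++i)
		if (s->slots[i] == page)
			return (true);
	return (false);
}
```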

@@ -57,9 +57,13 @@ __wt_eviction_page_check(WT_SESSION_IMPL *session, WT_PAGE *page)
F_ISSET(session->btree, WT_BTREE_NO_EVICTION))
return (0);
/* Check the page's memory footprint. */
if ((int64_t)page->memory_footprint > conn->cache_size / 2 ||
page->memory_footprint > 20 * session->btree->maxleafpage)
/*
* Check the page's memory footprint - evict pages that take up more
* than their fair share of the cache. We define a fair share as
* approximately half the cache size per open writable btree handle.
*/
if ((int64_t)page->memory_footprint >
conn->cache_size / (2 * (conn->open_btree_count + 1)))
return (1);
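The new footprint test forces a page out once it exceeds `cache_size / (2 * (open_btree_count + 1))`. Here `open_btree_count` counts open writable (non-checkpoint) btrees, and the `+ 1` keeps the divisor non-zero. With the default 100MB `cache_size`, that limit is 50MB when no writable btree is open, 25MB with one and 12.5MB with three. The following is a sketch of the arithmetic; the `sketch_*` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Per-page share of the cache: half the cache split among open btrees. */
static uint64_t
sketch_page_fair_share(uint64_t cache_size, unsigned int open_btree_count)
{
	/* The + 1 keeps the divisor non-zero when no btree is open. */
	return (cache_size / (2 * ((uint64_t)open_btree_count + 1)));
}

/* True if a page's footprint exceeds its fair share and should be evicted. */
static bool
sketch_page_too_big(
    uint64_t footprint, uint64_t cache_size, unsigned int open_btree_count)
{
	return (footprint > sketch_page_fair_share(cache_size, open_btree_count));
}
```

Compared with the old rule (more than half the cache, or more than 20 times the maximum leaf page size), the limit now shrinks as more writable btrees compete for the same cache.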
/*

@@ -197,3 +197,6 @@ struct __wt_cursor_table {
if (!F_ISSET(cursor, WT_CURSTD_VALUE_SET)) \
WT_ERR(__wt_cursor_kv_not_set(cursor, 0)); \
} while (0)
#define WT_CURSOR_RAW_OK \
WT_CURSTD_DUMP_HEX | WT_CURSTD_DUMP_PRINT | WT_CURSTD_RAW

@@ -16,6 +16,16 @@ __cursor_set_recno(WT_CURSOR_BTREE *cbt, uint64_t v)
cbt->iface.recno = cbt->recno = v;
}
/*
* __cursor_position_clear --
* Forget the current key and value in a cursor.
*/
static inline void
__cursor_position_clear(WT_CURSOR_BTREE *cbt)
{
F_CLR(&cbt->iface, WT_CURSTD_KEY_SET | WT_CURSTD_VALUE_SET);
}
/*
* __cursor_search_clear --
* Reset the cursor's state for a search.
@@ -56,14 +66,9 @@ __cursor_leave(WT_CURSOR_BTREE *cbt)
cursor = &cbt->iface;
session = (WT_SESSION_IMPL *)cursor->session;
/* Optionally release any page references we're holding. */
if (cbt->page != NULL) {
__wt_page_release(session, cbt->page);
cbt->page = NULL;
}
/* Reset the returned key/value state. */
F_CLR(cursor, WT_CURSTD_KEY_SET | WT_CURSTD_VALUE_SET);
/* Release any page references we're holding. */
__wt_stack_release(session, cbt->page);
cbt->page = NULL;
if (F_ISSET(cbt, WT_CBT_ACTIVE)) {
WT_ASSERT(session, session->ncursors > 0);

@@ -589,6 +589,8 @@ extern int __wt_conn_btree_apply_single(WT_SESSION_IMPL *session,
extern int __wt_conn_btree_close(WT_SESSION_IMPL *session, int locked);
extern int __wt_conn_btree_close_all(WT_SESSION_IMPL *session,
const char *name);
extern int __wt_conn_btree_discard_single(WT_SESSION_IMPL *session,
WT_BTREE *btree);
extern int __wt_conn_btree_discard(WT_CONNECTION_IMPL *conn);
extern int __wt_connection_init(WT_CONNECTION_IMPL *conn);
extern void __wt_connection_destroy(WT_CONNECTION_IMPL *conn);
@@ -670,7 +672,9 @@ extern int __wt_log_printf(WT_SESSION_IMPL *session,
2,
3)));
extern WT_LOGREC_DESC __wt_logdesc_debug;
extern int __wt_clsm_init_merge(WT_CURSOR *cursor, int nchunks);
extern int __wt_clsm_init_merge(WT_CURSOR *cursor,
int start_chunk,
int nchunks);
extern int __wt_clsm_open(WT_SESSION_IMPL *session,
const char *uri,
const char *cfg[],
@@ -679,10 +683,10 @@ extern int __wt_lsm_init(WT_CONNECTION *wt_conn, const char *config);
extern int __wt_lsm_cleanup(WT_CONNECTION *wt_conn);
extern int __wt_lsm_merge_update_tree(WT_SESSION_IMPL *session,
WT_LSM_TREE *lsm_tree,
int start_chunk,
int nchunks,
WT_LSM_CHUNK **chunkp);
extern int __wt_lsm_major_merge(WT_SESSION_IMPL *session,
WT_LSM_TREE *lsm_tree);
extern int __wt_lsm_merge(WT_SESSION_IMPL *session, WT_LSM_TREE *lsm_tree);
extern int __wt_lsm_meta_read(WT_SESSION_IMPL *session, WT_LSM_TREE *lsm_tree);
extern int __wt_lsm_meta_write(WT_SESSION_IMPL *session, WT_LSM_TREE *lsm_tree);
extern int __wt_lsm_tree_close_all(WT_SESSION_IMPL *session);
@@ -724,6 +728,7 @@ extern int __wt_lsm_tree_worker(WT_SESSION_IMPL *session,
const char *cfg[],
uint32_t open_flags);
extern void *__wt_lsm_worker(void *arg);
extern void *__wt_lsm_checkpoint_worker(void *arg);
extern int __wt_metadata_get(WT_SESSION *session,
const char *uri,
const char **valuep);

@@ -21,9 +21,10 @@ struct __wt_cursor_lsm {
#define WT_CLSM_ITERATE_NEXT 0x01 /* Forward iteration */
#define WT_CLSM_ITERATE_PREV 0x02 /* Backward iteration */
#define WT_CLSM_MERGE 0x04 /* Merge cursor, don't update. */
#define WT_CLSM_MULTIPLE 0x08 /* Multiple cursors have values for the
#define WT_CLSM_MINOR_MERGE 0x08 /* Minor merge, include tombstones. */
#define WT_CLSM_MULTIPLE 0x10 /* Multiple cursors have values for the
current key */
#define WT_CLSM_UPDATED 0x10 /* Cursor has done updates */
#define WT_CLSM_UPDATED 0x20 /* Cursor has done updates */
uint32_t flags;
};
@@ -51,17 +52,21 @@ struct __wt_lsm_tree {
uint32_t *memsizep;
/* Configuration parameters */
uint32_t threshold;
uint32_t bloom_bit_count;
uint32_t bloom_hash_count;
uint32_t chunk_size;
uint32_t merge_max;
WT_SESSION_IMPL *worker_session;/* Passed to thread_create */
pthread_t worker_tid; /* LSM worker thread */
WT_SESSION_IMPL *ckpt_session; /* For checkpoint worker */
pthread_t ckpt_tid; /* LSM checkpoint worker thread */
int nchunks; /* Number of active chunks */
int last; /* Last allocated ID. */
WT_LSM_CHUNK **chunk; /* Array of active LSM chunks */
size_t chunk_alloc; /* Space allocated for chunks */
WT_LSM_CHUNK **old_chunks; /* Array of old LSM chunks */
size_t old_alloc; /* Space allocated for old chunks */
int nold_chunks; /* Number of old chunks */
@@ -77,3 +82,12 @@ struct __wt_lsm_data_source {
WT_RWLOCK *rwlock;
};
struct __wt_lsm_worker_cookie {
WT_LSM_CHUNK **chunk_array;
size_t chunk_alloc;
int nchunks;
#define WT_LSM_WORKER_MERGE 0x01
#define WT_LSM_WORKER_CHECKPOINT 0x02
uint32_t flags;
};

@@ -52,6 +52,9 @@
#define WT_SKIP_MAXDEPTH 10
#define WT_SKIP_PROBABILITY (UINT32_MAX >> 2)
/* The number of hazard references that can be in use is grown dynamically. */
#define WT_HAZARD_INCR 10
/*
* Quiet compiler warnings about unused parameters.
*/

@@ -119,6 +119,7 @@ struct __wt_connection_stats {
WT_STATS memfree;
WT_STATS total_read_io;
WT_STATS total_write_io;
WT_STATS txn_fail_cache;
WT_STATS txn_begin;
WT_STATS txn_commit;
WT_STATS txn_rollback;

@@ -34,15 +34,17 @@ typedef uint32_t wt_txnid_t;
* remains in the system after 2 billion transactions it can no longer be
* compared with current transaction ID.
*/
#define TXNID_LT(t1, t2) \
(((t1) == (t2) || \
(t1) == WT_TXN_ABORTED || (t2) == WT_TXN_NONE) ? 0 : \
((t1) == WT_TXN_NONE || (t2) == WT_TXN_ABORTED) ? 1 : \
#define TXNID_LE(t1, t2) \
(((t1) == WT_TXN_ABORTED || (t2) == WT_TXN_NONE) ? 0 : \
((t1) == WT_TXN_NONE || (t2) == WT_TXN_ABORTED) ? 1 : \
(t2) - (t1) < (UINT32_MAX / 2))
#define TXNID_LT(t1, t2) \
((t1) != (t2) && TXNID_LE(t1, t2))
struct __wt_txn_state {
wt_txnid_t id;
wt_txnid_t snap_min;
volatile wt_txnid_t id;
volatile wt_txnid_t snap_min;
};
struct __wt_txn_global {
@@ -89,6 +91,12 @@ struct __wt_txn {
size_t modref_alloc;
u_int modref_count;
/*
* Count of unsuccessful eviction attempts, used to abort if the cache
* is full and no progress can be made.
*/
u_int eviction_fails;
#define TXN_AUTOCOMMIT 0x01
#define TXN_ERROR 0x02
#define TXN_RUNNING 0x04

@@ -133,9 +133,7 @@ __wt_txn_visible_all(WT_SESSION_IMPL *session, wt_txnid_t id)
WT_TXN *txn;
txn = &session->txn;
if (TXNID_LT(txn->oldest_snap_min, id))
return (0);
return (1);
return (TXNID_LT(id, txn->oldest_snap_min));
}
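TXNID_LE and TXNID_LT compare 32-bit transaction IDs modulo 2^32. `t1` precedes `t2` if `t2 - t1`, computed with unsigned wraparound, is less than half the ID space. As a result, an ID allocated just before the counter wraps still compares as older than one allocated just after. WT_TXN_NONE compares below any other ID, and WT_TXN_ABORTED compares above any other ID. The following is a standalone sketch of those rules written as functions. The `sketch_*` names and the numeric values chosen for the two reserved IDs are assumptions. The sketch also shows one consequence of rewriting `__wt_txn_visible_all` as `TXNID_LT(id, oldest_snap_min)`: an ID equal to `oldest_snap_min` is no longer treated as visible to all transactions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t sketch_txnid_t;

/* Hypothetical reserved IDs standing in for WT_TXN_NONE / WT_TXN_ABORTED. */
#define SKETCH_TXN_NONE		((sketch_txnid_t)0)
#define SKETCH_TXN_ABORTED	((sketch_txnid_t)UINT32_MAX)

/* t1 <= t2, tolerating 32-bit wraparound (mirrors TXNID_LE). */
static bool
sketch_txnid_le(sketch_txnid_t t1, sketch_txnid_t t2)
{
	if (t1 == SKETCH_TXN_ABORTED || t2 == SKETCH_TXN_NONE)
		return (false);
	if (t1 == SKETCH_TXN_NONE || t2 == SKETCH_TXN_ABORTED)
		return (true);
	/* Unsigned subtraction wraps, so IDs near the wrap compare correctly. */
	return ((sketch_txnid_t)(t2 - t1) < UINT32_MAX / 2);
}

/* t1 < t2 (mirrors the new TXNID_LT). */
static bool
sketch_txnid_lt(sketch_txnid_t t1, sketch_txnid_t t2)
{
	return (t1 != t2 && sketch_txnid_le(t1, t2));
}

/* Mirrors the new __wt_txn_visible_all: strictly older than the minimum. */
static bool
sketch_visible_all(sketch_txnid_t id, sketch_txnid_t oldest_snap_min)
{
	return (sketch_txnid_lt(id, oldest_snap_min));
}
```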
/*

@@ -522,15 +522,14 @@ struct __wt_session {
* @config{append, append the value as a new record\, creating a new
* record number key; valid only for cursors with record number keys.,a
* boolean flag; default \c false.}
* @config{bulk, configure the cursor for bulk loads; bulk-load is a
* fast load path for newly created objects and only newly created
* objects may be bulk-loaded. Cursors configured for bulk load only
* support the WT_CURSOR::insert and WT_CURSOR::close methods.,a boolean
* flag; default \c false.}
* @config{checkpoint, the name of a checkpoint to open; the reserved
* checkpoint name "WiredTigerCheckpoint" opens a cursor on the most
* recent internal checkpoint taken for the object.,a string; default
* empty.}
* @config{bulk, configure the cursor for bulk loads\, a fast load path
* that may only be used for newly created objects. Cursors configured
* for bulk load only support the WT_CURSOR::insert and WT_CURSOR::close
* methods.,a boolean flag; default \c false.}
* @config{checkpoint, the name of a checkpoint to open (the reserved
* name "WiredTigerCheckpoint" opens the most recent internal checkpoint
* taken for the object). The cursor does not support data
* modification.,a string; default empty.}
* @config{dump, configure the cursor for dump format inputs and
* outputs: "hex" selects a simple hexadecimal format\, "print" selects
* a format where only non-printing characters are hexadecimal encoded.
@@ -663,6 +662,8 @@ struct __wt_session {
* for LSM bloom filters.,an integer between 2 and 100; default \c 4.}
* @config{lsm_chunk_size, the maximum size of the in-memory chunk of an
* LSM tree.,an integer between 512K and 500MB; default \c 2MB.}
* @config{lsm_merge_max, the maximum number of chunks to include in a
* merge operation.,an integer between 2 and 100; default \c 15.}
* @config{prefix_compression, configure row-store format key prefix
* compression.,a boolean flag; default \c true.}
* @config{split_pct, the Btree page split size as a percentage of the
@@ -1146,8 +1147,8 @@ struct __wt_connection {
* may need quoting\, for example\,
* <code>extensions=("/path/to/ext.so"="entry=my_entry")</code>.,a list of
* strings; default empty.}
* @config{hazard_max, number of simultaneous hazard references per session
* handle.,an integer greater than or equal to 15; default \c 30.}
* @config{hazard_max, maximum number of simultaneous hazard references per
* session handle.,an integer greater than or equal to 15; default \c 1000.}
* @config{logging, enable logging.,a boolean flag; default \c false.}
* @config{multiprocess, permit sharing between processes (will automatically
* start an RPC server for primary processes and use RPC for secondary
@@ -1669,12 +1670,14 @@ extern int wiredtiger_extension_init(WT_SESSION *session,
#define WT_STAT_total_read_io 18
/*! total write I/Os */
#define WT_STAT_total_write_io 19
/*! transaction failures due to cache overflow */
#define WT_STAT_txn_fail_cache 20
/*! transactions */
#define WT_STAT_txn_begin 20
#define WT_STAT_txn_begin 21
/*! transactions committed */
#define WT_STAT_txn_commit 21
#define WT_STAT_txn_commit 22
/*! transactions rolled-back */
#define WT_STAT_txn_rollback 22
#define WT_STAT_txn_rollback 23
/*!
* @}


@@ -135,6 +135,8 @@ struct __wt_lsm_data_source;
typedef struct __wt_lsm_data_source WT_LSM_DATA_SOURCE;
struct __wt_lsm_tree;
typedef struct __wt_lsm_tree WT_LSM_TREE;
struct __wt_lsm_worker_cookie;
typedef struct __wt_lsm_worker_cookie WT_LSM_WORKER_COOKIE;
struct __wt_named_collator;
typedef struct __wt_named_collator WT_NAMED_COLLATOR;
struct __wt_named_compressor;


@@ -33,7 +33,7 @@
CURSOR_UPDATE_API_CALL(cursor, session, n, NULL); \
WT_ERR(__clsm_enter(clsm))
static int __clsm_open_cursors(WT_CURSOR_LSM *);
static int __clsm_open_cursors(WT_CURSOR_LSM *, int);
static int __clsm_search(WT_CURSOR *);
static inline int
@@ -41,7 +41,7 @@ __clsm_enter(WT_CURSOR_LSM *clsm)
{
if (!F_ISSET(clsm, WT_CLSM_MERGE) &&
clsm->dsk_gen != clsm->lsm_tree->dsk_gen)
WT_RET(__clsm_open_cursors(clsm));
WT_RET(__clsm_open_cursors(clsm, 0));
return (0);
}
@@ -54,7 +54,7 @@ static WT_ITEM __lsm_tombstone = { "", 0, 0, NULL, 0 };
#define WT_LSM_NEEDVALUE(c) do { \
WT_CURSOR_NEEDVALUE(c); \
if (__clsm_deleted(&(c)->value)) \
if (__clsm_deleted((WT_CURSOR_LSM *)(c), &(c)->value)) \
WT_ERR(__wt_cursor_kv_not_set(cursor, 0)); \
} while (0)
@@ -63,9 +63,9 @@ static WT_ITEM __lsm_tombstone = { "", 0, 0, NULL, 0 };
* Check whether the current value is a tombstone.
*/
static inline int
__clsm_deleted(WT_ITEM *item)
__clsm_deleted(WT_CURSOR_LSM *clsm, WT_ITEM *item)
{
return (item->size == 0);
return (!F_ISSET(clsm, WT_CLSM_MINOR_MERGE) && item->size == 0);
}
/*
@@ -106,7 +106,7 @@ __clsm_close_cursors(WT_CURSOR_LSM *clsm)
* Open cursors for the current set of files.
*/
static int
__clsm_open_cursors(WT_CURSOR_LSM *clsm)
__clsm_open_cursors(WT_CURSOR_LSM *clsm, int start_chunk)
{
WT_CURSOR *c, **cp;
WT_DECL_RET;
@@ -115,6 +115,8 @@ __clsm_open_cursors(WT_CURSOR_LSM *clsm)
WT_SESSION_IMPL *session;
const char *ckpt_cfg[] = API_CONF_DEFAULTS(session, open_cursor,
"checkpoint=WiredTigerCheckpoint");
const char *merge_cfg[] = API_CONF_DEFAULTS(session, open_cursor,
"checkpoint=WiredTigerCheckpoint,no_cache");
int i, nchunks;
session = (WT_SESSION_IMPL *)clsm->iface.session;
@@ -152,10 +154,11 @@ __clsm_open_cursors(WT_CURSOR_LSM *clsm)
* Read from the checkpoint if the file has been written.
* Once all cursors switch, the in-memory tree can be evicted.
*/
chunk = lsm_tree->chunk[i];
chunk = lsm_tree->chunk[i + start_chunk];
ret = __wt_curfile_open(session,
chunk->uri, &clsm->iface,
F_ISSET(chunk, WT_LSM_CHUNK_ONDISK) ? ckpt_cfg : NULL, cp);
!F_ISSET(chunk, WT_LSM_CHUNK_ONDISK) ? NULL :
(F_ISSET(clsm, WT_CLSM_MERGE) ? merge_cfg : ckpt_cfg), cp);
/*
* XXX kludge: we may have an empty chunk where no checkpoint
@@ -200,15 +203,17 @@ err: __wt_spin_unlock(session, &lsm_tree->lock);
* Initialize an LSM cursor for a (major) merge.
*/
int
__wt_clsm_init_merge(WT_CURSOR *cursor, int nchunks)
__wt_clsm_init_merge(WT_CURSOR *cursor, int start_chunk, int nchunks)
{
WT_CURSOR_LSM *clsm;
clsm = (WT_CURSOR_LSM *)cursor;
F_SET(clsm, WT_CLSM_MERGE);
if (start_chunk != 0)
F_SET(clsm, WT_CLSM_MINOR_MERGE);
clsm->nchunks = nchunks;
return (__clsm_open_cursors(clsm));
return (__clsm_open_cursors(clsm, start_chunk));
}
/*
@@ -255,7 +260,7 @@ __clsm_get_current(
WT_RET(current->get_key(current, &c->key));
WT_RET(current->get_value(current, &c->value));
if ((*deletedp = __clsm_deleted(&c->value)) == 0)
if ((*deletedp = __clsm_deleted(clsm, &c->value)) == 0)
F_SET(c, WT_CURSTD_KEY_SET | WT_CURSTD_VALUE_SET);
else
F_CLR(c, WT_CURSTD_KEY_SET | WT_CURSTD_VALUE_SET);
@@ -517,7 +522,7 @@ __clsm_search(WT_CURSOR *cursor)
WT_ERR(c->get_key(c, &cursor->key));
WT_ERR(c->get_value(c, &cursor->value));
clsm->current = c;
if (__clsm_deleted(&cursor->value))
if (__clsm_deleted(clsm, &cursor->value))
ret = WT_NOTFOUND;
goto done;
} else if (ret != WT_NOTFOUND)
@@ -573,7 +578,7 @@ __clsm_search_near(WT_CURSOR *cursor, int *exactp)
goto err;
WT_ERR(c->get_value(c, &v));
deleted = __clsm_deleted(&v);
deleted = __clsm_deleted(clsm, &v);
if (cmp == 0 && !deleted) {
clsm->current = c;
@@ -588,13 +593,13 @@ __clsm_search_near(WT_CURSOR *cursor, int *exactp)
while (deleted && (ret = c->next(c)) == 0) {
cmp = 1;
WT_ERR(c->get_value(c, &v));
deleted = __clsm_deleted(&v);
deleted = __clsm_deleted(clsm, &v);
}
WT_ERR_NOTFOUND_OK(ret);
while (deleted && (ret = c->prev(c)) == 0) {
cmp = -1;
WT_ERR(c->get_value(c, &v));
deleted = __clsm_deleted(&v);
deleted = __clsm_deleted(clsm, &v);
}
WT_ERR_NOTFOUND_OK(ret);
if (deleted)
@@ -698,7 +703,7 @@ __clsm_put(
clsm->current = primary;
if ((memsizep = lsm_tree->memsizep) != NULL &&
*memsizep > lsm_tree->threshold) {
*memsizep > lsm_tree->chunk_size) {
/*
* Close our cursors: if we are the only open cursor, this
* means the btree handle is unlocked.

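With the WT_CLSM_MINOR_MERGE flag, __clsm_deleted reports an empty value as a tombstone only when the merge is not a minor one. A major merge includes the oldest chunk, so a tombstone has nothing older left to hide and can be dropped; a minor merge must copy tombstones forward, because an older chunk outside the merge may still hold the value they delete. A sketch of that rule over an already-resolved record stream (the record type and function are hypothetical):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record: an empty value marks a tombstone, as in __clsm_deleted. */
struct sketch_rec {
	const char *key;
	size_t value_size;
};

/*
 * Copy the records that a merge would write to out[] and return how many.
 * A major merge drops tombstones; a minor merge keeps them.
 */
static size_t
sketch_merge_output(const struct sketch_rec *in, size_t n,
    int minor_merge, struct sketch_rec *out)
{
	size_t i, nout;

	assert(in != NULL && out != NULL);
	for (i = nout = 0; i < n; i++) {
		int deleted = !minor_merge && in[i].value_size == 0;

		if (!deleted)
			out[nout++] = in[i];
	}
	return (nout);
}
```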

@@ -14,10 +14,10 @@
*/
int
__wt_lsm_merge_update_tree(WT_SESSION_IMPL *session,
WT_LSM_TREE *lsm_tree, int nchunks, WT_LSM_CHUNK **chunkp)
WT_LSM_TREE *lsm_tree, int start_chunk, int nchunks, WT_LSM_CHUNK **chunkp)
{
WT_LSM_CHUNK *chunk;
size_t chunk_sz;
size_t chunk_sz, chunks_after_merge;
int i, j;
/* Setup the array of obsolete chunks. */
@@ -34,7 +34,9 @@ __wt_lsm_merge_update_tree(WT_SESSION_IMPL *session,
/* Copy entries one at a time, so we can reuse gaps in the list. */
for (i = j = 0; j < nchunks && i < lsm_tree->nold_chunks; i++) {
if (lsm_tree->old_chunks[i] == NULL) {
lsm_tree->old_chunks[i] = lsm_tree->chunk[j++];
lsm_tree->old_chunks[i] =
lsm_tree->chunk[start_chunk + j];
++j;
--lsm_tree->old_avail;
}
}
@@ -42,13 +44,15 @@ __wt_lsm_merge_update_tree(WT_SESSION_IMPL *session,
WT_ASSERT(session, j == nchunks);
/* Update the current chunk list. */
memmove(lsm_tree->chunk + 1, lsm_tree->chunk + nchunks,
(lsm_tree->nchunks - nchunks) * sizeof(*lsm_tree->chunk));
chunks_after_merge = lsm_tree->nchunks - (nchunks + start_chunk);
memmove(lsm_tree->chunk + start_chunk + 1,
lsm_tree->chunk + start_chunk + nchunks,
chunks_after_merge * sizeof(*lsm_tree->chunk));
lsm_tree->nchunks -= nchunks - 1;
memset(lsm_tree->chunk + lsm_tree->nchunks, 0,
(nchunks - 1) * sizeof(*lsm_tree->chunk));
WT_RET(__wt_calloc_def(session, 1, &chunk));
lsm_tree->chunk[0] = chunk;
lsm_tree->chunk[start_chunk] = chunk;
lsm_tree->dsk_gen++;
*chunkp = chunk;
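With start_chunk added, __wt_lsm_merge_update_tree replaces the nchunks entries beginning at start_chunk with one new chunk and slides the entries after the merged range down; previously the merged range always began at index 0. The sketch below performs the same memmove/memset sequence on an int array standing in for the WT_LSM_CHUNK pointer array (the function name is hypothetical):

```c
#include <assert.h>
#include <string.h>

/*
 * Replace chunk[start .. start + n - 1] with new_id at chunk[start],
 * shift later entries down and zero the freed tail slots.
 * Returns the new number of chunks.
 */
static int
sketch_merge_update(int *chunk, int nchunks, int start, int n, int new_id)
{
	int after;

	assert(n >= 1 && start >= 0 && start + n <= nchunks);
	/* Number of entries that follow the merged range. */
	after = nchunks - (start + n);
	memmove(chunk + start + 1, chunk + start + n,
	    (size_t)after * sizeof(*chunk));
	nchunks -= n - 1;
	/* Clear the slots that are no longer in use. */
	memset(chunk + nchunks, 0, (size_t)(n - 1) * sizeof(*chunk));
	chunk[start] = new_id;
	return (nchunks);
}
```

For example, merging entries 2 through 4 of {1, 2, 3, 4, 5, 6} into 9 leaves {1, 2, 9, 6} followed by two zeroed slots.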
@@ -56,11 +60,11 @@ __wt_lsm_merge_update_tree(WT_SESSION_IMPL *session,
}
/*
* __wt_lsm_major_merge --
* Merge a set of chunks of an LSM tree including the oldest.
* __wt_lsm_merge --
* Merge a set of chunks of an LSM tree.
*/
int
__wt_lsm_major_merge(WT_SESSION_IMPL *session, WT_LSM_TREE *lsm_tree)
__wt_lsm_merge(WT_SESSION_IMPL *session, WT_LSM_TREE *lsm_tree)
{
WT_BLOOM *bloom;
WT_CURSOR *src, *dest;
@@ -71,7 +75,7 @@ __wt_lsm_major_merge(WT_SESSION_IMPL *session, WT_LSM_TREE *lsm_tree)
WT_SESSION *wt_session;
const char *dest_uri;
uint64_t insert_count, record_count;
int dest_id, i, nchunks;
int dest_id, i, nchunks, start_chunk;
src = dest = NULL;
dest_uri = NULL;
@@ -94,21 +98,46 @@ __wt_lsm_major_merge(WT_SESSION_IMPL *session, WT_LSM_TREE *lsm_tree)
return (WT_NOTFOUND);
/*
* We have a limited number of hazard references, and we want to bound
* the amount of work in the merge.
*
* Use the lsm_tree lock to read the chunks (so no switches occur), but
* avoid holding it while the merge is in progress: that may take a
* long time.
*/
nchunks = WT_MIN((int)S2C(session)->hazard_size / 2, nchunks);
__wt_spin_lock(session, &lsm_tree->lock);
/* Only include chunks that are on disk */
while (nchunks > 1 &&
(!F_ISSET(lsm_tree->chunk[nchunks - 1], WT_LSM_CHUNK_ONDISK) ||
lsm_tree->chunk[nchunks - 1]->ncursor > 0))
--nchunks;
/*
* Look for a minor merge to do in preference to a major merge.
*
* The difference is whether the oldest chunk is involved: if it is, we
* can discard tombstones, because there can be no older record to be
* marked deleted.
*
* We look at the Bloom URI to decide whether a chunk is the result of
* an earlier merge. In a minor merge, we take as many chunks as we
* can that have not yet been merged. If there are fewer than 2 "new"
* chunks, fall back to a major merge.
*/
for (i = 0; i < nchunks; i++)
if (lsm_tree->chunk[i]->bloom_uri == NULL)
break;
if (i < nchunks - 2) {
start_chunk = i;
nchunks -= i;
} else
start_chunk = 0;
/* Respect the configured limit on the number of chunks to merge. */
if (nchunks > (int)lsm_tree->merge_max)
nchunks = (int)lsm_tree->merge_max;
for (record_count = 0, i = 0; i < nchunks; i++)
record_count += lsm_tree->chunk[i]->count;
record_count += lsm_tree->chunk[start_chunk + i]->count;
__wt_spin_unlock(session, &lsm_tree->lock);
if (nchunks <= 1)
@@ -118,7 +147,8 @@ __wt_lsm_major_merge(WT_SESSION_IMPL *session, WT_LSM_TREE *lsm_tree)
dest_id = WT_ATOMIC_ADD(lsm_tree->last, 1);
WT_VERBOSE_RET(session, lsm,
"Merging first %d chunks into %d\n", nchunks, dest_id);
"Merging chunks %d-%d into %d (%" PRIu64 " records)\n",
start_chunk, start_chunk + nchunks, dest_id, record_count);
if (record_count != 0) {
WT_RET(__wt_scr_alloc(session, 0, &bbuf));
@@ -140,7 +170,7 @@ __wt_lsm_major_merge(WT_SESSION_IMPL *session, WT_LSM_TREE *lsm_tree)
WT_ERR(wt_session->open_cursor(
wt_session, lsm_tree->name, NULL, NULL, &src));
F_SET(src, WT_CURSTD_RAW);
WT_ERR(__wt_clsm_init_merge(src, nchunks));
WT_ERR(__wt_clsm_init_merge(src, start_chunk, nchunks));
WT_WITH_SCHEMA_LOCK(session, ret = __wt_lsm_tree_create_chunk(
session, lsm_tree, dest_id, &dest_uri));
@@ -174,7 +204,8 @@ __wt_lsm_major_merge(WT_SESSION_IMPL *session, WT_LSM_TREE *lsm_tree)
WT_ERR(ret);
__wt_spin_lock(session, &lsm_tree->lock);
ret = __wt_lsm_merge_update_tree(session, lsm_tree, nchunks, &chunk);
ret = __wt_lsm_merge_update_tree(
session, lsm_tree, start_chunk, nchunks, &chunk);
chunk->uri = dest_uri;
dest_uri = NULL;

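The chunk-selection step in __wt_lsm_merge can be sketched on its own. The sketch assumes nchunks has already been trimmed to on-disk chunks with no open cursors and bounded by half the hazard array, as the code above does under the tree lock, and it models "has a Bloom URI" as a flag array (all names hypothetical). As written, the test i < nchunks - 2 takes the minor-merge path only when at least three chunks remain from the first unmerged chunk onward.

```c
#include <assert.h>

/*
 * has_bloom[i] is nonzero when chunk i came from an earlier merge.
 * Sets *startp to the first chunk to merge and returns how many chunks
 * to merge, capped at merge_max.
 */
static int
sketch_pick_merge(const int *has_bloom, int nchunks, int merge_max,
    int *startp)
{
	int i;

	assert(nchunks >= 0 && merge_max >= 2);
	/* Find the first chunk that has not been through a merge. */
	for (i = 0; i < nchunks; i++)
		if (!has_bloom[i])
			break;
	if (i < nchunks - 2) {
		/* Minor merge: skip the already-merged prefix. */
		*startp = i;
		nchunks -= i;
	} else
		/* Major merge: start from the oldest chunk. */
		*startp = 0;
	/* Respect the configured limit on chunks per merge. */
	if (nchunks > merge_max)
		nchunks = merge_max;
	return (nchunks);
}
```

The record count used to size the Bloom filter then has to be summed from chunk[start_chunk] onward, which is what the updated loop above does.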

@@ -44,8 +44,10 @@ __wt_lsm_meta_read(WT_SESSION_IMPL *session, WT_LSM_TREE *lsm_tree)
else if (WT_STRING_MATCH(
"lsm_bloom_hash_count", ck.str, ck.len))
lsm_tree->bloom_hash_count = (uint32_t)cv.val;
else if (WT_STRING_MATCH("threshold", ck.str, ck.len))
lsm_tree->threshold = (uint32_t)cv.val;
else if (WT_STRING_MATCH("lsm_chunk_size", ck.str, ck.len))
lsm_tree->chunk_size = (uint32_t)cv.val;
else if (WT_STRING_MATCH("lsm_merge_max", ck.str, ck.len))
lsm_tree->merge_max = (uint32_t)cv.val;
else if (WT_STRING_MATCH("last", ck.str, ck.len))
lsm_tree->last = (int)cv.val;
else if (WT_STRING_MATCH("chunks", ck.str, ck.len)) {
@@ -135,9 +137,11 @@ __wt_lsm_meta_write(WT_SESSION_IMPL *session, WT_LSM_TREE *lsm_tree)
lsm_tree->file_config,
lsm_tree->key_format, lsm_tree->value_format));
WT_ERR(__wt_buf_catfmt(session, buf,
",last=%" PRIu32 ",threshold=%" PRIu64
",last=%" PRIu32
",lsm_chunk_size=%" PRIu64 ",lsm_merge_max=%" PRIu32
",lsm_bloom_bit_count=%" PRIu32 ",lsm_bloom_hash_count=%" PRIu32,
lsm_tree->last, (uint64_t)lsm_tree->threshold,
lsm_tree->last,
(uint64_t)lsm_tree->chunk_size, lsm_tree->merge_max,
lsm_tree->bloom_bit_count, lsm_tree->bloom_hash_count));
WT_ERR(__wt_buf_catfmt(session, buf, ",chunks=["));
for (i = 0; i < lsm_tree->nchunks; i++) {
@@ -166,7 +170,10 @@ __wt_lsm_meta_write(WT_SESSION_IMPL *session, WT_LSM_TREE *lsm_tree)
WT_ERR(__wt_buf_catfmt(session, buf, "\"%s\"", chunk->uri));
}
WT_ERR(__wt_buf_catfmt(session, buf, "]"));
WT_ERR(__wt_metadata_update(session, lsm_tree->name, buf->data));
__wt_spin_lock(session, &S2C(session)->metadata_lock);
ret = __wt_metadata_update(session, lsm_tree->name, buf->data);
__wt_spin_unlock(session, &S2C(session)->metadata_lock);
WT_ERR(ret);
err: __wt_scr_free(&buf);
return (ret);

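The LSM metadata string now records lsm_chunk_size and lsm_merge_max where it previously recorded threshold. The sketch below formats those fields in the same order as the new __wt_lsm_meta_write and reads one back with a naive "key=value" lookup; both functions are hypothetical stand-ins for WiredTiger's __wt_buf_catfmt and config parser, and the lookup does not handle quoting or nested values.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Format the LSM tuning fields; returns snprintf's result. */
static int
sketch_lsm_meta(char *buf, size_t len, uint32_t last, uint64_t chunk_size,
    uint32_t merge_max, uint32_t bloom_bits, uint32_t bloom_hashes)
{
	return (snprintf(buf, len,
	    ",last=%" PRIu32 ",lsm_chunk_size=%" PRIu64
	    ",lsm_merge_max=%" PRIu32
	    ",lsm_bloom_bit_count=%" PRIu32 ",lsm_bloom_hash_count=%" PRIu32,
	    last, chunk_size, merge_max, bloom_bits, bloom_hashes));
}

/* Look up an unsigned value by key in a ",k=v,k=v" string; 0 if absent. */
static uint64_t
sketch_meta_get(const char *meta, const char *key)
{
	size_t klen = strlen(key);
	const char *p;

	assert(klen > 0);
	for (p = meta; (p = strstr(p, key)) != NULL; p += klen)
		/* Require a key boundary before and '=' after the match. */
		if ((p == meta || p[-1] == ',') && p[klen] == '=')
			return (strtoull(p + klen + 1, NULL, 10));
	return (0);
}
```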

@@ -7,6 +7,7 @@
#include "wt_internal.h"
static int __lsm_tree_open_check(WT_SESSION_IMPL *, WT_LSM_TREE *);
static int __lsm_tree_open(WT_SESSION_IMPL *, const char *, WT_LSM_TREE **);
/*
@@ -62,11 +63,13 @@ __lsm_tree_close(WT_SESSION_IMPL *session, WT_LSM_TREE *lsm_tree)
if (F_ISSET(lsm_tree, WT_LSM_TREE_WORKING)) {
F_CLR(lsm_tree, WT_LSM_TREE_WORKING);
WT_TRET(__wt_thread_join(lsm_tree->worker_tid));
WT_TRET(__wt_thread_join(lsm_tree->ckpt_tid));
}
/*
* Close the session and free its hazard array (necessary because
* we set WT_SESSION_INTERNAL to simplify shutdown ordering).
* Close the worker thread sessions and free their hazard arrays
* (necessary because we set WT_SESSION_INTERNAL to simplify shutdown
* ordering).
*
* Do this in the main thread to avoid deadlocks.
*/
@@ -83,6 +86,19 @@ __lsm_tree_close(WT_SESSION_IMPL *session, WT_LSM_TREE *lsm_tree)
*/
__wt_free(NULL, lsm_tree->worker_session->hazard);
}
if (lsm_tree->ckpt_session != NULL) {
F_SET(lsm_tree->ckpt_session,
F_ISSET(session, WT_SESSION_SCHEMA_LOCKED));
wt_session = &lsm_tree->ckpt_session->iface;
WT_TRET(wt_session->close(wt_session, NULL));
/*
* This is safe after the close because session handles are
* not freed, but are managed by the connection.
*/
__wt_free(NULL, lsm_tree->ckpt_session->hazard);
}
return (ret);
}
@@ -167,11 +183,17 @@ __lsm_tree_start_worker(WT_SESSION_IMPL *session, WT_LSM_TREE *lsm_tree)
lsm_tree->worker_session = (WT_SESSION_IMPL *)wt_session;
F_SET(lsm_tree->worker_session, WT_SESSION_INTERNAL);
WT_RET(wt_conn->open_session(wt_conn, NULL, NULL, &wt_session));
lsm_tree->ckpt_session = (WT_SESSION_IMPL *)wt_session;
F_SET(lsm_tree->ckpt_session, WT_SESSION_INTERNAL);
F_SET(lsm_tree, WT_LSM_TREE_WORKING);
/* The new thread will rely on the WORKING value being visible. */
WT_FULL_BARRIER();
WT_RET(__wt_thread_create(
&lsm_tree->worker_tid, __wt_lsm_worker, lsm_tree));
WT_RET(__wt_thread_create(
&lsm_tree->ckpt_tid, __wt_lsm_checkpoint_worker, lsm_tree));
return (0);
}
@@ -219,12 +241,14 @@ __wt_lsm_tree_create(WT_SESSION_IMPL *session,
WT_ERR(__wt_strndup(session, cval.str, cval.len,
&lsm_tree->value_format));
WT_ERR(__wt_config_gets(session, cfg, "lsm_chunk_size", &cval));
lsm_tree->threshold = (uint32_t)cval.val;
WT_ERR(__wt_config_gets(session, cfg, "lsm_bloom_bit_count", &cval));
lsm_tree->bloom_bit_count = (uint32_t)cval.val;
WT_ERR(__wt_config_gets(session, cfg, "lsm_bloom_hash_count", &cval));
lsm_tree->bloom_hash_count = (uint32_t)cval.val;
WT_ERR(__wt_config_gets(session, cfg, "lsm_chunk_size", &cval));
lsm_tree->chunk_size = (uint32_t)cval.val;
WT_ERR(__wt_config_gets(session, cfg, "lsm_merge_max", &cval));
lsm_tree->merge_max = (uint32_t)cval.val;
WT_ERR(__wt_scr_alloc(session, 0, &buf));
WT_ERR(__wt_buf_fmt(session, buf,
@@ -238,8 +262,12 @@ __wt_lsm_tree_create(WT_SESSION_IMPL *session,
__lsm_tree_discard(session, lsm_tree);
lsm_tree = NULL;
/* Open our new tree and add it to the handle cache. */
WT_ERR(__lsm_tree_open(session, uri, &lsm_tree));
/*
* Open our new tree and add it to the handle cache. Don't discard on
* error: the returned handle is NULL on error, and the metadata tracking
* macros handle cleanup on failure.
*/
ret = __lsm_tree_open(session, uri, &lsm_tree);
if (0) {
err: __lsm_tree_discard(session, lsm_tree);
@@ -248,6 +276,34 @@ err: __lsm_tree_discard(session, lsm_tree);
return (ret);
}
/*
* __lsm_tree_open_check --
* Validate the configuration of an LSM tree.
*/
static int
__lsm_tree_open_check(
WT_SESSION_IMPL *session, WT_LSM_TREE *lsm_tree)
{
WT_CONFIG_ITEM cval;
const char *cfg[] = API_CONF_DEFAULTS(
session, create, lsm_tree->file_config);
uint32_t maxleafpage;
uint64_t req;
WT_RET(__wt_config_gets(
session, cfg, "leaf_page_max", &cval));
maxleafpage = (uint32_t)cval.val;
/* Three chunks, plus one page for each participant in a merge. */
req = 3 * lsm_tree->chunk_size + (lsm_tree->merge_max * maxleafpage);
if (S2C(session)->cache_size < req)
WT_RET_MSG(session, EINVAL,
"The LSM configuration requires a cache size of at least %"
PRIu64 ". Configured size is %" PRIu64,
req, S2C(session)->cache_size);
return (0);
}
/*
* __lsm_tree_open --
* Open an LSM tree structure.
@@ -273,6 +329,12 @@ __lsm_tree_open(
lsm_tree->filename = lsm_tree->name + strlen("lsm:");
WT_ERR(__wt_lsm_meta_read(session, lsm_tree));
/*
* Sanity check the configuration. Do it now since this is the first
* time we have the LSM tree configuration.
*/
WT_ERR(__lsm_tree_open_check(session, lsm_tree));
if (lsm_tree->nchunks == 0)
WT_ERR(__wt_lsm_tree_switch(session, lsm_tree));
@@ -332,7 +394,7 @@ __wt_lsm_tree_switch(
WT_VERBOSE_RET(session, lsm,
"Tree switch to: %d because %d > %d", lsm_tree->last + 1,
(lsm_tree->memsizep == NULL ? 0 : (int)*lsm_tree->memsizep),
(int)lsm_tree->threshold);
(int)lsm_tree->chunk_size);
lsm_tree->memsizep = NULL;
@@ -500,7 +562,7 @@ __wt_lsm_tree_truncate(
/* Mark all chunks old. */
WT_ERR(__wt_lsm_merge_update_tree(
session, lsm_tree, lsm_tree->nchunks, &chunk));
session, lsm_tree, 0, lsm_tree->nchunks, &chunk));
/* Create the new chunk. */
WT_ERR(__wt_lsm_tree_create_chunk(

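__lsm_tree_open_check rejects an LSM tree whose cache would be smaller than three chunks plus one leaf page per chunk taking part in a merge. A sketch of the same arithmetic, with hypothetical names; it does the multiplication in 64 bits. With the documented defaults above (lsm_chunk_size 2MB, lsm_merge_max 15) and an assumed leaf_page_max of 1MB, the requirement is 3 × 2MB + 15 × 1MB = 21MB (22020096 bytes).

```c
#include <assert.h>
#include <stdint.h>

/* Three in-memory chunks plus one leaf page per merge participant. */
static uint64_t
sketch_lsm_required_cache(uint64_t chunk_size, uint32_t merge_max,
    uint32_t leaf_page_max)
{
	assert(merge_max >= 2);
	return (3 * chunk_size + (uint64_t)merge_max * leaf_page_max);
}

/* Returns 0 if cache_size is large enough, -1 otherwise. */
static int
sketch_lsm_cache_ok(uint64_t cache_size, uint64_t chunk_size,
    uint32_t merge_max, uint32_t leaf_page_max)
{
	return (cache_size < sketch_lsm_required_cache(
	    chunk_size, merge_max, leaf_page_max) ? -1 : 0);
}
```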

@@ -7,8 +7,8 @@
#include "wt_internal.h"
static int
__lsm_free_chunks(WT_SESSION_IMPL *session, WT_LSM_TREE *lsm_tree);
static int __lsm_free_chunks(WT_SESSION_IMPL *, WT_LSM_TREE *);
static int __lsm_copy_chunks(WT_LSM_TREE *, WT_LSM_WORKER_COOKIE *);
/*
* __wt_lsm_worker --
@@ -18,78 +18,20 @@ __lsm_free_chunks(WT_SESSION_IMPL *session, WT_LSM_TREE *lsm_tree);
void *
__wt_lsm_worker(void *arg)
{
WT_DECL_RET;
WT_LSM_CHUNK *chunk, **chunk_array;
WT_LSM_TREE *lsm_tree;
WT_SESSION_IMPL *session;
const char *cfg[] = API_CONF_DEFAULTS(session, checkpoint, NULL);
size_t chunk_alloc;
int i, nchunks, progress;
int progress;
lsm_tree = arg;
session = lsm_tree->worker_session;
chunk_array = NULL;
chunk_alloc = 0;
while (F_ISSET(lsm_tree, WT_LSM_TREE_WORKING)) {
progress = 0;
__wt_spin_lock(session, &lsm_tree->lock);
if (!F_ISSET(lsm_tree, WT_LSM_TREE_WORKING)) {
__wt_spin_unlock(session, &lsm_tree->lock);
break;
}
/*
* Take a copy of the current state of the LSM tree. Skip
* the last chunk - since it is the active one and not relevant
* to merge operations.
*/
for (nchunks = lsm_tree->nchunks - 1;
nchunks > 0 && lsm_tree->chunk[nchunks - 1]->ncursor > 0;
--nchunks)
;
if (chunk_alloc < lsm_tree->chunk_alloc)
ret = __wt_realloc(session,
&chunk_alloc, lsm_tree->chunk_alloc,
&chunk_array);
if (ret == 0 && nchunks > 0)
memcpy(chunk_array, lsm_tree->chunk,
nchunks * sizeof(*lsm_tree->chunk));
__wt_spin_unlock(session, &lsm_tree->lock);
WT_ERR(ret);
/*
* Write checkpoints in all completed files, then find
* something to merge.
*/
for (i = 0; i < nchunks; i++) {
chunk = chunk_array[i];
if (F_ISSET(chunk, WT_LSM_CHUNK_ONDISK) ||
chunk->ncursor > 0)
continue;
/* XXX durability: need to checkpoint the metadata? */
/*
* NOTE: we pass a non-NULL config, because otherwise
* __wt_checkpoint thinks we're closing the file.
*/
WT_WITH_SCHEMA_LOCK(session, ret =
__wt_schema_worker(session, chunk->uri,
__wt_checkpoint, cfg, 0));
if (ret == 0) {
__wt_spin_lock(session, &lsm_tree->lock);
F_SET(lsm_tree->chunk[i], WT_LSM_CHUNK_ONDISK);
lsm_tree->dsk_gen++;
__wt_spin_unlock(session, &lsm_tree->lock);
progress = 1;
}
}
/* Clear any state from previous worker thread iterations. */
session->btree = NULL;
if (nchunks > 0 && __wt_lsm_major_merge(session, lsm_tree) == 0)
if (__wt_lsm_merge(session, lsm_tree) == 0)
progress = 1;
/* Clear any state from previous worker thread iterations. */
@@ -103,11 +45,120 @@ __wt_lsm_worker(void *arg)
__wt_sleep(0, 10);
}
err: __wt_free(session, chunk_array);
return (NULL);
}
/*
* __wt_lsm_checkpoint_worker --
* A worker thread for an LSM tree, responsible for checkpointing chunks
* once they become read only.
*/
void *
__wt_lsm_checkpoint_worker(void *arg)
{
WT_DECL_RET;
WT_LSM_CHUNK *chunk;
WT_LSM_TREE *lsm_tree;
WT_LSM_WORKER_COOKIE cookie;
WT_SESSION_IMPL *session;
const char *cfg[] = { "name=,drop=", NULL };
int i, j;
lsm_tree = arg;
session = lsm_tree->ckpt_session;
memset(&cookie, 0, sizeof(cookie));
F_SET(&cookie, WT_LSM_WORKER_CHECKPOINT);
while (F_ISSET(lsm_tree, WT_LSM_TREE_WORKING)) {
WT_ERR(__lsm_copy_chunks(lsm_tree, &cookie));
/* Write checkpoints in all completed files. */
for (i = 0, j = 0; i < cookie.nchunks; i++) {
chunk = cookie.chunk_array[i];
if (F_ISSET(chunk, WT_LSM_CHUNK_ONDISK))
continue;
/*
* NOTE: we pass a non-NULL config, because otherwise
* __wt_checkpoint thinks we're closing the file.
*/
WT_WITH_SCHEMA_LOCK(session,
ret = __wt_schema_worker(session, chunk->uri,
__wt_checkpoint, cfg, 0));
if (ret == 0) {
++j;
__wt_spin_lock(session, &lsm_tree->lock);
F_SET(chunk, WT_LSM_CHUNK_ONDISK);
lsm_tree->dsk_gen++;
__wt_spin_unlock(session, &lsm_tree->lock);
WT_VERBOSE_ERR(session, lsm,
"LSM worker checkpointed %d.", i);
}
}
if (j == 0)
__wt_sleep(0, 10);
}
err: __wt_free(session, cookie.chunk_array);
return (NULL);
}
/*
* Take a copy of part of the LSM tree chunk array so that we can work on
* the contents without holding the LSM tree handle lock long term.
*/
static int
__lsm_copy_chunks(WT_LSM_TREE *lsm_tree, WT_LSM_WORKER_COOKIE *cookie)
{
WT_DECL_RET;
WT_SESSION_IMPL *session;
int nchunks;
/* Always return zero chunks on error. */
cookie->nchunks = 0;
if (F_ISSET(cookie, WT_LSM_WORKER_CHECKPOINT))
session = lsm_tree->ckpt_session;
else
session = lsm_tree->worker_session;
__wt_spin_lock(session, &lsm_tree->lock);
if (!F_ISSET(lsm_tree, WT_LSM_TREE_WORKING)) {
__wt_spin_unlock(session, &lsm_tree->lock);
/* The actual error value is ignored. */
return (WT_ERROR);
}
/*
* Take a copy of the current state of the LSM tree. Skip
* the last chunk - since it is the active one and not relevant
* to merge operations.
*/
nchunks = lsm_tree->nchunks - 1;
/* Checkpoint doesn't care if there are active cursors, merge does. */
if (F_ISSET(cookie, WT_LSM_WORKER_MERGE)) {
for (; nchunks > 0 && lsm_tree->chunk[nchunks - 1]->ncursor > 0;
--nchunks)
;
}
/*
* If the tree array of active chunks is larger than our current buffer,
* increase the size of our current buffer to match.
*/
if (cookie->chunk_alloc < lsm_tree->chunk_alloc)
ret = __wt_realloc(session,
&cookie->chunk_alloc, lsm_tree->chunk_alloc,
&cookie->chunk_array);
if (ret == 0 && nchunks > 0)
memcpy(cookie->chunk_array, lsm_tree->chunk,
nchunks * sizeof(*lsm_tree->chunk));
__wt_spin_unlock(session, &lsm_tree->lock);
if (ret == 0)
cookie->nchunks = nchunks;
return (ret);
}
static int
__lsm_free_chunks(WT_SESSION_IMPL *session, WT_LSM_TREE *lsm_tree)
{
@@ -137,6 +188,10 @@ __lsm_free_chunks(WT_SESSION_IMPL *session, WT_LSM_TREE *lsm_tree)
chunk->bloom_uri = NULL;
} else if (ret != EBUSY)
goto err;
if (ret == EBUSY)
WT_VERBOSE_ERR(session, lsm,
"LSM worker bloom drop busy: %s.",
chunk->bloom_uri);
}
if (chunk->uri != NULL) {
WT_WITH_SCHEMA_LOCK(session, ret =

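Both LSM worker threads now snapshot the chunk list through __lsm_copy_chunks, which holds lsm_tree->lock only long enough to copy pointers. The merge worker also drops trailing chunks that still have open cursors; the checkpoint worker does not. A single-threaded sketch of that copy, with hypothetical types, allocation counted in slots rather than bytes, and the lock left out:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, trimmed-down chunk: only the open-cursor count matters. */
struct sketch_chunk {
	int ncursor;
};

struct sketch_cookie {
	struct sketch_chunk **chunk_array;
	size_t chunk_alloc;	/* Slots allocated in chunk_array. */
	int nchunks;		/* Slots filled by the last copy. */
};

/*
 * Copy all but the last (active) chunk pointer into the cookie. For a
 * merge, also skip trailing chunks with open cursors. In WiredTiger this
 * runs under lsm_tree->lock. Returns 0, or -1 if allocation fails.
 */
static int
sketch_copy_chunks(struct sketch_chunk **tree_chunks, int tree_nchunks,
    size_t tree_alloc, int for_merge, struct sketch_cookie *cookie)
{
	struct sketch_chunk **p;
	int nchunks;

	assert(tree_nchunks >= 1 && (size_t)tree_nchunks <= tree_alloc);
	cookie->nchunks = 0;	/* Report zero chunks on error. */
	nchunks = tree_nchunks - 1;
	if (for_merge)
		while (nchunks > 0 && tree_chunks[nchunks - 1]->ncursor > 0)
			--nchunks;
	/* Grow the private buffer to match the tree's array if needed. */
	if (cookie->chunk_alloc < tree_alloc) {
		p = realloc(cookie->chunk_array, tree_alloc * sizeof(*p));
		if (p == NULL)
			return (-1);
		cookie->chunk_array = p;
		cookie->chunk_alloc = tree_alloc;
	}
	if (nchunks > 0)
		memcpy(cookie->chunk_array, tree_chunks,
		    (size_t)nchunks * sizeof(*tree_chunks));
	cookie->nchunks = nchunks;
	return (0);
}
```

Working on a private copy lets the checkpoint loop run __wt_schema_worker, which may take a long time, without holding the tree lock throughout.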

@@ -702,8 +702,15 @@ __wt_open_session(WT_CONNECTION_IMPL *conn, int internal,
* first time we open this session.
*/
if (session_ret->hazard == NULL)
WT_ERR(__wt_calloc(session, conn->hazard_size,
WT_ERR(__wt_calloc(session, conn->hazard_max,
sizeof(WT_HAZARD), &session_ret->hazard));
/*
* Set an initial size for the hazard array. It will be grown as
* required up to hazard_max. The hazard_size is reset on close, since
* __wt_hazard_close ensures the array is cleared - so it is safe to
* reset the starting size on each open.
*/
session_ret->hazard_size = WT_HAZARD_INCR;
/*
* Public sessions are automatically closed during WT_CONNECTION->close.


@@ -103,20 +103,29 @@ __wt_session_release_btree(WT_SESSION_IMPL *session)
btree = session->btree;
/*
* If we had special flags set, close the handle so that future access
* can get a handle without special flags.
* If we had no cache flag set, close and free the btree handle. It was
* never added to the handle cache.
*/
if (F_ISSET(btree, WT_BTREE_DISCARD | WT_BTREE_SPECIAL_FLAGS)) {
WT_ASSERT(session, F_ISSET(btree, WT_BTREE_EXCLUSIVE));
F_CLR(btree, WT_BTREE_DISCARD);
if (F_ISSET(btree, WT_BTREE_NO_CACHE))
WT_RET(__wt_conn_btree_discard_single(session, btree));
else {
WT_RET(__wt_conn_btree_sync_and_close(session));
/*
* If we had special flags set, close the handle so that future
* access can get a handle without special flags.
*/
if (F_ISSET(btree, WT_BTREE_DISCARD | WT_BTREE_SPECIAL_FLAGS)) {
WT_ASSERT(session, F_ISSET(btree, WT_BTREE_EXCLUSIVE));
F_CLR(btree, WT_BTREE_DISCARD);
WT_RET(__wt_conn_btree_sync_and_close(session));
}
if (F_ISSET(btree, WT_BTREE_EXCLUSIVE))
F_CLR(btree, WT_BTREE_EXCLUSIVE);
__wt_rwunlock(session, btree->rwlock);
}
if (F_ISSET(btree, WT_BTREE_EXCLUSIVE))
F_CLR(btree, WT_BTREE_EXCLUSIVE);
__wt_rwunlock(session, btree->rwlock);
session->btree = NULL;
return (ret);
@@ -193,34 +202,41 @@ __wt_session_get_btree(WT_SESSION_IMPL *session,
WT_DECL_RET;
btree = NULL;
btree_session = NULL;
TAILQ_FOREACH(btree_session, &session->btrees, q) {
btree = btree_session->btree;
if (strcmp(uri, btree->name) != 0)
continue;
if ((checkpoint == NULL && btree->checkpoint == NULL) ||
(checkpoint != NULL && btree->checkpoint != NULL &&
strcmp(checkpoint, btree->checkpoint) == 0))
break;
}
if (btree_session == NULL)
session->btree = NULL;
else {
session->btree = btree;
/*
* Try and lock the file; if we succeed, our "exclusive" state
* must match.
*/
if ((ret =
__wt_session_lock_btree(session, flags)) != WT_NOTFOUND) {
WT_ASSERT(session, ret != 0 ||
LF_ISSET(WT_BTREE_EXCLUSIVE) ==
F_ISSET(session->btree, WT_BTREE_EXCLUSIVE));
return (ret);
/*
* If the no cache flag is set, we never use the handle cache to
* store or retrieve the handle.
*/
if (!LF_ISSET(WT_BTREE_NO_CACHE)) {
TAILQ_FOREACH(btree_session, &session->btrees, q) {
btree = btree_session->btree;
if (strcmp(uri, btree->name) != 0)
continue;
if ((checkpoint == NULL && btree->checkpoint == NULL) ||
(checkpoint != NULL && btree->checkpoint != NULL &&
strcmp(checkpoint, btree->checkpoint) == 0))
break;
}
if (btree_session == NULL)
session->btree = NULL;
else {
session->btree = btree;
/*
* Try and lock the file; if we succeed, our "exclusive"
* state must match.
*/
if ((ret = __wt_session_lock_btree(
session, flags)) != WT_NOTFOUND) {
WT_ASSERT(session, ret != 0 ||
LF_ISSET(WT_BTREE_EXCLUSIVE) == F_ISSET(
session->btree, WT_BTREE_EXCLUSIVE));
return (ret);
}
ret = 0;
}
ret = 0;
}
/*
@@ -231,7 +247,7 @@ __wt_session_get_btree(WT_SESSION_IMPL *session,
ret = __wt_conn_btree_get(session, uri, checkpoint, cfg, flags));
WT_RET(ret);
if (btree_session == NULL)
if (btree_session == NULL && !LF_ISSET(WT_BTREE_NO_CACHE))
WT_RET(__wt_session_add_btree(session, NULL));
WT_ASSERT(session, LF_ISSET(WT_BTREE_LOCK_ONLY) ||


@@ -23,11 +23,9 @@ __wt_hazard_set(WT_SESSION_IMPL *session, WT_REF *ref, int *busyp
)
{
WT_BTREE *btree;
WT_CONNECTION_IMPL *conn;
WT_HAZARD *hp;
btree = session->btree;
conn = S2C(session);
*busyp = 0;
/* If a file can never be evicted, hazard references aren't required. */
@@ -48,8 +46,16 @@ __wt_hazard_set(WT_SESSION_IMPL *session, WT_REF *ref, int *busyp
* state to WT_REF_LOCKED, then flushes memory and checks the hazard
* references).
*/
for (hp = session->hazard;
hp < session->hazard + conn->hazard_size; ++hp) {
for (hp = session->hazard; ; ++hp) {
/* Expand the number of hazard references if available. */
if (hp >= session->hazard + session->hazard_size) {
if (session->hazard_size >= S2C(session)->hazard_max)
break;
WT_PUBLISH(session->hazard_size,
WT_MIN(session->hazard_size + WT_HAZARD_INCR,
S2C(session)->hazard_max));
}
if (hp->page != NULL)
continue;
@@ -114,11 +120,9 @@ void
__wt_hazard_clear(WT_SESSION_IMPL *session, WT_PAGE *page)
{
WT_BTREE *btree;
WT_CONNECTION_IMPL *conn;
WT_HAZARD *hp;
btree = session->btree;
conn = S2C(session);
/* If a file can never be evicted, hazard references aren't required. */
if (F_ISSET(btree, WT_BTREE_NO_HAZARD))
@@ -132,7 +136,7 @@ __wt_hazard_clear(WT_SESSION_IMPL *session, WT_PAGE *page)
/* Clear the caller's hazard pointer. */
for (hp = session->hazard;
hp < session->hazard + conn->hazard_size; ++hp)
hp < session->hazard + session->hazard_size; ++hp)
if (hp->page == page) {
/*
* Check to see if the page has grown too big and force
@@ -180,15 +184,12 @@ __wt_hazard_clear(WT_SESSION_IMPL *session, WT_PAGE *page)
void
__wt_hazard_close(WT_SESSION_IMPL *session)
{
WT_CONNECTION_IMPL *conn;
WT_HAZARD *hp;
int found;
conn = S2C(session);
/* Check for a set hazard reference and complain if we find one. */
for (found = 0, hp = session->hazard;
hp < session->hazard + conn->hazard_size; ++hp)
hp < session->hazard + session->hazard_size; ++hp)
if (hp->page != NULL) {
__wt_errx(session,
"session %p: hazard reference table not empty: "
@@ -212,7 +213,7 @@ __wt_hazard_close(WT_SESSION_IMPL *session)
* evicted.
*/
for (hp = session->hazard;
hp < session->hazard + conn->hazard_size; ++hp)
hp < session->hazard + session->hazard_size; ++hp)
if (hp->page != NULL)
__wt_hazard_clear(session, hp->page);
@@ -230,13 +231,10 @@ __wt_hazard_close(WT_SESSION_IMPL *session)
static void
__hazard_dump(WT_SESSION_IMPL *session)
{
WT_CONNECTION_IMPL *conn;
WT_HAZARD *hp;
conn = S2C(session);
for (hp = session->hazard;
hp < session->hazard + conn->hazard_size; ++hp)
hp < session->hazard + session->hazard_size; ++hp)
if (hp->page != NULL)
__wt_errx(session,
"session %p: hazard reference %p: %s, line %d",

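Sessions now allocate hazard_max hazard slots up front but start with hazard_size set to WT_HAZARD_INCR. __wt_hazard_set grows hazard_size in WT_HAZARD_INCR steps up to hazard_max when every in-use slot is taken, and the clear, close and dump loops scan only the first hazard_size slots. A single-threaded sketch of the growth loop follows; the increment of 10 and all sketch_ names are hypothetical. The real code publishes the new size with WT_PUBLISH because other threads inspect a session's hazard references, which this sketch does not need.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_HAZARD_INCR	10	/* Hypothetical increment. */

struct sketch_hazard {
	const void *page;
};

struct sketch_session {
	struct sketch_hazard *hazard;	/* Allocated with hazard_max slots. */
	size_t hazard_size;		/* Slots currently searched. */
	size_t hazard_max;
};

/*
 * Record a hazard reference to page in the first free slot, growing the
 * searched part of the array when it is full. Returns 0 on success, or -1
 * if all hazard_max slots are in use.
 */
static int
sketch_hazard_set(struct sketch_session *s, const void *page)
{
	struct sketch_hazard *hp;

	assert(s->hazard_size <= s->hazard_max);
	for (hp = s->hazard;; ++hp) {
		if (hp >= s->hazard + s->hazard_size) {
			if (s->hazard_size >= s->hazard_max)
				return (-1);
			s->hazard_size += SKETCH_HAZARD_INCR;
			if (s->hazard_size > s->hazard_max)
				s->hazard_size = s->hazard_max;
		}
		if (hp->page == NULL) {
			hp->page = page;
			return (0);
		}
	}
}
```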

@@ -35,13 +35,10 @@ __wt_session_dump_all(WT_SESSION_IMPL *session)
void
__wt_session_dump(WT_SESSION_IMPL *session)
{
WT_CONNECTION_IMPL *conn;
WT_CURSOR *cursor;
WT_HAZARD *hp;
int first;
conn = S2C(session);
(void)__wt_msg(session, "session: %s%s%p",
session->name == NULL ? "" : session->name,
session->name == NULL ? "" : " ", session);
@@ -55,7 +52,7 @@ __wt_session_dump(WT_SESSION_IMPL *session)
first = 0;
for (hp = session->hazard;
hp < session->hazard + conn->hazard_size; ++hp) {
hp < session->hazard + session->hazard_size; ++hp) {
if (hp->page == NULL)
continue;
if (++first == 1)


@@ -146,6 +146,8 @@ __wt_stat_alloc_connection_stats(WT_SESSION_IMPL *session, WT_CONNECTION_STATS *
stats->txn_ancient.desc = "ancient transactions";
stats->txn_begin.desc = "transactions";
stats->txn_commit.desc = "transactions committed";
stats->txn_fail_cache.desc =
"transaction failures due to cache overflow";
stats->txn_rollback.desc = "transactions rolled-back";
*statsp = stats;
@@ -177,5 +179,6 @@ __wt_stat_clear_connection_stats(WT_STATS *stats_arg)
stats->txn_ancient.v = 0;
stats->txn_begin.v = 0;
stats->txn_commit.v = 0;
stats->txn_fail_cache.v = 0;
stats->txn_rollback.v = 0;
}

@@ -74,11 +74,10 @@ __wt_txn_get_snapshot(WT_SESSION_IMPL *session, wt_txnid_t max_id)
conn = S2C(session);
txn = &session->txn;
txn_global = &conn->txn_global;
oldest_snap_min = WT_TXN_ABORTED;
do {
/* Take a copy of the current session ID. */
current_id = txn_global->current;
current_id = oldest_snap_min = txn_global->current;
/* Copy the array of concurrent transactions. */
WT_ORDERED_READ(session_cnt, conn->session_cnt);
@@ -93,6 +92,12 @@ __wt_txn_get_snapshot(WT_SESSION_IMPL *session, wt_txnid_t max_id)
else if (max_id == WT_TXN_NONE || TXNID_LT(id, max_id))
txn->snapshot[n++] = id;
}
/*
* Ensure the snapshot reads are scheduled before re-checking
* the global current ID.
*/
WT_READ_BARRIER();
} while (current_id != txn_global->current);
__txn_sort_snapshot(session, n,
@@ -116,11 +121,10 @@ __wt_txn_get_evict_snapshot(WT_SESSION_IMPL *session)
conn = S2C(session);
txn_global = &conn->txn_global;
oldest_snap_min = WT_TXN_ABORTED;
do {
/* Take a copy of the current session ID. */
current_id = txn_global->current;
current_id = oldest_snap_min = txn_global->current;
/* Walk the array of concurrent transactions. */
WT_ORDERED_READ(session_cnt, conn->session_cnt);
@@ -128,6 +132,12 @@ __wt_txn_get_evict_snapshot(WT_SESSION_IMPL *session)
if ((id = s->snap_min) != WT_TXN_NONE &&
TXNID_LT(id, oldest_snap_min))
oldest_snap_min = id;
/*
* Ensure the snapshot reads are scheduled before re-checking
* the global current ID.
*/
WT_READ_BARRIER();
} while (current_id != txn_global->current);
__txn_sort_snapshot(session, 0, oldest_snap_min, oldest_snap_min);
@@ -169,8 +179,26 @@ __wt_txn_begin(WT_SESSION_IMPL *session, const char *cfg[])
F_SET(txn, TXN_RUNNING);
do {
/* Take a copy of the current session ID. */
txn->id = txn_global->current;
/*
* Allocate a transaction ID.
*
* We use an atomic increment to ensure that we get a unique
* ID, then publish that to the global state table.
*
* If two threads race to allocate an ID, only the latest ID
* will proceed. The winning thread can be sure its snapshot
* contains all of the earlier active IDs. Threads that race
* and get an earlier ID may not appear in the snapshot,
* but they will loop and allocate a new ID before proceeding
* to make any updates.
*
* This potentially wastes transaction IDs when threads race to
* begin transactions, but that is the price we pay to keep
* this path latch free.
*/
do {
txn->id = WT_ATOMIC_ADD(txn_global->current, 1);
} while (txn->id == WT_TXN_NONE || txn->id == WT_TXN_ABORTED);
WT_PUBLISH(txn_state->id, txn->id);
/*
@@ -200,8 +228,13 @@ __wt_txn_begin(WT_SESSION_IMPL *session, const char *cfg[])
session, n, txn->id, oldest_snap_min);
txn_state->snap_min = txn->snap_min;
}
} while (!WT_ATOMIC_CAS(txn_global->current, txn->id, txn->id + 1) ||
txn->id == WT_TXN_NONE || txn->id == WT_TXN_ABORTED);
/*
* Ensure the snapshot reads are scheduled before re-checking
* the global current ID.
*/
WT_READ_BARRIER();
} while (txn->id != txn_global->current);
return (0);
}
@@ -223,7 +256,8 @@ __wt_txn_release(WT_SESSION_IMPL *session)
/* Clear the transaction's ID from the global table. */
WT_ASSERT(session, txn_state->id != WT_TXN_NONE &&
txn->id != WT_TXN_NONE);
txn_state->id = txn_state->snap_min = WT_TXN_NONE;
WT_PUBLISH(txn_state->id, WT_TXN_NONE);
txn_state->snap_min = WT_TXN_NONE;
/* Reset the transaction state to not running. */
txn->id = WT_TXN_NONE;
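
The latch-free ID allocation in __wt_txn_begin can be illustrated outside WiredTiger. This is a minimal, hypothetical sketch, not WiredTiger's code: it assumes 32-bit IDs, uses C11 `<stdatomic.h>` in place of WT_ATOMIC_ADD and WT_READ_BARRIER, and the names `global_current`, `TXN_NONE`, `TXN_ABORTED`, `txn_alloc_id` and `txn_begin` are illustrative stand-ins. Publishing the ID and copying the snapshot are elided to a comment.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical reserved IDs standing in for WT_TXN_NONE and WT_TXN_ABORTED. */
#define TXN_NONE	0U
#define TXN_ABORTED	UINT32_MAX

/* Hypothetical stand-in for txn_global->current. */
static _Atomic uint32_t global_current = TXN_NONE;

/*
 * Allocate a unique ID with an atomic increment (add-and-fetch semantics),
 * skipping the reserved values if the counter wraps.
 */
static uint32_t
txn_alloc_id(void)
{
	uint32_t id;

	do {
		id = (uint32_t)(atomic_fetch_add(&global_current, 1) + 1U);
	} while (id == TXN_NONE || id == TXN_ABORTED);
	assert(id != TXN_NONE && id != TXN_ABORTED);
	return (id);
}

/*
 * Begin a transaction: allocate an ID, take a snapshot, and retry if another
 * thread allocated a later ID in the meantime, so the ID that survives has a
 * snapshot covering every earlier active ID.
 */
static uint32_t
txn_begin(void)
{
	uint32_t id;

	do {
		id = txn_alloc_id();
		/* Publish id and copy the other sessions' active IDs here (elided). */

		/* Stand-in for WT_READ_BARRIER(): snapshot reads before the re-check. */
		atomic_thread_fence(memory_order_acquire);
	} while (id != atomic_load(&global_current));
	return (id);
}
```

A thread that loses a race simply loops and takes a fresh, larger ID, so no lock is needed; as the comment in the diff notes, the cost is that some IDs are skipped.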

@@ -511,6 +511,25 @@ __wt_checkpoint(WT_SESSION_IMPL *session, const char *cfg[])
"or verify operations");
}
/*
* If an object has never been used (in other words, if it could become
* a bulk-loaded file), then we must fake the checkpoint. This is good
* because we don't write physical checkpoint blocks for just-created
* files, but it's more than a good idea: it's required, because deleting
* a physical checkpoint requires writing the file, and fake checkpoints
* can't write the file. If you (1) create a physical checkpoint for an
* empty file which writes blocks, (2) start bulk-loading records into
* the file, (3) during the bulk-load perform another checkpoint with
* the same name; in order to keep from having two checkpoints with the
* same name you would have to use the bulk-load's fake checkpoint to
* delete a physical checkpoint, and that will end in tears.
*/
if (is_checkpoint)
if (btree->bulk_load_ok) {
track_ckpt = 0;
goto fake;
}
/*
* Mark the root page dirty to ensure something gets written.
*

@@ -1,4 +1,4 @@
INCLUDES = -I$(top_builddir) -I$(top_srcdir)/src/include
AM_CPPFLAGS = -I$(top_builddir) -I$(top_srcdir)/src/include
noinst_PROGRAMS = t
t_SOURCES = test_bloom.c

@@ -1,4 +1,4 @@
INCLUDES = -I$(top_builddir)
AM_CPPFLAGS = -I$(top_builddir)
noinst_PROGRAMS = t
t_LDADD = $(top_builddir)/libwiredtiger.la

@@ -17,7 +17,7 @@ obj_bulk(void)
if ((ret = conn->open_session(conn, NULL, NULL, &session)) != 0)
die("conn.session", ret);
if ((ret = session->create(session, uri, NULL)) != 0)
if ((ret = session->create(session, uri, config)) != 0)
if (ret != EEXIST && ret != EBUSY)
die("session.create", ret);
@@ -44,7 +44,7 @@ obj_create(void)
if ((ret = conn->open_session(conn, NULL, NULL, &session)) != 0)
die("conn.session", ret);
if ((ret = session->create(session, uri, NULL)) != 0)
if ((ret = session->create(session, uri, config)) != 0)
if (ret != EEXIST && ret != EBUSY)
die("session.create", ret);

@@ -10,6 +10,7 @@
WT_CONNECTION *conn; /* WiredTiger connection */
u_int nops; /* Operations */
const char *uri; /* Object */
const char *config; /* Object config */
static char *progname; /* Program name */
static FILE *logfp; /* Log file */
@@ -28,8 +29,11 @@ main(int argc, char *argv[])
u_int nthreads;
int ch, cnt, runs;
char *config_open;
const char **objp;
const char **confp, **objp;
const char *objs[] = { "file:__wt", "table:__wt", "lsm:__wt", NULL };
/* LSM needs configuration or it fails the minimum cache size check. */
const char *configs[] = { NULL, NULL,
"lsm_chunk_size=1m,lsm_merge_max=2,leaf_page_max=256k", NULL };
if ((progname = strrchr(argv[0], '/')) == NULL)
progname = argv[0];
@@ -78,8 +82,10 @@ main(int argc, char *argv[])
for (cnt = 1; runs == 0 || cnt <= runs; ++cnt) {
shutdown(); /* Clean up previous runs */
for (objp = objs; *objp != NULL; objp++) {
for (objp = objs, confp = configs; *objp != NULL;
objp++, confp++) {
uri = *objp;
config = *confp;
printf("%5d: %u threads on %s\n", cnt, nthreads, uri);
wt_startup(config_open);
if (fop_start(nthreads))
@@ -104,15 +110,16 @@ wt_startup(char *config_open)
NULL
};
int ret;
char config[128];
char config_buf[128];
snprintf(config, sizeof(config),
snprintf(config_buf, sizeof(config_buf),
"create,error_prefix=\"%s\",cache_size=5MB%s%s",
progname,
config_open == NULL ? "" : ",",
config_open == NULL ? "" : config_open);
if ((ret = wiredtiger_open(NULL, &event_handler, config, &conn)) != 0)
if ((ret = wiredtiger_open(
NULL, &event_handler, config_buf, &conn)) != 0)
die("wiredtiger_open", ret);
}

@@ -26,6 +26,7 @@ extern WT_CONNECTION *conn; /* WiredTiger connection */
extern u_int nops; /* Operations per thread */
extern const char *uri; /* Object */
extern const char *config; /* Object config */
#if defined (__GNUC__)
void die(const char *, int) __attribute__((noreturn));

@@ -1,5 +1,5 @@
BDB = $(top_builddir)/db
INCLUDES = -I$(top_builddir) -I$(BDB)
AM_CPPFLAGS = -I$(top_builddir) -I$(BDB)
noinst_PROGRAMS = t
noinst_SCRIPTS = s_dumpcmp

@@ -75,14 +75,14 @@ wts_open(void)
die(ret, "connection.open_session");
maxintlpage = 1U << g.c_intl_page_max;
/* Make sure at least 2 internal page per thread can fix in cache. */
/* Make sure at least 2 internal pages per thread can fit in cache. */
while (2 * g.c_threads * maxintlpage > g.c_cache << 20)
maxintlpage >>= 1;
maxintlitem = MMRAND(maxintlpage / 50, maxintlpage / 40);
if (maxintlitem < 40)
maxintlitem = 40;
maxleafpage = 1U << g.c_leaf_page_max;
/* Make sure at least one leaf page per thread can fix in cache. */
/* Make sure at least one leaf page per thread can fit in cache. */
while (g.c_threads * (maxintlpage + maxleafpage) > g.c_cache << 20)
maxleafpage >>= 1;
maxleafitem = MMRAND(maxleafpage / 50, maxleafpage / 40);
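
The two sizing loops in wts_open() shrink page sizes by halving until the per-thread working set fits in the cache. A minimal sketch of the same arithmetic as a standalone function; `fit_pages` and its parameters are hypothetical, it assumes `g.c_cache` is a size in megabytes (as the `<< 20` in the original suggests), and it does not reproduce the MMRAND item-size selection.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the loops in wts_open(): cache_mb is assumed
 * to be in megabytes, and the starting page sizes are powers of two given
 * by their shift counts.
 */
static void
fit_pages(uint64_t threads, uint64_t cache_mb,
    unsigned intl_shift, unsigned leaf_shift,
    uint64_t *intlp, uint64_t *leafp)
{
	uint64_t cache = cache_mb << 20;	/* cache size in bytes */
	uint64_t intl = (uint64_t)1 << intl_shift;
	uint64_t leaf = (uint64_t)1 << leaf_shift;

	/* Halve until two internal pages per thread fit in the cache. */
	while (2 * threads * intl > cache)
		intl >>= 1;

	/* Halve until one internal plus one leaf page per thread fit. */
	while (threads * (intl + leaf) > cache)
		leaf >>= 1;

	*intlp = intl;
	*leafp = leaf;
}
```

For example, with 64 threads, a 1 MB cache, a 128 KB starting internal page and a 1 MB starting leaf page, the internal page shrinks to 8 KB (2 × 64 × 8 KB = 1 MB) and the leaf page then shrinks to 8 KB (64 × (8 KB + 8 KB) = 1 MB).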

@@ -1,4 +1,4 @@
INCLUDES = -I$(top_builddir) -I$(top_srcdir)/src/include
AM_CPPFLAGS = -I$(top_builddir) -I$(top_srcdir)/src/include
noinst_PROGRAMS = t
t_SOURCES = salvage.c

test/suite/test_bug003.py

@@ -0,0 +1,58 @@
#!/usr/bin/env python
#
# Public Domain 2008-2012 WiredTiger, Inc.
#
# This is free and unencumbered software released into the public domain.
#
# Anyone is free to copy, modify, publish, use, compile, sell, or
# distribute this software, either in source code form or as a compiled
# binary, for any purpose, commercial or non-commercial, and by any
# means.
#
# In jurisdictions that recognize copyright laws, the author or authors
# of this software dedicate any and all copyright interest in the
# software to the public domain. We make this dedication for the benefit
# of the public at large and to the detriment of our heirs and
# successors. We intend this dedication to be an overt act of
# relinquishment in perpetuity of all present and future rights to this
# software under copyright law.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
# EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
# MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
# IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
# OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
# ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
# OTHER DEALINGS IN THE SOFTWARE.
#
# test_bug003.py
# Regression tests.
import wiredtiger, wttest
from wtscenario import multiply_scenarios, number_scenarios
# Regression tests.
class test_bug003(wttest.WiredTigerTestCase):
types = [
('file', dict(uri='file:data')),
('table', dict(uri='table:data')),
]
ckpt = [
('no', dict(name=0)),
('yes', dict(name=1)),
]
scenarios = number_scenarios(multiply_scenarios('.', types, ckpt))
# Confirm bulk-load isn't stopped by checkpoints.
def test_bug003(self):
self.session.create(self.uri, "key_format=S,value_format=S")
if self.name == 1:
self.session.checkpoint("name=ckpt")
else:
self.session.checkpoint()
cursor = self.session.open_cursor(self.uri, None, "bulk")
if __name__ == '__main__':
wttest.run()

@@ -48,7 +48,7 @@ class test_cursor03(TestCursorTracker):
('col.val10k', dict(tablekind='col', keysize=None, valsize=[10, 10000], uri='table')),
('row.keyval10k', dict(tablekind='row', keysize=[10,10000], valsize=[10, 10000], uri='table')),
], [
('count1000', dict(tablecount=1000,cache_size=20*1024*1024)),
('count1000', dict(tablecount=1000,cache_size=25*1024*1024)),
('count10000', dict(tablecount=10000, cache_size=64*1024*1024))
])

@@ -137,14 +137,6 @@ class test_cursor04(wttest.WiredTigerTestCase):
cursor.set_key(self.genkey(self.nentries))
self.assertEqual(cursor.search(), wiredtiger.WT_NOTFOUND)
# The key/value should be cleared on NOTFOUND
keymsg = 'cursor.get_key: requires key be set: Invalid argument\n'
valuemsg = 'cursor.get_value: requires value be set: Invalid argument\n'
self.assertRaisesWithMessage(wiredtiger.WiredTigerError,
cursor.get_key, keymsg)
self.assertRaisesWithMessage(wiredtiger.WiredTigerError,
cursor.get_value, valuemsg)
# 2. Calling search_near for a value beyond the end
cursor.set_key(self.genkey(self.nentries))
cmp = cursor.search_near()

@@ -0,0 +1,69 @@
#!/usr/bin/env python
#
# Public Domain 2008-2012 WiredTiger, Inc.
#
# This is free and unencumbered software released into the public domain.
#
# Anyone is free to copy, modify, publish, use, compile, sell, or
# distribute this software, either in source code form or as a compiled
# binary, for any purpose, commercial or non-commercial, and by any
# means.
#
# In jurisdictions that recognize copyright laws, the author or authors
# of this software dedicate any and all copyright interest in the
# software to the public domain. We make this dedication for the benefit
# of the public at large and to the detriment of our heirs and
# successors. We intend this dedication to be an overt act of
# relinquishment in perpetuity of all present and future rights to this
# software under copyright law.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
# EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
# MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
# IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
# OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
# ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
# OTHER DEALINGS IN THE SOFTWARE.
import wiredtiger, wttest
from helper import simple_populate, key_populate, value_populate
# Test no-cache flag.
class test_no_cache(wttest.WiredTigerTestCase):
name = 'no_cache'
scenarios = [
('file', dict(type='file:')),
('table', dict(type='table:'))
]
# Create an object, and run an uncached cursor through it.
def test_no_cache(self):
uri = self.type + self.name
simple_populate(self, uri, 'key_format=S,leaf_page_max=512', 10000)
cursor = self.session.open_cursor(uri, None, "no_cache")
i = 0
for key,val in cursor:
i += 1
self.assertEqual(key, key_populate(cursor, i))
self.assertEqual(val, value_populate(cursor, i))
# Create an object, and run an uncached cursor through part of it to
# confirm that we release the full stack on an uncached cursor.
def test_no_cache_partial(self):
uri = self.type + self.name
simple_populate(self, uri, 'key_format=S,leaf_page_max=512', 10000)
cursor = self.session.open_cursor(uri, None, "no_cache")
i = 0
for key,val in cursor:
i += 1
if i > 2000:
break;
self.assertEqual(key, key_populate(cursor, i))
self.assertEqual(val, value_populate(cursor, i))
cursor.close()
if __name__ == '__main__':
wttest.run()

@@ -1,4 +1,4 @@
INCLUDES = -I$(top_builddir)
AM_CPPFLAGS = -I$(top_builddir)
noinst_PROGRAMS = t
t_LDADD = $(top_builddir)/libwiredtiger.la