- These functions are declared twice in slow-hash.c. Remove one of the copies.
- The declarations have the wrong return type, should be void, not int.
Function definitions here: 1e74586ee9/src/crypto/aesb.c (L151-L180)
Test plan: make release-test
9137ad2c blockchain: add a testnet v9 a day after v8 (moneromooo-monero)
ac4f71c2 wallet2: bump testnet rollback to account for coming reorg (moneromooo-monero)
8f418a6d bulletproofs: #include <openssl/bn.h> (moneromooo-monero)
2bf63650 bulletproofs: speed up the latest changes a bit (moneromooo-monero)
044dff5a bulletproofs: scale points by 8 to ensure subgroup validity (moneromooo-monero)
c83012c4 bulletproofs: match aggregated verification to sarang's latest prototype (moneromooo-monero)
ce0c7432 performance_tests: add padded bulletproof construction (moneromooo-monero)
1224e53b core_tests: add a test for 4-aggregated BP verification (moneromooo-monero)
0e6ed559 fuzz_tests: add a bulletproof fuzz test (moneromooo-monero)
463434d1 more comprehensive test for ge_p3 comparison to identity/point at infinity (moneromooo-monero)
d0a0565f unit_tests: add a few more multiexp unit tests (moneromooo-monero)
6526d87f core_tests: add a test for a tx with empty bulletproof (moneromooo-monero)
a129bbd9 multiexp: fix maxscalar off by one (moneromooo-monero)
7ed496cc ringct: error out when hashToPoint* returns the point at infinity (moneromooo-monero)
d1591853 cryptonote_basic: check output type before using it (moneromooo-monero)
61632dc1 ringct: prevent a potential very large allocation (moneromooo-monero)
a4317e61 crypto: some paranoid checks in generate_signature/check_signature (moneromooo-monero)
7434df1c crypto: never return zero in random32_unbiased (moneromooo-monero)
0825e974 multiexp: fix wrong Bos-Coster result for 1 non trivial input (moneromooo-monero)
a1359ad4 Check inputs to addKeys are in range (moneromooo-monero)
fe0fa3b9 bulletproofs: reject x, y, z, or w[i] being zero (moneromooo-monero)
5ffb2ff9 v8: per byte fee, pad bulletproofs, fixed 11 ring size (moneromooo-monero)
869b3bf8 bulletproofs: a few fixes from the Kudelski review (moneromooo-monero)
c4291762 bulletproofs: reject points not in the main subgroup (moneromooo-monero)
15697177 bulletproofs: speed up a few multiplies using existing Hi cache (moneromooo-monero)
0b05a0fa Add Pippenger cache and limit Straus cache size (moneromooo-monero)
51eb3bdc add pippenger unit tests (moneromooo-monero)
b17b8db3 performance_tests: add stats and loop count multiplier options (moneromooo-monero)
7314d919 perf_timer: split timer class into a base one and a logging one (moneromooo-monero)
d126a02b performance_tests: add aggregated bulletproof tx verification (moneromooo-monero)
263431c4 Pippenger multiexp (moneromooo-monero)
1ed0ed4d multiexp: cut down on memory allocations (moneromooo-monero)
1b867e7f precalc the ge_p3 representation of H (moneromooo-monero)
ef56529f performance_tests: document the tested bulletproof layouts (moneromooo-monero)
30111780 unit_tests: a couple more bulletproof unit tests for gamma (moneromooo-monero)
c444b1b2 require canonical multi output bulletproof layout (moneromooo-monero)
7e67c52f Add a define for the max number of bulletproof multi-outputs (moneromooo-monero)
2a8fcb42 Bulletproof aggregated verification and tests (moneromooo-monero)
126196b0 multiexp: some speedups (moneromooo-monero)
71d67bda aligned: aligned memory alloc/realloc/free (moneromooo-monero)
cb9ecab1 performance_tests: add signature generation/verification (moneromooo-monero)
bacf0a1e bulletproofs: add aggregated verification (moneromooo-monero)
e895c3de make straus cached mode thread safe, and add tests for it (moneromooo-monero)
7f48bf05 multiexp: bos coster now works for just one point (moneromooo-monero)
9ce9f8ca bulletproofs: add multi output bulletproofs to rct (moneromooo-monero)
f34e2e20 performance_tests: add tx checking tests with more than 2 outputs (moneromooo-monero)
0793184b performance_tests: add a --verbose flag, and default to terse (moneromooo-monero)
939bc223 add Straus multiexp (moneromooo-monero)
9ff6e6a0 ringct: add bos coster multiexp (moneromooo-monero)
e9164bb3 bulletproofs: misc optimizations (moneromooo-monero)
112f32f0 performance_tests: add crypto ops (moneromooo-monero)
f5d7b993 performance_tests: add bulletproofs (moneromooo-monero)
8f4ce989 performance_tests: add RingCT MLSAG gen/ver tests (moneromooo-monero)
1aa10c43 performance_tests: add (Borromean) range proofs (moneromooo-monero)
aacfd6e3 bulletproofs: multi-output bulletproofs (moneromooo-monero)
cb1cc757 performance_tests: don't override log level to 0 (moneromooo-monero)
- fix integer overflow in n_bulletproof_amounts
- check input scalars are in range
- remove use of environment variable to tweak straus performance
- do not use implementation defined signed shift for signum
Contains two modifications to improve ASIC resistance: shuffle and integer math.
Shuffle makes use of the whole 64-byte cache line instead of 16 bytes only, making Cryptonight 4 times more demanding for memory bandwidth.
Integer math adds 64:32 bit integer division followed by 64 bit integer square root, adding large and unavoidable computational latency to the main loop.
More details and performance numbers: https://github.com/SChernykh/xmr-stak-cpu/blob/master/README.md
63e342b crypto: move null_pkey/null_skey to the cpp file (moneromooo-monero)
0496c7c crypto: do not use boost::value_initialized to init null skey/pkey (moneromooo-monero)
hash: add prehashed version cn_slow_hash_prehashed
slow-hash: let cn_slow_hash take 4th parameter for deciding prehashed or not
slow-hash: add support for prehashed version for the other 3 platforms
When #3303 was merged, a cyclic dependency chain was generated:
libdevice <- libcncrypto <- libringct <- libdevice
This was because libdevice needs access to a set of basic crypto operations
implemented in libringct such as scalarmultBase(), while libringct also needs
access to abstracted crypto operations implemented in libdevice such as
ecdhEncode(). To untangle this cyclic dependency chain, this patch splits libringct
into libringct_basic and libringct, where the basic crypto ops previously in
libringct are moved into libringct_basic. The cyclic dependency is now resolved
thanks to this separation:
libcncrypto <- libringct_basic <- libdevice <- libcryptonote_basic <- libringct
This eliminates the need for crypto_device.cpp and rctOps_device.cpp.
Also, many abstracted interfaces of hw::device such as encrypt_payment_id() and
get_subaddress_secret_key() were previously implemented in libcryptonote_basic
(cryptonote_format_utils.cpp) and were then called from hw::core::device_default,
which is odd because libdevice is supposed to be independent of libcryptonote_basic.
Therefore, those functions were moved to device_default.cpp.
The basic approach it to delegate all sensitive data (master key, secret
ephemeral key, key derivation, ....) and related operations to the device.
As device has low memory, it does not keep itself the values
(except for view/spend keys) but once computed there are encrypted (with AES
are equivalent) and return back to monero-wallet-cli. When they need to be
manipulated by the device, they are decrypted on receive.
Moreover, using the client for storing the value in encrypted form limits
the modification in the client code. Those values are transfered from one
C-structure to another one as previously.
The code modification has been done with the wishes to be open to any
other hardware wallet. To achieve that a C++ class hw::Device has been
introduced. Two initial implementations are provided: the "default", which
remaps all calls to initial Monero code, and the "Ledger", which delegates
all calls to Ledger device.
e4646379 keccak: fix mdlen bounds sanity checking (moneromooo-monero)
2e3e90ac pass large parameters by const ref, not value (moneromooo-monero)
61defd89 blockchain: sanity check number of precomputed hash of hash blocks (moneromooo-monero)
9af6b2d1 ringct: fix infinite loop in unused h2b function (moneromooo-monero)
8cea8d0c simplewallet: double check a new multisig wallet is multisig (moneromooo-monero)
9b98a6ac threadpool: catch exceptions in dtor, to avoid terminate (moneromooo-monero)
24803ed9 blockchain_export: fix buffer overflow in exporter (moneromooo-monero)
f3f7da62 perf_timer: rewrite to make it clear there is no division by zero (moneromooo-monero)
c6ea3df0 performance_tests: remove add_arg call stray extra param (moneromooo-monero)
fa6b4566 fuzz_tests: fix an uninitialized var in setup (moneromooo-monero)
03887f11 keccak: fix sanity check bounds test (moneromooo-monero)
ad11db91 blockchain_db: initialize m_open in base class ctor (moneromooo-monero)
bece67f9 miner: restore std::cout precision after modification (moneromooo-monero)
1aabd14c db_lmdb: check hard fork info drop succeeded (moneromooo-monero)
ge_scalarmult_p3
ge_double_scalarmult_precomp_vartime2_p3
ge_double_scalarmult_base_vartime_p3
This makes it possible to reuse the result without having to
convert back to unsigned char[32] and back to ge types.
Partially implements #74.
Securely erases keys from memory after they are no longer needed. Might have a
performance impact, which I haven't measured (perf measurements aren't
generally reliable on laptops).
Thanks to @stoffu for the suggestion to specialize the pod_to_hex/hex_to_pod
functions. Using overloads + SFINAE instead generalizes it so other types can
be marked as scrubbed without adding more boilerplate.
3dffe71b new wipeable_string class to replace std::string passphrases (moneromooo-monero)
7a2a5741 utils: initialize easylogging++ in on_startup (moneromooo-monero)
54950829 use memwipe in a few relevant places (moneromooo-monero)
000666ff add a memwipe function (moneromooo-monero)
CryptoNight does exactly 524,288 iterations over the scratchpad as defined in CNS008, saying 500,000 could be confusing. I know its meant to give a rough idea (around 500k) to the reader but if you are reading the code, might as well know the exact number.
View wallets do not have the spend secret key, and are thus
unable to derive key images for incoming outputs. Moreover,
a previous patch set key images to zero as a means to mark
an output as having an unknown key image, so they could be
filled in when importing key images at a later time. That
later patch caused spurious collisions. We now use public
keys to detect duplicate outputs. Public keys obtained from
the blockchain are checked to be identical to the ones
derived locally, so can't be spoofed.
Keep the immediate direct deps at the library that depends on them,
declare deps as PUBLIC so that targets that link against that library
get the library's deps as transitive deps.
Break dep cycle between blockchain_db <-> crytonote_core.
No code refactoring, just hide cycle from cmake so that
it doesn't complain (cycles are allowed only between
static libs, not shared libs).
This is in preparation for supproting BUILD_SHARED_LIBS cmake
built-in option for building internal libs as shared.
This was disabled earlier as part of diagnosing failing tests
on ARM, which turned out to be due to aliasing, fixed by
adding -fno-strict-aliasing. So, re-enabling it back.
This allows the key to be not the same for two outputs sent to
the same address (eg, if you pay yourself, and also get change
back). Also remove the key amounts lists and return parameters
since we don't actually generate random ones, so we don't need
to save them as we can recalculate them when needed if we have
the correct keys.
Setting to no or 0 also works. If set, any other value enables it.
Useful for running with valgrind in cases where it fails at
properly implementing AES-NI.
0a4bc84 Added ref10 shen_ed25519_ref code, which includes code that can replace crypto-ops with a version straight from Bernstein's ref 10 (ShenNoether)
0d70fdc revert to 776b4fc91a (ShenNoether)
b01f286 Added shen_ed25519_ref to crypto ops subfolder, the point is to directly have bitmonero's crypto code come from bernstein et al's ref 10 code (ShenNoether)
Pros:
- smaller on the blockchain
- shorter integrated addresses
Cons:
- less sparseness
- less ability to embed actual information
The boolean argument to encrypt payment ids is now gone from the
RPC calls, since the decision is made based on the length of the
payment id passed.
Bockchain:
1. Optim: Multi-thread long-hash computation when encountering groups of blocks.
2. Optim: Cache verified txs and return result from cache instead of re-checking whenever possible.
3. Optim: Preload output-keys when encoutering groups of blocks. Sort by amount and global-index before bulk querying database and multi-thread when possible.
4. Optim: Disable double spend check on block verification, double spend is already detected when trying to add blocks.
5. Optim: Multi-thread signature computation whenever possible.
6. Patch: Disable locking (recursive mutex) on called functions from check_tx_inputs which causes slowdowns (only seems to happen on ubuntu/VMs??? Reason: TBD)
7. Optim: Removed looped full-tx hash computation when retrieving transactions from pool (???).
8. Optim: Cache difficulty/timestamps (735 blocks) for next-difficulty calculations so that only 2 db reads per new block is needed when a new block arrives (instead of 1470 reads).
Berkeley-DB:
1. Fix: 32-bit data errors causing wrong output global indices and failure to send blocks to peers (etc).
2. Fix: Unable to pop blocks on reorganize due to transaction errors.
3. Patch: Large number of transaction aborts when running multi-threaded bulk queries.
4. Patch: Insufficient locks error when running full sync.
5. Patch: Incorrect db stats when returning from an immediate exit from "pop block" operation.
6. Optim: Add bulk queries to get output global indices.
7. Optim: Modified output_keys table to store public_key+unlock_time+height for single transaction lookup (vs 3)
8. Optim: Used output_keys table retrieve public_keys instead of going through output_amounts->output_txs+output_indices->txs->output:public_key
9. Optim: Added thread-safe buffers used when multi-threading bulk queries.
10. Optim: Added support for nosync/write_nosync options for improved performance (*see --db-sync-mode option for details)
11. Mod: Added checkpoint thread and auto-remove-logs option.
12. *Now usable on 32-bit systems like RPI2.
LMDB:
1. Optim: Added custom comparison for 256-bit key tables (minor speed-up, TBD: get actual effect)
2. Optim: Modified output_keys table to store public_key+unlock_time+height for single transaction lookup (vs 3)
3. Optim: Used output_keys table retrieve public_keys instead of going through output_amounts->output_txs+output_indices->txs->output:public_key
4. Optim: Added support for sync/writemap options for improved performance (*see --db-sync-mode option for details)
5. Mod: Auto resize to +1GB instead of multiplier x1.5
ETC:
1. Minor optimizations for slow-hash for ARM (RPI2). Incomplete.
2. Fix: 32-bit saturation bug when computing next difficulty on large blocks.
[PENDING ISSUES]
1. Berkely db has a very slow "pop-block" operation. This is very noticeable on the RPI2 as it sometimes takes > 10 MINUTES to pop a block during reorganization.
This does not happen very often however, most reorgs seem to take a few seconds but it possibly depends on the number of outputs present. TBD.
2. Berkeley db, possible bug "unable to allocate memory". TBD.
[NEW OPTIONS] (*Currently all enabled for testing purposes)
1. --fast-block-sync arg=[0:1] (default: 1)
a. 0 = Compute long hash per block (may take a while depending on CPU)
b. 1 = Skip long-hash and verify blocks based on embedded known good block hashes (faster, minimal CPU dependence)
2. --db-sync-mode arg=[[safe|fast|fastest]:[sync|async]:[nblocks_per_sync]] (default: fastest:async:1000)
a. safe = fdatasync/fsync (or equivalent) per stored block. Very slow, but safest option to protect against power-out/crash conditions.
b. fast/fastest = Enables asynchronous fdatasync/fsync (or equivalent). Useful for battery operated devices or STABLE systems with UPS and/or systems with battery backed write cache/solid state cache.
Fast - Write meta-data but defer data flush.
Fastest - Defer meta-data and data flush.
Sync - Flush data after nblocks_per_sync and wait.
Async - Flush data after nblocks_per_sync but do not wait for the operation to finish.
3. --prep-blocks-threads arg=[n] (default: 4 or system max threads, whichever is lower)
Max number of threads to use when computing long-hash in groups.
4. --show-time-stats arg=[0:1] (default: 1)
Show benchmark related time stats.
5. --db-auto-remove-logs arg=[0:1] (default: 1)
For berkeley-db only. Auto remove logs if enabled.
**Note: lmdb and berkeley-db have changes to the tables and are not compatible with official git head version.
At the moment, you need a full resync to use this optimized version.
[PERFORMANCE COMPARISON]
**Some figures are approximations only.
Using a baseline machine of an i7-2600K+SSD+(with full pow computation):
1. The optimized lmdb/blockhain core can process blocks up to 585K for ~1.25 hours + download time, so it usually takes 2.5 hours to sync the full chain.
2. The current head with memory can process blocks up to 585K for ~4.2 hours + download time, so it usually takes 5.5 hours to sync the full chain.
3. The current head with lmdb can process blocks up to 585K for ~32 hours + download time and usually takes 36 hours to sync the full chain.
Averate procesing times (with full pow computation):
lmdb-optimized:
1. tx_ave = 2.5 ms / tx
2. block_ave = 5.87 ms / block
memory-official-repo:
1. tx_ave = 8.85 ms / tx
2. block_ave = 19.68 ms / block
lmdb-official-repo (0f4a036437)
1. tx_ave = 47.8 ms / tx
2. block_ave = 64.2 ms / block
**Note: The following data denotes processing times only (does not include p2p download time)
lmdb-optimized processing times (with full pow computation):
1. Desktop, Quad-core / 8-threads 2600k (8Mb) - 1.25 hours processing time (--db-sync-mode=fastest:async:1000).
2. Laptop, Dual-core / 4-threads U4200 (3Mb) - 4.90 hours processing time (--db-sync-mode=fastest:async:1000).
3. Embedded, Quad-core / 4-threads Z3735F (2x1Mb) - 12.0 hours processing time (--db-sync-mode=fastest:async:1000).
lmdb-optimized processing times (with per-block-checkpoint)
1. Desktop, Quad-core / 8-threads 2600k (8Mb) - 10 minutes processing time (--db-sync-mode=fastest:async:1000).
berkeley-db optimized processing times (with full pow computation)
1. Desktop, Quad-core / 8-threads 2600k (8Mb) - 1.8 hours processing time (--db-sync-mode=fastest:async:1000).
2. RPI2. Improved from estimated 3 months(???) into 2.5 days (*Need 2AMP supply + Clock:1Ghz + [usb+ssd] to achieve this speed) (--db-sync-mode=fastest:async:1000).
berkeley-db optimized processing times (with per-block-checkpoint)
1. RPI2. 12-15 hours (*Need 2AMP supply + Clock:1Ghz + [usb+ssd] to achieve this speed) (--db-sync-mode=fastest:async:1000).