summaryrefslogtreecommitdiffhomepage
path: root/src/hash.c
AgeCommit message (Collapse)Author
2021-11-04hash.c: avoid `mrb_obj_id` to get the hash value if possible.Yukihiro "Matz" Matsumoto
2021-10-24Make `mrb_static_assert()` a variable argumentdearblue
`mrb_static_assert()` extends the macro function to take one or two arguments. If the argument is other than that, an error will occur. References: - static_assert のメッセージ省略を許可 - cpprefjp C++日本語リファレンス https://cpprefjp.github.io/lang/cpp17/extending_static_assert.html - c - Overloading Macro on Number of Arguments - Stack Overflow https://stackoverflow.com/a/11763277
2021-10-12Support Ruby3.0 keyword arguments.Yukihiro "Matz" Matsumoto
The Difference Since Ruby1.9, the keyword arguments were emulated by Ruby using the hash object at the bottom of the arguments. But we have gradually moved toward keyword arguments separated from normal (positinal) arguments. At the same time, we value compatibility, so that Ruby3.0 keyword arguments are somewhat compromise. Basically, keyword arguments are separated from positional arguments, except when the method does not take any formal keyword arguments, given keyword arguments (packed in the hash object) are considered as the last argument. And we also allow non symbol keys in the keyword arguments. In that case, those keys are just passed in the `**` hash (or raise `ArgumentError` for unknown keys). The Instruction Changes We have changed `OP_SEND` instruction. `OP_SEND` instruction used to take 3 operands, the register, the symbol, the number of (positional) arguments. The meaning of the third operand has been changed. It is now considered as `n|(nk<<4)`, where `n` is the number of positional arguments, and `nk` is the number of keyword arguments, both occupies 4 bits in the operand. The number `15` in both `n` and `nk` means variable sized arguments are packed in the object. Positional arguments will be packed in the array, and keyword arguments will be packed in the hash object. That means arguments more than 14 values are always packed in the object. Arguments information for other instructions (`OP_SENDB` and `OP_SUPER`) are also changed. It works as the third operand of `OP_SEND`. the difference between `OP_SEND` and `OP_SENDB` is just trivial. It assigns `nil` to the block hidden arguments (right after arguments). The instruction `OP_SENDV` and `OP_SENDVB` are removed. Those instructions are replaced by `OP_SEND` and `OP_SENDB` respectively with the `15` (variable sized) argument information. Calling Convention When calling a method, the stack elements shall be in the order of the receiver of the method, positional arguments, keyword arguments and the block argument. If the number of positional or keyword arugument (`n` or `nk`) is zero, corresponding arguments will be empty. So when `n=0` and `nk=0` the stack layout (from bottom to top) will be: +-----------------------+ | recv | block (or nil) | +-----------------------+ The last elements `block` should be explicitly filled before `OP_SEND` or assigned to `nil` by `OP_SENDB` internally. In other words, the following have exactly same behavior: OP_SENDB clears `block` implicitly: ``` OP_SENDB reg sym 0 ``` OP_SEND clears `block` implicitly: ``` OP_LOADNIL R2 OP_SEND R2 sym 0 ``` When calling a method with only positional arguments (n=0..14) without keyword arguments, the stack layout will be like following: +--------------------------------------------+ | recv | arg1 | ... | arg_n | block (or nil) | +--------------------------------------------+ When calling a method with arguments packed in the array (n=15) which means argument splat (*) is used in the actual arguments, or more than 14 arguments are passed the stack layout will be like following: +-------------------------------+ | recv | array | block (or nil) | +-------------------------------+ The number of the actual arguments is determined by the length of the argument array. When keyword arguments are given (nk>0), keyword arguments are passed between positional arguments and the block argument. For example, when we pass one positional argument `1` and one keyword argument `a: 2`, the stack layout will be like: +------------------------------------+ | recv | 1 | :a | 2 | block (or nil) | +------------------------------------+ Note that keyword arguments consume `2*nk` elements in the stack when `nk=0..14` (unpacked). When calling a method with keyword arguments packed in the hash object (nk=15) which means keyword argument splat (**) is used or more than 14 keyword arguments in the actual arguments, the stack layout will be like: +------------------------------+ | recv | hash | block (or nil) | +------------------------------+ Note for mruby/c When mruby/c authors try to support new keyword arguments, they need to handle the new meaning of the argument information operand. If they choose not to support keyword arguments in mruby/c, it just raise error when `nk` (taken by `(c>>4)&0xf`) is not zero. And combine `OP_SENDV` behavior with `OP_SEND` when `n` is `15`. If they want to support keyword arguments seriously, contact me at <[email protected]> or `@yukihiro_matz`. I can help you.
2021-07-10Update internal methods not to be listed in backtraces.Yukihiro "Matz" Matsumoto
- String#__lines - Array#__ary_eq - Array#__ary_cmp - Hash#__delete - Kernel#__case_eqq - Integer#__coerce_step_counter
2021-06-20Added `MRB_OBJ_ALLOC()` macro that does not require a castdearblue
The `MRB_OBJ_ALLOC()` macro function returns a pointer of the type corresponding to the constant literal defined in `enum mrb_vtype`.
2021-03-17hash.c: `Hash#shift` to return `nil` when a hash is empty.Yukihiro "Matz" Matsumoto
It used to be return the default value if available, but it should ignore the default value for behavior consistency. CRuby will adopt this behavior too in the future. [ruby-bugs:16908]
2021-03-08ISO C99 doesn't support unnamed unions; fix #5354Yukihiro "Matz" Matsumoto
2021-02-12Avoid possibility of reading uninitialized areas in `h_check_modified()`KOBAYASHI Shuji
In `h_check_modified()`, in the case of `MRB_NO_BOXING`, `ht_ea()` or `ht_ea_capa()` for AR may read uninitialized area. Therefore, do not use those macros for AR in `MRB_NO_BOXING` (but in the case of `MRB_64BIT`, `ht_ea_capa()` is the same as `ar_ea_capa()`, so use it). fix #5332
2021-02-10Fix heap-buffer-overflow for small `Hash` (HT) in `Hash#rehash`KOBAYASHI Shuji
### Example ##### example.rb ```ruby h = {} (1..17).each{h[_1] = _1} (1..16).each{h.delete(_1)} h.rehash ``` ##### ASAN report ```console $ bin/mruby example.rb ==52587==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000006998 at pc 0x55a29cddf96b bp 0x7fff7b1b1720 sp 0x7fff7b1b1710 READ of size 4 at 0x602000006998 thread T0 #0 0x55a29cddf96a in ib_it_next /mruby/src/hash.c:639 #1 0x55a29cde2ca2 in ht_rehash /mruby/src/hash.c:900 #2 0x55a29cde379f in h_rehash /mruby/src/hash.c:996 #3 0x55a29cde7f3d in mrb_hash_rehash /mruby/src/hash.c:1735 #4 0x55a29ce77b62 in mrb_vm_exec /mruby/src/vm.c:1451 #5 0x55a29ce5fa88 in mrb_vm_run /mruby/src/vm.c:981 #6 0x55a29ceb87e1 in mrb_top_run /mruby/src/vm.c:2874 #7 0x55a29cf36bdf in mrb_load_exec mrbgems/mruby-compiler/core/parse.y:6805 #8 0x55a29cf36f25 in mrb_load_detect_file_cxt mrbgems/mruby-compiler/core/parse.y:6848 #9 0x55a29cdba0a2 in main /mruby/mrbgems/mruby-bin-mruby/tools/mruby/mruby.c:347 #10 0x7f24ef43b0b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2) #11 0x55a29cdb4a6d in _start (/mruby/bin/mruby+0x2a3a6d) 0x602000006998 is located 0 bytes to the right of 8-byte region [0x602000006990,0x602000006998) allocated by thread T0 here: #0 0x7f24f01cfffe in __interceptor_realloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10dffe) #1 0x55a29ceb9440 in mrb_default_allocf /mruby/src/state.c:68 #2 0x55a29cdba747 in mrb_realloc_simple /mruby/src/gc.c:228 #3 0x55a29cdba928 in mrb_realloc /mruby/src/gc.c:242 #4 0x55a29cde12e5 in ht_init /mruby/src/hash.c:749 #5 0x55a29cde2b8e in ht_rehash /mruby/src/hash.c:897 #6 0x55a29cde379f in h_rehash /mruby/src/hash.c:996 #7 0x55a29cde7f3d in mrb_hash_rehash /mruby/src/hash.c:1735 #8 0x55a29ce77b62 in mrb_vm_exec /mruby/src/vm.c:1451 #9 0x55a29ce5fa88 in mrb_vm_run /mruby/src/vm.c:981 #10 0x55a29ceb87e1 in mrb_top_run /mruby/src/vm.c:2874 #11 0x55a29cf36bdf in mrb_load_exec mrbgems/mruby-compiler/core/parse.y:6805 #12 0x55a29cf36f25 in mrb_load_detect_file_cxt mrbgems/mruby-compiler/core/parse.y:6848 #13 0x55a29cdba0a2 in main /mruby/mrbgems/mruby-bin-mruby/tools/mruby/mruby.c:347 #14 0x7f24ef43b0b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2) ```
2021-01-26Revert "Minimize the changes in #5277"Yukihiro "Matz" Matsumoto
This reverts commit dc51d89ac22acc60b9bfeed87115863565b74085.
2021-01-22Minimize the changes in #5277Yukihiro "Matz" Matsumoto
Instead of including `mruby/presym.h` everywhere, we provided the fallback `mruby/presym.inc` under `include/mruby` directory, and specify `-I<build-dir>/include` before `-I<top-dir>/include` in `presym.rake`. So even when someone drops `-I<build-dir>/include` in compiler options, it just compiles without failure.
2021-01-21Merge branch 'avoid-including-presym.inc-in-existing-header-files' of ↵Yukihiro "Matz" Matsumoto
https://github.com/shuujii/mruby into shuujii-avoid-including-presym.inc-in-existing-header-files
2021-01-18Fix that Hash may not contain any empty bucketsKOBAYASHI Shuji
The Hash implementation assumed that there were always empty buckets, but sometimes there were only active or deleted buckets (no empty buckets). Therefore, fix it so that this situation does not occur. ### Example ```ruby # example.rb class A attr_reader :v def initialize(v) @v = v end def ==(o) @v == o.v end def hash; @v end def to_s; "#{self.class}[#{@v}]" end alias eql? == alias inspect to_s end keys = (0..31).map{A.new(_1)} h = {} (0..16).each{h[keys[_1]] = _1} (17..31).each do k = keys[_1] h[k] = _1 h.delete(k) end p h.keys ``` #### Before this patch: ```console $ bin/mruby example.rb [A[0], A[1], A[2], A[3], A[4], A[5], A[6], A[7], A[8], A[9], A[10], A[11], A[12], A[13], A[14], A[15], A[16], A[30], A[31]] ``` #### After this patch: ```console $ bin/mruby example.rb [A[0], A[1], A[2], A[3], A[4], A[5], A[6], A[7], A[8], A[9], A[10], A[11], A[12], A[13], A[14], A[15], A[16]] ```
2021-01-11Add missing cast in `ea_next_capa_for`KOBAYASHI Shuji
2021-01-11Avoid including `presym.inc` in existing header filesKOBAYASHI Shuji
Addressed an issue where existing programs linking `libmruby.a` could only be built by adding `<build-dir>/include` to compiler's include path.
2021-01-03Avoid 64-bit operations in `src/hash.c`; close #5201KOBAYASHI Shuji
The idea of using `size_t` in `ea_next_capa_for` is by @dearblue.
2020-11-18Use `mrb_int_value` instead of `mrb_fixnum_value` in `src/hash.c`KOBAYASHI Shuji
2020-11-13Include size of iv table in `ObjectSpace.memsize_of` to `Hash` objectKOBAYASHI Shuji
2020-11-13Rename `mrb_os_memsize_of_hash_table` to `mrb_hash_memsize`KOBAYASHI Shuji
* The term `hash_table` can be misleading because the return value of this function includes memory usage of entire `Hash` object, including not only hash table part but also entry list part, etc. * This function takes a `Hash` object as a receiver and is defined in `src/hash.c`, so it is natural to have a `mrb_hash_` prefix.
2020-11-10Reduce memory usage of Hash objectKOBAYASHI Shuji
## Implementation Summary * Change entry list from segmented list to flat array. * Change value of hash bucket from pointer to entry to index of entry list, and represent it by variable length bits according to capacity of hash buckets. * Store management information about entry list and hash table to `struct RHash` as much as possible. ## Benchmark Summary Only the results of typical situations on 64-bit Word-boxing are present here. For more detailed information, including consideration, see below (although most of the body is written in Japanese). * https://shuujii.github.io/mruby-hash-benchmark ### Memory Usage Lower value is better. | Hash Size | Baseline | New | Factor | |----------:|--------------:|--------------:|-----------:| | 16 | 344B | 256B | 0.74419x | | 40 | 1,464B | 840B | 0.57377x | | 200 | 8,056B | 3,784B | 0.46971x | | 500 | 17,169B | 9,944B | 0.57949x | ### Performance Higher value is better. #### `mrb_hash_set` | Hash Size | Baseline | New | Factor | |----------:|--------------:|--------------:|-----------:| | 16 | 1.41847M i/s | 1.36004M i/s | 0.95881x | | 40 | 0.39224M i/s | 0.31888M i/s | 0.81296x | | 200 | 0.03780M i/s | 0.04290M i/s | 1.13494x | | 500 | 0.01225M i/s | 0.01314M i/s | 1.07275x | #### `mrb_hash_get` | Hash Size | Baseline | New | Factor | |----------:|--------------:|--------------:|-----------:| | 16 | 26.05920M i/s | 30.19543M i/s | 1.15872x | | 40 | 44.26420M i/s | 32.75781M i/s | 0.74005x | | 200 | 44.55171M i/s | 31.56926M i/s | 0.70860x | | 500 | 39.19250M i/s | 29.73806M i/s | 0.75877x | #### `mrb_hash_each` | Hash Size | Baseline | New | Factor | |----------:|--------------:|--------------:|-----------:| | 16 | 25.11964M i/s | 30.34167M i/s | 1.20789x | | 40 | 11.74253M i/s | 13.25539M i/s | 1.12884x | | 200 | 2.01133M i/s | 2.97214M i/s | 1.47770x | | 500 | 0.87411M i/s | 1.21178M i/s | 1.38631x | #### `Hash#[]=` | Hash Size | Baseline | New | Factor | |----------:|--------------:|--------------:|-----------:| | 16 | 0.50095M i/s | 0.56490M i/s | 1.12764x | | 40 | 0.19132M i/s | 0.18392M i/s | 0.96129x | | 200 | 0.03624M i/s | 0.03256M i/s | 0.89860x | | 500 | 0.01527M i/s | 0.01236M i/s | 0.80935x | #### `Hash#[]` | Hash Size | Baseline | New | Factor | |----------:|--------------:|--------------:|-----------:| | 16 | 11.53211M i/s | 12.78806M i/s | 1.10891x | | 40 | 15.26920M i/s | 13.37529M i/s | 0.87596x | | 200 | 15.28550M i/s | 13.36410M i/s | 0.87430x | | 500 | 14.57695M i/s | 12.75388M i/s | 0.87494x | #### `Hash#each` | Hash Size | Baseline | New | Factor | |----------:|--------------:|--------------:|-----------:| | 16 | 0.30462M i/s | 0.27080M i/s | 0.88898x | | 40 | 0.12912M i/s | 0.11704M i/s | 0.90642x | | 200 | 0.02638M i/s | 0.02402M i/s | 0.91071x | | 500 | 0.01066M i/s | 0.00959M i/s | 0.89953x | #### `Hash#delete` | Hash Size | Baseline | New | Factor | |----------:|--------------:|--------------:|-----------:| | 16 | 7.84167M i/s | 6.96419M i/s | 0.88810x | | 40 | 6.91292M i/s | 7.41427M i/s | 1.07252x | | 200 | 3.75952M i/s | 7.32080M i/s | 1.94727x | | 500 | 2.10754M i/s | 7.05963M i/s | 3.34970x | #### `Hash#shift` | Hash Size | Baseline | New | Factor | |----------:|--------------:|--------------:|-----------:| | 16 | 14.66444M i/s | 13.18876M i/s | 0.89937x | | 40 | 11.95124M i/s | 11.10420M i/s | 0.92913x | | 200 | 5.53681M i/s | 7.88155M i/s | 1.42348x | | 500 | 2.96728M i/s | 5.40405M i/s | 1.82121x | #### `Hash#dup` | Hash Size | Baseline | New | Factor | |----------:|--------------:|--------------:|-----------:| | 16 | 0.15063M i/s | 5.37889M i/s | 35.71024x | | 40 | 0.06515M i/s | 3.38196M i/s | 51.91279x | | 200 | 0.01359M i/s | 1.46538M i/s | 107.84056x | | 500 | 0.00559M i/s | 0.75411M i/s | 134.88057x | ### Binary Size Lower value is better. | File | Baseline | New | Factor | |:-----------|--------------:|--------------:|----------:| | mruby | 730,408B | 734,176B | 1.00519x | | libmruby.a | 1,068,134B | 1,072,846B | 1.00441x | ## Other Fixes The following issues have also been fixed in the parts where there was some change this time. * [Heap use-after-free in `Hash#value?`](https://gist.github.com/shuujii/30e4fcd5844a4112a0ecd4a5b3483101#file-heap-use-after-free-in-hash-value-md) * [Heap use-after-free in `ht_hash_equal`](https://gist.github.com/shuujii/30e4fcd5844a4112a0ecd4a5b3483101#file-heap-use-after-free-in-ht_hash_equal-md) * [Heap use-after-free in `ht_hash_func`](https://gist.github.com/shuujii/30e4fcd5844a4112a0ecd4a5b3483101#file-heap-use-after-free-in-ht_hash_func-md) * [Heap use-after-free in `mrb_hash_merge`](https://gist.github.com/shuujii/30e4fcd5844a4112a0ecd4a5b3483101#file-heap-use-after-free-in-mrb_hash_merge-md) * [Self-replacement does not work for `Hash#replace`](https://gist.github.com/shuujii/30e4fcd5844a4112a0ecd4a5b3483101#file-self-replacement-does-not-work-for-hash-replace-md) * [Repeated deletes and inserts increase memory usage of `Hash`](https://gist.github.com/shuujii/30e4fcd5844a4112a0ecd4a5b3483101#file-repeated-deletes-and-inserts-increase-memory-usage-of-hash-md) * [`Hash#rehash` does not reindex completely](https://gist.github.com/shuujii/30e4fcd5844a4112a0ecd4a5b3483101#file-hash-rehash-does-not-reindex-completely-md) * `mrb_hash_delete_key` does not cause an error for frozen object * `mrb_hash_new_capa` does not allocate required space first * [`mrb_os_memsize_of_hash_table` result is incorrect](https://github.com/mruby/mruby/pull/5032#discussion_r457994075)
2020-10-12Reorganize `Integer` system.Yukihiro "Matz" Matsumoto
- Integrate `Fixnum` and `Integer` - Remove `Integral` - `int / int -> int` - Replace `mrb_fixnum()` to `mrb_int()` - Replace `mrb_fixnum_value()` to `mrb_int_value()`. - Use `mrb_integer_p()` instead of `mrb_fixnum_p()`
2020-10-12Rename `MRB_TT_FIXNUM` to `MRB_TT_INTEGER`.Yukihiro "Matz" Matsumoto
We still have `#define MRB_TT_FIXNUM MRB_TT_INTEGER` for compatibility.
2020-10-12Fix `Fixnum` and `Float` comparison in `Hash` lookupKOBAYASHI Shuji
```console $ bin/mruby -e 'p({1 => 2}.key?(1.0))' true ``` ```console $ bin/mruby -e 'p({1 => 2}.key?(1.0))' false ```
2020-10-12Rename float configuration option names.Yukihiro "Matz" Matsumoto
- `MRB_WITHOUT_FLOAT` => `MRB_NO_FLOAT` - `MRB_USE_FLOAT` => `MRB_USE_FLOAT32` The former is to use `USE_XXX` naming convention. The latter is to make sure `float` is 32bit float and not floating point number in general.
2020-10-12Rename `mrb_hash_modify` to `hash_modify`.Yukihiro "Matz" Matsumoto
Since it's an internal static function.
2020-10-12Use `mrb_funcall_id()` extensively.Yukihiro "Matz" Matsumoto
Except for support files e.g. `mruby-test/driver.c`, which are not target of symbol collection via `rake gensym`.
2020-10-12Add `MRB_SYM()` for inline symbols.Yukihiro "Matz" Matsumoto
2020-08-11Fix `mrb_int` and `size_t` combination warnings.Yukihiro "Matz" Matsumoto
2020-08-03Initialized local variables in `mrb_hash_shift()`.Yukihiro "Matz" Matsumoto
2020-07-25Use type tag for hash code in `ht_hash_func()`KOBAYASHI Shuji
The function corresponding to `ht_hash_func()` was as follows in the days of khash implementation (before d78acc7a). ```c mrb_hash_ht_hash_func(mrb_state *mrb, mrb_value key) { enum mrb_vtype t = mrb_type(key); ... switch (t) { ... default: hv = mrb_funcall(mrb, key, "hash", 0); h = (khint_t)t ^ (khint_t)mrb_fixnum(hv); break; } ... } ``` When switched to the segmented list implementation (d78acc7a), this function was changed as follows. ```c sg_hash_func(mrb_state *mrb, seglist *t, mrb_value key) { enum mrb_vtype tt = mrb_type(key); ... switch (tt) { ... default: hv = mrb_funcall(mrb, key, "hash", 0); h = (size_t)t ^ (size_t)mrb_fixnum(hv); break; } ... } ``` Since the argument `t` was added, the variable for type tag was changed from `t` to `tt`, but the variable used in the expression of `h` remained `t`. Probably this is an omission of change, so fixed it.
2020-07-23Fix a bug with `ht_index` called with `size==0`; fix #5046Yukihiro "Matz" Matsumoto
It happens when a hash made empty calls `rehash`.
2020-07-15mrb_ prefix conventionRory O'Connell
2020-07-13Use size of hash's table in calculationRory OConnell
2020-06-20Add `mrb_get_arg1()` that retrieves single (and only) argument.Yukihiro "Matz" Matsumoto
`mrb_get_arg1()` raises `ArgumentError` if the method does not receive one argument. And replaces all `mrb_get_args(mrb, "o", &arg)` by the new function.
2020-06-05Add proper casts to silence VC warnings.Yukihiro "Matz" Matsumoto
2020-05-15Unify `eql?` receiver in `Hash` according to RubyKOBAYASHI Shuji
### Example ```ruby # example.rb class A def eql?(o) p self.class super end def hash 1 end end class B < A; end h = {A.new => 1} h[B.new] ``` #### Before this patch: ```console $ bin/mruby example.rb A ``` #### After this patch (same as Ruby) ```console $ bin/mruby example.rb B ```
2020-01-31Avoid implicit integer casting in `backtrace.c` and `hash.c`.Yukihiro "Matz" Matsumoto
2020-01-01Rename `mrb_num_args_error` to `mrb_argnum_error`; ref #4863Yukihiro "Matz" Matsumoto
2019-12-12Add `mrb_num_args_error()` for "wrong number of arguments" errorKOBAYASHI Shuji
To unify the style of messages.
2019-11-02Fix argument specs to `Hash`KOBAYASHI Shuji
2019-09-26Use type predicate macros instead of `mrb_type` if possibleKOBAYASHI Shuji
For efficiency with `MRB_WORD_BOXING` (implement type predicate macros for all `enum mrb_vtype`).
2019-09-14Add a macro `mrb_frozen_p` that points to `MRB_FROZEN_P`.Yukihiro "Matz" Matsumoto
2019-07-04It was too early to check `key` for `undef`; ref #4534Yukihiro "Matz" Matsumoto
2019-06-27Skip copying delete keys in a hash; fix #4534Yukihiro "Matz" Matsumoto
2019-06-22Refine `Hash#rehash` example [ci skip]KOBAYASHI Shuji
Previous example doesn't work because string key (frozen) can't be modified.
2019-04-14Fix memory leak for hash table index if occur out of memorydearblue
2019-04-09Extract frozen checking to functionKOBAYASHI Shuji
2019-02-11Should not copy keys&values when a hash table is empty; fix #4270Yukihiro "Matz" Matsumoto
2018-12-17Small refactoring of #4188Yukihiro "Matz" Matsumoto
2018-12-14Add `mrb_hash_size()` function.dearblue