summaryrefslogtreecommitdiffhomepage
path: root/src/hash.c
AgeCommit message (Collapse)Author
2021-03-17hash.c: `Hash#shift` to return `nil` when a hash is empty.Yukihiro "Matz" Matsumoto
It used to be return the default value if available, but it should ignore the default value for behavior consistency. CRuby will adopt this behavior too in the future. [ruby-bugs:16908]
2021-03-08ISO C99 doesn't support unnamed unions; fix #5354Yukihiro "Matz" Matsumoto
2021-02-12Avoid possibility of reading uninitialized areas in `h_check_modified()`KOBAYASHI Shuji
In `h_check_modified()`, in the case of `MRB_NO_BOXING`, `ht_ea()` or `ht_ea_capa()` for AR may read uninitialized area. Therefore, do not use those macros for AR in `MRB_NO_BOXING` (but in the case of `MRB_64BIT`, `ht_ea_capa()` is the same as `ar_ea_capa()`, so use it). fix #5332
2021-02-10Fix heap-buffer-overflow for small `Hash` (HT) in `Hash#rehash`KOBAYASHI Shuji
### Example ##### example.rb ```ruby h = {} (1..17).each{h[_1] = _1} (1..16).each{h.delete(_1)} h.rehash ``` ##### ASAN report ```console $ bin/mruby example.rb ==52587==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000006998 at pc 0x55a29cddf96b bp 0x7fff7b1b1720 sp 0x7fff7b1b1710 READ of size 4 at 0x602000006998 thread T0 #0 0x55a29cddf96a in ib_it_next /mruby/src/hash.c:639 #1 0x55a29cde2ca2 in ht_rehash /mruby/src/hash.c:900 #2 0x55a29cde379f in h_rehash /mruby/src/hash.c:996 #3 0x55a29cde7f3d in mrb_hash_rehash /mruby/src/hash.c:1735 #4 0x55a29ce77b62 in mrb_vm_exec /mruby/src/vm.c:1451 #5 0x55a29ce5fa88 in mrb_vm_run /mruby/src/vm.c:981 #6 0x55a29ceb87e1 in mrb_top_run /mruby/src/vm.c:2874 #7 0x55a29cf36bdf in mrb_load_exec mrbgems/mruby-compiler/core/parse.y:6805 #8 0x55a29cf36f25 in mrb_load_detect_file_cxt mrbgems/mruby-compiler/core/parse.y:6848 #9 0x55a29cdba0a2 in main /mruby/mrbgems/mruby-bin-mruby/tools/mruby/mruby.c:347 #10 0x7f24ef43b0b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2) #11 0x55a29cdb4a6d in _start (/mruby/bin/mruby+0x2a3a6d) 0x602000006998 is located 0 bytes to the right of 8-byte region [0x602000006990,0x602000006998) allocated by thread T0 here: #0 0x7f24f01cfffe in __interceptor_realloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10dffe) #1 0x55a29ceb9440 in mrb_default_allocf /mruby/src/state.c:68 #2 0x55a29cdba747 in mrb_realloc_simple /mruby/src/gc.c:228 #3 0x55a29cdba928 in mrb_realloc /mruby/src/gc.c:242 #4 0x55a29cde12e5 in ht_init /mruby/src/hash.c:749 #5 0x55a29cde2b8e in ht_rehash /mruby/src/hash.c:897 #6 0x55a29cde379f in h_rehash /mruby/src/hash.c:996 #7 0x55a29cde7f3d in mrb_hash_rehash /mruby/src/hash.c:1735 #8 0x55a29ce77b62 in mrb_vm_exec /mruby/src/vm.c:1451 #9 0x55a29ce5fa88 in mrb_vm_run /mruby/src/vm.c:981 #10 0x55a29ceb87e1 in mrb_top_run /mruby/src/vm.c:2874 #11 0x55a29cf36bdf in mrb_load_exec mrbgems/mruby-compiler/core/parse.y:6805 #12 0x55a29cf36f25 in mrb_load_detect_file_cxt mrbgems/mruby-compiler/core/parse.y:6848 #13 0x55a29cdba0a2 in main /mruby/mrbgems/mruby-bin-mruby/tools/mruby/mruby.c:347 #14 0x7f24ef43b0b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2) ```
2021-01-26Revert "Minimize the changes in #5277"Yukihiro "Matz" Matsumoto
This reverts commit dc51d89ac22acc60b9bfeed87115863565b74085.
2021-01-22Minimize the changes in #5277Yukihiro "Matz" Matsumoto
Instead of including `mruby/presym.h` everywhere, we provided the fallback `mruby/presym.inc` under `include/mruby` directory, and specify `-I<build-dir>/include` before `-I<top-dir>/include` in `presym.rake`. So even when someone drops `-I<build-dir>/include` in compiler options, it just compiles without failure.
2021-01-21Merge branch 'avoid-including-presym.inc-in-existing-header-files' of ↵Yukihiro "Matz" Matsumoto
https://github.com/shuujii/mruby into shuujii-avoid-including-presym.inc-in-existing-header-files
2021-01-18Fix that Hash may not contain any empty bucketsKOBAYASHI Shuji
The Hash implementation assumed that there were always empty buckets, but sometimes there were only active or deleted buckets (no empty buckets). Therefore, fix it so that this situation does not occur. ### Example ```ruby # example.rb class A attr_reader :v def initialize(v) @v = v end def ==(o) @v == o.v end def hash; @v end def to_s; "#{self.class}[#{@v}]" end alias eql? == alias inspect to_s end keys = (0..31).map{A.new(_1)} h = {} (0..16).each{h[keys[_1]] = _1} (17..31).each do k = keys[_1] h[k] = _1 h.delete(k) end p h.keys ``` #### Before this patch: ```console $ bin/mruby example.rb [A[0], A[1], A[2], A[3], A[4], A[5], A[6], A[7], A[8], A[9], A[10], A[11], A[12], A[13], A[14], A[15], A[16], A[30], A[31]] ``` #### After this patch: ```console $ bin/mruby example.rb [A[0], A[1], A[2], A[3], A[4], A[5], A[6], A[7], A[8], A[9], A[10], A[11], A[12], A[13], A[14], A[15], A[16]] ```
2021-01-11Add missing cast in `ea_next_capa_for`KOBAYASHI Shuji
2021-01-11Avoid including `presym.inc` in existing header filesKOBAYASHI Shuji
Addressed an issue where existing programs linking `libmruby.a` could only be built by adding `<build-dir>/include` to compiler's include path.
2021-01-03Avoid 64-bit operations in `src/hash.c`; close #5201KOBAYASHI Shuji
The idea of using `size_t` in `ea_next_capa_for` is by @dearblue.
2020-11-18Use `mrb_int_value` instead of `mrb_fixnum_value` in `src/hash.c`KOBAYASHI Shuji
2020-11-13Include size of iv table in `ObjectSpace.memsize_of` to `Hash` objectKOBAYASHI Shuji
2020-11-13Rename `mrb_os_memsize_of_hash_table` to `mrb_hash_memsize`KOBAYASHI Shuji
* The term `hash_table` can be misleading because the return value of this function includes memory usage of entire `Hash` object, including not only hash table part but also entry list part, etc. * This function takes a `Hash` object as a receiver and is defined in `src/hash.c`, so it is natural to have a `mrb_hash_` prefix.
2020-11-10Reduce memory usage of Hash objectKOBAYASHI Shuji
## Implementation Summary * Change entry list from segmented list to flat array. * Change value of hash bucket from pointer to entry to index of entry list, and represent it by variable length bits according to capacity of hash buckets. * Store management information about entry list and hash table to `struct RHash` as much as possible. ## Benchmark Summary Only the results of typical situations on 64-bit Word-boxing are present here. For more detailed information, including consideration, see below (although most of the body is written in Japanese). * https://shuujii.github.io/mruby-hash-benchmark ### Memory Usage Lower value is better. | Hash Size | Baseline | New | Factor | |----------:|--------------:|--------------:|-----------:| | 16 | 344B | 256B | 0.74419x | | 40 | 1,464B | 840B | 0.57377x | | 200 | 8,056B | 3,784B | 0.46971x | | 500 | 17,169B | 9,944B | 0.57949x | ### Performance Higher value is better. #### `mrb_hash_set` | Hash Size | Baseline | New | Factor | |----------:|--------------:|--------------:|-----------:| | 16 | 1.41847M i/s | 1.36004M i/s | 0.95881x | | 40 | 0.39224M i/s | 0.31888M i/s | 0.81296x | | 200 | 0.03780M i/s | 0.04290M i/s | 1.13494x | | 500 | 0.01225M i/s | 0.01314M i/s | 1.07275x | #### `mrb_hash_get` | Hash Size | Baseline | New | Factor | |----------:|--------------:|--------------:|-----------:| | 16 | 26.05920M i/s | 30.19543M i/s | 1.15872x | | 40 | 44.26420M i/s | 32.75781M i/s | 0.74005x | | 200 | 44.55171M i/s | 31.56926M i/s | 0.70860x | | 500 | 39.19250M i/s | 29.73806M i/s | 0.75877x | #### `mrb_hash_each` | Hash Size | Baseline | New | Factor | |----------:|--------------:|--------------:|-----------:| | 16 | 25.11964M i/s | 30.34167M i/s | 1.20789x | | 40 | 11.74253M i/s | 13.25539M i/s | 1.12884x | | 200 | 2.01133M i/s | 2.97214M i/s | 1.47770x | | 500 | 0.87411M i/s | 1.21178M i/s | 1.38631x | #### `Hash#[]=` | Hash Size | Baseline | New | Factor | |----------:|--------------:|--------------:|-----------:| | 16 | 0.50095M i/s | 0.56490M i/s | 1.12764x | | 40 | 0.19132M i/s | 0.18392M i/s | 0.96129x | | 200 | 0.03624M i/s | 0.03256M i/s | 0.89860x | | 500 | 0.01527M i/s | 0.01236M i/s | 0.80935x | #### `Hash#[]` | Hash Size | Baseline | New | Factor | |----------:|--------------:|--------------:|-----------:| | 16 | 11.53211M i/s | 12.78806M i/s | 1.10891x | | 40 | 15.26920M i/s | 13.37529M i/s | 0.87596x | | 200 | 15.28550M i/s | 13.36410M i/s | 0.87430x | | 500 | 14.57695M i/s | 12.75388M i/s | 0.87494x | #### `Hash#each` | Hash Size | Baseline | New | Factor | |----------:|--------------:|--------------:|-----------:| | 16 | 0.30462M i/s | 0.27080M i/s | 0.88898x | | 40 | 0.12912M i/s | 0.11704M i/s | 0.90642x | | 200 | 0.02638M i/s | 0.02402M i/s | 0.91071x | | 500 | 0.01066M i/s | 0.00959M i/s | 0.89953x | #### `Hash#delete` | Hash Size | Baseline | New | Factor | |----------:|--------------:|--------------:|-----------:| | 16 | 7.84167M i/s | 6.96419M i/s | 0.88810x | | 40 | 6.91292M i/s | 7.41427M i/s | 1.07252x | | 200 | 3.75952M i/s | 7.32080M i/s | 1.94727x | | 500 | 2.10754M i/s | 7.05963M i/s | 3.34970x | #### `Hash#shift` | Hash Size | Baseline | New | Factor | |----------:|--------------:|--------------:|-----------:| | 16 | 14.66444M i/s | 13.18876M i/s | 0.89937x | | 40 | 11.95124M i/s | 11.10420M i/s | 0.92913x | | 200 | 5.53681M i/s | 7.88155M i/s | 1.42348x | | 500 | 2.96728M i/s | 5.40405M i/s | 1.82121x | #### `Hash#dup` | Hash Size | Baseline | New | Factor | |----------:|--------------:|--------------:|-----------:| | 16 | 0.15063M i/s | 5.37889M i/s | 35.71024x | | 40 | 0.06515M i/s | 3.38196M i/s | 51.91279x | | 200 | 0.01359M i/s | 1.46538M i/s | 107.84056x | | 500 | 0.00559M i/s | 0.75411M i/s | 134.88057x | ### Binary Size Lower value is better. | File | Baseline | New | Factor | |:-----------|--------------:|--------------:|----------:| | mruby | 730,408B | 734,176B | 1.00519x | | libmruby.a | 1,068,134B | 1,072,846B | 1.00441x | ## Other Fixes The following issues have also been fixed in the parts where there was some change this time. * [Heap use-after-free in `Hash#value?`](https://gist.github.com/shuujii/30e4fcd5844a4112a0ecd4a5b3483101#file-heap-use-after-free-in-hash-value-md) * [Heap use-after-free in `ht_hash_equal`](https://gist.github.com/shuujii/30e4fcd5844a4112a0ecd4a5b3483101#file-heap-use-after-free-in-ht_hash_equal-md) * [Heap use-after-free in `ht_hash_func`](https://gist.github.com/shuujii/30e4fcd5844a4112a0ecd4a5b3483101#file-heap-use-after-free-in-ht_hash_func-md) * [Heap use-after-free in `mrb_hash_merge`](https://gist.github.com/shuujii/30e4fcd5844a4112a0ecd4a5b3483101#file-heap-use-after-free-in-mrb_hash_merge-md) * [Self-replacement does not work for `Hash#replace`](https://gist.github.com/shuujii/30e4fcd5844a4112a0ecd4a5b3483101#file-self-replacement-does-not-work-for-hash-replace-md) * [Repeated deletes and inserts increase memory usage of `Hash`](https://gist.github.com/shuujii/30e4fcd5844a4112a0ecd4a5b3483101#file-repeated-deletes-and-inserts-increase-memory-usage-of-hash-md) * [`Hash#rehash` does not reindex completely](https://gist.github.com/shuujii/30e4fcd5844a4112a0ecd4a5b3483101#file-hash-rehash-does-not-reindex-completely-md) * `mrb_hash_delete_key` does not cause an error for frozen object * `mrb_hash_new_capa` does not allocate required space first * [`mrb_os_memsize_of_hash_table` result is incorrect](https://github.com/mruby/mruby/pull/5032#discussion_r457994075)
2020-10-12Reorganize `Integer` system.Yukihiro "Matz" Matsumoto
- Integrate `Fixnum` and `Integer` - Remove `Integral` - `int / int -> int` - Replace `mrb_fixnum()` to `mrb_int()` - Replace `mrb_fixnum_value()` to `mrb_int_value()`. - Use `mrb_integer_p()` instead of `mrb_fixnum_p()`
2020-10-12Rename `MRB_TT_FIXNUM` to `MRB_TT_INTEGER`.Yukihiro "Matz" Matsumoto
We still have `#define MRB_TT_FIXNUM MRB_TT_INTEGER` for compatibility.
2020-10-12Fix `Fixnum` and `Float` comparison in `Hash` lookupKOBAYASHI Shuji
```console $ bin/mruby -e 'p({1 => 2}.key?(1.0))' true ``` ```console $ bin/mruby -e 'p({1 => 2}.key?(1.0))' false ```
2020-10-12Rename float configuration option names.Yukihiro "Matz" Matsumoto
- `MRB_WITHOUT_FLOAT` => `MRB_NO_FLOAT` - `MRB_USE_FLOAT` => `MRB_USE_FLOAT32` The former is to use `USE_XXX` naming convention. The latter is to make sure `float` is 32bit float and not floating point number in general.
2020-10-12Rename `mrb_hash_modify` to `hash_modify`.Yukihiro "Matz" Matsumoto
Since it's an internal static function.
2020-10-12Use `mrb_funcall_id()` extensively.Yukihiro "Matz" Matsumoto
Except for support files e.g. `mruby-test/driver.c`, which are not target of symbol collection via `rake gensym`.
2020-10-12Add `MRB_SYM()` for inline symbols.Yukihiro "Matz" Matsumoto
2020-08-11Fix `mrb_int` and `size_t` combination warnings.Yukihiro "Matz" Matsumoto
2020-08-03Initialized local variables in `mrb_hash_shift()`.Yukihiro "Matz" Matsumoto
2020-07-25Use type tag for hash code in `ht_hash_func()`KOBAYASHI Shuji
The function corresponding to `ht_hash_func()` was as follows in the days of khash implementation (before d78acc7a). ```c mrb_hash_ht_hash_func(mrb_state *mrb, mrb_value key) { enum mrb_vtype t = mrb_type(key); ... switch (t) { ... default: hv = mrb_funcall(mrb, key, "hash", 0); h = (khint_t)t ^ (khint_t)mrb_fixnum(hv); break; } ... } ``` When switched to the segmented list implementation (d78acc7a), this function was changed as follows. ```c sg_hash_func(mrb_state *mrb, seglist *t, mrb_value key) { enum mrb_vtype tt = mrb_type(key); ... switch (tt) { ... default: hv = mrb_funcall(mrb, key, "hash", 0); h = (size_t)t ^ (size_t)mrb_fixnum(hv); break; } ... } ``` Since the argument `t` was added, the variable for type tag was changed from `t` to `tt`, but the variable used in the expression of `h` remained `t`. Probably this is an omission of change, so fixed it.
2020-07-23Fix a bug with `ht_index` called with `size==0`; fix #5046Yukihiro "Matz" Matsumoto
It happens when a hash made empty calls `rehash`.
2020-07-15mrb_ prefix conventionRory O'Connell
2020-07-13Use size of hash's table in calculationRory OConnell
2020-06-20Add `mrb_get_arg1()` that retrieves single (and only) argument.Yukihiro "Matz" Matsumoto
`mrb_get_arg1()` raises `ArgumentError` if the method does not receive one argument. And replaces all `mrb_get_args(mrb, "o", &arg)` by the new function.
2020-06-05Add proper casts to silence VC warnings.Yukihiro "Matz" Matsumoto
2020-05-15Unify `eql?` receiver in `Hash` according to RubyKOBAYASHI Shuji
### Example ```ruby # example.rb class A def eql?(o) p self.class super end def hash 1 end end class B < A; end h = {A.new => 1} h[B.new] ``` #### Before this patch: ```console $ bin/mruby example.rb A ``` #### After this patch (same as Ruby) ```console $ bin/mruby example.rb B ```
2020-01-31Avoid implicit integer casting in `backtrace.c` and `hash.c`.Yukihiro "Matz" Matsumoto
2020-01-01Rename `mrb_num_args_error` to `mrb_argnum_error`; ref #4863Yukihiro "Matz" Matsumoto
2019-12-12Add `mrb_num_args_error()` for "wrong number of arguments" errorKOBAYASHI Shuji
To unify the style of messages.
2019-11-02Fix argument specs to `Hash`KOBAYASHI Shuji
2019-09-26Use type predicate macros instead of `mrb_type` if possibleKOBAYASHI Shuji
For efficiency with `MRB_WORD_BOXING` (implement type predicate macros for all `enum mrb_vtype`).
2019-09-14Add a macro `mrb_frozen_p` that points to `MRB_FROZEN_P`.Yukihiro "Matz" Matsumoto
2019-07-04It was too early to check `key` for `undef`; ref #4534Yukihiro "Matz" Matsumoto
2019-06-27Skip copying delete keys in a hash; fix #4534Yukihiro "Matz" Matsumoto
2019-06-22Refine `Hash#rehash` example [ci skip]KOBAYASHI Shuji
Previous example doesn't work because string key (frozen) can't be modified.
2019-04-14Fix memory leak for hash table index if occur out of memorydearblue
2019-04-09Extract frozen checking to functionKOBAYASHI Shuji
2019-02-11Should not copy keys&values when a hash table is empty; fix #4270Yukihiro "Matz" Matsumoto
2018-12-17Small refactoring of #4188Yukihiro "Matz" Matsumoto
2018-12-14Add `mrb_hash_size()` function.dearblue
2018-12-11Avoid using floating point number for HT_SEG_INCREASE_RATIO; ref #4182Yukihiro "Matz" Matsumoto
2018-12-11Rename `ht_foreach_func` to `mrb_hash_foreach_func`.Yukihiro "Matz" Matsumoto
2018-12-11Update comments.Yukihiro "Matz" Matsumoto
2018-12-11Add API function `mrb_hash_foreach()` to iterate over items in a hash.Yukihiro "Matz" Matsumoto
2018-11-19Removed `to_hash` conversion method.Yukihiro "Matz" Matsumoto