summaryrefslogtreecommitdiffhomepage
path: root/src/hash.c
AgeCommit message (Collapse)Author
2021-01-11Add missing cast in `ea_next_capa_for`KOBAYASHI Shuji
2021-01-03Avoid 64-bit operations in `src/hash.c`; close #5201KOBAYASHI Shuji
The idea of using `size_t` in `ea_next_capa_for` is by @dearblue.
2020-11-18Use `mrb_int_value` instead of `mrb_fixnum_value` in `src/hash.c`KOBAYASHI Shuji
2020-11-13Include size of iv table in `ObjectSpace.memsize_of` to `Hash` objectKOBAYASHI Shuji
2020-11-13Rename `mrb_os_memsize_of_hash_table` to `mrb_hash_memsize`KOBAYASHI Shuji
* The term `hash_table` can be misleading because the return value of this function includes memory usage of entire `Hash` object, including not only hash table part but also entry list part, etc. * This function takes a `Hash` object as a receiver and is defined in `src/hash.c`, so it is natural to have a `mrb_hash_` prefix.
2020-11-10Reduce memory usage of Hash objectKOBAYASHI Shuji
## Implementation Summary * Change entry list from segmented list to flat array. * Change value of hash bucket from pointer to entry to index of entry list, and represent it by variable length bits according to capacity of hash buckets. * Store management information about entry list and hash table to `struct RHash` as much as possible. ## Benchmark Summary Only the results of typical situations on 64-bit Word-boxing are present here. For more detailed information, including consideration, see below (although most of the body is written in Japanese). * https://shuujii.github.io/mruby-hash-benchmark ### Memory Usage Lower value is better. | Hash Size | Baseline | New | Factor | |----------:|--------------:|--------------:|-----------:| | 16 | 344B | 256B | 0.74419x | | 40 | 1,464B | 840B | 0.57377x | | 200 | 8,056B | 3,784B | 0.46971x | | 500 | 17,169B | 9,944B | 0.57949x | ### Performance Higher value is better. #### `mrb_hash_set` | Hash Size | Baseline | New | Factor | |----------:|--------------:|--------------:|-----------:| | 16 | 1.41847M i/s | 1.36004M i/s | 0.95881x | | 40 | 0.39224M i/s | 0.31888M i/s | 0.81296x | | 200 | 0.03780M i/s | 0.04290M i/s | 1.13494x | | 500 | 0.01225M i/s | 0.01314M i/s | 1.07275x | #### `mrb_hash_get` | Hash Size | Baseline | New | Factor | |----------:|--------------:|--------------:|-----------:| | 16 | 26.05920M i/s | 30.19543M i/s | 1.15872x | | 40 | 44.26420M i/s | 32.75781M i/s | 0.74005x | | 200 | 44.55171M i/s | 31.56926M i/s | 0.70860x | | 500 | 39.19250M i/s | 29.73806M i/s | 0.75877x | #### `mrb_hash_each` | Hash Size | Baseline | New | Factor | |----------:|--------------:|--------------:|-----------:| | 16 | 25.11964M i/s | 30.34167M i/s | 1.20789x | | 40 | 11.74253M i/s | 13.25539M i/s | 1.12884x | | 200 | 2.01133M i/s | 2.97214M i/s | 1.47770x | | 500 | 0.87411M i/s | 1.21178M i/s | 1.38631x | #### `Hash#[]=` | Hash Size | Baseline | New | Factor | |----------:|--------------:|--------------:|-----------:| | 16 | 0.50095M i/s | 0.56490M i/s | 1.12764x | | 40 | 0.19132M i/s | 0.18392M i/s | 0.96129x | | 200 | 0.03624M i/s | 0.03256M i/s | 0.89860x | | 500 | 0.01527M i/s | 0.01236M i/s | 0.80935x | #### `Hash#[]` | Hash Size | Baseline | New | Factor | |----------:|--------------:|--------------:|-----------:| | 16 | 11.53211M i/s | 12.78806M i/s | 1.10891x | | 40 | 15.26920M i/s | 13.37529M i/s | 0.87596x | | 200 | 15.28550M i/s | 13.36410M i/s | 0.87430x | | 500 | 14.57695M i/s | 12.75388M i/s | 0.87494x | #### `Hash#each` | Hash Size | Baseline | New | Factor | |----------:|--------------:|--------------:|-----------:| | 16 | 0.30462M i/s | 0.27080M i/s | 0.88898x | | 40 | 0.12912M i/s | 0.11704M i/s | 0.90642x | | 200 | 0.02638M i/s | 0.02402M i/s | 0.91071x | | 500 | 0.01066M i/s | 0.00959M i/s | 0.89953x | #### `Hash#delete` | Hash Size | Baseline | New | Factor | |----------:|--------------:|--------------:|-----------:| | 16 | 7.84167M i/s | 6.96419M i/s | 0.88810x | | 40 | 6.91292M i/s | 7.41427M i/s | 1.07252x | | 200 | 3.75952M i/s | 7.32080M i/s | 1.94727x | | 500 | 2.10754M i/s | 7.05963M i/s | 3.34970x | #### `Hash#shift` | Hash Size | Baseline | New | Factor | |----------:|--------------:|--------------:|-----------:| | 16 | 14.66444M i/s | 13.18876M i/s | 0.89937x | | 40 | 11.95124M i/s | 11.10420M i/s | 0.92913x | | 200 | 5.53681M i/s | 7.88155M i/s | 1.42348x | | 500 | 2.96728M i/s | 5.40405M i/s | 1.82121x | #### `Hash#dup` | Hash Size | Baseline | New | Factor | |----------:|--------------:|--------------:|-----------:| | 16 | 0.15063M i/s | 5.37889M i/s | 35.71024x | | 40 | 0.06515M i/s | 3.38196M i/s | 51.91279x | | 200 | 0.01359M i/s | 1.46538M i/s | 107.84056x | | 500 | 0.00559M i/s | 0.75411M i/s | 134.88057x | ### Binary Size Lower value is better. | File | Baseline | New | Factor | |:-----------|--------------:|--------------:|----------:| | mruby | 730,408B | 734,176B | 1.00519x | | libmruby.a | 1,068,134B | 1,072,846B | 1.00441x | ## Other Fixes The following issues have also been fixed in the parts where there was some change this time. * [Heap use-after-free in `Hash#value?`](https://gist.github.com/shuujii/30e4fcd5844a4112a0ecd4a5b3483101#file-heap-use-after-free-in-hash-value-md) * [Heap use-after-free in `ht_hash_equal`](https://gist.github.com/shuujii/30e4fcd5844a4112a0ecd4a5b3483101#file-heap-use-after-free-in-ht_hash_equal-md) * [Heap use-after-free in `ht_hash_func`](https://gist.github.com/shuujii/30e4fcd5844a4112a0ecd4a5b3483101#file-heap-use-after-free-in-ht_hash_func-md) * [Heap use-after-free in `mrb_hash_merge`](https://gist.github.com/shuujii/30e4fcd5844a4112a0ecd4a5b3483101#file-heap-use-after-free-in-mrb_hash_merge-md) * [Self-replacement does not work for `Hash#replace`](https://gist.github.com/shuujii/30e4fcd5844a4112a0ecd4a5b3483101#file-self-replacement-does-not-work-for-hash-replace-md) * [Repeated deletes and inserts increase memory usage of `Hash`](https://gist.github.com/shuujii/30e4fcd5844a4112a0ecd4a5b3483101#file-repeated-deletes-and-inserts-increase-memory-usage-of-hash-md) * [`Hash#rehash` does not reindex completely](https://gist.github.com/shuujii/30e4fcd5844a4112a0ecd4a5b3483101#file-hash-rehash-does-not-reindex-completely-md) * `mrb_hash_delete_key` does not cause an error for frozen object * `mrb_hash_new_capa` does not allocate required space first * [`mrb_os_memsize_of_hash_table` result is incorrect](https://github.com/mruby/mruby/pull/5032#discussion_r457994075)
2020-10-12Reorganize `Integer` system.Yukihiro "Matz" Matsumoto
- Integrate `Fixnum` and `Integer` - Remove `Integral` - `int / int -> int` - Replace `mrb_fixnum()` to `mrb_int()` - Replace `mrb_fixnum_value()` to `mrb_int_value()`. - Use `mrb_integer_p()` instead of `mrb_fixnum_p()`
2020-10-12Rename `MRB_TT_FIXNUM` to `MRB_TT_INTEGER`.Yukihiro "Matz" Matsumoto
We still have `#define MRB_TT_FIXNUM MRB_TT_INTEGER` for compatibility.
2020-10-12Fix `Fixnum` and `Float` comparison in `Hash` lookupKOBAYASHI Shuji
```console $ bin/mruby -e 'p({1 => 2}.key?(1.0))' true ``` ```console $ bin/mruby -e 'p({1 => 2}.key?(1.0))' false ```
2020-10-12Rename float configuration option names.Yukihiro "Matz" Matsumoto
- `MRB_WITHOUT_FLOAT` => `MRB_NO_FLOAT` - `MRB_USE_FLOAT` => `MRB_USE_FLOAT32` The former is to use `USE_XXX` naming convention. The latter is to make sure `float` is 32bit float and not floating point number in general.
2020-10-12Rename `mrb_hash_modify` to `hash_modify`.Yukihiro "Matz" Matsumoto
Since it's an internal static function.
2020-10-12Use `mrb_funcall_id()` extensively.Yukihiro "Matz" Matsumoto
Except for support files e.g. `mruby-test/driver.c`, which are not target of symbol collection via `rake gensym`.
2020-10-12Add `MRB_SYM()` for inline symbols.Yukihiro "Matz" Matsumoto
2020-08-11Fix `mrb_int` and `size_t` combination warnings.Yukihiro "Matz" Matsumoto
2020-08-03Initialized local variables in `mrb_hash_shift()`.Yukihiro "Matz" Matsumoto
2020-07-25Use type tag for hash code in `ht_hash_func()`KOBAYASHI Shuji
The function corresponding to `ht_hash_func()` was as follows in the days of khash implementation (before d78acc7a). ```c mrb_hash_ht_hash_func(mrb_state *mrb, mrb_value key) { enum mrb_vtype t = mrb_type(key); ... switch (t) { ... default: hv = mrb_funcall(mrb, key, "hash", 0); h = (khint_t)t ^ (khint_t)mrb_fixnum(hv); break; } ... } ``` When switched to the segmented list implementation (d78acc7a), this function was changed as follows. ```c sg_hash_func(mrb_state *mrb, seglist *t, mrb_value key) { enum mrb_vtype tt = mrb_type(key); ... switch (tt) { ... default: hv = mrb_funcall(mrb, key, "hash", 0); h = (size_t)t ^ (size_t)mrb_fixnum(hv); break; } ... } ``` Since the argument `t` was added, the variable for type tag was changed from `t` to `tt`, but the variable used in the expression of `h` remained `t`. Probably this is an omission of change, so fixed it.
2020-07-23Fix a bug with `ht_index` called with `size==0`; fix #5046Yukihiro "Matz" Matsumoto
It happens when a hash made empty calls `rehash`.
2020-07-15mrb_ prefix conventionRory O'Connell
2020-07-13Use size of hash's table in calculationRory OConnell
2020-06-20Add `mrb_get_arg1()` that retrieves single (and only) argument.Yukihiro "Matz" Matsumoto
`mrb_get_arg1()` raises `ArgumentError` if the method does not receive one argument. And replaces all `mrb_get_args(mrb, "o", &arg)` by the new function.
2020-06-05Add proper casts to silence VC warnings.Yukihiro "Matz" Matsumoto
2020-05-15Unify `eql?` receiver in `Hash` according to RubyKOBAYASHI Shuji
### Example ```ruby # example.rb class A def eql?(o) p self.class super end def hash 1 end end class B < A; end h = {A.new => 1} h[B.new] ``` #### Before this patch: ```console $ bin/mruby example.rb A ``` #### After this patch (same as Ruby) ```console $ bin/mruby example.rb B ```
2020-01-31Avoid implicit integer casting in `backtrace.c` and `hash.c`.Yukihiro "Matz" Matsumoto
2020-01-01Rename `mrb_num_args_error` to `mrb_argnum_error`; ref #4863Yukihiro "Matz" Matsumoto
2019-12-12Add `mrb_num_args_error()` for "wrong number of arguments" errorKOBAYASHI Shuji
To unify the style of messages.
2019-11-02Fix argument specs to `Hash`KOBAYASHI Shuji
2019-09-26Use type predicate macros instead of `mrb_type` if possibleKOBAYASHI Shuji
For efficiency with `MRB_WORD_BOXING` (implement type predicate macros for all `enum mrb_vtype`).
2019-09-14Add a macro `mrb_frozen_p` that points to `MRB_FROZEN_P`.Yukihiro "Matz" Matsumoto
2019-07-04It was too early to check `key` for `undef`; ref #4534Yukihiro "Matz" Matsumoto
2019-06-27Skip copying delete keys in a hash; fix #4534Yukihiro "Matz" Matsumoto
2019-06-22Refine `Hash#rehash` example [ci skip]KOBAYASHI Shuji
Previous example doesn't work because string key (frozen) can't be modified.
2019-04-14Fix memory leak for hash table index if occur out of memorydearblue
2019-04-09Extract frozen checking to functionKOBAYASHI Shuji
2019-02-11Should not copy keys&values when a hash table is empty; fix #4270Yukihiro "Matz" Matsumoto
2018-12-17Small refactoring of #4188Yukihiro "Matz" Matsumoto
2018-12-14Add `mrb_hash_size()` function.dearblue
2018-12-11Avoid using floating point number for HT_SEG_INCREASE_RATIO; ref #4182Yukihiro "Matz" Matsumoto
2018-12-11Rename `ht_foreach_func` to `mrb_hash_foreach_func`.Yukihiro "Matz" Matsumoto
2018-12-11Update comments.Yukihiro "Matz" Matsumoto
2018-12-11Add API function `mrb_hash_foreach()` to iterate over items in a hash.Yukihiro "Matz" Matsumoto
2018-11-19Removed `to_hash` conversion method.Yukihiro "Matz" Matsumoto
2018-11-19Improve Hash table using variable sized segments.Yukihiro "Matz" Matsumoto
2018-11-16The key or value object could be reclaimed by GC; fix #4164Yukihiro "Matz" Matsumoto
The GC may occur between `sg_shift` and `mrb_assoc_new`, in which case `key` and `value` could be freed even tough they are still alive. The issue is found and fixed by https://hackerone.com/hexodus
2018-10-20Need to freeze string keys.Yukihiro "Matz" Matsumoto
2018-10-12Should not compare `undef` (deleted) key in hashes; fix #4136Yukihiro "Matz" Matsumoto
2018-10-12Add `NULL` check in `sg_compact()`; fix #4139Yukihiro "Matz" Matsumoto
2018-10-12`Hash#delete` should return the deleted value; fix #4133Yukihiro "Matz" Matsumoto
2018-09-26Implement `Hash#rehash` in C using `sg_compact()`.Yukihiro "Matz" Matsumoto
2018-09-26Add index to larger segment lists for performanceYukihiro "Matz" Matsumoto
2018-09-26Use `mrb_undef_value` for delete mark instead of shifting Hash entry table.Yukihiro "Matz" Matsumoto
That means entry table should be compacted periodically by `sg_compact()`.