| Age | Commit message (Collapse) | Author |
|
|
|
|
|
* The term `hash_table` can be misleading because the return value of this
function includes memory usage of entire `Hash` object, including not only
hash table part but also entry list part, etc.
* This function takes a `Hash` object as a receiver and is defined in
`src/hash.c`, so it is natural to have a `mrb_hash_` prefix.
|
|
## Implementation Summary
* Change entry list from segmented list to flat array.
* Change value of hash bucket from pointer to entry to index of entry list,
and represent it by variable length bits according to capacity of hash
buckets.
* Store management information about entry list and hash table to `struct
RHash` as much as possible.
## Benchmark Summary
Only the results of typical situations on 64-bit Word-boxing are present
here. For more detailed information, including consideration, see below
(although most of the body is written in Japanese).
* https://shuujii.github.io/mruby-hash-benchmark
### Memory Usage
Lower value is better.
| Hash Size | Baseline | New | Factor |
|----------:|--------------:|--------------:|-----------:|
| 16 | 344B | 256B | 0.74419x |
| 40 | 1,464B | 840B | 0.57377x |
| 200 | 8,056B | 3,784B | 0.46971x |
| 500 | 17,169B | 9,944B | 0.57949x |
### Performance
Higher value is better.
#### `mrb_hash_set`
| Hash Size | Baseline | New | Factor |
|----------:|--------------:|--------------:|-----------:|
| 16 | 1.41847M i/s | 1.36004M i/s | 0.95881x |
| 40 | 0.39224M i/s | 0.31888M i/s | 0.81296x |
| 200 | 0.03780M i/s | 0.04290M i/s | 1.13494x |
| 500 | 0.01225M i/s | 0.01314M i/s | 1.07275x |
#### `mrb_hash_get`
| Hash Size | Baseline | New | Factor |
|----------:|--------------:|--------------:|-----------:|
| 16 | 26.05920M i/s | 30.19543M i/s | 1.15872x |
| 40 | 44.26420M i/s | 32.75781M i/s | 0.74005x |
| 200 | 44.55171M i/s | 31.56926M i/s | 0.70860x |
| 500 | 39.19250M i/s | 29.73806M i/s | 0.75877x |
#### `mrb_hash_each`
| Hash Size | Baseline | New | Factor |
|----------:|--------------:|--------------:|-----------:|
| 16 | 25.11964M i/s | 30.34167M i/s | 1.20789x |
| 40 | 11.74253M i/s | 13.25539M i/s | 1.12884x |
| 200 | 2.01133M i/s | 2.97214M i/s | 1.47770x |
| 500 | 0.87411M i/s | 1.21178M i/s | 1.38631x |
#### `Hash#[]=`
| Hash Size | Baseline | New | Factor |
|----------:|--------------:|--------------:|-----------:|
| 16 | 0.50095M i/s | 0.56490M i/s | 1.12764x |
| 40 | 0.19132M i/s | 0.18392M i/s | 0.96129x |
| 200 | 0.03624M i/s | 0.03256M i/s | 0.89860x |
| 500 | 0.01527M i/s | 0.01236M i/s | 0.80935x |
#### `Hash#[]`
| Hash Size | Baseline | New | Factor |
|----------:|--------------:|--------------:|-----------:|
| 16 | 11.53211M i/s | 12.78806M i/s | 1.10891x |
| 40 | 15.26920M i/s | 13.37529M i/s | 0.87596x |
| 200 | 15.28550M i/s | 13.36410M i/s | 0.87430x |
| 500 | 14.57695M i/s | 12.75388M i/s | 0.87494x |
#### `Hash#each`
| Hash Size | Baseline | New | Factor |
|----------:|--------------:|--------------:|-----------:|
| 16 | 0.30462M i/s | 0.27080M i/s | 0.88898x |
| 40 | 0.12912M i/s | 0.11704M i/s | 0.90642x |
| 200 | 0.02638M i/s | 0.02402M i/s | 0.91071x |
| 500 | 0.01066M i/s | 0.00959M i/s | 0.89953x |
#### `Hash#delete`
| Hash Size | Baseline | New | Factor |
|----------:|--------------:|--------------:|-----------:|
| 16 | 7.84167M i/s | 6.96419M i/s | 0.88810x |
| 40 | 6.91292M i/s | 7.41427M i/s | 1.07252x |
| 200 | 3.75952M i/s | 7.32080M i/s | 1.94727x |
| 500 | 2.10754M i/s | 7.05963M i/s | 3.34970x |
#### `Hash#shift`
| Hash Size | Baseline | New | Factor |
|----------:|--------------:|--------------:|-----------:|
| 16 | 14.66444M i/s | 13.18876M i/s | 0.89937x |
| 40 | 11.95124M i/s | 11.10420M i/s | 0.92913x |
| 200 | 5.53681M i/s | 7.88155M i/s | 1.42348x |
| 500 | 2.96728M i/s | 5.40405M i/s | 1.82121x |
#### `Hash#dup`
| Hash Size | Baseline | New | Factor |
|----------:|--------------:|--------------:|-----------:|
| 16 | 0.15063M i/s | 5.37889M i/s | 35.71024x |
| 40 | 0.06515M i/s | 3.38196M i/s | 51.91279x |
| 200 | 0.01359M i/s | 1.46538M i/s | 107.84056x |
| 500 | 0.00559M i/s | 0.75411M i/s | 134.88057x |
### Binary Size
Lower value is better.
| File | Baseline | New | Factor |
|:-----------|--------------:|--------------:|----------:|
| mruby | 730,408B | 734,176B | 1.00519x |
| libmruby.a | 1,068,134B | 1,072,846B | 1.00441x |
## Other Fixes
The following issues have also been fixed in the parts where there was some
change this time.
* [Heap use-after-free in `Hash#value?`](https://gist.github.com/shuujii/30e4fcd5844a4112a0ecd4a5b3483101#file-heap-use-after-free-in-hash-value-md)
* [Heap use-after-free in `ht_hash_equal`](https://gist.github.com/shuujii/30e4fcd5844a4112a0ecd4a5b3483101#file-heap-use-after-free-in-ht_hash_equal-md)
* [Heap use-after-free in `ht_hash_func`](https://gist.github.com/shuujii/30e4fcd5844a4112a0ecd4a5b3483101#file-heap-use-after-free-in-ht_hash_func-md)
* [Heap use-after-free in `mrb_hash_merge`](https://gist.github.com/shuujii/30e4fcd5844a4112a0ecd4a5b3483101#file-heap-use-after-free-in-mrb_hash_merge-md)
* [Self-replacement does not work for `Hash#replace`](https://gist.github.com/shuujii/30e4fcd5844a4112a0ecd4a5b3483101#file-self-replacement-does-not-work-for-hash-replace-md)
* [Repeated deletes and inserts increase memory usage of `Hash`](https://gist.github.com/shuujii/30e4fcd5844a4112a0ecd4a5b3483101#file-repeated-deletes-and-inserts-increase-memory-usage-of-hash-md)
* [`Hash#rehash` does not reindex completely](https://gist.github.com/shuujii/30e4fcd5844a4112a0ecd4a5b3483101#file-hash-rehash-does-not-reindex-completely-md)
* `mrb_hash_delete_key` does not cause an error for frozen object
* `mrb_hash_new_capa` does not allocate required space first
* [`mrb_os_memsize_of_hash_table` result is incorrect](https://github.com/mruby/mruby/pull/5032#discussion_r457994075)
|
|
- Integrate `Fixnum` and `Integer`
- Remove `Integral`
- `int / int -> int`
- Replace `mrb_fixnum()` to `mrb_int()`
- Replace `mrb_fixnum_value()` to `mrb_int_value()`.
- Use `mrb_integer_p()` instead of `mrb_fixnum_p()`
|
|
We still have `#define MRB_TT_FIXNUM MRB_TT_INTEGER` for compatibility.
|
|
```console
$ bin/mruby -e 'p({1 => 2}.key?(1.0))'
true
```
```console
$ bin/mruby -e 'p({1 => 2}.key?(1.0))'
false
```
|
|
- `MRB_WITHOUT_FLOAT` => `MRB_NO_FLOAT`
- `MRB_USE_FLOAT` => `MRB_USE_FLOAT32`
The former is to use `USE_XXX` naming convention. The latter is to make
sure `float` is 32bit float and not floating point number in general.
|
|
Since it's an internal static function.
|
|
Except for support files e.g. `mruby-test/driver.c`, which are not
target of symbol collection via `rake gensym`.
|
|
|
|
|
|
|
|
The function corresponding to `ht_hash_func()` was as follows in the days of
khash implementation (before d78acc7a).
```c
mrb_hash_ht_hash_func(mrb_state *mrb, mrb_value key)
{
enum mrb_vtype t = mrb_type(key);
...
switch (t) {
...
default:
hv = mrb_funcall(mrb, key, "hash", 0);
h = (khint_t)t ^ (khint_t)mrb_fixnum(hv);
break;
}
...
}
```
When switched to the segmented list implementation (d78acc7a), this function
was changed as follows.
```c
sg_hash_func(mrb_state *mrb, seglist *t, mrb_value key)
{
enum mrb_vtype tt = mrb_type(key);
...
switch (tt) {
...
default:
hv = mrb_funcall(mrb, key, "hash", 0);
h = (size_t)t ^ (size_t)mrb_fixnum(hv);
break;
}
...
}
```
Since the argument `t` was added, the variable for type tag was changed from
`t` to `tt`, but the variable used in the expression of `h` remained `t`.
Probably this is an omission of change, so fixed it.
|
|
It happens when a hash made empty calls `rehash`.
|
|
|
|
|
|
`mrb_get_arg1()` raises `ArgumentError` if the method does not receive one
argument.
And replaces all `mrb_get_args(mrb, "o", &arg)` by the new function.
|
|
|
|
### Example
```ruby
# example.rb
class A
def eql?(o)
p self.class
super
end
def hash
1
end
end
class B < A; end
h = {A.new => 1}
h[B.new]
```
#### Before this patch:
```console
$ bin/mruby example.rb
A
```
#### After this patch (same as Ruby)
```console
$ bin/mruby example.rb
B
```
|
|
|
|
|
|
To unify the style of messages.
|
|
|
|
For efficiency with `MRB_WORD_BOXING` (implement type predicate macros for
all `enum mrb_vtype`).
|
|
|
|
|
|
|
|
Previous example doesn't work because string key (frozen) can't be modified.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
The GC may occur between `sg_shift` and `mrb_assoc_new`, in which
case `key` and `value` could be freed even tough they are still alive.
The issue is found and fixed by https://hackerone.com/hexodus
|
|
|
|
|
|
|
|
|
|
|
|
|
|
That means entry table should be compacted periodically by `sg_compact()`.
|
|
I know it's not hash at all, but reduce memory consumption.
|
|
|