| Age | Commit message (Collapse) | Author |
|
Unlike CRuby, mruby has no per-string encoding, so the core methods
offer no way to process strings byte-wise. We therefore moved 3
byte-wise operation methods from the `mruby-string-ext` gem into core.
|
|
|
|
Prioritize embedded string in the following functions:
- `str_new_static`
- `str_new`
- `mrb_str_new_capa`
- `mrb_str_pool`
The reasons are as follows:
- Consistency with `mrb_str_byte_subseq` and `str_replace`.
- Memory locality increases and may be slightly faster.
- No conversion cost to embedded string when modifying the string.
|
|
On 64-bit CPUs there is padding in `RBasic`, so reorder the fields and use
the padding as part of the embedded string buffer. This change allows 4 more
bytes to be embedded on 64-bit CPUs.
However, code that accesses `RString::as::ary` directly will be incompatible
because the `RString` structure has changed.
|
|
|
|
Simplify get arguments
|
|
shuujii/rename-mrb_shared_string-len-to-mrb_shared_string-capa
Rename `mrb_shared_string::len` to `mrb_shared_string::capa`
|
|
Because this field is used as capacity of string buffer.
|
|
|
|
- `mrb_str_index_m()` and `mrb_str_rindex()`
Call `mrb_get_args()` only once instead of twice.
- `mrb_str_byteslice()`
Replace `goto` with `if`/`else`.
|
|
Refactor set/unset string type flags
|
|
Introduce `RSTR_SET_TYPE_FLAG` macro to set the specified string type flag and
clear the others.
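As a rough illustration of "set one type flag and clear the others", here is a minimal Ruby sketch of the bit manipulation. The constant names and values are hypothetical and chosen only for the example; they are not the definitions used in mruby, and the real macro is written in C.

```ruby
# Hypothetical flag values, for illustration only (not mruby's actual values).
FLAG_SHARED  = 1
FLAG_FSHARED = 2
FLAG_NOFREE  = 4
FLAG_EMBED   = 8
FLAG_ASCII   = 16 # not a type flag, so it must survive the update

# All mutually exclusive "string type" bits.
TYPE_MASK = FLAG_SHARED | FLAG_FSHARED | FLAG_NOFREE | FLAG_EMBED

# Clear every type bit, then set exactly the requested one.
def set_type_flag(flags, type_flag)
  (flags & ~TYPE_MASK) | type_flag
end

flags = FLAG_SHARED | FLAG_ASCII
flags = set_type_flag(flags, FLAG_EMBED)
# flags now has FLAG_EMBED and FLAG_ASCII set, and FLAG_SHARED cleared
```

Doing the clear and the set in one step means a string can never end up with two type flags at once.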
|
|
Previously `String#rindex` returned the wrong index when given an
invalid UTF-8 string.
```terminal
% ruby26 -e 'str = "\xf0☀\xf1☁\xf2☂\xf3☃\xf0☀\xf1☁\xf2☂\xf3☃"; p str.rindex("☁")'
11
% ./mruby-head -e 'str = "\xf0☀\xf1☁\xf2☂\xf3☃\xf0☀\xf1☁\xf2☂\xf3☃"; p str.rindex("☁")'
nil
% ./mruby-patched -e 'str = "\xf0☀\xf1☁\xf2☂\xf3☃\xf0☀\xf1☁\xf2☂\xf3☃"; p str.rindex("☁")'
11
```
|
|
I think the string buffer of NOFREE string always exists and does not need
to be released, so it can be shared as another NOFREE string.
Also changed the `mrb_shared_string` field order to eliminate padding when
the `int` and `mrb_int` sizes are smaller than the pointer size.
|
|
|
|
|
|
|
|
|
|
shuujii/mrb_str_modify_keep_ascii-can-embed-one-more-byte
`mrb_str_modify_keep_ascii` can embed one more byte
|
|
|
|
|
|
The condition for making an embedded string was incorrect. Because similar
code appeared in several places, it was extracted into the `RSTR_EMBEDDABLE_P` macro.
|
|
Contrary to the name, `mrb_to_str` just checks type, no conversion.
|
|
`mrb_string_value_cstr` and `mrb_string_value_len`: obsolete
`mrb_string_cstr`: new function to retrieve NULL terminated C string
`RSTRING_CSTR`: wrapper macro of `mrb_string_cstr`
|
|
With this change, the binary sizes (with `mruby-bin-mruby` as the only gem)
are slightly smaller in my environment than before the introduction of the
new specifiers/modifiers (5116789a).
------------+-------------------+-------------------+--------
BINARY | BEFORE (5116789a) | AFTER (This PR) | RATIO
------------+-------------------+-------------------+--------
mruby | 593416 bytes | 593208 bytes | -0.04%
libmruby.a | 769048 bytes | 767264 bytes | -0.23%
------------+-------------------+-------------------+--------
BTW, I accidentally changed `tasks/toolchains/visualcpp.rake` in #4613,
so I put it back.
|
|
`String#inspect` can set the `MRB_STR_ASCII` flag on the receiver and the
return value because it checks the byte length of each character.
|
|
Functions that are called infrequently need not be inline.
|
|
|
|
shuujii/keep-MRB_STR_ASCII-flag-in-some-methods-of-String
Keep `MRB_STR_ASCII` flag in some methods of `String`
|
|
|
|
|
|
|
|
|
|
|
|
Based on the Boyer-Moore-Horspool algorithm (Quick Search algorithm).
As a side effect, the correct position is returned even if an invalid UTF-8
string is given.
```console
% ./mruby@master -e 'p ("\xd1" * 100 + "#").index("#")'
50
% ./mruby@improve-index -e 'p ("\xd1" * 100 + "#").index("#")'
100
```
The other behavior should be the same as the current implementation.
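To show the idea, here is a minimal Ruby sketch of a byte-wise Quick Search, the variant that decides the shift from the byte just past the current window. It is an illustration under assumptions, not the C implementation from this change: the method name `byte_index` is hypothetical, and it returns byte offsets only, without converting them to character indexes.

```ruby
# Byte-wise Quick Search sketch.
# Returns the byte offset of the first match at or after `start`, or nil.
def byte_index(haystack, needle, start = 0)
  h = haystack.bytes
  n = needle.bytes
  m = n.size
  return (start <= h.size ? start : nil) if m == 0

  # Shift table: how far to slide the window, keyed by the byte that sits
  # just past the window. Bytes not in the needle allow a full m + 1 shift.
  shift = Array.new(256, m + 1)
  n.each_with_index { |b, i| shift[b] = m - i }

  pos = start
  while pos + m <= h.size
    return pos if h[pos, m] == n
    break if pos + m >= h.size # no byte past the window to inspect
    pos += shift[h[pos + m]]
  end
  nil
end

p byte_index("\xd1" * 100 + "#", "#")
```

Because the comparison works on raw bytes, invalid UTF-8 sequences in the haystack cannot cause the scan to skip over a match.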
|
|
|
|
|
|
In #4550, @shuuji proposed the name `MRB_STR_NO_MULTI_BYTE` as a
more precise description. I agree that the name is correct, but the
flag means the string does not contain multi-byte UTF-8 characters,
i.e. all characters fit in the ASCII range.
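The property the flag records can be sketched in Ruby as a check that every byte is below 0x80. The helper name `ascii_only_bytes?` is hypothetical and only illustrates the condition, not how mruby sets or tests the flag.

```ruby
# True when the string contains no multi-byte UTF-8 characters,
# i.e. every byte is in the ASCII range 0x00..0x7F.
def ascii_only_bytes?(str)
  str.each_byte.all? { |b| b < 0x80 }
end

p ascii_only_bytes?("mruby")
p ascii_only_bytes?("世界")
```

When the check holds, byte offsets and character indexes coincide, which is what makes such a flag useful for index-based methods.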
|
|
This patch was shown in #4549.
|
|
It is integrated with part of the argument parsing used in `mrb_str_aset_m()`.
|
|
The purpose is to eliminate string objects that are temporarily created during processing.
|
|
mruby/mruby/src/string.c:1722:4: warning: label 'bytes' defined but not used [-Wunused-label]
bytes:
^~~~~
|
|
Change UTF-8 string reversal to work in place
|
|
shuujii/fix-String-byteslice-with-MRB_UTF8_STRING-and-some-edge-cases
Fix `String#byteslice` with `MRB_UTF8_STRING` and some edge cases
|
|
Example:
$ bin/mruby -e '
p "あa".byteslice(1)
p "bar".byteslice(3)
p "bar".byteslice(4..0)
'
Before this patch:
"a"
""
RangeError (4..0 out of range)
After this patch (same as Ruby):
"\x81"
nil
nil
|
|
|
|
Now `mrb_str_modify()` is called only once when there are 2 or more characters.
|
|
|
|
|
|
Reverses UTF-8 strings without allocating heap memory for working space.
1. String before reversing:
```
"!yburmの界世"
# byte unit
[33, 121, 98, 117, 114, 109, 227, 129, 174, 231, 149, 140, 228, 184, 150]
```
2. Reverse the byte order of each character:
```
[33, 121, 98, 117, 114, 109, 174, 129, 227, 140, 149, 231, 150, 184, 228]
```
3. Reverse the whole byte order and complete:
```
[228, 184, 150, 231, 149, 140, 227, 129, 174, 109, 114, 117, 98, 121, 33]
# string
"世界のmruby!"
```
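The three steps above can be sketched in Ruby as follows. This only illustrates the order of the two reversal passes: the helper names are hypothetical, the input is assumed to be valid UTF-8, and unlike the C code this sketch copies the bytes into an array, so it does not reproduce the allocation-free property.

```ruby
# Reverse bytes[from..to] in place by swapping from both ends.
def reverse_range!(bytes, from, to)
  while from < to
    bytes[from], bytes[to] = bytes[to], bytes[from]
    from += 1
    to -= 1
  end
end

# Byte length of the UTF-8 character that starts with lead byte b
# (assumes b is a valid lead byte).
def utf8_char_len(b)
  if b < 0x80 then 1
  elsif (b & 0xE0) == 0xC0 then 2
  elsif (b & 0xF0) == 0xE0 then 3
  else 4
  end
end

def utf8_reverse(str)
  bytes = str.bytes
  i = 0
  # Step 2: reverse the byte order inside each character.
  while i < bytes.size
    len = [utf8_char_len(bytes[i]), bytes.size - i].min
    reverse_range!(bytes, i, i + len - 1)
    i += len
  end
  # Step 3: reverse the whole buffer, which restores each character's
  # internal byte order while reversing the character order.
  reverse_range!(bytes, 0, bytes.size - 1)
  bytes.pack("C*").force_encoding("UTF-8")
end

p utf8_reverse("!yburmの界世") # => "世界のmruby!"
```

Each byte is swapped at most twice, and both passes need only a couple of index variables, which is why the approach needs no working buffer.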
|