| author | KOBAYASHI Shuji <[email protected]> | 2020-12-13 15:27:53 +0900 |
|---|---|---|
| committer | KOBAYASHI Shuji <[email protected]> | 2020-12-13 15:27:53 +0900 |
| commit | 456878ba06358a77d4ab9312fdc69bf780f8fdf4 (patch) | |
| tree | 2c96fd5289d87e9b464517e7d30d406db7c1d04e /src | |
| parent | 116e128b1a103e2fb246cc9d53b82246b24dbd40 (diff) | |
| download | mruby-456878ba06358a77d4ab9312fdc69bf780f8fdf4.tar.gz mruby-456878ba06358a77d4ab9312fdc69bf780f8fdf4.zip | |
Improve source scanning for presym
Accuracy is greatly improved by using the C preprocessor to scan C
sources for presym. The C preprocessor interprets all comments and
preprocessor directives correctly, so it can detect every symbol that is
defined, for example those in `mrbgems/mruby-socket/src/const.cstub`.
Also, as described later, this change will greatly improve the accuracy of
presym detection from Ruby sources.
## Result
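The number of lines in the `presym` file for all gems is as follows (removing
the 89 false positives and adding the 297 previously undetected symbols gives
999 - 89 + 297 = 1207):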
```console
Previous: 999 (false positive = 89, undetected = 297)
New: 1207
```
## Build process
The new build process (with presym) is as follows:
1. Build `mrbc` without presym (more on building without presym later).
2. Compile Ruby sources to C struct format with the `mrbc` created in
step 1, and create `mrblib.c` and `gem_init.c`. Note that the symbols
in the created files are output as `MRB_SYM` family macros or
`mrb_intern_lit` instead of IDs (details are described later).
3. The C preprocessor processes the C sources, including the files created in
step 2, and outputs them as `.i` files. In these files, for example,
`MRB_IVSYM(foo)` is converted to `<@! "@" "foo" !@>` and
`mrb_define_module(mrb, "Foo")` is converted to `<@! "Foo" !@>`.
4. Scan the files created in step 3 and create the `presym` and `presym.inc`
files.
The files created in step 2 should output all static symbols defined in Ruby
sources, including local variables, so we can detect all presyms by just
scanning C sources without scanning Ruby sources directly.
Furthermore, with this process the files to be scanned are exactly the
files to be compiled, so nothing extra is scanned and nothing is missed.
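Step 4 can be pictured as a small scanner over the preprocessed text. The following is a minimal sketch, not the actual Rake task: the method name `scan_presym`, the two regular expressions, and the sample input are assumptions made for illustration. Only the `<@! ... !@>` marker format comes from step 3.

```ruby
# Hypothetical scanner for step 4. The marker format <@! "..." !@> is the one
# described in step 3; everything else here is illustrative.
MARKER  = /<@!(.*?)!@>/m            # one marker, possibly spanning lines
LITERAL = /"((?:[^"\\]|\\.)*)"/     # one C string literal inside a marker

def scan_presym(preprocessed)
  names = preprocessed.scan(MARKER).map do |(body)|
    # Adjacent string literals are joined, as a C compiler would join them.
    # Escape sequences are left as written; this sketch does not decode them.
    body.scan(LITERAL).flatten.join
  end
  names.reject(&:empty?).uniq.sort
end

# Sample `.i` content, using the two conversions given as examples in step 3.
sample = <<~'TEXT'
  x = <@! "@" "foo" !@>;
  <@! "Foo" !@>;
TEXT

puts scan_presym(sample)
```

Because the scanner only looks at markers that survive preprocessing, symbols inside comments or disabled `#if` branches never reach it.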
## Related changes
The following changes have been made in relation to realizing this feature.
### Allow build without presym
This enables building without presym in order to achieve "Build process: 1".
It incorporates #5202; see that issue for details.
Note that when presym is enabled, even adding a local variable to a Ruby
source may change the contents of presym and require recompilation of almost
all C sources. This is inconvenient, especially during trial and error in
development, so this feature is also useful in its own right: the problem
does not occur when presym is disabled.
### Automatically create build target for `mrbc` without presym
The `mrbc` used in the "Build process: 1" will be built by automatically
creating a build target for it. The build name is `SOURCE_BUILD_NAME/mrbc`.
### Constantize output of C struct format by `mrbc`
To realize "Build process: 2", as mentioned above, symbol IDs are not
output directly in the C struct format produced by `mrbc`. As a result, the
output is constant regardless of the state of presym at the time `mrbc` was
built, and symbols in Ruby sources can be detected in the same way as those
in other C sources.
Note that `mrb_intern_lit` is used for symbols that do not become presym,
but in this state, the corresponding element in the symbol array cannot be
statically initialized, so it is initialized at run time (therefore, in this
case, the `const` qualifier is not added to the symbol array).
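The choice between the `MRB_SYM` family and `mrb_intern_lit` can be restated compactly. The sketch below is a Ruby paraphrase of the classification that `dump_sym` performs in C in this commit's `src/dump.c`. The method name `sym_macro` is hypothetical, `OPERATORS` is only a subset of the `operator_table` in the diff, and the fallback string is simplified: the real output writes `0` into the array and emits the `mrb_intern_lit` call in a separate init function.

```ruby
# Ruby paraphrase of the symbol classification in dump_sym (src/dump.c).
# `sym_macro` and this reduced operator table are illustrative assumptions.
WORD = /\A[A-Za-z_][A-Za-z0-9_]*\z/
OPERATORS = {
  "+" => "add", "-" => "sub", "==" => "eq",
  "<=>" => "cmp", "[]" => "aref", "[]=" => "aset"
}.freeze

def sym_macro(name)
  body = name[0..-2] # the name without its last character
  if    name.match?(WORD)                              then "MRB_SYM(#{name})"
  elsif name.end_with?("=") && body.match?(WORD)       then "MRB_SYM_E(#{body})"
  elsif name.end_with?("?") && body.match?(WORD)       then "MRB_SYM_Q(#{body})"
  elsif name.end_with?("!") && body.match?(WORD)       then "MRB_SYM_B(#{body})"
  elsif name.start_with?("@@") && name[2..-1].match?(WORD)
    "MRB_CVSYM(#{name[2..-1]})"
  elsif name.start_with?("@") && name[1..-1].match?(WORD)
    "MRB_IVSYM(#{name[1..-1]})"
  elsif OPERATORS.key?(name)                           then "MRB_OPSYM(#{OPERATORS[name]})"
  else
    # Not expressible as a presym macro: must be interned at run time.
    "mrb_intern_lit(mrb, #{name.dump})"
  end
end

%w[initialize name= empty? map! @foo @@bar <=> []=].each do |name|
  puts "#{name} -> #{sym_macro(name)}"
end
puts sym_macro("foo bar")
```

Any symbol array containing at least one fallback entry is the case where the `const` qualifier has to be dropped.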
### Specify arbitrary `mrbc` file
To realize "Build process: 2", it is now possible to specify an `mrbc` created
by another build target or a pre-built `mrbc`. Use `MRuby::Build#mrbcfile=` to
specify it explicitly. You can omit "Build process: 1" by specifying a
pre-built `mrbc`, and you can always use an optimized build to compile Ruby
sources faster. I think changes that affect the output of `mrbc` are rare,
so in many cases this helps to improve efficiency.
With presym, the build will be a little slower due to more build steps, but
this feature will improve it a bit.
### Create presym files for each build target
This feature was proposed at #5194 and merged once, but was reverted in
5c205e6e due to problems especially with cross-compilation. It has been
introduced again because this change solves the problem.
The presym files are created at the following paths:
* `build/NAME/presym`
* `build/NAME/include/mruby/presym.inc`
### Other changes
* Because presym detection accuracy is greatly improved as mentioned above,
`MRuby::Gem::Specification#cdump?` is set to true by default, and
`disable_cdump` is added instead of `enable_cdump`. Also, support for gem
specific presym files has been discontinued (https://github.com/mruby/mruby/issues/5151#issuecomment-730967232).
* Previously, `mrbc` was automatically created for the `host` build, but it
is not created if the build target for `mrbc` mentioned above is
automatically created. In that case, the `mrbc` file of the `mrbc` build is
copied to `bin/`.
* Two types of `.d` files are created, `.o.d` and `.i.d`. This is
because if `.i` depended on `presym.inc`, the dependency would become
circular, so the `.d` file cannot be shared.
* Changed the file created with `enable_cxx_exception` from `X.cxx` to
`X-cxx.cxx` in order to use the mruby standard Rake rule.
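The circular dependency behind the two kinds of `.d` files can be checked with a toy dependency graph. This is only a sketch using Ruby's standard `TSort` module: the file names `foo.c`, `foo.i`, and `foo.o`, the `DepGraph` class, and the simplified edges are assumptions, not the actual Rake dependency setup.

```ruby
require "tsort"

# Toy dependency graph: each key depends on the files listed as its value.
class DepGraph
  include TSort

  def initialize(deps)
    @deps = deps
  end

  def tsort_each_node(&block)
    @deps.each_key(&block)
  end

  def tsort_each_child(node, &block)
    @deps.fetch(node, []).each(&block)
  end
end

# Separate dependency files: the `.i` side does not list presym.inc.
SEPARATE = {
  "foo.o"      => ["foo.c", "presym.inc"], # what a `.o.d` file would record
  "presym.inc" => ["foo.i"],               # presym is derived from `.i` files
  "foo.i"      => ["foo.c"]                # what a `.i.d` file would record
}.freeze

# One shared `.d` file: `.i` inherits the presym.inc dependency of `.o`.
SHARED = SEPARATE.merge("foo.i" => ["foo.c", "presym.inc"]).freeze

def cyclic?(deps)
  DepGraph.new(deps).tsort
  false
rescue TSort::Cyclic
  true
end

puts "separate .d files cyclic? #{cyclic?(SEPARATE)}"
puts "shared .d file cyclic?    #{cyclic?(SHARED)}"
```

With the shared graph, `presym.inc` depends on `foo.i` and `foo.i` depends on `presym.inc`, which is the loop the bullet describes.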
### Note
Almost all C sources need to be recompiled whenever `presym.inc` changes
(if they are not recompiled properly, the result is often a run-time
error). If the `gcc` toolchain is used, dependencies are resolved by the `.d`
files, so the sources automatically become recompile targets; otherwise
(e.g. MSVC), they must be made recompile targets manually.
Also, even if the `gcc` toolchain is used, sources may not become recompile
targets if an external gem does not use the mruby standard Rake rule. In
particular, if the standard rule is overwritten, as in
https://github.com/mruby/mruby/pull/5112/files, the `.d` file will not be read,
so be careful.
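Why a stale object file tends to fail at run time can be illustrated with a small model. It assumes that a presym ID is the 1-based position of a name in the presym table, which is what `presym_table[sym-1]` in `src/symbol.c` suggests. The symbol lists and the helper names `presym_ids` and `stale_ids` are made up for the example.

```ruby
# Assumption: a presym ID is the 1-based index of the name in the table.
# The symbol names below are invented sample data.
def presym_ids(names)
  names.each_with_index.to_h { |name, index| [name, index + 1] }
end

BEFORE = presym_ids(%w[allocate initialize new])
AFTER  = presym_ids(%w[allocate counter initialize new]) # `counter` was added

# Names whose ID differs between the table an object file was compiled
# against (before) and the table the rest of the build now uses (after).
def stale_ids(before, after)
  before.select { |name, id| after[name] != id }.keys
end

puts stale_ids(BEFORE, AFTER).inspect
```

In this model a single added name shifts every later ID, so an object file compiled against the old table would refer to the wrong symbols until it is rebuilt.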
Diffstat (limited to 'src')
| mode | file | lines changed |
|---|---|---|
| -rw-r--r-- | src/class.c | 4 |
| -rw-r--r-- | src/debug.c | 2 |
| -rw-r--r-- | src/dump.c | 205 |
| -rw-r--r-- | src/symbol.c | 18 |
4 files changed, 201 insertions, 28 deletions
diff --git a/src/class.c b/src/class.c index 7141fe099..e5fa71f62 100644 --- a/src/class.c +++ b/src/class.c @@ -2791,7 +2791,8 @@ static const mrb_code new_iseq[] = { OP_RETURN, 0x0 /* OP_RETURN R0 */ }; -const mrb_sym new_syms[] = { MRB_SYM(allocate), MRB_SYM(initialize) }; +MRB_PRESYM_DEFINE_VAR_AND_INITER(new_syms, 2, MRB_SYM(allocate), MRB_SYM(initialize)) + static const mrb_irep new_irep = { 3, 6, 0, MRB_IREP_STATIC, new_iseq, NULL, new_syms, NULL, NULL, NULL, @@ -2804,6 +2805,7 @@ init_class_new(mrb_state *mrb, struct RClass *cls) struct RProc *p; mrb_method_t m; + init_new_syms(mrb); p = mrb_proc_new(mrb, &new_irep); MRB_METHOD_FROM_PROC(m, p); mrb_define_method_raw(mrb, cls, MRB_SYM(new), m); diff --git a/src/debug.c b/src/debug.c index c03c91cf5..2f9320ac9 100644 --- a/src/debug.c +++ b/src/debug.c @@ -66,7 +66,7 @@ mrb_debug_get_filename(mrb_state *mrb, const mrb_irep *irep, uint32_t pc) MRB_API int32_t mrb_debug_get_line(mrb_state *mrb, const mrb_irep *irep, uint32_t pc) { - if (irep && pc < irep->ilen) { + if (irep && pc >= 0 && pc < irep->ilen) { mrb_irep_debug_info_file* f = NULL; if (!irep->debug_info) { return -1; diff --git a/src/dump.c b/src/dump.c index 85074d5a2..3464f08b9 100644 --- a/src/dump.c +++ b/src/dump.c @@ -10,7 +10,6 @@ #include <mruby/dump.h> #include <mruby/string.h> #include <mruby/irep.h> -#include <mruby/numeric.h> #include <mruby/debug.h> #ifndef MRB_NO_FLOAT @@ -24,6 +23,45 @@ static size_t get_irep_record_size_1(mrb_state *mrb, const mrb_irep *irep); # error This code cannot be built on your environment. 
#endif +#define OPERATOR_SYMBOL(sym_name, name) {name, sym_name, sizeof(sym_name)-1} +struct operator_symbol { + const char *name; + const char *sym_name; + uint16_t sym_name_len; +}; +static const struct operator_symbol operator_table[] = { + OPERATOR_SYMBOL("!", "not"), + OPERATOR_SYMBOL("%", "mod"), + OPERATOR_SYMBOL("&", "and"), + OPERATOR_SYMBOL("*", "mul"), + OPERATOR_SYMBOL("+", "add"), + OPERATOR_SYMBOL("-", "sub"), + OPERATOR_SYMBOL("/", "div"), + OPERATOR_SYMBOL("<", "lt"), + OPERATOR_SYMBOL(">", "gt"), + OPERATOR_SYMBOL("^", "xor"), + OPERATOR_SYMBOL("`", "tick"), + OPERATOR_SYMBOL("|", "or"), + OPERATOR_SYMBOL("~", "neg"), + OPERATOR_SYMBOL("!=", "neq"), + OPERATOR_SYMBOL("!~", "nmatch"), + OPERATOR_SYMBOL("&&", "andand"), + OPERATOR_SYMBOL("**", "pow"), + OPERATOR_SYMBOL("+@", "plus"), + OPERATOR_SYMBOL("-@", "minus"), + OPERATOR_SYMBOL("<<", "lshift"), + OPERATOR_SYMBOL("<=", "le"), + OPERATOR_SYMBOL("==", "eq"), + OPERATOR_SYMBOL("=~", "match"), + OPERATOR_SYMBOL(">=", "ge"), + OPERATOR_SYMBOL(">>", "rshift"), + OPERATOR_SYMBOL("[]", "aref"), + OPERATOR_SYMBOL("||", "oror"), + OPERATOR_SYMBOL("<=>", "cmp"), + OPERATOR_SYMBOL("===", "eqq"), + OPERATOR_SYMBOL("[]=", "aset"), +}; + static size_t get_irep_header_size(mrb_state *mrb) { @@ -138,7 +176,7 @@ get_pool_block_size(mrb_state *mrb, const mrb_irep *irep) #endif break; - default: /* packed IREP_TT_STRING */ + default: /* packed IREP_TT_STRING */ { mrb_int len = irep->pool[pool_no].tt >> 2; /* unpack length */ mrb_assert_int_fit(mrb_int, len, size_t, SIZE_MAX); @@ -888,6 +926,8 @@ mrb_dump_irep_cfunc(mrb_state *mrb, const mrb_irep *irep, uint8_t flags, FILE *f return MRB_DUMP_WRITE_FAULT; } if (fprintf(fp, + "#include <mruby.h>\n" + "#include <mruby/proc.h>\n" "#ifdef __cplusplus\n" "extern const uint8_t %s[];\n" "#endif\n" @@ -960,26 +1000,132 @@ dump_pool(mrb_state *mrb, const mrb_pool_value *p, FILE *fp) return MRB_DUMP_OK; } -mrb_bool mrb_sym_static_p(mrb_state *mrb, mrb_sym sym); - +static 
mrb_bool +sym_name_word_p(const char *name, mrb_int len) +{ + if (len == 0) return FALSE; + if (name[0] != '_' && !ISALPHA(name[0])) return FALSE; + for (int i = 1; i < len; i++) { + if (name[i] != '_' && !ISALNUM(name[i])) return FALSE; + } + return TRUE; +} + +static mrb_bool +sym_name_with_equal_p(const char *name, mrb_int len) +{ + return len >= 2 && name[len-1] == '=' && sym_name_word_p(name, len-1); +} + +static mrb_bool +sym_name_with_question_mark_p(const char *name, mrb_int len) +{ + return len >= 2 && name[len-1] == '?' && sym_name_word_p(name, len-1); +} + +static mrb_bool +sym_name_with_bang_p(const char *name, mrb_int len) +{ + return len >= 2 && name[len-1] == '!' && sym_name_word_p(name, len-1); +} + +static mrb_bool +sym_name_ivar_p(const char *name, mrb_int len) +{ + return len >= 2 && name[0] == '@' && sym_name_word_p(name+1, len-1); +} + +static mrb_bool +sym_name_cvar_p(const char *name, mrb_int len) +{ + return len >= 3 && name[0] == '@' && sym_name_ivar_p(name+1, len-1); +} + +const char * +sym_operator_p(const char *name, mrb_int len) +{ + mrb_sym start, idx; + mrb_sym table_size = sizeof(operator_table)/sizeof(struct operator_symbol); + int cmp; + const struct operator_symbol *op_sym; + for (start = 0; table_size != 0; table_size/=2) { + idx = start+table_size/2; + op_sym = &operator_table[idx]; + cmp = len-op_sym->sym_name_len; + if (cmp == 0) { + cmp = memcmp(name, op_sym->sym_name, len); + if (cmp == 0) return op_sym->name; + } + if (0 < cmp) { + start = ++idx; + --table_size; + } + } + return NULL; +} + static int -dump_sym(mrb_state *mrb, mrb_sym sym, FILE *fp) +dump_sym(mrb_state *mrb, mrb_sym sym, const char *var_name, int idx, mrb_value init_syms_code, FILE *fp, mrb_bool *presymp) { - const char *name; if (sym == 0) return MRB_DUMP_INVALID_ARGUMENT; - name = mrb_sym_name(mrb, sym); - if (!name) { - fprintf(stderr, "undefined symbol (%d) - define presym\n", sym); + + mrb_int len; + const char *name = mrb_sym_name_len(mrb, sym, &len), 
*op_name; + if (!name) return MRB_DUMP_INVALID_ARGUMENT; + if (presymp) *presymp = TRUE; + if (sym_name_word_p(name, len)) { + fprintf(fp, "MRB_SYM(%s)", name); + } + else if (sym_name_with_equal_p(name, len)) { + fprintf(fp, "MRB_SYM_E(%.*s)", (int)(len-1), name); + } + else if (sym_name_with_question_mark_p(name, len)) { + fprintf(fp, "MRB_SYM_Q(%.*s)", (int)(len-1), name); + } + else if (sym_name_with_bang_p(name, len)) { + fprintf(fp, "MRB_SYM_B(%.*s)", (int)(len-1), name); } - if (!mrb_sym_static_p(mrb, sym)) { - fprintf(stderr, "no static symbol (%s) - define presym\n", name); + else if (sym_name_ivar_p(name, len)) { + fprintf(fp, "MRB_IVSYM(%s)", name+1); } - fprintf(fp, "%d /* %s */,", sym, name); + else if (sym_name_cvar_p(name, len)) { + fprintf(fp, "MRB_CVSYM(%s)", name+2); + } + else if ((op_name = sym_operator_p(name, len))) { + fprintf(fp, "MRB_OPSYM(%s)", op_name); + } + else { + mrb_assert(var_name); + char buf[32]; + mrb_str_cat_lit(mrb, init_syms_code, " "); + mrb_str_cat_cstr(mrb, init_syms_code, var_name); + snprintf(buf, sizeof(buf), "[%d] = ", idx); + mrb_str_cat_cstr(mrb, init_syms_code, buf); + mrb_str_cat_lit(mrb, init_syms_code, "mrb_intern_lit(mrb, \""); + mrb_str_cat_cstr(mrb, init_syms_code, mrb_sym_dump(mrb, sym)); + mrb_str_cat_lit(mrb, init_syms_code, "\");\n"); + *presymp = FALSE; + fputs("0", fp); + } + fputs(", ", fp); return MRB_DUMP_OK; } +static const char* +sym_var_name(mrb_state *mrb, const char *initname, const char *key, int n) +{ + char buf[32]; + mrb_value s = mrb_str_new_cstr(mrb, initname); + mrb_str_cat_lit(mrb, s, "_"); + mrb_str_cat_cstr(mrb, s, key); + mrb_str_cat_lit(mrb, s, "_"); + snprintf(buf, sizeof(buf), "%d", n); + mrb_str_cat_cstr(mrb, s, buf); + return RSTRING_PTR(s); +} + static int -dump_irep_struct(mrb_state *mrb, const mrb_irep *irep, uint8_t flags, FILE *fp, const char *name, int n, int *mp) +dump_irep_struct(mrb_state *mrb, const mrb_irep *irep, uint8_t flags, FILE *fp, const char *name, int n, 
mrb_value init_syms_code, int *mp) { int i, len; int max = *mp; @@ -988,7 +1134,7 @@ dump_irep_struct(mrb_state *mrb, const mrb_irep *irep, uint8_t flags, FILE *fp, if (irep->reps) { for (i=0,len=irep->rlen; i<len; i++) { *mp += len; - if (dump_irep_struct(mrb, irep->reps[i], flags, fp, name, max+i, mp) != MRB_DUMP_OK) + if (dump_irep_struct(mrb, irep->reps[i], flags, fp, name, max+i, init_syms_code, mp) != MRB_DUMP_OK) return MRB_DUMP_INVALID_ARGUMENT; } fprintf(fp, "static const mrb_irep *%s_reps_%d[%d] = {\n", name, n, len); @@ -1009,12 +1155,19 @@ dump_irep_struct(mrb_state *mrb, const mrb_irep *irep, uint8_t flags, FILE *fp, } /* dump syms */ if (irep->syms) { + int ai = mrb_gc_arena_save(mrb); + const char *var_name = sym_var_name(mrb, name, "syms", n); + mrb_bool all_presym = TRUE, presym; len=irep->slen; - fprintf(fp, "static const mrb_sym %s_syms_%d[%d] = {", name, n, len); + fprintf(fp, "mrb_DEFINE_SYMS_VAR(%s, %d, (", var_name, len); for (i=0; i<len; i++) { - dump_sym(mrb, irep->syms[i], fp); + dump_sym(mrb, irep->syms[i], var_name, i, init_syms_code, fp, &presym); + all_presym &= presym; } - fputs("};\n", fp); + fputs("), ", fp); + if (all_presym) fputs("const", fp); + fputs(");\n", fp); + mrb_gc_arena_restore(mrb, ai); } /* dump iseq */ len=irep->ilen+sizeof(struct mrb_irep_catch_handler)*irep->clen; @@ -1029,7 +1182,7 @@ dump_irep_struct(mrb_state *mrb, const mrb_irep *irep, uint8_t flags, FILE *fp, len=irep->nlocals; fprintf(fp, "static const mrb_sym %s_lv_%d[%d] = {", name, n, len-1); for (i=0; i+1<len; i++) { - fprintf(fp, "%uU, ", irep->lv[i]); + dump_sym(mrb, irep->lv[i], NULL, 0, mrb_nil_value(), fp, NULL); } fputs("};\n", fp); } @@ -1070,20 +1223,28 @@ dump_irep_struct(mrb_state *mrb, const mrb_irep *irep, uint8_t flags, FILE *fp, int mrb_dump_irep_cstruct(mrb_state *mrb, const mrb_irep *irep, uint8_t flags, FILE *fp, const char *initname) { - int max = 1; - int n; - if (fp == NULL || initname == NULL || initname[0] == '\0') { return 
MRB_DUMP_INVALID_ARGUMENT; } if (fprintf(fp, "#include <mruby.h>\n" "#include <mruby/proc.h>\n\n") < 0) { return MRB_DUMP_WRITE_FAULT; } - n = dump_irep_struct(mrb, irep, flags, fp, initname, 0, &max); + fputs("#define mrb_BRACED(...) {__VA_ARGS__}\n", fp); + fputs("#define mrb_DEFINE_SYMS_VAR(name, len, syms, qualifier) \\\n", fp); + fputs(" static qualifier mrb_sym name[len] = mrb_BRACED syms\n", fp); + fputs("\n", fp); + mrb_value init_syms_code = mrb_str_new_capa(mrb, 0); + int max = 1; + int n = dump_irep_struct(mrb, irep, flags, fp, initname, 0, init_syms_code, &max); if (n != MRB_DUMP_OK) return n; fprintf(fp, "#ifdef __cplusplus\nextern const struct RProc %s[];\n#endif\n", initname); fprintf(fp, "const struct RProc %s[] = {{\n", initname); fprintf(fp, "NULL,NULL,MRB_TT_PROC,7,0,{&%s_irep_0},NULL,{NULL},\n}};\n", initname); + fputs("static void\n", fp); + fprintf(fp, "%s_init_syms(mrb_state *mrb)\n", initname); + fputs("{\n", fp); + fputs(RSTRING_PTR(init_syms_code), fp); + fputs("}\n", fp); return MRB_DUMP_OK; } diff --git a/src/symbol.c b/src/symbol.c index c78f41f63..58decc1f1 100644 --- a/src/symbol.c +++ b/src/symbol.c @@ -12,15 +12,19 @@ #include <mruby/dump.h> #include <mruby/class.h> -#undef MRB_PRESYM_MAX -#define MRB_PRESYM_NAMED(lit, num, type, name) {lit, sizeof(lit)-1}, -#define MRB_PRESYM_UNNAMED(lit, num) {lit, sizeof(lit)-1}, +#ifndef MRB_NO_PRESYM + +# undef MRB_PRESYM_MAX +# define MRB_PRESYM_NAMED(lit, num, type, name) {lit, sizeof(lit)-1}, +# define MRB_PRESYM_UNNAMED(lit, num) {lit, sizeof(lit)-1}, static const struct { const char *name; uint16_t len; } presym_table[] = { -#include <../build/presym.inc> +#ifndef MRB_PRESYM_SCANNING +# include <mruby/presym.inc> +#endif }; static mrb_sym @@ -51,6 +55,8 @@ presym_sym2name(mrb_sym sym, mrb_int *lenp) return presym_table[sym-1].name; } +#endif /* MRB_NO_PRESYM */ + /* ------------------------------------------------------ */ typedef struct symbol_name { mrb_bool lit : 1; @@ -147,9 +153,11 @@ 
find_symbol(mrb_state *mrb, const char *name, size_t len, uint8_t *hashp) symbol_name *sname; uint8_t hash; +#ifndef MRB_NO_PRESYM /* presym */ i = presym_find(name, len); if (i > 0) return i<<SYMBOL_SHIFT; +#endif /* inline symbol */ i = sym_inline_pack(name, len); @@ -306,10 +314,12 @@ sym2name_len(mrb_state *mrb, mrb_sym sym, char *buf, mrb_int *lenp) if (SYMBOL_INLINE_P(sym)) return sym_inline_unpack(sym, buf, lenp); sym >>= SYMBOL_SHIFT; +#ifndef MRB_NO_PRESYM { const char *name = presym_sym2name(sym, lenp); if (name) return name; } +#endif sym -= MRB_PRESYM_MAX; if (sym == 0 || mrb->symidx < sym) { |
