summaryrefslogtreecommitdiffhomepage
path: root/include
diff options
context:
space:
mode:
authorKOBAYASHI Shuji <[email protected]>2020-12-13 15:27:53 +0900
committerKOBAYASHI Shuji <[email protected]>2020-12-13 15:27:53 +0900
commit456878ba06358a77d4ab9312fdc69bf780f8fdf4 (patch)
tree2c96fd5289d87e9b464517e7d30d406db7c1d04e /include
parent116e128b1a103e2fb246cc9d53b82246b24dbd40 (diff)
downloadmruby-456878ba06358a77d4ab9312fdc69bf780f8fdf4.tar.gz
mruby-456878ba06358a77d4ab9312fdc69bf780f8fdf4.zip
Improve source scanning for presym
The accuracy is greatly improved by using the C preprocessor to scan C sources for presym. C preprocessor can perfectly interpret all comments and preprocessor directives, so it can detect all symbols defined, for example `mrbgems/mruby-socket/src/const.cstub`. Also, as described later, this change will greatly improve the accuracy of presym detection from Ruby sources. ## Result The number of lines in the `presym` file for all gems is as follows: ```console Previous: 999 (false positive = 89, undetected = 297) New: 1207 ``` ## Build process The new build process (with presym) is as follows: 1. Build `mrbc` without presym (more on building without presym later). 2. Compile Ruby sources to C struct format with the `mrbc` created in step 1, and create` mrblib.c` and `gem_init.c`. Note that the symbols in the created files are output as `MRB_SYM` family macros or `mrb_intern_lit` instead of IDs (details will be described later). 3. C preprocessor processes C sources including the created files of step 2 and outputs them as `.i` files. In these files, for example, `MRB_IVSYM(foo)` is converted to `<@! "@" "foo" !@>` and `mrb_define_module(mrb, "Foo")` is converted to `<@! "Foo" !@>`. 4. Scan the files created in step 3 and create `presym` and` presym.inc` files. The files created in step 2 should output all static symbols defined in Ruby sources, including local variables, so we can detect all presyms by just scanning C sources without scanning Ruby sources directly. Further, by this process, the files to be scanned becomes the same as the files to be compiled, so that there is no excess or deficiency. ## Related changes The following changes have been made in relation to realizing this feature. ### Allow build without presym It enables build without presym to achieve the "Build process: 1". This incorporates #5202, see its issue for details. Note that when presym is enabled, even adding a local variable to a Ruby source may change contents of presym and require recompilation of almost all C sources. This is inconvenient, especially during trial and error in development, but this feature is also useful because it does not cause this problem if presym is disabled. ### Automatically create build target for `mrbc` without presym The `mrbc` used in the "Build process: 1" will be built by automatically creating a build target for it. The build name is `SOURCE_BUILD_NAME/mrbc`. ### Constantize output of C struct format by `mrbc` To realizing the "Build process: 2", as mentioned above, symbol IDs are not output directly in C struct format output by `mrbc`. As a result, the output becomes constant regardless of the state of presym at the time of `mrbc` build, and it is possible to detect symbols of Ruby sources in the same way as other C sources. Note that `mrb_intern_lit` is used for symbols that do not become presym, but in this state, the corresponding element in the symbol array cannot be statically initialized, so it is initialized at run time (therefore, in this case, the `const` qualifier is not added to the symbol array). ### Specify arbitrary `mrbc` file To realizing the "Build process: 2", enabled to specify `mrbc` created by another build target or pre-built` mrbc`. Use `MRuby::Build#mrbcfile =` to specify it explicitly. You can omit the "Build process: 1" by specifying pre-built `mrbc`, and you can always use an optimized build to compile Ruby sources faster. I think changes that affect the output of `mrbc` are rare, so in many cases it helps to improve efficiency. With presym, the build will be a little slower due to more build steps, but this feature will improve it a bit. ### Create presym files for each build target This feature was proposed at #5194 and merged once, but was reverted in 5c205e6e due to problems especially with cross-compilation. It has been introduced again because this change solves the problem. The presym files will be created below. * `build/NAME/presym` * `build/NAME/include/mruby/presym.inc` ### Other changes * Because presym detection accuracy is greatly improved as mentioned above, `MRuby::Gem::Specification#cdump?` is set to true by default, and `disable_cdump` is added instead of `enable_cdump`. Also, support for gem specific presym files has been discontinued (https://github.com/mruby/mruby/issues/5151#issuecomment-730967232). * Previously, `mrbc` was automatically created for the `host` build, but it will not be created if the build target for `mrbc` mentioned above is automatically created. At this time, `mrbc` file of the `mrbc` build is copied to` bin/`. * Two types of `.d` files will be created, `.o.d` and `.i.d`. oThis is because if `.i` depends on `presym.inc`, the dependency will circulate, so the `.d` file cannot be shared. * Changed file created with `enable_cxx_exception` to `X-cxx.cxx` from `X.cxx` to use the mruby standard Rake rule. ### Note Almost all C sources will need to be recompiled if there are any changes to `persym.inc` (if not recompiled properly, it will often result in run-time error). If `gcc` toolchain is used, dependencies are resolved by the `.d` file, so it become automatically recompile target, but if not (e.g. MSVC), it is necessary to manually make it recompile target. Also, even if `gcc` toolchain is used, it may not become recompile target if external gems does not use the mruby standard Rake rule. In particular, if the standard rule is overwritten, such as https://github.com/mruby/mruby/pull/5112/files, `.d` file will not be read, so be careful.
Diffstat (limited to 'include')
-rw-r--r--include/mruby.h3
-rw-r--r--include/mruby/presym.h56
-rw-r--r--include/mruby/presym/disable.h68
-rw-r--r--include/mruby/presym/enable.h45
-rw-r--r--include/mruby/presym/scanning.h69
5 files changed, 209 insertions, 32 deletions
diff --git a/include/mruby.h b/include/mruby.h
index a5116f9ee..25e0bf287 100644
--- a/include/mruby.h
+++ b/include/mruby.h
@@ -93,7 +93,6 @@
#include <mruby/common.h>
#include <mruby/value.h>
#include <mruby/gc.h>
-#include <mruby/presym.h>
#include <mruby/version.h>
#ifndef MRB_NO_FLOAT
@@ -1410,6 +1409,8 @@ MRB_API void mrb_show_copyright(mrb_state *mrb);
MRB_API mrb_value mrb_format(mrb_state *mrb, const char *format, ...);
+#include <mruby/presym.h>
+
#if 0
/* memcpy and memset does not work with gdb reverse-next on my box */
/* use naive memcpy and memset instead */
diff --git a/include/mruby/presym.h b/include/mruby/presym.h
index fd08a24da..fd326c758 100644
--- a/include/mruby/presym.h
+++ b/include/mruby/presym.h
@@ -7,42 +7,36 @@
#ifndef MRUBY_PRESYM_H
#define MRUBY_PRESYM_H
-#undef MRB_PRESYM_MAX
-#ifdef MRB_USE_ALL_SYMBOLS
-#define MRB_PRESYM_NAMED(lit, num, type, name) MRB_##type##__##name = (num),
+#if defined(MRB_PRESYM_SCANNING)
+# include <mruby/presym/scanning.h>
+#elif defined(MRB_NO_PRESYM)
+# include <mruby/presym/disable.h>
#else
-#define MRB_PRESYM_NAMED(lit, num, type, name) MRB_##type##__##name = (num<<1),
+# include <mruby/presym/enable.h>
#endif
-#define MRB_PRESYM_UNNAMED(lit, num)
-
-enum mruby_presym {
-#include <../build/presym.inc>
-};
-
-#undef MRB_PRESYM_NAMED
-#undef MRB_PRESYM_UNNAMED
/*
- * For `MRB_OPSYM`, specify the names corresponding to operators (refer to
- * `op_table` in `Rakefile` for the names that can be specified for it).
- * Other than that, describe only word characters excluding leading and
- * ending punctuations.
+ * Where `mrb_intern_lit` is allowed for symbol interning, it is directly
+ * replaced by the symbol ID if presym is enabled by using the following
+ * macros.
+ *
+ * MRB_OPSYM(xor) //=> ^ (Operator)
+ * MRB_CVSYM(xor) //=> @@xor (Class Variable)
+ * MRB_IVSYM(xor) //=> @xor (Instance Variable)
+ * MRB_SYM_B(xor) //=> xor! (Method with Bang)
+ * MRB_SYM_Q(xor) //=> xor? (Method with Question mark)
+ * MRB_SYM_E(xor) //=> xor= (Method with Equal)
+ * MRB_SYM(xor) //=> xor (Word characters)
+ *
+ * For `MRB_OPSYM`, specify the names corresponding to operators (see
+ * `MRuby::Presym::OPERATORS` in `lib/mruby/presym.rb for the names that can
+ * be specified for it). Other than that, describe only word characters
+ * excluding leading and ending punctuations.
*
- * Example:
- * MRB_OPSYM(and) //=> &
- * MRB_CVSYM(foo) //=> @@foo
- * MRB_IVSYM(foo) //=> @foo
- * MRB_SYM_B(foo) //=> foo!
- * MRB_SYM_Q(foo) //=> foo?
- * MRB_SYM_E(foo) //=> foo=
- * MRB_SYM(foo) //=> foo
+ * These macros are expanded to `mrb_intern_lit` if presym is disabled,
+ * therefore the mruby state variable is required. The above macros can be
+ * used when the variable name is `mrb`. If you want to use other variable
+ * names, you need to use macros with `_2` suffix, such as `MRB_SYM_2`.
*/
-#define MRB_OPSYM(name) MRB_OPSYM__##name /* Operator */
-#define MRB_CVSYM(name) MRB_CVSYM__##name /* Class Variable */
-#define MRB_IVSYM(name) MRB_IVSYM__##name /* Instance Variable */
-#define MRB_SYM_B(name) MRB_SYM_B__##name /* Method with Bang */
-#define MRB_SYM_Q(name) MRB_SYM_Q__##name /* Method with Question mark */
-#define MRB_SYM_E(name) MRB_SYM_E__##name /* Method with Equal */
-#define MRB_SYM(name) MRB_SYM__##name /* Word characters */
#endif /* MRUBY_PRESYM_H */
diff --git a/include/mruby/presym/disable.h b/include/mruby/presym/disable.h
new file mode 100644
index 000000000..477405225
--- /dev/null
+++ b/include/mruby/presym/disable.h
@@ -0,0 +1,68 @@
+/**
+** @file mruby/presym/scanning.h - Disable Preallocated Symbols
+**
+** See Copyright Notice in mruby.h
+*/
+
+#ifndef MRUBY_PRESYM_DISABLE_H
+#define MRUBY_PRESYM_DISABLE_H
+
+#include <string.h>
+
+#define MRB_PRESYM_MAX 0
+
+#define MRB_OPSYM(name) MRB_OPSYM__##name(mrb)
+#define MRB_CVSYM(name) mrb_intern_lit(mrb, "@@" #name)
+#define MRB_IVSYM(name) mrb_intern_lit(mrb, "@" #name)
+#define MRB_SYM_B(name) mrb_intern_lit(mrb, #name "!")
+#define MRB_SYM_Q(name) mrb_intern_lit(mrb, #name "?")
+#define MRB_SYM_E(name) mrb_intern_lit(mrb, #name "=")
+#define MRB_SYM(name) mrb_intern_lit(mrb, #name)
+
+#define MRB_OPSYM_2(mrb, name) MRB_OPSYM__##name(mrb)
+#define MRB_CVSYM_2(mrb, name) mrb_intern_lit(mrb, "@@" #name)
+#define MRB_IVSYM_2(mrb, name) mrb_intern_lit(mrb, "@" #name)
+#define MRB_SYM_B_2(mrb, name) mrb_intern_lit(mrb, #name "!")
+#define MRB_SYM_Q_2(mrb, name) mrb_intern_lit(mrb, #name "?")
+#define MRB_SYM_E_2(mrb, name) mrb_intern_lit(mrb, #name "=")
+#define MRB_SYM_2(mrb, name) mrb_intern_lit(mrb, #name)
+
+#define MRB_OPSYM__not(mrb) mrb_intern_lit(mrb, "!")
+#define MRB_OPSYM__mod(mrb) mrb_intern_lit(mrb, "%")
+#define MRB_OPSYM__and(mrb) mrb_intern_lit(mrb, "&")
+#define MRB_OPSYM__mul(mrb) mrb_intern_lit(mrb, "*")
+#define MRB_OPSYM__add(mrb) mrb_intern_lit(mrb, "+")
+#define MRB_OPSYM__sub(mrb) mrb_intern_lit(mrb, "-")
+#define MRB_OPSYM__div(mrb) mrb_intern_lit(mrb, "/")
+#define MRB_OPSYM__lt(mrb) mrb_intern_lit(mrb, "<")
+#define MRB_OPSYM__gt(mrb) mrb_intern_lit(mrb, ">")
+#define MRB_OPSYM__xor(mrb) mrb_intern_lit(mrb, "^")
+#define MRB_OPSYM__tick(mrb) mrb_intern_lit(mrb, "`")
+#define MRB_OPSYM__or(mrb) mrb_intern_lit(mrb, "|")
+#define MRB_OPSYM__neg(mrb) mrb_intern_lit(mrb, "~")
+#define MRB_OPSYM__neq(mrb) mrb_intern_lit(mrb, "!=")
+#define MRB_OPSYM__nmatch(mrb) mrb_intern_lit(mrb, "!~")
+#define MRB_OPSYM__andand(mrb) mrb_intern_lit(mrb, "&&")
+#define MRB_OPSYM__pow(mrb) mrb_intern_lit(mrb, "**")
+#define MRB_OPSYM__plus(mrb) mrb_intern_lit(mrb, "+@")
+#define MRB_OPSYM__minus(mrb) mrb_intern_lit(mrb, "-@")
+#define MRB_OPSYM__lshift(mrb) mrb_intern_lit(mrb, "<<")
+#define MRB_OPSYM__le(mrb) mrb_intern_lit(mrb, "<=")
+#define MRB_OPSYM__eq(mrb) mrb_intern_lit(mrb, "==")
+#define MRB_OPSYM__match(mrb) mrb_intern_lit(mrb, "=~")
+#define MRB_OPSYM__ge(mrb) mrb_intern_lit(mrb, ">=")
+#define MRB_OPSYM__rshift(mrb) mrb_intern_lit(mrb, ">>")
+#define MRB_OPSYM__aref(mrb) mrb_intern_lit(mrb, "[]")
+#define MRB_OPSYM__oror(mrb) mrb_intern_lit(mrb, "||")
+#define MRB_OPSYM__cmp(mrb) mrb_intern_lit(mrb, "<=>")
+#define MRB_OPSYM__eqq(mrb) mrb_intern_lit(mrb, "===")
+#define MRB_OPSYM__aset(mrb) mrb_intern_lit(mrb, "[]=")
+
+#define MRB_PRESYM_DEFINE_VAR_AND_INITER(name, size, ...) \
+ static mrb_sym name[size]; \
+ static void init_##name(mrb_state *mrb) { \
+ mrb_sym name__[] = {__VA_ARGS__}; \
+ memcpy(name, name__, sizeof(name)); \
+ }
+
+#endif /* MRUBY_PRESYM_DISABLE_H */
diff --git a/include/mruby/presym/enable.h b/include/mruby/presym/enable.h
new file mode 100644
index 000000000..0aec7274d
--- /dev/null
+++ b/include/mruby/presym/enable.h
@@ -0,0 +1,45 @@
+/**
+** @file mruby/presym/scanning.h - Enable Preallocated Symbols
+**
+** See Copyright Notice in mruby.h
+*/
+
+#ifndef MRUBY_PRESYM_ENABLE_H
+#define MRUBY_PRESYM_ENABLE_H
+
+#undef MRB_PRESYM_MAX
+#ifdef MRB_USE_ALL_SYMBOLS
+# define MRB_PRESYM_NAMED(lit, num, type, name) MRB_##type##__##name = (num),
+#else
+# define MRB_PRESYM_NAMED(lit, num, type, name) MRB_##type##__##name = (num<<1),
+#endif
+#define MRB_PRESYM_UNNAMED(lit, num)
+
+enum mruby_presym {
+# include <mruby/presym.inc>
+};
+
+#undef MRB_PRESYM_NAMED
+#undef MRB_PRESYM_UNNAMED
+
+#define MRB_OPSYM(name) MRB_OPSYM__##name
+#define MRB_CVSYM(name) MRB_CVSYM__##name
+#define MRB_IVSYM(name) MRB_IVSYM__##name
+#define MRB_SYM_B(name) MRB_SYM_B__##name
+#define MRB_SYM_Q(name) MRB_SYM_Q__##name
+#define MRB_SYM_E(name) MRB_SYM_E__##name
+#define MRB_SYM(name) MRB_SYM__##name
+
+#define MRB_OPSYM_2(mrb, name) MRB_OPSYM__##name
+#define MRB_CVSYM_2(mrb, name) MRB_CVSYM__##name
+#define MRB_IVSYM_2(mrb, name) MRB_IVSYM__##name
+#define MRB_SYM_B_2(mrb, name) MRB_SYM_B__##name
+#define MRB_SYM_Q_2(mrb, name) MRB_SYM_Q__##name
+#define MRB_SYM_E_2(mrb, name) MRB_SYM_E__##name
+#define MRB_SYM_2(mrb, name) MRB_SYM__##name
+
+#define MRB_PRESYM_DEFINE_VAR_AND_INITER(name, size, ...) \
+ static const mrb_sym name[] = {__VA_ARGS__}; \
+ static void init_##name(mrb_state *mrb) {}
+
+#endif /* MRUBY_PRESYM_ENABLE_H */
diff --git a/include/mruby/presym/scanning.h b/include/mruby/presym/scanning.h
new file mode 100644
index 000000000..11a3ba312
--- /dev/null
+++ b/include/mruby/presym/scanning.h
@@ -0,0 +1,69 @@
+/**
+** @file mruby/presym/scanning.h - Scanning Preallocated Symbols
+**
+** See Copyright Notice in mruby.h
+*/
+
+#ifndef MRUBY_PRESYM_SCANNING_H
+#define MRUBY_PRESYM_SCANNING_H
+
+#define MRB_PRESYM_SCANNING_TAGGED(arg) <@! arg !@>
+
+#undef mrb_intern_lit
+#define mrb_intern_lit(mrb, name) MRB_PRESYM_SCANNING_TAGGED(name)
+#define mrb_define_method(mrb, c, name, f, a) MRB_PRESYM_SCANNING_TAGGED(name)
+#define mrb_define_class_method(mrb, c, name, f, a) MRB_PRESYM_SCANNING_TAGGED(name)
+#define mrb_define_class(mrb, name, s) MRB_PRESYM_SCANNING_TAGGED(name)
+#define mrb_define_module(mrb, name) MRB_PRESYM_SCANNING_TAGGED(name)
+#define mrb_define_module_function(mrb, c, name, f, s) MRB_PRESYM_SCANNING_TAGGED(name)
+#define mrb_define_const(mrb, c, name, v) MRB_PRESYM_SCANNING_TAGGED(name)
+#define mrb_define_global_const(mrb, name, v) MRB_PRESYM_SCANNING_TAGGED(name)
+
+#define MRB_OPSYM(name) MRB_OPSYM__##name(mrb)
+#define MRB_CVSYM(name) MRB_PRESYM_SCANNING_TAGGED("@@" #name)
+#define MRB_IVSYM(name) MRB_PRESYM_SCANNING_TAGGED("@" #name)
+#define MRB_SYM_B(name) MRB_PRESYM_SCANNING_TAGGED(#name "!")
+#define MRB_SYM_Q(name) MRB_PRESYM_SCANNING_TAGGED(#name "?")
+#define MRB_SYM_E(name) MRB_PRESYM_SCANNING_TAGGED(#name "=")
+#define MRB_SYM(name) MRB_PRESYM_SCANNING_TAGGED(#name)
+
+#define MRB_OPSYM_2(mrb, name) MRB_OPSYM__##name(mrb)
+#define MRB_CVSYM_2(mrb, name) MRB_PRESYM_SCANNING_TAGGED("@@" #name)
+#define MRB_IVSYM_2(mrb, name) MRB_PRESYM_SCANNING_TAGGED("@" #name)
+#define MRB_SYM_B_2(mrb, name) MRB_PRESYM_SCANNING_TAGGED(#name "!")
+#define MRB_SYM_Q_2(mrb, name) MRB_PRESYM_SCANNING_TAGGED(#name "?")
+#define MRB_SYM_E_2(mrb, name) MRB_PRESYM_SCANNING_TAGGED(#name "=")
+#define MRB_SYM_2(mrb, name) MRB_PRESYM_SCANNING_TAGGED(#name)
+
+#define MRB_OPSYM__not(mrb) MRB_PRESYM_SCANNING_TAGGED("!")
+#define MRB_OPSYM__mod(mrb) MRB_PRESYM_SCANNING_TAGGED("%")
+#define MRB_OPSYM__and(mrb) MRB_PRESYM_SCANNING_TAGGED("&")
+#define MRB_OPSYM__mul(mrb) MRB_PRESYM_SCANNING_TAGGED("*")
+#define MRB_OPSYM__add(mrb) MRB_PRESYM_SCANNING_TAGGED("+")
+#define MRB_OPSYM__sub(mrb) MRB_PRESYM_SCANNING_TAGGED("-")
+#define MRB_OPSYM__div(mrb) MRB_PRESYM_SCANNING_TAGGED("/")
+#define MRB_OPSYM__lt(mrb) MRB_PRESYM_SCANNING_TAGGED("<")
+#define MRB_OPSYM__gt(mrb) MRB_PRESYM_SCANNING_TAGGED(">")
+#define MRB_OPSYM__xor(mrb) MRB_PRESYM_SCANNING_TAGGED("^")
+#define MRB_OPSYM__tick(mrb) MRB_PRESYM_SCANNING_TAGGED("`")
+#define MRB_OPSYM__or(mrb) MRB_PRESYM_SCANNING_TAGGED("|")
+#define MRB_OPSYM__neg(mrb) MRB_PRESYM_SCANNING_TAGGED("~")
+#define MRB_OPSYM__neq(mrb) MRB_PRESYM_SCANNING_TAGGED("!=")
+#define MRB_OPSYM__nmatch(mrb) MRB_PRESYM_SCANNING_TAGGED("!~")
+#define MRB_OPSYM__andand(mrb) MRB_PRESYM_SCANNING_TAGGED("&&")
+#define MRB_OPSYM__pow(mrb) MRB_PRESYM_SCANNING_TAGGED("**")
+#define MRB_OPSYM__plus(mrb) MRB_PRESYM_SCANNING_TAGGED("+@")
+#define MRB_OPSYM__minus(mrb) MRB_PRESYM_SCANNING_TAGGED("-@")
+#define MRB_OPSYM__lshift(mrb) MRB_PRESYM_SCANNING_TAGGED("<<")
+#define MRB_OPSYM__le(mrb) MRB_PRESYM_SCANNING_TAGGED("<=")
+#define MRB_OPSYM__eq(mrb) MRB_PRESYM_SCANNING_TAGGED("==")
+#define MRB_OPSYM__match(mrb) MRB_PRESYM_SCANNING_TAGGED("=~")
+#define MRB_OPSYM__ge(mrb) MRB_PRESYM_SCANNING_TAGGED(">=")
+#define MRB_OPSYM__rshift(mrb) MRB_PRESYM_SCANNING_TAGGED(">>")
+#define MRB_OPSYM__aref(mrb) MRB_PRESYM_SCANNING_TAGGED("[]")
+#define MRB_OPSYM__oror(mrb) MRB_PRESYM_SCANNING_TAGGED("||")
+#define MRB_OPSYM__cmp(mrb) MRB_PRESYM_SCANNING_TAGGED("<=>")
+#define MRB_OPSYM__eqq(mrb) MRB_PRESYM_SCANNING_TAGGED("===")
+#define MRB_OPSYM__aset(mrb) MRB_PRESYM_SCANNING_TAGGED("[]=")
+
+#endif /* MRUBY_PRESYM_SCANNING_H */