Discussion:
Consistent bugs with dmd -O -inline in a large project
Lumi Pakkanen via Digitalmars-d
2014-10-12 15:44:13 UTC
I'm creating a somewhat large hobby project with D. I'm enjoying
the ride so far. Unit tests and contract programming have saved
me from long bug hunts, but today I ran into a bug that seems to
be caused by the -O and -inline flags with dmd.

Without the flags the program runs correctly, but -O produces
wrong results consistently and -inline seems to cause memory
corruption.

Now my problem here is that the program has over 5000 lines of
code with interdependencies running everywhere, so I'm not sure if
it's possible to come up with a neat small program that
demonstrates the problem for a bug report.

What should I do? Am I stuck with not using -O and -inline for
now, hoping that things will improve in the future?
ketmar via Digitalmars-d
2014-10-12 15:55:35 UTC
On Sun, 12 Oct 2014 15:44:13 +0000
Post by Lumi Pakkanen via Digitalmars-d
What should I do?
have you tried DustMite? https://github.com/CyberShadow/DustMite/wiki
it may help you get a reduced test case.
Post by Lumi Pakkanen via Digitalmars-d
Am I stuck with not using -O and -inline for
now, hoping that things will improve in the future?
yep. people are trying hard to squash the bugs in the optimiser, but
the optimiser is a complex beast, so it's not easy. a dustmite'd test
case (if you're able to produce one) can help a lot, though.
Mike Parker via Digitalmars-d
2014-10-12 15:56:15 UTC
You might try to reduce it with DustMite:

https://github.com/CyberShadow/DustMite/wiki

ketmar via Digitalmars-d
2014-10-12 16:01:26 UTC
On Sun, 12 Oct 2014 15:44:13 +0000
Post by Lumi Pakkanen via Digitalmars-d
What should I do? Am I stuck with not using -O and -inline for
now, hoping that things will improve in the future?
p.s. you can try ldc/gdc too. they use different codegens, and their
optimisers are better than dmd's. however, they lag behind dmd's
frontend/phobos, so some recently added or fixed things may not work
with them. still worth a try, i think.
Chris via Digitalmars-d
2014-10-13 09:17:59 UTC
I have the same problem. If I don't use -O, it works fine (-inline
is OK). If I do use it, I get an error when running the program:

Error executing command run: Program exited with code -11

or

Segmentation fault (core dumped)

I posted here a few months ago, but to no avail. I still haven't
found the answer to the problem. As in your case, my project has
become too big to just "try to trace the bug".
Gary Willoughby via Digitalmars-d
2014-10-13 13:11:12 UTC
I had the same problem about a year ago. I thought I was going
crazy, refactored the program, and it went away. I never did find
out what was causing it.
OlaOst via Digitalmars-d
2014-10-13 14:28:49 UTC
Here too. I just managed to pare it down to 2 files:

-- main.d --
import std.algorithm;
import failsinline;

void main()
{
    auto fail = new FailsInline();
}
-- main.d --

-- failsinline.d --
import std.algorithm;
import std.array;

void failsinline()
{
    auto transform = (int i) => i;
    [0].map!transform.array;
}
-- failsinline.d --

'rdmd main.d' works fine.
'rdmd -inline main.d' gives object.Error@(0): Access Violation.

Removing the std.algorithm import from main.d makes it work fine.

Same issue in dmd 2.066, 2.066-rc2 and 2.067-b1.
ketmar via Digitalmars-d
2014-10-13 14:53:13 UTC
care to file a bug report?
OlaOst via Digitalmars-d
2014-10-13 15:09:42 UTC
Added to https://issues.dlang.org/show_bug.cgi?id=13244
Chris via Digitalmars-d
2014-10-16 08:45:17 UTC
I think there is no easy way of finding out where the
optimization goes wrong. But should this happen at all, i.e. does
it point to a flaw in my program or is it a compiler bug? I'd like
to think it's the latter; after all, the program works perfectly
without -O. On the other hand, it's scary because I have no clue
where to look for the culprit.
Peter Alexander via Digitalmars-d
2014-10-16 10:25:10 UTC
It could be either.

Sometimes, if your program relies on undefined behaviour, enabling
optimizations might be what uncovers the bug, manifesting as a
crash.

On the other hand, it could just be a compiler bug. It has
happened to me several times with DMD, so it's not entirely
unlikely. These things happen.

Run DustMite, reduce, and if you still think your program is
right, file a bug against DMD.
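A minimal sketch of the kind of latent program bug described above, using a hypothetical `elementAt` helper with an off-by-one read. It uses bounds checking, one of the safety nets that the -release and -noboundscheck flags later in this thread remove, rather than the optimizer itself, but the effect is similar: a bug that a default build catches becomes undefined behaviour in a release build, where it may appear to work, return garbage, or crash.

```d
import core.exception : RangeError;
import std.exception : assertThrown;

// Hypothetical helper: reads past the end when i == a.length.
int elementAt(const(int)[] a, size_t i)
{
    // With default flags this index is bounds-checked and throws RangeError.
    // With -noboundscheck (or -release, for @system code like this) the check
    // is compiled out and an out-of-range read is undefined behaviour.
    return a[i];
}

void main()
{
    int[] data = [1, 2, 3];
    assert(elementAt(data, 2) == 3);              // in bounds
    assertThrown!RangeError(elementAt(data, 3));  // caught only while checks are on
}
```

Running a failing release build's test suite once with bounds checks enabled is a cheap way to rule out this class of bug before suspecting the compiler.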
Sag Academy via Digitalmars-d
2014-10-16 11:04:09 UTC
Maybe it is right.
Chris via Digitalmars-d
2014-10-16 11:54:08 UTC
Ok, I've found the flaw in my program. It's code that was left
over after refactoring some modules. It looks like this
(simplified):

I import a module and access an enum in that module. However, I
never use the accessed element:

module politician.answer;

enum { Statement = "Blah" }

mixin template PressConference() {
    int STATEMENT_LEN;
    // ...
}

-------------

module press.article;

import politician.answer;

// Enclosing class name assumed; the post only mentions "the class constructor".
class Article {
    mixin PressConference;

    this() {
        STATEMENT_LEN = Statement.length; // Not good! Leftover code from refactoring.
    }
}

Even worse, I never use STATEMENT_LEN in this class. The whole
logic is nonsense and is due to my not cleaning up the
constructor.

So the optimizer optimized this away, seeing that it is never
used, but it is still accessed in the class constructor.

My question now is: shouldn't the optimizer have noticed that it
is still being accessed? Or what did the optimizer actually do?

This helped me to find "dead code" at least.
Chris via Digitalmars-d
2014-10-16 13:35:53 UTC
Update on the above: I actually do use the variable STATEMENT_LEN
later in the mixed-in code. This escapes the optimizer.

mixin template PressConference() {
    int STATEMENT_LEN;
    // ...
    void someFunction() {
        // uses STATEMENT_LEN
    }
}

Hm.
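A self-contained sketch of the situation in this update, reusing the names from the simplified snippet (`Article` and the body of `someFunction` are hypothetical). A mixin template's declarations are pasted into the class that mixes it in, so the store in the constructor is observable through `someFunction()`, and a correct optimizer is not allowed to drop it.

```d
import std.stdio : writeln;

enum { Statement = "Blah" }

// Declarations inside a mixin template are copied into the mixing scope.
mixin template PressConference()
{
    int STATEMENT_LEN;

    // Reads the field that the enclosing class's constructor writes.
    int someFunction() const
    {
        return STATEMENT_LEN;
    }
}

// Hypothetical stand-in for the class in press.article.
class Article
{
    mixin PressConference;

    this()
    {
        // A live store: someFunction() reads it later.
        STATEMENT_LEN = Statement.length;
    }
}

void main()
{
    auto article = new Article();
    writeln(article.someFunction()); // prints 4
}
```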
Chris via Digitalmars-d
2014-10-16 13:54:07 UTC
If I compile with

-release -noboundscheck -inline

(but without -O), I get this error:

Internal error: backend\cod4.c 358

If I compile with

-O -release -noboundscheck -inline

It compiles, but crashes.

The only thing that works is:

-release -noboundscheck

What is the optimizer optimizing away?
Chris via Digitalmars-d
2014-10-16 14:46:24 UTC
Ok. It was the compiler. To reproduce the error, I wrote a small
example:

import std.stdio;
import std.algorithm : sort;

enum {
    Answers = [
        "Are you corrupt?" : "No!",
        "Will you resign?" : "No!"
    ]
}

void main() {
    auto journalist = new myClass;
    journalist.printAnswers();
}

class myClass {
    mixin News;
    this() {
        Questions = Answers.keys();
        // Only here to do what my program does
        sort!((a, b) => a.length > b.length)(Questions);
    }

    protected void printAnswers() {
        foreach (q; Questions) {
            writefln("Q: %s\nA: %s", q, getAnswer(q));
        }
    }
}

mixin template News() {
    string[] Questions;

    auto getAnswer(string q) {
        return Answers[q];
    }
}

[version 2.065]
$ dmd optimizer.d -O -release -inline -noboundscheck
$ ./optimizer
Segmentation fault (core dumped)

[version 2.066]
$ dmd optimizer.d -O -release -inline -noboundscheck
$ ./optimizer
Q: Are you corrupt?
A: No!
Q: Will you resign?
A: No!
$

Sorry, I couldn't try it with 2.066 first, because I still have
to update my code base.
Daniel Murphy via Digitalmars-d
2014-10-13 14:28:59 UTC
There are a few techniques to try and track this sort of thing down.

0. Build dmd from the latest master and see if it works (if you haven't
done this already). The bug may have been fixed.

1. As others have suggested, run DustMite on your code. It's magical.

2. Do a binary search, compiling some modules without -inline (or,
likewise, without -O). Then do the same with functions within the
module, moving them to another module (or using a .d/.di split) to
prevent inlining. Once the problematic caller is found, disable inlining
of its potentially problematic callees by adding asm { nop; } or similar
to their bodies.

3. Spend some quality time with a debugger and a disassembler, tracing back
from the fault to find out where it all went wrong. This is more
difficult, but still possible, if the call stack is corrupted. This could
be the fastest or the slowest method depending on your luck. Usual
debugging tools like valgrind may be a huge help.

4. Switching word size (-m32/-m64) may make the problem go away, if that's
an option for your project.
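A minimal sketch of the asm { nop; } trick from step 2, assuming (as the step implies) that DMD's inliner leaves any function containing inline assembly out of line. `suspectCallee` and `caller` are hypothetical names; the `version` guards keep the sketch compiling on targets without DMD-style x86 inline asm, where the trick simply does nothing.

```d
import std.stdio : writeln;

// Hypothetical callee suspected of being mis-inlined. The otherwise useless
// asm block is there only so the inliner keeps this function out of line.
int suspectCallee(int x)
{
    version (D_InlineAsm_X86_64)
    {
        asm { nop; }
    }
    else version (D_InlineAsm_X86)
    {
        asm { nop; }
    }
    return x * 2;
}

// Hypothetical caller located by the module-level binary search.
int caller(int y)
{
    return suspectCallee(y) + 1;
}

void main()
{
    writeln(caller(20)); // prints 41
}
```

If the -inline failure disappears once a particular callee carries the nop, that callee is a good starting point for a reduced test case. Later DMD releases also accept `pragma(inline, false)` for the same purpose.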
Trass3r via Digitalmars-d
2014-10-16 21:23:08 UTC
Post by Lumi Pakkanen via Digitalmars-d
What should I do? Am I stuck with not using -O and -inline for
now, hoping that things will improve in the future?
Step 1) DustMite the heck out of it and create a bug report.
Step 2) Start using ldc/gdc for release builds if possible.
Chris via Digitalmars-d
2014-10-16 21:53:40 UTC
I had planned to use GDC/LDC too, but GDC is 2.064, so that is not
an option for me. LDC is 2.065, which would still be OK for my
program (although I've just updated my code to 2.066). I always use
DMD for development (short compilation times), and I even use DMD
for release builds when I build the program to hand out for
initial testing. Andrei mentioned that DMD-built programs are
only around 10% slower than GDC/LDC builds; in many cases I can
put up with that. So DMD should not have any serious issues with
release builds, imo, even if alternatives exist.

Next time I'll use DustMite too, once I've learned how to use it
properly. However, I managed to find the place where things went
wrong quite fast by using the oldest (and sometimes still the
best) debugging tool: inserting writeln() statements along the
path of initializations/routines in the program.
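A minimal sketch of that writeln() approach, using a hypothetical `trace` helper. Two details make it more useful for crash hunting: `__FILE__` and `__LINE__` used as default arguments are filled in at each call site, and the output goes to stderr with an explicit flush, because buffered stdout output (for example when redirected to a file) can be lost when the process dies from a segmentation fault.

```d
import std.format : format;
import std.stdio : stderr;

// Hypothetical tracing helper: reports the caller's location and a message.
string trace(string msg = "", string file = __FILE__, size_t line = __LINE__)
{
    auto text = format("TRACE %s:%s %s", file, line, msg);
    // Flush immediately so the last line printed really is the last
    // point the program reached before crashing.
    stderr.writeln(text);
    stderr.flush();
    return text;
}

void main()
{
    trace("before constructor");
    // ... suspect initialization code ...
    trace("after constructor");
}
```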
ketmar via Digitalmars-d
2014-10-16 22:01:27 UTC
On Thu, 16 Oct 2014 21:53:40 +0000
Post by Chris via Digitalmars-d
I had planned to use GDC/LDC too, but GDC is 2.064
GDC is 2.065.
Post by Chris via Digitalmars-d
Andrei mentioned that DMD built programs are
only around 10% slower than GDC/LDC builds
it depends on the task. my voxel renderer runs at a miserable 15 FPS
with dmd -O -inline, yet at a much more acceptable 40 FPS with
gdc -O2.

but it's a specific task; much other software runs at reasonable
speed.
Iain Buclaw via Digitalmars-d
2014-10-17 07:02:42 UTC
On 16 Oct 2014 23:01, "ketmar via Digitalmars-d"
Post by ketmar via Digitalmars-d
On Thu, 16 Oct 2014 21:53:40 +0000
Post by Chris via Digitalmars-d
I had planned to use GDC/LDC too, but GDC is 2.064
GDC is 2.065.
And soon to be 2.066, once I apply the last 244 patches made between
May and the final release date.

Iain.
Chris via Digitalmars-d
2014-10-17 08:54:28 UTC
Thanks. Good to know.
Chris via Digitalmars-d
2014-10-17 08:53:45 UTC
On Thursday, 16 October 2014 at 22:01:37 UTC, ketmar via
Post by ketmar via Digitalmars-d
On Thu, 16 Oct 2014 21:53:40 +0000
Post by Chris via Digitalmars-d
I had planned to use GDC/LDC too, but GDC is 2.064
GDC is 2.065.
But why does it say 2.064 on http://dlang.org/download.html?
Iain Buclaw via Digitalmars-d
2014-10-17 09:15:14 UTC
On 17 October 2014 09:53, Chris via Digitalmars-d
On Thursday, 16 October 2014 at 22:01:37 UTC, ketmar via Digitalmars-d
Post by ketmar via Digitalmars-d
On Thu, 16 Oct 2014 21:53:40 +0000
Post by Chris via Digitalmars-d
I had planned to use GDC/LDC too, but GDC is 2.064
GDC is 2.065.
But why does it say 2.064 here http://dlang.org/download.html?
dlang.org is not like a wiki. If I were to send a PR to change that
to 2.065, the site probably wouldn't be updated until the 2.067 release,
by which point that information would be wrong again.

Iain
Chris via Digitalmars-d
2014-10-17 09:40:36 UTC
I see, I see. But that should really be updated on the D
homepage. After all, it's the first port of call for D
programmers. If I cannot trust the information there ...
