Hello kitties,
I am creating this topic to document my research regarding better code generation for CC65 as mentioned in other recently active threads on the topic.
As a first, simple approach I am focusing on fixing and enhancing opt65, a simple 6502 peephole optimizer which apparently sometimes breaks code when applying optimizations.
My first goals are to:
* fix opt65
* possibly add some more optimizations which it currently misses
* compare with the best of lcc65's output modes
I have not (yet!) encountered any optimization bugs in opt65 but I only tested it on small code samples (all coming from lcc65's output).
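For readers unfamiliar with the term, here is a minimal sketch of what a single peephole rule can look like. It is not taken from opt65: the function names and the one rule it implements (an immediate load into A that is immediately overwritten by another load) are assumptions chosen for illustration. It only recognises lowercase mnemonics with no label on the same line, and leaves anything else untouched.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Nonzero if `line` is an instruction whose mnemonic is `mn`
 * (leading blanks ignored, mnemonic followed by a blank or end of line). */
static int has_mnemonic(const char *line, const char *mn)
{
    size_t n = strlen(mn);
    while (*line == ' ' || *line == '\t') line++;
    return strncmp(line, mn, n) == 0 &&
           (line[n] == '\0' || isspace((unsigned char)line[n]));
}

/* Nonzero if `line` is "lda #<something>", i.e. an immediate load. */
static int is_lda_immediate(const char *line)
{
    if (!has_mnemonic(line, "lda")) return 0;
    while (*line == ' ' || *line == '\t') line++;
    line += 3;                                   /* skip the mnemonic */
    while (*line == ' ' || *line == '\t') line++;
    return *line == '#';
}

/* One peephole pass (hypothetical rule): drop "lda #imm" when the very next
 * line is another lda. Compacts `lines` in place and returns the new count. */
size_t peephole_dead_load(const char **lines, size_t count)
{
    size_t in, out = 0;
    assert(lines != NULL || count == 0);
    for (in = 0; in < count; in++) {
        if (in + 1 < count &&
            is_lda_immediate(lines[in]) &&
            has_mnemonic(lines[in + 1], "lda")) {
            continue;   /* dead load: A and the N/Z flags are overwritten */
        }
        lines[out++] = lines[in];
    }
    return out;
}
```

The rule is deliberately restricted to immediate loads: removing a load from memory could change behaviour if the address is a hardware register, which is the kind of subtlety that can make a peephole optimizer break code.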
After reading DBug's suggestion of using lcc65's "O3" optimization level I did a few tests which gave interesting results.
So here they are:
Left: lcc65 with -O2, Right: lcc65 with -O3
Left: lcc65 with -O2, Right: lcc65 + opt65 pass (default options)
Now a comparison between lcc65's best output (-O3) and opt65:
Left: lcc65 with -O3, Right: lcc65 + opt65 pass (default options)
As you can see, on this limited sample lcc65's best output essentially matches that of opt65.
I will be testing on bigger samples soon.
Better code generation (for CC65)
- NekoNoNiaow
- Flight Lieutenant
- Posts: 272
- Joined: Sun Jan 15, 2006 10:08 pm
- Location: Montreal, Canadia
Re: Better code generation (for CC65)
That's actually a nice difference, especially in terms of code size.
Maybe everybody should try to rebuild their programs with -O3 and see if they notice any bugs. There used to be some code generation errors, which is why it was not enabled by default, but if it works, that could be a good first step.
Re: Better code generation (for CC65)
@NekoNoNiaow: I couldn't tell from the pictures: is there any difference between the -O3 and opt65 output?
Also, I've tested the -O3 option and there is a real benefit in size. No bugs so far, but I've only compiled 3-4 samples...
- NekoNoNiaow
Re: Better code generation (for CC65)
Dbug wrote: ↑Tue Feb 19, 2019 9:15 am
That's actually a nice difference, specially in term of code size.
Maybe everybody should try to rebuild their programs in -O3 and see if they notice any bugs, because there used to be some code generation errors, which is why it was not enabled by default, but if it works, that could be a good first step.
What I find interesting is that the improvements brought by -O3 and opt65 are very close to one another, which leads me to think that they might actually be somewhat related. This is just an intuition though since I have not yet looked at lcc65's source code but I will eventually get there (more on that in future posts).
It is probably too early to encourage people to use -O3, though, since it seems unlikely that the issues you guys found a few years ago have mysteriously vanished.
I am planning to adapt the test suite of other C compilers (I know GCC has one) to verify that the optimizations are sound. Moreover, if we introduce new optimizations to opt65 (there are a few it does not know how to handle yet), such a test suite will likely be useful to detect and correct regressions.
In the meantime, if anyone remembers which programs had issues when compiled with -O3 back in the day, please post them here as this would really help debugging both lcc's -O3 and opt65.
Argh!
This was obvious from the images but apparently I mixed them up when posting so my post must have been quite confusing on that point.
I will fix my original post soon but in the meantime: the only difference is the order of a few "iny" instructions and the result is thus functionally completely equivalent.
This said, this could be due to the fact that my test sample was relatively limited. I will soon do a more comprehensive test of a complete routine of mine.
Re: Better code generation (for CC65)
Well, if we are talking about rcc16 used in the OSDK, it looks like there are still bugs with -O3...
I tried to compile my current C project with this option (you know about it DBug, this is Electroric, the latest build I'm working on). It is quite large (the generated executable is 37 KB) and the build failed miserably with this error:
font.s(48): 0d09:Overflow error
Strangely, the "font.s" file (in %OSDK%\TMP, generated from a "font.c" file during build) doesn't even have 48 lines... So I suppose there's a file included before though there's nothing included in the .s file...
Anyway, I'll make the source code publicly available very soon so it can serve as a test to exercise lcc/cc65 options...
Last edited by retroric on Sat Mar 02, 2019 7:57 pm, edited 1 time in total.
Re: Better code generation (for CC65)
Cool, I like to have test cases.
And cool, I did not dream it: there were indeed some problems.
Re: Better code generation (for CC65)
Speaking of bugs/test cases, I thought there was a bug in lcc/OSDK with unions in C, because after refactoring some of my code to introduce unions last week, it reacted strangely (union members not being correctly updated after being assigned) and I had to implement things differently (actually putting together all union member variables in the same struct).
Anyway, I tried to reproduce my supposed "union bug" by just taking the data struct definitions in a small test program but was unfortunately unable to reproduce the problem, the code in the test program with unions behaved perfectly
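As a starting point for this kind of investigation, a self-checking test can be kept around and rebuilt at each optimization level. The sketch below is hypothetical: the struct layout and names are invented for illustration and are not the data structures from the project discussed above, and nothing here is known to trigger the suspected bug. It only checks that a union member reads back what was just assigned to it.

```c
#include <assert.h>

/* Hypothetical record: the payload is either a counter or a pair of bytes. */
struct record {
    unsigned char kind;
    union {
        unsigned int  counter;
        unsigned char pair[2];
    } payload;
};

static struct record rec;

/* Returns the number of failed checks (0 means everything read back fine). */
int union_selftest(void)
{
    int failures = 0;

    /* The union must be at least as large as its biggest member. */
    assert(sizeof rec.payload >= sizeof rec.payload.pair);

    rec.kind = 1;
    rec.payload.counter = 0x1234;
    if (rec.payload.counter != 0x1234) failures++;

    rec.payload.counter++;
    if (rec.payload.counter != 0x1235) failures++;

    rec.kind = 2;
    rec.payload.pair[0] = 0xAA;
    rec.payload.pair[1] = 0x55;
    if (rec.payload.pair[0] != 0xAA) failures++;
    if (rec.payload.pair[1] != 0x55) failures++;

    /* The kind field lives outside the union and must be untouched. */
    if (rec.kind != 2) failures++;

    return failures;
}
```

Each check reads the same member that was last written, so the expected values do not depend on byte order. Returning a failure count rather than printing keeps the test usable on targets without a console.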
Re: Better code generation (for CC65)
So, I was given the problematic source code, and I took a look at it.
The problem is a wrongly generated zero page indirect access: it uses a label that is not in zero page.
More precisely, we get this error:
...\font.s(48): 090e:Overflow error
on this particular instruction: lda (Lfont65),y
I'm going to give a full walkthrough so everybody can understand how the code is generated and transformed from C source code to assembler.
The original source code is simple:
Code:
#include "font.h"
/**
 * Pointer to CUSTOM FONT data defined in font6x8.s, that is automatically
 * generated by the build script from the font image file (data/computer-font-6x8.png)
 * using the pictconv utility (see build.bat)
 */
extern const char computer_font_6x8[];
static const char *fontdata_ptr = computer_font_6x8;
static char *charset0_ptr = TEXT_STD_CHARSET0_PTR;
/**
 * Load font data in charset 0 area in RAM (standard charset)
 */
void load_font() {
    while(*fontdata_ptr) {
        *charset0_ptr++ = *fontdata_ptr++;
    }
}
This code is passed through the C compiler, which then generates some pseudo 16 code, which looks like this:
On the left you can see the result of the -O2 code generator, on the right what you get with -O3.
Then this code is processed by some macro replacement (using the MACROS.H file in the osdk\macro subfolder), which gives us the following result:
We can see on the right hand side the problematic instruction:
Code:
Lfont68
ldy #0 ; lda (Lfont65),y ; sta tmp0 ;
The problem is that you can't use this addressing mode, since Lfont65 is not a zero page address: it's a normal 16 bit memory address.
From there, we can see that this line of code was generated from the following macro:
Code:
Lfont68
INDIRB_ZD(Lfont65,tmp0)
which in the macros.h file matches this definition:
Code:
#define INDIRB_ZD(ptr1,tmp2)\
ldy #0 ;\
lda (ptr1),y ;\
sta tmp2 ;\
So the question is why we end up with Lfont65 being passed as a parameter.
Apparently the compiler decided to optimize this:
INDIRW_CD(Lfont65,tmp0)
INDIRB_ZD(tmp0,tmp0)
into that:
INDIRB_ZD(Lfont65,tmp0)
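To make the constraint concrete, here is a small C model of the two macros. It is a sketch under assumptions: the body of indirb_zd follows the INDIRB_ZD definition quoted above, but the meaning given to INDIRW_CD (copy the 16 bit word stored at an absolute address into a zero page temporary) is inferred from its name and usage, since its definition is not shown in this thread. The function names and the flat memory array are likewise illustrative.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t mem[0x10000];            /* flat 64 KB address space */

/* Assumed meaning of INDIRW_CD(src,dst): copy the 16 bit word stored at
 * absolute address src into the zero page location dst. */
static void indirw_cd(uint16_t src, uint8_t dst_zp)
{
    mem[dst_zp]                = mem[src];
    mem[(uint8_t)(dst_zp + 1)] = mem[(uint16_t)(src + 1)];
}

/* INDIRB_ZD(ptr,dst): ldy #0 / lda (ptr),y / sta dst.
 * The operand of lda (ptr),y is one byte, hence the uint8_t parameter. */
static void indirb_zd(uint8_t ptr_zp, uint8_t dst_zp)
{
    uint16_t target =
        (uint16_t)(mem[ptr_zp] | (mem[(uint8_t)(ptr_zp + 1)] << 8));
    mem[dst_zp] = mem[target];
}

/* Nonzero if addr can be encoded as the operand of an (addr),y access. */
int fits_zero_page(uint16_t addr)
{
    return addr <= 0xFF;
}

/* The unfused two-macro sequence: copy the pointer to zero page first,
 * then dereference it from there. Returns the byte that was read. */
uint8_t load_through_pointer(uint16_t ptr_var, uint8_t tmp_zp)
{
    assert(fits_zero_page(tmp_zp));
    indirw_cd(ptr_var, tmp_zp);
    indirb_zd(tmp_zp, tmp_zp);
    return mem[tmp_zp];
}
```

In this model the fused form cannot even be written without truncation: a pointer variable living at, say, $0600 does not fit the one byte parameter of indirb_zd, which mirrors the assembler's overflow error. The fusion would only be valid when the first macro's source is already a zero page address.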
- NekoNoNiaow
Re: Better code generation (for CC65)
Interesting.
I did not realize that the compiler was actually optimizing its macro code rather than the generated code itself.
In essence:
- lcc -O3 : optimizes the macro generated code (not the generated assembly)
- opt65: optimizes the generated assembly
Another hope I have is to modify opt65's code to make it more data driven, so we can add new optimizations without having to recompile the optimizer. A cursory inspection, a few weeks ago, of code optimized by opt65 showed that there were still a few optimizations which could be added.
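To illustrate what "data driven" could mean here, the sketch below keeps the rules in a table and the matching engine in a single generic function. This is not opt65's actual design: the struct layout, the three example rules and all names are assumptions made for illustration. It also assumes the instruction array is one straight-line block with no labels inside it, since a label between two instructions would invalidate these rules.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One instruction split into mnemonic and operand, e.g. {"lda", "#0"}. */
struct insn { const char *op; const char *arg; };

enum action { DROP_FIRST, DROP_SECOND };

/* A two-instruction rule. first_arg_prefix "" accepts any operand. */
struct rule {
    const char *first_op;
    const char *first_arg_prefix;
    const char *second_op;
    enum action act;
};

/* Illustrative table; a data-driven optimizer would read these rows from
 * a text file at start-up instead of compiling them in. */
static const struct rule rules[] = {
    { "lda", "#", "lda", DROP_FIRST  },  /* immediate load is overwritten */
    { "clc", "",  "clc", DROP_SECOND },  /* carry is already clear        */
    { "jmp", "",  "jmp", DROP_SECOND },  /* second jump is unreachable    */
};

static int rule_matches(const struct rule *r,
                        const struct insn *a, const struct insn *b)
{
    if (strcmp(a->op, r->first_op) != 0) return 0;
    if (strcmp(b->op, r->second_op) != 0) return 0;
    return strncmp(a->arg, r->first_arg_prefix,
                   strlen(r->first_arg_prefix)) == 0;
}

/* Applies the table to adjacent pairs until nothing matches.
 * Compacts `code` in place and returns the new instruction count. */
size_t apply_rules(struct insn *code, size_t count,
                   const struct rule *table, size_t nrules)
{
    size_t i = 0;
    assert(code != NULL || count == 0);
    while (i + 1 < count) {
        size_t r, drop = count;          /* count means "nothing to drop" */
        for (r = 0; r < nrules; r++) {
            if (rule_matches(&table[r], &code[i], &code[i + 1])) {
                drop = (table[r].act == DROP_FIRST) ? i : i + 1;
                break;
            }
        }
        if (drop == count) { i++; continue; }
        memmove(&code[drop], &code[drop + 1],
                (count - drop - 1) * sizeof code[0]);
        count--;
        if (i > 0) i--;                  /* re-check the pair before i too */
    }
    return count;
}
```

With this split, adding an optimization means adding a row to the table rather than writing new matching code, which is the property that would let rules live outside the executable.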