Better code generation (for CC65)


NekoNoNiaow
Flight Lieutenant
Posts: 272
Joined: Sun Jan 15, 2006 10:08 pm
Location: Montreal, Canadia

Better code generation (for CC65)

Post by NekoNoNiaow »

Hello kitties,
I am creating this topic to document my research regarding better code generation for CC65 as mentioned in other recently active threads on the topic.

As a first and simple approach I am focusing on fixing/enhancing opt65, a simple 6502 peephole optimizer which apparently sometimes breaks code when applying its optimizations.
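
For readers unfamiliar with the term: a peephole optimizer slides a small window over the instruction stream and rewrites patterns it knows to be wasteful. The sketch below is not opt65's code. It is a minimal illustration with one hypothetical rule (drop an `lda X` that immediately follows `sta X`), and the names `is_redundant_load` and `peephole_pass` are invented for this example. It assumes each line is already normalized (lowercase, no leading whitespace, labels on their own lines).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical rule: "sta X" immediately followed by "lda X" leaves the
 * accumulator unchanged, so the load can be dropped. This ignores the N/Z
 * flags that lda would set, so a real optimizer would have to check that
 * the following code does not depend on them. */
static int is_redundant_load(const char *prev, const char *cur)
{
    return strncmp(prev, "sta ", 4) == 0 &&
           strncmp(cur, "lda ", 4) == 0 &&
           strcmp(prev + 4, cur + 4) == 0;
}

/* Compacts lines[] in place and returns the new line count. */
size_t peephole_pass(const char *lines[], size_t count)
{
    size_t out = 0;
    assert(lines != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (out > 0 && is_redundant_load(lines[out - 1], lines[i]))
            continue;               /* skip the redundant load */
        lines[out++] = lines[i];
    }
    return out;
}
```

The flags caveat in the comment is exactly the kind of detail that can make a peephole rule break code: the rewrite is only safe when nothing after it tests the flags the removed instruction would have set, and when X is ordinary memory rather than a hardware register.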

My first goal is to:
* fix opt65
* possibly add some more optimizations which it currently misses
* compare with the best of lcc65's output modes

I have not (yet!) encountered any optimization bugs in opt65 but I only tested it on small code samples (all coming from lcc65's output).

After reading DBug's suggestion of using lcc65's "O3" optimization level I did a few tests which gave interesting results.
So here they are:

Left: lcc65 with -O2, Right: lcc65 with -O3
O2-O3.png
Left: lcc65 with -O2, Right: lcc65 + opt65 pass (default options)
O2-O3.png
Now a comparison between lcc65's best output (-O3) and opt65:
Left: lcc65 with -O3, Right: lcc65 + opt65 pass (default options)
O2-O3.png
As you can see, on this limited sample the results are very similar: lcc65's best output essentially matches opt65's.
I will be testing on bigger samples soon.
Attachments
O3-O2+opt65.png
O2-O2+opt65.png
Dbug
Site Admin
Posts: 4444
Joined: Fri Jan 06, 2006 10:00 pm
Location: Oslo, Norway

Re: Better code generation (for CC65)

Post by Dbug »

That's actually a nice difference, especially in terms of code size.

Maybe everybody should try rebuilding their programs with -O3 and see if they notice any bugs. There used to be some code generation errors, which is why it was not enabled by default, but if it works, that could be a good first step.
iss
Wing Commander
Posts: 1641
Joined: Sat Apr 03, 2010 5:43 pm
Location: Bulgaria

Re: Better code generation (for CC65)

Post by iss »

@NekoNoNiaow: I couldn't tell from the pictures whether there is any difference between the -O3 and opt65 output?

Otherwise, I've tested the -O3 option and there is a real benefit in size. No bugs so far, but I've compiled only 3-4 samples...
NekoNoNiaow
Flight Lieutenant
Posts: 272
Joined: Sun Jan 15, 2006 10:08 pm
Location: Montreal, Canadia

Re: Better code generation (for CC65)

Post by NekoNoNiaow »

Dbug wrote: Tue Feb 19, 2019 9:15 am
That's actually a nice difference, especially in terms of code size.

Maybe everybody should try rebuilding their programs with -O3 and see if they notice any bugs. There used to be some code generation errors, which is why it was not enabled by default, but if it works, that could be a good first step.
What I find interesting is that the improvements brought by -O3 and opt65 are very close to one another, which leads me to think that they might actually be somewhat related. This is just an intuition though since I have not yet looked at lcc65's source code but I will eventually get there (more on that in future posts ;)).

That said, it is probably too early to encourage people to use -O3, since it seems unlikely that the issues you guys found a few years ago have mysteriously vanished. :(
I am planning to adapt the test suite of another C compiler (I know GCC has one) to verify that the optimizations are sound. Moreover, if we introduce new optimizations to opt65 (there are a few it does not know how to handle yet), such a test suite will likely be useful to detect and correct regressions.
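
As an illustration of what a test in such a suite could look like, here is a minimal self-checking case. It is only a sketch: the names `copy_until_zero` and `run_selftest` and the test data are made up, and it is not taken from any existing suite. The idea is that the same source is built at each optimization level (with and without the opt65 pass) and must report zero failures every time.

```c
#include <assert.h>
#include <stddef.h>

static const unsigned char source[] = { 3, 1, 4, 1, 5, 0 };
static unsigned char dest[8];

/* Copy bytes up to (not including) the first zero, using post-incremented
 * pointers; returns the number of bytes copied. */
static size_t copy_until_zero(const unsigned char *from, unsigned char *to)
{
    size_t n = 0;
    while (*from) {
        *to++ = *from++;
        n++;
    }
    return n;
}

/* Returns the number of failed checks; 0 means the build behaved as expected. */
int run_selftest(void)
{
    int failures = 0;
    unsigned sum = 0;
    size_t i, n;

    assert(sizeof dest >= sizeof source);   /* sanity check on the test data */
    n = copy_until_zero(source, dest);

    if (n != 5) failures++;
    for (i = 0; i < n; i++) sum += dest[i];
    if (sum != 14) failures++;              /* 3 + 1 + 4 + 1 + 5 */
    if (dest[5] != 0) failures++;           /* the terminator is never copied */
    return failures;
}
```

Because the expected values are hard-coded, a miscompiled build shows up as a non-zero return value rather than requiring someone to read the generated assembly.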

In the meantime, if anyone remembers which programs had issues when compiled with -O3 back in the day, please post them here as this would really help debugging both lcc's -O3 and opt65.
iss wrote: Tue Feb 19, 2019 11:09 am
@NekoNoNiaow: I couldn't tell from the pictures whether there is any difference between the -O3 and opt65 output?

Otherwise, I've tested the -O3 option and there is a real benefit in size. No bugs so far, but I've compiled only 3-4 samples...
Argh!
This would have been obvious from the images, but apparently I mixed them up when posting, so my post must have been quite confusing on that point.
I will fix my original post soon but in the meantime: the only difference is the order of a few "iny" instructions and the result is thus functionally completely equivalent.

This said, this could be due to the fact that my test sample was relatively limited. I will soon do a more comprehensive test of a complete routine of mine.
retroric
Pilot Officer
Posts: 125
Joined: Sun Nov 22, 2009 4:33 pm
Location: Paris, France

Re: Better code generation (for CC65)

Post by retroric »

Dbug wrote: Tue Feb 19, 2019 9:15 am
Maybe everybody should try rebuilding their programs with -O3 and see if they notice any bugs. There used to be some code generation errors, which is why it was not enabled by default, but if it works, that could be a good first step.
Well, if we are talking about rcc16 used in the OSDK, it looks like there are still bugs with -O3...

I tried to compile my current C project with this option (you know about it, DBug: this is Electroric, the latest build I'm working on). It is quite large (the generated executable is 37 KB), and it failed miserably with this error:

font.s(48): 0d09:Overflow error

Strangely, the "font.s" file (in %OSDK%\TMP, generated from a "font.c" file during the build) doesn't even have 48 lines... So I suppose another file is included before it, although nothing is included in the .s file itself...

Anyway, I'll make the source code publicly available very soon so it can serve as a test to exercise lcc/cc65 options...
Last edited by retroric on Sat Mar 02, 2019 7:57 pm, edited 1 time in total.
flag_fr RetrOric, aka laurentd75 flag_uk
            GitHub - RetrOric
Dbug
Site Admin
Posts: 4444
Joined: Fri Jan 06, 2006 10:00 pm
Location: Oslo, Norway

Re: Better code generation (for CC65)

Post by Dbug »

Cool, I like to have test cases :)
And cool, I did not dream it: there were indeed some problems :D
retroric
Pilot Officer
Posts: 125
Joined: Sun Nov 22, 2009 4:33 pm
Location: Paris, France

Re: Better code generation (for CC65)

Post by retroric »

Speaking of bugs/test cases, I thought there was a bug in lcc/OSDK with unions in C, because after refactoring some of my code to introduce unions last week, it reacted strangely (union members not being correctly updated after being assigned) and I had to implement things differently (actually putting together all union member variables in the same struct).

Anyway, I tried to reproduce my supposed "union bug" by just taking the data struct definitions in a small test program but was unfortunately unable to reproduce the problem, the code in the test program with unions behaved perfectly :-(
Dbug
Site Admin
Posts: 4444
Joined: Fri Jan 06, 2006 10:00 pm
Location: Oslo, Norway

Re: Better code generation (for CC65)

Post by Dbug »

So, I was given the problematic source code, and I took a look at it.
The problem is with a wrongly generated zero page indirect access... using a label not in zero page.

More precisely, we get this error:
...\font.s(48): 090e:Overflow error
on this particular instruction: lda (Lfont65),y


I'm going to give a full walkthrough so everybody can understand how the code is generated and transformed from C source code to assembler.

The original source code is simple:

Code:

#include "font.h"

/**
 * Pointer to CUSTOM FONT data defined in font6x8.s, that is automatically
 * generated by the build script from the font image file (data/computer-font-6x8.png)
 * using the pictconv utility (see build.bat)
 */
extern const char computer_font_6x8[]; 

static const char *fontdata_ptr = computer_font_6x8;
static char       *charset0_ptr = TEXT_STD_CHARSET0_PTR;

/**
 *  Load font data in charset 0 area in RAM (standard charset)
 */
void load_font() {
	while(*fontdata_ptr) {
		*charset0_ptr++ = *fontdata_ptr++;
	}
}
This code is passed through the C compiler, which generates pseudo 16-bit code that looks like this:
CodeGenerationError.png
On the left you can see the result of the -O2 code generator, on the right what you get with -O3.

This code is then processed by macro replacement (using the MACROS.H file in the osdk\macro subfolder), which gives us the following result:
GeneratedAssemblerCode.png
We can see on the right hand side the problematic instruction:

Code:

Lfont68
	ldy #0 ;	lda (Lfont65),y ;	sta tmp0 ;
The problem is that you can't use this addressing mode here: Lfont65 is not a zero page address, it's a normal 16-bit memory address.
From there, we can see that this line of code was generated from the following macro:

Code:

Lfont68
	INDIRB_ZD(Lfont65,tmp0)
which, in the macros.h file, matches this definition:

Code:

#define INDIRB_ZD(ptr1,tmp2)\
	ldy #0 ;\
	lda (ptr1),y ;\
	sta tmp2 ;\
So the question is why Lfont65 ends up being passed as a parameter.

Apparently the compiler decided to optimize this:

INDIRW_CD(Lfont65,tmp0)
INDIRB_ZD(tmp0,tmp0)

into that:

INDIRB_ZD(Lfont65,tmp0)
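
To make the failure concrete, here is a small C model of what the two-macro sequence has to do and why it cannot be collapsed into one. This is a sketch under assumptions: the names `mem`, `lda_indirect_y`, `fits_zero_page` and `load_byte_via_pointer` are invented, and the role attributed to INDIRW_CD (copying the 16-bit pointer stored at a label into a zero page temporary) is inferred from how it is used above, not taken from MACROS.H.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t mem[0x10000];          /* 64 KB address space */

/* Model of "lda (zp),y": the operand is a single byte, so the pointer
 * itself must live in zero page ($00-$FF). */
uint8_t lda_indirect_y(uint8_t zp, uint8_t y)
{
    uint16_t base = (uint16_t)(mem[zp] | (mem[(uint8_t)(zp + 1)] << 8));
    return mem[(uint16_t)(base + y)];
}

/* Non-zero if addr can be encoded as a zero page operand. */
int fits_zero_page(uint16_t addr)
{
    return addr <= 0xFF;
}

/* Two-step load of *ptr when the pointer variable sits at a 16-bit address:
 * step 1 copies the pointer into a zero page temporary (the assumed role
 * of INDIRW_CD), step 2 dereferences it (what INDIRB_ZD expands to). */
uint8_t load_byte_via_pointer(uint16_t ptr_var, uint8_t zp_tmp)
{
    assert(zp_tmp < 0xFF);            /* keep both pointer bytes contiguous */
    mem[zp_tmp] = mem[ptr_var];
    mem[(uint8_t)(zp_tmp + 1)] = mem[(uint16_t)(ptr_var + 1)];
    return lda_indirect_y(zp_tmp, 0);
}
```

In this model `lda_indirect_y` simply cannot be called with a 16-bit address, which mirrors the assembler's complaint: skipping the copy step leaves no valid one-byte operand for the instruction, hence the overflow error.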
NekoNoNiaow
Flight Lieutenant
Posts: 272
Joined: Sun Jan 15, 2006 10:08 pm
Location: Montreal, Canadia

Re: Better code generation (for CC65)

Post by NekoNoNiaow »

Interesting.
I did not realize that the compiler was actually optimizing its macro code rather than the generated code itself.

In essence:
  • lcc -O3 : optimizes the macro generated code (not the generated assembly)
  • opt65: optimizes the generated assembly
This leaves me hopeful that if both are fixed to work correctly then the optimizations added by each would be at least partially cumulative.

Another hope of mine is to modify opt65's code to make it more data-driven, so that we can add new optimizations without having to recompile the optimizer. A cursory inspection, a few weeks ago, of code optimized by opt65 showed that there are still a few optimizations which could be added.
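
A rough idea of what "data-driven" could mean here: the rewrite rules live in text form and are parsed at run time, so adding one means editing a rules file rather than the optimizer. The sketch below is purely hypothetical. The rule syntax `first;second=>replacement`, the `struct rule` layout and the function names are all invented for illustration and have nothing to do with opt65's actual source. It only matches exact, normalized lines, with no operand wildcards.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_LINE 32

/* One rewrite rule: two consecutive lines equal to first/second are
 * replaced by the single line `replacement`. */
struct rule {
    char first[MAX_LINE];
    char second[MAX_LINE];
    char replacement[MAX_LINE];
};

/* Parses "first;second=>replacement". Returns 1 on success, 0 on error. */
int parse_rule(const char *text, struct rule *r)
{
    assert(text != NULL && r != NULL);
    return sscanf(text, "%31[^;];%31[^=]=>%31[^\n]",
                  r->first, r->second, r->replacement) == 3;
}

/* Applies the rules to lines[] in place and returns the new line count.
 * Replaced entries point into rules[], which must outlive lines[]. */
size_t apply_rules(const char *lines[], size_t count,
                   const struct rule *rules, size_t rule_count)
{
    size_t out = 0, i = 0, k;
    while (i < count) {
        int matched = 0;
        for (k = 0; k < rule_count && i + 1 < count; k++) {
            if (strcmp(lines[i], rules[k].first) == 0 &&
                strcmp(lines[i + 1], rules[k].second) == 0) {
                lines[out++] = rules[k].replacement;
                i += 2;
                matched = 1;
                break;
            }
        }
        if (!matched)
            lines[out++] = lines[i++];
    }
    return out;
}
```

A rule such as `tax;txa=>tax` is a convenient first example because the second transfer leaves A, X and the N/Z flags exactly as the first one set them, so removing it changes nothing but the cycle count.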