[RCC16] Code optimizer

Dbug
Site Admin
Posts: 4444
Joined: Fri Jan 06, 2006 10:00 pm
Location: Oslo, Norway

Post by Dbug »

I finally managed to compile Contiki, and it even runs. Not particularly well, and it is quite FAT (although I am not sure that I included exactly the same modules as Carlsson did).

One reason for this fatness is the way the opcodes are generated from a pseudo assembly language.

For example, we have some calls to the LSH1W_DD instruction (as a matter of fact, it is used 17 times), which is implemented this way:

Code:

#define LSH1W_DD(tmp1,tmp2)\
	lda tmp1 ;\
	asl ;\
	sta tmp2 ;\
	lda tmp1+1 ;\
	rol ;\
	sta tmp2+1 ;\
From what I understand, this stores in tmp2 the result of shifting tmp1 left by one bit (i.e. a 16-bit multiply by two). It is actually quite efficiently implemented.

The only problem is that in all of these 17 calls, tmp1 is equal to tmp2, so the most efficient code would have been this:

Code:

	asl tmp1
	rol tmp1+1
Similarly, we have instructions like this one:

Code:

#define CZBW_DD(tmp1,tmp2)\
	lda tmp1 ;\
	sta tmp2 ;\
	lda #0 ;\
	sta tmp2+1 ;\
which could have a special version for when the two operands are the same:

Code:

	lda #0
	sta tmp1+1
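The two special cases above could be picked at expansion time, by choosing a shorter body whenever both operands name the same location. Here is a minimal Python sketch of that idea. The macro bodies are the ones quoted above, but the `expand` function and the two tables are hypothetical and are not how the OSDK macro files actually work. Note that the short LSH1W_DD never touches A, so this assumes that no later code relies on the accumulator content left behind by the generic version.

```python
# Generic bodies, copied from the macros quoted in the post.
GENERIC = {
    "LSH1W_DD": ["lda {src}", "asl", "sta {dst}",
                 "lda {src}+1", "rol", "sta {dst}+1"],
    "CZBW_DD": ["lda {src}", "sta {dst}", "lda #0", "sta {dst}+1"],
}

# Shorter bodies, only valid when source and destination are the same.
SAME_OPERAND = {
    "LSH1W_DD": ["asl {src}", "rol {src}+1"],
    "CZBW_DD": ["lda #0", "sta {src}+1"],
}


def expand(name, src, dst):
    """Return the 6502 lines for one two-operand pseudo-instruction."""
    use_short = src == dst and name in SAME_OPERAND
    body = SAME_OPERAND[name] if use_short else GENERIC[name]
    return [line.format(src=src, dst=dst) for line in body]


if __name__ == "__main__":
    for name in ("CZBW_DD", "LSH1W_DD"):
        for src, dst in (("tmp0", "tmp0"), ("tmp0", "tmp1")):
            lines = expand(name, src, dst)
            print(f"{name}({src},{dst}): {len(lines)} instructions")
            for line in lines:
                print("\t" + line)
```

With both operands set to tmp0, the two pseudo-instructions go from 10 instructions down to 4.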
And indeed, in some places in the code we have combinations of inefficiencies like this:

Code:

Lctk193
	INDIRB_CD(Lctk190,tmp0)
	CZBW_DD(tmp0,tmp0)
	LSH1W_DD(tmp0,tmp0)
	ADDW_DCD(tmp0,Lctk82,tmp0)
	INDIRW_ZD(tmp0,tmp0)
	INDIRW_CD(reg0,tmp1)
	NEW_DD(tmp0,tmp1,Lctk197)
	LEAVE
Lctk197
which gets expanded to this:

Code:

Lctk193
	lda Lctk190 
	sta tmp0 

	lda tmp0 
	sta tmp0 
	lda #0 
	sta tmp0+1 

	lda tmp0 
	asl 
	sta tmp0 
	lda tmp0+1 
	rol 
	sta tmp0+1 

	clc 
	lda #<(Lctk82) 
	adc tmp0 
	sta tmp0 
	lda #>(Lctk82) 
	adc tmp0+1 
	sta tmp0+1 

	ldy #0 
	lda (tmp0),y 
	tax 
	iny 
	lda (tmp0),y 
	stx tmp0 
	sta tmp0+1 

	lda reg0 
	sta tmp1 
	lda reg0+1 
	sta tmp1+1 

	lda tmp0 
	eor tmp1 
	sta tmp 
	lda tmp0+1 
	eor tmp1+1 
	ora tmp 
	beq *+5 
	jmp Lctk197 

	jmp leave 
Lctk197
whereas a seasoned 6502 coder would probably have written this in less than half that size...

Some time ago Fabrice talked about writing a peephole optimizer. I guess that would be doable in some way, but it is far from obvious as soon as you take things like self-modifying code into consideration. I guess it would work fine on the C code :)
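To give an idea of what such a peephole pass could look like, here is a rough Python sketch. It is not Fabrice's design, just an illustration. It only knows one rule: right after `lda X` or `sta X`, the accumulator and X hold the same value, so an immediately following `lda X` or `sta X` does nothing. The assumptions are loud ones: X is ordinary RAM (no I/O register, no self-modified operand), nothing later depends on the N/Z flags a removed `lda` would have set, and no `*+n` style branch spans a removed instruction. The helper names are made up.

```python
def split_ins(line):
    """Split a source line into (mnemonic_or_label, operand), dropping comments."""
    parts = line.split(";")[0].split(None, 1)
    if not parts:
        return "", ""
    return parts[0], (parts[1].strip() if len(parts) > 1 else "")


def peephole(lines):
    """Remove an lda/sta that directly follows an lda/sta of the same operand.

    Assumes plain RAM operands, no self-modifying code, and that the
    N/Z flags of a removed 'lda' are never tested afterwards.
    """
    out = []  # (op, arg) pairs kept so far
    for line in lines:
        op, arg = split_ins(line)
        if not op:
            continue  # blank or comment-only line
        p_op, p_arg = out[-1] if out else ("", "")
        # Indirect operands are skipped: the store could alter its own pointer.
        same_place = arg != "" and arg == p_arg and not arg.startswith("(")
        if (same_place and op.lower() in ("lda", "sta")
                and p_op.lower() in ("lda", "sta")):
            continue  # A and memory already agree, instruction is redundant
        out.append((op, arg))  # labels land here too and act as barriers
    return [f"{op} {arg}".strip() for op, arg in out]


if __name__ == "__main__":
    source = [
        "Lctk193",
        "\tlda Lctk190 ",
        "\tsta tmp0 ",
        "",
        "\tlda tmp0 ",
        "\tsta tmp0 ",
        "\tlda #0 ",
        "\tsta tmp0+1 ",
    ]
    for ins in peephole(source):
        print(ins)
```

On the start of the expanded listing above, this drops the `lda tmp0` / `sta tmp0` pair produced by CZBW_DD(tmp0,tmp0). A real tool would need many more rules and some knowledge of which addresses are safe to touch.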
carlsson
Pilot Officer
Posts: 127
Joined: Thu Jan 12, 2006 11:26 pm
Location: Västerås, Sweden

Post by carlsson »

Fat as in the binary takes a lot of memory? Using cc65 and the default makefile (i.e. with almost no applications or modules), the binary is 17945 bytes including TAP header. What do you get with OSDK?

cc65 doesn't have a cutthroat optimizer either. It compiles line by line, without much lookahead if I remember correctly. It has three optimizer flags: -Oi for inlining functions, -Or for putting register variables on zeropage and -Os for inlining some library functions.

Apart from compiler, assembler, linker and perhaps supporting library, what does OSDK contain?
Anders Carlsson

Post by Dbug »

carlsson wrote: Fat as in the binary takes a lot of memory? Using cc65 and the default makefile (i.e. with almost no applications or modules), the binary is 17945 bytes including TAP header. What do you get with OSDK?
Well, considering that the OSDK is not really makefile oriented, I just dropped all the .C and .H files into one single "contikiodsk" folder, and set up my build configuration file with the following parameters:

Code:

SET OSDKADDR=$600
SET OSDKNAME=contiki
SET OSDKFILE=contiki contiki-main ek ctk dispatcher libconio programs petsciiconv ctk-conio
Do you have more or fewer modules?
Anyway, with -O2 optimisation mode (I never managed to get -O3 to generate valid code in all configurations), the resulting linked .S source file is 6674 lines long, and takes 278 KB. After assembling with XA this makes a final tape file of 42 KB.
carlsson wrote: cc65 doesn't have a cutthroat optimizer either. It compiles line by line, without much lookahead if I remember correctly. It has three optimizer flags: -Oi for inlining functions, -Or for putting register variables on zeropage and -Os for inlining some library functions.
RCC16 is not a 6502 compiler: it compiles code for a virtual processor that has a bunch of 16-bit registers. This processor is then "emulated" by blocks of macro instructions in a secondary post-processing phase.
carlsson wrote: Apart from compiler, assembler, linker and perhaps supporting library, what does OSDK contain?
The Oric native libraries in the OSDK are particularly good (imo). The OSDK also provides a file packer, an image converter, some utilities to manipulate file headers and convert binary files to text, some documentation and code samples.

CC65 is obviously better than it used to be (when I tried it, most ANSI C programs would not compile correctly with it, while they compiled fine with RCC16), but I still hate the syntax of the assembler that comes with it, and I don't like having a linker with binary object files either.

Post by carlsson »

The makefile builds a binary from these object files plus required bits of the system library:

Code:

$ wc -l *.s
     74 contiki-main.s
   1473 contiki.s
   3137 ctk-conio.s
   4614 ctk.s
    569 dispatcher.s
   1081 ek.s
    416 petsciiconv.s
    151 programs.s
     43 strncasecmp.s
  11558 total
Those are pretty much the same modules you used. As you can see, the number of lines of assembly is much greater, but that doesn't say much. The object files are 141770 bytes in total. Binary size with different types of optimization:

Nothing: 21573 bytes
-Oi, -Osi: 19854 bytes
-Ori, -Oris: 19073 bytes
-Os: 18511 bytes
-Or, -Osr: 17945 bytes

Even without optimization, it is half of what you got with the other compiler. Is lcc65 a household name of the 6502 post-processor for RCC16? There have existed a number of other small C implementations, but most of them are limited or outdated.
Anders Carlsson

Post by Dbug »

OK, I made a side-by-side comparison of the number of lines (including empty lines in my case; I don't know about yours):

Code:

Name          CC65   RCC16
contiki-main    74      64
contiki       1473    2539
ctk-conio     3137    7956
ctk           4614     644
dispatcher     569     976
ek            1081    1936
petsciiconv    416     360
programs       151     238
strncasecmp     43       ?
I don't understand the huge differences, like ctk for example.
carlsson wrote: Even without optimization, it is half of what you got with the other compiler. Is lcc65 a household name of the 6502 post-processor for RCC16? There have existed a number of other small C implementations, but most of them are limited or outdated.
LCC stands for Little C Compiler as far as I know, and RCC is Retargetable C Compiler. What is related to what, I do not know, but Fabrice could probably clear it up :)

Post by carlsson »

From Fabrice's page, I got the impression that lcc is the front end (lexical analyzer and parser?), which is then attached to the RCC16 backend, generating code for the virtual machine. It used to generate 32-bit VAX code before; now it is a bit more efficient.

ctk.s should implement the abstract layer of the window manager: init, redraw, open window, close window, add menu item, signal handler and so on.

ctk-conio.s should implement the physical layer: draw window, draw widget etc.

Maybe you somehow got those two files interconnected so the compiler put all code referring to all layers of the window system in one file, and only a little part that has no further references in the other?
Anders Carlsson
Euphoric
Game master
Posts: 99
Joined: Mon Jan 09, 2006 11:33 am
Location: France

Post by Euphoric »

Sorry for the delay, I was without web access last week...

lcc or lcc65 was the name of the compiler driver that Vaggelis Blathras wrote at the beginning to chain the cpp preprocessor, the compiler itself, the cpp preprocessor again (because the assembler used at the beginning had no preprocessor) and the assembler... Sure, lcc might not have been the best name, because it introduces some confusion with another C compiler (not sure which came first, though...)

I'm rather surprised by the results of the benchmarking in progress, because at the time I wrote the 16-bit 6502 backend for the Retargetable C Compiler (Hanson & Fraser), cc65 was far behind, both in terms of speed and code size. Also, cc65 wasn't ANSI compliant, and even had problems with simple signed int comparisons (it didn't check the overflow flag after the CMP instruction).

It seems cc65 has made great progress since that time, I might have a look at it again...
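For readers wondering why the overflow flag matters for the signed comparisons mentioned above: on the 6502, CMP itself only updates N, Z and C, so the usual idiom is SEC / SBC followed by a test of N eor V. The small Python model below is only a sketch of that flag logic for 8-bit values (it is not taken from cc65 or RCC16, and the function names are made up). It shows that looking at N alone gives the wrong answer when the subtraction overflows.

```python
def sbc_flags(a, b):
    """N and V flags after SEC / LDA #a / SBC #b on 8-bit values (binary mode)."""
    result = (a - b) & 0xFF
    n = bool(result & 0x80)
    # Overflow: operands have different signs and the result's sign differs from a's.
    v = bool((a ^ b) & (a ^ result) & 0x80)
    return n, v


def signed_less_than(a, b):
    """Signed a < b, decided the 6502 way: N eor V after the subtraction."""
    n, v = sbc_flags(a, b)
    return n != v


def to_signed(x):
    """Interpret an 8-bit value as two's complement."""
    return x - 256 if x & 0x80 else x


if __name__ == "__main__":
    a, b = 0x80, 0x01  # -128 and +1 as signed bytes
    n, v = sbc_flags(a, b)
    print(f"{to_signed(a)} < {to_signed(b)} ?  N={int(n)} V={int(v)}")
    print("N alone says:  ", n)
    print("N eor V says:  ", signed_less_than(a, b))
```

For -128 compared with +1, the subtraction gives $7F with N clear and V set, so a test on N alone would wrongly conclude that -128 is not smaller than 1.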

About good 6502 C compilers: one that can produce really good code is IAR's Embedded Workbench for 6502. Once you have selected the correct memory model for your application, you can get really good code, with some limitations of course (non-reentrancy, recursion, ...). A demo version should still be available, I think...

Post by Dbug »

Euphoric wrote: I'm rather surprised by the results of the benchmarking in progress, because at the time I wrote the 16-bit 6502 backend for the Retargetable C Compiler (Hanson & Fraser), cc65 was far behind, both in terms of speed and code size. Also, cc65 wasn't ANSI compliant, and even had problems with simple signed int comparisons (it didn't check the overflow flag after the CMP instruction).

It seems cc65 has made great progress since that time, I might have a look at it again...
Yep, same here.

I stopped trying to use cc65 when I tried to compile a small console-based awele game that a colleague made. It didn't compile with cc65, but compiled perfectly with rcc16 (and also gcc and vc6).

Post by carlsson »

When you two were looking into cc65, it must have been four or five years ago, or even more? The original Atari compiler may not have been so versatile, but a lot of water has flowed under the bridge. This document outlines some of the differences vs the ISO standard, and cc65's shortcomings:

http://www.cc65.org/doc/cc65-4.html

Mainly, it lacks floating point numbers and bit fields, and structs cannot be returned or passed by value (which is an ugly thing to do anyway, IMHO).

Hmm, the retargetable C compiler by Hanson & Fraser is lcc, as far as I understand, so the name lcc65 may not be as confusing as you think it is... or I'm doubly confused.
Anders Carlsson