The tape header is generated by the linker which currently always sets it as machine code. It did seem to work, however. Does it matter if the header says machine-code? In this case we could add an option for the linker to set the byte accordingly.
vbcc optimizing C compiler now with Oric Atmos support
Re: vbcc optimizing C compiler now with Oric Atmos support
- coco.oric
Yes, it's working, I agree; but as it's a BASIC file with code attached, it would be more accurate to set the correct flag for a BASIC file and, I don't know, prevent any discrepancy when different "BASIC" files are chained. As there's only one Atmos configuration using BASIC and code, it doesn't need to be an option: all files using this atmosr configuration should have this flag (but I haven't yet read all your PDF files describing the possibilities of your asm and C code process).
coco.oric as DidierV, CEO Member
Historic owner of Oric, Apple II, Atari ST, Amiga
Re: vbcc optimizing C compiler now with Oric Atmos support
Yes, definitely: when BASIC startup code is used, the header byte must be set to $00. The ROM loading routine checks it and handles the loaded data differently depending on this byte.
If it's machine code, then the Oric simply jumps into the void:
Code: Select all
JMP ($02A9)
And it really seems to work because:
1. the TAP isn't autostart, so you have to type RUN, which properly calls the compiled code at $50E;
2. we are lucky enough: if the header were set to autostart, the 6502 would land at $501,
then it eats some invalid but harmless opcodes, a couple of BRKs, and continues happily at $50E.
So, in any case, if the compiled binary is preceded by a BASIC caller then the header byte must be set to $00; otherwise it must be set to $80. IMHO adding another linker option for this is pointless.
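The rule above can be written down as a tiny helper. This is only a sketch: the names `tape_type_byte` and `has_basic_stub` are made up for illustration and are not part of vlink; only the two values $00 and $80 come from this thread.

```c
#include <stdint.h>

/* Values of the tape header type byte as discussed in this thread. */
enum { TAPE_TYPE_BASIC = 0x00, TAPE_TYPE_MACHINE_CODE = 0x80 };

/* Hypothetical helper (not a vlink function): pick the header byte.
   has_basic_stub is nonzero when the binary starts with a BASIC caller
   such as "123 CALL #50E". */
uint8_t tape_type_byte(int has_basic_stub)
{
    return has_basic_stub ? TAPE_TYPE_BASIC : TAPE_TYPE_MACHINE_CODE;
}
```

Since the choice follows directly from whether a BASIC caller is present, no separate user-facing option is needed in this sketch.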
Re: vbcc optimizing C compiler now with Oric Atmos support
Please, read the 4 previous posts!
From a formal point of view the byte should be $00 to indicate BASIC code (full stop).
But let's push things further - what if we left it "as is" i.e. $80 and make the TAP autostart ...
I am announcing a prize-less just for fun BASIC/assembler contest:
Current startup BASIC code is:
Code: Select all
123 CALL #50E
Code: Select all
$501: 0b 05 - next BASIC line address
$503: 7b 00 - line number i.e. 123
$505: bf - 'CALL' token
$506: 23 35 30 45 00 - ascii of '#50E', zero terminated
$50B: 00 00 - end of BASIC program
- BASIC program in memory which can be RUNned and LISTed;
- 6502 opcodes, at least the first few, to jump to $50E, so the $80 in the header is OK!
Shorter solutions are better!
Don't be lazy and challenge yourself. Good luck.
PS: The meme picture contains a clue which can be used: register A is loaded from $2B1 (which contains normally $00) just before the jump to $501.
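For anyone who wants to double-check the byte dump of the current startup stub, here is a small C sketch that decodes it. The helper names are hypothetical; the bytes and addresses are copied from the listing in this post.

```c
#include <stddef.h>
#include <stdint.h>

/* The startup stub as listed in this post, starting at address $501. */
static const uint8_t basic_stub[] = {
    0x0b, 0x05,                   /* $501: address of next BASIC line ($050B) */
    0x7b, 0x00,                   /* $503: line number, little endian (123)   */
    0xbf,                         /* $505: 'CALL' token                       */
    0x23, 0x35, 0x30, 0x45, 0x00, /* $506: "#50E", zero terminated            */
    0x00, 0x00                    /* $50B: end of BASIC program               */
};

/* Read a 16-bit little-endian value, the way the 6502 stores addresses. */
static uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Hypothetical helpers for inspecting the stub. */
uint16_t stub_next_line(void)   { return read_le16(&basic_stub[0]); }
uint16_t stub_line_number(void) { return read_le16(&basic_stub[2]); }
size_t   stub_size(void)        { return sizeof basic_stub; }
```

The stub is 12 bytes long, so it occupies $501 to $50C, and the next-line pointer $050B points at the two zero bytes that end the program.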
Re: vbcc optimizing C compiler now with Oric Atmos support
Hello Dr vbc,
If you need it, you can take inspiration from my little utility "taptap" (which is written in Object Pascal: Delphi and Lazarus), and the PDF doc which summarizes the Oric tape format.
You'll find everything on my GitHub:
https://github.com/DJChloe/taptap/tree/master
Re: vbcc optimizing C compiler now with Oric Atmos support
I've been trying to use various compilers, but have only really been able to get cc65 and vbcc to work on Android/Termux; others are quite difficult to compile for a new platform.
What strikes me is that cc65 is quite capable and very reliable. But the code it generates is maybe slower than others. vbcc, in my tests, inlines everything, so a simple recursive fib() becomes 130+ bytes, whereas cc65 gave 56 bytes.
I think, and here I'm guessing, that cycles/codesize could be an interesting measure. All the data is there, but it's difficult to compare them and get a bigger picture. 20% faster by more than doubling the code size isn't always a good idea.
Of course we want both speed and smaller code. If one could annotate per function whether the code is time-critical, that would be beneficial.
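To make the trade-off concrete, here is a rough C sketch of two such measures. The function names are invented, and "cycles times bytes" is just one possible reading of a combined cycles/codesize figure, not an established metric.

```c
#include <assert.h>

/* Percentage by which new_value exceeds old_value (integer, truncated).
   old_value must be positive. */
long percent_increase(long old_value, long new_value)
{
    assert(old_value > 0);
    return (new_value - old_value) * 100 / old_value;
}

/* One possible (made-up) figure of merit: cycles multiplied by bytes.
   Lower is better; it penalises both slow code and large code. */
long cycles_times_size(long cycles, long bytes)
{
    return cycles * bytes;
}
```

With the sizes mentioned above, `percent_increase(56, 130)` gives 132, i.e. the code is more than twice as large. With invented cycle counts of 1000 for the 56-byte version and 800 for the 130-byte version, the product is 56000 versus 104000, so the smaller version would win on this particular measure.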
vbc wrote: ↑Tue Feb 07, 2023 10:39 pm
Is there some documentation how you build and run those benchmarks?
Quote:
"IMHO, it would be very useful for Oric fans to be able somehow to use XA assembler sources with vbcc...
My solution so far looks like this :
[...]
... it works fine but it's grausam."
I am not sure I understand what that script is doing exactly, but from looking at the XA man-page, most of its directives should be handled by vasm using the options -dotdir -sect.
vlink supports the o65 object format, so it might also be possible to link object code assembled by XA together with code created by vbcc.
Re: vbcc optimizing C compiler now with Oric Atmos support
It is true that vbcc will usually generate direct code instead of calling lots of library functions. There surely are some opportunities to improve the size of the vbcc generated code in this fashion. However, vbcc usually generates more optimized code which makes using fixed library functions more difficult. Also, in such a small example, you have to take into account the size of the library functions as well.
If the size difference is as large as in your example, the code is probably not very friendly to the 6502. In such situations, the best option for code size probably would be to use a virtual machine with a compact instruction set.
Quote:
"I think, and here I'm guessing, that a cycles/codesize could be an interesting measure. All the data is there, but it's difficult to compare them and get a bigger picture. 20% faster by more than doubling the codesize isn't always a good idea."
This is not representative of the results I usually see or get reported by users. More typical is a code size difference of about +/-20% and a speed improvement of 0-200%.
Re: vbcc optimizing C compiler now with Oric Atmos support
I applaud any attempt to write compilers for the 6502, which is a notoriously difficult target platform.
I've personally looked at the code cc65 generates, and it feels more like it generates a "byte code" and then generates JSR calls, "mostly" to save space. So your comment about using a byte code interpreter hits home!
My test case is a recursive function, which, as I understand it, isn't typical or the best fit for the given platform, as the stack allocation cannot be removed. (My compiler source is Lisp code, so looking at this example is congruent with that choice and target.)
unsigned int ultfib(unsigned int n) {
    if (n < 2) return n;
    return ultfib(n - 1) + ultfib(n - 2);
}
I have trouble reading the IC1, IC2 output as it's quite verbose, and I've not been able to run the code (Android/Termux). But this is relatively simple code: the parameter can be passed in a register, and pushed if it needs to be saved for recursion.
But it feels odd that 2 lines of code (OK, 3) generate 56 bytes of code on cc65, using mostly JSR calls to library functions, while vbcc generates 130+ bytes. It seems a lot of double loading and movement between registers is taking place. Maybe because of the "unfortunate recursive" function.
Looking at the generated code, my own code generator for a "lisp" can get it down to 39 bytes, using JSR calls, mostly just by avoiding redundant calls to load n when it's already in AX!
Now, cc65 is good at using "char" (at times) to optimize indexed access to arrays < 256 bytes; in parameter passing, however, not really.
Inlining all the functions in cc65, by hand, I get 112 bytes...
So, I'd *really* like to go for a hybrid, where functions called once are byte-compiled/JSR-compiled and others which are time-critical are inlined for maximum performance.
I'm sad that neither (standard) C nor any platform provides a way to say INLINE(funccall(a,b,c)) at the *callsite*. Not all code is created equal, and much of it is only run once or quite seldom.
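The closest approximation I can sketch in plain C99 is to keep the body in a `static inline` function and add an out-of-line wrapper, then pick one or the other per call site. All names below are hypothetical, `inline` is only a hint, and I am not assuming that cc65 or vbcc actually honours it.

```c
/* Body shared by both versions; 'static inline' is only a hint to the
   compiler, which may or may not honour it. */
static inline unsigned int scale_add_inline(unsigned int a, unsigned int b,
                                            unsigned int c)
{
    return a * b + c;
}

/* Out-of-line version: one copy in the binary, reached through a call. */
unsigned int scale_add(unsigned int a, unsigned int b, unsigned int c)
{
    return scale_add_inline(a, b, c);
}

/* Time-critical call site: ask for the inline body. */
unsigned int hot_path(unsigned int x)
{
    return scale_add_inline(x, 3u, 1u);
}

/* Rarely executed call site: use the shared out-of-line copy. */
unsigned int cold_path(unsigned int x)
{
    return scale_add(x, 3u, 1u);
}
```

Both call sites compute the same result; only the way the body is reached differs, which is the per-callsite choice described above.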
vbc wrote: ↑Fri Nov 01, 2024 2:15 pm
It is true that vbcc will usually generate direct code instead of calling lots of library functions. There surely are some opportunities to improve the size of the vbcc generated code in this fashion. However, vbcc usually generates more optimized code which makes using fixed library functions more difficult. Also, in such a small example, you have to take into account the size of the library functions as well.
If the size difference is as large as in your example, the code is probably not very friendly to the 6502. In such situations, the best option for code size probably would be to use a virtual machine with a compact instruction set.
Quote:
"I think, and here I'm guessing, that a cycles/codesize could be an interesting measure. All the data is there, but it's difficult to compare them and get a bigger picture. 20% faster by more than doubling the codesize isn't always a good idea."
This is not representative of the results I usually see or get reported by users. More typical is a code size difference of about +/-20% and a speed improvement of 0-200%.
Re: vbcc optimizing C compiler now with Oric Atmos support
I remember that some compilers (like Visual Studio) had options like "optimize for speed" and "optimize for space". Maybe it would be useful to have this kind of option in vbcc?
In small projects having the code as fast as it can be is nice, but for large projects keeping the size under control is quite important.
Re: vbcc optimizing C compiler now with Oric Atmos support
jsk wrote: ↑Mon Nov 11, 2024 4:37 pm
But, it feels that 2 lines of code (OK, 3) generate 56 bytes of code on cc65 using mostly JSR calls to library functions, but vbcc generates 130+ bytes. It seems a lot of double loading and movement between registers is taking place. Maybe because of the "unfortunate recursive" function.
Not sure what options you were using, but I get 103 bytes for that function:
Code: Select all
$ vc +atmos -O -c r.c
$ vobjdump r.o | head
------------------------------------------------------------------------------
VOBJ 6502 (little endian), 8 bits per byte, 2 bytes per word.
42 symbols.
1 section.
------------------------------------------------------------------------------
0000022b: SECTION "text" (attributes="acrx")
Flags: 1 Alignment: 1 Total size: 103 File size: 103
33 Relocations present.
Quote:
"So, I'd *really* like to go for a hybrid, where functions called once are byte-compiled/JSR-compiled and others which are time-critical are inlined for maximum performance."
I understand that, but providing both options with good quality means pretty much double the effort in the compiler for a pretty niche situation. After all, if your space-critical code reaches a certain size, you are probably better off using a different instruction set and interpreting that.
E.g. for your code vbcc generates 46 bytes for 65816 or 68k and 36 bytes for 6809.
Quote:
"I'm sad that (standard) C and no platform provides a way to say INLINE(funccall(a,b,c)) at the *callsite*, all codes aren't created equal, and much of it is only run once or quite seldomly."
I am not sure I understand what you mean by "at the callsite". If I have one callsite where I need a version of a function compiled for speed, it does not seem to make sense to have another version anywhere else.