Ceed: A Tiny Compiler with ELF & PE Target

Introduction


I have always been interested in compilers but never got a chance to build one. I recently got some time and decided to learn compiler development. I started to look for an example compiler that meets the following criteria:

  • Minimalistic – Quick walk-through and reading
  • Documented – Easier to understand
  • Complete – Produces executables without external tools
  • Uses Lex & Yacc – Many books available on building parsers

I didn’t find a good example that meets these requirements and can be used as a starting point for learning compiler construction. So I decided to develop a compiler myself, primarily to learn, and later to provide an example that serves as a resource for others who are interested in compiler development.

Ceed (pronounced as seed) is an open source (BSD-style license) compiler that compiles a high-level language to x86 machine code and packages it in an executable binary. It supports 32-bit Linux and Windows and can generate an output file in either ELF or PE format. The complete code for Ceed is around 1400 lines of Flex/Bison/C, excluding comments.
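To make the two output formats a little more concrete, here is a minimal sketch in C of how an ELF file can be told apart from a PE file by its first bytes. The names detect_format and exe_format are hypothetical and are not taken from Ceed's source; the sketch only relies on the well-known magic bytes of the two formats.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of Ceed: classify a buffer by magic bytes. */
typedef enum { FMT_UNKNOWN, FMT_ELF, FMT_PE } exe_format;

exe_format detect_format(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);

    /* ELF files begin with 0x7F 'E' 'L' 'F'. The literal is split so that
       'E' is not swallowed by the \x hex escape. */
    if (len >= 4 && memcmp(buf, "\x7f" "ELF", 4) == 0)
        return FMT_ELF;

    /* PE files begin with an MS-DOS header whose first two bytes are "MZ".
       A thorough check would also follow the header's e_lfanew field to the
       "PE\0\0" signature; that step is omitted in this sketch. */
    if (len >= 2 && buf[0] == 'M' && buf[1] == 'Z')
        return FMT_PE;

    return FMT_UNKNOWN;
}
```

Calling detect_format on the first few bytes of a generated binary is enough to tell which of the two targets it was built for.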

(more…)


Windows Memory Management

[Moved an old article that I wrote in 2004 to my new blog]

Introduction


Windows on 32-bit x86 systems can access up to 4GB of physical memory. This is because the processor’s address bus is 32 lines (32 bits) wide, so it can only reach addresses from 0x00000000 to 0xFFFFFFFF, a 4GB range. Windows also gives each process its own 4GB logical address space. The lower 2GB of this address space is available to the user-mode process, and the upper 2GB is reserved for Windows kernel-mode code. How does Windows give a 4GB address space to each of several processes when the total memory it can access is itself limited to 4GB? To achieve this, Windows uses a feature of the x86 processor (386 and above) known as paging. Paging allows software to use a memory address (known as a logical address) that differs from the physical memory address. The processor’s paging unit transparently translates this logical address to the physical address. This allows every process in the system to have its own 4GB logical address space. To understand this in more detail, let us first take a look at how paging works on x86. (more…)
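The arithmetic behind these numbers can be sketched in a few lines of C. The function and macro names below are made up for illustration, and the boundary value assumes the default 2GB/2GB split described above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed boundary of the default split: lower 2GB user, upper 2GB kernel. */
#define KERNEL_SPACE_START UINT32_C(0x80000000)

/* Number of distinct byte addresses reachable with `bits` address lines. */
uint64_t address_space_bytes(unsigned bits)
{
    assert(bits < 64);            /* keep the shift well defined */
    return UINT64_C(1) << bits;   /* 2^bits */
}

/* Returns 1 if a 32-bit address lies in the lower, user-mode half. */
int is_user_address(uint32_t addr)
{
    return addr < KERNEL_SPACE_START;
}
```

With 32 address lines, address_space_bytes(32) is 4294967296 bytes, which is exactly 4GB; 0x7FFFFFFF is the last user-mode address and 0x80000000 is the first one reserved for the kernel.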


Syntactical twists of C (*p != *p)

[Moved an old post from 2006 to my new blog]

A few days back, one of my colleagues asked me to debug a problem. She had written a program and it was crashing in strcpy. I looked at the code and it looked just fine to me. I thought, let’s debug it to see what’s going on. I started the debug session: the variables were pointing to the right data, the stack was fine, and she was copying a fixed string to a big enough buffer. I stepped over strcpy and bammm… access violation. Weird, huh? For a second I wondered how simple code like this could crash. It was time to dig into the disassembly to see what exactly was going on. But before we do that, let’s take a look at the two C functions below: (more…)


Calling conventions in Windows on x86

[Moved an old post from 2006 to my new blog]

It is 2 AM and I don’t feel like sleeping, so I thought, why not start my blog? And here I am with my first blog entry ever.

People who program on Windows in C/C++ might sometimes wonder what the __cdecl or __stdcall in front of a function name means. These compiler-specific keywords tell the compiler how to push the function arguments onto the stack and how they are popped off it. They define the contract between the caller (the one who calls a function) and the callee (the called function) for argument passing. This contract is known as a calling convention. Ideally we would need only one calling convention for argument passing, but Windows compilers provide more than one for historical and performance reasons. The three calling conventions available on Windows are:

  1. __cdecl (more…)
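As a rough illustration of the two keywords named above, here is a small C sketch. The macros CALL_CDECL and CALL_STDCALL and the function names are hypothetical. The macros expand to the real keywords only when building for Windows and to nothing elsewhere, so the code still compiles on other platforms (where the distinction simply disappears).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical wrapper macros: real keywords on Windows, empty elsewhere. */
#if defined(_WIN32)
#define CALL_CDECL   __cdecl
#define CALL_STDCALL __stdcall
#else
#define CALL_CDECL
#define CALL_STDCALL
#endif

/* __cdecl: arguments are pushed right to left and the caller removes them
   from the stack after the call returns. */
int CALL_CDECL add_cdecl(int a, int b)
{
    return a + b;
}

/* __stdcall: arguments are pushed right to left and the callee removes them
   from the stack before it returns. */
int CALL_STDCALL add_stdcall(int a, int b)
{
    return a + b;
}

/* The convention is part of the function's type, so a function pointer has
   to name it too; otherwise caller and callee would disagree on cleanup. */
typedef int (CALL_STDCALL *stdcall_binop)(int, int);

int apply(stdcall_binop fn, int a, int b)
{
    assert(fn != NULL);
    return fn(a, b);
}
```

At the C level both functions behave identically (add_cdecl(2, 3) and add_stdcall(2, 3) both return 5); the difference only shows up in the generated x86 code, in which side adjusts the stack pointer after the call.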
