How do software libraries work?

Published On Premiered Sep 20, 2020

In computer science, a library is a collection of non-volatile resources used by computer programs, often for software development.

Programming libraries are nothing new: they have been around for roughly 70 years, and almost every program written today uses them. Even the simplest hello world program written in C++ uses multiple libraries, some of which you have most likely never heard of.

So what are software libraries, and how do they work? You already know they are pieces of code written by other developers so that you don’t have to do everything from scratch. And to be fair, most programmers won’t need to bother with the details of how this works. Well, at least until it stops working. Then, my friend, you open the hatch to figure out what is going on, and boy oh boy, things get a little overwhelming. So today we’ll look into the building blocks that make up our environment and let us do our job.

#programming #tech #softwaredevelopment
http://refspecs.linuxbase.org/elf/elf...
https://en.wikipedia.org/wiki/Executa...

First, we have code-generation libraries. These are tools that read configuration stored in the code or in external files and generate more code, which is later fed to the compiler. Examples in Java include Project Lombok, MapStruct, and tools like Xtext. For C# you’ll find the powerful T4. And for C++, well, there are libraries that work at the preprocessor level, like GTest, and some could argue that the whole template system is nothing but a glorified code-generation tool.
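To make the C++ part concrete, here is a minimal sketch of both flavours: a macro that the preprocessor expands into a function, and a template that the compiler instantiates once per type. The names `DEFINE_CONSTANT_GETTER`, `get_answer` and `twice` are made up for this illustration and don’t come from GTest or any other library.

```cpp
#include <string>

// Preprocessor-level generation: the macro is expanded into a complete
// function definition before the compiler proper ever sees the code.
#define DEFINE_CONSTANT_GETTER(name, value) \
    inline int get_##name() { return (value); }

// Expands to: inline int get_answer() { return (42); }
DEFINE_CONSTANT_GETTER(answer, 42)

// Template-level generation: the compiler emits a separate function
// for every type this template is used with.
template <typename T>
T twice(const T& value) {
    return value + value;
}
```

Calling `twice(21)` and `twice(std::string("ab"))` makes the compiler generate two independent functions, one for `int` and one for `std::string`, from the same few lines of source.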

Static libraries come second. They are also called compile-time libraries, and for good reason: they’re permanently sewn into your program’s executable at build time. During the linking step, the machine code compiled from your source is merged with the machine code of the libraries into a single binary file.

Dynamic libraries are a little different. The linker reads them when building your binary to check that every symbol you use can be resolved, but it doesn’t actually copy the libraries into your executable. They are typically installed system-wide, and often multiple programs use the same libraries (that’s why they are also called shared objects). Whenever a user runs your program, the operating system checks whether the library is already loaded because another process is using it. If so, it maps the already loaded library code into your process. Otherwise it loads the library before running your code.
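The difference between static and dynamic shows up in the build commands, not in the source. As a rough sketch, assume a hypothetical one-function library in `greet.cpp` and a hypothetical `main.cpp` that declares and calls `greet`. The commands in the comments are the usual clang++ and ar invocations on Linux, with file names invented for this example.

```cpp
// greet.cpp: a hypothetical one-function library.
#include <string>

std::string greet(const std::string& name) {
    return "Hello, " + name + "!";
}

// Static library: the machine code of greet() is copied into the executable.
//   clang++ -c greet.cpp -o greet.o
//   ar rcs libgreet.a greet.o
//   clang++ main.cpp libgreet.a -o app_static
//
// Shared library: the executable only records that it needs libgreet.so,
// which has to be found again every time the program starts.
//   clang++ -fPIC -shared greet.cpp -o libgreet.so
//   clang++ main.cpp -L. -lgreet -o app_shared
//   LD_LIBRARY_PATH=. ./app_shared
```

The same source produces both variants. Only `app_shared` depends on a file being present at run time, which is why `LD_LIBRARY_PATH` is needed in the last command.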

Now let’s talk briefly about remote libraries. These allow programmers to use RPCs, or remote procedure calls. While the code of such a library can actually run on the same machine, a call has substantial overhead compared to, say, a dynamic library. That overhead is justified if you can benefit from a distributed architecture. The simplest example is your trusty database: your program most likely has a database client, and this client calls the database server remotely to fetch data. There are whole frameworks that support writing such libraries, for example gRPC.
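To give a feeling for where that overhead comes from, here is a toy sketch of the idea, not how gRPC or any real framework does it. A client stub turns the call into a text message, hands it to a transport, and parses the reply. In this sketch the transport is just a function call inside the same process, and all names (`handle_request`, `remote_add`) are invented for the example.

```cpp
#include <functional>
#include <sstream>
#include <string>

// Hypothetical server side: parses a request such as "add 2 3"
// and answers with the result encoded as text.
std::string handle_request(const std::string& request) {
    std::istringstream in(request);
    std::string operation;
    int a = 0;
    int b = 0;
    in >> operation >> a >> b;
    if (operation == "add") {
        return std::to_string(a + b);
    }
    return "error";
}

// Hypothetical client stub: looks like a local function to the caller,
// but serializes its arguments and sends them through a transport.
// A real transport would be a socket; here it is any callable.
int remote_add(int a, int b,
               const std::function<std::string(const std::string&)>& transport) {
    const std::string request =
        "add " + std::to_string(a) + " " + std::to_string(b);
    const std::string reply = transport(request);
    return std::stoi(reply);
}
```

Calling `remote_add(2, 3, handle_request)` returns 5, but only after formatting, passing and parsing two strings. A plain call into a dynamic library skips all of that.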

Today I want to dive a tad deeper into shared libraries. This is what my recent work at Cisco is about: I’m putting one of our services written in C++ into a Docker container, and correctly managing shared libraries is a huge part of the task.

As I mentioned before, even the smallest program like hello world uses libraries, shared ones included. So let’s try it. I’ll spin up a Docker container and we’ll take a look at these shared libraries.
For my Docker image I’m using Ubuntu with clang-12 preinstalled, my current go-to compiler for C++. I’ll add two packages here: paxutils and the ncurses library. Paxutils brings some useful tools for analysing binaries, and ncurses is just an example dependency I’ll use in my code to color text output.

As promised, I’ll compile a simple hello world program: nothing fancy, just a plain string sent to standard output. I can compile this from the command line, and I’ll use a standard tool to list its dependencies: ldd.

As you can see, a few libraries are used here: the kernel vDSO (virtual dynamic shared object), the standard C++ library, libm (the C math library), libgcc (GCC’s low-level runtime library), libc (the C standard library) and lastly the interesting one: ld-linux…..so.
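You can get a similar list from inside a running program. The sketch below assumes Linux with glibc and uses `dl_iterate_phdr` from `<link.h>`, which calls a callback once for every object loaded into the current process. The helper names `collect_name` and `loaded_objects` are mine. Note that this shows what is loaded right now, while ldd works out what would be loaded, so the two lists don’t have to match line for line.

```cpp
#include <link.h>  // dl_iterate_phdr, dl_phdr_info (Linux/glibc specific)

#include <cstddef>
#include <string>
#include <vector>

// Called once per loaded object; appends the object's name to the
// std::vector<std::string> passed in through `data`.
static int collect_name(dl_phdr_info* info, std::size_t, void* data) {
    auto* names = static_cast<std::vector<std::string>*>(data);
    // The main executable (and on some systems the vDSO) has an empty name.
    if (info->dlpi_name != nullptr && info->dlpi_name[0] != '\0') {
        names->push_back(info->dlpi_name);
    } else {
        names->push_back("(unnamed: main program or vDSO)");
    }
    return 0;  // a non-zero return value would stop the iteration
}

// Returns the names of all objects currently loaded into this process.
std::vector<std::string> loaded_objects() {
    std::vector<std::string> names;
    dl_iterate_phdr(collect_name, &names);
    return names;
}
```

Printing the result of `loaded_objects()` from a hello world binary should show roughly the same cast of characters as the ldd output.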

This last row is a clue to what we’re after: the dynamic linker.

- Wait. What is that dynamic linker?
- It is the system component responsible for finding and loading the dynamic libraries a program wants to use. An interesting fact is that each program decides which dynamic linker is going to handle its dependencies.

- Wait, so there are multiple dynamic linkers?
- Well, fortunately not really, no. In practice an operating system uses one dynamic linker. But here’s the catch: binaries on Linux follow the ELF standard, which says that binaries contain sections, and in one of these sections (named .interp) every dynamically linked binary names its program interpreter.
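A program can even look up its own interpreter. This is a sketch under the same assumptions as before (Linux, glibc, `dl_iterate_phdr`), with helper names I made up. It walks the program headers of the main executable and looks for the `PT_INTERP` entry, which is the program header that points at the interpreter path.

```cpp
#include <link.h>  // dl_iterate_phdr, dl_phdr_info, ElfW, PT_INTERP

#include <cstddef>
#include <string>

// Inspects the first object reported by dl_iterate_phdr, which is the
// main program, and copies the PT_INTERP path (if any) into `data`.
static int find_interpreter(dl_phdr_info* info, std::size_t, void* data) {
    auto* path = static_cast<std::string*>(data);
    for (int i = 0; i < info->dlpi_phnum; ++i) {
        const ElfW(Phdr)& header = info->dlpi_phdr[i];
        if (header.p_type == PT_INTERP) {
            // The segment holds a NUL-terminated path, already mapped in
            // memory at load base + virtual address.
            *path = reinterpret_cast<const char*>(info->dlpi_addr +
                                                  header.p_vaddr);
            break;
        }
    }
    return 1;  // non-zero: stop after the main program
}

// Returns the interpreter path of the running executable, or an empty
// string if there is none (for example in a statically linked binary).
std::string program_interpreter() {
    std::string path;
    dl_iterate_phdr(find_interpreter, &path);
    return path;
}
```

For a dynamically linked binary the returned path should be the same ld-linux file that appeared in the last row of the ldd output.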
