My Journey @FOSS club

Let me start by giving a brief introduction to myself. I am currently pursuing my master’s in CS from TU Berlin. I completed my first year of master’s from KTH which is one of the world’s top 100 universities. I have received the highest scholarship for my master’s degree. I worked at Red Hat for two years which is one of the biggest open source companies in the world. I was invited as a speaker at the Linux Vault conference, North Carolina. I have published articles for open source magazine and have presented/attended various conferences. I was ranked 10th in the entire university for CSE. No, this is not me bragging about myself but trying to catch your attention to read the remaining article:)

 
Like majority of the students today, I too opted to do bachelor’s in CSE just because everybody else was doing it. I did not particularly have any interests in CS. Infact if anything I was pretty bad with computers and technology and believed that girls cannot be as good as boys in CS. Once I entered Amrita the first year went by pretty quickly and I had learnt absolutely nothing new in CS. This is when I started panicking – How can I be a good engineer when I don’t know anything about computers?. But there was nothing that I could do, nobody to talk to, nowhere to learn and absolutely no guidance. While I was aimlessly wandering one of my friends told me about this new “FOSS club” she had recently joined and how she was learning a lot of new things here. They were giving a small talk so I too decided to attend the session. This is and will be one of the most important turning points in my life. I realized that FOSS club was exactly what I was looking for. I started attending FOSS club regularly and the initial few weeks were frustrating because I was supposed to google things and learn on my own!! I was not spoon-fed and had to learn on my own. But slowly I realized how easy it was to learn something new, how much material was available online. I started to depend less on others and more on myself. However within a month, most of my classmates had left and we were just very few students who would be regular (we were still one of the batches with the largest number of students:)).

 
I will give a brief description of what I did during my bachelor’s. Throughout the second year I was learning a lot of new things, participating in various contests. During my third year I started open source contribution (that later helped me land my job at Red Hat). Me along with few other students got interested in cyber security and participated in various CTF contests. I also started writing a few articles for open source magazines. My final year was focused a lot on applying for internships/job, building a good SOP, recommendation letters, applying for master’s, attending talks and conferences. I also focused a lot on my studies and tried my best to keep a good cpga. Though a good cgpa goes not guarantee knowledge, but in my opinion it does create a huge difference when it comes to applying for your master’s.

 
There were ofcourse numerous hurdles throughout the journey. The first hurdle was myself. Now even though I was very motivated and regularly attended FOSS club (during weekends, after exams, during semester break). I was sometimes not very productive. However the key is to be consistent, even though you are not very productive never drop out. It is just a phase, it will pass. Many of my friends/classmates would make fun of me for “studying” too much, label me as a nerd or try to distract me. However the key is to understand that you are responsible for your own life and irrespective of what others say or do you must follow your passion and keep learning. Failing to achieve – there was a duration of time when most of my friends were achieving a lot of their dreams and I was unable to achieve anything. The key is to believe in yourself, don’t get disheartened and continue working hard.

 

Finally I would like to mention two of my achievements that I truly treasure. Firstly I learnt to learn – I am no longer scared to learn anything on my own and do not need others to spoon feed me. Secondly I am confident with my skills and believe in myself. All these experiences formed a strong foundation for me and helped me grow. I had amazing seniors who guided me, I had amazing friends who motivated me to stay consistent and ofcourse I had the most amazing mentor Vipin Sir who believed in me when I was completely clueless.

DHT – Distributed Hash Table

Distribution in GlusterFS is handled by the DHT or the Distributed Hash Table which is loaded on the client stack. All operations are driven by the clients which are all equal. There are no metadata servers or special nodes which have any additional information about where the files are present or should go. Any additional information about file or directories are stored in the extended attributes or xattrs. Xattrs are filesystem features that enable users to associate files/dirs with metadata. They store information as key-value pairs. There are mainly two DHT related xattrs-  linkto and layout.

DHT creates directories on all the bricks.  When directory is created a layout range is assigned to it which is stored in the extended attribute called trusted.glusterfs.dht .The range varies from 00000000 to 0xffffffff  and each brick is assigned a specific subset of this range. The  layout is complete and healthy when the range 00000000 to 0xffffffff is distributed across the volume without any gaps or overlap. The directory creation and the setting of layout is part of mkdir operation. While setting the layout we can either do

  1. Homogeneous setting of layout, where each brick gets the equal range. This is the default option.
  2. Heterogeneous weight assigning where based on the size of the bricks we assign the layout range to a brick. This means that larger bricks have larger layout, increasing the probability of data on these bricks. 

Lets take an example, consider that your volume has three bricks (b1, b2, b3). If you create a directory dir1 then the layout would look something like this:

# file: export/testvol/brick1/dir1
trusted.gfid=0x3c3f4bc85885438f98825ef348a95984
trusted.glusterfs.dht=0x00000001000000000000000055555554

# file: export/testvol/brick2/dir1
trusted.gfid=0x3c3f4bc85885438f98825ef348a95984
trusted.glusterfs.dht=0x000000010000000055555555aaaaaaa9

# file: export/testvol/brick3/dir1
trusted.gfid=0x3c3f4bc85885438f98825ef348a95984
trusted.glusterfs.dht=0x0000000100000000aaaaaaaaffffffff

There are mainly two types of anomalies that can be seen w.r.t layout:

    1. Holes –  on a brick if a directory does not have a layout it is called a hole. If there is no layout on a directory no files can be stored on that brick.
    2. Overlaps – all brick must have exclusive layout ranges, if the layout ranges overlap it is an overlap.

Unlike directories file have to be present on only one subvol.  Given a file we find its hash value and the brick on which the hash value falls. This brick is known as the hashed brick. The brick on which the data file actually exists is the cached brick. For a newly created file the hashed and the cached brick will usually be the same. Considering the above example if we create a file under the directory dir1 then the file will be created on only one of the brick.

 However while renaming a file the destination file’s hashed brick may be different from the source file’s hashed brick. In this case instead of actually moving the entire data file to the new hashed brick we create a linkto file. This is a 0 byte file which is created on the new hashed brickhave mode equal to _____T (no permissions except for the sticky bit ‘T’). The purpose of the linkto file is to act as a pointer to the brick where the data file actually exists (which is still located on the old hashed brick).  They have an xattr called the  rusted.dht.linkto xattr which stores the name of the brick on which the data file actually exists. Now the brick on which the linkto file exists is the hashed brick and the file on which the actual data file exists is the cached brick. All fops on the file will first land to the hashed brick and will be redirected to the cached brick by reading the linkto file’s xattr.

 

GlusterFS and types of volumes

As discussed in another post a volume is a logical collection of bricks and most file operations happen on the volume. GlusterFS supports various types of volumes.

Some volumes are good for scaling storage size, some for improving performance and some for both.

  1. Distributed Glusterfs Volume – This is the default glusterfs volume i.e, while creating a volume of you do not specify the type of the volume the default option is to create a distributed type of volume. In this volume, files are randomly stored across the bricks in the volume. There is no data redundancy. The purpose for such a storage volume is to easily scale the volume size. However this also means that a brick failure will lead to complete loss of data and one must rely on the underlying hardware for data loss protection.
  2. Replicated Glusterfs Volume – In this volume we overcome the data loss problem faced in the distributed volume. Here exact copy of the data is maintained on all bricks. The number of replicas in the volume can be decided by client while creating the volume. One major advantage of such a volume is that even if one brick fails the data can still be accessed from its replica brick. Such a volume is used for better reliability and data redundancy.
  3. Distributed Replicated Glusterfs Volume – This volume combines the advantages of both a distribute and a replicate volume. In this volume files are distributed across replicated sets of bricks. The number of bricks must be a multiple of the replica count. Also the order in which we specify the bricks matters since adjacent bricks become replicas of each other. This type of volume is used when high availability of data due to redundancy and scaling storage is required. So if there were eight bricks and replica count 2 then the first two bricks become replicas of each other then the next two and so on. This volume is denoted as 4×2. Similarly if there were eight bricks and replica count 4 then four bricks become replica of each other and we denote this volume as 2×4 volume.
  4. Striped Glusterfs Volume – Consider a large file being stored in a brick which is frequently accessed by many clients at the same time. This will cause too much load on a single brick and would reduce the performance. In striped volume the data is stored in the bricks after dividing it into different stripes. So the large file will be divided into smaller chunks (equal to the number of bricks in the volume) and each chunk is stored in a brick. Now the load is distributed and the file can be fetched faster but no data redundancy provided.

GlusterFS – a distributed file system

Looking at the topic the first question arises as to what is a distributed file system and why do we need it. Distributed file system gives us a way for storing and accessing files in a client/server architecture. Here we can use more that one server to store data and use multiple clients (local or remote) that can access data from these servers. Distributed file system organizes and displays  files and directories from multiple servers as if they were stored in your local system, thereby projecting a simple interface to the user or application. Their main advantage is that it is easier to distribute documents to multiple clients and they provide a centralized storage system so that client machines are not using their resources to store the data.

Lets look at some of the GlusterFS terminologies:

  1. Brick is the basic unit of storage, represented by an export directory on a machine
  2. A storage pool is a trusted network of storage servers. You can dynamically add or remove nodes from the pool. Only nodes that are in a trusted storage pool can participate in volume creation.
  3. A volume is a logical collection of bricks. Most of the gluster management operations happen on the volume. Multiple volumes can be hosted on the same node.
  4. These are building blocks of GlusterFS. All functions are implemented as translator They are stacked together for achieving a desired functionality.
  5. The machine which mounts the volume (this may also be a server).

 

GlusterFS is a distributed file system capable of scaling up to several petabytes and can  handle thousands of clients. GlusterFS clusters together storage building blocks over RDMA or TCP/IP and aggregates disk and memory resources for managing data in a single global namespace. It does not need any special hardware to store data. It is based on a stackable user space design.

 

GlusterFS is a user space file system and hence uses FUSE (Filesystem in user space) to hook itself with VFS layer. FUSE is a loadable kernel module that lets non-privileged users create their own file systems without editing kernel code. Gluster uses already tried and tested disk file systems like ext4, xfs, etc as the underlying filesystem to store the data.

My talk at Vault Linux Filesystem and Storage

Recently I got an opportunity to present a talk for the Vault Linux Filesystem and Storage conference at held at Raleigh, North Carolina. This was my first experience giving a talk and I got to learn many things. My talk was on GlusterFS and its distribution model. You can have a look at my slides from [1]. The conference was a two day event and my talk was one of the first talks on day 1. For me this was a huge advantage since I could finish my talk first and then could enjoy the remaining talks without any worries. Though I had been preparing since a long time I was pretty nervous once the talk started. the talk on the whole went smoothly without anything bad happening (before the event I was imagining a variety of things that could go wrong ranging from me going blank, mic not working, etc). Having given one talk has made me much more confident about talks in the future and I realized that the only way of overcoming one’s stage fright is to keep giving more and more talks. Also this talk showed me some of my mistakes and has helped me to improve. Also I got to meet a lot of people and interact with them.

Go Goa Gone…!

One of my friend had once suggested me that Goa is a must visit place for bachelors. Hence though I had been to Goa as a kid with my family and family friends (huge group), I decided to be there with my ‘bachelor’ friends. We were a group of five (four girls and  a guy :P) and decided to go for our Goa trip.

Day 0: We boarded our bus for Karwar which is 40 km away from south goa. Now this is was a pretty uneventful bus ride, if you exclude sleeping uncomfortably.

Day 1: On reaching Karwar, we took a bus to south goa and finally reached our hotel which was alongside palolem beach. We dumped our luggage, rented bikes and rode to the turtle beach which was a few km from palolem. Finally we got a feeling that we were in Goa. We had a great time riding bikes but were pretty tired and exhausted by the time we reached turtle beach. There was a small restaurant nearby, so we decided to have a brunch.  Turtle beach has become one of my favorite beaches, it was quite, clean and we had a great time there. Finally after 3 hours of playing at the beach we decided to ride back. After reaching back we took bath and went to palolem beach to watch the sunset which I cannot specify enough was beautiful to watch. We spent sometime walking, building sand castle, talking on the beach and then crashed.

Day 2: Next day we got ready, bid farewell to south goa and went to Dhoodh sagar. It was a couple of hour drive from palolem beach.  After a small trek we reached the falls, which was amazing and the water had some great de-tanning effect on us. After spending a couple of hours at the fall we returned and had food. We then proceeded towards north goa. The drive was amazing, as we were singing all the way, watching pretty good scenes on our sides. We reached north goa sometime later that night.

Day 3: This was one of the most exciting days for us as we were to go for water sports this day. We rented bikes and set off to anjuna beach for water sports. There were various water sports to try at anjuna beach and obviously we tried most of them. We went for jet-ski, banana ride, bumper ride, para-sailing. For me the highlight of the trip was when we got to jump in deep water after para-sailing. It was so much fun to just float in the deep water! We then played in the beach for almost 3 hours. After this we were completely hungry and tried and rented some shack, ordered some burgers and watched the sunset. After a couple of hours of relaxing we rode back. On reaching our hotel we quickly got ready and headed for a casino. Though I was very tired and had no interest to goto the casino, I actually enjoyed a lot and was one of the highlights of the trip for me. Initially we just roamed from table to table at the casino having no idea what to do, but soon we found a game (which sadly I don’t remember the name) and got really interested in the game. We actually played the game later and won for sometime:). After around four hours we returned to out hotel.

Day 4: This was our last day of the trip and though nobody was mentioning, we were all a bit sad. We got up a bit late, packed and checked out. We checked out tito’s lane a bit, but it was hot so we went to baga beach and played at the beach for a few hours. We came back to our hotel and changed again. We returned our bikes and booked a taxi to goto Panjim to board our bus. Since we reached Panjim a little early we just went to a park, where we rested, talked and had some snacks. We then boarded our bus and immediately slept.

Next morning we reached Bangalore and with heavy hearts and a lot of memories we said our good byes, with promises to travel again soon…

 

Implementing ls -l command in Linux

Bash shells allows us to have an interactive shell such that the user can type various commands and get back results. One of the very common command used all the time is ls – which will list all the files in that particular directory. The command ls also has numerous variations like:

ls -a : lists all the hidden files

ls -l: lists all file names along with other details like owner, permission, date of creation etc.

ls -R: Recursively lists files present in the directories inside the main directory

Given below is a program which is a simple implementation of the ls -l command:

#include <unistd.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <dirent.h>

int main(int argc, char **argv)
{
    DIR *dp;
    struct dirent *dirp;

    if ((dp = opendir(argv[1])) == NULL)
        printf(“can’t open %s”, argv[1]);

    while ((dirp = readdir(dp)) != NULL){
        struct stat fileStat;
        stat(dirp->d_name,&fileStat);   
 
        printf(dirp->d_name);
        printf(“—————————\n”);
        printf(“File Size: \t\t%d bytes\n”,fileStat.st_size);
        printf(“Number of Links: \t%d\n”,fileStat.st_nlink);
        printf(“File inode: \t\t%d\n”,fileStat.st_ino);
 
        printf(“File Permissions: \t”);
        printf( (S_ISDIR(fileStat.st_mode)) ? “d” : “-“);
        printf( (fileStat.st_mode & S_IRUSR) ? “r” : “-“);
        printf( (fileStat.st_mode & S_IWUSR) ? “w” : “-“);
        printf( (fileStat.st_mode & S_IXUSR) ? “x” : “-“);
        printf( (fileStat.st_mode & S_IRGRP) ? “r” : “-“);
        printf( (fileStat.st_mode & S_IWGRP) ? “w” : “-“);
        printf( (fileStat.st_mode & S_IXGRP) ? “x” : “-“);
        printf( (fileStat.st_mode & S_IROTH) ? “r” : “-“);
        printf( (fileStat.st_mode & S_IWOTH) ? “w” : “-“);
        printf( (fileStat.st_mode & S_IXOTH) ? “x” : “-“);
        printf(“\n\n”);
 
        printf(“The file %s a symbolic link\n”, (S_ISLNK(fileStat.st_mode)) ? “is” : “is not”);
    }
    return 0;
}

The opendir function returns a pointer to a DIR structure, and we pass this pointer to the readdir function.  We then call readdir in a loop, to read each directory entry.  The readdir function returns a pointer to a dirent structure or, when it’s finished with the directory,  a null pointer.

X86 Assembly

There are three basic kids of instructions:

  • Transfer data between memory and register:
    • Load data from memory to register – %reg = Mem[address]
    • Store register data into memory – Mem[address] = %reg
  • Perform arithmetic function on register or memory data.                  Eg: c = a+b
  • Transfer Control:
    • Change the execution of program.
    • Unconditional jumps
    • Conditional branches

Moving data:
movX source, dest – where X is {b, w, l}

movl source, dest – move 4 bytes long word
movb source, dest – move 1 byte byte
movw source, dest – move 2 bytes word

Operands:

  • Immediate – constant int data. Eg: $0x400, &-533. All constants are prefixed with $.
  • Register – eg: %eax, %edx
  • Memory – 4 consecutive bytes of memory address given by register. Eg: (%eax) – this means that %eax stores an address.

There can be various operand combinations like:
Source: Immediate
Destination: Register – movl $0x4, %eax – var a = 0x4
Destination: Memory – movl $0x4, (%eax) – *p = 0x4

Source: Register
Destination: Register – movl %edx, %eax – var a = b
Destination: Memory – movl %edx, (%eax) – *p = b

Source: Moemory
Destination: Register – movl (%eax), %edx – var a = *p

We cannot do memory – memory transfer, so we do memory-register transfer followed by register-memory transfer.

Memory addressing modes:

  • Indirect: register contains an address and we use the content of register as an address to memory – Memory[Register[R]]
    Displacement – consider a constant D(R) which is added to the base register – Memory[Register[R] + D]

The general form is D(Rb, Ri, S) where D is the constant displacement, Rb is the base register, Ri is the index register(except for esp or eip) and scale is 1,2 4, 8 bytes.

Observations:

  • In the assemble code the instruction might be in different order from the C code.
  • Some expressions require multiple instructions.
  • We get the exact same code after compliling.

Conditionals and Control flow:

There are four conditon codes which can be set while evaluating some expression or executing branch:
CF – carry flag
ZF – Zero flag
SF – sign flag
OF – overflow flag

Java VS C

In Java pointers are known as References. So in C a pointer can point to an arbitrary address but in Java a reference can only point to the starting of an object/data structure.

Characters and Strings:

  • In Java there are two bytes per character. Also in C there is ASCII coding but in Java there is two byte unit code character which allows us to represent many more of world’s alphabets.
  • In C all strings end with a ” but not in Java. In Java the first four bytes are integer which store the string length. The same approach is used in arrays. An advantage of storing the array length is that when we try to access an address that is out of bounds Java will immediately throw an exception. So there is this extra bound check provided by Java which was not provided by C.
  • In C when an array is declared, all the array elements will store some random or garbage values. But in Java as soon as the array is declared all the array elements will store the value 0.

Objects:
In Java objects can have only primitive datatypes.
Eg: In C we can have int arr[3]
But in Java we must declare this as:
int arr = new int[3];

Creating object instances in Java:
Allocates space for data fields, adds pointer in object to virtual table.
Virtual table is shared across all objects in the class and includes space for static fields and pointers to method codes.
Returns reference to new object in memory

Basics of Architecture and Machine Code

ISA – Instruction set Architecture is a set of instructions that the processor makes available to the compiler.

Time required to execute a program depends:

  • The time required to write a program in C.
  • Compiler – set of assembler instructions it translates the C program
  • ISA – set of instructions that the processor makes available to the compiler
  • Hardware implementation – how much time it takes to execute an instruction

Instruction set Architecture(ISA) defines:

  • System state – register, memory, program counter
  • The instruction the CPU can execute. Eg: add, sub, mul
  • The effect that each of these instructions will have on the system state.

X86 clones – Advanced Micro Devices (AMD)
They followed Intel but is a little bit slower and a lot cheaper than Intel processor.

Architecture – the part of a processor design that one needs to understand to write assembly code. It is directly visible to the software. Eg: number of registers.

Micro-architecture: implementation of the architecture. Eg: cache size, core frequency

Assembly programmer’s view:

In the CPU side there are three main components:

  • PC – holds the address of the next instruction, also known as EIP.
  • Condition codes – stores certain values like if the overflow occured, for conditional branching
  • Registers

In the Memory side the components are:
Object code, program data, OS data, stack.

CPU sends address to memory and memory sends back the value stored at that address. CPU can also send and receive data to the memory. CPU will fetch instructions from object code that is stored in the memory.

Assembler:

  • Translates .s file to .o file.
  • Binary encoding of each instruction.
  • Missing links between code of different file.

Linker:

  • Compilers with static run-time libraries like malloc, printf.
  • Linking occurs when program begins execution.

Register:
A location in the memory that stores a small amount of data which can be accessed very quickly.