CSc 422, Program #2: Multithreaded Memory Allocator Due date: April 14, 2014 at 10:30am. No late assignments will be accepted. In this assignment we will investigate a simple implementation of a thread-safe malloc and free. This assignment must be done in C or C++. You are required to do the following, using a multicore machine from the 9th floor lab with at least four cores: 1. Develop sequential versions of malloc and free. Their prototypes are: void *myMalloc(int size); void myFree(void *ptr); There also is an API function myInit() that accepts as input the number of cores on which the user program will run. The user program must call myInit() before invoking myMalloc or myFree. Function myInit returns zero on success (-1 on failure, which includes any error that the existing malloc incurs) and has the prototype: int myInit(int numCores); The implementation that you must use will be straightforward. Specifically, on startup, use the system malloc to allocate a total of 256K bytes, not including any metadata that you use. We will assume that the user never needs any more than that; i.e., you will never need to increase the memory that your myMalloc returns to satisfy user requests. We will also assume that all myMalloc calls are either small, which is defined as 64 bytes or less, or large, which is defined as between 65 bytes and 1K, inclusive. Thus, you will always hand out either 64 or 1K byte chunks (plus any metadata). Therefore, take the 256K chunk and break it into two linked lists: one holding 64 byte chunks and one holding 1K. This means that you will need to use pointers; each of your chunks should be slightly larger than you will hand out. Specifically, the chunk should be a structure, consisting of either 64 or 1K bytes (depending on the list), along with a pointer to the next chunk and a flag indicating whether it is a small or large chunk. It is easiest to use a singly-linked list and maintain it as a stack. Figure 1 shows a picture of the free list (this is how we did it; you are not bound to this particular implementation, though you must manage your own memory). Note, though, that you may not allocate more than 256K of memory to be handed out to the user. The pointer freeListS points to the first free block; on initialization, that is the first block of the entire 128K allocation. You will need to initialize the pointers as shown in the picture. If you allocate the 128K and cast to a char *, then you can (relatively) 1 freeListS type (4) type (4) type (4) next (8) next (8) next (8) data (64) data (64) data (64) NULL Figure 1: One way to manage the free list. easily walk throught the structure and use pointer arithmetic to set up the next pointers as I have drawn in the figure. When you return a block to the user, make sure the pointer you return is to the data, and not to the type. (Otherwise, the user will overwrite your metadata through no fault of the user’s.) I will leave the rest to you to do. Make sure that you manage your next pointers correctly. Write a benchmark application that uses your myMalloc and myFree by repeatedly calling them (first myMalloc and then myFree) and accessing the data in between. 2. Develop coarse-grain, concurrent versions of myMalloc and myFree. This means that you must use one lock for the entire two routines. Measure performance on the benchmark above, using 1–4 cores, along with hyperthreaded mode at 4 cores. Multiply the value of 256K by 8 for this part. 3. Develop fine-grain, concurrent versions of myMalloc and myFree in which you create a free list of memory per core (of size 256K each, again split into two 128K lists as in part 1). All allocations should be done from the local list. Assume that you never run out of space. Again, measure performance on the benchmark above, using 1–4 cores, along with hyperthreaded mode at 4 cores (which means 8 logical cores). Assume that there will never be more than one thread per core. 4. Develop fine-grain, concurrent versions of myMalloc and myFree in which you create a free list of memory per core (of size 256K each; split into 128K lists as in parts 1 and 3). Additionally, keep a shared list, that is thread-safe, of size 256K; this means you should have one overflow list of 64 byte blocks and one of 1K byte blocks. Cores use the overflow list if they run out of space on their local list. Assume that you never run out of space on the backup list. Once again, measure performance on the benchmark above, using 1–4 cores, along with hyperthreaded mode on 4 cores (which means 8 logical cores). Assume that there will never be more than one thread per core. Notes First, do not turn in any project that has debugging print statements (or any other kinds of print statements). Second, myMalloc and myFree must be constant time operations. Third, to implement the multithreaded version of myMalloc, you may want to use following functions: 2 int pthread_key_create(pthread_key_t *key, void (*destructor)(void*)); int pthread_setspecific(pthread_key_t key, const void *value); void *pthread_getspecific(pthread_key_t key); The purpose of these functions is to determine which thread is invoking myMalloc or myFree. For more details, please see the following web page: http://www.opengroup.org/onlinepubs/7990989775/xsh/pthread.h.html. We also provide an example, sample pthread.c, that shows how these functions work. Testing You should compile each version of myMalloc, myFree and myInit into a static library, called libmymalloc1, libmymalloc2, libmymalloc3 and libmymalloc4, respectively, for each of the four parts. To test your code, you need to implement a driver function, in which main should be implemented, and link the testcase to your library. As mentioned previously, myInit is called in main and before any call to myMalloc or myFree. We provide a sample driver function in sample test.c. When we test your programs, we will be using our own driver files. Therefore, you must conform to this interface or you will fail our test cases. To compile, do the following, assuming that your library source file is libmymalloc.c, and the testcase source file is test.c: 1. Compile each source file into an object file, with no linking, via gcc -c libmymalloc.c [-lpthread]. Note that the -lpthread is not used on part 1. 2. Use the ar command to add the generated object files to an archive library file, via ar -rv libmymalloc.a libmymalloc.o 3. Link to your library, via gcc -o <test> <test>.c -Lpath -lmymalloc [-lpthread]. The path is the absolute path to the location of your library. Note that we have provided you with a makefile that performs the first two steps above; you are of course welcome to create your own makefile. Good News There is good news. First, there is no output format requirement, because we will use our own driver files. Second, the only “analysis” I am requiring is that you turn in a graph (name the file results.pdf), with the following information: the x-axis should be the number of cores from 1 to 8 (count hyperthreads as cores), the y-axis should be speedup relative to the sequential memory allocator (part 1 above), and there should be three lines on the graph (one for each version of the concurrent memory allocator [parts 2, 3, and 4 above]). 3 Submission Your submission should include all your source code and a Makefile on lectura, and assume that pointers are 8 bytes on the machines on which we will test your code. The Makefile should compile the source files into the four required libraries. The Makefile does not need to create any driver programs (you will need to, of course, create driver programs for your own testing purposes). For example, make mymalloc1 must create libmymalloc1.a. If you have a main function in your libraries, you will fail all test cases (linking will fail). The same will happen if you modify myMalloc.h. You should submit your assignment using the turnin command; for this program, use the assignment name cs422-s14-prog2. 4
© Copyright 2024 ExpyDoc