CSc 422, Program #2: Multithreaded Memory Allocator

CSc 422, Program #2: Multithreaded Memory Allocator
Due date: April 14, 2014 at 10:30am. No late assignments will be accepted.
In this assignment we will investigate a simple implementation of a thread-safe malloc
and free. This assignment must be done in C or C++.
You are required to do the following, using a multicore machine from the 9th floor lab
with at least four cores:
1. Develop sequential versions of malloc and free. Their prototypes are:
void *myMalloc(int size);
void myFree(void *ptr);
There also is an API function myInit() that accepts as input the number of cores
on which the user program will run. The user program must call myInit() before
invoking myMalloc or myFree. Function myInit returns zero on success (-1 on failure,
which includes any error that the existing malloc incurs) and has the prototype:
int myInit(int numCores);
The implementation that you must use will be straightforward. Specifically, on startup,
use the system malloc to allocate a total of 256K bytes, not including any metadata
that you use. We will assume that the user never needs any more than that; i.e., you
will never need to increase the memory that your myMalloc returns to satisfy user
requests. We will also assume that all myMalloc calls are either small, which is defined
as 64 bytes or less, or large, which is defined as between 65 bytes and 1K, inclusive.
Thus, you will always hand out either 64 or 1K byte chunks (plus any metadata).
Therefore, take the 256K chunk and break it into two linked lists: one holding 64 byte
chunks and one holding 1K. This means that you will need to use pointers; each of
your chunks should be slightly larger than you will hand out. Specifically, the chunk
should be a structure, consisting of either 64 or 1K bytes (depending on the list), along
with a pointer to the next chunk and a flag indicating whether it is a small or large
chunk. It is easiest to use a singly-linked list and maintain it as a stack.
Figure 1 shows a picture of the free list (this is how we did it; you are not bound to
this particular implementation, though you must manage your own memory). Note,
though, that you may not allocate more than 256K of memory to be handed
out to the user.
The pointer freeListS points to the first free block; on initialization, that is the first
block of the entire 128K allocation. You will need to initialize the pointers as shown in
the picture. If you allocate the 128K and cast to a char *, then you can (relatively)
1
freeListS
type (4)
type (4)
type (4)
next (8)
next (8)
next (8)
data (64)
data (64)
data (64)
NULL
Figure 1: One way to manage the free list.
easily walk throught the structure and use pointer arithmetic to set up the next pointers
as I have drawn in the figure. When you return a block to the user, make sure the
pointer you return is to the data, and not to the type. (Otherwise, the user
will overwrite your metadata through no fault of the user’s.) I will leave the rest to
you to do. Make sure that you manage your next pointers correctly.
Write a benchmark application that uses your myMalloc and myFree by repeatedly
calling them (first myMalloc and then myFree) and accessing the data in between.
2. Develop coarse-grain, concurrent versions of myMalloc and myFree. This means that
you must use one lock for the entire two routines. Measure performance on the benchmark above, using 1–4 cores, along with hyperthreaded mode at 4 cores. Multiply
the value of 256K by 8 for this part.
3. Develop fine-grain, concurrent versions of myMalloc and myFree in which you create
a free list of memory per core (of size 256K each, again split into two 128K lists as in
part 1). All allocations should be done from the local list. Assume that you never run
out of space. Again, measure performance on the benchmark above, using 1–4 cores,
along with hyperthreaded mode at 4 cores (which means 8 logical cores). Assume that
there will never be more than one thread per core.
4. Develop fine-grain, concurrent versions of myMalloc and myFree in which you create a
free list of memory per core (of size 256K each; split into 128K lists as in parts 1 and
3). Additionally, keep a shared list, that is thread-safe, of size 256K; this means you
should have one overflow list of 64 byte blocks and one of 1K byte blocks. Cores use
the overflow list if they run out of space on their local list. Assume that you never run
out of space on the backup list. Once again, measure performance on the benchmark
above, using 1–4 cores, along with hyperthreaded mode on 4 cores (which means 8
logical cores). Assume that there will never be more than one thread per core.
Notes
First, do not turn in any project that has debugging print statements (or any other kinds of
print statements). Second, myMalloc and myFree must be constant time operations.
Third, to implement the multithreaded version of myMalloc, you may want to use following functions:
2
int pthread_key_create(pthread_key_t *key, void (*destructor)(void*));
int pthread_setspecific(pthread_key_t key, const void *value);
void *pthread_getspecific(pthread_key_t key);
The purpose of these functions is to determine which thread is invoking myMalloc or
myFree. For more details, please see the following web page:
http://www.opengroup.org/onlinepubs/7990989775/xsh/pthread.h.html. We also provide an example, sample pthread.c, that shows how these functions work.
Testing
You should compile each version of myMalloc, myFree and myInit into a static library, called
libmymalloc1, libmymalloc2, libmymalloc3 and libmymalloc4, respectively, for each of the
four parts. To test your code, you need to implement a driver function, in which main should
be implemented, and link the testcase to your library. As mentioned previously, myInit is
called in main and before any call to myMalloc or myFree. We provide a sample driver
function in sample test.c.
When we test your programs, we will be using our own driver files. Therefore, you must
conform to this interface or you will fail our test cases.
To compile, do the following, assuming that your library source file is libmymalloc.c, and
the testcase source file is test.c:
1. Compile each source file into an object file, with no linking, via gcc -c libmymalloc.c
[-lpthread]. Note that the -lpthread is not used on part 1.
2. Use the ar command to add the generated object files to an archive library file, via ar
-rv libmymalloc.a libmymalloc.o
3. Link to your library, via gcc -o <test> <test>.c -Lpath -lmymalloc [-lpthread].
The path is the absolute path to the location of your library.
Note that we have provided you with a makefile that performs the first two steps above; you
are of course welcome to create your own makefile.
Good News
There is good news. First, there is no output format requirement, because we will use our
own driver files. Second, the only “analysis” I am requiring is that you turn in a graph (name
the file results.pdf), with the following information: the x-axis should be the number of
cores from 1 to 8 (count hyperthreads as cores), the y-axis should be speedup relative to the
sequential memory allocator (part 1 above), and there should be three lines on the graph
(one for each version of the concurrent memory allocator [parts 2, 3, and 4 above]).
3
Submission
Your submission should include all your source code and a Makefile on lectura, and assume
that pointers are 8 bytes on the machines on which we will test your code. The Makefile
should compile the source files into the four required libraries. The Makefile does not need
to create any driver programs (you will need to, of course, create driver programs for your
own testing purposes). For example, make mymalloc1 must create libmymalloc1.a. If you
have a main function in your libraries, you will fail all test cases (linking will fail). The same
will happen if you modify myMalloc.h.
You should submit your assignment using the turnin command; for this program, use
the assignment name cs422-s14-prog2.
4