Diss. ETH No. 23504 Some Extremal Problems about Saturation and Random Graphs A thesis submitted to attain the degree of Doctor of Sciences of ETH Zurich (Dr. sc. ETH Zurich) presented by Dániel Korándi MASt in Mathematics, University of Cambridge born on February 23, 1990 citizen of Hungary. accepted on the recommendation of Prof. Dr. Benny Sudakov, Examiner Prof. Dr. Jacques Verstraëte, Co-Examiner 2016 Abstract Extremal combinatorics is a major branch of discrete mathematics that studies problems looking for the maximum or minimum size of combinatorial structures satisfying certain properties. Saturation is one of the oldest themes in the area and considers questions of the following form: What is the minimum size of a structure that cannot be extended without creating a forbidden substructure? Another important field is probabilistic combinatorics, that has grown out of extremal combinatorics and studies random combinatorial objects. The Erdős-Rényi random graph G(n, p) is of particular interest, and it is a recent trend in the area to look at classic extremal questions restricted to random graphs. This thesis contributes to the study of saturation problems and of extremal questions in random graphs, including the combination of them: saturation-type problems in random graphs. In Chapter 2 we look at a saturation problem in bipartite graphs and settle a conjecture of Moshkovitz and Shapira in the asymptotic sense. In Chapter 3 we consider the classic saturation problem in random graphs and obtain some tight results. In Chapter 4 we use the differential equation method to analyze a random process that is closely related to weak saturation. As an application, we improve previous results about the fundamental group of random simplicial complexes. In Chapter 5 we look at a classic edge decomposition problem in the random graph setting, and prove some asymptotically correct estimates. ii Zusammenfassung Die extremale Kombinatorik ist ein bedeutender Zweig der diskreten Mathematik. Sie befasst sich mit Fragestellungen über die minimale oder maximale Größe einer kombinatorischen Struktur mit bestimmten geforderten Eigenschaften. Sättigung ist eines der ältesten Themen in diesem Gebiet und studiert Probleme der folgenden Art: Was ist die minimale Größe einer Struktur, die nicht erweitert werden kann ohne eine verbotene Teilstruktur zu erzeugen? Ein weiteres wichtiges Feld ist die probabilistische Kombinatorik, die sich aus der extremalen Kombinatorik heraus entwickelt hat und zufällige kombinatorische Objekte studiert. Hierbei ist der Erdős-Rényi Zufallsgraph G(n, p) von besonderem Interesse. Dies spiegelt sich unter anderem im aktuellen Trend wieder, klassische Fragen der extremalen Kombinatorik auf Zufallsgraphen zu beschränken. Diese Dissertation trägt zur Untersuchung von Sättigungsproblemen und von extremalen Problemen in Zufallsgraphen bei und verbindet diese durch die Untersuchung von Sättigung in Zufallsgraphen. In Kapitel 2 betrachten wir ein Sättigungsproblem in bipartiten Graphen und beweisen eine Vermutung von Moshkovitz und Shapira im asymptotischen Sinn. In Kapitel 3 untersuchen wir das klassische Sättigungsproblem in Zufallsgraphen und erzielen einige scharfe Resultate. In Kapitel 4 benutzen wir die Differentialgleichungsmethode um einen Zufallsprozess zu analysieren, der eng mit der schwachen Sättigung in Graphen zusammenhängt. Diese Analyse verwenden wir, um frühere Resultate über die Fundamentalgruppe von zufällig ausgewählten Simplizialkomplexen zu verbessern. In Kapitel 5 betrachten wir ein klassisches Problem der Kantenzerlegung im Kontext von Zufallsgraphen und beweisen einige asymptotisch korrekte Abschätzungen. iii Acknowledgements First and foremost, I want to thank my advisor, Benny Sudakov, for all his advice and guidance that helped me take my first steps as a mathematician. Doing a PhD with Benny was probably one of the best decisions of my life. I am just as indebted to my parents and brothers. Without their support and encouragement none of this would have been possible. Special thanks go to all the great math teachers that I could learn from and to Judit Csath for teaching me how English is done. I have had the pleasure of collaborating with many people, and for that I am grateful to my co-authors, Victor Falgas-Ravry, Wenying Gan, Ping Kittipassorn, Michael Krivelevich, Shoham Letzter, Bhargav Narayanan and Yuval Peled. Doing a PhD would not have been so fun without all the people surrounding me. Many thanks to Igor Balla, Shagnik Das, Felix Dräxler, Roman Glebov, Nina Kamčev, Matt Kwan, Humberto Naves, Alexey Pokrovskiy, Pedro Vieira, Jan Volec, Frank Mousset, Rajko Nenadov, Nemanja Škorić, Corentin Perret, Laci Lovász, Katie Arterburn, Zach Norwood and all my other friends from ETH, UCLA and elsewhere just for being there and talking to me. And Ilcsi. Thank you for your love and eternal patience. Chapter 2 is a version of the paper “Ks,t -saturated bipartite graphs” that I worked on with Wenying Gan and Benny Sudakov. It was published in the European Journal of Combinatorics, Volume 45 (2015). Chapter 3 is joint work with Benny Sudakov and is essentially the same as the paper “Saturation in random graphs” that is now accepted for publication in Random Structures & Algorithms. In Chapter 4, I included collaborative work with Yuval Peled and Benny Sudakov that was published as “A random triadic process” in the SIAM Journal on Discrete Mathematics, Volume 30 iv (2016). My contribution to this result was the combinatorial analysis of the discrete random process in question. Finally, Chapter 5 is an adaptation of the paper “Decomposing random graphs into few cycles and edges”, published in Combinatorics, Probability and Computing, Volume 24 (2015). v Contents Contents vi 1 Introduction 1 2 Saturation in bipartite graphs 2.1 Introduction . . . . . . . . . . . . . . . . 2.2 Lower bounds on the saturation number 2.3 Extremal graphs . . . . . . . . . . . . . 2.4 The K2,3 case . . . . . . . . . . . . . . . 2.5 Further remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 6 8 14 15 20 3 Saturation in random graphs 3.1 Introduction . . . . . . . . . 3.2 Strong saturation . . . . . . 3.2.1 Lower bound . . . . 3.2.2 Upper bound . . . . 3.3 Weak saturation . . . . . . . 3.4 Further remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 21 24 25 26 31 35 4 A random triadic process 4.1 Introduction . . . . . . . . . . . . 4.1.1 Proof outline . . . . . . . 4.2 The differential equation method 4.3 Calculations . . . . . . . . . . . . 4.3.1 Tools . . . . . . . . . . . . 4.3.2 Degrees . . . . . . . . . . 4.3.3 Open triples . . . . . . . . 4.3.4 3-walks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 37 39 40 44 46 51 52 53 vi . . . . . . . . . . . . CONTENTS 4.4 4.5 4.3.5 4-walks . . . . . . 4.3.6 Codegrees . . . . The second phase . . . . 4.4.1 The lower bound 4.4.2 The upper bound Further remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 Decomposing random graphs into few cycles and edges 5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . 5.1.1 Proof outline . . . . . . . . . . . . . . . . . . . . . 5.2 Covering the odd-degree vertices . . . . . . . . . . . . . . . 5.3 Cycle decompositions in sparse random graphs . . . . . . . 5.4 The main ingredients for the dense case . . . . . . . . . . . 5.5 Cycle-edge decompositions in dense random graphs . . . . 5.6 Further remarks . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . 55 57 57 58 60 63 . . . . . . . 64 64 65 66 72 75 79 81 82 vii Chapter 1 Introduction Extremal combinatorics is a major branch of discrete mathematics that has been actively developed for over 80 years. It studies problems looking for the maximum or minimum size of combinatorial structures satisfying certain properties. Such questions are often motivated by other fields of mathematics and science, when the optimization of certain combinatorial quantities is needed. A nice example of an extremal problem in non-mathematical terms is the well-known puzzle that asks how many knights can be placed on the chess board in a way that no two of them attack each other. This puzzle fits a more general framework of extremal questions, called Turán theory, where one seeks to maximize the size of a structure avoiding certain forbidden substructures. The cornerstone result in this area is Turán’s theorem [52] that determines the maximum number of edges in graphs not containing any large cliques. One way to approach such a problem is to think about the corresponding process where we are trying to extend our structure step-by-step (adding knights or edges) in such a way that we never create a forbidden substructure (a pair of attacking knights or large cliques). Then we want to see how far we can get with this process. This leads us to another interesting question: How soon can we get stuck? What is the minimum size of a structure that cannot be extended without creating a forbidden substructure? Questions of this sort are called saturation problems. Saturation was first considered by Zykov [55] and by Erdős, Hajnal and Moon [24], and has proved to be a very fruitful subject to study. Its development has revealed close ties to percolation theory, and saturation 1 1. INTRODUCTION problems have been instrumental in motivating the development of various tools in combinatorics, most notably the linear algebraic method. Probabilistic combinatorics is an area that has grown out of extremal combinatorics. It studies random combinatorial objects such as G(n, p), the Erdős-Rényi random graph on n vertices, where each pair of vertices is connected by an edge with probability p, independently of the others. Such random objects first appeared as probabilistic constructions for extremal questions. For example, the implicit use of G(n, 1/2) was key to a fundamental result of Erdős [21] on Ramsey theory from 1947, where he proved the existence of graphs containing no cliques or independent sets of super-logarithmic size. The idea of using randomness in combinatorial constructions revolutionized extremal combinatorics. The study of random graph models for their own sake was initiated by Erdős and Rényi [25] in 1959. They investigated the structure and size of the connected components of G(n, p) as the edge probability p increases from 0 to 1. In particular, they found the probability threshold where the random graph becomes connected. This paper led to intensive research that determined the threshold for many other structural properties, as well. We refer the interested reader to the monographs of Bollobás [12] and Janson, Luczak and Ruciński [33]. A more recent trend is to look at random analogs of classic extremal problems. Probably the first result of this kind appears in a 1986 paper of Frankl and Rödl [30], where (again, motivated by a question from extremal combinatorics) they find the maximum number of edges in a triangle-free subgraph of G(n, p). The systematic study of similar problems began in the mid-1990s. For more examples, see the survey of Rödl and Schacht [48]. This dissertation contributes to the study of saturation problems and of extremal questions in random graphs, including the combination of the two themes: saturation-type problems in random graphs. Below, we give a short description of the topics covered. More detail can be found in the introduction of the respective chapter. For two graphs H and F , H is said to be F -saturated if it contains no copy of F as a subgraph, but the addition of any edge missing from H creates a copy of F in H. Or in other words, if H is a maximal F free graph. An F -free graph is weakly F -saturated if the missing edges can be added back in some order such that every added edge creates a 2 1. INTRODUCTION new copy of F (possibly using the previously added edges). Clearly, any F -saturated graph is also weakly F -saturated. The saturation numbers sat(n, F ) and w-sat(n, F ) are defined to be the minimum number of edges in an F -saturated and weakly F -saturated graph on n vertices, respectively. Probably the most natural question one might ask here is the saturation number of cliques. Zykov [55] and Erdős, Hajnal and independently Moon [24] showed that sat(n, Ks ) = n2 − n−s+2 . Somewhat surprisingly, 2 the weak saturation number w-sat(n, Ks ) turns out to be the same as the saturation number [35, 36]. In their paper, Erdős, Hajnal and Moon also suggested the following bipartite variant of the problem. We say that an n-by-n bipartite graph is F -saturated if the addition of any missing edge between its two parts creates a new copy of F . What is the minimum number of edges in a Ks,s -saturated bipartite graph? This question was answered independently by Wessel [53] and Bollobás [10] in a more general, but ordered, setting: they showed that the minimum number of edges in a K(s,t) -saturated bipartite graph is n2 −(n−s+1)(n−t+ 1), where K(s,t) is the “ordered” complete bipartite graph with s vertices in the first color class and t vertices in the second. However, the very natural question of determining the minimum number of edges in the usual, unordered, Ks,t -saturated case remained unsolved. This problem was considered recently by Moshkovitz and Shapira who also conjectured what its answer should be. In Chapter 2 we give an asymptotically tight bound on the minimum number of edges in a Ks,t -saturated bipartite graph, which is only smaller by an additive constant than the conjecture of Moshkovitz and Shapira. We also prove their conjecture for K2,3 -saturation, which was the first open case. Saturation can also be defined more generally in an arbitrary host graph G, in which case we say that a graph is F -saturated in G if it is a maximal F -free subgraph of G. Using this terminology, Chapter 2 treats saturation in Kn,n , but many other host graphs (e.g., complete multipartite graphs and hypercubes) have also been considered in the past decades. The work presented in Chapter 3 initiates the study of saturation in random graphs. For constant probability p, we give asymptotically tight estimates on the saturation number of cliques in G(n, p), and determine the exact value of the weak saturation number in the same setting. This research fits nicely with previous results. Indeed, the opposite of the saturation problem, the Turán problem in G(n, p) and its complete so3 1. INTRODUCTION lution has attracted significant attention in recent years (see [48]). On the other hand, weak saturation can also be thought of as a kind of bootstrap percolation. Then our question of finding the smallest percolating subgraph of G(n, p) is closely related to a problem that Balogh, Bollobás and Morris [4] looked at: they were interested in the smallest p for which G(n, p) percolates in Kn . A random simplicial 2-dimensional complex Y2 (n, p), as introduced by Linial and Meshulam [41] in 2006, is a higher dimensional analog of the Erdős-Rényi random graph. It is defined to have n vertices, all the n2 edges, and each triangular face independently with probability p. Combinatorial and topological properties of random complexes have been the subject of active research in the past decade, with tools coming from various fields of mathematics. One of the questions raised in the original paper of Linial and Meshulam was how the fundamental group of Y2 (n, p) behaves. In particular, what is the threshold where the complex becomes simply connected. This question was studied by Babson, Hoffman and Kahle [3], who provided lower and upper bounds for the threshold. In Chapter 4, we improve √ their upper bound by a log n factor. We have some reason to believe that our estimate is asymptotically correct, although the lower bound in [3] is far away from it. We prove our result by exhibiting a contractible subcomplex in Y2 (n, p) when p is above the threshold that contains the complete 1-skeleton. This is enough to show that the complex is simply connected. We do so by analyzing the following graph process, that is yet another instance of weak triangle-saturation with random restrictions. Let H = H(n, p) be an underlying 3-uniform random hypergraph on n vertices where each triple independently appears with probability p. Our process starts with the star G0 on the same vertex set, containing all the edges incident to some fixed vertex v0 . Then we repeatedly add an edge xy if there is a vertex z such that xz and zy are already in the graph and xzy ∈ H. We say that the process propagates if it reaches the complete graph before it terminates. In Chapter 4 we prove that the threshold probability for propagation is p = 2√1 n . Problems about packing and covering the edge set of a graph using certain subgraphs form a wide subfield of extremal combinatorics, one that has been intensively studied since the 1960s. One of the oldest conjectures in this area, made by Erdős and Gallai [23], claims that the edges of ev4 1. INTRODUCTION ery graph on n vertices can be decomposed into O(n) cycles and edges. Although an O(n log n) upper bound is not hard to show, the first significant step towards the solution was only made recently by Conlon, Fox and Sudakov [18]. They also showed that the conjecture holds for the random graph G(n, p). In Chapter 5 we look at the minimum for random graphs more closely. Note that any odd-degree vertex needs to be the endpoint of at least one edge. As typically about half of the vertices in G(n, p) have odd degree, this shows that we need at least n/4edges in our decomposition. On the other hand, G(n, p) contains about n2 p edges and any cycle covers at most n of them, so we also need at least np/2 cycles. Our result shows that this lower bound is asymptotically correct, as G(n, p) can indeed be decomposed into + o(n) cycles and edges. a union of n4 + np 2 5 Chapter 2 Saturation in bipartite graphs 2.1 Introduction For two graphs G and F , G is said to be F -saturated if it contains no copy of F as a subgraph, but the addition of any edge missing from G creates a copy of F in G. The saturation number sat(n, F ) is defined as the minimum number of edges in an F -saturated graph on n vertices. Note that the maximum number of edges in an F -saturated graph is exactly the extremal number ex(n, F ), so the saturation problem of finding sat(n, F ) is in some sense the opposite of the Turán problem. Probably the most natural setup of this problem is when we choose F to be a fixed complete graph Ks . This was first studied by Zykov [55] in the 1940’s, and later by Erdős, Hajnal and Moon [24] in 1964. They proved that s−1 sat(n, Ks ) = (s − 2)n − 2 . Here the upper bound comes from the Ks saturated graph that has s−2 vertices connected to all other vertices. Later, the closely related notion of weak saturation was introduced by Bollobás [11]. A graph G is weakly F -saturated if it is possible to add back the missing edges of G one by one in some order, so that each addition creates a new copy of F . Trivially, if G is F -saturated, then any order satisfies this property, hence G is also weakly F -saturated. Let w-sat(n, Ks ) be the minimum number of edges in an n-vertex graph that is weakly Ks -saturated. We then have w-sat(n, Ks ) ≤ sat(n, Ks ). Somewhat surprisingly, one can prove using algebraic techniques (see e.g. [42]) that these two functions are actually equal. On the other hand, the extremal graphs for these problems are not the same, and already for s = 3 there are weakly K3 -saturated 6 2.1. INTRODUCTION graphs (i.e., trees) which are not K3 -saturated. The paper by Erdős, Hajnal and Moon also introduced the bipartite saturation problem, where we are looking for the minimum number of edges sat(Kn,n , F ) in an F -free n-by-n bipartite graph, such that adding any missing edge between the two color classes creates a new copy of F . (Of course, this definition is only meaningful if F is also bipartite.) They conjectured that sat(Kn,n , Ks,s ) = n2 − (n − s + 1)2 . Once again, this is seen to be tight by selecting s−1 vertices on each side of the bipartite graph and connecting them to every vertex on the opposite side. In the bipartite setting, one can impose an additional restriction on the problem by ordering the two vertex classes of F and requesting that each missing edge create an F respecting the order: the first class of F lies in the first class of G. For example, let K(s,t) be the complete “ordered” sby-t bipartite graph with s vertices in the first class and t vertices in the second, then a bipartite graph G is K(s,t) -saturated if each missing edge creates a Ks,t with the s-vertex class lying in the first class of G. Indeed, the conjecture of Erdős, Hajnal and Moon was independently confirmed by Wessel [53] and Bollobás [10] a few years later as the special case of the following result: sat(K(n,n) , K(s,t) ) = n2 − (n − s + 1)(n − t + 1). This was further generalized in the 80s by Alon [1] to complete k-uniform hypergraphs in a k-partite setting using algebraic tools. Alon showed that the saturation and weak saturation bounds are the same in this case as well. For a more detailed discussion of F -saturation in general, we refer the reader to the survey [27] by Faudree, Faudree and Schmitt. In this chapter we study the unordered case of bipartite saturation. Although this is arguably the most natural setting for the bipartite problem, it did not receive any attention until very recently in [5, 43]. Moshkovitz and Shapira [43] studied the unordered weak saturation number of Ks,t , s ≤ t, and showed that w-sat(Kn,n , Ks,t ) = (2s − 2 + o(1))n. Note that, surprisingly, it is much smaller than the corresponding ordered saturation number and only depends on the size of the smaller part. One might think that a similar gap exists for saturation numbers as well. Moshkovitz and Shapira conjectured that this is not the case, and that ordered and unordered bipartite saturation numbers differ only by an additive constant. More precisely, they made the following conjecture and constructed an example showing that, if true, this bound is tight. 7 2.2. LOWER BOUNDS ON THE SATURATION NUMBER Conjecture 2.1.1. Let 1 ≤ s ≤ t be integers. Then there is an n0 such that if n ≥ n0 and G is a Kjs,t -saturated n-by-n bipartite graph, then G contains k s+t−2 2 edges. at least (s + t − 2)n − 2 Our main result confirms the above conjecture up to a small additive constant. Theorem 2.1.2. Let 1 ≤ s ≤ t be fixed and n ≥ t. Then sat(Kn,n , Ks,t ) ≥ (s + t − 2)n − (s + t − 2)2 . The proof is presented in Section 2.2. In Section 2.3, we show that if the conjecture is true, it has many extremal examples. Finally, in Section 2.4, we prove Conjecture 2.1.1 in the first open case of K2,3 -saturation. 2.2 Lower bounds on the saturation number Let G be a bipartite graph with vertex class U and U 0 of size n. Assume 1 ≤ s ≤ t ≤ n and suppose G is Ks,t -saturated, i.e. each missing edge between U and U 0 creates a new K(s,t) or a new K(t,s) when added to G. Here K(a,b) refers to a complete bipartite graph with a vertices in U and b vertices in U 0 . Let us start with the following, easy special case of Theorem 2.1.2. Proposition 2.2.1. Suppose a Ks,t -saturated graph has minimum degree δ < t − 1. Then it contains at least n(t + s − 2) − (s + t − 2)2 edges. Proof. The Ks,t -saturated property ensures that each vertex has at least s − 1 neighbors, so we actually have s − 1 ≤ δ < t − 1. Let u0 be a vertex of degree δ; we may assume that u0 ∈ U . Then adding any missing edge u0 u0 to G (where u0 ∈ U 0 − N (u0 )) should create a new K(t,s) because it cannot create a K(s,t) . For such a u0 , let Su0 ⊆ U be the t − 1 vertices other than u0 in the t-class of this K(t,s) , and define V ⊆ U to be the union of these Su0 . Then all vertices in V have at least s − 1 neighbors in N (u0 ) and all vertices in U 0 − N (u0 ) have at least t − 1 neighbors in V . Now we can count 8 2.2. LOWER BOUNDS ON THE SATURATION NUMBER the number of edges in G as follows: e(U, U 0 ) = e(V, N (u0 )) + e(V, U 0 − N (u0 )) + e(U − V, U 0 ) ≥ (s − 1)|V | + (t − 1)(n − |N (u0 )|) + δ(n − |V |) ≥ (s − 1)|V | + (t − 1)(n − t + 2) + (s − 1)(n − |V |) ≥ n(t + s − 2) − (t − 1)(t − 2) ≥ n(t + s − 2) − (s + t − 2)2 . The case when δ ≥ t−1 is considerably more complicated. We introduce the following structure to count the edges of G (see Figure 1). The core of this structure is a set Ã0 = A0 ∪ A00 with A0 ⊆ U and A00 ⊆ U 0 satisfying the following technical property: • there are vertices u0 ∈ A0 and u00 ∈ A00 such that their neighborhoods are also contained in the core. Next, we build the shell around the core: starting with à = Ã0 , we iteratively add any vertex v to à that has at least t − 1 neighbors in it. In other words, à = A ∪ A0 is the smallest set containing Ã0 such that any vertex v ∈ G − à has fewer than t − 1 neighbors in Ã. Here A0 ⊆ A ⊆ U and A00 ⊆ A0 ⊆ U 0 . We use the variables x0 = |A0 |, x00 = |A00 |, x = |A| and x0 = |A0 | to denote the sizes of the corresponding sets. Obviously x0 ≤ x and x00 ≤ x0 . The following, rather scary, lemma is the key to our lower bounds on the saturation numbers. It shows that we can find about n(s + t − 2) edges in a Ks,t -saturated graph, provided we have a small enough core. Lemma 2.2.2. Assuming δ ≥ t − 1, suppose the core spans e = e(A0 , A00 ) edges. Then G has at least (s − 1)2 0 + e + min{(t − s)x, (t − s)x0 } n(s + t − 2) − (x0 + x0 )(t − 1) − 4 edges. Proof. By the construction of Ã, we know that it spans at least e + (t − 1)(x + x0 − x0 − x00 ) edges. Indeed, each vertex we added to the shell brings 9 2.2. LOWER BOUNDS ON THE SATURATION NUMBER C2 C20 dV −B2 < s − 1 C0 C C1 C10 B2 B20 dV −B2 ≥ s − 1 dB < t − s B0 B B1 B10 A0 A00 u0 u00 A dA < s − 1 s − 1 ≤ dA < t − 1 dB ≥ t − s A0 Figure 2.1: the structure for counting the edges 10 2.2. LOWER BOUNDS ON THE SATURATION NUMBER at least t − 1 new edges. The idea is to count t − 1 edges from the remaining vertices on one side of the graph, say U 0 − A0 , and then to find s − 1 new (yet uncounted) edges from the other side, U − A. Of course, if a vertex in U − A has at least s − 1 neighbors in A0 , then these edges are guaranteed to be new. So let us continue with our definition of the structure. We know that any vertex in U − A has fewer than t − 1 neighbors in A0 . We break this set into two parts by defining B to be the set of vertices in U − A having at least s − 1 neighbors in A0 , and C to be those having fewer than s − 1 neighbors in A0 . Similarly we break U 0 − A0 into two sets B 0 and C 0 based on the size of the neighborhood in A. We need to break B further into two parts B1 and B2 , by defining B1 to be the set of vertices having at least t − s neighbors in B 0 . Similarly, let B10 be the set of vertices in B 0 having at least t − s neighbors in B (see Figure 1). Note that any vertex in B10 already has t − 1 neighbors in A ∪ B (at least s − 1 in A and at least t − s in B), but this is not necessarily true for B20 . This, together with our strategy to find s − 1 new edges from the vertices in C motivates our last partitioning: We now break C into two parts C1 and C2 , where C1 is the set of those vertices in C which have at least s − 1 neighbors outside B20 , and C2 = C − C1 . We similarly define C10 = {v ∈ C 0 : |N (v) − B2 | ≥ s − 1}, where N (v) is the neighborhood of v, and C20 = C 0 − C10 . An observation here, which will prove to be crucial when counting the edges, is that C2 and C20 span a complete bipartite graph. Indeed, suppose there is a missing edge vv 0 in G, with v ∈ C2 and v 0 ∈ C20 . Adding this edge creates a K(s,t) or a K(t,s) , suppose it is a K(s,t) . Then v 0 is connected to all the s − 1 vertices other than v in the s-vertex class of this K(s,t) . But v 0 is in C20 , so it has at most s − 2 neighbors outside B2 , consequently there is a vertex w ∈ B2 in the s-class. Similarly, using that v is in C2 , we find at least t − s vertices of the t-class in B20 . But then w ∈ B2 has at least t − s neighbors in B20 ⊆ B 0 , which contradicts the definition of B2 . The same argument leads to a contradiction if the edge creates a K(t,s) , hence we can conclude that there is no missing edge between C2 and C20 . On another note, observe that adding the edge u0 v 0 , where u0 is the vertex in A0 defined in the property of the core and v 0 is any vertex in C 0 , cannot create a K(s,t) . Indeed, if it created a K(s,t) , then all the vertices of the t-class except v 0 are neighbors of u0 , so they are sitting in the core, A00 . This means that each vertex in the s-class is connected to at least t − 1 11 2.2. LOWER BOUNDS ON THE SATURATION NUMBER vertices in the core, hence the whole s-class is in A. But then v 0 has at least s − 1 neighbors in A, contradicting v 0 ∈ C 0 . So we see that adding u0 v 0 creates a K(t,s) . Then, all the vertices of the s-class of this copy of K(t,s) except v 0 are in A00 , therefore the vertices of the t-class have at least s − 1 neighbors in A0 . Hence all of them are in A ∪ B, implying that every v 0 ∈ C 0 has at least t − 1 neighbors in A ∪ B. The same argument shows that each v ∈ C has at least t − 1 neighbors in A0 ∪ B 0 . Lemma 2.2.2 will now follow from the following claim, possibly applied to the graph with the two vertex classes switched. Claim 2.2.3. Assuming δ ≥ t − 1, suppose |C2 | ≤ |C20 |. Then (s − 1)2 0 0 e(U, U ) ≥ n(s + t − 2) − (x0 + x0 )(t − 1) − + e + (t − s)x. 4 Proof. Let y = |C2 | and y 0 = |C20 |, and let us count the edges in G. We noted above that each vertex in B10 has at least t − 1 neighbors in A ∪ B, so e(A ∪ B, B10 ) ≥ (t − 1)|B10 |. By assumption, each vertex in B20 has degree at least t − 1, hence e(A ∪ B ∪ C, B20 ) ≥ (t − 1)|B20 |. We have also shown that each vertex in C 0 has at least t − 1 neighbors in A ∪ B, so e(A ∪ B, C 0 ) ≥ (t − 1)|C 0 |. This so far means that e(A ∪ B, B10 ) + e(A ∪ B ∪ C, B20 ) + e(A ∪ B, C 0 ) ≥ (t − 1)(n − x0 ). (2.1) Now look at what we have left from the other side: By definition, any vertex in B has at least s − 1 neighbors in A0 , so e(B, A0 ) ≥ (s − 1)|B|. We also defined C1 so that its vertices have at least s − 1 neighbors outside B20 , this gives e(C1 , A0 ∪ B10 ∪ C 0 ) ≥ (s − 1)|C1 |. As we noted above, the vertices of C2 are all connected to the of C20 , so e(C2 , C20 ) = yy 0 . Using the j vertices k 2 fact that y(s − 1 − y) ≤ (s−1) (y is an integer), we get that 4 e(B, A0 ) + e(C1 ,A0 ∪ B10 ∪ C 0 ) + e(C2 , C20 ) ≥ (s − 1)(n − x − y) + yy 0 ≥ (s − 1)(n − x) − (s − 1)y + y 2 (s − 1)2 . ≥ (s − 1)(n − x) − 4 (2.2) We have also seen that e(A, A0 ) is at least e + (t − 1)(x + x0 − x0 − x00 ). 12 2.2. LOWER BOUNDS ON THE SATURATION NUMBER It is easy to check that we never counted an edge more than once above, hence e(U, U 0 ) ≥ (t − 1)(n − x0 ) + (s − 1)(n − x) 0 + (t − 1)(x + x − x0 − x00 ) (s − 1)2 +e− 4 = n(t + s − 2) + (t − s)x − (x0 + x00 )(t (s − 1)2 , − 1) + e − 4 what we wanted to show. We state the following immediate corollary of this claim, which we need in Section 4. Corollary 2.2.4. If we have equality in Claim 2.2.3, then the following statements hold: • any vertex in B10 ∪ C 0 has exactly t − 1 neighbors in A ∪ B, • any vertex in B has exactly s − 1 neighbors in A0 , • the vertices in C1 have exactly s − 1 neighbors outside B20 , and • y(s − 1) − yy 0 = b(s − 1)2 /4c. Now we are ready to prove our general theorem, which is tight up to an additive constant. Let us emphasize, however, that since our methods do not give the exact result, we will not make any effort to optimize the constant error term. Theorem 2.2.5. If G = (U, U 0 , E) is a Ks,t -saturated bipartite graph with n vertices on each side, then it contains at least (s + t − 2)n − (s + t − 2)2 edges. Proof. Following Lemma 2.2.2, our plan is to find an appropriate core. By Proposition 2.2.1, we may assume that the minimum degree of our graph is at least t−1. Suppose for contradiction that G contains fewer than (s+t−2)n−(s+t−2)2 edges. Then there is a vertex u0 ∈ U of degree at most s+t−3. Moreover, there is a non-adjacent vertex u00 ∈ U 0 −N (u0 ) of degree at most s + t − 3 as well, since otherwise the number of edges in G would be 13 2.3. EXTREMAL GRAPHS at least (n − (s + t − 3))(s + t − 2) > (s + t − 2)n − (s + t − 2)2 , contradicting our assumption. Set A0 = {u0 } ∪ N (u00 ) and A00 = {u00 } ∪ N (u0 ), and define Ã0 = A0 ∪ A00 to be the core. Using the above notation, we see that x0 = |A0 | = 1+|N (u00 )| ≤ s+t−2 and x00 = |A00 | = 1 + |N (u0 )| ≤ s + t − 2. Since u0 and u00 are not adjacent, we can add the edge u0 u00 to create a new Ks,t . Notice that all the vertices of this Ks,t are adjacent to either u0 or u00 , hence they all lie in the core. Consequently, the core spans e = e(A0 , A00 ) ≥ st − 1 edges. Now applying Lemma 2.2.2 we get e(U, U 0 ) ≥ n(s + t − 2) − (x0 + x00 )(t − 1) (s − 1)2 + min{(t − s)x, (t − s)x } − +e 4 ≥ n(s + t − 2) − (x0 + x00 )(t − 1) (s − 1)2 0 + min{(t − s)x0 , (t − s)x0 } − + st − 1 4 (s − 1)2 2 ≥ n(s + t − 2) − (s + t − 2) + st − 1 − 4 2 ≥ n(s + t − 2) − (s + t − 2) . 0 This contradicts the assumption, thus proving the theorem. 2.3 Extremal graphs As we mentioned in the introduction, Moshkovitz and Shapira [43] constructed a Ks,t -saturated n-by-n bipartite graph showing that the bound of the Conjecture 2.1.1, if true, is tight. It appears that this example is not unique. In this section we describe a general family of such graphs which contains the example by Moshkovitz and Shapira as a special case (when l = 1). Example. As usual, we denote the two sides of the bipartite graph by U and U 0 , where |U | = |U 0 | = n. Let us break each class into two major parts: U = V ∪W and U 0 = V 0 ∪W 0 , where |V | = |V 0 | = t+s−2 (assume n is large 2 0 enough). Suppose W and W are further broken into some parts W1 , . . . , Wl and W10 , . . . , Wl0 where |Wi | = |Wi0 | ≥ t − s for all i. The construction of an extremal graph G goes as follows. 14 2.4. THE K2,3 CASE First include in G all the edges between V and V 0 , making it a complete bipartite graph. Also, for every i, choose the edges between Wi and Wi0 to span an arbitrary (t − s)-regular graph. It remains to describe the edges going between different type of classes. We do not include any edge between Wi and Wj0 for any i 6= j. Instead, choose arbitrary sets S 0 ⊆ V 0 and S1 , . . . , Sl ⊆ V of size s − 1, and take all edges going between Wi and S 0 as well as the edges between Si and Wi0 , for all i. A straightforward computation shows that the number of edges in this G is exactly the number in the conjecture. We claim that G is Ks,t -saturated. Let us see what happens when we add a missing edge uu0 to G. If 0 u ∈ W 0 , i.e. u0 ∈ Wi0 for some i, then let N be the set of its t − s neighbors in Wi . Since u ∈ U − N − Si , the set Si ∪ {u} ∪ N ∪ S 0 ∪ {u0 } then forms a K(t,s) . On the other hand, if u0 ∈ V 0 , then u ∈ Wi for some i. Let N 0 be the set of the t − s neighbors of u in Wi0 , then u0 ∈ U 0 − N 0 − S 0 and hence the set Si ∪ {u} ∪ S 0 ∪ N 0 ∪ {u0 } forms a K(s,t) . This proves the saturation property. The asymmetric structure of the above example comes from the relaxation of the l = 1 case, which corresponds to the construction of Moshkovitz and Shapira. When all the vertices in W 0 are connected to the same subset of V of size s − 1, adding an edge between W and W 0 creates both a K(s,t) and a K(t,s) . Our example exploits the freedom we had in choosing the edges between W 0 and V . In our case, when l > 1, adding an edge between Wi and Wj0 with Si 6= Sj creates only a K(t,s) . The existence of such asymmetric examples provides further difficulties in proving an exact result. 2.4 The K2,3 case For t = s, Conjecture 2.1.1 trivially follows from the ordered result by Bollobás [10]. The other extreme is also easy to handle. When s = 1, the K1,t -saturated property merely means that the vertices of degree less than t − 1 span a complete bipartite graph. Then it is a simple exercise to show (see [43]) that the conjecture holds in this case as well. Thus the first open case is s = 2 and t = 3, where the conjecture asserts that any K2,3 -saturated graph contains at least 3n − 2 edges. We note that 15 2.4. THE K2,3 CASE there are many saturated graphs on 3n − 2 edges. In fact, there are many such examples which are even K(2,3) -saturated: Just take a vertex v 0 ∈ U 0 that is connected to everything in U , and make sure that every other vertex in U 0 has degree 2. In this section we prove the matching lower bound. A brief summary of the coming theorem can be phrased as follows. By finding an appropriate core, our techniques from Section 2.2 easily give a 3n − 3 lower bound. The rest of the proof is then a series of small structural observations, ultimately ruling out the possibility that a K2,3 -saturated graph with 3n − 3 edges exists. Theorem 2.4.1. If G = (U, U 0 ; E) is a K2,3 -saturated bipartite graph with n ≥ 4 vertices in each part, then it has at least 3n − 2 edges. Proof. As a first step, we show in the spirit of Proposition 2.2.1 that it is enough to consider graphs of minimum degree 2. Lemma 2.4.2. If G contains fewer than 3n−2 edges, then it has minimum degree 2. Moreover, it contains two non-adjacent vertices u0 ∈ U and u00 ∈ U 0 of degree 2. Proof. The saturation property ensures that each vertex has at least one neighbor. Suppose there is a vertex u of degree 1 – wlog u ∈ U – and let u0 ∈ U 0 be its neighbor. Take any vertex v 0 ∈ U 0 other than u0 , then adding the edge uv 0 cannot create a K(2,3) , so it must create a K(3,2) , with the 2-vertex class being {u0 , v 0 }. For any such v 0 , let Uv0 ⊆ U be the 3-class of this K(3,2) , so Uv0 consists of u and two neighbors of v 0 . We count the two edges between v 0 and Uv0 for each v 0 ∈ U 0 , v 0 6= u0 to get a total of 2n − 2 different edges. Now let X = ∪Uv0 , then every vertex in X is connected to u0 because each of the above K(3,2) ’s contains u0 . This gives |X| new edges. On the other hand, we still have not encountered any edges touching U − X. But since we know that each vertex has at least one neighbor, we surely have at least n − |X| new edges. This is already a total of 3n − 2 edges in G, contradicting our assumption. Therefore the minimum degree is at least 2, but in fact it is exactly two, as otherwise we would have at least 3n edges in the graph. Let u0 have degree 2 – we may assume u0 ∈ U . If every non-adjacent vertex in U 0 has at least 3 neighbors, then we have 2 · 2 + 3(n − 2) = 3n − 2 edges incident 16 2.4. THE K2,3 CASE to U 0 , again a contradiction. Hence there is a u00 ∈ U 0 of degree 2 that is not adjacent to u0 , and we are done. Suppose G is a counterexample to our theorem, and apply Lemma 2.4.2 to get two non-adjacent vertices u0 and u00 of degree 2. Denote the neighbors of u0 by u01 , u02 ∈ U 0 , the neighbors of u00 by u1 , u2 ∈ U , and let A0 = {u0 , u1 , u2 } and A00 = {u00 , u01 , u02 } be the core of the structure we described at the beginning of Section 2.2. Using this core we will also construct the sets A, B = B1 ∪ B2 , C = C1 ∪ C2 and A0 , B 0 = B10 ∪ B20 , C 0 = C10 ∪ C20 as defined by the structure. Assume that |C2 | ≤ |C20 | and apply Claim 2.2.3 with s = 2 and t = 3 to the structure of core Ã0 = A0 ∪ A00 . These choices for s and t significantly simplify the bound we get from this claim: e(U, U 0 ) ≥ 3n − 6 · 2 − 0 + e + x = 3n − 12 + e + x. Using that the addition of the edge u0 u00 creates a K2,3 inside the core (as all neighbors of u0 and u00 are in Ã0 ), it is easy to check that e = e(A0 , A00 ) ≥ 6. We also know that x ≥ x0 = 3, so e(U, U 0 ) ≥ 3n−3. Then these inequalities together with Corollary 2.2.4 imply that if e(U, U 0 ) = 3n−3 then G satisfies the following five properties: 1) e = e(A0 , A00 ) = 6 and x = |A| = 3, 2) any vertex in B10 ∪ C 0 has exactly 2 neighbors in A ∪ B, 3) any vertex in B has exactly 1 neighbor in A0 , 4) the vertices in C1 have exactly 1 neighbor outside B20 , and 5) for y = |C2 | and y 0 = |C20 | (with 0 ≤ y ≤ y 0 ) we have y(s − 1) − yy 0 = y(1 − y 0 ) = 0, so either y = 0 or y = y 0 = 1. The following lemma supplements the fifth property and shows that C2 must be empty and C20 must be non-empty, by taking care of the case y = y 0 = 0 and y = y 0 = 1. 17 2.4. THE K2,3 CASE Lemma 2.4.3. If |C2 | = |C20 | then G spans at least 3n − 2 edges. Proof. As G is a counterexample, by Lemma 2.4.2 it has minimum degree 2. Since y = y 0 , we may apply Claim 2.2.3 and Corollary 2.2.4 to G’s “mirror”, with U and U 0 switched, and observe that the five properties hold for this mirror graph as well. Then the first property gives x = 3, x0 = 3 and e = 6. So A = {u0 , u1 , u2 } and A0 = {u00 , u01 , u02 } (i.e. any vertex not in the core has at most one neighbor in it), and the core spans 6 edges. By symmetry we can assume that adding the edge u0 u00 creates a K(2,3) on the set {u0 , u1 , u00 , u01 , u02 }, so the missing edges are u0 u00 , u2 u01 and u2 u02 . We also assumed that there is no vertex of degree 1, so u2 must have some neighbor v 0 in B 0 . Note that v 0 has exactly one neighbor in A, in particular it is not connected to u1 . Now let us see what happens when we add the edge u0 v 0 . We cannot create a K(2,3) , because that would use both u01 and u02 , but their only common neighbor other than u0 is u1 (recall that no vertex outside A0 can have 2 neighbors in A), which is not connected to v 0 . So it must be a K(3,2) , and it is not using u2 , as u2 has no common neighbor with u0 . But then the K(3,2) contains two neighbors of v 0 that are not in A, but are connected to a vertex in A0 . Then, by definition, these neighbors are in B. So v 0 ∈ B10 has at least two neighbors in B and one in A, and this contradicts the second property. From now on we assume that C2 is empty and C20 is non-empty. Then the fourth property also implies that the vertices in C = C1 have exactly one neighbor outside B20 . Moreover, the third property tells us that each vertex in B has exactly one neighbor in A0 . Lemma 2.4.4. All vertices in B are connected to the same vertex in A0 . Proof. Break B into parts based on the neighbor in A0 by putting the vertices in B connected to w0 ∈ A0 into the set Bw0 . We claim that vertices in different parts do not share common neighbors, or in other words, any vertex v 0 ∈ B 0 ∪ C 0 has all its neighbors in B contained in the same part Bw0 . Indeed, any vertex in B 0 has at most one neighbor in B: this is true by definition for the vertices in B20 , and follows from the second property for B10 (every vertex in B10 has a neighbor in A). Now look at the vertices in C 0 . An easy observation in Lemma 2.2.2 shows that adding the edge u0 v 0 for 18 2.4. THE K2,3 CASE v 0 ∈ C 0 cannot create a K(2,3) . So it creates a K(3,2) , and this K(3,2) must contain the two neighbors of v 0 in B and a neighbor w00 of u0 in A0 . Hence both neighbors of v 0 are in Bw00 , establishing the claim. As we noted above, C 0 is not empty, so take a vertex v10 ∈ C 0 and assume that the neighbors of v10 in B are in Bw10 . We will show that B = Bw10 . Suppose not, i.e. there is a v1 ∈ Bw20 with w10 6= w20 . Then the edge v1 v10 is missing; let us see what happens when we add that edge. We create a K(2,3) or a K(3,2) , so in any case there are vertices v2 ∈ U and v20 ∈ U 0 such that v1 v20 , v2 v20 and v2 v10 are all edges of G. Here v2 cannot be in A, as it is connected to v10 ∈ C 0 . It is not in B either, since then one of v10 and v20 would have neighbors in both Bw10 and Bw20 . So v2 ∈ C = C1 (since C2 is empty). Now the fourth property says that v2 has exactly 1 neighbor outside B20 . Since v10 ∈ C 0 , v20 must be in B20 . But the vertices in B20 have no neighbors in B so v1 v20 cannot be an edge, giving a contradiction. Note that this lemma implies that one of the two neighbors of u0 – say – is not connected to any vertex in B, and therefore it is adjacent to at least two vertices in A = A0 . We also recall that the core only spans six edges. It is time to analyze what happens in the core when we add the edge u0 u00 . It might create a K(3,2) or a K(2,3) , but the obtained graph is inside the core in both cases. Case 1: u0 u00 creates a K(3,2) . If this K(3,2) used u02 , then the core of G would contain more than 6 edges: 5 from the K(3,2) and 2 other edges incident to u01 , which is impossible. So u01 is connected to both u1 and u2 , while u02 is not connected to any of them. Note, however, that u02 is connected to all vertices in B. Let v be any vertex in U − A. When we add the edge vu00 to G, we create a K(2,3) or a K(3,2) , so there is a vertex v 0 connected to both v and u1 or u2 . Then v 0 is not in A0 , since v is only connected to u02 in A0 , but both u1 u02 and u2 u02 are missing. Thus v 0 ∈ B 0 (it has a neighbor in A, so it is not in C 0 ). When we add the edge u0 v 0 , we cannot create a K(2,3) , because that would use both u01 and u02 , which only share u0 as their common neighbor. So it creates a K(3,2) using one of u01 and u02 . It cannot be u01 , because then the 3-class of the K(3,2) is exactly A, making v 0 have 2 neighbors in A. Thus, by definition, v 0 ∈ A0 which contradicts v 0 ∈ B 0 . But it cannot be u02 either, because then v 0 would have two neighbors in B, which together with a neighbor in A that v 0 must have, contradicts the second property. So this case is impossible. u01 19 2.5. FURTHER REMARKS Case 2: u0 u00 creates a K(2,3) . Then one of u1 and u2 – say u1 – is connected to both u01 and u02 , and the other is connected to neither. But then u01 has exactly two neighbors, u0 and u1 , and the set Ā = {u0 , u1 , u01 , u02 } spans four edges. This means that we can apply Lemma 2.2.2 taking Ā as the core, and u0 and u01 being its “distinguished” vertices with their neighborhoods also sitting in the core. One can check that all conditions are satisfied, and with our new values of x ≥ x0 = 2, x0 ≥ x00 = 2 and e = 4, we get that G has at least 3n − 8 − 0 + 4 + 2 = 3n − 2 edges. This contradiction finishes the proof of the theorem. 2.5 Further remarks Although we could slightly improve the error term in Theorem 2.2.5, it seems that more ideas are needed to prove the full conjecture. We also note that our methods can be used to provide an asymptotically tight estimate on the minimum number of edges in a Ks,t -saturated unbalanced bipartite graph (i.e., with parts of size m and n). Determining the precise value in this unbalanced case might be even more challenging, although we believe that a straightforward modification of the extremal construction from the balanced case is tight here as well. Bipartite saturation results were generalized to the hypergraph setting in [1, 43], where G and F are assumed to be k-partite k-uniform hypergraphs, and G is F -saturated if any new hyperedge meeting one vertex from each color class creates a new copy of F . It would be interesting to extend our results to get an asymptotically tight bound for the unordered k-partite hypergraph saturation problem. 20 Chapter 3 Saturation in random graphs 3.1 Introduction For some fixed graph F , a graph H is said to be F -saturated if it is a maximal F -free graph, i.e., H does not contain any copies of F as a subgraph, but adding any missing edge to H creates one. The saturation number sat(n, F ) is defined to be the minimum number of edges in an F -saturated graph on n vertices. As we pointed out in the previous chapter, the saturation problem is in some sense the opposite of the Turán problem, and the first results on the topic were published by Zykov [55] in 1949 and independently by Erdős, Hajnal and Moon [24] in 1964. They considered the problem for cliques and showed that sat(n, Ks ) = (s − 2)n − s−1 , where the upper bound comes 2 from the graph consisting of s − 2 vertices connected to all other vertices. Since the 1960s, the saturation number sat(n, F ) has been extensively studied for various different choices of F . For results in this direction, we refer the interested reader to the survey [27]. Here we are interested in a different direction of extending the original problem, where saturation is restricted to some host graph other than Kn . For fixed graphs F and G, we say that a subgraph H ⊆ G is F -saturated in G if H is a maximal F -free subgraph of G. The minimum number of edges in an F -saturated graph in G is denoted by sat(G, F ). Note that with this new notation, sat(n, F ) = sat(Kn , F ). Such a question with a complete bipartite host graph already appeared in the above-mentioned paper of Erdős, Hajnal and Moon, and Chapter 2 21 3.1. INTRODUCTION discussed some further developments on that problem. But over the years, several other host graphs have also been considered, including complete multipartite graphs [28, 47] and hypercubes [15, 34, 44]. In recent decades, classic extremal questions of all kinds are being extended to random settings. Hence it is only natural to ask what happens with the saturation problem in random graphs. As usual, we let G(n, p) denote the Erdős-Rényi random graph on vertex set [n] = {1, . . . , n}, where two vertices i, j ∈ [n] are connected by an edge with probability p, independently of the other pairs. In this chapter we study Ks -saturation in G(n, p). The corresponding Turán problem of determining ex(G(n, p), Ks ), the maximum number of edges in a Ks -saturated graph in G(n, p), has attracted a considerable amount of attention in recent years. The first general results in this direction were given by Kohayakawa, Rödl and Schacht [39], and independently Szabó and Vu [51] who proved a random analog of Turán’s theorem for large enough p. This problem was resolved by Conlon and Gowers [19], and independently by Schacht [50], who determined the correct range of edge probabilities where the Turán-type theorem holds. The powerful method of hypergraph containers, developed by Balogh, Morris and Samotij [6] and by Saxton and Thomason [49], provides an alternative proof. Roughly speaking, these results establish that for most values of p, the random graph G(n, p) behaves much like the complete graph as a host graph, in the sense that Ks -free subgraphs of maximum size are essentially s − 1-partite. Now let us turn our attention to the saturation problem. When s = 3, the minimum saturated graph in Kn is the star. Of course we cannot exactly adapt this structure to the random graph, because the degrees in G(n, p) are all close to np with high probability, but we can do something very similar. Pick a vertex v1 and include all its incident edges in H. This way, adding any edge of G(n, p) induced by the neighborhood of v1 creates a triangle, so we have immediately taken care of a p2 fraction of the edges (which is the best we can hope to achieve with one vertex). Then the edges adjacent to some other vertex is expected to take care of a p2 fraction of the remaining edges, and so on: we expect every new vertex to reduce the number of remaining edges by a factor of (1 − p2 ). Repeating this about log1/(1−p2 ) n2 times, we obtain a K3 -saturated bi partite subgraph containing approximately pn log1/(1−p2 ) n2 edges. It feels 22 3.1. INTRODUCTION natural to think that this construction is more or less optimal. Surprisingly, this intuition turns out to be incorrect. Indeed, we will present an asymptotically tight result which is significantly better. Moreover, rather unexpectedly, the asymptotics of the saturation numbers do not depend on s, the size of the clique. Theorem 3.1.1. Let 0 < p < 1 be some constant probability and s ≥ 3 be an integer. Then sat(G(n, p), Ks ) = (1 + o(1))n log 1 1−p n with high probability. Our next result is about the closely related notion of weak saturation (also known as graph bootstrap percolation), introduced by Bollobás [11] in 1968. Generalizing our definition from the previous chapter, we say that a graph H ⊆ G is weakly F -saturated in G if H does not contain any copies of F , but the missing edges of H in G can be added back one by one in some order, such that every edge creates a new copy of F . The smallest number of edges in a weakly F -saturated graph in G is denoted by w-sat(G, F ). Clearly w-sat(G, F ) ≤ sat(G, F ), but Bollobás conjectured that when both G and F are complete, then in fact equality holds. This somewhat surprising fact was proved by Lovász [42], Frankl [29] and Kalai [35, 36] using linear algebra: Theorem 3.1.2 ([35, 36]). s−1 w-sat(Kn , Ks ) = sat(Kn , Ks ) = (s − 2)n − 2 Again, many variants of this problem with different host graphs have been studied [1, 43, 44], and it turns out to be quite interesting for random graphs, as well. In this case we are able to determine the weak saturation number exactly. It is worth pointing out that this number is linear in n, as opposed to the saturation number, which is of the order n log n. Theorem 3.1.3. Let 0 < p < 1 be some constant probability and s ≥ 3 be an integer. Then s−1 w-sat(G(n, p), Ks ) = (s − 2)n − 2 with high probability. 23 3.2. STRONG SATURATION We will prove Theorem 3.1.1 in Section 3.2 and Theorem 3.1.3 in Section 3.3. We conclude the chapter with a discussion of open problems in Section 3.4. Notation. All our results are about n tending to infinity, so we often tacitly assume that n is large enough. We say that some property holds with high probability, or whp, if the probability tends to 1 as n tends to infinity. In this chapter log stands for the natural logarithm unless specified otherwise in the subscript. For clarity of presentation, we omit floor and ceiling signs whenever they are not essential. We use the standard notations of G[S] for the subgraph of G induced by the vertex set S, and G[S, T ] for the (bipartite) subgraph of G[S ∪ T ] containing the S-T edges of G. For sets A, B and element x, we will sometimes write A + x for A ∪ {x} and A − B for A \ B. 3.2 Strong saturation In this section we prove Theorem 3.1.1 about Ks -saturation in random graphs. Let us say that a graph H completes a vertex pair {u, v} if adding the edge uv to H creates a new copy of Ks . Using this terminology, a Ks -free subgraph H ⊆ G is Ks -saturated in G, if and only if H completes all edges of G missing from H. We will make use of the following bounds on the tail of the binomial distribution. Claim 3.2.1. Let 0 < p < 1 be a constant and X ∼ Bin(n, p) be a binomial random variable. Then a2 1) P[X ≥ np + a] ≤ e− 2(np+a/3) , a2 2) P[X ≤ np − a] ≤ e− 2np and h i n 3) P X ≤ logn2 n ≤ (1 − p)n− log n . Proof. The first two statements are standard Chernoff-type bounds (see e.g. [33]). The third can be proved using a straightforward union-bound argument as follows. 24 3.2. STRONG SATURATION Let us think about X as the cardinality of a random subset A ⊆ [n], where every element in [n] is included in A with probability p, independently of the others. X ≤ logn2 n means that A is a subset of some set I ⊆ [n] of size logn2 n . For a fixed I, the probability that X ⊆ I is (1 − p)n−|I| . We can choose an I of size logn2 n in n2 3n log log n log n n n en 2 log2 n ≤ e log2 n ≤ ≤ (e log n) n/ log2 n n/ log2 n different ways, so 3n log log n n n n− n P X≤ ≤ e log2 n · (1 − p) log2 n ≤ (1 − p)n− log n . 2 log n 3.2.1 Lower bound First, we prove that any Ks -saturated graph in G(n, p) contains at least (1 + o(1))n log1/(1−p) n edges. In fact, our proof does not use the property that Ks -saturated graphs are Ks -free, only that adding any missing edge from the host graph creates a new copy of Ks . Now if such an edge creates a new Ks then it will of course create a new K3 , as well, so it is enough to show that our lower bound holds for triangle-saturation. Theorem 3.2.2. Let 0 < p < 1 be a constant. Then with high probability, G = G(n, p) satisfies the following. If H is a subgraph of G such that for any edge e ∈ G missing from H, adding e to H creates a new triangle, then H contains at least n log1/(1−p) n − 6n log1/(1−p) log1/(1−p) n edges. 1 . Let A be the set of Proof. Let H be such a subgraph and set α = 1−p 2 vertices that are incident to at least logα n edges in H, and let B = [n] − A be the rest. If |A| ≥ log2n n then H contains at least 12 |A| log2α n ≥ n logα n α edges and we are done. So we may assume |A| ≤ log2n n and hence |B| ≥ α n(1 − log2 n ). Our aim is to show that whp every vertex in B is adjacent to α at least logα n − 5 logα logα n vertices of A in H. This would imply that H contains at least |B|(logα n−5 logα logα n) ≥ n(logα n−6 logα logα n) edges, as needed. 25 3.2. STRONG SATURATION So pick a vertex v ∈ B and let N be its neighborhood in A in the graph H. Let uv be an edge of the random graph G missing from H. Since H is K3 -saturated, we know that u and v must have a common neighbor w in H. Notice that for all but at most log4α n choices of u, this w must lie in A. Indeed, v is in B, so there are at most log2α n choices for w to be a neighbor of v, and if this w is also in B, then there are only log2α n options for u, as well. So the neighbors of N in H ⊆ G must contain all but log4α n of the vertices (outside N ) that are adjacent to v in G. The following claim shows that this is only possible if |N | ≥ logα n − 5 logα logα n, thus finishing our proof. Claim 3.2.3. Let 0 < p < 1 be a constant and G = G(n, p). Then whp for any vertex x and set Q of size at most logα n − 5 logα logα n, there are at least 2 log4α n vertices in G adjacent to x but not adjacent to any of the vertices in Q. Proof. Fix x and Q. Then the probability that some other vertex y is 5 adjacent to x but not to any of Q is p0 = p(1 − p)|Q| ≥ p lognα n . These events are independent for the different y’s, so the number of vertices satisfying this property is distributed as Bin(n−1−|Q|, p0 ). Its expectation, p0 (n−1−|Q|) is at least p2 log5α n, so the probability that there are fewer than 2 log4α n such 5 vertices is, by P Claim 3.2.1, at most e−Ω(log n) . But there are only n ways to logα n n choose x and i=1 ≤ n · nlogα n ways to choose Q, so the probability i that for some x and Q the claim fails is n2 · nlogα n · e−Ω(log 3.2.2 5 n) ≤ exp(O(log2 n) − Ω(log5 n)) = o(1). Upper bound Next, we construct a saturated subgraph of the random graph that contains (1 + o(1))n log1/(1−p) n edges. The following observation says that it is enough to find a graph that is saturated at almost all the edges. Observation. It is enough to find a Ks -free graph G0 that completes all but at most o(n log n) missing edges. A maximal Ks -free supergraph G ⊇ G0 will then be Ks -saturated with an asymptotically equal number of edges. 26 3.2. STRONG SATURATION Before we give a detailed proof of the upper bound, let us sketch the main ideas for the case s = 3. For simplicity, we will also assume p = 12 . As we mentioned in the introduction, if we fix a set A1 of log4/3 n2 ≈ 2 log4/3 n vertices with B1 = [n] − A1 , then G[A1 , B1 ] is a K3 -saturated subgraph with about n log4/3 n edges. But we can do better than that (see Figure 3.1): So instead, we fix A1 to be a set of 2 log2 n vertices and add all edges in G[A1 , B1 ] to our construction. This way we complete most of the edges in B1 using about n log2 n edges. Of course, we still have plenty of edges in G[B1 ] left incomplete, however, as we shall see, almost all of them are induced by a small set B2 ⊆ B1 of size o(n). But then we can complete all the edges in B2 using only o(n log n) extra edges: just take an additional set A2 of log4/3 n2 vertices, and add all edges in G[A2 , B2 ] to our construction. This way, however, we still need to take care of the Θ(n log n) incomplete edges between A2 and B3 = B1 −B2 . As a side remark, let us point out that dropping the K3 -freeness condition from K3 -saturation would make our life easier here. Indeed, then we could have just chosen A2 to be a subset of B2 . The trick is to take yet another set A3 of o(log n) vertices, and add all the o(n log n) edges of G between A3 and A2 ∪ B3 to our construction. Now the o(log n) vertices in A3 are not enough to complete all the edges between A2 and B3 , but if |A3 | = ω(1), then it will complete most of them. This gives us a triangle-free construction on (1 + o(1))2 log2 n edges completing all but o(n log n) edges, and by the observation above this is enough. B1 B2 B3 A3 A1 A2 Figure 3.1: strong K3 -saturation 27 3.2. STRONG SATURATION Let us now collect the ingredients of the proof. The next lemma says that the graph comprised of the edges incident to a fixed set of a vertices will complete all but an approximately (1 − p2 )a fraction of the edges in any prescribed set E. Lemma 3.2.4. Let s ≥ 3 be a fixed integer, and suppose a = a(n) grows to infinity as n → ∞. Let A ⊆ [n] be a set of size a and E be a collection of vertex pairs from B = [n] − A. Now consider the random graph GA defined on [n] as follows: 1) GA [B] is empty, 2) GA [A] is a fixed graph such that any induced subgraph on contains a copy of Ks−2 , a log2 a vertices 3) the edges between A and B are in GA independently with probability p. Then the expected number of pairs in E that GA does not complete is at a most (1 − p2 )a− log a · |E|. Proof. Suppose a pair {u, v} ∈ E of vertices is incomplete, i.e., adding the edge uv to GA does not create a Ks . Because of the second condition, the probability of this event can be bounded from above by the probability that u and v have fewer than loga2 a neighbors in A. As the size of the common neighborhood of u and v is distributed as Bin(a, p2 ), Lemma 3.2.1 implies a that this probability is at most (1 − p2 )a− log a . This bound holds for any pair in E, so the expected number of bad pairs is indeed no greater than a (1 − p2 )a− log a · |E|. A Ks -saturated graph cannot contain any cliques of size s. So if we want to construct such a graph using Lemma 3.2.4, we need to make sure that GA itself is Ks -free. The easiest way to do this is by requiring that GA [A] do not contain any Ks−1 . In our application, GA will be a subgraph of G(n, p), so in particular, we need GA [A] to be a subgraph of an Erdős-Rényi random graph that is Ks−1 -free but induces no large Ks−2 -free subgraph. Krivelevich [40] showed that whp we can find such a subgraph. 28 3.2. STRONG SATURATION Theorem 3.2.5 (Krivelevich). Let s ≥ 3 be an integer, p ≥ cn−2/s (for some small constant c depending on s) and G = G(n, p). Then whp G contains a Ks−1 -free subgraph H that has no Ks−2 -free induced subgraph on s−2 n s polylog n ≤ logn3 n vertices. We will also use the fact that random graphs typically have relatively small chromatic numbers (see e.g. [33]): Claim 3.2.6. Let 0 < p < 1 be a constant and G = G(n, p). Then χ(G) = (1 + o(1)) 2 log n n whp. 1/(1−p) We are now ready to prove our main theorem. Theorem 3.2.7. Let 0 < p < 1 be a constant, s ≥ 3 be a fixed integer and G = G(n, p). Then whp G contains a Ks -saturated subgraph on (1 + o(1))n log1/(1−p) n edges. 1 1 3 1 , β = 1−p Proof. Define α = 1−p 2 , and set a1 = p (1 + log log n ) logα n, α a2 2 ) logβ n2 and a3 = √log a2 = (1 + log log = o(log n). Let us also define n a2 β A1 , A2 , A3 to be some disjoint subsets of [n] such that |Ai | = ai , and let B1 = [n] − (A1 ∪ A2 ∪ A3 ). Expose the edges of G[A1 ]. By Theorem 3.2.5 it contains a Ks−1 -free subgraph G1 with no Ks−2 -free subset of size loga31a1 whp. Let H1 be the graph on vertex set A1 ∪ B1 such that H1 [A1 ] = G1 , H1 [B1 ] is empty, and H1 is identical to G between A1 and B1 . Now define B2 to be the set of 2 ) logα n vertices vertices in B1 that are adjacent to fewer than (1 + log log αn of A1 in the graph H1 . We claim that H1 completes all but O(n log log n) of the vertex pairs in B1 not induced by B2 . To see this, let F be the set of incomplete pairs in B1 not induced by B2 . Now fix a vertex v ∈ B1 − B2 and denote its neighborhood in A1 by 2 Nv . By definition, |Nv | ≥ (1 + log log ) logα n. Let dF (v) be the degree of αn v in F , i.e., the number of pairs {u, v} ⊆ B2 that H1 does not complete. Conditioned on Nv , the size of the common neighborhood of v and some u ∈ B1 is distributed as Bin(|Nv |, p), so by Claim 3.2.1 the probability that v| is at most (1 − p)|Nv |−|Nv |/ log |Nv | ≤ this is smaller than loga31a1 ≤ log|N 2 |Nv | (1 − p)logα n = n1 . On the other hand, if the common neighborhood contains at least loga31a1 vertices in A1 , then it also induces a Ks−2 , thus the pair is P complete. Therefore E[dF (v)] ≤ 1 and E[ v∈B2 −B1 dF (v)] ≤ n. 29 3.2. STRONG SATURATION P Here the quantity v∈B2 −B1 dF (v) bounds |F |, the number of incomplete edges in B1 that are not induced by B2 , so by Markov’s inequality this number is indeed O(n log log n) whp. By the Observation above, we can temporarily ignore these edges, and concentrate instead on completing those induced by B2 . To complete them, we will use the (yet unexposed) edges touching A2 . The good thing about B2 is that it is quite small: For a fixed v ∈ B1 , the size of its neighborhood in A1 is distributed as Bin(a1 , p), so the probability 2 αn ) logα n = pa1 − logloglog is, by Claim 3.2.1, that it is smaller than (1+ log log n n 2 α α at most e− logα n/4(log logα n) log1 n . So by Markov, |B2 |, the number of such vertices, is smaller than logn n whp. Now expose the edges of G[A2 ]. Once again, whp we can apply Theorem 3.2.5 to find a Ks−1 -free subgraph G2 that has no large Ks−2 -free induced subgraph. Let H2 be the graph on A2 ∪ B2 such that H2 [A2 ] = G2 , H2 [B2 ] is empty, and H2 = G on the edges between A2 and B2 . Let us apply Lemma 3.2.4 to GA2 = H2 with E containing all vertex pairs in B2 . The lemma says that the expected number of incomplete pairs in E is at most a |B2 | n/ log n 2 logβ (n 2 a2 − log2a2 ) 2 · · ≤ (1 − p ) = o(1), (1 − p ) 2 2 so by Markov, H1 completes all of E whp, in particular it completes all the edges in G[B2 ]. Note that |B2 | ≤ logn n also implies that H2 only contains O(n) edges, so adding them to H1 does not affect the asymptotic amount of edges in our construction. However, the edges connecting A2 to B3 = B1 − B2 are still incomplete, and there are Θ(n log n) of them, too many to ignore using the Observation. The idea is to complete most of these using the edges between A3 and A2 ∪ B3 , but we need to be a little bit careful not to create copies of Ks with the edges in H2 [A2 ] = G2 . We achieve this by splitting up A2 into independent sets. a2 By Claim 3.2.6, we know that k = χ(G[A2 ]) = O log a2 , so G2 ⊆ G[A2 ] can also be k-colored whp. Let A2 = ∪ki=1 A2,i be a partitioning into color classes, so here G2 [A2,i ] is an empty graph for every i. Let us also split A3 into 2k parts of size a4 = a3 /2k and expose the edges in G[A3 ]. Each of the 2k parts contains a Ks−1 -free subgraph with no Ks−2 -free subset 30 3.3. WEAK SATURATION of size loga34a4 with the same probability p0 , and by Theorem 3.2.5, p0 = √ 1 − o(1) (note that a4 = Ω( log a2 ) grows to infinity). This means that the expected number of parts not having such a subgraph is 2k(1 − p0 ) = o(k), so by Markov’s inequality, whp k of the parts, A3,1 , . . . A3,k , do contain such subgraphs G3,1 , . . . , G3,k . Now define H3,i to be the graph with vertex set A3,i ∪ A2,i ∪ B3 such that H3,i [A3,i ] = G3,i , H3,i [A2,i ∪ B3 ] is empty, and the edges between A3,i and A2,i ∪ B3 are the same as in G. Then we can apply Lemma 3.2.4 to show that H3,i is expected to complete all but a (1 − p2 )a4 −a4 / log a4 fraction of the A2,i -B3 pairs. This means that the expected number of edges between A2 and B3 that H3 = ∪ki=1 H3,i does not complete is bounded by √ (1 − p2 )Ω( log a2 ) √ · |A2 ||B3 | ≤ (1 − p2 )Ω( log log n) · O(n log n). So if F 0 is the set of incomplete A2 -B3 edges, then another application of Markov gives |F 0 | = o(n log n) whp. It is easy to check that H = H1 ∪ H2 ∪ H3 is Ks -free, since it can be obtained by repeatedly “gluing” together Ks -free graphs along independent sets, and such a process can never create an s-clique. To estimate the size of H, note that Claim 3.2.1 implies that whp all the vertices in A1 have (1 + o(1))p|B1 | neighbors in B1 , so H1 contains (1 + o(1))n logα n edges. We have noted above that H2contains O(n) edges and H3 clearly contains log n at most |A2 ∪ B3 ||A3 | = O n √log log n edges. So in total, H contains (1 + o(1))n logα n edges. Finally, the only edges H is not saturated at are either in F , in F 0 , touching A3 , or induced by A1 ∪ A2 . There are o(n log n) edges of each of these four kinds, so any maximal Ks -free supergraph H 0 of H has (1 + o(1))n logα n edges. This H 0 is Ks -saturated, hence the proof is complete. 3.3 Weak saturation In this section we prove Theorem 3.1.3 about the weak saturation number of random graphs of constant density. In fact, we prove the statement for a slightly more general class of graphs, satisfying certain pseudorandomness conditions. We will need some definitions to formulate this result. 31 3.3. WEAK SATURATION Given a graph G and a vertex set X, a clique extension of X is a vertex set Y disjoint from X such that G[Y ] is a clique and G[X, Y ] is a complete bipartite graph. We define the size of this extension to be |Y |. The following definition of goodness captures the important properties needed in our proof. Definition 3.3.1. A graph G is t-good with respect to γ if G satisfies the following properties: P1. For any vertex set X of size x and any integer y such that x, y ≤ t, G contains at least γn disjoint clique extensions of X of size y. P2. For any two disjoint sets S and T of size at least γn/2, there is a vertex v ∈ T and X ⊆ S of size t − 1 such that X ∪ v induces a clique in G. It is not hard to see that the Erdős-Rényi random graphs satisfy the properties: Claim 3.3.2. Let 0 < p < 1 be a constant and let t be a fixed integer. Then there is a constant γ = γ(p, t) > 0 such that whp G = G(n, p) is t-good with respect to γ. Proof. To prove property P1, fix x, y ≤ t and a set X of size x, and split V −X k groups of y elements (with some leftover): V1 , . . . , Vm where j into n−x m = y . The probability that for some i ∈ [m], all the pairs induced y+x x by V or connecting X and V are edges in G is p̃ = p( 2 )−(2) , which is a i i constant. Let BX,y be the event that fewer than mp̃/2 of the Vi satisfy this. By Claim 3.2.1, P(BX,y ) ≤ e−mp̃/8 . 2t Now set γ = p( 2 ) /4t, then γn ≤ mp̃/2, so if BX,y does not hold then there are at least γn different Vi ’s that we can choose are Yi wet log the nsets Pt tonbe looking for. On the other hand, there are only x=0 x ≤ t t ≤ t · e n choices for X and t choices for y, so P(∪X,y BX,y ) ≤ t2 · et log n · e−γn/4 = o(1), hence P1 is satisfied whp. 32 3.3. WEAK SATURATION To prove property P2, notice that Theorem 3.2.5 implies that whp any induced subgraph of G on at least logn3 n vertices contains a clique of size t.1 Let us assume this is the case and fix S and T . Then if a vertex v ∈ T has at least logn3 n neighbors in S, then the neighborhood contains a t − 1-clique in G that we can choose to be X. So if property P2 fails, then no such v can have logn3 n neighbors in S. The neighborhood of v in S is distributed as Bin(|S|, p), so the probability that v has fewer than logn3 n ≤ log|S| neighbors in S is at most 2 |S| (1 − p)|S|−|S|/ log |S| ≤ e−p|S|/2 ≤ e−pγn/4 by Claim 3.2.1. These events are independent for the different vertices in S, so the probability that P2 fails 2 for this particular choice of S and T is e−Ω(n ) . But we can only fix S and T in 22n different ways, so whp P2 holds for G. Now we are ready to prove the following result, which immediately implies Theorem 3.1.3. Theorem 3.3.3. Let G be 2s-good with respect to some constant γ. Then s−1 w-sat(G, Ks ) = (s − 2)n − . 2 Proof. To prove the lower bound on w-sat(G, Ks ), it is enough to show that G is weakly saturated in Kn . Indeed, if H is weakly saturated in G and G is weakly saturated in Kn , then H is weakly saturated in Kn : We can just add edges one by one to H, obtaining G, and then keep adding edges until we reach Kn in such a way that every added edge creates a new copy of Ks. But then Theorem 3.1.2 implies that H contains at least (s − 2)n − s−1 2 edges, which is what we want. Actually, G is not only weakly, but strongly saturated in Kn : property P1 implies that for any vertex pair X = {u, v}, G contains a clique extension of X of size s − 2. But then adding the edge uv to this subgraph creates the copy of Ks that we were looking for. Let us now look at the upper bound on w-sat(G, Ks ). Fix a core C of s − 2 vertices that induces a complete graph (such a C exists by property P1, as it is merely a clique extension of ∅ of size s−2). Our saturated graph H will consist of G[C], plus s − 2 edges for each vertex v ∈ V − C, giving 1 We should point out that this statement is much weaker than Theorem 3.2.5 and can be easily proved directly using a simple union bound argument. 33 3.3. WEAK SATURATION s−1 a total of s−2 + (s − 2)(n − s + 2) = (s − 2)n − edges. We build H 2 2 in steps. Let V 0 be the set of vertices v ∈ V − C adjacent to all of C. Note that |V 0 | ≥ γn by property P1. We start our construction with the graph H0 ⊆ G on vertex set V0 = C ∪ V 0 that contains all the edges touching C. Once we have defined Hi−1 on Vi−1 , we pick an arbitrary vertex vi ∈ V − Vi−1 , and choose a set Ci ⊆ Vi−1 of s − 2 vertices that induces a clique in G with vi . Again, we can find such a Ci as a clique extension of vi ∪C. In fact, by the definition of V 0 , this Ci will lie in V 0 . Then we set Vi = Vi−1 ∪ vi and define Hi to be the graph on Vi that is the union of Hi−1 and the s − 2 edges connecting vi to Ci . Repeating this, end up with some we eventually graph H = Hl on V = Vl that has n2 − n−s+2 edges. We claim it is 2 saturated. C vi Vi−1 Ci Figure 3.2: Weak K4 -saturation. Black edges are in H, gray edges are in G but not in H. We really prove a bit more: we show, by induction, that Hi is weakly Ks -saturated in G[Vi ] for every i. This is clearly true for i = 0: any edge of G[V0 ] not contained in H0 is induced by V 0 , and forms an s-clique with C. Now assume the statement holds for i − 1. We want to show that we can add all the remaining edges in Gi = G[Vi ] to Hi one by one, each time creating a Ks . By induction, we can add all the edges in Gi−1 , so we may assume they are already there. Then the only missing edges are the ones touching vi . If v ∈ Vi is a clique extension of Ci ∪ vi then adding vvi creates a Ks , so we can add this edge to the graph. Property P1 applied to C ∪Ci ∪vi to find 34 3.4. FURTHER REMARKS clique extensions of size 1 shows that there are at least γn such vertices. Let N be the new neighborhood of vi after these additions, then |N | ≥ γn. Now an edge vvi can also be added if v ∪C 0 induces a complete subgraph in G, where C 0 is any s − 2-clique in Gi [N ]. Let us repeatedly add the available edges (updating N with every addition). We claim that all the missing edges will be added eventually. Suppose not, i.e., some of the edges in Gi touching vi cannot be added using this procedure. There cannot be more than γn/2 of them, as that would contradict property P2 with S = N and T being the remaining neighbors of vi in Vi . But then take one such edge, vvi , and apply property P1 to C ∪ {v, vi }. It shows that there are γn disjoint clique extensions of size s−2, that is, γn disjoint s−2-sets in V 0 that form s-cliques with {v, vi }. But at most γn/2 of these cliques touch a missing edge other than vvi , so there is a C 0 among them that would have been a good choice for v. This contradiction establishes our claim and finishes the proof of the theorem. 3.4 Further remarks Many of the saturation results also generalize to hypergraphs. For example, n r r r r Bollobás [9] and Alon [1] proved that sat(K , K ) = w-sat(K , K ) = − n s n s r n−s+r r , where K denotes the complete r-uniform hypergraph on t vertices. t r It would therefore be very interesting to see how saturation behaves in Gr (n, p), the random r-uniform hypergraph. With some extra ideas, our proofs can be adapted to give tight results r -saturated hypergraphs, but the general Ksr -saturated case apabout Kr+1 pears to be more difficult. It is possible, however, that our methods can be pushed further to solve these questions, and we plan to return to the problem at a later occasion. Another interesting direction is to study the saturation problem for nonconstant probability ranges. For example, Theorem 3.1.1 about strong sat1 1 ≤ p ≤ 1 − o(n) in a fairly uration can be extended to the range logε(s) n straightforward manner. As for K3 -saturation, a bit of extra work yields the correct asymptotics in the range n−1/2+ε ≤ p 1, as well. Here we can show that sat(G(n, p), K3 ) = (1 + o(1)) np log np2 . Note also that for p n−1/2 there are much fewer triangles in G(n, p) than edges, so any saturated graph 35 3.4. FURTHER REMARKS must contain almost all the edges, i.e., sat(G(n, p), K3 ) = (1 + o(1)) n2 p. These results give us a fairly good understanding of K3 -saturation in random graphs. However, in general, Ks -saturation for sparse random graphs appears to be more difficult, and it seems that for p much smaller than log1 n the saturation numbers will depend on s. It is also not difficult to extend our weak saturation result, Theorem 3.1.3, to the range n−ε(s) ≤ p ≤ 1. A next step could be to obtain results for smaller values of p, in particular, it would be nice to determine the ex act probability range where the weak saturation number is (s − 2)n − s−1 . 2 Note that our lower bound holds as long as G(n, p) is weakly Ks -saturated in Kn . This problem was studied by Balogh, Bollobás and Morris [4], who showed that the probability threshold for G(n, p) to be weakly Ks -saturated (2s)−2 . It might be possible is around p ≈ n−1/λ(s) polylog n, where λ(s) = s−2 that w-sat(G(n, p), Ks ) changes its behavior at the same threshold. F -saturation in the complete graph has been studied for various different graphs F (see [27] for references). For example, Kászonyi and Tuza [37] showed that for any fixed (connected) graph F on at least 3 vertices, sat(n, F ) is linear in n. As we have seen, this is not true in G(n, p). However, analogous results in random host graphs could be of some interest. 36 Chapter 4 A random triadic process 4.1 Introduction The principle of triadic closure is an important concept in social network theory (see e.g. [20]). Roughly speaking, it says that when new friendships are formed in a social network, it is more likely to occur between two people sharing a common friend, thus “closing” a triangle, than elsewhere. We will consider a simplistic model of the evolution of a social network, where friendships can only be formed through a common friend, and triadic closure eventually occurs at any triangle with probability p, independently of other triangles. We refer to this process as the triadic process. Formally, let H = H(n, p) be a random 3-uniform hypergraph on [n] where each triple independently appears with probability p. The triadic process is the following graph process. We start with the star G0 on the same vertex set [n], containing all the edges incident to some vertex v0 , and repeatedly add any edge xy if there is a vertex z such that xz and zy are already in the graph and xzy ∈ H. We say that the process propagates if all the edges are added to the graph eventually. It is easy to see that this event does not depend on the order the edges are added in. In this chapter we prove that the threshold probability for propagation is 2√1 n . Theorem 4.1.1. Suppose p = √c , n for some constant c > 0. Then, 1) If c > 12 , then the triadic process propagates whp. √ 2) If c < 12 , then the triadic process stops at O(n n) edges whp. 37 4.1. INTRODUCTION As usual, we say that some property holds with high probability or whp if it holds with probability tending to 1 as n tends to infinity. Randomized graph processes have been intensively studied in the past decades. One notable example is the triangle-free process, originally motivated by the study of the Ramsey number R(3, n) (see e.g. [26]). In this process the edges are added one by one at random as long as they do not create a triangle in the graph. The triadic process is a slight variant of this, with a very similar nature. Indeed, our analysis makes good use of the tools developed by Bohman [7] when he applied the differential equation method to track the triangle-free process. Several other related processes were also analyzed using differential equations, e.g. [8]. For more information about this method we refer the interested reader to the excellent survey of Wormald [54]. Coja-Oghlan, Onsjö and Watanabe [17] investigated a similar kind of closure while analyzing connectivity properties of random hypergraphs. They say that a 3-hypergraph is propagation connected if its vertices can be ordered in some way v1 , . . . , vn so that each vi (i ≥ 3) forms a hyperedge with two preceding vertices. They obtain the threshold probability for the propagation connectivity of H(n, p) up to a small multiplicative constant. Using this directed notion of connectivity, our problem asks when the random 3-hypergraph on the line graph of Kn is propagation connected from the star. Our main motivation for considering the triadic process comes from the theory of random 2-dimensional simplicial complexes. A simplicial 2 V complex on the vertex set V is a set family Y ⊆ ≤3 closed under taking subsets. The dimension of a simplex σ ∈ Y is defined to be |σ| − 1. We use the terms vertices, edges and faces for 0, 1 and 2-dimensional simplices, respectively. The 1-skeleton of a 2-complex is the subcomplex containing its vertices and edges. The Linial–Meshulam model of random simplicial complexes, introduced in [41], is a generalization of the Erdős–Rényi random graph model and has been studied extensively in recent years. The random 2-complex Y2 (n, p) is defined to have the complete 1-skeleton, i.e., all vertices and edges, and each of the faces independently with probability p. The study of random complexes involves both topological invariants and combinatorial properties, including homology groups, homotopy groups, collapsibility, embeddability and spectral properties. 38 4.1. INTRODUCTION One of the oldest questions of this kind, asked by Linial and Meshulam [41], is how the fundamental group π1 (Y2 (n, p)) of the random 2-complex behaves. Babson, Hoffman and Kahle [3] showed that if p < n−α for some arbitrary α > 1/2 then the fundamental group is nontrivial whp. p On the other hand, they proved that π1 (Y2 (n, p)) is trivial for p > 4 log n/n, which means that the threshold probability for being simply connected should be close to n−1/2 . As a corollary of the first part of Theorem 4.1.1, we improve the upper bound on the threshold probability. Corollary 4.1.2. Let p = simply connected whp. √c n for some constant c > 12 . Then Y2 (n, p) is Proof. Suppose we have a 2-complex C such that one of its edges, e, is contained in a unique face f . Then we can collapse f onto the other two edges without changing the fundamental group of C. In fact, C − f − e is homotopy equivalent to C, the former complex being a deformation retract of the latter. We say that a 2-complex with complete 1-skeleton is a collapsible hypertree if we can apply a sequence of collapses to it and end up with a tree. Clearly, a collapsible hypertree has trivial fundamental group. Now observe that Theorem 4.1.1 implies that Y2 (n, p) contains a collapsible hypertree whp. Indeed, if the process propagates, then take C to be the subcomplex of the faces that correspond to the triples we used to add edges to the graph. Then by definition, the reverse of the triadic process on C is exactly a sequence of collapses resulting in a star. Basic results about the topology of complexes tell us that the addition of faces to a simply connected complex does not change the fundamental group, hence π1 (Y2 (n, p)) is trivial whp. After the publication of this result, we learned that Gundert and Wagner [32] independently improved the bound in [3], and showed that Y2 (n, p) is whp simply connected for p = √cn where c is a sufficiently large constant. 4.1.1 Proof outline Instead of exposing all the triples at once, we will be sampling them on the fly, trying to extend the edge set of the graph. Both the proofs of the upper bound and the lower bound consist of two phases. In the first phase we make one step at a time: we choose, uniformly at random, one (yet unsampled) triple spanning exactly two edges and expose it. With 39 4.2. THE DIFFERENTIAL EQUATION METHOD probability p the triple is selected, hence we can add the third edge to our edge set. The second phase proceeds in rounds: we simultaneously expose all the unsampled triples spanning two edges, and extend the edge set according to the outcome. The essence of the proof is to track the behavior of certain variables throughout the process. As we will see, this is not a very hard task to do in the second phase, using standard measure concentration inequalities. However, during the initial phase of the process, the codegrees (one of the variables we track) are not concentrated, which forces us to do a more careful analysis of the beginning of the process. For this we will use the differential equation method. We organize the rest of the chapter as follows. In Section 4.2 we give an overview of how we apply the differential equation method. A detailed analysis of the actual implementation follows in Section 4.3. We move on to the second phase of the process in Section 4.4, thereby completing the proof of Theorem 4.1.1. We finish with some remarks in Section 4.5. Notations: Throughout the chapter, we will omit floor and ceiling signs whenever they are not necessary. The sign ± will be used to represent both a two-element set of values and a whole interval, but it should be clear from the context which one is the case. 4.2 The differential equation method At any point in the process, we say that a vertex triple {u, v, w} is open if it spans exactly two edges but has not yet been sampled. We will also use the notation uvw for an open triple with edges uv and vw. By an open triple at u, we mean a triple uvw, i.e., one that has its missing edge adjacent to the vertex u. In each step, our process picks an open triple uniformly at random and samples it. If the answer is positive then we close the triple by adding the missing edge to the graph. To analyze this process we apply the differential equation method, using some ideas from [7]. For simplicity, let us denote the graph we obtain after i samples by Gi . We consider the following random variables: Dv (i) is the degree of the vertex v in Gi . Fv (i) is the number of open triples at v, so it is the number 40 4.2. THE DIFFERENTIAL EQUATION METHOD of ways for v to gain a new incident edge in Gi+1 . Xu,v (i) is the codegree of u and v, i.e., the number of common neighbors of u and v in Gi . To provide some insight, we first heuristically describe the process. Let us assume for now that the Dv (i) are concentrated around some value D(i), and similarly the Fv (i) are approximately equal to some value F (i). We further assume that the variables are very close to their expectations. In step i + 1 we choose an open triple uniformly at random, so each triple is chosen with probability P F2 v (i) ≈ nF2(i) , and then sample it. With v probability p the sample is successful, hence we can close the triple. As the number of open triples at a vertex v is about F (i), the change in the degree of a vertex v we expect to see is D(i + 1) − D(i) ≈ 2p . n Now let us see how Fv (i) is affected by a step. We gain open triples at v either if we successfully sample one of them (adding the edge vw), in which case new open triples are formed with the neighbors of w, or if we successfully sample a triple at some neighbor of v. On the other hand, we lose the sampled triple regardless of the outcome. The probability ≈ n2 , so of sampling an open triple at some specific vertex w is P2FwF(i) v v (i) assuming all the codegrees are negligible compared to D(i), the expected change is 2 F (i + 1) − F (i) ≈ (2pD(i) − 1). n To smooth out this discrete process, we introduce a continuous variable t and say that step i corresponds to time t = ti = ni2 . Let us also rescale D and F by considering the smooth √ functions d and f in t, where we want d(t) to be approximately√D(i)/ n and f (t) to be approximately F (i)/n. Note that, since p = c/ n, our assumptions so far suggest the following behavior: d(t + 1/n2 ) − d(t) d (t) ≈ ≈ n3/2 (D(i + 1) − D(i)) ≈ 2c 2 1/n 0 and f 0 (t) ≈ f (t + 1/n2 ) − f (t) ≈ n(F (i + 1) − F (i)) ≈ 4cd(t) − 2. 1/n2 41 4.2. THE DIFFERENTIAL EQUATION METHOD Let us emphasize that this little musing that we are presenting here is not a proof at all — a detailed analysis and the proof of concentration will follow in Section 4.3. However, it at least indicates why it is plausible to believe that the actual values of Dv (i) and Fv (i) follow the trajectories of d and f given by the system of differential equations d0 (t) = 2c and f 0 (t) = 4cd(t) − 2. In the previous paragraphs we made the assumption that the codegrees are negligible compared to the degrees, but since they are not concentrated, proving this still needs some thought. To this end, we introduce two more random variables. Yu,v (i) denotes the number of open 3-walks uww0 v from u to v, i.e., 3-walks where we require that uww0 be open (but allowing w = v), and Zu,v (i) is the number of open 4-walks uww0 w00 v (again, allowing vertex repetitions), where both uww0 and w0 w00 v are open. Note that Yu,v is not symmetric in u and v. The point is that Yu,v and Zu,v are concentrated (as we will see in Section 4.3), and — amazingly enough — their one-step behavior can be described with fairly simple formulas. So let us continue with our thought experiment and assume that all Yu,v (i) ≈ Y (i), all Zu,v (i) ≈ Z(i), and all variables are close to their expectations. First of all, the increase in the codegrees comes from a successful sample in a 3-walk, so we expect Xu,v (i + 1) − Xu,v (i) ≈ 4cy(t) 2p(Yu,v (i) + Yv,u (i)) ≈ 2 . nF (i) n f (t) This will be enough to prove a uniform O(log n) upper bound over all the codegrees, so we can keep ignoring the effect of X in the next few paragraphs. Let us look at the change in Yu,v (i). There are three different ways a new open 3-walk uww0 v can appear after step i+1, depending on which one of uw, ww0 and w0 v is the new edge. When uw is the new edge, there is a 4-walk utww0 v in Gi where utw is open. We can count such configurations by first choosing w0 as a neighbor of v and then choosing an open 3-walk utww0 . Note that for any such choice, uww0 will be an open triple in Gi+1 , except if w0 is the same as u, or a common neighbor of u and w. The latter cases are negligible, so there are about Y (i)D(i) possibilities in this case. Similarly, when ww0 is the new edge, new 3-walks come from 4-walks uwtw0 v, and we can count the number of options by first choosing w as a 42 4.2. THE DIFFERENTIAL EQUATION METHOD neighbor of u and then an open 3-walk from w to v. Again, the triple uww0 will be open in Gi+1 if w0 is neither u, nor a common neighbor of u and v, so we find Y (i)D(i) possibilities of this type. Finally, w0 v can only be the new edge if w0 tv was successfully sampled in some open 4-walk uww0 tv, so there are about Z(i) such options. On the other hand, we lose an open 3-walk if we sample its open triple, whether or not the sample is successful. As any particular triple is chosen with probability about nF2(i) , this means that we expect to see 2 Y (i + 1) − Y (i) ≈ p 2Y (i)D(i) + Z(i) − Y (i) . nF (i) The change in Z(i) is a bit easier to analyze: Once again, we obtain a new 4-walk uww0 w00 v if one of its edges is added in step i+1. We will assume it is the first edge, uw, but by symmetry our counting argument works for all other edges, as well. Then the 4-walk comes from a 5-walk utww0 w00 v in Gi . We can count the number of options by first taking an open triple vw00 w0 at v and then choosing an open 3-walk from u to w0 . Again, the created 4-walk will automatically be open unless w0 is u or a neighbor of u, so there are about Y (i)F (i) candidates of this type and 4Y (i)F (i) in total. And then of course, we lose an open 4-walk if we sample one of its two open triples, regardless of the outcome. This suggests 2 4pY (i)F (i) − 2Z(i) . Z(i + 1) − Z(i) ≈ nF (i) Once again, we are looking √ for smooth functions y and z such that y(t) is approximately Y (i)/ n and z(t) is about Z(i)/n. Then the same computation as before gives the differential equations y 0 (t) = 2 (2cd(t) − 1)y(t) + cz(t) f (t) and z 0 (t) = 4 2cy(t)f (t) − z(t) . f (t) We have yet to talk about the initial conditions of the above system of differential equations. Our process starts with a star centered at some vertex v0 , i.e., an n-vertex graph with n − 1 edges, all of them touching v0 . 43 4.3. CALCULATIONS Then Dv (0) = 1, Fv (0) = n − 2, Yu,v (0) = 0 and Zu,v (0) = n − 3 for any two vertices u and v other than v0 . For convenience, we will drop the center of the star from consideration in the sense that we do not define the variables with v0 among the indices. This is a technicality that allows us to prove concentration, and since our recurrence relations never use those variables, it causes no problem. Hence we obtain the initial conditions d(0) = 0, f (0) = 1, y(0) = 0 and z(0) = 1, and an easy calculation shows that the corresponding solution of our system of differential equations is f (t) = 1 − 2t + 4c2 t2 z(t) = f 2 (t). d(t) = 2ct y(t) = d(t)f (t) In the next section we prove that the variables indeed closely follow the paths defined by these functions. 4.3 Calculations In this section we show that our variables follow the prescribed trajectories up to some time T . Of course, we cannot hope to do so if f (t) vanishes somewhere on [0, T ], as that would mean that the process is expected to die before time T . Now if c > 1/2 √ then f has no positive root, so this is not an issue: we can take T √ = log n. However, if c ≤ 1/2 then f does 1− 1−4c2 reach 0, first at time T0 = . In this case T will be chosen to be a 4c2 constant arbitrarily close to T0 . The allowed deviation of each variable will be defined by one of the error functions g1 (t) = eKt n−1/6 where g2 (t) = (1 + d(t))eKt n−1/6 , and 1 d(t) + . K = 100 · max 1 + 0≤t≤T f (t) f (t) It is clearly enough to prove the first part of Theorem 4.1.1 for c ≤ 1, so from now on we will assume this is the case. Let us define Gi to be the event that all of the bounds below in Proposition 4.3.1(a)-(e) hold for every pair of vertices u and v and for all indices j = 0, . . . , i. This section is devoted to the proof of the following result, which is the key to proving that the variables follow the desired trajectories. 44 4.3. CALCULATIONS Proposition 4.3.1. Fix some vertices u and v. Then, conditioned on Gj−1 , each of the following bounds fails with probability at most n−10 . √ (a) Dv (j) ∈ d(tj ) ± g1 (tj ) n (b) Fv (j) ∈ f (tj ) ± g1 (tj ) n (c) Xu,v (j) ≤ 50 log n √ (d) Yu,v (j) ∈ y(tj ) ± g2 (tj ) n (e) Zu,v (j) ∈ z(tj ) ± g2 (tj ) n As a corollary, we obtain our main result. √ Theorem 4.3.2. Suppose c ≤ 1, and T ≤ log n and K are defined as above. Then the bounds in Proposition 4.3.1(a)-(e) hold with high probability for all vertices u and v and for every j = 0, . . . , T · n2 . Proof. It is easy to check that G0 always holds. If Bj is the event that, conditioned on Gj−1 , at least one of these bounds fails for j, then the failure T n2 probability is exactly P[∪j=1 Bj ]. A trivial union bound over all pairs of vertices and all equations in Proposition 4.3.1 shows that P[Bj ] ≤ 5n−8 , n2 hence another union bound over the indices gives P[∪Tj=1 Bj ] ≤ n−5 = o(1). To prove Proposition 4.3.1, we follow the strategy in [7] and analyze each random variable separately. Our plan is to use some martingale concentration inequalities to bound the probability of large deviation. However, since we cannot track the exact values of the expectations, only estimate them by some intervals, we will use two separate sequences to bound each variable: A submartingale to bound from below, and a supermartingale to bound from above. Recall that a stochastic process X0 , X1 , . . . is called a submartingale if E[Xi+1 |X1 , . . . , Xi ] ≥ Xi for all i, and a supermartingale if E[Xi+1 |X1 , . . . , Xi ] ≤ Xi for all i. We say that a sequence X0 , X1 , . . . of variables is (η, N )-bounded if Xi − η ≤ Xi+1 ≤ Xi + N for all i. We call a sequence of pairs X0± , X1± , . . . an (η, N )-bounded martingale pair, if X0+ , X1+ , . . . is an (η, N )-bounded submartingale and X0− , X1− , . . . is an (η, N )-bounded supermartingale. The following concentration results of Bohman [7] are essential for proving that the variables follow the desired trajectories: 45 4.3. CALCULATIONS Lemma 4.3.3 (Bohman). Suppose η ≤ N/10 and a < ηm. X0± , X1± , . . . is an (η, N )-bounded martingale pair then a2 + P[Xm ≤ −a] ≤ e− 3ηmN and If 0 ≡ a2 − P[Xm ≥ a] ≤ e− 3ηmN . The general idea for analyzing a random variable R(i), representing any of the above five variables, is the following. In step i, an open triple is sampled, and thus with probability p a new edge is added to our graph. We split the one-step change in R(i) into two non-negative P variables: Ai is the gain and Ci is the loss in step i, so R(j) = R(0) + ji=1 Ai − Ci . The gain comes from the contribution of the added edge after a successful sample. Loss can only occur when some open triple stops being open, either because it was sampled or because its missing edge was added through some other open triple (although the effect of the latter event is negligible compared to the former if the codegrees are small). Next we estimate the expectation of Ai (using the recurrence relations − we hinted at in Section 4.2), so that we can define A+ i and Ai , shifted copies of Ai with P non-negative and non-positive expectations, respectively. ± This way Bj = ji=1 A± i is an (η, N )-bounded martingale pair, where η is approximately the expectation and N is some trivial upper bound on Ai . We do the same with the Ci to define Ci± and the martingale pair P j Dj± = i=1 Ci± . Finally = Pj we establish a connection between the concentration of R(j) ± R(0) + i=1 Ai − Ci and the concentration of our shifted variables Bj and Dj± in Lemma 4.3.7, and then use the concentration of martingale pairs, Lemma 4.3.3, to bound the error probabilities in Corollary 4.3.8. The rest of this section is devoted to the actual calculations. The reader might want to skip the details at a first reading. The first subsection establishes the tools we use to prove concentration, while the remaining five subsections prove one by one the five parts of Proposition 4.3.1. 4.3.1 Tools The following claim will help us clean up the calculations of the expecta d(t) 1 tions. Recall that K = 100 · max0≤t≤T 1 + f (t) + f (t) . 46 4.3. CALCULATIONS Claim 4.3.4. Let 0 ≤ t ≤ T so that f (t) > 0 is bounded away from 0 (t might depend on n). If r(t) is one of the functions 1, d(t) or f (t) then the following inequalities hold: √ n) (r(t) ± g1 (t))(f (t) ± g1 (t)) 1 + O( log n K ⊆ r(t) ± g1 (t), f (t) ± g1 (t) 20 2 √ n) (r(t) ± g1 (t))(y(t) ± g2 (t)) 1 + O( log n K ⊆ r(t)d(t) ± g2 (t), f (t) ± g1 (t) 20 log n (z(t) ± g2 (t)) 1 + O( √n ) K ⊆ f (t) ± g2 (t). f (t) ± g1 (t) 20 Proof. Straightforward calculus shows that 2 1 1 g1 (t) g1 (t) ⊆ ± 2 +O . f (t) ± g1 (t) f (t) f (t) f 3 (t) Using this, we will multiply out the formulas on the left-hand side of the inequalities. Note that g1 (t) and g2 (t) are both O(n−1/7 ), so in the expanded formulas, any term containing two factors of the type gα (t) or a factor of √ n ) is consumed by an O(n−2/7 ) error term. Hence the left-hand O( polylog n side of the first inequality is contained in log n 1 g1 (t) −2/7 (r(t) ± g1 (t))(f (t) ± g1 (t)) 1 + O( √ ) ± + O(n ) f (t) f 2 (t) n 2r(t) K ⊆ r(t) ± + 1 g1 (t) + O(n−2/7 ) ⊆ r(t) ± g1 (t). f (t) 20 Similarly, the left-hand side of the second inequality is contained in log2 n 1 g1 (t) −2/7 (r(t) ± g1 (t))(y(t) ± g2 (t)) 1 + O( √ ) ± + O(n ) f (t) f 2 (t) n r(t) y(t) r(t)y(t) ⊆ r(t)d(t) ± + + g2 (t) + O(n−2/7 ) f (t) f (t)(1 + d(t)) f 2 (t)(1 + d(t)) K ⊆ r(t)d(t) ± g2 (t) 20 47 4.3. CALCULATIONS using y(t) = f (t)d(t) and g2 (t) = (1 + d(t))g1 (t). Finally, the left-hand side of the last inequality is contained in g1 (t) log n 1 −2/7 ± + O(n ) (z(t) ± g2 (t)) 1 + O( √ ) f (t) f 2 (t) n 1 z(t) ⊆ f (t) ± + g2 (t) + O(n−2/7 ) f (t) f 2 (t)(1 + d(t)) K ⊆ f (t) ± g1 (t) 20 using z(t) = f 2 (t). The remaining lemmas connect the concentration of the original variables and those shifted by the expectations. We will use the following observations in the calculations. Claim 4.3.5. Let s(t) be a differentiable function on [0, T ] such that supt∈[0,T ] |s0 (t)| = O(polylog n) and ti = ni2 . Then Z tj j−1 1 X s(τ )dτ + O(n−1 ). s(ti ) = n2 i=0 0 Proof. It is a well-known fact in numerical analysis that for reals a ≤ q ≤ b Z b ≤ (b − a)2 sup |s0 (t)|. s(τ )dτ − (b − a)s(q) t∈[a,b] a Taking a = q = ti with b = ti+1 and using ti+1 − ti = n12 , this gives Z ti+1 ti+1 − ti s(t ) i s(τ )dτ − 2 ≤ sup |s0 (t)|, n n2 t∈[ti ,ti+1 ] ti and summing these up for i = 0, . . . , j − 1, we get Z j−1 0 tj tj · sup X 1 t∈[0,tj ] |s (t)| s(t ) ≤ s(τ )dτ − i 0 n2 i=0 n2 polylog n =O ≤ O(n−1 ). 2 n 48 4.3. CALCULATIONS This claim will be applied when s is one of the functions d0 , f 0 , x0 , y 0 , z 0 , g10 and√g20 , in which case s0 is indeed bounded by O(polylog n) in the interval [0, log n]. Claim 4.3.6. For α ∈ {1, 2} we have Z t 1 gα (τ )dτ ≤ (gα (t) − n−1/6 ). K 0 Proof. Note that gα (t) = ϕ(t)eKt n−1/6 , where ϕ(t) is either constant 1 or 1 + d(t). In both cases, ϕ(0) = 1 and ϕ0 (t) ≥ 0 for t ≥ 0, so 0 1 gα0 (t) Kt −1/6 = ϕ(t)e n = (ϕ0 (t)eKt /K + ϕ(t)eKt )n−1/6 ≥ gα (t). K K Hence Z t Z gα (τ )dτ ≤ 0 0 t gα0 (τ ) 1 dτ = (gα (t) − n−1/6 ), K K as required. It is time to formally define the shifted variables. Recall that if R(i) represents one of our random variables, then we use the non-negative variables Ai and Ci for the one-step increase and decrease in R, respectively, so that R(i) − R(i − 1) = Ai − Ci . Our aim is to show that R(i) is approximately nγ r(ti ) for some real γ, where the error (the allowed fluctuation of R) is bounded by nγ gα (ti ) for some α ∈ {1, 2}. Here our choice of γ and α depends on the variable R represents: γ will be 1 for F and Z, 1/2 for D and Y , and 0 for X, while α will be 1 for D and F , and 2 for X, Y and Z. To show the concentration of R, we approximate Ai and Ci by their expectations, which, as we shall prove, lie in the intervals nγ−2 (rA (ti−1 ) ± K g (t )) and nγ−2 (rC (ti−1 ) ± K2 gα (ti−1 )), respectively, for some appro2 α i−1 priately chosen functions rA (t) and rC (t). Thus we can define the shifted + − variables A+ i and Ci having non-negative expectation, as well as Ai and Ci− having non-positive expectation as follows: A± i γ−2 = Ai − n K (rA (ti−1 ) ∓ gα (ti−1 )) 2 with K gα (ti−1 )) 2 with Ci± = Ci − nγ−2 (rC (ti−1 ) ∓ 49 Bj± = j X A± i i=1 j Dj± = X i=1 Ci± . and 4.3. CALCULATIONS P Lemma 4.3.7. Suppose the variable R satisfies R(j) = nγ r(0) + ji=1 Ai − Ci , where r is a polynomial in t such that r0 (t) = rA (t) − rC (t). Then R(j) ≤ nγ (r(tj ) + gα (tj )) − nγ−1/6 /2 + Bj− − Dj+ γ R(j) ≥ n (r(tj ) − gα (tj )) + n γ−1/6 /2 + Bj+ − and Dj− . Proof. Let us first consider the upper bound: γ R(j) = n r(0) + j X A i − Ci i=1 j X K − γ−2 = n r(0) + Ai + n (rA (ti−1 ) + gα (ti−1 )) 2 i=1 j X K + γ−2 − Ci + n (rC (ti−1 ) − gα (ti−1 )) 2 i=1 γ = Bj− − Dj+ + nγ r(0) +n γ−2 j X (rA (ti−1 ) − rC (ti−1 )) + n γ−2 i=1 j X Kgα (ti−1 ) i=1 Now we apply Claim 4.3.5 with functions rA (t) − rC (t) (a polynomial) and Kg √ α (t) (a product of a polynomial and an exponential function). As T ≤ log n, their derivatives are clearly bounded by O(polylog n) on [0, T ]. R(j) ≤ Bj− − Dj+ + nγ r(0) Z tj Z γ γ +n (rA (τ ) − rC (τ ))dτ + n 0 γ ≤ n (r(tj ) + gα (tj )) − n tj Kgα (τ )dτ + O(n−1 ) 0 γ−1/6 /2 + Bj− − Dj+ using Claim 4.3.6 and nγ−1/6 + O(n−1 ) ≥ nγ−1/6 /2 in the last step. The lower bound comes from an analogous argument by changing the appropriate signs. Using this, we can estimate the probability that R(j) deviates from its expectation: 50 4.3. CALCULATIONS Corollary 4.3.8. Suppose the numbers γ, α and the functions R, r, rA , rC satisfy the conditions of Lemma 4.3.7. Suppose furthermore that Bj± and Dj± are (η1 , N1 )-bounded and (η2 , N2 )-bounded martingale pairs, respectively, where ηβ Nβ ≤ ε and ηβ < Nβ /10 for β = 1, 2. Then the probability that R(j) 6∈ nγ (r(tj ) ± gα (tj )) is at most 4e− n2γ−1/3 50εj . Proof. Lemma 4.3.7 shows that R(j) > nγ (r(tj )+gα (tj )) implies nγ−1/6 /2 < Bj− −Dj+ , hence this event is contained in the union of the events nγ−1/6 /4 < Bj− and −nγ−1/6 /4 > Dj+ . A straightforward application of Lemma 4.3.3 then gives a bound of e− n2γ−1/3 50εj on the probability of each event, thus R(j) > n2γ−1/3 nγ (r(tj ) + gα (tj )) occurs with probability at most 2e− 50εj . A similar argument using the other inequality of Lemma 4.3.7 gives the same bound on the probability of the event R(j) < nγ (r(tj ) − gα (tj )), finishing the proof. 4.3.2 Degrees Recall that in this section, and also in the next four sections, we assume Gj−1 holds, i.e., the values of Dv , Fv , Xu,v , Yu,v and Zu,v are all in the prescribed intervals during the first j − 1 steps. Proof of Proposition 4.3.1(a). Let Ai be the indicator random variable of the eventP that an open triple at v was successfully sampled in step i. Then Dv (j) = ji=1 Ai . The probability that Ai+1 = 1 is 2c(f (ti ) ± g1 (ti )) 1 2pFv (i) K Kti −1/6 P ∈ 3/2 ⊆ 3/2 2c ± e n n (f (ti ) ± g1 (ti )) n 2 w Fw (i) using Claim 4.3.4. Set A± i = Ai − 1 n3/2 K 2c ∓ eKti−1 n−1/6 2 and Bj± = j X A± i , i=1 then Bj± is a ( n3c 3/2 , 1)-bounded martingale pair. So if we define Ci and rC (t) to be 0 for all i, then all the conditions of Corollary 4.3.8 are satisfied with the choice of rA (t) = 2c, r(t) = d(t), γ = 1/2,√α = 1 and ε = 3n−3/2 . Hence the probability that R(j) = Dv (j) is not in n(d(tj ) ± g1 (tj )) is less than √ −n1/6 4e log n ≤ n−10 , using 150j ≤ 150n2 log n ≤ n2 log n. 51 4.3. CALCULATIONS 4.3.3 Open triples Proof of Proposition 4.3.1(b). Here we break the one-step change in Fv (i) into two parts: Ai will be the gain in the open triples at v caused by the i’th P sample and Ci will be the loss, so that we can write Fv (j) = n − 1 + ji=1 Ai − Ci . We may lose a particular open triple uwv in two different ways: either if we sample it, or if we successfully sample another open triple with the same missing edge vu. There are at most Xu,v ≤ 50 log n candidates √ for this other triple and a successful sample has probability p = O(1/ n), so the linearity of expectation gives √ n )) 2Fv (i)(1 + O( log n P E[Ci+1 ] = F (i) w w ∈ √ n )) 2(f (ti ) ± g1 (ti ))(1 + O( log n n(f (ti ) ± g1 (ti )) 1 ⊆ n K Kti −1/6 2± e n 2 using Claim 4.3.4 Set Ci± 1 = Ci − n K Kti−1 −1/6 n 2∓ e 2 and Dj± = j X Ci± , i=1 then D0± , D1± , . . . is a ( n3 , 50 log n)-bounded martingale pair, because one sample can only “break” the open triples with the same missing edge, and there are at most codegree-many of them. On the other hand, as we have already mentioned in Section 4.2, there are two ways to obtain new open triples at v. The contribution of a new edge vu touching v in step i + 1 is Du (i) − Xu,v (i) because it creates an open triple at v with any edge of u except if the third edge is already there. Alternatively, a new edge incident to a neighbor u of v creates a new open P triple unless it connects to another neighbor of v. There are at most u,u0 ∈Dv (i) Xu,u0 (i) open triples that could create an edge between two 52 4.3. CALCULATIONS neighbors of v, so E[Ai+1 ] = P 2p Fw (i) X Du (i) − Xu,v (i) wu∈Fv (i) +P 2p Fw (i) X X Fu (i) − u∈Dv (i) O(Xu,u0 (i)) . u,u0 ∈Dv (i) Note how we abuse our notation to also think of the quantities Dv (i) and Fv (i) as the set they count. So u ∈ Dv (i) should be understood as a neighbor of v and wu ∈ Fv (i) refers to an open triple vwu. Using Claim 4.3.4 we get E[Ai+1 ] ⊆ ⊆ √ n )) 4c(d(ti ) ± g1 (ti ))(f (ti ) ± g1 (ti ))(1 − O( log n 1 n n(f (ti ) ± g1 (ti )) K Kti −1/6 4cd(ti ) ± e n . 2 This means that for 1 K Kti−1 −1/6 ± Ai = Ai − n 4cd(ti−1 ) ∓ e n 2 and Bj± = j X A± i , i=1 B0± , B1± , . . . is a martingale pair. √ Next, we show that it is a ( logn n , n log n)-bounded martingale pair. Indeed, adding an edge vw in step i + 1 can increase the number of open triples at v by at most Ai ≤ Dw (i) whereas an edge ww0 not touching v can√only increase Dw (i) = √ it by one. The upper bound then comes from ± . On the other hand, A is smallest O( n log n) ≤ n log n and Ai ≥ A± i i √ 2 when Ai = 0. Observing that 4cd(t) ≤ 8c log n we see that the change is bounded from below by (− log n/n). Therefore we can apply Corollary 4.3.8√with rA (t) = 4cd(t), rC (t) = 2, r(t) = f (t), γ = 1, α = 1, and ε = log2 n/ n to show that the probability that R(j) = Fv (j) + 1 (or Fv (j)) is not in the interval n f (tj ) ± g1 (tj ) is 3 1/6 at most 4e−n / log n ≤ n−10 . 4.3.4 3-walks Proof of Proposition 4.3.1(d). Once again, we break the one-step change in Yu,v (i) into two parts: Ai will be the gain in the open 3-walks from u 53 4.3. CALCULATIONS to v caused P by the i’th sample and Ci will be the loss, so we can write Yu,v (j) = ji=1 Ai − Ci . We lose a particular 3-walk uww0 v either if we sample its open triple uww0 , or if we add the missing edge uw0 by successfully sampling some other triple (as before, the latter event is unlikely since the codegrees are small). Then the linearity of expectation and Claim 4.3.4 gives √ n )) 2Yu,v (i)(1 + O( log n P E[Ci+1 ] = w Fw (i) ∈ √ n )) 2(y(t) ± g2 (t))(1 + O( log n n3/2 (f (t) ± g1 (t)) ⊆ 1 n3/2 K 2d(t) ± g2 (t) . 2 So defining Ci± = Ci − 1 n3/2 K 2d(ti−1 ) ∓ g2 (ti−1 ) 2 and Dj± = j X Ci± , i=1 n , 50 log n)-bounded martingale pair. we get that D0± , D1± , . . . is a ( log n3/2 Now let us look at Ai+1 , the number of ways a new open 3-walk uww0 v can be created in step i + 1. We follow the analysis described in Section 4.2. If uw is the new edge, then we need to count the 4-walks utww0 v in Gi where w0 is not u or a neighbor of u, and utw is open. Let N be the set of such candidates for w0 , then |N | = Dv (i) − O(log n), and the expected contribution to Ai+1 of this type is log n P √ 2p w0 ∈N Yu,w0 (i) 2c d(t) ± g1 (t) + O( n ) y(t) ± g2 (t) P ∈ n3/2 (f (t) ± g1 (t)) r Fr (i) 1 2cd(t)y(t) K ⊆ 3/2 ± g2 (t) n f (t) 6 Strictly speaking, we are using the linearity of expectation over the indicator variables for each fixed 3-walk uww0 v. The probability that this walk is created is the number of t’s such that utww0 is an open 3-walk in Gi , divided by the number of open triples. We similarly get that the expected contribution where ww0 is the new edge is P 2p w0 ∈N Yw0 ,u (i) 1 2cd(t)y(t) K P ∈ 3/2 ± g2 (t) , n f (t) 6 r Fr (i) 54 4.3. CALCULATIONS whereas new open 3-walks where w0 v is the new edge come from open 4walks uww0 tv in Gi , so the expected contribution of this type is 1 2cz(t) K 2pZu,v (i) 2c(z(t) ± g2 (t)) P ⊆ 3/2 ± g2 (t) . ∈ 3/2 n (f (t) ± g1 (t) n f (t) 6 r Fr (i) Putting all of these together, we see that for A± i = Ai − 1 n3/2 ! 2c 2d(ti−1 )y(ti−1 ) + z(ti−1 ) K ± g2 (ti−1 ) , f (ti−1 ) 2 P log2 n Bj± = ji=1 A± i is a martingale pair. In fact it is ( n3/2 , 50 log n)-bounded, since a new edge can contribute at most codegree-many new 3-walks. Now we can apply Corollary 4.3.8 with rA (t) = 2c 2d(t)y(t)+z(t) /f (t), rC (t) = 2y(t)/f (t), r(t) = y(t) (recall the differential equation that y satisfies to see that r0 = rA − rC ), γ = 1/2, α = 2 and ε = log4 n/n3/2 to show that the probability that R(j) = Yu,v (j) is not in the interval √ 5 1/6 n y(tj ) ± g2 (tj ) is at most 4e−n / log n ≤ n−10 . 4.3.5 4-walks Proof of Proposition 4.3.1(e). This time we define Ai to be the number of new open 4-walks created in step i and Ci toP be the number of open 4-walks we lose in step i, so that Zu,v (j) = n − 2 + ji=1 Ai − Ci . Once again, we lose an open 4-walk uww0 w00 v if one of its open triples uww0 or w0 w00 v is sampled, or if one of their missing edges uw0 or w0 v is added through a successful sample of a different open triple. Hence we get, using Claim 4.3.4 √ n )) 4Zu,v (i)(1 + O( log n P E[Ci+1 ] = w Fw (i) ∈ √ n )) 4(z(t) ± g2 (t))(1 + O( log n n(f (t) ± g1 (t)) 1 ⊆ n 4z(t) K ± g2 (t) . f (t) 2 So defining Ci± 1 = Ci − n 4z(ti−1 ) K ∓ g2 (ti−1 ) f (ti−1 ) 2 55 and Dj± = j X i=1 Ci± , 4.3. CALCULATIONS 2 we get that D0± , D1± , . . . is a ( logn n , 2500 log2 n)-bounded martingale pair, because an added edge of the form uw0 or w0 v can ruin at most Xu,w0 ·Xw0 ,v ≤ (50 log n)2 open 4-paths. On the other hand, the analysis in Section 4.2 shows that a new open 4-walk uww0 w00 v can be created in four different ways, based on which one of the four edges was added in step i + 1. In the case when uw is the new edge, we need to count the 5-walks utww0 w00 v where utw and vw00 w0 are open, and w0 is not u or a neighbor of u. Let M be the set of such edges w0 w00 for fixed u and v. Then q |M | = Fv (i) − Yv,u (i) − O(Xv,u (i)) = Fv (i) − O( n log3 n), hence the expected contribution in this case is 2 log n P √ 2p w0 w00 ∈M Yu,w0 (i) 2c f (t) ± g1 (t) + O( n ) y(t) ± g2 (t) P ∈ n(f (t) ± g1 (t)) r Fr (i) 1 2cf (t)y(t) K ± g2 (t) . ⊆ n f (t) 8 But the remaining three cases are essentially the same, we only need to switch u and v or the two indices of the variables Y . This means that 1 8cf (t)y(t) K E[Ai+1 ] ∈ ± g2 (t) , n f (t) 2 so we can define A± i 1 = Ai − n 8cf (ti−1 )y(ti−1 ) K ∓ g2 (ti−1 ) f (ti−1 ) 2 and Bj± = j X A± i , i=1 √ where B0± , B1± , . . . is a ( logn n , 3 n log2 n)-bounded martingale pair. This is because a new edge of the form uw can add at most Dv (i)Xw,w00 (i) = √ √ O( n log3/2 n) ≤ n log2 n new 4-walks and the same bound works for an edge touching v, whereas a new edge not touching u and v creates at most 100 log n open 4-walks: at most codegree-many in both of the positions ww0 and w0 w00 . Now we apply Corollary 4.3.8 with rA (t) = 8cf (t)y(t)/f (t), rC (t) = 4z(t)/f (t), r(t) = z(t) (the differential equation for z implies r0 = rA − rC ), √ 6 γ = 1, α = 2 and ε = log n/ n to show that the that R(j) = probability −n 1/6 / log7 n n + Zu,v (j) is not in the interval n z(tj ) ± g2 (tj ) is at most 4e ≤ n−10 . 3 56 4.4. THE SECOND PHASE 4.3.6 Codegrees Proof of Proposition 4.3.1(c). Let Ai = Xu,v (i)−Xu,v (i−1) Pj be the increase in the codegree of u and v in a step so that Xu,v (j) = 1 + i=1 Ai . It is easy to see that Ai+1 is the indicator random variable of the event that the open triple of an open 3-walk from u to v or from v to u is successfully sampled in step i + 1. The probability of this event is 1 4cy(t) K 2p(Yu,v (i) + Yv,u (i)) 4c(y(t) ± g2 (t)) P ∈ 2 ⊆ 2 ± g2 (t) . n (f (t) ± g1 (t)) n f (t) 2 w Fw (i) P 4cy(ti−1 ) 1 K So if we set A− = A − then Bj− = ji=1 A− + g (t ) i i i n2 f (ti−1 ) 2 2 i−1 √ n is a supermartingale and it is ( 10 nlog , 1)-bounded. Now we can apply 2 = 4cd(t) = 8c2 t, rC (t) = 0 Lemma 4.3.7 with γ = 0, α = 2, rA (t) = 4cy(t) f (t) and r(t) = 4c2 t2 + 1 to R(j) = Xu,v (j). Then the first inequality gives Xu,v (j) ≤ 1 + 4c2 t2j + g2 (tj ) + Bj− . √ Therefore (keeping in mind that tj ≤ log n and c ≤ 1) we see that if Xu,v (j) > 50 log n then Bj− > 25 log n. But by Lemma 4.3.3 this has probability at most e− 252 log2 n 30 log n ≤ e−10 log n = n−10 √ for any j ≤ n2 log n, finishing our claim. 4.4 The second phase In this section we analyze the second phase of the process and prove our main result, the lower and upper bounds on the threshold probability. Unlike in the first phase, where we made one step at a time, here we expose triples in rounds. In a round we simultaneously sample all the currently open triples, and then add the edges accordingly. Let us adapt our notation to the second phase as follows. From now on Dv (i), i = 0, 1, . . . will denote the degree of the vertex v after i rounds in the second phase. For example, Dv (0) is the degree of v at the end of the first phase, i.e., Dv (T n2 ) with the old notation. We similarly re-define the 57 4.4. THE SECOND PHASE other variables Fv , Xu,v , Yu,v and Zu,v , and let Gi denote the graph after the i’th round. As in the previous chapter, we will make use of the following Chernofftype inequalities (see, e.g., [33]). Claim 4.4.1. Let X ∼ Bin(n, p) be a binomial random variable. Then a2 1) P[X > np + a] ≤ e− 2(np+a/3) and a2 2) P[X < np − a] ≤ e− 2np . 4.4.1 The lower bound Suppose c < 12 is some fixed constant. Before we start the second phase, we need to decide how many √ steps the first phase should take. Recall that 1− 1−4c2 and that it is monotone decreasing in the f (t) has a root at T0 = 4c2 interval [0, T0 ]. It is easy to check that d(T0 ) < 1, so fix a positive constant cε < δ. We define the stopping δ < 1 − d(T0 ) and choose ε > 0 so that 1−2c time T to be in the interval [0, T0 ] so that f (T ) = ε/2. Hence if we apply Theorem 4.3.2 with this T , we get that after T n2 steps √ √ • Dv (0) ≤ (d(T ) + g1 (T )) n ≤ (1 − δ) n and • Fv (0) ≤ (ε/2 + g1 (T ))n ≤ εn for every vertex v. At this point, we move on to the second phase of the process. The plan is to show that √ the second phase ends in O(log n) rounds, while all the degrees stay below n. This would imply that the final graph has at √ most n n edges, in particular, it is not complete. The following statement bounds the degrees of the vertices in the first O(log n) rounds. Showing that in the meantime the second phase gets stuck will be an easy corollary. √ Claim 4.4.2. Let m = 4 log1/2c n. Then, with high probability, Dv (i) < n for every vertex v and 0 ≤ i ≤ m. Proof. We will prove by induction that with high probability √ • Dv (i) ≤ 1 − δ + (1 + 2c + . . . + (2c)i−1 )cε + i · n−1/6 n 58 and 4.4. THE SECOND PHASE • Fv (i) ≤ ((2c)i ε + 2n−1/6 )n hold for every vertex v and 1 ≤ i ≤ m. Note that, by our √ choice √ of ε, the cε + i · n−1/6 n < n. bound on the degrees is less than 1 − δ + 1−2c To proceed with the induction, we condition on the event that the bounds hold for i and then estimate the probability that they fail for i + 1 for some vertex v. First we show √ that the degree of each vertex increases by at most (2c)i cε + n−1/6 n in round i + 1. Indeed, the number of new edges that touch the vertex v is stochastically dominated by the binomial distribution c Bin Fv (i), √n . Hence, by the first Chernoff-bound in Claim 4.4.1, c 1/6 1/3 P Dv (i + 1) − Dv (i) > Fv (i) √ + (1 − 2c)n < e−Ω(n ) , n so a union bound over all the vertices shows that the first bound fails in 1/7 round i + 1 with probability at most e−Ω(n ) . The second inequality follows from the first one by an easy counting argument. Since we sample all the current open triples every round, the ones counted in Fv (i + 1) are all new triples, i.e., they contain at least one new edge added in round i+1. Now an open triple either has a new edge incident √ to v or not. If it does, we can choose it in at√most (2c)i cε + n−1/6 n ways, and then extend √ each choice in at most n ways to get a triple (as all degrees are below n). If not, then we first choose a neighbor of v and then a new incident edge. Consequently, the total number of open triples at v is √ √ Fv (i + 1) ≤ 2 · (2c)i cε + n−1/6 n · n = ((2c)i+1 ε + 2n−1/6 )n. Taking a union bound over all the m rounds then completes the proof. Corollary 4.4.3. Let Q(i) be the total number of open triples after i rounds. Then Q(m) = 0 with high probability. Proof. If a triple is open after the i’th round, then it contains at least one new edge. Of course, the number of open √ triples containing some fixed new edge uv is at most Dv (i) + Du (i) ≤ 2 n, whp. On the other hand, the number of new edges cannot exceed the number of positive samples in the i’th round, distributed as Bin(Q(i − 1), √cn ). Putting these together, this 59 4.4. THE SECOND PHASE means that Q(0), . . . , Q(m) is a√sequence of random variables where Q(i) is stochastically dominated by 2 n · Bin(Q(i − 1), √cn ). In particular, E[Q(i)] = E[E[Q(i)|Q(i − 1)]] ≤ E[2cQ(i − 1)] = 2cE[Q(i − 1)]. Using Q(0) ≤ n3 , a simple application of Markov’s inequality gives P[Q(m) > 0] ≤ E[Q(m)] ≤ (2c)m Q(0) ≤ n−4 · n3 = o(1). Proof of Theorem 4.1.1, part 2. Corollary 4.4.3 shows that whp the process runs out of open triples after at most m rounds in the second phase. According to Claim 4.4.2, at this final stage all vertices have degree at most √ 3/2 n, i.e., the graph has at most n 2 edges whp. 4.4.2 The upper bound 1 Suppose √ 2 < c ≤ 1 is fixed. Then we can run the first phase all the way, 2 for n log n steps. Indeed, as the function f (t) has a global minimum of f 4c12√ = 1 − 4c12 > 0, we can apply Theorem 4.3.2 with stopping time T = log n. Our plan is to give rapidly increasing lower bounds on the degrees and codegrees as the graph evolves, thus showing that we reach the complete graph in O(log log n) rounds. Let us analyze the first round separately. The initial parameters of the second phase are, as implied by Theorem 4.3.2, • Xu,v (0) ≤ 50 log n, • Zu,v (0) = 16c4 n log2 n + O(n log3/2 n) ≥ 2c4 n log2 n. for any vertices u and v. Lemma 4.4.4. There is some constant γ > 0, such that the codegree Xu,v (1) ≥ γ log2 n for every pair of vertices u, v with high probability. Proof. Fix u and v. We expect most of their new common neighbors to be vertices w with open triples to both u and v. So if X̃p,r denotes the number of open triples pqr, then we want many vertices w such that both X̃u,w and X̃w,v are relatively large. 60 4.4. THE SECOND PHASE Claim 4.4.5. For every pair of vertices u, v, there are at least a · n vertices c4 c4 and b = 50 are w such that X̃u,w (0), X̃v,w (0) ≥ b log n, where a = 2500 positive constants. Proof. Note that an open 4-walk is just a sequence of two open triples, hence X 2c4 n log2 n ≤ Zu,v (0) = X̃u,w (0) · X̃v,w (0). w∈V \{u,v} Here each summand is bounded by (50 log n)2 , so if fewer than an vertices w satisfy c4 log2 n ≤ X̃u,w (0) · X̃v,w (0), then the right hand sum above is less than c4 log2 n · n + (50 log n)2 · an = 2c4 n log2 n, a contradiction. At the same time, the bound on the codegrees implies that each w with c4 log2 n ≤ X̃u,w (0) · X̃v,w (0) satisfies our requirements. Now if some w shares at least b log n open triples with both u and v, then it becomes a new neighborof them after the first round with common b log n 2 2 √ n probability at least 1 − 1 − √cn ≥ cb2log (here and later in n this section we use that (1 − α)β ≤ 1 − αβ/2 for all αβ ≤ 1). These events are independent for different w’s, hence Xu,vis bounded from below by the Binomial random variable Bin an, c log2 n 4n 2 2 b2 . Then by Lemma 4.4.1, the 2 probability that Xu,v (1) is smaller than γ log n is e−Ω(log n) for a sufficiently small γ. A union bound over all pairs of vertices finishes the proof. To make our life easier, we consider a slightly different second phase from this point on. Instead of sampling open triples with success probability p, we will consider a sprinkling process, and sample all triples with success probability √n 4log n in each round (starting from round 2). This means that some triples will have a higher than p chance to exist, but as long as the number of rounds m is O(log log n), the effect is negligible: each triple is √ still sampled with probability at most c+o(1) . Formally we can say that we n 0 are proving the result for any constant c > c. To give a lower bound on the codegrees in Gi+1 , we define the following sequence: i−1 i−1 xi = γ 2 log2 +1 n, i = 1, . . . , m with xm+1 = n , 10 where we choose m = O(log log n) to be smallest possible xi−1 √ such that xm ≥ 14 n log n. Let us also set pi = 1 − 1 − √n 4log n . 61 4.4. THE SECOND PHASE Lemma 4.4.6. With high probability Xu,v (i) ≥ xi for all 1 ≤ i ≤ m + 1 and all pairs of vertices u and v. Proof. Lemma 4.4.4 shows that the lower bound on the codegrees holds for i = 1, so assume 2 ≤ i. We condition on the event that the statement holds for i − 1 and bound the probability that it fails for i. We claim that under these conditions G(n, pi ) is a subgraph of Gi . To see this, observe that if an edge uv is missing from √ Gi−1 , then it has Xu,v (i−1) ≥ xi−1 independent chances of probability 4/ n log n of being added in the i’th round. Moreover, these events are independent for the different nonedges, as the triples are sampled independently, and each triple has at most one missing edge. This means that missing edges are added independently with probability at least pi while existing edges are kept in the graph, thus indeed G(n, pi ) ⊆ Gi . We intend to use the Chernoff bound to show that all the codegrees in G(n, pi ), and thus also in Gi , exceed xi . For this, observe that the codegree of any fixed pair of vertices in G(n, pi ) is a binomial random variable Ri ∼ Bin(n − 2, p2i ). A straightforward calculation gives E[Ri ] > 2xi . Indeed, for 2 ≤ i ≤ m we have xi−1 2 (n − 2) · 4x2i−1 2x2 4 ≥ > i−1 = 2xi . (n − 2) 1 − 1 − √ n log n log n n log n √ whereas for i = m + 1 (using xm ≥ 14 n log n), xm 2 4 (n − 2) 1 − 1 − √ ≥ (n − 2)(1 − 1/e)2 > 2xm+1 , n log n 2 Thus, as xi ≥ δ log2 n, Claim 4.4.1 shows that P[Ri < xi ] = e−Ω(log n) . Now taking the union bound over all vertex pairs and over all i finishes the proof. Proof of Theorem 4.1.1, part 1. We claim that Gm+2 is the complete graph. Indeed, Lemma 4.4.6 shows that whp all the codegrees in Gm+1 are linear, so the probability that √ a fixed edge is missing from Gm+2 is at most (1 − √ −Ω( n/ log n) Ω(n) 4/ n log n) = e . A union bound over all pairs of vertices then completes the proof. 62 4.5. FURTHER REMARKS 4.5 Further remarks Probably the most natural question that one can ask is the following. What happens if the process starts with some other tree, and not the star? Intuitively it seems that we are in a worse situation as there are fewer open √ , then starttriples to start with. We would therefore expect that if p ≤ 21−ε n ing with any fixed tree, the triadic process fails to propagate whp. In fact, we believe that whp this holds for all trees simultaneously. Using the topology language, this is equivalent to saying that p = 2√1 n is the threshold probability for a random 2-complex to contain a collapsible hypertree (the upper bound comes from Corollary 4.1.2). We must note that a complex can have trivial fundamental group without actually containing a collapsible hypertree. A yet stronger question would be to ask for a lower bound matching the threshold estimate in Corollary 4.1.2 for being simply connected. Going in a different direction, it would also be interesting to study similar processes that are perhaps more meaningful from the social networks point of view. For example, a triadic process where vertices are discouraged to reach high degrees could be a more realistic model. 63 Chapter 5 Decomposing random graphs into few cycles and edges 5.1 Introduction Problems about packing and covering the edge set of a graph using cycles and paths have been intensively studied since the 1960s. One of the oldest questions in this area was asked by Erdős and Gallai [22, 23]. They conjectured that the edge set of any graph G on n vertices can be covered by n−1 cycles and edges, and can be partitioned into a union of O(n) cycles and edges. The covering part was proved by Pyber [46] in the 1980s, but the partitioning part is still open. As noted in [23], it is not hard to show that O(n log n) cycles end edges suffice. This bound was recently improved to O(n log log n) by Conlon, Fox and Sudakov in [18], where they also proved that the conjecture holds for random graphs and graphs of linear minimum degree. This chapter treats the problem in the case of random graphs in more detail. Let 0 ≤ p(n) ≤ 1 and define G(n, p) to be the random graph on n vertices, where the edges are included independently with probability p. We hope to find a close to optimal partition of the edges of a random graph into cycles and edges. Observe that in any such partition each odd-degree vertex needs to be incident to at least one edge, so if G(n, p) has s odddegree vertices, then we certainly need at least s/2 edges. Also, a typical random graph has about n2 p edges, whereas a cycle may contain no more cycles (or edges) to cover the than n edges, so we need at least about np 2 64 5.1. INTRODUCTION + 2s on remaining edges. This simple argument gives a lower bound of np 2 the optimum. In this chapter we show that this lower bound is asymptotically tight. Let odd(G) denote the number of odd-degree vertices in the graph G. We say that G(n, p) (with p = p(n)) satisfies some property P with high probability or whp if the probability that P holds tends to 1 as n approaches infinity. Our main result is the following theorem. Theorem 5.1.1. Let the edge probability p(n) satisfy p = ω log nlog n . Then whp G(n, p) can be split into odd(G(n,p)) + np + o(n) cycles and edges. 2 2 In fact, as we show in Lemma 5.2.2, for most of the p’s in the range, odd(G(n, p)) ∼ n2 . This immediately implies the following, perhaps more tangible, corollary. Corollary 5.1.2. Let p = p(n) be in the range [ω log nlog n , 1 − ω n1 ]. + o(n) cycles and edges. Then whp G(n, p) can be split into n4 + np 2 Here we use the standard notation of ω (f ) for any function that is asymptotically greater than the function f (n), i.e., g(n) = ω (f (n)) if limn→∞ fg(n) = ∞. In this chapter log stands for the natural logarithm, (n) and for the sake of clarity we omit the floor and ceiling signs whenever they are not essential. We call G an Euler graph if all the vertices of G have even degree (not requiring that G be connected). 5.1.1 Proof outline We will break the probability range into three parts (the sparse, the intermediate and the dense ranges), and prove Theorem 5.1.1 separately for each part. The proofs of the denser cases build on the sparser cases, but all the proofs have the following pattern: we start with deleting odd(G(n,p)) + o(n) 2 edges so that the remaining graph is Euler, and then we extract relatively few long cycles to reduce the problem to a sparser case. In the sparse case we will check the Tutte condition to show that there is a large matching on the odd-degree vertices, and then use expansion properties to iteratively find cycles that are much longer than the average degree. In the end, we are left with a sparse Euler subgraph, which breaks into o(n) cycles. 65 5.2. COVERING THE ODD-DEGREE VERTICES The denser cases are somewhat more complicated. We will need to break G(n, p) into several random subgraphs. These graphs will not be independent, but we can remove edges from one of them without affecting the random structure of the others. In the intermediate case we break G(n, p) into three random subgraphs, G(n, p) = G1 ∪ G2 ∪ G3 . First we find an edge set E0 in G2 such that G(n, p) − E0 is Euler. Then we break (G2 ∪ G3 ) − E0 into matchings and G1 into even sparser random graphs. Using a result by Broder, Frieze, Suen and Upfal [14] about paths connecting a prescribed set of vertex pairs in random graphs, we connect the matchings into cycles using the parts of G1 . The remaining edges are all from G1 , and the tools from the sparse case take care of them. In the dense case we break into four subgraphs, G(n, p) = G1 ∪G2 ∪G3 ∪ G4 , where G4 contains the majority of the edges. Again we start by finding the edge set E0 in G3 . Next, we apply a recent packing result by Knox, Kühn and Osthus [38] to find many edge-disjoint Hamilton cycles in G4 . Then we break the remaining edges from G3 ∪ G4 into matchings and use G2 to connect them into cycles. At this point we have a still intact random graph G1 and some edges from G2 left, and these fit into the intermediate setting, hence the previous results complete the proof. The chapter is organized as follows: in Section 5.2 we prove all the results we need about odd-degree vertices, including the typical existence of E0 and the fact that normally about half the vertices are odd. Section 5.3 shows why we can iteratively remove relatively long cycles from sparse random graphs, and proves Theorem 5.1.1 in the sparse range. In Section 5.4 we show the details of breaking into matchings and then connecting them into few cycles, and prove the main lemma for the intermediate range. Finally, we complete the proofs of the denser cases in Section 5.5. 5.2 Covering the odd-degree vertices The first step in the proof is to reduce the problem to an Euler subgraph by removing relatively few edges. This section is devoted to the discussion of the results we need about odd-degree vertices in random graphs. As in the previous chapters, we will use the Chernoff-type inequalities to bound large deviations. We have collected the variants needed for this chapter in the following claim. For proofs, see [2]. 66 5.2. COVERING THE ODD-DEGREE VERTICES Claim 5.2.1. Suppose we have independent indicator random variables X1 , . . . , Xn , where Xi = 1 with probability pi and Xi = 0 otherwise, for all i = 1, . . . , n. Let X = X1 + · · · + Xn be the sum of the variables and p = (p1 + · · · + pn )/n be the average probability so that E(X) = np. Then: 2 /n (a) P(X > np + a) ≤ e−2a for all a > 0, (b) P(X > 2np) ≤ e−np/20 , 2 /np (c) P(X < np − a) ≤ e−a and hence (d) P(X < np/2) ≤ e−np/20 . In particular, all the above estimates hold when X ∼ Bin(n, p) is a binomial random variable. In the coming results, we will also make use of the following easy observation: For X ∼ Bin(n, p), the probability that X is odd is exactly n−1 bX 2 c n (1 − p + p)n − (1 − p − p)n p2i+1 (1 − p)n−(2i+1) = 2i + 1 2 i=0 = 1 − (1 − 2p)n . 2 First we prove the following estimate on the number of vertices of odd degree in G(n, p). Claim 5.2.2. Suppose the probability p satisfies ω n1 < p < 1 − ω n1 . Then whp G ∼ G(n, p) contains n2 (1 + o(1)) vertices of odd degree. Proof. Let us fix a bipartition of the vertices into two nearly equal parts V = V1 ∪ V2 , where n1 = |V1 | = bn/2c and n2 = |V2 | = dn/2e. Our plan is to show that whp roughly half the vertices of V1 and roughly half the vertices of V2 have odd degree in G. So let Y1 be the number of odd-degree vertices in V1 . The probability n−1 that a specific vertex v has odd degree is 1−(1−2p) , so by the linearity of 2 expectation, the expected number of odd degree vertices in V1 is E(Y1 ) = n1 n1 (1 − 2p)n−1 n1 − ∼ 2 2 2 67 5.2. COVERING THE ODD-DEGREE VERTICES for ω (1/n) ≤ p ≤ 1 − ω (1/n). We still need to prove that Y1 is tightly concentrated around its mean. Let us now expose the random edges spanned by V1 . Then the degrees of the vertices in V1 are determined by the crossing edges between V1 and V2 . At this point, each vertex in V1 has n2 incident edges unexposed, and these edges are all different. So let v be any vertex in V1 . The probability that n2 1 ∼ , v is connected to an odd number of vertices in V2 is p1 = 1−(1−2p) 2 2 so the probability that v will end up having an odd degree is either p1 or (1 − p1 ) (depending on the parity of its degree in G[V1 ]). This means that, conditioning on any collection of edges spanned by V1 , Y1 is the sum of n1 indicator variables with probabilities p1 or 1 − p1 , so we can apply Claim 5.2.1 (a) and (c) with average probability min{p1 , 1 − p1 } ≤ p̄1 ≤ max{p1 , 1 − p1 } to get 2 /n 1 P(|Y1 − E(Y1 )| > a) ≤ e−2a 2 /n + e−a 1 p̄1 . Then taking a = n2/3 and using that p̄1 ∼ 1/2 we get 1/3 P(|Y1 − E(Y1 )| > n2/3 ) ≤ 3e−2n = o(1). Repeating the same argument for V2 , we see that if Y2 is the number of odd-degree vertices in V2 , then E(Y2 ) = n2 − n2 (1 − 2p)n−1 ∼ n2 /2 2 and 1/3 P(|Y2 − E(Y2 )| > n2/3 ) ≤ 3e−2n −2n1/3 So with probability at least 1 − 6e vertices is n2 (1 + o(1)). . = 1 − o(1) the number of odd In order to show that typically there is a small set of edges covering the odd vertices, we will need some properties of sparse random graphs. The following somewhat technical lemma collects the tools we use in the proof. 10 Lemma 5.2.3. Let p = p(n) satisfy ω log nlog n ≤ p ≤ logn n . Then the following properties hold whp for G ∼ G(n, p): 1) the giant component of G covers all but at most o(n) vertices, and all other components are trees, 68 5.2. COVERING THE ODD-DEGREE VERTICES 2) the independence number α(G) is at most 2 log(np) , p 3) the diameter of the giant component of G is at most 2 log n , log(np) 4) any set T of at most 2n1/10 vertices spans at most 2|T | edges, 5) any two disjoint vertex sets T, T 0 of the same size n1/10 ≤ t ≤ n/100 are connected by less than tnp/6 edges, and 6) all but at most n/2 log2 n vertices in G have at least np/5 odd-degree neighbors. Proof. For the first two properties we refer to Bollobás [12], while the third property was proved by Chung and Lu [16]. To prove the fourth one, note that the probability that a fixed set T of size t spans at least 2t edges is at t most (2t2) · p2t . So the probability that some T of size t ≤ 2n1/10 spans at least 2t edges is at most 1/10 2n X t=5 1/10 1/10 2 2n 2n X X n t · · p2t ≤ (nt4 p2 )t ≤ n−t/2 = o(1). t 2t t=5 t=5 We use a similar argument to prove the fifth property. For fixed T1 and T2 of size t, the probability that at least tnp/6 edges connect them in G tnp/6 t2 is at most tnp/6 p . So the probability that the property does not hold can be bounded from above by n/100 n/100 X n2 t2 X en 2t et2 p tnp/6 tnp/6 · · p ≤ t tnp/6 t tnp/6 1/10 1/10 t=n t=n n/100 ≤ X t=n1/10 e 2 n2 · t2 6et n log log n/6 !t . Here 6et < 12 , so for large enough n this probability is smaller than Pn/100n 1/10 (1/2)t < 2 · 2−n = o(1). t=n1/10 The last property has a similar flavor to Claim 5.2.2. We will prove that the probability that a particular vertex has fewer than np/5 odd neighbors is at most 1/ log3 n. Then the expected number of such vertices is at most n/ log3 n, hence the probability that there are more than n/2 log2 n of them is, by Markov’s inequality, at most 2/ log n = o(1). 69 5.2. COVERING THE ODD-DEGREE VERTICES So let us pick a vertex v in G. Using Claim 5.2.1, we see that the probability that it has fewer than np/2 or more than 2np neighbors is at most 2e−np/20 . Assume this is not the case, and expose the edges spanned by the neighborhood N of v. Now the number of odd neighbors of v is determined by the edges between N and N = V − (N ∪ v). In fact, the probability that a vertex u ∈ N is connected to an odd number of vertices n−d−1 in N is p1 = 1−(1−2p) ∼ 12 , where d = |N | < n/2 is the degree of v, 2 so u has an odd degree in G with probability p1 or (1 − p1 ). Thus the number of odd neighbors of v is a sum of d indicator random variables with probabilities p1 or (1 − p1 ). Another application of Claim 5.2.1 then shows that the probability that v has fewer than np/5 < dp1 /2 odd neighbors (conditioned on np/2 ≤ d ≤ 2np) is at most e−dp1 /20 < e−np/50 . Summarizing the previous paragraph, the probability that v has fewer than np/5 odd neighbors is at most 2e−np/20 + e−np/50 ≤ 3e−np/50 ≤ e−3 log log n = 1 log3 n for large enough n, establishing the sixth property. Now we are ready to prove the following statement, which will serve as a tool to get rid of odd degrees. We should point out that this lemma only works for p logn n . However, its flexibility – the fact that we can apply it to any set S – will prove useful when tackling the dense case. 10 Lemma 5.2.4. Let p = ω logn n but p ≤ logn n . Then whp in G ∼ G(n, p) for any vertex set S of even cardinality, there is a collection E0 of |S| + o(n) 2 edges in G such that S is exactly the set of vertices incident to an odd number of edges in E0 . Proof. Take any set S. First, we find a matching in S0 = S greedily, by selecting one edge ei at a time spanned by Si , and then removing the vertices of ei from Si to get Si+1 . At the end of the process, the remaining set of vertices S 0 ⊆ S is independent in G. The second property from . Now let us pair up the vertices Lemma 5.2.3 implies that |S 0 | ≤ 2 log(np) p 0 in S arbitrarily, and for each of these pairs {vj,1 , vj,2 } take a shortest path Pj in G connecting them. (Recall that for p = ω logn n the random graph G(n, p) is whp connected.) The third property ensures that each of the Pj 2 log n contains at most log(np) edges. Note that we do not assume these paths to be edge-disjoint and they may contain the ei ’s as well. 70 5.2. COVERING THE ODD-DEGREE VERTICES Let us define E0 to be the “mod 2 union” of the Pj and the ei , i.e., we include an edge e in E0 if it appears an odd number of times among them. Then the set of odd-degree vertices in E0 is indeed S, and the number of edges is at most |S| |S| 2 log(np) 2 log n + · = + o(n) 2 p log(np) 2 since np log n. We can actually push the probability p down a bit by giving up on the above mentioned flexibility. The following lemma takes care of the odddegree vertices and the small components in the sparse case. 10 Lemma 5.2.5. Let p = ω log nlog n , but p ≤ logn n , and let S be the set of odd-degree vertices in G ∼ G(n, p). Then whp there is a collection E0 of |S| + o(n) edges in G such that G − E0 is an Euler graph. 2 Proof. Let us assume that G satisfies all the properties from Lemma 5.2.3. Our plan is to show that there is a matching in the induced subgraph GS = G[S] covering all but at most n/ log2 n vertices in S, using the defect version of Tutte’s theorem on GS . For this we need that the deletion of any set T of t vertices from GS creates no more than t + n/ log2 n odd-vertex components. In fact, we will prove that the deletion of t vertices breaks GS into at most t + n/ log2 n components. n then this easily follows from the second property: if at least t If t ≥ 100 components were created, then we could find an independent set in G of size < t simply by picking a vertex from each component. But α(G) ≤ 2 log(np) p n , so this is impossible. 100 n Now suppose there is a set T of t < 100 vertices such that GS − T has n at least t + log2 n components. Here the number of components containing at least 2 log2 n vertices is clearly at most 2 logn 2 n . On the other hand, according to the sixth property of Lemma 5.2.3, there are at most 2 logn 2 n components containing a vertex of degree less than np/5 in GS . Thus there are t components of size at most 2 log2 n, and hence of average degree at most 4 by the fourth property, such that all the vertices contained in them have at least np/5 neighbors in GS . Each such component then has a vertex with at least np/5 − 4 neighbors in T . Pick a vertex like that from t of these components to form the set T 0 . 71 5.3. CYCLE DECOMPOSITIONS IN SPARSE RANDOM GRAPHS We see that there are at least t(np/5 − 4) > tnp/6 edges going between T and T 0 , but this contradicts the fourth property if t ≤ n1/10 and the fifth n property if n1/10 < t < 100 . So by Tutte’s theorem, we can find some edge set M that forms a matching in S, covering all but logn2 n of its vertices. Now let F be the set of edges appearing in the small components of G (recall that in this probability range, G(n, p) whp has a giant component and possibly some smaller ones). According to the first property, these edges span trees covering o(n) vertices in total, so |F | = o(n). We see that the set M ∪F takes care of all odd vertices outside the giant component and all but at most logn2 n of them inside. The rest of the proof follows the idea from the previous lemma: we pair up the remaining odd vertices arbitrarily and take shortest paths Pj in G connecting them. Once again, the third 2 log n . Then E0 , the property implies that each path has length at most log(np) “mod 2 union” of the Pj and M ∪ F satisfies all requirements and uses at most n |S| |S| 2 log n + o(n) + = + o(n) · 2 2 2 log n log(np) edges. 5.3 Cycle decompositions in sparse random graphs In this section we show that Euler subgraphs of sparse random graphs can be decomposed into o(n) edge-disjoint cycles. The following statement from [18], which we use to find long cycles, is an immediate consequence of applying Pósa’s rotation-extension technique [45] as described in [13]. Lemma 5.3.1. If a graph G does not contain any cycle of length at least 3t, then there is a set T of at most t vertices such that |N (T )| ≤ 2|T |. We say that a graph G is sufficiently sparse if any set of vertices S spans |S| less than r|S| edges, where r = max{ 12 log 2 , 7}. Note that any subgraph n of a sufficiently sparse graph is also sufficiently sparse. In what follows, we proceed by showing that on the one hand, for p small enough the graph G(n, p) is typically sufficiently sparse, while on the other hand, any sufficiently sparse graph contains few edge-disjoint cycles covering most of the edges. 72 5.3. CYCLE DECOMPOSITIONS IN SPARSE RANDOM GRAPHS Lemma 5.3.2. Let p = p(n) < n−1/6 and let G ∼ G(n, p). Then whp G is sufficiently sparse. Proof. For fixed s, the probability that there is an S of size s containing at least rs edges is at most s esp rs esp r s n . · 2 · prs ≤ ns · = n· 2r 2r s rs r s Put x = n · esp . If we further assume that r ≥ 12 log and r ≥ 7 then we 2 2r n −1/6 get that for large n and p < n 2 r 7 x ≤ n · (6ep log n) ≤ n · (6e) log2 n n1/6 7 = o(1), where we used the fact that 6ep log2 n < 1 for large n. Hence the probability that for some s there is a set S of size s which spans more than s max{ 12 log 2 , 7} · s edges, i.e., that G is not sufficiently sparse, is at most n n X i=2 xi < x2 < x = o(1). 1−x Corollary 5.3.3. For p < n−1/6 , whp all subgraphs of G(n, p) are sufficiently sparse. Using these lemmas we can derive one of our main tools, the fact that we can make a subgraph of a random graph relatively sparse by iteratively removing long cycles. Proposition 5.3.4. Let H be a sufficiently sparse n-vertex graph of average degree d > 84. Then it contains a cycle of length at least d log2 n. Proof. Assume to the contrary that there is no such cycle. Define H ⊆ G to be the d/2-core of G, i.e., the non-empty subgraph obtained by repeatedly removing vertices of degree less than d/2. Since there is no cycle of length at least d log2 n in H, we can apply Lemma 5.3.1 to find a set T of t ≤ d log2 n/3 vertices such that |N (T )| ≤ 2|T | = 2t. Let S = T ∪ N (T ) and s = |S|, then 73 5.3. CYCLE DECOMPOSITIONS IN SPARSE RANDOM GRAPHS the minimum degree condition implies that there are at least dt/4 edges incident to T , hence the set S spans at least sd/12 edges. Note that r = d/12 > 7. Also, since s ≤ 3t ≤ d log2 n, we have s r ≥ s/12 log2 n. So the set S spans at least max{ 12 log 2 , 7} · |S| edges, n contradicting the assumption that H is sufficiently sparse. Corollary 5.3.5. Let p < n−1/6 . Then whp G ∼ G(n, p) has the following property. If H is a subgraph of G on n0 vertices of average degree d, then H can be decomposed into a union of at most 2n0 / log n0 cycles and a graph H 0 of average degree at most 84. Proof. By Corollary 5.3.3 all subgraphs of G(n, p) are sufficiently sparse whp. So we can repeatedly apply Proposition 5.3.4 to remove cycles from H as follows. Let H0 = H. As long as the graph Hi has average degree di > 84, we find a cycle Ci+1 in it of length at least di log2 n0 and define Hi+1 = Hi − E(Ci+1 ). After some finite number (say l) of steps we end up with a graph H 0 = Hl of average degree at most 84. We need to bound l. Going from Hi of average degree di to Hj , the first graph of average degree below di /2, we remove at most di n0 /2 edges using cycles of length at least d2i log2 n0 . So the number of cycles we removed is at most n0 / log2 n0 . Thus if H had average degree d then l ≤ n0 log2 d/ log2 n0 ≤ 2n0 / log n0 , as needed. To conclude this section, we prove Theorem 5.1.1 in the range 10 ω ≤ p ≤ logn n by observing that whp G(n, p) contains no more than o(n) short cycles. log log n n 10 Lemma 5.3.6. Let p < logn n . Then whp G(n, p) contains no more than √ n cycles of length at most log log n. Proof. Let Xk be the number of cycles of length k in G(n, p). The number of cycles in Kn of length k is at most nk , and each cycle has probability pk of P being included in G(n, p), hence E(Xk ) ≤ (np)k ≤ (log10 n)k . So if log n X = log Xk is the number of cycles of length at most log log n, then k=3 we clearly have log log n E(X) ≤ X k=3 log log n E(Xk ) ≤ X k=3 using log10 n > 2. 74 (log10 n)k ≤ (log10 n)2 log log n 5.4. THE MAIN INGREDIENTS FOR THE DENSE CASE Hence we can apply √ Markov’s inequality to bound the probability that there are more than n short cycles: √ (log10 n)2 log log n √ P(X ≥ n) ≤ = exp 20(log log n)2 − (log n)/2 = o(1). n 10 Corollary 5.3.7. Let p < logn n . Then whp any Euler subgraph H of G ∼ G(n, p) can be decomposed into o(n) cycles. Proof. Use Corollary 5.3.5 to remove o(n) edge-disjoint cycles and end up with a graph H1 of average degree at most 84. Note that H1 is still an Euler graph, so we can break the edges of G1 into cycles arbitrarily. We claim that the number √ of cycles we get is o(n). Indeed, by Lemma 5.3.6, whp there are at most n = o(n) short cycles, i.e., of length at most log log n, while the number of long cycles can be bounded by the number of edges divided by log log n. Since H1 contains a linear number of edges, the number of long cycles is O( log nlog n ) = o(n) and we are done. Our theorem for small p is then an immediate consequence of Lemma 5.2.5 and Corollary 5.3.7. 10 Theorem 5.3.8. Let ω log nlog n < p < logn n . Then whp G ∼ G(n, p) can + o(n) cycles and edges. be decomposed into odd(G) 2 5.4 The main ingredients for the dense case For larger p we use a strong theorem by Broder, Frieze, Suen and Upfal [14] about the existence of edge-disjoint paths in random graphs connecting a prescribed set of vertex pairs. We need the following definition to state it: Suppose S is a set of vertices in a graph G. We define the maximum neighborhood-ratio function rG (S) to be maxv∈V (G) |N|NGG(v)∩S| . (v)| Theorem 5.4.1 (Broder-Frieze-Suen-Upfal). Let p = ω logn n . Then there are two constants α, β > 0 such that with probability at least 1 − n1 the following holds in G ∼ G(n, p). For any set F = {(ai , bi )|ai , bi ∈ V, i = 1, . . . , k} of at most α n log(np) disjoint pairs in G satisfying the property log n below, there are vertex-disjoint paths connecting ai to bi : 75 5.4. THE MAIN INGREDIENTS FOR THE DENSE CASE • There is no vertex v which has more than a β-fraction of its neighborhood covered by the vertices in F . In other words, rG (S) ≤ β, where S is the set of vertices appearing in F . We shall use the following statement to establish this property, so that we can apply Theorem 5.4.1 in our coming proofs. Lemma 5.4.2. Let M be a matching covering some vertices of V , and let 3 G ∼ G(n, logn n ) be a random graph on the same vertex set, where G and M are not necessarily independent. Then with probability 1 − n−ω(1) we 2 can break G into log n random graphs Gi ∼ G(n, logn n ) and M into log n 4 for submatchings Mi of at most logn n edges each, such that rGi (Si ) ≤ √log n all i = 1, . . . , log n, where Si is the set of endvertices of Mi . Proof. First, we partition the edge set of G into log n graphs G1 , . . . , Glog n by choosing an index ie for each edge e uniformly and independently at 2 random, and placing e in Gie . Then each Gi has distribution G(n, logn n ). Then we can apply Claim 5.2.1(b) and (d) to the degree of each vertex of each of the Gi ’s, and use the union bound to see that all the Gi have 2 minimum degree at least log2 n and maximum degree at most 2 log2 n with 2 probability 1 − 2n log n · e− log n/40 = 1 − n−ω(1) . Now break the M into log n random matchings M1 , . . . , Mlog n similarly, by placing each edge f ∈ M independently in Mif where if is a random index chosen uniformly. Since there are at most n/2 edges in M , another application of Claim 5.2.1(b) gives that with probability 1 − log ne−n/20 log n = 1 − n−ω(1) each Mi contains at most logn n edges. If Gi has maximum degree at most 2 log2 n, then the neighborhood of an arbitrary vertex v in Gi may meet at most 2 log2 n edges from M . The probability that at least (log n)3/2 of them are selected in Mi is at most log3/2 n 2 log2 n 2e log2 n 1 ≤ log3/2 n (log n)log3/2 n log3/2 · log n log3/2 n 2e ≤ ≤ e− log n·log log n . 1/2 log n So taking the union bound over all vertices v and indices i gives that with probability 1 − n−ω(1) , the neighborhood of any v in any Gi meets at most 76 5.4. THE MAIN INGREDIENTS FOR THE DENSE CASE log3/2 n of the edges in Mi , and thus it contains at most 2 log3/2 n ver2 tices from Si . Since all neighborhoods have size at least log2 n , we get that 4 rGi (Si ) ≤ √log for all i. n The next theorem is the main tool on our way to the proof of Theorem 5.1.1. 5 Theorem 5.4.3. Let 2 logn n ≤ p ≤ n−1/6 and G ∼ G(n, p). Suppose G is randomly split into G0 and G00 by placing its edges independently into 5 n G0 with probability p0 = 2 log and into G00 otherwise. Then whp for any np 00 0 subgraph H of G such that G ∪ H is Euler, G0 ∪ H can be decomposed into o(n) cycles. Proof. Note that, although G0 and G00 are far from being independent, G0 5 on its own has distribution G(n, p0 p) = G(n, 2 logn n ) and G00 has distribution G(n, (1 − p0 )p) where (1 − p0 )p < n−1/6 . It is easy to see from Claim 5.2.1 that whp G has maximum degree at most 2n5/6 . By Corollary 5.3.5, whp any H ⊆ G00 can be decomposed into o(n) cycles and a graph H0 of average degree at most 84. Therefore it is enough for us to show that whp G0 satisfies the following: for any H0 containing at most 42n edges such that G0 ∪ H0 is Euler, G0 ∪ H0 is an edge-disjoint union of o(n) cycles. Our plan is to break H0 into few matchings and then to use Theorem 5.4.1 on random subgraphs of G0 to connect them into cycles. 2 So define V0 ⊆ V to be the set of vertices of degree at least log2 n in H0 , and let V1 = V − V0 be the rest. Note that |V0 | = O( logn2 n ). We break into matchings in two rounds: first we take care of the edges spanned by V1 , then the ones crossing between V0 and V1 . Let us split G0 into two random 5 graphs G1 , G2 ∼ G(n, logn n ) by placing each edge of G0 independently in one of them with probability 1/2. Now let us consider the subgraph of H0 spanned by V1 : the maximum 2 degree is at most log2 n − 1, so we can break the edge set into log2 n matchings M1 , . . . , Mlog2 n . To find the cycles, we also split G1 into log2 n parts 3 G1,1 , . . . , G1,log2 n so that each G1,i has distribution G(n, logn n ). We can use Lemma 5.4.2 to further break each Mi and G1,i into log n parts Mi,j and G1,i,j , respectively, so that whp each Mi,j contains O(n/ log n) edges, and the endvertices of Mi,j only cover a o(1)-fraction of the neighborhood of any vertex in G1,i,j . 77 5.4. THE MAIN INGREDIENTS FOR THE DENSE CASE We create the cycles as follows: if Mi,j consists of the edges v1 v10 , . . . , vk vk0 , then we choose the corresponding set of pairs to be F1,i,j = {(v10 , v2 ), (v20 , v3 ), . . . , (vk0 , v1 )}. Here the above properties of Mi,j and G1,i,j ensure that we can apply Theorem 5.4.1, and with probability at least 1 − n1 the matching can be closed into a cycle. Hence with probability 3 1 − logn n = 1 − o(1) all the Mi,j ’s can be covered by log3 n cycles altogether. Let us turn to the edges of H0 between V0 and V1 , and define the following auxiliary multigraph Ga on V1 : for each v ∈ V0 , pair up its neighbors in V1 (except maybe one vertex, if the neighborhood has odd size) and let Ev be the set of edges – a matching – corresponding to this pairing. Define the edge set of Ga to be the disjoint union of the Ev for v ∈ V0 . The idea is that an edge ww0 ∈ Ev corresponds to the path wvw0 , so we want to find cycles covering the edges in Ga and then lead them through the original V0 − V1 edges. 2 By the definition of V1 , the maximum degree in Ga is at most log2 n − 1, so the edge set ∪v∈V0 Ev can be split into log2 n matchings N1 , . . . , Nlog2 n . Now it is time to break G2 into log2 n random subgraphs G2,i of distribution 3 G(n, logn n ) each. Once again, we use Lemma 5.4.2 to prepare for the cycle cover by splitting each of the Ni and G2,i into log n parts Ni,j and G2,i,j . When we define the set of pairs F2,i,j , we need to be a little bit careful: we must make sure that no cycle contains more than one edge from any given Ev . This way the cycles do not become self-intersecting after switching the edges from Ev back to the corresponding paths through v. Since the maximum degree of G, and hence the cardinality of Ev , is at most 2n5/6 , we may achieve this by using at most 2n5/6 cycles per matching. Indeed, split Ni,j into at most 2n5/6 subsets Ni,j,k so that none of the Ni,j,k contains more than one edge from the same Ev (this can be done greedily). Then define the sets of pairs F2,i,j,k for k = 1, . . . , 2n5/6 to close Ni,j,k into a cycle 5/6 the same way as before, and take F2,i,j = ∪2n k=1 F2,i,j,k . As above, all conditions of Theorem 5.4.1 are satisfied when we use G2,i,j to find the paths corresponding to F2,i,j that close Ni,j into cycles, and since the error probabilities were all O( n1 ), whp we can simultaneously do so for all i and j. We have log3 n matchings, so in total we get 2n5/6 log3 n = o(n) edge-disjoint cycles that cover all but o(n) edges of H0 between V0 and V1 (missing at most one incident edge for each v ∈ V0 ). Finally, we apply Corollary 5.3.5 on the subgraph of H0 induced by V0 to see that the edges spanned by V0 can be partitioned into O(n/ log3 n) 78 5.5. CYCLE-EDGE DECOMPOSITIONS IN DENSE RANDOM GRAPHS cycles and O(n/ log2 n) = o(n) edges (recall that |V0 | = O(n/ log2 n)). So far we have found o(n) edge-disjoint cycles in G0 ∪ H. Once we remove them, we get an Euler graph containing only o(n) edges from H. So we can find o(n) edge-disjoint cycles covering all of them and remove 5 these cycles, as well, to get an Euler subgraph of G0 ∼ G(n, logn n ). Now Corollary 5.3.7 shows that we can partition the remaining graph into o(n) cycles, concluding our proof. 5.5 Cycle-edge decompositions in dense random graphs At last, we are ready to prove Theorem 5.1.1 in the denser settings. The case p ≤ n−1/6 is fairly straightforward from our previous results, we just need to be a little bit careful. 6 Theorem 5.5.1. Let logn n ≤ p ≤ n−1/6 . Then whp G ∼ G(n, p) can be decomposed into odd(G) + o(n) cycles and edges. 2 Proof. We split G into the union of three disjoint random graphs G1 , G2 and G3 by putting each edge e ∈ E(G) independently into one of the graphs. 5 2 n we place e into G1 , with probability p2 = lognp n With probability p1 = 2 log np we place it into G2 , and with probability 1 − p1 − p2 we place it into G3 . 5 2 This way G1 ∼ G(n, 2 logn n ) and G2 ∼ G(n, logn n ). Now let S be the set of odd-degree vertices in G. Applying Lemma 5.2.4 to S in G2 gives a set E0 of |S|/2 + o(n) edges in G2 such that G − E0 is Euler. Taking H to be the subgraph G2 ∪ G3 − E0 of G00 = G2 ∪ G3 and setting G0 = G1 , we can apply Theorem 5.4.3 to split the edge set of G − E0 into o(n) cycles. The theorem follows. To get a tight result for larger p, we must remove cycles containing nearly all vertices. A recent result by Knox, Kühn and Osthus [38] helps us to find many edge-disjoint Hamilton cycles in random graphs. 50 Theorem 5.5.2 (Knox-Kühn-Osthus). Let logn n ≤ p ≤ 1 − n−1/5 and G ∼ G(n, p). Then whp G contains bδ(G)/2c edge-disjoint Hamilton cycles. Let us point out that we do not actually need such a strong result: δ(G)/2 − nε disjoint Hamilton cycles would also suffice for our purposes. 79 5.5. CYCLE-EDGE DECOMPOSITIONS IN DENSE RANDOM GRAPHS Theorem 5.5.3. Let p ≥ n−1/6 . Then whp G ∼ G(n, p) can be decomposed + np + o(n) edges. into odd(G) 2 2 Proof. Similarly to Theorem 5.5.1, we partition G into the union of four disjoint random graphs G1 , G2 , G3 and G4 by assigning each edge of G 5 n independently to G1 with probability p1 = 2 log , to G2 with probability np 4/5 3 2 log n p2 = n np , to G3 with probability p3 = lognp n and to G4 otherwise (with probability p4 = 1 − p1 − p2 − p3 = 1 − o(1)). It is easy to see from the Chernoff bound (Claim 5.2.1(a)) that whp the maximum degree of G3 is at most npp3 + n3/5 ≤ 2n3/5 , and the maximum degree of G4 is at most npp4 + n3/5 . Let us assume this is the case. Let S be the set of odd-degree vertices in G. Now as before, we use + o(n) edges in G3 such that G − E0 is Lemma 5.2.4 to find a set E0 of |S| 2 Euler. Notice that G4 ∼ G(n, pp4 ), where pp4 = p(1 − o(1)), so whp G4 has minimum degree at least npp4 −n3/5 by Claim 5.2.1(c). Also, pp4 < 1−n−1/5 3/5 (because pp2 > n−1/5 ), hence we can apply Theorem 5.5.2 and find npp4 −n 2 edge-disjoint Hamilton cycles in it. Let H0 be the graph obtained from G3 ∪ G4 − E0 by removing these cycles. Then the maximum degree of H0 is at most 4n3/5 , hence we can break the edge set of H0 into n4/5 matchings Mi . We want to use Theorem 5.4.1 to close the Mi ’s into cycles, so let us split 3 G2 into n4/5 random graphs G0i ∼ G(n, logn n ) uniformly. By Lemma 5.4.2 we can further partition each Mi and G0i , with probability 1 − n−ω(1) , into log n matchings and graphs, Mi,j and G0i,j in such a way that we can apply Theorem 5.4.1 on G0i,j for any pairing Fi,j on the vertices of Mi,j . Choose Fi,j , as before, so that the resulting paths together with Mi,j form a cycle. Then with probability at least 1 − n1 the theorem produces the required cycle, so whp all n4/5 log n cycles exist simultaneously. This way we find n4/5 log n = o(n) edge-disjoint cycles covering H0 and some edges in G2 . Let H be the graph containing the unused edges of G2 . Then G1 ∪ H is Euler, and we can apply Theorem 5.4.3 with the host graph G1 ∪ G2 from distribution G(n, p(p1 + p2 )), and the partition G0 = G1 , G00 = G2 . This gives us a decomposition of G0 ∪ H into o(n) cycles whp, completing our proof. 80 5.6. FURTHER REMARKS 5.6 Further remarks The above proof settles the question for p = ω log nlog n , but it would be nice to have a result for the whole probability range. The bottleneck in our proof is Lemma 5.2.5, where we obtain a small edge set E0 such that G(n, p) − E0 is Euler. We believe that similar ideas can be applied to prove this lemma for even smaller values of p if one puts more effort into finding short paths between vertices not covered by the matching. In any case, it seems that the asymptotics of the optimum is defined by the smallest such E0 for any p ≤ log n/n, so a complete solution to the problem would first need to describe this minimum in the whole range. Another direction might be to further explore the error term of our theorem. One clear obstacle to an improvement using our methods is Corollary 5.3.7. While we could slightly improve it to give a O(n/ log n) bound, showing that the error term is significantly smaller would need more ideas. Since the publication of our results, Glock, Kühn and Osthus [31] have applied tools about robust expanders to attack the problem and, partially improving our result, proved sharp bounds when p is a constant. 81 References [1] N. Alon, An extremal problem for sets with applications to graph theory. J. Combin. Theory Ser. A 40 (1985), 82-89. [2] N. Alon and J. H. Spencer, The probabilistic method, 3rd ed., Wiley (2008). [3] E. Babson, C. Hoffman and M. Kahle, The fundamental group of random 2-complexes, J. Amer. Math. Soc. 24 (2011), 1-28. [4] J. Balogh, B. Bollobás and R. Morris, Graph bootstrap percolation, Random Structures Algorithms 41 (2011), 413-440. [5] J. Balogh, B. Bollobás, R. Morris and O. Riordan, Linear algebra and bootstrap percolation, J. Combin. Theory Ser. A 119 (2012), 13281335. [6] J. Balogh, R. Morris and W. Samotij, Independent sets in hypergraphs, J. Amer. Math. Soc. 28 (2015), 669-709. [7] T. Bohman, The Triangle-Free Process, Adv. Math. 221 (2009), 16531677. [8] T. Bohman, A. Frieze and E. Lubetzky, Random triangle removal, preprint, arXiv:1203.4223 (2012). [9] B. Bollobás, On generalized graphs, Acta Math. Acad. Sci. Hungar 16 (1965), 447-452. [10] B. Bollobás, On a conjecture of Erdős, Hajnal and Moon, The American Mathematical Monthly 74 (1967), 178-179. 82 [11] B. Bollobás, Weakly k-saturated graphs, in: Beiträge zur Graphentheorie (Kolloquium, Manebach, 1967), Teubner, Leipzig (1968), 25-31. [12] B. Bollobás, Random graphs, 2nd ed., Cambridge Stud. Adv. Math. 73, Cambridge University Press (2001). [13] S. Brandt, H. Broersma, R. Diestel and M. Kriesell, Global connectivity and expansion: long cycles and factors in f -connected graphs, Combinatorica 26 (2006), 17-36. [14] A. Z. Broder, A. M. Frieze, S. Suen and E. Upfal, An efficient algorithm for the vertex-disjoint paths problem in random graphs, Proceedings of SODA ’96, pp. 261-268. [15] S. Choi and P. Guan, Minimum critical squarefree subgraph of a hypercube, in: Proceedings of the Thirty-Ninth Southeastern International Conference on Combinatorics, Graph Theory and Computing 189 (2008), 57-64. [16] F. Chung and L. Lu, The diameter of sparse random graphs, Adv. in Appl. Math. 26 (2001), 257-279. [17] A. Coja-Oghlan, M. Onsjö and O. Watanabe, Propagation Connectivity of Random Hypergraphs, Electron. J. Combin. 19 P17 (2012), 25pp. [18] D. Conlon, J. Fox and B. Sudakov, Cycle packing, Random Structures Algorithms 45 (2014), 608-626. [19] D. Conlon and W.T. Gowers, Combinatorial theorems in sparse random sets, preprint, arXiv:1011.4310 (2010). [20] D. Easley and J. Kleinberg, Networks, crowds, and markets: reasoning about a highly connected world, Cambridge University Press (2010). [21] P. Erdős, Some remarks on the theory of graphs, Bull. Amer. Math. Soc. 53 (1947), 292-294. [22] P. Erdős, On some of my conjectures in number theory and combinatorics, Proceedings of the fourteenth Southeastern conference on combinatorics, graph theory and computing (Boca Raton, Fla., 1983), Congr. Numer. 39 (1983), 3-19. 83 [23] P. Erdős, A. W. Goodman and L. Pósa, The representation of a graph by set intersections, Canad. J. Math. 18 (1966), 106-112. [24] P. Erdős, A. Hajnal and J.W. Moon, A problem in graph theory, The American Mathematical Monthly 71 (1964), 1107-1110. [25] P. Erdős and A. Rényi, On random graphs I., Publ. Math. Debrecen 6 (1959), 290-297. [26] P. Erdős, S. Suen and P. Winkler, On the size of a random maximal graph, Random Structures Algorithms 6 (1995), 309-318. [27] J. Faudree, R. Faudree and J.R. Schmitt, A survey of minimum saturated graphs and hypergraphs Electron. J. Combin. DS 19–Jul, 2011. [28] M. Ferrara, M.S. Jacobson, F. Pfender and P.S. Wenger, Graph saturation in multipartite graphs, J. Comb. 19 (2016), 1-19. [29] P. Frankl, An extremal problem for two families of sets, European J. Combin. 2 (1982), 125-127. [30] P. Frankl and V. Rödl, Large triangle-free subgraphs in graphs without K4 , Graphs Combin. 2 (1986), 135-144. [31] S. Glock, D. Kühn and D. Osthus, Optimal path and cycle decompositions of dense quasirandom graphs, J. Combin. Theory Ser. B 118 (2016), 88-108. [32] A. Gundert and U. Wagner, On topological minors in random simplicial complexes, Proc. Amer. Math. Soc., 144 (2016), 1815-1828. [33] S. Janson, T. Luczak and A. Ruciński, Random Graphs, Wiley (2000). [34] R. Johnson and T. Pinto, Saturated subgraphs of the hypercube, Combin. Probab. Comput., to appear. [35] G. Kalai, Weakly saturated graphs are rigid, Ann. Discrete Math. 20 (1984), 189-190. [36] G. Kalai, Hyperconnectivity of graphs, Graphs Combin. 1 (1985), 65-79. 84 [37] L. Kászonyi and Z. Tuza, Saturated graphs with minimal number of edges, J. Graph Theory 10 (1986), 203-210. [38] F. Knox, D. Kühn and D. Osthus, Edge-disjoint Hamilton cycles in random graphs, Random Structures Algorithms, 46 (2015), 397-445. [39] Y. Kohayakawa, V. Rödl and M. Schacht, The Turán theorem for random graphs, Combin. Probab. Comput. 13 (2004), 61-91. [40] M. Krivelevich, Bounding Ramsey numbers through large deviation inequalities, Random Structures Algorithms, 7 (1995), 145-155. [41] N. Linial and R. Meshulam, Homological connectivity of random 2dimensional complexes, Combinatorica 26 (2006), 475-487. [42] L. Lovász, Flats in matroids and geometric graphs, in: Combinatorial Surveys (Proc. 6th British Comb. Conf.), Academic Press (1977), 4586. [43] G. Moshkovitz and A. Shapira, Exact bounds for some hypergraph saturation problems, J. Combin. Theory Ser. B 111 (2015), 242-248. [44] N. Morrison, J. Noel and A. Scott, Saturation in the hypercube and bootstrap percolation, Combin. Probab. Comput., to appear. [45] L. Pósa, Hamiltonian circuits in random graphs, Discrete Math. 14 (1976), 359-364. [46] L. Pyber, An Erdős-Gallai conjecture, Combinatorica 5 (1985), 67-79. [47] B. Roberts, Partite saturation problems, preprint, arXiv:1506.02445 (2015). [48] V. Rödl and M. Schacht, Extremal results in random graphs, in: Erdős Centennial Volume, Bolyai Soc. Math. Stud. 25 (L. Lovász, I. Ruzsa and V. Sós, eds.), Springer (2013), 11-33. [49] D. Saxton and A. Thomason, Hypergraph containers, Invent. Math. 201 (2015), 925-992. [50] M. Schacht, Extremal results for random discrete structures, preprint. 85 [51] T. Szabó and V.H. Vu, Turán’s theorem in sparse random graphs, Random Structures Algorithms 23 (2003), 225-234. [52] P. Turán, On an extremal problem in graph theory, Matematikai és Fizikai Lapok 48 (1941), 436-452. [53] W. Wessel, Über eine Klasse paarer Graphen,I: Beweis einer Vermutung von Erdős, Hajnal and Moon, Wiss. Z. Hochsch. Ilmenau 12 (1966), 253-256. [54] N. Wormald, The differential equation method for random graph processes and greedy algorithms, Lectures on Approximation and Randomized Algorithms, PWN, Warsaw (1999), 73-155. [55] A. Zykov, On some properties of linear complexes (in Russian), Mat. Sbornik N. S. 24 (1949), 163-188. 86 Curriculum Vitae Education • 2008-2011: BSc in Mathematics (with honors), Eötvös Loránd University, Budapest, Hungary thesis supervisor: András Frank • 2011-2012: MASt (Part III) in Mathematics (with distinction), University of Cambridge, UK • 2012-2013: PhD in Mathematics (no degree), University of California, Los Angeles, CA, USA advisor: Benny Sudakov, transferred with no degree to: • 2013-2016: PhD in Mathematics, ETH Zürich, Switzerland advisor: Benny Sudakov Employment • 2012-2013: Teaching assistant, University of California, Los Angeles, CA, USA • 2013-2016: Teaching assistant, ETH Zürich, Switzerland 87 Publications • Domination in 3-tournaments (with B. Sudakov), submitted. • Saturation in random graphs (with B. Sudakov), Random Structures & Algorithms, to appear. • A random triadic process (with Y. Peled and B. Sudakov), SIAM Journal on Discrete Mathematics 30 (2016), 1-19. – An extended abstract for Eurocomb 2015 appeared in: Electronic Notes in Discrete Mathematics 49 (2015), 189-196. • Decomposing random graphs into few cycles and edges (with M. Krivelevich and B. Sudakov), Combinatorics, Probability and Computing 24 (2015), 857-872. • Ks,t -saturated bipartite graphs (with W. Gan and B. Sudakov), European Journal of Combinatorics 45 (2015), 12-20. • Separating path systems (with V. Falgas-Ravry, T. Kittipassorn, S. Letzter and B. P. Narayanan), Journal of Combinatorics 5 (2014), 335-354. 88
© Copyright 2025 ExpyDoc