{"name":"mTCP","tagline":"mTCP: A Highly Scalable User-level TCP Stack for Multicore Systems","body":"***\r\n\r\n### What is mTCP?\r\n\r\nmTCP is a high-performance user-level TCP stack for multicore systems. Scaling the performance of short TCP connections is fundamentally challenging due to inefficiencies in the kernel. mTCP addresses these inefficiencies from the ground up - from packet I/O and TCP connection management all the way to the application interface.\r\n\r\n<div align=\"center\">\r\n <img src=\"http://shader.kaist.edu/mtcp/overview.png\" height=\"350\">\r\n <div><b> Figure 1. mTCP overview </b></div>\r\n</div>\r\n\r\nBesides adopting well-known techniques, our mTCP stack (1) translates expensive system calls into shared memory accesses between two threads on the same CPU core, (2) allows efficient flow-level event aggregation, and (3) performs batch processing of RX/TX packets for high I/O efficiency. On an 8-core machine, mTCP improves the performance of small message transactions by a factor of 25 compared with the latest Linux TCP stack (kernel version 3.10.12), and by a factor of 3 compared with the best-performing research system. It also improves the performance of various popular applications by 33% (SSLShader) to 320% (lighttpd) compared with those on the Linux stack.\r\n\r\n### Why user-level TCP?\r\n\r\nMany high-performance network applications spend a significant portion of their CPU cycles on TCP processing in the kernel (e.g., lighttpd spends ~80% of its cycles inside the kernel). Even worse, these CPU cycles are not used effectively; according to our measurements, Linux spends more than 4x as many cycles as mTCP to handle the same number of TCP transactions.\r\n\r\nCan we design a user-level TCP stack that incorporates all existing optimizations into a single system? Can we bring the performance of existing packet I/O libraries to the TCP stack? To answer these questions, we built a TCP stack at the user level. 
User-level TCP is attractive for many reasons.\r\n\r\n* Easily depart from the kernel's complexity\r\n* Directly benefit from the optimizations in the high performance packet I/O libraries\r\n* Naturally aggregate flow-level events by packet-level I/O batching\r\n* Easily preserve the existing application programming interface\r\n","google":"","note":"Don't delete this file! It's used internally to help with page regeneration."}