Programmers are driven to parallelize their programs both by hardware limitations and by the need for their applications to deliver results within acceptable timescales: a model of yesterday's weather, while still of some use, is of far less use than a forecast of tomorrow's. Given this motivation, researchers who build libraries for use in parallel codes must assess their performance when deployed at scale to ensure that end users can take full advantage of the computational resources available to them. Naively measuring an application's total execution time provides little insight into what challenges, if any, the code faces in achieving optimal performance, and fails to provide enough information to confirm any gains made by attempts to optimize the code. This motivates seeking deeper insight by inspecting the call stack and communication patterns. The author reviews the definitions of the forms of scalability that are desirable for different applications, discusses tools for collecting performance data at varying levels of granularity, and describes methods for analyzing these data in the context of case studies performed with applications built on the IBAMR library.