How To Create A Buffer In C
Character Buffer
Buffer Overflow
In Hack Proofing Your Network (Second Edition), 2002
The Code
This is a very simple program that does nothing but assign some values to some variables (Figure 8.1).
Figure 8.1. How the Stack Operates
The code in Figure 8.1 is very straightforward. It creates three stack variables: a 15-byte character buffer and two integer variables. It then assigns values to these variables as part of the function initialization and finally returns a value of 1. The usefulness of such a simple program becomes apparent when we examine how the compiler took the C code and created the function and its stack from it. We will now examine the disassembly of the code to better understand what the compiler did. For our disassembly, the code was compiled as a Windows Console application, in Release mode.
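Since the listing itself is not reproduced in this excerpt, here is a minimal sketch consistent with the description; the variable names and the assigned values are illustrative assumptions, not the book's code.

```c
/* Sketch of the Figure 8.1 program: three stack variables, some
   assignments, and a return value of 1 (names and values assumed). */
int main(void)
{
    char buffer[15];   /* 15-byte character buffer on the stack */
    int  a;            /* first integer stack variable  */
    int  b;            /* second integer stack variable */

    buffer[0] = 'A';   /* assignments done as part of initialization */
    a = 1;
    b = 2;

    return 1;
}
```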
URL:
https://www.sciencedirect.com/science/article/pii/B9781928994701500112
Web server and web application testing
Jeremy Faircloth, in Penetration Tester's Open Source Toolkit (Third Edition), 2011
6.3.1.1 Stack-based overflows
A stack is simply a last in, first out (LIFO) abstract data type. Data is pushed onto a stack or popped off it (see Fig. 6.2).
FIGURE 6.2. A Simple Stack.
The simple stack shown in Fig. 6.2 has [A] at the bottom and [B] at the top. Now, let's push something onto the stack using a PUSH C command (see Fig. 6.3).
FIGURE 6.3. PUSH C.
Let's push another for good measure: PUSH D (see Fig. 6.4).
FIGURE 6.4. PUSH D.
Now let's see the effects of a POP command. POP effectively removes an element from the stack (see Fig. 6.5).
FIGURE 6.5. POP Removing One Element from the Stack.
Notice that [D] has been removed from the stack. Let's do it again for good measure (see Fig. 6.6).
FIGURE 6.6. POP Removing Another Element from the Stack.
Notice that [C] has been removed from the stack.
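The same behavior takes only a few lines of C. This toy array-based stack is purely illustrative; the processor-managed stack discussed next works on the same principle.

```c
#include <stdio.h>

#define MAX 16
static char stack[MAX];
static int  top = 0;               /* index of the next free slot */

static void push(char c) { if (top < MAX) stack[top++] = c; }
static char pop(void)    { return top > 0 ? stack[--top] : '\0'; }

int main(void)
{
    push('A');                     /* [A] at the bottom          */
    push('B');                     /* [B] on top                 */
    push('C');                     /* PUSH C (Fig. 6.3)          */
    push('D');                     /* PUSH D (Fig. 6.4)          */
    printf("%c\n", pop());         /* POP removes [D] (Fig. 6.5) */
    printf("%c\n", pop());         /* POP removes [C] (Fig. 6.6) */
    return 0;
}
```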
Stacks are used in modern computing as a method for passing arguments to a function, and they are also used to reference local function variables. On x86 processors, the stack is said to be inverted, meaning that the stack grows downward (see Fig. 6.7).
FIGURE 6.7. Inverted Stack.
When a function is called, its arguments are pushed onto the stack. The return address in the calling function is also pushed onto the stack so that execution can resume at the correct location once the called function completes. This is referred to as the saved Extended Instruction Pointer (EIP) or simply the Instruction Pointer (IP). The address of the base pointer is also then saved onto the stack.
Look at the following snippet of code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int foo()
{
char buffer[8];/* Point 2 */
strcpy(buffer, "AAAAAAAAAA");
/* Point 3 */
return 0;
}
int main(int argc, char **argv)
{
foo(); /* Point 1 */
return 1; /* address 0x08801234 */
}
During execution, the stack frame is set up at Point 1. The address of the next instruction after Point 1 is noted and saved on the stack with the previous value of the 32-bit Base Pointer (EBP). This is illustrated in Fig. 6.8.
FIGURE 6.8. Saved EIP.
Next, space is reserved on the stack for the buffer char array (eight characters) as shown in Fig. 6.9.
FIGURE 6.9. Buffer Pushed onto the Stack.
Now, let's examine what happens given that the strcpy function copies not just the eight characters our defined buffer can hold, but all 10 As in the actual string (see Fig. 6.10).
FIGURE 6.10. Too Many As.
On the left of Fig. 6.10 is an illustration of what the stack would have looked like had we performed a strcpy of six As into the buffer. The example on the right shows the start of a problem: the extra As have overrun the space reserved for buffer[8] and have begun to overwrite the previously stored [EBP]. Let's see what happens if we copy 13 As and 20 As, respectively. This is illustrated in Fig. 6.11.
FIGURE 6.11. Stack Overflow.
In Fig. 6.11, we can see that the old EIP value was completely overwritten when 20 characters were sent to the eight-character buffer. Technically, sixteen characters would have done the trick in this case: eight to fill the buffer, four to overwrite the saved EBP, and four more to overwrite the saved EIP. This means that once the foo() function finished, the processor tried to resume execution at the address 0x41414141 (the hexadecimal encoding of "AAAA"). Therefore, a classic stack overflow attack aims at overflowing a buffer on the stack to replace the saved EIP value with an address of the attacker's choosing. The goal is to have the attacker's code available somewhere in memory and, using a stack overflow, cause that memory location to be the next instruction executed.
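For contrast, here is one way the flaw in foo() could be removed: bound the copy to the buffer size and terminate the string explicitly (strncpy alone does not null-terminate when it truncates). This is a sketch, not code from the chapter.

```c
#include <string.h>

int foo_fixed(void)
{
    char buffer[8];
    /* copy at most 7 characters, leaving room for the terminator */
    strncpy(buffer, "AAAAAAAAAA", sizeof(buffer) - 1);
    buffer[sizeof(buffer) - 1] = '\0';  /* saved EBP/EIP stay intact */
    return 0;
}
```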
Note
A lot of this information may seem to be things that the average penetration tester doesn't need to know. Why would you need to understand how a stack overflow actually works when you can just download the latest Metasploit update?
In many cases, a company will have patches in place for the most common vulnerabilities and you may need to uncover uncommon or previously unknown exploits to perform your testing. In addition, sometimes the exploit will be coded for a specific software version on a specific operating system and need to be tweaked a little to work in your specific scenario. Having a solid understanding of these basics is very important.
URL:
https://www.sciencedirect.com/science/article/pii/B9781597496278100066
BSD Sockets
James C. Foster, Mike Price, in Sockets, Shellcode, Porting, & Coding, 2005
Analysis
- At lines 8 through 16, the required header files for this program are included.
- At lines 18 and 19, the default UDP port and SNMP community name are specified. The standard port for the SNMP agent service is UDP port 161. The string public is used by default.
- At lines 23 through 75, the hexdump() function is defined and implemented. This function accepts two parameters. The first parameter is a pointer to a character buffer. The second parameter is a signed integer value that indicates the length of the character buffer in bytes. This function formats the supplied character buffer into a human-readable format and prints the formatted data to standard output. This format is similar to the format produced by the tcpdump program when used in conjunction with the -X flag.
- At lines 83 through 87, the bytes of the SNMP GetRequest value are defined. The SNMP1_PDU_HEAD value will later be copied into a character buffer, followed by the SNMP community name and then by the SNMP1_PDU_TAIL value. When combined, these three values make up the SNMP GetRequest value that can be sent to a remote host.
- At lines 89 through 115, the makegetreq() function is defined and implemented. This function is responsible for building an SNMP GetRequest value and storing this value in the supplied buffer. The first parameter to this function is a pointer to a character buffer. The second parameter is a signed integer that indicates the length of the character buffer in bytes. The third parameter is a pointer to a signed integer value in which the length of the created SNMP GetRequest value will be stored. The fourth parameter is the SNMP community name to be used. The SNMP GetRequest value built includes a request for the SNMP MIB-II system.sysName.0 value, which is the hostname of the target system. (A sketch of this buffer-assembly logic appears after this list.)
- At line 105, the makegetreq() function copies the SNMP1_PDU_HEAD value into the supplied character buffer.
- At line 106, the makegetreq() function copies the SNMP community name supplied by the caller into the character buffer after the SNMP1_PDU_HEAD value.
- At line 107, the makegetreq() function copies the SNMP1_PDU_TAIL into the character buffer after the SNMP1_PDU_HEAD and SNMP community name values.
- At line 109, the makegetreq() function stores the length of the supplied SNMP community name plus the constant value 35 in the second byte of the character buffer. This is required to properly format the SNMP GetRequest value.
- At line 110, the makegetreq() function stores the length of the SNMP community name in the byte that follows the SNMP1_PDU_HEAD value but precedes the SNMP community name value.
- At line 112, the makegetreq() function stores the length of the newly created SNMP GetRequest value in the olen variable.
- At line 114, the makegetreq() function returns a success. At this point, a valid SNMP GetRequest value has been built and stored in the supplied character buffer.
- At lines 122 through 127, the dores() function is defined and implemented. This function is used to receive an SNMP GetResponse value that originated from a remote host to which an SNMP GetRequest value was previously sent. This function uses the recvfrom() function to receive the SNMP GetResponse value. If a response is received, the received data is passed to the hexdump() function to be formatted and displayed.
- At lines 144 through 167, the doreq() function is defined and implemented. This function makes an SNMP GetRequest value, passes the value to the hexdump() function to be formatted and displayed, and then sends the value to the target IP address and port. The send() function is used to send the value to the target.
- At lines 174 through 209, the makeudpsock() function is defined and implemented. This function converts the supplied target IP address from string "dot" notation to an unsigned integer format. It then uses the socket() function to create a socket descriptor suitable for sending and receiving UDP datagrams. The socket descriptor is then associated with the target IP address and port using the connect() function. If all operations are successful, the makeudpsock() function returns a valid socket descriptor. Otherwise, a negative integer value is returned. (This function is also sketched after this list.)
- At lines 216 through 243, the scan() function is defined and implemented. This function calls the makeudpsock() function to create and initialize a socket descriptor. The created socket descriptor is then passed to the doreq() function, which in turn creates an SNMP GetRequest value and sends it to the target IP address and port. The dores() function is then called to receive an SNMP GetResponse value returned from the target. If no error occurs, the scan() function returns zero. Otherwise, a negative integer value is returned.
- At lines 250 through 257, the usage() function is defined and implemented. This function prints out usage information for the SNMP1 program.
- At lines 260 through 316, the main() function is defined and implemented. This is the main entry point of the program. This function processes user-supplied command-line arguments and then calls the scan() function in order to perform the scan.
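As referenced in the list above, here is a sketch of the makegetreq() and makeudpsock() logic. It is a reconstruction from the descriptions, not the book's listing: the PDU byte values below are placeholders (the real SNMP1_PDU_HEAD and SNMP1_PDU_TAIL bytes appear at lines 83 through 87 of the listing), and the offsets are assumptions.

```c
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Placeholder bytes, NOT the book's values. buf[1] is patched later. */
static const unsigned char SNMP1_PDU_HEAD[] = { 0x30, 0x00, 0x02, 0x01, 0x00, 0x04 };
static const unsigned char SNMP1_PDU_TAIL[] = { 0xa0, 0x00 /* placeholder */ };

int makegetreq(unsigned char *buf, int buflen, int *olen, const char *community)
{
    int clen  = (int)strlen(community);
    int total = (int)sizeof(SNMP1_PDU_HEAD) + 1 + clen
              + (int)sizeof(SNMP1_PDU_TAIL);

    if (total > buflen)
        return -1;

    /* head, community length byte, community, tail (cf. lines 105-110) */
    memcpy(buf, SNMP1_PDU_HEAD, sizeof(SNMP1_PDU_HEAD));
    buf[sizeof(SNMP1_PDU_HEAD)] = (unsigned char)clen;
    memcpy(buf + sizeof(SNMP1_PDU_HEAD) + 1, community, (size_t)clen);
    memcpy(buf + sizeof(SNMP1_PDU_HEAD) + 1 + clen, SNMP1_PDU_TAIL,
           sizeof(SNMP1_PDU_TAIL));

    buf[1] = (unsigned char)(clen + 35);  /* overall length byte (line 109) */
    *olen  = total;
    return 0;
}

int makeudpsock(const char *target, unsigned short port)
{
    struct sockaddr_in sin;
    int s = socket(AF_INET, SOCK_DGRAM, 0);  /* UDP datagram socket */

    if (s < 0)
        return -1;
    memset(&sin, 0, sizeof(sin));
    sin.sin_family      = AF_INET;
    sin.sin_port        = htons(port);
    sin.sin_addr.s_addr = inet_addr(target); /* "dot" notation -> integer */

    /* associate the descriptor with the target so send()/recvfrom() work */
    if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) < 0) {
        close(s);
        return -1;
    }
    return s;
}
```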
URL:
https://www.sciencedirect.com/science/article/pii/B9781597490054500092
Optimizing Your Application
Shane Cook, in CUDA Programming, 2013
Overlapping GPU transfers
There are two strategies for trying to overlap transfers. The first is to overlap transfer time with compute time; we looked at this in detail in the last section, explicitly with the use of streams and implicitly with the use of zero-copy memory.
Streams are a very useful feature of GPU computing. By building independent work queues we can drive the GPU device in an asynchronous manner. That is, the CPU can push a number of work elements into a queue and then go off and do something else before having to service the GPU again.
To some extent, operating the GPU synchronously with stream 0 is like polling a serial device with a single character buffer. Such devices were used in the original serial port implementations for devices like modems that operated over the RS232 interface; these are now obsolete, replaced by the USB1, USB2, and USB3 interfaces. The original serial controller, a UART, would raise an interrupt request to the processor to say it had received enough bits to decode one character and that its single character buffer was full. Only once the CPU serviced the interrupt could the communications continue. One-character-at-a-time communication was never very fast and was highly CPU intensive. Such devices were rapidly replaced with UARTs that had a 16-character buffer in them. Thus, the frequency of the device raising an interrupt to the CPU was reduced by a factor of 16. The UART could process the incoming characters and accumulate them to create a reasonably sized transfer to the CPU's memory.
By creating a stream of work for the GPU we're effectively doing something similar. Instead of the GPU working in a synchronous manner with the CPU, and the CPU having to poll the GPU all the time to find out if it's ready, we just give it a chunk of work to be getting on with. We then only periodically have to check if it's now out of work, and if so, push some more work into the stream or work queue.
Through the CUDA stream interface we can also drive multiple GPU devices, provided you remember to switch to the desired device before trying to access it. For asynchronous operation, pinned (page-locked) memory is required for any transfers to and from the GPU.
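As a concrete illustration of the stream pattern just described, here is a minimal sketch using standard CUDA runtime calls; the kernel, the sizes, and the polling loop are illustrative assumptions rather than code from this chapter.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

__global__ void scale(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= 2.0f;
}

int main(void)
{
    const int    N     = 1 << 20;
    const size_t bytes = N * sizeof(float);
    float *h_data, *d_data;
    cudaStream_t stream;

    cudaMallocHost((void **)&h_data, bytes); /* pinned memory: required
                                                for async transfers */
    cudaMalloc((void **)&d_data, bytes);
    cudaStreamCreate(&stream);

    /* Queue copy-in, kernel, copy-out; control returns to the CPU
       immediately, leaving the GPU to work through the queue. */
    cudaMemcpyAsync(d_data, h_data, bytes, cudaMemcpyHostToDevice, stream);
    scale<<<(N + 255) / 256, 256, 0, stream>>>(d_data, N);
    cudaMemcpyAsync(h_data, d_data, bytes, cudaMemcpyDeviceToHost, stream);

    /* The CPU is free to do other work here, checking back periodically. */
    while (cudaStreamQuery(stream) == cudaErrorNotReady) {
        /* ... useful CPU work instead of busy polling ... */
    }

    cudaStreamDestroy(stream);
    cudaFree(d_data);
    cudaFreeHost(h_data);
    return 0;
}
```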
On a single-processor system, all the GPUs will be connected to a single PCI-E switch. The purpose of a PCI-E switch is to connect the various high-speed components to the PCI-E bus. It also functions as a means for PCI-E cards to talk to one another without having to go to host memory.
Although we may have multiple PCI-E devices, in the case of our test machine, four GPUs on four separate X8 PCI-E 2.0 links, they are still connected to a single PCI-E controller. In addition, depending on the implementation, this controller may actually be on the CPU itself. Thus, if we perform a set of transfers to multiple GPUs at the same time, the individual bandwidth to each device may be on the order of 5 GB/s in each direction, but can the PCI-E switch, the CPU, the memory, and the other components keep up if all devices become active at once?
With four GPUs present on a system, what scaling can be expected? With our I7 920 Nehalem system, we measured around 5 GB/s to a single card using a PCI-E 2.0 X16 link. With the AMD system, we have around 2.5–3 GB/s on the PCI-E 2.0 X8 link. As the number of PCI-E lanes is half that of the I7 system, these sorts of numbers are around what you might expect to achieve.
We modified the bandwidth test program we used earlier for measuring PCI-E bandwidth so that it measures the bandwidth as we introduce more cards and more concurrent transfers. Any number of things can affect the transfers once we start introducing concurrent transfers to different GPUs. Anyone familiar with multi-GPU scaling in the games industry will appreciate that simply inserting a second GPU does not guarantee twice the performance. Many benchmarks show that most commercial games benefit significantly from two GPU cards. Adding a third card often brings some noticeable benefit, but nothing like the almost 2× scaling often seen with a second card. Adding a fourth card will often cause the performance to drop.
Now this may not seem very intuitive: adding more hardware equals lower speed. However, it's the same issue we see on CPUs when the core count becomes too high for the surrounding components. A typical high-end motherboard/CPU solution will dedicate at most 32 PCI-E lanes to the PCI-E bus. This means only two cards can run at full X16 PCI-E 2.0 speed. Anything more than this is achieved by the use of PCI-E switch chips, which multiplex (share) the PCI-E lanes. This works well until the two cards on the PCI-E multiplexer both need to do a transfer at the same time.
The AMD system on which we've run most of the tests in this book does not use a multiplexer, but drops the speed of each connected GPU to an X8 link when four GPUs are present. Thus, at 2.5–3 GB/s per device, we could achieve a theoretical maximum of 10–12.5 GB/s. In addition, being an AMD solution, the PCI-E controller is built into the processor, which also sits between the PCI-E system and main memory. The bandwidth to main memory is approximately 12.5 GB/s. Therefore, you can see this system would be unlikely to achieve the full potential of four GPUs. See Tables 9.2 and 9.3 and Figures 9.26 and 9.27.
Table 9.2. Bandwidth Effects of Multiple PCI-E Transfers to the Device (MB/s)

| | 1 Device | 2 Devices | 3 Devices | 4 Devices |
|---|---|---|---|---|
| GTX470 to device | 3151 | 3082 | 2495 | 1689 |
| 9800 GT to device | 0 | 3069 | 2490 | 1694 |
| GTX260 to device | 0 | 0 | 2930 | 1792 |
| GTX460 to device | 0 | 0 | 0 | 1822 |
Table 9.3. Bandwidth Effects of Multiple PCI-E Transfers from the Device (MB/s)

| | 1 Device | 2 Devices | 3 Devices | 4 Devices |
|---|---|---|---|---|
| GTX470 from device | 2615 | 2617 | 2245 | 1599 |
| 9800 GT from device | 0 | 2616 | 2230 | 1596 |
| GTX260 from device | 0 | 0 | 2595 | 1522 |
| GTX460 from device | 0 | 0 | 0 | 1493 |
FIGURE 9.26. Multi-GPU PCI-E bandwidth to device AMD 905e Phenom II.
FIGURE 9.27. Multi-GPU PCI-E bandwidth from device AMD 905e Phenom II.
What you can see from Tables 9.2 and 9.3 is that transfers scale quite nicely to three GPUs. We're seeing approximately linear scaling. However, when the four GPUs compete for the available resources (CPU, memory bandwidth, and PCI-E switch bandwidth) the overall rate is slower.
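The shape of such a test can be sketched as follows. This is a simplified stand-in for the modified bandwidth program described above, not the actual code; the transfer size, device handling, and timing details are assumptions.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

#define MAX_GPUS 4
#define SZ (64u * 1024u * 1024u)        /* 64 MB per transfer */

int main(void)
{
    float *h[MAX_GPUS], *d[MAX_GPUS];
    cudaStream_t s[MAX_GPUS];
    cudaEvent_t  start, stop;
    int num;

    cudaGetDeviceCount(&num);
    if (num > MAX_GPUS) num = MAX_GPUS;

    for (int i = 0; i < num; i++) {
        cudaSetDevice(i);                    /* switch device first */
        cudaMallocHost((void **)&h[i], SZ);  /* pinned host buffer  */
        cudaMalloc((void **)&d[i], SZ);
        cudaStreamCreate(&s[i]);
    }

    cudaSetDevice(0);
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start, 0);

    /* Start one transfer per device so they compete for the bus. */
    for (int i = 0; i < num; i++) {
        cudaSetDevice(i);
        cudaMemcpyAsync(d[i], h[i], SZ, cudaMemcpyHostToDevice, s[i]);
    }
    for (int i = 0; i < num; i++) {
        cudaSetDevice(i);
        cudaStreamSynchronize(s[i]);
    }

    cudaSetDevice(0);
    cudaEventRecord(stop, 0);               /* host-driven end marker */
    cudaEventSynchronize(stop);

    float ms;
    cudaEventElapsedTime(&ms, start, stop);
    printf("Aggregate: %.0f MB/s over %d device(s)\n",
           (num * 64.0f) / (ms / 1000.0f), num);
    return 0;
}
```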
The other multi-GPU platform we have to work with is a six-GPU system based on the Nehalem I7 platform and the ASUS supercomputer motherboard (P6T7 WS) with three GTX295 dual-GPU cards. This uses dual NF200 PCI-E switch chips, allowing each PCI-E card to work with a full X16 link. While this might be useful for inter-GPU communication via the P2P (peer-to-peer) model supported in CUDA 4.x, it does not extend the bandwidth available to and from the host if both cards are simultaneously using the bus. Internally, each GPU has to share the X16 PCI-E 2.0 link. Table 9.4 and Figure 9.28 show what effect this has.
Table 9.4. I7 Bandwidth to Device (MB/s)

| | 1 Device | 2 Devices | 3 Devices | 4 Devices | 5 Devices | 6 Devices |
|---|---|---|---|---|---|---|
| To device 0 | 5026 | 3120 | 2846 | 2459 | 2136 | 2248 |
| To device 1 | 0 | 3117 | 3328 | 2123 | 1876 | 1660 |
| To device 2 | 0 | 0 | 2773 | 2277 | 2065 | 2021 |
| To device 3 | 0 | 0 | 0 | 2095 | 1844 | 1588 |
| To device 4 | 0 | 0 | 0 | 0 | 1803 | 1607 |
| To device 5 | 0 | 0 | 0 | 0 | 0 | 1579 |
| Overall | 5026 | 6237 | 8947 | 8954 | 9724 | 10,703 |
FIGURE 9.28. I7 bandwidth to device.
As you can see from Table 9.4, total bandwidth to the devices increases approximately linearly as devices are added. We achieve a peak of just over 10 GB/s, 20% or so higher than our AMD-based system.
We can see the bandwidth from the device is a different story (Table 9.5 and Figure 9.29). Bandwidth peaks with two devices and is not significantly higher than on our AMD system. This is not altogether unexpected if you consider that the design of most GPU systems is based around gaming. In a game, most of the data is being sent to the GPU, with very little if any coming back to the CPU host. Thus, we see near-linear scaling up to three cards, which coincides with the top-end triple SLI (scalable link interface) gaming platforms. Vendors have little incentive to provide PCI-E bandwidth beyond this setup. As the GTX295 is actually a dual-GPU card, we may also be seeing that the internal SLI interface is not really able to push the limits of the card. We're clearly seeing some resource contention.
Table 9.5. I7 Bandwidth from Device (MB/s)

| | 1 Device | 2 Devices | 3 Devices | 4 Devices | 5 Devices | 6 Devices |
|---|---|---|---|---|---|---|
| From device 0 | 4608 | 3997 | 2065 | 1582 | 1485 | 1546 |
| From device 1 | 0 | 3976 | 3677 | 1704 | 1261 | 1024 |
| From device 2 | 0 | 0 | 2085 | 1645 | 1498 | 1536 |
| From device 3 | 0 | 0 | 0 | 1739 | 1410 | 1051 |
| From device 4 | 0 | 0 | 0 | 0 | 1287 | 1035 |
| From device 5 | 0 | 0 | 0 | 0 | 0 | 1049 |
| Overall | 4608 | 7973 | 7827 | 6670 | 6941 | 7241 |
URL:
https://www.sciencedirect.com/science/article/pii/B9780124159334000090
Vulnerability Types
Russ Rogers, in Nessus Network Auditing (Second Edition), 2008
Buffer Overflows
Buffer overflows are perhaps the most famous type of critical vulnerability. They are caused by a programmer's failure to limit the amount of information that can be written into a predefined buffer. When data is copied from one source (such as a network socket) into the buffer, an overflow can occur if the input data is greater than the size of the destination buffer. The programmer is responsible for checking the length of the input prior to the copy operation.
If the length of the input data is not checked, or the allocation routine miscalculates the size needed for the destination buffer, the copy operation can result in memory corruption. Depending on where the destination buffer is stored in memory, this corruption can be used to hijack control of the vulnerable program. Although the exploitation details vary from platform to platform, nearly all buffer overflow flaws involving user input can result in the creation of a critical vulnerability.
For example, let's pretend that Bob has written an Internet chat system that requires users to provide their names when they connect. When developing this program, Bob uses a temporary 50-byte character buffer to store the name received from the connecting user. After all, nobody he knows has a name anywhere near that long, so it should be more than enough room.
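A hypothetical sketch of what Bob's flawed handler might look like; the chapter shows no code, so the function and its surroundings are invented for illustration.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Invented example: reads a name from the socket into a fixed buffer. */
void handle_login(int client_fd)
{
    char name[50];               /* "more than enough room" for a name */
    char input[1024];
    ssize_t n = recv(client_fd, input, sizeof(input) - 1, 0);

    if (n <= 0)
        return;
    input[n] = '\0';

    strcpy(name, input);         /* no length check: anything longer
                                    than 49 bytes corrupts the stack */
    printf("Welcome, %s!\n", name);
}
```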
Now, assume that Bob's program gets packaged, sold, and distributed. A copy falls into the hands of curious Brian. Brian installs the server and uses the Telnet program to connect to the service. The service asks for his name, but instead of providing it, Brian sends a long repeated string of the letter "A." To his surprise, the chat server immediately closes his Telnet session and refuses to accept new connections. Brian then runs the chat server again, this time with the help of a debugging tool. After sending the long string of "A" characters, the debugger shows that an exception occurred when trying to access the memory address 0x41414141 (the letter A has the hex value of 41).
Brian has seen this before. This appears to be a standard buffer overflow; the long name he provided has been copied over all other local variables in the vulnerable function, and has continued on to trash program state information in the process's stack memory. This can be exploited to execute arbitrary code on this system, such as an interactive command shell.
Nessus uses a variety of techniques to identify network services that are vulnerable to buffer overflow attacks. When the vulnerable service runs inside a single process, it is usually not possible to actually test for the overflow without crashing the service completely. To work around this limitation, Nessus employs techniques such as version fingerprinting, banner matches, and even partial overflows to determine whether a given service is vulnerable.
Buffer overflows in real software are often somewhat more complicated than this, but usually not by much—the basic principles remain the same. In the last few years, there have been buffer overflows discovered in products as diverse as gaim (OSVDB ID 3734, CAN-2004-0005), Mac OS X (OSVDB ID 3043, CAN-2003-1006), and Oracle (OSVDB ID 2449, CAN-2003-0727).
URL:
https://www.sciencedirect.com/science/article/pii/B978159749208900006X
Hierarchy Smells
Girish Suryanarayana, ... Tushar Sharma, in Refactoring for Software Design Smells, 2015
Example 2
Consider a GUI application that provides rich support for drawing graphical objects. One of the features of this editor is its support for drawing lines in different styles, such as double line, thick-thin line, thin-thick line, and triple line. Further, you can draw lines with different dash or dot styles (see Figure 6.7). When the user selects a line style, the GUI application passes this information to the underlying Operating System (OS), which renders the lines in the selected style.
FIGURE 6.7. Kinds of line types and dot types supported in a drawing application (Example 2).
One way to model this is to use an inheritance hierarchy (see Figure 6.8). But is this an effective design? In fact, there are two main problems with adopting such an inheritance-based approach for this example:
FIGURE 6.8. Class explosion in the LineStyle hierarchy to support multiple line styles (Example 2).
FIGURE 6.9. Partial hierarchy of java.nio.Buffer.
- As Blaha and Rumbaugh [52] observed, classification is meaningful only if there are significant subclasses that have more specialized behavior than the base class. In this example, since the actual rendering of the line with different styles is performed by the underlying OS, the supertype and its subtypes do not differ in behavior. Clearly, modeling this as a generalization-specialization relationship is "over-engineering."
- If support for a new kind of style is needed, the number of derived classes can grow exponentially. In other words, the design can suffer from the "class explosion" problem.
For these reasons, this design suffers from the Unnecessary Hierarchy smell. A composition-based alternative is sketched below.
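One way to avoid the explosion is to compose the two independent axes of variation instead of subclassing every combination. The following sketch (written in C with invented names; the book's example is an object-oriented class hierarchy) shows the idea:

```c
/* Two independent style axes composed in one struct: 5 x 5 = 25
   combinations without 25 subclasses. All names are illustrative. */
enum line_type  { LINE_SINGLE, LINE_DOUBLE, LINE_THICK_THIN,
                  LINE_THIN_THICK, LINE_TRIPLE };
enum dash_style { DASH_SOLID, DASH_DASHED, DASH_DOTTED,
                  DASH_DOT, DASH_DOT_DOT };

struct line_style {
    enum line_type  type;   /* which line kind to draw       */
    enum dash_style dash;   /* which dash/dot pattern to use */
};

/* The selected style is simply forwarded to the OS rendering call,
   which is where the behavior actually differs. */
```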
CASE STUDY
A "data buffer" is a temporary place to store data when we move data from one place to another. There are many uses of data buffer. For instance, we may want to use a buffer to read data from a text file into memory and cache it for performance reasons. There are different kinds of data that a buffer can handle; for instance, one can use a character buffer for reading a text file.
Now, consider a (partial) class hierarchy of the class java.nio.Buffer (see Figure 6.9). What is unusual about this hierarchy is that the subclasses correspond to the primitive types available in Java!
Intrigued by this Unnecessary Hierarchy, we set about searching for the source code for these classes. What we found was a single file named "X-Buffer.java.template"! Here is a code fragment from this file:
#warn This file is preprocessed before being compiled
package java.nio;
#if[char]
import java.io.IOException;
