Code: READY AND FINAL!
README: WIP
Pipex mimics the functionality of the shell pipe operator '|' by executing ./pipex infile cmd1 cmd2 outfile, which emulates the behavior of < infile cmd1 | cmd2 > outfile. It connects the standard output of one command to the standard input of another, creating a pipeline for data flow between commands executed in separate processes.
- Command Execution: Utilizing the PATH environment variable to execute commands via execve().
- Process Management: Creating child processes and establishing inter-process communication via fork(), waitpid(), pipe(), and dup2().
- Error Handling: Ensuring robustness by implementing mechanisms to protect the program from unexpected behavior and failure, using perror(), strerror(), and errno.
- Imitating Shell Behavior: Replicating the behavior of the shell as closely as possible (zsh).
Environment variables are essential elements of the operating system's environment. They store information that various processes and applications utilize to configure their behavior and access system resources.
For example, common commands such as 'grep', 'ls', or 'cat' are executable files stored within the system. To determine the exact path(s) to a specific command, you can use which in bash or where in zsh, followed by the command name, such as which grep or which ls.
When calling a command, the terminal shell checks the PATH environment variable. This variable contains a list of directories, delimited by colons, where the operating system searches to find the executable file corresponding to the given command.
To view a list of all environment variables and their values, you can execute the env command in the terminal. This command displays a list like this (excerpt):
[...]
LANGUAGE=en
USER=aschenk
SHELL=/bin/zsh
[...]
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
In a C program, you can access the list of environment variables by including char **envp as the third argument to the main function, e.g. int main(int argc, char **argv, char **envp). The envp parameter is a NULL-terminated array of strings in the format "VARIABLE=value", for example envp = {"LANGUAGE=en", "PATH=/usr/local/sbin:[...]", "[...]", NULL}.
To understand how Pipex retrieves the path to a specified command, please refer to the get_command_path() function here.
So far so good – but why is it necessary to create multiple processes to execute multiple commands? Theoretically, you could save the output of a command in a variable and pass this as an input for another command, couldn't you? Such "command chaining" does work in shell scripting, e.g.:
output_of_command1=$(< infile.txt command1)
command2 "$output_of_command1" > outfile.txt
However, in C, you would use a function from the exec() family for this purpose (for more information, see here). As per project requirements, Pipex uses execve() (execute with vector of environment variables):
int execve(const char *path, char *const argv[], char *const envp[])
- const char *path: Represents the path to the command executable, e.g. /usr/bin/ls.
- char *const argv[]: Represents the command arguments as a NULL-terminated array of strings, e.g. {"ls", "-l", NULL}.
- char *const envp[]: Represents the list of environment variables, also as a NULL-terminated array of strings.
Members of the exec() family behave uniquely by loading and executing a new program (the command), effectively replacing the current process when called. They do not return to the original process after successful execution. This means that once execve() is called successfully (not returning -1), any following code is not executed.
So, to execute commands with input/output redirection, such as cmd1 < infile | cmd2 > outfile, each command execution requires a separate call to execve(). Since execve() replaces the current process, one process per command is necessary. The creation of additional processes is achieved through fork(). To enable communication between these processes, pipe() is used, which establishes a unidirectional communication channel.
Creating a new process is done by calling fork(), which duplicates the calling process. Afterwards, two nearly identical processes continue from the same point: the parent (return value of fork() > 0) and the child (return value of fork() = 0). If no process could be created, fork() returns -1 in the calling process.
Let's look at a simple program using fork():
DISCLAIMER:
Please note that the following code examples have been selected to explain specific system calls; they do not directly relate to the pipex project.
// fork.c
#include <stdio.h>      // printf()
#include <unistd.h>     // fork()
#include <sys/types.h>  // pid_t: int or long representing process IDs (PIDs)

int main(void)
{
    pid_t   child_pid;

    printf("Before the fork!\n");
    child_pid = fork();
    printf("After the fork! Child PID: %d\n", child_pid);
    if (child_pid == 0) // Child process (fork() returned 0)
        printf("Hello from the child! Child PID: %d\n", child_pid);
    else // Parent process (fork() returned the child's PID)
        printf("Hello from the parent! Child PID: %d\n", child_pid);
    return (0);
}
"Before the fork!" is printed out once, before fork()
is called. Then, "After the fork!" is printed out twice: once by the parent process and once by the child process. This is because the fork() call creates a new process, resulting in two separate execution paths. In the parent process, it returns the process ID (PID) of the child process (> 0), while in the child process, it returns 0. This makes it possible to execute different tasks by checking the return value (if (child_pid == 0) for child process tasks and else for parent process tasks).
Note that the parent and child processes run concurrently, meaning they execute independently and their execution order is not deterministic. While it's not straightforward to predict the exact order in which they will execute, introducing delays using functions like sleep() / usleep() can help influence their behavior to some extent:
// sleep_fork.c
#include <stdio.h>      // printf()
#include <unistd.h>     // fork(), usleep()
#include <sys/types.h>  // pid_t: int or long representing process IDs (PIDs)

int main(void)
{
    pid_t   child_pid;

    printf("Before the fork!\n");
    child_pid = fork();
    printf("After the fork! Child PID: %d\n", child_pid);
    usleep(10); // Pause execution for 10 microseconds
    if (child_pid == 0) // Child process
        printf("Hello from the child! Child PID: %d\n", child_pid);
    else // Parent process
        printf("Hello from the parent! Child PID: %d\n", child_pid);
    return (0);
}
A more controlled way to synchronize the execution order is waitpid(). It suspends the calling process until the specified child process terminates, allowing the parent process to wait for the completion of a specific child process before continuing its execution. waitpid() can also be used to retrieve and propagate the exit status of a child process (learn more here).
// waitpid_fork.c
#include <stdio.h>      // printf()
#include <unistd.h>     // fork(), usleep()
#include <sys/types.h>  // pid_t: int or long representing process IDs (PIDs)
#include <sys/wait.h>   // waitpid()

int main(void)
{
    pid_t   child_pid;

    printf("Before the fork!\n");
    child_pid = fork();
    printf("After the fork! Child PID: %d\n", child_pid);
    usleep(10); // Pause execution for 10 microseconds
    if (child_pid == 0) // Child process
        printf("Hello from the child! Child PID: %d\n", child_pid);
    else // Parent process
    {
        waitpid(child_pid, NULL, 0); // Waits for the child process to finish
        printf("Hello from the parent! Child PID: %d\n", child_pid);
    }
    return (0);
}
#include <unistd.h>     // pipe(), fork(), read(), write(), close(), getpid()
#include <stdio.h>      // printf()
#include <string.h>     // strlen()
#include <sys/types.h>  // pid_t

int main(void)
{
    int     pipe_fd[2];
    pid_t   child_pid;
    pid_t   own_pid;
    pid_t   received_child_pid;
    char    message[] = "Hello from the child! PID:";
    char    buffer[42];

    pipe(pipe_fd); // Pipe initialization
    child_pid = fork();
    if (child_pid == 0) // Child process
    {
        close(pipe_fd[0]); // Close the read end of the pipe
        own_pid = getpid(); // fork() returned 0 here, so ask for the real PID
        write(pipe_fd[1], message, strlen(message) + 1); // Write message (incl. '\0') to the pipe
        write(pipe_fd[1], &own_pid, sizeof(pid_t)); // Write child PID to the pipe
        close(pipe_fd[1]); // Close the write end of the pipe
    }
    else // Parent process
    {
        close(pipe_fd[1]); // Close the write end of the pipe
        printf("Here is the parent! Child PID: %d\n", child_pid);
        read(pipe_fd[0], buffer, sizeof(message)); // Read only the message, leaving the PID in the pipe
        read(pipe_fd[0], &received_child_pid, sizeof(pid_t)); // Read child PID from pipe
        printf("The child says: '%s %d'\n", buffer, received_child_pid);
        close(pipe_fd[0]); // Close the read end of the pipe
    }
    return (0);
}
#include <unistd.h>     // pipe(), fork(), dup2(), read(), close(), getpid()
#include <stdio.h>      // printf(), fflush()
#include <sys/types.h>  // pid_t, ssize_t

int main(void)
{
    int     pipe_fd[2];
    pid_t   child_pid;
    ssize_t bytes_read;
    char    buffer[42];

    pipe(pipe_fd); // Pipe initialization
    child_pid = fork();
    if (child_pid == 0) // Child process
    {
        close(pipe_fd[0]); // Close the read end of the pipe
        dup2(pipe_fd[1], 1); // Redirect stdout (fd = 1) to the write end of the pipe
        // Now stdout is redirected to the pipe, so printf will write to the pipe
        printf("Hello from the child! PID: %d", getpid());
        fflush(stdout); // Push the buffered text into the pipe
        close(pipe_fd[1]); // Close the original write end of the pipe
    }
    else // Parent process
    {
        close(pipe_fd[1]); // Close the write end of the pipe
        printf("Here is the parent! Child PID: %d\n", child_pid);
        bytes_read = read(pipe_fd[0], buffer, sizeof(buffer) - 1); // Read message from pipe
        if (bytes_read < 0)
            bytes_read = 0;
        buffer[bytes_read] = '\0'; // read() does not null-terminate the data
        printf("The child says: '%s'\n", buffer);
        close(pipe_fd[0]); // Close the read end of the pipe
    }
    return (0);
}
Using Z Shell (zsh).
Comparison (output shell + output pipex):
- single invalid input:
- infile not existent
- infile no access
- invalid command
- invalid command option
- infile not existent & invalid command
- Note: The color-coded output signals 'success' (blue)! -> left side is handled in a process that does NOT report the exit status to the parent.
- Note: If several inputs are invalid: Only the file-related issue is addressed, not the invalid command -> process exits after file access fails.
- Note: An empty outfile.txt is created (rw-r--r-- permissions) even if the pipe call failed.
- Same as above for the left side BUT color-coded output signals 'error' (red) -> right side is handled in a process that reports EXIT status to parent
- Same as above, error messages for both sides are printed out -> processes handling each side run parallel; having one exit does not result in the other process not being executed.
- Let's say you want to count the words in a file and store the result in that same file, using it as both input and output: < infile.txt wc -w | cat > infile.txt. The result is a file containing '0' -> the 'outfile' is created first as an empty file (overwriting the actual 'infile'), and only THEN are the processes started.
< infile.txt yes | head > outfile.txt
./pipex infile.txt yes head outfile.txt
The project badge used is retrieved from this repo by Ali Ogun.