To demonstrate this, we create a file, testfile.txt, using the seq command in Linux. For those unfamiliar with seq, it prints a sequence of numbers, which we redirect into a file. Let's do it.

    # Dump 'seq' output to 'testfile.txt'
    [root@LinuxFault split_test]# seq 10000 > testfile.txt

    # First 10 lines
    [root@LinuxFault split_test]$ head testfile.txt
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10

    # Last 10 lines
    [root@LinuxFault split_test]$ tail testfile.txt
    9991
    9992
    9993
    9994
    9995
    9996
    9997
    9998
    9999
    10000

1. Basic use of split
The most basic way to use any command is without options. Here, we pass the file name as the only argument to the split command, as shown below. Once it finishes, run the ls command to list the smaller parts of the file.

    [root@LinuxFault split_test]$ split testfile.txt

    # Large file has been split into a number of smaller files
    [root@LinuxFault split_test]$ ls
    testfile.txt xaa xab xac xad xae xaf xag xah xai xaj

We can see that a number of files with names of the form x-- have been created. To make sure that they are parts of the original file, we check their line counts and their contents.

    # Original file -> 10000 lines
    # 10 parts -> 1000 lines each
    [root@LinuxFault split_test]$ wc -l *
     10000 testfile.txt
      1000 xaa
      1000 xab
      1000 xac
      1000 xad
      1000 xae
      1000 xaf
      1000 xag
      1000 xah
      1000 xai
      1000 xaj
     20000 total

    # Check the contents of the first part
    [root@LinuxFault split_test]$ head xaa
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10

    # Check the contents of the last part
    [root@LinuxFault split_test]$ tail xaj
    9991
    9992
    9993
    9994
    9995
    9996
    9997
    9998
    9999
    10000

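Another way to confirm that nothing was lost is to join the parts back together and compare the result with the original. The following is a minimal sketch, assuming it runs in a scratch directory; the names workdir, rebuilt.txt and result are made up for this illustration.

```shell
# Sketch: split a file, rebuild it with cat, and compare with the original.
# 'workdir', 'rebuilt.txt' and 'result' are illustrative names.
set -eu

workdir=$(mktemp -d)        # scratch directory, so no existing files are touched
cd "$workdir"

seq 10000 > testfile.txt    # same test file as in the walkthrough
split testfile.txt          # creates xaa, xab, ... with 1000 lines each

# The shell expands x?? in sorted order, so the parts are joined in sequence
cat x?? > rebuilt.txt

# cmp -s prints nothing and only sets the exit status
if cmp -s testfile.txt rebuilt.txt; then
    result="identical"
else
    result="different"
fi
echo "Rebuilt file is $result to the original"
```

Comparing with cmp checks every byte, which is stricter than looking at the first and last ten lines.
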
In this case, the file has been split into 10 smaller chunks by line count, so that every chunk consists of 1000 lines. Instead, you might want the file to be split into a specific number of chunks, say 5 (roughly 2000 lines each for our file). Let's see how to do that.
2. Split a file in 'n' smaller parts - Option -n
We can define the number of parts a file should be split into using option -n. The syntax for this is split -n [No. of chunks] [file name]. Let's create 5 chunks of our file testfile.txt.

    # Specify the number of chunks
    [root@LinuxFault split_test]$ split -n 5 testfile.txt

    # There are 5 chunks created
    [root@LinuxFault split_test]$ ls
    testfile.txt xaa xab xac xad xae

    # Their line counts may vary
    [root@LinuxFault split_test]$ wc -l *
     10000 testfile.txt
      2177 xaa
      1955 xab
      1956 xac
      1955 xad
      1957 xae
     20000 total

    # But, they contribute to the same information
    [root@LinuxFault split_test]$ head xaa
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    [root@LinuxFault split_test]$ tail xae
    9991
    9992
    9993
    9994
    9995
    9996
    9997
    9998
    9999
    10000

So, this command has created 5 chunks of the file for us. Because -n divides the file by size in bytes rather than by lines, the chunks differ in their line counts, and a line can be cut at a chunk boundary, but put together they have the same contents as the original file. Next, we see how chunks can be created based on the size of every chunk.
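If a line cut in half at a chunk boundary is a concern, GNU split also accepts the form -n l/N, which creates N chunks while keeping lines whole. The sketch below assumes GNU split and a scratch directory; the variable names are illustrative only.

```shell
# Sketch: split into 5 chunks without cutting any line in half.
# Assumes GNU split (the l/N form of -n); variable names are illustrative.
set -eu

workdir=$(mktemp -d)
cd "$workdir"
seq 10000 > testfile.txt

split -n l/5 testfile.txt   # l/5 = 5 chunks, lines kept whole

set -- x??                  # load the chunk names into the positional parameters
chunks=$#

# A chunk that ends mid-line would not end with a newline.
# $(...) strips a trailing newline, so an empty result means "ends with a newline".
broken=0
for f in "$@"; do
    if [ -n "$(tail -c 1 "$f")" ]; then
        broken=$((broken + 1))
    fi
done

echo "chunks: $chunks, chunks ending mid-line: $broken"
```

The chunks are still sized by bytes as closely as whole lines allow, so their line counts need not be exactly equal.
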
3. Split a file into chunks of equal sizes - Option -b
We've seen how files can be split based on the number of lines and the number of chunks. Now we see how to split a file based on the size of every chunk, so as to create chunks of equal size. For this, we use option -b as split -b [size] [file name], where the size is given in bytes unless a unit suffix such as K or M is appended.

    # We specify the chunk size to be 10000 bytes
    [root@LinuxFault split_test]$ split -b 10000 testfile.txt

    # It creates 5 chunks for us
    [root@LinuxFault split_test]$ ll
    total 108
    -rw-r--r--. 1 root root 48894 Nov 11 14:09 testfile.txt
    -rw-r--r--. 1 root root 10000 Nov 11 14:38 xaa
    -rw-r--r--. 1 root root 10000 Nov 11 14:38 xab
    -rw-r--r--. 1 root root 10000 Nov 11 14:38 xac
    -rw-r--r--. 1 root root 10000 Nov 11 14:38 xad
    -rw-r--r--. 1 root root  8894 Nov 11 14:38 xae

We can see that 5 chunks were created: 4 with a size of 10000 bytes each, and the last one with the leftover 8894 bytes. So far, we can split a file based on the size of each chunk and the number of chunks. What about the number of lines in each chunk? The 1000 lines we saw earlier is only the default value, and we can modify it as per our need.
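For larger files it is often more convenient to give the -b size with a unit suffix rather than in plain bytes. The following sketch assumes a split that accepts the K suffix (1024 bytes), as GNU split does; workdir, first_size and last_size are names invented for the example.

```shell
# Sketch: chunk size given with a unit suffix instead of plain bytes.
# 'workdir', 'first_size' and 'last_size' are illustrative names.
set -eu

workdir=$(mktemp -d)
cd "$workdir"
seq 10000 > testfile.txt    # 48894 bytes, as in the listing above

split -b 10K testfile.txt   # K = 1024 bytes, so each full chunk is 10240 bytes

# $(( ... )) strips any padding that wc may put around the number
first_size=$(( $(wc -c < xaa) ))
last_size=$(( $(wc -c < xae) ))

echo "first chunk: $first_size bytes"   # 10240
echo "last chunk:  $last_size bytes"    # 48894 - 4 * 10240 = 7934
```

As with the 10000-byte example, the last chunk simply holds whatever is left over.
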
4. Creating chunks with 'n' lines each - Option -l
With the -l option of the split command, we can set the number of lines each chunk should contain. The syntax is the same, with a different option this time. Let's split the file with each chunk having 1200 lines.

    # Specify the number of lines -> 1200
    [root@LinuxFault split_test]$ split -l 1200 testfile.txt
    [root@LinuxFault split_test]$ ls
    testfile.txt xaa xab xac xad xae xaf xag xah xai
    [root@LinuxFault split_test]$ wc -l *
     10000 testfile.txt
      1200 xaa
      1200 xab
      1200 xac
      1200 xad
      1200 xae
      1200 xaf
      1200 xag
      1200 xah
       400 xai
     20000 total

    # Verify their contents
    [root@LinuxFault split_test]$ head xaa
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    [root@LinuxFault split_test]$ tail xai
    9991
    9992
    9993
    9994
    9995
    9996
    9997
    9998
    9999
    10000

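The 9 chunks and the 400 lines in the last one follow from simple arithmetic: 10000 lines divided by 1200 per chunk, rounded up. A small sketch of that calculation, checked against what split actually produces, could look like this; all variable names are illustrative.

```shell
# Sketch: predict how many chunks split -l will create, then check it.
# All variable names are illustrative.
set -eu

workdir=$(mktemp -d)
cd "$workdir"
seq 10000 > testfile.txt

total_lines=10000
lines_per_chunk=1200

# Integer ceiling division: (a + b - 1) / b
expected_chunks=$(( (total_lines + lines_per_chunk - 1) / lines_per_chunk ))
# Lines in the last chunk (assumes the division leaves a remainder, as it does here)
last_chunk_lines=$(( total_lines % lines_per_chunk ))

split -l "$lines_per_chunk" testfile.txt
set -- x??                  # chunk names become the positional parameters
actual_chunks=$#

echo "expected $expected_chunks chunks, found $actual_chunks"
echo "last chunk should hold $last_chunk_lines lines"
```
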
5. Numeric suffixes - Option -d
We have seen that the names of the chunks created are alphabetical, in the format x--, where each - is a letter. We can change the suffix to digits so that the names read x00, x01 and so on (which makes more sense), using option -d as below.

    # Numeric suffixes
    [root@LinuxFault split_test]$ split -d testfile.txt
    [root@LinuxFault split_test]$ ls
    testfile.txt x00 x01 x02 x03 x04 x05 x06 x07 x08 x09

6. Suffix length - Option -a
We can also change the suffix length using option -a, so that x01 would read as x0001 if we specify a suffix length of 4. Let's check this.

    # Suffix length = 4
    [root@LinuxFault split_test]$ split -d -a 4 testfile.txt
    [root@LinuxFault split_test]$ ls
    testfile.txt x0000 x0001 x0002 x0003 x0004 x0005 x0006 x0007 x0008 x0009

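The options above can be combined, and split also takes an optional last argument that replaces the default prefix x. The sketch below assumes a split that supports -d (GNU split, for example); the prefix part_ and the chunk size of 2500 lines are arbitrary choices for the illustration.

```shell
# Sketch: numeric suffixes of length 4 combined with a custom prefix.
# Assumes a split that supports -d (for example GNU split); 'part_' is a made-up prefix.
set -eu

workdir=$(mktemp -d)
cd "$workdir"
seq 10000 > testfile.txt

# The optional last argument replaces the default prefix 'x'
split -d -a 4 -l 2500 testfile.txt part_

ls part_*    # lists part_0000 through part_0003
```

A descriptive prefix makes it easier to tell the chunks of different files apart when they share a directory.
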
That's it! Thank you.