
Linux

Why? Because Hadoop, Spark, and R run better on Linux.

File operations

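cd, change the current directory
ls, list directory contents
find, search a directory tree for files, e.g. find . -name '*.csv'
cat, print file contents
mkdir, create a directory
rm, delete a file
rm -rf folder, delete a non-empty directory (-r recursive, -f no prompts; cannot be undone)
mv, move or rename
cp, copy
cp -r, copy a directory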
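Here is a minimal sketch of the file commands above. It runs inside a scratch directory created by mktemp -d, so nothing outside it is touched. The names project, backup, a.txt, b.txt and renamed.txt are hypothetical. mkdir -p is an extra option not listed above.

```shell
#!/bin/sh
# Work in a throwaway directory; the EXIT trap deletes it when the script ends.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work" || exit 1

mkdir -p project/src             # -p also creates missing parent directories
echo 'hello' > project/src/a.txt

cp project/src/a.txt b.txt       # copy a single file
cp -r project backup             # copy a whole directory tree (-r = recursive)
mv b.txt renamed.txt             # mv renames when source and target share a directory

ls backup/src                    # prints: a.txt
find . -name '*.txt'             # lists every .txt file below the current directory
cat renamed.txt                  # prints: hello

rm renamed.txt                   # delete a file
rm -rf backup                    # delete a non-empty directory without prompting
```

Every path is relative to the temporary directory, so the script can be rerun safely with sh.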

Data operations

head -n 3 data.csv, show the first 3 lines
tail -n 3 data.csv, show the last 3 lines
grep, print the lines that match a pattern, e.g. grep 'data' todo.txt
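wc -l file, count lines (l -> line)
wc -w file, count words (w -> word)
wc -c file, count bytes (c -> char, i.e. bytes; use wc -m to count characters in multibyte text such as UTF-8)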
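Count all .js files under /home/han (including subdirectories):
ls -lR /home/han | grep '\.js$' | wc -l
find /home/han -type f -name '*.js' | wc -l does the same more precisely (regular files only)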
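sort, -n numeric sort, -d dictionary order (only blanks and alphanumerics are compared), -r reverse order, -k N sort by the key starting at column N (-k N,N uses column N alone)
sort -nrk 1 data.txt, sort by column 1, numerically, in descending order
sort unsorted.txt | uniq, remove duplicate lines (uniq only drops adjacent duplicates, so sort first)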
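cut -f 2,4 filename, extract columns 2 and 4 (tab-separated by default; add -d',' for CSV)
paste -d"," file1 file2, join two files side by side, column by column, separated by commas
iconv -f GBK -t UTF-8 file1 -o file2, convert file1 from GBK to UTF-8 and write the result to file2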
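The data commands can be tried together on a small, made-up tab-separated file. This is a sketch that assumes GNU or BSD-style command-line tools. The file names (data.txt, ids.txt, scores.txt) and their contents are hypothetical. iconv is left out because the set of supported encodings varies between systems.

```shell
#!/bin/sh
# Run everything in a temporary directory that is deleted on exit.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work" || exit 1

# Hypothetical tab-separated sample: id, name, score (the last row is a duplicate)
printf '3\tcarol\t88\n1\talice\t95\n2\tbob\t72\n1\talice\t95\n' > data.txt

head -n 2 data.txt            # first 2 lines
tail -n 1 data.txt            # last line
grep -c 'alice' data.txt      # -c prints the match count instead of the lines: 2

# Redirecting the file into wc makes it print only the number;
# tr strips the padding that some wc implementations add
lines=$(wc -l < data.txt | tr -d ' ')
echo "lines: $lines"          # lines: 4

# Numeric (-n), descending (-r) sort on column 3 only (-k 3,3)
sort -nr -k 3,3 data.txt
top=$(sort -nr -k 3,3 data.txt | head -n 1)

# uniq drops only adjacent duplicates, so sort first
unique=$(sort data.txt | uniq | wc -l | tr -d ' ')
echo "unique lines: $unique"  # unique lines: 3

# cut splits on tabs by default
cut -f 1,3 data.txt           # id and score columns
cut -f 1 data.txt > ids.txt
cut -f 3 data.txt > scores.txt

# paste joins the files line by line with a comma between them
paste -d',' ids.txt scores.txt
```

Note the -k 3,3 form. A bare -k 3 makes the key run from column 3 to the end of the line, which only coincides with column 3 here because it is the last column.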
