* Rectangularizing data
* Bulk rename of field names
* Headerless CSV on input or output
* Regularizing ragged CSV
* Finding missing dates
* Two-pass algorithms
* Two-pass algorithms: computation of percentages
* Two-pass algorithms: line-number ratios
* Two-pass algorithms: records having max value
* Filtering paragraphs of text
* Doing arithmetic on fields with currency symbols
* Program timing
* Using out-of-stream variables
* Mean without/with oosvars
* Keyed mean without/with oosvars
* Variance and standard deviation without/with oosvars
* Min/max without/with oosvars
* Keyed min/max without/with oosvars
* Delta without/with oosvars
* Keyed delta without/with oosvars
* Exponentially weighted moving averages without/with oosvars

## Parsing log-file output

This, of course, depends highly on what’s in your log files. But, as an example, suppose you have log-file lines such as

```
2015-10-08 08:29:09,445 INFO com.company.path.to.ClassName @ [sometext] various/sorts/of data {& punctuation} hits=1 status=0 time=2.378
```

```
grep 'various sorts' *.log \
  | sed 's/.*} //' \
  | mlr --fs space --repifs --oxtab stats1 -a min,p10,p50,p90,max -f time -g status
```

## Rectangularizing data

Suppose you have a method (in whatever language) which is printing things of the form

```
outer=1
outer=2
outer=3
```

```
middle=10
middle=11
middle=12
middle=20
middle=21
middle=30
middle=31
```

```
inner1=100,inner2=101
inner1=120,inner2=121
inner1=200,inner2=201
inner1=210,inner2=211
inner1=300,inner2=301
inner1=312
inner1=313,inner2=314
```

with these lines interleaved in the program’s actual output stream:

```
outer=1
middle=10
inner1=100,inner2=101
middle=11
middle=12
inner1=120,inner2=121
outer=2
middle=20
inner1=200,inner2=201
middle=21
inner1=210,inner2=211
outer=3
middle=30
inner1=300,inner2=301
middle=31
inner1=312
inner1=313,inner2=314
```

```
$ mlr --from data/rect.txt put -q '
  ispresent($outer)  { unset @r }
  for (k, v in $*)   { @r[k] = v }
  ispresent($inner1) { emit @r }'
outer=1,middle=10,inner1=100,inner2=101
outer=1,middle=12,inner1=120,inner2=121
outer=2,middle=20,inner1=200,inner2=201
outer=2,middle=21,inner1=210,inner2=211
outer=3,middle=30,inner1=300,inner2=301
outer=3,middle=31,inner1=312,inner2=301
outer=3,middle=31,inner1=313,inner2=314
```

## Bulk rename of field names

```
$ cat data/spaces.csv
a b c,def,g h i
123,4567,890
2468,1357,3579
9987,3312,4543
```

```
$ mlr --csv --rs lf rename -r -g ' ,_' data/spaces.csv
a_b_c,def,g_h_i
123,4567,890
2468,1357,3579
9987,3312,4543
```

```
$ mlr --icsv --irs lf --opprint rename -r -g ' ,_' data/spaces.csv
a_b_c def  g_h_i
123   4567 890
2468  1357 3579
9987  3312 4543
```

```
$ cat data/bulk-rename-for-loop.mlr
for (oldk, v in $*) {
  @newk = gsub(oldk, " ", "_");
  if (@newk != oldk) {
    unset $[oldk];
    $[@newk] = v
  }
}
```

```
$ mlr --icsv --irs lf --opprint put -f data/bulk-rename-for-loop.mlr data/spaces.csv
def  a_b_c g_h_i
4567 123   890
1357 2468  3579
3312 9987  4543
```

## Headerless CSV on input or output

Sometimes we get CSV files which lack a header. For example:

```
$ cat data/headerless.csv
John,23,present
Fred,34,present
Alice,56,missing
Carol,45,present
```

```
$ mlr --csv --rs lf --implicit-csv-header cat data/headerless.csv
1,2,3
John,23,present
Fred,34,present
Alice,56,missing
Carol,45,present
```

```
$ mlr --icsv --irs lf --implicit-csv-header --opprint cat data/headerless.csv
1     2  3
John  23 present
Fred  34 present
Alice 56 missing
Carol 45 present
```

```
$ mlr --csv --rs lf --implicit-csv-header label name,age,status data/headerless.csv
name,age,status
John,23,present
Fred,34,present
Alice,56,missing
Carol,45,present
```

```
$ mlr --icsv --rs lf --implicit-csv-header --opprint label name,age,status data/headerless.csv
name  age status
John  23  present
Fred  34  present
Alice 56  missing
Carol 45  present
```

```
$ head -5 data/colored-shapes.dkvp | mlr --ocsv cat
color,shape,flag,i,u,v,w,x
yellow,triangle,1,11,0.6321695890307647,0.9887207810889004,0.4364983936735774,5.7981881667050565
red,square,1,15,0.21966833570651523,0.001257332190235938,0.7927778364718627,2.944117399716207
red,circle,1,16,0.20901671281497636,0.29005231936593445,0.13810280912907674,5.065034003400998
red,square,0,48,0.9562743938458542,0.7467203085342884,0.7755423050923582,7.117831369597269
purple,triangle,0,51,0.4355354501763202,0.8591292672156728,0.8122903963006748,5.753094629505863
```

```
$ head -5 data/colored-shapes.dkvp | mlr --ocsv --headerless-csv-output cat
yellow,triangle,1,11,0.6321695890307647,0.9887207810889004,0.4364983936735774,5.7981881667050565
red,square,1,15,0.21966833570651523,0.001257332190235938,0.7927778364718627,2.944117399716207
red,circle,1,16,0.20901671281497636,0.29005231936593445,0.13810280912907674,5.065034003400998
red,square,0,48,0.9562743938458542,0.7467203085342884,0.7755423050923582,7.117831369597269
purple,triangle,0,51,0.4355354501763202,0.8591292672156728,0.8122903963006748,5.753094629505863
```

## Regularizing ragged CSV

Miller handles compliant CSV: in particular, it’s an error if the number of data fields in a given data line doesn’t match the number of header fields. But in the event that you have a CSV file in which some lines have fewer than the full number of fields, you can use Miller to pad them out.
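Before looking at the Miller one-liner, the pad-to-widest logic itself is easy to spell out. Here is a minimal Python sketch of the same idea (illustrative only — the `pad_ragged` helper and the sample rows are made up for this example, and unlike Miller's streaming approach it takes two passes, first finding the widest row):

```python
# Pad ragged comma-separated rows out to the widest row seen.
# Two-pass: first find the maximum field count, then pad each
# short row with empty-string fields.
def pad_ragged(rows):
    width = max(len(row) for row in rows)
    return [row + [""] * (width - len(row)) for row in rows]

lines = ["a,b,c", "1,2,3", "4,5", "6", "7,8,9"]
rows = [line.split(",") for line in lines]
for row in pad_ragged(rows):
    print(",".join(row))
```

Miller's version below pads as it streams, tracking the maximum field count seen so far rather than scanning the file twice.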
The trick is to use NIDX format, for which each line stands on its own without respect to a header line.

```
$ cat data/ragged.csv
a,b,c
1,2,3
4,5
6
7,8,9
```

```
$ mlr --from data/ragged.csv --fs comma --nidx put '
  @maxnf = max(@maxnf, NF);
  @nf = NF;
  while (@nf < @maxnf) {
    @nf += 1;
    $[@nf] = ""
  }
'
a,b,c
1,2,3
4,5,
6,,
7,8,9
```

```
$ mlr --from data/ragged.csv --fs comma --nidx put '
  @maxnf = max(@maxnf, NF);
  while (NF < @maxnf) {
    $[NF+1] = "";
  }
'
a,b,c
1,2,3
4,5,
6,,
7,8,9
```

## Finding missing dates

Suppose you have some date-stamped data which may (or may not) be missing entries for one or more dates:

```
$ head -n 10 data/miss-date.csv
date,qoh
2012-03-05,10055
2012-03-06,10486
2012-03-07,10430
2012-03-08,10674
2012-03-09,10880
2012-03-10,10718
2012-03-11,10795
2012-03-12,11043
2012-03-13,11177
```

```
$ wc -l data/miss-date.csv
1372 data/miss-date.csv
```

```
$ mlr --from data/miss-date.csv --icsv \
    cat -n \
    then put '$datestamp = strptime($date, "%Y-%m-%d")' \
    then step -a delta -f datestamp \
  | head
```

```
$ mlr --from data/miss-date.csv --icsv \
    cat -n \
    then put '$datestamp = strptime($date, "%Y-%m-%d")' \
    then step -a delta -f datestamp \
    then filter '$datestamp_delta != 86400 && $n != 1'
```

```
$ mlr cat -n then filter '$n >= 770 && $n <= 780' data/miss-date.csv
n=770,1=2014-04-12,2=129435
n=771,1=2014-04-13,2=129868
n=772,1=2014-04-14,2=129797
n=773,1=2014-04-15,2=129919
n=774,1=2014-04-16,2=130181
n=775,1=2014-04-19,2=130140
n=776,1=2014-04-20,2=130271
n=777,1=2014-04-21,2=130368
n=778,1=2014-04-22,2=130368
n=779,1=2014-04-23,2=130849
n=780,1=2014-04-24,2=131026
```

```
$ mlr cat -n then filter '$n >= 1115 && $n <= 1125' data/miss-date.csv
n=1115,1=2015-03-25,2=181006
n=1116,1=2015-03-26,2=180995
n=1117,1=2015-03-27,2=181043
n=1118,1=2015-03-28,2=181112
n=1119,1=2015-03-29,2=181306
n=1120,1=2015-03-31,2=181625
n=1121,1=2015-04-01,2=181494
n=1122,1=2015-04-02,2=181718
n=1123,1=2015-04-03,2=181835
n=1124,1=2015-04-04,2=182104
n=1125,1=2015-04-05,2=182528
```

## Two-pass algorithms

Miller is a streaming record processor; commands are performed once per record. This makes Miller particularly suitable for single-pass algorithms, allowing many of its verbs to process files that are (much) larger than the amount of RAM present in your system. (Of course, Miller verbs such as sort, tac, etc. must ingest and retain all input records before emitting any output records.) You can also use out-of-stream variables to perform multi-pass computations, at the price of retaining all input records in memory.

## Two-pass algorithms: computation of percentages

For example, mapping numeric values down a column to the percentage between their min and max values is two-pass: on the first pass you find the min and max values, then on the second pass you map each record’s value to a percentage.

```
$ mlr --from data/small --opprint put -q '
  # These are executed once per record, which is the first pass.
  # The key is to use NR to index an out-of-stream variable to
  # retain all the x-field values.
  @x_min = min($x, @x_min);
  @x_max = max($x, @x_max);
  @x[NR] = $x;

  # The second pass is in a for-loop in an end-block.
  end {
    for (nr, x in @x) {
      @x_pct[nr] = 100 * (@x[nr] - @x_min) / (@x_max - @x_min);
    }
    emit (@x, @x_pct), "NR"
  }
'
NR x        x_pct
1  0.346790 25.661943
2  0.758680 100.000000
3  0.204603 0.000000
4  0.381399 31.908236
5  0.573289 66.540542
```

## Two-pass algorithms: line-number ratios

Similarly, finding the total record count requires first reading through all the data:

```
$ mlr --opprint --from data/small put -q '
  @records[NR] = $*;
  end {
    for ((I,k), v in @records) {
      @records[I]["I"] = I;
      @records[I]["N"] = NR;
      @records[I]["PCT"] = 100*I/NR
    }
    emit @records, "I"
  }
' then reorder -f I,N,PCT
I N PCT a   b   i x                   y
1 5 20  pan pan 1 0.3467901443380824  0.7268028627434533
2 5 40  eks pan 2 0.7586799647899636  0.5221511083334797
3 5 60  wye wye 3 0.20460330576630303 0.33831852551664776
4 5 80  eks wye 4 0.38139939387114097 0.13418874328430463
5 5 100 wye pan 5 0.5732889198020006  0.8636244699032729
```

## Two-pass algorithms: records having max value

The idea is to retain records having the largest value of n in the following data:

```
$ mlr --itsv --irs lf --opprint cat data/maxrows.tsv
a      b      n score
purple red    5 0.743231
blue   purple 2 0.093710
red    purple 2 0.802103
purple red    5 0.389055
red    purple 2 0.880457
orange red    2 0.540349
purple purple 1 0.634451
orange purple 5 0.257223
orange purple 5 0.693499
red    red    4 0.981355
blue   purple 5 0.157052
purple purple 1 0.441784
red    purple 1 0.124912
orange blue   1 0.921944
blue   purple 4 0.490909
purple red    5 0.454779
green  purple 4 0.198278
orange blue   5 0.705700
red    red    3 0.940705
purple red    5 0.072936
orange blue   3 0.389463
orange purple 2 0.664985
blue   purple 1 0.371813
red    purple 4 0.984571
green  purple 5 0.203577
green  purple 3 0.900873
purple purple 0 0.965677
blue   purple 2 0.208785
purple purple 1 0.455077
red    purple 4 0.477187
blue   red    4 0.007487
```

```
$ cat data/maxrows.mlr
# Retain all records
@records[NR] = $*;
# Track max value of n
@maxn = max(@maxn, $n);
# After all records have been read, loop through retained records
# and print those with the max n value
end {
  for ((nr,k), v in @records) {
    if (k == "n") {
      if (@records[nr]["n"] == @maxn) {
        emit @records[nr]
      }
    }
  }
}
```

```
$ mlr --itsv --irs lf --opprint put -q -f data/maxrows.mlr data/maxrows.tsv
a      b      n score
purple red    5 0.743231
purple red    5 0.389055
orange purple 5 0.257223
orange purple 5 0.693499
blue   purple 5 0.157052
purple red    5 0.454779
orange blue   5 0.705700
purple red    5 0.072936
green  purple 5 0.203577
```

## Filtering paragraphs of text

The idea is to use a record separator which is a pair of newlines. Then, if you want each paragraph to be a record with a single value, use a field separator which isn’t present in the input data (e.g. a control-A, which is octal 001). Or, if you want each paragraph to have its lines as separate values, use newline as the field separator.

```
$ cat paragraphs.txt
The quick brown fox jumped over the lazy dogs. The quick brown fox jumped
over the lazy dogs. The quick brown fox jumped over the lazy dogs. The quick
brown fox jumped over the lazy dogs. The quick brown fox jumped over the
lazy dogs.

Now is the time for all good people to come to the aid of their country. Now
is the time for all good people to come to the aid of their country. Now is
the time for all good people to come to the aid of their country. Now is the
time for all good people to come to the aid of their country. Now is the
time for all good people to come to the aid of their country.

Sphynx of black quartz, judge my vow. Sphynx of black quartz, judge my vow.
Sphynx of black quartz, judge my vow. Sphynx of black quartz, judge my vow.
Sphynx of black quartz, judge my vow.

The rain in Spain falls mainly on the plain. The rain in Spain falls mainly
on the plain. The rain in Spain falls mainly on the plain. The rain in Spain
falls mainly on the plain. The rain in Spain falls mainly on the plain. The
rain in Spain falls mainly on the plain. The rain in Spain falls mainly on
the plain. The rain in Spain falls mainly on the plain.
```
```
$ mlr --from paragraphs.txt --nidx --rs '\n\n' --fs '\001' filter '$1 =~ "the"'
The quick brown fox jumped over the lazy dogs. The quick brown fox jumped
over the lazy dogs. The quick brown fox jumped over the lazy dogs. The quick
brown fox jumped over the lazy dogs. The quick brown fox jumped over the
lazy dogs.

Now is the time for all good people to come to the aid of their country. Now
is the time for all good people to come to the aid of their country. Now is
the time for all good people to come to the aid of their country. Now is the
time for all good people to come to the aid of their country. Now is the
time for all good people to come to the aid of their country.

The rain in Spain falls mainly on the plain. The rain in Spain falls mainly
on the plain. The rain in Spain falls mainly on the plain. The rain in Spain
falls mainly on the plain. The rain in Spain falls mainly on the plain. The
rain in Spain falls mainly on the plain. The rain in Spain falls mainly on
the plain. The rain in Spain falls mainly on the plain.
```

```
$ mlr --from paragraphs.txt --nidx --rs '\n\n' --fs '\n' cut -f 1,3
The quick brown fox jumped over the lazy dogs. The quick brown fox jumped
brown fox jumped over the lazy dogs. The quick brown fox jumped over the

Now is the time for all good people to come to the aid of their country. Now
the time for all good people to come to the aid of their country. Now is the

Sphynx of black quartz, judge my vow. Sphynx of black quartz, judge my vow.
Sphynx of black quartz, judge my vow.

The rain in Spain falls mainly on the plain. The rain in Spain falls mainly
falls mainly on the plain. The rain in Spain falls mainly on the plain. The
```

## Doing arithmetic on fields with currency symbols

```
$ cat sample.csv
EventOccurred,EventType,Description,Status,PaymentType,NameonAccount,TransactionNumber,Amount
10/1/2015,Charged Back,Reason: Authorization Revoked By Customer,Disputed,Checking,John,1,$230.36
10/1/2015,Charged Back,Reason: Authorization Revoked By Customer,Disputed,Checking,Fred,2,$32.25
10/1/2015,Charged Back,Reason: Customer Advises Not Authorized,Disputed,Checking,Bob,3,$39.02
10/1/2015,Charged Back,Reason: Authorization Revoked By Customer,Disputed,Checking,Alice,4,$57.54
10/1/2015,Charged Back,Reason: Authorization Revoked By Customer,Disputed,Checking,Jungle,5,$230.36
10/1/2015,Charged Back,Reason: Payment Stopped,Disputed,Checking,Joe,6,$281.96
10/2/2015,Charged Back,Reason: Customer Advises Not Authorized,Disputed,Checking,Joseph,7,$188.19
10/2/2015,Charged Back,Reason: Customer Advises Not Authorized,Disputed,Checking,Joseph,8,$188.19
10/2/2015,Charged Back,Reason: Payment Stopped,Disputed,Checking,Anthony,9,$250.00
```

```
$ mlr --icsv --opprint cat sample.csv
EventOccurred EventType    Description                               Status   PaymentType NameonAccount TransactionNumber Amount
10/1/2015     Charged Back Reason: Authorization Revoked By Customer Disputed Checking    John          1                 $230.36
10/1/2015     Charged Back Reason: Authorization Revoked By Customer Disputed Checking    Fred          2                 $32.25
10/1/2015     Charged Back Reason: Customer Advises Not Authorized   Disputed Checking    Bob           3                 $39.02
10/1/2015     Charged Back Reason: Authorization Revoked By Customer Disputed Checking    Alice         4                 $57.54
10/1/2015     Charged Back Reason: Authorization Revoked By Customer Disputed Checking    Jungle        5                 $230.36
10/1/2015     Charged Back Reason: Payment Stopped                   Disputed Checking    Joe           6                 $281.96
10/2/2015     Charged Back Reason: Customer Advises Not Authorized   Disputed Checking    Joseph        7                 $188.19
10/2/2015     Charged Back Reason: Customer Advises Not Authorized   Disputed Checking    Joseph        8                 $188.19
10/2/2015     Charged Back Reason: Payment Stopped                   Disputed Checking    Anthony       9                 $250.00
```

```
$ mlr --csv put '$Amount = sub(string($Amount), "\$", "")' then stats1 -a sum -f Amount sample.csv
Amount_sum
1497.870000
```

```
$ mlr --csv --ofmt '%.2lf' put '$Amount = sub(string($Amount), "\$", "")' then stats1 -a sum -f Amount sample.csv
Amount_sum
1497.87
```

## Program timing

This admittedly artificial example demonstrates using Miller time and stats functions to introspectively acquire some information about Miller’s own runtime. The delta function computes the difference between successive timestamps.

```
$ ruby -e '10000.times{|i|puts "i=#{i+1}"}' > lines.txt
$ head -n 5 lines.txt
i=1
i=2
i=3
i=4
i=5
```

```
$ mlr --ofmt '%.9le' --opprint put '$t=systime()' then step -a delta -f t lines.txt | head -n 7
i t                 t_delta
1 1430603027.018016 1.430603027e+09
2 1430603027.018043 2.694129944e-05
3 1430603027.018048 5.006790161e-06
4 1430603027.018052 4.053115845e-06
5 1430603027.018055 2.861022949e-06
6 1430603027.018058 3.099441528e-06
```

```
$ mlr --ofmt '%.9le' --oxtab \
    put '$t=systime()' then \
    step -a delta -f t then \
    filter '$i>1' then \
    stats1 -a min,mean,max -f t_delta \
    lines.txt
t_delta_min  2.861022949e-06
t_delta_mean 4.077508505e-06
t_delta_max  5.388259888e-05
```

## Using out-of-stream variables

One of Miller’s strengths is its compact notation: for example, given input of the form

```
$ head -n 5 ../data/medium
a=pan,b=pan,i=1,x=0.3467901443380824,y=0.7268028627434533
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797
a=wye,b=wye,i=3,x=0.20460330576630303,y=0.33831852551664776
a=eks,b=wye,i=4,x=0.38139939387114097,y=0.13418874328430463
a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729
```

```
$ mlr --oxtab stats1 -a sum -f x ../data/medium
x_sum 4986.019682
```

```
$ mlr --opprint stats1 -a sum -f x -g b ../data/medium
b   x_sum
pan 965.763670
wye 1023.548470
zee 979.742016
eks 1016.772857
hat 1000.192668
```

```
$ mlr --oxtab put -q '
  @x_sum += $x;
  end { emit @x_sum }
' data/medium
x_sum 4986.019682
```

```
$ mlr --opprint put -q '
  @x_sum[$b] += $x;
  end { emit @x_sum, "b" }
' data/medium
b   x_sum
pan 965.763670
wye 1023.548470
zee 979.742016
eks 1016.772857
hat 1000.192668
```

## Mean without/with oosvars

```
$ mlr --opprint stats1 -a mean -f x data/medium
x_mean
0.498602
```

```
$ mlr --opprint put -q '
  @x_sum += $x;
  @x_count += 1;
  end {
    @x_mean = @x_sum / @x_count;
    emit @x_mean
  }
' data/medium
x_mean
0.498602
```

## Keyed mean without/with oosvars

```
$ mlr --opprint stats1 -a mean -f x -g a,b data/medium
a   b   x_mean
pan pan 0.513314
eks pan 0.485076
wye wye 0.491501
eks wye 0.483895
wye pan 0.499612
zee pan 0.519830
eks zee 0.495463
zee wye 0.514267
hat wye 0.493813
pan wye 0.502362
zee eks 0.488393
hat zee 0.509999
hat eks 0.485879
wye hat 0.497730
pan eks 0.503672
eks eks 0.522799
hat hat 0.479931
hat pan 0.464336
zee zee 0.512756
pan hat 0.492141
pan zee 0.496604
zee hat 0.467726
wye zee 0.505907
eks hat 0.500679
wye eks 0.530604
```

```
$ mlr --opprint put -q '
  @x_sum[$a][$b] += $x;
  @x_count[$a][$b] += 1;
  end {
    for ((a, b), v in @x_sum) {
      @x_mean[a][b] = @x_sum[a][b] / @x_count[a][b];
    }
    emit @x_mean, "a", "b"
  }
' data/medium
a   b   x_mean
pan pan 0.513314
pan wye 0.502362
pan eks 0.503672
pan hat 0.492141
pan zee 0.496604
eks pan 0.485076
eks wye 0.483895
eks zee 0.495463
eks eks 0.522799
eks hat 0.500679
wye wye 0.491501
wye pan 0.499612
wye hat 0.497730
wye zee 0.505907
wye eks 0.530604
zee pan 0.519830
zee wye 0.514267
zee eks 0.488393
zee zee 0.512756
zee hat 0.467726
hat wye 0.493813
hat zee 0.509999
hat eks 0.485879
hat hat 0.479931
hat pan 0.464336
```

## Variance and standard deviation without/with oosvars

```
$ mlr --oxtab stats1 -a count,sum,mean,var,stddev -f x data/medium
x_count  10000
x_sum    4986.019682
x_mean   0.498602
x_var    0.084270
x_stddev 0.290293
```

```
$ cat variance.mlr
@n += 1;
@sumx += $x;
@sumx2 += $x**2;
end {
  @mean = @sumx / @n;
  @var = (@sumx2 - @mean * (2 * @sumx - @n * @mean)) / (@n - 1);
  @stddev = sqrt(@var);
  emitf @n, @sumx, @sumx2, @mean, @var, @stddev
}
```

```
$ mlr --oxtab put -q -f variance.mlr data/medium
n      10000
sumx   4986.019682
sumx2  3328.652400
mean   0.498602
var    0.084270
stddev 0.290293
```

## Min/max without/with oosvars
```
$ mlr --oxtab stats1 -a min,max -f x data/medium
x_min 0.000045
x_max 0.999953
```

```
$ mlr --oxtab put -q '
  @x_min = min(@x_min, $x);
  @x_max = max(@x_max, $x);
  end { emitf @x_min, @x_max }
' data/medium
x_min 0.000045
x_max 0.999953
```

## Keyed min/max without/with oosvars

```
$ mlr --opprint stats1 -a min,max -f x -g a data/medium
a   x_min    x_max
pan 0.000204 0.999403
eks 0.000692 0.998811
wye 0.000187 0.999823
zee 0.000549 0.999490
hat 0.000045 0.999953
```

```
$ mlr --opprint --from data/medium put -q '
  @min[$a] = min(@min[$a], $x);
  @max[$a] = max(@max[$a], $x);
  end {
    emit (@min, @max), "a";
  }
'
a   min      max
pan 0.000204 0.999403
eks 0.000692 0.998811
wye 0.000187 0.999823
zee 0.000549 0.999490
hat 0.000045 0.999953
```

## Delta without/with oosvars

```
$ mlr --opprint step -a delta -f x data/small
a   b   i x                   y                   x_delta
pan pan 1 0.3467901443380824  0.7268028627434533  0
eks pan 2 0.7586799647899636  0.5221511083334797  0.411890
wye wye 3 0.20460330576630303 0.33831852551664776 -0.554077
eks wye 4 0.38139939387114097 0.13418874328430463 0.176796
wye pan 5 0.5732889198020006  0.8636244699032729  0.191890
```

```
$ mlr --opprint put '$x_delta = ispresent(@last) ? $x - @last : 0; @last = $x' data/small
a   b   i x                   y                   x_delta
pan pan 1 0.3467901443380824  0.7268028627434533  0
eks pan 2 0.7586799647899636  0.5221511083334797  0.411890
wye wye 3 0.20460330576630303 0.33831852551664776 -0.554077
eks wye 4 0.38139939387114097 0.13418874328430463 0.176796
wye pan 5 0.5732889198020006  0.8636244699032729  0.191890
```

## Keyed delta without/with oosvars

```
$ mlr --opprint step -a delta -f x -g a data/small
a   b   i x                   y                   x_delta
pan pan 1 0.3467901443380824  0.7268028627434533  0
eks pan 2 0.7586799647899636  0.5221511083334797  0
wye wye 3 0.20460330576630303 0.33831852551664776 0
eks wye 4 0.38139939387114097 0.13418874328430463 -0.377281
wye pan 5 0.5732889198020006  0.8636244699032729  0.368686
```

```
$ mlr --opprint put '$x_delta = ispresent(@last[$a]) ? $x - @last[$a] : 0; @last[$a] = $x' data/small
a   b   i x                   y                   x_delta
pan pan 1 0.3467901443380824  0.7268028627434533  0
eks pan 2 0.7586799647899636  0.5221511083334797  0
wye wye 3 0.20460330576630303 0.33831852551664776 0
eks wye 4 0.38139939387114097 0.13418874328430463 -0.377281
wye pan 5 0.5732889198020006  0.8636244699032729  0.368686
```

## Exponentially weighted moving averages without/with oosvars

```
$ mlr --opprint step -a ewma -d 0.1 -f x data/small
a   b   i x                   y                   x_ewma_0.1
pan pan 1 0.3467901443380824  0.7268028627434533  0.346790
eks pan 2 0.7586799647899636  0.5221511083334797  0.387979
wye wye 3 0.20460330576630303 0.33831852551664776 0.369642
eks wye 4 0.38139939387114097 0.13418874328430463 0.370817
wye pan 5 0.5732889198020006  0.8636244699032729  0.391064
```

```
$ mlr --opprint put '
  begin { @a = 0.1 };
  $e = NR==1 ? $x : @a * $x + (1 - @a) * @e;
  @e = $e
' data/small
a   b   i x                   y                   e
pan pan 1 0.3467901443380824  0.7268028627434533  0.346790
eks pan 2 0.7586799647899636  0.5221511083334797  0.387979
wye wye 3 0.20460330576630303 0.33831852551664776 0.369642
eks wye 4 0.38139939387114097 0.13418874328430463 0.370817
wye pan 5 0.5732889198020006  0.8636244699032729  0.391064
```
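The same recurrence — the first output equals the first input, and thereafter e[n] = α·x[n] + (1−α)·e[n−1] — can be cross-checked with a short Python sketch (illustrative; the `ewma` helper is made up here, not part of Miller):

```python
# Exponentially weighted moving average with smoothing factor alpha:
# the first output equals the first input; after that,
#   e[n] = alpha * x[n] + (1 - alpha) * e[n-1]
def ewma(xs, alpha):
    out = []
    e = None
    for x in xs:
        e = x if e is None else alpha * x + (1 - alpha) * e
        out.append(e)
    return out

# First three x-values from data/small, alpha = 0.1 as in -d 0.1:
xs = [0.3467901443380824, 0.7586799647899636, 0.20460330576630303]
for v in ewma(xs, 0.1):
    print(round(v, 6))
# prints 0.346790, 0.387979, 0.369642 -- matching the step -a ewma output
```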