How do I properly deobfuscate a Perl script?
I'm trying to deobfuscate the following Perl code (source).
#!/usr/bin/perl
(my$d=q[AA GTCAGTTCCT
CGCTATGTA ACACACACCA
TTTGTGAGT ATGTAACATA
CTCGCTGGC TATGTCAGAC
AGATTGATC GATCGATAGA
ATGATAGATC GAACGAGTGA
TAGATAGAGT GATAGATAGA
GAGAGA GATAGAACGA
TC GATAGAGAGA
TAGATAGACA G
ATCGAGAGAC AGATA
GAACGACAGA TAGATAGAT
TGAGTGATAG ACTGAGAGAT
AGATAGATTG ATAGATAGAT
AGATAGATAG ACTGATAGAT
AGAGTGATAG ATAGAATGAG
AGATAGACAG ACAGACAGAT
AGATAGACAG AGAGACAGAT
TGATAGATAG ATAGATAGAT
TGATAGATAG AATGATAGAT
AGATTGAGTG ACAGATCGAT
AGAACCTTTCT CAGTAACAGT
CTTTCTCGC TGGCTTGCTT
TCTAA CAACCTTACT
G ACTGCCTTTC
TGAGATAGAT CGA
TAGATAGATA GACAGAC
AGATAGATAG ATAGAATGAC
AGACAGAGAG ACAGAATGAT
CGAGAGACAG ATAGATAGAT
AGAATGATAG ACAGATAGAC
AGATAGATAG ACAGACAGAT
AGACAGACTG ATAGATAGAT
AGATAGATAG AATGACAGAT
CGATTGAATG ACAGATAGAT
CGACAGATAG ATAGACAGAT
AGAGTGATAG ATTGATCGAC
TGATTGATAG ACTGATTGAT
AGACAGATAG AGTGACAGAT
CGACAGA TAGATAGATA
GATA GATAGATAG
ATAGACAGA G
AGATAGATAG ACA
GTCGCAAGTTC GCTCACA
])=~s/\s+//g;%a=map{chr $_=>$i++}65,84,67,
71;$p=join$;,keys%a;while($d=~/([$p]{4})/g
){next if$j++%96>=16;$c=0;for$d(0..3){$c+=
$a{substr($1,$d,1)}*(4**$d)}$perl.=chr $c}
eval $perl;
When run, it prints Just another genome hacker.
After running the code through Deparse and perltidy (perl -MO=Deparse jagh.pl | perltidy), it looks like this:
( my $d =
    "AA...GCTCACA\n"    # snipped double helix part
) =~ s/\s+//g;
(%a) = map( { chr $_, $i++; } 65, 84, 67, 71 );
$p = join( $;, keys %a );
while ( $d =~ /([$p]{4})/g ) {
    next if $j++ % 96 >= 16;
    $c = 0;
    foreach $d ( 0 .. 3 ) {
        $c += $a{ substr $1, $d, 1 } * 4**$d;
    }
    $perl .= chr $c;
}
Here is what I was able to decipher on my own.
( my $d =
"AA...GCTCACA\n" # snipped double helix part
) =~ s/\s+//g;
removes all whitespace in $d (the double helix).
(%a) = map( { chr $_, $i++; } 65, 84, 67, 71 );
creates a hash with the keys A, T, C and G and the values 0, 1, 2 and 3. I normally code in Python, so this translates to the dictionary {'A': 0, 'T': 1, 'C': 2, 'G': 3} in Python.
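For reference, a minimal sketch of what that map call builds, rewritten with strict and a lexical counter. The name %digit is only an illustrative stand-in for %a.

```perl
use strict;
use warnings;

# Same construction as the obfuscated line, with a lexical counter.
# %digit is a hypothetical name standing in for %a.
my $i = 0;
my %digit = map { ( chr($_) => $i++ ) } 65, 84, 67, 71;

# Print the pairs in value order.
print "$_ => $digit{$_}\n" for sort { $digit{$a} <=> $digit{$b} } keys %digit;
```

The character codes 65, 84, 67 and 71 are simply the ASCII values of A, T, C and G.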
$p = join( $;, keys %a );
joins the keys of the hash with $;, the subscript separator for multidimensional array emulation. The documentation says that the default is "\034", the same as SUBSEP in awk, but when I do:
my @ascii = unpack("C*", $p);
print $ascii[1];
I get the value 28? Also, it is not clear to me how this emulates a multidimensional array. Is $p now something like [['A'], ['T'], ['C'], ['G']] in Python?
while ( $d =~ /([$p]{4})/g ) {
As long as $d matches ([$p]{4}), execute the code in the while block. But since I don't completely understand what structure $p is, I also have a hard time understanding what happens here.
next if $j++ % 96 >= 16;
Skip to the next iteration if $j modulo 96 is greater than or equal to 16. $j increments with each pass of the while loop (?).
$c = 0;
foreach $d ( 0 .. 3 ) {
    $c += $a{ substr $1, $d, 1 } * 4**$d;
}
For $d in the range from 0 to 3, extract some substring, but at this point I'm completely lost. The last few lines concatenate everything and evaluate the result.
Caution: don't blindly run obfuscated Perl, especially if there is an eval, backticks, system, open, etc. call somewhere in it, and that might not be all too obvious*. De-obfuscating it with Deparse and carefully replacing the evals with print statements is a must until you understand what's going on. Running it in a sandbox, as an unprivileged user, or in a VM should be considered too.
*s&&$_&ee evaluates $_, for instance.
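To see why a construct like that counts as a hidden eval, here is a small sketch with a harmless stand-in payload. The variable $ran and the payload string are made up for the demonstration and are not part of the script in question.

```perl
use strict;
use warnings;

our $ran = 0;

# Harmless stand-in payload; in real obfuscated code this string would be
# whatever the script assembled at run time.
$_ = '$main::ran = 42';

# Same as s//$_/ee: the first /e yields the string held in $_, the second
# /e evaluates that string as Perl code. The word "eval" never appears.
s&&$_&ee;

print "payload ran, \$ran is now $ran\n";
```

Searching a script for the literal word eval is therefore not enough; the /e and /ee modifiers deserve the same suspicion.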
First observation: 034 is octal. It's equal to 28 (dec) or 0x1c (hex), so nothing fishy there.
The $; thing is purely obfuscation; I can't find a reason to use that in particular. $p will just be a string like A.T.C.G (with . replaced by $;, whatever it is, and with the letters in whatever order keys returns them).
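A short sketch of what $; is and what "multidimensional array emulation" means in practice. The names $sep, %h and $key are illustrative only, and the letters are joined in a fixed order here rather than in hash order.

```perl
use strict;
use warnings;

my $sep = $; ;    # default subscript separator, chr(0x1c)
printf "decimal %d, octal %03o, hex 0x%02x\n", ( ord $sep ) x 3;

# "Multidimensional array emulation": a comma-separated hash subscript
# is joined with $; into one flat string key.
my %h;
$h{ 1, 2 } = 'x';
my ($key) = keys %h;
print "flat key matches\n" if $key eq join( $;, 1, 2 );

# So $p is a flat 7-character string, not a nested structure.
my $p = join $;, qw(A T C G);
print length($p), "\n";
```

In Python terms, $p is closer to '\x1c'.join(['A', 'T', 'C', 'G']) than to a list of lists.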
So in the regex, [$p] matches any of {'A', 'T', 'C', 'G', $;}. Since $; never appears in $d, it's useless there. In turn, [$p]{4} matches any sequence of four letters in the above set, as if this had been used (ignoring the useless $;):
while ( $d =~ /([ATCG]{4})/g ) { ... }
If you had to write this yourself, after having removed whitespace, you'd just grab each successive substring of $d of length four (assuming there are no other chars in $d).
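A minimal sketch of both ways of cutting the string into four-letter pieces, assuming the input contains only A, T, C and G and has a length that is a multiple of four. The short sample string is illustrative and not taken from the helix.

```perl
use strict;
use warnings;

# Short illustrative input; whitespace is assumed to be removed already.
my $d = 'CAATTCCTGGCT';

my @by_regex  = $d =~ /([ATCG]{4})/g;    # what the obfuscated loop matches
my @by_unpack = unpack '(A4)*', $d;      # fixed-width chunks, no regex

print "@by_regex\n";
print "@by_unpack\n";
```

Both lists hold the same three pieces, which is why the character class adds nothing beyond obscurity for clean input.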
Now this part is fun:
foreach $d ( 0 .. 3 ) {
    $c += $a{ substr $1, $d, 1 } * 4**$d;
}
$1 holds the current four-letter codepoint.
substr $1, $d, 1 returns each successive letter from that codepoint.
%a maps A to 00b (binary), T to 01b, C to 10b, and G to 11b.
A 00
T 01
C 10
G 11
Multiplying by 4**$d is equivalent to a bitwise left shift by 0, 2, 4 and 6 bits.
So this funny construct allows you to build any 8-bit value in the base-four system with ATCG as digits!
i.e. it does the following conversions:
        A A A A
AAAA -> 00000000
        T A A T
TAAT -> 01000001 -> capital A in ASCII
        T A A C
CAAT -> 01000010 -> capital B in ASCII
CAATTCCTGGCTGTATTTCTTTCTGCCT -> BioGeek
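The conversions above can be reproduced with a small readable decoder. This is a sketch that uses shifts in place of 4**$d; the helper names decode_codepoint and decode_dna are hypothetical. Note that the first letter of each codepoint is the least significant digit, so the bit pairs read right to left relative to the string.

```perl
use strict;
use warnings;

my %a = ( A => 0, T => 1, C => 2, G => 3 );

# Decode one four-letter codepoint; the first letter is the least
# significant base-four digit, so letter $k is shifted left by 2*$k bits.
sub decode_codepoint {
    my ($letters) = @_;
    my $c = 0;
    $c |= $a{ substr $letters, $_, 1 } << ( 2 * $_ ) for 0 .. 3;
    return chr $c;
}

# Decode a whole string of codepoints, without any skipping.
sub decode_dna {
    my ($dna) = @_;
    return join '', map { decode_codepoint($_) } $dna =~ /([ATCG]{4})/g;
}

print decode_dna('CAATTCCTGGCTGTATTTCTTTCTGCCT'), "\n";    # BioGeek
```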
This part:
next if $j++ % 96 >= 16;
makes the above conversion run only for the first 16 "codepoints", skips the next 80, then converts the next 16, skips the next 80, etc. It essentially just skips parts of the ellipse (junk DNA removal system).
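To make the 16-on, 80-off pattern concrete, here is a sketch that lists which codepoint indexes survive the filter. The helper kept_indexes and the total of 200 are illustrative choices, not values taken from the script.

```perl
use strict;
use warnings;

# Return the indexes of the codepoints that survive the filter, out of
# $total codepoints (hypothetical helper for illustration).
sub kept_indexes {
    my ($total) = @_;
    return grep { $_ % 96 < 16 } 0 .. $total - 1;
}

my @kept = kept_indexes(200);
print "kept ", scalar(@kept), " of 200: @kept[0, 15, 16, 31, 32]\n";
```

The condition is the negation of the one in the script: a codepoint is decoded exactly when its index modulo 96 is below 16.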
Here's an ugly text-to-DNA converter that you could use to produce anything to replace the helix (it doesn't handle the 80-skip thing):
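use strict;
use warnings;

# Text to encode, taken from the first command-line argument.
my $in = shift;
# Base-four digit to letter (the inverse of %a in the decoder).
my %conv = ( 0 => 'A', 1 => 'T', 2 => 'C', 3 => 'G' );
for ( my $i = 0; $i < length($in); $i++ ) {
    my $chr     = substr( $in, $i, 1 );
    my $chv     = ord($chr);
    my $encoded = "";
    # Emit the four 2-bit digits, least significant first.
    $encoded .= $conv{ ( $chv >> 0 ) & 0x3 };
    $encoded .= $conv{ ( $chv >> 2 ) & 0x3 };
    $encoded .= $conv{ ( $chv >> 4 ) & 0x3 };
    $encoded .= $conv{ ( $chv >> 6 ) & 0x3 };
    print $encoded;
}
print "\n";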
$ perl q.pl 'print "BioGeek\n";'
AAGTCAGTTCCTCGCTATGTAACACACACAATTCCTGGCTGTATTTCTTTCTGCCTAGTTCGCTCACAGCGA
Put that into $d instead of the helix (and remove the skipping part in the decoder).
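As a sanity check, here is a round-trip sketch under the same assumptions: no skipping, and the recovered source is printed rather than passed to eval. The sub names encode and decode and the tables @letter and %digit are illustrative.

```perl
use strict;
use warnings;

my @letter = qw(A T C G);                             # digit -> letter
my %digit  = map { ( $letter[$_] => $_ ) } 0 .. 3;    # letter -> digit

# Encode each character as four letters, least significant digit first.
sub encode {
    my ($text) = @_;
    my $dna = '';
    for my $byte ( map { ord($_) } split //, $text ) {
        $dna .= $letter[ ( $byte >> ( 2 * $_ ) ) & 3 ] for 0 .. 3;
    }
    return $dna;
}

# Decode without the "% 96" skipping step.
sub decode {
    my ($dna) = @_;
    my $out = '';
    while ( $dna =~ /([ATCG]{4})/g ) {
        my $c = 0;
        $c += $digit{ substr $1, $_, 1 } * 4**$_ for 0 .. 3;
        $out .= chr $c;
    }
    return $out;
}

my $d = encode('print "BioGeek\n";');
print "$d\n";
print decode($d), "\n";    # inspect the recovered source; do not eval it
```

Only once the printed source looks harmless would it make sense to swap the final print back to eval.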
ReferenceURL : https://stackoverflow.com/questions/9342164/how-to-properly-deobfusacte-a-perl-script