Python 2: Using regex to pull out whole lines from text file with substring from another
I have a noob question. I'm using Python 2.7.6 on a Linux system.
What I'm trying to achieve is to use specific numbers in a list, which correspond to the last number on each line of a database text file, to pull out the whole line from the database text file and print it (I'm going to write the line to a text file later).
The code I'm currently trying to use:
    reg = re.compile(r'(\d+)$')
    for line in "text file database":
        if list_line in reg.findall(line):
            print line
What I have found is that I can input a string like
    list_line = "9"
and it will output the whole line of the corresponding database entry just fine. But trying to use list_line to input the strings one by one in a loop doesn't work.
Can someone please help me out or direct me to a relevant source?
Appendix:
The text file database
The text file contains data similar to this:
    gnl acep_1.0 acep10001-pa 1
    gnl acep_1.0 acep10002-pa 2
    gnl acep_1.0 acep10003-pa 3
    gnl acep_1.0 acep10004-pa 4
    gnl acep_1.0 acep10005-pa 5
    gnl acep_1.0 acep10006-pa 7
    gnl acep_1.0 acep10007-pa 6
    gnl acep_1.0 acep10008-pa 8
    gnl acep_1.0 acep10009-pa 9
    gnl acep_1.0 acep10010-pa 10
The search text file list_line
It looks similar to this:
    2
    5
    4
    6
Updated original code:
    #import extensions
    import linecache
    import re

    #set re.compiler parameters
    reg = re.compile(r'(\d+)$')

    #designate and open list file
    in_list = raw_input("list input: ")
    open_list = open(in_list, "r")

    #count lines in list file
    total_lines = sum(1 for line in open_list)
    print total_lines

    #open out file in write mode
    outfile = raw_input("output: ")
    open_outfile = open(outfile, "w")

    #designate db string
    db = raw_input("db input: ")
    open_db = open(db, "r")
    read_db = open_db.read()
    split_db = read_db.splitlines()
    print split_db

    #set line_number value to 0
    line_number = 0

    #count through line numbers and print line
    while line_number < total_lines:
        line_number = line_number + 1
        print line_number
        list_line = linecache.getline(in_list, line_number)
        print list_line
        for line in split_db:
            if list_line in reg.findall(line):
                print line

    #close files
    open_list.close()
    open_outfile.close()
    open_db.close()
Short version: your for loop is going through the "database" file once, looking for the corresponding text, and stopping. So if you have multiple lines you want to pull out, as in your list_line file, you'll only end up pulling out a single line.
Also, the way you're looking for the line number isn't a great idea. What happens if you're looking for line 5, but the second line happens to have the digit 5 somewhere in its data? E.g., if the second line looks like:
    gnl acep_1.0 acep15202-pa 2
then a substring search for "5" would return that line instead of the one you intended. Instead, since you know the line number is going to be the last number on the line, you should take advantage of Python's str.split() function (which splits a string on whitespace and returns a list of the items) and the fact that you can use -1 as a list index to get the last item of a list, like so:
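    def get_one_line(line_number_string):
        with open("database_file.txt", "r") as datafile:  # Open the file for reading
            for line in datafile:  # This is how you get one line at a time in Python
                items = line.rstrip().split()
                # "items and" skips blank lines, where items[-1] would raise IndexError
                if items and items[-1] == line_number_string:
                    return line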
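To see why comparing the last field is safer than a substring search, here is a small sketch using two lines shaped like the sample data. The helper names substring_matches and last_field_matches are made up for illustration, and print is written with parentheses so the snippet should run on both Python 2.7 and Python 3.

```python
# Two lines shaped like the sample database; the first one happens to
# contain the digit 5 in the middle of its data.
sample_lines = [
    "gnl acep_1.0 acep15202-pa 2\n",
    "gnl acep_1.0 acep10005-pa 5\n",
]


def substring_matches(lines, wanted):
    # Naive approach: keep any line that contains the text anywhere.
    return [line for line in lines if wanted in line]


def last_field_matches(lines, wanted):
    # Safer approach: compare only the last whitespace-separated item.
    matches = []
    for line in lines:
        items = line.rstrip().split()
        if items and items[-1] == wanted:
            matches.append(line)
    return matches


naive = substring_matches(sample_lines, "5")
exact = last_field_matches(sample_lines, "5")
print(len(naive))  # both lines contain a "5" somewhere
print(len(exact))  # only one line has "5" as its last field
```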
One thing I haven't talked about is the rstrip() function. When you iterate over a file in Python, you get each line as-is, with its newline characters still intact. When you print it later, you'll probably be using print, and print adds a newline character at the end of whatever you give it. So unless you use rstrip() you'll end up with two newline characters instead of one, resulting in a blank line between every line of output.
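A quick way to check the newline behavior for yourself is a sketch like the one below. It writes a throwaway two-line file (the file contents and variable names are just placeholders), reads it back by iterating, and compares the raw lines with the rstrip() versions. It should run on both Python 2.7 and Python 3.

```python
import os
import tempfile

# Write a tiny two-line file so there is something to iterate over.
handle, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(handle, "w") as tmp:
    tmp.write("first line\nsecond line\n")

# Iterating over the file yields each line with its newline intact.
with open(path, "r") as datafile:
    raw_lines = [line for line in datafile]

os.remove(path)  # clean up the throwaway file

stripped_lines = [line.rstrip() for line in raw_lines]

print(repr(raw_lines[0]))       # the trailing newline is still there
print(repr(stripped_lines[0]))  # rstrip() removed it
```

Printing with repr() makes the trailing newline visible, which plain print would hide.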
The other thing you're probably not familiar with there is the with statement. Without going into too much detail, it ensures that your database file will be closed when the return line statement is executed. The details of how with works are interesting reading for someone who knows a lot about Python, but as a Python newbie you probably won't want to dive into that yet. Just remember that when you open a file, try to use with open("filename") as some_variable: and Python will do the Right Thing™.
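If you want to convince yourself that with really does close the file even when you return from inside it, a small experiment along these lines should do. The function name first_line_and_handle and the temporary file are made up for the demonstration; returning the file object is only done here so its closed attribute can be inspected afterwards.

```python
import os
import tempfile


def first_line_and_handle(path):
    # Return from inside the with block; the file is still closed
    # on the way out of the function.
    with open(path, "r") as datafile:
        for line in datafile:
            return line, datafile
    return None, datafile  # reached only if the file is empty


# Create a throwaway one-line file to read from.
handle, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(handle, "w") as tmp:
    tmp.write("only line\n")

line, datafile = first_line_and_handle(path)
os.remove(path)  # clean up the throwaway file

print(repr(line))
print(datafile.closed)  # the with statement closed the file for us
```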
Okay. Now that you have that get_one_line() function, you can use it like this:
    with open("list_line.txt", "r") as line_number_file:
        for line in line_number_file:
            line_number_string = line.rstrip()  # Don't want the newline character
            database_line = get_one_line(line_number_string)
            print database_line  # Or do whatever you need to with it
Note: If you're using Python 3, replace print line with print(line): in Python 3, the print statement became a function.
There's more you could do with this code (for example, opening the database file every single time you look for a line is kind of inefficient; reading the whole thing into memory once and then looking for your data afterwards would be better). But this is good enough to get you started with, and if your database file is small, the time you'd lose worrying about efficiency would be far more than the time you'd lose doing it the simple-but-slower way.
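For when you do want the read-it-once version, here is one possible sketch. It assumes the last field on each database line is unique, and the name load_database plus the in-memory sample lists are placeholders standing in for your two files. It should run on both Python 2.7 and Python 3.

```python
def load_database(lines):
    # Map the last field of each line (the line number, as a string)
    # to the full line, so later lookups don't rescan the file.
    by_number = {}
    for line in lines:
        items = line.rstrip().split()
        if items:  # skip blank lines
            by_number[items[-1]] = line.rstrip()
    return by_number


# Stand-ins for the contents of the two files in the question.
database_lines = [
    "gnl acep_1.0 acep10005-pa 5\n",
    "gnl acep_1.0 acep10006-pa 7\n",
    "gnl acep_1.0 acep10007-pa 6\n",
]
wanted_numbers = ["5", "6", "99"]

database = load_database(database_lines)

# Keep only the numbers that actually exist in the database.
found = [database[n] for n in wanted_numbers if n in database]
for line in found:
    print(line)
```

With real files, you could pass the open file object straight to load_database(), since iterating over a file yields its lines one at a time.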
So see if this helps you, then come back and ask more questions if there's something you don't understand or that isn't working.