Duplicate keys printed in Python?
I made a "dictionary of lists" object, with keys in string format. I read 10 documents, used each unique term (word) as a key, and saved its occurrences in a list, i.e. word_tokens["abc"] = ["1:4","5:2","8:5"].
This means the word "abc" occurs 4 times in document 1, 2 times in document 5, and 5 times in document 8.
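To make the "doc:count" format concrete, here is a small sketch that decodes such a list back into (document, count) integer pairs. The helper name parse_postings is hypothetical and is not part of the original code.

```python
def parse_postings(postings):
    """Convert entries like "1:4" into (document_id, count) integer pairs."""
    pairs = []
    for entry in postings:
        doc_id, count = entry.split(":")  # "1:4" -> "1", "4"
        pairs.append((int(doc_id), int(count)))
    return pairs


word_tokens = {"abc": ["1:4", "5:2", "8:5"]}
print(parse_postings(word_tokens["abc"]))  # [(1, 4), (5, 2), (8, 5)]
```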
My code:
```python
import nltk
from nltk.tokenize import word_tokenize

stop_words_file = open("englishst.txt", 'r')
stop_words = []
for st in stop_words_file:
    st = st.strip()
    stop_words.append(st)
stop_words_file.close()

filename = "docs-1/doc-"
word_tokens = {}  # dictionary object
cnt = 0
for i in range(1, 10):
    file_name = filename + str(i)
    file = open(file_name, 'r')
    for sentence in file:
        word = []
        word = word_tokenize(sentence)
        for w in word:
            w = w.lower()
            if w not in stop_words:
                if word_tokens.get(w) is None:
                    # first occurrence of this word anywhere
                    dummy = []
                    dummy.append(str(i) + ":1")
                    word_tokens[w] = dummy
                else:
                    dummy = []
                    dummy = word_tokens[w]
                    tempstr = dummy[-1]
                    temp = tempstr.split(':')
                    if temp[0] == str(i):
                        # word already seen in this document: bump its count
                        temp[1] = str(int(temp[1]) + 1)
                        dummy[-1] = temp[0] + ':' + temp[1]
                        word_tokens[w] = dummy
                    else:
                        # first occurrence of the word in this document
                        dummy = word_tokens[w]
                        dummy.append(str(i) + ":1")
                        word_tokens[w] = dummy
            cnt = cnt + 1
    file.close()
    if len(word_tokens) != 0:
        # note: dict_count and dictfilename are not defined anywhere in the code as posted
        print(dict_count)
        fname = dictfilename + str(dict_count)
        f = open(fname, "w+")
        f.write(str(word_tokens))
        f.close()
    j = 1
    for key, val in word_tokens.items():
        print(j, key, val)
        j = j + 1
print(word_tokens)
```
When I print the dictionary directly, there are no duplicate keys with the same values, but when I iterate over the dictionary using a for loop, I get multiple keys (i.e. the same key occurring more than once). I have to remove the duplicate keys and append the values of the duplicate keys to a single key.
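For reference, a Python dictionary cannot actually hold the same key twice: assigning to an existing key replaces its value, and items() yields each key exactly once per pass. The short sketch below, using made-up data, shows this. If the same key shows up several times in printed output, the loop doing the printing must be running more than once.

```python
counts = {}
counts["protein"] = ["1:1"]
counts["protein"] = ["1:1", "8:2"]  # same key: the value is replaced, no new entry

print(len(counts))  # 1

seen = [key for key, _ in counts.items()]  # one pass over the items
print(seen)  # ['protein']
```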
When I print the dictionary directly with print(word_tokens), I get:
{'neurobeachin': ['1:1'], '(': ['1:5', '2:7', '3:3', '4:19', '5:5', '7:1', '8:2', '9:1'], 'nbea': ['1:6'], ')': ['1:5', '2:7', '3:3', '4:19', '5:5', '7:1', '8:2', '9:1'], 'regulates': ['1:1'], 'neuronal': ['1:1'], 'membrane': ['1:1'], 'protein': ['1:1', '8:2'], 'trafficking': ['1:1'], 'required': ['1:1'], 'development': ['1:1', '2:1', '6:1', '7:1', '9:2'],...... }
When I iterate with for key, val in word_tokens.items(): and print each pair, I get:
```
1 neurobeachin ['1:1']
2 ( ['1:5']
3 nbea ['1:6']
4 ) ['1:5']
.....
102 obesity ['1:1']
1 neurobeachin ['1:1']
2 ( ['1:5', '2:7']
3 nbea ['1:6']
4 ) ['1:5', '2:7']
......
220 investigation ['2:1']
1 neurobeachin ['1:1']
2 ( ['1:5', '2:7', '3:3']
3 nbea ['1:6']
4 ) ['1:5', '2:7', '3:3']
......
296 products ['3:1']
1 neurobeachin ['1:1']
2 ( ['1:5', '2:7', '3:3', '4:19']
3 nbea ['1:6']
4 ) ['1:5', '2:7', '3:3', '4:19']
...............
```
I want to iterate over each (key, value) pair so that it gives me the same answer as above. Can anyone suggest the right approach?
I am unfamiliar with the nltk library. However, I think the reason you are seeing "duplicates" is that your for key, val in word_tokens.items() loop is nested under for i in range(1, 10).
Have you tried moving the for key, val in word_tokens.items() loop from inside the for i loop to outside of it?
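The effect of the nesting can be reproduced without nltk or any files. In this sketch the documents dictionary is a made-up stand-in in which each "document" contributes one new word; it assumes Python 3.7+, where dictionaries keep insertion order.

```python
# Toy stand-in for the real documents (assumption): each "document" adds one new word.
documents = {1: ["neurobeachin"], 2: ["investigation"], 3: ["products"]}

# Printing loop nested inside the document loop (as in the question).
word_tokens = {}
nested_keys = []
for i in range(1, 4):
    for w in documents[i]:
        word_tokens[w] = [str(i) + ":1"]
    for key, val in word_tokens.items():  # runs once per document
        nested_keys.append(key)

# Printing loop moved outside: build the dictionary first, then iterate once.
word_tokens = {}
for i in range(1, 4):
    for w in documents[i]:
        word_tokens[w] = [str(i) + ":1"]
outside_keys = [key for key, val in word_tokens.items()]

print(nested_keys)   # ['neurobeachin', 'neurobeachin', 'investigation', 'neurobeachin', 'investigation', 'products']
print(outside_keys)  # ['neurobeachin', 'investigation', 'products']
```

The dictionary itself never contains a key twice; the nested version simply walks over the growing dictionary once per document, which matches the restarting numbering (1, 2, 3, ... 102, then 1 again) in the question's output.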
The code blocks below are large, but they demonstrate why I think you are encountering this problem. Beyond fixing the nested loop, you should strive to use with open() rather than a bare open() followed by file.close(), so that a context manager closes the file for you.
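As a sketch of the with open() suggestion, here is the stop-word loading rewritten with a context manager. The function name load_stop_words is hypothetical, the temporary file only stands in for "englishst.txt" (assumed to hold one word per line), and returning a set instead of a list is a design change so that the later w not in stop_words check is a fast membership test. Unlike the original loop, it also skips blank lines.

```python
import os
import tempfile


def load_stop_words(path):
    """Read one stop word per line; the file is closed automatically when the with block exits."""
    with open(path, "r") as stop_words_file:
        # skip blank lines; a set gives fast "w not in stop_words" checks
        return {line.strip() for line in stop_words_file if line.strip()}


# Usage with a throwaway file standing in for "englishst.txt".
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "englishst.txt")
    with open(path, "w") as f:
        f.write("the\nof\nand\n")
    stop_words = load_stop_words(path)

print(sorted(stop_words))  # ['and', 'of', 'the']
```

The same pattern applies to each document: with open(file_name, 'r') as file: replaces the open() / file.close() pair inside the for i loop.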
I took your dictionary word_tokens, ran your code on it (without parsing the tokens, of course), and got the result you are looking for:
```
>>> word_tokens = {'neurobeachin': ['1:1'], '(': ['1:5', '2:7', '3:3', '4:19', '5:5', '7:1', '8:2', '9:1'], 'nbea': ['1:6'], ')': ['1:5', '2:7', '3:3', '4:19', '5:5', '7:1', '8:2', '9:1'], 'regulates': ['1:1'], 'neuronal': ['1:1'], 'membrane': ['1:1'], 'protein': ['1:1', '8:2'], 'trafficking': ['1:1'], 'required': ['1:1'], 'development': ['1:1', '2:1', '6:1', '7:1', '9:2']}
>>> j = 1
>>> for key, value in word_tokens.items():
...     print(j, key, value)
...     j = j + 1
...
1 neurobeachin ['1:1']
2 ( ['1:5', '2:7', '3:3', '4:19', '5:5', '7:1', '8:2', '9:1']
3 nbea ['1:6']
4 ) ['1:5', '2:7', '3:3', '4:19', '5:5', '7:1', '8:2', '9:1']
5 regulates ['1:1']
6 neuronal ['1:1']
7 membrane ['1:1']
8 protein ['1:1', '8:2']
9 trafficking ['1:1']
10 required ['1:1']
11 development ['1:1', '2:1', '6:1', '7:1', '9:2']
>>>
```
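As a small side note, the manual j counter can be replaced with the built-in enumerate(), which yields the running number alongside each (key, value) pair. This sketch uses a shortened copy of the dictionary above.

```python
word_tokens = {'neurobeachin': ['1:1'], 'nbea': ['1:6'], 'protein': ['1:1', '8:2']}

# enumerate(..., start=1) supplies the running number, replacing the manual j counter.
for j, (key, value) in enumerate(word_tokens.items(), start=1):
    print(j, key, value)
# 1 neurobeachin ['1:1']
# 2 nbea ['1:6']
# 3 protein ['1:1', '8:2']
```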
Now to test the hypothesis (only somewhat, since here the dictionary is not growing on each pass of the outer loop the way it does in your code):
```
>>> for _ in range(1, 10):
...     j = 1
...     for key, value in word_tokens.items():
...         print(j, key, value)
...         j = j + 1
...
1 neurobeachin ['1:1']
2 ( ['1:5', '2:7', '3:3', '4:19', '5:5', '7:1', '8:2', '9:1']
3 nbea ['1:6']
4 ) ['1:5', '2:7', '3:3', '4:19', '5:5', '7:1', '8:2', '9:1']
5 regulates ['1:1']
6 neuronal ['1:1']
7 membrane ['1:1']
8 protein ['1:1', '8:2']
9 trafficking ['1:1']
10 required ['1:1']
11 development ['1:1', '2:1', '6:1', '7:1', '9:2']
1 neurobeachin ['1:1']
2 ( ['1:5', '2:7', '3:3', '4:19', '5:5', '7:1', '8:2', '9:1']
3 nbea ['1:6']
4 ) ['1:5', '2:7', '3:3', '4:19', '5:5', '7:1', '8:2', '9:1']
5 regulates ['1:1']
6 neuronal ['1:1']
7 membrane ['1:1']
8 protein ['1:1', '8:2']
9 trafficking ['1:1']
10 required ['1:1']
11 development ['1:1', '2:1', '6:1', '7:1', '9:2']
```
The same 11 lines are printed 9 times in total, once per pass of the outer loop; only the first two passes are shown here.
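Putting the pieces together, here is a sketch of the whole indexing step that builds the same "doc:count" lists using collections.Counter (one counter per document) and iterates over the result only once, after all documents are processed. The function name build_index and the (doc_id, text) input shape are hypothetical, and str.split stands in for nltk's word_tokenize so the example runs without nltk.

```python
from collections import Counter


def build_index(documents, stop_words, tokenize=str.split):
    """Map each term to ["doc:count", ...] in document order.

    documents: iterable of (doc_id, text) pairs.
    tokenize: str.split by default, standing in for nltk's word_tokenize (assumption).
    """
    word_tokens = {}
    for doc_id, text in documents:
        # count every lowercased token in this document in one pass
        counts = Counter(w.lower() for w in tokenize(text))
        for w, n in counts.items():
            if w not in stop_words:
                word_tokens.setdefault(w, []).append(f"{doc_id}:{n}")
    return word_tokens


docs = [(1, "Protein development"), (2, "development of the protein protein")]
index = build_index(docs, stop_words={"of", "the"})

# Iterate once, after every document has been processed.
for j, (key, val) in enumerate(index.items(), start=1):
    print(j, key, val)
# 1 protein ['1:1', '2:2']
# 2 development ['1:1', '2:1']
```

With real files, each entry of documents could be (i, file.read()) from a with open(filename + str(i)) as file: block, passing tokenize=word_tokenize. Because the counting happens per document, there is no need to inspect and rewrite the last "doc:count" string as the original loop does.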