list - Duplicate keys when printing a dictionary in Python?


I made a 'dictionary of lists' object, with keys in string format. I read 10 documents and used each unique term (word) as a key, saving its postings in a list. For example, word_tokens["abc"] = ["1:4", "5:2", "8:5"] means the word "abc" occurs 4 times in document 1, 2 times in document 5 and 5 times in document 8.
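For reference, a minimal sketch of how one entry of this structure can be read back (purely illustrative, using the same "doc:count" format described above):

# Sketch: decode a postings list of the form ["doc:count", ...].
word_tokens = {"abc": ["1:4", "5:2", "8:5"]}
for posting in word_tokens["abc"]:
    doc_id, count = posting.split(":")
    print("'abc' occurs", count, "times in document", doc_id)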

My code:

import nltk
from nltk.tokenize import word_tokenize

stop_words_file = open("englishst.txt", 'r')
stop_words = []
for st in stop_words_file:
    st = st.strip()
    stop_words.append(st)
stop_words_file.close()

filename = "docs-1/doc-"
word_tokens = {}          # dictionary object
cnt = 0
for i in range(1, 10):
    file_name = filename + str(i)
    file = open(file_name, 'r')
    for sentence in file:
        word = []
        word = word_tokenize(sentence)
        for w in word:
            w = w.lower()
            if w not in stop_words:
                if word_tokens.get(w) == None:
                    dummy = []
                    dummy.append(str(i) + ":1")
                    word_tokens[w] = dummy
                else:
                    dummy = []
                    dummy = word_tokens[w]
                    tempstr = dummy[-1]
                    temp = tempstr.split(':')
                    if temp[0] == str(i):
                        temp[1] = str(int(temp[1]) + 1)
                        dummy[-1] = temp[0] + ':' + temp[1]
                        word_tokens[w] = dummy
                    else:
                        dummy = word_tokens[w]
                        dummy.append(str(i) + ":1")
                        word_tokens[w] = dummy
                cnt = cnt + 1
    file.close()
    if len(word_tokens) != 0:
        print(dict_count)
        fname = dictfilename + str(dict_count)
        f = open(fname, "w+")
        f.write(str(word_tokens))
        f.close()

    j = 1
    for key, val in word_tokens.items():
        print(j, key, val)
        j = j + 1

    print(word_tokens)

When printing the dictionary directly there are no duplicate keys, but when iterating over the dictionary with a loop the same key appears more than once, and I would have to remove the duplicate keys and merge their values into a single key.

When I write print(word_tokens):

{'neurobeachin': ['1:1'], '(': ['1:5', '2:7', '3:3', '4:19', '5:5', '7:1', '8:2', '9:1'], 'nbea': ['1:6'], ')': ['1:5', '2:7', '3:3', '4:19', '5:5', '7:1', '8:2', '9:1'], 'regulates': ['1:1'], 'neuronal': ['1:1'], 'membrane': ['1:1'], 'protein': ['1:1', '8:2'], 'trafficking': ['1:1'], 'required': ['1:1'], 'development': ['1:1', '2:1', '6:1', '7:1', '9:2'],...... } 

When I write for key, val in word_tokens.items():

1 neurobeachin ['1:1']
2 ( ['1:5']
3 nbea ['1:6']
4 ) ['1:5']
.....
102 obesity ['1:1']
1 neurobeachin ['1:1']
2 ( ['1:5', '2:7']
3 nbea ['1:6']
4 ) ['1:5', '2:7']
......
220 investigation ['2:1']
1 neurobeachin ['1:1']
2 ( ['1:5', '2:7', '3:3']
3 nbea ['1:6']
4 ) ['1:5', '2:7', '3:3']
......
296 products ['3:1']
1 neurobeachin ['1:1']
2 ( ['1:5', '2:7', '3:3', '4:19']
3 nbea ['1:6']
4 ) ['1:5', '2:7', '3:3', '4:19']
...............

I want to iterate over each (key, value) pair, but it gives me the output above. Can anyone suggest the right approach?

I am unfamiliar with the nltk library. However, I have reason to think you are seeing "duplicates" because for key, val in word_tokens.items() is nested under for i in range(1, 10).

Have you tried moving for key, val in word_tokens.items() from inside that loop to outside it?
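Something along these lines, as a rough sketch (the tokenizing and counting body is omitted and only indicated by a comment):

# Sketch: keep the items() loop outside the document loop.
word_tokens = {}

for i in range(1, 10):
    # ... open doc-i, tokenize it and update word_tokens here, as in your code ...
    pass

# Iterate the finished dictionary exactly once, after all documents are processed.
j = 1
for key, val in word_tokens.items():
    print(j, key, val)
    j = j + 1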

The interpreter sessions below are large, but they are there to demonstrate why I think you are encountering this problem. Beyond fixing the nested loop, you should strive to use with open() rather than open()/close() for context management.
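For example, the stop-word loading at the top of your script could be written with a context manager (a sketch, assuming englishst.txt holds one stop word per line):

# Sketch: with open() closes the file automatically, even if an exception occurs.
stop_words = []
with open("englishst.txt", "r") as stop_words_file:
    for st in stop_words_file:
        stop_words.append(st.strip())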

I took your word_tokens dictionary and ran the following code over it (without the token parsing, of course), and it achieves the result you are looking for:

>>> word_tokens = {'neurobeachin': ['1:1'], '(': ['1:5', '2:7', '3:3', '4:19', '5:5', '7:1', '8:2', '9:1'], 'nbea': ['1:6'], ')': ['1:5', '2:7', '3:3', '4:19', '5:5', '7:1', '8:2', '9:1'], 'regulates': ['1:1'], 'neuronal': ['1:1'], 'membrane': ['1:1'], 'protein': ['1:1', '8:2'], 'trafficking': ['1:1'], 'required': ['1:1'], 'development': ['1:1', '2:1', '6:1', '7:1', '9:2']}
>>> j = 1
>>> for key, value in word_tokens.items():
        print (j, key, value)
        j = j + 1

1 neurobeachin ['1:1']
2 ( ['1:5', '2:7', '3:3', '4:19', '5:5', '7:1', '8:2', '9:1']
3 nbea ['1:6']
4 ) ['1:5', '2:7', '3:3', '4:19', '5:5', '7:1', '8:2', '9:1']
5 regulates ['1:1']
6 neuronal ['1:1']
7 membrane ['1:1']
8 protein ['1:1', '8:2']
9 trafficking ['1:1']
10 required ['1:1']
11 development ['1:1', '2:1', '6:1', '7:1', '9:2']
>>>

Now to test the hypothesis (somewhat... since in your actual code the dictionary is technically still growing while you loop over it inside the nested loop):

>>> for _ in range(1, 10):
        j = 1
        for key, value in word_tokens.items():
            print (j, key, value)
            j = j + 1

1 neurobeachin ['1:1']
2 ( ['1:5', '2:7', '3:3', '4:19', '5:5', '7:1', '8:2', '9:1']
3 nbea ['1:6']
4 ) ['1:5', '2:7', '3:3', '4:19', '5:5', '7:1', '8:2', '9:1']
5 regulates ['1:1']
6 neuronal ['1:1']
7 membrane ['1:1']
8 protein ['1:1', '8:2']
9 trafficking ['1:1']
10 required ['1:1']
11 development ['1:1', '2:1', '6:1', '7:1', '9:2']
1 neurobeachin ['1:1']
2 ( ['1:5', '2:7', '3:3', '4:19', '5:5', '7:1', '8:2', '9:1']
3 nbea ['1:6']
4 ) ['1:5', '2:7', '3:3', '4:19', '5:5', '7:1', '8:2', '9:1']
5 regulates ['1:1']
6 neuronal ['1:1']
7 membrane ['1:1']
8 protein ['1:1', '8:2']
9 trafficking ['1:1']
10 required ['1:1']
11 development ['1:1', '2:1', '6:1', '7:1', '9:2']
1 neurobeachin ['1:1']
2 ( ['1:5', '2:7', '3:3', '4:19', '5:5', '7:1', '8:2', '9:1']
3 nbea ['1:6']
4 ) ['1:5', '2:7', '3:3', '4:19', '5:5', '7:1', '8:2', '9:1']
5 regulates ['1:1']
6 neuronal ['1:1']
7 membrane ['1:1']
8 protein ['1:1', '8:2']
9 trafficking ['1:1']
10 required ['1:1']
11 development ['1:1', '2:1', '6:1', '7:1', '9:2']
1 neurobeachin ['1:1']
2 ( ['1:5', '2:7', '3:3', '4:19', '5:5', '7:1', '8:2', '9:1']
3 nbea ['1:6']
4 ) ['1:5', '2:7', '3:3', '4:19', '5:5', '7:1', '8:2', '9:1']
5 regulates ['1:1']
6 neuronal ['1:1']
7 membrane ['1:1']
8 protein ['1:1', '8:2']
9 trafficking ['1:1']
10 required ['1:1']
11 development ['1:1', '2:1', '6:1', '7:1', '9:2']
1 neurobeachin ['1:1']
2 ( ['1:5', '2:7', '3:3', '4:19', '5:5', '7:1', '8:2', '9:1']
3 nbea ['1:6']
4 ) ['1:5', '2:7', '3:3', '4:19', '5:5', '7:1', '8:2', '9:1']
5 regulates ['1:1']
6 neuronal ['1:1']
7 membrane ['1:1']
8 protein ['1:1', '8:2']
9 trafficking ['1:1']
10 required ['1:1']
11 development ['1:1', '2:1', '6:1', '7:1', '9:2']
1 neurobeachin ['1:1']
2 ( ['1:5', '2:7', '3:3', '4:19', '5:5', '7:1', '8:2', '9:1']
3 nbea ['1:6']
4 ) ['1:5', '2:7', '3:3', '4:19', '5:5', '7:1', '8:2', '9:1']
5 regulates ['1:1']
6 neuronal ['1:1']
7 membrane ['1:1']
8 protein ['1:1', '8:2']
9 trafficking ['1:1']
10 required ['1:1']
11 development ['1:1', '2:1', '6:1', '7:1', '9:2']
1 neurobeachin ['1:1']
2 ( ['1:5', '2:7', '3:3', '4:19', '5:5', '7:1', '8:2', '9:1']
3 nbea ['1:6']
4 ) ['1:5', '2:7', '3:3', '4:19', '5:5', '7:1', '8:2', '9:1']
5 regulates ['1:1']
6 neuronal ['1:1']
7 membrane ['1:1']
8 protein ['1:1', '8:2']
9 trafficking ['1:1']
10 required ['1:1']
11 development ['1:1', '2:1', '6:1', '7:1', '9:2']
1 neurobeachin ['1:1']
2 ( ['1:5', '2:7', '3:3', '4:19', '5:5', '7:1', '8:2', '9:1']
3 nbea ['1:6']
4 ) ['1:5', '2:7', '3:3', '4:19', '5:5', '7:1', '8:2', '9:1']
5 regulates ['1:1']
6 neuronal ['1:1']
7 membrane ['1:1']
8 protein ['1:1', '8:2']
9 trafficking ['1:1']
10 required ['1:1']
11 development ['1:1', '2:1', '6:1', '7:1', '9:2']
1 neurobeachin ['1:1']
2 ( ['1:5', '2:7', '3:3', '4:19', '5:5', '7:1', '8:2', '9:1']
3 nbea ['1:6']
4 ) ['1:5', '2:7', '3:3', '4:19', '5:5', '7:1', '8:2', '9:1']
5 regulates ['1:1']
6 neuronal ['1:1']
7 membrane ['1:1']
8 protein ['1:1', '8:2']
9 trafficking ['1:1']
10 required ['1:1']
11 development ['1:1', '2:1', '6:1', '7:1', '9:2']
>>>
