python - Count mismatch using graph.V().count() and graph.V().has().count()


So, a simple addition of vertices from a dataframe, and of edges between them, using my own Python API is failing. The set of steps is as follows:

Step 1: Clean the DB before adding anything. This is achieved using:

g.V().drop().iterate()

That is followed by iterating over the dataframe rows to add vertices:

ret = g.addV(label).property(kwargs['colname'], node_val).next()

I keep a reference to the ID of each vertex so that I can add edges to it. This is where the issue crops up. When I do:

g.V().count()

each time, it returns 0, which is expected and what I want, but I get an error in the first iteration itself:

gremlin_python.driver.driver_remote_connection.GremlinServerError: 500: adding property key [name] , value [indiana jones] violates uniqueness constraint [name]

So, I do:

print(g.V().has('name', 'indiana jones').count().next())
print(g.V().count().next())

The output is:

1
0

Isn't that strange? The count of all vertices is 0, whereas the count for a vertex with a specific property returns 1. Why is that? The skeleton of the code is as follows:

1: Clean the DB.
2: Use df.iterrows() to add the vertices from the CSV file iteratively.
3: Once the set of vertices is created, add edges to it.
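A minimal sketch of these three steps, assuming `g` is an already connected gremlin_python traversal source and that the installed version lets `V()` and `to()` accept a vertex object. The helper names `clean_graph`, `add_vertices` and `add_edges` are hypothetical and are not part of my actual API.

```python
def clean_graph(g):
    """Step 1: drop every vertex (edges attached to them go too)."""
    g.V().drop().iterate()


def add_vertices(g, label, key, values):
    """Step 2: add one vertex per value and keep a reference to each."""
    refs = {}
    for value in values:
        # next() sends the traversal to the server and returns the new vertex.
        refs[value] = g.addV(label).property(key, value).next()
    return refs


def add_edges(g, refs, pairs, edge_label):
    """Step 3: add one edge per (left, right) pair of values."""
    for left, right in pairs:
        # A fresh traversal is built for every edge; nothing is reused.
        g.V(refs[left]).addE(edge_label).to(refs[right]).iterate()
```

With a dataframe, `values` would be something like `df['name']`, and `pairs` would come from whatever file describes the edges.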

Coming to the 2nd point, the addition of edges returns a blank list. The code is as follows:

left_node = g.V(int(graph_node_left))
right_node = g.V(int(graph_node_right))
graph_ref = left_node.addE(label).to(right_node).id().toList()
print(_, ":,:", left_node, ":,:", right_node, ":,:", label, ":,:", graph_ref)

The output is as follows (only 2 lines of output are shown in the question due to confidentiality issues):

14 :,: [['V', 42025072], ['addE', 'mentions'], ['to', [['V', 41422936]]], ['id']] :,: [['V', 41422936]] :,: mentions :,: [{'@type': 'janusgraph:RelationIdentifier', '@value': {'value': 'osaha-p0qr4-6mmt-onu54', '@class': 'java.util.HashMap'}}]
15 :,: [['V', 82559144], ['addE', 'mentions'], ['to', [['V', 516232]]], ['id']] :,: [['V', 516232]] :,: mentions :,: []

Why did it suddenly become blank? Why a blank list?
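One way an empty list can arise, as far as I understand Gremlin, is that `addE` only runs for vertices that actually reach it, so if `g.V(id)` matches nothing then no edge is created and nothing is returned. Below is a hedged sketch of a check for that, assuming `g` is a connected gremlin_python traversal source and that `to()` accepts a vertex object. The name `add_edge_checked` is hypothetical, and this may not be the actual cause in my case.

```python
def add_edge_checked(g, left_id, right_id, label):
    """Add an edge, but fail loudly if either endpoint does not exist."""
    left = g.V(left_id).toList()
    right = g.V(right_id).toList()
    if not left or not right:
        # Collect the ids whose lookup came back empty.
        missing = [i for i, found in ((left_id, left), (right_id, right))
                   if not found]
        raise LookupError("no vertex found for id(s): {}".format(missing))
    # Both endpoints exist, so the returned list should hold one edge id.
    return g.V(left[0]).addE(label).to(right[0]).id().toList()
```

If this raises for the row that printed `[]`, the vertex ID I stored is not the ID that is in the graph.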

These may be trivial issues, but I have only recently started working on Gremlin Python.

Also, if my approach is wrong, it would be great if you can suggest a better way to add vertices and edges from a CSV file more optimally!

Thanks

Edit 1:

Storage backend: Cassandra

Storage host: localhost

Search backend: Elasticsearch

Dataset: Twitter data generated using https://github.com/ibm/janusgraph-utils/blob/master/readme.md

If any more information is required, let me know.

Thanks

Edit 2:

After incorporating the suggestion from Jason, I set the cache to false.

I'm using the generated users.csv dataset, which contains 100 unique elements/users. When I add them to the DB, it should reflect 100, right?
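As a sanity check on the "100 unique users" assumption, here is a small sketch using only the standard library. It assumes users.csv has a header row containing a `name` column, which I have not shown here; `duplicate_values` is a hypothetical helper name.

```python
import csv
from collections import Counter


def duplicate_values(csv_file, column):
    """Return {value: count} for values that occur more than once."""
    counts = Counter(row[column] for row in csv.DictReader(csv_file))
    return {value: n for value, n in counts.items() if n > 1}
```

Calling it as `duplicate_values(open("users.csv", newline=""), "name")` should give an empty dict if every name really is unique.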

So, I'm achieving the addition to the DB using:

import time

keyname = kwargs['colname']    # in our case 'name'
for _, row in df.iterrows():   # df stores the names of the users that need to be added to the graph
    keyvalue = row[keyname]
    ret = self.g.addV(label).property(keyname+"_{}".format(k), keyvalue).next()    # label in our case is 'user'
    time.sleep(0.1)    # done to avoid the issue wherein data is not committed to the server

Before running the above snippet, I clean the database each time, so logically there is no question of a residual vertex being present.

That is achieved using:

gremlin>> graph = JanusGraphFactory.open("conf/gremlin-server/janusgraph-cassandra-es.properties")
gremlin>> g = graph.traversal()
gremlin>> g.V().drop().iterate()
gremlin>> graph.tx().commit()

Once the data store is cleaned and the users data is pushed using the two code snippets above, I go to the Gremlin shell and do the following:

gremlin>> graph = JanusGraphFactory.open("conf/gremlin-server/janusgraph-cassandra-es.properties")
gremlin>> g = graph.traversal()
gremlin>> g.V().hasLabel('user').count()

Logically, it should return 100, the number of unique elements in the dataframe. Expected output:

100 

Actual output:

24 

But this number keeps varying. In the 1st run it was 23, then 24.

Why so? Somehow there doesn't seem to be any consistency.

