python - Count mismatch using graph.V().count() and graph.V().has().count()
So, a simple addition of vertices from a dataframe, and of edges, using my own Python API is failing. The set of steps is as follows:
Step 1: Clean the DB before adding anything. This is achieved using:
g.V().drop().iterate()
That is followed by iterating over the dataframe rows to add the vertices:
ret = g.addV(label).property(kwargs['colname'], node_val).next()
I keep a reference to the id of each vertex so that I can add edges to it. This is where the issue crops up. When I do:
g.V().count()
each time, it returns 0 as expected. But when I try to add the vertices, I get an error in the first iteration itself:
gremlin_python.driver.driver_remote_connection.GremlinServerError: 500: adding property key [name] , value [indiana jones] violates uniqueness constraint [name]
So, I do:
print(g.V().has('name', 'indiana jones').count().next())
print(g.V().count().next())
The output is:
1
0
Isn't this strange? The count of all vertices is 0, whereas the count for a vertex with a specific property returns 1. Why is that? The skeleton of the code is as follows:
1: Clean the DB.
2: Use df.iterrows() to add the vertices from the CSV file iteratively.
3: Once the set of vertices is created, add the edges to it.
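The three steps above can be sketched as a single function. This is a minimal sketch, not the actual API from the question: `load_graph` and its parameters are hypothetical names, and `g` is assumed to be a gremlin_python traversal source that is already connected to the server.

```python
def load_graph(g, label, key, values, edges, edge_label):
    """Hypothetical helper; `g` is assumed to be a connected
    gremlin_python GraphTraversalSource."""
    # Step 1: remove every existing vertex (edges go with them).
    g.V().drop().iterate()

    # Step 2: add one vertex per value and remember the id it was given.
    ids = {}
    for value in values:
        vertex = g.addV(label).property(key, value).next()
        ids[value] = vertex.id

    # Step 3: connect vertices through the ids collected in step 2.
    created = []
    for left, right in edges:
        created.extend(
            g.V(ids[left]).addE(edge_label).to(g.V(ids[right])).toList()
        )
    return ids, created
```

Keeping the ids in a dictionary keyed by the property value means step 3 never has to look a vertex up by name again.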
Coming to the 2nd point, the addition of edges returns a blank list. The code is as follows:
left_node = g.V(int(graph_node_left))
right_node = g.V(int(graph_node_right))
graph_ref = left_node.addE(label).to(right_node).id().toList()
print(_, ":,:", left_node, ":,:", right_node, ":,:", label, ":,:", graph_ref)
The output is as follows (only 2 outputs are displayed in the question due to confidentiality issues):
14 :,: [['V', 42025072], ['addE', 'mentions'], ['to', [['V', 41422936]]], ['id']] :,: [['V', 41422936]] :,: mentions :,: [{'@type': 'janusgraph:RelationIdentifier', '@value': {'value': 'osaha-p0qr4-6mmt-onu54', '@class': 'java.util.HashMap'}}]
15 :,: [['V', 82559144], ['addE', 'mentions'], ['to', [['V', 516232]]], ['id']] :,: [['V', 516232]] :,: mentions :,: []
Why did it suddenly become blank? Why a blank list?
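In Gremlin, a step only runs for traversers that reach it, so if the starting `g.V(id)` finds no vertex, `addE()` never executes and `toList()` returns an empty list. Whether that is what happened in iteration 15 cannot be told from the output alone. A minimal sketch for ruling it out, assuming `g` is a connected gremlin_python traversal source and with `add_edge_checked` as a hypothetical helper name:

```python
def add_edge_checked(g, left_id, right_id, label):
    """Hypothetical helper: fail loudly instead of returning []."""
    # Look up each endpoint first; count() is 0 when the id is unknown.
    missing = [vid for vid in (left_id, right_id)
               if g.V(vid).count().next() == 0]
    if missing:
        raise LookupError("vertex id(s) not found: {}".format(missing))
    # Both endpoints exist, so the edge traversal has something to act on.
    return g.V(left_id).addE(label).to(g.V(right_id)).toList()
```

This costs two extra round trips per edge, so it is meant for debugging rather than bulk loading.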
These may be trivial issues, as I have only recently started working with Gremlin Python.
Also, if my approach is wrong, it would be great if you could suggest a better way to add vertices and edges from a CSV file more optimally!
Thanks
Edit 1:
Storage backend: Cassandra
Storage host: localhost
Search backend: Elasticsearch
Dataset: Twitter data generated using https://github.com/ibm/janusgraph-utils/blob/master/readme.md
If any more information is required, let me know.
Thanks
Edit 2:
After incorporating the suggestion from Jason, I set the cache to false.
I'm using the generated users.csv dataset, which contains 100 unique elements/users. When I add them to the DB, the count should reflect 100, right?
So, I'm adding them to the DB using:
keyname = kwargs['colname']  # in our case 'name'
for _, row in df.iterrows():  # df stores the names of the users that need to be added to the graph
    keyvalue = row[keyname]
    ret = self.g.addV(label).property(keyname+"_{}".format(k), keyvalue).next()  # label in our case is 'user'
    time.sleep(0.1)  # done to avoid the issue wherein data is not committed to the server
Before running the above snippet, I clean the database each time, so logically there is no question of a residual vertex being present.
That is achieved using:
gremlin> graph = JanusGraphFactory.open("conf/gremlin-server/janusgraph-cassandra-es.properties")
gremlin> g = graph.traversal()
gremlin> g.V().drop().iterate()
gremlin> graph.tx().commit()
Once the data store is cleaned and the users data is pushed using the 2 code snippets above, I go to the Gremlin shell and do the following:
gremlin> graph = JanusGraphFactory.open("conf/gremlin-server/janusgraph-cassandra-es.properties")
gremlin> g = graph.traversal()
gremlin> g.V().hasLabel('user').count()
Logically, it should return 100, the number of unique elements in the dataframe. Expected output:
100
Actual output:
24
But the number varies. In the 1st run it was 23, then 24.
Why so? Somehow there doesn't seem to be any consistency.
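Before comparing against the graph count, it may be worth confirming that the CSV really holds 100 distinct names. A minimal sketch using only the standard library; `duplicate_values` is a hypothetical helper, and the `name` header and the inline sample rows are assumptions standing in for the real users.csv:

```python
import csv
import io
from collections import Counter

def duplicate_values(csv_file, column):
    """Return {value: occurrences} for values that appear more than once."""
    reader = csv.DictReader(csv_file)
    counts = Counter(row[column] for row in reader)
    return {value: n for value, n in counts.items() if n > 1}

# Made-up rows standing in for users.csv; the real file is not shown here.
sample = io.StringIO("name\nalice\nbob\nalice\n")
print(duplicate_values(sample, "name"))  # prints {'alice': 2}
```

An empty result means every value in the column is distinct, so any shortfall in the graph count would have to come from the load itself rather than from the data.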