django - 2 RabbitMQ workers and 2 Scrapyd daemons running on 2 local Ubuntu instances, in which one of the RabbitMQ workers is not working


I am working on building a "Scrapy spiders control panel" and am testing the existing solution available at [distributed multi-user scrapy spiders control panel] https://github.com/aaldaber/distributed-multi-user-scrapy-system-with-a-web-ui.

I am trying to run it on local Ubuntu dev machines but am having issues with the scrapyd daemon. One of the workers, linkgenerator, is working, but the scraper worker1 is not, and I cannot figure out why scrapyd won't run properly for that instance.

Background information and configuration:

The application comes bundled with Django, Scrapy, a pipeline for MongoDB (for saving the scraped items), and a Scrapy scheduler for RabbitMQ (for distributing links among the workers). I have 2 local Ubuntu instances: Django, MongoDB, a scrapyd daemon, and the RabbitMQ server are running on instance1, and only a scrapyd daemon is running on instance2. The RabbitMQ workers are:

  • linkgenerator
  • worker1

IP configuration of the instances:

  • IP of local Ubuntu instance1: 192.168.0.101
  • IP of local Ubuntu instance2: 192.168.0.106

List of tools used:

  • MongoDB server
  • RabbitMQ server
  • Scrapy and the Scrapyd API (see the sketch after this list)
  • One RabbitMQ link-generator worker (worker name: linkgenerator) on a server with Scrapy installed and a scrapyd daemon running, on local Ubuntu instance1: 192.168.0.101
  • One RabbitMQ scraper worker (worker name: worker1) on a server with Scrapy installed and a scrapyd daemon running, on local Ubuntu instance2: 192.168.0.106
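For context, the control panel reaches both daemons over the Scrapyd HTTP API. Below is a minimal sketch of the kind of check involved, assuming the python-scrapyd-api wrapper (which client the panel actually uses is an assumption on my part), run from instance1:

    from scrapyd_api import ScrapydAPI  # assumption: python-scrapyd-api is the wrapper in use

    # The two scrapyd endpoints from the configuration further down
    for target in ('http://192.168.0.101:6800', 'http://192.168.0.106:6800'):
        api = ScrapydAPI(target)
        print(target, api.list_projects())                   # should include 'tester2_fda_trial20'
        print(target, api.list_jobs('tester2_fda_trial20'))  # pending/running/finished jobs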

instance1: 192.168.0.101

"instance1" on django, rabbitmq, scrapyd daemon servers running -- ip : 192.168.0.101

instance2: 192.168.0.106

Scrapy is installed on instance2 and a scrapyd daemon is running there.

Scrapy control panel UI snapshot:

From the snapshot of the control panel, it can be seen that there are 2 workers: linkgenerator worked, but worker1 did not. The logs are given at the end of this post.

RabbitMQ status info

The linkgenerator worker can push messages to the RabbitMQ queue. The linkgenerator spider generates start_urls for the scraper spider, which should be consumed by the scraper (worker1), but that is not working; please see the logs for worker1 at the end of this post.
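To rule out basic connectivity, the broker on instance1 can be probed directly from instance2 with pika. This is only a sketch: it assumes the guest/guest credentials from the settings below, and the queue name is a placeholder since I am not certain which queue name the scheduler declares.

    import pika

    credentials = pika.PlainCredentials('guest', 'guest')
    params = pika.ConnectionParameters(host='192.168.0.101', port=5672,
                                       credentials=credentials)
    connection = pika.BlockingConnection(params)  # fails fast if the broker is unreachable
    channel = connection.channel()

    # passive=True only inspects the queue, it does not create it;
    # 'tester2_fda_trial20' is a guess at the queue name used by the scheduler
    result = channel.queue_declare(queue='tester2_fda_trial20', passive=True)
    print('messages waiting:', result.method.message_count)
    connection.close()

One thing worth noting: RabbitMQ 3.3 and later restrict the built-in guest user to connections from localhost by default, so a remote worker using guest/guest may be refused even when the broker itself is healthy.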

RabbitMQ settings

The file below contains the settings for MongoDB and RabbitMQ:

SCHEDULER = ".rabbitmq.scheduler.Scheduler"
SCHEDULER_PERSIST = True
RABBITMQ_HOST = 'scrapydevu79'
RABBITMQ_PORT = 5672
RABBITMQ_USERNAME = 'guest'
RABBITMQ_PASSWORD = 'guest'

MONGODB_PUBLIC_ADDRESS = 'onescience:27017'  # shown on the web interface, but won't be used for connecting to the DB
MONGODB_URI = 'localhost:27017'  # actual URI to connect to the DB
MONGODB_USER = 'tariq'
MONGODB_PASSWORD = 'toor'
MONGODB_SHARDED = True
MONGODB_BUFFER_DATA = 100

# Set the link generator worker address here
LINK_GENERATOR = 'http://192.168.0.101:6800'
SCRAPERS = ['http://192.168.0.106:6800']

LINUX_USER_CREATION_ENABLED = False  # set this to True if you want a Linux user account
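A quick way to sanity-check these values from the worker side is to try plain TCP connections to each host/port they reference. Below is a minimal stdlib-only sketch, run from instance2; the hosts and ports are taken from the settings above. Note that RABBITMQ_HOST is a hostname, so it has to resolve on the worker machine, and MONGODB_URI = 'localhost:27017' is interpreted relative to whichever machine the spider runs on.

    import socket

    checks = [
        ('scrapydevu79', 5672),    # RABBITMQ_HOST / RABBITMQ_PORT
        ('192.168.0.101', 6800),   # LINK_GENERATOR scrapyd
        ('192.168.0.106', 6800),   # SCRAPERS scrapyd
        ('localhost', 27017),      # MONGODB_URI as seen from this machine
    ]

    for host, port in checks:
        try:
            socket.create_connection((host, port), timeout=3).close()
            print('ok     %s:%s' % (host, port))
        except socket.error as exc:
            print('FAILED %s:%s -> %s' % (host, port, exc))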
linkgenerator scrapy.cfg settings:
[settings]
default = tester2_fda_trial20.settings

[deploy:linkgenerator]
url = http://192.168.0.101:6800
project = tester2_fda_trial20
scraper scrapy.cfg settings:
[settings]
default = tester2_fda_trial20.settings

[deploy:worker1]
url = http://192.168.0.101:6800
project = tester2_fda_trial20
scrapyd.conf file settings for instance1 (192.168.0.101):

cat /etc/scrapyd/scrapyd.conf

[scrapyd]
eggs_dir   = /var/lib/scrapyd/eggs
dbs_dir    = /var/lib/scrapyd/dbs
items_dir  = /var/lib/scrapyd/items
logs_dir   = /var/log/scrapyd

max_proc    = 0
max_proc_per_cpu = 4
finished_to_keep = 100
poll_interval = 5.0
bind_address = 0.0.0.0
#bind_address = 127.0.0.1
http_port   = 6800
debug       = on
runner      = scrapyd.runner
application = scrapyd.app.application
launcher    = scrapyd.launcher.Launcher
webroot     = scrapyd.website.Root

[services]
schedule.json     = scrapyd.webservice.Schedule
cancel.json       = scrapyd.webservice.Cancel
addversion.json   = scrapyd.webservice.AddVersion
listprojects.json = scrapyd.webservice.ListProjects
listversions.json = scrapyd.webservice.ListVersions
listspiders.json  = scrapyd.webservice.ListSpiders
delproject.json   = scrapyd.webservice.DeleteProject
delversion.json   = scrapyd.webservice.DeleteVersion
listjobs.json     = scrapyd.webservice.ListJobs
daemonstatus.json = scrapyd.webservice.DaemonStatus
scrapyd.conf file settings for instance2 (192.168.0.106):

cat /etc/scrapyd/scrapyd.conf

[scrapyd]
eggs_dir   = /var/lib/scrapyd/eggs
dbs_dir    = /var/lib/scrapyd/dbs
items_dir  = /var/lib/scrapyd/items
logs_dir   = /var/log/scrapyd

max_proc    = 0
max_proc_per_cpu = 4
finished_to_keep = 100
poll_interval = 5.0
bind_address = 0.0.0.0
#bind_address = 127.0.0.1
http_port   = 6800
debug       = on
runner      = scrapyd.runner
application = scrapyd.app.application
launcher    = scrapyd.launcher.Launcher
webroot     = scrapyd.website.Root

[services]
schedule.json     = scrapyd.webservice.Schedule
cancel.json       = scrapyd.webservice.Cancel
addversion.json   = scrapyd.webservice.AddVersion
listprojects.json = scrapyd.webservice.ListProjects
listversions.json = scrapyd.webservice.ListVersions
listspiders.json  = scrapyd.webservice.ListSpiders
delproject.json   = scrapyd.webservice.DeleteProject
delversion.json   = scrapyd.webservice.DeleteVersion
listjobs.json     = scrapyd.webservice.ListJobs
daemonstatus.json = scrapyd.webservice.DaemonStatus
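Both files look like the stock scrapyd defaults with bind_address = 0.0.0.0, so the daemons should accept connections over the LAN. To separate a scrapyd problem from a RabbitMQ/MongoDB problem, the daemon on instance2 can also be exercised directly through schedule.json, bypassing the control panel entirely. A minimal sketch (project and spider names are taken from the worker1 log further down):

    import requests

    resp = requests.post('http://192.168.0.106:6800/schedule.json',
                         data={'project': 'tester2_fda_trial20',
                               'spider': 'tester2_fda_trial20'})
    print(resp.json())  # {"status": "ok", "jobid": "..."} means scrapyd itself launched the job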
RabbitMQ status

sudo service rabbitmq-server status

[sudo] password for mtaziz:
Status of node rabbit@scrapydevu79
[{pid,53715},
 {running_applications,
    [{rabbitmq_shovel_management,"management extension shovel plugin","3.6.11"},
     {rabbitmq_shovel,"data shovel rabbitmq","3.6.11"},
     {rabbitmq_management,"rabbitmq management console","3.6.11"},
     {rabbitmq_web_dispatch,"rabbitmq web dispatcher","3.6.11"},
     {rabbitmq_management_agent,"rabbitmq management agent","3.6.11"},
     {rabbit,"rabbitmq","3.6.11"},
     {os_mon,"cpo  cxc 138 46","2.2.14"},
     {cowboy,"small, fast, modular http server.","1.0.4"},
     {ranch,"socket acceptor pool tcp protocols.","1.3.0"},
     {ssl,"erlang/otp ssl application","5.3.2"},
     {public_key,"public key infrastructure","0.21"},
     {cowlib,"support library manipulating web protocols.","1.0.2"},
     {crypto,"crypto version 2","3.2"},
     {amqp_client,"rabbitmq amqp client","3.6.11"},
     {rabbit_common,
         "modules shared rabbitmq-server , rabbitmq-erlang-client",
         "3.6.11"},
     {inets,"inets  cxc 138 49","5.9.7"},
     {mnesia,"mnesia  cxc 138 12","4.11"},
     {compiler,"erts  cxc 138 10","4.9.4"},
     {xmerl,"xml parser","1.3.5"},
     {syntax_tools,"syntax tools","1.6.12"},
     {asn1,"the erlang asn1 compiler version 2.0.4","2.0.4"},
     {sasl,"sasl  cxc 138 11","2.3.4"},
     {stdlib,"erts  cxc 138 10","1.19.4"},
     {kernel,"erts  cxc 138 10","2.16.4"}]},
 {os,{unix,linux}},
 {erlang_version,
    "erlang r16b03 (erts-5.10.4) [source] [64-bit] [smp:4:4] [async-threads:64] [kernel-poll:true]\n"},
 {memory,
    [{connection_readers,0},
     {connection_writers,0},
     {connection_channels,0},
     {connection_other,6856},
     {queue_procs,145160},
     {queue_slave_procs,0},
     {plugins,1959248},
     {other_proc,22328920},
     {metrics,160112},
     {mgmt_db,655320},
     {mnesia,83952},
     {other_ets,2355800},
     {binary,96920},
     {msg_index,47352},
     {code,27101161},
     {atom,992409},
     {other_system,31074022},
     {total,87007232}]},
 {alarms,[]},
 {listeners,[{clustering,25672,"::"},{amqp,5672,"::"},{http,15672,"::"}]},
 {vm_memory_calculation_strategy,rss},
 {vm_memory_high_watermark,0.4},
 {vm_memory_limit,3343646720},
 {disk_free_limit,50000000},
 {disk_free,56257699840},
 {file_descriptors,
    [{total_limit,924},{total_used,2},{sockets_limit,829},{sockets_used,0}]},
 {processes,[{limit,1048576},{used,351}]},
 {run_queue,0},
 {uptime,34537},
 {kernel,{net_ticktime,60}}]
scrapyd daemon on instance1 (192.168.0.101) running status:

scrapyd

2017-09-11T06:16:07+0600 [-] Loading /home/mtaziz/.virtualenvs/onescience_dist_env/local/lib/python2.7/site-packages/scrapyd/txapp.py...
2017-09-11T06:16:07+0600 [-] Scrapyd web console available at http://0.0.0.0:6800/
2017-09-11T06:16:07+0600 [-] Loaded.
2017-09-11T06:16:07+0600 [twisted.scripts._twistd_unix.UnixAppLogger#info] twistd 17.5.0 (/home/mtaziz/.virtualenvs/onescience_dist_env/bin/python 2.7.6) starting up.
2017-09-11T06:16:07+0600 [twisted.scripts._twistd_unix.UnixAppLogger#info] reactor class: twisted.internet.epollreactor.EPollReactor.
2017-09-11T06:16:07+0600 [-] Site starting on 6800
2017-09-11T06:16:07+0600 [twisted.web.server.Site#info] Starting factory <twisted.web.server.Site instance at 0x7f5e265c77a0>
2017-09-11T06:16:07+0600 [Launcher] Scrapyd 1.2.0 started: max_proc=16, runner='scrapyd.runner'
2017-09-11T06:16:07+0600 [twisted.python.log#info] "192.168.0.101" - - [11/Sep/2017:00:16:07 +0000] "GET /listprojects.json HTTP/1.1" 200 98 "-" "python-requests/2.18.4"
2017-09-11T06:16:07+0600 [twisted.python.log#info] "192.168.0.101" - - [11/Sep/2017:00:16:07 +0000] "GET /listversions.json?project=tester2_fda_trial20 HTTP/1.1" 200 80 "-" "python-requests/2.18.4"
2017-09-11T06:16:07+0600 [twisted.python.log#info] "192.168.0.101" - - [11/Sep/2017:00:16:07 +0000] "GET /listjobs.json?project=tester2_fda_trial20 HTTP/1.1" 200 92 "-" "python-requests/2.18.4"
scrapyd daemon on instance2 (192.168.0.106) running status:

scrapyd

2017-09-11T06:09:28+0600 [-] Loading /home/mtaziz/.virtualenvs/scrapydevenv/local/lib/python2.7/site-packages/scrapyd/txapp.py...
2017-09-11T06:09:28+0600 [-] Scrapyd web console available at http://0.0.0.0:6800/
2017-09-11T06:09:28+0600 [-] Loaded.
2017-09-11T06:09:28+0600 [twisted.scripts._twistd_unix.UnixAppLogger#info] twistd 17.5.0 (/home/mtaziz/.virtualenvs/scrapydevenv/bin/python 2.7.6) starting up.
2017-09-11T06:09:28+0600 [twisted.scripts._twistd_unix.UnixAppLogger#info] reactor class: twisted.internet.epollreactor.EPollReactor.
2017-09-11T06:09:28+0600 [-] Site starting on 6800
2017-09-11T06:09:28+0600 [twisted.web.server.Site#info] Starting factory <twisted.web.server.Site instance at 0x7fbe6eaeac20>
2017-09-11T06:09:28+0600 [Launcher] Scrapyd 1.2.0 started: max_proc=16, runner='scrapyd.runner'
2017-09-11T06:09:32+0600 [twisted.python.log#info] "192.168.0.101" - - [11/Sep/2017:00:09:32 +0000] "GET /listprojects.json HTTP/1.1" 200 98 "-" "python-requests/2.18.4"
2017-09-11T06:09:32+0600 [twisted.python.log#info] "192.168.0.101" - - [11/Sep/2017:00:09:32 +0000] "GET /listversions.json?project=tester2_fda_trial20 HTTP/1.1" 200 80 "-" "python-requests/2.18.4"
2017-09-11T06:09:32+0600 [twisted.python.log#info] "192.168.0.101" - - [11/Sep/2017:00:09:32 +0000] "GET /listjobs.json?project=tester2_fda_trial20 HTTP/1.1" 200 92 "-" "python-requests/2.18.4"
2017-09-11T06:09:37+0600 [twisted.python.log#info] "192.168.0.101" - - [11/Sep/2017:00:09:37 +0000] "GET /listprojects.json HTTP/1.1" 200 98 "-" "python-requests/2.18.4"
2017-09-11T06:09:37+0600 [twisted.python.log#info] "192.168.0.101" - - [11/Sep/2017:00:09:37 +0000] "GET /listversions.json?project=tester2_fda_trial20 HTTP/1.1" 200 80 "-" "python-requests/2.18.4"
worker1 logs

After updating the code with the RabbitMQ server settings, I followed the suggestions made by @Tarun Lalwani.

The suggestion was to use the RabbitMQ server IP, 192.168.0.101:5672, instead of 127.0.0.1:5672. After I updated it as Tarun Lalwani suggested, I got the new problems below:

2017-09-11 15:49:18 [scrapy.utils.log] INFO: Scrapy 1.4.0 started (bot: tester2_fda_trial20)
2017-09-11 15:49:18 [scrapy.utils.log] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'tester2_fda_trial20.spiders', 'ROBOTSTXT_OBEY': True, 'LOG_LEVEL': 'INFO', 'SPIDER_MODULES': ['tester2_fda_trial20.spiders'], 'BOT_NAME': 'tester2_fda_trial20', 'FEED_URI': 'file:///var/lib/scrapyd/items/tester2_fda_trial20/tester2_fda_trial20/79b1123a96d611e79276000c29bad697.jl', 'SCHEDULER': 'tester2_fda_trial20.rabbitmq.scheduler.Scheduler', 'TELNETCONSOLE_ENABLED': False, 'LOG_FILE': '/var/log/scrapyd/tester2_fda_trial20/tester2_fda_trial20/79b1123a96d611e79276000c29bad697.log'}
2017-09-11 15:49:18 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.feedexport.FeedExporter',
 'scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.logstats.LogStats',
 'scrapy.extensions.corestats.CoreStats']
2017-09-11 15:49:18 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2017-09-11 15:49:18 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2017-09-11 15:49:18 [scrapy.middleware] INFO: Enabled item pipelines:
['tester2_fda_trial20.pipelines.fdatrial20pipeline',
 'tester2_fda_trial20.mongodb.scrapy_mongodb.MongoDBPipeline']
2017-09-11 15:49:18 [scrapy.core.engine] INFO: Spider opened
2017-09-11 15:49:18 [pika.adapters.base_connection] INFO: Connecting to 192.168.0.101:5672
2017-09-11 15:49:18 [pika.adapters.blocking_connection] INFO: Created channel=1
2017-09-11 15:49:18 [scrapy.core.engine] INFO: Closing spider (shutdown)
2017-09-11 15:49:18 [pika.adapters.blocking_connection] INFO: Channel.close(0, Normal shutdown)
2017-09-11 15:49:18 [pika.channel] INFO: Channel.close(0, Normal shutdown)
2017-09-11 15:49:18 [scrapy.utils.signal] ERROR: Error caught on signal handler: <bound method ?.close_spider of <scrapy.extensions.feedexport.FeedExporter object at 0x7f94878b8c50>>
Traceback (most recent call last):
  File "/home/mtaziz/.virtualenvs/scrapydevenv/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 150, in maybeDeferred
    result = f(*args, **kw)
  File "/home/mtaziz/.virtualenvs/scrapydevenv/local/lib/python2.7/site-packages/pydispatch/robustapply.py", line 55, in robustApply
    return receiver(*arguments, **named)
  File "/home/mtaziz/.virtualenvs/scrapydevenv/local/lib/python2.7/site-packages/scrapy/extensions/feedexport.py", line 201, in close_spider
    slot = self.slot
AttributeError: 'FeedExporter' object has no attribute 'slot'
2017-09-11 15:49:18 [scrapy.utils.signal] ERROR: Error caught on signal handler: <bound method ?.spider_closed of <tester2fda_trial20spider 'tester2_fda_trial20' at 0x7f9484f897d0>>
Traceback (most recent call last):
  File "/home/mtaziz/.virtualenvs/scrapydevenv/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 150, in maybeDeferred
    result = f(*args, **kw)
  File "/home/mtaziz/.virtualenvs/scrapydevenv/local/lib/python2.7/site-packages/pydispatch/robustapply.py", line 55, in robustApply
    return receiver(*arguments, **named)
  File "/tmp/user/1000/tester2_fda_trial20-10-d4req9.egg/tester2_fda_trial20/spiders/tester2_fda_trial20.py", line 28, in spider_closed
AttributeError: 'tester2fda_trial20spider' object has no attribute 'statstask'
2017-09-11 15:49:18 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'finish_reason': 'shutdown',
 'finish_time': datetime.datetime(2017, 9, 11, 9, 49, 18, 159896),
 'log_count/ERROR': 2,
 'log_count/INFO': 10}
2017-09-11 15:49:18 [scrapy.core.engine] INFO: Spider closed (shutdown)
2017-09-11 15:49:18 [twisted] CRITICAL: Unhandled error in Deferred:
2017-09-11 15:49:18 [twisted] CRITICAL:
Traceback (most recent call last):
  File "/home/mtaziz/.virtualenvs/scrapydevenv/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 1386, in _inlineCallbacks
    result = g.send(result)
  File "/home/mtaziz/.virtualenvs/scrapydevenv/local/lib/python2.7/site-packages/scrapy/crawler.py", line 95, in crawl
    six.reraise(*exc_info)
  File "/home/mtaziz/.virtualenvs/scrapydevenv/local/lib/python2.7/site-packages/scrapy/crawler.py", line 79, in crawl
    yield self.engine.open_spider(self.spider, start_requests)
OperationFailure: command SON([('saslStart', 1), ('mechanism', 'SCRAM-SHA-1'), ('payload', Binary('n,,n=tariq,r=MjY5OTQ0OTYwMjA4', 0)), ('autoAuthorize', 1)]) on namespace admin.$cmd failed: Authentication failed.

MongoDBPipeline code:

# coding:utf-8

import datetime

from pymongo import errors
from pymongo.mongo_client import MongoClient
from pymongo.mongo_replica_set_client import MongoReplicaSetClient
from pymongo.read_preferences import ReadPreference
from scrapy.exporters import BaseItemExporter
try:
    from urllib.parse import quote
except ImportError:
    from urllib import quote


def not_set(string):
    """ Check if a string is None or ''

    :returns: bool - True if the string is empty
    """
    if string is None:
        return True
    elif string == '':
        return True
    return False


class MongoDBPipeline(BaseItemExporter):
    """ MongoDB pipeline class """
    # Default options
    config = {
        'uri': 'mongodb://localhost:27017',
        'fsync': False,
        'write_concern': 0,
        'database': 'scrapy-mongodb',
        'collection': 'items',
        'replica_set': None,
        'buffer': None,
        'append_timestamp': False,
        'sharded': False
    }

    # Needed for sending acknowledgement signals to RabbitMQ for all persisted items
    queue = None
    acked_signals = []

    # Item buffer
    item_buffer = dict()

    def load_spider(self, spider):
        self.crawler = spider.crawler
        self.settings = spider.settings
        self.queue = self.crawler.engine.slot.scheduler.queue

    def open_spider(self, spider):
        self.load_spider(spider)

        # Configure the connection
        self.configure()

        self.spidername = spider.name
        self.config['uri'] = 'mongodb://' + self.config['username'] + ':' + quote(self.config['password']) + '@' + self.config['uri'] + '/admin'
        self.shardedcolls = []

        if self.config['replica_set'] is not None:
            self.connection = MongoReplicaSetClient(
                self.config['uri'],
                replicaSet=self.config['replica_set'],
                w=self.config['write_concern'],
                fsync=self.config['fsync'],
                read_preference=ReadPreference.PRIMARY_PREFERRED)
        else:
            # Connecting to a stand-alone MongoDB
            self.connection = MongoClient(
                self.config['uri'],
                fsync=self.config['fsync'],
                read_preference=ReadPreference.PRIMARY)

        # Set up the collection
        self.database = self.connection[spider.name]

        # Autoshard the DB
        if self.config['sharded']:
            db_statuses = self.connection['config']['databases'].find({})
            partitioned = []
            notpartitioned = []
            for status in db_statuses:
                if status['partitioned']:
                    partitioned.append(status['_id'])
                else:
                    notpartitioned.append(status['_id'])
            if spider.name in notpartitioned or spider.name not in partitioned:
                try:
                    self.connection.admin.command('enableSharding', spider.name)
                except errors.OperationFailure:
                    pass
            else:
                collections = self.connection['config']['collections'].find({})
                for coll in collections:
                    if (spider.name + '.') in coll['_id']:
                        if coll['dropped'] is not True:
                            if coll['_id'].index(spider.name + '.') == 0:
                                self.shardedcolls.append(coll['_id'][coll['_id'].index('.') + 1:])

    def configure(self):
        """ Configure the MongoDB connection """

        # Set all regular options
        options = [
            ('uri', 'MONGODB_URI'),
            ('fsync', 'MONGODB_FSYNC'),
            ('write_concern', 'MONGODB_REPLICA_SET_W'),
            ('database', 'MONGODB_DATABASE'),
            ('collection', 'MONGODB_COLLECTION'),
            ('replica_set', 'MONGODB_REPLICA_SET'),
            ('buffer', 'MONGODB_BUFFER_DATA'),
            ('append_timestamp', 'MONGODB_ADD_TIMESTAMP'),
            ('sharded', 'MONGODB_SHARDED'),
            ('username', 'MONGODB_USER'),
            ('password', 'MONGODB_PASSWORD')
        ]

        for key, setting in options:
            if not not_set(self.settings[setting]):
                self.config[key] = self.settings[setting]

    def process_item(self, item, spider):
        """ Process the item and add it to MongoDB

        :type item: Item object
        :param item: The item to put into MongoDB
        :type spider: BaseSpider object
        :param spider: The spider running the queries
        :returns: Item object
        """
        item_name = item.__class__.__name__

        # If we are working with a sharded DB, the collection will also be sharded
        if self.config['sharded']:
            if item_name not in self.shardedcolls:
                try:
                    self.connection.admin.command('shardCollection', '%s.%s' % (self.spidername, item_name), key={'_id': "hashed"})
                    self.shardedcolls.append(item_name)
                except errors.OperationFailure:
                    self.shardedcolls.append(item_name)

        itemtoinsert = dict(self._get_serialized_fields(item))

        if self.config['buffer']:
            if item_name not in self.item_buffer:
                self.item_buffer[item_name] = []
                self.item_buffer[item_name].append([])
                self.item_buffer[item_name].append(0)

            self.item_buffer[item_name][1] += 1

            if self.config['append_timestamp']:
                itemtoinsert['scrapy-mongodb'] = {'ts': datetime.datetime.utcnow()}

            self.item_buffer[item_name][0].append(itemtoinsert)

            if self.item_buffer[item_name][1] == self.config['buffer']:
                self.item_buffer[item_name][1] = 0
                self.insert_item(self.item_buffer[item_name][0], spider, item_name)

            return item

        self.insert_item(itemtoinsert, spider, item_name)
        return item

    def close_spider(self, spider):
        """ Method called when the spider is closed

        :type spider: BaseSpider object
        :param spider: The spider running the queries
        :returns: None
        """
        for key in self.item_buffer:
            if self.item_buffer[key][0]:
                self.insert_item(self.item_buffer[key][0], spider, key)

    def insert_item(self, item, spider, item_name):
        """ Process the item and add it to MongoDB

        :type item: (Item object) or [(Item object)]
        :param item: The item(s) to put into MongoDB
        :type spider: BaseSpider object
        :param spider: The spider running the queries
        :returns: Item object
        """
        self.collection = self.database[item_name]

        if not isinstance(item, list):

            if self.config['append_timestamp']:
                item['scrapy-mongodb'] = {'ts': datetime.datetime.utcnow()}

            ack_signal = item['ack_signal']
            item.pop('ack_signal', None)
            self.collection.insert(item, continue_on_error=True)
            if ack_signal not in self.acked_signals:
                self.queue.acknowledge(ack_signal)
                self.acked_signals.append(ack_signal)
        else:
            signals = []
            for eachitem in item:
                signals.append(eachitem['ack_signal'])
                eachitem.pop('ack_signal', None)
            self.collection.insert(item, continue_on_error=True)
            del item[:]
            for ack_signal in signals:
                if ack_signal not in self.acked_signals:
                    self.queue.acknowledge(ack_signal)
                    self.acked_signals.append(ack_signal)
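Since the OperationFailure in the worker1 log is raised while authenticating, the exact URI that open_spider() composes above can be tested on its own from instance2. Below is a minimal sketch using the MONGODB_USER/MONGODB_PASSWORD/MONGODB_URI values from the settings file shown earlier; note that when this runs on instance2, 'localhost:27017' points at instance2's own MongoDB, not at instance1.

    from pymongo import MongoClient
    try:
        from urllib.parse import quote
    except ImportError:
        from urllib import quote

    # Same construction as in open_spider() above, with the settings values filled in
    uri = 'mongodb://' + 'tariq' + ':' + quote('toor') + '@' + 'localhost:27017' + '/admin'
    client = MongoClient(uri)
    print(client.admin.command('ping'))  # raises OperationFailure if authentication fails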

To sum up, I believe the problem lies with the scrapyd daemons running on the two instances: somehow the scraper (worker1) cannot access them properly. I could not figure it out, and I did not find similar use cases on Stack Overflow.

Any help in this regard is highly appreciated. Thanks in advance!

